In recent years, artificial intelligence (AI) has been the subject of intense media hype. Machine Learning (in Spanish, Aprendizaje Automático, abbreviated AA in this article) and Deep Learning (Aprendizaje Profundo, AP), along with AI, have been mentioned in countless articles, regularly outside the realm of purely technological publications. We are promised a future of smart chatbots, autonomous cars and digital assistants, a future sometimes painted in a gloomy light and other times in a utopian one, where jobs will be scarce and most economic activity will be managed by robots and machines embedded with AI.
For the current or future Machine Learning practitioner, it is vitally important to be able to recognize the signal in the noise, so that we can identify and communicate the developments that are really changing our world rather than the exaggerations commonly seen in the media. If, like me, you are a practitioner of Machine Learning, Deep Learning or another field of AI, we will probably be the people in charge of developing those intelligent machines and agents, and we will therefore have an active role to play in present and future society.
First of all, let's clearly define what we are talking about when we talk about artificial intelligence. What are Artificial Intelligence, Machine Learning and Deep Learning? And how do these terms relate to one another?
Artificial Intelligence
Artificial intelligence (AI) was born in the 1950s at the hands of a few pioneers in the nascent field of computer science, who began to wonder whether computers could be made to "think". A concise definition of the field of AI would therefore be: the effort to automate intellectual tasks normally performed by humans. As such, AI is a general field that contains Machine Learning (AA) and Deep Learning (AP), but it also includes other sub-fields that do not necessarily involve “Learning” as such.
The first programs that played chess, for example, only involved rigid rules written by programmers, so they do not qualify as machine learning. For a long time, many experts believed that human-level AI could be achieved by having programmers handcraft a set of rules large enough to manipulate knowledge and thus produce intelligent machines. This approach is known as symbolic AI. It was the dominant paradigm in the field from the 1950s to the late 1980s, and it peaked in popularity during the Expert Systems boom of the 1980s.
Although symbolic AI proved suitable for solving logical, well-defined problems, such as playing chess, it turned out to be intractable to find explicit rules for much more complex problems, such as image classification, voice recognition, and translation between natural languages (such as Spanish or English, as opposed to non-natural languages such as programming languages). A new approach then emerged to take the place of symbolic AI: Machine Learning.
Machine Learning
In Victorian England, in the 1830s and 1840s, Charles Babbage designed the Analytical Engine, the first general-purpose mechanical computer. It carried out operations mechanically in order to automate certain computations in the field of mathematical analysis, hence its name. However, the Analytical Engine had no pretension of originating anything new. It could only do what it was ordered to compute, and its only purpose was to assist mathematicians with something they already knew how to do.
Then, in 1950, Alan Turing introduced the Turing test and concluded that general-purpose computers might be able to "learn" and "be original". AA arose from questions such as:
Can a computer go beyond what we order it to do and learn by itself how to perform a specific task? Could a computer surprise us? And, instead of programmers specifying rule by rule how to process data, could a computer automatically learn those rules directly from the data we passed to it?
These questions opened the door to a new programming paradigm. In the classic symbolic AI paradigm, humans supply rules (a program) and data to be processed according to those rules, and obtain answers at the output of the program. With Machine Learning, humans supply the data together with the answers expected for that data, and obtain at the output the rules that map inputs to their corresponding outputs. These rules can then be applied to new data to produce original answers, that is, answers generated by rules the system "learned" rather than by rules explicitly coded by programmers.
A machine learning system is "trained" rather than explicitly "programmed". It is presented with many examples relevant to the task at hand, and it finds the statistical structure or patterns in those examples that eventually allow it to learn the rules for automating the task. For example, if we wanted to automate the tagging of our vacation photos, we would pass many photos already tagged by humans to the AA system, and the system would learn statistical rules for associating specific photos with their respective tags.
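The contrast between the two paradigms can be illustrated with a deliberately tiny sketch. Everything in it is hypothetical and chosen only for illustration: the temperature task, the function names, and the brute-force threshold search stand in for a real learning algorithm. In the first function a human writes the rule; in the second, the rule (a threshold) is derived from data plus expected answers.

```python
# Classical programming: a human writes the rule by hand.
def is_hot_handwritten(temp_c):
    return temp_c > 25.0  # threshold chosen by the programmer


# Machine learning (toy version): the rule is derived from examples.
def learn_threshold(temps, labels):
    """Return the candidate threshold that classifies most examples correctly."""
    best_threshold, best_correct = None, -1
    for candidate in sorted(temps):
        # Count how many examples the rule "temp >= candidate" gets right.
        correct = sum((t >= candidate) == label for t, label in zip(temps, labels))
        if correct > best_correct:
            best_threshold, best_correct = candidate, correct
    return best_threshold


# Data (inputs) and the answers a human expects for that data.
temps = [12.0, 18.0, 21.0, 27.0, 30.0, 35.0]
labels = [False, False, False, True, True, True]

threshold = learn_threshold(temps, labels)  # the "rule" that was learned


def is_hot_learned(temp_c):
    return temp_c >= threshold


print(threshold, is_hot_learned(29.0), is_hot_learned(20.0))
```

The learned rule can then be applied to temperatures that were never in the examples, which is the sense in which the system produces answers nobody coded explicitly.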
Although AA only began to flourish in the 1990s, it has become the most popular and successful sub-field of AI, a trend driven by the availability of better hardware and giant data sets. AA is closely related to mathematical statistics, but it differs from statistics in several ways. Unlike statistics, AA tends to deal with large, complex data sets (which can contain millions of images, each with thousands of pixels) for which classical statistical analysis (such as Bayesian analysis) would be impractical. As a result, AA, and especially Deep Learning, has comparatively little mathematical theory behind it and is considered a more engineering-oriented field. That is, AA is an applied discipline in which ideas are validated empirically far more often than theoretically.
To do AA, we need three ingredients:
Input data: For example, if the task is voice recognition, the input data would be sound files or recordings of people talking. If the task is image tagging, the data could be photos or images.
Examples of the expected output: In the speech recognition task, these could be human-generated transcripts of the audio files. In the image tagging task, the expected outputs can be tags such as "dog", "cat", "person", etc.
A way to measure whether the algorithm is doing a good job: This is needed to determine the distance between the current output of the algorithm and the expected output. The measurement is used as a feedback signal to adjust the way the algorithm works. This adjustment step is what we call “Learning”.
These ingredients are fundamental to all kinds of AA and AP algorithms. Let us now explore what an AA or AP algorithm really does with them to produce results that look like something out of science fiction.
An AA model transforms its input data into meaningful responses, a process that is "learned" by exposing the model to previously known examples of inputs and their corresponding outputs. The central problem in AA and AP is therefore learning useful representations of the input data, representations that bring us closer to generating or predicting the expected outputs.
Before going further, let's answer the question: what is a representation? At its core, a representation is a different way of viewing data, a different way of representing or encoding it. For example, a color image can be encoded in RGB (red-green-blue) format or in HSV (hue-saturation-value) format: these are two different representations of the same data. Some tasks that are difficult with one of these representations become much easier with the other.
For example, the task "select all red pixels in an image" is much simpler in the RGB format, while the task "make the image less saturated" is simpler in the HSV format. AA models are designed to find the most appropriate representations of the information they receive as input, that is, transformations of the data that make it more amenable to the task at hand, such as image classification.
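The RGB versus HSV point can be tried out with Python's standard `colorsys` module, which works on channel values between 0 and 1. This is only a sketch: the helper names `is_red` and `desaturate`, the threshold of 0.6, and the sample pixels are assumptions made up for the illustration.

```python
import colorsys


def is_red(rgb, threshold=0.6):
    """In RGB, 'red' is easy to express: a high red channel, low green and blue."""
    r, g, b = rgb
    return r > threshold and g < 1 - threshold and b < 1 - threshold


def desaturate(rgb, factor=0.5):
    """In HSV, saturation is a single number, so reducing it is one multiplication."""
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    return colorsys.hsv_to_rgb(h, s * factor, v)


# Hypothetical pixels as (red, green, blue) tuples with values in [0, 1].
pixels = [(0.9, 0.1, 0.1), (0.1, 0.8, 0.2), (0.8, 0.2, 0.3)]

red_pixels = [p for p in pixels if is_red(p)]
softer_red = desaturate((1.0, 0.0, 0.0))

print(red_pixels)
print(softer_red)
```

Each task is a one-liner in the representation that suits it, and noticeably clumsier to write directly in the other one.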
Let's make this a little more concrete with an example. Consider a Cartesian plane with some points, some black and some white, each represented by its coordinates (x, y). Suppose we want to tell black points from white points using only their coordinates, and that we apply a change of coordinates to the plane. In this new coordinate system, the coordinates of our points can be said to be a new representation of our original data. And in this case it is a very good representation! With it, the classification problem between black and white points can be expressed with a simple rule: "black points are those with x > 0" and "white points are those with x < 0". This new representation basically solves the classification problem.
In this case, we defined the coordinate change by hand. But if we instead search systematically through different possible coordinate changes, using the percentage of correctly classified points as feedback, then we are doing AA. “Learning” in the AA context describes the process of automatically searching for the best and most useful representations of our data.
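The systematic search described above can be sketched in a few lines. The assumptions here are all illustrative: the eight toy points are invented, the only coordinate changes considered are rotations of the axes in 15-degree steps, and the classification rule after the change is fixed to "black if the new x is positive".

```python
import math

# Hypothetical toy data: (x, y, is_black).
points = [
    (2.0, 1.0, True), (-1.0, 3.0, True), (3.0, -1.0, True), (0.5, 0.5, True),
    (-2.0, -1.0, False), (1.0, -3.0, False), (-3.0, 1.0, False), (-0.5, -0.5, False),
]


def accuracy(angle_deg, points):
    """Fraction of points classified correctly after rotating the axes."""
    theta = math.radians(angle_deg)
    correct = 0
    for x, y, is_black in points:
        # Coordinate of the point along the rotated x-axis.
        new_x = x * math.cos(theta) + y * math.sin(theta)
        correct += (new_x > 0) == is_black
    return correct / len(points)


# The predefined set of coordinate changes we are willing to try.
candidate_angles = range(0, 180, 15)

# Feedback signal: keep the angle with the highest classification accuracy.
best_angle = max(candidate_angles, key=lambda a: accuracy(a, points))
best_accuracy = accuracy(best_angle, points)

print(accuracy(0, points), best_angle, best_accuracy)
```

With the original axes (an angle of 0) the rule misclassifies a quarter of these points, while the search finds a rotation under which the same simple rule classifies all of them correctly.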
All AA algorithms consist of automatically finding transformations that turn the input data into representations that are more useful for a specific task. These operations can be changes of coordinates, linear projections, translations, nonlinear operations, etc. AA algorithms are not usually creative in finding these transformations. They merely search through a predefined set of operations, called the hypothesis space.
So that is what AA really is, technically: searching for useful representations of the input data within a predefined space of possibilities, guided by a feedback signal, so that we can make good predictions of the expected outputs. This simple idea makes it possible to solve a wide range of intellectual tasks, from voice recognition and computer vision to autonomous driving.
Now that we have understood what “Learning” means in the context of AA, let's look at what makes Deep Learning so special.
Deep Learning
Deep Learning is a specific sub-field of Machine Learning: a new approach to learning representations of data that emphasizes learning these representations in succession, through what are called layers. The term “Deep” in Deep Learning does not refer to any kind of deeper understanding achieved by this approach. Instead, it stands for the idea of successive, hierarchical representation of data in layers. The number of layers that contribute to a model is called the depth of the model. With this in mind, other appropriate names for the approach could be "Layered Representational Learning" or "Hierarchical Representational Learning".
Modern AP models normally involve tens or hundreds of successive layers of representation, and all the parameters they contain are learned automatically by exposing the models to so-called training data. Other approaches in AA, by contrast, tend to focus on learning only one or two layers of representation of the data, which is why they are called Shallow Learning models, as opposed to Deep Learning.
In AP, these layered representations are almost always learned through models called Neural Networks, which are literally structured in layers stacked one after the other. The term Neural Network is a reference to neurobiology, but although some of the core concepts in AP were developed in part by drawing inspiration from our understanding of the brain, AP models are not models of the brain. There is no evidence that the brain implements anything like the learning mechanisms used in modern AP models.
Many of us have come across articles proclaiming that AP models work like the human brain or were modeled on it, but that is not the case. It is confusing and counterproductive for newcomers to this sub-field to think that AP is in any way related to neurobiology. For our purposes, AP is a mathematical framework for learning representations from data.
To gain a little more insight into what the representations learned by an AP algorithm look like, consider how a network several layers deep transforms an image of a handwritten digit in order to recognize, at its output, which digit it is.
The network transforms the image of the digit into representations that are increasingly different from the original image and increasingly informative about the final result. We can think of a “Deep Neural Network” as a multi-stage information distillation operation, where information flows through successive filters and comes out increasingly purified, that is, more useful for the specific task we want to solve, in this case recognizing digits from images of handwritten digits.
So that is what AP is, technically: a multi-stage way of learning representations of data. It is a simple idea, but it turns out that very simple mechanisms, sufficiently scaled, can end up looking like magic.
At this point, we know that AA is about mapping inputs (such as images) to targets (such as the “cat” tag), and that this is done by looking at many examples of inputs and their corresponding targets. We also know that Deep Neural Networks (deep RNs, from the Spanish Redes Neuronales) perform this input-to-target mapping by applying a deep sequence of simple data transformations (the layers of the network), and that these transformations are learned by exposing the network to many input-target examples. Let us now look at how, specifically, this learning occurs.
The specification of what a layer does to its inputs is stored in the layer's weights, which are essentially a bunch of numbers. In technical terms, we say that the transformation performed by a layer is parameterized by its weights, which is why the weights are also called the parameters of the layer. In this context, “Learning” means finding a set of numerical values for the weights of every layer of the RN such that the RN correctly maps the inputs of our examples to their corresponding targets.
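To see what "parameterized by its weights" means in practice, here is a minimal sketch of a single layer in plain Python. The specific form used (a weighted sum plus a bias per output, followed by a ReLU that clips negatives to zero) is an assumption made for illustration, since the text does not commit to any particular kind of layer, and the numbers are arbitrary.

```python
def dense_layer(inputs, weights, biases):
    """Apply one layer: a weighted sum per output unit, then ReLU."""
    outputs = []
    for unit_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(unit_weights, inputs)) + bias
        outputs.append(max(0.0, total))  # ReLU: negative sums become 0
    return outputs


# The layer's behaviour is entirely determined by these numbers.
weights = [[0.5, -1.0],   # weights of the first output unit
           [2.0, 1.0]]    # weights of the second output unit
biases = [0.0, -1.0]

x = [1.0, 2.0]
y = dense_layer(x, weights, biases)
print(y)  # [0.0, 3.0]

# Same code, different weights: a different transformation of the same input.
other_weights = [[1.0, 1.0], [0.0, 0.5]]
print(dense_layer(x, other_weights, biases))  # [3.0, 0.0]
```

The function itself never changes. Learning, in this picture, means searching for the values in `weights` and `biases` that make the outputs match the targets.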
However, to control something, we must first be able to observe it. To control the output of our RN, we need to be able to measure how far its predictions deviate from the expected or target outputs. This is the job of the Loss Function of the RN, also called the Cost Function or, sometimes, the Objective Function. The loss function takes the predictions delivered by the RN along with the true target (what we want the RN to produce) and computes a deviation score that captures how well the RN has done on that specific example.
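As a concrete illustration of such a score, the sketch below uses the mean squared error, one common choice of loss function. It is picked here as an assumption, since the text does not name a specific loss, and the prediction values are invented.

```python
def mean_squared_error(predictions, targets):
    """Average squared difference between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


# Target: the example belongs to the second of three classes.
targets = [0.0, 1.0, 0.0]

good_predictions = [0.1, 0.8, 0.1]  # close to the target
bad_predictions = [0.7, 0.2, 0.6]   # far from the target

loss_good = mean_squared_error(good_predictions, targets)
loss_bad = mean_squared_error(bad_predictions, targets)

print(loss_good, loss_bad)
```

The prediction that is closer to the target receives the lower loss, which is exactly the property that makes the score usable as a feedback signal.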
The fundamental trick in AP is to use this deviation score as a feedback signal to adjust the value of each weight a little, in the direction that lowers the loss for the current example.
Adjusting the value of each of these weights, or network parameters, is the job of the Optimizer, which implements what is called the Backpropagation Algorithm. This algorithm is one of the central algorithms of AP, if not the most central.
Initially, the weights of the RN are assigned random values, so the RN merely implements a series of random transformations. Naturally, its outputs at this starting point are far from what they should ideally be, and the loss value is accordingly very high. But with each example the RN processes, each of the weights is adjusted a little in the correct direction, that is, the direction in which the loss value decreases.
This is the so-called Training Loop, which, repeated enough times (typically dozens of iterations over thousands or millions of examples), produces weight values that minimize the loss function. An RN with minimal loss is one whose outputs or predictions are as close as possible to the true targets: we then have a Trained Network.
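The loop just described can be sketched for the smallest possible "network": a single weight, with predictions computed as weight times input. Everything specific here is an assumption for illustration: the examples follow the made-up rule target = 3 * x, the loss is the squared error, and the learning rate and number of passes are arbitrary. With only one weight, the gradient can be written by hand. In a real multi-layer RN, backpropagation is what computes the corresponding gradients for every weight.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical input-target examples following the rule target = 3 * x.
examples = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0), (4.0, 12.0)]

weight = random.uniform(-1.0, 1.0)  # random starting value
learning_rate = 0.01
epoch_losses = []

for epoch in range(50):  # repeated passes over the examples
    total_loss = 0.0
    for x, target in examples:
        prediction = weight * x
        error = prediction - target
        total_loss += error ** 2  # loss for this example

        # Gradient of the squared error with respect to the weight.
        gradient = 2 * error * x

        # Nudge the weight a little in the direction that lowers the loss.
        weight -= learning_rate * gradient
    epoch_losses.append(total_loss)

print(weight, epoch_losses[0], epoch_losses[-1])
```

The loss recorded for the first pass is large because the starting weight is random. By the last pass it has shrunk to almost nothing and the weight has settled very close to 3, the value that generated the examples.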
A word of caution about the hype surrounding the field: talk of human-level General Artificial Intelligence, in particular, should not be taken too seriously. The risk with high short-term expectations is that, when the technology fails to deliver, investment in research dries up, slowing progress for a long time.
This has happened before. Twice in the past, AI has gone through a cycle of intense optimism followed by disappointment and skepticism, resulting in under-investment. It first happened with symbolic AI in the 1960s. In those early years, projections about AI ran high, and some pioneers in the field predicted that within 10 years the creation of general artificial intelligence would be a solved problem. Yet even today, in 2019, that milestone seems far from being reached, so far that we cannot yet predict when it will happen. A few years later, as these high expectations failed to materialize, researchers and government funding moved away from the field of AI, marking the beginning of what was called the first AI winter.
And it would not be the last. In the 1980s, a new take on symbolic AI, this time in the form of expert systems, began to gain traction among large companies. A few initial success stories fueled a new wave of investment, with corporations across the world starting their own internal AI departments to develop these expert systems. Around 1985, companies were spending close to $1 billion per year on this technology. However, by the early 1990s, these systems had proved difficult to maintain, difficult to scale, and limited in scope, and interest in expert systems slowly died down. That is how the second AI winter began.
We could currently be in the third cycle of hype and disappointment in the field of AI, although for now we are still in the phase of intense optimism. It is better to moderate our short-term expectations and to make people more familiar with the technical side of the field, so that they have a clear idea of what AP can and cannot deliver.