As a subset of artificial intelligence, deep learning lies at the heart of various innovations: self-driving cars, natural language processing, image recognition and so on. Companies that deliver DL solutions (such as Amazon, Tesla, Salesforce) are at the forefront of stock markets and attract impressive investments. According to Statista, the total funding of artificial intelligence startup companies worldwide in 2014–2019 is equal to more than $26 billion. This high interest can be explained by the amazing benefits of deep learning and its architectures — artificial neural networks.
What is deep learning?
Deep learning is one of the subsets of machine learning that uses deep learning algorithms to implicitly come up with important conclusions based on input data.
Usually, deep learning is unsupervised or semi-supervised. Deep learning is based on representation learning. Instead of using task-specific algorithms, it learns from representative examples. For example, if you want to build a model that recognizes cats by species, you need to prepare a database that includes a lot of different cat images.
The main architectures of deep learning are:
- Convolutional neural networks
- Recurrent neural networks
- Generative adversarial networks
- Recursive neural networks
We are going to talk about them more in detail later in this text.
Difference between machine learning and deep learning
Machine learning attempts to extract new knowledge from a large set of pre-processed data loaded into the system. Programmers need to formulate the rules for the machine, and it learns based on them. Sometimes, a human might intervene to correct its errors.
However, deep learning is a bit different:
Deep learning | Machine learning |
---|---|
large amounts of data | small datasets as long as they are high-quality |
computation-heavy | not always |
an draw accurate conclusions from raw data | carefully pre-processed data |
take much longer to train | can be trained in a reduced amount of time |
you can't know what are the particular features that the neurons represent | logic behind the machine’s decision is clear |
can be used in unexpected ways | algorithm is built to solve a specific problem |
Advantages of deep learning
Now that you know what the difference between DL and ML is, let us look at some advantages of deep learning.
- In 2015, a group of Google engineers was conducting research about how NN carry out classification tasks. By chance, they also noticed that neural networks can hallucinate and produce rather interesting art.
- The ability to identify patterns and anomalies in large volumes of raw data enables deep learning to efficiently deliver accurate and reliable analysis results to professionals. For example, Amazon has more than 560 million items on the website and 300+ million users. No human accountant or even a whole army of accountants would be able to track that many transactions without an AI tool.
- Deep learning doesn’t rely on human expertise as much as traditional machine learning. DL allows us to make discoveries in data even when the developers are not sure what they are trying to find. For example, you want your algorithms to be able to predict customer retention, but you’re not sure which characteristics of a customer will enable the system to make this prediction.
Problems of deep learning
- Large amounts of quality data are resource-consuming to collect. For many years, the largest and best-prepared collection of samples was ImageNet with 14 million different images and more than 20,000 categories. It was founded in 2012, and only last year, Tencent released a database that is larger and more versatile.
- Another difficulty with deep learning technology is that it cannot provide reasons for its conclusions. Therefore, it is difficult to assess the performance of the model if you are not aware of what the output is supposed to be. Unlike in traditional machine learning, you will not be able to test the algorithm and find out why your system decided that, for example, it is a cat in the picture and not a dog.
- It is very costly to build deep learning algorithms. It is impossible without qualified staff who are trained to work with sophisticated maths. Moreover, deep learning is a resource-intensive technology. It requires powerful GPUs and a lot of memory to train the models. A lot of memory is needed to store input data, weight parameters, and activation functions as an input propagates through the network. Sometimes deep learning algorithms become so power-hungry that researchers prefer to use other algorithms, even sacrificing the accuracy of predictions.
However, in many cases, deep learning cannot be substituted.
How can you apply DL and NN to real-life problems?
Today, deep learning is applied across different industries for various use cases:
- Speech recognition. All major commercial speech recognition systems (like Microsoft Cortana, Alexa, Google Assistant, Apple Siri) are based on deep learning.
- Pattern recognition. Pattern recognition systems are already able to give more accurate results than the human eye in medical diagnosis.
- Natural language processing. Neural networks have been used to implement language models since the early 2000s. The invention of LSTM helped improve machine translation and language modeling.
- Discovery of new drugs. For example, the AtomNet neural network has been used to predict new biomolecules that can potentially cure diseases such as Ebola and multiple sclerosis. If you work in healthcare and consider introducing cutting-edge technologies in your practice, check out our biotech software development services.
- Recommender systems. Today, deep learning is being used to study user preferences across many domains. Netflix is one of the brightest examples in this field.
What are artificial neural networks?
“Artificial neural networks” and “deep learning” are often used interchangeably, which isn’t really correct. Not all neural networks are “deep”, meaning “with many hidden layers”, and not all deep learning architectures are neural networks. There are also deep belief networks, for example.
However, since neural networks are the most hyped algorithms right now and are, in fact, very useful for solving complex tasks, we are going to talk about them in this post.
Definition of an ANN
An artificial neural network represents the structure of a human brain modeled on the computer. It consists of neurons and synapses organized into layers.
ANN can have millions of neurons connected into one system, which makes it extremely successful at analyzing and even memorizing various information.
Here is a video for those who want to dive deeper into the technical details of how artificial neural networks work.
Components of Neural Networks
There are different types of neural networks but they always consist of the same components: neurons, synapses, weights, biases, and functions.
Neurons
A neuron or a node is a basic unit of neural networks that receives information, performs simple calculations, and passes it further.
All neurons in a net are divided into three groups:
- Input neurons that receive information from the outside world;
- Hidden neurons that process that information;
- Output neurons that produce a conclusion.
In a large neural network with many neurons and connections between them, neurons are organized in layers. There is an input layer that receives information, a number of hidden layers, and the output layer that provides valuable results. Every neuron performs transformation on the input information.
Neurons only operate numbers in the range [0,1] or [-1,1]. In order to turn data into something that a neuron can work with, we need normalization. We talked about what it is in the post about regression analysis.
Wait, but how do neurons communicate? Through synapses.
Synapses and weights
A synapse is what connects the neurons like an electricity cable. Every synapse has a weight. The weights also add to the changes in the input information. The results of the neuron with the greater weight will be dominant in the next neuron, while information from less ‘weighty’ neurons will not be passed over. One can say that the matrix of weights governs the whole neural system.
How do you know which neuron has the biggest weight? During the initialization (first launch of the NN), the weights are randomly assigned but then you will have to optimize them.
Bias
A bias neuron allows for more variations of weights to be stored. Biases add richer representation of the input space to the model’s weights.
In the case of neural networks, a bias neuron is added to every layer. It plays a vital role by making it possible to move the activation function to the left or right on the graph.
It is true that ANNs can work without bias neurons. However, they are almost always added and counted as an indispensable part of the overall model.
How ANNs work
Every neuron processes input data to extract a feature. Let’s imagine that we have three features and three neurons, each of which is connected with all these features.
Each of the neurons has its own weights that are used to weight the features. During the training of the network, you need to select such weights for each of the neurons that the output provided by the whole network would be true-to-life.
To perform transformations and get an output, every neuron has an activation function. This combination of functions performs a transformation that is described by a common function F — this describes the formula behind the NN’s magic.
There are a lot of activation functions. The most common ones are linear, sigmoid, and hyperbolic tangent. Their main difference is the range of values they work with.
How do you train an algorithm?
Neural networks are trained like any other algorithm. You want to get some results and provide information to the network to learn from. For example, we want our neural network to distinguish between photos of cats and dogs and provide plenty of examples.
Delta is the difference between the data and the output of the neural network. We use calculus magic and repeatedly optimize the weights of the network until the delta is zero. Once the delta is zero or close to it, our model is correctly able to predict our example data.
Iteration
This is a kind of counter that increases every time the neural network goes through one training set. In other words, this is the total number of training sets completed by the neural network.
Epoch
The epoch increases each time we go through the entire set of training sets. The more epochs there are, the better is the training of the model.
Batch
Batch size is equal to the number of training examples in one forward/backward pass. The higher the batch size, the more memory space you’ll need.
What is the difference between an iteration and an epoch?
- one epoch is one forward pass and one backward pass of all the training examples;
- number of iterations is a number of passes, each pass using [batch size] number of examples. To be clear, one pass equals one forward pass + one backward pass (we do not count the forward pass and backward pass as two different passes).
And what about errors?
Error is a deviation that reflects the discrepancy between expected and received output. The error should become smaller after every epoch. If this does not happen, then you are doing something wrong.
The error can be calculated in different ways, but we will consider only two main ways: Arctan and Mean Squared Error.
There is no restriction on which one to use and you are free to choose whichever method gives you the best results. But each method counts errors in different ways:
- With Arctan, the error will almost always be larger.
- MSE is more balanced and is used more often.
What kinds of neural networks exist?
There are so many different neural networks out there that it is simply impossible to mention them all. If you want to learn more about this variety, visit the neural network zoo where you can see them all represented graphically.
Feed-forward neural networks
This is the simplest neural network algorithm. A feed-forward network doesn’t have any memory. That is, there is no going back in a feed-forward network. In many tasks, this approach is not very applicable. For example, when we work with text, the words form a certain sequence, and we want the machine to understand it.
Feedforward neural networks can be applied in supervised learning when the data that you work with is not sequential or time-dependent. You can also use it if you don’t know how the output should be structured but want to build a relatively fast and easy NN.
Recurrent neural networks
A recurrent neural network can process texts, videos, or sets of images and become more precise every time because it remembers the results of the previous iteration and can use that information to make better decisions.
Recurrent neural networks are widely used in natural language processing and speech recognition.
Convolutional neural networks
Convolutional neural networks are the standard of today’s deep machine learning and are used to solve the majority of problems. Convolutional neural networks can be either feed-forward or recurrent.
Let’s see how they work. Imagine we have an image of Albert Einstein. We can assign a neuron to all pixels in the input image.
But there is a big problem here: if you connect each neuron to all pixels, then, firstly, you will get a lot of weights. Hence, it will be a very computationally intensive operation and take a very long time. Then, there will be so many weights that this method will be very unstable to overfitting. It will predict everything well on the training example but work badly on other images.
Therefore, programmers came up with a different architecture where each of the neurons is connected only to a small square in the image. All these neurons will have the same weights, and this design is called image convolution. We can say that we have transformed the picture, walked through it with a filter simplifying the process. Fewer weights, faster to count, less prone to overfitting.
For an awesome explanation of how convolutional neural networks work, watch this video by Luis Serrano.
Generative adversarial neural networks
A generative adversarial network is an unsupervised machine learning algorithm that is a combination of two neural networks, one of which (network G) generates patterns and the other (network A) tries to distinguish genuine samples from the fake ones. Since networks have opposite goals – to create samples and reject samples – they start an antagonistic game that turns out to be quite effective.
GANs are used, for example, to generate photographs that are perceived by the human eye as natural images or deepfakes (videos where real people say and do things they have never done in real life).
What kind of problems do NNs solve?
Neural networks are used to solve complex problems that require analytical calculations similar to those of the human brain. The most common uses for neural networks are:
- Classification. NNs label the data into classes by implicitly analyzing its parameters. For example, a neural network can analyse the parameters of a bank client such as age, solvency, credit history and decide whether to loan them money.
- Prediction. The algorithm has the ability to make predictions. For example, it can foresee the rise or fall of a stock based on the situation in the stock market.
- Recognition. This is currently the widest application of neural networks. For example, a security system can use face recognition to only let authorized people into the building.
Summary
Deep learning and neural networks are useful technologies that expand human intelligence and skills. Neural networks are just one type of deep learning architecture. However, they have become widely known because NNs can effectively solve a huge variety of tasks and cope with them better than other algorithms.
If you want to learn more about applications of machine learning in real life and business, continue reading our blog:
- In our post about best ML applications, you can discover the most stunning use cases of machine learning algorithms.
- Read this Medium post if you want to learn more about GPT-3 and creative computers.
- If you want to know how to choose ML techniques for your project, you will find the answer in our blog post.