A Guide to Deep Learning and Neural Networks

Yulia Gavrilova
Article by Yulia Gavrilova
Thursday, October 8th, 2020

As a subset of artificial intelligence, deep learning lies at the heart of various innovations: self-driving cars, natural language processing, image recognition and so on. Companies that deliver DL solutions (such as Amazon, Tesla, Salesforce) are at the forefront of stock markets and attract impressive investments. According to Statista, the total funding of artificial intelligence startup companies worldwide in 2014–2019 is equal to more than $26 billion. This high interest can be explained by the amazing benefits of deep learning and its architectures — artificial neural networks.

AI startup funding graph

What is deep learning?

what is deep learning

Deep learning is one of the subsets of machine learning that uses deep learning algorithms to implicitly come up with important conclusions based on input data.

Usually, deep learning is unsupervised or semi-supervised. Deep learning is based on representation learning. Instead of using task-specific algorithms, it learns from representative examples. For example, if you want to build a model that recognizes cats by species, you need to prepare a database that includes a lot of different cat images.

The main architectures of deep learning are:

  • Convolutional neural networks
  • Recurrent neural networks
  • Generative adversarial networks
  • Recursive neural networks

We are going to talk about them more in detail later in this text.

Difference between machine learning and deep learning

Machine learning attempts to extract new knowledge from a large set of pre-processed data loaded into the system. Programmers need to formulate the rules for the machine, and it learns based on them. Sometimes, a human might intervene to correct its errors.

However, deep learning is a bit different:

Deep learning Machine learning
large amounts of data small datasets as long as they are high-quality
computation-heavy not always
an draw accurate conclusions from raw data carefully pre-processed data
take much longer to train can be trained in a reduced amount of time
you can't know what are the particular features that the neurons represent logic behind the machine’s decision is clear
can be used in unexpected ways algorithm is built to solve a specific problem

Advantages of deep learning

Now that you know what the difference between DL and ML is, let us look at some advantages of deep learning.

  • In 2015, a group of Google engineers was conducting research about how NN carry out classification tasks. By chance, they also noticed that neural networks can hallucinate and produce rather interesting art.
  • The ability to identify patterns and anomalies in large volumes of raw data enables deep learning to efficiently deliver accurate and reliable analysis results to professionals. For example, Amazon has more than 560 million items on the website and 300+ million users. No human accountant or even a whole army of accountants would be able to track that many transactions without an AI tool.
  • Deep learning doesn’t rely on human expertise as much as traditional machine learning. DL allows us to make discoveries in data even when the developers are not sure what they are trying to find. For example, you want your algorithms to be able to predict customer retention, but you’re not sure which characteristics of a customer will enable the system to make this prediction.

Deep learning advantages

Problems of deep learning

  • Large amounts of quality data are resource-consuming to collect. For many years, the largest and best-prepared collection of samples was ImageNet with 14 million different images and more than 20,000 categories. It was founded in 2012, and only last year, Tencent released a database that is larger and more versatile.
  • Another difficulty with deep learning technology is that it cannot provide reasons for its conclusions. Therefore, it is difficult to assess the performance of the model if you are not aware of what the output is supposed to be. Unlike in traditional machine learning, you will not be able to test the algorithm and find out why your system decided that, for example, it is a cat in the picture and not a dog.
  • It is very costly to build deep learning algorithms. It is impossible without qualified staff who are trained to work with sophisticated maths. Moreover, deep learning is a resource-intensive technology. It requires powerful GPUs and a lot of memory to train the models. A lot of memory is needed to store input data, weight parameters, and activation functions as an input propagates through the network. Sometimes deep learning algorithms become so power-hungry that researchers prefer to use other algorithms, even sacrificing the accuracy of predictions.

However, in many cases, deep learning cannot be substituted.

How can you apply DL to real-life problems?

Deep learning applications

Today, deep learning is applied across different industries for various use cases:

  • Speech recognition. All major commercial speech recognition systems (like Microsoft Cortana, Alexa, Google Assistant, Apple Siri) are based on deep learning.
  • Pattern recognition. Pattern recognition systems are already able to give more accurate results than the human eye in medical diagnosis.
  • Natural language processing. Neural networks have been used to implement language models since the early 2000s. The invention of LSTM helped improve machine translation and language modeling.
  • Discovery of new drugs. For example, the AtomNet neural network has been used to predict new biomolecules that can potentially cure diseases such as Ebola and multiple sclerosis.
  • Recommender systems. Today, deep learning is being used to study user preferences across many domains. Netflix is one of the brightest examples in this field.

What are artificial neural networks?

What are artificial neural networks

“Artificial neural networks” and “deep learning” are often used interchangeably, which isn’t really correct. Not all neural networks are “deep”, meaning “with many hidden layers”, and not all deep learning architectures are neural networks. There are also deep belief networks, for example.

neural networks

However, since neural networks are the most hyped algorithms right now and are, in fact, very useful for solving complex tasks, we are going to talk about them in this post.

Definition of an ANN

An artificial neural network is heavily inspired by the structure of a human brain. Simply put, an ANN represents a sequence of neurons connected by synapses. Those sequences are often organized into layers.

Having many (sometimes millions) of input neurons in the system, the machine learns to analyze and even memorize various information. Due to this structure, a neural network can process monstrous amounts of information very fast.

Here is a video for those who want to dive deeper into the technical details of how artificial neural networks work.

Artificial neural networs are incredibly valuable not only to analyze incoming information but also to reproduce it from their memory.

Components of Neural Networks

Every neural network consists of neurons, synapses, weights, biases, and functions.


A neuron or a node of a neural network is a computing unit that receives information, performs simple calculations with it, and passes it further.

All neurons in a net are divided into three groups:

  • Input neurons that receive information from the outside world;
  • Hidden neurons that process that information;
  • Output neurons that produce a conclusion.

ML architecture

In a large neural network with many neurons and connections between them, neurons are organized in layers. An input layer receives information, n hidden layers (at least 3+) process it, and an output layer provides some result.

Each of the neurons inputs and outputs some data. If this is the first layer, input = output. In other cases, the information that the neurons have received from the previous layer is passed to input. Then, it uses an activation function to get a new output, which is passed to the next layer of neurons in the system.

Neurons only operate numbers in the range [0,1] or [-1,1]. In order to turn data into something that a neuron can work with, we need normalization. We talked about what it is in the post about regression analysis.

Wait, but how do neurons communicate? Through synapses.

Synapses and weights

If we didn’t have synapses, we would be stuck with a bunch of inactive useless neurons. A synapse is a connection between two neurons. Every synapse has a weight. It is the weight that changes the input information while it is transmitted from one neuron to another. The neuron with the greater weight will be dominant in the next neuron. One can say that the matrix of weights is the brain of the whole neural system.

Neuron weights

It is thanks to these weights that the input information is processed and converted into a result. During the initialization (first launch of the NN), the weights are randomly assigned. Later on, they are optimized.


A bias neuron allows for more variations of weights to be stored. Biases add richer representation of the input space to the model’s weights.

In the case of neural networks, a bias neuron is added to every layer. It plays a vital role by making it possible to move the activation function to the left or right on the graph.

bias neurons

It is true that ANNs can work without bias neurons. However, they are almost always added and counted as an indispensable part of the overall model.

How ANNs work

Every neuron processes input data to extract a feature. Let’s imagine that we have features x1, x2, x3, and three neurons, each of which is connected with all these features.

Each of the neurons has its own weights that are used to weight the features. During the training of the network, you need to select such weights for each of the neurons that the output provided by the whole network would be true-to-life.

To perform transformations and get an output, every neuron has an activation function. It allows us to get some new feature space. This combination of functions performs a transformation that is described by a common function F — this describes the formula behind the NN’s magic.

ANN: activation function

There are a lot of activation functions, we will consider the most common ones: linear, sigmoid and hyperbolic tangent. Their main difference is the range of values they work with.

Linear function

Linear function

This function is almost never used, except when you need to test a neural network or transfer a value without transformations.


Sigmoid function

This is the most common activation function, its value range is [0,1]. However, you cannot use it if there are negative values ​​in your research (for example, stocks can go not only up, but also down), then you will need a function that works with negative values.

Hyperbolic tangent

Hyperbolic tangent

It makes sense to use the hyperbolic tangent only when your values ​​can be both negative and positive since the range of the function is [-1,1]. It is not advisable to use this function only with positive values, as it will significantly decrease the accuracy of the results of your neural network.

How do you train an algorithm?

How are neural networks trained? Basically, like any other learning algorithm. We have some output that we want to obtain, for example, a class label. There is some reference output, which we know that these features should have, for example, we want our neural network to distinguish between photos of cats and dogs and we know what those should look like.

Delta is the difference between the data and the output of the neural network. We use calculus magic and repeatedly optimize the weights of the network until the delta is zero. Once the delta is zero, our model is correctly able to predict our example data, obviously.


This is a kind of counter that increases every time the neural network goes through one training set. In other words, this is the total number of training sets completed by the neural network.


When initializing the neural network, this value is set to 0. The larger the epoch, the better the network is trained and, accordingly, its result. The epoch increases each time we go through the entire set of training sets.


Batch size is equal to the number of training examples in one forward/backward pass. The higher the batch size, the more memory space you’ll need.

What is the difference between an iteration and an epoch?

Batch: iteration/epoch

  • one epoch is one forward pass and one backward pass of all the training examples;
  • number of iterations is a number of passes, each pass using [batch size] number of examples. To be clear, one pass equals one forward pass + one backward pass (we do not count the forward pass and backward pass as two different passes).

And what about errors?

Error is a deviation that reflects the discrepancy between expected and received output. The error should become smaller after every epoch. If this does not happen, then you are doing something wrong.

The error can be calculated in different ways, but we will consider only three main ways: Mean Squared Error, Root MSE, and Arctan.

There is no restriction on which one to use and you are free to choose whichever method gives you the best results. But each method counts errors in different ways:

  • In Arctan, the error will almost always be larger, since it works according to the principle: the larger the difference, the larger the error.
  • In Root MSE, the error is the smallest and it is more balanced in the error calculation. MSE is used most often.

What kinds of neural networks exist?

There are so many different neural networks out there that it is simply impossible to mention them all. If you want to learn more about this variety, visit the neural network zoo where you can see them all represented graphically.

Examples of neural networks

Feed-forward neural networks

This is the simplest neural network algorithm. A feed-forward network doesn’t have any memory. That is, there is no going back in a feed-forward network. In many tasks, this approach is not very applicable. For example, when we work with text, the words form a certain sequence, and we want the machine to understand it.

Feedforward neural networks can be applied in supervised learning when the data that you work with is not sequential or time-dependent. You can also use it if you don’t know how the output should be structured but want to build a relatively fast and easy NN.

Recurrent neural networks

A recurrent neural network can process texts, videos, or sets of images and become more precise every time because it remembers the results of the previous iteration and can use that information to make better decisions.

Recurrent neural networks are widely used in natural language processing and speech recognition.

Convolutional neural networks

Convolutional neural networks are the standard of today’s deep machine learning and are used to solve the majority of problems. Convolutional neural networks can be either feed-forward or recurrent.

Let’s see how they work. Imagine we have an image of Albert Einstein. We can assign a neuron to all pixels in the input image.

But there is a big problem here: if you connect each neuron to all pixels, then, firstly, you will get a lot of weights. Hence, it will be a very computationally intensive operation and take a very long time. Then, there will be so many weights that this method will be very unstable to overfitting. It will predict everything well on the training example but work badly on other images.

image convolution process

Therefore, programmers came up with a different architecture where each of the neurons is connected only to a small square in the image. All these neurons will have the same weights, and this design is called image convolution. We can say that we have transformed the picture, walked through it with a filter simplifying the process. Fewer weights, faster to count, less prone to overfitting.

convolutional neural networks

For an awesome explanation of how convolutional neural networks work, watch this video by Luis Serrano.

Generative adversarial neural networks

A generative adversarial network is an unsupervised machine learning algorithm that is a combination of two neural networks, one of which (network G) generates patterns and the other (network A) tries to distinguish genuine samples from the fake ones. Since networks have opposite goals – to create samples and reject samples – they start an antagonistic game that turns out to be quite effective.

elon musk faked into nicolas cage

GANs are used, for example, to generate photographs that are perceived by the human eye as natural images or deepfakes (videos where real people say and do things they have never done in real life).

What kind of problems do NNs solve?

Neural networks are used to solve complex problems that require analytical calculations similar to those of the human brain. The most common uses for neural networks are:

  • Classification. NNs label the data into classes by implicitly analyzing its parameters. For example, a neural network can analyse the parameters of a bank client such as age, solvency, credit history and decide whether to loan them money.
  • Prediction. The algorithm has the ability to make predictions. For example, it can foresee the rise or fall of a stock based on the situation in the stock market.
  • Recognition. This is currently the widest application of neural networks. For example, a security system can use face recognition to only let authorized people into the building.


Deep learning and neural networks are useful technologies that expand human intelligence and skills. Neural networks are just one type of deep learning architecture. However, they have become widely known because NNs can effectively solve a huge variety of tasks and cope with them better than other algorithms.

If you want to learn more about applications of machine learning in real life and business, continue reading our blog:

  • In our post about best ML applications, you can discover the most stunning use cases of machine learning algorithms.
  • Read this Medium post if you want to learn more about GPT-3 and creative computers.
  • If you want to know how to choose ML techniques for your project, you will find the answer in our blog post.
A Guide to Deep Learning and Neural Networks
More from Serokell
Classification Algorithms: A Tomato-Inspired OverviewClassification Algorithms: A Tomato-Inspired Overview
27 best sources to study machine learning27 best sources to study machine learning
Dimensions and Haskell: IntroductionDimensions and Haskell: Introduction