Introduction to Deep Learning Algorithms
Deep learning algorithms are used to build models made up of several layers of neurons in a neural network. Each layer passes its representation of the data on to the next layer. Most unstructured datasets, such as image data, may have millions of features, and this huge number of features makes traditional machine learning algorithms infeasible to use. A deep learning algorithm, by contrast, learns progressively more about such data as it moves through each layer of the neural network.
Deep Learning Algorithms
To create a deep learning model, one must write several algorithms, blend them together, and create a net of neurons. Deep learning has a high computational cost. To support deep learning models, there are deep learning frameworks such as TensorFlow, PyTorch, Chainer, and Keras. In deep learning, we try to replicate the human neural network with an artificial neural network; the artificial counterpart of the human neuron is called a perceptron.
We connect these perceptron units together to create a neural network; it has 3 sections:
- Input layer
- Hidden layers
- Output layer
A perceptron has input nodes (dendrites in the human brain), an activation function that makes a small decision, and output nodes (the axon in the human brain). We will see how one perceptron works; connecting many of them together creates a deep learning model. Each input (one per input variable or feature) is assigned a weight, and the weighted inputs are fed to the activation function. The activation function makes a decision and sends an output, which becomes the input to other neurons. Once a batch is processed, the error is calculated with a cost function such as cross-entropy and backpropagated to each neuron. The input weights are then adjusted, and the whole process repeats until the cost satisfies the stopping condition.
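The forward pass of a single perceptron can be sketched in a few lines. This is a minimal illustration, not a full training loop: the sigmoid is just one possible activation function, and the `bias` term and all numeric values are assumptions added for the example.

```python
import math

def sigmoid(z):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def perceptron(inputs, weights, bias):
    """Weighted sum of the inputs followed by an activation function."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)

# Hypothetical example values: three input features and their weights.
features = [0.5, -1.0, 2.0]
weights = [0.4, 0.3, 0.1]
output = perceptron(features, weights, bias=0.0)
print(output)
```

The value in `output` is what would be passed on as an input to the neurons of the next layer.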
There are different activation functions for making this small decision, such as the sigmoid function, the hyperbolic tangent function, and the Rectified Linear Unit (ReLU). A deep learning model needs a vast amount of data to train well. Generally, a network with more than 3 hidden layers is treated as a deep neural network. Basically, a deep learning model is a set of neurons with a number of parameters defined for each layer. Popular architectures for building deep learning models include RNN, CNN, etc.
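The three functions named above are short enough to write out directly. The sketch below uses only the Python standard library; the sample inputs are arbitrary values chosen for illustration.

```python
import math

def sigmoid(z):
    """Maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    """Maps any real number into (-1, 1)."""
    return math.tanh(z)

def relu(z):
    """Passes positive values through and clips negative values to zero."""
    return max(0.0, z)

# Compare the three functions on a few arbitrary sample inputs.
for z in (-2.0, 0.0, 2.0):
    print(z, sigmoid(z), tanh(z), relu(z))
```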
Architectural Methods for Deep Learning Algorithms
To build these architectures, the following algorithms are used:
1. Back Propagation
In this algorithm, we calculate partial derivatives. In the gradient descent method for optimization, derivatives (gradients) are calculated at each iteration. In deep learning, functions are not simple; they are compositions of many functions, which makes the gradients hard to calculate directly. Approximating each derivative numerically becomes more expensive as the number of parameters grows, so backpropagation instead applies the chain rule, working backward from the output layer and reusing intermediate results to obtain the derivatives for all parameters.
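To make the idea concrete, here is a minimal sketch for a single neuron with one weight. It computes the gradient of a composed function with the chain rule and compares it with a numerical approximation. The squared-error loss, the sigmoid activation, and the values of `w`, `x`, and `y` are assumptions chosen for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, x, y):
    """Squared error of a one-weight neuron: (sigmoid(w * x) - y) ** 2."""
    return (sigmoid(w * x) - y) ** 2

def grad_chain_rule(w, x, y):
    """Work backward: loss -> activation -> weighted input -> weight."""
    a = sigmoid(w * x)            # forward pass
    dloss_da = 2.0 * (a - y)      # derivative of the squared error
    da_dz = a * (1.0 - a)         # derivative of the sigmoid
    dz_dw = x                     # derivative of w * x with respect to w
    return dloss_da * da_dz * dz_dw

def grad_numeric(w, x, y, eps=1e-6):
    """Central finite-difference approximation, used here only as a check."""
    return (loss(w + eps, x, y) - loss(w - eps, x, y)) / (2.0 * eps)

w, x, y = 0.5, 2.0, 1.0
print(grad_chain_rule(w, x, y), grad_numeric(w, x, y))
```

The numerical check needs two loss evaluations per parameter, whereas the chain-rule version reuses the values from a single forward pass.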
2. Stochastic Gradient Descent
In gradient descent, the goal is to find the global minimum, or optimum solution. Stochastic gradient descent estimates the gradient at each step from a randomly chosen sample or mini-batch rather than from the full dataset. Along the way, however, we also have to deal with local minima, which are not desirable. If the objective function is convex, it is easy to find the global minimum. The initial value and the learning rate are the deciding parameters for finding the global minimum. This can be understood by picturing a river that starts at a mountain top and searches for the foothill (the global minimum). On the way, there will be some ups and downs (local minima) in which the river must avoid getting stuck. The river's originating point and speed (the initial value and the learning rate in our case) are the deciding factors in finding the global minimum.
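As a small illustration, the sketch below runs stochastic gradient descent on a convex problem: finding the value `w` that minimizes the average of `(w - x) ** 2` over a toy dataset, which is the mean of the data. The function name `sgd_mean`, the dataset, the starting point, and the learning rate are all assumptions made for the example.

```python
import random

def sgd_mean(data, start, learning_rate, steps, seed=0):
    """Minimize the average of (w - x) ** 2 using one random sample per step."""
    rng = random.Random(seed)             # fixed seed for repeatable runs
    w = start                             # the initial value (river origin)
    for _ in range(steps):
        x = rng.choice(data)              # stochastic: a single random sample
        gradient = 2.0 * (w - x)          # derivative of (w - x) ** 2
        w -= learning_rate * gradient     # step against the gradient
    return w

data = [1.0, 2.0, 3.0, 4.0, 5.0]
estimate = sgd_mean(data, start=10.0, learning_rate=0.01, steps=2000)
print(estimate)
```

Because each step sees only one sample, the estimate keeps fluctuating around the true minimum (3.0 for this data) instead of settling on it exactly; a smaller learning rate reduces that fluctuation at the cost of slower progress.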
3. Learning Rate
The learning rate is like the speed of the river; a well-chosen learning rate can reduce training time and improve performance. In general, when learning any technique or sport, the rate of learning is higher at the beginning than at the end, when one is close to mastering it. After the intermediate stage, learning slows down and the focus shifts to fine-tuning. The same applies in deep learning: large changes are made early with a higher learning rate, and the learning rate is then slowly decreased for fine-tuning.
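One simple way to express this idea is an exponential decay schedule, sketched below. This is only one of several possible schedules, and the initial rate of 0.1 and decay factor of 0.5 are hypothetical values chosen for illustration.

```python
def decayed_learning_rate(initial_rate, decay, epoch):
    """Exponential decay: the rate shrinks by a constant factor each epoch."""
    return initial_rate * (decay ** epoch)

# Hypothetical settings: start at 0.1 and halve the rate every epoch.
schedule = [decayed_learning_rate(0.1, 0.5, epoch) for epoch in range(4)]
print(schedule)
```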
4. Batch Normalization
In deep learning, the initial values of the weights (randomly chosen) and the learning rate are defined for a mini-batch. In the beginning, there will be many outliers, and during backpropagation these outliers must be compensated for when computing the weights. This compensation results in extra epochs. To avoid it, we use batch normalization, which normalizes the inputs to a layer over each mini-batch.
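The normalization step itself can be sketched for a mini-batch of scalar activations as follows. The scale `gamma`, shift `beta`, and small constant `eps` follow the usual formulation but are fixed here rather than learned, and the batch values (including the outlier) are made up for the example.

```python
import math

def batch_normalize(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Shift and scale a mini-batch to roughly zero mean and unit variance."""
    mean = sum(batch) / len(batch)
    variance = sum((x - mean) ** 2 for x in batch) / len(batch)
    scale = math.sqrt(variance + eps)     # eps guards against division by zero
    return [gamma * (x - mean) / scale + beta for x in batch]

# A hypothetical mini-batch with one large outlier.
activations = [1.0, 2.0, 3.0, 100.0]
normalized = batch_normalize(activations)
print(normalized)
```

After normalization the outlier is still the largest value, but all activations sit on a comparable scale, so the weights no longer have to compensate for it.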
5. Drop Out
In deep learning, we often encounter the problem of overfitting. Overfitting in large networks with many parameters makes it difficult to predict well on test data. To avoid that, we use the dropout method, which drops random units during training, creating different ‘thinned networks’. At test time, the predictions of these thinned networks are averaged, which helps to avoid overfitting.
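A minimal sketch of dropout on one layer's outputs is shown below. It uses the "inverted dropout" variant, which is an assumption on our part: kept units are scaled up during training so that the full network can be used unchanged at test time as an approximation of averaging the thinned networks. The drop probability and activation values are illustrative.

```python
import random

def dropout(activations, drop_probability, training, rng):
    """Randomly zero units during training; pass values through at test time."""
    if not training:
        return list(activations)
    keep = 1.0 - drop_probability
    # Kept units are divided by `keep` so the expected output is unchanged.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)                    # fixed seed for repeatable runs
layer_output = [1.0, 2.0, 3.0, 4.0]
train_out = dropout(layer_output, 0.5, training=True, rng=rng)
test_out = dropout(layer_output, 0.5, training=False, rng=rng)
print(train_out, test_out)
```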
6. Bag of Words
We use a continuous bag of words (CBOW) to predict a word from its surrounding words. For example, the autosuggestion that completes a sentence while writing an email is part of NLP. This is done by taking lots of sentences and, for each specific word, capturing the words that surround it. These specific words and their surrounding words are fed to the neural network. After training, the model can predict the specific word based on the surrounding words.
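The data preparation step described here, pairing each word with its surrounding words, can be sketched as follows. The function name `cbow_pairs`, the `window` parameter, and the sample sentence are assumptions made for the example; training the network on these pairs is not shown.

```python
def cbow_pairs(tokens, window=2):
    """Build (context words, target word) training pairs from one sentence."""
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]       # words before the target
        right = tokens[i + 1:i + 1 + window]      # words after the target
        pairs.append((left + right, target))
    return pairs

sentence = "the cat sat on the mat".split()
pairs = cbow_pairs(sentence, window=1)
for context, target in pairs:
    print(context, "->", target)
```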
7. Long Short Term Memory
LSTM is very useful in sequence prediction problems such as language translation, sales forecasting, and stock price prediction. LSTM has the edge over other techniques because it is able to take previous data into account. LSTM makes modifications through its cell state mechanism, learning what to remember and what to forget. Three main aspects make LSTM stand out from other deep learning techniques: first, when the neuron should accept input; second, when to remember previous data and what to forget; and third, when to pass output.
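These three decisions correspond to the input, forget, and output gates of an LSTM cell. The sketch below runs one scalar LSTM cell over a short sequence using the standard gate equations. The weights in `params` are hypothetical fixed numbers rather than trained values, and real implementations work on vectors and matrices instead of scalars.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One time step of a scalar LSTM cell."""
    def gate(name, squash):
        w, u, b = params[name]
        return squash(w * x + u * h_prev + b)

    i = gate("input", sigmoid)          # when to accept new input
    f = gate("forget", sigmoid)         # how much previous state to keep
    o = gate("output", sigmoid)         # when to pass output
    g = gate("candidate", math.tanh)    # proposed new content
    c = f * c_prev + i * g              # updated cell state
    h = o * math.tanh(c)                # updated hidden state (the output)
    return h, c

# Hypothetical (input weight, recurrent weight, bias) for each gate.
params = {
    "input": (0.5, 0.1, 0.0),
    "forget": (0.4, 0.2, 1.0),
    "output": (0.3, 0.3, 0.0),
    "candidate": (0.6, 0.2, 0.0),
}

h, c = 0.0, 0.0
for x in [1.0, 0.5, -0.5]:
    h, c = lstm_step(x, h, c, params)
    print(h, c)
```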
A deep learning model is a step towards the replication of the human mind. Instead of biological neurons, deep learning uses an artificial neural network. Deep learning has a high computational cost, which deep learning frameworks such as TensorFlow and PyTorch help to manage. RNN and CNN are architectural methods for deep learning models. The different deep learning algorithms used in these architectures are discussed in this article.