Deep Learning 101: Lesson 8: Backpropagation
This article is part of the “Deep Learning 101” series. Explore the full series for more insights and in-depth learning here.
Backpropagation is a fundamental technique used in training neural networks to learn from data. It involves a two-step process: forward propagation and backward propagation. During forward propagation, inputs are passed through the network and the activations of each neuron are calculated using the specified activation function, such as the sigmoid function. The output of the network is compared to the desired output using a cost or loss function. In the second step, backward propagation, the network adjusts its weights and biases by computing the gradient of the loss function with respect to each weight and bias in the network using partial differentiation. This gradient is then used to update the weights and biases using an optimization algorithm, typically gradient descent. By repeating this process over a training data set, backpropagation allows the network to adjust its parameters and minimize error, effectively learning the underlying patterns and improving its predictive capabilities.
Let’s depict backpropagation in a network diagram, as shown below.
This diagram is instrumental in visualizing the backpropagation algorithm — a fundamental process enabling neural networks to learn from data. It represents a microcosm of the neural network’s operation, showcasing the flow of data from inputs to outputs and the subsequent adjustment of weights and biases through calculated gradients.
The diagram illustrates a neural network with two inputs (x₁ and x₂), each connected to a neuron by weights (w₁ and w₂). These weights are the adjustable parameters that the network will fine-tune during training. The neuron also has a bias (b), depicted as a separate node, which allows the model to fit the data better by providing an additional degree of freedom. As the inputs are fed into the network, they are first combined into a weighted sum (Σ), representing the neuron’s net input. This sum is then processed by a sigmoid activation function, which converts the linear input into a non-linear output (y). The sigmoid function is crucial for enabling the network to capture and model complex, non-linear relationships in the data.
The rightmost part of the diagram captures the output of the network (y), compared against the target or expected value (t), with the discrepancy quantified by a mean squared error function (E). This error measure is reflective of the network’s performance; the goal of training is to minimize this error.
Arrows pointing backward from the error (E) represent the backward propagation of this error signal. By taking the partial derivative of the error with respect to the output (∂E/∂y), and then applying the chain rule to relate this back to the weights and bias, the network calculates how to adjust w₁, w₂, and b to reduce E. These adjustments are made in the opposite direction of the gradient, hence the term ‘gradient descent.’
Below are the main steps involved in performing forward propagation and backward propagation.
Forward Propagation
The journey of data through the neural network starts with forward propagation, where inputs are passed to the network and processed sequentially from one layer to the next. Each neuron receives inputs, multiplies them by their respective weights, and then adds a bias term. This calculation is the weighted sum, denoted as Σ, and can be expressed mathematically as:
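Σ = w₁x₁ + w₂x₂ + b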
The weighted sum is then transformed by an activation function, which introduces non-linearity into the model, allowing it to learn and represent more complex patterns. In our example, the sigmoid function is used, defined as:
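σ(Σ) = 1 / (1 + e^(−Σ))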
Applied to the weighted sum, the output of the neuron, denoted as y, is:
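y = σ(Σ) = 1 / (1 + e^(−(w₁x₁ + w₂x₂ + b)))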
This activation output is what the network uses to make predictions or decisions based on the input it received.
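To make this concrete, here is a minimal Python sketch of the forward pass for the two-input neuron in the diagram. The input, weight, and bias values are arbitrary choices for illustration, not values from the article:

```python
import math

def sigmoid(z):
    # Sigmoid activation: maps any real number into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative inputs, weights, and bias (assumed values for this sketch)
x1, x2 = 0.5, 0.8
w1, w2 = 0.4, -0.2
b = 0.1

z = w1 * x1 + w2 * x2 + b   # weighted sum Σ
y = sigmoid(z)              # neuron output after the activation function
print(f"weighted sum = {z:.4f}, output = {y:.4f}")
```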
Error Calculation and Backward Propagation
Once the network has produced an output, it must be evaluated to determine its accuracy. This is done by comparing the output y with the target value t, using a cost function, such as the mean squared error (MSE), to quantify the error of the prediction:
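E = ½ (t − y)²

(The factor of ½ is a common convention that cancels neatly when the error is differentiated.)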
The backpropagation phase starts by computing the gradient of this error with respect to the output of the network. This gradient, ∂E/∂y, indicates the direction and magnitude of the change in the error for a change in the output:
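∂E/∂y = −(t − y) = y − t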
The gradient is then propagated back through the network, which requires computing the derivative of the output y with respect to the weighted sum Σ for the sigmoid function:
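∂y/∂Σ = y (1 − y)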
This derivative reflects how changes in the weighted sum would affect the neuron’s output after the activation function is applied.
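Continuing the Python sketch above, these two local gradients can be computed directly from the forward-pass output; the target value t is an illustrative assumption:

```python
t = 1.0                 # illustrative target value

dE_dy = y - t           # ∂E/∂y, from E = ½(t − y)²
dy_dz = y * (1.0 - y)   # ∂y/∂Σ, the sigmoid derivative expressed via the output y
delta = dE_dy * dy_dz   # combined error signal used for every parameter gradient
```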
Updating the Weights and Bias
The ultimate goal of backpropagation is to use the error gradient to update the weights and bias in such a way that the error is reduced in subsequent iterations. The weights are updated by subtracting the product of the learning rate η and the gradient with respect to each weight:
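w₁ ← w₁ − η · ∂E/∂w₁  and  w₂ ← w₂ − η · ∂E/∂w₂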
Similarly, the bias is updated by:
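b ← b − η · ∂E/∂b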
To compute the gradients for the weights and bias, the chain rule is employed:
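∂E/∂w₁ = ∂E/∂y · ∂y/∂Σ · ∂Σ/∂w₁ (and likewise for w₂ and b)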
The partial derivatives of the weighted sum Σ with respect to the weights and bias are the input values and a constant 1, respectively:
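∂Σ/∂w₁ = x₁,  ∂Σ/∂w₂ = x₂,  ∂Σ/∂b = 1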
By iteratively applying these updates, the neural network adjusts its parameters to minimize the error, thereby improving its performance and accuracy over time. Through backpropagation, neural networks learn to map inputs to the correct outputs, effectively learning from their experiences.
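Putting all of these steps together, here is a self-contained sketch of the single neuron from the diagram trained by repeated forward and backward passes. The training pair, initial parameters, and learning rate are illustrative assumptions:

```python
import math

def sigmoid(z):
    # Sigmoid activation function
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative training pair and hyperparameters (assumed for this sketch)
x1, x2, t = 0.5, 0.8, 1.0    # inputs and target
w1, w2, b = 0.4, -0.2, 0.1   # initial weights and bias
eta = 0.5                    # learning rate η

for step in range(100):
    # Forward propagation
    z = w1 * x1 + w2 * x2 + b      # weighted sum Σ
    y = sigmoid(z)                 # neuron output
    E = 0.5 * (t - y) ** 2         # mean squared error

    # Backward propagation (chain rule)
    dE_dy = y - t                  # ∂E/∂y
    dy_dz = y * (1.0 - y)          # ∂y/∂Σ for the sigmoid
    delta = dE_dy * dy_dz
    dE_dw1 = delta * x1            # since ∂Σ/∂w₁ = x₁
    dE_dw2 = delta * x2            # since ∂Σ/∂w₂ = x₂
    dE_db = delta                  # since ∂Σ/∂b = 1

    # Gradient-descent updates
    w1 -= eta * dE_dw1
    w2 -= eta * dE_dw2
    b -= eta * dE_db

print(f"output after training: {sigmoid(w1 * x1 + w2 * x2 + b):.4f} (target {t})")
```

With these assumed values, the output moves steadily toward the target over the iterations, illustrating the error-minimizing behavior described above.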
Summary
Backpropagation is a crucial mechanism in training neural networks, allowing them to learn from data by adjusting weights and biases to minimize prediction errors. This two-step process of forward and backward propagation ensures that the network iteratively improves its performance. By calculating gradients and updating parameters, backpropagation enables neural networks to capture intricate relationships in data, making them powerful tools for a wide range of applications in artificial intelligence and machine learning.
4 Ways to Learn
1. Read the article: Backpropagation
2. Play with the visual tool: Backpropagation
3. Watch the video: Backpropagation
4. Practice with the code: Backpropagation
Previous Article: Perceptron
Next Article: Multi-layer Neural Network