Backpropagation
Short Definition
Backpropagation is the algorithm that trains neural networks by propagating the prediction error backward through the layers, using the chain rule to compute the gradient of the loss with respect to every weight.
Full Definition
Backpropagation, short for "backward propagation of errors," is the cornerstone algorithm for training artificial neural networks. Introduced in its modern form by Rumelhart, Hinton, and Williams in 1986, it solved the critical problem of efficiently computing gradients in multi-layer networks. The algorithm works in two phases: during forward propagation, input data flows through the network to produce a prediction; during backward propagation, the error between the prediction and the true value is propagated back through the network using the chain rule of calculus. Each weight receives a gradient indicating how much it contributed to the overall error, and these gradients are then used to update the weights, typically via gradient descent or one of its variants. Backpropagation made it practical to train networks with multiple hidden layers, for which no efficient gradient computation had previously been available. Without it, modern deep learning would not exist. The algorithm works with any differentiable loss function and any differentiable activation function, making it extremely versatile. Modern frameworks such as PyTorch and TensorFlow implement automatic differentiation, which generalizes backpropagation to arbitrary computational graphs.
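The two phases described above can be sketched end to end in plain NumPy. This is a minimal illustration rather than a production implementation: the one-hidden-layer architecture, sigmoid activations, squared-error loss, XOR-style toy data, learning rate, and all variable names are assumptions chosen for brevity.

```python
import numpy as np

# All specifics below (architecture, activations, loss, data, learning
# rate) are illustrative assumptions, not prescribed by the definition.
rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy XOR-style dataset.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

W1 = rng.normal(0.0, 1.0, (2, 4))   # input -> hidden weights
W2 = rng.normal(0.0, 1.0, (4, 1))   # hidden -> output weights
lr = 0.5

def forward(W1, W2):
    a1 = sigmoid(X @ W1)            # hidden activations
    a2 = sigmoid(a1 @ W2)           # network prediction
    return a1, a2

initial_loss = 0.5 * np.sum((forward(W1, W2)[1] - y) ** 2)

for _ in range(2000):
    # Phase 1: forward propagation produces a prediction.
    a1, a2 = forward(W1, W2)
    # Phase 2: backward propagation pushes the error back, layer by layer.
    d_z2 = (a2 - y) * a2 * (1 - a2)        # dL/dz2 for squared-error loss
    d_W2 = a1.T @ d_z2                     # dL/dW2
    d_z1 = (d_z2 @ W2.T) * a1 * (1 - a1)   # dL/dz1, reusing d_z2
    d_W1 = X.T @ d_z1                      # dL/dW1
    # Gradient-descent weight update.
    W2 -= lr * d_W2
    W1 -= lr * d_W1

final_loss = 0.5 * np.sum((forward(W1, W2)[1] - y) ** 2)
```

After training, the loss should be lower than at initialization; swapping in a different differentiable loss or activation changes only the local derivative terms, which is the versatility the definition refers to.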
Technical Explanation
Backpropagation applies the chain rule to compute the partial derivative of the loss L with respect to each weight. Writing a_l for the activations and z_l for the pre-activation values of layer l, with N as the index of the final layer, the gradient at layer l is: dL/dw_l = dL/da_N * da_N/dz_N * dz_N/da_{N-1} * ... * da_l/dz_l * dz_l/dw_l. The key efficiency comes from reusing intermediate computations: the gradients at each layer depend on the gradients from the layer above, which avoids redundant calculation. This reduces the cost from exponential in the depth (one term per path through the network) to linear in the number of layers. Weight updates follow: w_new = w_old - learning_rate * dL/dw. Modern optimizers such as Adam maintain per-parameter step sizes using first and second moment estimates of the gradients.
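To make the chain-rule product concrete, here is a sketch for the simplest possible case: a chain of two scalar neurons with sigmoid activations and squared-error loss (the setup and all names are illustrative assumptions, not from the text). Each factor in the product is computed explicitly, and the result is verified against a central finite-difference approximation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative scalar chain: x -> w1 -> sigmoid -> w2 -> sigmoid -> loss.
x, target = 0.5, 1.0
w1, w2 = 0.3, -0.7

# Forward pass, caching intermediate values for reuse in the backward pass.
z1 = w1 * x
a1 = sigmoid(z1)
z2 = w2 * a1
a2 = sigmoid(z2)
L = 0.5 * (a2 - target) ** 2

# Backward pass, one factor per link in the chain:
# dL/dw1 = dL/da2 * da2/dz2 * dz2/da1 * da1/dz1 * dz1/dw1
dL_da2 = a2 - target
da2_dz2 = a2 * (1.0 - a2)      # sigmoid derivative
dz2_da1 = w2
da1_dz1 = a1 * (1.0 - a1)
dz1_dw1 = x
dL_dw1 = dL_da2 * da2_dz2 * dz2_da1 * da1_dz1 * dz1_dw1

# Sanity check against a central finite-difference approximation.
def loss_at(w):
    return 0.5 * (sigmoid(w2 * sigmoid(w * x)) - target) ** 2

eps = 1e-6
numeric = (loss_at(w1 + eps) - loss_at(w1 - eps)) / (2.0 * eps)
assert abs(dL_dw1 - numeric) < 1e-8
```

This kind of finite-difference gradient check is a standard way to validate a hand-written backward pass before trusting it for training.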
Use Cases
Training essentially all modern neural network architectures: convolutional networks for image recognition, recurrent networks and transformers for language modeling, and policy or value networks in deep reinforcement learning.
Advantages
Computes exact gradients for every parameter in a single backward pass; cost scales linearly with network depth; works with any differentiable loss and activation; implemented automatically by modern frameworks via automatic differentiation.
Disadvantages
Requires every operation in the network to be differentiable; suffers from vanishing and exploding gradients in very deep or recurrent networks; must store forward-pass activations, which increases memory usage; provides only local gradient information, so training can stall in poor local minima or saddle points.