Backpropagation is the engine behind virtually all modern deep learning. It works by computing how much each weight contributed to the final error, then nudging every weight in the direction that reduces that error. The process requires a global error signal propagated backward through the entire network.
Backpropagation was proposed by Rumelhart, Hinton, and Williams in 1986, though the underlying chain-rule mathematics goes back to Leibniz. It requires the network to be differentiable end to end: every operation needs a computable gradient.
Forward pass: input passes through all layers. Activations are computed and cached at every layer. The final output is compared to the target, and a scalar loss is computed.
Backward pass: the loss gradient flows backward. The chain rule decomposes the gradient layer by layer. Each weight receives its exact contribution to the error.
Weight update: the optimizer (SGD, Adam, etc.) applies the gradient to the weights. The learning rate scales the step. All weights are updated simultaneously.
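The three steps above can be sketched in a few lines of NumPy for a single linear layer with squared-error loss (all names and values here are illustrative, not a reference implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3)) * 0.1   # weights: 3 inputs -> 2 outputs
x = np.array([1.0, -0.5, 2.0])      # input
target = np.array([0.5, -1.0])      # desired output
lr = 0.1                            # learning rate scales the step

losses = []
for _ in range(50):
    # Forward pass: compute (and in a deep net, cache) activations.
    y = W @ x
    err = y - target
    loss = 0.5 * np.sum(err ** 2)   # scalar loss
    losses.append(loss)

    # Backward pass: the chain rule gives each weight its exact
    # contribution to the error: dL/dW = (y - target) x^T.
    grad_W = np.outer(err, x)

    # Weight update: plain SGD applies the gradient; all weights
    # move simultaneously.
    W -= lr * grad_W
```

In a multi-layer network the backward pass would repeat the chain-rule step once per layer, reusing the cached activations; the structure of the loop is the same.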
Spike-timing-dependent plasticity (STDP) is the biological mechanism by which synapses strengthen or weaken based on the precise relative timing of pre- and post-synaptic spikes. No global error signal. No backward pass. Just local temporal correlation.
If the presynaptic neuron fires just before the postsynaptic neuron (causality implied), the synapse potentiates (LTP, long-term potentiation). If the order reverses, the synapse depresses (LTD, long-term depression). The effective window is roughly 20 ms.
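The pair-based form of this rule is commonly written as an exponential of the spike-time difference. A minimal sketch, with illustrative amplitudes and the ~20 ms time constant from above (the function name and parameter values are assumptions, not from a specific model):

```python
import math

def stdp_dw(dt_ms, a_plus=0.01, a_minus=0.012, tau_ms=20.0):
    """Weight change for one pre/post spike pair.

    dt_ms = t_post - t_pre: positive means the presynaptic
    neuron fired first (causal ordering).
    """
    if dt_ms > 0:
        # Pre before post: potentiation (LTP), decaying over ~tau_ms.
        return a_plus * math.exp(-dt_ms / tau_ms)
    else:
        # Post before pre: depression (LTD).
        return -a_minus * math.exp(dt_ms / tau_ms)
```

Note that the update depends only on quantities available at the synapse itself, the two spike times, which is exactly what distinguishes it from backpropagation's global error signal.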
Backpropagation has several properties that make neuroscientists deeply skeptical it operates in the brain: it needs a global error signal delivered to every synapse; the backward pass reuses the exact forward weights in transposed form (the weight-transport problem); forward activations must be cached until the backward pass arrives; and forward and backward phases must alternate in lockstep.
This remains an active research area: can we find learning rules that are both effective and biologically plausible?