BACKPROP vs STDP

HOW LEARNING ACTUALLY HAPPENS
GRADIENT DESCENT // HEBBIAN PLASTICITY
MACHINE vs BIOLOGICAL SYNAPTIC LEARNING
01 // Artificial Learning
BACKPROPAGATION

Backpropagation is the engine behind virtually all modern deep learning. It works by computing how much each weight contributed to the final error, then nudging every weight in the direction that reduces that error. The process requires a global error signal propagated backward through the entire network.

Popularized by Rumelhart, Hinton & Williams in 1986, though reverse-mode automatic differentiation appeared earlier (Linnainmaa, Werbos) and the chain rule itself goes back to Leibniz. It requires the network to be differentiable end-to-end - every operation needs a computable gradient.

L = loss(ŷ, y) // compute scalar error
∂L/∂w = ∂L/∂ŷ · ∂ŷ/∂z · ∂z/∂w
w ← w - η · ∂L/∂w
// chain rule applied recursively from output back to input. η = learning rate. requires storing all intermediate activations.
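To make the chain rule above concrete, here is a minimal single-weight sketch (a hypothetical toy setup: z = w·x, ŷ = tanh(z), squared-error loss), with the analytic gradient checked against a finite-difference estimate:

```python
import math

# Toy one-neuron "network": z = w*x, yhat = tanh(z), L = (yhat - y)^2
w, x, y = 0.5, 2.0, 1.0

z = w * x
yhat = math.tanh(z)
L = (yhat - y) ** 2

# Chain rule: dL/dw = dL/dyhat * dyhat/dz * dz/dw
dL_dyhat = 2 * (yhat - y)
dyhat_dz = 1 - yhat ** 2        # derivative of tanh
dz_dw = x
grad = dL_dyhat * dyhat_dz * dz_dw

# Sanity check against a finite-difference estimate of dL/dw
eps = 1e-6
L_plus = (math.tanh((w + eps) * x) - y) ** 2
numeric = (L_plus - L) / eps
assert abs(grad - numeric) < 1e-4

# Gradient descent step: w <- w - eta * dL/dw
eta = 0.1
w = w - eta * grad
```

Backprop is exactly this computation applied recursively, layer by layer, to every weight at once.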
FORWARD

Input passes through all layers. Activations computed and cached at every layer. Final output compared to target. Scalar loss computed.

BACKWARD

Loss gradient flows backward. Chain rule decomposes gradient layer by layer. Each weight receives its exact contribution to error.

UPDATE

Optimizer (SGD, Adam, etc.) applies gradient to weights. Learning rate scales the step. All weights updated simultaneously.
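The three phases can be sketched as a toy two-layer NumPy network (sizes, target, and learning rate are all illustrative; plain SGD for the update):

```python
import numpy as np

# Toy 2-layer MLP fit to a linear target, showing the three phases:
# FORWARD (cache activations), BACKWARD (chain rule), UPDATE (SGD).
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))                 # 64 samples, 3 features
y = X @ np.array([[1.0], [-2.0], [0.5]])     # toy regression target

W1 = rng.normal(scale=0.5, size=(3, 8))
W2 = rng.normal(scale=0.5, size=(8, 1))
eta = 0.05

for step in range(500):
    # FORWARD: compute and cache every intermediate activation
    h = np.tanh(X @ W1)       # cached: needed by the backward pass
    yhat = h @ W2
    loss = np.mean((yhat - y) ** 2)
    if step == 0:
        first_loss = loss

    # BACKWARD: chain rule, output layer first
    d_yhat = 2 * (yhat - y) / len(X)   # dL/dyhat
    dW2 = h.T @ d_yhat                 # dL/dW2
    d_h = d_yhat @ W2.T                # gradient flows backward
    d_z1 = d_h * (1 - h ** 2)          # through the tanh nonlinearity
    dW1 = X.T @ d_z1

    # UPDATE: plain SGD, all weights stepped simultaneously
    W1 -= eta * dW1
    W2 -= eta * dW2
```

Note the cache: `h` from the forward pass must survive until the backward pass needs it - the temporal non-locality discussed below.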

02 // Biological Learning
STDP

Spike-Timing Dependent Plasticity is the biological mechanism by which synapses strengthen or weaken based on the precise relative timing of pre- and post-synaptic spikes. No global error signal. No backward pass. Just local temporal correlation.

If the presynaptic neuron fires just before the postsynaptic neuron - causality implied - the synapse potentiates (LTP, long-term potentiation). If the order reverses, the synapse depresses (LTD, long-term depression). The effective window is roughly ±20 ms.

Δt = t_post - t_pre
ΔW = A₊ · e^(-Δt/τ₊) if Δt > 0 (LTP)
ΔW = -A₋ · e^(Δt/τ₋) if Δt < 0 (LTD)
// A± = amplitude constants. τ± = time constants (~20ms). entirely local - no knowledge of global error.
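A direct transcription of the pairwise rule above; the amplitudes and time constants are illustrative values, not measurements:

```python
import math

# Illustrative constants; LTD slightly stronger than LTP is a common choice.
A_PLUS, A_MINUS = 0.01, 0.012
TAU_PLUS, TAU_MINUS = 20.0, 20.0   # ms, the ~20 ms window

def stdp_dw(t_pre: float, t_post: float) -> float:
    """Weight change for a single pre/post spike pair (times in ms)."""
    dt = t_post - t_pre
    if dt > 0:   # pre fired first: causal, potentiate (LTP)
        return A_PLUS * math.exp(-dt / TAU_PLUS)
    if dt < 0:   # post fired first: anti-causal, depress (LTD)
        return -A_MINUS * math.exp(dt / TAU_MINUS)
    return 0.0

# Entirely local: only the two spike times enter the update.
ltp = stdp_dw(10.0, 15.0)   # pre -> post, small positive
ltd = stdp_dw(15.0, 10.0)   # post -> pre, small negative
```

Contrast this with the backprop update: no loss, no gradient, no other neuron's state - just two timestamps.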
STDP LEARNING WINDOW // ΔW vs Δt
[Figure: ΔW vs Δt. Pre→post pairs (Δt > 0) give LTP; post→pre pairs (Δt < 0) give LTD, each decaying exponentially toward ΔW = 0 as |Δt| grows.]
03 // The Backprop Problem
BIOLOGICAL IMPLAUSIBILITY

Backpropagation has several properties that make neuroscientists deeply skeptical it operates in the brain:

01
Weight Transport Problem
Backprop requires the feedback pathway to use the exact same weights as the forward pathway, transposed. The brain has no known mechanism for this. Feedback connections are anatomically different from feedforward ones.
02
Temporal Non-locality
Backprop requires storing all intermediate activations during the forward pass to compute gradients. Neurons would need to "remember" their activation values until the error signal arrives - biologically implausible on the required timescale.
03
Global Error Signal
Every weight in a deep network receives a gradient computed from a global loss function. There is no known global error broadcast in the brain. Learning appears to be local.
04
Vanishing Gradients
In very deep networks, gradients shrink exponentially as they propagate backward. The brain has no obvious analogue to the normalization layers or residual connections that mitigate this in ANNs.
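The shrinkage is easy to demonstrate numerically. A toy sketch (illustrative depth, width, and initialization) pushes a gradient vector backward through 50 tanh layers and watches its norm collapse:

```python
import numpy as np

# Vanishing gradients, minimally: repeatedly apply the backward chain-rule
# step (W.T @ grad, scaled by the tanh derivative) and track the norm.
rng = np.random.default_rng(0)
depth, width = 50, 64
grad = rng.normal(size=width)
g0 = np.linalg.norm(grad)

for _ in range(depth):
    W = rng.normal(scale=1.0 / np.sqrt(width), size=(width, width))
    h = np.tanh(rng.normal(size=width))   # stand-in layer activation
    grad = (W.T @ grad) * (1 - h ** 2)    # one backward step through tanh

g_final = np.linalg.norm(grad)
# g_final ends up many orders of magnitude below g0
```

Each layer multiplies the gradient by the tanh derivative (always ≤ 1), so the norm decays roughly geometrically with depth.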
04 // Contenders & Hybrids
BIOPLAUSIBLE ALTERNATIVES

Active research area - can we find learning rules that are both effective and biologically plausible?

01
Feedback Alignment (Lillicrap 2016)
Replaces the transposed weights in the backward pass with fixed random matrices. Surprisingly, learning still works - the forward weights align to the random feedback over time. Removes the weight transport requirement.
02
Predictive Coding
Each layer predicts the activity of the layer below. Error = prediction - actual. Errors propagate locally. Maps loosely to cortical hierarchies. Karl Friston's Free Energy Principle extends this framework.
03
Contrastive Hebbian Learning
Run network in two phases: clamped (target provided) and free (no target). Synapses update based on difference in correlations between phases. Biologically plausible, maps to sleep/wake cycles.
04
Neuromodulation as Error Signal
Dopamine, acetylcholine, and norepinephrine may carry something like a global reward prediction error. Not identical to backprop loss but functionally analogous. Reinforcement learning connection.
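As a concrete example of how small these modifications can be, here is a minimal NumPy sketch of feedback alignment (item 01) on a toy two-layer regression net. The only change from standard backprop is the fixed random matrix B in the backward pass; all sizes and rates are illustrative:

```python
import numpy as np

# Feedback alignment (Lillicrap et al., 2016): the backward pass uses a
# FIXED random matrix B instead of W2.T, removing weight transport.
rng = np.random.default_rng(1)
X = rng.normal(size=(64, 3))
y = X @ np.array([[1.0], [-1.0], [0.5]])   # toy linear target

W1 = rng.normal(scale=0.5, size=(3, 8))
W2 = rng.normal(scale=0.5, size=(8, 1))
B = rng.normal(scale=0.5, size=(1, 8))     # random feedback, never trained
eta = 0.05

for step in range(500):
    h = np.tanh(X @ W1)
    yhat = h @ W2
    loss = np.mean((yhat - y) ** 2)
    if step == 0:
        first_loss = loss

    d_yhat = 2 * (yhat - y) / len(X)
    dW2 = h.T @ d_yhat
    d_h = d_yhat @ B                # backprop would use W2.T here
    dW1 = X.T @ (d_h * (1 - h ** 2))

    W1 -= eta * dW1
    W2 -= eta * dW2
```

Despite B carrying no information about the forward weights, the loss still falls: the forward weights drift into alignment with the random feedback.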
05 // Side by Side
DIRECT COMPARISON
STDP // BIOLOGICAL

Signal scope: Purely local. A synapse only knows its own pre/post spike times. No network-wide information.
Timing: Online, continuous. Updates happen in real time as spikes occur. No separate forward/backward phases.
Error source: None explicitly. Correlation stands in for causality; reward signals modulate via neuromodulators.
Memory req.: Eligibility traces - the synapse keeps a short-term memory of recent spike times (~seconds).
Result: Unsupervised structure learning and temporal sequence learning. Works without labeled data.

BACKPROP // ARTIFICIAL

Signal scope: Global. Every weight receives a gradient derived from the network-wide scalar loss function.
Timing: Two-phase. The forward pass caches activations; the backward pass computes and applies gradients.
Error source: Explicit labeled targets required. The loss function compares output to ground truth.
Memory req.: Full activation cache for all layers. Scales linearly with network depth and batch size.
Result: Highly effective supervised learning. State of the art across most benchmarks. Biologically implausible.