A Hopfield Network is a recurrent neural network that functions as an associative memory — given a partial or corrupted pattern, it retrieves the closest stored memory. John Hopfield introduced it in 1982, and it was immediately recognized as a landmark: for the first time, memory storage and retrieval were understood as a single physical process — energy minimization.
Every configuration of the network's N binary neurons corresponds to one of 2^N points in state space. The network defines an energy function over this space, and stored memories are local energy minima — valleys in the landscape. Given a starting state (a noisy or incomplete query), the network's update rule always moves downhill. The system converges to a nearby minimum, retrieving the stored pattern.
The biological analogy is direct: this is how the brain might store and retrieve episodic memories. A smell triggers a whole scene. A fragment of a song surfaces the whole song. The Hopfield model gave the first mathematically rigorous account of how this could work in a neural substrate — using nothing but physics.
The energy landscape has valleys at stored memories. Click anywhere on the landscape to set a starting state — watch it roll downhill to the nearest memory. The update rule always decreases energy until a fixed point is reached.
N binary neurons, each sᵢ ∈ {-1, +1}. The energy function (from physics — identical to the Ising spin glass model):

E = -½ Σᵢⱼ wᵢⱼ sᵢ sⱼ

where the sum runs over all pairs i ≠ j and the weights are symmetric: wᵢⱼ = wⱼᵢ.
To store M patterns ξ¹,...,ξᴹ, use Hebbian learning — neurons that fire together wire together:

wᵢⱼ = (1/N) Σₘ ξᵢᵐ ξⱼᵐ

This is the outer-product rule: one shot, computed directly from the patterns, no backprop needed.
The update rule: set neuron i to sign(Σⱼ wᵢⱼ sⱼ). Each update either decreases E or leaves it unchanged, so the system is guaranteed to converge to a local minimum. Below capacity, the stored patterns sit at (or very near) those minima.
Capacity limit: stores at most ~0.14N patterns reliably. Beyond this, "spurious memories" (local minima that aren't stored patterns) proliferate and retrieval fails.
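The whole classical model — Hebbian storage, asynchronous updates, energy descent — fits in a few lines. A minimal NumPy sketch (pattern count, size, and noise level are illustrative, not from the original):

```python
import numpy as np

def train_hebbian(patterns):
    """Hebbian outer-product rule: w_ij = (1/N) sum_m xi_i^m xi_j^m, w_ii = 0."""
    N = patterns.shape[1]
    W = patterns.T @ patterns / N
    np.fill_diagonal(W, 0.0)  # no self-connections
    return W

def recall(W, state, max_sweeps=100):
    """Asynchronous updates: set s_i to the sign of its local field until a fixed point."""
    s = state.copy()
    for _ in range(max_sweeps):
        changed = False
        for i in np.random.permutation(len(s)):
            h = W[i] @ s                       # local field on neuron i
            new = 1 if h >= 0 else -1
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:                        # fixed point: a local energy minimum
            break
    return s

def energy(W, s):
    return -0.5 * s @ W @ s

# store two random 25-bit patterns, corrupt one, retrieve
rng = np.random.default_rng(0)
patterns = rng.choice([-1, 1], size=(2, 25))
W = train_hebbian(patterns)
noisy = patterns[0].copy()
noisy[:5] *= -1                                # flip 5 of 25 bits
recovered = recall(W, noisy)
```

With 2 patterns in 25 neurons the network is far below the ~0.14N limit, so the corrupted probe rolls downhill to a fixed point of the dynamics.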
Store binary patterns (5x5 pixels). Corrupt one and watch the network retrieve the original through iterated updates.
Ramsauer et al. (2020) showed that replacing the quadratic energy function with an exponential (log-sum-exp) one shatters the 0.14N capacity limit:

E(ξ) = -lse(β, Xᵀξ) + ½ ξᵀξ + const

where lse(β, z) = β⁻¹ log Σᵢ exp(β zᵢ), X is the matrix of stored patterns, ξ is the state, and β is the inverse temperature.
The new update rule is the softmax over dot products between the query and all stored patterns — which is exactly the attention mechanism in transformers.
one update step = one attention operation
Capacity scales exponentially with N: up to ~2^(N/2) patterns can be stored without interference. The price: continuous-valued patterns instead of binary, and an energy function with interactions of higher order than quadratic (sharpness controlled by β).
This is one of the most surprising theoretical results in recent deep learning: the transformer attention mechanism is mathematically equivalent to one update step of a modern Hopfield network trying to retrieve a stored pattern.
Attn(Q, K, V) = softmax(QKᵀ/√d) V, which for a single query q in column form reads V · softmax(Kᵀq/√d)
Modern Hopfield update:
ξ_new = X · softmax(β · Xᵀξ)
Q = query ξ, K = stored patterns X, V = stored patterns X
β = 1/√d (inverse temperature = scale factor)
The correspondence: keys and values are both the stored pattern matrix X. The query ξ is the retrieval probe. The scale factor β controls retrieval sharpness — high β focuses on the closest pattern (winner-take-all), low β averages across many.
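The correspondence can be checked numerically: with K = V = X and β = 1/√d, one Hopfield update and one attention operation compute the same thing. A sketch (dimensions and the single-query setup are illustrative assumptions):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())   # subtract max for numerical stability
    return e / e.sum()

d, M = 16, 8                  # pattern dimension, number of stored patterns
rng = np.random.default_rng(1)
X = rng.standard_normal((d, M))   # stored patterns as columns (keys = values)
xi = rng.standard_normal(d)       # query / retrieval probe

beta = 1.0 / np.sqrt(d)           # inverse temperature = attention scale factor

# modern Hopfield update: xi_new = X softmax(beta X^T xi)
hopfield_out = X @ softmax(beta * (X.T @ xi))

# single-query attention with K = V = X: V softmax(K^T q / sqrt(d))
attn_out = X @ softmax((X.T @ xi) / np.sqrt(d))
```

The two outputs agree to floating-point precision, since the scale factor 1/√d plays exactly the role of β.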
In the modern Hopfield model, β (inverse temperature) controls retrieval sharpness. Drag the slider to see how it transitions from averaging across all memories (low β, soft attention) to sharp winner-take-all retrieval (high β, hard attention). This is exactly what the attention scale factor 1/√d does in transformers.
High β: concentrates on nearest pattern. Hard retrieval. Winner-take-all.
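The sharpness transition can be seen directly in the softmax weights over stored patterns. A small sketch (the probe construction and β values are illustrative):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(2)
X = rng.standard_normal((16, 8))               # 8 stored patterns as columns
xi = X[:, 3] + 0.1 * rng.standard_normal(16)   # noisy probe near one pattern

scores = X.T @ xi                              # similarity of probe to each memory
low = softmax(0.01 * scores)   # low beta: near-uniform mixture of memories
high = softmax(10.0 * scores)  # high beta: weight piles onto the best match
```

Raising β never changes which memory wins; it only concentrates the retrieval weights onto it.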
The Hopfield network energy function is identical to the Ising spin glass model — a statistical mechanics model of disordered magnetic systems, studied since the 1970s. Hopfield saw that the same mathematics that described frustrated magnets could describe memory.
Hopfield energy: E = -½ Σᵢⱼ wᵢⱼ sᵢ sⱼ
Ising energy: E = -½ Σᵢⱼ Jᵢⱼ σᵢ σⱼ
Formally identical: J = w, σ = s.
Amit, Gutfreund, and Sompolinsky (1985) did the full statistical mechanics analysis of Hopfield networks using replica theory — borrowed from spin glass theory. They derived the 0.14N capacity limit rigorously. The tools of physics became tools of neural network theory.
The 2020 reinterpretation of transformers as Hopfield networks isn't just theoretical — it's changed how people think about what attention is doing:
This framing also explains why attention heads behave as "lookup operations" — each head is an associative memory retrieval with different stored patterns. It also motivated new architectures: LSTM-inspired recurrent models using the Hopfield retrieval update as an explicit memory module.
Add temperature T to the Hopfield update: instead of the deterministic sign(·), set neuron i to +1 with probability P(sᵢ = +1) = σ((2/T) Σⱼ wᵢⱼ sⱼ), where σ is the logistic sigmoid. As T → 0: deterministic Hopfield. As T → ∞: random coin flips. At intermediate T, the network samples from a Boltzmann distribution over states. This is the Boltzmann Machine (Hinton & Sejnowski, 1986) — the first model of learned probabilistic inference in neural networks.
thermal equilibrium = Boltzmann distribution
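One sweep of this stochastic (Glauber) update can be sketched as follows (network size, weights, and temperatures are illustrative; weights are random rather than Hebbian just to show the dynamics):

```python
import numpy as np

def sigmoid(x):
    x = np.clip(x, -500, 500)   # avoid overflow at extreme temperatures
    return 1.0 / (1.0 + np.exp(-x))

def stochastic_step(W, s, T, rng):
    """One sweep of Glauber dynamics: P(s_i = +1) = sigmoid((2/T) * local field)."""
    s = s.copy()
    for i in range(len(s)):
        h = W[i] @ s                          # local field on neuron i
        p_plus = sigmoid(2.0 * h / T)
        s[i] = 1 if rng.random() < p_plus else -1
    return s

rng = np.random.default_rng(3)
N = 10
W = rng.standard_normal((N, N))
W = (W + W.T) / 2                             # symmetric weights
np.fill_diagonal(W, 0.0)                      # no self-connections
s = rng.choice([-1, 1], size=N)

cold = stochastic_step(W, s, T=1e-6, rng=rng)  # T -> 0: deterministic Hopfield
hot = stochastic_step(W, s, T=1e6, rng=rng)    # T -> inf: coin flips
```

At very low T the sigmoid saturates and the sweep reproduces the deterministic sign(·) update exactly.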
The exponential capacity comes at a cost: the energy function requires polynomial terms of degree 2n (where n controls the capacity-error tradeoff), making it more computationally expensive. The β → ∞ limit achieves exponential capacity but requires exact nearest-neighbor lookup — equivalent to hard attention.
John Hopfield and Geoffrey Hinton were awarded the 2024 Nobel Prize in Physics for "foundational discoveries and inventions that enable machine learning with artificial neural networks." Hopfield specifically for the associative memory network; Hinton for the Boltzmann Machine.
The Nobel committee explicitly highlighted the physics connection: the Hopfield network imported the Ising model and spin glass theory directly into neuroscience and AI. Statistical mechanics became a design tool for computing systems.