Phase Transitions & The Ising Model

I recently made a small foray into statistical mechanics while trying to understand the critical brain hypothesis [1] [2] [3], including what exactly it means for the brain to operate near a second-order phase transition, and why the observation that the size and lifetime distributions of neuronal avalanches follow power laws offers experimental support for criticality. This led me down a pretty deep rabbit hole of ideas in statistical mechanics, such as the scaling hypothesis, mean-field theory, and the renormalization group, some of which I will explore in this post.

This post on phase transitions and the Ising model is going to be one of a couple of posts I hope to write on topics at the intersection of physics, neuroscience, and machine learning. These topics include (but are not necessarily limited to) free energy and entropy, Hopfield networks, Boltzmann machines, and the critical brain hypothesis. The discussion of the Ising model in this post will give a smooth transition into a future post on Hopfield networks, which are energy-based models of associative memory built on top of the Ising model. That said, these posts are currently competing with a bunch of mechanistic interpretability posts I also want to get up soon.

To give a high-level overview of this post, we will first cover some background on what a phase transition is and how phase transitions are classified. We will then introduce critical exponents, the scaling hypothesis, and the concept of universality observed between different models. Next, we will discuss the Ising model, a model of magnetism at the scale of the magnetic dipole moment of an electron, in a varying number of dimensions. We will discuss what it means to solve a thermodynamic problem, namely computing the free energy and its various derivatives, and then solve the Ising model in zero and one dimensions. Finally, we will cover the mean-field approximation of the Ising model in $d$ dimensions, talk about when this approximation does and does not hold, and provide a very interesting link between mean-field theory and variational inference in Bayesian statistics.

Phase Transitions

A phase transition is an event that occurs when small changes in a control parameter (such as temperature) lead to large, and sometimes discontinuous, changes in the macroscopic properties of a system in the thermodynamic limit. The macroscopic quantities that distinguish the phases are often referred to as order parameters (such as magnetization, see Figure 2). Under the Ehrenfest classification, phase transitions are classified by the behavior of the system's free energy: an $N$-th-order phase transition involves a discontinuity in the $N$-th partial derivative of the free energy with respect to one of its thermodynamic variables (such as temperature or external field), while all lower-order derivatives remain continuous. Throughout this post, however, we will only distinguish between discontinuous (first-order) and continuous (second-order and above) phase transitions. This distinction can be seen more clearly in Figure 1 below.

Phase transitions occur when the system crosses a coexistence curve (more generally, a boundary in the space of control parameters such as temperature and pressure) that separates two distinct phases of matter, such as water and ice. When crossing this boundary, the system's macroscopic properties change abruptly, whether through a first- or second-order phase transition. As the system size grows larger, approaching the thermodynamic limit, the properties on each side of this boundary become more sharply defined. This behavior is explained through renormalization group theory, which is a bit beyond the scope of this post, but I will refer to it once or twice more.

Throughout this post, we will use magnetism as the pedagogical example. At a microscopic level, magnetic materials derive their properties from the alignment of magnetic dipole moments. These moments arise primarily from the intrinsic spin of electrons, creating tiny magnetic fields that can interact with neighboring atoms. In ferromagnetic materials (such as iron), these dipole moments can align, and when a sufficient fraction of them align they produce a net magnetic field, giving a permanent magnet. Here, we refer to magnetized materials as ferromagnets, and materials with no net magnetization as paramagnets.

To give a concrete example of a phase transition that's relevant within the context of this blog post, let's consider the ferromagnet-paramagnet phase transition. The magnetization curve shown in Figure 2 illustrates the temperature dependence of the order parameter $M$ in a ferromagnetic system. For temperatures below the critical temperature $T_c$ (called the Curie temperature), the system exhibits spontaneous magnetization, represented by the two symmetric branches of the curve. These branches emerge from a bifurcation at $T_c$ and grow in magnitude as temperature decreases, following the power law $M \propto (T_c - T)^\beta$, where $\beta$ is a critical exponent (discussed later, and not to be confused with the inverse temperature $\beta$ that appears further on). The dashed line at $M = 0$ for $T < T_c$ represents an unstable state, while the continuation above $T_c$ represents the paramagnetic phase, where thermal fluctuations overcome the ordering tendency of the magnetic interactions, resulting in the loss of magnetization.

**Figure 2:** Phase diagram of the Ising model showing magnetization ($M$) as a function of normalized temperature ($T/T_c$).

The first phase transition occurs as we vary the temperature with no external magnetic field present. Consider a permanent magnet (such as iron) at room temperature, where the magnetic moments spontaneously align in the same direction, creating a net magnetic field. As we heat this magnet above $T_c$, it undergoes a dramatic change: the thermal energy becomes strong enough to overcome the alignment of these magnetic moments, and the material transitions into a paramagnetic phase where the magnetic moments point in random directions, resulting in no net magnetic field. This is a continuous, or second-order, phase transition, where the magnetization decreases continuously to zero as we approach the critical temperature (Figure 1 above).

The second type of transition becomes apparent when we apply an external magnetic field at a fixed temperature below $T_c$. Starting with a strong field in one direction, the magnetic moments align with this field. As we decrease the field strength to zero and then increase it in the opposite direction, the magnetization jumps discontinuously as it switches direction, which is characteristic of a first-order phase transition. It is also worth noting that, because the spins are collectively aligned, this switching exhibits hysteresis: a lag in which the spins resist flipping even after the external field has reversed. In this sense, the system's response depends on its history, reflecting the system's preference to maintain alignment among its magnetic moments even as the external field reverses.

Critical Exponents and Universality

One interesting property of continuous phase transitions is that they exhibit a characteristic power-law behavior in various thermodynamic quantities as the system approaches some critical point. These power laws are characterized by critical exponents, which quantify how physical observables scale with the distance from the critical point. For a magnetic system at temperature $T$ near its critical temperature $T_c$ , we define the reduced temperature $t = (T - T_c)/T_c$ as a measure of this distance.

**Table 1:** Primary critical exponents and their associated scaling relations. Here, $t$ denotes the reduced temperature, $h$ the external field, and $d$ the spatial dimension.

This critical behavior is described by a set of critical exponents, each associated with a specific thermodynamic quantity. For example, below $T_c$ the magnetization scales as $M \sim |t|^\beta$, while the magnetic susceptibility diverges as $\chi \sim |t|^{-\gamma}$. These power laws characterize the singular behavior near the critical point and can be observed across a wide range of (surprisingly different) physical systems. A larger set of the primary critical exponents and their scaling relations is presented in Table 1 above, and a more graphical view can be seen in Figure 3 below.
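Because $\log M = \beta \log|t| + \text{const}$ near the critical point, a critical exponent shows up as the slope of a straight line on a log-log plot. The minimal sketch below fits that slope by least squares to synthetic, noise-free power laws generated from the 2D Ising exponents $\beta = 1/8$ and $\gamma = 7/4$; these are not measurements, and `loglog_slope` is a hypothetical helper, not a library function.

```python
import math

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

# Synthetic, noise-free power laws using the 2D Ising exponents beta = 1/8, gamma = 7/4.
ts = [10 ** (-k / 4) for k in range(4, 17)]   # |t| from 1e-1 down to 1e-4
M = [t ** (1 / 8) for t in ts]
chi = [t ** (-7 / 4) for t in ts]
print(loglog_slope(ts, M))     # ~ 0.125 (= beta)
print(loglog_slope(ts, chi))   # ~ -1.75 (= -gamma)
```

With real or simulated data the power law only holds asymptotically close to $T_c$, so in practice the choice of fitting window matters.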

**Figure 3:** Critical behavior of thermodynamic quantities in the 2D Ising model as functions of the reduced temperature $t = (T-T_c)/T_c$. Left panel: The magnetization $M \sim |t|^\beta$ for $t < 0$ with critical exponent $\beta = 0.125$ (= 1/8), showing how the order parameter continuously approaches zero as the critical temperature is approached from below. Center panel: The correlation length $\xi \sim |t|^{-\nu}$ with critical exponent $\nu = 1.0$, diverging at $t = 0$, which indicates the emergence of long-range correlations at the critical point. Right panel: The magnetic susceptibility $\chi \sim |t|^{-\gamma}$ with critical exponent $\gamma = 1.75$ (= 7/4), demonstrating the divergence of fluctuations near the critical point. These critical exponents are characteristic of the 2D Ising universality class and illustrate how different physical observables exhibit power-law behavior near the critical point, with exponents that are related through scaling relations.

The concept of universality, as it relates to critical phenomena, states that systems with distinct microscopic interactions, characterized by their (potentially very different) Hamiltonians, can exhibit identical critical exponents. This universal behavior can be understood through the role of fluctuations and the divergence of the correlation length $\xi \sim |t|^{-\nu}$ near the critical point (more on correlation functions in the next section). When $\xi$ becomes much larger than any microscopic length scale, fluctuations on all length scales become important, and the details of the individual microscopic interactions become irrelevant to the critical behavior. These large fluctuations effectively wash out the microscopic details, leaving only the fundamental symmetries and the dimensionality to determine the critical behavior. Systems that share these basic properties form a universality class, characterized by identical critical exponents.

To give a concrete example, the liquid-gas critical point and the ferromagnetic phase transition in the three-dimensional Ising model have the same critical exponents, and are thus in the same universality class. This is despite the (large) differences in their microscopic physics, where one involves molecular interactions in a fluid and the other involves magnetic moments on a lattice.

Correlation Functions

To understand critical behavior, it's going to be useful to have an idea of what form the spatial correlation function takes. Consider a magnetic system where $s(\mathbf{r})$ represents the local spin at position $\mathbf{r}$. The two-point correlation function is defined as

G(\mathbf{r}_1, \mathbf{r}_2) = \langle s(\mathbf{r}_1)s(\mathbf{r}_2) \rangle - \langle s(\mathbf{r}_1) \rangle \langle s(\mathbf{r}_2) \rangle

This function measures the degree to which fluctuations at different points in space are correlated, after subtracting out the effect of the average magnetization. For translationally invariant systems, it depends only on the relative displacement $\mathbf{r} = \mathbf{r}_2 - \mathbf{r}_1$, allowing us to rewrite $G(\mathbf{r}_1, \mathbf{r}_2)$ as $G(\mathbf{r})$, and for isotropic systems only on the distance $r = |\mathbf{r}|$. Away from criticality, this correlation function decays exponentially at large $r$

G(\mathbf{r}) \sim e^{-r/\xi}

To understand why, consider a system with local interactions. Correlations between distant points must be built up through a chain of intermediate interactions, and each "step" in this chain contributes a multiplicative factor smaller than one. Accumulated over a distance $r$, these factors naturally lead to exponential decay, with $\xi$ characterizing the typical distance over which correlations persist. Importantly, however, at the critical point the correlation function instead transitions to a power-law decay

G(\mathbf{r}) \sim \frac{1}{r^{d-2+\eta}}

where $d$ is the spatial dimension and $\eta$ is the anomalous dimension, a correction term that captures how the spatial correlations deviate, due to critical fluctuations, from what would be expected in a simple non-interacting or mean-field system. This transition from exponential to power-law behavior reflects the emergence of scale invariance at criticality: the divergence of $\xi$ eliminates the characteristic length scale that gave us the exponential decay.
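The "chain of multiplicative factors" picture behind the exponential decay away from criticality can be made concrete with the one-dimensional Ising chain (solved exactly later in this post). At zero field, a standard exact result for the infinite chain is $G(r) = \tanh(\beta J)^r$, where here $\beta$ is the inverse temperature ($k_B = 1$): every bond contributes one factor of $\tanh(\beta J) < 1$, so $\xi = -1/\log\tanh(\beta J)$ in units of the lattice spacing. The minimal sketch below checks this by brute-force enumeration of a small periodic ring; the function name and parameter values are illustrative choices. On a finite ring of $N$ spins the exact result picks up a small wrap-around correction, $G(r) = (t^r + t^{N-r})/(1 + t^N)$ with $t = \tanh(\beta J)$.

```python
import itertools
import math

def ring_correlations(n_spins, beta, J):
    """Return G[r] = <s_0 s_r> for r = 0..n_spins-1 on a periodic 1D Ising ring at h = 0.

    Brute-force sum over all 2**n_spins configurations (only feasible for small rings).
    At h = 0 the mean spin vanishes by symmetry, so <s_0 s_r> is already the connected G(r).
    """
    Z = 0.0
    G = [0.0] * n_spins
    for spins in itertools.product((-1, 1), repeat=n_spins):
        # Bond energy -J s_i s_{i+1}, with the last spin wrapping around to the first.
        energy = -J * sum(spins[i] * spins[(i + 1) % n_spins] for i in range(n_spins))
        weight = math.exp(-beta * energy)
        Z += weight
        for r in range(n_spins):
            G[r] += spins[0] * spins[r] * weight
    return [g / Z for g in G]

beta, J, N = 1.0, 0.5, 12
G = ring_correlations(N, beta, J)
t = math.tanh(beta * J)   # one multiplicative factor per bond
xi = -1.0 / math.log(t)   # infinite-chain correlation length (lattice spacing = 1)
for r in range(1, 5):
    print(f"r={r}  G={G[r]:.5f}  tanh(beta*J)^r={t**r:.5f}  exp(-r/xi)={math.exp(-r / xi):.5f}")
```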

**Figure 4:** Visualization of the concept of correlation length. The figure depicts a 2D plane in 3D space with randomly oriented spins (represented by arrows) distributed throughout. Within a circular region of diameter $\xi$, all spins are aligned in the same direction, while outside this region, spin orientations are random and uncorrelated. The correlation length $\xi$ defines the characteristic spatial scale over which the system maintains order. Spins separated by distances less than $\xi$ tend to be correlated, while those separated by distances greater than $\xi$ tend to behave independently. As the temperature approaches the critical point $T_c$, this correlation length diverges according to $\xi \sim |t|^{-\nu}$, leading to long-range correlations and the scale-invariant behavior characteristic of second-order phase transitions.

This is very interesting because it implies that systems undergoing a second-order phase transition have scale-free dynamics at the critical point. Within the context of a future post on the critical brain hypothesis, this matters because it explains why observing scale-free dynamics, for example in the size and lifetime distributions of neuronal avalanches, provides evidence for the brain operating near a second-order phase transition between ordered and disordered states. I'll post more on this another time, but it also builds intuition for why the brain might self-organize, or be fine-tuned, to operate near the critical point: there, the correlation length diverges, allowing for optimal message passing between pairs of neurons.

The complete theoretical understanding of this power-law behavior emerges from the renormalization group analysis. Furthermore, the full derivation of the correlation function forms can be followed from the Ornstein-Zernike theory of correlation functions, which is slightly beyond the scope of this post. The Berlinsky [4] book has a good discussion of this in Chapter 13.

The Scaling Hypothesis

It turns out that the various critical exponents introduced in the previous section are not all independent. The scaling hypothesis, one of the fundamental principles in the theory of critical phenomena, asserts that only two of the critical exponents are independent, and that the remaining exponents can be determined through a set of scaling relations. The most fundamental of these relations are the Rushbrooke and Widom scaling laws

\alpha + 2\beta + \gamma = 2

\gamma = \beta(\delta - 1)

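Further scaling relations can be derived by also considering the spatial dimension $d$ and the anomalous dimension $\eta$. They take the form below; the relations that explicitly involve $d$ (such as Josephson's relation $\alpha = 2 - d\nu$) are known as hyperscaling relations, while $\gamma = \nu(2 - \eta)$ is known as Fisher's relation.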

\begin{align*} \alpha &= 2 - d\nu \\[0.25em] \beta &= \frac{\nu}{2}(d - 2 + \eta) \\[0.25em] \gamma &= \nu(2 - \eta) \\[0.25em] \delta &= \frac{d + 2 - \eta}{d - 2 + \eta} \end{align*}

The existence of these scaling relations reveals a sort of simplicity underlying critical phenomena. Despite the complexity of microscopic interactions, systems near criticality can be characterized by just two independent exponents. This reduction in complexity suggests that the behavior at critical points does not depend on the specific details of the microscopic interactions, providing further support for the concept of universality discussed previously.
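As a quick sanity check on these relations, the sketch below plugs in the exactly known exponents of the 2D Ising model ($\alpha = 0$, $\beta = 1/8$, $\gamma = 7/4$, $\delta = 15$, $\nu = 1$, $\eta = 1/4$) using exact rational arithmetic, and also tries the mean-field exponents ($\alpha = 0$, $\beta = 1/2$, $\gamma = 1$, $\delta = 3$, $\nu = 1/2$, $\eta = 0$). The function name and relation labels are my own; this is a consistency check, not a derivation.

```python
from fractions import Fraction as Fr

def scaling_residuals(d, alpha, beta, gamma, delta, nu, eta):
    """Residual (left side minus right side) of each scaling relation; 0 means it holds exactly."""
    return {
        "Rushbrooke": alpha + 2 * beta + gamma - 2,
        "Widom": gamma - beta * (delta - 1),
        "Josephson": alpha - (2 - d * nu),
        "beta(d, nu, eta)": beta - nu * (d - 2 + eta) / 2,
        "Fisher": gamma - nu * (2 - eta),
        "delta(d, eta)": delta - (d + 2 - eta) / (d - 2 + eta),
    }

# Exact 2D Ising exponents.
ising_2d = dict(d=2, alpha=Fr(0), beta=Fr(1, 8), gamma=Fr(7, 4),
                delta=Fr(15), nu=Fr(1), eta=Fr(1, 4))
# Mean-field exponents (the dimension d is supplied separately).
mean_field = dict(alpha=Fr(0), beta=Fr(1, 2), gamma=Fr(1),
                  delta=Fr(3), nu=Fr(1, 2), eta=Fr(0))

print(scaling_residuals(**ising_2d))         # every residual is 0
print(scaling_residuals(d=4, **mean_field))  # every residual is 0
print(scaling_residuals(d=3, **mean_field))  # the relations involving d are violated
```

With the mean-field exponents every relation holds at $d = 4$, but at $d = 3$ the ones that explicitly involve $d$ fail while Rushbrooke, Widom, and Fisher still hold, which foreshadows the question of when mean-field theory can be trusted.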

While I didn’t provide a full derivation of the scaling relationships, a thorough and pedagogical treatment can be found in Kopietz’s “Introduction to the Renormalization Group,” particularly in sections 1.2 and 1.3.

The Ising Model

The Ising model, often referred to as the Drosophila of statistical mechanics, abstracts the microscopic properties of magnetism into discrete magnetic dipole moments (spins) that can point either up ($+1$) or down ($-1$). More formally, the Ising model is defined on a $d$-dimensional lattice $\Lambda$, where each site $i \in \Lambda$ carries a discrete spin variable $\sigma_i \in \{+1, -1\}$. When the spins align, we have a net magnetization. When the spins do not align, they cancel each other out, resulting in no net magnetization. Most important to our study of the Ising model in this post is the existence (or lack thereof, depending on the dimensionality of the system) of first- and second-order phase transitions in the order parameter (magnetization) as we vary the control parameters (the external field strength and the temperature, respectively).

**Figure 5:** Left: Random spin configuration in a 2D Ising model lattice at high temperature, exhibiting disorder with no clear pattern of aligned spins. Right: Low-temperature configuration showing two distinct magnetic domains separated by a well-defined domain wall. The left domain consists predominantly of -1 spins (light), while the right domain contains mostly +1 spins (dark), illustrating the spontaneous symmetry breaking and phase separation characteristic of the ferromagnetic phase below the critical temperature.

In the ferromagnetic phase below the critical temperature $T_c$ , the system spontaneously breaks symmetry as spins collectively align, forming clusters of uniform magnetization called magnetic domains. In real systems, these domains form due to competing interactions, boundary effects, and non-equilibrium dynamics. The boundaries between oppositely aligned domains are called domain walls (see Figure 5 above or the cover for this post), which have a characteristic width and energy cost (an interesting area of research). As temperature increases toward $T_c$ , thermal fluctuations cause these domain walls to become increasingly irregular and fluid. At the critical point, the correlation length diverges, meaning that fluctuations occur at all length scales, which dissolves the distinct domains as the system transitions to the paramagnetic phase.

Solving a Thermodynamic System

Before we go any further, it is worth stating what the goals are, and what exactly it means to solve a system like the Ising model. First, we are interested in the mean magnetization per spin $\langle \sigma \rangle$, which is computed as the ensemble (thermodynamic) average of the individual spins

\langle \sigma \rangle = \sum_s \sigma(s)\, p(s)

where the sum runs over the possible microstates $s$ of the system, $\sigma(s)$ is the average spin (magnetization per spin) in microstate $s$, and $p(s)$ are the Boltzmann probabilities. These probabilities depend on the energy of each microstate, which is given by the Hamiltonian $H$ of the system. The Hamiltonian encodes all interactions and external influences in the system, and evaluating it for a particular microstate $s$ gives us the energy $E(s)$.

Following the principles of statistical mechanics, lower energy states are assigned higher probabilities than higher energy states according to the Boltzmann distribution

p(s) = \frac{1}{Z} e^{-\beta E(s)}

Where $\beta = \frac{1}{k_B T}$ is the thermodynamic beta (inverse temperature), $k_B$ is the Boltzmann constant, and $Z$ is the partition function (a normalization factor that ensures the probabilities sum to one). We are working in the canonical ensemble, which describes a system in thermal equilibrium with a heat bath (reservoir), so the canonical partition function is given by

Z = \sum_{s} e^{-\beta E(s)}

The Helmholtz free energy of the system is related to the partition function by (setting $k_B = 1$ here and throughout)

F = \langle E \rangle - TS = -T \log(Z)

From the free energy, we can find most of what we are interested in. For a system of $N$ spins, the mean magnetization per spin is related to the free energy by $\langle \sigma \rangle = -\frac{1}{N} \partial F / \partial h$, and the magnetic susceptibility is $\chi = \partial \langle \sigma \rangle / \partial h$, which measures how much the magnetization changes in response to the external field $h$. So, in some sense, we are mainly interested in the free energy of the system and its various derivatives, which we can get by computing the partition function. Thus, solving a thermodynamic system like this roughly boils down to computing the partition function $Z$.
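To make "solving" concrete, here is a minimal brute-force sketch, assuming $k_B = 1$ and using a small periodic ring of spins with nearest-neighbor coupling purely as an illustrative system. It sums Boltzmann weights over all $2^N$ microstates to get $Z$, $F$, and $\langle \sigma \rangle$, then checks that $-\frac{1}{N}\partial F/\partial h$, estimated with a central finite difference, reproduces the directly averaged magnetization. The helper names are hypothetical, and enumeration only works for tiny systems, which is exactly why the analytic tricks later in this post matter.

```python
import itertools
import math

def ring_energy(spins, J, h):
    """Nearest-neighbour Ising energy on a periodic 1D ring (illustrative choice of system)."""
    n = len(spins)
    bonds = sum(spins[i] * spins[(i + 1) % n] for i in range(n))
    return -J * bonds - h * sum(spins)

def solve_by_enumeration(n, J, h, T):
    """Return (Z, F, <sigma>) by summing Boltzmann weights over all 2**n microstates (k_B = 1)."""
    beta = 1.0 / T
    Z = 0.0
    weighted_m = 0.0
    for spins in itertools.product((-1, 1), repeat=n):
        w = math.exp(-beta * ring_energy(spins, J, h))
        Z += w
        weighted_m += w * sum(spins) / n   # magnetization per spin of this microstate
    F = -T * math.log(Z)
    return Z, F, weighted_m / Z

n, J, h, T, dh = 8, 1.0, 0.3, 2.0, 1e-5
Z, F, m = solve_by_enumeration(n, J, h, T)
# Free-energy route: <sigma> = -(1/N) dF/dh, estimated with a central finite difference.
F_plus = solve_by_enumeration(n, J, h + dh, T)[1]
F_minus = solve_by_enumeration(n, J, h - dh, T)[1]
m_from_F = -(F_plus - F_minus) / (2 * dh * n)
print(f"direct <sigma> = {m:.6f},  -(1/N) dF/dh = {m_from_F:.6f}")
```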

The Ising Hamiltonian

In true statistical mechanics fashion (I say this as someone who is not a physicist), the first thing we will want to do is define the total energy of any state of the system, the Hamiltonian. To do so, notice that there are only two contributions to the energy. Any given dipole moment $\sigma_i$ interacts with the spins of its neighbors and with any external magnetic field. First, we can model the neighboring interactions with the following term:

-\frac{1}{2} \sum_{\langle i, j \rangle} J_{ij} \sigma_i \sigma_j

Where $\sigma_i \in \{-1, 1\}$ is the spin of the $i$-th dipole, $J_{ij}$ is the coupling strength between the $i$-th and $j$-th dipoles, and $\langle i, j \rangle$ denotes a sum over pairs of interacting (typically nearest-neighbor) sites $i$ and $j$. Because this sum runs over both orderings, each pair appears twice (once as $(i, j)$ and once as $(j, i)$), so we include a factor of one-half to correct for the double counting.

Next, we have the interaction of an external field $h$ on each dipole. This interaction can be written as

-\sum_i h_{i} \sigma_i

Where $h_i$ is the strength of the external field acting on the $i$-th dipole. Summing these two terms together gives us the Hamiltonian of the Ising model

H = -\frac{1}{2} \sum_{\langle i, j \rangle} J_{ij} \sigma_i \sigma_j - \sum_i h_{i} \sigma_i

Here, we will make the traditional simplifying assumptions that the coupling and the field are uniform, i.e., $J_{ij} = J$ for every neighboring pair and $h_i = h$ for every site, and we absorb the factor of one-half into $J$ (equivalently, we count each neighboring pair only once). This gives us the following simplified Hamiltonian, which we will work with for the rest of this blog post

H = -J \sum_{\langle i, j \rangle} \sigma_i \sigma_j - h \sum_i \sigma_i
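A minimal sketch of this Hamiltonian in Python, assuming a square $L \times L$ lattice with periodic boundaries (the lattice choice and the function name `ising_energy_2d` are illustrative, not from any library). Pairing each site only with its right and down neighbors counts every nearest-neighbor pair exactly once, which matches the convention obtained after absorbing the factor of one-half into $J$.

```python
def ising_energy_2d(spins, J=1.0, h=0.0):
    """H = -J * sum_<ij> s_i s_j - h * sum_i s_i on an L x L periodic square lattice.

    Each nearest-neighbour pair is counted once by pairing every site with its
    right and down neighbours only.
    """
    L = len(spins)
    bond_sum = 0
    field_sum = 0
    for i in range(L):
        for j in range(L):
            s = spins[i][j]
            bond_sum += s * spins[i][(j + 1) % L]   # right neighbour (wraps around)
            bond_sum += s * spins[(i + 1) % L][j]   # down neighbour (wraps around)
            field_sum += s
    return -J * bond_sum - h * field_sum

L = 4
all_up = [[1] * L for _ in range(L)]
checkerboard = [[1 if (i + j) % 2 == 0 else -1 for j in range(L)] for i in range(L)]
print(ising_energy_2d(all_up))                 # -32.0: 2*L*L = 32 bonds, all aligned
print(ising_energy_2d(checkerboard))           # 32.0: every bond anti-aligned
print(ising_energy_2d(all_up, J=1.0, h=0.5))   # -40.0: bonds plus field term -0.5 * 16
```

On a $4 \times 4$ periodic lattice there are $2L^2 = 32$ bonds, so the fully aligned state sits at $-32J$ at zero field and the checkerboard state at $+32J$.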

Next, we will try to solve this system (compute the partition function $Z$) in a varying number of dimensions.

Exact Solution in Zero-Dimensions

In zero dimensions, there is only a single spin, so the interaction term in the Ising Hamiltonian vanishes and we are left with only the interaction between the spin and the external field $h$. Therefore, in zero dimensions our Hamiltonian becomes

H = -h \sigma

As previously mentioned, we must compute the partition function. Given there is only one spin with two possible microstates ($\{-1, +1\}$), the partition function can be trivially computed as

\begin{align*} Z &= \sum_{s = \pm 1} e^{-\beta E(s)} \\ &= e^{\beta h} + e^{-\beta h} \\ &= 2 \cosh(\beta h) \end{align*}

From this partition function, we can compute most of what we are interested in. First, the free energy is given by

\begin{align*} F &= -T \log(Z) \\ &= -T \log(2 \cosh(\beta h)) \end{align*}

The mean magnetization can be computed as the thermodynamic mean (expectation) of the spin, which results in

\begin{align*} \langle \sigma \rangle &= \sum_{s = \pm 1} p(s)\, s \\ &= \frac{1}{Z} \sum_{s = \pm 1} s\, e^{-\beta E(s)} \\ &= \frac{1}{Z} \big[ e^{\beta h} - e^{-\beta h} \big] \\ &= \frac{1}{Z} \big[ 2 \sinh(\beta h) \big] \\ &= \tanh(\beta h) \end{align*}

This is bounded strictly between $-1$ and $1$ and depends on the inverse temperature $\beta = \frac{1}{T}$ (with $k_B = 1$) and the external field $h$. Figure 6 below visualizes the mean magnetization at different temperatures and fields. For any non-zero external field, the mean magnetization has the same sign as the external field. At exactly zero external field, the mean magnetization is zero at every non-zero temperature (at absolute zero, $\tanh(\beta h)$ becomes a step function of $h$, which we won't worry about).
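The sketch below (plain Python, $k_B = 1$, with a hypothetical helper `zero_d`) computes $Z$, $F$, and $\langle \sigma \rangle$ for the single spin directly from the two Boltzmann weights and compares the result with $\tanh(\beta h)$.

```python
import math

def zero_d(h, T):
    """Single spin in a field h at temperature T (k_B = 1); returns (Z, F, <sigma>)."""
    beta = 1.0 / T
    # Boltzmann weight e^{-beta E(s)} with E(s) = -h s, for s = -1 and s = +1.
    weights = {s: math.exp(beta * h * s) for s in (-1, 1)}
    Z = sum(weights.values())
    F = -T * math.log(Z)
    mean_sigma = sum(s * w for s, w in weights.items()) / Z
    return Z, F, mean_sigma

for T in (0.5, 1.0, 2.0):
    Z, F, m = zero_d(h=0.5, T=T)
    print(f"T={T}: <sigma>={m:.4f}, tanh(beta*h)={math.tanh(0.5 / T):.4f}")
```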

**Figure 6:** Zero-dimensional Ising model results showing the paramagnetic behavior of an isolated spin. Left panel: Magnetization $M$ as a function of temperature $T$ for various external field strengths. At zero field ( $h=0$ ), the magnetization is precisely zero at all temperatures due to the equal probability of spin up and down states. As external field strength increases ( $h=0.1, 0.5, 1.0$ ), the system exhibits induced magnetization that monotonically decreases with temperature. Right panel: Magnetization as a function of external field $h$ at different temperatures. The curves demonstrate the characteristic S-shaped response typical of paramagnetism, with steeper slopes at lower temperatures indicating higher magnetic susceptibility. As temperature increases, thermal fluctuations increasingly compete with the aligning effect of the external field, resulting in a more gradual approach to saturation.

The magnetic susceptibility $\chi$ , which measures the system’s response to small changes in the external field, can be derived from the average magnetization as

\chi = \frac{\partial \langle \sigma \rangle}{\partial h} = \frac{\partial}{\partial h} \tanh(\beta h) = \beta \text{sech}^2(\beta h)

This means that while the zero-dimensional Ising model responds to external fields, it cannot exhibit a phase transition: the susceptibility is finite and analytic in $h$ at every temperature $T > 0$, showing no divergence that would signal a phase transition.
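At $h = 0$ the susceptibility reduces to $\chi = \beta = 1/T$ (Curie's law for an isolated spin), which only blows up as $T \to 0$. A short sketch, again assuming $k_B = 1$ and using illustrative helper names, that compares the closed form $\beta\,\text{sech}^2(\beta h)$ with a finite-difference derivative of $\tanh(\beta h)$:

```python
import math

def chi_closed_form(h, T):
    """chi = beta * sech^2(beta h) for a single spin (k_B = 1)."""
    beta = 1.0 / T
    return beta / math.cosh(beta * h) ** 2

def chi_finite_difference(h, T, dh=1e-6):
    """Central finite difference of <sigma> = tanh(beta h) with respect to h."""
    return (math.tanh((h + dh) / T) - math.tanh((h - dh) / T)) / (2 * dh)

for T in (0.1, 1.0, 10.0):
    # At h = 0 both columns should equal 1/T (Curie's law).
    print(T, chi_closed_form(0.0, T), chi_finite_difference(0.0, T))
```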

Exact Solution in One-Dimension

In one dimension, the Ising Hamiltonian is given by

H = - J \sum_{i} \sigma_i \sigma_{i+1} - h \sum_i \sigma_i

We impose the periodic boundary condition $\sigma_{N+1} = \sigma_{1}$, which turns the chain into a ring (a sort of circularly linked list). In the thermodynamic limit where $N \to \infty$, this shouldn't matter too much, but it turns out that this assumption affords us a couple of nice simplifications that make the math easier later on.

Before computing the partition function, let us first rewrite the Hamiltonian as a sum over the bonds between neighboring sites

\begin{align*} H &= -J \sum_{i} \sigma_i \sigma_{i+1} - h \sum_i \sigma_i \\ &= \sum_{i} ( -J \sigma_i \sigma_{i+1} - \frac{h}{2} (\sigma_i + \sigma_{i+1}) ) \\ &= \sum_{i} E(\sigma_i, \sigma_{i+1}) \end{align*}

Where $E(\sigma_i, \sigma_{i+1}) = -J \sigma_i \sigma_{i+1} - \frac{h}{2} (\sigma_i + \sigma_{i+1})$ is the energy of the bond between lattice sites $i$ and $i+1$. Notice the factor of $\frac{1}{2}$ that snuck in: each spin belongs to two bonds, so its field energy is split evenly between them to avoid double counting (with periodic boundaries, $\sum_i \sigma_{i+1} = \sum_i \sigma_i$).

Again, the partition function takes on the following form

Z = \sum_{\{s\}} e^{-\beta E(\{s\})}

Where $\{s\}$ ranges over the $2^N$ possible states of the system. Notice that because we have interacting terms, this partition function is already a lot harder to evaluate than in the zero-dimensional case. In my studies of this problem, I came across two solutions: a combinatorial one, and one that involves a transfer matrix. I much prefer the latter, so we will follow along with that solution. To do so, we will introduce a couple of notational tricks that allow us to rewrite the partition function as a product of matrices (the transfer matrix), which leads to a big simplification. First, let's expand and simplify the partition function a bit.

\begin{align*} Z &= \sum_{\sigma_1} \sum_{\sigma_2} \cdots \sum_{\sigma_N} e^{-\beta \sum_{i} E(\sigma_i, \sigma_{i+1})} \\ &= \sum_{\sigma_1} \sum_{\sigma_2} \cdots \sum_{\sigma_N} e^{-\beta E(\sigma_1, \sigma_{2}) - \beta E(\sigma_2, \sigma_{3}) - \ldots - \beta E(\sigma_N, \sigma_{1})} \\ &= \sum_{\sigma_1} \sum_{\sigma_2} \cdots \sum_{\sigma_N} e^{-\beta E(\sigma_1, \sigma_{2})} e^{-\beta E(\sigma_2, \sigma_{3})} \cdots e^{-\beta E(\sigma_N, \sigma_{1})} \\ &= \sum_{\sigma_1} \sum_{\sigma_2} \cdots \sum_{\sigma_N} T(\sigma_1, \sigma_2) T(\sigma_2, \sigma_3) \cdots T(\sigma_N, \sigma_1) \end{align*}

Where $T(\sigma_i, \sigma_j) = e^{-\beta E(\sigma_i, \sigma_j)}$. Next, notice that $T(\sigma_1, \sigma_{2})$ can only take on four possible values: $T(\uparrow, \uparrow)$, $T(\uparrow, \downarrow)$, $T(\downarrow, \uparrow)$, and $T(\downarrow, \downarrow)$. Thus, we can represent the function $T(\sigma_i, \sigma_j)$ with the following two-by-two (transfer) matrix

\mathbf{T} = \begin{bmatrix} T(\uparrow, \uparrow) & T(\uparrow, \downarrow) \\ T(\downarrow, \uparrow) & T(\downarrow, \downarrow) \end{bmatrix} = \begin{bmatrix} e^{\beta(J+h)} & e^{-\beta J} \\ e^{-\beta J} & e^{\beta(J-h)} \end{bmatrix}

Where the value of $T(\sigma_i, \sigma_j)$ is given by the corresponding element of $\mathbf{T}$, namely $\mathbf{T}_{\sigma_i, \sigma_j}$. Now, the key property that makes this technique so successful is the following

\sum_{\sigma_j} \mathbf{T}_{\sigma_i, \sigma_j} \mathbf{T}_{\sigma_j, \sigma_k} = (\mathbf{T}\mathbf{T})_{\sigma_i, \sigma_k} = (\mathbf{T}^2)_{\sigma_i, \sigma_k}

Which is just the definition of matrix multiplication. From this, we can simplify the partition function as

\begin{align*} Z &= \sum_{\sigma_1} \sum_{\sigma_2} \cdots \sum_{\sigma_N} T(\sigma_1, \sigma_2) T(\sigma_2, \sigma_3) \cdots T(\sigma_N, \sigma_1) \\ &= \sum_{\sigma_1} \sum_{\sigma_3} \cdots \sum_{\sigma_N} (\mathbf{T}^2)_{\sigma_1, \sigma_3} \cdots T(\sigma_N, \sigma_1) \\ &= \sum_{\sigma_1} \sum_{\sigma_4} \cdots \sum_{\sigma_N} (\mathbf{T}^3)_{\sigma_1, \sigma_4} \cdots T(\sigma_N, \sigma_1) \\ &= \sum_{\sigma_1} (\mathbf{T}^N)_{\sigma_1, \sigma_1} \end{align*}

Because we are now only summing over $\sigma_1$, the only terms left are $(\mathbf{T}^N)_{\uparrow, \uparrow}$ and $(\mathbf{T}^N)_{\downarrow, \downarrow}$, the diagonal entries of $\mathbf{T}^N$, whose sum is the trace.

Z = \text{Tr}[\mathbf{T}^N]

The periodic boundary condition is really what enables the partition function $Z$ to be written as the trace of the $N$-th power of $\mathbf{T}$. This is a huge simplification, and it has a closed-form solution after an eigenvalue analysis.

As a reminder, the trace of a matrix is equal to the sum of its eigenvalues. Furthermore, because $\mathbf{T}$ is real and symmetric, it has real eigenvalues $\lambda_{\pm}$, and the eigenvalues of $\mathbf{T}^N$ are simply $\lambda_{\pm}^N$, so that

\text{Tr}[\mathbf{T}^N] = \lambda_{+}^N + \lambda_{-}^N

Here, we label the eigenvalues $\lambda_{+}$ and $\lambda_{-}$ and then solve for them analytically. Because every entry of $\mathbf{T}$ is strictly positive, the Perron-Frobenius theorem guarantees that the largest eigenvalue $\lambda_{+}$ is positive, non-degenerate, and strictly larger in magnitude than $\lambda_{-}$. Because of the power of $N$, only this larger eigenvalue matters in the thermodynamic limit ($N \to \infty$), since

Z = \lambda_{+}^N + \lambda_{-}^N = \lambda_{+}^N (1 + \frac{\lambda_{-}^N}{\lambda_{+}^N}) \approx \lambda_{+}^N

Setting up the characteristic equation and solving for $\lambda_{+}$ and $\lambda_{-}$ gives

\begin{align*} \det (\mathbf{T} - \lambda I) &= (e^{\beta (J + h)} - \lambda) (e^{\beta (J - h)} - \lambda) - e^{-2\beta J} \\ &= -\lambda e^{\beta J} 2 \cosh(\beta h) + 2 \sinh(2\beta J) + \lambda^2 \end{align*}

Whose solutions (after setting to zero and solving for lambda) are

\begin{align*} \lambda_{\pm} &= \frac{1}{2} \big( 2 e^{\beta J} \cosh(\beta h) \pm \sqrt{4 e^{2\beta J} \cosh^2 (\beta h) - 8 \sinh(2\beta J)} \big) \\[0.5em] &= e^{\beta J} \cosh(\beta h) \pm \sqrt{e^{2\beta J} \cosh^2 (\beta h) - 2 \sinh(2\beta J)} \\[0.5em] &= e^{\beta J} \cosh(\beta h) \pm \sqrt{e^{2\beta J} \cosh^2 (\beta h) - (e^{2\beta J} - e^{-2\beta J})} \\[0.5em] &= e^{\beta J} \cosh(\beta h) \pm \sqrt{e^{2\beta J}(\sinh^2(\beta h) + 1) - e^{2\beta J} + e^{-2\beta J}} \\[0.5em] &= e^{\beta J} \cosh(\beta h) \pm \sqrt{e^{2\beta J}\sinh^2(\beta h) + e^{-2\beta J}} \\[0.5em] &= e^{\beta J} \big[ \cosh(\beta h) \pm \sqrt{\sinh^2(\beta h) + e^{-4\beta J}} \big] \end{align*}
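The following sketch checks the transfer-matrix steps numerically for a small periodic chain: it computes $Z$ by brute-force enumeration, as $\text{Tr}[\mathbf{T}^N]$ via repeated $2 \times 2$ matrix products, and as $\lambda_{+}^N + \lambda_{-}^N$ from the closed-form eigenvalues. It uses plain Python lists rather than a linear-algebra library to stay self-contained, and all function names and parameter values are illustrative assumptions ($k_B = 1$).

```python
import itertools
import math

def transfer_matrix(beta, J, h):
    """T[a][b] = exp(-beta * E(s_a, s_b)), with index 0 = spin up (+1) and 1 = spin down (-1)."""
    spins = (1, -1)
    # E(s, s') = -J s s' - (h / 2)(s + s'), so -beta * E = beta * (J s s' + (h / 2)(s + s')).
    return [[math.exp(beta * (J * a * b + 0.5 * h * (a + b))) for b in spins] for a in spins]

def matmul(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def Z_transfer(N, beta, J, h):
    """Z = Tr[T^N] for a periodic chain of N spins."""
    T = transfer_matrix(beta, J, h)
    P = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(N):
        P = matmul(P, T)
    return P[0][0] + P[1][1]

def Z_brute(N, beta, J, h):
    """Direct sum of Boltzmann weights over all 2**N configurations of the periodic chain."""
    Z = 0.0
    for s in itertools.product((-1, 1), repeat=N):
        E = sum(-J * s[i] * s[(i + 1) % N] - h * s[i] for i in range(N))
        Z += math.exp(-beta * E)
    return Z

def eigenvalues(beta, J, h):
    """Closed-form (lambda_plus, lambda_minus) of the transfer matrix."""
    root = math.sqrt(math.sinh(beta * h) ** 2 + math.exp(-4 * beta * J))
    return (math.exp(beta * J) * (math.cosh(beta * h) + root),
            math.exp(beta * J) * (math.cosh(beta * h) - root))

N, beta, J, h = 10, 0.7, 1.0, 0.2
lam_plus, lam_minus = eigenvalues(beta, J, h)
print(Z_brute(N, beta, J, h), Z_transfer(N, beta, J, h), lam_plus**N + lam_minus**N)
print((lam_minus / lam_plus) ** N)   # the correction term that vanishes as N grows
```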

Now, for the free energy we want an intensive quantity (one that does not grow with $N$), so we instead define the free energy per spin as $f = N^{-1} F$. This gives us

\begin{align*} f &= -T N^{-1} \log(Z) \\ &= -T N^{-1} \log(\lambda_{+}^N + \lambda_{-}^N) \end{align*}

Finally, taking the thermodynamic limit we get the final free-energy per spin as

\begin{align*} f &= -T N^{-1} \log(\lambda_{+}^N) \\ &= -T \log(\lambda_{+}) \end{align*}

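Substituting $\lambda_{+}$ (the root with the plus sign) gives

f = -T \log\big(e^{\beta J} \big[ \cosh(\beta h) + \sqrt{\sinh^2(\beta h) + e^{-4\beta J}} \big]\big) = -J - T \log\big(\cosh(\beta h) + \sqrt{\sinh^2(\beta h) + e^{-4\beta J}}\big)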

Differentiating with respect to the field, the magnetization per spin $M = \langle \sigma \rangle = -\partial f / \partial h$ simplifies to

M = \frac{\sinh(\beta h)}{\sqrt{\sinh^2(\beta h) + e^{-4 \beta J}}}
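If you would rather not do the algebra, a central finite difference of $f = -T\log(\lambda_{+})$ with respect to $h$ gives the same magnetization. This is a minimal numerical sketch that assumes $k_B = 1$. The step size `dh` is an arbitrary choice.

```python
import math


def free_energy_per_spin(J, h, T):
    """f = -T log(lambda_+) for the 1D chain in the thermodynamic limit (k_B = 1)."""
    beta = 1.0 / T
    root = math.sqrt(math.sinh(beta * h) ** 2 + math.exp(-4 * beta * J))
    lam_plus = math.exp(beta * J) * (math.cosh(beta * h) + root)
    return -T * math.log(lam_plus)


def magnetization(J, h, T):
    """Closed form M = sinh(beta h) / sqrt(sinh^2(beta h) + exp(-4 beta J))."""
    beta = 1.0 / T
    s = math.sinh(beta * h)
    return s / math.sqrt(s * s + math.exp(-4 * beta * J))


def magnetization_from_f(J, h, T, dh=1e-5):
    """Central finite difference estimate of M = -df/dh."""
    return -(free_energy_per_spin(J, h + dh, T) - free_energy_per_spin(J, h - dh, T)) / (2 * dh)


for h in (0.0, 0.1, 0.5, 1.0):
    print(h, magnetization(1.0, h, 1.5), magnetization_from_f(1.0, h, 1.5))
```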

At zero external field $(h=0)$, the numerator $\sinh(\beta h)$ vanishes, so $M=0$ for all $T>0$. As in the zero-dimensional case, this absence of spontaneous magnetization indicates that the one-dimensional Ising model cannot sustain long-range order at any non-zero temperature. The magnetic susceptibility $\chi = \partial M / \partial h = -\partial^2 f / \partial h^2$ can be computed by taking one more derivative

\chi = \frac{\partial M}{\partial h} = \frac{\beta \cosh(\beta h)\, e^{-4\beta J}}{\big(\sinh^2(\beta h) + e^{-4\beta J}\big)^{3/2}}

which at zero field reduces to $\chi = \beta e^{2\beta J}$. This susceptibility grows rapidly as $T \to 0$ but remains finite for all non-zero temperatures, confirming the absence of a phase transition in the one-dimensional Ising model.
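Differentiating the magnetization above with respect to $h$ gives $\chi = \beta\cosh(\beta h)\,e^{-4\beta J} / (\sinh^2(\beta h) + e^{-4\beta J})^{3/2}$. At $h = 0$ this reduces to $\beta e^{2\beta J}$. That result is my own working rather than a formula from the references, so treat it as something to verify. The sketch below checks it against a finite difference of $M$ and tabulates the zero-field value for a few temperatures, assuming $k_B = 1$.

```python
import math


def magnetization(J, h, T):
    """Exact 1D magnetization per spin (k_B = 1)."""
    beta = 1.0 / T
    s = math.sinh(beta * h)
    return s / math.sqrt(s * s + math.exp(-4 * beta * J))


def susceptibility(J, h, T):
    """chi = dM/dh = beta cosh(beta h) a / (sinh^2(beta h) + a)^(3/2), with a = exp(-4 beta J)."""
    beta = 1.0 / T
    a = math.exp(-4 * beta * J)
    s = math.sinh(beta * h)
    return beta * math.cosh(beta * h) * a / (s * s + a) ** 1.5


J = 1.0
for T in (0.25, 0.5, 1.0, 2.0, 4.0):
    # At h = 0 this equals beta * exp(2 beta J): huge as T -> 0, but finite for any T > 0.
    print(T, susceptibility(J, 0.0, T), math.exp(2 * J / T) / T)
```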

**Figure 7:** Magnetization behavior of the one-dimensional Ising model as a function of temperature for various external field strengths. At zero external field ( $h=0$ ), the magnetization remains exactly zero at all non-zero temperatures, demonstrating the absence of spontaneous magnetization in the 1D Ising model. This confirms the mathematical result that no phase transition occurs in the one-dimensional case, as the system cannot sustain long-range order at any finite temperature. As external fields of increasing strength ( $h=0.1, 0.5, 1.0$ ) are applied, the system exhibits induced magnetization that asymptotically approaches zero at high temperatures as thermal fluctuations overcome the influence of both the external field and the nearest-neighbor interactions. The results are similar to the zero-dimensional case.

This lack of a phase transition in the one-dimensional Ising model is a result of thermal fluctuations at any non-zero temperature being sufficient to destroy long-range order. Creating a domain wall in one dimension costs a finite amount of energy ($2J$). However, the entropy gained from placing this wall anywhere along an $N$-site chain grows logarithmically with system size. At any non-zero temperature, the entropic contribution to the free energy therefore dominates once the chain is long enough, favoring a disordered state. This competition between energy and entropy explains why higher dimensions are necessary for the existence of phase transitions at finite temperatures.
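The energy-versus-entropy argument can be made quantitative with a back-of-the-envelope estimate. If we only count the $N$ possible positions of a single wall, inserting it changes the free energy by $\Delta F \approx 2J - T\ln N$ (with $k_B = 1$). This turns negative once $N > e^{2J/T}$. The sketch below is a rough illustration under exactly those assumptions, not a careful calculation.

```python
import math


def domain_wall_free_energy(N, J, T):
    """Rough estimate Delta F = 2J - T ln N for one wall on an N-site chain (k_B = 1)."""
    return 2 * J - T * math.log(N)


def crossover_length(J, T):
    """Chain length N* = exp(2J / T) at which the rough estimate of Delta F changes sign."""
    return math.exp(2 * J / T)


J = 1.0
for T in (2.0, 1.0, 0.5, 0.25):
    N_star = crossover_length(J, T)
    print(f"T = {T}: a wall lowers F once N exceeds about {N_star:.3g} sites")
```

$N^*$ grows extremely quickly as $T \to 0$, but it stays finite for every $T > 0$. This is why a long enough chain always ends up disordered.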

Mean Field Theory for $d$ -Dimensions

For the two-dimensional Ising model in the absence of an external field, there is a solution by Onsager [5], but it’s a bit too complicated to give the full derivation here. Instead, we will discuss an approximation of the Hamiltonian that will allow us to solve the Ising model in arbitrary dimensions, namely the mean field approximation. Unfortunately, this approximation gives quite poor results in low dimensions, and it breaks down near the critical point, which we’ll talk about later.

The goal of this approximation is to decouple interacting terms between neighboring spins by replacing them with a single interaction with the mean field. In doing so, we convert the many-body problem (with many interacting spins) into a one-body problem in which each spin only interacts with the mean field. This introduces a circular dependence called self-consistency. The mean field is determined by the microscopic states, but the microscopic states are now in turn determined by the mean field rather than by the individual neighboring spins. We will see this dependence appear later on.

Given the Hamiltonian of the Ising model in $d$ dimensions ($d \ge 0$)

H = -J \sum_{\langle i, j \rangle} \sigma_i \sigma_j - h \sum_i \sigma_i

and the magnetization per spin $M = \langle \sigma_i \rangle$ (the same for every site by translational invariance), we can express each pair of interacting spins as

\begin{align*} \sigma_i \sigma_j &= (\sigma_i - M + M) (\sigma_j - M + M) \\ &= M^2 + M(\sigma_i - M) + M(\sigma_j - M) + (\sigma_i - M) (\sigma_j - M) \\ &\approx M^2 + M \sigma_i - M^2 + M \sigma_j - M^2 \\ &\approx -M^2 + M (\sigma_i + \sigma_j) \end{align*}

In the mean field approximation, we neglect the correlation term $(\sigma_i - M) (\sigma_j - M)$ because it represents fluctuations from the mean at different sites, which we assume to be uncorrelated and of higher order. This gives us the following new mean-field Hamiltonian $H_{\text{mf}}$

H_{\text{mf}} = -J \sum_{\langle i, j \rangle} \big[ -M^2 + M (\sigma_i + \sigma_j) \big] - h \sum_i \sigma_i

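On a $d$-dimensional hypercubic lattice, each dipole has $\mathit{z}=2d$ nearest neighbors. There are therefore $N\mathit{z}/2$ distinct bonds $\langle i, j \rangle$, since each bond is shared by two sites. Given this, the first term in the Hamiltonian becomes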

-J \sum_{\langle i, j \rangle} -M^2 = \frac{1}{2} J N \mathit{z} M^2

Similarly, since each site appears in exactly $\mathit{z}$ bonds, the second term becomes

-J \sum_{\langle i, j \rangle} M (\sigma_i + \sigma_j) = -J \mathit{z} M \sum_i \sigma_i
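Both bond-counting steps can be checked on a small periodic lattice. This sketch assumes a periodic hypercubic lattice with side $L \ge 3$, so that forward steps never double count a bond. It builds the bond list and confirms there are $N\mathit{z}/2$ bonds, with every site in exactly $\mathit{z}$ of them. It then verifies $\sum_{\langle i,j \rangle} (\sigma_i + \sigma_j) = \mathit{z}\sum_i \sigma_i$ for a random configuration.

```python
import itertools
import random
from collections import Counter


def hypercubic_bonds(L, d):
    """Nearest-neighbour bonds <i,j> of a periodic L^d hypercubic lattice (assumes L >= 3)."""
    bonds = set()
    for site in itertools.product(range(L), repeat=d):
        for axis in range(d):
            neighbour = list(site)
            neighbour[axis] = (neighbour[axis] + 1) % L  # one step forward, wrapping around
            bonds.add(tuple(sorted((site, tuple(neighbour)))))  # store each unordered pair once
    return bonds


L, d = 4, 3
N, z = L ** d, 2 * d
bonds = hypercubic_bonds(L, d)

# Every site belongs to exactly z bonds, so there are N z / 2 bonds in total.
per_site = Counter(site for bond in bonds for site in bond)
print(len(bonds), N * z // 2, set(per_site.values()))

# Hence sum_<ij> (s_i + s_j) = z * sum_i s_i for any spin configuration.
random.seed(0)
spins = {site: random.choice((-1, 1)) for site in itertools.product(range(L), repeat=d)}
lhs = sum(spins[a] + spins[b] for a, b in bonds)
rhs = z * sum(spins.values())
print(lhs, rhs)
```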

Then, our mean-field Hamiltonian becomes

\begin{align*} H_{\text{mf}} &= \frac{1}{2} J N \mathit{z} M^2 - (J \mathit{z} M + h) \sum_i \sigma_i \\ &= \frac{1}{2} J N \mathit{z} M^2 - H_{\text{eff}} \sum_i \sigma_i \end{align*}

Here, the effective field $H_{\text{eff}} = J \mathit{z} M + h$ is the total magnetic field felt by each spin. In other words, we have essentially removed the pairwise spin-spin interactions and replaced them with the effective field $H_{\text{eff}}$, which consists of the external field $h$ and the mean field $J \mathit{z} M$. Because the spins are now decoupled, the partition function becomes trivial to compute.

\begin{align*} Z_{\text{mf}} &= \sum_{\{\sigma\}} \exp\big[-\beta H_{\text{mf}}(\{\sigma\}) \big] \\ &= \sum_{\{\sigma\}} \exp\big[-\beta (\frac{1}{2} J N \mathit{z} M^2 - H_{\text{eff}} \sum_i \sigma_i) \big] \\ &= \exp\big[-\beta \frac{1}{2} J N \mathit{z} M^2\big] \big( \sum_{\{\sigma\}} \exp\big[\beta H_{\text{eff}} \sum_i \sigma_i\big] \big) \\ &= \exp\big[-\beta \frac{1}{2} J N \mathit{z} M^2\big] \big( \sum_{\{\sigma\}} \prod_{i=1}^N \exp\big[\beta H_{\text{eff}}\sigma_i\big] \big) \\ &= \exp\big[-\beta \frac{1}{2} J N \mathit{z} M^2\big] \big( \prod_{i=1}^N \sum_{\sigma_i = \pm 1} \exp\big[\beta H_{\text{eff}}\sigma_i\big] \big) \\ &= \exp\big[-\beta \frac{1}{2} J N \mathit{z} M^2\big] (2 \cosh(\beta H_{\text{eff}}))^N \\ &= 2^N \exp\big[-\beta \frac{1}{2} J N \mathit{z} M^2\big] \cosh^N(\beta H_{\text{eff}}) \end{align*}

Notice that the swap of the product and sum above is possible because the sum over all configurations factorizes into $N$ independent single-spin sums. For any function $g$, $\sum_{\{\sigma\}} \prod_{i=1}^N g(\sigma_i) = \prod_{i=1}^N \sum_{\sigma_i = \pm 1} g(\sigma_i)$. Also note that the product came from the fact that $e^{\sum_i x_i} = \prod_i e^{x_i}$, which is pretty easy to see if you expand out the sum.
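Here is a brute-force check of the factorization. With $M$ held fixed, summing $e^{-\beta H_{\text{mf}}}$ over all $2^N$ configurations should reproduce the closed form $2^N e^{-\beta J N \mathit{z} M^2/2}\cosh^N(\beta H_{\text{eff}})$. The parameter values below are arbitrary, and $k_B = 1$ is assumed.

```python
import itertools
import math


def Z_mf_brute_force(N, J, z, h, M, T):
    """Sum exp(-beta H_mf) over all 2^N configurations, with M held fixed (k_B = 1)."""
    beta = 1.0 / T
    H_eff = J * z * M + h
    const = 0.5 * J * N * z * M ** 2
    return sum(math.exp(-beta * (const - H_eff * sum(s)))
               for s in itertools.product((-1, 1), repeat=N))


def Z_mf_closed_form(N, J, z, h, M, T):
    """2^N exp(-beta J N z M^2 / 2) cosh^N(beta H_eff)."""
    beta = 1.0 / T
    H_eff = J * z * M + h
    return 2 ** N * math.exp(-beta * 0.5 * J * N * z * M ** 2) * math.cosh(beta * H_eff) ** N


args = dict(N=10, J=1.0, z=4, h=0.2, M=0.3, T=3.0)
print(Z_mf_brute_force(**args), Z_mf_closed_form(**args))
```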

Then, the free energy $f_{\text{mf}}$ per spin is

\begin{align*} f_{\text{mf}} &= -T N^{-1} \log(Z) \\ &= \frac{1}{2} J \mathit{z} M^2 - T \log(2 \cosh(\beta H_{\text{eff}})) \end{align*}

Remember that $\beta = 1/T$ because we set $k_B = 1$. The factors of $T$ and $\beta$ therefore cancel in the first term of the free energy. The behavior of the free energy for varying temperatures can be seen in Figure 8.

**Figure 8:** The characteristic shapes of the mean-field free energy at different temperatures. Above $T_c$ , the free energy has a single minimum at $M = 0$ , reflecting the stability of the paramagnetic phase. At $T_c$ , this minimum becomes very flat, indicating the critical point where susceptibility diverges. Below $T_c$ , the free energy develops a characteristic double-well structure with minima at finite $\pm M$ , corresponding to the two possible directions of spontaneous magnetization. The central maximum at $M = 0$ represents the unstable state.
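To reproduce the qualitative picture in Figure 8 without plotting, this sketch evaluates $f_{\text{mf}}(M)$ on a grid at $h = 0$ and reports where its minimum sits. It assumes a square lattice ($\mathit{z} = 4$), so the mean-field $T_c = J\mathit{z} = 4$ with $k_B = J = 1$. The grid minimizer is a deliberately crude stand-in for a proper optimizer.

```python
import math


def f_mf(M, J, z, h, T):
    """Mean-field free energy per spin, J z M^2 / 2 - T log(2 cosh(beta H_eff)), with k_B = 1."""
    beta = 1.0 / T
    H_eff = J * z * M + h
    return 0.5 * J * z * M ** 2 - T * math.log(2 * math.cosh(beta * H_eff))


def grid_minimiser(J, z, h, T, n=2001):
    """Brute-force argmin of f_mf over an evenly spaced grid of M in [-1, 1]."""
    grid = [-1 + 2 * k / (n - 1) for k in range(n)]
    return min(grid, key=lambda M: f_mf(M, J, z, h, T))


J, z, h = 1.0, 4, 0.0  # square lattice (d = 2), so the mean-field T_c = J z = 4
for T in (5.0, 4.0, 2.0):
    print(T, abs(grid_minimiser(J, z, h, T)))
```

Above $T_c$ the minimizer is $M = 0$. Below $T_c$ the minimum moves out to $|M| \approx 0.96$ at $T = 2$. Which sign you get is just a tie-break on the symmetric grid.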

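Now we can find the magnetization by requiring that the average spin of the decoupled system equals $M$ itself. That average is $\langle \sigma_i \rangle = -\partial f_{\text{mf}} / \partial H_{\text{eff}} = \tanh(\beta H_{\text{eff}})$, holding the $M$ in the first term fixed. Equivalently, minimizing $f_{\text{mf}}$ with respect to $M$ gives $\partial f_{\text{mf}}/\partial M = J\mathit{z}\,[M - \tanh(\beta H_{\text{eff}})] = 0$. Either way, we obtain the self-consistency equation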

M = \tanh(\beta (J \mathit{z} M + h))

To repeat what was said before, this is called the self-consistency equation because the mean-field relies on the microscopic states, but the microscopic states now also rely on the mean-field. That is, $M$ is a function of itself!

**Figure 9:** Graphical solutions to the self-consistency equation $M = \tanh(\beta J\mathit{z}M)$ for the $d$-dimensional Ising model in zero external field at different temperatures. The dashed line represents $y = M$, while the colored curves show $y = \tanh(\beta J\mathit{z}M)$ at three distinct temperatures relative to the critical temperature $T_c$. Intersection points (black circles) indicate solutions to the self-consistency equation. Left panel (blue curve): For $T > T_c$, only a single solution exists at $M = 0$, corresponding to the paramagnetic phase with no spontaneous magnetization. Middle panel (green curve): At $T = T_c$, the slopes of the two curves match at $M = 0$, marking the critical point where the system transitions between ordered and disordered phases. Right panel (red curve): For $T < T_c$, three solutions emerge. There is an unstable solution at $M = 0$ and two stable solutions at $M = \pm M_0$ with $0 < M_0 < 1$, which approach $\pm 1$ only as $T \to 0$. These reflect the ferromagnetic phase with spontaneous symmetry breaking. This graphical representation illustrates the emergence of non-zero spontaneous magnetization below $T_c$ as predicted by mean field theory.
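The graphical construction in Figure 9 corresponds to a simple fixed-point iteration, $M_{n+1} = \tanh(\beta(J\mathit{z}M_n + h))$. Here's a minimal sketch that assumes $k_B = 1$ and $\mathit{z} = 4$. Plain iteration is just one convenient way to find the root, not the only (or fastest) choice.

```python
import math


def solve_self_consistency(J, z, h, T, M0=0.5, tol=1e-12, max_iter=100_000):
    """Iterate M <- tanh(beta (J z M + h)) until successive values differ by less than tol."""
    beta = 1.0 / T  # k_B = 1
    M = M0
    for _ in range(max_iter):
        M_new = math.tanh(beta * (J * z * M + h))
        if abs(M_new - M) < tol:
            return M_new
        M = M_new
    raise RuntimeError("no convergence; near T_c the iteration slows down sharply")


J, z = 1.0, 4  # mean-field T_c = J z = 4
for T in (5.0, 3.0, 2.0):
    print(T, solve_self_consistency(J, z, 0.0, T),
          solve_self_consistency(J, z, 0.0, T, M0=-0.5))
```

Below $T_c$, starting from $M_0 = \pm 0.5$ picks out one of the two symmetric stable solutions. Above $T_c$, both starting points collapse onto $M = 0$. Very close to $T_c$, the slope of the map at the fixed point approaches one and convergence becomes very slow, which is why the function gives up after `max_iter` steps.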

To understand the phase transition in this mean field approximation, let us first determine the critical temperature by analyzing the self-consistency equation. At zero external field $(h=0)$ , we can examine when a non-zero magnetization first appears as we lower the temperature. For small $M$ , we can expand the hyperbolic tangent using its Taylor series

\tanh(x) = x - \frac{1}{3}x^3 + O(x^5)

Truncating at the cubic term and substituting in $x = \beta JzM$ (with $h = 0$) gives

M = \beta JzM - \frac{1}{3}(\beta Jz)^3 M^3 + O(M^5)

M[1 - \beta Jz + \frac{1}{3}(\beta Jz)^3 M^2] = 0

This equation has two types of solutions: $M = 0$ (paramagnetic phase) and solutions where the term in square brackets vanishes (ferromagnetic phase). The critical temperature occurs when the coefficient of the linear term vanishes, and therefore

1 - \beta_c Jz = 0

T_c = \frac{Jz}{k_B}

Above $T_c$ , only $M = 0$ is stable. Below $T_c$ , non-zero solutions emerge, leading to the onset of spontaneous magnetization. More on the validity of these results in the next section.

Critical Exponents of the MFT Solution

Next, let’s derive some of the critical exponents to analyze how various quantities scale near $T_c$. First, consider the magnetization below $T_c$. As we did before, let’s define the reduced temperature $t = (T - T_c)/T_c$. Since $k_BT_c = Jz$, we then have

\beta Jz = \frac{Jz}{k_BT} = \frac{T_c}{T}

Substituting back into our expansion and rearranging gives

M[1 - \beta Jz + \frac{1}{3}(\beta Jz)^3 M^2] = 0

M[1 - \frac{T_c}{T} + \frac{1}{3}(\frac{T_c}{T})^3 M^2] = 0

M[\frac{T - T_c}{T} + \frac{1}{3}(\frac{T_c}{T})^3 M^2] = 0

This has a trivial solution at $M=0$ and another (more interesting) solution when the term in brackets is zero, giving

\frac{T - T_c}{T} + \frac{1}{3}(\frac{T_c}{T})^3 M^2 = 0

\frac{1}{3}(\frac{T_c}{T})^3 M^2 = \frac{T_c - T}{T}

M^2 = 3 (\frac{T_c - T}{T} ) (\frac{T}{T_c})^3

M^2 = -3t (1+t)^2

M = \pm (1+t)\sqrt{-3t}

Since $t < 0$ below $T_c$, the square root is real. For $|t| \ll 1$ we have $M \approx \pm \sqrt{3|t|}$, or $M \sim |t|^{1/2}$, with critical exponent $\beta = 1/2$ (not to be confused with the inverse temperature $\beta$). For the magnetic susceptibility, we can do something similar by differentiating the self-consistency equation with respect to $h$, resulting in

\frac{\partial M}{\partial h} = \beta(1 - \tanh^2[\beta(JzM + h)])(Jz\frac{\partial M}{\partial h} + 1)

\chi = \beta(1 - \tanh^2[\beta(JzM + h)])(Jz \chi + 1)

At zero external field ($h=0$) and $T \rightarrow T_c^+$, where $M=0$, this becomes

\chi = \beta(1 - 0)(Jz \chi + 1)

\chi = \frac{\beta}{1-\beta Jz} \sim \frac{\beta_c}{t}

This gives $\chi \propto |t|^{-\gamma}$ with $\gamma = 1$. I skipped a couple of steps, but the whole derivation, and the derivation of some other critical exponents, can be found in Franz Utermohlen’s notes [6] on the topic, or in either the Berlinsky [4] or Kopietz [7] books.

**Figure 10:** Critical behavior in mean-field theory (MFT) as a function of reduced temperature $t = (T-T_c)/T_c$ . Left panel: Magnetization with critical exponent $\beta = 0.5$ . Right panel: Magnetic susceptibility exhibiting the power law divergence with critical exponent $\gamma = 1.0$ .
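The exponents can also be read off numerically as log-log slopes. This sketch assumes $J = 1$, $\mathit{z} = 4$, and $k_B = 1$. Just below $T_c$, it solves the zero-field self-consistency equation by bisection to estimate $\beta$. Just above $T_c$, it evaluates the closed-form $\chi = \beta/(1 - \beta J\mathit{z})$ from above to recover $\gamma$. The susceptibility part is therefore only a check of that formula, not an independent measurement.

```python
import math


def spontaneous_M(t, J=1.0, z=4, iters=200):
    """Positive root of M = tanh(beta J z M) at reduced temperature t < 0, by bisection (k_B = 1)."""
    beta = 1.0 / (J * z * (1 + t))  # T = T_c (1 + t) with mean-field T_c = J z

    def g(M):
        return M - math.tanh(beta * J * z * M)

    lo, hi = 1e-12, 1.0  # g(lo) < 0 and g(hi) > 0 whenever t < 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def zero_field_chi(t, J=1.0, z=4):
    """Closed form chi = beta / (1 - beta J z) for t > 0 and h = 0."""
    beta = 1.0 / (J * z * (1 + t))
    return beta / (1 - beta * J * z)


def log_slope(f, t1, t2):
    """Slope of log|f(t)| against log|t| between two reduced temperatures."""
    return (math.log(abs(f(t2))) - math.log(abs(f(t1)))) / (math.log(abs(t2)) - math.log(abs(t1)))


beta_exp = log_slope(spontaneous_M, -1e-3, -1e-4)
gamma_exp = -log_slope(zero_field_chi, 1e-3, 1e-4)
print(beta_exp, gamma_exp)
print(spontaneous_M(-1e-3), math.sqrt(3e-3))  # compare with M ~ sqrt(3|t|)
```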

Remember that MFT gives us an approximate solution to the true Hamiltonian, so how good are these results? It turns out that they are quite good in high dimensions but fail in lower ones. In high dimensions the critical exponents are even exact, although $T_c$ itself is not. More precisely, above the upper critical dimension ($d_c = 4$ for the Ising model with short-range interactions), fluctuations about the mean become irrelevant and these exponents are exact. Below $d_c$, fluctuations modify the critical behavior in a way that mean field theory fails to capture.

For example, at zero external field ($h=0$), the self-consistency equation predicts a second-order phase transition at a critical temperature $T_c = Jz/k_B$ in every dimension. It also predicts spontaneous magnetization $M \propto (T_c - T)^{1/2}$ below this temperature, with a critical exponent $\beta = 1/2$ that is independent of dimension. However, this contradicts the exact solutions of the 1D and 2D Ising models. In one dimension, the transfer matrix method yields the exact partition function, which shows that the free energy per site is analytic for all $T > 0$, whereas MFT predicts $T_c = 2J/k_B$. The correlation length, given by $\xi = -1/\ln[\tanh(\beta J)]$, remains finite for all $T > 0$, demonstrating the absence of long-range order at any non-zero temperature.

Furthermore, in the two-dimensional case, Onsager’s exact solution shows that while a phase transition does exist, it differs significantly from the mean field prediction. The critical temperature in the exact solution ($k_BT_c/J = 2/\ln(1+\sqrt{2}) \approx 2.27$) is lower than the mean field prediction of $k_BT_c/J = 4$. More importantly, the magnetization near the critical point follows $M \propto (T_c - T)^{1/8}$, with $\beta = 1/8$, rather than the mean field value of $1/2$. We will talk further about these discrepancies in a couple of sections.
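For a quick side-by-side comparison of the numbers quoted here, the sketch below evaluates the mean-field $T_c = 2dJ$ and Onsager's exact square-lattice value. The exact value follows from $\sinh(2J/k_BT_c) = 1$, i.e. $k_BT_c/J = 2/\ln(1+\sqrt{2})$. Units with $k_B = J = 1$ are assumed.

```python
import math


def tc_mean_field(d, J=1.0):
    """Mean-field estimate T_c = J z / k_B with z = 2d on a hypercubic lattice (k_B = 1)."""
    return 2 * d * J


def tc_onsager_square_lattice(J=1.0):
    """Onsager's exact zero-field T_c for the square lattice: 2J / ln(1 + sqrt(2))."""
    return 2 * J / math.log(1 + math.sqrt(2))


print("1D: exact T_c = 0, mean field predicts", tc_mean_field(1))
print("2D: exact T_c =", round(tc_onsager_square_lattice(), 4), "mean field predicts", tc_mean_field(2))
print("2D ratio exact / mean field:", tc_onsager_square_lattice() / tc_mean_field(2))
```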

The Variational Principle

Next, I want to have a quick word about the theoretical basis for mean field theory, why it works, and present a quick relationship to variational inference. Recall that our goal from MFT is to simplify a many-body problem with interacting degrees of freedom into a more tractable one-body problem. In doing so, the (usually intractable) pairwise interactions are replaced by a single interaction between each degree of freedom and an effective “mean field.” In this sense, the first step in MFT involves replacing the original Hamiltonian $H$ , which encodes the full many-body interactions, with an approximate Hamiltonian. For consistency with standard nomenclature, we label this approximate form the trial Hamiltonian $H_T$ . Note that the trial Hamiltonian is just an approximation of the real Hamiltonian $H$ . Now, let $Z$ be the partition function of $H$ in the canonical ensemble, and define the free energy $F$ to be the familiar

F = -T\log(Z)

For a complicated $H$ with many interacting degrees of freedom, computing the partition function ($Z$), and thus the free energy ($F$), is typically intractable. Here’s where the variational principle comes in. We introduce the trial Hamiltonian, $H_T$, which ideally captures the essential features of $H$ while being much easier to handle. This trial Hamiltonian has its own partition function $Z_T$ and free energy $F_T$ defined analogously. Then, the probability distribution associated with the trial Hamiltonian is

P_T(\{s\}) = \frac{e^{-\beta H_T(\{s\})}}{Z_T}

Now, we can derive the relationship between $Z$ and $Z_T$ as

\begin{align*} Z &= \sum_{\{s\}} e^{-\beta H(\{s\})} \\ &= \sum_{\{s\}} e^{-\beta H_T(\{s\})} \frac{e^{-\beta H(\{s\})}}{e^{-\beta H_T(\{s\})}} \\ &= \sum_{\{s\}} e^{-\beta H_T(\{s\})} e^{-\beta(H(\{s\}) - H_T(\{s\}))} \\ &= \sum_{\{s\}} P_T(\{s\}) Z_T e^{-\beta(H(\{s\}) - H_T(\{s\}))} \\ &= Z_T \sum_{\{s\}} P_T(\{s\}) e^{-\beta(H - H_T)} \end{align*}

This identity lets us rewrite the true free energy in terms of the trial system. Substituting back into the expression for $F$, we find that

\begin{align*} F &= -T\log(Z) \\ &= -T\log\left(Z_T \sum_{\{s\}} P_T(\{s\}) e^{-\beta (H - H_T)}\right) \\ &= -T\log(Z_T) - T\log\left(\sum_{\{s\}} P_T(\{s\}) e^{-\beta (H - H_T)}\right) \\ &= F_T - T\log\left(\sum_{\{s\}} P_T(\{s\}) e^{-\beta (H - H_T)}\right) \\ &= F_T - T\log\left(\langle e^{-\beta (H - H_T)} \rangle_T\right) \end{align*}

where $\langle \cdot \rangle_T$ denotes the expectation over the trial distribution $P_T$. Simplifying this a bit further using Jensen’s inequality, which states that $\log(\langle X \rangle) \ge \langle \log(X) \rangle$ because the logarithm is concave, we get:

\begin{align*} F &= F_T - T\log\left(\langle e^{-\beta (H - H_T)} \rangle_T\right) \\ &\leq F_T - T \langle \log(e^{-\beta (H - H_T)}) \rangle_T \\ &= F_T - T \langle -\beta (H - H_T) \rangle_T \\ &= F_T + T\beta \langle H - H_T \rangle_T \\ &= F_T + \langle H - H_T \rangle_T \end{align*}

The resulting bound is known as the Bogoliubov inequality, and its right-hand side is called the variational free energy

F_{var} \triangleq F_T + \langle H - H_T \rangle_T

This equation has a really nice interpretation. The first term, $F_T$ is the free energy of the trial Hamiltonian. The second term, $\langle H - H_T \rangle_T$ , measures how much the trial Hamiltonian $H_T$ deviates from the true Hamiltonian $H$ , averaged over the trial distribution. Together they provide an upper bound on the true free energy $F$ . The optimal trial Hamiltonian is the one that minimizes this upper bound, bringing the approximation as close as possible to the true free energy.
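The Bogoliubov inequality is easy to check by exact enumeration on a tiny system. This sketch assumes a periodic chain of $N = 8$ spins, $k_B = 1$, and the simplest mean-field trial Hamiltonian $H_T = -b\sum_i \sigma_i$, with one uniform effective field $b$. Both $F$ and $F_{var}$ are computed by brute force, so nothing relies on the mean-field formulas.

```python
import itertools
import math


def ring_energy(s, J, h):
    """True Hamiltonian H for a periodic 1D chain (z = 2)."""
    N = len(s)
    return -J * sum(s[i] * s[(i + 1) % N] for i in range(N)) - h * sum(s)


def check_bogoliubov(N, J, h, T, b):
    """Return (F, F_var) for the trial Hamiltonian H_T = -b * sum(s), by exact enumeration."""
    beta = 1.0 / T
    states = list(itertools.product((-1, 1), repeat=N))
    F = -T * math.log(sum(math.exp(-beta * ring_energy(s, J, h)) for s in states))
    Z_T = (2 * math.cosh(beta * b)) ** N
    F_T = -T * math.log(Z_T)
    # <H - H_T>_T averaged over the trial distribution P_T(s) = exp(beta b sum(s)) / Z_T
    avg = sum(math.exp(beta * b * sum(s)) / Z_T * (ring_energy(s, J, h) + b * sum(s))
              for s in states)
    return F, F_T + avg


N, J, h, T = 8, 1.0, 0.2, 1.5
for b in (-1.0, 0.0, 0.5, 1.0, 2.0):
    F, F_var = check_bogoliubov(N, J, h, T, b)
    print(f"b = {b:4.1f}:  F = {F:.5f}  <=  F_var = {F_var:.5f}")
```

Every choice of $b$ gives $F_{var} \ge F$. The gap never closes completely here, because a product of independent spins cannot represent the nearest-neighbor correlations of the chain. With $J = 0$ and $b = h$, the trial Hamiltonian is exact and the two values coincide.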

**Figure 11:** Image from Gregory Gundersen’s blog post on VI and the ELBO [8]. Schematic representation of the variational approach to mean field theory. The blob represents a family of trial distributions $\mathcal{Q}$ parameterized by the effective field. The optimization process starts with an initial trial distribution $q^{(0)}(\mathbf{Z}) \in \mathcal{Q}$ and iteratively minimizes the KL divergence between the current approximation $q^{(t)}(\mathbf{Z})$ and the true Boltzmann distribution $p(\mathbf{Z}|\mathbf{X})$ . This process is equivalent to minimizing the variational free energy $F_{var} = F_T + \langle H - H_T \rangle_T$ , which provides an upper bound to the true free energy $F$ according to the Bogoliubov inequality. The optimal distribution $q^*(\mathbf{Z})$ corresponds to the self-consistency equation $M = \tanh(\beta(JzM + h))$ in the Ising model, where the effective field parameter minimizes the difference between the trial Hamiltonian $H_T$ and the true many-body Hamiltonian $H$ .

The reason this is nice is that we have essentially converted the mean-field problem into a variational inference one, where the closer $H_T$ is to $H$, the closer the variational free energy becomes to the real free energy of the problem at hand. That is… if we parameterize $H_T$ with a set of weights, then optimizing over these weights means making $F_{var}$ as small as possible… which makes $H_T$ a better approximation to $H$… and thus brings $F_{var}$ closer to $F$. Et voilà, it’s literally just variational inference. In fact, up to a factor of $T$ the variational free energy is the negative ELBO, $F_{var} = -T \cdot \text{ELBO}$. The Bogoliubov inequality $F \le F_{var}$ is therefore just the familiar bound $\log Z \ge \text{ELBO}$.

To see this briefly in action, let’s consider the generic Ising Hamiltonian again

H = -J \sum_{\langle i, j \rangle} \sigma_i \sigma_j - h \sum_i \sigma_i

Again, in the MFT approximation we want to remove the interacting terms and absorb them into an effective field, which we’ll now call $b$, giving the trial Hamiltonian

H_T = -b \sum_{i} \sigma_i

Now, notice that $b$ is our parameter (or weight) to optimize over. That is, we want the effective field $b$ that best approximates the true free energy $F$ given our MFT assumption. Plugging this into the variational free energy, computing the derivative with respect to $b$, and setting it to zero reveals

b = h + zJ \tanh(\beta b)

Which is just our self-consistency equation! Well, after substituting in the connection between the effective field and the magnetization, $M = \tanh(\beta b)$, that is. Doing so gives

M = \tanh(\beta (JzM + h))

Super cool, isn’t it! Now we have derived the self-consistency equation in two ways (albeit the former was more in-depth).
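For completeness, here is the skipped step, worked out on my own. With $H_T = -b\sum_i \sigma_i$ the spins are independent, so $\langle \sigma_i \rangle_T = m = \tanh(\beta b)$ and $\langle \sigma_i\sigma_j \rangle_T = m^2$. On a lattice with $N\mathit{z}/2$ bonds, this gives $F_{var}/N = -T\log(2\cosh\beta b) - \tfrac{1}{2}J\mathit{z}m^2 - hm + bm$. The sketch minimizes this on a grid and checks that the minimizer satisfies $b = h + \mathit{z}J\tanh(\beta b)$. It assumes $J = 1$, $\mathit{z} = 4$, $h = 0.1$, $T = 2$, and $k_B = 1$, and it searches only $b \ge 0$, the branch aligned with $h$.

```python
import math


def f_var_per_spin(b, J, z, h, T):
    """Variational free energy per spin for the trial Hamiltonian H_T = -b * sum(sigma) (k_B = 1)."""
    beta = 1.0 / T
    m = math.tanh(beta * b)                # <sigma_i>_T; independent spins give <sigma_i sigma_j>_T = m^2
    F_T = -T * math.log(2 * math.cosh(beta * b))
    avg_H = -0.5 * J * z * m * m - h * m   # <H>_T / N, using N z / 2 bonds
    avg_H_T = -b * m                       # <H_T>_T / N
    return F_T + avg_H - avg_H_T


J, z, h, T = 1.0, 4, 0.1, 2.0
grid = [10 * k / 100_000 for k in range(100_001)]  # b in [0, 10], the branch aligned with h > 0
b_star = min(grid, key=lambda b: f_var_per_spin(b, J, z, h, T))
print(b_star, h + z * J * math.tanh(b_star / T))   # both sides of b = h + zJ tanh(beta b)
print(math.tanh(b_star / T))                       # the corresponding magnetization M
```

The two numbers on the first printed line should agree to about the grid spacing. The second line is the corresponding mean-field magnetization.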

The Upper Critical Dimension

Recall that for MFT, we removed second order correlations between fluctuations, represented by terms of the form $(\sigma_i - M)(\sigma_j - M)$ . While this approximation captures qualitative features of phase transitions, it fails to accurately predict critical exponents below a certain spatial dimension, known as the upper critical dimension $d_c$ . To determine $d_c$ , we can check the validity of MFT by examining the ratio of fluctuations to the square of the order parameter

\frac{\langle (\delta M)^2 \rangle}{\langle M \rangle^2} \ll 1

This inequality must hold for the mean field approximation to be accurate. Near the critical point, we can express this ratio in terms of known quantities. The numerator, representing fluctuations about the mean, is related to the magnetic susceptibility through the fluctuation-dissipation relation $\partial M/\partial h = \beta N \langle (\delta M)^2 \rangle$, with $M$ the magnetization per spin. The denominator scales as the square of the magnetization. Using our mean field results

\begin{align*} \langle M \rangle &\sim |t|^{\beta} \sim |t|^{1/2}, \quad t < 0 \\ \frac{\partial M}{\partial h} &\sim |t|^{-\gamma} \sim |t|^{-1} \end{align*}

Here, $t = (T-T_c)/T_c$ is the reduced temperature, and $\beta=1/2$ and $\gamma=1$ are the mean field critical exponents.

These mean field exponents are modified by fluctuations. In general, fluctuations are controlled by the correlation length $\xi$ , which determines the size of correlated regions. The Ginzburg criterion states that mean field theory breaks down when fluctuations within a correlated volume become comparable to the square of the order parameter.

The correlation volume scales as $\xi^d$, where $d$ is the spatial dimension. The magnitude of fluctuations within this volume can be derived from the two-point correlation function $G(r) = \langle \sigma_0\sigma_r \rangle - \langle \sigma_0 \rangle \langle \sigma_r \rangle$. In mean field theory, $G(r)$ scales as $r^{-(d-2)}$ at distances shorter than $\xi$. Averaging $G$ over a correlation volume gives the variance of the magnetization of one correlated block, $\langle (\delta M)^2 \rangle \sim \xi^{-d} \int_0^{\xi} r^{d-1} r^{-(d-2)} \, dr \sim \xi^{-(d-2)}$. Therefore

\frac{\langle (\delta M)^2 \rangle}{\langle M \rangle^2} \sim \frac{\xi^{-(d-2)}}{|t|^1}

Using the mean field scaling relation $\xi \sim |t|^{-\nu} \sim |t|^{-1/2}$ , we obtain that

\frac{\langle (\delta M)^2 \rangle}{\langle M \rangle^2} \sim \frac{|t|^{(d-2)/2}}{|t|^1} \sim |t|^{(d-4)/2}

For mean field theory to become exact as $t \to 0$ (approaching the critical point), this ratio must vanish, requiring

(d-4)/2 > 0

This yields $d_c = 4$ as the upper critical dimension for the Ising model with short-range interactions. For $d > 4$, fluctuations become irrelevant near the critical point, and mean field theory provides exact critical exponents. At $d = 4$, the mean field power laws acquire logarithmic corrections, which for the Ising model read

\xi \sim |t|^{-1/2}\,|\ln|t||^{1/6}

\langle M \rangle \sim |t|^{1/2}\,|\ln|t||^{1/3}

For $d < 4$ , the mean field description breaks down near the critical point, and we must use more sophisticated methods like the renormalization group to obtain correct critical exponents.

The physical intuition behind this dimensional dependence is that in higher dimensions, each spin has more neighbors ($z = 2d$). The field it feels is then an average over more spins, so the mean field approximation becomes more accurate as dimensionality increases. When $d > 4$, this effect is strong enough that critical fluctuations no longer alter the critical exponents.

Specifically, MFT fails at the critical point because of the growing magnitude of fluctuations. As we approach $T_c$ , the correlation length $\xi$ diverges, meaning that spins become increasingly correlated over large distances. Mean field theory assumes that each spin feels only the average effect of its neighbors, effectively treating fluctuations as small and uncorrelated. However, near $T_c$ , fluctuations occur on all length scales up to $\xi$ , and these fluctuations are highly correlated.

In lower dimensions ( $d < 4$ ), these correlated fluctuations become particularly important as the collective behavior of many spins moving together can no longer be approximated by independent spins responding to an average field. This collective behavior leads to different critical exponents than those predicted by mean field theory, requiring more sophisticated theoretical approaches.

Final Words

A bit of a longer post I guess, but I found this topic really interesting, and it’s been pretty widely studied so there is a lot of depth that can be explored. This post is basically just a dump of everything that I have been studying along this axis for the past couple of months. In fact, I could probably write another 20 pages on renormalization group methods and their super interesting relationships to deep learning [9] [10] [11]. There have even been some people who have tried to take similar approaches to studying criticality [12], which I think is an interesting research direction. But, I will leave this for another time. There is just too much to write about, and too many things that I want to look into. Most likely the next post up will be on mechanistic interpretability, but we’ll see.

References

[1] Beggs, J. M. (2022). The Cortex and the Critical Point: Understanding the Power of Emergence.
[2] Beggs, J. M. (2022). Addressing skepticism of the critical brain hypothesis.
[3] Tian, Y., Tan, Z., Hou, H., Li, G., Cheng, A., Qiu, Y., Weng, K., Chen, C., Sun, P. (2022). Theoretical foundations of studying criticality in the brain.
[4] Berlinsky, A. J., Harris, A. B. (2019). Statistical Mechanics: An Introductory Graduate Course.
[5] Onsager, L. (1944). Crystal Statistics. I. A Two-Dimensional Model with an Order-Disorder Transition.
[6] Utermohlen, F. (2018). Mean Field Theory Solution of the Ising Model.
[7] Kopietz, P., Bartosch, L., Schütz, F. (2010). Introduction to the Functional Renormalization Group.
[8] Gundersen, G. (2021). The ELBO in Variational Inference.
[9] Mehta, P., Schwab, D. J. (2014). An exact mapping between the Variational Renormalization Group and Deep Learning.
[10] Bény, C. (2013). Deep learning and the renormalization group.
[11] Rojefferson (2019). Deep Learning and the Renormalization Group.
[12] Meshulam, L., Bialek, W. (2024). Statistical mechanics for networks of real neurons.
[13] Ising, E. (1925). Beitrag zur Theorie des Ferromagnetismus.
[14] Baxter, R. J. (2016). Exactly solved models in statistical mechanics.
[15] Sakthivadivel, D. A. R. (2022). Magnetisation and Mean Field Theory in the Ising Model.

Figures

Custom. First- versus second-order phase transitions.
Custom. Magnetism phase diagram.
Custom. Critical exponents for magnetism, correlation length, and susceptibility.
From the Physics Stack Exchange. Illustration of the correlation length.
Custom. Illustration of the 2D Ising model on a lattice.
Custom. Zero-dimensional Ising solution.
Custom. One-dimensional Ising solution.
Custom. Free energy for the mean-field Ising model at various temperatures.
Custom. Visualization of the self-consistency equation at various temperatures.
Custom. Critical exponents for the mean-field Ising.
From Gregory Gundersen’s blog post on variational inference.

Tables

Custom. Various critical exponents.
