Diffusion Models – 1: The Surprisingly Tricky Kolmogorov Equations

This is the first in a series of notes that aims to understand the mathematics of diffusion models from the perspective of an electrical engineer whose background is the mathematical theory of signals and systems, built on frequency-domain analysis and the Fourier Transform.

Consider a stochastic process X(t) and let p(x_2, t_2|x_1, t_1) \doteq \Pr \left( X(t_2)=x_2 | X(t_1)=x_1 \right) be the conditional probability (or, for continuous-valued processes, the conditional probability density) that the process takes the value x_2 at time t_2 given X(t_1)=x_1.

From The Law of Total Probability, we have p(x_2, t_2) \equiv \int_{x_1} p(x_2, t_2|x_1, t_1) p(x_1, t_1) d x_1. This holds for all x_1, t_1, x_2, t_2, but we will now specialize to a causal sequence of time instants t_0 < t_1 < t_2 and so on. Again using The Law of Total Probability, we can write: p(x_2, t_2 | x_0, t_0) \equiv \int_{x_1} p(x_2, t_2|x_1, t_1, x_0, t_0) p(x_1, t_1|x_0, t_0) d x_1.

If we add the assumption that X(t) is Markov, i.e. p(x_2, t_2|x_1, t_1, x_0, t_0) \equiv p(x_2, t_2|x_1, t_1), we get a (slightly) simplified equation: p(x_2, t_2 | x_0, t_0) \equiv \int_{x_1} p(x_2, t_2|x_1, t_1) p(x_1, t_1|x_0, t_0) d x_1. This is the Chapman-Kolmogorov equation, sometimes also called the Master Equation (ME) – a rather grandiose name for a fairly humble observation.
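As a quick numerical sanity check (my own illustration, not part of the derivation), the Master Equation can be verified for a kernel where everything is known in closed form. The sketch below assumes the transition density of standard Brownian motion, so that X(t_2) given X(t_1)=x_1 is Gaussian with mean x_1 and variance t_2 - t_1:

```python
import numpy as np

# Transition density of standard Brownian motion (an illustrative choice;
# any Markov transition density would do): X(t2)|X(t1)=x1 ~ N(x1, t2 - t1).
def p_trans(x2, t2, x1, t1):
    var = t2 - t1
    return np.exp(-((x2 - x1) ** 2) / (2 * var)) / np.sqrt(2 * np.pi * var)

x0, t0, t1, t2 = 0.0, 0.0, 0.5, 1.0
x1 = np.linspace(-10.0, 10.0, 4001)   # grid over the intermediate state
dx1 = x1[1] - x1[0]

# Right-hand side: integrate over the intermediate state x1 (Riemann sum).
x2 = 1.3
rhs = np.sum(p_trans(x2, t2, x1, t1) * p_trans(x1, t1, x0, t0)) * dx1

# Left-hand side: the direct transition density from (x0, t0) to (x2, t2).
lhs = p_trans(x2, t2, x0, t0)
print(lhs, rhs)   # the two sides agree to within discretization error
```

The Gaussian kernel is chosen only because both sides can be evaluated exactly; the identity itself holds for any Markov process.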

Differential Form of the Master Equation

Now we will limit ourselves to continuous-time, continuous-valued processes X(t) that are nice and smooth. Specifically, we will assume that X(t) is continuous. Of course, for random processes there are many different definitions of continuity, but we will adopt an informal one: over an infinitesimally small time interval \Delta t, the change \Delta X(t) must also be infinitesimally small. Specifically, we will assume that p(x_2, t+ \Delta t|x_1, t) is zero for all values of x_2 outside a small neighborhood \left|x_2-x_1 \right| \leq \Delta x. The same is then true of the product p(x_2, t+\Delta t|x_1, t) p(x_1, t). A standard method in the theory of stochastic processes is to represent this product by a Taylor Series, obtaining the so-called Kramers-Moyal expansion, which expresses the Master Equation in differential form. Truncating this Taylor Series yields the famous Fokker-Planck equation.
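To make the continuity assumption concrete, here is an illustrative sketch (again assuming Brownian motion, whose increment over \Delta t is N(0, \Delta t)) showing that the probability mass outside a fixed neighborhood |x_2 - x_1| \leq \Delta x vanishes rapidly as \Delta t \rightarrow 0:

```python
from math import erfc, sqrt

# P(|ΔX| > dx) for a Brownian increment ΔX ~ N(0, dt), via the
# complementary error function.  (Illustrative; dx = 0.1 is an assumption.)
def tail_mass(dx, dt):
    return erfc(dx / sqrt(2 * dt))

dx = 0.1
for dt in [1e-2, 1e-3, 1e-4]:
    print(dt, tail_mass(dx, dt))   # mass outside the neighborhood shrinks fast
```

This is the informal sense in which p(x_2, t+\Delta t|x_1, t) has support of width \Delta x for small enough \Delta t.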

However, a detailed derivation of this Taylor Series turns out to be surprisingly tricky if we want to maintain full generality and avoid additional simplifying assumptions.

A Wrong Turn

Consider p(x,t+\Delta t) \equiv \int_{x_1 = -\infty}^\infty p(x, t + \Delta t|x_1, t) p(x_1, t) d x_1. Let \epsilon \doteq x-x_1 be the (random) increment in X(t) over the time interval \Delta t. It is tempting to substitute x_1 = x - \epsilon (note that d x_1 = -d \epsilon, but the limits of integration also flip, so the signs cancel) to get p(x,t+\Delta t) \equiv \int_{\epsilon= -\infty}^\infty p(x, t + \Delta t|x - \epsilon, t) p(x-\epsilon, t) d \epsilon and write a Taylor Series for the integrand. This, however, is a road to nowhere: a Taylor Series is useful over a limited range of values of \epsilon, but this formulation requires integrating over all \epsilon \in \mathbb{R}.

One way to salvage this attempt is to assume the process X(t) has independent increments, so that the transition probabilities are state-independent, i.e. p(x, t + \Delta t|x - \epsilon, t) \equiv p(\epsilon, t + \Delta t|0, t) \equiv p_t(\epsilon). By our previous smoothness assumption, the increment distribution p_t(\epsilon) has finite support \epsilon \in [-\Delta x,\Delta x], and so we can write p(x,t+\Delta t) \equiv \int_{\epsilon= -\Delta x}^{\Delta x} p_t(\epsilon) p(x-\epsilon, t) d \epsilon. Over this small, finite range, we can perform a Taylor expansion of p(x-\epsilon, t).
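Under the independent-increments assumption, one step of the Master Equation is just a convolution, which is easy to check numerically. The sketch below (my own choice of densities) uses Gaussian increments so the exact answer is known in closed form:

```python
import numpy as np

# One Master Equation step with independent increments is a convolution:
# p(x, t+Δt) = ∫ p_t(ε) p(x-ε, t) dε.  Illustrative densities: Gaussian,
# so the exact result is again Gaussian.
x = np.linspace(-8, 8, 1601)       # symmetric grid with a point exactly at 0
dx = x[1] - x[0]

def gaussian(x, var):
    return np.exp(-x ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

p_t = gaussian(x, 1.0)             # p(x, t): N(0, 1)
inc = gaussian(x, 0.25)            # increment density p_t(ε): N(0, 0.25)

# Discrete convolution approximates the integral (multiply by dε = dx).
p_next = np.convolve(p_t, inc, mode="same") * dx

exact = gaussian(x, 1.25)          # N(0,1) convolved with N(0,0.25) = N(0,1.25)
err = np.max(np.abs(p_next - exact))
print(err)                         # small discretization error
```

The odd-length symmetric grid keeps the "same"-mode convolution aligned with the original grid; variances add under convolution of Gaussians, which gives the exact reference.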

However, the independent-increments assumption represents a rather significant loss of generality, so we will see if we can avoid it. Our salvage attempt suggests a way forward: keep the p(x, t + \Delta t|x - \epsilon, t) term and Taylor-expand only the other factor p(x_1, t) \equiv p(x-\epsilon,t). Keeping the first two Taylor Series terms, we have: p(x,t+\Delta t) \equiv \int_{\epsilon= -\Delta x}^{\Delta x} p(x, t + \Delta t|x - \epsilon, t) \left( p(x, t) -\epsilon p'(x,t) + \dots \right) d \epsilon.

Unfortunately, this expression cannot be simplified further: viewed as a function of the integration variable \epsilon, the term p(x, t + \Delta t|x - \epsilon, t) is not a transition distribution with a fixed starting state (the conditioning state x-\epsilon moves with \epsilon), so its integrals over \epsilon are not the moments we need. With a clever modification, however, we can make this derivation much more tractable.

A More Careful Attempt

Define f_\epsilon(x)=p(x+\epsilon, t + \Delta t|x, t) p(x, t). The subscript in f_\epsilon(x) is to remind ourselves that it is defined for a specific value of \epsilon. Then we have p(x,t+\Delta t) \equiv \int_{\epsilon=-\infty}^\infty f_\epsilon(x-\epsilon) d\epsilon, since f_\epsilon(x-\epsilon) = p(x, t + \Delta t|x - \epsilon, t) p(x-\epsilon, t) is exactly the integrand from before.

Now consider the expansion f_\epsilon(x-\epsilon)=f_\epsilon(x)-\epsilon f'_\epsilon(x)+\frac{\epsilon^2}{2}f''_\epsilon(x)+\dots. We have to check that this avoids the pitfalls that we ran into in our earlier attempts. First, note that \int_{\epsilon} f_\epsilon(x) d\epsilon \equiv p(x,t), since the transition density integrates to one over \epsilon. Next, define the conditional increment moments a_{\Delta t}(x,t) \doteq \int_\epsilon \epsilon p(x+\epsilon, t + \Delta t|x, t) d\epsilon and b_{\Delta t}(x,t) \doteq \int_\epsilon \frac{\epsilon^2}{2} p(x+\epsilon, t + \Delta t|x, t) d\epsilon. Because the primes denote derivatives with respect to x while the integration is over \epsilon, the derivatives can be pulled outside the integrals: \int_{\epsilon} \epsilon f'_\epsilon(x) d\epsilon \equiv \frac{\partial}{\partial x} \Big( p(x, t) a_{\Delta t}(x,t) \Big) and \int_{\epsilon} \frac{\epsilon^2}{2} f''_\epsilon(x) d\epsilon \equiv \frac{\partial^2}{\partial x^2} \Big( p(x, t) b_{\Delta t}(x,t) \Big).
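To build intuition for a_{\Delta t} and b_{\Delta t}, we can evaluate them for a process whose transition density is known in closed form. The sketch below assumes an Ornstein-Uhlenbeck process dX = -\theta X dt + \sigma dW (an illustrative choice, with made-up parameters), whose transition over \Delta t is Gaussian with mean x e^{-\theta \Delta t} and variance \sigma^2(1-e^{-2\theta \Delta t})/(2\theta):

```python
import numpy as np

# Conditional increment moments of an Ornstein-Uhlenbeck process
# (illustrative; theta, sigma, and the state x are assumptions).
theta, sigma, x = 1.0, 0.8, 1.5

def moments(dt):
    mean_inc = x * (np.exp(-theta * dt) - 1.0)                    # E[ε | X(t)=x]
    var_inc = sigma ** 2 * (1 - np.exp(-2 * theta * dt)) / (2 * theta)
    a_dt = mean_inc                            # ∫ ε p(x+ε, t+Δt|x, t) dε
    b_dt = 0.5 * (var_inc + mean_inc ** 2)     # ∫ (ε²/2) p(x+ε, t+Δt|x, t) dε
    return a_dt / dt, b_dt / dt

for dt in [1e-2, 1e-3, 1e-4]:
    a, b = moments(dt)
    # approaches the drift -θx = -1.5 and diffusion σ²/2 = 0.32 as Δt shrinks
    print(dt, a, b)
```

Both moments are O(\Delta t), which is why they vanish as \Delta t \rightarrow 0 and must be divided by \Delta t to yield finite rates.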

We have: p(x,t+\Delta t) \equiv p(x,t)-\frac{\partial}{\partial x} \Big( p(x, t) a_{\Delta t}(x,t) \Big) + \frac{\partial^2}{\partial x^2} \Big( p(x, t) b_{\Delta t}(x,t) \Big) + \dots. Note that both a_{\Delta t}(x,t),~b_{\Delta t}(x,t) vanish as \Delta t \rightarrow 0, and the limits a(x,t) \doteq \lim_{\Delta t \rightarrow 0}\frac{1}{\Delta t}a_{\Delta t}(x,t),~b(x,t) \doteq \lim_{\Delta t \rightarrow 0}\frac{1}{\Delta t}b_{\Delta t}(x,t), when they exist and are non-zero, have natural physical interpretations as the drift rate and diffusion rate of the process X(t).
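The second-order truncation p(x,t+\Delta t) \approx p(x,t) - \frac{\partial}{\partial x}(p \, a_{\Delta t}) + \frac{\partial^2}{\partial x^2}(p \, b_{\Delta t}) can itself be checked numerically. The sketch below reuses the Ornstein-Uhlenbeck example (my illustration; all parameters are assumptions), since the exact one-step evolution of a Gaussian density is again Gaussian:

```python
import numpy as np

# Checking p(x,t+Δt) ≈ p - ∂x(p·a_Δt) + ∂x²(p·b_Δt) for an
# Ornstein-Uhlenbeck process dX = -θX dt + σ dW, starting from p(x,t)=N(0,1).
theta, sigma, dt = 1.0, 0.8, 1e-3
x = np.linspace(-6, 6, 1201)
dx = x[1] - x[0]

def gaussian(x, var):
    return np.exp(-x ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

c = np.exp(-theta * dt) - 1.0                                  # mean factor
v = sigma ** 2 * (1 - np.exp(-2 * theta * dt)) / (2 * theta)   # increment var

p = gaussian(x, 1.0)                          # p(x, t) = N(0, 1)
a_dt = c * x                                  # ∫ ε p(x+ε, t+Δt|x, t) dε
b_dt = 0.5 * (v + (c * x) ** 2)               # ∫ (ε²/2) p(x+ε, t+Δt|x, t) dε

drift_term = np.gradient(p * a_dt, dx)                    # ∂/∂x (p a_Δt)
diff_term = np.gradient(np.gradient(p * b_dt, dx), dx)    # ∂²/∂x² (p b_Δt)
p_approx = p - drift_term + diff_term

# Exact one-step evolution of N(0,1) under the OU transition: N(0, e^{-2θΔt}+v).
p_exact = gaussian(x, np.exp(-2 * theta * dt) + v)
trunc_err = np.max(np.abs(p_exact - p_approx))
print(trunc_err)   # O(Δt²) truncation error, far smaller than the O(Δt) terms
```

The neglected higher moments of a Gaussian increment are O(\Delta t^2), so the residual shrinks much faster than the retained terms.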

Keeping only the terms shown above, subtracting p(x,t) from both sides, dividing by \Delta t, and letting \Delta t \rightarrow 0, we finally have the famous Fokker-Planck equation, also known as the Kolmogorov forward equation: \frac{\partial p(x,t)}{\partial t} \equiv -\frac{\partial}{\partial x} \Big( p(x, t) a(x,t) \Big) + \frac{\partial^2}{\partial x^2} \Big( p(x, t) b(x,t) \Big).
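Finally, a numerical sanity check of the forward equation itself (a sketch, assuming constant drift a and diffusion b, both made-up values): starting from X(0)=0, the exact density is Gaussian with mean a t and variance 2 b t, and it should satisfy \partial p/\partial t = -\partial(a p)/\partial x + \partial^2(b p)/\partial x^2 up to discretization error:

```python
import numpy as np

# Finite-difference check of ∂p/∂t = -∂(a p)/∂x + ∂²(b p)/∂x² for constant
# drift a and diffusion b (illustrative values).  With X(0)=0 the exact
# density is Gaussian with mean a·t and variance 2·b·t.
a, b = 0.7, 0.3

def p(x, t):
    var = 2 * b * t
    return np.exp(-(x - a * t) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

x = np.linspace(-4, 6, 2001)
t, h = 1.0, 1e-5
dx = x[1] - x[0]

dpdt = (p(x, t + h) - p(x, t - h)) / (2 * h)     # ∂p/∂t (central difference)
dpdx = np.gradient(p(x, t), dx)                  # ∂p/∂x
d2pdx2 = np.gradient(dpdx, dx)                   # ∂²p/∂x²
residual = dpdt - (-a * dpdx + b * d2pdx2)
print(np.max(np.abs(residual)))                  # near zero up to grid error
```

Note that the variance of the exact solution is 2 b t, not b t: b(x,t) was defined with the factor \epsilon^2/2, so it equals half the variance rate.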
