Diffusion Models – 1: The Surprisingly Tricky Kolmogorov Equations

This is the first in a series of notes that aims to understand the mathematics of diffusion models from the perspective of an electrical engineer whose background is the mathematical theory of signals and systems, built on frequency-domain analysis and the Fourier Transform.

Consider a stochastic process X(t) and let p(x_2, t_2|x_1, t_1) \doteq \Pr \left( X(t_2)=x_2 | X(t_1)=x_1 \right) be the conditional probability (or, for continuous-valued processes, the conditional probability density) that the process takes the value x_2 at time t_2 given X(t_1)=x_1.

From The Law of Total Probability, we have p(x_2, t_2) \equiv \int_{x_1} p(x_2, t_2|x_1, t_1) p(x_1, t_1) d x_1. This holds for all x_1, t_1, x_2, t_2, but we will now specialize to a causal sequence of time instants t_0 < t_1 < t_2 and so on. Again using The Law of Total Probability, we can write: p(x_2, t_2 | x_0, t_0) \equiv \int_{x_1} p(x_2, t_2|x_1, t_1, x_0, t_0) p(x_1, t_1|x_0, t_0) d x_1.

If we add the assumption that X(t) is Markov, i.e. p(x_2, t_2|x_1, t_1, x_0, t_0) \equiv p(x_2, t_2|x_1, t_1), we get a (slightly) simplified equation: p(x_2, t_2 | x_0, t_0) \equiv \int_{x_1} p(x_2, t_2|x_1, t_1) p(x_1, t_1|x_0, t_0) d x_1. This is the Chapman-Kolmogorov equation, sometimes also called the Master Equation (ME) – a rather grandiose name for a fairly humble observation.
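As a quick numerical sanity check (my own illustration, not part of the derivation), the Master Equation can be verified for a kernel where everything is known in closed form. The sketch below assumes the transition density of standard Brownian motion, so that X(t_2) given X(t_1)=x_1 is Gaussian with mean x_1 and variance t_2 - t_1:

```python
import numpy as np

# Transition density of standard Brownian motion (an illustrative choice;
# any Markov transition density would do): X(t2)|X(t1)=x1 ~ N(x1, t2 - t1).
def p_trans(x2, t2, x1, t1):
    var = t2 - t1
    return np.exp(-((x2 - x1) ** 2) / (2 * var)) / np.sqrt(2 * np.pi * var)

x0, t0, t1, t2 = 0.0, 0.0, 0.5, 1.0
x1 = np.linspace(-10.0, 10.0, 4001)   # grid over the intermediate state
dx1 = x1[1] - x1[0]

# Right-hand side: integrate over the intermediate state x1 (Riemann sum).
x2 = 1.3
rhs = np.sum(p_trans(x2, t2, x1, t1) * p_trans(x1, t1, x0, t0)) * dx1

# Left-hand side: the direct transition density from (x0, t0) to (x2, t2).
lhs = p_trans(x2, t2, x0, t0)
print(lhs, rhs)   # the two sides agree to within discretization error
```

The Gaussian kernel is chosen only because both sides can be evaluated exactly; the identity itself holds for any Markov process.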

Differential Form of the Master Equation

Now we will limit ourselves to continuous-time, continuous-valued processes X(t) that are nice and smooth. Specifically, we will assume that X(t) is continuous. Of course, for random processes there are many different definitions of continuity, but we will adopt an informal one: over an infinitesimally small time interval \Delta t, the change \Delta X(t) must also be infinitesimally small. Specifically, we will assume that p(x_2, t+ \Delta t|x_1, t) is zero for all values of x_2 outside a small neighborhood \left|x_2-x_1 \right| \leq \Delta x. The same is then true of the product p(x_2, t+\Delta t|x_1, t) p(x_1, t). A standard method in the theory of stochastic processes is to represent this product by a Taylor Series, obtaining the so-called Kramers-Moyal expansion, which expresses the Master Equation in differential form. Truncating this Taylor Series yields the famous Fokker-Planck equation.
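To make the continuity assumption concrete, here is an illustrative sketch (again assuming Brownian motion, whose increment over \Delta t is N(0, \Delta t)) showing that the probability mass outside a fixed neighborhood |x_2 - x_1| \leq \Delta x vanishes rapidly as \Delta t \rightarrow 0:

```python
from math import erfc, sqrt

# P(|ΔX| > dx) for a Brownian increment ΔX ~ N(0, dt), via the
# complementary error function.  (Illustrative; dx = 0.1 is an assumption.)
def tail_mass(dx, dt):
    return erfc(dx / sqrt(2 * dt))

dx = 0.1
for dt in [1e-2, 1e-3, 1e-4]:
    print(dt, tail_mass(dx, dt))   # mass outside the neighborhood shrinks fast
```

This is the informal sense in which p(x_2, t+\Delta t|x_1, t) has support of width \Delta x for small enough \Delta t.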

However, a detailed derivation of this Taylor Series turns out to be surprisingly tricky if we want to maintain full generality and avoid additional simplifying assumptions.

A Wrong Turn

Consider p(x,t+\Delta t) \equiv \int_{x_1 = -\infty}^\infty p(x, t + \Delta t|x_1, t) p(x_1, t) d x_1. Let \epsilon \doteq x-x_1 be the (random) increment in X(t) over the time interval \Delta t. It is tempting to substitute x_1 = x - \epsilon (note that d x_1 = -d \epsilon, but the limits of integration also flip, so the signs cancel) to get p(x,t+\Delta t) \equiv \int_{\epsilon= -\infty}^\infty p(x, t + \Delta t|x - \epsilon, t) p(x-\epsilon, t) d \epsilon and write a Taylor Series for the integrand. This, however, is a road to nowhere: a Taylor Series is useful over a limited range of values of \epsilon, but this formulation requires integrating over all \epsilon \in \mathbb{R}.

One way to salvage this attempt is to assume the process X(t) has independent increments, so that the transition probabilities are state-independent, i.e. p(x, t + \Delta t|x - \epsilon, t) \equiv p(\epsilon, t + \Delta t|0, t) \equiv p_t(\epsilon). By our previous smoothness assumption, the increment distribution p_t(\epsilon) has finite support \epsilon \in [-\Delta x,\Delta x], and so we can write p(x,t+\Delta t) \equiv \int_{\epsilon= -\Delta x}^{\Delta x} p_t(\epsilon) p(x-\epsilon, t) d \epsilon. Over this small, finite range, we can perform a Taylor expansion of p(x-\epsilon, t).
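Under the independent-increments assumption, one step of the Master Equation is just a convolution, which is easy to check numerically. The sketch below (my own choice of densities) uses Gaussian increments so the exact answer is known in closed form:

```python
import numpy as np

# One Master Equation step with independent increments is a convolution:
# p(x, t+Δt) = ∫ p_t(ε) p(x-ε, t) dε.  Illustrative densities: Gaussian,
# so the exact result is again Gaussian.
x = np.linspace(-8, 8, 1601)       # symmetric grid with a point exactly at 0
dx = x[1] - x[0]

def gaussian(x, var):
    return np.exp(-x ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

p_t = gaussian(x, 1.0)             # p(x, t): N(0, 1)
inc = gaussian(x, 0.25)            # increment density p_t(ε): N(0, 0.25)

# Discrete convolution approximates the integral (multiply by dε = dx).
p_next = np.convolve(p_t, inc, mode="same") * dx

exact = gaussian(x, 1.25)          # N(0,1) convolved with N(0,0.25) = N(0,1.25)
err = np.max(np.abs(p_next - exact))
print(err)                         # small discretization error
```

The odd-length symmetric grid keeps the "same"-mode convolution aligned with the original grid; variances add under convolution of Gaussians, which gives the exact reference.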

However, the independent-increments assumption represents a rather significant loss of generality, so we will see if we can avoid it. Our salvage attempt suggests a way forward: keep the p(x, t + \Delta t|x - \epsilon, t) term and Taylor-expand only the other factor p(x_1, t) \equiv p(x-\epsilon,t). Keeping the first two Taylor Series terms, we have: p(x,t+\Delta t) \equiv \int_{\epsilon= -\Delta x}^{\Delta x} p(x, t + \Delta t|x - \epsilon, t) \left( p(x, t) -\epsilon p'(x,t) + \dots \right) d \epsilon.

Unfortunately, this expression cannot be simplified further: viewed as a function of the integration variable \epsilon, the term p(x, t + \Delta t|x - \epsilon, t) is not a transition distribution with a fixed starting state (the conditioning state x-\epsilon moves with \epsilon), so its integrals over \epsilon are not the moments we need. With a clever modification, however, we can make this derivation much more tractable.

A More Careful Attempt

Define f_\epsilon(x)=p(x+\epsilon, t + \Delta t|x, t) p(x, t). The subscript in f_\epsilon(x) is to remind ourselves that it is defined for a specific value of \epsilon. Then we have p(x,t+\Delta t) \equiv \int_{\epsilon=-\infty}^\infty f_\epsilon(x-\epsilon) d\epsilon, since f_\epsilon(x-\epsilon) = p(x, t + \Delta t|x - \epsilon, t) p(x-\epsilon, t) is exactly the integrand from before.

Now consider the expansion f_\epsilon(x-\epsilon)=f_\epsilon(x)-\epsilon f'_\epsilon(x)+\frac{\epsilon^2}{2}f''_\epsilon(x)+\dots. We have to check that this avoids the pitfalls that we ran into in our earlier attempts. First, note that \int_{\epsilon} f_\epsilon(x) d\epsilon \equiv p(x,t), since the transition density integrates to one over \epsilon. Next, define the conditional increment moments a_{\Delta t}(x,t) \doteq \int_\epsilon \epsilon p(x+\epsilon, t + \Delta t|x, t) d\epsilon and b_{\Delta t}(x,t) \doteq \int_\epsilon \frac{\epsilon^2}{2} p(x+\epsilon, t + \Delta t|x, t) d\epsilon. Because the primes denote derivatives with respect to x while the integration is over \epsilon, the derivatives can be pulled outside the integrals: \int_{\epsilon} \epsilon f'_\epsilon(x) d\epsilon \equiv \frac{\partial}{\partial x} \Big( p(x, t) a_{\Delta t}(x,t) \Big) and \int_{\epsilon} \frac{\epsilon^2}{2} f''_\epsilon(x) d\epsilon \equiv \frac{\partial^2}{\partial x^2} \Big( p(x, t) b_{\Delta t}(x,t) \Big).
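To build intuition for a_{\Delta t} and b_{\Delta t}, we can evaluate them for a process whose transition density is known in closed form. The sketch below assumes an Ornstein-Uhlenbeck process dX = -\theta X dt + \sigma dW (an illustrative choice, with made-up parameters), whose transition over \Delta t is Gaussian with mean x e^{-\theta \Delta t} and variance \sigma^2(1-e^{-2\theta \Delta t})/(2\theta):

```python
import numpy as np

# Conditional increment moments of an Ornstein-Uhlenbeck process
# (illustrative; theta, sigma, and the state x are assumptions).
theta, sigma, x = 1.0, 0.8, 1.5

def moments(dt):
    mean_inc = x * (np.exp(-theta * dt) - 1.0)                    # E[ε | X(t)=x]
    var_inc = sigma ** 2 * (1 - np.exp(-2 * theta * dt)) / (2 * theta)
    a_dt = mean_inc                            # ∫ ε p(x+ε, t+Δt|x, t) dε
    b_dt = 0.5 * (var_inc + mean_inc ** 2)     # ∫ (ε²/2) p(x+ε, t+Δt|x, t) dε
    return a_dt / dt, b_dt / dt

for dt in [1e-2, 1e-3, 1e-4]:
    a, b = moments(dt)
    # approaches the drift -θx = -1.5 and diffusion σ²/2 = 0.32 as Δt shrinks
    print(dt, a, b)
```

Both moments are O(\Delta t), which is why they vanish as \Delta t \rightarrow 0 and must be divided by \Delta t to yield finite rates.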

We have: p(x,t+\Delta t) \equiv p(x,t)-\frac{\partial}{\partial x} \Big( p(x, t) a_{\Delta t}(x,t) \Big) + \frac{\partial^2}{\partial x^2} \Big( p(x, t) b_{\Delta t}(x,t) \Big) + \dots. Note that both a_{\Delta t}(x,t),~b_{\Delta t}(x,t) vanish as \Delta t \rightarrow 0, and the limits a(x,t) \doteq \lim_{\Delta t \rightarrow 0}\frac{1}{\Delta t}a_{\Delta t}(x,t),~b(x,t) \doteq \lim_{\Delta t \rightarrow 0}\frac{1}{\Delta t}b_{\Delta t}(x,t), when they exist and are non-zero, have natural physical interpretations as the drift rate and diffusion rate of the process X(t).
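The second-order truncation p(x,t+\Delta t) \approx p(x,t) - \frac{\partial}{\partial x}(p \, a_{\Delta t}) + \frac{\partial^2}{\partial x^2}(p \, b_{\Delta t}) can itself be checked numerically. The sketch below reuses the Ornstein-Uhlenbeck example (my illustration; all parameters are assumptions), since the exact one-step evolution of a Gaussian density is again Gaussian:

```python
import numpy as np

# Checking p(x,t+Δt) ≈ p - ∂x(p·a_Δt) + ∂x²(p·b_Δt) for an
# Ornstein-Uhlenbeck process dX = -θX dt + σ dW, starting from p(x,t)=N(0,1).
theta, sigma, dt = 1.0, 0.8, 1e-3
x = np.linspace(-6, 6, 1201)
dx = x[1] - x[0]

def gaussian(x, var):
    return np.exp(-x ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

c = np.exp(-theta * dt) - 1.0                                  # mean factor
v = sigma ** 2 * (1 - np.exp(-2 * theta * dt)) / (2 * theta)   # increment var

p = gaussian(x, 1.0)                          # p(x, t) = N(0, 1)
a_dt = c * x                                  # ∫ ε p(x+ε, t+Δt|x, t) dε
b_dt = 0.5 * (v + (c * x) ** 2)               # ∫ (ε²/2) p(x+ε, t+Δt|x, t) dε

drift_term = np.gradient(p * a_dt, dx)                    # ∂/∂x (p a_Δt)
diff_term = np.gradient(np.gradient(p * b_dt, dx), dx)    # ∂²/∂x² (p b_Δt)
p_approx = p - drift_term + diff_term

# Exact one-step evolution of N(0,1) under the OU transition: N(0, e^{-2θΔt}+v).
p_exact = gaussian(x, np.exp(-2 * theta * dt) + v)
trunc_err = np.max(np.abs(p_exact - p_approx))
print(trunc_err)   # O(Δt²) truncation error, far smaller than the O(Δt) terms
```

The neglected higher moments of a Gaussian increment are O(\Delta t^2), so the residual shrinks much faster than the retained terms.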

Keeping only the terms shown above, subtracting p(x,t) from both sides, dividing by \Delta t, and letting \Delta t \rightarrow 0, we finally have the famous Fokker-Planck equation, also known as the Kolmogorov forward equation: \frac{\partial p(x,t)}{\partial t} \equiv -\frac{\partial}{\partial x} \Big( p(x, t) a(x,t) \Big) + \frac{\partial^2}{\partial x^2} \Big( p(x, t) b(x,t) \Big).
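Finally, a numerical sanity check of the forward equation itself (a sketch, assuming constant drift a and diffusion b, both made-up values): starting from X(0)=0, the exact density is Gaussian with mean a t and variance 2 b t, and it should satisfy \partial p/\partial t = -\partial(a p)/\partial x + \partial^2(b p)/\partial x^2 up to discretization error:

```python
import numpy as np

# Finite-difference check of ∂p/∂t = -∂(a p)/∂x + ∂²(b p)/∂x² for constant
# drift a and diffusion b (illustrative values).  With X(0)=0 the exact
# density is Gaussian with mean a·t and variance 2·b·t.
a, b = 0.7, 0.3

def p(x, t):
    var = 2 * b * t
    return np.exp(-(x - a * t) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

x = np.linspace(-4, 6, 2001)
t, h = 1.0, 1e-5
dx = x[1] - x[0]

dpdt = (p(x, t + h) - p(x, t - h)) / (2 * h)     # ∂p/∂t (central difference)
dpdx = np.gradient(p(x, t), dx)                  # ∂p/∂x
d2pdx2 = np.gradient(dpdx, dx)                   # ∂²p/∂x²
residual = dpdt - (-a * dpdx + b * d2pdx2)
print(np.max(np.abs(residual)))                  # near zero up to grid error
```

Note that the variance of the exact solution is 2 b t, not b t: b(x,t) was defined with the factor \epsilon^2/2, so it equals half the variance rate.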
