This is the first of a series of notes to understand the mathematics of diffusion models from the perspective of an electrical engineer with a background in the mathematical theory of signals and systems based on frequency domain analysis and the Fourier Transform.
Consider a stochastic process $X(t)$ and let $p(x, t \mid x_0, t_0)$ be the conditional probability density that the process takes the value $x$ at time $t$ given $X(t_0) = x_0$.
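From the Law of Total Probability, we have

$$p(x, t) = \int p(x, t \mid x_0, t_0)\, p(x_0, t_0)\, dx_0 ,$$

where $p(x, t)$ denotes the marginal density of $X(t)$. This holds for all $t$ and $t_0$, but we will now specialize to a causal sequence of time instants $t_0 < t_1 < t_2$ and so on. Again using the Law of Total Probability, we can write:

$$p(x_2, t_2 \mid x_0, t_0) = \int p(x_2, t_2 \mid x_1, t_1; x_0, t_0)\, p(x_1, t_1 \mid x_0, t_0)\, dx_1 .$$

If we add the assumption that $X(t)$ is Markov, we get a (slightly) simplified equation:

$$p(x_2, t_2 \mid x_0, t_0) = \int p(x_2, t_2 \mid x_1, t_1)\, p(x_1, t_1 \mid x_0, t_0)\, dx_1 ,$$

which is sometimes called the Master Equation (ME), a rather grandiose name for a fairly humble observation.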
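As a quick sanity check, the Markov form of the Master Equation can be tested numerically. The sketch below assumes a specific example that is not part of the discussion above: Brownian motion, whose transition density is a Gaussian with variance proportional to the elapsed time. The function name `brownian_kernel`, the grid, and the chosen states and time gaps are all illustrative assumptions.

```python
import numpy as np

def brownian_kernel(x_to, x_from, dt, diffusion=1.0):
    """Transition density p(x_to, t + dt | x_from, t) for Brownian motion (assumed example)."""
    var = diffusion * dt
    return np.exp(-(x_to - x_from) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

# Grid over the intermediate state x_1, wide enough that the integrand is negligible at the ends.
x1 = np.linspace(-20.0, 20.0, 4001)
h = x1[1] - x1[0]

x0, x2 = 0.3, -0.8      # start and end states (arbitrary illustrative values)
dt01, dt12 = 0.5, 0.7   # t_1 - t_0 and t_2 - t_1

# Right-hand side of the Master Equation: integrate out x_1 with a Riemann sum.
composed = np.sum(brownian_kernel(x2, x1, dt12) * brownian_kernel(x1, x0, dt01)) * h

# Left-hand side: the direct transition density from t_0 to t_2.
direct = brownian_kernel(x2, x0, dt01 + dt12)

ck_error = abs(composed - direct)
print(f"composed={composed:.12f}  direct={direct:.12f}  error={ck_error:.2e}")
```

For this kernel the two sides should agree to within floating-point error, since composing two Gaussian transitions simply adds their variances.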
Differential Form of the Master Equation
Now we will limit ourselves to continuous-time, continuous-valued processes that are nice and smooth. Specifically, we will assume that $X(t)$ is continuous. Of course, for random processes, there are many different definitions of continuity, but we will adopt an informal definition: over an infinitesimally small time interval $\Delta t$, the change $X(t + \Delta t) - X(t)$ must also be infinitesimally small. Specifically, we will assume that $p(x, t + \Delta t \mid x', t)$ is zero for all values of $x$ except a small neighborhood $|x - x'| < \epsilon$. The same is true of course of the product $p(x, t + \Delta t \mid x', t)\, p(x', t)$. A standard method in the theory of stochastic processes is to represent this product by a Taylor Series to obtain the so-called Kramers-Moyal expansion, which expresses the Master Equation in a differential form. A truncation of this Taylor Series yields the famous Fokker-Planck equation.
However, a detailed derivation of this Taylor Series turns out to be surprisingly tricky if we want to maintain full generality and avoid additional simplifying assumptions.

A Wrong Turn
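Consider $p(x, t + \Delta t) = \int p(x, t + \Delta t \mid x', t)\, p(x', t)\, dx'$. Let $\Delta x = x - x'$ be the (random) increment in $X(t)$ in the time interval $[t, t + \Delta t]$. It is tempting to try

$$p(x, t + \Delta t) = \int p(x' + \Delta x, t + \Delta t \mid x', t)\, p(x', t)\, dx'$$

and write a Taylor Series for the integrand. This, however, is a road to nowhere: Taylor Series are useful over a limited range of values for $\Delta x$, but this formulation requires integrating over all $x'$.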
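One way to salvage this attempt is to assume the process has independent increments so the transition probabilities are state-independent, i.e. $p(x' + \Delta x, t + \Delta t \mid x', t) = f(\Delta x)$. According to our previous smoothness assumption, the fixed distribution $f(\Delta x)$ has finite support in $\Delta x$, say $|\Delta x| < \epsilon$, and so we can write

$$p(x, t + \Delta t) = \int_{-\epsilon}^{\epsilon} f(\Delta x)\, p(x - \Delta x, t)\, d\Delta x .$$

Over this small and finite range, we can perform a Taylor expansion of $p(x - \Delta x, t)$.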
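However, the independent-increments assumption represents a rather significant loss of generality, so we will see if we can avoid it. Our salvage attempt suggests a way forward: keep the term $p(x, t + \Delta t \mid x - \Delta x, t)$ and only do a Taylor expansion of the other term $p(x - \Delta x, t)$. Thus, we have for the first two Taylor Series terms:

$$p(x, t + \Delta t) \approx \int p(x, t + \Delta t \mid x - \Delta x, t) \left[ p(x, t) - \Delta x\, \frac{\partial p(x, t)}{\partial x} \right] d\Delta x .$$

Unfortunately, this expression cannot be simplified because the term $p(x, t + \Delta t \mid x - \Delta x, t)$ is not a distribution over the variable of integration $\Delta x$. With a clever modification, we can make this derivation much more tractable.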
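The normalization problem can be seen numerically. The sketch below assumes a hypothetical Gaussian transition density whose spread depends on the starting state; the function `step_std` is invented purely for illustration and is deliberately exaggerated so the effect is easy to see. Integrating the transition density over the destination state gives 1, but integrating the same density over the source state (equivalently, over $\Delta x$ with the destination held fixed) does not.

```python
import numpy as np

def step_std(x_from):
    """Hypothetical state-dependent increment scale (an assumption for illustration only)."""
    return 0.2 * (1.0 + x_from ** 2)

def transition(x_to, x_from):
    """Gaussian density for moving from x_from to x_to in one time step."""
    s = step_std(x_from)
    return np.exp(-(x_to - x_from) ** 2 / (2.0 * s ** 2)) / (np.sqrt(2.0 * np.pi) * s)

grid = np.linspace(-60.0, 60.0, 120001)
h = grid[1] - grid[0]

# Integrate over the destination with the source fixed: a proper density, so this is 1.
over_destination = np.sum(transition(grid, 0.5)) * h

# Integrate over the source with the destination fixed: not a density in this variable.
over_source = np.sum(transition(0.0, grid)) * h

print(f"integral over destination: {over_destination:.6f}")
print(f"integral over source:      {over_source:.6f}")
```

With independent increments the spread would not depend on the source state and both integrals would equal 1, which is why the salvage attempt worked in that special case.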
A More Careful Attempt
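Define $f_x(\Delta x) = p(x + \Delta x, t + \Delta t \mid x, t)$. The subscript in $f_x$ is to remind ourselves that it is defined for a specific value of $x$. Then we have

$$p(x, t + \Delta t) = \int f_{x - \Delta x}(\Delta x)\, p(x - \Delta x, t)\, d\Delta x .$$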
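Now consider the expansion

$$f_{x - \Delta x}(\Delta x)\, p(x - \Delta x, t) = f_x(\Delta x)\, p(x, t) - \Delta x\, \frac{\partial}{\partial x}\big[ f_x(\Delta x)\, p(x, t) \big] + \frac{(\Delta x)^2}{2}\, \frac{\partial^2}{\partial x^2}\big[ f_x(\Delta x)\, p(x, t) \big] - \cdots ,$$

which is a Taylor expansion in the starting state only, with the increment argument $\Delta x$ of $f$ held fixed. We have to determine if this avoids the pitfalls that we ran into in our earlier attempts. First, note that $\int f_x(\Delta x)\, d\Delta x = 1$, since $f_x$ is a distribution over the increment for a fixed starting state $x$. Define

$$\mu(x, t) = \int \Delta x\, f_x(\Delta x)\, d\Delta x$$

and

$$\sigma^2(x, t) = \int (\Delta x)^2\, f_x(\Delta x)\, d\Delta x .$$

Integrating the expansion term by term over $\Delta x$, we have:

$$p(x, t + \Delta t) = p(x, t) - \frac{\partial}{\partial x}\big[ \mu(x, t)\, p(x, t) \big] + \frac{1}{2}\, \frac{\partial^2}{\partial x^2}\big[ \sigma^2(x, t)\, p(x, t) \big] - \cdots .$$

Note that both $\mu(x, t)$ and $\sigma^2(x, t)$ vanish as $\Delta t \to 0$, and the limits $a(x, t) = \lim_{\Delta t \to 0} \mu(x, t) / \Delta t$ and $b(x, t) = \lim_{\Delta t \to 0} \sigma^2(x, t) / \Delta t$, when they are non-zero, have natural physical interpretations as the drift rate and diffusion rate of the process $X(t)$.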
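To make the drift and diffusion rates concrete, the sketch below computes the first two moments of the increment for an Ornstein-Uhlenbeck process, an example chosen here for illustration because its transition density is known in closed form: starting from $x$, the state after $\Delta t$ is Gaussian with mean $x e^{-\theta \Delta t}$ and variance $\sigma^2 (1 - e^{-2\theta \Delta t}) / (2\theta)$. The parameter values and the helper name `increment_moments` are assumptions.

```python
import numpy as np

theta, sigma, x = 1.5, 0.8, 2.0   # hypothetical OU parameters and starting state

def increment_moments(dt):
    """First and second moments of the increment over dt, from the exact OU transition."""
    mean = x * np.expm1(-theta * dt)                                 # E[increment]
    var = sigma ** 2 * (-np.expm1(-2.0 * theta * dt)) / (2.0 * theta)  # Var[increment]
    return mean, var + mean ** 2                                     # E[dx], E[dx^2]

# Both moments shrink with dt, but their ratios to dt settle to finite limits.
for dt in (1e-1, 1e-2, 1e-3, 1e-4):
    mu, second = increment_moments(dt)
    print(f"dt={dt:.0e}  mu={mu:+.6f}  mu/dt={mu / dt:+.6f}  m2/dt={second / dt:.6f}")

small_dt = 1e-6
mu, second = increment_moments(small_dt)
drift_rate = mu / small_dt          # approaches -theta * x
diffusion_rate = second / small_dt  # approaches sigma ** 2
```

For these parameters the ratios approach a drift rate of $-\theta x = -3$ and a diffusion rate of $\sigma^2 = 0.64$, while the raw moments themselves go to zero.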
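Thus, by keeping only the first two derivative terms in the Taylor expansion, dividing by $\Delta t$ and letting $\Delta t \to 0$, we finally have the famous Fokker-Planck equation, also known as the Kolmogorov forward equation:

$$\frac{\partial p(x, t)}{\partial t} = -\frac{\partial}{\partial x}\big[ a(x, t)\, p(x, t) \big] + \frac{1}{2}\, \frac{\partial^2}{\partial x^2}\big[ b(x, t)\, p(x, t) \big] ,$$

with drift rate $a(x, t)$ and diffusion rate $b(x, t)$.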
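A minimal numerical sketch of the Fokker-Planck equation follows, assuming it takes the standard form $\partial p / \partial t = -\partial [a p] / \partial x + \tfrac{1}{2}\, \partial^2 [b p] / \partial x^2$. The example process (Ornstein-Uhlenbeck with drift rate $a(x) = -\theta x$ and constant diffusion rate $b = \sigma^2$), the grid, and the explicit finite-difference scheme are all assumptions made for illustration, not a recommended solver. The computed mean and variance are compared with the known closed-form values for this process.

```python
import numpy as np

theta, sigma = 1.5, 0.8              # hypothetical Ornstein-Uhlenbeck parameters
x = np.linspace(-5.0, 5.0, 201)      # spatial grid
h = x[1] - x[0]
dt, steps = 1e-3, 1000               # explicit Euler time steps up to T = 1

a = -theta * x                       # drift rate a(x)
b = sigma ** 2 * np.ones_like(x)     # diffusion rate b(x)

# Initial density: a narrow Gaussian centred at mean0.
mean0, std0 = 1.0, 0.3
p = np.exp(-(x - mean0) ** 2 / (2.0 * std0 ** 2)) / (np.sqrt(2.0 * np.pi) * std0)

for _ in range(steps):
    ap, bp = a * p, b * p
    d1 = np.zeros_like(p)            # central difference of a*p (end points left at zero)
    d2 = np.zeros_like(p)            # central second difference of b*p
    d1[1:-1] = (ap[2:] - ap[:-2]) / (2.0 * h)
    d2[1:-1] = (bp[2:] - 2.0 * bp[1:-1] + bp[:-2]) / h ** 2
    p = p + dt * (-d1 + 0.5 * d2)

T = dt * steps
mass = np.sum(p) * h
mean_num = np.sum(x * p) * h
var_num = np.sum(x ** 2 * p) * h - mean_num ** 2

# Closed-form mean and variance of the OU process at time T.
mean_exact = mean0 * np.exp(-theta * T)
var_exact = (std0 ** 2 * np.exp(-2.0 * theta * T)
             + sigma ** 2 * (1.0 - np.exp(-2.0 * theta * T)) / (2.0 * theta))

print(f"mass={mass:.6f}")
print(f"mean: numeric={mean_num:.5f} exact={mean_exact:.5f}")
print(f"var:  numeric={var_num:.5f} exact={var_exact:.5f}")
```

The time step is chosen small relative to $h^2 / b$ so that the explicit scheme stays stable; a coarser time step or a finer grid would need an implicit method instead.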