Informal Introduction to the Gaussian Distribution – 1: Central Limits

Consider a random variable X obtained from a random experiment E, with mean \mu, variance \sigma^2, and density function f_X(x).

First- and Second-order approximations. The mean and variance provide a simple, partial statistical description of the random variable X that is easy to understand intuitively: the mean is the center of mass of the distribution f_X(x), while the standard deviation \sigma is a measure of the spread of the distribution away from the mean. The complete statistical description of X is of course provided by the density function f_X(x).

Specifying a distribution by its moments. An alternative statistical description of a random variable is in terms of its moments: \mu_n^n \doteq E \left[ X^n \right],~n=1,2, \dots \infty. To understand the moments of a distribution intuitively, consider the characteristic function \Phi_X(\omega) \doteq E \left[ e^{j \omega X} \right]. Mathematically, the characteristic function is the Fourier transform of the density function f_X(x). For low “frequencies” \omega, we can approximate the characteristic function by a Taylor Series: \Phi_X(\omega) \equiv 1 + j\omega \mu - \frac{1}{2} \omega^2 \left( \mu^2 + \sigma^2 \right) - \frac{1}{3!} j \omega^3 \mu_3^3 + \dots.

Roughly speaking, the lower-order moments provide a coarse, “low frequency” approximation to the distribution, and higher-order moments supply finer-grained “high-frequency” details.
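
As a quick numerical sanity check (a sketch of our own, using an Exponential(1) random variable purely for illustration, with \mu = 1, E[X^2] = 2, E[X^3] = 6 and the closed form \Phi_X(\omega) = \frac{1}{1 - j\omega}), the snippet below compares the exact characteristic function against the Taylor approximation built from the first three moments; the gap should shrink rapidly as \omega \rightarrow 0:

```python
# Numerical check of the low-frequency Taylor expansion of the
# characteristic function, using an Exponential(1) random variable as a
# concrete example: mu = 1, E[X^2] = mu^2 + sigma^2 = 2, E[X^3] = 6,
# and the closed form Phi_X(w) = 1 / (1 - jw).
m1, m2, m3 = 1.0, 2.0, 6.0  # first three raw moments of Exponential(1)

for w in [0.5, 0.1, 0.01]:
    exact = 1.0 / (1.0 - 1j * w)                       # closed-form Phi_X(w)
    taylor = 1 + 1j * w * m1 - 0.5 * w**2 * m2 - (1j * w**3 / 6) * m3
    print(f"w = {w:4}: |exact - Taylor| = {abs(exact - taylor):.2e}")
```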

The Law of Large Numbers. Consider N independent repetitions of the experiment E resulting in the iid sequence of random variables X_1,~X_2,~\dots,~X_N. The sample mean random variable S \doteq \frac{1}{N} \sum_{i=1}^N X_i has mean \mu and variance \frac{\sigma^2}{N}.

Clearly, since the variance of S vanishes as N \rightarrow \infty, the random variable S converges to its mean. This is also easily confirmed from \Phi_S(\omega) \equiv \left ( \Phi_X \left( \frac{\omega}{N} \right) \right)^N \equiv \left (  1 + j \frac{\omega}{N} \mu - \frac{1}{2} \frac{\omega^2}{N^2} \left( \mu^2 + \sigma^2 \right) + \dots \right)^N \rightarrow e^{j \omega \mu}, which is precisely the characteristic function of the constant \mu. This is one version of the famous Law of Large Numbers (LLN).
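
Here is a minimal simulation sketch of the LLN (the Exponential(1) inputs and sample sizes are our choices, for illustration): the empirical variance of S should track \frac{\sigma^2}{N} as N grows:

```python
import numpy as np

# A simulation of the LLN: sample means of N iid Exponential(1) variables
# (mu = 1, sigma^2 = 1) concentrate around mu, with an empirical variance
# tracking sigma^2 / N.
rng = np.random.default_rng(0)
mu, sigma2 = 1.0, 1.0

for N in [10, 100, 1000]:
    S = rng.exponential(1.0, size=(10000, N)).mean(axis=1)  # 10000 draws of S
    print(f"N = {N:4d}: mean(S) = {S.mean():.4f}, "
          f"var(S) = {S.var():.2e}, sigma^2/N = {sigma2 / N:.2e}")
```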

Deviations from the Mean. The LLN represents a first-order approximation to the distribution of the sample mean S. To refine this approximation and look at how S is distributed around its mean, consider the “centered” random variable \tilde{Y} \doteq S - \mu \equiv \frac{1}{N} \sum_{i=1}^N \left( X_i - \mu \right). This random variable has the characteristic function \Phi_{\tilde{Y}}(\omega) \equiv \left (  1 - \frac{1}{2} \frac{\omega^2}{N^2} \sigma^2  + \dots \right)^N \rightarrow 1. This is simply the LLN all over again, i.e., \tilde{Y} \rightarrow 0. It turns out that deviations from the mean, being second-order effects, are small and vanish asymptotically!

Central Limits. To prevent the deviations from the sample mean from becoming vanishingly small, we must magnify or zoom into them explicitly. Thus, we are led to define Y \doteq \sqrt{N} \left( S - \mu \right) \equiv \frac{1}{\sqrt{N}} \sum_{i=1}^N \left( X_i - \mu \right). This random variable has zero mean and finite variance \sigma^2, and its characteristic function is: \Phi_Y(\omega) \equiv \left (  1 - \frac{1}{2} \frac{\omega^2}{N} \sigma^2  + \dots \right)^N \rightarrow e^{- \frac{1}{2} \omega^2 \sigma^2}.
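
A rough Monte Carlo check of this limit, again with Exponential(1) inputs chosen only for illustration: the estimated \Phi_Y(\omega) should approach e^{- \frac{1}{2} \omega^2 \sigma^2} already for moderate N, up to simulation noise and a small residual imaginary part that vanishes as N grows:

```python
import numpy as np

# Monte Carlo check that Phi_Y(w) approaches exp(-w^2 sigma^2 / 2),
# again with Exponential(1) inputs (mu = 1, sigma^2 = 1).
rng = np.random.default_rng(1)
mu, sigma2, N, trials = 1.0, 1.0, 400, 20000

X = rng.exponential(1.0, size=(trials, N))
Y = np.sqrt(N) * (X.mean(axis=1) - mu)        # Y = sqrt(N) (S - mu)

for w in [0.5, 1.0, 2.0]:
    phi = np.exp(1j * w * Y).mean()           # Monte Carlo estimate of Phi_Y(w)
    limit = np.exp(-0.5 * w**2 * sigma2)
    print(f"w = {w}: Re Phi_Y = {phi.real:.3f} "
          f"(Im = {phi.imag:+.3f}), Gaussian limit = {limit:.3f}")
```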

This is a version of the famous Central Limit Theorem (CLT): the limiting characteristic function e^{- \frac{1}{2} \omega^2 \sigma^2} is exactly that of a zero-mean Gaussian with variance \sigma^2, so the small deviations around the sample mean of a large number of independent random variables X_i follow a Gaussian distribution regardless of the actual distribution of the X_i's!

Random Mixing Smooths over Fine Details. In fact, our simple derivation above does not require that the X_i's be identically distributed; it only requires that they be independent and share the same mean and variance.
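
A small sketch of this stronger claim (the particular mix of distributions is our choice, for illustration): exponential, uniform, and coin-flip variables, all standardized to mean 0 and variance 1, still yield an approximately standard Gaussian normalized sum:

```python
import numpy as np

# The X_i need not be identically distributed: mix exponential, uniform,
# and coin-flip variables, each standardized to mean 0 and variance 1.
# The normalized sum Y should still be approximately standard Gaussian.
rng = np.random.default_rng(2)
N, trials = 300, 20000
n = N // 3  # 100 variables from each family

exp_part = rng.exponential(1.0, (trials, n)) - 1.0            # Exp(1) - 1
uni_part = (rng.uniform(0, 1, (trials, n)) - 0.5) * np.sqrt(12)
coin_part = 2.0 * rng.integers(0, 2, (trials, n)) - 1.0       # fair +/-1 flips

Y = np.hstack([exp_part, uni_part, coin_part]).sum(axis=1) / np.sqrt(N)
print(f"mean = {Y.mean():+.3f}, var = {Y.var():.3f}")   # should be ~0 and ~1
print(f"P(Y <= 1) = {np.mean(Y <= 1):.4f} vs Gaussian 0.8413")
```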

The CLT may help explain why the Bell Curve of the Gaussian distribution is so ubiquitous in nature: for complex, multi-causal natural phenomena, when we look at the aggregate of many small independent variables, the fine details of the underlying variables tend to get obscured.

There are many Internet resources that provide nice illustrations of the CLT.
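
In that spirit, here is a minimal text-based illustration of our own (Uniform(0,1) inputs, chosen for simplicity): a histogram of Y \doteq \sqrt{N} \left( S - \mu \right) traces out the Bell Curve even though each X_i has a perfectly flat density:

```python
import numpy as np

# Text histogram of Y = sqrt(N) (S - mu) for iid Uniform(0,1) variables:
# each X_i has a flat density, yet Y traces out the Bell Curve.
rng = np.random.default_rng(3)
N, trials = 50, 50000
mu, sigma = 0.5, np.sqrt(1.0 / 12.0)   # mean and std of Uniform(0,1)

Y = np.sqrt(N) * (rng.uniform(0, 1, (trials, N)).mean(axis=1) - mu)
counts, edges = np.histogram(Y, bins=21, range=(-3 * sigma, 3 * sigma))

for c, lo in zip(counts, edges[:-1]):
    print(f"{lo:+.2f} | {'#' * int(60 * c / counts.max())}")
```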

However, it is important to recognize that the CLT is an asymptotic result and usually applies in practice as an approximation. Following the logic of the derivation above, we should expect the CLT to only account for the coarse features of the distribution; in particular, the Gaussian approximation should not be relied on to predict the probability of rare “tail events”.
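
Here is a concrete sketch of this caveat (the setup is our choice): a sum of N Exponential(1) variables is exactly Gamma-distributed, so its tail can be computed exactly and compared with the Gaussian approximation; the relative mismatch grows dramatically as we move further into the tail:

```python
from math import erfc, exp, lgamma, log, sqrt

# Tail-event caveat: S = X_1 + ... + X_N with X_i ~ Exponential(1), so
# S ~ Gamma(N) exactly. Compare the exact tail with the Gaussian
# approximation k standard deviations above the mean.
N = 100
mu, sigma = N, sqrt(N)   # Gamma(N): mean N, variance N

def gamma_tail(t, n):
    """Exact P(Gamma(n) >= t) via the Erlang/Poisson identity
    P(Gamma(n) >= t) = P(Poisson(t) <= n - 1), summed in log space."""
    return sum(exp(-t + i * log(t) - lgamma(i + 1)) for i in range(n))

for k in [2, 4, 6]:
    exact = gamma_tail(mu + k * sigma, N)
    gauss = 0.5 * erfc(k / sqrt(2))   # Gaussian tail P(Z >= k)
    print(f"{k} sigma: exact = {exact:.2e}, "
          f"Gaussian = {gauss:.2e}, ratio = {exact / gauss:.1f}")
```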

One place where the Gaussian approximation works really well is for the distribution of noise voltages in circuits. This is understandable when the noise is thermal in origin, since thermal noise is the aggregate effect of a vast number of tiny, roughly independent electron motions. Of course, noise voltages are random waveforms, and their statistical description is more complex than that of a single random variable. In particular, we need to discuss the joint distribution of multiple Gaussian random variables or, equivalently, Gaussian random vectors. This is a topic for Part 2.
