Informal Introduction to the Gaussian Distribution – 1: Central Limits

Consider a random variable X obtained from a random experiment E, with mean \mu, variance \sigma^2, and density function f_X(x).

First- and Second-order approximations. The mean and variance provide a simple, partial statistical description of the random variable X that is easy to understand intuitively: the mean is the center of mass of the distribution f_X(x), while the standard deviation \sigma is a measure of the spread of the distribution away from the mean. The complete statistical description of X is of course provided by the density function f_X(x).

Specifying a distribution by its moments. An alternative statistical description of a random variable is in terms of its moments: \mu_n^n \doteq E \left[ X^n \right],~n=1,2, \dots \infty. To understand the moments of a distribution intuitively, consider the characteristic function \Phi_X(\omega) \doteq E \left[ e^{j \omega X} \right]. Mathematically, the characteristic function is the Fourier transform of the density function f_X(x). For low “frequencies” \omega, we can approximate the characteristic function by a Taylor Series: \Phi_X(\omega) \equiv 1 + j\omega \mu - \frac{1}{2} \omega^2 \left( \mu^2 + \sigma^2 \right) - \frac{1}{3!} j \omega^3 \mu_3^3 + \dots.

Roughly speaking, the lower-order moments provide a coarse, “low frequency” approximation to the distribution, and higher-order moments supply finer-grained “high-frequency” details.
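
As a quick numerical sanity check (a sketch of our own, using an Exponential(1) random variable purely for illustration, with \mu = 1, E[X^2] = 2, E[X^3] = 6 and the closed form \Phi_X(\omega) = \frac{1}{1 - j\omega}), the snippet below compares the exact characteristic function against the Taylor approximation built from the first three moments; the gap should shrink rapidly as \omega \rightarrow 0:

```python
# Numerical check of the low-frequency Taylor expansion of the
# characteristic function, using an Exponential(1) random variable as a
# concrete example: mu = 1, E[X^2] = mu^2 + sigma^2 = 2, E[X^3] = 6,
# and the closed form Phi_X(w) = 1 / (1 - jw).
m1, m2, m3 = 1.0, 2.0, 6.0  # first three raw moments of Exponential(1)

for w in [0.5, 0.1, 0.01]:
    exact = 1.0 / (1.0 - 1j * w)                       # closed-form Phi_X(w)
    taylor = 1 + 1j * w * m1 - 0.5 * w**2 * m2 - (1j * w**3 / 6) * m3
    print(f"w = {w:4}: |exact - Taylor| = {abs(exact - taylor):.2e}")
```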

The Law of Large Numbers. Consider N independent repetitions of the experiment E resulting in the iid sequence of random variables X_1,~X_2,~\dots,~X_N. The sample mean random variable S \doteq \frac{1}{N} \sum_{i=1}^N X_i has mean \mu and variance \frac{\sigma^2}{N}.

Clearly, since the variance of S vanishes as N \rightarrow \infty, the random variable S converges to its mean. This is also easily confirmed from \Phi_S(\omega) \equiv \left ( \Phi_X \left( \frac{\omega}{N} \right) \right)^N \equiv \left (  1 + j \frac{\omega}{N} \mu - \frac{1}{2} \frac{\omega^2}{N^2} \left( \mu^2 + \sigma^2 \right) + \dots \right)^N \rightarrow e^{j \omega \mu}, which is precisely the characteristic function of the constant \mu. This is one version of the famous Law of Large Numbers (LLN).
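
Here is a minimal simulation sketch of the LLN (the Exponential(1) inputs and sample sizes are our choices, for illustration): the empirical variance of S should track \frac{\sigma^2}{N} as N grows:

```python
import numpy as np

# A simulation of the LLN: sample means of N iid Exponential(1) variables
# (mu = 1, sigma^2 = 1) concentrate around mu, with an empirical variance
# tracking sigma^2 / N.
rng = np.random.default_rng(0)
mu, sigma2 = 1.0, 1.0

for N in [10, 100, 1000]:
    S = rng.exponential(1.0, size=(10000, N)).mean(axis=1)  # 10000 draws of S
    print(f"N = {N:4d}: mean(S) = {S.mean():.4f}, "
          f"var(S) = {S.var():.2e}, sigma^2/N = {sigma2 / N:.2e}")
```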

Deviations from the Mean. The LLN represents a first-order approximation to the distribution of the sample mean S. To refine this approximation and look at how S is distributed around its mean, consider the “centered” random variable \tilde{Y} \doteq S - \mu \equiv \frac{1}{N} \sum_{i=1}^N \left( X_i - \mu \right). This random variable has the characteristic function \Phi_{\tilde{Y}}(\omega) \equiv \left (  1 - \frac{1}{2} \frac{\omega^2}{N^2} \sigma^2  + \dots \right)^N \rightarrow 1. This is simply the LLN all over again, i.e., \tilde{Y} \rightarrow 0. It turns out that deviations from the mean, being second-order effects, are small and vanish asymptotically!

Central Limits. To prevent the deviations from the sample mean from becoming vanishingly small, we must magnify or zoom into them explicitly. Thus, we are led to define Y \doteq \sqrt{N} \left( S - \mu \right) \equiv \frac{1}{\sqrt{N}} \sum_{i=1}^N \left( X_i - \mu \right). This random variable has zero mean and finite variance \sigma^2, and its characteristic function is: \Phi_Y(\omega) \equiv \left (  1 - \frac{1}{2} \frac{\omega^2}{N} \sigma^2  + \dots \right)^N \rightarrow e^{- \frac{1}{2} \omega^2 \sigma^2}.
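
A rough Monte Carlo check of this limit, again with Exponential(1) inputs chosen only for illustration: the estimated \Phi_Y(\omega) should approach e^{- \frac{1}{2} \omega^2 \sigma^2} already for moderate N, up to simulation noise and a small residual imaginary part that vanishes as N grows:

```python
import numpy as np

# Monte Carlo check that Phi_Y(w) approaches exp(-w^2 sigma^2 / 2),
# again with Exponential(1) inputs (mu = 1, sigma^2 = 1).
rng = np.random.default_rng(1)
mu, sigma2, N, trials = 1.0, 1.0, 400, 20000

X = rng.exponential(1.0, size=(trials, N))
Y = np.sqrt(N) * (X.mean(axis=1) - mu)        # Y = sqrt(N) (S - mu)

for w in [0.5, 1.0, 2.0]:
    phi = np.exp(1j * w * Y).mean()           # Monte Carlo estimate of Phi_Y(w)
    limit = np.exp(-0.5 * w**2 * sigma2)
    print(f"w = {w}: Re Phi_Y = {phi.real:.3f} "
          f"(Im = {phi.imag:+.3f}), Gaussian limit = {limit:.3f}")
```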

This is a version of the famous Central Limit Theorem (CLT): the limiting characteristic function e^{- \frac{1}{2} \omega^2 \sigma^2} is exactly that of a zero-mean Gaussian with variance \sigma^2, so the small deviations around the sample mean of a large number of independent random variables X_i follow a Gaussian distribution regardless of the actual distribution of the X_i's!

Random Mixing Smooths over Fine Details. In fact, our simple derivation above does not require that the X_i's be identically distributed; it only requires that they be independent and share the same mean and variance.
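
A small sketch of this stronger claim (the particular mix of distributions is our choice, for illustration): exponential, uniform, and coin-flip variables, all standardized to mean 0 and variance 1, still yield an approximately standard Gaussian normalized sum:

```python
import numpy as np

# The X_i need not be identically distributed: mix exponential, uniform,
# and coin-flip variables, each standardized to mean 0 and variance 1.
# The normalized sum Y should still be approximately standard Gaussian.
rng = np.random.default_rng(2)
N, trials = 300, 20000
n = N // 3  # 100 variables from each family

exp_part = rng.exponential(1.0, (trials, n)) - 1.0            # Exp(1) - 1
uni_part = (rng.uniform(0, 1, (trials, n)) - 0.5) * np.sqrt(12)
coin_part = 2.0 * rng.integers(0, 2, (trials, n)) - 1.0       # fair +/-1 flips

Y = np.hstack([exp_part, uni_part, coin_part]).sum(axis=1) / np.sqrt(N)
print(f"mean = {Y.mean():+.3f}, var = {Y.var():.3f}")   # should be ~0 and ~1
print(f"P(Y <= 1) = {np.mean(Y <= 1):.4f} vs Gaussian 0.8413")
```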

The CLT may help explain why the Bell Curve of the Gaussian distribution is so ubiquitous in nature: for complex, multi-causal natural phenomena, when we look at the aggregate of many small independent variables, the fine details of the underlying variables tend to get obscured.

There are many Internet resources that provide nice illustrations of the CLT.
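
In that spirit, here is a minimal text-based illustration of our own (Uniform(0,1) inputs, chosen for simplicity): a histogram of Y \doteq \sqrt{N} \left( S - \mu \right) traces out the Bell Curve even though each X_i has a perfectly flat density:

```python
import numpy as np

# Text histogram of Y = sqrt(N) (S - mu) for iid Uniform(0,1) variables:
# each X_i has a flat density, yet Y traces out the Bell Curve.
rng = np.random.default_rng(3)
N, trials = 50, 50000
mu, sigma = 0.5, np.sqrt(1.0 / 12.0)   # mean and std of Uniform(0,1)

Y = np.sqrt(N) * (rng.uniform(0, 1, (trials, N)).mean(axis=1) - mu)
counts, edges = np.histogram(Y, bins=21, range=(-3 * sigma, 3 * sigma))

for c, lo in zip(counts, edges[:-1]):
    print(f"{lo:+.2f} | {'#' * int(60 * c / counts.max())}")
```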

However, it is important to recognize that the CLT is an asymptotic result and usually applies in practice as an approximation. Following the logic of the derivation above, we should expect the CLT to only account for the coarse features of the distribution; in particular, the Gaussian approximation should not be relied on to predict the probability of rare “tail events”.
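
Here is a concrete sketch of this caveat (the setup is our choice): a sum of N Exponential(1) variables is exactly Gamma-distributed, so its tail can be computed exactly and compared with the Gaussian approximation; the relative mismatch grows dramatically as we move further into the tail:

```python
from math import erfc, exp, lgamma, log, sqrt

# Tail-event caveat: S = X_1 + ... + X_N with X_i ~ Exponential(1), so
# S ~ Gamma(N) exactly. Compare the exact tail with the Gaussian
# approximation k standard deviations above the mean.
N = 100
mu, sigma = N, sqrt(N)   # Gamma(N): mean N, variance N

def gamma_tail(t, n):
    """Exact P(Gamma(n) >= t) via the Erlang/Poisson identity
    P(Gamma(n) >= t) = P(Poisson(t) <= n - 1), summed in log space."""
    return sum(exp(-t + i * log(t) - lgamma(i + 1)) for i in range(n))

for k in [2, 4, 6]:
    exact = gamma_tail(mu + k * sigma, N)
    gauss = 0.5 * erfc(k / sqrt(2))   # Gaussian tail P(Z >= k)
    print(f"{k} sigma: exact = {exact:.2e}, "
          f"Gaussian = {gauss:.2e}, ratio = {exact / gauss:.1f}")
```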

One place where the Gaussian approximation works really well is for the distribution of noise voltages in circuits. This is understandable when the noise is thermal in origin, since thermal noise is the aggregate effect of a vast number of tiny, roughly independent electron motions. Of course, noise voltages are random waveforms, and their statistical description is more complex than that of a single random variable. In particular, we need to discuss the joint distribution of multiple Gaussian random variables or, equivalently, Gaussian random vectors. This is a topic for Part 2.
