The Gaussian distribution — 4: Conditioning like a Pro

In Part 3, we introduced the powerful idea of applying geometric intuition about vectors in 2D and 3D space to more abstract mathematical objects such as random variables and waveforms. We will now show how to use this geometric viewpoint to understand and visualize an important and frequently used method in Bayesian inference: finding the conditional distribution of one set of Gaussian rvs given another.

We will work with the example of a simple Markov chain to illustrate these ideas. Consider iid standard Gaussian rvs X_1,~X_2,~X_3 \sim N(0,1), and the Markov chain Y_1 \rightarrow Y_2 \rightarrow Y_3 of Gaussian rvs \underline{Y} = [Y_1,~Y_2,~Y_3]^T where Y_1 \doteq X_1;~Y_2 \doteq X_1 + X_2;~Y_3 \doteq X_1+X_2+X_3.

Clearly, \underline{Y} has zero mean and covariance matrix C_{\underline{Y}} \equiv \begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & 2 \\ 1 & 2 & 3 \end{pmatrix} . The lengths of the three vectors are |Y_1| \equiv \sigma_1 = 1,~|Y_2| \equiv \sigma_2 = \sqrt{2},~|Y_3| \equiv \sigma_3 = \sqrt{3}. The correlation coefficient between Y_1,~Y_2 is \rho_{12} = \frac{C_{12}}{\sigma_1 \sigma_2} \equiv \frac{1}{\sqrt{2}} and the angle between them is \theta_{12} \equiv \cos^{-1} \left( \rho_{12} \right) = 45^\circ. Likewise, for Y_2,~Y_3 we have \rho_{23} = \frac{C_{23}}{\sigma_2 \sigma_3} \equiv \sqrt{\frac{2}{3}}, the angle between them is \theta_{23} \equiv \cos^{-1} \left( \rho_{23} \right) \approx 35.3^\circ, and so on.
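If you would like to verify these numbers, here is a minimal NumPy sketch; the matrix A below simply encodes the definitions of Y_1, Y_2, Y_3 in terms of X_1, X_2, X_3:

```python
import numpy as np

# Y = A X, where X = [X1, X2, X3]^T are iid N(0,1)
A = np.array([[1, 0, 0],    # Y1 = X1
              [1, 1, 0],    # Y2 = X1 + X2
              [1, 1, 1]],   # Y3 = X1 + X2 + X3
             dtype=float)

C_Y = A @ A.T                          # covariance of Y, since Cov(X) = I
sigma = np.sqrt(np.diag(C_Y))          # "lengths" 1, sqrt(2), sqrt(3)
rho = C_Y / np.outer(sigma, sigma)     # correlation coefficients
theta = np.degrees(np.arccos(np.clip(rho, -1, 1)))   # angles between vectors

print(C_Y)                        # [[1 1 1], [1 2 2], [1 2 3]]
print(theta[0, 1], theta[1, 2])   # ~45.0 and ~35.3 degrees
```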

Conditioning: an easy example. As a warm-up exercise, let us find the conditional distribution of Y_2,~Y_3 given Y_1=a. This is straightforward because knowing Y_1=a tells us that X_1=a, but since X_2,~X_3 are independent of X_1, their distributions do not change. Thus we have Y_2|_{Y_1=a}=a+X_2,~Y_3|_{Y_1=a}=a+X_2+X_3. The conditional mean of [Y_2,~Y_3]^T is [a,~a]^T and the conditional covariance is \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix} .
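A quick Monte-Carlo sanity check of this warm-up (a sketch; the conditioning value a = 1.5 is an arbitrary illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
a, n = 1.5, 200_000          # conditioning value (arbitrary) and sample size

# Knowing Y1 = a is the same as knowing X1 = a; X2, X3 are unaffected
X2 = rng.standard_normal(n)
X3 = rng.standard_normal(n)
Y2_cond = a + X2
Y3_cond = a + X2 + X3

print(Y2_cond.mean(), Y3_cond.mean())          # both ~ a
print(np.cov(np.vstack([Y2_cond, Y3_cond])))   # ~ [[1, 1], [1, 2]]
```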

Brute-Force Method. Now let’s find the conditional distribution of Y_1,~Y_2 given Y_3=c. This is not as straightforward as the previous case because Y_3 is correlated with all three of X_1, X_2, X_3 and all of their distributions will change when we condition on Y_3=c.

We can, of course, always find the conditional distribution algebraically using the known joint and marginal distributions as f_{Y_1,Y_2|Y_3}\left( y_1, y_2 | Y_3=c \right) \equiv \frac{f_{Y_1, Y_2, Y_3}(y_1,y_2,c)}{f_{Y_3}(c)}. But we want to avoid this brute-force approach and would like to find a more elegant, intuitive method. We now show how to do this using geometric manipulations with vectors.
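For completeness, here is what the brute-force ratio looks like numerically (a sketch using SciPy; the conditioning value c and the evaluation point (y1, y2) are arbitrary illustrations):

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

C_Y = np.array([[1., 1., 1.],
                [1., 2., 2.],
                [1., 2., 3.]])

c = 1.0              # conditioning value Y3 = c (arbitrary)
y1, y2 = 0.2, 0.5    # point at which to evaluate the conditional density (arbitrary)

joint = multivariate_normal(mean=np.zeros(3), cov=C_Y)   # f_{Y1,Y2,Y3}
marginal = norm(loc=0.0, scale=np.sqrt(3.0))              # f_{Y3}

f_cond = joint.pdf([y1, y2, c]) / marginal.pdf(c)
print(f_cond)        # the conditional density f_{Y1,Y2|Y3}(y1, y2 | c)
```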

Conditioning Elegantly. Consider the figures below, which show the geometric relationship between Y_3,~Y_1 and Y_3,~Y_2. (Note that we cannot show all three vectors representing Y_1, Y_2, Y_3 on the same planar vector diagram and preserve geometric relationships such as angles; this is because the three rvs are linearly independent, which means the corresponding vectors are not coplanar.)

The key idea is to project the vector Y_1 onto the direction of Y_3, so that we can express Y_1 = Y'_3+W_1 as the sum of two component vectors, one (Y'_3) perfectly aligned with Y_3 and the other (W_1) perfectly orthogonal to it. By definition, Y'_3 = \gamma Y_3 for some constant \gamma. We need to find this constant in terms of the statistics of Y_1,~Y_3.

Projection using Basic Trigonometry. We now show how to do this using very elementary geometric arguments. Let \hat{u}_3 \doteq \frac{1}{\sigma_3} Y_3 denote the unit vector in the direction of Y_3. By definition, Y'_3 \equiv |Y'_3| \hat{u}_3. From the covariance matrix, \rho_{13} = \frac{C_{13}}{\sigma_1 \sigma_3} \equiv \frac{1}{\sqrt{3}}. From the trigonometry of the right-angled triangle formed by the vectors Y'_3, W_1, Y_1, we have |Y'_3| = |Y_1| \cos \theta_{13} \equiv \sigma_1 \rho_{13} and |W_1| = |Y_1| \sin \theta_{13} \equiv \sigma_1 \sqrt{1-\rho_{13}^2}, which gives \gamma \equiv \frac{\sigma_1}{\sigma_3} \rho_{13} = \frac{1}{3} and \sigma_{W_1}^2 \equiv \sigma_1^2 \left( 1-\rho_{13}^2 \right) = \frac{2}{3}.
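The same bookkeeping in code (a minimal sketch; all the quantities are read straight off the covariance matrix above):

```python
import numpy as np

C_Y = np.array([[1., 1., 1.],
                [1., 2., 2.],
                [1., 2., 3.]])
sigma = np.sqrt(np.diag(C_Y))

# Project Y1 onto Y3:  Y1 = gamma * Y3 + W1,  with W1 orthogonal to Y3
rho_13 = C_Y[0, 2] / (sigma[0] * sigma[2])       # = 1/sqrt(3)
gamma = (sigma[0] / sigma[2]) * rho_13           # = C_13 / C_33 = 1/3
var_W1 = sigma[0]**2 * (1 - rho_13**2)           # = 2/3

print(gamma, var_W1)                             # 0.333..., 0.666...
```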

Thus we have Y_1 = \frac{1}{3} Y_3 + W_1, where W_1 is independent of Y_3. Therefore Y_1|_{Y_3=c} = \frac{1}{3} c + W_1 \sim N \left( \frac{1}{3}c, \frac{2}{3} \right). Similarly, we can show that Y_2|_{Y_3=c} = \frac{2}{3} c + W_2 \sim N \left( \frac{2}{3}c, \frac{2}{3} \right), since \sigma_{W_2}^2 \equiv \sigma_2^2 \left( 1-\rho_{23}^2 \right) = \frac{2}{3}.
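A crude but reassuring Monte-Carlo check, conditioning by rejection on Y_3 falling in a narrow window around c (a sketch; c and the window half-width are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(1)
c, eps, n = 1.0, 0.02, 2_000_000   # conditioning value, window half-width, sample size

X = rng.standard_normal((n, 3))
Y1 = X[:, 0]
Y2 = X[:, 0] + X[:, 1]
Y3 = X[:, 0] + X[:, 1] + X[:, 2]

keep = np.abs(Y3 - c) < eps              # crude stand-in for conditioning on Y3 = c
print(Y1[keep].mean(), Y1[keep].var())   # ~ c/3,  ~ 2/3
print(Y2[keep].mean(), Y2[keep].var())   # ~ 2c/3, ~ 2/3
```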

Conditional Covariance. We have almost completed the task we set ourselves: to find the conditional distribution of Y_1,~Y_2 given Y_3=c. In particular, we have now calculated the means and variances of Y_1,~Y_2 and therefore the conditional marginal distributions of Y_1,~Y_2 given Y_3=c. However, to find the conditional joint distribution of Y_1,~Y_2, we also need to find their conditional covariance. This is easily done as follows: C_{1,2|3} = E \left( W_1 W_2 \right) \equiv E \left( \left( Y_1 - \frac{Y_3}{3} \right) \left( Y_2 - \frac{2Y_3}{3} \right) \right). Note that since both W_1,~W_2 are independent of Y_3, this expectation is unaffected by conditioning on Y_3=c and is easily evaluated as: C_{1,2|3} \equiv C_{\underline{Y}}(1,2) - \frac{2}{3} C_{\underline{Y}}(1,3) - \frac{1}{3} C_{\underline{Y}}(2,3) + \frac{2}{9} C_{\underline{Y}}(3,3) \equiv \frac{1}{3}.
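As a cross-check, the same numbers drop out of the standard matrix formula for Gaussian conditioning (the Schur complement), which is just the projection argument above written for a whole block of variables at once. A minimal sketch:

```python
import numpy as np

C_Y = np.array([[1., 1., 1.],
                [1., 2., 2.],
                [1., 2., 3.]])
c = 1.0                                   # conditioning value (arbitrary)

C_aa = C_Y[:2, :2]                        # Cov of (Y1, Y2)
C_ab = C_Y[:2, 2:]                        # Cov of (Y1, Y2) with Y3
C_bb = C_Y[2:, 2:]                        # Var of Y3

cond_mean = (C_ab @ np.linalg.inv(C_bb)).ravel() * c    # [c/3, 2c/3]
cond_cov = C_aa - C_ab @ np.linalg.inv(C_bb) @ C_ab.T   # [[2/3, 1/3], [1/3, 2/3]]

print(cond_mean)
print(cond_cov)
```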

These ideas can be generalized in a fairly straightforward way to conditioning on multiple random variables and these generalizations form the core of some very important and powerful techniques in statistical inference. A famous example of such a technique is the Kalman Filter. We will conclude this topic with a summary and a few further comments about applications in Part 5.
