The Gaussian distribution — 4: Conditioning like a Pro

In Part 3, we introduced the powerful idea of applying geometric intuition about vectors in 2D and 3D space to more abstract mathematical objects such as random variables and waveforms. We will now show how to use this geometric viewpoint to understand and visualize an important and frequently used method in Bayesian inference: finding the conditional distribution of one set of Gaussian rvs given another.

We will work with the example of a simple Markov chain to illustrate these ideas. Consider iid standard Gaussian rvs X_1,~X_2,~X_3 \sim N(0,1), and the Markov chain Y_1 \rightarrow Y_2 \rightarrow Y_3 of Gaussian rvs \underline{Y} = [Y_1,~Y_2,~Y_3]^T where Y_1 \doteq X_1;~Y_2 \doteq X_1 + X_2;~Y_3 \doteq X_1+X_2+X_3.

Clearly, \underline{Y} has zero mean and covariance matrix C_{\underline{Y}} \equiv \begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & 2 \\ 1 & 2 & 3 \end{pmatrix} . The lengths of the three vectors are |Y_1| \equiv \sigma_1 = 1,~|Y_2| \equiv \sigma_2 = \sqrt{2},~|Y_3| \equiv \sigma_3 = \sqrt{3}. The correlation coefficient between Y_1,~Y_2 is \rho_{12} = \frac{C_{12}}{\sigma_1 \sigma_2} \equiv \frac{1}{\sqrt{2}} and the angle between them is \theta_{12} \equiv \cos^{-1} \left( \rho_{12} \right) = 45^\circ. Likewise, for Y_2,~Y_3 we have \rho_{23} = \frac{C_{23}}{\sigma_2 \sigma_3} \equiv \sqrt{\frac{2}{3}}, the angle between them is \theta_{23} \equiv \cos^{-1} \left( \rho_{23} \right) \approx 35.3^\circ, and so on.
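If you would like to verify these numbers, here is a minimal NumPy sketch; the matrix A below simply encodes the definitions of Y_1, Y_2, Y_3 in terms of X_1, X_2, X_3:

```python
import numpy as np

# Y = A X, where X = [X1, X2, X3]^T are iid N(0,1)
A = np.array([[1, 0, 0],    # Y1 = X1
              [1, 1, 0],    # Y2 = X1 + X2
              [1, 1, 1]],   # Y3 = X1 + X2 + X3
             dtype=float)

C_Y = A @ A.T                          # covariance of Y, since Cov(X) = I
sigma = np.sqrt(np.diag(C_Y))          # "lengths" 1, sqrt(2), sqrt(3)
rho = C_Y / np.outer(sigma, sigma)     # correlation coefficients
theta = np.degrees(np.arccos(np.clip(rho, -1, 1)))   # angles between vectors

print(C_Y)                        # [[1 1 1], [1 2 2], [1 2 3]]
print(theta[0, 1], theta[1, 2])   # ~45.0 and ~35.3 degrees
```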

Conditioning: an easy example. As a warm-up exercise, let us find the conditional distribution of Y_2,~Y_3 given Y_1=a. This is straightforward because knowing Y_1=a tells us that X_1=a, but since X_2,~X_3 are independent of X_1, their distributions do not change. Thus we have Y_2|_{Y_1=a}=a+X_2,~Y_3|_{Y_1=a}=a+X_2+X_3. The conditional mean of [Y_2,~Y_3]^T is [a,~a]^T and the conditional covariance is \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix} .
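A quick Monte-Carlo sanity check of this warm-up (a sketch; the conditioning value a = 1.5 is an arbitrary illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
a, n = 1.5, 200_000          # conditioning value (arbitrary) and sample size

# Knowing Y1 = a is the same as knowing X1 = a; X2, X3 are unaffected
X2 = rng.standard_normal(n)
X3 = rng.standard_normal(n)
Y2_cond = a + X2
Y3_cond = a + X2 + X3

print(Y2_cond.mean(), Y3_cond.mean())          # both ~ a
print(np.cov(np.vstack([Y2_cond, Y3_cond])))   # ~ [[1, 1], [1, 2]]
```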

Brute-Force Method. Now let’s find the conditional distribution of Y_1,~Y_2 given Y_3=c. This is not as straightforward as the previous case because Y_3 is correlated with all three of X_1, X_2, X_3 and all of their distributions will change when we condition on Y_3=c.

We can, of course, always find the conditional distribution algebraically using the known joint and marginal distributions as f_{Y_1,Y_2|Y_3}\left( y_1, y_2 | Y_3=c \right) \equiv \frac{f_{Y_1, Y_2, Y_3}(y_1,y_2,c)}{f_{Y_3}(c)}. But we want to avoid this brute-force approach and would like to find a more elegant, intuitive method. We now show how to do this using geometric manipulations with vectors.
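For completeness, here is what the brute-force ratio looks like numerically (a sketch using SciPy; the conditioning value c and the evaluation point (y1, y2) are arbitrary illustrations):

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

C_Y = np.array([[1., 1., 1.],
                [1., 2., 2.],
                [1., 2., 3.]])

c = 1.0              # conditioning value Y3 = c (arbitrary)
y1, y2 = 0.2, 0.5    # point at which to evaluate the conditional density (arbitrary)

joint = multivariate_normal(mean=np.zeros(3), cov=C_Y)   # f_{Y1,Y2,Y3}
marginal = norm(loc=0.0, scale=np.sqrt(3.0))              # f_{Y3}

f_cond = joint.pdf([y1, y2, c]) / marginal.pdf(c)
print(f_cond)        # the conditional density f_{Y1,Y2|Y3}(y1, y2 | c)
```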

Conditioning Elegantly. Consider the figures below, which show the geometric relationship between Y_3,~Y_1 and Y_3,~Y_2. (Note that we cannot show all three vectors representing Y_1, Y_2, Y_3 on the same planar vector diagram and preserve geometric relationships such as angles; this is because the three rvs are linearly independent, which means the corresponding vectors are not coplanar.)

The key idea is to project the vector Y_1 onto the direction of Y_3, so that we can express Y_1 = Y'_3+W_1 as the sum of two component vectors, one (Y'_3) perfectly aligned with Y_3 and the other (W_1) perfectly orthogonal to it. By definition, Y'_3 = \gamma Y_3 for some constant \gamma. We need to find this constant in terms of the statistics of Y_1,~Y_3.

Projection using Basic Trigonometry. We now show how to do this using very elementary geometric arguments. Let \hat{u}_3 \doteq \frac{1}{\sigma_3} Y_3 denote the unit vector in the direction of Y_3. By definition, Y'_3 \equiv |Y'_3| \hat{u}_3. From the covariance matrix, \rho_{13} = \frac{C_{13}}{\sigma_1 \sigma_3} \equiv \frac{1}{\sqrt{3}}. From the trigonometry of the right-angled triangle formed by the vectors Y'_3, W_1, Y_1, we have |Y'_3| = |Y_1| \cos \theta_{13} \equiv \sigma_1 \rho_{13} and |W_1| = |Y_1| \sin \theta_{13} \equiv \sigma_1 \sqrt{1-\rho_{13}^2}, which gives \gamma \equiv \frac{\sigma_1}{\sigma_3} \rho_{13} = \frac{1}{3} and \sigma_{W_1}^2 \equiv \sigma_1^2 \left( 1-\rho_{13}^2 \right) = \frac{2}{3}.
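The same bookkeeping in code (a minimal sketch; all the quantities are read straight off the covariance matrix above):

```python
import numpy as np

C_Y = np.array([[1., 1., 1.],
                [1., 2., 2.],
                [1., 2., 3.]])
sigma = np.sqrt(np.diag(C_Y))

# Project Y1 onto Y3:  Y1 = gamma * Y3 + W1,  with W1 orthogonal to Y3
rho_13 = C_Y[0, 2] / (sigma[0] * sigma[2])       # = 1/sqrt(3)
gamma = (sigma[0] / sigma[2]) * rho_13           # = C_13 / C_33 = 1/3
var_W1 = sigma[0]**2 * (1 - rho_13**2)           # = 2/3

print(gamma, var_W1)                             # 0.333..., 0.666...
```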

Thus we have Y_1 = \frac{1}{3} Y_3 + W_1, where W_1 is independent of Y_3. Therefore Y_1|_{Y_3=c} = \frac{1}{3} c + W_1 \sim N \left( \frac{1}{3}c, \frac{2}{3} \right). Similarly, we can show that Y_2|_{Y_3=c} = \frac{2}{3} c + W_2 \sim N \left( \frac{2}{3}c, \frac{2}{3} \right), since \sigma_{W_2}^2 \equiv \sigma_2^2 \left( 1-\rho_{23}^2 \right) = \frac{2}{3}.
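A crude but reassuring Monte-Carlo check, conditioning by rejection on Y_3 falling in a narrow window around c (a sketch; c and the window half-width are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(1)
c, eps, n = 1.0, 0.02, 2_000_000   # conditioning value, window half-width, sample size

X = rng.standard_normal((n, 3))
Y1 = X[:, 0]
Y2 = X[:, 0] + X[:, 1]
Y3 = X[:, 0] + X[:, 1] + X[:, 2]

keep = np.abs(Y3 - c) < eps              # crude stand-in for conditioning on Y3 = c
print(Y1[keep].mean(), Y1[keep].var())   # ~ c/3,  ~ 2/3
print(Y2[keep].mean(), Y2[keep].var())   # ~ 2c/3, ~ 2/3
```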

Conditional Covariance. We have almost completed the task we set ourselves: to find the conditional distribution of Y_1,~Y_2 given Y_3=c. In particular, we have now calculated the means and variances of Y_1,~Y_2 and therefore the conditional marginal distributions of Y_1,~Y_2 given Y_3=c. However, to find the conditional joint distribution of Y_1,~Y_2, we also need to find their conditional covariance. This is easily done as follows: C_{1,2|3} = E \left( W_1 W_2 \right) \equiv E \left( \left( Y_1 - \frac{Y_3}{3} \right) \left( Y_2 - \frac{2Y_3}{3} \right) \right). Note that since both W_1,~W_2 are independent of Y_3, this expectation is unaffected by conditioning on Y_3=c and is easily evaluated as: C_{1,2|3} \equiv C_{\underline{Y}}(1,2) - \frac{2}{3} C_{\underline{Y}}(1,3) - \frac{1}{3} C_{\underline{Y}}(2,3) + \frac{2}{9} C_{\underline{Y}}(3,3) \equiv \frac{1}{3}.
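As a cross-check, the same numbers drop out of the standard matrix formula for Gaussian conditioning (the Schur complement), which is just the projection argument above written for a whole block of variables at once. A minimal sketch:

```python
import numpy as np

C_Y = np.array([[1., 1., 1.],
                [1., 2., 2.],
                [1., 2., 3.]])
c = 1.0                                   # conditioning value (arbitrary)

C_aa = C_Y[:2, :2]                        # Cov of (Y1, Y2)
C_ab = C_Y[:2, 2:]                        # Cov of (Y1, Y2) with Y3
C_bb = C_Y[2:, 2:]                        # Var of Y3

cond_mean = (C_ab @ np.linalg.inv(C_bb)).ravel() * c    # [c/3, 2c/3]
cond_cov = C_aa - C_ab @ np.linalg.inv(C_bb) @ C_ab.T   # [[2/3, 1/3], [1/3, 2/3]]

print(cond_mean)
print(cond_cov)
```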

These ideas can be generalized in a fairly straightforward way to conditioning on multiple random variables and these generalizations form the core of some very important and powerful techniques in statistical inference. A famous example of such a technique is the Kalman Filter. We will conclude this topic with a summary and a few further comments about applications in Part 5.
