###### 3.6.2 Multicollinear Random Vectors

Suppose we are analyzing the risk in a natural gas trading portfolio. Random variables represent tomorrow’s values for each price the portfolio is exposed to. The portfolio holds New York Mercantile Exchange (NYMEX) Henry Hub futures out to 24 months, so there are 24 futures prices. It also has forward positions out to 18 months for 30 delivery points, for another 540 prices. In total, our model depends upon a vector of 564 random variables!

Based upon a time series analysis of historical price data, we construct a 564×564 covariance matrix for our random vector of prices. Gazing at the 318,096 variances and covariances of our matrix, we wonder: Do we really need all these numbers?

Intuitively, we know that the random variables are interdependent. Prices for 6-month and 7-month Transco Zone 2 delivery are highly correlated. So are 3-month prices for adjacent Transco Zones 1 and 2. Because of such interdependencies, it is conceivable that our random vector is singular, but this is probably not the case. Singularity arises infrequently in applications. A more common situation is “almost” singularity, which is known as **multicollinearity**.

We illustrate with two four-dimensional random vectors. Random vector $\boldsymbol{X}$ is singular. Its first three components $X_1$, $X_2$, and $X_3$ are uncorrelated, each with mean 0 and standard deviation 1. The fourth component $X_4$ equals $X_1 + X_2 + X_3$. The covariance matrix for $\boldsymbol{X}$ is

$$
\boldsymbol{\Sigma}_{\boldsymbol{X}} =
\begin{pmatrix}
1 & 0 & 0 & 1 \\
0 & 1 & 0 & 1 \\
0 & 0 & 1 & 1 \\
1 & 1 & 1 & 3
\end{pmatrix}
$$

[3.51]

Random vector $\boldsymbol{Z}$ is multicollinear. Like $\boldsymbol{X}$, its first three components $Z_1$, $Z_2$, and $Z_3$ are uncorrelated, each with mean 0 and standard deviation 1. The fourth component $Z_4$ equals $Z_1 + Z_2 + Z_3 + E$, where $E$ is a “noise” random variable that is uncorrelated with $Z_1$, $Z_2$, and $Z_3$ and has mean 0 and standard deviation .001. Except for the addition of “noise” $E$, our random vector $\boldsymbol{Z}$ is identical to our random vector $\boldsymbol{X}$. Its covariance matrix is

$$
\boldsymbol{\Sigma}_{\boldsymbol{Z}} =
\begin{pmatrix}
1 & 0 & 0 & 1 \\
0 & 1 & 0 & 1 \\
0 & 0 & 1 & 1 \\
1 & 1 & 1 & 3.000001
\end{pmatrix}
$$

[3.52]

The covariance matrix of $\boldsymbol{X}$ is singular. It has determinant 0. The covariance matrix of $\boldsymbol{Z}$ is not singular, but with a determinant of .000001, it is “almost” singular. The random variable $Z_4$ is almost a linear polynomial of $Z_1$, $Z_2$, and $Z_3$, but not quite. We added just enough random “noise” to make it linearly independent. We say a random vector is **multicollinear** if it is “almost” singular in this sense.
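The two determinants quoted above can be checked directly from the definitions of the two random vectors. The following is a minimal sketch, not part of the original text: the helper names `determinant` and `covariance` are hypothetical, and exact rational arithmetic is used only so that the tiny determinant is not blurred by floating-point rounding.

```python
from fractions import Fraction


def determinant(matrix):
    """Determinant by Gaussian elimination in exact rational arithmetic."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    det = Fraction(1)
    for col in range(n):
        # Look for a nonzero pivot at or below the diagonal in this column.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # no pivot available: the matrix is singular
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # swapping two rows flips the sign of the determinant
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def covariance(noise_variance):
    """Covariance matrix of (V1, V2, V3, V1 + V2 + V3 + E).

    V1, V2, V3 are uncorrelated with variance 1, and E is uncorrelated
    noise with the given variance (0 reproduces the singular vector X).
    """
    fourth_variance = Fraction(3) + noise_variance
    return [
        [1, 0, 0, 1],
        [0, 1, 0, 1],
        [0, 0, 1, 1],
        [1, 1, 1, fourth_variance],
    ]


sigma_x = covariance(Fraction(0))
# Standard deviation .001 means variance .001 ** 2 = 1/1,000,000.
sigma_z = covariance(Fraction(1, 1_000_000))

det_x = determinant(sigma_x)
det_z = determinant(sigma_z)
print(det_x, det_z)
```

Under these assumptions, the determinant of the second matrix is simply the variance of the noise term, which is why such a small amount of noise leaves the matrix so close to singular.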

Realizations of a multicollinear random vector tend to cluster near a plane within $\mathbb{R}^n$. They don’t all lie in that plane, but they “almost” do. This is illustrated with realizations of a two-dimensional multicollinear random vector $\boldsymbol{Z}$ in Exhibit 3.8. Component $Z_2$ is “almost” a linear polynomial of component $Z_1$.
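The clustering described above can be reproduced with a small simulation. This is only a sketch under stated assumptions: the relationship $Z_2 = Z_1 + E$, the noise standard deviation `NOISE_SD`, and the sample size `N` are hypothetical choices for illustration and are not the parameters behind Exhibit 3.8. In two dimensions, the "plane" is a line.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible

NOISE_SD = 0.05  # hypothetical noise level, not taken from the exhibit
N = 1000         # hypothetical number of realizations

# Z1 is standard normal; Z2 equals Z1 plus a small amount of independent noise.
z1 = [random.gauss(0.0, 1.0) for _ in range(N)]
z2 = [x + random.gauss(0.0, NOISE_SD) for x in z1]


def sample_correlation(xs, ys):
    """Pearson correlation of two equally long samples."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rho = sample_correlation(z1, z2)

# Determinant of the 2x2 correlation matrix [[1, rho], [rho, 1]].
det_rho = 1.0 - rho ** 2

# Largest vertical distance of any realization from the line z2 = z1.
max_gap = max(abs(y - x) for x, y in zip(z1, z2))

print(f"correlation = {rho:.5f}, |rho matrix| = {det_rho:.5f}, max gap = {max_gap:.3f}")
```

The realizations do not lie exactly on the line, since the gap is positive, but the correlation is close to 1 and the determinant of the correlation matrix is close to 0.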

We may think of a random vector $\boldsymbol{Z}$ as being “almost” singular if its covariance matrix has a determinant $|\boldsymbol{\Sigma}_{\boldsymbol{Z}}|$ close to 0. In practical applications, the magnitude of this determinant will depend upon the units in which components of $\boldsymbol{Z}$ are measured. A more reasonable test for multicollinearity is to consider the determinant $|\boldsymbol{\rho}_{\boldsymbol{Z}}|$ of the correlation matrix of $\boldsymbol{Z}$. This determinant will always be in the interval [0,1]. If it is close to 0, this is an indication of multicollinearity. Obviously, if it equals 0, $\boldsymbol{Z}$ is singular.

As we have seen, the dimensionality of a singular random vector $\boldsymbol{X}$ can be reduced with a simple change of variables. No information is lost, as we only eliminate extraneous random variables. Multicollinearity is more problematic. Reducing the dimensionality of a multicollinear random vector $\boldsymbol{Z}$ requires an approximation that somehow identifies and discards minor randomness that is preventing the covariance matrix from being singular.

This is the situation we face with our natural gas portfolio. We feel confident that the natural gas market can reasonably be modeled with fewer than 564 random variables, but we can’t arbitrarily discard random variables! If our covariance matrix isn’t singular, how can we replace our 564 random variables with a smaller set that conveys essentially the same information? Principal component analysis will provide a solution.
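The earlier remark that the determinant of a covariance matrix depends on units, while the determinant of the correlation matrix does not, can be illustrated with a two-dimensional example. This is a sketch under assumptions: the covariance values and the dollars-to-cents rescaling are hypothetical, and the helpers `det2`, `rescale`, and `correlation` are introduced here only for illustration.

```python
import math


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def rescale(cov, factors):
    """Covariance matrix after multiplying component i by factors[i]."""
    n = len(cov)
    return [[factors[i] * factors[j] * cov[i][j] for j in range(n)] for i in range(n)]


def correlation(cov):
    """Correlation matrix implied by a covariance matrix."""
    n = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(n)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(n)] for i in range(n)]


# Hypothetical covariance matrix of two prices quoted in dollars:
# the second price equals the first plus noise with standard deviation 0.05.
cov_dollars = [[1.0, 1.0],
               [1.0, 1.0025]]

# The same two prices quoted in cents (each component multiplied by 100).
cov_cents = rescale(cov_dollars, [100.0, 100.0])

det_cov_dollars = det2(cov_dollars)
det_cov_cents = det2(cov_cents)
det_corr_dollars = det2(correlation(cov_dollars))
det_corr_cents = det2(correlation(cov_cents))

print(det_cov_dollars, det_cov_cents)    # differ by a factor of 100 ** 4
print(det_corr_dollars, det_corr_cents)  # agree up to rounding
```

Judged by the covariance determinant alone, the same pair of prices looks "almost" singular in dollars and far from singular in cents. The correlation determinant gives the same answer in both cases.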

###### Exercises

True or false:

- A covariance matrix is singular if and only if it is positive definite.
- A covariance matrix is nonsingular if and only if it is positive semidefinite.
- Every random vector has a positive semidefinite covariance matrix.

Which of the following covariance matrices **Σ** are singular? Which are multicollinear?

[3.53]

[3.54]

[3.55]