Lesson 6: Properties of the Sample Mean Vector and Sample Correlation

Introduction

In this lesson we consider the properties of the sample mean vector and the sample correlation which we had defined earlier.

Throughout this lesson we will be making the assumption that our random vectors

X1, X2,...,Xn

are independently sampled from a population with mean vector μ and variance-covariance matrix Σ. Unless stated otherwise, we will not assume that the data are sampled from a multivariate normal distribution.

We first shall consider the properties of the sample mean vector. The following shall compare and contrast the properties of the sample mean vector in the multivariate setting with the properties of the sample mean in the univariate setting.

Lesson Objectives:

Upon completion of this lesson, you should be able to answer the following questions regarding sample mean vectors and correlations.

Sample Mean Vectors

Sample Correlations


Inferences for Sample Mean Vectors

Since the sample mean vector x̄ is a function of the random data, x̄ is also a random vector and hence has a mean, a variance-covariance matrix, and a distribution. We have already seen that the mean of the sample mean vector is equal to the population mean vector μ.

1. Variance

Before considering the sample variance-covariance matrix for the mean vector x̄, let's consider the univariate setting.

Univariate Setting: You should recall from introductory statistics that the population variance of the sample mean x̄, generated from independent samples of size n, is equal to the population variance σ² divided by n:

$$\operatorname{var}(\bar{x}) = \frac{\sigma^2}{n}$$

This, of course, is a function of the unknown population variance σ². We can estimate it by substituting the sample variance s² for σ², yielding our estimate of the variance of the sample mean, s² over n, as shown below:

$$\widehat{\operatorname{var}}(\bar{x}) = \frac{s^2}{n}$$

If we were to take the square root of this quantity we would obtain the standard error of the mean. The standard error of the mean is a measure of the uncertainty of our estimate of the population mean. If the standard error is large, then we are uncertain of our estimate of the mean. Conversely, if the standard error is small, then we are more confident in our estimate. What is meant by large or small depends on the application at hand. But in any case, since the standard error is a decreasing function of sample size, the larger our sample the more certain we can be of our estimate of the population mean.

Multivariate Setting: The population variance-covariance matrix of the sample mean vector x̄ replaces the variance of the sample mean, and takes a form similar to what was seen in the univariate setting. That is, for independent samples of size n, the variance-covariance matrix of x̄ is equal to 1 over n times the population variance-covariance matrix of the individual observations, as shown below:

$$\operatorname{var}(\bar{\mathbf{x}}) = \frac{1}{n}\Sigma$$

Again, this is a function of the unknown population variance-covariance matrix Σ. An estimate of the variance-covariance matrix of x̄ can be obtained by substituting the sample variance-covariance matrix S for the population variance-covariance matrix Σ, yielding the estimate shown below:

$$\widehat{\operatorname{var}}(\bar{\mathbf{x}}) = \frac{1}{n}S$$
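The estimate S/n can be computed directly from a data matrix. The following is a minimal sketch in plain Python, not the SAS program used in this course; the function names and the small data set are hypothetical and chosen only for illustration. It assumes the sample variance-covariance matrix S uses the usual divisor n - 1.

```python
from statistics import fmean


def sample_mean_vector(data):
    """Column means of an n x p data set given as a list of rows."""
    p = len(data[0])
    return [fmean(row[j] for row in data) for j in range(p)]


def sample_covariance_matrix(data):
    """Sample variance-covariance matrix S, using the divisor n - 1."""
    n = len(data)
    p = len(data[0])
    xbar = sample_mean_vector(data)
    return [
        [
            sum((row[j] - xbar[j]) * (row[k] - xbar[k]) for row in data) / (n - 1)
            for k in range(p)
        ]
        for j in range(p)
    ]


def estimated_covariance_of_mean(data):
    """Estimated variance-covariance matrix of the sample mean vector: S / n."""
    n = len(data)
    S = sample_covariance_matrix(data)
    return [[s_jk / n for s_jk in row] for row in S]


# Hypothetical data: n = 4 observations on p = 2 variables.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 5.0], [4.0, 9.0]]
xbar = sample_mean_vector(data)
cov_of_mean = estimated_covariance_of_mean(data)
# Square roots of the diagonal give the standard error of each component mean.
standard_errors = [cov_of_mean[j][j] ** 0.5 for j in range(len(xbar))]
```

The diagonal of S/n holds the estimated variance of each component of x̄, so its square roots are the familiar univariate standard errors.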

2. Distribution

Let's consider the distribution of the sample mean vector, first looking at the univariate setting and comparing this to the multivariate setting.

Univariate Setting: Here we are going to make the additional assumption that X1, X2,...,Xn are independently sampled from a normal distribution with mean μ and variance σ². Then, in this case, the sample mean x̄ is also normally distributed, with mean μ but with variance σ² over n. Mathematically we use the notation shown below:

$$\bar{x} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$

Here x̄ is followed by the symbol 'tilde', which means 'is distributed as'. N stands for the normal distribution, and in parentheses we have the mean followed by the variance of the normally distributed sample mean.

Multivariate Setting: Similarly, for the multivariate setting, we are going to assume that the data vectors X1, X2,...,Xn are independently sampled from a multivariate normal distribution with mean vector μ and variance-covariance matrix Σ. Then, in this case, the sample mean vector x̄ is distributed as multivariate normal with mean vector μ and variance-covariance matrix (1/n)Σ, the variance-covariance matrix for x̄. In statistical notation we write:

$$\bar{\mathbf{x}} \sim N\left(\boldsymbol{\mu}, \frac{1}{n}\Sigma\right)$$

3. Law of Large Numbers

At this point we will drop the assumption that the individual observations are sampled from a normal distribution and look at laws of large numbers. These will hold regardless of the distribution of the individual observations.

Univariate Setting: In the univariate setting, we see that if the data are independently sampled, then the sample mean x̄ is going to converge (in probability) to the population mean μ. What does this mean exactly? It means that as the sample size gets larger and larger, the sample mean will tend to approach the true value of the population mean μ.

Multivariate Setting: A similar result holds in the multivariate setting: the sample mean vector x̄ will also converge (in probability) to the mean vector μ. That is, as our sample size gets larger and larger, each of the individual components of that vector, x̄j, will converge to the corresponding mean, μj.
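A small simulation can make this convergence concrete. The sketch below is illustrative only: the two-component population (a uniform and an exponential component, both with mean 0.5), the seed, and the function name are assumptions made for this example, not part of the lesson's data.

```python
import random


def simulated_mean_vector(n, rng):
    """Mean of n independent draws of a 2-component random vector.

    Component 1 is Uniform(0, 1) and component 2 is Exponential with
    rate 2, so both have population mean 0.5. Neither is normal.
    """
    total_1 = 0.0
    total_2 = 0.0
    for _ in range(n):
        total_1 += rng.random()
        total_2 += rng.expovariate(2.0)
    return [total_1 / n, total_2 / n]


rng = random.Random(505)  # fixed seed so the run is reproducible
mu = [0.5, 0.5]
for n in (10, 1_000, 100_000):
    xbar = simulated_mean_vector(n, rng)
    # Absolute error of each component of the sample mean vector.
    errors = [abs(xbar[j] - mu[j]) for j in range(2)]
    print(n, [round(e, 4) for e in errors])
```

The printed errors should generally shrink as n grows, component by component, even though neither component is normally distributed.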

4. Central Limit Theorem

Just as in the univariate setting we also have a multivariate Central Limit Theorem. But first, let's review the univariate Central Limit Theorem.

Univariate Setting. If all of our individual observations, X1, X2,...,Xn, are independently sampled from a population with mean μ and variance σ², then, for large samples, the sample mean x̄ is approximately normally distributed with mean μ and variance σ²/n. This differs from Property 2: Distribution, which specifically requires that the data are sampled from a normal distribution. Under Property 2: Distribution, we found that even for small samples the sample mean is going to be normally distributed. The Central Limit Theorem is a more general result: for large samples, the sample mean is going to be approximately normally distributed regardless of the distribution of the individual observations.

Multivariate Setting. A similar result is available in the multivariate setting. If our data vectors X1, X2,...,Xn are independently sampled from a population with mean vector μ and variance-covariance matrix Σ, then, for large samples, the sample mean vector x̄ is going to be approximately normally distributed with mean vector μ and variance-covariance matrix (1/n)Σ, where Σ is the variance-covariance matrix for the original data.
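The Central Limit Theorem can be checked empirically. The sketch below treats the univariate case for simplicity and assumes a skewed Exponential population with μ = 1 and σ² = 1; the sample size, number of replications, seed, and function name are all illustrative choices. It checks that the simulated sample means have mean near μ, variance near σ²/n, and that roughly 95% of them fall within 1.96 standard errors of μ, as a normal approximation would suggest.

```python
import random
from statistics import fmean, variance


def simulate_sample_means(n, replications, rng):
    """Sample means of n Exponential(rate 1) draws, repeated many times."""
    return [
        fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(replications)
    ]


rng = random.Random(505)  # fixed seed so the run is reproducible
n = 50
means = simulate_sample_means(n, 2000, rng)

mean_of_means = fmean(means)         # should be near mu = 1
variance_of_means = variance(means)  # should be near sigma^2 / n = 1 / 50

# Under approximate normality, about 95% of the sample means should lie
# within 1.96 standard errors of mu.
se = (1.0 / n) ** 0.5
share_within = sum(abs(m - 1.0) <= 1.96 * se for m in means) / len(means)
```

In the multivariate setting the same reasoning applies to the whole vector x̄, with Σ/n in place of σ²/n.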

This Central Limit Theorem is a key result that we will take advantage of later in this course when we talk about hypothesis tests about individual mean vectors or collections of mean vectors under different treatment regimens.


Inferences for Correlations

Let's consider testing the null hypothesis that there is zero correlation between two variables j and k. Mathematically we write this as shown below:

Ho : ρjk = 0 against Ha : ρjk ≠ 0

Recall that the correlation is estimated by the sample correlation rjk given in the expression below:

$$r_{jk} = \frac{s_{jk}}{\sqrt{s_j^2 s_k^2}}$$

Here we have the sample covariance between the two variables divided by the square root of the product of the individual variances.
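As a minimal sketch of this formula, the following plain-Python function computes the sample correlation from two lists of observations. The function name and the five pairs of scores are hypothetical, used only to illustrate the calculation.

```python
from math import sqrt
from statistics import fmean


def sample_correlation(x, y):
    """Sample correlation: covariance over the root of the variance product."""
    n = len(x)
    xbar, ybar = fmean(x), fmean(y)
    s_xy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / (n - 1)
    s_xx = sum((a - xbar) ** 2 for a in x) / (n - 1)
    s_yy = sum((b - ybar) ** 2 for b in y) / (n - 1)
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical scores on two variables for five subjects.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
r = sample_correlation(x, y)
```

Note that the divisor n - 1 cancels between numerator and denominator, so the result does not depend on which divisor is used for the variances and covariance.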

We shall assume throughout this discussion that the pairs of observations on variables j and k are independently sampled from a bivariate normal distribution; that is, the pairs

$$(X_{1j}, X_{1k}),\ (X_{2j}, X_{2k}),\ \ldots,\ (X_{nj}, X_{nk})$$

are independently sampled from a bivariate normal distribution.

To test the null hypothesis, we form the test statistic t, which is equal to the sample correlation times the square root of n - 2 divided by the quantity 1 minus the correlation squared:

$$t = r_{jk}\sqrt{\frac{n-2}{1-r_{jk}^2}}$$

Under the null hypothesis, Ho, this test statistic will be approximately t distributed with n-2 degrees of freedom. Note that this approximation holds for larger samples. We will reject the null hypothesis, Ho, at level α if the absolute value of the test statistic, t, is greater than the critical value from the t-table with n-2 degrees of freedom; that is, if:

$$|t| > t_{n-2,\,1-\alpha/2}$$

To illustrate these concepts let's return to our example dataset, the Wechsler Adult Intelligence Scale.

Example: Wechsler Adult Intelligence Scale

These data were analyzed using the SAS program wechsler.sas in our last lesson (Multivariate Normal Distribution), which yielded the computer output wechsler.lst. Recall that these are data on n = 37 subjects taking the Wechsler Adult Intelligence Test. This test was broken up into four components: Information, Similarities, Arithmetic, and Picture Completion.

Looking at the computer output we have summarized the correlations among variables in the table below:

 
              Information  Similarities  Arithmetic  Picture
Information       1.00000       0.77153     0.56583  0.31816
Similarities      0.77153       1.00000     0.51295  0.08135
Arithmetic        0.56583       0.51295     1.00000  0.27988
Picture           0.31816       0.08135     0.27988  1.00000

For example, the correlation between Similarities and Information is 0.77153.

Let's consider testing the null hypothesis that there is no correlation between Information and Similarities. This would be written mathematically as shown below:

Ho : ρ12 = 0

We can then substitute the values from this example (r12 = 0.77153 and n = 37) into the formula to compute the test statistic. This example is worked out below:

$$t = 0.77153\sqrt{\frac{37-2}{1-0.77153^2}} = 7.175$$

Looking at our t-table for 35 degrees of freedom and an α level of 0.005, we need the critical value t(df, 1-α/2) = t(35, 0.9975), which is found under 0.0025 in the table. Since 35 does not appear in the table, use the closest df that does not exceed 35, which is 30; in this case the tabulated value is t(30, 0.9975) = 3.030. Because critical values shrink as the degrees of freedom grow, using 3.030 for 35 degrees of freedom is slightly conservative. NOTE: Some text tables provide the right-tail probability (the graph at the top will have the area in the right tail shaded in), while other texts provide a table with the cumulative probability (the graph will be shaded in to the left). The concept is the same. For example, if α were 0.01, then using the first kind of table you would look under 0.005, and using the second kind you would look under 0.995.

Since

7.175 > 3.030 = t(30, 0.9975) ≥ t(35, 0.9975),

we can reject the null hypothesis that Information and Similarities scores are uncorrelated at the α = 0.005 level.
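The test statistic above can be reproduced with a few lines of code. This is a minimal sketch, not the SAS program used in the course; the function name is hypothetical, and the critical value is simply the tabulated 3.030 quoted above rather than one computed by the code.

```python
from math import sqrt


def correlation_t_statistic(r, n):
    """Test statistic t = r * sqrt((n - 2) / (1 - r^2)) for Ho: rho = 0."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if n <= 2:
        raise ValueError("n must exceed 2")
    return r * sqrt((n - 2) / (1.0 - r * r))


r_12 = 0.77153  # Information vs. Similarities, from the correlation table
n = 37          # number of subjects
t = correlation_t_statistic(r_12, n)

critical_value = 3.030  # tabulated t value quoted in the lesson (df = 30)
reject_h0 = abs(t) > critical_value
```

Rounded to three decimals, t agrees with the value 7.175 used in the lesson, and the comparison with 3.030 leads to rejection.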

Our conclusion here is that Similarities scores increase with increasing Information scores (t = 7.175; d.f. = 35; p < 0.0001). You will note here that I am not simply concluding that the results are significant. When drawing conclusions it is never, never adequate to simply state that the results are significant. In all cases you should seek to describe what the results tell you about the data. In this case, since we rejected the null hypothesis, we can conclude that the correlation is not equal to zero. Furthermore, since the actual sample correlation is greater than zero, we can conclude that there is a positive association between the two variables; hence our conclusion that Similarities scores tend to increase with increasing values of Information scores.

You will also note that this is not the only information that I am giving under my conclusion. When giving conclusions you should also back up those conclusions with the appropriate evidence: the test statistic, degrees of freedom (if appropriate), and p-value. Here the appropriate evidence is given by the test statistic, t = 7.175; the degrees of freedom for the test, here 35; and the p-value, which is less than 0.0001 as indicated in the computer printout. The p-value appears below each correlation coefficient in the SAS output.

Confidence Interval for ρjk

Once we conclude that there is a positive or negative correlation between two variables, the next thing we might want to do is to compute a confidence interval for the correlation. This confidence interval will give us a range of reasonable values for the correlation itself. The sample correlation, because it is bounded between -1 and 1, is typically not normally distributed or even approximately so. Sample correlations near zero may be approximately bell-shaped in distribution, but as the correlation approaches either bound, the sampled values will tend to pile up against that bound. For example, with a strong positive correlation you'll tend to find a lot of values piling up near 1, with a long tail trailing off to the left. Similarly, with a strong negative correlation the values will pile up near -1, with a tail trailing off to the right. To adjust for this asymmetry, or skewness, of the distribution, we apply a transformation to the correlation coefficient. In particular, we are going to apply Fisher's transformation, which is given in Step 1 of our procedure for computing confidence intervals for the correlation coefficient.

Step 1: Compute Fisher’s transformation

$$z_{jk} = \frac{1}{2}\log\frac{1+r_{jk}}{1-r_{jk}}$$

Here we have one half of the natural log of 1 plus the correlation, divided by one minus the correlation.

Note: In this course, whenever log is mentioned, unless specified otherwise, log stands for the natural log.

For large samples, this transformed correlation coefficient z is going to be approximately normally distributed with mean equal to the same transformation of the population correlation, as shown below, and a variance of 1 over the sample size minus 3.

$$z_{jk} \;\dot\sim\; N\left(\frac{1}{2}\log\frac{1+\rho_{jk}}{1-\rho_{jk}},\ \frac{1}{n-3}\right)$$

Step 2: Compute a (1 - α) × 100% confidence interval for the Fisher transform of the population correlation,

$$\frac{1}{2}\log\frac{1+\rho_{jk}}{1-\rho_{jk}}$$

That is, one half log of 1 plus the correlation divided by 1 minus the correlation. This confidence interval is given by the expression below:

$$\left(z_{jk} - \frac{Z_{\alpha/2}}{\sqrt{n-3}},\ z_{jk} + \frac{Z_{\alpha/2}}{\sqrt{n-3}}\right) = (Z_l, Z_u)$$

Here we take the value of Fisher's transform, zjk, plus and minus the critical value from the z table divided by the square root of n - 3. The lower bound we will call Zl and the upper bound we will call Zu.

Step 3: Back-transform the confidence bounds to obtain the desired confidence interval for ρjk. This is given by the expression below:

$$\left(\frac{e^{2Z_l}-1}{e^{2Z_l}+1},\ \frac{e^{2Z_u}-1}{e^{2Z_u}+1}\right)$$

The first term we see is a function of the lower bound, the Zl. The second term is a function of the upper bound or Zu.

Let's return to the Wechsler Adult Intelligence Data to see how these procedures are carried out.

Example: Wechsler Adult Intelligence Data

Recall that the sample correlation between Similarities and Information was r12 = 0.7719.

Step 1: Compute the Fisher transform:

$$z_{12} = \frac{1}{2}\log\frac{1+0.7719}{1-0.7719} = 1.025$$

You should confirm this value on your own.

Step 2: Next, compute the 95% confidence interval for the Fisher transform of the population correlation ρ12:

$$1.025 \pm \frac{1.96}{\sqrt{37-3}} \quad\Longrightarrow\quad (Z_l, Z_u) = (0.68886,\ 1.36114)$$

In other words, we take the value 1.025 plus or minus the critical value from the normal table at α/2 = 0.025, which in this case is 1.96, divided by the square root of n minus 3 (here 37 - 3 = 34). Subtracting the result from 1.025 yields the lower bound of 0.68886. Adding the result to 1.025 yields the upper bound of 1.36114.

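Step 3: Carry out the back-transform to obtain the 95% confidence interval for ρ12. This is shown in the expression below:

$$\left(\frac{e^{2(0.68886)}-1}{e^{2(0.68886)}+1},\ \frac{e^{2(1.36114)}-1}{e^{2(1.36114)}+1}\right)$$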

This yields the interval from 0.5972 to 0.8767.

Conclusion: In this case, we can conclude that we are 95% confident that the interval (0.5972, 0.8767) contains the correlation between Information and Similarities scores.

Note the correct interpretation of this interval. We did not say that we are 95% confident that the correlation between Information and Similarities lies within the interval. This would be an improper use of the English language. This statement would imply that the population correlation is random. But, in fact, by assumption, in this class, the population correlation is a fixed deterministic quantity. The only quantities that are random are the bounds of the confidence interval, which are a function of the random data and so are also random. Therefore it is appropriate to make statements regarding the randomness of that interval.
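The three steps of the confidence interval procedure can be collected into one short function. This is a minimal sketch under stated assumptions: the function name is hypothetical, the normal critical value is computed with Python's standard-library NormalDist rather than read from a table, and intermediate values are not rounded, so the bounds may differ from the hand calculation in the last decimal place.

```python
from math import log, sqrt, tanh
from statistics import NormalDist


def fisher_confidence_interval(r, n, confidence=0.95):
    """Confidence interval for a population correlation via Fisher's transform."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if n <= 3:
        raise ValueError("n must exceed 3")
    # Step 1: Fisher's transform of the sample correlation.
    z = 0.5 * log((1.0 + r) / (1.0 - r))
    # Step 2: interval for the transformed population correlation.
    alpha = 1.0 - confidence
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z_crit / sqrt(n - 3)
    z_lower, z_upper = z - half_width, z + half_width
    # Step 3: back-transform; (e^(2z) - 1) / (e^(2z) + 1) equals tanh(z).
    return tanh(z_lower), tanh(z_upper)


# Values from the Wechsler example: r12 = 0.7719 and n = 37.
lower, upper = fisher_confidence_interval(0.7719, 37)
```

With these inputs the interval runs from roughly 0.597 to 0.877, in line with the hand calculation. Using tanh for the back-transform is only a convenience; it is algebraically the same as the exponential expression in Step 3.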


Summary

In this lesson we learned about:

Next, complete the homework problems that will give you a chance to put what you have learned to use...

© 2007 The Pennsylvania State University. All rights reserved.