Choosing the correct analysis (and reading Minitab output)

(#1-2) One numerical variable

If you have one numerical variable:

Example

A random sample of 32 Stat 250 students were asked: What is your current cumulative GPA? The data were entered into a column in Minitab. Two of the students declined to answer the question, so their responses are recorded as missing ("*"). The data are:

gpa

      *   3.60   3.01   3.41   2.50   2.75   3.59   3.00   3.91   3.13
   2.40   3.94   3.25   3.00   3.50   3.98   2.50   3.45   2.69   2.67
   3.35   2.80   3.40   2.88   3.73   2.10   2.45   3.42   2.50   2.00
      *   2.74

The Minitab output looks like:

T Confidence Intervals

Variable     N      Mean    StDev  SE Mean       95.0 % CI
gpa         30    3.0550   0.5444   0.0994  (  2.8517,  3.2583)

T-Test of the Mean

Test of mu = 3.0000 vs mu > 3.0000

Variable     N      Mean    StDev   SE Mean        T          P
gpa         30    3.0550   0.5444    0.0994     0.55       0.29
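As a rough cross-check, the summary figures in this output can be reproduced by hand. The sketch below uses Python rather than Minitab; the variable names are illustrative, and the multiplier 2.045 is an assumption taken from a t-table (95% confidence, 29 degrees of freedom). The P-value is not computed, since it requires a t-distribution table or statistical software.

```python
import math
import statistics

# GPA responses from the example; the two missing ("*") values are omitted.
gpa = [
    3.60, 3.01, 3.41, 2.50, 2.75, 3.59, 3.00, 3.91, 3.13,
    2.40, 3.94, 3.25, 3.00, 3.50, 3.98, 2.50, 3.45, 2.69, 2.67,
    3.35, 2.80, 3.40, 2.88, 3.73, 2.10, 2.45, 3.42, 2.50, 2.00,
    2.74,
]

n = len(gpa)
mean = statistics.mean(gpa)
stdev = statistics.stdev(gpa)     # sample standard deviation (n - 1 divisor)
se_mean = stdev / math.sqrt(n)    # standard error of the mean

# Assumption: 2.045 is the t-table multiplier for 95% confidence, 29 df.
t_star = 2.045
ci_low = mean - t_star * se_mean
ci_high = mean + t_star * se_mean

# Test statistic for the null hypothesis mu = 3.0.
mu0 = 3.0
t_stat = (mean - mu0) / se_mean

print(f"N = {n}  Mean = {mean:.4f}  StDev = {stdev:.4f}  SE Mean = {se_mean:.4f}")
print(f"95% CI = ({ci_low:.2f}, {ci_high:.2f})")
print(f"T = {t_stat:.2f}")
```

Only the 30 non-missing responses enter the calculation, which is why Minitab reports N = 30 rather than 32.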

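The output tells us that, based on the 30 students who answered, we can be 95% confident that the mean cumulative GPA of the population sampled is between about 2.85 and 3.26. For the test of mu = 3.0 against mu > 3.0, the P-value of 0.29 is large, so there is not enough evidence to conclude that the mean GPA is greater than 3.0.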

(#3-4) One categorical (binary) variable

If you have one categorical (binary) variable:

Example

A random sample of 87 students were asked: Have you ever dyed your hair? Yes or No. The data were entered into a column in Minitab -- the Yes and No responses were coded as a 1 and 0, respectively. The data are:

dyed

     0     0     0     1     1     1     1     0     1     0     0     1
     0     1     1     0     1     0     0     0     1     0     1     0
     1     0     0     1     0     0     1     1     0     0     1     1
     1     1     1     0     0     1     1     1     1     1     1     0
     0     1     0     0     0     1     0     1     1     1     1     0
     0     0     1     0     0     0     1     0     1     1     0     0
     0     1     1     0     0     0     0     0     0     0     0     0
     0     0     0

The Minitab output looks like:

Test and Confidence Interval for One Proportion

Test of p = 0.5 Vs p < 0.5

Success = 1

Variable          X      N  Sample p        95.0 % CI       Z-Value  P-Value
dyed             38     87  0.436782  (0.332560, 0.541004)    -1.18    0.119
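The figures in this output follow from the count of successes and the sample size alone. The sketch below is a hedged reconstruction in Python, not Minitab's own code; it assumes the interval uses the sample proportion in its standard error while the test uses the null value 0.5, which is consistent with the numbers printed above.

```python
import math
from statistics import NormalDist

x, n = 38, 87    # number of 1s (Yes) and sample size, from the output
p0 = 0.5         # null value of the proportion

p_hat = x / n

# Confidence interval: standard error based on the sample proportion.
se_ci = math.sqrt(p_hat * (1 - p_hat) / n)
z_star = NormalDist().inv_cdf(0.975)    # about 1.96 for 95% confidence
ci_low = p_hat - z_star * se_ci
ci_high = p_hat + z_star * se_ci

# Test: standard error based on the null value p0.
se_null = math.sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se_null
p_value = NormalDist().cdf(z)    # lower tail, because the alternative is p < 0.5

print(f"Sample p = {p_hat:.6f}")
print(f"95% CI = ({ci_low:.6f}, {ci_high:.6f})")
print(f"Z = {z:.2f}  P = {p_value:.3f}")
```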

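The output tells us that 38 of the 87 students sampled (about 43.7%) said they have dyed their hair, and that we can be 95% confident that the population proportion is between about 0.333 and 0.541. For the test of p = 0.5 against p < 0.5, the P-value of 0.119 is not small, so there is not enough evidence to conclude that fewer than half of the students in the population have dyed their hair.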

(#5-6) One categorical (binary) variable that forms independent groups and one numerical variable

If you have one categorical (binary) variable that forms independent groups and one numerical variable:

Note that here we are effectively treating the binary variable that forms the groups as the explanatory variable and the numerical variable by which we're comparing the groups as the response variable. If we switched the variables around and treated the numerical variable as the explanatory variable and treated the binary variable as the response variable, we'd need to do a "logistic regression analysis," which is beyond the scope of this course.

Example

A random sample of 31 students were asked two questions:

The data were entered into two columns in Minitab. One column contains the gender of the student ("subscripts"), while the other column contains the amount of money spent on books ("samples"). Males were coded as a 1 and females as a 2. The data are:

 Row   Sex   Books

   1     1    400
   2     1    500
   3     2    210
   4     1    200
   5     2    240
   6     1    160
   7     1    400
   8     2    300
   9     2    250
  10     2    300
  11     1    465
  12     1    300
  13     2    145
  14     1    345
  15     2    350
  16     1    350
  17     2    300
  18     2    300
  19     1    300
  20     2    300
  21     1    350
  22     1    550
  23     1    245
  24     1    230
  25     2    340
  26     1    129
  27     2    300
  28     2    200
  29     2    159
  30     1    330
  31     1    270

The Minitab output looks like:

Two Sample T-Test and Confidence Interval

Two sample T for Books

Sex          N      Mean     StDev   SE Mean

1           17       325       116        28
2           14     263.9      64.4        17

95% CI for mu (1) - mu (2): ( -7,  129)
T-Test mu (1) = mu (2) (Vs not =): T = 1.85  P = 0.076  DF = 25
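The T value and DF in this output are consistent with a two-sample t procedure that does not pool the variances. The sketch below is an illustrative Python reconstruction under that assumption; the multiplier 2.060 is taken from a t-table (95% confidence, 25 degrees of freedom), and the P-value is left to software.

```python
import math
import statistics

# Amounts spent on books, split by the Sex column (1 = male, 2 = female).
males = [400, 500, 200, 160, 400, 465, 300, 345, 350,
         300, 350, 550, 245, 230, 129, 330, 270]
females = [210, 240, 300, 250, 300, 145, 350,
           300, 300, 300, 340, 300, 200, 159]

n1, n2 = len(males), len(females)
mean1, mean2 = statistics.mean(males), statistics.mean(females)
var1, var2 = statistics.variance(males), statistics.variance(females)

# Standard error of the difference, without pooling the two variances.
se_diff = math.sqrt(var1 / n1 + var2 / n2)
t_stat = (mean1 - mean2) / se_diff

# Welch-Satterthwaite approximation to the degrees of freedom, rounded down.
df = math.floor(
    (var1 / n1 + var2 / n2) ** 2
    / ((var1 / n1) ** 2 / (n1 - 1) + (var2 / n2) ** 2 / (n2 - 1))
)

# Assumption: 2.060 is the t-table multiplier for 95% confidence, 25 df.
t_star = 2.060
ci_low = (mean1 - mean2) - t_star * se_diff
ci_high = (mean1 - mean2) + t_star * se_diff

print(f"N = {n1}, {n2}  Means = {mean1:.1f}, {mean2:.1f}")
print(f"T = {t_stat:.2f}  DF = {df}")
print(f"95% CI for mu (1) - mu (2): ({ci_low:.0f}, {ci_high:.0f})")
```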

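The output tells us that the mean amount spent on books was about 325 for the 17 males in the sample and about 263.9 for the 14 females. We can be 95% confident that the difference in population means, males minus females, is between -7 and 129. Because this interval contains 0 and the two-sided P-value is 0.076, the difference in mean spending is not significant at the 0.05 level.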

(#7-8) One categorical (binary) variable that forms paired groups and one numerical variable

If you have one categorical (binary) variable that forms paired groups and one numerical variable:

Example

The pulse rates of a random sample of 10 students were measured. Then, the students were asked to march in place, and their pulse rates were measured again. The data were entered into two columns in Minitab. One column contains the students' pulse rates before marching, while the other column contains the students' pulse rates after marching. The data are:

  Row   Bef    Aft

   1     60     72
   2     62     92
   3     80     84
   4     67     68
   5     70     80
   6     52     72
   7     56     80
   8     75     88
   9     56     64
  10     80    104

Note that we could calculate and enter the differences, and then use the 1-sample t procedures on the column of differences. Instead, we can let Minitab do the dirty work of calculating the differences by just using Minitab's paired t procedures. The Minitab output looks like:

Paired T-Test and Confidence Interval

Paired T for Bef - Aft

                  N      Mean     StDev   SE Mean

Bef              10     65.80     10.21      3.23
Aft              10     80.40     12.14      3.84
Difference       10    -14.60      9.51      3.01

95% CI for mean difference: (-21.40, -7.80)
T-Test of mean difference = 0 (vs < 0): T-Value = -4.85  P-Value = 0.000
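The equivalence between the paired t procedure and a 1-sample t procedure on the differences can be checked directly. The sketch below, in Python with illustrative variable names, computes the Bef - Aft differences and then applies the one-sample formulas; the multiplier 2.262 is an assumption taken from a t-table (95% confidence, 9 degrees of freedom).

```python
import math
import statistics

bef = [60, 62, 80, 67, 70, 52, 56, 75, 56, 80]
aft = [72, 92, 84, 68, 80, 72, 80, 88, 64, 104]

# Step 1: compute the Bef - Aft difference for each student.
diffs = [b - a for b, a in zip(bef, aft)]

# Step 2: apply the one-sample t formulas to the differences.
n = len(diffs)
mean_d = statistics.mean(diffs)
sd_d = statistics.stdev(diffs)
se_d = sd_d / math.sqrt(n)
t_stat = mean_d / se_d            # null hypothesis: mean difference = 0

# Assumption: 2.262 is the t-table multiplier for 95% confidence, 9 df.
t_star = 2.262
ci_low = mean_d - t_star * se_d
ci_high = mean_d + t_star * se_d

# Reversing the subtraction (Aft - Bef) only flips the signs.
t_reversed = statistics.mean(a - b for a, b in zip(aft, bef)) / se_d

print(f"Difference: N = {n}  Mean = {mean_d:.2f}  StDev = {sd_d:.2f}  SE Mean = {se_d:.2f}")
print(f"95% CI = ({ci_low:.2f}, {ci_high:.2f})")
print(f"T = {t_stat:.2f}  (Aft - Bef gives T = {t_reversed:.2f})")
```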

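The output tells us that, in the sample, pulse rates after marching averaged 14.6 higher than before (a mean Bef - Aft difference of -14.60). We can be 95% confident that the mean Bef - Aft difference in the population is between -21.40 and -7.80. For the test of a mean difference of 0 against a mean difference less than 0, the P-value is reported as 0.000 (that is, less than 0.0005), so there is strong evidence that the mean pulse rate is higher after marching in place.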

Note that if we had asked Minitab to consider the Aft-Bef differences, our conclusions would not change. The output would look like:

Paired T-Test and Confidence Interval

Paired T for Aft - Bef

                  N      Mean     StDev   SE Mean

Aft              10     80.40     12.14      3.84
Bef              10     65.80     10.21      3.23
Difference       10     14.60      9.51      3.01

95% CI for mean difference: (7.80, 21.40)
T-Test of mean difference = 0 (vs > 0): T-Value = 4.85  P-Value = 0.000

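Note that the mean difference, the confidence limits, and the T-value all change sign, and the alternative hypothesis changes from "less than 0" to "greater than 0," while the standard deviation, the standard error, and the P-value are unchanged.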

(#9-11) One categorical (binary) variable that forms independent groups and one categorical (binary) variable

Here, one categorical variable forms two or more independent groups. It is natural to summarize the second categorical variable by calculating a proportion for each group: one for the first group, one for the second group, one for the third group, and so on.

The chi-square test (#11) and the 2-proportions Z procedures (#9) are used to see:

That is, the above two goals are equivalent.

How to decide what to use:

Example

A random sample of 70 Stat 250 students were asked two questions:

The data were entered into two columns in Minitab. One column contains the students' gender, while one column contains the students' answer to the horoscope question. The males were coded as a 1, while the females were coded as a 2. Yes responses were coded as a 1, while No responses were coded as a 0.

Because gender forms only two independent groups -- males and females -- we can use either the chi-square test or the 2-proportions Z-test to see if there is a relationship between gender and horoscope reading. We can specify what we are testing in either of the following ways.

Way #1

Way #2

The Minitab output for the chi-square test looks like:

Tabulated Statistics

 Rows: gender     Columns: horoscop

           0        1      All

 1        32        5       37
       86.49    13.51   100.00
          32        5       37
       26.96    10.04    37.00

 2        19       14       33
       57.58    42.42   100.00
          19       14       33
       24.04     8.96    33.00

 All      51       19       70
       72.86    27.14   100.00
          51       19       70
       51.00    19.00    70.00

Chi-Square = 7.372, DF = 1, P-Value = 0.007

 Cell Contents --
                  Count
                  % of Row
                  Count
                  Exp Freq


Note that Minitab tells us what each of the numbers in each cell means. The first number in each cell is the number of people in the sample having the two characteristics defined by the cell. The second number is the row percentage; for example, in the cell for gender = 1 and horoscop = 1, it is the percentage of males who read their horoscope regularly (13.51%). The third number repeats the count. The last number is the number of people in the sample we'd expect to fall in the cell if the null hypothesis were true, i.e., if there were no relationship between gender and horoscope reading.
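The expected frequencies and the chi-square statistic can be reproduced from the observed counts. The sketch below is an illustrative Python calculation, not Minitab's own code; it uses the usual formula (row total times column total, divided by the grand total) for each expected count. The P-value step relies on the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable, so it applies only to a table with DF = 1.

```python
import math
from statistics import NormalDist

# Observed counts: rows are gender (1, 2), columns are horoscop (0, 1).
observed = [[32, 5], [19, 14]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected count under no relationship: row total * column total / grand total.
expected = [[r * c / total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (observed - expected)^2 / expected over all cells.
chi_square = sum(
    (obs - exp) ** 2 / exp
    for obs_row, exp_row in zip(observed, expected)
    for obs, exp in zip(obs_row, exp_row)
)

# Valid for DF = 1 only: upper-tail chi-square area via the normal distribution.
p_value = 2 * (1 - NormalDist().cdf(math.sqrt(chi_square)))

for row in expected:
    print("  ".join(f"{value:6.2f}" for value in row))
print(f"Chi-Square = {chi_square:.3f}  P-Value = {p_value:.3f}")
```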

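The output tells us that the P-value of 0.007 is small, so there is strong evidence of a relationship between gender and horoscope reading. In the sample, 13.51% of the males answered Yes, compared with 42.42% of the females.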

The corresponding Minitab output for the 2-proportions Z-test and interval is:

Test and Confidence Interval for Two Proportions


Success = 1


gender            X      N  Sample p

1                 5     37  0.135135
2                14     33  0.424242


Estimate for p(1) - p(2):  -0.289107
95% CI for p(1) - p(2):  (-0.490522, -0.0876921)
Test for p(1) - p(2) = 0 (vs not = 0):  Z = -2.81  P-Value = 0.005
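The estimate, interval, and Z value in this output can be reproduced from the two counts and sample sizes. The sketch below is an illustrative Python calculation; it assumes the standard error is built from the two separate sample proportions (no pooling), which is an inference from the fact that this choice matches the printed numbers.

```python
import math
from statistics import NormalDist

x1, n1 = 5, 37     # gender 1 (males): Yes responses and group size
x2, n2 = 14, 33    # gender 2 (females): Yes responses and group size

p1, p2 = x1 / n1, x2 / n2
diff = p1 - p2

# Standard error using each group's own sample proportion (unpooled).
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

z_star = NormalDist().inv_cdf(0.975)    # about 1.96 for 95% confidence
ci_low = diff - z_star * se
ci_high = diff + z_star * se

z = diff / se
p_value = 2 * NormalDist().cdf(-abs(z))    # two-sided alternative

print(f"Estimate for p(1) - p(2): {diff:.6f}")
print(f"95% CI: ({ci_low:.6f}, {ci_high:.6f})")
print(f"Z = {z:.2f}  P-Value = {p_value:.3f}")
```

If the standard error is instead computed from the pooled proportion 19/70, Z comes out at about -2.72, and its square is the chi-square statistic 7.372 reported earlier.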

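The output tells us that the sample proportions answering Yes are about 0.135 for males and 0.424 for females. We can be 95% confident that p(1) - p(2) is between about -0.491 and -0.088, and, with a P-value of 0.005, there is strong evidence that the two population proportions differ.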

Two numerical variables