Statistics 200 Honors
Elementary Statistics
Fall 1999
Computer Labs
Thursday, August 26
Since this computer lab was a bit of a bust due to the expired Minitab
license, we'll begin using Minitab next week.
Thursday, September 2
Part of the class will involve using the Minitab data set found at
I:\STAT\200\Old_Files\Survey99.MTW
These data were collected from 227 respondents to a survey (out of 240
asked to fill it out) in a past section of Stat 200.
By the end of this computer lab, you should be able to do the
following using Minitab:
- Simulate a binomial variable (and use this to simulate a sample proportion).
- Obtain entries in the standard normal table at the front of your book.
- Produce a histogram or boxplot of a univariate distribution.
- Recode a categorical variable from numeric to text.
- Obtain a confidence interval for a proportion.
- Produce a cross-tabulation table for two categorical variables.
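For anyone who wants to see the arithmetic behind the first, second, and fifth tasks outside Minitab, here is a minimal Python sketch. It is not what Minitab does internally: the success probability p = 0.5 and the seed are made up, the function names are hypothetical, and the interval shown is the large-sample interval, which may differ slightly from the one Minitab reports.

```python
import math
import random
from statistics import NormalDist

def simulate_binomial(n, p, rng):
    """Count successes in n independent trials, each with success probability p."""
    return sum(1 for _ in range(n) if rng.random() < p)

def proportion_ci(successes, n, level=0.95):
    """Large-sample confidence interval for a proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # critical value from the normal curve
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

rng = random.Random(1999)   # fixed seed so the run is repeatable (arbitrary choice)
n, p = 227, 0.5             # n matches the survey size; p is an assumption
x = simulate_binomial(n, p, rng)
low, high = proportion_ci(x, n)
print("simulated count:", x, "sample proportion:", round(x / n, 3))
print("95% CI:", round(low, 3), "to", round(high, 3))

# A standard normal table entry: the area to the left of z = 1.96
table_entry = NormalDist().cdf(1.96)
print("P(Z < 1.96) =", round(table_entry, 4))
```

Rerunning with a different seed gives a different simulated count, which is a quick way to see how much a sample proportion varies from sample to sample.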
Once you've accomplished these tasks, please take a minute to familiarize
yourself with the survey and its questions. Feel free to use
Minitab to produce plots or summary tables of any of the variables you
are interested in. Think about the different types of variables in the
survey and see if you can identify which of the variables fall into these
categories:
- Quantitative variables: Numeric variables for which arithmetic
operations such as adding and averaging make sense.
- Categorical variables: Variables which place an individual into one of
several groups or categories. These may be numeric, but the numbers
only signify which group an individual belongs to.
Consider what types of graphs and summary tables make sense for the different
types of variables.
Thursday, September 9
For most of today's lab, you'll have time to work on the homework
assignment due tomorrow. The assignment uses the results of a survey,
which are stored in this Minitab worksheet:
I:\STAT\200\Old_Files\Survey99.MTW
I have written one solution to the homework question, which you can see in
pdf or postscript format.
Please don't use the same two variables I used!
Here are some pitfalls to watch for in this assignment:
- Make sure you choose two categorical variables. Quantitative variables
will not work.
- Try to choose variables without too many categories. You don't want the
table to get too sparse, meaning that you don't want small numbers in any of
the cells if you can avoid it. For the same reason, avoid variables in which
one of the choices is likely to be picked by very few people.
- As a rule of thumb, none of your expected (not observed) counts should be
less than 5 if the chi-square test is to be approximately valid. The test is
more accurate for larger samples, and 5 is just a generally accepted cutoff
for the lowest expected count. If you choose variables which result in
expected counts lower than 5, you might want to choose new ones or talk to
me about it.
- Finally, don't forget to explain any Minitab output you turn in. I do
encourage you to turn in such output.
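The difference between expected and observed counts in the rule of thumb above can be checked by hand or with a short sketch like the following. The 2x3 table is made up (it is not from the survey), and the function name is hypothetical. Each expected count is (row total x column total) / grand total.

```python
def expected_counts(observed):
    """Expected cell counts under independence for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

# Made-up 2x3 table of observed counts (rows and columns are two categorical variables)
observed = [[30, 45, 10],
            [25, 50, 4]]

expected = expected_counts(observed)
smallest = min(min(row) for row in expected)

for row in expected:
    print([round(value, 2) for value in row])
print("smallest expected count:", round(smallest, 2))
print("rule of thumb satisfied:", smallest >= 5)
```

Notice that one observed count here is 4, yet every expected count is above 5, so the rule of thumb is still met. It is the expected counts that matter.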
Thursday, September 16
For most of today's lab, you'll have time to work on the homework
assignment due tomorrow. The assignment uses the results of a survey,
which are stored in this Minitab worksheet:
I:\STAT\200\Old_Files\Survey99.MTW
I have written one solution to the homework question, which you can see in
pdf or postscript format.
Please don't use the sports participation variable that I used.
The assignment requires that you use a test comparing two proportions and a
cross-tabulation chi-square test. In addition, you may want to recode
the gender question and your yes-no question of choice from numeric to
text. Here's how to do each of these things:
- To recode (for example) variable C1 so that 0 becomes Male and 1 becomes
Female, select Code from the Manip menu. Choose numeric to text.
Both "from columns" and "to columns" should be C1 unless you want
to create a new column C228 and leave the original C1 intact.
Enter the original values (0, 1) and the new values (Male, Female).
If you prefer to type the commands, see the solution to find out what
commands to type.
- To run a two-proportion test, choose Basic Statistics from the
Stat menu, then choose 2 proportions. You want "samples in one
column"; the samples variable is the one you've decided to study
and the subscripts variable should be C1 (gender).
Before you run the test, make sure you click Options and check the
pooled estimate option. If you don't do this, the test you perform
will be slightly different from the one specified on p. 605.
- To run a cross-tab test, choose Tables from the Stat menu, then
choose Cross tabulation. Be sure to check the chi-square box.
Also check the row and/or column percentage boxes. This will
produce not only the table you want, but the chi-square statistic
and its p-value as well.
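To see what the pooled estimate option changes, and how the two tests relate, here is a rough Python sketch of the arithmetic. The counts are invented (40 of 100 men and 65 of 127 women answering yes), and the function names are hypothetical. This is the standard pooled z statistic and Pearson chi-square statistic, not a copy of Minitab's output.

```python
import math
from statistics import NormalDist

def pooled_two_proportion_z(x1, n1, x2, n2):
    """z statistic and two-sided p-value using the pooled estimate of p."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)          # one estimate of p from both samples
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

def chi_square_statistic(observed):
    """Pearson chi-square statistic for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (count - expected) ** 2 / expected
    return stat

# Made-up counts: rows are gender, columns are yes / no
z, p_value = pooled_two_proportion_z(40, 100, 65, 127)
chi2 = chi_square_statistic([[40, 60], [65, 62]])

print("z =", round(z, 3), " two-sided p-value =", round(p_value, 4))
print("chi-square =", round(chi2, 3), " z squared =", round(z ** 2, 3))
```

For a 2x2 table, the chi-square statistic equals the square of the pooled z statistic, which is why the two tests give the same two-sided p-value when the pooled option is checked.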
Thursday, September 23
No computer lab today. Instead, we'll have an optional informal review
session in 222 Thomas. The test is next Thursday.
Thursday, October 14
By the end of this class:
- Know what a scatterplot is and how to create it in Minitab.
- Know what the correlation r measures and how to compute it in Minitab.
- Do exercise 2.19, p. 131.
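As a reminder of what r actually computes, here is a small Python sketch using made-up data (not the data from exercise 2.19). The function name is hypothetical. The formula is the usual one: the sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations.

```python
import math

def correlation(xs, ys):
    """Pearson correlation r between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data for illustration only
xs = [2, 4, 5, 7, 8, 10]
ys = [55, 60, 68, 70, 80, 85]

r = correlation(xs, ys)
print("r =", round(r, 3))
```

Because r is built from deviations scaled by their spread, changing the units of either variable leaves r unchanged, and r always falls between -1 and 1.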
Thursday, October 21
Today I'd like you to explore Minitab's capabilities for doing simple linear
regression. As part of this task, you may figure out how to complete
Exercise 2.58 on p. 173, which is part of your assignment due
tomorrow.
Here is a list of the Minitab tasks at which you should try to be somewhat
proficient before you leave today:
- Plotting scatterplots, with or without regression lines added.
- Computing the equation for a regression line, the correlation between
two variables, and the value of r-squared.
- Using the equation to predict values of the response variable for a
given value of the predictor variable.
- Obtaining a column of residuals and making residual plots of various
types.
- (*) Obtaining confidence intervals for predicted values, the slope of
the regression line, and the intercept of the regression line.
- (*) Obtaining sum of squares, degrees of freedom, mean squares, and
an F statistic for the linear regression model.
(*) Please note: The last two items on the list above involve topics we
haven't discussed in class yet.
The help menu may be of interest to you as you attempt these tasks.
Be sure to explore each of the following menu choices at some point today:
- Graph - Plot
- Stat - Regression - Regression
- Stat - Regression - Fitted Line Plot
- Stat - Regression - Residual Plots
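If it helps to see the calculations behind today's unstarred regression tasks (the equation of the line, r-squared, prediction, and residuals), here is a minimal least-squares sketch in Python. The data are made up and the function name is hypothetical. Minitab's Regression command reports these same quantities along with much more.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for predicting y from x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up data for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(xs, ys)

# Residual = observed response minus the value predicted by the line
fitted = [intercept + slope * x for x in xs]
residuals = [y - f for y, f in zip(ys, fitted)]

# r-squared = 1 - (residual sum of squares) / (total sum of squares)
mean_y = sum(ys) / len(ys)
sse = sum(e ** 2 for e in residuals)
sst = sum((y - mean_y) ** 2 for y in ys)
r_squared = 1 - sse / sst

prediction = intercept + slope * 6   # predicted response at x = 6

print("y-hat =", round(intercept, 3), "+", round(slope, 3), "x")
print("r-squared =", round(r_squared, 4))
print("prediction at x = 6:", round(prediction, 2))
print("residuals:", [round(e, 2) for e in residuals])
```

The residuals from a least-squares line with an intercept always sum to zero (up to rounding), which is a handy check on a residual column.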
Thursday, November 4
Today, you'll be able to spend some of the class working on the homework due
tomorrow. However, I'd like you to come up with a sample dataset which
illustrates the italicized quotation on p. 724:
The significance tests for individual regression coefficients assess the
significance of each predictor variable assuming that all other
predictors are included in the regression equation.
To do this, you'll construct a small dataset with two predictors. Each
predictor should be strongly linearly correlated with the response
variable, yet neither should be significant when you fit a regression
model that includes both.
Can you see how to do this?
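If you get stuck, here is one possible construction, sketched in Python rather than Minitab. It is only an illustration: the numbers and function names are made up, and the cutoffs 2.447 and 2.571 are the two-sided 5% critical values from a t table for 6 and 5 degrees of freedom. The idea is to make the two predictors nearly identical, so that either one alone predicts the response well but neither adds much once the other is in the model.

```python
import math

def center(values):
    """Subtract the mean from every value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def t_simple(x, y):
    """t statistic for the slope when y is regressed on x alone."""
    xc, yc = center(x), center(y)
    sxx, sxy, syy = dot(xc, xc), dot(xc, yc), dot(yc, yc)
    slope = sxy / sxx
    s2 = (syy - slope * sxy) / (len(y) - 2)      # residual mean square
    return slope / math.sqrt(s2 / sxx)

def t_joint(x1, x2, y):
    """t statistics for both slopes when y is regressed on x1 and x2 together."""
    a, b, yc = center(x1), center(x2), center(y)
    s11, s22, s12 = dot(a, a), dot(b, b), dot(a, b)
    s1y, s2y, syy = dot(a, yc), dot(b, yc), dot(yc, yc)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    s2 = (syy - b1 * s1y - b2 * s2y) / (len(y) - 3)   # residual mean square
    return b1 / math.sqrt(s2 * s22 / det), b2 / math.sqrt(s2 * s11 / det)

# Made-up ingredients: a shared trend, a small wiggle, and some noise
trend = [-7, -5, -3, -1, 1, 3, 5, 7]
wiggle = [7, 1, -3, -5, -5, -3, 1, 7]
noise = [-7, 5, 7, 3, -3, -7, -5, 7]

x1 = [t + 0.05 * w for t, w in zip(trend, wiggle)]   # nearly the same as x2
x2 = [t - 0.05 * w for t, w in zip(trend, wiggle)]
y = [a + b + 0.2 * e for a, b, e in zip(x1, x2, noise)]

t1_alone, t2_alone = t_simple(x1, y), t_simple(x2, y)
t1_both, t2_both = t_joint(x1, x2, y)

print("alone:    t for x1 =", round(t1_alone, 2), " t for x2 =", round(t2_alone, 2))
print("together: t for x1 =", round(t1_both, 2), " t for x2 =", round(t2_both, 2))
print("each significant alone:", abs(t1_alone) > 2.447 and abs(t2_alone) > 2.447)
print("neither significant together:", abs(t1_both) < 2.571 and abs(t2_both) < 2.571)
```

Try making the wiggle larger (say 0.5 instead of 0.05) and see how the joint t statistics change as the two predictors become less alike.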
dhunter@stat.psu.edu
Last modified: October 21, 1999