Penn State, Department of Statistics

STAT 464-Applied Nonparametric Statistics

Fall Semester 1998

Week 1, Weeks 2 and 3, Week 4, Week 5, Week 6, Week 7, Week 8, Week 9, Week 10, Week 11, Week 12, Weeks 13-14, Week 15
 

Take Home Final:  See Week 15 for details.

Instructor:  Tom Hettmansperger

                       317 Thomas Bld
                       Phone:  865-2211
                       email:  tph@stat.psu.edu
                       Office Hours:  Monday 1:30-2:30,  Wednesday 2:30-3:30, or by appointment.
 

Assistant:  Mustafa Nadar

                     418 Thomas Bld
                     Phone:  865-3230
                     email:  nadar@stat.psu.edu
                     Office Hours:  Tuesday and Thursday 2:00-3:30

Text (required):  Introduction to Statistics:  The Nonparametric Way  by Noether

Text (recommended):  Minitab Handbook 3rd ed.  by  Ryan and Joiner

Reserve (Math Library, McAllister Bld):

 Reference:  Robust Nonparametric Statistical Methods by T. P. Hettmansperger and J. W. McKean  (This book was published in January 1998 and contains all the theoretical and mathematical background for the course.  It will not be used in this course.)
 

 Syllabus:

The course will integrate exploratory data analysis and nonparametric statistical inference.  The emphasis will be on the analysis and interpretation of data.

Your grade will be based on 2 exams and a comprehensive final exam. The exams will be worth 100 points each and the final will be worth 200 points. The final grade will be determined by the exam scores. Homework will be assigned and either collected or presented on the board by a student. Homework grades will be used only if you are on a grade borderline and, in any case, will not hurt your grade.

I anticipate moving through the text in roughly the following order: Chapters 2, 11, 12, 7, 8, 9, 10, 15, 13. The order is rearranged so that inference in the one-sample model is introduced before the two-sample model. The first exam will come roughly after I have discussed the material from Chapters 2, 11, and 12. The second exam will come roughly after Chapters 7, 8, 9, 10, and part of 15. The exam schedule will depend on the pace of the class.

Use of the computer will be an integral part of the course. If you are not familiar with Minitab (a statistical software package), see me at once. You are free to use any software you like, provided it will do the required computations.

To start Minitab in the campus PC labs, look under the Start button (or a 'Programs' icon on the side) for SPREADSHEETS AND STATISTICS. Inside you will find a Minitab icon that starts the program.



Week 1
We will discuss material from Chapter 2 of Noether.
You might also want to look at the material from Chapters 1, 2, 3 of Hoaglin, Mosteller, and Tukey on reserve.
Also Chapter 3 of the Minitab Handbook has relevant material.

Here are the  Minitab commands that I used to make the sensitivity plot for the t statistic.

1.  let k1=-4.6
     let k2=1
2.  let c3(9)=k1
     let c4(k2)=(10**.5)*mean(c3)/stdev(c3)
     let c5(k2)=k1
     let k1=k1+.5
     let k2=k2+1
The first part initializes the program. To generate (x,t) pairs, run the second part several times. I copy the second part and keep pasting it into the Minitab session window until I have enough values in columns 4 and 5. Then plot the t values in c4 against the x values in c5. In versions of Minitab prior to Version 12, you can use the store command to store the commands in part 2 above and then execute them several times. This saves copying and pasting the commands over and over, which is not a very elegant solution. Below, in Weeks 2 and 3, I describe how to create macros, which are available beginning with Version 10 of Minitab.
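If you are not using Minitab, you can trace the same sensitivity curve in Python. This is a minimal sketch, not the class program. The ten sample values are hypothetical placeholders, because the data in c3 are not given here. As in the commands above, the 9th observation is replaced by x = -4.6, -4.1, ... and t = sqrt(10) * mean / stdev is recorded for each x.

```python
import math
import statistics


def t_sensitivity(data, index, start, step, count):
    """Replace data[index] by x = start, start + step, ... and record the
    one-sample t statistic sqrt(n) * mean / stdev for each x."""
    values = list(data)          # work on a copy; the input is left unchanged
    n = len(values)
    pairs = []
    for i in range(count):
        x = start + i * step     # plays the role of 'let k1=k1+.5'
        values[index] = x
        # statistics.stdev uses the n - 1 divisor, as Minitab's STDEV does
        t = math.sqrt(n) * statistics.mean(values) / statistics.stdev(values)
        pairs.append((x, t))
    return pairs


# Hypothetical sample of size 10 (placeholder values, not the class data)
sample = [0.8, 1.2, -0.3, 0.5, 1.9, 0.4, -0.7, 1.1, 0.0, 0.6]
# Python index 8 is the 9th observation, matching c3(9)
for x, t in t_sensitivity(sample, 8, -4.6, 0.5, 20):
    print(f"x = {x:5.2f}   t = {t:6.3f}")
```

Plotting t against x gives the same kind of picture as plotting c4 against c5.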



Weeks 2 and 3
In Version 12 of Minitab you must use Global or Local Macros, since Minitab 12 does not have the store command. We will try to use only Global Macros in this course, since they are easier to write. The following example shows how to write a macro for the sensitivity plots for the mean and the median of the Shoshoni data.

You must type the macro into a text editor; I am using Notepad.

gmacro
sens
name c3 'ave', c4 'x', c5 'med'
copy c1 c2
do k2=1:k1
let c3(k2)=mean(c2)
let c5(k2)=median(c2)
let c4(k2)=c2(20)
let c2(20)=c2(20)-.05
enddo
plot c3*c4
plot c5*c4
endmacro

You must type the first line (gmacro) and the last line (endmacro) exactly as shown. The second line (sens) is your choice of name for the macro. I named the columns in the third line, but you don't actually have to do this. The next 9 lines, from copy through the second plot command, are the Minitab program that computes the plots.

Now I am going to save this macro on a floppy in the A: drive. In Notepad, use Save As with the file name "sens.mac", including the double quotes, on the floppy in the A: drive. You need the double quotes because otherwise Notepad saves the file as sens.mac.txt, and Minitab will not recognize it as a macro. The sens part is just the name I gave the macro.

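Now assume you are in Minitab and the Shoshoni data are in C1. Use a let command to set k1 to the number of points you want in each plot, and then type %a:\sens at the prompt, as with the bootstrap macro below.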
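Here is a rough Python analogue of the sens macro for anyone working outside Minitab. The 20 evenly spaced values are hypothetical placeholders, not the Shoshoni ratios, and the function name is made up for illustration. Like the macro, it records the mean and the median, lowers the 20th value by .05, and repeats.

```python
import statistics


def mean_median_sensitivity(data, index, step, count):
    """Mimic the 'sens' macro: record (x, mean, median) with x = data[index],
    then lower data[index] by `step`; repeat `count` times."""
    values = list(data)                  # like 'copy c1 c2'
    rows = []
    for _ in range(count):
        x = values[index]
        rows.append((x, statistics.mean(values), statistics.median(values)))
        values[index] = x - step         # like 'let c2(20)=c2(20)-.05'
    return rows


# Hypothetical, evenly spaced placeholder data (not the Shoshoni ratios)
data = [0.60 + 0.01 * i for i in range(20)]
for x, ave, med in mean_median_sensitivity(data, 19, 0.05, 12):
    print(f"x = {x:.3f}   mean = {ave:.4f}   median = {med:.4f}")
```

As x decreases, the mean should fall along a straight line. The median should stay fixed until the lowered value moves into the middle of the ordered sample.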

Exercise Due Friday September 11: Finally, I want to give you an example of a Bootstrap program to approximate the standard deviation of the sample mean from the Shoshoni data.  Actually, this macro will work for any data set you have in C1.

gmacro
bootmean
name c3 'm'
let k3=count(c1)
do k2=1:k1
sample k3 c1 c2;
replace.
let c3(k2)=mean(c2)
enddo
describe c3 c1
histogram c3
endmacro

You could copy and paste this into Notepad and save it as "bootmean.mac" on a floppy in the a drive.
Back in Minitab, let k1=1 and then at the prompt type %a:\bootmean. If it works, let k1=200 and again type %a:\bootmean. Always try it once to make sure it works before you run it 200 times.
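For comparison, here is a Python sketch of the same bootstrap idea. It is written so that any statistic, such as the mean or the median, can be plugged in. The function name and the data are hypothetical. random.Random.choices draws with replacement, which plays the role of the sample ...; replace. subcommand.

```python
import random
import statistics


def bootstrap_sd(data, statistic, n_boot, seed=None):
    """Bootstrap estimate of the standard deviation of `statistic`:
    draw n_boot resamples of size len(data) with replacement and
    return (stdev of the replicates, list of replicates)."""
    if n_boot < 2:
        raise ValueError("need at least 2 bootstrap replicates")
    rng = random.Random(seed)
    n = len(data)
    replicates = [statistic(rng.choices(data, k=n)) for _ in range(n_boot)]
    return statistics.stdev(replicates), replicates


# Hypothetical placeholder data; put your own sample here
data = [0.61, 0.65, 0.58, 0.70, 0.66, 0.62, 0.74, 0.60, 0.67, 0.63]
sd_mean, _ = bootstrap_sd(data, statistics.mean, 200, seed=1)
sd_median, _ = bootstrap_sd(data, statistics.median, 200, seed=1)
print(f"bootstrap sd of the mean:   {sd_mean:.4f}")
print(f"bootstrap sd of the median: {sd_median:.4f}")
```

Fixing the seed makes a run reproducible, which plays the same role as trying the macro with a small k1 first.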



Week 4
Assignment due Wednesday, Sept. 16: Find a bootstrap estimate of the standard deviation of the sample median for the Shoshoni data. Use this estimate of the standard deviation to make an inference about whether the Shoshonis use the golden rectangle. Explain how your analysis based on the sample median differs from an analysis based on the sample mean.

We now move to Chapter 11 and develop nonparametric confidence intervals for the population median; see section 11.1.2 in the text.  Minitab produces the confidence interval under the Stat>Nonparametrics>1-Sample Sign  menu.  Finally, we will embed the confidence interval in the boxplot using the Graph>Boxplot menu and dialog box.  This will be discussed in detail in class.

[Figure: CI-Boxplot for the Shoshoni data; the red crosshatched box is a 95% confidence interval for the population median w/l ratio.]
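The sign interval is built from order statistics. The interval [x_(d), x_(n+1-d)] has exact coverage 1 - 2P(B <= d-1), where B is Binomial(n, 1/2). The Python sketch below picks the largest d that still reaches the requested level; the function names are hypothetical. It does not reproduce the interpolated interval that Minitab's SInterval also reports.

```python
import math


def binom_cdf_half(k, n):
    """P(B <= k) for B ~ Binomial(n, 1/2)."""
    return sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n


def sign_interval(data, conf=0.95):
    """Order-statistic (sign) interval [x_(d), x_(n+1-d)] for the population
    median, using the largest d whose exact coverage 1 - 2 * P(B <= d - 1)
    is at least conf. Returns (low, high, coverage), or None if even
    [min, max] falls short of conf."""
    xs = sorted(data)
    n = len(xs)
    best = None
    for d in range(1, n // 2 + 1):
        coverage = 1 - 2 * binom_cdf_half(d - 1, n)
        if coverage < conf:
            break
        best = (xs[d - 1], xs[n - d], coverage)
    return best


data = list(range(1, 21))          # hypothetical sample of size 20
print(sign_interval(data, 0.95))   # uses x_(6) and x_(15) when n = 20
```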




Week 5
Assignment due Monday, Sept. 21: In each of the following two exercises you should
  1. State the research question in English.
  2. Translate the Res. Q.  into a statistical question concerning the population median.
  3. Estimate the population median, get a confidence interval for the population median, and produce a CI-Boxplot
  4. State your conclusions based on your analysis.  (The most important part.)
Do this for #3, p191 by hand and for #5, p191 by computer.

This week we will develop tests of hypotheses based on confidence intervals.  Then the CI-Boxplot becomes a visual display for exploratory analysis, confidence interval, and hypothesis test.  We will also show that the sign test  is equivalent to the test based on the confidence interval.  See section 11.1.3 for the sign test.
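The sign test statistic S is the number of observations above the hypothesized median eta0. Under H0 it is Binomial(n, 1/2). A minimal Python version might look like the sketch below. It assumes that values equal to eta0 are simply dropped, which is a common convention, and the function name is hypothetical.

```python
import math


def sign_test(data, eta0, alternative="two-sided"):
    """Sign test of H0: population median = eta0.
    Observations equal to eta0 are dropped. Returns (S, n, p_value) where
    S = number of observations above eta0 and n = number not equal to eta0."""
    above = sum(1 for x in data if x > eta0)
    below = sum(1 for x in data if x < eta0)
    n = above + below

    def cdf(k):
        # P(B <= k) for B ~ Binomial(n, 1/2); equals 0 when k < 0
        return sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n

    if alternative == "greater":       # Ha: median > eta0
        p = 1 - cdf(above - 1)
    elif alternative == "less":        # Ha: median < eta0
        p = cdf(above)
    else:                              # Ha: median != eta0
        p = min(1.0, 2 * min(cdf(above), 1 - cdf(above - 1)))
    return above, n, p


print(sign_test([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 0))   # (10, 10, 0.001953125)
```

Suppose no observations are tied at eta0. Then the two-sided test rejects at level 1 - c exactly when eta0 falls outside the sign interval [x_(d), x_(n+1-d)] whose achieved coverage is c. This is the equivalence between the test and the confidence interval discussed this week.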

Assignment due Friday, Sept. 25:

  1. Using exercises #24 and 33, p196 and p198 in the text, state a research hypothesis, translate the research hypothesis into the appropriate statistical hypothesis, and carry out the sign test in two forms: first using the CI-Boxplot and second using the formal sign test. State the significance level of the test in each case. You can do the sign test by computer or by hand. State your conclusions.
  2. In addition, construct a gmacro that will carry out a sign test, construct a sign interval, and produce a CI-Boxplot. Print out the macro and hand it in on Friday. Try it out on the exercises. If you don't use gmacros, use the store command to build an exec file that does the same thing. I will eventually put an example on this page. However, I would prefer that you figure out how to do it by copying commands from the history window and modifying them in the appropriate places.


Week 6
Here is the GMACRO for one sample sign methods that we developed in class:
Gmacro
One
#k1=conf coeff
#k2=null value of eta
#k3=alternative
#k4=data column
SInterval k1 ck4.
STest k2  ck4;
  Alternative k3.
Dotplot ck4
Describe ck4
Boxplot ck4;
  Box;
  Symbol;
    Outlier;
  Box;
    CI k1;
    Type 4;
    EColor 2;
    Color 0;
  ScFrame;
  ScAnnotation.
endmacro

This week we will begin two-sample comparisons. The new aspect of rough confirmatory inference is the use of two 85% confidence intervals, one for each sample, to carry out a 5% two-sided test of the null hypothesis that delta is 0 versus the alternative hypothesis that delta is not 0. The null hypothesis is rejected when the two intervals do not overlap. Here is an example of a Local Macro to do this:

Macro
Roughtwo X1 X2;
Conf  C.
Mcolumn X1 X2 X3 X4
Mconstant C
Default C=85
SInterval C X1 X2.
Stack X1 X2 X3;
  Subscripts X4.
DotPlot X3;
  By X4.
Boxplot X3*X4;
  Box;
  Symbol;
    Outlier;
  Box;
    CI  C;
    Type 4;
    EColor 2;
    Color 0;
  Title "Comparison CI-boxplots (85% conf coeff for 5% two sided test)" ;
  ScFrame;
  ScAnnotation.
endmacro
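Here is one way to see where the 85% comes from. Suppose the two sample estimates are independent, roughly normal, and have equal standard errors. Then the rule "reject when the two intervals do not overlap" is a level-alpha test when each interval uses z = z_{alpha/2}/sqrt(2). The Python sketch below works this out; the function names are hypothetical. For a 5% test, each interval needs about 83% coverage, which is close to the 85% used here. Two 85% intervals give a test of size roughly 4%. The sign intervals in this course are not normal-theory intervals, so treat this as a heuristic rather than an exact calibration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def conf_for_overlap_test(alpha=0.05):
    """Confidence level for each of two intervals so that 'reject when the
    intervals do not overlap' has two-sided size alpha, assuming two
    independent, approximately normal estimates with equal standard errors."""
    z_test = STD_NORMAL.inv_cdf(1 - alpha / 2)   # 1.96 when alpha = .05
    # Non-overlap means |est1 - est2| > 2 * z_each * se, and the difference
    # has standard error se * sqrt(2), so we need z_each * sqrt(2) = z_test.
    z_each = z_test / 2 ** 0.5
    return 2 * STD_NORMAL.cdf(z_each) - 1


def size_of_overlap_test(conf):
    """Two-sided size of the non-overlap rule when each interval has level conf."""
    z_each = STD_NORMAL.inv_cdf((1 + conf) / 2)
    return 2 * (1 - STD_NORMAL.cdf(z_each * 2 ** 0.5))


print(f"{conf_for_overlap_test(0.05):.3f}")   # individual level for a 5% test
print(f"{size_of_overlap_test(0.85):.3f}")    # size when each interval is 85%
```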

Exercises due Friday, Oct 2:  Construct a local macro and find an estimate and rough 95% confidence interval for delta based on 85% confidence intervals for the individual samples for data in exercises #12 and #23, p157 of the text.



Week 7
This is the week for Mann-Whitney-Wilcoxon two-sample methods. See Chapters 7 and 10 of the text. Chapter 6 provides some good background for hypothesis testing.

Exercises due Friday, Oct 9: For each of the following exercises, state a research hypothesis, translate it into a statistical hypothesis, carry out a rough confirmatory and exploratory analysis followed by a strong confirmatory analysis.  There should be a complete discussion of your conclusions.  If you reject the null hypothesis you should include an estimate, margin of error, and confidence interval for delta and say in words what it means.  #13 and #20 p117 of the text.  Do #13 by hand and #20 by computer.
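The quantities behind the Mann-Whitney-Wilcoxon analysis can be computed directly from the pairwise differences. The Python sketch below assumes that delta is the shift of the second sample y relative to the first sample x; the function name is hypothetical. It returns three values:

- the Mann-Whitney count U,
- the Wilcoxon rank sum W of y,
- the Hodges-Lehmann estimate of delta, which is the median of all differences y_j - x_i.

```python
import statistics


def mww_summary(x, y):
    """For samples x (first) and y (second), return
    U = number of pairs with y_j > x_i (ties count 1/2),
    W = rank sum of y in the combined sample (midranks), W = U + n(n + 1)/2,
    and the Hodges-Lehmann estimate of delta = median of all y_j - x_i."""
    diffs = [yj - xi for xi in x for yj in y]
    u = sum(1.0 if d > 0 else 0.5 if d == 0 else 0.0 for d in diffs)
    n = len(y)
    w = u + n * (n + 1) / 2
    return u, w, statistics.median(diffs)


# Hypothetical toy samples
print(mww_summary([1, 2, 3], [4, 5, 6]))
```

The usual rank-based confidence interval for delta is formed from order statistics of these same pairwise differences. This sketch stops at the point estimate.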



Week 8
This week we finish up the material on two-sample comparisons. To try the dynamic display for the permutation distribution of the Mann-Whitney test, go to stat dept home page>ed resources>by type of resource>dynamic visual displays>two sample comparisons. Let me know if you have trouble with it. The last topic will be a short discussion of the power and efficiency of the rank test, the t test, and the one-sample sign test.
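For small samples you can enumerate the permutation distribution outright. Under H0, every set of n ranks out of 1, ..., N is equally likely to belong to the second sample. The Python sketch below assumes no ties, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def rank_sum_null(m, n):
    """Exact permutation (null) distribution of W = sum of the ranks of the
    n second-sample observations among N = m + n untied observations:
    each of the C(N, n) rank subsets is equally likely under H0."""
    counts = Counter(sum(s) for s in combinations(range(1, m + n + 1), n))
    total = sum(counts.values())
    return {w: counts[w] / total for w in sorted(counts)}


def upper_tail_p(w_obs, m, n):
    """P(W >= w_obs) under H0, a one-sided p-value."""
    return sum(p for w, p in rank_sum_null(m, n).items() if w >= w_obs)


for w, p in rank_sum_null(4, 4).items():
    print(w, round(p, 4))
```

The number of subsets, C(N, n), grows quickly. Beyond small samples, one would instead sample random permutations, as the Kruskal-Wallis macro in Week 10 does.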


Week 9
Monday is test day.  The take home part is due Friday, Oct. 23 at class time.  We will discuss efficiency and power on Wed. and possibly begin a discussion of the one-way layout.  Read over Section 15.1 (Chapter 15) in the text.

Following is a local macro named roughmany that you can use to construct 85% CI-Boxplots of as many columns of data as you wish. It also provides the comparison dotplots and the interpolated 85% confidence intervals. To invoke it in Minitab from a floppy, type, for example, %a:roughmany c1-c3. It is set up to handle a variable number of columns. The 75% option has been deleted, since we generally want 85% intervals.

Macro
Roughmany X.1-X.n
Mcolumn X.1-X.n X3 X4
SInterval 85 X.1-X.n
Stack X.1-X.n X3;
  Subscripts X4.
DotPlot X3;
  By X4.
Boxplot X3*X4;
  Box;
  Symbol;
    Outlier;
  Box;
    CI  85;
    Type 4;
    EColor 2;
    Color 0;
  Title "comparison CI-boxplots (85% conf coeff for 5% two sided test)" &
    ;
  ScFrame;
  ScAnnotation.
endmacro



Week 10
Discussion of the Kruskal-Wallis test for one-way layouts and multiple comparisons. Read over Section 15.1 (Chapter 15) in the text. Note, however, that we are using different multiple comparisons from the text. Exercises due Friday, Oct. 30: p286 nos. 7 and 11. State the research hypothesis, do a rough confirmatory analysis, carry out the Kruskal-Wallis test, and do the multiple comparisons.
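For reference, the coal-data macro below computes the Kruskal-Wallis statistic in the following form, with no correction for ties. Here \bar R_i is the average rank in sample i, n_i is its size, k is the number of samples, and N is the total number of observations:

```latex
H = \frac{12}{N(N+1)} \sum_{i=1}^{k} n_i \, \bar{R}_i^{\,2} \;-\; 3(N+1),
\qquad N = \sum_{i=1}^{k} n_i
```

For the coal data, k = 5 with sample sizes 7, 8, 9, 8, 10, so N = 42. This is where the macro's constants k11 = 12/(42*43) and k12 = 3*43 come from.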

Below is a global macro to generate the permutation distribution of the Kruskal-Wallis statistic for the coal data:
This macro assumes that c6 contains the stacked data, c7 contains the subscripts that identify the samples in c6, and c8 contains the ranks of the data in c6.

gmacro
coalperm
let k11=12/(42*43)
let k12=3*43
do k2=1:k1
sample 42 c8 c18
unstack c18 c9-c13;
subs c7.
let c14(k2)=k11*(7*mean(c9)**2+8*mean(c10)**2+9*mean(c11)**2+8*mean(c12)**2+10*mean(c13)**2)-k12
enddo
endmacro

Invoke this in Minitab by
1.  let k1=number of permutations you want to use.  (Try 10 to make sure it is working.  Then set it to 1000 or what you want.)
2.  %a:coalperm
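The same permutation scheme can be sketched in Python. The macro shuffles with sample 42 c8 c18 and then regroups with unstack. The sketch mirrors this by shuffling the pooled ranks and cutting them into groups of the original sizes. The function names are hypothetical, and the ranks are assumed to be listed sample by sample. Replace the placeholder ranks 1 to 42 with the actual ranks, for example the midranks of the stacked data.

```python
import random


def midranks(values):
    """Ranks 1..N of `values`, with tied values sharing the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_h(ranks, sizes):
    """H = 12/(N(N+1)) * sum(n_i * Rbar_i**2) - 3(N+1), no tie correction.
    `ranks` must be listed sample by sample with group sizes `sizes`."""
    N = len(ranks)
    total, start = 0.0, 0
    for n_i in sizes:
        group = ranks[start:start + n_i]
        total += n_i * (sum(group) / n_i) ** 2
        start += n_i
    return 12 / (N * (N + 1)) * total - 3 * (N + 1)


def kw_permutation(ranks, sizes, n_perm, seed=None):
    """Shuffle the pooled ranks n_perm times, regroup them by `sizes`,
    and collect H for each permutation."""
    rng = random.Random(seed)
    pool = list(ranks)
    stats = []
    for _ in range(n_perm):
        rng.shuffle(pool)
        stats.append(kruskal_wallis_h(pool, sizes))
    return stats


sizes = [7, 8, 9, 8, 10]        # coal sample sizes from the macro, N = 42
ranks = list(range(1, 43))      # placeholder: use the real ranks, sample by sample
observed = kruskal_wallis_h(ranks, sizes)
perm = kw_permutation(ranks, sizes, 1000, seed=1)
p_value = sum(h >= observed for h in perm) / len(perm)
print(f"H = {observed:.3f}, permutation p-value about {p_value:.3f}")
```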



Week 11
This week we will continue with the discussion of the one-way layout. In the case of the maternal rats, we must consider patterned statistical alternatives and how to construct tests and multiple comparisons for them.
Exercise due Monday, Nov. 9:  Analysis of perfect pitch data; see the handout for details.


Week 12
This week we begin median polish.  Below is a global macro to get the typical value and main effects, the table of residuals, and R*.  It automatically stacks the data and finds the two columns of subscripts.

Here is the global macro called medpolish. To invoke it, put the data in the first columns, beginning with column 1. Then let k1=number of rows, let k2=number of columns, and let k44=number of iterations (half cycles). Then type %a:medpolish.

gmacro
medpolish
let k3=k2+1
let k4=k2+2
let k5=k2+3
let k6=k2+4
let k7=k2+5
let k8=k2+6
let k9=k2+7
stack c1-ck2 ck3;
subs ck4.
set ck5
k2(1:k1)
end
name ck5 'r', ck4 'c', ck3 'd', ck6 'row', ck7 'col', ck8 'resids'
name ck9 'comp'
MPolish 'd' 'r' 'c' 'resids';
  Iterations k44;
  Effects k10 'row' 'col';
  Comparison 'comp'.
name k10 'tv'
print 'tv' 'row' 'col'
Table 'r' 'c';
  Data 'resids'.
name k50 'R*'
let k50=(sum(abso('d'-'tv'))-sum(abso('resids')))/sum(abso('d'-'tv'))
print k50
Plot 'd'*'r';
  Connect  'c';
  ScFrame;
  ScAnnotation.
Plot 'd'*'c';
  Connect 'r';
  ScFrame;
  ScAnnotation.
endmacro
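Outside Minitab, you can sketch a bare-bones median polish and the macro's R* measure in Python. This is an illustration under stated assumptions, not a reimplementation of MPolish:

- It starts by sweeping rows.
- It counts each row sweep or column sweep as one half step, as k44 does.
- After each sweep, it moves the median of the other set of effects into the typical value.

Minitab's conventions may differ, so the results may differ slightly from the macro's output. The function names and the tiny table are hypothetical.

```python
import statistics


def median_polish(table, half_steps=4):
    """Tukey-style median polish of a two-way table given as a list of rows.
    Even-numbered half steps sweep row medians into the row effects, odd ones
    sweep column medians into the column effects; after each sweep the median
    of the other set of effects is moved into the typical value.
    Returns (typical value, row effects, column effects, residuals)."""
    resid = [[float(v) for v in row] for row in table]
    n_rows, n_cols = len(resid), len(resid[0])
    tv = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for step in range(half_steps):
        if step % 2 == 0:                        # row sweep
            for i in range(n_rows):
                m = statistics.median(resid[i])
                resid[i] = [v - m for v in resid[i]]
                row_eff[i] += m
            m = statistics.median(col_eff)
            col_eff = [e - m for e in col_eff]
            tv += m
        else:                                    # column sweep
            for j in range(n_cols):
                m = statistics.median([resid[i][j] for i in range(n_rows)])
                for i in range(n_rows):
                    resid[i][j] -= m
                col_eff[j] += m
            m = statistics.median(row_eff)
            row_eff = [e - m for e in row_eff]
            tv += m
    return tv, row_eff, col_eff, resid


def r_star(table, tv, resid):
    """R* = (sum |d - tv| - sum |resid|) / sum |d - tv|, as in the macro."""
    base = sum(abs(v - tv) for row in table for v in row)
    left = sum(abs(e) for row in resid for e in row)
    return (base - left) / base


table = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]    # hypothetical, exactly additive
tv, rows, cols, res = median_polish(table)
print(tv, rows, cols, r_star(table, tv, res))
```

At every step, each cell still equals typical value + row effect + column effect + residual. R* near 1 means the additive fit accounts for nearly all of the spread about the typical value.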
Assignment:  Due Friday, Nov. 13:  Using the data in the class handout, median polish the table both by hand and by computer, get the residual table, R*, and the graphs. Do you think there is any interaction or not?


Weeks 13-14
Using median polish, we will look at statistical inference for the two-way layout with one observation per cell. Exercise due Wednesday, Dec. 2: Analyze the % of women smokers data. The research hypothesis asks whether the % of women smokers has changed over time. The data are blocked by age. Do a rough and a strong confirmatory analysis.

Here is a table of Tukey's studentized range critical values:
 
no. of samples    2     3     4     5     6     7     8     9
q* (.05)        1.96  2.34  2.57  2.73  2.85  2.95  3.03  3.10
q* (.10)        1.65  2.05  2.29  2.46  2.59  2.69  2.78  2.86
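The table entries appear to agree with large-sample (infinite degrees of freedom) studentized range quantiles divided by sqrt(2). For two samples at .05, this gives the familiar 1.96. Assuming that reading is correct, the Python sketch below recomputes the entries by numerical integration, so the table can be extended to more samples. It uses only the standard library. The function names and integration settings are hypothetical choices. The last digit may differ from the table because of rounding.

```python
import math


def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)


def std_normal_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def range_cdf(q, k, lo=-8.0, hi=8.0, steps=1000):
    """P(range of k iid N(0, 1) values <= q), computed as
    k * integral of pdf(z) * (cdf(z + q) - cdf(z))**(k - 1) dz
    with the trapezoid rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        z = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        spread = std_normal_cdf(z + q) - std_normal_cdf(z)
        total += weight * std_normal_pdf(z) * spread ** (k - 1)
    return k * total * h


def q_star(k, alpha):
    """Upper-alpha large-sample studentized range quantile for k samples,
    divided by sqrt(2); found by bisection on range_cdf."""
    lo, hi = 0.0, 10.0
    for _ in range(30):
        mid = (lo + hi) / 2
        if range_cdf(mid, k) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2 / math.sqrt(2)


if __name__ == "__main__":
    for alpha in (0.05, 0.10):
        print(alpha, [round(q_star(k, alpha), 2) for k in range(2, 10)])
```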



Week 15
We will develop a strong confirmatory rank test for interaction this week. As part of the Take Home Final, based on the Poison by Treatment experiment, you should state the research hypothesis and carry out a complete analysis of the data. This exercise is due Friday, Dec. 11. If we have extra time, I will discuss correlation and association. Part 2 of the Take Home Final will be handed out on Monday, Dec. 7 and is due Wed., Dec. 16 by noon. Give this part to Sue in the Statistics Department Office (Thomas, Rm 326) and ask her to put it in my mailbox.
