STAT 504: Analysis of Discrete Data
(SAMPLE) Course Description and Syllabus
Please note: The syllabus that you receive upon enrollment in STAT 504 is your official syllabus for the course. The sample information here is intended to give you an idea of what to expect.
Course Author: Professor Aleksandra Slavkovic
Email: sesa@stat.psu.edu
Course Objectives
- To develop a critical approach to the analysis of contingency tables
- To examine the basic ideas and methods of generalized linear models
- To link logit and log-linear methods with generalized linear models
- To develop basic facility in the analysis of discrete data using SAS/R
Prerequisites
Stat 504 is intended primarily for graduate students outside of the Statistics department. It may also be appropriate for first- or second-year graduate students in Statistics. Advanced graduate students in Statistics should consider taking Stat 544 instead.
- Stat 504 assumes knowledge of basic techniques of applied statistics, including normal-theory confidence intervals and hypothesis tests (e.g., one- and two-sample t-tests), multiple linear regression, and analysis of variance. It is strongly suggested that you take Stat 500, 501 and 502 before taking Stat 504.
- A course in applied probability, or at least some familiarity with discrete probability and distributions, expectation, variance, etc., is important.
- Students are expected to have the basic mathematical ability to deal with summations, square roots, logarithms, etc., and occasionally some simple matrix algebra.
- Students should already feel comfortable using either SAS or R, be quick learners of software packages, or be able to figure out how to do the analyses in another package of their choice.
Information, announcements, handouts & homework will all be found at the ANGEL course web site.
Textbook:
An Introduction to Categorical Data Analysis, by Alan Agresti, 2nd Ed. (2007), Wiley, ISBN: 0471226181
This is the updated second edition of Agresti (1996). It is less theoretical and therefore less technical than Agresti (2002). Students are free to purchase either the 2007 or the 2002 text for this course. References are provided in the lesson materials for both texts.
Alternative Text
Categorical Data Analysis, by Alan Agresti, 2nd edition (2002), Wiley, ISBN: 0471360937
This is an alternative, more technical text that students may want to purchase instead. It is a popular and highly cited reference book on categorical data. Some of the lectures will follow this book closely, and others will not. The book is definitely worth owning.
Suggested Reading Materials:
- Bishop, Y.M., Fienberg, S.E., and Holland, P.W. (1975). Discrete Multivariate Analysis: Theory and Practice, MIT Press.
- Edwards, D. (2000). Introduction to Graphical Modeling. Second Edition, Springer.
- Fienberg, S.E. (1980). The Analysis of Cross-Classified Categorical Data. MIT Press.
- Wasserman, L. (2004). All of Statistics: A Concise Course in Statistical Inference. Springer. (http://www.stat.cmu.edu/~larry/all-of-statistics/index.html)
- Whittaker, J. (1990). Graphical Models in Applied Multivariate Statistics. Wiley.
Computing
Statistical Software: SAS, R, S-PLUS, Minitab, etc.
We will primarily use:
- SAS (http://www.sas.com/), and
- R (http://cran.r-project.org/).
Students who wish to use other packages (S-PLUS, SPSS, Minitab, Stata, etc.) are welcome to do so; however, they will be responsible for teaching themselves how to perform the analyses in these packages and for ensuring that the results are consistent with what they would obtain from SAS and/or R. Sample analyses in SAS and/or R will be provided throughout the course. Some students may decide to work with more than one package. Students who use other statistical packages should re-work these examples to make sure that they obtain the same results. We cannot guarantee that you will be able to execute the required methods in other packages. The amount of class time that we can devote to computer issues is limited, so you should feel very comfortable with at least one software package. Students who encounter difficulties in computing are encouraged to seek help through the course discussion board, in office hours with the instructor and grader, or, preferably, by working together with other students in the class.
A book that may be useful is Categorical Data Analysis Using the SAS System, Second Edition, by Stokes, Davis, and Koch (2001, SAS Institute). This book is not required, but it can be helpful for graduate students who anticipate doing a lot of categorical data analysis in SAS in their future research. It was written by biostatisticians and has a strong biostatistical flavor. It focuses on the mechanics of performing analyses in SAS rather than on the underlying statistical principles. There are also relevant SAS courses, e.g., the STAT 480, STAT 481, STAT 482 sequence.
There are many resources for learning R. The main website is http://cran.r-project.org/ and the introductory manuals are at http://cran.r-project.org/manuals.html. For additional notes and book references, see http://www.biostat.wisc.edu/~kbroman/Rintro/.
Class attendance and participation
This course will cover a broad range of topics, and will frequently go beyond material found in the textbooks. Students will be responsible for all material covered in lecture notes, whether or not it is found in the textbooks. Hence it is essential for students to take part in class activities, such as the discussion board, on a regular basis.
You are strongly encouraged to participate in online discussion. This helps you become more comfortable with the material and, at the same time, gives other members of the class the benefit of your ideas and perspective. In particular, you should ask questions whenever you have them. Your questions show the instructor both what has been made clear and what needs to be clarified and, consequently, help the instructor teach more effectively. Remember, we learn from our mistakes too!
Course Grading
The course grade will be based on the following allocation:
- homework – 50%
- participation in online discussions – 10%
- 3 online, timed exams – 40%
Homework
Homework assignments will be given frequently throughout the semester, following completion of each lecture. It is your responsibility to download them from the course home page. The assignments will contain both data analysis exercises and conceptual/theoretical questions that challenge your understanding of the key ideas.
The homework SHOULD be typed, especially the data analysis part. Clear writing and presentation are important parts of the assignments. Applied statistical analyses are useless without clear explanations. Thus, you should not include raw computer output in your reports.
Exams
There are 3 online, timed exams. The first two exams are each worth 10% of your final grade. The third exam is comprehensive and is worth 20% of your final grade.
Collaborative work
You are encouraged to work together, for example, to help one another with computer issues, to share class notes, and to discuss the material. On the homework assignments, a reasonable amount of collaboration is allowed. Each student, however, must turn in his or her own written work, which reflects his or her own individual analysis and understanding of the material. Because this is a graduate course, students will be assumed to have sufficient motivation and maturity to come to their own understanding of the material without a strict working-alone policy.
Outline of course
The following outline is tentative, and may be modified as the semester progresses, according to the interests of students and the discretion of the instructor.
- Quick review of discrete probability distributions: binomial, multinomial, Poisson. Introduction to the concept of likelihood. Tests for one-way tables using Pearson's X² and likelihood-ratio G² statistics.
- Introduction to contingency tables. 2 × 2 and r × c tables, tests for independence and homogeneity of proportions, Fisher's exact test, odds ratio and logit, other measures of association. Introduction to 3-way tables, full independence and conditional independence, collapsing and Simpson's paradox.
- Introduction to generalized linear models. Poisson regression. Logistic regression for dichotomous response, including interpretation of coefficients, main effects and interactions, model selection, diagnostics, and assessing goodness of fit.
- Polytomous logit models for ordinal and nominal response.
- Loglinear models (and graphical models) for multi-way tables.
- Other topics as time permits and according to student interests: causality, repeated measures, generalized least squares, mixed models, latent-class models, missing data, algebraic statistics approach.
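To give a rough sense of the first two outline topics, here is a minimal Python sketch of Pearson's X² and the likelihood-ratio G² statistic for a one-way table, plus the sample odds ratio for a 2 × 2 table. It is an illustration only and not course material: the course itself uses SAS and R, the function names are made up for this example, and all counts are hypothetical.

```python
import math


def pearson_x2(observed, probs):
    """Pearson's X^2 = sum over cells of (O - E)^2 / E, with E = n * p."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))


def likelihood_ratio_g2(observed, probs):
    """Likelihood-ratio G^2 = 2 * sum over cells of O * log(O / E).

    Cells with an observed count of zero contribute nothing to the sum.
    """
    n = sum(observed)
    return 2 * sum(
        o * math.log(o / (n * p)) for o, p in zip(observed, probs) if o > 0
    )


def odds_ratio(table):
    """Sample odds ratio (a * d) / (b * c) for a 2 x 2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)


# Hypothetical data: 30 rolls of a die, tested against a fair-die hypothesis.
die_counts = [3, 7, 5, 10, 2, 3]
fair_probs = [1 / 6] * 6

x2 = pearson_x2(die_counts, fair_probs)
g2 = likelihood_ratio_g2(die_counts, fair_probs)
df = len(die_counts) - 1  # degrees of freedom for a fully specified null

# Hypothetical 2 x 2 table: rows are groups, columns are outcome yes / no.
two_by_two = [[20, 10], [5, 15]]
theta = odds_ratio(two_by_two)

print(f"X^2 = {x2:.3f}, G^2 = {g2:.3f}, df = {df}")
print(f"odds ratio = {theta:.3f}, log odds ratio = {math.log(theta):.3f}")
```

Both statistics are compared with a chi-squared distribution on the stated degrees of freedom. They are usually close when expected counts are reasonably large and can diverge in sparse tables.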
Physically disabled and learning disabled students
It is Penn State’s policy not to discriminate against qualified students with documented disabilities in its educational programs. If you have a disability-related need for modifications in this course, contact your instructor and the Office for Disability Services (located in 116 Boucke Building) or the Disability Contact Liaison at your Penn State location. Instructors should be notified as early in the semester as possible. You may refer to the Nondiscrimination Policy in the Student Guide to University Policies and Rules 1997.
Plagiarism
Feel free to talk with the course instructor if you have any questions or comments about what constitutes plagiarism. All Penn State and Eberly College of Science policies regarding academic integrity apply to this course. See: http://www.science.psu.edu/academic/Integrity/index.html for details.