Software for multiple imputation


Multiple imputation

Multiple imputation is a simulation-based approach to the statistical analysis of incomplete data. In multiple imputation, each missing datum is replaced by m>1 simulated values. The resulting m versions of the complete data can then be analyzed by standard complete-data methods, and the results combined to produce inferential statements (e.g. interval estimates or p-values) that incorporate missing-data uncertainty.
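As a rough illustration of the "results combined" step, here is a minimal Python sketch of the standard combining rules for a scalar estimate, usually attributed to Rubin. It is not part of NORM, CAT, MIX or PAN. The function name pool_rubin and the numbers in the usage note are hypothetical, chosen only for illustration. The sketch assumes each of the m completed data sets has already been analyzed, giving a point estimate and its squared standard error.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m complete-data results for one scalar quantity.

    estimates: the m point estimates, one per imputed data set
    variances: the m squared standard errors, in the same order
    Returns (pooled estimate, total variance, degrees of freedom).
    Hypothetical helper for illustration; not taken from the packages above.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need m > 1 estimates with matching variances")

    q_bar = mean(estimates)        # pooled point estimate
    u_bar = mean(variances)        # within-imputation variance
    b = variance(estimates)        # between-imputation variance (divisor m - 1)
    t = u_bar + (1 + 1 / m) * b    # total variance, including missing-data uncertainty

    # Degrees of freedom for the t reference distribution; infinite when
    # the m estimates agree exactly (no between-imputation variance).
    if b == 0:
        df = math.inf
    else:
        df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
    return q_bar, t, df
```

For example, pool_rubin([10.0, 11.0, 12.0], [1.0, 1.0, 1.0]) pools three made-up estimates. An interval estimate is then the pooled estimate plus or minus a t quantile (on df degrees of freedom) times the square root of the total variance.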

See our FAQ page for answers to frequently asked questions about multiple imputation.



Libraries for S-PLUS

At present, four different software packages are available for creating multiple imputations in S-PLUS.

NORM
Multiple imputation of multivariate continuous data under a normal model. Routines described in Chapter 5 of Schafer (1997a).

CAT
Multiple imputation of multivariate categorical data under loglinear models. Routines described in Chapters 7-8 of Schafer (1997a).

MIX
Multiple imputation of mixed continuous and categorical data under the general location model. Routines described in Chapter 9 of Schafer (1997a).

PAN
Multiple imputation of panel data or clustered data under a multivariate linear mixed-effects model. Routines described in Schafer (1997b).

All four packages (NORM, CAT, MIX and PAN) are available as functions in S-PLUS. For efficiency, the computationally intensive portions are carried out in Fortran-77; the compiled Fortran object code is dynamically loaded in the S-PLUS session.

S-PLUS for Unix: Each package comes in the form of a shar archive, including Fortran source that must be compiled for your particular system.

S-PLUS Version 3.3 for Windows: Each package is a self-extracting zip (*.exe) file. Executing the file will create an S-PLUS library.

S-PLUS Version 4.0 for Windows: Each package is a self-extracting zip (*.exe) file. Executing the file will create an S-PLUS library.

Having trouble with NORM in S-PLUS version 4.5 for Windows? Try replacing your current "norm.obj" file with this new version.



Stand-alone packages for Windows 95/98/NT

We have also been developing free, stand-alone applications for Windows 95, 98, and NT. As of July 1999, one package is available.

NORM
Version 2.02 for Windows 95/98/NT. Multiple imputation of multivariate continuous data under a normal model. This is a major update of the package that we first released in 1997. It has lots of great new features. Check it out! Download NORM Version 2.03 for Windows.

Future Windows software releases. We are still working on stand-alone Windows versions of our other software packages CAT, MIX, and PAN. The next package to be released will be PAN, perhaps by late summer 1999. Quality software takes time to develop, especially with our limited resources. Please be patient; we are working as fast as we can!



Authorship and use

This software was written by Joe Schafer of the Department of Statistics, The Pennsylvania State University. Maren Olsen (same affiliation) assisted in the development of the stand-alone Windows applications. The software may be distributed free of charge and used by anyone if credit is given. It has been tested fairly well, but it comes with no guarantees and the authors assume no liability for its use or misuse.



Acknowledgements

Development of this software has been supported by grant 2R44CA65147-02 from the National Institutes of Health, and by grant 1-P50-DA10075 from the National Institute on Drug Abuse (NIDA). This ongoing work is carried out at The Pennsylvania State University, in the Department of Statistics and at the NIDA-supported Center for the Study of Prevention through Innovative Methodology.



Problems? Questions?

Because our software is distributed free of charge, our ability to handle users' questions is limited. We are unable to provide detailed advice regarding the use of these packages with specific data sets. In our experience, many questions arise because users are unfamiliar with the new statistical techniques implemented here. Potential users of our software should first become thoroughly familiar with the technique of multiple imputation (see our FAQ page). You should also browse the documentation provided with each software package.



References

Schafer, J.L. (1997a)
Analysis of Incomplete Multivariate Data, Chapman & Hall, London.

Schafer, J.L. (1997b)
Imputation of missing covariates under a general linear mixed model. Technical report, Dept. of Statistics, Penn State University.



created by Joe Schafer/ jls@stat.psu.edu/ revised July 12, 1999