Stat 462 Midterm Exam #2 Study Guide

Specific sections covered in text:
* 2.4, Interval estimation of mean response E(Yh)
* 2.5, Prediction interval for a new observation and for the mean of m new observations
* 2.7, Analysis of variance approach to regression analysis
* 2.8, General linear test approach
* 2.9, Descriptive measures of linear association
* 3.2, Residuals
* 3.3, Diagnostics for residuals
* 3.4, Overview of tests involving residuals
  Have seen:
    Durbin-Watson test for correlated errors
    Modified Levene test for constant error variance
    Ryan-Joiner test for normality of errors
* 3.7, F test for lack of fit
* 3.8, Overview of remedial measures
* 3.9, Transformation
* 4.1, Joint estimation of \beta_0 and \beta_1
* 4.2, Simultaneous estimation of mean responses
* 4.3, Simultaneous prediction intervals for new observations
* 5.1, Matrices
* 5.2, Matrix addition and subtraction
* 5.3, Matrix multiplication
* 5.4, Special types of matrices
* 5.5, Linear dependence and rank of matrix
* 5.6, Inverse of a matrix
* 5.7, Some basic results for matrices
* 5.8, Random vectors and matrices
* 5.9, Simple linear regression model in matrix terms
* 5.10, Least squares estimation of regression parameters
* 5.11, Fitted values and residuals
* 5.12, Analysis of variance results

In general:
* Know how to read, interpret, and get answers from Minitab output.
* Know how to state the hypotheses, make a decision using the P-value, and write a conclusion in words for each test we studied.

More specifically, you should:
* Know the difference between the true and the estimated regression line.
* Be able to distinguish between estimating a mean response, predicting a new observation, and predicting the mean of m new observations.
* Know how to calculate the confidence interval for a mean response and the prediction interval for a new observation (and the mean of m new observations) from Minitab output.
* Know that (and why) a prediction interval for a new observation is wider than a confidence interval for the mean response.
* Know the distinction between a confidence interval and a prediction interval.
* Know how all the numbers in the ANOVA table for regression relate to one another, such as SSR+SSE=SSTO. If some of the numbers in the table are missing, know how to find them.
* Most importantly, know how the ANOVA decomposition decomposes the total variation in Y and the importance of this.
* Know how to conduct the F-test that the slope is 0 (versus that it's not 0) using the ANOVA table. Know how the test statistic is derived from the two methods we have learned: 1) from considering the expected values of MSE and MSR, 2) from the general linear test approach.
* Know the relation between the t-test and the F-test for testing that the slope is 0.
* Be able to specify the full and reduced models for the general linear test approach for a given test situation.
* Understand the general idea of the general linear test approach.
* Know, for the simple linear regression model, how the general linear test approach leads to the same F-test as before (and therefore that the test can be conducted using the ANOVA table we learned before).
* Know that the coefficient of determination and the correlation coefficient are measures of linear association (they can be 0 even if there is a perfect nonlinear association).
* Know the formula for the coefficient of determination, and therefore know how to calculate it and interpret it -- "blah-blah % of the variation in Y is explained by the variation in the predictor X."
* Know that linear association between two variables does not imply that one causes the other.
* Know how to calculate the correlation coefficient from the coefficient of determination.
* Know what various correlation coefficient values mean, such as -1, 0, 1, and somewhere between -1 and 1.
  Beyond that, the correlation coefficient has no meaningful interpretation comparable to the one for the coefficient of determination.
* Know why we nag ourselves about model checking.
* Know the six things that can go wrong with the model.
* Know which plot in Minitab output can be used to evaluate which violation.
* Know what different residuals vs. fits plots mean: 1) well-behaved (horizontal band), 2) non-constant error variance (fan), 3) nonlinearity (systematic pattern).
* Know how to use a standardized residuals vs. fits plot to help flag an outlier. (Minitab helps flag these in the session window output.)
* Know how to interpret a residuals vs. order plot.
* Know that a normal probability plot should look linear if the error terms are normally distributed.
* Know how to interpret a residuals vs. omitted predictor plot.
* Know how to conduct a Durbin-Watson test given the statistic from Minitab: how to specify the hypotheses, make a decision, and write a conclusion in words.
* Know how to conduct the modified Levene test given the statistic and P-value from Minitab: how to specify the hypotheses, make a decision, and write a conclusion in words.
* Know how to conduct the Ryan-Joiner correlation test for normality of error terms: how to specify the hypotheses, make a decision, and write a conclusion in words.
* Know how the lack of (linear) fit test statistic is derived from the general linear test approach via the specification of the full and reduced models.
* Know how to conduct the lack of (linear) fit test: how to specify the hypotheses, make a decision, and write a conclusion in words.
* Know how all the numbers in the lack of fit ANOVA table relate to one another, such as SSE=SSPE+SSLF. If some of the numbers in the table are missing, know how to find them.
* Know that the (linear) LOF test only gives you evidence against linearity. If you reject the null and conclude lack of linear fit, it doesn't tell you what (non-linear) regression function would work.
* Remedial measures: know possible ways of moving away from the simple linear regression model when different assumptions of the model are violated.
* Know how transforming the data can help with nonlinearity, nonconstant variance, and non-normality.
* Know that we prefer to transform X only if the only problem is nonlinearity, and that we would transform Y, or both X and Y, for nonconstant variance and non-normality. Know also the reason behind this.
* Understand why simultaneous inference is important.
* Know the difference between a statement confidence level and a family confidence level.
* Know how to interpret statement confidence intervals and family confidence intervals.
* Know the basic idea behind the joint Bonferroni confidence intervals and know how to form them (specifically how to choose the multiplier at the correct level).
* Be comfortable with the matrix manipulations we've seen in the matrix review.
* Know how to write down the simple linear regression in matrix terms.
* Know how the least squares estimates, fitted values, residuals, and the ANOVA table can be written in matrix terms.
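
Worked sketch: confidence interval vs. prediction interval. To see why a prediction interval is wider than a confidence interval for the mean response, it may help to compute the three standard errors side by side. This is only an illustrative Python sketch, not Minitab output: every number below is made up, the function names are hypothetical, and the t multiplier is read from a t table rather than computed.

```python
import math

def interval_standard_errors(mse, n, x_bar, sxx, x_h, m=1):
    """Return the three estimated standard errors at X = x_h.

    sxx is the sum of squared deviations of X about its mean.
    """
    # Variance of the fitted line at x_h (shared by all three intervals).
    line_var = mse * (1.0 / n + (x_h - x_bar) ** 2 / sxx)
    se_mean_response = math.sqrt(line_var)        # s{Yhat_h}
    se_new_obs = math.sqrt(mse + line_var)        # s{pred}: adds one full MSE
    se_mean_of_m = math.sqrt(mse / m + line_var)  # s{predmean}: adds MSE/m
    return se_mean_response, se_new_obs, se_mean_of_m

def interval(center, multiplier, se):
    """Return (lower, upper) for center +/- multiplier * se."""
    return center - multiplier * se, center + multiplier * se

# Made-up summary values of the kind you would read off regression output.
mse, n, x_bar, sxx = 4.0, 25, 10.0, 200.0
x_h, y_hat_h = 12.0, 30.0
t_mult = 2.069  # t(0.975; n - 2 = 23), taken from a t table

se_ci, se_pi, se_pim = interval_standard_errors(mse, n, x_bar, sxx, x_h, m=3)
ci = interval(y_hat_h, t_mult, se_ci)
pi = interval(y_hat_h, t_mult, se_pi)
pi_m = interval(y_hat_h, t_mult, se_pim)

print("95% CI for mean response:      ", ci)
print("95% PI for one new observation:", pi)
print("95% PI for mean of 3 new obs:  ", pi_m)
```

The only difference among the three is the extra MSE (or MSE/m) term, which is the variation of new observations around the line; that term does not shrink as n grows, which is why the prediction interval is always the widest.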
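
Worked sketch: the regression ANOVA table. The relations among the ANOVA numbers (SSR+SSE=SSTO, F = MSR/MSE, the coefficient of determination, and the t-test/F-test connection) can be checked by hand on a tiny data set. The Python below is a rough sketch with invented data; it is meant as a way to practice filling in a table, not as a substitute for reading Minitab output.

```python
import math

# Invented data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least squares estimates of slope and intercept.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

# ANOVA decomposition of the total variation in Y.
ssto = sum((yi - y_bar) ** 2 for yi in y)               # df = n - 1
ssr = sum((fi - y_bar) ** 2 for fi in fitted)           # df = 1
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # df = n - 2

msr = ssr / 1
mse = sse / (n - 2)
f_stat = msr / mse

# t statistic for H0: slope = 0, using s{b1} = sqrt(MSE / Sxx).
t_stat = b1 / math.sqrt(mse / sxx)

# Coefficient of determination, and r with the sign of the slope.
r_sq = ssr / ssto
r = math.copysign(math.sqrt(r_sq), b1)

print(f"SSR + SSE = {ssr + sse:.4f}, SSTO = {ssto:.4f}")
print(f"F* = {f_stat:.2f}, t*^2 = {t_stat ** 2:.2f}")
print(f"R^2 = {r_sq:.4f}, r = {r:.4f}")
```

Notice that the squared t statistic equals the F statistic, and that r is the square root of R^2 carrying the sign of the estimated slope.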
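
Worked sketch: the lack of fit decomposition. The identity SSE=SSPE+SSLF needs replicate observations at some X levels. The following Python sketch uses invented data with two replicates at each of c = 4 levels; the helper name `fit_line` is hypothetical. The P-value would come from an F distribution with (c - 2, n - c) degrees of freedom, which is not computed here.

```python
from collections import defaultdict

def fit_line(x, y):
    """Least squares intercept and slope for simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

# Invented data with replicates at each X level.
x = [1, 1, 2, 2, 3, 3, 4, 4]
y = [2.0, 2.4, 3.9, 4.3, 5.1, 5.5, 8.2, 7.8]
n = len(x)

b0, b1 = fit_line(x, y)

# Group the responses by X level and compute each level's mean.
groups = defaultdict(list)
for xi, yi in zip(x, y):
    groups[xi].append(yi)
c = len(groups)  # number of distinct X levels
level_mean = {xi: sum(ys) / len(ys) for xi, ys in groups.items()}

# Reduced model (straight line) error sum of squares.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
# Full model (separate mean at each level): pure error.
sspe = sum((yi - level_mean[xi]) ** 2 for xi, yi in zip(x, y))
# Lack of fit: level means versus the fitted line, weighted by replicates.
sslf = sum(len(ys) * (level_mean[xi] - (b0 + b1 * xi)) ** 2
           for xi, ys in groups.items())

mslf = sslf / (c - 2)
mspe = sspe / (n - c)
f_lof = mslf / mspe

print(f"SSE = {sse:.4f}, SSPE + SSLF = {sspe + sslf:.4f}")
print(f"F* = MSLF / MSPE = {f_lof:.3f} on ({c - 2}, {n - c}) df")
```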
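
Worked formulas: Bonferroni multipliers. As a reminder of how the multiplier is chosen, here is a short summary in the usual textbook notation, where 1 - \alpha is the family confidence level, g is the number of intervals in the family, and n - 2 is the error degrees of freedom. Check the notation against your own notes before relying on it.

```latex
% Joint intervals for \beta_0 and \beta_1 (g = 2 statements):
b_0 \pm B\, s\{b_0\}, \qquad b_1 \pm B\, s\{b_1\}, \qquad
B = t\!\left(1 - \frac{\alpha}{4};\; n-2\right)

% Simultaneous intervals for g mean responses E\{Y_h\}:
\hat{Y}_h \pm B\, s\{\hat{Y}_h\}, \qquad
B = t\!\left(1 - \frac{\alpha}{2g};\; n-2\right)

% Simultaneous prediction intervals for g new observations:
\hat{Y}_h \pm B\, s\{\text{pred}\}, \qquad
B = t\!\left(1 - \frac{\alpha}{2g};\; n-2\right)
```

For example, with a 90% family level for \beta_0 and \beta_1, each statement is made at the 95% level, so the multiplier is t(0.975; n-2).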
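
Worked formulas: simple linear regression in matrix terms. The matrix results listed above can be collected in one place as follows. Here X is the n x 2 matrix whose first column is all 1s, H is the hat matrix, and J is the n x n matrix of 1s; this is a summary sketch in standard notation, so match it to the symbols used in class.

```latex
\begin{aligned}
\mathbf{Y} &= \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}
  && \text{(model)} \\
\mathbf{b} &= (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}
  && \text{(least squares estimates)} \\
\hat{\mathbf{Y}} &= \mathbf{X}\mathbf{b} = \mathbf{H}\mathbf{Y},
  \quad \mathbf{H} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'
  && \text{(fitted values)} \\
\mathbf{e} &= \mathbf{Y} - \hat{\mathbf{Y}} = (\mathbf{I} - \mathbf{H})\mathbf{Y}
  && \text{(residuals)} \\
SSTO &= \mathbf{Y}'\mathbf{Y} - \tfrac{1}{n}\,\mathbf{Y}'\mathbf{J}\mathbf{Y} \\
SSE &= \mathbf{e}'\mathbf{e} = \mathbf{Y}'\mathbf{Y} - \mathbf{b}'\mathbf{X}'\mathbf{Y} \\
SSR &= \mathbf{b}'\mathbf{X}'\mathbf{Y} - \tfrac{1}{n}\,\mathbf{Y}'\mathbf{J}\mathbf{Y}
\end{aligned}
```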