Lab 8: Predicting children's growth (Regression inference)

In this lab, you will investigate how fast children grow. You will determine how to predict a child's height or weight at a later age from their height or weight at an earlier age.

The Data

First, open the data set GROWTH, which is available in TED.

The data come from the Berkeley guidance study of children. The study involved 136 children, all born in Berkeley, CA in 1928-1929. These children were measured at ages 2, 9, and 18. The results of the original study were published in [R. D. Tuddenham and M. M. Snyder (1954). Physical growth of California boys and girls from birth to eighteen years. University of California Publications in Child Development, 1, 183-364].

Variable Name   Description
Gender          M = Male, F = Female
WT2             Weight of the child in kilograms at age 2
HT2             Height of the child in centimeters at age 2
WT9             Weight of the child in kilograms at age 9
HT9             Height of the child in centimeters at age 9
WT18            Weight of the child in kilograms at age 18
HT18            Height of the child in centimeters at age 18

Minitab Instructions

Recall that you can make scatterplots by going to Graph --> Scatterplot and carry out linear regression by going to Stat --> Regression --> Regression. The table of output that you get includes not only the regression coefficients but also the information that you need to do inference for the regression slope: the standard errors for the estimated coefficients, and the t-statistic and p-value for the test that the slope is zero against a two-sided alternative.
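
If you want to check the numbers in Minitab's coefficient table by hand, the following Python sketch computes the same quantities: the least-squares intercept and slope, the standard error of the slope, the t-statistic, and the two-sided p-value. This is not how Minitab is implemented. The function names (fit_line, two_sided_p) and the sample heights are made up for illustration and are not values from GROWTH. The p-value is approximated by numerically integrating the t density, so it should agree with Minitab only to a few decimal places.

```python
import math

def fit_line(x, y):
    """Least-squares fit of y = b0 + b1*x, with quantities for slope inference."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # estimated slope
    b0 = ybar - b1 * xbar               # estimated intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))        # residual standard deviation
    se_b1 = s / math.sqrt(sxx)          # standard error of the slope
    return {"b0": b0, "b1": b1, "s": s, "se_b1": se_b1,
            "t": b1 / se_b1, "df": n - 2}

def t_pdf(u, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + u * u / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value 2 * P(T > |t|), via Simpson's rule on the t density."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                # P(0 < T < |t|)
    return max(0.0, 1 - 2 * area)

# Made-up heights in centimeters (NOT taken from the GROWTH data set).
ht2 = [84.0, 86.5, 87.0, 88.5, 89.0, 90.5, 91.0, 93.0]
ht9 = [128.0, 133.5, 131.0, 136.0, 134.5, 139.0, 137.5, 142.0]

result = fit_line(ht2, ht9)
print(result)
print("two-sided p-value:", two_sided_p(result["t"], result["df"]))
```

The t-statistic is the slope divided by its standard error, and the degrees of freedom are n - 2, which is what Minitab uses for the test that the slope is zero.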

To make sure you get a residual plot along with your regression, click on "Graphs" and then put the explanatory variable in the box under "Residuals versus the variables". You can also get a histogram or a normal probability plot of the residuals by checking the appropriate boxes.

Minitab will also calculate confidence intervals for the mean response and prediction intervals for an individual response, given a particular value for the explanatory variable. To obtain these intervals, go to Stat --> Regression --> Regression, choose the response and predictor variables as usual, and then click on the "Options" button. Type a value for the x-variable in the box "Prediction intervals for new observations". This should be the value of the x-variable for which you want to obtain a confidence interval or prediction interval for the response. Then you can check "confidence limits" and/or "prediction limits" below. You can also adjust the confidence level if necessary. In the Minitab output, under "Predicted Values for New Observations", will be the x-value that you typed in (under "Obs"), the predicted value for the y-variable (under "Fit"), a 95 percent confidence interval for the mean response for this value of the explanatory variable (under "95% CI"), and a 95 percent prediction interval for an individual response at this value of the explanatory variable (under "95% PI").
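
To see why the prediction interval is always wider than the confidence interval at the same x-value, it can help to compute both by hand. The sketch below uses the standard simple-regression formulas; it is an illustration, not Minitab's code. The function name intervals_at, the sample data, and the choice to pass the t critical value in as an argument are all assumptions made for this example. The data are made up and are not from GROWTH.

```python
import math

def intervals_at(x, y, x0, t_star):
    """Return (fit, CI for mean response, PI for an individual response) at x0.

    t_star is the t critical value for n - 2 degrees of freedom,
    for example 2.306 for a 95 percent interval with 8 degrees of freedom.
    """
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))

    fit = b0 + b1 * x0
    # Standard error for the mean response at x0.
    se_mean = s * math.sqrt(1 / n + (x0 - xbar) ** 2 / sxx)
    # Standard error for one new observation at x0 (note the extra "1 +").
    se_pred = s * math.sqrt(1 + 1 / n + (x0 - xbar) ** 2 / sxx)

    ci = (fit - t_star * se_mean, fit + t_star * se_mean)
    pi = (fit - t_star * se_pred, fit + t_star * se_pred)
    return fit, ci, pi

# Made-up heights in centimeters (NOT taken from the GROWTH data set).
ht9 = [130.0, 132.5, 134.0, 135.5, 136.0, 138.0, 139.5, 141.0, 143.0, 145.0]
ht18 = [170.0, 174.5, 172.0, 177.0, 175.5, 180.0, 178.5, 183.0, 182.0, 187.5]

# 10 observations give 8 degrees of freedom; the 95 percent critical value is 2.306.
fit, ci, pi = intervals_at(ht9, ht18, x0=140.0, t_star=2.306)
print("Fit:", fit)
print("95% CI:", ci)
print("95% PI:", pi)
```

The only difference between the two standard errors is the extra 1 under the square root, which accounts for the variation of an individual child around the mean response.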

Predicting children's heights
  1. Begin by using linear regression to predict a child's height at age 9 from the child's height at age 2. What is the equation of your regression line? Based on your scatterplot and residual plot, does linear regression seem like an appropriate way to predict heights?

  2. Next try using linear regression to predict a child's height at age 18 from the child's height at age 9. What is the equation of your regression line? Does linear regression seem appropriate for these data?

Next investigate graphically whether boys and girls exhibit different growth patterns between age 2 and age 9. Go to Graph --> Scatterplot and click on "With regression and groups" and then click "OK". Choose HT9 as the Y variable and HT2 as the X variable, then put Gender in the box "Categorical variables for grouping" and click OK. You will get a scatterplot with height at age 9 on the y-axis and height at age 2 on the x-axis. The points for boys and girls will be in different colors. You will also see two separate regression lines drawn on the graph, one for the boys and one for the girls. Of course, you can then make a similar plot with height at age 18 on the y-axis and height at age 9 on the x-axis.

  1. Is there a big difference between how much boys and girls grow between age 2 and age 9, or does the regression line you found in question 1 appear to work well for both boys and girls?

  2. Now consider the period between age 9 and age 18. Is there a big difference between the growth patterns of boys and girls during this period, or does the regression line you found in question 2 work well for both boys and girls?
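
The grouped scatterplot draws one regression line per gender. A rough Python sketch of the same idea is to split the rows by the Gender column and fit a separate least-squares line to each group. The function names (fit_line, fit_by_group) and the rows below are made up for illustration; the rows are not from GROWTH, and this is not what Minitab runs internally.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line for y on x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

def fit_by_group(rows, group_key, x_key, y_key):
    """Fit one line per distinct value of rows[i][group_key]."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group_key], []).append((row[x_key], row[y_key]))
    return {
        g: fit_line([p[0] for p in pts], [p[1] for p in pts])
        for g, pts in groups.items()
    }

# Made-up rows using the GROWTH column names (values are NOT from the data set).
rows = [
    {"Gender": "M", "HT9": 132.0, "HT18": 174.0},
    {"Gender": "M", "HT9": 136.5, "HT18": 179.5},
    {"Gender": "M", "HT9": 140.0, "HT18": 182.0},
    {"Gender": "M", "HT9": 144.0, "HT18": 188.5},
    {"Gender": "F", "HT9": 131.0, "HT18": 162.0},
    {"Gender": "F", "HT9": 135.5, "HT18": 165.0},
    {"Gender": "F", "HT9": 139.0, "HT18": 166.5},
    {"Gender": "F", "HT9": 143.5, "HT18": 170.0},
]

for gender, (b0, b1) in fit_by_group(rows, "Gender", "HT9", "HT18").items():
    print(gender, "intercept:", b0, "slope:", b1)
```

If the two fitted lines are close to each other, a single regression line for all children is a reasonable summary; if they differ clearly in slope or intercept, the groups are better analyzed separately.
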

For the rest of this lab, you will consider only the boys. Go to Data --> Subset Worksheet, give a name to your new worksheet, then click on "Condition" and in the box labeled "Condition", type in 'Gender' = "M" (including the quotes), then click "OK" twice. You should get a Minitab worksheet that includes only the 66 boys in the original data set.

  1. Find the equation of a regression line that can be used to predict a boy's height at age 18 from the boy's height at age 9.

  2. What percentage of the variation in the boys' heights at age 18 is explained by this regression?

  3. Are the assumptions required for statistical inference satisfied? Explain how you arrive at your conclusions and provide supporting plots.

  4. Can you conclude that there is an association between boys' heights at age 18 and their heights at age 9? Make sure to state your null and alternative hypotheses and give the p-value for your test. Use significance level 0.05.

  5. Find a 95 percent confidence interval for the slope of your regression line. Explain carefully in a sentence or two what this confidence interval means. (Hint: if you want to find the critical value for the t-distribution in Minitab, go to Calc --> Probability Distributions --> t, click the "Inverse cumulative probability" bubble and type in the appropriate number of degrees of freedom. Then click the "Input Constant" bubble, and in the box type in the amount of area that will be to the left of the critical value you are looking for, which is .975 for a 95 percent confidence interval.)

  6. If a boy is 140 centimeters tall at age 9, find an interval that you are 95 percent confident will contain the boy's height at age 18. (Hint: Minitab can provide this interval. See the instructions at the beginning of the lab.)

  7. Find an interval that you are 95 percent confident will contain the average height at age 18 of all boys who are 140 centimeters tall at age 9.
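
The confidence interval for the slope in question 5 has the form (estimated slope) plus or minus (t critical value) times (standard error of the slope), with n - 2 degrees of freedom. The Python sketch below mirrors the Minitab hint: it finds the critical value as an inverse cumulative probability at .975, here by bisection on a numerically integrated t density, and then builds the interval. The function names (t_cdf, t_inv, slope_ci) are made up for this example, and the slope and standard error passed in at the end are placeholders, not output from the GROWTH data.

```python
import math

def t_pdf(u, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + u * u / df) ** (-(df + 1) / 2)

def t_cdf(t, df):
    """P(T <= t), using Simpson's rule on the density between 0 and |t|."""
    a = abs(t)
    steps = 2 * max(100, int(200 * a))  # even number of subintervals
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                # P(0 < T < |t|)
    return 0.5 + area if t >= 0 else 0.5 - area

def t_inv(p, df):
    """Inverse cumulative probability for 0.5 < p < 1, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def slope_ci(b1, se_b1, df, level=0.95):
    """Confidence interval b1 +/- t* x SE(b1) for the regression slope."""
    t_star = t_inv(1 - (1 - level) / 2, df)
    return b1 - t_star * se_b1, b1 + t_star * se_b1

# With the 66 boys, the degrees of freedom are 66 - 2 = 64.
df = 66 - 2
print("t critical value:", t_inv(0.975, df))

# Placeholder slope and standard error (NOT results from the GROWTH data).
print("95% CI for slope:", slope_ci(b1=1.0, se_b1=0.1, df=df))
```

Replace the placeholder slope and standard error with the values from your own Minitab output; the critical value should match what Calc --> Probability Distributions --> t reports, up to rounding.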

Predicting children's weights
  1. Find the equation of a regression line that can be used to predict a boy's weight at age 18 from the boy's weight at age 9. Comment on what you see in the scatterplot and the residual plot.

  2. You should have noticed that the data set contains some outliers, including one rather extreme outlier that represents a boy who weighed nearly 67 kilograms at age 9. Try removing this outlier. (The easiest way to do this is to scroll down the list and find the outlier, move your cursor over the cells that you want to delete, and hit the delete key to replace the value with an asterisk. Remember the value in case you want to put it back later.) Then do the linear regression again. This time, do the assumptions for inference appear to be satisfied?

  3. How much effect was the outlier having on the slope of the regression line? Would you say that this outlier is an influential point? Is it a high leverage point?

  4. Find an interval that you are 95 percent confident will contain the weight at age 18 of a boy who weighs 30 kilograms at age 9. Use whichever model you think is most appropriate for answering this question.
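
Leverage and influence are different ideas, and a small calculation can make the distinction concrete. In simple regression, the leverage of point i is 1/n + (x_i - xbar)^2 / Sxx, so it depends only on how far the x-value is from the mean of the x-values. Influence is judged by how much the fitted line changes when the point is removed. The Python sketch below computes both; the function names (leverages, slope, slope_without) and the weights are made up for illustration and are not values from GROWTH.

```python
def leverages(x):
    """Leverage h_i = 1/n + (x_i - xbar)^2 / Sxx for each point."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [1 / n + (xi - xbar) ** 2 / sxx for xi in x]

def slope(x, y):
    """Least-squares slope for y on x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return sxy / sxx

def slope_without(x, y, i):
    """Slope of the line refitted after dropping observation i."""
    return slope(x[:i] + x[i + 1:], y[:i] + y[i + 1:])

# Made-up weights in kilograms (NOT taken from the GROWTH data set).
# The last boy is far heavier at age 9 than the others.
wt9 = [26.0, 28.0, 29.0, 30.0, 31.0, 32.0, 34.0, 36.0, 66.0]
wt18 = [58.0, 63.0, 61.0, 66.0, 68.0, 67.0, 72.0, 75.0, 90.0]

h = leverages(wt9)
i = h.index(max(h))                     # point with the highest leverage
print("highest leverage:", h[i], "at index", i)
print("slope with all points:", slope(wt9, wt18))
print("slope without that point:", slope_without(wt9, wt18, i))
```

A point with an extreme x-value always has high leverage, but it is influential only if removing it changes the slope noticeably. The leverages in simple regression always add up to 2, which gives a rough scale for judging whether one value is large.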