## Lab 7: SAT scores and cloud seeding (Hypothesis testing)

In this lab, you will use Minitab to find confidence intervals and conduct hypothesis tests. You will analyze data on SAT scores and GPAs of economics students at Vanderbilt University, and data from a cloud seeding experiment.

### Minitab Instructions

To find confidence intervals and carry out hypothesis tests in Minitab, go to Stat --> Basic Statistics. You will then see options for a number of tests. Focus on 1-Sample t (the one-sample t-test), 2-Sample t (the two-sample t-test), and Paired t (the paired t-test). More details on how to perform these tests are given below, although you may prefer to start on the activity and refer back to these instructions as needed.

First, consider the one-sample t-test. You can either give Minitab a column of data (by clicking in the white box at the top and then selecting a column), or you can select "Summarized data" and input the sample size, sample mean, and sample standard deviation. To get a hypothesis test rather than just a confidence interval, check the box "Perform hypothesis test". You must also input the "Hypothesized mean", which is the value of μ0 when your null hypothesis is μ = μ0. If you click the "Options" button, you can change the alternative from two-sided (the default) to one-sided. You can also change the "Confidence level". Note that what Minitab calls the confidence level is not the same as what we have been calling the "significance level" or "alpha level": the confidence level is 100(1 − α) percent, so a confidence level of 95 corresponds to a significance level of .05. If you want, you can click on the "Graphs" button to get a histogram or boxplot of the data at the same time as the test.

Minitab outputs both a confidence interval for the parameter and the results of a hypothesis test. The confidence interval is under "95% CI", the t-statistic is under "T", and the p-value is under "P". Minitab does not tell you whether or not to reject the null hypothesis; you decide this yourself by comparing the p-value to your significance level and rejecting when the p-value is smaller.
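As a cross-check outside Minitab, the one-sample calculation from summarized data can be sketched in Python using only the standard library. This is a minimal sketch, not Minitab's own implementation: the function names `t_pdf`, `t_upper_tail`, and `one_sample_t` and the summary statistics at the bottom are hypothetical, and the p-value comes from a simple numerical integration of the t density rather than from a statistics package.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, steps=2000):
    """P(T > t), using Simpson's rule on [0, |t|] and the symmetry of the density."""
    a = abs(t)
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3               # integral of the density from 0 to |t|
    tail = max(0.0, 0.5 - area)        # P(T > |t|)
    return tail if t >= 0 else 1.0 - tail

def one_sample_t(n, mean, sd, mu0):
    """Return (t statistic, two-sided p-value, p-value for H1: mu > mu0)."""
    t = (mean - mu0) / (sd / math.sqrt(n))
    df = n - 1
    p_greater = t_upper_tail(t, df)
    p_two_sided = 2 * t_upper_tail(abs(t), df)
    return t, p_two_sided, p_greater

# Hypothetical summary statistics (not taken from either lab data set).
t_stat, p_two, p_greater = one_sample_t(n=25, mean=52.0, sd=5.0, mu0=50.0)
print(f"T = {t_stat:.3f}, two-sided P = {p_two:.4f}, one-sided P = {p_greater:.4f}")
```

The three inputs to `one_sample_t` besides `mu0` are the same quantities Minitab asks for under "Summarized data", so the printed T and P values can be compared directly with Minitab's output for the same numbers.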

The procedure for the paired t-test is similar. You either input both columns of data, or input the sample size, sample mean, and sample standard deviation of the differences between the columns. You can click on "Options" to change the confidence level or switch to a one-sided test. Minitab outputs a confidence interval for the mean of the differences, along with the t-statistic and p-value for the hypothesis test that the mean difference is zero.
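The paired t-test is a one-sample t-test applied to the column of differences, which the following Python sketch illustrates. The data values and variable names are hypothetical and chosen only to keep the arithmetic easy to follow; the p-value is left to Minitab.

```python
import math
import statistics

# Hypothetical paired measurements (illustrative values only).
first = [10.0, 12.0, 9.0, 11.0]
second = [12.0, 15.0, 10.0, 12.0]

# Reduce the two columns to a single column of differences.
diffs = [b - a for a, b in zip(first, second)]

n = len(diffs)
mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)            # sample SD (divisor n - 1)
t_stat = mean_diff / (sd_diff / math.sqrt(n))  # tests H0: mean difference = 0

print(f"n = {n}, mean difference = {mean_diff:.3f}, "
      f"SD of differences = {sd_diff:.3f}, T = {t_stat:.3f}")
```

The values of `n`, `mean_diff`, and `sd_diff` are what you would type into Minitab if you chose to enter summarized differences rather than the two columns.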

For the two-sample t-test, there are three ways to input the data. If you have the summary statistics for both samples, you can select "Summarized data" and input the summary statistics. If the two samples are in different columns, select "Each sample is in its own column" and indicate which two columns contain the data. If the samples are in one column, with another variable containing the labels that designate which sample each value belongs to, then put the column containing the samples in the box labeled "Samples" and the column containing the labels in the box "Sample IDs". Again, by clicking on "Options", you can change the confidence level or switch to a one-sided test.
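For reference, the statistic behind the two-sample test can be written out as follows. This is the unequal-variances (Welch) form, which is an assumption about the settings: it applies if the equal-variances option in the dialog is left unselected. Here $\bar{x}_i$, $s_i$, and $n_i$ denote the sample mean, sample standard deviation, and sample size of sample $i$.

$$
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
\mathrm{df} \approx \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^{2}}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}
$$

The six quantities in this formula are exactly the ones requested under "Summarized data", which is why the test can be run without the raw columns.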

Note that to get a confidence interval in Minitab, you need to ask for a two-sided hypothesis test. Therefore, if you want to conduct a one-sided hypothesis test and get a confidence interval, you will first need to carry out a one-sided hypothesis test and read off the p-value, then separately ask Minitab for a two-sided hypothesis test so that you can read off the confidence interval.
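Because the t distribution is symmetric, the one-sided and two-sided p-values from the same data are related in a simple way, which is a useful check when you run the test twice. Writing $p_2$ for the two-sided p-value and $p_1$ for the one-sided p-value:

$$
p_1 =
\begin{cases}
p_2 / 2, & \text{if the sample result lies in the direction of the alternative,} \\
1 - p_2 / 2, & \text{otherwise.}
\end{cases}
$$

For example, when testing $H_1: \mu > \mu_0$, the first case applies when the t-statistic is positive.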

### Minitab Express Instructions

To find confidence intervals and carry out hypothesis tests in Minitab Express, go to Statistics --> 1-Sample Inference --> t for the one-sample t-test, Statistics --> 2-Sample Inference --> t for the two-sample t-test, or Statistics --> 2-Sample Inference --> Paired t for the paired t-test. More details on how to perform these tests are given below, although you may prefer just to start on the activity and refer back to these instructions as needed.

First, consider the one-sample t-test. You can either give Minitab a column of data (by double-clicking on one of the variables so that it goes into the box labeled "Sample"), or you can choose the "Summarized data" option and input the sample size, sample mean, and sample standard deviation. To get a hypothesis test rather than just a confidence interval, check the box "Perform hypothesis test". You must also input the "Hypothesized mean", which is the value of μ0 when your null hypothesis is μ = μ0. If you click the "Options" button, you can change the alternative from two-sided (the default) to one-sided. You can also change the "Confidence level". Note that what Minitab calls the confidence level is not the same as what we have been calling the "significance level" or "alpha level": the confidence level is 100(1 − α) percent, so a confidence level of 95 corresponds to a significance level of .05. If you want, you can click on the "Display" button to get a histogram or boxplot of the data at the same time as the test.

Minitab outputs both a confidence interval for the parameter and the results of a hypothesis test. The confidence interval is in the first part of the display under "95% CI for μ". The t-statistic for the hypothesis test is under "T-Value", and the p-value is clearly indicated. Minitab does not tell you whether or not to reject the null hypothesis; you decide this yourself by comparing the p-value to your significance level and rejecting when the p-value is smaller.

The procedure for the paired t-test is similar. You either input both columns of data, or select the "Summarized data (differences)" option and input the sample size, sample mean, and sample standard deviation of the differences between the columns. You can click on "Options" to change the confidence level or switch to a one-sided test. Minitab outputs a confidence interval for the mean of the differences, along with the t-statistic and p-value for the hypothesis test that the mean difference is zero.

For the two-sample t-test, there are three ways to input the data. If you have the summary statistics for both variables, you can select "Summarized data" and input the summary statistics. If the two samples are in different columns, select "Each sample is in its own column" and indicate which two columns contain the data. If the samples are in one column, with another variable containing the labels that designate which sample each value belongs to, then put the column containing the samples in the box labeled "Samples" and the column containing the labels in the box "Sample IDs". Again, by clicking on "Options", you can change the confidence level or switch to a one-sided test.

Note that to get a confidence interval in Minitab, you need to ask for a two-sided hypothesis test. Therefore, if you want to conduct a one-sided hypothesis test and get a confidence interval, you will first need to carry out a one-sided hypothesis test and read off the p-value, then separately ask Minitab for a two-sided hypothesis test so that you can read off the confidence interval.

### Vanderbilt Data

For this part of the lab, you will work with the data set VANDERBILT, which is available in TritonEd.

We have data on 384 economics majors who entered Vanderbilt University as freshmen between 1983 and 1986 and took an intermediate macroeconomics course. Only students who had previously taken a calculus course and two semesters of introductory economics were included in the data set. These data were used for the study [J.S. Butler, T.A. Finegan, and J.J. Siegfried (1998). Does More Calculus Improve Student Learning in Intermediate Micro- and Macroeconomic Theory? Journal of Applied Econometrics, 13, 185-202]. The data were obtained from the Journal of Applied Econometrics Data Archive. The data set contains 384 rows, one for each student, and the following four columns:

| Variable Name | Description |
| --- | --- |
| Math | The student's Math SAT score |
| Verbal | The student's Verbal SAT score |
| GPA | The student's GPA as a freshman |
| Gender | F = female, M = male |

### SAT Scores and GPA

Begin by examining the data graphically. Answer the following questions, supplementing your responses with graphs where appropriate.
1. Examine the distributions of the Math and Verbal SAT scores. Do they appear to be approximately normally distributed? Are there any outliers? What about the distribution of the GPAs?

2. How does the performance of males compare to the performance of females on the Math SAT test? What about the Verbal SAT test? How do the first-year GPAs of male and female students compare? (Hint: Side-by-side boxplots may be useful. Recall you can make them by going to Graph --> Boxplot, then selecting "With Groups" under "One Y".)

In Minitab Express, you make the boxplot by going to Graphs --> Boxplot, then selecting "With Groups" under "Single Y variable".

Now you will conduct some formal hypothesis tests concerning the SAT scores and GPAs. We have learned about three different t-tests (the one-sample t-test, the paired t-test, and the two-sample t-test), so make sure you choose the correct test in each case. Conduct all tests at significance level .05. (Note: although the data include all macroeconomics students at Vanderbilt over a period of time who met certain criteria, you can think of this as a sample from a "population" of all students who could take the class over a longer period of time.)

For each hypothesis test that you conduct, indicate clearly which of the three t-tests you are using and whether you have chosen a one-sided or two-sided alternative hypothesis. Also report the t-statistic and the p-value for each test, and state your conclusion clearly in the context of the problem.
1. Based on what you observed from examining the data graphically, does it appear that the assumptions required to conduct t-tests are met?

2. Do macroeconomics students at Vanderbilt score significantly higher on the Math SAT than the national average (which in the mid-1980s was around 470)?

3. Do macroeconomics students at Vanderbilt score significantly higher on the Verbal SAT than the national average (which in the mid-1980s was around 425)?

4. Report a 95 percent confidence interval for the true mean Math SAT score. Do 95 percent of students have Math SAT scores that fall within this interval? Explain your answer.

5. Is there a statistically significant difference between the Verbal and Math scores of macroeconomics students at Vanderbilt?

6. Is there a statistically significant difference between the performances of males and females on the Verbal SAT? Construct a 95 percent confidence interval for the difference. Does it include zero? Relate this to the conclusion of your test.

7. Is there a statistically significant difference between the freshman GPAs of males and females? Construct a 95 percent confidence interval for the difference. Does it include zero? Relate this to the conclusion of your test.

8. Is there a statistically significant difference between the performances of males and females on the Math SAT? Construct a 95 percent confidence interval for the difference. Does it include zero? Relate this to the conclusion of your test.

### Cloud seeding data

For this part of the lab, open the data set CLOUDS, which is available in TritonEd.

Cloud seeding is a process by which particles are introduced into a cloud in an attempt to induce rainfall. We have data from a cloud seeding experiment involving 52 clouds. Half of the clouds were chosen at random to be seeded with silver nitrate, while the other half were left unseeded. The data were obtained from the Data and Story Library. The original source is [J. Simpson, A. Olsen, and J. Eden (1975). A Bayesian analysis of a multiplicative treatment effect in weather modification. Technometrics, 17, 161-166.] The data set contains the following two columns:

| Variable Name | Description |
| --- | --- |
| Unseeded | The rainfall totals (in acre-feet) of the unseeded clouds |
| Seeded | The rainfall totals (in acre-feet) of the clouds that were seeded with silver nitrate |

### Cloud seeding investigations

Your goal here is to assess whether there is strong evidence that cloud seeding increases rainfall. Do this by answering the following questions.
1. First examine and compare the two distributions with histograms and a side-by-side boxplot. Discuss what you observe in a few sentences.

2. Conduct a hypothesis test to test the hypothesis that cloud seeding with silver nitrate increases rainfall. Use significance level .01. Based on what you observed in answering the previous question, are you confident that this is a valid test? If not, what assumptions are violated?

3. Try taking natural logarithms of the rainfall totals. (Recall that you can construct the new variables by going to Calc --> Calculator, then selecting the function "Natural Log" and the appropriate variable.) Are the new variables less skewed? Are the distributions closer to normal? What does a side-by-side boxplot of the variables show?

In Minitab Express, you go to Data --> Formula instead of Calc --> Calculator.

4. Conduct another formal hypothesis test using the transformed variables, again at significance level .01. Are you more confident in the results of this test or the results of the previous test?

5. Write a couple of sentences explaining your overall conclusions about the effect of cloud seeding with silver nitrate on rainfall.
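The log transformation used in question 3 can also be sketched outside Minitab. The short Python example below uses made-up rainfall values (not the CLOUDS data) and compares the gap between the mean and the median before and after taking natural logarithms, as one rough indicator of skewness. It assumes every value is positive, since the logarithm is undefined otherwise.

```python
import math
import statistics

# Hypothetical right-skewed rainfall totals (not from the CLOUDS data set).
rainfall = [1.0, 10.0, 100.0, 1000.0]

# Natural log of each value; requires every value to be positive.
log_rainfall = [math.log(x) for x in rainfall]

# A mean far above the median suggests right skew.
raw_gap = statistics.mean(rainfall) - statistics.median(rainfall)
log_gap = statistics.mean(log_rainfall) - statistics.median(log_rainfall)

print(f"mean - median (raw): {raw_gap:.2f}")
print(f"mean - median (log): {log_gap:.2f}")
```

A histogram or boxplot remains the better diagnostic; the mean-median gap is only a quick numerical summary of the same idea.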