First, open the data set STATES, which is available in TritonEd.

We have data on the 50 states, with an emphasis on economic, demographic, and health variables. The data set contains 50 rows (one for each state) and the following columns:

| Variable Name | Description |
| --- | --- |
| State | The name of the state |
| Region | The region of the country (West, South, Midwest, or Northeast) in which the state is located |
| Poverty | The percentage of individuals in the state below the poverty line in 2014 |
| Income | Median annual household income in the state in 2014 |
| InfMort | Infant deaths per 1000 births, 2011-2013 |
| LowBwt | Percentage of live births under 2500 grams in 2013 |
| Smoke | Percentage of people over age 18 who smoked in 2014 |
| LifeExp | Life expectancy in the state in 2010 |
| Diabetes | Percentage of adults with diabetes in 2014 |
| Asthma | Percentage of adults with asthma in 2014 |
| AfAm | Percentage of people in the state in 2014 who were African-American |
| AsAm | Percentage of people in the state in 2014 who were Asian-American |
| Hispanic | Percentage of people in the state in 2014 of Hispanic or Latino origin |

The variables AfAm, AsAm, and Hispanic came from the U.S. Census Bureau.

The variable Poverty came from WorldAtlas.

The other variables came from the State Health Facts web site provided by the Kaiser Foundation.

In Lab 1, you learned how to summarize data graphically. In this lab, you will learn how to use Minitab to obtain numerical as well as graphical summaries of data. First practice with the Poverty variable. Go to

In Minitab Express, you go to

If you want to look at a different set of summary statistics than the default ones, click on "Statistics" before clicking "OK". Then you can check whichever statistics you want to see. For example, you could check interquartile range if you don't want to figure it out by subtracting the lower quartile from the upper quartile, and you could uncheck things like SE of mean, which we haven't learned about yet, or the number of observations, which we know is 50. Now answer the following questions.

- Examine (and present) a histogram of the variable "AfAm". Remember that you can change the number of bars by double-clicking inside one of the bars and selecting the "Binning" tab. As always, it is a good idea to select the "Cutpoint" interval type. This keeps the lowest bar from extending below zero, since negative values for this variable do not make sense. Also make sure, as always, that your graph is appropriately labeled. Would you describe the distribution of the percentage of African-American residents in the 50 states as symmetric or skewed? Find the mean, median, lower quartile, and upper quartile. Is the mean greater than, less than, or about the same as the median? Is the lower quartile closer to the median, farther from the median, or about the same distance from the median as the upper quartile?

- Next examine (and present) a histogram of the variable "Income". Would you describe the distribution of the median incomes of the 50 states as symmetric or skewed? Answer the same questions that you answered for the "AfAm" variable.

- What do you conclude about how skewness affects the mean and median? What do you conclude about how skewness affects the distance between the median and the lower and upper quartiles?

- Examine (and present) a histogram of the "AsAm" variable. Is the distribution symmetric or skewed? Are there any outliers? Find the mean, median, standard deviation, and interquartile range.

- What state has the largest percentage of Asian-Americans, and what percentage of people in that state are Asian-Americans? You can find this by scrolling down the data file. Remove this state from consideration, and calculate the mean, median, standard deviation, and interquartile range for the other 49 states. (There are several ways to remove a state, but probably the easiest is to put the cursor over the cell in the data file that you want to remove and hit the delete key, then remember the value and put it back when you are finished with this question. Do not go to
*Edit --> Delete Cells*, as this will alter the alignment of the data.) Which of these four summary statistics are strongly affected by the outlier, and which ones are not?
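
Minitab reports all of these statistics for you, but it can help to see how they are computed. The Python sketch below is only an illustration and is not part of the lab. The numbers are made up (they are not from the STATES data set), and the helper name `summarize` is hypothetical. It computes the mean, median, quartiles, standard deviation, and interquartile range for a small list, then recomputes them with the largest value left out. Quartile conventions differ between programs, so quartiles computed this way may not match Minitab's exactly.

```python
import statistics


def summarize(values):
    """Return the summary statistics discussed in this lab for a list of numbers."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": median,
        "q1": q1,
        "q3": q3,
        "stdev": statistics.stdev(values),  # sample standard deviation
        "iqr": q3 - q1,  # upper quartile minus lower quartile
    }


# Hypothetical percentages with one unusually large value.
# These are NOT taken from the STATES data set.
values = [1, 2, 2, 3, 3, 4, 5, 6, 8, 38]

with_outlier = summarize(values)
without_outlier = summarize([v for v in values if v != max(values)])

for name in with_outlier:
    print(name, with_outlier[name], without_outlier[name])
```

Comparing the two columns of output side by side shows how much each statistic moves when a single extreme value is removed.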

Boxplots are a useful graphical tool for comparing different regions of the country. Again, start with the "Poverty" variable. Recall that you can make a boxplot of the poverty rates by going to

In Minitab Express, you make the boxplots by going to

Now answer the following questions in a few sentences each, providing some plots to support your answers.

- What differences, if any, do you see in the levels of poverty in the four regions of the country?

- What differences, if any, do you see in the percentages of African-Americans and the percentages of people of Hispanic or Latino origin across the four regions of the country?

- Now consider the health-related variables. Are there differences in the infant mortality rates in the four regions of the country? What about the life expectancies?
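
A side-by-side boxplot draws one five-number summary (minimum, lower quartile, median, upper quartile, maximum) per group. As a rough illustration of what is being compared, the Python sketch below groups values by region and computes that summary for each group. The data are hypothetical (not from the STATES data set), only two regions are shown, and the helper name `five_number_summary` is an assumption made for this example. Minitab's boxplots also mark outliers separately, which this sketch does not attempt.

```python
import statistics
from collections import defaultdict

# Hypothetical (region, poverty rate) pairs.
# These are NOT taken from the STATES data set.
rows = [
    ("West", 12.0), ("West", 15.5), ("West", 11.0), ("West", 14.0),
    ("South", 18.0), ("South", 16.5), ("South", 19.5), ("South", 17.0),
]

# Collect the poverty rates for each region.
by_region = defaultdict(list)
for region, rate in rows:
    by_region[region].append(rate)


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)


summaries = {region: five_number_summary(vals) for region, vals in by_region.items()}
for region, summary in summaries.items():
    print(region, summary)
```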

Scatterplots are the standard way of displaying relationships between two quantitative variables. To make a scatterplot, go to

In Minitab Express, you make a scatterplot by going to

Now answer the following questions in a few sentences each, providing plots when necessary to support your answers.

- Investigate the relationship between poverty and smoking. From a scatterplot, do you see a strong relationship between the poverty rate in a state and the percentage of adults who smoke? What is the correlation between the poverty rate and the percentage of adults who smoke?

- Now investigate how poverty is associated with specific health conditions. Describe the relationship between the poverty rate and the percentage of people with diabetes. Describe the relationship between the poverty rate and the percentage of people with asthma.

- What kind of relationship, if any, do you see between the percentage of people who smoke and the life expectancy in the state?

- Is the relationship that you observed in response to the previous question sufficient to prove that smoking causes lower life expectancies?

- Is it possible to use the correlation to summarize the relationship between "Region" and life expectancy? Explain your answer.
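
For reference, the correlation that Minitab reports is the Pearson correlation coefficient. The Python sketch below shows one way to compute it from two lists of numbers. It is an illustration only: the data are hypothetical (not from the STATES data set) and the function name `pearson_r` is an assumption made for this example.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of products and squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical poverty rates and smoking percentages for five states.
# These are NOT taken from the STATES data set.
poverty = [10.0, 12.0, 14.0, 16.0, 18.0]
smoke = [15.0, 17.0, 16.0, 20.0, 22.0]

r = pearson_r(poverty, smoke)
print(round(r, 3))
```

Note that every step of the calculation works with means and deviations, so both inputs must be lists of numbers.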

Your next goal will be to predict the infant mortality rate from the percentage of babies with low birthweight, using linear regression. Go to

In Minitab Express, you go to

Minitab's regression output does not give you a scatterplot automatically, but you can get a scatterplot with a regression line drawn in by going to

Minitab Express does provide a scatterplot automatically with the regression output. To get a residual plot, select "Graphs" before clicking "OK", click in the box "Residuals versus the variables" and then select your predictor variable.

- Give the equation of your regression line.

- What percentage of the variation in infant mortality rates can be explained by the percentages of low birthweight babies?

- From your scatterplot and residual plot, does it appear that linear regression is appropriate for these data? Show the scatterplot and residual plot, and write a few sentences explaining your answer.

- What would the regression predict to be the infant mortality rate in California? How does this compare to the actual infant mortality rate in California?
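
To see where the numbers in the regression output come from, the Python sketch below fits a least-squares line by hand, computes R-squared, and makes one prediction with its residual. It is an illustration under stated assumptions, not Minitab's implementation: the data are hypothetical (not from the STATES data set), and the names `fit_line`, `r_squared`, `new_low_bwt`, and `actual` are made up for this example.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line predicting y from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Fraction of the variation in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_total = sum((b - mean_y) ** 2 for b in y)
    ss_resid = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return 1 - ss_resid / ss_total


# Hypothetical low-birthweight percentages and infant mortality rates.
# These are NOT taken from the STATES data set.
low_bwt = [6.0, 7.0, 8.0, 9.0, 10.0]
inf_mort = [4.5, 5.5, 6.0, 7.0, 8.0]

intercept, slope = fit_line(low_bwt, inf_mort)
r2 = r_squared(low_bwt, inf_mort, intercept, slope)

# Residuals (actual minus predicted) are what a residual plot displays.
residuals = [b - (intercept + slope * a) for a, b in zip(low_bwt, inf_mort)]

# Prediction for a hypothetical state, and its residual.
new_low_bwt, actual = 6.8, 5.0
predicted = intercept + slope * new_low_bwt
print(intercept, slope, r2, predicted, actual - predicted)
```

Multiplying R-squared by 100 gives the percentage of variation explained, and the last line mirrors the comparison between a predicted and an actual infant mortality rate.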