Lab 1: May weather in San Diego (Displaying and describing data graphically)

In this lab, you will learn how to display data using graphs, and how to use these graphs to describe quantitative and categorical data.

The Data

First, open the data set WEATHER, which you can obtain by logging into TED, then selecting "Content" from the menu on the left and opening the folder "Data for the Labs". When you download the data file from TED, the file may go into a Downloads folder, which you can find by going to Start --> Documents --> Downloads.

We have data on the weather at the San Diego airport for every May day from 2001 through 2008. The data were obtained from this weather site. The data set contains 248 rows (one for each day) and the following four columns:
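The row count is simply 8 years (2001 through 2008) times the 31 days in May. You don't need Python for this lab, but if you want to check the arithmetic, this short sketch generates every May date in that range and counts them. The function name may_days is our own.

```python
from datetime import date, timedelta


def may_days(first_year, last_year):
    """Return every May date from first_year through last_year, inclusive."""
    days = []
    for year in range(first_year, last_year + 1):
        day = date(year, 5, 1)
        while day.month == 5:  # stop when we roll over into June
            days.append(day)
            day += timedelta(days=1)
    return days


days = may_days(2001, 2008)
print(len(days), "rows")  # 8 years x 31 days in May
```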

Variable Name    Description
Date             The date
High             The high temperature
Low              The low temperature
Cloud Cover      The sky conditions, recorded as one of: Sunny, Partly Cloudy, Mostly Cloudy, Overcast, Fog, Rain


Working with categorical data

A categorical variable is a variable that sorts cases into categories. If we had data on students enrolled in Math 11, variables such as gender, class year, and major would be categorical. In your data set, the variable called Cloud Cover is categorical, as it sorts days into six categories. Categorical data can be displayed in a bar chart or a pie chart.

Begin by making a pie chart of the Cloud Cover variable. Go to Graph --> Pie Chart. To tell MINITAB what variable to plot, click in the white box under the words "Categorical variables". The variables appear in the white box on the left. Click on "C4 Cloud Cover" and then click on "Select". Click "OK" and MINITAB will make a pie chart.

Now explore some different options for labeling your graph. You can change the title of the graph by double-clicking on it. If you go to Editor --> Add --> Slice Labels, you can check one or more of the boxes to display the frequency of each category (the number of times it appears in the data) or its percentage (the percentage of days on which it appears). [Note: if you are using MINITAB Version 14, this option does not appear under Editor. Instead, double-click inside the pie chart and click the "Slice Labels" tab.] You can also select a single slice of the pie by clicking twice within it, pausing for a second or so between the clicks. If you then double-click the selected slice, you can change its color by selecting "Custom" under "Fill Pattern" and choosing a new background color.
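The frequency and percentage labels come from a simple tally: count how many times each category occurs, then divide by the total number of cases and multiply by 100. The Python sketch below does that tally; the ten sky conditions in it are made up for illustration and are not values from WEATHER, and the function name frequency_table is our own.

```python
from collections import Counter


def frequency_table(labels):
    """Return (category, frequency, percent) tuples in order of first appearance."""
    counts = Counter(labels)
    total = len(labels)
    return [(category, n, 100 * n / total) for category, n in counts.items()]


# Made-up sky conditions for ten days; these are NOT values from WEATHER.
sky = ["Sunny", "Sunny", "Partly Cloudy", "Overcast", "Sunny",
       "Fog", "Partly Cloudy", "Sunny", "Mostly Cloudy", "Sunny"]

for category, n, pct in frequency_table(sky):
    print(f"{category:<15}{n:>4}{pct:>7.1f}%")
```

Here "Sunny" appears 5 times out of 10, so its slice would be labeled with frequency 5 and percentage 50%.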

Another option is to display these data in a bar chart. Go to Graph --> Bar Chart and click "OK" (you want a Simple bar chart). Click on "C4 Cloud Cover" and then "Select" as before, and then click "OK" to make the bar chart. You can modify the title or either axis label by double-clicking on it. To make the vertical axis show the percentage rather than the frequency of each category, double-click on one of the bars, click on the "Chart Options" tab, and check the box "Show Y as Percent". The same tab also gives you options for ordering the categories. Finally, you can display the frequency or the percentage for each category by going to Editor --> Add --> Data Labels and clicking "OK".
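A percent bar chart is the same tally as a pie chart, drawn as bars whose heights are each category's share of all cases. One ordering you might choose is from most to least frequent. This Python sketch draws such a chart in plain text; the function name percent_bar_chart and the ten days of sky conditions are illustrative assumptions, not MINITAB features or WEATHER data.

```python
from collections import Counter


def percent_bar_chart(labels, width=40):
    """Return one text bar per category, ordered from most to least frequent.

    Each bar's length is proportional to the category's share of all cases,
    so the chart reads like a bar chart with the Y axis shown as percent."""
    counts = Counter(labels)
    total = len(labels)
    lines = []
    for category, n in counts.most_common():  # decreasing frequency
        share = n / total
        lines.append(f"{category:<15}{'#' * round(width * share)} {100 * share:.0f}%")
    return lines


# Made-up sky conditions for ten days; NOT the WEATHER data.
sky = ["Sunny", "Sunny", "Partly Cloudy", "Overcast", "Sunny",
       "Fog", "Partly Cloudy", "Sunny", "Mostly Cloudy", "Sunny"]
print("\n".join(percent_bar_chart(sky)))
```

Ordering by frequency puts the most common sky condition first, which can make the chart easier to read than the default order.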
  1. Present one graph (either a pie chart or a bar chart) that you think gives a good depiction of the data. To accompany this graph, write a paragraph of two or three sentences describing the data. (Note that this question, like many questions in these labs, does not have a single correct answer. There are two different charts you could choose from, and there are different features of the data that you could choose to focus on in your description.)

  2. On how many May days from 2001 to 2008 was it partly cloudy in San Diego? On what percentage of May days from 2001 to 2008 was it partly cloudy in San Diego?
For your write-up, you will need to copy your graph from MINITAB to Microsoft Word. You can do this by going to Edit --> Copy Graph in MINITAB, and then to Edit --> Paste in Microsoft Word. You can see MINITAB and Microsoft Word side by side if you click the middle button in the upper-right corner of the MINITAB window, so that MINITAB does not fill the entire screen. (Note: if you are using LibreOffice from the virtual computing lab, you may need to go to Edit --> Paste Special, then select Minitab Graph and click OK to get good results.)

Working with quantitative data

A quantitative variable records numerical measurements about the cases. If we had data on students enrolled in Math 11, variables such as GPA or SAT scores would be quantitative variables. In the data set we are working with, the high and low temperatures are quantitative variables. Quantitative data are most commonly displayed using a histogram.

Start by making a histogram of the high temperatures. To do this, go to Graph --> Histogram and click "OK". Click on "C2 High" and then "Select", and then click on "OK" to make the histogram. As before, you can modify the axis labels and title by double-clicking on them. It is important to be able to change the number of bars in your histogram. If you double-click inside one of the bars and then click on the "Binning" tab, you get an option to change the number of intervals (bars) in the histogram. Experiment with different numbers. If you use too few bars, too many values get lumped together and you lose information; if you use too many, it will be hard to see the overall shape of the data. Choose a number that you think gives a good representation of the data. You usually get a more readable histogram if you click the button labeled "Cutpoint", so that the numbers on the axis mark the boundaries between bars rather than the centers of the bars.
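To see why the number of bars matters, it helps to look at how a histogram counts values. The Python sketch below splits the range of the data into equal-width intervals and counts how many values fall into each; the cutpoints it returns are the boundaries that the "Cutpoint" option labels. MINITAB chooses rounder cutpoints on its own, so this is only an approximation of its binning, and the temperatures and the function name bin_counts are made up for illustration.

```python
def bin_counts(values, n_bins):
    """Split [min, max] into n_bins equal-width intervals and count the values in each.

    Returns (cutpoints, counts). In this sketch each interval includes its left
    cutpoint, and the last interval also includes the maximum value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    cutpoints = [lo + i * width for i in range(n_bins + 1)]
    counts = [0] * n_bins
    for v in values:
        i = int((v - lo) / width) if width > 0 else 0
        counts[min(i, n_bins - 1)] += 1  # put the maximum in the last bin
    return cutpoints, counts


# Made-up high temperatures for ten days; NOT the WEATHER data.
temps = [64, 66, 67, 67, 68, 69, 70, 70, 71, 74]

for n_bins in (2, 5, 10):
    cutpoints, counts = bin_counts(temps, n_bins)
    print(f"{n_bins:>2} bins, cutpoints {cutpoints}: counts {counts}")
```

With 2 bins the counts are [5, 5], which hides the shape entirely; with 10 bins, empty bins start to appear and the picture gets choppy; 5 bins sits in between.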
  1. Present a histogram for the high temperatures and a histogram for the low temperatures. As always, make sure the axes of your graphs are labeled appropriately. Supplement your graphs with a few sentences describing the distributions.

  2. Would you describe the distributions as symmetric (which would mean that the left and right sides of the histogram are approximately mirror images) or skewed?

  3. Are there any values that you would describe as outliers among either the high or low temperatures?
It is often useful to examine only part of a data set at a time. Your next task will be to compare the distribution of the high temperature on sunny days to the distribution of the high temperature on partly cloudy days. To graph the distribution of the high temperature on sunny days only, go to Graph --> Histogram as before and click "OK". Select the high temperature variable. Then click on "Data Options" and select "Specify which rows to include" and "Rows that match", and click on "Condition". In the box under "Condition" type

'Cloud Cover' = "Sunny"

(including the quotation marks). After you click "OK" three times, you will get a histogram of the high temperature on sunny days. For the partly cloudy days, repeat these steps with the condition 'Cloud Cover' = "Partly Cloudy". Another way to do this is to make a new worksheet containing only the sunny days: go to Data --> Subset Worksheet, click on "Condition", type 'Cloud Cover' = "Sunny" in the box labeled "Condition", and click "OK" twice. You can then make histograms from the new worksheet as you did earlier, and create a second subset with 'Cloud Cover' = "Partly Cloudy" to work with just the partly cloudy days.
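A condition like 'Cloud Cover' = "Sunny" keeps exactly the rows whose Cloud Cover entry equals the quoted text and drops all the others. The same row filtering is sketched below in Python; the five rows and the function name subset are hypothetical, chosen only to show the idea, and are not the WEATHER data.

```python
# Made-up rows of (high, low, cloud cover); NOT the WEATHER data.
rows = [
    (68, 59, "Sunny"),
    (66, 60, "Partly Cloudy"),
    (71, 61, "Sunny"),
    (64, 58, "Overcast"),
    (67, 60, "Partly Cloudy"),
]


def subset(rows, cloud_cover):
    """Keep only the rows whose cloud cover equals cloud_cover,
    analogous to the condition 'Cloud Cover' = "Sunny"."""
    return [row for row in rows if row[2] == cloud_cover]


sunny_highs = [high for high, _, _ in subset(rows, "Sunny")]
partly_cloudy_highs = [high for high, _, _ in subset(rows, "Partly Cloudy")]
print("Sunny:", sunny_highs)
print("Partly cloudy:", partly_cloudy_highs)
```

In Python the comparison is exact, so the text must be spelled exactly as it appears in the data; a misspelled category simply matches no rows.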
  1. Present histograms of the high temperature on sunny days and the high temperature on partly cloudy days. Summarize what you find in a few sentences. Address issues such as the center of the distributions (whether temperatures tend to be higher when it is sunny or partly cloudy), the spread of the distributions (whether there is more variability in the high temperature on sunny days or partly cloudy days), and the shape of the distributions (whether they are symmetric or skewed).
Boxplots

Boxplots are also useful for displaying quantitative data. Although histograms are usually the best choice for displaying the distribution of one variable, boxplots can be useful for comparing two variables or for comparing one variable across several categories.

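To make a boxplot of just the high temperature variable, go to Graph --> Boxplot, click on "OK", then select "High" and click "OK" again. The boxplot is drawn as follows: the box extends from the first quartile (Q1) to the third quartile (Q3), with a line inside the box at the median. The whiskers extend to the smallest and largest values that lie within 1.5 times the interquartile range (IQR = Q3 - Q1) of the box, and any values beyond that are plotted individually, as asterisks, as possible outliers.

To display boxplots of the high and low temperatures side-by-side, again start by going to Graph --> Boxplot, but select the lower-left box "Simple" under "Multiple Y's" before clicking "OK". Then select both "High" and "Low" and click "OK", and MINITAB will draw the boxplots. As expected, you should see that high temperatures tend to be higher than low temperatures.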
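The numbers behind a single boxplot can be computed directly: the quartiles, the IQR, the fences 1.5 times the IQR beyond the box, the whisker ends, and any values outside the fences. The Python sketch below does this. Quartile conventions differ among software; this uses Python's statistics.quantiles with its default method, which interpolates at position (n + 1) times p in the sorted data. We believe MINITAB uses the same convention, but treat that as an assumption and expect small differences if it does not. The temperatures and the function name boxplot_summary are made up for illustration.

```python
import statistics


def boxplot_summary(values):
    """Quartiles, IQR, 1.5 * IQR fences, whisker ends, and flagged outliers."""
    # statistics.quantiles defaults to the 'exclusive' method, which
    # interpolates at positions (n + 1) * p in the sorted data.
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "fences": (lower_fence, upper_fence),
        "whiskers": (min(inside), max(inside)),  # most extreme non-outliers
        "outliers": outliers,
    }


# Made-up high temperatures for ten days; NOT the WEATHER data.
temps = [62, 64, 65, 66, 66, 67, 68, 69, 70, 82]
print(boxplot_summary(temps))
```

For these made-up values, Q1 = 64.75 and Q3 = 69.25, so the IQR is 4.5 and the upper fence is 76; the value 82 lies beyond it and would be drawn as a separate point, while the upper whisker stops at 70.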

Now compare the high and low temperatures for different sky conditions. Start with the high temperature. To do this, go to Graph --> Boxplot again, but this time click on "With Groups" (under "One Y") before clicking "OK". Select "High" as the graph variable. Then click in the window for "categorical variables for grouping" and select the "Cloud Cover" variable. Then click "OK", and you should get six boxplots side-by-side. Do the same for low temperature.
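Grouped boxplots split one variable by the categories of another and summarize each group separately. The Python sketch below does the same split and reports each group's median, the line inside each box; it is an illustration only, with made-up (cloud cover, high temperature) pairs that are not the WEATHER data, and a hypothetical function name, medians_by_group.

```python
import statistics
from collections import defaultdict


def medians_by_group(pairs):
    """pairs: (category, value). Return {category: median of that category's values}."""
    groups = defaultdict(list)
    for category, value in pairs:
        groups[category].append(value)
    return {category: statistics.median(vals) for category, vals in groups.items()}


# Made-up (cloud cover, high temperature) pairs; NOT the WEATHER data.
data = [("Sunny", 70), ("Sunny", 72), ("Sunny", 68),
        ("Overcast", 65), ("Overcast", 66),
        ("Fog", 63), ("Fog", 64), ("Fog", 66)]

# List the groups from highest to lowest median.
for category, med in sorted(medians_by_group(data).items(),
                            key=lambda kv: kv[1], reverse=True):
    print(f"{category:<10}{med}")
```

Comparing medians across groups is one way to read side-by-side boxplots; the boxes and whiskers add information about spread and outliers within each group.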
  1. Present the boxplots comparing high temperatures under different sky conditions. Also present the boxplots comparing low temperatures under different sky conditions. Write a few sentences summarizing what you observe.

  2. Under what sky conditions are high temperatures typically the highest? Under what sky conditions do the high temperatures tend to be lowest?

  3. Under what sky conditions are low temperatures typically the highest? Under what sky conditions do the low temperatures tend to be lowest?