## Lab 1: Fish in Finland (Displaying and describing data graphically)

In this lab, you will learn how to display data using graphs and how to use these graphs to describe quantitative and categorical data. This lab will also provide an introduction to the use of Minitab.

The Data

First, open the data set FISH, which you can obtain by logging into TritonEd, then selecting "Content" from the menu on the left and opening the folder "Data for the Labs". When you download the data file from TritonEd, the file may go into a Downloads folder, which you can find by going to Start --> Documents --> Downloads.

We have data on 159 fish which were caught from Lake Laengelmavesi in Finland. The data were obtained from the Journal of Statistics Education Data Archive. The data set contains 159 rows (one for each fish) and the following three columns:

| Variable Name | Description |
| --- | --- |
| Species | The species of fish |
| Weight | The weight of the fish in grams (note: two measurements are missing) |
| Length | The length of the fish in centimeters, from the nose to the beginning of the tail |

Working with categorical data

A categorical variable is a variable that sorts cases into categories. If we had data on students enrolled in Math 11, variables such as gender, class year, and major would be categorical. In your data set, the variable called "Species" is categorical, as it sorts fish into several species. Categorical data can be displayed in a bar chart or a pie chart.

Begin by making a pie chart of the "Species" variable. Go to Graph --> Pie Chart. To tell Minitab what variable to plot, click in the white box under the words "Categorical variables". The variables appear in the white box on the left. Double click on "C1 Species". Click "OK", and then Minitab will make a pie chart.

If you are using Minitab Express, you make a pie chart by going to Graphs --> Pie Chart and clicking on "Counts of unique values". Then double click on the "Species" variable and click "OK".

Now explore some different options for labeling your graph. You can change the title of the graph by double-clicking on the title. If you go to Editor --> Add --> Slice Labels, you can check any or all of the three boxes if you want to display the frequency (the number of times each category appears in the data) or the percentage of times each category appears in the data.

In Minitab Express, you can still change the title by double clicking on it. Next try clicking inside the graph, then clicking in the plus sign inside a circle that appears to the right of the graph near the top. You can then click on the triangle beside "Slice Labels" to display the frequency or the percentage of times that each category appears in the data.
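
The frequencies and percentages that the slice labels display can also be computed by hand. The following is a minimal Python sketch of that calculation, not part of the Minitab workflow. The species list is made up for illustration and is not the FISH data, and the function name `frequency_table` is hypothetical.

```python
from collections import Counter

# Hypothetical category labels for illustration only (not the FISH data).
species = ["Perch", "Pike", "Perch", "Roach", "Perch", "Pike", "Roach", "Perch"]

def frequency_table(values):
    """Return {category: (frequency, percentage)} for a list of labels."""
    counts = Counter(values)   # frequency: number of times each category appears
    total = len(values)
    return {cat: (n, 100 * n / total) for cat, n in counts.items()}

table = frequency_table(species)
for category, (count, percent) in sorted(table.items()):
    print(f"{category}: {count} ({percent:.1f}%)")
```

The percentages always sum to 100, which is a quick check that no category was missed.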

Another option is to display these data in a bar chart. Go to Graph --> Bar Chart and click "OK" (you want to make a Simple bar chart). Double click on "C1 Species", and then click "OK" to make a bar chart. You can modify the title or either axis label by double-clicking on it. To make the vertical axis display the percentage rather than the frequency of each category, double click on one of the bars, click on the "Chart Options" tab, and check the box "Show Y as Percent". Minitab also gives you some additional options for ordering the categories. Finally, you can display the frequency or the percentage for each category by going to Editor --> Add --> Data Labels and then clicking "OK".

In Minitab Express, you make a bar chart by going to Graphs --> Bar Chart. Then click on "Simple". Double click on the "Species" variable and then click "OK". If you click inside the graph, then click the plus sign to the right of the graph, you will see some additional options. Selecting "Data Labels" will display the frequency of each category on the graph. If you prefer to display the percentages instead, click on the triangle beside "Scale Type" and change to "Percent".

1. Present one graph (either a pie chart or a bar chart) that you think gives a good depiction of the data. To accompany this graph, write a paragraph of two or three sentences describing the data. Mention how many species there are and which is the most common. (Note that this question, like many questions in these labs, does not have a single correct answer. There are two different charts you could choose from, and there are different features of the data that you could choose to focus on in your description.)

2. How many Common Bream were caught from the lake? What percentage of the fish caught from the lake were Common Bream?

For your write-up, you will need to copy your graph from Minitab to Microsoft Word. You can do this by going to Edit --> Copy Graph. Then once you are in Microsoft Word, go to Edit --> Paste. Another option is to right-click inside the graph and select Send Graph to Microsoft Word. You can have both Minitab and Microsoft Word open at once if you click the middle box in the upper-right corner of the Minitab display, so that Minitab does not fill the entire screen.

In Minitab Express, you copy the graph by clicking inside the graph and going to Edit --> Copy.

Working with quantitative data

A quantitative variable records numerical measurements about the cases. If we had data on students enrolled in Math 11, variables such as GPA or SAT scores would be quantitative variables. In the data set we are working with, the weights and lengths of the fish are quantitative variables. Quantitative data are most commonly displayed using a histogram.

Start by making a histogram of the weights of the fish. To do this, go to Graph --> Histogram and click "OK". Double click on "C2 Weight", then click on "OK" to make the histogram. You are able to modify the axis labels and title by double-clicking on them, as before. It is important to be able to change the number of bars in your histogram. If you double-click inside one of the bars or along the horizontal axis and then click on the "Binning" tab, you get an option to change the number of intervals (bars) in the histogram. Experiment with different numbers. If you use too few bars, you will lose too much information because data get lumped together. If you use too many, it will be hard to see the overall shape of the data. Choose a number that you think gives a good representation of the data. You almost always get a better histogram if you click the bubble that says "Cutpoint" so that the numbers displayed on the axis will be the numbers between bars rather than the centers of the bars. You should use "Cutpoint" intervals not only for this histogram but for all histograms that you make in Math 11.
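
To see why the number of bars matters, it can help to count by hand how many values fall between each pair of cutpoints. The following Python sketch does this for a made-up list of weights (not the FISH data). The function name `bin_counts` is hypothetical, and the rule for values that land exactly on a cutpoint is an assumption that may differ from Minitab's.

```python
def bin_counts(values, cutpoints):
    """Count the values falling in each interval between consecutive cutpoints.

    Assumed convention: each interval includes its left cutpoint and excludes
    its right one, except the last interval, which includes both.
    """
    counts = [0] * (len(cutpoints) - 1)
    last = len(counts) - 1
    for v in values:
        for i in range(len(counts)):
            lo, hi = cutpoints[i], cutpoints[i + 1]
            if lo <= v < hi or (i == last and v == hi):
                counts[i] += 1
                break
    return counts

# Hypothetical weights in grams, for illustration only.
weights = [40, 85, 120, 150, 200, 260, 300, 430, 700, 1000]

coarse = bin_counts(weights, [0, 500, 1000])           # two wide bars
fine = bin_counts(weights, [0, 250, 500, 750, 1000])   # four narrower bars
print(coarse)  # [8, 2]
print(fine)    # [5, 3, 1, 1]
```

With two bars, eight of the ten values are lumped together. With four bars, the concentration of values at the low end and the long right tail are easier to see.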

In Minitab Express, you make a histogram by going to Graphs --> Histogram, then clicking "Simple". Double click on the "Weight" variable, then click "OK" to make the histogram. You are able to modify the axis labels and title by double-clicking on them. To modify the number of bins, click inside the graph, then on the plus sign to the right of the graph. You can then click on the triangle beside "Binning" to see the options to change the number of bars or switch to the "Cutpoint" method. Below the histogram (and the boxplots that you will make later), Minitab Express provides some summary statistics that you do not get in Minitab. You may ignore them for now.

1. Present a histogram for the weights of the fish and a histogram for the lengths of the fish. As always, make sure the axes of your graphs are labeled appropriately. Supplement your graphs with a few sentences describing the distributions.

2. Would you describe the distributions as symmetric (which would mean that the left and right sides of the histogram are approximately mirror images) or skewed?

3. Are there any values that you would describe as outliers among either the weights or the lengths?

It is often useful to examine only part of a data set at a time. Your next task will be to compare the distribution of the lengths of the Common Bream to the lengths of the Perch. To graph the distribution of the Common Bream only, go to Graph --> Histogram as before and click "OK". Select the length variable. Then click on "Data Options", select "Rows that match", and click on "Condition". In the box under "Condition" type

Species = "Common Bream"

(including the quotation marks). After you click "OK" three times, you will get a histogram of the lengths of the Common Bream.
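
The "Rows that match" condition keeps only the rows whose Species value equals the quoted text. As a rough illustration of that idea outside Minitab, here is a Python sketch. The rows and their values are invented for the example, and the helper name `rows_matching` is hypothetical.

```python
# Hypothetical rows for illustration only; the real worksheet has 159 rows.
fish = [
    {"Species": "Common Bream", "Weight": 300.0, "Length": 30.0},
    {"Species": "Perch", "Weight": 100.0, "Length": 20.0},
    {"Species": "Common Bream", "Weight": 400.0, "Length": 32.0},
    {"Species": "Perch", "Weight": 150.0, "Length": 22.0},
]

def rows_matching(rows, column, value):
    """Keep only the rows whose entry in `column` equals `value` exactly."""
    return [row for row in rows if row[column] == value]

bream = rows_matching(fish, "Species", "Common Bream")
bream_lengths = [row["Length"] for row in bream]
print(bream_lengths)  # [30.0, 32.0]
```

The comparison is exact, so the text must match the species name as it is spelled in the data.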

In Minitab Express, you have to do this by hand. Go to File --> New to create a new Minitab worksheet. Then highlight the part of the data you want to work with (in this case, all of the Common Bream) and go to Edit --> Copy. Then click on the first cell in the first column of the new worksheet and go to Edit --> Paste. You will have to separately copy over or type in the column headings.

1. Present histograms of the lengths of the Common Bream and the lengths of the Perch. Summarize what you find in a few sentences. Discuss the center of the distributions (whether the Common Bream or the Perch tend to be longer), the spread of the distributions (whether there is more variability in the lengths of the Common Bream or the Perch), and the shape of the distributions.

Boxplots

Boxplots are also useful for displaying quantitative data. Although histograms are usually the best choice for displaying the distribution of one variable, boxplots can be useful for comparing two variables or for comparing one variable across several categories.

To make a boxplot just of the "Length" variable, go to Graph --> Boxplot, click on "OK", then select "C3 Length" and click "OK" again. The boxplot is drawn as follows:
• The line in the middle of the box is the median.
• The top of the box is the upper (third) quartile and the bottom of the box is the lower (first) quartile.
• The top of the vertical line (whisker) is the largest value within 1.5 interquartile ranges (IQRs) of the upper quartile, while the bottom of the whisker is the smallest value within 1.5 IQRs of the lower quartile.
• Any data value more than 1.5 IQRs above the upper quartile or more than 1.5 IQRs below the lower quartile is plotted separately as an asterisk.

In Minitab Express, you make the boxplot by going to Graphs --> Boxplot. Then select "Simple" under "Single Y variable". Select the "Length" variable and click "OK".
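
The boxplot construction described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than a description of Minitab's internals: the lengths are made up, the function name `boxplot_summary` is hypothetical, and software packages use slightly different quartile conventions, so the quartiles here may not match Minitab's exactly.

```python
import statistics

def boxplot_summary(values):
    """Compute the pieces of a boxplot using the 1.5 IQR rule."""
    data = sorted(values)
    # statistics.quantiles with n=4 returns the three quartiles.
    q1, median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme values that are still within the fences.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,  # these would be drawn as asterisks
    }

# Hypothetical lengths in centimeters, for illustration only.
summary = boxplot_summary([20, 22, 24, 25, 26, 28, 30, 60])
print(summary)
```

In this example the quartiles are 22.5 and 29.5, so the IQR is 7 and any value above 29.5 + 10.5 = 40 is flagged. The value 60 is plotted as an outlier, and the upper whisker stops at 30.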

Now compare the weights and lengths of different species of fish. Start with the weights. To do this, go to Graph --> Boxplot again, but this time click on "With Groups" (under "One Y") before clicking "OK". Select "Weight" as the graph variable. Then click in the window for "Categorical variables for grouping" and select "Species". Then click "OK", and you should get seven boxplots side-by-side. Do the same for the lengths.

In Minitab Express, you do this by going to Graphs --> Boxplot, then selecting "With Groups" under "Single Y variable". Double click on "Weight" to select it as the Y variable, then double click on "Species" to select it as a "Group variable".
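
Side-by-side boxplots amount to computing the same summary separately for each group. A minimal Python sketch of that grouping step is shown below. The group names and weights are invented, the function name `summarize_by_group` is hypothetical, and the quartile convention is Python's default, which may differ from the one your software uses.

```python
import statistics
from collections import defaultdict

def summarize_by_group(pairs):
    """Group (label, value) pairs by label and report each group's median and IQR."""
    groups = defaultdict(list)
    for label, value in pairs:
        groups[label].append(value)
    summary = {}
    for label, values in groups.items():
        # Each group needs at least two values for quartiles to be computed.
        q1, median, q3 = statistics.quantiles(values, n=4)
        summary[label] = {"median": median, "iqr": q3 - q1}
    return summary

# Hypothetical (species, weight in grams) pairs, for illustration only.
data = [
    ("Species A", 100), ("Species A", 200), ("Species A", 300),
    ("Species B", 10), ("Species B", 20), ("Species B", 30), ("Species B", 40),
]

by_species = summarize_by_group(data)
for label, stats in by_species.items():
    print(label, stats)
```

Comparing the medians tells you which group tends to be heavier, and comparing the IQRs tells you which group's box is taller.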

Present the boxplots comparing the weights and lengths for different species of fish. Then use these boxplots to answer the following questions. In each case, explain briefly how you are able to obtain your answer from the boxplots.

1. Which of the species of fish tends to be the lightest? Which tends to be the shortest?

2. Which of the species has the highest median weight? Which of the species has the highest median length?

3. For which of the species of fish do the weights have the highest interquartile range?