First, open the data set BABIES, which is available in TritonEd.

We have data on 300 babies born at Salinas Valley Memorial Healthcare System in Salinas, California during the months of January through June, 2009. The data were obtained from the WebNursery web site. Information at the Baby Name Facts web site was used to help distinguish male and female names. A few babies whose gender could not be determined from the name, or for whom other information was unavailable, were excluded from the data set. Twins were also excluded from the data set. The data set includes the following columns. The first eight columns have 300 rows and provide information about each of the 300 babies. The last two columns have 181 rows and give the number of babies born on each of the 181 days between 1/1/2009 and 6/30/2009.

| Variable Name | Description |
| --- | --- |
| Date | The date the baby was born |
| Time | The time of the day that the baby was born, measured in hours after midnight |
| Weight | The baby's weight, in ounces |
| Gender | B = boy, G = girl |
| NumtoB | For boys, the number of births (including the current one) since the previous boy |
| NumtoG | For girls, the number of births (including the current one) since the previous girl |
| NumGirls | The number of the previous five babies born that were girls |
| Interval | The length of time, in hours, since the previous birth |
| Day | A day between 1/1/2009 and 6/30/2009 |
| Number | The number of births in the data set on that day |

Suppose we have independent Bernoulli trials, each resulting in success with probability

We will first do a quick check to see if boys and girls appear to be equally likely. Go to

In Minitab Express, the command is

- How many boys and how many girls are there in the data set?

We will investigate whether the variables "NumtoB" and "NumtoG" really do follow approximately a geometric distribution with *p* = 1/2.

- How many times did we have to wait for just one baby to get a boy? How many times did we have to wait for two babies to get a boy? Three? Four? Five? Six? Seven? Answer the same questions for girls. It is probably best to display your answers in a table, similar to what Minitab displays.

In Minitab Express, to figure out what we should expect, you should also type the numbers 1, 2, ..., 7 into one column. Go to to

- How do the numbers you observed in question 2 compare to what you would expect if these numbers followed a geometric distribution with *p* = 1/2? To answer this question, make a table similar to the one you made for question 2 but with the expected numbers rather than the actual numbers. Do your data roughly agree with what you expected? (If you wish to look at a graph rather than just comparing the numbers in the table, try typing the 7 observed numbers in one column and the 7 expected numbers in another column. Then go to *Graph --> Bar Chart*, choose the option that bars represent "Values from a table", then select "Cluster" under "Two-way table" and click OK. Choose the columns in which you placed the observed and expected numbers as "Graph variables" and put the column in which you have the numbers 1 through 7 in the "Row labels" box. Then click OK. You will get side-by-side bar charts of the observed and expected numbers.)

In Minitab Express, you would go to *Graphs --> Bar Chart*, then select that the values represent "Summarized values for each category in a table", then select "Clustered" under "Two-way table". Choose the columns in which you placed the observed and expected numbers as "Summary variables" and put the column in which you have the numbers 1 through 7 in the "Column of row labels" box. Then click OK.
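If you want to check the expected numbers outside Minitab, the same calculation can be sketched in Python. This is only an illustrative sketch and not part of the Minitab workflow: the function name `geometric_expected_counts` is hypothetical, and `n_waits = 150` is a placeholder that you would replace with the number of NumtoB (or NumtoG) values in your data. Under the geometric model, the probability of waiting exactly *k* births is (1 - *p*)^(*k* - 1) *p*, and the expected count is that probability times the number of waits.

```python
def geometric_expected_counts(n_waits, p=0.5, max_k=7):
    """Expected number of waits equal to k (k = 1..max_k) under a geometric(p) model."""
    return {k: n_waits * (1 - p) ** (k - 1) * p for k in range(1, max_k + 1)}


# Placeholder: 150 is NOT taken from the data set; use your own count of waits.
geo_expected = geometric_expected_counts(n_waits=150)
for k, count in geo_expected.items():
    print(f"wait of {k}: expected {count:.2f}")
```

The expected numbers are usually not whole numbers, so compare them to the observed counts for rough agreement rather than an exact match.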

- You should have noticed that the NumtoB variable once took the value 10. Do you think that this unusual value means there is something wrong with the geometric model, or do you think this was just a chance event?

Again, suppose we have independent Bernoulli trials, each resulting in success with probability

We have split the babies into groups of five and counted the number of girls in each group. The relevant numbers appear in the column "NumGirls". Make sure that you understand how these numbers were computed. For example, there is a 2 in row 5 because there were two girls among the first five babies. There is a 2 in row 10 because there were also two girls among the next five babies, and so on.

- Record the number of groups that have zero, one, two, three, four, and five girls.

- Compare these numbers to what you would expect if these numbers followed the binomial model. (Hint: to get Minitab to help you compute the expected numbers, type the numbers 0, 1, 2, 3, 4, 5 in one column, and use the *Calc --> Probability Distributions --> Binomial* command, which you are familiar with from Lab 4.) Do the data match the binomial distribution well? Again, you can either compare the numbers in your tables or make side-by-side bar charts as described in question 3 above.

In Minitab Express, the relevant command is *Statistics --> Probability Distributions --> Probability Density Function*, and you will select the Binomial distribution.
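The binomial calculation can also be sketched in Python as a cross-check. This is a minimal sketch under stated assumptions: the function name `binomial_expected_counts` is hypothetical, the 60 groups come from splitting the 300 babies into groups of five, and *p* = 1/2 assumes boys and girls are equally likely.

```python
from math import comb


def binomial_expected_counts(n_groups, n=5, p=0.5):
    """Expected number of groups with k girls (k = 0..n) under a binomial(n, p) model."""
    return {
        k: n_groups * comb(n, k) * p ** k * (1 - p) ** (n - k)
        for k in range(n + 1)
    }


# 300 babies in groups of five gives 60 groups; p = 1/2 is an assumption.
binom_expected = binomial_expected_counts(n_groups=60)
for k, count in binom_expected.items():
    print(f"{k} girls: expected {count:.3f} groups")
```

Multiplying each binomial probability by the number of groups is the same step you carry out by hand after Minitab returns the probabilities.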

Suppose an event is happening at some constant rate over a period of time. Then the number of times that the event occurs during a particular time interval has the Poisson distribution. Therefore, the number of customers who arrive in a store during a 5-minute interval should have a Poisson distribution. The number of goals scored during a soccer game should also have approximately a Poisson distribution. If babies are born at approximately a constant rate over time, then the number of babies born on a given day should have a Poisson distribution. Here you will investigate whether the Poisson distribution indeed fits the data well.

- What is the average number of births per day?

- On how many days were there no births? One? Two? Three? Four? Five? Six?

- Compare these numbers to what you would expect if these numbers followed a Poisson distribution with the same mean as what you found in question 7. Do the data approximately agree with what would be expected from the Poisson distribution? (To get Minitab to help with the Poisson distribution computation, type the numbers 0, 1, ..., 6 in one column. Then go to *Calc --> Probability Distributions --> Poisson*, click the bubble that says "Probability", type in the mean you found in question 7, and then proceed as you did for your calculations with the geometric and binomial distributions.)

In Minitab Express, the relevant command is *Statistics --> Probability Distributions --> Probability Density Function*, and you will select the Poisson distribution.
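As a cross-check on the Minitab output, the Poisson expected counts can be sketched in Python. The function name `poisson_expected_counts` is hypothetical. The mean is passed in as a parameter; the value used here is simply the 300 births divided by the 181 days given in the data description, and you should confirm it against the mean you compute from the "Number" column.

```python
from math import exp, factorial


def poisson_expected_counts(n_days, mean, max_k=6):
    """Expected number of days with k births (k = 0..max_k) under a Poisson(mean) model."""
    return {
        k: n_days * exp(-mean) * mean ** k / factorial(k)
        for k in range(max_k + 1)
    }


# 300 births over 181 days, as stated in the data description.
mean_per_day = 300 / 181
pois_expected = poisson_expected_counts(n_days=181, mean=mean_per_day)
for k, count in pois_expected.items():
    print(f"{k} births: expected {count:.2f} days")
```

The expected counts for 0 through 6 births add up to slightly less than 181, because the Poisson model gives a small probability to days with seven or more births.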

Suppose an event is happening at some constant rate over a period of time. Then the amount of time that we have to wait before the next occurrence of the event has the exponential distribution. For example, the amount of time before the next customer arrives in a store should have the exponential distribution, as should the amount of time before the next goal in a soccer game. If babies are born at approximately a constant rate over time, then the length of time between births (that is, the amount of time we have to wait for the next birth) should have approximately an exponential distribution. Here you will investigate whether this is the case. Because time is a continuous variable, you cannot proceed by comparing observed and expected counts of individual values as before. Instead, you will base your analysis on a histogram of the data. Below are some points to keep in mind:

- Negative times between births are, of course, impossible. However, by default, Minitab makes the first bar of the histogram extend below zero. To fix this, double click on the horizontal axis, click the "Binning" tab, click in the "Cutpoint" bubble, and click "OK". You will have to do this with several histograms in this lab, and it is very important that you do this in order for your graphs to display the data accurately. Remember you can also change the number of bins.

In Minitab Express, you will need to click inside the histogram, then click on the plus sign to the right of the graph and select "Cutpoint" under "Binning".

- A useful technique is to superimpose an exponential curve on top of the histogram. You can do this by going to *Graph --> Histogram* and then selecting "With Fit" and clicking "OK". You will have to select a variable as usual. Then click on the "Data View" tab, then click on "Distribution", check the "Fit distribution" box, and select "Exponential" in the "Distribution" window. Then click "OK" twice to make the graph.

In Minitab Express, you click inside the histogram, then click the plus sign to the right of the graph. Then check "Distribution fit", click the associated triangle, and select "Exponential".
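To get a feel for what the fitted exponential curve implies for the bar heights, the following Python sketch computes the proportion of waiting times that an exponential model places between consecutive histogram cutpoints, using P(a ≤ T < b) = e^(-a/μ) - e^(-b/μ) for mean μ. Everything specific here is an assumption for illustration: the function name `exponential_bin_probabilities` is hypothetical, and the mean of 15 hours, the cutpoints, and the count of 300 intervals are placeholders rather than values from the data.

```python
from math import exp


def exponential_bin_probabilities(mean, cutpoints):
    """P(a <= T < b) for each pair of consecutive cutpoints, T exponential with the given mean."""
    return [
        exp(-a / mean) - exp(-b / mean)
        for a, b in zip(cutpoints, cutpoints[1:])
    ]


# Placeholders only: replace with your own mean, cutpoints, and number of intervals.
exp_cutpoints = [0, 10, 20, 30, 40, 50]
exp_probs = exponential_bin_probabilities(mean=15.0, cutpoints=exp_cutpoints)
for (a, b), prob in zip(zip(exp_cutpoints, exp_cutpoints[1:]), exp_probs):
    print(f"{a}-{b} hours: expected about {300 * prob:.1f} intervals")
```

With equal-width bins, the expected bar heights shrink by the same factor from each bar to the next, which is the shape to look for in your histogram.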

- What is the mean time between births?

- Overall, does it appear that the waiting times between births follow approximately an exponential distribution? Explain your answer, and provide a plot to support your answer.

A random variable has a uniform distribution if it has a density that is constant over some interval. If a bus is equally likely to arrive any time during the next three minutes, then the time (in minutes) that you will have to wait for the bus has a uniform distribution on [0, 3]. If babies were equally likely to be born at any time of the day, the distribution of the birth times would be approximately uniform.

- Make a histogram of the birth times. Does the distribution of birth times appear to be uniform, or do you see a different pattern? Explain your answer. It would be a good idea to simulate 300 values from a uniform distribution a few times to see how much fluctuation would be expected just by chance. You can do this by going to *Calc --> Random Data --> Uniform*.

In Minitab Express, you go to *Data --> Generate Random Data* and select "Uniform" as the distribution.
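The same simulation can be sketched in Python if you want to repeat it many times quickly. This is an illustrative sketch, not the lab's required method: the function name `simulate_uniform_bin_counts` and the choice of eight 3-hour bins are assumptions. It draws 300 birth times uniformly on [0, 24] and counts how many fall in each bin.

```python
import random


def simulate_uniform_bin_counts(n=300, n_bins=8, seed=None):
    """Draw n times uniformly on [0, 24] hours and count them in n_bins equal-width bins."""
    rng = random.Random(seed)
    counts = [0] * n_bins
    for _ in range(n):
        t = rng.uniform(0, 24)
        # Guard against t landing exactly on the upper endpoint 24.
        index = min(int(t / 24 * n_bins), n_bins - 1)
        counts[index] += 1
    return counts


# Three independent simulations; each bin has an expected count of 300 / 8 = 37.5.
for run in range(3):
    print(simulate_uniform_bin_counts())
```

Comparing several simulated sets of counts with the histogram of the actual birth times gives a sense of whether the differences between bars are larger than chance alone would produce.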

The normal distribution has a density in the shape of the famous "bell curve". As you will learn later, a famous result called the Central Limit Theorem guarantees that sums and averages of many independent random variables will have approximately a normal distribution. For this reason, normal distributions are ubiquitous in nature. Although we can only be confident of observing a normal distribution when we are taking sums or averages, measurements of complex traits often have approximately a normal distribution. For example, heights and IQ scores of adults are approximately normally distributed, probably because height and IQ are in some sense "averages" of many contributing factors. Here you will investigate whether the birth weights of the babies also follow a normal distribution.

- Graph the distribution of birth weights. Does the distribution of the birth weights look to be approximately normal? (You could answer this just from the histogram, or you could try superimposing a normal curve on top of the histogram, as you did with the exponential distribution.)

In this lab, you have investigated six of the most important distributions in probability theory. You should now have a good idea of when to expect these distributions to appear. For the random variables below, indicate whether you would expect the distribution to be best described as geometric, binomial, Poisson, exponential, uniform, or normal. We do not have data, so you will not need to use the computer for these questions. For each item, give a brief explanation of your answer. A one-sentence explanation should be sufficient.

- The number of days that we have to wait before the first Daily 4 number drawn in the California State Lottery is a 6. (Each day, this number is equally likely to be any of the 10 digits.)

- The amount of time before the next plane crash in the United States.

- The number of typographical errors on a page in the rough draft of a report.

- The number of times that a rifle shooter hits a target if he shoots 10 times.

- The number of phone calls that a salesperson gets in the next hour.

- The number of minutes that the salesperson is waiting before her next phone call.

- The time of day that the next major earthquake occurs in Southern California.