Lab 5: Birth times and birth weights (Probability distributions)

In this lab, you will investigate the genders, birth times, and birth weights of babies. One goal of this lab is to help you gain experience in determining when you should expect to see different probability distributions arising in practice. In the course, we have introduced six different probability distributions: geometric, binomial, Poisson, uniform, exponential, and normal. Click here for a one-page handout reviewing these six distributions.

The Data

To load the data set BABIES, click here. We have data on 297 babies born at the Palomar Medical Center in Escondido, California during the months of July through December, 2007. The data were obtained from the WebNursery web site. Information at the Baby Name Facts web site was used to help with distinguishing male and female names. A few babies for which the gender could not be determined from the name were removed from the data set. The data set includes the following columns:

Variable Name   Description
Date            The date the baby was born
Time            The time of day that the baby was born
Gender          B = boy, G = girl
Weight          The baby's weight, in ounces
NumtoB          For boys, the number of births (including the current one) since the previous boy
NumtoG          For girls, the number of births (including the current one) since the previous girl
Interval        The length of time, in hours, since the previous birth
Day             A day between 7/1/2007 and 12/31/2007
Number          The number of births in the data set on that day

Waiting for boys or girls

We will first do a quick check to see if boys and girls appear to be equally likely. Go to Stat --> Tables --> Tally Individual Variables, select the variable "Gender", and click "OK". MINITAB outputs how many times each value (in this case B or G) appears in the column. This command will be useful throughout the lab.
  1. How many boys and how many girls are there in the data set?
Now consider the variables "NumtoB" and "NumtoG". To understand how these variables were computed from the "Gender" variable, consider, for example, the ninth baby, which was a girl. The fourth baby was the previous girl, but then we had to wait for five more babies (numbers 5, 6, 7, 8, and 9) to get the next girl. Assuming that each baby is independently a boy or girl with probability 1/2 each, these numbers should have a geometric distribution with p = 1/2. Make sure you understand why before proceeding.
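
The counting rule described above can be sketched in a few lines of Python. This is only an illustration, not how the data set was actually built: the function name `waits_since_previous` and the nine-birth sequence are hypothetical, and counting the first boy or girl from the start of the sequence is an assumption.

```python
def waits_since_previous(genders, target):
    """For each baby of gender `target`, count the births since the previous
    baby of that gender, including the current birth."""
    waits = []
    count = 0
    for g in genders:
        count += 1
        if g == target:
            waits.append(count)
            count = 0  # start counting again after each match
    return waits

# Hypothetical sequence of nine births (not the real data):
# babies 4 and 9 are girls, with four boys in between.
example = list("GBBGBBBBG")
num_to_g = waits_since_previous(example, "G")
num_to_b = waits_since_previous(example, "B")
print(num_to_g)  # [1, 3, 5]
print(num_to_b)  # [2, 1, 2, 1, 1, 1]
```

The last entry of `num_to_g` is 5 because, after the girl at position 4, babies 5, 6, 7, 8, and 9 had to be born before the next girl arrived.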

We will investigate whether the variables "NumtoB" and "NumtoG" really do follow approximately a geometric distribution with p = 1/2. To do this, we will compare what we actually observe with what we would expect to observe if the geometric model were correct. To tally what we actually observed, go again to Stat --> Tables --> Tally Individual Variables and this time select the variables "NumtoB" and "NumtoG".
  2. How many times did we have to wait for just one baby to get a boy? How many times did we have to wait for two babies to get a boy? Three? Four? Five? Six? Seven? Eight? Answer the same questions for girls. It is probably best to display your answers in a table, similar to what MINITAB displays.
To figure out what we should expect, type the numbers 1, 2, ..., 8 into one column. If you are using MINITAB Version 15, go to Calc --> Probability Distributions --> Geometric, click the bubble at the top that says "Probability", type the value for p in the box labeled "Event probability", enter the column in which you typed the numbers 1, 2, ..., 8 in the box "Input Column", choose another column in the box "Optional storage" to record the output, and click "OK". Notice that, for example, the first three values you get are 1/2, 1/4, and 1/8 because if X has a geometric distribution with p = 1/2, then P(X = 1) = 1/2, P(X = 2) = 1/4, and P(X = 3) = 1/8. (Note: if you are using MINITAB Version 14 rather than MINITAB Version 15, then you will not be able to compute probabilities from the geometric distribution this way. However, these calculations are not too hard to do by hand.) Then to figure out how many of each value we expect, we need to multiply these numbers by the total number of boys or girls in our data set, which you can do by hand or by going to Calc --> Calculator. For example, if there were 100 boys, then we would expect to wait for one baby to get a boy 50 times, for two babies 25 times, for three babies 12.5 times, and so on.
  3. How do the numbers you observed in question 2 compare to what you would expect if these numbers followed a geometric distribution with p = 1/2? To answer this question, make a table similar to the one you made for question 2 but with the expected numbers rather than the actual numbers. Do your data roughly agree with what you expected?
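
If you would like to check the expected counts outside MINITAB, the calculation can be sketched in Python as follows. The total of 100 boys is the hypothetical figure from the example above, not the count in the data set, and the function name `geometric_pmf` is made up for this illustration.

```python
def geometric_pmf(k, p):
    """P(X = k) when X is the number of trials up to and including the first success."""
    return (1 - p) ** (k - 1) * p

p = 0.5
n_boys = 100  # hypothetical total; replace with the count you found in the data

# Expected number of times we wait for k babies, for k = 1, 2, ..., 8.
expected = {k: n_boys * geometric_pmf(k, p) for k in range(1, 9)}
for k, count in expected.items():
    print(k, count)
```

With 100 boys, the first three expected counts are 50, 25, and 12.5, which matches the hand calculation.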

The number of births in a day

If birth rates are constant over time, the number of babies born on a given day should have a Poisson distribution. Here you will investigate whether the Poisson model indeed fits the data well.
  4. What is the average number of births per day?

  5. On how many days were there no births? One? Two? Three? Four? Five? Six?

  6. Compare these numbers to what you would expect if these numbers followed a Poisson distribution with the same mean as what you found in question 4. Do the data approximately agree with what would be expected from the Poisson distribution? (To get MINITAB to help with the Poisson distribution computation, type the numbers 0, 1, ..., 6 in one column. Then go to Calc --> Probability Distributions --> Poisson, click the bubble that says "Probability", type in the mean you found in question 4, and then proceed as you did for the geometric distribution in the previous section.)
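
The same expected-count calculation for the Poisson model can be sketched in Python. The mean of 2.0 below is a placeholder, not the answer to question 4, and the figure of 184 days assumes the data set has one row for every day from 7/1/2007 through 12/31/2007. The name `poisson_pmf` is made up for this illustration.

```python
import math

def poisson_pmf(k, mean):
    """P(X = k) for a Poisson distribution with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

mean_births = 2.0  # placeholder; replace with the mean you found in question 4
n_days = 184       # assumed: every day from July 1 through December 31

# Expected number of days with k births, for k = 0, 1, ..., 6.
expected_days = [n_days * poisson_pmf(k, mean_births) for k in range(7)]
for k, days in enumerate(expected_days):
    print(k, round(days, 1))
```

Multiplying each probability by the number of days plays the same role as multiplying by the number of boys or girls in the geometric calculation.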

Intervals between births

If births are happening independently of one another at a constant rate, then the length of time between births should have an exponential distribution. Here you will investigate whether this is the case. Because time is a continuous variable, you cannot proceed by comparing observed and expected counts as before. Instead, you will base your analysis on a histogram of the data. Below are some points to keep in mind:

Now answer the following questions:
  7. What is the mean time between births?

  8. Overall, does it appear that the waiting times between births follow approximately an exponential distribution? Provide a plot to support your answer.
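
One way to judge a histogram against the exponential model is to compute the share of observations the model predicts for each bar. The Python sketch below does this under stated assumptions: the mean of 12 hours and the six-hour bin edges are placeholders, not values from the data, and both function names are made up for this illustration.

```python
import math

def exponential_density(x, mean):
    """Height of the exponential density curve at x >= 0 for the given mean."""
    return math.exp(-x / mean) / mean

def exponential_bin_probability(left, right, mean):
    """P(left < X <= right) for an exponential distribution with the given mean."""
    return math.exp(-left / mean) - math.exp(-right / mean)

mean_interval = 12.0                   # placeholder; use the mean from question 7
bin_edges = [0, 6, 12, 18, 24, 30, 36]  # hypothetical histogram bins, in hours

bin_probs = [
    exponential_bin_probability(a, b, mean_interval)
    for a, b in zip(bin_edges, bin_edges[1:])
]
tail_prob = math.exp(-bin_edges[-1] / mean_interval)  # share beyond the last edge

for (a, b), prob in zip(zip(bin_edges, bin_edges[1:]), bin_probs):
    print(f"{a}-{b} hours: {prob:.3f}")
print(f"over {bin_edges[-1]} hours: {tail_prob:.3f}")
```

Under the exponential model the bars should shrink by the same factor from one bin to the next, so a histogram with a peak away from zero would be evidence against the model. The function `exponential_density` gives the height of the curve you would superimpose on a density-scaled histogram.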

Numbers of boys and girls

If you split the babies into groups of size n, then assuming each baby is independently a boy or a girl with probability 1/2 each, the number of boys (or girls) in a group should have a binomial distribution with p = 1/2. Try splitting the babies into groups of four, and counting the number of girls in each group. To do this, first go to Calc --> Make Patterned Data --> Simple Set of Numbers. Choose a column in which to store your numbers. Go from 1 to 74 in steps of 1, and list each value 4 times. This will generate a column with the numbers from 1 to 74 repeated four times each, splitting the data into 74 groups of 4. Next put an asterisk by hand in row 297 of the column in which your numbers from 1 to 74 are located (to tell MINITAB this number is missing because the last baby does not fit into any of the 74 groups). Now count the number of girls in each group. To do this, go to Stat --> Tables --> Descriptive Statistics. Put the column containing the numbers from 1 to 74 in the "For rows" box, and Gender in the "For columns" box. Then click OK to get the numbers of boys and girls in the 74 groups.
  9. Count the number of groups that have zero, one, two, three, and four girls. (You can either do this by hand, or highlight the table and copy and paste it into the worksheet, and then tabulate the variable corresponding to the girls.)

  10. Compare these numbers to what you would expect if these numbers followed the binomial model. (Hint: this is a similar calculation to what you did with the geometric and Poisson distributions, and you may find the Calc --> Probability Distributions --> Binomial command useful.) Do the data match the binomial distribution well?
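
For reference, the expected counts under the binomial model with n = 4, p = 1/2, and 74 groups can be sketched in Python. The function name `binomial_pmf` is made up for this illustration; the calculation itself is the standard binomial formula.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial distribution with n trials and success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

n_groups = 74    # groups of four babies, as constructed above
group_size = 4
p_girl = 0.5     # the model's assumption that boys and girls are equally likely

# Expected number of groups with k girls, for k = 0, 1, 2, 3, 4.
expected_groups = [
    n_groups * binomial_pmf(k, group_size, p_girl) for k in range(group_size + 1)
]
for k, groups in enumerate(expected_groups):
    print(k, groups)
```

The expected counts are symmetric around two girls because p = 1/2, so a clearly lopsided table of observed counts would suggest the model does not fit.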

Birth Weights and Birth Times

  11. Graph the distribution of birth weights. Does the distribution of the birth weights look to be approximately normal? (You could answer this just from the histogram, or you could try superimposing a normal curve on top of the histogram, as you did with the exponential distribution.)
If babies were equally likely to be born at any time of the day, the distribution of the birth times would be approximately uniform.
  12. Make a histogram of the birth times. (Note: MINITAB Version 14 automatically scales times so that the beginning of the day is represented by 0 and the end of the day is represented by 1, but MINITAB Version 15 keeps the scale from 0 to 24.) Does the distribution of birth times appear to be uniform, or is there a different pattern? It would be a good idea to simulate 297 values from a uniform distribution a few times to see how much fluctuation would be expected just by chance. You can do this by going to Calc --> Random Data --> Uniform.
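
The suggested simulation can also be sketched in Python rather than MINITAB. This is only an illustration: the function name `simulate_hour_counts`, the one-hour bins, and the random seed are all arbitrary choices, and the simulated values are not the real birth times.

```python
import random

def simulate_hour_counts(n_births, rng):
    """Draw n_births times uniformly from the 24-hour day and count them by hour."""
    counts = [0] * 24
    for _ in range(n_births):
        t = rng.uniform(0, 24)
        hour = min(int(t), 23)  # guard against the upper endpoint 24.0
        counts[hour] += 1
    return counts

rng = random.Random(2007)  # arbitrary seed so the run can be repeated
n_births = 297

# Repeat the simulation a few times to see how uneven "uniform" data can look.
for run in range(3):
    counts = simulate_hour_counts(n_births, rng)
    print(f"run {run + 1}: min {min(counts)}, max {max(counts)} births in an hour")

print("expected per hour:", n_births / 24)
```

Each hour is expected to contain 297/24 = 12.375 births, but the simulated counts vary from hour to hour. Keep that natural variation in mind before concluding that the real birth times are not uniform.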

Other Examples

You should now have a good idea of when to expect the six distributions to appear. For the random variables below, indicate whether you would expect the distribution to be best described as geometric, binomial, Poisson, uniform, exponential, or normal. We do not have data, so you will not need to use the computer for these questions. (One-word answers are sufficient here.)
  13. The number of goals that a team scores in a hockey game.

  14. The time of day that a meteor enters the Earth's atmosphere.

  15. The number of minutes before a store manager gets her next phone call.

  16. The number of 3's that appear in 20 rolls of a die.

  17. The number of days out of the next 10 that a stock will go up.

  18. The amount of time before the next customer arrives in a store.

  19. The number of particles that a radioactive substance emits in the next two seconds.

  20. The number of free throws that a basketball player needs to make before missing one.

Remember that if you discussed this assignment with anyone other than your instructor or TA, then you should add a section called "Acknowledgments" at the end of your report, indicating from whom you received help.