Lab 4: Simulating Bernoulli trials

Recall that a Bernoulli trial is a random experiment with two possible outcomes, which can be called success and failure. Also recall that if we perform n independent Bernoulli trials, each of which is successful with probability p, and denote the number of successes by X, then we say that X has a binomial distribution with parameters n and p. Then, for k = 0, 1, ..., n,

P(X = k) = C_{n,k} p^k (1-p)^{n-k},

where C_{n,k} = n!/[k!(n-k)!] is the number of ways to choose k objects out of n. Also recall that E[X] = np and Var(X) = np(1-p). The standard deviation of X is the square root of the variance.
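
The formulas above can be checked numerically. The following is a minimal sketch in Python, which is not part of this lab (the lab itself uses MINITAB); the function name binomial_pmf and the example values n = 4 and p = 0.25 are chosen here purely for illustration.

```python
from math import comb, sqrt

def binomial_pmf(k, n, p):
    """Return P(X = k) for a binomial random variable X with parameters n and p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 4, 0.25  # illustrative values only, not taken from the lab
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Mean and variance computed directly from the probabilities;
# they should agree with np and np(1-p).
mean = sum(k * pk for k, pk in enumerate(probs))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(probs))
std_dev = sqrt(variance)

print("sum of probabilities:", sum(probs))
print("mean:", mean, "versus np =", n * p)
print("variance:", variance, "versus np(1-p) =", n * p * (1 - p))
print("standard deviation:", std_dev)
```

Changing n and p at the top lets you confirm that the same agreement holds for other parameter values.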

In this lab, you will observe some of the properties of Bernoulli trials and the binomial distribution by simulation. You will also get a glimpse of two of the most important results in probability theory, the Law of Large Numbers and the Central Limit Theorem. You will not be analyzing data for this lab, so just open up a blank MINITAB worksheet to get started.

Simulating Bernoulli trials

To start with a simple case, let's suppose we want to simulate the procedure of tossing a coin five times. Each coin toss is a Bernoulli trial with success probability 1/2, so we can simulate this using MINITAB by going to Calc --> Random Data --> Bernoulli. You will generate a row of data for each coin toss, so put 5 in the top box. To store the values in column C1, type "C1" in the large box in the middle, and then put .5 in the box labeled "Event probability" (or "Probability of Success" if you are using MINITAB Version 14). You should get five numbers, all either zero or one, in the first column. Think of a zero as a tail and a one as a head.
  1. Repeat this procedure three more times (if you wish, you can do the other three repetitions all at once by typing "C2 C3 C4" into the box labeled "Store in column(s)"). How many heads did you get in each of the four experiments?
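
For readers who want to see the same idea outside MINITAB, here is a rough Python sketch of the coin-tossing procedure above. It is an illustration under stated assumptions, not the method this lab requires: the helper name bernoulli_trials and the seed value are hypothetical choices, and the labels C1 to C4 simply mirror the worksheet columns.

```python
import random

rng = random.Random(4)  # arbitrary fixed seed so the run can be repeated

def bernoulli_trials(count, p, rng):
    """Return `count` outcomes, each 1 (success) with probability p, else 0."""
    return [1 if rng.random() < p else 0 for _ in range(count)]

# Four experiments of five tosses each, like columns C1 to C4 in the worksheet.
experiments = [bernoulli_trials(5, 0.5, rng) for _ in range(4)]
for label, tosses in zip(["C1", "C2", "C3", "C4"], experiments):
    print(label, tosses, "heads:", sum(tosses))
```

As in MINITAB, a one stands for a head and a zero for a tail, so the sum of each list is the number of heads in that experiment.
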
Now we want to conduct this experiment 1000 times. Rather than going through the procedure above 1000 times, we can simulate from the binomial distribution. The number of heads in five tosses of a coin has the binomial distribution with n = 5 and p = 1/2. Go to Calc --> Random Data --> Binomial. This time, you want 1000 rows of data. Choose a column in which to store the 1000 values, and enter 5 for the number of trials and .5 for the event probability. You should now have a column of 1000 numbers, all between zero and five. Each of these numbers is a binomial random variable, which you can think of as the number of heads in five tosses of a coin. Now you can look at your results by drawing a histogram. To answer some of the questions below, you may find it useful, before clicking on "OK" to draw the histogram, to click on the "Labels" button, then on the "Data Labels" tab. If you highlight the bubble "Use Y-value labels", then MINITAB will display at the top of the bars of the histogram how many times each number arose. You can also get MINITAB to display these results by going to Stat --> Tables --> Tally Individual Variables and selecting the column in which you stored the 1000 numbers.
  1. What is the probability of getting no heads in five tosses? What is the probability of getting exactly two heads in five tosses? [Note that these are probability questions, not questions about your particular simulation. You can either compute the probabilities by hand, or you can compute binomial probabilities P(X = k) using MINITAB by typing the value or values of k into some column, going to Calc --> Probability Distributions --> Binomial, typing the values for n and p into the appropriate boxes, and listing as the "Input Column" the column in which you typed the values of k. Also make sure to select the bubble "Probability". If the bubble "Cumulative probability" is selected instead, then MINITAB will give you P(X ≤ k).]

  2. In your simulation, how many times out of 1000 were there no heads in the five tosses? How many times were there exactly two heads? Include a histogram of your simulation results in your write-up along with your answers to these questions. Remember to make sure it is properly labeled.

  3. Compare your answers to the previous two questions. In the 1000 simulations, did you get zero heads about as many times as you expected (remember that the number of times you expect to get zero heads is the number of simulations times the probability of getting zero heads in one simulation)? What about two heads?
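
The comparison between observed and expected counts described above can also be sketched in Python. This is only an illustrative alternative to the MINITAB steps: the helper binomial_draw and the seed are hypothetical, and each binomial value is built by summing five simulated Bernoulli trials rather than by calling a built-in binomial generator.

```python
import random
from collections import Counter
from math import comb

rng = random.Random(2024)  # arbitrary fixed seed
n, p, reps = 5, 0.5, 1000

def binomial_draw(n, p, rng):
    """Number of successes in n independent Bernoulli trials with success probability p."""
    return sum(rng.random() < p for _ in range(n))

draws = [binomial_draw(n, p, rng) for _ in range(reps)]
tally = Counter(draws)  # plays the role of Tally Individual Variables

print("heads  observed  expected")
for k in range(n + 1):
    # Expected count = number of simulations times P(X = k).
    expected = reps * comb(n, k) * p**k * (1 - p)**(n - k)
    print(f"{k:5d}  {tally[k]:8d}  {expected:8.2f}")
```

The observed counts will differ from run to run (or from seed to seed), while the expected column stays fixed.
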
Next, suppose 250 students each take a 10-question multiple choice test in which there are five choices for each question. Assume that the students all choose their answers by random guessing. Simulate this process by generating an appropriate binomial random variable for each student.
  1. Present a table of your simulation results, showing how many students got each number of questions correct. (That is, indicate how many students got 0 correct, how many got 1 correct, how many got 2 correct, how many got 3 correct, and so on.) Compare these numbers to the number of students you would expect to get each number of questions correct (again, the number of students you would expect to get, say, 2 questions correct is the total number of students times the probability that a given student gets 2 questions correct). How well do your simulation results compare to expectations?

  2. What is the maximum number of questions that any of the students got correct? What is the probability that one particular student would get at least this number of questions correct?
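
One possible way to set up this simulation in Python is sketched below. It assumes that exactly one of the five choices is correct, so a guessing student answers each question correctly with probability 1/5; the variable names and the seed are illustrative, not part of the lab.

```python
import random
from collections import Counter
from math import comb

rng = random.Random(250)  # arbitrary fixed seed
students, questions, p_correct = 250, 10, 1 / 5  # assumes one correct choice out of five

def pmf(k, n, p):
    """Binomial probability P(X = k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# One binomial value per student: the number of correct guesses out of 10.
scores = [sum(rng.random() < p_correct for _ in range(questions))
          for _ in range(students)]
tally = Counter(scores)

print("correct  observed  expected")
for k in range(questions + 1):
    print(f"{k:7d}  {tally[k]:8d}  {students * pmf(k, questions, p_correct):8.2f}")

# Probability that one particular student scores at least the observed maximum.
best = max(scores)
tail = sum(pmf(k, questions, p_correct) for k in range(best, questions + 1))
print(f"highest score: {best}, P(X >= {best}) = {tail:.6f}")
```

Note that the tail probability sums P(X = k) from the observed maximum up to 10, since "at least" includes every higher score.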

Variability in the number of heads

In this section, you will simulate coin tossing and investigate how the variability in the number of heads depends on the number of tosses. (Note: if you are using the student version of MINITAB, which allows only 10,000 cells to be filled, it is possible that during this exercise you will run out of space. If necessary, you can delete a column by highlighting it and then going to Edit --> Delete Cells, or you can start a new worksheet by going to File --> New.)
  1. First, simulate tossing a coin 20 times and counting the number of heads. Do 1000 repetitions of this procedure (so you will generate 1000 numbers, each a binomial random variable with n = 20 and p = 1/2). Plot a histogram of the results. As always, make sure your histogram is properly labeled.

  2. Find the mean and standard deviation of the 1000 numbers that you got. (Remember that you can get these using Stat --> Basic Statistics --> Display Descriptive Statistics.) How do these numbers compare to the expected value and standard deviation of a binomial random variable with n = 20 and p = 1/2?

  3. Now simulate tossing a coin 200 times. As before, do 1000 repetitions of this procedure, and plot the results in a histogram. Based on the histogram, would it be unusual for the number of heads to be about 5 more or 5 fewer than expected? Would it be unusual for the number of heads to be 20 more or 20 fewer than expected?

  4. Next simulate tossing a coin 20,000 times. Again do 1000 repetitions and make a histogram. Would it be unusual for the number of heads to be about 5 more or 5 fewer than expected? Would it be unusual for the number of heads to be 20 more or 20 fewer than expected?

  5. Now consider not the number of heads but the fraction of heads, that is, the number of heads divided by the total number of tosses, so if 80 out of 200 tosses were heads, the fraction of heads is .40. (You can construct this variable by going to Calc --> Calculator, then in the box labeled "Expression" select the column containing the number of heads, then press the button / and then type the number of tosses, either 20, 200, or 20,000.) Examine the fractions of heads that you got in 20, 200, and 20,000 tosses. The Law of Large Numbers states that as the number of tosses gets larger, the fraction of heads tends to get closer and closer to 1/2; that is, it becomes less and less likely that the fraction differs from 1/2 by more than any fixed amount. Is that consistent with what you observe? Explain your answer.

  6. If X denotes the number of heads in n tosses of a coin, what is the standard deviation of the random variable X? Does this standard deviation get larger or smaller when n gets larger? What is the standard deviation of the fraction of heads, which is X/n? Does this get larger or smaller when n gets larger? Relate these theoretical results to what you observed in your simulations when answering the previous three questions.
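
The simulations in this section can be mirrored in a short Python sketch, shown here as one possible approach rather than the required one. It relies on a shortcut that works only for a fair coin: each bit of a random n-bit integer is 1 with probability 1/2, so counting the 1 bits counts the heads in n tosses. The helper name heads_in_fair_tosses and the seed are hypothetical. The theoretical columns use the square root of Var(X) = np(1-p) for the number of heads, and that value divided by n for the fraction of heads.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(39)  # arbitrary fixed seed
reps, p = 1000, 0.5

def heads_in_fair_tosses(n, rng):
    """Count heads in n fair tosses by counting the 1 bits of a random n-bit integer."""
    return bin(rng.getrandbits(n)).count("1")

results = {}
print("     n  sd(X) sim  sd(X) theory  sd(X/n) sim  sd(X/n) theory")
for n in (20, 200, 20000):
    counts = [heads_in_fair_tosses(n, rng) for _ in range(reps)]
    fractions = [c / n for c in counts]
    sd_count, sd_fraction = stdev(counts), stdev(fractions)
    results[n] = (mean(fractions), sd_count, sd_fraction)
    sd_theory = sqrt(n * p * (1 - p))  # square root of Var(X) = np(1-p)
    print(f"{n:6d}  {sd_count:9.3f}  {sd_theory:12.3f}  "
          f"{sd_fraction:11.5f}  {sd_theory / n:14.5f}")
```

Comparing the two simulated columns across the three values of n shows how the spread of the count and the spread of the fraction move as n grows.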

The shape of the binomial distribution when np and n(1-p) are large

Here you will investigate what the shape of the binomial distribution looks like when the expected number of successes and the expected number of failures are both large.
  1. Generate 1000 random numbers each from the binomial distribution with n = 20 and p = .5, from the binomial distribution with n = 20 and p = .8, and from the binomial distribution with n = 20 and p = .95. Show the histograms and describe the shapes of the three distributions. Are the shapes of the distributions quite different, or do they look approximately the same?

  2. Generate 1000 random numbers each from the binomial distribution with n = 2000 and p = .5, from the binomial distribution with n = 2000 and p = .8, and from the binomial distribution with n = 2000 and p = .95. Show the histograms and describe the shapes of the three distributions. This time, are the shapes of the distributions approximately the same? What do you conclude about how the shape of the distribution depends on the number of trials? (Later you will learn that this happens because of a famous result known as the Central Limit Theorem.)
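
As a numerical complement to the histograms, the sketch below computes the skewness (the standardized third moment, which is 0 for a perfectly symmetric distribution) of each of the six binomial distributions directly from their probabilities. This is an added illustration in Python, not something the lab asks for; skewness is just one summary of shape and does not replace looking at the histograms. The probabilities are computed on a logarithmic scale with lgamma because the factorials for n = 2000 are too large for ordinary floating-point arithmetic. The function names are hypothetical.

```python
from math import exp, lgamma, log, sqrt

def log_pmf(k, n, p):
    """log P(X = k) for a binomial(n, p) variable, computed with logarithms for stability."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_choose + k * log(p) + (n - k) * log(1 - p)

def skewness(n, p):
    """Standardized third moment of the binomial(n, p) distribution."""
    mu, sigma = n * p, sqrt(n * p * (1 - p))
    return sum(((k - mu) / sigma) ** 3 * exp(log_pmf(k, n, p))
               for k in range(n + 1))

skew = {(n, p): skewness(n, p) for n in (20, 2000) for p in (0.5, 0.8, 0.95)}
for (n, p), value in skew.items():
    print(f"n = {n:4d}, p = {p:.2f}: skewness = {value:+.3f}")
```

Values far from 0 indicate a lopsided distribution with a long tail on one side, while values near 0 indicate a nearly symmetric one.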

Remember that if you discussed this assignment with anyone other than your instructor or TA, then you should add a section called "Acknowledgments" at the end of your report, indicating from whom you received help.