## Lab 4: Simulating Bernoulli trials

Recall that a Bernoulli trial is a random experiment with two possible outcomes, which can be called success and failure. Also recall that if we perform n independent Bernoulli trials, each of which is successful with probability p, and denote the number of successes by X, then we say that X has a binomial distribution with parameters n and p. Then, for k = 0, 1, ..., n,

$$P(X = k) = C_{n,k}\, p^{k} (1-p)^{n-k},$$

where $C_{n,k} = \dfrac{n!}{k!\,(n-k)!}$ is the number of ways to choose k objects out of n. Also recall that E[X] = np and Var(X) = np(1-p). The standard deviation of X is the square root of the variance.
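
For readers who want to check these formulas numerically, here is a minimal Python sketch. Python is not part of this lab, and the helper name `binomial_pmf` is made up for illustration. Assuming n = 4 and p = 1/2 as example values, it evaluates P(X = k) for every k and confirms that the probabilities sum to 1 and that the mean and variance agree with np and np(1-p).

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p). Illustrative helper, not a Minitab function."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Example parameters chosen only for illustration.
n, p = 4, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Mean and variance computed directly from the definition of expectation.
total = sum(pmf)
mean = sum(k * q for k, q in enumerate(pmf))
variance = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))

print("P(X = k) for k = 0..n:", pmf)
print("sum of probabilities:", total)
print("mean:", mean, "compared with np =", n * p)
print("variance:", variance, "compared with np(1-p) =", n * p * (1 - p))
```

Changing `n` and `p` at the top lets you repeat the check for any other binomial distribution.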

In this lab, you will observe some of the properties of Bernoulli trials and the binomial distribution by simulation. You will also get a glimpse of two of the most important results in probability theory, the Law of Large Numbers and the Central Limit Theorem. You will not be analyzing data for this lab, so just open up a blank Minitab worksheet to get started.

### Simulating coin tosses

To start with a simple case, suppose we want to simulate tossing a coin 5 times. Each coin toss is a Bernoulli trial with success probability 1/2, so we can simulate this in Minitab by going to Calc --> Random Data --> Bernoulli. You will generate one row of data for each coin toss, so put 5 in the top box. To store the values in column C1, type "C1" in the large box in the middle (labeled "Store in column(s)"), and then put .5 in the box labeled "Event probability" (or "Probability of Success" if you are using Minitab Version 14). You should get 5 numbers, each either zero or one, in the first column. Think of a zero as a tail and a one as a head.

1. Repeat this procedure three more times (if you wish, you can do the other three repetitions all at once by typing "C2 C3 C4" into the box labeled "Store in column(s)"). How many heads did you get in each of the four experiments?
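
The same experiment can be sketched outside Minitab. The following Python snippet is only an illustration under stated assumptions: the function name `bernoulli_trials` and the seed are arbitrary choices, and the labels C1 to C4 merely mimic the worksheet columns. It generates four sets of 5 Bernoulli trials with success probability 1/2 and counts the heads in each.

```python
import random

def bernoulli_trials(n_trials, p, rng):
    """Return n_trials values, each 1 (success) with probability p and 0 otherwise."""
    return [1 if rng.random() < p else 0 for _ in range(n_trials)]

# A fixed seed makes the run repeatable; the value itself is arbitrary.
rng = random.Random(4)

# Four experiments of 5 tosses each, analogous to columns C1 through C4.
experiments = [bernoulli_trials(5, 0.5, rng) for _ in range(4)]

for i, tosses in enumerate(experiments, start=1):
    print(f"C{i}: {tosses}  heads = {sum(tosses)}")
```

Because each toss is 0 or 1, summing a column gives the number of heads in that experiment.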

Now we want to conduct this experiment 1000 times. Rather than going through the procedure above 1000 times, we can simulate from the binomial distribution directly. The number of heads in 5 tosses of a coin has the binomial distribution with n = 5 and p = 1/2. Go to Calc --> Random Data --> Binomial. This time, you want 1000 rows of data. Choose a column in which to store the 1000 values, and enter 5 for the number of trials and .5 for the event probability. You should now have a column of 1000 numbers, each between zero and five. Each of these numbers is an observed value of a binomial random variable, which you can think of as the number of heads in five tosses of a coin.

Now you can look at your results by drawing a histogram. To answer some of the questions below, you may find it useful, before clicking on "OK" to draw the histogram, to click on the "Labels" button, then on the "Data Labels" tab. If you select the bubble "Use Y-value labels", then Minitab will display at the top of each bar of the histogram how many times each number arose. You can also get Minitab to display these counts by going to Stat --> Tables --> Tally Individual Variables and selecting the column in which you stored the 1000 numbers.

1. What is the probability of getting zero heads in five tosses? What is the probability of getting exactly one head in five tosses? (Hint: these are probability questions, not questions about your particular simulation. You can either compute the probabilities by hand, or you can compute binomial probabilities P(X = k) using Minitab by typing the value or values of k into some column, going to Calc --> Probability Distributions --> Binomial, typing the values for n and p into the appropriate boxes, and listing as the "Input Column" the column in which you typed the values of k. Also make sure to select the bubble "Probability". If the bubble "Cumulative probability" is selected instead, then Minitab will give you P(X ≤ k).)

2. In your simulation, how many times out of 1000 were there zero heads in the five tosses? How many times was there one head? Include a histogram of your simulation results in your write-up along with your answers to these questions. As always, remember to make sure the axes are appropriately labeled.

3. Compare your answers to the previous two questions. In the 1000 simulations, did you get zero heads about as many times as you expected (remember that the number of times you expect to get zero heads is the number of simulations times the probability of getting zero heads in one simulation)? What about one head?
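
The comparison between observed and expected counts in the three questions above can also be sketched in Python. This is a rough illustration, not the required Minitab procedure: the helper names `binomial_pmf` and `binomial_draw` and the seed are assumptions made for this example. Each simulated value is the number of successes in n Bernoulli trials, and the expected count for each k is the number of simulations times P(X = k).

```python
import math
import random
from collections import Counter

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p). Illustrative helper."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_draw(n, p, rng):
    """One binomial value, built as the number of successes in n Bernoulli trials."""
    return sum(rng.random() < p for _ in range(n))

n, p, reps = 5, 0.5, 1000
rng = random.Random(2024)  # arbitrary seed, used only for repeatability

draws = [binomial_draw(n, p, rng) for _ in range(reps)]
tally = Counter(draws)  # plays the role of Tally Individual Variables

print("heads  observed  expected")
for k in range(n + 1):
    expected = reps * binomial_pmf(k, n, p)
    print(f"{k:5d}  {tally[k]:8d}  {expected:8.1f}")
```

The observed counts vary from run to run (change the seed to see this), while the expected counts are fixed by n, p, and the number of simulations.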

Next, suppose 200 students each take a 10-question multiple-choice test in which there are five choices for each question. Assume that the students all choose their answers by random guessing. Simulate this process by generating an appropriate binomial random variable for each student.

1. Present a table of your simulation results, showing how many students got each number of questions correct. Compare these numbers to the number of students you would expect to get each number of questions correct (again, the number of students you would expect to get, say, 2 questions correct is the total number of students times the probability that a given student gets 2 questions correct). How well do your simulation results compare to expectations?

2. What is the maximum number of questions that any of the students got correct? What is the probability that one particular student would get at least this number of questions correct? (Note: you may find it useful to use Calc --> Probability Distributions --> Binomial with the "Cumulative probability" bubble selected.)
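
As a hedged sketch of the test-guessing exercise, the Python code below assumes that a random guess among five choices is correct with probability 1/5, so each student's score is treated as binomial with n = 10 and p = 1/5. The helper names and the seed are made up for this illustration. It also shows how an "at least" probability follows from a cumulative probability: P(X ≥ m) = 1 - P(X ≤ m - 1).

```python
import math
import random
from collections import Counter

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p). Illustrative helper."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_cdf(k, n, p):
    """P(X <= k), the quantity Minitab reports as the cumulative probability."""
    return sum(binomial_pmf(j, n, p) for j in range(k + 1))

n_students, n_questions, n_choices = 200, 10, 5
p = 1 / n_choices  # assumed chance that a single random guess is correct
rng = random.Random(7)  # arbitrary seed for repeatability

# One simulated score per student: the number of correct guesses out of 10.
scores = [sum(rng.random() < p for _ in range(n_questions))
          for _ in range(n_students)]
tally = Counter(scores)

print("correct  observed  expected")
for k in range(n_questions + 1):
    expected = n_students * binomial_pmf(k, n_questions, p)
    print(f"{k:7d}  {tally[k]:8d}  {expected:8.2f}")

# Probability that one particular student scores at least the observed maximum.
best = max(scores)
p_at_least_best = 1 - binomial_cdf(best - 1, n_questions, p)
print("maximum score:", best)
print("P(X >= maximum) for one student:", p_at_least_best)
```

Note that this is the probability for one particular student; with 200 students, seeing at least one score that high is much more likely.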

### Variability in the number of heads

In this section, you will simulate coin tossing and investigate how the variability in the number of heads depends on the number of tosses. (Note: if you are using the student version of Minitab, which allows only 10,000 cells to be filled, it is possible that during this exercise you will run out of space. If necessary, you can delete a column by highlighting it and then going to Edit --> Delete Cells, or you can start a new worksheet by going to File --> New.)

1. First, simulate tossing a coin 15 times and counting the number of heads. Do 1000 repetitions of this procedure (so you will generate 1000 numbers, each a binomial random variable with n = 15 and p = 1/2). Present a histogram of the results.

2. Find the mean and standard deviation of the 1000 numbers that you got. (Remember you can get this using Stat --> Basic Statistics --> Display Descriptive Statistics.) Are these numbers close to the expected value and standard deviation of a binomial random variable with n = 15 and p = 1/2?

3. Now simulate tossing a coin 150 times. As before, do 1000 repetitions of this procedure, and present the results in a histogram. Based on the histogram, would it be unusual for the number of heads to be about 5 more or 5 fewer than expected? Would it be unusual for the number of heads to be 20 more or 20 fewer than expected?

4. Next, simulate tossing a coin 1500 times and 15,000 times. As before, do 1000 repetitions of each procedure, and make two histograms. When a coin is tossed 15,000 times, is it unusual for the number of heads to be about 5 more or 5 fewer than expected? Is it unusual for the number of heads to be 20 more or 20 fewer than expected?

5. Based on your observations above, when the number of tosses increases, does the difference between the actual number of heads and the expected number of heads tend to get larger or smaller?

6. Now consider not the number of heads but the fraction of heads. The fraction of heads is the number of heads divided by the total number of tosses, so if 60 out of 100 tosses are heads, the fraction of heads is 60/100 = .60. (You can construct this variable by going to Calc --> Calculator, then in the box labeled "Expression" select the column containing the number of heads, then press the button / and then type the number of tosses, either 15, 150, 1500, or 15,000.) Present a side-by-side boxplot of the fractions of heads that you got in 15, 150, 1500, and 15,000 tosses. (To make the boxplot, go to Graph --> Boxplot, then choose "Simple" under "Multiple Y's", put the four columns in which you have recorded the fractions of heads in the "Graph Variables" box, and then click OK.)

7. The Law of Large Numbers states that as the number of tosses gets larger, the fraction of heads should get closer and closer to 1/2. Examine the side-by-side boxplots that you made for the previous question. Are your simulation results in agreement with the Law of Large Numbers? Explain your answer.

8. If X denotes the number of heads in n tosses of a coin, what is the standard deviation of the random variable X? Does this standard deviation get larger or smaller when n gets larger? (Hint: this is a theoretical question, and is not asking about your simulation results. You should get an algebraic expression involving n.) Relate this theoretical result to the observations that you made from your simulations in question 5 of this section (how the difference between the actual and expected number of heads changes as the number of tosses increases).

9. If X denotes the number of heads in n tosses of a coin, what is the standard deviation of the fraction of heads, which is X/n? Does this standard deviation get larger or smaller as n gets larger? Relate this to the observations that you made in response to question 7 of this section (the Law of Large Numbers question).
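
The contrast between the number of heads and the fraction of heads can also be explored with a short Python sketch. It is an illustration only: the function name `heads_in_fair_tosses` and the seed are assumptions, and the bit-counting shortcut works only for a fair coin (each random bit is a toss with p = 1/2). The theoretical values use Var(X) = np(1-p) from the introduction, together with the fact that dividing a random variable by n divides its standard deviation by n.

```python
import math
import random
import statistics

def heads_in_fair_tosses(n, rng):
    """Number of heads in n fair tosses, counted as the 1-bits among n random bits."""
    return bin(rng.getrandbits(n)).count("1")

rng = random.Random(15)  # arbitrary seed for repeatability
reps = 1000
results = {}  # n -> (sample sd of counts, sample sd of fractions)

print("     n  sd(count)  theory  sd(fraction)  theory")
for n in (15, 150, 1500, 15000):
    counts = [heads_in_fair_tosses(n, rng) for _ in range(reps)]
    fractions = [c / n for c in counts]
    sd_count = statistics.stdev(counts)
    sd_fraction = statistics.stdev(fractions)
    results[n] = (sd_count, sd_fraction)

    theory_count = math.sqrt(n * 0.5 * 0.5)  # sqrt(np(1-p)) with p = 1/2
    theory_fraction = theory_count / n       # sd of X/n is sd of X divided by n
    print(f"{n:6d}  {sd_count:9.3f}  {theory_count:6.3f}  "
          f"{sd_fraction:12.5f}  {theory_fraction:6.5f}")
```

Reading down the two pairs of columns shows the spread of the counts and the spread of the fractions moving in opposite directions as n grows.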

### The shape of the binomial distribution when np and n(1-p) are large

Here you will investigate the shape of the binomial distribution when the expected number of successes, np, and the expected number of failures, n(1-p), are both large.

1. Generate 1000 random numbers each from the binomial distribution with n = 20 and p = .5, from the binomial distribution with n = 20 and p = .8, and from the binomial distribution with n = 20 and p = .95. Show the histograms and describe the shapes of the three distributions. Are the shapes of the distributions quite different, or do they look approximately the same?

2. Generate 1000 random numbers each from the binomial distribution with n = 2000 and p = .5, from the binomial distribution with n = 2000 and p = .8, and from the binomial distribution with n = 2000 and p = .95. Show the histograms and describe the shapes of the three distributions. This time, are the shapes of the distributions approximately the same? What do you conclude about how the shape of the distribution depends on the number of trials? (Later you will learn that this happens because of a famous result known as the Central Limit Theorem.)
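
As a complement to the histograms, one rough numerical summary of shape is skewness, E[(X - μ)³]/σ³, which is zero for a symmetric distribution. The Python sketch below is an illustration only: it uses the exact binomial probabilities rather than simulated data, the function names are made up for this example, and the probabilities are computed on the log scale (via `math.lgamma`) so that n = 2000 does not overflow.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), computed on the log scale for large n."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)

def skewness(n, p):
    """Skewness of Binomial(n, p), summed directly from the exact probabilities."""
    pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
    mu = sum(k * q for k, q in enumerate(pmf))
    var = sum((k - mu) ** 2 * q for k, q in enumerate(pmf))
    third = sum((k - mu) ** 3 * q for k, q in enumerate(pmf))
    return third / var ** 1.5

# The same six (n, p) combinations used in the two questions above.
for n in (20, 2000):
    for p in (0.5, 0.8, 0.95):
        print(f"n = {n:4d}, p = {p:.2f}: skewness = {skewness(n, p):+.3f}")
```

Comparing the values for n = 20 with those for n = 2000 gives a numerical counterpart to what the histograms show.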