Math 11, Lab 4

Lab 4: Simulating Bernoulli trials

Recall that a Bernoulli trial is a random experiment with two possible outcomes, which can be called success and failure. Also recall that if we perform n independent Bernoulli trials, each of which is successful with probability p, and denote the number of successes by X, then we say that X has a binomial distribution with parameters n and p. Then, for k = 0, 1, ..., n,

P(X = k) = C_{n,k} p^k (1-p)^{n-k},
where C_{n,k} = n!/[k!(n-k)!] is the number of ways to choose k objects out of n. Also recall that E[X] = np and Var(X) = np(1-p). The standard deviation of X is the square root of the variance.
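
If you would like to check the formulas above outside of Minitab, here is a minimal Python sketch. It is not part of the lab procedure, and the function names binomial_pmf and binomial_mean_sd are made up for this illustration. The values n = 4 and p = 1/2 are illustrative only and are not used elsewhere in the lab.

```python
from math import comb, sqrt

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial random variable X with parameters n and p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binomial_mean_sd(n, p):
    """Return (E[X], SD(X)) = (np, sqrt(np(1-p)))."""
    return n * p, sqrt(n * p * (1 - p))

# Illustrative values only: 4 tosses of a fair coin.
n, p = 4, 0.5
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
print(probs)                   # one probability for each k = 0, 1, ..., n
print(sum(probs))              # the probabilities add up to 1
print(binomial_mean_sd(n, p))  # expected value and standard deviation
```

For example, binomial_pmf(2, 4, 0.5) is C_{4,2} (1/2)^2 (1/2)^2 = 6/16 = 0.375.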

In this lab, you will observe some of the properties of Bernoulli trials and the binomial distribution by simulation. You will also get a glimpse of two of the most important results in probability theory, the Law of Large Numbers and the Central Limit Theorem. You will not be analyzing data for this lab, so just open up a blank Minitab worksheet to get started.

Simulating coin tosses
To start with a simple case, let's suppose we want to simulate the procedure of tossing a coin 6 times. Each coin toss is a Bernoulli trial with success probability 1/2, so we can simulate this using Minitab by going to Calc --> Random Data --> Bernoulli. You will generate a row of data for each coin toss, so put 6 in the top box. To store the values in column C1, type "C1" in the large box in the middle, and then put .5 in the box labeled "Event probability". You should get 6 numbers, all either zero or one, in the first column. Think of a zero as a tail and a one as a head.

In Minitab Express, you can do this by going to Data --> Generate Random Data. Put 6 in the box for "Number of rows in each column", select "Bernoulli" for the distribution, and put .5 in the box for "Event probability".
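
The same experiment can also be sketched in Python, if you are curious how the simulation works. This is an illustration under the assumption that you have Python available, not a replacement for the Minitab steps. The helper name bernoulli_trials is hypothetical.

```python
import random

def bernoulli_trials(n, p, rng=random):
    """Return n independent 0/1 outcomes, each equal to 1 with probability p."""
    # rng.random() is uniform on [0, 1), so it falls below p with probability p.
    return [1 if rng.random() < p else 0 for _ in range(n)]

# Six tosses of a fair coin: think of 0 as a tail and 1 as a head.
tosses = bernoulli_trials(6, 0.5)
print(tosses, "number of heads:", sum(tosses))
```

Each run gives a different list, just as each run of the Minitab procedure gives a different column.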

Repeat this procedure three more times. How many heads did you get in each of the four experiments?

Now we want to conduct this experiment 1000 times. Rather than going through the procedure above 1000 times, we can simulate from the binomial distribution. The number of heads in 6 tosses of a coin has the binomial distribution with n = 6 and p = 1/2. Go to Calc --> Random Data --> Binomial. This time, you want 1000 rows of data. Choose a column in which to store the 1000 values, and enter 6 for the number of trials and .5 for the event probability. You should now have a column of 1000 numbers, all between zero and six. Each of these numbers is an observed value of a binomial random variable, which you can think of as the number of heads in six tosses of a coin.

Now you can look at your results by drawing a histogram. To answer some of the questions below, you may find it useful, before clicking on "OK" to draw the histogram, to click on the "Labels" button, then on the "Data Labels" tab. If you select the bubble "Use Y-value labels", then Minitab will display at the top of each bar of the histogram how many times that number arose. You can also get Minitab to display these results by going to Stat --> Tables --> Tally Individual Variables and selecting the column in which you stored the 1000 numbers.

In Minitab Express, you can perform the simulation by going to Data --> Generate Random Data. You want 1000 rows in each column. Choose "Binomial" for the distribution, enter 6 for the number of trials, and enter .5 for the event probability. Remember that to get data labels on a histogram, you click inside the graph, then click the plus sign to the right of the graph, then select "Data Labels". You can also obtain these numbers by going to Statistics --> Summary Statistics --> Tally, then double clicking on the variable in which you stored the 1000 numbers.
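
As a rough Python counterpart to the binomial simulation and the tally, here is a sketch that adds up 6 Bernoulli trials for each of the 1000 repetitions. The names binomial_draw and simulate_counts, and the optional seed argument, are assumptions made for this illustration and do not correspond to anything in Minitab.

```python
import random
from collections import Counter

def binomial_draw(n, p, rng):
    """Number of successes in n independent trials with success probability p."""
    return sum(rng.random() < p for _ in range(n))

def simulate_counts(reps, n, p, seed=None):
    """Tally how many times each number of successes occurs in reps repetitions."""
    rng = random.Random(seed)
    return Counter(binomial_draw(n, p, rng) for _ in range(reps))

# 1000 repetitions of tossing a fair coin 6 times.
tally = simulate_counts(1000, 6, 0.5)
for k in range(7):
    print(k, tally[k])  # number of repetitions that produced k heads
```

The printed table plays the same role as the Minitab tally: one line for each possible number of heads, with the count of repetitions that produced it.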

What is the probability of getting zero heads in six tosses? What is the probability of getting exactly four heads in six tosses? (Hint: these are probability questions, not questions about your particular simulation. You can either compute the probabilities by hand, or you can compute binomial probabilities P(X = k) using Minitab by typing the value or values of k into some column, going to Calc --> Probability Distributions --> Binomial, typing the values for n and p into the appropriate boxes, and listing as the "Input Column" the column in which you typed the values of k. Also make sure to select the bubble "Probability". If the bubble "Cumulative probability" is selected instead, then Minitab will give you P(X ≤ k).)

The way to compute these probabilities in Minitab Express is to type the values of k into a column. Then go to Statistics --> Probability Distributions --> Probability Density Function. Choose "A column of values" as the "Form of input", and then select the column in which you entered the values of k. Then choose "Binomial" as the distribution, and input the number of trials and the event probability.
In your simulation, how many times out of 1000 were there zero heads in the six tosses? How many times were there four heads? Include a histogram of your simulation results in your write-up along with your answers to these questions. As always, remember to make sure the axes are appropriately labeled.
Compare your answers to the previous two questions. In the 1000 simulations, did you get zero heads about as many times as you expected (remember that the number of times you expect to get zero heads is the number of simulations times the probability of getting zero heads in one simulation)? What about four heads?

Variability in the number of heads
In this section, you will simulate coin tossing and investigate how the variability in the number of heads depends on the number of tosses.

First, simulate tossing a coin 12 times and counting the number of heads. Do 1000 repetitions of this procedure (so you will generate 1000 numbers, each a binomial random variable with n = 12 and p = 1/2). Present a histogram of the results.
Find the mean and standard deviation of the 1000 numbers that you got. (Remember that you can get these using Stat --> Basic Statistics --> Display Descriptive Statistics.) Are these numbers close to the expected value and standard deviation of a binomial random variable with n = 12 and p = 1/2?

In Minitab Express, you will use Statistics --> Summary Statistics --> Descriptive Statistics.
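
The comparison between the simulated and theoretical values can also be sketched in Python. This is only an illustration of the idea, assuming the formulas E[X] = np and Var(X) = np(1-p) recalled at the start of the lab. The function names simulate_binomial and compare_to_theory are hypothetical.

```python
import random
import statistics
from math import sqrt

def simulate_binomial(reps, n, p, seed=None):
    """Return reps simulated values of a binomial(n, p) random variable."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) for _ in range(reps)]

def compare_to_theory(draws, n, p):
    """Put the sample mean and SD next to the theoretical np and sqrt(np(1-p))."""
    return {
        "sample_mean": statistics.mean(draws),
        "sample_sd": statistics.stdev(draws),
        "theory_mean": n * p,
        "theory_sd": sqrt(n * p * (1 - p)),
    }

# 1000 repetitions of tossing a fair coin 12 times.
summary = compare_to_theory(simulate_binomial(1000, 12, 0.5), 12, 0.5)
for name, value in summary.items():
    print(name, round(value, 3))
```

The sample values change from run to run, while the theoretical values do not.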
Now simulate tossing a coin 120 times. As before, do 1000 repetitions of this procedure, and present the results in a histogram. Based on the histogram, would it be unusual for the number of heads to be about 5 more or 5 fewer than expected? Would it be unusual for the number of heads to be 20 more or 20 fewer than expected?
Next, simulate tossing a coin 1200 times and tossing a coin 12,000 times. As before, do 1000 repetitions of each procedure, and make two histograms. When a coin is tossed 12,000 times, is it unusual for the number of heads to be about 5 more or 5 fewer than expected? Is it unusual for the number of heads to be 20 more or 20 fewer than expected?
Based on your observations above, when the number of tosses increases, does the difference between the actual number of heads and the expected number of heads tend to get larger or smaller?
Now consider not the number of heads but the fraction of heads. The fraction of heads is the number of heads divided by the total number of tosses, so if 60 out of 100 tosses are heads, the fraction of heads is 60/100 = .60. (You can construct this variable by going to Calc --> Calculator, then in the box labeled "Expression" select the column containing the number of heads, then press the button / and then type the number of tosses, either 12, 120, 1200, or 12,000.) Present a side-by-side boxplot of the fractions of heads that you got in 12, 120, 1200, and 12,000 tosses. (To make the boxplot, go to Graph --> Boxplot, then choose "Simple" under "Multiple Y's", put the four columns in which you have recorded the fractions of heads in the "Graph Variables" box, and then click OK.)

To get the fraction of heads in Minitab Express, follow the instructions above, except go to Data --> Formula instead of Calc --> Calculator. For the boxplot, go to Graphs --> Boxplot, then choose "Simple" under "Multiple Y variables". Put the four columns in which you have recorded the fractions of heads in the "Y Variables" box, and then click OK.
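
Here is a Python sketch of the fraction-of-heads calculation, for readers who want to see the division step spelled out. It uses a shortcut that works only for a fair coin: the number of heads in n tosses is obtained by counting the 1 bits among n random bits. The function names are hypothetical, and the printed summary is a plain-text stand-in for the boxplot, not a reproduction of it.

```python
import random
import statistics

def heads_in_fair_tosses(n, rng):
    """Number of heads in n fair tosses, found by counting 1 bits in n random bits."""
    return bin(rng.getrandbits(n)).count("1")

def fraction_of_heads(reps, n, seed=None):
    """Return reps simulated fractions of heads, each based on n fair tosses."""
    rng = random.Random(seed)
    return [heads_in_fair_tosses(n, rng) / n for _ in range(reps)]

# 1000 repetitions for each number of tosses used in this section.
for n in (12, 120, 1200, 12000):
    fractions = fraction_of_heads(1000, n)
    print(n, "tosses:",
          "min", round(min(fractions), 4),
          "max", round(max(fractions), 4),
          "sd", round(statistics.stdev(fractions), 4))
```

Dividing by n is the same operation that the Minitab calculator performs on the column of head counts.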
The Law of Large Numbers states that as the number of tosses gets larger, the fraction of heads should get closer and closer to 1/2. Examine the side-by-side boxplots that you made for question 10. Are your simulation results in agreement with the Law of Large Numbers? Explain your answer.
If X denotes the number of heads in n tosses of a coin, what is the standard deviation of the random variable X? Does this standard deviation get larger or smaller when n gets larger? (Hint: this is a theoretical question, and is not asking about your simulation results. You should get an algebraic expression involving n.) Relate this theoretical result to the observations that you made from your simulations in question 9.
If X denotes the number of heads in n tosses of a coin, what is the standard deviation of the fraction of heads, which is X/n? Does this standard deviation get larger or smaller as n gets larger? Relate this to the observations that you made in response to question 11.

The shape of the binomial distribution when np and n(1-p) are large
Here you will investigate what the shape of the binomial distribution looks like when the expected number of successes and the expected number of failures are both large.

Generate 1000 random numbers each from the binomial distribution with n = 20 and p = .5, from the binomial distribution with n = 20 and p = .8, and from the binomial distribution with n = 20 and p = .95. Show the histograms and describe the shapes of the three distributions. Are the shapes of the distributions quite different, or do they look approximately the same?
Generate 1000 random numbers each from the binomial distribution with n = 2000 and p = .5, from the binomial distribution with n = 2000 and p = .8, and from the binomial distribution with n = 2000 and p = .95. Show the histograms and describe the shapes of the three distributions. This time, are the shapes of the distributions approximately the same? What do you conclude about how the shape of the distribution depends on the number of trials? (Later you will learn that this happens because of a famous result known as the Central Limit Theorem.)
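
To see what "np and n(1-p) are large" means for the six parameter choices in this section, the following Python sketch works out the expected number of successes and the expected number of failures for each one. It is arithmetic only and does not simulate anything. The function name expected_counts is made up for this illustration.

```python
def expected_counts(n, p):
    """Return (expected successes, expected failures) = (np, n(1-p))."""
    return n * p, n * (1 - p)

# The (n, p) pairs used in this section.
for n in (20, 2000):
    for p in (0.5, 0.8, 0.95):
        successes, failures = expected_counts(n, p)
        print(f"n={n}, p={p}: np={successes:.0f}, n(1-p)={failures:.0f}")
```

For instance, with n = 20 and p = .95 the expected number of failures is 20(.05) = 1, while with n = 2000 and p = .95 it is 2000(.05) = 100.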