********** Announcements ************
· We expect most students to have MATH 181A preparation from Winter or Spring 2016. These two classes were taught from different textbooks (not a big problem) and covered different amounts of material (a bigger problem). More specifically, Spring 2016 did not cover any material on hypothesis testing, and did not use the statistical programming language and environment R in homework. To make up for these gaps, the plan for this course is:
1) Start with hypothesis testing, mostly following the structure of Chapter 10 in Wackerly et al. (see below for reference books);
2) The statistical programming language and environment R will be introduced during TA sessions at the start of the quarter, and will be used in homework assignments.
· In place of a syllabus: after making up/reviewing general material on hypothesis testing (basic concepts, Z-tests, t-tests, likelihood ratio tests), we will focus on linear regression, with some elements of analysis of variance (ANOVA) as a special case. We will cover contingency tables and chi-squared tests towards the end. More detailed information will become available as the course progresses, and can be seen from the list of topics covered and the homework assignments.
· I will collect feedback during the course, i.e. a slip summarizing: 1) what you have learned; 2) what you don't understand. These can earn bonus points toward your total score in the class. The dates of collection and the points will be announced.
· The midterm will be on Friday, Oct. 28, in class. A cheat sheet is allowed (this is true for all exams). The material covered will be through the end of week 4. Bring sheets of paper (or a blue book) to write the exam on.
· On Friday 9/30 at the end of the lecture, I will collect feedback slips from section A01 students (i.e. those enrolled in the 5pm TA session) telling me: 1) when you took 181A; 2) what you know about hypothesis testing; 3) what you don't understand about what we have discussed so far. Students from A02 are welcome to provide feedback, but only those in A01 will receive 2 bonus points towards their total score (out of a maximum of 100 without bonus) for the quarter.
· On Friday 10/21 at the end of the lecture, I will collect feedback slips from section A02 students telling me: 0) your major; 1) when you took 181A; 2) what you have learned since the start of this quarter; 3) what you don't understand about what we have discussed so far. Students from A01 are welcome to provide feedback, but only those in A02 will receive 2 bonus points towards their total score.
· The Wednesday Nov. 9 lecture will be replaced by a TA session, partly because Friday 11/11 is a holiday. Among other things, you will discuss the lm() function in R for linear regression.
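As a preview, a minimal sketch of what lm() looks like, on simulated data (the numbers below are illustrative only, not from class):

```r
# A minimal lm() sketch on simulated data (illustrative values only).
set.seed(1)
x <- 1:20
y <- 2 + 0.5 * x + rnorm(20)   # true intercept 2, true slope 0.5
fit <- lm(y ~ x)               # least-squares fit of y = b0 + b1*x
summary(fit)                   # estimates, SEs, t-tests, R-squared
coef(fit)                      # just the fitted intercept and slope
```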
· On Friday 11/18 at the end of the lecture, I will collect feedback slips from section A01 students telling me: 1) what you don't understand about what we have discussed so far; 2) any other feedback in general. Students from A02 are welcome to provide feedback, but only those in A01 will receive 2 bonus points towards their total scores for the quarter.
· On Wednesday 11/23 at the end of the lecture, I will collect feedback slips from section A02 students telling me: 1) what you don't understand about what we have discussed so far; 2) any other feedback in general. Students from A01 are welcome to provide feedback, but only those in A02 will receive 2 bonus points towards their total scores for the quarter.
· Just like for the midterm, please bring sheets of paper (or a blue book) to write the final exam on.
Lecture: MWF 4
Instructor:
Ronghui (Lily) Xu
Office: APM 5856
Phone: 534-6380
Email: rxu@ucsd.edu
Office
Hours:
M: 1-2pm
F: 2-3pm
Or, by
appointment
Teaching Assistant: Jue (Marquis) Hou (see the TA's webpage)
Office: APM 6442
Email: j7hou@ucsd.edu
Office Hours: see TA webpage
Textbook: None
Reference books:
1. Wackerly et al., "Mathematical Statistics with Applications"
2. Larsen and Marx, "An Introduction to Mathematical Statistics and Its Applications" (5th ed.; for homework assignments)
3. Rice, "Mathematical Statistics and Data Analysis" (3rd ed.; for homework assignments)
Some R code is used in lecture; see the TA's webpage (link above) for more R examples.
See TED (Triton Ed) for examples shown in
lecture.
Topics
covered:
· Hypothesis testing: basics
· Large sample Z-tests
· Sample size calculation
· Duality with confidence intervals
· One-sample and two-sample t-tests
· Nonparametric tests of one- and two-sample location
· Comparing two-sample variances
· Likelihood ratio test
· Neyman-Pearson lemma
· Uniformly most powerful tests
· Linear regression: least-squares estimation, matrix form of multiple
linear regression
· Simple linear regression: inference, prediction, correlation
· Multiple linear regression: vector-valued random variables, inference,
F-test, R-squared
· Analysis of variance, multiple comparisons using Bonferroni
and Tukey’s method
· Contingency tables and Pearson's chi-squared test; likelihood ratio test for the Multinomial distribution
Homework: due each (following) week at the TA session, or in the TA dropbox by the end of that day (check with the TA for the exact time). Be sure to turn in your R code (as applicable) as well as a complete solution including the setup of the problem, etc. The Wackerly book Chapter 10 exercises will be on the A.S. Soft Reserves. Replace all 'Applet' exercises in the assignments with R exercises.
Week 1: Wackerly 10.6, 10.7, 10.19, 10.25; also:
1) Use R to redo Example 10.6 in the Wackerly book (see also the lecture notes) using the exact Binomial distribution. Compare with the results from the normal approximation.
2) Continuing with Example 10.8 in the Wackerly book (see also the lecture notes), take a grid with increments of 0.1 in [15, 20], compute the power of the test at each grid value taken as the alternative hypothesis, and plot the power curve. What is the limit of the power as μ approaches 15?
3) Derive the sample size formula for the Z-test with a two-sided alternative hypothesis.
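For exercise 1), a sketch of the two computations in R; the values of n, y, and p0 below are placeholders, not the numbers from Example 10.6:

```r
# Exact Binomial test vs. normal approximation (placeholder numbers --
# substitute the n, y, and p0 from Example 10.6 in Wackerly).
n  <- 100   # sample size (placeholder)
y  <- 60    # observed number of successes (placeholder)
p0 <- 0.5   # null value of p (placeholder)

# Exact test based on the Binomial(n, p0) distribution:
binom.test(y, n, p = p0, alternative = "greater")

# Large-sample Z-test (normal approximation, no continuity correction):
z <- (y / n - p0) / sqrt(p0 * (1 - p0) / n)
pnorm(z, lower.tail = FALSE)   # one-sided p-value
```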
Week 2: Wackerly 10.46, 10.54, 10.66, 10.71; also:
1) Redo 10.19 and 10.25 (two-sided) using confidence intervals;
2) R simulation. Repeat the following 100 times: set the sample size n=6, generate n data points from a) Normal(0, 1); b) Uniform(0, 1); c) Exponential(1); d) Poisson(5), and compute the t-statistic. Plot the histogram of the 100 values of the t-statistic for each of a), b), c), superimposing each histogram with a smoothed density if you can, for visual purposes. Do they look to you like a t-distribution with 5 degrees of freedom? Finally, repeat c) with n=15; explain the purpose of this exercise and summarize your interpretation of the results.
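A sketch of the simulation in 2), shown for the Exponential(1) case with n=6; the other distributions work the same way, with rnorm(), runif(), or rpois() in place of rexp():

```r
# Distribution of the t-statistic under non-normal data: one case only,
# Exponential(1) with n = 6 (the other cases are analogous).
set.seed(2)
n    <- 6
tval <- replicate(100, {
  x <- rexp(n, rate = 1)                # true mean of Exponential(1) is 1
  (mean(x) - 1) / (sd(x) / sqrt(n))     # t-statistic centered at the true mean
})
hist(tval, freq = FALSE, breaks = 20)
lines(density(tval))                            # smoothed density overlay
curve(dt(x, df = n - 1), add = TRUE, lty = 2)   # t with 5 df, for comparison
```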
Week 3: Wackerly 10.81, 10.82, 15.7, 15.9, 15.12, 15.28; Larsen and Marx
6.4.14, 6.4.15
Week 4 (not due): Wackerly 10.107, 10.109a, 10.99ab, 10.97abc; also:
For testing two-sample means (2-sided) assuming equal variances, check that the likelihood ratio test is equivalent to the 2-sample t-test.
Week 5: Wackerly 10.97d, 10.99cd; also:
For a random sample of size n=20 from Normal(μ, 1), consider testing the null hypothesis μ=0. Plot, in the same figure, the power curves of the three Z-tests for the following three alternative hypotheses: 1) μ>0; 2) μ<0; 3) μ≠0. Assume a 0.05 significance level. Comment on the relation of your plots to the UMP test.
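The three power curves can be computed directly from the normal distribution; one way to sketch this in R:

```r
# Power of the three Z-tests for H0: mu = 0, with n = 20, sigma = 1,
# alpha = 0.05; at mu = 0 each curve should equal alpha.
n     <- 20
alpha <- 0.05
mu    <- seq(-1, 1, by = 0.01)   # grid of alternative values
se    <- 1 / sqrt(n)             # standard error of the sample mean

pow.right <- 1 - pnorm(qnorm(1 - alpha) - mu / se)        # H1: mu > 0
pow.left  <- pnorm(qnorm(alpha) - mu / se)                # H1: mu < 0
pow.two   <- pnorm(qnorm(alpha / 2) - mu / se) +
             1 - pnorm(qnorm(1 - alpha / 2) - mu / se)    # H1: mu != 0

matplot(mu, cbind(pow.right, pow.left, pow.two), type = "l",
        xlab = expression(mu), ylab = "power")
legend("bottomright", c("mu > 0", "mu < 0", "mu != 0"), lty = 1:3, col = 1:3)
```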
Week 6: Larsen and Marx 11.2.4-6, 11.2.14, 11.3.10; Use R to do the following –
also for each model that you fit, plot the residuals and comment on the plots:
Larsen and Marx 11.2.2, Wackerly 11.69, 11.72
Week 7: Larsen and Marx 11.3.24; also use R to do Larsen and Marx 11.3.16: before you do parts a) and b) from the book, first make a scatter plot of the data, as well as a residual plot, to assess the linear model assumption; test the null hypothesis β1 = 0 at the two-sided 0.05 significance level, and give the 95% confidence interval for β1. Write up your solution as a mini-report, including appropriate tables and/or figures as needed, and append your R code at the end.
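The R side of this exercise can be sketched as follows, on simulated data (enter the data from 11.3.16 in place of the x and y below):

```r
# Inference for the slope in simple linear regression, on simulated data
# (placeholder data -- substitute the data set from Larsen and Marx 11.3.16).
set.seed(3)
x <- runif(30, 0, 10)
y <- 1 + 0.8 * x + rnorm(30)
plot(x, y)                       # scatter plot, to eyeball linearity
fit <- lm(y ~ x)
plot(fitted(fit), resid(fit))    # residual plot, to assess the model
summary(fit)$coefficients        # row "x": t-test of beta1 = 0 and its p-value
confint(fit, level = 0.95)       # 95% CIs for the intercept and slope
```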
Week 8: Larsen and Marx 11.4.9; Rice chapter 14 – 11, 16; use R to do
Larsen and Marx 11.4.13, Wackerly 11.74
Week 9: Larsen and Marx 12.2.7, 12.2.8, 12.3.1 (this is the data set from the ANOVA example in class, see the .pdf file; do it both ways: using the table for the studentized range, and using R), 12.3.6, 12.3.7; use R to do:
Larsen and Marx 12.2.1, 12.3.1 (see above)
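The "using R" way for the ANOVA and the Tukey comparisons can be sketched as follows, on made-up data (substitute the 12.3.1 data set):

```r
# One-way ANOVA with Tukey's multiple comparisons, on made-up data
# (placeholder data -- substitute the data set from 12.3.1).
set.seed(4)
grp <- factor(rep(c("A", "B", "C"), each = 10))
y   <- c(rnorm(10, 0), rnorm(10, 0), rnorm(10, 2))   # group C shifted up
fit <- aov(y ~ grp)
summary(fit)    # the ANOVA F-test
TukeyHSD(fit)   # all pairwise comparisons, based on the studentized range
```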
Week 10 (not due): Wackerly 10.97: 1) find the MLE of θ; 2) carry out the likelihood ratio (i.e. goodness-of-fit) test for the Trinomial distribution specified in this problem, using the MLE of θ.
Larsen and Marx 10.5.7: 1) use R to carry out Pearson's chi-squared test; 2) considering it as a comparison of two Binomial distributions, do the two parts (i.e. carry out the tests) of the last problem on the midterm using these data; 3) compare the likelihood ratio test in the first part of 2) with the likelihood ratio test under the Multinomial distribution (here k=4): are they the same or not, and why?
Larsen and Marx 10.3.7
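For parts 1) and 2) of the 10.5.7 exercise, a sketch on made-up counts (substitute the counts from the book); without the continuity correction, the two views give the same Pearson statistic:

```r
# Pearson's chi-squared test on a 2x2 table of made-up counts
# (placeholder counts -- enter the data from Larsen and Marx 10.5.7).
tab <- matrix(c(30, 20, 10, 40), nrow = 2)   # rows = groups, cols = outcomes
chisq.test(tab, correct = FALSE)             # Pearson's test, no continuity correction

# The same comparison viewed as two Binomial distributions:
prop.test(tab[, 1], rowSums(tab), correct = FALSE)
```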
Grading: 35% Homework (drop 1 lowest score) + 25% Midterm + 40% Final