MATH 282B -- Applied Statistics

Announcements
03-08 :: Homework 6 posted. Due Wednesday 03-14.
02-24 :: Homework 5 posted. Due Friday 03-02.
02-13 :: Homework 4 posted. Due Friday 02-24.
02-02 :: Homework 3 posted. Due Wednesday 02-08 in class.
01-25 :: Homework 2 posted. Due Wednesday 02-01 in class.
01-19 :: Homework 1 posted. Due Wednesday 01-25 in class. See the introduction to R notes below.
01-10 :: Extension students and auditors, please send an email to the instructor to be added to the class email list.
01-10 :: Read the whole page. Download R and familiarize yourself with it.

Schedule and class materials
Review from 282A (linear model, least squares, distribution properties)
Confidence intervals and confidence regions (Chap 5)
Introduction to R [code]
Linear regression with R [code] [code]
Straight line regression (Chap 6)
Comparing straight lines (Chap 6) [code]
Polynomial (and spline) regression with R [code]
Polynomial (and spline) regression (Chap 7)
Regression with categorical variables [slides] [code]
Multiple linear regression [slides] [code]
Model selection [notes] [code]

Data
Some of the data used in lecture and homework is in the data folder (password required)
StatSci.org (data sets, tutorials)



Class logistics

Meeting Time: MWF 10-11:15
Meeting Place: AP&M 5829

Instructor:
  Ery Arias-Castro {eariasca@math.ucsd.edu}
  Office Hours: M 11:15-12:15 and 3-4, W 11:15-12:15, in AP&M 5141

Grader:
  Li Pan {lipan@math.ucsd.edu}

Topics: algebraic, geometrical and numerical aspects of linear regression; hypothesis testing and confidence intervals/regions; departures from assumptions, including detection of outliers; model selection, including robust methods; analysis of variance (ANOVA); logistic (binomial) regression; Poisson regression.
For the official description, look here.

Prerequisites: some basic linear algebra, probability and statistics.

Grading: Homework (30%), in-class exam (30%) and take-home exam (40%).
  Homework: Students are encouraged to exchange ideas and ask each other questions about the lecture notes. However, homework must be completed individually.
  In-class exam: Will take place during lecture and will cover multiple linear regression (including diagnostics and polynomial regression) and model selection. More on that later.
  Take-home exam: Will consist of a larger number of problems (the equivalent of two homework sets) and must be completed individually, without input from anyone.

Textbook: We will loosely follow this textbook:
Seber & Lee, 2003
It is not required. Students are encouraged to look at several other textbooks on linear regression. The following are on reserve at the library:
Montgomery, Peck and Vining, 2001
Draper and Smith, 1998
Sengupta & Jammalamadaka, 2003
Weisberg, 2005

Software: We will use the free statistical package R, which is widely used in academia and research institutions. It is an open-source implementation of the S language, on which S-PLUS is also based. For an interface that resembles MATLAB, try RStudio. The following books are specific to R (the first few are available online for free; the others will be put on reserve at the library):
An Introduction to R by W.N. Venables, D.M. Smith and the R Development Core Team
simpleR -- Using R for introductory statistics by John Verzani
R for Beginners by Emmanuel Paradis
Using R for introductory statistics by John Verzani
Introductory Statistics with R by Peter Dalgaard
Software for data analysis: programming with R by John M. Chambers
A first course in statistical programming with R by W. John Braun and Duncan J. Murdoch
The R book by Michael J. Crawley
Data analysis and graphics using R: an example-based approach by John Maindonald and W. John Braun

Other Resources
Jonathan Taylor's STATS 191 and STATS 203 at Stanford.
Larry Wasserman's STAT 707 at CMU.
Bret Larget's Statistics 527 at University of Wisconsin, Madison.
Vincent Zoonekynd's notes.