Spring 2019


**Announcements**

· **Attendance by all is required during the final paper presentations.**
Presentations will be held in week 10, starting on 6/3. Each group will have
about 20 minutes to present a paper, plus 5 minutes for questions. Please
rehearse so that you stay on time and give a polished presentation. You are
welcome to discuss the paper with the instructor if you like.

**Overview:** Survival is often the ‘ultimate’ outcome in many critical
areas of disease research, such as cancer, as well as in the recently emerging
field of medical AI. This course discusses the concepts, theories, and
applications associated with censored and truncated survival data. The topics
include likelihood for right-censored and left-truncated data, nonparametric
estimation of survival distributions, comparison of survival distributions,
proportional hazards regression, semiparametric theory, and, as time permits,
other extended topics on complex survival data such as competing risks.
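As a point of reference for the first topic, one standard way to write the likelihood for right-censored, left-truncated data is sketched below. It assumes independent, non-informative censoring and truncation, and the notation is introduced here only for illustration (it may differ from the lecture notes): $X_i$ is the observed time, $\delta_i$ the event indicator, $V_i < X_i$ the truncation (entry) time, and $f_\theta, S_\theta, \lambda_\theta, \Lambda_\theta$ the density, survival, hazard and cumulative hazard functions.

```latex
L(\theta) = \prod_{i=1}^{n}
  \frac{f_\theta(X_i)^{\delta_i}\, S_\theta(X_i)^{1-\delta_i}}{S_\theta(V_i)}
  = \prod_{i=1}^{n} \lambda_\theta(X_i)^{\delta_i}
    \exp\Big\{-\big[\Lambda_\theta(X_i) - \Lambda_\theta(V_i)\big]\Big\}
```

The division by $S_\theta(V_i)$ conditions on survival past the entry time; setting $V_i = 0$ recovers the usual likelihood for right-censored data alone.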

**Important Note:** You are strongly encouraged to attend
lectures and take notes. You are also strongly encouraged to take advantage of
office hours to discuss any questions or problems you have. **Note that
you can make appointments for office hours!**

**Lecture:** MWF 2:00-2:50pm, AP&M 2402

**Instructor:** Ronghui (Lily) Xu

**Office:** APM 5856

**Phone:** 534-6380

**Email:** rxu@ucsd.edu

**Office Hours:** By appointment.

**Teaching Assistant:** Denise Rava

**Email:** drava@ucsd.edu


**Reference books:**

1. Cox and Oakes, Analysis of Survival Data, Chapman & Hall, 1984

2. Fleming and Harrington, Counting Processes and Survival Analysis, Wiley, 1991

3. O'Quigley, Proportional Hazards Regression, Springer, 2008

4. Kalbfleisch and Prentice, The Statistical Analysis of Failure Time Data, Wiley, 1st or 2nd ed.

Not a reference, but read for fun: Gladwell, “David and Goliath”, which tells the story behind the Freireich (1963) leukemia survival data that D.R. Cox used and that we also use.

**Topics covered:**

Week 1: Brief review of likelihood methods commonly used in practice; right-censored and left-truncated data; Kaplan-Meier estimate of survival.

Week 2: Log-rank test for comparing two survival distributions; weighted log-rank tests and efficiency; counting processes.

Week 3: Parametric survival distributions; likelihood; Cox proportional hazards regression model – partial likelihood.

Week 4: Predicting survival under the Cox model; time-dependent covariates; martingale theory.

Week 5: Profile likelihood; stratified Cox model; goodness-of-fit methods.

Week 6: Case study; model selection: stepwise selection, explained variation, information criteria, penalized log-likelihood.

Week 7: Design of a survival study; other survival models; additive hazards model.

Week 8: Competing risks; multivariate survival; robust estimation.

Week 9: Semiparametric efficiency.
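To make the Week 2 topic concrete, here is a minimal Python sketch of the unweighted two-sample log-rank test, using the hypergeometric variance at each distinct event time. The function name `logrank_test` and its argument layout are illustrative choices, not course code; a weighted test (e.g., the Fleming-Harrington class) would multiply each event-time term by a weight.

```python
import math


def logrank_test(times, events, groups):
    """Unweighted two-sample log-rank test.

    times  : observed times X_i = min(T_i, C_i)
    events : 1 if the failure was observed, 0 if censored
    groups : group label, 0 or 1, for each subject
    Returns the chi-square statistic (1 df) and its p-value.
    """
    data = list(zip(times, events, groups))
    event_times = sorted({x for x, d, _ in data if d == 1})
    o_minus_e = 0.0  # observed minus expected failures in group 1
    variance = 0.0   # sum of hypergeometric variances
    for t in event_times:
        # Risk set at t: everyone with X_i >= t.
        n = sum(1 for x, _, _ in data if x >= t)
        n1 = sum(1 for x, _, g in data if x >= t and g == 1)
        # Failures at t, overall and in group 1.
        d = sum(1 for x, e, _ in data if x == t and e == 1)
        d1 = sum(1 for x, e, g in data if x == t and e == 1 and g == 1)
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("no informative event times; statistic undefined")
    chi2 = o_minus_e ** 2 / variance
    # P(chi-square_1 > chi2) = erfc(sqrt(chi2 / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

For example, `logrank_test([1, 2, 3, 4], [1, 1, 1, 1], [0, 1, 0, 1])` gives a statistic of 8/13 (about 0.615) on these four toy observations. The risk sets are recomputed by brute force at each event time, which keeps the sketch close to the formula at the cost of O(n × number of event times) work.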

**Reference papers:**

1. [Introduction] Efron B, Hinkley DV. Assessing the accuracy of the maximum likelihood estimator: observed versus expected Fisher information. Biometrika, 1978; 65: 457-487.

2. Tsiatis AA. A nonidentifiability aspect of the problem of competing risks. Proceedings of the National Academy of Sciences USA, 1975; 72: 20-22.

3. Cox DR. Some sampling problems in technology. In: New Developments in Survey Sampling, ed. Johnson and Smith. Wiley, 1969.

4. Vardi Y. Multiplicative censoring, renewal processes, deconvolution and decreasing density: Nonparametric estimation. Biometrika, 1989; 76: 751-61.

5. Tsai, Jewell and Wang, A note on the product-limit estimator under right censoring and left truncation. Biometrika, 1987; 74: 883-6.

6. Wang M-C. Nonparametric estimation of cross-sectional survival data. JASA, 1991; 86: 130-143.

7. Wang M-C. A semiparametric model for randomly truncated data. JASA, 1989; 84: 742-748.

8. Struthers and Farewell. A mixture model for time to AIDS data with left truncation and an uncertain origin. Biometrika, 1989; 76: 814-7.

9. Asgharian M, M’Lan CE, Wolfson DB. Length-biased sampling with right censoring: an unconditional approach. J Amer Stat Assoc (JASA), 2002; 97: 201-209.

10. Harrington DP, Fleming TR. A class of rank test procedures for censored survival data. Biometrika, 1982; 69(3): 553-566.

11. Reid N. A conversation with Sir David Cox. Statistical Science, 1994; 9: pp. 449-450 (about the Cox model).

12. Thomsen and Keiding. A note on the calculation of expected survival. Statistics in Medicine, 1991; vol. 10, p. 733-738.

13. Xu R and O’Quigley J. Proportional hazards estimate of the conditional survival function. Journal of the Royal Statistical Society, Series B, 2000; vol.62, p. 667-680.

14. Xu R, Luo Y, Chambers, CD. Assessing the effect of vaccine on spontaneous abortion using time-dependent covariates Cox models. Pharmacoepidemiology and Drug Safety, 2012; 21(8): 844-50; doi: 10.1002/pds.3301.

15. O’Quigley J and Pessione F. The problem of a covariate-time qualitative interaction in a survival study. Biometrics, 1991; 47: 101-115.

16. Xu R, Adak S. Survival analysis with time-varying regression effects using a tree-based approach. Biometrics, 2002; 58: 305-315.

17. Gill R. Understanding Cox’s regression model: a martingale approach. J Amer Stat Assoc (JASA). 1984; 79: 441-447.

18. Andersen PK and Gill RD. Cox’s regression model for counting processes: a large sample theory. The Annals of Statistics, 1982; 10: 1100-1120.

19. Lin et al. Checking the Cox model with cumulative sums of martingale-based residuals. Biometrika, 1993; vol. 80, p. 557-572.

20. Xu R, O’Quigley J. Estimating average regression effect under non-proportional hazards. Biostatistics, 2000; 1: 423-439.

21. Xu R, Harrington DP. A semiparametric estimate of treatment effects with censored data. Biometrics, 2001; 57: 875-885.

22. Loftus JR and Taylor JE. A significance test for forward stepwise model selection. http://arxiv.org/pdf/1405.3920.pdf

23. Akaike H (1973). Information theory and an extension of the maximum likelihood principle. In: Breakthroughs in Statistics, 1992, vol. 1, pp. 610-624. Springer, New York.

24. Xu, Vaida and Harrington. Using profile likelihood for semiparametric model selection with application to proportional hazards mixed models. Statistica Sinica, 2009; 19: 819-842.

25. Volinsky, CT and Raftery, AE. Bayesian information criterion for censored survival models. Biometrics, 2000; 56: 256-262.

26. Harezlak et al. Variable selection in regression – estimation, prediction, sparsity, inference. In Li and Xu (ed) ‘High-Dimensional Data Analysis in Cancer Research’. Springer, 2009. (available via elink)

27. Tibshirani R. The lasso method for variable selection in the Cox model. Statistics in Medicine, 1997; 16(4): 385-395.

28. Huang J and Harrington D. Penalized partial likelihood regression for right-censored data with bootstrap selection of the penalty parameter. Biometrics, 2002; 58: 781-791.

29. Fan J, Li R. Variable selection for Cox’s proportional hazards model and frailty model. Annals of Statistics, 2002; 30(1): 74-99.

30. Bradic J, Fan J, Jiang J. Regularization for Cox's Proportional Hazards Model with NP-Dimensionality. Annals of Statistics, 2011; 39(6): 3092-3120.

31. Kent J. Information gain and a general measure of correlation. Biometrika, 1983; 70: 163-173.

32. O’Quigley J, Xu R, Stare J. Explained randomness in proportional hazards models. Statistics in Medicine, 2005; 24: 479-489.

33. Xu R, Chambers C. A sample size calculation for spontaneous abortion in observational studies. Reproductive Toxicology, 2011; 32: 490-493.

34. Gray RJ. Flexible methods for analyzing survival data using splines, with application to breast cancer prognosis. JASA, 1992; 87: 942-951.

35. Chan P, Xu R, Chambers C. A study of R-squared measure under the accelerated failure time models. Communications in Statistics – Simulation and Computation, 2018, 47(2): 380-391.

36. Struthers CA, Kalbfleisch JD. Misspecified proportional hazards models. Biometrika, 1986; 73: 363-369.

37. Lagakos SW, Schoenfeld DA. Properties of proportional-hazards score tests under misspecified regression models. Biometrics, 1984; 40: 1037-1048.

38. Chastang C, Byar D, Piantadosi S. A quantitative study of the bias in estimating the treatment effect caused by omitting a balanced covariate in survival model. Statistics in Medicine, 1988; 7: 1243-1255.

39. Murphy SA, van der Vaart AW. On profile likelihood (with discussion). JASA. 2000; 95: 449-485.

40. Maples JJ, Murphy SA, Axinn WG. Two-level proportional hazards models. Biometrics, 2002; 58: 754-763.

41. Newey WK. Semiparametric efficiency bounds. J Applied Econometrics, 1990; 5(2): 99-135.

42. Li X, Xu R. Empirical and kernel estimation of covariate distribution conditional on survival time. Computational Statistics and Data Analysis. 2006; 50(12): 3629-3643.

43. Strandberg E, Lin X, Xu R. Estimation of main effect when covariates have non-proportional hazards. Communications in Statistics – Simulation and Computation, 2014, 43(7): 1760-1770.

44. Prentice RL. On non-parametric maximum likelihood estimation of the bivariate survivor function. Statistics in Medicine, 1999; 18: 2517-2527.

45. Wei LJ, Lin DY, Weissfeld L. Regression analysis of multivariate incomplete failure time data by modeling marginal distributions. JASA, 1989; 84: 1065-1073.

46. Morris CN. Parametric empirical Bayes inference: theory and applications (with discussion). JASA, 1983; 78: 47-65.

47. Vaida F, Xu R. Proportional hazards model with random effects. Statistics in Medicine, 2000; 19: 3309-3324.

48. Gamst A, Donohue M, Xu R. Asymptotic properties and empirical evaluation of the NPMLE in the proportional hazards mixed-effects model. Statistica Sinica, 2009; 19: 997-1011.

49. Louis TA. Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society B, 1982; 44(2): 226-233.

50. Ripatti S, Palmgren J. Estimation of multivariate frailty models using penalized partial likelihood. Biometrics, 2000; 56: 1016-1022.

51. Murphy SA. Consistency in a proportional hazards model incorporating a random effect. Annals of Statistics, 1994; 22(2): 712-731.


**Homework:** You may discuss the problems with others, but please write up
your solutions independently. Write your solutions, answers and results
**in your own words** (in complete sentences, clearly laying out your setup,
background, etc.) in the main part, and append your program code at the end;
everything must be turned in. Any two students turning in exactly the same
solutions may be considered to have plagiarized.

Refer ONLY to the updated lecture notes in TED (Triton Ed) for the assignments.

HW1 (due 4/22 in class):

1. a) Explain where, in the derivation of the Kaplan-Meier estimate, the
assumption that C is independent of T is used;

b) Write the Kaplan-Meier estimate in counting process notation.

2. a) Simulate a sample of size 200 from the standard Exponential(1)
distribution. Then generate C from Uniform(0, c), choosing c so that about 20%
of the data are right-censored. Plot the Kaplan-Meier curve and the 95%
confidence intervals.

b) Now focus on estimating S(0.5) for the above distribution. Repeat the
simulation in part a) 1000 times, and summarize the bias, the standard error
(SE), the standard deviation (SD) of the estimates across the 1000 repeats,
and the coverage probability (CP) of the 95% confidence intervals.

3. Do the 3 exercises on page 30 of the Lecture 3 notes (see the updated
notes in TED).

4. Do the 3 exercises on page 34 of the Lecture 3 notes.
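A minimal, standard-library-only Python sketch of the simulation in problem 2 follows. It assumes the plain (linear) Greenwood interval Ŝ(t) ± 1.96·SE; log or log-log transformed intervals are common alternatives that stay inside [0, 1]. The censoring bound c is found numerically from P(C < T) = (1 - exp(-c))/c for T ~ Exp(1) and C ~ Uniform(0, c), which gives c of about 4.97 for 20% censoring. The function names, the seed, and reporting SE as the average Greenwood SE are assumptions for illustration; plotting the full curve (part a) is left out to keep the block dependency-free.

```python
import math
import random


def km_at(times, events, t0):
    """Kaplan-Meier estimate of S(t0) and its Greenwood standard error."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    s_hat, greenwood_sum = 1.0, 0.0
    i = 0
    while i < len(data) and data[i][0] <= t0:
        t = data[i][0]
        deaths = removed = 0
        # Group observations tied at t; subjects censored at t are
        # still counted in the risk set at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            s_hat *= 1 - deaths / n_at_risk
            if n_at_risk > deaths:
                greenwood_sum += deaths / (n_at_risk * (n_at_risk - deaths))
        n_at_risk -= removed
    return s_hat, s_hat * math.sqrt(greenwood_sum)


def censoring_rate(c):
    """P(C < T) for T ~ Exp(1) and C ~ Uniform(0, c), i.e. (1 - e^{-c}) / c."""
    return (1 - math.exp(-c)) / c


def solve_c(target=0.2, lo=1e-6, hi=100.0):
    """Bisection for c; censoring_rate is decreasing in c."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if censoring_rate(mid) > target:
            lo = mid  # too much censoring: widen Uniform(0, c)
        else:
            hi = mid
    return (lo + hi) / 2


def simulate_once(n, c, rng):
    """One data set: X = min(T, C), delta = 1{T <= C}."""
    times, events = [], []
    for _ in range(n):
        t = rng.expovariate(1.0)
        cens = rng.uniform(0, c)
        times.append(min(t, cens))
        events.append(1 if t <= cens else 0)
    return times, events


def run_simulation(n=200, reps=1000, t0=0.5, seed=2019):
    rng = random.Random(seed)
    c = solve_c(0.2)
    true_s = math.exp(-t0)  # S(t0) under Exp(1)
    estimates, ses, covered = [], [], 0
    for _ in range(reps):
        s_hat, se = km_at(*simulate_once(n, c, rng), t0)
        estimates.append(s_hat)
        ses.append(se)
        # Plain Greenwood 95% interval: s_hat +/- 1.96 * se
        covered += abs(s_hat - true_s) <= 1.96 * se
    mean_est = sum(estimates) / reps
    sd = math.sqrt(sum((e - mean_est) ** 2 for e in estimates) / (reps - 1))
    return {
        "c": c,
        "bias": mean_est - true_s,
        "SE": sum(ses) / reps,  # average Greenwood SE
        "SD": sd,               # empirical SD of the estimates
        "CP": covered / reps,   # empirical coverage probability
    }


if __name__ == "__main__":
    print(run_simulation())
```

Comparing the average SE with the empirical SD is a quick check that the Greenwood formula tracks the actual sampling variability; CP near 0.95 indicates the interval is well calibrated at t = 0.5.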

HW2 (due 5/20 in class):

1. Referring to the Lecture 7 notes, simulate a single data set with n = 100,
where Z takes the values 0 and 1 with probability 0.5 each; use
beta(t) = 1.4 - 8.32t from page 18 and a constant baseline hazard of one. Fit
the two models on page 16 and test the PH assumption.

2. Do the exercise at the bottom of page 20 of the Lecture 7 notes.

3. Download the PBC data from http://lib.stat.cmu.edu/datasets/ and fit a Cox
model with age, bilirubin, protime, albumin and edema. Use the cumulative sums
of martingale-based residuals to check 1) the proportional hazards assumption
and 2) the functional form of each covariate. Compute one of the R-squared
measures that we talked about.
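For problem 1, one way to generate the data is inverse-transform sampling from the cumulative hazard. The sketch below assumes the hazard is λ(t | Z) = exp(beta(t) Z) with baseline hazard 1 and beta(t) = 1.4 - 8.32t, and adds a hypothetical administrative censoring time `tau = 1.0` because the assignment does not specify censoring; check both choices against pages 16 and 18 of the Lecture 7 notes. Under this reading, Λ(t | Z = 1) is bounded by e^{1.4}/8.32 (about 0.49), so without censoring roughly 61% of Z = 1 subjects would never fail; the code returns `math.inf` for them. Fitting the two models and testing PH are not shown.

```python
import math
import random

B0, B1 = 1.4, 8.32  # beta(t) = B0 - B1 * t


def cum_hazard_z1(t):
    """Lambda(t | Z=1) = integral_0^t exp(B0 - B1 * s) ds (baseline hazard 1)."""
    return math.exp(B0) * (1 - math.exp(-B1 * t)) / B1


def draw_time(z, rng):
    """Inverse-transform draw: solve Lambda(T | Z) = E with E ~ Exp(1).

    Returns math.inf when E exceeds sup_t Lambda(t | Z=1) = exp(B0) / B1,
    i.e. the event never occurs under this hazard.
    """
    e = rng.expovariate(1.0)
    if z == 0:
        return e  # Lambda(t | Z=0) = t, so T = E
    x = B1 * e * math.exp(-B0)
    if x >= 1:
        return math.inf
    return -math.log(1 - x) / B1


def simulate(n=100, tau=1.0, seed=2019):
    """Rows of (observed time, event indicator, Z); tau is a hypothetical
    administrative censoring time."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0
        t = draw_time(z, rng)
        rows.append((min(t, tau), int(t <= tau), z))
    return rows


if __name__ == "__main__":
    for row in simulate()[:5]:
        print(row)
```

The inverse-transform step works because P(T > t) = P(E > Λ(t | Z)) = exp(-Λ(t | Z)) for E ~ Exp(1), which is exactly the survival function implied by the assumed hazard.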

**Papers for final presentation:**

#15 [6/3], #20 [6/3], #21 [6/5], #27 [6/5], #28 [6/7], #31 [6/7].

**Grading:** 70% Homework + 30% Final presentation/project