Friday, April 5th, 5:30 PM to 7:00 PM
Warren Lecture Hall Auditorium (room 2001), UCSD
TERRY SPEED (Walter and Eliza Hall Institute of Medical Research and UC Berkeley)
Epigenetics: new challenges for probability and statistics
For over 100 years, genetics has inspired a lot of exciting probability and statistics, both applications and theory. Epigenetics is a more recent development. What is it, and what challenges to probability and statistics does it present? Apart from a few exceptions, the DNA sequence of an organism, its genome, is the same no matter which cell you consider. If we view the genome as a universal code for an organism, then how do we obtain cellular specificity? That is, why are blood, nerve, skin, muscle and other cells what they are, and so different from one another, when they all have the same genome? The same question could be asked of honey bee queens and workers, as they can have identical DNA. The answer seems to lie in epigenetics, where the Greek prefix epi denotes above or on top of; that is, epigenetics sits on top of genetics. If we think of the genome sequence as the text, some people have likened the epigenome to the punctuation: the epigenetic marks on DNA help decide how the DNA text is read. Epigenetics controls the spatial and temporal expression of genes, and is also associated with disease states. It involves no change in the underlying DNA sequence, and epigenetic marks are typically preserved during cell division. Epigenetics has been studied in a low-throughput way for over 30 years, using a wide variety of tools and techniques from molecular biology, including DNA sequencing and mass spectrometry. With the advent of microarrays 15 years ago, these platforms began to be used to give high-throughput information on epigenetics. Methylation microarrays are now very widely used. In the last 5 years, second- (also called next-) generation DNA sequencing has been used to study epigenetics, in particular using bisulphite-treated DNA or chromatin immunoprecipitation (ChIP) assays, each followed by massively parallel DNA sequencing. There are now large national and international consortia compiling DNA sequence data relevant to epigenetics.
Whereas there is a single reference human genome, there will be literally hundreds of reference epigenomes, and their analysis will occupy biologists, bioinformaticians and biostatisticians for some time to come. This talk will introduce the topic, outline the data becoming available, summarize some of the progress made so far, and point to some probability and statistical challenges.