For over 100 years, genetics has inspired a lot of exciting probability and statistics, both applications and theory. Epigenetics is a more recent development. What is it, and what challenges to probability and statistics does it present? Apart from a few exceptions, the DNA sequence of an organism, its genome, is the same no matter which cell you consider. If we view the genome as a universal code for an organism, then how do we obtain cellular specificity? That is, why are blood, nerve, skin, muscle and other cells what they are, and so different from one another, when they all have the same genome? The same question could be asked of honey bee queens and workers, as they can have identical DNA. The answer seems to lie in epigenetics, where the Greek prefix epi denotes above or on top of; that is, epigenetics is on top of genetics. If we think of the genome sequence as the text, some people have likened the epigenome to the punctuation: the epigenetic marks on DNA help decide how the DNA text is read. Epigenetics controls the spatial and temporal expression of genes, and is also associated with disease states. It involves no change in the underlying DNA sequence, and epigenetic marks are typically preserved during cell division. Epigenetics has been studied in a low-throughput way for over 30 years, using a wide variety of tools and techniques from molecular biology, including DNA sequencing and mass spectrometry. With the advent of microarrays 15 years ago, these platforms began to be used to give high-throughput information on epigenetics. Methylation microarrays are now very widely used. In the last 5 years, second- (also called next-) generation DNA sequencing has been used to study epigenetics, in particular using bisulphite-treated DNA or chromatin immunoprecipitation (ChIP) assays, each followed by massively parallel DNA sequencing. There are now large national and international consortia compiling DNA sequence data relevant to epigenetics.
Whereas there is a single reference human genome, there will be literally hundreds of reference epigenomes, and their analysis will occupy biologists, bioinformaticians and biostatisticians for some time to come. This talk will introduce the topic, outline the data becoming available, summarize some of the progress made so far, and point to some probability and statistical challenges.


Terry Speed is head of the Bioinformatics Division of the Walter and Eliza Hall Institute of Medical Research (WEHI) in Melbourne, Australia, and Professor (Emeritus) of Statistics at the University of California, Berkeley. Professor Speed's research interests lie in the application of statistics to genetics and genomics, and to related fields such as proteomics, metabolomics and epigenomics. Together with his students and colleagues, he has developed methods of analysis now in daily use in research laboratories worldwide, underpinning many of the recent advances in medical research. This work has helped to identify areas of the human genome that contribute to cancer and genes that are vital for embryonic development, and to pinpoint malaria proteins responsible for initiating infection in human red blood cells. He is a Fellow and former President of the Institute of Mathematical Statistics and a Fellow of the Australian Academy of Science. His awards include the Pitman Medal and the Australian NHMRC Achievement Award for Excellence in Health and Medical Research. He was recently recognized for his pioneering work with the award of Australia's Victoria Prize for Science and Innovation in the Life Sciences.