The Epigenome: A New View Into the Book of Life
There is growing recognition that epigenetics may be just as important as genetics in human health and disease.
In the early 19th century, Jean-Baptiste Lamarck explained evolution as the inheritance of acquired traits; he believed that changes due to behaviors and exposures in one generation could be passed to subsequent generations. The theory has long since been dismissed. Our actions rarely alter the genetic code of our germline, modern genetics assures us, so our children cannot inherit the consequences.
Surprisingly, however, there may be some truth to Lamarckian inheritance after all. It turns out that our behaviors and exposures can modify our epigenome—causing heritable changes in gene expression without altering the nucleotide sequence. These changes (such as DNA methylation) can be passed down to our offspring, with profound consequences. The phenomenon is well documented in mice, and recent human studies suggest that our food choices and smoking habits may actually affect our kids’ and grandkids’ risks for diabetes, obesity, and early death.
This is just one of the many potential paradigm shifts arising out of the burgeoning field of epigenetics. Though epigenetics has long been recognized as important—we’ve known for decades that it is involved in development, cell differentiation, imprinting, and X-chromosome inactivation—it was seen as a side-show to the main attraction, genetics. That view is rapidly changing, however, as there is growing recognition that epigenetics may be just as important as genetics in human health and disease.
In 2008, the NIH launched a $190-million Roadmap Epigenomics Program, which has established centers to map the human epigenome and funded technology development and disease-related projects in epigenetics. In one of the early successes of this initiative, the first complete map of a human epigenome (detailing DNA methylation for two human cell lines) was published in Nature last November. Even TIME magazine hailed the breakthrough as the number two scientific discovery of 2009. In February of this year, scientists also announced the International Human Epigenome Consortium, a joint effort between the NIH and the European Commission to map 1000 reference epigenomes within a decade.
Studying the epigenome is orders of magnitude more difficult than studying the genome. Organisms have a single genome but hundreds of epigenomes that vary by cell type and developmental stage, and whereas the genome comprises just four nucleotides, the epigenome has many diverse features, including DNA methylation and numerous modifications to the proteins that pack DNA into chromatin. The technologies for epigenome-wide studies are just coming online, and they present formidable challenges for computational biologists and bioinformaticians, who must figure out how to process and integrate the enormous amounts of data, as well as correlate them with exposures and diseases.
“The computational epigenetics field is not very developed,” says Christoph Bock, PhD, a research scholar at the Broad Institute and Harvard Stem Cell Institute. “But this is going to change over the next few years.”
Though the field of epigenetics is still in its infancy, the potential payoffs are enormous. Epigenetics has been implicated in cancer, aging, diabetes, mental illness, autism, and Alzheimer’s disease. Because the epigenome is more readily changed than the genome, epigenetics could potentially revolutionize how we prevent, diagnose, and treat disease. Already, several epigenetic drugs are being used to treat cancer.
“The good news is, in terms of future clinical potential, the epigenome is reversible. So, if there’s a state that you can alter by chemical means—the methylation profile, for example—you can potentially reverse an epigenetic effect,” says Joseph R. Ecker, PhD, professor in the Genomic Analysis Laboratory at the Salk Institute.
Developing the Epigenetic Toolkit
The best-studied epigenetic feature is DNA methylation: methyl groups (–CH3) are added to cytosine bases, generally where a cytosine is followed by a guanine (a CG dinucleotide), so that methyl-Cs appear symmetrically on both DNA strands. Methylation is preserved during mitosis and meiosis, and it serves to silence genes (by blocking transcription factors or recruiting proteins that compact chromatin). Other epigenetic features include biochemical modifications of the histone proteins that wrap DNA into chromatin. For example, adding acetyl or methyl groups to certain lysine residues (e.g., H3K4) in the tails of the histones makes DNA spool more loosely, turning genes on, whereas adding methyl groups to other lysine residues (e.g., H3K9 or H3K27) makes DNA spool more tightly, shutting genes off. These changes are preserved during cell division, though the mechanisms are not well understood. Non-coding RNAs (RNAs that are transcribed but not made into proteins) also play a role in the epigenome, helping to guide and set up the other epigenetic marks or keeping chromatin open by mere transcriptional activity, says Michael Zhang, PhD, professor of computational biology and bioinformatics at Cold Spring Harbor Laboratory (and in the process of moving to the University of Texas at Dallas to set up a new Center for Systems Biology), who is part of the Roadmap initiative to map reference human epigenomes.
The gold standard for detecting methylation is to treat DNA with bisulfite prior to sequencing. Bisulfite converts unmethylated cytosines to uracils, which are read as thymines after amplification and sequencing; methylated cytosines are protected, so any C that survives represents a methyl-C. The gold standard for histone marks is chromatin immunoprecipitation, or ChIP: DNA is crosslinked to histone proteins and then exposed to antibodies that recognize specific modifications (e.g., acetylation of lysine 5), followed by microarray analysis (ChIP-on-chip) or direct sequencing (ChIP-Seq). Most epigenome-wide studies to date have been done on arrays, but researchers are increasingly turning to next-generation sequencing (epigenome-wide bisulfite sequencing, ChIP-Seq, and RNA-Seq) instead. Sequencing remains cost-prohibitive for large epidemiology studies on human tissue, but even this will change in the next few years.
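The logic of bisulfite-based methylation calling fits in a few lines of Python. This is a toy model, assuming reads are already aligned to the reference, only one strand is considered, and conversion is complete and error-free; the function names are hypothetical and not taken from any particular tool.

```python
# Toy model of bisulfite conversion and methylation calling (illustrative only).
# Assumptions: reads are pre-aligned, one strand only, no sequencing errors,
# and every unmethylated C is converted.

def bisulfite_convert(seq: str, methylated: set[int]) -> str:
    """Return how seq reads after bisulfite treatment and PCR.

    Unmethylated C -> U, which is read as T; methylated C is protected.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def call_methylation(reference: str, read: str) -> dict[int, bool]:
    """For each C in the reference, report whether it survived conversion."""
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference, read)):
        if ref_base == "C":
            # A surviving C implies a methylated site; a T implies unmethylated.
            calls[i] = read_base == "C"
    return calls


reference = "ACGTTCGACCA"
read = bisulfite_convert(reference, methylated={1, 5})
print(read)                               # ACGTTCGATTA
print(call_methylation(reference, read))  # {1: True, 5: True, 8: False, 9: False}
```

Real pipelines must also handle the opposite strand, incomplete conversion, and the fact that a T in a read could be either a converted C or a genuine T, which complicates alignment.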
“The field of epigenetics is moving at an incredible pace, almost exclusively driven by technology development in the sequencing field,” says Bock, who tackles computational epigenetics at the Broad Institute, one of the NIH Roadmap’s epigenome sequencing centers. “The second generation sequencers are absolutely key for everything we do; and the new machines that are going to become available this year will again change everything we are doing.”
“The epigenome-wide methods are advancing so rapidly that if we wait a year, we’ll be able to get two to four times as many marks at half the cost,” agrees David A. Bennett, MD, professor of neurological sciences at Rush University and director of the Rush Alzheimer’s Disease Center, who is PI on a Roadmap grant to study the epigenetics of cognitive decline and dementia.
One of the goals of the Roadmap initiative is to help standardize epigenome technology, particularly approaches for processing and analyzing the data.
“At the first Roadmap meeting, it was kind of an eye-opener for me to see how early in the process everyone is—even with cell lines, where they’re doing this to known cell types, let alone to a chunk of brain tissue with different cells in it,” Bennett says. “It’s not just the math and the hardware and the software; there are some big conceptual issues about how to approach some of these datasets.”
Bock agrees. “You can just buy the latest Illumina sequencer and download protocols for ChIP-Seq, and experimentally you’re fine,” Bock says. “But where you’re not fine is how you’re going to analyze the data.”
More and more, he says, that’s what his sequencing center needs to provide as a service—the ability to quickly make sense of the data. “Over the last year, the focus was so much driven by just surviving the wave of data that was coming down on us that a lot of the work was algorithmically relatively basic. So there were no complex models involved, but everything had to be ultra-high speed and highly optimized code so we could process these huge amounts of data.”
But primary processing of the data is just the first bioinformatics challenge. Researchers must also tackle the higher-level issues: how to integrate the different epigenetic marks with each other and with genome and gene expression data; how to identify and interpret cross-talk between different epigenetic marks; how to make predictions about biological function; and how to compare samples, such as from diseased cases and healthy controls.
“Comparison of epigenomes is not yet a defined problem. We are defining it and implementing solutions as we go,” says Aleksandar Milosavljevic, PhD, associate professor of molecular and human genetics and a PI of the NIH Roadmap’s Data Analysis and Coordination Center at Baylor College of Medicine.
Mapping the Human Epigenome
The Roadmap initiative established four Reference Epigenome Mapping Centers, which are covering different aspects of the epigenome (different assays, epigenetic marks, or cell types) with the aim of filling in a matrix of targets. “We are prioritizing the filling of that matrix, so that meaningful analysis can be done at various stages,” Milosavljevic says. The first Human Epigenome Atlas data freeze occurred on April 1st.
The mapping centers send their data to the coordinating center (at Baylor), which has developed pipelines for processing and merging the data. “We define and facilitate data flow, data analysis, integrative analysis, and quality control and coordination with all participants,” Milosavljevic says. The standards and computational tools that they’ve developed will also serve as resources for the larger epigenetic community.
One of the first fruits of the Roadmap initiative has been the complete sequencing of a human epigenome at base-level resolution, published in the November 2009 issue of Nature. This was a collaborative effort involving some members of the sequencing consortia, led by Joseph Ecker of the Salk Institute. They used bisulfite treatment combined with next-generation sequencing to map the methylation profiles (the “methylome”) of a well-known embryonic stem cell line and a differentiated cell line. “Saying you’ve mapped the human epigenome is correct, but it’s not all of the human epigenome—it’s just two cell types,” Ecker notes. “But it’s a start.”
Second-generation sequencing was key, Ecker says. “That’s what allowed us to do 30-times coverage of the genome for two different genomes in a fairly short time.”
Every step posed major computational challenges, Ecker says. The sequencers produced terabytes’ worth of data, and just figuring out how to move these data off the machines was initially an issue. They also had to develop new approaches to interpreting the data. For example, they developed an algorithm called “Hammer,” which finds methylation sites with a low false discovery rate. “This is the informatics guys’ joke because we’re measuring methyl-C—also called MC—so it’s MC Hammer,” Ecker quips.
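The article does not describe how Hammer works internally. As a generic, hypothetical illustration of calling methylated sites while controlling the false discovery rate, the sketch below tests each cytosine’s read counts against an assumed background rate of unmethylated-but-unconverted reads (1 percent here, an arbitrary choice) with a binomial test, then applies the standard Benjamini-Hochberg procedure. It should not be read as Hammer’s algorithm.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def benjamini_hochberg(pvalues: list[float], q: float) -> list[bool]:
    """Flag hypotheses rejected at false discovery rate q (Benjamini-Hochberg)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k (1-based) with p_(k) <= k * q / m.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            cutoff = rank
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


def call_methylated_sites(counts: list[tuple[int, int]],
                          error_rate: float = 0.01, q: float = 0.05) -> list[bool]:
    """counts holds (reads showing C, total reads) for each cytosine.

    Null hypothesis (an assumption for this sketch): the site is unmethylated,
    and any C reads arise from failed conversion or errors at error_rate.
    """
    pvals = [binomial_upper_tail(k, n, error_rate) for k, n in counts]
    return benjamini_hochberg(pvals, q)


sites = [(9, 10), (0, 12), (1, 8), (15, 30), (2, 5)]
print(call_methylated_sites(sites))  # [True, False, False, True, True]
```

Controlling the false discovery rate, rather than testing each site at a fixed cutoff, matters here because a methylome involves tens of millions of cytosines, so even rare chance calls add up.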
Previous attempts to look at methylation across the human genome have been array-based. But arrays only look for methylation in certain places. For example, many arrays only query “CpG islands,” CG-rich areas of the genome that are typically found in gene promoters. In contrast, whole-methylome sequencing reveals methylation in all its contexts.
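What counts as a CpG island depends on the definition used. A minimal sketch, assuming the classic Gardiner-Garden and Frommer criteria (at least 200 bp, GC content above 50 percent, and an observed-to-expected CpG ratio above 0.6), checks a single window of sequence; arrays and annotation databases may use other thresholds, and the function names are hypothetical.

```python
def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for a sequence window."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count if Cs and Gs were placed independently is c * g / n.
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq: str, min_length: int = 200,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Apply Gardiner-Garden and Frommer-style thresholds to one window."""
    if len(seq) < min_length:
        return False
    gc_fraction, obs_exp = cpg_stats(seq)
    return gc_fraction > min_gc and obs_exp > min_obs_exp


print(looks_like_cpg_island("CG" * 100))    # True: GC-rich and CpG-dense
print(looks_like_cpg_island("CCAGG" * 40))  # False: GC-rich but no CpGs
print(looks_like_cpg_island("AT" * 100))    # False: AT-rich
```

The observed-to-expected ratio is what separates islands from merely GC-rich DNA: most of the genome is depleted of CpGs, so a window where they occur at close to the expected frequency stands out.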
“We didn’t have any expectations of what the epigenome was going to look like in terms of its methylation content. So we looked broadly and without bias, and we saw things that were completely unexpected,” Ecker says.
Surprisingly, 25 percent of the methylation in the embryonic stem cells was in a non-CG context (i.e., C next to A or C next to T). It is “a complete mystery” how non-CG methylation is maintained during DNA replication, Ecker says. When the DNA unwinds, the complementary portion of the DNA strand contains no Cs (and, therefore, no methyl-Cs), so there’s nothing for the methylation machinery to copy on that strand. They also identified another novel type of methylation: differentiated cells, but not stem cells, contain long stretches of half-methylated DNA (which they called “partially methylated domains”). The significance of these domains is unknown, but other groups have now identified them in cancer cells, Ecker says.
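Methylation contexts are commonly labeled CG, CHG, or CHH, where H stands for A, C, or T. The hypothetical helpers below classify a cytosine’s context on one strand and show why CG methylation is straightforward to copy: the reverse complement of CG is CG, so the partner strand carries its own C, whereas a CA or CT dinucleotide pairs with TG or AG, which contain no C.

```python
# Hypothetical helpers for illustration; H stands for A, C, or T.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand's sequence, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def cytosine_context(seq: str, i: int) -> str:
    """Classify the cytosine at position i of seq as CG, CHG, or CHH."""
    if seq[i] != "C":
        raise ValueError(f"position {i} is not a cytosine")
    nxt = seq[i + 1:i + 2]
    if nxt == "G":
        return "CG"
    after = seq[i + 2:i + 3]
    if not nxt or not after:
        return "unknown"  # too close to the end of the sequence to classify
    return "CHG" if after == "G" else "CHH"


seq = "CGACAGCTT"
contexts = {i: cytosine_context(seq, i) for i, base in enumerate(seq) if base == "C"}
print(contexts)  # {0: 'CG', 3: 'CHG', 6: 'CHH'}

# CG is its own reverse complement, so the opposite strand also has a C
# that maintenance machinery can methylate. CA and CT pair with TG and AG.
for dinucleotide in ("CG", "CA", "CT"):
    print(dinucleotide, "->", reverse_complement(dinucleotide))
```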
“I think we learned from this that we should probably have less bias about what we’re going to find, otherwise we won’t find it,” Ecker says.
Connecting Nurture with Nature
Part of the excitement surrounding the epigenome is that it is far more responsive to the environment than the genome. In fact, epigenetics blurs the lines of the nature versus nurture debate, as it turns out that our environments can extensively impact our inherent biology. Monozygotic twins start life with highly similar epigenomes, but their epigenomes diverge as they age, particularly if they’ve had different environmental exposures.
“Epigenetics gives you a way to bridge the gene versus environment question. It is genetic in a way, because you can measure it just like DNA, but it is also influenced by the environment,” Bock says. “So you can’t really give people a machine to carry with them that over a lifetime measures all environmental exposures. But the epigenome might provide such a machine, because it responds to all kinds of external influences.” Epigenetic changes have been related to aging, smoking, diet, alcohol, asbestos, arsenic, inflammation, heavy metals, ultraviolet radiation, infection, toxins, stress, and psychological abuse.
“I think the sexiness of epigenetics is that there’s this potential connection between exposures and the biological consequences of them. You can even measure the biological consequences of social impacts,” says Margaret Daniele Fallin, PhD, associate professor of epidemiology at the Johns Hopkins University Bloomberg School of Public Health, who is co-PI on a Roadmap grant to study the epigenetics of autism.
The reference epigenomes will be incredibly valuable, but they won’t tell us much about how epigenomes vary in healthy populations and in response to the environment, says Karl Kelsey, MD, professor of community health at Brown University. Conducting such epidemiological studies is a much harder task because you have to study many people and many different tissues, some of which may not be easily accessible (unlike for genetic studies, where DNA can be acquired from any accessible cells).
“Every tissue in your body has a different epigenome. So if you’re looking for variability, you’ve got to look at every tissue, and suddenly the problem becomes more complicated,” Kelsey says. Most studies to date have used array-based technologies, because it’s still cost-prohibitive to sequence so many samples.
For example, in a 2009 paper in PLoS Genetics, Kelsey and his colleagues used Illumina GoldenGate arrays—which probe methylation in 1500 targeted CpG sites from hundreds of genes—to characterize 217 normal human tissue samples from 10 anatomical sites. One of the computational challenges is that individual methylation events cannot be assumed to be independent, so statistical methods need to account for this correlation across methylation sites. “I think this is a poorly understood and poorly recognized problem, for methylation data certainly,” Kelsey says. They used a particular recursive partitioning algorithm (developed by E. Andrés Houseman, ScD, assistant professor of community health at Brown University) that uses mixture models to deal with this problem. “It’s a very interesting solution,” Kelsey says.
They showed that methylation increases with age within CpG islands, but decreases with age at other loci. In previous work, they showed that smoking methylates the promoter region of the p16 tumor suppressor gene in lung tissue. These same methylation changes have also been linked to cancer.
Diagnosing and Treating Cancer
Epigenetics has been a hot topic among cancer researchers for more than a decade—much longer than for other diseases. Epigenetic changes have been identified in almost all cancers. Indeed, according to Kelsey, “Epigenetics is an equal partner to genetic change in creating cancer.” The epigenome can silence tumor suppressor genes or wake up oncogenes or imprinted genes.
Epigenetics holds promise for early detection, as many of the changes appear to occur early in the progression to cancer. “In many cases, before tumor suppressor genes get deleted, their promoters may first get shut off by DNA methylation,” Zhang says. Cancer cells often slip into the blood, so it may be possible to pick up epigenetic signatures from a simple blood test. “So, people have done a lot with looking at changes in blood, urine, and sputum,” Kelsey says. “And I think that’s a real possibility. To date, most of the assays haven’t really borne a lot of fruit. But we’re just really starting to apply them.”
Beyond diagnosis, epigenetic markers may be able to predict the development of cancer even before it appears. For example, about 1 in 200 patients with Barrett’s esophagus, a premalignant condition, will go on to develop esophageal cancer each year. Currently, there is no way to predict who will get cancer, so every patient must undergo repeated endoscopies, which entail high costs, inconvenience, and anxiety, says Stephen J. Meltzer, MD, the Hendrix/Myerberg Professor of Medicine and Oncology at Johns Hopkins University.
His team has identified a panel of eight epigenetic markers (hypermethylated tumor suppressor genes) that can correctly distinguish progressors from nonprogressors about 75 percent of the time. “When we developed this assay, we didn’t have the genome-wide or the epigenome-wide tools that are now available. So some of the markers we originally chose to study in-depth may not be the best or the only ones we end up with,” says Meltzer, who has a Roadmap grant to search the epigenome for novel markers.
Epigenetic changes may be reversible, which makes them a prime target for treatment and prevention. Already, four epigenetic drugs that re-activate silenced genes (presumably tumor suppressor genes) have been approved for treating blood cancers: two that demethylate DNA and two that maintain histone acetylation (histone deacetylase inhibitors). These drugs lack specificity, and in theory could inadvertently turn on oncogenes. But, so far, they appear to do more good than harm. Researchers are hopeful that new agents and combinations of agents will be even more effective and work on solid tumors.
Epigenetic cancer research to date has focused on the most obvious targets—methylation of tumor suppressor genes or CpG islands. But the most relevant epigenetic events may be happening outside of these contexts, as demonstrated by research at Johns Hopkins University. “Andy [Feinberg] had an intuition that that wasn’t the place to be looking. I’m not quite sure why, but it turned out to be right. He wanted an array that didn’t bias toward genes or islands. We spent a lot of time developing that array,” says Rafael Irizarry, PhD, professor of biostatistics at the Bloomberg School of Public Health, who collaborated on the array—called CHARM (comprehensive high-throughput arrays for relative methylation)—with Andrew Feinberg, MD, professor of molecular medicine, oncology, and molecular biology & genetics at Johns Hopkins University School of Medicine. The array probes regions of high CG content regardless of whether they are CpG islands.
Figuring out how to analyze the data from the arrays has been a challenge, Irizarry says. Just as with gene expression arrays, there is a multiplicity problem—many signals will arise simply by chance rather than as the result of true differences between cases and controls; and distinguishing these is tricky. “It’s the same exact problem, except it’s harder,” Irizarry says. Gene expression arrays focus on predefined units, genes, whereas methylation arrays focus on open-ended regions; so there is uncertainty in defining the regions of interest as well as interpreting the intensity of their signals. “Basically, what it comes down to is there are really two dimensions. It’s not just the level of expression; it’s the size of the region and the height,” Irizarry says. They are still fine-tuning their algorithms, he says.
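To make the “size and height” idea concrete, here is a simplified, hypothetical region finder; it is not the CHARM team’s algorithm. It smooths per-probe case-minus-control methylation differences with a running mean, then reports each run of consecutive probes whose smoothed difference stays beyond a threshold, along with the run’s extent (size) and its mean difference (height). Deciding whether a region is larger or taller than expected by chance, the multiplicity problem Irizarry describes, is not shown.

```python
def running_mean(values: list[float], half_width: int) -> list[float]:
    """Average each value with up to half_width neighbors on each side."""
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half_width)
        hi = min(len(values), i + half_width + 1)
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


def find_regions(positions: list[int], diffs: list[float], threshold: float,
                 half_width: int = 1) -> list[dict]:
    """Report runs of probes whose smoothed difference stays beyond threshold.

    Each region records its genomic start and end (its size) and the mean
    smoothed difference across its probes (its height). A run ends when the
    smoothed value falls back within the threshold or changes sign.
    """
    smoothed = running_mean(diffs, half_width)
    regions = []
    i = 0
    while i < len(smoothed):
        if abs(smoothed[i]) <= threshold:
            i += 1
            continue
        sign = 1.0 if smoothed[i] > 0 else -1.0
        j = i
        # Extend the run while neighbors stay beyond threshold with the same sign.
        while j + 1 < len(smoothed) and sign * smoothed[j + 1] > threshold:
            j += 1
        run = smoothed[i:j + 1]
        regions.append({
            "start": positions[i],
            "end": positions[j],
            "n_probes": j - i + 1,
            "height": sum(run) / len(run),
        })
        i = j + 1
    return regions


# Toy data: eight evenly spaced probes, with a block of higher methylation in cases.
positions = [100, 150, 200, 250, 300, 350, 400, 450]
diffs = [0.0, 0.0, 0.5, 0.6, 0.55, 0.0, 0.0, 0.0]
for region in find_regions(positions, diffs, threshold=0.2):
    print(region)
```

On the toy data this reports a single region spanning positions 200 to 300. Because the smoothing spreads each signal to its neighbors, the chosen window width directly trades off how large and how tall a region appears, one reason region-level analysis is harder than gene-level analysis.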
In a 2009 paper in Nature Genetics, Irizarry and colleagues used CHARM to compare normal tissue and colon cancer samples. As expected, they found many differences in the methylation profiles. What was surprising is that these differentially methylated regions were rarely in CpG islands; rather, they were in regions adjacent to the islands, which they deemed “shores.” Interestingly, while CpG islands generally become methylated during cancer, the shores (which may represent alternative transcription start sites) were equally likely to become demethylated as methylated. Their findings imply that many cancer studies to date have been looking in the wrong places for key methylation changes. Irizarry’s team is now making the switch from arrays to next-generation sequencing. “That introduces a whole new set of problems,” Irizarry says.
Epigenetic changes beyond methylation also play a key role in cancer, but these haven’t been well studied, says Terumi Kohwi-Shigematsu, PhD, senior scientist in the Life Sciences Division of the Department of Energy's Lawrence Berkeley National Laboratory, who is PI on a Roadmap grant to study the epigenomics of breast cancer. Her work focuses on higher-level changes in chromatin structure that affect gene expression. “Epigenetics is probably governed at these higher architectural levels,” she says.
In a 2008 paper in Nature, Kohwi-Shigematsu’s team reported that the protein SATB1, which helps fold chromatin, is overexpressed in aggressive breast cancer cells and correlates with metastasis and poor survival. SATB1 binds to certain specialized regions of DNA throughout the genome and recruits histone-modifying and chromatin-remodeling enzymes that in turn alter the expression of about 1000 genes, many of which are known players in metastasis. “SATB1 provides the mechanism for assembling all this 3D genome architecture to locally determine epigenomic modifications and thereby regulate a large number of genes,” Kohwi-Shigematsu says. Introducing the protein into nonmetastatic breast cancer cells induces invasive tumors in mice, while depleting it from metastatic cells reverses their tumor-forming behavior. Thus the protein has implications for both prognosis and treatment. Kohwi-Shigematsu’s team will use ChIP-Seq to map SATB1’s binding regions and identify which ones are bound in aggressive breast cancer. Computational epigenetics researchers are still working out the optimal algorithms for analyzing ChIP-Seq data.
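At its simplest, ChIP-Seq analysis is a search for stretches of the genome where more reads pile up than background would predict. The sketch below is a deliberately simplified, hypothetical illustration, assuming read start positions on a single chromosome, fixed-size windows, and a uniform Poisson background estimated from the average count per window; published peak callers are more sophisticated (for example, estimating local background from an input control sample).

```python
from math import exp


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 minus the lower tail.

    Adequate for a sketch; very small p-values lose precision this way.
    """
    term = exp(-lam)  # P(X = 0)
    lower = 0.0
    for j in range(k):
        lower += term            # add P(X = j)
        term *= lam / (j + 1)    # advance to P(X = j + 1)
    return max(0.0, 1.0 - lower)


def window_counts(read_starts: list[int], window_size: int, genome_length: int) -> list[int]:
    """Count read start positions falling in each fixed-size window."""
    n_windows = (genome_length + window_size - 1) // window_size
    counts = [0] * n_windows
    for pos in read_starts:
        counts[pos // window_size] += 1
    return counts


def enriched_windows(read_starts: list[int], window_size: int, genome_length: int,
                     p_cutoff: float = 1e-3) -> list[tuple[int, int, float]]:
    """Return (window_start, count, p_value) for windows enriched over background."""
    counts = window_counts(read_starts, window_size, genome_length)
    lam = len(read_starts) / len(counts)  # assumed uniform background rate
    hits = []
    for w, count in enumerate(counts):
        p = poisson_upper_tail(count, lam)
        if p < p_cutoff:
            hits.append((w * window_size, count, p))
    return hits


# Toy data: one background read per window plus a pile-up of 20 reads in window 5.
background = [w * 100 + 10 for w in range(10)]
pileup = list(range(500, 520))
for start, count, p in enriched_windows(background + pileup, 100, 1000):
    print(start, count, f"{p:.2e}")
```

On the toy data, only the window starting at position 500 is flagged. Choices such as window size, background model, and the multiple-testing correction are exactly the kinds of details that ChIP-Seq algorithms still differ on.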
Understanding the Brain
Methylation plays a key role in brain development; for example, several developmental disorders involve loss of genomic imprinting, and one (Rett syndrome) is caused by mutations in MECP2, a gene encoding a protein that binds methylated DNA. Epigenetics has also been implicated in mental illness, and several psychiatric drugs have known epigenetic effects. For example, valproic acid, used to treat epilepsy and bipolar disorder, is a histone deacetylase inhibitor.
“I think epigenetics is a lot more exciting than the genome, especially for the brain, which is so plastic,” Bennett says.
Several pivotal epigenetic studies have focused on the brain. For example, in a groundbreaking paper in Nature Neuroscience in 2004, scientists from McGill University showed that maternal nurturing directly affects psychological development through an epigenetic mechanism. When infant rats were neglected by their mothers (whether biological or foster mothers), the promoter of the glucocorticoid receptor gene in their brains became methylated. This change persisted into adulthood and caused the rats to be highly stressed. In contrast, baby rats that were extensively groomed and cared for by their mothers had reduced methylation of the gene and became more relaxed adults. Moreover, both effects were reversible in the adults: nurtured rats given a shot of methionine (an amino acid that supplies the methyl groups used in DNA methylation) to the brain became stressed, and neglected rats given a histone deacetylase inhibitor (which indirectly reduces methylation) became calm.
There’s mounting evidence that epigenetics is a key player in autism as well, Fallin says. She received a Roadmap grant to study epigenetics in the EARLI (Early Autism Risk Longitudinal Investigation) cohort. The study recruits participants when they learn they’re pregnant and follows them through the birth of the baby and the first three years of life. “So we get the whole window of potential exposures, and then we get the early development of the child,” Fallin says. Her team will correlate exposures with methylation changes in the mothers’ and babies’ blood cells and then try to link these to autism outcomes.
Currently, they are using arrays to study the methylation profiles, but “just like everyone else, we are thinking about how you get this directly from sequencing,” says Fallin, who works closely with Irizarry.
At the other end of life, epigenetics may play a role in Alzheimer’s disease and dementia. “When I came across the epigenetics literature, investigators were just beginning to conduct preclinical animal studies examining how the brain might be using epigenetic marks as a way of coding long-term memory,” Bennett says. He received a Roadmap grant to look at epigenetics within two long-standing studies of older people: the Rush Memory and Aging Project and the Religious Orders Study. Participants undergo annual cognitive testing, provide information about life experiences, and, when they die, donate their brains to the study (nearly 800 brains so far). Bennett’s team will obtain methylation profiles for brain tissue using next-generation sequencing. They have also received an ARRA stimulus grant to collect data on histone modifications.
“The idea is to link epigenetic changes initially to cognitive phenotype and then to psychological and experiential factors,” Bennett says. “Subsequently, we’ll be able to bring in the genome-wide data, because the effects may vary by genetics, kind of in the background.”
“Putting all these data together is not straightforward,” Bennett says. “In the grant, we’re doing our best to describe our approach. But I think it’s really unclear until the data are there. Certainly, it’s so new for data from human brains.”
Studies in mice show that epigenetic changes can be passed down through multiple generations. Agouti mice carry a mutated gene that gives them a yellow coat and a propensity for diabetes and obesity. But when pregnant Agouti mice are fed extra methionine and folic acid—nutrients involved in DNA methylation—their children turn out brown, lean, and healthy; they still carry the defective gene, but it has been silenced through methylation. When these offspring reproduce, they pass the silenced gene to their children regardless of their diets; thus, the grandmother’s diet determines the grandchildren’s phenotypes.
Heritable epigenetic changes can occur at other times in the life cycle as well, not just during fetal development. In a 2009 paper in the Journal of Neuroscience, researchers reported that when adolescent mice with a genetic defect in memory were exposed to an enriched environment (with novel objects, social interaction, and voluntary exercise), their memories improved. “Our original observation was that an enriched environment can overcome the biochemistry of a genetic defect by opening up a new signaling pathway,” says Larry Feig, PhD, professor of biochemistry and neuroscience at Tufts University School of Medicine, who led the research. Unexpectedly, when the enriched mice reproduced (females only), their children also had improved memories even though they were never exposed to the stimulating environment. This was true even when the children were raised by foster moms with poor memories.
“It was a surprise. It wasn’t an area of research that we were actively working on,” he says. “But when I went back to the literature and looked more carefully, there were growing examples of epigenetic transgenerational inheritance. So it wasn’t as farfetched as my initial thoughts.”
Feig’s team has not yet identified the epigenetic change responsible. “So this is an example of epigenetics by all assumptions, but we don’t have any details yet to pin it down,” he says. They are collaborating with researchers at the Broad Institute on epigenome-wide studies to search for the relevant changes. Feig also received an NIH Challenge Grant to study whether adolescent mice exposed to negative experiences, such as smoking and stress, also undergo heritable epigenetic changes.
Epigenetic inheritance is more difficult to study in people, but recent studies suggest that it does occur. In a series of studies from Europe, researchers collected multi-generational data from an isolated community in Northern Sweden that experienced alternating periods of plenty and famine throughout the 19th century. They showed that the paternal grandsons of men who were exposed to periods of overabundant food before puberty (a key stage for sperm development) were at increased risk of diabetes and early death. The paternal granddaughters of women who were exposed to abundant food in utero or during infancy (key stages for egg development) had increased mortality. The pattern of inheritance suggests that the epigenetic effect may be occurring on sex-linked genes, though the epigenetic mechanism has yet to be definitively proven.
If Lamarckian inheritance turns out to be a real phenomenon in people, this will be both an empowering and daunting shift in how we think about evolution and the destiny of our descendants. As scientists continue to probe the largely unexplored territory of the epigenome, this promises to be just one of many surprises.