Network Biology: Converging on Answers to Complex Diseases
Network biology is allowing scientists to convert their cellular parts lists into insights about complex diseases
To parents, the symptoms of autism can seem to appear from out of the blue during a child's first few years of life. But in recent years, researchers have shown that genes involved in the disorder likely affect neurodevelopment in the fetal brain. Among other implications, the results suggest that autism develops in utero, and is not due to exposures after birth.
To reach these conclusions, researchers needed more than the list of 65 strongly implicated autism-associated genes identified by genetics researchers. They needed an understanding of how the genes wire together into biological pathways that manifest as autistic traits—an understanding that emerged from an approach called network biology.
The story is similar for other complex diseases such as diabetes and Parkinson’s disease: Lists of disease-associated genes are growing rapidly thanks to advances in sequencing technology, and network biologists are manipulating those lists to identify the larger biological pathways that malfunction in disease.
Having the gene list is only the first step, says Trey Ideker, PhD, professor of medicine and bioengineering at the University of California, San Diego. It’s akin to having the parts list for an IKEA piece of furniture without the rest of the assembly manual. “Diseases involve networks of genes; and you have to map those networks if you’re going to understand those diseases,” says Ideker, whose network analysis tool, Cytoscape, has been cited more than 12,000 times. “What network biologists are trying to do is apply systematic approaches to map this wiring diagram.”
Network biologists draw graphs that are essentially webs of biological relationships where the nodes are entities such as genes, proteins, or even patients, and the links between two nodes (called edges) represent specific interactions. For example, in a protein-protein interaction network, two proteins are connected by an edge if they are known to physically interact. By applying statistical and mathematical algorithms to these graphs, scientists are able to gain insights—such as identifying sets of genes that work more closely with each other than with other genes in the network, and thus may participate in the same biological pathway.
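To make the vocabulary concrete, here is a minimal sketch of a network stored as nodes and edges, with the crudest possible way of finding groups of nodes that belong together: collecting everything that is connected by some chain of edges. The protein names and edges are made up for illustration, and real analyses typically use more refined clustering than connected components.

```python
from collections import deque

# Hypothetical edges: each pair means "these two proteins physically interact".
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("D", "E")]

def build_graph(edge_list):
    """Return an undirected graph as {node: set of neighboring nodes}."""
    graph = {}
    for u, v in edge_list:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph

def connected_components(graph):
    """Group nodes that can reach each other through some chain of edges."""
    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        queue, component = deque([start]), set()
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components

graph = build_graph(edges)
modules = connected_components(graph)
print(modules)
```

In this toy example, A, B, and C form one group and D and E form another, which is the kind of grouping that might hint at two separate pathways.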
Network graphs often involve so many criss-crossing edges that they are referred to as hairballs—an indication that they can seem, to the uninitiated, nearly impossible to interpret. Add to this the fact that “interactomes” change in different cell types, tissues, and developmental phases, and you begin to get a picture of just how complicated this field is.
“In terms of network biology, we’re approximately where genomics was in the late 1980s,” Ideker says. Still, there’s been considerable progress in recent years. Scientists have now mapped large-scale—if incomplete—networks for numerous organisms including humans; and network approaches have been at the heart of recent breakthroughs in complex diseases, including autism.
Stunningly, several autism research groups have used network biology to independently arrive at similar conclusions. “What’s exciting is that multiple groups have used multiple different approaches to try and identify convergence among the set of genes that have been implicated in autism, and they’re all coming up with very consistent findings,” says Jeremy Willsey, PhD, assistant professor of psychiatry at the University of California, San Francisco.
Network biology encompasses a wide range of network types, including many based on physical or functional relationships among cellular components (e.g., co-expression networks, genetic interaction networks, metabolic networks, protein-protein interaction networks, protein-DNA interaction networks, and protein-RNA interaction networks) as well as others based on similarity among patients or diseases. Each of these offers distinct biological clues that may help scientists transform their cellular parts list into insights about complex diseases. This article describes progress in a subset of these and looks at attempts to integrate the different types of networks.
Co-expression Networks: Genes Working Together in Autism
In a co-expression network, genes are linked if their expression levels are highly correlated. “If genes are truly operating in the same pathways we would expect them to be turned on or off at the same time,” Willsey explains. Co-expression networks boast some of the earliest and most striking success stories in network biology, likely because the requisite data—microarray or mRNA-seq data—are relatively cheap to generate and readily available from existing studies.
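As a rough sketch of how such a network could be built, the code below computes the Pearson correlation between every pair of genes and links the pairs that rise and fall together. The gene names, the expression values, and the 0.9 cutoff are all assumptions chosen for illustration; the article does not say which correlation measure or threshold any particular study used.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical expression levels for four genes across four samples.
expression = {
    "GENE1": [1.0, 2.0, 3.0, 4.0],
    "GENE2": [2.1, 3.9, 6.2, 8.0],
    "GENE3": [4.0, 3.0, 2.0, 1.0],
    "GENE4": [1.0, 3.0, 2.0, 2.5],
}

def coexpression_edges(expr, threshold=0.9):
    """Link two genes when their expression is strongly positively correlated."""
    linked = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if r >= threshold:
            linked.append((g1, g2, round(r, 3)))
    return linked

edges = coexpression_edges(expression)
print(edges)
```

Only GENE1 and GENE2 end up linked here. GENE3 moves in the opposite direction from GENE1, so this sketch, which keeps positive correlations only, leaves that pair unconnected.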
In a 2013 paper in Cell, Willsey and colleagues leveraged co-expression networks to gain a foothold into the biology of autism. They identified nine high-confidence autism risk genes (greater than 97 percent chance of involvement) and 122 probable autism genes (greater than 50 percent chance of involvement). “We then asked the question: Is there a particular point in brain development and a particular region of the brain where the genes are most highly co-expressed, which may indicate that this is a relevant point in development for pathogenesis?” Willsey says.
The team obtained genome-wide expression data from 13 developmental stages and four groups of human brain regions from the BrainSpan database. They created 52 co-expression networks by linking each of the nine high-confidence autism genes to its top-20 co-expression partners (out of nearly 17,000 genes) for a given developmental stage and brain region. They reasoned that if a particular network was relevant to autism it should be enriched in probable autism genes as well. Indeed, unexpectedly high numbers of probable autism genes popped up in two networks, both involved in mid-fetal development. “We can see that this enrichment is very specific to particular time periods and brain regions, namely mid-fetal development in the prefrontal cortex,” Willsey says.
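The logic of this step can be sketched in a few lines: pick a seed gene's top co-expression partners, count how many of them are probable autism genes, and ask how surprising that count is. The sketch below uses a hypergeometric tail probability, which is one common choice for enrichment and is an assumption here, since the article does not name the test the team used. The gene names and correlations are invented, and the toy uses 2 partners out of 10 genes rather than 20 out of nearly 17,000.

```python
from math import comb

def top_partners(seed, correlations, k):
    """Return the k genes most strongly correlated with the seed gene."""
    ranked = sorted(correlations[seed].items(), key=lambda kv: kv[1], reverse=True)
    return [gene for gene, _ in ranked[:k]]

def enrichment_p_value(population, flagged, drawn, observed):
    """Probability of seeing at least `observed` flagged genes by chance when
    `drawn` genes are picked without replacement from `population` genes,
    `flagged` of which are genes of interest (hypergeometric upper tail)."""
    upper = min(flagged, drawn)
    tail = sum(
        comb(flagged, i) * comb(population - flagged, drawn - i)
        for i in range(observed, upper + 1)
    )
    return tail / comb(population, drawn)

# Hypothetical correlations between one seed gene and ten background genes.
correlations = {
    "SEED": {
        "G1": 0.95, "G2": 0.90, "G3": 0.40, "G4": 0.35, "G5": 0.30,
        "G6": 0.20, "G7": 0.10, "G8": 0.05, "G9": -0.10, "G10": -0.30,
    }
}
probable_genes = {"G1", "G2", "G7"}  # hypothetical "probable" risk genes

network = top_partners("SEED", correlations, k=2)
observed = len(set(network) & probable_genes)
p_value = enrichment_p_value(
    population=10, flagged=len(probable_genes), drawn=len(network), observed=observed
)
print(network, observed, round(p_value, 4))
```

A small probability means the network contains more probable risk genes than chance alone would be expected to deliver, which is the signal the team looked for across its 52 networks.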
They further localized the relevant co-expression to glutamatergic neurons residing within deep layers of the prefrontal cortex. In separate work, their group and others have also consulted the gene ontology for clues about biological function. “Every gene has a bunch of tags and we just looked for overrepresentation of those tags,” Willsey says. Two key themes emerged: synapse biology and transcriptional regulation. “This was a very exciting moment for the field because we were able to go from genes to a specific hypothesis about pathogenesis,” Willsey says.
In the same issue of Cell, a second, independent group of researchers—led by Daniel Geschwind, MD, PhD, professor of neurology and of psychiatry and biobehavioral sciences at the University of California, Los Angeles—reported strikingly similar results. “They saw similar convergence in mid-fetal development in the prefrontal cortex. They also observed enrichment in the same glutamatergic neurons,” Willsey says.
Geschwind’s team first built co-expression networks that were agnostic to autism risk genes: They used BrainSpan data to discover 12 “co-expression modules” (waves of highly co-expressed genes) that characterize normal fetal and infant brain development. They then looked for enrichment of suspected autism genes within these modules. Autism genes were over-represented in two modules involved in transcriptional regulation and three modules involved in synapse formation during fetal development. And the genes were highly expressed in glutamatergic neurons in the prefrontal cortex. A separate set of risk genes for intellectual disability was not enriched in any of the 12 co-expression modules, suggesting that autism is biologically distinct from intellectual disability.
“I think for the field it was very nice to see two different approaches come to a very similar conclusion in terms of pathogenesis,” Willsey says. Other groups have since arrived at similar conclusions using network biology approaches. “For me that’s particularly exciting, because historically in psychiatric disorders there has been a lack of agreement in the field about a lot of different aspects of biology,” Willsey says.
Co-expression networks can also be used to implicate additional disease genes. “We can improve gene discovery by essentially using a guilt-by-association method,” Willsey says. His collaborators at Carnegie Mellon University (Kathryn Roeder, PhD) and University of Pittsburgh (Bernie Devlin, PhD) developed an algorithm, Detecting Association with Networks (DAWN), that identifies hot spots within co-expression networks—areas where multiple autism risk genes cluster together. Genes that reside in these hot zones are automatically suspect, even if they’ve never been implicated before. “Genes that may not have had enough genetic evidence for association get their scores strengthened if they’re highly co-expressed with strongly associated genes,” Willsey says. DAWN could be applied to other complex diseases as well, he says.
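The guilt-by-association idea can be illustrated with a deliberately simple scoring rule: a gene's score is its own genetic evidence plus a fraction of the average evidence of its co-expression partners. This is not the DAWN algorithm, whose statistical model the article does not describe; the rule, the 0.5 weight, and every gene name and score below are assumptions made only to show the principle.

```python
def guilt_by_association(scores, neighbors, weight=0.5):
    """Boost each gene's score by a weighted average of its partners' scores."""
    boosted = {}
    for gene, own_score in scores.items():
        partners = neighbors.get(gene, [])
        if partners:
            partner_mean = sum(scores[p] for p in partners) / len(partners)
        else:
            partner_mean = 0.0  # a gene with no partners gets no boost
        boosted[gene] = own_score + weight * partner_mean
    return boosted

# Hypothetical genetic evidence scores (higher means stronger association).
scores = {"RISK1": 0.9, "RISK2": 0.8, "CANDIDATE": 0.2, "LONER": 0.2}

# Hypothetical co-expression partners for each gene.
neighbors = {
    "RISK1": ["RISK2", "CANDIDATE"],
    "RISK2": ["RISK1", "CANDIDATE"],
    "CANDIDATE": ["RISK1", "RISK2"],
}

boosted = guilt_by_association(scores, neighbors)
print(boosted["CANDIDATE"], boosted["LONER"])
```

CANDIDATE and LONER start with the same weak evidence, but only CANDIDATE sits among strongly associated genes, so only its score is strengthened.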
Genetic Interaction Networks: Interacting Double Mutants in Cancer and Parkinson’s Disease
In a genetic interaction network, two genes are linked if a mutation in one alters the effect of a mutation in the other. For example, mutating a gene in either the BRCA DNA repair pathway (a pathway implicated in breast cancer) or the PARP DNA repair pathway alone is insufficient to kill the cell; but hitting both at once is lethal. The cancer drug olaparib exploits this so-called “synthetic lethality” by disabling the PARP pathway in cancer cells that already have a BRCA mutation. Mapping genetic interaction networks is costly and time-consuming, so we can’t yet approach genome-wide coverage. “Currently it’s about 100 genes that are able to be interrogated by mere mortals,” Ideker says. But genetic interaction networks offer more immediately actionable insights than co-expression networks, such as suggesting novel drug targets. So Ideker set out to make the process cheaper and less time-consuming.
In a 2017 paper in Nature Methods, Ideker’s team introduced “combinatorial CRISPR-Cas9” for genetic interaction mapping: They used the gene-editing tool CRISPR-Cas9 to knock out single genes and pairs of genes in high throughput. As a proof of principle, they systematically mutated 73 cancer genes (tumor suppressor genes and cancer-relevant drug targets) one at a time and in all pair-wise combinations in three human cancer cell lines: cervical, lung, and kidney. Two genes are scored as interacting if their double mutant grows faster or slower than their single mutants would predict.
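What the single mutants "would predict" depends on a null model. A minimal sketch, assuming the widely used multiplicative model (the article does not say which model the team applied): if each knockout independently scales growth, the expected fitness of the double mutant is the product of the two single-mutant fitnesses, and the interaction score is the gap between observed and expected. The fitness values and the 0.1 tolerance are hypothetical.

```python
def interaction_score(fitness_a, fitness_b, fitness_ab):
    """Observed double-mutant fitness minus the multiplicative expectation.

    All fitness values are relative to the unmutated cell, which is 1.0.
    """
    expected = fitness_a * fitness_b
    return fitness_ab - expected

def classify(score, tolerance=0.1):
    """Label the interaction; the tolerance is an arbitrary illustrative cutoff."""
    if score < -tolerance:
        return "negative (double mutant grows worse than predicted)"
    if score > tolerance:
        return "positive (double mutant grows better than predicted)"
    return "no interaction"

# Hypothetical measurements: each single knockout is mild, the pair is severe.
score = interaction_score(fitness_a=0.9, fitness_b=0.8, fitness_ab=0.1)
label = classify(score)
print(round(score, 2), label)
```

A strongly negative score of this kind, where cells tolerate either knockout but not both, is the signature of synthetic lethality.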
Ideker’s team identified numerous interactions, including 152 synthetic-lethal combinations. Most of these were novel, though some were already known; for example, they rediscovered the BRCA-PARP lethality targeted by olaparib. When they tested eight novel synthetic-lethal combinations by simultaneously drugging both genes, six were experimentally validated.
“Ultimately you’d love to be able to take all 30,000 genes and look at how twiddling all pairs of 30,000 genes affects the function of cells. But that’s still too big of an experiment to do, at least with our current state of technology,” Ideker says. But, he says, “What’s nice about these CRISPR studies is that the speed and coverage of that interaction map is directly coupled to the speed and cost of DNA sequencing. So, if that continues to fall, then it’s going to pull the interaction mapping along with it.”
Besides experimental advances, computational advances are also helping scientists make headway in genetic interaction mapping. For example, a new algorithm called TransposeNet, introduced in a 2017 paper in Cell Systems, leverages network information from model organisms such as yeast to help build human networks. TransposeNet was developed by Vikram Khurana, MD, PhD, assistant professor of neurology at Harvard Medical School and principal faculty at the Harvard Stem Cell Institute, working with a team of computational biologists led by Bonnie Berger, PhD, Simons Professor of Applied Mathematics and Computer Science at MIT, and including Jian Peng, PhD (a postdoc at the time).
Khurana studies neurodegenerative disorders in yeast cells and neurons derived from patient stem cells. Yeast may not have brains, but they exhibit critical eukaryotic biology found in specialized cells like neurons, especially when it comes to ancient problems like protein misfolding. In Parkinson’s disease, for example, the protein α-synuclein forms clumps (called Lewy bodies) in dopamine neurons. If yeast cells are forced to express α-synuclein—which is not native to yeast—the protein forms toxic clumps similar to those found in Parkinson’s disease. Berger and Khurana’s team screened the entire yeast genome to identify genes that interact with α-synuclein, either ratcheting up or ratcheting down its toxicity. They came up with 332 genes, which they then assembled into a biological network. Because the yeast genome has been so well studied, they were able to connect genes based on multiple types of relationships—including genetic interactions and physical ones (for example, protein-protein interactions).
Berger and Khurana then created a “humanized” version of this yeast network using TransposeNet. If you simply convert the 332 yeast genes to human genes using homology mapping, “you fall flat on your face,” Khurana says, because there is a dearth of available information about how the human genes interact. TransposeNet solves this issue by using the wiring diagram from yeast to help fill in the wiring diagram for humans. “What we said is if those interactions are conserved and we don’t have any of those interactions in humans yet, why don’t we use our yeast-to-human algorithm to not just convert a list of yeast genes into a list of human genes, but actually take the entire genetic network in yeast and convert that into the human proteomic space?” Khurana explains.
TransposeNet relies heavily on the SteinerNet algorithm developed by Ernest Fraenkel, PhD, professor of biological engineering at MIT. This algorithm optimizes network building by prioritizing the most relevant interactions, including pulling in new genes if needed. “We used his algorithm to not just make a network between genes that were already in our list but actually to be able to add genes in, especially if it solves the network in the most efficient way possible,” Khurana says.
Remarkably, TransposeNet pulled into the network many known Parkinson’s risk genes that don’t have homologs in yeast, including the gene for α-synuclein itself. “It was really cool that the algorithm was able to reintroduce the protein of interest and other human proteins that are critically important for Parkinson’s disease; we never told the network to do that,” Khurana says.
Not surprisingly, the human network was enriched for genes involved in protein trafficking, which is known to go awry in Parkinson’s. Unexpectedly, the network was also enriched for RNA binding proteins, which have previously been implicated in ALS but never in Parkinson’s disease. RNA binding proteins orchestrate protein translation. “We’re very excited about uncovering this new axis of biology,” Khurana says. Indeed, when Khurana’s team grew neurons in a dish from Parkinson’s disease patients with α-synuclein mutations, they found that the neurons had abnormally low protein translation. Moreover, they could reverse this defect by increasing the expression of two genes found to suppress α-synuclein in the original yeast screen.
“Vik Khurana validated these findings in human stem cells, which is unbelievable,” Berger says. “It was a real joint effort between the computational biologists, the biologists, and physicians, which I think is really nice. This is how we’re going to get the best translation to therapeutics in my opinion.”
The network is also being used to discover new Parkinson’s disease risk genes. “We believe that there are absolutely going to be additional genes in our network that are contributing to or causing disease,” Khurana says. They are now sequencing exomes from patients with Parkinson’s disease to look for mutations in network genes.
Protein-Protein Interaction Networks: Proteins Teaming Up in Parkinson’s Disease
Two proteins that physically interact likely participate in the same biological pathway, so mapping protein-protein interactions can give direct insights into disease. Technological advancements are increasing the pace of mapping; and scientists are approaching genome-wide coverage in at least one human cell at one time point.
One of the largest efforts to map the human protein interactome is BioPlex, led by Wade Harper, PhD, and Steve Gygi, PhD, professors of cell biology at Harvard Medical School. Harper and Gygi have developed a high-throughput pipeline that uses affinity purification-mass spectrometry (AP-MS). In AP-MS, scientists insert an affinity tag into their protein of interest. This tag then binds to a matching bead, with which the protein can be fished out of a cell along with all its binding partners—which are then identified by mass spectrometry. Although AP-MS can enrich for the protein of interest and its interacting partners, there are also background proteins that associate non-specifically with affinity beads and often dominate the proteins identified by mass spectrometry. So, the Harper and Gygi groups developed software, CompPASS-Plus, that uses a naive Bayes classifier to help distinguish high-confidence interacting partners from background interactions.
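For readers unfamiliar with naive Bayes, the sketch below shows the general technique on invented data; it is not CompPASS-Plus, and the two features (a spectral count and the fraction of replicate runs in which a protein appears) are assumptions chosen for illustration. The classifier learns, for each class, how each feature is typically distributed, then assigns a new protein to whichever class makes its feature values most probable.

```python
from math import log, pi

def fit_gaussian_nb(samples, labels):
    """Estimate each class's prior plus a mean and variance for every feature."""
    model = {}
    for label in sorted(set(labels)):
        rows = [s for s, l in zip(samples, labels) if l == label]
        n_features = len(rows[0])
        means = [sum(r[j] for r in rows) / len(rows) for j in range(n_features)]
        variances = [
            sum((r[j] - means[j]) ** 2 for r in rows) / len(rows) + 1e-6
            for j in range(n_features)
        ]  # the small constant avoids division by zero
        model[label] = (len(rows) / len(samples), means, variances)
    return model

def predict(model, sample):
    """Return the class with the highest log posterior, treating features as independent."""
    best_label, best_log_prob = None, float("-inf")
    for label, (prior, means, variances) in model.items():
        log_prob = log(prior)
        for x, mu, var in zip(sample, means, variances):
            log_prob += -0.5 * log(2 * pi * var) - (x - mu) ** 2 / (2 * var)
        if log_prob > best_log_prob:
            best_label, best_log_prob = label, log_prob
    return best_label

# Hypothetical training data: (spectral count, fraction of replicates detected).
samples = [(30, 1.0), (25, 0.9), (40, 1.0), (3, 0.2), (5, 0.3), (2, 0.1)]
labels = ["interactor"] * 3 + ["background"] * 3

model = fit_gaussian_nb(samples, labels)
print(predict(model, (28, 0.95)), predict(model, (4, 0.25)))
```

The "naive" part is the assumption that features are independent within each class, which keeps the model simple enough to train on modest amounts of labeled data.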
Working at a rate of about 500 proteins per month, this group has now mapped about 10,000 proteins and 120,000 interactions from one human cell line. They have released data for about 7,500 proteins and 50,000 interactions in BioPlex 2.0 (http://bioplex.hms.harvard.edu), which was described in a 2017 paper in Nature. People are already using the data, Harper says. “There are several examples where people have taken the network, identified novel interactions, and then made a biological discovery.”
Membrane proteins represent a challenge for high-throughput approaches, as individual complexes require different detergents for proper extraction. Thus, for membrane proteins, the Harper/Gygi team is turning to a new technology called APEX, developed by Alice Ting, PhD, professor of genetics, biology, and chemistry at Stanford University. Scientists insert an APEX tag into their protein of interest; when activated, APEX “spray paints” everything in the immediate vicinity (within a few nanometers) with biotin, a small molecule that is widely used to couple proteins, Ting says. These biotin-labeled proteins can be pulled out of the cell with high specificity, reducing false positives. Since the protein complex doesn’t have to remain intact, this also reduces false negatives.
Another advantage of APEX is that it can be used to resolve protein interaction networks in space and time. “People tend to view protein complexes as static entities, but, in reality, they’re not,” Harper says. APEX is incredibly quick—labeling takes as little as 20 seconds—making it possible to map protein-protein interactions dynamically. “With APEX, you can monitor the changes in interaction partners over the time course of a biological process, and get a dynamic picture of what’s going on,” Harper says. For example, in a 2017 paper in Cell, Gygi and colleagues used APEX to map the rapidly changing protein interaction networks of G-protein-coupled receptors following ligand binding.
APEX was central to a second paper that Khurana and colleagues published in Cell Systems in 2017 in which they revealed α-synuclein’s protein interaction network (the paper was published as a companion paper to the genetic interaction paper). Using APEX, they detected 225 proteins that reside in tight proximity to α-synuclein in rat neurons. Remarkably, the resulting protein interaction network converged on the same Parkinson’s risk genes and cellular processes as the genetic interaction network created by TransposeNet. Both highlighted protein trafficking and RNA binding proteins. “One of the really cool things is that the core proteins that had originated in yeast in that other paper were actually interacting with α-synuclein,” Khurana says. “So there was this deep relationship between where these proteins are in a cell and the mechanisms through which they exert their toxic effects.”
Patient Similarity Networks: Connecting Shared Phenotypes in Diabetes and Cancer
Biological networks aren’t limited to the molecular level. Researchers are also building networks at the patient and disease levels. In a patient similarity network, two patients are linked if they share phenotypic similarities. Patient similarity networks can reveal subtypes of disease with similar biological underpinnings, which may lead to tailored treatments.
In a 2015 paper in Science Translational Medicine, researchers from the Icahn School of Medicine at Mount Sinai used patient similarity networks to identify three subtypes of type II diabetes. Joel Dudley, PhD, associate professor of genetics and genomic sciences at Icahn, and his colleagues culled 73 objective clinical measures—such as height, weight, and blood panels—from the electronic medical records of 2,551 diabetes patients. They used a dimensionality reduction technique to compress these 73 variables into a more succinct representation; and then clustered patients based on similarity.
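A stripped-down sketch of a patient similarity network follows. It swaps in a much simpler recipe than the one described above: it skips dimensionality reduction, puts each clinical measure on a common scale, and links two patients when their standardized profiles are close. The patient labels, the two measures (body mass index and fasting glucose), the values, and the distance cutoff are all hypothetical.

```python
from math import sqrt

def standardize(table):
    """Z-score each column so measures on different scales are comparable."""
    columns = list(zip(*table))
    means = [sum(c) / len(c) for c in columns]
    stds = [
        sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0  # guard constant columns
        for c, m in zip(columns, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in table]

def euclidean(a, b):
    """Straight-line distance between two standardized patient profiles."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def similarity_edges(patients, table, max_distance):
    """Link every pair of patients whose profiles lie within max_distance."""
    z = standardize(table)
    linked = []
    for i in range(len(patients)):
        for j in range(i + 1, len(patients)):
            if euclidean(z[i], z[j]) <= max_distance:
                linked.append((patients[i], patients[j]))
    return linked

# Hypothetical clinical measures: (body mass index, fasting glucose in mg/dL).
patients = ["P1", "P2", "P3", "P4"]
table = [(35, 180), (36, 175), (22, 130), (23, 128)]

edges = similarity_edges(patients, table, max_distance=1.0)
print(edges)
```

Here P1 links to P2 and P3 links to P4, giving two separate clusters that would then be examined for shared clinical features, much as the three diabetes clusters were.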
The resulting network had three distinct clusters of patients, which the researchers then attempted to characterize. They compared the clusters’ clinical characteristics, as well as their comorbidities and genotypes—factors held out of network building. Patients in subtype one were characterized by classic type II diabetes features—including obesity, high blood sugar, and kidney and eye disease. Patients in subtype two were thinner, had more cancer and cardiovascular disease, and had polymorphisms in immune genes. “I call them skinny immune diabetics,” Dudley says. “They appear to have an immune- or inflammatory-driven diabetes that differs from the classical metabolic dysfunction.” Patients in subtype three had high levels of mental illness as well as cardiovascular disease. Psychiatric medications are known to increase the risk of diabetes, and could partly explain this subtype.
When Dudley presented these data to physicians, the subtypes resonated with them. “Hindsight’s 20-20, but when we showed them this they were like ‘Oh, I know that type of patient,’” Dudley says. “They said it over and over again.”
One limitation is that the data capture just one snapshot in time. “I think a lot of the networks we’ve built today are just glimpses into the real networks in our bodies,” Dudley says. To get a fuller picture, Dudley’s team is working on building networks using longitudinal patient data. “What’s becoming clear is that networks are super dynamic, super ephemeral. And we need lots of different perturbations to really understand the actual logic and topology of the networks,” he says.
When building patient similarity networks, most scientists cluster patients first and then look at the underlying characteristics of the clusters. But Teresa Przytycka, PhD, a senior investigator in the Computational Biology Branch of the National Center for Biotechnology Information at the NIH, has taken a different tack. Her team uses a probabilistic algorithm that builds the patient network using phenotypic similarity data (e.g., gene expression or survival data) and genetic features (e.g., single-nucleotide polymorphisms or copy-number variations) simultaneously. This way, patients are not forced into a single subtype but are allowed to reflect a mixture of subtypes.
The algorithm borrows from topic modeling approaches in text mining. Topic modeling attempts to classify documents into overarching topics—say sports, politics, culture, and science—by analyzing the set of words in each document. Similarly, Przytycka’s team attempts to classify patients into subtypes by analyzing the set of genetic perturbations in each patient. Just as a document may be classified as both politics and science, a patient may also be classified as a mix of subtypes. “This is an example of taking a method from one field and applying it to another field. We can take software that experts have been working on for years and then repurpose it for molecular biology questions,” Przytycka says. When applied to glioblastoma, her team found that the disease has only three, not four, subtypes; what had previously been deemed a fourth subtype was in fact just a mixture of two other subtypes.
Integrated Networks: The Big Prize
In isolation, each network type gives just one piece of the puzzle. The “big prize” in network biology will be to integrate them, says Ruedi Aebersold, PhD, professor of systems biology in the department of biology at Swiss Federal Institute of Technology in Zurich (ETH Zurich). “They’re all views on the same cell from a different angle, so we’d like to be able to integrate various types of networks into a more global model.” It’s an enormous challenge; and solving it will require computational biologists and experimental biologists to work together, he says.
In a paper in Science in 2016, Aebersold’s team (in collaboration with the group of Johan Auwerx, MD, PhD, from Ecole polytechnique fédérale de Lausanne [EPFL]) presented an analytic pipeline to measure and combine five layers of network data: genetic, expression, metabolic, protein, and phenotypic. They measured about 25,000 transcripts, 2,600 proteins, and 1,000 metabolites from 40 different strains of mice with well-characterized genetics. The project team fed the mice a high-fat or low-fat diet and then measured phenotypic responses such as weight change, glucose tolerance, and the presence of fatty liver disease. The researchers then correlated elements across different network types. For example, they measured the correlation between a gene’s RNA expression level and the levels of the protein translated from that RNA; surprisingly, these weren’t always tightly linked. They assembled all the data into a single network where the nodes were genes, transcripts, proteins, metabolites, or phenotypes, and the edges were the correlations between these. The resulting network gave novel insights into the role several proteins play in metabolizing fat.
With Peng, Berger has also built a computational pipeline for combining networks called MashUp. The connectivity pattern of each node in a network is incredibly complicated—but this information can be compressed into a simpler representation in the same way that Google’s PageRank condenses a website’s connectivity patterns into a simple ranking. MashUp extracts this information from multiple networks for each gene and then integrates it into one measure of global connectivity that informs how the gene relates to other genes in the networks. “We generate compact representations for the topology of each node in its network and then integrate that using off-the-shelf machine learning methods,” Berger explains.
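One standard way to condense a node's connectivity into numbers, closely related to PageRank, is a random walk with restart: a walker wanders along edges but keeps jumping back to its home gene, and the fraction of time it spends at each node becomes that gene's profile. The sketch below computes such a profile in two toy networks and simply concatenates them. It is an illustration under assumptions, not the published pipeline: the networks, the 0.5 restart probability, and the concatenation step are hypothetical, and the further compression into a compact representation is omitted.

```python
def random_walk_with_restart(graph, start, restart=0.5, iterations=100):
    """Return {node: long-run visit probability} for a walker that jumps back
    to `start` with probability `restart` at every step.

    Assumes every node in `graph` has at least one neighbor.
    """
    nodes = sorted(graph)
    prob = {n: 1.0 if n == start else 0.0 for n in nodes}
    for _ in range(iterations):
        nxt = {n: (restart if n == start else 0.0) for n in nodes}
        for node in nodes:
            share = (1 - restart) * prob[node] / len(graph[node])
            for neighbor in graph[node]:
                nxt[neighbor] += share
        prob = nxt
    return prob

def combined_profile(networks, gene):
    """Concatenate the gene's visit-probability profile from each network."""
    profile = []
    for graph in networks:
        visits = random_walk_with_restart(graph, gene)
        profile.extend(visits[node] for node in sorted(graph))
    return profile

# Two hypothetical networks over the same four genes.
coexpression = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
physical = {"A": ["D"], "B": ["C"], "C": ["B"], "D": ["A"]}

profile_a = combined_profile([coexpression, physical], "A")
print([round(p, 3) for p in profile_a])
```

Genes with similar profiles occupy similar positions across the networks, so a vector like this can be handed to an off-the-shelf machine learning method of the kind Berger mentions.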
In a 2016 paper in Cell Systems, Berger’s team showed that information extracted and combined with MashUp can be used for tasks such as automated gene function annotation with substantial improvements over state-of-the-art methods. “I think this paper goes a long way to solving the issue of how to integrate multiple network topologies,” Berger says.
Toward Precision Medicine
If you can connect all the networks into one integrated picture of a complex disease, that’s a first step toward using them to provide medical care that’s tailored to individual patients. “We’re all working toward precision medicine,” Ideker says.
He envisions a “clinic of the future” where a computer simulates a diseased cell or tissue, informed by all the available interaction data. “You would load a patient’s particular mutations and environmental conditions onto that model and you would compute the drug that best returns that model to its normal state,” Ideker explains. His lab is already working on such a model for a cancer cell. “Of course, we’re not going to get there tomorrow. But it’s important to have the vision,” Ideker says. “We’re going after this vision already. Time will tell how fast we can push it.”