Bringing the Fruits of Computation to Bear on Human Health: It’s a Tough Job but the NIH Has to Do It
The National Institutes of Health (NIH) are on a mission: To understand and tackle the problems of human health. To make that daunting task approachable, 15 of the 20 institutes divvy up human health problems by body part (eye, teeth, heart/lung, etc.) or disease type (infectious diseases, cancer, neurological disease, mental health, etc.).
These so-called categorical institutes, driven as they are by a desire to understand their chunk of the health puzzle, invest in computational biology research almost incidentally. “Many institute people might say: We want to fund good science and if it happens to require computation then we’ll fund computation,” says Karin Remington, PhD, director of the Center for Bioinformatics and Computational Biology within the National Institute of General Medical Sciences (NIGMS).
But because computation provides tools that can be useful in many categories of biology and medicine, a large portion of the computational research portfolio (how much is actually unclear) is funded by the non-categorical institutes and centers. These include the National Library of Medicine (NLM), the National Human Genome Research Institute (NHGRI), NIGMS, the National Institute of Biomedical Imaging and Bioengineering (NIBIB), and the National Center for Research Resources (NCRR).
How can the NIH ensure that its investment in computational resources across all the institutes—categorical and non-categorical—really serves the NIH mission? “We have to do a better job of connecting the R01 investigator—the bread and butter investigator of the NIH in general—with these computational approaches,” says Dan Gallahan, PhD, deputy director for the Division of Cancer Biology and a program officer at the National Cancer Institute (NCI).
Here, we’ve interviewed a small group of NIH staff members who are immersed in computational research: They’re all part of the computational choir, if you will. Their perspective provides, we hope, an interesting peek into the way the NIH as an institution thinks about how to bring the fruits of computation home to the categorical sciences.
It will be a tough job, they say, requiring that computational biologists reach across institute boundaries as well as discipline boundaries. For its part, the NIH must facilitate such interdisciplinary cooperation; find better ways to coordinate computational research across its institutes, avoiding duplication of effort without stifling innovation; and lead the development of common approaches, including common vocabularies and common data repositories. And, say some, there should be a far greater investment in computational resources to deal with the flood of high-throughput data.
In the end, when these challenges are met, it will have been worth the effort, says Gallahan. “If you can give researchers a bioinformatics tool that will allow them to replace an animal model or allow them to assay something in multi-dimensional research rather than on just one parameter, then that’s going to help everybody.”
BRIDGE THE CULTURE GAP
Right now, computational and biomedical research travel largely on uncoordinated parallel tracks. On the one hand, many biomedical scientists don’t understand the ways that computation could potentially help their research, so they don’t know what to ask of computational scientists. On the other hand, computational scientists aren’t exactly sitting around waiting for the biomedical researchers to brainstorm good questions. They have their own research aims.
“People might make an algorithmic advance that will eventually have some impact in biomedical research but it’s not a coordinated effort,” Remington says. The two fields speak different languages, “so it’s really tough to translate state-of-the-art developments in computer science and math into things that will be useful in biomedical research.”
So the question for the NIH is how to leap over this sociological barrier. Of course there are a few people who do both computation and biomedical research. The National Centers for Biomedical Computing (NCBCs) are rich with people who do both, Remington says. “Immersed in NCBC-land, you get a different perspective. But NCBC-land is a very biased and blessed community. It’s a good model to follow, but it’s not the way that most of our community works.”
Ideally, says Remington, the NCBCs could serve as a prototype for the kind of environment that the NIH wants to build, where people are talking together in a common language. “What we want to do is make it more of the standard operating procedure, that the experimentalists will be able to communicate what they need to the math and computer science people and really forge relationships and communication structures to advance things in a more coordinated way,” Remington says. “We could really accelerate progress in our basic biology research efforts if we could drive the computer algorithm development to fit the needs of these science areas more directly.”
Another way to bridge the culture gap is to train the next generation of scientists in multiple disciplines. Right now, it’s hard to find people with the mathematical skills necessary to support NLM’s computational projects, says Michael Ackerman, PhD, assistant director for High Performance Computing and Communications at the NLM. “Mathematicians end up in the area of biocomputation sideways—for all the right reasons; but I’m not sure if we have a program that sponsors biomathematics,” he says. The NIH could initiate training programs in cross-disciplines like biomathematics to increase the pool of people who can cut across the divide, proposes Ackerman.
Cross-training is not just about exposing mathematicians and computer scientists to biology. It also goes the other way: clinicians and biologists need to understand and be comfortable with the technical side. Towards that end, the NIH has established a program that exposes medical residents and clinical physicians to biomedical engineering research for a year or two. “It keeps it real, to have clinicians interacting closely with the biomedical imaging and bioengineering research groups,” says Zohara Cohen, PhD, a program director with the National Institute of Biomedical Imaging and Bioengineering. While the program is not exclusively computational, Cohen points out, “It involves computation in that a lot of our grantees are doing computational work.”
There’s also another bridge that needs to be crossed, says Jennie Larkin, PhD, a program officer with the National Heart, Lung and Blood Institute (NHLBI): the bridge to the open source movement. If tools are freely available, researchers funded by categorical institutes will be more likely to make use of them. Thus, says Larkin, “the NIH needs better clarity about how to support people who are trying to develop and support freely available open-access computing tools and resources.” Since this sort of open-source model is accepted and self-sustaining in other areas of computer science (albeit through different mechanisms), Larkin says it’s time for the NIH to think through ways to achieve the same.
LEARN FROM SUCCESSES
To date, successful efforts to bring computation to the categorical sciences have flowed mostly from the many large centers that have received and continue to receive NIH support, say the NIH staffers interviewed.
“These Center programs are a good approach,” Gallahan says. “I would extend that to make it more diverse and have more of them.”
Michael Marron, PhD, director of the Division of Biomedical Technology at the NCRR, agrees: There’s a need for much greater investment in infrastructure and enabling technologies, and large centers are a great way to achieve that.
Marron points to the Biomedical Technology Research Resources (BTRRs) as a long-standing example. For fifty years, the NCRR has invested in these centers, a subset of which is devoted exclusively to Informatics Resources. The “Resources,” as they’re commonly dubbed, develop and disseminate a wide range of software for the biomedical community, including, to mention just a few, software for molecular dynamics (such as the widely used AMBER and CHARMM), visualization (such as the popular VMD), and genetic epidemiology (SAGE). And new centers are still being created, including one established at UCSD in 2008 to develop computational ways to analyze mass spectrometry data.
Significantly, Marron says, the BTRRs are evaluated and renewed based on whether their products are disseminated and adopted for biomedical research. “The important thing is to get the science done,” Marron says. Many of the Resources offer help-desks as well as software training—not only for students, but for senior biomedical researchers as well. And their principal investigators attend a broad range of conferences—including categorical science conferences—to spread the word about their tools.
When the NIH Roadmap came out, many at NCRR saw that as an affirmation of what they’d been doing for a long time. “The National Centers for Biomedical Computing (NCBCs) are very similar to the [BTRRs] that we’ve funded for years and that’s of course by design—because it’s a model that works,” Marron says. “The NCBC grants complement the BTRRs. And together, they represent the most coordinated NIH activity for support of computation in medicine.”
The NCBCs promote collaborations between computational researchers and biologists by focusing the computational research around specific driving biological problems (DBPs). The NCBC program is a national network that’s envisioned as having “both hubs and spokes,” says Cohen. People from outside the NCBC communities can draw on the resources created at these hub centers, expanding the community.
To further that aim, the NIH established a program to fund collaborating R01s: individual investigator-driven research projects that would collaborate with the NCBCs to develop tools with a specific biomedical focus. As a result, the NCBCs have succeeded in connecting with a variety of collaborating R01s, many of which are funded by categorical institutes. “This is a great model for the future,” Cohen says.
At the NCI, a similar program—the Collaborative Research in Integrative Cancer Biology and the Tumor Microenvironments Program—was modeled on the NCBC collaborative R01s. It’s just starting now, and mandates that individual computational biology investigators within both the Integrative Cancer Biology Program and the Tumor Microenvironment Network program collaborate with people who are not in those groups—ensuring that the program expands to new people and communities. “That again is an example of an active program trying to bring the rest of the community in,” Gallahan says. “That’s exactly what it’s designed to do.”
The NIH is also learning to structure program announcements to bring together computational and biomedical researchers. For example, a program for Collaborative Research in Computational Neuroscience (CRCNS, for which the National Science Foundation serves as lead, with NIH participation) mandates participation by key personnel from both computation and neuroscience. Likewise, the Physiome program announcement required leadership representation from both the modeling and biomedicine communities. The Bioengineering Research Partnership (BRP) program establishes interdisciplinary partnerships between people from both biomedicine and engineering, with the aim that they create a deliverable for the biomedical research community within a ten-year time frame. And each of the 34 Clinical and Translational Science Awards, which are geared toward remaking the clinical research enterprise, includes a bioinformatics focus.
The Biomedical Informatics Research Network (BIRN) and the cancer Biomedical Informatics Grid (caBIG) serve up a different model of connecting computation and biomedicine: They provide biologists and physicians with platforms that allow them to share data and tools.
The BIRN involved categorical sciences from the get-go, says Michael Huerta, PhD, director of the National Database for Autism Research and associate director at the National Institute of Mental Health (NIMH). NCRR launched the BIRN in 2001 to develop a national infrastructure for biomedical research using neuroscience as a test-bed. When NCRR was just starting to put BIRN together, Huerta says, NCRR’s leadership engaged people in the categorical institutes to find out what the biomedical research community needed. “BIRN has grown but always with the categorical institutes kept posted, invited to meetings, invited to the review of grant applications and so forth,” Huerta says. “That proactive effort has transformed BIRN from a good idea to an infrastructure that is increasingly important to neuroscience.”
BIRN federates data and tools so that users can access them from anywhere on the network, regardless of where they are stored. “And it does so in an invisible way,” Huerta says. “You don’t necessarily know where you’re getting things from.” Nowadays, with the BIRN platform reaching production mode, NCRR is very interested in expanding BIRN to other domains, Huerta adds. “I’m sure they’d be delighted if folks doing diabetes or heart/lung research would start to increasingly use BIRN.”
The NCI’s caBIG platform is similar, with a set of standardized rules and a common vocabulary for applications, tools, and data shared through its infrastructure. Both BIRN and caBIG were launched within specific research communities (neuroscience and cancer, respectively) but have potentially broader applicability—and may eventually link up to one another. “As time has gone on, these two platforms are getting closer to each other,” Huerta says, “so that before too long, I think in the next generation, you’ll be able to work across them.”
INCREASE FUNDING FOR COMPUTATION
Despite all the great work being done by the various centers, Marron says, there simply isn’t enough money devoted to this area. “We could easily fund 2-3 times what we do and still not be exhausting high quality areas,” he says.
The NIH, Marron says, is really behind in spending on computation and informatics compared to the National Science Foundation, the Department of Energy, and many pharmaceutical companies. By Marron’s best guess, no more than two to four percent of the NIH budget goes to computation and bioinformatics grants. “It’s peanuts,” he says. He thinks the investment should be closer to 25 percent. “There are many challenging computational areas where we could see rapid advances if we could capture the computational tools to do it.”
Specifically, Marron favors much more funding for enabling activities like software and infrastructure. He’d like to see more efforts like BIRN that create processes, protocols, sharing agreements and middleware, “all the stuff that makes formation of a virtual organization barrier-free.” And the NCBCs, he says, are a good start. “We would like to fund more and I think there should be more.”
In addition, Marron believes the NIH should invest more to ensure the effective use of massive amounts of data. “The vast amount of data collected today will never be viewed by humans, so you have to have tools to do this,” he says. And that raises huge questions of quality control. “How do you know what you’re even looking at?” he asks. “You need to have that built into your software tools so it’s not just garbage-in/garbage-out.”
Marron would like to see the NIH invest in new ways to build databases and to discover, visualize, and analyze data. “We haven’t begun to firm up tools for that. We’re almost still at the level of spreadsheets.” Just finding usable data is nearly impossible. “We should be supporting the development of machine-readable registries of data, so your machine can find it,” he says. And then there’s the problem of combining data of various sorts (gene expression, proteomics and tissue mapping, for example) to come up with meaningful analyses. “It’s clear that getting a handle on the etiology of disease is a multi-dimensional problem,” he says.
And more computer scientists need to be engaged with the NIH, Marron says. “If we support 100-200 computer scientist awards at the NIH I’d be surprised. We should be supporting thousands, from natural language, to database and networking experts,” he says, all with a devotion to improving biomedical research, of course. Plus, there’s a need for computation associated with new experimental tools, such as analyzing fluid dynamics for advanced ultracentrifugation techniques or applying radar-imaging techniques to high-frequency MRI.
Marron also points to a lack of computer expertise in the biomedical research community at large compared to, say, the physics or astronomy communities. Clinicians and clinical institutions are particularly skittish about things like choosing a computer system or accidentally leaking out private data, which can cause them a huge amount of grief, he says. “All of this is more of an argument for why one needs to invest in Centers like the NCBCs—to provide centers of expertise so that everyone doesn’t need to develop on their own.”
AVOID DUPLICATION WHILE LETTING A THOUSAND FLOWERS BLOOM
The NIH in general does avoid funding redundant research, the interviewed program officers say. Even for computation, which is widely dispersed, the grant review process weeds out proposals that duplicate existing grants. And communication and coordination efforts ensure that where funded research has enough in common, NIH program officers will bring researchers together to form alliances or collaborate.
On the other hand, it can be hard to avoid duplication for computational pieces that are not the main focus of a grant, says Remington. “I hesitate to even think how much money NIH invests in software engineers on R01 grants to reproduce the same sorts of basic database dissemination Web site tools over and over again when really we could have one central repository and do that sort of thing easily as a service for the research community.” These infrastructure problems may eventually be a thing of the past as more people migrate to common platforms like BIRN or caBIG, Huerta says. But other kinds of duplicated effort remain that are tougher to tackle, such as redundancy in algorithms.
A few years ago, when the NLM asked people to rewrite their algorithms in a standard format to be archived and maintained by NLM, they got a surprising number of different algorithms that did the same thing, Ackerman says. Redundancy arises because people think they can do better than what already exists. “So you’re stuck with the redundancy to find out whether it can be done better,” he says. And it’s nearly impossible to discover why a person chooses one algorithm over another. “Is it better for one type of data than another? That’s a nut we’ve never cracked.”
The same problem exists at the NCI, Gallahan says. For example, a number of groups have independently developed microarray analysis programs. “The NCI Center for Bioinformatics and its director, Ken Buetow, really have had to come to grips with what to do with all of these programs that we’re supporting,” he says. It’s a daunting task partly because scientists by their nature want to explore things in their own ways and are wedded to their application and their own research area. “So that can spawn redundancy with an ‘I can do that better’ sort of attitude,” he says. “You have a lot of people pursuing different avenues, all with the best intentions.”
Larkin discovered the same phenomenon when she asked her NHLBI systems biology grantees whether they’d be interested in some way to facilitate sharing of code, software or models. Could they leapfrog off others’ work in order to go farther and faster? The response was mixed. In addition to intellectual property concerns, the researchers had another problem: Sometimes solutions are over-specified so that in fact it’s meaningless to share code. “Just because it works perfectly for one researcher doesn’t mean it will be helpful to anyone else,” she says. “There also may be more than one good solution to a problem. Sometimes it’s a real loss to reinvent the wheel because it’s a wasted effort. But other times you end up with a better wheel—a radial instead of a Conestoga wagon wheel, for example.”
One of the best ways to reduce duplication, Larkin says, is for the NIH to develop efficient ways to share software and other computational tools—which remain hard to find even when they are posted to the web. The NIH already supports a variety of such efforts.
The Biositemaps project launched by the NCBCs has been discussed in this magazine before (see the Winter 2008/09 issue of BCR). “It might be a real lightweight, decentralized solution,” Larkin says. Many of the categorical institutes also have their own solutions. caBIG connects resources for NCI researchers. In the neuroscience arena there’s the Neuroimaging Informatics Tools and Resources Clearinghouse (NITRC), designed to facilitate the dissemination and adoption of neuroimaging informatics tools and resources; and the Neuroscience Information Framework (NIF) provides a concept-based query language for locating all types of neuroscience resources, including computational ones. And heart researchers can turn to the Cardiovascular Research Grid, while PhysioNet houses analysis tools for looking at medical time series data. There are also repositories for various categories of tools, for example SimTK.org for physics-based models and simulations, or the ITK/VTK repository for visualization tools.
“So there are many sorts of solutions due to different cultures in different areas,” Larkin says. As with other duplicative efforts, no one yet knows which solutions are best. “I’d be loath to restrict the solution set now,” Larkin says, “because we might guess wrong.”
Bioinformatics and biocomputation cut across all of the institutes’ research programs. This makes coordination among the institutes somewhat challenging. Over the years, a variety of informal coordinating groups have managed different trans-NIH programs involving computation. The Biomedical Information Science and Technology Initiative Consortium (BISTI) is perhaps the best-known and longest-lived. Launched in 2000, it brought together program officers from across all the institutes. And it developed the NCBC program.
“The NCBCs are our best example of how to do things together, but it’s a teeny-tiny example,” Remington says. “It needs to be taken up a notch…to achieve synergy in an area like computation that cuts across so many fields.”
What’s missing, as Remington sees it, is a data-driven, comprehensive strategy for coordinating computation across the NIH. To Remington, the problem has two components: The lack of information about the trans-NIH investment in computation; and the lack of a coordinating group vested with power to act strategically.
“We have precious little understanding of what our real investment is across the institutes,” Remington says. In February 2009, the NIH launched a database (called Research Condition and Disease Categorization) that, for the first time, allows trans-NIH portfolio analysis. This will help the NIH deliver mandatory reports to Congress about its investment in specific disease areas. “It stands to be a really big improvement in our relationship with the public,” Remington comments. But it’s not likely to help identify the trans-NIH investment in computation because, Remington says, it’s built largely on a biomedical vocabulary. “So to do an analysis on computational networks, the word networks will come into play but it won’t turn up as computer networks. It will give networks of genes or hospital networks,” she says. “The system is ill-tuned towards anything related to computation.”
Another portfolio-analysis tool called the Electronic Scientific Portfolio Assistant, or eSPA, which has been evolving within the National Institute of Allergy and Infectious Diseases, allows users to dynamically probe and poke things to see how money is being spent. In May of 2008, it was opened up to a pilot program involving 17 Institutes and Centers. According to Huerta, “NIH really is investing in giving us the informatics capabilities that we need to know what’s happening.” And though he hasn’t tried eSPA yet, Huerta has been told it’s really powerful. “In the future, this will empower program officers to know what’s going on beyond their own portfolio and beyond what they happen to hear about.”
Huerta and Remington both hope eSPA will prove useful for a trans-NIH computational portfolio analysis. “That’s the piece that has been missing from this BISTI consortium,” Remington says. “As functional as it has been over the years, it has really been unable to look across institutes in a real data-driven way, to analyze across the NIH where our investments are going.”
NIH also lacks a coordination effort vested with actual authority, says Remington. “BISTI is really more ad hoc than I think is called for given the need.” BISTI relies on voluntary participation by program officers at multiple institutes, and some institutes participate more than others. “It doesn’t have the same sort of strategic-planning capability as would be best-suited, I think, for moving us forward in this area,” Remington says.
Gallahan agrees. While trans-NIH programs like BISTI help communicate what’s going on among the various institutes, “they don’t have the same sort of gravitas of resources and public awareness as things that come from the Office of the Director, like the Roadmap, or even some specific programs at the NCI,” he says. Admittedly, he says, NCI is less dependent on BISTI, partially because its internal resources and overall scope allow it to frequently act independently. “There might be some benefit to more of a top-down review of computation with some power behind it. But where do you define the point of asking? Sometimes it’s at the Health and Human Services Department level, the NIH level, or we might think it’s at the NCI level.”
To Remington, the next logical progression from BISTI is to have a consortium, perhaps based in the Office of the Director, that strategizes carefully about where NIH investments are going and tries to leverage things that are clearly trans-NIH. “A group that leverages no-brainers for us to do together instead of funding over and over again the same thing, institute by institute,” she says. Of course, each institute will have its own strategic plans and its own needs. “But in a cross-cutting area like informatics and computation,” she says, “we could really leverage that effort better if we came together to develop a strategic plan that’s coordinated, that’s not institute by institute.”
The Roadmap was eye-opening for many at the NIH, Remington says. “The Institutes started to realize how much potential savings there could be in sharing intellectual capital and resources…. And bioinformatics and biocomputation are really a sweet spot of that potential.”
DEVELOP COMMON APPROACHES
To Huerta, one of the things that’s hindering the success of biocomputation is the lack of common approaches—common data formats, common vocabularies, common ontologies and common long-term data repositories. The NIH needs to take the lead on this because individual communities won’t do it on their own, Huerta says. “They’re interested in the research. They’re interested in what genes are involved in autism or what peptides are involved in myocardial infarction. They are not driven by ‘what should we call the peptide,’ or ‘what data format should we use?’”
It’s particularly problematic for communities that are organized around a particular data type that might cross institute boundaries—for example, signaling data, which is relevant to NIMH, NHLBI, NCI and others. “How does NIH encourage the development and use of common approaches by such research communities?” he wonders. “They are not going to organize around these things, and there really isn’t a way to do this right now. So I see this as a major need that NIH has. I call it community-based solutions for community-wide needs because the solutions are going to come from the community to serve the needs of the community.”
Getting communities to rally around common approaches requires a different mechanism than a typical research grant. “Really what folks need is organizational and operational support. We could serve as a way to organize around these issues where they wouldn’t be self-organized,” Huerta says. “That’s kind of on the horizon of what NIH needs to start paying attention to. And in fact we’re doing some of that. We haven’t gotten there yet.”
The NIH also has to address the fact that common approaches are dynamic and require ongoing support to be updated, Huerta says. Cohen agrees, emphasizing the particular need for long-term data management. “NIH has to answer some questions about how to support large datasets and make them available after a grant ends,” Cohen says. Perhaps with the development of common approaches, this will become an easier problem to handle.
Taking it to the next level, says Gallahan, common approaches in the computational area will also help researchers explore commonality among diseases, which will in turn help guide ways to interfere with disease. Gallahan points to a paper by Albert-Laszlo Barabasi (covered in the Fall 2007 News Bytes section of this magazine) that was able to find this sort of interconnection among diseases. Thus, Gallahan says, “Modeling might be able to do scientifically what we’re unable to do administratively.”
COMPUTATION IS THE FUTURE
Ask Gallahan why computation matters to the NCI, and he’ll tell you that it’s the future. “Much as molecular biology opened the world at that scale to manipulation, I think computational biology is going to bridge many of the challenges we have in dealing with biological complexity.”
The effort to cure cancer is particularly on point. Over the last 15 years, Gallahan says, the institute hasn’t seen as many advances as it would like. “And I think that’s partly because it is such a complex disease,” he says. The greatest advances have tended to be very targeted therapies that affect a limited (albeit important) population. And after treatment with these therapies, sometimes the tumors reappear, having gained resistance to the drug. The lesson: The problem of cancer requires a better understanding of the disease’s complexity. “And in order to understand and integrate that, we’re going to need these computational approaches.”
NIH Challenge Grants: Stimulating Biomedical Computation
The federal government’s recently passed stimulus package, the American Recovery and Reinvestment Act (ARRA), provides $10.4 billion to the National Institutes of Health, all of which must be spent in 2009 and 2010. The goal: to stimulate the U.S. economy through support of scientific research. And, if the recently announced Challenge Grants are any indication, ARRA will also stimulate computational research in biomedicine.
On March 5, the NIH announced that at least $200 million of the ARRA funds will go to a new program called the NIH Challenge Grants in Health and Science Research. According to the announcement, the idea is to give a two-year “jumpstart” to specific scientific and health research challenges in biomedical and behavioral research.
The announcement identified 15 “challenge areas” that encompassed 878 “challenge topics,” 207 of which are deemed “high priority.” One of the 15 broad challenge areas is entirely computational. Called “Information Technology for Processing Health Care Data,” it covers four percent of the topics. But many topics under the other broad challenge areas are highly computational as well, as shown by the charts on this page. “It’s a step in the right direction,” says NCRR’s Michael Marron.