Imaging Collections: How They're Stacking Up
As barriers to massive imaging collections fall, researchers can look at human systems in their entirety rather than in pieces
In the beginning there was the Visible Human. It broke new ground by gathering some 2,000 serial images from a death row inmate’s cadaver, and was the first time researchers had sectioned a single human being and gotten it right.
But the project broke new ground in another way as well. As the first large, publicly available image collection, it proved that “If you build it, they will come,” according to project director Michael Ackerman, PhD, of the National Library of Medicine (NLM).
The Visible Human was initially envisioned as a tool for teaching anatomy. But soon after the database launched in 1994, use agreements started pouring in from scientists who wanted to create 3-D images to test for radiation absorption or design artificial hips and knees, and from artists illustrating anatomical injuries in court cases, to name just a few of the dozens of projects based on the Visible Human data.
Despite the suggestion that such large image collections could inspire new types of research, the Visible Human Project remained the only public imaging database available for many years. During that time, large public databases in other fields—most notably genomics and proteomics—created whole new realms of research.
Today, unlike genetic sequence data, which are centralized in GenBank, and protein structures, which reside in the Protein Data Bank (PDB), imaging data still lack a central repository. But an increasing number of people are hoping to create image collections from thousands of people, and not just one prisoner in Texas.
The question is whether the shift from examining images one at a time to looking at them in large groups will not only lead to better research of the type already done today, but will create something fundamentally different. Just as the field of genetics transformed into genomics when biologists moved from looking at individual genes and diseases to examining the whole genome, so too imaging could see a shift. A field that has traditionally studied narrowly defined problems using small collections gleaned from physician-collaborators could find itself faced with huge collections and the potential to reveal new correlations between diseases, genes, and anatomy. As in genomics, it will be possible to look at variation both within and between diseases like never before.
Before this transformation can happen, though, a leap of faith is required: Researchers must share their images now in hopes of greater rewards later. That’s one of the current challenges researchers are tackling. There are others as well: Researchers must find ways to increase computer storage capacity; create a common language for describing images; develop standards for “metadata” that will explain where an image comes from and what it shows; find ways to map images from different individuals onto an agreed-upon “model”; and improve existing ways to analyze and interpret images consistently. They also must make images available remotely, so that physicians in rural areas will have access to large comparative collections.
As these barriers fall and imaging collections become more readily available, suddenly, imaging researchers will be able to do what genomics researchers do all the time: look at human systems in their entirety rather than in pieces.
But before we get ahead of ourselves, let’s review the challenges.
Building and Sharing the Collection
Creating image data is easier than ever. Imaging capacity has increased by leaps and bounds. X-ray technology, developed in the 1890s, was followed by incrementally stronger imaging methods, from ultrasound (widely available in the 1970s), to positron emission tomography or PET (1970s), to computerized axial tomography or CT scans (1970s), to magnetic resonance imaging or MRI (early 1980s) and functional MRI (early 1990s). New techniques are still appearing.
And with major improvements in data storage and networking, scientists do not worry as much about amassing bigger data sets. Big disks are relatively cheap—researchers might pay around $3,500 for a terabyte of storage—and the capacity of computer networks to transmit large images is ever improving. Fred Prior, PhD, of Washington University School of Medicine in St. Louis, recently purchased space to store new research images he expects will be generated during the next three years at the Electronic Radiology Laboratory which he directs. His team’s new Network Attached Storage system from BlueArc can hold 102 terabytes, with an option to expand to 500 terabytes or, with an upgrade, to 4,000 terabytes (4 petabytes)—a number once unthinkable. And that does not even include clinical imaging, another huge figure.
Even with such imaging, storage, and computing power in hand, a question remains: how to motivate other researchers to share their images? Scientists feel a sense of proprietary ownership over the images they have collected. While patients can perhaps stake the greatest claim to the images, most images are technically “owned” by the institution where they were made, and specialists carrying out imaging projects feel they should be the first to reap the benefits of the information the images contain, rather than having to share the data.
“Science is highly competitive. Scientists want to get the first publication, to gain funding, and get academic promotions,” says Arthur Toga, PhD, head of the Laboratory of Neuro Imaging (LONI), at the University of California, Los Angeles.
Indeed, in 2000, a spat erupted in the brain imaging world when Michael Gazzaniga, PhD, director of the National fMRI Data Center, wrote to fMRI specialists who had contributed to the Journal of Cognitive Neuroscience, telling them they would be required to share their experimental data with the center if they wished to publish in journals including Science and the Journal of Neuroscience. Researchers immediately raised objections, sending a letter to the center’s financial backers and 14 journals. Releasing their images, they argued, “impinges on the rights authors should have on the publication of findings stemming from their own work.” The center decided to establish a “data hold” for a period of time, to allow authors to profit from their images first.
Maryann Martone, PhD, has run up against some of the same issues. As co-director of the National Center for Microscopy and Imaging Research (NCMIR) at the University of California, San Diego, she has led the creation of the Cell Centered Database (CCDB), one of the first Internet databases for cell-level structural data. She also coordinates a project supported by the Biomedical Informatics Research Network (BIRN) that investigates mouse models of human neurological disease.
“These resources were created with the idea that people were going to populate them from the community, but neuroscientists who do complicated imaging studies are not that happy about having data out there before they can mine it,” she says. Because NCMIR is a “technology development center” funded by the NIH, she says, it has a mission “to serve a large collaborative community.” So she decided to begin with her own center’s data and hope that others would follow: “We do imaging that is unique. I figured, if we just took all the data around here and made it available, that would be helpful.” It was: the project was one of the first web databases devoted to electron tomography when it launched in 2002. Since then, it has continued to give access to complex cellular and subcellular data from light and electron microscopy. Meanwhile, Martone and colleagues are still thinking about the best ways to encourage other research groups to share their data with the site.
As so often happens in the world of science, it is funders, in particular big government-sponsored efforts, who are beginning to change the rules of the game. One project aiming to put its arms around as many images as possible is caBIG™. Launched in 2004 by the National Cancer Institute (NCI), it embraces 50 cancer centers and 30 other organizations. caBIG™ is an attempt to bring together the huge amounts of data gathered and tools created in NCI-funded cancer clinical trials. It aims to take an “open source” approach, creating an environment of sharing information in the work it funds. According to some, this is the wave of the future.
“Increasingly, the NIH is requiring that people share data,” says Daniel Rubin, MD, MS, a clinical assistant professor and research scientist at Stanford University Medical Center. Clinical trial information, for instance, is becoming more readily available, Rubin says. He points to the American College of Radiology Imaging Network (ACRIN) as an example of this trend. This NCI-funded group hosts an imaging database that houses a large archive of clinical trial imaging data in cancer fields.
Toga thinks that it is ultimately in a scientist’s self-interest to share. Lots of data is needed if scientists want to identify subtle differences between images, he says. “You can’t possibly collect it on your own.” What helps, he says, is when a couple of folks get together and say, “I’ll share mine if you share yours,” which is becoming more common.
Metadata: Capturing the Context
One cooperative project in which Toga has been involved is the NIH-sponsored Alzheimer’s Disease Neuroimaging Initiative (ADNI), which encompasses 60 different sites that are sharing image data on the disease. But if a researcher looks at an ADNI image without knowing whether the patient has a disease or not, or without access to the person’s age or gender, or the drugs he or she has been taking, it becomes much less useful.
One of the most important parts of collecting large amounts of imaging data is also to capture each image’s back story—the context in which it was made and the condition of the patient at the time. For images, efforts to create a framework for recording such information—known as metadata—currently lag behind efforts in other realms (e.g., the “MIAME” standards for microarray data). But work is now underway to improve the situation.
Some metadata—such as a patient’s name, home address, and identifying features—must be removed before images enter a large database. The process of “de-identification of protected health information” follows federal privacy regulations.
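As a rough illustration of that step, the sketch below drops identifying fields from an image's metadata record while keeping the clinically useful context. The field names and the sample record are hypothetical, chosen only for this example; actual privacy regulations define a much fuller list of what counts as identifying.

```python
# Hypothetical field names, invented for this sketch. Real privacy rules
# specify a much longer list of identifiers that must be removed.
IDENTIFYING_FIELDS = {
    "patient_name",
    "home_address",
    "birth_date",
    "medical_record_number",
}


def deidentify(metadata):
    """Return a copy of the metadata without the identifying fields."""
    return {
        key: value
        for key, value in metadata.items()
        if key not in IDENTIFYING_FIELDS
    }


# A made-up record attached to one image.
record = {
    "patient_name": "Jane Doe",
    "home_address": "1 Example St",
    "age": 71,
    "sex": "F",
    "diagnosis": "Alzheimer's disease",
    "modality": "MRI",
}

clean = deidentify(record)
print(sorted(clean))
```

The original record is left untouched, so the imaging site can keep its own full copy while only the cleaned version enters the shared collection.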
But other useful information needs to be incorporated into image collections. Before image metadata can make sense, though, more standardization needs to be introduced into the field, many say. Radiologists have a long tradition of looking at images with expert eyes and dictating a free-flowing analysis, which becomes a text report that often uses terms in unique ways. That makes it difficult for other scientists or doctors to understand the image’s context and content in a uniform way.
Attempts to collect and codify metadata are already well underway. One of caBIG™’s initiatives in its In-vivo Imaging Workspace is called “vocabularies and common data elements,” an effort to standardize terminology in cancer analysis. Rubin, one of the group’s co-leads, reports that they are trying to structure radiology imaging findings, to establish controlled terminologies for radiology, and to associate specific metadata about patients with each image gathered.
Indeed, such efforts do not end with cancer research, but could sweep across all aspects of radiology. Rubin is also involved with a project called RadLex, which is being created to offer a uniform lexicon for radiologists. RadLex plans to unify radiology term standards and to make the new terminology freely available on the Internet. Rubin sees these attempts to create a common vocabulary as the first steps in making metadata meaningful and useful for researchers and clinicians alike.
Comparing Images: Snapshots and Scales
The race to create useful imaging collections faces another hurdle: how can multiple images be compared in a way that makes sense? Each human’s body parts are shaped differently, with variations that range from slight to immense. On top of that, describing shape is notoriously difficult. Though shape has been explored by the scientific community since the time of the Greeks, we still have no quantitative parameters for defining the shapes of “normal” human organs, let alone those suffering from disease. In addition, images are affected by the exact place and time they are taken, and the precise method used to take them. All this serves to undermine any straightforward database of imaging data. “Image data is a snapshot of one instance of a thing at one time under certain conditions. It’s not a ground truth like a gene sequence,” says Martone.
If all images can be standardized in the way they are conducted—that is, the types of equipment used, and the kinds of patients included, and the disease(s) being examined—comparison becomes easier. That is part of the success of ADNI, according to Toga: its research sites are required to follow strict protocols for their equipment and image acquisition.
Imaging specialists have also come to rely on the best available scientific means of shape comparison, and they try to incorporate this material into their collections. One example is in neuroimaging, where pictures of the brain are often linked to coordinate systems. Like a road map, these identify what parts are found where with reference to a grid or common starting point. For example, Talairach coordinates measure distances from a specific spot in the brain, the anterior commissure.
However, researchers find fault with existing coordinate systems because they fail to accommodate variation in large populations. While they may serve well for a single human or animal, they are not as helpful when scientists aim to “warp” many individuals onto a common model to illustrate the workings of a disease, for example. As a result, some recent brain atlases have developed their own mathematically complex methods for mapping variability in big groups onto a single framework.
In human brain mapping, researchers have found novel ways of dealing with natural variation between human brains. Toga reports that the 15-year-old International Consortium for Brain Mapping (ICBM) describes the brain in a probabilistic sense. For example, the atlas might tell viewers that there is an 80 percent likelihood that the basal ganglia are in a particular location that has been set out by coordinates.
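To make the probabilistic idea concrete, here is a minimal sketch that assumes each subject's scan has already been aligned to a common grid and labeled, so that a 1 marks a location where a given structure was found in that subject. The toy data and the function name are invented for illustration, and this is not a description of how ICBM actually builds its atlas.

```python
def probabilistic_atlas(masks):
    """Per-location fraction of subjects whose mask marks the structure.

    Each mask is a grid (list of rows) of 0/1 labels, and all masks are
    assumed to be aligned to the same coordinate grid already.
    """
    n_subjects = len(masks)
    n_rows = len(masks[0])
    n_cols = len(masks[0][0])
    return [
        [sum(mask[r][c] for mask in masks) / n_subjects for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Five toy 2x3 label grids, one per subject (made-up data).
masks = [
    [[1, 1, 0], [0, 0, 0]],
    [[1, 1, 0], [0, 0, 0]],
    [[1, 1, 0], [0, 1, 0]],
    [[1, 0, 0], [0, 0, 0]],
    [[1, 1, 1], [0, 0, 0]],
]

atlas = probabilistic_atlas(masks)
print(atlas[0][1])  # 4 of the 5 subjects have the structure here: 0.8
```

In this toy case the location at row 0, column 1 carries an 80 percent likelihood because four of the five subjects show the structure there, while row 0, column 0 is certain and the remaining locations are rare or empty.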
Another means of handling variation is evident in the Allen Brain Atlas (ABA), an extensive mapping of the mouse brain’s gene expression created by the Allen Institute for Brain Science in Seattle. The team behind this atlas created its own coordinate system to ensure extra accuracy. The ABA is a union of neuroscience, genetics, and informatics. To map gene expression onto the 3-D mouse brain model, a team of neuroanatomists drew all the regions of the brain, and then “we lofted those regions onto a 3-D model of the brain using informatics algorithms,” says Michael Hawrylycz, PhD, director of informatics at the Allen Institute for Brain Science. Using high-level computations, an image of gene expression was then mapped onto the reference atlas’s coordinates, creating pictures that form the database. ABA scientists chose one mouse to be the reference model, and the rest of the mouse data was warped to fit into the spatial framework of that single animal’s brain. “We wanted a mouse that was held under exactly the same conditions that we were going to run the genes under,” Hawrylycz says.
Another vexing challenge for image comparison is the issue of scale. Martone points to the problems confronted by brain researchers when they try to see the workings of a disease on multiple scales in a large set of images taken using different technologies. “We go from MRIs, to optical microscopy, to electron microscopy, then to X-ray crystallography,” she says. “Every time you traverse scales, there are gaps. Every time you switch techniques, you lose continuity.” Even the contrast mechanisms are different, so one scale may show fluorescence while another is gray scale, disorienting researchers. It’s like being confronted with a GPS tracking image of a moving vehicle one minute, and a Polaroid photo of the vehicle’s front wheel the next.
To combat confusion, Martone’s team is trying to create new coordinate and reference systems that ease the transition among scales when studying neurons in the brain. She cites a new software project that attempts to correlate microscopy with “feature-based matching systems” that describe the attributes of such cells in a uniform way.
Analyzing Images In Three Dimensions
Those who set out to compare images are also getting help from image analysis software, a field that has advanced rapidly in the past few years. Ron Kikinis, MD, professor of radiology at Harvard Medical School, has helped lead the way. He and colleagues developed the “3D Slicer” image analysis software, initially a joint, open-source effort between the Surgical Planning Lab at Brigham and Women’s Hospital, where Kikinis is founding director, and the Artificial Intelligence Lab at MIT. Created to help visualize medical image data in 3-D, it has been used with success in fields as far-flung as astronomy and geology.
Although Slicer was conceived as an interactive tool for processing single images, it is also useful for researchers working with large sets of images, Kikinis says. “Now people are beginning to build informatics frameworks to hold and manage images; and soon people will shift focus to how to process those images,” Kikinis explains. “With all the progress in image acquisition, you still need to turn data into medically-relevant information, and that requires image analysis,” he says. The current version of Slicer is interoperable with BIRN’s informatics frameworks and is also linked directly to the National Cancer Imaging Archive (NCIA)—a large repository of cancer trial images—as a recommended viewer for its images. Slicer can be used to review image sets for prototyping and results for quality assurance. For example, before processing hundreds of images, it’s wise to test your algorithms and procedures with a handful first. That’s where Slicer’s interoperability with large databases can be used as a tool that offers essential functionality.
Another fundamental tool available to image users is the Insight Tool Kit (ITK), which Ackerman of NLM says took some three years to develop. Based on GE’s Visualization Tool Kit (VTK), ITK’s algorithm allows a user to identify a body part (for instance, the heart) and then ask the tool to draw a line around everything that looks like heart tissue. “Up until now, you’d have to do that by hand,” says Ackerman. The tool saves users’ time and is constantly being updated, making it ever more efficient.
Other complementary efforts are working to ensure that researchers in distant labs can create their own image analysis applications on a lab workstation. Fred Prior has worked with other researchers to oversee creation of the Extensible Imaging Platform, or XIP. “The idea is that there are lots of commercial workstations that are optimized for clinical reading, lots of research packages like Slicer, and great toolkits like ITK that give you functionality, but what’s missing is a way to build custom applications for these tools,” says Prior. XIP will give users a “rapid development environment,” he says, enabling researchers to do image processing more easily. XIP’s initial targets are cancer researchers already working in the grid, but its potential is much greater.
“We’re hoping we’ll see a cottage industry building new applications in this XIP framework to do things like virtual colonoscopy and radiation therapy analysis,” Prior says. The “slick part” in Prior’s words is that such applications could be run through the grid and offered to other researchers remotely through the platform—creating a whole new level of sharing.
In quite a different application of image analysis, some researchers are homing in on new ways to help scientists and doctors find the images they need using tools that analyze an image’s content rather than its metadata. Known as content-based image retrieval, these programs also strive to overcome errors caused when inaccurate text-based keywords lead to mismatches in retrieving images, write Paul Miki Willy and Karl-Heinz Küfer, PhD, of the German Fraunhofer Institut Techno-und Wirtschaftsmathematik in a 2004 paper. Content-based programs attempt to index images according to visual features such as color, texture, and shape. Ultimately, some hope that these systems might allow a physician to click on an image of a cancer in a particular patient and ask a database to show similar images for comparison. So far, this technology has not yet reached a wide audience; some believe more work is needed to ensure accuracy in such searches.
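A bare-bones sketch of the content-based idea follows. It indexes each image by a single crude visual feature, a brightness histogram, and returns the stored image whose histogram best overlaps the query's. The pixel lists, image names, and the choice of histogram intersection as the similarity measure are all assumptions made for this example; practical systems combine far richer color, texture, and shape features.

```python
def histogram(pixels, bins=4, max_value=256):
    """Normalized brightness histogram: a crude stand-in for visual features."""
    counts = [0] * bins
    for p in pixels:
        counts[p * bins // max_value] += 1
    return [count / len(pixels) for count in counts]


def similarity(hist_a, hist_b):
    """Histogram intersection: 1.0 for identical histograms, 0.0 for no overlap."""
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))


def most_similar(query_pixels, collection):
    """Name of the stored image whose histogram best matches the query."""
    query_hist = histogram(query_pixels)
    return max(
        collection,
        key=lambda name: similarity(query_hist, histogram(collection[name])),
    )


# Toy "images" given as flat lists of gray values from 0 to 255 (made up).
collection = {
    "dark_scan": [10, 20, 30, 40, 50, 60],
    "bright_scan": [200, 210, 220, 230, 240, 250],
    "mixed_scan": [10, 70, 130, 190, 250, 100],
}

query = [15, 25, 35, 45, 55, 5]
best = most_similar(query, collection)
print(best)  # dark_scan
```

No keywords are consulted at any point: the match is made purely on what the pixels look like, which is why a mislabeled image can still be found.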
Accessing Image Databases: Connecting to the Grid
All these image collections will do little good if no one can access them remotely. Researchers at the crossroads of biomedicine and computational science are tackling that problem now.
One promising answer is to create “federated databases”—groups of unique imaging collections that are linked together by a sort of “grid,” and that are accessible remotely via a seamless user interface that makes the data sets resemble one single virtual database. Joel Saltz, MD, PhD, professor and chair of the department of biomedical informatics at Ohio State University, leads a group that develops technologies that can enable “grid” access for large image collections to create such federated systems. His group has developed middleware to support complex distributed applications. It attempts to stitch together different bodies of images, making them available and searchable.
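The federated idea can be sketched in a few lines, with the caveat that this toy version keeps every collection in memory on one machine, whereas real grid middleware must handle networking, security, and differing data formats. The class names, site names, and records are hypothetical and are not drawn from the Ohio State software.

```python
class ImageCollection:
    """One site's independently maintained collection of image records."""

    def __init__(self, site, records):
        self.site = site
        self.records = records

    def search(self, **criteria):
        """Records whose metadata match every given field exactly."""
        return [
            record
            for record in self.records
            if all(record.get(key) == value for key, value in criteria.items())
        ]


class FederatedDatabase:
    """Single query interface over several separate collections."""

    def __init__(self, collections):
        self.collections = collections

    def search(self, **criteria):
        results = []
        for collection in self.collections:
            for record in collection.search(**criteria):
                # Tag each hit with its source so users can see where it lives.
                results.append({**record, "site": collection.site})
        return results


# Hypothetical sites and records, invented for this sketch.
site_a = ImageCollection("Site A", [
    {"image_id": "a1", "modality": "CT", "organ": "lung"},
    {"image_id": "a2", "modality": "MRI", "organ": "brain"},
])
site_b = ImageCollection("Site B", [
    {"image_id": "b1", "modality": "CT", "organ": "lung"},
])

federation = FederatedDatabase([site_a, site_b])
hits = federation.search(modality="CT", organ="lung")
print([hit["image_id"] for hit in hits])  # ['a1', 'b1']
```

Each site keeps control of its own records; the federation layer only forwards the question and merges the answers, which is what lets the separate data sets look like one virtual database to the person searching.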
“The overall goal of the effort is to develop an infrastructure to connect multiple databases, to allow people to discover what images are out there, and to analyze both remote and local imagery and to integrate image data with information from molecular studies, clinical studies, and pathology specimens,” Saltz says. The National Cancer Institute caBIG™ project has incorporated the Ohio State group’s software in the caGrid software package. This was first distributed in December and, Saltz says, quite a number of funded efforts have begun to incorporate it. Furthest along in the process of opening up an image database to many users with Saltz’s help is the National Cancer Imaging Archive.
These new systems may not be open to just any member of the public; at least some will require registration and credentials. But the incentive to participate is high. Researchers and physicians who gain access will be able to communicate with each other in new ways that could make a big difference to patients. A major benefit for those linking their images to a grid is the possibility of “central review,” says Saltz. In central review, radiologists remotely read an image and provide their feedback via software that allows a user to capture markups, pointers, and comments. For instance, a radiologist in Omaha might send out a CT scan of a patient’s lung via Saltz’s software to radiologists around the world as well as to computer-aided diagnosis algorithms available at supercomputers in research centers. She might hear back from radiologists in Mumbai, Tokyo, and Chicago, and from computers at a handful of universities, possibly discovering lung nodules she had missed.
Applications: Will They Come?
If researchers overcome the barriers described above, the question then will be whether it will prove worthwhile. Will innovative applications follow? In other words, if you build it, will they come?
Early indications are that they will. For some physicians, the near-term possibility of central review alone will make federated imaging databases worth the effort.
For neuroscientists, gaining insights into the brain’s workings and connections requires large numbers of fine-grained images. In the past, scientists had done studies of specific parts of the brain, but few had tried to discover the overall structure of the brain. Large neuroimaging projects such as the ABA are attempting to change that. Indeed, some hope to one day map every single neuron in the human brain, creating a data set of upwards of 1 million petabytes. This “connectome” promises to be the image-based Human Genome Project of brain researchers. Its success will rely on computer-assisted image acquisition and analysis to map the structure of the nervous system, says Jeff Lichtman, MD, PhD, professor of molecular and cellular biology at Harvard.
In clinical trials for cancer treatments, image collections help in evaluating a drug’s effectiveness, says Carl Jaffe, MD, diagnostic imaging branch chief for the cancer imaging program in the division of cancer treatment and diagnosis at NCI. The promise of using image collections to speed drug development is already beckoning. “The regulatory authorities are more willing to accept regression of a tumor as a sign of a drug’s effectiveness...and imaging is the pivotal marker for this,” he says. A large database of reference images helps to balance “reader artifacts” (that is, errors in radiologists’ assessments) and to substantiate that a tumor has indeed changed size in an important way, he explains. Researchers could use a central review-style process to verify their reading of an image. “An image database allows you to go back to a larger community of observers and confirm whether or not something seems to be supportable.”
For researchers studying rare diseases, the goal is to find others to compare against and to increase understanding remotely. For example, says Jaffe, in the old days, a researcher hoping to test a drug for a rare disease such as retinoblastoma, a cancer of the retina with an incidence of only 430 cases per year, would have to request MRI films from around the country to try to prove that his trial worked on a range of patients. But some films would come back too dark, some too light, and some without the right metadata. If all the data and images could be collected digitally in an online database, the researcher would more quickly understand the drug’s impact. “What you want is an electronic, common pool of data and metadata,” Jaffe says.
Surgeons and other physicians could also benefit from systems such as Rubin’s, which use large groups of images to inform how a doctor diagnoses and treats a patient. Using Rubin’s decision support software, physicians can select from a series of structured annotations of an image and upload the image data. Then a computer program tells them the likelihood of disease. “We want to give radiologists a tool to help them decide when to biopsy based on what they see,” he says. While it is partly based on the knowledge of expert radiologists, this type of technology will work even better when a large number of images are available to inform the program, hence the need for large databases filled with rich stores of metadata.
Increasing numbers of researchers on the biomolecular scale are also using imaging in their research, including scientists like Martone and the people who utilize the ABA and other such atlases. For example, labs are using the ABA to investigate risk factors for multiple sclerosis and to identify genetic hotspots associated with memory performance. And new databases at the cellular level are popping up, including the Open Microscopy Environment, a large public database focused on microscopy imaging data.
The New Thing
Imaging is just one of many bioscience fields moving towards more and better information sharing and collecting. While the field faces its own hurdles—the difficulties of comparing images, for example—it falls within a larger trend of making data available and breaking down the silos of single organ or disease-focused work that for so long dominated the sciences. It’s the same impulse that inspired the release of the genome and the dawn of genomics, and could cause a similarly radical shift in how people use image data.
The next generation of applications will reveal whether the rise of large imaging collections will create a new science, just as genetics spawned genomics. Ultimately, it might be possible to cross-compare between imaging and genomics. That’s already happening in brain research projects such as the Allen Brain Atlas, but the trend could spread throughout the body. And as in genomics, the shift could generate an entirely new field of research on which scientists could build a career.
If the Visible Human is any proof, simply building large, accessible collections of images will attract scientific curiosity and will launch a wealth of useful applications we cannot even imagine today.