Trawling for Drug-Gene Relationships
Database automatically mines literature for drug-gene relationships, and does it as well as manually curated databases.
When a drug saves one person but makes another ill, a bitter lesson in genetic differences often follows. With many such lessons already under our collective belts, researchers are using existing knowledge to predict additional drug-gene relationships as a way to forestall future calamities. A new software program can trawl published papers for gene-drug relationships, plug those relationships into known genetic networks, and predict which genes are likely to affect a patient’s response to a drug.
“Our contribution is using text mining and taking decades of research and folding that in to inform the prediction,” says Yael Garten, biomedical informatics PhD candidate in the lab of Russ B. Altman, MD, PhD, at Stanford University and a lead author of the work. “We showed that this is as good as and sometimes even better than manual curation,” in which scientists painstakingly enter published drug-gene interactions into a database. Garten will present the team’s research in January 2010 at the Pacific Symposium on Biocomputing in Hawaii.
The previous version of the algorithm, designed by Altman and others, relied more heavily on manual labor. Called PGxPipeline, it employed a database of gene-drug relationships manually compiled from scientific articles by a team of scientists at Stanford Medical School. PGxPipeline wove these relationships into an orderly web, along with a database of gene-gene interactions and other data, to predict how strongly each of 12,460 genes affects response to a specific drug.
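To make the idea of combining known drug-gene relationships with a gene-gene interaction network concrete, here is a minimal sketch in Python. It is not PGxPipeline's actual scoring method, which the article does not describe. It assumes a much simpler rule: a gene scores higher for a drug when more of its interaction partners are already known to be related to that drug. The function name rank_genes and the drug and gene names are hypothetical placeholders.

```python
from collections import defaultdict


def rank_genes(drug, drug_gene, gene_gene):
    """Rank genes for a drug by counting interaction partners already linked to it.

    drug_gene: dict mapping a drug name to the set of genes known to relate to it.
    gene_gene: iterable of (gene, gene) pairs, treated as undirected interactions.
    This scoring rule is an illustrative assumption, not the published method.
    """
    known = drug_gene.get(drug, set())

    # Build an undirected adjacency map from the interaction pairs.
    neighbors = defaultdict(set)
    for a, b in gene_gene:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Score each gene by how many of its neighbors are known drug-related genes.
    scores = {gene: len(adjacent & known) for gene, adjacent in neighbors.items()}

    # Highest score first; ties broken alphabetically for a stable ordering.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy data with placeholder names (not real pharmacogenomic facts).
drug_gene = {"drugX": {"GENE_A", "GENE_B"}}
gene_gene = [
    ("GENE_A", "GENE_C"),
    ("GENE_A", "GENE_D"),
    ("GENE_B", "GENE_D"),
    ("GENE_E", "GENE_F"),
]

ranked = rank_genes("drugX", drug_gene, gene_gene)
for gene, score in ranked:
    print(gene, score)
```

In this toy network, GENE_D interacts with both genes already tied to drugX, so it lands at the top of the list, while genes with no connection to known drug-related genes score zero.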
The team has now cut PGxPipeline loose from the manually created drug-gene database, instead mining the information automatically from published papers. This faster, cheaper method will inform the drug-gene rankings with constant updates from new literature. The manually curated and text-mining-based versions of PGxPipeline predicted a test set of 682 drug-gene interactions with similar accuracy. And the text-mining-based version was slightly better at identifying genes that play the largest roles in response to a specific drug.
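The article does not say how the text-mining step extracts relationships, so the following Python sketch substitutes a deliberately simple stand-in: it counts how often a drug name and a gene name appear in the same sentence. The function mine_cooccurrences, the lexicons, and the sample sentences are all hypothetical, and a real system would need far more careful handling of synonyms, negation, and sentence parsing.

```python
import re
from collections import Counter


def mine_cooccurrences(texts, drugs, genes):
    """Count sentences in which a drug name and a gene name both appear.

    Sentence-level co-occurrence is an assumed, simplified technique,
    not the extraction method used by the researchers.
    """
    counts = Counter()
    for text in texts:
        # Naive sentence split on whitespace that follows ., ! or ?
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            # Lowercased word tokens; underscores and hyphens stay inside a token.
            tokens = set(re.findall(r"[A-Za-z0-9_-]+", sentence.lower()))
            for drug in drugs:
                if drug.lower() not in tokens:
                    continue
                for gene in genes:
                    if gene.lower() in tokens:
                        counts[(drug, gene)] += 1
    return counts


# Made-up sentences with placeholder names, for illustration only.
texts = [
    "DrugX response depends on GENE_A. GENE_B was not studied.",
    "Patients carrying GENE_A variants cleared drugX slowly.",
]
pairs = mine_cooccurrences(texts, drugs=["drugX"], genes=["GENE_A", "GENE_B"])
for (drug, gene), n in pairs.items():
    print(drug, gene, n)
```

Counts like these could, in principle, replace a hand-built table of drug-gene relationships as input to a ranking step, which is the substitution the team describes.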
Garten hopes to use the revised PGxPipeline to parse all relevant scientific literature for drug-gene relationships. Better predictions will save researchers time in deciding which of the possible interactions to test in the lab and eventually influence how doctors prescribe drugs, she maintains.
“There is an emerging trend in bioinformatics to combine information from curated databases with information extracted from text,” says Tom Rindflesch, PhD, principal investigator for the semantic knowledge representation project at the National Institutes of Health in Bethesda, Maryland. “This is an excellent example.”