High-throughput sequencing is providing insights about nuanced microbial diversity in clinical and disparate environmental settings
William Check
William Check is a freelance writer in Wilmette, Ill.
Summary
- Recently developed DNA sequencing methods are enabling investigators to fast-track high-redundancy sequencing runs at reduced costs and high efficiencies.
- In one approach, approximately 25 million bases of DNA can be sequenced in one four-hour run, at a claimed accuracy of 99%.
- Deep sequencing is providing insights about microbial populations in diverse settings, including mines, along coral reefs, soil, and deep in the ocean.
- This analytic approach also is being harnessed in biomedical research settings to examine a range of bacterial and viral pathogens.
Genomic sequencing grows increasingly virtuosic with each new “generation” of technical improvements. One recent flourish, which traces to 2005, is variously called high-throughput sequencing, massively parallel sequencing, deep or ultra-deep pyrosequencing, or 454 sequencing, after its developer 454 Life Sciences, which Roche, Inc., acquired early in 2007. However they are referred to, these methods are enabling investigators to fast-track high-redundancy DNA sequencing runs at reduced costs and high efficiencies.
Some microbiologists are adopting this approach, whose first use in microbiology was shotgun sequencing and de novo assembly of the Mycoplasma genitalium genome in a single run in 2005. Since then, investigators have adapted the approach to examine microbial populations. Beyond microbiology, next-generation sequencing is being used to study chromatin structure in plants such as barley, rice, and Arabidopsis thaliana; gene regulation in diverse organisms, including humans and the nematode Caenorhabditis elegans; and cancer genetics in humans. One early demonstration of the power of ultra-deep sequencing was the detection of rare cancer genomes in mixtures of normal and malignant human tissues.
Deep Sequencing Applied to Diverse Microbial Populations
Deep sequencing of DNA is well-suited for analyzing mixed populations of microorganisms, including those dwelling in exotic settings, according to Mitchell L. Sogin of the Marine Biological Laboratory (MBL) in Woods Hole, Mass. He and his collaborators are using massively parallel sequencing of ribosomal RNA genes (rDNA) to assess the diversity and relative abundance of microbial populations associated with the water column, soils, and microbiomes of animals. “What has changed with next-generation sequencing is that instead of spending one to five dollars to get each rDNA read, I can now spend one or two cents,” he says. “This increases the efficiency of my work by two orders of magnitude.”
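Sogin's "two orders of magnitude" follows from the per-read prices he quotes. The short Python calculation below is only a sketch: it assumes the dollar figures are per read and ignores fixed costs such as instruments, labor, and data analysis.

```python
import math

# Per-read costs quoted by Sogin, in US dollars (assumed to be per read only).
SANGER_COST = (1.00, 5.00)   # "one to five dollars" per rDNA read
NEXTGEN_COST = (0.01, 0.02)  # "one or two cents" per rDNA read

def fold_savings(old_cost: float, new_cost: float) -> float:
    """How many times cheaper the new per-read cost is than the old one."""
    return old_cost / new_cost

# Least favorable pairing: cheapest Sanger read vs. costliest next-gen read.
low = fold_savings(SANGER_COST[0], NEXTGEN_COST[1])
# Most favorable pairing: costliest Sanger read vs. cheapest next-gen read.
high = fold_savings(SANGER_COST[1], NEXTGEN_COST[0])

print(f"{low:.0f}x to {high:.0f}x cheaper per read")
print(f"{math.log10(low):.1f} to {math.log10(high):.1f} orders of magnitude")
```

The quoted prices imply savings of 50-fold to 500-fold, or roughly 1.7 to 2.7 orders of magnitude, so "two orders of magnitude" is a fair middle estimate.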
These same advantages apply to metagenomics, in which genomes from separate strains and species are sequenced in mixtures. For instance, several years ago, Forest Rohwer of San Diego State University in San Diego, Calif., and his collaborators began applying ultra-deep pyrosequencing to metagenomes from environmental settings, generating sequences from mixed samples extracted from two deep mines. “Since then, we’ve mostly been using 454 sequencing for our microbial ecology studies,” he says.
Rohwer cites two key advantages of this strategy. First, pyrosequencing costs much less than conventional Sanger-based DNA sequencing. Second, much of his work is done on viruses. “When we clone viruses we lose a lot of virus types,” he says. “The cloning-independent approach helped us find a lot more diversity of viruses.” Random shotgun sequencing that depends on bacterial cloning of sample genomes or metagenomes comes with an inherent bias, whereas 454 sequencing amplifies DNA in a cell-free system. With the latter, he says, “We found quite a few things we had never seen before.”
For Sogin and Rohwer, massively parallel sequencing makes it possible to analyze large numbers of different DNA molecules in complex mixtures, offering spectacular breadth of analysis. However, the same approach can also offer great depth. For example, Robert Shafer at Stanford University in Stanford, Calif., and his collaborators analyze many viral variants simultaneously, sometimes repeatedly sequencing slightly different versions of the same viral DNA. “We had probably the first paper applying ultra-deep sequencing to virus research,” he says.
An advantage to ultra-deep sequencing in the case of viral genomic analysis is that it yields complete sequences for each sample. “You are not just probing for preselected mutations,” Shafer says. Moreover, this approach reduces the workload associated with sequencing multiple clones. “There is a strong rationale for using a next-generation technology like 454 sequencing to detect low-frequency variants in a mixture of species in a PCR product,” he says. “It is an eminently logical approach to sequencing quasispecies in patients with chronic viral diseases.”
Other massively parallel sequencing technologies are available, such as the Applied Biosystems Sequencing by Oligonucleotide Ligation and Detection (SOLiD) system and the Genome Analyzer from Illumina (formerly Solexa). However, 454 sequencing yields longer reads, which some investigators consider a plus, particularly for viruses. Short reads do not work as well when resequencing segments that contain high levels of variation, which is the case for many types of viruses. “You may have trouble assembling short reads into a longer continuous read,” Shafer says. “In addition, people who do virus sequencing are interested in showing that two mutations occur on the same viral genome. That’s difficult if you have short reads.”
Both the ABI and Illumina instruments read only 30 to 40 bases, Sogin notes. “Although they may eventually be able to read 100, that doesn’t solve our problem. We need to be able to read 250 contiguous bases.” By contrast, the GS FLX model being marketed by 454 reads about 240 bases. “There is no question that 454 and other pyrosequencing technologies have revolutionized this field,” Shafer says.
Researchers who are not now doing massively parallel DNA sequencing also recognize its potential. “I’m excited about the longer read lengths,” says K. Eric Wommack of the University of Delaware and the Delaware Biotechnology Institute. One new instrument model, launched last year, provides 1 million reads of 400 to 500 nucleotides. “You will get double the read length and more than double the number of reads,” he says. “To me that combination is extremely promising.”
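The yield per run is simply the number of reads times the read length. This Python sketch applies that to the figures quoted above; it assumes every read reaches the stated length, whereas real runs produce a spread of read lengths.

```python
GS_FLX_READ_LENGTH = 240        # bases per read quoted for the GS FLX
NEW_READS_PER_RUN = 1_000_000   # reads per run quoted for the new instrument
NEW_READ_LENGTHS = (400, 500)   # quoted range of nucleotides per read

def run_yield(reads: int, read_length: int) -> int:
    """Total bases from one run, assuming every read reaches read_length."""
    return reads * read_length

for length in NEW_READ_LENGTHS:
    total = run_yield(NEW_READS_PER_RUN, length)
    ratio = length / GS_FLX_READ_LENGTH
    print(f"{length}-nt reads: {total / 1e6:.0f} Mb per run, "
          f"{ratio:.1f}x the GS FLX read length")
```

On these assumptions one run would yield 400 to 500 million bases, with reads about 1.7 to 2.1 times as long as the GS FLX's, consistent with Wommack's "double the read length."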
Massively Parallel Sequencing Depends on Synthesis
The 454 Life Sciences website, http://www.454.com/products-solutions/how-it-works/index.asp, features a colorful diagram depicting next-generation sequencing technology. Simply put, 454 sequencing is based on pyrosequencing, a form of sequencing by synthesis in which only one type of nucleotide triphosphate is added to the reaction mixture per cycle. If the next unpaired position on a single-stranded template fragment is complementary to that nucleotide, DNA polymerase adds it to the growing strand and releases pyrophosphate. In a series of enzymatic steps, ATP sulfurylase converts the pyrophosphate to ATP, which luciferase uses to produce light that is measured by a charge-coupled device camera. Because the light output scales with the number of nucleotides incorporated, a run of identical bases gives a proportionally brighter signal.
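The one-nucleotide-per-cycle logic can be modeled in a few lines of Python. This is an idealized sketch, not 454's software: it assumes a cyclic flow order of T, A, C, G, noise-free integer signals, and works directly on the strand being synthesized rather than its template.

```python
from __future__ import annotations

from itertools import cycle

FLOW_ORDER = "TACG"  # assumed cyclic order in which nucleotide types are flowed

def flowgram(synthesized: str, flow_order: str = FLOW_ORDER) -> list[int]:
    """Idealized pyrosequencing signals for the strand being synthesized.

    Each flow supplies one nucleotide type. The polymerase adds it at every
    consecutive position that calls for it, so a homopolymer run of length k
    gives a signal k times as strong; a flow with no match gives 0.
    """
    unknown = set(synthesized) - set(flow_order)
    if unknown:
        raise ValueError(f"bases not in flow order: {sorted(unknown)}")
    signals: list[int] = []
    pos = 0
    for base in cycle(flow_order):
        if pos >= len(synthesized):
            break
        run = 0
        while pos < len(synthesized) and synthesized[pos] == base:
            run += 1
            pos += 1
        signals.append(run)
    return signals

def call_bases(signals: list[int], flow_order: str = FLOW_ORDER) -> str:
    """Rebuild the sequence by repeating each flow's base by its signal."""
    return "".join(base * n for base, n in zip(cycle(flow_order), signals))

print(flowgram("TTACGGA"))              # [2, 1, 1, 2, 0, 1]
print(call_bases(flowgram("TTACGGA")))  # TTACGGA
```

Real signals are noisy, non-integer light intensities, so deciding whether a bright flow means four or five identical bases is harder than this model suggests.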
Overall savings from instruments that sequence DNA on this basis come from several innovations. For example, the cloning step is eliminated, and all the steps are done on fixed substrates, applying cost-saving nanotechnology methods that reduce reagent use. Perhaps most important, sequencing is done in a massively parallel format. Each unique single-stranded DNA library fragment is attached to a micron-sized bead and isolated in a droplet of a water-in-oil emulsion, in which the fragment is amplified.
Beads carrying the amplified fragments are loaded onto a picotiter plate whose “wells,” called picoliter reactors, have a diameter that allows only one bead per well. “The plate looks like a glass slide, about twice as wide, and it has 1.6 million wells,” Sogin says. Nucleotides flow rapidly across the plate, and the sequencing reactions proceed in parallel: each well can potentially host a DNA sequencing reaction during each nucleotide addition. Approximately 25 million bases are sequenced in one four-hour run, at a claimed accuracy of 99%.
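The run figures above also imply how many errors a run carries. The Python sketch below assumes, as the article does not say, that the claimed 99% accuracy applies to each base independently.

```python
RUN_BASES = 25_000_000  # approximate bases per run, as quoted
RUN_HOURS = 4
ACCURACY = 0.99         # claimed accuracy, assumed here to apply per base

def expected_errors(n_bases: int, accuracy: float) -> float:
    """Expected miscalled bases if each base is independently correct with probability accuracy."""
    return n_bases * (1 - accuracy)

def p_error_free(read_length: int, accuracy: float) -> float:
    """Probability that a read of read_length bases contains no errors."""
    return accuracy ** read_length

print(f"Throughput: {RUN_BASES / RUN_HOURS:,.0f} bases per hour")
print(f"Expected miscalled bases per run: {expected_errors(RUN_BASES, ACCURACY):,.0f}")
print(f"Share of error-free 240-base reads: {p_error_free(240, ACCURACY):.1%}")
```

Under these assumptions a run would contain about 250,000 miscalled bases, and only about 9% of 240-base reads would be error-free. A variant present in about 1% of molecules could then be hard to separate from sequencing noise.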
With the new Titanium reagents, Wommack says, “We have the ability to do truly comparative metagenomics.” Different samples can be “bar coded” by adding sequence tags to the ends of the fragments in each sample. The samples are loaded onto the picotiter plate together and sorted by their tags at the end, enabling comparison of the gene content of 2 to 10 microbial communities. “This is going to be a really promising part of the technology,” he says. Of course, there is a tradeoff: capacity tops out at 1 million sequences, which are divided among the samples.
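Sorting pooled reads back to their samples amounts to reading each read's leading tag. The Python sketch below is hypothetical throughout: the tag sequences and sample names are invented, and it assumes exact matches and equal-length tags, which real demultiplexing tools may relax.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical 6-base sample tags; real tag sets are defined by the kit or lab.
BARCODES = {
    "ACGTAC": "sample_1",
    "TGCATG": "sample_2",
    "GATCGA": "sample_3",
}

def demultiplex(reads: list[str], barcodes: dict[str, str] = BARCODES) -> dict[str, list[str]]:
    """Sort pooled reads by leading tag, trimming the tag from assigned reads.

    Assumes every tag has the same length and requires an exact match;
    reads that match no tag are kept whole under "unassigned".
    """
    tag_len = len(next(iter(barcodes)))
    bins: dict[str, list[str]] = defaultdict(list)
    for read in reads:
        sample = barcodes.get(read[:tag_len])
        if sample is None:
            bins["unassigned"].append(read)
        else:
            bins[sample].append(read[tag_len:])
    return dict(bins)

def reads_per_sample(run_capacity: int, n_samples: int) -> int:
    """Even share of a run's read capacity when n_samples are pooled."""
    return run_capacity // n_samples

pooled = ["ACGTACTTGGA", "TGCATGCCAA", "NNNNNNACGT"]
print(demultiplex(pooled))
print(reads_per_sample(1_000_000, 10))
```

The second function makes the tradeoff concrete: pooling 10 communities on a 1-million-read run leaves each one about 100,000 reads at best.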
Deep Sequencing Enhances Environmental Investigations
In addition to investigating microorganisms in deep mines, Rohwer uses massively parallel sequencing to explore other environments. For instance, on coral reefs he detected DNA viruses related to herpesviruses. “Those sequences would be hard to clone,” he says. “Next-generation technology let us see them more easily.” Once detected in a particular niche, such novel viruses can be cloned. “You don’t want to just keep doing more metagenomic surveys,” Rohwer says. “You want to look at what you find. Using PCR to confirm what you see in the metagenome is pretty important.”
Rohwer, Elizabeth Dinsdale, Robert Edwards, and their collaborators used 454 sequencing to analyze almost 15 million sequences from nine diverse biomes. “There is quite a lot of predictive power in metagenomes,” Rohwer says. “DNA coded by viruses or microbes can accurately tell us what they are being selected for in that environment, down to a particular enzyme.” It is also predictive in the sense that, if you had a sample of unknown origin, analyzing the metagenome could efficiently determine where it came from. Further, he adds, “Viruses in those communities encode a lot of important genes that we would have never guessed. They are modifying host metabolism in ways that range from motility, to structure of cell wall, to carbohydrate metabolism.”
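The idea that a metagenome can reveal where a sample came from can be illustrated by matching its gene-category profile against reference profiles. The Python sketch below is a deliberately simple stand-in, not necessarily the analysis Rohwer's group used: the categories and abundance values are hypothetical, and cosine similarity is just one possible way to compare profiles.

```python
from __future__ import annotations

import math

# Hypothetical relative abundances of a few functional gene categories per biome.
REFERENCE_PROFILES = {
    "coral reef": {"photosynthesis": 0.30, "motility": 0.10, "carbohydrates": 0.40, "virulence": 0.20},
    "deep mine": {"photosynthesis": 0.02, "motility": 0.18, "carbohydrates": 0.50, "virulence": 0.30},
    "open ocean": {"photosynthesis": 0.45, "motility": 0.05, "carbohydrates": 0.40, "virulence": 0.10},
}

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two profiles; missing categories count as 0."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in keys)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def guess_origin(sample: dict[str, float], references=REFERENCE_PROFILES) -> str:
    """Name of the reference biome whose profile is most similar to sample."""
    return max(references, key=lambda name: cosine(sample, references[name]))

unknown = {"photosynthesis": 0.03, "motility": 0.17, "carbohydrates": 0.52, "virulence": 0.28}
print(guess_origin(unknown))
```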
Rohwer calls an analysis of microbial ecology on four coral atolls in the Pacific “one of our most important papers.” A comparison of one of the few remaining pristine reefs with another that is reeling from human-inflicted pollution confirms that coral thrives on the former, whereas various species of coral are sick or dying on the latter. Populations of coral-associated microorganisms increase dramatically near growing human populations, and those microbes also tend to be more pathogen-like. “They are closely related to what we think of as pathogens—strep and staph and E. coli,” Rohwer says. “This is also reflected in viral community patterns. So what we are doing to reefs is changing their dynamics and making it harder for them to make a living. This is one of the best things we have ever figured out.”
Meanwhile, the data-processing demands of next-generation sequencing loom as a major challenge, according to Rohwer. “We have many computational people to support the experimentalists,” he says. “Rob Edwards is one of the best informatics people in the world. And we have a large math group that meets once a week to figure out how to deal with data. We could do this work with traditional methods, but it would be more expensive and take longer. And there are some things we have found using these methods that you couldn’t find with conventional approaches.” For example, single-stranded DNA phages and environmental herpesviruses would simply have been missed had investigators relied on conventional analytic approaches, he points out.
Surveys of rDNA using deep-sequencing approaches also reveal new patterns of microbial ecology, according to Sogin. Earlier taxon rank distribution curves, which show how many times each operational taxonomic unit (or taxon) in a community is represented, showed some taxa to be very abundant, making up 10 to 60% of all organisms in a particular microbial community. Highly abundant organisms cluster in half a dozen parts of the phylogenetic tree and look “the same,” Sogin says. Many other organisms are present in very low numbers, producing distribution curves with very long tails.
Those low-abundance organisms are difficult to measure with conventional methods. “Along comes the ability to sequence very extensively from one sample,” Sogin says. “Instead of looking at a few thousand organisms, we can look at hundreds of thousands. We’ve even done a million.” In these more extensive surveys, microbial diversity appears much greater than it did with earlier analytic tools. “In a liter of sea water taken from anywhere in the ocean, no one had reported more than 1,500 to 3,000 phylotypes,” Sogin says. “We did a study looking at 750,000 sequences. We saw on the order of 40,000 to 50,000 different bacterial phylotypes.” More impressive, those phylotypes include many rDNA sequences that diverge from the 500,000 sequences in the Ribosomal Database Project II (RDPII) database. “Most of this novelty is part of what makes up the long tail of the taxon rank distribution curve,” Sogin says.
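A taxon rank distribution curve can be built by counting reads per taxon and sorting from most to least abundant. In this Python sketch the survey is hypothetical (1,000 reads, three dominant taxa, 150 taxa seen once), and the 1% cutoff used to define the "long tail" is an arbitrary choice for illustration.

```python
from collections import Counter

def rank_abundance(taxon_labels):
    """(taxon, relative abundance) pairs, most abundant first."""
    counts = Counter(taxon_labels)
    total = sum(counts.values())
    return [(taxon, n / total) for taxon, n in counts.most_common()]

def tail_fraction(curve, threshold=0.01):
    """Share of taxa whose relative abundance is below threshold (the long tail)."""
    rare = sum(1 for _, freq in curve if freq < threshold)
    return rare / len(curve)

# Hypothetical survey of 1,000 reads: three dominant taxa plus 150 singletons.
reads = ["OTU1"] * 500 + ["OTU2"] * 250 + ["OTU3"] * 100 + [f"rare{i}" for i in range(150)]
curve = rank_abundance(reads)
print(curve[:3])
print(f"{len(curve)} taxa, {tail_fraction(curve):.0%} of them below 1% abundance")
```

Even in this small example, three taxa account for 85% of the reads while nearly all of the distinct taxa sit in the tail.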
Next-generation sequencing is also uncovering great diversity in soil microbiomes and in the microbiomes of mammals such as mice and humans. Sogin, studying mice, and David Relman of Stanford University, studying humans, find “tremendous diversity” in their microbial communities, Sogin says. For example, analysis of the gut microbes of mice reveals more than 5,000 phylotypes and, in humans, more than 10,000 phylotypes.
“So this long tail in the distribution curve and rare members of the biosphere seem to be present any place you look,” Sogin continues. “Next-generation techniques give us the ability to sequence deeply enough that we can see those rare organisms.” Next-generation technology also can detect shifts in minor populations that appear important for understanding how communities work, such as “subtle shifts in population structures” following administration of antibiotics, he says. Sogin did his initial massively parallel sequencing by sending samples to the 454 Life Sciences facility. His laboratory recently purchased its own instrument, which cost about $500,000.
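Why sequencing depth matters for seeing rare organisms can be shown with a simple probability model: a taxon with relative abundance p is missed by every one of n reads with probability (1 - p)^n. The Python sketch below assumes reads are independent random draws from a hypothetical community, with no PCR bias or sequencing error; 3,000 reads stands in for "a few thousand" and 750,000 echoes the depth of Sogin's ocean study.

```python
import math

def expected_taxa_observed(abundances, n_reads):
    """Expected number of distinct taxa seen at least once among n_reads reads.

    A taxon with relative abundance p is missed by every read with probability
    (1 - p) ** n_reads; expm1/log1p keep the arithmetic accurate for tiny p.
    """
    return sum(-math.expm1(n_reads * math.log1p(-p)) for p in abundances)

# Hypothetical community: 10 dominant taxa share 90% of cells,
# and 40,000 rare taxa share the remaining 10%.
community = [0.09] * 10 + [0.10 / 40_000] * 40_000

for depth in (3_000, 750_000):
    print(f"{depth:>7,} reads: ~{expected_taxa_observed(community, depth):,.0f} taxa expected")
```

In this model a few thousand reads reveal only a few hundred taxa, while 750,000 reads reveal tens of thousands, the same qualitative jump Sogin describes.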
Deep Sequencing Being Applied to Health Research
There is a “surprising” degree of viral diversity in stool samples from people in Pakistan with acute flaccid paralysis, who are being monitored by a polio surveillance group, according to Eric Delwart of Blood Systems Research Institute in San Francisco, Calif. “We have found a whole zoo of new viruses,” he says. “Many are plant viruses that go right through the GI tract and are probably commensals. But there is a lot of what we think are new animal viruses.” These include picornaviruses, parvoviruses, and circoviruses; circoviruses are small DNA viruses that are pathogenic in birds and pigs. Whether any of these viruses replicate in humans is not yet known. Delwart and his collaborators plan to analyze similar samples from people in Nigeria and Tunisia.
“Working in a blood bank, I am concerned about new viruses penetrating into the blood supply,” Delwart says. “New virus sequences are a good starting point to look for potential new pathogens.” Proteins expressed from newly found viral sequences can serve as antigens in assays that measure how many people have been exposed to those viruses. “Some viruses only exist in blood for a short time then move into an organ, so you need to see antibodies,” he says.
Delwart uses Sanger and 454 DNA sequencing in a complementary way. “We sometimes use Sanger for longer reads,” he says, “but when we have a large number of samples we use 454.” He sends samples to a service group at the Stanford University genome center for analysis.
Delwart, who collaborates with bioinformatics specialist Chunlin Wang of Stanford, considers data analysis a key challenge. “Wang has a cluster of eight PCs, but it still takes several weeks to get data,” Delwart says. With use of massively parallel sequencing techniques on the rise, he expects the demand for computing power to “go through the roof.”
Michael J. Sadowsky of the BioTechnology Institute and Microbial and Plant Genomics Institute at the University of Minnesota (UM) in St. Paul plans to apply such sequencing techniques to study chronic relapsing Clostridium difficile-associated diarrhea (CDAD), which results when antibiotic treatment causes an imbalance in gut flora. “A small number of people have a susceptibility to this condition,” he says. One approach to treating this condition entails reconstituting the gut flora by instilling mixtures of fecal bacteria into the stomach through a nasal tube or directly into the rectum of a patient.
“CDAD can be cured fairly easily by fecal transplantation,” Sadowsky says. However, he adds, “It’s a treatment without a scientific basis. It may be possible to find a bacterium in feces that could restore balance.” He and his collaborators are recruiting patients with CDAD, each of whom will be paired with a healthy person from the same household. He will use the 454 instrument at the UM Biomedical Genomics Center to analyze the gut microbial populations of patients before and after their clinical procedures and compare the results with the microbial populations of their household members.
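One common way to compare two microbial populations, though not necessarily the one Sadowsky's team will use, is the Bray-Curtis dissimilarity, which runs from 0 for identical communities to 1 for communities with no taxa in common. The read counts in this Python sketch are entirely hypothetical.

```python
from __future__ import annotations

def bray_curtis(a: dict[str, int], b: dict[str, int]) -> float:
    """Bray-Curtis dissimilarity between two taxon-count profiles.

    0 means identical counts; 1 means no taxa in common.
    """
    taxa = set(a) | set(b)
    shared = sum(min(a.get(t, 0), b.get(t, 0)) for t in taxa)
    total = sum(a.values()) + sum(b.values())
    return 1 - 2 * shared / total

# Hypothetical read counts per genus (1,000 reads per sample).
patient_before = {"Clostridium": 400, "Enterococcus": 500, "Bacteroides": 100}
patient_after = {"Clostridium": 50, "Enterococcus": 150, "Bacteroides": 550, "Faecalibacterium": 250}
household = {"Enterococcus": 100, "Bacteroides": 600, "Faecalibacterium": 300}

print(f"before vs. household: {bray_curtis(patient_before, household):.2f}")
print(f"after vs. household:  {bray_curtis(patient_after, household):.2f}")
```

In this invented example the patient's community moves from very unlike the household member's (0.80) to quite similar (0.10) after the procedure, the kind of shift such a study would look for.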
Meanwhile, Shafer at Stanford University and his collaborators are using deep sequencing to analyze drug resistance in two viruses that cause chronic diseases and have high mutation rates, HIV and hepatitis B virus (HBV). Although HBV is a DNA virus, it replicates through an RNA intermediate that is copied back into DNA by a reverse transcriptase, an error-prone step that introduces many mutations. The standard approach for monitoring drug resistance is Sanger sequencing, which reliably detects a variant only when it makes up roughly 30% or more of the viral population; finding minor resistance variants with it requires a great deal of work, such as sequencing many individual clones.
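The sensitivity gap can be sketched with a binomial model in Python. Assumptions: Sanger is treated as a crude fixed 30% cutoff, deep-sequencing reads are independent draws from the viral population, and a variant is "called" once it appears in a hypothetical minimum of 5 out of 1,000 reads. A real pipeline would set that cutoff against the sequencing error rate and account for PCR bias.

```python
from math import comb

SANGER_THRESHOLD = 0.30  # approximate detection limit quoted for Sanger sequencing

def sanger_detects(freq: float) -> bool:
    """Crude model: population Sanger sequencing sees a variant only above the threshold."""
    return freq >= SANGER_THRESHOLD

def p_detect(freq: float, depth: int, min_reads: int) -> float:
    """Probability that a variant at frequency freq appears in at least min_reads of depth reads.

    Binomial model: each read is an independent draw from the viral population.
    Ignores PCR bias, sequencing errors, and uneven coverage.
    """
    p_fewer = sum(comb(depth, i) * freq**i * (1 - freq) ** (depth - i) for i in range(min_reads))
    return 1 - p_fewer

for freq in (0.01, 0.05, 0.30):
    print(f"{freq:.0%} variant: Sanger={sanger_detects(freq)}, "
          f"P(>=5 of 1,000 reads)={p_detect(freq, 1_000, 5):.3f}")
```

Under this model, a variant carried by only 1% of the virus is still very likely to reach the 5-read cutoff at 1,000-read depth, whereas the Sanger cutoff misses everything below 30%.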
With 454 sequencing, Shafer examined viral sequences from 22 HIV-positive individuals who had not been treated with antiviral drugs, and found that many of them carry variant viruses with mutations that, although not directly responsible for resistance, can serve as markers for resistance mutations. To some extent, drug-resistant virus can be passed on when new infections occur, which may account for some of these variants showing up in HIV-infected individuals who have not yet been treated with antivirals.
“We expected that people infected with that [HIV] variant would have a lot of resistant virus,” Shafer says. In fact, they found that there was much less resistance than expected. In practical clinical terms, Shafer says, “We can’t quite say that even if you have those variants standard therapy works just as well. Perhaps we could conclude that a slight modification of standard therapy—substituting a protease inhibitor for a reverse transcriptase inhibitor—might be good enough.”
In analyzing HBV, 454 sequencing picked up many major resistance mutations earlier than standard sequencing, according to Shafer. “In someone off therapy for a number of months or a year, you will still find resistant variants,” he says. “So this finding could show what test to use when restarting therapy.”
As for introducing next-generation sequencing for routine use in clinical settings, Shafer says, “We are several years away.” One reason is that other analytical approaches, which detect variants making up about 20% of the viral population, work well enough when combined with information about the clinical and treatment history of a patient. “So the bar is rather high for introducing new technology. We find this assay very good for research and for getting insight into how HIV and HBV evolve resistance starting with minor variants and seeing those minors increase in frequency,” he continues. “On the other hand, before it can be used in clinical settings there needs to be a major push to improve upstream processing and downstream data manipulation.”