Last week I read an article in Wired Science that described how an outbreak of antibiotic-resistant Klebsiella pneumoniae was tracked in real time at an NIH hospital using DNA sequencing technologies. The article explained how whole genome sequencing of disease isolates and environmental samples from the hospital was used to track the source and spread of the outbreak.
The scientists monitoring the outbreak tracked spontaneous random mutations in the K. pneumoniae DNA sequence to determine that the outbreak came from a single source, and to follow the spread of the organism within the hospital. The sequencing information helped investigators identify when and where infection occurred, and also trace transmission of the infection from person to person. It also revealed that the order of transmission differed from the order in which patients presented with symptoms, and helped identify how the organism spread between individuals.
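To make the idea of tracking mutations concrete, here is a minimal Python sketch, assuming the isolate genomes have already been aligned so that positions line up. It counts single-nucleotide differences (SNPs) between every pair of isolates and links them into a minimum spanning tree with Prim's algorithm, one simple way to see which isolates are most closely related. The isolate names and sequences are made up for illustration, and this is not necessarily the method the NIH team used. An undirected tree like this shows relatedness, not who infected whom, which is why sequence data has to be combined with epidemiological information.

```python
from itertools import combinations


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count positions where two aligned sequences differ, skipping gaps ('-') and unknown bases ('N')."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and a not in "-N" and b not in "-N"
    )


def minimum_spanning_tree(isolates: dict[str, str]) -> list[tuple[str, str, int]]:
    """Prim's algorithm: repeatedly attach the unconnected isolate that is fewest SNPs from the tree."""
    names = sorted(isolates)
    dist = {
        frozenset(pair): snp_distance(isolates[pair[0]], isolates[pair[1]])
        for pair in combinations(names, 2)
    }
    in_tree = [names[0]]
    edges = []
    while len(in_tree) < len(names):
        u, v = min(
            ((u, v) for u in in_tree for v in names if v not in in_tree),
            key=lambda edge: dist[frozenset(edge)],
        )
        edges.append((u, v, dist[frozenset((u, v))]))
        in_tree.append(v)
    return edges


# Hypothetical toy data: four isolates sharing a 10-base aligned region.
isolates = {
    "patient_1": "ACGTACGTAC",
    "patient_2": "ACGTACGTAT",
    "patient_3": "ACGAACGTAT",
    "sink_drain": "ACGAACCTAT",
}

for u, v, snps in minimum_spanning_tree(isolates):
    print(f"{u} -- {v}: {snps} SNP(s)")
```

With this toy data the tree is a simple chain, patient_1 to patient_2 to patient_3 to sink_drain, with each link one SNP apart. Real outbreak genomes differ at a handful of positions out of millions, which is exactly why whole genome data is needed to see the differences at all.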
The article describes how epidemiology, infection control and sequence identification were used together to influence the outcome in this situation, but it also shows the power of whole genome sequencing to find and track subtle differences between isolates that could not have been identified in any other way.
To me, this is a powerful illustration of just how far DNA sequencing has come over the last few years. Not so long ago, the idea of sequencing the entire genome of numerous disease isolates during an outbreak would have been almost laughable, an idea confined to episodes of The X-Files or to science fiction stories. Now, thanks to advanced automated sequencing technologies and the computing power to analyze the results, it is doable within a reasonable timeframe for hospitals with access to the right facilities. Although this type of investigation is still beyond the capabilities of most hospitals, the costs and turnaround times for sequencing are coming down rapidly as new technologies capable of faster, cheaper analysis become available.
We have come a very long way since the days when DNA sequencing was a laborious process involving pouring a gel, running samples, and manually reading the resulting autoradiogram, hoping to get a read of 50–100 bases. Reading the Wired article prompted me to find out more about the newer types of sequencing technology available today. Here's what I learned about each:
First generation automated sequencers used the Sanger method, which is based on the incorporation of dye-labeled, chain-terminating nucleotides (dideoxynucleotides, or ddNTPs) into the growing DNA strand by DNA polymerase. Once a ddNTP is incorporated, the strand cannot be extended any further, so the reaction generates a range of DNA fragments of different lengths. These fragments are then separated by size, and the sequence is determined from the dye label on the terminating ddNTP of each fragment. The resulting reads can then be assembled into larger contiguous sequences by matching overlapping read sequences. Automated implementations of the Sanger approach were the mainstay of the early whole genome sequencing successes in the 1990s. Read lengths of up to about 800 bases were possible, a huge improvement on manual methods, but throughput was still limited by the small amount of DNA that could be processed in a single run. These technologies were used for the Human Genome Project, which took more than a decade to complete.
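The logic of chain termination and size separation can be simulated in a few lines of Python. This is a toy sketch, not a model of the real chemistry: it works directly on the newly synthesized strand, assumes a ddNTP is equally likely to be incorporated at any position (the hypothetical `p_stop` parameter), and records each fragment as its length plus the dye of its terminal base. Sorting the fragments by length and reading off the terminal dyes recovers the sequence. The hypothetical `merge_reads` helper then shows the simplest form of assembly by overlap; real assemblers also have to cope with sequencing errors, repeats and reads from the opposite strand.

```python
import random


def chain_termination(strand: str, n_molecules: int = 2000, p_stop: float = 0.05, seed: int = 1):
    """Simulate a Sanger reaction: each copy of the strand is extended until a ddNTP stops it."""
    rng = random.Random(seed)
    fragments = []
    for _ in range(n_molecules):
        for position, base in enumerate(strand, start=1):
            if rng.random() < p_stop:  # a dye-labeled ddNTP was incorporated here
                fragments.append((position, base))  # (fragment length, dye of terminal base)
                break
    rng.shuffle(fragments)  # fragments leave the reaction in no particular order
    return fragments


def read_ladder(fragments) -> str:
    """Size separation: order fragments by length and read the terminal dye at each length."""
    calls = dict(sorted(fragments))  # every fragment of a given length ends in the same base
    longest = max(calls)
    # A length with no fragments would show up as an unreadable position ('N').
    return "".join(calls.get(length, "N") for length in range(1, longest + 1))


def merge_reads(left: str, right: str, min_overlap: int = 3):
    """Join two reads if a suffix of `left` matches a prefix of `right` (longest overlap wins)."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return left + right[size:]
    return None  # no sufficient overlap


strand = "GATTACAGGCTTACGATCCA"
print(read_ladder(chain_termination(strand)))
print(merge_reads("ACGTTGCA", "TGCATTAG"))  # ACGTTGCATTAG
```

Lowering `p_stop` lets more molecules reach the far end of the strand, which is the same trade-off that limits how long a real Sanger read can be.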
Second generation sequencing methods vastly increase throughput by eliminating the need to separate DNA fragments by size, and by enabling simultaneous interrogation of large numbers of fragments in a single run. These technologies use PCR to amplify clusters of each DNA template, which are attached to a solid surface and then interrogated with nucleotides and imaged or measured as the DNA is sequenced. Several technologies are available. Illumina sequencers use reversible terminator dye-labeled nucleotides to interrogate the captured DNA. Once each base is read, the terminator and dye are removed by cleavage and washing, leaving a normal nucleotide. The strand can then be extended again, and the process is repeated to continue sequencing along the strand. Instead of detecting dye-labeled nucleotides, the Ion Torrent sequencer measures the release of hydrogen ions upon base incorporation, and the 454 system measures the light released upon nucleotide incorporation. Because second generation sequencers process large numbers of samples in parallel, they increase speed and throughput and make whole genome sequencing much faster and more affordable. The real-time monitoring of the K. pneumoniae outbreak reported in the Wired article was made possible by second generation sequencing technologies, as was the recent ENCODE project, which used them to map functional elements across the human genome, including the regions between genes, and took 5 years to complete.
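The contrast between Illumina's one-base-per-cycle chemistry and the flow-based approach of the 454 and Ion Torrent systems is easiest to see in a small simulation. In this sketch, one nucleotide type is washed over the template per flow in a fixed, hypothetical order (`TACG`). Because these nucleotides carry no terminator, a run of identical bases (a homopolymer) is incorporated in a single flow, and the signal (light for 454, hydrogen ions for Ion Torrent) is assumed to be proportional to the number of bases added. Decoding rounds each signal to a whole number of bases, which is why misjudging homopolymer length is a well-known error mode of these platforms. With reversible terminators, by contrast, each cycle adds at most one base.

```python
FLOW_ORDER = "TACG"  # hypothetical flow order, for illustration only


def flowgram(strand: str, flow_order: str = FLOW_ORDER) -> list[tuple[str, float]]:
    """Simulate an idealized flowgram: one nucleotide type per flow, signal = bases incorporated."""
    if set(strand) - set(flow_order):
        raise ValueError("strand contains bases that are never flowed")
    signals = []
    position = 0
    flow = 0
    while position < len(strand):
        nucleotide = flow_order[flow % len(flow_order)]
        run = 0
        # No terminator: the polymerase extends through the whole homopolymer in one flow.
        while position < len(strand) and strand[position] == nucleotide:
            run += 1
            position += 1
        signals.append((nucleotide, float(run)))
        flow += 1
    return signals


def decode(signals: list[tuple[str, float]]) -> str:
    """Call bases by rounding each (possibly noisy) signal to a whole number of bases."""
    return "".join(nucleotide * round(signal) for nucleotide, signal in signals)


signals = flowgram("TTACCG")
print(signals)  # [('T', 2.0), ('A', 1.0), ('C', 2.0), ('G', 1.0)]
print(decode(signals))  # TTACCG
# A noisy signal of 2.6 for what was really a 2-base run is called as 3 bases:
print(decode([("T", 2.6), ("A", 1.1)]))  # TTTA
```

Flows where the offered nucleotide does not match produce a zero signal, so a read needs noticeably more flows than it has bases.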
Third generation sequencing methods interrogate single DNA molecules, eliminating the need for PCR and any associated errors resulting from misincorporation of bases during amplification. They also speed up the process by eliminating the need to halt sequencing for washing after each base incorporation (such as the cleavage and washing steps described for second generation systems). They have the potential to offer higher throughput, faster sequencing, longer read lengths and reduced cost compared with second generation methods. One example of a third generation sequencing technology is the Pacific Biosciences SMRT (single molecule, real-time) system. In this system the DNA is synthesized in nanometer-scale chambers, each containing a single DNA polymerase molecule. Laser light from underneath penetrates only the bottom 30 nm of each chamber. Fluorescently labeled nucleotides diffuse into the chamber from above and do not fluoresce until they reach this bottom 30 nm. When the polymerase binds the correct nucleotide, it holds it in the illuminated zone while incorporating it into the growing DNA strand, a process that takes much longer than simple diffusion through the zone. As a result, incorporated nucleotides give a stronger signal than non-incorporated ones. This allows DNA synthesis to be monitored in real time and can generate sequence reads of around 2,900 bases.
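The key idea behind SMRT detection, that an incorporated nucleotide lingers in the illuminated zone far longer than one merely diffusing past, can be sketched as a simple filter over a stream of detected light pulses. This is a deliberately simplified illustration: the `Pulse` record, the pulse durations and the 5 ms threshold are all hypothetical values chosen for the example, pulse duration stands in for the stronger signal described above, and a real base caller weighs many more features of the signal. Writing the caller as a generator mirrors the real-time aspect, since bases are emitted as pulses arrive rather than after the run ends.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Pulse:
    base: str           # base identified by the dye color of the pulse
    duration_ms: float  # how long the fluorescence persisted in the illuminated zone


def call_bases(pulses: Iterable[Pulse], min_duration_ms: float = 5.0) -> Iterator[str]:
    """Yield a base for each pulse long enough to be an incorporation event (hypothetical threshold)."""
    for pulse in pulses:
        if pulse.duration_ms >= min_duration_ms:
            yield pulse.base  # held by the polymerase: count it as incorporated
        # Shorter pulses are treated as nucleotides diffusing briefly through the zone.


# Hypothetical trace: brief flashes from diffusing nucleotides mixed with long incorporation pulses.
trace = [
    Pulse("A", 40.0),
    Pulse("G", 0.01),
    Pulse("C", 25.0),
    Pulse("T", 0.02),
    Pulse("T", 60.0),
]
print("".join(call_bases(trace)))  # ACT
```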
Further Information on Sequencing Techniques
There is a good overview of second generation sequencing techniques here, with some nice illustrations of each technique.
Here is a video covering next generation sequencing methods: