Massively parallel sequencing (MPS), also called next-generation sequencing (NGS), has the potential to alleviate some of the biggest challenges facing forensic laboratories, namely degraded DNA and samples containing DNA from multiple contributors. Unlike capillary electrophoresis, MPS genotyping methods do not require fluorescently labeled oligonucleotides to distinguish amplification products of similar size. Furthermore, it is not necessary to design primers within a color channel to generate amplicons of different sizes to avoid allele overlap. Consequently, all the amplicons can be of a similar, small size (typically <275 base pairs). The small size of the amplicons is particularly advantageous when working with degraded DNA. Because the alleles are distinguished by both the number of repeats and the DNA sequence, additional information can be derived from a sample. This can be especially important when genotyping mixtures. As previously demonstrated (1), this sequence variation can help distinguish stutter “peaks” from minor contributor alleles.
Because there is no reliance upon size and fluorescent label, significantly greater multiplexing is possible with MPS approaches. In addition to autosomal short tandem repeats (STRs), we can also sequence Y-STRs, single nucleotide polymorphisms (SNPs), and the mitochondrial DNA control region. The advantage of this approach is that the forensic analyst does not need to know in advance which method of genotyping would benefit a given sample most.
Despite these major advantages, there are limitations to the near-term, broad deployment of current MPS technology in forensic laboratories. The limitations fall into four main categories: workflow, costs, performance with forensically relevant samples, and community guidelines.
The initial steps in the MPS workflow on the Illumina sequencing platforms (MiSeq® and MiSeq® FGx) are virtually identical to the workflow used for capillary electrophoresis.* DNA is extracted from the samples, the amount of DNA is quantitated, and PCR is performed to amplify the genomic DNA regions of interest (e.g., STR loci or SNP sites). At this point in a capillary electrophoresis workflow, an aliquot of the amplification reaction would typically be combined with formamide and heated to denature the amplicons, then the fragments would be separated and detected on a capillary electrophoresis device. With MPS, the amplicons instead need to be modified to attach the adaptors that are necessary for DNA sequencing. The modification may entail a second amplification to add the adaptor sequences onto the initial amplicons, or a series of enzymatic reactions to ligate the adaptor sequences onto the amplicons. Regardless of the approach used, excess primers and primer-dimers must be eliminated by size selection using paramagnetic beads or columns. After purification, the amplicons are quantitated and normalized prior to loading onto the instrument. The actual sequencing reaction requires a minimum of about 30 hours.
The reagent cost and limited throughput (time to result) make it prohibitive to routinely sequence a single sample. Costs are reduced and throughput is increased by pooling a number of samples for each sequencing run. The adaptors used for each sample include a unique sequence called an index or a barcode (i.e., each sample is assigned its own unique index). After the individual samples are processed as described above, they are pooled with other samples prior to loading onto the instrument. The unique index allows the sequence data from each sample to be bioinformatically placed into its own “bin” for analysis. A sequencing run generates a minimum of 10 gigabytes of data. Although data storage costs have declined dramatically over the last several years, laboratories will have to invest in hardware to store and retrieve the vast amounts of data generated.
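The binning step described above can be illustrated with a minimal Python sketch. This is not the software used on the MiSeq® or at Promega; the function name `demultiplex`, the six-base index sequences, the sample names and the reads are all hypothetical, and the sketch assumes exact index matches with no tolerance for sequencing errors in the index.

```python
from collections import defaultdict


def demultiplex(reads, index_to_sample):
    """Group reads into per-sample bins by their index (barcode).

    reads: iterable of (index_sequence, read_sequence) tuples.
    index_to_sample: dict mapping each index sequence to a sample name.
    Reads with an unrecognized index go to the "undetermined" bin.
    """
    bins = defaultdict(list)
    for index_seq, read_seq in reads:
        sample = index_to_sample.get(index_seq, "undetermined")
        bins[sample].append(read_seq)
    return dict(bins)


# Hypothetical indexes and reads, for illustration only.
index_to_sample = {"ACGTAC": "sample_A", "TGCATG": "sample_B"}
reads = [
    ("ACGTAC", "TCTATCTATCTATCTA"),
    ("TGCATG", "TCTATCTGTCTATCTA"),
    ("ACGTAC", "TCTATCTATCTA"),
    ("GGGGGG", "TCTATCTA"),  # index not assigned to any sample
]

bins = demultiplex(reads, index_to_sample)
for sample in sorted(bins):
    print(sample, len(bins[sample]))
```

Keeping an explicit "undetermined" bin, rather than discarding unmatched reads, makes it easy to see how much of a run could not be assigned to any sample.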
Most studies demonstrating the utility of MPS for forensic genotyping have been performed on single-source samples of high quality or fabricated mixtures generated with high-quality samples. Additional data, generated in multiple laboratories, is needed to better define an appropriate analytical threshold. Once established, the analytical threshold will define the number of samples that can be indexed, especially when using casework samples. For example, when indexing 10 samples, it is possible to distinguish minor contributors from stutter artifacts associated with the major contributor (assuming a difference in the DNA sequence). At higher levels of indexing, the number of reads from a minor contributor will quickly become marginalized. This is clearly demonstrated with data generated at Promega (2). Samples containing a high ratio of female to male DNA, typical of many sexual assault samples, can present problems with MPS when attempting to simultaneously sequence autosomal and Y-STR alleles. At female-to-male DNA ratios of 216:1, we observed <100 reads for the Y-STR loci when indexing 10 samples. If a greater number of samples were sequenced in the same run (i.e., an increased number of indexed samples), the number of observed reads would likely fall below limits of detection. The data suggests that the high concentration of female autosomal amplicons is simply saturating the MiSeq® flow cell. In contrast, this problem is not observed when only attempting to sequence Y-STRs or when using capillary electrophoresis-based Y-STR genotyping systems (2, 3).
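The effect of indexing on minor-contributor read depth can be approximated with simple arithmetic. The sketch below is a deliberately idealized model, not a description of the Promega data: it assumes reads are split evenly across indexed samples and loci, and that reads within a sample are proportional to each contributor's share of the DNA. Real runs show sample-to-sample and locus-to-locus imbalance. The function name `expected_minor_reads`, the run size of 1,000,000 reads and the 50-locus multiplex are hypothetical placeholders; only the 216:1 ratio and the 10-sample indexing level come from the text.

```python
def expected_minor_reads(total_reads, n_samples, n_loci, major_to_minor_ratio):
    """Idealized expected reads per locus for the minor contributor.

    Assumes an even split of reads across samples and loci, and reads
    proportional to each contributor's share of the input DNA.
    """
    reads_per_locus = total_reads / n_samples / n_loci
    minor_fraction = 1 / (major_to_minor_ratio + 1)
    return reads_per_locus * minor_fraction


# Hypothetical run size and multiplex size, for illustration only.
TOTAL_READS = 1_000_000
N_LOCI = 50

at_10_samples = expected_minor_reads(TOTAL_READS, 10, N_LOCI, 216)
at_40_samples = expected_minor_reads(TOTAL_READS, 40, N_LOCI, 216)

print(f"10 indexed samples: {at_10_samples:.1f} reads per locus")
print(f"40 indexed samples: {at_40_samples:.1f} reads per locus")
```

Under these assumptions, quadrupling the number of indexed samples cuts the minor contributor's expected reads per locus by a factor of four, which is why a fixed analytical threshold effectively caps the level of indexing.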
Community guidelines and improvements to the technology are needed before routine adoption of MPS genotyping in forensic laboratories. Several questions remain open. Some countries forbid the use of phenotype or ancestry SNPs. Will the laboratory be expected to review and report SNP data if a full profile is generated with autosomal STRs? Efforts are underway to gather feedback from the community. For example, Battelle is currently managing an NIJ Feasibility and Guidance Study involving a variety of federal, state and local forensic laboratories. The purpose of the study is to evaluate existing MPS genotyping systems (reagents, instrumentation and workflow). At the conclusion of the study, the participating laboratories will present their observations and recommendations (personal communication from Rich Guerrieri, the Battelle scientist managing the study). In addition, workshops have been organized at international meetings over the last few years to increase awareness of the strengths and weaknesses of the technology and how it relates to current capillary electrophoresis approaches for genotyping. As additional data is generated by the forensic community, guidelines are beginning to emerge (4). In the interim, MPS is a useful complement to capillary electrophoresis.
1. Zeng, X., King, J., Hermanson, S., Patel, J., Storts, D.R. and Budowle, B. (2015) An evaluation of the PowerSeq™ Auto system: A multiplex short tandem repeat marker kit compatible with massively parallel sequencing. Forensic Sci. Int. Genet. 19, 172–9.
2. Massively Parallel Sequencing for Forensic DNA Analysis (webinar) http://www.promega.com/resources/webinars/worldwide/archive/massively-parallel-sequencing-for-forensic-analysis/
3. Thompson, J.M., Ewing, M.M., Frank, W.E., Pogemiller, J.M., Nolde, C.A., Koehler, D.J., Shaffer, A.M., Rabbach, D.R., Fulmer, P.M., Sprecher, C.J. and Storts, D.R. (2013) Developmental validation of the PowerPlex® Y23 System: A single multiplex Y-STR analysis system for casework and database samples. Forensic Sci. Int. Genet. 7, 240–50.
4. Parson, W. et al. (2016) Massively parallel sequencing of forensic STRs: Considerations of the DNA commission of the International Society for Forensic Genetics (ISFG) on minimal nomenclature requirements. Forensic Sci. Int. Genet. 22, 54–63.
*I have chosen to comment on the MiSeq® platforms for two reasons: our experience at Promega is limited to the Illumina MiSeq®, and the Thermo Fisher Ion PGM™ System is used primarily for SNP analysis.