Tuberculosis (TB) remains one of the deadliest infectious diseases globally, with millions of new cases and over a million deaths each year. The rise of drug-resistant strains has only complicated treatment and control efforts, turning TB into a moving target for clinicians and public health officials alike. Understanding how TB spreads, evolves and becomes resistant requires more than just microscopes and cultures—it demands a detailed look at the bacterium’s genetic code.

In a recent study published in Scientific Data, Ghodousi et al. (2025) present a vast collection of whole genome sequences (WGS) from Mycobacterium tuberculosis complex (MTBC) strains collected across Italy. Spanning four major regions—Lombardy, Piedmont, Emilia-Romagna, and Lazio—this dataset includes 2,520 isolates obtained between 2017 and 2020. It’s the largest publicly available TB genomic dataset ever assembled from Italy, and it provides a critical resource for improving TB research and surveillance.
From Clinic to Code: How the Data Was Built
Every genome in this study began as a patient-derived sample. The researchers first cultured the MTBC strains, then extracted DNA using the Maxwell® 16 System and the Maxwell® 16 Tissue DNA Purification Kit from Promega. These automated tools facilitated the consistent recovery of high-quality DNA, which is critical for the next step: sequencing.
Whole genome sequencing (WGS) was primarily conducted using Illumina platforms, with a subset of isolates from the Lazio region sequenced using Ion Torrent technology. The sequencing data were then analyzed through a comprehensive bioinformatics pipeline to identify single nucleotide polymorphisms (SNPs), assign phylogenetic lineages and detect mutations associated with antibiotic resistance. To ensure the reliability of the dataset, the researchers validated sequence quality and integrated each genome with detailed clinical metadata, including drug susceptibility test (DST) results and geographic information.
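To make the resistance-detection step more concrete, here is a minimal Python sketch of how called variants might be matched against a catalog of known resistance mutations. It is not the pipeline used in the study, which relies on dedicated tools and far larger catalogs. The function names and the three-entry lookup table are assumptions for illustration; the three mutations shown are well-known examples, not results from this dataset.

```python
from typing import Dict, List, Tuple

# Illustrative catalog only (assumption): three widely reported resistance
# mutations. Real catalogs contain many more entries and confidence gradings.
RESISTANCE_CATALOG: Dict[Tuple[str, str], str] = {
    ("rpoB", "S450L"): "rifampicin",
    ("katG", "S315T"): "isoniazid",
    ("gyrA", "D94G"): "fluoroquinolones",
}


def predict_resistance(variants: List[Tuple[str, str]]) -> List[str]:
    """Return the sorted drugs for which at least one variant is in the catalog."""
    drugs = {RESISTANCE_CATALOG[v] for v in variants if v in RESISTANCE_CATALOG}
    return sorted(drugs)


def is_multidrug_resistant(drugs: List[str]) -> bool:
    """MDR-TB means resistance to at least rifampicin and isoniazid."""
    return "rifampicin" in drugs and "isoniazid" in drugs


# Hypothetical isolate carrying two catalog mutations and one unrelated variant.
isolate_variants = [("rpoB", "S450L"), ("katG", "S315T"), ("fakeGene", "A1B")]
predicted = predict_resistance(isolate_variants)
print(predicted, is_multidrug_resistant(predicted))
```

In practice, a genotypic prediction like this would be compared against the phenotypic DST result recorded in the metadata for the same isolate.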
What the Tuberculosis Genomes Revealed
This genomic atlas of TB in Italy offers rich insights into how different strains are distributed and how drug resistance mutations are evolving. Mutations in key resistance genes—such as rpoB (rifampicin resistance), katG (isoniazid resistance), and gyrA (fluoroquinolone resistance)—were cataloged and analyzed across regions. The study also traced phylogenetic lineages, allowing researchers to observe how various strains are related and how they may have moved through populations.
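As a rough illustration of what cataloging mutations across regions can look like, the Python sketch below tallies resistance-gene hits per region. The records are made-up placeholders, not values from the study, and `count_by_region` is a hypothetical helper name.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def count_by_region(records: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Count how often each resistance gene is hit, grouped by region."""
    counts: Dict[str, Counter] = {}
    for region, gene in records:
        counts.setdefault(region, Counter())[gene] += 1
    return counts


# Placeholder records (assumption): (region, gene carrying a resistance mutation).
toy_records = [
    ("Lombardy", "rpoB"),
    ("Lombardy", "rpoB"),
    ("Lombardy", "katG"),
    ("Lazio", "gyrA"),
]

for region, gene_counts in count_by_region(toy_records).items():
    print(region, dict(gene_counts))
```

With the real metadata, the same grouping could be done on the published isolate table to compare resistance profiles between regions.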
The dataset provides a molecular epidemiology framework that can help identify clusters of transmission, detect emerging resistance, and inform tailored public health responses. It also offers a foundation for studying whether emerging multidrug-resistant strains evolved locally or were introduced from other regions.
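One common way to look for possible transmission clusters is to group isolates whose genomes differ by only a few SNPs. The Python sketch below shows the idea with pairwise SNP distances and single-linkage grouping. It is a simplified illustration, not the method used in the study: the toy SNP profiles, the function names, and the default threshold of 5 SNPs are all assumptions, and real analyses choose cutoffs carefully and work from aligned variant calls.

```python
from itertools import combinations
from typing import Dict, List


def snp_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length SNP profiles differ."""
    if len(a) != len(b):
        raise ValueError("SNP profiles must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def transmission_clusters(profiles: Dict[str, str], threshold: int = 5) -> List[List[str]]:
    """Single-linkage grouping: link isolates whose distance is <= threshold."""
    parent = {name: name for name in profiles}

    def find(x: str) -> str:
        # Follow parent links to the group root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(profiles, 2):
        if snp_distance(profiles[a], profiles[b]) <= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for name in profiles:
        groups.setdefault(find(name), []).append(name)
    # Only groups with at least two isolates count as clusters.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Toy profiles (assumption): concatenated bases at variable sites.
toy_profiles = {
    "isolate_A": "ACGTACGTAC",
    "isolate_B": "ACGTACGTAT",
    "isolate_C": "TGCATGCATG",
}
print(transmission_clusters(toy_profiles, threshold=2))
```

Genomic clusters like these are only a starting point; they are typically interpreted together with epidemiological information before any transmission link is inferred.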
Why This Matters
Traditional TB surveillance often relies on slower methods such as phenotypic drug susceptibility testing and contact tracing. While still important, these tools can miss key details, especially when it comes to detecting resistance early or understanding transmission patterns. By integrating WGS into surveillance programs, officials can gain a near real-time view of TB evolution and spread.
The study’s commitment to open data is equally important. By making this vast genomic resource publicly available, the researchers have enabled a global community of scientists to explore, reanalyze, and build upon this work. Whether you’re modeling the spread of resistant TB in Europe or designing new diagnostics, this dataset offers a solid foundation.
The Takeaway
This research underscores how genomic surveillance, powered by robust lab workflows and supported by tools like the Promega Maxwell® platform, can transform our understanding of infectious disease. By decoding the genome of each TB strain, we’re not just learning about the past—we’re equipping ourselves to shape a smarter, faster response in the future.
Reference
Ghodousi, A. et al. (2025) Comprehensive Whole Genome Sequencing Dataset of Mycobacterium tuberculosis Strains Collected Across Italy. Sci Data 12, 624. DOI: 10.1038/s41597-025-04966-1

Sara Millevolte
