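Assembly and Annotation of the Human Mitochondrial Genome
De novo assembly and computational analysis pipelines for mitochondrial genomes
This group of papers focuses on de novo assembly, reconstruction, and processing workflows for mitochondrial genomes from Sanger, NGS, and long-read sequencing data, including integrated analysis tools and automated pipelines optimized for different sequencing platforms.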
- MitoZ: A toolkit for mitochondrial genome assembly, annotation and visualization(Guanliang Meng, Yiyuan Li, Chentao Yang, Shanlin Liu, 2018, bioRxiv)
- MitoZ: a toolkit for animal mitochondrial genome assembly, annotation and visualization(Guanliang Meng, Yiyuan Li, Chentao Yang, Shanlin Liu, 2019, Nucleic Acids Research)
- MitoFish and MitoAnnotator: A Mitochondrial Genome Database of Fish with an Accurate and Automatic Annotation Pipeline(W. Iwasaki, Tsukasa Fukunaga, Ryota Isagozawa, Koichiro Yamada, Y. Maeda, Takashi P. Satoh, T. Sado, K. Mabuchi, H. Takeshima, M. Miya, M. Nishida, 2013, Molecular Biology and Evolution)
- MitoSuite: a graphical tool for human mitochondrial genome profiling in massive parallel sequencing(Koji Ishiya, S. Ueda, 2017, PeerJ)
- From cheek swabs to consensus sequences: an A to Z protocol for high-throughput DNA sequencing of complete human mitochondrial genomes(Andrew C. Clarke, S. Prost, Jo‐Ann L. Stanton, Timothy J White, M. Kaplan, Elizabeth A. Matisoo-Smith, The Genographic Consortium, 2014, BMC Genomics)
- MitoTool: a web server for the analysis and retrieval of human mitochondrial DNA sequence variations.(Long Fan, Yong-Gang Yao, 2011, Mitochondrion)
- DeGeCI 1.1: a web platform for gene annotation of mitochondrial genomes(L Fiedler, M Bernt, M Middendorf, 2024, Bioinformatics Advances)
- Index-free de novo assembly and deconvolution of mixed mitochondrial genomes(BJ McComish, SFK Hills, PJ Biggs, 2010, Genome Biology and …)
- Long read mitochondrial genome sequencing using Cas9-guided adaptor ligation(Amy R. Vandiver, Brittany Pielstick, Timothy Gilpatrick, Austin N. Hoang, Hillary J. Vernon, Jonathan Wanagat, Winston Timp, 2022, Mitochondrion)
- Comparison of mitochondrial DNA variants detection using short- and long-read sequencing(Ahmed N. Alkanaq, K. Hamanaka, F. Sekiguchi, M. Taguri, A. Takata, N. Miyake, S. Miyatake, T. Mizuguchi, N. Matsumoto, 2019, Journal of Human Genetics)
- Sequencing and de novo assembly of 150 genomes from Denmark as a population reference(L. Maretty, J. M. Jensen, Bent Petersen, J. Sibbesen, Siyang Liu, Palle Villesen, Laurits Skov, Kirstine Belling, C. Have, J. Izarzugaza, M. Grosjean, J. Bork-Jensen, J. Grove, T. Als, Shujia Huang, Yuqi Chang, Ruiqi Xu, Weijian Ye, Junhua Rao, Xiaosen Guo, Jihua Sun, H. Cao, C. Ye, J. V. Beusekom, T. Espeseth, E. Flindt, R. M. Friborg, Anders E. Halager, S. Hellard, C. Hultman, F. Lescai, Shengting Li, O. Lund, Peter Løngreen, T. Mailund, M. Matey-Hernandez, O. Mors, Christian N. S. Pedersen, Thomas Sicheritz-Pontén, P. Sullivan, Ali Syed, D. Westergaard, Rachita Yadav, Ning Li, Xun Xu, T. Hansen, A. Krogh, L. Bolund, T. Sørensen, O. Pedersen, Ramneek Gupta, Simon Rasmussen, Søren Besenbacher, A. Børglum, Jun Wang, H. Eiberg, K. Kristiansen, S. Brunak, M. Schierup, 2017, Nature)
- A high-throughput Sanger strategy for human mitochondrial genome sequencing(Elizabeth A. Lyons, M. Scheible, K. Sturk-Andreaggi, J. Irwin, R. Just, 2013, BMC Genomics)
- The diversity present in 5140 human mitochondrial genomes.(Luísa Pereira, Fernando Freitas, V. Fernandes, Joana B. Pereira, Marta D. Costa, S. Costa, V. Máximo, V. Macaulay, Ricardo Rocha, D. Samuels, 2009, The American Journal of Human Genetics)
- Aberrant Mitochondrial tRNA Genes Appear Frequently in Animal Evolution(Iuliia Ozerova, Jörg Fallmann, M. Mörl, Matthias Bernt, Sonja J. Prohaska, Peter F. Stadler, 2024, Genome Biology and Evolution)
- Gene annotation errors are common in the mammalian mitochondrial genomes database(Carlos F. Prada, J. Boore, 2019, BMC Genomics)
- Structures of the human mitochondrial ribosome in native states of assembly(Alan Brown, S. Rathore, D. Kimanius, S. Aibara, Xiao-chen Bai, J. Rorbach, Alexey Amunts, V. Ramakrishnan, 2017, Nature Structural & Molecular Biology)
- Separating and Segregating the Human Mitochondrial Genome.(T. Nicholls, C. Gustafsson, 2018, Trends in Biochemical Sciences)
- A benchmarking of human mitochondrial DNA haplogroup classifiers from whole-genome and whole-exome sequence data(Víctor García-Olivares, A. Muñoz-Barrera, J. Lorenzo-Salazar, Carlos Zaragoza‐Trello, L. A. Rubio-Rodríguez, A. Díaz-de Usera, D. Jáspez, Antonio Iñigo Campos, R. González-Montelongo, C. Flores, 2021, Scientific Reports)
- Mitochondrial genomes gleaned from human whole-exome sequencing(E. Picardi, G. Pesole, 2012, Nature Methods)
- Genetic variation and the de novo assembly of human genomes(Mark J. P. Chaisson, R. Wilson, E. Eichler, 2015, Nature Reviews Genetics)
- mtDNA-Server: next-generation sequencing data analysis of human mitochondrial DNA in the cloud(H. Weissensteiner, L. Forer, C. Fuchsberger, Bernd Schöpf, A. Kloss-Brandstätter, Günther Specht, F. Kronenberg, S. Schönherr, 2016, Nucleic Acids Research)
- Advance in the assembly of the plant mitochondrial genomes using high‐throughput DNA sequencing data of total cellular DNAs(Y. Ni, Jingling Li, Yihui Tan, Guoan Shen, Chang Liu, 2025, Plant Biotechnology Journal)
- An efficient preprocessing workflow tailored for mitochondrial genome assembly from fragmented DNA.(Yongheng Zhou, Peng Gao, Shuhui Yang, Yanchun Xu, 2025, Forensic Science International)
- HiFi long-read amplicon sequencing for full-spectrum variants of human mtDNA(Yan Lin, Jiayin Wang, Ran Xu, Zhe Xu, Yifan Wang, Shirang Pan, Yan Zhang, Qing Tao, Yuying Zhao, C. Yan, Zhenhua Cao, K. Ji, 2024, BMC Genomics)
- mtDNA analysis using Mitopore(Jochen Dobner, Thach Nguyen, Mario Pavez-Giani, Lukas Cyganek, F. Distelmaier, J. Krutmann, A. Prigione, Andrea Rossi, 2024, Molecular Therapy - Methods & Clinical Development)
- Nanopore long-read next-generation sequencing for detection of mitochondrial DNA large-scale deletions(Chiara Frascarelli, N. Zanetti, A. Nasca, Rossella Izzo, C. Lamperti, E. Lamantea, A. Legati, D. Ghezzi, 2023, Frontiers in Genetics)
- MToolBox: a highly automated pipeline for heteroplasmy annotation and prioritization analysis of human mitochondrial variants in high-throughput sequencing(C. Calabrese, D. Simone, M. Diroma, Mariangela Santorsola, C. Guttà, G. Gasparre, E. Picardi, G. Pesole, M. Attimonelli, 2014, Bioinformatics)
- A Revised Timescale for Human Evolution Based on Ancient Mitochondrial Genomes(Qiaomei Fu, Alissa Mittnik, Philip L. F. Johnson, Kirsten I. Bos, Martina Lari, Ruth Bollongino, Chengkai Sun, Liane Giemsch, Ralf W. Schmitz, Joachim Bürger, Anna María Ronchitelli, Fabio Martini, Renata Grifoni Cremonesi, Jiřı́ Svoboda, Peter Bauer, David Caramelli, Sergi Castellano, David Reich, Svante Pääbo, Johannes Krause, 2013, Current Biology)
- An integrated pipeline for next-generation sequencing and annotation of mitochondrial genomes(A. Jex, Ross S Hall, D. Littlewood, R. Gasser, 2009, Nucleic Acids Research)
- Investigating Human Mitochondrial Genomes in Single Cells(M. Diroma, Angelo Sante Varvara, M. Attimonelli, G. Pesole, E. Picardi, 2020, Genes)
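Functional annotation of the mitochondrial genome and clinical variant interpretation
This group of studies addresses the identification of gene boundaries in mitochondrial genomes, the accurate annotation of RNA and protein-coding genes, and tools for variant detection, heteroplasmy analysis, and pathogenicity interpretation in a clinical setting.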
- Automating the identification of DNA variations using quality-based fluorescence re-sequencing: analysis of the human mitochondrial genome.(M. Rieder, Scott L. Taylor, V. Tobe, D. Nickerson, 1998, Nucleic Acids Research)
- MitoBamAnnotator: A web-based tool for detecting and annotating heteroplasmy in human mitochondrial DNA sequences.(I. Zhidkov, T. Nagar, D. Mishmar, E. Rubin, 2011, Mitochondrion)
- Bioinformatics Tools and Databases to Assess the Pathogenicity of Mitochondrial DNA Variants in the Field of Next Generation Sequencing(C. Bris, D. Goudenège, V. Desquiret-Dumas, M. Charif, E. Colin, D. Bonneau, P. Amati‐Bonneau, G. Lenaers, P. Reynier, V. Procaccio, 2018, Frontiers in Genetics)
- Next-Generation Sequencing of Human Mitochondrial Reference Genomes Uncovers High Heteroplasmy Frequency(M. X. Sosa, I. Sivakumar, Samantha Maragh, V. Veeramachaneni, R. Hariharan, M. Parulekar, Karin M. Fredrikson, T. Harkins, Jeffrey Lin, A. Feldman, Pramila Tata, G. Ehret, A. Chakravarti, 2012, PLoS Computational Biology)
- MitoScape: A big-data, machine-learning platform for obtaining mitochondrial DNA from next-generation sequencing data(Larry N. Singh, Brian M. Ennis, Bryn Loneragan, N. Tsao, M. Sanchez, Jianping Li, P. Acheampong, O. Tran, I. Trounce, Yuankun Zhu, P. Potluri, Regeneron Genetics Center, B. Emanuel, D. Rader, Z. Arany, S. Damrauer, A. Resnick, S. Anderson, D. Wallace, 2021, PLOS Computational Biology)
- Enhanced mitochondrial genome analysis: bioinformatic and long-read sequencing advances and their diagnostic implications(W. Macken, Micol Falabella, C. Pizzamiglio, C. Woodward, E. Scotchman, L. Chitty, J. Polke, E. Bugiardini, M. Hanna, J. Vandrovcova, N. Chandler, R. Labrum, R. Pitceathly, 2023, Expert Review of Molecular Diagnostics)
- Bioinformatics Tools for NGS-Based Identification of Single Nucleotide Variants and Large-Scale Rearrangements in Mitochondrial DNA(Marco Barresi, Giulia Dal Santo, Rossella Izzo, A. Zauli, E. Lamantea, L. Caporali, Daniele Ghezzi, A. Legati, 2025, BioTech)
- MITOMASTER: a bioinformatics tool for the analysis of mitochondrial DNA sequences(Marty C. Brandon, E. Ruiz‐Pesini, D. Mishmar, V. Procaccio, M. Lott, K. Nguyen, Syawal Spolim, U. Patil, P. Baldi, D. Wallace, 2009, Human Mutation)
- A comprehensive collection of annotations to interpret sequence variation in human mitochondrial transfer RNAs(M. Diroma, Paolo Lubisco, M. Attimonelli, 2016, BMC Bioinformatics)
- Improved systematic tRNA gene annotation allows new insights into the evolution of mitochondrial tRNA structures and into the mechanisms of mitochondrial genome rearrangements(Frank Jühling, J. Pütz, Matthias Bernt, A. Donath, M. Middendorf, C. Florentz, P. Stadler, 2011, Nucleic Acids Research)
- A multi-parametric workflow for the prioritization of mitochondrial DNA variants of clinical interest(Mariangela Santorsola, C. Calabrese, Giulia Girolimetti, M. Diroma, G. Gasparre, M. Attimonelli, 2015, Human Genetics)
- Automatic annotation of organellar genomes with DOGMA(Stacia K. Wyman, Robert K. Jansen, J. Boore, 2004, Bioinformatics)
- Accurate annotation of protein-coding genes in mitochondrial genomes.(Marwa Al Arab, Christian Höner zu Siederdissen, K. Tout, Abdullah H. Sahyoun, P. Stadler, Matthias Bernt, 2017, Molecular Phylogenetics and Evolution)
- SG-ADVISER mtDNA: a web server for mitochondrial DNA annotation with data from 200 samples of a healthy aging cohort(M. Rueda, A. Torkamani, 2017, BMC Bioinformatics)
- Benchmarking the Effectiveness and Accuracy of Multiple Mitochondrial DNA Variant Callers: Practical Implications for Clinical Application(E. Ip, Michael Troup, Colin Xu, D. Winlaw, S. Dunwoodie, E. Giannoulatou, 2022, Frontiers in Genetics)
- MSeqDR mvTool: A mitochondrial DNA Web and API resource for comprehensive variant annotation, universal nomenclature collation, and reference genome conversion(Lishuang Shen, M. Attimonelli, Renkui Bai, M. Lott, D. Wallace, Marni J. Falk, X. Gai, 2018, Human Mutation)
- Mitochondrial genome annotation with MFannot: a critical analysis of gene identification and gene model prediction(B. Lang, Natacha Beck, Samuel Prince, M. Sarrasin, Pierre A. Rioux, G. Burger, Jeffrey P. Mower, G. Hausner, 2023, Frontiers in Plant Science)
- Improved annotation of protein-coding genes boundaries in metazoan mitochondrial genomes(A. Donath, Frank Jühling, Marwa Al-Arab, S. Bernhart, Franziska Reinhardt, P. Stadler, M. Middendorf, Matthias Bernt, 2019, Nucleic Acids Research)
- MITOS: improved de novo metazoan mitochondrial genome annotation.(Matthias Bernt, A. Donath, Frank Jühling, Fabian Externbrink, C. Florentz, G. Fritzsch, J. Pütz, M. Middendorf, P. Stadler, 2013, Molecular Phylogenetics and Evolution)
- Fully automated annotation of mitochondrial genomes using a cluster-based approach with de Bruijn graphs(Lisa Fiedler, M. Middendorf, Matthias Bernt, 2023, Frontiers in Genetics)
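Identification of nuclear-embedded mitochondrial sequences (NUMTs) and control of their interference
This group of papers specifically studies the transfer of mitochondrial DNA into the nuclear genome, focusing on identifying nuclear mitochondrial pseudogenes (NUMTs) and assessing the false positives and interference they cause in mitochondrial genome sequencing, data quality, and evolutionary biology research.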
- Nuclear-embedded mitochondrial DNA sequences in 66,083 human genomes(Wei Wei, K. Schon, G. Elgar, A. Orioli, M. Tanguy, A. Giess, M. Tischkowitz, M. Caulfield, P. Chinnery, 2022, Nature)
- The Mighty NUMT: Mitochondrial DNA Flexing Its Code in the Nuclear Genome(Liying Xue, Jesse D. Moreira, Karan K. Smith, J. Fetterman, 2023, Biomolecules)
- The genomic landscape of polymorphic human nuclear mitochondrial insertions(Gargi Dayama, Sarah B. Emery, J. Kidd, Ryan E. Mills, 2014, Nucleic Acids Research)
- Extraction and annotation of human mitochondrial genomes from 1000 Genomes Whole Exome Sequencing data(M. Diroma, C. Calabrese, D. Simone, Mariangela Santorsola, F. M. Calabrese, G. Gasparre, M. Attimonelli, 2014, BMC Genomics)
- Preventing the pollution of mitochondrial datasets with nuclear mitochondrial paralogs (numts).(Sébastien Calvignac, L. Konecny, F. Malard, C. Douady, 2011, Mitochondrion)
- Comprehensive Identification of Mitochondrial Pseudogenes (NUMTs) in the Human Telomere-to-Telomere Reference Genome(Yichen Tao, Chengpeng He, Deng Lin, Zhenglong Gu, Weilin Pu, 2023, Genes)
- Probability-Based Sequence Comparison Finds Pre-Eutherian Nuclear Mitochondrial DNA Segments in Mammalian Genomes(Muyao Huang, Martin C. Frith, 2026, Journal of Computational Biology)
- Fidelity of capture-enrichment for mtDNA genome sequencing: influence of NUMTs(MingKun Li, R. Schroeder, Albert Ko, M. Stoneking, 2012, Nucleic Acids Research)
- Human mitochondrial DNA in public metagenomes: Opportunity or privacy threat?(Mohamed S. Sarhan, Giacomo Antonello, H. Weissensteiner, Claudia Mengoni, D. Mascalzoni, Levi Waldron, N. Segata, Christian Fuchsberger, 2025, Cell)
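Research on the human mitochondrial genome has built a mature computational ecosystem covering the entire process from raw sequence reconstruction through detailed annotation to interpretation of variant pathogenicity. Current priorities include: using emerging long-read sequencing technologies to improve assembly accuracy; developing highly automated, integrated pipelines to support high-throughput clinical diagnostics; and establishing filtering standards for strongly interfering nuclear-embedded sequences (NUMTs), to ensure the reliability and accuracy of data for downstream clinical analysis.
59 related papers in total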
Background: Whole Exome Sequencing (WES) is one of the most used and cost-effective next generation technologies that allows sequencing of all nuclear exons. Off-target regions may be captured if they present high sequence similarity with baits. Bioinformatics tools have been optimized to retrieve a large amount of WES off-target mitochondrial DNA (mtDNA), by exploiting the aspecificity of probes, partially overlapping to Nuclear mitochondrial Sequences (NumtS). The 1000 Genomes project represents one of the widest resources to extract mtDNA sequences from WES data, considering the large effort the scientific community is undertaking to reconstruct human population history using mtDNA as marker, and the involvement of mtDNA in pathology.
Results: A previously published pipeline for assembling mitochondrial genomes from off-target WES reads, further improved to detect insertions and deletions (indels) and heteroplasmy, was applied to a dataset of 1242 samples from the 1000 Genomes project and yielded a nearly complete mitochondrial genome for 943 samples (76% of the analyzed exomes). The robustness of our computational strategy was highlighted by the reduction, due to NumtS filtering, in the number of reads recognized as mitochondrial in the original annotation produced by the Consortium. An accurate survey was carried out on 1242 individuals. 215 indels, mostly heteroplasmic, and 3407 single base variants were mapped. A homogeneous distribution of mismatches was observed along the whole mitochondrial genome, while a lower frequency of indels was found within protein-coding regions, where frameshift mutations may be deleterious. The majority of indels and mismatches found were not previously annotated in mitochondrial databases since conventional sequencing methods were limited to homoplasmy or quasi-homoplasmy detection. Intriguingly, upon filtering out non haplogroup-defining variants, we detected a widespread population occurrence of rare events predicted to be damaging. Finally, samples were stratified into blood- and lymphoblastoid-derived to detect possibly different trends of mutability in the two datasets, an analysis which did not yield significant discordances.
Conclusions: To the best of our knowledge, this is likely the most extended population-scale mitochondrial genotyping in humans enriched with the estimation of heteroplasmies.
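The heteroplasmy estimation mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes that per-site base counts have already been extracted from NumtS-filtered reads, and the function name and the `min_depth` default are hypothetical choices made for illustration.

```python
from collections import Counter


def heteroplasmy_fraction(base_counts, min_depth=20):
    """Estimate the minor-allele fraction at one mtDNA site.

    base_counts maps a base ("A", "C", "G", "T") to its read count.
    min_depth is a hypothetical cutoff, not a value from the paper.
    Returns (major_base, minor_base, minor_fraction), or None when
    the site is covered too thinly to support an estimate.
    """
    depth = sum(base_counts.values())
    if depth < min_depth:
        return None
    # Rank observed bases by read support, ignoring bases never seen.
    ranked = [(b, c) for b, c in Counter(base_counts).most_common() if c > 0]
    major_base = ranked[0][0]
    if len(ranked) == 1:
        # Only one allele observed: the site looks homoplasmic.
        return major_base, None, 0.0
    minor_base, minor_count = ranked[1]
    return major_base, minor_base, minor_count / depth


if __name__ == "__main__":
    print(heteroplasmy_fraction({"A": 90, "G": 10}))
```

A real workflow would also need to separate sequencing error from low-level heteroplasmy, for example by requiring support on both strands, which this sketch does not attempt.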
… for any human mitochondrial protein-coding gene in three … for assembling complete mitochondrial genomes from off-target … our assembly program and imported into the UCSC …
Mitochondria host multiple copies of their own small circular genome that has been extensively studied to trace the evolution of the modern eukaryotic cell and discover important mutations linked to inherited diseases. Whole genome and exome sequencing have enabled the study of mtDNA in a large number of samples and experimental conditions at single nucleotide resolution, allowing the deciphering of the relationship between inherited mutations and phenotypes and the identification of acquired mtDNA mutations in classical mitochondrial diseases as well as in chronic disorders, ageing and cancer. By applying an ad hoc computational pipeline based on our MToolBox software, we reconstructed mtDNA genomes in single cells using whole genome and exome sequencing data obtained by different amplification methodologies (eWGA, DOP-PCR, MALBAC, MDA) as well as data from single cell Assay for Transposase Accessible Chromatin with high-throughput sequencing (scATAC-seq) in which mtDNA sequences are expected as a byproduct of the technology. We show that assembled mtDNAs, with the exception of those reconstructed by MALBAC and DOP-PCR methods, are quite uniform and suitable for genomic investigations, enabling the study of various biological processes related to cellular heterogeneity such as tumor evolution, neural somatic mosaicism and embryonic development.
The mitochondrial genome (mitogenome) plays important roles in evolutionary and ecological studies. With the onset of High Throughput Sequencing (HTS) technologies, it has become routine to use multiple mitogenome genes or entire mitogenomes to investigate the phylogeny and biodiversity of focal groups. We developed a mitogenome toolkit MitoZ, consisting of independent modules of de novo assembly, findMitoScaf (find Mitochondrial Scaffolds), annotation and visualization, that can generate mitogenome assembly together with annotation and visualization results from HTS raw reads. We evaluated its performance using a total of 50 samples of which mitogenomes are publicly available. The results showed that MitoZ can recover more full-length mitogenomes with higher accuracy compared to the other available mitogenome assemblers. Overall, MitoZ provides a one-click solution to construct the annotated mitogenome from HTS raw data and will facilitate large scale ecological and evolutionary studies. MitoZ is free open source software distributed under GPLv3 license and available at https://github.com/linzhi2013/MitoZ.
Next-generation DNA sequencing (NGS) technologies have made huge impacts in many fields of biological research, but especially in evolutionary biology. One area where NGS has shown potential is for high-throughput sequencing of complete mtDNA genomes (of humans and other animals). Despite the increasing use of NGS technologies and a better appreciation of their importance in answering biological questions, there remain significant obstacles to the successful implementation of NGS-based projects, especially for new users. Here we present an ‘A to Z’ protocol for obtaining complete human mitochondrial (mtDNA) genomes – from DNA extraction to consensus sequence. Although designed for use on humans, this protocol could also be used to sequence small, organellar genomes from other species, and also nuclear loci. This protocol includes DNA extraction, PCR amplification, fragmentation of PCR products, barcoding of fragments, sequencing using the 454 GS FLX platform, and a complete bioinformatics pipeline (primer removal, reference-based mapping, output of coverage plots and SNP calling). All steps in this protocol are designed to be straightforward to implement, especially for researchers who are undertaking next-generation sequencing for the first time. The molecular steps are scalable to large numbers (hundreds) of individuals and all steps post-DNA extraction can be carried out in 96-well plate format. Also, the protocol has been assembled so that individual ‘modules’ can be swapped out to suit available resources.
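The final bioinformatics steps of this protocol (coverage output, SNP calling, consensus sequence) can be sketched in Python as a toy example. It assumes that reference-based mapping has already been done by an external mapper and summarized as per-position base counts. The function name, the `min_depth` default, and the use of "N" for low-coverage positions are illustrative assumptions, not details taken from the protocol.

```python
def call_consensus(reference, pileup, min_depth=10):
    """Derive coverage, a consensus sequence and SNPs from a pileup.

    reference: reference sequence as a string.
    pileup: one dict per reference position, mapping base -> read count.
    min_depth: hypothetical minimum depth for calling a base.
    Returns (consensus, snps, coverage); snps holds tuples of
    (1-based position, reference base, consensus base).
    """
    if len(reference) != len(pileup):
        raise ValueError("pileup must have one entry per reference base")
    consensus, snps, coverage = [], [], []
    for pos, (ref_base, counts) in enumerate(zip(reference, pileup), start=1):
        depth = sum(counts.values())
        coverage.append(depth)
        if depth < min_depth:
            # Too few reads: mask the position instead of guessing.
            consensus.append("N")
            continue
        # Majority rule: take the base with the most supporting reads.
        base = max(counts, key=counts.get)
        consensus.append(base)
        if base != ref_base:
            snps.append((pos, ref_base, base))
    return "".join(consensus), snps, coverage


if __name__ == "__main__":
    toy_pileup = [{"A": 30}, {"C": 2}, {"G": 1, "T": 29}, {"T": 40}]
    print(call_consensus("ACGT", toy_pileup))
```

Majority-rule calling is the simplest possible choice here; it ignores base qualities and indels, both of which a production pipeline would have to handle.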
Recent rapid advances in high-throughput, next-generation sequencing (NGS) technologies have promoted mitochondrial genome studies in the fields of human evolution, medical genetics, and forensic casework. However, scientists unfamiliar with computer programming often find it difficult to handle the massive volumes of data that are generated by NGS. To address this limitation, we developed MitoSuite, a user-friendly graphical tool for analysis of data from high-throughput sequencing of the human mitochondrial genome. MitoSuite generates a visual report on NGS data with simple mouse operations. Moreover, it analyzes high-coverage sequencing data but runs on a stand-alone computer, without the need for file upload. Therefore, MitoSuite offers outstanding usability for handling massive NGS data, and is ideal for evolutionary, clinical, and forensic studies on the human mitochondrial genome variations. It is freely available for download from the website https://mitosuite.com.
We describe methods for rapid sequencing of the entire human mitochondrial genome (mtgenome), which involve long-range PCR for specific amplification of the mtgenome, pyrosequencing, quantitative mapping of sequence reads to identify sequence variants and heteroplasmy, as well as de novo sequence assembly. These methods have been used to study 40 publicly available HapMap samples of European (CEU) and African (YRI) ancestry to demonstrate a sequencing error rate <5.63×10⁻⁴, nucleotide diversity of 1.6×10⁻³ for CEU and 3.7×10⁻³ for YRI, patterns of sequence variation consistent with earlier studies, but a higher rate of heteroplasmy varying between 10% and 50%. These results demonstrate that next-generation sequencing technologies allow interrogation of the mitochondrial genome in greater depth than previously possible which may be of value in biology and medicine.
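Nucleotide diversity, as reported in this abstract, is conventionally the average number of pairwise differences per site across a set of aligned sequences. The Python sketch below shows that standard definition on toy data. It is not the authors' code, and it assumes equal-length, gap-free aligned sequences.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average pairwise differences per site over aligned sequences.

    Assumes all sequences have the same length and contain no gaps
    or ambiguity codes (a simplification made for this sketch).
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = list(combinations(seqs, 2))
    # Count mismatching sites for every unordered pair of sequences.
    total_diffs = sum(
        sum(a != b for a, b in zip(x, y)) for x, y in pairs
    )
    return total_diffs / (len(pairs) * length)


if __name__ == "__main__":
    print(nucleotide_diversity(["ACGT", "ACGA", "ACGT"]))
```

With three toy sequences of length 4 and two mismatches in total across the three pairs, the result is 2 / (3 × 4).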
A population reference database of complete human mitochondrial genome (mtGenome) sequences is needed to enable the use of mitochondrial DNA (mtDNA) coding region data in forensic casework applications. However, the development of entire mtGenome haplotypes to forensic data quality standards is difficult and laborious. A Sanger-based amplification and sequencing strategy that is designed for automated processing, yet routinely produces high quality sequences, is needed to facilitate high-volume production of these mtGenome data sets. We developed a robust 8-amplicon Sanger sequencing strategy that regularly produces complete, forensic-quality mtGenome haplotypes in the first pass of data generation. The protocol works equally well on samples representing diverse mtDNA haplogroups and DNA input quantities ranging from 50 pg to 1 ng, and can be applied to specimens of varying DNA quality. The complete workflow was specifically designed for implementation on robotic instrumentation, which increases throughput and reduces both the opportunities for error inherent to manual processing and the cost of generating full mtGenome sequences. The described strategy will assist efforts to generate complete mtGenome haplotypes which meet the highest data quality expectations for forensic genetic and other applications. Additionally, high-quality data produced using this protocol can be used to assess mtDNA data developed using newer technologies and chemistries. Further, the amplification strategy can be used to enrich for mtDNA as a first step in sample preparation for targeted next-generation sequencing.
Accurate mitochondrial DNA (mtDNA) variant annotation is essential for the clinical diagnosis of diverse human diseases. Substantial challenges to this process include the inconsistency in mtDNA nomenclatures, the existence of multiple reference genomes, and a lack of reference population frequency data. Clinicians need a simple bioinformatics tool that is user‐friendly, and bioinformaticians need a powerful informatics resource for programmatic usage. Here, we report the development and functionality of the MSeqDR mtDNA Variant Tool set (mvTool), a one‐stop mtDNA variant annotation and analysis Web service. mvTool is built upon the MSeqDR infrastructure (https://mseqdr.org), with contributions of expert curated data from MITOMAP (https://www.mitomap.org) and HmtDB (https://www.hmtdb.uniba.it/hmdb). mvTool supports all mtDNA nomenclatures, converts variants to standard rCRS‐ and HGVS‐based nomenclatures, and annotates novel mtDNA variants. Besides generic annotations from dbNSFP and Variant Effect Predictor (VEP), mvTool provides allele frequencies in more than 47,000 germline mitogenomes, and disease and pathogenicity classifications from MSeqDR, Mitomap, HmtDB and ClinVar (Landrum et al., 2013). mvTool also provides annotations of somatic mtDNA variants. “mvTool API” is implemented for programmatic access using inputs in VCF, HGVS, or classical mtDNA variant nomenclatures. The results are reported as hyperlinked html tables, JSON, Excel, and VCF formats. MSeqDR mvTool is freely accessible at https://mseqdr.org/mvtool.php.
The abundance of biological data characterizing the genomics era is contributing to a comprehensive understanding of human mitochondrial genetics. Nevertheless, many aspects are still unclear, specifically about the variability of the 22 human mitochondrial transfer RNA (tRNA) genes and their involvement in diseases. The complex enrichment and isolation of tRNAs in vitro lead to an incomplete knowledge of their post-transcriptional modifications and three-dimensional folding, essential for correct tRNA functioning. An accurate annotation of mitochondrial tRNA variants would be useful to mitochondrial researchers and clinicians, since most of the bioinformatics tools for variant annotation and prioritization available so far cannot shed light on the functional role of tRNA variations. To this aim, we updated our MToolBox pipeline for mitochondrial DNA analysis of high throughput and Sanger sequencing data by integrating tRNA variant annotations in order to identify and characterize relevant variants not only in protein coding regions, but also in tRNA genes. The annotation step in the pipeline now provides detailed information for variants mapping onto the 22 mitochondrial tRNAs. For each mt-tRNA position along the entire genome, the relative tRNA numbering, tRNA type, cloverleaf secondary domains (loops and stems), mature nucleotide and interactions in the three-dimensional folding were reported. Moreover, pathogenicity predictions for tRNA and rRNA variants were retrieved from the literature and integrated within the annotations provided by MToolBox, both in the stand-alone version and web-based tool at the Mitochondrial Disease Sequence Data Resource (MSeqDR) website. All the information available in the annotation step of MToolBox was exploited to generate custom tracks which can be displayed in the GBrowse instance at MSeqDR website.
To the best of our knowledge, specific data regarding mitochondrial variants in tRNA genes were introduced for the first time in a tool for mitochondrial genome analysis, supporting the interpretation of genetic variants in specific genomic contexts.
Although animal mitochondrial DNA sequences are known to evolve rapidly, their gene arrangements often remain unchanged over long periods of evolutionary time. Therefore, comparisons of mitochondrial genomes may result in significant insights into the evolution both of organisms and of genomes. Mammalian mitochondrial genomes recently published in the GenBank database of NCBI show numerous rearrangements in various regions of the genome, from which it may be inferred that the mammalian mitochondrial genome is more dynamic than expected. However, it is alternatively possible that these are errors of annotation and, if so, are misleading our interpretations. In order to verify these possible errors of annotation, we performed a comparative genomic analysis of mammalian mitochondrial genomes available in the NCBI database. Using a combination of bioinformatics methods to carefully examine the mitochondrial gene arrangements in 304 mammalian species, we determined that there are only two sets of gene arrangements, one that is shared by all of the marsupials and another that is shared by all of the monotremes and eutherians, with these two arrangements differing only by the positions of tRNA genes in the region commonly designated as “WANCY” for the genes it comprises. All of the 68 other cases of reported gene rearrangements are errors. We note that there are also numerous errors of impossibly short, incorrect gene annotations, cases where genomes that are reported as complete are actually missing portions of the sequence, and genes that are clearly present but were not annotated in these records. We judge that the application of simple bioinformatic tools in the verification of gene annotation, particularly for organelle genomes, would be a very useful enhancement for the curation of genome sequences submitted to GenBank.
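Comparing gene arrangements on a circular genome requires that two records starting at different positions are not mistaken for a rearrangement. The Python sketch below shows one simple way to do this, by rotating each gene list to a shared anchor gene before comparing. It is an illustration only, not the method used in the study. It ignores strand, and the gene labels and the permuted order in the usage example are made up.

```python
def canonical_order(genes, anchor):
    """Rotate a circular gene list so that it starts at `anchor`."""
    genes = list(genes)
    start = genes.index(anchor)  # raises ValueError if anchor is absent
    return tuple(genes[start:] + genes[:start])


def same_arrangement(order_a, order_b, anchor):
    """True when two circular gene orders differ only by rotation.

    Strand information is ignored in this simplified sketch.
    """
    return canonical_order(order_a, anchor) == canonical_order(order_b, anchor)


if __name__ == "__main__":
    # Hypothetical labels for the five tRNA genes of the "WANCY" region.
    record_1 = ["trnW", "trnA", "trnN", "trnC", "trnY"]
    record_2 = ["trnN", "trnC", "trnY", "trnW", "trnA"]  # same circle, new start
    record_3 = ["trnW", "trnA", "trnC", "trnN", "trnY"]  # made-up swap
    print(same_arrangement(record_1, record_2, anchor="trnW"))
    print(same_arrangement(record_1, record_3, anchor="trnW"))
```

A check of this kind separates genuine order differences from records that merely start at a different coordinate. Missing or mis-annotated genes, which the abstract also reports, would need separate validation.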
… MitoBamAnnotator, the first tool dedicated to analyzing human mtDNA variation in general, … and contains a tool for rich annotation of mtDNA variation. This tool provides new avenues for …
Compared to nuclear genomes, mitochondrial genomes (mitogenomes) are small and usually code for only a few dozen genes. Still, identifying genes and their structure can be challenging and time-consuming. Even automated tools for mitochondrial genome annotation often require manual analysis and curation by skilled experts. The most difficult steps are (i) the structural modelling of intron-containing genes; (ii) the identification and delineation of Group I and II introns; and (iii) the identification of moderately conserved, non-coding RNA (ncRNA) genes specifying 5S rRNAs, tmRNAs and RNase P RNAs. Additional challenges arise through genetic code evolution which can redefine the translational identity of both start and stop codons, thus obscuring protein-coding genes. Further, RNA editing can render gene identification difficult, if not impossible, without additional RNA sequence data. Current automated mito- and plastid-genome annotators are limited as they are typically tailored to specific eukaryotic groups. The MFannot annotator we developed is unique in its applicability to a broad taxonomic scope, its accuracy in gene model inference, and its capabilities in intron identification and classification. The pipeline leverages curated profile Hidden Markov Models (HMMs), covariance (CMs) and ERPIN models to better capture evolutionarily conserved signatures in the primary sequence (HMMs and CMs) as well as secondary structure (CMs and ERPIN). Here we formally describe MFannot, which has been available as a web-accessible service (https://megasun.bch.umontreal.ca/apps/mfannot/) to the research community for nearly 16 years. Further, we report its performance on particularly intron-rich mitogenomes and describe ongoing and future developments.
… The sequence has been annotated to encompass all of the known mtDNA functional … Since there is an overlap between mtDNA variants found in legitimate mtDNA sequences and those …
Assigning a pathogenic role to mitochondrial DNA (mtDNA) variants and unveiling the potential involvement of the mitochondrial genome in diseases are challenging tasks in human medicine. Assuming that rare variants are more likely to be damaging, we designed a phylogeny-based prioritization workflow to obtain a reliable pool of candidate variants for further investigations. The prioritization workflow relies on an exhaustive functional annotation through the mtDNA extraction pipeline MToolBox and includes Macro Haplogroup Consensus Sequences to filter out fixed evolutionary variants and report rare or private variants, the nucleotide variability as reported in HmtDB and the disease score based on several predictors of pathogenicity for non-synonymous variants. Cutoffs for both the disease score and the nucleotide variability index were established with the aim of discriminating sequence variants contributing to defective phenotypes. The workflow was validated on mitochondrial sequences from Leber’s Hereditary Optic Neuropathy affected individuals, successfully identifying 23 variants including the majority of the known causative ones. Applying the prioritization workflow to cancer datasets made it possible to trim down the number of candidates for subsequent functional analyses, unveiling among these a high percentage of somatic variants. Prioritization criteria were implemented in both standalone (http://sourceforge.net/projects/mtoolbox/) and web version (https://mseqdr.org/mtoolbox.php) of MToolBox.
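The filtering logic of this prioritization workflow can be sketched in Python as below. This is a hedged illustration rather than the MToolBox implementation. The abstract does not give the cutoff values, so the ones used here are placeholders. The field names and the direction of the comparisons (high disease score, low nucleotide variability) are assumptions inferred from the stated premise that rare variants are more likely to be damaging.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str                      # illustrative label only
    disease_score: float           # aggregated pathogenicity prediction
    nt_variability: float          # site variability, e.g. as in HmtDB
    in_haplogroup_consensus: bool  # fixed in the macro-haplogroup consensus


def prioritize(variants, score_cutoff, variability_cutoff):
    """Keep rare, non-fixed variants with a high predicted disease score.

    Both cutoffs are caller-supplied placeholders; the published
    thresholds are not stated in the abstract.
    """
    return [
        v for v in variants
        if not v.in_haplogroup_consensus
        and v.disease_score >= score_cutoff
        and v.nt_variability <= variability_cutoff
    ]


if __name__ == "__main__":
    # Made-up variants used only to exercise the filter.
    candidates = [
        Variant("var_1", 0.90, 0.001, False),
        Variant("var_2", 0.95, 0.001, True),   # fixed in haplogroup: dropped
        Variant("var_3", 0.20, 0.001, False),  # low disease score: dropped
        Variant("var_4", 0.90, 0.500, False),  # highly variable site: dropped
    ]
    kept = prioritize(candidates, score_cutoff=0.5, variability_cutoff=0.01)
    print([v.name for v in kept])
```

Keeping the cutoffs as parameters rather than constants reflects that the abstract describes them as established empirically for discriminating variants.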
Whole genome and exome sequencing usually include reads containing mitochondrial DNA (mtDNA). Yet, state-of-the-art pipelines and services for human nuclear genome variant calling and annotation do not handle mitochondrial genome data appropriately. As a consequence, any researcher desiring to add mtDNA variant analysis to their investigations is forced to explore the literature for mtDNA pipelines, evaluate them, and implement their own instance of the desired tool. This task is far from trivial, and can be prohibitive for non-bioinformaticians. We have developed SG-ADVISER mtDNA, a web server to facilitate the analysis and interpretation of mtDNA genomic data coming from next generation sequencing (NGS) experiments. The server was built in the context of our SG-ADVISER framework and on top of the MToolBox platform (Calabrese et al., Bioinformatics 30(21):3115–3117, 2014), and includes most of its functionalities (i.e., assembly of mitochondrial genomes, heteroplasmic fractions, haplogroup assignment, functional and prioritization analysis of mitochondrial variants) as well as a back-end and a front-end interface. The server has been tested with unpublished data from 200 individuals of a healthy aging cohort (Erikson et al., Cell 165(4):1002–1011, 2016) and their data are made publicly available here along with a preliminary analysis of the variants. We observed that individuals over ~90 years old carried low levels of heteroplasmic variants in their genomes. SG-ADVISER mtDNA is a fast and functional tool that allows for variant calling and annotation of human mtDNA data coming from NGS experiments. The server was built with simplicity in mind, and builds on our own experience in interpreting mtDNA variants in the context of sudden death and rare diseases. Our objective is to provide an interface for non-bioinformaticians aiming to acquire (or contrast) mtDNA annotations via MToolBox.
SG-ADVISER web server is freely available to all users at https://genomics.scripps.edu/mtdna.
Mitochondrial DNA (mtDNA) analysis is crucial for the diagnosis of mitochondrial disorders, forensic investigations, and basic research. Existing pipelines are complex, expensive, and require specialized personnel. In many cases, including the diagnosis of detrimental single nucleotide variants (SNVs), mtDNA analysis is still carried out using Sanger sequencing. Here, we developed a simple workflow and a publicly available webserver named Mitopore that allows the detection of mtDNA SNVs, indels, and haplogroups. To simplify mtDNA analysis, we tailored our workflow to process noisy long-read sequencing data, focusing on sequence alignment and parameter optimization. We implemented Mitopore with eliBQ (eliminate bad quality reads), an innovative quality enhancement that increases per-base quality by over 20% for low-quality data. The whole Mitopore workflow and webserver were validated using patient-derived and induced pluripotent stem cells harboring mtDNA mutations. Mitopore streamlines mtDNA analysis as an easy-to-use, fast, reliable, and cost-effective analysis method for both long- and short-read sequencing data. This significantly enhances the accessibility of mtDNA analysis and reduces the cost per sample, contributing to the progress of mtDNA-related research and diagnosis.
The development of next generation sequencing (NGS) has greatly enhanced the diagnosis of mitochondrial disorders, with a systematic analysis of the whole mitochondrial DNA (mtDNA) sequence and better detection sensitivity. However, the exponential growth of sequencing data renders complex the interpretation of the identified variants, thereby posing new challenges for the molecular diagnosis of mitochondrial diseases. Indeed, mtDNA sequencing by NGS requires specific bioinformatics tools and the adaptation of those developed for nuclear DNA, for the detection and quantification of mtDNA variants from sequence alignment to the calling steps, in order to manage the specific features of the mitochondrial genome including heteroplasmy, i.e., coexistence of mutant and wildtype mtDNA copies. The prioritization of mtDNA variants remains difficult, relying on a limited number of specific resources: population and clinical databases, and in silico tools providing a prediction of the variant pathogenicity. An evaluation of the most prominent bioinformatics tools showed that their ability to predict the pathogenicity was highly variable indicating that special efforts should be directed at developing new bioinformatics tools dedicated to the mitochondrial genome. In addition, massive parallel sequencing raised several issues related to the interpretation of very low mtDNA mutational loads, discovery of variants of unknown significance, and mutations unrelated to patient phenotype or the co-occurrence of mtDNA variants. This review provides an overview of the current strategies and bioinformatics tools for accurate annotation, prioritization and reporting of mtDNA variations from NGS data, in order to carry out accurate genetic counseling in individuals with primary mitochondrial diseases.
Next generation sequencing (NGS) allows investigating mitochondrial DNA (mtDNA) characteristics such as heteroplasmy (i.e. intra-individual sequence variation) to a higher level of detail. While several pipelines for analyzing heteroplasmies exist, issues in usability, accuracy of results and interpreting final data limit their usage. Here we present mtDNA-Server, a scalable web server for the analysis of mtDNA studies of any size with a special focus on usability as well as reliable identification and quantification of heteroplasmic variants. The mtDNA-Server workflow includes parallel read alignment, heteroplasmy detection, artefact or contamination identification, variant annotation as well as several quality control metrics, often neglected in current mtDNA NGS studies. All computational steps are parallelized with Hadoop MapReduce and executed graphically with Cloudgene. We validated the underlying heteroplasmy and contamination detection model by generating four artificial sample mix-ups on two different NGS devices. Our evaluation data shows that mtDNA-Server detects heteroplasmies and artificial recombinations down to the 1% level with perfect specificity and outperforms existing approaches regarding sensitivity. mtDNA-Server is currently able to analyze the 1000G Phase 3 data (n = 2,504) in less than 5 h and is freely accessible at https://mtdna-server.uibk.ac.at.
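The idea of quantifying heteroplasmy down to a fixed level can be illustrated with a minimal Python sketch. It is not the mtDNA-Server model, which the abstract describes only at a high level. The function name `heteroplasmy_calls`, the input layout (per-position base counts), and the `min_depth` threshold are assumptions made for illustration. Only the 1% level comes from the abstract.

```python
def heteroplasmy_calls(base_counts, min_level=0.01, min_depth=100):
    """Report positions where a minor allele reaches min_level of the reads.

    base_counts: {position: {base: read_count}} (hypothetical input layout).
    min_level:   minor-allele fraction threshold (1% as in the abstract).
    min_depth:   illustrative coverage cutoff, not taken from the source.
    """
    calls = {}
    for pos, counts in base_counts.items():
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to estimate a fraction reliably
        # Rank alleles by read count (ties broken alphabetically).
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        major = ranked[0][0]
        minors = {b: n / depth for b, n in ranked[1:] if n / depth >= min_level}
        if minors:
            calls[pos] = {"major": major, "depth": depth, "minor_levels": minors}
    return calls


if __name__ == "__main__":
    example = {3243: {"A": 950, "G": 50}, 100: {"C": 1000}}
    print(heteroplasmy_calls(example))
```

A production caller would also weigh strand balance, base quality, and contamination, which this sketch ignores.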
The growing volume of next-generation sequencing (NGS) data presents a unique opportunity to study the combined impact of mitochondrial and nuclear-encoded genetic variation in complex disease. Mitochondrial DNA variants, and in particular heteroplasmic variants, are critical for determining human disease severity. While there are approaches for obtaining mitochondrial DNA variants from NGS data, these tools do not account for the unique characteristics of mitochondrial genetics and can be inaccurate even for homoplasmic variants. We introduce MitoScape, a novel big-data software tool for extracting mitochondrial DNA sequences from NGS. MitoScape departs from other algorithms by using machine learning to model the unique characteristics of mitochondrial genetics. We also employ a novel approach of using rho-zero (mitochondrial DNA-depleted) data to model nuclear-encoded mitochondrial sequences. We showed that MitoScape produces accurate heteroplasmy estimates using gold-standard mitochondrial DNA data. We provide a comprehensive comparison of the most common tools for obtaining mtDNA variants from NGS and showed that MitoScape had superior performance to the compared tools in every statistical category we compared, including false positives and false negatives. By applying MitoScape to common disease examples, we illustrate how MitoScape facilitates important heteroplasmy-disease association discoveries by expanding upon a reported association between hypertrophic cardiomyopathy and mitochondrial haplogroup T in men (adjusted p-value = 0.003). The improved accuracy of mitochondrial DNA variants produced by MitoScape will be instrumental in diagnosing disease in the context of personalized medicine and clinical diagnostics.
Motivation: The increasing availability of mitochondria-targeted and off-target sequencing data in whole-exome and whole-genome sequencing studies (WXS and WGS) has raised the demand for effective pipelines to accurately measure heteroplasmy and to easily recognize the most functionally important mitochondrial variants among a huge number of candidates. To this end, we developed MToolBox, a highly automated pipeline to reconstruct and analyze human mitochondrial DNA from high-throughput sequencing data. Results: MToolBox implements an effective computational strategy for mitochondrial genome assembly and haplogroup assignment, also including a prioritization analysis of detected variants. MToolBox provides a Variant Call Format file featuring, for the first time, allele-specific heteroplasmy and annotation files with prioritized variants. MToolBox was tested on simulated samples and applied on 1000 Genomes WXS datasets. Availability and implementation: MToolBox package is available at https://sourceforge.net/projects/mtoolbox/. Contact: marcella.attimonelli@uniba.it Supplementary information: Supplementary data are available at Bioinformatics online.
The unique features of mitochondrial DNA (mtDNA), including its circular and multicopy nature, the possible coexistence of wild-type and mutant molecules (i.e., heteroplasmy) and the presence of nuclear mitochondrial DNA segments (NUMTs), make the diagnosis of mtDNA diseases particularly challenging. The extensive deployment of next-generation sequencing (NGS) technologies has significantly advanced the diagnosis of mtDNA-related diseases. However, the vast amounts and diverse types of sequencing data complicate the interpretation of these variants. From sequence alignment to variant calling, NGS-based mtDNA sequencing requires specialized bioinformatics tools, adapted for the mitochondrial genome. This study presents the use of new bioinformatics approaches, optimized for short- and long-read sequencing data, to enhance the accuracy of mtDNA analysis in diagnostics. Two recent and emerging free bioinformatics tools, Mitopore and MitoSAlt, were evaluated on patients previously diagnosed with single nucleotide variants or large-scale deletions. Analyses were performed in Linux-based environments and web servers implemented in Python, Perl, Java, and R. The results indicated that each tool demonstrated high sensitivity and specific accuracy in identifying and quantifying various types of pathogenic variants. The study suggests that the integrated and parallel use of these tools offers a significant advantage over traditional methods in interpreting mtDNA genetic variants, reducing the computational demands, and provides an accurate diagnostic solution.
… Yet, these online tools are limited to only one type of mtDNA data or are not suitable for parsing massive mtDNA sequences. Besides these online tools, other software has been …
Mitochondrial DNA (mtDNA) mutations contribute to human disease across a range of severity, from rare, highly penetrant mutations causal for monogenic disorders to mutations with milder contributions to phenotypes. mtDNA variation can exist in all copies of mtDNA or in a percentage of mtDNA copies and can be detected with levels as low as 1%. The large number of copies of mtDNA and the possibility of multiple alternative alleles at the same DNA nucleotide position make the task of identifying allelic variation in mtDNA very challenging. In recent years, specialized variant calling algorithms have been developed that are tailored to identify mtDNA variation from whole-genome sequencing (WGS) data. However, very few studies have systematically evaluated and compared these methods for the detection of both homoplasmy and heteroplasmy. A publicly available synthetic gold standard dataset was used to assess four mtDNA variant callers (Mutserve, mitoCaller, MitoSeek, and MToolBox), and the commonly used Genome Analysis Toolkit “best practices” pipeline, which is included in most current WGS pipelines. We also used WGS data from 126 trios and calculated the percentage of maternally inherited variants as a metric of calling accuracy, especially for homoplasmic variants. We additionally compared multiple pathogenicity prediction resources for mtDNA variants. Although the accuracy of homoplasmic variant detection was high for the majority of the callers with high concordance across callers, we found a very low concordance rate between mtDNA variant callers for heteroplasmic variants ranging from 2.8% to 3.6%, for heteroplasmy thresholds of 5% and 1%. Overall, Mutserve showed the best performance using the synthetic benchmark dataset. The analysis of mtDNA pathogenicity resources also showed low concordance in prediction results. 
We have shown that while homoplasmic variant calling is consistent between callers, there remains a significant discrepancy in heteroplasmic variant calling. We found that resources like population frequency databases and pathogenicity predictors are now available for variant annotation but still need refinement and improvement. With its peculiarities, the mitochondria require special considerations, and we advocate that caution needs to be taken when analyzing mtDNA data from WGS data.
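One plausible way to express concordance between callers is the share of variants reported by every caller out of all variants reported by any caller. The abstract does not state how its concordance rate was computed, so the definition below, the function name `concordance`, and the `(position, ref, alt)` tuple layout are assumptions for illustration. The caller names are reused only as dictionary keys, and the variant lists are made up.

```python
def concordance(calls_by_caller):
    """Fraction of variants shared by all callers among variants seen by any.

    calls_by_caller: {caller_name: iterable of (position, ref, alt) tuples}.
    This intersection-over-union definition is an assumption, not the
    metric defined in the benchmarked study.
    """
    sets = [set(calls) for calls in calls_by_caller.values()]
    union = set().union(*sets)
    if not union:
        return 0.0  # no caller reported anything
    shared = set.intersection(*sets)
    return len(shared) / len(union)


if __name__ == "__main__":
    toy_calls = {
        "Mutserve": [(3243, "A", "G"), (8993, "T", "G")],
        "mitoCaller": [(3243, "A", "G"), (150, "C", "T")],
        "MToolBox": [(3243, "A", "G")],
    }
    print(f"concordance = {concordance(toy_calls):.3f}")
```

Running the same function separately on homoplasmic and heteroplasmic calls would make a contrast like the one reported above visible on a given dataset.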
… Fast, automatic, consistent, and high quality annotations are a prerequisite for downstream … fast de novo annotation of mitochondrial protein-coding genes. The annotation is based on …
… Thirty-five base-pair reads were extracted for a human mitochondrial genome (GenBank accession number J01415; see table 1 for a list of the mitochondrial sequences used in this …
The mitochondrial genome (mtDNA) is of interest for a range of fields including evolutionary, forensic, and medical genetics. Human mitogenomes can be classified into evolutionarily related haplogroups that provide ancestral information and pedigree relationships. Because of this and the advent of high-throughput sequencing (HTS) technology, there is a diversity of bioinformatic tools for haplogroup classification. We present a benchmarking of the 11 most salient tools for human mtDNA classification using empirical whole-genome (WGS) and whole-exome (WES) short-read sequencing data from 36 unrelated donors. We also assessed the best performing tool in third-generation long noisy read WGS data obtained with nanopore technology for a subset of the donors. We found that, for short-read WGS, most of the tools exhibit high accuracy for haplogroup classification irrespective of the input file used for the analysis. However, for short-read WES, Haplocheck and MixEmt were the most accurate tools. Based on the performance shown for WGS and WES, and the accompanying qualitative assessment, Haplocheck stands out as the most complete tool. For third-generation HTS data, we also showed that Haplocheck was able to accurately retrieve mtDNA haplogroups for all samples assessed, although only after following assembly-based approaches (either a reference-based assembly or a hybrid de novo assembly). Taken together, our results provide guidance for researchers to select the most suitable tool to conduct mtDNA analyses from HTS data.
… About 2000 completely sequenced mitochondrial genomes are available from the NCBI … manually curated annotations of their protein-coding genes, rRNAs, and tRNAs. This annotation …
Mitochondrial (mt) genomics represents an understudied but important field of molecular biology. Increasingly, mt dysfunction is being linked to a range of human diseases, including neurodegenerative disorders, diabetes and impairment of childhood development. In addition, mt genomes provide important markers for systematic, evolutionary and population genetic studies. Some technological limitations have prevented the expanded generation and utilization of mt genomic data for some groups of organisms. These obstacles most acutely impede, but are not limited to, studies requiring the determination of complete mt genomic data from minute amounts of material (e.g. biopsy samples or microscopic organisms). Furthermore, post-sequencing bioinformatic annotation and analyses of mt genomes are time consuming and inefficient. Herein, we describe a high-throughput sequencing and bioinformatic pipeline for mt genomics, which will have implications for the annotation and analysis of other organellar (e.g. plastid or apicoplast genomes) and virus genomes as well as long, contiguous regions in nuclear genomes. We utilize this pipeline to sequence and annotate the complete mt genomes of 12 species of parasitic nematode (order Strongylida) simultaneously, each from an individual organism. These mt genomic data provide a rich source of markers for studies of the systematics and population genetics of a group of socioeconomically important pathogens of humans and other animals.
Transfer RNAs (tRNAs) are present in all types of cells as well as in organelles. tRNAs of animal mitochondria show a low level of primary sequence conservation and exhibit ‘bizarre’ secondary structures, lacking complete domains of the common cloverleaf. Such sequences are hard to detect and hence frequently missed in computational analyses and mitochondrial genome annotation. Here, we introduce an automatic annotation procedure for mitochondrial tRNA genes in Metazoa based on sequence and structural information in manually curated covariance models. The method, applied to re-annotate 1876 available metazoan mitochondrial RefSeq genomes, allows to distinguish between remaining functional genes and degrading ‘pseudogenes’, even at early stages of divergence. The subsequent analysis of a comprehensive set of mitochondrial tRNA genes gives new insights into the evolution of structures of mitochondrial tRNA sequences as well as into the mechanisms of genome rearrangements. We find frequent losses of tRNA genes concentrated in basal Metazoa, frequent independent losses of individual parts of tRNA genes, particularly in Arthropoda, and wide-spread conserved overlaps of tRNAs in opposite reading direction. Direct evidence for several recent Tandem Duplication-Random Loss events is gained, demonstrating that this mechanism has an impact on the appearance of new mitochondrial gene orders.
With the rapid increase of sequenced metazoan mitochondrial genomes, a detailed manual annotation is becoming more and more infeasible. While it is easy to identify the approximate location of protein-coding genes within mitogenomes, the peculiar processing of mitochondrial transcripts makes the determination of precise gene boundaries a surprisingly difficult problem. We have analyzed the properties of annotated start and stop codon positions in detail, and use the inferred patterns to devise a new method for predicting gene boundaries in de novo annotations. Our method benefits from empirically observed prevalences of start/stop codons and gene lengths, and considers the dependence of these features on variations of genetic codes. Although not perfect, our new approach yields a drastic improvement in the accuracy of gene boundaries and upgrades the mitochondrial genome annotation server MITOS to an even more sophisticated tool for fully automatic annotation of metazoan mitochondrial genomes.
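The dependence of gene boundaries on the genetic code can be seen in a short Python sketch. It is not the MITOS method, which relies on empirically observed codon prevalences and gene lengths. It only shows that the same reading frame ends at different positions under different NCBI translation tables. The stop-codon sets are those of NCBI tables 1, 2 and 5. The function name `first_stop` and the toy sequence are hypothetical.

```python
# Stop codons per NCBI translation table.
STOP_CODONS = {
    1: {"TAA", "TAG", "TGA"},         # standard code
    2: {"TAA", "TAG", "AGA", "AGG"},  # vertebrate mitochondrial code
    5: {"TAA", "TAG"},                # invertebrate mitochondrial code
}


def first_stop(seq, start, table):
    """Return the 0-based index of the first in-frame stop codon, or None.

    seq:   DNA sequence of the coding strand.
    start: index of the first base of the start codon (defines the frame).
    table: NCBI translation table number (1, 2 or 5 in this sketch).
    """
    stops = STOP_CODONS[table]
    seq = seq.upper()
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in stops:
            return i
    return None


if __name__ == "__main__":
    toy = "ATGTGAAGATAA"  # codons: ATG TGA AGA TAA
    for table in (1, 2, 5):
        print(table, first_stop(toy, 0, table))
```

In the toy sequence, TGA ends the frame only under the standard code and AGA only under the vertebrate mitochondrial code, so a wrong table choice shifts the predicted gene end.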
Summary: The Dual Organellar GenoMe Annotator (DOGMA) automates the annotation of organellar (plant chloroplast and animal mitochondrial) genomes. It is a Web-based package that allows the use of BLAST searches against a custom database, and conservation of basepairing in the secondary structure of animal mitochondrial tRNAs to identify and annotate genes. DOGMA provides a graphical user interface for viewing and editing annotations. Annotations are stored on our password-protected server to enable repeated sessions of working on the same genome. Finished annotations can be extracted for direct submission to GenBank. Availability: http://phylocluster.biosci.utexas.edu/dogma/ Supplementary information: Detailed documentation and tutorials for annotating both animal mitochondrial and plant chloroplast genomes can be found on the DOGMA home page.
DNA transfer from cytoplasmic organelles to the cell nucleus is a legacy of the endosymbiotic event; the majority of nuclear-mitochondrial segments (NUMTs) are thought to be ancient, preceding human speciation. Here we analyse whole-genome sequences from 66,083 people, including 12,509 people with cancer, and demonstrate the ongoing transfer of mitochondrial DNA into the nucleus, contributing to a complex NUMT landscape. More than 99% of individuals had at least one of 1,637 different NUMTs, with 1 in 8 individuals having an ultra-rare NUMT that is present in less than 0.1% of the population. More than 90% of the extant NUMTs that we evaluated inserted into the nuclear genome after humans diverged from apes. Once embedded, the sequences were no longer under the evolutionary constraint seen within the mitochondrion, and NUMT-specific mutations had a different mutational signature to mitochondrial DNA. De novo NUMTs were observed in the germline once in every 10⁴ births and once in every 10³ cancers. NUMTs preferentially involved non-coding mitochondrial DNA, linking transcription and replication to their origin, with nuclear insertion involving multiple mechanisms including double-strand break repair associated with PR domain zinc-finger protein 9 (PRDM9) binding. The frequency of tumour-specific NUMTs differed between cancers, including a probably causal insertion in a myxoid liposarcoma. We found evidence of selection against NUMTs on the basis of size and genomic location, shaping a highly heterogeneous and dynamic human NUMT landscape.
… (mtDNA enrichment, cDNA amplification, long-range amplification and pre-PCR dilution) on a common set of numt cases, showing that mtDNA … of putative mtDNA datasets with numts. …
Nuclear-mitochondrial DNA segments (NUMTs) are mitochondrial DNA (mtDNA) fragments that have been inserted into the nuclear genome. Some NUMTs are common within the human population but most NUMTs are rare and specific to individuals. NUMTs range in size from 24 base pairs to encompassing nearly the entire mtDNA and are found throughout the nuclear genome. Emerging evidence suggests that the formation of NUMTs is an ongoing process in humans. NUMTs contaminate sequencing results of the mtDNA by introducing false positive variants, particularly heteroplasmic variants present at a low variant allele frequency (VAF). In our review, we discuss the prevalence of NUMTs in the human population, the potential mechanisms of de novo NUMT insertion via DNA repair mechanisms, and provide an overview of the existing approaches for minimizing NUMT contamination. Apart from filtering known NUMTs, both wet lab-based and computational methods can be used to minimize the contamination of NUMTs in analyses of human mtDNA. Current approaches include: (1) isolating mitochondria to enrich for mtDNA; (2) applying basic local alignment to identify NUMTs for subsequent filtering; (3) bioinformatic pipelines for NUMT detection; (4) k-mer-based NUMT detection; and (5) filtering candidate false positive variants by mtDNA copy number, VAF, or sequence quality score. Multiple approaches must be applied in order to effectively identify NUMTs in samples. Although next-generation sequencing is revolutionizing our understanding of heteroplasmic mtDNA, it also raises new challenges with the high prevalence and individual-specific NUMTs that need to be handled with care in studies of mitochondrial genetics.
Practices related to mitochondrial research have long been hindered by the presence of mitochondrial pseudogenes within the nuclear genome (NUMTs). Even though partially assembled human reference genomes like hg38 have included NUMT compilations, the exhaustive set of NUMTs within the only complete reference genome (T2T-CHM13) remains unknown. Here, we comprehensively identified the fixed NUMTs within the reference genome using the human pan-mitogenome (HPMT) from GenBank. The inclusion of HPMT serves the purpose of establishing an authentic mitochondrial DNA (mtDNA) mutational spectrum for the identification of NUMTs, distinguishing it from the polymorphic variations found in NUMTs. Using HPMT, we identified approximately 10% additional NUMTs in three human reference genomes under stricter thresholds. We also observed an approximate 6% increase in NUMTs in T2T-CHM13 compared to hg38, including NUMTs on the short arms of chromosomes 13, 14, and 15 that were not assembled previously. Furthermore, alignments based on 20-mers from mtDNA suggested the presence of more mtDNA-like short segments within the nuclear genome, which should be avoided for short amplicon or cell-free mtDNA detection. Finally, through the assay of transposase-accessible chromatin with high-throughput sequencing (ATAC-seq) on cell lines before and after mtDNA elimination, we concluded that NUMTs have a minimal impact on bulk ATAC-seq, even though 16% of sequencing data originated from mtDNA.
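The 20-mer comparison mentioned above can be sketched as an exact k-mer lookup. This is a simplified stand-in, since the abstract refers to alignments rather than exact matching and does not describe its procedure. The function names (`kmers`, `revcomp`, `mt_like_positions`) are hypothetical, and treating the mtDNA as circular and searching both strands are assumptions of this sketch.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def kmers(seq, k, circular=False):
    """Set of all k-mers in seq; wraps around the origin if circular."""
    seq = seq.upper()
    if circular:
        seq = seq + seq[:k - 1]  # lets k-mers span the junction
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def mt_like_positions(mt_seq, nuclear_seq, k=20):
    """0-based start positions in nuclear_seq whose k-mer occurs in the mtDNA.

    Both strands of the (circular) mtDNA are indexed; only exact matches
    are reported, so diverged NUMTs would be missed by this sketch.
    """
    index = kmers(mt_seq, k, circular=True)
    index |= {revcomp(x) for x in index}
    nuc = nuclear_seq.upper()
    return [i for i in range(len(nuc) - k + 1) if nuc[i:i + k] in index]


if __name__ == "__main__":
    # Tiny toy sequences with k=4 so the result is easy to follow.
    print(mt_like_positions("AAACCCGG", "TTTTGAAATTTT", k=4))
```

Positions reported this way mark nuclear segments that a short amplicon or a short read could not distinguish from mtDNA by sequence alone.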
The insertion of mitochondrial genome-derived DNA sequences into the nuclear genome is a frequent event in organismal evolution, resulting in nuclear-mitochondrial DNA segments (NUMTs), which serve as a significant driving force for genome evolution. Once incorporated into the nuclear genome, some NUMTs can be conserved for extended periods and may potentially acquire novel cellular roles. However, current mainstream methods for detecting NUMTs are inefficient at identifying ancient and highly degraded NUMTs, leading to their prevalence and impact being underestimated. These ancient NUMTs likely play a far greater role in genetic functions than previously recognized, including contributing to the acquisition of functional exons. This study focuses on identifying ancient NUMTs in mammalian genomes using enhanced high-sensitivity sequence comparison methods. A sensitive and accurate NUMT searching pipeline was established, predicting 1013 NUMTs in the human reference genome, 364 (36%) of which are newly detected compared to the University of California, Santa Cruz (UCSC) reference human NUMTs database. Notably, 90 pre-eutherian human NUMTs were identified, representing significantly older NUMTs than previously reported, with origins dating back at least 100 million years. The most ancient mammalian NUMT could even date back over 160 million years, inserted into the nuclear genome of the common ancestor of therian mammals. This study provides a comprehensive exploration of the quantity and evolutionary history of mammalian NUMTs, paving the way for future research on endosymbiotic impact on the evolution of nuclear genomes.
The transfer of mitochondrial genetic material into the nuclear genomes of eukaryotes is a well-established phenomenon. Many studies over the past decade have utilized reference genome sequences of numerous species to characterize the prevalence and contribution of nuclear mitochondrial insertions to human diseases. The recent advancement of high throughput sequencing technologies has enabled the interrogation of genomic variation at a much finer scale, and now allows for an exploration into the diversity of polymorphic nuclear mitochondrial insertions (NumtS) in human populations. We have developed an approach to discover and genotype previously undiscovered Numt insertions using whole genome, paired-end sequencing data. We have applied this method to almost a thousand individuals in twenty populations from the 1000 Genomes Project and other data sets and identified 138 novel sites of Numt insertions, extending our current knowledge of existing Numt locations in the human genome by almost 20%. Most of the newly identified NumtS were found in less than 1% of the samples we examined, suggesting that they occur infrequently in nature or have been rapidly removed by purifying selection. We find that recent Numt insertions are derived from throughout the mitochondrial genome, including the D-loop, and have integration biases consistent with previous studies on older, fixed NumtS in the reference genome. We have further determined the complete inserted sequence for a subset of these events to define their age and origin of insertion as well as their potential impact on studies of mitochondrial heteroplasmy.
Enriching target sequences in sequencing libraries via capture hybridization to bait/probes is an efficient means of leveraging the capabilities of next-generation sequencing for obtaining sequence data from target regions of interest. However, homologous sequences from non-target regions may also be enriched by such methods. Here we investigate the fidelity of capture enrichment for complete mitochondrial DNA (mtDNA) genome sequencing by analyzing sequence data for nuclear copies of mtDNA (NUMTs). Using capture-enriched sequencing data from a mitochondria-free cell line and the parental cell line, and from samples previously sequenced from long-range PCR products, we demonstrate that NUMT alleles are indeed present in capture-enriched sequence data, but at low enough levels to not influence calling the authentic mtDNA genome sequence. However, distinguishing NUMT alleles from true low-level mutations (e.g. heteroplasmy) is more challenging. We develop here a computational method to distinguish NUMT alleles from heteroplasmies, using sequence data from artificial mixtures to optimize the method.
Mitochondrial genome (mitogenome) plays important roles in evolutionary and ecological studies. It becomes routine to utilize multiple genes on mitogenome or the entire mitogenomes to investigate phylogeny and biodiversity of focal groups with the onset of High Throughput Sequencing technologies. We developed a mitogenome toolkit MitoZ, consisting of independent modules of de novo assembly, findMitoScaf, annotation and visualization, that can generate mitogenome assembly together with annotation and visualization results from HTS raw reads. We evaluated its performance using a total of 50 samples of which mitogenomes are publicly available. The results showed that MitoZ can recover more full-length mitogenomes with higher accuracy compared to the other available mitogenome assemblers. Overall, MitoZ provides a one-click solution to construct the annotated mitogenome from HTS raw data and will facilitate large scale ecological and evolutionary studies. MitoZ is free open source software distributed under GPLv3 license and available at https://github.com/linzhi2013/MitoZ.
Mitochondrial DNA (mtDNA), characterised by its high copy number, structural stability, and maternal inheritance, is a critical genetic marker in forensic genetics, species identification, and conservation studies. Accurate mtDNA genome assembly is essential for these applications. However, DNA from typical wildlife and historical sources - such as museum specimens, keratinised tissues, environmental samples, and ancient remains - is often highly fragmented and damaged, limiting assembly efficiency and accuracy. Here, we developed a preprocessing workflow (MTAK) specifically designed to improve mtDNA assembly from degraded DNA. MTAK integrates two core steps: (1) extraction of homologous reads via reference-sequence alignment and (2) targeted processing of severely damaged 5' and 3' terminal bases. The workflow was evaluated on 24 degraded samples of varying quality. MTAK substantially enhanced assembly completeness and accuracy, particularly in samples with extensive DNA damage, while reducing computational time by over tenfold and minimising resource consumption. An interaction model was implemented to guide optimal sequencing depth for efficient assembly. This approach is compatible with most existing assembly tools and significantly improves mtDNA recovery from challenging historical and wildlife samples.
A wide range of scientific fields, such as forensics, anthropology, medicine, and molecular evolution, benefits from the analysis of mitogenomic data. With the development of new sequencing technologies, the amount of mitochondrial sequence data to be analyzed has increased exponentially over the last few years. The accurate annotation of mitochondrial DNA is a prerequisite for any mitogenomic comparative analysis. To keep pace with the growth of the available mitochondrial sequence data, highly efficient automatic computational methods are needed. Automatic annotation methods are typically based on databases that contain information about already annotated (and often pre-curated) mitogenomes of different species. However, the existing approaches have several shortcomings: 1) they do not scale well with the size of the database; 2) they do not allow for a fast (and easy) update of the database; and 3) they can only be applied to a relatively small taxonomic subset of all species. Here, we present a novel approach that has none of these three shortcomings. The reference database of mitogenomes is represented as a richly annotated de Bruijn graph. To generate gene predictions for a new user-supplied mitogenome, the method utilizes a clustering routine that uses the mapping information of the provided sequence to this graph. The method is implemented in a software package called DeGeCI (De Bruijn graph Gene Cluster Identification). For a large set of mitogenomes for which expert-curated annotations are available, DeGeCI generates gene predictions of high conformity. In a comparative evaluation with MITOS2, a state-of-the-art annotation tool for mitochondrial genomes, DeGeCI shows better database scalability while still matching MITOS2 in terms of result quality and providing a fully automated means to update the underlying database.
Moreover, unlike MITOS2, DeGeCI can be run in parallel on several processors to make use of modern multi-processor systems.
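The idea of annotating a new sequence by mapping it onto a gene-labelled k-mer structure can be sketched in a few lines. The following is a deliberately simplified toy, not DeGeCI's algorithm: it uses a plain k-mer index rather than a full de Bruijn graph, and it replaces the clustering routine with a merge of consecutive hits. All function names and the example sequences are hypothetical.

```python
from collections import defaultdict


def build_annotated_kmer_index(references, k):
    """Map each k-mer lying fully inside an annotated gene to its gene label(s).

    references: list of (sequence, features), where features is a list of
    (start, end, gene) tuples using 0-based, half-open coordinates.
    """
    index = defaultdict(set)
    for seq, features in references:
        for start, end, gene in features:
            for i in range(start, end - k + 1):
                index[seq[i:i + k]].add(gene)
    return index


def predict_genes(query, index, k):
    """Merge runs of consecutive query k-mers that hit exactly one gene label.

    Returns a list of (start, end, gene) tuples in 0-based, half-open
    query coordinates. Ambiguous or unknown k-mers end the current run.
    """
    predictions = []
    run_gene, run_start, run_last = None, 0, 0
    for i in range(len(query) - k + 1):
        labels = index.get(query[i:i + k], ())
        gene = next(iter(labels)) if len(labels) == 1 else None
        if gene is not None and gene == run_gene:
            run_last = i
            continue
        if run_gene is not None:
            predictions.append((run_start, run_last + k, run_gene))
        run_gene, run_start, run_last = gene, i, i
    if run_gene is not None:
        predictions.append((run_start, run_last + k, run_gene))
    return predictions


if __name__ == "__main__":
    # Toy reference with two adjacent made-up "genes".
    reference = ("ATGCGTACCTGAAGGTCAAT", [(0, 10, "geneA"), (10, 20, "geneB")])
    idx = build_annotated_kmer_index([reference], k=4)
    print(predict_genes("TTATGCGTACCTGAAGGTCAATTT", idx, k=4))
```

Exact k-mer matching only works for near-identical sequences; tolerating divergence between species is precisely what a real graph-based clustering step has to handle.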
… quality as MITOS2, while requiring less annotation time when … , which is not offered by MITOS2. Moreover, DeGeCI 1.1 aims … code is optional, unlike MITOS2, where it is mandatory. We …
Mitochondrial tRNAs have acquired a diverse portfolio of aberrant structures throughout metazoan evolution. With the availability of more than 12,500 mitogenome sequences, it is essential to compile a comprehensive overview of the pattern changes with regard to mitochondrial tRNA repertoire and structural variations. This, of course, requires reanalysis of the sequence data of more than 250,000 mitochondrial tRNAs with a uniform workflow. Here, we report our results on the complete reannotation of all mitogenomes available in the RefSeq database by September 2022 using MITOS2. Based on the individual cases of mitochondrial tRNA variants reported throughout the literature, our data pinpoint the respective hotspots of change, i.e. Acanthocephala (Lophotrochozoa), Nematoda, Acariformes, and Araneae (Arthropoda). Less dramatic deviations of mitochondrial tRNAs from the norm are observed throughout many other clades. Loss of arms in animal mitochondrial tRNA clearly is a phenomenon that occurred independently many times, not limited to a small number of specific clades. The summary data here provide a starting point for systematic investigations into the detailed evolutionary processes of structural reduction and loss of mitochondrial tRNAs as well as a resource for further improvements of annotation workflows for mitochondrial tRNA annotation.
Introduction: Primary mitochondrial diseases (PMDs) comprise a large and heterogeneous group of genetic diseases that result from pathogenic variants in either nuclear DNA (nDNA) or mitochondrial DNA (mtDNA). Widespread adoption of next-generation sequencing (NGS) has improved the efficiency and accuracy of mtDNA diagnoses; however, several challenges remain. Areas covered: In this review, we briefly summarize the current state of the art in molecular diagnostics for mtDNA and consider the implications of improved whole genome sequencing (WGS), bioinformatic techniques, and the adoption of long-read sequencing for PMD diagnostics. Expert opinion: We anticipate that the application of PCR-free WGS from blood DNA will increase in diagnostic laboratories, while for adults with myopathic presentations, WGS from muscle DNA may become more widespread. Improved bioinformatic strategies will enhance WGS data interrogation, with more accurate delineation of mtDNA and NUMTs (nuclear mitochondrial DNA segments) in WGS data, superior coverage uniformity, indirect measurement of mtDNA copy number, and more accurate interpretation of heteroplasmic large-scale rearrangements (LSRs). Separately, the adoption of diagnostic long-read sequencing could offer greater resolution of complex LSRs and the opportunity to phase heteroplasmic variants. Plain language summary: Mitochondria generate our bodies’ energy, and they contain their own circular DNA molecules. Changes in this mitochondrial DNA can cause a wide range of genetic diseases. Improved computer processing of the sequence of this DNA and new techniques that can read the full DNA sequence in one experiment may enhance our ability to understand these genetic variants.
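The "indirect measurement of mtDNA copy number" from WGS data mentioned above is commonly approximated from relative read depth. The sketch below assumes a diploid nuclear genome and uses the widely used ratio estimate (copies per cell ≈ 2 × mean mtDNA depth / mean autosomal depth); the review itself does not prescribe a formula, and the function name and example depths are hypothetical.

```python
def estimate_mtdna_copy_number(mean_mt_depth: float, mean_autosomal_depth: float) -> float:
    """Estimate mtDNA copies per cell from mean sequencing depths.

    Assumes a diploid nuclear genome, so each cell contributes two
    autosomal copies to the nuclear depth.
    """
    if mean_autosomal_depth <= 0:
        raise ValueError("autosomal depth must be positive")
    if mean_mt_depth < 0:
        raise ValueError("mtDNA depth must be non-negative")
    return 2.0 * mean_mt_depth / mean_autosomal_depth


if __name__ == "__main__":
    # Hypothetical depths: 3000x over mtDNA, 30x over the autosomes.
    print(estimate_mtdna_copy_number(3000.0, 30.0))
```

The estimate is only as good as the depth values fed into it: reads from NUMTs that mis-map to mtDNA, or uneven coverage, will bias the ratio.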
The mitochondrial genome (mtDNA) is an important source of disease-causing genetic variability, but existing sequencing methods limit understanding, precluding phased …
… However, long-read sequencing comes along with … of mitochondrial DNA variants in three samples using PacBio’s Sequel (Pacific Biosciences Inc., Menlo Park, CA, USA) long-read …
Primary mitochondrial diseases are progressive genetic disorders affecting multiple organs and characterized by mitochondrial dysfunction. These disorders can be caused by mutations in nuclear genes coding for proteins with mitochondrial localization or by genetic defects in the mitochondrial genome (mtDNA). The latter include pathogenic point variants and large-scale deletions/rearrangements. MtDNA molecules with the wild-type or a variant sequence can exist together in a single cell, a condition known as mtDNA heteroplasmy. MtDNA single point mutations are typically detected by short-read Next-Generation Sequencing (NGS), which, however, is limited in its ability to identify structural mtDNA alterations. Recently, new NGS technologies based on long reads have been released, making it possible to obtain sequences several kilobases in length; this approach is suitable for detecting structural alterations affecting the mitochondrial genome. In the present work we illustrate the optimization of two sequencing protocols based on long-read Oxford Nanopore Technology to detect mtDNA structural alterations. This approach presents strong advantages in the analysis of mtDNA compared to both short-read NGS and traditional techniques, potentially becoming the method of choice for genetic studies on mtDNA.
Background: Mitochondrial diseases (MDs) can be caused by single nucleotide variants (SNVs) and structural variants (SVs) in the mitochondrial genome (mtDNA). Presently, identifying deletions in small to medium-sized fragments and accurately detecting low-percentage variants remain challenging due to the limitations of next-generation sequencing (NGS). Methods: In this study, we integrated targeted long-range polymerase chain reaction (LR-PCR) and PacBio HiFi sequencing to analyze 34 participants, including 28 patients and 6 controls. Of these, 17 samples were subjected to both targeted LR-PCR long-read sequencing and NGS to compare the mtDNA variant detection efficacy. Results: Among the 28 patients tested by long-read sequencing (LRS), 2 patients were found positive for the m.3243A>G hotspot variant, and 20 patients exhibited single or multiple deletion variants with a proportion exceeding 4%. Comparison between the results of LRS and NGS revealed that both methods exhibited similar efficacy in detecting SNVs exceeding 5%. However, LRS outperformed NGS in detecting SNVs with a ratio below 5%. As for SVs, LRS identified single or multiple deletions in 13 out of 17 cases, whereas NGS only detected single deletions in 8 cases. Furthermore, deletions identified by LRS were validated by Sanger sequencing and quantified in single muscle fibers using real-time PCR. Notably, LRS also effectively and accurately identified secondary mtDNA deletions in idiopathic inflammatory myopathies (IIMs). Conclusions: LRS outperforms NGS in detecting various types of SNVs and SVs in mtDNA, including those with low frequencies. Our research is a significant advancement in medical comprehension and will provide profound insights into genetics.
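The percentages quoted above are heteroplasmy levels, i.e. the fraction of reads at a position that carry the variant allele. A minimal sketch of that calculation follows; the 5% cut-off is taken from the abstract's comparison, while the function names and read counts are hypothetical and the study's own variant-calling pipeline is not reproduced here.

```python
def heteroplasmy_fraction(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads at one mtDNA position that carry the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must lie between 0 and total_reads")
    return alt_reads / total_reads


def is_low_level(alt_reads: int, total_reads: int, threshold: float = 0.05) -> bool:
    """True when the variant lies below the threshold (5% by default)."""
    return heteroplasmy_fraction(alt_reads, total_reads) < threshold


if __name__ == "__main__":
    # Hypothetical counts: 30 variant reads out of 1000 at one position.
    print(heteroplasmy_fraction(30, 1000), is_low_level(30, 1000))
```

At low depth this ratio is noisy, so a low-level call normally also needs a minimum read count and an error-rate model, both of which are omitted here.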
… in the Phrap assembly. Average quality profile across the entire mitochondrial genome. The black line represents the average Phrap quality after final assembly of the consensus …
Cells contain thousands of copies of the mitochondrial genome. These genomes are distributed within the tubular mitochondrial network, which is itself spread across the cytosol of the cell. Mitochondrial DNA (mtDNA) replication occurs throughout the cell cycle and ensures that cells maintain a sufficient number of mtDNA copies. At replication termination the genomes must be resolved and segregated within the mitochondrial network. Defects in mtDNA replication and segregation are a cause of human mitochondrial disease associated with failure of cellular energy production. This review focuses upon recent developments on how mitochondrial genomes are physically separated at the end of DNA replication, and how these genomes are subsequently segregated and distributed around the mitochondrial network.
… The Current Human mtDNA Database A complete set of the current human mitochondrial genomes available in GenBank was assembled as described in the Material and Methods …
… Ancient humans are well suited to provide calibration points for the human mitochondrial … These sequences were used as inputs for the iterative mapping assembler (MIA) [11]. For …
Human DNA is unavoidably present in metagenomic analyses of human microbiomes. While current protocols remove human DNA before submission to public repositories, mitochondrial DNA (mtDNA) has been overlooked and frequently persists. We discuss the privacy risks and research opportunities associated with mtDNA, urging consideration by the scientific, ethics, and legal communities.
… of de novo assembly and review state-of-the-art de novo assembly of human and other … sequence variation as a result of incomplete assembly, the implications for biomedicine and …
The assembly of plant mitochondrial genomes presents unique challenges due to difficulties in isolating mitochondrial DNA (mtDNA) and plant mitochondrial genome characteristics, such as low interspecific conservation; sequence sharing among mitochondrial, nuclear and plastid DNAs; and complex structural variations. Our laboratory has sequenced and assembled a dozen plant mitochondrial genomes, testing various strategies and identifying numerous assembly issues. This review compared the advanced methods and tools for plant mitochondrial genome assembly, categorizing assembly algorithms into three groups: (1) reference‐based, (2) de novo and (3) iterative mapping and extension. The performance of 11 software tools used most frequently over the past 5 years (GetOrganelle, Velvet, NOVOPlasty, SOAPdenovo2, Canu, Flye, SMARTdenovo, PMAT, NextDenovo, SPAdes and Unicycler) and two newly developed tools (TIPPo and Oatk) was assessed. The evaluation metrics included the completeness, contiguity and correctness of the assembled plant mitochondrial genomes. SMARTdenovo, NextDenovo and Oatk demonstrated superior performance in terms of contiguity and completeness. GetOrganelle and Flye excelled in correctness. Key challenges in plant mitochondrial genome assembly, such as removing nuclear mitochondrial DNA (NUMT) and mitochondrial plastid DNA (NUPT) contamination and resolving intra‐genomic repetitive regions, were discussed. A general strategy for plant mitochondrial genome assembly used in studies conducted in our laboratory was summarized. This review serves as a resource for those assembling plant mitochondrial genomes or developing plant mitochondrial genome assembly tools.
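Contiguity and completeness, two of the evaluation metrics named above, can be made concrete with a small sketch. The review does not state how it computed them, so N50 and a simple covered-length ratio are used here as common stand-ins; both function names and the example lengths are assumptions.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Length L such that contigs of length >= L hold at least half of all assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs given")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for positive contig lengths")


def completeness(covered_reference_bases: int, reference_length: int) -> float:
    """Fraction of a reference mitogenome covered by the assembly."""
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    return covered_reference_bases / reference_length


if __name__ == "__main__":
    # Hypothetical assembly with four contigs.
    print(n50([100, 200, 300, 400]))
```

For a mitogenome that assembles into a single circular contig, N50 simply equals the genome length, so the metric is mostly informative for fragmented or multi-chromosome assemblies.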
… Because L0R8F8 was built de novo into a region of the map with an estimated local resolution of ∼4 Å, we performed additional checks to confirm that the model was consistent with …
Hundreds of thousands of human genomes are now being sequenced to characterize genetic variation and use this information to augment association mapping studies of complex disorders and other phenotypic traits. Genetic variation is identified mainly by mapping short reads to the reference genome or by performing local assembly. However, these approaches are biased against discovery of structural variants and variation in the more complex parts of the genome. Hence, large-scale de novo assembly is needed. Here we show that it is possible to construct excellent de novo assemblies from high-coverage sequencing with mate-pair libraries extending up to 20 kilobases. We report de novo assemblies of 150 individuals (50 trios) from the GenomeDenmark project. The quality of these assemblies is similar to those obtained using the more expensive long-read technology. We use the assemblies to identify a rich set of structural variants including many novel insertions and demonstrate how this variant catalogue enables further deciphering of known association mapping signals. We leverage the assemblies to provide 100 completely resolved major histocompatibility complex haplotypes and to resolve major parts of the Y chromosome. Our study provides a regional reference genome that we expect will improve the power of future association mapping studies and hence pave the way for precision medicine initiatives, which now are being launched in many countries including Denmark.
MitoFish is a database of fish mitochondrial genomes (mitogenomes) that includes powerful and precise de novo annotations for mitogenome sequences. Fish occupy an important position in the evolution of vertebrates and the ecology of the hydrosphere, and mitogenomic sequence data have served as a rich source of information for resolving fish phylogenies and identifying new fish species. The importance of a mitogenomic database continues to grow at a rapid pace as massive amounts of mitogenomic data are generated with the advent of new sequencing technologies. A severe bottleneck seems likely to occur with regard to mitogenome annotation because of the overwhelming pace of data accumulation and the intrinsic difficulties in annotating sequences with degenerating transfer RNA structures, divergent start/stop codons of the coding elements, and the overlapping of adjacent elements. To ease this data backlog, we developed an annotation pipeline named MitoAnnotator. MitoAnnotator automatically annotates a fish mitogenome with a high degree of accuracy in approximately 5 min; thus, it is readily applicable to data sets of dozens of sequences. MitoFish also contains re-annotations of previously sequenced fish mitogenomes, enabling researchers to refer to them when they find annotations that are likely to be erroneous or while conducting comparative mitogenomic analyses. For users who need more information on the taxonomy, habitats, phenotypes, or life cycles of fish, MitoFish provides links to related databases. MitoFish and MitoAnnotator are freely available at http://mitofish.aori.u-tokyo.ac.jp/ (last accessed August 28, 2013); all of the data can be batch downloaded, and the annotation pipeline can be used via a web interface.
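Research on the human mitochondrial genome has built a mature computational ecosystem that covers the whole process, from reconstruction of raw sequences through fine-grained annotation to interpretation of variant pathogenicity. Current priorities include: using emerging long-read sequencing technologies to improve the accuracy of genome assembly; developing highly automated, integrated pipelines to support high-throughput clinical diagnostics; and establishing filtering criteria for nuclear mitochondrial DNA segments (NUMTs), a major source of interference, so that the data used in downstream clinical analysis are reliable and accurate.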