Advances in Human Mitochondrial Genome Assembly and Annotation: Research Directions and Key Conclusions from 59 Papers

Human Mitochondrial Genome Assembly and Annotation

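Research on the human mitochondrial genome has produced a mature computational ecosystem that covers the entire workflow, from reconstructing sequences out of raw reads, through fine-grained annotation, to interpreting the pathogenicity of variants. Current priorities include using emerging long-read sequencing technologies to improve assembly accuracy; developing highly automated, integrated pipelines that support high-throughput clinical diagnostics; and establishing filtering criteria for nuclear mitochondrial DNA segments (NUMTs), a major source of interference, so that the data feeding downstream clinical analyses are reliable and accurate.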

59 papers in total, grouped into 3 research directions

De novo assembly and computational analysis pipelines for mitochondrial genomes

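This group of papers focuses on the de novo assembly, reconstruction, and processing of mitochondrial genomes from Sanger, NGS, and long-read sequencing data, and includes a variety of integrated analysis tools and automated pipelines optimized for different sequencing platforms. Related literature: 30 papers, including Guanliang Meng et al., 2018.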
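The seed-and-extend idea behind several organelle assemblers can be illustrated with a toy greedy k-mer extender. This is a minimal sketch under simplifying assumptions (error-free reads, and k large enough that every (k-1)-mer occurs only once in the molecule); the function names `collect_kmers` and `greedy_extend` are hypothetical and do not reproduce the algorithm of any specific tool in this group.

```python
def collect_kmers(reads: list[str], k: int) -> set[str]:
    """Return every k-mer observed in the reads (no error correction)."""
    kmers: set[str] = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    return kmers


def greedy_extend(seed: str, kmers: set[str], max_len: int = 20_000) -> tuple[str, bool]:
    """Extend a seed k-mer to the right, one base at a time.

    Extension stops at a dead end or a branch (zero or several candidate
    bases). If the contig's last k-mer comes back around to the seed, the
    circular molecule is closed and the duplicated seed copy is trimmed.
    Returns (contig, is_circular).
    """
    k = len(seed)
    contig = seed
    while len(contig) < max_len:
        overlap = contig[-(k - 1):]  # last k-1 bases of the contig
        candidates = [b for b in "ACGT" if overlap + b in kmers]
        if len(candidates) != 1:
            return contig, False  # dead end or unresolved repeat
        contig += candidates[0]
        if contig[-k:] == seed:
            return contig[:-k], True  # wrapped around the circle
    return contig, False


# Toy circular "genome" whose 8-mers are all unique, sampled as 12-bp reads.
genome = "AAAACCCCGGGGTTTT"
reads = [(genome + genome)[i:i + 12] for i in range(len(genome))]
contig, circular = greedy_extend(genome[:9], collect_kmers(reads, 9))
```

Real reads carry sequencing errors and real genomes contain repeats, so practical assemblers add k-mer coverage thresholds, error correction, and repeat resolution; this sketch simply stops at the first ambiguity.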

Functional annotation of mitochondrial genomes and clinical variant interpretation

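This group of studies addresses the identification of gene boundaries in the mitochondrial genome, the accurate annotation of RNA and protein-coding genes, and tools for variant identification, heteroplasmy analysis, and pathogenicity interpretation in a clinical setting. Related literature: 20 papers, including M. Rieder et al., 1998.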
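Heteroplasmy analysis usually starts from per-position base counts: the heteroplasmic fraction is the share of reads carrying the alternative allele. The sketch below classifies a single site from such counts. The `SiteCounts` type, the `call_heteroplasmy` name, and all thresholds (`min_depth`, `het_min`, `hom_min`) are hypothetical illustration choices, not cut-offs taken from the papers in this group.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SiteCounts:
    position: int            # 1-based position on the reference (e.g. rCRS)
    ref: str                 # reference base at this position
    counts: dict[str, int]   # observed read count per base


def call_heteroplasmy(site: SiteCounts, min_depth: int = 10,
                      het_min: float = 0.05, hom_min: float = 0.95
                      ) -> Optional[tuple[Optional[str], float, str]]:
    """Classify one site from base counts.

    Returns None when coverage is below min_depth, otherwise
    (major alternative base or None, its fraction of reads, label).
    The thresholds are illustrative defaults, not published cut-offs.
    """
    depth = sum(site.counts.values())
    if depth < min_depth:
        return None
    alts = {b: n for b, n in site.counts.items() if b != site.ref and n > 0}
    if not alts:
        return None, 0.0, "reference"
    alt, alt_reads = max(alts.items(), key=lambda item: item[1])
    fraction = alt_reads / depth
    if fraction >= hom_min:
        label = "homoplasmic"
    elif fraction >= het_min:
        label = "heteroplasmic"
    else:
        label = "below detection threshold"
    return alt, fraction, label
```

For example, made-up counts of 80 A and 20 G reads at m.3243 would be reported as a G allele at fraction 0.2, labelled heteroplasmic. Real callers also model strand bias, base quality, and sequencing error before assigning a label.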

Identification of nuclear mitochondrial DNA segments (NUMTs) and control of their interference

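This group of papers focuses specifically on the transfer of mitochondrial DNA into the nuclear genome, with an emphasis on identifying nuclear mitochondrial pseudogenes (NUMTs) and assessing the false positives and interference they introduce into mitochondrial genome sequencing, data quality, and evolutionary biology research. Related literature: 9 papers, including Wei Wei et al., 2022.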
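One widely used strategy for limiting NUMT contamination is to align reads against both the mitochondrial and the nuclear reference and keep only reads that are clearly better explained by mtDNA. Assuming an aligner has already produced the best alignment score of each read against each reference, the decision rule can be sketched as follows; the score dictionary layout, the `filter_numt_reads` name, and the `margin` parameter are hypothetical.

```python
from typing import Optional


def filter_numt_reads(scores: dict[str, tuple[Optional[int], Optional[int]]],
                      margin: int = 5) -> list[str]:
    """Keep reads whose best mtDNA alignment clearly beats any nuclear hit.

    scores maps read ID -> (best mtDNA alignment score, best nuclear score),
    with None meaning "no alignment". A read that aligns about as well to a
    nuclear locus (a possible NUMT) as to the mtDNA is discarded.
    """
    kept = []
    for read_id, (mt_score, nuc_score) in scores.items():
        if mt_score is None:
            continue  # no mitochondrial alignment at all
        if nuc_score is None or mt_score - nuc_score >= margin:
            kept.append(read_id)
    return kept


scores = {
    "r1": (150, None),  # mtDNA only: kept
    "r2": (148, 150),   # better nuclear hit: dropped
    "r3": (150, 140),   # mtDNA better by 10: kept
    "r4": (None, 150),  # nuclear only: dropped
    "r5": (150, 147),   # mtDNA better by only 3: dropped at margin=5
}
kept_reads = filter_numt_reads(scores)
```

The margin trades sensitivity for specificity: a larger value removes more ambiguous reads, at the cost of losing genuine mtDNA reads from regions that closely resemble known NUMTs.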

Full list of the 59 related papers

Extraction and annotation of human mitochondrial genomes from 1000 Genomes Whole Exome Sequencing data

M. Diroma, C. Calabrese, D. Simone, et al., 2014, BMC Genomics (journal zone 2, IF 3.7)

Background: Whole Exome Sequencing (WES) is one of the most widely used and cost-effective next-generation sequencing technologies, allowing all nuclear exons to be sequenced. Off-target regions may also be captured if they share high sequence similarity with the baits. Bioinformatics tools have been optimized to retrieve large amounts of off-target mitochondrial DNA (mtDNA) from WES data by exploiting the non-specificity of probes that partially overlap Nuclear mitochondrial Sequences (NumtS). The 1000 Genomes project is one of the largest resources for extracting mtDNA sequences from WES data, which is valuable given the scientific community's extensive efforts to reconstruct human population history using mtDNA as a marker, and the involvement of mtDNA in disease.

Results: A previously published pipeline for assembling mitochondrial genomes from off-target WES reads, further improved to detect insertions and deletions (indels) and heteroplasmy, was applied to a dataset of 1242 samples from the 1000 Genomes project and yielded a nearly complete mitochondrial genome for 943 samples (76% of the analyzed exomes). The robustness of our computational strategy was highlighted by the reduction, due to NumtS filtering, in the number of reads recognized as mitochondrial relative to the original annotation produced by the Consortium. An accurate survey was carried out on 1242 individuals, in which 215 indels, mostly heteroplasmic, and 3407 single-base variants were mapped. Mismatches were distributed homogeneously along the whole mitochondrial genome, whereas indels were less frequent within protein-coding regions, where frameshift mutations may be deleterious. Most of the indels and mismatches found had not previously been annotated in mitochondrial databases, because conventional sequencing methods were limited to detecting homoplasmy or quasi-homoplasmy. Intriguingly, after filtering out non-haplogroup-defining variants, we detected a widespread occurrence in the population of rare events predicted to be damaging. Finally, samples were stratified into blood-derived and lymphoblastoid-derived sets to detect possibly different mutability trends in the two datasets; this analysis did not reveal significant discordances.

Conclusions: To the best of our knowledge, this is likely the most extensive population-scale mitochondrial genotyping study in humans that also includes estimates of heteroplasmy.
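The lower indel frequency in protein-coding regions follows from the reading frame: an indel inside a coding sequence shifts the frame unless its length change is a multiple of 3. A minimal classifier for this rule is sketched below. The `CODING_REGIONS` table lists only two rCRS genes (MT-ND1 at 3307–4262 and MT-CO1 at 5904–7445) for illustration and should be checked against the full reference annotation before any real use; `classify_indel` is a hypothetical name.

```python
# Illustrative subset of rCRS protein-coding genes (1-based, inclusive).
# Verify against the full reference annotation before real use.
CODING_REGIONS = {
    "MT-ND1": (3307, 4262),
    "MT-CO1": (5904, 7445),
}


def classify_indel(position: int, ref: str, alt: str,
                   regions: dict[str, tuple[int, int]] = CODING_REGIONS) -> str:
    """Label a variant by whether it would shift a listed reading frame."""
    length_change = len(alt) - len(ref)
    if length_change == 0:
        return "not an indel"
    gene = next((name for name, (start, end) in regions.items()
                 if start <= position <= end), None)
    if gene is None:
        return "outside listed coding regions"
    # Python's % returns a non-negative result here, so deletions work too.
    if length_change % 3 != 0:
        return f"frameshift in {gene}"
    return f"in-frame in {gene}"
```

The check ignores strand, overlapping genes such as MT-ATP8/MT-ATP6, and indels that straddle a gene boundary; it only illustrates the length-modulo-3 rule behind the observed depletion.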