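Alfalfa Pan-Gene Family Analysis
Pangenome construction and methodology in alfalfa and other legumes
This group of papers covers methods and workflows for constructing pangenomes of alfalfa and other legumes, comparisons of linear and graph-based pangenome tools, and reviews of the current state of pangenomics research. It also discusses the technical challenges of handling polyploid and highly heterozygous genomes.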
- Pangenome analysis provides insights into legume evolution and breeding(Longfei Wang, Xinyu Jiang, Wu Jiao, Junrong Mao, Wenxue Ye, Yangrong Cao, Qingshan Chen, Qingxin Song, 2025, Nature Genetics)
- Legume Pangenome: Status and Scope for Crop Improvement(U. Jha, H. Nayyar, E. V. von Wettberg, Yogesh Dashrath Naik, M. Thudi, K. Siddique, 2022, Plants)
- A stepwise guide for pangenome development in crop plants: an alfalfa (Medicago sativa) case study(Harpreet Kaur, Laura M. Shannon, D. Samac, 2024, BMC Genomics)
- Plant Pan-genomics: Opportunities, Advances, and Challenges(Mingjun Hou, Shaozi Pang, 2024, Journal of Data Science and Intelligent Systems)
- Crop pangenomes(A. Pronozin, M. K. Bragina, E. Salina, 2021, Vavilov Journal of Genetics and Breeding)
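Structural variation in the alfalfa pangenome and evolutionary mechanisms of core genes
This group of papers centers on genome assembly of alfalfa (both diploid and tetraploid), with emphasis on gene content variation (PAV), the relationship between core genes and adaptive variation, and the genetic constraints and evolutionary advantages that come with polyploidization.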
- Chromosome‐scale haplotype‐resolved genome assembly of the autotetraploid alfalfa cultivar Bolivia(Hongkui Zhang, Lan Zhou, Hong-Mei Zhao, Jiayan Liang, Yongle Liu, Chen Wang, Sijie Sun, Lizhen Song, Yu'e Zhang, Youfa Cheng, Yongbiao Xue, 2025, Plant Biotechnology Journal)
- Medicago super-pangenome reveals adaptive advantages and evolutionary constraints in autotetraploid alfalfa(Fan Zhang, Chunxue Wei, Xiaoya Shi, Shuo Cao, Xiaodong Xu, Zhiyao Ma, Yanling Peng, Rida Arshad, Hui Xue, Zhen Zhang, Wei Zhang, Yanshuai Xu, Yang Dong, Lianzhu Zhou, Xuejing Cao, Mengrui Du, Xu Wang, Zhiwu Zhang, R. Long, Junmei Kang, Yongfeng Zhou, Qingchuan Yang, 2025, Nature Communications)
- Genomic resources for Australian alfalfa (Medicago sativa L.) genomics: reformatted reference genome, annotated variants, gene presence-absence and diversity analysis from genome re-sequencing(M. Malmberg, D.D. Suraweera, R. Baillie, K.F. Smith, Noel O. Cogan, 2025, BMC Plant Biology)
- Pan-genomic analysis highlights genes associated with agronomic traits and enhances genomics-assisted breeding in alfalfa(F. He, Shuai Chen, Yangyang Zhang, Kun Chai, Qing Zhang, Weilong Kong, Shenyang Qu, Lin Chen, Fan Zhang, Mingna Li, Xue Wang, Huigang Lv, Tiejun Zhang, Xiaofan He, Xiao Li, Yajing Li, Xianyang Li, Xueqian Jiang, Ming Xu, Bilig Sod, Junmei Kang, Xingtan Zhang, R. Long, Qingchuan Yang, 2025, Nature Genetics)
- The chromosome‐level assembly of the wild diploid alfalfa genome provides insights into the full landscape of genomic variations between cultivated and wild alfalfa(Kun Shi, Hongbin Dong, Huilong Du, Yuxian Li, Le Zhou, Chengzhi Liang, Muhammet Şakiroğlu, Zan Wang, 2024, Plant Biotechnology Journal)
- Alfalfa pan-genome unveiled—a breakthrough in alfalfa genomics-assisted breeding(Gai Huang, Lulu Li, Xiaofeng Cao, Hao Lin, 2025, Science China Life Sciences)
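Alfalfa pan-transcriptome and functional gene characterization
This group of papers focuses on alfalfa transcriptomic responses to environmental stresses (such as drought, high salinity, and aluminum toxicity), gene co-expression networks, and the discovery of functional genes, using omics approaches to dissect the regulatory mechanisms of agronomic traits.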
- Pan-transcriptome identifying master genes and regulation network in response to drought and salt stresses in Alfalfa (Medicago sativa L.)(C. Medina, D. Samac, Long-Xi Yu, 2021, Scientific Reports)
- Comparison of structural variants in the whole genome sequences of two Medicago truncatula ecotypes: Jemalong A17 and R108(Ao Li, Ai Liu, Shuang Wu, Kun Qu, Hong-yin Hu, Jinli Yang, Nawal Shrestha, Jianquan Liu, Guangpeng Ren, 2022, BMC Plant Biology)
- Advances in basic biology of alfalfa (Medicago sativa L.): a comprehensive overview(Yuanyuan Zhang, Lei Wang, 2025, Horticulture Research)
- Determining the Genetic Architecture and Breeding Potential of Quality Traits in Alfalfa (Medicago sativa L.) Through Genome-Wide Association Study and Genomic Prediction(Ming Xu, Kai Zhu, Xueqian Jiang, Fan Zhang, Bilig Sod, Huajuan Leng, Tianci Zhang, Yanchao Xu, Tianhui Yang, Mingna Li, Xue Wang, Qingchuan Yang, Junmei Kang, Tie-jun Zhang, Lin Chen, R. Long, F. He, 2025, Agronomy)
- Exploring structural variation and gene family architecture with De Novo assemblies of 15 Medicago genomes(Peng Zhou, K. Silverstein, Thiruvarangan Ramaraj, J. Guhlin, R. Denny, Junqi Liu, A. Farmer, K. Steele, R. Stupar, J. Miller, P. Tiffin, J. Mudge, N. Young, 2017, BMC Genomics)
- A pan‐genome and chromosome‐length reference genome of narrow‐leafed lupin (Lupinus angustifolius) reveals genomic diversity and insights into key industry and biological traits(Gagan Garg, L. Kamphuis, P. Bayer, P. Kaur, O. Dudchenko, Candy M. Taylor, K. Frick, R. Foley, Ling-ling Gao, Erez Lieberman Aiden, David Edwards, Karam B. Singh, 2022, The Plant Journal)
- Genome-Wide Association Studies Identifying Multiple Loci Associated With Alfalfa Forage Quality(Sen Lin, C. Medina, O. Norberg, D. Combs, Guojie Wang, G. Shewmaker, S. Fransen, D. Llewellyn, Long-Xi Yu, 2021, Frontiers in Plant Science)
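Research on alfalfa and related legume crops falls into three logical tiers: first, the techniques and theoretical methods for pangenome construction; second, assembly-based analysis of structural variation and of the evolution and adaptation of core genes; and third, the combination of pan-transcriptomics and association analysis to identify functional genes that respond to environmental stress and regulate agronomic traits, providing a theoretical basis for molecular breeding.
18 related papers in total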
… gene families, encompassing both core and variable gene sets. These data not only reflect the high genetic diversity of domesticated alfalfa… for dissecting the genetic basis of complex …
… pan-genome constructed from 24 diverse alfalfa accessions, encompassing a wide range of genetic … algorithm, which identified 54,002 nonredundant gene families. The size of the …
The concept of pangenomics and the importance of structural variants is gaining recognition within the plant genomics community. Due to advancements in sequencing and computational technology, it has become feasible to sequence the entire genome of numerous individuals of a single species at a reasonable cost. Pangenomes have been constructed for many major diploid crops, including rice, maize, soybean, sorghum, pearl millet, peas, sunflower, grapes, and mustards. However, pangenomes for polyploid species are relatively scarce and are available in only a few crops including wheat, cotton, rapeseed, and potatoes. In this review, we explore the various methods used in crop pangenome development, discussing the challenges and implications of these techniques based on insights from published pangenome studies. We offer a systematic guide and discuss the tools available for constructing a pangenome and conducting downstream analyses. Alfalfa, a highly heterozygous, cross pollinated and autotetraploid forage crop species, is used as an example to discuss the concerns and challenges offered by polyploid crop species. We conducted a comparative analysis using linear and graph-based methods by constructing an alfalfa graph pangenome using three publicly available genome assemblies. To illustrate the intricacies captured by pangenome graphs for a complex crop genome, we used five different gene sequences and aligned them against the three graph-based pangenomes. The comparison of the three graph pangenome methods reveals notable variations in the genomic variation captured by each pipeline. Pangenome resources are proving invaluable by offering insights into core and dispensable genes, novel gene discovery, and genome-wide patterns of variation. Developing user-friendly online portals for linear pangenome visualization has made these resources accessible to the broader scientific and breeding community.
However, challenges remain with graph-based pangenomes including compatibility with other tools, extraction of sequence for regions of interest, and visualization of genetic variation captured in pangenome graphs. These issues necessitate further refinement of tools and pipelines to effectively address the complexities of polyploid, highly heterozygous, and cross-pollinated species.
Alfalfa (Medicago sativa L.), a globally significant forage crop, has faced limited breeding progress in recent decades. Several challenges hinder genetic gain in alfalfa, including its status as an outbreeding tetraploid species with pronounced inbreeding depression, high parent numbers in synthetic crosses resulting in limited genetic differentiation between cultivars and a lack of genomic resources to advance genomic breeding techniques in the species. We aim to address some of these limitations by generating genomic resources for alfalfa improvement, including reformatting an allele-aware reference genome to remove duplicate haplotypes while retaining presence absence variation, genome annotation identifying genes and functional elements, SNP discovery and SNP variant effect prediction. The predicted gene set was expanded by the inclusion of RNA sequencing from multiple tissue types and stress treatments. Genetic diversity of 316 samples from seven commercially available cultivars relevant to Australian grazing systems was examined, including a population level analysis of gene presence-absence variation. There is little genetic differentiation between cultivars, with higher diversity within than between cultivars. Several genes were found to display presence-absence at the population level. These findings provide insights for alfalfa breeding programs and underscore the need for continued efforts in developing genomic tools to unlock the crop’s full potential.
The genetic basis for the adaptive advantages of polyploids over their diploid relatives remains poorly understood. To address this knowledge gap, we generate a haplotype-resolved autotetraploid alfalfa (Medicago sativa subsp. sativa) genome and construct a super-pangenome from 13 genomes across seven Medicago taxa. We discover substantial gene content variation in alfalfa, with only 20.1% of genes present on all four haplotypes. Within this group, 53.3% are core genes conserved across the Medicago genus, which we term ‘tetra-copy core genes’. We find these genes are significantly enriched in climate-adaptation-associated genes (1.60-fold) and stress-responsive differentially expressed genes (1.61-fold). Paradoxically, they also carry a high genetic burden, with 80.1% of deleterious variants located in coding regions. Indeed, overexpressing a representative tetra-copy core gene, the glycine decarboxylase (MsGDC), improves both biomass and nitrogen use efficiency, despite its high genetic burden. Our study reveals the trade-off between adaptation and evolutionary constraints mediated by tetra-copy core genes, facilitating polyploid genetics and alfalfa breeding. The genetic mechanisms of why polyploid plants exhibit adaptive advantages over diploids are unclear. Here, the authors assemble the autotetraploid alfalfa genome, incorporate it into a pangenome analysis, and reveal the trade-off between adaptive advantages and evolutionary constraints mediated by tetra-copy core genes.
Alfalfa (Medicago sativa L.), a perennial legume forage, has been broadly cultivated owing to a variety of favorable characteristics, including comprehensive ecological adaptability, superior nutritive value and palatability, and nitrogen fixation capacity. The productivity traits of alfalfa, specifically its biomass yield and forage quality, are significantly influenced by a series of determinants, including internal developmental factors and external environmental cues. However, the regulatory mechanisms underlying the fundamental biological problems of alfalfa remain elusive. Here, we conducted a comprehensive review focusing on the genomics of alfalfa, advancements in gene-editing technologies, and the identification of genes that control pivotal agronomic characteristics, including biomass formation, nutritional quality, flowering time, and resistance to various stresses. Moreover, a molecular design roadmap for the ‘ideal alfalfa’ has been proposed and the potential of pangenomes, self-incompatibility mechanisms, de novo domestication, and intelligent breeding strategies to enhance alfalfa's yield, quality, and resilience were further discussed. This review will provide comprehensive information on the basic biology of alfalfa and offer new insights for the cultivation of ideal alfalfa.
Progress in genome sequencing, assembly and analysis allows for a deeper study of agricultural plants’ chromosome structures, gene identification and annotation. The published genomes of agricultural plants proved to be a valuable tool for studying gene functions and for marker-assisted and genomic selection. However, large structural genome changes, including gene copy number variations (CNVs) and gene presence/absence variations (PAVs), prevail in crops. These genomic variations play an important role in the functional set of genes and the gene composition in individuals of the same species and provide the genetic determination of agronomically important properties of crops. The high degree of genomic variation observed indicates that single reference genomes do not represent the diversity within a species, leading to the pangenome concept. The pangenome represents information about all genes in a taxon: those that are common to all taxon members and those that are variable and are partially or completely specific for particular individuals. Pangenome sequencing and analysis technologies provide a large-scale study of genomic variation and resources for evolutionary research, functional genomics and crop breeding. This review provides an analysis of agricultural plants’ pangenome studies. Pangenome structural features, methods and programs for bioinformatic analysis of pangenomic data are described.
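The core versus variable split described in the abstract above can be illustrated with a minimal Python sketch. It assumes a toy presence/absence table in which each gene family maps to the set of accessions that carry it. The function name `classify_gene_families`, the accession labels, and the family labels are hypothetical and chosen only for illustration; none of them come from the cited papers, and real pipelines add orthology clustering and assembly-quality filtering that are omitted here.

```python
from typing import Dict, Set


def classify_gene_families(
    pav: Dict[str, Set[str]], accessions: Set[str]
) -> Dict[str, str]:
    """Label a gene family 'core' if every accession carries it, else 'dispensable'."""
    labels = {}
    for family, present_in in pav.items():
        # A family is core only when the full accession set is a subset of
        # the accessions in which the family was detected.
        labels[family] = "core" if accessions <= present_in else "dispensable"
    return labels


# Hypothetical toy data: three accessions and three gene families.
accessions = {"acc1", "acc2", "acc3"}
pav = {
    "famA": {"acc1", "acc2", "acc3"},  # present everywhere
    "famB": {"acc1", "acc3"},          # missing in acc2
    "famC": {"acc2"},                  # accession-specific
}

labels = classify_gene_families(pav, accessions)
dispensable_fraction = sum(v == "dispensable" for v in labels.values()) / len(labels)
```

In this toy table, `famA` is core and the other two families are dispensable. The same counting logic underlies the percentage of dispensable ortholog groups that pangenome studies report, although published figures depend heavily on how families are clustered and how absence is called.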
Previous studies exploring sequence variation in the model legume, Medicago truncatula, relied on mapping short reads to a single reference. However, read-mapping approaches are inadequate to examine large, diverse gene families or to probe variation in repeat-rich or highly divergent genome regions. De novo sequencing and assembly of M. truncatula genomes enables near-comprehensive discovery of structural variants (SVs), analysis of rapidly evolving gene families, and ultimately, construction of a pan-genome. Genome-wide synteny based on 15 de novo M. truncatula assemblies effectively detected different types of SVs indicating that as much as 22% of the genome is involved in large structural changes, altogether affecting 28% of gene models. A total of 63 million base pairs (Mbp) of novel sequence was discovered, expanding the reference genome space for Medicago by 16%. Pan-genome analysis revealed that 42% (180 Mbp) of genomic sequences is missing in one or more accession, while examination of de novo annotated genes identified 67% (50,700) of all ortholog groups as dispensable – estimates comparable to recent studies in rice, maize and soybean. Rapidly evolving gene families typically associated with biotic interactions and stress response were found to be enriched in the accession-specific gene pool. The nucleotide-binding site leucine-rich repeat (NBS-LRR) family, in particular, harbors the highest level of nucleotide diversity, large effect single nucleotide change, protein diversity, and presence/absence variation. However, the leucine-rich repeat (LRR) and heat shock gene families are disproportionately affected by large effect single nucleotide changes and even higher levels of copy number variation. Analysis of multiple M. truncatula genomes illustrates the value of de novo assemblies to discover and describe structural variation, something that is often under-estimated when using read-mapping approaches. 
Comparisons among the de novo assemblies also indicate that different large gene families differ in the architecture of their structural variation.
Alfalfa (Medicago sativa L.) is one of the most important forage legumes in the world, including autotetraploid (M. sativa ssp. sativa) and diploid alfalfa (M. sativa ssp. caerulea, progenitor of autotetraploid alfalfa). Here, we reported a high‐quality genome of ZW0012 (diploid alfalfa, 769 Mb, contig N50 = 5.5 Mb), which was grouped into the Northern group in population structure analysis, suggesting that our genome assembly filled a major gap among the members of M. sativa complex. During polyploidization, large phenotypic differences occurred between diploids and tetraploids, and the genetic information underlying its massive phenotypic variations remains largely unexplored. Extensive structural variations (SVs) were identified between ZW0012 and XinJiangDaYe (an autotetraploid alfalfa with released genome). We identified 71 ZW0012‐specific PAV genes and 1296 XinJiangDaYe‐specific PAV genes, mainly involved in defence response, cell growth, and photosynthesis. We have verified the positive roles of MsNCR1 (a XinJiangDaYe‐specific PAV gene) in nodulation using an Agrobacterium rhizobia‐mediated transgenic method. We also demonstrated that MsSKIP23_1 and MsFBL23_1 (two XinJiangDaYe‐specific PAV genes) regulated leaf size by transient overexpression and virus‐induced gene silencing analysis. Our study provides a high‐quality reference genome of an important diploid alfalfa germplasm and a valuable resource of variation landscape between diploid and autotetraploid, which will facilitate the functional gene discovery and molecular‐based breeding for the cultivars in the future.
Structural variants (SVs) constitute a large proportion of the genomic variation that results in phenotypic variation in plants. However, they are still a largely unexplored feature in most plant genomes. Here, we present the whole-genome landscape of SVs between two model legume Medicago truncatula ecotypes–Jemalong A17 and R108– that have been extensively used in various legume biology studies. To catalogue SVs, we first resolved the previously published R108 genome assembly (R108 v1.0) to chromosome-scale using 124 × Hi-C data, resulting in a high-quality genome assembly. The inter-chromosomal reciprocal translocations between chromosomes 4 and 8 were confirmed by performing syntenic analysis between the two genomes. Combined with the Hi-C data, it appears that these translocation events had a significant effect on chromatin organization. Using both whole-genome and short-read alignments, we identified the genomic landscape of SVs between the two genomes, some of which may account for several phenotypic differences, including their differential responses to aluminum toxicity and iron deficiency, and the development of different anthocyanin leaf markings. We also found extensive SVs within the nodule-specific cysteine-rich gene family which encodes antimicrobial peptides essential for terminal bacteroid differentiation during nitrogen-fixing symbiosis. Our results provide a near-complete R108 genome assembly and the first genomic landscape of SVs obtained by comparing two M. truncatula ecotypes. This may provide valuable genomic resources for the functional and molecular research of legume biology in the future.
… We also examined the expanded and contracted gene families in 15 genomes and performed enrichment analysis (Figure 1e–g). In Bolivia, expanded gene families were primarily …
Alfalfa is an important legume forage grown worldwide and its productivity is affected by environmental stresses such as drought and high salinity. In this work, three alfalfa germplasms with contrasting tolerances to drought and high salinity were used for unraveling the transcriptomic responses to drought and salt stresses. Twenty-one different RNA samples from different germplasm, stress conditions or tissue sources (leaf, stem and root) were extracted and sequenced using the PacBio (Iso-Seq) and the Illumina platforms to obtain full-length transcriptomic profiles. A total of 1,124,275 and 91,378 unique isoforms and genes were obtained, respectively. Comparative analysis of transcriptomes identified differentially expressed genes and isoforms as well as transcriptional and post-transcriptional modifications such as alternative splicing events, fusion genes and nonsense-mediated mRNA decay events and non-coding RNA such as circRNA and lncRNA. This is the first report of the diversity of circRNA and lncRNA in response to drought and high salinity in alfalfa. Weighted gene co-expression network analysis identified master genes and isoforms that may play important roles in drought and salt stress tolerance in alfalfa. This work provides insight for understanding the mechanisms by which drought and salt stresses affect alfalfa growth at the whole genome level.
Alfalfa (Medicago sativa L.) is a high-nutritive-value forage crop that provides livestock with abundant protein and essential nutrients. Breeding elite cultivars with superior quality has become a major goal in modern alfalfa improvement. This study systematically evaluated 12 quality-related traits under field conditions using a diverse panel of 176 alfalfa accessions and investigated the genetic basis underlying these traits. Phenotypic analysis revealed variability across all traits, with coefficients of variation ranging from 2.56% to 15.72%. Based on multi-trait clustering analysis, 16 accessions with overall superior quality were identified. Genome-wide association studies (GWAS) detected 45 significant single nucleotide polymorphisms (SNPs) and 12 structural variants (SVs). Within the associated genomic regions, eight candidate genes were prioritized. RT-qPCR validation indicated that three of these genes (Msa.H.0301430, Msa.H.0290550, and Msa.H.0313490) negatively regulate quality traits, while one gene (Msa.H.0479570) acts as a positive regulator. Haplotype analysis further revealed a positive correlation between the number of favorable haplotypes and phenotypic performance. Genomic prediction (GP) achieved accuracies ranging from 0.71 to 0.86 for the traits when incorporating the top 5000 SNPs identified from GWAS. This study provides valuable insights into the genetic architecture of quality-related traits in alfalfa and lays a solid foundation for future molecular design breeding.
The pan-genome concept encompasses all individual genome sequences within a species, comprising both core and variable genes. In recent years, it has emerged as a promising avenue for advancing plant genetic evolution and cultivating desirable traits. This review provides a comprehensive overview of pan-genome representation, detailing both linear pan-genome and graph-based pan-genome. It further presents a thorough analysis of the assembly methods for these pan-genomic structures and discusses the widely-used software tools and processes for their construction, including VG, Minigraph and PGGB. The paper concurrently delineates the fundamental factors that influence pan-genome quality, which include genome assembly quality, accurate annotation, identification of homologous genes, and strategic sample selection. Furthermore, we investigate the pan-genome’s applications in plant genomics and provide a summary of recent research findings related to the plant pan-genomes, particularly in elucidating genetic transformations during domestication, uncovering genetic variations linked to agronomic traits, and offering insights to guide breeding endeavors.
Narrow‐leafed lupin (NLL; Lupinus angustifolius) is a key rotational crop for sustainable farming systems, whose grain is high in protein content. It is a gluten‐free, non‐genetically modified, alternative protein source to soybean (Glycine max) and as such has gained interest as a human food ingredient. Here, we present a chromosome‐length reference genome for the species and a pan‐genome assembly comprising 55 NLL lines, including Australian and European cultivars, breeding lines and wild accessions. We present the core and variable genes for the species and report on the absence of essential mycorrhizal associated genes. The genome and pan‐genomes of NLL and its close relative white lupin (Lupinus albus) are compared. Furthermore, we provide additional evidence supporting LaRAP2‐7 as the key alkaloid regulatory gene for NLL and demonstrate the NLL genome is underrepresented in classical NLR disease resistance genes compared to other sequenced legume species. The NLL genomic resources generated here coupled with previously generated RNA sequencing datasets provide new opportunities to fast‐track lupin crop improvement.
In the last decade, legume genomics research has seen a paradigm shift due to advances in genome sequencing technologies, assembly algorithms, and computational genomics that enabled the construction of high-quality reference genome assemblies of major legume crops. These advances have certainly facilitated the identification of novel genetic variants underlying the traits of agronomic importance in many legume crops. Furthermore, these robust sequencing technologies have allowed us to study structural variations across the whole genome in multiple individuals and at the species level using ‘pangenome analysis.’ This review updates the progress of constructing pangenome assemblies for various legume crops and discusses the prospects for these pangenomes and how to harness the information to improve various traits of economic importance through molecular breeding to increase genetic gain in legumes and tackle the increasing global food crisis.
… No statistical method was used to predetermine sample size for pangenome analysis in this study. No data were excluded from the analyses. The experiments were not randomized, …
Autotetraploid alfalfa is a major hay crop planted all over the world due to its adaptation in different environments and high quality for animal feed. However, the genetic basis of alfalfa quality is not fully understood. In this study, a diverse panel of 200 alfalfa accessions were planted in field trials using augmented experimental design at three locations in 2018 and 2019. Thirty-four quality traits were evaluated by Near Infrared Reflectance Spectroscopy (NIRS). The plants were genotyped using a genotyping by sequencing (GBS) approach and over 46,000 single nucleotide polymorphisms (SNPs) were obtained after variant calling and filtering. Genome-wide association studies (GWAS) identified 28 SNP markers associated with 16 quality traits. Among them, most of the markers were associated with fiber digestibility and protein content. Phenotypic variations were analyzed from three locations and different sets of markers were identified by GWAS when using phenotypic data from different locations, indicating that alfalfa quality traits were also affected by environmental factors. Among different sets of markers identified by location, two markers were associated with nine traits of fiber digestibility. One marker associated with lignin content was identified consistently in multiple environments. Putative candidate genes underlying fiber-related loci were identified and they are involved in the lignin and cell wall biosynthesis. The DNA markers and associated genes identified in this study will be useful for the genetic improvement of forage quality in alfalfa after the validation of the markers.