Knowledge Bases for the Digital Construction of Language Resources

Linguistic Knowledge Modeling and Applications Based on Knowledge Graphs and Ontologies

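These works share a focus on using ontologies, knowledge graphs, and semantic technologies to build structured linguistic or domain knowledge bases that support inference, visualization, and intelligent knowledge management.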

The Linguistic Design of the EuroWordNet Database (Antonietta Alonge, Nicoletta Calzolari, Piek Vossen, Laura Bloksma, Irene Castellón Masalles, M. Antònia Martí, Wim Peters, 1998, Computers and the Humanities)
An Ontology based Smart Management of Linguistic Knowledge (Mariem Neji, Fatma Ghorbel, Bilel Gargouri, Nada Mimouni, Elisabeth Métais, 2022, Journal of Data Mining & Digital Humanities)
Ontologies and ontological methods in linguistics (Andrea C. Schalley, 2019, Language and Linguistics Compass)
A Comprehensive Survey on Automatic Knowledge Graph Construction (Lingfeng Zhong, Jia Wu, Qian Li, Hao Peng, Xindong Wu, 2023, ACM Computing Surveys)
Automatic Construction of Subject Knowledge Graph based on Educational Big Data (Ying Su, Yong Zhang, 2020, Proceedings of the 2020 3rd International Conference on Big Data and Education)
An Approach of Ontology Based Knowledge Base Construction for Chinese K12 Education (Jiawei Hu, Zheng Li, Bin Xu, 2016, 2016 First International Conference on Multimedia and Image Processing (ICMIP))
DBpedia – A large-scale, multilingual knowledge base extracted from Wikipedia (Jens Lehmann, Robert Isele, Max Jakob, Anja Jentzsch, Dimitris Kontokostas, Pablo N. Mendes, Sebastian Hellmann, Mohamed Morsey, Patrick van Kleef, Sören Auer, Christian Bizer, 2015, Semantic Web)
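The triple-based modeling these works share can be illustrated with a minimal sketch. It is not taken from any of the works above: lexical knowledge is stored as hypothetical subject-predicate-object triples (the `hypernym` and `translation_nl` predicates and the sample words are illustrative assumptions, loosely in the spirit of WordNet-style hypernymy), and a transitive-closure query derives relations that were never stated explicitly, which is the simplest form of the inference an ontology enables.

```python
from collections import defaultdict

# Hypothetical triples (subject, predicate, object); names are illustrative only.
TRIPLES = [
    ("dog", "hypernym", "canine"),
    ("canine", "hypernym", "carnivore"),
    ("carnivore", "hypernym", "animal"),
    ("cat", "hypernym", "feline"),
    ("feline", "hypernym", "carnivore"),
    ("dog", "translation_nl", "hond"),
]


def build_index(triples):
    """Index triples by (subject, predicate) for fast lookup."""
    index = defaultdict(set)
    for s, p, o in triples:
        index[(s, p)].add(o)
    return index


def infer_transitive(index, start, predicate):
    """Return every node reachable from `start` by repeatedly following
    `predicate`, i.e. treat the predicate as transitive."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        # .get() avoids inserting empty keys into the defaultdict.
        for nxt in index.get((node, predicate), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


if __name__ == "__main__":
    idx = build_index(TRIPLES)
    print(sorted(infer_transitive(idx, "dog", "hypernym")))
    # prints ['animal', 'canine', 'carnivore']
```

Only "dog → canine" is stored directly; "dog → animal" is derived. Production systems would use an RDF store and a reasoner instead of a hand-written closure.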

Construction and Digital Storage of Language Data Repositories

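These works focus on collecting, annotating, organizing, and storing language data in specific areas (such as second language acquisition, low-resource languages, and listening and spoken language development), with an emphasis on building corpus or knowledge base platforms for researchers.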

Wordbank: an open repository for developmental vocabulary data (Michael C. Frank, Mika Braginsky, Daniel Yurovsky, Virginia A. Marchman, 2016, Journal of Child Language)
Methodology for the creation of a linguistic database: challenges and contributions to the teaching-learning process (Raimundo Gouveia da Silva, Iandra Maria Weirich da Silva Coelho, 2020, Revista de Estudos e Pesquisas sobre Ensino Tecnológico (EDUCITEC))
Developing ODIN: A Multilingual Repository of Annotated Language Data for Hundreds of the World's Languages (W. David Lewis, Fan Xia, 2010, Literary and Linguistic Computing)
The Listening and Spoken Language Data Repository: Design and Project Overview (Tamala S. Bradham, Christopher Fonnesbeck, Alice E. Toll, Barbara F. Hecht, 2017, Language, Speech, and Hearing Services in Schools)
Discourse annotation guideline for low-resource languages (Francielle Vargas, Wolfgang S. Schmeisser-Nieto, Zohar Rabinovich, Thiago Alexandre Salgueiro Pardo, Fabrício Benevenuto, 2025, Natural Language Processing)
A Metadata Best Practice for a Scientific Data Repository (Jane Greenberg, Hollie White, Sarah Carrier, Ryan Scherle, 2009, Journal of Library Metadata)
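The collect, annotate, and store workflow these repositories describe can be sketched as a small data model. This is a hypothetical illustration, not the schema of any repository listed above: a record holds descriptive metadata whose keys borrow Dublin Core element names (`identifier`, `title`, `creator`, `language`) plus interlinear glossed examples, the kind of annotated data ODIN collects. The class names, the one-gloss-per-word check, and the JSON serialization are assumptions chosen for brevity.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class GlossedExample:
    """One interlinear glossed example: source words aligned 1:1 with glosses."""
    words: list[str]
    glosses: list[str]
    translation: str

    def validate(self) -> None:
        # Word-by-word interlinear annotation needs exactly one gloss per word.
        if len(self.words) != len(self.glosses):
            raise ValueError(
                f"{len(self.words)} words but {len(self.glosses)} glosses"
            )


@dataclass
class RepositoryRecord:
    """Hypothetical repository entry; metadata keys borrow Dublin Core names."""
    identifier: str
    title: str
    creator: str
    language: str  # e.g. an ISO 639-3 code such as "deu"
    examples: list[GlossedExample] = field(default_factory=list)

    def add_example(self, example: GlossedExample) -> None:
        # Reject malformed annotation before it reaches storage.
        example.validate()
        self.examples.append(example)

    def to_json(self) -> str:
        # asdict() converts the nested GlossedExample objects as well.
        return json.dumps(asdict(self), ensure_ascii=False, indent=2)


if __name__ == "__main__":
    record = RepositoryRecord("rec-001", "German sample", "hypothetical annotator", "deu")
    record.add_example(GlossedExample(
        ["der", "Hund", "schläft"],
        ["DEF.M.NOM.SG", "dog", "sleep.3SG"],
        "the dog sleeps",
    ))
    print(record.to_json())
```

Validating at insertion time keeps misaligned annotations out of the store, which matters more than the storage format itself when many annotators contribute.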

Language Processing Toolchains and Automated Methods for Knowledge Base Construction

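These works discuss the techniques and algorithmic models for automatically extracting knowledge from raw text or complexly formatted data, converting existing corpora into other formats, and integrating multimodal data (such as sensor data) to update knowledge bases.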

Constructing a Second Language: Analyses and Computational Simulations of the Emergence of Linguistic Constructions From Usage (Nick C. Ellis, Diane Larsen-Freeman, 2009, Language Learning)
UML AS DOMAIN SPECIFIC LANGUAGE FOR THE CONSTRUCTION OF KNOWLEDGE-BASED CONFIGURATION SYSTEMS (Alexander Felfernig, Gerhard Friedrich, Dietmar Jannach, 2000, International Journal of Software Engineering and Knowledge Engineering)
Automatic construction and validation of French large lexical resources. Reuse of verb theoretical linguistic descriptions (Nabil Hathout, Fiammetta Namer, 1998, Proceedings of the Language Resources and Evaluation Conference)
Research on Knowledge Base Construction and Incremental Updating Techniques for Low-resource domains Based on Large Language Models (Zixuan Zhang, Ziyao Han, Jixuan Zhang, Zhongbao Jia, Wentao Yu, Xiaohui Chen, 2025, 2025 5th International Conference on Computer, Internet of Things and Control Engineering (CITCE))
Knowledge management and Cultural Heritage repositories: Cross-Lingual Information Retrieval strategies (Maria Pia di Buono, Mario Monteleone, Federica Marano, Johanna Monti, 2013, 2013 Digital Heritage International Congress (DigitalHeritage))
Fonduer (Sen Wu, Luke Hsiao, Xiao Cheng, Braden Hancock, Theodoros Rekatsinas, Philip Levis, Christopher Ré, 2018, Proceedings of the 2018 International Conference on Management of Data)
Construction of Learning Resources for International Chinese Language Education Based on Sensor Technology and Knowledge Graphs (Yue Fu, Lei Zhao, Borui Zheng, Yirong Wang, Liqing Yang, 2026, Sensors and Materials)
Building a Morphological Treebank for German from a Linguistic Database (Petra Steiner, Josef Ruppenhofer, 2018, Proceedings of the Language Resources and Evaluation Conference)
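As a much simpler stand-in for the learned and LLM-based extractors these works describe, the sketch below turns raw sentences into triples with a single Hearst-style lexico-syntactic pattern ("X such as Y, Z and W") and merges them incrementally into an existing knowledge base, reporting only what is new. The pattern, the `hypernym` predicate, the single-word approximation of noun phrases, and the set-based store are all illustrative assumptions; real systems handle multi-word phrases, lemmatization, and conflicting facts.

```python
import re

# Separator inside an enumeration: ", and" / " or " / a plain comma.
SEP = r"(?:,?\s+(?:and|or)\s+|,\s*)"

# One Hearst-style pattern: "<hypernym> such as <item>(<SEP><item>)*".
# Noun phrases are approximated as single words, a strong simplification.
PATTERN = re.compile(r"(\w+)\s+such\s+as\s+((?:\w+" + SEP + r")*\w+)")


def extract_hypernyms(sentence):
    """Yield (hyponym, "hypernym", hypernym) triples from one sentence.

    Words are lowercased but not lemmatized, so plural heads stay plural.
    """
    for match in PATTERN.finditer(sentence.lower()):
        hypernym = match.group(1)
        for item in re.split(SEP, match.group(2)):
            if item:
                yield (item, "hypernym", hypernym)


def incremental_update(kb, sentences):
    """Add extracted triples to the set `kb` in place; return only the new ones."""
    added = set()
    for sentence in sentences:
        for triple in extract_hypernyms(sentence):
            if triple not in kb:
                kb.add(triple)
                added.add(triple)
    return added


if __name__ == "__main__":
    kb = set()
    print(incremental_update(kb, ["Languages such as Basque, Welsh and Irish are studied."]))
    print(incremental_update(kb, ["Languages such as Welsh, and Breton are studied."]))
```

Returning only the newly added triples is what makes the update incremental: downstream indexes or reviewers only need to process the delta rather than rebuild from scratch.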

Theoretical Foundations and Overviews of Linguistic Database Design

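These works provide foundational reviews of how computers are used in linguistic research, together with database modeling principles and design guidelines, and serve as surveys and practical guides.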

Designing linguistic databases: A primer for linguists (Alexis Dimitriadis, Simon Musgrave, 2009, The Use of Databases in Cross-Linguistic Studies)
MAIN TYPES OF DATABASES IN LINGUISTIC RESEARCH OF THE XXI CENTURY: FEATURES AND FUNCTIONAL PURPOSE (V. V. Hromovenko, 2021, International Humanitarian University Herald. Philology)
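One widely taught relational design principle, normalization, can be sketched with Python's built-in `sqlite3` module: facts about a language are stored once in their own table and referenced by key from each lexeme, instead of being repeated on every row. The table and column names and the sample data are hypothetical, and this is not presented as the schema recommended by either work above.

```python
import sqlite3

# Hypothetical normalized schema: language facts live in one table only.
SCHEMA = """
CREATE TABLE language (
    iso_code TEXT PRIMARY KEY,  -- e.g. an ISO 639-3 code
    name     TEXT NOT NULL
);
CREATE TABLE lexeme (
    id       INTEGER PRIMARY KEY,
    form     TEXT NOT NULL,
    gloss    TEXT NOT NULL,
    iso_code TEXT NOT NULL REFERENCES language(iso_code)
);
"""


def open_db():
    """Create an in-memory database with foreign-key checks enabled."""
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign-key enforcement off unless this pragma is set.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def lexemes_with_gloss(conn, gloss):
    """Return (language name, form) pairs for every lexeme with this gloss."""
    rows = conn.execute(
        """SELECT language.name, lexeme.form
           FROM lexeme JOIN language USING (iso_code)
           WHERE lexeme.gloss = ?
           ORDER BY language.name""",
        (gloss,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = open_db()
    conn.executemany("INSERT INTO language VALUES (?, ?)",
                     [("deu", "German"), ("nld", "Dutch")])
    conn.executemany("INSERT INTO lexeme (form, gloss, iso_code) VALUES (?, ?, ?)",
                     [("Hund", "dog", "deu"), ("hond", "dog", "nld")])
    print(lexemes_with_gloss(conn, "dog"))
    # prints [('Dutch', 'hond'), ('German', 'Hund')]
```

Because each lexeme points at a language key, renaming a language touches one row, and the foreign-key constraint rejects lexemes tagged with an unknown language code.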

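This report groups research on the digital construction of language resources into four core dimensions: semantic modeling with ontologies and knowledge graphs, construction practices for specialized language resource repositories, automated knowledge extraction and multimodal data integration, and the theoretical foundations and methodology of database design. Together, this literature reflects the field's shift from simple corpus storage toward intelligent, structured, and semantic knowledge bases.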

23 papers in 4 research directions

Linguistic knowledge modeling and applications based on knowledge graphs and ontologies: 7 papers (Antonietta Alonge et al., 1998, and others)

Construction and digital storage of language data repositories: 6 papers (Michael C. Frank et al., 2016, and others)

Language processing toolchains and automated methods for knowledge base construction: 8 papers (Nick C. Ellis and Diane Larsen-Freeman, 2009, and others)

Theoretical foundations and overviews of linguistic database design: 2 papers (Alexis Dimitriadis and Simon Musgrave, 2009, and one other)

Fonduer: a machine-learning-based knowledge base construction system for richly formatted data

dl.acm.org - Sen Wu, Luke Hsiao, Xiao Cheng, et al., 2018, Proceedings of the 2018 International Conference on Management of Data

We focus on knowledge base construction (KBC) from richly formatted data. In contrast to KBC from text or tabular data, KBC from richly formatted data aims to extract relations conveyed jointly via textual, structural, tabular, and visual expressions. We introduce Fonduer, a machine-learning-based KBC system for richly formatted data. Fonduer presents a new data model that accounts for three challenging characteristics of richly formatted data: (1) prevalent document-level relations, (2) multimodality, and (3) data variety. Fonduer uses a new deep-learning model to automatically capture the representation (i.e., features) needed to learn how to extract relations from richly formatted data. Finally, Fonduer provides a new programming model that enables users to convert domain expertise, based on multiple modalities of information, to meaningful signals of supervision for training a KBC system. Fonduer-based KBC systems are in production for a range of use cases, including at a major online retailer. We compare Fonduer against state-of-the-art KBC approaches in four different domains. We show that Fonduer achieves an average improvement of 41 F1 points on the quality of the output knowledge base (and in some cases produces up to 1.87× the number of correct entries) compared to expert-curated public knowledge bases. We also conduct a user study to assess the usability of Fonduer's new programming model. We show that after using Fonduer for only 30 minutes, non-domain experts are able to design KBC systems that achieve on average 23 F1 points higher quality than traditional machine-learning-based KBC approaches.
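The programming model described in this abstract turns multimodal domain expertise into supervision signals. The sketch below illustrates that idea in the style of data-programming labeling functions; it is not Fonduer's API. The `Candidate` fields, the four labeling functions, the part number, and the plain majority-vote combiner are hypothetical simplifications (Fonduer builds on data programming, which weights such signals with a learned model rather than a simple vote).

```python
from collections import Counter
from dataclasses import dataclass

POSITIVE, NEGATIVE, ABSTAIN = 1, 0, -1


@dataclass
class Candidate:
    """Hypothetical (part number, value) candidate with multimodal context."""
    part: str
    value: float
    row_header: str   # textual/tabular: header of the value's table row
    same_table: bool  # structural: part and value appear in the same table
    aligned: bool     # visual: part and value are horizontally aligned


def lf_header_mentions_current(c):
    # Textual/tabular signal taken from the row header.
    return POSITIVE if "current" in c.row_header.lower() else ABSTAIN


def lf_different_table(c):
    # Structural signal: mentions in different tables are unlikely to be related.
    return NEGATIVE if not c.same_table else ABSTAIN


def lf_visually_aligned(c):
    # Visual signal: aligned mentions tend to be related.
    return POSITIVE if c.aligned else ABSTAIN


def lf_implausible_value(c):
    # Domain knowledge (hypothetical rule): negative values are rejected.
    return NEGATIVE if c.value < 0 else ABSTAIN


LABELING_FUNCTIONS = [lf_header_mentions_current, lf_different_table,
                      lf_visually_aligned, lf_implausible_value]


def weak_label(c):
    """Majority vote over non-abstaining functions; ties or no votes -> ABSTAIN."""
    counts = Counter(v for v in (lf(c) for lf in LABELING_FUNCTIONS) if v != ABSTAIN)
    if counts[POSITIVE] == counts[NEGATIVE]:
        return ABSTAIN
    return POSITIVE if counts[POSITIVE] > counts[NEGATIVE] else NEGATIVE


if __name__ == "__main__":
    cand = Candidate("PART-001", 200.0, "Collector Current", True, True)
    print(weak_label(cand))  # prints 1 (POSITIVE)
```

The resulting noisy labels would then train a downstream relation classifier, so a domain expert writes a few short rules instead of hand-labeling every candidate.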