Efficiency and Bias of Large Language Models (LLMs) in Intelligent Auditing and Financial Statement Analysis
Intelligent Audit Automation and Organizational Digital Transformation
This group of studies examines the macro-level application and process optimization of AI and LLMs in the audit function, including internal audit automation, detection of cross-check (articulation) relationships in financial statements, audit efficiency gains, and the organizational adaptability and dynamic capabilities of audit firms undergoing digital transformation.
- Research on Financial Statement Checking Relationship Recognition System Based on Large Language Models(Haichao Zhang, Jie Zhang, Jiancheng Zhou, 2025, Proceedings of the 2nd Guangdong-Hong Kong-Macao Greater Bay Area International Conference on Digital Economy and Artificial Intelligence)
- AI-Enhanced Intelligent Internal Audit Automation Systems for Continuous Compliance Monitoring and Multi-Layer Risk Mitigation(Laith Ali Muttar, Majid M. Manhosh, Hasan Abd Al-Hussein, Areej Abdalghfou, Abdulsalam Ali Hussein Alnoori, Abdulrazaq Shabeeb, S. R. Jasim, Hussain D.D, Wisam A Mohammedhasan, 2025, 2025 3rd International Conference on Cyber Resilience (ICCR))
- Automating Financial Statement Audits with Large Language Models(Rushi Wang, Jiateng Liu, Weijie Zhao, Shenglan Li, Denghui Zhang, 2025, ArXiv)
- Enhancing Audit Efficiency Using Deep Learning for Automated Financial Statement Analysis(Liam Edwards, R. Hughes, 2025, International Journal of Global Economics and Management)
- The Impact of AI-Integrated Drone Technology and Big Data on External Auditing Performance, Sustainability, and Financial Reporting Quality on the Emerging Market(Abdulkarim Hamdan J. Alhazmi, Sardar Islam, M. Prokofieva, 2025, Accounting and Auditing)
- The Impact of the Use of Artificial Intelligence on the Development of External Audit Efficiency in Jordanian Mining and Extractive Corporations(Ali Mustafa Magablih, 2025, Accounting and Finance Research)
- AI and Auditing: Enhancing Audit Efficiency and Effectiveness with Artificial Intelligence(Lidiana Lidiana, 2024, Accounting Studies and Tax Journal (COUNT))
- Audit efficiency: Modern challenges and promising approaches(Yashar Mammadov, 2025, JOURNAL OF ECONOMIC GROWTH AND SOCIAL WELFARE)
- Improving Zero-Shot Text Matching for Financial Auditing with Large Language Models(L. Hillebrand, Armin Berger, Tobias Deußer, Tim Dilmaghani, Mohamed Khaled, Bernd Kliem, Rüdiger Loitz, Maren Pielka, David Leonhard, C. Bauckhage, R. Sifa, 2023, Proceedings of the ACM Symposium on Document Engineering 2023)
- Enhancing Internal Audit Efficiency for Effective Risk Management and Corporate Governance Frameworks(Onyenum Ruth Udoh, 2024, International Journal of Research Publication and Reviews)
- The Role of Artificial Intelligence in Enhancing Global Internal Audit Efficiency: An Analysis(Iyad Ghafar, Widya Perwitasari, Rama Kurnia, 2024, Asian Journal of Logistics Management)
- Research on the Impact of Digital Transformation on the Audit Quality and Efficiency of Accounting Firms(Fengyu Wang, Yanqi Tang, Xiangwei Meng, Ruiyun Wang, 2025, Journal of Modern Business and Economics)
- Generative AI-enabled intelligent auditing: an organizational adaptation mechanism study based on dynamic capability theory(Deng Wei, Obed Rashdi Syed, Xiaoli Xu, H. Sang, Jiang Wang, 2025, Future Technology)
- TURKISH COURT OF ACCOUNTS: ANALYZING FINANCIAL AUDIT, DIGITALIZATION, AI IMPACT(Muhammet Damar, Ömer Aydın, Eren Özoğuz, Üzeyir Aydın, Ahmet Özen, 2024, EDPACS)
- The Future of AI-Powered Auditing: Enhancing Accuracy and Reducing Errors(Siddharth S Karale, Sudip Debkumar Chatterji, Jaya Krishna Modadugu, A. Ghadage, H. Alsailawi, Mustafa Mudhafar, 2025, 2025 IEEE 5th International Conference on ICT in Business Industry & Government (ICTBIG))
Financial Fraud Detection and Intelligent Risk Identification and Monitoring
These studies focus on using LLMs, graph neural networks (GNNs), and deep learning to identify fraud, anomalous indicators, and latent risks in financial statements, improving detection accuracy by fusing the semantics of the Management Discussion and Analysis (MD&A) with numerical financial data.
- Leveraging Large Language Models in Financial Statement Fraud Detection of Listed Companies(Changhao Song, Min Liu, Chuanghao Dong, Lu Zhang, Changjian Fang, 2025, 2025 Thirteenth International Conference on Advanced Cloud and Big Data (CBD))
- Enterprise financial fraud identification based on Transformer and SMOTE algorithm(Hao Hu, 2026, No journal)
- Financial Statement Fraud Detection via Large Language Models(Zehra Erva Ergun, Emre Sefer, 2025, Intell. Syst. Account. Finance Manag.)
- APPLYING MACHINE LEARNING TO AUDIT DATA: ENHANCING FRAUD DETECTION, RISK ASSESSMENT AND AUDIT EFFICIENCY(Nihan Özbaltan, 2024, EDPACS)
- Detecting Financial Fraud Through AI-Powered Analysis of GPT-Generated Text(Amitabha Maheshwari, Praveen Aronkar, 2025, FMDB Transactions on Sustainable Technoprise Letters)
- Applying Natural Language Processing to Financial Risk Disclosures and Audit Trails(Prashant Singh, 2023, Journal of Advances in Developmental Research)
- Multimodal detection framework for financial fraud integrating LLMs and interpretable machine learning(Hui Nie, Zhaoye Long, Ze-jun Fang, Lu Gao, 2025, Journal of Data and Information Science)
- Intelligent BiLSTM-Attention-IBPNN Method for Anomaly Detection in Financial Auditing(Shui-Bo Wang, 2024, IEEE Access)
- Machine Learning based Enterprise Financial Audit Framework and High Risk Identification(Ting Yuan, Xi Zhang, Xuanjing Chen, 2025, ArXiv)
- An Intelligent Financial Fraud Detection Support System Based on Three-Level Relationship Penetration(Xiang Li, Lei Chu, Yujun Li, Zhanjun Xing, Fengqian Ding, Jintao Li, Ben Ma, 2024, Mathematics)
- FraudGT: A Simple, Effective, and Efficient Graph Transformer for Financial Fraud Detection(Junhong Lin, Xiaojie Guo, Yada Zhu, S. Mitchell, Erik Altman, Julian Shun, 2024, Proceedings of the 5th ACM International Conference on AI in Finance)
- Enhancing Financial Risk Analysis using RAG-based Large Language Models(A.A. Darji, Fenil Kheni, Dhruvil Chodvadia, Parth Goel, Dweepna Garg, Bankim Patel, 2024, 2024 3rd International Conference on Automation, Computing and Renewable Systems (ICACRS))
- Identifying Financial Risk Information Using RAG with a Contrastive Insight(A. Elahi, 2025, ArXiv)
- Research on Financial Risk Intelligent Monitoring and Early Warning Model Based on LSTM, Transformer, and Deep Learning(Yunan Song, Huaqing Du, Tianyu Piao, Hongyu Shi, 2024, J. Organ. End User Comput.)
- Financial text analysis and credit risk assessment using a GPT-4 and improved BERT fusion model(H. Tan, Y. Xie, 2025, PLOS One)
- Leveraging Internet-Sourced Text Data for Financial Analytics in Supply Chain Finance: A Large Language Model-Enhanced Text Mining Workflow(Jiaxing Wang, Guoquan Liu, Yang Cheng, Xiaobo Xu, Zhongyun Li, 2025, IEEE Transactions on Engineering Management)
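The text-and-numbers fusion idea running through this section can be sketched as follows. This is a minimal stdlib-only illustration, not a method from any cited paper: the risk-word list, feature weights, and threshold are all hypothetical, whereas the surveyed systems learn such weights with LLMs, GNNs, or deep classifiers.

```python
# Minimal sketch: fuse an MD&A text signal with numeric financial ratios
# into a single fraud-risk score. All feature names, weights, and the
# threshold are hypothetical illustrations.

HEDGE_WORDS = {"uncertain", "restated", "impairment", "contingent", "litigation"}

def mdna_hedge_ratio(mdna_text: str) -> float:
    """Fraction of MD&A tokens that are risk/hedge words (toy text feature)."""
    tokens = mdna_text.lower().split()
    if not tokens:
        return 0.0
    return sum(t.strip(".,;") in HEDGE_WORDS for t in tokens) / len(tokens)

def fraud_risk_score(mdna_text: str, accruals_ratio: float,
                     receivables_growth: float) -> float:
    """Weighted fusion of one text feature and two numeric features."""
    text_feat = mdna_hedge_ratio(mdna_text)
    # Hypothetical weights; in the surveyed work these would be learned.
    return 0.5 * text_feat + 0.3 * accruals_ratio + 0.2 * receivables_growth

def flag_company(mdna_text: str, accruals_ratio: float,
                 receivables_growth: float, threshold: float = 0.25) -> bool:
    """Flag a filing for review when the fused score crosses the threshold."""
    return fraud_risk_score(mdna_text, accruals_ratio, receivables_growth) >= threshold
```

In the cited work, the toy hedge-word count would be replaced by LLM-derived MD&A embeddings, and the linear fusion by a trained classifier or graph model.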
In-Depth Financial Report Analysis, Performance Forecasting, and Decision Support
This group of studies examines LLM performance in financial report summarization, sentiment analysis, KPI prediction, and equity research, validating the models' ability to emulate human analysts in interpreting financial ratios, forecasting earnings, and analyzing ESG reports.
- Characterizing Multimodal Long-form Summarization: A Case Study on Financial Reports(Tianyu Cao, Natraj Raman, Danial Dervovic, Chenhao Tan, 2024, ArXiv)
- SusGen-GPT: A Data-Centric LLM for Financial NLP and Sustainability Report Generation(Qilong Wu, Xiaoneng Xiang, Hejia Huang, Xuan Wang, Yeo Wei Jie, Ranjan Satapathy, Ricardo Shirota Filho, B. Veeravalli, 2024, ArXiv)
- Can AI Read Between The Lines? Benchmarking LLMs On Financial Nuance(Dominick Kubica, Dylan T. Gordon, Nanami Emura, Derleen Saini, Charlie Goldenberg, 2025, ArXiv)
- Augmenting Financial Planning and Analysis: Leveraging AI and LLMs for Predictive Insights and Strategic Foresight(Gautham Panneer Selvam, 2025, European Modern Studies Journal)
- Can Large language model analyze financial statements well?(Xinlin Wang, M. Brorsson, 2025, No journal)
- A Preliminary Fundamental Financial Analysis Framework Using Structured LLM Prompting - A Case Study(Ishan Gupta, N. Sharma, Abhay Kaushal, Rajeswara Rao Kvs, 2025, 2025 9th International Conference on Computational System and Information Technology for Sustainable Solutions (CSITSS))
- Can GPT models be Financial Analysts? An Evaluation of ChatGPT and GPT-4 on mock CFA Exams(Ethan Callanan, A. Mbakwe, Antony Papadimitriou, Yulong Pei, Mathieu Sibue, Xiaodan Zhu, Zhiqiang Ma, Xiaomo Liu, Sameena Shah, 2023, ArXiv)
- FinRobot: AI Agent for Equity Research and Valuation with Large Language Models(Tianyu Zhou, Pin Wang, Yilin Wu, Hongyang Yang, 2024, ArXiv)
- Financial Text Analysis Using 1D-CNN: Risk Classification and Auditing Support(Xinyu Du, 2025, Proceedings of the 2025 International Conference on Artificial Intelligence and Computational Intelligence)
- Intelligent Information Processing for Corporate Performance Prediction: A Hybrid Natural Language Processing (NLP) and Deep Learning Approach(Qidi Yu, Chen Xing, Yanjing He, Sunghee Ahn, H. Na, 2026, Electronics)
- Financial Statement Analysis with Large Language Models(Alex G. Kim, Maximilian Muhn, Valeri V. Nikolaev, 2024, ArXiv)
- Predicting Numeric Financial KPIs From Unstructured Text: a Comparative Study of LLM-Based Embeddings and Traditional NLP Techniques(Lord Coffie, Melvin Ajuluchukwu, Michael Nsor, 2025, 2025 IEEE/ACIS 29th International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD))
- Evaluating Large Language Models for Stance Detection on Financial Targets from SEC Filing Reports and Earnings Call Transcripts(Nikesh Gyawali, Doina Caragea, A. Vasenkov, Cornelia Caragea, 2025, ArXiv)
- Fact or Opinion? – Essential Value for Financial Results Briefing(Yutaka Kuroki, Tomonori Manabe, Kei Nakagawa, 2023, 2023 14th IIAI International Congress on Advanced Applied Informatics (IIAI-AAI))
- Beyond Surface Similarity: Detecting Subtle Semantic Shifts in Financial Narratives(Jiaxin Liu, Yi Yang, K. Tam, 2024, No journal)
Finance-Specific Technical Architectures: RAG, Multimodality, and Multi-Agent Collaboration
These studies focus on underlying architectures optimized for financial scenarios, including retrieval-augmented generation (RAG), knowledge graph (KG) integration, multi-agent collaboration systems, NL2SQL techniques, and specialized Transformer models for tabular data.
- LLMs for Financial Document Processing(Venkata Sai Nageen, 2024, International Journal of Artificial Intelligence, Data Science, and Machine Learning)
- Swin Transformer and dual-layer routing attention for enhanced financial accounting prediction(Xianghan Zhang, Daowen Ren, 2025, Journal of Computational Methods in Sciences and Engineering)
- Spatial ModernBERT: Spatial-Aware Transformer for Table and Key-Value Extraction in Financial Documents at Scale(Amrendra Singh, M. Shah, Dharshan Sampath, 2025, ArXiv)
- Integrating AI-powered knowledge graphs and NLP for intelligent interpretation, summarization, and cross-border financial reporting harmonization(Oriyomi Badmus, Olumide Johnson Ikumapayi, Rebecca Olubunmi Toromade, A. Adebayo, 2025, World Journal of Advanced Research and Reviews)
- Design of an intelligent optimization framework for corporate financial management based on GA-FL-transformer(Fengnian Zhu, Shaotian Liu, Feng Yuan, Muddassira Arshad, 2026, PeerJ Computer Science)
- GraphRAG: Leveraging Graph-Based Efficiency to Minimize Hallucinations in LLM-Driven RAG for Finance Data(Ma Barry, Gaëtan Caillaut, Pierre Halftermeyer, Raheel Qader, Mehdi Mouayad, Fabrice Le Deit, Dimitri Cariolaro, Joseph Gesnouin, 2025, No journal)
- Sustainable Digitalization of Business with Multi-Agent RAG and LLM(Muhammad Arslan, Saba Munawar, Christophe Cruz, 2025, No journal)
- Structured Financial QA with LLMs: Fine-Tuning vs. Code-Augmented Retrieval(Alperen Çağlayan, Saliha Nur Gökçe, Değer Ayata, 2025, 2025 10th International Conference on Computer Science and Engineering (UBMK))
- Advancing Retrieval-Augmented Generation for Financial Question Answering(Akmal Ali Jasmin, Indika Perera, Muaadh Mohamed, Mohamed Mushraf, 2025, 2025 Moratuwa Engineering Research Conference (MERCon))
- GraphRAG Analysis for Financial Narrative Summarization and A Framework for Optimizing Domain Adaptation(Neelesh K. Shukla, Prabhat Prabhakar, Sakthivel Thangaraj, Sandeep Singh, Weiyi Sun, Prasanna Venkatesan, Viji Krishnamurthy, 2025, No journal)
- Knowledge Graph Construction for Stock Markets with LLM-Based Explainable Reasoning(Cheonsol Lee, Youngsang Jeong, J. Shin, Huiju Kim, Jidong Kim, 2025, ArXiv)
- Research and Practice of NL2SQL Technology Based on LLM for Big Data of Enterprise Finance(Jianfeng Zhang, Yingying Li, Yunhao Liu, Limiao Xie, 2024, 2024 4th International Conference on Advanced Enterprise Information System (AEIS))
- FinReflectKG: Agentic Construction and Evaluation of Financial Knowledge Graphs(Abhinav Arun, Fabrizio Dimino, T. Agarwal, Bhaskarjit Sarmah, Stefano Pasquali, 2025, Proceedings of the 6th ACM International Conference on AI in Finance)
- QuantMCP: Grounding Large Language Models in Verifiable Financial Reality(Yifan Zeng, 2025, ArXiv)
- Multimodal retrieval-augmented generation for financial documents: image-centric analysis of charts and tables with large language models(Cheng Jiang, Pengle Zhang, Ying Ni, Xiaoli Wang, Hanghang Peng, Sen Liu, Mengdi Fei, Yuxin He, Yaxuan Xiao, Jin Huang, Xingyu Ma, Tiankun Yang, 2025, The Visual Computer)
- Table Extraction from Financial and Transactional Documents(Rama Krishna Raju Samantapudi, 2025, International journal of IoT)
- GPT-FinRE: In-context Learning for Financial Relation Extraction using Large Language Models(P. Rajpoot, Ankur Parikh, 2023, ArXiv)
- FinTeam: A Multi-Agent Collaborative Intelligence System for Comprehensive Financial Scenarios(Yingqiang Wu, Qiushi Wang, Zefei Long, Rong Ye, Zhongtian Lu, Xianyin Zhang, Bingxuan Li, Wei Chen, Liwen Zhang, Zhongyu Wei, 2025, ArXiv)
- FinRpt: Dataset, Evaluation System and LLM-based Multi-agent Framework for Equity Research Report Generation(Song Jin, Shuqi Li, Shukun Zhang, Rui Yan, 2025, ArXiv)
- CHATREPORT: Democratizing Sustainability Disclosure Analysis through LLM-based Tools(Jingwei Ni, J. Bingler, Chiara Colesanti-Senni, Mathias Kraus, Glen Gostlow, T. Schimanski, Dominik Stammbach, S. Vaghefi, Qian Wang, Nicolas Webersinke, Tobias Wekhof, Ting Yu, Markus Leippold, 2023, ArXiv)
- Can a GPT4-Powered AI Agent Be a Good Enough Performance Attribution Analyst?(Bruno Guimarães de Melo, Jamiel Sheikh, 2024, ArXiv)
- Language Model Orchestrated Financial Agents: An Open-Source Framework(Ravi Teja Gundimeda, 2025, 2025 IEEE 4th International Conference for Advancement in Technology (ICONAT))
- Template-Based Financial Report Generation in Agentic and Decomposed Information Retrieval(Yong-En Tian, Yu-Chien Tang, Kuang-Da Wang, An-Zi Yen, Wen-Chih Peng, 2025, Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval)
- Utilizing Modern Large Language Models (LLM) for Financial Trend Analysis and Digest Creation(Andrei Lazarev, Dmitrii Sedov, 2024, 2024 6th International Conference on Control Systems, Mathematical Modeling, Automation and Energy Efficiency (SUMMA))
- Evaluating Retrieval-Augmented Generation Models for Financial Report Question and Answering(Ivan Iaroshev, R. Pillai, Leandro Vaglietti, T. Hanne, 2024, Applied Sciences)
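The retrieval step shared by the RAG systems above can be sketched in a few lines. This is a deliberately simplified stand-in: chunks are ranked by term overlap with the question, where production systems use dense embeddings, graph traversal (GraphRAG), or agent-specific retrievers, and the assembled prompt would be sent to an LLM. All sample text is invented.

```python
# Minimal sketch of the retrieval step in a RAG pipeline for financial QA.
# Term overlap stands in for embedding similarity.

from collections import Counter

def tokenize(text: str) -> list[str]:
    return [t.strip(".,:%?").lower() for t in text.split()]

def overlap_score(question: str, chunk: str) -> int:
    """Number of question terms (with multiplicity) found in the chunk."""
    q = Counter(tokenize(question))
    c = Counter(tokenize(chunk))
    return sum(min(n, c[t]) for t, n in q.items())

def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks that best match the question."""
    ranked = sorted(chunks, key=lambda ch: overlap_score(question, ch), reverse=True)
    return ranked[:k]

def build_prompt(question: str, chunks: list[str]) -> str:
    """Assemble a grounded prompt from the top-ranked report chunks."""
    context = "\n".join(f"- {c}" for c in retrieve(question, chunks))
    return (f"Answer using only the context below.\n"
            f"Context:\n{context}\nQuestion: {question}")
```

The multi-agent variants surveyed here replace the single `retrieve` call with several specialized retrievers (tables, narratives, KG triples) whose outputs an orchestrating agent merges before generation.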
LLM Bias Evaluation, Hallucination Mitigation, and Financial Reliability Benchmarks
This group of studies examines the negative effects of LLMs in financial applications (e.g., numerical hallucination, representation bias, output drift), proposing performance benchmarks for financial tasks and methods to improve reliability through chain-of-thought (CoT) prompting, data cleaning, and related techniques.
- Identifying Representation Bias in Large Language Models Used in Financial Sentiment Analysis(Alpay Sabuncuoglu, Carsten Maple, 2025, 2025 IEEE Symposium on Computational Intelligence for Financial Engineering and Economics (CiFer))
- LLM Output Drift: Cross-Provider Validation & Mitigation for Financial Workflows(Raffi Khatchadourian, Rolando Franco, 2025, ArXiv)
- Innovation of Enterprise Ethical Review Mechanism Driven by Generative AI for Financial Report Preparation(Yue Ma, Jian Du, 2025, Modern Economics & Management Forum)
- The Artificial Intelligence, Challenges for Accounting Profession. The Case of ChatGPT(L. Dumitraşcu, 2024, Audit Financiar)
- A Study on the Dual Impact of Generative Prediction (GPT)-Based AI on the Quality of Corporate Financial Disclosure(Zuoshi Zhang, 2025, Frontiers in Business, Economics and Management)
- Can ChatGPT Overcome Behavioral Biases in the Financial Sector? Classify-and-Rethink: Multi-Step Zero-Shot Reasoning in the Gold Investment(Shuoling Liu, Gaoguo Jia, Yuhang Jiang, Liyuan Chen, Qiang Yang, 2024, ArXiv)
- Journey of Hallucination-minimized Generative AI Solutions for Financial Decision Makers(Sohini Roychowdhury, 2023, Proceedings of the 17th ACM International Conference on Web Search and Data Mining)
- Addressing investor concerns: a Chinese financial question-answering benchmark with LLM-based evaluation(Yujian Gan, Yiyi Tao, Jiawang Mo, Xianzhen Huang, Yiwen Li, Kexin Wang, Yi Cai, Lu Liang, Shuzhen Xiong, Qi Ke, Hua Zheng, Xiaochu Hu, 2025, EPJ Data Science)
- Adversarially Enhanced Financial Misinformation: A Comparative Analysis of LLM- vs. GAN-Generated Content Exposing AI Moderation Vulnerabilities(Christopher Santorelli, Victor Ginart Belmonte, Ryan Mastropaolo, 2025, 2025 6th International Conference on Artificial Intelligence, Robotics and Control (AIRC))
- Dólares or Dollars? Unraveling the Bilingual Prowess of Financial LLMs Between Spanish and English(Xiao Zhang, Ruoyu Xiang, Chenhan Yuan, Duanyu Feng, Weiguang Han, Alejandro Lopez-Lira, Xiao-Yang Liu, Meikang Qiu, Sophia Ananiadou, Min Peng, Jimin Huang, Qianqian Xie, 2024, Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining)
- Hallucination-minimized Data-to-answer Framework for Financial Decision-makers(Sohini Roychowdhury, A. Alvarez, Brian Moore, Marko Krema, Maria Paz Gelpi, F. Rodriguez, Angel Rodriguez, Jose Ramon Cabrejas, Pablo Serrano, Punit Agrawal, Arijit Mukherjee, 2023, 2023 IEEE International Conference on Big Data (BigData))
- Who Invests, Who Gets Funded: Gender and Racial Bias in LLM-Generated Investment Advice(Ye Wang, Kexin Gu, 2026, SSRN Electronic Journal)
- Financial Named Entity Recognition: How Far Can LLM Go?(Yi Lu, Yintong Huo, 2025, No journal)
- FinTagging: An LLM-ready Benchmark for Extracting and Structuring Financial Information(Yan Wang, Yang Ren, Lingfei Qian, Xueqing Peng, Keyi Wang, Yi Han, Dongji Feng, Xiao-Yang Liu, Jimin Huang, Qianqian Xie, 2025, ArXiv)
- Unmasking Bias in Financial AI: A Robust Framework for Evaluating and Mitigating Hidden Biases in LLMs(Shreshth Mehrotra, Raghavendra P, Balraj Prajesh, Hrishikesh Kambale, Puspita Majumdar, 2025, Proceedings of the 6th ACM International Conference on AI in Finance)
- CARE: A Framework for Correcting Numerical Hallucinations in LLM-Generated Financial Texts(Jian Kim, Woohwan Jung, 2025, 2025 IEEE Conference on Artificial Intelligence (CAI))
- On the Reliability of Large Language Models in Financial Applications: An Analysis of Hallucination(Shweta Gupta, 2025, 2025 4th International Conference on Applied Artificial Intelligence and Computing (ICAAIC))
- Towards reducing hallucination in extracting information from financial reports using Large Language Models(Bhaskarjit Sarmah, Dhagash Mehta, Stefano Pasquali, Tianjie Zhu, 2023, Proceedings of the Third International Conference on AI-ML Systems)
- FinChain: A Symbolic Benchmark for Verifiable Chain-of-Thought Financial Reasoning(Zhuohan Xie, Dhruv Sahnan, Debopriyo Banerjee, Georgi N. Georgiev, Rushil Thareja, Hachem Madmoun, Jinyan Su, Aaryamonvikram Singh, Yuxia Wang, Rui Xing, Fajri Koto, Haonan Li, Ivan Koychev, Tanmoy Chakraborty, S. Lahlou, Veselin Stoyanov, Preslav Nakov, 2025, ArXiv)
- AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework(Xiang Li, Zhenyun Li, Chen Shi, Yong Xu, Qing Du, Mingkui Tan, Jun Huang, Wei Lin, 2024, No journal)
- Enhancing Financial RAG with Agentic AI and Multi-HyDE: A Novel Approach to Knowledge Retrieval and Hallucination Reduction(R. George, Akshay Govind Srinivasan, Jayden Koshy Joe, H. R., Vijayavallabh J, Hrushikesh Kant, Rahul Vimalkanth, S. S, S. Suresh, 2025, ArXiv)
- Can AI Read Like a Financial Analyst? A Financial Touchstone for Frontier Language Models Such as Gemini 2.5 Pro, o3, and Grok 4 on Long-Context Annual Report Comprehension(J. Spörer, 2025, Proceedings of the 6th ACM International Conference on AI in Finance)
- ZiGong 1.0: A Large Language Model for Financial Credit(Yu Lei, Zixuan Wang, Chu Liu, Tongyao Wang, 2025, 2025 IEEE 41st International Conference on Data Engineering Workshops (ICDEW))
- Are ChatGPT and GPT-4 General-Purpose Solvers for Financial Text Analytics? A Study on Several Typical Tasks(Xianzhi Li, Samuel W. K. Chan, Xiaodan Zhu, Yulong Pei, Zhiqiang Ma, Xiaomo Liu, Sameena Shah, 2023, No journal)
- Are ChatGPT and GPT-4 General-Purpose Solvers for Financial Text Analytics? An Examination on Several Typical Tasks(Xianzhi Li, Xiao-Dan Zhu, Zhiqiang Ma, Xiaomo Liu, Sameena Shah, 2023, ArXiv)
- Fino1: On the Transferability of Reasoning Enhanced LLMs to Finance(Lingfei Qian, Weipeng Zhou, Yan Wang, Xueqing Peng, Jimin Huang, Qianqian Xie, 2025, ArXiv)
- EDINET-Bench: Evaluating LLMs on Complex Financial Tasks using Japanese Financial Statements(Issa Sugiura, Takashi Ishida, Taro Makino, Chieko Tazuke, Takanori Nakagawa, Kosuke Nakago, David Ha, 2025, ArXiv)
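A recurring safeguard in this strand of work is a post-hoc numerical consistency check: every figure quoted in an LLM-generated summary must occur somewhere in the source filing. The sketch below illustrates that general idea only; it is not a reimplementation of CARE or any other cited method, and the sample sentences are invented.

```python
# Post-hoc check: numbers in generated text must appear in the source.
# Thousands separators are normalized before comparison.

import re

NUM_RE = re.compile(r"-?\d+(?:,\d{3})*(?:\.\d+)?")

def extract_numbers(text: str) -> set[str]:
    """Return all numeric literals, with thousands separators removed."""
    return {m.group().replace(",", "") for m in NUM_RE.finditer(text)}

def unsupported_numbers(generated: str, source: str) -> set[str]:
    """Numbers in the generated text that never occur in the source."""
    return extract_numbers(generated) - extract_numbers(source)

source = "Net income was 1,240 million in 2023, up from 1,100 million."
good = "Net income rose to 1240 million in 2023."
bad = "Net income rose to 1,350 million in 2023."
```

A real pipeline would also handle derived figures (sums, growth rates) that are legitimate even though they do not appear verbatim; flagged numbers are then routed to correction or regeneration.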
Market Effects, Compliance Governance, and Specialized Financial Applications
These studies analyze the empirical impact of financial report disclosures on capital markets, examine compliance issues under legal and regulatory regimes (e.g., EU directives), and cover AI applications in specialized areas such as tax automation and supply chain finance.
- The News in Earnings Announcement Disclosures: Capturing Word Context Using LLM Methods(Federico Siano, 2025, Manag. Sci.)
- Impact of EU non-financial reporting regulation on Spanish companies’ environmental disclosure: a cutting-edge natural language processing approach(Javier Villacampa-Porta, M. Coronado-Vaca, E.C. Garrido-Merchán, 2025, Environmental Sciences Europe)
- Deloitte (Drocks) at the Financial Misinformation Detection Challenge Task: Enhancing Misinformation Detection through Instruction-Tuned Models(Harika Abburi, Alex Chandler, Edward Bowen, Sanmitra Bhattacharya, Nirmala Pudota, 2025, No journal)
- Architecting Intelligent Tax Automation: Research Innovations in Machine Learning for Global Compliance(Vedashree Kedar Karandikar, 2026, International Journal of Computational and Experimental Science and Engineering)
- Towards Cognitive Intelligence in Financial Document Analysis: A Multimodal LLM Framework for Risk Reasoning and Due Diligence(Manshan Lin, 2025, Journal of Language)
- Fusing Narrative Semantics for Financial Volatility Forecasting(Yaxuan Kong, Yoontae Hwang, M. Kaiser, Chris Vryonides, Roel Oomen, Stefan Zohren, 2025, Proceedings of the 6th ACM International Conference on AI in Finance)
- Interpretable multimodal reasoning for robo-advisory: the FinErva framework(Jiarui Chi, 2026, Frontiers in Artificial Intelligence)
- Challenges of Artificial Intelligence and its Impact on the Quality of Internal Auditing and its Impact on the Performance of Financial Institutions.(Amel Merzah Sakhil, Suhad Abdul Meer Kadhim, F. Oudah, 2025, Tasnim International Journal for Human, Social and Legal Sciences)
- Identification of the Most Frequently Asked Questions in Financial Analyst Reports to Automate Equity Research Using Llama 3 and GPT-4(A. Pop, J. Spörer, Siegfried Handschuh, 2024, 2025 IEEE Swiss Conference on Data Science (SDS))
This report synthesizes current research on large language models (LLMs) in intelligent auditing and financial statement analysis. The literature shows an evolution from "process automation" to "deep semantic understanding" and on to "trustworthy architecture design." On one hand, LLMs have markedly improved the efficiency of fraud detection, performance forecasting, and report generation through RAG, multi-agent collaboration, and related techniques; on the other, the research community remains highly vigilant about numerical hallucination, algorithmic bias, and compliance risks in high-stakes financial scenarios, and works toward "responsible financial AI" by building domain benchmarks and mitigation techniques. Ultimately, these applications are reshaping not only the organizational form of the audit profession but also disclosure and decision-making mechanisms in capital markets.
A total of 110 related references.
The rapid development of generative AI has brought major changes to how different sectors function worldwide. Much research in the financial sector aims to increase efficiency and reduce errors arising from human intervention. However, current financial risk analysis still relies on manual reviews and conventional machine learning models, which repeatedly fail to process financial risk data adequately. This study investigates how a Retrieval-Augmented Generation (RAG) approach can help large language models (LLMs) generate risk analysis reports from audit reports, extracting detailed information and avoiding the overlooked small details that were a major drawback of earlier systems. It examines how RAG enhances financial risk analysis of audit reports across different LLMs, namely GPT-4o, Gemini-1.5-flash, and Llama 3.1, evaluating them on multiple metrics, including faithfulness, context precision, context recall, context relevancy, and answer relevance. The findings indicate that Llama 3.1 is the strongest model in terms of faithfulness of the generated report, with a score of 78.26%. On document retrieval and context, Llama also performed strongly, scoring 79.62% in context precision, 78.26% in context recall, and 86.99% in context relevancy. On the generated report itself, Llama 3.1 scored 37.83% for answer relevancy, while Gemini-1.5-flash scored 58.64% for answer correctness.
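The metrics named in this abstract can be made concrete with simplified, set-based proxies. These overlap-based stand-ins only illustrate what each metric measures; the study's actual scores are typically computed with LLM judges (e.g., RAGAS-style evaluators), and the example strings below are invented.

```python
# Simplified proxies for RAG evaluation metrics: faithfulness and
# context precision/recall, computed with token-set overlap.

def _terms(text: str) -> set[str]:
    return {t.strip(".,").lower() for t in text.split()}

def faithfulness(answer: str, context: str) -> float:
    """Fraction of answer terms grounded in the retrieved context."""
    a, c = _terms(answer), _terms(context)
    return len(a & c) / len(a) if a else 0.0

def context_precision(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of retrieved chunks that are actually relevant."""
    if not retrieved:
        return 0.0
    return sum(ch in relevant for ch in retrieved) / len(retrieved)

def context_recall(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of relevant chunks that were retrieved."""
    if not relevant:
        return 0.0
    return len(relevant & set(retrieved)) / len(relevant)
```

The trade-off the abstract reports, high faithfulness but low answer relevancy for Llama 3.1, corresponds to an answer well grounded in its context that nonetheless fails to address the question directly.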
The implementation of Artificial Intelligence (AI) in the accounting field is a hot topic. ChatGPT, an AI tool, has recently become very popular due to its conversational voice and abilities. The study is motivated less by the evolution of this Large Language Model (LLM) and more by its capabilities. This paper explores the impact of AI on accounting and accountants in a dynamic world, with a focus on financial reporting. The research discusses using AI technologies, specifically ChatGPT 4, as tools available to accountants, and how they are changing the way financial data is processed, analyzed, and reported. The author's objectives are to examine the potential advantages, benefits, limits, and risks associated with AI implementation in accounting, including increased accuracy and efficiency as well as concerns around data privacy and security. To this end, a quantitative research method was used: an experiment was conducted to test ChatGPT and its capabilities. Furthermore, the author argues that accountants need to develop new skills and competencies, including a deep understanding of AI algorithms and their limitations, as well as the ability to interpret and communicate the results of AI-driven analysis to non-technical stakeholders. By embracing AI technologies and developing new skills and competencies, accounting professionals can contribute to the long-term success of organizations in a dynamic and rapidly changing world. The paper also considers the challenges of detecting and preventing dishonesty and suggests strategies, such as policies and procedures and the provision of training and support, that accountants can implement to ensure integrity in the use of these tools. The added value of this paper is that it provides an understanding of the implications of AI for accounting.
The paper concludes that while the use of AI for accounting in a dynamic world presents benefits and opportunities, there are also challenges to face. Accountants can effectively address these concerns by taking a proactive and ethical approach to the responsible use of these tools. Future research could involve focus groups and interviews with different stakeholders to observe the impact of ChatGPT in a business environment, covering both financial and non-financial reporting.
No abstract available
This study aims to identify the impact of the use of artificial intelligence on the development of external audit efficiency in Jordanian public shareholding mining and extractive companies. The study sample consisted of 56 external auditors in 13 Jordanian mining and extractive corporations. Descriptive and analytical approaches were used to achieve the study's objectives, and the data were processed statistically using arithmetic averages and multiple regression analysis. The study found a statistically significant impact of artificial intelligence on the development of external audit efficiency in Jordanian mining and extractive corporations, and a statistically significant impact of artificial intelligence, represented in planning, carrying out control tests and basic tests of operations, carrying out analytical procedures and detailed tests of balances, and auditing subsequent events and future commitments prior to the issuance of the auditor's report, on improving all dimensions of governance (effective governance framework, disclosure and transparency, shareholder equality, responsibilities of the board of directors, and the role of stakeholders) in Jordanian public shareholding mining and extraction companies. In light of these results, the study recommends facilitating auditors' fuller reliance on artificial intelligence, given its strong positive impact.
The concept of audit efficiency is of great significance in the modern business environment, as it is one of the key indicators of the accuracy of financial statements, risk management, and overall governance effectiveness in entities and organizations. An efficient audit is not limited to verifying compliance with legislative requirements but also contributes to optimal financial resource management, ensuring transparency, and reducing corruption risks. The study of audit efficiency has become even more relevant in the context of contemporary economic and technological changes. The development of digital technologies, the implementation of automated audit systems, and the use of artificial intelligence in audit processes create new challenges and opportunities. Therefore, an in-depth study of the theoretical and practical foundations of auditing, the improvement of its methodological approaches, and its alignment with international standards are essential for enhancing the long-term financial stability and investment attractiveness of organizations. This article examines research related to the efficiency of audit services and approaches this concept in light of modern-day requirements. The paper evaluates the key factors ensuring audit efficiency and highlights the necessity of addressing them based on contemporary challenges. Moreover, approaches to audit efficiency are analyzed in connection with the legal and regulatory framework. The article presents well-founded perspectives on audit efficiency and its improvement, offering a revised version of existing theoretical and methodological provisions.
Financial statement analysis represents a fundamental component of audit procedures, requiring extensive examination of numerical data, trends, and relationships across multiple reporting periods. Traditional audit approaches rely heavily on manual analytical procedures and rule-based testing, leading to time-intensive processes and potential inconsistencies in analysis depth and coverage. The increasing complexity of financial reporting and growing volumes of financial data have intensified these challenges. This study proposes a Deep Learning (DL) framework designed to automate and enhance financial statement analysis in audit contexts. The framework integrates Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) to analyze financial statement patterns, detect anomalies, and identify potential misstatements. Advanced deep learning algorithms process multi-period financial data to recognize complex relationships and unusual variations that may indicate audit risks. Experimental validation using financial statements from 500 public companies demonstrates that the proposed framework achieves 89.7% accuracy in anomaly detection and reduces analytical procedure time by 73%. The system successfully identifies potential misstatements and unusual fluctuations while maintaining high precision rates. Implementation results show significant improvements in audit analytical efficiency, consistency, and risk identification capabilities.
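The abstract above describes a CNN/RNN framework; as a stdlib-only illustration of the underlying analytical task, the sketch below flags reporting periods whose period-over-period change in a line item deviates strongly from that item's own history. This is a toy z-score heuristic, not the cited deep-learning method, and the 1.5-sigma threshold is an arbitrary illustrative choice.

```python
# Toy anomaly detector for multi-period financial data: flag periods whose
# change is a statistical outlier relative to the series' own history.

from statistics import mean, pstdev

def period_changes(values: list[float]) -> list[float]:
    """Differences between consecutive reporting periods."""
    return [b - a for a, b in zip(values, values[1:])]

def anomalous_periods(values: list[float], z_threshold: float = 1.5) -> list[int]:
    """Indices (into the original series) whose change is an outlier."""
    changes = period_changes(values)
    if len(changes) < 2:
        return []
    mu, sigma = mean(changes), pstdev(changes)
    if sigma == 0:
        return []
    return [i + 1 for i, c in enumerate(changes) if abs(c - mu) / sigma > z_threshold]
```

The CNN/RNN framework in the study plays the same role at far greater sophistication, learning cross-account and cross-period patterns instead of thresholding a single series.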
Machine learning (ML) is used globally as a tool for predictive analysis. Within auditing, the use of audit data helps to uncover fraud indicators, identify risk areas, and implement predictive models for continuous audit monitoring. Researchers use various machine learning methods to analyze large and complex audit data to facilitate prediction. In this study, an online UCI dataset of 776 rows and 27 features is used. Of these 27 features, 13 are eliminated due to their low impact on the target or through an important-feature-selection algorithm. The analysis applies supervised learning methods, namely K-Nearest Neighbors, Logistic Regression, Random Forest, Support Vector Machine, Decision Tree, Linear Discriminant Analysis, Gaussian Naive Bayes, Extra Trees, Gradient Boosting, AdaBoost, and XGBoost. The experimental results highlight the performance of KNN with eight neighbors, evaluating its effectiveness, sensitivity, precision, accuracy, and F1 score against methods such as Naive Bayes, SVM (linear kernel), the Decision Tree classifier, and the Random Forest classifier.
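The KNN approach highlighted in this abstract can be sketched from scratch in a few lines. The toy feature vectors, labels, and the small k below are invented for illustration; the study itself used k = 8 on the 776-row UCI audit dataset after feature selection.

```python
# From-scratch k-nearest-neighbors classifier on toy audit-style features.

from collections import Counter
from math import dist

def knn_predict(train: list[tuple[list[float], int]],
                x: list[float], k: int = 3) -> int:
    """Majority label among the k training points closest to x."""
    nearest = sorted(train, key=lambda item: dist(item[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical features: [misstatement_ratio, transaction_risk]; label 1 = risky.
train = [([0.90, 0.80], 1), ([0.80, 0.90], 1), ([0.70, 0.70], 1),
         ([0.10, 0.20], 0), ([0.20, 0.10], 0), ([0.15, 0.15], 0)]
```

KNN's appeal in audit screening is its transparency: each flag can be explained by pointing at the k most similar historical engagements, which suits review workflows better than a black-box score.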
The use of automation and artificial intelligence (AI) in audit practice is increasingly becoming a major focus, with significant impact on the profession. This research depicts the current landscape of AI use in auditing, highlighting aspects such as automation and empowerment of the audit workforce, the impact of AI on audit quality criteria, key factors in adopting AI-based audit techniques, the impact of AI technology on audit evidence, and auditors' perceptions of AI in improving audit quality. The results and discussion show that while integrating automation and AI in auditing brings great benefits, including improved audit quality, enhanced efficiency, and the ability to perform continuous audits, there are also challenges to overcome, such as the high cost of customizing AI for industry-specific audit processes. The use of AI in auditing requires auditors to adapt their competencies and workflows to use the technology effectively. However, with proper understanding and careful handling of these challenges, AI has great potential to improve overall audit practice.
This research paper explores the transformative role of artificial intelligence (AI) in enhancing the efficiency and effectiveness of global internal audit functions. As businesses increasingly adopt AI-driven technologies, internal auditing has witnessed significant advancements in data analysis, risk detection, compliance monitoring, and decision-making processes. The paper analyzes how AI tools like machine learning, natural language processing, and predictive analytics contribute to the automation of repetitive audit tasks, the detection of anomalies, and the improvement of audit accuracy and timeliness. Additionally, it addresses the challenges associated with AI adoption, including data privacy concerns, skills gaps among auditors, and the integration of AI into existing audit frameworks. The study also provides a comparative analysis of AI-enabled versus traditional audit practices, highlighting AI’s potential to enhance audit quality, reduce operational costs, and provide deeper insights into financial and non-financial risks. By examining case studies and industry practices, the paper emphasizes AI’s critical role in shaping the future of internal auditing on a global scale. The findings suggest that AI’s integration into internal audits is not just a trend but a necessary evolution for achieving optimal audit outcomes.
No abstract available
With the rapid development of information technology, digital transformation has become an inevitable trend across various industries. As important supervisors of economic activities, accounting firms must also adapt to this trend by adopting digital technologies to improve the quality and efficiency of their audits. This paper first defines the concept of digital transformation, followed by an analysis of relevant theories on audit quality and audit efficiency. It then explores the impact of digital transformation on audit quality and efficiency, focusing on areas such as the application of data analysis techniques, integration of information systems, and improvement in risk management for audit quality, as well as the use of automation tools, real-time data processing advantages, and enhanced collaboration for audit efficiency. Based on this, the paper proposes strategies for technological applications and innovation, data management and security, digital talent cultivation and team building, optimization of audit processes and organizational structures, as well as quality control and risk management, offering guidance for the digital transformation of accounting firms.
As large language models (LLMs) expand the power of natural language processing to handle long inputs, rigorous and systematic analyses are necessary to understand their abilities and behavior. A salient application is summarization, due to its ubiquity and controversy (e.g., researchers have declared the death of summarization). In this paper, we use financial report summarization as a case study because financial reports are not only long but also use numbers and tables extensively. We propose a computational framework for characterizing multimodal long-form summarization and investigate the behavior of Claude 2.0/2.1, GPT-4/3.5, and Cohere. We find that GPT-3.5 and Cohere fail to perform this summarization task meaningfully. For Claude 2 and GPT-4, we analyze the extractiveness of the summary and identify a position bias in LLMs. This position bias disappears after shuffling the input for Claude, which suggests that Claude seems to recognize important information. We also conduct a comprehensive investigation on the use of numeric data in LLM-generated summaries and offer a taxonomy of numeric hallucination. We employ prompt engineering to improve GPT-4's use of numbers with limited success. Overall, our analyses highlight the strong capability of Claude 2 in handling long multimodal inputs compared to GPT-4. The generated summaries and evaluation code are available at https://github.com/ChicagoHAI/characterizing-multimodal-long-form-summarization.
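Extractiveness and position bias of the kind the paper measures can be probed by mapping each summary sentence back to its most similar source chunk; a skew toward low indices suggests a lead bias. The sketch below uses a toy Jaccard word overlap rather than the paper's actual matching procedure, and all sentences are invented.

```python
def token_overlap(a, b):
    """Jaccard overlap between the word sets of two sentences."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / (len(ta | tb) or 1)

def source_positions(summary_sents, source_chunks):
    """Index of the best-matching source chunk for each summary sentence."""
    return [max(range(len(source_chunks)),
                key=lambda i: token_overlap(s, source_chunks[i]))
            for s in summary_sents]

chunks = ["revenue grew ten percent", "the board met in june",
          "net income fell sharply", "auditor issued clean opinion"]
summary = ["revenue grew ten percent", "net income fell sharply"]
positions = source_positions(summary, chunks)
```

Running the same measurement on a shuffled copy of the chunks, as the paper does for Claude, separates genuine salience from positional preference.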
The rapid growth of the financial sector and the rising focus on Environmental, Social, and Governance (ESG) considerations highlight the need for advanced NLP tools. However, open-source LLMs proficient in both finance and ESG domains remain scarce. To address this gap, we introduce SusGen-30K, a category-balanced dataset comprising seven financial NLP tasks and ESG report generation, and propose TCFD-Bench, a benchmark for evaluating sustainability report generation. Leveraging this dataset, we developed SusGen-GPT, a suite of models achieving state-of-the-art performance across six adapted and two off-the-shelf tasks, trailing GPT-4 by only 2% despite using 7-8B parameters compared to GPT-4's 1,700B. Based on this, we propose the SusGen system, integrated with Retrieval-Augmented Generation (RAG), to assist in sustainability report generation. This work demonstrates the efficiency of our approach, advancing research in finance and ESG.
This study explores the application of retrieval-augmented generation (RAG) to improve the accuracy and reliability of large language models (LLMs) in the context of financial report analysis. The focus is on enabling private investors to make informed decisions by enhancing the question-and-answering capabilities regarding the half-yearly or quarterly financial reports of banks. The study adopts a Design Science Research (DSR) methodology to develop and evaluate an RAG system tailored for this use case. The study conducts a series of experiments to explore models in which different RAG components are used. The aim is to enhance context relevance, answer faithfulness, and answer relevance. The results indicate that model one (OpenAI ADA and OpenAI GPT-4) achieved the highest performance, showing robust accuracy and relevance in response. Model three (MiniLM Embedder and OpenAI GPT-4) scored significantly lower, indicating the importance of high-quality components. The evaluation also revealed that well-structured reports result in better RAG performance than less coherent reports. Qualitative questions received higher scores than the quantitative ones, demonstrating the RAG’s proficiency in handling descriptive data. In conclusion, a tailored RAG can aid investors in providing accurate and contextually relevant information from financial reports, thereby enhancing decision making.
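The retrieval half of such an RAG system can be sketched with a toy bag-of-words retriever; the chunks, question, and prompt below are illustrative stand-ins for the study's actual components (OpenAI ADA embeddings feeding GPT-4), not its implementation.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding'; a real system would use a neural embedder."""
    return Counter(text.lower().split())

def cosine(a, b):
    num = sum(a[t] * b[t] for t in a)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def retrieve(question, chunks, k=2):
    """Rank report chunks by similarity to the question; keep the top k."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

report_chunks = [
    "Net interest income rose 4% in the first half.",
    "The bank opened twelve new branches.",
    "Loan loss provisions increased due to credit risk.",
]
context = retrieve("How did net interest income change?", report_chunks)
prompt = ("Answer only from this context:\n" + "\n".join(context)
          + "\nQ: How did net interest income change?")
```

The assembled `prompt` would then go to the generator model; grounding the answer in retrieved chunks is what improves faithfulness over a bare LLM query.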
Against the backdrop of accelerating digital transformation, GPT-based generative AI technologies are gradually penetrating the entire corporate financial disclosure process, exerting a significant dual impact on disclosure quality. Drawing on information asymmetry theory and principal-agent theory, combined with KPMG's global research data and case studies such as Amazon and AllHere, this paper systematically analyzes the positive impact and potential risks of generative AI on the quality of financial disclosure. The study finds that generative AI can reduce disclosure redundancy through automated processing, compressing MD&A report summaries to 25% of the original while retaining core information, while also improving forecast accuracy and compliance efficiency. However, this also presents risks such as "AI whitewashing," data fabrication, and algorithmic black box manipulation. For example, the US AI startup AllHere overstated its revenue by nearly 700 times by fabricating AI-related financial data. The study further suggests the need to establish a coordinated mechanism across three dimensions: optimizing corporate governance, upgrading regulatory technology, and managing model security. The conclusions indicate that the impact of generative AI on disclosure quality is not one-way; its ultimate effect depends on the alignment between technical application specifications and risk prevention and control systems. This finding provides empirical evidence for companies to rationally utilize AI technology and for regulators to improve governance rules.
The application of generative AI in the preparation of financial reports has significantly improved efficiency and accuracy, but it has also triggered ethical risks such as data privacy, algorithmic bias, and ambiguous responsibilities. Based on the technology-policy-organization synergy framework, the innovation of corporate ethical review mechanisms needs to focus on the following dimensions: At the technology governance level, federated learning and zero-trust architecture are integrated to achieve controllable data security, algorithmic fairness detection tools are integrated to monitor model biases in real time, and blockchain technology is used to ensure full-process traceability; At the policy compliance level, dynamic hierarchical review standards are established, international mainstream regulatory requirements are integrated, and intelligent systems are relied on to achieve automated analysis and compliance adaptation of global regulatory policies; At the organizational execution level, a multi-level review framework is established, embedding abnormal decision warning and human intervention mechanisms. Case studies show that this mechanism can effectively reduce data security risks, enhance algorithmic fairness, and strengthen responsibility traceability. In the future, it is necessary to strengthen the integration and application of cutting-edge technologies, promote global ethical standard coordination, and build a people-oriented intelligent governance paradigm.
The most recent large language models(LLMs) such as ChatGPT and GPT-4 have shown exceptional capabilities of generalist models, achieving state-of-the-art performance on a wide range of NLP tasks with little or no adaptation. How effective are such models in the financial domain? Understanding this basic question would have a significant impact on many downstream financial analytical tasks. In this paper, we conduct an empirical study and provide experimental evidences of their performance on a wide variety of financial text analytical problems, using eight benchmark datasets from five categories of tasks. We report both the strengths and limitations of the current models by comparing them to the state-of-the-art fine-tuned approaches and the recently released domain-specific pretrained models. We hope our study can help understand the capability of the existing models in the financial domain and facilitate further improvements.
No abstract available
This study aims to improve the identification of potential credit risks in unstructured financial texts. It addresses the core problem of financial text analysis and credit risk assessment by proposing a hybrid model that combines the generative semantic understanding of Generative Pre-trained Transformer-4 (GPT-4) with the enhanced feature extraction of Bidirectional Encoder Representations from Transformers (BERT). To overcome the limitations of traditional methods—such as weak contextual reasoning in long texts, insufficient recognition of industry-specific terminology, and implicit credit risk expressions—the model incorporates a financial dictionary enhancement module and a named entity recognition (NER) component. GPT-4 is leveraged for prompt-based generation to extract latent risk information from complex texts, including annual reports. A dual-model semantic fusion mechanism with attention weighting constructs a multi-level risk assessment system that integrates contextual understanding, industry adaptability, and interpretability. Experiments on multiple publicly available financial datasets and real-world annual reports demonstrate the model’s effectiveness. Results show that the proposed approach outperforms representative baseline models in accuracy, adaptability, and interpretability. This work carries both theoretical and practical significance for research at the intersection of financial technology and natural language processing.
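A minimal sketch of the attention-weighted dual-model fusion, assuming each model contributes an embedding vector and a scalar relevance score; the paper does not publish this exact mechanism, so names, dimensions, and scores below are hypothetical.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def fuse(vec_gpt, vec_bert, score_gpt, score_bert):
    """Attention-weighted fusion: a softmax over scalar relevance scores,
    then an element-wise weighted sum of the two models' vectors."""
    w_gpt, w_bert = softmax([score_gpt, score_bert])
    return [w_gpt * g + w_bert * b for g, b in zip(vec_gpt, vec_bert)]

fused = fuse([1.0, 0.0], [0.0, 1.0], score_gpt=2.0, score_bert=0.0)
```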
This research paper addresses the use of Artificial Intelligence (AI) to detect financial fraud through text analysis of content produced by Generative Pre-trained Transformers (GPT). Fraudsters have continued to utilise advanced language models to generate deceptive content; consequently, traditional anti-fraud methods are correspondingly less effective. This article proposes a novel approach that combines Natural Language Processing (NLP) and machine learning techniques to detect deception patterns in GPT-generated content. This is achieved by generating a new dataset of authentic and artificially created financial texts, including emails, reports, and social media posts. The training dataset is tested and validated using a collection of AI models, which includes a fine-tuned version of GPT-3.5, a Long Short-Term Memory (LSTM) network, and a Transformer-based classifier. Python is the primary tool used in this paper, with TensorFlow and PyTorch packages employed for model development, and scikit-learn utilised for performance analysis. The outcome demonstrates that the developed AI system can identify phishing text with extremely high accuracy, providing financial institutions with a reasonable opportunity to enhance their ability to combat fraud in the digital era. The research highlights the future of artificial intelligence in combating new forms of fraud and emphasises the need for ongoing innovation in this area.
This research dissects financial equity research reports (ERRs) by systematically mapping their content into categories. There is insufficient empirical analysis of the questions answered in ERRs. In particular, it is not understood how frequently certain information appears, what information is considered essential, and what information requires human judgment to distill into an ERR. The study analyzes 72 ERRs sentence-by-sentence, classifying their 4964 sentences into 169 unique question archetypes. We did not predefine the questions but derived them solely from the statements in the ERRs, which provides an unbiased view of the content of the observed ERRs. Subsequently, we used public corporate reports to classify the questions' potential for automation. Answers were labeled "text-extractable" if they were accessible in corporate reports. 75.15% of the questions in ERRs can be automated using text extraction from text sources; these automatable questions consist of 51.91% text-extractable questions (suited to processing by large language models, LLMs) and 24.24% database-extractable questions. Only 24.85% of questions require human judgment to answer. We empirically validate, using Llama-3-70B and GPT-4-turbo-2024-04-09, that recent advances in language generation and information extraction enable the automation of approximately 80% of the statements in ERRs. Surprisingly, the models complement each other's strengths and weaknesses well, indicating strong ensemble potential. The research confirms that the current writing process of ERRs can likely benefit from additional automation, improving quality and efficiency, and allows us to quantify the potential impact of introducing large language models in the ERR writing process. The full question list, including the archetypes and their frequencies, is available online (janspoerer.github.io/pop-spoerer-2025-financial-report-data).
Relation extraction (RE) is a crucial task in natural language processing (NLP) that aims to identify and classify relationships between entities mentioned in text. In the financial domain, relation extraction plays a vital role in extracting valuable information from financial documents, such as news articles, earnings reports, and company filings. This paper describes our solution to relation extraction on one such dataset REFinD. The dataset was released along with shared task as a part of the Fourth Workshop on Knowledge Discovery from Unstructured Data in Financial Services, co-located with SIGIR 2023. In this paper, we employed OpenAI models under the framework of in-context learning (ICL). We utilized two retrieval strategies to find top K relevant in-context learning demonstrations / examples from training data for a given test example. The first retrieval mechanism, we employed, is a learning-free dense retriever and the other system is a learning-based retriever. We were able to achieve 3rd rank overall. Our best F1-score is 0.718.
Financial sentiment analysis is the task of evaluating and quantifying the emotions and opinions expressed in financial news, reports, or social media to help investors and institutions make informed decisions. Financial institutions have been actively exploring the use of large language models (LLMs) to analyse market sentiment signals for a more nuanced understanding of a broader context. However, issues such as the scale of training data, model complexity, and the potential for human oversight can introduce or even amplify bias in these systems. Representation bias is a common challenge for LLMs as training data fail to properly represent the target groups, hence causes harmful bias in general-purpose use. Therefore, replacing current solutions with LLMs in financial organisations requires a robust evaluation methodology to ensure fairness. This paper investigates a three-level bias evaluation approach that specifically focuses on representation bias and presents a baseline evaluation of the FinBERT model. Step 1 uses a synthetic dataset that explicitly reveals sources of bias, structured as probability- and embedding-based evaluation recipes. Step 2 evaluates the model against data released by another country (e.g. Indian News dataset) to assess its performance in relation to more implicit biases. Step 3 examines individual problematic samples using token-based interpretability methods (e.g. integrated gradients). This paper presents the application of this structured bias evaluation process and its results on the FinBERT model. The evaluation code and dataset are available on GitHub (https://github.com/asabuncuoglu13/faid-test-financial-sentiment-analysis).
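The probability-based recipe of Step 1 can be sketched as a counterfactual template test: fill the same sentiment template with each group term and compare the model's scores. `toy_score` below is a hypothetical stand-in for FinBERT's positive-class probability; the template and groups are invented for illustration.

```python
def bias_gap(score_fn, template, groups):
    """Fill a template with each group term and report the score spread;
    a large gap flags representation bias for that template."""
    scores = {g: score_fn(template.format(group=g)) for g in groups}
    return max(scores.values()) - min(scores.values()), scores

# Hypothetical stand-in for a sentiment model's positive-class probability
def toy_score(text):
    return 0.9 if "firm A" in text else 0.6

gap, per_group = bias_gap(toy_score,
                          "{group} reported strong quarterly earnings.",
                          ["firm A", "firm B"])
```

Since only the group term varies, any score gap is attributable to the group term itself, which is the premise of the synthetic-dataset step.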
Large Language Models (LLMs) are increasingly used in finance for tasks like market analysis, customer support, sentiment analysis, and automated reporting. However, LLMs often inherit and perpetuate biases from their training data, raising concerns about fairness and accuracy in high-stakes financial applications. While other domains such as medicine, law, and education have advanced in identifying, measuring, and reducing bias, finance lacks domain-specific datasets and robust fairness metrics. To address this, we introduce the FinBias dataset which includes bias-eliciting prompts related to the finance domain, and a comprehensive evaluation framework for publicly available LLMs, including robustness tests against jailbreaking. We also propose a new metric, SAFE (Safety-Adjusted Fairness Evaluation), which penalizes stereotypical and refusal responses while rewarding debiased outputs. Additionally, we present a prompt engineering-based mitigation strategy that effectively reduces bias. Experiments conducted on three publicly available LLMs - Mixtral, Gemma, and LLaMA demonstrate that these models exhibit significant bias, but the proposed prompt engineering-based mitigation strategy effectively reduces this bias. This research provides a practical foundation for the detection, evaluation and mitigation of bias in financial LLM applications.
We introduce M2VN: Multi-Modal Volatility Network, a novel deep learning-based framework for financial volatility forecasting that unifies time series features with unstructured news data. M2VN leverages the representational power of deep neural networks to address two key challenges in this domain: (i) aligning and fusing heterogeneous data modalities, numerical financial data and textual information, and (ii) mitigating look-ahead bias that can undermine the validity of financial models. To achieve this, M2VN combines open-source market features with news embeddings generated by Time Machine GPT, a recently introduced point-in-time LLM, ensuring temporal integrity. An auxiliary alignment loss is introduced to enhance the integration of structured and unstructured data within the deep learning architecture. Extensive experiments demonstrate that M2VN consistently outperforms existing baselines, underscoring its practical value for risk management and financial decision-making in dynamic markets.
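One plausible form of the auxiliary alignment loss — the abstract does not state the exact formula, so this is an assumption — is a forecast error plus a cosine-alignment penalty between the time-series and news embeddings:

```python
import math

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

def m2vn_style_loss(pred, target, ts_emb, news_emb, lam=0.1):
    """Forecast error plus an auxiliary term pulling the time-series and
    news embeddings toward the same direction (hypothetical formulation)."""
    mse = (pred - target) ** 2
    align = 1.0 - cosine(ts_emb, news_emb)
    return mse + lam * align

loss = m2vn_style_loss(2.0, 1.5, [1.0, 0.0], [0.6, 0.8])
```

The weight `lam` trades forecasting accuracy against cross-modal agreement; perfectly aligned embeddings contribute zero auxiliary loss.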
Financial report generation tasks range from macro- to micro-economics analysis, also requiring extensive data analysis. Existing LLM models are usually fine-tuned on simple QA tasks and cannot comprehensively analyze real financial scenarios. Given the complexity, financial companies often distribute tasks among departments. Inspired by this, we propose FinTeam, a financial multi-agent collaborative system, with a workflow with four LLM agents: document analyzer, analyst, accountant, and consultant. We train these agents with specific financial expertise using constructed datasets. We evaluate FinTeam on comprehensive financial tasks constructed from real online investment forums, including macroeconomic, industry, and company analysis. The human evaluation shows that by combining agents, the financial reports generate from FinTeam achieved a 62.00% acceptance rate, outperforming baseline models like GPT-4o and Xuanyuan. Additionally, FinTeam's agents demonstrate a 7.43% average improvement on FinCUGE and a 2.06% accuracy boost on FinEval. Project is available at https://github.com/FudanDISC/DISC-FinLLM/.
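The four-agent workflow can be sketched as a chain over a shared fact dictionary. This sketch is not FinTeam's implementation: the agents here are plain functions standing in for LLM calls, the accountant is reordered before the analyst so the margin exists before it is judged, and every name and number is illustrative.

```python
def document_analyzer(report):
    # Hypothetical stand-in for an LLM extraction call
    return {"revenue": report["revenue"], "cost": report["cost"]}

def accountant(facts):
    facts["margin"] = (facts["revenue"] - facts["cost"]) / facts["revenue"]
    return facts

def analyst(facts):
    facts["view"] = "positive" if facts["margin"] > 0.2 else "cautious"
    return facts

def consultant(facts):
    return f"Margin {facts['margin']:.0%}; outlook {facts['view']}."

def run_pipeline(report):
    """Chain the agents; each stage enriches the shared facts."""
    facts = document_analyzer(report)
    for agent in (accountant, analyst, consultant):
        facts = agent(facts)
    return facts

report_text = run_pipeline({"revenue": 100.0, "cost": 70.0})
```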
Despite Spanish's pivotal role in the global finance industry, a pronounced gap exists in Spanish financial natural language processing (NLP) and application studies compared to English, especially in the era of large language models (LLMs). To bridge this gap, we unveil Toisón de Oro, the first bilingual framework that establishes instruction datasets, finetuned LLMs, and evaluation benchmark for financial LLMs in Spanish joint with English. We construct a rigorously curated bilingual instruction dataset including over 144K Spanish and English samples from 15 datasets covering 7 tasks. Harnessing this, we introduce FinMA-ES, an LLM designed for bilingual financial applications. We evaluate our model and existing LLMs using FLARE-ES, the first comprehensive bilingual evaluation benchmark with 21 datasets covering 9 tasks. The FLARE-ES benchmark results reveal a significant multilingual performance gap and bias in existing LLMs. FinMA-ES models surpass SOTA LLMs such as GPT-4 in Spanish financial tasks, due to strategic instruction tuning and leveraging data from diverse linguistic resources, highlighting the positive impact of cross-linguistic transfer. All our datasets, models, and benchmarks have been released.
Do large language models (LLMs) generate unbiased financial advice across investor and fund manager demographics? We develop a two-sided audit framework to evaluate demographic bias in LLM-generated investment advice and apply it to multiple large language models, with GPT-4 Turbo as the primary baseline. On the investor side, fund selections are similar across demographic groups and rely on financial criteria, but recommended investment amounts vary when investor names signal race or gender, despite identical age and income. On the fund manager side, capital allocations favor non-Black and male managers: racial disparities persist even under explicit disclosure, while gender-related differences are more pronounced under name-based cues. Bias patterns are qualitatively similar across models, with differences in magnitude between implicit and explicit demographic signaling. These results suggest that, even when LLMs incorporate core financial reasoning, demographic signals can affect allocation decisions, with effects that tend to be stronger under implicit signaling, potentially replicating existing market inequalities and raising concerns about impartiality in financial advising. The proposed audit framework provides a generalizable approach for identifying and evaluating demographic bias in AI-driven financial advisory systems.
This study investigates how audit organizations leverage generative artificial intelligence technologies to enhance auditing capabilities through organizational adaptation mechanisms, examining the role of dynamic capabilities in facilitating successful AI adoption and performance improvements. A quantitative cross-sectional survey collected data from 312 audit professionals across diverse organizational contexts. Structural equation modeling examined relationships between dynamic capabilities, generative AI adoption, organizational adaptation mechanisms, and auditing performance with comprehensive measurement validation. Dynamic capabilities significantly influence generative AI adoption (β = 0.453, p < 0.001), which drives organizational adaptation mechanisms (β = 0.312, p < 0.001) that enhance auditing performance (β = 0.378, p < 0.001). Organizational adaptation mechanisms mediate 41.4% of the capability-performance relationship. The model explains 28.3% variance in AI adoption, 35.7% in adaptation mechanisms, and 31.2% in auditing performance. Audit organizations should prioritize developing sensing, seizing, and reconfiguring capabilities before AI investments, requiring comprehensive change management addressing structural, processual, and cultural dimensions simultaneously. AI-driven competitive advantages emerge through organizational transformation processes, with dynamic capabilities as antecedents and adaptation mechanisms as mediating processes.
As global financial markets continue to evolve and change, financial risk monitoring and early warning have become increasingly important. However, the complexity and diversity of financial markets have led to the emergence of multidimensional and multimodal data. Traditional risk monitoring methods face difficulties in handling such diverse data and adapting to the monitoring and early warning needs of emerging risk types. To address these issues, this article proposes a financial risk intelligent monitoring and early warning model that integrates deep learning to better cope with uncertainty and risk in the financial market. Firstly, the authors introduce an LSTM model in the initial approach, trained on historical financial market data, to capture long-term dependencies and trends in the data, enabling effective monitoring of financial risk. They also optimize the model architecture to improve its performance and prediction accuracy. Secondly, the authors further introduce a transformer model with self-attention mechanism to better handle sequential data.
The study aims to demonstrate the extent to which the challenges of artificial intelligence technologies affect the quality of internal audit and their impact on the performance of financial institutions. With the growth of AI applications in financial operations, institutions have begun to rely increasingly on automation, intelligent data analysis, and machine learning in their daily tasks. While these technologies are expected to enhance efficiency and speed, they pose major challenges for internal audit units, including understanding complex systems, difficulty tracking algorithms, and the possibility of technical biases or inaccurate automated decisions. The problem of the study is to identify the challenges posed by artificial intelligence technologies to the quality of internal audit, which may directly or indirectly affect the effectiveness and sustainability of the performance of financial institutions, and to assess the readiness of internal audit units to deal with the smart systems adopted within these institutions. The study sample comprised the Iraqi Trade Bank and Al-Nahrain Islamic Bank. The research was based on a basic hypothesis: that the challenges of artificial intelligence affect the quality of internal audit, which in turn is reflected in the performance of financial institutions.
The research reached a number of conclusions, the most important of which is that artificial intelligence technologies contribute to improving the quality and efficiency of internal audit by adapting to smart systems and understanding complex systems, which is reflected positively in the performance of financial institutions. The research recommended training auditors in the use of artificial intelligence tools, encouraging institutions to replace manual systems with computerized systems to improve oversight performance, and enhancing cooperation between information technology and internal audit teams to ensure effective integration that safeguards the quality of internal audit and, in turn, improves performance.
This study investigates the influence of drone technology on the quality of Saudi financial reports through the integration of Artificial Intelligence (AI) and big data. The study’s mixed-method approach is based on a bibliometric analysis of previous studies, along with documentary and content analysis. The results show that external auditors benefit from using drones when inspections are integrated with AI and big data technology. Moreover, this integration can reduce costs for audit firms and shorten the duration of audit engagements, resulting in more efficient and effective auditing. Seven clusters were identified, with ‘big data’ being the highest-frequency term. This study does not consider potential cybersecurity threats that could impact data integrity and decrease financial transparency. Furthermore, environmental issues in Saudi Arabia, such as sandstorms, could compromise the effectiveness of drone-based auditing. However, this study contributes to the ESG literature by demonstrating how integrated audit technology transforms traditional sustainability reporting into continuous, AI-enhanced verification processes. These processes improve financial report quality while supporting Saudi Arabia’s Green Initiative and its goal of achieving net-zero carbon emissions by 2060. The adoption of AI and big data technologies in auditing represents a shift toward more automated and intelligent audit practices. These changes provide practical insights for government authorities, such as the Saudi Capital Market Authority (CMA), and may result in higher-quality financial reports and increased investor confidence.
Anomaly detection is a fundamental requirement in financial auditing: its detection results can be used to correct defects and predict risks for the audited enterprise. However, as auditing data grow very large, anomaly detection error probabilities and material misstatement risk increase significantly. It is therefore essential to develop an intelligent anomaly detection technology to address these problems. This paper develops a new intelligent anomaly detection method that combines the advantages of bidirectional long short-term memory (BiLSTM), an improved backpropagation neural network (IBPNN), and an attention mechanism, giving it strong abilities in nonlinear prediction, long-time-series feature extraction, and attention to important information. Furthermore, we present a correlation analysis algorithm to process the many types of large-scale financial auditing data, which effectively removes irrelevant information and discovers correlation relationships in the data before the BiLSTM-Attention-IBPNN method runs on it. The experimental results show that the proposed method outperforms state-of-the-art methods in anomaly detection and significantly improves anomaly detection quality and efficiency for financial auditing.
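The correlation-analysis pre-filtering step — dropping features with little relationship to the target before the neural model runs — can be sketched with a simple Pearson filter; the feature names, data, and threshold below are illustrative, not the paper's algorithm.

```python
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den if den else 0.0

def drop_irrelevant(features, target, min_abs_corr=0.3):
    """Keep only feature columns meaningfully correlated with the target."""
    return {name: col for name, col in features.items()
            if abs(pearson(col, target)) >= min_abs_corr}

features = {
    "receivables": [1, 2, 3, 4, 5],  # tracks the target closely
    "noise":       [2, 5, 1, 4, 3],  # essentially unrelated
}
target = [2, 4, 6, 8, 10]
kept = drop_irrelevant(features, target)
```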
This study proposes a hybrid machine learning framework that integrates structured financial indicators and unstructured textual strategy disclosures to improve firm-level management performance prediction. Using corporate business reports from South Korean listed firms, strategic text was extracted and categorized under the Balanced Scorecard (BSC) framework into financial, customer, internal process, and learning and growth dimensions. Various machine learning and deep learning models—including k-nearest neighbors (KNNs), support vector machine (SVM), light gradient boosting machine (LightGBM), convolutional neural network (CNN), long short-term memory (LSTM), autoencoder, and transformer—were evaluated, with results showing that the inclusion of strategic textual data significantly enhanced prediction accuracy, precision, recall, area under the curve (AUC), and F1-score. Among individual models, the transformer architecture demonstrated superior performance in extracting context-rich semantic features. A soft-voting ensemble model combining autoencoder, LSTM, and transformer achieved the best overall performance, leading in accuracy and AUC, while the best single deep learning model (transformer) obtained a marginally higher F1 score, confirming the value of hybrid learning. Furthermore, analysis revealed that customer-oriented strategy disclosures were the most predictive among BSC dimensions. These findings highlight the value of integrating financial and narrative data using advanced NLP and artificial intelligence (AI) techniques to develop interpretable and robust corporate performance forecasting models. In addition, we operationalize information security narratives using a reproducible cybersecurity lexicon and derive security disclosure intensity and weight share features that are jointly evaluated with BSC-based strategic vectors.
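The soft-voting scheme behind the best-performing ensemble can be sketched in a few lines; the class labels and probabilities below are hypothetical, not results from the study.

```python
def soft_vote(prob_lists):
    """Average class-probability vectors from several models and take the
    argmax of the averaged distribution (soft voting)."""
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    avg = [sum(p[c] for p in prob_lists) / n_models for c in range(n_classes)]
    return avg.index(max(avg)), avg

# Hypothetical (underperform, outperform) probabilities from an
# autoencoder, an LSTM, and a transformer for one firm.
label, avg = soft_vote([[0.40, 0.60], [0.55, 0.45], [0.20, 0.80]])
# label → 1 (outperform), even though the LSTM alone disagrees
```

Soft voting keeps each model's confidence, so a very confident transformer can outvote a weakly confident dissenter, which hard (majority) voting would not capture.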
This article addresses challenges in enterprise financial management, including difficulties in processing multi-source data, limited adaptability to dynamic environments, and a lack of systematic integration in the decision-making process. To tackle these issues, a new intelligent optimization framework, named genetic algorithm-fuzzy logic-Transformer (GA-FL-Transformer), is proposed. First, the framework employs the Transformer architecture to achieve unified encoding and feature fusion across multiple sources of financial data, extracting high-dimensional features with strong discriminative power. Subsequently, an attention-weight-guided co-evolutionary mechanism integrating a genetic algorithm (GA) and fuzzy logic (FL) is designed. This mechanism incorporates the features and attention weights into chromosome encoding, fitness function formulation, and genetic operations, thereby enabling dynamic optimization of fuzzy rules and membership functions. Finally, an intelligent optimization framework that integrates perception, optimization, and decision-making is constructed, achieving closed-loop optimization from data to decision via a bidirectional flow mechanism and supporting continuous learning and system-wide self-adjustment. Results on financial datasets from Compustat and CRSP show that the proposed method outperforms competing models in financial optimization. Ablation experiments further validate the contributions of the Transformer-based feature extraction, genetic algorithm optimization, and fuzzy reasoning mechanism to the system's performance. This study provides a crucial theoretical foundation for enterprises constructing intelligent financial decision-making systems.
Extracting tables and key-value pairs from financial documents is essential for business workflows such as auditing, data analytics, and automated invoice processing. In this work, we introduce Spatial ModernBERT, a transformer-based model augmented with spatial embeddings, to accurately detect and extract tabular data and key-value fields from complex financial documents. We cast the extraction task as token classification across three heads: (1) a Label Head, classifying each token as a label (e.g., PO Number, PO Date, Item Description, Quantity, Base Cost, MRP); (2) a Column Head, predicting column indices; and (3) a Row Head, distinguishing the start of item rows and header rows. The model is pretrained on the PubTables-1M dataset, then fine-tuned on a financial document dataset, achieving robust performance through cross-entropy loss on each classification head. We propose a post-processing method to merge tokens using B-I-IB tagging, reconstruct the tabular layout, and extract key-value pairs. Empirical evaluation shows that Spatial ModernBERT effectively leverages both textual and spatial cues, facilitating highly accurate table and key-value extraction in real-world financial documents.
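The token-merging step can be illustrated with a standard BIO-style merge; the paper's B-I-IB scheme adds a tag variant whose exact semantics are defined there, so this sketch covers only the common B/I/O case, with hypothetical field labels.

```python
def merge_bio(tokens, tags):
    """Merge tokens into labeled spans under BIO-style tags:
    'B-X' begins a span of label X, 'I-X' continues it, 'O' is outside."""
    spans, current, label = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                spans.append((label, " ".join(current)))
            current, label = [tok], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == label:
            current.append(tok)
        else:  # 'O' or an inconsistent I- tag closes any open span
            if current:
                spans.append((label, " ".join(current)))
            current, label = [], None
    if current:
        spans.append((label, " ".join(current)))
    return spans

tokens = ["PO", "4711", "Steel", "Bolts", "M8", "12"]
tags = ["B-PONumber", "I-PONumber", "B-ItemDesc",
        "I-ItemDesc", "I-ItemDesc", "B-Quantity"]
spans = merge_bio(tokens, tags)
# spans → [('PONumber', 'PO 4711'), ('ItemDesc', 'Steel Bolts M8'), ('Quantity', '12')]
```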
With the rapid advancement of big data and artificial intelligence technologies, financial accounting is increasingly evolving toward greater intelligence and automation. However, the limited capacity of traditional methods to process complex data makes it challenging to address the rapid changes in enterprise dynamics within big data environments. To enhance forecasting accuracy and efficiency, this study proposes an intelligent financial accounting prediction model based on the Swin Transformer and a dual-layer routing attention mechanism. Experimental results demonstrate that the proposed model achieves substantial improvements in prediction performance compared with traditional Autoregressive Integrated Moving Average (ARIMA) and Long Short-Term Memory (LSTM) models. On the A-share dataset, the model attained a mean absolute error (MAE) of 1.47, representing improvements of 37.7% and 22.2% over ARIMA (2.36) and LSTM (1.89), respectively. Similarly, on the European and American enterprise dataset, the model achieved a root mean square error (RMSE) of 2.15, corresponding to reductions of 28.4% and 16.5% compared to ARIMA and LSTM. Furthermore, the model improved forecasting efficiency by reducing quarterly data processing time by approximately 15%. These findings highlight the potential of combining Transformer-based architectures with attention mechanisms for intelligent financial forecasting and underscore the broad application prospects of big data analytics in the financial domain.
This study proposes a financial text analysis method based on a one-dimensional convolutional neural network (1D-CNN), aiming to solve the problems of low efficiency and insufficient accuracy of traditional financial text processing methods in key information extraction and risk classification tasks. By constructing a convolutional network architecture tailored to the characteristics of financial text, the model can efficiently capture local semantic features in the text and perform deep feature extraction. In the experiment, this study selected the 10-K financial report in the SEC Edgar database as the dataset and verified the superiority of the 1D-CNN model through comparative experiments with traditional machine learning models and other deep learning models. The experimental results show that the model has achieved the best performance in terms of extraction rate, coverage rate, and redundancy rate, and also shows high accuracy and robustness in risk classification tasks. In addition, by testing the performance of the model under different noise levels, this study further analyzes the stability and limitations of the 1D-CNN model in the face of data perturbations. The results show that although the performance of the model is reduced in a noisy environment, the overall anti-interference ability is strong, which is suitable for financial text analysis in actual complex scenarios. This study provides an effective technical solution for intelligent financial text processing. It not only theoretically verifies the feasibility of 1D-CNN in financial text analysis but also provides an important reference for building a smarter and more efficient financial management system in the future.
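The core operation of the 1D-CNN described above is a sliding dot product over the embedded token sequence. A minimal sketch (valid-mode cross-correlation plus ReLU, with toy numbers in place of learned kernel weights):

```python
def conv1d(signal, kernel):
    """Valid-mode 1-D cross-correlation, the basic operation of a 1D-CNN
    layer: slide the kernel along the sequence and take dot products."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def relu(xs):
    return [max(0.0, x) for x in xs]

# An edge-detector kernel fires where adjacent feature values jump,
# mimicking how a 1D-CNN picks up local semantic patterns in text features.
feature_map = relu(conv1d([0.0, 0.0, 1.0, 1.0, 0.0], [-1.0, 1.0]))
# feature_map → [0.0, 1.0, 0.0, 0.0]
```

A trained model stacks many such kernels over word-embedding channels and pools the resulting feature maps before classification.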
This research presents a high-performance AI framework for intelligent internal audit automation, integrating deep learning, semantic policy embeddings, and reinforcement-driven risk optimization. Leveraging fine-tuned transformer models and structured enterprise datasets, the system was evaluated across diverse audit domains including financial transactions, procurement logs, HR records, and vendor management data. The model consistently achieved detection accuracies exceeding 98.53%, with F1-scores reaching 0.94 and compliance alignment scores up to 0.94. Compared to traditional rule-based methods, the enhanced framework reduced control oversight errors by over 25% and significantly improved interpretability through SHAP-based explanations and anomaly heatmaps. The integration of contextual text embeddings, numerical audit features, and dynamic control evaluation enabled transaction-level compliance analysis and proactive risk detection. Unlike resource-intensive neural pipelines, the model maintained sub-2 second training cycles, ensuring deployment feasibility within ERP-integrated enterprise systems. The architecture supports cross-platform generalization and is extensible to various operational domains without requiring structural reengineering.
Artificial intelligence technology has brought new opportunities for financial fraud detection, but traditional methods face two major challenges: data imbalance and reliance on a single feature modality. This study therefore proposes an intelligent detection framework that integrates an improved synthetic minority over-sampling technique (SMOTE) with a Transformer. The sample distribution is optimized through dynamic weight adjustment, and multi-modal feature fusion is realized via a multi-head attention mechanism to improve detection performance. Experimental results show that the multi-modal fusion method achieves an F1 score of 0.92 and an area under the curve of 0.85, 8%-20% higher than single-modal methods. Dynamic weighting raises the recall of minority samples to 0.85 and the accuracy on boundary samples to 0.79, while reducing model variance to 0.028. With Transformer features selected at 150 dimensions, the F1 score reaches 0.92 and the expert score is 4.8. The research provides a solution with both theoretical innovation and engineering practicability for intelligent monitoring of financial risks, promoting the transformation of financial supervision from "post-inspection" to "pre-warning".
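The over-sampling side can be sketched as SMOTE-style interpolation. Note the hedge: real SMOTE interpolates toward one of the k nearest neighbors, whereas this minimal version (with made-up 2-d fraud features) interpolates toward a random other minority sample.

```python
import random

def smote_samples(minority, n_new, seed=0):
    """SMOTE-style oversampling sketch: synthesize minority-class points by
    interpolating between one sample and a randomly chosen other minority
    sample (real SMOTE restricts the partner to the k nearest neighbors)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a = rng.choice(minority)
        b = rng.choice([m for m in minority if m is not a])
        lam = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([ai + lam * (bi - ai) for ai, bi in zip(a, b)])
    return synthetic

# Toy 2-d feature vectors for the rare fraud class.
fraud = [[0.9, 0.1], [0.8, 0.3], [0.95, 0.2]]
new_points = smote_samples(fraud, 4)
```

Each synthetic point lies on a segment between two real fraud samples, so the oversampled class stays inside the region the minority data already occupy.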
Financial fraud detection is a critical application area within the broader domains of cybersecurity and intelligent financial analytics. With the growing volume and complexity of digital transactions, the traditional rule-based and shallow learning models often fall short in detecting sophisticated fraud patterns. This study addresses the challenge of accurately identifying fraudulent financial activities, especially in highly imbalanced datasets where fraud instances are rare and often masked by legitimate behavior. The existing models also lack interpretability, limiting their utility in regulated financial environments. Experiments were conducted on three benchmark datasets: IEEE-CIS Fraud Detection, European Credit Card Transactions, and PaySim Mobile Money Simulation, each representing diverse transaction behaviors and data distributions. The proposed methodology integrates a transformer-based encoder, multi-teacher knowledge distillation, and a symbolic belief–desire–intention (BDI) reasoning layer to combine deep feature extraction with interpretable decision making. The novelty of this work lies in the incorporation of cognitive symbolic reasoning into a high-performance learning architecture for fraud detection. The performance was assessed using key metrics, including the F1-score, AUC, precision, recall, inference time, and model size. Results show that the proposed transformer–BDI model outperformed traditional and state-of-the-art baselines across all datasets, achieving improved fraud detection accuracy and interpretability while remaining computationally efficient for real-time deployment.
Auditing financial documents is a very tedious and time-consuming process. As of today, it can already be simplified by employing AI-based solutions to recommend relevant text passages from a report for each legal requirement of rigorous accounting standards. However, these methods need to be fine-tuned regularly, and they require abundant annotated data, which is often lacking in industrial environments. Hence, we present ZeroShotALI, a novel recommender system that leverages a state-of-the-art large language model (LLM) in conjunction with a domain-specifically optimized transformer-based text-matching solution. We find that a two-step approach of first retrieving a number of best matching document sections per legal requirement with a custom BERT-based model and second filtering these selections using an LLM yields significant performance improvements over existing approaches.
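The first, retrieval step of such a two-step pipeline can be sketched with cosine similarity over section embeddings; the 3-d vectors and section names here are toy stand-ins for the custom BERT encoder's output, and the second LLM-filtering step is omitted.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_k(query_vec, section_vecs, k=2):
    """Step 1 of the two-step approach: rank report sections by embedding
    similarity to a legal requirement and keep the top-k candidates;
    step 2 would pass each candidate to an LLM for a final yes/no filter."""
    ranked = sorted(section_vecs.items(),
                    key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [name for name, _ in ranked[:k]]

sections = {"s1": [1.0, 0.0, 0.0], "s2": [0.9, 0.1, 0.0], "s3": [0.0, 1.0, 0.0]}
candidates = top_k([1.0, 0.05, 0.0], sections, k=2)
# candidates → ['s1', 's2']
```

Retrieving a small candidate set first keeps the expensive LLM call off the vast majority of sections, which is what makes the two-step design efficient.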
The adoption of artificial intelligence (AI) technologies is significantly improving monitoring functions while also transforming audit functions by providing increased precision, audit scalability, and real-time monitoring capabilities. In this paper, we propose an AI-based audit methodology that incorporates data gathering, data cleansing, machine learning application, and anomaly detection to streamline error-prone audit processes and increase audit accuracy. A multi-stage model was built and tested in five industry sectors, demonstrating better anomaly detection and audit efficiency in every sector tested. Measured with a newly designed Audit Enhancement Index (AEI), the AI technologies proved substantially more efficient than traditional auditing methods in data-rich industries. A detailed workflow diagram and summary chart are provided to visually demonstrate the system's advantages over traditional methods. AI can redefine auditing from a post hoc examination of information to an ongoing, intelligent reevaluation of real-time data streams. This research has the potential to significantly advance automation in auditing and to reposition auditors as strategists empowered by instantaneous AI data analysis.
Financial fraud is a serious challenge in a rapidly evolving digital economy that places increasing demands on detection systems, yet traditional methods are often limited to the dimensional information of the corporations themselves and are insufficient to deal with the complexity and dynamics of modern financial fraud. This study introduces a novel intelligent financial fraud detection support system that leverages a three-level relationship penetration (3-LRP) method to decode complex fraudulent networks and enhance prediction accuracy. It integrates fuzzy rough density-based feature selection (FRDFS), which optimizes feature screening in noisy financial environments and significantly improves the system's reliability and performance, with fuzzy deterministic soft voting (FDSV), which combines transformer-based deep tabular networks with conventional machine learning classifiers. An empirical analysis using a real financial dataset from Chinese small and medium-sized enterprises (SMEs) demonstrates the effectiveness of the proposed method. This research enriches the financial fraud detection literature and provides practical insights for risk management professionals, introducing a comprehensive framework for early warning and proactive risk management in digital finance.
Fraud detection plays a crucial role in the financial industry, preventing significant financial losses. Traditional rule-based systems and manual audits often struggle with the evolving nature of fraud schemes and the vast volume of transactions. Recent advances in machine learning, particularly graph neural networks (GNNs), have shown promise in addressing these challenges. However, GNNs still face limitations in learning intricate patterns, effectively utilizing edge attributes, and maintaining efficiency on large financial graphs. To address these limitations, we introduce FraudGT, a simple, effective, and efficient graph transformer (GT) model specifically designed for fraud detection in financial transaction graphs. FraudGT leverages edge-based message passing gates and an edge attribute-based attention bias to enhance its ability to discern important transactional features and differentiate between normal and fraudulent transactions. Our model achieves state-of-the-art performance in detecting fraudulent activities while demonstrating high throughput and significantly lower latency compared to existing methods. We validate the effectiveness of FraudGT through extensive experiments on multiple large-scale synthetic financial datasets. FraudGT consistently outperforms other models, achieving 7.8–17.8% higher F1 scores, while delivering an average of 2.4× greater throughput and reduced latency. Our code and datasets are available at https://github.com/junhongmit/FraudGT.
Global indirect tax compliance for large-scale digital commerce platforms has become a complex, high-stakes systems problem due to jurisdictional fragmentation, frequently changing regulations, and rapidly expanding heterogeneous product catalogs. Rule-based tax engines, although auditable and deterministic, fail to scale in this setting: their authoring processes are fragile, maintenance is expensive, and their semantic knowledge of product data is limited. This article provides a detailed design of an intelligent tax automation system built on machine learning-based item-to-tax prediction services, supported by confidence-aware orchestration, human-in-the-loop protection, and explainability features appropriate to regulated financial settings. The framework uses transformer-based language models, trained on large-scale multilingual commerce data, to predict tax classifications directly from item titles, descriptions, and structured taxonomy cues. Instead of relying on fixed mappings, the system learns semantic associations between product representations and jurisdiction-specific tax treatments, allowing it to correctly process long-tail, ambiguous, and newly added items. Predictions carry calibrated confidence scores indicating whether a transaction can be safely automated, sent to policy validation, or escalated for expert scrutiny. This selective automation model balances operational efficiency against regulatory risk, compliance integrity, and scale. The architecture is deployed as a controlled machine learning system combining continuous monitoring, auditability, and feedback-driven retraining pipelines.
Experience from large-scale deployments shows that such systems can substantially reduce the effort required for manual rule formulation and scrutiny, increase the accuracy of classification into thousands of categories, and deliver a quantifiable financial effect, without compromising transparency to auditors and other regulatory stakeholders. The article positions intelligent, ML-driven tax automation as a feasible and responsible alternative to legacy rule-based systems in global compliance.
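The confidence-aware orchestration described above can be sketched as simple threshold routing; the thresholds, tier names, and tax labels below are illustrative, not the article's.

```python
def route(prediction, confidence, auto_threshold=0.95, review_threshold=0.70):
    """Confidence-aware orchestration sketch: auto-apply high-confidence tax
    classifications, send mid-confidence ones to policy validation, and
    escalate low-confidence ones to a human expert."""
    if confidence >= auto_threshold:
        return ("automate", prediction)
    if confidence >= review_threshold:
        return ("policy_validation", prediction)
    return ("expert_review", prediction)

tier, pred = route("standard_rate", 0.98)
# tier → 'automate'
tier, pred = route("exempt", 0.40)
# tier → 'expert_review'
```

In practice the confidences must be calibrated (e.g. via temperature scaling or isotonic regression) before thresholds like these carry any regulatory meaning, which is why the article stresses calibration alongside routing.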
We investigate whether large language models (LLMs) can successfully perform financial statement analysis in a way similar to a professional human analyst. We provide standardized and anonymous financial statements to GPT-4 and instruct the model to analyze them to determine the direction of firms' future earnings. Even without narrative or industry-specific information, the LLM outperforms financial analysts in its ability to predict earnings changes directionally. The LLM exhibits a relative advantage over human analysts in situations when the analysts tend to struggle. Furthermore, we find that the prediction accuracy of the LLM is on par with a narrowly trained state-of-the-art ML model. LLM prediction does not stem from its training memory. Instead, we find that the LLM generates useful narrative insights about a company's future performance. Lastly, our trading strategies based on GPT's predictions yield a higher Sharpe ratio and alphas than strategies based on other models. Our results suggest that LLMs may take a central role in analysis and decision-making.
With the widespread adoption of Internet-based AI technologies, addressing financial fraud has become increasingly critical, particularly within the realm of machine learning. In this case, deep learning and natural language processing (NLP) techniques offer powerful means of detecting fraudulent activity by analyzing financial documents, thereby enhancing both the efficiency and precision of such assessments and supporting financial security. In this study, we introduce deep representation learning-based approaches relying mainly on large language models (LLMs) for identifying fraud in financial statements by examining temporal changes in the Management Discussion and Analysis (MD&A) sections of corporate disclosures. Departing from conventional techniques that rely only on word frequency analysis, we propose DeepFraud, which combines time-evolving financial LLM embeddings of paragraphs, such as FinBERT, FinLlama, and FinGPT embeddings, and uses long short-term memory (LSTM) to predict frauds via historical textual embeddings. In addition to LLM embeddings, we also integrate (1) time-evolving frequencies of words relevant to fraud detection, such as those expressing sentiment or uncertainty, and (2) time-evolving financial ratios. Trajectories of paragraph-level embeddings, frequencies, and ratios are used to construct a fraud detection model, which we evaluate against machine learning methods and deep time-series models. Using 30 years of financial report data (from 1995 to 2024), our experiments demonstrate that DeepFraud on average enhances fraud detection performance across a number of scenarios and outperforms the competing approaches as well as conventional word frequency approaches. Our framework introduces a novel direction for deep feature engineering in the field of financial statement fraud detection.
Driven by the internet wave, the auditing industry faces unprecedented challenges and opportunities. Traditional audit methods increasingly reveal weaknesses in processing vast amounts of data, requiring new technologies to maximize audit efficiency and accuracy. As an epoch-making innovation in natural language processing, Large Language Models (LLMs) demonstrate unequalled performance in text parsing, semantic detection, and text generation, opening up approaches for the forward-looking intelligent reform of auditing. This paper investigates how LLMs can be applied to the intelligent detection of checking relationships in financial statements to enhance the efficiency and accuracy of auditing. Combining an audit knowledge base with Retrieval-Augmented Generation (RAG) technologies, we introduce a multi-agent system for intelligent checking-relationship detection. The LLMs automatically identify and verify checking relationships between financial statements, dramatically improving audit efficiency and quality; the audit knowledge base provides environmental context and data support, while RAG enhances the power of analysis. This paper demonstrates, through experiments, that these technologies can support intelligent checking-relationship detection, ushering in an era of intelligent auditing.
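The checking relationships themselves are deterministic accounting identities, which is what makes LLM-based verification testable. A minimal rule-based sketch of what is being checked (the two rules and the field names are illustrative; the paper's system derives such relationships via its knowledge base and LLM agents):

```python
def check_relationships(stmt, tolerance=0.5):
    """Verify canonical checking (cross-footing) relationships between
    financial statement line items, allowing a small rounding tolerance."""
    rules = {
        "balance_sheet_identity":
            abs(stmt["total_assets"]
                - (stmt["total_liabilities"] + stmt["total_equity"])) <= tolerance,
        "gross_profit":
            abs(stmt["gross_profit"]
                - (stmt["revenue"] - stmt["cost_of_sales"])) <= tolerance,
    }
    return {name: ("pass" if ok else "fail") for name, ok in rules.items()}

stmt = {"total_assets": 500.0, "total_liabilities": 300.0, "total_equity": 200.0,
        "revenue": 120.0, "cost_of_sales": 70.0, "gross_profit": 55.0}
results = check_relationships(stmt)
# balance sheet identity holds; gross profit (55 vs 120 - 70 = 50) is flagged
```

The appeal of the LLM approach is locating and mapping these line items across heterogeneous statement layouts, a step this hard-coded sketch assumes away.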
Financial statement auditing is essential for stakeholders to understand a company's financial health, yet current manual processes are inefficient and error-prone. Even with extensive verification procedures, auditors frequently miss errors, leading to inaccurate financial statements that fail to meet stakeholder expectations for transparency and reliability. To this end, we harness large language models (LLMs) to automate financial statement auditing and rigorously assess their capabilities, providing insights on their performance boundaries in the scenario of automated auditing. Our work introduces a comprehensive benchmark using a curated dataset combining real-world financial tables with synthesized transaction data. In the benchmark, we developed a rigorous five-stage evaluation framework to assess LLMs' auditing capabilities. The benchmark also challenges models to map specific financial statement errors to corresponding violations of accounting standards, simulating real-world auditing scenarios through test cases. Our testing reveals that current state-of-the-art LLMs successfully identify financial statement errors when given historical transaction data. However, these models demonstrate significant limitations in explaining detected errors and citing relevant accounting standards. Furthermore, LLMs struggle to execute complete audits and make necessary financial statement revisions. These findings highlight a critical gap in LLMs' domain-specific accounting knowledge. Future research must focus on enhancing LLMs' understanding of auditing principles and procedures. Our benchmark and evaluation framework establish a foundation for developing more effective automated auditing tools that will substantially improve the accuracy and efficiency of real-world financial statement auditing.
Financial statement fraud, as a critical risk factor threatening the healthy development of capital markets, has long been a focal point of both academic research and practical concern. In recent years, Large Language Models (LLMs), with their advanced capabilities in text comprehension and logical reasoning, have opened new avenues for financial report analysis. This paper focuses on the financial reports of publicly listed companies and proposes a novel fraud detection method that integrates structured operational indicators with semantic information from key financial report sections. Specifically, the approach introduces SAGE prompt templates and a multi-step reasoning mechanism to guide the model in identifying potential fraudulent risks within texts and generating rationales. Furthermore, task prompts incorporating both operational metrics and textual analysis results are designed to enhance the model's fraud detection capability. Experimental results demonstrate that the proposed method outperforms traditional approaches across key evaluation metrics, including Accuracy, Precision, Recall, and F1-score, thereby validating its effectiveness and superiority in financial statement fraud detection. This study not only offers a new technical solution for identifying financial risks but also explores a viable path for applying LLMs in the financial domain.
A Preliminary Fundamental Financial Analysis Framework Using Structured LLM Prompting - A Case Study
Financial analysts routinely calculate standard ratios for company evaluation, often using inconsistent Excel templates that require manual updates and lack interactive visualization capabilities. This paper presents a web-based financial analysis platform that automates preliminary analysis through 15 fundamental financial ratios across five categories (liquidity, solvency, profitability, efficiency, and risk assessment), providing structured inputs to large language models (LLMs) for intelligent insights. Our research demonstrates that LLMs generate significantly more accurate analysis when provided with pre-calculated, contextually rich metrics rather than raw financial statements, achieving 73% higher relevance scores, 81% better risk identification, and 65% more accurate comparative analysis. The platform, built with a modular architecture supporting any state-of-the-art LLM API (GPT-4, Claude, Gemini), processes CSV data to calculate metrics and generate interactive dashboards with AI-powered commentary. We validate the framework through comprehensive comparative analysis of companies with contrasting business models, showing how structured inputs enable nuanced, context-aware insights that adapt to specific financial situations. The tool significantly reduces the time required for analysis while ensuring computational consistency and providing institutional-quality interpretation that would typically require senior analyst expertise.
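The pre-calculation step is straightforward to sketch. Below are a few standard ratios of the kind such a platform might compute before prompting an LLM; the field names and this four-ratio subset are assumptions, not the platform's actual 15-ratio schema.

```python
def basic_ratios(fin):
    """Compute a handful of standard financial ratios from a flat dict of
    statement line items (hypothetical field names)."""
    return {
        "current_ratio": fin["current_assets"] / fin["current_liabilities"],
        "debt_to_equity": fin["total_debt"] / fin["total_equity"],
        "net_margin": fin["net_income"] / fin["revenue"],
        "asset_turnover": fin["revenue"] / fin["total_assets"],
    }

fin = {"current_assets": 400.0, "current_liabilities": 200.0,
       "total_debt": 300.0, "total_equity": 600.0,
       "net_income": 90.0, "revenue": 900.0, "total_assets": 1200.0}
ratios = basic_ratios(fin)
# ratios → {'current_ratio': 2.0, 'debt_to_equity': 0.5,
#           'net_margin': 0.1, 'asset_turnover': 0.75}
```

Serializing a dict like `ratios` into the prompt, instead of raw statements, is exactly the "structured input" the paper credits for the accuracy gains.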
The surge of large language models (LLMs) has revolutionized the extraction and analysis of crucial information from a growing volume of financial statements, announcements, and business news. Recognizing named entities to construct structured data is a significant challenge in analyzing financial documents and a foundational task for intelligent financial analytics. However, how effective these generic LLMs are, and how they perform under various prompts, is not yet well understood. To fill this gap, we present a systematic evaluation of state-of-the-art LLMs and prompting methods on the financial Named Entity Recognition (NER) problem. Our experimental results highlight their strengths and limitations, identify five representative failure types, and provide insights into their potential and challenges for domain-specific tasks.
Financial due diligence requires intensive analysis of vast unstructured documents (e.g., contracts, statements, invoices). However, traditional manual processing is inefficient, costly, and prone to subjectivity, and existing automation solutions primarily focus on single-modal text recognition, lacking the capacity for joint understanding of multimodal features (e.g., layout, seals, table structures) and deep risk reasoning. This study proposes an end-to-end framework based on a Multimodal Large Language Model (MLLM) to bridge this gap. The framework not only performs accurate multimodal information extraction but also integrates domain-specific knowledge (e.g., regulatory clauses) to emulate expert-like reasoning. By constructing a dynamic risk knowledge graph that captures entities and relations across documents, it enables cross-document correlation analysis and anomaly detection. We will validate the framework on curated financial datasets, assessing both its information processing accuracy and risk diagnosis capability. Our contributions are threefold: 1) providing a novel computational linguistics solution that addresses the semantic and pragmatic challenges in financial document understanding; 2) advancing financial AI from perceptual to cognitive intelligence through explainable, knowledge-integrated reasoning; 3) offering a transparent, automated decision-support tool for high-stakes due diligence.
This study examines the information content of textual disclosures in firms’ earnings announcements. Using a large language model (LLM) to capture information in both words and word context, I show that the news in earnings press releases (i) explains three times more variation in short-window stock returns than a host of textual measures based on dictionary and non-LLM machine learning methods; (ii) doubles the R2 of an array of financial statement surprises, modeled with conventional regression or machine learning approaches; and (iii) accounts for a large fraction of immediate price revisions within just five minutes of release. LLM-modeled conference calls further enhance R2 by one fourth compared with press releases and financial surprises. Textual disclosures are more informative when earnings are less persistent and during periods of aggregate uncertainty. Most news arises from text describing numbers, at the beginning of the disclosure, and including novel contents. These findings highlight the role of firms’ textual disclosures in moving stock prices and advance our understanding of how investors utilize corporate disclosures. This paper was accepted by Suraj Srinivasan, accounting.
The financial domain poses unique challenges for knowledge graph (KG) construction at scale due to the complexity and regulatory nature of financial documents. Despite the critical importance of structured financial knowledge, the field lacks large-scale, open-source datasets capturing rich semantic relationships from corporate disclosures. We introduce an open-source, large-scale financial knowledge graph dataset built from the latest annual SEC 10-K filings of all S&P 100 companies, a comprehensive resource designed to catalyze research in financial AI. We propose a robust and generalizable KG construction framework that integrates intelligent document parsing, table-aware chunking, and schema-guided iterative extraction with a reflection-driven feedback loop. Our system incorporates a comprehensive evaluation pipeline, combining rule-based checks, statistical validation, and LLM-as-a-Judge assessments to holistically measure extraction quality. We support three extraction modes (single-pass, multi-pass, and reflection-agent-based), allowing flexible trade-offs between efficiency, accuracy, and reliability based on user requirements. Empirical evaluations demonstrate that the reflection-agent-based mode consistently achieves the best balance, attaining a 64.8% compliance score against all rule-based policies (CheckRules) and outperforming baseline methods (single-pass & multi-pass) across key metrics such as precision, comprehensiveness, and relevance in LLM-guided evaluations. The utility of our KG pipeline is demonstrated through its flexible extraction modes, coupled with a multi-faceted evaluation methodology. By releasing a high-quality, thoroughly evaluated dataset along with a comprehensive KG construction & evaluation framework, we aim to advance transparency, reproducibility, and innovation in financial KG research. The dataset is publicly available at: https://anonymous.4open.science/r/KG-Financial-Datasets-SP-100-529B/README.md
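The reflection-driven feedback loop described above can be sketched in a few lines. The schema, the CheckRules stand-in, and the extractor below are all hypothetical placeholders for the paper's components (a real extractor would be an LLM call):

```python
# Sketch of a reflection-driven extraction loop (all names hypothetical):
# an extractor proposes (subject, relation, object) triples, rule-based
# checks flag violations, and the feedback drives another pass.

SCHEMA_RELATIONS = {"subsidiary_of", "audited_by", "reports_segment"}

def check_rules(triples):
    """Return violations: triples whose relation falls outside the schema."""
    return [t for t in triples if t[1] not in SCHEMA_RELATIONS]

def extract(chunk, feedback=None):
    # Placeholder for an LLM call; a real system would prompt the model
    # with the chunk plus any rule violations from the previous pass.
    triples = [("AcmeCo", "subsidiary_of", "AcmeHoldings"),
               ("AcmeCo", "ceo", "J. Doe")]  # second triple violates schema
    if feedback:  # reflection pass: drop the flagged triples
        triples = [t for t in triples if t not in feedback]
    return triples

def reflect_extract(chunk, max_passes=3):
    feedback = None
    for _ in range(max_passes):
        triples = extract(chunk, feedback)
        feedback = check_rules(triples)
        if not feedback:          # all triples pass the rule checks
            return triples
    return [t for t in triples if t not in feedback]

kg = reflect_extract("Acme Co is a wholly owned subsidiary of Acme Holdings ...")
```

In this toy run the first pass emits an out-of-schema triple, the rule check flags it, and the second pass returns only compliant triples, mirroring the single-pass versus reflection-agent trade-off.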
The stock market is inherently complex, with interdependent relationships among companies, sectors, and financial indicators. Traditional research has largely focused on time-series forecasting and single-company analysis, relying on numerical data for stock price prediction. While such approaches can provide short-term insights, they are limited in capturing relational patterns, competitive dynamics, and explainable investment reasoning. To address these limitations, we propose a knowledge graph schema specifically designed for the stock market, modeling companies, sectors, stock indicators, financial statements, and inter-company relationships. By integrating this schema with large language models (LLMs), our approach enables multi-hop reasoning and relational queries, producing explainable and in-depth answers to complex financial questions. Figure 1 illustrates the system pipeline, detailing the flow from data collection and graph construction to LLM-based query processing and answer generation. We validate the proposed framework through practical case studies on Korean listed companies, demonstrating its capability to extract insights that are difficult or impossible to obtain from traditional database queries alone. The results highlight the potential of combining knowledge graphs with LLMs for advanced investment analysis and decision support.
In the era of artificial intelligence and fintech, improving the efficiency of financial analysis is essential for financial service providers. This article proposes a novel large language model-enhanced text mining workflow that leverages Internet-sourced text information to efficiently analyze supply chain finance business without requiring programming skills. We conduct a case study on the Chinese market for new energy buses—a rapidly growing sector due to government incentives and the push for sustainable urban transportation—using data from bidding websites and financial statements. The experimental results demonstrate that our LLM-enhanced workflow outperforms traditional methods, showcasing increased efficiency and practicality, especially for non-programming employees in supply chain financial services.
No abstract available
ABSTRACT Purpose This study aims to integrate large language models (LLMs) with interpretable machine learning methods to develop a multimodal data-driven framework for predicting corporate financial fraud, addressing the limitations of traditional approaches in long-text semantic parsing, model interpretability, and multisource data fusion, thereby providing regulatory agencies with intelligent auditing tools. Design/methodology/approach Analyzing 5,304 Chinese listed firms’ annual reports (2015-2020) from the CSMAD database, this study leverages the Doubao LLMs to generate chunked summaries and 256-dimensional semantic vectors, developing textual semantic features. It integrates 19 financial indicators, 11 governance metrics, and linguistic characteristics (tone, readability) with fraud prediction models optimized through a group of Gradient Boosted Decision Tree (GBDT) algorithms. SHAP value analysis in the final model reveals the risk transmission mechanism by quantifying the marginal impacts of financial, governance, and textual features on fraud likelihood. Findings The study found that LLMs effectively distill lengthy annual reports into semantic summaries, while GBDT algorithms (AUC > 0.850) outperform the traditional Logistic Regression model in fraud detection. Multimodal fusion improved performance by 7.4%, with financial, governance, and textual features providing complementary signals. SHAP analysis revealed financial distress, governance conflicts, and narrative patterns (e.g., tone anchoring, semantic thresholds) as key fraud indicators, highlighting managerial intent in report language. Research limitations This study identifies three key limitations: 1) lack of interpretability for semantic features, 2) absence of granular fraud-type differentiation, and 3) unexplored comparative validation with other deep learning methods. Future research will address these gaps to enhance fraud detection precision and model transparency. 
Practical implications The developed semantic-enhanced evaluation model provides a quantitative tool for assessing listed companies’ information disclosure quality and enables practical implementation through its derivative real-time monitoring system. This advancement significantly strengthens capital market risk early-warning capabilities, offering actionable insights for securities regulation. Originality/value This study presents three key innovations: 1) a novel “chunking-summarization-embedding” framework for efficient semantic compression of lengthy annual reports (30,000 words); 2) demonstration of LLMs’ superior performance in financial text analysis, outperforming traditional methods by 19.3%; 3) a novel “language-psychology-behavior” triad model for analyzing managerial fraud motives.
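The "chunking-summarization-embedding" idea can be illustrated with a minimal sketch. The summarizer and the hashing embedder below are toy stand-ins (the paper uses the Doubao LLM and 256-dimensional semantic vectors); only the pipeline shape is meant to carry over:

```python
import math

def chunk(text, size=2000):
    # Split a long annual report into fixed-size character chunks.
    return [text[i:i + size] for i in range(0, len(text), size)]

def summarize(chunk_text):
    # Placeholder for an LLM summarization call; here we just keep
    # the first sentence of the chunk as a stand-in.
    return chunk_text.split(".")[0]

def embed(text, dim=256):
    # Toy hashing embedding standing in for a 256-d semantic vector.
    v = [0.0] * dim
    for tok in text.lower().split():
        v[hash(tok) % dim] += 1.0
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def report_vector(report):
    summaries = [summarize(c) for c in chunk(report)]
    vecs = [embed(s) for s in summaries]
    # Mean-pool chunk vectors into one document-level feature vector,
    # ready to be concatenated with financial and governance features.
    return [sum(col) / len(vecs) for col in zip(*vecs)]

vec = report_vector("Revenue grew 12% in 2020. Margins were stable. " * 200)
```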
No abstract available
This paper provides a comparative study of two major large language model (LLM) strategies, instruction-based fine-tuning and retrieval-augmented generation (RAG), for corporate financial question answering. A modular AI pipeline is designed where each financial domain (e.g. liquidity, investment, credit analysis) is treated as an independent QA module that operates on structured company-level data. The dataset includes Turkish public company financial statements from 2008 to 2025. The fine-tuned variant based on LLaMA 3 8B is trained on domain-specific prompts in Turkish using LoRA adapters. The RAG-based variant utilizes a vector search engine to retrieve relevant financial passages. In addition, tabular reasoning is integrated using pandas to provide dynamic, code-based access to structured data and is developed as a more advanced version (ragenh). To evaluate the quality of generated answers, a set of metrics is applied that captures semantic similarity, numerical accuracy, and directional consistency, such as ROUGE-L, BERTScore, and domain-specific number and trend alignment scores. The results obtained from these metrics show that while the fine-tuned models perform well in interpretive and trend-based tasks, ragenh outperforms both baselines in ground-truth and opinion-based reasoning. This work provides a scalable framework for building interpretable financial assistants in under-resourced language environments by combining modular QA design, hybrid architectures, and custom evaluation. The findings contribute to developing robust, context-aware LLM applications for financial decision support.
In this paper, we introduce the Financial-STS task, a financial domain-specific NLP task designed to measure the nuanced semantic similarity between pairs of financial narratives. These narratives originate from the financial statements of the same company but correspond to different periods, such as year-over-year comparisons. Measuring the subtle semantic differences between these paired narratives enables market stakeholders to gauge changes over time in the company's financial and operational situations, which is critical for financial decision-making. We find that existing pretrained embedding models and LLM embeddings fall short in discerning these subtle financial narrative shifts. To address this gap, we propose an LLM-augmented pipeline specifically designed for the Financial-STS task. Evaluation on a human-annotated dataset demonstrates that our proposed method outperforms existing methods trained on classic STS tasks and generic LLM embeddings.
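Why generic embeddings fall short on Financial-STS is easy to demonstrate: two year-over-year narratives with opposite meanings can still score as highly similar under surface-level representations. A toy bag-of-words cosine (standing in for a generic embedding; not the paper's method) makes the point:

```python
import math
from collections import Counter

def embed(text):
    # Stand-in representation (token counts); the paper finds generic
    # embeddings insufficient and augments them with LLM-extracted cues.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Paired narratives from consecutive periods (invented examples).
y2022 = "Gross margin improved on favorable product mix."
y2023 = "Gross margin declined on unfavorable product mix."
sim = cosine(embed(y2022), embed(y2023))
```

The two sentences state opposite margin movements yet score around 0.71, which is exactly the kind of subtle narrative shift the proposed LLM-augmented pipeline is designed to discern.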
Conventional methods of financial document analysis have relied on official numerical indicators, overlooking the vast narrative data in company reports, regulatory submissions, and sector statements. With recent progress in domain-informed large language models (LLMs), machines have become capable of processing and encoding financial documents with better accuracy than at any previous point. This paper presents a multimodal LLM architecture that synthesizes the narrative content of financial reports and couples it with a time-series predictor for downstream tasks that do not otherwise use the narrative, namely forecasting and control. Case studies from the energy and insurance sectors show that LLM-based document processing increases prediction accuracy and improves comprehension, contributing to risk identification, regulatory harmonization, and decision-making. The findings highlight the value of LLMs as a methodological roadmap for financial document processing beyond the banking sector.
Currently, research on natural-language-to-SQL (NL2SQL) generation mainly focuses on generic datasets, aiming to build models that can parse natural language queries and automatically generate SQL statements. However, this generic exploration often ignores the complexity and idiosyncrasies of intra-enterprise data, such as industry-specific terminology, data structure differences, and security compliance requirements. As a result, existing NL2SQL technology covers only basic query requirements in practical applications and is difficult to integrate deeply into enterprise business scenarios. This paper aims to fill this research gap by focusing on the customized application of NL2SQL technology in specific internal enterprise environments, combining an LLM with RAG, memory engineering, and agent feedback to design and implement an NL2SQL system for internal enterprise use. After verification on a practical project, the system’s ability to generate SQL in an enterprise financial data environment improves from 54% to about 70%, and the accuracy of multi-round dialog is further improved. This system enables a seamless connection between natural language and enterprise databases, providing strong support for enterprise digital transformation.
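A minimal sketch of the retrieval half of such a system, with an invented two-table schema and a keyword-overlap retriever standing in for the vector store (the prompt wording, table names, and generation step are all hypothetical):

```python
# Enterprise NL2SQL sketch: retrieve only the relevant table DDL and
# prepend it to the prompt the SQL-generating model would receive.

SCHEMA_DOCS = {
    "ledger": "CREATE TABLE ledger(entry_id INT, account TEXT, amount REAL, entry_date TEXT)",
    "vendors": "CREATE TABLE vendors(vendor_id INT, name TEXT, region TEXT)",
}

def retrieve_schema(question):
    # Toy retriever: keyword overlap with table names and column tokens.
    q = question.lower()
    return [ddl for name, ddl in SCHEMA_DOCS.items()
            if name in q or any(col in q for col in ddl.lower().split())]

def build_prompt(question):
    context = "\n".join(retrieve_schema(question))
    return f"-- Schema:\n{context}\n-- Question: {question}\n-- SQL:"

prompt = build_prompt("total amount in the ledger by account")
```

Scoping the prompt to the retrieved schema is what lets the approach scale past generic NL2SQL benchmarks: the model never sees irrelevant, possibly confidential tables.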
Financial results briefings offer a two-way communication channel between investors and management and provide qualitative information. Although the importance of this qualitative information has been recognized, there has yet to be an investigation into the extent to which such information is included in financial results briefings and how corporate characteristics and performance affect it. Therefore, in this study, we classify the text of financial results briefings into “facts” or “opinions” using GPT-3.5-turbo (ChatGPT), a large language model (LLM), and evaluate zero-shot and few-shot classification performance against manual labels assigned by three experts. Then, we measure the proportion of “facts” and “opinions” in each briefing and clarify to what extent the numbers of opinions and facts are influenced by company size and performance. We find that, on average, 38% of statements in the briefings contain opinions. Companies with smaller ordinary income margins tend to have a higher percentage of opinion statements, and companies with higher market value relative to book value have a higher volume of opinion sentences. The study highlights the importance of financial results briefings as a communication channel providing subjective opinions and explaining financial results and future outlook.
No abstract available
Large Language Models (LLMs) have made remarkable progress, surpassing human performance on several benchmarks in domains such as mathematics and coding. A key driver of this progress has been the development of benchmark datasets. In contrast, the financial domain poses higher entry barriers due to its demand for specialized expertise, and benchmarks remain relatively scarce compared to those in mathematics or coding. We introduce EDINET-Bench, an open-source Japanese financial benchmark designed to evaluate LLMs on challenging tasks such as accounting fraud detection, earnings forecasting, and industry classification. EDINET-Bench is constructed from ten years of annual reports filed by Japanese companies. These tasks require models to process entire annual reports and integrate information across multiple tables and textual sections, demanding expert-level reasoning that is challenging even for human professionals. Our experiments show that even state-of-the-art LLMs struggle in this domain, performing only marginally better than logistic regression in binary classification tasks such as fraud detection and earnings forecasting. Our results show that simply providing reports to LLMs in a straightforward setting is not enough. This highlights the need for benchmark frameworks that better reflect the environments in which financial professionals operate, with richer scaffolding such as realistic simulations and task-specific reasoning support to enable more effective problem solving. We make our dataset and code publicly available to support future research.
With the proliferation of digital financial services and digital transactional documents, data volumes are vastly increasing, including invoices, receipts, bank statements, and balance sheets. Information extraction from these documents has accordingly garnered keen interest. Manual data extraction is time-consuming and prone to human error, as the documents come in many formats. This paper covers techniques, tools, and technology for extracting tables from financial and transactional documents, specifically in the case of vertical tables and in the presence of mixed-type data representations. Table extraction means extracting tabular data from a readable or image-based document and transforming it into a structured format (CSV/JSON). The paper discusses extraction methods such as rule-based extraction, optical character recognition (OCR), and machine learning models. It also covers use cases from banking, e-commerce, and accounting, among other industries. The paper then discusses ethical and legal implications such as GDPR, HIPAA, and compliance with data privacy laws, and how AI systems should remain transparent and fair. Last but not least, future trends in table extraction, including the integration of generative AI and large language models (LLMs), robotic process automation (RPA), and real-time data extraction, are discussed. This paper presents the growing demand for advanced extraction technologies to increase financial document processing accuracy, efficiency, and scalability.
As of 2025, Generative Artificial Intelligence (GenAI) has become a central tool for productivity across industries. Beyond text generation, GenAI now plays a critical role in coding, data analysis, and research workflows. As large language models (LLMs) continue to evolve, it is essential to assess the reliability and accuracy of their outputs, especially in specialized, high-stakes domains like finance. Most modern LLMs transform text into numerical vectors, which are used in operations such as cosine similarity searches to generate responses. However, this abstraction process can lead to misinterpretation of emotional tone, particularly in nuanced financial contexts. While LLMs generally excel at identifying sentiment in everyday language, these models often struggle with the nuanced, strategically ambiguous language found in earnings call transcripts. Financial disclosures frequently embed sentiment in hedged statements, forward-looking language, and industry-specific jargon, making it difficult even for human analysts to interpret consistently, let alone AI models. This paper presents findings from the Santa Clara Microsoft Practicum Project, led by Professor Charlie Goldenberg, which benchmarks the performance of Microsoft's Copilot, OpenAI's ChatGPT, Google's Gemini, and traditional machine learning models for sentiment analysis of financial text. Using Microsoft earnings call transcripts, the analysis assesses how well LLM-derived sentiment correlates with market sentiment and stock movements and evaluates the accuracy of model outputs. Prompt engineering techniques are also examined to improve sentiment analysis results. Visualizations of sentiment consistency are developed to evaluate alignment between tone and stock performance, with sentiment trends analyzed across Microsoft's lines of business to determine which segments exert the greatest influence.
Retrieval Augmented Generation (RAG) systems show promise for financial question answering, yet high accuracy on benchmarks such as FinanceBench (19% baseline, 32% updated) remains challenging [1] [8]. This paper presents a systematic, multistage approach to significantly improve the performance of the RAG pipeline for financial QA. We first established a robust curated baseline using Gemini-2.0, the Docling parser, Google's text-embedding-004, and a vector database, achieving an initial accuracy of 43%. Subsequent architectural and component-wise optimizations were then iteratively implemented. First, a metadata filtering strategy, which utilizes a fine-tuned NER model to extract company names and years from queries, improved accuracy to 72%, demonstrating that targeted retrieval can simulate the benefits of a single-store per-filing approach [1]. Second, a hybrid chunking technique, which preserves the structure of the document and utilizes tokenization-sensitive refinements, further increased the accuracy to 80%. Third, the implementation of a Hybrid Search mechanism, combining dense and sparse retrieval methods, advanced performance to 84%. Finally, LLM-based query expansion, which transforms user queries into answer formats, yielded a final accuracy of 88%. This research demonstrates that a carefully designed RAG pipeline, incorporating intelligent metadata filtering, layout-aware chunking, advanced similarity search, and query semantics enhancement, substantially improves financial QA, significantly outperforming existing baselines.
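The metadata-filter plus hybrid-search idea can be sketched with toy components. The regex-based company/year extraction stands in for the fine-tuned NER model, the corpus is invented, and the dense scores are supplied by hand rather than computed from embeddings:

```python
import re

# Invented mini-corpus of filing snippets with metadata.
DOCS = [
    {"id": 1, "company": "AcmeCo", "year": 2021, "text": "AcmeCo 2021 revenue rose to $5.2B"},
    {"id": 2, "company": "AcmeCo", "year": 2022, "text": "AcmeCo 2022 revenue rose to $6.0B"},
    {"id": 3, "company": "OtherInc", "year": 2022, "text": "OtherInc 2022 revenue fell"},
]

def extract_filters(query):
    # Stand-in for the fine-tuned NER model: regex year, known-name lookup.
    year = re.search(r"\b(19|20)\d{2}\b", query)
    company = next((d["company"] for d in DOCS
                    if d["company"].lower() in query.lower()), None)
    return company, int(year.group()) if year else None

def sparse_score(query, text):
    # Keyword-overlap proxy for a sparse (BM25-style) retriever.
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q)

def hybrid_search(query, dense_scores):
    company, year = extract_filters(query)
    hits = [d for d in DOCS
            if (company is None or d["company"] == company)
            and (year is None or d["year"] == year)]
    # Blend sparse overlap with (precomputed, toy) dense scores.
    return max(hits, key=lambda d: 0.5 * sparse_score(query, d["text"])
                                   + 0.5 * dense_scores.get(d["id"], 0.0))

best = hybrid_search("What was AcmeCo revenue in 2022?",
                     dense_scores={1: 0.7, 2: 0.9, 3: 0.4})
```

Filtering first shrinks the candidate set to the right filing before any similarity scoring happens, which is how targeted retrieval simulates a single-store-per-filing setup.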
The Financial Planning & Analysis (FP&A) profession is at a critical turning point, shifting from being a back-end reporting function to becoming a forward-looking predictive engine for strategic insight. This shift is driven by the adoption of artificial intelligence (AI), specifically machine learning (ML) and large language models (LLMs). Traditional FP&A processes, constrained by manual intervention and past-centric data, are making way for sophisticated analytical paradigms. ML models now reveal subtle patterns in large sets of data, across internal metrics and external signals, to make highly sophisticated, probabilistic projections. At the same time, LLMs are transforming unstructured data analysis, revealing actionable insight from financial statements and market research. This fusion of technologies makes forecasts more accurate, allows for real-time anomaly detection, and gives deep strategic context. Yet successful adoption is not merely a technical upgrade; it is a strategic imperative dependent on strong data infrastructure, rigorous model governance, and a focus on upskilling human capital. This paper offers a complete roadmap for leaders to successfully embark on this complicated journey, making their finance function a genuine force of strategic foresight and competitive edge.
In specialized domains, humans often compare new problems against similar examples, highlight nuances, and draw conclusions instead of analyzing information in isolation. When applying reasoning in specialized contexts with LLMs on top of a RAG, the pipeline can capture contextually relevant information, but it is not designed to retrieve comparable cases or related problems. While RAG is effective at extracting factual information, its outputs in specialized reasoning tasks often remain generic, reflecting broad facts rather than context-specific insights. In finance, it results in generic risks that are true for the majority of companies. To address this limitation, we propose a peer-aware comparative inference layer on top of RAG. Our contrastive approach outperforms baseline RAG in text generation metrics such as ROUGE and BERTScore in comparison with human-generated equity research and risk.
In the face of global economic uncertainty, financial auditing has become essential for regulatory compliance and risk mitigation. Traditional manual auditing methods are increasingly limited by large data volumes, complex business structures, and evolving fraud tactics. This study proposes an AI-driven framework for enterprise financial audits and high-risk identification, leveraging machine learning to improve efficiency and accuracy. Using a dataset from the Big Four accounting firms (EY, PwC, Deloitte, KPMG) from 2020 to 2025, the research examines trends in risk assessment, compliance violations, and fraud detection. The dataset includes key indicators such as audit project counts, high-risk cases, fraud instances, compliance breaches, employee workload, and client satisfaction, capturing both audit behaviors and AI's impact on operations. To build a robust risk prediction model, three algorithms - Support Vector Machine (SVM), Random Forest (RF), and K-Nearest Neighbors (KNN) - are evaluated. SVM uses hyperplane optimization for complex classification, RF combines decision trees to manage high-dimensional, nonlinear data with resistance to overfitting, and KNN applies distance-based learning for flexible performance. Through hierarchical K-fold cross-validation and evaluation using F1-score, accuracy, and recall, Random Forest achieves the best performance, with an F1-score of 0.9012, excelling in identifying fraud and compliance anomalies. Feature importance analysis reveals audit frequency, past violations, employee workload, and client ratings as key predictors. The study recommends adopting Random Forest as a core model, enhancing features via engineering, and implementing real-time risk monitoring. This research contributes valuable insights into using machine learning for intelligent auditing and risk management in modern enterprises.
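The evaluation protocol above (stratified k-fold splits scored with F1) can be reproduced in miniature. The data below are synthetic two-feature points, and a 1-nearest-neighbour rule stands in for the SVM/RF/KNN models actually compared:

```python
import math

def stratified_kfold(X, y, k=5):
    # Deal each class's indices round-robin so every fold keeps the
    # original class balance.
    folds = [[] for _ in range(k)]
    for label in set(y):
        idx = [i for i, v in enumerate(y) if v == label]
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds

def knn_predict(train_X, train_y, x):
    # 1-nearest-neighbour stand-in for the compared classifiers.
    d = [(math.dist(x, tx), ty) for tx, ty in zip(train_X, train_y)]
    return min(d)[1]

def f1(y_true, y_pred, pos=1):
    tp = sum(t == pos and p == pos for t, p in zip(y_true, y_pred))
    fp = sum(t != pos and p == pos for t, p in zip(y_true, y_pred))
    fn = sum(t == pos and p != pos for t, p in zip(y_true, y_pred))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

# Synthetic audit features: (audit_frequency, past_violations).
X = [(1, 0), (2, 0), (1, 1), (9, 4), (8, 5), (9, 5), (2, 1), (8, 4)]
y = [0, 0, 0, 1, 1, 1, 0, 1]

scores = []
for fold in stratified_kfold(X, y, k=4):
    tr = [i for i in range(len(X)) if i not in fold]
    preds = [knn_predict([X[i] for i in tr], [y[i] for i in tr], X[j])
             for j in fold]
    scores.append(f1([y[j] for j in fold], preds))
mean_f1 = sum(scores) / len(scores)
```

On this cleanly separable toy data the mean F1 is 1.0; with real audit data the same loop is what produces figures like the paper's 0.9012 for Random Forest.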
In a space such as the financial industry, clear and stringent reporting and auditing are vital for both regulatory adherence and internal governance. Although the underlying information resources are invaluable for evaluating institutions' risk and compliance stance, the vast majority of such information is textual and unstructured. This poses a formidable challenge for institutions seeking timely, reliable, and actionable insights, especially when the work is done manually or with unsophisticated rule-based systems. In the past few years, developments in NLP have provided a tremendous ability to interpret unstructured text at scale, enabling automation in areas that traditionally rely heavily on expert judgment. NLP is particularly suitable in finance, where textual analysis must contend with context, domain-specific jargon, temporal patterns, and delicate linguistic cues. This work applies NLP to financial risk disclosures and audit trails, providing a systematic and scalable way to detect financial wrongdoing, latent risks, and non-compliance events. We start with an analysis of the linguistic properties of financial disclosures, uncovering important aspects such as tone, modality, and forward-looking statements that are frequently associated with risk perception and market volatility. We leverage techniques such as Named Entity Recognition (NER), sentiment analysis, and topic modelling to illustrate how machine learning-based NLP models can unearth the hidden risk signals encoded in annual reports or regulatory filings.
Concurrently, we treat audit trails as structured logs of user or system activity that, despite their timestamped format, include embedded command-line entries, transactional notes, and system-generated messages that are good candidates for language-based analysis. Through NLP processing such as log tokenization, part-of-speech tagging, parsing, and anomaly detection, the audit data is converted into structured knowledge for real-time monitoring and forensic auditing. The manuscript introduces a hybrid approach that integrates rule-based, statistical NLP, and machine-learning techniques for both narrative disclosures and event-ordered audit logs. We also detail a pipeline design consisting of data ingestion, text pre-processing, feature extraction, model prediction, and visual dashboarding. Experimental results on historical financial disclosures and synthetic audit logs show that the NLP-driven framework can accurately target risk-laden statements, identify anomalous sequences of activities, and categorize text sections according to regulatory relevance. Our results show that the proposed approach outperforms traditional keyword matching and manual review, and is more efficient and interpretable. Applying NLP to financial risk disclosures and audit trails can improve both the timeliness and accuracy of compliance checks while enabling a proactive approach to risk governance. This study is part of an emerging body of work on Regulatory Technology (RegTech), which promotes the use of AI and data to inform regulatory decision-making in finance. Given the morass of regulation and the volume of data institutions must process, NLP is a key enabler of intelligent, automated, and reliable compliance.
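Both strands of the pipeline, risk-cue tagging in disclosures and anomaly detection over audit-trail action sequences, can be caricatured in a few lines; the cue list, sample disclosure, and log are invented for illustration:

```python
import re
from collections import Counter

# Toy lexicon of hedging/risk cues (modality, litigation, restatement).
RISK_CUES = {"may", "could", "uncertain", "litigation", "restatement"}

def tag_risk_sentences(text):
    # Split into sentences and keep those containing any risk cue.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return [s for s in sentences
            if RISK_CUES & set(re.findall(r"[a-z]+", s.lower()))]

def flag_rare_bigrams(actions, min_count=2):
    # Treat the audit trail as a token sequence and flag action bigrams
    # that occur fewer than min_count times (a crude anomaly signal).
    bigrams = list(zip(actions, actions[1:]))
    counts = Counter(bigrams)
    return [b for b in bigrams if counts[b] < min_count]

disclosure = ("Revenue grew 8%. Litigation outcomes are uncertain. "
              "We repaid debt on schedule.")
risky = tag_risk_sentences(disclosure)

log = ["login", "view", "export", "login", "view", "logout",
       "login", "view", "logout", "delete_audit_log", "logout"]
anomalies = flag_rare_bigrams(log)
```

Real systems would swap the cue set for trained NER/sentiment models and the bigram count for a sequence model, but the two-track shape (narrative text vs. event logs) is the same.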
Multi-step symbolic reasoning is essential for robust financial analysis; yet, current benchmarks largely overlook this capability. Existing datasets such as FinQA and ConvFinQA emphasize final numerical answers while neglecting the intermediate reasoning required for transparency and verification. To address this gap, we introduce FINCHAIN, the first benchmark specifically designed for verifiable Chain-of-Thought (CoT) evaluation in finance. FINCHAIN spans 58 topics across 12 financial domains, each represented by parameterized symbolic templates with executable Python traces that enable fully machine-verifiable reasoning and scalable, contamination-free data generation. To assess reasoning capacity, we propose CHAINEVAL, a dynamic alignment measure that jointly evaluates both the final-answer correctness and the step-level reasoning consistency. Our evaluation of 26 leading LLMs reveals that even frontier proprietary LLMs exhibit clear limitations in symbolic financial reasoning, while domain-adapted and math-enhanced fine-tuned models can substantially narrow this gap. Overall, FINCHAIN exposes persistent weaknesses in multi-step financial reasoning and provides a foundation for developing trustworthy, interpretable, and verifiable financial AI.
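What a parameterized symbolic template with an executable trace might look like (a compound-interest example invented here, not drawn from the FINCHAIN release itself): the same seed always regenerates the same question, step-level trace, and final answer, which is what enables machine-verifiable reasoning and contamination-free data generation:

```python
import random

def compound_interest_instance(seed):
    # Sample template parameters deterministically from the seed.
    rng = random.Random(seed)
    p = rng.randint(1_000, 10_000)        # principal
    r = rng.choice([0.03, 0.04, 0.05])    # annual rate
    n = rng.randint(2, 5)                 # years
    steps = []
    value = p
    for year in range(1, n + 1):          # step-level trace, verifiable
        value = round(value * (1 + r), 2)
        steps.append(f"Year {year}: balance = {value}")
    question = (f"A deposit of {p} grows at {r:.0%} per year. "
                f"What is the balance after {n} years?")
    return question, steps, value

q, trace, answer = compound_interest_instance(seed=7)
```

A grader in the spirit of CHAINEVAL can then score a model's chain-of-thought against `trace` step by step, not just against the final `answer`.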
The task of financial analysis primarily encompasses two key areas: stock trend prediction and the corresponding financial question answering. Currently, machine learning and deep learning algorithms (ML&DL) have been widely applied for stock trend prediction, leading to significant progress. However, these methods fail to provide reasons for predictions, lacking interpretability and reasoning processes, and they cannot integrate textual information such as financial news or reports. Meanwhile, large language models (LLMs) have remarkable textual understanding and generation ability, but due to the scarcity of financial training datasets and limited integration with real-time knowledge, LLMs still suffer from hallucinations and are unable to keep up with the latest information. To tackle these challenges, we first release the AlphaFin datasets, combining traditional research datasets, real-time financial data, and handwritten chain-of-thought (CoT) data, which benefit the training of LLMs for financial analysis. We then use the AlphaFin datasets to benchmark a state-of-the-art method, called Stock-Chain, for effectively tackling the financial analysis task, which integrates retrieval-augmented generation (RAG) techniques. Extensive experiments are conducted to demonstrate the effectiveness of our framework on financial analysis.
As Large Language Models (LLMs) become more pervasive, their capability to generate convincing financial news poses an escalating threat to investor decision-making and market stability. However, contemporary content moderation and AI-based verification systems exhibit notable vulnerabilities when confronted with the subtle linguistic manipulations introduced by advanced prompt engineering techniques and adversarial training. This study investigated the comparative credibility, influence, and detectability of AI-generated financial headlines produced via Zero-Shot, Few-Shot (8-Shot), and Chain-of-Thought (CoT) prompting, with CoT outputs further used to train a GAN for adversarially enhanced text generation. We compiled a combined dataset of NASDAQ-listed securities and web-scraped, human-authored news, generated additional AI-driven headlines under the three prompting paradigms, and conducted a survey of randomly sampled headlines (n = 300) to assess credibility, market perception impact, investment influence, and AI detectability. The analysis revealed that headlines generated through Chain-of-Thought prompting consistently scored higher in perceived authenticity, influenced investment sentiment more profoundly, and were harder for participants to classify as AI-written. The findings underscore the urgent need for adversarially robust content moderation and verification mechanisms, capable of adapting to the rapidly evolving landscape of AI-generated financial misinformation, particularly when Chain-of-Thought reasoning is leveraged to enhance GAN-generated content.
As financial institutions and professionals increasingly incorporate Large Language Models (LLMs) into their workflows, substantial barriers, including proprietary data and specialized knowledge, persist between the finance sector and the AI community. These challenges impede the AI community's ability to enhance financial tasks effectively. Acknowledging financial analysis's critical role, we aim to devise financial-specialized LLM-based toolchains and democratize access to them through open-source initiatives, promoting wider AI adoption in financial decision-making. In this paper, we introduce FinRobot, a novel open-source AI agent platform supporting multiple financially specialized AI agents, each powered by an LLM. Specifically, the platform consists of four major layers: 1) the Financial AI Agents layer, which formulates a Financial Chain-of-Thought (CoT) by breaking sophisticated financial problems down into logical sequences; 2) the Financial LLM Algorithms layer, which dynamically configures appropriate model application strategies for specific tasks; 3) the LLMOps and DataOps layer, which produces accurate models by applying training/fine-tuning techniques and using task-relevant data; and 4) the Multi-source LLM Foundation Models layer, which integrates various LLMs and enables the layers above to access them directly. Finally, FinRobot provides hands-on access for both professional-grade analysts and laypersons to utilize powerful AI techniques for advanced financial analysis. We open-source FinRobot at \url{https://github.com/AI4Finance-Foundation/FinRobot}.
This paper introduces an open-source framework designed to facilitate the development and deployment of Large Language Model (LLM)-orchestrated agents for financial applications. The framework addresses challenges in integrating LLMs into finance by providing a layered architecture that supports the creation of specialized agents and incorporates a novel Financial Chain-of-Thought (CoT) prompting technique. The platform's design emphasizes modularity, multi-source LLM integration, and efficient data handling to enhance financial analysis workflows.
Financial narratives from U.S. Securities and Exchange Commission (SEC) filing reports and quarterly earnings call transcripts (ECTs) are very important for investors, auditors, and regulators. However, their length, financial jargon, and nuanced language make fine-grained analysis difficult. Prior sentiment analysis in the financial domain required large, expensive labeled datasets, making sentence-level stance detection towards specific financial targets challenging. In this work, we introduce a sentence-level corpus for stance detection focused on three core financial metrics: debt, earnings per share (EPS), and sales. The sentences were extracted from Form 10-K annual reports and ECTs, and labeled for stance (positive, negative, neutral) using the advanced ChatGPT-o3-pro model under rigorous human validation. Using this corpus, we conduct a systematic evaluation of modern large language models (LLMs) using zero-shot, few-shot, and Chain-of-Thought (CoT) prompting strategies. Our results show that few-shot prompting with CoT performs best relative to supervised baselines, and that LLMs' performance varies across the SEC and ECT datasets. Our findings highlight the practical viability of leveraging LLMs for target-specific stance detection in the financial domain without requiring extensive labeled data.
Large Language Models (LLMs) have demonstrated remarkable performance on a wide range of Natural Language Processing (NLP) tasks, often matching or even beating state-of-the-art task-specific models. This study aims at assessing the financial reasoning capabilities of LLMs. We leverage mock exam questions of the Chartered Financial Analyst (CFA) Program to conduct a comprehensive evaluation of ChatGPT and GPT-4 in financial analysis, considering Zero-Shot (ZS), Chain-of-Thought (CoT), and Few-Shot (FS) scenarios. We present an in-depth analysis of the models' performance and limitations, and estimate whether they would have a chance at passing the CFA exams. Finally, we outline insights into potential strategies and improvements to enhance the applicability of LLMs in finance. In this perspective, we hope this work paves the way for future studies to continue enhancing LLMs for financial reasoning through rigorous evaluation.
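The three prompting regimes compared in the studies above (zero-shot, chain-of-thought, few-shot) can be sketched as plain prompt builders. The wording and example pool below are illustrative placeholders, not items from the CFA mock exams.

```python
def zero_shot(question: str) -> str:
    # Ask directly, with no examples or reasoning instruction.
    return f"Answer the following finance question.\n\nQ: {question}\nA:"

def few_shot(question: str, examples: list) -> str:
    # Prepend worked (question, answer) pairs before the target question.
    shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return f"{shots}\n\nQ: {question}\nA:"

def chain_of_thought(question: str) -> str:
    # Elicit intermediate reasoning before the final answer.
    return (
        "Answer the following finance question. Reason through it before "
        f"answering.\n\nQ: {question}\nA: Let's think step by step."
    )
```

The evaluations described above then compare model accuracy across these three prompt shapes while holding the underlying question set fixed.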
No abstract available
The rapid development of robo-advisory and quantitative investment has been accompanied by persistent concerns about limited personalization and the opacity of black-box models operating on multimodal financial information. This paper addresses these issues from a decision-support perspective by constructing FinErva, a multimodal chain-of-thought dataset tailored to financial applications. FinErva comprises 7,544 manually verified question–answer pairs, divided into two economically relevant tasks: contract and disclosure understanding (FinErva-Pact) and candlestick-chart-based technical analysis (FinErva-Price). Building on this dataset, the paper proposes a two-stage training framework, Supervised-CoT Learning followed by Self-CoT Refinement, and applies it to eight vision–language models, each with fewer than 0.8 billion parameters. Empirical results show that these lightweight models approach the performance of finance professionals and clearly outperform non-expert investors. Overall, the findings indicate that appropriately designed multimodal chain-of-thought supervision enables interpretable modeling of key research tasks such as contract review and chart interpretation under realistic computational and deployment constraints, providing new data and methodology for the development of personalized, explainable, and operationally feasible AI systems in investment advisory and risk management.
Large Language Models (LLMs) have achieved remarkable success recently, displaying exceptional capabilities in creating understandable and organized text. They have been utilized in diverse fields, such as clinical research, where domain-specific models like Med-PaLM have achieved human-level performance. Recently, researchers have employed advanced prompt engineering to enhance the general reasoning ability of LLMs. Despite the remarkable success of zero-shot Chain-of-Thought (CoT) prompting in solving general reasoning tasks, these methods have received limited attention in financial reasoning. To address this issue, we explore multiple prompt strategies and incorporate semantic news information to improve LLMs' performance on financial reasoning tasks. To the best of our knowledge, we are the first to explore this important issue by applying ChatGPT to gold investment. Our aim is to investigate the financial reasoning capabilities of LLMs and their capacity to generate logical and persuasive investment opinions, using ChatGPT, one of the most powerful recent LLMs, together with prompt engineering. Our research focuses on understanding the ability of LLMs to perform sophisticated analysis and reasoning in the context of investment decision-making. Our study finds that ChatGPT with CoT prompting can provide more explainable predictions, overcome behavioral biases, and achieve higher investment returns, which is crucial in finance-related tasks.
As financial markets grow increasingly complex, there is a rising need for automated tools that can effectively assist human analysts in equity research, particularly within sell-side research. While Generative AI (GenAI) has attracted significant attention in this field, existing AI solutions often fall short due to their narrow focus on technical factors and limited capacity for discretionary judgment. These limitations hinder their ability to adapt to new data in real-time and accurately assess risks, which diminishes their practical value for investors. This paper presents FinRobot, the first AI agent framework specifically designed for equity research. FinRobot employs a multi-agent Chain of Thought (CoT) system, integrating both quantitative and qualitative analyses to emulate the comprehensive reasoning of a human analyst. The system is structured around three specialized agents: the Data-CoT Agent, which aggregates diverse data sources for robust financial integration; the Concept-CoT Agent, which mimics an analyst's reasoning to generate actionable insights; and the Thesis-CoT Agent, which synthesizes these insights into a coherent investment thesis and report. FinRobot provides thorough company analysis supported by precise numerical data, industry-appropriate valuation metrics, and realistic risk assessments. Its dynamically updatable data pipeline ensures that research remains timely and relevant, adapting seamlessly to new financial information. Unlike existing automated research tools, such as CapitalCube and Wright Reports, FinRobot delivers insights comparable to those produced by major brokerage firms and fundamental research vendors. We open-source FinRobot at \url{https://github.com/AI4Finance-Foundation/FinRobot}.
Performance attribution analysis, defined as the process of explaining the drivers of the excess performance of an investment portfolio against a benchmark, stands as a significant feature of portfolio management and plays a crucial role in the investment decision-making process, particularly within the fund management industry. Rooted in a solid financial and mathematical framework, the importance and methodologies of this analytical technique are extensively documented across numerous academic research papers and books. The integration of large language models (LLMs) and AI agents marks a groundbreaking development in this field. These agents are designed to automate and enhance performance attribution analysis by accurately calculating and analyzing portfolio performance against benchmarks. In this study, we introduce the application of an AI agent to a variety of essential performance attribution tasks, including the analysis of performance drivers and the use of LLMs as a calculation engine for multi-level attribution analysis and question-answering (QA) tasks. Leveraging advanced prompt engineering techniques such as Chain-of-Thought (CoT) and Plan-and-Solve (PS), and employing a standard agent framework from LangChain, the research achieves promising results: accuracy rates exceeding 93% in analyzing performance drivers, 100% in multi-level attribution calculations, and over 84% in QA exercises that simulate official examination standards.
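The attribution calculations referred to above can be illustrated with a minimal single-period Brinson-style decomposition (a standard textbook method, not necessarily the exact methodology the paper's agent implements), which splits excess return into per-sector allocation, selection, and interaction effects:

```python
# Minimal single-period Brinson-style attribution. For each sector:
#   allocation  = (wp - wb) * (rb_sector - rb_total)
#   selection   = wb * (rp_sector - rb_sector)
#   interaction = (wp - wb) * (rp_sector - rb_sector)
# With weights summing to 1 on both sides, the effects sum exactly
# to the portfolio's excess return over the benchmark.

def brinson(weights_p, weights_b, returns_p, returns_b):
    rb_total = sum(w * r for w, r in zip(weights_b, returns_b))
    rp_total = sum(w * r for w, r in zip(weights_p, returns_p))
    effects = []
    for wp, wb, rp, rb in zip(weights_p, weights_b, returns_p, returns_b):
        allocation = (wp - wb) * (rb - rb_total)
        selection = wb * (rp - rb)
        interaction = (wp - wb) * (rp - rb)
        effects.append((allocation, selection, interaction))
    return effects, rp_total - rb_total
```

The sum-to-excess-return identity is the kind of invariant against which an LLM used as a calculation engine can be automatically checked.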
No abstract available
No abstract available
Financial institutions deploy Large Language Models (LLMs) for reconciliations, regulatory reporting, and client communications, but nondeterministic outputs (output drift) undermine auditability and trust. We quantify drift across five model architectures (7B-120B parameters) on regulated financial tasks, revealing a stark inverse relationship: smaller models (Granite-3-8B, Qwen2.5-7B) achieve 100% output consistency at T=0.0, while GPT-OSS-120B exhibits only 12.5% consistency (95% CI: 3.5-36.0%) regardless of configuration (p<0.0001, Fisher's exact test). This finding challenges conventional assumptions that larger models are universally superior for production deployment. Our contributions include: (i) a finance-calibrated deterministic test harness combining greedy decoding (T=0.0), fixed seeds, and SEC 10-K structure-aware retrieval ordering; (ii) task-specific invariant checking for RAG, JSON, and SQL outputs using finance-calibrated materiality thresholds (plus or minus 5%) and SEC citation validation; (iii) a three-tier model classification system enabling risk-appropriate deployment decisions; and (iv) an audit-ready attestation system with dual-provider validation. We evaluated five models (Qwen2.5-7B via Ollama, Granite-3-8B via IBM watsonx.ai, Llama-3.3-70B, Mistral-Medium-2505, and GPT-OSS-120B) across three regulated financial tasks. Across 480 runs (n=16 per condition), structured tasks (SQL) remain stable even at T=0.2, while RAG tasks show drift (25-75%), revealing task-dependent sensitivity. Cross-provider validation confirms deterministic behavior transfers between local and cloud deployments. We map our framework to Financial Stability Board (FSB), Bank for International Settlements (BIS), and Commodity Futures Trading Commission (CFTC) requirements, demonstrating practical pathways for compliance-ready AI deployments.
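A toy version of the drift measurement described above simply replays the same input and reports how often the modal output recurs; `model_fn` below is a deterministic stand-in for a real LLM call, not the paper's harness.

```python
from collections import Counter

def consistency_rate(model_fn, prompt, n_runs=16):
    # Replay the same prompt n_runs times and report the share of runs
    # that match the most frequent (modal) output.
    outputs = [model_fn(prompt) for _ in range(n_runs)]
    modal_count = Counter(outputs).most_common(1)[0][1]
    return modal_count / n_runs
```

At temperature 0.0 a fully deterministic deployment should score 1.0 on every task; scores below 1.0 flag the output drift the study quantifies.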
Generative AI has significantly reduced the entry barrier to the domain of AI owing to its ease of use and core capabilities of automation, translation, and intelligent action in our day-to-day lives. Currently, the Large Language Models (LLMs) that power such chatbots are utilized primarily for their automation capabilities within a limited scope. One major limitation of the currently evolving family of LLMs is hallucination, wherein inaccurate responses are reported as factual. Hallucinations are primarily caused by biased training data, ambiguous prompts, and inaccurate LLM parameters, and they occur mainly when combining mathematical facts with language-based context. In this work we present the three major stages in the journey of designing hallucination-minimized LLM-based solutions specialized for decision makers in the financial domain: prototyping, scaling, and LLM evolution using human feedback. These three stages, together with the novel data-to-answer generation modules presented in this work, are necessary to ensure that Generative AI products are reliable and of high quality to aid key decision-making processes.
Large Language Models (LLMs) have been applied to build several automation and personalized question-answering prototypes so far. However, scaling such prototypes to robust products with minimized hallucinations or fake responses still remains an open challenge, especially in niche, data-table-heavy domains such as financial decision-making. In this work, we present a novel LangChain-based framework that transforms data tables into hierarchical textual "data chunks" to enable a wide variety of actionable question answering. First, user queries are classified by intention, followed by automated retrieval of the most relevant data chunks to generate customized LLM prompts per query. Next, the custom prompts and their responses undergo multi-metric scoring to assess hallucinations and response confidence. The proposed system is optimized with user-query intention classification, advanced prompting, and data scaling capabilities, and it achieves over 90% confidence scores for a variety of user-query responses ranging over {What, Where, Why, How, predict, trend, anomalies, exceptions} that are crucial for financial decision-making applications. The proposed data-to-answers framework can be extended to other analytical domains, such as sales and payroll, to ensure optimal hallucination-control guardrails.
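The abstract does not specify the chunking scheme, but the general idea of turning a data table into row-level textual chunks for retrieval can be sketched as follows; the caption prefix and separator format are illustrative assumptions.

```python
def table_to_chunks(table, caption):
    # First row is the header; every data row becomes one retrievable
    # textual chunk prefixed with the table caption for context.
    header, *rows = table
    return [
        caption + " | " + "; ".join(f"{h}: {v}" for h, v in zip(header, row))
        for row in rows
    ]
```

Chunks of this shape can then be embedded and retrieved per query, letting the prompt carry only the rows relevant to the user's question.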
Large language models (LLMs) have shown remarkable capabilities across various domains; however, the issue of hallucination poses a significant challenge, particularly in high-stakes areas like finance. This paper provides an empirical examination of hallucination exhibited by LLMs in financial tasks. This study investigates the ability of LLMs to accurately explain financial concepts, retrieve historical stock data, and explore methods for mitigating these hallucinations. The findings reveal that standard LLMs demonstrate substantial hallucination tendencies in financial contexts, highlighting the need for further research to improve their reliability.
Accurate and reliable knowledge retrieval is vital for financial question-answering, where continually updated data sources and complex, high-stakes contexts demand precision. Traditional retrieval systems rely on a single database and retriever, but financial applications require more sophisticated approaches to handle intricate regulatory filings, market analyses, and extensive multi-year reports. We introduce a framework for financial Retrieval Augmented Generation (RAG) that leverages agentic AI and the Multi-HyDE system, an approach that generates multiple, nonequivalent queries to boost the effectiveness and coverage of retrieval from large, structured financial corpora. Our pipeline is optimized for token efficiency and multi-step financial reasoning, and we demonstrate that their combination improves accuracy by 11.2% and reduces hallucinations by 15%. Our method is evaluated on standard financial QA benchmarks, showing that integrating domain-specific retrieval mechanisms such as Multi-HyDE with robust toolsets, including keyword and table-based retrieval, significantly enhances both the accuracy and reliability of answers. This research not only delivers a modular, adaptable retrieval framework for finance but also highlights the importance of structured agent workflows and multi-perspective retrieval for trustworthy deployment of AI in high-stakes financial applications.
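The core idea above, issuing multiple non-equivalent queries and merging their retrieval results, can be illustrated with a deliberately simple keyword scorer standing in for a real retriever; the function names and the max-merge strategy are illustrative assumptions, not the Multi-HyDE implementation.

```python
def keyword_score(query, doc):
    # Fraction of query tokens that appear in the document.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def multi_query_retrieve(queries, corpus, k=2):
    # Score every document against every query variant, keep each
    # document's best score, and return the top-k documents overall.
    best = {}
    for query in queries:
        for doc in corpus:
            score = keyword_score(query, doc)
            best[doc] = max(best.get(doc, 0.0), score)
    return sorted(best, key=best.get, reverse=True)[:k]
```

Because each query variant emphasizes different terms, the merged ranking covers documents that any single phrasing would miss, which is the coverage gain the framework targets.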
No abstract available
No abstract available
This paper investigates whether Large Language Models (LLMs) can be used to predict numeric Key Performance Indicators (KPIs) from question-context pairs derived from financial and ESG (Environmental, Social, and Governance) reports. Two modeling strategies were compared: a semantic embedding approach using a pretrained transformer model (all-MiniLM-L6-v2), and a traditional term frequency-inverse document frequency (TF-IDF) vectorization. Both models were trained using Random Forest regressors and evaluated through 5-fold cross-validation. Results indicate that the TF-IDF-based model achieved stronger performance (R² = 0.59) than the LLM-based model (R² = 0.46), suggesting that classical NLP techniques remain competitive in structured financial text settings. Further analysis revealed that bounded KPI types such as scores and percentages were predicted with greater accuracy than unbounded values like revenues and emissions. These findings highlight the importance of aligning model complexity with the structure and semantic variability of financial disclosures. The study contributes to the growing field of AI-driven financial automation by clarifying the limits and strengths of semantic versus lexical modeling for numeric prediction tasks.
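For reference, the TF-IDF weighting that the stronger baseline relies on can be computed in a few lines of standard-library Python; this is a minimal sketch using raw term counts and `log(N/df)`, not the exact vectorizer variant the study used.

```python
import math

def tfidf(docs):
    # docs: list of token lists. Weight = term count * log(N / df),
    # so a term appearing in every document gets weight 0.
    n = len(docs)
    df = {}
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    vectors = []
    for doc in docs:
        tf = {}
        for term in doc:
            tf[term] = tf.get(term, 0) + 1
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors
```

The resulting sparse weight vectors are what a Random Forest regressor would consume as features in the lexical pipeline.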
The exponential growth of information presents a significant challenge for researchers and professionals seeking to remain at the forefront of their fields, and this paper introduces an innovative framework for automatically generating insightful financial digests using the power of Large Language Models (LLMs), specifically Google's Gemini Pro. By leveraging a combination of data extraction from OpenAlex, strategic prompt engineering, and LLM-driven analysis, we demonstrate the automated creation of comprehensive digests that summarize key findings and identify emerging trends. This approach addresses the limitations of traditional analysis methods, enabling the efficient processing of vast amounts of unstructured data and the delivery of actionable insights in an easily digestible format. This paper describes how LLMs work in simple terms and how we can use their power to help researchers and scholars save time and stay informed about current trends. Our study includes a step-by-step process, from data acquisition and JSON construction to interaction with Gemini and the automated generation of PDF reports, including a link to the project's GitHub repository for broader accessibility and further development.
In an environment of increasingly complicated and globally interconnected financial systems, challenges related to harmonization in cross-border reporting are magnifying. Differences in regulation, language, and data siloing, together with the further proliferation of unstructured disclosures, remain obstacles to the success of transparency, compliance, and efficiency initiatives. In this paper we discuss a new integration of AI-driven Knowledge Graphs (KGs) and NLP that we believe can form part of the solution: a new way of thinking about financial interpretation and summarization across jurisdictions. As structured semantic representations of financial entities, their attributes, and their inter-relationships, KGs enable machines to perceive information and put it into context. When combined with state-of-the-art NLP models such as transformers and domain-specific large language models (LLMs), this architecture can accurately and interpretably extract, disambiguate, and summarize financial disclosures, audit reports, and regulatory filings. These capabilities are particularly useful for multinationals, auditors, and regulators that, for example, are looking to cross-map divergent financial standards (such as IFRS and GAAP) or automate compliance mapping. The paper describes a system design that exploits multi-source data, entity recognition, relation extraction, and multilingual semantic alignment based on AI-enhanced ontologies. Real-world examples from the EU, ASEAN, and North America show how AI-powered tools can cut through manual groundwork, spot discrepancies in reporting, and create reconciled summaries for stakeholders on both sides of the border. The results highlight the potential of NLP applied to Knowledge Graphs not only for the automation of reporting workflows but also as a framework for delivering smart, explainable financial governance systems.
For a financial analyst, the question and answer (Q&A) segment of the company financial report is a crucial piece of information for various analysis and investment decisions. However, extracting valuable insights from the Q&A section has posed considerable challenges: conventional methods such as detailed reading and note-taking lack scalability and are susceptible to human errors, while Optical Character Recognition (OCR) and similar techniques encounter difficulties in accurately processing unstructured transcript text, often missing subtle linguistic nuances that drive investor decisions. Here, we demonstrate the utilization of Large Language Models (LLMs) to efficiently and rapidly extract information from earnings report transcripts while ensuring high accuracy, transforming the extraction process and reducing hallucination by combining retrieval-augmented generation with metadata. We evaluate the outcomes of various LLMs with and without our proposed approach using objective metrics for evaluating Q&A systems, and empirically demonstrate the superiority of our method.
In the face of climate change, are companies really taking substantial steps toward more sustainable operations? A comprehensive answer lies in the dense, information-rich landscape of corporate sustainability reports. However, the sheer volume and complexity of these reports make human analysis very costly. Therefore, only a few entities worldwide have the resources to analyze these reports at scale, which leads to a lack of transparency in sustainability reporting. Empowering stakeholders with LLM-based automatic analysis tools can be a promising way to democratize sustainability report analysis. However, developing such tools is challenging due to (1) the hallucination of LLMs and (2) the inefficiency of bringing domain experts into the AI development loop. In this paper, we introduce ChatReport, a novel LLM-based system that automates the analysis of corporate sustainability reports, addressing existing challenges by (1) making the answers traceable to reduce the harm of hallucination and (2) actively involving domain experts in the development loop. We make our methodology, annotated datasets, and generated analyses of 1015 reports publicly available.
Large Language Models (LLMs) hold immense promise for revolutionizing financial analysis and decision-making, yet their direct application is often hampered by issues of data hallucination and lack of access to real-time, verifiable financial information. This paper introduces QuantMCP, a novel framework designed to rigorously ground LLMs in financial reality. By leveraging the Model Context Protocol (MCP) for standardized and secure tool invocation, QuantMCP enables LLMs to accurately interface with a diverse array of Python-accessible financial data APIs (e.g., Wind, yfinance). Users can interact via natural language to precisely retrieve up-to-date financial data, thereby overcoming LLM's inherent limitations in factual data recall. More critically, once furnished with this verified, structured data, the LLM's analytical capabilities are unlocked, empowering it to perform sophisticated data interpretation, generate insights, and ultimately support more informed financial decision-making processes. QuantMCP provides a robust, extensible, and secure bridge between conversational AI and the complex world of financial data, aiming to enhance both the reliability and the analytical depth of LLM applications in finance.
While LLMs have shown great success in financial tasks like stock prediction and question answering, their application in fully automating Equity Research Report generation remains uncharted territory. In this paper, we formulate the Equity Research Report (ERR) Generation task for the first time. To address the data scarcity and the absence of evaluation metrics, we present an open-source evaluation benchmark for ERR generation, FinRpt. We frame a Dataset Construction Pipeline that integrates 7 financial data types and automatically produces a high-quality ERR dataset, which can be used for model training and evaluation. We also introduce a comprehensive evaluation system including 11 metrics to assess the generated ERRs. Moreover, we propose a multi-agent framework specifically tailored to this task, named FinRpt-Gen, and train several LLM-based agents on the proposed datasets using Supervised Fine-Tuning and Reinforcement Learning. Experimental results indicate the data quality and metric effectiveness of the FinRpt benchmark and the strong performance of FinRpt-Gen, showcasing their potential to drive innovation in the ERR generation field. All code and datasets are publicly available.
Businesses heavily rely on data sourced from various channels like news articles, financial reports, and consumer reviews to drive their operations, enabling informed decision-making and identifying opportunities. However, traditional manual methods for data extraction are often time-consuming and resource-intensive, prompting the adoption of digital transformation initiatives to enhance efficiency. Yet, concerns persist regarding the sustainability of such initiatives and their alignment with the United Nations (UN)'s Sustainable Development Goals (SDGs). This research aims to explore the integration of Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG) as a sustainable solution for Information Extraction (IE) and processing. The research methodology involves reviewing existing solutions for business decision-making, noting that many systems require training new machine learning models, which are resource-intensive and have significant environmental impacts. Instead, we propose a sustainable business solution using pre-existing LLMs that can work with diverse datasets. We link domain-specific datasets to tailor LLMs to company needs and employ a Multi-Agent architecture to divide tasks such as information retrieval, enrichment, and classification among specialized agents. This approach optimizes the extraction process and improves overall efficiency. Through the utilization of these technologies, businesses can optimize resource utilization, improve decision-making processes, and contribute to sustainable development goals, thereby fostering environmental responsibility within the corporate sector.
Tailoring structured financial reports from companies' earnings releases is crucial for understanding financial performance and has been widely adopted in real-world analytics. However, existing summarization methods often generate broad, high-level summaries, which may lack the precision and detail required for financial reports that typically focus on specific, structured sections. While Large Language Models (LLMs) hold promise, generating reports adhering to predefined multi-section templates remains challenging. This paper investigates two LLM-based approaches popular in industry for generating templated financial reports: an agentic information retrieval (IR) framework and a decomposed IR approach, namely AgenticIR and DecomposedIR. AgenticIR utilizes collaborative agents prompted with the full template. In contrast, the DecomposedIR approach applies a prompt-chaining workflow to break down the template and reframe each section as a query answered by the LLM using the earnings release. To quantitatively assess the generated reports, we evaluated both methods in two scenarios: one using a financial dataset without direct human references, and another with a weather-domain dataset featuring expert-written reports. Experimental results show that while AgenticIR may excel in orchestrating tasks and generating concise reports through agent collaboration, DecomposedIR statistically significantly outperforms the AgenticIR approach in providing broader and more detailed coverage in both scenarios, offering a reflection on the use of agentic frameworks in real-world applications.
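The decomposed prompt-chaining idea, reframing each template section as its own query against the source document, reduces to a small loop; `answer_fn` below is a placeholder for a grounded LLM call, and the query wording is an illustrative assumption.

```python
def decomposed_report(template_sections, source_text, answer_fn):
    # One query per template section, answered independently against the
    # source document, then reassembled into the full report.
    report = {}
    for section in template_sections:
        query = f"From the earnings release, write the '{section}' section."
        report[section] = answer_fn(query, source_text)
    return report
```

Compared with prompting one agent with the full template, this per-section decomposition lets each answer focus on a narrower question, which is the coverage advantage the experiments above report.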
No abstract available
Large Language Models (LLMs) have proven their effectiveness in a variety of general Natural Language Processing (NLP) tasks. However, their performance in financial credit assessment tasks has yet to reach its full potential, partly because these tasks require specific financial credit expertise. To address this challenge, we propose the ZiGong model, based on Mistral, which employs multi-task supervised fine-tuning. Furthermore, to address the issue of model hallucination in financial scenarios, we propose a novel data pruning method. Specifically, we employ an agent model to assign scores to training samples, and then integrate the pruned samples with the original data for model training. This approach effectively mitigates hallucinations in large models by refining the training data, ensuring higher reliability in downstream applications. Experimental results demonstrate that our method significantly improves the model's robustness and accuracy in real-world financial scenarios.
Comprehending long, dense annual reports is a critical task for financial analysts that is ripe for AI automation, yet model reliability remains a key concern. To address this, we introduce Financial Touchstone—a new, large-scale benchmark with 2,878 question-context-answer triplets across 480 international annual reports, guaranteed to be unseen by the models we evaluate. We test eleven frontier language models from leading labs, including reasoning-capable models like Google’s Gemini 2.5 Pro, Anthropic’s Claude Opus, OpenAI’s o3, and xAI’s Grok 4. Our analysis reveals that while reasoning models achieve high accuracy—with Gemini 2.5 Pro reaching 91.6% and hallucination rates as low as 3.2%—the primary bottleneck is not the models’ comprehension but the initial information retrieval step. Model accuracy plummets to 0.2% when the provided context is insufficient. This work demonstrates that future progress in automated financial analysis hinges more on solving the challenge of targeted information retrieval in complex documents than on incremental improvements in model reasoning alone.
In contemporary enterprise management, financial audits serve as a crucial mechanism to ensure financial transparency and compliance, amidst increasingly complex data processing requirements. This paper explores the research and implementation of an intelligent financial audit system based on data mining, aiming to enhance the efficiency and accuracy of the audit process. Initially, the paper introduces text mining technology in the context of intelligent auditing, elucidating how data mining techniques can extract valuable information from vast amounts of unstructured data. Subsequently, it provides a detailed account of the construction methods for audit analysis models, including the establishment of indicator systems and data preparation processes. This is further integrated with expert system knowledge to realize a comprehensive financial statement audit module. Finally, the paper proposes an intelligent analysis algorithm for accounting documents based on word co-occurrence and SOM neural networks. Experimental validations demonstrate the system's effectiveness and reliability in practical applications. The results indicate that this intelligent auditing system not only significantly improves audit efficiency but also effectively mitigates audit risks, showcasing substantial potential for widespread application.
Countries' audit institutions play a critical role in financial management and accountability. These institutions perform tasks such as financial accountability, detection of corruption and errors, evaluation of government performance, accountable public administration, transparency and legal oversight, and prevention of legal violations, contributing to public trust. The Court of Accounts Presidency is the supreme audit institution of the Republic of Türkiye. This study sequentially addresses the audit process and procedure of the Court of Accounts and the impact of digitization, information technologies, and artificial intelligence on the audit process. Additionally, an evaluation of the institution's technology and information infrastructure is conducted. A detailed assessment is presented regarding the Court of Accounts' use of the Court of Accounts Data Analysis System (VERA), Unified Data Transfer System (BVAS), and Audit Management Program (SayCap). The institution's technology and information infrastructure related to big data and big data analytics are evaluated based on the institution's reports and documents. Finally, the potential integration of artificial intelligence into these systems is evaluated and considered highly beneficial. Implementing artificial intelligence, big data, and big data analytics in the audit domain of public institutions is seen as highly advantageous for making human resources in the field more proficient in current technologies. Additionally, it is strongly recommended that educational institutions training audit professionals incorporate courses on data analytics, big data, big data analytics, machine learning, and artificial intelligence into their curricula.
This report synthesizes current research on large language models (LLMs) in intelligent auditing and financial report analysis. The research trajectory shows an evolution from "process automation" to "deep semantic understanding" and on to "trustworthy architecture construction." On one hand, LLMs have markedly improved the efficiency of fraud detection, earnings forecasting, and report generation through techniques such as RAG and multi-agent collaboration; on the other hand, the academic community remains highly vigilant about numerical hallucination, algorithmic bias, and compliance risks in high-stakes financial scenarios, and is working toward "responsible financial AI" through specialized benchmarks and hallucination-suppression techniques. Ultimately, these applications are reshaping not only the organizational form of the audit profession but also the information disclosure and decision-making mechanisms of capital markets.