AI agents, large language models (LLMs)
Multi-Agent Collaboration Architectures and Communication Mechanisms
Focuses on general frameworks for multi-agent systems (MAS), task decomposition, communication topologies, and inter-agent communication and collaboration protocols, aiming at efficient coordination in large-scale, complex environments.
- Engineering LLM Powered Multi-Agent Framework for Autonomous CloudOps(Kannan Parthasarathy, Karthik Vaidhyanathan, Rudra Dhar, Venkat Krishnamachari, Basil Muhammed, Adyansh Kakran, Sreemaee Akshathala, Shrikara Arun, Sumant Dubey, Mohan Veerubhotla, Amey Karan, 2025, 2025 IEEE/ACM 4th International Conference on AI Engineering – Software Engineering for AI (CAIN))
- LLM Agents for Smart City Management: Enhancing Decision Support Through Multi-Agent AI Systems(A. Kalyuzhnaya, Sergey Mityagin, E. Lutsenko, Andrey Getmanov, Yaroslav Aksenkin, Kamil Fatkhiev, Kirill Fedorin, Nikolay O. Nikitin, Natalia Chichkova, V. Vorona, A. Boukhanovsky, 2025, Smart Cities)
- The Multi-agent System based on LLM for Online Discussions(Yihan Dong, 2024, Adaptive Agents and Multi-Agent Systems)
- MegaAgent: A Large-Scale Autonomous LLM-based Multi-Agent System Without Predefined SOPs(Qian Wang, Tianyu Wang, Zhenheng Tang, Qinbin Li, Nuo Chen, Jingsheng Liang, Bingsheng He, 2024, Findings of the Association for Computational Linguistics: ACL 2025)
- Synergizing Logical Reasoning, Knowledge Management and Collaboration in Multi-Agent LLM System(Adam Kostka, Jaroslaw A. Chudziak, 2025, Pacific Asia Conference on Language, Information and Computation)
- Collaborative Problem-Solving with LLM: A Multi-agent System Approach to Solve Complex Tasks Using Autogen(R. Barbosa, Ricardo Santos, Paulo Novais, 2024, Communications in Computer and Information Science)
- AutoHMA-LLM: Efficient Task Coordination and Execution in Heterogeneous Multi-Agent Systems Using Hybrid Large Language Models(Tinging Yang, Ping Feng, Qixin Guo, Jindi Zhang, Xiufeng Zhang, Jiahong Ning, Xinghan Wang, Zhongyang Mao, 2025, IEEE Transactions on Cognitive Communications and Networking)
- Search Swarm: Multiagent Large Language Models Framework for E-commerce Product Search(Nagim Isyanbaev, Ilya Makarov, 2024, Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence)
- Many Heads Are Better Than One: Improved Scientific Idea Generation by A LLM-Based Multi-Agent System(Haoyang Su, Renqi Chen, Shixiang Tang, Zhenfei Yin, Xinzhe Zheng, Jinzhe Li, Biqing Qi, Qi Wu, Hui Li, Wanli Ouyang, Philip Torr, Bowen Zhou, Nanqing Dong, 2024, Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers))
- BudgetMLAgent: A Cost-Effective LLM Multi-Agent system for Automating Machine Learning Tasks(Shubham Gandhi, Manasi S. Patwardhan, L. Vig, Gautam M. Shroff, 2024, Proceedings of the 4th International Conference on AI-ML Systems)
- MASTER: A Multi-Agent System with LLM Specialized MCTS(Bingzheng Gan, Yufan Zhao, Tianyi Zhang, Jing Huang, Yusu Li, Shu Xian Teo, Changwang Zhang, Wei Shi, 2025, North American Chapter of the Association for Computational Linguistics)
- A LLM-informed multi-agent AI system for drone-based visual inspection for infrastructure(Jiucai Liu, Haijiang Li, Chengzhang Chai, Kehong Chen, Dalei Wang, 2025, Advanced Engineering Informatics)
- Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System(Weize Chen, Jiarui Yuan, Cheng Qian, Cheng Yang, Zhiyuan Liu, Maosong Sun, 2024, Annual Meeting of the Association for Computational Linguistics)
- Enhancing Multi-Agent Systems via Reinforcement Learning with LLM-Based Planner and Graph-Based Policy(Ziqi Jia, Junjie Li, Xiaoyang Qu, Jianzong Wang, 2025, 2025 IEEE International Conference on Robotics and Automation (ICRA))
- Scaling Large-Language-Model-based Multi-Agent Collaboration(Cheng Qian, Zihao Xie, Yifei Wang, Wei Liu, Yufan Dang, Zhuoyun Du, Weize Chen, Cheng Yang, Zhiyuan Liu, Maosong Sun, 2024, International Conference on Learning Representations)
- FinCon: A Synthesized LLM Multi-Agent System with Conceptual Verbal Reinforcement for Enhanced Financial Decision Making(Yangyang Yu, Zhiyuan Yao, Haohang Li, Zhiyang Deng, Yupeng Cao, Zhi Chen, Jordan W. Suchow, Rong Liu, Zhenyu Cui, Denghui Zhang, K. Subbalakshmi, Guojun Xiong, Yueru He, Jimin Huang, Dong Li, Qianqian Xie, 2024, Neural Information Processing Systems)
- Mixture-of-Agents Enhances Large Language Model Capabilities(Junlin Wang, Jue Wang, Ben Athiwaratkun, Ce Zhang, James Zou, 2024, International Conference on Learning Representations)
- CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society(G. Li, Hasan Hammoud, Hani Itani, Dmitrii Khizbullin, Bernard Ghanem, 2023, Advances in Neural Information Processing Systems 36)
- ReSo: A Reward-driven Self-organizing LLM-based Multi-Agent System for Reasoning Tasks(Heng Zhou, Hejia Geng, Xiangyuan Xue, Li Kang, Zhenfei Yin, Lei Bai, 2025, Conference on Empirical Methods in Natural Language Processing)
- AMAS: Adaptively Determining Communication Topology for LLM-based Multi-Agent System(Hui Yi Leong, Yuheng Li, Yuqing Wu, Wenwen Ouyang, Wei Zhu, Jiechao Gao, Wei Han, 2025, Conference on Empirical Methods in Natural Language Processing)
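The frameworks above study, among other things, how the communication topology among agents shapes coordination cost. As a purely illustrative toy sketch (not drawn from any single paper listed here; all names such as `coordinator` and `worker_a` are made up), a topology can be represented as a directed adjacency map and messages delivered only along its edges:

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    """A toy agent that records every message it receives."""
    name: str
    inbox: list = field(default_factory=list)

    def receive(self, sender: str, content: str) -> None:
        self.inbox.append((sender, content))

def broadcast(agents: dict, topology: dict, sender_name: str, content: str) -> None:
    """Deliver a message along the edges of a directed topology.

    `topology` maps an agent's name to the names it may talk to;
    a star topology, for example, routes everything through a coordinator.
    """
    for receiver in topology.get(sender_name, []):
        agents[receiver].receive(sender_name, content)

# Star topology: workers talk only to the coordinator, which fans out.
agents = {n: Agent(n) for n in ("coordinator", "worker_a", "worker_b")}
topology = {
    "coordinator": ["worker_a", "worker_b"],
    "worker_a": ["coordinator"],
    "worker_b": ["coordinator"],
}
broadcast(agents, topology, "coordinator", "decompose: subtask 1 and 2")
broadcast(agents, topology, "worker_a", "result: subtask 1 done")
```

Swapping the `topology` dict for a fully connected or adaptively learned graph (as in AMAS or ReSo above) changes the message volume without touching the agent code, which is the design point these works exploit.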
Reasoning Enhancement, Reflection Mechanisms, and Autonomous Planning
Focuses on the underlying reasoning paradigms of LLM agents, covering improvements to reasoning ability, self-reflection, chain-of-thought (CoT), planning, and logical optimization in complex decision-making environments.
- Reinforce LLM Reasoning through Multi-Agent Reflection(Yurun Yuan, Tengyang Xie, 2025, International Conference on Machine Learning)
- Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search(Maohao Shen, Guangtao Zeng, Zhenting Qi, Zhang-Wei Hong, Zhenfang Chen, Wei Lu, Greg Wornell, Subhro Das, David D. Cox, Chuang Gan, 2025, International Conference on Machine Learning)
- Enhancing LLM Reasoning Capabilities Through Brokered Multi-Expert Reflection(T. Sheokand, Garveet Jain, Arshdeep Bahga, Vijay K. Madisetti, 2025, IEEE Access)
- Think-on-Graph: Deep and Responsible Reasoning of Large Language Model on Knowledge Graph(Jiashuo Sun, Chengjin Xu, Lumingyuan Tang, Sai Wang, Chen Lin, Yeyun Gong, Lionel M. Ni, H. Shum, Jian Guo, 2023, International Conference on Learning Representations)
- A Reflective Architecture for LLM-Based Systems(Parisa Salmani, Peter R. Lewis, 2025, 2025 IEEE International Conference on Autonomic Computing and Self-Organizing Systems Companion (ACSOS-C))
- SPIRAL: Symbolic LLM Planning via Grounded and Reflective Search(Yifan Zhang, Giridhar Ganapavarapu, Srideepika Jayaraman, Bhavna Agrawal, Dhaval Patel, Achille Fokoue, 2025, AAAI Conference on Artificial Intelligence)
- DMA-MCTS: Dynamic Memory-Augmented Monte-Carlo Tree Search for LLM Task Planning(Jiakang Wang, Qi Wang, Mengxia Li, Tingting Li, Yongjun Xu, 2025, 2025 IEEE International Conference on Systems, Man, and Cybernetics (SMC))
- Better than Your Teacher: LLM Agents that learn from Privileged AI Feedback(Sanjiban Choudhury, Paloma Sodhi, 2024, International Conference on Learning Representations)
- Building Energy Management Systems with LLM Agent Enhanced Natural Language Policy Explanation(Shuhua Zhang, Jiale Wei, Dejun Xiang, Yuheng Cheng, Huan Zhao, Yulu Xie, Xinlei Cai, Junhua Zhao, 2025, 2025 5th Power System and Green Energy Conference (PSGEC))
- Thinktank: Leveraging LLM Reasoning for Advanced Task Execution in CI/CD(T. Keller, 2024, 2024 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW))
- MIRROR: Multi-agent Intra- and Inter-Reflection for Optimized Reasoning in Tool Learning(Zikang Guo, Benfeng Xu, Xiaorui Wang, Zhendong Mao, 2025, International Joint Conference on Artificial Intelligence)
- ReflAct: World-Grounded Decision Making in LLM Agents via Goal-State Reflection(Jeonghye Kim, S. Rhee, Minbeom Kim, Dohyung Kim, Sangmook Lee, Youngchul Sung, Kyomin Jung, 2025, Conference on Empirical Methods in Natural Language Processing)
- WebCoT: Enhancing Web Agent Reasoning by Reconstructing Chain-of-Thought in Reflection, Branching, and Rollback(Minda Hu, Tianqing Fang, Jianshu Zhang, Junyu Ma, Zhisong Zhang, J. Zhou, Hongming Zhang, Haitao Mi, Dong Yu, Irwin King, 2025, Conference on Empirical Methods in Natural Language Processing)
- Temporal Consistency for LLM Reasoning Process Error Identification(Jiacheng Guo, Yue Wu, Jiahao Qiu, Kaixuan Huang, Xinzhe Juan, Ling Yang, Mengdi Wang, 2025, Conference on Empirical Methods in Natural Language Processing)
- HiAgent: Hierarchical Working Memory Management for Solving Long-Horizon Agent Tasks with Large Language Model(Mengkang Hu, Tianxing Chen, Qiguang Chen, Yao Mu, Wenqi Shao, Ping Luo, 2024, Annual Meeting of the Association for Computational Linguistics)
- Efficient Tool Use with Chain-of-Abstraction Reasoning(Silin Gao, Jane Dwivedi-Yu, Ping Yu, Xiaoqing Tan, Ramakanth Pasunuru, Olga Golovneva, Koustuv Sinha, Asli Celikyilmaz, Antoine Bosselut, Tianlu Wang, 2024, International Conference on Computational Linguistics)
- LLM-Guided Reinforcement Learning for Interactive Environments(Fuxue Yang, Jiawen Liu, Kan Li, 2025, Mathematics)
- PRACT: Optimizing Principled Reasoning and Acting of LLM Agent(Zhiwei Liu, Weiran Yao, Jianguo Zhang, Rithesh Murthy, Liangwei Yang, Zuxin Liu, Tian Lan, Ming Zhu, Juntao Tan, Shirley Kokane, Thai Hoang, Juan Carlos Niebles, Shelby Heinecke, Huan Wang, Silvio Savarese, Caiming Xiong, 2024, Conference on Computational Natural Language Learning)
- Beyond ReAct: A Planner-Centric Framework for Complex Tool-Augmented LLM Reasoning(Xiaolong Wei, Yuehu Dong, Xingliang Wang, Xingyu Zhang, Zhejun Zhao, Dongdong Shen, Long Xia, Dawei Yin, 2025, AAAI Conference on Artificial Intelligence)
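A recurring pattern across the reflection papers above is a draft-critique-revise loop. The following is a minimal, generic sketch of that loop with stubbed "model" calls; it does not reproduce any specific method listed here, and the stub functions (`draft`, `critique`, `revise`) are illustrative placeholders for LLM calls:

```python
def reflect_and_revise(task, draft_fn, critique_fn, revise_fn, max_rounds=3):
    """Generic reflection loop: draft an answer, have a critic inspect it,
    and revise until the critic is satisfied or the budget runs out."""
    answer = draft_fn(task)
    for _ in range(max_rounds):
        feedback = critique_fn(task, answer)
        if feedback is None:          # critic found no issues
            return answer
        answer = revise_fn(task, answer, feedback)
    return answer

# Stub model calls: a broken arithmetic draft that the critic
# catches and the reviser repairs on the first round.
draft = lambda task: "2 + 2 = 5"
critique = lambda task, ans: None if ans.endswith("4") else "re-check the sum"
revise = lambda task, ans, fb: "2 + 2 = 4"

result = reflect_and_revise("add 2 and 2", draft, critique, revise)
```

In the papers above, the draft, critic, and reviser roles may be the same model with different prompts, distinct specialized agents (as in multi-agent reflection), or a search procedure such as MCTS wrapped around this inner loop.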
Tool Use, Self-Evolution, and Task Execution
Examines the mechanisms by which agents invoke external tools, including API adaptation, tool learning, automated paradigms for task execution, and agent self-evolution driven by feedback.
- agentAR: Creating Augmented Reality Applications with Tool-Augmented LLM-based Autonomous Agents(Chenfei Zhu, Shao-Kang Hsia, Xiyun Hu, Ziyi Liu, Jingyu Shi, K. Ramani, 2025, Proceedings of the 38th Annual ACM Symposium on User Interface Software and Technology)
- DRC-Coder: Automated DRC Checker Code Generation Using LLM Autonomous Agent(Chen-Chia Chang, Chia-Tung Ho, Yaguang Li, Yiran Chen, Haoxing Ren, 2024, Proceedings of the 2025 International Symposium on Physical Design)
- El Agente: An Autonomous Agent for Quantum Chemistry(Yunheng Zou, Austin H. Cheng, Abdulrahman Aldossary, Jiaru Bai, Shi Xuan Leong, Jorge A. Campos Gonzalez Angulo, Chang-Min Choi, Cher Tian Ser, Gary Tom, Andrew Wang, Zijian Zhang, Ilya Yakavets, Han Hao, Chris Crebolder, Varinia Bernales, Al'an Aspuru-Guzik, 2025, Matter)
- LLM-guided chemical process optimization with a multi-agent approach(Tong Zeng, Srivathsan Badrinarayanan, Janghoon Ock, Cheng-Kai Lai, A. Farimani, 2025, Machine Learning: Science and Technology)
- A Three-Stage Pipeline using ReAct and Reflexion for Reliable LLM-based Java Unit Test Case Generator(Reza P. Ubaidillah, Yani Widyani, 2025, 2025 IEEE International Conference on Data and Software Engineering (ICoDSE))
- RepairAgent: An Autonomous, LLM-Based Agent for Program Repair(Islem Bouzenia, Prem Devanbu, Michael Pradel, 2024, 2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE))
- ASIC-Agent: An Autonomous Multi-Agent System for ASIC Design with Benchmark Evaluation(Ahmed Allam, Youssef Mansour, Mohamed Shalan, 2025, 2025 IEEE International Conference on LLM-Aided Design (ICLAD))
- Learning to Ask: When LLM Agents Meet Unclear Instruction(Wenxuan Wang, Juluan Shi, Chaozheng Wang, Cheryl Lee, Youliang Yuan, Jen-Tse Huang, Michael R. Lyu, 2024, Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing)
- RTBAgent: A LLM-based Agent System for Real-Time Bidding(Leng Cai, Junxuan He, Yikai Li, Junjie Liang, Yuanping Lin, Ziming Quan, Yawen Zeng, Jin Xu, 2025, Companion Proceedings of the ACM on Web Conference 2025)
- GoNoGo: An Efficient LLM-based Multi-Agent System for Streamlining Automotive Software Release Decision-Making(Arsham Gholamzadeh Khoee, Yinan Yu, R. Feldt, Andris Freimanis, P. Andersson, Dhasarathy Parthasarathy, 2024, International Conference on Testing Software and Systems)
- A feasibility study of automating radiotherapy planning with large language model agents(QingXing Wang, Zhongqiu Wang, Minghua Li, Xinye Ni, Rong Tan, Wenwen Zhang, Maitudi Wubulaishan, Wei Wang, Zhiyong Yuan, Zhen Zhang, Cong Liu, 2025, Physics in Medicine & Biology)
- StockSage: Multi-Agent LLM Powered Inventory Management System for Intelligent Supply Chain Optimization(Pranita Pingale, Faiz Asif Shaikh, Om Sanjay Bhongale, Sanvesh Satish Patil, 2026, International Journal of Innovative Science and Research Technology)
- API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs(Minghao Li, Feifan Song, Yu Bowen, Haiyang Yu, Zhoujun Li, Fei Huang, Yongbin Li, 2023, Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing)
- Fine-tuning a Large Language Model for Automating Computational Fluid Dynamics Simulations(Zhehao Dong, Zhen Lu, Yue Yang, 2025, Theoretical and Applied Mechanics Letters)
- First Field Trial of LLM-Powered AI Agent for Lifecycle Management of Autonomous Driving Optical Networks(Xiaomin Liu, Qizhi Qiu, Yihao Zhang, Yuming Cheng, L. Yi, Weisheng Hu, Q. Zhuge, 2024, Optical Fiber Communications Conference and Exhibition)
- Small LLMs Are Weak Tool Learners: A Multi-LLM Agent(Weizhou Shen, Chenliang Li, Hongzhan Chen, Ming Yan, Xiaojun Quan, Hehong Chen, Ji Zhang, Fei Huang, 2024, Conference on Empirical Methods in Natural Language Processing)
- SMART: Self-Aware Agent for Tool Overuse Mitigation(Cheng Qian, Emre Can Acikgoz, Hongru Wang, Xiusi Chen, Avirup Sil, Dilek Hakkani-Tur, Gokhan Tur, Heng Ji, 2025, Annual Meeting of the Association for Computational Linguistics)
- MT-Mol: Multi Agent System with Tool-based Reasoning for Molecular Optimization(Hyomin Kim, Yunhui Jang, Sungsoo Ahn, 2025, Conference on Empirical Methods in Natural Language Processing)
- Procedural Environment Generation for Tool-Use Agents(Michael Sullivan, Mareike Hartmann, Alexander Koller, 2025, Conference on Empirical Methods in Natural Language Processing)
- LLM experiments with simulation: Large Language Model Multi-Agent System for Simulation Model Parametrization in Digital Twins(Yuchen Xia, Daniel Dittler, Nasser Jazdi, Haonan Chen, M. Weyrich, 2024, 2024 IEEE 29th International Conference on Emerging Technologies and Factory Automation (ETFA))
- AutoReview: An LLM-based Multi-Agent System for Security Issue-Oriented Code Review(Yujia Chen, 2025, Proceedings of the 33rd ACM International Conference on the Foundations of Software Engineering)
- AgentPress: A Multi-Agent Framework for News Topic Classification with Retrieval, Reasoning, and Reflection(Qi Li, 2025, Applied and Computational Engineering)
- GTA: A Benchmark for General Tool Agents(Jize Wang, Zerun Ma, Yining Li, Songyang Zhang, Cailian Chen, Kai Chen, Xinyi Le, 2024, Neural Information Processing Systems)
- VIoTGPT: Learning to Schedule Vision Tools towards Intelligent Video Internet of Things(Yaoyao Zhong, Mengshi Qi, Rui Wang, Yuhan Qiu, Yang Zhang, Huadong Ma, 2025, AAAI Conference on Artificial Intelligence)
- Advancing Tool-Augmented Large Language Models via Meta-Verification and Reflection Learning(Zhiyuan Ma, Jiayu Liu, Xianzhen Luo, Zhenya Huang, Qingfu Zhu, Wanxiang Che, 2025, Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2)
- Tool Learning in the Wild: Empowering Language Models as Automatic Tool Agents(Zhengliang Shi, Shen Gao, Xiuyi Chen, Yue Feng, Lingyong Yan, Haibo Shi, Dawei Yin, Zhumin Chen, Suzan Verberne, Zhaochun Ren, 2024, Proceedings of the ACM on Web Conference 2025)
- AvaTaR: Optimizing LLM Agents for Tool Usage via Contrastive Reasoning(Shirley Wu, Shiyu Zhao, Qian Huang, Kexin Huang, Michihiro Yasunaga, V. Ioannidis, Karthik Subbian, J. Leskovec, James Zou, 2024, Advances in Neural Information Processing Systems 37)
- Self-Training Large Language Models for Tool-Use Without Demonstrations(Ne Luo, Aryo Pradipta Gema, Xuanli He, Emile van Krieken, Pietro Lesci, Pasquale Minervini, 2025, North American Chapter of the Association for Computational Linguistics)
- SAND: Boosting LLM Agents with Self-Taught Action Deliberation(Yu Xia, Yiran Shen, Junda Wu, Tong Yu, Sungchul Kim, Ryan A. Rossi, Lina Yao, Julian J. McAuley, 2025, Conference on Empirical Methods in Natural Language Processing)
- Towards Tool Use Alignment of Large Language Models(Zhi-Yuan Chen, Shiqi Shen, Guangyao Shen, Gong Zhi, Xu Chen, Yankai Lin, 2024, Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing)
- MLLM-Tool: A Multimodal Large Language Model for Tool Agent Learning(Chenyu Wang, Weixin Luo, Qianyu Chen, Haonan Mai, Jindi Guo, Sixun Dong, Xi Xuan, Zhengxin Li, Lin Ma, Shenghua Gao, 2024, 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV))
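The tool-use works above generally share a common skeleton: a registry of callable tools and a dispatcher that parses a structured tool call emitted by the model and executes it. A minimal hedged sketch of that skeleton (the tool names and JSON call format here are invented for illustration and correspond to no specific benchmark or API above):

```python
import json

TOOLS = {}

def tool(fn):
    """Register a Python function as a callable tool by name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def get_weather(city: str) -> str:
    # Stand-in for a real external API call.
    return f"sunny in {city}"

@tool
def add(a: float, b: float) -> float:
    return a + b

def dispatch(model_output: str):
    """Execute a tool call emitted by the model as JSON,
    e.g. {"tool": "add", "args": {"a": 1, "b": 2}}."""
    call = json.loads(model_output)
    fn = TOOLS.get(call["tool"])
    if fn is None:
        return f"error: unknown tool {call['tool']!r}"
    return fn(**call["args"])

result = dispatch('{"tool": "add", "args": {"a": 1, "b": 2}}')
```

Benchmarks such as API-Bank and GTA above evaluate exactly this interface end to end: whether the model selects the right tool name, fills arguments that the dispatcher can execute, and recovers from `unknown tool` or argument errors.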
Domain-Application-Driven System Development
Showcases real-world deployments of agents in specific industry scenarios, including software engineering, scientific research, finance, healthcare, the Internet of Things, autonomous driving, and multimedia content production.
- AIoT Smart Home via Autonomous LLM Agents(D. Rivkin, F. Hogan, Amal Feriani, Abhisek Konar, Adam Sigal, Xue Liu, Gregory Dudek, 2025, IEEE Internet of Things Journal)
- Agent Laboratory: Using LLM Agents as Research Assistants(Samuel Schmidgall, Yusheng Su, Ze Wang, Ximeng Sun, Jialian Wu, Xiaodong Yu, Jiang Liu, Zicheng Liu, E. Barsoum, 2025, Findings of the Association for Computational Linguistics: EMNLP 2025)
- LLM-Assisted Reinforcement Learning: Leveraging Lightweight Large Language Model Capabilities for Efficient Task Scheduling in Multi-Cloud Environment(Xuhao Tang, Fagui Liu, Dishi Xu, Jun Jiang, Quan Tang, Bin Wang, Qingbo Wu, C. L. Philip Chen, 2025, IEEE Transactions on Consumer Electronics)
- Doc-React: Multi-page Heterogeneous Document Question-answering(Junda Wu, Yu Xia, Tong Yu, Xiang Chen, Sai Sree Harsha, Akash V. Maharaj, Ruiyi Zhang, Victor S. Bursztyn, Sungchul Kim, Ryan A. Rossi, Julian J. McAuley, Yunyao Li, Ritwik Sinha, 2025, Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers))
- Multi-Agent LLM-powered AI for Autonomous Optical Power Commissioning of OMS Links(Yujiao Hao, Mahdi Hemmati, Mehrad Vaezi, Yuren You, Christopher Janz, 2025, 2025 European Conference on Optical Communications (ECOC))
- LLM-based Multi-Agent System for Intelligent Refactoring of Haskell Code(Shahbaz Siddeeq, Muhammad Waseem, Zeeshan Rasheed, Mahade Hasan, Jussi Rasku, Mika Saari, H. Terho, Kalle Mäkelä, Kai-Kristian Kemell, Pekka Abrahamsson, 2025, International Conference on Product Focused Software Process Improvement)
- Divergent Thoughts toward One Goal: LLM-based Multi-Agent Collaboration System for Electronic Design Automation(Haoyuan Wu, Haisheng Zheng, Zhuolun He, Bei Yu, 2025, North American Chapter of the Association for Computational Linguistics)
- SAR: A Structure-Aligned Reasoning Framework for Temporal Knowledge Graph Question Answering(Qianyi Hu, Jiaxue Liu, Xinhui Tu, Shoujin Wang, 2026, Proceedings of the AAAI Conference on Artificial Intelligence)
- Adaptive Domain Modeling with Language Models: A Multi-Agent Approach to Task Planning(Harisankar Babu, Philipp Schillinger, Tamim Asfour, 2025, 2025 IEEE 21st International Conference on Automation Science and Engineering (CASE))
- ALMAS: an Autonomous LLM-based Multi-Agent Software Engineering Framework(Vali Tawosi, Keshav Ramani, Salwa Alamir, Xiaomo Liu, 2025, 2025 40th IEEE/ACM International Conference on Automated Software Engineering Workshops (ASEW))
- Automating agentic collaborative ontology engineering with role-playing simulation of LLM-powered agents and RAG technology(Andreas Soularidis, Dimitrios Doumanas, Konstantinos Kotis, G. Vouros, 2025, The Knowledge Engineering Review)
- Editable Scene Simulation for Autonomous Driving via Collaborative LLM-Agents(Yuxi Wei, Zi Wang, Yifan Lu, Chenxin Xu, Changxing Liu, Hao Zhao, Siheng Chen, Yanfeng Wang, 2024, 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))
- A general AI agent framework for smart buildings based on large language models and ReAct strategy(Xia Yan, Xincong Yang, Nan Jin, Yu Chen, Jiaqi Li, 2025, Smart Construction)
- Controllable Traffic Simulation through LLM-Guided Hierarchical Reasoning and Refinement(Zhiyuan Liu, Leheng Li, Yuning Wang, Haotian Lin, Hao Chen, Zhizhe Liu, Lei He, Jianqiang Wang, 2024, 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS))
- Coaching Copilot: Blended Form of an LLM-Powered Chatbot and a Human Coach to Effectively Support Self-Reflection for Leadership Growth(Riku Arakawa, Hiromu Yakura, 2024, ACM Conversational User Interfaces 2024)
- Research on the Construction Technology of Electric Energy Metering Big Model and Intelligent Agent(Jiaming Zhang, Ji Xiao, Ke Zheng, Xiaoyang Dong, Ningtao Liu, Shishun Tan, Jianming Hu, Bingling Chen, 2025, 2025 IEEE 7th Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC))
- Unified Conversational Agent Using Transformer Models for General and E-Commerce Contexts(A. Naveen, Saleem Dudekula, Saakshi Mahantesh Alase, Ummareddy Hruthika, Satrughan Kumar, 2025, 2025 IEEE International Conference on Advances in Computing Research On Science Engineering and Technology (ACROSET))
- E-GPT: A Multi-Agent LLM Framework for Intelligent Educational Assistance(Tabassum Ara, Shreyansu Panda, Jason Samuel Das, Amit Das, Sidharth Vivek Prabhugoankar, 2025, 2025 1st International Conference on Advancement in Futuristic Technologies (ICAFT))
- Flow-of-Action: SOP Enhanced LLM-Based Multi-Agent System for Root Cause Analysis(Changhua Pei, Zexin Wang, Fengrui Liu, Zeyan Li, Yang Liu, Xiao He, Rong Kang, Tieying Zhang, Jianjun Chen, Jianhui Li, Gaogang Xie, Dan Pei, 2025, Companion Proceedings of the ACM on Web Conference 2025)
- Text2Reaction : Enabling Reactive Task Planning Using Large Language Models(Zejun Yang, Li Ning, Haitao Wang, Tianyu Jiang, Shaolin Zhang, Shaowei Cui, Hao Jiang, Chunpeng Li, Shuo Wang, Zhaoqi Wang, 2024, IEEE Robotics and Automation Letters)
- Latent Reward: LLM-Empowered Credit Assignment in Episodic Reinforcement Learning(Yun Qu, Yuhang Jiang, Boyuan Wang, Yixiu Mao, Qi Cheems Wang, Chang Liu, Xiangyang Ji, 2024, AAAI Conference on Artificial Intelligence)
- MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World(Yining Hong, Zishuo Zheng, Peihao Chen, Yian Wang, Junyan Li, Chuang Gan, 2024, 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))
- DataFactory: Collaborative multi-agent framework for advanced table question answering(Tong Wang, Chi Jin, Yongkang Chen, Huan Deng, Xiaohui Kuang, Gang Zhao, 2026, Information Processing & Management)
- Leveraging Fine-Tuned LLMs, RAG and ReACT for Enhanced Academic Document Analysis and Automated Research Proposal Generation(Ammar Helmey Iskandar, Ezzatul Akmal Kamaru Zaman, Azliza Mohd Ali, Farah Syazwani Mohamed Rashid, 2025, 2025 6th International Conference on Artificial Intelligence and Data Sciences (AiDAS))
- VideoAgent: Long-form Video Understanding with Large Language Model as Agent(Xiaohan Wang, Yuhui Zhang, Orr Zohar, S. Yeung-Levy, 2024, European Conference on Computer Vision)
- TableMind: An Autonomous Programmatic Agent for Tool-Augmented Table Reasoning(Chuang Jiang, Mingyue Cheng, Xiaoyu Tao, Qingyang Mao, Ouyang Jie, Qi Liu, 2025, Web Search and Data Mining)
- An LLM Agent-Based Complex Semantic Table Annotation Approach(Yilin Geng, Shujing Wang, Chuan-Ju Wang, Keqing He, Yanfei Lv, Ying Wang, Zaiwen Feng, Xiaoying Bai, 2025, International Conference on Advanced Data Mining and Applications)
- Exploring Applicability of LLM-Powered Autonomous Agents to Solve Real-life Problems: Microsoft Entra ID Administration Agent (MEAN)(Roberto Rodriguez, Nestori Syynimaa, 2024, Proceedings of the 26th International Conference on Enterprise Information Systems)
- UrbanKGent: A Unified Large Language Model Agent Framework for Urban Knowledge Graph Construction(Yansong NING, Hao Liu, 2024, Neural Information Processing Systems)
- On Simulation-Guided LLM-based Code Generation for Safe Autonomous Driving Software(Ali Nouri, Johan Andersson, Kailash De Jesus Hornig, Zhennan Fei, Emil Knabe, Håkan Sivencrona, Beatriz Cabrero-Daniel, Christian Berger, 2025, Proceedings of the 29th International Conference on Evaluation and Assessment in Software Engineering)
- RS_DeepReason: LLM-Driven Deep Reasoning for Multigranularity Remote Sensing Scene Interpretation(Cheng Yang, Jia Zhang, Qiujun Li, Wang Guo, Haifeng Li, 2026, IEEE Geoscience and Remote Sensing Letters)
- Resource Allocation for IRS-Assisted V2I Anti-Jamming Communications in Interweave CIoV Networks: A Transformer-Enhanced Multi-Agent DRL Method(Jun Wang, Feng Wu, Rong Wang, Ruiquan Lin, Liang Wu, Feng Shu, 2026, IEEE Transactions on Wireless Communications)
- Simulating Social Behavior of LLM-Based Autonomous Negotiator Agents in a Game-Theoretical Framework Using Multi-Agent Systems(Ahmad Mouri Zadeh Khaki, Ahyoung Choi, Laleh Seyyed-Kalantari, 2025, International Journal of Human–Computer Interaction)
- TableZoomer: a collaborative agent framework for large-scale table question answering(Sishi Xiong, Ziyang He, Zhongjiang He, Yu Zhao, Changzai Pan, Jie Zhang, Shuangyong Song, Yongxiang Li, 2025, Vicinagearth)
- Thematic-LM: A LLM-based Multi-agent System for Large-scale Thematic Analysis(Tingrui Qiao, Caroline Walker, Chris Cunningham, Yun Sing Koh, 2025, Proceedings of the ACM on Web Conference 2025)
- OrcaLoca: An LLM Agent Framework for Software Issue Localization(Zhongming Yu, Hejia Zhang, Yujie Zhao, Hanxian Huang, Matrix Yao, Ke Ding, Jishen Zhao, 2025, International Conference on Machine Learning)
- RepoAudit: An Autonomous LLM-Agent for Repository-Level Code Auditing(Jinyao Guo, Chengpeng Wang, Xiangzhe Xu, Zian Su, Xiangyu Zhang, 2025, International Conference on Machine Learning)
- RTLFixer: Automatically Fixing RTL Syntax Errors with Large Language Models(Yun-Da Tsai, Mingjie Liu, Haoxing Ren, 2023, Proceedings of the 61st ACM/IEEE Design Automation Conference)
- CurriculumPT: LLM-Based Multi-Agent Autonomous Penetration Testing with Curriculum-Guided Task Scheduling(Xingyu Wu, Yunzhe Tian, Yuanwan Chen, Ping Ye, Xiaoshu Cui, Jingqi Jia, Shouyang Li, Jiqiang Liu, Wenjia Niu, 2025, Applied Sciences)
- First Field-Trial Demonstration of L4 Autonomous Optical Network for Distributed AI Training Communication: An LLM-Powered Multi-AI-Agent Solution(Yihao Zhang, Qizhi Qiu, Xiaomin Liu, Dianxuan Fu, Xingyu Liu, Leyan Fei, Yuming Cheng, Lilin Yi, Weisheng Hu, Q. Zhuge, 2025, 2025 European Conference on Optical Communications (ECOC))
- MuaLLM: A Multimodal Large Language Model Agent for Circuit Design Assistance with Hybrid Contextual Retrieval-Augmented Generation(Pravallika Abbineni, Saoud Aldowaish, Colin Liechty, Soroosh Noorzad, Ali Ghazizadeh, Morteza Fayazi, 2025, 2026 31st Asia and South Pacific Design Automation Conference (ASP-DAC))
- ChatEDA: A Large Language Model Powered Autonomous Agent for EDA(Haoyuan Wu, Zhuolun He, Xinyun Zhang, Xufeng Yao, Su Zheng, Haisheng Zheng, Bei Yu, 2023, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems)
- LayoutCopilot: An LLM-Powered Multiagent Collaborative Framework for Interactive Analog Layout Design(Bingyang Liu, Haoyi Zhang, Xiaohan Gao, Zichen Kong, Xiyuan Tang, Yibo Lin, Runsheng Wang, Ru Huang, 2024, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems)
- An LLM-Enabled Multi-Agent Autonomous Mechatronics Design Framework(Zeyu Wang, Frank P.-W. Lo, Qian Chen, Yongqi Zhang, Chen Lin, Xu Chen, Zhenhua Yu, Alex J. Thompson, Eric M. Yeatman, Benny P. L. Lo, 2025, 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW))
- ProtAgents: protein discovery via large language model multi-agent collaborations combining physics and machine learning(Alireza Ghafarollahi, Markus J. Buehler, 2024, Digital Discovery)
- Large Language Model Agent as a Mechanical Designer(Yayati Jadhav, A. Farimani, 2024, Journal of engineering design)
- Large Language Model-Enabled Multi-Agent Manufacturing Systems(Jonghan Lim, B. Vogel-Heuser, Ilya Kovalenko, 2024, 2024 IEEE 20th International Conference on Automation Science and Engineering (CASE))
- OpenFOAMGPT: A retrieval-augmented large language model (LLM) agent for OpenFOAM-based computational fluid dynamics(Sandeep Pandey, Ran Xu, Wenkang Wang, Xu Chu, 2025, Physics of Fluids)
- FLAG-Trader: Fusion LLM-Agent with Gradient-based Reinforcement Learning for Financial Trading(Guojun Xiong, Zhiyang Deng, Keyi Wang, Yupeng Cao, Haohang Li, Yangyang Yu, Xueqing Peng, Mingquan Lin, Kaleb E Smith, Xiao-Yang Liu, Jimin Huang, Sophia Ananiadou, Qianqian Xie, 2025, Annual Meeting of the Association for Computational Linguistics)
- Can Large Language Model Agents Simulate Human Trust Behaviors?(Chengxing Xie, Canyu Chen, Feiran Jia, Ziyu Ye, Shiyang Lai, Kai Shu, Adel Bibi, Ziniu Hu, Philip H. S. Torr, Bernard Ghanem, G. Li, 2024, Neural Information Processing Systems)
- ElliottAgents: A Natural Language-Driven Multi-Agent System for Stock Market Analysis and Prediction(Jarosław A. Chudziak, Michał Wawer, 2025, Pacific Asia Conference on Language, Information and Computation)
- TrendSim: Simulating Trending Topics in Social Media Under Poisoning Attacks with LLM-based Multi-agent System(Zeyu Zhang, Jianxun Lian, Chen Ma, Yaning Qu, Ye Luo, Lei Wang, Rui Li, Xu Chen, Yankai Lin, Ledell Wu, Xing Xie, Ji-Rong Wen, 2024, North American Chapter of the Association for Computational Linguistics)
- Large Language Model-Based Bidding Behavior Agent and Market Sentiment Agent-Assisted Electricity Price Prediction(Xin Lu, Jing Qiu, Yi Yang, Chenxi Zhang, Jiafeng Lin, Sihai An, 2025, IEEE Transactions on Energy Markets, Policy and Regulation)
- Agent Trading Arena: A Study on Numerical Understanding in LLM-Based Agents(Tianmi Ma, Jiawei Du, Wenxin Huang, Wenjie Wang, Liang Xie, Xian Zhong, Joey Tianyi Zhou, 2025, Findings of the Association for Computational Linguistics: EMNLP 2025)
- Nicer Than Humans: How do Large Language Models Behave in the Prisoner's Dilemma?(Nicoló Fontana, Francesco Pierri, L. Aiello, 2024, International Conference on Web and Social Media)
- AutoWebGLM: A Large Language Model-based Web Navigating Agent(Hanyu Lai, Xiao Liu, Iat Long Iong, Shuntian Yao, Yuxuan Chen, Pengbo Shen, Hao Yu, Hanchen Zhang, Xiaohan Zhang, Yuxiao Dong, Jie Tang, 2024, Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining)
- WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task Execution with Strategic Exploration(Yao Zhang, Zijian Ma, Yunpu Ma, Zhen Han, Yu Wu, Volker Tresp, 2024, AAAI Conference on Artificial Intelligence)
- DeepTx: Real-Time Transaction Risk Analysis via Multi-Modal Features and LLM Reasoning(Yixuan Liu, Xinlei Li, Yi Li, 2025, 2025 40th IEEE/ACM International Conference on Automated Software Engineering (ASE))
- Conversational health agents: a personalized large language model-powered agent framework(Mahyar Abbasian, Iman Azimi, Amir M. Rahmani, Ramesh C. Jain, 2025, JAMIA Open)
- MeNTi: Bridging Medical Calculator and LLM Agent with Nested Tool Calling(Yakun Zhu, Shaohang Wei, Xu Wang, Kui Xue, Xiaofan Zhang, Shaoting Zhang, 2024, North American Chapter of the Association for Computational Linguistics)
- ClinicalAgent: Clinical Trial Multi-Agent System with Large Language Model-based Reasoning(Ling Yue, Sixue Xing, Jintai Chen, Tianfan Fu, 2024, Proceedings of the 15th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics)
- Advancing Healthcare Automation: Multi-Agent System for Medical Necessity Justification(Himanshu Pandey, Akhil Amod, Shivang Kumar, 2024, Workshop on Biomedical Natural Language Processing)
- An LLM-Based Agentic Network Traffic Incident-Report Approach Towards Explainable-AI Network Defense(Chia-Hong Chou, Arjun Sudheer, Younghee Park, 2026, Journal of Sensor and Actuator Networks)
- When Large Language Model Agents Meet 6G Networks: Perception, Grounding, and Alignment(Minrui Xu, D. Niyato, Jiawen Kang, Zehui Xiong, Shiwen Mao, Zhu Han, Dong In Kim, K. B. Letaief, 2024, IEEE Wireless Communications)
- Task-Oriented Communications for Agentic IoT: An LLM-Driven QoS/Security Policy Generation via Dynamic Model Context Protocol(Shuaishuai Guo, Jiabing Zhu, Jia Ye, Anbang Zhang, Geyong Min, 2025, 2025 Seventeenth International Conference on Wireless Communications and Signal Processing (WCSP))
- LLM Enabled Multi-Agent System for 6G Networks: Framework and Method of Dual-Loop Edge-Terminal Collaboration(Zheyan Qu, Wenbo Wang, Zitong Yu, Boquan Sun, Yang Li, Xing Zhang, 2025, IEEE Communications Magazine)
- G-Safeguard: A Topology-Guided Security Lens and Treatment on LLM-based Multi-agent Systems(Shilong Wang, Gui-Min Zhang, Miao Yu, Guancheng Wan, Fanci Meng, Chongye Guo, Kun Wang, Yang Wang, 2025, Annual Meeting of the Association for Computational Linguistics)
- Insight Agents: An LLM-Based Multi-Agent System for Data Insights(Jincheng Bai, Zhenyu Zhang, Jennifer Zhang, Jason Zhu, 2025, Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval)
- Evaluating the Efficacy of LLM-Based Reasoning for Multiobjective HPC Job Scheduling(Prachi Jadhav, Hongwei Jin, Ewa Deelman, Prasanna Balaprakash, 2025, Proceedings of the SC '25 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis)
- An LLM-Powered Agent for Real-Time Analysis of the Vietnamese IT Job Market(Minh-Thuan Nguyen, T. Vo-Thanh, Thai-Duy Dinh, Xuan-Quang Phan, Tan-Ha Mai, L. Lê, 2025, 2025 19th International Conference on Advanced Computing and Analytics (ACOMPA))
- Agentic Workflows for Improving Large Language Model Reasoning in Robotic Object-Centered Planning(Jesús Moncada-Ramírez, José Luis Matez-Bandera, Javier González Jiménez, J. Ruiz-Sarmiento, 2025, Robotics)
- Cognitive Agents in Urban Mobility: Integrating LLM Reasoning into Multi-Agent Simulations(Christian Calderón, P. Martí, Jaume Jordán, Javier Palanca, V. Julián, 2025, Sensors)
- See Widely, Think Wisely: Toward Designing a Generative Multi-agent System to Burst Filter Bubbles(Yu Zhang, Jingwei Sun, Li Feng, Cen Yao, Mingming Fan, Liuxin Zhang, Qianying Wang, Xin Geng, Yong Rui, 2024, Proceedings of the CHI Conference on Human Factors in Computing Systems)
- Transformer-Based Intelligent Tutoring System for Communication Skill Development(S. Venkatalakshmi, M. Swetha, A. Valarmathi, E. Sam Abishek, C. Suja, N. Deepa, 2025, 2025 IEEE 5th International Conference on ICT in Business Industry & Government (ICTBIG))
- Large Language Model Agents for Radio Map Generation and Wireless Network Planning(Hong Quan, Wanli Ni, Tong Zhang, Xiangyu Ye, Ziyi Xie, Shuai Wang, Yuanwei Liu, Hui Song, 2025, IEEE Networking Letters)
- Leveraging Collective Intelligence in Agile Sprint Planning: A Comparative Study of LLM Architectures(Tenaaz Aqthari Syed Muneer, 2025, 2025 International Conference on Smart & Sustainable Technology (INCSST))
- EditDuet: A Multi-Agent System for Video Non-Linear Editing(Marcelo Sandoval-Castañeda, Bryan Russell, Josef Sivic, Gregory Shakhnarovich, Fabian Caba Heilbron, 2025, Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Papers)
- CoCMT: Communication-Efficient Cross-Modal Transformer for Collaborative Perception(Rujia Wang, Xiangbo Gao, Hao Xiang, Runsheng Xu, Zhengzhong Tu, 2025, 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS))
- "Nuclear Deployed!": Analyzing Catastrophic Risks in Decision-making of Autonomous LLM Agents(Rongwu Xu, Xiaojian Li, Shuo Chen, Wei Xu, 2025, Annual Meeting of the Association for Computational Linguistics)
- Chat Demeter: a multi-agent system for plant disease diagnosis integrating CNN-transformer models(Sainan Zhang, 2026, Frontiers in Plant Science)
- AGILE: A Novel Reinforcement Learning Framework of LLM Agents(Peiyuan Feng, Yichen He, Guanhua Huang, Yuan Lin, Hanchong Zhang, Yuchen Zhang, Hang Li, 2024, Advances in Neural Information Processing Systems 37)
- Safe Offline-to-Online Multi-Agent Decision Transformer: A Safety Conscious Sequence Modeling Approach(Aamir Bader Shah, Yu Wen, Jiefu Chen, Xuqing Wu, Xin Fu, 2024, 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS))
- Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance(Ya-Ting Lu, Shenzhi Yang, Cheng Qian, Gui-Fang Chen, Qinyu Luo, Yesai Wu, Huadong Wang, X. Cong, Zhong Zhang, Yankai Lin, Weiwen Liu, Yasheng Wang, Zhiyuan Liu, Fangming Liu, Maosong Sun, 2024, International Conference on Learning Representations)
- LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent(Jianing Yang, Xuweiyi Chen, Shengyi Qian, Nikhil Madaan, Madhavan Iyengar, David F. Fouhey, Joyce Chai, 2023, 2024 IEEE International Conference on Robotics and Automation (ICRA))
- Large‐Language‐Model‐Based AI Agent for Organic Semiconductor Device Research(Qian Zhang, Yongxu Hu, Jiaxin Yan, Hengyue Zhang, Xinyi Xie, Jie Zhu, Huchao Li, Xinxin Niu, Liqiang Li, Yajing Sun, Wenping Hu, 2024, Advanced Materials)
Safety, Reliability Evaluation, and Analysis Frameworks
Research on the robustness, uncertainty, adversarial defense, and security compliance of agent systems in deployment, together with general-purpose evaluation benchmarks and visual-analysis methods targeting agents.
- MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration(Lin Xu, Zhiyuan Hu, Daquan Zhou, Hongyu Ren, Zhen Dong, Kurt Keutzer, See-Kiong Ng, Jiashi Feng, 2023, Conference on Empirical Methods in Natural Language Processing)
- LAMARL: LLM-Aided Multi-Agent Reinforcement Learning for Cooperative Policy Generation(Guobin Zhu, Rui Zhou, Wenkang Ji, Shiyu Zhao, 2025, IEEE Robotics and Automation Letters)
- AgentLens: Visual Analysis for Agent Behaviors in LLM-Based Autonomous Systems(Jiaying Lu, Bo Pan, Jieyi Chen, Yingchaojie Feng, Jingyuan Hu, Yuchen Peng, Wei Chen, 2024, IEEE Transactions on Visualization and Computer Graphics)
- An Adversary-Resistant Multi-Agent LLM System via Credibility Scoring(Sana Ebrahimi, Mohsen Dehghankar, Abolfazl Asudeh, 2025, No journal)
- Autonomous Intersection Management via Prior-Enhanced Multi-Agent Constrained Decision Transformer(Rui Zhao, Yuze Fan, Yun Li, Kui Wang, Chengyuan Zheng, Fei Gao, Zhenhai Gao, 2025, IEEE Transactions on Intelligent Transportation Systems)
- 3D-LLaVA: Towards Generalist 3D LMMs with Omni Superpoint Transformer(Jiajun Deng, Tianyu He, Li Jiang, Tianyu Wang, Feras Dayoub, Ian Reid, 2025, 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))
- ColaCare: Enhancing Electronic Health Record Modeling through Large Language Model-Driven Multi-Agent Collaboration(Zixiang Wang, Yinghao Zhu, Huiya Zhao, Xiaochen Zheng, Tianlong Wang, Wen Tang, Yasha Wang, Chengwei Pan, Ewen M. Harrison, Junyi Gao, Liantao Ma, 2024, Proceedings of the ACM on Web Conference 2025)
- Toward Interpretable and Persistent Personalization: A Memory-Augmented Agent Framework for LLM-Based Travel Planning(Ke Wang, Shuai Yan, Hao Yuan, Yanling Huang, Yuhang Wu, Fei Li, Shengying Yang, Huan Deng, 2025, IEEE Access)
- An Advanced Driving Agent with the Multimodal Large Language Model for Autonomous Vehicles(Junzhou Chen, Sidi Lu, 2024, 2024 IEEE International Conference on Mobility, Operations, Services and Technologies (MOST))
- Equipping Language Models with Tool Use Capability for Tabular Data Analysis in Finance(A. Theuma, Ehsan Shareghi, 2024, Conference of the European Chapter of the Association for Computational Linguistics)
- An LLM-Assisted AUV 3-D Path Planning Scheme Under Ocean Current Interference via Reinforcement Learning(Jiabao Wen, Zhen Li, Meng Xi, Jingyi He, 2025, IEEE Internet of Things Journal)
- AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation(Mengkang Hu, Pu Zhao, Can Xu, Qingfeng Sun, Jian-Guang Lou, Qingwei Lin, Ping Luo, S. Rajmohan, Dongmei Zhang, 2024, Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.1)
- ReAct-Driven SOC Agent with Integrated Detection Engineering for AI-Enhanced Autonomous Alert Handling(Tarek Radah, H. Chaoui, Chaimae Saadi, 2025, Journal of Information Systems Engineering and Management)
- AI-powered Automatic Item Generation for Psychological Tests: A Conceptual Framework for an LLM-based Multi-Agent AIG System(Philseok Lee, Mi-Kyung Son, Zihao Jia, 2025, Journal of Business and Psychology)
- Personality-Driven Decision-Making in LLM-Based Autonomous Agents(Lewis Newsham, Daniel Prince, 2025, Adaptive Agents and Multi-Agent Systems)
- Transformer-Based Task-Oriented Joint Source-Channel Coding for Edge AI Agent(Jian Du, Changyang She, Zhaoquan Geng, Fuchun Zheng, 2025, 2025 IEEE/CIC International Conference on Communications in China (ICCC Workshops))
- Transformer-Based Reinforcement Learning for Scalable Multi-UAV Area Coverage(Dezhi Chen, Qi Qi, Qianlong Fu, Jingyu Wang, Jianxin Liao, Zhu Han, 2024, IEEE Transactions on Intelligent Transportation Systems)
- The Cost of Dynamic Reasoning: Demystifying AI Agents and Test-Time Scaling from an AI Infrastructure Perspective(Jiin Kim, Byeong-Gon Shin, Jin-Won Chung, Minsoo Rhu, 2025, 2026 IEEE International Symposium on High Performance Computer Architecture (HPCA))
- Richelieu: Self-Evolving LLM-Based Agents for AI Diplomacy(Zhenyu Guan, Xiangyu Kong, Fangwei Zhong, Yizhou Wang, 2024, Advances in Neural Information Processing Systems 37)
- LLM-powered Multi-agent Framework for Goal-oriented Learning in Intelligent Tutoring System(Tianfu Wang, Yi Zhan, Jianxun Lian, Zhengyu Hu, Nicholas Jing Yuan, Qi Zhang, Xing Xie, Hui Xiong, 2025, Companion Proceedings of the ACM on Web Conference 2025)
- An Intelligent Maneuver Decision-Making Approach for Air Combat Based on Deep Reinforcement Learning and Transformer Networks(Wentao Li, Feng Fang, Dongliang Peng, Shuning Han, 2024, Entropy)
- Design and evaluation of an Autonomous Cyber Defence agent using DRL and an augmented LLM(Johannes F. Loevenich, Erik Adler, Tobias Hürten, R. R. F. Lopes, 2025, Computer Networks)
- Dual-Stream Hierarchical Mixed-Routing Graph Attention Network Integrating ReAct Agent-Driven Embeddings for Phage-Host Interaction Prediction(Song Jiang, Yue Huang, Xianjun Shen, Weizhong Zhao, 2025, 2025 IEEE International Conference on Bioinformatics and Biomedicine (BIBM))
- An Agentic Framework for Social Event Forecasting: Approaches Using Causality Contextualized Chain of Thought(A. Thakur, Aditya Sampath, Siddharth Krishnan, 2025, 2025 IEEE International Conference on Data Mining Workshops (ICDMW))
- Enhancing AI Systems with Agentic Workflows Patterns in Large Language Model(Aditi Singh, Abul Ehtesham, Saket Kumar, T. T. Khoei, 2024, 2024 IEEE World AI IoT Congress (AIIoT))
- Position: Uncertainty Quantification Needs Reassessment for Large-language Model Agents(Michael Kirchhof, Gjergji Kasneci, Enkelejda Kasneci, 2025, International Conference on Machine Learning)
- Vision-Language Models Can Self-Improve Reasoning via Reflection(Kanzhi Cheng, Yantao Li, Fangzhi Xu, Jianbing Zhang, Hao Zhou, Yang Liu, 2024, North American Chapter of the Association for Computational Linguistics)
- Rethinking the Reliability of Multi-agent System: A Perspective from Byzantine Fault Tolerance(Lifang Zheng, Jiawei Chen, Qinghong Yin, Jingyuan Zhang, Xinyi Zeng, Yu Tian, 2025, AAAI Conference on Artificial Intelligence)
- Hierarchical agent reflection for aligning LLM reasoning with clinical diagnostic processes(Xinda Wang, Xiaotong Li, Dengkang Zhao, Kehua Feng, Lei Liang, Zhiqiang Zhang, Keyan Ding, Huajun Chen, Bo Wan, Qiang Zhang, 2026, Health Information Science and Systems)
- DebUnc: Improving Large Language Model Agent Communication With Uncertainty Metrics(Luke Yoffe, Alfonso Amayuelas, W. Wang, 2024, Findings of the Association for Computational Linguistics: EMNLP 2025)
- Large Language Model-based Human-Agent Collaboration for Complex Task Solving(Xueyang Feng, Zhi-Yuan Chen, Yujia Qin, Yankai Lin, Xu Chen, Zhiyuan Liu, Jirong Wen, 2024, Conference on Empirical Methods in Natural Language Processing)
- Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents(Hanrong Zhang, Jingyuan Huang, Kai Mei, Yifei Yao, Zhenting Wang, Chenlu Zhan, Hongwei Wang, Yongfeng Zhang, 2024, International Conference on Learning Representations)
- Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification(Boyang Zhang, Yicong Tan, Yun Shen, Ahmed Salem, Michael Backes, Savvas Zannettou, Yang Zhang, 2024, Conference on Empirical Methods in Natural Language Processing)
- Agents Under Siege: Breaking Pragmatic Multi-Agent LLM Systems with Optimized Prompt Attacks(Rana Muhammad Shahroz Khan, Zhen Tan, Sukwon Yun, Charles Flemming, Tianlong Chen, 2025, Annual Meeting of the Association for Computational Linguistics)
- AgentSquare: Automatic LLM Agent Search in Modular Design Space(Yu Shang, Yu Li, Keyu Zhao, Likai Ma, Jiahe Liu, Fengli Xu, Yong Li, 2024, International Conference on Learning Representations)
- Facilitating Trustworthy Human-Agent Collaboration in LLM-based Multi-Agent System oriented Software Engineering(Krishna Ronanki, 2025, Proceedings of the 33rd ACM International Conference on the Foundations of Software Engineering)
- Advanced Smart Contract Vulnerability Detection via LLM-Powered Multi-Agent Systems(Zhiyuan Wei, Jing Sun, Yuqiang Sun, Ye Liu, Daoyuan Wu, Zijian Zhang, Xianhao Zhang, Meng Li, Yang Liu, Chunmiao Li, Mingchao Wan, Jin Dong, Liehuang Zhu, 2025, IEEE Transactions on Software Engineering)
- Blended RAG-Enhanced LLM Multi-Agent Framework for Anomaly Detection and Alert Management(Hao Wang, Zhiying Wang, Lei Wang, Fangfang Dang, Shuhui Wang, Peng Lin, 2025, International Conference Smart Grid and Smart Cities)
- RecMind: Large Language Model Powered Agent For Recommendation(Yancheng Wang, Ziyan Jiang, Zheng Chen, Fan Yang, Yingxue Zhou, Eunah Cho, Xing Fan, Xiaojiang Huang, Yanbin Lu, Yingzhen Yang, 2023, Findings of the Association for Computational Linguistics: NAACL 2024)
- ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities(Jiarui Lu, Thomas Holleis, Yizhe Zhang, Bernhard Aumayer, Feng Nan, Felix Bai, Shuang Ma, Shen Ma, Mengyu Li, Guoli Yin, Zirui Wang, Ruoming Pang, 2024, North American Chapter of the Association for Computational Linguistics)
- Can Compressed LLMs Truly Act? An Empirical Evaluation of Agentic Capabilities in LLM Compression(Peijie Dong, Zhenheng Tang, Xiang-Hong Liu, Lujun Li, Xiaowen Chu, Bo Li, 2025, International Conference on Machine Learning)
- ACEBench: A Comprehensive Evaluation of LLM Tool Usage(Chen Chen, Xinlong Hao, Weiwen Liu, Xu Huang, Xingshan Zeng, Shuai Yu, Dexun Li, Yue-Run Huang, Xiangcheng Liu, Xinzhi Wang, Wulong Liu, 2025, Findings of the Association for Computational Linguistics: EMNLP 2025)
- ToolQA: A Dataset for LLM Question Answering with External Tools(Yuchen Zhuang, Yue Yu, Kuan Wang, Haotian Sun, Chao Zhang, 2023, Neural Information Processing Systems)
- Offline Reinforcement Learning for LLM Multi-Step Reasoning(Huaijie Wang, Shibo Hao, Hanze Dong, Shenao Zhang, Yiling Bao, Ziran Yang, Yi Wu, 2024, Annual Meeting of the Association for Computational Linguistics)
- Unearthing Gems from Stones: Policy Optimization with Negative Sample Augmentation for LLM Reasoning(Zhaohui Yang, Shilei Jiang, Chen Hu, Linjing Li, Shihong Deng, Daxin Jiang, 2025, Conference on Empirical Methods in Natural Language Processing)
- Cerebrum (AIOS SDK): A Platform for Agent Development, Deployment, Distribution, and Discovery(Balaji Rama, Kai Mei, Yongfeng Zhang, 2025, North American Chapter of the Association for Computational Linguistics)
This survey organizes research on AI agents and large language models into five core dimensions. The first is multi-agent architectures and collaborative communication, which examines interaction mechanisms in complex systems; the second is enhancing the underlying logic of reasoning, reflection, and planning; the third is tool use and task execution, focusing on agents' autonomous tool invocation and self-iteration; the fourth is broad vertical-industry deployment, covering key production domains such as engineering, finance, and scientific research; and the last is safety assurance, reliability-evaluation frameworks, and performance benchmarking for agent systems, with the aim of advancing agents from single-function validation toward trustworthy, controllable, industrial-grade applications.
A total of 205 related references.
Code auditing is the process of reviewing code with the aim of identifying bugs. Large Language Models (LLMs) have demonstrated promising capabilities for this task without requiring compilation, while also supporting user-friendly customization. However, auditing a code repository with LLMs poses significant challenges: limited context windows and hallucinations can degrade the quality of bug reports, and analyzing large-scale repositories incurs substantial time and token costs, hindering efficiency and scalability. This work introduces an LLM-based agent, RepoAudit, designed to perform autonomous repository-level code auditing. Equipped with agent memory, RepoAudit explores the codebase on demand by analyzing data-flow facts along feasible program paths within individual functions. It further incorporates a validator module to mitigate hallucinations by verifying data-flow facts and checking the satisfiability of path conditions associated with potential bugs, thereby reducing false positives. RepoAudit detects 40 true bugs across 15 real-world benchmark projects with a precision of 78.43%, requiring on average only 0.44 hours and $2.54 per project. Also, it detects 185 new bugs in high-profile projects, among which 174 have been confirmed or fixed. We have open-sourced RepoAudit at https://github.com/PurCL/RepoAudit.
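The validator idea in the RepoAudit abstract above, checking whether the path condition attached to a candidate bug is actually satisfiable before reporting it, can be sketched as a post-filter. This is a toy illustration under assumed names (`satisfiable`, `filter_reports`) with a brute-force boolean check standing in for the SMT-style reasoning a real validator would use; it is not the paper's implementation.

```python
from itertools import product

def satisfiable(path_condition, variables):
    """Brute-force check: does any boolean assignment make every clause true?
    Each clause is a predicate over an assignment dict (a toy stand-in for
    real path-condition solving)."""
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(clause(assignment) for clause in path_condition):
            return True
    return False

def filter_reports(candidate_bugs, variables):
    """Keep only bug reports whose guarding path condition is satisfiable,
    discarding hallucinated findings that lie on infeasible paths."""
    return [bug for bug in candidate_bugs
            if satisfiable(bug["path_condition"], variables)]

# Toy example: a report guarded by `p and not p` lies on an infeasible path.
bugs = [
    {"id": "use-after-free", "path_condition": [lambda a: a["p"]]},
    {"id": "phantom", "path_condition": [lambda a: a["p"], lambda a: not a["p"]]},
]
kept = filter_reports(bugs, ["p"])
```

The second report is dropped because no assignment satisfies its contradictory condition, which is exactly how such a validator reduces false positives.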
Existing LLM-enabled multi-agent frameworks are predominantly limited to digital or simulated environments and confined to narrowly focused knowledge domains, constraining their applicability to complex engineering tasks that require the design of physical embodiment, cross-disciplinary integration, and constraint-aware reasoning. This work proposes a multi-agent autonomous mechatronics design framework, integrating expertise across mechanical design, optimization, electronics, and software engineering to autonomously generate functional prototypes with minimal direct human design input. Operating primarily through a language-driven workflow, the framework incorporates structured human feedback to ensure robust performance under real-world constraints. To validate its capabilities, the framework is applied to a real-world challenge involving autonomous water-quality monitoring and sampling, where traditional methods are labor-intensive and ecologically disruptive. Leveraging the proposed system, a fully functional autonomous vessel was developed with optimized propulsion, cost-effective electronics, and advanced control. The design process was carried out by specialized agents, including a high-level planning agent responsible for problem abstraction and dedicated agents for structural, electronics, control, and software development. This approach demonstrates the potential of LLM-based multi-agent systems to automate real-world engineering workflows and reduce reliance on extensive domain expertise.
While autonomous driving systems and intelligent transportation infrastructures become increasingly software-defined and network-connected, ensuring their cybersecurity has become a critical component of traffic safety. Large language models (LLMs) have recently shown promise in automating aspects of penetration testing, yet most existing approaches remain limited to simple, single-step exploits. They struggle to handle complex, multi-stage vulnerabilities that demand precise coordination, contextual reasoning, and knowledge reuse. This is particularly problematic in safety-critical domains, such as autonomous vehicles, where subtle software flaws can cascade across interdependent subsystems. In this work, we present CurriculumPT, a novel LLM-based penetration testing framework specifically designed for the security of intelligent systems. CurriculumPT combines curriculum learning and a multi-agent system to enable LLM agents to progressively acquire and apply exploitation skills across Common Vulnerabilities and Exposures (CVE)-based tasks. Through a structured progression from simple to complex vulnerabilities, agents build and refine an experience knowledge base that supports generalization to new attack surfaces without requiring model fine-tuning. We evaluate CurriculumPT on 15 real-world vulnerability scenarios and demonstrate that it outperforms three state-of-the-art baselines by up to 18 percentage points in exploit success rate, while achieving superior efficiency in execution time and resource usage. Our results confirm that CurriculumPT is capable of autonomous, scalable penetration testing and knowledge transfer, laying the groundwork for intelligent security auditing of modern autonomous driving systems and other cyber-physical transportation platforms.
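The curriculum loop the CurriculumPT abstract describes, ordering tasks from simple to complex while accumulating an experience knowledge base that later tasks can draw on, can be sketched as follows. The names (`run_curriculum`, `toy_attempt`) and the difficulty-vs-experience success rule are illustrative assumptions, not the paper's mechanism.

```python
def run_curriculum(tasks, attempt, knowledge_base=None):
    """Order CVE-style tasks from easy to hard; after each success, store the
    exploit trace so later (harder) tasks can reuse it. `attempt` is a
    callable (task, prior_experience) -> (success, trace); a real system
    would wrap an LLM agent here."""
    kb = knowledge_base if knowledge_base is not None else []
    results = {}
    for task in sorted(tasks, key=lambda t: t["difficulty"]):
        success, trace = attempt(task, list(kb))
        results[task["name"]] = success
        if success:
            kb.append({"task": task["name"], "trace": trace})
    return results, kb

def toy_attempt(task, prior_experience):
    # Hypothetical rule: accumulated experience lets the agent handle
    # proportionally harder tasks (skills compound across the curriculum).
    ok = task["difficulty"] <= 1 + len(prior_experience)
    return ok, f"trace:{task['name']}"

tasks = [{"name": "rce", "difficulty": 3},
         {"name": "sqli", "difficulty": 1},
         {"name": "xss", "difficulty": 2}]
results, kb = run_curriculum(tasks, toy_attempt)
```

Run in the hard-first input order without the sort, the same toy agent would fail the first task; the curriculum ordering is what lets experience build up.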
Recent developments in Large Language Model (LLM) agents are revolutionizing Autonomous Software Engineering (ASE), enabling automated coding, problem fixes, and feature improvements. However, localization -- precisely identifying software problems by navigating to relevant code sections -- remains a significant challenge. Current approaches often yield suboptimal results due to a lack of effective integration between LLM agents and precise code search mechanisms. This paper introduces OrcaLoca, an LLM agent framework that improves accuracy for software issue localization by integrating priority-based scheduling for LLM-guided action, action decomposition with relevance scoring, and distance-aware context pruning. Experimental results demonstrate that OrcaLoca becomes the new open-source state-of-the-art (SOTA) in function match rate (65.33%) on SWE-bench Lite. It also improves the final resolved rate of an open-source framework by 6.33 percentage points through its patch generation integration.
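OrcaLoca's priority-based scheduling of LLM-guided actions with relevance scoring, per the abstract above, can be sketched with a standard priority queue. The scoring formula (relevance discounted by distance from the current code focus, standing in for distance-aware pruning) and all names here are assumptions for illustration, not the paper's actual weights.

```python
import heapq

def schedule_actions(actions):
    """Priority queue over candidate localization actions: higher score
    first (heapq is a min-heap, so the score is negated). The insertion
    index breaks ties deterministically."""
    heap = []
    for order, act in enumerate(actions):
        # Distance-aware score: relevance discounted by graph distance
        # from the current focus (illustrative formula).
        score = act["relevance"] / (1 + act["distance"])
        heapq.heappush(heap, (-score, order, act["name"]))
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]

plan = schedule_actions([
    {"name": "read_far_util", "relevance": 0.9, "distance": 4},
    {"name": "read_suspect_fn", "relevance": 0.8, "distance": 0},
    {"name": "grep_symbol", "relevance": 0.3, "distance": 1},
])
```

Note how the nearby, moderately relevant function outranks the highly relevant but distant one, which is the intuition behind discounting by distance.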
We introduce DriveAgent, a modular multi-agent autonomous driving framework that leverages large language model (LLM) reasoning combined with multimodal sensor fusion for autonomous driving. DriveAgent orchestrates specialized agents operating on camera, Light Detection and Ranging (LiDAR), Inertial Measurement Unit (IMU), and Global Positioning System (GPS) with LLM-driven analytical processes to deliver temporally aligned perception, causal reasoning, and action recommendations. The framework operates through a modular agent-based pipeline comprising four principal modules: (i) a descriptive analysis agent identifying critical sensor data events based on filtered timestamps, (ii) dedicated vehicle-level analysis conducted by LiDAR and vision agents that collaboratively assess vehicle conditions and movements, (iii) environmental reasoning and causal analysis agents explaining contextual changes and their underlying mechanisms, and (iv) an urgency-aware decision-generation agent prioritizing insights and proposing timely maneuvers. This modular design empowers the LLM to effectively coordinate specialized perception and reasoning agents, delivering cohesive, interpretable insights into complex autonomous driving scenarios. Extensive experiments demonstrate that DriveAgent substantially outperforms baseline methods, achieving a 26.31% improvement in vehicle reasoning and consistent enhancements of up to 2.85% in environmental reasoning. These results highlight the effectiveness of our LLM-driven multi-agent sensor fusion framework in boosting the robustness and reliability of autonomous driving systems.
Automated program repair has emerged as a powerful technique to mitigate the impact of software bugs on system reliability and user experience. This paper introduces Repair Agent, the first work to address the program repair challenge through an autonomous agent based on a large language model (LLM). Unlike existing deep learning-based approaches, which prompt a model with a fixed prompt or in a fixed feedback loop, our work treats the LLM as an agent capable of autonomously planning and executing actions to fix bugs by invoking suitable tools. Repair Agent freely interleaves gathering information about the bug, gathering repair ingredients, and validating fixes, while deciding which tools to invoke based on the gathered information and feedback from previous fix attempts. Key contributions that enable Repair Agent include a set of tools that are useful for program repair, a dynamically updated prompt format that allows the LLM to interact with these tools, and a finite state machine that guides the agent in invoking the tools. Our evaluation on the popular Defects4J dataset demonstrates Repair Agent's effectiveness in autonomously repairing 164 bugs, including 39 bugs not fixed by prior techniques. Interacting with the LLM imposes an average cost of 270k tokens per bug, which, under the current pricing of OpenAI's GPT-3.5 model, translates to 14 cents per bug. To the best of our knowledge, this work is the first to present an autonomous, LLM-based agent for program repair, paving the way for future agent-based techniques in software engineering.
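The finite state machine that the RepairAgent abstract says guides tool invocation can be sketched as a transition table: in each state only certain events (tool outcomes) advance the agent, constraining what the LLM may do next. The state and event names below are hypothetical; the paper's actual machine differs.

```python
# Hypothetical repair states/events, not RepairAgent's actual state machine.
TRANSITIONS = {
    "collect_info": {"info_done": "search_ingredients"},
    "search_ingredients": {"ingredients_found": "write_fix"},
    "write_fix": {"fix_written": "validate"},
    "validate": {"tests_pass": "done", "tests_fail": "collect_info"},
}

def step(state, event):
    """Advance the repair FSM; unrecognized events leave the agent in place,
    so the LLM can only invoke tools legal in the current state."""
    return TRANSITIONS.get(state, {}).get(event, state)

def run(events, state="collect_info"):
    """Replay a sequence of tool outcomes and record the state trail."""
    trail = [state]
    for ev in events:
        state = step(state, ev)
        trail.append(state)
    return trail
```

A failing validation loops the agent back to information gathering, matching the abstract's description of freely interleaving gathering, fixing, and validating under FSM guidance.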
No abstract available
Cloud Operations (CloudOps) is a rapidly growing field focused on the automated management and optimization of cloud infrastructure, which is essential for organizations navigating increasingly complex cloud environments. MontyCloud Inc. is one of the major companies in the CloudOps domain that leverages autonomous bots to manage cloud compliance, security, and continuous operations. To make the platform more accessible and effective to the customers, we leveraged the use of GenAI. Developing a GenAI-based solution for autonomous CloudOps for the existing MontyCloud system presented us with various challenges such as i) diverse data sources; ii) orchestration of multiple processes; and iii) handling complex workflows to automate routine tasks. To this end, we developed MOYA, a multi-agent framework that leverages GenAI and balances autonomy with the necessary human control. This framework integrates various internal and external systems and is optimized for factors like task orchestration, security, and error mitigation while producing accurate, reliable, and relevant insights by utilizing Retrieval Augmented Generation (RAG). Evaluations of our multi-agent system with the help of practitioners as well as using automated checks demonstrate enhanced accuracy, responsiveness, and effectiveness over non-agentic approaches across complex workflows.
Multi-agent Large Language Model (LLM) systems have been leading the way in applied LLM research across a number of fields. One notable area is software development, where researchers have advanced the automation of code implementation, code testing, code maintenance, inter alia, using LLM agents. However, software development is a multifaceted environment that extends beyond just code. As such, a successful LLM system must factor in multiple stages of the software development life-cycle (SDLC). In this paper, we propose a vision for ALMAS, an Autonomous LLM-based Multi-Agent Software Engineering framework, which follows the above SDLC philosophy such that it may work within an agile software development team to perform several tasks end-to-end. ALMAS aligns its agents with agile roles, and can be used in a modular fashion to seamlessly integrate with human developers and their development environment. We showcase the progress towards ALMAS through our published works and a use case demonstrating the framework, where ALMAS is able to seamlessly generate an application and add new feature.
We demonstrate the first cross-domain cross-layer level-4 autonomous optical network via a multi-AI-agent system. Field trials show ~98% task completion rate across the distributed AI training lifecycle, ~3.2× higher than single agents using advanced LLMs.
Simulation is a widely used approach for evaluating system performance, robustness, and potential issues during design and testing. Large Language Models (LLMs) have recently shown strong potential in autonomous agent systems, including negotiation tasks, a core aspect of commerce. This paper evaluates LLM-based autonomous negotiator agents (LANAs) in a buyer-seller bargaining game to assess their decision-making and reasoning. We simulate interactions between agents embodying contrasting social behaviors: (a) Cunning vs. Kind, and (b) Greedy vs. Generous. By analyzing both the game outcomes and the agents’ internal reasoning, we find that LLMs can effectively simulate distinct social behaviors in both dialogue and decision-making. Our results offer insights into how social traits affect negotiation dynamics, emphasizing the importance of clear policy design to ensure fairness and reliability in LANA-based systems.
LLM-based multi-agent systems (MAS) have shown promise in tackling complex tasks. However, existing solutions often suffer from limited agent coordination and heavy reliance on predefined Standard Operating Procedures (SOPs), which demand extensive human input. To address these limitations, we propose MegaAgent, a large-scale autonomous LLM-based multi-agent system. MegaAgent generates agents based on task complexity and enables dynamic task decomposition, parallel execution, efficient communication, and comprehensive system monitoring of agents. In evaluations, MegaAgent demonstrates exceptional performance, successfully developing a Gobang game within 800 seconds and scaling up to 590 agents in a national policy simulation to generate multi-domain policies. It significantly outperforms existing systems, such as MetaGPT, in both task completion efficiency and scalability. By eliminating the need for predefined SOPs, MegaAgent demonstrates exceptional scalability and autonomy, setting a foundation for advancing true autonomy in MAS. Our code is available at https://github.com/Xtra-Computing/MegaAgent .
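MegaAgent's core move per the abstract above, generating agents based on task complexity rather than a predefined SOP, can be sketched as recursive task decomposition with one agent spawned per leaf subtask. The size field as a complexity proxy and the halving rule are assumptions for illustration; a real planner would let an LLM estimate complexity and propose the split.

```python
def decompose(task, max_size=2):
    """Recursively split a task until each piece fits one agent. `size` is
    a toy proxy for the complexity estimate an LLM planner would produce."""
    if task["size"] <= max_size:
        return [task]
    half = task["size"] // 2
    left = {"name": task["name"] + "/a", "size": half}
    right = {"name": task["name"] + "/b", "size": task["size"] - half}
    return decompose(left, max_size) + decompose(right, max_size)

def spawn_agents(task):
    """One agent per leaf subtask: the agent count scales with task
    complexity instead of being fixed in advance by an SOP."""
    return [f"agent:{t['name']}" for t in decompose(task)]

agents = spawn_agents({"name": "policy_sim", "size": 8})
```

Doubling the root task's size doubles the number of spawned agents, which is the scaling behavior (up to 590 agents in the policy simulation) the abstract highlights.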
Historically, scientific discovery has been a lengthy and costly process, demanding substantial time and resources from initial conception to final results. To accelerate scientific discovery, reduce research costs, and improve research quality, we introduce Agent Laboratory, an autonomous LLM-based framework capable of completing the entire research process. This framework accepts a human-provided research idea and progresses through three stages--literature review, experimentation, and report writing to produce comprehensive research outputs, including a code repository and a research report, while enabling users to provide feedback and guidance at each stage. We deploy Agent Laboratory with various state-of-the-art LLMs and invite multiple researchers to assess its quality by participating in a survey, providing human feedback to guide the research process, and then evaluate the final paper. We found that: (1) Agent Laboratory driven by o1-preview generates the best research outcomes; (2) The generated machine learning code is able to achieve state-of-the-art performance compared to existing methods; (3) Human involvement, providing feedback at each stage, significantly improves the overall quality of research; (4) Agent Laboratory significantly reduces research expenses, achieving an 84% decrease compared to previous autonomous research methods. We hope Agent Laboratory enables researchers to allocate more effort toward creative ideation rather than low-level coding and writing, ultimately accelerating scientific discovery.
Recently, Large Language Model based Autonomous System (LLMAS) has gained great popularity for its potential to simulate complicated behaviors of human societies. One of its main challenges is to present and analyze the dynamic events evolution of LLMAS. In this work, we present a visualization approach to explore the detailed statuses and agents’ behavior within LLMAS. Our approach outlines a general pipeline that organizes raw execution events from LLMAS into a structured behavior model. We leverage a behavior summarization algorithm to create a hierarchical summary of these behaviors, arranged according to their sequence over time. Additionally, we design a cause trace method to mine the causal relationship between agent behaviors. We then develop AgentLens, a visual analysis system that leverages a hierarchical temporal visualization for illustrating the evolution of LLMAS, and supports users to interactively investigate details and causes of agents’ behaviors. Two usage scenarios and a user study demonstrate the effectiveness and usability of our AgentLens.
We design and demonstrate the first field trial of an LLM-powered AI Agent for ADON. Three operation modes of the Agent are proposed for network lifecycle management, handling wavelength add/drop, soft/hard failures, and power optimization.
In advanced technology nodes, an integrated design rule checker (DRC) is often used in place-and-route tools for fast power-performance-area optimization loops. Implementing integrated DRC checkers that meet the standard of commercial DRC tools demands extensive human expertise to interpret foundry specifications, analyze layouts, and debug code iteratively. This labor-intensive process, which must be repeated for every update of a technology node, prolongs the turnaround time of circuit design. In this paper, we present DRC-Coder, a multi-agent framework with vision capabilities for automated DRC code generation. By incorporating vision language models and large language models (LLMs), DRC-Coder can effectively process textual, visual, and layout information to perform rule interpretation and coding via two specialized LLMs. We also design an auto-evaluation function for the LLMs to enable DRC code debugging. Experimental results show that, targeting a sub-3nm technology node for a state-of-the-art standard cell layout tool, DRC-Coder achieves a perfect F1 score of 1.000 in generating DRC code that meets the standard of a commercial DRC tool, far outperforming standard prompting techniques (F1=0.631). DRC-Coder can generate code for each design rule within four minutes on average, which significantly accelerates technology advancement and reduces engineering costs.
The common-sense reasoning abilities and vast general knowledge of large language models (LLMs) make them a natural fit for interpreting user requests in a smart home assistant context. LLMs, however, lack specific knowledge about the user and their home, which limits their potential impact. The smart home agent with grounded execution (SAGE) overcomes these and other limitations by using a scheme in which a user request triggers an LLM-controlled sequence of discrete actions. These actions can be used to retrieve information, interact with the user, or manipulate device states. SAGE controls this process through a dynamically constructed tree of LLM prompts, which help it decide which action to take next, whether an action was successful, and when to terminate the process. The SAGE action set augments an LLM’s capabilities to support some of the most critical requirements for a smart home assistant. These include: flexible and scalable user preference management (“Is my team playing tonight?”), access to any smart device’s full functionality without device-specific code via API reading (“Turn down the screen brightness on my dryer”), persistent device state monitoring (“Remind me to throw out the milk when I open the fridge”), natural device references using only a photo of the room (“Turn on the lamp on the dresser”), and more. We introduce a benchmark of 50 new and challenging smart home tasks where SAGE achieves a 76% success rate, significantly outperforming existing LLM-enabled baselines (30% success rate).
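The control pattern above — an LLM deciding which discrete action to take next, checking success, and deciding when to terminate — can be sketched as a loop over an action set. This is a hypothetical sketch of the pattern, not SAGE's prompt-tree implementation; the controller and actions are stubs standing in for LLM prompts and device APIs.

```python
# Hypothetical sketch of LLM-controlled discrete-action sequencing.
# The controller stub stands in for the dynamically constructed prompt tree.

def run_agent(request, choose_action, actions, max_steps=10):
    history = []
    for _ in range(max_steps):
        name = choose_action(request, history)
        if name == "terminate":
            break
        result = actions[name](request)  # retrieve info / manipulate device state
        history.append((name, result))
    return history

actions = {
    "retrieve_info": lambda r: "device list",
    "set_device":    lambda r: "lamp on",
}

def controller(request, history):  # stub for the LLM decision prompts
    done = [name for name, _ in history]
    if "retrieve_info" not in done:
        return "retrieve_info"
    if "set_device" not in done:
        return "set_device"
    return "terminate"

hist = run_agent("Turn on the lamp on the dresser", controller, actions)
```

The history passed back to the controller is what lets the LLM judge whether the previous action succeeded before choosing the next one.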
Microsoft Entra ID is Microsoft’s identity and access management solution, used by many public and private sector organisations globally. In March 2023, Microsoft retired two PowerShell modules that had enabled automation of administrative tasks, such as user management. The replacement module is based on the Microsoft Graph API, and its effective usage would require administrators to learn software development skills. In this paper, we report the results of work-in-progress research exploring the applicability of LLM-powered autonomous agents to real-life problems. We describe the design and proof-of-concept implementation of MEAN, an agent that performs Entra ID administrative tasks using the Microsoft Graph API based on natural language prompts. The results show that LLM-powered autonomous agents can perform at least simple Entra ID administrative tasks. This indicates that such agents could ease the administrative burden by removing the need to learn software development skills.
Large Language Model (LLM)-based Multi-agent Systems (MAS) have demonstrated remarkable capabilities in various complex tasks, ranging from collaborative problem-solving to autonomous decision-making. However, as these systems become increasingly integrated into critical applications, their vulnerability to adversarial attacks, misinformation propagation, and unintended behaviors has raised significant concerns. To address this challenge, we introduce G-Safeguard, a topology-guided security lens and treatment for robust LLM-MAS, which leverages graph neural networks to detect anomalies on the multi-agent utterance graph and employs topological intervention for attack remediation. Extensive experiments demonstrate that G-Safeguard: (I) exhibits significant effectiveness under various attack strategies, recovering over 40% of the performance for prompt injection; (II) is highly adaptable to diverse LLM backbones and large-scale MAS; (III) can seamlessly combine with mainstream MAS with security guarantees. The code is available at https://github.com/wslong20/G-safeguard.
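The "detect then intervene" idea above reduces to a simple operation on the utterance graph: score agents for anomaly, then cut the outgoing message edges of flagged agents so suspect utterances stop propagating. This sketch is illustrative only; the anomaly scorer is a plain stub standing in for the paper's graph neural network, and the edge representation is an assumption.

```python
# Illustrative sketch (not the G-Safeguard implementation): flag anomalous
# agents on a multi-agent utterance graph and remove their outgoing edges.

def intervene(edges, scores, threshold=0.5):
    """edges: (src, dst) message edges; scores: agent -> anomaly score.
    Returns (flagged_agents, remaining_edges)."""
    flagged = {agent for agent, s in scores.items() if s >= threshold}
    kept = [(src, dst) for (src, dst) in edges if src not in flagged]
    return flagged, kept

edges = [("A", "B"), ("B", "C"), ("M", "C")]          # M is compromised
scores = {"A": 0.1, "B": 0.2, "C": 0.15, "M": 0.9}    # stub GNN outputs
flagged, kept = intervene(edges, scores)
```

In the real system the scores come from a GNN trained on utterance-graph features, and the intervention is what lets downstream agents recover performance after, e.g., a prompt-injection attack on one node.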
Large language models (LLMs) are evolving into autonomous decision-makers, raising concerns about catastrophic risks in high-stakes scenarios, particularly in Chemical, Biological, Radiological and Nuclear (CBRN) domains. Based on the insight that such risks can originate from trade-offs between the agent's Helpfulness, Harmlessness, and Honesty (HHH) goals, we build a novel three-stage evaluation framework, which is carefully constructed to effectively and naturally expose such risks. We conduct 14,400 agentic simulations across 12 advanced LLMs, with extensive experiments and analysis. Results reveal that LLM agents can autonomously engage in catastrophic behaviors and deception, without being deliberately induced. Furthermore, stronger reasoning abilities often increase, rather than mitigate, these risks. We also show that these agents can violate instructions and superior commands. On the whole, we empirically prove the existence of catastrophic risks in autonomous LLM agents. We release our code to foster further research.
Computational chemistry tools are widely used to study the behaviour of chemical phenomena. Yet, the complexity of these tools can make them inaccessible to non-specialists and challenging even for experts. In this work, we introduce El Agente Q, an LLM-based multi-agent system that dynamically generates and executes quantum chemistry workflows from natural language user prompts. The system is built on a novel cognitive architecture featuring a hierarchical memory framework that enables flexible task decomposition, adaptive tool selection, post-analysis, and autonomous file handling and submission. El Agente Q is benchmarked on six university-level course exercises and two case studies, demonstrating robust problem-solving performance (averaging >87% task success) and adaptive error handling through in situ debugging. It also supports longer-term, multi-step task execution for more complex workflows, while maintaining transparency through detailed action trace logs. Together, these capabilities lay the foundation for increasingly autonomous and accessible quantum chemistry.
The embedding of Large Language Models (LLMs) into autonomous agents is a rapidly developing field which enables dynamic, configurable behaviours without the need for extensive domain-specific training. In our previous work, we introduced SANDMAN, a Deceptive Agent architecture leveraging the Five-Factor OCEAN personality model, demonstrating that personality induction significantly influences agent task planning. Building on these findings, this study presents a novel method for measuring and evaluating how induced personality traits affect task-selection processes (specifically planning, scheduling, and decision-making) in LLM-based agents. Our results reveal distinct task-selection patterns aligned with induced OCEAN attributes, underscoring the feasibility of designing highly plausible Deceptive Agents for proactive cyber defense strategies.
Although LLM-based agents, powered by Large Language Models (LLMs), can use external tools and memory mechanisms to solve complex real-world tasks, they may also introduce critical security vulnerabilities. However, the existing literature does not comprehensively evaluate attacks and defenses against LLM-based agents. To address this, we introduce Agent Security Bench (ASB), a comprehensive framework designed to formalize, benchmark, and evaluate the attacks and defenses of LLM-based agents, including 10 scenarios (e.g., e-commerce, autonomous driving, finance), 10 agents targeting the scenarios, over 400 tools, 27 different types of attack/defense methods, and 7 evaluation metrics. Based on ASB, we benchmark 10 prompt injection attacks, a memory poisoning attack, a novel Plan-of-Thought backdoor attack, 4 mixed attacks, and 11 corresponding defenses across 13 LLM backbones. Our benchmark results reveal critical vulnerabilities in different stages of agent operation, including system prompt, user prompt handling, tool usage, and memory retrieval, with the highest average attack success rate of 84.30%, but limited effectiveness shown in current defenses, revealing important work still to be done on agent security for the community. We also introduce a new metric to evaluate the agents' capability to balance utility and security. Our code can be found at https://github.com/agiresearch/ASB.
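The core evaluation loop that a benchmark like ASB formalizes — run each attack against each agent, with and without a defense, and report the attack success rate — can be sketched as follows. The toy agent, injection, and defense here are stand-ins of my own, not ASB's actual scenarios or methods.

```python
# Hedged sketch of an attack/defense evaluation loop. Agents, attacks,
# and the success predicate are toy stand-ins for illustration only.

def attack_success_rate(agents, attacks, defense=None):
    successes, total = 0, 0
    for agent in agents:
        for attack in attacks:
            prompt = attack("do the task")
            if defense:
                prompt = defense(prompt)
            total += 1
            successes += agent(prompt)  # True if the attack worked
    return successes / total

# Toy agent that "succumbs" whenever an injected directive survives.
agent = lambda p: "IGNORE PREVIOUS" in p
inject = lambda p: p + " IGNORE PREVIOUS instructions"
strip_defense = lambda p: p.replace("IGNORE PREVIOUS", "[filtered]")

asr_undefended = attack_success_rate([agent], [inject])
asr_defended = attack_success_rate([agent], [inject], strip_defense)
```

Real benchmarks sweep this loop over many scenarios, tools, and LLM backbones, which is where averages like the 84.30% figure above come from.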
Chemical process optimization is crucial to maximize production efficiency and economic performance. Optimization algorithms, including gradient-based solvers, numerical methods, and parameter grid searches, become impractical when operating constraints are ill-defined or unavailable, requiring engineers to rely on subjective heuristics to estimate feasible parameter ranges. To address this constraint definition bottleneck, we present a multi-agent framework of large language model (LLM) agents that autonomously infer operating constraints from minimal process descriptions, then collaboratively guide optimization using the inferred constraints. Our AutoGen-based agentic framework employs OpenAI’s o3 model, with specialized agents for constraint generation, parameter validation, simulation execution, and optimization guidance. Through two phases, (i) autonomous constraint generation using embedded domain knowledge and (ii) iterative multi-agent optimization, the framework eliminates the need for predefined operational bounds. Validated on the hydrodealkylation process across cost, yield, and yield-to-cost ratio metrics, the framework demonstrated competitive performance with conventional optimization methods while achieving a 31-fold reduction in wall-time relative to grid search, converging in under 20 minutes and requiring far fewer iterations. Beyond computational efficiency, the framework’s reasoning-guided search demonstrates sophisticated process understanding, correctly identifying utility trade-offs and applying domain-informed heuristics. Unlike conventional optimization methods such as Bayesian optimization that require predefined constraints, our approach uniquely combines autonomous constraint generation with interpretable, reasoning-guided parameter exploration.
Reproducibility analysis across five independent trials demonstrates consistent convergence behavior, while model comparison reveals that reasoning-capable LLM architectures (o3, o1) are essential for successful optimization, with standard models failing to converge effectively. This approach shows significant potential for optimization scenarios where operational constraints are poorly characterized or unavailable, particularly for emerging processes and retrofit applications.
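The two-phase pattern described in this abstract — (i) infer parameter bounds from a process description, then (ii) iterate a search within those bounds against a simulator — can be sketched minimally. The constraint "inference" and simulator below are deterministic stubs of my own, not the paper's AutoGen agents or process model; a real system would have LLM agents propose the bounds and the next candidate point.

```python
# Minimal sketch of two-phase constraint-then-optimize. All names and
# numbers are illustrative assumptions.

def infer_constraints(description):
    # Stand-in for an LLM agent with embedded domain knowledge.
    return {"temperature": (500.0, 700.0), "pressure": (1.0, 5.0)}

def optimize(constraints, simulate, steps=20):
    best_params, best_score = None, float("-inf")
    for i in range(steps):
        # Deterministic sweep in place of LLM-guided proposals.
        frac = i / max(steps - 1, 1)
        params = {k: lo + frac * (hi - lo) for k, (lo, hi) in constraints.items()}
        score = simulate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Toy objective with its optimum at the upper temperature bound.
sim = lambda p: p["temperature"] - abs(p["pressure"] - 3.0)
params, score = optimize(infer_constraints("hydrodealkylation unit"), sim)
```

The point of the structure is that phase (ii) never needs externally supplied bounds: everything the search explores comes from phase (i)'s inferred constraints.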
Scene simulation in autonomous driving has gained significant attention because of its huge potential for generating customized data. However, existing editable scene simulation approaches face limitations in terms of user interaction efficiency, multi-camera photo-realistic rendering, and integration of external digital assets. To address these challenges, this paper introduces ChatSim, the first system that enables editable photo-realistic 3D driving scene simulations via natural language commands with external digital assets. To enable editing with high command flexibility, ChatSim leverages a large language model (LLM) agent collaboration framework. To generate photo-realistic outcomes, ChatSim employs a novel multi-camera neural radiance field method. Furthermore, to unleash the potential of extensive high-quality digital assets, ChatSim employs a novel multi-camera lighting estimation method to achieve scene-consistent asset rendering. Our experiments on the Waymo Open Dataset demonstrate that ChatSim can handle complex language commands and generate corresponding photo-realistic scene videos. Code can be accessed at: https://github.com/yifanlu0227/chatSim.
Recently, autonomous agents built on large language models (LLMs) have experienced significant development and are being deployed in real-world applications. These agents can extend the base LLM's capabilities in multiple ways. For example, a well-built agent using GPT-3.5-Turbo as its core can outperform the more advanced GPT-4 model by leveraging external components. More importantly, the usage of tools enables these systems to perform actions in the real world, moving from merely generating text to actively interacting with their environment. Given the agents' practical applications and their ability to execute consequential actions, it is crucial to assess potential vulnerabilities. Such autonomous systems can cause more severe damage than a standalone language model if compromised. While some existing research has explored harmful actions by LLM agents, our study approaches the vulnerability from a different perspective. We introduce a new type of attack that causes malfunctions by misleading the agent into executing repetitive or irrelevant actions. We conduct comprehensive evaluations using various attack methods, surfaces, and properties to pinpoint areas of susceptibility. Our experiments reveal that these attacks can induce failure rates exceeding 80% in multiple scenarios. Through attacks on implemented and deployable agents in multi-agent scenarios, we accentuate the realistic risks associated with these vulnerabilities. To mitigate such attacks, we propose self-examination detection methods. However, our findings indicate these attacks are difficult to detect effectively using LLMs alone, highlighting the substantial risks associated with this vulnerability.
Table reasoning requires models to jointly perform comprehensive semantic understanding and precise numerical operations. Although recent large language model (LLM)-based methods have achieved promising results, most of them still rely on a single-turn reasoning paradigm that processes flattened tables in a single forward pass. This paradigm suffers from inherent limitations, including context overflow on large tables, weak sensitivity to continuous numerical values, and the absence of explicit tool-use and reflection. In this paper, we propose TableMind, a tuning-based autonomous programmatic table agent that simulates the human-like cognitive schema of multi-turn interaction within a lightweight LLM. Instead of adopting a training-free workflow design, TableMind learns to internalize planning, action, and reflection through a principled two-stage training strategy. To bootstrap structured table reasoning capabilities, we construct and filter high-quality reasoning data for the supervised fine-tuning (SFT) stage. To enable precise code generation, we introduce a multi-perspective reward scheme and a novel optimization objective in the reinforcement learning (RL) stage. Extensive experiments on diverse benchmarks demonstrate that TableMind consistently outperforms previous baselines, validating the effectiveness of training autonomous agents to improve overall performance.
Creating Augmented Reality (AR) applications requires expertise in both design and implementation, posing significant barriers to entry for non-expert users. While existing methods reduce some of this burden, they often fall short in flexibility or usability for complex or varied use cases. To address this, we introduce agentAR, an AR authoring system that leverages a tool-augmented large language model (LLM)–based autonomous agent to support end-to-end, in-situ AR application creation from natural language input. Built on an application structure and tool library derived from state-of-the-art AR research, the agent autonomously creates AR applications from natural language dialogue. We demonstrate the effectiveness of agentAR through a case study of six AR applications and a user study with twelve participants, showing that it significantly reduces user effort while supporting the creation of diverse and functional AR experiences.
Agents powered by large language models have shown remarkable abilities in solving complex tasks. However, most agent systems remain reactive, limiting their effectiveness in scenarios requiring foresight and autonomous decision-making. In this paper, we tackle the challenge of developing proactive agents capable of anticipating and initiating tasks without explicit human instructions. We propose a novel data-driven approach for this problem. Firstly, we collect real-world human activities to generate proactive task predictions. These predictions are then labeled by human annotators as either accepted or rejected. The labeled data is used to train a reward model that simulates human judgment and serves as an automatic evaluator of the proactiveness of LLM agents. Building on this, we develop a comprehensive data generation pipeline to create a diverse dataset, ProactiveBench, containing 6,790 events. Finally, we demonstrate that fine-tuning models with the proposed ProactiveBench can significantly elicit the proactiveness of LLM agents. Experimental results show that our fine-tuned model achieves an F1-Score of 66.47% in proactively offering assistance, outperforming all open-source and closed-source models. These results highlight the potential of our method in creating more proactive and effective agent systems, paving the way for future advancements in human-agent collaboration.
Multi-agent autonomous systems (MAS) are better at addressing challenges that span multiple domains than singular autonomous agents. This holds true within the field of software engineering (SE) as well. The state-of-the-art research on MAS within SE focuses on integrating LLMs at the core of autonomous agents to create LLM-based multi-agent autonomous (LMA) systems. However, the introduction of LMA systems into SE brings a plethora of challenges. One of the major challenges is the strategic allocation of tasks between humans and the LMA system in a trustworthy manner. To address this challenge, a RACI-based framework is proposed in this work-in-progress article, along with implementation guidelines and an example implementation of the framework. The proposed framework can facilitate efficient collaboration, ensure accountability, and mitigate potential risks associated with LLM-driven automation while aligning with the Trustworthy AI guidelines. The future steps for this work, delineating the planned empirical validation method, are also presented.
Automated Driving System (ADS) is a safety-critical software system responsible for interpreting the vehicle’s environment and making decisions accordingly. The unbounded complexity of the driving context, including unforeseeable events, necessitates continuous improvement, often achieved through iterative DevOps processes. However, DevOps processes are themselves complex, making these improvements both time- and resource-intensive. Automation of code generation for ADS using Large Language Models (LLMs) is one potential approach to address this challenge. Nevertheless, the development of ADS requires rigorous processes to verify, validate, assess, and qualify the code before it can be deployed in the vehicle and used. In this study, we developed and evaluated a prototype for automatic code generation and assessment using a designed pipeline of an LLM-based agent, a simulation model, and a rule-based feedback generator in an industrial setup. The LLM-generated code is evaluated automatically in a simulation model against multiple critical traffic scenarios, and an assessment report is provided as feedback to the LLM for modification or bug fixing. We report the experimental results of the prototype employing Codellama:34b, DeepSeek (r1:32b and Coder:33b), CodeGemma:7b, Mistral:7b, and GPT4 for Adaptive Cruise Control (ACC) and Unsupervised Collision Avoidance by Evasive Manoeuvre (CAEM). We finally assessed the tool with 11 experts at two Original Equipment Manufacturers (OEMs) by conducting an interview study.
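The generate-evaluate-feedback pipeline described above can be sketched as a bounded retry loop: generated code is scored against critical scenarios by a rule-based checker, and the failures are fed back into the next generation attempt. The generator and checker below are toy stand-ins of my own for the LLM and the simulation model.

```python
# Sketch of an iterative code-generation loop with simulation feedback.
# gen and check are illustrative stubs, not the paper's pipeline.

def generate_and_assess(generate, assess, scenarios, max_rounds=5):
    feedback = None
    for round_no in range(1, max_rounds + 1):
        code = generate(feedback)
        failures = [s for s in scenarios if not assess(code, s)]
        if not failures:
            return code, round_no  # all critical scenarios pass
        feedback = f"failed scenarios: {failures}"  # assessment report
    return None, max_rounds

# Toy generator that only handles "cut-in" after being told about it.
gen = lambda fb: {"handles": {"follow", "cut-in"} if fb else {"follow"}}
check = lambda code, scenario: scenario in code["handles"]

code, rounds = generate_and_assess(gen, check, ["follow", "cut-in"])
```

Bounding the rounds matters in the safety-critical setting: code that never passes all scenarios is rejected rather than deployed.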
Although large language models (LLMs) have revolutionized natural language processing capabilities, their practical implementation as autonomous multi-agent systems (MAS) for industrial problem-solving encounters persistent barriers. Conventional MAS architectures are fundamentally restricted by inflexible, hand-crafted graph topologies that lack contextual responsiveness, resulting in diminished efficacy across varied academic and commercial workloads. To surmount these constraints, we introduce AMAS, a paradigm-shifting framework that redefines LLM-based MAS through a novel dynamic graph designer. This component autonomously identifies task-specific optimal graph configurations via lightweight LLM adaptation, eliminating the reliance on monolithic, universally applied structural templates. Instead, AMAS exploits the intrinsic properties of individual inputs to intelligently direct query trajectories through task-optimized agent pathways. Rigorous validation across question answering, mathematical deduction, and code generation benchmarks confirms that AMAS systematically exceeds state-of-the-art single-agent and multi-agent approaches across diverse LLM architectures. Our investigation establishes that context-sensitive structural adaptability constitutes a foundational requirement for high-performance LLM MAS deployments.
Conventional mechanical design follows an iterative process in which initial concepts are refined through cycles of expert assessment and resource-intensive Finite Element Method (FEM) analysis to meet performance goals. While machine learning models have been developed to assist in parts of this process, they typically require large datasets, extensive training, and are often tailored to specific tasks, limiting their generalizability. To address these limitations, we propose a framework that leverages a pretrained Large Language Model (LLM) in conjunction with an FEM module to autonomously generate, evaluate, and refine structural designs based on performance specifications and numerical feedback. The LLM operates without domain-specific fine-tuning, using general reasoning to propose design candidates, interpret FEM-derived performance metrics, and apply structurally sound modifications. Using 2D truss structures as a testbed, we show that the LLM can effectively navigate highly discrete and multi-faceted design spaces, balance competing objectives, and identify convergence when further optimization yields diminishing returns. Compared to Non-dominated Sorting Genetic Algorithm II (NSGA-II), our method achieves faster convergence and fewer FEM evaluations. Experiments with varying temperature settings (0.5, 1.0, 1.2) and model sizes (GPT-4.1 and GPT-4.1-mini) indicate that smaller models yield higher constraint satisfaction with fewer steps, while lower temperatures enhance design consistency. These results establish LLMs as a promising new class of reasoning-based, natural language-driven optimizers for autonomous design and iterative structural refinement.
Urban knowledge graphs have recently emerged as a building block for distilling critical knowledge from multi-sourced urban data for diverse urban application scenarios. Despite its promising benefits, urban knowledge graph construction (UrbanKGC) still heavily relies on manual effort, hindering its potential advancement. This paper presents UrbanKGent, a unified large language model agent framework for urban knowledge graph construction. Specifically, we first construct the knowledgeable instruction set for UrbanKGC tasks (such as relational triplet extraction and knowledge graph completion) via heterogeneity-aware and geospatial-infused instruction generation. Moreover, we propose a tool-augmented iterative trajectory refinement module to enhance and refine the trajectories distilled from GPT-4. Through hybrid instruction fine-tuning with augmented trajectories on the Llama 2 and Llama 3 families, we obtain the UrbanKGC agent family, consisting of the UrbanKGent-7/8/13B versions. We perform a comprehensive evaluation on two real-world datasets using both human and GPT-4 self-evaluation. The experimental results demonstrate that the UrbanKGent family can not only significantly outperform 31 baselines on UrbanKGC tasks, but also surpass the state-of-the-art LLM, GPT-4, by more than 10% at approximately 20 times lower cost. Compared with the existing benchmark, the UrbanKGent family can help construct an UrbanKG with hundreds of times richer relationships using only one-fifth of the data. Our data and code are available at https://github.com/usail-hkust/UrbanKGent.
Multi-agent debates have been introduced to improve the accuracy of Large Language Models (LLMs) by having multiple agents discuss solutions to a problem over several rounds of debate. However, models often generate incorrect yet confident-sounding responses, which can mislead others. This issue arises partly because agents do not consider how confident their peers are. To address this, we propose DebUnc, a debate framework that uses uncertainty metrics to assess agent confidence. Confidence is then conveyed through a modified attention mechanism that adjusts token weights, or through textual prompts. Evaluations across benchmarks show that attention-based methods are particularly effective and that performance continues to improve as uncertainty estimation becomes more reliable. The code is available at https://github.com/lukeyoffe/debunc.
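The "textual prompt" variant of conveying confidence described above can be sketched directly: each peer answer carries an uncertainty estimate, and the debate prompt labels it so other agents can discount uncertain answers. This is an illustrative sketch, not DebUnc's implementation; the uncertainty values and the threshold are assumptions standing in for real uncertainty metrics.

```python
# Sketch of confidence-labeled debate prompts. Uncertainty values are
# stubs for real uncertainty metrics (e.g., token-level entropy).

def label_confidence(uncertainty, threshold=0.5):
    return "low confidence" if uncertainty > threshold else "high confidence"

def build_debate_prompt(question, peer_answers):
    """peer_answers: list of (agent, answer, uncertainty)."""
    lines = [f"Question: {question}"]
    for agent, answer, u in peer_answers:
        lines.append(f"{agent} ({label_confidence(u)}): {answer}")
    lines.append("Considering each peer's confidence, give your final answer.")
    return "\n".join(lines)

prompt = build_debate_prompt(
    "What is 17 * 6?",
    [("agent_1", "102", 0.1), ("agent_2", "96", 0.8)],
)
```

The attention-based variant in the paper goes further, scaling token weights by confidence inside the model rather than stating it in text.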
Long-form video understanding represents a significant challenge within computer vision, demanding a model capable of reasoning over long multi-modal sequences. Motivated by the human cognitive process for long-form video understanding, we emphasize interactive reasoning and planning over the ability to process lengthy visual inputs. We introduce a novel agent-based system, VideoAgent, that employs a large language model as a central agent to iteratively identify and compile crucial information to answer a question, with vision-language foundation models serving as tools to translate and retrieve visual information. Evaluated on the challenging EgoSchema and NExT-QA benchmarks, VideoAgent achieves 54.1% and 71.3% zero-shot accuracy with only 8.4 and 8.2 frames used on average. These results demonstrate superior effectiveness and efficiency of our method over the current state-of-the-art methods, highlighting the potential of agent-based approaches in advancing long-form video understanding.
This work presents an LLM-based agent, OpenFOAMGPT, tailored for OpenFOAM-centric computational fluid dynamics (CFD) simulations, leveraging two foundation models from OpenAI: GPT-4o (GPT means Generative Pre-trained Transformer) and a chain-of-thought-enabled o1 preview model. Both agents demonstrate success across multiple tasks. While the o1 model's token price is six times that of GPT-4o, it consistently exhibits superior performance in handling complex tasks, from zero-shot/few-shot case setup to boundary condition modifications, zero-shot turbulence model adjustments, and zero-shot code translation. Through an iterative correction loop, the agent efficiently addressed single-phase and multiphase flow, heat transfer, Reynolds-averaged Navier–Stokes modeling, large eddy simulation, and other engineering scenarios, often converging in a limited number of iterations at low token costs. To embed domain-specific knowledge, we employed a retrieval-augmented generation pipeline, demonstrating how preexisting simulation setups can further specialize the agent for subdomains such as energy and aerospace. Despite the strong performance of the agent, human oversight remains crucial for ensuring accuracy and adapting to shifting contexts. Fluctuations in model performance over time suggest the need for monitoring in mission-critical applications. Although our demonstrations focus on OpenFOAM, the adaptable nature of this framework opens the door to extending LLM-driven agents to a wide range of solvers and codes. By streamlining CFD simulations, this approach has the potential to accelerate both fundamental research and industrial engineering advancements.
Day-ahead electricity price prediction is crucial for market participants to make optimal trading decisions. The implementation of the five-minute settlement (5MS) process in the Australian National Electricity Market (NEM) on October 1, 2021, reduced the settlement interval from 30 minutes to 5 minutes. This change has led to more frequent adjustments in pricing, allowing for a more accurate reflection of real-time supply and demand conditions. However, this increased frequency has significantly heightened the complexity of price fluctuations in the wholesale market. Consequently, conventional machine learning and deep learning methods struggle to provide accurate predictions at this higher resolution. Since electricity prices are fundamentally determined by the supply-demand balance and the bidding behaviors of market participants, this work introduces individual participants' bidding behaviors into the prediction model. We fine-tune a pre-trained Large Language Model (LLM) to create bidding behavior agents, which forecast day-ahead bidding behaviors. Moreover, market sentiment plays a significant role in electricity price volatility, yet it remains challenging to quantify and assess its impact. To address this, we employ a pre-trained LLM to analyze online resources, incorporating market sentiment into the price prediction model. Additionally, to enhance the accuracy of spike predictions, we improve the conditional time series generative adversarial network (CTSGAN) model by utilizing a spike confusion matrix and further strengthen the model by integrating bidding behavior and market sentiment as inputs. Case studies demonstrate that the proposed model significantly improves both electricity price and spike prediction accuracy, offering a robust tool for market participants to navigate the complexities of the modern electricity market.
Recent breakthroughs in large language model-driven autonomous agents have revealed that multi-agent collaboration often surpasses any individual agent through collective reasoning. Inspired by the neural scaling law (increasing the number of neurons enhances performance), this study explores whether continuously adding collaborative agents can yield similar benefits. Technically, we utilize directed acyclic graphs to organize agents into a multi-agent collaboration network (MacNet), upon which their interactive reasoning is topologically orchestrated for autonomous task solving. Extensive evaluations reveal that it effectively supports collaboration among over a thousand agents, with irregular topologies outperforming regular ones. We also identify a collaborative scaling law: the overall performance follows a logistic growth pattern as agents scale, with collaborative emergence occurring earlier than traditional neural emergence. We speculate this may be because scaling agents catalyzes their multidimensional considerations during interactive reflection and refinement, thereby producing more comprehensive artifacts. The code is available at https://github.com/OpenBMB/ChatDev/tree/macnet.
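Topological orchestration over a DAG reduces to visiting agents in topological order, each refining the artifacts of its predecessors. The sketch below shows the pattern with Python's standard-library `graphlib`; the agent behavior is a stub of my own, and the diamond topology is just one example configuration, not MacNet's actual graphs.

```python
# Sketch of DAG-ordered agent orchestration using stdlib graphlib.
# Each agent receives the outputs of all its predecessor agents.
from graphlib import TopologicalSorter

def run_dag(graph, agent_fn, task):
    """graph: dict node -> iterable of predecessor nodes."""
    outputs = {}
    for node in TopologicalSorter(graph).static_order():
        inputs = [outputs[p] for p in graph.get(node, ())]
        outputs[node] = agent_fn(node, task, inputs)
    return outputs

# Toy agent: records which predecessor outputs it built on.
agent = lambda node, task, inputs: (
    f"{node}({','.join(inputs)})" if inputs else f"{node}[{task}]"
)

# A small diamond topology: a feeds b and c, which both feed d.
graph = {"a": (), "b": ("a",), "c": ("a",), "d": ("b", "c")}
outs = run_dag(graph, agent, "solve")
```

Scaling this structure to thousands of nodes, and varying the topology (chains, trees, meshes, random graphs), is exactly the axis along which the paper measures its collaborative scaling law.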
Abstract Objective Conversational Health Agents (CHAs) are interactive systems providing healthcare services, such as assistance and diagnosis. Current CHAs, especially those utilizing Large Language Models (LLMs), primarily focus on conversation aspects. However, they offer limited agent capabilities, lacking in particular multistep problem-solving, personalized conversations, and multimodal data analysis. We aim to overcome these limitations. Materials and methods We propose openCHA, an open-source LLM-powered framework designed to enable the development of conversational agents. OpenCHA offers a foundational and structured architecture and codebase, enabling researchers and developers to build and customize their CHA based on the specifics of their intended application. The framework leverages knowledge acquisition, problem-solving capabilities, and multilingual and multimodal conversations, and allows interaction with various AI platforms. We have released the framework as open source for the community on GitHub (https://github.com/Institute4FutureHealth/CHA and https://opencha.com). Results We demonstrated openCHA’s capability to develop CHAs across multiple health domains using 2 demos and 5 use cases. In diabetic patient management, the developed CHA achieved a 92.1% accuracy rate, surpassing GPT4’s 51.8%. In food recommendations, the developed CHA outperformed GPT4. The developed CHA excelled as an evaluator for mental health chatbots, recording the lowest Mean Absolute Error at 0.31, compared to competitors like GPT, Mistral, Gemini, and Claude. Additionally, the empathy-enabled CHA identified emotional states with 89% accuracy, and in physiological data analysis of heart rate from Photoplethysmography (PPG) signals, the developed CHA achieved a mean absolute error of 2.83, far lower than GPT-4o’s 8.93.
Discussion The openCHA framework enhances CHAs by enabling features such as explainability, personalization, and reliability through its integration with LLMs and external data sources. The developed CHAs face challenges like latency, token limits, and scalability. Future efforts will focus on improving planning robustness, enhancing accuracy and evaluation methods, and resolving user query ambiguity to further refine the framework’s effectiveness. Conclusion The diverse demos and use cases of openCHA demonstrate the framework’s capacity to empower the development of a wide range of CHAs for various healthcare tasks.
Large language models (LLMs) have fueled many intelligent web agents, but most existing ones perform far from satisfyingly in real-world web navigation tasks due to three factors: (1) the complexity of HTML text data, (2) the versatility of actions on webpages, and (3) task difficulty due to the open-domain nature of the web. In light of these challenges, we develop the open AutoWebGLM based on ChatGLM3-6B. AutoWebGLM can serve as a powerful automated web navigation agent that outperforms GPT-4. Inspired by human browsing patterns, we first design an HTML simplification algorithm to represent webpages with vital information preserved succinctly. We then employ a hybrid human-AI method to build web browsing data for curriculum training. Finally, we bootstrap the model by reinforcement learning and rejection sampling to further facilitate webpage comprehension, browser operations, and efficient task decomposition by itself. For comprehensive evaluation, we establish a bilingual benchmark---AutoWebBench---for real-world web navigation tasks. We evaluate AutoWebGLM across diverse web navigation benchmarks, demonstrating its potential to tackle challenging tasks in real environments. Related code, model, and data are released at https://github.com/THUDM/AutoWebGLM.
Large Language Model (LLM) based agents have garnered significant attention and are becoming increasingly popular. Furthermore, planning ability is a crucial component of an LLM-based agent, involving interaction with the environment and executing actions to complete a planning task, which generally entails achieving a desired goal from an initial state. This paper investigates enhancing the planning abilities of LLM-based agents through instruction tuning, referred to as agent training. Recent studies on agent training have demonstrated that utilizing expert-level trajectory data (sequences of action-observation pairs) for instruction-tuning LLMs effectively enhances their planning capabilities. However, existing work primarily focuses on synthesizing trajectories from manually designed planning tasks and environments. The labor-intensive nature of creating these environments and tasks impedes the generation of sufficiently varied and extensive trajectories for agent training. To address this limitation, this paper explores the automated synthesis of diverse environments and a gradual range of planning tasks, from easy to difficult. We introduce a framework, AgentGen, that leverages LLMs first to generate environments and subsequently generate planning tasks conditioned on these environments. Specifically, to improve environmental diversity, we propose using an inspiration corpus composed of various domain-specific text segments as the context for synthesizing environments. Moreover, to increase the difficulty diversity of generated planning tasks, we propose a bidirectional evolution method, Bi-Evol, that evolves planning tasks in both easier and harder directions to synthesize a task set with a smoother difficulty curve, thereby enabling LLMs to learn more effectively. These methods collectively contribute to the generation of diverse trajectory data for instruction-tuning.
Based on AgentGen, we greatly expanded the number of environments and planning tasks available for agent training. The evaluation results from AgentBoard indicate that AgentGen greatly enhances the planning capabilities of LLMs. For instance, the AgentGen instruction-tuned Llama-3.1-8B outperforms GPT-3.5 in overall performance. Moreover, the AgentGen-tuned Llama-3.1-70B model achieves state-of-the-art results in planning tasks. Project page: https://agent-gen.github.io/.
Designing de novo proteins beyond those found in nature holds significant promise for advancements in both scientific and engineering applications. Current methodologies for protein design often rely on AI-based models, such as surrogate models that address end-to-end problems by linking protein structure to material properties or vice versa. However, these models frequently focus on specific material objectives or structural properties, limiting their flexibility when incorporating out-of-domain knowledge into the design process or comprehensive data analysis is required. In this study, we introduce ProtAgents, a platform for de novo protein design based on Large Language Models (LLMs), where multiple AI agents with distinct capabilities collaboratively address complex tasks within a dynamic environment. The versatility in agent development allows for expertise in diverse domains, including knowledge retrieval, protein structure analysis, physics-based simulations, and results analysis. The dynamic collaboration between agents, empowered by LLMs, provides a versatile approach to tackling protein design and analysis problems, as demonstrated through diverse examples in this study. The problems of interest encompass designing new proteins, analyzing protein structures and obtaining new first-principles data – natural vibrational frequencies – via physics simulations. The concerted effort of the system allows for powerful automated and synergistic design of de novo proteins with targeted mechanical properties. The flexibility in designing the agents, on one hand, and their capacity for autonomous collaboration through the dynamic LLM-based multi-agent environment on the other hand, unleashes the great potential of LLMs in addressing multi-objective materials problems and opens up new avenues for autonomous materials discovery and design.
Recently, the astonishing performance of large language models (LLMs) in natural language comprehension and generation tasks has triggered extensive exploration of using them as central controllers to build agent systems. Multiple studies focus on bridging the LLMs to external tools to extend the application scenarios. However, the current LLMs' ability to perceive tool use is limited to a single text query, which may result in ambiguity in understanding the users' real intentions. LLMs are expected to eliminate that by perceiving the information in visual- or auditory-grounded instructions. Therefore, in this paper, we propose MLLM-Tool, a system incorporating open-source LLMs and multi-modal encoders so that the learned LLMs can be aware of multi-modal input instructions and then select the function-matched tool correctly. To facilitate the evaluation of the model's capability, we collect a dataset featuring multi-modal input tools from HuggingFace. Another essential feature of our dataset is that it also contains multiple potential choices for the same instruction due to the existence of identical functions and synonymous functions, which provides more potential solutions for the same query. The experiments reveal that our MLLM-Tool is capable of recommending appropriate tools for multi-modal instructions. Codes and data are available at github.com/MLLM-Tool/MLLM-Tool.
The rapid advancement of chat-based language models has led to remarkable progress in complex task-solving. However, their success heavily relies on human input to guide the conversation, which can be challenging and time-consuming. This paper explores the potential of building scalable techniques to facilitate autonomous cooperation among communicative agents, and provides insight into their "cognitive" processes. To address the challenges of achieving autonomous cooperation, we propose a novel communicative agent framework named role-playing. Our approach involves using inception prompting to guide chat agents toward task completion while maintaining consistency with human intentions. We showcase how role-playing can be used to generate conversational data for studying the behaviors and capabilities of a society of agents, providing a valuable resource for investigating conversational language models. In particular, we conduct comprehensive studies on instruction-following cooperation in multi-agent settings. Our contributions include introducing a novel communicative agent framework, offering a scalable approach for studying the cooperative behaviors and capabilities of multi-agent systems, and open-sourcing our library to support research on communicative agents and beyond: https://github.com/camel-ai/camel.
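The inception-prompted turn loop can be illustrated with a minimal sketch: an AI "user" agent issues instructions and an AI "assistant" agent solves them, both primed with role-pinning system prompts. `stub_llm` stands in for a real chat-model call, and the prompt wording is invented for illustration:

```python
def stub_llm(system_prompt, history):
    # Placeholder for a chat-model call; a real system would query an LLM
    # with this system prompt plus the shared conversation history.
    role = system_prompt.split(":")[0]
    return f"[{role}] reply #{len(history)}"

def role_play(task, rounds=3):
    """Minimal sketch of the role-playing loop: inception prompts pin
    down each agent's role and the shared task, then the two agents
    alternate turns without further human input."""
    user_sys = f"USER: give instructions to complete the task: {task}"
    asst_sys = f"ASSISTANT: follow the user's instructions for: {task}"
    history = []
    for _ in range(rounds):
        instruction = stub_llm(user_sys, history)
        history.append(("user", instruction))
        solution = stub_llm(asst_sys, history)
        history.append(("assistant", solution))
    return history
```

The key design point mirrored here is that autonomy comes from the initial prompts alone: once the loop starts, no human message enters the history.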
Large Language Model (LLM)-based agents exhibit significant potential across various domains, operating as interactive systems that process environmental observations to generate executable actions for target tasks. The effectiveness of these agents is significantly influenced by their memory mechanism, which records historical experiences as sequences of action-observation pairs. We categorize memory into two types: cross-trial memory, accumulated across multiple attempts, and in-trial memory (working memory), accumulated within a single attempt. While considerable research has optimized performance through cross-trial memory, the enhancement of agent performance through improved working memory utilization remains underexplored. Instead, existing approaches often involve directly inputting entire historical action-observation pairs into LLMs, leading to redundancy in long-horizon tasks. Inspired by human problem-solving strategies, this paper introduces HiAgent, a framework that leverages subgoals as memory chunks to manage the working memory of LLM-based agents hierarchically. Specifically, HiAgent prompts LLMs to formulate subgoals before generating executable actions and enables LLMs to proactively replace previous subgoals with summarized observations, retaining only the action-observation pairs relevant to the current subgoal. Experimental results across five long-horizon tasks demonstrate that HiAgent achieves a twofold increase in success rate and reduces the average number of steps required by 3.8. Additionally, our analysis shows that HiAgent consistently improves performance across various steps, highlighting its robustness and generalizability. Project Page: https://github.com/HiAgent2024/HiAgent.
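The subgoal-chunked working memory can be sketched as follows; `summarize` stands in for an LLM summarization call, and the class layout is an assumption for illustration rather than HiAgent's actual code:

```python
def summarize(pairs):
    # Placeholder for an LLM call that compresses a finished subgoal's
    # action-observation pairs into one short observation.
    return f"<summary of {len(pairs)} steps>"

class HierarchicalMemory:
    """Sketch of subgoal-chunked working memory: only the pairs for the
    current subgoal stay verbatim; each finished subgoal collapses into
    a single summarized observation."""

    def __init__(self):
        self.completed = []   # (subgoal, summary) for finished chunks
        self.subgoal = None   # the subgoal currently being pursued
        self.pairs = []       # verbatim pairs for the current subgoal

    def start_subgoal(self, subgoal):
        # Replace the previous chunk with its summary before moving on.
        if self.subgoal is not None:
            self.completed.append((self.subgoal, summarize(self.pairs)))
        self.subgoal, self.pairs = subgoal, []

    def record(self, action, observation):
        self.pairs.append((action, observation))

    def context(self):
        # Prompt context: compact summaries plus the live chunk only,
        # instead of the full action-observation history.
        return self.completed + [(self.subgoal, p) for p in self.pairs]
```

The redundancy saving is visible in `context()`: a long-horizon trace of hundreds of pairs shrinks to one line per finished subgoal plus the active chunk.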
Recent advances in large language models (LLMs) demonstrate substantial capabilities in natural language understanding and generation tasks. With the growing number of LLMs, how to harness the collective expertise of multiple LLMs is an exciting open direction. Toward this goal, we propose a new approach that leverages the collective strengths of multiple LLMs through a Mixture-of-Agents (MoA) methodology. In our approach, we construct a layered MoA architecture wherein each layer comprises multiple LLM agents. Each agent takes all the outputs from agents in the previous layer as auxiliary information in generating its response. MoA models achieve state-of-the-art performance on AlpacaEval 2.0, MT-Bench and FLASK, surpassing GPT-4 Omni. For example, our MoA using only open-source LLMs leads AlpacaEval 2.0 by a substantial margin, achieving a score of 65.1% compared to 57.5% by GPT-4 Omni.
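The layered flow, in which every agent conditions on all previous-layer outputs, can be sketched minimally. Stub agents and a concatenating aggregator stand in for real LLM calls; the function names are illustrative assumptions:

```python
def stub_agent(name, prompt, refs):
    # Placeholder for one LLM proposer; `refs` are all responses from
    # the previous layer, passed in as auxiliary information.
    return f"{name}:{prompt}|{len(refs)} refs"

def mixture_of_agents(prompt, layers):
    """Sketch of the layered MoA flow: every agent in layer k sees all
    outputs from layer k-1 as references; the final outputs are joined
    by a simple concatenation standing in for an aggregator agent."""
    refs = []
    for layer in layers:
        # Each layer's agents run on the same prompt plus prior refs.
        refs = [stub_agent(name, prompt, refs) for name in layer]
    return " || ".join(refs)
```

A two-layer call such as `mixture_of_agents("q", [["a", "b"], ["c"]])` shows the mechanism: the single final agent receives both first-layer drafts as references.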
3D visual grounding is a critical skill for household robots, enabling them to navigate, manipulate objects, and answer questions based on their environment. While existing approaches often rely on extensive labeled data or exhibit limitations in handling complex language queries, we propose LLM-Grounder, a novel zero-shot, open-vocabulary, Large Language Model (LLM)-based 3D visual grounding pipeline. LLM-Grounder utilizes an LLM to decompose complex natural language queries into semantic constituents and employs a visual grounding tool, such as OpenScene or LERF, to identify objects in a 3D scene. The LLM then evaluates the spatial and commonsense relations among the proposed objects to make a final grounding decision. Our method does not require any labeled training data and can generalize to novel 3D scenes and arbitrary text queries. We evaluate LLM-Grounder on the ScanRefer benchmark and demonstrate state-of-the-art zero-shot grounding accuracy. Our findings indicate that LLMs significantly improve the grounding capability, especially for complex language queries, making LLM-Grounder an effective approach for 3D vision-language tasks in robotics.
In recent developments within the research community, the integration of Large Language Models (LLMs) in creating fully autonomous agents has garnered significant interest. Despite this, LLM-based agents frequently demonstrate notable shortcomings in adjusting to dynamic environments and fully grasping human needs. In this work, we introduce the problem of LLM-based human-agent collaboration for complex task-solving, exploring their synergistic potential. In addition, we propose a Reinforcement Learning-based Human-Agent Collaboration method, ReHAC. This approach includes a policy model designed to determine the most opportune stages for human intervention within the task-solving process. We construct a human-agent collaboration dataset to train this policy model in an offline reinforcement learning environment. Our validation tests confirm the model's effectiveness. The results demonstrate that the synergistic efforts of humans and LLM-based agents significantly improve performance in complex tasks, primarily through well-planned, limited human intervention. Datasets and code are available at: https://github.com/XueyangFeng/ReHAC.
The rapid development of the large language model (LLM) presents huge opportunities for 6G communications – for example, network optimization and management – by allowing users to input task requirements to LLMs with natural language. However, directly applying native LLMs in 6G encounters various challenges, such as a lack of communication data and knowledge, and limited logical reasoning, evaluation, and refinement abilities. Integrating LLMs with the capabilities of retrieval, planning, memory, evaluation, and reflection in agents can greatly enhance the potential of LLMs for 6G communications. To this end, we propose CommLLM, a multi-agent system with customized communication knowledge and tools for solving communication-related tasks using natural language. This system consists of three components: multi-agent data retrieval (MDR), which employs the condensate and inference agents to refine and summarize communication knowledge from the knowledge base, expanding the knowledge boundaries of LLMs in 6G communications; multi-agent collaborative planning (MCP), which utilizes multiple planning agents to generate feasible solutions for the communication-related task from different perspectives based on the retrieved knowledge; and multi-agent evaluation and reflection (MER), which utilizes the evaluation agent to assess the solutions, and applies the reflection agent and refinement agent to provide improvement suggestions for current solutions. Finally, we validate the effectiveness of the proposed multi-agent system by designing a semantic communication system as a case study of 6G communications.
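The three-stage MDR/MCP/MER pipeline can be sketched as a chain of stubbed agent calls. The role names mirror the abstract, but the function signatures and control flow are illustrative assumptions, not CommLLM's actual interface:

```python
def stub_llm(role, payload):
    # Placeholder for an LLM call playing the named agent role.
    return f"{role}({payload})"

def commllm(task, knowledge_base, n_planners=2, max_rounds=2):
    """Sketch of the pipeline: MDR condenses the knowledge base, MCP
    drafts candidate solutions from several perspectives, and MER
    iteratively evaluates, reflects on, and refines the chosen one."""
    # MDR: condensate agent then inference agent refine the knowledge.
    knowledge = stub_llm("inference", stub_llm("condense", knowledge_base))
    # MCP: multiple planners propose solutions grounded in that knowledge.
    plans = [stub_llm(f"planner{i}", f"{task}|{knowledge}")
             for i in range(n_planners)]
    solution = plans[0]  # stand-in for plan selection
    # MER: evaluate, reflect, refine for a fixed number of rounds.
    for _ in range(max_rounds):
        verdict = stub_llm("evaluate", solution)
        advice = stub_llm("reflect", verdict)
        solution = stub_llm("refine", f"{solution}|{advice}")
    return solution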
The integration of a complex set of electronic design automation (EDA) tools to enhance interoperability is a critical concern for circuit designers. Recent advancements in large language models (LLMs) have showcased their exceptional capabilities in natural language processing and comprehension, offering a novel approach to interfacing with EDA tools. This research article introduces ChatEDA, an autonomous agent for EDA empowered by an LLM, AutoMage, complemented by EDA tools serving as executors. ChatEDA streamlines the design flow from the register-transfer level (RTL) to the graphic data system version II (GDSII) by effectively managing task decomposition, script generation, and task execution. Through comprehensive experimental evaluations, ChatEDA has demonstrated its proficiency in handling diverse requirements, and our fine-tuned AutoMage model has exhibited superior performance compared to GPT-4 and other similar LLMs.
Large language models (LLMs) have attracted widespread attention recently, however, their application in specialized scientific fields still requires deep adaptation. Here, an artificial intelligence (AI) agent for organic field‐effect transistors (OFETs) is designed by integrating the generative pre‐trained transformer 4 (GPT‐4) model with well‐trained machine learning (ML) algorithms. It can efficiently extract the experimental parameters of OFETs from scientific literature and reshape them into a structured database, achieving precision and recall rates both exceeding 92%. Combined with well‐trained ML models, this AI agent can further provide targeted guidance and suggestions for device design. With prompt engineering and human‐in‐loop strategies, the agent extracts sufficient information of 709 OFETs from 277 research articles across different publishers and gathers them into a standardized database containing more than 10 000 device parameters. Using this database, a ML model based on Extreme Gradient Boosting is trained for device performance judgment. Combined with the interpretation of the high‐precision model, the agent has provided a feasible optimization scheme that has tripled the charge transport properties of 2,6‐diphenyldithieno[3,2‐b:2′,3′‐d]thiophene OFETs. This work is an effective practice of LLMs in the field of organic optoelectronic devices and expands the research paradigm of organic optoelectronic materials and devices.
Large Language Models (LLMs) and multi-agent systems have shown impressive capabilities in natural language tasks but face challenges in clinical trial applications, primarily due to limited access to external knowledge. Recognizing the potential of advanced clinical trial tools that aggregate and predict based on the latest medical data, we propose an integrated solution to enhance their accessibility and utility. We introduce Clinical Agent System (ClinicalAgent), a clinical multi-agent system designed for clinical trial tasks, leveraging GPT-4, multi-agent architectures, LEAST-TO-MOST, and ReAct reasoning technology. This integration not only boosts LLM performance in clinical contexts but also introduces novel functionalities. The proposed method achieves competitive predictive performance in clinical trial outcome prediction (0.7908 PR-AUC), obtaining a 0.3326 improvement over the standard prompt method. Publicly available code can be found at https://github.com/LeoYML/clinical-agent.
We introduce ColaCare, a framework that enhances Electronic Health Record (EHR) modeling through multi-agent collaboration driven by Large Language Models (LLMs). Our approach seamlessly integrates domain-specific expert models with LLMs to bridge the gap between structured EHR data and text-based reasoning. Inspired by the Multidisciplinary Team (MDT) approach used in clinical settings, ColaCare employs two types of agents: DoctorAgents and a MetaAgent, which collaboratively analyze patient data. Expert models process and generate predictions from numerical EHR data, while LLM agents produce reasoning references and decision-making reports within the MDT-driven collaborative consultation framework. The MetaAgent orchestrates the discussion, facilitating consultations and evidence-based debates among DoctorAgents, simulating diverse expertise in clinical decision-making. We additionally incorporate the Merck Manual of Diagnosis and Therapy (MSD) medical guideline within a retrieval-augmented generation (RAG) module for medical evidence support, addressing the challenge of knowledge currency. Extensive experiments conducted on three EHR datasets demonstrate ColaCare's superior performance in clinical mortality outcome and readmission prediction tasks, underscoring its potential to revolutionize clinical decision support systems and advance personalized precision medicine. All code, case studies and a questionnaire are available at the project website: https://colacare.netlify.app.
While the recommendation system (RS) has advanced significantly through deep learning, current RS approaches usually train and fine-tune models on task-specific datasets, limiting their generalizability to new recommendation tasks and their ability to leverage external knowledge due to model scale and data size constraints. Thus, we designed an LLM-powered autonomous recommender agent, RecMind, which is capable of leveraging external knowledge, utilizing tools with careful planning to provide zero-shot personalized recommendations. We propose a Self-Inspiring algorithm to improve the planning ability. At each intermediate step, the LLM self-inspires to consider all previously explored states to plan for the next step. This mechanism greatly improves the model's ability to comprehend and utilize historical information in planning for recommendation. We evaluate RecMind's performance in various recommendation scenarios. Our experiment shows that RecMind outperforms existing zero/few-shot LLM-based recommendation baseline methods in various tasks and achieves comparable performance to a fully trained recommendation model P5.
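The Self-Inspiring idea, planning each step against all previously explored states rather than only the committed path, can be sketched as follows. The stub planner and first-candidate selection are illustrative assumptions, not RecMind's implementation:

```python
def stub_planner(task, explored, k):
    # Placeholder LLM call: propose the k-th candidate next step,
    # conditioned on *every* previously explored state.
    return f"s{len(explored)}.{k}"

def self_inspiring_plan(task, n_steps=3, branches=2):
    """Sketch of Self-Inspiring planning: at each step the planner
    generates several candidate continuations, keeps all of them in
    the explored-state pool, and commits only one to the path, so
    later steps can still draw on the discarded alternatives."""
    explored, path = [], []
    for _ in range(n_steps):
        candidates = [stub_planner(task, explored, k)
                      for k in range(branches)]
        explored.extend(candidates)     # alternatives stay visible
        path.append(candidates[0])      # stand-in for LLM selection
    return path, explored
```

Contrast with a plain chain-of-thought planner: there, `explored` and `path` would be identical, and the discarded branches would be lost to later steps.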
Traditional manufacturing faces challenges adapting to dynamic environments and quickly responding to manufacturing changes. The use of multi-agent systems has improved adaptability and coordination but requires further advancements in rapid human instruction comprehension, operational adaptability, and coordination through natural language integration. Large language models like GPT-3.5 and GPT-4 enhance multi-agent manufacturing systems by enabling agents to communicate in natural language and interpret human instructions for decision-making. This research introduces a novel framework where large language models enhance the capabilities of agents in manufacturing, making them more adaptable, and capable of processing context-specific instructions. A case study demonstrates the practical application of this framework, showing how agents can effectively communicate, understand tasks, and execute manufacturing processes, including precise G-code allocation among agents. The findings highlight the importance of continuous large language model integration into multi-agent manufacturing systems and the development of sophisticated agent communication protocols for a more flexible manufacturing system.
As deep learning technology advances Autonomous Driving (AD), existing AD methods encounter performance limitations, especially in handling corner cases, interpretability, and verifiability, which are crucial for the safety of connected and autonomous vehicles. Multimodal Large Language Models (MLLMs) demonstrate remarkable understanding and reasoning capabilities, presenting a transformative opportunity to overcome challenges faced by traditional AD algorithms. We conduct a comprehensive study on the application of MLLMs in AD, exploring their potential to address critical challenges faced by traditional AD algorithms. We construct a Visual-Question-Answering dataset for model fine-tuning to address hallucinations and poor logic analysis issues in MLLMs. We then decompose the AD decision-making process into Scene Understanding, Prediction, and Decision, allowing MLLMs to construct Chain-of-Thought to make decisions step by step. Subsequently, we propose a new framework enabling models to perform AD tasks under conditions of limited local computing resources, few-shots, multimodality, and complex scenarios, enhancing the flexibility of future AD system deployment. Our extensive experiments and in-depth analyses demonstrate the significant advantages of MLLMs for AD. We also discuss the strengths and weaknesses of existing methods, providing a detailed outlook on MLLMs in AD.
Although large language models (LLMs) have achieved significant success in various tasks, they often struggle with hallucination problems, especially in scenarios requiring deep and responsible reasoning. These issues could be partially addressed by introducing external knowledge graphs (KG) in LLM reasoning. In this paper, we propose a new LLM-KG integrating paradigm ``$\hbox{LLM}\otimes\hbox{KG}$'' which treats the LLM as an agent to interactively explore related entities and relations on KGs and perform reasoning based on the retrieved knowledge. We further implement this paradigm by introducing a new approach called Think-on-Graph (ToG), in which the LLM agent iteratively executes beam search on the KG, discovers the most promising reasoning paths, and returns the most likely reasoning results. We use a number of well-designed experiments to examine and illustrate the following advantages of ToG: 1) compared with LLMs, ToG has better deep reasoning power; 2) ToG has the ability of knowledge traceability and knowledge correctability by leveraging LLM reasoning and expert feedback; 3) ToG provides a flexible plug-and-play framework for different LLMs, KGs and prompting strategies without any additional training cost; 4) the performance of ToG with small LLMs can exceed that of large LLMs such as GPT-4 in certain scenarios, which reduces the cost of LLM deployment and application. As a training-free method with lower computational cost and better generality, ToG achieves overall SOTA in 6 out of 9 datasets where most previous SOTAs rely on additional training.
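The iterative beam search at the core of ToG can be sketched over a toy KG; `stub_score` stands in for the LLM's judgment of how promising a partial reasoning path is, and the data layout is an assumption for illustration:

```python
def stub_score(question, path):
    # Placeholder for the LLM scoring a candidate reasoning path;
    # here: prefer paths that touch entities mentioned in the question.
    return sum(1 for entity in path if entity in question)

def think_on_graph(question, kg, start, width=2, depth=2):
    """Sketch of the ToG loop: beam search over KG edges where an LLM
    (stubbed) prunes to the top-`width` partial reasoning paths at
    every depth, then the best surviving path is returned."""
    beams = [[start]]
    for _ in range(depth):
        # Expand every beam by one hop along outgoing KG edges.
        expanded = [path + [nbr] for path in beams
                    for _rel, nbr in kg.get(path[-1], [])]
        if not expanded:
            break
        expanded.sort(key=lambda p: stub_score(question, p), reverse=True)
        beams = expanded[:width]
    return beams[0]
```

Because pruning happens at every hop, the number of LLM scoring calls grows linearly with depth rather than exponentially with the KG's branching factor.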
Using commercial software for radio map generation and wireless network planning often requires complex manual operations, posing significant challenges in terms of scalability, adaptability, and user-friendliness. To address these issues, we propose an automated solution that employs large language model (LLM) agents. These agents are designed to autonomously generate radio maps and facilitate wireless network planning for specified areas, thereby minimizing the necessity for extensive manual intervention. To validate the effectiveness of our proposed solution, we develop a software platform that integrates LLM agents. Experimental results demonstrate that a large number of manual operations can be saved via the proposed LLM agent, and the automated solutions can achieve enhanced coverage and signal-to-interference-noise ratio (SINR), especially in urban environments.
Configuring computational fluid dynamics (CFD) simulations typically demands extensive domain expertise, limiting broader access. Although large language models (LLMs) have advanced scientific computing, their use in automating CFD workflows is underdeveloped. We introduce a novel approach centered on domain-specific LLM adaptation. By fine-tuning Qwen2.5-7B-Instruct on NL2FOAM, our custom dataset of 28716 natural language-to-OpenFOAM configuration pairs with chain-of-thought (CoT) annotations, we enable direct translation from natural language descriptions to executable CFD setups. A multi-agent framework orchestrates the process, autonomously verifying inputs, generating configurations, running simulations, and correcting errors. Evaluation on a benchmark of 21 diverse flow cases demonstrates state-of-the-art performance, achieving 88.7% solution accuracy and 82.6% first-attempt success rate. This significantly outperforms larger general-purpose models like Qwen2.5-72B-Instruct, DeepSeek-R1, and Llama3.3-70B-Instruct, while also requiring fewer correction iterations and maintaining high computational efficiency. The results highlight the critical role of domain-specific adaptation in deploying LLM assistants for complex engineering workflows. Our code and fine-tuned model have been deposited at https://github.com/YYgroup/AutoCFD.
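The verify-generate-run-correct loop orchestrated by the multi-agent framework can be sketched as below. Every function is a stub standing in for the fine-tuned LLM agents and the solver runner; the names and the success marker are invented for illustration:

```python
def stub_agent(role, payload):
    # Placeholder for an LLM-backed agent playing the named role.
    return f"{role}({payload})"

def stub_run(config):
    # Placeholder for launching the CFD solver; here a run "succeeds"
    # once the corrector has rewritten the configuration, simulating
    # a failing first attempt that converges after correction.
    return config.startswith("corrector"), f"log for {config[:20]}..."

def auto_cfd(description, max_retries=2):
    """Sketch of the orchestration loop: verify the natural-language
    input, generate a configuration, run the simulation, and feed
    failure logs back to a corrector agent until success or the retry
    budget is exhausted."""
    spec = stub_agent("verifier", description)
    config = stub_agent("generator", spec)
    for _ in range(max_retries + 1):
        ok, log = stub_run(config)
        if ok:
            return config
        # On failure, the corrector sees both the config and the log.
        config = stub_agent("corrector", f"{config} | {log}")
    return None
```

Bounding the loop with `max_retries` mirrors the paper's observation that fewer correction iterations are a meaningful efficiency metric, not just final accuracy.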
Large language models (LLMs) and chatbot agents are known to provide wrong outputs at times, and it was recently found that this can never be fully prevented. Hence, uncertainty quantification plays a crucial role, aiming to quantify the level of ambiguity in either one overall number or two numbers for aleatoric and epistemic uncertainty. This position paper argues that this traditional dichotomy of uncertainties is too limited for the open and interactive setup that LLM agents operate in when communicating with a user, and that we need to research avenues that enrich uncertainties in this novel scenario. We review the literature and find that popular definitions of aleatoric and epistemic uncertainties directly contradict each other and lose their meaning in interactive LLM agent settings. Hence, we propose three novel research directions that focus on uncertainties in such human-computer interactions: underspecification uncertainties, for when users do not provide all information or define the exact task at the outset; interactive learning, to ask follow-up questions and reduce the uncertainty about the current context; and output uncertainties, to utilize the rich language and speech space to express uncertainties as more than mere numbers. We expect that these new ways of dealing with and communicating uncertainties will lead to LLM agent interactions that are more transparent, trustworthy, and intuitive.
Objective. Radiotherapy planning requires significant expertise to balance tumor control and organ-at-risk (OAR) sparing. Automated planning can improve both efficiency and quality. This study introduces GPT-Plan, a novel multi-agent system powered by the GPT-4 family of large language models (LLMs), for automating iterative radiotherapy plan optimization. Approach. GPT-Plan uses LLM-driven agents, mimicking the collaborative clinical workflow of a dosimetrist and physicist, to iteratively generate and evaluate text-based radiotherapy plans based on predefined criteria. Supporting tools assist the agents by leveraging historical plans, mitigating LLM hallucinations, and balancing exploration and exploitation. Performance was evaluated on 12 lung (IMRT) and 5 cervical (VMAT) cancer cases, benchmarked against the ECHO auto-planning method and manual plans. The impact of historical plan retrieval on efficiency was also assessed. Results. For IMRT lung cancer cases, GPT-Plan generated high-quality plans, demonstrating superior target coverage and homogeneity compared to ECHO while maintaining comparable or better OAR sparing. For VMAT cervical cancer cases, plan quality was comparable to a senior physicist and consistently superior to a junior physicist, particularly for OAR sparing. Retrieving historical plans significantly reduced the number of required optimization iterations for lung cases (p < 0.01) and yielded iteration counts comparable to those of the senior physicist for cervical cases (p = 0.313). Occasional LLM hallucinations were mitigated by self-reflection mechanisms. One limitation was the inaccuracy of vision-based LLMs in interpreting dose images. Significance. This pioneering study demonstrates the feasibility of automating radiotherapy planning using LLM-powered agents for complex treatment decision-making tasks. While challenges remain in addressing LLM limitations, ongoing advancements hold potential for further refining and expanding GPT-Plan’s capabilities.
Large Language Models (LLMs) have significantly advanced natural language processing, demonstrating exceptional reasoning, tool usage, and memory capabilities. As their applications expand into multi-agent environments, there arises a need for a comprehensive evaluation framework that captures LLMs’ reasoning, planning, collaboration, and other social abilities. This work introduces a novel competition-based benchmark framework specifically designed to assess LLMs within multi-agent settings, providing quantitative metrics to evaluate their judgment, reasoning, deception, self-awareness, cooperation, coordination, and rationality. We utilize two social deduction games alongside three game-theory scenarios to create diverse environments. Our framework is fortified with a probabilistic graphical modeling (PGM) method, enhancing the LLMs’ capabilities in navigating complex social and cognitive dimensions. We evaluate seven LLMs, quantitatively highlighting a significant capability gap of over threefold between the strongest, GPT o1, and the weakest, Llama-2-70B. It also confirms that our PGM enhancement boosts the abilities of all selected models by an average of 37%. Our data and code can be found at https://github.com/cathyxl/MAgIC.
AI agents based on multimodal large language models (LLMs) are expected to revolutionize human-computer interaction, and offer more personalized assistant services across various domains like healthcare, education, manufacturing, and entertainment. Deploying LLM agents in 6G networks enables users to access previously expensive AI assistant services via mobile devices democratically, thereby reducing interaction latency and better preserving user privacy. Nevertheless, the limited capacity of mobile devices constrains the effectiveness of deploying and executing local LLMs, which necessitates offloading complex tasks to global LLMs running on edge servers during long-horizon interactions. In this article, we propose a split learning system for LLM agents in 6G networks, leveraging the collaboration between mobile devices and edge servers, where multiple LLMs with different roles are distributed across mobile devices and edge servers to perform user-agent interactive tasks collaboratively. In the proposed system, LLM agents are split into perception, grounding, and alignment modules, facilitating inter-module communications to meet extended user requirements on 6G network functions, including integrated sensing and communication, digital twins, and task-oriented communications. Furthermore, we introduce a novel model caching algorithm for LLMs within the proposed system to improve model utilization in context, thus reducing network costs of the collaborative mobile and edge LLM agents.
Large Language Model (LLM) agents have been increasingly adopted as simulation tools to model humans in social science and role-playing applications. However, one fundamental question remains: can LLM agents really simulate human behavior? In this paper, we focus on one critical and elemental behavior in human interactions, trust, and investigate whether LLM agents can simulate human trust behavior. We first find that LLM agents generally exhibit trust behavior, referred to as agent trust, under the framework of Trust Games, which are widely recognized in behavioral economics. Then, we discover that GPT-4 agents manifest high behavioral alignment with humans in terms of trust behavior, indicating the feasibility of simulating human trust behavior with LLM agents. In addition, we probe the biases of agent trust and differences in agent trust towards other LLM agents and humans. We also explore the intrinsic properties of agent trust under conditions including external manipulations and advanced reasoning strategies. Our study provides new insights into the behaviors of LLM agents and the fundamental analogy between LLMs and humans beyond value alignment. We further illustrate broader implications of our discoveries for applications where trust is paramount.
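The Trust Games referenced above follow a standard payoff structure from behavioral economics; a minimal sketch using the conventional tripling multiplier (the endowment and multiplier are textbook defaults, not values from the paper):

```python
def trust_game(endowment, sent, returned_fraction, multiplier=3):
    """One round of the Trust Game from behavioral economics.

    The trustor sends `sent` of their endowment; it is multiplied on the
    way to the trustee, who returns a fraction of what arrived.
    Returns (trustor_payoff, trustee_payoff).
    """
    assert 0 <= sent <= endowment
    received = sent * multiplier
    returned = received * returned_fraction
    trustor = endowment - sent + returned
    trustee = received - returned
    return trustor, trustee

# Full trust met with an even split leaves both players better off
# than the no-trust baseline of (endowment, 0).
trustor, trustee = trust_game(endowment=10, sent=10, returned_fraction=0.5)
```

Measuring how much an LLM agent chooses to send, and how that tracks human subjects' choices, is the kind of behavioral-alignment comparison the paper performs.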
Human beings possess the capability to multiply a mélange of multisensory cues while actively exploring and interacting with the 3D world. Current multi-modal large language models, however, passively absorb sensory data as inputs, lacking the capacity to actively interact with the objects in the 3D environment and dynamically collect their multisensory information. To usher in the study of this area, we propose MultiPLY, a multisensory embodied large language model that could incorporate multisensory interactive data, including visual, audio, tactile, and thermal information into large language models, thereby establishing the correlation among words, actions, and percepts. To this end, we first collect Multisensory Universe, a large-scale multisensory interaction dataset comprising 500k data by deploying an LLM-powered embodied agent to engage with the 3D environment. To perform instruction tuning with pre-trained LLM on such generated data, we first encode the 3D scene as abstracted object-centric representations, and then introduce action tokens denoting that the embodied agent takes certain actions within the environment, as well as state tokens that represent the multisensory state observations of the agent at each time step. At inference time, MultiPLY could generate action tokens, instructing the agent to take the action in the environment and obtain the next multisensory state observation. The observation is then appended back to the LLM via state tokens to generate subsequent text or action tokens. We demonstrate that MultiPLY outperforms baselines by a large margin through a diverse set of embodied tasks involving object retrieval, tool use, multisensory captioning, and task decomposition.
This paper explores the significant shift towards agentic workflows in the application of Large Language Models (LLMs), moving away from traditional, linear interactions between users and AI. Through a case study analysis, we highlight the effectiveness of agentic workflows, which facilitate a more dynamic and iterative engagement, in improving outcomes in tasks such as question answering, code generation or stock analysis. Central to the agentic workflow are four foundational design patterns: reflection, planning, multi-agent collaboration, and tool utilization. These components are crucial for boosting LLM productivity and enhancing performance. The study demonstrates how agentic workflows, by promoting an iterative and reflective process, can serve as a crucial step towards achieving Artificial General Intelligence (AGI).
With the rapid development of smart grid technology, the demand for high precision, real-time performance, and adaptive capability in the field of energy metering is becoming increasingly prominent. This study focuses on the construction of large-scale models and intelligent agents in the field of electric energy metering, and proposes an intelligent metering framework that integrates multimodal data and deep learning technology. By integrating power quality parameters such as voltage, current, harmonic content, as well as multi-source heterogeneous data such as environmental variables and equipment status, a hybrid architecture model based on Transformer and Graph Neural Network (GNN) was constructed to achieve dynamic error prediction and anomaly detection of energy metering data. The experimental results show that the accuracy of the model in harmonic distortion rate prediction task reaches 98.2%, which is 21.4% higher than traditional methods. At the same time, it can accurately identify low-frequency anomalies (recall rate 92.3%), verifying its robustness in complex working conditions.
Autonomous Intersection Management (AIM) systems present a novel paradigm for the cooperative control of Connected and Automated Vehicles (CAVs) at unsignalized intersections in future cities. Although Reinforcement Learning (RL) offers potential for increased computational efficiency and optimized solutions, challenges remain. These include limited inference capabilities and poor generalization due to simplified neural networks, along with insufficient safety-focused policy optimization. This study presents a novel offline-to-online framework, Prior-Enhanced Multi-Agent Constrained Decision Transformer (PE-MACDT), designed to tackle these challenges. The process begins with sequential decision-making using offline safe RL, which determines optimal actions through autoregressive modeling based on past states, actions, and both reward and cost returns. Leveraging the superior reasoning abilities and strong generalization of large language models like GPT-x and BERT, the sequence modeling challenges are addressed using the Transformer architecture, enhanced by sequence-level entropy regularizers to foster policy exploration. Subsequently, the safety policy learned from the offline dataset is deployed in the online environment and fine-tuned using the Multi-Agent Constrained Policy Optimization (MACPO) method combined with prior knowledge. This approach employs trust and constraint domains for policy updates, ensuring adherence to high standards of safety, comfort, and efficiency in dynamic traffic environments. Simulation results show our methodology outperforms state-of-the-art AIM methods in training convergence speed and asymptotic performance, as well as post-deployment outcomes in traffic efficiency, driving safety, and passenger comfort. The integration of offline pre-training with MACDT and online fine-tuning using MACPO offers a groundbreaking approach with significant potential for advancements in intelligent transportation systems.
Effective communication skills are critical to academic and professional success, yet neither traditional tutoring nor early Intelligent Tutoring Systems (ITS) reliably provide individualized, scalable, and context-sensitive feedback. To overcome these limitations, this work develops a Transformer-Based Intelligent Tutoring System intended to build communication skills more efficiently by integrating context-dependent discourse fusion, transformer-weighted fluency evaluation, and pragmatic intent matching. The system was implemented with fine-tuned transformer models and evaluated on a simulation of large-scale learner interaction within an agent-based framework. The experiments show significant improvements: grammar accuracy was 95.8%, fluency 0.87, intent alignment 93.2%, and discourse coherence 91.6%. Scalability to 1,200 parallel learners was confirmed by the simulation, which showed an average response time of 245 ms, load-handling efficiency of 94.3%, and throughput of 4,800 sessions per hour. These results demonstrate the novelty of the system, whose integration of pedagogical assessment and simulation-based scalability positions it as a cornerstone of future communication tutoring systems.
No abstract available
Chatbots play a vital role in digital communication, especially in e-commerce, where user queries range from casual browsing to complex order-related issues. Traditional chatbot models often struggle with accurately detecting user intent in vague, incomplete, or multi-turn interactions, and they typically lack emotional intelligence, resulting in poor user experiences. This study proposes a hybrid chatbot framework that integrates BERT-based intent classification, sentiment analysis, and a Transformer-based contextual memory module to enhance both contextual understanding and emotional responsiveness. The proposed model improves intent recognition by 3%-4% compared to conventional systems, supports more natural multi-turn conversations, and significantly boosts user satisfaction. These results highlight the importance of intelligent, adaptive chatbot solutions in the evolving landscape of online commerce.
We introduce the Safe Offline-to-Online Multi-Agent Decision Transformer (SO2-MADT), an innovative framework that revolutionizes safety considerations in Multi-agent Reinforcement Learning (MARL) through a novel sequence modeling approach. Leveraging the dynamic capabilities inherent in Decision Transformers, our methodology seamlessly incorporates safety protocols as a cornerstone element, ensuring secure operations throughout both the offline pre-training phase and the adaptive online fine-tuning phase. At the core of our framework lie two pivotal innovations: the Safety-To-Go (STG) token, embedding safety at a macro level, and the Agent Prioritization Module (APM), facilitating explicit credit assignment at a micro level. Through extensive testing against the challenging environments of the StarCraft Multi-Agent Challenge (SMAC) and Multi-agent MuJoCo, our SO2-MADT not only excels in offline pre-training but also demonstrates superior performance during online fine-tuning, without any degradation in performance. The implications of our work provide a pathway for deployment in critical real-world applications where safety is paramount and non-negotiable. The code is available at https://github.com/shahaamirbader/SO2-MADT.
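Decision-Transformer-style methods like SO2-MADT condition each action on how much reward, and how much of a safety budget, remains in the trajectory; a minimal sketch of deriving such per-step "to-go" tokens (the function names and token layout are illustrative, not the paper's API):

```python
def to_go(values):
    """Suffix sums: at step t, the total still to be accumulated from t onward."""
    out = [0.0] * len(values)
    running = 0.0
    for t in range(len(values) - 1, -1, -1):
        running += values[t]
        out[t] = running
    return out

def tokenize_trajectory(rewards, costs):
    """Pair each timestep with its return-to-go and safety cost-to-go,
    the per-step conditioning values a constrained Decision Transformer
    would prepend to states and actions (cf. the Safety-To-Go token)."""
    return list(zip(to_go(rewards), to_go(costs)))

tokens = tokenize_trajectory(rewards=[1.0, 0.0, 2.0], costs=[0.0, 0.5, 0.0])
# tokens[0] == (3.0, 0.5): all reward and all safety cost still ahead at t=0.
```

At deployment, the cost-to-go channel is set to the allowed safety budget rather than computed from data, steering generation toward trajectories that stay within it.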
Traditional maneuver decision-making approaches are highly dependent on accurate and complete situation information, and their decision-making quality becomes poor when opponent information is occasionally missing in complex electromagnetic environments. To solve this problem, an autonomous maneuver decision-making approach is developed based on a deep reinforcement learning (DRL) architecture. Meanwhile, a Transformer network is integrated into the actor and critic networks, which can find potential dependency relationships among time-series trajectory data. By using these relationships, the information loss is partially compensated, leading to more accurate maneuvering decisions. Introducing the Transformer network into DRL raises the issues of limited experience samples, low sampling efficiency, and poor training stability. To address these issues, an effective decision-making reward, a prioritized sampling method, and a dynamic learning rate adjustment mechanism are proposed. Numerous simulation results show that the proposed approach outperforms traditional DRL algorithms, with a higher win rate in the case of opponent information loss.
Recent developments in Artificial Intelligence (AI) and Natural Language Processing (NLP) have significantly influenced the education sector, mainly through Large Language Models (LLMs) and Multi-Agent Systems (MAS). Educational chatbots such as GPTutor use Retrieval-Augmented Generation (RAG) and transformer-based models to offer more intelligent tutoring and automated assessment support. Still, most existing systems depend on centralized APIs and monolithic designs, which restrict scalability, adaptability to different contexts, and data privacy. In this paper, we introduce E-GPT, a locally deployable multi-agent educational chatbot framework designed to divide learning tasks among independent, specialized agents. These include a PDF-to-Quiz Generator, PDF-to-Technical Article Writer, RAG-based Chat Assistant, Question Paper Generator, and an OCR-enabled PDF Analyzer. The system integrates fine-tuned versions of LLaMA3 and Mistral models with a MongoDB-FAISS vector database, allowing efficient context retrieval and smooth collaboration between agents. Experimental results show that E-GPT improves contextual accuracy, modular scalability, and response time while maintaining user data privacy. By distributing different cognitive functions across coordinated agents, the system takes a step toward building a more adaptive, transparent, and scalable AI-driven learning environment.
This paper proposes a novel Intelligent Reflecting Surface (IRS)-assisted interweave Cognitive Internet of Vehicles (CIoV) network under malicious jamming attacks, where the IRS enhances communication performance by establishing additional links. In order to maximize the sum transmission rate of Vehicle-to-Infrastructure (V2I) links, we formulate an optimization problem that jointly optimizes wireless resource allocation, such as spectrum and transmit power for Vehicle Users (VUs), and the IRS phase shift. Because this problem is non-convex and complicated, we further propose a Heterogeneous Multi-agent Transformer-enhanced Dueling Double Deep Q-Network (HMA-TD3QN) based resource allocation method, where the VUs and the Secondary Base Station (SBS) act as distinct heterogeneous agents that independently perform resource allocation and phase shift optimization. The Transformer neural network architecture can better adapt to long-sequence input states and extract relevant features from complex input states through the attention mechanism. Simulation results indicate that the proposed HMA-TD3QN method achieves improvements of 24.42%, 20.79%, and 22.25% over the basic HMA-DQN under three different jamming strategies, highlighting the effectiveness of IRS technology in enhancing the Quality of Service (QoS) and jamming resilience of the CIoV network.
Plant diseases remain a significant challenge in global agricultural production. Achieving efficient and accurate disease detection is essential for reducing crop losses, controlling agricultural costs, and improving yields. As agriculture rapidly advances toward digitalization and intelligent transformation, the application of artificial intelligence technologies has become a key pathway to enhancing industrial competitiveness. In this study, we present Chat Demeter, a multi-agent system for plant disease diagnosis based on deep learning. The system captures real-time leaf images through camera devices and employs a CNN-Transformer model to perform instance segmentation and object detection, thereby enabling automatic identification of diseased leaves and classification of disease types. To enhance interactivity and practical value, the system incorporates a natural language interface, allowing users to upload images and receive automated diagnostic results and treatment suggestions. Experimental results demonstrate that the system achieves an accuracy of 99.50% and an AUC of 99.91% on the validation dataset, highlighting its superior performance. Overall, Chat Demeter provides an effective tool for crop health monitoring and disease intervention, while offering a feasible pathway and developmental direction for integrating and optimizing future agricultural multi-agent systems.
Current 3D Large Multimodal Models (3D LMMs) have shown tremendous potential in 3D-vision-based dialogue and reasoning. However, how to further enhance 3D LMMs to achieve fine-grained scene understanding and facilitate flexible human-agent interaction remains a challenging problem. In this work, we introduce 3D-LLaVA, a simple yet highly powerful 3D LMM designed to act as an intelligent assistant in comprehending, reasoning, and interacting with the 3D world. Unlike existing top-performing methods that rely on complicated pipelines—such as offline multi-view feature extraction or additional task-specific heads—3D-LLaVA adopts a minimalist design with integrated architecture and only takes point clouds as input. At the core of 3D-LLaVA is a new Omni Superpoint Transformer (OST), which integrates three functionalities: (1) a visual feature selector that converts and selects visual tokens, (2) a visual prompt encoder that embeds interactive visual prompts into the visual token space, and (3) a referring mask decoder that produces 3D masks based on text description. This versatile OST is empowered by the hybrid pretraining to obtain perception priors and leveraged as the visual connector that bridges the 3D data to the LLM. After performing unified instruction tuning, our 3D-LLaVA reports impressive results on various benchmarks. The code and model will be released at https://github.com/djiajunustc/3D-LLaVA.
Multi-agent collaborative perception enhances each agent’s perceptual capabilities by sharing sensing information to cooperatively perform robot perception tasks. This approach has proven effective in addressing challenges such as sensor deficiencies, occlusions, and long-range perception. However, existing representative collaborative perception systems transmit intermediate feature maps, such as bird’s-eye view (BEV) representations, which contain a significant amount of non-critical information, leading to high communication bandwidth requirements. To enhance communication efficiency while preserving perception capability, we introduce CoCMT, an object-query-based collaboration framework that optimizes communication bandwidth by selectively extracting and transmitting essential features. Within CoCMT, we introduce the Efficient Query Transformer (EQFormer) to effectively fuse multi-agent object queries and implement a synergistic deep supervision to enhance the positive reinforcement between stages, leading to improved overall performance. Experiments on OPV2V and V2V4Real datasets show CoCMT outperforms state-of-the-art methods while drastically reducing communication needs. On V2V4Real, our model (Top-50 object queries) requires only 0.416 Mb bandwidth—83 times less than SOTA methods—while improving AP@70 by 1.1%. This efficiency breakthrough enables practical collaborative perception deployment in bandwidth-constrained environments without sacrificing detection accuracy. The code and models are open-sourced through the following link: https://github.com/taco-group/COCMT.
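The bandwidth gap between dense BEV feature maps and a handful of object queries is simple arithmetic; a rough sketch with illustrative tensor shapes (not the paper's exact configuration, though 50 queries of dimension 256 happens to land near the reported 0.416 Mb):

```python
def mbits(num_elements, bytes_per_element=4):
    """Payload size in megabits for a float32 tensor."""
    return num_elements * bytes_per_element * 8 / 1e6

# Dense BEV feature map: e.g. 256 channels on a 100x100 grid (illustrative).
bev = mbits(256 * 100 * 100)
# Top-50 object queries of dimension 256 (illustrative).
queries = mbits(50 * 256)
# Query-based sharing cuts the per-frame payload by two orders of magnitude.
ratio = bev / queries
```

The exact ratio depends on grid resolution, channel count, and query budget, which is why the paper's measured 83x figure differs from any back-of-envelope estimate.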
Compared with terrestrial networks, unmanned aerial vehicles (UAVs) have the characteristics of flexible deployment and strong adaptability, which make them an important supplement to intelligent transportation systems (ITS). In this paper, we focus on the multi-UAV network area coverage problem (ACP), which requires UAVs to make intelligent long-term trajectory decisions in complex and scalable network environments. Multi-agent deep reinforcement learning (DRL) has recently emerged as an effective tool for solving long-term decision problems. However, since the input dimension of a multi-layer perceptron (MLP)-based deep neural network (DNN) is fixed, it is difficult for a standard DNN to adapt to a variable number of UAVs and network users. Therefore, we combine the Transformer with DRL to meet the scalability of the network and propose a Transformer-based deep multi-agent reinforcement learning (T-MARL) algorithm. The Transformer can adapt to variable input dimensions and extract important information from complex network states through its attention module. In our research, we find that random initialization of the Transformer may cause DRL training failure, so we propose a baseline-assisted pre-training scheme. This scheme can quickly provide an initial policy model for UAVs based on imitation learning, and uses the temporal-difference(1) algorithm to initialize the policy evaluation network. Finally, based on parameter sharing, T-MARL is applicable to any standard DRL algorithm and supports expansion to networks of different sizes. Experimental results show that T-MARL enables cooperative UAV behaviors and performs outstandingly on the ACP.
Thematic analysis (TA) is a widely used qualitative method for identifying underlying meanings within unstructured text. However, TA requires manual processes, which become increasingly labour-intensive and time-consuming as datasets grow. While large language models (LLMs) have been introduced to assist with TA on small-scale datasets, three key limitations hinder their effectiveness. First, current approaches often depend on interactions between an LLM agent and a human coder, a process that becomes challenging with larger datasets. Second, with feedback from the human coder, the LLM tends to mirror the human coder, which provides a narrower viewpoint of the data. Third, existing methods follow a sequential process, where codes are generated for individual samples without recalling previous codes and associated data, reducing the ability to analyse data holistically. To address these limitations, we propose Thematic-LM, an LLM-based multi-agent system for large-scale computational thematic analysis. Thematic-LM assigns specialised tasks to each agent, such as coding, aggregating codes, and maintaining and updating the codebook. We assign coder agents different identity perspectives to simulate the subjective nature of TA, fostering a more diverse interpretation of the data. We applied Thematic-LM to the Dreaddit dataset and the Reddit climate change dataset to analyse themes related to social media stress and online opinions on climate change. We evaluate the resulting themes based on trustworthiness principles in qualitative research. Our study reveals insights such as that assigning different identities to coder agents promotes divergence in codes and themes.
In the realm of microservices architecture, the occurrence of frequent incidents necessitates the employment of Root Cause Analysis (RCA) for swift issue resolution. It is common for a serious incident to take several domain experts hours to identify the root cause. Consequently, a contemporary trend involves harnessing Large Language Models (LLMs) as automated agents for RCA. Though the recent ReAct framework aligns well with the workflow of Site Reliability Engineers (SREs) through its thought-action-observation paradigm, its hallucinations often lead to irrelevant actions and directly affect subsequent results. Additionally, the complex and variable clues of an incident can further overwhelm the model. To confront these challenges, we propose Flow-of-Action, a pioneering Standard Operation Procedure (SOP) enhanced LLM-based multi-agent system. By explicitly summarizing the diagnosis steps of SREs, SOPs impose constraints on LLMs at crucial junctures, guiding the RCA process towards the correct trajectory. To facilitate the rational and effective utilization of SOPs, we design an SOP-centric framework called SOP flow. SOP flow contains a series of tools, including one for finding relevant SOPs for incidents, another for automatically generating SOPs for incidents without relevant ones, and a tool for converting SOPs into code. This significantly alleviates the hallucination issues of ReAct in RCA tasks. We also design multiple auxiliary agents to assist the main agent by removing useless noise, narrowing the search space, and informing the main agent whether the RCA procedure can stop. Compared to the ReAct method's 35.50% accuracy, our Flow-of-Action method achieves 64.01%, meeting the accuracy requirements for RCA in real-world systems.
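The "find relevant SOPs for incidents" tool can be sketched as nearest-neighbour retrieval over SOP descriptions; everything below (bag-of-words cosine in place of learned embeddings, the sample SOPs) is a hypothetical stand-in, not the paper's implementation:

```python
from collections import Counter
import math

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    common = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in common)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def find_sop(incident, sops):
    """Return the SOP whose description best matches the incident text."""
    q = Counter(incident.lower().split())
    return max(sops, key=lambda s: cosine(q, Counter(s.lower().split())))

sops = [
    "restart pod on out of memory error",       # hypothetical SOP catalogue
    "rotate expired tls certificate",
    "scale out service on high request latency",
]
best = find_sop("checkout service high latency alert", sops)
```

In a production system the bag-of-words match would be replaced by dense embeddings, but the tool's contract is the same: an incident description in, the closest SOP out.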
Large Language Models (LLMs) are increasingly being explored for problem-solving tasks. However, their strategic planning capability is often viewed with skepticism. Recent studies have incorporated the Monte Carlo Tree Search (MCTS) algorithm to augment the planning capacity of LLMs. Despite its potential, MCTS relies on extensive sampling simulations to approximate the true reward distribution, which leads to two primary issues. Firstly, MCTS is effective for tasks like the Game of Go, where simulation results can yield objective rewards (e.g., 1 for a win and 0 for a loss). However, for tasks such as question answering, the result of a simulation is the answer to the question, which cannot yield an objective reward without the ground truth. Secondly, obtaining statistically significant reward estimations typically requires a sample size exceeding 30 simulations, resulting in excessive token usage and time consumption. To address these challenges, we present the Multi-Agent System with Tactical Execution and Reasoning using LLM Specialized MCTS (MASTER), a novel framework that coordinates agent recruitment and communication through LLM specialized MCTS. This system autonomously adjusts the number of agents based on task complexity and ensures focused communication among them. Comprehensive experiments across various tasks demonstrate the effectiveness of our proposed framework. It achieves 76% accuracy on HotpotQA and 80% on WebShop, setting new state-of-the-art performance on these datasets.
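The 30-plus simulations MASTER seeks to avoid are driven by MCTS's selection rule; the standard UCT score behind that sampling loop can be sketched as:

```python
import math

def uct_score(mean_reward, visits, parent_visits, c=1.414):
    """Upper Confidence bound applied to Trees: exploit the observed mean
    plus an exploration bonus that shrinks as a node is visited more."""
    if visits == 0:
        return float("inf")  # unvisited children are always tried first
    return mean_reward + c * math.sqrt(math.log(parent_visits) / visits)

def select_child(children, parent_visits):
    """Pick the child maximizing UCT; `children` maps name -> (mean, visits)."""
    return max(children, key=lambda k: uct_score(*children[k], parent_visits))

children = {"a": (0.9, 30), "b": (0.5, 2), "c": (0.0, 0)}
picked = select_child(children, parent_visits=32)
```

Because each visit's reward estimate comes from a full rollout, and question-answering rollouts have no objective reward without ground truth, MASTER replaces this sampling with LLM-produced evaluations.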
Multi-agent systems (MAS) have emerged as a promising approach for enhancing the reasoning capabilities of large language models in complex problem-solving; however, current MAS frameworks suffer from poor flexibility and scalability with underdeveloped optimization strategies. To address these challenges, we propose ReSo, which integrates task graph generation with a reward-driven two-stage agent selection process centered on our Collaborative Reward Model that provides fine-grained reward signals to optimize MAS cooperation. We also introduce an automated data synthesis framework for generating MAS benchmarks without any human annotations. Experimental results show that ReSo matches or outperforms existing methods, achieving 33.7 percent accuracy on Math-MAS and 32.3 percent accuracy on SciBench-MAS, where other approaches completely fail.
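The reward-driven two-stage agent selection can be sketched as scoring candidates with a learned model and keeping the top-k; the lookup-table scorer below is a hypothetical stand-in for the paper's Collaborative Reward Model:

```python
def select_agents(candidates, reward_model, task, k=2):
    """Selection sketch: score each candidate agent for the task with a
    (learned) reward model, then keep the top-k for collaboration."""
    scored = sorted(candidates, key=lambda a: reward_model(a, task), reverse=True)
    return scored[:k]

# Hypothetical stand-in for a learned reward model: a lookup table of
# past per-task success rates (unseen pairs default to 0.0).
history = {("solver", "math"): 0.8, ("checker", "math"): 0.6, ("writer", "math"): 0.1}
reward_model = lambda agent, task: history.get((agent, task), 0.0)
team = select_agents(["solver", "checker", "writer"], reward_model, "math", k=2)
```

The fine-grained reward signals the paper describes would update such scores per collaboration step, so the selector improves as the system runs.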
Large language models (LLMs) have demonstrated notable potential in conducting complex tasks and are increasingly utilized in various financial applications. However, high-quality sequential financial investment decision-making remains challenging. These tasks require multiple interactions with a volatile environment for every decision, demanding sufficient intelligence to maximize returns and manage risks. Although LLMs have been used to develop agent systems that surpass human teams and yield impressive investment returns, opportunities to enhance multi-sourced information synthesis and optimize decision-making outcomes through timely experience refinement remain unexplored. Here, we introduce the FinCon, an LLM-based multi-agent framework with CONceptual verbal reinforcement tailored for diverse FINancial tasks. Inspired by effective real-world investment firm organizational structures, FinCon utilizes a manager-analyst communication hierarchy. This structure allows for synchronized cross-functional agent collaboration towards unified goals through natural language interactions and equips each agent with greater memory capacity than humans. Additionally, a risk-control component in FinCon enhances decision quality by episodically initiating a self-critiquing mechanism to update systematic investment beliefs. The conceptualized beliefs serve as verbal reinforcement for the future agent's behavior and can be selectively propagated to the appropriate node that requires knowledge updates. This feature significantly improves performance while reducing unnecessary peer-to-peer communication costs. Moreover, FinCon demonstrates strong generalization capabilities in various financial tasks, including single stock trading and portfolio management.
The rapid advancement of scientific progress requires innovative tools that can accelerate knowledge discovery. Although recent AI methods, particularly large language models (LLMs), have shown promise in tasks such as hypothesis generation and experimental design, they fall short of replicating the collaborative nature of real-world scientific practices, where diverse experts work together in teams to tackle complex problems. To address the limitations, we propose an LLM-based multi-agent system, i.e., Virtual Scientists (VirSci), designed to mimic the teamwork inherent in scientific research. VirSci organizes a team of agents to collaboratively generate, evaluate, and refine research ideas. Through comprehensive experiments, we demonstrate that this multi-agent approach outperforms the state-of-the-art method in producing novel scientific ideas. We further investigate the collaboration mechanisms that contribute to its tendency to produce ideas with higher novelty, offering valuable insights to guide future research and illuminating pathways toward building a robust system for autonomous scientific discovery. The code is available at https://github.com/open-sciencelab/Virtual-Scientists.
Software vulnerabilities can lead to severe security issues such as data breaches, financial losses, and service disruptions, making security issue-oriented code review a crucial part of the development process. Traditional approaches struggle with analyzing complex code and providing explanations, while large language models (LLMs) show promise in code review but do not focus on security-related issues. To address these limitations, we propose AutoReview, an LLM-based multi-agent system for security code review. It integrates three agents: (1) an Issue Detector that identifies potential vulnerabilities using knowledge-level retrieval-augmented generation, (2) an Issue Locator that pinpoints vulnerability positions through graph-based code slicing, and (3) an Issue Repairer that generates context-aware fixes via iterative verification. Evaluated on ReposVul with three code LLMs, AutoReview demonstrates strong effectiveness in security code review, improving the F1-score for detection by 18.72%, precision for location by 27.75%, and BLEU for repair by 14.82% over baselines.
The ubiquitous computing resources in 6G networks provide ideal environments for the fusion of large language models (LLMs) and intelligent services through the agent framework. With auxiliary modules and planning cores, LLM-enabled agents can autonomously plan and take actions to deal with diverse environment semantics and user intentions. However, the limited resources of individual network devices significantly hinder the efficient operation of LLM-enabled agents with complex tool calls, highlighting the urgent need for efficient multi-level device collaborations. To this end, we propose a framework and method for an LLM-enabled multi-agent system with dual-loop terminal-edge collaborations in 6G networks. First, the outer loop consists of iterative collaborations between a global agent and multiple sub-agents deployed on edge servers and terminals, where planning capability is enhanced through task decomposition and parallel sub-task distribution. Second, the inner loop utilizes sub-agents with dedicated roles to cyclically reason, execute, and replan sub-tasks, and incorporates parallel tool-call generation with offloading strategies to improve efficiency. The improved task planning capability and task execution efficiency are validated through a case study in 6G-supported urban safety governance. Finally, open challenges and future directions in 6G networks are thoroughly analyzed, accelerating the advent of the 6G era.
Refactoring is a constant activity in software development and maintenance. Scaling and maintaining software systems depends on code refactoring. However, this process is still labor-intensive, as it requires programmers to analyze the codebases in detail to avoid introducing new defects. In this research, we put forward a large language model (LLM)-based multi-agent system to automate the refactoring process on Haskell code. The objective of this research is to evaluate the effect of LLM-based agents in performing structured and semantically accurate refactoring on Haskell code. Our proposed multi-agent system is built on specialized agents with distinct roles, including code analysis, refactoring execution, verification, and debugging. To test the effectiveness and practical applicability of the multi-agent system, we conducted evaluations using different open-source Haskell codebases. The results of the experiments showed that the proposed LLM-based multi-agent system reduced code complexity by an average of 11.03%, improved overall code quality by 22.46%, and increased performance efficiency by an average of 13.27%. Furthermore, memory allocation was optimized by up to 14.57%. These results highlight the ability of LLM-based multi-agent systems to manage refactoring tasks targeted toward functional programming paradigms. Our findings suggest that integrating LLM-based multi-agent systems into the refactoring of functional programming languages can enhance maintainability and support automated development workflows.
Today, E-commerce sellers face several key challenges, including difficulties in discovering and effectively utilizing available programs and tools, and struggling to understand and utilize rich data from various tools. We therefore aim to develop Insight Agents (IA), a conversational multi-agent Data Insight system, to provide E-commerce sellers with personalized data and business insights through automated information retrieval. Our hypothesis is that IA will serve as a force multiplier for sellers, driving incremental seller adoption by reducing the effort required and increasing the speed at which sellers make good business decisions. In this paper, we introduce this new LLM-backed end-to-end agentic workflow designed for comprehensive coverage, high accuracy, and low latency. It features a hierarchical multi-agent structure, consisting of a manager agent and two worker agents, data presentation and insight generation, for efficient information retrieval and problem-solving. We design a simple yet effective ML solution for the manager agent that combines Out-of-Domain (OOD) detection using a lightweight encoder-decoder model and agent routing through a BERT-based classifier, optimizing both accuracy and latency. Within the two worker agents, a strategic planning module is designed for the API-based data model that breaks down queries into granular components to generate more accurate responses, and domain knowledge is dynamically injected to enhance the insight generator. IA has been launched for Amazon sellers in the US and has achieved a high accuracy of 89.5% based on human evaluation, with P90 latency below 15s.
Large Language Model (LLM) based multi-agent systems (MAS) show remarkable potential in collaborative problem-solving, yet they still face critical challenges: low communication efficiency, poor scalability, and a lack of effective parameter-updating optimization methods. We present Optima, a novel framework that addresses these issues by significantly enhancing both communication efficiency and task effectiveness in LLM-based MAS through LLM training. Optima employs an iterative generate, rank, select, and train paradigm with a reward function balancing task performance, token efficiency, and communication readability. We explore various training algorithms, including Supervised Fine-Tuning, Direct Preference Optimization, and their hybrid approaches, providing insights into their effectiveness-efficiency trade-offs. We integrate Monte Carlo Tree Search-inspired techniques for DPO data generation, treating conversation turns as tree nodes to explore diverse interaction paths. Evaluated on common multi-agent tasks, including information-asymmetric question answering and complex reasoning, Optima shows consistent and substantial improvements over single-agent baselines and vanilla MAS based on Llama 3 8B, achieving up to a 2.8x performance gain with less than 10% of the tokens on tasks requiring heavy information exchange. Moreover, Optima's efficiency gains open new possibilities for leveraging inference compute more effectively, leading to improved inference-time scaling laws. By addressing fundamental challenges in LLM-based MAS, Optima demonstrates a path towards scalable, efficient, and effective MAS (https://chenweize1998.github.io/optima-project-page).
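The generate, rank, select step of Optima's iterative paradigm reduces to scoring candidates with a reward that trades task performance against token cost. The sketch below is a minimal illustration under assumed candidate fields and an assumed reward weight, not the paper's actual reward function:

```python
# Rank candidate multi-agent dialogues by a reward that balances task
# score against communication cost, then select the top-k for the next
# training round. The 0.01 token penalty is an arbitrary illustration.

def reward(candidate, token_weight=0.01):
    # Higher task score is better; more tokens exchanged is worse.
    return candidate["score"] - token_weight * candidate["tokens"]

def rank_and_select(candidates, k=2):
    ranked = sorted(candidates, key=reward, reverse=True)
    return ranked[:k]  # winners would become DPO/SFT training data

pool = [
    {"id": "a", "score": 0.9, "tokens": 120},  # accurate but verbose
    {"id": "b", "score": 0.8, "tokens": 20},   # slightly worse, terse
    {"id": "c", "score": 0.4, "tokens": 10},
]
best = rank_and_select(pool)
```

Note how the token penalty reorders the pool: the verbose candidate "a" has the best raw score but ranks last once communication cost is charged.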
This paper presents a novel design of a multi-agent system framework that applies large language models (LLMs) to automate the parametrization of simulation models in digital twins. This framework features specialized LLM agents tasked with observing, reasoning, decision-making, and summarizing, enabling them to dynamically interact with digital twin simulations to explore parametrization possibilities and determine feasible parameter settings to achieve an objective. The proposed approach enhances the usability of simulation models by infusing them with knowledge heuristics from LLMs and enables autonomous search for feasible parametrizations to solve a user task. Furthermore, the system has the potential to increase user-friendliness and reduce the cognitive load on human users by assisting in complex decision-making processes. The effectiveness and functionality of the system are demonstrated through a case study, and the visualized demos and code are available in a GitHub repository: https://github.com/YuchenXia/LLMDrivenSimulation
Trending topics have become a significant part of modern social media, attracting users to participate in discussions of breaking events. However, they also bring in a new channel for poisoning attacks, resulting in negative impacts on society. Therefore, it is urgent to study this critical problem and develop effective strategies for defense. In this paper, we propose TrendSim, an LLM-based multi-agent system to simulate trending topics in social media under poisoning attacks. Specifically, we create a simulation environment for trending topics that incorporates a time-aware interaction mechanism, centralized message dissemination, and an interactive system. Moreover, we develop LLM-based human-like agents to simulate users in social media, and propose prototype-based attackers to replicate poisoning attacks. We further evaluate TrendSim from multiple aspects to validate its effectiveness. Based on TrendSim, we conduct simulation experiments to study four critical problems about poisoning attacks on trending topics for social benefit.
Large Language Models (LLMs) excel in diverse applications including code snippet generation, but often struggle with generating code for complex Machine Learning (ML) tasks. Although existing LLM single-agent systems give varying performance depending on task complexity, they rely purely on larger, expensive models such as GPT-4. Our investigation reveals that no-cost and low-cost models such as Gemini-Pro, Mixtral, and CodeLlama perform far worse than GPT-4 in a single-agent setting. Motivated to develop a cost-efficient LLM-based solution for solving ML tasks, we propose an LLM multi-agent system that leverages a combination of experts using profiling, efficient retrieval of past observations, LLM cascades, and ask-the-expert calls. Through empirical analysis on ML engineering tasks in the MLAgentBench benchmark, we demonstrate the effectiveness of our system, using the no-cost model Gemini as the base LLM, paired with GPT-4 in cascade and as an expert serving occasional ask-the-expert calls for planning. With a 94.2% reduction in cost (from $0.931 to $0.054 per run, averaged over all tasks), our system yields a better average success rate of 32.95%, compared to the 22.72% achieved by the GPT-4 single-agent system, averaged over all MLAgentBench tasks.
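The LLM cascade mentioned in the abstract above follows a simple control flow: answer with a cheap model first, and escalate to the expensive model only when the cheap model is unsure. The sketch below uses stand-in model callables and an assumed confidence threshold; it is not the paper's implementation:

```python
# Minimal LLM cascade: route a query to a cheap model first, escalate
# to an expensive "expert" only when the cheap model's confidence falls
# below a threshold. Models are stubs returning (answer, confidence).

def cascade(query, cheap, expensive, threshold=0.7):
    answer, confidence = cheap(query)
    if confidence >= threshold:
        return answer, "cheap"          # cost saved: no expert call
    return expensive(query)[0], "expensive"

# Stub models for illustration: the cheap model is unsure on "hard"
# queries and confident otherwise.
cheap = lambda q: ("guess", 0.3) if "hard" in q else ("answer", 0.9)
expensive = lambda q: ("expert answer", 0.99)

easy = cascade("easy question", cheap, expensive)
hard = cascade("hard question", cheap, expensive)
```

In the cost accounting reported above, most queries take the cheap branch, which is where the 94.2% cost reduction would come from in a setup like this.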
Traditional methods for making software deployment decisions in the automotive industry typically rely on manual analysis of tabular software test data. These methods often lead to higher costs and delays in the software release cycle due to their labor-intensive nature. Large Language Models (LLMs) present a promising solution to these challenges. However, their application generally demands multiple rounds of human-driven prompt engineering, which limits their practical deployment, particularly for industrial end-users who need reliable and efficient results. In this paper, we propose GoNoGo, an LLM agent system designed to streamline automotive software deployment while meeting both functional requirements and practical industrial constraints. Unlike previous systems, GoNoGo is specifically tailored to address domain-specific and risk-sensitive systems. We evaluate GoNoGo's performance across different task difficulties using zero-shot and few-shot examples taken from industrial practice. Our results show that GoNoGo achieves a 100% success rate for tasks up to Level 2 difficulty with 3-shot examples, and maintains high performance even for more complex tasks. We find that GoNoGo effectively automates decision-making for simpler tasks, significantly reducing the need for manual intervention. In summary, GoNoGo represents an efficient and user-friendly LLM-based solution currently employed in our industrial partner's company to assist with software release decision-making, supporting more informed and timely decisions in the release process for risk-sensitive vehicle systems.
The rapid growth of the Internet and its applications has made online discussion one of the most popular and significant ways for people to communicate with each other and reach consensus. Meanwhile, remarkable advances in Large Language Models (LLMs) have improved the text understanding and content generation capabilities of LLM-based agents. The research objective of the PhD thesis is to build democratic discussion environments, which currently face three main issues: 1) large-scale discussions tend to be complicated, 2) rumours and misinformation negatively affect the discussions, and 3) direct democratic discussions are complex and time-consuming. This extended abstract introduces the efforts that have been made to address those issues and outlines potential future directions.
No abstract available
Intelligent Tutoring Systems (ITSs) have revolutionized education by offering personalized learning experiences. However, as goal-oriented learning, which emphasizes efficiently achieving specific objectives, becomes increasingly important in professional contexts, existing ITSs often struggle to deliver this type of targeted learning experience. In this paper, we propose GenMentor, an LLM-powered multi-agent framework designed to deliver goal-oriented, personalized learning within ITS. GenMentor begins by accurately mapping learners' goals to required skills using a fine-tuned LLM trained on a custom goal-to-skill dataset. After identifying the skill gap, it schedules an efficient learning path using an evolving optimization approach, driven by a comprehensive and dynamic profile of learners' multifaceted status. Additionally, GenMentor tailors learning content with an exploration-drafting-integration mechanism to align with individual learner needs. Extensive automated and human evaluations demonstrate GenMentor's effectiveness in learning guidance and content quality. Furthermore, we have deployed it in practice and implemented it as an application. A practical human study with professional learners further highlights its effectiveness in goal alignment and resource targeting, leading to enhanced personalization. Supplementary resources are available at https://github.com/GeminiLight/gen-mentor.
ElliottAgents: A Natural Language-Driven Multi-Agent System for Stock Market Analysis and Prediction
This paper presents ElliottAgents, a multi-agent system leveraging natural language processing (NLP) and large language models (LLMs) to analyze complex stock market data. The system combines AI-driven analysis with the Elliott Wave Principle to generate human-comprehensible predictions and explanations. A key feature is the natural language dialogue between agents, enabling collaborative analysis refinement. The LLM-enhanced architecture facilitates advanced language understanding, reasoning, and autonomous decision-making. Experiments demonstrate the system's effectiveness in pattern recognition and generating natural language descriptions of market trends. ElliottAgents contributes to NLP applications in specialized domains, showcasing how AI-driven dialogue systems can enhance collaborative analysis in data-intensive fields. This research bridges the gap between complex financial data and human understanding, addressing the need for interpretable and adaptive prediction systems in finance.
LLM-based autonomous agents often fail to execute complex web tasks that require dynamic interaction, largely due to the inherent uncertainty and complexity of these environments. Existing LLM-based web agents typically rely on rigid, expert-designed policies specific to certain states and actions, lacking the flexibility and generalizability needed to adapt to unseen tasks. In contrast, humans excel by exploring unknowns, continuously adapting strategies based on new observations, and resolving ambiguities through exploration. To emulate human-like adaptability, web agents need strategic exploration and complex decision-making. Monte Carlo Tree Search (MCTS) is well-suited for this, but classical MCTS struggles with vast action spaces, unpredictable state transitions, and incomplete information in web tasks. In light of this, we develop WebPilot, a multi-agent system with a dual optimization strategy that improves MCTS to better handle complex web environments. Specifically, the Global Optimization phase involves generating a high-level plan by breaking down tasks into manageable subtasks, continuously refining this plan through reflective analysis of new observations and previous subtask attempts, thereby focusing the search process and mitigating challenges posed by vast action spaces in classical MCTS. Subsequently, the Local Optimization phase executes each subtask using a tailored MCTS designed for complex environments, effectively addressing uncertainties and managing incomplete information by iteratively refining decisions based on new observations. Experimental results on WebArena and MiniWoB++ demonstrate the effectiveness of WebPilot. Notably, on WebArena, WebPilot achieves SOTA performance with GPT-4, achieving a 93% relative increase in success rate over the concurrent tree search-based method. WebPilot advances autonomous agents, enabling more reliable decision-making in practical environments.
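The Local Optimization phase described in the WebPilot abstract builds on MCTS, whose core selection step weighs exploitation against exploration. The snippet below sketches only that step (UCB1 selection) with synthetic action statistics; the statistics, action names, and exploration constant are illustrative assumptions, not WebPilot's tailored MCTS:

```python
import math

# UCB1 selection: pick the next action by average reward plus an
# exploration bonus that shrinks as an action is visited more often.

def ucb1_select(stats, total_visits, c=1.4):
    # stats: action -> (visit_count, total_reward)
    def score(action):
        n, r = stats[action]
        if n == 0:
            return float("inf")  # unvisited actions are tried first
        return r / n + c * math.sqrt(math.log(total_visits) / n)
    return max(stats, key=score)

# With an unvisited action present, it is selected immediately.
stats = {"click": (10, 7.0), "type": (2, 1.9), "scroll": (0, 0.0)}
chosen = ucb1_select(stats, total_visits=12)

# Once everything has been visited, the bonus favors the less-explored
# "type" action even though "click" has more total reward.
stats_visited = {"click": (10, 7.0), "type": (2, 1.9)}
refined = ucb1_select(stats_visited, total_visits=12)
```

This is exactly the trade-off the abstract alludes to: in vast web action spaces, a plain greedy policy would keep clicking, while the bonus term forces strategic exploration of under-tried actions.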
Large language models (LLMs) have great potential for molecular optimization, as they can gather external chemistry tools and enable collaborative interactions to iteratively refine molecular candidates. However, this potential remains underexplored, particularly in the context of structured reasoning, interpretability, and comprehensive tool-grounded molecular optimization. To address this gap, we introduce MT-Mol, a multi-agent framework for molecular optimization that leverages tool-guided reasoning and role-specialized LLM agents. Our system incorporates comprehensive RDKit tools, categorized into five distinct domains: structural descriptors, electronic and topological features, fragment-based functional groups, molecular representations, and miscellaneous chemical properties. Each category is managed by an expert analyst agent, responsible for extracting task-relevant tools and enabling interpretable, chemically grounded feedback. MT-Mol produces molecules with tool-aligned and stepwise reasoning through the interaction between the analyst agents, a molecule-generating scientist, a reasoning-output verifier, and a reviewer agent. As a result, our framework achieves state-of-the-art performance on the PMO-1K benchmark on 17 out of 23 tasks.
Recently, with the development of tool-calling capabilities in large language models (LLMs), these models have demonstrated significant potential for automating electronic design automation (EDA) flows by interacting with EDA tool APIs via EDA scripts. However, considering the limited understanding of EDA tools, LLMs face challenges in practical scenarios where diverse interfaces of EDA tools exist across different platforms. Additionally, EDA flow automation often involves intricate, long-chain tool-calling processes, increasing the likelihood of errors in intermediate steps. Any errors will lead to the instability and failure of EDA flow automation. To address these challenges, we introduce EDAid, a multi-agent collaboration system where multiple agents harboring divergent thoughts converge towards a common goal, ensuring reliable and successful EDA flow automation. Specifically, each agent is controlled by ChipLlama models, which are expert LLMs fine-tuned for EDA flow automation. Our experiments demonstrate the state-of-the-art (SOTA) performance of our ChipLlama models and validate the effectiveness of our EDAid in the automation of complex EDA flows, showcasing superior performance compared to single-agent systems.
Automated tools for video editing and assembly have applications ranging from filmmaking and advertisement to content creation for social media. Previous video editing work has mainly focused on either retrieval or user interfaces, leaving the actual editing to the user. In contrast, we propose to automate the core task of video editing, formulating it as a sequential decision-making process. Ours is a multi-agent approach. We design an Editor agent and a Critic agent. The Editor takes as input a collection of video clips together with natural language instructions and uses tools commonly found in video editing software to produce an edited sequence. The Critic, in turn, gives natural language feedback to the Editor based on the produced sequence, or renders it if it is satisfactory. We introduce a learning-based approach for enabling effective communication across specialized agents to address the language-driven video editing task. Finally, we explore an LLM-as-a-judge metric for evaluating the quality of the video editing system and compare it with general human preference. We evaluate our system’s output video sequences qualitatively and quantitatively through a user study and find that our system vastly outperforms existing approaches in terms of coverage, time constraint satisfaction, and human preference. Please see our companion supplemental video for qualitative results.
Large Language Models (LLMs) have demonstrated remarkable capabilities in Register Transfer Level (RTL) design, enabling high-quality code generation from natural language descriptions. However, LLMs alone face significant limitations in real-world hardware design workflows, including the inability to execute code, lack of debugging capabilities, and absence of long-term memory. To address these challenges, we present ASIC-Agent, an autonomous system designed specifically for digital ASIC design tasks. ASIC-Agent enhances base LLMs with a multi-agent architecture incorporating specialized subagents for RTL generation, verification, OpenLane hardening, and Caravel chip integration, all operating within a comprehensive sandbox environment with access to essential hardware design tools. The system leverages a vector database containing documentation, API references, error knowledge, and curated insights from the open-source silicon community. To evaluate ASIC-Agent’s performance, we introduce ASIC-Agent-Bench, the first benchmark specifically designed to assess agentic systems in hardware design tasks. We evaluate ASIC-Agent with various base LLMs, providing quantitative comparisons and qualitative insights into agent behavior across different design scenarios. Our results demonstrate that ASIC-Agent, when powered by Claude 4 Sonnet, successfully automates a broad range of ASIC design tasks spanning varying levels of complexity, showing the potential of significantly accelerating the ASIC design workflow. Our work is open-source and publicly available on GitHub.
The proliferation of AI-powered search and recommendation systems has accelerated the formation of “filter bubbles” that reinforce people’s biases and narrow their perspectives. Previous research has attempted to address this issue by increasing the diversity of information exposure, which is often hindered by a lack of user motivation to engage with it. In this study, we took a human-centered approach to explore how Large Language Models (LLMs) could assist users in embracing more diverse perspectives. We developed a prototype featuring LLM-powered multi-agent characters that users could interact with while reading social media content. We conducted a participatory design study with 18 participants and found that multi-agent dialogues with gamification incentives could motivate users to engage with opposing viewpoints. Additionally, progressive interactions with assessment tasks could promote thoughtful consideration. Based on these findings, we provide design implications and future work outlooks for leveraging LLMs to help users burst their filter bubbles.
Ensuring the reliability of agent architectures and effectively identifying problematic agents when failures occur are crucial challenges in multi-agent systems (MAS). Advances in large language models (LLMs) have established LLM-based agents as a major branch of MAS, enabling major breakthroughs in complex problem solving and world modeling. However, the reliability implications of this shift remain largely unexplored, i.e., whether substituting traditional agents with LLM-based agents can effectively enhance the reliability of MAS. In this work, we investigate and quantify the reliability of LLM-based agents from the perspective of Byzantine fault tolerance. We observe that LLM-based agents demonstrate stronger skepticism when processing erroneous message flows, a characteristic that enables them to outperform traditional agents across different topological structures. Motivated by the results of the pilot experiment, we design CP-WBFT, a confidence probe-based weighted Byzantine Fault Tolerant consensus mechanism to enhance the stability of MAS with different topologies. It capitalizes on the intrinsic reflective and discriminative capabilities of LLMs by employing a probe-based, weighted information flow transmission method to improve the reliability of LLM-based agents. Extensive experiments demonstrate that CP-WBFT achieves superior performance across diverse network topologies under extreme Byzantine conditions (85.7% fault rate). Notably, our approach surpasses traditional methods by attaining remarkable accuracy on various topologies and maintaining strong reliability in both mathematical reasoning and safety assessment tasks.
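The weighting idea behind CP-WBFT can be illustrated with confidence-weighted majority voting: each agent's vote counts in proportion to a confidence score, so low-confidence (potentially Byzantine) agents carry less influence. The votes and weights below are made up for illustration; the actual mechanism uses probe-derived confidences and weighted information flow:

```python
# Confidence-weighted consensus: sum each answer's supporting
# confidence and return the answer with the highest total weight.

def weighted_consensus(votes):
    # votes: list of (answer, confidence) pairs, one per agent
    tally = {}
    for answer, confidence in votes:
        tally[answer] = tally.get(answer, 0.0) + confidence
    return max(tally, key=tally.get)

# Three low-confidence agents push "41"; two high-confidence agents
# push "42". A plain head-count majority would pick "41", but the
# weighted tally (42: 1.7 vs 41: 0.8) picks "42".
votes = [("42", 0.9), ("41", 0.2), ("42", 0.8), ("41", 0.3), ("41", 0.3)]
result = weighted_consensus(votes)
```

This is the failure mode weighting is meant to fix: under a high Byzantine fault rate, unweighted majority voting can be captured by a numerical majority of faulty agents.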
This paper explores the integration of advanced Multi-Agent Systems (MAS) techniques to develop a team of agents with enhanced logical reasoning, long-term knowledge retention, and Theory of Mind (ToM) capabilities. By uniting these core components with optimized communication protocols, we create a novel framework called SynergyMAS, which fosters collaborative teamwork and superior problem-solving skills. The system's effectiveness is demonstrated through a product development team case study, where our approach significantly enhances performance and adaptability. These findings highlight SynergyMAS's potential to tackle complex, real-world challenges.
No abstract available
This study investigates the implementation of LLM agents in smart city management, leveraging both the inherent language processing abilities of LLMs and the distributed problem solving capabilities of multi-agent systems for the improvement of urban decision-making processes. A multi-agent system architecture combines LLMs with existing urban information systems to process complex queries and generate contextually relevant responses for urban planning and management. The research focuses on testing three main hypotheses: (1) LLM agents’ capability for effective routing and processing of diverse urban queries, (2) the effectiveness of Retrieval-Augmented Generation (RAG) technology in improving response accuracy when working with local knowledge and regulations, and (3) the impact of integrating LLM agents with existing urban information systems. Our experimental results, based on a comprehensive validation dataset of 150 question–answer pairs, demonstrate significant improvements in decision support capabilities. The multi-agent system achieved pipeline selection accuracy of 94–99% across different models, while the integration of RAG technology improved response accuracy by 17% for strategic development queries and 55% for service accessibility questions. The combined use of document databases and service APIs resulted in the highest performance metrics (G-Eval scores of 0.68–0.74) compared to standalone LLM responses (0.30–0.38). Using St. Petersburg’s Digital Urban Platform as a testbed, we demonstrate the practical applicability of this approach to create integrated city management systems that support complex urban decision-making processes. This research contributes to the growing field of AI-enhanced urban management by providing empirical evidence of LLM agents’ effectiveness in processing heterogeneous urban data and supporting strategic planning decisions. Our findings suggest that LLM-based multi-agent systems can significantly enhance the efficiency and accuracy of urban decision making while maintaining high relevance in responses.
No abstract available
Prior Authorization delivers safe, appropriate, and cost-effective care that is medically justified with evidence-based guidelines. However, the process often requires labor-intensive manual comparisons between patient medical records and clinical guidelines, a process that is both repetitive and time-consuming. Recent developments in Large Language Models (LLMs) have shown potential in addressing complex medical NLP tasks with minimal supervision. This paper explores the application of a Multi-Agent System (MAS) that utilizes specialized LLM agents to automate the Prior Authorization task by breaking it down into simpler, manageable sub-tasks. Our study systematically investigates the effects of various prompting strategies on these agents and benchmarks the performance of different LLMs. We demonstrate that GPT-4 achieves an accuracy of 86.2% in predicting checklist item-level judgments with evidence, and 95.6% in determining overall checklist judgment. Additionally, we explore how these agents can contribute to the explainability of the steps taken in the process, thereby enhancing trust and transparency in the system.
While multi-agent LLM systems show strong capabilities in various domains, they are highly vulnerable to adversarial and low-performing agents. To resolve this issue, in this paper, we introduce a general and adversary-resistant multi-agent LLM framework based on credibility scoring. We model the collaborative query-answering process as an iterative game, where the agents communicate and contribute to a final system output. Our system associates each agent with a credibility score that is used when aggregating the team's outputs. The credibility scores are learned gradually based on each agent's past contributions to query answering. Our experiments across multiple tasks and settings demonstrate our system's effectiveness in mitigating adversarial influence and enhancing the resilience of multi-agent cooperation, even in adversary-majority settings.
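The iterative credibility-learning loop described above can be sketched as: aggregate answers weighted by current credibility, then reward agents who agreed with the aggregate and penalize dissenters. The update rule, learning rate, and agent answers below are illustrative assumptions, not the paper's learning method:

```python
# Credibility-weighted aggregation plus a simple credibility update:
# agents matching the aggregated output gain credibility, others lose it.

def aggregate(answers, cred):
    tally = {}
    for agent, ans in answers.items():
        tally[ans] = tally.get(ans, 0.0) + cred[agent]
    return max(tally, key=tally.get)

def update_credibility(answers, output, cred, lr=0.1):
    for agent, ans in answers.items():
        delta = lr if ans == output else -lr
        cred[agent] = min(1.0, max(0.0, cred[agent] + delta))

# One persistent adversary against two honest agents, over three rounds.
cred = {"a": 0.5, "b": 0.5, "adversary": 0.5}
for _ in range(3):
    answers = {"a": "yes", "b": "yes", "adversary": "no"}
    out = aggregate(answers, cred)
    update_credibility(answers, out, cred)
```

After a few rounds the adversary's weight has decayed, so its future votes contribute less to the aggregate; this is the mechanism that keeps influence bounded even before an adversary is explicitly identified.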
Heterogeneous multi-agent systems (HMAS) comprise various intelligent agents with specialized functions, such as drones, ground robots, and automated devices, working in coordinated settings. This paper presents AutoHMA-LLM, a novel framework that combines Large Language Models (LLMs) with classical control algorithms to address the challenges of task coordination and scheduling in complex, dynamic environments. The framework is designed with a multi-tier architecture, utilizing a cloud-based LLM as the central planner alongside device-specific LLMs and Generative Agents to improve task execution efficiency and accuracy. Specifically targeting dynamic scenarios, the system enhances resource utilization and stabilizes task execution through refined task scheduling and real-time feedback mechanisms. In experiments conducted across logistics, inspection, and search & rescue scenarios, AutoHMA-LLM demonstrated a 5.7% improvement in task completion accuracy, a 46% reduction in communication steps, and a 31% decrease in token usage and API calls compared to baseline methods. These results highlight our framework’s scalability and efficiency, offering substantial support for effective multi-agent collaboration in complex, resource-constrained environments.
Most discussions of Large Language Model (LLM) safety have focused on single-agent settings, but multi-agent LLM systems now create novel adversarial risks because their behavior depends on inter-agent communication and decentralized reasoning. In this work, we focus on attacking pragmatic systems that have constraints such as limited token bandwidth, latency between message delivery, and defense mechanisms. We design a permutation-invariant adversarial attack that optimizes prompt distribution across latency- and bandwidth-constrained network topologies to bypass the distributed safety mechanisms within the system. Formulating attack-path selection as a maximum-flow minimum-cost problem, coupled with a novel Permutation-Invariant Evasion Loss (PIEL), we leverage graph-based optimization to maximize attack success rate while minimizing detection risk. Evaluating on models including Llama, Mistral, Gemma, DeepSeek, and other variants, across datasets such as JailBreakBench and AdversarialBench, our method outperforms conventional attacks by up to 7x, exposing critical vulnerabilities in multi-agent systems. Moreover, we demonstrate that existing defenses, including variants of Llama-Guard and PromptGuard, fail to block our attack, emphasizing the urgent need for multi-agent-specific safety mechanisms.
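The full max-flow formulation is beyond an abstract, but the permutation-invariance property the loss relies on is easy to illustrate: aggregate per-fragment scores with a symmetric function so that no reordering of messages across the topology changes the objective (the hash-based scoring below is a toy stand-in, not PIEL itself):

```python
import hashlib
import itertools

def fragment_risk(fragment):
    # Toy stand-in for a per-fragment detection score (deterministic, integer-valued).
    return int(hashlib.sha256(fragment.encode()).hexdigest(), 16) % 100

def evasion_loss(fragments):
    """Permutation-invariant loss: aggregate per-fragment scores with a
    symmetric function (integer sum), so reordering messages across the
    topology cannot change the objective."""
    return sum(fragment_risk(f) for f in fragments)

frags = ["payload-a", "payload-b", "payload-c"]
losses = {evasion_loss(list(p)) for p in itertools.permutations(frags)}
print(losses)  # a single value: the loss is identical under every ordering
```

Any symmetric aggregator (sum, max, mean) gives the same invariance; the paper's actual loss additionally trades off attack success against detection risk.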
No abstract available
Existing tool-augmented large language models (LLMs) encounter significant challenges when processing complex queries. Current frameworks such as ReAct are prone to local optimization traps due to their reliance on incremental decision-making processes. To address these limitations, we propose a novel Planner-centric Plan-Execute paradigm that fundamentally resolves local optimization bottlenecks through architectural innovation. Central to our approach is a novel Planner model that performs global Directed Acyclic Graph (DAG) planning for complex queries, enabling optimized execution beyond conventional tool coordination. We also introduce ComplexTool-Plan, a large-scale benchmark dataset featuring complex queries that demand sophisticated multi-tool composition and coordination capabilities. Additionally, we develop a two-stage training methodology that integrates Supervised Fine-Tuning (SFT) with Group Relative Policy Optimization (GRPO), systematically enhancing the Planner's tool selection accuracy and global planning awareness through structured DAG-based planning. When integrated with a capable executor, our framework achieves state-of-the-art performance on the StableToolBench benchmark for complex user queries, demonstrating superior end-to-end execution capabilities and robust handling of intricate multi-tool workflows.
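A Planner that emits a DAG can hand execution to a standard topological sort; a sketch using Python's stdlib graphlib, with a hypothetical multi-tool plan (the tool names are illustrative, not drawn from ComplexTool-Plan):

```python
from graphlib import TopologicalSorter

# Hypothetical plan for "book the cheapest flight and a matching hotel".
# Each key maps a step to the set of steps whose outputs it consumes.
plan = {
    "search_flights": set(),
    "search_hotels": set(),
    "pick_cheapest_flight": {"search_flights"},
    "match_hotel_to_dates": {"search_hotels", "pick_cheapest_flight"},
    "book": {"pick_cheapest_flight", "match_hotel_to_dates"},
}

# static_order() yields every step only after all of its prerequisites,
# which is exactly the execution guarantee a global DAG plan provides.
order = list(TopologicalSorter(plan).static_order())
print(order)
```

Unlike ReAct's incremental decisions, the whole dependency structure is fixed up front, so no step can commit to a locally plausible but globally suboptimal tool call.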
Writing comprehensive unit tests is a time-consuming challenge in software development. While Large Language Model (LLM)-based tools offer a solution, they often struggle with code correctness and reliability. This paper proposes a novel tool for generating Java unit tests using a three-stage pipeline architecture: code description, test scenario listing, and code generation. Our approach integrates advanced techniques such as Reflexion for iterative description refinement, parallel generation for scenario diversity, and the ReAct framework for dynamic code repair. Testing on Apache Commons projects shows superior code and branch coverage (92%-98%) over benchmarks, with a 78%-83% execution success rate.
No abstract available
With the widespread application of large language models (LLMs) in intelligent conversation and recommender systems, integrating them into travel-related tasks has become a key research focus within the smart mobility domain. However, limitations such as high fine-tuning costs, cold-start challenges, issues in validation and logical coherence, and difficulties in maintaining contextual memory hinder the effectiveness of personalized interactions by traditional LLMs in travel scenarios. To address these challenges, we propose the Reasoning-enhanced Multi-turn Agent with Personalized Adaptation Framework (ReMAP), a generation-augmented agent framework that reduces personalization costs, improves validation and logical interpretability via Reasoning-and-Acting (ReAct) and Chain-of-Thought (CoT) prompting, and incorporates self-updating and retrieval mechanisms for factual memory to enhance the robustness of personalized generation in LLMs. The Tibet tourism-oriented personalized interaction agent system built upon this framework demonstrates strong performance in multiple-round, multi-group response experiments conducted under real-world travel scenarios. Experimental results show that ReMAP significantly outperforms baseline approaches in cold-start responsiveness (+10.51% personalization accuracy), itinerary feasibility (+12.38% pass rate), and long-term personalization consistency.
Smart buildings represent a significant trend in the future of the construction industry, and the quality of human-computer interaction plays a vital role in achieving this from a human perspective. However, existing human-computer interaction algorithms are often limited to simple commands and fail to meet the complex and diverse needs of users. To address this issue, this paper introduces large language models (LLMs) and AI agents into smart buildings, proposing a general AI agent framework based on the ReAct strategy. The LLM serves as the system's brain, responsible for reasoning and action planning, while a tool-calling mechanism puts the LLM's plans into practice. Through this framework, developers can rely on prompt engineering alone to enable the LLM to interpret user intent accurately and perform appropriate actions.
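Several abstracts in this section build on the ReAct strategy; a minimal sketch of the underlying loop, with a scripted stand-in for the LLM and a single calculator tool (both purely illustrative), shows how reasoning, tool calls, and observations interleave:

```python
def calculator(expr):
    # Restricted eval: arithmetic only, no builtins.
    return str(eval(expr, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def fake_llm(transcript):
    """Stand-in for an LLM: scripted thought/action pairs for one query."""
    if "Observation" not in transcript:
        return "Thought: I need the total power draw.\nAction: calculator[3*150+2*60]"
    return "Final Answer: 570 watts"

def react(query, max_steps=5):
    transcript = f"Question: {query}"
    for _ in range(max_steps):
        step = fake_llm(transcript)
        if step.startswith("Final Answer:"):
            return step.removeprefix("Final Answer:").strip()
        action = step.split("Action:")[1].strip()   # e.g. "calculator[...]"
        tool, arg = action.split("[", 1)
        observation = TOOLS[tool](arg.rstrip("]"))
        transcript += f"\n{step}\nObservation: {observation}"
    return None

print(react("Total wattage of 3 lamps at 150W and 2 fans at 60W?"))
```

In a real deployment the scripted `fake_llm` is replaced by a model call, and the tool registry grows to cover the building's actuators and sensors.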
Large Language Model (LLM) agents are commonly tuned with supervised fine-tuning on ReAct-style expert trajectories or preference optimization over pairwise rollouts. Most of these methods focus on imitating specific expert behaviors or promoting chosen reasoning thoughts and actions over rejected ones. However, without reasoning over and comparing alternative actions, LLM agents fine-tuned with these methods may over-commit to seemingly plausible but suboptimal actions due to limited action-space exploration. To address this, in this paper we propose the Self-taught ActioN Deliberation (SAND) framework, which enables LLM agents to explicitly deliberate over candidate actions before committing to one. To tackle the challenges of when and what to deliberate given a large action space and step-level action evaluation, we incorporate self-consistency action sampling and execution-guided action critique to help synthesize step-wise action-deliberation thoughts using the base model of the LLM agent. In an iterative manner, the deliberation trajectories are then used to fine-tune the LLM agent itself. Evaluated on two representative interactive agent tasks, SAND achieves an average 20% improvement over initial supervised fine-tuning and also outperforms state-of-the-art agent tuning approaches.
High-Performance Computing (HPC) job scheduling involves balancing conflicting objectives such as minimizing makespan, reducing wait times, optimizing resource use, and ensuring fairness. Traditional methods, whether heuristics such as First-Come-First-Served (FCFS) and Shortest Job First (SJF) or intensive optimization techniques, often lack adaptability to dynamic workloads and, more importantly, cannot simultaneously optimize multiple objectives in HPC systems. To address this, we propose a novel Large Language Model (LLM)-based scheduler using a ReAct-style (Reason + Act) framework, enabling iterative, interpretable decision-making. The system incorporates a scratchpad memory to track scheduling history and refine decisions via natural language feedback, while a constraint enforcement module ensures feasibility and safety. We evaluate our approach using OpenAI's O4-Mini and Anthropic's Claude 3.7 across seven real-world HPC workload scenarios, including heterogeneous mixes, bursty patterns, and adversarial cases. Comparisons against FCFS, SJF, and Google OR-Tools (on 10 to 100 jobs) reveal that LLM-based scheduling effectively balances multiple objectives while offering transparent reasoning through natural-language traces. The method excels in constraint satisfaction and adapts to diverse workloads without domain-specific training. However, a trade-off between reasoning quality and computational overhead challenges real-time deployment. This work presents the first comprehensive study of reasoning-capable LLMs for HPC scheduling, demonstrating their potential to handle multi-objective optimization while highlighting limitations in computational efficiency. The findings provide insights into leveraging advanced language models for complex scheduling problems in dynamic HPC environments.
The Semantic Table Annotation (STA) task, which includes Column Type Annotation (CTA) and Cell Entity Annotation (CEA), maps table contents to ontology entities and plays an important role in various semantic applications. However, complex tables often pose challenges such as semantic loss of column names or cell values, strict ontological hierarchy requirements, homonyms, spelling errors, and abbreviations, which hinder annotation accuracy. To address these issues, this paper proposes an LLM-based agent approach for CTA and CEA. We design and implement five external tools with tailored prompts based on the ReAct framework, enabling the STA agent to dynamically select suitable annotation strategies depending on table characteristics. Experiments are conducted on the Tough Tables and BiodivTab datasets from the SemTab challenge, which contain the aforementioned challenges. Our method outperforms existing approaches across various metrics. Furthermore, by leveraging Levenshtein distance to reduce redundant annotations, we achieve a 70% reduction in time costs and a 60% reduction in LLM token usage, providing an efficient and cost-effective solution for STA.
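The Levenshtein-based deduplication can be sketched directly: drop candidate annotations within a small edit distance of one already kept, so near-duplicates are never sent to the LLM (the threshold and the sample labels are illustrative, not the paper's settings):

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def dedup(candidates, threshold=2):
    """Keep a candidate only if it is more than `threshold` edits away
    from everything already kept, collapsing spelling variants."""
    kept = []
    for c in candidates:
        if all(levenshtein(c, k) > threshold for k in kept):
            kept.append(c)
    return kept

# "Streptococus" is one edit from "Streptococcus" and gets collapsed.
print(dedup(["Streptococcus", "Streptococus", "Lactobacillus"]))
```

Each collapsed variant is one fewer tool call and prompt, which is where the reported time and token savings would come from.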
This paper explores the integration of collective intelligence within project management by leveraging advanced large language models (LLMs) for agile sprint planning. Three distinct LLM-based architectures are compared: a single LLM agent, a ReAct (Reasoning + Action) agent, and a multi-agent system with specialized roles. The study demonstrates how these architectures can simulate project management roles, such as project manager and QA engineer, to collaboratively generate comprehensive sprint plans. Using the SMART framework for evaluation, the models’ effectiveness in producing specific, measurable, achievable, relevant, and time-bound tasks is assessed. The findings highlight the strengths of each approach, from fast single-agent solutions to more intricate multi-agent systems. The research contributes to enhancing decision-making processes in agile environments, with significant implications for the future of automated project management tools.
Motivated by the astonishing capabilities of large language models (LLMs) in text generation, reasoning, and simulation of complex human behaviors, in this paper we propose a novel multi-component LLM-based framework, namely LLM4ACOE, that fully automates the collaborative ontology engineering (COE) process using role-playing simulation of LLM agents and retrieval-augmented generation (RAG) technology. The proposed solution enhances the LLM-powered role-playing simulation with RAG ‘feeding’ the LLM three different types of external knowledge. This knowledge corresponds to the knowledge required by each of the COE roles (agents), using a component-based framework, as follows: (a) domain-specific data-centric documents, (b) OWL documentation, and (c) ReAct guidelines. These components are evaluated in combination, with the aim of investigating their impact on the quality of the generated ontologies. The aim of this work is twofold: (a) to identify the capacity of LLM-based agents to generate ontologies acceptable to human experts through agentic collaborative ontology engineering (ACOE) role-playing simulation, at specific levels of acceptance (accuracy, validity, and expressiveness), without human intervention; and (b) to investigate whether and to what extent the selected RAG components affect the quality of the generated ontologies. The evaluation of this novel approach is performed using ChatGPT-o in the domain of search and rescue (SAR) missions. To assess the generated ontologies, quantitative and qualitative measures are employed, focusing on coverage, expressiveness, structure, and human involvement.
Individuals entering Vietnam's dynamic Information Technology (IT) job market face a critical gap in reliable career guidance. Existing market reports are often outdated, while the manual analysis of thousands of job postings is impractical for most. To address this challenge, we present the AI Job Market Consultant, a novel conversational agent that delivers deep, data-driven insights directly from the labor market in real time. The foundation of our system is a custom-built dataset created via an automated pipeline that crawls job portals using Playwright and leverages a Large Language Model (LLM) to intelligently structure unstructured posting data. The core of our system is a tool-augmented AI agent, based on the ReAct agentic framework, that can autonomously reason, plan, and execute actions through a specialized toolbox for SQL queries, semantic search, and data visualization. Our prototype successfully collected and analyzed 3,745 job postings, demonstrating its ability to answer complex, multi-step queries, generate on-demand visualizations, and provide personalized career advice grounded in real-world data. This work introduces a new paradigm for labor market analysis, showcasing how specialized agentic AI systems can democratize access to timely, trustworthy career intelligence for the next generation of professionals.
Answering questions over multi-page, multimodal documents, including text and figures, is a critical challenge for applications that require answers to integrate information across multiple modalities and contextual dependencies. Existing methods, such as single-turn retrieval-augmented generation (RAG), struggle to retrieve fine-grained and contextually relevant information from large, heterogeneous documents, leading to suboptimal performance. Inspired by iterative frameworks like ReAct, which refine retrieval through feedback, we propose Doc-React, an adaptive iterative framework that balances information gain and uncertainty reduction at each step. Doc-React leverages InfoNCE-guided retrieval to approximate mutual information, enabling dynamic sub-query generation and refinement. A large language model (LLM) serves as both a judge and generator, providing structured feedback to iteratively improve retrieval. By combining mutual information optimization with entropy-aware selection, Doc-React systematically captures relevant multimodal content, achieving strong performance on complex QA tasks.
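The InfoNCE objective behind Doc-React's retrieval scores how sharply the positive chunk stands out among candidates; a minimal sketch (the temperature and similarity scores are illustrative):

```python
import math

def infonce(pos_score, all_scores, tau=0.1):
    """InfoNCE: negative log-probability of the positive among candidates.
    A lower loss means the positive passage dominates the softmax, i.e. a
    higher estimated mutual information between query and retrieved chunk."""
    denom = sum(math.exp(s / tau) for s in all_scores)
    return -math.log(math.exp(pos_score / tau) / denom)

# A chunk that clearly beats its negatives yields a much lower loss
# than one that barely edges them out, guiding sub-query refinement.
sharp = infonce(0.9, [0.9, 0.2, 0.1])
flat = infonce(0.4, [0.4, 0.38, 0.35])
print(sharp, flat)
```

In an iterative loop, a high (flat) loss would signal that the current sub-query does not discriminate the target content and should be refined.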
The emergence of Agentic IoT, where autonomous intelligent agents such as mobile robots, UAVs, and industrial actuators independently execute complex missions, demands communication and security configurations that can adapt to both fast mission-driven changes and slower environment-driven performance drifts. Existing control paradigms are inadequate. Specifically, static policies cannot react to real-time variations, while task-aware adaptive policies largely overlook environmental dynamics, leaving systems vulnerable to network degradation and latency spikes. To address these limitations, we propose the Dynamic Model Context Protocol (dMCP), a cognitive control framework that bridges high-level mission intents with low-level system configurations via the standardized MCP interface. dMCP employs a Large Language Model to reason over real-time mission and environment contexts, generating executable policy vectors. An event-driven trigger mechanism re-evaluates policies upon abrupt mission changes or significant environmental drifts, ensuring timely adaptation without overreacting to transient fluctuations. Simulation results demonstrate that dMCP achieves higher reliability, reduced tail latency, and improved Service Level Objective compliance compared with both static and task-aware adaptive baselines, making it a viable control paradigm for highly dynamic Agentic IoT deployments.
While large language models (LLMs) show impressive decision-making abilities, current methods lack a mechanism for automatic self-improvement from errors during task execution. We propose LEAP, an iterative fine-tuning framework that continually improves LLM agents using feedback from AI expert teachers. Our key insight is to equip the expert teachers with a privileged state: information that is available during training but hidden at test time. This allows even weak experts to provide precise guidance, significantly improving the student agent's performance without access to privileged information at test time. We evaluate LEAP on diverse decision-making benchmarks, including text-based games (ALFWorld), web navigation (WebShop), and interactive coding (Intercode Bash). Our experiments show that LEAP (1) outperforms behavior cloning and ReAct baselines, (2) enables weak student models (e.g., Llama3-8B) to exceed the performance of strong teacher models (GPT-4o), and (3) allows weak models to self-improve using privileged versions of themselves. We also provide a theoretical analysis showing that LEAP's success hinges on balancing privileged information with the student's realizability, which we empirically validate. Our code is available at https://leap-llm.github.io
The accelerating growth of scientific literature presents significant challenges for researchers aiming to synthesize information effectively and identify novel research directions. Conventional Large Language Models (LLMs), while versatile, often lack the nuanced understanding of domain-specific academic structures required for optimal contextual retrieval. This paper introduces a fine-tuned Large Language Model (LLM)-based research-assistant framework developed to enhance literature analysis for automated proposal generation within academic domains. The system employs a structured document-ingestion pipeline, incorporating Retrieval-Augmented Generation (RAG) with a reranker module to optimize context relevance and reduce hallucinations. Our RAG approach uses a two-step retrieval process, cosine similarity followed by a reranker, to pinpoint the most relevant documents. It integrates multimodal capabilities through vision-language models for diagram interpretation and uses ReAct with structured prompt templates to generate research proposals by analyzing “Limitations” and “Future Work” sections. Our experiments clearly show the value of the reranker: it significantly boosts answer relevancy, with an average score of 0.994 post-reranking, and frequently guides the system to a perfect rank (1.0000). Crucially, our domain-specific fine-tuning of the language models, employing techniques such as LoRA, proved effective in adapting the model to academic discourse, achieving a validation perplexity as low as 1.1716. This signifies enhanced model understanding and specialization, leading to more accurate contextual interpretation, better synthesis of complex ideas, and more coherent proposal generation. The proposed model highlights the effective application of domain-specific fine-tuning to streamline data-driven academic workflows in AI-supported research environments.
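The two-step retrieval described above, a cheap cosine shortlist followed by an expensive reranker, can be sketched as follows (the token-overlap reranker and toy embeddings are stand-ins for the real models):

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def rerank_stub(query, doc):
    """Stand-in for a cross-encoder reranker: fraction of query tokens in doc."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q)

def retrieve(query, query_vec, corpus, k=2):
    # Stage 1: cheap cosine shortlist over precomputed embeddings.
    shortlist = sorted(corpus, key=lambda d: cosine(query_vec, d["vec"]),
                       reverse=True)[:k]
    # Stage 2: run the expensive reranker only on the shortlist.
    return max(shortlist, key=lambda d: rerank_stub(query, d["text"]))

corpus = [
    {"text": "limitations of transformer models", "vec": [0.9, 0.1]},
    {"text": "future work on retrieval augmented generation", "vec": [0.8, 0.3]},
    {"text": "cooking with cast iron", "vec": [0.0, 1.0]},
]
best = retrieve("future work retrieval", [0.85, 0.2], corpus)
print(best["text"])
```

The shortlist keeps the reranker's quadratic query-document cost off the full corpus, which is why the two-step design scales.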
The growing sophistication of cyber threats and the exponential rise in alert volumes have exposed the limitations of traditional Security Operations Centers (SOCs), leading to analyst fatigue, high turnover, and inefficiencies in incident response. Conventional SOAR platforms struggle to address these issues due to their rigid rule-based logic and insufficient contextual awareness. Although large language model (LLM)-based solutions have shown potential, they often lack consistency in reasoning, effective tool orchestration, factual accuracy, and adaptability to emerging threats. In this work, we present an autonomous SOC agent that integrates the ReAct (Reasoning and Acting) framework with detection engineering principles to overcome these challenges. By embedding structured investigation logic and enriched alert metadata directly into the analysis workflow, our approach delivers domain-specific context to support accurate tool invocation and actionable remediation guidance. This integration fosters transparency and reliability throughout the alert lifecycle. Empirical evaluations demonstrate that our solution significantly enhances alert triage and incident response, offering a scalable path toward more resilient, AI-driven SOC operations.
Accurate prediction of phage-host interactions remains a fundamental challenge that impedes the clinical deployment of phage therapy. Current graph neural network-based methods rely on superficial sequence features, failing to adequately integrate deep biological semantic information. In this work, we present DSHMGAT (Dual-Stream Hierarchical Mixed-Routing Graph Attention Network), a novel framework that incorporates agent-generated semantic embeddings to mitigate this semantic deficiency. By employing a ReAct-driven agent with function-calling capabilities to access domain-specific databases, our approach reduces hallucinations inherent in biological LLM applications. DSHMGAT adopts a dual-stream hierarchical graph neural network that simultaneously captures genomic sequences and biological semantic representations, enabling effective cross-modal information integration. Inspired by mixture-of-experts architectures, we develop a mixed-routing attention mechanism that improves learning flexibility through dynamic weight allocation between routing heads and shared heads. DSHMGAT achieves an AUC of 0.9486 on the benchmark dataset, outperforming established approaches in our comparative analysis. Ablation experiments reveal that the observed improvements can be attributed to the combined effects of multimodal fusion, cross-modal interaction, and the mixed-routing mechanism.
Thinktank is a task execution framework that harnesses the reasoning capabilities of large language models (LLMs). These models demonstrate an ability to provide intermediate steps essential for approaching problem solutions. Thinktank capitalizes on this by employing an LLM, such as GPT-4, to iteratively solve given objectives. Initially, Thinktank processes each objective into actionable tasks that can be executed immediately using the available information. We employ a technique called ReAct prompting (https://arxiv.org/abs/2210.03629) to leverage the LLM’s reasoning abilities, guiding it to select appropriate Agents. Our method deviates from the original paper’s proposal; recognizing that the vast number of available functions could overwhelm the decision-making of statistical language models, we restrict the model’s function choices by automatically clustering Agents into no more than ten capabilities. These capabilities are dynamically recalculated whenever new Agents are added to the system.
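Thinktank's capability clustering is not specified beyond the ten-capability cap; a greedy sketch under that constraint (the agent names, toy embeddings, and radius are assumptions) might look like:

```python
import math

CAP = 10  # never expose more than ten capabilities to the model

def cluster_agents(agents, radius=0.5, cap=CAP):
    """Greedy capability clustering: an agent joins the nearest existing
    capability within `radius`; otherwise it founds a new one, until `cap`
    clusters exist, after which everything joins its nearest cluster."""
    clusters = []  # each: {"centroid": vec, "members": [names]}
    for name, vec in agents:
        nearest = min(clusters,
                      key=lambda c: math.dist(vec, c["centroid"]),
                      default=None)
        if nearest and (math.dist(vec, nearest["centroid"]) <= radius
                        or len(clusters) >= cap):
            nearest["members"].append(name)
        else:
            clusters.append({"centroid": vec, "members": [name]})
    return clusters

agents = [("web_search", [0.0, 1.0]), ("wiki_lookup", [0.1, 0.9]),
          ("python_exec", [1.0, 0.0]), ("shell_exec", [0.9, 0.1])]
caps = cluster_agents(agents)
print([c["members"] for c in caps])  # two capabilities: lookup and execution
```

Re-running the clustering whenever an agent is added, as the abstract describes, just means calling `cluster_agents` again over the updated roster.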
While large language models (LLMs) have shown promise in the table question answering (TQA) task through prompt engineering, they face challenges in industrial applications, including structural heterogeneity, difficulties in target data localization, and bottlenecks in complex reasoning. To address these limitations, this paper presents TableZoomer, a novel LLM-powered, programming-based agent framework. It introduces three key innovations: (1) replacing the original fully verbalized table with structured table schema to bridge the semantic gap and reduce computational complexity; (2) a query-aware table zooming mechanism that dynamically generates sub-table schema through column selection and entity linking, significantly improving target localization efficiency; and (3) a Program-of-Thoughts (PoT) strategy that transforms queries into executable code to mitigate numerical hallucination. Additionally, we integrate the reasoning workflow with the ReAct paradigm to enable iterative reasoning. Extensive experiments demonstrate that our framework maintains the usability advantages while substantially enhancing performance and scalability across tables of varying scales. When implemented with the Qwen3-8B-Instruct LLM, TableZoomer achieves accuracy improvements of 19.34% and 25% over conventional PoT methods on the large-scale DataBench dataset and the small-scale Fact Checking task of TableBench dataset, respectively.
An LLM-Based Agentic Network Traffic Incident-Report Approach Towards Explainable-AI Network Defense
Traditional intrusion detection systems for IoT networks achieve high classification accuracy but lack interpretability and actionable incident-response capabilities, limiting their operational value in security-critical environments. This paper presents a graph-based multi-agent framework that integrates ensemble machine learning with Large Language Model (LLM)-powered incident report generation via Retrieval-Augmented Generation (RAG). The system employs a three-phase architecture: (1) a lightweight Random Forest binary pre-detection stage, achieving 99.49% accuracy with a 6 MB model size for edge deployment; (2) ensemble classification combining a Multi-Layer Perceptron, Random Forest, and XGBoost with soft voting and SHAP-based feature attribution for explainability; and (3) a ReAct-based summary agent that synthesizes classification results with external threat intelligence from Web search and scholarly databases to generate evidence-grounded incident reports. To address the challenge of evaluating non-deterministic LLM outputs, we introduce custom RAG evaluation metrics, faithfulness and groundedness, implemented via the LLM-as-Judge framework. Experimental validation on the ACI IoT Network Dataset 2023 demonstrates ensemble accuracy exceeding 99.8% across 11 attack classes; perfect groundedness scores (1.0), indicating all generated claims derive from the retrieved context; and moderate faithfulness (0.64), reflecting appropriate analytical synthesis. The ensemble approach mitigates individual model weaknesses, improving the UDP Flood F1 score from 48% (MLP alone) to 95% through soft voting. This work bridges the gap between high-accuracy detection and trustworthy, actionable security analysis for automated incident-response systems.
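The soft-voting step that lifts the UDP Flood F1 score works by averaging per-class probabilities across models before taking the argmax; a minimal sketch (the probability rows are illustrative, not from the dataset):

```python
def soft_vote(prob_rows, classes):
    """Average per-class probabilities across models, then pick the argmax."""
    n = len(prob_rows)
    avg = [sum(row[i] for row in prob_rows) / n for i in range(len(classes))]
    return classes[max(range(len(classes)), key=avg.__getitem__)]

classes = ["Benign", "UDP Flood", "Port Scan"]
# The MLP alone is wrong on this flow, but the Random Forest and XGBoost
# are confident enough to outvote it once probabilities are averaged.
mlp = [0.50, 0.30, 0.20]
rf  = [0.10, 0.80, 0.10]
xgb = [0.15, 0.75, 0.10]
print(soft_vote([mlp, rf, xgb], classes))  # → UDP Flood
```

Because soft voting weighs confidence rather than counting hard labels, one uncertain model rarely flips the ensemble, which is the mechanism behind the reported F1 recovery.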
Modern supply chain management systems often suffer from fragmented decision-making where demand forecasting, inventory control, supplier management, and pricing operate as independent processes. This lack of coordination frequently leads to inefficiencies such as stockouts, excess inventory, revenue loss, and poor supplier utilization. To address these challenges, this paper presents StockSage, a multi-agent inventory management system powered by Large Language Models (LLMs). The proposed system employs four specialized agents responsible for forecasting demand, managing inventory levels, selecting optimal suppliers, and recommending pricing strategies. These agents collaborate through a structured two-round coordination protocol that enables cross-functional communication and adaptive decision-making. The system is implemented as a full-stack web application using modern technologies including Next.js, React, TypeScript, Prisma ORM, SQLite, and OpenAI GPT APIs. A Monte Carlo simulation framework is used to evaluate system performance against traditional baseline strategies such as static reorder policies, moving average forecasting, and fixed pricing methods. Experimental results indicate improvements in forecast accuracy, service level, inventory turnover, and revenue optimization. The results demonstrate the potential of coordinated multi-agent LLM systems to provide intelligent, explainable, and scalable decision support for modern inventory and supply chain management.
Recent advances in large language models (LLMs) have enabled impressive progress across diverse tasks, yet interpretability remains a core requirement for deployment in high-stakes domains such as crisis prevention and policy-making. Prior work on event prediction has largely prioritized accuracy, but the reasoning behind model outputs often remains opaque and difficult to audit. In this paper, we propose C3OT, Causality Contextualized Chain-of-Thought, which integrates causal reasoning into an agentic LLM framework using the ReAct paradigm. We design and evaluate multiple prompting strategies, including Causal Chain Learning, Chain-of-Thought, and more nuanced hybrid approaches. Experiments assess both predictive accuracy and interpretability, the latter measured through structured rubrics that capture transparency, causal coherence, and auditability. Results demonstrate that our causal reasoning approach attains competitive predictive performance while producing more transparent and auditable reasoning traces. These findings underscore the value of causal reasoning for enhancing both trustworthiness and robustness in sociopolitical forecasting.
Search engines are vital for online e-commerce but often struggle with long, detailed queries. We introduce Search Swarm, a novel multi-agent system designed to improve search engine navigation on platforms like Amazon by accurately locating relevant products based on user instructions. Search Swarm employs multiple large language model (LLM) agents, each with a specific role: query planner, searcher, critic, and attribute selector. These agents collaborate to generate search queries, evaluate results, and identify the best product options tailored to users' needs. Our framework outperforms existing methods like ReAct and Reflexion in the WebShop environment, achieving a reward score of 62.64, compared to scores of 54.1, 59.8, 61.5, and 58.2 for other approaches. Furthermore, in a comparison with a basic rule-based method on Amazon, Search Swarm achieved a score 38.71 points higher and a 41% greater success rate, demonstrating its superior ability to provide relevant product matches over traditional search engines.
Table Question Answering (TableQA) enables natural language interaction with structured tabular data. However, existing large language model (LLM) approaches face critical limitations: context length constraints that restrict data handling capabilities, hallucination issues that compromise answer reliability, and single-agent architectures that struggle with complex reasoning scenarios involving semantic relationships and multi-hop logic. This paper introduces DataFactory, a multi-agent framework that addresses these limitations through specialized team coordination and automated knowledge transformation. The framework comprises a Data Leader employing the ReAct paradigm for reasoning orchestration, together with dedicated Database and Knowledge Graph teams, enabling the systematic decomposition of complex queries into structured and relational reasoning tasks. We formalize automated data-to-knowledge-graph transformation via the mapping function T: D × S × R → G, and implement natural-language-based consultation that, unlike fixed-workflow multi-agent systems, enables flexible inter-agent deliberation and adaptive planning to improve coordination robustness. We also apply context engineering strategies that integrate historical patterns and domain knowledge to reduce hallucinations and improve query accuracy. Across TabFact, WikiTableQuestions, and FeTaQA, using eight LLMs from five providers, results show consistent gains. Our approach improves accuracy by 20.2% (TabFact) and 23.9% (WikiTQ) over baselines, with significant effects (Cohen's d > 1). Team coordination also outperforms single-team variants (+5.5% TabFact, +14.4% WikiTQ, +17.1% FeTaQA ROUGE-2). The framework offers design guidelines for multi-agent collaboration and a practical platform for enterprise data analysis through integrated structured querying and graph-based knowledge representation.
Large language models (LLMs) augmented with retrieval have shown impressive performance in open-domain question answering, yet struggle significantly with temporal knowledge graph question answering (TKGQA). The core issue lies in structural misalignment: treating structured, temporally sensitive graph queries as plain text often causes LLMs to retrieve or reason with semantically similar but structurally incorrect facts, resulting in critical inaccuracies. To address this, we introduce SAR (Structure-Aligned Reasoning), a novel TKGQA framework that integrates LLM reasoning tightly with the explicit subject–predicate–object–time schema inherent in knowledge graphs. SAR employs an LLM agent to first decompose natural language questions into structured queries, clearly delineating entities, relationships, and temporal constraints. It then conducts schema-consistent, time-aware retrieval from the knowledge graph to acquire candidate quadruples, which guide a subsequent iterative ReAct-style reasoning process by the LLM. A final verification stage ensures that proposed answers strictly adhere to temporal conditions, reinforcing accuracy and temporal coherence. Experiments on two benchmark datasets, MultiTQ and CronQuestions, demonstrate SAR’s effectiveness, achieving the best results. Specifically, with GPT-4.1, SAR achieves 78.2% Hits@1 on MultiTQ, significantly outperforming existing methods, and similarly establishes a new performance record on CronQuestions. Our results underscore the critical importance of structural alignment in temporal reasoning tasks, particularly in handling complex queries involving multiple temporal constraints and multi-hop reasoning.
The behavior of Large Language Models (LLMs) as artificial social agents is largely unexplored, and we still lack extensive evidence of how these agents react to simple social stimuli. Testing the behavior of AI agents in classic Game Theory experiments provides a promising theoretical framework for evaluating the norms and values of these agents in archetypal social situations. In this work, we investigate the cooperative behavior of three LLMs (Llama2, Llama3, and GPT3.5) when playing the Iterated Prisoner's Dilemma against random adversaries displaying various levels of hostility. We introduce a systematic methodology to evaluate an LLM's comprehension of the game rules and its capability to parse historical gameplay logs for decision-making. We conducted simulations of games lasting for 100 rounds and analyzed the LLMs' decisions in terms of dimensions defined in the behavioral economics literature. We find that all models tend not to initiate defection but act cautiously, favoring cooperation over defection only when the opponent's defection rate is low. Overall, LLMs behave at least as cooperatively as the typical human player, although our results indicate some substantial differences among models. In particular, Llama2 and GPT3.5 are more cooperative than humans, and especially forgiving and non-retaliatory for opponent defection rates below 30%. More similar to humans, Llama3 exhibits consistently uncooperative and exploitative behavior unless the opponent always cooperates. Our systematic approach to the study of LLMs in game theoretical scenarios is a step towards using these simulations to inform practices of LLM auditing and alignment.
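The experimental setup, a 100-round Iterated Prisoner's Dilemma against a random adversary with a tunable defection rate, can be sketched as below. The payoff values and the `cautious` policy (a rule-based stand-in for an LLM player, using the 30% threshold the paper observes) are illustrative assumptions:

```python
import random

PAYOFF = {  # standard Prisoner's Dilemma payoffs (row player's score)
    ("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1,
}

def play_ipd(agent_policy, hostility, rounds=100, seed=0):
    """Play an iterated Prisoner's Dilemma against a random adversary that
    defects with probability `hostility`; return the agent's total score."""
    rng = random.Random(seed)
    history, score = [], 0
    for _ in range(rounds):
        opponent = "D" if rng.random() < hostility else "C"
        move = agent_policy(history)
        score += PAYOFF[(move, opponent)]
        history.append((move, opponent))
    return score

def cautious(history):
    """Stand-in for the observed LLM behavior: never initiate defection,
    and defect only once the opponent's defection rate exceeds 30%."""
    if not history:
        return "C"
    defect_rate = sum(o == "D" for _, o in history) / len(history)
    return "D" if defect_rate > 0.3 else "C"
```

Against a fully cooperative opponent (`hostility=0.0`) this policy scores 3 per round; against a fully hostile one it loses only the first round before retaliating.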
This paper presents RTLFixer, a novel framework enabling automatic fixing of syntax errors in Verilog code with Large Language Models (LLMs). Despite LLMs' promising capabilities, our analysis indicates that approximately 55% of errors in LLM-generated Verilog are syntax-related, leading to compilation failures. To tackle this issue, we introduce a novel debugging framework that employs Retrieval-Augmented Generation (RAG) and ReAct prompting, enabling LLMs to act as autonomous agents that interactively debug the code with feedback. This framework demonstrates exceptional proficiency in resolving syntax errors, successfully correcting about 98.5% of compilation errors in our debugging dataset, comprising 212 erroneous implementations derived from the VerilogEval benchmark. Our method leads to 32.3% and 10.1% increases in pass@1 success rates on the VerilogEval-Machine and VerilogEval-Human benchmarks, respectively. The source code and benchmark are available at https://github.com/NVlabs/RTLFixer.
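The compile-retrieve-repair loop that combines RAG with ReAct-style acting can be sketched as below. The injected callables are toy stand-ins (a "compiler" that only checks for a trailing semicolon, a one-entry retrieval database, and a rule-based "LLM"), not RTLFixer's actual components:

```python
def fix_syntax_errors(code, compile_fn, retrieve_fix, llm_repair, max_iters=5):
    """Iteratively compile, retrieve guidance for the reported error, and
    repair, mirroring a RAG + ReAct debugging loop with compiler feedback."""
    for _ in range(max_iters):
        ok, error = compile_fn(code)
        if ok:
            return code, True
        hint = retrieve_fix(error)             # RAG: look up guidance for this error
        code = llm_repair(code, error, hint)   # Act: agent proposes a patched version
    return code, False

# Toy stand-ins for the three injected components.
compile_fn = lambda c: (c.endswith(";"), None if c.endswith(";") else "missing semicolon")
retrieve_fix = lambda err: {"missing semicolon": "append ';'"}.get(err, "")
llm_repair = lambda c, err, hint: c + ";" if "semicolon" in err else c

fixed, ok = fix_syntax_errors("assign y = a & b", compile_fn, retrieve_fix, llm_repair)
```

The key design point is that the loop terminates on the first clean compile, so the agent only spends iterations on code that still fails.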
Video Internet of Things (VIoT) has shown full potential in collecting an unprecedented volume of video data. How to schedule the domain-specific perceiving models and analyze the collected videos uniformly, efficiently, and especially intelligently to accomplish complicated tasks is challenging. To address the challenge, we build VIoTGPT, a framework based on LLMs to correctly interact with humans, query knowledge videos, and invoke vision models to analyze multimedia data collaboratively. To support VIoTGPT and related future works, we meticulously crafted the VIoT-Tool dataset, including the training dataset and the benchmark involving 11 representative vision models across three categories based on semi-automatic annotations. To guide the LLM to act as an intelligent agent for VIoT, we adopt a ReAct instruction-tuning method based on VIoT-Tool to learn the tool capability. Quantitative and qualitative experiments and analyses demonstrate the effectiveness of VIoTGPT. We believe VIoTGPT contributes to improving human-centered experiences in VIoT applications.
To complete tasks in dynamic environments, robots need to update their plans in a timely manner to react to environment changes. Traditional STRIPS-like or learning-based planners struggle to achieve this due to their high reliance on meticulously predefined planning rules or labeled data. Fortunately, recent works find that Large Language Models (LLMs) can be effectively prompted to solve planning problems. Thus, we investigate the strategies for LLMs to master reactive planning problems without complex definitions and extra training. We propose Text2Reaction, an LLM-based framework enabling robots to continuously reason and update plans according to the latest environment changes. Inspired by humans' step-by-step re-planning process, we present the Re-planning Prompt, which informs LLMs of the basic principles of re-planning and fosters the gradual development of the current plan into a new one in a three-hop reasoning manner: cause analysis, consequence inference, and plan adjustment. In addition, Text2Reaction is designed to first generate an initial plan based on the task description before execution, allowing for subsequent iterative updates of this plan. We demonstrate the superior performance of Text2Reaction over prior works in reacting to various environment changes and completing varied tasks. In addition, we validate the reliability of our re-planning prompt through ablation experiments and its capability when deployed in real-world robots, enabling continuous reasoning in the face of diverse changes until the user instructions are successfully completed.
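The three-hop structure of the Re-planning Prompt can be sketched as a template; the wording below is an illustrative reconstruction of the idea, not the paper's actual prompt text:

```python
REPLAN_TEMPLATE = """You are a robot re-planner.
Task: {task}
Current plan: {plan}
Observed change: {change}

Reason step by step:
1. Cause analysis: why did the change occur, and which plan steps does it affect?
2. Consequence inference: what happens if the current plan is executed unchanged?
3. Plan adjustment: output the updated plan as a numbered list of actions.
"""

def build_replan_prompt(task, plan, change):
    """Fill the three-hop re-planning template for the latest observed change."""
    return REPLAN_TEMPLATE.format(task=task, plan=" -> ".join(plan), change=change)

prompt = build_replan_prompt(
    task="bring the cup to the table",
    plan=["grasp cup", "move to table", "place cup"],
    change="the table is now occupied by a box",
)
```

Because the template takes the current plan as input, the same call can be reissued after every environment change, which is what makes the updates iterative.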
Autonomous LLM-based agents have emerged as a powerful paradigm for complex task execution, yet the field lacks standardized tools for development, deployment, distribution and discovery of agents. We present Cerebrum, an Agent SDK for AIOS that addresses this gap through three key components: (1) a comprehensive SDK featuring a modular four-layer architecture for agent development, encompassing LLM, memory, storage, and tool management; (2) a community-driven Agent Hub for sharing and discovering agents, complete with version control and dependency management; (3) an interactive web interface for testing and evaluating agents. The platform's effectiveness is demonstrated through implementations of various agent architectures, including Chain of Thought (CoT), ReAct, and tool-use agents. Cerebrum advances the field by providing a unified framework that standardizes agent development while maintaining flexibility for researchers and developers to innovate and distribute their agents. The live website is at https://app.aios.foundation, the code is at https://github.com/agiresearch/Cerebrum, and video is at https://app.aios.foundation/video-demo.
Conducting a comprehensive literature review is crucial for advancing circuit design methodologies. However, the rapid influx of state-of-the-art research, inconsistent data representation, and the complexity of optimizing circuit design objectives (e.g. power consumption) make this task significantly challenging. Traditional manual search methods are inefficient, time-consuming, and lack the reasoning capabilities required for synthesizing complex circuits. In this paper, we propose MuaLLM, an open-source multimodal Large Language Model (LLM) agent for circuit design assistance that integrates a hybrid Retrieval-Augmented Generation (RAG) framework with an adaptive vector database of circuit design research papers. Unlike conventional LLMs, the MuaLLM agent employs a Reason + Act (ReAct) workflow for iterative reasoning, goal-setting, and multi-step information retrieval. It functions as a question-answering design assistant, capable of interpreting complex queries and providing reasoned responses grounded in circuit literature. Its multimodal capabilities enable processing of both textual and visual data, facilitating more efficient and comprehensive analysis. The system dynamically adapts using intelligent search tools, automated document retrieval from the internet, and real-time database updates. Unlike conventional approaches constrained by model context limits, MuaLLM decouples retrieval from inference, enabling scalable reasoning over arbitrarily large corpora. At the maximum context length supported by standard LLMs, MuaLLM remains up to 10x less costly and 1.6x faster while maintaining the same accuracy. This allows rapid, no-human-in-the-loop database generation, overcoming the bottleneck of simulation-based dataset creation for circuits. To evaluate MuaLLM, we introduce two custom benchmarks: RAG-250, targeting retrieval and citation performance, and Reasoning-100 (Reas-100), focused on multistep reasoning in circuit design. 
MuaLLM achieves 90.1% recall on RAG-250, highlighting strong multimodal retrieval and citation accuracy. On Reas-100, it reaches 86.8% accuracy, demonstrating robust reasoning capabilities on complex design queries.
We introduce TAPAS (Task-based Adaptation and Planning using AgentS), a multi-agent framework that integrates Large Language Models (LLMs) with symbolic planning to solve complex tasks without the need for manually defined environment models. TAPAS employs specialized LLM-based agents that collaboratively generate and adapt domain models, initial states, and goal specifications as needed using structured tool-calling mechanisms. Through this tool-based interaction, downstream agents can request modifications from upstream agents, enabling adaptation to novel attributes and constraints without manual domain redefinition. A ReAct (Reason+Act)-style execution agent, coupled with natural language plan translation, bridges the gap between dynamically generated plans and real-world robot capabilities. TAPAS demonstrates strong performance in benchmark planning domains and in the VirtualHome simulated real-world environment.
Recent advancements in large language models (LLMs) have sparked growing research interest in tool-assisted LLMs solving real-world challenges, which calls for a comprehensive evaluation of tool-use capabilities. While previous works focused on evaluating either stateless web services (RESTful APIs) based on a single-turn user prompt or an off-policy dialog trajectory, ToolSandbox includes stateful tool execution, implicit state dependencies between tools, a built-in user simulator supporting on-policy conversational evaluation, and a dynamic evaluation strategy for intermediate and final milestones over an arbitrary trajectory. We show that open-source and proprietary models have a significant performance gap, and that complex tasks like State Dependency, Canonicalization, and Insufficient Information defined in ToolSandbox challenge even the most capable SOTA LLMs, providing brand-new insights into tool-use LLM capabilities. The ToolSandbox evaluation framework is released at https://github.com/apple/ToolSandbox.
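What "stateful tool execution with implicit state dependencies" means in practice can be sketched with a toy environment; the `ToolWorld` class and its Wi-Fi dependency are illustrative assumptions in the spirit of the benchmark, not ToolSandbox's actual tools:

```python
class ToolWorld:
    """Minimal stateful tool environment: `search` implicitly depends on
    Wi-Fi being enabled, so the order of tool calls across turns matters."""
    def __init__(self):
        self.state = {"wifi": False}

    def enable_wifi(self):
        self.state["wifi"] = True
        return "wifi on"

    def search(self, query):
        if not self.state["wifi"]:
            return "error: no network (enable_wifi first)"
        return f"results for {query!r}"

world = ToolWorld()
first = world.search("weather")   # fails: the implicit dependency is not satisfied
world.enable_wifi()
second = world.search("weather")  # succeeds once the dependency is met
```

An agent evaluated in such an environment must discover and satisfy the dependency itself, which single-turn, stateless API benchmarks never exercise.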
Large Language Models (LLMs) have demonstrated significant potential in decision-making and reasoning, particularly when integrated with various tools to effectively solve complex problems. However, existing benchmarks for evaluating LLMs’ tool usage face several limitations: (1) limited evaluation scenarios, often lacking assessments in real multi-turn dialogue contexts; (2) narrow evaluation dimensions, with insufficient detailed assessments of how LLMs use tools; and (3) reliance on LLMs or real API executions for evaluation, which introduces significant overhead. To address these challenges, we introduce ACEBench, a comprehensive benchmark for assessing tool usage in LLMs. ACEBench categorizes data into three primary types based on evaluation methodology: Normal, Special, and Agent. "Normal" evaluates tool usage in basic scenarios; "Special" evaluates tool usage in situations with ambiguous or incomplete instructions; "Agent" evaluates tool usage through multi-agent interactions to simulate real-world, multi-turn dialogues. We conducted extensive experiments using ACEBench, analyzing various LLMs in-depth and providing a more granular examination of error causes across different data types.
Large language models (LLMs) remain prone to factual inaccuracies and computational errors, including hallucinations and mistakes in mathematical reasoning. Recent work augmented LLMs with tools to mitigate these shortcomings, but often requires curated gold tool-use demonstrations. In this paper, we investigate whether LLMs can learn to use tools without demonstrations. First, we analyse zero-shot prompting strategies to guide LLMs in tool utilisation. Second, we propose a self-training method to synthesise tool-use traces using the LLM itself. We compare supervised fine-tuning and preference fine-tuning techniques for fine-tuning the model on datasets constructed using existing Question Answering (QA) datasets, i.e., TriviaQA and GSM8K. Experiments show that tool-use enhances performance on a long-tail knowledge task: 3.7% on PopQA, which is used solely for evaluation, but leads to mixed results on other datasets, i.e., TriviaQA, GSM8K, and NQ-Open. Our findings highlight the potential and challenges of integrating external tools into LLMs without demonstrations.
Although the power of LLM tool-use agents has ignited a flurry of recent research in this area, the curation of tool-use training data remains an open problem, especially for online RL training. Existing approaches to synthetic tool-use data generation tend to be non-interactive and/or non-compositional. We introduce RandomWorld, a pipeline for the procedural generation of interactive tools and compositional tool-use data. We show that models tuned via SFT and RL on synthetic RandomWorld data improve on a range of tool-use benchmarks, and set the new SoTA for two metrics on the NESTFUL dataset. Further experiments show that downstream performance scales with the amount of RandomWorld-generated training data, opening up the possibility of further improvement through the use of entirely synthetic data.
Large Language Model (LLM) agents significantly extend the capabilities of standalone LLMs, empowering them to interact with external tools (e.g., APIs, functions) and complete various tasks in a self-directed fashion. The challenge of tool use demands that LLMs not only understand user queries and generate answers accurately but also excel in task planning, tool invocation, and result summarization. While traditional works focus on training a single LLM with all these capabilities, performance limitations become apparent, particularly with smaller models. To overcome these challenges, we propose a novel approach that decomposes the aforementioned capabilities into a planner, caller, and summarizer. Each component is implemented by a single LLM that focuses on a specific capability and collaborates with others to accomplish the task. This modular framework facilitates individual updates and the potential use of smaller LLMs for building each capability. To effectively train this framework, we introduce a two-stage training paradigm. First, we fine-tune a backbone LLM on the entire dataset without discriminating sub-tasks, providing the model with a comprehensive understanding of the task. Second, the fine-tuned LLM is used to instantiate the planner, caller, and summarizer respectively, which are continually fine-tuned on respective sub-tasks. Evaluation across various tool-use benchmarks illustrates that our proposed multi-LLM framework surpasses the traditional single-LLM approach, highlighting its efficacy and advantages in tool learning.
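The planner/caller/summarizer decomposition can be sketched as a pipeline of three injected components; the lambda stand-ins below are toy substitutes for the three specialised LLMs:

```python
def run_pipeline(query, planner, caller, summarizer, tools):
    """Planner decides which tool to use, the caller produces the concrete
    invocation, and the summarizer turns the raw result into a user answer."""
    tool_name, tool_args = planner(query)
    raw = caller(tools, tool_name, tool_args)
    return summarizer(query, raw)

# Illustrative stand-ins for the three specialised models.
tools = {"add": lambda a, b: a + b}
planner = lambda q: ("add", (2, 3))                 # plan: route to the calculator
caller = lambda ts, name, args: ts[name](*args)     # call: execute the invocation
summarizer = lambda q, r: f"The answer is {r}."     # summarise for the user

answer = run_pipeline("What is 2 + 3?", planner, caller, summarizer, tools)
```

Because each role is a separate component behind a narrow interface, any one of them can be updated, or swapped for a smaller model, without retraining the others, which is the modularity argument the abstract makes.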
To achieve faithful reasoning that aligns with human expectations, large language models (LLMs) need to ground their reasoning in real-world knowledge (e.g., web facts, math and physical rules). Tools help LLMs access this external knowledge, but there remain challenges in fine-tuning LLM agents (e.g., Toolformer) to invoke tools in multi-step reasoning problems, where interconnected tool calls require holistic and efficient tool usage planning. In this work, we propose a new method for LLMs to better leverage tools in multi-step reasoning. Our method, Chain-of-Abstraction (CoA), trains LLMs to first decode reasoning chains with abstract placeholders, and then call domain tools to reify each reasoning chain by filling in specific knowledge. This planning with abstract chains enables LLMs to learn more general reasoning strategies, which are robust to shifts of domain knowledge (e.g., math results) relevant to different reasoning questions. It also allows LLMs to perform decoding and calling of external tools in parallel, which avoids the inference delay caused by waiting for tool responses. In mathematical reasoning and Wiki QA domains, we show that our method consistently outperforms previous chain-of-thought and tool-augmented baselines on both in-distribution and out-of-distribution test sets, with an average ~6% absolute QA accuracy improvement. LLM agents trained with our method also show more efficient tool use, with inference speed being on average ~1.4x faster than baseline tool-augmented LLMs.
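The reification step, filling abstract placeholders in a decoded chain with tool results, can be sketched as below. The `[y1]`-style placeholder syntax and the sandboxed `eval` "calculator" are illustrative assumptions, not CoA's actual format or tool:

```python
import re

def reify_chain(chain, calc):
    """Fill abstract placeholders [y1], [y2], ... in a CoA-style reasoning
    chain by evaluating each definition with a domain tool (here, arithmetic)."""
    values = {}
    for name, expr in re.findall(r"\[(y\d+)\]\s*=\s*([^;]+)", chain):
        # Substitute earlier placeholders, then delegate the math to the tool.
        for k, v in values.items():
            expr = expr.replace(f"[{k}]", str(v))
        values[name] = calc(expr.strip())
    return values

# The abstract chain is decoded first; specific knowledge is filled in afterwards.
chain = "[y1] = 20 + 35; [y2] = [y1] * 2"
calc = lambda e: eval(e, {"__builtins__": {}})  # stand-in calculator tool
result = reify_chain(chain, calc)
```

Because the chain is fully decoded before any tool runs, the model need not stall on tool latency mid-generation, which is the source of the speedup the abstract reports.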
Large language model (LLM) agents have demonstrated impressive capabilities in utilizing external tools and knowledge to boost accuracy and reduce hallucinations. However, developing prompting techniques that enable LLM agents to effectively use these tools and knowledge remains a heuristic and labor-intensive task. Here, we introduce AvaTaR, a novel and automated framework that optimizes an LLM agent to effectively leverage provided tools, improving performance on a given task. During optimization, we design a comparator module to iteratively deliver insightful and comprehensive prompts to the LLM agent by contrastively reasoning between positive and negative examples sampled from training data. We demonstrate AvaTaR on four complex multimodal retrieval datasets featuring textual, visual, and relational information, and three general question-answering (QA) datasets. We find AvaTaR consistently outperforms state-of-the-art approaches across all seven tasks, exhibiting strong generalization ability when applied to novel cases and achieving an average relative improvement of 14% on the Hit@1 metric for the retrieval datasets and 13% for the QA datasets. Code and dataset are available at https://github.com/zou-group/avatar.
Large language models (LLMs) have exhibited an array of reasoning capabilities but face challenges like error propagation and hallucination, particularly in specialised areas like finance, where data is heterogeneous, and precision is paramount. We explore the potential of language model augmentation with external tools to mitigate these limitations and offload certain reasoning steps to external tools that are more suited for the task, instead of solely depending on the LLM’s inherent abilities. More concretely, using financial domain question answering datasets, we apply supervised finetuning on a LLAMA-2 13B CHAT model to act both as a task router and task solver. The task router dynamically directs a question to either be answered internally by the LLM or externally via the right tool from the tool set. Our tool-equipped SFT model, RAVEN, demonstrates an improvement of 35.2% and 5.06% over the base model and SFT-only baselines, respectively, and is highly competitive with strong GPT-3.5 results. To the best of our knowledge, our work is the first that investigates tool augmentation of language models for the finance domain.
Recently, tool use with LLMs has become one of the primary research topics, as it can help LLMs generate truthful and helpful responses. Existing studies on tool use with LLMs primarily focus on enhancing the tool-calling ability of LLMs. In practice, like chat assistants, LLMs are also required to align with human values in the context of tool use. Specifically, LLMs should refuse to act on instructions involving unsafe tool use and on insecure tool responses, to ensure their reliability and harmlessness. At the same time, LLMs should demonstrate autonomy in tool use to reduce the costs associated with tool calling. To tackle this issue, we first introduce the principle that LLMs should follow in tool use scenarios: H2A. The goal of H2A is to align LLMs with **helpfulness**, **harmlessness**, and **autonomy**. In addition, we propose ToolAlign, a dataset comprising instruction-tuning data and preference data to align LLMs with the H2A principle for tool use. Based on ToolAlign, we develop LLMs by supervised fine-tuning and preference learning, and experimental results demonstrate that the LLMs exhibit remarkable tool-calling capabilities, while also refusing to engage with harmful content and displaying a high degree of autonomy in tool utilization. The code and datasets are available at: https://github.com/zhiyuanc2001/ToolAlign.
Integrating tools into Large Language Models (LLMs) has facilitated their widespread application. Despite this, in specialized downstream task contexts, reliance solely on tools is insufficient to fully address the complexities of the real world. This particularly restricts the effective deployment of LLMs in fields such as medicine. In this paper, we focus on the downstream tasks of medical calculators, which use standardized tests to assess an individual's health status. We introduce MeNTi, a universal agent architecture for LLMs. MeNTi integrates a specialized medical toolkit and employs meta-tool and nested calling mechanisms to enhance LLM tool utilization. Specifically, it achieves flexible tool selection and nested tool calling to address practical issues faced in intricate medical scenarios, including calculator selection, slot filling, and unit conversion. To assess the capabilities of LLMs for quantitative assessment throughout the clinical process of calculator scenarios, we introduce CalcQA. This benchmark requires LLMs to use medical calculators to perform calculations and assess patient health status. CalcQA is constructed by professional physicians and includes 100 case-calculator pairs, complemented by a toolkit of 281 medical tools. The experimental results demonstrate significant performance improvements with our framework. This research paves new directions for applying LLMs in demanding scenarios of medicine.
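The three mechanisms the abstract names, calculator selection, slot filling, and nested unit conversion, can be sketched together; the BMI calculator, the schema layout, and the patient record are illustrative assumptions, not MeNTi's toolkit:

```python
def meta_tool(patient, calculators, converters):
    """Meta-tool sketch: select the right calculator for the request, fill its
    slots from the patient record, and convert units via a nested tool call."""
    calc = calculators[patient["request"]]           # calculator selection
    args = {}
    for slot, unit in calc["slots"].items():         # slot filling
        value, have_unit = patient["data"][slot]
        if have_unit != unit:                        # nested unit-conversion call
            value = converters[(have_unit, unit)](value)
        args[slot] = value
    return calc["fn"](**args)

# Toy toolkit with one calculator and one converter.
calculators = {
    "bmi": {
        "slots": {"weight": "kg", "height": "m"},
        "fn": lambda weight, height: round(weight / height**2, 1),
    }
}
converters = {("cm", "m"): lambda v: v / 100}
patient = {"request": "bmi", "data": {"weight": (70, "kg"), "height": (175, "cm")}}
bmi = meta_tool(patient, calculators, converters)
```

The point of the nesting is that the converter is itself a tool call issued from inside the calculator call, so the LLM never has to do the unit arithmetic itself.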
Tool use has turned large language models (LLMs) into powerful agents that can perform complex multi-step tasks by dynamically utilising external software components. However, these tools must be implemented in advance by human developers, hindering the applicability of LLM agents in domains demanding large numbers of highly specialised tools, like in life sciences and medicine. Motivated by the growing trend of scientific studies accompanied by public code repositories, we propose ToolMaker, an agentic framework that autonomously transforms papers with code into LLM-compatible tools. Given a GitHub URL and short task description, ToolMaker autonomously installs dependencies and generates code to perform the task, using a closed-loop self-correction mechanism for debugging. To evaluate our approach, we introduce a benchmark comprising 15 complex computational tasks spanning various domains with over 100 unit tests to assess correctness and robustness. Our method correctly implements 80% of the tasks, substantially outperforming current state-of-the-art software engineering agents. ToolMaker therefore is a step towards fully autonomous agent-based scientific workflows. Our code and benchmark are publicly available at https://github.com/KatherLab/ToolMaker.
Post-training compression reduces the computational and memory costs of large language models (LLMs), enabling resource-efficient deployment. However, existing compression benchmarks only focus on language modeling (e.g., perplexity) and natural language understanding tasks (e.g., GLUE accuracy), ignoring the agentic capabilities - workflow, tool use/function call, long-context understanding and real-world application. We introduce the Agent Compression Benchmark (ACBench), the first comprehensive benchmark for evaluating how compression impacts LLMs' agentic abilities. ACBench spans (1) 12 tasks across 4 capabilities (e.g., WorfBench for workflow generation, Needle-in-Haystack for long-context retrieval), (2) quantization (GPTQ, AWQ) and pruning (Wanda, SparseGPT), and (3) 15 models, including small (Gemma-2B), standard (Qwen2.5 7B-32B), and distilled reasoning LLMs (DeepSeek-R1-Distill). Our experiments reveal compression tradeoffs: 4-bit quantization preserves workflow generation and tool use (1%-3% drop) but degrades real-world application accuracy by 10%-15%. We introduce ERank, Top-k Ranking Correlation and Energy to systematize analysis. ACBench provides actionable insights for optimizing LLM compression in agentic scenarios. The code can be found in https://github.com/pprp/ACBench.
Recent advancements in Large Language Models (LLMs) have led to a rapid growth of agentic systems capable of handling a wide range of complex tasks. However, current research largely relies on manual, task-specific design, limiting their adaptability to novel tasks. In this paper, we introduce a new research problem: Modularized LLM Agent Search (MoLAS). We propose a modular design space that abstracts existing LLM agent designs into four fundamental modules with a uniform IO interface: Planning, Reasoning, Tool Use, and Memory. Building on this design space, we present a novel LLM agent search framework called AgentSquare, which introduces two core mechanisms, i.e., module evolution and recombination, to efficiently search for optimized LLM agents. To further accelerate the process, we design a performance predictor that uses in-context surrogate models to skip unpromising agent designs. Extensive experiments across six benchmarks, covering the diverse scenarios of web, embodied, tool use and game applications, show that AgentSquare substantially outperforms hand-crafted agents, achieving an average performance gain of 17.2% against best-known human designs. Moreover, AgentSquare can generate interpretable design insights, enabling a deeper understanding of agentic architecture and its impact on task performance. We believe that the modular design space and AgentSquare search framework offer a platform for fully exploiting the potential of prior successful designs and consolidating the collective efforts of the research community. Code repo is available at https://github.com/tsinghua-fib-lab/AgentSquare.
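The recombination mechanism over the four-slot design space can be sketched as an exhaustive search; the toy module names and scoring function below are illustrative (the real system uses learned surrogates rather than brute-force enumeration):

```python
import itertools

def search_agents(modules, evaluate):
    """Module recombination sketch: enumerate one choice per slot of the
    modular design space and keep the best-scoring combination."""
    slots = ["planning", "reasoning", "tool_use", "memory"]
    best, best_score = None, float("-inf")
    for combo in itertools.product(*(modules[s] for s in slots)):
        agent = dict(zip(slots, combo))
        score = evaluate(agent)
        if score > best_score:
            best, best_score = agent, score
    return best, best_score

# Toy space: two interchangeable options per slot, scored by a stand-in evaluator
# that happens to prefer the "-B" variants.
modules = {s: [f"{s}-A", f"{s}-B"] for s in ["planning", "reasoning", "tool_use", "memory"]}
evaluate = lambda agent: sum(name.endswith("B") for name in agent.values())
best, score = search_agents(modules, evaluate)
```

The uniform IO interface is what makes this enumeration legal: any module can fill its slot, so the search space is the full Cartesian product of options.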
Current Large Language Model (LLM) agents demonstrate strong reasoning and tool use capabilities, but often lack self-awareness, failing to balance these approaches effectively. This imbalance leads to Tool Overuse, where models unnecessarily rely on external tools for tasks solvable with parametric knowledge, increasing computational overhead. Inspired by human metacognition, we introduce SMART (Strategic Model-Aware Reasoning with Tools), a paradigm that enhances an agent's self-awareness to optimize task handling and reduce tool overuse. To support this paradigm, we introduce SMART-ER, a dataset spanning three domains, where reasoning alternates between parametric knowledge and tool-dependent steps, with each step enriched by rationales explaining when tools are necessary. Through supervised training, we develop SMARTAgent, a family of models that dynamically balance parametric knowledge and tool use. Evaluations show that SMARTAgent reduces tool use by 24% while improving performance by over 37%, enabling 7B-scale models to match their 70B counterparts and GPT-4o. Additionally, SMARTAgent generalizes to out-of-distribution test data like GSM8K and MINTQA, maintaining accuracy with just one-fifth the tool calls. These results highlight the potential of strategic tool use to enhance reasoning, mitigate overuse, and bridge the gap between model size and performance, advancing intelligent and resource-efficient agent designs.
Large Language Models (LLMs) have demonstrated impressive performance in various NLP tasks, but they still suffer from challenges such as hallucination and weak numerical reasoning. To overcome these challenges, external tools can be used to enhance LLMs' question-answering abilities. However, current evaluation methods do not distinguish between questions that can be answered using LLMs' internal knowledge and those that require external information through tool use. To address this issue, we introduce a new dataset called ToolQA, which is designed to faithfully evaluate LLMs' ability to use external tools for question answering. Our development of ToolQA involved a scalable, automated process for dataset curation, along with 13 specialized tools designed for interaction with external knowledge in order to answer questions. Importantly, we strive to minimize the overlap between our benchmark data and LLMs' pre-training data, enabling a more precise evaluation of LLMs' tool-use reasoning abilities. We conducted an in-depth diagnosis of existing tool-use LLMs to highlight their strengths, weaknesses, and potential improvements. Our findings set a new benchmark for evaluating LLMs and suggest new directions for future advancements. Our data and code are freely available to the broader scientific community on GitHub.
Equipped with the capability to call functions, modern large language models (LLMs) can leverage external tools for addressing a range of tasks unattainable through language skills alone. However, the effective execution of these tools relies heavily not just on the advanced capabilities of LLMs but also on precise user instructions, which often cannot be ensured in the real world. To evaluate the performance of LLMs' tool use under imperfect instructions, we meticulously examine real-world instructions queried from users, analyze the error patterns, and build a challenging tool-use benchmark called Noisy ToolBench (NoisyToolBench). We find that due to the next-token prediction training objective, LLMs tend to arbitrarily generate missing arguments, which may lead to hallucinations and risks. To address this issue, we propose a novel framework, Ask-when-Needed (AwN), which prompts LLMs to ask questions to users whenever they encounter obstacles due to unclear instructions. Moreover, to reduce the manual labor involved in user-LLM interaction and assess LLMs' performance in tool utilization from both accuracy and efficiency perspectives, we design an automated evaluation tool named ToolEvaluator. Our experiments demonstrate that AwN significantly outperforms existing frameworks for tool learning on NoisyToolBench. We will release all related code and datasets to support future research.
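The Ask-when-Needed idea, ask the user for a missing argument rather than hallucinating one, can be sketched as below. The weather-tool schema and the dictionary-backed simulated user are illustrative assumptions, roughly playing the role the paper's user simulator plays in evaluation:

```python
def call_tool_awn(tool_schema, provided_args, ask_user):
    """Ask-when-Needed sketch: before invoking the tool, ask the user for
    any required argument that the instruction did not supply."""
    args = dict(provided_args)
    for name in tool_schema["required"]:
        if name not in args:
            args[name] = ask_user(f"Please provide a value for '{name}'.")
    return tool_schema["fn"](**args)

# Toy tool whose instruction omitted the date argument.
schema = {
    "required": ["city", "date"],
    "fn": lambda city, date: f"forecast for {city} on {date}",
}
# Simulated user who answers clarification questions.
answers = {"Please provide a value for 'date'.": "2025-06-01"}
result = call_tool_awn(schema, {"city": "Paris"}, ask_user=answers.__getitem__)
```

The contrast with next-token-prediction behavior is the branch on `name not in args`: the baseline failure mode is to invent a plausible value there instead of asking.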
Analog layout design heavily involves interactive processes between humans and design tools. Electronic design automation (EDA) tools for this task are usually designed to use scripting commands or visualized buttons for manipulation, especially for interactive automation functionalities, which have a steep learning curve and a cumbersome user experience, posing a notable barrier to designers’ adoption. Aiming to address such a usability issue, this article introduces LayoutCopilot, a pioneering multiagent collaborative framework powered by large language models (LLMs) for interactive analog layout design. LayoutCopilot simplifies human-tool interaction by converting natural language instructions into executable script commands, and it interprets high-level design intents into actionable suggestions, significantly streamlining the design process. Experimental results demonstrate the flexibility, efficiency, and accessibility of LayoutCopilot in handling real-world analog designs.
Recent research has demonstrated that Large Language Models (LLMs) can enhance their capabilities by utilizing external tools. However, three pivotal questions remain unanswered: (1) How effective are current LLMs in utilizing tools? (2) How can we enhance LLMs' ability to utilize tools? (3) What obstacles need to be overcome to leverage tools? To address these questions, we introduce API-Bank, a groundbreaking benchmark, specifically designed for tool-augmented LLMs. For the first question, we develop a runnable evaluation system consisting of 73 API tools. We annotate 314 tool-use dialogues with 753 API calls to assess the existing LLMs' capabilities in planning, retrieving, and calling APIs. For the second question, we construct a comprehensive training set containing 1,888 tool-use dialogues from 2,138 APIs spanning 1,000 distinct domains. Using this dataset, we train Lynx, a tool-augmented LLM initialized from Alpaca. Experimental results demonstrate that GPT-3.5 exhibits improved tool utilization compared to GPT-3, while GPT-4 excels in planning. However, there is still significant potential for further improvement. Moreover, Lynx surpasses Alpaca's tool utilization performance by more than 26 pts and approaches the effectiveness of GPT-3.5. Through error analysis, we highlight the key challenges for future research in this field to answer the third question.
Significant focus has been placed on integrating large language models (LLMs) with various tools in developing general-purpose agents. This poses a challenge to LLMs' tool-use capabilities. However, there are evident gaps between existing tool-use evaluations and real-world scenarios. Current evaluations often use AI-generated queries, single-step tasks, dummy tools, and text-only interactions, failing to reveal the agents' real-world problem-solving abilities effectively. To address this, we propose GTA, a benchmark for General Tool Agents, featuring three main aspects: (i) Real user queries: human-written queries with simple real-world objectives but implicit tool-use, requiring the LLM to reason the suitable tools and plan the solution steps. (ii) Real deployed tools: an evaluation platform equipped with tools across perception, operation, logic, and creativity categories to evaluate the agents' actual task execution performance. (iii) Real multimodal inputs: authentic image files, such as spatial scenes, web page screenshots, tables, code snippets, and printed/handwritten materials, used as the query contexts to align with real-world scenarios closely. We design 229 real-world tasks and executable tool chains to evaluate mainstream LLMs. Our findings show that real-world user queries are challenging for existing LLMs, with GPT-4 completing less than 50% of the tasks and most LLMs achieving below 25%. This evaluation reveals the bottlenecks in the tool-use capabilities of current LLMs in real-world scenarios, which provides future direction for advancing general-purpose tool agents. The code and dataset are available at https://github.com/open-compass/GTA.
Augmenting large language models (LLMs) with external tools has emerged as a promising approach to extend their utility, enabling them to solve practical tasks. Previous methods manually parse tool documentation and create in-context demonstrations, transforming tools into structured formats for LLMs to use in their step-by-step reasoning. However, this manual process requires domain expertise and struggles to scale to large toolsets. Additionally, these methods rely heavily on ad-hoc inference techniques or special tokens to integrate free-form LLM generation with tool-calling actions, limiting the LLM's flexibility in handling diverse tool specifications and integrating multiple tools. In this work, we propose AutoTools, a framework that enables LLMs to automate the tool-use workflow. Specifically, the LLM automatically transforms tool documentation into callable functions, verifying syntax and runtime correctness. Then, the LLM integrates these functions into executable programs to solve practical tasks, flexibly grounding tool-use actions into its reasoning processes. Extensive experiments on existing and newly collected, more challenging benchmarks illustrate the superiority of our framework. Inspired by these promising results, we further investigate how to improve the expertise of LLMs, especially open-source LLMs with fewer parameters, within AutoTools. Thus, we propose the AutoTools-Learning approach, training the LLMs with three learning tasks on 34k instances of high-quality synthetic data, including documentation understanding, relevance learning, and function programming. Fine-grained results validate the effectiveness of our overall training approach and each individual task. Our methods are an important step towards the use of LLMs for solving real-world tasks with external tools.
Leveraging more test-time computation has proven to be an effective way to boost the reasoning capabilities of large language models (LLMs). Among various methods, the verify-and-improve paradigm stands out for enabling dynamic solution exploration and feedback incorporation. However, existing approaches often suffer from restricted feedback spaces and lack of coordinated training of different parties, leading to suboptimal performance. To address this, we model this multi-turn refinement process as a Markov Decision Process and introduce DPSDP (Direct Policy Search by Dynamic Programming), a reinforcement learning algorithm that trains an actor-critic LLM system to iteratively refine answers via direct preference learning on self-generated data. Theoretically, DPSDP can match the performance of any policy within the training distribution. Empirically, we instantiate DPSDP with various base models and show improvements on both in- and out-of-distribution benchmarks. For example, on benchmark MATH 500, majority voting over five refinement steps increases first-turn accuracy from 58.2% to 63.2% with Ministral-based models. An ablation study further confirms the benefits of multi-agent collaboration and out-of-distribution generalization.
Large Language Models (LLMs) have found increasing application in tasks requiring multi-step reasoning, yet challenges such as hallucinations and inconsistencies in the generated responses persist. This study presents an innovative methodology to enhance the reasoning capabilities of LLMs by brokering and integrating multiple expert LLMs within a reflection layer to provide targeted feedback on the reasoning trajectories of the base LLM. The approach employs a foundational pre-trained LLM as the base model, which is further supported by agents to promote cognitive assistance for specific task types. In instances where conclusions are deemed incorrect or reasoning is interrupted, these instances are forwarded to the expert LLM layer, which includes systems such as Claude-3 Haiku for intricate contexts and MedAlpaca for medical reasoning, to deliver feedback on the base model’s reasoning paths. This feedback forms a ‘reflection pool,’ enabling the base LLM to amend and enhance its reasoning trajectories in subsequent iterations. The experiments conducted across diverse datasets, including HotPotQA, SimpleQA, and PubMedQA, underscore the proposed architecture’s efficacy in augmenting success signals, ROUGE-L scores (indicative of quality and precision), and CTRLEval Consistency Scores (indicative of coherence and consistency). The architecture effectively addresses the issues of hallucinations and inconsistencies that frequently occur in multi-step reasoning. Importantly, the approach exhibits considerable potential in tackling domain-specific tasks, underscoring the importance of achieving correct and reliable conclusions. To facilitate further investigation and validation of our proposed brokered multi-expert reflection framework for non-commercial use, the source code of our system is available at https://github.com/WiZY936/Brokered-Multi-Expert-Reflection
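The brokered-reflection cycle can be sketched as a retry loop: on failure, the trace is routed to a per-domain expert whose critique accumulates in a shared reflection pool consulted on the next attempt. All names below are illustrative stand-ins, not the released system:

```python
def route_expert(task_type):
    # Broker: map a task type to an expert critic (hypothetical mapping).
    experts = {"medical": "medical_expert", "general": "general_expert"}
    return experts.get(task_type, "general_expert")

def expert_feedback(expert, answer):
    # Stand-in for an expert-LLM critique of a failed reasoning path.
    return f"{expert}: revise reasoning behind '{answer}'"

def solve_with_reflection(task, task_type, attempt, max_iters=3):
    # attempt(task, pool) -> (answer, accepted); pool is the reflection pool.
    reflection_pool = []
    answer = None
    for _ in range(max_iters):
        answer, accepted = attempt(task, reflection_pool)
        if accepted:                      # conclusion deemed correct
            break
        expert = route_expert(task_type)  # forward to the expert layer
        reflection_pool.append(expert_feedback(expert, answer))
    return answer, reflection_pool
```

The pool grows only on failures, so later attempts see exactly the critiques their predecessors earned.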
Large language models (LLMs) have demonstrated remarkable reasoning capabilities across diverse domains. Recent studies have shown that increasing test-time computation enhances LLMs' reasoning capabilities. This typically involves extensive sampling at inference time guided by an external LLM verifier, resulting in a two-player system. Despite external guidance, the effectiveness of this system demonstrates the potential of a single LLM to tackle complex tasks. Thus, we pose a new research problem: Can we internalize the searching capabilities to fundamentally enhance the reasoning abilities of a single LLM? This work explores an orthogonal direction focusing on post-training LLMs for autoregressive searching (i.e., an extended reasoning process with self-reflection and self-exploration of new strategies). To achieve this, we propose the Chain-of-Action-Thought (COAT) reasoning and a two-stage training paradigm: 1) a small-scale format tuning stage to internalize the COAT reasoning format and 2) a large-scale self-improvement stage leveraging reinforcement learning. Our approach results in Satori, a 7B LLM trained on open-source models and data. Extensive empirical evaluations demonstrate that Satori achieves state-of-the-art performance on mathematical reasoning benchmarks while exhibiting strong generalization to out-of-domain tasks. Code, data, and models are fully open-sourced.
Recent advances in LLM agents have largely built on reasoning backbones like ReAct, which interleave thought and action in complex environments. However, ReAct often produces ungrounded or incoherent reasoning steps, leading to misalignment between the agent's actual state and goal. Our analysis finds that this stems from ReAct's inability to maintain consistent internal beliefs and goal alignment, causing compounding errors and hallucinations. To address this, we introduce ReflAct, a novel backbone that shifts reasoning from merely planning next actions to continuously reflecting on the agent's state relative to its goal. By explicitly grounding decisions in states and enforcing ongoing goal alignment, ReflAct dramatically improves strategic reliability. This design delivers substantial empirical gains: ReflAct surpasses ReAct by 27.7% on average, achieving a 93.3% success rate in ALFWorld. Notably, ReflAct even outperforms ReAct with added enhancement modules (e.g., Reflexion, WKM), showing that strengthening the core reasoning backbone is key to reliable agent performance.
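Our reading of the state-goal reflection loop, reduced to a toy: before every action the agent computes the gap between its believed state and the goal, and the policy is conditioned on that gap rather than on a free-floating plan. All functions are stubs, not the authors' code:

```python
def reflect(state, goal):
    # Reflection step: which subgoals are still unmet relative to the goal?
    return sorted(goal - state)

def choose_action(gap):
    # Stand-in policy: act on the first unmet subgoal, grounded in the gap.
    return f"achieve:{gap[0]}" if gap else "stop"

def reflact_loop(state, goal, transition, max_steps=10):
    trace = []
    for _ in range(max_steps):
        gap = reflect(state, goal)       # grounding: state vs. goal
        action = choose_action(gap)
        trace.append((tuple(gap), action))
        if action == "stop":
            break
        state = transition(state, action)
    return state, trace
```

Because each decision is derived from the current state-goal gap, a wrong belief is corrected on the next reflection instead of compounding across the trajectory.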
No abstract available
Unearthing Gems from Stones: Policy Optimization with Negative Sample Augmentation for LLM Reasoning
Recent advances in reasoning language models have witnessed a paradigm shift from short to long chain-of-thought (CoT) patterns. Given the substantial computational cost of rollouts in long CoT models, maximizing the utility of fixed training datasets becomes crucial. Our analysis reveals that negative responses contain valuable components such as self-reflection and error-correction steps, yet most existing methods either completely discard negative samples (RFT) or apply equal penalization across all tokens (RL), failing to leverage these potential learning signals. In light of this, we propose Behavior Constrained Policy Gradient with Negative Sample Augmentation (BCPG-NSA), a fine-grained offline RL framework that encompasses three stages: 1) sample segmentation, 2) consensus-based step correctness assessment combining LLM and PRM judgers, and 3) policy optimization with NSA designed to effectively mine positive steps within negative samples. Experimental results show that BCPG-NSA outperforms baselines on several challenging math/coding reasoning benchmarks using the same training dataset, achieving improved sample efficiency and demonstrating robustness and scalability when extended to multiple iterations.
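The core of negative sample augmentation, stripped to a sketch: replace the single trajectory-level penalty with per-step weights from a correctness judge, so that a sound self-reflection step inside a failed rollout is still reinforced. Here `judge_step` stands in for the LLM+PRM consensus judger; in the actual framework these weights would scale per-token policy-gradient terms:

```python
def uniform_weights(steps, neg_w=-1.0):
    # Baseline RL treatment: penalize every step of a failed rollout equally.
    return [neg_w] * len(steps)

def nsa_weights(steps, judge_step, pos_w=1.0, neg_w=-1.0):
    # Negative-sample augmentation: steps judged correct (e.g., a valid
    # self-reflection) keep a positive weight even inside a failed rollout.
    return [pos_w if judge_step(s) else neg_w for s in steps]
```

The contrast between the two weightings is exactly the "gems from stones" claim: uniform penalization throws away the positive learning signal that the fine-grained judge recovers.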
Verification is crucial for effective mathematical reasoning. We present a new temporal consistency method where verifiers iteratively refine their judgments based on previous assessments. Unlike one-round verification or multi-model debate approaches, our method leverages consistency in a sequence of self-reflection actions to improve verification accuracy. Empirical evaluations across diverse mathematical process error identification benchmarks (Mathcheck, ProcessBench, and PRM800K) show consistent performance improvements over baseline methods. When applied to the recent DeepSeek R1 distilled models, our method demonstrates strong performance, enabling 7B/8B distilled models to outperform all 70B/72B models and GPT-4o on ProcessBench. Notably, the distilled 14B model with our method achieves performance comparable to DeepSeek-R1. Our code is available at https://github.com/jcguo123/Temporal-Consistency
Chain-of-thought (CoT) has proven to improve the reasoning capability of large language models (LLMs). However, due to the complexity of multimodal scenarios and the difficulty in collecting high-quality CoT data, CoT reasoning in multimodal LLMs has been largely overlooked. To this end, we propose a simple yet effective self-training framework, R3V, which iteratively enhances the model's Vision-language Reasoning by Reflecting on CoT Rationales. Our framework consists of two interleaved parts: (1) iteratively bootstrapping positive and negative solutions for reasoning datasets, and (2) reflection on rationale for learning from mistakes. Specifically, we introduce the self-refine and self-select losses, enabling the model to refine flawed rationale and derive the correct answer by comparing rationale candidates. Experiments on a wide range of vision-language tasks show that R3V consistently improves multimodal LLM reasoning, achieving a relative improvement of 23 to 60 percent over GPT-distilled baselines. Additionally, our approach supports self-reflection on generated solutions, further boosting performance through test-time computation.
Web agents powered by Large Language Models (LLMs) show promise for next-generation AI, but their limited reasoning in uncertain, dynamic web environments hinders robust deployment. In this paper, we identify key reasoning skills essential for effective web agents, i.e., reflection and lookahead, branching, and rollback, and curate trajectory data that exemplifies these abilities by reconstructing the agent's (inference-time) reasoning algorithms into chain-of-thought rationales. We conduct experiments in the agent self-improving benchmark, OpenWebVoyager, and demonstrate that distilling salient reasoning patterns into the backbone LLM via simple fine-tuning can substantially enhance its performance. Our approach yields significant improvements across multiple benchmarks, including WebVoyager, Mind2web-live, and SimpleQA (web search), highlighting the potential of targeted reasoning skill enhancement for web agents.
Urban mobility systems face escalating challenges associated with sustainability, equity, and resilience, further compounded by environmental pressures. Traditional agent-based models (ABMs) often fail to capture cognitively rich, adaptive behaviors, limiting their ability to simulate realistic user responses to disruptions. In this work, we propose a cognitive agent architecture based on Large Language Models (LLMs), featuring multi-horizon memory-driven planning, reflection, and adaptation. Integrated into the SimFleet agent-based simulator with realistic sociodemographic profiles, the agents dynamically generate, adjust, and reflect upon travel plans across a 20-day simulation involving over 320 individuals. Experimental results reveal emergent adaptation patterns under both stable and disrupted transport conditions, and an ablation study under severe service disruption quantifies the contributions of short-term and long-term memory modules to memory-driven reasoning, demonstrating the potential of LLM-driven agents to enhance the realism, flexibility, and interpretability of urban mobility simulations.
Chatbots’ role in fostering self-reflection is now widely recognized, especially in inducing users’ behavior change. While the benefits of 24/7 availability, scalability, and consistent responses have been demonstrated in contexts such as healthcare and tutoring to help one form a new habit, their utilization in coaching, which necessitates deeper introspective dialogue to induce leadership growth, remains unexplored. This paper explores the potential of such a chatbot powered by recent Large Language Models (LLMs) in collaboration with professional coaches in the field of executive coaching. Through a design workshop with them and a two-week user study involving ten coach-client pairs, we explored the feasibility and nuances of integrating chatbots to complement human coaches. Our findings highlight the benefits of chatbots’ ubiquity and reasoning capabilities enabled by LLMs while identifying their limitations and design necessities for effective collaboration between human coaches and chatbots. By doing so, this work contributes to the foundation for augmenting one’s self-reflective process with prevalent conversational agents through the human-in-the-loop approach.
Complex tasks involving tool integration pose significant challenges for Large Language Models (LLMs), leading to the emergence of multi-agent workflows as a promising solution. Reflection has emerged as an effective strategy for correcting erroneous trajectories in agentic workflows. However, existing approaches only exploit such capability in the post-action stage, where the agent observes the execution outcomes. We argue that, like humans, LLMs can also engage in reflection before action execution: the agent can anticipate undesirable outcomes from its own decisions, which not only provides a necessary complementary perspective to evaluate the decision but also prevents the propagation of errors throughout the trajectory. In this paper, we propose MIRROR, a framework that consists of both intra-reflection, which critically assesses intended actions before execution, and inter-reflection, which further adjusts the trajectory based on observations. This design systematically leverages LLM reflection capabilities to eliminate and rectify erroneous actions on a more comprehensive scope. Evaluations on both the StableToolBench and TravelPlanner benchmarks demonstrate MIRROR's superior performance, achieving state-of-the-art results compared to existing approaches.
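The intra-/inter-reflection split might be sketched like this, with stub checks rather than the authors' implementation: intended actions are vetted before execution, and observed outcomes can still trigger a correction afterwards.

```python
def intra_reflect(action, anticipated_bad):
    # Pre-execution reflection: anticipate an undesirable outcome and veto it.
    return action not in anticipated_bad

def inter_reflect(observation):
    # Post-execution reflection on what actually happened.
    return observation == "ok"

def run_with_mirror(plan, execute, anticipated_bad):
    trajectory = []
    for action in plan:
        if not intra_reflect(action, anticipated_bad):
            trajectory.append((action, "vetoed"))   # error never executed
            continue
        obs = execute(action)
        status = "ok" if inter_reflect(obs) else "adjusted"
        trajectory.append((action, status))
    return trajectory
```

The point of the pre-execution gate is visible in the trajectory: a vetoed action never runs, so its side effects never propagate to later steps.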
Phishing attacks in Web3 ecosystems are increasingly sophisticated, exploiting deceptive contract logic, malicious frontend scripts, and token approval patterns. We present DeepTx, a real-time transaction analysis system that detects such threats before user confirmation. DeepTx simulates pending transactions, extracts behavior, context, and UI features, and uses multiple large language models (LLMs) to reason about transaction intent. A consensus mechanism with self-reflection ensures robust and explainable decisions. Evaluated on our phishing dataset, DeepTx achieves high precision and recall (demo video: https://youtu.be/4OfK9KCEXUM).
We introduce the Principled Reasoning and Acting (PRAct) framework, a novel method for learning and enforcing action principles from trajectory data. Central to our approach is the use of text gradients from a reflection and optimization engine to derive these action principles. To adapt action principles to specific task requirements, we propose a new optimization framework, Reflective Principle Optimization (RPO). After execution, RPO employs a reflector to critique current action principles and an optimizer to update them accordingly. We investigate the RPO framework under two scenarios: Reward-RPO, which uses environmental rewards for reflection, and Self-RPO, which conducts self-reflection without external rewards. Additionally, we developed two RPO methods, RPO-Traj and RPO-Batch, to adapt to different settings. Experimental results across four environments demonstrate that the PRAct agent, leveraging the RPO framework, can effectively learn and apply action principles to enhance performance.
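A toy rendering of one RPO iteration, with hypothetical names throughout (the paper's text-gradient machinery is not reproduced): a reflector critiques the current action principles in light of the episode's reward, and an optimizer folds the critique back into the principle set.

```python
def reflector(principles, reward, threshold=1.0):
    # Critique: if the episode under-performed, propose a new principle.
    if reward >= threshold:
        return []
    return ["state tool arguments explicitly before calling"]

def optimizer(principles, critiques):
    # Update: merge novel critiques into the principle set.
    return principles + [c for c in critiques if c not in principles]

def rpo_step(principles, run_episode):
    reward = run_episode(principles)           # execute with current principles
    critiques = reflector(principles, reward)  # post-hoc reflection
    return optimizer(principles, critiques), reward
```

Iterating `rpo_step` mirrors the paper's loop: principles change only when reflection finds something to criticize, so a satisfactory episode leaves them fixed.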
Efficient news topic classification is a critical challenge in the era of information overload. While Large Language Models (LLMs) offer a promising alternative to traditional methods, they still face issues with reliability and interpretability. This paper introduces AgentPress, a collaborative multi-agent framework designed to enhance LLM performance and robustness for this task. AgentPress decomposes classification into a pipeline of three specialized agents: initial analysis, Retrieval-Augmented Generation (RAG), and reflective correction. We constructed an eight-category news benchmark and conducted comprehensive ablation studies on the Qwen3-4B and Llama-3.1-8B models. Experiments reveal that the RAG agent is the single most critical component, delivering a substantial performance gain by boosting the macro F1-score of Qwen3-4B from 56.23% to 82.83%. Our findings empirically demonstrate that for structured tasks like news classification, optimizing retrieval-augmentation is a more effective and economical strategy than implementing complex reasoning or reflection mechanisms.
Evaluating autonomous driving systems in complex and diverse traffic scenarios through controllable simulation is essential to ensure their safety and reliability. However, existing traffic simulation methods face challenges in their controllability. To address this, we propose a novel diffusion-based and LLM-enhanced traffic simulation framework. Our approach incorporates a high-level understanding module and a low-level refinement module, which systematically examines the hierarchical structure of traffic elements, guides LLMs to thoroughly analyze traffic scenario descriptions step by step, and refines the generation by self-reflection, enhancing their understanding of complex situations. Furthermore, we propose a Frenet-frame-based cost function framework that provides LLMs with geometrically meaningful quantities, improving their grasp of spatial relationships in a scenario and enabling more accurate cost function generation. Experiments on the Waymo Open Motion Dataset (WOMD) demonstrate that our method can handle more intricate descriptions and generate a broader range of scenarios in a controllable manner.
Vision-language models (VLMs) have achieved remarkable results in remote sensing scene interpretation. However, existing models primarily rely on a single-step reasoning paradigm, which suffers from Incomplete Perception and Granularity Limitation when confronting complex tasks requiring comprehensive, multigranularity visual contexts. To overcome these bottlenecks, we propose RS_DeepReason, a training-free, large language model (LLM)-driven Deep Reasoning Framework. This framework leverages the logical planning capabilities of LLMs to restructure the reasoning process into a hierarchical, iterative Reason-Observe-Re-reason cycle. Specifically, we construct a dynamic reasoning tree where the LLM recursively decomposes the problem into fine-grained queries, actively guiding the perception module to mine visual clues. Subsequently, a multistage reflection mechanism consolidates these multilevel contexts to derive the final conclusion. Extensive experiments demonstrate that RS_DeepReason significantly outperforms both single-VLM baselines and collaborative baselines in accuracy and robustness. Furthermore, our framework generates clear, traceable reasoning paths, substantially enhancing interpretability in complex remote sensing scenarios.
Real-Time Bidding (RTB) enables advertisers to place competitive bids on impression opportunities instantaneously, striving for cost-effectiveness in a highly competitive landscape. Although RTB has widely benefited from the utilization of technologies such as deep learning and reinforcement learning, the reliability of related methods often encounters challenges due to the discrepancies between online and offline environments and the rapid fluctuations of online bidding. To handle these challenges, RTBAgent is proposed as the first RTB agent system based on large language models (LLMs), which synchronizes real competitive advertising bidding environments and obtains bidding prices through an integrated decision-making process. Specifically, obtaining reasoning ability through LLMs, RTBAgent is further tailored to be more professional for RTB via involved auxiliary modules, i.e., click-through rate estimation model, expert strategy knowledge, and daily reflection. In addition, we propose a two-step decision-making process and multi-memory retrieval mechanism, which enables RTBAgent to review historical decisions and transaction records and subsequently make decisions more adaptive to market changes in real-time bidding. Empirical testing with real advertising datasets demonstrates that RTBAgent significantly enhances profitability. The RTBAgent code will be publicly accessible at: https://github.com/CaiLeng/RTBAgent.
Large-language-model (LLM)-based AI agents have recently showcased impressive versatility by employing dynamic reasoning, an adaptive, multi-step process that coordinates with external tools. This shift from static, single-turn inference to agentic, multi-turn workflows broadens task generalization and behavioral flexibility, but it also introduces serious concerns about system-level cost, efficiency, and sustainability. This paper presents the first comprehensive system-level analysis of AI agents, quantifying their resource usage, latency behavior, energy consumption, and datacenter-wide power demands across diverse agent designs and test-time scaling strategies. We further characterize how AI agent design choices, such as few-shot prompting, reflection depth, and parallel reasoning, impact accuracy-cost tradeoffs. Our findings reveal that while agents improve accuracy with increased compute, they suffer from rapidly diminishing returns, widening latency variance, and unsustainable infrastructure costs. Through detailed evaluation of representative agents, we highlight the profound computational demands introduced by AI agent workflows, uncovering a looming sustainability crisis. These results call for a paradigm shift in agent design toward compute-efficient reasoning, balancing performance with deployability under real-world constraints.
Grounding the reasoning ability of large language models (LLMs) for embodied tasks is challenging due to the complexity of the physical world. Especially, LLM planning for multi-agent collaboration requires communication of agents or credit assignment as the feedback to re-adjust the proposed plans and achieve effective coordination. However, existing methods that overly rely on physical verification or self-reflection suffer from excessive and inefficient querying of LLMs. In this paper, we propose a novel framework for multi-agent collaboration that introduces Reinforced Advantage feedback (ReAd) for efficient self-refinement of plans. Specifically, we perform critic regression to learn a sequential advantage function from LLM-planned data, and then treat the LLM planner as an optimizer to generate actions that maximize the advantage function. It endows the LLM with the foresight to discern whether the action contributes to accomplishing the final task. We provide theoretical analysis by extending advantage-weighted regression in reinforcement learning to multi-agent systems. Experiments on Overcooked-AI and a difficult variant of RoCoBench show that ReAd surpasses baselines in success rate, and also significantly decreases the interaction steps of agents and query rounds of LLMs, demonstrating its high efficiency for grounding LLMs. More results are given at https://embodied-read.github.io
Diplomacy is one of the most sophisticated activities in human society, involving complex interactions among multiple parties that require skills in social reasoning, negotiation, and long-term strategic planning. Previous AI agents have demonstrated their ability to handle multi-step games and large action spaces in multi-agent tasks. However, diplomacy involves a staggering magnitude of decision spaces, especially considering the negotiation stage required. While recent agents based on large language models (LLMs) have shown potential in various applications, they still struggle with extended planning periods in complex multi-agent settings. Leveraging recent technologies for LLM-based agents, we aim to explore AI's potential to create a human-like agent capable of executing comprehensive multi-agent missions by integrating three fundamental capabilities: 1) strategic planning with memory and reflection; 2) goal-oriented negotiation with social reasoning; and 3) augmenting memory through self-play games for self-evolution without a human in the loop.
Large language models (LLMs) have demonstrated remarkable capabilities in natural language tasks, yet their performance in dynamic, real-world financial environments remains underexplored. Existing approaches are limited to historical backtesting, where trading actions cannot influence market prices and agents train only on static data. To address this limitation, we present the Agent Trading Arena, a virtual zero-sum stock market in which LLM-based agents engage in competitive multi-agent trading and directly impact price dynamics. By simulating realistic bid-ask interactions, our platform enables training in scenarios that closely mirror live markets, thereby narrowing the gap between training and evaluation. Experiments reveal that LLMs struggle with numerical reasoning when given plain-text data, often overfitting to local patterns and recent values. In contrast, chart-based visualizations significantly enhance both numerical reasoning and trading performance. Furthermore, incorporating a reflection module yields additional improvements, especially with visual inputs. Evaluations on NASDAQ and CSI datasets demonstrate the superiority of our method, particularly under high volatility. All code and data are available at https://github.com/wekjsdvnm/Agent-Trading-Arena.
Large Language Models (LLMs) provide cognitive capabilities that enable robots to interpret and reason about their workspace, especially when paired with semantically rich representations like semantic maps. However, these models are prone to generating inaccurate or invented responses, known as hallucinations, that can produce an erratic robotic operation. This can be addressed by employing agentic workflows, structured processes that guide and refine the model’s output to improve response quality. This work formally defines and qualitatively analyzes the impact of three agentic workflows (LLM Ensemble, Self-Reflection, and Multi-Agent Reflection) on enhancing the reasoning capabilities of an LLM guiding a robotic system to perform object-centered planning. In this context, the LLM is provided with a pre-built semantic map of the environment and a query, to which it must respond by determining the most relevant objects for the query. This response can be used in a multitude of downstream tasks. Extensive experiments were carried out employing state-of-the-art LLMs and semantic maps generated from the widely-used datasets ScanNet and SceneNN. The results show that agentic workflows significantly enhance object retrieval performance, especially in scenarios requiring complex reasoning, with improvements averaging up to 10% over the baseline.
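Of the three workflows, the LLM Ensemble is the simplest to picture: query several models independently and keep the majority answer together with its agreement ratio. The models below are stubs; the paper's prompts and semantic maps are not reproduced.

```python
from collections import Counter

def ensemble_answer(models, query):
    # Each model independently names the most relevant object for the query;
    # majority voting damps individual hallucinations.
    votes = [model(query) for model in models]
    answer, count = Counter(votes).most_common(1)[0]
    return answer, count / len(votes)   # majority answer + agreement ratio
```

The agreement ratio doubles as a cheap confidence signal: a hallucinated object is unlikely to be invented identically by a majority of independent models.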
Large Language Models (LLMs) often falter at complex planning tasks that require exploration and self-correction, as their linear reasoning process struggles to recover from early mistakes. While search algorithms like Monte Carlo Tree Search (MCTS) can explore alternatives, they are often ineffective when guided by sparse rewards and fail to leverage the rich semantic capabilities of LLMs. We introduce SPIRAL (Symbolic LLM Planning via Grounded and Reflective Search), a novel framework that embeds a cognitive architecture of three specialized LLM agents into an MCTS loop. SPIRAL's key contribution is its integrated planning pipeline, in which a Planner proposes creative next steps, a Simulator grounds the search by predicting realistic outcomes, and a Critic provides dense reward signals through reflection. This synergy transforms MCTS from a brute-force search into a guided, self-correcting reasoning process. On the DailyLifeAPIs and HuggingFace datasets, SPIRAL consistently outperforms the default Chain-of-Thought planning method and substantially surpasses other state-of-the-art agents; for example, it achieves 83.6% overall accuracy on DailyLifeAPIs, an improvement of over 16 percentage points over the next-best search framework, while also demonstrating superior token efficiency. Our work demonstrates that structuring LLM reasoning as a guided, reflective, and grounded search process yields more robust and efficient autonomous planners. The source code, full appendices, and all experimental data are available for reproducibility at the official project repository.
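One expansion step of such a three-agent search loop can be sketched as follows. The `planner`, `simulator`, and `critic` functions are canned placeholders standing in for LLM calls, and the action names are invented; this shows only the Planner → Simulator → Critic dataflow, not SPIRAL's actual implementation.

```python
# Minimal sketch of one MCTS expansion step guided by three stubbed
# LLM "agents" (planner/simulator/critic are placeholders).

def planner(state):            # proposes candidate next steps
    return [state + [a] for a in ("search_api", "call_api", "finish")]

def simulator(plan):           # predicts a textual outcome for a plan
    return "goal reached" if plan[-1] == "finish" else "partial progress"

def critic(outcome):           # dense reward from reflection on the outcome
    return 1.0 if outcome == "goal reached" else 0.3

def expand_and_evaluate(state):
    """Expand a node: plan candidates, simulate each, score with the critic."""
    children = planner(state)
    scored = [(critic(simulator(c)), c) for c in children]
    return max(scored)         # back up the best (reward, plan) pair

reward, best_plan = expand_and_evaluate([])
print(reward, best_plan)       # -> 1.0 ['finish']
```

The critic's dense score is what replaces the sparse terminal reward that ordinarily makes MCTS ineffective on these tasks.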
Modern artificial intelligence (AI) systems critically lack the ability to reflect on their own behavior, reasoning processes, and generated output. For example, despite recent advancements in large language models (LLMs), the appearance of reflection in such systems is a linguistic trick rather than a cognitive competence. This deficiency can cause significant challenges, particularly in contexts where safety and reliability are important, such as healthcare and security applications, and also in social situations where complex contexts drive expected behavior. Building on previous work in computational self-awareness and reflection in adaptive systems, in this paper, we propose a reflective agent architecture that incorporates formal models of social expectations and self-simulation mechanisms with LLMs. This architecture enhances LLM-based systems to reflect on their decisions and outputs, evaluating alignment with expected behaviors. Furthermore, it enables agents to internally simulate potential actions and evaluate their consequences. Using the expectation event calculus (EEC), the system formally represents expectations, events, and derived outcomes, supporting systematic self-evaluation. Concurrently, self-simulation allows the agent to introspectively predict and analyze possible outcomes and refine its decision when necessary. Our results demonstrate enhanced alignment with human expectations, highlighting the architecture's promise of greater social sensitivity in complex scenarios that require robust and trustworthy AI interactions.
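The self-simulation idea — internally predicting each candidate action's outcome and checking it against formal expectations before acting — can be sketched as below. The expectation predicates and the toy world model are invented illustrations, not EEC syntax.

```python
# Toy sketch of self-simulation against formal expectations
# (invented predicates and world model, not the EEC formalism).

expectations = [lambda outcome: outcome["polite"],
                lambda outcome: not outcome["reveals_private_data"]]

def simulate(action: str) -> dict:
    # Placeholder world model: predicts the outcome of each candidate action.
    outcomes = {
        "share_record": {"polite": True, "reveals_private_data": True},
        "ask_consent":  {"polite": True, "reveals_private_data": False},
    }
    return outcomes[action]

def choose(actions):
    """Pick the first action whose simulated outcome meets all expectations."""
    for a in actions:
        if all(e(simulate(a)) for e in expectations):
            return a
    return None

print(choose(["share_record", "ask_consent"]))  # -> 'ask_consent'
```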
Building energy management systems (BEMS) are important for optimizing energy usage. However, existing BEMS often lack natural language explanations, making it difficult for residents to understand and trust system decisions. This paper proposes an LLM-agent-based policy-explanation framework for BEMS (the BEMS Agent), enabling the system to generate comprehensible explanations of environmental parameters and control decisions. To improve reasoning accuracy, we introduce a Reflection-Chain-of-Thought (RCoT) mechanism that enhances the logical consistency of explanations. Furthermore, we develop an automated evaluation method to assess how well users understand the generated explanations. This paper highlights the potential of LLM-driven interpretability in increasing user trust and optimizing energy management efficiency, as evidenced by a 29.4% improvement in scores when using RCoT prompts compared to CoT prompts.
Empowering large language models (LLMs) with effective tool utilization capabilities is crucial for enabling AI agents to solve complex problems. However, current models face two major limitations: (1) unreliable tool planning and invocation due to low-quality instruction datasets (e.g., widespread hallucinated API calls), and (2) weak tool reflection abilities (over 90% of errors cannot be corrected) resulting from static imitation learning. To address these critical limitations, we propose Tool-MVR, a novel Tool-Augmented LLM that achieves comprehensive System 2 reasoning through two key innovations. Specifically, we first introduce Multi-Agent Meta-Verification (MAMV), a systematic pipeline that rigorously validates APIs, queries, and reasoning trajectories to construct ToolBench-V, a new high-quality instruction dataset that addresses the limitation of unreliable tool planning and invocation. Second, we propose Exploration-based Reflection Learning (EXPLORE), which enhances tool reflection capabilities by leveraging tool feedback through a dynamic "Error → Reflection → Correction" learning paradigm, resulting in our reflection dataset ToolBench-R and addressing the critical weakness in tool reflection. Finally, we obtain Tool-MVR by finetuning open-source LLMs (e.g., Qwen-7B) on both ToolBench-V and ToolBench-R. Our experiments demonstrate that Tool-MVR achieves state-of-the-art performance on StableToolBench, surpassing both ToolLLM (by 23.9%) and GPT-4 (by 15.3%) while reducing API calls by 31.4%, with strong generalization capabilities across unseen tools and scenarios. Additionally, on our proposed RefineToolBench, the first benchmark specifically designed to evaluate tool reflection capabilities, Tool-MVR achieves a 58.9% error correction rate, significantly outperforming ToolLLM's 9.1%.
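The "Error → Reflection → Correction" paradigm above amounts to a retry loop in which execution feedback drives a corrected tool call. In this hedged sketch, `execute_tool` is a toy registry and `reflect` applies a canned correction where a real system would query the LLM with the error message.

```python
# Hedged sketch of an "Error -> Reflection -> Correction" loop for tool
# calls; `execute_tool` and `reflect` are hypothetical stand-ins.

def execute_tool(name: str, args: dict):
    tools = {"add": lambda a: a["x"] + a["y"]}
    if name not in tools:
        raise KeyError(f"unknown tool {name!r}")
    return tools[name](args)

def reflect(error: str, call):
    # A real system would ask the LLM to diagnose the error; here we
    # apply a canned correction: retry with the known tool name.
    name, args = call
    return ("add", args)

def run_with_reflection(call, max_retries: int = 2):
    for _ in range(max_retries + 1):
        try:
            return execute_tool(*call)
        except Exception as e:
            call = reflect(str(e), call)   # correct the call and retry
    raise RuntimeError("could not correct the tool call")

print(run_with_reflection(("sum", {"x": 2, "y": 3})))  # -> 5
```

The point of training on such traces, rather than on clean demonstrations alone, is that the model learns what to do after an error, not just how to imitate success.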
While Large Language Models (LLMs) show promise for task planning, their efficacy diminishes in complex, long-horizon tasks within dynamic, partially observable environments, primarily due to challenges in long-term reasoning and effective adaptation from experience. A key limitation of current approaches is the insufficient utilization of historical trajectory information. To overcome these challenges in the context of Partially Observable Markov Decision Process (POMDP) planning, this paper introduces DMA-MCTS (Dynamic Memory-Augmented Monte-Carlo Tree Search), a framework that integrates Monte Carlo Tree Search (MCTS) with LLMs, augmented by a novel dynamic memory and reflection system. The core technical contributions include: (1) a dual-layer semantic memory repository enabling efficient context-aware retrieval of past experiences; (2) a memory-enhanced UCT selection strategy biased by historical Q-values to guide search; and (3) a differentiated reflection mechanism employing LLMs to extract generalizable knowledge from both successful and failed trajectories. Comprehensive evaluations conducted on complex object rearrangement tasks within the VirtualHome simulator demonstrate that DMA-MCTS significantly outperforms relevant baselines, including standard LLM-MCTS approaches, in terms of task success rate, generalization capabilities, and planning efficiency. These results underscore the critical importance of integrating structured dynamic memory and systematic reflection mechanisms for developing highly adaptive and effective LLM-based agents capable of tackling long-horizon planning problems.
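A memory-enhanced UCT rule of the kind described in contribution (2) might blend each child's own Q-value with a Q-value retrieved from past trajectories. The blending weight `lam` and the action names below are assumptions for illustration, not the authors' exact formula.

```python
import math

# Sketch of a memory-biased UCT selection rule (an assumption about the
# general idea, not DMA-MCTS's exact formulation).

def uct_score(q, n_child, n_parent, memory_q, lam=0.5, c=1.4):
    """Blend the node's own Q with a retrieved historical Q, plus exploration."""
    blended = (1 - lam) * q + lam * memory_q
    return blended + c * math.sqrt(math.log(n_parent) / n_child)

def select(children, n_parent, memory):
    # children: {action: (q, visit_count)}; memory: {action: historical_q}
    return max(children,
               key=lambda a: uct_score(children[a][0], children[a][1],
                                       n_parent, memory.get(a, 0.0)))

children = {"pick_cup": (0.2, 4), "open_fridge": (0.25, 4)}
memory = {"pick_cup": 0.9}        # past trajectories favored this action
print(select(children, 8, memory))  # -> 'pick_cup'
```

With equal visit counts, the exploration terms cancel and the retrieved historical Q tips the selection, which is exactly the bias such a memory is meant to provide.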
We demonstrate that a group of AI agents can autonomously optimize power commissioning in WDM links. By leveraging modern LLMs' reflection and reasoning capabilities and interacting with a network digital twin, the agents achieve optimal solutions for different criteria such as power and OSNR equalization.
Large language models (LLMs) fine-tuned on multimodal financial data have demonstrated impressive reasoning capabilities in various financial tasks. However, they often struggle with multi-step, goal-oriented scenarios in interactive financial markets, such as trading, where complex agentic approaches are required to improve decision-making. To address this, we propose \textsc{FLAG-Trader}, a unified architecture integrating linguistic processing (via LLMs) with gradient-driven reinforcement learning (RL) policy optimization, in which a partially fine-tuned LLM acts as the policy network, leveraging pre-trained knowledge while adapting to the financial domain through parameter-efficient fine-tuning. Through policy gradient optimization driven by trading rewards, our framework not only enhances LLM performance in trading but also improves results on other financial-domain tasks. We present extensive empirical evidence to validate these enhancements.
In the contemporary landscape of large language model (LLM) development, it is crucial to address the challenges of deploying these models on hardware-constrained consumer electronic devices (CEDs), especially within complex dynamic task scheduling in multi-cloud environments (MCE). We propose a novel methodology leveraging a lightweight LLM to enhance task scheduling decisions in MCE. Our approach involves creating a task scheduling expert database informed by optimization objectives to fine-tune the lightweight LLM. This enables the model to generate a schedulable candidate set of tasks based on the current state of tasks and operational conditions within CEDs across MCE, optimizing scheduling decisions and enhancing overall efficiency. Simulations using both synthetic and real-world datasets demonstrate that our method outperforms three other algorithms in cost, makespan, and energy consumption. In summary, our methodology empowers CEDs to optimize the utilization of multi-cloud resources and harness the capabilities of lightweight LLMs to effectively minimize makespan, operational costs, and energy consumption during the task scheduling process, thereby facilitating efficient task scheduling.
Although Multi-Agent Reinforcement Learning (MARL) is effective for complex multi-robot tasks, it suffers from low sample efficiency and requires iterative manual reward tuning. Large Language Models (LLMs) have shown promise in single-robot settings, but their application in multi-robot systems remains largely unexplored. This letter introduces a novel LLM-Aided MARL (LAMARL) approach, which integrates MARL with LLMs, significantly enhancing sample efficiency without requiring manual design. LAMARL consists of two modules: the first module leverages LLMs to fully automate the generation of prior policy and reward functions. The second module is MARL, which uses the generated functions to guide robot policy training effectively. On a shape assembly benchmark, both simulation and real-world experiments demonstrate the unique advantages of LAMARL. Ablation studies show that the prior policy improves sample efficiency by an average of 185.9% and enhances task completion, while structured prompts based on Chain-of-Thought (CoT) and basic APIs improve LLM output success rates by 28.5%–67.5%.
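The first LAMARL module, automatic generation of reward functions, rests on the fact that an LLM can emit a reward function as executable code. In this sketch the generated source string is canned, standing in for an actual LLM response; the distance-based reward is an invented example.

```python
# Sketch of letting an "LLM" emit a reward function as code
# (the source string here is canned, standing in for a model response).

llm_reward_src = """
def reward(robot_pos, target_pos):
    # negative Manhattan distance: closer to the assembly target is better
    return -abs(robot_pos[0] - target_pos[0]) - abs(robot_pos[1] - target_pos[1])
"""

namespace = {}
exec(llm_reward_src, namespace)          # compile the generated function
reward = namespace["reward"]
print(reward((0, 0), (2, 3)))            # -> -5
```

Once compiled, the function can be called every step of MARL training at no further LLM cost, which is where the sample-efficiency gain comes from.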
Multi-agent systems (MAS) have shown great potential in executing complex tasks, but coordination and safety remain significant challenges. Multi-Agent Reinforcement Learning (MARL) offers a promising framework for agent collaboration, but it faces difficulties in handling complex tasks and designing reward functions. The introduction of Large Language Models (LLMs) has brought stronger reasoning and cognitive abilities to MAS, but existing LLM-based systems struggle to respond quickly and accurately in dynamic environments. To address these challenges, we propose LLM-based Graph Collaboration MARL (LGC-MARL), a framework that efficiently combines LLMs and MARL. This framework decomposes complex tasks into executable subtasks and achieves efficient collaboration among multiple agents through graph-based coordination. Specifically, LGC-MARL consists of two main components: an LLM planner and a graph-based collaboration meta policy. The LLM planner transforms complex task instructions into a series of executable subtasks, evaluates the rationality of these subtasks using a critic model, and generates an action dependency graph. The graph-based collaboration meta policy facilitates communication and collaboration among agents based on the action dependency graph, and adapts to new task environments through meta-learning. Experimental results on the AI2-THOR simulation platform demonstrate the superior performance and scalability of LGC-MARL in completing various complex tasks.
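Executing an action dependency graph of the kind the LLM planner produces reduces to topological scheduling: subtasks whose dependencies are satisfied can run in parallel. The subtask names below are invented; the grouping logic uses Python's standard `graphlib`.

```python
from graphlib import TopologicalSorter

# Illustrative sketch: running a planner's action dependency graph
# in dependency order (subtask names are invented for the example).

def execution_waves(dep_graph):
    """Group subtasks into waves whose members can run in parallel."""
    ts = TopologicalSorter(dep_graph)   # maps node -> set of prerequisites
    ts.prepare()
    waves = []
    while ts.is_active():
        ready = list(ts.get_ready())    # all subtasks whose deps are done
        waves.append(sorted(ready))
        ts.done(*ready)
    return waves

# "find_knife" and "find_apple" have no deps; "slice_apple" needs both.
graph = {"slice_apple": {"find_knife", "find_apple"},
         "find_knife": set(), "find_apple": set()}
print(execution_waves(graph))  # -> [['find_apple', 'find_knife'], ['slice_apple']]
```

Each wave is a set of subtasks the meta policy could assign to different agents simultaneously, which is the coordination structure the dependency graph encodes.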
With the rapid development of the Industrial Internet of Things (IIoT), credible federated learning offers an effective solution for it. In this article, we use credible collaboration between large language models (LLMs) and a reinforcement learning (RL) model to improve the autonomous decision-making efficiency of autonomous underwater vehicles (AUVs), reduce resource and power consumption, and solve robust decision-making problems in open environments. First, considering the complex terrain and hydrodynamic conditions of the ocean, we construct a high-accuracy, high-reliability 3-D ocean simulation environment to model the behavioral constraints of an AUV in the real ocean. Second, we integrate the LLaMA model into the AUV's decision-making process, utilizing its powerful information-processing capability for environmental analysis and action selection to improve the AUV's decision-making generalization in dynamic ocean environments. Finally, we propose a proximal policy advantage estimation (PPAE) method and achieve safe and efficient path planning for the AUV based on the LLM's decision output and dynamic field-environment information. Experimental results show that our method substantially improves the decision accuracy and robustness of the AUV, demonstrating the effectiveness of LLMs for underwater intelligent-agent control.
Improving the multi-step reasoning ability of large language models (LLMs) with offline reinforcement learning (RL) is essential for quickly adapting them to complex tasks. While Direct Preference Optimization (DPO) has shown promise in aligning LLMs with human preferences, it is less suitable for multi-step reasoning tasks because (1) DPO relies on paired preference data, which is not readily available for multi-step reasoning tasks, and (2) it treats all tokens uniformly, making it ineffective for credit assignment in multi-step reasoning tasks, which often come with sparse reward. In this work, we propose OREO (Offline Reasoning Optimization), an offline RL method for enhancing LLM multi-step reasoning. Building on insights from previous works of maximum entropy reinforcement learning, it jointly learns a policy model and value function by optimizing the soft Bellman Equation. We show in principle that it reduces the need to collect pairwise data and enables better credit assignment. Empirically, OREO surpasses existing offline learning methods on multi-step reasoning benchmarks, including mathematical reasoning tasks (GSM8K, MATH) and embodied agent control (ALFWorld). The approach can be extended to a multi-iteration framework when additional resources are available. Furthermore, the learned value function can be leveraged to guide the tree search for free, which can further boost performance during test time.
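The soft Bellman equation invoked above, in its standard maximum-entropy form (with β as the entropy temperature; OREO's exact training residual may differ from this textbook version), is:

```latex
Q(s_t, a_t) = r(s_t, a_t) + \gamma \, \mathbb{E}_{s_{t+1}}\!\left[ V(s_{t+1}) \right],
\qquad
V(s_t) = \beta \log \sum_{a} \exp\!\big( Q(s_t, a) / \beta \big)
```

The corresponding optimal policy satisfies \(\pi(a \mid s) \propto \exp\big(Q(s, a)/\beta\big)\), so jointly fitting the policy and a value function to these identities assigns per-token credit without requiring paired preference data.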
We introduce a novel reinforcement learning framework of LLM agents named AGILE (AGent that Interacts and Learns from Environments) designed to perform complex conversational tasks with users, leveraging LLMs, memory, tools, and interactions with experts. The agent possesses capabilities beyond conversation, including reflection, tool usage, and expert consultation. We formulate the construction of such an LLM agent as a reinforcement learning (RL) problem, in which the LLM serves as the policy model. We fine-tune the LLM using labeled data of actions and the PPO algorithm. We focus on question answering and release a dataset for agents called ProductQA, comprising challenging questions in online shopping. Our extensive experiments on ProductQA, MedMCQA and HotPotQA show that AGILE agents based on 7B and 13B LLMs trained with PPO can outperform GPT-4 agents. Our ablation study highlights the indispensability of memory, tools, consultation, reflection, and reinforcement learning in achieving the agent's strong performance. Datasets and code are available at https://github.com/bytarnish/AGILE.
We propose herein LLM-Guided Reinforcement Learning (LGRL), a novel framework that leverages large language models (LLMs) to decompose high-level objectives into a sequence of manageable subgoals in interactive environments. Our approach decouples high-level planning from low-level action execution by dynamically generating context-aware subgoals that guide the reinforcement learning (RL) agent. During training, intermediate subgoals—each associated with partial rewards—are generated based on the agent’s current progress, providing fine-grained feedback that facilitates structured exploration and accelerates convergence. At inference, a chain-of-thought strategy is employed, enabling the LLM to adaptively update subgoals in response to evolving environmental states. Although demonstrated on a representative interactive setting, our method is generalizable to a wide range of complex, goal-oriented tasks. Experimental results show that LGRL achieves higher success rates, improved efficiency, and faster convergence compared to baseline approaches.
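The partial-reward mechanism described above is a form of reward shaping over LLM-generated subgoals. In this minimal sketch the subgoal list is a hard-coded placeholder (a real system would have the LLM produce it from the task context), and the per-subgoal bonus is an arbitrary example value.

```python
# Minimal sketch of subgoal-based reward shaping (placeholder subgoals;
# a real system would have the LLM generate them from task context).

def shaped_reward(completed: set, subgoals: list[str],
                  task_done: bool, partial: float = 0.1) -> float:
    """Partial reward per completed subgoal, full reward on task success."""
    r = partial * sum(g in completed for g in subgoals)
    return r + (1.0 if task_done else 0.0)

subgoals = ["find_key", "open_door", "reach_goal"]
print(shaped_reward({"find_key", "open_door"}, subgoals, task_done=False))  # -> 0.2
```

The intermediate bonuses give the RL agent a gradient toward the goal long before the terminal reward is reachable, which is what accelerates convergence in sparse-reward settings.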
Reinforcement learning (RL) often encounters delayed and sparse feedback in real-world applications, sometimes with only episodic rewards. Previous approaches have made some progress in reward redistribution for credit assignment but still face challenges, including training difficulties due to redundancy and ambiguous attributions stemming from overlooking the multifaceted nature of mission performance evaluation. Fortunately, large language models (LLMs) encompass rich decision-making knowledge and provide a plausible tool for reward redistribution. Even so, deploying LLMs in this setting is non-trivial due to the misalignment between linguistic knowledge and the required symbolic form, together with the inherent randomness and hallucinations of LLM inference. To tackle these issues, we introduce LaRe, a novel LLM-empowered symbolic decision-making framework, to improve credit assignment. Key to LaRe is the concept of the latent reward, which works as a multi-dimensional performance evaluation, enabling more interpretable goal attainment from various perspectives and facilitating more effective reward redistribution. We show that semantically grounded code generated by the LLM can bridge linguistic knowledge and symbolic latent rewards, as it is executable on symbolic objects. Meanwhile, we design latent-reward self-verification to increase the stability and reliability of LLM inference. Theoretically, eliminating reward-irrelevant redundancy in the latent reward benefits RL performance through more accurate reward estimation. Extensive experiments show that LaRe (i) achieves superior temporal credit assignment over SOTA methods, (ii) excels in allocating contributions among multiple agents, and (iii) outperforms policies trained with ground-truth rewards on certain tasks.
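The latent-reward idea — the LLM emits executable scoring code for several performance facets, which is self-verified on sample transitions before being trusted — can be sketched as below. The facet functions and state fields are invented placeholders for LLM-generated code.

```python
# Hedged sketch of the latent-reward idea: executable facet scorers
# (stand-ins for LLM-generated code) combined into a multi-dimensional
# reward, with a simple self-verification pass before use.

latent_facets = {
    "progress": lambda s: s["distance_closed"],
    "safety":   lambda s: -s["collisions"],
}

def latent_reward(state: dict, weights: dict) -> float:
    """Weighted combination of the facet scores for one state."""
    return sum(w * latent_facets[k](state) for k, w in weights.items())

def self_verify(sample_states) -> bool:
    """Reject facet code that crashes or returns non-numeric values."""
    try:
        return all(isinstance(f(s), (int, float))
                   for f in latent_facets.values() for s in sample_states)
    except Exception:
        return False

samples = [{"distance_closed": 2.0, "collisions": 1}]
assert self_verify(samples)
print(latent_reward(samples[0], {"progress": 1.0, "safety": 0.5}))  # -> 1.5
```

Because each facet is separately executable, redundant or reward-irrelevant facets can be dropped before redistribution, which is the redundancy-elimination benefit the abstract refers to.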
This synthesis organizes research on AI agents and large models into five core dimensions: first, multi-agent architectures and collaborative communication, examining interaction mechanisms in complex systems; second, enhancement of the underlying logic of reasoning, reflection, and planning capabilities; third, tool use and task execution, focusing on agents' autonomous tool invocation and self-iteration; fourth, broad vertical-industry applications, covering key production domains such as engineering, finance, and scientific research; and finally, safety assurance, reliability evaluation frameworks, and performance benchmarks for agent systems, aiming to advance agents from single-function validation toward trustworthy, controllable industrial-grade deployment.