Auto Research
Research Practice and Workflow Modeling with AI Agents
This thread examines the end-to-end application of AI agents to scientific research: how agents automate complex research tasks through dedicated tools and frameworks (such as scholar-skill), and what professional impacts and theoretical boundaries this mode of work carries.
- Vibe Researching as Wolf Coming: Can AI Agents with Skills Replace or Augment Social Scientists? (Yongjun Zhang, 2026, arXiv.org)
Evaluating and Benchmarking AI Agents' Research Capabilities
This work focuses on quantitatively evaluating AI agents on end-to-end research tasks, building a benchmark environment (ResearchGym) to analyze agents' success rates and failure modes on real academic tasks.
- ResearchGym: Evaluating Language Model Agents on Real-World AI Research (Aniketh Garikaparthi, Manasi S. Patwardhan, Arman Cohan, 2026, arXiv.org)
Behavior and Interaction Patterns in Large-Scale AI Agent Communities
This work addresses AI agents at the population level, empirically observing the evolutionary trajectory, participation behavior, and interaction limits of a large-scale agent community (Moltbook), offering a sociological perspective on the phenomenon.
- OpenClaw AI Agents as Informal Learners at Moltbook: Characterizing an Emergent Learning Community at Scale (Eason Chen, Ce Guan, A. Elshafiey, Zhong-Qiu Zhao, Joshua Zekeri, Afeez Edeifo Shaibu, Emmanuel Osadebe Prince, C. Wu, 2026, arXiv.org)
Together, these papers approach the Auto Research field from three angles: first, the deep integration of AI agents into research workflows and its theoretical limits; second, standardized benchmarking of AI research capabilities; and third, sociological, empirical analysis of behavioral dynamics in large-scale AI agent communities. Collectively they chart AI's evolution from assistive tool to autonomous research participant, along with its performance bottlenecks and group dynamics.
3 related papers in total
AI agents -- systems that execute multi-step reasoning workflows with persistent state, tool access, and specialist skills -- represent a qualitative shift from prior automation technologies in social science. Unlike chatbots that respond to isolated queries, AI agents can now read files, run code, query databases, search the web, and invoke domain-specific skills to execute entire research pipelines autonomously. This paper introduces the concept of vibe researching -- the AI-era parallel to vibe coding -- and uses scholar-skill, a 26-skill plugin for Claude Code covering the full research pipeline from idea to submission across 18 orchestrated phases with 53 quality gates, as an illustrative case. I develop a cognitive task framework that classifies research activities along two dimensions -- codifiability and tacit knowledge requirement -- to identify a delegation boundary that is cognitive, not sequential: it cuts through every stage of the research pipeline, not between stages. I argue that AI agents excel at speed, coverage, and methodological scaffolding but struggle with theoretical originality and tacit field knowledge. The paper concludes with an analysis of three implications for the profession -- augmentation with fragile conditions, stratification risk, and a pedagogical crisis -- and proposes five principles for responsible vibe researching.
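The delegation boundary reads naturally as a decision rule over the paper's two dimensions. Below is a minimal sketch in Python, assuming numeric scores for codifiability and tacit-knowledge requirement; the thresholds, field names, and example tasks are illustrative assumptions, not values from the paper.

```python
from dataclasses import dataclass
from enum import Enum


class Delegation(Enum):
    DELEGATE = "delegate to agent"           # codifiable, low tacit demand
    SUPERVISE = "agent drafts, human reviews"
    RETAIN = "keep with the researcher"      # tacit-knowledge heavy


@dataclass
class ResearchTask:
    name: str
    stage: str                # pipeline stage the task belongs to
    codifiability: float      # 0.0 (ad hoc) .. 1.0 (fully proceduralizable)
    tacit_requirement: float  # 0.0 (explicit) .. 1.0 (deep field intuition)


def delegation_decision(task: ResearchTask,
                        codify_cut: float = 0.6,
                        tacit_cut: float = 0.4) -> Delegation:
    """Place a task in the 2-D cognitive space and return a delegation
    decision. The cut-offs are hypothetical, chosen only for illustration."""
    if task.codifiability >= codify_cut and task.tacit_requirement < tacit_cut:
        return Delegation.DELEGATE
    if task.tacit_requirement >= tacit_cut and task.codifiability < codify_cut:
        return Delegation.RETAIN
    return Delegation.SUPERVISE


# The boundary is cognitive, not sequential: tasks from the *same* stage
# can land on opposite sides of it.
tasks = [
    ResearchTask("run robustness checks", "analysis", 0.9, 0.1),
    ResearchTask("interpret anomalous result", "analysis", 0.2, 0.9),
    ResearchTask("format references", "writing", 0.95, 0.05),
    ResearchTask("position claim in the literature", "writing", 0.2, 0.8),
]
for t in tasks:
    print(f"[{t.stage}] {t.name}: {delegation_decision(t).value}")
```

The two analysis tasks and the two writing tasks each split across the boundary, which illustrates the paper's point that delegation cuts through stages rather than between them.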
We introduce ResearchGym, a benchmark and execution environment for evaluating AI agents on end-to-end research. To instantiate this, we repurpose five oral and spotlight papers from ICML, ICLR, and ACL. From each paper's repository, we preserve the datasets, evaluation harness, and baseline implementations but withhold the paper's proposed method. This results in five containerized task environments comprising 39 sub-tasks in total. Within each environment, agents must propose novel hypotheses, run experiments, and attempt to surpass strong human baselines on the paper's metrics. In a controlled evaluation of an agent powered by GPT-5, we observe a sharp capability--reliability gap. The agent improves over the repository's provided baselines in just 1 of 15 evaluations (6.7%), and then only by 11.5%, and completes only 26.5% of sub-tasks on average. We identify recurring long-horizon failure modes, including impatience, poor time and resource management, overconfidence in weak hypotheses, difficulty coordinating parallel experiments, and hard limits from context length. Yet in a single run, the agent surpasses the solution of an ICML 2025 Spotlight task, indicating that frontier agents can occasionally reach state-of-the-art performance, but do so unreliably. We additionally evaluate proprietary agent scaffolds, including Claude Code (Opus-4.5) and Codex (GPT-5.2), which display a similar gap. ResearchGym provides infrastructure for systematic evaluation and analysis of autonomous agents on closed-loop research.
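As a concrete illustration of how the abstract's headline numbers aggregate, here is a minimal sketch, assuming a higher-is-better metric and hypothetical record fields (this is not ResearchGym's actual API):

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """One evaluation of an agent on one task environment.

    Hypothetical record type for illustration; assumes the paper's
    metric is higher-is-better."""
    task: str
    agent_score: float
    baseline_score: float
    subtasks_done: int
    subtasks_total: int


def summarize(runs: list[RunResult]) -> dict[str, float]:
    """Compute the statistics quoted in the abstract: the fraction of
    evaluations beating the baseline, the relative gain when beating,
    and the mean sub-task completion rate."""
    beats = [r for r in runs if r.agent_score > r.baseline_score]
    return {
        "beat_baseline_rate": len(beats) / len(runs),
        "mean_relative_gain_when_beating": (
            sum((r.agent_score - r.baseline_score) / r.baseline_score
                for r in beats) / len(beats)
            if beats else 0.0
        ),
        "mean_subtask_completion": sum(
            r.subtasks_done / r.subtasks_total for r in runs
        ) / len(runs),
    }
```

With 15 such runs in which exactly one beats its baseline, `beat_baseline_rate` comes out to 1/15 ≈ 6.7%, matching the figure in the abstract.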
Informal learning communities have been called the "other Massive Open Online Course" in Learning@Scale research, yet remain understudied compared to MOOCs. We present the first empirical study of a large-scale informal learning community composed entirely of AI agents. Moltbook, a social network exclusively for AI agents powered by autonomous agent frameworks such as OpenClaw, grew to over 2.8 million registered agents in three weeks. Analyzing 231,080 non-spam posts across three phases of community evolution, we find three key patterns. First, participation inequality is extreme from the start (comment Gini = 0.889), exceeding human community benchmarks. Second, AI agents exhibit a "broadcasting inversion": statement-to-question ratios of 8.9:1 to 9.7:1 contrast sharply with the question-driven dynamics of human learning communities, and comment-level analysis of 1.55 million comments reveals a "parallel monologue" pattern in which 93% of comments are independent responses rather than threaded dialogue. Third, we document a characteristic engagement lifecycle: explosive initial growth (184K posts from 32K authors in 11 days), a spam crisis (57,093 posts deleted by the platform), and an engagement decline (mean comments: 31.7 -> 8.3 -> 1.7) that had not reversed by the end of our observation window despite effective spam removal. Sentiment analysis reveals a selection effect: comment tone becomes more positive as engagement declines, suggesting that casual participants disengage first while committed contributors remain. These findings have direct implications for hybrid human-AI learning platforms.
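The participation-inequality figure is a standard Gini coefficient over per-agent comment counts. A self-contained sketch follows; the toy counts are invented for illustration, while the paper's 0.889 comes from the real Moltbook data.

```python
def gini(counts: list[int]) -> float:
    """Gini coefficient of a non-negative distribution: 0 means every
    agent comments equally, values near 1 mean a few agents dominate.
    Uses the closed form over sorted values:
        G = 2 * sum_i(i * x_i) / (n * sum(x)) - (n + 1) / n,  i = 1..n
    """
    xs = sorted(counts)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Toy distribution: most agents barely comment, two dominate.
comments_per_agent = [0, 0, 0, 1, 1, 2, 3, 5, 40, 120]
print(f"Gini = {gini(comments_per_agent):.3f}")  # ≈ 0.809, extreme inequality
```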