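Advances in Agents for Materials Science Research
Field Surveys, Evolution of Research Paradigms, and Development Roadmaps
This body of literature offers a macro-level view of how materials science is evolving from AI4Science toward Agentic Science, examines the roles LLMs play in research (e.g., Oracle, Surrogate, Arbiter), and proposes strategic development paths for specific domains (e.g., 2D materials, semiconductors).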
- Towards Agentic Intelligence for Materials Science(Huan Zhang, Yizhan Li, Wenhao Huang, Ziyu Hou, Yu Song, Xuye Liu, Farshid Effaty, Jinya Jiang, Sifan Wu, Qianggang Ding, Izumi Takahara, Leonard R. MacGillivray, Teruyasu Mizoguchi, Tianshu Yu, Lizi Liao, Yuyu Luo, Yu Rong, Jia Li, Ying Diao, Heng Ji, Bang Liu, 2026, ArXiv Preprint)
- From large language models to AI agents in energy materials research: enabling discovery, design, and automation(Tongao Yao, Junming Huang, Yujie Yan, Yang Yang, Ziye Wang, Xuqiang Shao, Zhengyang Gao, Weijie Yang, 2025, AI Agent)
- The Future of Artificial Intelligence and the Mathematical and Physical Sciences (AI+MPS)(Andrew Ferguson, Marisa LaFleur, Lars Ruthotto, Jesse Thaler, Yuan-Sen Ting, Pratyush Tiwary, Soledad Villar, E. Paulo Alves, Jeremy Avigad, Simon Billinge, Camille Bilodeau, Keith Brown, Emmanuel Candes, Arghya Chattopadhyay, Bingqing Cheng, Jonathan Clausen, Connor Coley, Andrew Connolly, Fred Daum, Sijia Dong, Chrisy Xiyu Du, Cora Dvorkin, Cristiano Fanelli, Eric B. Ford, Luis Manuel Frutos, Nicolás García Trillos, Cecilia Garraffo, Robert Ghrist, Rafael Gomez-Bombarelli, Gianluca Guadagni, Sreelekha Guggilam, Sergei Gukov, Juan B. Gutiérrez, Salman Habib, Johannes Hachmann, Boris Hanin, Philip Harris, Murray Holland, Elizabeth Holm, Hsin-Yuan Huang, Shih-Chieh Hsu, Nick Jackson, Olexandr Isayev, Heng Ji, Aggelos Katsaggelos, Jeremy Kepner, Yannis Kevrekidis, Michelle Kuchera, J. Nathan Kutz, Branislava Lalic, Ann Lee, Matt LeBlanc, Josiah Lim, Rebecca Lindsey, Yongmin Liu, Peter Y. Lu, Sudhir Malik, Vuk Mandic, Vidya Manian, Emeka P. Mazi, Pankaj Mehta, Peter Melchior, Brice Ménard, Jennifer Ngadiuba, Stella Offner, Elsa Olivetti, Shyue Ping Ong, Christopher Rackauckas, Philippe Rigollet, Chad Risko, Philip Romero, Grant Rotskoff, Brett Savoie, Uros Seljak, David Shih, Gary Shiu, Dima Shlyakhtenko, Eva Silverstein, Taylor Sparks, Thomas Strohmer, Christopher Stubbs, Stephen Thomas, Suriyanarayanan Vaikuntanathan, Rene Vidal, Francisco Villaescusa-Navarro, Gregory Voth, Benjamin Wandelt, Rachel Ward, Melanie Weber, Risa Wechsler, Stephen Whitelam, Olaf Wiest, Mike Williams, Zhuoran Yang, Yaroslava G. Yingling, Bin Yu, Shuwen Yue, Ann Zabludoff, Huimin Zhao, Tong Zhang, 2025, ArXiv Preprint)
- 34 Examples of LLM Applications in Materials Science and Chemistry: Towards Automation, Assistants, Agents, and Accelerated Scientific Discovery(Yoel Zimmermann, Adib Bazgir, Alexander Al-Feghali, Mehrad Ansari, L. C. Brinson, Chiang Yuan, Defne Çirci, Min-Hsueh Chiu, Nathan Daelman, Matthew L Evans, Abhijeet Gangan, Janine George, Hassan Harb, Ghazal Khalighinejad, S. Khan, Sascha Klawohn, Magdalena Lederbauer, Soroush Mahjoubi, Bernadette Mohr, S. M. Moosavi, A. Naik, Aleyna Beste Ozhan, D. Plessers, Aritra Roy, Fabian Schoppach, Philipp Schwaller, Carla Terboven, Katharina Ueltzen, Shang Zhu, Jan Janssen, Calvin Li, Ian T. Foster, B. Blaiszik, 2025, ArXiv)
- Transformative applications of artificial intelligence in lithium battery materials science: advancements and future prospects(Guangcun Shan, Zejian Ding, Liujiang Xi, Hongbin Zhao, Jiliang Zhang, Jijian Xu, 2025, Rare Metals)
- Self-Driving Laboratories: Translating Materials Science from Laboratory to Factory(Andre K. Y. Low, J. J. W. Cheng, K. Hippalgaonkar, L. W. Ng, 2025, ACS Omega)
- AI4Materials: a manifesto for AI-driven scientific discovery in materials science(Eros Radicchi, Usman Syed, Federico Cunico, M. Cassetta, Uzair Khan, Paolo Marone, Alberto Scarsini, Filiberto Semenzin, Francesco Enrichi, Francesco Setti, Adolfo Speghini, Marco Cristani, 2025, No journal)
- Artificial Intelligence for Materials Discovery, Development, and Optimization(Benediktus Madika, Aditi Saha, Chaeyul Kang, Batzorig Buyantogtokh, Joshua Agar, Christopher M Wolverton, Peter W. Voorhees, Peter B. Littlewood, Sergei V. Kalinin, Seungbum Hong, 2025, ACS Nano)
- A Survey of AI for Materials Science: Foundation Models, LLM Agents, Datasets, and Tools(Minh-Hao Van, Prateek Verma, Chen Zhao, Xintao Wu, 2025, ArXiv)
- Agentic Assistant for Materials Scientists(Ruozhu Feng, Yangang Liang, Tianzhixi Yin, Peiyuan Gao, Wei Wang, 2025, The Electrochemical Society Interface)
- From AI for Science to Agentic Science: A Survey on Autonomous Scientific Discovery(Jiaqi Wei, Yuejin Yang, Xiang Zhang, Yuhan Chen, Xiang Zhuang, Zhangyang Gao, Dongzhan Zhou, Guangshuai Wang, Zhiqiang Gao, Juntai Cao, Zijie Qiu, Ming Hu, Chenglong Ma, Shixiang Tang, Junjun He, Chunfeng Song, Xuming He, Qiang Zhang, Chenyu You, Shuangjia Zheng, Ning Ding, Wanli Ouyang, Nanqing Dong, Yu Cheng, Siqi Sun, Lei Bai, Bowen Zhou, 2025, ArXiv Preprint)
- Development of a domain-specific large language model for III-nitrides via Qwen-based architecture and fine-tuning optimization(Hongbo Wang, Shutian Liu, Da Yang, Lin Qu, 2025, No journal)
- Roadmap: 2D Materials for Quantum Technologies(Qimin Yan, Tongcang Li, Xingyu Gao, Sumukh Vaidya, Saakshi Dikshit, Yue Luo, Stefan Strauf, Reda Moukaouine, Anton Pershin, Adam Gali, Zhenyao Fang, Harvey Stanfield, Ivan J. Vera-Marun, Michael Newburger, Simranjeet Singh, Tiancong Zhu, Mauro Brotons-Gisbert, Klaus D. Jöns, Brian D. Gerardot, Brian S. Y. Kim, John R. Schaibley, Kyle L. Seyler, Jesse Balgley, James Hone, Kin Chung Fong, Lin Wang, Guido Burkard, Yihang Zeng, Tobias Heindel, Serkan Ateş, Tobias Vogl, Igor Aharonovich, 2025, ArXiv Preprint)
- Generative AI in Materials Science: Accelerating Discovery Through Inverse Design (K. Kumar, 2025, SSRN Electronic Journal)
- Beyond designer’s knowledge: Generating materials design hypotheses via a large language model(Quanliang Liu, Maciej P. Polak, So Yeon Kim, MD Al Amin Shuvo, H. Deodhar, Jeongsoo Han, Dane Morgan, Hyunseok Oh, 2025, Acta Materialia)
- AI agents for automating materials research: a case study of crystal plasticity simulations(Jiyi Yang, Yoshinao Kobayashi, Masahiko Demura, 2026, Science and Technology of Advanced Materials: Methods)
- Knowledge-guided large language model for material science(Guanjie Wang, Jingjing Hu, Jian Zhou, Sen Liu, Qingjiang Li, Zhimei Sun, 2025, Review of Materials Research)
- Constitutive scientific generative agent (CSGA): Leveraging large language models for automated constitutive model discovery(Marius Tacke, Matthias Busch, Kartik Bali, Kian P. Abdolazizi, Kevin Linka, C. Cyron, Roland C. Aydin, 2025, Machine Learning for Computational Science and Engineering)
- Electronic polymer discovery through adaptive AI-guided autonomous experimentation(2025, Nature Chemical Engineering)
- Agentic material science(Chengbo Li, N. Ran, Jianjun Liu, 2026, Journal of Materials Informatics)
- Autonomous Agents for Scientific Discovery: Orchestrating Scientists, Language, Code, and Physics(Lianhao Zhou, Hongyi Ling, Cong Fu, Yepeng Huang, Michael Sun, Wendi Yu, Xiaoxuan Wang, Xiner Li, Xingyu Su, Junkai Zhang, Xiusi Chen, Chenxing Liang, Xiaofeng Qian, Heng Ji, Wei Wang, Marinka Zitnik, Shuiwang Ji, 2025, ArXiv Preprint)
- 32 examples of LLM applications in materials science and chemistry: towards automation, assistants, agents, and accelerated scientific discovery(Yoel Zimmermann, Adib Bazgir, Alexander Al-Feghali, Mehrad Ansari, Joshua Bocarsly, L. C. Brinson, Chiang Yuan, Defne Çirci, Min-Hsueh Chiu, Nathan Daelman, Matthew L Evans, Abhijeet Gangan, Janine George, Hassan Harb, Ghazal Khalighinejad, Sartaaj Takrim Khan, Sascha Klawohn, Magdalena Lederbauer, Soroush Mahjoubi, Bernadette Mohr, Seyed Mohamad Moosavi, Aakash N. Naik, Aleyna Beste Ozhan, D. Plessers, Aritra Roy, Fabian Schoeppach, Philipp Schwaller, Carla Terboven, Katharina Ueltzen, Yue Wu, Shang Zhu, Jan Janssen, Calvin Li, Ian T. Foster, B. Blaiszik, 2025, Machine Learning)
- Large Language Model in Materials Science: Roles, Challenges, and Strategic Outlook(Jinglan Zhang, Xinyi Chen, Xu Ye, Yulin Yang, Bin Ai, 2025, Advanced Intelligent Discovery)
- Scientific generative AI and materials science: From prediction to physical intelligence(Blessing Ishola, Corrisa Heyes, 2025, MRS Bulletin)
- Unlocking the future of materials science: key insights from the DCTMD workshop(R. Kobayashi, Roger D Amos, T. Crawford, Hongxia Hao, Yi Liu, T. Lookman, R. Ramprasad, Matthias Scheffler, Hong Wang, Tong-Yi Zhang, 2025, Journal of Materials Informatics)
- The paradigm-shifting potential of AI in materials science(Ben Ikenson, 2025, Scilight)
- Integrating Large Language Models into the Chemistry and Materials Science Laboratory Curricula(Annalise E. Maughan, E. Toberer, A. Zevalkink, 2025, Chemistry of Materials)
- Unknowium, beyond the banana, and AI discovery in materials science(Steve Cranford, 2025, Matter)
- A family of large language models for materials research with insights into model adaptability in continued pretraining(Dhruv Ahlawat, Vaibhav Mishra, Somaditya Singh, Mohd Zaki, Vaibhav Bihani, Hargun Singh Grover, Biswajit Mishra, Santiago Miret, Mausam, N. M. A. Krishnan, 2026, Nature Machine Intelligence)
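Literature Knowledge Mining, Multimodal Representation, and Scientific Database Construction
These works focus on using LLMs and vision-language models (VLMs) to automatically extract material properties, synthesis routes, and microstructural features from large volumes of unstructured literature, figures and tables, spectra, and microscopy images, and to build structured knowledge graphs or domain databases.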
- Perovskite-LLM: Knowledge-Enhanced Large Language Models for Perovskite Solar Cell Research(Xiang Liu, Penglei Sun, Shuyan Chen, Longhan Zhang, Peijie Dong, Huajie You, Yongqi Zhang, Chang Yan, Xiaowen Chu, Tong-yi Zhang, 2025, No journal)
- Automated construction of inorganic materials databases and materials discovery based on large language models(Tianrui Yang, 2025, Journal of Computational Methods in Sciences and Engineering)
- Text-mined dataset of solid-state syntheses with impurity phases using Large Language Model(Sanghoon Lee, Kevin Cruse, V. Baibakova, Gerbrand Ceder, Anubhav Jain, 2025, Scientific Data)
- Hierarchical Deep Research with Local-Web RAG: Toward Automated System-Level Materials Discovery(Rui Ding, R. P. Ferreira, Yuxin Chen, Junhong Chen, 2025, ArXiv)
- A Large Language Model-Based Question and Answer Platform for Solid-State Batteries Using the Model Context Protocol: SSB-Q&A-Platform(Zhiyong Liu, Yuedong Sun, Dongxu Guo, Yuejiu Zheng, 2025, 2025 4th International Conference on Artificial Intelligence, Human-Computer Interaction and Robotics (AIHCIR))
- Towards an automated workflow in materials science for combining multi-modal simulative and experimental information using data mining and large language models(Balduin Katzer, Steffen Klinder, Katrin Schulz, 2025, ArXiv)
- Large Language Model-Driven Knowledge Discovery for Designing Advanced Micro/Nano Electrocatalyst Materials(Yin Shen, Shichao Zhao, Yanfei Lv, Fei Chen, Li Fu, H. Karimi‐Maleh, 2025, Computers, Materials & Continua)
- Large Language Model-Powered Decision Support for a Metal Additive Manufacturing Knowledge Graph(Muhammad Tayyab Khan, Lequn Chen, Wenhe Feng, S. Moon, 2025, ArXiv)
- Evaluating the Role of Model Size in Agentic AI for Expert-Like Material Selection(Megan Y. Ying, Daniele Grandi, Allin Groom, Christopher McComb, 2025, Volume 3A: 51st Design Automation Conference (DAC))
- Language Native Lightly Structured Databases for Large Language Model Driven Composite Materials Research(Yuze Liu, Zhaoyuan Zhang, Xiangsheng Zeng, Yihe Zhang, Leping Yu, Lejia Wang, Xi Yu, 2025, ArXiv)
- LLM-Driven Knowledge Graphs: Automated Creation and Natural Language Querying for Materials Science Research(Vladislav Kaverinskiy, O. Palagin, Dariia Nikitiuk, Anna Litvin, K. Malakhov, 2025, No journal)
- Automated Extraction of Material Properties using LLM-based AI Agents(Subham Ghosh, A. Tewari, 2025, ArXiv)
- Symbol-based entity marker highlighting for enhanced text mining in materials science with generative AI(Junhyeong Lee, Jongmin Yuk, Chan-Woo Lee, 2025, ArXiv)
- Knowledge-Driven Agentic Scientific Corpus Distillation Framework for Biomedical Large Language Models Training(Meng Xiao, Xunxin Cai, Qingqing Long, Chengrui Wang, Yuanchun Zhou, Hengshu Zhu, 2025, ArXiv Preprint)
- Foundations of GenIR(Qingyao Ai, Jingtao Zhan, Yiqun Liu, 2025, ArXiv Preprint)
- Digital materials ecosystem: from databases to AI agents for autonomous discovery(Di Zhang, X. Jia, Yuhang Wang, Heng Liu, Qian Wang, Seong‐Hoon Jang, Daksh Shah, Songbo Ye, Hung Ba Tran, Hao Li, 2026, Chemical Science)
- Microfluidic informatics - A research paradigm for the future of microfluidics(Qing Lu, Zhinan Zhang, Xianting Ding, 2025, Analytica Chimica Acta)
- Linking heterogeneous microstructure informatics with expert characterization knowledge through customized and hybrid vision-language representations for industrial qualification(Mutahar Safdar, Gentry Wood, Max Zimmermann, Guy Lamouche, Priti Wanjara, Yaoyao Fiona Zhao, 2025, ArXiv Preprint)
- Synergy of robotics and microfluidics for intelligent micro- and nanomanipulation(Mengmeng Xi, Pengsong Zhang, Junyue Dai, Jialei Shi, Haoran Cui, Bo Wang, Junhui Zhu, Peng Pan, Xinyu Liu, 2025, Biomicrofluidics)
- Operationalizing Serendipity: Multi-Agent AI Workflows for Enhanced Materials Characterization with Theory-in-the-Loop(Lan Yao, Suman Samantray, Ayana Ghosh, Kevin M. Roccapriore, L. Kovarik, Sarah I. Allec, Maxim A. Ziatdinov, 2025, ArXiv)
- Reshaping MOFs synthesis conditions mining with a dynamic multi-agents framework of large language model(Zuhong Lin, Daoyuan Ren, Kai Ran, Jing Sun, Songlin Yu, X. Bai, Xiaotian Huang, Haiyang He, Pengxu Pan, Ying Fang, Zhanglin Li, Haipu Li, Jingjing Yao, 2025, Transactions of Materials Research)
- Predicting Materials Thermodynamics Enabled by Large Language Model‐Driven Dataset Building and Machine Learning(Juejing Liu, Haydn Anderson, Noah I. Waxman, Vsevolod Kovalev, Byron Fisher, Elizabeth Li, Xiaofeng Guo, 2026, Advanced Intelligent Systems)
- LLM-Driven Multi-Agent Curation and Expansion of Metal-Organic Frameworks Database(Hong-Jik Kim, Dohoon Kim, Jihan Kim, 2025, ArXiv)
- Construction of an artificial-intelligence agent for the discovery of next-generation white-LED phosphors(Zichun Zhou, Han Zhang, Chi Song, Chen Ming, Yi-Yang Sun, 2025, Physical Chemistry Chemical Physics)
- Harnessing Multimodal Data from Scientific Literature for Spectroscopy Informatics(Tanjin He, Aikaterini Vriza, Ian T. Foster, R. Assary, Logan Ward, Maria K. Y. Chan, 2025, ECS Meeting Abstracts)
- “DIVE” into hydrogen storage materials discovery with AI agents(Di Zhang, X. Jia, Tran Ba Hung, S. Jang, Linda Zhang, Ryuhei Sato, Yusuke Hashimoto, Toyoto Sato, Kiyoe Konno, S. Orimo, Hao Li, 2025, Chemical Science)
- LLM-based AI agents for automated extraction of material properties and structural features(Subham Ghosh, A. Tewari, 2026, Computational Materials Science)
- Automated extraction of materials system charts using a large language model framework(Quanliang Liu, Maciej P. Polak, MD Al Amin Shuvo, H. Deodhar, Jeongsoo Han, Dane Morgan, Hyunseok Oh, 2025, Scripta Materialia)
- Applications of natural language processing and large language models in materials discovery(Xue Jiang, Weiren Wang, Shaohan Tian, Hao Wang, T. Lookman, Yanjing Su, 2025, npj Computational Materials)
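Agent System Architectures, Multi-Agent Collaboration, and General Reasoning Frameworks
These works study the underlying logical architecture of agents, including the Planner-Executor pattern, active tool discovery (via the Model Context Protocol, MCP), multi-agent debate and blackboard collaboration mechanisms, and methods for improving reliability and alignment on long-horizon tasks.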
- Collaborative AI Enhances Image Understanding in Materials Science(R. Yin, Zhichu Ren, Zongyou Yin, Zhen Zhang, So Yeon Kim, Chia-Wei Hsu, Ju Li, 2025, ArXiv)
- Understanding Multi-Agent LLM Frameworks: A Unified Benchmark and Experimental Analysis(Abdelghny Orogat, Ana Rostam, Essam Mansour, 2026, ArXiv Preprint)
- LLM-Based Multi-Agent Blackboard System for Information Discovery in Data Science(Alireza Salemi, Mihir Parmar, Palash Goyal, Yiwen Song, Jinsung Yoon, Hamed Zamani, Tomas Pfister, Hamid Palangi, 2025, ArXiv Preprint)
- A Grassroots Network and Community Roadmap for Interconnected Autonomous Science Laboratories for Accelerated Discovery(Rafael Ferreira da Silva, Milad Abolhasani, Dionysios A. Antonopoulos, Laura Biven, Ryan Coffee, Ian T. Foster, Leslie Hamilton, Shantenu Jha, Theresa Mayer, Benjamin Mintz, Robert G. Moore, Salahudin Nimer, Noah Paulson, Woong Shin, Frederic Suter, Mitra Taheri, Michela Taufer, Newell R. Washburn, 2025, ArXiv Preprint)
- A lightweight model and multi-agent system for layer identification in two-dimensional materials(Ruiliang Zhou, Hailong Liu, I. Babichuk, Yurii A. Romaniuk, A. Tiutiunnyk, Jianan Zhang, Yan Pu, Zisen Zhou, D. Laroze, Jian Yang, 2025, Computational Materials Science)
- Application of multi-agent systems in the field of computational materials science(Ying Wu, Zhong-Yi Lu, Ze-Feng Gao, 2026, Acta Physica Sinica)
- RAG-Enhanced Collaborative LLM Agents for Drug Discovery(Namkyeong Lee, Edward De Brouwer, Ehsan Hajiramezanali, Tommaso Biancalani, Chanyoung Park, Gabriele Scalia, 2025, ArXiv Preprint)
- MCP-Zero: Active Tool Discovery for Autonomous LLM Agents(Xiang Fei, Xiawu Zheng, Hao Feng, 2025, ArXiv Preprint)
- Multicrossmodal Automated Agent for Integrating Diverse Materials Science Data(Adib Bazgir, Rama chandra Praneeth Madugula, Yuwen Zhang, 2025, ArXiv Preprint)
- SLM-MATRIX: a multi-agent trajectory reasoning and verification framework for enhancing language models in materials data extraction(Xin Li, Zhixuan Huang, Shu Quan, C.A.I. Peng, Xiaoming Ma, 2025, npj Computational Materials)
- Architectures for Building Agentic AI(Sławomir Nowaczyk, 2025, ArXiv Preprint)
- AGAPI-Agents: An Open-Access Agentic AI Platform for Accelerated Materials Design on AtomGPT.org(Jaehyung Lee, J. Ely, Ke Zhang, A. Ajith, Charles Rhys Campbell, Kamal Choudhary, 2025, ArXiv)
- S1-MatAgent: A planner driven multi-agent system for material discovery(Xinrui Wang, Chengbo Li, Boxuan Zhang, Jiahui Shi, Nian Ran, Linjing Li, Jianjun Liu, Dajun Zeng, 2025, ArXiv Preprint)
- ARCANE: A Multi-Agent Framework for Interpretable and Configurable Alignment(Charlie Masters, Marta Grześkiewicz, Stefano V. Albrecht, 2025, ArXiv Preprint)
- AutoTool: Dynamic Tool Selection and Integration for Agentic Reasoning(Jiaru Zou, Ling Yang, Yunzhe Qi, Sirui Chen, Mengting Ai, Ke Shen, Jingrui He, Mengdi Wang, 2025, ArXiv Preprint)
- Hierarchical Multi-agent Large Language Model Reasoning for Autonomous Functional Materials Discovery(Samuel Rothfarb, Megan C. Davis, Ivana Matanovic, Baikun Li, Edward F. Holby, Wilton J. M. Kort-Kamp, 2025, ArXiv)
- Toward Greater Autonomy in Materials Discovery Agents: Unifying Planning, Physics, and Scientists(Lianhao Zhou, Hongyi Ling, Keqiang Yan, Kaiji Zhao, Xiaoning Qian, Raymundo Arróyave, Xiaofeng Qian, Shuiwang Ji, 2025, ArXiv)
- DeepAnalyze: Agentic Large Language Models for Autonomous Data Science(Shaolei Zhang, Ju Fan, Meihao Fan, Guoliang Li, Xiaoyong Du, 2025, ArXiv Preprint)
- Knowledge-driven autonomous materials research via collaborative multi-agent and robotic system(Tongyu Shi, Yutang Li, Zhanlong Wang, Wenhe Xu, Guolai Jiang, Dawei Dai, Jie Zhou, Hao Huang, Rui He, Seeram Ramakrishna, Paul K. Chu, Wenhua Zhou, Xue-Feng Yu, 2026, Matter)
- A one-shot automated framework based on large language model and AutoML: Accelerating the design of porous carbon materials and carbon capture optimization(Lin Hu, Zhaorong Zhou, Guozhu Jia, 2025, Separation and Purification Technology)
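Scientific Hypothesis Generation, Inverse Design, and Logical Reasoning
These works focus on using agents to generate scientific hypotheses and carry out physicochemical reasoning, and on combining generative AI with reinforcement learning for inverse materials design, crystal structure prediction, and goal-directed molecular modeling.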
- Towards Fully-Automated Materials Discovery via Large-Scale Synthesis Dataset and Expert-Level LLM-as-a-Judge(Heegyu Kim, Taeyang Jeon, Seungtaek Choi, J. Hong, Dong Won Jeon, Ga-Yeon Baek, Gyeong-Won Kwak, Dong-Hee Lee, Jisu Bae, Chi-Hoon Lee, Yoon-Seo Kim, Seon-Jin Choi, Jinsin Park, Sung Beom Cho, Hyunsouk Cho, 2025, Proceedings of the 34th ACM International Conference on Information and Knowledge Management)
- Mechanisms of Matter: Language Inferential Benchmark on Physicochemical Hypothesis in Materials Synthesis(Yingming Pu, Tao Lin, Hongyu Chen, 2025, ArXiv)
- A closed-loop AI framework for hypothesis-driven and interpretable materials design(Kangyu Ji, Tianran Liu, Fang Sheng, Shaun Tan, Moungi Bawendi, Tonio Buonassisi, 2025, ArXiv Preprint)
- Accelerating Scientific Discovery with Autonomous Goal-evolving Agents(Yuanqi Du, Botao Yu, Tianyu Liu, Tony Shen, Junwu Chen, Jan G. Rittig, Kunyang Sun, Yikun Zhang, Zhangde Song, Bo Zhou, Cassandra Masschelein, Yingze Wang, Haorui Wang, Haojun Jia, Chao Zhang, Hongyu Zhao, Martin Ester, Teresa Head-Gordon, Carla P. Gomes, Huan Sun, Chenru Duan, Philippe Schwaller, Wengong Jin, 2025, ArXiv Preprint)
- (Invited) LLM Guided Hypothesis Generation in Self-Driving Lab for Energy Storage Materials Discovery(Wei Wang, Tianzhixi Yin, Ruozhu Feng, J. Heather, Jie Bao, Peiyuan Gao, Yangang Liang, 2025, ECS Meeting Abstracts)
- PriM: Principle-Inspired Material Discovery through Multi-Agent Collaboration(Zheyuan Lai, Yingming Pu, 2025, ArXiv)
- MatPC: Prompting Large Language Model, Crystal Structure Prediction, and First-Principles for Semantic-Driven Material Design(Jiacheng Zhou, Bo Xiao, Qi Liu, Lifeng Liu, Lei Zhang, 2025, ACS Applied Materials & Interfaces)
- Hypothesis Generation for Materials Discovery and Design Using Goal-Driven and Constraint-Guided LLM Agents(Shrinidhi Kumbhar, Venkatesh Mishra, Kevin Coutinho, Divij Handa, Ashif Iquebal, Chitta Baral, 2025, No journal)
- The Rise of Generative AI for Metal-Organic Framework Design and Synthesis(Chenru Duan, Aditya Nandy, Shyam Chand Pal, Xin-Yu Yang, Wenhao Gao, Yuanqi Du, Hendrik Krass, Y. Kang, Varinia Bernales, Zuyang Ye, Tristan Pyle, Ray Yang, Zeqi Gu, Philipp Schwaller, Shengqian Ma, Shijing Sun, Alán Aspuru-Guzik, S. M. Moosavi, Robert B Wexler, Zhiling Zheng, 2025, ArXiv)
- Agentic additive manufacturing alloy evaluation(P. Pak, Achuth Chandrasekhar, A. Farimani, 2025, Additive Manufacturing Letters)
- Inverse Materials Design by Large Language Model-Assisted Generative Framework(Yun Hao, Chenglin Fan, Beilin Ye, Wenhao Lu, Zhenyi Lu, Peilin Zhao, Zhifeng Gao, Qingyao Wu, Yanhui Liu, Tongqi Wen, 2025, ArXiv)
- Swarms of Large Language Model Agents for Protein Sequence Design with Experimental Validation(Fiona Y. Wang, Di Sheng Lee, David L. Kaplan, Markus J. Buehler, 2025, ArXiv)
- Intelligent inverse design of phononic crystals based on machine learning coupled with localized collocation meshless method(Wenhui Chu, Zhuojia Fu, S. Nanthakumar, Wenzhi Xu, Xiaoying Zhuang, 2025, International Journal of Mechanics and Materials in Design)
- A Critical Examination of Active Learning Workflows in Materials Science(Akhil S. Nair, Lucas Foppa, 2026, ArXiv Preprint)
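Computational Materials Science Automation and High-Throughput Simulation Agents
These works explore how agents can automate complex computational tasks, such as DFT simulations (VASP), molecular dynamics (LAMMPS), potential energy surface navigation, and constitutive model generation, to achieve end-to-end management of computational workflows.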
- DREAMS: Density Functional Theory Based Research Engine for Agentic Materials Simulation(Ziqi Wang, Hongshuo Huang, Hancheng Zhao, Changwen Xu, Shang Zhu, Jan Janssen, Venkatasubramanian Viswanathan, 2025, ArXiv)
- GENIUS: An Agentic AI Framework for Autonomous Design and Execution of Simulation Protocols(Mohammad Soleymanibrojeni, Roland Aydin, D. Guedes-Sobrinho, A. C. Dias, M. J. Piotrowski, W. Wenzel, C. R. Rego, 2025, ArXiv)
- An Agentic Framework for Autonomous Materials Computation(Zeyu Xia, Jinzhe Ma, C. Zheng, Shufei Zhang, Yuqiang Li, Hang Su, P. Hu, Chang Zhang, Xingao Gong, Wanli Ouyang, Lei Bai, Dongzhan Zhou, Mao Su, 2025, ArXiv)
- Multi-Agentic AI Framework for End-to-End Atomistic Simulations(Aikaterini Vriza, Uma Kornu, Aditya Koneru, Henry Chan, S. Sankaranarayanan, 2025, Digital Discovery)
- Intelligent navigation of potential energy surfaces: leveraging deep reinforcement learning paradigms for accelerated discovery of stable nickel nanoclusters(Muhammad Umar Farooq, Nazir Ahmed, Muhammad Yasir Muneeb, Fuyi Chen, 2026, Nanoscale)
- Intelligent Sensing and Modeling for Molecular Sequence Prediction Based on Transformer Architecture(Kangjie Li, Jiahang Han, Bo Pang, Dongqing Liu, Xu Guo, Nan Zhang, 2025, 2025 37th Chinese Control and Decision Conference (CCDC))
- VASPilot: MCP-facilitated multi-agent intelligence for autonomous VASP simulations(Jiaxuan Liu (刘家轩), Tiannian Zhu (朱天念), Caiyuan Ye (叶财渊), Z. Fang (方), H. Weng (翁), Q. Wu (吴), 2025, Chinese Physics B)
- Automating modeling in mechanics: LLMs as designers of physics-constrained neural networks for constitutive modeling of materials(Marius Tacke, Matthias Busch, Kian P. Abdolazizi, J. Eichinger, Kevin Linka, Christian J. Cyron, Roland C. Aydin, 2025, ArXiv)
- Towards Fully Automated Molecular Simulations: Multi-Agent Framework for Simulation Setup and Force Field Extraction(Marko D. Petković, V. Menkovski, S. Calero, 2025, ArXiv)
- Research on Condensed Matter Property Prediction Based on Large Language Model(Peng Li, 2025, 2025 6th International Conference on Computer Vision, Image and Deep Learning (CVIDL))
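Self-Driving Laboratories (SDLs) and Closed-Loop Integration with Physical Experiments
These works address the deep integration of agents with robotic hardware and automated instruments, yielding physically automated systems that span the full workflow from experimental design and automated synthesis through in situ characterization to closed-loop optimization.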
- Adaptive AI decision interface for autonomous electronic material discovery(Yahao Dai, Henry Chan, Aikaterini Vriza, Fredrick Kim, Yunfei Wang, Wei Liu, Naisong Shan, Jing Xu, Max Weires, Yukun Wu, Zhiqiang Cao, C. Suzanne Miller, Ralu Divan, Xiaodan Gu, Chenhui Zhu, Sihong Wang, Jie Xu, 2025, ArXiv Preprint)
- Autonomous Materials Exploration by Integrating Automated Phase Identification and AI-Assisted Human Reasoning(Ming-Chiang Chang, Maximilian Amsler, Duncan R. Sutherland, Sebastian Ament, Katie R. Gann, Lan Zhou, Louisa M. Smieska, Arthur R. Woll, John M. Gregoire, Carla P. Gomes, R. Bruce van Dover, Michael O. Thompson, 2026, ArXiv Preprint)
- AURORA - An Automatic Robotic Platform for Materials Discovery(Bing Lei, Per H. Svensson, Pavel Yushmanov, L. Kloo, 2025, ACS Applied Materials & Interfaces)
- Toward Autonomous Battery Materials Research: Advances and Challenges in Automated Synthesis Experimentation(Yan Zeng, 2025, ECS Meeting Abstracts)
- An Autonomous Laboratory for High-Throughput Air-Sensitive Material Discovery for All-Solid-State Batteries(Yuxing Fei, Bernardus Rendy, Junhee Woo, Xiaochen Yang, Chang Li, Yan Zeng, Gerbrand Ceder, 2025, ECS Meeting Abstracts)
- Author Correction: An autonomous laboratory for the accelerated synthesis of inorganic materials(Nathan J. Szymanski, Bernardus Rendy, Yuxing Fei, Rishi E. Kumar, T. He, David Milsted, Matthew J. McDermott, Max C. Gallant, E. D. Cubuk, Amil Merchant, Haegyeom Kim, Anubhav Jain, Christopher J. Bartel, K. Persson, Yan Zeng, Gerbrand Ceder, 2026, Nature)
- Dara: Automated Multiple-Hypothesis Phase Identification and Refinement from Powder X‑ray Diffraction(Yuxing Fei, Matthew J. McDermott, Christopher L. Rom, Shilong Wang, Gerbrand Ceder, 2025, Chemistry of Materials)
- Agentic AI for Accelerated Materials Characterization in Synchrotron Science(Ting-Rui Li, Jizhou Li, 2025, Photon Science)
- Flexible Autonomous Platforms for Accelerating Material Discovery in Battery R&D(Hui Li, A. Huang, Xiaoping Jiang, 2025, ECS Meeting Abstracts)
- Toward Self-Driving Laboratory 2.0 for Chemistry and Materials Discovery(Heeseung Lee, H. Yoo, Hye Su Jang, Byeongho Park, Yang Jeong Park, S. Han, 2026, Materials Horizons)
- AP-Lab: An AI-Driven Autonomous Pilot-Scale Platform Bridging Materials Discovery and Industrial Manufacturing(Zhanlong Wang, Zhifen Ma, Wenxing Song, Guolai Jiang, Boshi Jiang, Min Jiang, Weiliang Shu, Bing Wang, Zhiyuan Wan, Shengyong Geng, Zhen Zhao, Wenhua Zhou, Xuefeng Yu, 2026, Advanced Science)
- Agentic Synthesis: Autonomous Recipe Discovery via a System of AI Agents and Robotics: Electroplating Ni Thin Film on Cu(Lisa Lu, S. Guha, 2025, ECS Meeting Abstracts)
- FastCat: Autonomous Discovery of Multielement Layered Double Hydroxide Alloy Catalysts for Alkaline Oxygen Evolution Reaction(Nis Fisker‐Bødker, E. Moretti, J. Chang, T. Vegge, 2025, Advanced Intelligent Discovery)
- Self-driving thin film laboratory: autonomous epitaxial atomic-layer synthesis via real-time computer vision analysis of electron diffraction(Haotong Liang, Yunlong Sun, Ryan Paxson, Chih-Yu Lee, Alex T. Hall, Zoey Warecki, John Cumings, Hideomi Koinuma, Aaron Gilad Kusne, Mikk Lippmaa, Ichiro Takeuchi, 2026, ArXiv Preprint)
- Evaluating large language model agents for automation of atomic force microscopy(Indrajeet Mandal, J. Soni, Mohd Zaki, M. Smedskjaer, K. Wondraczek, Lothar Wondraczek, N. Gosvami, N. M. A. Krishnan, 2025, Nature Communications)
- Quantum Kernel Machine Learning for Autonomous Materials Science(Felix Adams, Daiwei Zhu, D. Steuerman, A. Kusne, Ichiro Takeuchi, 2026, ArXiv)
- An AI-native experimental laboratory for autonomous biomolecular engineering(Mingyu Wu, Zhaoguo Wang, Jiabin Wang, Zhiyuan Dong, Jingkai Yang, Qingting Li, Tianyu Huang, Lei Zhao, Mingqiang Li, Fei Wang, Chunhai Fan, Haibo Chen, 2025, ArXiv Preprint)
- UniLabOS: An AI-Native Operating System for Autonomous Laboratories(Jing Gao, Junhan Chang, Haohui Que, Yanfei Xiong, Shixiang Zhang, Xianwei Qi, Zhen Liu, Jun-Jie Wang, Qianjun Ding, Xinyu Li, Ziwei Pan, Qiming Xie, Zhuang Yan, Junchi Yan, Linfeng Zhang, 2025, ArXiv Preprint)
- Toward an Autonomous Robotic Battery Materials Research Platform Powered by Automated Workflow and Ontologized Findable, Accessible, Interoperable, and Reusable Data Management(Enea Svaluto‐Ferro, Graham Kimbell, YeonJu Kim, Nukorn Plainpan, Benjamin Kunz, Lina Scholz, Raphael Läubli, Maximilian Becker, David Reber, Peter Kraus, R. Kühnel, Corsin Battaglia, 2025, Batteries & Supercaps)
- (Invited) Self-Driving Experimentation and Hypothesis Generation for Energy Storage Materials Discovery and Development(Wei Wang, Tianzhixi Yin, Ruozhu Feng, Yangang Liang, Heather Job, Peiyuan Gao, Jie Bao, 2025, ECS Meeting Abstracts)
- Autonomous machine vision meets 2D materials science(Miranda L. Vinay, 2025, Nature Reviews Electrical Engineering)
- Towards self-driving/autonomous material discovery lab(Deva Priyakumar, 2025, Journal of Chemical Sciences)
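Applications to Specific Material Systems and Model Optimization Techniques
These works demonstrate the practical effectiveness of agents in specific domains such as alloys, battery electrolytes, polymers, and MOFs, and cover LLM fine-tuning, compression, and reinforcement-learning optimization techniques for materials science tasks.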
- System of Agentic AI for the Discovery of Metal-Organic Frameworks(Théo Jaffrelot Inizan, Sherry Yang, Aaron D. Kaplan, Yen-hsu Lin, Jian Yin, Saber Mirzaei, Mona Abdelgaid, Ali H. Alawadhi, KwangHwan Cho, Zhiling Zheng, E. D. Cubuk, C. Borgs, J. Chayes, Kristin A. Persson, O. Yaghi, 2025, ArXiv)
- Expert-Guided LLM Reasoning for Battery Discovery: From AI-Driven Hypothesis to Synthesis and Characterization(Shengchao Liu, Hannan Xu, Yan Ai, Huanxin Li, Yoshua Bengio, Harry Guo, 2025, ArXiv Preprint)
- Autonomous Photovoltaic Materials Discovery and Photoinduced Halide Dynamics in Lead Halide Perovskites(Udo Bach, 2025, Proceedings of the Asia-Pacific International Conference on Perovskite, Organic Photovoltaics and Optoelectronics)
- AI-agent–enhanced knowledge graphs and memory-augmented models as a new paradigm for intelligent sintering systems(Pouria Dianati Souha, 2025, Synthesis and Sintering)
- Penrose Tiled Low-Rank Compression and Section-Wise Q&A Fine-Tuning: A General Framework for Domain-Specific Large Language Model Adaptation(Chuan-Wei Kuo, Siyu Chen, Chenqian Yan, Yu Liu, 2025, ArXiv)
- Retrospex: Language Agent Meets Offline Reinforcement Learning Critic(Yufei Xiang, Yiqun Shen, Yeqin Zhang, Cam-Tu Nguyen, 2025, ArXiv Preprint)
- How do language models learn facts? Dynamics, curricula and hallucinations(Nicolas Zucchet, Jörg Bornschein, Stephanie Chan, Andrew Lampinen, Razvan Pascanu, Soham De, 2025, ArXiv Preprint)
- A fine-tuned large language model based molecular dynamics agent for code generation to obtain material thermodynamic parameters(Zhuo-Fan Shi, Chunxiao Xin, Tong Huo, Yun-Tao Jiang, Bowen Wu, Xing Chen, Wei Qin, Xinjian Ma, Gang Huang, Zhenyu Wang, Xiang Jing, 2025, Scientific Reports)
- Rapid and automated alloy design with graph neural network-powered large language model-driven multi-agent AI(Alireza Ghafarollahi, Markus J. Buehler, 2025, MRS Bulletin)
- Autonomous Discovery of Functional Random Heteropolymer Blends through Evolutionary Formulation Optimization(Guangqi Wu, Tianying Jin, Alfredo Alexander-Katz, Connor W. Coley, 2025, Matter)
- Multi‐Agent‐Network‐Based Idea Generator for Zinc‐Ion Battery Electrolyte Discovery: A Case Study on Zinc Tetrafluoroborate Hydrate‐Based Deep Eutectic Electrolytes(M. Robson, Shengjun Xu, Zilong Wang, Qing Chen, Francesco Ciucci, 2025, Advanced Materials)
- TopoMAS: Large Language Model Driven Topological Materials Multiagent System(Baohua Zhang, Xin Li, Huangchao Xu, Zhong Jin, Quansheng Wu, Ce Li, 2025, ArXiv)
- Unraveling the Complexity of Divalent Hydride Electrolytes in Solid-State Batteries via a Data-Driven Framework with Large Language Model(Qian Wang, Fangling Yang, Yuhang Wang, Di Zhang, Ryuhei Sato, Linda Zhang, Eric Jianfeng Cheng, Yigang Yan, Yungui Chen, K. Kisu, S. Orimo, Hao Li, 2025, Angewandte Chemie International Edition)
- Generative Artificial Intelligence Extracts Structure-Function Relationships from Plants for New Materials(Rachel K. Luu, J. Deng, Mohammed Shahrudin Bin Ibrahim, Nam‐Joon Cho, Ming Dao, Subra Suresh, Markus J. Buehler, 2025, ArXiv)
- Accelerated inorganic materials design with generative AI agents(Izumi Takahara, Teruyasu Mizoguchi, Bang Liu, 2025, Cell Reports Physical Science)
- Autonomous Inorganic Materials Discovery via Multi-Agent Physics-Aware Scientific Reasoning(Alireza Ghafarollahi, Markus J. Buehler, 2025, ArXiv Preprint)
- aLLoyM: a large language model for alloy phase diagram prediction(Yuna Oikawa, G. Deffrennes, Taichi Abe, Ryo Tamura, Koji Tsuda, 2025, npj Computational Materials)
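Benchmarking, System Reliability, and Research Infrastructure
This group focuses on establishing standardized evaluation frameworks (e.g., MatTools, MADE) and discusses provenance tracking of agent behavior, metadata standardization, and human-in-the-loop (HILT) collaboration.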
- Benchmarking the Discovery Engine(Jack Foxabbott, Arush Tagade, Andrew Cusick, Robbie McCorkell, Leo McKee-Reid, Jugal Patel, Jamie Rumbelow, Jessica Rumbelow, Zohreh Shams, 2025, ArXiv Preprint)
- Large Language Models Reasoning Abilities Under Non-Ideal Conditions After RL-Fine-Tuning(Chang Tian, Matthew B. Blaschko, Mingzhe Xing, Xiuxing Li, Yinliang Yue, Marie-Francine Moens, 2025, ArXiv Preprint)
- MatTools: Benchmarking Large Language Models for Materials Science Tools(Siyu Liu, Bo Hu, Beilin Ye, Jiamin Xu, David J. Srolovitz, Tongqi Wen, 2025, ArXiv Preprint)
- MADE: Benchmark Environments for Closed-Loop Materials Discovery(Shreshth A. Malik, T. Doherty, P. Tigas, Muhammed Razzak, Stephen J. Roberts, Aron Walsh, Y. Gal, 2026, ArXiv)
- QMBench: A Research Level Benchmark for Quantum Materials Research(Yanzhen Wang, Yiyang Jiang, Diana Golovanova, Kamal Das, Hyeonhu Bae, Yufei Zhao, Huu-Thong Le, Abhinava Chatterjee, Yunzhe Liu, Chao-Xing Liu, F. Jornada, Binghai Yan, Xiao-Liang Qi, 2025, ArXiv)
- Looking Forward: Challenges and Opportunities in Agentic AI Reliability(Liudong Xing, Janet Lin, 2025, ArXiv Preprint)
- PROV-AGENT: Unified Provenance for Tracking AI Agent Interactions in Agentic Workflows(Renan Souza, Amal Gueroudji, Stephen DeWitt, Daniel Rosendo, Tirthankar Ghosal, Robert Ross, Prasanna Balaprakash, Rafael Ferreira da Silva, 2025, ArXiv Preprint)
- Human-in-the-Loop and AI: Crowdsourcing Metadata Vocabulary for Materials Science(Jane Greenberg, Scott McClellan, Addy Ireland, Robert Sammarco, Colton Gerber, Christopher B. Rauch, Mat Kelly, J. Kunze, Yuan An, E. Toberer, 2025, ArXiv)
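This report synthesizes progress on agents for materials science research and shows the field moving from single LLM-based assistant tools toward a highly autonomous "AI scientist" paradigm. The final grouping covers the full chain from macro-level strategic roadmaps to micro-level technical implementation: literature mining and multimodal representation at the knowledge layer, architecture design and hypothesis generation at the logic layer, simulation automation at the computation layer, and self-driving laboratory integration at the physical layer. Meanwhile, research focus is shifting toward system reliability, standardized benchmarking, and deep closed-loop discovery in specific material systems (e.g., batteries and alloys), with the aim of building autonomous research infrastructure that is trustworthy, efficient, and capable of physically grounded reasoning.
149 related publications in total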
Large Language Models (LLMs) are reshaping many aspects of materials science and chemistry research, enabling advances in molecular property prediction, materials design, scientific automation, knowledge extraction, and more. Recent developments demonstrate that the latest class of models are able to integrate structured and unstructured data, assist in hypothesis generation, and streamline research workflows. To explore the frontier of LLM capabilities across the research lifecycle, we review applications of LLMs through 34 total projects developed during the second annual Large Language Model Hackathon for Applications in Materials Science and Chemistry, a global hybrid event. These projects spanned seven key research areas: (1) molecular and material property prediction, (2) molecular and material design, (3) automation and novel interfaces, (4) scientific communication and education, (5) research data management and automation, (6) hypothesis generation and evaluation, and (7) knowledge extraction and reasoning from the scientific literature. Collectively, these applications illustrate how LLMs serve as versatile predictive models, platforms for rapid prototyping of domain-specific tools, and much more. In particular, improvements in both open source and proprietary LLM performance through the addition of reasoning, additional training data, and new techniques have expanded effectiveness, particularly in low-data environments and interdisciplinary research. As LLMs continue to improve, their integration into scientific workflows presents both new opportunities and new challenges, requiring ongoing exploration, continued refinement, and further research to address reliability, interpretability, and reproducibility.
Large language models (LLMs) are reshaping many aspects of materials science and chemistry research, enabling advances in molecular property prediction, materials design, scientific automation, knowledge extraction, and more. Recent developments demonstrate that the latest class of models are able to integrate structured and unstructured data, assist in hypothesis generation, and streamline research workflows. To explore the frontier of LLM capabilities across the research lifecycle, we review applications of LLMs through 32 total projects developed during the second annual LLM hackathon for applications in materials science and chemistry, a global hybrid event. These projects spanned seven key research areas: (1) molecular and material property prediction, (2) molecular and material design, (3) automation and novel interfaces, (4) scientific communication and education, (5) research data management and automation, (6) hypothesis generation and evaluation, and (7) knowledge extraction and reasoning from the scientific literature. Collectively, these applications illustrate how LLMs serve as versatile predictive models, platforms for rapid prototyping of domain-specific tools, and much more. In particular, improvements in both open source and proprietary LLM performance through the addition of reasoning, additional training data, and new techniques have expanded effectiveness, particularly in low-data environments and interdisciplinary research. As LLMs continue to improve, their integration into scientific workflows presents both new opportunities and new challenges, requiring ongoing exploration, continued refinement, and further research to address reliability, interpretability, and reproducibility.
We introduce QMBench, a comprehensive benchmark designed to evaluate the capability of large language model agents in quantum materials research. This specialized benchmark assesses the model's ability to apply condensed matter physics knowledge and computational techniques such as density functional theory to solve research problems in quantum materials science. QMBench encompasses different domains of quantum materials research, including structural properties, electronic properties, thermodynamic and other properties, symmetry principles, and computational methodologies. By providing a standardized evaluation framework, QMBench aims to accelerate the development of an AI scientist capable of making creative contributions to quantum materials research. We expect QMBench to be developed and constantly improved by the research community.
Foundation models (FMs) are catalyzing a transformative shift in materials science (MatSci) by enabling scalable, general-purpose, and multimodal AI systems for scientific discovery. Unlike traditional machine learning models, which are typically narrow in scope and require task-specific engineering, FMs offer cross-domain generalization and exhibit emergent capabilities. Their versatility is especially well-suited to materials science, where research challenges span diverse data types and scales. This survey provides a comprehensive overview of foundation models, agentic systems, datasets, and computational tools supporting this growing field. We introduce a task-driven taxonomy encompassing six broad application areas: data extraction, interpretation and Q&A; atomistic simulation; property prediction; materials structure, design and discovery; process planning, discovery, and optimization; and multiscale modeling. We discuss recent advances in both unimodal and multimodal FMs, as well as emerging large language model (LLM) agents. Furthermore, we review standardized datasets, open-source tools, and autonomous experimental platforms that collectively fuel the development and integration of FMs into research workflows. We assess the early successes of foundation models and identify persistent limitations, including challenges in generalizability, interpretability, data imbalance, safety concerns, and limited multimodal fusion. Finally, we articulate future research directions centered on scalable pretraining, continual learning, data governance, and trustworthiness.
Despite the surge of AI in energy materials research, fully autonomous workflows that connect high-precision experimental knowledge to the discovery of credible new energy-related materials remain at an early stage. Here, we develop the Descriptive Interpretation of Visual Expression (DIVE) multi-agent workflow, which systematically reads and organizes experimental data from graphical elements in scientific literature. Applied to solid-state hydrogen storage materials—a class of materials central to future clean-energy technologies—DIVE markedly improves the accuracy and coverage of data extraction compared to the direct extraction method, with gains of 10–15% over commercial models and over 30% relative to open-source models. Building on a curated database of over 30 000 entries from >4000 publications, we establish a rapid inverse-design AI workflow capable of proposing new materials within minutes. This transferable, end-to-end paradigm illustrates how multimodal AI agents can convert literature-embedded scientific knowledge into actionable innovation, offering a scalable pathway for accelerated discovery across chemistry and materials science.
Designing inorganic crystalline materials with tailored properties is critical to technological innovation, yet current generative computational methods often struggle to efficiently explore desired targets with sufficient interpretability. Here, we present MatAgent, a generative approach for inorganic materials discovery that harnesses the powerful reasoning capabilities of large language models (LLMs). By combining a diffusion-based generative model for crystal structure estimation with a predictive model for property evaluation, MatAgent uses iterative, feedback-driven guidance to steer material exploration precisely toward user-defined targets. Integrated with external cognitive tools (short-term memory, long-term memory, the periodic table, and a comprehensive materials knowledge base), MatAgent emulates human expert reasoning to vastly expand the accessible compositional space. Our results demonstrate that MatAgent robustly directs exploration toward desired properties while consistently achieving high compositional validity, uniqueness, and material novelty. This framework thus provides a highly interpretable, practical, and versatile AI-driven solution to accelerate the discovery and design of next-generation inorganic materials.
Fragmented knowledge and slow experimental iteration constrain the discovery of energy materials. We trace the evolution of artificial intelligence (AI) in materials science, from large language models as knowledge assistants to autonomous agents that can reason, plan, and use tools. We introduce a two-path framework to analyze this evolution, distinguishing architectural innovation (agent collaboration) from cognitive innovation (learning and representation). This framework synthesizes recent progress in AI-driven discovery, design, and automation. By examining challenges in reliability, interpretability, and physical grounding, we outline a roadmap toward physics-informed, human-AI systems for autonomous scientific discovery.
No abstract available
The concept of a digital materials ecosystem represents a new paradigm in materials research, where data, theory, and automation are integrated into a unified and iterative framework. By combining reliable databases, physical frameworks, and intelligent data analysis, materials discovery is evolving from empirical exploration toward a systematic and predictive science. The rapid growth of data and artificial intelligence (AI) has enabled the identification of complex structure-property relationships, while advances in automated synthesis and high-throughput characterization are closing the loop between prediction and validation. Looking forward, the field must focus on building trustworthy and benchmarked datasets, developing interpretable and high-precision models, and designing AI tools that embody human scientific reasoning. Equally important is ensuring standardization and consistency between digital inputs and experimental responses. Together, these efforts will transform materials discovery from data accumulation into genuine knowledge generation, paving the way for an autonomous and self-improving research ecosystem that accelerates both fundamental understanding and technological innovation.
Self-driving laboratories that couple robotics with AI agents for "closed-loop" experiments involving robot-driven material synthesis, in-line measurement, and decision-making without human intervention may change how materials are discovered and optimized. They hold the promise of automated materials development that can iterate significantly faster than manual workflows while reducing bias and human error. In this work we have explored the playbook for automated experiment–measure–decide loops in materials science using the technologically important example of electrochemical deposition of nickel. Nickel electroplating itself is a mature, industrially relevant process with well-documented chemistries and processes. Industrial Ni electroplating dates back to at least 1866 and had an estimated global market size of 3.9 billion dollars in 2024. We have therefore chosen the electroplating of Ni on Cu substrates as an important model system for our proof-of-concept studies for benchmarking autonomous material synthesis discovery before tackling more complex chemistries. We present an autonomous experimental setup that discovers and refines electroplating recipes for depositing smooth Ni thin films on Cu substrates. The experiment demonstrates that AI-in-the-loop experimentation can efficiently navigate parameter spaces and uncover process windows for optimizing material performance without human intervention. Also, importantly, we show that hardware and software costs for such agentic synthesis have fallen to the point where they hold the promise of wide deployment and scalability. A benchtop robot executes a four-step closed loop: (1) film plating; (2) in-line measurement of roughness, coverage, and deposition rate; (3) AI feedback; and (4) plating under updated conditions.
Decision-making is fully delegated to a large-language-model (LLM) ChatGPT-4o agent that proposes experimental settings and adapts them based on measured outcomes of previous experimental rounds to enable autonomous exploration. A photograph of the experiment setup is shown in the figure below. The controllable variables are plating current density, bath temperature, electrolyte dilution (water-to-solution ratio), and plating time in seconds. The original plating solution follows a Watts-type formulation: 24% nickel sulfate hexahydrate, 5% nickel chloride hexahydrate, 5% boric acid, 1.4% softener, 0.03% brightener, 0.3% wetting agent, and 64.37% water. The optimization target is multi-objective: maximize film smoothness and areal coverage while achieving a specified thickness. Thickness (and therefore deposition rate) is quantified by measuring the weight of the Ni-plated Cu substrate before and after plating using a microbalance. Surface roughness is proxied by measuring the spot size of a laser beam reflected from the Ni surface, which is imaged using a camera. Larger spots indicate higher microroughness. Film coverage is estimated by microscope imaging. All of the mechanical operations are carried out by a robot arm, and all hardware is controlled via a Raspberry Pi. Analysis of the reflected laser spot and microscopic image is done by sending the images and appropriate text prompts to ChatGPT-4o used as a vision-language model. The LLM agent receives these metrics and textual guidance and returns the next set of parameters (current density in A/cm², dilution, temperature in °C, time in seconds) in a machine-parsable format to drive the next robot loop. Operational safety is enforced by hard-coded limits on maximum plating voltage, current density, solution concentration, and solution temperature. We report autonomous experiments starting from an initial prompt to the LLM that describes the experimental objective (100 µm target thickness; smoothest, fully covered films).
Though the LLM initially guessed sub-optimal deposition parameters, it consistently converged within 4 to 6 iterations to the optimal deposition parameters: (a) keeping the original Watts solution (no or nearly no dilution); (b) a plating current density of 0.1 A/cm² (for comparison, the Nickel plating handbook recommends roughly 0.02 to 0.07 A/cm²); and (c) a solution temperature of 35 °C (the highest value within the safety limit we set manually). Our results illustrate how co-designed AI-agentic and robotic systems can accelerate the discovery of material synthesis recipes. This work may serve as a starting point toward autonomous optimization of more complex electrodeposition systems and electrochemical manufacturing workflows. (Figure 1: photograph of the experimental setup.)
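The safety and measurement steps described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The JSON reply format, the key names, and every limit value except the 35 °C temperature ceiling are assumptions made for illustration; the abstract only says the reply is "machine-parsable" and that hard-coded limits exist. The thickness helper assumes a uniform, fully dense Ni film plated on a known area.

```python
import json

NI_DENSITY_G_CM3 = 8.908  # bulk nickel density near room temperature

# Hypothetical hard limits (min, max). Only the 35 °C ceiling comes from the abstract.
LIMITS = {
    "current_density_A_cm2": (0.0, 0.2),
    "dilution_ratio": (0.0, 1.0),
    "temperature_C": (20.0, 35.0),
    "time_s": (1.0, 3600.0),
}


def parse_and_clamp(reply: str) -> dict:
    """Parse the agent's JSON reply and clamp each parameter to its hard limit."""
    proposed = json.loads(reply)
    safe = {}
    for key, (low, high) in LIMITS.items():
        value = float(proposed[key])  # raises KeyError if a parameter is missing
        safe[key] = min(max(value, low), high)
    return safe


def thickness_um(mass_before_g: float, mass_after_g: float, area_cm2: float) -> float:
    """Estimate film thickness in micrometres from the substrate's mass gain."""
    thickness_cm = (mass_after_g - mass_before_g) / (NI_DENSITY_G_CM3 * area_cm2)
    return thickness_cm * 1e4  # 1 cm = 10,000 µm


if __name__ == "__main__":
    reply = '{"current_density_A_cm2": 0.5, "dilution_ratio": 0.0, "temperature_C": 60, "time_s": 600}'
    print(parse_and_clamp(reply))
    print(thickness_um(1.0, 1.08908, 1.0))
```

Clamping in code that the LLM cannot modify means an out-of-range proposal is silently corrected before it reaches the hardware, which is one simple way to realize the "hard-coded limits" the abstract mentions.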
Artificial intelligence is reshaping scientific discovery, yet its use in materials research remains limited by fragmented computational ecosystems, reproducibility challenges, and dependence on commercial large language models (LLMs). Here we introduce AGAPI (AtomGPT.org API), an open-access agentic AI platform that integrates more than eight open-source LLMs with over twenty materials-science API endpoints, unifying databases, simulation tools, and machine-learning models through a common orchestration framework. AGAPI employs an Agent-Planner-Executor-Summarizer architecture that autonomously constructs and executes multi-step workflows spanning materials data retrieval, graph neural network property prediction, machine-learning force-field optimization, tight-binding calculations, diffraction analysis, and inverse design. We demonstrate AGAPI through end-to-end workflows, including heterostructure construction, powder X-ray diffraction analysis, and semiconductor defect engineering requiring up to ten sequential operations. In addition, we evaluate AGAPI using 30+ example prompts as test cases and compare agentic predictions with and without tool access against experimental data. With more than 1,000 active users, AGAPI provides a scalable and transparent foundation for reproducible, AI-accelerated materials discovery. AGAPI-Agents codebase is available at https://github.com/atomgptlab/agapi.
No abstract available
The history of science is punctuated by serendipitous discoveries, where unexpected observations, rather than targeted hypotheses, opened new fields of inquiry. While modern autonomous laboratories excel at accelerating hypothesis testing, their optimization for efficiency risks overlooking these crucial, unplanned findings. To address this gap, we introduce SciLink, an open-source, multi-agent artificial intelligence framework designed to operationalize serendipity in materials research by creating a direct, automated link between experimental observation, novelty assessment, and theoretical simulations. The framework employs a hybrid AI strategy where specialized machine learning models perform quantitative analysis of experimental data, while large language models handle higher-level reasoning. These agents autonomously convert raw data from materials characterization techniques into falsifiable scientific claims, which are then quantitatively scored for novelty against the published literature. We demonstrate the framework's versatility across diverse research scenarios, showcasing its application to atomic-resolution and hyperspectral data, its capacity to integrate real-time human expert guidance, and its ability to close the research loop by proposing targeted follow-up experiments. By systematically analyzing all observations and contextualizing them, SciLink provides a practical framework for AI-driven materials research that not only enhances efficiency but also actively cultivates an environment ripe for serendipitous discoveries, thereby bridging the gap between automated experimentation and open-ended scientific exploration.
Artificial intelligence (AI) agents, leveraging capabilities in natural language understanding, multimodal knowledge fusion, and tool invocation, are driving materials science toward a new, agent-driven stage. This article systematically reviews the progress of AI agents in materials science. It highlights their core innovations in material knowledge processing, structure design, and property calculation, which significantly accelerate the materials design process. Furthermore, the article analyzes the impact of agents on experiments, where they promote the automation of material synthesis and characterization. The integration of these capabilities is driving the development of self-driving laboratories, moving the field towards end-to-end autonomous materials creation. By providing a comprehensive overview of this rapidly developing field, this review aims to clarify the deep integration of AI agents with materials science, thereby accelerating the realization of on-demand material design.
Artificial intelligence is reshaping scientific exploration, but most methods automate procedural tasks without engaging in scientific reasoning, limiting autonomy in discovery. We introduce Materials Agents for Simulation and Theory in Electronic-structure Reasoning (MASTER), an active learning framework where large language models autonomously design, execute, and interpret atomistic simulations. In MASTER, a multimodal system translates natural language into density functional theory workflows, while higher-level reasoning agents guide discovery through a hierarchy of strategies, including a single agent baseline and three multi-agent approaches: peer review, triage-ranking, and triage-forms. Across two chemical applications, CO adsorption on Cu-surface transition metal (M) adatoms and on M-N-C catalysts, reasoning-driven exploration reduces required atomistic simulations by up to 90% relative to trial-and-error selection. Reasoning trajectories reveal chemically grounded decisions that cannot be explained by stochastic sampling or semantic bias. Altogether, multi-agent collaboration accelerates materials discovery and marks a new paradigm for autonomous scientific exploration.
Existing benchmarks for computational materials discovery primarily evaluate static predictive tasks or isolated computational sub-tasks. While valuable, these evaluations neglect the inherently iterative and adaptive nature of scientific discovery. We introduce MAterials Discovery Environments (MADE), a novel framework for benchmarking end-to-end autonomous materials discovery pipelines. MADE simulates closed-loop discovery campaigns in which an agent or algorithm proposes, evaluates, and refines candidate materials under a constrained oracle budget, capturing the sequential and resource-limited nature of real discovery workflows. We formalize discovery as a search for thermodynamically stable compounds relative to a given convex hull, and evaluate efficacy and efficiency via comparison to baseline algorithms. The framework is flexible; users can compose discovery agents from interchangeable components such as generative models, filters, and planners, enabling the study of arbitrary workflows ranging from fixed pipelines to fully agentic systems with tool use and adaptive decision making. We demonstrate this by conducting systematic experiments across a family of systems, enabling ablation of components in discovery pipelines, and comparison of how methods scale with system complexity.
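The stability criterion and oracle budget described in this abstract can be sketched for the simplest case, a binary system. The sketch is not the MADE implementation. The function names, the fixed proposal order, and the toy energies are hypothetical; the reference hull is held fixed rather than updated as new compounds are found.

```python
def lower_hull(points):
    """Lower convex hull of (composition, formation energy) points, left to right."""
    best = {}
    for x, e in points:  # keep only the lowest energy at each composition
        best[x] = min(e, best.get(x, e))
    hull = []
    for p in sorted(best.items()):
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross <= 0:  # hull[-1] lies on or above the chord, so drop it
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def hull_energy(hull, x):
    """Linearly interpolate the hull energy at composition x."""
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= x <= x2:
            return y1 + (y2 - y1) * (x - x1) / (x2 - x1)
    raise ValueError("composition outside the hull range")


def run_campaign(known, candidates, oracle, budget, tol=1e-9):
    """Spend at most `budget` oracle calls; return candidates on or below the hull."""
    hull = lower_hull(known)
    found = []
    for x in candidates[:budget]:  # naive agent: propose in the given order
        energy = oracle(x)
        if energy - hull_energy(hull, x) <= tol:
            found.append((x, energy))
    return found


if __name__ == "__main__":
    known = [(0.0, 0.0), (0.5, -1.0), (1.0, 0.0)]
    hidden = {0.25: -0.7, 0.75: -0.3, 0.4: -0.9}  # toy "ground truth" energies
    print(run_campaign(known, [0.25, 0.75, 0.4], hidden.__getitem__, budget=2))
```

A smarter agent would differ only in how it orders or generates `candidates`, so one discovery loop can be used to compare strategies under the same budget.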
Automated characterization of porous materials has the potential to accelerate materials discovery, but it remains limited by the complexity of simulation setup and force field selection. We propose a multi-agent framework in which LLM-based agents can autonomously understand a characterization task, plan appropriate simulations, assemble relevant force fields, execute them and interpret their results to guide subsequent steps. As a first step toward this vision, we present a multi-agent system for literature-informed force field extraction and automated RASPA simulation setup. Initial evaluations demonstrate high correctness and reproducibility, highlighting this approach's potential to enable fully autonomous, scalable materials characterization.
Materials discovery and design are essential for advancing technology across various industries by enabling the development of application-specific materials. Recent research has leveraged Large Language Models (LLMs) to accelerate this process. We explore the potential of LLMs to generate viable hypotheses that, once validated, can expedite materials discovery. Collaborating with materials science experts, we curated a novel dataset from recent journal publications, featuring real-world goals, constraints, and methods for designing real-world applications. Using this dataset, we test LLM-based agents that generate hypotheses for achieving given goals under specific constraints. To assess the relevance and quality of these hypotheses, we propose a novel scalable evaluation metric that emulates the process a materials scientist would use to evaluate a hypothesis critically. Our curated dataset, proposed method, and evaluation framework aim to advance future research in accelerating materials discovery and design with LLMs.
We aim to design language agents with greater autonomy for crystal materials discovery. While most existing studies restrict the agents to performing specific tasks within predefined workflows, we aim to automate workflow planning given high-level goals and scientist intuition. To this end, we propose Materials Agent unifying Planning, Physics, and Scientists, known as MAPPS. MAPPS consists of a Workflow Planner, a Tool Code Generator, and a Scientific Mediator. The Workflow Planner uses large language models (LLMs) to generate structured and multi-step workflows. The Tool Code Generator synthesizes executable Python code for various tasks, including invoking a force field foundation model that encodes physics. The Scientific Mediator coordinates communications, facilitates scientist feedback, and ensures robustness through error reflection and recovery. By unifying planning, physics, and scientists, MAPPS enables flexible and reliable materials discovery with greater autonomy, achieving a five-fold improvement in stability, uniqueness, and novelty rates compared with prior generative models when evaluated on the MP-20 data. We provide extensive experiments across diverse tasks to show that MAPPS is a promising framework for autonomous materials discovery.
No abstract available
No abstract available
No abstract available
Artificial intelligence (AI) has accelerated materials discovery, yet its translation to industrial manufacturing remains limited due to two critical gaps: the scarcity of proprietary industrial datasets and the absence of application-oriented benchmarks. To address these challenges, we develop the AP-Lab, an AI-Driven Autonomous Pilot-Scale Laboratory workstation designed to bridge research and manufacturing. Using magnetic nanoparticles (MNPs) for viral nucleic acids (NAs) extraction as a case study, the AP-Lab integrates four agent-controlled systems for user interaction, optimization scheme generation, autonomous synthesis and testing, and data management. By leveraging localized industrial datasets and adopting Polymerase Chain Reaction (PCR) cycle threshold (Ct) values as an application-specific benchmark, the AP-Lab achieves rapid optimization of MNPs-based NAs extraction products at pilot-scale corresponding to 50,000 tests per batch within three weeks, and enables scale-up manufacturing of 1 million tests per batch in two months. Compared to conventional manual workflows, the AP-Lab reduces development timelines from four to six months to three weeks while delivering performance superior to leading commercial products. This work demonstrates a scalable strategy for AI-driven pilot-scale production and offers a blueprint for accelerating industrial adoption of advanced materials.
While developing new polymers typically requires years of investigation, blending existing polymers offers a cost-effective strategy to create new materials. However, developing functional polymer blends is often a slow and challenging process due to their vast design space, the non-additive nature of polymer properties, and limited fundamental understanding to guide the optimization. Here, we report an autonomous platform that addresses these challenges by integrating high-throughput blending, real-time data acquisition, and an evolutionary algorithm for composition optimization. This approach enables rapid exploration of complex combinatorial blending spaces of random heteropolymers (RHPs). With enzyme thermal stability as a model objective, this system discovered random heteropolymer blends (RHPBs) that outperform all constituents. Retrospective analysis reveals segment-level interactions correlated with the performance. This work highlights the opportunity for materials discovery within the RHP and RHPB space and the immense potential of leveraging autonomous discovery platforms to accelerate the discovery of polymers with emergent properties.
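A minimal Python sketch of evolutionary composition optimization, the kind of loop this abstract describes, is given here. It is not the authors' algorithm. The elitist selection scheme, the Gaussian mutation, all parameter values, and the toy fitness function are assumptions. On the real platform the fitness would be a measured quantity such as retained enzyme activity, not a formula.

```python
import random


def normalize(fractions):
    """Rescale non-negative values so the blend fractions sum to 1."""
    total = sum(fractions)
    return [f / total for f in fractions]


def mutate(parent, rng, scale=0.1):
    """Perturb each fraction with Gaussian noise, keep it positive, renormalize."""
    return normalize([max(f + rng.gauss(0.0, scale), 1e-6) for f in parent])


def evolve(fitness, n_components, generations=20, pop_size=16, n_elite=4, seed=0):
    """Elitist evolutionary search over blend compositions."""
    rng = random.Random(seed)
    population = [
        normalize([rng.random() + 1e-6 for _ in range(n_components)])
        for _ in range(pop_size)
    ]
    history = []  # best fitness seen in each generation
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        elite = ranked[:n_elite]
        history.append(fitness(elite[0]))
        children = [mutate(rng.choice(elite), rng) for _ in range(pop_size - n_elite)]
        population = elite + children  # elites survive unchanged
    return max(population, key=fitness), history


def toy_fitness(blend):
    """Hypothetical non-additive objective: components 0 and 1 act synergistically."""
    return 4.0 * blend[0] * blend[1] - blend[2] ** 2


if __name__ == "__main__":
    best, history = evolve(toy_fitness, n_components=3)
    print([round(f, 3) for f in best], round(toy_fitness(best), 3))
```

Because the elites are carried over unchanged, the best fitness never decreases from one generation to the next, a useful property when each evaluation is a physical experiment.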
We present a long-horizon, hierarchical deep research (DR) agent designed for complex materials and device discovery problems that exceed the scope of existing Machine Learning (ML) surrogates and closed-source commercial agents. Our framework instantiates a locally deployable DR instance that integrates local retrieval-augmented generation with large language model reasoners, enhanced by a Deep Tree of Research (DToR) mechanism that adaptively expands and prunes research branches to maximize coverage, depth, and coherence. We systematically evaluate across 27 nanomaterials/device topics using a large language model (LLM)-as-judge rubric with five web-enabled state-of-the-art models as jurors. In addition, we conduct dry-lab validations on five representative tasks, where human experts use domain simulations (e.g., density functional theory, DFT) to verify whether DR-agent proposals are actionable. Results show that our DR agent produces reports with quality comparable to, and often exceeding, those of commercial systems (ChatGPT-5-thinking/o3/o4-mini-high Deep Research) at a substantially lower cost, while enabling on-prem integration with local data and tools.
The search for efficient, durable, and scalable catalysts for the oxygen evolution reaction (OER) is hampered by the slow innovation cycles for advanced materials. Here, we present FastCat, an artificial intelligence (AI)-orchestrated self-driving closed-loop materials discovery system for the autonomous discovery of platinum group metal free multimetal hydroxides for alkaline water electrolysis. With FastCat, we have synthesized, characterized, and tested more than 500 Ni-based multielement layered double hydroxide OER catalysts. In one of the largest AI-orchestrated catalyst discovery campaigns to date, our metaheuristic Bayesian optimization identified known high-performance compositions and several novel multielement Ni–Fe–Cr–Co alloys with unprecedented overpotentials as low as 231 mV at 20 mA cm⁻² and compositional ratios approaching at higher current densities. FastCat can synthesize and electrochemically test up to 75 material compositions daily without human interaction. It can optimize the composition by analyzing trends in the test results, validating their reproducibility, and testing durability. This study includes measurements of >1000 samples, making it one of the largest autonomous experimental electrocatalysis datasets to date, and the findings showcase how autonomous materials discovery can make complex, multielement compositions accessible for research and drastically accelerate the development of novel materials.
No abstract available
No abstract available
Halide materials are promising superionic conductors for all-solid-state batteries (ASSBs) but require moisture-free synthesis, limiting throughput due to manual operation in gloveboxes or dry rooms. An automated robotic system for handling air-sensitive materials in an inert atmosphere would greatly accelerate discovery. Here, we present a fully automated lab for high-throughput solid-state synthesis of halide conductors and other air-sensitive materials. Compact and modular, it operates entirely within an N₂-protected double-station glovebox. The system includes five automated workstations for solid-state synthesis, with two six-axis robot arms seamlessly handling and transferring samples. By eliminating manual operation in confined spaces, it enhances efficiency and increases throughput. Many halide ionic conductors rely on rare-earth metals or mechanochemical treatments, raising costs and complicating scale-up. Developing compositions synthesizable via conventional heating with earth-abundant precursors is essential. Using our automated platform, we will systematically explore aliovalent substitution in close-packed halides, mapping synthesis accessibility by analyzing polymorph energetics, dopant effects, and heating conditions. This approach enables cost-effective, scalable ionic conductors, advancing solid-state battery technology.
The convergence of laboratory automation, artificial intelligence (AI), and data-driven science has catalyzed the emergence of self-driving laboratories (SDLs), autonomous platforms capable of designing, executing, and analyzing experiments with minimal...
The urgent need for renewable energy solutions requires rapid advancements in materials discovery. In response, we present AURORA, an innovative robotic platform that enhances this process by integrating automated synthesis, characterization, and evaluation into a single unit, thereby improving efficiency and reducing errors. Its modular design allows for adaptable screening of diverse materials, including metal halide perovskites, and their application in solar cell devices. Our study demonstrates the ability of AURORA to autonomously synthesize and evaluate polycrystalline, mixed halide perovskites, including a novel mesoscopic solar cell array with improved data reliability and throughput. AURORA also conducts postsynthesis treatments and dynamic analyses under stress, setting it apart from traditional methods. These features make AURORA a transformative tool for the discovery of novel materials, with potential machine learning integration for optimization. Our results highlight the application of AURORA as a robust and adaptable platform for future developments in automated materials research.
Automating experimental workflows with high flexibility is critical to constructing a self-driving laboratory (SDL) for accelerating materials discovery. Autonomous experiment platforms generate highly repeatable and reliable data pipelines and realize a closed-loop workflow by pairing with AI models. Multiple representative modules, such as precise solid and liquid dispensing, uniform mixing, pellet pressing and sintering, as well as electrolyte filling and cell assembly, can be combined with intelligent robots to realize autonomous liquid and solid-state electrolyte preparation and cell performance evaluation. This accelerates battery material screening, yields optimized compositions and structures, and improves battery performance and cycling stability.
Large Language Models (LLMs) have emerged as powerful tools for accelerating scientific discovery, yet their static knowledge and hallucination issues hinder autonomous research applications. Recent advances integrate LLMs into agentic frameworks, enabling retrieval, reasoning, and tool use for complex scientific workflows. Here, we present a domain-specialized agent designed for reliable automation of first-principles materials computations. By embedding domain expertise, the agent ensures physically coherent multi-step workflows and consistently selects convergent, well-posed parameters, thereby enabling reliable end-to-end computational execution. A new benchmark of diverse computational tasks demonstrates that our system significantly outperforms standalone LLMs in both accuracy and robustness. This work establishes a verifiable foundation for autonomous computational experimentation and represents a key step toward fully automated scientific discovery.
The discovery of novel battery materials has been accelerated by advanced modeling and machine learning. However, their integration into battery cells remains constrained by the necessity for experimental validation. The status of development and validation of the automated robotic battery materials research platform Aurora is presented, enabling rapid testing of scientific hypotheses and validation of physical models. Aurora integrates electrolyte formulation, battery cell assembly, and battery cell cycling into a stepwise automated application‐relevant workflow. The different features of the Aurora platform can be leveraged to design experiments elucidating the impact of cycling parameters, electrode composition and balancing, and electrolyte formulation on battery performance and long‐term cycling stability, using NMC||graphite and LFP||graphite cells with carbonate‐based electrolytes as benchmark battery cell chemistries. A large, structured dataset with ontologized metadata detailing cell assembly and cycling protocols, alongside corresponding time series cycling data for all cells, is provided as open research data. This study establishes Aurora as a powerful research platform for accelerating battery materials research.
The rapid expansion of materials science literature demands scalable and intelligent systems for extracting, structuring, and utilizing scientific knowledge. Traditional manual approaches to inorganic materials database construction are labor-intensive and error-prone. In this study, we propose a novel end-to-end framework that leverages instruction-tuned large language models (LLMs) for automated knowledge extraction and discovery in inorganic materials science. By fine-tuning LLMs on domain-specific corpora (including peer-reviewed articles, patents, and chemical databases), the system accurately extracts structured material-property-synthesis relationships from unstructured text. These records are aligned to a schema and stored in a queryable knowledge graph. Furthermore, we demonstrate inverse design by prompting LLMs to generate candidate materials satisfying user-defined targets (e.g., high thermal conductivity). Evaluations on benchmark synthesis corpora show high accuracy in named entity recognition (F1-score > 95%) and low numerical error for temperature/duration extraction. The resulting database supports effective downstream applications such as Curie temperature prediction (RMSE = 23.6 K, R² = 0.927). This work showcases how LLMs, when combined with schema-aware reasoning and human-in-the-loop curation, can serve as robust tools for accelerating autonomous materials discovery.
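The two regression metrics quoted above for Curie temperature prediction (RMSE and R²) can be computed as in the following minimal sketch. It uses the standard textbook definitions; the function names and the four temperature values are illustrative assumptions and are not taken from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error, in the same unit as the inputs (here K)."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured and predicted Curie temperatures in kelvin.
measured = [300.0, 450.0, 600.0, 750.0]
predicted = [310.0, 440.0, 620.0, 730.0]

print(f"RMSE = {rmse(measured, predicted):.1f} K")
print(f"R^2  = {r2(measured, predicted):.3f}")
```

RMSE carries the unit of the target, so it reads directly as a typical error in kelvin, while R² is dimensionless and measures the fraction of variance explained.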
Large language models (LLMs) are transforming laboratory automation by enabling self-driving laboratories (SDLs) that could accelerate materials research. However, current SDL implementations rely on rigid protocols that fail to capture the adaptability and intuition of expert scientists in dynamic experimental settings. Here, we show that LLM agents can automate atomic force microscopy (AFM) through our Artificially Intelligent Lab Assistant (AILA) framework. Further, we develop AFMBench, a comprehensive evaluation suite challenging LLM agents across the complete scientific workflow from experimental design to results analysis. We find that state-of-the-art LLMs struggle with basic tasks and coordination scenarios. Notably, models excelling at materials science question-answering perform poorly in laboratory settings, showing that domain knowledge does not translate to experimental capabilities. Additionally, we observe that LLM agents can deviate from instructions, a phenomenon referred to as sleepwalking, raising safety alignment concerns for SDL applications. Our ablations reveal that multi-agent frameworks significantly outperform single-agent approaches, though both remain sensitive to minor changes in instruction formatting or prompting. Finally, we evaluate AILA’s effectiveness in increasingly advanced experiments: AFM calibration, feature detection, mechanical property measurement, graphene layer counting, and indenter detection. These findings establish the necessity for benchmarking and robust safety protocols before deploying LLM agents as autonomous laboratory assistants across scientific disciplines. LLM agents could revolutionize laboratory automation, but their capabilities remain poorly tested. Here, the authors create a framework automating atomic force microscopy with LLMs and benchmark them through an end-to-end evaluation suite, revealing major limitations and safety concerns.
Sintering processes play a critical role in materials manufacturing; however, their optimization remains highly dependent on empirical knowledge, fragmented datasets, and costly experimental trials. Existing modeling and machine learning approaches often lack a unified structure for representing complex relationships among processing parameters, microstructural evolution, and final material properties. This perspective article argues that knowledge graphs can serve as a missing semantic layer for organizing sintering-related data, enabling structured representation of process–property relationships across heterogeneous databases. Furthermore, the integration of autonomous AI agents equipped with memory-augmented learning models is proposed as a promising direction for continuously constructing, updating, and reasoning over such knowledge graphs. By combining structured knowledge representation with adaptive learning and agent-based optimization, this framework has the potential to transform sintering research into a self-improving, data-driven ecosystem. This perspective highlights future research directions toward intelligent, explainable, and autonomous sintering systems for advanced materials engineering.
Micro- and nanomanipulation technology has found important applications in the fields of chemistry, materials, biology, and medicine. However, traditional manual techniques, constrained by the small size and fragile nature of target samples, often lack accuracy, efficiency, and throughput. Microfluidics has become a promising tool for handling micro- and nanoscale samples, addressing the above limitations. In particular, a growing number of innovations at the intersection of robotics and microfluidics have been proposed, showcasing the incredible potential in synergizing robotics and microfluidics technologies to develop fully automated and accurate systems for versatile micro- and nanomanipulation. In this Perspective, we discuss the ongoing research and development of robotics-enhanced microfluidics for micro- and nanomanipulation. We outline the key roles of major robotics technologies such as sensing, control, and artificial intelligence (AI) in microfluidic manipulation. We also propose the future directions of AI agents in microfluidic manipulation, aiming to achieve intelligent decision-making and execution across different manipulation tasks.
Intelligent Sensing and Modeling for Molecular Sequence Prediction Based on Transformer Architecture
This study addresses a critical challenge in accurately predicting molecular sequences, essential for drug design and materials science applications. Traditional methods often struggle to effectively capture the complexities inherent in molecular structures, making the prediction task both significant and difficult. In response, we propose an intelligent sensing and modeling approach utilizing Transformer architectures. By leveraging the Simplified Molecular Input Line Entry System (SMILES) as a sequential molecular representation, we develop a Transformer-based autoregressive model capable of generating novel molecular compounds, predicting molecular properties, and optimizing chemical structures. Our model incorporates advanced techniques such as Byte Pair Encoding (BPE), rotary positional encodings, and grouped query attention (GQA), enabling it to capture hierarchical dependencies accurately and efficiently. Experimental results demonstrate the superiority of our Transformer-based method compared to traditional RNN-based models, especially in handling data sparsity and computational complexity. Furthermore, comprehensive evaluations on validation and test datasets confirm the model's ability to generate chemically valid, structurally diverse molecules, highlighting its transformative impact on drug discovery and molecular informatics.
Spectroscopy techniques, such as X-ray absorption spectroscopy (XAS), Raman spectroscopy, and X-ray photoelectron spectroscopy (XPS), are powerful tools for structural characterization and compositional analysis of materials. During spectroscopic analysis, experimental spectra are often evaluated by comparing them to previously measured or computed reference spectra. A comprehensive spectroscopy database can reduce the probability of misinterpretation and facilitate the analysis of spectra for new materials. To construct a large-scale spectroscopy database across various chemical systems and structures, we propose a multimodal learning approach to unlock the vast amount of experimental spectroscopy data embedded in materials science literature. By developing an agentic workflow that orchestrates different machine learning models, including optical character recognition (OCR), object detection, instance segmentation, and textual information retrieval, we are able to automatically convert unstructured spectroscopy data from scientific figures and text into structured numerical data and metadata that can serve as references in new spectroscopy experiments. To demonstrate the capability of our workflow, we have extracted and digitized hundreds of XAS curves relevant to battery materials. This growing database will accelerate the discovery of advanced materials by allowing for rapid determination of their crystallography, elemental composition, ionic oxidation states, and local atomic environments. Furthermore, it lays the foundation for developing high-efficiency machine learning algorithms that enable real-time spectra matching and autonomous materials characterization.
No abstract available
The functional properties of nanoclusters are dictated by their atomic-scale structures; however, the efficient discovery of global energy minima on complex high-dimensional potential energy surfaces remains a formidable challenge in computational materials design. Traditional global optimization algorithms often struggle with slow convergence and a tendency to become trapped in local minima. Here, we present a deep reinforcement learning framework called Deepcluster, which employs an agent that autonomously navigates the intricate potential energy landscape to identify the most stable structures of nanoclusters. Our approach combines the decision-making of reinforcement learning (RL) with deep learning, using an actor-critic network guided by the Trust Region Policy Optimization (TRPO) algorithm to balance the exploration of new configurations against the exploitation of low-energy regions. Unlike supervised methods that rely on static datasets, our on-policy Deepcluster agent autonomously explores the configuration space through trial and error, trained in real time from an initial random structure generated by the Birmingham parallel genetic algorithm. By forgoing the need for a predefined structural dataset, the agent learns to optimize configurations dynamically. This is achieved through a comprehensive state embedding that incorporates atom-centered symmetry functions (ACSFs) alongside energy, forces, and structural flags. A deep neural actor network, based on multi-layer perceptrons, then proposes optimal atomic displacements. Resulting configurations are subsequently optimized using the Effective Medium Theory (EMT) potential in conjunction with the BFGS algorithm, enabling the Deepcluster agent to efficiently navigate the energy landscape without dependence on a vast training set of pre-optimized structures.
We demonstrated the power and generality of this framework by discovering the global minima (GM) structures of a series of nickel nanoclusters (Ni10, Ni13, Ni20, Ni38). The structural and thermodynamic stability of the structures identified by the Deepcluster framework is validated by first-principles calculations, including strongly negative binding energies and thermal stability from ab initio molecular dynamics simulations at 300 K. Crucially, the identified global minima show exact agreement with independent genetic algorithm searches, providing compelling cross-methodological validation. The Deepcluster agent establishes a robust, scalable, and efficient paradigm that transcends the limitations of traditional approaches, paving the way for the accelerated discovery of complex functional nanomaterials for catalysis and energy applications.
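The Deepcluster abstract mentions atom-centered symmetry functions (ACSFs) as part of the state embedding but gives no functional form or parameters. As a rough illustration only, the sketch below evaluates the widely used radial (G2-type) symmetry function with a cosine cutoff; the choice of this particular form, the function names, and the values of `eta`, `r_s`, and `r_c` are all assumptions, not details from the paper.

```python
import math


def cosine_cutoff(r, r_c):
    """Smooth cutoff: 1 at r = 0, falling to 0 at r = r_c and beyond."""
    if r >= r_c:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / r_c) + 1.0)


def radial_acsf(positions, i, eta, r_s, r_c):
    """Radial symmetry function for atom i.

    G_i = sum over j != i of exp(-eta * (r_ij - r_s)^2) * f_c(r_ij)
    The value depends only on interatomic distances, so it is unchanged
    by translating, rotating, or reordering the neighbouring atoms.
    """
    total = 0.0
    for j, pos_j in enumerate(positions):
        if j == i:
            continue
        r_ij = math.dist(positions[i], pos_j)
        total += math.exp(-eta * (r_ij - r_s) ** 2) * cosine_cutoff(r_ij, r_c)
    return total


# Hypothetical three-atom cluster (coordinates in angstrom).
cluster = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0)]

# One descriptor value per atom for a single (eta, r_s, r_c) choice.
descriptors = [radial_acsf(cluster, i, eta=0.5, r_s=2.0, r_c=4.0)
               for i in range(len(cluster))]
print(descriptors)
```

In practice several `(eta, r_s)` pairs, plus angular functions, are evaluated per atom and concatenated into a fixed-length vector, which is what makes such descriptors usable as network input for clusters of any orientation.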
Microfluidics is inherently interdisciplinary, spanning material science, physics, chemistry, biomedical science, mechanical engineering, and computer science. However, researchers often approach challenges from isolated disciplinary perspectives, leading to high information entropy and the formation of data silos. To address this, we propose microfluidic informatics, a novel research paradigm aimed at systematically integrating multidisciplinary knowledge using informatics methodologies. From a design science perspective, this paradigm leverages data-driven principles to manage and interpret complex, multi-source microfluidic data. We introduce a universal information representation model: MicrofluidicInfo = {I, F, S, D, O, DF, DA, MR, UM}, constructed using machine learning approaches such as dimensionality reduction, clustering, classification, and regression. This model enables intuitive and standardized representation of information within each hierarchical unit and their interconnections. Microfluidic informatics provides a digital-intelligent framework for advancing microfluidic mechanism studies and accelerating translational applications.
No abstract available
With the fast-paced development of artificial intelligence (AI), large language models (LLMs) have attracted the most attention and are being applied in a wide range of scenarios, including programming and health care. Research has shown that LLMs perform well in programming but may currently lag in other fields. It is reasonable to expect rapid near-term advances in LLM “reasoning” capability with fewer computing resources. In scientific research, many farsighted psychologists and social science researchers have proposed modes in which LLMs participate in the behavior patterns of human scientists. At some level, research itself can be considered a social activity of the scientific community, which forges a shared understanding of a natural phenomenon by explaining it through a particular discipline. Several research papers have attempted to use LLMs or other generative models in materials discovery; however, the reliability and limitations of the approach have not been thoroughly discussed from the perspective of domain scientists.
Large language models (LLMs) have reshaped the research landscape by enabling new approaches to knowledge retrieval and creative ideation. Yet their application in discipline-specific experimental science, particularly in highly multi-disciplinary domains like materials science, remains limited. We present a first-of-its-kind framework that integrates generative AI with literature from hitherto-unconnected fields such as plant science, biomimetics, and materials engineering to extract insights and design experiments for materials. We focus on humidity-responsive systems such as pollen-based materials and Rhapis excelsa (broadleaf lady palm) leaves, which exhibit self-actuation and adaptive performance. Using a suite of AI tools, including a fine-tuned model (BioinspiredLLM), Retrieval-Augmented Generation (RAG), agentic systems, and a Hierarchical Sampling strategy, we extract structure-property relationships and translate them into new classes of bioinspired materials. Structured inference protocols generate and evaluate hundreds of hypotheses from a single query, surfacing novel and experimentally tractable ideas. We validate our approach through real-world implementation: LLM-generated procedures, materials designs, and mechanical predictions were tested in the laboratory, culminating in the fabrication of a novel pollen-based adhesive with tunable morphology and measured shear strength, establishing a foundation for future plant-derived adhesive design. This work demonstrates how AI-assisted ideation can drive real-world materials design and enable effective human-AI collaboration.
The Copilot for Real-world Experimental Scientist (CRESt) system empowers researchers to control autonomous laboratories through conversational AI, providing a seamless interface for managing complex experimental workflows. We have enhanced CRESt by integrating a multi-agent collaboration mechanism that utilizes the complementary strengths of the ChatGPT and Gemini models for precise image analysis in materials science. This innovative approach significantly improves the accuracy of experimental outcomes by fostering structured debates between the AI models, which enhances decision-making processes in materials phase analysis. Additionally, to evaluate the generalizability of this approach, we tested it on a quantitative task of counting particles. Here, the collaboration between the AI models also led to improved results, demonstrating the versatility and robustness of this method. By harnessing this dual-AI framework, this approach stands as a pioneering method for enhancing experimental accuracy and efficiency in materials research, with applications extending beyond CRESt to broader scientific experimentation and analysis.
Metadata vocabularies are essential for advancing FAIR and FARR data principles, but their development is constrained by limited human resources and inconsistent standardization practices. This paper introduces MatSci-YAMZ, a platform that integrates artificial intelligence (AI) and human-in-the-loop (HILT), including crowdsourcing, to support metadata vocabulary development. The paper reports on a proof-of-concept use case evaluating the AI-HILT model in materials science, a highly interdisciplinary domain. Six (6) participants affiliated with the NSF Institute for Data-Driven Dynamical Design (ID4) engaged with the MatSci-YAMZ platform over several weeks, contributing term definitions and providing examples to prompt the refinement of AI-generated definitions. Nineteen (19) AI-generated definitions were successfully created, with iterative feedback loops demonstrating the feasibility of AI-HILT refinement. Findings confirm the feasibility of the AI-HILT model, highlighting 1) a successful proof of concept, 2) alignment with FAIR and open-science principles, 3) a research protocol to guide future studies, and 4) the potential for scalability across domains. Overall, MatSci-YAMZ's underlying model has the capacity to enhance semantic transparency and reduce the time required for consensus building and metadata vocabulary development.
Predictive atomistic simulations have propelled materials discovery, yet routine setup and debugging still demand computer specialists. This know-how gap limits Integrated Computational Materials Engineering (ICME), where state-of-the-art codes exist but remain cumbersome for non-experts. We address this bottleneck with GENIUS, an AI-agentic workflow that fuses a smart Quantum ESPRESSO knowledge graph with a tiered hierarchy of large language models supervised by a finite-state error-recovery machine. Here we show that GENIUS translates free-form human-generated prompts into validated input files that run to completion on $\approx$80% of 295 diverse benchmarks, where 76% are autonomously repaired, with success decaying exponentially to a 7% baseline. Compared with LLM-only baselines, GENIUS halves inference costs and virtually eliminates hallucinations. The framework democratizes electronic-structure DFT simulations by intelligently automating protocol generation, validation, and repair, opening large-scale screening and accelerating ICME design loops across academia and industry worldwide.
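The GENIUS abstract describes a finite-state error-recovery machine supervising input generation but does not spell out its states. The following is a minimal sketch of the general pattern only: the state names, the `run_with_recovery` function, the three callback signatures, and the repair budget are hypothetical, and the stubs stand in for the LLM and the simulation code. `ecutwfc` is a real Quantum ESPRESSO input parameter, but the value used here is purely illustrative.

```python
from enum import Enum, auto


class State(Enum):
    GENERATE = auto()  # draft an input from the user's prompt
    RUN = auto()       # execute the calculation
    REPAIR = auto()    # patch the input using the error message
    DONE = auto()
    FAILED = auto()


def run_with_recovery(generate, run, repair, max_repairs=3):
    """Drive generate -> run -> (repair -> run)* until success or budget exhausted.

    `run` returns None on success or an error string on failure.
    """
    state, inp, error, repairs = State.GENERATE, None, None, 0
    trace = []
    while state not in (State.DONE, State.FAILED):
        trace.append(state)
        if state is State.GENERATE:
            inp = generate()
            state = State.RUN
        elif state is State.RUN:
            error = run(inp)
            state = State.DONE if error is None else State.REPAIR
        elif state is State.REPAIR:
            if repairs >= max_repairs:
                state = State.FAILED
            else:
                inp = repair(inp, error)
                repairs += 1
                state = State.RUN
    trace.append(state)
    return state, inp, trace


# Stub callbacks (assumptions): the first draft omits a required cutoff.
def generate():
    return {"ecutwfc": None}


def run(inp):
    return None if inp["ecutwfc"] is not None else "ecutwfc missing"


def repair(inp, error):
    return {**inp, "ecutwfc": 40.0}


final_state, final_input, trace = run_with_recovery(generate, run, repair)
print(final_state.name, final_input, [s.name for s in trace])
```

Keeping the control flow in an explicit state machine, rather than letting the language model decide when to stop, bounds the number of repair attempts and makes every transition auditable.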
No abstract available
One of the main bottlenecks for the wide adoption of atomistic simulation pipelines for computational materials design is the high complexity of the workflows which many times requires the use...
Autonomous microscopy promises path towards improved materials characterization – and unprecedented advances in physics
No abstract available
The construction of experimental datasets is essential for expanding the scope of data-driven scientific discovery. Recent advances in natural language processing (NLP) have facilitated automatic extraction of structured data from unstructured scientific literature. While existing approaches (multi-step and direct methods) offer valuable capabilities, they also come with limitations when applied independently. Here, we propose a novel hybrid text-mining framework that integrates the advantages of both methods to convert unstructured scientific text into structured data. Our approach first transforms raw text into entity-recognized text, and subsequently into structured form. Furthermore, beyond the overall data structuring framework, we also enhance entity recognition performance by introducing an entity marker, a simple yet effective technique that uses symbolic annotations to highlight target entities. Specifically, our entity marker-based hybrid approach not only consistently outperforms previous entity recognition approaches across three benchmark datasets (MatScholar, SOFC, and SOFC slot NER) but also improves the quality of the final structured data, yielding up to a 58% improvement in entity-level F1 score and up to an 83% improvement in relation-level F1 score compared to the direct approach.
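The entity-marker idea described above, wrapping target entities in symbolic annotations before the structuring step, can be illustrated with a short sketch. The marker strings `[E]` and `[/E]`, the function name, and the example sentence are assumptions chosen for illustration; the abstract does not state which symbols the authors use.

```python
def mark_entities(text, spans, open_mark="[E]", close_mark="[/E]"):
    """Wrap each (start, end) character span of `text` in marker symbols.

    Spans must not overlap; they are processed left to right.
    """
    pieces = []
    cursor = 0
    for start, end in sorted(spans):
        if start < cursor:
            raise ValueError("spans must not overlap")
        pieces.append(text[cursor:start])
        pieces.append(f"{open_mark}{text[start:end]}{close_mark}")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


def span_of(text, phrase):
    """Locate the first occurrence of `phrase` as a (start, end) span."""
    start = text.index(phrase)
    return (start, start + len(phrase))


# Hypothetical sentence with two recognized entities.
sentence = "LiFePO4 was sintered at 700 C."
spans = [span_of(sentence, "LiFePO4"), span_of(sentence, "700 C")]

marked = mark_entities(sentence, spans)
print(marked)
```

The marked sentence is the "entity-recognized text" intermediate: a downstream model sees the original wording plus explicit boundaries, rather than having to rediscover the entities itself.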
No abstract available
No abstract available
Generative models and machine learning promise accelerated material discovery in MOFs for CO2 capture and water harvesting but face significant challenges navigating vast chemical spaces while ensuring synthesizability. Here, we present MOFGen, a system of Agentic AI comprising interconnected agents: a large language model that proposes novel MOF compositions, a diffusion model that generates crystal structures, quantum mechanical agents that optimize and filter candidates, and synthetic-feasibility agents guided by expert rules and machine learning. Trained on all experimentally reported MOFs and computational databases, MOFGen generated hundreds of thousands of novel MOF structures and synthesizable organic linkers. Our methodology was validated through high-throughput experiments and the successful synthesis of five "AI-dreamt" MOFs, representing a major step toward automated synthesizable material discovery.
Material selection is fundamental to the design process, as it significantly affects the cost, performance, appearance, manufacturability, and sustainability of a product. It is a complex, open-ended challenge that forces designers to continuously adapt to new information, balance diverse stakeholder demands, weigh trade-offs, and navigate uncertainties to achieve the optimal outcome. Previous studies have explored the potential of large language models (LLMs) to assist in the material selection process, with findings suggesting that LLMs could provide valuable support. However, discrepancies between LLM outputs and expert recommendations indicate the need for further research. To address the limitations of standalone LLMs, particularly their lack of reasoning and action-execution capabilities, agentic AI has been developed with enhanced functionalities. These agents integrate LLMs with external search tools, allowing them to retrieve and analyze domain-specific information, iteratively refine responses, and improve decision-making alignment with experts. This study compares standalone LLMs and agentic AI frameworks, examining how search-augmented agents can more effectively emulate expert decision-making in material selection. Our findings reveal a nonlinear relationship between model size and performance, with some models demonstrating lower proximity to human survey results and struggling to follow instructions. These insights contribute to a broader understanding of AI integration in design workflows.
Designing proteins de novo with tailored structural, physicochemical, and functional properties remains a grand challenge in biotechnology, medicine, and materials science, due to the vastness of sequence space and the complex coupling between sequence, structure, and function. Current state-of-the-art generative methods, such as protein language models (PLMs) and diffusion-based architectures, often require extensive fine-tuning, task-specific data, or model reconfiguration to support objective-directed design, thereby limiting their flexibility and scalability. To overcome these limitations, we present a decentralized, agent-based framework inspired by swarm intelligence for de novo protein design. In this approach, multiple large language model (LLM) agents operate in parallel, each assigned to a specific residue position. These agents iteratively propose context-aware mutations by integrating design objectives, local neighborhood interactions, and memory and feedback from previous iterations. This position-wise, decentralized coordination enables emergent design of diverse, well-defined sequences without reliance on motif scaffolds or multiple sequence alignments, validated with experiments on proteins with alpha helix and coil structures. Through analyses of residue conservation, structure-based metrics, and sequence convergence and embeddings, we demonstrate that the framework exhibits emergent behaviors and effective navigation of the protein fitness landscape. Our method achieves efficient, objective-directed designs within a few GPU-hours and operates entirely without fine-tuning or specialized training, offering a generalizable and adaptable solution for protein design. Beyond proteins, the approach lays the groundwork for collective LLM-driven design across biomolecular systems and other scientific discovery tasks.
Accurately identifying the synthesis conditions of metal-organic frameworks (MOFs) is essential for guiding experimental design, yet remains challenging because relevant information in the literature is often scattered, inconsistent, and difficult to interpret. We present MOFh6, a large language model driven system that reads raw articles or crystal codes and converts them into standardized synthesis tables. It links related descriptions across paragraphs, unifies ligand abbreviations with full names, and outputs structured parameters ready for use. MOFh6 achieved 99% extraction accuracy, resolved 94.1% of abbreviation cases across five major publishers, and maintained a precision of 0.93 ± 0.01. Processing a full text takes 9.6 s, locating synthesis descriptions 36 s, with 100 papers processed for USD 4.24. By replacing static database lookups with real-time extraction, MOFh6 reshapes MOF synthesis research, accelerating the conversion of literature knowledge into practical synthesis protocols and enabling scalable, data-driven materials discovery.
No abstract available
A multi-agent artificial intelligence (AI) model is developed to automate the discovery of new metallic alloys, integrating multimodal data and external knowledge, including insights from physics via atomistic simulations. The system consists of (a) large language models (LLMs) for tasks such as reasoning and planning, (b) AI agents with distinct roles collaborating dynamically, and (c) a newly developed graph neural network (GNN) model for rapid retrieval of physical properties. We chose the ternary NbMoTa body-centered-cubic alloy as our model system and developed the GNN to predict two fundamental materials properties: the Peierls barrier and the solute/screw dislocation interaction energy. Our GNN model efficiently predicts these properties, reducing reliance on costly brute-force calculations and alleviating the computational demands on the multi-agent system. By combining the predictive capabilities of GNNs with the collaborative intelligence of LLM-driven reasoning agents, the system autonomously explores vast alloy design spaces, identifies trends in atomic-scale properties, and predicts macroscale mechanical strength, as demonstrated by several computational experiments. This synergistic approach accelerates the discovery of advanced alloys and holds promise for broader applications in other complex systems, marking a step forward in automated materials discovery and design. Traditional deep learning models, such as graph neural networks and convolutional neural networks, operate within the confines of their training data sets, making single-step inferences for regression or classification. Our work introduces a multi-agent strategy that transcends these limitations by integrating deep learning with reasoning and decision-making capabilities. This intelligent system actively interprets results, determines subsequent actions, and iteratively refines predictions, accelerating the materials design process. 
We demonstrate its effectiveness in exploring the vast compositional space of a ternary alloy, where the model dynamically solicits data, analyzes trends, generates visualizations, and derives insights into materials behavior. By enabling accurate predictions of key alloy characteristics, our approach advances the discovery of novel metallic systems and underscores the critical role of solid-solution alloying. More broadly, it represents a major step toward integrating artificial intelligence with scientific reasoning, moving closer to artificial general intelligence in engineering. This paradigm shift has profound implications for materials science, enabling more efficient, autonomous, and intelligent exploration of complex materials spaces.
The rapid discovery of materials is constrained by the lack of large, machine-readable datasets that couple performance metrics with structural context. Existing databases are either small, manually curated, or biased toward first-principles results, leaving experimental literature underexploited. We present an agentic, large language model (LLM)-driven workflow that autonomously extracts thermoelectric and structural properties from about 10,000 full-text scientific articles. The pipeline integrates dynamic token allocation, zero-shot multi-agent extraction, and conditional table parsing to balance accuracy against computational cost. Benchmarking on 50 curated papers shows that GPT-4.1 achieves the highest accuracy (F1 = 0.91 for thermoelectric properties and 0.82 for structural fields), while GPT-4.1 Mini delivers nearly comparable performance (F1 = 0.89 and 0.81) at a fraction of the cost, enabling practical large-scale deployment. Applying this workflow, we curated 27,822 temperature-resolved property records with normalized units, spanning figure of merit (ZT), Seebeck coefficient, conductivity, resistivity, power factor, and thermal conductivity, together with structural attributes such as crystal class, space group, and doping strategy. Dataset analysis reproduces known thermoelectric trends, such as the superior performance of alloys over oxides and the advantage of p-type doping, while also surfacing broader structure-property correlations. To facilitate community access, we release an interactive web explorer with semantic filters, numeric queries, and CSV export. This study delivers the largest LLM-curated thermoelectric dataset to date, provides a reproducible and cost-profiled extraction pipeline, and establishes a foundation for scalable, data-driven materials discovery beyond thermoelectrics.
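The F1 scores quoted above combine precision and recall of the extracted records against a curated gold set. A minimal sketch of that metric follows, assuming exact-match comparison of records; the record layout (material, property, temperature) and the placeholder values are illustrative assumptions and are not taken from the paper's benchmark.

```python
def precision_recall_f1(predicted, gold):
    """Return (precision, recall, F1) for two collections of extracted records.

    A record counts as correct only if it matches a gold record exactly
    (an assumption made for this sketch).
    """
    predicted, gold = set(predicted), set(gold)
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical placeholder records: (material, property, temperature in K).
gold = {
    ("mat_A", "ZT", 300),
    ("mat_B", "ZT", 700),
    ("mat_C", "ZT", 900),
    ("mat_B", "seebeck", 700),
}
predicted = {
    ("mat_A", "ZT", 300),
    ("mat_B", "ZT", 700),
    ("mat_C", "ZT", 800),  # wrong temperature, so it counts as a miss
}

precision, recall, f1 = precision_recall_f1(predicted, gold)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Exact matching is the strictest choice; a real evaluation might instead tolerate unit conversions or small numeric differences, which would raise the scores.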
Materials discovery relies on high-throughput, high-fidelity simulation techniques such as Density Functional Theory (DFT), which require years of training, extensive parameter fine-tuning and systematic error handling. To address these challenges, we introduce the DFT-based Research Engine for Agentic Materials Screening (DREAMS), a hierarchical, multi-agent framework for DFT simulation that combines a central Large Language Model (LLM) planner agent with domain-specific LLM agents for atomistic structure generation, systematic DFT convergence testing, High-Performance Computing (HPC) scheduling, and error handling. In addition, a shared canvas helps the LLM agents to structure their discussions, preserve context and prevent hallucination. We validate the capabilities of DREAMS on the Sol27LC lattice-constant benchmark, achieving average errors below 1% compared to the results of human DFT experts. Furthermore, we apply DREAMS to the long-standing CO/Pt(111) adsorption puzzle, demonstrating its long-term and complex problem-solving capabilities. The framework again reproduces expert-level literature adsorption-energy differences. Finally, DREAMS is employed to quantify functional-driven uncertainties with Bayesian ensemble sampling, confirming the Face Centered Cubic (FCC)-site preference at the Generalized Gradient Approximation (GGA) DFT level. In conclusion, DREAMS approaches L3-level automation - autonomous exploration of a defined design space - and significantly reduces the reliance on human expertise and intervention, offering a scalable path toward democratized, high-throughput, high-fidelity computational materials discovery.
Large language model (LLM)-based agentic frameworks increasingly adopt the paradigm of dynamically generating task-specific agents. We suggest that not only agents but also specialized software modules for scientific and engineering tasks can be generated on demand. We demonstrate this concept in the field of solid mechanics. There, so-called constitutive models are required to describe the relationship between mechanical stress and body deformation. Constitutive models are essential for both the scientific understanding and industrial application of materials. However, even recent data-driven methods of constitutive modeling, such as constitutive artificial neural networks (CANNs), still require substantial expert knowledge and human labor. We present a framework in which an LLM generates a CANN on demand, tailored to a given material class and dataset provided by the user. The framework covers LLM-based architecture selection, integration of physical constraints, and complete code generation. Evaluation on three benchmark problems demonstrates that LLM-generated CANNs achieve accuracy comparable to or greater than manually engineered counterparts, while also exhibiting reliable generalization to unseen loading scenarios and extrapolation to large deformations. These findings indicate that LLM-based generation of physics-constrained neural networks can substantially reduce the expertise required for constitutive modeling and represent a step toward practical end-to-end automation.
Advances in generative artificial intelligence are transforming how metal-organic frameworks (MOFs) are designed and discovered. This Perspective introduces the shift from laborious enumeration of MOF candidates to generative approaches that can autonomously propose and synthesize in the laboratory new porous reticular structures on demand. We outline the progress of employing deep learning models, such as variational autoencoders, diffusion models, and large language model-based agents, that are fueled by the growing amount of available data from the MOF community and suggest novel crystalline materials designs. These generative tools can be combined with high-throughput computational screening and even automated experiments to form accelerated, closed-loop discovery pipelines. The result is a new paradigm for reticular chemistry in which AI algorithms more efficiently direct the search for high-performance MOF materials for clean air and energy applications. Finally, we highlight remaining challenges such as synthetic feasibility, dataset diversity, and the need for further integration of domain knowledge.
Agentic systems enable the intelligent use of research tooling, augmenting a researcher's ability to investigate and propose novel solutions to existing problems. Within Additive Manufacturing (AM), alloy selection and evaluation remains a complex challenge, often requiring expertise in the various domains of materials science, thermodynamic simulations, and experimental analysis. Large Language Model (LLM)-enabled agents can facilitate this endeavor by utilizing their extensive knowledge base to dispatch tool calls via Model Context Protocol (MCP) to perform actions such as thermophysical property diagram calculations and lack-of-fusion process map generation. In addition, the multi-agent system can effectively reason through complex user prompts and provide analysis on the lack-of-fusion process window of common alloys such as SS316L and IN718 along with proposed composition variants of known alloys. These agents can dynamically adjust their task trajectory to the outcomes of tool call results, effectively enabling autonomous decision-making in practical environments. This work aims to showcase the benefits of adopting an LLM-enabled multi-agent system to automate and accelerate the task of evaluating proposed additive manufacturing alloys, both novel and known.
Large language models (LLMs) are creating a new paradigm for materials science by transforming textual insights into experimental findings. Leveraging their strengths in natural language understanding, multimodal alignment, and few‐shot reasoning, LLMs already show potential in property prediction, synthesis planning, and uncertainty quantification. This perspective highlights four key roles, Oracle, Surrogate, Quant, and Arbiter, to systematize recent advancements of LLMs in knowledge extraction, property inference, risk assessment, and decision‐making. Experience suggests that true value arises from integrating these capabilities into a verifiable, traceable loop rather than merely scaling model size. However, LLMs still face challenges due to data heterogeneity, limited interpretability, hallucination control, and misalignment with scientific tasks. To address these issues, we propose three forward‐looking directions: developing domain‐adapted foundation models infused with materials science context, establishing a standardized cross‐modal data infrastructure, and incorporating expert feedback alongside robotic automated experimentation into a fully traceable research loop. Through enhanced human–AI collaboration and methodological innovation, LLMs can transform from general‐purpose language tools into scientifically aware partners, advancing materials discovery toward a more efficient, interpretable, and sustainable future.
Topological materials occupy a frontier in condensed-matter physics thanks to their remarkable electronic and quantum properties, yet their cross-scale design remains bottlenecked by inefficient discovery workflows. Here, we introduce TopoMAS (Topological materials Multi-Agent System), an interactive human-AI framework that seamlessly orchestrates the entire materials-discovery pipeline: from user-defined queries and multi-source data retrieval, through theoretical inference and crystal-structure generation, to first-principles validation. Crucially, TopoMAS closes the loop by autonomously integrating computational outcomes into a dynamic knowledge graph, enabling continuous knowledge refinement. In collaboration with human experts, it has already guided the identification of a novel topological phase, SrSbO3, confirmed by first-principles calculations. Comprehensive benchmarks demonstrate robust adaptability across base large language models, with the lightweight Qwen2.5-72B model achieving 94.55% accuracy while consuming only 74.3-78.4% of the tokens required by Qwen3-235B and 83.0% of DeepSeek-V3's usage, and delivering responses twice as fast as Qwen3-235B. This efficiency establishes TopoMAS as an accelerator for computation-driven discovery pipelines. By harmonizing rational agent orchestration with a self-evolving knowledge graph, our framework not only delivers immediate advances in topological materials but also establishes a transferable, extensible paradigm for the materials-science domain.
Deep generative models hold great promise for inverse materials design, yet their efficiency and accuracy remain constrained by data scarcity and model architecture. Here, we introduce AlloyGAN, a closed-loop framework that integrates Large Language Model (LLM)-assisted text mining with Conditional Generative Adversarial Networks (CGANs) to enhance data diversity and improve inverse design. Taking alloy discovery as a case study, AlloyGAN systematically refines material candidates through iterative screening and experimental validation. For metallic glasses, the framework predicts thermodynamic properties with discrepancies of less than 8% from experiments, demonstrating its robustness. By bridging generative AI with domain knowledge and validation workflows, AlloyGAN offers a scalable approach to accelerate the discovery of materials with tailored properties, paving the way for broader applications in materials science.
No abstract available
No abstract available
No abstract available
No abstract available
Solid‐state electrolytes (SSEs) are essential for next‐generation energy storage technologies. However, the exploration of divalent hydrides is hindered by complex ionic migration mechanisms and reliance on “trial‐and‐error” methodologies. Conventional approaches, which focus on individual materials and predefined pathways, remain inefficient. Herein, we present a data‐driven artificial intelligence framework that integrates a comprehensive SSE database with large language models and ab initio metadynamics (MetaD) simulations to accelerate the discovery of hydride SSEs. Our study reveals that hydrides incorporating neutral molecules have great potential, with MetaD revealing novel “two‐step” ion migration mechanisms. Predictive models developed using both experimental and computational data accurately forecast ionic migration activation energies for various types of hydride SSEs. In particular, some SSEs with carbon‐containing neutral molecules exhibit notably low activation energy, with barriers as low as 0.62 eV. This framework enables the rapid identification of optimized SSE candidates and establishes a transformative tool for advancing sustainable energy storage technologies.
In the field of materials science, addressing the complex relationship between the material structure and properties has increasingly involved leveraging the text generation capabilities of AI-generated content (AIGC) models for tasks that include literature mining and data analysis. However, theoretical calculations and code development remain labor-intensive challenges. This paper proposes a novel approach based on text-to-code generation, utilizing large language models to automate the implementation of simulation programs in materials science. The effectiveness of automated code generation and review is validated with thermodynamics simulations based on the LAMMPS software as a foundation. This study introduces Molecular Dynamics Agent (MDAgent), a framework designed to guide large models in automatically generating, executing, and refining simulation code. In addition, a thermodynamic simulation code dataset for LAMMPS was constructed to fine-tune the language model. Expert evaluation scores demonstrate that MDAgent significantly improves the code generation and review capabilities. The proposed approach reduces the average task time by 42.22%, as compared to traditional models, thus highlighting its potential applications in the field of materials science.
With the rapid growth of solid-state battery research, the sheer volume of literature has made it difficult for researchers to use it effectively. This paper introduces a multi-database question-and-answer platform for solid-state batteries based on large language models and the Model Context Protocol (MCP): the Solid-State Battery Question and Answer Platform (SSB-Q&A-Platform), which aims to provide researchers with effective insights into solid-state battery design. Built on the MCP, the SSB-Q&A-Platform encapsulates a solid-state battery SQL database, a vector database, and a knowledge graph database into agents, and realizes the interaction between the large language model and these databases through the Qwen-Agent framework. The database content comes from the solid-state battery literature, so that researchers can accurately obtain solid-state battery-related information while avoiding both the hallucinations of large language models and the information gaps of single-database interaction. Overall, the SSB-Q&A-Platform ensures the credibility of solid-state battery information sources through multi-database interaction. The platform provides highly accurate references, offering researchers reliable solid-state battery design suggestions.
New discoveries in chemistry and materials science, with an increasingly expanding volume of requisite knowledge and experimental workload, provide unique opportunities for machine learning (ML) to take critical roles in accelerating research efficiency. This work demonstrates 1) the use of large language models (LLMs) for automated literature reviews; and 2) the training of an ML model to predict chemical knowledge (thermodynamic parameters). The LLM‐based literature review tool (LMExt) successfully extracted chemical information and beyond into a machine‐readable structure, including stability constants for metal cation–ligand interactions, thermodynamic properties, and other broader data types (medical research papers and financial reports), effectively overcoming the challenges inherent in each domain. Using the autonomous acquisition of thermodynamic data, an ML model is trained using the CatBoost algorithm for accurately predicting thermodynamic parameters (e.g., enthalpy of formation) of minerals. This work highlights the transformative potential of integrated ML approaches to reshape chemistry and materials science research.
No abstract available
This study proposes a condensed matter property prediction framework based on a Large Language Model (LLM), which combines material crystal structure encoding with physical prior knowledge to address the high computational cost and low high-throughput screening efficiency of traditional Density Functional Theory (DFT). The system design includes a multi-scale feature extraction module (atomic-level graph attention plus lattice symmetry encoding) and an adaptive head for predicting physical properties, supporting end-to-end prediction of key physical properties such as elastic modulus and band gap. Experiments show that on the Materials Project dataset, the model's prediction error (MAE) for elastic modulus is 8.7 GPa (32% lower than CGCNN), and the coefficient of determination (R²) for semiconductor band gap is 0.91 (28% higher than SchNet). Key innovations include: 1) a crystal serialization representation method based on the RoBERTa architecture; 2) a loss function with symmetry constraints. In the future, we will explore cross-scale joint prediction of physical properties and closed-loop optimization with experimental data.
Solid-state synthesis is widely used to obtain various inorganic materials, such as battery materials and bulk thermoelectrics. Despite its prevalence, the process remains challenging due to the lack of a general theory and well-understood underlying reaction mechanisms. While prior works have successfully extracted structured datasets from literature, they often neglect product phase purity or yield. In this work, we construct a solid-state synthesis dataset consisting of 80,806 syntheses extracted with a large language model (LLM), including 18,869 reactions with impurity phase(s). Our dataset not only validates expected thermodynamic trends for impurity phase formation but also identifies challenging cases where impurity phases emerge even when the target phase is significantly more stable.
Metal additive manufacturing (AM) involves complex interdependencies among processes, materials, feedstock, and post-processing steps. However, the underlying relationships and domain knowledge remain fragmented across literature and static databases that often require expert-level queries, limiting their applicability in design and planning. To address these limitations, we develop a novel and structured knowledge graph (KG), representing 53 distinct metals and alloys across seven material categories, nine AM processes, four feedstock types, and corresponding post-processing requirements. A large language model (LLM) interface, guided by a few-shot prompting strategy, enables natural language querying without the need for formal query syntax. The system supports a range of tasks, including compatibility evaluation, constraint-based filtering, and design for AM (DfAM) guidance. User queries in natural language are normalized, translated into Cypher, and executed on the KG, with results returned in a structured format. This work introduces the first interactive system that connects a domain-specific metal AM KG with an LLM interface, delivering accessible and explainable decision support for engineers and promoting human-centered tools in manufacturing knowledge systems.
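The normalize-then-translate step described above can be illustrated without an LLM. The sketch below swaps the paper's few-shot LLM translation for a fixed, parameterized Cypher template; the node labels (`Material`, `Process`), the `COMPATIBLE_WITH` relationship, the alias table, and both function names are hypothetical and do not reflect the authors' actual schema.

```python
# Hypothetical alias table used to normalize user wording before querying.
PROCESS_ALIASES = {
    "lpbf": "Laser Powder Bed Fusion",
    "l-pbf": "Laser Powder Bed Fusion",
    "ded": "Directed Energy Deposition",
}


def normalize_process(name):
    """Map a user-supplied process name or abbreviation to a canonical name."""
    cleaned = name.strip()
    return PROCESS_ALIASES.get(cleaned.lower(), cleaned)


def compatibility_query(process_name):
    """Build a parameterized Cypher query listing materials for one AM process.

    Returns (query_text, parameters). Passing the value as the $process
    parameter, rather than formatting it into the string, avoids query injection.
    """
    query = (
        "MATCH (m:Material)-[:COMPATIBLE_WITH]->(p:Process {name: $process}) "
        "RETURN m.name AS material ORDER BY material"
    )
    return query, {"process": normalize_process(process_name)}


query, params = compatibility_query("  LPBF ")
print(query)
print(params)
```

The query text and parameter dictionary would then be handed to a graph database client for execution; that step is omitted here because it depends on the deployment.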
Group III-nitride semiconductors, distinguished by their wide bandgap characteristics, have demonstrated significant advantages in high-performance opto-electronic devices and emerged as a cutting-edge domain in semiconductor materials research. This study aims to construct Qwen-Nitrides, a large language model (LLM)-based expert system for wide-bandgap Group III-nitride semiconductors, designed to enhance technical cognition efficiency in two critical subdomains: AlGaN material systems and deep ultraviolet (DUV) LED devices. Building upon the Qwen2 LLM architecture, we developed a domain knowledge enhancement framework integrated with Low-Rank Adaptation (LoRA) fine-tuning technology. Through an innovative knowledge distillation framework, our approach achieves: 1) A multiphase knowledge acquisition workflow combining automated knowledge extraction-synthesis engines with expert manual calibration protocols, ensuring high-quality fine-tuning data; 2) A domain-adaptive fine-tuning mechanism that effectively improves the Qwen2 model's cognitive precision in semiconductor-specific contexts. Experimental results demonstrate that the optimized Qwen-Nitrides model exhibits substantial accuracy improvements in core knowledge Q&A tasks compared to the baseline model. These results indicate that the study provides an efficient technical route for future research on Group III-nitride semiconductor expert systems.
Large language models (LLMs) are general-purpose tools with wide-ranging applications, including in materials science. In this work, we introduce aLLoyM, a fine-tuned LLM specifically trained on alloy compositions, temperatures, and their corresponding phase information. To develop aLLoyM, we curated question-and-answer (Q&A) pairs for binary and ternary phase diagrams using the open-source Computational Phase Diagram Database (CPDDB) and assessments based on CALPHAD (CALculation of PHAse Diagrams). We fine-tuned Mistral, an open-source pre-trained LLM, for two distinct Q&A formats: multiple-choice and short-answer. Benchmark evaluations demonstrate that fine-tuning substantially enhances performance on multiple-choice phase diagram questions. Moreover, the short-answer model of aLLoyM can generate novel phase diagrams from its components alone, suggesting that it may aid the discovery of new materials systems. To promote further research and adoption, we have publicly released the short-answer fine-tuned version of aLLoyM, along with the complete benchmarking Q&A dataset, on Hugging Face.
Large language models (LLMs) hold great promise for specialized scientific domains such as materials science, yet adapting them efficiently and accurately to domain-specific knowledge remains challenging due to limited data and high knowledge density. We propose a two-stage framework that combines structured model compression with a scientific fine-tuning regimen to address this challenge. In the compression stage, we decompose the LLM's weight matrices into local low-rank "rank blocks" and arrange these blocks in a Penrose-like non-periodic tiling pattern. Each block is then compacted via spectral transformations (e.g., discrete cosine or Fourier transforms), and a Kullback-Leibler (KL) divergence-based alignment loss preserves the distributional similarity between the compressed model's representations and those of the original full model. In the adaptation stage, the compressed model is further tuned using a human-like scientific reading protocol: it processes technical materials science documents section by section, engaging in a structured question-and-answer routine for each section. This section-wise Q&A fine-tuning strategy extracts explicit reasoning traces and gradually injects domain knowledge, while minimizing catastrophic forgetting of the model's general language capabilities. By balancing efficient compression with targeted adaptation, our two-stage approach enables precise specialization of LLMs to high-value domains under data-scarce conditions. We present this principled yet exploratory pipeline and outline its potential for advancing materials science knowledge integration, laying the groundwork for comprehensive empirical evaluation in future work.
In this study, an AI-guided framework is developed for semantic-driven material design, integrating large language models (LLMs) with first-principles methods and crystal structure prediction (MatPC) to identify novel photovoltaic materials. By utilizing prompt-engineered LLMs, semantic embeddings of material property descriptions are leveraged to identify uncommon materials candidates with strong alignment to desired functionalities. The material discovery pipeline combines LLMs, similarity scoring, dimensional reduction, formula screening, crystal structure prediction, and DFT validation into a cohesive computational workflow. The candidates undergo crystal structure prediction to generate polymorphs using a hybrid genetic algorithm-graph neural network (GA-GNN) approach, followed by validation through DFT calculations on atomic and electronic properties, optical absorption, and theoretical power conversion efficiencies. As a case study, an unconventional Bi2WO6 polymorph is identified as a promising photovoltaic material, with its electronic and optical properties thoroughly analyzed via first-principles calculations. Our study presents an efficient material discovery pipeline leveraging large language models (LLMs) to accelerate the material design process.
No abstract available
No abstract available
The field of materials science stands at a critical inflection point. While laboratory innovations continue to emerge at an unprecedented pace, the traditional timeline of 10–20 years from discovery to market has become an unacceptable bottleneck in addressing urgent technological challenges. We argue that self-driving laboratories (SDLs) represent not merely another step in automation, but a fundamental reimagining of the materials development pipeline. By integrating manufacturing constraints and scalability considerations from the earliest stages of discovery, SDLs can collapse the laboratory-to-factory timeline while improving reproducibility and success rates. This requires abandoning the traditional sequential approach of materials screening, device optimization and manufacturing scale-up in favor of concurrent cross-scale development. Here, we critically examine current SDL implementations, challenge prevailing assumptions about automation in materials science, and propose a roadmap for truly integrated materials development platforms that could revolutionize how we translate laboratory discoveries into commercial products.
No abstract available
To close the gap between the rates of computational screening and experimental realization of novel materials, we introduce the A-Lab, an autonomous laboratory for the solid-state synthesis of inorganic powders. This platform uses computations, historical data from the literature, machine learning (ML) and active learning to plan and interpret the outcomes of experiments performed using robotics. Over 17 days of continuous operation, the A-Lab realized 36 compounds from a set of 57 targets including a variety of oxides and phosphates that were identified using large-scale ab initio phase-stability data from the Materials Project and Google DeepMind. Synthesis recipes were proposed by natural-language models trained on the literature and optimized using an active-learning approach grounded in thermodynamics. Analysis of the failed syntheses provides direct and actionable suggestions to improve current techniques for materials screening and synthesis design. The high success rate demonstrates the effectiveness of artificial-intelligence-driven platforms for autonomous materials synthesis and motivates further integration of computations, historical knowledge and robotics.
Autonomous materials science, where active learning is used to navigate large compositional phase space, has emerged as a powerful vehicle to rapidly explore new materials. A crucial aspect of autonomous materials science is exploring new materials using as little data as possible. Gaussian process-based active learning allows effective charting of multi-dimensional parameter space with a limited number of training data, and thus is a common algorithmic choice for autonomous materials science. An integral part of the autonomous workflow is the application of kernel functions for quantifying similarities among measured data points. A recent theoretical breakthrough has shown that quantum kernel models can achieve similar performance with less training data than classical models. This signals the possible advantage of applying quantum kernel machine learning to autonomous materials discovery. In this work, we compare quantum and classical kernels for their utility in sequential phase space navigation for autonomous materials science. Specifically, we compute a quantum kernel and several classical kernels for x-ray diffraction patterns taken from an Fe-Ga-Pd ternary composition spread library. We conduct our study on both IonQ's Aria trapped ion quantum computer hardware and the corresponding classical noisy simulator. We experimentally verify that a quantum kernel model can outperform some classical kernel models. The results highlight the potential of quantum kernel machine learning methods for accelerating materials discovery and suggest complex x-ray diffraction data is a candidate for robust quantum kernel model advantage.
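To make the role of the kernel in Gaussian process-based active learning concrete, the sketch below picks the next composition to measure by maximum posterior variance. It uses a classical RBF kernel on 2D composition coordinates, not the quantum kernel or the x-ray diffraction data from the study; the function names, length scale, noise level, and sample points are all assumptions chosen for illustration.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """Classical RBF (squared-exponential) kernel between the rows of a and b."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


def posterior_variance(x_train, x_cand, noise=1e-6, length_scale=1.0):
    """GP posterior variance at each candidate, given already-measured points.

    The prior variance is k(x, x) = 1 for the RBF kernel, so the result
    lies between 0 and 1 (up to numerical error).
    """
    k_tt = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    k_ct = rbf_kernel(x_cand, x_train, length_scale)
    solved = np.linalg.solve(k_tt, k_ct.T)  # shape: (n_train, n_cand)
    return 1.0 - np.sum(k_ct * solved.T, axis=1)


def next_measurement(x_train, x_cand, **kwargs):
    """Index of the candidate with the largest posterior variance."""
    return int(np.argmax(posterior_variance(x_train, x_cand, **kwargs)))


# Two measured compositions and three unmeasured candidates (made-up coordinates).
measured = np.array([[0.0, 0.0], [1.0, 0.0]])
candidates = np.array([[0.1, 0.0], [0.5, 0.0], [0.5, 2.0]])

variances = posterior_variance(measured, candidates)
chosen = next_measurement(measured, candidates)
print(variances, chosen)
```

The candidate farthest from all measured points is selected because the kernel judges it least similar to existing data. Replacing `rbf_kernel` with a different similarity function changes which points look redundant, which is the comparison the study makes between quantum and classical kernels.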
No abstract available
The International Workshop on Data-Driven Computational and Theoretical Materials Design was held on October 9-13, 2024, in Shanghai, gathering leading scientists and researchers from around the world, representing various aspects of data-driven AI methodologies and applications in materials design. The topics covered across 46 talks and 29 posters spanned a wide range of the latest advancements, including Machine Learning for Materials Design, Method Development, Machine Learning Interatomic Potentials, Advanced Computing, Infrastructure and Standards, Large Language Models, and Autonomous Labs. As part of the workshop, a panel discussion titled “Unlocking the AI Future of Materials Science” was held to disseminate the state of the art of AI/ML in materials science and consider directions for the future. This report is a synthesis, for this Special Issue, of the panel discussion, drawing on insights gained from the workshop as a whole and surrounding conversations, in particular the question of what constitutes success.
Advancements in energy technologies increasingly depend on the development of solid-state materials that serve as high-performance and sustainable electrodes and electrolytes. While computational methods, artificial intelligence, and design strategies can identify promising materials, synthesizing these materials often remains a significant bottleneck due to its inherent complexity and resource-intensive nature. To streamline the design–make–measure loop, an autonomous solid-state synthesis laboratory (named the A-Lab) powered by AI and automation was first established by a team in Berkeley led by Professor Ceder. As one of the initial members of this pioneering project, I will share stories about the engineering innovations, scientific discoveries, and personal experiences during the build of the A-Lab. Through case studies, I will highlight how the A-Lab holds promise for accelerating battery materials research. Additionally, recognizing the importance of complementary solution-based methods, such as precipitation and hydrothermal synthesis, I will discuss recent efforts in my group aimed at automating these processes, from predicting optimal synthesis conditions to automating experimental execution.
No abstract available
The rapid advancement of perovskite solar cells (PSCs) has led to an exponential growth in research publications, creating an urgent need for efficient knowledge management and reasoning systems in this domain. We present a comprehensive knowledge-enhanced system for PSCs that integrates three key components. First, we develop Perovskite-KG, a domain-specific knowledge graph constructed from 1,517 research papers, containing 23,789 entities and 22,272 relationships. Second, we create two complementary datasets: Perovskite-Chat, comprising 55,101 high-quality question-answer pairs generated through a novel multi-agent framework, and Perovskite-Reasoning, containing 2,217 carefully curated materials science problems. Third, we introduce two specialized large language models: Perovskite-Chat-LLM for domain-specific knowledge assistance and Perovskite-Reasoning-LLM for scientific reasoning tasks. Experimental results demonstrate that our system significantly outperforms existing models in both domain-specific knowledge retrieval and scientific reasoning tasks, providing researchers with effective tools for literature review, experimental design, and complex problem-solving in PSC research.
No abstract available
No abstract available
Density-functional-theory (DFT) simulations with the Vienna Ab initio Simulation Package (VASP) are indispensable in computational materials science but often require extensive manual setup, monitoring, and postprocessing. Here, we introduce VASPilot, an open-source platform that fully automates VASP workflows via a multi-agent architecture built on the CrewAI framework and a standardized model context protocol (MCP). VASPilot’s agent suite handles every stage of a VASP study from retrieving crystal structures and generating input files to submitting Slurm jobs, parsing error messages, and dynamically adjusting parameters for seamless restarts. A lightweight Quart-based web interface provides intuitive task submission, real-time progress tracking, and drill-down access to execution logs, structure visualizations, and plots. We validated VASPilot on both routine and advanced benchmarks: automated band-structure and density-of-states calculations (including on-the-fly symmetry corrections), plane-wave cutoff convergence tests, lattice-constant optimizations with various van der Waals corrections, and cross-material band-gap comparisons for transition-metal dichalcogenides. In all cases, VASPilot completed the missions reliably and without manual intervention. Moreover, its modular design allows easy extension to other DFT codes simply by deploying the appropriate MCP server. By offloading technical overhead, VASPilot enables researchers to focus on scientific discovery and accelerates high-throughput computational materials research.
Aqueous deep eutectic electrolytes (DEEs) offer great potential for low‐cost zinc‐ion batteries but often have limited performance. Discovering new electrolytes is therefore crucial, yet time‐consuming and resource‐intensive. In response, this work presents a Large Language Model (LLM)‐based multi‐agent network that proposes DEE compositions for zinc‐ion batteries. By analyzing academic papers from the DEE field, the network identifies innovative, inexpensive, and sustainable Lewis bases to pair with Zn(BF4)2·xH2O. A Zn(BF4)2·xH2O‐ethylene carbonate (EC) system demonstrates high conductivity (10.6 mS cm−1) and a wide electrochemical stability window (2.37 V). The optimized electrolyte enables stable zinc stripping/plating, achieves outstanding rate performance (81 mAh g−1 at 5 A g−1), and supports 4000 cycles in Zn||polyaniline cells at 3 A g−1. Spectroscopic analyses and simulations reveal that EC coordinates to Zn2+, mitigating water‐induced corrosion, while a fluorine‐rich hybrid organic/inorganic solid electrolyte interphase enhances stability. This work showcases a pioneering LLM‐driven approach to electrolyte development, establishing a new paradigm in materials research.
No abstract available
Complex chemical spaces and limited, biased knowledge pose immense challenges for human scientists, and equally for automated materials discovery. Existing intelligent methods rely mainly on numerical computation, leading to inefficient exploration and results that are hard to interpret. To bridge this gap, we introduce a principles-guided material discovery system powered by a language-inferential multi-agent system (MAS), namely PriM. Our framework integrates automated hypothesis generation with experimental validation in a roundtable system of MAS, enabling systematic exploration while maintaining scientific rigor. Based on our framework, a case study on nano helix demonstrates a higher materials exploration rate and property value while providing transparent reasoning pathways. This approach develops an automated-and-transparent paradigm for material discovery, with broad implications for rational design of functional materials. Code is publicly available at https://github.com/amair-lab/PriM.
Metal-organic framework (MOF) databases have grown rapidly through experimental deposition and large-scale literature extraction, but recent analyses show that nearly half of their entries contain substantial structural errors. These inaccuracies propagate through high-throughput screening and machine-learning workflows, limiting the reliability of data-driven MOF discovery. Correcting such errors is exceptionally difficult because true repairs require integrating crystallographic files, synthesis descriptions, and contextual evidence scattered across the literature. Here we introduce LitMOF, a large language model-driven multi-agent framework that validates crystallographic information directly from the original literature and cross-validates it with database entries to repair structural errors. Applying LitMOF to the experimental MOF database (the CSD MOF Subset), we constructed LitMOF-DB, a curated set of 118,464 computation-ready structures, including corrections of 69% (6,161 MOFs) of the invalid MOFs in the latest CoRE MOF database. Additionally, the system uncovered 12,646 experimentally reported MOFs absent from existing resources, substantially expanding the known experimental design space. This work establishes a scalable pathway toward self-correcting scientific databases and a generalizable paradigm for LLM-driven curation in materials science.
Aqueous organic redox-active materials have emerged as promising alternatives to transition metal ions in redox flow batteries for large-scale energy storage, offering advantages such as structural tunability, cost efficiency, wide availability, and safety. However, progress has been restricted to a limited selection of water-soluble organic compounds. This presentation highlights a data-driven approach, featuring the self-driving ARES lab at PNNL and an integrated hypothesis-generation workflow, to accelerate the discovery and development of these materials. Key topics include database curation, structure-property prediction, and automated property characterization and performance testing.
Aqueous organic redox-active materials have emerged as promising alternatives to transition metal ions in redox flow batteries for large-scale energy storage, offering advantages such as structural tunability, cost efficiency, wide availability, and safety. However, progress has been restricted to a limited selection of water-soluble organic compounds. This presentation highlights a data-driven approach, featuring the MIRAL (Materials Innovation through Robotics and AI Lab) at PNNL and an integrated hypothesis-generation workflow, to accelerate the discovery and development of these materials. Key topics include database curation, structure-property prediction, and automated property characterization and performance testing. Reference: Yin T, Feng R, Bao J, Gao P, Liang Y, Heather J, et al. Learning Advance: Robotics-LLM Guided Hypotheses Generation for the Discovery of Chemical Knowledge. ChemRxiv. 2025; doi:10.26434/chemrxiv-2025-n1b4l
Powder X-ray diffraction (XRD) is a foundational technique for characterizing crystalline materials. However, the reliable interpretation of XRD patterns, particularly in multiphase systems, remains a manual and expertise-demanding task. Because XRD provides only structural information, multiple reference phases can often be fit to a single pattern, leading to potential misinterpretation when alternative solutions are overlooked. To reduce human effort and address this challenge, we introduce Dara (data-driven automated Rietveld analysis), a framework designed to automate the robust identification and refinement of multiple phases from powder XRD data. Dara performs an exhaustive tree search over all plausible phase combinations within a given chemical space and validates each hypothesis using the BGMN Rietveld refinement routine. Key features include structural database filtering, automatic clustering of isostructural phases during tree expansion, and peak-matching-based scoring to identify promising phases for refinement. When ambiguity exists, Dara generates multiple hypotheses, which can then be adjudicated by human experts or resolved with further characterization tools. By enhancing the reliability and accuracy of phase identification, Dara enables scalable analysis of realistic complex XRD patterns and provides a foundation for integration into multimodal characterization workflows, moving toward fully self-driving materials discovery.
Materials synthesis remains a critical bottleneck in developing innovations for energy storage, catalysis, electronics, and biomedical devices. Current synthesis design relies heavily on empirical trial-and-error methods guided by expert intuition, limiting the pace of materials discovery. To address this challenge, we present AlchemyBench, a comprehensive benchmark built upon a curated dataset of 17,667 expert-verified synthesis recipes from open-access literature. AlchemyBench provides an end-to-end framework that supports research in large language models (LLMs) applied to materials synthesis prediction. The benchmark encompasses four key tasks: raw materials prediction, equipment prediction, synthesis procedure generation, and characterization outcome forecasting. To enable scalable evaluation, we propose an LLM-as-a-Judge framework that leverages large language models for automated assessment, demonstrating strong agreement with expert evaluations (e.g., Pearson's r = 0.80, Spearman's ρ = 0.78). Our experimental results reveal that reasoning-focused models (Claude 3.7, GPT-4o) achieve scores around 4.0 on well-documented oxide and organic synthesis targets, but performance drops by approximately 0.3 points on electrochemical workflows. Fine-tuning on AlchemyBench data enables a 7B-parameter open-source model to surpass generic baselines trained on 1M samples, while retrieval-augmented generation provides an additional +0.20 improvement when supplied with five high-similarity contexts. AlchemyBench addresses a critical gap in the field by providing the first comprehensive, legally redistributable benchmark for automated materials synthesis prediction. Our contributions establish a foundation for exploring LLM capabilities in predicting and guiding materials synthesis, ultimately accelerating experimental design and innovation in materials science.
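The agreement statistics quoted for the LLM-as-a-Judge framework (Pearson's r and Spearman's ρ) can be computed as in the following minimal sketch. The score lists are invented placeholders, not data from the benchmark, and the function names are hypothetical. Spearman's ρ is implemented here as Pearson's r on average ranks, which is the standard definition when ties are present.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Placeholder scores on a 1-5 scale (invented for illustration only).
expert_scores = [4.0, 3.5, 2.0, 4.5, 3.0]
judge_scores = [3.8, 3.9, 2.5, 4.4, 2.9]
print(round(pearson(expert_scores, judge_scores), 3))
print(round(spearman(expert_scores, judge_scores), 3))
```

Reporting both values is useful because Pearson's r measures linear agreement on the score values themselves, while Spearman's ρ only asks whether the judge orders the recipes the same way the experts do.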
The capacity of Large Language Models (LLMs) to generate valid scientific hypotheses for materials synthesis remains largely unquantified, hindered by the absence of benchmarks that probe physicochemical reasoning. To address this, we introduce MatterMech, a benchmark for evaluating LLM-generated hypotheses across eight nanomaterial synthesis domains. Our analysis reveals a critical disconnect: LLMs are proficient in abstract logic yet fail to ground their reasoning in fundamental physicochemical principles. We demonstrate that our proposed principle-aware prompting methodology substantially outperforms standard Chain-of-Thought, enhancing both hypothesis accuracy and computational efficiency. This work provides a methodological framework to advance LLMs toward reliable scientific hypothesis generation in materials science. The MatterMech benchmark and associated code are publicly available at https://github.com/amair-lab/MatterMech.
No abstract available
To retrieve and compare scientific data from simulations and experiments in materials science, data needs to be easily accessible and machine-readable so that various materials science phenomena can be qualified and quantified. Recent progress in open science has improved data accessibility. However, most information remains encoded within scientific documents, which limits the ability to find suitable literature and material properties. This manuscript showcases an automated workflow that converts the information encoded in scientific literature into a machine-readable data structure of texts, figures, tables, equations, and metadata, using natural language processing together with language and vision transformer models to generate a machine-readable database. The machine-readable database can be enriched with local data, e.g., unpublished or private material data, leading to knowledge synthesis. The study shows that such an automated workflow accelerates information retrieval, proximate context detection, and material property extraction from multi-modal input data, as demonstrated for the research field of microstructural analyses of face-centered cubic single crystals. Ultimately, a Retrieval-Augmented Generation (RAG) based Large Language Model (LLM) enables a fast and efficient question-answering chatbot.
No abstract available
Large language models have been extensively employed for scientific research from different aspects, yet their performance is often limited by gaps in highly specialized knowledge. To bridge this divide, in this perspective we take phosphor materials for white LED applications as a model system and construct a domain-specific knowledge base that couples retrieval-augmented generation with a numerical-querying model context protocol. By automatically extracting and structuring data from more than 5400 publications, including chemical compositions, crystallographic parameters, excitation-emission wavelengths, and synthesis conditions, we construct an artificial-intelligence agent that delivers both broad semantic search and exact parameter lookup, each answer accompanied by verifiable references. This hybrid approach mitigates hallucinations and improves recall and precision in expert-level question-answering. Finally, we outline how linking this curated corpus to lightweight machine-learning models and even automated experimental synthesis facilities can close the loop from target specification to experimental validation, offering a blueprint for accelerated materials discovery.
The chapter discusses the foundational impact of modern generative AI models on information access (IA) systems. In contrast to traditional AI, the large-scale training and superior data modeling of generative AI models enable them to produce high-quality, human-like responses, which brings brand new opportunities for the development of IA paradigms. In this chapter, we identify and introduce two of them in detail, i.e., information generation and information synthesis. Information generation allows AI to create tailored content addressing user needs directly, enhancing user experience with immediate, relevant outputs. Information synthesis leverages the ability of generative AI to integrate and reorganize existing information, providing grounded responses and mitigating issues like model hallucination, which is particularly valuable in scenarios requiring precision and external knowledge. This chapter delves into the foundational aspects of generative models, including architecture, scaling, and training, and discusses their applications in multi-modal scenarios. Additionally, it examines the retrieval-augmented generation paradigm and other methods for corpus modeling and understanding, demonstrating how generative AI can enhance information access systems. It also summarizes potential challenges and fruitful directions for future studies.
Large language models (LLMs) leverage chain-of-thought (CoT) techniques to tackle complex problems, representing a transformative breakthrough in artificial intelligence (AI). However, their reasoning capabilities have primarily been demonstrated in solving math and coding problems, leaving their potential for domain-specific applications, such as battery discovery, largely unexplored. Inspired by the idea that reasoning mirrors a form of guided search, we introduce ChatBattery, a novel agentic framework that integrates domain knowledge to steer LLMs toward more effective reasoning in materials design. Using ChatBattery, we successfully identify, synthesize, and characterize three novel lithium-ion battery cathode materials, which achieve practical capacity improvements of 28.8%, 25.2%, and 18.5%, respectively, over the widely used cathode material, LiNi0.8Mn0.1Co0.1O2 (NMC811). Beyond this discovery, ChatBattery paves a new path by showing a successful LLM-driven and reasoning-based platform for battery materials invention. This complete AI-driven cycle, from design to synthesis to characterization, demonstrates the transformative potential of AI-driven reasoning in revolutionizing materials discovery.
The convergence of artificial intelligence and materials science presents a transformative opportunity, but achieving true acceleration in discovery requires moving beyond task-isolated, fine-tuned models toward agentic systems that plan, act, and learn across the full discovery loop. This survey advances a unique pipeline-centric view that spans from corpus curation and pretraining, through domain adaptation and instruction tuning, to goal-conditioned agents interfacing with simulation and experimental platforms. Unlike prior reviews, we treat the entire process as an end-to-end system to be optimized for tangible discovery outcomes rather than proxy benchmarks. This perspective allows us to trace how upstream design choices, such as data curation and training objectives, can be aligned with downstream experimental success through effective credit assignment. To bridge communities and establish a shared frame of reference, we first present an integrated lens that aligns terminology, evaluation, and workflow stages across AI and materials science. We then analyze the field through two focused lenses: From the AI perspective, the survey details LLM strengths in pattern recognition, predictive analytics, and natural language processing for literature mining, materials characterization, and property prediction; from the materials science perspective, it highlights applications in materials design, process optimization, and the acceleration of computational workflows via integration with external tools (e.g., DFT, robotic labs). Finally, we contrast passive, reactive approaches with agentic design, cataloging current contributions while motivating systems that pursue long-horizon goals with autonomy, memory, and tool use. This survey charts a practical roadmap towards autonomous, safety-aware LLM agents aimed at discovering novel and useful materials.
This community paper developed out of the NSF Workshop on the Future of Artificial Intelligence (AI) and the Mathematical and Physical Sciences (MPS), which was held in March 2025 with the goal of understanding how the MPS domains (Astronomy, Chemistry, Materials Research, Mathematical Sciences, and Physics) can best capitalize on, and contribute to, the future of AI. We present here a summary and snapshot of the MPS community's perspective, as of Spring/Summer 2025, in a rapidly developing field. The link between AI and MPS is becoming increasingly inextricable; now is a crucial moment to strengthen the link between AI and Science by pursuing a strategy that proactively and thoughtfully leverages the potential of AI for scientific discovery and optimizes opportunities to impact the development of AI by applying concepts from fundamental science. To achieve this, we propose activities and strategic priorities that: (1) enable AI+MPS research in both directions; (2) build up an interdisciplinary community of AI+MPS researchers; and (3) foster education and workforce development in AI for MPS researchers and students. We conclude with a summary of suggested priorities for funding agencies, educational institutions, and individual researchers to help position the MPS community to be a leader in, and take full advantage of, the transformative potential of AI+MPS.
We introduce a multicrossmodal LLM-agent framework motivated by the growing volume and diversity of materials-science data ranging from high-resolution microscopy and dynamic simulation videos to tabular experiment logs and sprawling literature archives. While recent AI efforts have accelerated individual tasks such as property prediction or image classification, they typically treat each modality in isolation, leaving rich cross-modal correlations unexplored and forcing researchers to perform laborious manual integration. Moreover, existing multimodal foundation models often require expensive retraining or fine-tuning on domain data, and current multi-agent systems in materials informatics address only narrow subtasks. To overcome these obstacles, we design a coordinated team of specialized LLM agents, each equipped with domain-adapted prompts and plugins that project their outputs into a shared embedding space. A dynamic gating mechanism then weights and merges these insights, enabling unified reasoning over heterogeneous inputs without ever modifying the underlying LLM weights. We validate our approach on challenging case studies and demonstrate substantial gains in retrieval accuracy (85%), captioning fidelity, and integrated coverage (35%) compared to single-modality and zero-shot baselines. Our work paves the way for AI digital researchers capable of bridging data silos and accelerating the materials-discovery cycle. The code is available at https://github.com/adibgpt/Multicrossmodal-Autonomous-Materials-Science-Agent.
Recent advances in large language models (LLMs) have shown great potential to accelerate drug discovery. However, the specialized nature of biochemical data often necessitates costly domain-specific fine-tuning, posing major challenges. First, it hinders the application of more flexible general-purpose LLMs for cutting-edge drug discovery tasks. More importantly, it limits the rapid integration of the vast amounts of scientific data continuously generated through experiments and research. Compounding these challenges is the fact that real-world scientific questions are typically complex and open-ended, requiring reasoning beyond pattern matching or static knowledge retrieval. To address these challenges, we propose CLADD, a retrieval-augmented generation (RAG)-empowered agentic system tailored to drug discovery tasks. Through the collaboration of multiple LLM agents, CLADD dynamically retrieves information from biomedical knowledge bases, contextualizes query molecules, and integrates relevant evidence to generate responses, all without the need for domain-specific fine-tuning. Crucially, we tackle key obstacles in applying RAG workflows to biochemical data, including data heterogeneity, ambiguity, and multi-source integration. We demonstrate the flexibility and effectiveness of this framework across a variety of drug discovery tasks, showing that it outperforms general-purpose and domain-specific LLMs as well as traditional deep learning approaches. Our code is publicly available at https://github.com/Genentech/CLADD.
There has been unprecedented interest in developing agents that expand the boundary of scientific discovery, primarily by optimizing quantitative objective functions specified by scientists. However, for grand challenges in science, these objectives are only imperfect proxies. We argue that automating objective function design is a central, yet unmet requirement for scientific discovery agents. In this work, we introduce the Scientific Autonomous Goal-evolving Agent (SAGA) to address this challenge. SAGA employs a bi-level architecture in which an outer loop of LLM agents analyzes optimization outcomes, proposes new objectives, and converts them into computable scoring functions, while an inner loop performs solution optimization under the current objectives. This bi-level design enables systematic exploration of the space of objectives and their trade-offs, rather than treating them as fixed inputs. We demonstrate the framework through a broad spectrum of applications, including antibiotic design, inorganic materials design, functional DNA sequence design, and chemical process design, showing that automating objective formulation can substantially improve the effectiveness of scientific discovery agents.
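The bi-level structure described in this abstract can be sketched in a few lines. This is a toy illustration under stated assumptions, not SAGA itself: the outer loop below is a hard-coded rule standing in for the LLM agents, the inner loop is an exhaustive search over three invented candidates, and every name, property, weight, and threshold is hypothetical.

```python
def make_score(weights):
    """Turn a dict of property weights into a computable scoring function."""
    def score(candidate):
        return sum(w * candidate[name] for name, w in weights.items())
    return score


def inner_loop(candidates, score):
    """Inner loop: optimize the solution under the current objective."""
    return max(candidates, key=score)


def outer_loop(candidates, weights, toxicity_limit=0.3, rounds=5):
    """Outer loop: inspect the optimum and revise the objective if it is flawed.

    A rule-based stand-in for LLM agents (assumption): when the best candidate
    is too toxic, a toxicity penalty is added to the objective and the inner
    loop is run again.
    """
    history = []
    best = None
    for _ in range(rounds):
        best = inner_loop(candidates, make_score(weights))
        history.append((dict(weights), best["name"]))
        if best["toxicity"] <= toxicity_limit:
            break
        weights = dict(weights, toxicity=weights.get("toxicity", 0.0) - 1.0)
    return best, history


# Invented candidates with two illustrative properties.
candidates = [
    {"name": "A", "potency": 0.9, "toxicity": 0.8},
    {"name": "B", "potency": 0.7, "toxicity": 0.2},
    {"name": "C", "potency": 0.5, "toxicity": 0.1},
]
best, history = outer_loop(candidates, {"potency": 1.0})
for used_weights, winner in history:
    print(used_weights, "->", winner)
```

The first round optimizes potency alone and returns a candidate that the outer loop rejects. Only after the objective is revised does the inner loop return an acceptable candidate, which is the sense in which the objective is treated as something to be explored and not as a fixed input.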
Conventional machine learning approaches accelerate inorganic materials design via accurate property prediction and targeted material generation, yet they operate as single-shot models limited by the latent knowledge baked into their training data. A central challenge lies in creating an intelligent system capable of autonomously executing the full inorganic materials discovery cycle, from ideation and planning to experimentation and iterative refinement. We introduce SparksMatter, a multi-agent AI model for automated inorganic materials design that addresses user queries by generating ideas, designing and executing experimental workflows, continuously evaluating and refining results, and ultimately proposing candidate materials that meet the target objectives. SparksMatter also critiques and improves its own responses, identifies research gaps and limitations, and suggests rigorous follow-up validation steps, including DFT calculations and experimental synthesis and characterization, embedded in a well-structured final report. The model's performance is evaluated across case studies in thermoelectrics, semiconductors, and perovskite oxide materials design. The results demonstrate the capacity of SparksMatter to generate novel stable inorganic structures that target the user's needs. Benchmarking against frontier models reveals that SparksMatter consistently achieves higher scores in relevance, novelty, and scientific rigor, with a significant improvement in novelty across multiple real-world design tasks as assessed by a blinded evaluator. These results demonstrate SparksMatter's unique capacity to generate chemically valid, physically meaningful, and creative inorganic materials hypotheses beyond existing materials knowledge.
Computing has long served as a cornerstone of scientific discovery. Recently, a paradigm shift has emerged with the rise of large language models (LLMs), introducing autonomous systems, referred to as agents, that accelerate discovery across varying levels of autonomy. These language agents provide a flexible and versatile framework that orchestrates interactions with human scientists, natural language, computer language and code, and physics. This paper presents our view and vision of LLM-based scientific agents and their growing role in transforming the scientific discovery lifecycle, from hypothesis discovery, experimental design and execution, to result analysis and refinement. We critically examine current methodologies, emphasizing key innovations, practical achievements, and outstanding limitations. Additionally, we identify open research challenges and outline promising directions for building more robust, generalizable, and adaptive scientific agents. Our analysis highlights the transformative potential of autonomous agents to accelerate scientific discovery across diverse domains.
The discovery of high-performance materials is crucial for technological advancement. Inverse design using multi-agent systems (MAS) shows great potential for new material discovery. However, current MAS for materials research rely on predefined configurations and tools, limiting their adaptability and scalability. To address these limitations, we developed a planner-driven multi-agent system (S1-MatAgent), which adopts a Planner-Executor architecture. The Planner automatically decomposes complex materials design tasks and dynamically configures various tools to generate dedicated Executor agents for each subtask, significantly reducing reliance on manual workflow construction and specialized configuration. Applied to high-entropy alloy catalysts for hydrogen evolution reactions in alkaline conditions, S1-MatAgent completed full-cycle closed-loop design from literature analysis and composition recommendation to performance optimization and experimental validation. To tackle the deviations between designed materials and targets, as well as high experimental verification costs, S1-MatAgent employs a novel composition optimization algorithm based on gradients of a machine learning interatomic potential, achieving a 27.7% improvement in material performance. S1-MatAgent designed 13 high-performance catalysts from 20 million candidates, with Ni4Co4Cu1Mo3Ru4 exhibiting an overpotential of 18.6 mV at 10 mA cm-2 and maintaining 97.5% activity after 500 hours at 500 mA cm-2. The MAS framework offers a universal and scalable solution for material discovery, significantly improving design efficiency and adaptability.
True intelligence requires active capability acquisition, yet current LLM agents inject pre-defined tool schemas into prompts, reducing models to passive selectors and falling short of robust general-purpose agency. We introduce MCP-Zero, an active agent framework that restores tool discovery autonomy to LLMs themselves. Instead of overwhelming models with all available tools, MCP-Zero enables agents to actively identify capability gaps and request specific tools on demand, transforming them from large-scale retrievers into genuine autonomous agents. The framework operates through three core mechanisms: (1) Active Tool Request, where models autonomously generate structured requests specifying their exact tool requirements; (2) Hierarchical Semantic Routing, a two-stage algorithm that matches requests to relevant servers and tools through improved semantic alignment; (3) Iterative Capability Extension, enabling agents to progressively build cross-domain toolchains while maintaining minimal context footprint. We construct MCP-tools, a comprehensive dataset of 308 MCP servers and 2,797 tools from the official Model-Context-Protocol repository. Experiments demonstrate that MCP-Zero preserves agent autonomy while achieving substantial efficiency gains: (i) accurate tool selection from nearly 3k candidates across 248.1k tokens; (ii) 98% reduction in token consumption on APIBank while maintaining high accuracy; and (iii) consistent multi-turn performance that scales with tool ecosystem growth. This work establishes active tool discovery as a fundamental design pattern for scalable autonomous agent systems.
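The two-stage routing idea named in this abstract (match a request to a server first, then to a tool on that server) can be sketched as follows. This is an illustrative approximation and not the MCP-Zero algorithm: a bag-of-words cosine similarity stands in for learned semantic embeddings, and the registry, server names, tool names, and descriptions are all invented.

```python
import math
from collections import Counter


def embed(text):
    """Bag-of-words vector (a crude stand-in for a semantic embedding)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical registry: server -> description and tool descriptions.
REGISTRY = {
    "filesystem": {
        "description": "read and write local files and directories",
        "tools": {
            "read_file": "read the contents of a file",
            "list_directory": "list files in a directory",
        },
    },
    "web_search": {
        "description": "search the web and fetch pages",
        "tools": {
            "search": "search the web for a query",
            "fetch_page": "fetch a web page by url",
        },
    },
}


def route(request, registry, top_servers=1):
    """Stage 1: rank servers by description. Stage 2: rank tools within them."""
    query = embed(request)
    ranked_servers = sorted(
        registry,
        key=lambda name: cosine(query, embed(registry[name]["description"])),
        reverse=True,
    )
    best = None
    for server in ranked_servers[:top_servers]:
        for tool, description in registry[server]["tools"].items():
            score = cosine(query, embed(description))
            if best is None or score > best[2]:
                best = (server, tool, score)
    return best[0], best[1]


selected = route("read the contents of a local file", REGISTRY)
print(selected)
```

Filtering by server before scoring tools means only the tool descriptions of the shortlisted servers need to be compared against the request, which is one plausible reading of how a hierarchical scheme keeps the context footprint small as the number of tools grows.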
AI-powered autonomous experimentation (AI/AE) can accelerate materials discovery but its effectiveness for electronic materials is hindered by data scarcity from lengthy and complex design-fabricate-test-analyze cycles. Unlike experienced human scientists, even advanced AI algorithms in AI/AE lack the adaptability to make informative real-time decisions with limited datasets. Here, we address this challenge by developing and implementing an AI decision interface on our AI/AE system. The central element of the interface is an AI advisor that performs real-time progress monitoring, data analysis, and interactive human-AI collaboration for actively adapting to experiments in different stages and types. We applied this platform to an emerging type of electronic materials -- mixed ion-electron conducting polymers (MIECPs) -- to engineer and study the relationships between multiscale morphology and properties. Using organic electrochemical transistors (OECT) as the testing-bed device for evaluating the mixed-conducting figure-of-merit -- the product of charge-carrier mobility and the volumetric capacitance (μC*), our adaptive AI/AE platform achieved a 150% increase in μC* compared to the commonly used spin-coating method, reaching 1,275 F cm⁻¹ V⁻¹ s⁻¹ in just 64 autonomous experimental trials. A study of 10 statistically selected samples identifies two key structural factors for achieving higher volumetric capacitance: larger crystalline lamellar spacing and higher specific surface area, while also uncovering a new polymer polymorph in this material.
The Discovery Engine is a general purpose automated system for scientific discovery, which combines machine learning with state-of-the-art ML interpretability to enable rapid and robust scientific insight across diverse datasets. In this paper, we benchmark the Discovery Engine against five recent peer-reviewed scientific publications applying machine learning across medicine, materials science, social science, and environmental science. In each case, the Discovery Engine matches or exceeds prior predictive performance while also generating deeper, more actionable insights through rich interpretability artefacts. These results demonstrate its potential as a new standard for automated, interpretable scientific modelling that enables complex knowledge discovery from data.
Rapid and reliable qualification of advanced materials remains a bottleneck in industrial manufacturing, particularly for heterogeneous structures produced via non-conventional additive manufacturing processes. This study introduces a novel framework that links microstructure informatics with a range of expert characterization knowledge using customized and hybrid vision-language representations (VLRs). By integrating deep semantic segmentation with pre-trained multi-modal models (CLIP and FLAVA), we encode both visual microstructural data and textual expert assessments into shared representations. To overcome limitations in general-purpose embeddings, we develop a customized similarity-based representation that incorporates both positive and negative references from expert-annotated images and their associated textual descriptions. This allows zero-shot classification of previously unseen microstructures through a net similarity scoring approach. Validation on an additively manufactured metal matrix composite dataset demonstrates the framework's ability to distinguish between acceptable and defective samples across a range of characterization criteria. Comparative analysis reveals that FLAVA model offers higher visual sensitivity, while the CLIP model provides consistent alignment with the textual criteria. Z-score normalization adjusts raw unimodal and cross-modal similarity scores based on their local dataset-driven distributions, enabling more effective alignment and classification in the hybrid vision-language framework. The proposed method enhances traceability and interpretability in qualification pipelines by enabling human-in-the-loop decision-making without task-specific model retraining. By advancing semantic interoperability between raw data and expert knowledge, this work contributes toward scalable and domain-adaptable qualification strategies in engineering informatics.
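The net similarity scoring and z-score normalization described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's pipeline: the two-dimensional vectors stand in for CLIP or FLAVA embeddings, the net score is taken as mean similarity to positive references minus mean similarity to negative references, and thresholding the z-score at zero is an assumed decision rule.

```python
import math
from statistics import mean, pstdev

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def net_similarity(query, positive_refs, negative_refs):
    """Mean similarity to positive references minus mean similarity to negative ones."""
    pos = mean(cosine(query, ref) for ref in positive_refs)
    neg = mean(cosine(query, ref) for ref in negative_refs)
    return pos - neg

def z_normalize(scores):
    """Standardize scores against their own dataset-level distribution."""
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]

# Hypothetical embeddings of expert-annotated reference images.
positive_refs = [[1.0, 0.0], [0.9, 0.1]]   # annotated as acceptable
negative_refs = [[0.0, 1.0], [0.1, 0.9]]   # annotated as defective

# Hypothetical embeddings of previously unseen microstructures.
samples = {"sample_a": [1.0, 0.05], "sample_b": [0.05, 1.0]}

raw = {name: net_similarity(vec, positive_refs, negative_refs)
       for name, vec in samples.items()}
z = dict(zip(raw, z_normalize(list(raw.values()))))
labels = {name: "acceptable" if score > 0 else "defective"
          for name, score in z.items()}
```

Because the references are fixed embeddings rather than trained weights, adding a new characterization criterion only requires new annotated references, which matches the abstract's claim of classification without task-specific retraining.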
Two-dimensional (2D) materials have emerged as a versatile and powerful platform for quantum technologies, offering atomic-scale control, strong quantum confinement, and seamless integration into heterogeneous device architectures. Their reduced dimensionality enables unique quantum phenomena, including optically addressable spin defects, tunable single-photon emitters, low-dimensional magnetism, gate-controlled superconductivity, and correlated states in Moiré superlattices. This Roadmap provides a comprehensive overview of recent progress and future directions in exploiting 2D materials for quantum sensing, computation, communication, and simulation. We survey advances spanning spin defects and quantum sensing, quantum emitters and nonlinear photonics, computational theory and data-driven discovery of quantum defects, spintronic and magnonic devices, cavity-engineered quantum materials, superconducting and hybrid quantum circuits, quantum dots, Moiré quantum simulators, and quantum communication platforms. Across these themes, we identify common challenges in defect control, coherence preservation, interfacial engineering, and scalable integration, alongside emerging opportunities driven by machine-learning-assisted design and integrated experiment-theory feedback loops. By connecting microscopic quantum states to mesoscopic excitations and macroscopic device architectures, this Roadmap outlines a materials-centric framework for integrating coherent quantum functionalities and positions 2D materials as foundational building blocks for next-generation quantum technologies.
This chapter argues that the reliability of agentic and generative AI is chiefly an architectural property. We define agentic systems as goal-directed, tool-using decision makers operating in closed loops, and show how reliability emerges from principled componentisation (goal manager, planner, tool-router, executor, memory, verifiers, safety monitor, telemetry), disciplined interfaces (schema-constrained, validated, least-privilege tool calls), and explicit control and assurance loops. Building on classical foundations, we propose a practical taxonomy (tool-using agents, memory-augmented agents, planning and self-improvement agents, multi-agent systems, and embodied or web agents) and analyse how each pattern reshapes the reliability envelope and failure modes. We distil design guidance on typed schemas, idempotency, permissioning, transactional semantics, memory provenance and hygiene, runtime governance (budgets, termination conditions), and simulate-before-actuate safeguards.
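Three of the design recommendations above (typed schemas, idempotency, and runtime budgets) can be combined in one small sketch. The `ToolRouter` class, its method names, and the `anneal` tool are hypothetical illustrations and do not come from the chapter.

```python
class ToolRouter:
    """Validates arguments against a typed schema, deduplicates calls, enforces a budget."""

    def __init__(self, budget):
        self.budget = budget      # maximum number of real tool executions
        self.executed = 0
        self._results = {}        # idempotency key -> cached result

    def call(self, tool, args, schema, idempotency_key):
        # Typed schema: exactly the declared arguments, each with the declared type.
        if set(args) != set(schema):
            raise TypeError(f"expected arguments {sorted(schema)}, got {sorted(args)}")
        for name, expected_type in schema.items():
            if not isinstance(args[name], expected_type):
                raise TypeError(f"argument {name!r} must be {expected_type.__name__}")
        # Idempotency: a repeated key returns the stored result without re-executing.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        # Runtime governance: refuse to run once the budget is spent.
        if self.executed >= self.budget:
            raise RuntimeError("tool-call budget exhausted")
        self.executed += 1
        result = tool(**args)
        self._results[idempotency_key] = result
        return result

def anneal(temperature_c, minutes):
    """Hypothetical actuator call; here it only returns a description."""
    return f"annealed at {temperature_c} C for {minutes} min"

router = ToolRouter(budget=1)
schema = {"temperature_c": int, "minutes": int}
first = router.call(anneal, {"temperature_c": 450, "minutes": 30}, schema, "run-001")
retry = router.call(anneal, {"temperature_c": 450, "minutes": 30}, schema, "run-001")
```

The retry returns the cached result, so a planner that re-issues a step after a timeout does not trigger a second physical action.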
This chapter presents perspectives on challenges and future development in building reliable AI systems, particularly agentic AI systems. Several open research problems related to mitigating the risks of cascading failures are discussed. The chapter also sheds light on research challenges and opportunities in areas including dynamic environments, inconsistent task execution, unpredictable emergent behaviors, and resource-intensive reliability mechanisms. In addition, several research directions on testing and evaluating the reliability of agentic AI systems are discussed.
Large Language Models (LLMs) and other foundation models are increasingly used as the core of AI agents. In agentic workflows, these agents plan tasks, interact with humans and peers, and influence scientific outcomes across federated and heterogeneous environments. However, agents can hallucinate or reason incorrectly, propagating errors when one agent's output becomes another's input. Thus, assuring that agents' actions are transparent, traceable, reproducible, and reliable is critical to assess hallucination risks and mitigate their workflow impacts. While provenance techniques have long supported these principles, existing methods fail to capture and relate agent-centric metadata such as prompts, responses, and decisions with the broader workflow context and downstream outcomes. In this paper, we introduce PROV-AGENT, a provenance model that extends W3C PROV and leverages the Model Context Protocol (MCP) and data observability to integrate agent interactions into end-to-end workflow provenance. Our contributions include: (1) a provenance model tailored for agentic workflows, (2) a near real-time, open-source system for capturing agentic provenance, and (3) a cross-facility evaluation spanning edge, cloud, and HPC environments, demonstrating support for critical provenance queries and agent reliability analysis.
Corpus distillation for biomedical large language models (LLMs) seeks to address the pressing challenge of insufficient quantity and quality in open-source annotated scientific corpora, which remains a bottleneck for effective LLM training in biomedical research. This paper proposes a knowledge-driven, agentic framework for scientific corpus distillation, tailored explicitly for LLM training in the biomedical domain, addressing the challenge posed by the complex hierarchy of biomedical knowledge. Central to our approach is a collaborative multi-agent architecture, where specialized agents, each guided by the Medical Subject Headings (MeSH) hierarchy, work in concert to autonomously extract, synthesize, and self-evaluate high-quality textual data from vast scientific literature. This agentic framework collectively generates and refines domain-specific question-answer pairs, ensuring comprehensive coverage and consistency with biomedical ontologies while minimizing manual involvement. Extensive experimental results show that language models trained on our multi-agent distilled datasets achieve notable improvements in biomedical question-answering tasks, outperforming both strong life sciences LLM baselines and advanced proprietary models. Notably, our AI-Ready dataset enables Llama3-70B to surpass GPT-4 with MedPrompt and Med-PaLM-2, despite their larger scale. Detailed ablation studies and case analyses further validate the effectiveness and synergy of each agent within the framework, highlighting the potential of multi-agent collaboration in biomedical LLM training.
Agentic reinforcement learning has advanced large language models (LLMs) to reason through long chain-of-thought trajectories while interleaving external tool use. Existing approaches assume a fixed inventory of tools, limiting LLM agents' adaptability to new or evolving toolsets. We present AutoTool, a framework that equips LLM agents with dynamic tool-selection capabilities throughout their reasoning trajectories. We first construct a 200k dataset with explicit tool-selection rationales across 1,000+ tools and 100+ tasks spanning mathematics, science, code generation, and multimodal reasoning. Building on this data foundation, AutoTool employs a dual-phase optimization pipeline: (i) supervised and RL-based trajectory stabilization for coherent reasoning, and (ii) KL-regularized Plackett-Luce ranking to refine consistent multi-step tool selection. Across ten diverse benchmarks, we train two base models, Qwen3-8B and Qwen2.5-VL-7B, with AutoTool. With fewer parameters, AutoTool consistently outperforms advanced LLM agents and tool-integration methods, yielding average gains of 6.4% in math & science reasoning, 4.5% in search-based QA, 7.7% in code generation, and 6.9% in multimodal understanding. In addition, AutoTool exhibits stronger generalization by dynamically leveraging unseen tools from evolving toolsets during inference.
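For readers unfamiliar with the Plackett-Luce model mentioned above, the sketch below computes the log-probability it assigns to a ranking of tools given one score per tool. It illustrates only the ranking likelihood; the KL regularization and the training loop from the paper are omitted, and the tool names and scores are made up.

```python
import math
from itertools import permutations

def plackett_luce_log_prob(scores, ranking):
    """log P(ranking | scores): items are drawn one at a time, without replacement,
    each with probability softmax(score) over the items still remaining."""
    remaining = list(ranking)
    log_p = 0.0
    for item in ranking:
        top = max(scores[j] for j in remaining)
        # Numerically stable log-sum-exp over the remaining items.
        log_z = top + math.log(sum(math.exp(scores[j] - top) for j in remaining))
        log_p += scores[item] - log_z
        remaining.remove(item)
    return log_p

# Hypothetical relevance scores for three tools at one reasoning step.
scores = {"search": 2.0, "calculator": 0.5, "code": -1.0}

best_first = plackett_luce_log_prob(scores, ["search", "calculator", "code"])
worst_first = plackett_luce_log_prob(scores, ["code", "calculator", "search"])

# Sanity check: the probabilities of all possible rankings sum to one.
total = sum(math.exp(plackett_luce_log_prob(scores, list(p)))
            for p in permutations(scores))
```

Maximizing this log-probability for the preferred ordering pushes the score of the most useful tool up relative to every tool ranked below it, rather than only relative to a single alternative.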
Artificial intelligence (AI) is reshaping scientific discovery, evolving from specialized computational tools into autonomous research partners. We position Agentic Science as a pivotal stage within the broader AI for Science paradigm, where AI systems progress from partial assistance to full scientific agency. Enabled by large language models (LLMs), multimodal systems, and integrated research platforms, agentic AI shows capabilities in hypothesis generation, experimental design, execution, analysis, and iterative refinement -- behaviors once regarded as uniquely human. This survey provides a domain-oriented review of autonomous scientific discovery across life sciences, chemistry, materials science, and physics. We unify three previously fragmented perspectives -- process-oriented, autonomy-oriented, and mechanism-oriented -- through a comprehensive framework that connects foundational capabilities, core processes, and domain-specific realizations. Building on this framework, we (i) trace the evolution of AI for Science, (ii) identify five core capabilities underpinning scientific agency, (iii) model discovery as a dynamic four-stage workflow, (iv) review applications across the above domains, and (v) synthesize key challenges and future opportunities. This work establishes a domain-oriented synthesis of autonomous scientific discovery and positions Agentic Science as a structured paradigm for advancing AI-driven research.
As agents based on large language models are increasingly deployed to long-horizon tasks, maintaining their alignment with stakeholder preferences becomes critical. Effective alignment in such settings requires reward models that are interpretable so that stakeholders can understand and audit model objectives. Moreover, reward models must be capable of steering agents at interaction time, allowing preference shifts to be incorporated without retraining. We introduce ARCANE, a framework that frames alignment as a multi-agent collaboration problem that dynamically represents stakeholder preferences as natural-language rubrics: weighted sets of verifiable criteria that can be generated on-the-fly from task context. Inspired by utility theory, we formulate rubric learning as a reconstruction problem and apply a regularized Group-Sequence Policy Optimization (GSPO) procedure that balances interpretability, faithfulness, and computational efficiency. Using a corpus of 219 labeled rubrics derived from the GDPVal benchmark, we evaluate ARCANE on challenging tasks requiring multi-step reasoning and tool use. The learned rubrics produce compact, legible evaluations and enable configurable trade-offs (e.g., correctness vs. conciseness) without retraining. Our results show that rubric-based reward models offer a promising path toward interpretable, test-time adaptive alignment for complex, long-horizon AI systems.
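A rubric in the sense described above, a weighted set of verifiable criteria, can be represented directly as data. The sketch below is an assumption-laden illustration rather than the ARCANE method: the `Criterion` class, the two example checks, and the normalized weighted-sum scoring rule are all hypothetical, and no rubric learning is shown.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Criterion:
    description: str               # natural-language statement of the criterion
    weight: float                  # stakeholder-assigned importance
    check: Callable[[str], bool]   # verifiable test applied to an agent output

def rubric_score(output: str, rubric: List[Criterion]) -> float:
    """Weighted fraction of criteria that the output satisfies."""
    total = sum(c.weight for c in rubric)
    passed = sum(c.weight for c in rubric if c.check(output))
    return passed / total

def make_rubric(correctness_weight: float, conciseness_weight: float) -> List[Criterion]:
    return [
        Criterion("states the band gap as 1.1 eV", correctness_weight,
                  lambda text: "1.1 eV" in text),
        Criterion("uses at most 8 words", conciseness_weight,
                  lambda text: len(text.split()) <= 8),
    ]

long_correct = "After reviewing the literature, the band gap of silicon is about 1.1 eV"
short_wrong = "The band gap is 3 eV"

accuracy_first = make_rubric(0.8, 0.2)
brevity_first = make_rubric(0.2, 0.8)

scores = {
    "accuracy_first": (rubric_score(long_correct, accuracy_first),
                       rubric_score(short_wrong, accuracy_first)),
    "brevity_first": (rubric_score(long_correct, brevity_first),
                      rubric_score(short_wrong, brevity_first)),
}
```

Changing the weights reverses which output is preferred without touching any model parameters, which is the kind of test-time trade-off (correctness vs. conciseness) the abstract refers to.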
Large language models accumulate vast knowledge during pre-training, yet the dynamics governing this acquisition remain poorly understood. This work investigates the learning dynamics of language models on a synthetic factual recall task, uncovering three key findings: First, language models learn in three phases, exhibiting a performance plateau before acquiring precise factual knowledge. Mechanistically, this plateau coincides with the formation of attention-based circuits that support recall. Second, the training data distribution significantly impacts learning dynamics, as imbalanced distributions lead to shorter plateaus. Finally, hallucinations emerge simultaneously with knowledge, and integrating new knowledge into the model through fine-tuning is challenging, as it quickly corrupts its existing parametric memories. Our results emphasize the importance of data distribution in knowledge acquisition and suggest novel data scheduling strategies to accelerate neural network training.
Reinforcement learning (RL) has become a key technique for enhancing the reasoning abilities of large language models (LLMs), with policy-gradient algorithms dominating the post-training stage because of their efficiency and effectiveness. However, most existing benchmarks evaluate large-language-model reasoning under idealized settings, overlooking performance in realistic, non-ideal scenarios. We identify three representative non-ideal scenarios with practical relevance: summary inference, fine-grained noise suppression, and contextual filtering. We introduce a new research direction guided by brain-science findings that human reasoning remains reliable under imperfect inputs. We formally define and evaluate these challenging scenarios. We fine-tune three LLMs and a state-of-the-art large vision-language model (LVLM) using RL with a representative policy-gradient algorithm and then test their performance on eight public datasets. Our results reveal that while RL fine-tuning improves baseline reasoning under idealized settings, performance declines significantly across all three non-ideal scenarios, exposing critical limitations in advanced reasoning capabilities. Although we propose a scenario-specific remediation method, our results suggest current methods leave these reasoning deficits largely unresolved. This work highlights that the reasoning abilities of large models are often overstated and underscores the importance of evaluating models under non-ideal scenarios. The code and data will be released at XXXX.
Large Language Models (LLMs) possess extensive knowledge and commonsense reasoning capabilities, making them valuable for creating powerful agents. However, existing LLM agent frameworks have not fully utilized past experiences for improvement. This work introduces a new LLM-based agent framework called Retrospex, which addresses this challenge by analyzing past experiences in depth. Unlike previous approaches, Retrospex does not directly integrate experiences into the LLM's context. Instead, it combines the LLM's action likelihood with action values estimated by a Reinforcement Learning (RL) Critic, which is trained on past experiences through an offline "retrospection" process. Additionally, Retrospex employs a dynamic action rescoring mechanism that increases the importance of experience-based values for tasks that require more interaction with the environment. We evaluate Retrospex in ScienceWorld, ALFWorld and Webshop environments, demonstrating its advantages over strong, contemporary baselines.
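The rescoring idea above, blending the LLM's action likelihood with critic values and shifting weight toward the critic as interaction grows, can be sketched as follows. The linear weight schedule, the min-max normalization, and the parameters `horizon` and `w_max` are assumptions made for illustration; the abstract does not specify the actual formula.

```python
def normalize(values):
    """Min-max scale to [0, 1] so likelihoods and Q-values are comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def rescore(actions, llm_logprobs, q_values, step, horizon=20, w_max=0.8):
    """Pick the action with the best blend of LLM likelihood and critic value.

    The critic weight grows linearly with the step count (an assumed schedule),
    so longer interactions rely more on values learned from past experience.
    """
    w = min(w_max, step / horizon)
    p = normalize(llm_logprobs)
    q = normalize(q_values)
    combined = [(1 - w) * pi + w * qi for pi, qi in zip(p, q)]
    best_index = max(range(len(actions)), key=combined.__getitem__)
    return actions[best_index]

# Hypothetical candidates: the LLM prefers the first, the critic the second.
actions = ["open door", "pick up beaker"]
llm_logprobs = [-0.2, -1.5]
q_values = [0.1, 0.9]

early_choice = rescore(actions, llm_logprobs, q_values, step=0)
late_choice = rescore(actions, llm_logprobs, q_values, step=16)
```

Early in an episode the LLM's preference decides the action; late in a long episode the critic's estimate takes over.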
Autonomous data science, from raw data sources to analyst-grade deep research reports, has been a long-standing challenge, and is now becoming feasible with the emergence of powerful large language models (LLMs). Recent workflow-based data agents have shown promising results on specific data tasks but remain fundamentally limited in achieving fully autonomous data science due to their reliance on predefined workflows. In this paper, we introduce DeepAnalyze-8B, the first agentic LLM designed for autonomous data science, capable of automatically completing the end-to-end pipeline from data sources to analyst-grade deep research reports. To tackle high-complexity data science tasks, we propose a curriculum-based agentic training paradigm that emulates the learning trajectory of human data scientists, enabling LLMs to progressively acquire and integrate multiple capabilities in real-world environments. We also introduce a data-grounded trajectory synthesis framework that constructs high-quality training data. Through agentic training, DeepAnalyze learns to perform a broad spectrum of data tasks, ranging from data question answering and specialized analytical tasks to open-ended data research. Experiments demonstrate that, with only 8B parameters, DeepAnalyze outperforms previous workflow-based agents built on most advanced proprietary LLMs. The model, code, and training data of DeepAnalyze are open-sourced, paving the way toward autonomous data science.
Scientific discovery is being revolutionized by AI and autonomous systems, yet current autonomous laboratories remain isolated islands unable to collaborate across institutions. We present the Autonomous Interconnected Science Lab Ecosystem (AISLE), a grassroots network transforming fragmented capabilities into a unified system that shortens the path from ideation to innovation to impact and accelerates discovery from decades to months. AISLE addresses five critical dimensions: (1) cross-institutional equipment orchestration, (2) intelligent data management with FAIR compliance, (3) AI-agent driven orchestration grounded in scientific principles, (4) interoperable agent communication interfaces, and (5) AI/ML-integrated scientific education. By connecting autonomous agents across institutional boundaries, autonomous science can unlock research spaces inaccessible to traditional approaches while democratizing cutting-edge technologies. This paradigm shift toward collaborative autonomous science promises breakthroughs in sustainable energy, materials development, and public health.
Active learning (AL) plays a critical role in materials science, enabling applications such as the construction of machine-learning interatomic potentials for atomistic simulations and the operation of self-driving laboratories. Despite its widespread use, the reliability and effectiveness of AL workflows depend on implicit design assumptions that are rarely examined systematically. Here, we critically assess AL workflows deployed in materials science and investigate how key design choices, such as surrogate models, sampling strategies, uncertainty quantification and evaluation metrics, relate to their performance. By identifying common pitfalls and discussing practical mitigation strategies, we provide guidance to practitioners for the efficient design, assessment, and interpretation of AL workflows in materials science.
Autonomous scientific research, capable of independently conducting complex experiments and serving non-specialists, represents a long-held aspiration. Achieving it requires a fundamental paradigm shift driven by artificial intelligence (AI). While autonomous experimental systems are emerging, they remain confined to areas featuring singular objectives and well-defined, simple experimental workflows, such as chemical synthesis and catalysis. We present an AI-native autonomous laboratory, targeting highly complex scientific experiments for applications like autonomous biomolecular engineering. This system autonomously manages instrumentation, formulates experiment-specific procedures and optimization heuristics, and concurrently serves multiple user requests. Founded on a co-design philosophy of models, experiments, and instruments, the platform supports the co-evolution of AI models and the automation system. This establishes an end-to-end, multi-user autonomous laboratory that handles complex, multi-objective experiments across diverse instrumentation. Our autonomous laboratory supports fundamental nucleic acid functions, including synthesis, transcription, amplification, and sequencing. It also enables applications in fields such as disease diagnostics, drug development, and information storage. Without human intervention, it autonomously optimizes experimental performance to match state-of-the-art results achieved by human scientists. In multi-user scenarios, the platform significantly improves instrument utilization and experimental efficiency. This platform paves the way for advanced biomaterials research to overcome dependencies on experts and resource barriers, establishing a blueprint for science-as-a-service at scale.
Large language models (LLMs) are increasingly applied to materials science questions, including literature comprehension, property prediction, materials discovery and alloy design. At the same time, a wide range of physics-based computational approaches have been developed in which materials properties can be calculated. Here, we propose a benchmark application to evaluate the proficiency of LLMs to answer materials science questions through the generation and safe execution of codes based on such physics-based computational materials science packages. MatTools is built on two complementary components: a materials simulation tool question-answer (QA) benchmark and a real-world tool-usage benchmark. We designed an automated methodology to efficiently collect real-world materials science tool-use examples. The QA benchmark, derived from the pymatgen (Python Materials Genomics) codebase and documentation, comprises 69,225 QA pairs that assess the ability of an LLM to understand materials science tools. The real-world benchmark contains 49 tasks (138 subtasks) requiring the generation of functional Python code for materials property calculations. Our evaluation of diverse LLMs yields three key insights: (1) Generalists outshine specialists; (2) AI knows AI; and (3) Simpler is better. MatTools provides a standardized framework for assessing and improving LLM capabilities for materials science tool applications, facilitating the development of more effective AI systems for materials science and general scientific research.
Autonomous laboratories promise to accelerate discovery by coupling learning algorithms with robotic experimentation, yet adoption remains limited by fragmented software that separates high-level planning from low-level execution. Here we present UniLabOS, an AI-native operating system for autonomous laboratories that bridges digital decision-making and embodied experimentation through typed, stateful abstractions and transactional safeguards. UniLabOS unifies laboratory elements via an Action/Resource/Action&Resource (A/R/A&R) model, represents laboratory structure with a dual-topology of logical ownership and physical connectivity, and reconciles digital state with material motion using a transactional CRUTD protocol. Built on a distributed edge-cloud architecture with decentralized discovery, UniLabOS enables protocol mobility across reconfigurable topologies while supporting human-in-the-loop governance. We demonstrate the system in four real-world settings -- a liquid-handling workstation, a modular organic synthesis platform, a distributed electrolyte foundry, and a decentralized computation-intensive closed-loop system -- showing robust orchestration across heterogeneous instruments and multi-node coordination. UniLabOS establishes a scalable foundation for agent-ready, reproducible, and provenance-aware autonomous experimentation.
Autonomous experimentation holds the potential to accelerate materials development by combining artificial intelligence (AI) with modular robotic platforms to explore extensive combinatorial chemical and processing spaces. Such self-driving laboratories can not only increase the throughput of repetitive experiments, but also incorporate human domain expertise to drive the search towards user-defined objectives, including improved materials performance metrics. We present an autonomous materials synthesis extension to SARA, the Scientific Autonomous Reasoning Agent, utilizing phase information provided by an automated probabilistic phase labeling algorithm to expedite the search for targeted phase regions. By incorporating human input into an expanded SARA-H (SARA with human-in-the-loop) framework, we enhance the efficiency of the underlying reasoning process. Using synthetic benchmarks, we demonstrate the efficiency of our AI implementation and show that the human input can contribute to significant improvement in sampling efficiency. We conduct experimental active learning campaigns using robotic processing of thin-film samples of several oxide material systems, including Bi$_2$O$_3$, SnO$_x$, and Bi-Ti-O, using lateral-gradient laser spike annealing to synthesize and kinetically trap metastable phases. We showcase the utility of human-in-the-loop autonomous experimentation for the Bi-Ti-O system, where we identify extensive processing domains that stabilize $δ$-Bi$_2$O$_3$ and Bi$_2$Ti$_2$O$_7$, explore dwell-dependent ternary oxide phase behavior, and provide evidence confirming predictions that cationic substitutional doping of TiO$_2$ with Bi inhibits the unfavorable transformation of the metastable anatase to the ground-state rutile phase. The autonomous methods we have developed enable the discovery of new materials and new understanding of materials synthesis and properties.
Emerging materials science platforms with the ability to make autonomous decisions on the fly are fundamentally changing the outlook and protocols for materials optimization and discovery. Because AI-driven self-navigating schemes can effectively reduce the total number of iterations needed to arrive at the "answer" (i.e., the best stoichiometric composition for a desired physical property, optimum materials processing parameters, etc.) by significant margins, they have the potential to revolutionize materials and chemical manufacturing processes at large in research laboratory settings as well as in industrial plants. Here, we demonstrate a successful implementation of real-time closed-loop autonomous navigation of a multi-dimensional materials synthesis parameter space for fabricating phase-pure epitaxial films of a metastable phase of a functional oxide in a combinatorial pulsed laser deposition chamber. Sequential epitaxial growth iterations in search of the optimized recipe to stabilize the desired crystal phase were performed using frame-by-frame quantitative computer vision analysis of reflection high-energy electron diffraction (RHEED) images of the unit-cell level film being deposited. The autonomous scheme regularly resulted in a >30-fold reduction in the number of required experiments compared to a comprehensive mapping of the parameter space. The real-time workflow developed here can be readily extended to a variety of thin film synthesis platforms, opening the door for self-driving atomic-level materials design as well as autonomous optimization of semiconductor manufacturing.
Multi-agent LLM frameworks are widely used to accelerate the development of agent systems powered by large language models (LLMs). These frameworks impose distinct architectural structures that govern how agents interact, store information, and coordinate tasks. However, their impact on system performance remains poorly understood. This gap is critical, as architectural choices alone can induce order-of-magnitude differences in latency and throughput, as well as substantial variation in accuracy and scalability. Addressing this challenge requires (i) jointly evaluating multiple capabilities, such as orchestration overhead, memory behavior, planning, specialization, and coordination, and (ii) conducting these evaluations under controlled, framework-level conditions to isolate architectural effects. Existing benchmarks focus on individual capabilities and lack standardized framework-level evaluation. We address these limitations by (i) introducing an architectural taxonomy for systematically comparing multi-agent LLM frameworks along fundamental dimensions, and (ii) developing MAFBench, a unified evaluation suite that integrates existing benchmarks under a standardized execution pipeline. Using MAFBench, we conduct a controlled empirical study across several widely used frameworks. Our results show that framework-level design choices alone can increase latency by over 100x, reduce planning accuracy by up to 30%, and lower coordination success from above 90% to below 30%. Finally, we translate our findings into concrete architectural design principles and framework selection guidance, and outline promising future research directions.
Advances in large language models (LLMs) have created new opportunities in data science, but their deployment is often limited by the challenge of finding relevant data in large data lakes. Existing methods struggle with this: both single- and multi-agent systems are quickly overwhelmed by large, heterogeneous files, and master-slave multi-agent systems rely on a rigid central controller that requires precise knowledge of each sub-agent's capabilities, which is not possible in large-scale settings where the main agent lacks full observability over sub-agents' knowledge and competencies. We propose a novel multi-agent paradigm inspired by the blackboard architecture for traditional AI models. In our framework, a central agent posts requests to a shared blackboard, and autonomous subordinate agents - either responsible for a partition of the data lake or retrieval from the web - volunteer to respond based on their capabilities. This design improves scalability and flexibility by removing the need for a central coordinator to know each agent's expertise or internal knowledge. We evaluate the approach on three benchmarks that require data discovery: KramaBench and modified versions of DSBench and DA-Code. Results show that the blackboard architecture substantially outperforms strong baselines, achieving 13%-57% relative improvements in end-to-end success and up to a 9% relative gain in data discovery F1 over the best baseline.
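The blackboard pattern described above can be reduced to a short sketch: a central agent posts a request to shared state, and each sub-agent decides for itself whether it can help. The class names, the keyword-overlap test used for volunteering, and the file descriptions are hypothetical; in the paper's setting the sub-agents are LLM-based.

```python
from dataclasses import dataclass, field

@dataclass
class Blackboard:
    """Shared state that every agent can read and write."""
    requests: list = field(default_factory=list)
    responses: list = field(default_factory=list)

    def post(self, request):
        self.requests.append(request)

class PartitionAgent:
    """Owns one partition of the data lake and volunteers only when it can help."""

    def __init__(self, name, files):
        self.name = name
        self.files = files  # filename -> short description (hypothetical metadata)

    def volunteer(self, board):
        for request in board.requests:
            wanted = set(request.lower().split())
            hits = [filename for filename, description in self.files.items()
                    if wanted & set(description.lower().split())]
            if hits:
                board.responses.append((self.name, request, hits))

board = Blackboard()
agents = [
    PartitionAgent("sales", {"sales_2023.csv": "quarterly sales revenue by region"}),
    PartitionAgent("weather", {"rain.csv": "daily rainfall measurements"}),
]

# The central agent posts a request without knowing which agent holds the data.
board.post("revenue by region")
for agent in agents:
    agent.volunteer(board)
```

The central agent never consults a table of sub-agent capabilities; the decision to respond sits with each partition owner, which is what removes the full-observability requirement of a controller-driven design.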
Scientific hypothesis generation is central to materials discovery, yet current approaches often emphasize either conceptual (idea-to-data) reasoning or data-driven (data-to-idea) analysis, rarely achieving an effective integration of both. Here, we present a generalizable active learning workflow that integrates top-down, theory-driven hypothesis generation, guided by a large language model. This is complemented by bottom-up, data-driven hypothesis testing through a root-cause association study. We demonstrate this approach through the design of equimolar quinary-cation two-dimensional perovskite, a chemically complex system with over 850,000 possible cation combinations. In the top-down component, the large language model drives closed-loop optimization by proposing candidates that are likely to achieve phase purity, leveraging domain knowledge and chain-of-thought reasoning. With each iteration, the model identifies an increasing number of near phase-pure compositions, sampling less than 0.004% of the design space. In parallel, the bottom-up association study identifies molecular features with statistically significant influences on phase purity. The integration of these approaches enables the convergence of conceptual and statistical hypotheses, leading to generalizable and rational design rules for phase-pure quinary-cation two-dimensional perovskites. As a proof of concept, we applied the optimized phase-pure quinary-cation two-dimensional perovskite film as a surface capping layer in perovskite solar cells, achieving good performance and stability. Our framework enables the development of interpretable and generalizable design rules that are applicable to a wide range of optimization processes within complex design spaces, providing a foundational strategy for rational, scalable, and efficient materials discovery.
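To give a sense of scale for the sampling figures quoted in this abstract, the short calculation below converts the stated percentage into an approximate number of tested compositions. It is only a back-of-the-envelope sketch: the abstract says "over 850,000" and "less than 0.004%", so 850,000 is used as a lower bound on the design space and the result is a rough ceiling, not a count reported by the authors. The function and constant names are illustrative.

```python
def max_sampled(space_size: int, percent: float) -> float:
    """Ceiling on candidates tested if less than `percent` % of the space is sampled."""
    return space_size * percent / 100.0


# Figures quoted in the abstract; 850,000 is a lower bound ("over 850,000").
DESIGN_SPACE_LOWER_BOUND = 850_000
SAMPLED_PERCENT = 0.004  # "less than 0.004%"

bound = max_sampled(DESIGN_SPACE_LOWER_BOUND, SAMPLED_PERCENT)
print(f"Roughly {bound:.0f} compositions at most, out of {DESIGN_SPACE_LOWER_BOUND:,}+")
```

Under these assumptions the closed loop tested on the order of a few dozen compositions, which is what makes the reported increase in near phase-pure hits per iteration notable.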
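This report synthesizes progress on agents for materials science research and shows that the field is evolving from standalone LLM-assisted tools toward a highly autonomous "AI scientist" paradigm. The final grouping covers the full chain from macro-level strategic roadmaps to micro-level technical implementation: literature mining and multimodal representation at the knowledge layer, architecture design and hypothesis generation at the logic layer, simulation automation at the computation layer, and self-driving laboratory integration at the physical layer. At the same time, research focus is shifting toward system reliability, standardized benchmarking, and deep closed-loop discovery for specific materials systems (such as batteries and alloys), with the aim of building autonomous research infrastructure that is trustworthy, efficient, and capable of physically grounded logical reasoning.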