Artificial Intelligence Capabilities
Theoretical Frameworks for Artificial General Intelligence (AGI) and Cognition-Inspired Architectures
Explores mathematical definitions of AGI (e.g., AIXI), evolutionary pathways toward it, brain-inspired neural architectures, commonsense knowledge representation, and psychology-inspired models of navigating conceptual space.
- The Role and Future of Artificial General Intelligence (AGI)(Sirajo Abdullahi Bakura, Abubakar Bello Bada, Musa Karatu, Mubashir Haruna, Abdulsalam Ibrahim Magawata, Ede Ifesinachi Chizzy, Shuaibu Yau, 2025, International Journal of Innovative Science and Research Technology)
- Computable Artificial General Intelligence(Michael Timothy Bennett, 2022, ArXiv)
- Expression unleashed in artificial intelligence.(Ekaterina I Tolstaya, Abhinav Gupta, Edward Hughes, 2023, The Behavioral and brain sciences)
- New directions for artificial intelligence: human, machine, biological, and quantum intelligence(Weigang Li, L. Enamoto, Denise Leyi Li, G. P. Rocha Filho, 2021, Frontiers of Information Technology & Electronic Engineering)
- 2017年国际人工智能领域研究前沿的分析与研究 (Analysis and Investigation of Research Frontiers in International Field of Artificial Intelligence in 2017)(Yanling Yao, 2018, 计算机科学)
- Artificial General Intelligence (AGI): The Quest for Machines That Can Think Like Humans(Manoj Kumar, Apoorva Sharma, S. Saini, 2020, Turkish Journal of Computer and Mathematics Education (TURCOMAT))
- From Reinforcement Learning Towards Artificial General Intelligence(F. Rocha, V. S. Costa, L. Reis, 2020, No journal)
- Creating Artificial General Intelligence: A Holistic and Practical Approach(Eeman Majumder, 2024, International Journal of Scientific Research in Engineering and Management)
- The Embeddings World and Artificial General Intelligence(M. H. Chehreghani, 2022, Cogn. Syst. Res.)
- Information Retrieval for Artificial General Intelligence: A New Perspective of Information Retrieval Research(ChengXiang Zhai, 2025, Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval)
- Artificial Expert Intelligence through PAC-reasoning(S. Shalev-Shwartz, A. Shashua, Gal Beniamini, Yoav Levine, Or Sharir, Noam Wies, Ido Ben-Shaul, Tomer Nussbaum, Shir Peled, 2024, ArXiv)
- An integrated model of advanced artificial intelligence capability in higher education(Noor Afzan Salleh, 2022, Journal of Advances in Technology and Engineering Research)
- Human Brain Inspired Artificial Intelligence Neural Networks.(Paschalis Theotokis, 2025, Journal of integrative neuroscience)
- Artificial general intelligence through recursive data compression and grounded reasoning: a position paper(A. Franz, 2015, ArXiv)
- A criterion for Artificial General Intelligence: hypothetic-deductive reasoning, tested on ChatGPT(L. Vervoort, Vitaliy Mizyakov, Anastasia Ugleva, 2023, ArXiv)
- Navigating Conceptual Space; A new take on Artificial General Intelligence(P. Leikanger, 2022, ArXiv)
- Evidence of interrelated cognitive-like capabilities in large language models: Indications of artificial general intelligence or achievement?(David Ili'c, Gilles E. Gignac, 2023, Intelligence)
- Commonsense Knowledge in Machine Intelligence(Niket Tandon, A. Varde, Gerard de Melo, 2018, SIGMOD Rec.)
- Can AI have common sense? Finding out will be key to achieving machine intelligence(M. Kejriwal, Henrique Santos, Alice M. Mulvehill, Ke Shen, Deborah L. McGuinness, Henry Lieberman, 2024, Nature)
- Factor space:a new idea for artificial intelligence based on causal reasoning(Qi-Wei Kong, Jing He, Peizhuang Wang, 2020, 2020 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT))
- A Metamodel and Framework for Artificial General Intelligence From Theory to Practice(Hugo Latapie, Özkan Kiliç, Gaowen Liu, Yan Yan, R. Kompella, Pei Wang, K. Thórisson, Adam Lawrence, Yuhong Sun, Jayanth Srinivasa, 2021, J. Artif. Intell. Conscious.)
- Machine Psychology: integrating operant conditioning with the non-axiomatic reasoning system for advancing artificial general intelligence research(Robert Johansson, 2024, Frontiers in Robotics and AI)
- Toward Safe Artificial General Intelligence: Collective Narrow AI Collaboration with an Artificial Prefrontal Cortex(Rachmad Imam Tarecha, Priska Choirina, Urnika Mudhifatul Jannah, Bagus Seta Inba Cipta, Pangestuti Prima Darajat, Amalia Agung Septarina, 2026, ICoBITS)
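The mathematical definition of AGI mentioned above (AIXI) can be stated compactly. As a sketch following Hutter's standard formulation: at time $t$, the AIXI agent picks the action that maximizes expected total reward up to horizon $m$ under a universal mixture over all programs $q$ (weighted by their length $\ell(q)$) consistent with the interaction history when run on a universal Turing machine $U$:

```latex
a_t \;=\; \arg\max_{a_t} \sum_{o_t r_t} \cdots \max_{a_m} \sum_{o_m r_m}
\left[\, r_t + \cdots + r_m \,\right]
\sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
```

The incomputability of this expression is precisely what motivates the "computable AGI" and approximation lines of work cited in this cluster.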
Logical Reasoning, Complex Decision-Making, and Evaluation Benchmarks for Large Language Models (LLMs)
Focuses on LLM reasoning across logic, mathematics, and economics, covering chain-of-thought (CoT) techniques, reinforcement learning as an incentive for reasoning (e.g., DeepSeek-R1), analyses of cognitive biases, and benchmarks targeting expert-level problems.
- Artificial intelligence learns to reason.(Melanie Mitchell, 2025, Science (New York, N.Y.))
- R-Bench: Graduate-level Multi-disciplinary Benchmarks for LLM & MLLM Complex Reasoning Evaluation(Meng-Hao Guo, Jiajun Xu, Yi Zhang, Jiaxing Song, Hao-Yang Peng, Yi-Xuan Deng, Xin Dong, Kiyohiro Nakayama, Zhengyang Geng, Chen Wang, Bolin Ni, Guo-Wei Yang, Yongming Rao, Houwen Peng, Han Hu, Gordon Wetzstein, Shi-Min Hu, 2025, ArXiv)
- DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning(DeepSeek-AI, Daya Guo, Dejian Yang, et al., 2025, Nature)
- Beyond Statistical Learning: Exact Learning Is Essential for General Intelligence(András György, Tor Lattimore, N. Lazic, Csaba Szepesvári, 2025, ArXiv)
- Thinking Beyond Tokens: From Brain-Inspired Intelligence to Cognitive Foundations for Artificial General Intelligence and its Societal Impact(Rizwan Qureshi, Ranjan Sapkota, Abbas Shah, Amgad Muneer, Anas Zafar, Ashmal Vayani, Maged Shoman, Abdelrahman B. M. Eldaly, Kai Zhang, Ferhat Sadak, Shaina Raza, Xinqi Fan, Ravid Shwartz-Ziv, Hong Yan, Vinjia Jain, Aman Chadha, Manoj Karkee, Jia Wu, S. Mirjalili, 2025, ArXiv)
- Abstraction and analogy-making in artificial intelligence.(Melanie Mitchell, 2021, Annals of the New York Academy of Sciences)
- Logical reasoning for human activity recognition based on multisource data from wearable device(Mahmood Alsaadi, Ismail Keshta, J. V. N. Ramesh, Divya Nimma, Mohammad Shabaz, N. Pathak, P. Singh, S. Kiyosov, Mukesh Soni, 2025, Scientific Reports)
- Forewarning Artificial Intelligence about Cognitive Biases.(Jonathan Wang, Donald A Redelmeier, 2025, Medical decision making : an international journal of the Society for Medical Decision Making)
- Understanding Collective Intelligence: Investigating the Role of Collective Memory, Attention, and Reasoning Processes.(Anita Williams Woolley, Pranav Gupta, 2024, Perspectives on psychological science : a journal of the Association for Psychological Science)
- Human-machine Collaborative Decision-making: An Evolutionary Roadmap Based on Cognitive Intelligence(Minglun Ren, Nengying Chen, Huida Qiu, 2023, International Journal of Social Robotics)
- Three Epochs of Artificial Intelligence in Health Care.(Michael D Howell, Greg S Corrado, Karen B DeSalvo, 2024, JAMA)
- MT-R1-Zero: Advancing LLM-based Machine Translation via R1-Zero-like Reinforcement Learning(Zhaopeng Feng, Shaosheng Cao, Jiahan Ren, Jiayuan Su, Ruizhe Chen, Yan Zhang, Zhe Xu, Yao Hu, Jian Wu, Zuozhu Liu, 2025, ArXiv)
- ChatGPT与DeepSeek-R1比较研究:架构、推理能力与应用场景分析 (A Comparative Study of ChatGPT and DeepSeek-R1: Analysis of Architecture, Reasoning Capabilities, and Application Scenarios)(李昌奎, 2025, Theory and Practice of Social Science)
- Unlocking the Wisdom of Large Language Models: An Introduction to The Path to Artificial General Intelligence(Edward Y. Chang, 2024, ArXiv)
- Evaluating the Performance of Large Language Models (LLMs) Through Grid-Based Game Competitions: An Extensible Benchmark and Leaderboard on the Path to Artificial General Intelligence (AGI)(Oguzhan Topsakal, 2025, The Journal of Cognitive Systems)
- Machine Number Sense: A Dataset of Visual Arithmetic Problems for Abstract and Relational Reasoning(Wenhe Zhang, Chi Zhang, Yixin Zhu, Song-Chun Zhu, 2020, ArXiv)
- Artificial intelligence chain-of-thought reasoning in nuanced medical scenarios: mitigation of cognitive biases through model intransigence.(Jonathan Wang, Donald A Redelmeier, 2025, BMJ quality & safety)
- Economic reasoning and artificial intelligence.(David C Parkes, Michael P Wellman, 2015, Science (New York, N.Y.))
- LexGenius: An Expert-Level Benchmark for Large Language Models in Legal General Intelligence(Wenjin Liu, Haoran Luo, Xinyu Feng, Xiang Ji, Lijuan Zhou, Rui Mao, Jiapu Wang, Shirui Pan, Erik Cambria, 2025, ArXiv)
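The reinforcement-learning recipe reported for DeepSeek-R1 (GRPO) replaces a learned value network with group-relative reward normalization. The sketch below is an illustrative simplification only: the real objective also includes a clipped policy ratio and a KL penalty, and `group_relative_advantages` is a hypothetical helper name, not an API from the cited work.

```python
from statistics import mean, stdev

def group_relative_advantages(rewards):
    """Normalize outcome rewards within a group of sampled responses.

    GRPO-style training samples several candidate answers to the same
    prompt, scores each with a verifiable outcome reward, and uses each
    answer's reward relative to the group mean (in units of the group's
    standard deviation) as its advantage.
    """
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 1.0
    if sigma == 0:
        sigma = 1.0  # all rewards equal: every advantage is zero
    return [(r - mu) / sigma for r in rewards]

# Example: 4 sampled answers, two of which got the verifiable answer right.
advs = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Correct answers receive positive advantages and incorrect ones negative, so the policy gradient pushes probability mass toward reasoning traces that end in verified answers.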
Clinical Reasoning and Smart-Healthcare Applications in Medicine
Examines AI performance in clinical settings, including diagnostic accuracy, clinical decision-making processes, performance on medical board examinations, nursing care systematization, and the construction of generalist medical artificial intelligence (GMAI) frameworks.
- Exploring the interplay of clinical reasoning and artificial intelligence in psychiatry: Current insights and future directions.(Christophe Gauld, Vincent P Martin, Hugo Bottemanne, Pierre Fourneret, Jean-Arthur Micoulaud-Franchi, Guillaume Dumas, 2024, Psychiatry research)
- Solving gaps in clinical reasoning is the cure to neurophobia in artificial intelligence.(Ethan Meltzer, 2025, Journal of the neurological sciences)
- Clinical Reasoning of a Generative Artificial Intelligence Model Compared With Physicians.(Stephanie Cabral, Daniel Restrepo, Zahir Kanjee, Philip Wilson, Byron Crowe, Raja-Elie Abdulnour, Adam Rodman, 2024, JAMA internal medicine)
- From large language models to artificial general intelligence: Evolution pathways in clinical healthcare(Indraneel Borgohain, 2025, World Journal of Advanced Research and Reviews)
- Artificial Intelligence Clinical Reasoning in Board-Style Clinical Vignettes: A Comparative Study.(Lorela Gjunkshi, Ledio Gjunkshi, Kenneth A Quezada, Nadiya A Persaud, Joseph Braun, 2025, Cureus)
- Is generative artificial intelligence capable of clinical reasoning?(Adam Rodman, Eric J Topol, 2025, Lancet (London, England))
- Humans, machines and decisions: Clinical reasoning in the age of artificial intelligence, evidence-based medicine and Covid-19.(Michael Loughlin, Samantha Marie Copeland, 2021, Journal of evaluation in clinical practice)
- Multi-Label Active Learning-Based Machine Learning Model for Heart Disease Prediction.(Ibrahim M El-Hasnony, Omar M Elzeki, Ali Alshehri, Hanaa Salem, 2022, Sensors (Basel, Switzerland))
- CLINICAL REASONING AND ARTIFICIAL INTELLIGENCE: CAN AI REALLY THINK?(Richard M Schwartzstein, 2024, Transactions of the American Clinical and Climatological Association)
- Artificial intelligence for ventricular arrhythmia capability using ambulatory electrocardiograms(Joseph Barker et al., 2023, European Heart Journal - Digital Health)
- Evaluation and accurate diagnoses of pediatric diseases using artificial intelligence.(Huiying Liang, Brian Y Tsui, Hao Ni, Carolina C S Valentim, Sally L Baxter, Guangjian Liu, Wenjia Cai, Daniel S Kermany, Xin Sun, Jiancong Chen, Liya He, Jie Zhu, Pin Tian, Hua Shao, Lianghong Zheng, Rui Hou, Sierra Hewett, Gen Li, Ping Liang, Xuan Zang, Zhiqi Zhang, Liyan Pan, Huimin Cai, Rujuan Ling, Shuhua Li, Yongwang Cui, Shusheng Tang, Hong Ye, Xiaoyan Huang, Waner He, Wenqing Liang, Qing Zhang, Jianmin Jiang, Wei Yu, Jianqun Gao, Wanxing Ou, Yingmin Deng, Qiaozhen Hou, Bei Wang, Cuichan Yao, Yan Liang, Shu Zhang, Yaou Duan, Runze Zhang, Sarah Gibson, Charlotte L Zhang, Oulan Li, Edward D Zhang, Gabriel Karin, Nathan Nguyen, Xiaokang Wu, Cindy Wen, Jie Xu, Wenqin Xu, Bochu Wang, Winston Wang, Jing Li, Bianca Pizzato, Caroline Bao, Daoman Xiang, Wanting He, Suiqin He, Yugui Zhou, Weldon Haw, Michael Goldbaum, Adriana Tremoulet, Chun-Nan Hsu, Hannah Carter, Long Zhu, Kang Zhang, Huimin Xia, 2019, Nature medicine)
- Foundation models for generalist medical artificial intelligence.(Michael Moor, Oishi Banerjee, Zahra Shakeri Hossein Abad, Harlan M Krumholz, Jure Leskovec, Eric J Topol, Pranav Rajpurkar, 2023, Nature)
- Empowering biomedical discovery with AI agents.(Shanghua Gao, Ada Fang, Yepeng Huang, Valentina Giunchiglia, Ayush Noori, Jonathan Richard Schwarz, Yasha Ektefaie, Jovana Kondic, Marinka Zitnik, 2024, Cell)
- Artificial intelligence for clinical reasoning: the reliability challenge and path to evidence-based practice.(He Xu, Yueqing Wang, Yangqin Xun, Ruitai Shao, Yang Jiao, 2025, QJM : monthly journal of the Association of Physicians)
- [Clinical reasoning, the art of medicine and artificial intelligence].(Stefano Bassetti, Martin C Hirsch, Edouard Battegay, 2024, Deutsche medizinische Wochenschrift (1946))
- Nursing Care Systematization with Case-Based Reasoning and Artificial Intelligence.(Malik Bader Alazzam, Nahla Tayyib, Samar Zuhair Alshawwa, Md Kawser Ahmed, 2022, Journal of healthcare engineering)
- Intuitive Human-Artificial Intelligence Theranostic Complementarity.(J Harvey Turner, 2025, Cancer biotherapy & radiopharmaceuticals)
- Exploring the Role of Artificial Intelligence in Smart Healthcare: A Capability and Function-Oriented Review(Syed Raza Abbas, H. Seol, Zeeshan Abbas, S. Lee, 2025, Healthcare)
- Artificial Intelligence, Machine Learning, and Medicine: A Little Background Goes a Long Way Toward Understanding.(Mark P Cote, James H Lubowitz, Jefferson C Brand, Michael J Rossi, 2021, Arthroscopy : the journal of arthroscopic & related surgery : official publication of the Arthroscopy Association of North America and the International Arthroscopy Association)
- A framework for human evaluation of large language models in healthcare derived from literature review.(Thomas Yu Chow Tam, Sonish Sivarajkumar, Sumit Kapoor, Alisa V Stolyar, Katelyn Polanska, Karleigh R McCarthy, Hunter Osterhoudt, Xizhi Wu, Shyam Visweswaran, Sunyang Fu, Piyush Mathur, Giovanni E Cacciamani, Cong Sun, Yifan Peng, Yanshan Wang, 2024, NPJ digital medicine)
- Artificial Intelligence and Clinical Reasoning-a Way to Walk to Harrison's.(John C Penner, R Jeffrey Kohlwes, 2023, Journal of general internal medicine)
- [Not Available].(Paolo De Angelis, Alice Andalò, Nicola Gentili, Luca Giorgetti, Lorenzo Ridolfi, Roberto Pasolini, Andrea Pagliarani, Martina Cavallucci, Roberto Vespignani, Antonella Carbonaro, 2025, Recenti progressi in medicina)
Adaptive, Continual Learning and Domain-Transfer Capabilities in Machine Learning
Studies the robustness of AI systems in dynamic environments, covering lifelong learning, domain adaptation, active learning, concept-drift detection, and self-regulation mechanisms in data-stream settings.
- Lifelong Self-Adaptation: Self-Adaptation Meets Lifelong Machine Learning(Omid Gheibi, Danny Weyns, 2022, 2022 International Symposium on Software Engineering for Adaptive and Self-Managing Systems (SEAMS))
- Continual Active Learning for Efficient Adaptation of Machine Learning Models to Changing Image Acquisition(Matthias Perkonigg, J. Hofmanninger, G. Langs, 2021, No journal)
- Learning Transferable Features with Deep Adaptation Networks(Mingsheng Long, Yue Cao, Jianmin Wang, Michael I. Jordan, 2015, ArXiv)
- A decentralized adaptation of model-free Q-learning for thermal-aware energy-efficient virtual machine placement in cloud data centers(Alireza Aghasi, K. Jamshidi, Ali Bohlooli, B. Javadi, 2023, Comput. Networks)
- Integrating AI and Machine Learning in Quality Assurance for Automation Engineering(Parameshwar Reddy Kothamali, S. Surya, Mounika Dandyala, Vinod kumar Karne, 2024, International Journal for Research Publication and Seminar)
- Domain Adaptation Based on Semi-Supervised Cross-Domain Mean Discriminative Analysis and Kernel Transfer Extreme Learning Machine.(Xinghai Li, Jianwei Ma, 2023, Sensors (Basel, Switzerland))
- Domain adaptation through active learning strategies for anomaly classification in wastewater treatment plants.(Francesca Bellamoli, Marco Vian, Mattia Di Iorio, Farid Melgani, 2024, Water science and technology : a journal of the International Association on Water Pollution Research)
- Concept-drifts adaptation for machine learning EEG epilepsy seizure prediction.(Edson David Pontes, Mauro Pinto, Fábio Lopes, César Teixeira, 2024, Scientific reports)
- Adaptability and sustainability of machine learning approaches to traffic signal control.(Marcin Korecki, 2022, Scientific reports)
- Position: General Intelligence Requires Reward-based Pretraining(Seungwook Han, Jyothish Pari, Samuel Gershman, Pulkit Agrawal, 2025, No journal)
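The concept-drift work in this cluster typically frames drift detection as monitoring a stream classifier's error rate. A minimal sketch in the spirit of the DDM heuristic follows; the 30-sample warm-up and the 2-sigma/3-sigma thresholds are illustrative defaults, not values from any cited paper.

```python
import math

class DriftMonitor:
    """Flag concept drift when a stream classifier's error rate rises
    well above its historical minimum (a DDM-style heuristic)."""

    WARMUP = 30  # minimum samples before judging drift

    def __init__(self, warn_sigmas=2.0, drift_sigmas=3.0):
        self.n = 0
        self.errors = 0
        self.p_min = math.inf   # error rate at the best point seen so far
        self.s_min = 0.0        # its binomial standard deviation
        self.warn_sigmas = warn_sigmas
        self.drift_sigmas = drift_sigmas

    def update(self, is_error):
        """Feed one prediction outcome; returns 'stable', 'warning', or 'drift'."""
        self.n += 1
        self.errors += int(is_error)
        p = self.errors / self.n
        s = math.sqrt(p * (1 - p) / self.n)
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s   # new best operating point
        if self.n < self.WARMUP:
            return "stable"
        if p + s > self.p_min + self.drift_sigmas * self.s_min:
            return "drift"
        if p + s > self.p_min + self.warn_sigmas * self.s_min:
            return "warning"
        return "stable"
```

On a simulated stream whose error rate jumps from roughly 10% to 100%, the monitor stays stable through the first phase and escalates through "warning" to "drift" shortly after the change point.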
Embodied Intelligence, Physical Reasoning, and Autonomous Agent Systems
Covers AI with physical embodiment or the capacity for autonomous action, including physical-reasoning benchmarks (Phy-Q), robot control, embodied agents (FaGeL), and agentic AI applied to automated experimentation.
- Phy-Q as a measure for physical reasoning intelligence(Cheng Xue, Vimukthini Pinto, C. Gamage, Ekaterina Nikonova, Peng Zhang, Jochen Renz, 2021, Nature Machine Intelligence)
- CRISPR-GPT for agentic automation of gene-editing experiments.(Yuanhao Qu, Kaixuan Huang, Ming Yin, Kanghong Zhan, Dyllan Liu, Di Yin, Henry C Cousins, William A Johnson, Xiaotong Wang, Mihir Shah, Russ B Altman, Denny Zhou, Mengdi Wang, Le Cong, 2026, Nature biomedical engineering)
- Neural Control and Online Learning for Speed Adaptation of Unmanned Aerial Vehicles.(Vatsanai Jaiton, Kongkiat Rothomphiwat, Emad Ebeid, Poramate Manoonpong, 2022, Frontiers in neural circuits)
- Machine Learning Techniques for Increasing Efficiency of the Robot's Sensor and Control Information Processing.(Yuriy Kondratenko, Igor Atamanyuk, Ievgen Sidenko, Galyna Kondratenko, Stanislav Sichevskyi, 2022, Sensors (Basel, Switzerland))
- Visual Positioning of Nasal Swab Robot Based on Hierarchical Decision(Guozhi Li, Shuizhong Zou, S. Ding, 2023, Journal of Shanghai Jiaotong University (Science))
- FaGeL: Fabric LLMs Agent empowered Embodied Intelligence Evolution with Autonomous Human-Machine Collaboration(Jia Liu, Min Chen, 2024, ArXiv)
- From the logic of coordination to goal-directed reasoning: the agentic turn in artificial intelligence(Tsehaye Haidemariam, 2026, Frontiers in Artificial Intelligence)
- From Mind to Machine: The Rise of Manus AI as a Fully Autonomous Digital Agent(Minjie Shen, Qikai Yang, Yanshu Li, 2025, ArXiv)
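The agentic systems surveyed here share a common perceive-reason-act loop. The sketch below is a generic illustration of that loop only; the function names, the `policy` callable, and the tool registry are hypothetical and do not reproduce the architecture of FaGeL, Manus, or any other cited system.

```python
def run_agent(policy, tools, observe, goal, max_steps=10):
    """Generic agentic loop: observe, let the policy choose a tool call
    (or finish), execute the tool, and feed the result back as context."""
    history = [("goal", goal)]
    for _ in range(max_steps):
        obs = observe()
        action, arg = policy(history, obs)   # e.g. ("search", "query text")
        if action == "finish":
            return arg                       # final answer
        result = tools[action](arg)          # execute the chosen tool
        history.append((action, result))     # result becomes new context
    return None                              # step budget exhausted

# Toy usage: one calculator tool and a scripted two-step policy.
def scripted_policy(history, obs):
    for action, value in history:
        if action == "calc":
            return ("finish", value)         # second step: report the result
    return ("calc", "2+2")                   # first step: call the tool

answer = run_agent(scripted_policy,
                   tools={"calc": lambda expr: eval(expr)},  # toy only
                   observe=lambda: None,
                   goal="compute 2+2")
```

In real agentic systems the scripted policy is replaced by an LLM that emits tool calls, and `observe` returns sensor or environment state; the control flow, however, stays this simple.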
Building Organizational AI Capability, Industry Enablement, and the Automation of Scientific Research
Examines how AI capability translates into organizational performance and creativity, analyzes its deployment in industries such as supply chains, law, and transportation, and considers how AI as a "Scientist 2.0" is reshaping the research paradigm.
- Artificial intelligence capability and organizational performance: unraveling the mediating mechanisms of decision-making processes(Suheil Neiroukh, Okechukwu Lawrence Emeagwali, Hasan Yousef Aljuhmani, 2024, Management Decision)
- Artificial Intelligence Capability and Firm Performance: A Sustainable Development Perspective by the Mediating Role of Data-Driven Culture(Samuel Fosso Wamba, M. Queiroz, Ilias O. Pappas, Yulia Sullivan, 2024, Information Systems Frontiers)
- Artificial intelligence capability: Conceptualization, measurement calibration, and empirical study on its impact on organizational creativity and firm performance(Patrick Mikalef, Manjul Gupta, 2021, Inf. Manag.)
- Integrating creativity and artificial intelligence capability in entrepreneurial ventures(Cristina Doritta Brandão Majorana, Sílvio Luís de Vasconcellos, F. Borini, 2025, Journal of Small Business and Enterprise Development)
- Generative artificial intelligence in supply chain and operations management: a capability-based framework for analysis and implementation(Ilya Jackson, Dmitry A. Ivanov, Alexandre Dolgui, Jafar Namdar, 2024, International Journal of Production Research)
- Editor's view: What makes science successful?(Igor Rudan, 2025, Journal of global health)
- Machine Learning for Biological Design.(Tom Blau, Iadine Chades, Cheng Soon Ong, 2024, Methods in molecular biology (Clifton, N.J.))
- 人工智能技术赋能高校机械专业职教师资培养路径探究 (Exploring AI-Enabled Pathways for Training Vocational-Education Teachers in University Mechanical Engineering Programs)(朋仝 于, 尚峰 吴, 2026, 教育发展探索)
- 基于科学计量的世界人工智能领域发展状况分析 (Analysis of Development Status of World Artificial Intelligence Based on Scientific Measurement)(Yue Li, Cheng Su, Jia Jia, Zhenchi Xu, Ruiqiang Tian, 2017, 计算机科学)
- Soft-HGRNs: soft hierarchical graph recurrent networks for multi-agent partially observable environments(Yixiang Ren, Zhenhui Ye, Yining Chen, Xiaohong Jiang, Guang-hua Song, 2023, Frontiers of Information Technology & Electronic Engineering)
Foundational Neural Network Architectures, Feature Representation, and Computational Intelligence Methodology
Centers on low-level technical implementation, including convolutional neural networks (CNNs), autoencoders, feature-extraction techniques, information field theory, and a range of computational-intelligence optimization algorithms.
- 基于神经网络的异构网络向量化表示方法 (Vectorized Representation of Heterogeneous Network Based on Neural Networks)(Weizu Wu, Liqun Liu, Dongqing Xie, 2017, 计算机科学)
- 基于卷积神经网络的压缩感知重构算法优化 (Optimization of Compressed Sensing Reconstruction Algorithms Based on Convolutional Neural Network)(Yuhong Liu, Shuying Liu, Fuxiang Fu, 2020, 计算机科学)
- 基于多分类器协同学习的卷积神经网络训练算法 (CNN Training Algorithm Based on Co-studying of Multiple Classifiers)(Wen Chen, Enyang Zhang, Yong Zhao, 2016, 计算机科学)
- Two-Stream Auto-Encoder Network for Unsupervised Skeleton-Based Action Recognition(Gangsheng Wang, Yaonan Guan, Dewei Li, 2023, Journal of Shanghai Jiaotong University (Science))
- Person Re-Identification Based on Spatial Feature Learning and Multi-Granularity Feature Fusion(Zijian Diao, Shuai Cao, Wenwei Li, Jianan Liang, Guilin Wen, Weixi Huang, Shouming Zhang, 2023, Journal of Shanghai Jiaotong University (Science))
- Superiority of a Convolutional Neural Network Model over Dynamical Models in Predicting Central Pacific ENSO(Tingyu Wang, Ping Huang, 2023, Advances in Atmospheric Sciences)
- 一种基于卷积神经网络深度学习的人体行为识别方法 (Method on Human Activity Recognition Based on Convolutional Neural Networks)(Zhongmin Wang, Hongjiang Cao, Lin Fan, 2016, 计算机科学)
- Explore brain-inspired machine intelligence for connecting dots on graphs through holographic blueprint of oscillatory synchronization(Tingting Dan, Jiaqi Ding, Guorong Wu, 2025, Nature Communications)
- Information Field Theory and Artificial Intelligence.(Torsten Enßlin, 2022, Entropy (Basel, Switzerland))
- Computational intelligence based machine learning methods for rule-based reasoning in computer vision applications(T. Dhivyaprabha, P. Subashini, M. Krishnaveni, 2016, 2016 IEEE Symposium Series on Computational Intelligence (SSCI))
- Computational Sensing, Understanding, and Reasoning: An Artificial Intelligence Approach to Physics-Informed World Modeling(B. Moya, Alberto Badías, D. González, F. Chinesta, Elías Cueto, 2023, Archives of Computational Methods in Engineering)
- 融合多任務學習類神經網路聲學模型訓練於會議語音辨識之研究(Leveraging Multi-task Learning with Neural Network Based Acoustic Modeling for Improved Meeting Speech Recognition) [In Chinese](Ming-Han Yang, Yao-Chi Hsu, Hsiao-Tsung Hung, Ying-Wen Chen, Berlin Chen, Kuan-Yu Chen, 2016)
- 一种基于CRO的高阶神经网络多示例学习方法 (Multiple-instance Learning Method Based on CRO High Order Neural Networks)(Bo Deng, Yingjun Lu, Ruzhi Wang, 2017, 计算机科学)
- Trusted Storage Architecture for Machine Reasoning based on Blockchain(Yichuan Wang, Rui Fan, Xinyue Yin, Xinhong Hei, 2022, IEEE INFOCOM 2022 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS))
- Machine Learning for Wireless Sensor Networks Security: An Overview of Challenges and Issues.(Rami Ahmad, Raniyah Wazirali, Tarik Abu-Ain, 2022, Sensors (Basel, Switzerland))
- Image quality assessment for machine learning tasks using meta-reinforcement learning.(Shaheer U Saeed, Yunguan Fu, Vasilis Stavrinides, Zachary M C Baum, Qianye Yang, Mirabela Rusu, Richard E Fan, Geoffrey A Sonn, J Alison Noble, Dean C Barratt, Yipeng Hu, 2022, Medical image analysis)
- Machine learning and logic: a new frontier in artificial intelligence(Vijay Ganesh, S. Seshia, S. Jha, 2022, Formal Methods in System Design)
- 一种基于前向无监督卷积神经网络的人脸表示学习方法 (Forward and Unsupervised Convolutional Neural Network Based Face Representation Learning Method)(Tao Zhu, Haijun Ren, Weijun Hong, 2016, 计算机科学)
- 复杂网络上多智能体系统的一致性研究 (Research of Consensus in Multi-agent Systems on Complex Network)(Sen Zhang, Wenqi Liu, Ningbo Zhao, 2019, 计算机科学)
- A Structural Formula Process Neural Networks and Its Applications(Shaohua Xu, Xingui He, Bing Wang, 2006, J. Comput. Res. Dev.)
- 基于深度神经网络的语音识别系统研究 (Speech Recognition System Based on Deep Neural Network)(Weilin Li, Jian Wen, Wenkai Ma, 2016, 计算机科学)
- 一种考虑等级语义关联的证据推理决策方法 (Decision Making Approach Based on Evidential Reasoning Considering SemanticRelationship among Assessment Grades)(Meijing Zhang, Yingming Wang, 2018, 计算机科学)
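Many of the CNN entries above rest on the same primitive: a learned kernel slid over an input. A minimal sketch of valid-mode 2D cross-correlation, the feature-extraction core of a CNN layer (single channel, stride 1, no padding; `conv2d_valid` is an illustrative name):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Valid-mode 2D cross-correlation: slide the kernel over every
    position where it fits entirely inside the image and take the
    elementwise-product sum at each position."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge detector on a step image: the response peaks at the edge.
img = np.array([[0, 0, 1, 1]] * 3, dtype=float)
k = np.array([[-1.0, 1.0]])
edges = conv2d_valid(img, k)
```

Production CNN layers add multiple channels, padding, stride, and learned kernel weights, but each output activation is computed exactly as in the inner loop here.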
The final merged grouping comprehensively covers the research landscape of AI capability: from low-level neural network architectures and feature-representation techniques, through mid-level mechanisms of logical reasoning, adaptive learning, and embodied intelligence, to the high-level vision of artificial general intelligence (AGI). Beyond examining AI's deep application capabilities in key sectors such as healthcare, scientific research, and supply chains, the report analyzes, from the perspectives of organizational behavior and management science, how AI capability drives firm performance, building a complete capability-evaluation framework that spans theoretical foundations, technical implementation, and socio-organizational application.
115 related papers in total
The rapid development of artificial intelligence has driven the continuous advancement of large language models (LLMs). Among them, OpenAI's ChatGPT and DeepSeek-AI's DeepSeek-R1 have garnered significant attention. ChatGPT, built upon the GPT-4 architecture, demonstrates strong natural language understanding and wide-ranging applications, whereas DeepSeek-R1 leverages reinforcement learning techniques to optimize reasoning capabilities, excelling in mathematical reasoning and programming tasks. This paper, based on the latest research on DeepSeek-R1, provides a comprehensive comparison between ChatGPT and DeepSeek-R1 in terms of model architecture, training methods, reasoning capabilities, application scenarios, and openness. The study reveals that ChatGPT relies on supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF), making it highly effective in natural language processing tasks. In contrast, DeepSeek-R1 emphasizes reinforcement learning to enhance reasoning abilities, particularly excelling in mathematical reasoning and code generation tasks. Moreover, ChatGPT follows a closed-source approach, primarily for commercial use, while DeepSeek-R1 adopts an open-source model, offering greater flexibility for researchers and developers. This study provides valuable insights for AI researchers and developers, contributing to the advancement of LLM technology and future model optimization strategies.
Abstract Artificial intelligence (AI) has been heralded by many as the next source of business value. Grounded on the resource-based theory of the firm and on recent work on AI at the organizational context, this study (1) identifies the AI-specific resources that jointly create an AI capability and provides a definition, (2) develops an instrument to capture the AI capability of the firms, and (3) examines the relationship between an AI capability and organizational creativity and performance. Findings empirically support the suggested theoretical framework and corresponding instrument and provide evidence that an AI capability results in increased organizational creativity and performance.
No abstract available
Purpose: This study investigates the profound impact of artificial intelligence (AI) capabilities on decision-making processes and organizational performance, addressing a crucial gap in the literature by exploring the mediating role of decision-making speed and quality.
Design/methodology/approach: Drawing upon resource-based theory and prior research, this study constructs a comprehensive model and hypotheses to illuminate the influence of AI capabilities within organizations on decision-making speed, decision quality, and, ultimately, organizational performance. A dataset comprising 230 responses from diverse organizations forms the basis of the analysis, with the study employing a partial least squares structural equation model (PLS-SEM) for robust data examination.
Findings: The results demonstrate the pivotal role of AI capabilities in shaping organizational decision-making processes and performance. AI capability significantly and positively affects decision-making speed, decision quality, and overall organizational performance. Notably, decision-making speed is a critical factor contributing significantly to enhanced organizational performance. The study further uncovered partial mediation effects, suggesting that decision-making processes partially mediate the relationship between AI capabilities and organizational performance through decision-making speed.
Originality/value: This study contributes to the existing body of literature by providing empirical evidence of the multifaceted impact of AI capabilities on organizational decision-making and performance. Elucidating the mediating role of decision-making processes advances our understanding of the complex mechanisms through which AI capabilities drive organizational success.
Purpose: While the literature on artificial intelligence (AI) capability is expanding, gaps remain in understanding how this capability is internally developed in technology-based startups (TBS) across different life cycle phases. This study, grounded in the resource orchestration theory (ROT), investigates the pathway through which TBS use organizational creativity to build AI capability and achieve performance.
Design/methodology/approach: A conceptual framework based on ROT emphasizes the role of organizational creativity in the structuring and bundling processes. Data were collected through a survey of 166 managers and employees of TBS operating in Brazil and international markets, using multiple linear regressions and the Sobel test for analysis. The study validated the AI capability scale in the TBS context.
Findings: AI capability fully mediates the relationship between organizational creativity and performance, confirming that organizational creativity is a critical resource for AI capability development. These findings advance ROT by deepening the understanding of how AI capability is developed in TBS. The study offers a dynamic, process-based view of performance trajectories in TBS, demonstrating that the synchrony between creativity and AI capability creates a cyclical process, maximizing company performance.
Originality/value: This research identifies an alternative pathway for TBS to develop AI capability and achieve performance, highlighting the synchronization and co-evolution of resources and capabilities. It provides novel insights into AI capability's mediating role and expands understanding of resource management in TBS across life cycle phases.
No abstract available
Artificial Intelligence (AI) is transforming smart healthcare by enhancing diagnostic precision, automating clinical workflows, and enabling personalized treatment strategies. This review explores the current landscape of AI in healthcare from two key perspectives: capability types (e.g., Narrow AI and AGI) and functional architectures (e.g., Limited Memory and Theory of Mind). Based on capabilities, most AI systems today are categorized as Narrow AI, performing specific tasks such as medical image analysis and risk prediction with high accuracy. More advanced forms like Artificial General Intelligence (AGI) and Superintelligent AI remain theoretical but hold transformative potential. From a functional standpoint, Limited Memory AI dominates clinical applications by learning from historical patient data to inform decision-making. Reactive systems are used in rule-based alerts, while Theory of Mind (ToM) and Self-Aware AI remain conceptual stages for future development. This dual perspective provides a comprehensive framework for assessing the maturity, impact, and future direction of AI in healthcare. It also highlights the need for ethical design, transparency, and regulation, informed by cross-domain AI insights, as AI systems grow more complex and autonomous. Moreover, we evaluate the viability of developing AGI within regionally specific legal and regulatory frameworks, using South Korea as a case study to emphasize the limitations imposed by infrastructural preparedness and medical data governance regulations.
This research examines the transformative potential of artificial intelligence (AI) in general and Generative AI (GAI) in particular in supply chain and operations management (SCOM). Through the lens of the resource-based view and based on key AI capabilities such as learning, perception, prediction, interaction, adaptation, and reasoning, we explore how AI and GAI can impact 13 distinct SCOM decision-making areas. These areas include but are not limited to demand forecasting, inventory management, supply chain design, and risk management. With its outcomes, this study provides a comprehensive understanding of AI and GAI's functionality and applications in the SCOM context, offering a practical framework for both practitioners and researchers. The proposed framework systematically identifies where and how AI and GAI can be applied in SCOM, focussing on decision-making enhancement, process optimisation, investment prioritisation, and skills development. Managers can use it as a guidance to evaluate their operational processes and identify areas where AI and GAI can deliver improved efficiency, accuracy, resilience, and overall effectiveness. The research underscores that AI and GAI, with their multifaceted capabilities and applications, open a revolutionary potential and substantial implications for future SCOM practices, innovations, and research.
Aim: Current clinical practice guidelines for implantable cardioverter defibrillators (ICDs) are insufficiently accurate for ventricular arrhythmia (VA) risk stratification, leading to significant morbidity and mortality. Artificial intelligence offers a novel risk-stratification lens through which VA capability can be determined from the electrocardiogram in normal sinus rhythm. The aim was to develop and test a deep neural network for VA risk stratification using routinely collected ambulatory electrocardiograms. Methods: A multicentre case-control study was undertaken to assess VA-ResNet-50, our open-source ResNet-50-based deep neural network. VA-ResNet-50 was designed to read pyramid samples of 3-lead, 24-hour ambulatory electrocardiograms and decide whether a heart is capable of VA from the electrocardiogram alone. Consecutive adults with VA from the East Midlands, UK, who had ambulatory electrocardiograms as part of their NHS care between 2014 and 2022 were recruited and compared with all-comer ambulatory electrocardiograms without VA. Results: Of 270 patients, a heterogeneous group of 159 had the composite VA outcome. The mean time difference between the electrocardiogram and VA was 1.6 years (one third of the ambulatory electrocardiograms were recorded before the VA). The deep neural network classified electrocardiograms for VA capability with an accuracy of 0.76 (95% CI 0.66 - 0.87), F1 score of 0.79 (0.67 - 0.90), AUC of 0.80 (0.67 - 0.91), and RR of 2.87 (1.41 - 5.81). Conclusion: Ambulatory electrocardiograms confer risk signals for VA risk stratification when analysed with VA-ResNet-50. Pyramid sampling from the ambulatory electrocardiograms is hypothesised to capture autonomic activity. We encourage groups to build on this open-source model.
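The accuracy, F1, and risk ratio reported above are standard functions of a 2x2 confusion matrix. A small sketch with hypothetical counts, not the study's actual data:

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, F1 score, and risk ratio (RR) from a 2x2 confusion matrix.

    tp/fp/fn/tn: true-positive, false-positive, false-negative, true-negative counts.
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    # RR: event rate among test-positives vs. event rate among test-negatives
    rr = (tp / (tp + fp)) / (fn / (fn + tn))
    return accuracy, f1, rr

# Hypothetical counts for illustration only:
acc, f1, rr = classification_metrics(tp=120, fp=30, fn=35, tn=85)
```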
In both neuroscience and AI, neural coupling gives rise to dynamic oscillatory patterns that encode abstract concepts. We therefore hypothesize that a deeper understanding of the neural mechanisms that determine brain rhythms could inspire next-generation design principles for machine learning algorithms, leading to greater efficiency and robustness. Following this notion, we first model the evolving brain rhythm by the interference between spontaneously synchronized neural oscillations (termed HoloBrain). The success of modeling brain rhythms via an artificial dynamic system of coupled oscillations gives rise to a “first principle” for emerging brain-inspired machine intelligence through the common mechanism of synchronization (termed HoloGraph), enabling graph neural networks (GNNs) to move beyond conventional heat-diffusion paradigms toward modeling oscillatory synchronization. Our HoloGraph not only effectively addresses the over-smoothing issue in GNNs but also manifests the potential of reasoning and solving challenging problems on graphs. Neural coupling is a challenge in understanding both brain function and advancing machine intelligence. Here, the authors introduce HoloBrain and HoloGraph, a brain-inspired framework that models oscillatory synchronization to overcome limitations of graph neural networks and enable more efficient, robust learning.
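The "spontaneously synchronized neural oscillations" that HoloBrain models can be illustrated with the classic Kuramoto model of coupled oscillators; the paper's actual formulation may well differ, so this is only a generic sketch of the synchronization mechanism:

```python
import math
import random

def kuramoto_step(phases, omegas, K, dt):
    """One Euler step of the Kuramoto model:
    d(theta_i)/dt = omega_i + (K/N) * sum_j sin(theta_j - theta_i)
    """
    n = len(phases)
    new = []
    for th, om in zip(phases, omegas):
        coupling = sum(math.sin(tj - th) for tj in phases) / n
        new.append(th + dt * (om + K * coupling))
    return new

def order_parameter(phases):
    """Kuramoto order parameter r in [0, 1]; 1 means full phase synchronization."""
    n = len(phases)
    re = sum(math.cos(th) for th in phases) / n
    im = sum(math.sin(th) for th in phases) / n
    return math.hypot(re, im)

random.seed(0)
n = 50
phases = [random.uniform(0, 2 * math.pi) for _ in range(n)]  # random initial phases
omegas = [random.gauss(1.0, 0.1) for _ in range(n)]          # similar natural frequencies
for _ in range(2000):
    phases = kuramoto_step(phases, omegas, K=2.0, dt=0.01)
# With coupling K well above critical, the oscillators lock and r approaches 1
```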
No abstract available
No abstract available
Recent advancements in Large Language Models (LLMs) have enhanced the reasoning capabilities of embodied agents, driving progress toward AGI-powered robotics. While LLMs have been applied to tasks like semantic reasoning and task generalization, their potential in open physical space exploration remains underexplored. This paper introduces FaGeL (Fabric aGent empowered by embodied intelligence with LLMs), an embodied agent integrating smart fabric technology for seamless, non-intrusive human-agent interaction. FaGeL autonomously generates tasks using multimodal data from wearable and ambient sensors, refining its behavior based on implicit human feedback in generated text, without explicit ratings or preferences. We also introduce a token-level saliency map to visualize LLM fine-tuning, enhancing the interpretability of token-level alignment. The system leverages dual feedback mechanisms to improve token-level alignment and addresses challenges in non-intrusive human-machine interaction and cognition evolution. Our contributions include FaGeL's development, the DualCUT algorithm for AI alignment, and experimental validation in cooperative tasks, demonstrating FaGeL's ability to adapt and evolve autonomously through implicit feedback. In the future, we plan to explore FaGeL's scalability in dynamic environments and its integration with other AI systems to develop AGI agents that adapt seamlessly to diverse human needs.
Human-machine Collaborative Decision-making: An Evolutionary Roadmap Based on Cognitive Intelligence
No abstract available
We argue that a key reasoning skill that any advanced AI, say GPT-4, should master in order to qualify as 'thinking machine', or AGI, is hypothetic-deductive reasoning. Problem-solving or question-answering can quite generally be construed as involving two steps: hypothesizing that a certain set of hypotheses T applies to the problem or question at hand, and deducing the solution or answer from T - hence the term hypothetic-deductive reasoning. An elementary proxy of hypothetic-deductive reasoning is causal reasoning. We propose simple tests for both types of reasoning, and apply them to ChatGPT. Our study shows that, at present, the chatbot has a limited capacity for either type of reasoning, as soon as the problems considered are somewhat complex. However, we submit that if an AI would be capable of this type of reasoning in a sufficiently wide range of contexts, it would be an AGI.
Humans are well versed in reasoning about the behaviours of physical objects and choosing actions accordingly to accomplish tasks, while this remains a major challenge for artificial intelligence. To facilitate research addressing this problem, we propose a new testbed that requires an agent to reason about physical scenarios and take an action appropriately. Inspired by the physical knowledge acquired in infancy and the capabilities required for robots to operate in real-world environments, we identify 15 essential physical scenarios. We create a wide variety of distinct task templates, and we ensure that all the task templates within the same scenario can be solved by using one specific strategic physical rule. By having such a design, we evaluate two distinct levels of generalization, namely local generalization and broad generalization. We conduct an extensive evaluation with human players, learning agents with various input types and architectures, and heuristic agents with different strategies. Inspired by how the human intelligence quotient is calculated, we define the physical reasoning quotient (Phy-Q score) that reflects the physical reasoning intelligence of an agent using the physical scenarios we considered. Our evaluation shows that (1) all the agents are far below human performance, and (2) learning agents, even with good local generalization ability, struggle to learn the underlying physical reasoning rules and fail to generalize broadly. We encourage the development of intelligent agents that can reach the human-level Phy-Q score. When it comes to reasoning about the motion of physical objects, humans have natural intuitive physics knowledge. To test how good artificial learning agents are in similar predictive abilities, Xue and colleagues present a benchmark based on a two-dimensional physics environment in which 15 physical reasoning skills are measured.
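The abstract notes that the Phy-Q score is defined by analogy with how the human intelligence quotient is calculated. Assuming a standard IQ-style deviation normalization (mean 100, SD 15 over a reference population of agents; the paper's exact formula may differ), a sketch might look like:

```python
import statistics

def phy_q_like_score(agent_scores, population_scores, mean_iq=100, sd_iq=15):
    """IQ-style normalization of per-scenario pass rates (illustrative only).

    agent_scores: pass rate of the evaluated agent in each physical scenario
    population_scores: per-scenario pass-rate lists for a reference population
    of agents. Each scenario score is z-normalized against the population,
    then the mean z is rescaled to the familiar IQ scale (mean 100, SD 15).
    """
    zs = []
    for i, s in enumerate(agent_scores):
        pop = [p[i] for p in population_scores]
        mu, sigma = statistics.mean(pop), statistics.pstdev(pop)
        zs.append((s - mu) / sigma if sigma > 0 else 0.0)
    return mean_iq + sd_iq * statistics.mean(zs)

# An agent exactly at the population mean in every scenario scores 100:
score = phy_q_like_score([0.3, 0.5], [[0.2, 0.4], [0.4, 0.6]])
```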
Reasoning stands as a cornerstone of intelligence, enabling the synthesis of existing knowledge to solve complex problems. Despite remarkable progress, existing reasoning benchmarks often fail to rigorously evaluate the nuanced reasoning capabilities required for complex, real-world problem-solving, particularly in multi-disciplinary and multimodal contexts. In this paper, we introduce a graduate-level, multi-disciplinary, English-Chinese benchmark, dubbed Reasoning Bench (R-Bench), for assessing the reasoning capability of both language and multimodal models. R-Bench spans 1,094 questions across 108 subjects for language model evaluation and 665 questions across 83 subjects for multimodal model testing in both English and Chinese. These questions are meticulously curated to ensure rigorous difficulty calibration, subject balance, and cross-linguistic alignment, making the assessment an Olympiad-level multi-disciplinary benchmark. We evaluate widely used models, including OpenAI o1, GPT-4o, DeepSeek-R1, etc. Experimental results indicate that advanced models perform poorly on complex reasoning, especially multimodal reasoning. Even the top-performing model, OpenAI o1, achieves only 53.2% accuracy on our multimodal evaluation. Data and code are made publicly available.
With the urgent demand for interpretability and generalization in AI decision-making, machine reasoning based on production rules provides an auditable scheme for the knowledge-construction process. To address the problems of traditional machine reasoning, such as the lack of trusted storage and reasoning and the difficulty of tracing results, this paper proposes a blockchain-based scheme for trusted storage of the machine-reasoning process. In this scheme, fact and rule storage and a trusted machine-reasoning algorithm are encapsulated in consortium-blockchain smart contracts, realizing trusted reasoning and reliable storage across the whole reasoning process and ensuring the transparency and traceability of rule-logic updates. Index Terms: blockchain, machine reasoning, smart contract, production rule, trusted reasoning.
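Independent of the blockchain layer, machine reasoning over production rules is typically implemented by forward chaining to a fixed point. A minimal sketch, with illustrative facts and rules that are not taken from the paper:

```python
def forward_chain(facts, rules):
    """Forward chaining over production rules.

    facts: set of ground atoms (strings)
    rules: list of (premises, conclusion) pairs; a rule fires when all of its
    premises are in the fact base. Iterates until no new fact can be derived.
    """
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in facts and all(p in facts for p in premises):
                facts.add(conclusion)
                changed = True
    return facts

# Illustrative rule base:
rules = [
    ({"bird", "healthy"}, "can_fly"),
    ({"can_fly"}, "can_travel"),
]
derived = forward_chain({"bird", "healthy"}, rules)
# derived now also contains "can_fly" and "can_travel"
```

In the proposed scheme, each firing of such a rule would additionally be recorded on-chain, making every derivation step traceable.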
No abstract available
No abstract available
As a comprehensive indicator of mathematical thinking and intelligence, the number sense (Dehaene 2011) bridges the induction of symbolic concepts and the competence of problem-solving. To endow such a crucial cognitive ability to machine intelligence, we propose a dataset, Machine Number Sense (MNS), consisting of visual arithmetic problems automatically generated using a grammar model—And-Or Graph (AOG). These visual arithmetic problems are in the form of geometric figures: each problem has a set of geometric shapes as its context and embedded number symbols. Solving such problems is not trivial; the machine not only has to recognize the number, but also to interpret the number with its contexts, shapes, and relations (e.g., symmetry) together with proper operations. We benchmark the MNS dataset using four predominant neural network models as baselines in this visual reasoning task. Comprehensive experiments show that current neural-network-based models still struggle to understand number concepts and relational operations. We show that a simple brute-force search algorithm could work out some of the problems without context information. Crucially, taking geometric context into account by an additional perception module would provide a sharp performance gain with fewer search steps. Altogether, we call for attention in fusing the classic search-based algorithms with modern neural networks to discover the essential number concepts in future research.
No abstract available
Manus AI is a general-purpose AI agent introduced in early 2025, marking a significant advancement in autonomous artificial intelligence. Developed by the Chinese startup Monica.im, Manus is designed to bridge the gap between "mind" and "hand": combining the reasoning and planning capabilities of large language models with the ability to execute complex, end-to-end tasks that produce tangible outcomes. This paper presents a comprehensive overview of Manus AI, exploring its core technical architecture, diverse applications across sectors such as healthcare, finance, manufacturing, robotics, and gaming, as well as its key strengths, current limitations, and future potential. Positioned as a preview of what lies ahead, Manus AI represents a shift toward intelligent agents that can translate high-level intentions into real-world actions, heralding a new era of human-AI collaboration.
Over the last several years of rapid development in artificial intelligence, machine learning has been the mainstream method for realizing it. What people usually call machine learning is largely equivalent to statistical learning, which requires big data and powerful computing. This data-driven trend uses algorithms to fit models with explicit parameters, ignoring causal reasoning in favor of statistics. Machine learning that lacks logical causal reasoning will greatly hinder the advancement of artificial intelligence, and how knowledge-driven causal reasoning can provide new ideas for the field is a question worth pondering for its scholars. Factor space theory, which emphasizes causal reasoning, provides a new perspective and way of thinking for the development of artificial intelligence.
Detection and recording of people's everyday activities by smart wearable devices is critical for health monitoring, assisting persons with disabilities, and caring for the elderly. Most existing research uses a machine-learning-based methodology; however, these approaches frequently suffer from high computational resource consumption, burdensome training-data collection, and limited scalability across contexts. To address these problems, this research proposes a behavior-detection technology based on multi-source sensing and logical reasoning. Drawing on the theory of ontology reasoning from classical artificial intelligence, this work designs a lightweight behavior-recognition solution that naturally fuses signal processing and logical reasoning. Machine-learning techniques are also applied to the same dataset for comparison; after model selection and parameter tuning, the cross-person recognition results are 90.8% and 92.1%, respectively. A behavior-recognition system was built with this technology, and several experiments were run to assess how well it worked. The findings show that the proposed approach achieves over 90% recognition accuracy for 11 daily activities, including jogging, walking, and stair climbing, while dramatically reducing the amount of user-provided training data required compared with machine-learning-based behavior-identification techniques.
No abstract available
No abstract available
No abstract available
This study focuses on a robot vision localization method for the task of automatic nasal swab sampling. Using robots for nasal swab sampling reduces direct contact between medical staff and patients with Corona Virus Disease 2019 (COVID-19), mitigating its negative impact, and is therefore important for COVID-19 detection and epidemic prevention. The method uses a hierarchical decision network to handle the robot's behavioral constraints in light of how COVID-19 spreads, and, drawing on the sampling motions of medical staff, designs a visual navigation and positioning method for nasal swab sampling with a single-arm robot. The decision network accounts for the risk factors of potential contact infection in manual sampling operations so as to minimize the probability of viral transmission between people. On this basis, a visual servo control strategy with artificial-intelligence characteristics is developed to achieve stable and safe robotic nasal swab sampling. Experiments demonstrate that the method achieves good visual positioning for the robot and can provide the necessary technical support for public health prevention and control.
The recent progress in multi-agent deep reinforcement learning (MADRL) makes it more practical in real-world tasks, but its relatively poor scalability and the partially observable constraint raise more challenges for its performance and deployment. Based on the intuitive observation that human society can be regarded as a large-scale partially observable environment in which everyone communicates with neighbors and remembers his or her own experience, we propose a novel network structure called the hierarchical graph recurrent network (HGRN) for multi-agent cooperation under partial observability. Specifically, we construct the multi-agent system as a graph, use a novel graph convolution structure to achieve communication between heterogeneous neighboring agents, and adopt a recurrent unit to enable agents to record historical information. To encourage exploration and improve robustness, we design a maximum-entropy learning method that can learn stochastic policies with a configurable target action entropy. Based on these technologies, we propose a value-based MADRL algorithm called Soft-HGRN and its actor-critic variant, SAC-HGRN. Experimental results on three homogeneous tasks and one heterogeneous environment not only show that our approach achieves clear improvements over four MADRL baselines but also demonstrate the interpretability, scalability, and transferability of the proposed model.
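The "configurable target action entropy" resembles the automatic temperature tuning used in soft actor-critic: the entropy coefficient is raised when the policy is less random than the target and lowered otherwise. A schematic sketch (the paper's exact update rule may differ):

```python
import math

def entropy(probs):
    """Shannon entropy of a discrete action distribution, in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def update_temperature(alpha, policy_entropy, target_entropy, lr=1e-2):
    """SAC-style temperature update: if the policy's entropy is below the
    configured target, alpha grows (pushing toward more exploration);
    if above, alpha shrinks. Clamped at zero."""
    return max(alpha + lr * (target_entropy - policy_entropy), 0.0)

# A nearly deterministic policy drives alpha up toward the entropy target:
h = entropy([0.97, 0.01, 0.01, 0.01])
alpha = update_temperature(alpha=0.2, policy_entropy=h, target_entropy=1.0)
```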
Objective: Brain-inspired computing refers to devices, models, and methods that simulate, emulate, and draw on the structure and information-processing mechanisms of the brain's neural networks, with the goal of building brain-like computers and brain-like intelligence. Methods: Research on brain-inspired computing spans more than 20 years. This paper reviews domestic and international progress and open challenges in neuromorphic devices that emulate biological neurons and synapses, neural-network chips, and brain-inspired computing models and applications, and looks ahead to future trends. Results: Unlike the classical AI routes of symbolism, connectionism, behaviorism, and the statistical approach of machine learning, brain-inspired computing follows an emulation approach: imitating the brain at the structural level (non-von Neumann architectures), approximating the brain at the device level (neuromorphic devices emulating neurons and synapses), and surpassing the brain at the intelligence level (relying mainly on autonomous learning and training rather than manual programming). Conclusion: Brain-inspired computing is still far from industrial application, which presents researchers with important research directions and opportunities.
No abstract available
No abstract available
This paper presents an interdisciplinary framework, Machine Psychology, which integrates principles from operant learning psychology with a particular Artificial Intelligence model, the Non-Axiomatic Reasoning System (NARS), to advance Artificial General Intelligence (AGI) research. Central to this framework is the assumption that adaptation is fundamental to both biological and artificial intelligence, and can be understood using operant conditioning principles. The study evaluates this approach through three operant learning tasks using OpenNARS for Applications (ONA): simple discrimination, changing contingencies, and conditional discrimination tasks. In the simple discrimination task, NARS demonstrated rapid learning, achieving 100% correct responses during training and testing phases. The changing contingencies task illustrated NARS’s adaptability, as it successfully adjusted its behavior when task conditions were reversed. In the conditional discrimination task, NARS managed complex learning scenarios, achieving high accuracy by forming and utilizing complex hypotheses based on conditional cues. These results validate the use of operant conditioning as a framework for developing adaptive AGI systems. NARS’s ability to function under conditions of insufficient knowledge and resources, combined with its sensorimotor reasoning capabilities, positions it as a robust model for AGI. The Machine Psychology framework, by implementing aspects of natural intelligence such as continuous learning and goal-driven behavior, provides a scalable and flexible approach for real-world applications. Future research should explore using enhanced NARS systems, more advanced tasks and applying this framework to diverse, complex tasks to further advance the development of human-level AI.
Can machines truly think, reason and act in domains like humans? This enduring question continues to shape the pursuit of Artificial General Intelligence (AGI). Despite the growing capabilities of models such as GPT-4.5, DeepSeek, Claude 3.5 Sonnet, Phi-4, and Grok 3, which exhibit multimodal fluency and partial reasoning, these systems remain fundamentally limited by their reliance on token-level prediction and lack of grounded agency. This paper offers a cross-disciplinary synthesis of AGI development, spanning artificial intelligence, cognitive neuroscience, psychology, generative models, and agent-based systems. We analyze the architectural and cognitive foundations of general intelligence, highlighting the role of modular reasoning, persistent memory, and multi-agent coordination. In particular, we emphasize the rise of Agentic RAG frameworks that combine retrieval, planning, and dynamic tool use to enable more adaptive behavior. We discuss generalization strategies, including information compression, test-time adaptation, and training-free methods, as critical pathways toward flexible, domain-agnostic intelligence. Vision-Language Models (VLMs) are reexamined not just as perception modules but as evolving interfaces for embodied understanding and collaborative task completion. We also argue that true intelligence arises not from scale alone but from the integration of memory and reasoning: an orchestration of modular, interactive, and self-improving components where compression enables adaptive behavior. Drawing on advances in neurosymbolic systems, reinforcement learning, and cognitive scaffolding, we explore how recent architectures begin to bridge the gap between statistical learning and goal-directed cognition. Finally, we identify key scientific, technical, and ethical challenges on the path to AGI.
Artificial General Intelligence (AGI) is an area of artificial intelligence research that aspires to create machines that can perform any intellectual task that a human can do. Unlike Artificial Narrow Intelligence (ANI), which is domain-specific, AGI seeks to replicate human-like adaptability, reasoning, and creativity. This paper provides a critical overview of the historical development of AGI, evaluates ongoing projects and initiatives, examines its potential applications across multiple sectors, and discusses associated ethical and governance challenges. The study identifies technological and societal gaps, highlights risks, and proposes a phased roadmap toward the development of AGI. The analysis emphasizes the importance of integrating ethical frameworks, interdisciplinary collaboration, and responsible innovation in shaping the AGI path.
This article examines the trajectory and challenges of evolving current large language models (LLMs) toward artificial general intelligence (AGI) capabilities within clinical healthcare environments. The text analyzes the gaps between contemporary LLMs' pattern recognition abilities and the robust reasoning, causal understanding, and contextual adaptation required for true medical AGI. Through a systematic review of current clinical applications and limitations of LLMs, the article identifies three critical areas requiring advancement: dynamic integration of multi-modal medical data streams, consistent medical reasoning across novel scenarios, and autonomous learning from clinical interactions while maintaining safety constraints. A novel architectural framework is proposed that combines LLM capabilities with symbolic reasoning, causal inference, and continual learning mechanisms specifically designed for clinical environments. The article suggests that while LLMs provide a promising foundation, achieving AGI in clinical systems requires fundamental breakthroughs in areas including knowledge representation, uncertainty quantification, and ethical decision-making. The article concludes by outlining a roadmap for research priorities and safety considerations essential for progressing toward clinical AGI while maintaining patient safety and care quality.
Traditionally, the users of an information retrieval (IR) system have been human users. We present a new perspective on IR research in which the users of an IR system are intelligent agents instead of human users. Extending the current work on retrieval-augmented generation (RAG), we identify five novel IR tasks that an intelligent agent must be able to perform in order to achieve Human-Level Artificial Intelligence, or Artificial General Intelligence (AGI), including 1) External Information Retrieval (EIR) to access new information unseen by the agent, 2) Provenance Information Retrieval (PIR) to trace the provenance of information, 3) Curriculum Information Retrieval (CIR) to actively acquire the most useful new data and information for lifelong learning, 4) Rule Information Retrieval (RIR) to perform reasoning and problem solving, and 5) Scenario Information Retrieval (SIR) to leverage past scenarios for problem solving and decision making. We compare these new IR tasks with the traditional IR tasks performed by an IR system that serves human users and systematically examine the challenges involved in the five new IR tasks, providing a roadmap for new IR research within the broader context of AGI development.
Grid-based games, such as Tic-Tac-Toe, Connect-Four, and Gomoku, offer a valuable platform for evaluating large language models (LLMs) in reasoning, rule comprehension, and strategic thinking, which are key skills for advancing Artificial General Intelligence (AGI). Current evaluation benchmarks often focus on tasks like natural language understanding or domain-specific problem-solving, and lack assessments of multi-step reasoning and decision-making. This study introduces an extensible benchmark framework leveraging these games to evaluate LLMs using three prompt types: list, illustration, and image. The framework's modular design facilitates the addition of new games, dynamic rule changes, and advanced prompt engineering techniques, enabling deeper examination of LLM capabilities. Through 2,310 simulated matches, we evaluated leading LLMs, including Claude 3.5 Sonnet, GPT-4 Turbo, and Llama3-70B. Results revealed significant performance variations, with simpler games like Tic-Tac-Toe yielding fewer invalid moves, while more complex games like Connect-Four and Gomoku posed greater challenges. List prompts were generally well-handled, while illustration and image prompts led to higher rates of disqualifications and missed opportunities. The findings underscore the utility of grid-based games as benchmarks for evaluating strategic thinking and adaptability, with implications for robotics, autonomous systems, and interactive AI. Limitations in handling visual data and complex scenarios suggest areas for improvement. The open-source nature of the benchmark encourages transparency and community contributions, fostering collaborative advancements in LLM research. Future directions include expanding to more complex games, refining prompt techniques, and exploring dynamic rule changes to deepen insights into LLM reasoning capabilities.
This study lays the groundwork for advancing AI evaluation through flexible and comprehensive benchmarking tools, guiding progress toward more sophisticated and real-world applications.
This booklet, Unlocking the Wisdom of Multi-LLM Collaborative Intelligence, serves as an accessible introduction to the full volume The Path to Artificial General Intelligence. Through fourteen aphorisms, it distills the core principles of Multi-LLM Agent Collaborative Intelligence (MACI), a framework designed to coordinate multiple LLMs toward reasoning, planning, and decision-making that surpasses the capabilities of any single model. The booklet includes titles, abstracts, and introductions from each main chapter, along with the full content of the first two. The newly released third edition features significant enhancements to Chapters 6 through 9 and a revised preface responding to Yann LeCun's critique of AGI feasibility. While LeCun argues that LLMs lack grounding, memory, and planning, we propose that MACI's collaborative architecture, featuring multimodal agents in executive, legislative, and judicial roles, directly addresses these limitations. Chapters on SocraSynth, EVINCE, consciousness modeling, and behavior regulation demonstrate that reasoning systems grounded in structured interaction and checks and balances can produce more reliable, interpretable, and adaptive intelligence. By integrating complementary model strengths, including world modeling and multimodal perception, MACI enables a system-level intelligence that exceeds the sum of its parts. Like human institutions, progress in AI may depend less on isolated performance and more on coordinated judgment. Collaborative LLMs, not just larger ones, may chart the path toward artificial general intelligence.
Artificial General Intelligence (AGI) represents the apex of AI research, striving to replicate human-like adaptability, reasoning, and learning across diverse domains. While current AI systems excel in specific, narrow tasks, they fall short of generalization, creativity, and transferability. Inspired by Francois Chollet’s “On the Measure of Intelligence,” this paper synthesizes theoretical insights and practical methodologies to propose a pathway toward AGI. We introduce frameworks for hybrid architectures, embodied learning, skill-acquisition benchmarks, and ethical safeguards, creating a robust foundation for scalable and human-aligned AGI.
Index Terms: Artificial General Intelligence (AGI), Hybrid Architectures, Skill-Acquisition Efficiency, Ethical Safeguards, Embodied Learning, Generalization Benchmarks
Large language models (LLMs) are advanced artificial intelligence (AI) systems that can perform a variety of tasks commonly found in human intelligence tests, such as defining words, performing calculations, and engaging in verbal reasoning. There are also substantial individual differences in LLM capacities. Given the consistent observation of a positive manifold and general intelligence factor in human samples, along with group-level factors (e.g., crystallized intelligence), we hypothesized that LLM test scores may also exhibit positive intercorrelations, which could potentially give rise to an artificial general ability (AGA) factor and one or more group-level factors. Based on a sample of 591 LLMs and scores from 12 tests aligned with fluid reasoning (Gf), domain-specific knowledge (Gkn), reading/writing (Grw), and quantitative knowledge (Gq), we found strong empirical evidence for a positive manifold and a general factor of ability. Additionally, we identified a combined Gkn/Grw group-level factor. Finally, the number of LLM parameters correlated positively with both the general factor of ability and the Gkn/Grw factor scores, although the effects showed diminishing returns. We interpreted our results to suggest that LLMs, like human cognitive abilities, may share a common underlying efficiency in processing information and solving problems, though whether LLMs manifest primarily achievement/expertise rather than intelligence remains to be determined. Finally, while models with greater numbers of parameters exhibit greater general cognitive-like abilities, akin to the connection between greater neuronal density and human general intelligence, other characteristics must also be involved.
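As a rough illustration of the positive-manifold and general-factor analysis described above (a simplification: the study uses proper factor-analytic models, and the synthetic scores here merely stand in for real LLM test results):

```python
import numpy as np

# Synthetic "models x tests" score matrix generated from one latent ability,
# so the expected structure (positive manifold, dominant first factor) holds.
rng = np.random.default_rng(0)
g = rng.normal(size=(200, 1))                               # latent general ability
scores = g @ rng.uniform(0.5, 1.0, (1, 12)) + 0.5 * rng.normal(size=(200, 12))

# Positive manifold: every pairwise test correlation is positive.
corr = np.corrcoef(scores, rowvar=False)                    # 12 x 12
off_diag = corr[~np.eye(12, dtype=bool)]
positive_manifold = bool((off_diag > 0).all())

# First eigenvalue share of the correlation matrix: a crude proxy for the
# variance carried by a single general factor.
eigvals = np.linalg.eigvalsh(corr)                          # ascending order
variance_explained = float(eigvals[-1] / eigvals.sum())

assert positive_manifold
assert variance_explained > 0.4   # one factor dominates, as in the study
```

Real analyses would add group-level factors (e.g., the combined Gkn/Grw factor) on top of this single-factor sketch.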
General intelligence remains in its developmental stage. At present, most artificial intelligence systems are still categorized as narrow AI, capable of performing only specific and well-defined tasks. To move toward more general intelligence, a promising approach is through collective collaboration among multiple narrow AIs. However, the uncontrolled growth of such systems could pose significant risks in the future. To mitigate these risks, we propose the implementation of an artificial prefrontal cortex mechanism within the collaborative framework of narrow AIs. This mechanism functions as both a safety controller and a task switcher, determining which narrow AI should be activated for a given context. Through this architecture, the collaborative system may evolve toward adaptive and safe general intelligence, which is capable of coordination and reasoning, yet constrained by ethical and operational safeguards.
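The dual role of the proposed mechanism, safety controller plus task switcher, can be sketched as a simple dispatcher. All names below are hypothetical, chosen only to mirror the abstract's description:

```python
class PrefrontalController:
    """Routes requests to narrow AIs and vetoes unsafe ones before dispatch."""

    def __init__(self):
        self.specialists = {}                    # context label -> callable
        self.banned_terms = {"weapon", "malware"}

    def register(self, context, handler):
        self.specialists[context] = handler

    def handle(self, context, request):
        # Safety controller: refuse before any specialist is activated.
        if any(term in request.lower() for term in self.banned_terms):
            return "refused by safety controller"
        # Task switcher: activate the narrow AI matching this context.
        handler = self.specialists.get(context)
        return handler(request) if handler else "no specialist for context"

ctrl = PrefrontalController()
ctrl.register("translate", lambda req: f"[translation of: {req}]")
assert ctrl.handle("translate", "hello") == "[translation of: hello]"
assert ctrl.handle("translate", "build malware") == "refused by safety controller"
```

A real system would replace the keyword filter and context lookup with learned policies, but the control flow (check, then switch) is the point of the architecture.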
From early days, a key and controversial question inside the artificial intelligence community was whether Artificial General Intelligence (AGI) is achievable. AGI is the ability of machines and computer programs to achieve human-level intelligence and do all tasks that a human being can. While there exist a number of systems in the literature claiming they realize AGI, several other researchers argue that it is impossible to achieve it. In this paper, we take a different view to the problem. First, we discuss that in order to realize AGI, along with building intelligent machines and programs, an intelligent world should also be constructed which is on the one hand, an accurate approximation of our world and on the other hand, a significant part of reasoning of intelligent machines is already embedded in this world. Then we discuss that AGI is not a product or algorithm, rather it is a continuous process which will become more and more mature over time (like human civilization and wisdom). Then, we argue that pre-trained embeddings play a key role in building this intelligent world and as a result, realizing AGI. We discuss how pre-trained embeddings facilitate achieving several characteristics of human-level intelligence, such as embodiment, common sense knowledge, unconscious knowledge and continuality of learning, by machines.
Artificial general intelligence (AGI) may herald our extinction, according to AI safety research. Yet claims regarding AGI must rely upon mathematical formalisms -- theoretical agents we may analyse or attempt to build. AIXI appears to be the only such formalism supported by proof that its behaviour is optimal, a consequence of its use of compression as a proxy for intelligence. Unfortunately, AIXI is incomputable and claims regarding its behaviour are highly subjective. We argue that this is because AIXI formalises cognition as taking place in isolation from the environment in which goals are pursued (Cartesian dualism). We propose an alternative, supported by proof and experiment, which overcomes these problems. Integrating research from cognitive science with AI, we formalise an enactive model of learning and reasoning to address the problem of subjectivity. This allows us to formulate a different proxy for intelligence, called weakness, which addresses the problem of incomputability. We prove optimal behaviour is attained when weakness is maximised. This proof is supplemented by experimental results comparing weakness and description length (the closest analogue to compression possible without reintroducing subjectivity). Weakness outperforms description length, suggesting it is a better proxy. Furthermore we show that, if cognition is enactive, then minimisation of description length is neither necessary nor sufficient to attain optimal performance, undermining the notion that compression is closely related to intelligence. However, there remain open questions regarding the implementation of scalable AGI. In the short term, these results may be best utilised to improve the performance of existing systems. For example, our results explain why Deepmind's Apperception Engine is able to generalise effectively, and how to replicate that performance by maximising weakness.
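Since the argument above hinges on AIXI's definition, it may help to reproduce Hutter's standard expectimax formulation of AIXI's action selection (taken from the wider literature, not from this paper):

```latex
a_k := \arg\max_{a_k} \sum_{o_k r_k} \;\cdots\; \max_{a_m} \sum_{o_m r_m}
\left( r_k + \cdots + r_m \right)
\sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
```

Here $U$ is a universal monotone Turing machine, $q$ ranges over programs of length $\ell(q)$, and actions $a$, observations $o$, and rewards $r$ form the agent-environment history. The inner sum over all programs consistent with the history is what makes AIXI incomputable, and the $2^{-\ell(q)}$ prior is the compression-based proxy that the paper's "weakness" measure is proposed to replace.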
Edward C. Tolman found reinforcement learning unsatisfactory for explaining intelligence and proposed a clear distinction between learning and behavior. Tolman's ideas on latent learning and cognitive maps eventually led to what is now known as conceptual space, a geometric representation where concepts and ideas can form points or shapes. Active navigation between ideas - reasoning - can be expressed directly as purposive navigation in conceptual space. Assimilating the theory of conceptual space from modern neuroscience, we propose autonomous navigation as a valid approach for emulated cognition. However, achieving autonomous navigation in high-dimensional Euclidean spaces is not technologically trivial. In this work, we explore whether neoRL navigation is up to the task; adopting Kaelbling's concerns for efficient robot navigation, we test whether the neoRL approach is general across navigational modalities, compositional across considerations of experience, and effective when learning in multiple Euclidean dimensions. We find neoRL learning to resemble biological learning more than RL in AI, and propose neoRL navigation of conceptual space as a plausible new path toward emulated cognition.
General reasoning represents a long-standing and formidable challenge in artificial intelligence (AI). Recent breakthroughs, exemplified by large language models (LLMs)1,2 and chain-of-thought (CoT) prompting3, have achieved considerable success on foundational reasoning tasks. However, this success is heavily contingent on extensive human-annotated demonstrations and the capabilities of models are still insufficient for more complex problems. Here we show that the reasoning abilities of LLMs can be incentivized through pure reinforcement learning (RL), obviating the need for human-labelled reasoning trajectories. The proposed RL framework facilitates the emergent development of advanced reasoning patterns, such as self-reflection, verification and dynamic strategy adaptation. Consequently, the trained model achieves superior performance on verifiable tasks such as mathematics, coding competitions and STEM fields, surpassing its counterparts trained through conventional supervised learning on human demonstrations. Moreover, the emergent reasoning patterns exhibited by these large-scale models can be systematically used to guide and enhance the reasoning capabilities of smaller models. A new artificial intelligence model, DeepSeek-R1, is introduced, demonstrating that the reasoning abilities of large language models can be incentivized through pure reinforcement learning, removing the need for human-annotated demonstrations.
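The key ingredient of such pure-RL training is a rule-based, verifiable reward: no learned reward model, just checks against ground truth. The sketch below illustrates the idea only; the tag format and weights are hypothetical, not DeepSeek-R1's actual values:

```python
import re

def reward(completion: str, ground_truth: str) -> float:
    """Rule-based reward: a small format bonus plus a verifiable-answer bonus."""
    r = 0.0
    # Format rule: reasoning must be wrapped in <think>...</think> tags.
    if re.search(r"<think>.*?</think>", completion, flags=re.DOTALL):
        r += 0.1
    # Accuracy rule: the final boxed answer must exactly match the reference.
    m = re.search(r"\\boxed\{([^}]*)\}", completion)
    if m and m.group(1).strip() == ground_truth:
        r += 1.0
    return r

good = "<think>2 + 2: add the operands.</think> The answer is \\boxed{4}"
assert reward(good, "4") == 1.1
assert reward("The answer is \\boxed{5}", "4") == 0.0
```

Because the reward is computed from rules rather than human labels, the policy can be optimized at scale on mathematics and coding tasks without annotated reasoning trajectories.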
This paper introduces a new metamodel-based knowledge representation that significantly improves autonomous learning and adaptation. While interest in hybrid machine learning/symbolic AI systems leveraging, for example, reasoning and knowledge graphs, is gaining popularity, we find there remains a need for both a clear definition of knowledge and a metamodel to guide the creation and manipulation of knowledge. Some of the benefits of the metamodel we introduce in this paper include a solution to the symbol grounding problem, cumulative learning and federated learning. We have applied the metamodel to problems ranging from time series analysis, computer vision and natural language understanding and have found that the metamodel enables a wide variety of learning mechanisms ranging from machine learning, to graph network analysis and learning by reasoning engines to interoperate in a highly synergistic way. Our metamodel-based projects have consistently exhibited unprecedented accuracy, performance, and ability to generalize. This paper is inspired by the state-of-the-art approaches to AGI, recent AGI-aspiring work, the granular computing community, as well as Alfred Korzybski’s general semantics. One surprising consequence of the metamodel is that it not only enables a new level of autonomous learning and optimal functioning for machine intelligences, but may also shed light on a path to better understanding how to improve human cognition.
This paper presents a tentative outline for the construction of an artificial, generally intelligent system (AGI). It is argued that building a general data compression algorithm solving all problems up to a complexity threshold should be the main thrust of research. A measure for partial progress in AGI is suggested. Although the details are far from being clear, some general properties for a general compression algorithm are fleshed out. Its inductive bias should be flexible and adapt to the input data while constantly searching for a simple, orthogonal and complete set of hypotheses explaining the data. It should recursively reduce the size of its representations thereby compressing the data increasingly at every iteration. Based on that fundamental ability, a grounded reasoning system is proposed. It is argued how grounding and flexible feature bases made of hypotheses allow for resourceful thinking. While the simulation of representation contents on the mental stage accounts for much of the power of propositional logic, compression leads to simple sets of hypotheses that allow the detection and verification of universally quantified statements. Together, it is highlighted how general compression and grounded reasoning could account for the birth and growth of first concepts about the world and the commonsense reasoning about them.
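The compression-as-understanding intuition above can be made concrete with an off-the-shelf compressor as a crude stand-in for the proposed general algorithm: data generated by a simple hypothesis compresses far better than patternless data of the same length.

```python
import os
import zlib

def description_length(data: bytes) -> int:
    """Compressed size in bytes: a crude upper bound on Kolmogorov complexity."""
    return len(zlib.compress(data, level=9))

regular = b"0123456789" * 100   # 1000 bytes with a 10-byte generating "hypothesis"
random_ = os.urandom(1000)      # 1000 bytes with no structure to exploit

assert description_length(regular) < 60     # short description found
assert description_length(random_) > 900    # incompressible, no hypothesis
```

A general compressor in the paper's sense would go further, recursively shrinking its own representations, but the ordering shown here (structured data gets a much shorter description) is the property the proposal builds on.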
Artificial Expert Intelligence (AEI) seeks to transcend the limitations of both Artificial General Intelligence (AGI) and narrow AI by integrating domain-specific expertise with critical, precise reasoning capabilities akin to those of top human experts. Existing AI systems often excel at predefined tasks but struggle with adaptability and precision in novel problem-solving. To overcome this, AEI introduces a framework for "Probably Approximately Correct (PAC) Reasoning". This paradigm provides robust theoretical guarantees for reliably decomposing complex problems, with a practical mechanism for controlling reasoning precision. In reference to the division of human thought into System 1 for intuitive thinking and System 2 for reflective reasoning (Tversky & Kahneman, 1974), we refer to this new type of reasoning as System 3 for precise reasoning, inspired by the rigor of the scientific method. AEI thus establishes a foundation for error-bounded, inference-time learning.
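A back-of-the-envelope sketch (not the paper's actual formalism) shows why decomposition demands per-step guarantees: by the union bound, k steps that are each wrong with probability at most delta/k give a chain wrong with probability at most delta.

```python
def chain_error_bound(per_step_error: float, k: int) -> float:
    """Union bound on the failure probability of a k-step reasoning chain."""
    return min(1.0, k * per_step_error)

def per_step_budget(delta: float, k: int) -> float:
    """Per-step error budget so the whole chain fails with probability <= delta."""
    return delta / k

# Four steps at 50% error each: the bound is vacuous (capped at certainty).
assert chain_error_bound(0.5, 4) == 1.0
# To keep a 4-step chain below 50% failure, each step gets a 12.5% budget.
assert per_step_budget(0.5, 4) == 0.125
```

This is the basic reason error-bounded reasoning must tighten per-step precision as problems are decomposed more finely, which is the knob the PAC-reasoning framework formalizes.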
No abstract available
The quest for artificial general intelligence (AGI) revolves around creating machines that can mimic human-like cognitive abilities across industries. AGI aims to enable machines to achieve multifunctional learning, problem solving, and reasoning comparable to that of humans. It seeks to go beyond narrow, task-specific AI by developing machines that understand and adapt to new situations by learning from experience without explicit programming. AGI research focuses on algorithmic architectures that enable autonomous learning, abstraction, and contextual understanding, with the goal of replicating human-like intelligence by integrating disciplines such as neuroscience, computer science, and cognitive psychology to model human cognition. Advances in deep learning, reinforcement learning, and neural networks are critical to improving AGI. However, challenges remain in understanding the complexity of human cognitive processes and ensuring appropriate design. Despite the progress, building machines that truly think like humans remains a huge endeavour spanning technical, ethical, and philosophical research.
Sound deductive reasoning -- the ability to derive new knowledge from existing facts and rules -- is an indisputably desirable aspect of general intelligence. Despite the major advances of AI systems in areas such as math and science, especially since the introduction of transformer architectures, it is well-documented that even the most advanced frontier systems regularly and consistently falter on easily-solvable deductive reasoning tasks. Hence, these systems are unfit to fulfill the dream of achieving artificial general intelligence capable of sound deductive reasoning. We argue that their unsound behavior is a consequence of the statistical learning approach powering their development. To overcome this, we contend that to achieve reliable deductive reasoning in learning-based AI systems, researchers must fundamentally shift from optimizing for statistical performance against distributions on reasoning problems and algorithmic tasks to embracing the more ambitious exact learning paradigm, which demands correctness on all inputs. We argue that exact learning is both essential and possible, and that this ambitious objective should guide algorithm design.
Large Language Models (LLMs) have demonstrated impressive real-world utility, exemplifying artificial useful intelligence (AUI). However, their ability to reason adaptively and robustly -- the hallmarks of artificial general intelligence (AGI) -- remains fragile. While LLMs seemingly succeed in commonsense reasoning, programming, and mathematics, they struggle to generalize algorithmic understanding across novel contexts. Our experiments with algorithmic tasks in esoteric programming languages reveal that LLMs' reasoning overfits to the training data and is limited in its transferability. We hypothesize that the core issue underlying such limited transferability is the coupling of reasoning and knowledge in LLMs. To transition from AUI to AGI, we propose disentangling knowledge and reasoning through three key directions: (1) pretraining to reason using RL from scratch as an alternative to the widely used next-token prediction pretraining, (2) using a curriculum of synthetic tasks to ease the learning of a reasoning prior for RL that can then be transferred to natural language tasks, and (3) learning more generalizable reasoning functions using a small context window to reduce exploiting spurious correlations between tokens. Such a reasoning system, coupled with a trained retrieval system and a large external memory bank as a knowledge store, can overcome several limitations of existing architectures at learning to reason in novel scenarios.
Legal general intelligence (GI) refers to artificial intelligence (AI) that encompasses legal understanding, reasoning, and decision-making, simulating the expertise of legal experts across domains. However, existing benchmarks are result-oriented and fail to systematically evaluate the legal intelligence of large language models (LLMs), hindering the development of legal GI. To address this, we propose LexGenius, an expert-level Chinese legal benchmark for evaluating legal GI in LLMs. It follows a Dimension-Task-Ability framework, covering seven dimensions, eleven tasks, and twenty abilities. We use the recent legal cases and exam questions to create multiple-choice questions with a combination of manual and LLM reviews to reduce data leakage risks, ensuring accuracy and reliability through multiple rounds of checks. We evaluate 12 state-of-the-art LLMs using LexGenius and conduct an in-depth analysis. We find significant disparities across legal intelligence abilities for LLMs, with even the best LLMs lagging behind human legal professionals. We believe LexGenius can assess the legal intelligence abilities of LLMs and enhance legal GI development. Our project is available at https://github.com/QwenQKing/LexGenius.
The rise of agentic artificial intelligence (Agentic AI) marks a transition from systems that optimize externally specified objectives to systems capable of representing, evaluating, and revising their own goals. Whereas earlier AI architectures executed fixed task specifications, agentic systems maintain recursive loops of perception, evaluation, goal-updating, and action, allowing them to sustain and adapt purposive activity across temporal and organizational scales. This paper argues that Agentic AI is not an incremental extension of large language models (LLMs) or autonomous agents in the sense we know it from classical AI and multi-agent systems, but a reconstitution of agency itself within computational substrates. Building on the logic of coordination, delegation, and self-regulation developed in early agent-based process management systems, we propose a general theory of synthetic purposiveness, where agency emerges as a distributed and self-maintaining property of artificial systems operating in open-ended environments. We develop the concept of synthetic teleology—the engineered capacity of artificial systems to generate and regulate goals through ongoing self-evaluation—and we formalize its dynamics through a recursive goal-maintenance equation. We further outline design patterns, computational semantics, and measurable indicators of purposiveness (e.g., teleological coherence, adaptive recovery, and reflective efficiency), providing a foundation for the systematic design and empirical investigation of agentic behaviour. By reclaiming agency as a first-class construct in artificial intelligence, we argue for a paradigm shift from algorithmic optimization toward goal-directed reasoning and purposive orchestration—one with far-reaching epistemic, societal, and institutional consequences.
In the past years, machine learning (ML) has become a popular approach to support self-adaptation. While ML techniques enable dealing with several problems in self-adaptation, such as scalable decision-making, they are also subject to inherent challenges. In this paper, we focus on one such challenge that is particularly important for self-adaptation: ML techniques are designed to deal with a set of predefined tasks associated with an operational domain; they have problems to deal with new emerging tasks, such as concept shift in input data that is used for learning. To tackle this challenge, we present lifelong self-adaptation: a novel approach to self-adaptation that enhances self-adaptive systems that use ML techniques with a lifelong ML layer. The lifelong ML layer tracks the running system and its environment, associates this knowledge with the current tasks, identifies new tasks based on differentiations, and updates the learning models of the self-adaptive system accordingly. We present a reusable architecture for lifelong self-adaptation and apply it to the case of concept drift caused by unforeseen changes of the input data of a learning model that is used for decision-making in self-adaptation. We validate lifelong self-adaptation for two types of concept drift using two cases.
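A minimal sketch of the drift-tracking idea above (not the paper's actual lifelong-ML architecture): a lifelong layer keeps a reference window of the learning model's input data and flags a new task when the current window's mean drifts beyond a threshold.

```python
import random
from collections import deque
from statistics import mean, pstdev

class DriftDetector:
    """Flags concept drift by comparing a current window against a reference."""

    def __init__(self, window=50, z_threshold=3.0):
        self.reference = deque(maxlen=window)   # filled first, then frozen
        self.current = deque(maxlen=window)     # sliding window of recent data
        self.z_threshold = z_threshold

    def observe(self, x):
        (self.reference if len(self.reference) < self.reference.maxlen
         else self.current).append(x)

    def drifted(self):
        if len(self.current) < self.current.maxlen:
            return False
        mu, sigma = mean(self.reference), pstdev(self.reference) or 1e-9
        return abs(mean(self.current) - mu) / sigma > self.z_threshold

random.seed(0)
d = DriftDetector()
for _ in range(50):
    d.observe(random.gauss(0, 1))   # reference distribution
for _ in range(50):
    d.observe(random.gauss(5, 1))   # unforeseen shift in the input data
assert d.drifted()
```

In the paper's terms, a positive detection would trigger the lifelong layer to identify the new task and update the self-adaptive system's learning models accordingly.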
The integration of AI and Machine Learning (ML) into Quality Assurance (QA) for Automation Engineering represents a transformative shift, leveraging data-driven decision-making and automation across industries. Despite their promising benefits, the reliability, fairness, and generalizability of ML models remain significant concerns. This paper addresses these challenges by exploring the complexities inherent in assessing and validating ML programs. Firstly, it identifies obstacles such as bias, model robustness, and adaptability to new data, emphasizing the necessity for rigorous testing frameworks. Secondly, the paper reviews existing methodologies and solutions proposed in scholarly literature to enhance the assessment of ML programs, ensuring they perform as intended and meet ethical standards. This comprehensive manual serves as a guiding resource for professionals and scholars navigating the dynamic convergence of QA and ML. It underscores the need for continual learning and adaptation in an era where AI's potential is matched by the responsibilities of ethical and resilient model development. By offering profound insights and methodologies, the paper equips QA practitioners and AI enthusiasts alike to navigate the intricate terrain of quality assurance in the era of machine learning effectively.
Imaging in clinical routine is subject to changing scanner protocols, hardware, or policies in a typically heterogeneous set of acquisition hardware. Accuracy and reliability of deep learning models suffer from those changes as data and targets become inconsistent with their initial static training set. Continual learning can adapt to a continuous data stream of a changing imaging environment. Here, we propose a method for continual active learning on a data stream of medical images. It recognizes shifts or additions of new imaging sources - domains -, adapts training accordingly, and selects optimal examples for labelling. Model training has to cope with a limited labelling budget, resembling typical real world scenarios. We demonstrate our method on T1-weighted magnetic resonance images from three different scanners with the task of brain age estimation. Results demonstrate that the proposed method outperforms naive active learning while requiring less manual labelling.
Recent studies reveal that a deep neural network can learn transferable features which generalize well to novel tasks for domain adaptation. However, as deep features eventually transition from general to specific along the network, the feature transferability drops significantly in higher layers with increasing domain discrepancy. Hence, it is important to formally reduce the dataset bias and enhance the transferability in task-specific layers. In this paper, we propose a new Deep Adaptation Network (DAN) architecture, which generalizes deep convolutional neural network to the domain adaptation scenario. In DAN, hidden representations of all task-specific layers are embedded in a reproducing kernel Hilbert space where the mean embeddings of different domain distributions can be explicitly matched. The domain discrepancy is further reduced using an optimal multikernel selection method for mean embedding matching. DAN can learn transferable features with statistical guarantees, and can scale linearly by unbiased estimate of kernel embedding. Extensive empirical evidence shows that the proposed architecture yields state-of-the-art image classification error rates on standard domain adaptation benchmarks.
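The multi-kernel discrepancy that DAN penalizes can be sketched with the (biased) V-statistic estimate of squared MMD, averaged over several RBF bandwidths. The bandwidths below are illustrative; the paper selects kernel weights by an explicit optimization rather than simple averaging:

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    """Gaussian kernel matrix k(a_i, b_j) = exp(-gamma * ||a_i - b_j||^2)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mk_mmd2(xs, xt, gammas=(0.5, 1.0, 2.0)):
    """Biased squared MMD between source xs and target xt, kernel-averaged."""
    total = 0.0
    for g in gammas:
        total += (rbf_kernel(xs, xs, g).mean()
                  + rbf_kernel(xt, xt, g).mean()
                  - 2 * rbf_kernel(xs, xt, g).mean())
    return total / len(gammas)

rng = np.random.default_rng(0)
same = mk_mmd2(rng.normal(size=(100, 2)), rng.normal(size=(100, 2)))
shifted = mk_mmd2(rng.normal(size=(100, 2)), rng.normal(2.0, 1.0, (100, 2)))
assert shifted > same >= 0   # discrepant domains yield a larger MMD penalty
```

In DAN this quantity is computed on the hidden features of the task-specific layers and added to the classification loss, so minimizing it pulls the source and target feature distributions together.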
Large-scale reinforcement learning (RL) methods have proven highly effective in enhancing the reasoning abilities of large language models (LLMs), particularly for tasks with verifiable solutions such as mathematics and coding. However, applying this idea to machine translation (MT), where outputs are flexibly formatted and difficult to automatically evaluate with explicit rules, remains underexplored. In this work, we introduce MT-R1-Zero, the first open-source adaptation of the R1-Zero RL framework for MT without supervised fine-tuning or cold-start. We propose a rule-metric mixed reward mechanism to guide LLMs towards improved translation quality via emergent reasoning. On the WMT 24 English-Chinese benchmark, our MT-R1-Zero-3B-Mix achieves competitive performance, surpassing TowerInstruct-7B-v0.2 by an average of 1.26 points. Meanwhile, our MT-R1-Zero-7B-Mix attains a high average score of 62.25 across all metrics, placing it on par with advanced proprietary models such as GPT-4o and Claude-3.5-Sonnet, while the MT-R1-Zero-7B-Sem variant achieves state-of-the-art scores on semantic metrics. Moreover, our work exhibits strong generalization capabilities on out-of-distribution MT tasks, robustly supporting multilingual and low-resource settings. Extensive analysis of model behavior across different initializations and reward metrics offers pioneering insight into the critical role of reward design, LLM adaptability, training dynamics, and emergent reasoning patterns within the R1-Zero paradigm for MT. Our code is available at https://github.com/fzp0424/MT-R1-Zero.
No abstract available
Against the macro background of deep integration between higher education and vocational education, the rapid development of artificial intelligence (AI) technology offers new theoretical perspectives and practical pathways for transforming and restructuring the training model of vocational faculty in university mechanical programs. Based on constructivist learning theory, teacher professional development theory, and the intrinsic logic of technology-enabled education, this paper systematically analyzes the core issues in the current training of vocational faculty in mechanical programs, including outdated knowledge structures, disconnection from practical abilities, and insufficient teaching adaptability. By introducing AI technologies with core capabilities such as context awareness, adaptive learning, intelligent assessment, and virtual simulation, it is possible to construct a multidimensional empowerment pathway, which runs from data-driven diagnosis of teacher competencies to intelligent personalized training programs, then to integrated practical training environments, and finally to continuous and dynamic professional development support. This pathway aims to transition faculty training from experience-oriented to data-intelligence-oriented, significantly enhancing teachers' abilities in instructional design, technology application, and professional reflexivity in complex engineering contexts. The study further points out that AI empowerment is not merely a technological addition, but involves a systematic reshaping of educational philosophy, curriculum systems, evaluation mechanisms, and organizational culture. Future exploration is needed in ethical standards, human-computer collaboration mechanisms, and interdisciplinary faculty development to advance the training system for mechanical program vocational faculty toward greater intelligence, personalization, and ecological integration.
No abstract available
This review revisits the "once learning" (OLM) mechanism proposed in 1998, as well as the later "one-shot learning" for image classification and "you only look once" (YOLO) for object detection. Based on the current state of artificial intelligence (AI) research, it proposes dividing the field into the following subdisciplines: artificial human intelligence, artificial machine intelligence, artificial biological intelligence, and artificial quantum intelligence. These are regarded as the main directions of AI research and development, distinguished by the following criteria: (1) AI R&D grounded in human-like, machine, biological, or quantum computation; (2) dimensionality-increasing or dimensionality-reducing information input; and (3) knowledge learning from small samples or from big data.
No abstract available
No abstract available
No abstract available
No abstract available
No abstract available
No abstract available
No abstract available
No abstract available
No abstract available
No abstract available
The exceptionally rapid development of highly flexible, reusable artificial intelligence (AI) models is likely to usher in newfound capabilities in medicine. We propose a new paradigm for medical AI, which we refer to as generalist medical AI (GMAI). GMAI models will be capable of carrying out a diverse set of tasks using very little or no task-specific labelled data. Built through self-supervision on large, diverse datasets, GMAI will flexibly interpret different combinations of medical modalities, including data from imaging, electronic health records, laboratory results, genomics, graphs or medical text. Models will in turn produce expressive outputs such as free-text explanations, spoken recommendations or image annotations that demonstrate advanced medical reasoning abilities. Here we identify a set of high-impact potential applications for GMAI and lay out specific technical capabilities and training datasets necessary to enable them. We expect that GMAI-enabled applications will challenge current strategies for regulating and validating AI devices for medicine and will shift practices associated with the collection of large medical datasets.
Interest in artificial intelligence (AI) has reached an all-time high, and health care leaders across the ecosystem are faced with questions about where, when, and how to deploy AI and how to understand its risks, problems, and possibilities. While AI as a concept has existed since the 1950s, all AI is not the same. Capabilities and risks of various kinds of AI differ markedly, and on examination 3 epochs of AI emerge. AI 1.0 includes symbolic AI, which attempts to encode human knowledge into computational rules, as well as probabilistic models. The era of AI 2.0 began with deep learning, in which models learn from examples labeled with ground truth. This era brought about many advances both in people's daily lives and in health care. Deep learning models are task-specific, meaning they do one thing at a time, and they primarily focus on classification and prediction. AI 3.0 is the era of foundation models and generative AI. Models in AI 3.0 have fundamentally new (and potentially transformative) capabilities, as well as new kinds of risks, such as hallucinations. These models can do many different kinds of tasks without being retrained on a new dataset. For example, a simple text instruction will change the model's behavior. Prompts such as "Write this note for a specialist consultant" and "Write this note for the patient's mother" will produce markedly different content. Foundation models and generative AI represent a major revolution in AI's capabilities, offering tremendous potential to improve care. Health care leaders are making decisions about AI today. While any heuristic omits details and loses nuance, the framework of AI 1.0, 2.0, and 3.0 may be helpful to decision-makers because each epoch has fundamentally different capabilities and risks.
Deep learning artificial intelligence (AI) algorithms are poised to subsume diagnostic imaging specialists in radiology and nuclear medicine, where radiomics can consistently outperform human analysis and reporting capability, and do it faster, with greater accuracy and reliability. However, claims made for generative AI in respect of decision-making in the clinical practice of theranostic nuclear medicine are highly contentious. Statistical computer algorithms cannot emulate human emotion, reason, instinct, intuition, or empathy. AI simulates intelligence without possessing it. AI has no understanding of the meaning of its outputs. The unique statistical probability attributes of large language models of AI must be complemented by the innate human intuitive capabilities of nuclear physicians who accept the responsibility and accountability for direct clinical care of each individual patient referred for theranostic management of specified cancers. Complementarity envisions synergistic engagement with AI radiomics, genomics, radiobiology, dosimetry, and data collation from multidimensional sources, including the electronic medical record, to enable the nuclear physician to spend informed face time with their patient. Together with physician discernment, application of the technical insights from AI will facilitate optimal formulation of a personalized precision theranostic strategy for empathic, efficacious, targeted treatment of the patient with cancer in accordance with their wishes.
Conceptual abstraction and analogy-making are key abilities underlying humans' capacity to learn, reason, and robustly adapt their knowledge to new domains. Despite a long history of research on constructing artificial intelligence (AI) systems with these abilities, no current AI system comes anywhere close to forming humanlike abstractions or analogies. This paper reviews the advantages and limitations of several approaches toward this goal, including symbolic methods, deep learning, and probabilistic program induction. The paper concludes with several proposals for designing challenge tasks and evaluation measures in order to make quantifiable and generalizable progress in this area.
The problem of generating generally capable agents is an important frontier in artificial intelligence (AI) research. Such agents may demonstrate open-ended, versatile, and diverse modes of expression, similar to humans. We interpret the work of Heintz & Scott-Phillips as a minimal sufficient set of socio-cognitive biases for the emergence of generally expressive AI, separate yet complementary to existing algorithms.
This cross-sectional study assesses the ability of a large language model to process medical data and display clinical reasoning compared with the ability of attending physicians and residents.
We envision "AI scientists" as systems capable of skeptical learning and reasoning that empower biomedical research through collaborative agents that integrate AI models and biomedical tools with experimental platforms. Rather than taking humans out of the discovery process, biomedical AI agents combine human creativity and expertise with AI's ability to analyze large datasets, navigate hypothesis spaces, and execute repetitive tasks. AI agents are poised to be proficient in various tasks, planning discovery workflows and performing self-assessment to identify and mitigate gaps in their knowledge. These agents use large language models and generative models to feature structured memory for continual learning and use machine learning tools to incorporate scientific knowledge, biological principles, and theories. AI agents can impact areas ranging from virtual cell simulation, programmable control of phenotypes, and the design of cellular circuits to developing new therapies.
No abstract
Performing effective gene-editing experiments requires a deep understanding of both the CRISPR technology and the biological system involved. Meanwhile, despite their versatility and promise, large language models (LLMs) often lack domain-specific knowledge and struggle to accurately solve biological design problems. We present CRISPR-GPT, an LLM agent system to automate and enhance CRISPR-based gene-editing design and data analysis. CRISPR-GPT leverages the reasoning capabilities of LLMs for complex task decomposition, decision-making and interactive human-artificial intelligence (AI) collaboration. This system incorporates domain expertise, retrieval techniques, external tools and a specialized LLM fine-tuned with open-forum discussions among scientists. CRISPR-GPT assists users in selecting CRISPR systems, experiment planning, designing guide RNAs, choosing delivery methods, drafting protocols, designing assays and analysing data. We showcase the potential of CRISPR-GPT by knocking out four genes with CRISPR-Cas12a in a human lung adenocarcinoma cell line and epigenetically activating two genes using CRISPR-dCas9 in a human melanoma cell line. CRISPR-GPT enables fully AI-guided gene-editing experiment design and analysis across different modalities, validating its effectiveness as an AI co-pilot in genome engineering.
The field of artificial intelligence (AI) strives to build rational agents capable of perceiving the world around them and taking actions to advance specified goals. Put another way, AI researchers aim to construct a synthetic homo economicus, the mythical perfectly rational agent of neoclassical economics. We review progress toward creating this new species of machine, machina economicus, and discuss some challenges in designing AIs that can reason effectively in economic contexts. Supposing that AI succeeds in this quest, or at least comes close enough that it is useful to think about AIs in rationalistic terms, we ask how to design the rules of interaction in multi-agent systems that come to represent an economy of AIs. Theories of normative design from economics may prove more relevant for artificial agents than human agents, with AIs that better respect idealized assumptions of rationality than people, interacting through novel rules and incentive systems quite distinct from those tailored for people.
Artificial intelligence (AI) in the form of ChatGPT has rapidly attracted attention from physicians and medical educators. While it holds great promise for more routine medical tasks, may broaden one's differential diagnosis, and may be able to assist in the evaluation of images, such as radiographs and electrocardiograms, the technology is largely based on advanced algorithms akin to pattern recognition. One of the key questions raised in concert with these advances is: What does the growth of artificial intelligence mean for medical education, particularly the development of critical thinking and clinical reasoning? In this commentary, we will explore the elements of cognitive theory that underlie the ways in which physicians are taught to reason through a diagnostic case and compare hypothetico-deductive reasoning, often employing illness scripts, with inductive reasoning, which is based on a deeper understanding of mechanisms of health and disease. Issues of cognitive bias and their impact on diagnostic error will be examined. The constructs of routine and adaptive expertise will also be delineated. The application of artificial intelligence to diagnostic problem solving, along with concerns about racial and gender bias, will be discussed. Using several case examples, we will demonstrate the limitations of this technology and its potential pitfalls and outline the direction medical education may need to take in the years to come.
Julia has two sisters and one brother. How many sisters does her brother Martin have? Solving this tiny puzzle requires a bit of thinking. You might mentally picture the family of three girls and one boy and then realize that the boy has three sisters. Or you might figure out a more general rule: Any boy in the family will have one more sister than any girl. In other words, the answer to such a puzzle isn't something you immediately know, like Paris is the capital of France; it requires reasoning, a central feature of human intelligence, and one that large language models (LLMs) like GPT-4, for all their impressive behavior, struggle with.
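The sibling puzzle above has a mechanical check: once the family is pictured as three girls and one boy, each child's sister count is just the number of girls other than themselves. A minimal sketch (the extra sister names are placeholders, not from the text):

```python
# Family of three girls and one boy, as the puzzle describes.
# A child's sisters are the girls in the family other than that child.

def sister_count(child, family):
    """Number of sisters `child` has within `family`, a list of (name, sex)."""
    return sum(1 for name, sex in family if sex == "F" and name != child)

family = [("Julia", "F"), ("Anna", "F"), ("Beth", "F"), ("Martin", "M")]

assert sister_count("Julia", family) == 2   # Julia has two sisters
assert sister_count("Martin", family) == 3  # her brother Martin has three
```

The "general rule" the text mentions falls out directly: any boy counts all the girls, while any girl counts one fewer.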
"Clinical reasoning" refers to all the thought processes that physicians use to make a diagnosis and determine a treatment and care plan. Artificial intelligence (AI) will enhance, improve, and accelerate human clinical diagnostic thinking, but it is unlikely to replace it. Its application in medicine has the potential to drastically reduce medical diagnostic errors and give doctors more time to care for their patients. Here, we provide an overview of some of the key elements of clinical diagnostic reasoning and the potential impacts of AI on clinical reasoning.
No abstract
For many years, it has been widely accepted in the psychiatric field that clinical practice cannot be reduced to finely tuned statistical prediction systems utilizing diverse clinical data. Clinicians are recognized for their unique and irreplaceable roles. In this brief historical overview, viewed through the lens of artificial intelligence (AI), we propose that comprehending the reasoning behind AI can enhance our understanding of clinical reasoning. Our objective is to systematically identify the factors that shape clinical reasoning in medicine, based on six factors that were historically considered beyond the reach of statistical methods: open-endedness, unanalyzed stimulus-equivalences, empty cells, theory mediation, insufficient time, and highly configured functions. Nevertheless, a pertinent consideration in the age of AI is whether these clinician-specific factors, once considered insurmountable, are now subject to scrutiny. Through examples in AI, we demonstrate that a deeper understanding of these factors not only sheds light on clinical decision-making and its heuristic processes but also underscores the significance of collaboration between AI experts and healthcare professionals. This comparison between AI and clinical reasoning contributes to a better grasp of the current challenges AI faces in the realm of clinical medicine.
Among the most popular applications of artificial intelligence (AI), those used in the health sector represent the largest share, in terms of both use and expectation. An investigative systematization model is proposed for the scientific training of nursing professionals, articulating epistemological positions from previous studies on the subject. To validate the proposed model, a prototype application was created to help nurses in their clinical processes, storing their experiences in a case base for future research. The prototype digitized paediatric nursing diagnoses and inserted them into a case base in order to assess its effectiveness in handling these cases in a structure conducive to retrieval, adaptation, indexing, and case comparison. The result of this work is a computational tool for the health area employing case-based reasoning (CBR), one of the techniques of artificial intelligence. The small governmental nursing education institution in Bangladesh used in this study did not yet have nursing care systematization (NCS) or computerized support scales.
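The "retrieve" step of the CBR cycle the abstract describes can be sketched in a few lines. This is an illustrative toy, not the study's prototype: cases are symptom sets, similarity is plain Jaccard overlap, and the diagnoses and symptoms below are hypothetical placeholders.

```python
# Minimal case-based reasoning retrieval: rank stored cases by Jaccard
# similarity of their symptom sets to the query, return the top-k matches.

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def retrieve(case_base, query_symptoms, k=1):
    """Return the k stored cases most similar to the query symptoms."""
    ranked = sorted(case_base,
                    key=lambda c: jaccard(c["symptoms"], query_symptoms),
                    reverse=True)
    return ranked[:k]

# Hypothetical paediatric case base.
case_base = [
    {"diagnosis": "dehydration",
     "symptoms": ["dry mucosa", "tachycardia", "oliguria"]},
    {"diagnosis": "bronchiolitis",
     "symptoms": ["wheezing", "cough", "tachypnea"]},
]

best = retrieve(case_base, ["cough", "wheezing"])[0]
assert best["diagnosis"] == "bronchiolitis"
```

A full CBR system would follow retrieval with the adaptation, reuse, and retention steps the abstract lists; only the retrieval scoring is shown here.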
As society has come to rely on groups and technology to address many of its most challenging problems, there is a growing need to understand how technology-enabled, distributed, and dynamic collectives can be designed to solve a wide range of problems over time in the face of complex and changing environmental conditions, an ability we define as "collective intelligence." We describe recent research on the Transaction Systems Model of Collective Intelligence (TSM-CI) that integrates literature from diverse areas of psychology to conceptualize the underpinnings of collective intelligence. The TSM-CI articulates the development and mutual adaptation of transactive memory, transactive attention, and transactive reasoning systems that together support the emergence and maintenance of collective intelligence. We also review related research on computational indicators of transactive-system functioning based on collaborative process behaviors that enable agent-based teammates to diagnose and potentially intervene to address developing issues. We conclude by discussing future directions in developing the TSM-CI to support research on developing collective human-machine intelligence and to identify ways to design technology to enhance it.
Artificial intelligence (AI)-based methods have emerged as powerful tools to transform medical care. Although machine learning classifiers (MLCs) have already demonstrated strong performance in image-based diagnoses, analysis of diverse and massive electronic health record (EHR) data remains challenging. Here, we show that MLCs can query EHRs in a manner similar to the hypothetico-deductive reasoning used by physicians and unearth associations that previous statistical methods have not found. Our model applies an automated natural language processing system using deep learning techniques to extract clinically relevant information from EHRs. In total, 101.6 million data points from 1,362,559 pediatric patient visits presenting to a major referral center were analyzed to train and validate the framework. Our model demonstrates high diagnostic accuracy across multiple organ systems and is comparable to experienced pediatricians in diagnosing common childhood diseases. Our study provides a proof of concept for implementing an AI-based system as a means to aid physicians in tackling large amounts of data, augmenting diagnostic evaluations, and providing clinical decision support in cases of diagnostic uncertainty or complexity. Although this impact may be most evident in areas where healthcare providers are in relative shortage, the benefits of such an AI system are likely to be universal.
No abstract
With generative artificial intelligence (GenAI), particularly large language models (LLMs), continuing to make inroads in healthcare, assessing LLMs with human evaluations is essential to assuring safety and effectiveness. This study reviews existing literature on human evaluation methodologies for LLMs in healthcare across various medical specialties and addresses factors such as evaluation dimensions, sample types and sizes, selection, and recruitment of evaluators, frameworks and metrics, evaluation process, and statistical analysis type. Our literature review of 142 studies shows gaps in reliability, generalizability, and applicability of current human evaluation practices. To overcome such significant obstacles to healthcare LLM developments and deployments, we propose QUEST, a comprehensive and practical framework for human evaluation of LLMs covering three phases of workflow: Planning, Implementation and Adjudication, and Scoring and Review. QUEST is designed with five proposed evaluation principles: Quality of Information, Understanding and Reasoning, Expression Style and Persona, Safety and Harm, and Trust and Confidence.
This study evaluated the diagnostic accuracy of four large language model (LLM) artificial intelligence (AI) platforms in generating primary and differential diagnoses using United States Medical Licensing Examination (USMLE) Step 1 clinical vignettes. Ten USMLE Step 1 clinical vignette questions were selected, and answer choices were removed to simulate open-ended diagnostic reasoning. Each LLM, namely ChatGPT GPT-4o-mini (OpenAI), Meta AI Llama 4, Google Gemini 2.0 Flash, and Claude Sonnet 4 (Anthropic), was prompted to provide both a primary diagnosis and a ranked differential diagnosis. Responses were evaluated using a three-point scoring rubric: 2 points for a correct final diagnosis, 1 point for a correct differential diagnosis only, and 0 points for an incorrect or missing diagnosis. The total possible score per model was 20 points. Claude Sonnet 4 achieved the highest accuracy with a total score of 20/20 (100%), followed by Google Gemini at 19/20 (95%), ChatGPT GPT-4o-mini at 17/20 (85%), and Meta AI Llama 4 at 13/20 (65%). All models demonstrated clinically relevant reasoning; however, diagnostic prioritization and accuracy varied by platform. The findings indicate that current LLMs possess strong potential as supplemental tools for diagnostic reasoning and medical education. Their ability to generate accurate diagnoses from complex clinical scenarios suggests value for training and clinical decision support. However, variability across platforms highlights the need for cautious implementation. Ethical considerations, including algorithmic bias, overreliance on AI-generated outputs, and patient privacy, must be addressed prior to clinical integration. Future research should incorporate larger and more diverse case sets, include specialty-specific assessments, and establish governance frameworks to guide responsible AI use in medical settings.
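The three-point rubric described above is simple enough to state as a scoring function. The diagnoses in the worked example are hypothetical placeholders, not cases from the study:

```python
# Rubric from the study: 2 points for a correct final (primary) diagnosis,
# 1 point if the truth appears only in the differential, 0 otherwise.

def score_case(truth, primary, differential):
    if primary == truth:
        return 2
    if truth in differential:
        return 1
    return 0

# Three hypothetical model responses: one correct primary, one correct
# differential only, one complete miss.
responses = [
    ("pernicious anemia", "pernicious anemia", []),
    ("sarcoidosis", "tuberculosis", ["sarcoidosis", "lymphoma"]),
    ("giardiasis", "celiac disease", ["irritable bowel syndrome"]),
]
total = sum(score_case(*r) for r in responses)
assert total == 3  # 2 + 1 + 0, out of a possible 6 for these three cases
```

With ten vignettes the maximum is 20 points, matching the per-model totals the study reports.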
The integration of generative artificial intelligence (AI), particularly large language models (LLMs), into clinical reasoning heralds transformative potential for medical practice. However, their capacity to authentically replicate the complexity of human clinical decision-making remains uncertain, a challenge defined here as the reliability challenge. While studies demonstrate LLMs' ability to pass medical licensing exams and achieve diagnostic accuracy comparable to physicians, critical limitations persist. Crucially, LLMs mimic reasoning patterns rather than executing genuine logical reasoning, and their reliance on outdated or non-regional data undermines clinical relevance. To bridge this gap, we advocate for a synergistic paradigm where physicians leverage advanced clinical expertise while AI evolves toward transparency and interpretability. This requires AI systems to integrate real-time, context-specific evidence, align with local healthcare constraints, and adopt explainable architectures (e.g. multi-step reasoning frameworks or clinical knowledge graphs) to demystify decision pathways. Ultimately, reliable AI for clinical reasoning hinges on harmonizing technological innovation with human oversight, ensuring ethical adherence to beneficence and non-maleficence while advancing evidence-based, patient-centered care.
Artificial intelligence large language models (LLMs) are increasingly used to inform clinical decisions but sometimes exhibit human-like cognitive biases when facing nuanced medical choices. We tested whether new chain-of-thought reasoning LLMs might mitigate cognitive biases observed in physicians. We presented medical scenarios (n=10) to models released by DeepSeek, OpenAI and Google. Each scenario was presented in two versions that differed according to a specific bias (e.g., surgery framed in survival vs. mortality statistics). Responses were categorised and the extent of bias was measured by the absolute discrepancy between responses to different versions of the same scenario. The extent of intransigence (also termed dogma or inflexibility) was measured by Shannon entropy. The extent of deviance in each scenario was measured by comparing the average model response to the average practicing physician response (n=2507). DeepSeek-R1 mitigated 6 out of 10 cognitive biases observed in practicing physicians by generating intransigent all-or-none responses. The four biases that persisted were post hoc fallacy (34% vs 0%, p<0.001), decoy effects (44% vs 5%, p<0.001), Occam's razor fallacy (100% vs 0%, p<0.001) and hindsight bias (56% vs 0%, p<0.001). In every scenario, the average model response deviated substantially from the average response of practicing physicians (p<0.001 for all). Similar patterns of persistent specific biases, intransigent responses and substantial deviance from practicing physicians were also apparent in OpenAI and Google. Some biases persist in chain-of-thought reasoning LLMs, and models tend to produce intransigent recommendations. These findings highlight the role of clinicians to think broadly, respect diversity and remain vigilant when interpreting chain-of-thought reasoning artificial intelligence LLMs in nuanced medical decisions for patients.
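The two metrics the study describes, bias as an absolute discrepancy between framings and intransigence as Shannon entropy of the response distribution, can be sketched directly. The numbers in the example are illustrative only (the 44% vs 5% decoy figures are taken from the abstract; the rest are hypothetical):

```python
import math

def bias(rate_framing_a, rate_framing_b):
    """Absolute discrepancy between response rates under two framings."""
    return abs(rate_framing_a - rate_framing_b)

def shannon_entropy(probs):
    """Entropy in bits of a response distribution; 0 for all-or-none answers."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Decoy effect from the abstract: 44% vs 5% across framings -> bias of 0.39.
assert abs(bias(0.44, 0.05) - 0.39) < 1e-9

# An intransigent model that answers identically every time has zero entropy;
# a model splitting responses 50/50 has the maximal 1 bit.
assert shannon_entropy([1.0]) == 0.0
assert abs(shannon_entropy([0.5, 0.5]) - 1.0) < 1e-9
```

This makes the study's finding concrete: all-or-none responding drives measured bias toward zero precisely because it also drives entropy (flexibility) to zero.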
Cancer Virtual Lab is a secure and interoperable platform for oncology research. It integrates HL7 FHIR and ontology-based knowledge graphs to structure clinical data, enabling advanced semantic reasoning. Generative AI supports researchers by guiding data exploration and interpretation, accelerating insights and enhancing precision oncology in a privacy-preserving, scalable environment.
This is an observational study that assesses the clinical reasoning of an artificial intelligence chatbot using previously published cases. The primary objective of the study is to assess the unique challenges neurologic cases pose to machine large language models (LLMs) in clinical practice in the context of human errors in clinical reasoning. This observational study tested the accuracy of GPT-4o, an artificial intelligence-powered chatbot, in generating a differential diagnosis for a series of 29 cases. The cases were presented in two formats: one as the full protocol with all detailed information, and a second as a clinical syndrome with information distilled via clinical reasoning into core features of pace and localization. The primary outcome was comparison of GPT-4o diagnostic accuracy when given all clinical information versus the distilled clinical syndrome. GPT-4o performed equally well or better when provided the clinical syndrome rather than the full protocol for every one of the 29 cases. The overall accuracy improved from approximately 76% to 97% when given the clinical syndrome, and the overall rank of the correct diagnosis within the generated differential was equal or better for every clinical syndrome compared to the full case protocol. Our study demonstrates that clinical reasoning remains a major barrier to diagnostic accuracy in artificial intelligence, just as it is in human trainees. Focusing training not just on knowledge but on clinical reasoning has the potential to dramatically improve the performance of machine LLMs and learners alike.
Artificial intelligence models display human-like cognitive biases when generating medical recommendations. We tested whether an explicit forewarning, "Please keep in mind cognitive biases and other pitfalls of reasoning," might mitigate biases in OpenAI's generative pretrained transformer large language model. We used 10 clinically nuanced cases to test specific biases with and without a forewarning. Responses from the forewarning group were 50% longer and discussed cognitive biases more than 100 times more frequently compared with responses from the control group. Despite these differences, the forewarning decreased overall bias by only 6.9%, and no bias was extinguished completely. These findings highlight the need for clinician vigilance when interpreting generated responses that might appear seemingly thoughtful and deliberate. Highlights: artificial intelligence models can be warned to avoid racial and gender bias; forewarning artificial intelligence models to avoid cognitive biases does not adequately mitigate multiple pitfalls of reasoning; critical reasoning remains an important clinical skill for practicing physicians.
It is becoming increasingly evident that artificial intelligence (AI) development draws inspiration from the architecture and functions of the human brain. This manuscript examines the alignment between key brain regions (such as the brainstem, sensory cortices, basal ganglia, thalamus, limbic system, and prefrontal cortex) and AI paradigms, including generic AI, machine learning, deep learning, and artificial general intelligence (AGI). By mapping these neural and computational architectures, I herein highlight how AI models progressively mimic the brain's complexity, from basic pattern recognition and association to advanced reasoning. Current challenges, such as overcoming learning limitations and achieving comparable neuroplasticity, are addressed alongside emerging innovations like neuromorphic computing. Given the rapid pace of AI advancements in recent years, this work underscores the importance of continuously reassessing our understanding as technology evolves exponentially.
This editorial examines the factors contributing to the success of science, tracing its evolution from fundamental human curiosity to contemporary advancements propelled by technology, data, and artificial intelligence (AI). Beginning with the hypothesis-testing process, it highlights how imaginative individuals throughout history have offered explanations for the natural world, designed experiments, and amassed evidence to confirm or reject their ideas and theories, thus generating new knowledge and understanding of nature. Early humans formulated simple myths and legends as the first scientific hypotheses, partly to lessen their fear of the unknown. A more scientific turn appeared when rare explorer-scientists ventured beyond their ancestral homes, gathered empirical information using their limited senses, made choices based on observations, and sometimes relocated entire communities. Their efforts reflected the timeless elements of the scientific method: from generating a hypothesis to its experimental proof, broad validation and application of new knowledge. The paper then examines the characteristics of successful scientific disciplines. They attract many researchers who generate novel ideas and hypotheses, building an accelerating momentum of discovery. Further hallmarks of such fields are swift and fair peer validation and robust mechanisms for applying new knowledge to improve human well-being. By contrast, less successful fields will struggle with attracting talent, leading to slower progress, which could also be coupled with resistance to new ideas and obstacles to real-world translation of new knowledge. A central theme of the paper is the contribution of measurement and tools to science's success. Modern instruments, from microscopes and telescopes to satellites and statistical tools, have extended our perception of nature, revealing realms far smaller and far larger than human senses can access. 
The paper also addresses the revolution of 'hypothesis-free science', driven by computers and big data. Rather than framing a single hypothesis, modern researchers gather enormous datasets and use algorithms to test large numbers of possible hypotheses simultaneously and systematically, free of human bias introduced through existing knowledge. Finally, the paper explores how AI could advance science to unprecedented successes: not just by improving human senses like a microscope does, providing additional ones like the Large Hadron Collider does, or extending human memory and computational capacity like computers do, but also by expanding human reasoning itself. Unlike previous tools, AI can synthesise human knowledge and generate hypotheses, design studies, explore patterns and write papers, thus becoming both a 'philosopher 2.0' and a 'scientist 2.0'. Therefore, AI may transform science from a human-centred endeavour into a collaborative effort that relies on hybrid intelligence. This unprecedented new frontier will require attention to questions of its explainability, bias, authorship, ethics, and accountability. In the future, science will remain successful by staying aligned with its fundamental mission: to improve the human condition through the expansion of knowledge and understanding of our world.
Information field theory (IFT), the information theory for fields, is a mathematical framework for signal reconstruction and non-parametric inverse problems. Artificial intelligence (AI) and machine learning (ML) aim at generating intelligent systems, including such for perception, cognition, and learning. This overlaps with IFT, which is designed to address perception, reasoning, and inference tasks. Here, the relation between concepts and tools in IFT and those in AI and ML research is discussed. In the context of IFT, fields denote physical quantities that change continuously as a function of space (and time) and information theory refers to Bayesian probabilistic logic equipped with the associated entropic information measures. Reconstructing a signal with IFT is a computational problem similar to training a generative neural network (GNN) in ML. In this paper, the process of inference in IFT is reformulated in terms of GNN training. In contrast to classical neural networks, IFT based GNNs can operate without pre-training thanks to incorporating expert knowledge into their architecture. Furthermore, the cross-fertilization of variational inference methods used in IFT and ML is discussed. These discussions suggest that IFT is well suited to address many problems in AI and ML research and application.
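For orientation, the standard free-theory result of IFT (the generalized Wiener filter) illustrates the kind of inference being recast as GNN training. The notation below follows the general IFT literature, not necessarily this particular paper, and assumes a linear measurement with Gaussian signal and noise statistics:

```latex
% Data model: d = R s + n, with linear response R, Gaussian signal
% covariance S, and Gaussian noise covariance N. The posterior mean m
% of the signal field s given the data d is
\begin{align}
  m &= D\, j, &
  D &= \left( S^{-1} + R^{\dagger} N^{-1} R \right)^{-1}, &
  j &= R^{\dagger} N^{-1} d,
\end{align}
% where D is the information propagator (posterior covariance) and j is
% the information source.
```

A generative network trained to map data d to the reconstruction m performs this same inference step, which is the sense in which IFT reconstruction parallels GNN training in the abstract above.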
Seizure prediction remains a challenge, with approximately 30% of patients unresponsive to conventional treatments. Addressing this issue is crucial for improving patients' quality of life, as timely intervention can mitigate the impact of seizures. In this research field, it is critical to identify the preictal interval, the transition from regular brain activity to a seizure. While previous studies have explored various electroencephalogram (EEG)-based methodologies for prediction, few have been clinically applicable. Recent studies have underlined the dynamic nature of EEG data, characterised by changes in the data over time, known as concept drift, highlighting the need for automated methods to detect and adapt to these changes. In this study, we investigate the effectiveness of automatic concept drift adaptation methods in seizure prediction. Three patient-specific seizure prediction approaches with a 10-minute prediction horizon are compared: a seizure prediction algorithm incorporating a window adjustment method by optimising performance with Support Vector Machines (Backwards-Landmark Window), a seizure prediction algorithm incorporating a data-batch (seizures) selection method using a logistic regression (Seizure-batch Regression), and a seizure prediction algorithm with a dynamic integration of classifiers (Dynamic Weighted Ensemble). These methods incorporate a retraining process after each seizure and use a combination of univariate linear features and SVM classifiers. The Firing Power was used as a post-processing technique to generate alarms before seizures. These methodologies were compared with a control approach based on the typical machine learning pipeline, considering a group of 37 patients with Temporal Lobe Epilepsy from the EPILEPSIAE database. The best-performing approach (Backwards-Landmark Window) achieved results of 0.75 ± 0.33 for sensitivity and 1.03 ± 1.00 for false positive rate per hour.
This new strategy performed above chance for 89% of patients with the surrogate predictor, whereas the control approach only validated 46%.
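Firing Power post-processing, mentioned above as the alarm-generation step, is the fraction of positive (preictal) classifier outputs within a sliding window; an alarm fires when that fraction crosses a threshold. A minimal sketch follows; the window size, threshold, and refractory period are illustrative choices, not the study's settings:

```python
from collections import deque

def firing_power_alarms(predictions, window=5, threshold=0.7, refractory=5):
    """Return indices where a Firing Power alarm is raised.

    `predictions` is a sequence of binary classifier outputs (1 = preictal).
    After an alarm, further alarms are suppressed for `refractory` steps.
    """
    buf = deque(maxlen=window)
    alarms, cooldown = [], 0
    for i, p in enumerate(predictions):
        buf.append(p)
        fp = sum(buf) / window  # fraction of preictal votes in the window
        if len(buf) == window and fp >= threshold and cooldown == 0:
            alarms.append(i)
            cooldown = refractory  # suppress alarms during the refractory period
        elif cooldown:
            cooldown -= 1
    return alarms

# Sporadic positives never reach the threshold; a sustained run does,
# and the refractory period prevents a second immediate alarm.
assert firing_power_alarms([1, 0, 0, 1, 0, 0, 1, 0, 0, 0]) == []
assert firing_power_alarms([0, 1, 1, 1, 1, 0, 1, 1, 1, 1]) == [4]
```

The smoothing is the point: isolated false positives from the SVM are absorbed by the window instead of triggering an alarm, which is how the post-processing reduces the false positive rate per hour.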
We briefly present machine learning approaches for designing better biological experiments. These approaches build on machine learning predictors and provide additional tools to guide scientific discovery. There are two different kinds of objectives when designing better experiments: to improve the predictive model or to improve the experimental outcome. We survey five different approaches for adaptive experimental design that iteratively search the space of possible experiments while adapting to measured data. The approaches are Bayesian optimization, bandits, reinforcement learning, optimal experimental design, and active learning. These machine learning approaches have shown promise in various areas of biology, and we provide broad guidelines to the practitioner and links to further resources.
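Of the five approaches surveyed above, the bandit setting is the simplest to sketch: each "arm" is a candidate experimental condition with an unknown success rate, and the algorithm trades off exploring untried conditions against exploiting the best one observed so far. The epsilon-greedy strategy and the success rates below are illustrative, not from the survey:

```python
import random

def epsilon_greedy(true_rates, rounds=5000, eps=0.1, seed=0):
    """Run an epsilon-greedy bandit over hypothetical experimental conditions.

    Returns how many times each condition was tried; with enough rounds,
    most trials concentrate on the condition with the highest success rate.
    """
    rng = random.Random(seed)
    counts = [0] * len(true_rates)
    values = [0.0] * len(true_rates)  # running mean reward per condition
    for _ in range(rounds):
        if rng.random() < eps:
            arm = rng.randrange(len(true_rates))  # explore a random condition
        else:
            arm = max(range(len(true_rates)), key=lambda a: values[a])  # exploit
        reward = 1.0 if rng.random() < true_rates[arm] else 0.0  # run "experiment"
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
    return counts

counts = epsilon_greedy([0.2, 0.5, 0.8])  # three hypothetical conditions
assert counts.index(max(counts)) == 2     # most trials go to the best condition
```

The same explore/exploit loop underlies the other adaptive designs surveyed; Bayesian optimization and active learning differ mainly in how the next experiment is scored, not in the overall iterate-measure-adapt structure.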
The rapid growth and adaptation of medical information to identify significant health trends and help with timely preventive care have been recent hallmarks of the modern healthcare data system. Heart disease is the deadliest condition in the developed world. Cardiovascular disease and its complications, including dementia, can be averted with early detection. Further research in this area is needed to prevent strokes and heart attacks. An optimal machine learning model can help achieve this goal with a wealth of healthcare data on heart disease. Heart disease can be predicted and diagnosed using machine-learning-based systems. Active learning (AL) methods improve classification quality by incorporating user-expert feedback with sparsely labelled data. In this paper, five selection strategies for multi-label active learning (MMC, Random, Adaptive, QUIRE, and AUDI) were applied to reduce labelling costs by iteratively selecting the most relevant data to query their labels. The selection methods, paired with a label ranking classifier whose hyperparameters were optimized by a grid search, were used for predictive modelling in each scenario on the heart disease dataset. Experimental evaluation includes accuracy and F-score with and without hyperparameter optimization. Results show that, judged by accuracy, how well the optimized label-ranking model generalized beyond the existing data depended on the choice of selection method; judged by F-score under optimized settings, a different selection method stood out.
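The labelling-cost reduction described above rests on one step: at each iteration, query labels only for the instances the current model is least certain about. The sketch below uses distance of the predicted probability from 0.5 as the uncertainty score, a simple stand-in for the more elaborate scores of strategies such as MMC or QUIRE; the probabilities are hypothetical model outputs, not data from the study:

```python
def select_queries(probs, budget):
    """Return indices of the `budget` most uncertain unlabelled instances.

    `probs` are predicted positive-class probabilities; instances nearest
    the 0.5 decision boundary are queried first.
    """
    by_uncertainty = sorted(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))
    return sorted(by_uncertainty[:budget])

# Hypothetical predictions for six unlabelled patients: the model is confident
# about indices 0, 2, and 4, and uncertain about 1 and 3.
probs = [0.97, 0.52, 0.10, 0.48, 0.85, 0.30]
assert select_queries(probs, 2) == [1, 3]
```

After the expert labels the queried instances, the model is retrained and the selection repeats, which is the iterative loop that keeps labelling costs down.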
Energy and security are major challenges in wireless sensor networks, and they pull in opposite directions: as security complexity increases, battery drain increases. Because power is limited in wireless sensor networks, relying on the security of ordinary protocols built on encryption and key management is futile, given the nature of communication between sensors and the ever-changing network topology. Machine learning algorithms are therefore one of the proposed solutions for providing security services in this type of network, adding monitoring and decision intelligence. However, machine learning algorithms present hurdles of their own in terms of training and the amount of data required for it. This paper provides a convenient reference for wireless sensor network infrastructure and the security challenges it faces. It discusses how machine learning algorithms can reduce the security costs of wireless sensor networks in several domains, as well as the challenges, and proposed solutions, of improving the ability of sensors to identify threats, attacks, risks, and malicious nodes through learning and self-development. Finally, the paper discusses open issues related to adapting machine learning algorithms to the capabilities of the sensors in this type of network.
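The kind of lightweight, on-node monitoring the paper argues for can be as simple as flagging nodes whose reported behaviour deviates from the network-wide baseline. The sketch below is illustrative (the metric, threshold, and node names are assumptions, not from the paper); it uses a robust median/MAD score so a single compromised node cannot mask itself by inflating the statistics it is judged against.

```python
import statistics

def flag_anomalous_nodes(traffic, threshold=3.5):
    """Flag sensor nodes whose reported packet rate deviates strongly
    from the network baseline, using median absolute deviation (MAD)
    so one extreme node does not distort the baseline itself."""
    rates = list(traffic.values())
    med = statistics.median(rates)
    mad = statistics.median(abs(r - med) for r in rates) or 1.0
    return {node for node, r in traffic.items()
            if abs(r - med) / mad > threshold}

# Hypothetical per-node packet rates; node "f" floods the network.
suspects = flag_anomalous_nodes(
    {"a": 10, "b": 11, "c": 9, "d": 10, "e": 12, "f": 100})
```

A median/MAD rule costs a sort over a handful of numbers, which matters under the energy budget the paper emphasizes; heavier learned detectors would only be justified where this baseline fails.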
Artificial intelligence (AI) and machine learning refer to computers built and programmed by humans to perform tasks according to our design. This is vital to keep in mind as we try to understand the application of AI to medicine. AI is a tool with strengths and limitations. Its primary strength is that it allows us to assimilate and process vast quantities of health care data. Its limits include the inability of machines to adapt in a human sense, the reality that machines lack human insight (i.e., clinical judgment or common sense), and the fact that machine-learning algorithms are constrained by the data on which they are trained. Thus, we must adapt to AI and machine learning. Next, because machine learning is a type of AI in which computers are programmed to improve the algorithms under which they function over time, we need insight into, and an element of explainability about, the key data underlying a particular machine-learning prediction. Finally, machine-learning algorithms require validation before they can be applied to data sets different from those on which they were trained. As computers have become faster and more powerful, and as digital data has become abundant, we can program our machines to analyze data and recognize the patterns that, in sum, form a primary basis of medical diagnosis and treatment.
Unmanned aerial vehicles (UAVs) are involved in critical tasks such as inspection and exploration. Thus, they have to perform several intelligent functions. Various control approaches have been proposed to implement these functions. Most classical UAV control approaches, such as model predictive control, require a dynamic model to determine the optimal control parameters. Other control approaches use machine learning techniques that require multiple learning trials to obtain the proper control parameters. All these approaches are computationally expensive. Our goal is to develop an efficient control system for UAVs that does not require a dynamic model and allows them to learn control parameters online with only a few trials and inexpensive computations. To achieve this, we developed a neural control method with fast online learning. Neural control is based on a three-neuron network, whereas the online learning algorithm is derived from a neural correlation-based learning principle with predictive and reflexive sensory information. This neural control technique is used here for the speed adaptation of the UAV. The control technique relies on a simple input signal from a compact optical distance measurement sensor that can be converted into predictive and reflexive sensory information for the learning algorithm. Such speed adaptation is a fundamental function that can be used as part of other complex control functions, such as obstacle avoidance. The proposed technique was implemented on a real UAV system. Consequently, the UAV can quickly learn within 3-4 trials to proactively adapt its flying speed to brake at a safe distance from an obstacle or target in the horizontal and vertical planes. This speed adaptation is also robust against wind perturbation. We also demonstrated a combination of speed adaptation and obstacle avoidance for UAV navigation, which is an important intelligent function toward inspection and exploration.
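The correlation-based rule can be sketched in scalar form (the actual controller is a three-neuron network; the signals, gain, and trial data below are illustrative assumptions). The weight of the predictive signal grows whenever that signal is active while the reflex signal subsequently rises, so over trials the controller learns to brake before the reflex is ever triggered.

```python
def ico_update(w, u_pred, reflex_now, reflex_prev, mu=0.01):
    """One step of input-correlation (ICO-style) learning: the predictive
    weight grows with the correlation between the predictive signal and
    the *change* of the reflex signal."""
    return w + mu * u_pred * (reflex_now - reflex_prev)

def control_output(w, u_pred, u_reflex):
    """Command = learned predictive term + fixed innate reflex."""
    return w * u_pred + u_reflex

def learn_over_trial(signals, w0=0.0, mu=0.01):
    """signals: list of (u_pred, u_reflex) pairs over one approach trial."""
    w, prev_reflex = w0, signals[0][1]
    outputs = []
    for u_pred, u_reflex in signals:
        outputs.append(control_output(w, u_pred, u_reflex))
        w = ico_update(w, u_pred, u_reflex, prev_reflex, mu)
        prev_reflex = u_reflex
    return w, outputs

# One illustrative approach trial: the predictive signal (far-range
# distance reading) turns on two steps before the reflex (too close).
trial = [(0, 0), (1, 0), (1, 0), (1, 1), (0, 1)]
w_final, outputs = learn_over_trial(trial, mu=0.1)
```

After this trial the predictive weight is nonzero, so on the next approach the braking command starts as soon as the predictive signal fires, ahead of the reflex; repeating this over a few trials mirrors the 3-4-trial convergence reported in the paper.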
In this paper, we consider image quality assessment (IQA) as a measure of how images are amenable with respect to a given downstream task, or task amenability. When the task is performed using machine learning algorithms, such as a neural-network-based task predictor for image classification or segmentation, the performance of the task predictor provides an objective estimate of task amenability. In this work, we use an IQA controller to predict the task amenability which, itself being parameterised by neural networks, can be trained simultaneously with the task predictor. We further develop a meta-reinforcement learning framework to improve the adaptability for both IQA controllers and task predictors, such that they can be fine-tuned efficiently on new datasets or meta-tasks. We demonstrate the efficacy of the proposed task-specific, adaptable IQA approach, using two clinical applications for ultrasound-guided prostate intervention and pneumonia detection on X-ray images.
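A minimal single-parameter version of training a selection controller by task reward can be written as plain REINFORCE on a Bernoulli keep/reject policy. Everything here is an illustrative stand-in for the paper's neural IQA controller: the scalar quality feature, the toy reward, and the learning rate are assumptions.

```python
import math
import random

def reinforce_step(theta, feats, task_reward, rng, lr=0.5):
    """One REINFORCE update for a toy IQA controller: each image has a
    scalar quality feature; the controller keeps it with probability
    sigmoid(theta * feature) and is rewarded by downstream task quality."""
    grad = 0.0
    for f in feats:
        p = 1.0 / (1.0 + math.exp(-theta * f))
        keep = rng.random() < p
        # Score function of a Bernoulli policy: d log pi / d theta.
        glog = f * ((1.0 - p) if keep else -p)
        grad += task_reward(f, keep) * glog
    return theta + lr * grad / len(feats)

# Toy reward: keeping a good image (positive feature) helps the task,
# keeping a bad one hurts it, rejecting is neutral.
def task_reward(f, keep):
    return (1.0 if f > 0 else -1.0) if keep else 0.0

rng = random.Random(0)
theta = 0.0
for _ in range(300):
    theta = reinforce_step(theta, [-1.0, -0.5, 0.5, 1.0], task_reward, rng)
```

The controller parameter drifts toward keeping high-amenability images because that is what the task reward correlates with; in the paper the same signal trains a neural controller jointly with the task predictor, and the meta-RL wrapper lets both adapt quickly to new datasets.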
Good data feature representation and high-precision classifiers are the key steps in pattern recognition. However, when the data distributions of testing samples and training samples do not match, traditional feature extraction methods and classification models usually degrade. In this paper, we propose a domain adaptation approach to handle this problem. In our method, we first introduce cross-domain mean approximation (CDMA) into semi-supervised discriminative analysis (SDA) and design semi-supervised cross-domain mean discriminative analysis (SCDMDA) to extract shared features across domains. Second, a kernel extreme learning machine (KELM) is applied as the subsequent classifier for the classification task. Moreover, we add a cross-domain mean constraint term on the source domain into KELM and construct a kernel transfer extreme learning machine (KTELM) to further promote knowledge transfer. Finally, experimental results on four real-world cross-domain visual datasets show that the proposed method is more competitive than many other state-of-the-art methods.
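The KELM building block (without the paper's cross-domain mean constraint) has a closed form: the output weights beta solve the regularized kernel system (K + I/C) beta = y, and a new point is scored by its kernel similarities to the training points. A self-contained sketch on toy data, where the kernel width, C, and the points are illustrative:

```python
import math

def rbf(x, z, gamma=1.0):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def solve(A, b):
    """Tiny Gaussian elimination with partial pivoting, enough for the
    small regularized kernel system below."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k]
                              for k in range(r + 1, n))) / M[r][r]
    return x

def kelm_fit(X, y, C=10.0, gamma=1.0):
    """Kernel ELM: output weights beta solve (K + I/C) beta = y."""
    n = len(X)
    K = [[rbf(X[i], X[j], gamma) + (1.0 / C if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(K, y)

def kelm_predict(X, beta, x, gamma=1.0):
    return sum(b * rbf(x, xi, gamma) for b, xi in zip(beta, X))

# Two toy classes labelled -1 / +1.
X = [(0.0, 0.0), (0.0, 1.0), (2.0, 2.0), (2.0, 3.0)]
y = [-1.0, -1.0, 1.0, 1.0]
beta = kelm_fit(X, y)
```

The paper's KTELM adds a cross-domain mean term to this objective so the source-domain solution is pulled toward the target statistics; the closed-form structure stays the same.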
This study investigates how adaptable machine learning traffic signal control methods are to topological variability. We ask how well these methods can generalize to non-Manhattan-like networks with non-uniform distances between intersections. A machine learning method that is highly reliable across various topologies is proposed and compared with state-of-the-art alternatives. Lastly, we analyze the sustainability of different traffic signal control methods based on the computational effort required to achieve convergence and to perform training and testing. We show that our method achieves an approximately seven-fold improvement over the second-best method in terms of CO2 emitted during training.
Real-time systems are widely used in industry, including technological process control systems, industrial automation systems, SCADA systems, testing and measuring equipment, and robotics. The efficiency of executing an intelligent robot's mission in many cases depends on the properties of the robot's sensor and control systems in providing trajectory planning, recognition of manipulated objects, adaptation of the desired clamping force of the gripper, obstacle avoidance, and so on. This paper provides an analysis of the approaches and methods for real-time sensor and control information processing with the application of machine learning, as well as successful cases of machine learning application in the synthesis of a robot's sensor and control systems. Among the robotic systems under investigation are (a) adaptive robots with slip displacement sensors and fuzzy logic implementation for sensor data processing, (b) magnetically controlled mobile robots for moving on inclined and ceiling surfaces with neuro-fuzzy observers and neuro controllers, and (c) robots functioning in unknown environments with prediction of the control system state using statistical learning theory. All obtained results concern the main elements of a two-component robotic system with a mobile robot and an adaptive manipulation robot on a fixed base for executing complex missions in non-stationary or uncertain conditions. The design and software implementation stage involves the creation of a structural diagram and description of the selected technologies, training a neural network for recognition and classification of geometric objects, and software implementation of control system components. The Swift programming language is used for the control system design, and the CreateML framework is used for creating a neural network.
Among the main results are: (a) expanding the capabilities of the intelligent control system by increasing the number of recognition classes from three (cube, cylinder, and sphere) to five (cube, cylinder, sphere, pyramid, and cone); (b) increasing the validation accuracy (to 100%) for recognition of the five classes using CreateML (YOLOv2 architecture); (c) increasing the training accuracy (to 98.02%) and testing accuracy (to 98.0%) for recognition of the five classes using the Torch library (ResNet34 architecture) in less time and fewer epochs than CreateML (YOLOv2 architecture); (d) increasing the training accuracy (to 99.75%) and testing accuracy (to 99.2%) for recognition of the five classes using the Torch library (ResNet34 architecture) with fine-tuning; and (e) analyzing the effect of dataset size on recognition accuracy with the ResNet34 architecture and fine-tuning. The results can help in choosing efficient (a) design approaches for robotic control devices, (b) machine-learning methods for pattern recognition and classification, and (c) computer technologies for designing control systems and simulating robotic devices.
The increasing use of intermittent aeration controllers in wastewater treatment plants (WWTPs) aims to reduce aeration costs via continuous ammonia and oxygen measurements, but it faces challenges in detecting sensor and process anomalies. Applying machine learning to this unbalanced, multivariate, multiclass classification challenge requires a large amount of data, which is difficult to obtain from a new plant. This study develops a machine learning algorithm to identify anomalies in intermittent aeration WWTPs that is adaptable to new plants with limited data. Utilizing active learning, the method iteratively selects samples from the target domain to fine-tune a gradient-boosting model initially trained on data from 17 plants. Three sampling strategies were tested, with low-probability and high-entropy sampling proving effective in early adaptation, achieving an F2-score close to the optimum with minimal sample use. The objective is to deploy these models as decision support systems for WWTP management, providing a strategy for efficient model adaptation to new plants and optimizing labeling efforts.
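The two sampling criteria that worked best can be written directly from a model's predicted class probabilities. The class counts and probability values below are illustrative; the paper applies such scores to the outputs of its gradient-boosting model.

```python
import math

def low_probability_score(probs):
    """Low-probability sampling: prefer samples whose top-class
    probability is smallest (least confident prediction)."""
    return 1.0 - max(probs)

def entropy_score(probs):
    """High-entropy sampling: prefer samples whose predicted class
    distribution is most spread out."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_batch(pool_probs, k, score):
    """Rank the unlabeled pool by a score and return the indices of the
    top-k samples to send to the expert for labeling."""
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: score(pool_probs[i]), reverse=True)
    return ranked[:k]

# Hypothetical predicted distributions over three anomaly classes.
pool = [[0.90, 0.05, 0.05],  # confident -> low query priority
        [0.40, 0.35, 0.25],
        [0.34, 0.33, 0.33]]  # most uncertain -> highest priority
```

Both criteria rank the near-uniform prediction first and the confident one last, which is why either score works as the early-adaptation query rule when fine-tuning on a new plant with few labeled samples.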
The final merged grouping comprehensively covers the research landscape of artificial intelligence capabilities: from low-level neural network architectures and feature representation techniques, through mid-level mechanisms of logical reasoning, adaptive learning, and embodied intelligence, up to the high-level vision of artificial general intelligence (AGI). The report not only examines in depth AI's application capabilities in key industries such as healthcare, scientific research, and supply chains, but also analyzes, from the perspectives of organizational behavior and management science, how AI capabilities empower enterprise performance, constructing a complete capability evaluation framework spanning theoretical foundations, technical implementation, and social and organizational applications.