Psycho-Cognitive Capabilities: Understanding Users and Their Communication
Computational Modeling, Evaluation, and Interactive Reasoning for Theory of Mind (ToM)
This group forms the core of the survey, focusing on how to endow AI (LLMs and robots in particular) with the ability to understand human beliefs, intentions, and desires. It covers ToM evaluation benchmarks (e.g., UniToMBench, EnigmaToM, BDIQA), recursive reasoning mechanisms, Bayesian modeling, and applications in negotiation and collaboration.
- Enhancing Conversational Agents with Theory of Mind: Aligning Beliefs, Desires, and Intentions for Human-Like Interaction(M. Jafari, Devin Yuncheng Hua, Hao Xue, Flora D. Salim, 2025, ArXiv)
- UniToMBench: Integrating Perspective-Taking to Improve Theory of Mind in LLMs(Prameshwar Thiyagarajan, Vaishnavi Parimi, Shamant Sai, Soumil Garg, Zhangir Meirbek, Nitin Yarlagadda, Kevin Zhu, Christof Kim, 2025, ArXiv)
- Evaluating Theory-of-Mind in Large Language Models Through Opponent Modeling(Emre Kuru, Anıl Doğru, M. Doğan, Reyhan Aydoğan, 2025, Proceedings of the 25th ACM International Conference on Intelligent Virtual Agents)
- EnigmaToM: Improve LLMs' Theory-of-Mind Reasoning Capabilities with Neural Knowledge Base of Entity States(Hainiu Xu, Siya Qi, Jiazheng Li, Yuxiang Zhou, Jinhua Du, C. Catmur, Yulan He, 2025, ArXiv)
- Bayesian Theory of Mind for False Belief Understanding in Human-Robot Interaction(Mehdi Hellou, Samuele Vinanzi, Angelo Cangelosi, 2023, 2023 32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN))
- Theory of Mind in Human-AI Interaction(Qiaosi Wang, S. Walsh, Mei Si, Jeff Kephart, Justin D. Weisz, Ashok K. Goel, 2024, Extended Abstracts of the CHI Conference on Human Factors in Computing Systems)
- MindCraft: Theory of Mind Modeling for Situated Dialogue in Collaborative Tasks(Cristian-Paul Bara, Sky CH-Wang, J. Chai, 2021, No journal)
- Interactive AI with a Theory of Mind(Mustafa Mert Çelikok, Tomi Peltola, Pedram Daee, Samuel Kaski, 2019, ArXiv Preprint)
- Modeling appraisal in theory of mind reasoning(Mei Si, S. Marsella, David V. Pynadath, 2008, Autonomous Agents and Multi-Agent Systems)
- Expedient Assistance and Consequential Misunderstanding: Envisioning an Operationalized Mutual Theory of Mind(Justin D. Weisz, Michael J. Muller, Arielle Goldberg, Darío Andrés Silva Moran, 2024, ArXiv)
- Theory of Mind Abilities of Large Language Models in Human-Robot Interaction: An Illusion?(Mudit Verma, Siddhant Bhambri, Subbarao Kambhampati, 2024, Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction)
- Rank-O-ToM: Unlocking Emotional Nuance Ranking to Enhance Affective Theory-of-Mind(JiHyun Kim, JuneHyoung Kwon, MiHyeon Kim, Eunju Lee, YoungBin Kim, 2025, ArXiv Preprint)
- Modeling Theory of Mind in Dyadic Games Using Adaptive Feedback Control(Ismael T. Freire, X. Arsiwalla, J. Puigbò, P. Verschure, 2023, Inf.)
- A Computable Game-Theoretic Framework for Multi-Agent Theory of Mind(Fengming Zhu, Yuxin Pan, Xiaomeng Zhu, Fangzhen Lin, 2025, ArXiv Preprint)
- Theory of Mind for Explainable Human-Robot Interaction(Marie S. Bauer, Julia Gachot, Matthias Kerzel, Cornelius Weber, Stefan Wermter, 2025, ArXiv Preprint)
- Large Model Strategic Thinking, Small Model Efficiency: Transferring Theory of Mind in Large Language Models(Nunzio Lorè, Sepehr Ilami, Babak Heydari, 2024, ArXiv)
- Theory of Mind and Self-Disclosure to CUIs(Samuel Rhys Cox, 2025, ArXiv Preprint)
- Function Alignment: A New Theory of Mind and Intelligence, Part I: Foundations(Gus G. Xia, 2025, ArXiv)
- Emergent Communication with World Models(Alexander I. Cowen-Rivers, Jason Naradowsky, 2020, ArXiv Preprint)
- SoMi-ToM: Evaluating Multi-Perspective Theory of Mind in Embodied Social Interactions(Xianzhe Fan, Xuhui Zhou, Chuanyang Jin, Kolby Nottingham, Hao Zhu, M. Sap, 2025, ArXiv)
- NegotiationToM: A Benchmark for Stress-testing Machine Theory of Mind on Negotiation Surrounding(Chunkit Chan, Cheng Jiayang, Yauwai Yim, Zheye Deng, Wei Fan, Haoran Li, Xin Liu, Hongming Zhang, Weiqi Wang, Yangqiu Song, 2024, ArXiv)
- LLMs achieve adult human performance on higher-order theory of mind tasks(Winnie Street, John Oliver Siy, Geoff Keeling, Adrien Baranes, Benjamin Barnett, M. McKibben, Tatenda Kanyere, Alison Lentz, B. A. Y. Arcas, R. Dunbar, 2024, Frontiers in Human Neuroscience)
- Theory of Mind Modeling in Search and Rescue Teams(Huao Li, Ini Oguntola, Dana Hughes, Michael Lewis, K. Sycara, 2022, 2022 31st IEEE International Conference on Robot and Human Interactive Communication (RO-MAN))
- Rethinking Theory of Mind Benchmarks for LLMs: Towards A User-Centered Perspective(Qiaosi Wang, Xuhui Zhou, Maarten Sap, Jodi Forlizzi, Hong Shen, 2025, ArXiv)
- Satisficing Mentalizing: Bayesian Models of Theory of Mind Reasoning in Scenarios with Different Uncertainties(Jan Pöppel, Stefan Kopp, 2019, ArXiv Preprint)
- LLM Theory of Mind and Alignment: Opportunities and Risks(Winnie Street, 2024, ArXiv)
- Gap the (Theory of) Mind: Sharing Beliefs About Teammates’ Goals Boosts Collaboration Perception, Not Performance(Yotam Amitai, Reuth Mirsky, Ofra Amir, 2025, 2025 IEEE International Conference on AI and Data Analytics (ICAD))
- Mitigating Value Conflicts with Computational Theory of Mind(Emre Erdogan, Hüseyin Aydin, F. Dignum, R. Verbrugge, Pinar Yolum, 2025, No journal)
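The Bayesian modeling and false-belief reasoning surveyed above can be illustrated with a minimal sketch. This is our own toy example, not the method of any paper listed: an observer maintains a posterior over which location an agent *believes* an object occupies, updating only when the agent witnessed the move.

```python
# Minimal Bayesian theory-of-mind sketch (illustrative only): infer the
# location an agent *believes* an object is in, given whether the agent
# saw the object move.

def normalize(dist):
    total = sum(dist.values())
    return {k: v / total for k, v in dist.items()}

def update_belief(prior, saw_move, true_location, noise=0.1):
    """Posterior over the agent's believed location.

    If the agent saw the move, its belief tracks the true location
    (up to observation noise); otherwise its belief stays put.
    """
    likelihood = {}
    for loc in prior:
        if saw_move:
            likelihood[loc] = 1 - noise if loc == true_location else noise
        else:
            likelihood[loc] = 1.0  # no new evidence: belief unchanged
    return normalize({loc: prior[loc] * likelihood[loc] for loc in prior})

# Sally-Anne style scenario: the object moves from "basket" to "box"
# while the agent is away (saw_move=False), producing a false belief.
prior = {"basket": 0.9, "box": 0.1}   # agent last saw it in the basket
posterior = update_belief(prior, saw_move=False, true_location="box")
print(max(posterior, key=posterior.get))  # -> basket (a false belief)
```

Recursive ("I think that you think...") ToM, as in the opponent-modeling work above, would nest this update: each level models the level below it as the believer.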
Emotional Intelligence, Empathy Perception, and Social Communication Feedback
Focuses on how AI recognizes and generates empathetic feedback. Topics include emotionally expressive social robots, facial- and micro-expression recognition, embodied empathy in virtual reality, and strengthening user engagement through emotional alignment and social cues such as politeness and tone.
- SENSE-7: Taxonomy and Dataset for Measuring User Perceptions of Empathy in Sustained Human-AI Conversations(Jina Suh, Lindy Le, Erfan Shayegani, Gonzalo Ramos, Judith Amores, Desmond C. Ong, Mary Czerwinski, Javier Hernandez, 2025, ArXiv Preprint)
- AI-Powered Emotional Intelligence in Social Robots: Towards Empathetic Human–Robot Interaction(Haddiqa Shafaqat, 2025, International Journal of Advanced and Innovative Research (IJAIR))
- Modeling User Empathy Elicited by a Robot Storyteller(Leena Mathur, Micol Spitale, Hao Xi, Jieyun Li, Maja J Matarić, 2021, ArXiv Preprint)
- Scenario-Based Immersive Virtual Reality Platform for Testing Empathy Types: A Development Study(T. Laine, Woo-Jin Lee, Jiyoung Moon, Aziz Hasanov, H. Suk, 2025, IEEE Access)
- Facial Expression Recognition as a Measure of User-Designer Empathy(Aleksi Salmi, Jie Li, Katja Holtta-Otto, 2022, Volume 6: 34th International Conference on Design Theory and Methodology (DTM))
- Comparing Cognitive and Affective Theory of Mind for an Assistive Robotics Application(Luca Raggioli, Antimo Cantiello, Raffaella Esposito, Alessandra Rossi, Silvia Rossi, 2025, Proceedings of the 33rd ACM Conference on User Modeling, Adaptation and Personalization)
- Personality- and Memory-based framework for Emotionally Intelligent agents(Alice Nardelli, Giacomo Maccagni, Federico Minutoli, Antonio Sgorbissa, C. Recchiuto, 2024, 2024 33rd IEEE International Conference on Robot and Human Interactive Communication (ROMAN))
- Applying a Text-Based Affective Dialogue System in Psychological Research: Case Studies on the Effects of System Behaviour, Interaction Context and Social Exclusion(M. Skowron, Stefan Rank, A. Świderska, Dennis Küster, Arvid Kappas, 2014, Cognitive Computation)
- Designing VRPT experience for empathy toward out-groups using critical incidents and cultural explanations(Daniela Hekiert, Magdalena Igras-Cybulska, A. Cybulski, 2021, 2021 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct))
- Empathy and embodied experience in virtual environment: To what extent can virtual reality stimulate empathy and embodied experience?(Donghee Don Shin, 2018, Comput. Hum. Behav.)
- Robot Mirroring: Promoting Empathy with an Artificial Agent by Reflecting the User’s Physiological Affective States(Monica Perusquía-Hernández, Marisabel Cuberos-Balda, David Antonio Gómez Jáuregui, Diego Paez-Granados, Felix Dollack, José Salazar, 2020, 2020 29th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN))
- The Impact of Adaptive Emotional Alignment on Mental State Attribution and User Empathy in HRI(Giorgia Buracchio, Ariele Callegari, Massimo Donini, Cristina Gena, Antonio Lieto, Alberto Lillo, Claudio Mattutino, Alessandro Mazzei, Linda Pigureddu, Manuel Striani, Fabiana Vernero, 2025, ArXiv Preprint)
- A Talking Musical Robot over Multiple Interactions: After Bonding and Empathy Fade, Relevance and Realism Arise(Ivy S. Huang, J. Hoorn, 2025, ACM Transactions on Human-Robot Interaction)
- Embedding Emotional Intelligence in AGI for Enhanced Cognition(Vishal Punjabi, Balajee Asish Brahmandam, Srinath Chandramohan, 2025, International Journal of Innovative Research in Computer and Communication Engineering)
- Bridging Cognition and Emotion: Empathy-Driven Multimodal Misinformation Detection(Zihan Wang, Lu Yuan, Zhengxuan Zhang, Qing Zhao, 2025, ArXiv Preprint)
- CoEmpaTeam: Enhancing Cognitive Empathy using LLM-based Avatars and Dynamic Role Play in Virtual Reality(Dehui Kong, Martin Feick, Shi Liu, Alexander Maedche, 2026, ArXiv Preprint)
- Cognitive vs. emotional empathy: exploring their impact on user outcomes in health-assistant chatbots(Tingting Jiang, Chuxuan Huang, Yanrun Xu, Han Zheng, 2025, Behaviour & Information Technology)
- Are You Empathizing with Me? Exploring External Expressions of Empathy in Interpersonal VR Communication(Yongho Lee, Bowon Kim, Hyunchul Kim, Jeongmi Lee, Gun A. Lee, Heesook Shin, Youn-Hee Gil, 2025, 2025 IEEE International Symposium on Mixed and Augmented Reality (ISMAR))
- AI Communication Tone and Consumer Judgment: The Role of Servant Perception in Behavioral Intentions.(John Yang, 2026, Behavioral sciences)
- Prosociality Matters: How Does Prosocial Behavior in Interdependent Situations Influence the Well-being and Cognition of Road Users?(Sooyeon Kim, Shashank Mehrotra, Kumar Akash, Teruhisa Misu, John D. Lee, 2024, Proceedings of the 16th International Conference on Automotive User Interfaces and Interactive Vehicular Applications)
- Learning to Generate Context-Sensitive Backchannel Smiles for Embodied AI Agents with Applications in Mental Health Dialogues(Maneesh Bilalpur, Mert Inan, Dorsa Zeinali, Jeffrey F. Cohn, Malihe Alikhani, 2024, ArXiv Preprint)
User Mental Models, Explainable AI, and Trust Calibration
Examines how users form cognitive representations of AI, and how systems use explainability (XAI) strategies, transparent feedback, and metacognitive interventions to align user expectations and correct cognitive biases (e.g., anchoring, over-reliance), thereby establishing appropriately calibrated trust.
- Exploring User Mental Models of End-to-End Encrypted Communication Tools(Ruba Abu-Salma, Elissa M. Redmiles, Blase Ur, Miranda Wei, 2018)
- Building Appropriate Mental Models: What Users Know and Want to Know about an Agentic AI Chatbot(Michelle Brachman, Siya Kunde, Sarah Miller, Ana Fucs, S. Dempsey, Jamie Jabbour, Werner Geyer, 2025, Proceedings of the 30th International Conference on Intelligent User Interfaces)
- Investigating the effect of Mental Models in User Interaction with an Adaptive Dialog Agent(Lindsey Vanderlyn, Dirk Väth, Ngoc Thang Vu, 2024, ArXiv Preprint)
- Sequential Explanations with Mental Model-Based Policies(A. Yeung, Shalmali Joshi, J. Williams, Frank Rudzicz, 2020, ArXiv)
- ReasoningRec: Bridging Personalized Recommendations and Human-Interpretable Explanations through LLM Reasoning(Millennium Bismay, Xiangjue Dong, James Caverlee, 2024, ArXiv Preprint)
- Study on the Helpfulness of Explainable Artificial Intelligence(Tobias Labarta, Elizaveta Kulicheva, Ronja Froelian, Christian Geißler, Xenia Melman, Julian von Klitzing, 2024, ArXiv Preprint)
- Natural Language Interaction to Facilitate Mental Models of Remote Robots(Francisco J. Chiyah Garcia, José Lopes, Helen Hastie, 2020, ArXiv Preprint)
- Users' Mental Models of Generative AI Chatbot Ecosystems(Xingyi Wang, Xiaozheng Wang, Sunyup Park, Yaxing Yao, 2025, ArXiv Preprint)
- Mental model shifts in human-LLM interactions(Johannes Schneider, 2025, Journal of Intelligent Information Systems)
- DeBiasMe: De-biasing Human-AI Interactions with Metacognitive AIED (AI in Education) Interventions(Chaeyeon Lim, 2025, ArXiv Preprint)
- Violation of Expectation via Metacognitive Prompting Reduces Theory of Mind Prediction Error in Large Language Models(Courtland Leer, Vincent Trost, Vineeth Voruganti, 2023, ArXiv)
- Cognitive Amplification vs Cognitive Delegation in Human-AI Systems: A Metric Framework(Eduardo Di Santi, 2026, ArXiv Preprint)
- More Than Accuracy: Towards Trustworthy Machine Learning Interfaces for Object Recognition(Hendrik Heuer, Andreas Breiter, 2020, ArXiv Preprint)
- Not All Transparency Is Equal: Source Presentation Effects on Attention, Interaction, and Persuasion in Conversational Search(Jiangen He, Jiqun Liu, 2025, Proceedings of the 2026 Conference on Human Information Interaction and Retrieval)
- From Framework to Evidence: Testing Explainable AI Feedback for Leadership Learning in Collaborative VR under the C²L-AI Model(LI Yahan, 2025, Artificial Intelligence Education Studies)
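The trust-calibration theme above can be made concrete with a simple behavioral metric. This sketch and its metric are our own construction, not taken from any paper listed: well-calibrated reliance means the user tends to follow the AI when it is correct and to override it when it is wrong.

```python
# Hypothetical trust-calibration sketch (metric and data invented):
# compare how often a user follows AI advice when the AI is correct
# versus when it is wrong.

def reliance_rates(trials):
    """trials: list of (ai_correct: bool, user_followed: bool)."""
    followed_when_right = [f for c, f in trials if c]
    followed_when_wrong = [f for c, f in trials if not c]
    appropriate = sum(followed_when_right) / len(followed_when_right)
    overreliance = sum(followed_when_wrong) / len(followed_when_wrong)
    return appropriate, overreliance

trials = [
    (True, True), (True, True), (True, False), (True, True),
    (False, False), (False, True), (False, False), (False, False),
]
appropriate, over = reliance_rates(trials)
print(f"follows correct advice: {appropriate:.2f}")  # 0.75
print(f"follows wrong advice:   {over:.2f}")         # 0.25
```

A gap between the two rates (high first, low second) is one operational reading of "appropriate trust"; identical rates would suggest the user ignores AI correctness entirely.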
Cognitive Load, Executive Function Optimization, and Interface/Interaction Design (UI/UX)
Studies how cognitive resources are allocated during interaction. The emphasis is on reducing users' cognitive load through dynamic interface generation, multimodal feedback (haptic, visual, eye-tracking), and attention guidance, with particular attention to assistive design for older adults and people with cognitive impairments.
- The Impact of Feedback Modalities and the Influence of Cognitive Load on Interpersonal Communication in Nonclinical Settings: Experimental Study Design(Chryselle Rego, E. Montague, 2023, JMIR Human Factors)
- Digital Wellbeing Lens: Design Interfaces That Respect User Attention(Alberto Monge Roffarello, Luigi De Russis, Massimiliano Pellegrino, 2024, Proceedings of the 2024 International Conference on Advanced Visual Interfaces)
- Examining the Effects of HMDs/FSDs and Gender Differences on Cognitive Processing Ability and User Experience of the Stroop Task-Embedded Virtual Reality Driving System (STEVRDS)(Chen-Wei Chang, Mengtong Li, S. Yeh, Yijing Chen, A. Rizzo, 2020, IEEE Access)
- PROMOTING EXECUTIVE FUNCTIONS IN OLDER ADULTS: THE ROLE OF GAMIFICATION AND INDIVIDUAL DIFFERENCES(Morgan Gomez, A. Pahor, Audrey Carrillo, Aaron Seitz, Susanne Jaeggi, 2024, Innovation in Aging)
- AI-enhanced learning and cognitive processes in digital humanities: A systematic review of executive functions(Mohammed A. Alshehri, Alsubaie Faisal Bin Shabib Mosleet, M. Abdellatif, Mohamed Ali Nemt-allah, 2025, Research Journal in Advanced Humanities)
- Measuring Cognitive Abilities in the Wild: Validating a Population-Scale Game-Based Cognitive Assessment(Mads Kock Pedersen, Carlos Mauricio Castaño Díaz, Qian Janice Wang, Mario Alejandro Alba-Marrugo, Ali Amidi, Rajiv Vaid Basaiawmoit, Carsten Bergenholtz, Morten H. Christiansen, Miroslav Gajdacz, Ralph Hertwig, Byurakn Ishkhanyan, Kim Klyver, Nicolai Ladegaard, Kim Mathiasen, Christine Parsons, Janet Rafner, Anders Ryom Villadsen, Mikkel Wallentin, Blanka Zana, Jacob Friis Sherson, 2020, ArXiv Preprint)
- Engineering-Psychological Aspects of Animation Application in User Interface of Robotic Systems(2025, Ergodesign)
- Dynamic User Interface Generation for Enhanced Human-Computer Interaction Using Variational Autoencoders(Runsheng Zhang, Shixiao Wang, Tianfang Xie, Shiyu Duan, Mengmeng Chen, 2024, ArXiv Preprint)
- Cognitive Debt in the ChatGPT Era: How Ethical and Emotional Use Shapes Cognitive Function in Emerging Adults(Kiran Shehzadi, Khalid Khan, M. Chaudhry, 2025, Nature-Nurture Journal of Psychology)
- AI Chatbots and Cognitive Control: Enhancing Executive Functions Through Chatbot Interactions: A Systematic Review(Pantelis Pergantis, Victoria Bamicha, C. Skianis, Athanasios Drigas, 2025, Brain Sciences)
- Cognitive Load Balancing: Building AI Systems to Help Humans Manage Overstimulation in a 24/7 Attention Economy(Akangsha Sunil Bedmutha, 2025, European Modern Studies Journal)
- Exploring Cognitive Load Dynamics in Human-Machine Interaction for Teleoperation: A User-Centric Perspective on Remote Operation System Design(Juan Jose Garcia Cardenas, Xiaoxuan Hei, Adriana Tapus, 2024, 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS))
- Enhancing Attention Span in Human-Computer Interaction by Designing Digital Interfaces for Optimal Cognitive Efficiency(Henry Ricardo, Edbert Andersen, N. Masaling, Andry Chowanda, 2025, 2025 IEEE 2nd International Conference on Cryptography, Informatics, and Cybersecurity (ICoCICs))
- Age-Related Visual Attention and Interaction Performance Across Interface Layouts in EV Charging Applications: An Eye-Tracking Study(Dongyue Liu, Tong Wu, Xiyun Li, 2026, IEEE Access)
- Chromatic Asymmetry in Visual Attention: Dissociable Effects of Background Color on Capture and Processing During Reading—An Eye-Tracking Study(A. Teixeira, Pedro Martins, Sónia Brito-Costa, Maryam Abbasi, 2026, Symmetry)
- Outline or Solid? The Role of Icon Style on User's Perception(Zhangfan Shen, Yi Wang, Moke Li, Jiaxiang Chen, Zhanpeng Hu, 2025, Human Factors and Ergonomics in Manufacturing & Service Industries)
- Getting the Most from Eye-Tracking: User-Interaction Based Reading Region Estimation Dataset and Models(Ruoyan Kong, Ruixuan Sun, Charles Chuankai Zhang, Chen Chen, Sneha Patri, Gayathri Gajjela, Joseph A. Konstan, 2023, ArXiv Preprint)
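A common first step in the eye-tracking and attention studies above is mapping raw gaze samples to areas of interest (AOIs) and accumulating dwell time. The regions, coordinates, and data below are invented for illustration:

```python
# Illustrative eye-tracking sketch: assign gaze samples to screen
# regions (AOIs) and accumulate dwell time per region.

AOIS = {  # name -> (x0, y0, x1, y1) in screen pixels (hypothetical layout)
    "menu":    (0, 0, 200, 1080),
    "content": (200, 0, 1600, 1080),
    "sidebar": (1600, 0, 1920, 1080),
}

def dwell_times(samples, dt_ms=16):
    """samples: (x, y) gaze points recorded at a fixed interval dt_ms."""
    totals = {name: 0 for name in AOIS}
    for x, y in samples:
        for name, (x0, y0, x1, y1) in AOIS.items():
            if x0 <= x < x1 and y0 <= y < y1:
                totals[name] += dt_ms
                break
    return totals

samples = [(100, 500)] * 10 + [(800, 400)] * 50 + [(1700, 200)] * 5
print(dwell_times(samples))
# -> {'menu': 160, 'content': 800, 'sidebar': 80}  (milliseconds)
```

Relative dwell time per AOI is one of the simpler proxies for where an interface layout draws, or fails to draw, user attention.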
Personalized User Profiling, Neuroscientific Modeling, and Individual Differences
Builds "digital twins" of users from personality (Big Five, temperament), preferences, interaction history, and even neuroscientific signals such as EEG. Investigates the shift from passive recommendation to proactive prediction, while examining the privacy paradox and fairness challenges.
- The Personalization Trap: How User Memory Alters Emotional Reasoning in LLMs(Xi Fang, Weijie Xu, Yuchong Zhang, Stephanie Eckman, Scott Nickleach, Chandan K. Reddy, 2025, ArXiv)
- Understanding the Role of User Profile in the Personalization of Large Language Models(Bin Wu, Zhengyan Shi, Hossein A. Rahmani, Varsha Ramineni, Emine Yilmaz, 2024, ArXiv Preprint)
- Learning Personalized User Preference from Cold Start in Multi-turn Conversations(Deguang Kong, Abhay Jha, Lei Yun, 2023, ArXiv Preprint)
- PediaMind-R1: A Temperament-Aware Language Model for Personalized Early Childhood Care Reasoning via Cognitive Modeling and Preference Alignment(Zihe Zhang, Can Zhang, Yanheng Xu, Xin Hu, Jichao Leng, 2025, ArXiv Preprint)
- The Influence of Personality Traits on User Interaction with Recommendation Interfaces(Dongning Yan, Li Chen, 2023, ACM Transactions on Interactive Intelligent Systems)
- Neuroscientific User Models: The Source of Uncertain User Feedback and Potentials for Improving Recommendation and Personalisation(Kevin Jasberg, Sergej Sizov, 2018, ArXiv Preprint)
- Personalized neuroscience: User modeling of cognitive function and brain activity in the cloud(T. Nick, Laura Berman, Arye Barnehama, 2015)
- Probabilistic Digital Twins of Users: Latent Representation Learning with Statistically Validated Semantics(Daniel David, 2025, ArXiv Preprint)
- Personalization, Privacy, and Me(Reshma Narayanan Kutty, Claudia Orellana-Rodriguez, Igor Brigadir, Ernesto Diaz-Aviles, 2021, ArXiv Preprint)
- The Personalization Paradox: the Conflict between Accurate User Models and Personalized Adaptive Systems(Santiago Ontañón, Jichen Zhu, 2021, ArXiv Preprint)
- From Traits to Empathy: Personality-Aware Multimodal Empathetic Response Generation(Jiaqiang Wu, Xuandong Huang, Zhouan Zhu, Shangfei Wang, 2025, No journal)
- Effects of cross-cultural language differences on social cognition during human-agent interaction in cooperative game environments(Casey C. Bennett, Young Bae, Jun Hyung Yoon, Yejin Chae, Eunseo Yoon, Seeun Lee, Uijae Ryu, S. Kim, Benjamin Weiss, 2023, Comput. Speech Lang.)
- Gender Differences in Visual Information Perception Ability: A Signal Detection Theory Approach(Yejin Lee, Kwangtae Jung, 2025, Applied Sciences)
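The signal detection theory approach used in the last entry above separates perceptual sensitivity from response bias. A standard textbook computation (the trial counts below are invented, not the study's data):

```python
# Signal-detection sketch: sensitivity d' = z(H) - z(FA) and response
# bias c = -(z(H) + z(FA)) / 2, where z is the inverse standard-normal
# CDF. Hit/false-alarm rates of exactly 0 or 1 would need a correction
# (e.g., log-linear) before taking z.

from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    z = NormalDist().inv_cdf
    h = hits / (hits + misses)                          # hit rate
    fa = false_alarms / (false_alarms + correct_rejections)  # FA rate
    return z(h) - z(fa), -(z(h) + z(fa)) / 2

sensitivity, bias = d_prime(hits=18, misses=2, false_alarms=4,
                            correct_rejections=16)
print(round(sensitivity, 2), round(bias, 2))  # -> 2.12 -0.22
```

Higher d' means better discrimination of signal from noise; negative c indicates a liberal criterion (a tendency to answer "signal present").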
Social Dynamics, Group Behavior, and Mental Health Interventions
Explores AI at the societal level. One strand studies rumor propagation, polarization, and toxic interactions on social media; the other studies AI as a digital therapeutic in CBT, depression care, ADHD rehabilitation, and family dynamics.
- Group behavior simulation in multi-character interactions: Game narrative generation Algorithms based on social proof theory(Tan Liying, 2025, Environment and Social Psychology)
- Structural and Cognitive Bottlenecks to Information Access in Social Networks(Jeon-Hyung Kang, Kristina Lerman, 2013, ArXiv Preprint)
- Understanding Online Polarization Through Human-Agent Interaction in a Synthetic LLM-Based Social Network(Tim Donkers, Jürgen Ziegler, 2025, ArXiv Preprint)
- A Rumor Propagation Model Based on User Cognition and Evolutionary Game(Rong Wang, Zerui Wu, Liangyu Wang, Chaolong Jia, Yunpeng Xiao, 2025, ACM Transactions on Knowledge Discovery from Data)
- Exploring Malicious Comments behavior on Social Media: An Analysis from Cognition, Affect, and Conation(Xiangyu Kong, 2025, Frontiers in Humanities and Social Research)
- Emotion Artificial Intelligence: A Cognitive Behavioral Sentiment Analysis System for Mental Health Support(Bosubabu Sambana, K. V. Prasad, B. Ganesh, S. Mahalakshmi, U. Nanaji, V. Trinadha, Surya Pavan, Kumar Gudla, Y. Ramesh, 2025, 2025 IEEE International Conference on Advanced Computing Technologies (ICACT))
- Customizable AI for Depression Care: Improving the User Experience of Large Language Model-Driven Chatbots(Yi Li, Xu Ding, Yifan Chen, Yeye Li, Nan Ma, 2025, Proceedings of the 2025 ACM Designing Interactive Systems Conference)
- A Personalized Content Filter for Mental Health Apps Using BiLSTM(A. Joshi, K. Bhavsar, Piyush Deolikar, Nitesh Rajput, Diptee Chikmurge, Sharmila Kharat, 2025, 2025 3rd International Conference on Intelligent Systems, Advanced Computing and Communication (ISACC))
- User Experience of ADHD People: an EEG-Based Exploratory Study(Eva Lissette Paredes-Cabrera, Carmen Mezura-Godoy, E. Benítez-Guerrero, 2025, Programming and Computer Software)
- Artificial Intelligence (AI) in the Family System: Possible Positive and Detrimental Effects on Parenting, Communication and Family Dynamics(Máté Szondy, Ágnes Magyary, 2025, European Journal of Mental Health)
- Phubbing and mental well-being: a media naturalness theory approach to digital disruption in face-to-face communication(Fahad Zeya, Nargis Sultana, 2026, Mental Health and Digital Technologies)
- Evaluating an Online Cognitive Training Platform for Older Adults: User Experience and Implementation Requirements.(M. Haesner, A. Steinert, J. O’Sullivan, Markus Weichenberger, 2015, Journal of gerontological nursing)
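The rumor-propagation modeling above typically uses compartment dynamics. A toy discrete-time ignorant/spreader/stifler (ISR) model, with parameters invented for illustration rather than taken from any paper listed:

```python
# Toy rumor-propagation sketch: ignorant/spreader/stifler fractions
# updated in discrete time. `spread` is the per-step contact rate at
# which ignorants become spreaders; `stifle` is the rate at which
# spreaders lose interest.

def step(i, s, r, spread=0.3, stifle=0.1):
    """One update of (ignorant, spreader, stifler) fractions."""
    new_spreaders = spread * i * s   # ignorants who hear the rumor
    new_stiflers = stifle * s        # spreaders who stop spreading
    return (i - new_spreaders,
            s + new_spreaders - new_stiflers,
            r + new_stiflers)

i, s, r = 0.99, 0.01, 0.0            # rumor starts with 1% spreaders
for _ in range(200):
    i, s, r = step(i, s, r)
print(round(i + s + r, 6))           # population is conserved
```

Even this crude model reproduces the qualitative finding of the richer cognition- and game-theory-based models above: the rumor burns out before reaching everyone, leaving a residual ignorant fraction.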
Foundational Theoretical Frameworks and Cognitive Architectures from Cognitive Science
Contains the foundational research underpinning the applications above, such as System 1/System 2 cognitive architectures, vector symbolic architectures, mathematical models of belief revision and update, and deeper psychological theories including anthropomorphism and cognitive dissonance.
- AAAI 2022 Fall Symposium: System-1 and System-2 realized within the Common Model of Cognition(Brendan Conway-Smith, Robert L. West, 2023, ArXiv Preprint)
- A philosophical and ontological perspective on Artificial General Intelligence and the Metaverse(Martin Schmalzried, 2024, ArXiv Preprint)
- THE PSYCHOLOGICAL IMPLICATIONS OF ANTHROPOMORPHISING ARTIFICIAL INTELLIGENCE: REFLEXIVE AND ETHICAL RISKS(Вероніка Горєлова, 2025, "Scientific notes of the University"KROK")
- Vector Symbolic Architectures answer Jackendoff's challenges for cognitive neuroscience(Ross W. Gayler, 2004, ArXiv Preprint)
- CASPER: Cognitive Architecture for Social Perception and Engagement in Robots(Samuele Vinanzi, Angelo Cangelosi, 2022, ArXiv Preprint)
- Modeling Belief in Dynamic Systems, Part II: Revision and Update(N Friedman, J. Y. Halpern, 1999, ArXiv Preprint)
- A Systems-Theoretic Approach to Mental State Estimation for Theory-of-Mind-Aware Social Robots(Maria L. Morão Patrício, A. Jamshidnejad, 2025, IEEE Access)
- Scripts and Social Cognition(Gen Eickers, 2024, Ergo an Open Access Journal of Philosophy)
- From Theory of Mind to Theory of Environment: Counterfactual Simulation of Latent Environmental Dynamics(Ryutaro Uchiyama, 2026, ArXiv Preprint)
- Cognitive Science and Artificial Intelligence for Human Cognition and Communication(A. K. Sangaiah, Huimin Lu, Qing Hu, 2020, IEEE Consumer Electron. Mag.)
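Vector symbolic architectures, as in Gayler's entry above, represent structured knowledge in high-dimensional vectors. A minimal sketch in that spirit (the bipolar encoding and role/filler names are our own toy choices): element-wise multiplication binds a role to a filler, addition superposes pairs, and binding again with a role approximately recovers its filler.

```python
# Vector-symbolic-architecture sketch: with random bipolar vectors,
# element-wise multiplication is a self-inverse binding operator,
# addition superposes bindings, and re-binding with a role unbinds
# (approximately) its filler.

import random

DIM = 10_000
random.seed(0)

def vec():
    return [random.choice((-1, 1)) for _ in range(DIM)]

def bind(a, b):   # element-wise multiply; bind(bind(a, b), a) ~ b
    return [x * y for x, y in zip(a, b)]

def sim(a, b):    # normalized dot product (cosine for bipolar vectors)
    return sum(x * y for x, y in zip(a, b)) / DIM

AGENT, OBJECT = vec(), vec()   # role vectors
sally, marble = vec(), vec()   # filler vectors
scene = [a + b for a, b in zip(bind(AGENT, sally), bind(OBJECT, marble))]

probe = bind(scene, AGENT)     # approximately recovers `sally`
print(sim(probe, sally) > 0.5, sim(probe, marble) < 0.5)  # True True
```

The unbound cross-term bind(AGENT, bind(OBJECT, marble)) survives as low-similarity noise, which is why the recovered filler is approximate rather than exact.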
Taken together, the merged groups map the full research landscape of "psycho-cognitive capabilities for understanding users and their communication": foundational theory (cognitive architectures and mathematical modeling) supports the core techniques (ToM modeling, affective computing, personalized profiling) that capture users' deeper intentions and traits, which are in turn applied to interaction-interface optimization (cognitive load management, XAI) and to specific social and clinical settings (mental health intervention, social dynamics). The research trend is shifting from narrow functional assistance toward deep psychological alignment and human-AI collaborative intelligence.
A total of 293 related publications.
No abstract available
This article presents two studies conducted with an affective dialogue system in which text-based system–user communication was used to model, generate and present different affective and social interaction scenarios. We specifically investigated the influence of interaction context and roles assigned to the system and the participants, as well as the impact of pre-structured social interaction patterns that were modelled to mimic aspects of “social exclusion” scenarios. The results of the first study demonstrate that both the social context of the interaction and the roles assigned to the system influence the system evaluation, interaction patterns, textual expressions of affective states, as well as emotional self-reports. The results observed for the second study show the system’s ability to partially exclude a participant from a triadic conversation without triggering significantly different affective reactions or a more negative system evaluation. The experimental evidence provides insights on the perception, modelling and generation of affective and social cues in artificial systems that can be realized in different modalities, including the text modality, thus delivering valuable input for applying affective dialogue systems as tools for studying affect and social aspects in online communication.
No abstract available
Abstract Mixed Reality enables individuals to visualise and interact with artefacts and environments through a combination of physical and virtual assets. It has received increased interest from the design community as a means to accelerate, enrich and enhance prototyping activities. This article concerns MR’s ability to deceive an individual through the combination of virtual and physical assets and their underlying traits (e.g., mass, size), and a user’s cognitive ability to ‘join the dots’. If properly implemented, MR could save time and resources by reducing the required prototype fidelity and the need to fully realise variants. However, there is a gap in understanding how the traits of physical and virtual assets and cognition combine to form reality. This article presents a study investigated the role mass, virtual and physical model size played on users perception of an MR prototype. The relative impact of these factors was determined by varying these parameters and assessing the user’s perceived change. The key finding from this study was that the virtual model size had a far greater influence on prototype perceived by the user. This suggests that the required physical fidelity of an MR prototype can be lower than the virtual. Furthermore, exploring size design variants can be achieved exclusively through changes to the virtual model.
The concept of joint attention holds significant importance in human interaction and is pivotal in establishing rapport, understanding, and effective communication. Within social robotics, enhancing user perception of the robot and promoting a sense of natural interaction with robots becomes a central element. In this sense, emulating human-centric qualities in social robots, such as joint attention, defined as the ability of two or more individuals to focus on a common event simultaneously, can increase their acceptability. This study analyses the impact on user perception of a responsive joint attention system integrated into a social robot within an interactive scenario. The experimental setup involves playing against the robot in the “Odds and Evens” game under two conditions: whether the joint attention system is active or inactive. Additionally, auditory and visual distractors are employed to simulate real-world distractions, aiming to test the system’s ability to capture and follow user attention effectively. To assess the influence of the joint attention system, participants completed the Robotic Social Attributes Scale (RoSAS) after each interaction. The results showed a significant improvement in user perception of the robot’s competence and warmth when the joint attention system was active.
Color is a basic but not well-measured factor that affects how well visual communication works. Color psychology has long associated hue, brightness, and saturation with emotional and behavioral responses; however, there is a scarcity of studies that amalgamate these cognitive principles with computational design and empirical cross-cultural testing. This research addresses the existing gap by formulating a cognitive-computational framework that simulates color perception and consumer engagement within digital media environments. The study employs a mixed-methods design, integrating extensive cross-cultural surveys (n = 620), eye-tracking experiments (n = 90), and machine-learning analysis of user engagement data from 12 international campaigns, to elucidate the variations in perceptual and emotional responses to color across different cultural schemas and media platforms. Regression and ANOVA analyses indicate that warm-saturated color palettes (red-orange spectrum) yield elevated engagement scores (β = 0.63, p < .01) and increased visual fixation durations (+42%) in collectivist cultures, whereas cool-moderate tones are more effective in individualist contexts. The results show that combining cognitive psychology with computational modeling can help make evidence-based design decisions. The results lead to a prescriptive framework that lets designers and marketers use algorithms to find the best color strategy for different groups of people.
Despite the many studies investigated the impact of icon design on usability in the past, few have compared outline icons with solid icons. This study combines familiarity training, recognition tasks, and visual search tasks to explore how icon design style and internal cognitive characteristics jointly affect visual perception. A total of 120 pairs of solid icons and corresponding outline icons were collected and designed. Subsequently, participants were asked to rate icons based on familiarity and concreteness, excluding those that were either too familiar or too unfamiliar. After 27 participants were familiarized with all of the icons over two training sessions, they were required to complete the task of recalling icons with relevant semantic meanings. Finally, to further decompose the users' visual perception process, participants' ability to visually search for icons was additionally tested. The results indicated that participants performed significantly better at recognizing and visually searching for solid icons, especially when they were unfamiliar. However, the visual perception advantage decreased with an increase in familiarity. In addition, strong evidence was found indicating that concrete solid icons have the highest visual search performance. The findings in this study provide practical guidelines for user interface design.
The accurate perception of visual stimuli in human–machine systems is crucial for improving system safety, usability, and task performance. The widespread adoption of digital technology has significantly increased the importance of visual interfaces and information. Therefore, it is essential to design visual interfaces and information with user characteristics in mind to ensure accurate perception of visual information. This study employed the Cognitive Perceptual Assessment for Driving (CPAD) to evaluate and compare gender differences in the ability to perceive visual signals within complex visual stimuli. The experimental setup included a computer with CPAD installed, along with a touch monitor, mouse, joystick, and keyboard. The participants included 11 male and 20 female students, with an average age of 22 for males and 21 for females. Prior to the experiment, participants were instructed to determine whether a signal stimulus was present: if a square, presented as the signal, was included in the visual stimulus, they moved the joystick to the left; otherwise, they moved it to the right. Each participant performed a total of 40 trials. The entire experiment was recorded on video to measure overall response times. The experiment measured the number of correct detections of signal presence, response times, the number of misses (failing to detect the signal when present), and false alarms (detecting the signal when absent). The analysis of experimental data revealed no significant differences in perceptual ability or response times for visual stimuli between genders. However, males demonstrated slightly superior perceptual ability and marginally shorter response times compared to females. Analyses of sensitivity and response bias, based on signal detection theory, also indicated a slightly higher perceptual ability in males. 
In conclusion, although these differences were not statistically significant, males demonstrated a slightly better perception ability for visual stimuli. The findings of this study can inform the design of information, user interfaces, and visual displays in human–machine systems, particularly in light of the recent trend of increased female participation in the industrial sector. Future research will focus on diverse types of visual information to further validate these findings.
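The sensitivity and response-bias analysis mentioned in this abstract follows standard signal detection theory: z-transform the hit and false-alarm rates to obtain d' (sensitivity) and c (response bias). A minimal sketch, using made-up cell counts for illustration rather than the study's data:

```python
from statistics import NormalDist

def dprime_criterion(hits, misses, false_alarms, correct_rejections):
    """Compute signal-detection sensitivity (d') and response bias (c).

    A log-linear correction (add 0.5 to each cell) avoids infinite
    z-scores when an observed rate is exactly 0 or 1.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse standard-normal CDF
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion

# Hypothetical counts: 18 hits, 2 misses, 4 false alarms, 16 correct rejections
d, c = dprime_criterion(18, 2, 4, 16)
```

A d' near zero indicates chance-level detection; a negative c indicates a liberal bias toward reporting the signal as present.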
This study aims to understand how users experience sponsored content shared by influencers on Instagram and how the meaning of this content is reflected in their purchase intentions within the framework of the Elaboration Likelihood Model (ELM). In the study, which adopted a phenomenological design, one of the qualitative research designs, in-depth interviews were conducted with 29 Instagram users. The findings revealed that users structure their interactions with sponsored content through two basic forms of cognitive processing: central and peripheral. Participants who engaged in central processing focused on the informational quality of the content, the contribution of the product to daily life, and the sincerity of the influencer, while those who engaged in peripheral processing found superficial cues such as the popularity of the influencer, visual narrative style, user comments, and campaign language more decisive. Participants' attitudes towards the content and the contexts of meaning they construct with the product are shaped not only by content features but also by individual perceptions, social media usage practices, and digital relationships with the influencer. The study draws attention to the multi-layered nature of user experiences in social media communication and emphasizes the importance of authenticity, knowledge-based narratives, and trust in content design.
In this study, a haptic feedback system based on AR glasses and electrical stimulation technology is designed and implemented to assess users' spatial imagination ability. The system builds a virtual environment through Unity, combining the immersive vision of AR glasses with the haptic feedback of electrical stimulation hardware to capture hand movements and simulate the sense of touch in real time. Twelve subjects were recruited to explore the effectiveness of haptic recognition of geometric shapes and its association with spatial imagination ability. The results showed that shape complexity and size significantly affected the judgement results. In addition, the total number of correct trials on the experimental task showed a strong positive correlation with scores on the Spatial Imagination Ability Questionnaire (SIAQ), confirming the paradigm's validity as a measurement tool for spatial imagination ability. This study provides a new method for the optimisation of haptic feedback systems and the assessment of spatial cognitive ability, which can be further enhanced by expanding the sample and optimising the design in the future.
Vibration feedback is common in everyday devices, from virtual reality systems to smartphones. However, cognitive and physical activities may impede our ability to sense vibrations from devices. In this study, we develop and characterize a smartphone platform to investigate how a shape-memory task (cognitive activity) and walking (physical activity) impair human perception of smartphone vibrations. We measured how Apple's Core Haptics Framework parameters can be used for haptics research, namely how hapticIntensity modulates amplitudes of 230 Hz vibrations. A 23-person user study found that physical (p < 0.001) and cognitive (p = 0.012) activity increase vibration perception thresholds. Cognitive activity also increases vibration response time (p < 0.001). This work also introduces a smartphone platform that can be used for out-of-lab vibration perception testing. Researchers can use our smartphone platform and results to design better haptic devices for diverse, unique populations.
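Perception thresholds of the kind measured here are commonly estimated with an adaptive staircase. The sketch below is a generic one-up/one-down procedure, not the paper's actual smartphone platform; the `perceives` callback and all parameter values are illustrative assumptions:

```python
def staircase_threshold(perceives, start=0.5, step=0.05, reversals_needed=8):
    """One-up/one-down adaptive staircase for a perception threshold.

    perceives: callable(intensity) -> bool, True if the participant
    felt the stimulus at that intensity. The intensity decreases after
    each detection and increases after each miss; the threshold is
    estimated as the mean intensity at the reversal points.
    """
    intensity, going_down = start, None
    reversals = []
    while len(reversals) < reversals_needed:
        felt = perceives(intensity)
        if going_down is not None and felt != going_down:
            reversals.append(intensity)  # direction flipped: record a reversal
        going_down = felt
        intensity = max(0.0, intensity - step if felt else intensity + step)
    return sum(reversals) / len(reversals)

# Simulated observer with a hypothetical true threshold of 0.3
estimate = staircase_threshold(lambda i: i >= 0.3)
```

A one-up/one-down rule converges on the 50% detection point; real platforms typically add catch trials and a stochastic observer model.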
Social robots are increasingly expected to initiate interactions, yet proactive behavior often produces negative experiences when initiated while the user is unavailable. We argue that availability is not binary and depends on the user's cognitive load. Moreover, proactivity can take different forms, suggesting that proactive cues can be adjusted to the user's current load. We tested whether matching proactivity modality (verbal and nonverbal cues) to cognitive-load level improves robot perception. In a 2×2×2 between-participants study, 87 participants completed a high-load writing task or a low-load screw-sorting task while an ElliQ robot initiated interaction using verbal cues, nonverbal cues, both, or neither. Perceptions varied across conditions. Under low load, verbal proactivity was preferred regardless of nonverbal cues. Under high load, nonverbal-only proactivity produced the best experience. These findings suggest that proactivity design is not trivial and should focus on aligning the robot's proactive behavior with the user's cognitive load.
In the rapidly evolving landscape of AI-mediated communication (AIMC), tools powered by Large Language Models (LLMs) are becoming integral to interpersonal communication. Employing a mixed-methods approach, we conducted a one-week diary and interview study to explore users’ perceptions of these tools’ ability to: 1) support interpersonal communication in the short-term, and 2) lead to potential long-term effects. Our findings indicate that participants view AIMC support favorably, citing benefits such as increased communication confidence, finding precise language to express their thoughts, and navigating linguistic and cultural barriers. However, our findings also show current limitations of AIMC tools, including verbosity, unnatural responses, and excessive emotional intensity. These shortcomings are further exacerbated by user concerns about inauthenticity and potential overreliance on the technology. We identify four key communication spaces delineated by communication stakes (high or low) and relationship dynamics (formal or informal) that differentially predict users’ attitudes toward AIMC tools. Specifically, participants report that these tools are more suitable for communicating in formal relationships than informal ones and more beneficial in high-stakes than low-stakes communication.
Teleoperated robots, especially in hazardous environments, integrate human cognition with machine efficiency, but can increase cognitive load, causing stress and reducing task performance and safety. This study examines the impact of the information available to the operator on cognitive load, physiological responses (e.g., GSR, blinking, facial temperature), and performance during teleoperation in three conditions: C1, in-person operation; C2, remote operation with visual feedback; and C3, remote operation via a telepresence robot. The findings from our user study involving 20 participants show that information availability significantly impacts perceived cognitive load, as evidenced by the differences observed between conditions in our analysis. Furthermore, the results indicated that blinking rates varied significantly among the conditions. The results also underline that individuals with higher error scores on the spatial orientation test (SOT), reflecting lower spatial ability, are more likely to experience failure in conditions 2 and 3. The results show that information availability significantly affects cognitive load and teleoperation performance, especially depth perception of the robot's actions. Additionally, the thermal and GSR data findings indicate an increase in stress and anxiety levels when operators perform conditions 2 and 3, thus corroborating an increase in the user's cognitive load.
Artificial intelligence is increasingly embedded in service interactions, requiring users to form rapid social judgments about AI communicators based on limited linguistic and contextual cues. This research examines how AI communication tone shapes behavioral intentions through social cognitive processes of role construal and agency attribution. Drawing on politeness theory, formality research, and social cognition perspectives, two scenario-based experiments test whether formal versus casual tone influences responses via attitudes toward the tone and the AI, and how these effects depend on perceptions of AI as a servant-like social actor. Study 1 shows that tone effects are moderated by servant perception and that economic framing, specifically paid versus free access, functions as an antecedent of hierarchical role construal. Study 2 replicates these effects and demonstrates that interaction structure, one-way versus two-way communication, similarly shapes servant perception by signaling differential autonomy. Across both studies, formal tone is more effective when AI is construed as subordinate, whereas casual tone is less effective under hierarchical role frames. By identifying servant perception as a central social cognitive mechanism, this research advances understanding of human judgment and decision making in technology-mediated interactions and offers implications for AI communication design aligned with role expectations. Because both studies rely on U.S. consumers, the findings should be interpreted within cultural contexts characterized by relatively low power distance, where role expectations and hierarchy norms may differ from other cultural settings.
The advent of virtual reality technology has provided a new approach for assessing and training cognitive processing ability, with the design of simulations used to replicate real events in everyday lives. To better understand how head mounted displays/flat screen displays (HMDs/FSDs) and differences in the individuals who use them affect cognitive performance and the use of VR systems, our research group created the Stroop task-embedded virtual reality driving system (STEVRDS) and conducted a 2 × 2 between-group factorial design experiment among college students. The study examined the effects of HMDs and FSDs that differ in monovision/stereovision and field of view, the impact of gender (males vs. females) on users' performances in virtual driving and Stroop trials, and users' psychophysiological responses while using the system. The participants' subjective perceptions toward STEVRDS were also assessed to support the analyses/interpretations of cognitive performance, as well as provide empirical data relating to user experiences. The statistical analyses showed both main and interaction effects of HMDs/FSDs and gender on task performance, psychophysiological responses, and user evaluations of the system. The psychophysiological patterns exhibited during the use of STEVRDS further extended the findings. Overall, our results were comparable with cognitive phenomena reported in other studies/in real-life experiences or explained by logical reasoning, which suggests that the design/development of the STEVRDS is suitable for cognitive assessment/training. Practical implications are discussed for the application of HMDs and FSDs in evaluating and enhancing cognitive processing ability and the need for specific tailoring for male and female users.
In human-agent teams, openly sharing goals is often assumed to enhance planning, collaboration, and effectiveness. However, direct communication of these goals is not always feasible, requiring teammates to infer their partner's intentions through actions. Building on this, we investigate whether an AI agent's ability to share its inferred understanding of a human teammate's goals can improve task performance and perceived collaboration. Through an experiment comparing three conditions (no recognition, NR; viable goals, VG; and viable goals on-demand, VGod), we find that while goal-sharing information did not yield significant improvements in task performance or overall satisfaction scores, thematic analysis suggests that it supported strategic adaptations and subjective perceptions of collaboration. Cognitive load assessments revealed no additional burden across conditions, highlighting the challenge of balancing informativeness and simplicity in human-agent interactions. These findings highlight the nuanced trade-off of goal-sharing: while it fosters trust and enhances perceived collaboration, it can occasionally hinder objective performance gains.
Cognitive biases can influence the decision-making of board members and CISOs responsible for managing cyber risks. However, limited attention has been given to understanding how these biases affect cybersecurity governance, specifically in the communication of risks between CISOs and boards. This paper aims to address this gap by identifying cognitive biases and proposing how these biases influence communication and strategic decision-making in cybersecurity governance. By further examining their impact, we strive to uncover the mechanisms that contribute to underestimations or distortions in risk perception, which can compromise an organization’s ability to respond effectively to cyber threats. This short paper provides three exemplary biases expected to influence communication and decision-making in cybersecurity governance. Following the initial results, we propose a series of interviews with CISOs to reveal the challenges they face when communicating cyber risks to boards, focusing on how biases influence the decisions regarding cybersecurity risks.
Background The escalating demands of modern health care systems, combined with the emotional toll of patient care, have led to an alarming increase in physician burnout rates. This burnout, characterized by emotional exhaustion, depersonalization, and reduced personal accomplishment, can hinder doctors’ ability to connect with patients effectively. Moreover, the cognitive load arising from information overload and the need for multitasking can further hinder doctors’ ability to connect with patients effectively. Understanding the complex relationship between physician burnout and cognitive load is crucial for devising targeted interventions that enhance physician well-being and promote effective physician-patient interactions. Implementing strategies to alleviate burnout and cognitive load can lead to improved health care experiences and patient outcomes. Objective Our study explores the interplay between physician burnout and its potential impact on interpersonal communication, particularly focusing on the role of cognitive load using a pilot study in a nonclinical setting involving nonclinical participants. Methods This study uses an experimental design to evaluate 3 feedback tools (haptic, visual, and postvisit summary) and measure the cognitive load they impose on nonclinical participants in a nonclinical environment. The NASA Task Load Index, a widely accepted measure of cognitive load, was used to quantify the cognitive load associated with the feedback tools. The study used a within-subject design, meaning participants experienced all 3 feedback methods. A sample of 18 nonclinical participants was selected using counterbalancing techniques. Results Postsession feedback not only enhanced performance but also mitigated the influence of cognitive load compared with real-time feedback (haptic+visual). Participants with interview experience showed lower cognitive load levels when exposed to real-time feedback than novice users did.
In contrast, postsession feedback was more effective for novice users. In addition, cognitive workload emerged as a moderating factor in the relationship between feedback tools and their impact on performance, particularly in terms of speaking balance and pace. This moderating effect suggests that the correlation between feedback tool efficacy and performance varies based on an individual’s cognitive load while using the feedback tool. The comparison of postsession feedback with haptic feedback yielded a Z score of −3.245 and a P value of .001, while the comparison with visual feedback resulted in a Z score of −2.940 and a P value of .003. These outcomes underscore a significant disparity in the means between postsession feedback and real-time feedback (haptic+visual), with postsession feedback showing the lowest mean score. Conclusions Through the examination of various feedback tools, this study yields significant and insightful comparisons regarding their usability and appropriateness in nonclinical settings. To enhance the applicability of these findings to clinical environments, further research encompassing diverse participant cohorts and clinical scenarios is warranted.
In e-commerce of organic products, privacy and trust are key determinants of consumer behaviour. This study examines gender differences in privacy concerns, perceived risk, perceived control, trust, and online purchase intention within an extended Social Cognitive Theory framework that integrates cognitive and social variables. Data were collected from 821 users, and the hypotheses were tested using Structural Equation Modelling (SEM). The findings reveal significant gender differences that are partially mediated by trust. Specifically, female consumers exhibit a stronger negative effect of perceived privacy risks on trust in the provider (β = −0.231, p < 0.001) than male consumers (β = −0.101, p < 0.05), and female consumers show a significant relationship between perceived ability to control and trust in the provider (β = 0.137, p < 0.05) compared to male consumers (β = 0.088, p < 0.10). These results highlight the need for differentiated digital strategies that reinforce data transparency and user control while adapting communication and design to gender-specific perceptions and trust mechanisms.
This paper describes a research study that aims to investigate changes in effective communication during human-AI collaboration, with special attention to the perception of competence among team members and varying levels of task load placed on the team. We will also investigate differences between human-human teamwork and human-agent teamwork. Our project will measure differences in the communication quality, team perception, and performance of a human actor playing a Commercial Off-The-Shelf (COTS) game with either a human teammate or a simulated AI teammate under varying task load. We argue that the increased cognitive workload associated with increased task load will be negatively associated with team performance and will have a negative impact on communication quality. In addition, we argue that positive team perceptions will have a positive impact on the communication quality between a user and teammate in both the human and AI teammate conditions. This project will offer more refined insights on human-AI relationship dynamics in collaborative tasks by considering communication quality, team perception, and performance under increasing cognitive workload.
Since high dropout rates in online learning platforms were reported, various factors affecting learner retention have been identified, with learners' perceptions of their experiences playing a crucial role in shaping their persistence. For instance, Kittur et al. highlight how success expectations are shaped by perceived system fit and course difficulty. Recent advances in generative Artificial Intelligence (GenAI) present new possibilities for GenAI-mediated learning. AI-generated instructional messages are often perceived as clearer than human-written content, but their impact on learners' perceptions of skill-building experiences remains underexplored. This study examines GenAI-mediated learning in a self-directed context, focusing on communication skills. We compare three messaging styles - Affective, Cognitive, and Action-Oriented - to investigate their influence on learners' perceptions of the learning process. We applied this approach to ten instructional units, using GenAI to generate 30 learning items. Three evaluators assessed them for desirability and appropriateness through numerical ratings and open-ended feedback. The 180 excerpts were analyzed using reflexive thematic analysis, revealing four overarching themes: Prerequisite Common Ground, Intrinsic Value, User Responses, and Expressed Preferences. We discuss these insights to inform the design of GenAI-mediated, self-directed skill-building, with the goal of enhancing engagement, persistence, and learning outcomes.
The increasing application of Conversational Agents (CAs) changes the way customers and businesses interact during a service encounter. Research has shown that CAs equipped with social cues (e.g., having a name, greeting users) stimulate users to perceive the interaction as human-like, which can positively influence the overall experience. Specifically, social cues have been shown to lead to increased customer satisfaction, perceived service quality, and trustworthiness in service encounters. However, many CAs are discontinued because of their limited conversational ability, which can lead to customer dissatisfaction. Nevertheless, making errors and mistakes can also be seen as a human characteristic (e.g., typing errors). Existing research on human-computer interfaces has paid little attention to CAs producing human-like errors and how such errors are perceived in a service encounter situation. Therefore, we conducted a 2x2 online experiment with 228 participants on how CAs' typing errors and human-like behavior treatments influence users' perception, including perceived service quality.
As AI systems (foundation models, agentic systems) grow increasingly capable of operating for minutes or hours at a time, users' prompts are transforming into highly detailed, elaborate specifications for the AI to autonomously work on. While interactive prompting has been extensively studied, comparatively less is known about how people communicate specifications for these types of long-horizon tasks. In a qualitative study in which 16 professionals drafted specifications for both a human colleague and an AI, we found a core divergence in how people specified problems to people versus AI: people approached communication with humans as providing a "compass", offering high-level intent to encourage flexible exploration. In contrast, communication with AI resembled painstakingly laying down "railway tracks": rigid, exhaustive instructions to minimize ambiguity and deviation. This strategy was driven by a perception that current AI has limited ability to infer intent, prioritize, and make judgments on its own. When envisioning an ideal AI collaborator, users expressed a desire for a hybrid between current AI and human colleagues: a collaborator that blends AI's efficiency and large context window with the critical thinking and agency of a human colleague. We discuss design implications for future AI systems, proposing that they align on outcomes through generated rough drafts, verify feasibility via end-to-end "test runs," and monitor execution through intelligent check-ins, ultimately transforming AI from a passive instruction-follower into a reliable collaborator for ambiguous, long-horizon problems.
The spread of health-related misinformation on social media has increased user efforts to tackle emerging risks. In this study, we provide a model on how users’ perceptions of risk when encountering possibly fear-inducing pandemic misinformation influenced their intent to fact-check it. We employed an online survey to collect the data among adult Facebook users. The model was tested using structural equation modelling. Unlike previous risk perception models, we found a positive effect of cognitive risk perception on the intentions of social media users to utilise internet tools for verifying the accuracy of information. The results also revealed that the more emotional risk users perceived, the more they intended to use ample sources, and, indirectly, also to use tools for verifying information. Furthermore, the participants demonstrated a greater propensity to utilise online fact-checking tools as their intention to explore many information sources increased. Our study contributes to the field by connecting cognitive and emotional risk perception with multi-faceted fact-checking in social media, where both individual fact-checking practices and information-seeking behaviour merge. It also contributes to human information behaviour research, by highlighting higher concerns with disease danger as possible user characteristics for motivated misinformation debunking. Thus, our findings may aid health practitioners and risk communicators in assessing how to target and educate, especially individuals with low-risk perception. Finally, we call on the general public and legislators to recognise the invaluable role of providing online information accurately as a crucial part of the strategic communication agenda. Plain Language Summary: How risk perception affects fact-checking on social media. The spread of false health information on social media has made people work harder to deal with new risks.
Our study looks at how people's thoughts and feelings about risk when they see scary pandemic misinformation affect their desire to check the facts. We tested this using advanced statistics. We found that when people think more logically about risks, they are more likely to use online tools to fact-check. When people feel more emotional risk, they are more likely to look for multiple sources of information and then use online fact-checking tools. The more people want to gather information from different sources, the more they use online fact-checking tools. Our study shows how both thinking and feeling about risks are linked to thorough fact-checking on social media. It highlights that people who are very worried about disease risks are more motivated to correct false information. This helps health professionals and communicators know how to better target and educate those who are less aware of risks. We stress the importance of accurate online information as a crucial part of effective communication, urging the public and lawmakers to understand this vital role.
Human-robot interaction in cooperative and assistive scenarios requires robotic systems to assess the task state and coherently choose their next move. Moreover, it is also fundamental to correctly recognize how the user’s stress and emotional response are changing to offer support appropriately. The robot should be able to adapt to different user reactions, considering the situational context, and displaying empathetic behaviors aiming to support and encourage the users. In this work, we aim to assess the impact of empathetic supporting behaviors on the perception of the robot and the users’ performance during a collaborative task, as opposed to assistive strategies focusing only on the task’s performance. With this objective in mind, we propose a robotic architecture to assist a user in playing a memory game in real-time using a Furhat robot. We conducted a user study where 60 participants played with the robot to evaluate the effects of the two types of Theory of Mind on the assistive task and their perception of the robot. To this end, the participants interacted with a robot endowed with either Cognitive or Affective Theory of Mind, allowing the robot respectively to understand intentions and beliefs, or to show empathetic behaviors to improve the collaboration. The two conditions resulted in achieving the same results in terms of task performance, but the participants rated the emotionally engaged robot higher in perceived social intelligence.
No abstract available
When humans share space in road traffic, as drivers or as vulnerable road users, they draw on their full range of communicative and interactive capabilities. Much remains unknown about these behaviors, but they need to be captured in models if automated vehicles are to coexist successfully with human road users. Empirical studies of human road user behavior implicate a large number of underlying cognitive mechanisms, which taken together are well beyond the scope of existing computational models. Here, we note that for all of these putative mechanisms, computational theories exist in different subdisciplines of psychology, for more constrained tasks. We demonstrate how these separate theories can be generalized from abstract laboratory paradigms and integrated into a computational framework for modeling human road user interaction, combining Bayesian perception, a theory of mind regarding others’ intentions, behavioral game theory, long-term valuation of action alternatives, and evidence accumulation decision-making. We show that a model with these assumptions—but not simpler versions of the same model—can account for a number of previously unexplained phenomena in naturalistic driver–pedestrian road-crossing interactions, and successfully predicts interaction outcomes in an unseen data set. Our modeling results contribute to demonstrating the real-world value of the theories from which we draw, and address calls in psychology for cumulative theory-building, presenting human road use as a suitable setting for work of this nature. Our findings also underscore the formidable complexity of human interaction in road traffic, with strong implications for the requirements to set on development and testing of vehicle automation.
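Of the mechanisms combined in this framework, evidence-accumulation decision-making is the easiest to illustrate in isolation. Below is a toy drift-diffusion simulation with arbitrary parameters, not the paper's fitted model; the "go"/"yield" labels are illustrative stand-ins for a pedestrian's crossing decision:

```python
import random

def accumulate_evidence(drift, threshold=1.0, noise=0.1, dt=0.01,
                        max_t=10.0, rng=None):
    """Simulate one evidence-accumulation (drift-diffusion) decision.

    Evidence starts at 0 and drifts toward +threshold ("go") or
    -threshold ("yield"); Gaussian noise perturbs each time step.
    Returns (choice, decision_time).
    """
    rng = rng or random.Random(0)  # seeded for reproducibility
    evidence, t = 0.0, 0.0
    while abs(evidence) < threshold and t < max_t:
        evidence += drift * dt + rng.gauss(0.0, noise) * dt ** 0.5
        t += dt
    choice = "go" if evidence >= threshold else "yield"
    return choice, t

# Strong evidence for crossing: decision reached quickly
choice, rt = accumulate_evidence(drift=2.0)
```

Stronger drift (clearer evidence) yields faster, more consistent decisions, which is how such models link perceptual input to response-time distributions.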
The last couple of years have witnessed emerging research that appropriates Theory-of-Mind (ToM) tasks designed for humans to benchmark LLMs' ToM capabilities as an indication of LLMs' social intelligence. However, this approach has a number of limitations. Drawing on existing psychology and AI literature, we summarize the theoretical, methodological, and evaluation limitations, pointing out that certain issues are inherently present in the original ToM tasks used to evaluate humans' ToM and persist, or are even exacerbated, when those tasks are appropriated to benchmark LLMs' ToM. Taking a human-computer interaction (HCI) perspective, these limitations prompt us to rethink the definition and criteria of ToM in ToM benchmarks through a more dynamic, interactional approach that accounts for user preferences, needs, and experiences with LLMs in such evaluations. We conclude by outlining potential opportunities and challenges in this direction.
Theory-of-Mind (ToM), the ability to infer the mental states, goals, and preferences of others, is a core component of human social intelligence. In this work, we investigate whether Large Language Models (LLMs) exhibit ToM capabilities in the context of strategic interaction. We frame opponent modeling in negotiation as a grounded and interpretable ToM task, in which a model must infer an agent's preferences by observing offer exchanges during the negotiation. We guide LLMs to interpret offer histories and infer latent utility representations, including issue and value weights. We conduct a comprehensive evaluation of state-of-the-art LLMs across multiple negotiation domains. Our results show that LLMs can successfully recover opponents' unknown preferences and in some cases even outperform classical opponent modeling baselines, even without task-specific training. These findings offer new evidence of LLMs' emerging capacity for social reasoning and position opponent modeling as a practical benchmark for evaluating Theory-of-Mind in foundation models.
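The opponent-modeling task described here can be sketched with a common classical baseline: assume a linear additive utility over negotiation issues and grid-search the issue weights that best explain the opponent's observed offers. This is an illustrative toy, not necessarily the paper's method; the issue names, option values, and step size are all assumptions:

```python
from itertools import product

def infer_issue_weights(offers, issue_values, step=0.25):
    """Infer an opponent's issue weights from their offers.

    offers: list of dicts mapping issue -> chosen option.
    issue_values: dict issue -> dict option -> value in [0, 1],
    representing the opponent's assumed per-option valuations.
    Returns the weight vector (summing to 1, on a coarse grid) under
    which the observed offers achieve the highest average utility.
    """
    issues = sorted(issue_values)
    steps = int(round(1 / step))
    best_w, best_score = None, float("-inf")
    for combo in product(range(steps + 1), repeat=len(issues)):
        if sum(combo) != steps:  # keep only weights summing to 1
            continue
        weights = {iss: k * step for iss, k in zip(issues, combo)}
        score = sum(
            sum(weights[iss] * issue_values[iss][offer[iss]] for iss in issues)
            for offer in offers
        ) / len(offers)
        if score > best_score:
            best_w, best_score = weights, score
    return best_w

# An opponent who always demands a high price and fast delivery
issue_values = {"price": {"high": 1.0, "low": 0.0},
                "delivery": {"fast": 0.0, "slow": 1.0}}
offers = [{"price": "high", "delivery": "fast"}] * 3
weights = infer_issue_weights(offers, issue_values)
```

In the sketch, the repeated high-price offers reveal that "price" carries the full weight; the LLM-based approach in the paper targets the same latent representation but infers it from free-form offer histories.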
Natural language interaction with agentic Artificial Intelligence (AI), driven by Large Language Models (LLMs), is expected to remain a dominant paradigm in the near future. While humans instinctively align their communication with mental states, an ability known as Theory of Mind (ToM), current LLM-powered systems exhibit significant limitations in this regard. This study examines the extent to which open-source language models (LLaMA) can capture and preserve ToM-related information and how effectively it contributes to consistent ToM reasoning in generated responses. We further investigate whether explicit manipulation of ToM-related components, such as beliefs, desires, and intentions, can enhance response alignment. Experiments on two LLaMA 3 variants demonstrate that incorporating ToM-informed alignment improves response quality, achieving win rates of 67 and 63 percent for the 3B and 8B models, respectively. These findings highlight the potential of ToM-driven strategies to improve alignment in LLM-based conversational agents.
Large Language Models (LLMs) have shown exceptional generative abilities in various natural language generation tasks. However, possible anthropomorphization and leniency towards failure cases have propelled discussions on the emergent abilities of LLMs, especially on Theory of Mind (ToM) abilities. While several false-belief tests exist to verify the ability to infer and maintain mental models of another entity, we study a special application of ToM abilities that has higher stakes and possibly irreversible consequences: Human-Robot Interaction. In this work, we explore the task of Perceived Behavior Recognition, where a robot employs an LLM to assess the robot's generated behavior in a manner similar to a human observer. We focus on four behavior types, namely explicable, legible, predictable, and obfuscatory behavior, which have been extensively used to synthesize interpretable robot behaviors. The LLM's goal is therefore to act as a human proxy for the agent, and to answer how a certain agent behavior would be perceived by the human in the loop, for example, "Given a robot's behavior X, would the human observer find it explicable?". We conduct a human subject study to verify that users are able to correctly answer such a question in the curated situations (robot setting and plan) across five domains. A first analysis of the belief test yields extremely positive results, inflating one's expectations of LLMs possessing ToM abilities. We then propose and perform a suite of perturbation tests that break this illusion: the Inconsistent Belief, Uninformative Context, and Conviction tests. The high scores of LLMs on vanilla prompts showcase their potential use in HRI settings; however, possessing ToM demands invariance to trivial or irrelevant perturbations in the context, which LLMs lack. We report our results on GPT-4 and GPT-3.5-turbo.
Theory of Mind (ToM), humans’ capability of attributing mental states such as intentions, goals, emotions, and beliefs to ourselves and others, has become a concept of great interest in human-AI interaction research. Given the fundamental role of ToM in human social interactions, many researchers have been working on methods and techniques to equip AI with an equivalent of human ToM capability to build highly socially intelligent AI. Another line of research on ToM in human-AI interaction seeks to understand people’s tendency to attribute mental states such as blame, emotions, and intentions to AI, along with the role that AI should play in the interaction (e.g., as a tool, partner, teacher, facilitator, and more) to align with people’s expectations and mental models. The goal of this line of work is to distill human-centered design implications to support the development of increasingly advanced AI systems. Together, these two research perspectives on ToM form an emerging paradigm of “Mutual Theory of Mind (MToM)” in human-AI interaction, where both the human and the AI each possess the ToM capability. This workshop aims to bring together different research perspectives on ToM in human-AI interaction by engaging with researchers from various disciplines including AI, HCI, Cognitive Science, Psychology, Robotics, and more to synthesize existing research perspectives, techniques, and knowledge on ToM in human-AI interaction, as well as envisioning and setting a research agenda for MToM in human-AI interaction.
Large Language Models (LLMs) have sparked substantial interest and debate concerning their potential emergence of Theory of Mind (ToM) ability. Current ToM evaluations focus on testing models using machine-generated data or game settings prone to shortcuts and spurious correlations, and thus fail to assess machine ToM in real-world human interaction scenarios. This poses a pressing demand for new real-world-scenario benchmarks. We introduce NegotiationToM, a new benchmark designed to stress-test machine ToM in real-world negotiation, covering multi-dimensional mental states (i.e., desires, beliefs, and intentions). Our benchmark builds upon the Belief-Desire-Intention (BDI) agent modeling theory and conducts the necessary empirical experiments to evaluate large language models. Our findings demonstrate that NegotiationToM is challenging for state-of-the-art LLMs, as they consistently perform significantly worse than humans, even when employing the chain-of-thought (CoT) method.
An ideal integration of autonomous agents in a human world implies that they are able to collaborate on human terms. In particular, theory of mind plays an important role in maintaining common ground during human collaboration and communication. To enable theory of mind modeling in situated interactions, we introduce a fine-grained dataset of collaborative tasks performed by pairs of human subjects in the 3D virtual blocks world of Minecraft. It provides information that captures partners’ beliefs of the world and of each other as an interaction unfolds, bringing abundant opportunities to study human collaborative behaviors in situated language communication. As a first step towards our goal of developing embodied AI agents able to infer belief states of collaborative partners in situ, we build and present results on computational models for several theory of mind tasks.
A major challenge in cognitive science and AI has been to understand how intelligent autonomous agents might acquire and predict the behavioral and mental states of other agents in the course of complex social interactions. How does such an agent model the goals, beliefs, and actions of other agents it interacts with? What are the computational principles needed to model a Theory of Mind (ToM)? Deep learning approaches to these questions fall short of providing a deeper understanding of the problem. In part, this is due to the black-box nature of deep networks, wherein the computational mechanisms of ToM are not readily revealed. Here, we consider alternative hypotheses seeking to model how the brain might realize a ToM. In particular, we propose embodied and situated agent models based on distributed adaptive control theory to predict the actions of other agents in five different game-theoretic tasks (Harmony Game, Hawk-Dove, Stag Hunt, Prisoner’s Dilemma, and Battle of the Exes). Our multi-layer control models implement top-down predictions from adaptive to reactive layers of control and bottom-up error feedback from reactive to adaptive layers. We test cooperative and competitive strategies among seven different agent models (cooperative, greedy, tit-for-tat, reinforcement-based, rational, predictive, and internal agents). We show that, compared to pure reinforcement-based strategies, probabilistic learning agents modeled on rational, predictive, and internal phenotypes perform better on game-theoretic metrics across tasks. The outlined autonomous multi-agent models might capture systems-level processes underlying a ToM and suggest architectural principles of ToM from a control-theoretic perspective.
In this paper, we propose a novel personalized decision support system that combines Theory of Mind (ToM) modeling and explainable Reinforcement Learning (XRL) to provide effective and interpretable interventions. Our method leverages deep reinforcement learning (DRL) to provide expert action recommendations while incorporating ToM modeling to understand users' mental states and predict their future actions, enabling appropriate timing for intervention. To explain interventions, we use counterfactual explanations based on the RL model's feature importance and the structure of users' ToM models. Our proposed system generates accurate and personalized interventions that are easily interpretable by end-users. We demonstrate the effectiveness of our approach through a series of crowd-sourcing experiments in a simulated team decision-making task, where our system outperforms control baselines in terms of task performance. Our proposed approach is agnostic to the task environment and RL model structure, and therefore has the potential to generalize to a wide range of applications.
In order to achieve a widespread adoption of social robots in the near future, we need to design intelligent systems that are able to autonomously understand our beliefs and preferences. This will pave the foundation for a new generation of robots able to navigate the complexities of human societies. To reach this goal, we look into Theory of Mind (ToM): the cognitive ability to understand other agents’ mental states. In this paper, we rely on a probabilistic ToM model to detect when a human has false beliefs with the purpose of driving the decision-making process of a collaborative robot. In particular, we recreate an established psychology experiment involving the search for a toy that can be secretly displaced by a malicious individual. The results that we have obtained in simulated experiments show that the agent is able to predict human mental states and detect when false beliefs have arisen. We then explored the set-up in a real-world human interaction to assess the feasibility of such an experiment with a humanoid social robot.
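The core of false-belief detection in such ToM models is tracking what the human believes separately from what is true, updating the human's belief only for events they perceived. A minimal sketch of this perceptual-access logic follows, assuming discrete object locations and deterministic perception; it is an illustration of the idea, not the paper's actual probabilistic model.

```python
def update_belief(belief, event, observed):
    """Update the human's belief distribution over object locations.
    belief: dict mapping location -> probability.
    event:  dict with key "to" giving the object's new location.
    Only events the human actually witnessed change the belief."""
    if observed:
        # The human saw the move: belief collapses onto the new location.
        return {loc: (1.0 if loc == event["to"] else 0.0) for loc in belief}
    return belief  # an unseen displacement leaves the belief unchanged

def has_false_belief(belief, true_location):
    """The human holds a false belief when their most-believed location
    differs from where the object actually is."""
    believed = max(belief, key=belief.get)
    return believed != true_location
```

In the paper's toy-search scenario, the robot would use a flag like `has_false_belief` to decide when the human needs to be corrected or assisted.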
Theory of Mind (ToM) refers to the ability to make inferences about others’ mental states. Such ability is fundamental for human social activities such as empathy, teamwork, and communication. As intelligent agents come to be involved in diverse human-agent teams, they will also be expected to be socially intelligent in order to become effective teammates. In this paper, we describe a computational ToM model which observes team behaviors and infers their mental states in an urban search and rescue (US&R) task. Our modular ToM model approximates human inference by explicitly representing beliefs, belief updates, and action prediction/generation using Deep Neural Networks (DNNs). To validate our model we compare its performance to the gold standard of human observers asked to make the same inferences. The ToM model proved superior to the average judgments of human observers on all four tests of inference and better than 90th-percentile observers on three of the four. While the learning bias provided by modularizing belief and prediction proved sufficient for the simple inferences tested, substantial refinement will be needed to replicate the complex, nuanced chains of inference observed in human social interaction.
This paper examines the extent to which large language models (LLMs) are able to perform tasks which require higher-order theory of mind (ToM)—the human ability to reason about multiple mental and emotional states in a recursive manner (e.g., I think that you believe that she knows). This paper builds on prior work by introducing a handwritten test suite—Multi-Order Theory of Mind Q&A—and using it to compare the performance of five LLMs of varying sizes and training paradigms to a newly gathered adult human benchmark. We find that GPT-4 and Flan-PaLM reach adult-level and near adult-level performance on our ToM tasks overall, and that GPT-4 exceeds adult performance on 6th order inferences. Our results suggest that there is an interplay between model size and finetuning for higher-order ToM performance, and that the linguistic abilities of large models may support more complex ToM inferences. Given the important role that higher-order ToM plays in group social interaction and relationships, these findings have significant implications for the development of a broad range of social, educational and assistive LLM applications.
Theory-of-Mind (ToM), the ability to infer others' perceptions and mental states, is fundamental to human interaction but remains challenging for Large Language Models (LLMs). While existing ToM reasoning methods show promise with reasoning via perceptual perspective-taking, they often rely excessively on off-the-shelf LLMs, reducing their efficiency and limiting their applicability to high-order ToM reasoning. To address these issues, we present EnigmaToM, a novel neuro-symbolic framework that enhances ToM reasoning by integrating a Neural Knowledge Base of entity states (Enigma) for (1) a psychology-inspired iterative masking mechanism that facilitates accurate perspective-taking and (2) knowledge injection that elicits key entity information. Enigma generates structured knowledge of entity states to build spatial scene graphs for belief tracking across various ToM orders and enrich events with fine-grained entity state details. Experimental results on ToMi, HiToM, and FANToM benchmarks show that EnigmaToM significantly improves ToM reasoning across LLMs of varying sizes, particularly excelling in high-order reasoning scenarios.
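The iterative masking idea, restricting each character's knowledge to the events they witnessed and applying the restriction recursively for higher ToM orders, can be sketched as follows. The flat event representation here is a hypothetical simplification for illustration, not EnigmaToM's actual scene-graph machinery.

```python
def visible_events(events, observer):
    """Keep only the events the given observer perceived."""
    return [e for e in events if observer in e["witnesses"]]

def nth_order_view(events, chain):
    """Approximate a nested belief by filtering through a chain of
    observers: chain=["Anne", "Sally"] yields the events available to
    'what Anne thinks Sally saw' (events witnessed by both, in order)."""
    for observer in chain:
        events = visible_events(events, observer)
    return events
```

Answering a first-order false-belief question then reduces to reasoning over `visible_events(events, "Sally")` instead of the full event list; higher orders extend the chain.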
Humans continuously infer the states, goals, and behaviors of others by perceiving their surroundings in dynamic, real-world social interactions. However, most Theory of Mind (ToM) benchmarks only evaluate static, text-based scenarios, which have a significant gap compared to real interactions. We propose the SoMi-ToM benchmark, designed to evaluate multi-perspective ToM in embodied multi-agent complex social interactions. This benchmark is based on rich multimodal interaction data generated by the interaction environment SoMi, covering diverse crafting goals and social relationships. Our framework supports multi-level evaluation: (1) first-person evaluation provides multimodal (visual, dialogue, action, etc.) input from a first-person perspective during a task for real-time state inference, (2) third-person evaluation provides complete third-person perspective video and text records after a task for goal and behavior inference. This evaluation method allows for a more comprehensive examination of a model's ToM capabilities from both the subjective immediate experience and the objective global observation. We constructed a challenging dataset containing 35 third-person perspective videos, 363 first-person perspective images, and 1225 expert-annotated multiple-choice questions (three options). On this dataset, we systematically evaluated the performance of human subjects and several state-of-the-art large vision-language models (LVLMs). The results show that LVLMs perform significantly worse than humans on SoMi-ToM: the average accuracy gap between humans and models is 40.1% in first-person evaluation and 26.4% in third-person evaluation. This indicates that future LVLMs need to further improve their ToM capabilities in embodied, complex social interactions.
This paper introduces function alignment, a novel theory of mind and intelligence that is both intuitively compelling and structurally grounded. It explicitly models how meaning, interpretation, and analogy emerge from interactions among layered representations, forming a coherent framework capable not only of modeling minds but also of serving as a blueprint for building them. One of the key theoretical insights derived from function alignment is bounded interpretability, which provides a unified explanation for previously fragmented ideas in cognitive science, such as bounded rationality, symbol grounding, and analogy-making. Beyond modeling, the function alignment framework bridges disciplines often kept apart, linking computational architecture, psychological theory, and even contemplative traditions such as Zen. Rather than building on any philosophical systems, it offers a structural foundation upon which multiple ways of understanding the mind may be reconstructed.
Theory of Mind (ToM), the ability to understand the mental states of oneself and others, remains a challenging area for large language models (LLMs), which often fail to predict human mental states accurately. In this paper, we introduce UniToMBench, a unified benchmark that integrates the strengths of SimToM and TOMBENCH to systematically improve and assess ToM capabilities in LLMs by combining multi-interaction task designs with evolving story scenarios. Supported by a custom dataset of over 1,000 hand-written scenarios, UniToMBench combines perspective-taking techniques with diverse evaluation metrics to better stimulate social cognition in LLMs. Through evaluation, we observe that while models like GPT-4o and GPT-4o Mini show consistently high accuracy in tasks involving emotional and belief-related scenarios, with results usually above 80%, there is significant variability in their performance across knowledge-based tasks. These results highlight both the strengths and limitations of current LLMs in ToM-related tasks, underscoring the value of UniToMBench as a comprehensive tool for future development. Our code is publicly available here: https://github.com/Shamant/unifiedtombenchmark.
Multiagent systems bring together agents that represent different users with possibly different concerns. When these agents interact to make decisions, conflicts occur. A well-known case involves privacy. Agents often need to manage the privacy of content that belongs to multiple users, such as group pictures shared on social media. When agents have different expectations of how the content should be shared, multi-party privacy conflicts can arise. How should we design agents to deal with such conflicts? We conducted an empirical user study to understand the effect of group dynamics in various multi-party privacy settings. Our findings show that as users' beliefs and knowledge about others evolve, privacy expectations shift as well. Inspired by this, we propose computational agents that mimic a human-inspired Theory of Mind (ToM) model to help their users preserve their privacy in multi-party privacy conflicts. The agents can express empathy when others are in need but can also fight for their own privacy. We evaluate our approach in multiagent simulations with varying decision-making strategies. Our results demonstrate that ToM-enabled agents improve privacy preservation for all parties, and even more so when their understanding of others is dynamically updated through learning.
Recent research shows that Large Language Models (LLMs) exhibit a compelling level of proficiency in Theory of Mind (ToM) tasks. This ability to impute unobservable mental states to others is vital to human social cognition and may prove equally important in principal-agent relations between individual humans and Artificial Intelligences (AIs). In this paper, we explore how a mechanism studied in developmental psychology known as Violation of Expectation (VoE) can be implemented to reduce errors in LLM prediction about users by leveraging emergent ToM affordances. We also introduce a "metacognitive prompting" framework to apply VoE in the context of an AI tutor. By storing and retrieving facts derived in cases where the LLM's expectation about the user was violated, we find that LLMs are able to learn about users in ways that echo theories of human learning. Finally, we discuss latent hazards and augmentative opportunities associated with modeling user psychology and propose ways to mitigate risk along with possible directions for future inquiry.
Artificial intelligence (AI) and mixed reality (MR), within human-computer interaction (HCI), are rapidly redefining areas of healthcare by introducing new approaches to patient care and clinical education. This editorial explores how these technologies, through Extended Mind Theory, enhance mental health treatment and medical training. AI-powered virtual therapists, using natural language processing and predictive analytics, provide accessible, personalized mental health support, allowing for remote and immersive therapy. In MR environments, patients with anxiety, post-traumatic stress disorder (PTSD), or phobias can safely engage in therapeutic exercises, confronting fears in controlled, virtual settings. In clinical education, AI and MR deliver adaptive, immersive training tools that respond to individual needs, enabling repeated practice in a risk-free environment. These tools improve skills and build confidence by simulating high-stakes scenarios like emergency response, with HCI principles ensuring user-friendly and experiential learning. Ethical considerations, including data security and transparency, are essential as these tools integrate into healthcare. This blend of AI, MR, and HCI redefines healthcare boundaries, extending cognitive and emotional support into virtual spaces, enhancing both patient care and clinical training.
In the past few years, human-robot deception has been receiving growing attention in several fields (e.g., human-robot interaction, law, philosophy, and psychology). While deception in both human-human and human-robot interactions may have positive consequences, it still presents philosophical and psychological controversy. In particular, verbal deceptions (i.e., in the form of lies or misleading information) may at times be judged as intentional behaviour. While intentionality has been recognised as fundamental in the development of trust, it is not yet fully clear which mechanisms can be designed to foster trust, nor what the potential issues connected to deception are. To this end, in this study, we investigate whether the ability to mentalize may be one such mechanism. We conducted a user study during a public fair, where participants played an assistive game with a robot endowed with Theory of Mind (ToM). We collected the responses from 37 participants to evaluate their perception of trust in the robot. During the game, the robot could occasionally behave deceptively, suggesting the wrong move to the human players. Our results showed that a deceptive robot was less trusted compared to a non-deceiving one. We also found that people’s perception of the robot was positively affected by the frequency of exposure to deception (i.e., wrong suggestions).
Social robots are increasingly deployed in fields such as health care and education to support users through social interactions. Nonetheless, these robots mostly rely on black-box machine learning methods that lack awareness of the mental states of their users, which often leads to unnatural behavior. To address this, we propose three model-based techniques for real-time estimation of invisible mental states of humans. Each method adapts the extended Kalman filter and incorporates a validated dynamic model of human mental states. These mental state estimators are designed for human-robot social interactions and personalize their parameters using initial user data. When tested with 10 human participants interacting with a NAO robot, the mental state estimators reduced the average error in estimation and prediction of mental states across all participants by 3% (i.e., from 12% to 9%), with improvements of up to 13% for individual participants. These results demonstrate the potential of integrating such state estimators into the behavioral control systems of social robots to enhance their awareness of the mental states of users.
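The paper's estimators adapt the extended Kalman filter with a validated dynamic model of mental states; as a much simpler illustration of the predict/update loop involved, here is a plain scalar Kalman filter tracking one latent mental state (e.g., engagement) from noisy behavioral measurements. The random-walk dynamics and all parameter values are assumptions for the sketch, not the paper's model.

```python
class MentalStateKF:
    """Scalar Kalman filter for a latent mental state in [0, 1],
    observed through noisy behavioral cues. Assumes (hypothetically)
    random-walk dynamics x' = x with process noise q and a direct
    measurement z = x + noise with variance r."""

    def __init__(self, q=0.01, r=0.1, x0=0.5, p0=1.0):
        self.q, self.r = q, r   # process / measurement noise variances
        self.x, self.p = x0, p0  # state estimate and its variance

    def predict(self):
        # Random-walk model: the estimate is unchanged, uncertainty grows.
        self.p += self.q
        return self.x

    def update(self, z):
        k = self.p / (self.p + self.r)      # Kalman gain
        self.x += k * (z - self.x)          # correct toward the measurement
        self.p *= (1.0 - k)                 # shrink the uncertainty
        return self.x
```

In a social-robot loop, `predict()` would run every control tick and `update(z)` whenever a new behavioral cue (gaze, speech rate, etc.) is measured; the personalization step the paper describes would amount to fitting `q`, `r`, and `x0` to initial user data.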
Large language models (LLMs) are transforming human-computer interaction and conceptions of artificial intelligence (AI) with their impressive capacities for conversing and reasoning in natural language. There is growing interest in whether LLMs have theory of mind (ToM): the ability to reason about the mental and emotional states of others that is core to human social intelligence. As LLMs are integrated into the fabric of our personal, professional and social lives and given greater agency to make decisions with real-world consequences, there is a critical need to understand how they can be aligned with human values. ToM seems to be a promising direction of inquiry in this regard. Following the literature on the role and impacts of human ToM, this paper identifies key areas in which LLM ToM will show up in human-LLM interactions at individual and group levels, and what opportunities and risks for alignment are raised in each. On the individual level, the paper considers how LLM ToM might manifest in goal specification, conversational adaptation, empathy and anthropomorphism. On the group level, it considers how LLM ToM might facilitate collective alignment, cooperation or competition, and moral judgement-making. The paper lays out a broad spectrum of potential implications and suggests the most pressing areas for future research.
Background The increasing deployment of conversational artificial intelligence (AI) in mental health interventions necessitates an evaluation of their efficacy in rectifying cognitive biases and recognizing affect in human-AI interactions. These biases are particularly relevant in mental health contexts as they can exacerbate conditions such as depression and anxiety by reinforcing maladaptive thought patterns or unrealistic expectations in human-AI interactions. Objective This study aimed to assess the effectiveness of therapeutic chatbots (Wysa and Youper) versus general-purpose language models (GPT-3.5, GPT-4, and Gemini Pro) in identifying and rectifying cognitive biases and recognizing affect in user interactions. Methods This study used constructed case scenarios simulating typical user-bot interactions to examine how effectively chatbots address selected cognitive biases. The cognitive biases assessed included theory-of-mind biases (anthropomorphism, overtrust, and attribution) and autonomy biases (illusion of control, fundamental attribution error, and just-world hypothesis). Each chatbot response was evaluated based on accuracy, therapeutic quality, and adherence to cognitive behavioral therapy principles using an ordinal scale to ensure consistency in scoring. To enhance reliability, responses underwent a double review process by 2 cognitive scientists, followed by a secondary review by a clinical psychologist specializing in cognitive behavioral therapy, ensuring a robust assessment across interdisciplinary perspectives. Results This study revealed that general-purpose chatbots outperformed therapeutic chatbots in rectifying cognitive biases, particularly in overtrust bias, fundamental attribution error, and just-world hypothesis. GPT-4 achieved the highest scores across all biases, whereas the therapeutic bot Wysa scored the lowest. 
Notably, general-purpose bots showed more consistent accuracy and adaptability in recognizing and addressing bias-related cues across different contexts, suggesting a broader flexibility in handling complex cognitive patterns. In addition, in affect recognition tasks, general-purpose chatbots not only excelled but also demonstrated quicker adaptation to subtle emotional nuances, outperforming therapeutic bots in 67% (4/6) of the tested biases. Conclusions This study shows that, while therapeutic chatbots hold promise for mental health support and cognitive bias intervention, their current capabilities are limited. Addressing cognitive biases in AI-human interactions requires systems that can both rectify and analyze biases as integral to human cognition, promoting precision and simulating empathy. The findings reveal the need for improved simulated emotional intelligence in chatbot design to provide adaptive, personalized responses that reduce overreliance and encourage independent coping skills. Future research should focus on enhancing affective response mechanisms and addressing ethical concerns such as bias mitigation and data privacy to ensure safe, effective AI-based mental health support.
As a foundational component of cognitive intelligence, theory of mind (ToM) can make AI more closely resemble human thought processes, thereby enhancing AI's interaction and collaboration with humans. In particular, it can significantly improve a model's comprehension of videos of complex scenes. However, current video question answering (VideoQA) datasets focus on studying causal reasoning within events, and few of them genuinely incorporate human ToM. Consequently, ToM reasoning tasks remain underdeveloped within the area of VideoQA. This paper presents BDIQA, the first benchmark to explore the cognitive reasoning capabilities of VideoQA models in the context of ToM. BDIQA is inspired by the cognitive development of children's ToM and addresses the current deficiencies in machine ToM within datasets and tasks. Specifically, it offers tasks at two difficulty levels, assessing Belief, Desire and Intention (BDI) reasoning in both simple and complex scenarios. We conduct evaluations on several mainstream VideoQA methods and diagnose their capabilities under zero-shot, few-shot and supervised learning. We find that the performance of pre-trained models on cognitive reasoning tasks remains unsatisfactory. To counter this challenge, we undertake thorough analysis and experimentation, ultimately presenting two guidelines, derived from ablation analysis, to enhance cognitive reasoning.
As the performance of larger, newer Large Language Models continues to improve on strategic Theory of Mind (ToM) tasks, the demand for these state-of-the-art models increases commensurately. However, their deployment is costly both in terms of processing power and time. In this paper, we investigate the feasibility of creating smaller, highly-performing specialized models by way of fine-tuning. To do this, we first present a large pre-trained model with 20 unique scenarios that combine different social contexts with games posing varying social dilemmas, record its answers, and use them for Q&A fine-tuning on a smaller model of the same family. Our focus is on in-context game-theoretic decision-making, the same domain within which human interaction occurs and that requires both a theory of mind (or a semblance thereof) and an understanding of social dynamics. The smaller model is therefore trained not just on the answers provided, but also on the motivations provided by the larger model, which should contain advice and guidelines for navigating both strategic dilemmas and social cues. We find that the fine-tuned smaller language model consistently bridged the gap in performance between the smaller pre-trained version of the model and its larger relative, and that its improvements extended to areas and contexts beyond the ones provided in the training examples, including out-of-sample scenarios with completely different game structures. On average across all games, through fine-tuning, the smaller model showed a 46% improvement measured as alignment towards the behavior of the larger model, with 100% representing indistinguishable behavior. When presented with out-of-sample social contexts and games, the fine-tuned model still displays remarkable levels of alignment, reaching improvements of 18% and 28%, respectively.
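The distillation recipe described above, training the smaller model on the larger model's answers plus its stated motivations, can be sketched as a fine-tuning dataset builder. The field names and prompt template here are illustrative assumptions, not the paper's actual format.

```python
import json

def build_distillation_records(scenarios, teacher_outputs):
    """Pair each game-theoretic scenario with the larger model's answer
    AND motivation, so the smaller model is fine-tuned on the reasoning,
    not just the final move. Keys ("context", "game", "answer",
    "motivation") are hypothetical placeholders."""
    records = []
    for scen, out in zip(scenarios, teacher_outputs):
        records.append({
            "prompt": f"{scen['context']}\n{scen['game']}\nWhat do you do, and why?",
            "completion": f"{out['answer']}\nMotivation: {out['motivation']}",
        })
    return records

def to_jsonl(records):
    """Serialize to the JSON-lines format common for Q&A fine-tuning."""
    return "\n".join(json.dumps(r) for r in records)
```

The resulting JSONL file would then be passed to whatever supervised fine-tuning pipeline the model family provides.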
Design fictions allow us to prototype the future. They enable us to interrogate emerging or non-existent technologies and examine their implications. We present three design fictions that probe the potential consequences of operationalizing a mutual theory of mind (MToM) between human users and one (or more) AI agents. We use these fictions to explore many aspects of MToM, including how models of the other party are shaped through interaction, how discrepancies between these models lead to breakdowns, and how models of a human's knowledge and skills enable AI agents to act in their stead. We examine these aspects through two lenses: a utopian lens in which MToM enhances human-human interactions and leads to synergistic human-AI collaborations, and a dystopian lens in which a faulty or misaligned MToM leads to problematic outcomes. Our work provides an aspirational vision for human-centered MToM research while simultaneously warning of the consequences when implemented incorrectly.
A common conjecture is that social success relies on "theory of mind", the everyday skill of imputing mental states to others. We test the hypothesis that individuals with stronger theory of mind skills and motivation garner more positive first impressions because of how they interact with others. Participants included 334 young adults who were paired with a peer for a first-time meeting. Dyads completed a cooperative Lego-building task, which was videotaped and later coded for behavioral manifestations of theory of mind by independent raters. Theory of mind accuracy and motivation were assessed with validated laboratory tasks and a self-report questionnaire, respectively. First impressions were assessed based on the partner's ratings of participant likeability, enjoyment of the interaction, and changes in positive affect. Results of actor-partner interdependence mediation models revealed that the associations between theory of mind and first impressions are indirect and mediated through behaviors. Specifically, participants with stronger theory of mind demonstrated greater cognitive sensitivity and pragmatic conversational skills. However, only cognitive sensitivity subsequently predicted more favorable first impressions. This research shows that social-cognitive skills can affect others' social impressions through their behavioral manifestations.
In the digital age, individuals increasingly express their thoughts and emotions through social media, resulting in a continuous stream of user-generated text data. This abundance of textual information offers valuable insight into users' mental and emotional states. Leveraging this data, a Cognitive Behavioral Therapy (CBT) model has been developed to classify emotions and support mental health interventions. CBT, a well-established, evidence-based approach for addressing psychological issues, benefits from the integration of advanced artificial intelligence (AI) and deep learning techniques to enhance its effectiveness. The proposed CBT framework utilizes Natural Language Understanding (NLU) models such as BERT and RoBERTa to analyze user-generated text and categorize it into specific emotional states based on intent. Datasets sourced from platforms like Kaggle and Reddit serve as the foundation for training the model in emotion detection tasks. Beyond classification, the system incorporates XLNet to generate contextually appropriate and supportive responses aimed at promoting emotional well-being and guiding users toward a healthier mental state. This dual capability enables the system to function as both a diagnostic and therapeutic tool. Users may find interactions with AI-driven systems more approachable and non-judgmental, offering a comfortable alternative to traditional psychotherapy while still delivering meaningful support and guidance.
No abstract available
Advancements in smart TV technologies have enhanced digital entertainment by providing personalized services to households. However, their role as a group device and the complexities of co-viewing experiences are often overlooked. To better understand user expectations in co-viewing contexts, and create more intuitive and satisfying experiences, we adopt the concept of mental models. By employing a mental model framework, we study co-viewing sessions and conducted semi-structured interviews to capture how groups browse and select content when co-viewing. Our findings reveal five design implications for interface designs including social factors, active user engagement, and adaptive interaction design. Based on these design implications, we propose an interface design prototype informed by four actionable design insights that align with co-viewing usage contexts. By investigating how group dynamics shape content exploration and decision-making, our work delivers practical insights with a visual prototype for designing more satisfying co-viewing experiences on smart TVs.
No abstract available
Augmented Reality multi-user interfaces facilitate communication, coordination and collaboration among teams. Moreover, these interfaces can help to align the team’s perceptions and expectations under a shared mental model. This model is a psychological construct that represents the common knowledge, beliefs, and understandings held by team members. In this paper, we study to what extent, if any, the combination of Augmented Reality multi-user interfaces and shared mental models affects human-robot trust. To this end, we developed an Augmented Reality multi-user interface to perform a user study (N = 37) comparing non-dyadic human-robot interactions with a quadruped robot exhibiting low reliability (Group 3), against dyadic interactions while the robot exhibited high reliability (Group 1) or low reliability (Group 2). We made this comparison using validated trust questionnaires relevant to HRI. Our results, obtained via Bayesian data analysis methods, show differences in the distribution of answers between groups 1 and 2. Notably, this difference is smaller between groups 1 and 3, which suggests that the combination of shared mental models and multi-user interfaces holds promise as an effective way to manage and calibrate human-robot trust.
No abstract available
Large language models (LLMs) have become extensively used among users across diverse settings. Yet, given the complex nature of these large-scale artificial intelligence (AI) systems, how to leverage their capabilities effectively remains underexplored. In this study, we examined the types of communication errors that occur in interactions between humans and ChatGPT-3.5 in Arabic. A corpus of six Arabic-language consultations was collected from an online mental health support forum. For each consultation, the researchers provided the user’s Arabic queries to ChatGPT-3.5 and analyzed the system’s responses. The study identified 102 communication errors, mostly grammatical errors and repetitions. Other errors involved contradictions, ambiguous language, ignoring questions, and lacking sociality. By examining the patterns and types of communication errors observed in ChatGPT’s responses, the study is expected to provide insights into the challenges and limitations of current conversational AI systems, particularly in sensitive domains like mental health support.
This qualitative phenomenological study investigates the complex relationship between mental health conditions and smartphone-mediated interpersonal communication among university students in Makassar, Indonesia. Using in-depth interviews with 18 participants from three universities, categorized by their Depression Anxiety Stress Scale-21 (DASS-21) scores, the research explores how psychological well-being influences communication patterns. The study reveals five major themes: the smartphone as an emotional regulator, digital communication as a social safety net, technology-mediated relationship maintenance, anxiety-driven communication patterns, and cultural adaptation in digital spaces. The findings indicate that a student's mental health status is a crucial moderating factor that shapes how they engage with digital communication technologies. The study proposes a Digital Communication Adaptation Model (DCAM), a theoretical framework explaining how mental health conditions influence communication patterns. The research contributes to communication and mental health literature by offering a culturally sensitive model that bridges individual psychological factors with broader social contexts, providing insights for future research and practical interventions in this area.
Conversational agents powered by large language models (LLM) have increasingly been utilized in the realm of mental well-being support. However, the implications and outcomes associated with their usage in such a critical field remain somewhat ambiguous and unexplored. We conducted a qualitative analysis of 120 posts, encompassing 2917 user comments, drawn from the most popular subreddit focused on mental health support applications powered by large language models (u/Replika). This exploration aimed to shed light on the advantages and potential pitfalls associated with the integration of these sophisticated models in conversational agents intended for mental health support. We found the app (Replika) beneficial in offering on-demand, non-judgmental support, boosting user confidence, and aiding self-discovery. Yet, it faced challenges in filtering harmful content, sustaining consistent communication, remembering new information, and mitigating users' overdependence. The stigma attached further risked isolating users socially. We strongly assert that future researchers and designers must thoroughly evaluate the appropriateness of employing LLMs for mental well-being support, ensuring their responsible and effective application.
Background: Cognitive behavioral therapy (CBT)-based mobile apps have been shown to improve the effectiveness of CBT-based interventions. Despite the proliferation of these apps, user-centered guidelines pertaining to their design remain limited. This study aims to identify design features of CBT-based apps using online app reviews. Methods: We used 4- and 5-star reviews, preprocessed them, and represented them as word-level bigrams. Then, we leveraged latent Dirichlet allocation (LDA) and visualization techniques, using a Python library for interactive topic model visualization, to analyze the reviews and identify design features that contribute to the success and effectiveness of the apps. Results: A total of 24,902 reviews were analyzed. LDA optimization resulted in 86 topics that were labeled by two independent researchers, with an interrater Cohen’s kappa value of 0.86. The labeling and grouping process resulted in a total of six main design features for effective CBT-based mobile apps, namely, mental health management and support, credibility support, self-understanding and personality insights, therapeutic approaches and tools, beneficial rescue sessions, and personal growth and development. Conclusions: The high-level design features identified in this study could serve as the backbone of successful CBT-based mobile apps for mental health.
Agentic systems aim to handle complex problems with increasing system autonomy using generative AI. These new agentic systems are becoming more feasible and easier to build. Yet we know little about what end-users need to know to use these systems appropriately. We study one such agentic system, “Gent,” which can break down complex problems into a set of actions, provide a rationale for each action, interact with external information, and cite its sources. Our goals were to understand users’ mental models of the agentic system, the information users leveraged to evaluate the accuracy of the system, and users’ information needs. In our study (N=24), participants interacted with Gent for four information seeking tasks where they could see Gent’s actions, rationale, and sources. Participants’ mental models centered around the search-like qualities of the system, with their confidence impacted by the website sources. Participants’ mental models often lacked insight into the workings of the generative AI model and agentic framework that impact the actions the system takes. Participants used the descriptions of the system’s actions to support their evaluation of the accuracy of the system and wanted to know more about how the system got to its answers. Participants also relied on their own personal knowledge and the style or length of Gent’s responses to evaluate the accuracy. Our results highlight the need for further transparency in agentic AI systems to support end-users in evaluating system outputs and help them build effective mental models.
The rapid diffusion of Generative Artificial Intelligence (GenAI) systems has reshaped everyday activities, yet their adoption remains uneven and cognitively demanding for many users. Existing research has largely relied on conventional technology acceptance models, providing limited insight into cognitive burden and GenAI-specific system characteristics. To address this gap, this study develops an integrated framework combining the Technology Acceptance Model, Cognitive Load Theory, and the DeLone and McLean Information Systems Success Model to explain GenAI adoption among ordinary users. Survey data from 1001 active GenAI users were analyzed using partial least squares structural equation modeling (PLS-SEM). The results indicate that all core technology acceptance relationships are statistically significant (p < 0.001), while mental load negatively affects perceived usefulness and user attitudes. Moreover, GenAI system attributes, namely output quality, transparency, friction reduction, and system integration, significantly moderate key adoption pathways and strengthen the translation of behavioral intention into actual use. Predictive assessment indicates that the proposed model outperforms the baseline technology acceptance model, with stronger explanatory power and superior out-of-sample predictive performance (Q²predict > 0.35). The findings offer actionable insights for designing cognitively efficient, trustworthy, and sustainable GenAI systems.
An increasing number of social media platforms are utilizing social network integration to enhance sustained user participation. However, there is a limited understanding of the mechanisms through which social network integration facilitates user continuance. This study applies communication visibility theory to explicate how social network integration enables visible communication among social media users in game environments, which enhances players’ social stickiness and commitment. Furthermore, we utilize social comparison theory to illuminate the effects of social comparison, spurred by relationships with acquaintances, on individuals’ continuance participation. Our research model is substantiated through a longitudinal field study that gathered subjective and objective data from a social game. After two rounds of data collection, we obtained 231 responses to examine the research model. Our results elaborate how communication visibility enhances players’ socialization with social media friendships, which further enhances their continuance participation in social games. We identify that social comparison exerts both a direct and moderating influence on players’ continuance participation. Integrating social media friendships into social games enhances player retention by fostering social presence, engagement and comparison. Game operators should optimize interfaces, implement notifications and host interactive events. These strategies also apply to e-commerce and knowledge-sharing platforms, improving user participation and community interactions. This study enriches the existing literature by clarifying how social network integration influences players’ sustained use within social games. It also provides practical insights into leveraging social network integration to enhance user retention across various social media platforms.
No abstract available
Introduction: The potential safety benefits of advanced driver assistance systems (ADAS) rely heavily on drivers’ appropriate mental models of, and trust in, ADAS. Current research has mainly focused on drivers’ mental models of adaptive cruise control (ACC) and lane centering control (LCC), but has rarely investigated drivers’ understanding of emerging driving automation functions beyond ACC and LCC. Methods: To address this research gap, 287 valid responses were collected from ADAS users in the Chinese market in a survey study targeting state-of-the-art ADAS (e.g., Autopilot in Tesla vehicles). Through cluster analysis, drivers were clustered into four groups based on their knowledge of traditional ACC and LCC functions, knowledge of functions beyond ACC and LCC, and knowledge of ADAS limitations. Predictors of driver grouping were analyzed, and we further modeled drivers’ trust in ADAS. Results: Drivers in general had weak knowledge of LCC functions and of functions beyond ACC and LCC, and only 27 (9%) of respondents had a relatively strong mental model of ACC and LCC. At the same time, years of licensure, weekly driving distance, ADAS familiarity, driving style (i.e., planning), and personality (i.e., agreeableness) were associated with drivers’ mental model of ADAS. Further, the mental model of ADAS, vehicle brand, and drivers’ age, ADAS experience, driving style (i.e., focus), and personality (i.e., emotional stability) were significant predictors of drivers’ trust in ADAS. Discussion: These findings provide valuable insights for the design of driver education and training programs to improve driving safety with ADAS.
This study aims to investigate how phubbing – diverting attention from face-to-face interactions to mobile devices – affects psychological well-being. Grounded in media naturalness theory (MNT), the research explores the roles of perceived immediacy, power dynamics, rejection sensitivity, social attraction and emotional arousal in shaping communication satisfaction. Using a latent variable model and Partial Least Squares Structural Equation Modeling (PLS-SEM), the study aims to uncover the psychological mechanisms underlying digitally disrupted conversations and how individual traits may moderate these effects. The study uses a publicly available data set from ICPSR, comprising 68 participants who engaged in face-to-face conversations under varying smartphone usage conditions. An eight-construct latent variable model was developed to examine 11 hypotheses grounded in MNT. PLS-SEM was conducted using WarpPLS, which is well-suited for non-normal data distributions. The model assessed direct, mediating and moderating relationships among key psychological and communication constructs, including immediacy, power perception, rejection sensitivity, social attraction, arousal and satisfaction, to understand the psychological impact of phubbing during interpersonal interactions. The analysis reveals that longer interaction duration increases perceived immediacy and emotional arousal. Immediacy positively influences perceived power and social attraction, while power enhances perceptions of phubbing, which in turn elevates rejection sensitivity. Although immediacy generally predicts higher satisfaction, this effect is negatively mediated by social attraction and moderated by rejection sensitivity, indicating that personal traits can offset the benefits of natural communication. 
The structural model demonstrates strong reliability, validity and model fit, highlighting the complex psychological consequences of phubbing and how digital interruptions subtly undermine emotional satisfaction in face-to-face conversations. This study offers a novel application of MNT to examine the psychological impacts of phubbing within digitally disrupted face-to-face interactions. Unlike prior research that focuses primarily on interpersonal outcomes, this work uncovers deeper psychological mechanisms – such as power perception, rejection sensitivity and emotional arousal – that mediate or moderate the effects of mobile phone use on communication satisfaction. By integrating dispositional and relational variables into a single structural model, the study advances theoretical understanding of digital disruption in interpersonal settings and provides practical insights for mental health practitioners, educators and designers of communication technologies.
Mental health applications have gained significant popularity, but many still fall short in delivering personalized emotional support. This study explores a Personalized Content Filter powered by BiLSTM to analyze user-generated text, such as journal entries or chat messages, and identify emotions like sadness, anxiety, joy, and calmness. By determining the prevailing emotion, the system provides customized recommendations, such as mindfulness practices, stress-relief techniques, or uplifting content. The BiLSTM model uses contextual insights from text to ensure precise emotion recognition, employing pre-trained word embeddings for robust feature extraction. Over time, the system learns from individual users’ emotional trends, refining its recommendations to suit their specific needs. Experimental results confirm the model’s accuracy and reliability, emphasizing its capability in emotion detection. Incorporating this system into mental health applications has the potential to boost user engagement while offering timely support for emotional well-being. This research underscores the potential of deep learning in advancing mental health tools and suggests future directions, such as integrating multimodal data and exploring broader real-world implementations.
Extended Reality (XR) systems are rapidly shifting from isolated, single-user applications towards collaborative and social multi-user experiences. To evaluate the quality and effectiveness of such interactions, it is therefore necessary to move beyond traditional individual metrics such as Quality-of-Experience (QoE) or Sense of Presence (SoP). Instead, group-level dynamics such as effective communication and coordination need to be encompassed to assess the shared understanding of goals and procedures. In psychology, this is referred to as a Shared Mental Model (SMM). The strength and congruence of such an SMM are known to be key for effective team collaboration and performance. In an immersive XR setting, though, novel Influence Factors (IFs) emerge that are not considered in a setting of physical co-location. Evaluations of the impact of these novel factors on SMM formation in XR, however, are nearly non-existent. Therefore, this work proposes SMMs as a novel evaluation tool for collaborative and social XR experiences. To better understand how to explore this construct, we ran a prototypical experiment based on ITU recommendations in which the influence of asymmetric end-to-end latency was evaluated through a collaborative, two-user block-building task. The results show that strong SMM formation can take place in an XR context even when collaborators have fundamentally different responsibilities and behavior. Moreover, the study confirms previous findings by showing, in an XR context, that a team’s SMM strength is positively associated with its performance.
In recent years, the digital delivery of mindfulness-based interventions for anxiety and stress relief has gained popularity among the general population. However, there is a lack of research on how users navigate these specially designed mobile applications (apps) and how their mental health journeys may be shaped by the design processes that constitute their interactions with the app. Given the reach and scale of the Headspace app's user base, there is an opportunity to study its design and user experience. Integrating the concept of technological affordances and the Unified Theory of Acceptance and Use of Technology (UTAUT-2), this study seeks to answer the following research question: How do the characteristics of the Headspace mobile app enable or constrain different aspects of the user’s mental health journey? The findings reveal that Headspace offers affordances such as accessibility, progress tracking, and privacy, which influence user engagement and experience. These affordances notably intersect with UTAUT-2, thereby providing a robust framework for understanding the sociotechnical dynamics of using digital mental health tools. Overall, this study provides insight into user experience and the interpretation of affordances, demonstrating the relationship between design elements and individual perceptions that can iteratively inform digital experience design.
The act of explaining across two parties is a feedback loop, where one provides information on what needs to be explained and the other provides an explanation relevant to this information. We apply a reinforcement learning framework which emulates this format by providing explanations based on the explainee's current mental model. We conduct novel online human experiments where explanations generated by various explanation methods are selected and presented to participants, using policies which observe participants' mental models, in order to optimize an interpretability proxy. Our results suggest that mental model-based policies (anchored in our proposed state representation) may increase interpretability over multiple sequential explanations, when compared to a random selection baseline. This work provides insight into how to select explanations which increase relevant information for users, and into conducting human-grounded experimentation to understand interpretability.
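The selection loop this abstract describes, observing a (coarse) mental-model state and choosing among explanation methods to optimize an interpretability proxy, resembles a contextual bandit. A minimal epsilon-greedy sketch follows; the states, method names, and reward simulator are all hypothetical stand-ins, not the authors' actual state representation or policy:

```python
import random
from collections import defaultdict

random.seed(0)

# Hypothetical coarse mental-model states and candidate explanation methods.
STATES = ["novice", "partial", "accurate"]
METHODS = ["saliency", "counterfactual", "example_based"]

# Q[state][method]: running estimate of the interpretability proxy.
Q = defaultdict(lambda: {m: 0.0 for m in METHODS})
counts = defaultdict(lambda: {m: 0 for m in METHODS})

def select(state, eps=0.1):
    """Epsilon-greedy choice of an explanation method for this mental model."""
    if random.random() < eps:
        return random.choice(METHODS)
    return max(Q[state], key=Q[state].get)

def update(state, method, reward):
    """Incremental-mean update of the proxy estimate."""
    counts[state][method] += 1
    Q[state][method] += (reward - Q[state][method]) / counts[state][method]

# Toy simulator standing in for human responses: pretend novices benefit most
# from example-based explanations, and so on.
def simulated_proxy(state, method):
    best = {"novice": "example_based", "partial": "counterfactual",
            "accurate": "saliency"}
    return 1.0 if method == best[state] else random.random() * 0.5

for _ in range(2000):
    s = random.choice(STATES)
    m = select(s)
    update(s, m, simulated_proxy(s, m))

# After training, the greedy policy picks the best method per mental model.
print({s: max(Q[s], key=Q[s].get) for s in STATES})
```

In the paper the reward comes from human participants rather than a simulator, and explanations are generated by multiple explanation methods; this sketch only illustrates the policy-learning skeleton.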
No abstract available
In hybrid mobility societies, where automated vehicles (AVs) and humans interact in public spaces, the significance of prosocial behaviors intensifies. These behaviors are crucial for the smooth functioning of an interdependent transportation environment, mitigating challenges from the integration of AVs and human-operated systems, and enhancing user well-being by fostering more efficient, less stressful, and inclusive environments. This study explores the impact of receiving prosocial behaviors on cognition, riding behavior, and well-being of micromobility users through interdependent traffic situations within a simulated urban environment. Our mixed design study involved two types of social interactions as between-subject conditions of prosocial and asocial interaction, and three categories of time constraint as within-subject conditions: relaxed, neutral, and pressed. The findings reveal that receiving prosocial and asocial behaviors can affect the state of well-being and trial performance in a mobility environment.
With the rapid development of digital technologies, intelligent generation of game narratives has become a crucial research direction in artificial intelligence and game design fields. However, existing algorithms lack deep understanding of group psychological mechanisms and struggle to generate authentic and credible multi-character interaction scenarios. Based on social proof effect theory, this study constructs an innovative multi-character interaction group behavior simulation algorithm aimed at enhancing the coherence, authenticity, and user experience of game narratives. The research employs a methodology combining theoretical modeling, algorithm design, and empirical validation. First, an "Environment-Cognition-Society-Behavior" quaternary interaction theoretical framework is constructed, providing in-depth analysis of environmental factors' influence mechanisms on group behavior, including the operational patterns of spatial layout, environmental complexity, and contextual cues. Second, the dynamic evolution mechanisms of social proof effects are systematically explored, revealing the inverted U-shaped relationship between group size and influence propagation, the S-shaped temporal curve characteristics of group behavior convergence, and the moderating role of individual differences in environmental adaptation. Building upon this foundation, a narrative generation algorithm based on Graph Neural Networks and Multi-Agent Reinforcement Learning is designed and implemented. Through the collaborative operation of a social proof intensity calculation engine, multi-character decision coordinator, and dynamic narrative generator, high-quality adaptive narrative creation is achieved. 
Through large-scale user experience testing involving 180 participants, the study validates the algorithm's effectiveness: compared to traditional methods, narrative logical consistency improved by over 40%, character behavior credibility scores reached 8.5 points, overall user immersion increased by 45%, average gameplay duration increased by 68%, and replay rate reached 73.2%. Algorithm performance testing demonstrates an average response time of only 127 milliseconds, memory usage reduced by 39.8%, CPU utilization decreased by 50.4%, exhibiting excellent scalability and system stability that fully meets industrial-grade application requirements. The research achievements not only provide crucial support for technological innovation in the gaming industry but also establish foundations for application expansion in education and training, social governance, mental health, and other fields, possessing significant theoretical value and broad practical application prospects. This study successfully validates the tremendous potential of psychological theories in artificial intelligence algorithm design, opening new pathways for interdisciplinary integration research.
Given that emotional content spreads more widely than rational content in social networks, and given the complexity of user cognition and the interaction of derivative topics, this article proposes a derivative topic dissemination model that integrates multidimensional cognition and game theory. First, to address user emotional reactions when mining topics, we quantify the affective influence among users by treating user behaviors as continuous conversations, using conversation-level sentiment analysis and the proximity centrality of social networks. Second, considering that user behavior is influenced by multidimensional cognition, this article proposes a method based on SR2vec (Sensibility-Rationality 2vec) to simulate the dialectical relationship between sensibility and rationality in the user decision-making process. Finally, considering the cooperative and competitive relationships among derived topics, this article uses evolutionary game theory to analyze the topic life cycle and quantify its impact on user behavior with a time discretization method. Accordingly, we propose a CG-back-propagation (BP) model incorporating a BP neural network to efficiently simulate the nonlinear relationship of user behavior. Experiments show that the model can not only effectively capture the influence of multidimensional cognition on users’ retweeting behavior, but also effectively perceive the propagation dynamics of derived topics.
Virtual environment technology enables users to create and embody avatars that differ from their real-life appearances. The "Proteus effect" refers to how an avatar’s appearance can shape a user’s cognition and behavior. This study explores how users’ impressions of avatars with different visual characteristics influence their levels of agreeableness, with the broader aim of supporting skill development for daily tasks and social interaction. In the experiment, Priest and Warrior avatars were used in a prisoner’s dilemma task to assess the relationship between avatar impressions and agreeableness. While significant differences in perceived agreeableness and extroversion were found based on avatar appearance, no significant effects were observed in actual task behavior. Future research should enhance the experimental design by incorporating outcome-based rewards to better capture behavioral effects.
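The prisoner's dilemma task used in this experiment can be made concrete with the textbook payoff structure (temptation > reward > punishment > sucker's payoff). The study's actual payoffs and procedure are not given in the abstract, so the values and the avatar reading below are purely illustrative:

```python
# Standard prisoner's dilemma payoffs (T > R > P > S); the study's actual
# values are not given in the abstract, so these are the textbook defaults.
PAYOFFS = {
    ("cooperate", "cooperate"): (3, 3),  # mutual cooperation (R, R)
    ("cooperate", "defect"):    (0, 5),  # sucker vs. temptation (S, T)
    ("defect",    "cooperate"): (5, 0),  # temptation vs. sucker (T, S)
    ("defect",    "defect"):    (1, 1),  # mutual defection (P, P)
}

def play_round(a, b):
    """Return (player_a_score, player_b_score) for one round."""
    return PAYOFFS[(a, b)]

# A hypothetical reading of the avatar manipulation: a 'Priest' embodiment
# nudging cooperation against a defecting 'Warrior' opponent.
print(play_round("cooperate", "defect"))  # (0, 5)
```

The Proteus-effect hypothesis in the abstract concerns whether embodying the Priest avatar shifts choices toward "cooperate"; the payoff matrix itself is held fixed across conditions.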
In this paper, we present a computational model for managing the impressions of warmth and competence (the two fundamental dimensions of social cognition) of an Embodied Conversational Agent (ECA) while interacting with a human. The ECA can choose among four different self-presentational strategies eliciting different impressions of warmth and/or competence in the user, through its verbal and non-verbal behavior. The choice of the non-verbal behaviors displayed by the ECA relies on our previous studies. In our first study, we annotated videos of human-human natural interactions of an expert on a given topic talking to a novice, in order to find associations between the warmth and competence elicited by the expert's non-verbal behaviors (such as type of gestures, arms rest poses, smiling). In a second study, we investigated whether the most relevant non-verbal cues found in the previous study were perceived in the same way when displayed by an ECA. The computational learning model presented in this paper aims to learn in real time the best strategy (i.e., the degree of warmth and/or competence to display) for the ECA, that is, the one which maximizes the user's engagement during the interaction. We also present an evaluation study, aiming to investigate our model in a real context. In the experimental scenario, the ECA plays the role of a museum guide introducing an exposition about video games. We collected data from 75 visitors to a science museum. The ECA was displayed at human scale on a big screen in front of the participant, with a Kinect on top. During the interaction, the ECA could adopt one of the four self-presentational strategies for the whole interaction, or select one strategy randomly for each speaking turn, or use a reinforcement learning algorithm to choose the strategy with the highest reward (i.e., user engagement) after each speaking turn.
Background: The ability to follow one another’s gaze plays an important role in our social cognition, especially when we synchronously perform tasks together. We investigate how gaze cues can improve performance in a simple coordination task (i.e., the mirror game), whereby two players mirror each other’s hand motions. In this game, each player is either a leader or a follower. To study the effect of gaze in a systematic manner, the leader’s role is played by a robotic avatar. We contrast two conditions, in which the avatar does or does not provide explicit gaze cues that indicate the next location of its hand. Specifically, we investigated (a) whether participants are able to exploit these gaze cues to improve their coordination, (b) how gaze cues affect action prediction and temporal coordination, and (c) whether introducing active gaze behavior for avatars makes them more realistic and human-like (from the user's point of view). Methodology/Principal Findings: 43 subjects participated in 8 trials of the mirror game. Each subject performed the game in both conditions (with and without gaze cues). In this within-subject study, the order of the conditions was randomized across participants, and subjective assessment of the avatar’s realism was assessed by administering a post-hoc questionnaire. When gaze cues were provided, a quantitative assessment of synchrony between participants and the avatar revealed a significant improvement in subject reaction time (RT). This confirms our hypothesis that gaze cues improve the follower’s ability to predict the avatar’s actions. An analysis of the pattern of frequency across the two players’ hand movements reveals that the gaze cues improve the overall temporal coordination between the two players. Finally, analysis of the subjective evaluations from the questionnaires reveals that, in the presence of gaze cues, participants found the avatar not only more human-like/realistic, but also easier to interact with.
Conclusion/Significance This work confirms that people can exploit gaze cues to predict another person's movements and to better coordinate their motions with their partners, even when the partner is a computer-animated avatar. Moreover, this study contributes further evidence that implementing biological features, here task-relevant gaze cues, enables the humanoid robotic avatar to appear more human-like, and thus increases the user's sense of affiliation.
In this paper, I will argue that the rise in hostility and polarization on social media can be explained by taking into account a radical difference between online and face-to-face interaction. In everyday offline environments, socially shared and context-dependent norms frame our understanding of other people's minds based on their behavior. I will argue that, on social media platforms, social cognition is distorted by two deliberate design choices that serve the platform designers' financial gain: the lack of socially shared norms on these platforms (entailed by what is known as context collapse) and the extreme user-centeredness of their interfaces. I will argue that such design features not only cause frustration in the understanding of others but also encourage testimonial injustice in interaction.
No abstract available
Previous work in HCI about personal informatics and behavior change suggests that representing data in intuitive metaphors and meaningful stories on glanceable displays should be considered to complement typical data visualization for daily user reflection and understanding. Informed by insights from social psychology, providing information regarding one's behavior (i.e., feedback) should (1) link behavioral data to positively or negatively valued outcomes; (2) show changes in the outcomes over time; and (3) include measures for pursuing different outcomes. Grounded in metaphor and blending theories from embodied cognition, we suggest metaphorically mapping less intuitive behavior-outcome links with more direct cause-effect relations from seemingly unrelated yet familiar domains. A behavior and a comparable scenario are cognitively compressed into an "animated parable". This paper describes the theoretical framework and design guidelines, and reports the development of a blended concept, "incingarette" (cigarette and incinerator), and its prototype. The work-in-progress informs updates on design recommendations.
A major challenge in human-robot interaction (HRI) is creating the “social fluidity” necessary for humans to perceive the interaction as life-like. During verbal interactions, for instance, the speech content itself is not the only thing that matters. Rather, things like timing, cadence, and manner of speaking are necessary to speak “like a native”, yet those attributes vary significantly by language and cultural setting. To that end, we developed a bilingual virtual avatar (Korean and English speaking) capable of autonomous speech during cooperative gameplay with a human participant in a social survival video game. We then ran a series of experiments with 60 participants (30 English speakers and 30 Korean speakers) interacting with the avatar during 30-minute game sessions. The experiments included several conditions, in which we modified the avatar’s speech behavior in different ways while collecting multiple types of data (audio-visual recordings, speech data, gameplay data, human perceptions). Results showed significant differences between English and Korean speakers during the experiment. Korean speakers spoke less on average and had more negative speech sentiment, while the English speakers spoke more frequently and had more positive speech sentiment. The avatar was also more likely to interrupt the human’s speech in English than Korean, despite having the same design. Furthermore, Korean speakers perceived more social presence when the avatar engaged in more repetitive speech behavior, while English speakers perceived more when the avatar was more “chatty”. We suggest that these results likely relate to cultural differences between East Asian cultures and Western cultures in terms of the social norms that govern appropriate social interaction behavior, and discuss the implications for future work on interactive speech agents.
With the development of social media, user behavior on these platforms has become a focal point of research. However, less attention has been paid to irrational information behaviors in studies of social media user behavior. Malicious comments fundamentally reflect the irrational behavior of social media users. This paper, based on the cognition-affect-conation theoretical model, constructs a structural equation model that includes malice, individual constraints, radicalized emotions, and malicious comment behavior. Drawing on 214 questionnaire responses, SEM is used to analyze the path relationships between cognitive bias, radicalized emotions, and malicious comment behavior, as well as the factors influencing radicalized emotions. The study findings indicate: (1) the objective perception of information malice is not related to malicious comment behavior. (2) Subjectively perceived information malice significantly positively influences both radicalized emotions and malicious comment behavior. (3) Individual cognitive constraints significantly positively influence both radicalized emotions and malicious comment behavior. (4) Radicalized emotions, acting as a mediating variable, significantly positively influence malicious comment behavior. The conclusions of this study will provide theoretical references for publishing accurate information on social media, reducing online hostility, and building a healthy and harmonious online environment.
In social networks, studying rumor propagation patterns is essential for curbing the spread of rumors. Given the coexistence and conflict of multiple-type rumor information, as well as users’ cognitive differences, this article presents a rumor propagation model grounded in user cognition and evolutionary game theory. First, considering the potential impact of social relationships between users on rumor propagation, the KD-Tree algorithm is employed to uncover hidden connections between users, thereby enriching the topology of the user’s social network. Second, a user behavior driving mechanism for rumor, anti-rumor, and motivation-rumor types is constructed based on evolutionary games to reflect the interactive and strategic nature of users’ responses. Moreover, the Lotka-Volterra equation is utilized to explore the dynamic game of multi-type rumor information and the cognitive process of users. Finally, to address differences in users’ cognition, this article introduces the anti-rumor trust state A and the motivation-rumor trust state M, which arise from users’ exposure to multiple types of rumor information. Based on these trust states, a rumor propagation model, SIAMR, is constructed using user cognition and evolutionary game theory. Experiments demonstrate that the model accurately captures the dynamic interactions between multi-type rumor information and the transmission process of rumor topics in social networks. The proposed model integrates cognitive psychology with a strategic interaction framework, offering a more realistic representation of rumor propagation behavior in the real world. Experimental results reveal that SIAMR improves prediction accuracy by 14.23% over baseline models in simulating the dynamics of multiple types of rumors, effectively capturing users’ cognitive influences and the mechanisms of information competition.
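The abstract above uses the Lotka-Volterra equations to model the dynamic competition between rumor and anti-rumor information. A toy sketch of two-species competitive Lotka-Volterra dynamics under Euler integration; every parameter below is invented for illustration, not estimated for the SIAMR model:

```python
def lv_competition(x0, y0, r1, r2, a12, a21, dt=0.01, steps=20000):
    """Two-species competitive Lotka-Volterra dynamics via Euler integration.

    x: share of users trusting the rumor; y: share trusting the anti-rumor.
    """
    x, y = x0, y0
    for _ in range(steps):
        dx = r1 * x * (1 - x - a12 * y)  # rumor growth, inhibited by anti-rumor
        dy = r2 * y * (1 - y - a21 * x)  # anti-rumor growth, inhibited by rumor
        x += dx * dt
        y += dy * dt
    return x, y

# Anti-rumor pressure on the rumor (a12) exceeds the reverse (a21), so in
# this competitive-exclusion regime the anti-rumor displaces the rumor.
x_final, y_final = lv_competition(x0=0.1, y0=0.05,
                                  r1=0.8, r2=1.0, a12=1.5, a21=0.5)
```

With a12 > 1 and a21 < 1 the rumor share decays toward zero while the anti-rumor share approaches its carrying capacity, matching the intuition that sustained debunking can crowd out a rumor.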
"Don't Judge a Book by its Cover": Exploring Discriminatory Behavior in Multi-User-Robot Interaction
In multi-user human-robot interaction scenarios, the problem arises of predisposed and unfair robot treatment of users due to biases. Thus, this study explores whether individuals recognize discrimination by a social robot, and the impact of the resulting feeling of exclusion. As a social consequence, we focus on the influence of robot discrimination on the perception of interaction partners and the attribution of blame. Employing a VR-based multi-user lab experiment simulating a library task, participants experienced discrimination by a robot. Results suggest that discriminated individuals felt more discriminated against, albeit not significantly more ostracized. Moreover, discrimination influenced the self-attribution of blame and observers' evaluations of the discriminated user's competence. This work highlights the complex social impact of robot discrimination on human interactions and team dynamics.
Multinomial processing tree (MPT) models can provide novel insights into the cognitive processes underlying a wide variety of social cognitive judgments and behaviors. In previous research, MPT models have been used to disentangle the contributions of multiple latent processes to tasks configured to assess moral reasoning, processing fluency, decision-making, implicit biases, and social categorization, among many other topics. However, until recently, MPT models were limited in their application to categorical data. New methodological advances extend traditional MPT estimation methods by incorporating reaction time data, thereby expanding the breadth and depth of questions that can be investigated. This article provides a user-friendly step-by-step tutorial for response-time extended MPT methods with annotated code and data, using the Implicit Association Test as a working example.
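To give a flavor of what fitting an MPT model involves, here is a toy one-high-threshold tree fit by brute-force maximum likelihood. The tree structure, parameter names, and counts are hypothetical; the tutorial's actual models and its response-time extension are considerably richer:

```python
import math
from itertools import product

def mpt_loglik(detect_params, data, g=0.5):
    """Log-likelihood of a toy one-high-threshold MPT model.

    Tree (illustrative):
        detect (D_i)                    -> correct
        no detect (1 - D_i), guess (g)  -> correct
        no detect (1 - D_i), no guess   -> error
    so P(correct | condition i) = D_i + (1 - D_i) * g.
    """
    ll = 0.0
    for (correct, errors), d in zip(data, detect_params):
        p = d + (1 - d) * g
        ll += correct * math.log(p) + errors * math.log(1 - p)
    return ll

def fit_grid(data, steps=100):
    # Brute-force maximum likelihood over a grid of detection parameters.
    grid = [i / steps for i in range(1, steps)]
    return max(product(grid, repeat=len(data)),
               key=lambda params: mpt_loglik(params, data))

# Synthetic (correct, error) counts for an easy and a hard condition,
# 100 trials each, with the guessing rate fixed at g = 0.5.
data = [(90, 10), (70, 30)]
d_easy, d_hard = fit_grid(data)
```

With guessing fixed, the latent detection probabilities are recovered from the observed accuracies (0.9 and 0.7 imply D of 0.8 and 0.4), illustrating how MPT models decompose one observed rate into separate latent processes.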
After several years of providing extensive e-banking services, customers' acceptance and use of e-banking systems are major competitive advantages for banks and other IT-based institutions. Reports indicate that a wide range of people remain reluctant to use e-banking services effectively. Accordingly, this research aims to improve users' attitudes and intentions to use e-banking in a transition economy. To achieve this aim, social cognition theory is employed within the framework of human-technology interaction as a novel approach to extend the technology acceptance model. To this effect, partial-least-squares structural equation modeling is applied to study survey data from five Iranian banks. The results show that users' innovativeness, compatibility, vicarious experience, and flexibility in electronic learning significantly influenced their attitude, but technophobia inversely influenced the intention to use e-banking. Regarding technology aspects, quality of functional structure was the sole factor that significantly influenced intention to use e-banking, while the influence of online interactive structure, innovative structure, and security structure on users' intention was not confirmed. We conclude that banks in most transition economies with similar socioeconomic conditions could influence users' attitudes and intentions toward e-banking by drawing from the insights of this study, thus enabling the banks to make more informed choices about investing in Internet banking.
This workshop aims at bringing together various disciplines to address the relationship between mentalizing or adopting the intentional stance towards robots and social attunement in human-robot interaction. The question will be tackled from the empirical, theoretical, computational, and philosophical approaches as well as potential applications in clinical domains. We invite speakers from the areas of cognitive and social neuroscience, psychology, computational modeling, cognitive science, human-robot interaction, robotics, and philosophy of mind, who address the question of conditions and consequences of attributing mental states to robots. We will also discuss individual differences in attitudes regarding robots, including variability in the likelihood of adopting the intentional stance towards artificial agents. The concluding discussion will focus on how empirical results influence the implementation of behavior in robots, and which application contexts should promote robot design that elicits mentalizing. In the discussion, we will also address ethical aspects related to evoking socio-cognitive mechanisms (including mentalizing) towards robots.
To explain how social cognition normally serves us in real life, we need to ask which factors contribute to specific social interactions. Recent accounts, most of them pluralistic models, have started incorporating contextual and social factors into explanations of social cognition. In this paper, I further motivate the importance of contextual and identity factors for social cognition. This paper presents scripts as an alternative resource in social cognition that can account for contextual and identity factors. Scripts are normative and context-sensitive knowledge structures that describe behavior in terms of corresponding events, situations, social roles, individuals, or mental-state types in a way that guides action. The script approach presented here builds on recent accounts of social cognition but points out important differences and possible advantages it has over them: for example, the script approach focuses even more strongly on context and identity.
Music consumption is shaped by both internal factors (e.g., mood, motivation) and external factors (e.g., activity, social environment), which together influence listeners' behavior (e.g., focus, song skips) and reactions (e.g., mood changes). While prior research has explored real-life or survey-based, context-aware music listening with limited available context information, we introduce a dataset comprising 216 music listening sessions collected in real-world settings through a custom-built Android mobile application designed to assess a wide range of contextual factors. The dataset captures static (e.g., activity, social environment, motivation) and dynamic (e.g., mood changes) contextual factors, along with music interaction data (e.g., skipped or fully listened songs), listening focus levels, and participant traits (e.g., demographics, music education, listening preferences, personality). Our analysis highlights key insights into how different contextual factors influence user behavior and mood, demonstrating significant differences in song skipping, focus levels, and genre diversity. We show that music listening sessions grouped by context differ significantly in terms of music listening behaviors (focus, skipping, and session genre diversity) and mood changes (happiness, sadness, stress, and energy). Furthermore, we explore the correlations between personality traits and listening behaviors (mean skip rate and genre diversity). Ultimately, our findings emphasize the importance of understanding context, as different situations lead to distinct music preferences and have varying impacts on user behavior and emotional responses.
Generative AI systems like chatbots are increasingly being introduced into learning, teaching and assessment scenarios at universities. While previous research suggests that users treat chatbots like humans, computer systems are still often perceived as less trustworthy, potentially impairing their usefulness in learning contexts. How are processes of social cognition applied to chatbots compared to humans? Our study focuses on the role of politeness in communication. We hypothesise that polite communication improves the perceived trustworthiness of chatbots. University students read a feedback dialogue between a student and a feedback provider. In a 2 × 2 between-subjects experimental design, we manipulated the feedback's author (chatbot vs. human teacher) and the feedback formulation (polite vs. direct). Participants evaluated the feedback giver on measures of epistemic trustworthiness (expertise, benevolence and integrity) and on two basic dimensions of social cognition, namely agency and communion. Results showed that a polite feedback giver was rated higher on benevolence and communion, whereas a direct feedback giver was rated higher on agency. Unexpectedly, the chatbot was rated lower on benevolence than the human. This suggests that social cognition does apply to interactions with chatbots, with caveats. We discuss the findings regarding the design of feedback chatbots and their use in higher education. What is already known about this topic: Technology users tend to treat computer systems like humans, but computers are usually trusted less. Polite communication, that is, the mitigation of face threats, is expected to enhance the evaluation of a chatbot as trustworthy. The research is relevant for the use and acceptance of chatbots as feedback providers in educational contexts. What this paper adds: We test the assumption that polite language reduces the gap in epistemic trustworthiness between chatbots and human teachers as feedback givers.
We describe an empirical study with 284 university student participants who report their perceptions of a feedback dialogue between a student and either a human teacher or a chatbot. We analyse the impact of feedback source as well as politeness on trustworthiness perceptions and social cognition. Implications for practice and/or policy: The study confirms that users are receptive to politeness in communication. They treat chatbots in a similar manner to human interaction partners. The results highlight the significance of the politeness of chatbots' language in learning contexts. Feedback chatbots need to be equipped with suitable linguistic strategies, such as politeness, for communicating in a socially appropriate manner at critical points in the instructional dialogue.
As society witnesses an increasing presence of robots in domains such as healthcare, education, and service industries, understanding user perceptions and acceptance becomes essential. This research investigates the connection between the perception of robot behavior and user experience, emphasizing the role of social characteristics in shaping perceptions. A sample of 240 participants (mean age 39) evaluated scenarios with non-anthropomorphic robots exhibiting different behaviors: one scenario where the robot displayed social behavior (social sensitivity, attention-sharing, and helping) and another where it did not. Insights from the literature underscore the importance of user experience, cultural differences, and prior exposure to robots in shaping attitudes. The present paper replicates the evidence that experience with robots impacts the perception of robots. The novel finding is that users with greater experience prefer robots that show social behavior. The experiments utilized the Mind Attribution Scale, Godspeed Scale, Robotic Social Attributes Scale, and a Prior Experience with Robots Questionnaire. ANCOVA analysis revealed a significant interaction between robot behavior and participants' experience on the perception of the robots. Results indicated that as participants' experience increased, robots with social behavior received higher ratings across all instruments, affirming the impact of personal experiences on shaping perceptions. The study contributes valuable insights into the dynamics of human-robot interaction, guiding the programming of robot behavior for enhanced user experience and societal acceptance in various domains where robots are increasingly present.
The elderly population often experiences a decline in social interaction and cognitive function, especially in institutional settings such as the Tresna Werdha Budi Pertiwi Social Home in Bandung. Residents frequently face isolation and reduced mental stimulation, which impairs their emotional well-being and independence. This study aims to design a board game that enhances social togetherness and improves cognitive ability among older women in the institution. Using a Design Thinking methodology, researchers engaged in five stages: empathize (through interviews and observations), define (via SWOT analysis and crossover), ideate (sketching concepts and mind mapping), prototype (creating physical mock-ups), and test (evaluating usability with real users). The final product, titled "ABC Brain Teaser," features bright, high-contrast visuals and simple but challenging rules tailored to older people’s preferences and cognitive capabilities. Testing revealed that participants found the game engaging, easy to use, and beneficial for memory stimulation and interaction. The board game effectively addressed the lack of communal activity and provided an inclusive tool for cognitive exercise. This research offers a replicable model for elderly care institutions seeking non-pharmacological interventions to enhance life quality, demonstrating that board games can be both therapeutic and socially empowering when designed with empathy and local context in mind.
Mutual understanding via sharing and interpreting inner states is socially rewarding. Prior research shows that people find Brain-Computer Interfaces (BCIs) a suitable tool to implicitly communicate their cognitive states. In this paper, we conduct an online survey (N=43) to identify design parameters for systems that implicitly share cognitive states. We achieve this by designing a research probe called "SpotlessMind" to artistically share brain occupancy with another while considering the bystanders' experience to elicit user responses. Our results show that 98% would like to see the installation. People would use it as a gesture of openness and as a communication mediator. Abstracting visual, auditory, and somatosensory depictions is a good trade-off between understandability and users' privacy protection. Our work supports designing engaging prototypes that promote empathy, cognitive awareness and convergence between individuals.
The rapid advancement of intelligent chatbots has transformed human-AI interaction, offering novel opportunities to enhance user experience (UX) through psychological and design interventions. However, the mechanisms by which chatbot design features influence UX remain understudied, particularly regarding the roles of emotional and cognitive mediators. This study employed a 2 × 2 within-subjects experimental design with 160 participants to investigate the effects of anthropomorphism (high vs. low) and perceived intelligence (high vs. low) in chatbot avatars on UX. Structural equation modeling (SEM) was utilized to analyze the mediating roles of perceived empathy and trust in this relationship. Direct effects of anthropomorphism and perceived intelligence on UX were nonsignificant. However, their combined influence was significantly mediated by perceived empathy and trust (β = 0.48, p < 0.01). Specifically, highly anthropomorphic avatars correlated with elevated empathy (β = 0.32) and trust (β = 0.27), which in turn improved UX. These findings underscore the importance of emotional engagement over mere intelligence in designing effective chatbots. This research contributes unique insights into the complex mechanisms governing user interactions with intelligent chatbots, emphasizing the need for design strategies that prioritize emotional connections and cognitive ease.
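The mediation logic described above (design feature affects a mediator such as empathy, which in turn affects UX) can be illustrated with a minimal indirect-effect computation. This is a hypothetical sketch on synthetic data with no direct path, not the study's SEM; the path coefficients below are invented for illustration:

```python
import random

def ols_slope(x, y):
    """Slope of a simple regression of y on x: cov(x, y) / var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var = sum((xi - mx) ** 2 for xi in x)
    return cov / var

rng = random.Random(42)
n = 5000
# Synthetic data: anthropomorphism (X) raises perceived empathy (M), which
# raises UX (Y); there is no direct X -> Y path, so simple regressions
# suffice for this illustration.
x = [rng.gauss(0, 1) for _ in range(n)]
m = [0.32 * xi + rng.gauss(0, 1) for xi in x]   # a path
y = [0.5 * mi + rng.gauss(0, 1) for mi in m]    # b path

a = ols_slope(x, m)       # effect of X on the mediator
b = ols_slope(m, y)       # effect of the mediator on Y
indirect = a * b          # mediated (indirect) effect of X on Y
total = ols_slope(x, y)   # with no direct path, total should match indirect
```

In a full SEM the direct path would be estimated jointly and the indirect effect tested (e.g., via bootstrapping), but the product-of-paths idea is the same.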
No abstract available
ABSTRACT Chatbots are increasingly employed to provide basic medical advice and medication guidance among other health information services. Despite their utility, many users feel a disconnect due to perceived lack of empathy in these systems, leading to resistance toward using chatbot services. Prior research in human–computer interaction has highlighted the significant role of empathy in enhancing user experience, yet it remains uncertain whether cognitive empathy and emotional empathy differ in their impact. Informed by the Computers as Social Actors (CASA) theory, this study conducted a between-subjects experiment to investigate how different empathy types in health-assistant chatbots influence user satisfaction and usage intention. Additionally, it examined the mediating role of social presence and the moderating role of gender. The findings revealed that emotional empathy significantly improved user satisfaction and intention to use compared to cognitive empathy, with no notable gender differences. Social presence partially mediated the relationship between the chatbot’s empathy type and user outcomes. These results not only enhance our understanding of empathy’s mechanisms and effects in human–computer interactions but also offer crucial insights for developing effective communication strategies in health-assistant chatbots.
Although smart device use among children is increasing, most interventions overlook their cognitive and emotional development or rely too heavily on external control. Such approaches often overlook the developmental needs of children for emotional regulation and autonomy. Therefore, this study aims to propose a child-centred user experience (UX) framework to support digital self-regulation in preschool-aged children. The proposed system integrates multiple psychological theories—including Piaget’s concept of animistic thinking, executive function theory, Self-Determination Theory, and Acceptance and Commitment Therapy—to support cognitive and emotional regulation during screen use. Key features include persistent visual cues to enhance time awareness and behavioural anticipation, narrative-based character interactions to foster empathy and agency, and ritualised closure routines supported by multimodal and tangible interaction elements. Developed as a mobile prototype, the system was iteratively refined through two-stage consultations with child and adolescent psychiatrists and a developmental psychologist, including formative design feedback and follow-up expert review. Their feedback provided preliminary validation of the system’s developmental validity and emotional coherence. These findings suggest that affectively attuned UX design is a viable alternative to conventional control-based screen-time interventions in early childhood.
In recent years, the development of internet technology and digital innovation has made mobile application services and platforms an indispensable part of daily life, and demand for augmented reality has grown accordingly. The year 2020 marked a significant turning point: according to international survey agencies and professional reports, people's daily lives have become inseparable from their phones, which they use anytime and anywhere, whether at work, in class, or during leisure time. Smartphones carry different types of augmented reality apps, mainly in the communication and social, map navigation, and gaming categories. When people use smartphones, they have moved beyond traditional phone-call functions and focus on social interaction and communication. In Taiwan, most smartphones carry a "social communication augmented reality app," of which Meta Spark, Line, and Snapchat are the three most used. The usability of the interface design and operational flow of these three apps, and users' experience satisfaction with them, are therefore critical. The development of digital services is redefining patterns of social behavior: ways of making new friends have shifted from traditional face-to-face meetings to virtual video meetings, and remote video via computer or smartphone is becoming the norm. People thus establish new modes of communication and exchange through application software. In augmented reality social communication applications, how quickly users reach satisfaction with the "facial effect interface" and "function editing" processes, and how well they recognize "image symbols," are the most important issues. The purpose of this study is to conduct user experience testing and cognitive psychology research on the three social communication applications Meta Spark, Line, and Snapchat.
Empathy is central to social interaction, yet how it is externally expressed in virtual reality (VR) communication remains underexplored. In this study, we examined how directionality-aware cues of empathy, such as mimicry, eye contact, and body proximity, relate to cognitive and emotional empathy. We designed high- and low-empathy scenarios and recruited participants with acting experience to ensure clear emotional expressions. Our findings indicate that facial mimicry patterns differ by empathy type: cognitive empathy involves subtle, speech-related muscle movements, whereas emotional empathy is associated with more intense affective expressions. Interestingly, we also found that while facial expressions and lower-body mimicry tend to emerge unconsciously, upper-body mimicry occurs more consciously, suggesting distinct pathways of empathic embodiment. We also observed that vocal intensity mimicry and pitch variability serve as important indicators of empathy, and a consistent hand approach is closely linked to empathy. Additionally, emotional empathy fosters longer eye contact, whereas cognitive empathy stabilizes gaze and head movements. Finally, we constructed machine learning models to predict empathy from these external expressions. Our best classifier achieved an accuracy of 0.756 for cognitive empathy and 0.704 for emotional empathy, indicating the feasibility of objective assessment. These findings provide a deeper understanding of how empathy is manifested in VR communication and support the development of empathy-aware virtual agents and training systems.
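The abstract's closing sentences describe machine learning models that classify empathy from external expressions. As a flavor of that pipeline, here is a minimal from-scratch logistic-regression sketch on synthetic cue features; the feature names, data, and accuracies are invented stand-ins, not the paper's actual model or results:

```python
import math
import random

def train_logistic(xs, ys, lr=0.1, epochs=500):
    """Batch gradient-descent logistic regression, no external libraries.

    Features are hypothetical stand-ins for cues discussed above
    (e.g., eye-contact duration, mimicry intensity).
    """
    w, b, n = [0.0] * len(xs[0]), 0.0, len(xs)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for x, label in zip(xs, ys):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = 1 / (1 + math.exp(-z)) - label  # prediction minus target
            for i, xi in enumerate(x):
                gw[i] += err * xi
            gb += err
        w = [wi - lr * gi / n for wi, gi in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def predict(w, b, x):
    # Decision boundary at probability 0.5, i.e., at z = 0.
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Synthetic cues: high-empathy interactions (label 1) show longer eye
# contact and stronger mimicry on average.
rng = random.Random(7)
xs, ys = [], []
for _ in range(400):
    label = 1 if rng.random() < 0.5 else 0
    center = 1.0 if label else -1.0
    xs.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
    ys.append(label)

w, b = train_logistic(xs, ys)
accuracy = sum(predict(w, b, x) == y for x, y in zip(xs, ys)) / len(xs)
```

A real system would of course use held-out evaluation and richer multimodal features (facial action units, gaze, pitch variability), but the supervised setup is the same.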
Abstract Information data visualisation is a new technology and trend emerging from the development of online information dissemination. This paper integrates cross-field knowledge from journalism and communication, cognitive psychology and computer science, conducts two information visualisation experiments on colour symbols and cognitive load under the guidance of dual coding theory, and finds that contrasting colour schemes and symbol markers with meaningful associations help the audience's cognition and memory of infographics. The errors caused by pie charts were much larger than those caused by line and bar charts, and bar charts caused much smaller errors than line charts. The cognitive load caused by pie charts is significantly higher than that of line and bar charts. Based on the visual representation of information visualisation, the information-processing principles of dual coding theory and the results of the two experiments, this paper proposes three data visualisation strategy principles for information dissemination. This study provides theoretical support for the information visualisation design of data visualisation products, and the conclusions and design methods can be applied to related information visualisation interfaces, offering both theoretical value and practical significance.
Communication technology plays a crucial role in facilitating remote collaborative work. This study investigated sex differences in Perceived Participation Equality and User Experience across different communication formats, i.e., face-to-face communication, conventional video conferences, and Virtual Reality (VR). An empirical study was conducted involving 15 groups, each comprising three participants, who engaged in a decision-making task. A research model was developed to evaluate the interplay between perceived participation equality, empathy, and immersion. This model was employed across three communication conditions and included both male and female participants. These findings on sex differences in user experience could help create a connected, cohesive, and productive remote collaborative work environment.
This theoretical paper explores empathy in the context of technologically mediated patient-provider communication, specifically within video- and telehealth consultations. Over the past few decades, empathy has been recognized as a vital component of high-quality patient care, though research has often prioritized its cognitive over its emotional dimensions. As healthcare increasingly embraces digital communication technologies, including video consultations, the dynamics of empathy in clinical encounters are altered. With this paper we explore the pertinent question: how do new digital communication modalities affect empathy and its different dimensions? To address this question, we move beyond clinical and applied empathy frameworks, instead integrating insights from two related philosophical traditions: first, the classical phenomenological understanding of empathy (represented primarily by Edith Stein) as embodied intuition; second, the postphenomenological philosophy of technology, represented by Don Ihde and not least inspired by Maurice Merleau-Ponty’s phenomenology of embodiment. We apply these theoretical frameworks to empirical analyses of video consultations in general practice and telemedical encounters between chronic obstructive pulmonary disease (COPD) patients and specialist telenurses. Our analysis demonstrates that even though video consultations do not allow for the same level of “fine-tuned” body-mediated sensory input, a whole-body empathetic experience can nevertheless be established through (1) the audio-visual sensory impressions mediated by the technology, (2) our whole-body interpretations of this information, and (3) our shared experiences of a lifeworld that we actively orient ourselves towards. These experiences may lead to empathetic communication and helping actions that draw on emotional, intuitive, and cognitive dimensions in a holistic manner.
Combining theoretical insights from phenomenology and postphenomenology with empirical telehealth analyses, we demonstrate how empathy is both reconfigured through technological mediation and sustained as an embodied, intersubjective practice. We thus conclude that empathetic care practices can be established in technologically mediated encounters through bodily intentionality, where our bodies and minds are unified in understanding and connecting with other persons, even though we are not in the same physical space. We propose a theoretical bridge connecting classical phenomenology and postphenomenology in the context of empathy in technologically mediated patient-provider communication. This bridge is grounded in Merleau-Ponty’s conception of whole-body perception and the lifeworld, whether encountered through physical proximity or digital interaction.
As people today use information products in contexts with distractions, we need to design for people’s attention. User experience design routinely relies on behavioral design to engage distracted users and nudge them toward specific behavior. Although practiced in user experience design, behavioral design is less known in technical communication. In this article, we use the CHOICES (Context, Habits, Other people, Incentives, Congruence, Emotions, and Salience) framework developed by McKinsey’s Behavioral Lab to introduce students to behavioral design principles that make use of cognitive biases to influence people. We maintain that behavioral design is useful for technical communicators because they create digital assets that are part of the user experience.
Meeting the needs of users requires an understanding of the contexts where they interact with materials. This entry presents an approach for integrating script theory into usability to develop medical materials individuals can use in the settings where they receive or perform healthcare activities. The entry introduces technical communication professionals to script theory and presents mechanisms for using script theory to research patient expectations of, and to present usable materials for, health and medical contexts.
In the contemporary metaverse landscape, comprehending the intricacies of human interaction is imperative for enhancing communication within Virtual Reality (VR) experiences. At the core of meaningful social relationships lie empathy and trust, pivotal elements nurtured by the capacity to comprehend both one’s own and others’ thoughts and intentions. Conventional face-to-face interactions heavily rely on non-verbal cues, such as body language and facial expressions, to convey messages and display empathy. To investigate the relationship between emotional simulation in VR and empathetic skills, in this paper we conducted a user study involving 37 participants, who were asked to simulate facial expressions in a virtual environment. To capture their facial behavior, we employed the Meta Quest Pro, a virtual reality headset featuring accurate built-in sensors for capturing 63 micro-expressions according to the Facial Action Coding System (FACS) [11]. Furthermore, the Interpersonal Reactivity Index (IRI) questionnaire [9] was used to assess the participants’ empathetic abilities. The results of this study underscore a statistically significant correlation between participants’ empathetic skills and their capacity to simulate emotions through facial expressions within VR scenarios. Additionally, this research offers valuable insights into the prevalence of human micro-expressions during the simulation of seven distinct emotions. These findings lay the foundation for potential applications in mental health and emotional well-being within the context of the metaverse.
Face-to-face communication relies extensively on non-verbal cues (NVCs), which complement, or at times dominate, the communicative process as they convey emotions with intense salience, thus definitively affecting interpersonal communication. The capture, transference, and subsequent interpretation of NVCs become complicated in computer-mediated communicative processes, particularly in shared virtual worlds, for which there is growing interest both in regard to NVCs' technological integration and their affective impact. This paper presents a between-groups experimental setup facilitated in immersive virtual reality (IVR) and examines NVCs' effects on user experience, with special emphasis on degree of attention toward each NVC as an isolated controlled variable of a scripted performance by a virtual character (VC). This study aims to evaluate NVC fidelity based on the capabilities of the motion-capture technologies utilized, to address cue-integration development challenges, and to examine NVCs' impact on users' perceived realism of the VC, their empathy toward him, and the degree of social presence experienced. To meet the objectives set, the affective impact of low-fidelity automated NVCs and high-fidelity real-time captured NVCs was compared. The findings of the evaluation suggest that although NVCs do impact user experience to an extent, their effects are notably more subtle than in previous studies.
In human-centered product design and development, understanding the users is essential. Empathizing with the user can help designers gain deeper insights into the user experience and user needs. However, how to capture real-time empathy during user interaction, and the degree to which empathy enhances user understanding, remain unclear. To narrow this gap, this study explores the use of facial expression recognition during a videotaped user interview as a means of capturing empathy. Mimicry and synchrony have been shown to be predictors of empathy in cognitive psychology. In this study, we adapt this method to 46 user-designer interviews. The results show that the user and designer exhibit mimicry in their facial expressions, which indicates that affective empathy can be captured via facial recognition. However, we find that the user’s facial expressions might not represent their actual emotional tone, which can mislead the designer. Further, we do not find a link between the observed mimicry of facial expressions and the understanding of mental contents, which hints that the affective and some cognitive parts of user empathy may not be directly connected. Further studies are needed to understand how facial expression analysis can be used to study and advance empathic design.
No abstract available
No abstract available
No abstract available
This study investigates the influence of communication style similarity between streamers and viewers on purchase intention within the framework of similarity attraction theory and cognitive-emotional system theory. Live marketing, utilizing online streamers for real-time interaction with consumers, has become a prominent sales strategy. A quantitative approach was employed, using questionnaire data collected from live marketing audiences. The survey measured communication style preferences of both viewers and streamers, along with viewers' perceived level of quasi-social interaction during the live stream, immersive experience, and purchase intention. The research demonstrates that when a streamer's communication style aligns with a viewer's preference, viewers perceive a stronger sense of quasi-social interaction. This heightened sense of connection fosters a more immersive live streaming experience, ultimately leading to a greater purchase intention. Furthermore, the study reveals that viewers with a higher need for cognitive closure (the desire to minimize ambiguity) experience an amplified effect of both communication style similarity and immersive experience on their purchase intention. This research contributes to the evolving body of knowledge on live marketing communication. By highlighting the importance of communication style matching between streamers and viewers, the findings offer valuable guidance to live streaming platforms and companies. Tailoring streamer communication styles to align with target audience preferences can enhance audience engagement, create a more immersive experience, and ultimately drive higher conversion rates.
When individuals experience stressful situations, it is common to disclose personal challenges and stress and seek social support on social media platforms. To better understand and improve social support in online environments, this study uses Reddit as a case study to examine how people navigate support-seeking online during stressful times. We conducted interviews with 16 individuals who used Reddit to share stress and seek support, exploring their motivations and outcomes of their posts and the challenges encountered. Findings reveal that users typically seek emotional, informational, network, and esteem support online and receive the desired support. Beyond immediate support receipt, online support interactions may have sustained positive impacts, such as mindset and social practice changes, the formation of offline friendships, and contributions to personal resilience in navigating stressful events. However, a mismatch between expectations and received responses, along with the inherent barriers of online communities, presents challenges. We discuss Reddit's affordances and limitations for stress disclosure and support exchange within the broader social media ecosystem, and then propose design implications to enhance online support exchange, such as fostering clearer communication, increasing support engagement, and community empathy.
Abstract Chatbots enhanced by VR can deliver rich social cues through both verbal and non-verbal communication. While existing research emphasizes verbal factors and visual anthropomorphism, systematic exploration of body movements remains limited. This study proposes the Interactive-Cheerful-Empathic (ICE) Movements Framework, mapping body movements to three psychological needs: autonomy, competence, and relatedness. We developed a VR chatbot (Hilie) with four movement modes (interactive, cheerful, empathic, and no movement) and conducted a single-factor within-subjects experiment involving 56 university students. Quantitative and qualitative results revealed that chatbots with body movements—particularly cheerful movements—significantly enhanced users’ self-disclosure willingness, satisfaction, trust, and intention to use compared to static counterparts. The ICE framework effectively addresses multi-level psychological needs through coordinated movements. This work pioneers the operationalization of self-determination theory in chatbot design, providing theoretical models and practical guidelines for developing highly anthropomorphic chatbots, while advancing optimization strategies for online mental health services.
Empathy skills are required for understanding and responding to others’ emotions. Traditional instruments for testing a person’s empathy type lack immersion and suffer from recall bias. In this development study, we presented the technical foundations of the Virtual Reality Empathy Test (VRET), an immersive virtual reality (IVR) platform that tests users’ empathy type through interactive scenarios and related questions. The questions are based on the Multidimensional Empathy Scale for Adolescents. VRET determines the user’s empathy type along three dimensions of empathy: cognitive-affective, positive-negative, and majority-minority. Furthermore, we presented the results of a usability and user experience evaluation of VRET in which 99 Korean adolescents (47 females, 49 males, and three others, ages 15–17) answered a mixed-method post-experiment questionnaire. The results indicated good overall usability, with issues in complex technology setup, virtual characters, scenarios, interactions, and cybersickness. Moreover, we identified several affordances, such as the ability to test one’s empathy type, superiority over paper-based tests, high immersion, realism, situatedness, fun, interest, and novelty. Furthermore, we conducted a technical performance evaluation, which showed steady performance with a mean frame rate of about 90 frames per second. The results of this study contribute to the growing body of literature on the human-computer interaction aspects of utilizing IVR in psychology. The architectural design of VRET opens opportunities for the collection of multimodal data and the utilization of immersive scenario-based content in other application areas.
Task analysis, a methodology for iterative problem-solving, offers human factors and user experience (UX) practitioners a powerful lens to enhance functionality and inclusivity in their solutions. Practitioners combine core principles of human behavior and needs with the characteristics of activities to enable greater accessibility and engagement across contexts. By decomposing daily activities into their more granular physical, cognitive, sensory, and social components, human factors, ergonomics, safety, and user experience practitioners may gain a deeper understanding of the diverse ways people interact with spaces and objects. Task analysis addresses practical user needs and ensures environments support safe, accessible engagement through the dynamic interaction among user, environment, and activity. Integrating task analysis allows for more intentionality in layout, lighting, material selection, and accessibility features to accommodate diverse abilities. Task analysis inspires us all to challenge how we might transform the built environment into one that promotes independence and well-being and serves all people.
In debating who takes responsibility for adolescents’ online activity, expectations are that a multi-systemic approach is needed. In this paper, the voices of 11–18-year-olds, teachers, and mental health practitioners in focus group conversations were analysed using thematic analysis. Results indicated that young people demonstrated empathy in situ during data collection. However, when reporting on conversations in digital spaces, they complained of a lack of empathy from others, noting that bullying and trolling were problematic. We propose the novel use of an intentional digital cognitive interruption to support empathic posting. The intention is for this to act as a catalyst for young users to consider their responses before posting by providing a momentary disruption to the fast flow of online interaction. We invite further conversation about supporting adolescents’ digital empathy in online spaces. Impact summary: Prior state of knowledge: adolescents’ lives are inseparable from digital technology, which has created a moral concern about their online conduct in relation to responsibility, empathy, kindness, and care (conceptualised as a digital ethics of care within a digital citizenship framework). Novel contributions: building on previous work on digital ethics and morality, we introduce the cognitive interruption as a solution to instant thoughtless posting that lacks empathy, and offer a novel model called STEP (Stop, Think, Empathise, Privacy). Practical implications: the application of STEP is valuable to anyone concerned with encouraging adolescents to be more empathically aware of their digital communication and conduct, including parents, teachers, technology companies, and young people themselves.
This study examines user experience evolution across three repeated interactions with an on-screen NAO robot designed to express artificial empathy through verbal communication and music. The participant numbers across the three interactions were N1 = 139, N2 = 129, and N3 = 121, respectively, with 121 participants completing all sessions. During interaction, the robot gave empathic feedback and/or played music to the participant as a token of empathy. Repeated measures MANCOVA and Structural Equation Modeling revealed that initial bonding tendencies and perceptions of the robot trying to be empathetic faded over time. In their place, a tendency emerged for the robot to become more personally relevant and, remarkably, its design appeared to become more realistic, like a human being. When the robot merely tried empathetic conversation or just played music, participants were disappointed by its capabilities, visible in increased levels of negative valence. Bonding and perceived empathy flourished when the robot played music while talking empathically in chorus, a mutual reinforcement effect. At first, for the loneliest individuals, the mere presence of the robot, rather than its empathic behaviors, was more influential in determining the robot’s relevance to their concerns. These results underscore the importance of a multimodal approach in designing empathic robots.
First-person perspective taking presented in head-mounted displays makes them a compelling interface for experiencing empathy toward other people. Since some intercultural misunderstandings stem from ethnocentrism, it is worth considering the possibilities VR experiences offer to explain behaviors toward out-groups and induce empathetic actions. In this paper we present the design process of ethnoVR, a 7-minute 360-degree film that allows taking the perspective of two students, one Chinese and one Polish, who face a problem in communicating efficiently. The scenario was created using the critical incident technique, adopting a user-centered design paradigm.
To address the issue of user empathy throughout the emotional experience process, this study presents a fuzzy-FMEA-based method to evaluate the efficacy of cultural empathy evocation. The method focuses on symbolic culture and creative products, constructing an evaluation index system and decision-making framework for cultural empathy evocation. It utilizes thematic analysis to discover and categorize the factors that influence cultural empathy, and uses the evaluation index system to improve the Failure Mode and Effects Analysis (FMEA) framework. It effectively addresses the limitations of traditional FMEA, such as single weighting and uncertainty. According to the assessment report, cognitive association failure and scenario restoration failure are significant risk factors for cultural empathy-evoking failure. This study’s findings provide designers with realistic proposals for thematic symbolic imagery and serialized design forms, as well as scientific assessment tools and decision-making resources for cultural industries and policymakers.
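Classical FMEA ranks failure modes by the product of severity, occurrence, and detection ratings; fuzzy variants of the kind this abstract describes soften crisp expert ratings and add factor weights. A minimal sketch, assuming triangular fuzzy ratings and centroid defuzzification (the weighting scheme and function names are illustrative choices, not the paper's exact formulation):

```python
def centroid(tfn):
    """Defuzzify a triangular fuzzy number given as (low, mode, high)."""
    low, mode, high = tfn
    return (low + mode + high) / 3.0

def fuzzy_risk_priority(ratings, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted fuzzy risk score for one failure mode.

    ratings: dict with triangular fuzzy numbers for the three FMEA
    factors 'S' (severity), 'O' (occurrence), 'D' (detection).
    weights: relative importance of each factor, addressing the
    classical RPN's insensitivity to factor weighting."""
    s = centroid(ratings["S"])
    o = centroid(ratings["O"])
    d = centroid(ratings["D"])
    ws, wo, wd = weights
    # weighted geometric mean keeps the result on the 1-10 rating scale
    return (s ** ws) * (o ** wo) * (d ** wd)
```

Failure modes such as "cognitive association failure" would then be ranked by this score across expert panels.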
Self-tracking aims to increase awareness, decrease undesired behaviors, and ultimately lead towards a healthier lifestyle. However, inappropriate communication of self-tracking results might cause the opposite effect. Subtle self-tracking feedback is an alternative that can be provided with the aid of an artificial agent representing the self. Hence, we propose a wearable pet that reflects the user’s affective states through visual and haptic feedback. By eliciting empathy and fostering helping behaviors towards it, users would indirectly help themselves. A wearable prototype was built, and three user studies were performed to evaluate the appropriateness of the proposed affective representations. Visual representations using facial and body cues were clear for valence and less clear for arousal. Haptic interoceptive patterns emulating heart-rate levels matched the desired feedback urgency levels with a saturation frequency. The integrated visuo-haptic representations matched participants’ own affective experience. From the results, we derived three design guidelines for future robot-mirroring wearable systems: physical embodiment, interoceptive feedback, and customization.
Subtle environmental stimuli, such as micro-animations, color hierarchy imbalances, and low-level background noise, can unconsciously shape user attention and cognitive load. Despite their ubiquity in digital interfaces, these non-salient factors remain underexplored in current Human-Computer Interaction research. This study employs literature review and case analysis to examine neural correlates (e.g., frontal alpha oscillations), behavioral indicators (e.g., mouse trajectories, reaction times), and subjective self-reports (e.g., mind-wandering). The paper outlines multi-dimensional pathways through which attention is subtly eroded. It introduces the dynamic interaction mechanism among environment, user, and task, and explains the path by which non-salient factors influence user attention. The review further explores adaptable measures in interface design, providing strategies for both general users and vulnerable groups such as individuals with ADHD and the elderly. This study provides theoretical guidance and methodological reference for Human-Computer Interaction design.
This study investigates how different interface layouts affect user interaction efficiency and visual processing mechanisms, with a particular focus on the moderating role of age differences. A total of 40 participants were recruited, including 20 younger adults (aged 22–27) and 20 older adults (aged 52–58), who completed three types of tasks—target localization, status confirmation, and functional operation—across four typical electric vehicle (EV) charging application layouts (L1–L4). Eye-tracking techniques were employed to capture average fixation duration, saccade count, fixation-to-saccade ratio, and gaze heatmaps, while task completion time, accuracy, and subjective preferences were analyzed in parallel. The results show that L1 (Bottom Navigation with Top Functional Area) yielded the best performance in both task efficiency and visual processing, with minimal performance differences across age groups. In contrast, L3 (Bottom Navigation with Two-Level Hierarchical Structure) significantly impaired interaction efficiency and increased cognitive load, particularly for older adults. Eye-tracking metrics aligned closely with behavioral data and user preferences, underscoring the critical role of layout structure in shaping attentional allocation and interaction pathways. These findings highlight the importance of adopting intuitive and flat layout designs to enhance universality and usability in cross-age mobile application interfaces.
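One of the eye-tracking metrics above, the fixation-to-saccade ratio, is computed as total fixation time divided by total saccade time per trial; higher values are commonly read as deeper information processing relative to visual search. A minimal sketch, assuming an event list already segmented by the tracker (the data format is a hypothetical simplification of a real eye-tracker export):

```python
def fixation_to_saccade_ratio(events):
    """Total fixation duration over total saccade duration.

    events: list of (kind, duration_ms) tuples, where kind is
    either 'fixation' or 'saccade'."""
    fix = sum(d for kind, d in events if kind == "fixation")
    sac = sum(d for kind, d in events if kind == "saccade")
    if sac == 0:
        raise ValueError("no saccades recorded in this trial")
    return fix / sac
```

Layouts that impair scanning (such as the two-level hierarchical structure in L3) would show up as a lower ratio alongside longer completion times.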
No abstract available
Social media platforms are increasing user engagement, with attractive user interface (UI) design playing a crucial role in maintaining user involvement. Incorporating Emotional Intelligence (EI) into the UI is transforming user interaction and engagement patterns significantly. With an emphasis on how design decisions affect user behaviour, this study explores the nuanced function of Emotionally Intelligent User Interfaces (EIUIs) in the context of social media. It specifically draws attention to the dangers of manipulative techniques like "dark patterns" (purposeful interface strategies intended to influence users to take actions they might not have otherwise taken) as well as the possibility of constructive involvement. By investigating these practices, the paper hopes to clarify the wider ethical consequences of interface design as well as the duty of platforms and designers to promote transparency, autonomy, and confidence in digital interactions. This paper aims to contribute to the discourse on user interface design by providing a comprehensive understanding of EIUIs' potential and pitfalls in the context of social media interactions. Additionally, this study focuses on identifying EIUI design patterns that emotionally enhance user engagement and shape decision-making, and on the impact of these design patterns on user experience.
The paper investigates engineering-psychological aspects of applying animation in user interfaces of robotic systems; considers various forms of representing animated elements in interfaces, their characteristics, and their impact on the user’s cognitive-emotional, perceptual, and executive spheres; and shows how memory, attention, thinking, and decision-making mechanisms function under the influence of animation. The authors analyse factors affecting the perception of animation, including its intensity and context of use; provide a classification and examples of animation application in graphical user interfaces; and offer general recommendations for developers of user interfaces for robotic systems that meet usability requirements. The work shows that animation is an effective means of improving the quality of human-machine interaction in robotic systems. The paper concludes that animation has a significant impact on users’ emotional state: it can cause joy, calmness, surprise, or dislike depending on its nature and context of use. Proper use of animation contributes to improving users’ mood and reducing their stress.
Conversational search systems increasingly provide source citations, yet how citation or source presentation formats influence user engagement remains unclear. We conducted a crowdsourcing user experiment with 394 participants comparing four source presentation designs that varied citation visibility and accessibility: collapsible lists, hover cards, footer lists, and aligned sidebars. High-visibility interfaces generated more hovering on sources, though clicking remained infrequent across all conditions. While interface design showed limited effects on user experience and perception measures, it significantly influenced knowledge, interest, and agreement changes. High-visibility interfaces initially reduced knowledge gain and interest, but positive effects emerged with increasing source usage. The sidebar condition uniquely increased agreement change. Our findings demonstrate that source presentation alone may not enhance engagement and can even reduce it when insufficient sources are provided.
This paper looks into how user interface (UI) design can influence users' attention span and cognitive load in digital tasks. As digital platforms become more feature-rich, poor design choices can easily lead to overload and slow users down. This project builds on principles from Human-Computer Interaction (HCI), such as Cognitive Load Theory, Hick's Law, and Fitts's Law, to see how minimalist design and clear feedback might help people stay focused. While earlier studies have explored minimalism and gamification, few have compared them side by side in the same setting. To fill this gap, we combined a literature review with a small experiment testing a minimalist and a gamified UI. Participants did the same tasks on both versions, and we recorded task time, workload scores, focus scores, and error rate. Results showed that while focus and workload were similar for both designs, the minimalist version helped people finish tasks faster. This implies that while gamified features can keep people engaged, a minimal UI may work better when speed and focus are important.
Social media, through the implicit design of information architecture, has constructed a consumption guidance system centered on attention capture. Its mechanism not only involves the surface structure of information presentation, but also reshapes users' consumption cognitive patterns through the deep penetration of emotional connection, value metaphor and social identity. This implicit control breaks through the explicit intervention framework of traditional advertising and accomplishes the implantation and reinforcement of consumption values in the user's unconscious state. This article reveals from three dimensions of information architecture: interaction logic, emotional coding, and value recognition, how social media achieves systematic capture of consumer attention through the synergistic effect of technical means and psychological mechanisms.
Waiting for system loading is a common scenario that often diminishes user experience, leading to dissatisfaction. Well-established visual indicators like progress bars cannot be directly applied to interactions with voice assistants (VAs) like Siri. As VAs continue to rise in popularity, this research explores the design of auditory indicators, particularly human speech, for optimizing waiting experiences in Voice User Interfaces (VUIs). We first organized focus groups (N=35) to identify design considerations for speech indicators, uncovering design opportunities in integrating explanations and humor. Subsequently, we conducted an empirical study (N=30) to evaluate the effects of speech indicators with two levels of explanation and humor on the waiting experience, measured by attention, perceived time, pleasure, and overall satisfaction, during both short and long loading durations. Our findings suggest significant potential for incorporating explanations and humor into VUIs, offering actionable insights for designing effective speech indicators that improve waiting experiences.
The study investigates the effects of a peer-matching interface for collaborative brainstorming on cognitive flexibility, maintenance of attention, and the quality of the idea generation process in general. We developed an original, adaptive algorithm that pairs participants with complementary cognitive and interaction profiles in order to maximize creative synergy in a remote collaboration setting. The research involved 120 subjects and combined quantitative measures of cognitive performance with qualitative measures of ideation output. Findings demonstrate that the peer-matching interface produced a significant increase in cognitive flexibility, enhanced sustained attention, and yielded a greater number of original and well-developed ideas relative to traditional brainstorming formats. Academic networking sites like LinkedIn and ResearchGate played a significant role in recruiting participants and sustaining interaction after the sessions, but presented issues linked to user diversity, engagement, and real-time collaboration. As lead researcher, I developed the peer-matching theoretical framework, designed the experiment, coordinated participants via academic platforms, and led the analysis and interpretation of the data. The findings offer valuable insight into designing smart collaboration tools in both educational and professional innovation settings.
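The adaptive pairing described above (matching participants with complementary profiles) can be sketched as a greedy algorithm that repeatedly pairs the two most dissimilar remaining profiles. This is a hypothetical stand-in, not the study's actual algorithm; the function name, profile format, and distance metric are illustrative assumptions:

```python
from itertools import combinations

def pair_by_complementarity(profiles):
    """Greedily pair participants so each new pair maximizes the
    Euclidean distance between their trait-score vectors.

    profiles: dict mapping participant id -> tuple of trait scores.
    Returns a list of id pairs; one participant is left unpaired
    when the count is odd."""
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    unpaired = set(profiles)
    pairs = []
    while len(unpaired) >= 2:
        best = max(combinations(unpaired, 2),
                   key=lambda p: distance(profiles[p[0]], profiles[p[1]]))
        pairs.append(best)
        unpaired -= set(best)
    return pairs
```

Greedy maximum-dissimilarity pairing is a simple baseline; an adaptive system could additionally reweight traits between sessions based on observed ideation quality.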
As autonomous personal mobility vehicles (APMVs) are increasingly integrated into shared spaces, short-distance interactions between pedestrians and APMVs will become more frequent. To facilitate communication in shared spaces, APMVs are equipped with external human-machine interfaces (eHMIs). Although the eHMI is primarily designed to communicate with pedestrians, its communication also affects the APMV passenger due to the short-distance interaction. This paper focuses on the effect of passengers’ personality traits on their user experience when the APMV exhibits different eHMIs. An experiment was conducted in the field with 24 participants as APMV passengers who experienced three distinct eHMI types: eHMI-T (text-based), eHMI-NV (neutral voice-based), and eHMI-AV (affective voice-based). Through causal discovery analysis, our findings revealed that when the APMV is equipped with eHMI-T, various personality traits of passengers collectively influenced their user experience. In contrast, with the eHMI-NV design, personality traits had no direct influence on user experience. With the eHMI-AV design, agreeableness and extraversion negatively influenced concerns about drawing attention, which subsequently affected other aspects of user experience. Based on these results, this paper recommends designing different eHMIs according to APMV ownership type, such as private or publicly shared APMVs.
BACKGROUND: Embodied conversational agents (ECA) are possible enablers of assistive technologies, in particular for older adults with cognitive impairment. Yet, dedicated interaction management techniques addressing the specificities of this user group are needed. OBJECTIVES: We assess whether the interaction management framework of the LOUISE (Lovely User Interface for Servicing Elders) ECA has the potential to overcome the user interface constraints linked to cognitive impairment. METHODS: LOUISE supports key target-specific features: personalization; attention management; context reminders; image and video displays; a conversation manager for task-oriented interactions; and the foundations for a domain-specific XML-based language for task-oriented assistive scenarios. LOUISE’s usability and acceptance were evaluated at the Broca geriatric hospital in Paris, with a group of 14 older adults with either mild cognitive impairment (MCI) or Alzheimer’s disease (AD), through four simple but realistic assistive scenarios: drinking, taking medicine, measuring blood pressure, and choosing the lunch menu. RESULTS: Most of our participants were able to interact with the ECA, succeeded in completing the proposed tasks, and enjoyed our design. CONCLUSION: The field usability evaluation of LOUISE’s interaction management framework suggests that this suite of interaction techniques can be effective in enabling interfaces for users with MCI or AD.
The wide variety of services and data available on the internet can make people's lives easier, increasing access to information and turning once-complicated services into more practical ones. However, using computers can be difficult for some people due to issues related to usability or accessibility, or to fear or anxiety while using computers. When this anxiety reaches high levels, it manifests as what is known as Computer Anxiety (CA). People with Computer Anxiety (PwCA) may face problems when using computers at home, at work, or for study purposes, resulting in multiple forms of barriers even before the actual interaction with a computer. In this context, an eye-tracking field study was performed with 39 elderly participants interacting with a website, aiming to identify user interface elements that negatively impact task performance and user experience for people with CA. Moreover, an initial exploratory study was performed on the feasibility of creating a classifier for identifying sessions related to people with CA. Results show that certain user interface elements (e.g., carousels and maps) might negatively impact task performance and user experience for PwCA, due to information overload and salient objects calling users' attention. Moreover, a classification model using Random Forest reached an accuracy of 84.8%. Based on these results, one expects that personalized systems could use classification algorithms to identify sessions from PwCA and then simplify user interfaces based on different levels of CA.
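The session-classification idea in this abstract can be illustrated in a few lines. The sketch below is not the authors' pipeline: the eye-tracking feature set, the synthetic data, and the labels are all assumptions chosen for demonstration; only the choice of a Random Forest follows the abstract.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 400
# Hypothetical per-session eye-tracking features: fixation count, mean
# fixation duration, saccade rate, revisits to salient elements
# (e.g., carousel, maps), and total task time.
X = rng.normal(size=(n, 5))
# Synthetic labels: here CA sessions correlate with longer fixations and
# more revisits -- purely illustrative, not the study's real signal.
y = (0.8 * X[:, 1] + 0.9 * X[:, 3] + rng.normal(scale=0.5, size=n) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
acc = accuracy_score(y_te, clf.predict(X_te))
print(f"held-out accuracy: {acc:.3f}")
```

A trained model like this could then gate interface simplification, as the abstract suggests, by scoring new sessions online.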
User psychological characteristic analysis provides personalized solutions for merchants to meet user demands, thereby enhancing user experience and satisfaction. Traditional methods often rely on semantic information extracted from text. However, these approaches overlook the complex interaction among content text, psychological features, and syntactic structure. This paper proposes a Psychological Feature Activated Dual-Perspective Heterogeneous Graph Neural Network (PAPGNN). First, feature words are generated and activated by leveraging large language models and psychological prior knowledge, effectively distinguishing different personality dimensions. Then, a dual-perspective heterogeneous network is introduced to capture both semantic associations and syntactic dependencies from user reviews. Finally, a graph convolutional network with differential regularization and an attention mechanism is adopted to fuse and align multi-dimensional information. Experiments on the Entertainment and Life datasets demonstrate the effectiveness of the proposed method.
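The dual-perspective idea — propagating features over a semantic graph and a syntactic graph, then fusing the two views — can be sketched with a single standard GCN step per view. This is a toy illustration of the architecture's shape, not PAPGNN itself; the graphs, dimensions, and the simple softmax gate are assumptions.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN propagation: symmetrically normalised adjacency, then ReLU."""
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return np.maximum(d_inv_sqrt @ A_hat @ d_inv_sqrt @ H @ W, 0.0)

# Toy review graph: 4 word nodes with separate semantic and syntactic edges.
A_sem = np.array([[0,1,0,1],[1,0,1,0],[0,1,0,0],[1,0,0,0]], float)
A_syn = np.array([[0,1,1,0],[1,0,0,1],[1,0,0,0],[0,1,0,0]], float)
H = np.random.default_rng(1).normal(size=(4, 8))   # node features
W = np.random.default_rng(2).normal(size=(8, 8))   # shared layer weights

H_sem = gcn_layer(A_sem, H, W)                     # semantic perspective
H_syn = gcn_layer(A_syn, H, W)                     # syntactic perspective

# Attention-style fusion of the two perspectives (illustrative softmax gate;
# the paper's mechanism also includes differential regularization).
scores = np.array([H_sem.mean(), H_syn.mean()])
w = np.exp(scores) / np.exp(scores).sum()
H_fused = w[0] * H_sem + w[1] * H_syn
print(H_fused.shape)
```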
As attention to mental health issues continues to grow, emotional healing chatbots—acting as a new type of solution—are experiencing explosive growth. Clarifying the technology affordances of these chatbots is crucial for their large-scale implementation; however, existing studies still lack in-depth analysis of their key affordance dimensions and the mechanisms through which these dimensions influence user interaction satisfaction. This study employs the BERTopic model to identify the dimensions of affordance that influence user satisfaction. Subsequently, by testing and comparing six machine learning algorithms, the optimal model for predicting factors influencing user satisfaction was selected. The results show that the XGBoost model is the most effective among all the models with 92.66% accuracy, 91.69% precision, 91.39% recall, and 97.68% AUC value. To further analyze the influence mechanism, the study used the interpretable Shapley Additive exPlanations (SHAP) model, ultimately confirming that the key affordance dimensions influencing user satisfaction are: comfort, memory storage, imagination, healing, communication, regulation, companionship, memory association, humor, vocal appeal, and empathy. The findings of this study provide theoretical support and practical insights for optimizing the service quality of emotional healing chatbots and enhancing users' mental health and well-being.
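The predict-then-explain pipeline described above — a boosted-tree model over affordance-dimension features, followed by an importance analysis — can be sketched as follows. This uses scikit-learn's GradientBoostingClassifier as a stand-in for XGBoost and permutation importance in place of SHAP; the feature names (a subset of the dimensions the study identifies) and the synthetic data are illustrative.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
features = ["comfort", "memory_storage", "humor", "empathy", "companionship"]
X = rng.normal(size=(500, len(features)))
# Synthetic target: satisfaction driven mainly by comfort and empathy
# (an assumption for the demo, not the study's finding about ordering).
y = (1.2 * X[:, 0] + 1.0 * X[:, 3] + rng.normal(scale=0.6, size=500) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
# Rank affordance dimensions by their contribution to held-out performance.
imp = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
ranking = sorted(zip(features, imp.importances_mean), key=lambda t: -t[1])
print(ranking[0][0])  # strongest dimension in the synthetic data
```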
The article examines the influence of gamification in digital platform interfaces on the transformation of users' value functions in the context of the attention economy. The theoretical basis is the concept of attention as a scarce resource (M. A. Milkova, A. Sh. Tkhostov), as well as behavioral and neuroeconomic models of utility assessment (V. A. Klyucharev, I. A. Vasiliev). The article analyses gamification mechanisms (points, badges, levels, leaderboards) and their impact on users' motivational attitudes, cognitive strategies, and behavioral patterns. It is shown that gamification contributes to a shift from intrinsic to extrinsic motivation, enhances short-term involvement, and restructures the value function towards a preference for symbolic rewards. The risks of digital addiction, the overjustification effect, and a reduced depth of cognitive information processing are noted. The article concludes that gamification practices in interface design must be applied ethically and adaptively, taking into account the cognitive and motivational characteristics of users.
Each year, multi-modal interaction continues to grow within both industry and academia. However, researchers have yet to fully explore the impact of multi-modal systems on learning and memory retention. This research investigates how combining gaze-based controls with gesture navigation affects information retention when compared to standard track-pad usage. A total of twelve participants read four textual articles through two different user interfaces which included a track-pad and a multi-modal interface that tracked eye movements and hand gestures for scrolling, zooming, and revealing content. Participants underwent two assessment sessions that measured their information retention immediately and after a twenty-four hour period along with the NASA-TLX workload evaluation and the System Usability Scale assessment. The initial analysis indicates that multi-modal interaction produces similar targeted information retention to traditional track-pad usage, but this neutral effect comes with higher cognitive workload demands and seems to deteriorate with long-term retention. The research results provide new knowledge about how multi-modal systems affect cognitive engagement while providing design recommendations for future educational and assistive technologies that require effective memory performance.
User interfaces heavily rely on attention-capture design patterns, e.g., infinite scroll and other variable-rewarding mechanisms, that erode users’ sense of autonomy and undermine their digital wellbeing. Instead of having users rely on external self-regulation tools, this paper advocates that tech companies and designers should prioritize users’ digital wellbeing by design. To take a first step in this direction, we present Digital Wellbeing Lens, a Figma plugin that guides designers in creating user interfaces that respect and preserve user time and attention. The plugin allows the continuous evaluation of prototypes against attention-capture patterns, calculating a digital wellbeing score and suggesting suitable design alternatives. Besides introducing the plugin, we demonstrate its practical application through a use case involving the design of a mobile social media app, and we report on the results of a first exploratory study with four designers, discussing the opportunities and challenges of embracing this paradigm shift.
Although cognitive metrics like attention significantly influence user interaction with digital environments, their role in web search behaviour remains underexplored. This study employs a multidimensional attention framework to investigate the relationship between attention levels and browsing/dwell time behaviours on web pages, a subject underrepresented in previous research. Using a dataset of 50 graduate students from Shiraz University and metrics derived from the Stroop Color and Word Test (SCWT), Wisconsin Cards Sorting Test (WCST), and Continuous Performance Test (CPT), this research examines selective, alternating, and sustained attention. Participants were clustered into attention levels via k-means, and their web search activities were analysed using ANOVA. The results of the present study indicated that users with higher levels of alternating attention clicked significantly more often while searching the web. Additionally, users with greater selective attention spent less time before clicking on the first result, performed fewer scrolls, and had a longer dwell time before saving their first selection. Furthermore, the study revealed that users with moderate sustained attention exhibited less dwell time on the search engine results pages (SERPs). This research contributes to the existing literature by providing a nuanced understanding of how different aspects of attention influence web browsing behaviours. It highlights the need for web developers and marketers to consider cognitive processes in their designs, ultimately leading to more effective and user-friendly digital interfaces. The insights gained from this study are invaluable for enhancing information retrieval systems and improving overall user satisfaction in online environments.
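The analysis shape described above — cluster users into attention levels with k-means, then compare browsing metrics across clusters with ANOVA — can be sketched directly. The data below are synthetic stand-ins: the three-dimensional "attention scores" loosely mimic SCWT/WCST/CPT composites, and the click counts are generated, not observed.

```python
import numpy as np
from scipy.stats import f_oneway
from sklearn.cluster import KMeans

rng = np.random.default_rng(7)
# Synthetic attention profiles: three underlying levels, 20 users each,
# scored on three tests (stand-ins for SCWT, WCST, CPT composites).
true_level = np.repeat([0, 1, 2], 20)
scores = rng.normal(loc=true_level[:, None], scale=0.3, size=(60, 3))

# Step 1: cluster users into attention levels, as in the study.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scores)

# Step 2: compare a browsing metric (here, synthetic click counts that
# rise with attention level) across the recovered clusters via ANOVA.
clicks = rng.poisson(lam=5 + 3 * true_level)
groups = [clicks[labels == k] for k in range(3)]
F, p = f_oneway(*groups)
print(f"F={F:.2f}, p={p:.4f}")
```

With real SCWT/WCST/CPT scores and logged interaction metrics in place of the synthetic arrays, the same two steps reproduce the study's cluster-then-compare design.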
This study aimed to identify the role of sustained attention and its interaction with task complexity in shaping high school adolescents’ search performance (satisfaction and success) during web-based information retrieval. The study sample consisted of 90 female students from the tenth and eleventh grades at Shiraz University High School. Data was collected by recording, observing, and analysing user interaction reports on the web, utilising Camtasia version 2023 software. In this study, two search tasks (one simple task and one complex task) were designed for the subjects. The complexity of the search tasks was assessed through three components: information clarity, response dispersion, and cognitive activity. The levels of simplicity and complexity were determined by statistically analysing the scores assigned to the tasks by experts in knowledge and information science. Two methods were employed to evaluate performance, which included measuring both satisfaction and success. Success was determined by the subjects’ ability to correctly answer the tasks, and their responses were scored by three experts in knowledge and information science. To assess satisfaction, participants completed a questionnaire after finishing each task. Additionally, the level of sustained attention was measured using the computerised Continuous Performance Test developed by Rosvold et al. in 1956, which has been validated for reliability in Iran by Hadianfard et al. in 2000. The findings of the study indicated significant differences in the performance (satisfaction and success) of adolescents with varying levels of sustained attention. Specifically, adolescents with high levels of sustained attention demonstrated better performance than those with medium levels, while adolescents with medium levels outperformed those with low levels of sustained attention. 
Also, the results indicated that the greater the level of attention adolescents exhibited, the better their performance, in terms of satisfaction and success, when completing tasks—both simple and complex. Additionally, across all three groups of adolescents categorised by their levels of sustained attention (high, medium, and low), performance in simple tasks consistently outperformed that in complex tasks. Sustained attention is a crucial cognitive variable that significantly influences users’ search performance. However, past studies have given this aspect less attention, and the specific effects of sustained attention and its interaction with task complexity on adolescents’ search performance have yet to be examined. The findings of this study can inform better strategies, such as enhancing information retrieval systems, improving user interface design, and developing educational approaches to boost search performance among adolescents. By understanding how sustained attention varies among adolescents, information retrieval and user interface design experts can create more effective systems and interfaces tailored to the diverse needs of this user group. Additionally, the results will raise awareness among information science researchers and information literacy educators about the cognitive factors that impact adolescents’ web search performance, enabling them to devise targeted measures for improvement.
In the context of Electroencephalography (EEG) research, how is Working Memory (WM) leveraged in Human-Computer Interaction (HCI)? To address this question, this paper explores how WM is represented in EEG-based HCI experiments, with the aim of informing interface and system design that more effectively aligns with users’ cognitive capacities and limitations. A total of 132 studies published between 2018 and 2024 were reviewed to identify HCI use-cases involving EEG to study WM, outline key WM concepts, and evaluate how these align with findings from other disciplines. The findings indicate that WM-related EEG studies in HCI aim to enhance user interaction through more efficient signal analysis and the development of adaptive systems and brain-computer interfaces (BCIs). However, the findings also highlight the lack of theoretical grounding for EEG-based WM research within HCI. Key cognitive theories are often overlooked, and the strong association between WM and Attention is rarely acknowledged. Although the neural basis of Working Memory (WM) is well represented, its conceptualization within HCI remains underdeveloped and often misaligned with advances in cognitive science and psychology—potentially limiting the development of safe, effective, and cognitively-aware user-centered technologies. Based on these findings, we recommend: (a) integrating alternative models of Working Memory (WM) into HCI system design; (b) incorporating Attention evaluation in EEG-based WM experiments; and (c) exploring the development of a custom WM foundation model to address conceptual limitations in HCI and variability across tasks, users, and environments.
Visual selective attention is a cognitive process by which humans efficiently process critical visual information. It reflects the user’s authentic visual thinking and can be applied by designers in age-friendly design to enhance the user experience of elderly users, meeting their visual needs and attention characteristics. This has significant implications for the age-friendly design of fresh e-commerce product interfaces. This paper explores age-friendly design for fresh e-commerce product interfaces based on the theory of visual selective attention. Experimental data indicate that the optimized interface significantly enhances the user experience for elderly users, with task completion time reduced by 39.62% and satisfaction increased by 60%. First, qualitative research is conducted to uncover the visual selective attention mechanisms of elderly users. Combining this with the framework of fresh e-commerce products, an age-friendly design model is established, including page layout, brand colors, font size, and focal styles. Second, using eye-tracking, descriptive analysis, and correlation coefficient analysis, a comparative analysis of the visual selection behaviors of elderly and young users is conducted, yielding characteristics and principles for age-friendly interactive interface design. Finally, the feasibility and effectiveness of the proposed method are validated through design practice and evaluation. This research provides new insights and methods for the age-friendly design of fresh e-commerce product interfaces. It holds practical significance and value for constructing an elderly perspective in fresh e-commerce and expanding the private traffic of elderly users.
Human-Computer Interaction (HCI) in Virtual Reality (VR) environments is a rapidly evolving field that seeks to enhance user experience through immersive and intuitive design principles. As VR technology advances, the interaction between humans and virtual systems becomes increasingly complex, requiring innovative approaches to ensure usability, accessibility, and engagement. This paper explores the fundamental principles of HCI in VR, focusing on interaction techniques, input modalities, feedback mechanisms, and the psychological impact of virtual experiences. Various interaction techniques, such as hand tracking, motion controllers, eye-tracking, and voice commands, are examined, highlighting their advantages and limitations in different applications. Additionally, feedback mechanisms, including haptic feedback, spatial audio, and visual cues, play a crucial role in enhancing realism and user immersion. The study also addresses cognitive and ergonomic challenges, such as motion sickness, cognitive load, and the importance of adaptive interfaces that accommodate diverse user needs. Furthermore, the concept of presence—the feeling of “being there” in a virtual space—is explored, emphasizing how design choices influence immersion and engagement. Accessibility considerations, including designing for users with disabilities and optimizing VR experiences for different demographics, are also discussed. By analyzing current trends, user experience research, and best practices, this study provides insights for designers, developers, and researchers aiming to create effective, user-friendly, and inclusive VR applications. Ultimately, the goal is to improve the seamless integration of humans and virtual environments, enhancing usability and effectiveness across various domains such as gaming, education, healthcare, and remote collaboration.
Joint attention, the capacity of two or more individuals to focus on a common event simultaneously, is fundamental to human–human interaction, enabling effective communication. When considering the field of social robotics, emulating this capability might be necessary for promoting natural interactions and thus improving user engagement. Responding to joint attention (RJA), defined as the ability to react to external attentional cues by aligning focus with another individual, plays a critical role in promoting mutual understanding. This study examines how RJA impacts user engagement during human–robot interaction. The participants play a turn-taking game against a social robot under two conditions: with our RJA system active and with the system inactive. Auditory and visual stimuli are introduced to simulate real-world dynamics, testing the robot’s ability to detect and follow the user’s focus of attention. We use a twofold approach to evaluate the system’s impact on the user’s experience during the interaction. On the one hand, we use head pose telemetry to quantify attentional aspects of engagement, including measures of distraction and focus during the interaction. On the other hand, we use a post-experimental questionnaire incorporating the User Engagement Scale Short Form to assess engagement. The results regarding telemetry data reveal reduced distraction and improved attentional consistency, highlighting the system’s ability to maintain attention on the current task effectively. Furthermore, the questionnaire responses show that RJA significantly enhances self-reported engagement when the system is active. We believe these findings confirm the value of attentional mechanisms in promoting engaging human–robot interactions.
Digital technologies have placed an entirely new kind of cognitive burden on consumers navigating the contemporary attention economy. The article explores the possibility of redesigning artificial intelligence systems to act as cognitive buffers rather than sources of information overload. It presents a cognitive framework that lays out the basis of cognitive load in modern settings through an examination of the neurological and psychological effects of perpetual digital stimulation. The article then argues that AI systems can create a paradigm shift in human-computer interaction by recognizing cognitive states, using neuro-symbolic decision mechanisms, and adjusting interfaces accordingly. It proposes new user-experience patterns involving timing, prioritisation, and simplification, grounded in case studies from academic, gig-economy, family-management, and neurodivergent contexts. These case studies show the usefulness of context-specific cognitive buffering for improving well-being and performance across diverse populations. Lastly, the article highlights new frontiers in cognitive protection, such as ambient computing integration, collaborative team buffering, standardized protocols for cognitive demand, and neural interface technologies, as a guiding vision for a shift towards human-AI partnerships that respect cognitive limits while increasing capability.
This study investigates the cognitive and emotional impacts of specific “Deceptive Patterns” in user interface design on well-known online platforms: Facebook, Instagram, Spotify, Salesforce, Eventbrite, and PayPal. The focus is on two types of deceptive patterns, “hard to cancel” and “hidden subscription” practices. Employing an integrated methodology of electroencephalogram (EEG), eye-tracking, and sentiment analysis, this research analyzes how these patterns influence user behaviour, attention, and emotional responses. The study utilized EEG to measure cognitive load as users interacted with “hard to cancel” interfaces revealing that these platforms increase cognitive demands, inducing greater mental effort and frustration. Eye-tracking data demonstrated that platforms with more transparent mechanisms effectively captured and held user attention on the Terms and Conditions as the key elements, thereby fostering user trust and enhancing transparency. Sentiment analysis further assessed users’ emotional responses, underscoring the positive association between transparent interfaces and user satisfaction. This research highlights the importance of ethical design practices that prioritize user autonomy and transparency, offering a unique methodological contribution through the combined use of EEG, eye-tracking, and sentiment analysis to comprehensively capture cognitive and emotional responses.
No abstract available
Background Digital media usage has become an integral part of daily life, but prolonged or emotionally driven engagement—especially during late-night hours—may lead to concerns about behavioral and mental health. Existing predictive systems fail to account for the nuanced interplay between users’ internal psychological states and their surrounding ecological contexts. Objective This study aims to develop a psychologically and ecologically informed behavior prediction model to identify high-risk patterns of digital media usage and support early-stage intervention strategies. Methods We propose a Dual-Channel Cross-Attention Network (DCCAN) architecture composed of three layers: signal identification (for psychological and ecological encoding), interaction modeling (via cross-modal attention), and behavior prediction. The model was trained and tested on a dataset of 9,782 users and 51,264 behavior sequences, annotated with labels for immersive usage, late-night activity, and susceptibility to health misinformation. Results The DCCAN model achieved superior performance across all three tasks, especially in immersive usage prediction (F1-score: 0.891, AUC: 0.913), outperforming LSTM, GRU, and XGBoost baselines. Ablation studies confirmed the critical role of both psychological and ecological signals, as well as the effectiveness of the cross-attention mechanism. Conclusions Incorporating psychological and ecological modalities through attention-based fusion yields interpretable and accurate predictions for digital risk behaviors. This framework shows promise for scalable, real-time behavioral health monitoring and adaptive content moderation on media platforms.
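The cross-modal attention at the core of an architecture like DCCAN — one channel's encoded sequence querying the other's — can be illustrated with a single scaled dot-product cross-attention step in NumPy. The dimensions, projection matrices, and sequence lengths below are illustrative, not the paper's.

```python
import numpy as np

def cross_attention(q_seq, kv_seq, Wq, Wk, Wv):
    """One cross-attention step: q_seq attends over kv_seq."""
    Q, K, V = q_seq @ Wq, kv_seq @ Wk, kv_seq @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])           # scaled dot product
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)    # row-wise softmax
    return weights @ V, weights

rng = np.random.default_rng(0)
psych = rng.normal(size=(6, 16))   # 6 encoded psychological-signal steps
eco = rng.normal(size=(9, 16))     # 9 encoded ecological-context steps
Wq, Wk, Wv = (rng.normal(size=(16, 16)) for _ in range(3))

# Psychological channel queries the ecological channel (the symmetric
# direction would use eco as q_seq to complete the dual-channel fusion).
fused, attn = cross_attention(psych, eco, Wq, Wk, Wv)
print(fused.shape, attn.shape)
```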
Visual attention mechanisms are modulated by chromatic properties of the environment, with significant implications for human–computer interaction, interface design, and cognitive ergonomics. Despite extensive research on color perception, a critical gap remains in understanding how background colors differentially affect initial attentional capture versus sustained processing efficiency during text reading. This study investigates how seven different background colors (yellow, orange, red, green, blue, purple, and black) influence visual attention and cognitive load during standardized reading tasks with white text, revealing a fundamental asymmetry in chromatic processing stages. Using high-frequency eye-tracking at 120 Hz with 30 participants in a within-subjects design, we measured time-to-first fixation, total viewing duration, fixation count, and revisitation frequency across chromatic conditions. Non-parametric statistical analyses (Friedman test for omnibus comparisons, Wilcoxon signed-rank test for pairwise comparisons) revealed a systematic dissociation between preattentive capture and sustained processing. Yellow backgrounds enabled the fastest initial attentional capture (0.65 s), while black backgrounds produced the slowest detection (1.75 s). However, this pattern reversed during sustained processing: black backgrounds enabled the shortest total viewing time (0.88 s) through efficient information sampling (median 5.0 fixations), while yellow required the longest viewing duration (1.75 s) with fewer fixations (median 3.0). Statistical comparisons confirmed significant differences across conditions (Friedman test: χ2(6)=138.4–154.2, all p<0.001; pairwise comparisons with Bonferroni correction: α=0.0024). We note that luminance and chromatic contrast were not independently controlled, as colors inherently vary in both dimensions in realistic interface design. 
Consequently, the observed effects reflect the combined influence of hue, saturation, and luminance contrast as they naturally co-occur. These findings reveal a descriptive pattern consistent with functionally distinct mechanisms, where chromatic salience appears to facilitate preattentive capture while luminance contrast appears to determine sustained processing efficiency, with optimal colors for one stage being suboptimal for the other under the present experimental conditions. This observed chromatic asymmetry suggests potential implications for interface design: warm colors like yellow may optimize rapid attention capture for alerts and warnings, while high-contrast combinations like white-on-black may optimize sustained reading efficiency, though these preliminary patterns require validation across diverse contexts. Green and purple backgrounds offer balanced performance across both processing stages, representing near-symmetric solutions suitable for mixed-task interfaces. Given the controlled laboratory setting, university student sample, and 15 s exposure duration, design recommendations should be considered preliminary and validated in diverse real-world contexts.
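The statistical pipeline of the color study above — a Friedman omnibus test across repeated within-subjects conditions, followed by Bonferroni-corrected Wilcoxon pairwise comparisons — maps directly onto SciPy. The time-to-first-fixation values below are synthetic stand-ins shaped to match the reported means for three of the seven conditions.

```python
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

rng = np.random.default_rng(3)
n = 30                                    # within-subjects, 30 participants
# Synthetic time-to-first-fixation (s) per condition, loosely matching the
# reported condition means (yellow fastest, black slowest).
yellow = rng.normal(0.65, 0.10, n)
green = rng.normal(1.10, 0.10, n)
black = rng.normal(1.75, 0.10, n)

# Omnibus test across the repeated conditions.
chi2, p = friedmanchisquare(yellow, green, black)

# One of the 21 pairwise tests; Bonferroni alpha = 0.05 / 21 ~= 0.0024.
stat, p_pair = wilcoxon(yellow, black)
print(f"Friedman chi2={chi2:.1f} (p={p:.2e}); yellow vs black p={p_pair:.2e}")
```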
No abstract available
As AI recommendation systems become increasingly important in consumer decision-making, leveraging sound cues to optimize user interaction experience has become a key research topic. Grounded in the theory of perceptual contagion, this study centers on sound cues in AI recommendation scenarios, systematically examining their impact on consumer choice and choice satisfaction, as well as the underlying psychological mechanisms. Study 1 (hotel recommendation, N = 155) demonstrated that embedding sound cues into recommendation interfaces significantly increased consumer choice and choice satisfaction. Study 2 (laptop recommendation, N = 155) further revealed that this effect was mediated by preference fluency. Contrary to expectations, AI literacy did not moderate these effects, suggesting that sound cues exert influence across different user groups regardless of technological expertise. Theoretically, this study (1) introduces the theory of perceptual contagion into AI-human interaction research; (2) identifies preference fluency as the core mediating mechanism; and (3) challenges the traditional assumptions about the role of AI literacy. Practically, this study proposes a low-cost and highly adaptable design strategy, providing a new direction for recommendation systems to shift from content-driven to experience-driven. These findings enrich the understanding of sensory influences in digital contexts and offer practical insights for optimizing the design of AI platforms.
Background Patients with insomnia have difficulty both falling asleep and maintaining sleep. Individuals with long-term sleep deprivation are prone to poor concentration and impaired memory; however, these problems can be alleviated following brief behavioral treatment for insomnia (BBT-I). This study involved the design of an app called “Sleep Well” that enables individuals with insomnia to easily record their sleep behavior. The app guides users to recall and record sleep-related information, acquire sleep hygiene knowledge, and communicate with therapists online. Objective This study examined how specific sleep diary interface design features in a BBT-I app influence users’ attention and short-term memory. Using a combination of objective eye-tracking measures and subjective attention assessments, the study compared 3 interface designs to determine how visual layout, input modality, and interaction style interact with insomnia symptoms to affect attentional performance, memory accuracy, and user preference. Methods Three sleep diary interfaces were designed, varying background mode (day vs night), color scheme (blue vs green), box shape (circular, rounded rectangular, or rectangular), and input method (slide-in, tap, or type-in). A total of 33 participants completed standardized diary-entry tasks while eye movements were recorded using an eye tracker to capture gaze trajectories and visual attention patterns during app interaction. User experience, subjective attention, and interface preferences were assessed using structured questionnaires. Data were analyzed using descriptive statistics, nonparametric tests, Pearson correlation analysis, cross-tabulation analysis, and exploratory factor analysis to examine associations among interface design, attentional performance, memory accuracy, and user characteristics. 
Results A total of 33 participants (n=13, 39.4% male and n=20, 60.6% female) aged 20 to 64 years completed this study. Based on the Insomnia Severity Index, 6 of 33 (18.2%) participants had clinical insomnia and 13 of 33 (39.4%) reported insomnia symptoms. Most participants reported staying up late (22/33, 66.7%), and more than half of participants reported drinking tea (17/33, 51.5%). Interface design significantly influenced objective attentional performance, as measured by eye-tracking indicators of task efficiency and visual allocation. Sleep quality and insomnia symptoms were consistently associated with attentional and short-term memory outcomes, with memory accuracy varying across interfaces and showing particular sensitivity to sleep maintenance difficulties. Subjective attentional control was strongly associated with both eye-tracking metrics and memory performance, and interface preferences differed by insomnia status. Conclusions Interface design significantly modulates attention and short-term memory performance in users with insomnia. Eye-tracking revealed that insomnia symptoms and sleep quality influence visual attention and task efficiency, whereas subjective attentional control showed stronger and more consistent associations with memory accuracy than physiological eye-movement indicators. These findings suggest that cognitive processing during sleep diary completion relies more on internal attentional states than on observable gaze behavior. Designing low-load, attention-supportive interfaces may therefore improve usability and data accuracy in digital BBT-I interventions.
When an AI assistant remembers that Sarah is a single mother working two jobs, does it interpret her stress differently than if she were a wealthy executive? As personalized AI systems increasingly incorporate long-term user memory, understanding how this memory shapes emotional reasoning is critical. We investigate how user memory affects emotional intelligence in large language models (LLMs) by evaluating 15 models on human-validated emotional intelligence tests. We find that identical scenarios paired with different user profiles produce systematically divergent emotional interpretations. Across validated, user-independent emotional scenarios and diverse user profiles, systematic biases emerged in several high-performing LLMs, where advantaged profiles received more accurate emotional interpretations. Moreover, LLMs demonstrate significant disparities across demographic factors in emotion understanding and supportive-recommendation tasks, indicating that personalization mechanisms can embed social hierarchies into models' emotional reasoning. These results highlight a key challenge for memory-enhanced AI: systems designed for personalization may inadvertently reinforce social inequalities.
As Artificial Intelligence (AI) tools like ChatGPT gain traction in clinical contexts, their role in neurorehabilitation, particularly in addressing executive function impairments associated with ADHD, remains underexplored. This study examines whether generative AI can meaningfully support clinicians in designing individualized cognitive rehabilitation plans, not as a replacement but as a complementary aid. The research consisted of three separate studies, each addressing distinct stages of the investigation. In Study 1, expert-driven prompts were developed based on literature and clinical insights to guide ChatGPT in generating rehabilitation plans for three hypothetical profiles of individuals with ADHD (adolescents, adults, and older adults). In Study 2, the outputs were analyzed using a semi-systematic qualitative framework (ISAAC), assessing structure, coherence, and adaptability across developmental stages. Study 3 involved an external panel of 27 neuropsychologists and cognitive rehabilitation specialists (M = 6; F = 21; mean age = 46.5, SD = 15) who rated each plan’s theoretical validity, clinical relevance, and feasibility. Experts in Study 3 generally responded positively to the theoretical consistency of the plans, especially those for adolescents and adults, recognizing alignment with established models of executive function rehabilitation. Many professionals expressed openness to using AI as a support tool in practice. However, feasibility emerged as a key limitation, with concerns over a lack of personalization, unrealistic resource assumptions, and unvalidated techniques, particularly in the adult and older adult profiles. These findings align with earlier studies in occupational therapy and clinical decision-making, which also identified challenges in real-world applicability.
While clinical experts express cautious optimism about AI-assisted rehabilitation planning, further development is necessary to enhance accuracy, personalization, and feasibility for the safe integration of AI into clinical practice.
Background/Objectives: The evolution of digital technology supports the broadening of a person’s intellectual growth. Research points out that implementing innovative applications of the digital world improves human social, cognitive, and metacognitive behavior. Artificial intelligence chatbots are yet another innovative human-made construct. These are forms of software that simulate human conversation, understand and process user input, and provide personalized responses. Executive function includes a set of higher mental processes necessary for formulating, planning, and achieving a goal. The present study aims to investigate executive function reinforcement through artificial intelligence chatbots, outlining potentials, limitations, and future research suggestions. Specifically, the study examined three research questions: the use of conversational chatbots in executive functioning training, their impact on executive-cognitive skills, and the duration of any improvements. Methods: The assessment of the existing literature was implemented using the systematic review method, according to the PRISMA 2020 principles. The avalanche search method was employed to conduct a source search in the following databases: Scopus, Web of Science, PubMed, and, complementarily, Google Scholar. This systematic review included studies from 2021 to the present using experimental, observational, or mixed methods. It included studies using AI-based chatbots or conversational agents to support executive functions, targeting outcomes such as anxiety, stress, depression, memory, attention, cognitive load, and behavioral change. The review covered both general populations and populations with specific neurological conditions, and all included studies were peer-reviewed, written in English, and available in full text.
The study excluded studies published before 2021, literature reviews and systematic reviews, non-AI-based chatbots or conversational agents, studies not targeting the range of executive skills and abilities, studies not written in English, and studies without open access. The criteria aligned with the study objectives, ensuring a focus on AI chatbots and the impact of conversational agents on executive function. The initial collection totaled n = 115 articles; however, the eligibility requirements led to a final selection of n = 10 studies. Results: The findings of the included studies suggested positive effects of using AI chatbots to enhance and improve executive skills. However, several limitations were identified, making it difficult to generalize and reproduce their effects. Conclusions: AI chatbots are an innovative artificial intelligence tool that can function as a digital assistant for learning and expanding executive skills, contributing to the cognitive, metacognitive, and social development of the individual. However, their use in executive skills training is at an early stage. The findings highlighted the need for a unified frame of reference for future studies, better study designs, more diverse populations, larger sample sizes, and longitudinal studies that observe the long-term effects of their use.
Reliable detection and prediction of neural activity and behavior requires a user model of brain activity that dynamically adapts based on known time-dependent physiological processes, as well as unknown traits of the user. We have applied wireless electroencephalography (EEG) sensors, edge devices with feedback capability, and cloud-assisted data acquisition to real-time and longitudinal brain monitoring and alerting. Toward a user model of brain function, we collected neural and behavioral data from humans in the field. The data replicate previous findings that were obtained under tight laboratory control, suggesting that the methods that we describe will be useful for user modeling of human brain activity under more natural conditions. Specifically, we report that frontal cortex oscillations reorganized with age. Focusing on time-varying aspects of behavior, we then found that performance on memory-intensive cognitive tasks declined during the day. Next, we examined interactions between neural activity and behavioral performance. We report that neural activity and performance co-varied and that this co-variation depended on the cognitive task in ways that were, again, consistent with previous laboratory studies. Lastly, we report the foundations of an adaptive model based on this system that will enable dynamic personalization tailored to each user.
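The age- and time-of-day effects reported above rest on standard spectral summaries of the EEG signal (e.g., frontal oscillatory power). As a minimal, purely illustrative sketch, not the authors' pipeline, band power can be estimated from the one-sided FFT; the sampling rate, band limits, and synthetic signal below are assumptions:

```python
import numpy as np

def band_power(signal, fs, band):
    """Mean spectral power of `signal` within a frequency `band` (Hz), via the FFT."""
    freqs = np.fft.rfftfreq(signal.size, d=1 / fs)          # one-sided frequency axis
    psd = np.abs(np.fft.rfft(signal)) ** 2 / signal.size    # unnormalized power spectrum
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return psd[mask].mean()

# Synthetic 4 s recording at 256 Hz with a planted 10 Hz (alpha-band) rhythm plus noise.
fs = 256
t = np.arange(0, 4, 1 / fs)
rng = np.random.default_rng(0)
eeg = np.sin(2 * np.pi * 10 * t) + 0.5 * rng.standard_normal(t.size)

alpha = band_power(eeg, fs, (8, 12))
beta = band_power(eeg, fs, (13, 30))
print(alpha > beta)  # True: the planted alpha rhythm dominates the beta band
```

A field system would substitute real sensor streams and artifact rejection for the synthetic signal, but the band-power summary itself is the same kind of feature an adaptive user model of brain activity would track over time.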
This systematic review synthesizes empirical evidence on artificial intelligence-enhanced learning interventions targeting executive function development across diverse populations and developmental stages within digital humanities contexts. Following PRISMA guidelines, a comprehensive search of five databases (PsycINFO, ERIC, Web of Science, Scopus, PubMed) from January 2020 through December 2024 identified 14 studies encompassing 1,810 participants aged 6 to 77 years. Included studies examined adaptive intelligent tutoring systems, virtual reality platforms, computerized cognitive training programs, computational thinking interventions, and machine learning-based assessment tools applied to humanities education and research. Results demonstrated consistent positive effects on inhibitory control (effect sizes: 0.11–0.62), cognitive flexibility, working memory (effect sizes: 0.09–0.18), and planning abilities, with machine learning models achieving high diagnostic accuracy (86.8%) for executive function impairments. Effectiveness was moderated by individual baseline cognitive capacity, particularly working memory constraints. Theoretical mechanisms underlying improvements included adaptive difficulty adjustment, cognitive load optimization, personalized scaffolding through Case-Based Reasoning and reinforcement learning algorithms, and neuroplasticity-driven efficiency gains. Despite promising findings, limitations include intervention heterogeneity, brief intervention durations, and limited long-term follow-up. Future research should prioritize longitudinal randomized controlled trials, neuroimaging studies elucidating neural mechanisms, and implementation science investigations supporting evidence-based integration of AI technologies in digital humanities pedagogy and clinical contexts.
BACKGROUND Chat Generative Pre-trained Transformer (ChatGPT) represents a groundbreaking advancement in artificial intelligence (AI) chatbot technology, utilizing transformer algorithms to enhance natural language processing and facilitate its use for specific tasks. These AI chatbots can respond to questions by generating verbal instructions similar to those a person would provide during the problem-solving process. AIM ChatGPT has become the fastest-growing software in terms of user adoption in history, leading to an anticipated widespread use of this technology in the general population. The current literature is predominantly focused on the functional aspects of these technologies, but the field has not yet explored hypotheses on how these AI chatbots could impact the evolutionary aspects of human cognitive development. THESIS The "neuronal recycling hypothesis" posits that the brain undergoes structural transformation by incorporating new cultural tools into "neural niches," consequently altering individual cognition. In the case of technological tools, it has been established that they reduce the cognitive demand needed to solve tasks through a process called "cognitive offloading." In this theoretical article, three hypotheses are proposed via forward inference about how algorithms such as ChatGPT and similar models may influence the cognitive processes and structures of upcoming generations. CONCLUSIONS By forecasting the neurocognitive effects of these technologies, educational and political communities can anticipate future scenarios and formulate strategic plans to either mitigate or enhance the cognitive influence that these factors may have on the general population.
OBJECTIVE To explore potential differences in the relationship between executive function (EF) skills and language development when integrating augmentative and alternative communication technology into speech-language therapy for deaf/hard of hearing (DHH) children. METHOD Randomized trial data were analysed to investigate this relationship among children who participated in a Technology-Assisted Language Intervention (TALI) compared to treatment as usual (TAU). Language samples were assessed for pre-post-intervention changes, including mean length of utterance in morphemes (MLU), mean turn length (MTL), and number of different words spoken (NDW). EF skills were measured with standardized parent-report assessment. RESULTS Thirty-seven DHH children were included (TALI n = 19 and TAU n = 18). Results of regression models indicated that higher EF skills were significantly (p < 0.05) associated with improvements in MLU, MTL, and NDW among children who received TAU. No significant associations between EF skills and any of the measures (MLU, MTL, NDW) were seen in children who received TALI. CONCLUSION These results suggest that TALI may offer language learning support, particularly for DHH children with EF difficulties. Future research should investigate the direct relationship between EF measurements and language outcomes in TALI recipients. Establishing baseline EF measurements in DHH children could inform personalized strategies within language interventions and therapy.
Excoriation, or skin picking disorder (SPD), is a common and impairing condition, yet many individuals face barriers to accessing evidence-based care. Digital interventions offer a scalable option, but few have been evaluated in real-world settings. This study examined naturalistic outcomes among 2063 adult users of StopPicking, a self-guided online intervention for SPD. The program includes three modules: Module 1 focuses on self-monitoring and assessment, Module 2 delivers personalized behavioral strategies, and Module 3 supports maintenance and relapse prevention. Symptom severity was assessed at baseline, post-Module 1, and at final program use. Using an intent-to-treat approach, multilevel modeling showed a medium-sized reduction in symptom severity over time (Hedges' g = 0.71). Greater engagement, measured by self-monitoring frequency and number of weeks active in the program, was associated with lower final severity scores. Baseline severity, age of SPD onset, and age group were linked to usage patterns: users with higher severity or earlier onset tended to self-monitor more consistently, while older adults had higher rates of early discontinuation. Findings support StopPicking as a viable self-help option and highlight the importance of promoting meaningful engagement. Results also suggest that self-monitoring may function both as an indicator of engagement and a potential mechanism of change.
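The medium-sized reduction above (Hedges' g = 0.71) is the standard bias-corrected standardized mean difference. A small sketch of the computation follows; the severity scores are hypothetical placeholders, not the study's raw data:

```python
import math

def hedges_g(mean1, sd1, n1, mean2, sd2, n2):
    """Hedges' g: Cohen's d scaled by the small-sample bias-correction factor J."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (mean1 - mean2) / pooled_sd
    j = 1 - 3 / (4 * (n1 + n2) - 9)  # J approaches 1 as samples grow
    return j * d

# Hypothetical baseline vs. final severity scores (illustrative values only).
g = hedges_g(24.0, 6.0, 2063, 19.74, 6.0, 2063)
print(round(g, 2))  # 0.71
```

With samples this large the correction factor J is nearly 1, so g is essentially Cohen's d; the correction matters mainly for small trials.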
Large Language Models (LLMs) demonstrate significant potential in the field of mental health. However, existing chatbots often lack personalized designs, which may limit their ability to fully address the complex needs of users with depression. This study builds upon the previously developed CloudEcho system, a mental health management application that integrates emotion monitoring and psychological support functions, to explore the impact of role customization features on user trust and customization experience. Using a mixed-methods approach, this study compares the differences between the system with role customization features and its original version. Quantitative results indicate that role customization can enhance user trust, showcasing high usability and satisfaction. Qualitative interviews further reveal the strengths and limitations of this feature and suggest directions for optimization. Together, these findings highlight the potential value of chatbot role customization in mental health support and offer theoretical and practical guidance for future LLM-driven personalized design and optimization in mental health contexts.
Background: The accelerated use of large language models like ChatGPT has transformed human emotional and cognitive engagement, yet its neuropsychological implications remain poorly understood. The present study proposes the concept of cognitive debt: the accumulated strain on attention, memory, and metacognitive control triggered by sustained AI engagement. This study investigated how distinctive patterns of ChatGPT involvement, spanning usage frequency, emotional and cognitive engagement, and ethical reflection, predict cognitive dysfunction across four user typologies: low–moderate, minimal/unhealthy, balanced–cognitive, and ethically reflective users. Method: This study employed a purposive sampling strategy within a web-based cross-sectional design to recruit 300 emerging adults (aged 18–25 years) from universities in Rawalpindi and Islamabad, Pakistan, between June 25 and July 12, 2025. Participants completed two standardized psychological instruments examining ChatGPT usage and cognitive dysfunction via an online survey administered on Google Forms. The survey link was disseminated through multiple digital platforms, including WhatsApp, Facebook, and official university email networks, to ensure broad accessibility and voluntary participation. Results: Higher ChatGPT usage, specifically emotionally driven involvement, was associated with increased cognitive dysfunction, including impairments in memory, attention, and executive control across all user profiles, suggesting that emotionally driven and impulsive engagement with generative AI diminishes executive control and heightens cognitive load. In contrast, ethical reflection showed a mild protective effect against cognitive dysfunction. Moreover, females exhibited higher cognitive vulnerability than males, while males reported greater ChatGPT engagement and susceptibility to its cognitive effects.
Conclusions: The results point to two distinct cognitive stress pathways: (1) emotional compulsive engagement, characterized by affect-laden and impulsive AI use, and (2) reflective cognitive overload, in which ethical contemplation paradoxically increases metacognitive load. These findings refine the concept of cognitive debt, suggesting that both overly reflective and overreliant AI interactions can impair cognitive efficacy. The research highlights the urgency of establishing evidence-based digital ethical-use and literacy approaches to promote cognitively sustainable AI usage.
Abstract Evidence suggests that cognitive training can be beneficial for older adults to maintain and improve executive functions (EFs). EFs are important for many everyday functions, and they are impacted by aging and dementia. Integrating game-like features can boost motivation and engagement during cognitive training, promoting learning outcomes. Yet, incorporating game design elements may also increase cognitive load and distraction for some individuals. Using EF training, we investigated individual differences and the impact of gamification in older adults, including those at risk for Alzheimer’s disease and related dementias (AD/ADRD). We conducted two crossover randomized controlled trials (143 older adults; M=69 years, SD=7.5) comparing the effects of gamified and non-gamified EF interventions. We tested whether and how individual differences in inhibitory control (IC) or general cognitive ability (GCA) at baseline predict the effectiveness of these interventions. We hypothesized that participants with greater difficulty inhibiting distractions would benefit more from non-gamified training, whereas those with higher distractor tolerance would perform better with gamified tasks. Results showed that IC significantly predicted transfer. Furthermore, high IC individuals benefitted more from gamified training, whereas low IC individuals benefitted more from non-gamified training. Conversely, GCA did not predict transfer, with no significant interaction between GCA and training condition. These findings support the IC model over the GCA model on predicting transfer effects. Our findings will inform the development of personalized cognitive interventions to mitigate EF impairments, particularly in populations at risk for AD/ADRD.
This paper presents an IoT-based intelligent parenting system that integrates edge sensing, psychological temperament modeling, large language models (LLMs), and user feedback to deliver personalized and responsive infant care. Unlike existing solutions that rely on passive monitoring or generic recommendations, our system features a closed-loop, multi-agent architecture that supports individualized, explainable parenting interventions. The architecture follows a cloud-edge-user collaborative paradigm: multimodal sensors deployed at the edge and edge agents continuously monitor infant behavior and environmental conditions; on the cloud, an LLM agent performs context-aware reasoning with function calling, while a DeepSearch QA agent retrieves temperament-aligned parenting knowledge. A caregiver-facing app enables temperament profiling, query submission, and feedback collection, completing a human-in-the-loop cycle. To support fine-grained personalization, we construct a centralized knowledge base from validated parenting resources, continuously enriched with de-identified Q&A samples approved by users. Infant temperament profiles influence both retrieval and reasoning, ensuring targeted and interpretable responses. In experimental evaluations, our system achieved a 90% recall rate in personalized knowledge retrieval, significantly outperforming baseline RAG systems (65%). For response generation, Top-1 human approval reached 80%, highlighting the system’s potential as a trustworthy and adaptive AI assistant for infant care. We simulate the full pipeline within a smart crib use case to validate the closed-loop interaction and personalization capability under realistic care conditions.
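The 90% vs. 65% recall comparison above is a standard retrieval metric. A minimal sketch of recall@k is shown below; the query and document IDs are hypothetical stand-ins, not the system's actual knowledge base:

```python
def recall_at_k(retrieved, relevant, k=5):
    """Fraction of queries whose gold document appears in the top-k retrieved results."""
    hits = sum(1 for docs, gold in zip(retrieved, relevant) if gold in docs[:k])
    return hits / len(relevant)

# Hypothetical retrieval runs for three caregiver queries (illustrative IDs only).
runs = [
    ["doc_sleep_routine", "doc_feeding", "doc_colic"],
    ["doc_teething", "doc_sleep_routine", "doc_play"],
    ["doc_play", "doc_feeding", "doc_teething"],
]
gold = ["doc_sleep_routine", "doc_sleep_routine", "doc_colic"]
score = recall_at_k(runs, gold, k=3)
print(score)  # 2 of 3 queries contain their gold document in the top 3
```

In a temperament-aware setup like the one described, the same metric would be computed per temperament profile to check that personalization improves retrieval rather than merely reshuffling it.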
This pilot study investigates the role of concentration, as measured through EEG, in the deployment of socio-emotional skills and executive functions within gamified simulation environments. Utilizing real-time monitoring, concentration indicators were analyzed to evaluate their relationship with key performance metrics, including satisfaction and overall task performance. Results indicate that higher concentration levels correlate positively with both performance and user satisfaction, highlighting the potential of EEG monitoring to optimize adaptive learning models by dynamically adjusting content based on cognitive states. The study contributes to the field by providing insights into how concentration impacts learning in simulated educational settings, suggesting that cognitive feedback can enhance the personalization and effectiveness of simulation-based education. These findings also highlight the relationship between executive functions, socio-emotional skills and performance in gamified simulators. Limitations include the small sample size and the use of a single EEG device, which may affect the generalizability of findings. Future research should explore integrating additional cognitive biomarkers and expand this approach across various educational contexts to validate the applicability and scalability of adaptive learning systems.
Students with Attention Deficit Hyperactivity Disorder (ADHD) often struggle with traditional text-based learning materials due to executive function deficits that affect their ability to process, organise, and retain information. While the rapid development of Large Language Models (LLMs) has sparked innovation in generative user interfaces, existing products fail to address the specific learning challenges faced by students with ADHD. We introduce a novel approach that leverages LLM agents as interactive mind-map creators specifically designed to support ADHD learners. Our solution automatically transforms dense text-based documents into interactive, ADHD-friendly mind maps. These dynamic visual representations allow students to engage with learning tasks, explore content node by node, ask questions, and monitor their learning progress. An initial evaluation indicates improvements in four key areas: increased motivation to engage with learning materials, enhanced concentration during study sessions, better task planning and organisation skills, and improved ability to extract and understand main ideas from complex texts. By specifically addressing the needs of neurodivergent learners, this research contributes to the emerging field of LLM-powered generative user interfaces by demonstrating their potential as inclusive learning tools and opening up new avenues for exploration.
Embedding emotional intelligence into Artificial General Intelligence (AGI) enhances its cognitive versatility beyond mere emotion recognition. This study develops a multimodal AGI prototype that integrates facial expression data (CK+ dataset) and speech emotion cues (RAVDESS dataset) to inform reasoning and adaptability across diverse tasks. A transformer-based architecture fuses visual and auditory embeddings, feeding them into a reasoning module enhanced by a fine-tuned large language model (LLM) to generate emotionally tailored responses. The AGI assists users in problem-solving or independently resolves scenarios, adapting to emotional contexts. Evaluation targets >85% emotion recognition accuracy and 20% higher task success rates (measured as user or AGI performance improvements) over emotion-agnostic baselines, plus user satisfaction ratings above 4/5 from 50 evaluators. Early simulations suggest emotional integration boosts contextual reasoning by up to 25%. This work pioneers an affective AGI framework that leverages emotional cues to enhance cognition, with applications spanning education, healthcare, conflict resolution, and ethical AI, and offers insights into emotion's role in intelligence.
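The fusion step described above can be illustrated with a toy early-fusion sketch in numpy. The embedding sizes, random weights, and seven-class emotion head are assumptions standing in for trained encoders and the actual transformer architecture:

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-ins for encoder outputs (a real system would use trained visual/audio encoders).
face_emb = rng.standard_normal(128)    # e.g., from a facial-expression encoder (CK+-style input)
speech_emb = rng.standard_normal(64)   # e.g., from a speech-emotion encoder (RAVDESS-style input)

# Early fusion: concatenate the modality embeddings, then project and classify.
fused = np.concatenate([face_emb, speech_emb])     # shape (192,)
w_proj = rng.standard_normal((192, 32)) * 0.1      # learned in practice; random here
hidden = np.tanh(fused @ w_proj)
w_cls = rng.standard_normal((32, 7)) * 0.1         # 7 basic emotion classes (assumed)
logits = hidden @ w_cls
probs = np.exp(logits) / np.exp(logits).sum()      # softmax over emotion classes
print(probs.shape)  # (7,)
```

The resulting emotion distribution is what would condition the downstream LLM-based reasoning module; attention-based fusion would replace the plain concatenation in a transformer implementation.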
This study advances a testable account of human-AI collaborative competence via the C²L-AI framework. We integrate Activity Theory, Distributed Cognition, and sociomateriality to reconceptualize collaboration, communication, and leadership alongside AI Interaction Competence. We operationalize constructs with an Evidence-Centered Design (ECD) multimodal matrix spanning NLP, social/epistemic network analysis, and VR behavioral analytics. To generate causal evidence, we propose a three-arm randomized controlled trial in a multi-user VR leadership task comparing Explainable AI (XAI) feedback, standard feedback, and no feedback. We hypothesize that XAI yields greater gains in leadership, communication, and AI interaction competence, mediated by improvements in team cognitive architecture (shared mental models, transactive memory). The work offers a unified theory, measurable indicators, and an empirical pathway for designing effective, ethical human-AI learning systems.
The goal-directed behavior observed in humans arises from the intricate interplay of various processes, including personality dynamics, emotional responses to others, memory encoding, the anticipation of future actions, and associated hedonic experiences. Integrating these multiple processes characteristic of human intelligence into a robotic framework aims to enhance the human-likeness of artificial agents and facilitate more natural and intuitive interactions with humans. For this purpose, in this paper we propose a comprehensive psychological and cognitive architecture in which personality, as in humans, not only influences the execution of actions but also shapes internal reactions to human emotions and guides anticipatory decision-making processes tailored to the agent’s traits. We demonstrate the framework’s effectiveness in generating perceivable synthetic personalities through an experiment involving participants in a dyadic conversation scenario with a digital human, where the digital human’s behavior is driven by its assigned personality. The results show that participants accurately perceive the artificial personality displayed by the digital human. We also demonstrate the potential of our robotic framework to bridge the gap between cognitive and psychological agents, as the findings highlight its ability to create a cognitively and emotionally intelligent digital human.
No abstract available
In recent years, we have experienced rapid development of advanced technology, machine learning, and artificial intelligence (AI), intended to interact with and augment the abilities of humans in practically every area of life. With the rapid growth of new capabilities, such as those enabled by generative AI (e.g., ChatGPT), AI is increasingly at the center of human communication and collaboration, resulting in a growing recognition of the need to understand how humans and AI can integrate their inputs in collaborative teams. However, there are many unanswered questions regarding how human-AI collective intelligence will emerge and what the barriers might be. Truly integrated collaboration between humans and intelligent agents may result in a different way of working that looks nothing like what we know now, and it is important to keep the essential goal of human societal well-being and prosperity a priority. In this special issue, we begin to scope out the underpinnings of a socio-cognitive architecture for Collective HUman-MAchine INtelligence (COHUMAIN), which is the study of the capability of an integrated human and machine (i.e., intelligent technology) system to achieve goals in a wide range of environments. This topic consists of nine papers including a description of the conceptual foundation for a socio-cognitive architecture for COHUMAIN, empirical tests of some aspects of this architecture, research on proposed representations of intelligent agents that can jointly interact with humans, empirical tests of human-human and human-machine interactions, and philosophical and ethical issues to consider as we develop these systems.
The incorporation of artificial intelligence (AI) into social robots has introduced a new stage in the relationship between humans and robots, one centered on emotional intelligence as the basis for empathetic interaction. AI-driven emotional intelligence is expected to enable robots to identify, understand, and act on human emotions in order to build trust, collaboration, and socially adaptive behaviour. Affectively computational social robots make use of multimodal inputs (facial expressions, speech intonation, gestures, and physiological signals) to infer human emotional states in real time. This essay examines the design concepts, computational theories, and applications of emotionally intelligent social robots. Using machine learning, natural language processing, and cognitive modeling, these robots can adapt their behavior, communication style, and decisions to human affective cues. The paper explores system architectures, sensor integration, algorithms, and the ethical issues surrounding empathetic interaction. Experimental assessments of AI-based emotional intelligence in healthcare, education, and customer-service settings show that this approach has the potential to improve interaction quality, interpersonal relationships, and task performance. The results highlight the transformative potential of emotionally aware AI in developing social robots that communicate meaningfully with human beings, bridging computational reasoning and affective responsiveness in practice.
No abstract available
No abstract available
The current research examines cognitive biases in artificial intelligence (AI)-based climate communication and their link to public comprehension of, and engagement with, climate change messages. Cognitive biases such as optimism, anchoring, and confirmation bias can cause people to misperceive climate messages, underestimate risks, and lack motivation to adopt pro-environmental behaviors. These relationships were studied using a quantitative research design with a sample of 280 participants who actively seek climate information online through digital sources such as social networks, news websites, and AI-based software. A stratified random sampling strategy was employed to cover a wide demographic range in terms of age, sex, education level, and occupation. Structured questionnaires provided measurable data, which were analyzed with statistical techniques to determine associations, predictive effects, and group differences. The results show that AI technologies can enhance knowledge and inspire pro-environmental attitudes, yet their effectiveness depends on overcoming cognitive bias and on care in message delivery. Because demographic factors can interfere with perception, communication strategies have to be tailored accordingly. The paper highlights how psychology and AI design can jointly inform climate communication strategies that are evidence-based, persuasive, and inclusive, and offers recommendations to policymakers, educators, and communicators for increasing awareness, driving informed decision-making, and promoting pro-environmental behavior.
Artificial Intelligence of the next generation needs to interact with users socially, convincing them of its ability to understand human minds, including emotions. For this to happen, an artificial emotional intelligence is needed, capable of adequate, believable behavior in social emotional interactions. Building on previous developments, the present work extends the general framework of emotional Biologically Inspired Cognitive Architecture (eBICA: Samsonovich, 2013, 2018), endowing it with fluents describing, in addition to appraisals, somatic markers, feelings, emotions, moods, emotional reactions and biases. Key building blocks that integrate them are moral schemas and semantic maps. The model describes the interaction of three factors: plans and commitments, moral and ethical values, and somatic comfort. Learning in this framework includes self-organization of semantic maps that in turn may provide guidance for active humanlike learning. Implications for empirical studies and practical applications are discussed together with the expected impact.
With the ever-increasing adoption of generative artificial intelligence (GenAI) chatbots in diverse communication roles, there is a burgeoning need to address how people process and respond to messages from these technologies. This study explores the theoretical mechanisms by which the use of GenAI chatbots influences persuasive outcomes related to mental health, examining perceived message contingency and cognitive elaboration as mediators, and issue involvement as a moderator between the two mediators. Through an experimental setting using ChatGPT, compared to non-conversational online resources, we discovered that GenAI chatbot use led to greater cognitive elaboration through an increased sense of message contingency, leading to greater mental health self-efficacy. The influence of perceived message contingency on cognitive elaboration was more pronounced among participants with higher issue involvement.
As artificial intelligence (AI) becomes increasingly prevalent, humans and organizations must make decisions regarding its adoption and use. However, the adoption of AI can be a complex process that challenges existing beliefs and practices. Cognitive dissonance, the discomfort that arises when individuals are faced with conflicting beliefs or behaviors, can exist within both the individual and organizational AI adoption journey. This article explores the concept of cognitive dissonance in relation to the adoption and use of AI, particularly when users may not even be aware that they are interacting with AI. By understanding the role of cognitive dissonance in the adoption process, individuals and organizations can make informed decisions that promote its successful integration and potentially lead to greater acceptance of its capabilities.
Introduction: This perspective article reflects on how innovative technologies, including artificial intelligence (AI) systems like smart voice agents and chatbots, may transform family dynamics and communication. Despite the extensive research on AI’s impact in mental healthcare and education, its influence on family systems remains underexplored. This perspective article aims to draw attention to the possible positive and detrimental effects of using AI in families, highlighting the necessity of fostering AI literacy in this setting. Areas covered: The article delves into integrating AI within family therapy models, focusing on how AI redefines family boundaries, roles, communication, rituals, and narrative creation. It explores AI’s potential to enhance parent training programs and its impact on children’s social and cognitive development. Expert opinion: AI presents both opportunities and challenges for family systems. It can enhance communication, support role negotiation, and promote family cohesion, but it also raises ethical and privacy concerns. The balance between utilizing AI to support family values and avoiding the detrimental effects of over-reliance is crucial. Conclusion: Integrating AI into family systems offers significant potential benefits, but it must be managed carefully to ensure it aligns with family values and strengthens family bonds. Fostering AI literacy within families is essential to navigate the complexities and harness the advantages of AI technologies.
The article discusses the task of building multimodal AI models for diagnosing the cognitive state of students (concentration, fatigue, stress) in digital educational environments. The necessity of transition from traditional methods of psychodiagnostics to automated systems based on natural language processing, computer vision and behavioral analysis is substantiated. A mathematical model based on the CNN-LSTM hybrid architecture with the adaptation of parameters to individual cognitive profiles is proposed. The structure of the model is described, recommendations for its construction and integration into the digital educational infrastructure are given. The problems of interpretability, privacy, and sustainability of such models, as well as the prospects for their application, are discussed.
This study investigated the adoption of Artificial Intelligence (AI) in leadership communication and the influence of emotional intelligence on AI-driven leadership practices in public Colleges of Education in Plateau State, Nigeria. Guided by the Social Cognitive Theory and Emotional Intelligence Theory, the research employed a descriptive survey design. The target population comprised 1,150 academic and administrative staff across the state’s public Colleges of Education. A sample of 250 respondents was selected using stratified random sampling. Data were gathered through a structured questionnaire and analyzed using descriptive statistics, Pearson correlation, and multiple regression techniques to address three research questions and test three hypotheses. Findings revealed a moderate level of AI adoption for leadership communication, with respondents affirming its contribution to decision-making efficiency and communication clarity. Emotional intelligence was found to significantly moderate the relationship between AI use and leadership performance (R² = 0.453, p < 0.05). Additionally, institutional and infrastructural challenges such as inadequate training and poor internet access were statistically significant barriers to AI integration (R² = 0.465, p < 0.05). Based on these findings, the study recommends the institutionalization of AI training for leadership staff, adoption of emotional intelligence frameworks in administrative procedures, and increased investment in digital infrastructure. This study underscores the importance of combining technological advancement with emotional competence in enhancing leadership effectiveness. Strengthening AI and emotional intelligence capacity can help bridge the leadership communication gap in Nigeria’s tertiary education sector.
This study explores the evolving role of Artificial Intelligence (AI) in design decision-making, with a particular emphasis on its impact on human cognitive faculties, including creativity, critical thinking, and intuition. As AI-driven systems increasingly redefine traditional decision-making paradigms through data-driven automation, this research examines the interplay between AI and human sense-making within innovative design processes. Adopting a communication-centric framework, the study underscores the significance of effective collaboration between AI technologies and human designers in enhancing problem-solving capabilities across organizational and creative contexts. While AI enhances efficiency, pattern recognition, and technical rationality, human intuition remains essential for ensuring ethical, contextually aware, and creatively robust design solutions. Drawing on Karl Weick’s sense-making theory, this research provides a structured approach to understanding AI’s role in augmenting rather than supplanting human creativity. The findings highlight the necessity of interdisciplinary collaboration, AI transparency, and iterative feedback mechanisms to sustain a balanced integration of AI-driven insights with human-centered decision-making. Ultimately, this study contributes to the advancement of AI-assisted design frameworks that prioritize both efficiency and human innovation.
No abstract available
Cognitive psychology is the science of human knowledge: how people perceive, acquire, memorize, think, and comprehend. Psychological processes shape every action and state of the human body, and problematic psychological states include mental disorders such as depression, stress, anxiety, and inferiority complex, which can lead to memory loss. This paper introduces a cognitive psychological managing framework using artificial intelligence (CPMF-AI). The proposed framework forecasts psychological indicators of the human brain for practical well-being. Four methods monitor memory power, stress, and other mental disorders: distant neural systems (DNS), convolutional psychology tracking systems (CPTS), intelligent neural systems (INS), and memory-building strategies (MBS). Besides language, physical aspects play a vital part in human–robot interaction (HRI) and distinguish it from more limited forms of HRI communication. These methodologies are integrated into four case studies that detect neural passage systems for monitoring mental issues. Simulation analysis helps enhance the framework’s accuracy and minimize its error rate, and the proposed cognitive technology compares favorably with existing methods.
Artificial intelligence (AI) systems are increasingly designed with affective characteristics. Emotionalized AI machines simulate human emotions to provide emotional assistance to human beings in various areas of life, and in doing so are reshaping communication. Both the cognitive and affective aspects of AI technology are widely adopted today because of their positive impact on accomplishing tasks. The critical question of what AI advancement implies for emotional human communication is the crux of this study, which therefore inquires into the implications of AI systems for innate emotional human interaction, an area that has received little attention in communication research. Using a conceptual research approach, 186 scientific articles were retrieved from the EBSCO and Google databases, of which 47 met the inclusion criteria derived from the study objective; data from these secondary sources were critically analyzed. The study is anchored in the theoretical foundation of Diffusion of Innovation theory. It argues that although the advancement of AI systems brings many positive values, intrinsic human-to-human emotional communication is a greater value that should not be endangered or substituted by technology. The paper therefore recommends a conscious effort by researchers and manufacturers of AI systems to strictly abide by ethical standards in order to preserve this innate value in the human person.
No abstract available
The articles in this special section examine human performance and design in engineering psychology. Cognitive computing has broad horizons, covering different characteristics of cognition. Cognitive science is an interdisciplinary, scientific study of human reasoning, emotion, language, perception, attention, and memory, whereas artificial intelligence (AI) explores the design of computers and software capable of intelligent behavior. The integration of cognitive science and AI offers a deeper understanding of human cognition and communication, and creative and technical skills apply this knowledge in AI solutions and applications in engineering psychology.
The article is dedicated to a comprehensive analysis of artificial intelligence as a phenomenon that is increasingly entering the sphere of human consciousness, acting not only as a tool but also as a new type of communication intermediary. The study focuses on the underlying psychological mechanisms activated through interaction with algorithmic systems, particularly emphasising changes in cognitive processes, the delegation of ethical decision-making, the erosion of reflexivity, and the emergence of emotional dependency. The understanding of AI as a source of cognitive and moral influence on human consciousness goes beyond purely technological issues and moves into the realm of psychoanthropological discourse, which allows us to reveal the deep nature of the latest changes in thinking, communication and behaviour. The article argues that digital interaction with neural networks activates archetypal representations and mechanisms of projection, which lead to the perception of AI as a subject capable of empathy, moral judgement, and support. It is this misconception that creates the risk of losing critical thinking, moving from reflective cognition to automated perception, and moral infantilisation of the individual. Particular emphasis is placed on the issue of AI anthropomorphism, which, by penetrating the psycho-emotional sphere, transforms conceptions of interpersonal relationships, communicative reciprocity, and the boundaries of the human «self». It is pointed out that such interaction gradually replaces authentic dialogue and complicates the ability to ethical doubt, reflection and responsible choice. The need is highlighted for the development of new models of digital hygiene that encompass not only technical skills, but also psychological and ethical literacy. The study uses the methods of critical analysis, conceptual generalisation of interdisciplinary approaches, and an associative experiment as a means of revealing hidden attitudes of consciousness towards the image of AI. The conclusion is substantiated that interaction with a technologically simulated interlocutor constitutes not only a challenge to the reconsideration of the structure of thought but also an indicator of a shift in the value paradigm that defines the mode of human existence in the digital age.
Our world is being increasingly pervaded by intelligent robots with varying degrees of autonomy. To seamlessly integrate themselves in our society, these machines should possess the ability to navigate the complexities of our daily routines even in the absence of a human's direct input. In other words, we want these robots to understand the intentions of their partners with the purpose of predicting the best way to help them. In this paper, we present CASPER (Cognitive Architecture for Social Perception and Engagement in Robots): a symbolic cognitive architecture that uses qualitative spatial reasoning to anticipate the pursued goal of another agent and to calculate the best collaborative behavior. This is performed through an ensemble of parallel processes that model a low-level action recognition and a high-level goal understanding, both of which are formally verified. We have tested this architecture in a simulated kitchen environment and the results we have collected show that the robot is able to both recognize an ongoing goal and to properly collaborate towards its achievement. This demonstrates a new use of Qualitative Spatial Relations applied to the problem of intention reading in the domain of human-robot interaction.
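The core idea behind CASPER, inferring a partner's goal from qualitative spatial relations, can be illustrated with a toy sketch (not the CASPER implementation; the scene, thresholds, and function names below are illustrative assumptions): an object toward which the agent's distance keeps shrinking is the likely goal.

```python
import math

def qualitative_distance(p, q, near=1.0, medium=3.0):
    """Map a metric distance onto qualitative bins, as QSR approaches do."""
    d = math.dist(p, q)
    if d <= near:
        return "near"
    if d <= medium:
        return "medium"
    return "far"

def infer_goal(trajectory, objects):
    """Score each object by the fraction of steps that moved the agent closer."""
    scores = {}
    for name, pos in objects.items():
        dists = [math.dist(p, pos) for p in trajectory]
        closer = sum(1 for a, b in zip(dists, dists[1:]) if b < a)
        scores[name] = closer / max(len(dists) - 1, 1)
    return max(scores, key=scores.get), scores

# Agent walks from (0, 0) toward the stove at (4, 0), past the fridge.
path = [(0, 0), (1, 0), (2, 0), (3, 0), (3.5, 0)]
objects = {"fridge": (0, 2), "stove": (4, 0)}
goal, scores = infer_goal(path, objects)
print(goal)                                    # stove
print(qualitative_distance((3.5, 0), (4, 0)))  # near
```

A real system such as CASPER layers formally verified low-level action recognition under this kind of high-level goal scoring; the sketch only captures the distance-trend heuristic.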
Rapid individual cognitive phenotyping holds the potential to revolutionize domains as wide-ranging as personalized learning, employment practices, and precision psychiatry. Going beyond limitations imposed by traditional lab-based experiments, new efforts have been underway towards greater ecological validity and participant diversity to capture the full range of individual differences in cognitive abilities and behaviors across the general population. Building on this, we developed Skill Lab, a novel game-based tool that simultaneously assesses a broad suite of cognitive abilities while providing an engaging narrative. Skill Lab consists of six mini-games as well as 14 established cognitive ability tasks. Using a popular citizen science platform (N = 10725), we conducted a comprehensive validation in the wild of a game-based cognitive assessment suite. Based on the game and validation task data, we constructed reliable models to simultaneously predict eight cognitive abilities based on the users' in-game behavior. Follow-up validation tests revealed that the models can discriminate nuances contained within each separate cognitive ability as well as capture a shared main factor of generalized cognitive ability. Our game-based measures are five times faster to complete than the equivalent task-based measures and replicate previous findings on the decline of certain cognitive abilities with age in our large cross-sectional population sample (N = 6369). Taken together, our results demonstrate the feasibility of rapid in-the-wild systematic assessment of cognitive abilities as a promising first step towards population-scale benchmarking and individualized mental health diagnostics.
Wayfinding, the ability to recall the environment and navigate through it, is an essential cognitive skill relied upon almost every day in a person's life. A crucial component of wayfinding is the construction of cognitive maps, mental representations of the environments through which a person travels. Age, disease or injury can severely affect cognitive mapping, making assessment of this basic survival skill particularly important to clinicians and therapists. Cognitive mapping has also been the focus of decades of basic research by cognitive psychologists. Both communities have evolved a number of techniques for assessing cognitive mapping ability. We present the Cognitive Map Probe (CMP), a new computerized tool for assessment of cognitive mapping ability that increases consistency and promises improvements in flexibility, accessibility, sensitivity and control. The CMP uses a tangible user interface that affords spatial manipulation. We describe the design of the CMP, and find that it is sensitive to factors known to affect cognitive mapping performance in extensive experimental testing.
This paper introduces a cognitive architecture for a humanoid robot to engage in a proactive, mixed-initiative exploration and manipulation of its environment, where the initiative can originate from both the human and the robot. The framework, based on a biologically-grounded theory of the brain and mind, integrates a reactive interaction engine, a number of state-of-the-art perceptual and motor learning algorithms, as well as planning abilities and an autobiographical memory. The architecture as a whole drives the robot behavior to solve the symbol grounding problem, acquire language capabilities, execute goal-oriented behavior, and express a verbal narrative of its own experience in the world. We validate our approach in human-robot interaction experiments with the iCub humanoid robot, showing that the proposed cognitive architecture can be applied in real time within a realistic scenario and that it can be used with naive users.
Jackendoff (2002) posed four challenges that linguistic combinatoriality and rules of language present to theories of brain function. The essence of these problems is the question of how to neurally instantiate the rapid construction and transformation of the compositional structures that are typically taken to be the domain of symbolic processing. He contended that typical connectionist approaches fail to meet these challenges and that the dialogue between linguistic theory and cognitive neuroscience will be relatively unproductive until the importance of these problems is widely recognised and the challenges answered by some technical innovation in connectionist modelling. This paper claims that a little-known family of connectionist models (Vector Symbolic Architectures) are able to meet Jackendoff's challenges.
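The Vector Symbolic Architectures referenced above compose structures by binding role and filler vectors; in the Holographic Reduced Representation style this is circular convolution, inverted (approximately) by circular correlation. A minimal pure-Python sketch, with dimensionality and seed as illustrative assumptions:

```python
import random

D = 256
random.seed(7)

def rand_vec():
    """Random HRR-style vector: components ~ N(0, 1/D)."""
    return [random.gauss(0.0, 1.0 / D ** 0.5) for _ in range(D)]

def bind(a, b):
    """Circular convolution: c[i] = sum_j a[j] * b[(i - j) mod D]."""
    return [sum(a[j] * b[(i - j) % D] for j in range(D)) for i in range(D)]

def unbind(c, a):
    """Circular correlation: convolve with the involution of a,
    which approximately inverts bind for near-unitary vectors."""
    inv = [a[(-i) % D] for i in range(D)]
    return bind(c, inv)

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = sum(x * x for x in u) ** 0.5
    nv = sum(x * x for x in v) ** 0.5
    return dot / (nu * nv)

role, filler, other = rand_vec(), rand_vec(), rand_vec()
trace = bind(role, filler)
recovered = cosine(unbind(trace, role), filler)  # high: filler recoverable
foil = cosine(unbind(trace, role), other)        # near zero: unrelated vector
print(round(recovered, 2), round(foil, 2))
```

The key property Jackendoff's challenges demand is visible here: the bound trace is a fixed-width vector, yet the filler can be queried back out of it, so compositional structure survives superposition.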
The intellectual organization of the sciences cannot be appreciated sufficiently unless the cognitive dimension is considered as an independent source of variance. Cognitive structures interact and co-construct the organization of scholars and discourses into research programs, specialties, and disciplines. In the sociology of scientific knowledge and the sociology of translation, these heterogeneous sources of variance have been homogenized a priori in the concepts of practices and actor-networks. Practices and actor-networks, however, can be explained in terms of the self-organization of the cognitive code in scientific communication. The code selects knowledge claims by organizing them operationally in the various discourses; the claims can thus be stabilized and potentially globalized. Both the selecting codes and the variation in the knowledge claims remain constructed, but the different sub-dynamics can be expected to operate asymmetrically and to update with other frequencies.
Artificial intelligence is increasingly embedded in human decision-making, where it can either enhance human reasoning or induce excessive cognitive dependence. This paper introduces a conceptual and mathematical framework for distinguishing cognitive amplification, in which AI improves hybrid human-AI performance while preserving human expertise, from cognitive delegation, in which reasoning is progressively outsourced to AI systems. To characterize these regimes, we define a set of operational metrics: the Cognitive Amplification Index (CAI*), the Dependency Ratio (D), the Human Reliance Index (HRI), and the Human Cognitive Drift Rate (HCDR). Together, these quantities provide a low-dimensional metric space for evaluating not only whether human-AI systems achieve genuine synergistic performance, but also whether such performance is cognitively sustainable for the human component over time. The framework highlights a central design tension in human-AI systems: maximizing short-term hybrid capability does not necessarily preserve long-term human cognitive competence. We therefore argue that human-AI systems should be designed under a cognitive sustainability constraint, such that gains in hybrid performance do not come at the cost of degradation in human expertise.
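The abstract names CAI*, D, HRI, and HCDR without defining them, so the formulas below are purely illustrative stand-ins, not the paper's definitions: a dependency ratio as the share of decisions delegated to the AI, an amplification index as the relative hybrid gain over the human-alone baseline, and a drift rate as the per-session change in unaided accuracy.

```python
def dependency_ratio(decisions):
    """Fraction of decisions in which the human adopted the AI's answer."""
    delegated = sum(1 for d in decisions if d == "ai")
    return delegated / len(decisions)

def amplification_index(hybrid_acc, human_acc):
    """Relative gain of the hybrid team over the human working alone."""
    return (hybrid_acc - human_acc) / human_acc

def drift_rate(unaided_accuracies):
    """Mean per-session change in human-alone accuracy (negative = decay)."""
    diffs = [b - a for a, b in zip(unaided_accuracies, unaided_accuracies[1:])]
    return sum(diffs) / len(diffs)

print(dependency_ratio(["ai", "human", "ai", "ai"]))  # 0.75
print(amplification_index(0.9, 0.75))                 # 0.2
print(drift_rate([0.80, 0.78, 0.74]))                 # about -0.03 per session
```

Under these toy definitions, the paper's design tension is concrete: a team can show high amplification while the drift rate is negative, i.e., short-term hybrid gains alongside eroding human competence.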
Recent research has revealed a considerable lack of reliability in user feedback when interacting with adaptive systems, often denoted user noise or human uncertainty. This lack of reliability has striking impacts on the assessment of adaptive systems and personalisation approaches. Research on this topic usually takes a strongly system-centric view in which user variation is undesirable and is modelled with an eye to eliminating it; the possibility of extracting additional information from it has received insufficient consideration so far. In this contribution we draw on the neuroscientific theory of the Bayesian brain to develop novel user models with the power of turning the variability of user behaviour into additional information for improving recommendation and personalisation. To this end, we first introduce an adaptive model in which populations of neurons provide an estimate of the feedback to be submitted. Subsequently, we present various decoder functions with which neuronal activity can be translated into quantitative decisions. The interplay of the cognition model and the decoder functions leads to different model-based properties of decision-making. This helps associate users with different clusters on the basis of their individual neural characteristics and thinking patterns. By means of user experiments and simulations, we show that this information can be used to improve standard collaborative filtering.
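The population-coding idea described above can be sketched in a few lines (tuning-curve shapes, noise level, and names are illustrative assumptions, not the paper's model): a rating in [1, 5] drives neurons with Gaussian tuning curves, and a decoder turns the noisy activity back into a point estimate by a tuning-weighted average.

```python
import math
import random

random.seed(1)
PREFERRED = [1 + 4 * i / 19 for i in range(20)]  # preferred ratings across 1..5
WIDTH = 0.6                                      # tuning-curve width

def population_response(rating, noise=0.05):
    """Noisy Gaussian tuning-curve activity for one submitted rating."""
    return [math.exp(-((rating - p) ** 2) / (2 * WIDTH ** 2))
            + random.gauss(0, noise) for p in PREFERRED]

def decode(activity):
    """Population-vector decoder: activity-weighted mean preferred rating."""
    pos = [(a, p) for a, p in zip(activity, PREFERRED) if a > 0]
    total = sum(a for a, _ in pos)
    return sum(a * p for a, p in pos) / total

estimate = decode(population_response(4.0))
print(round(estimate, 1))  # close to the true rating of 4.0
```

The spread of repeated decodes for the same true rating is exactly the "user noise" the paper proposes to exploit: its structure, not just its mean, carries information about the user.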
Attempts to import dual-system descriptions of System-1 and System-2 into AI have been hindered by a lack of clarity over their distinction. We address this and other issues by situating System-1 and System-2 within the Common Model of Cognition. Results show that what are thought to be distinctive characteristics of System-1 and 2 instead form a spectrum of cognitive properties. The Common Model provides a comprehensive vision of the computational units involved in System-1 and System-2, their underlying mechanisms, and the implications for learning, metacognition, and emotion.
As computational power has continued to increase, and sensors have become more accurate, the corresponding advent of systems that are at once cognitive and immersive has arrived. These cognitive and immersive systems (CAISs) fall squarely into the intersection of AI with HCI/HRI: such systems interact with and assist the human agents that enter them, in no small part because such systems are infused with AI able to understand and reason about these humans and their knowledge, beliefs, goals, communications, plans, etc. We herein explain our approach to engineering CAISs. We emphasize the capacity of a CAIS to develop and reason over a "theory of the mind" of its human partners. This capacity entails that the AI in question has a sophisticated model of the beliefs, knowledge, goals, desires, emotions, etc. of these humans. To accomplish this engineering, a formal framework of very high expressivity is needed. In our case, this framework is a cognitive event calculus, a particular kind of quantified multi-operator modal logic, and a matching high-expressivity automated reasoner and planner. To explain, advance, and to a degree validate our approach, we show that a calculus of this type satisfies a set of formal requirements, and can enable a CAIS to understand a psychologically tricky scenario couched in what we call the cognitive polysolid framework (CPF). We also formally show that a room that satisfies these requirements can have a useful property we term expectation of usefulness. CPF, a sub-class of cognitive microworlds, includes machinery able to represent and plan over not merely blocks and actions (such as seen in the primitive "blocks worlds" of old), but also over agents and their mental attitudes about both other agents and inanimate objects.
This study investigates factors influencing employees' perceptions of the usefulness of Business Process Management Systems (BPMS) in commercial settings. It explores the roles of system dependency, system quality, and the quality of information and knowledge in the adoption and use of BPMS. Data were collected using a structured questionnaire from end-users in various firms and analyzed with Partial Least Squares (PLS). The survey evaluated perceptions of service quality, input quality, system attributes, and overall system quality. The findings indicate that service quality, input quality, and specific system attributes significantly influence perceived system quality, while system dependency and information quality are predictors of perceived usefulness. The results highlight the importance of user training, support, and high-quality information in enhancing satisfaction with BPMS. This research offers empirical evidence on the factors impacting user perceptions and acceptance, emphasizing the need for user-centric approaches in BPMS.
There has been an increasing interest in inferring some personality traits from users and players in social networks and games, respectively. This goes beyond classical sentiment analysis, and also much further than customer profiling. The purpose here is to have a characterisation of users in terms of personality traits, such as openness, conscientiousness, extraversion, agreeableness, and neuroticism. While this is an incipient area of research, we ask the question of whether cognitive abilities, and intelligence in particular, are also measurable from user profiles. However, we pose the question as broadly as possible in terms of subjects, in the context of universal psychometrics, including humans, machines and hybrids. Namely, in this paper we analyse the following question: is it possible to measure the intelligence of humans and (non-human) bots in a social network or a game just from their user profiles, i.e., by observation, without the use of interactive tests, such as IQ tests, the Turing test or other more principled machine intelligence tests?
Within the context of human-robot interaction (HRI), Theory of Mind (ToM) is intended to serve as a user-friendly backend to the interface of robotic systems, enabling robots to infer and respond to human mental states. When integrated into robots, ToM allows them to adapt their internal models to users' behaviors, enhancing the interpretability and predictability of their actions. Similarly, Explainable Artificial Intelligence (XAI) aims to make AI systems transparent and interpretable, allowing humans to understand and interact with them effectively. Since ToM in HRI serves related purposes, we propose to consider ToM as a form of XAI and evaluate it through the eValuation XAI (VXAI) framework and its seven desiderata. This paper identifies a critical gap in the application of ToM within HRI, as existing methods rarely assess the extent to which explanations correspond to the robot's actual internal reasoning. To address this limitation, we propose to integrate ToM within XAI frameworks. By embedding ToM principles inside XAI, we argue for a shift in perspective, as current XAI research focuses predominantly on the AI system itself and often lacks user-centered explanations. Incorporating ToM would enable a change in focus, prioritizing the user's informational needs and perspective.
Understanding each other is the key to success in collaboration. For humans, attributing mental states to others, the theory of mind, provides the crucial advantage. We argue for formulating human--AI interaction as a multi-agent problem, endowing AI with a computational theory of mind to understand and anticipate the user. To differentiate the approach from previous work, we introduce a categorisation of user modelling approaches based on the level of agency learnt in the interaction. We describe our recent work in using nested multi-agent modelling to formulate user models for multi-armed bandit based interactive AI systems, including a proof-of-concept user study.
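In the categorisation above, the simplest end of the spectrum treats the user as a static reward source. A minimal sketch of that baseline for the bandit-based interactive setting (a Beta-Bernoulli Thompson sampler over items; all names and the simulated user are illustrative assumptions, not the paper's nested model):

```python
import random

random.seed(3)

class ThompsonRecommender:
    """Low-agency user model: feedback is treated as i.i.d. arm rewards."""

    def __init__(self, n_items):
        self.wins = [1] * n_items    # Beta(1, 1) prior per item
        self.losses = [1] * n_items

    def recommend(self):
        # Sample a plausible like-rate per item; recommend the argmax.
        samples = [random.betavariate(w, l)
                   for w, l in zip(self.wins, self.losses)]
        return max(range(len(samples)), key=samples.__getitem__)

    def feedback(self, item, liked):
        if liked:
            self.wins[item] += 1
        else:
            self.losses[item] += 1

# Simulated user who truly likes item 2 most of the time.
true_like = [0.2, 0.4, 0.9]
rec = ThompsonRecommender(3)
for _ in range(500):
    item = rec.recommend()
    rec.feedback(item, random.random() < true_like[item])
print(rec.wins)  # pull counts concentrate on item 2
```

The paper's point is precisely what this baseline misses: a real user adapts to the recommendations, so nested multi-agent modelling attributes agency to the user instead of treating these rewards as fixed.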
Self-disclosure is important to help us feel better, yet is often difficult. This difficulty can arise from how we think people are going to react to our self-disclosure. In this workshop paper, we briefly discuss self-disclosure to conversational user interfaces (CUIs) in relation to various social cues. We then discuss how expressions of uncertainty or representation of a CUI's reasoning could help encourage self-disclosure, by making a CUI's intended "theory of mind" more transparent to users.
Originating in psychology, Theory of Mind (ToM) has attracted significant attention across multiple research communities, especially logic, economics, and robotics. Most psychological work does not aim at formalizing its central concepts, namely goals, intentions, and beliefs, to automate a ToM-based computational process, which, by contrast, has been extensively studied by logicians. In this paper, we offer a different perspective by proposing a computational framework viewed through the lens of game theory. On the one hand, the framework prescribes how to make boundedly rational decisions while maintaining a theory of mind about others (and recursively, each of the others holding a theory of mind about the rest); on the other hand, it employs statistical techniques and approximate solutions to retain computability of the inherent computational problem.
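The recursive structure this abstract describes (an agent holding a theory of mind about others, who in turn model the rest) resembles level-k reasoning with quantal, i.e. boundedly rational, responses. The sketch below is an illustrative reconstruction, not the paper's framework; the payoff matrices and the `rationality` parameter are assumptions.

```python
import math

def softmax(utilities, rationality=4.0):
    """Quantal (boundedly rational) choice: softmax over expected utilities."""
    exps = [math.exp(rationality * u) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

def transpose(m):
    return [list(col) for col in zip(*m)]

def level_k_policy(u_self, u_other, k, rationality=4.0):
    """Level-k recursion: a level-k agent soft-best-responds to a
    level-(k-1) model of the other agent; level 0 acts uniformly.
    u_self[i][j] is my payoff when I play i and the other plays j."""
    n = len(u_self)
    if k == 0:
        return [1.0 / n] * n
    # Model the other agent one level down, with the roles swapped.
    opp = level_k_policy(transpose(u_other), transpose(u_self), k - 1, rationality)
    expected = [sum(u_self[i][j] * opp[j] for j in range(len(opp)))
                for i in range(n)]
    return softmax(expected, rationality)
```

In a symmetric coordination game, deeper recursion concentrates probability on the mutually preferred action, since each level soft-best-responds to an increasingly decisive model of the other.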
The vertebrate motor system employs dimensionality-reducing strategies to limit the complexity of movement coordination, for efficient motor control. But when environments are dense with hidden action-outcome contingencies, movement complexity can promote behavioral innovation. Humans, perhaps uniquely, may infer the presence of hidden environmental dynamics from social cues, by drawing upon computational mechanisms shared with Theory of Mind. This proposed "Theory of Environment" supports behavioral innovation by expanding the dimensionality of motor exploration.
Socially assistive robots are increasingly being explored to improve the engagement of older adults and people with disability in health and well-being-related exercises. However, even though people have various physical conditions, most prior work on social robot exercise coaching systems has utilized generic, predefined feedback, and the deployment of these systems remains a challenge. In this paper, we present our work of iteratively engaging therapists and post-stroke survivors to design, develop, and evaluate a social robot exercise coaching system for personalized rehabilitation. Through interviews with therapists, we designed how this system interacts with the user and then developed an interactive social robot exercise coaching system. This system integrates a neural network model with a rule-based model to automatically monitor and assess patients' rehabilitation exercises, and it can be tuned with an individual patient's data to generate real-time, personalized corrective feedback for improvement. With a dataset of rehabilitation exercises from 15 post-stroke survivors, we demonstrated that our system significantly improves its performance in assessing patients' exercises when tuned with a held-out patient's data. In addition, our real-world evaluation study showed that our system can adapt to new participants, achieving 0.81 average performance in assessing their exercises, which is comparable to the experts' agreement level. We further discuss the potential benefits and limitations of our system in practice.
The ability to interpret the mental state of another agent based on its behavior, also called Theory of Mind (ToM), is crucial for humans in any kind of social interaction. Artificial systems, such as intelligent assistants, would also greatly benefit from such mentalizing capabilities. However, humans and systems alike are bound by limitations in their available computational resources. This raises the need for satisficing mentalizing, reconciling accuracy and efficiency in mental state inference that is good enough for a given situation. In this paper, we present different Bayesian models of ToM reasoning and evaluate them on actual human behavior data that were generated under different kinds of uncertainty. We propose a Switching approach that combines specialized models, each embodying simplifying presumptions, to achieve more satisficing mentalizing than a Full Bayesian ToM model.
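The switching idea, falling back to a full Bayesian update only when a cheap specialised model is not decisive, can be sketched on a toy one-dimensional goal-inference task. This is an illustrative reconstruction, not the paper's models; the distance-based likelihood and the `ambiguity` threshold are assumptions.

```python
import math

def bayes_goal_posterior(position, goals, prior, beta=1.0):
    """Full Bayesian ToM: P(goal | position), assuming the observed agent
    noisily moves toward its goal (likelihood falls off with distance)."""
    likes = [math.exp(-beta * abs(position - g)) for g in goals]
    evidence = sum(p * l for p, l in zip(prior, likes))
    return [p * l / evidence for p, l in zip(prior, likes)]

def heuristic_posterior(position, goals):
    """Cheap specialised model: deterministically pick the nearest goal."""
    nearest = min(range(len(goals)), key=lambda i: abs(position - goals[i]))
    return [1.0 if i == nearest else 0.0 for i in range(len(goals))]

def switching_posterior(position, goals, prior, ambiguity=2.0):
    """Satisficing switch: use the cheap model when the observation is
    unambiguous (clearly nearer one goal), else the full Bayesian model."""
    d = sorted(abs(position - g) for g in goals)
    if d[1] - d[0] >= ambiguity:
        return heuristic_posterior(position, goals)
    return bayes_goal_posterior(position, goals, prior)
```

The switch trades a small accuracy loss in clear-cut cases for a large saving in computation, which is the satisficing trade-off the abstract describes.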
Theory of mind (ToM) enables AI systems to infer agents' hidden goals and mental states, but existing approaches focus mainly on small, human-understandable gridworld spaces. We introduce HiVAE, a hierarchical variational architecture that scales ToM reasoning to realistic spatiotemporal domains. Inspired by the belief-desire-intention structure of human cognition, our three-level VAE hierarchy achieves substantial performance improvements on a 3,185-node campus navigation task. However, we identify a critical limitation: while our hierarchical structure improves prediction, the learned latent representations lack explicit grounding in actual mental states. We propose self-supervised alignment strategies and present this work to solicit community feedback on grounding approaches.
Facial Expression Recognition (FER) plays a foundational role in enabling AI systems to interpret emotional nuances, a critical aspect of affective Theory of Mind (ToM). However, existing models often struggle with poor calibration and a limited capacity to capture emotional intensity and complexity. To address this, we propose Ranking the Emotional Nuance for Theory of Mind (Rank-O-ToM), a framework that leverages ordinal ranking to align confidence levels with the emotional spectrum. By incorporating synthetic samples reflecting diverse affective complexities, Rank-O-ToM enhances the nuanced understanding of emotions, advancing AI's ability to reason about affective states.
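Ordinal ranking for aligning confidence with emotional intensity can be illustrated with a pairwise margin loss: a face annotated with higher intensity should receive a higher score than one with lower intensity. This is a generic sketch, not Rank-O-ToM's actual objective; the margin value and the score/intensity encodings are assumptions.

```python
def ordinal_rank_loss(scores, intensities, margin=0.1):
    """Pairwise margin ranking loss over all ordered pairs: penalise any
    pair where a higher-intensity sample does not out-score a
    lower-intensity one by at least `margin`."""
    loss = 0.0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if intensities[i] > intensities[j]:
                loss += max(0.0, margin - (scores[i] - scores[j]))
    return loss
```

A model trained to drive this loss to zero produces scores whose ordering matches the annotated emotional spectrum, which is the calibration property the abstract targets.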
While generative artificial intelligence (GenAI) increasingly transforms academic environments, a critical gap exists in understanding and mitigating human biases in AI interactions, such as anchoring and confirmation bias. This position paper advocates for metacognitive AI literacy interventions to help university students critically engage with AI and address biases across human-AI interaction workflows. The paper presents the importance of considering (1) metacognitive support with deliberate friction focusing on human bias; (2) bi-directional human-AI interaction intervention addressing both input formulation and output interpretation; and (3) adaptive scaffolding that responds to diverse user engagement patterns. These frameworks are illustrated through ongoing work on "DeBiasMe," AIED (AI in Education) interventions designed to enhance awareness of cognitive biases while empowering user agency in AI interactions. The paper invites multiple stakeholders to engage in discussions on design and evaluation methods for scaffolding mechanisms, bias visualization, and analysis frameworks. This position contributes to the emerging field of AI-augmented learning by emphasizing the critical role of metacognition in helping students navigate the complex interaction between human, statistical, and systemic biases in AI use, and by highlighting that cognitive adaptation to AI systems must be explicitly integrated into comprehensive AI literacy frameworks.
In this work, we present ICEBOAT, an interactive tool that enables automotive UX experts to explore how users interact with In-Vehicle Information Systems. Based on large naturalistic driving data continuously collected from production line vehicles, ICEBOAT visualizes drivers' interactions and driving behavior on different levels of detail. Hence, it allows UX experts to easily compare different user flows based on performance- and safety-related metrics.
A single digital newsletter usually contains many messages (regions). Users' reading time spent on, and read level (skip/skim/read-in-detail) of, each message are important for platforms to understand their users' interests, personalize their contents, and make recommendations. Based on accurate but expensive-to-collect eye-tracker-recorded data, we built models that predict per-region reading time based on easy-to-collect JavaScript browser tracking data. With eye tracking, we collected 200k ground-truth datapoints on participants reading news on browsers. Then we trained machine learning and deep learning models to predict message-level reading time based on user interactions like mouse position, scrolling, and clicking. We reached 27% percentage error in reading time estimation with a two-tower neural network based on user interactions only, against the eye-tracking ground truth data, while the heuristic baselines have around 46% percentage error. We also discovered the benefits of replacing per-session models with per-timestamp models, and of adding user pattern features. We conclude with suggestions on developing message-level reading estimation techniques based on available data.
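A minimal stand-in for the interaction-based reading-time model above, replacing the paper's two-tower network with a least-squares fit over hand-picked interaction features and reporting the same percentage-error metric. The feature names and synthetic data are illustrative, not the paper's.

```python
def fit_linear(xs, ys, lr=0.02, steps=3000):
    """Tiny SGD least-squares fit mapping an interaction feature vector
    (e.g. [mouse_dwell_seconds, scroll_pause_count]) to per-region
    reading time. A stand-in for the paper's two-tower network."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(steps):
        for x, y in zip(xs, ys):
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def mape(preds, ys):
    """Mean absolute percentage error, the metric reported in the paper."""
    return sum(abs(p - y) / y for p, y in zip(preds, ys)) / len(preds)
```

On noiseless synthetic data the fit recovers the generating weights, so the percentage error approaches zero; real interaction logs are far noisier, which is why the paper reports 27% rather than near-zero error.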
Human-machine interfaces (HMI) facilitate communication between humans and machines, and their importance has increased in modern technology. However, traditional HMIs are often static and do not adapt to individual user preferences or behavior. Adaptive User Interfaces (AUIs) have become increasingly important in providing personalized user experiences. Machine learning techniques have gained traction in User Experience (UX) research to provide smart adaptations that can reduce user cognitive load. This paper presents an ongoing exploration of a method for generating adaptive user interfaces by analyzing user interactions and contextual data. It also provides an illustrative example using Markov chains to predict the next step for users interacting with an app for an industrial mixing machine. Furthermore, the paper conducts an offline evaluation of the approach, focusing on the precision of the recommendations. The study emphasizes the importance of incorporating user interactions and contextual data into the design of adaptive HMIs, while acknowledging the existing challenges and potential benefits.
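The Markov-chain next-step prediction mentioned above can be sketched directly from logged interaction sequences. The step names for the industrial mixing app are invented for illustration.

```python
from collections import Counter, defaultdict

def fit_markov(sequences):
    """Estimate a first-order Markov chain over UI steps from logged
    interaction sequences (step names are illustrative)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    return counts

def predict_next(counts, current):
    """Most likely next step given the current one; None if unseen."""
    if current not in counts:
        return None
    return counts[current].most_common(1)[0][0]
```

A recommendation is then simply the highest-count transition from the user's current step, which is what makes the approach cheap enough to evaluate offline for precision, as the paper does.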
Human cognition is constrained by processing limitations, leading to cognitive overload and inefficiencies in knowledge synthesis and decision-making. Large Language Models (LLMs) present an opportunity for cognitive augmentation, but their current reactive nature limits their real-world applicability. This position paper explores the potential of context-aware cognitive augmentation, where LLMs dynamically adapt to users' cognitive states and task environments to provide appropriate support. Through a think-aloud study in an exhibition setting, we examine how individuals interact with multi-modal information and identify key cognitive challenges in structuring, retrieving, and applying knowledge. Our findings highlight the need for AI-driven cognitive support systems that integrate real-time contextual awareness, personalized reasoning assistance, and socially adaptive interactions. We propose a framework for AI augmentation that seamlessly transitions between real-time cognitive support and post-experience knowledge organization, contributing to the design of more effective human-centered AI systems.
This study investigates the impact of the Degree of Interactivity on User Experience (UX) and social acceptability (SA) in Mobile Augmented Reality (MAR) applications. As AR technologies become more prevalent, understanding how varying levels of interactivity influence both user perception and social dynamics is crucial for their design and adoption. Two commercially available MAR applications, IKEA and Virtlo, which differ significantly in their interactivity levels, were used to conduct a user study. The study examines how body movements required for interaction with AR content affect both UX and SA, shedding light on users' comfort levels and potential social barriers in public settings. The findings suggest a complex relationship between interactivity, perceived usability, and social considerations, emphasizing the need for a balanced design approach. This research provides valuable insights into the development of future AR applications by addressing not only usability but also the broader social implications of AR interactions. By integrating social acceptability into traditional UX evaluations, this study highlights its significance in ensuring the seamless integration of AR technologies into everyday environments.
We present the first study to explore the use of out-of-turn interaction in websites. Out-of-turn interaction is a technique which empowers the user to supply unsolicited information while browsing. This approach helps flexibly bridge any mental mismatch between the user and the website, in a manner fundamentally different from faceted browsing and site-specific search tools. We built a user interface (Extempore) which accepts out-of-turn input via voice or text, and employed it in a US congressional website to determine whether users utilize out-of-turn interaction for information-finding tasks, and their rationale for doing so. The results indicate that users are adept at discerning when out-of-turn interaction is necessary in a particular task, and actively interleaved it with browsing. However, users found cascading information across information-finding subtasks challenging. Therefore, this work not only improves our understanding of out-of-turn interaction, but also suggests further opportunities to enrich browsing experiences for users.
User Experience (UX) professionals need to be able to analyze large amounts of usage data on their own to make evidence-based design decisions. However, the design process for In-Vehicle Information Systems (IVIS) lacks data-driven support and effective tools for visualizing and analyzing user interaction data. Therefore, we propose ICEBOAT, an interactive visualization tool tailored to the needs of automotive UX experts to effectively and efficiently evaluate driver interactions with IVISs. ICEBOAT visualizes telematics data collected from production line vehicles, allowing UX experts to perform task-specific analyses. Following a mixed-methods user-centered design (UCD) approach, we conducted an interview study (N=4) to extract the domain-specific information and interaction needs of automotive UX experts, and used a co-design approach (N=4) to develop an interactive analysis tool. Our evaluation (N=12) shows that ICEBOAT enables UX experts to efficiently generate knowledge that facilitates data-driven design decisions.
While user-modeling and recommender systems successfully utilize items like emails, news, and movies, they widely neglect mind-maps as a source for user modeling. We consider this a serious shortcoming since we assume user modeling based on mind maps to be equally effective as user modeling based on other items. Hence, millions of mind-mapping users could benefit from user-modeling applications such as recommender systems. The objective of this doctoral thesis is to develop an effective user-modeling approach based on mind maps. To achieve this objective, we integrate a recommender system in our mind-mapping and reference-management software Docear. The recommender system builds user models based on the mind maps, and recommends research papers based on the user models. As part of our research, we identify several variables relating to mind-map-based user modeling, and evaluate the variables' impact on user-modeling effectiveness with an offline evaluation, a user study, and an online evaluation based on 430,893 recommendations displayed to 4,700 users. We find, among other things, that the number of analyzed nodes, modification time, visibility of nodes, relations between nodes, and number of children and siblings of a node affect the effectiveness of user modeling. When all variables are combined in a favorable way, this novel approach achieves click-through rates of 7.20%, which is nearly twice as effective as the best baseline. In addition, we show that user modeling based on mind maps performs about as well as user modeling based on other items, namely the research articles users downloaded or cited. Our findings lead us to conclude that user modeling based on mind maps is a promising research field, and that developers of mind-mapping applications should integrate recommender systems into their applications. Such systems could create additional value for millions of mind-mapping users.
Increasingly complex and autonomous robots are being deployed in real-world environments with far-reaching consequences. High-stakes scenarios, such as emergency response or offshore energy platform and nuclear inspections, require robot operators to have clear mental models of what the robots can and can't do. However, operators are often not the original designers of the robots and thus, they do not necessarily have such clear mental models, especially if they are novice users. This lack of mental model clarity can slow adoption and can negatively impact human-machine teaming. We propose that interaction with a conversational assistant, who acts as a mediator, can help the user with understanding the functionality of remote robots and increase transparency through natural language explanations, as well as facilitate the evaluation of operators' mental models.
Background: A smartphone is a promising tool for daily cardiovascular measurement and mental stress monitoring. A smartphone camera-based PhotoPlethysmoGraphy (PPG) and a low-cost thermal camera can be used to create cheap, convenient and mobile monitoring systems. However, to ensure reliable monitoring results, a person has to remain still for several minutes while a measurement is being taken. This is very cumbersome and makes its use in real-life mobile situations quite impractical. Objective: We propose a system which combines PPG and thermography with the aim of improving cardiovascular signal quality and capturing stress responses quickly. Methods: Using a smartphone camera with a low-cost thermal camera added on, we built a novel system which continuously and reliably measures two different types of cardiovascular events: i) blood volume pulse and ii) vasoconstriction/dilation-induced temperature changes of the nose tip. 17 healthy participants, involved in a series of stress-inducing mental workload tasks, measured their physiological responses to stressors over a short window of time (20 seconds) immediately after each task. Participants reported their level of perceived mental stress using a 10-cm Visual Analogue Scale (VAS). We used normalized K-means clustering to reduce interpersonal differences in the self-reported ratings. For the instant stress inference task, we built novel low-level feature sets representing the variability of cardiovascular patterns. We then used the automatic feature learning capability of artificial Neural Networks (NN) to improve the mapping between the extracted set of features and the self-reported ratings. We compared our proposed method with existing hand-engineered feature-based machine learning methods. Results, Conclusions: ... due to limited space here, we refer to our manuscript.
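The per-participant normalisation plus k-means step used above to reduce interpersonal differences in VAS ratings can be sketched as follows. This is an illustrative reconstruction, assuming min-max normalisation and two stress clusters; the paper's exact normalisation may differ.

```python
def normalise(ratings):
    """Per-participant min-max normalisation of VAS stress ratings,
    reducing interpersonal differences in how the scale is used."""
    lo, hi = min(ratings), max(ratings)
    if hi == lo:
        return [0.5] * len(ratings)
    return [(r - lo) / (hi - lo) for r in ratings]

def kmeans_1d(values, iters=50):
    """Tiny two-cluster 1-D k-means to bin normalised ratings into
    low/high stress groups; returns the two cluster centres."""
    c_lo, c_hi = min(values), max(values)
    for _ in range(iters):
        lo_grp = [v for v in values if abs(v - c_lo) <= abs(v - c_hi)]
        hi_grp = [v for v in values if abs(v - c_lo) > abs(v - c_hi)]
        if lo_grp:
            c_lo = sum(lo_grp) / len(lo_grp)
        if hi_grp:
            c_hi = sum(hi_grp) / len(hi_grp)
    return c_lo, c_hi
```

Normalising before clustering means one participant's "8 out of 10" and another's "5 out of 10" can land in the same stress cluster if each value is high relative to that person's own range.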
Artificial intelligence chatbots have achieved unprecedented adoption, with millions now using these systems for emotional support and companionship in contexts of widespread social isolation and capacity-constrained mental health services. While some users report psychological benefits, concerning edge cases are emerging, including reports of suicide, violence, and delusional thinking linked to perceived emotional relationships with chatbots. To understand this new risk profile we need to consider the interaction between human cognitive and emotional biases, and chatbot behavioural tendencies such as agreeableness (sycophancy) and adaptability (in-context learning). We argue that individuals with mental health conditions face increased risks of chatbot-induced belief destabilization and dependence, owing to altered belief-updating, impaired reality-testing, and social isolation. Current AI safety measures are inadequate to address these interaction-based risks. To address this emerging public health concern, we need coordinated action across clinical practice, AI development, and regulatory frameworks.
Mental models play an important role in whether user interaction with intelligent systems, such as dialog systems, is successful. Adaptive dialog systems present the opportunity to align a dialog agent's behavior with heterogeneous user expectations. However, there has been little research into what mental models users form when interacting with a task-oriented dialog system, how these models affect users' interactions, or what role system adaptation can play in this process, making it challenging to avoid damage to the human-AI partnership. In this work, we collect a new publicly available dataset for exploring user mental models of information-seeking dialog systems. We demonstrate that users hold a variety of conflicting mental models about such systems, the validity of which directly impacts the success of their interactions and the perceived usability of the system. Furthermore, we show that adapting a dialog agent's behavior to better align with users' mental models, even when done implicitly, can improve perceived usability, dialog efficiency, and success. We therefore argue that implicit adaptation can be a valid strategy for task-oriented dialog systems, so long as developers first have a solid understanding of users' mental models.
Addressing the critical shortage of mental health resources for effective screening, diagnosis, and treatment remains a significant challenge. This scarcity underscores the need for innovative solutions, particularly in enhancing the accessibility and efficacy of therapeutic support. Embodied agents with advanced interactive capabilities emerge as a promising and cost-effective supplement to traditional caregiving methods. Crucial to these agents' effectiveness is their ability to simulate non-verbal behaviors, like backchannels, that are pivotal in establishing rapport and understanding in therapeutic contexts but remain under-explored. To improve the rapport-building capabilities of embodied agents, we annotated backchannel smiles in videos of intimate face-to-face conversations over topics such as mental health, illness, and relationships. We hypothesized that both speaker and listener behaviors affect the duration and intensity of backchannel smiles. Using cues from speech prosody and language along with the demographics of the speaker and listener, we found these features to contain significant predictors of the intensity of backchannel smiles. Based on our findings, we frame backchannel smile production in embodied agents as a generation problem. Our attention-based generative model suggests that listener information offers performance improvements over the baseline speaker-centric generation approach. Conditioned generation using the significant predictors of smile intensity provides statistically significant improvements in empirical measures of generation quality. Our user study, in which generated smiles were transferred to an embodied agent, suggests that an agent with backchannel smiles is perceived as more human-like and is an attractive alternative to an agent without them for non-personal conversations.
Understanding the quality of insight has become increasingly important with the trend of allowing users to post comments during visual exploration, yet approaches for qualifying insight are rare. This paper presents a case study to investigate the possibility of characterizing the quality of insight via the interactions performed. To do this, we devised the interaction of a visualization tool, MediSyn, for insight generation. MediSyn supports five types of interactions: selecting, connecting, elaborating, exploring, and sharing. We evaluated MediSyn with 14 participants by allowing them to freely explore the data and generate insights. We then extracted seven interaction patterns from their interaction logs and correlated the patterns with four aspects of insight quality. The results show the possibility of qualifying insights via interactions. Among other findings, exploration actions can lead to unexpected insights, and the drill-down pattern tends to increase the domain value of insights. A qualitative analysis shows that using domain knowledge to guide exploration can positively affect the domain value of derived insights. We discuss the study's implications, lessons learned, and future research opportunities.
The capability of GenAI-based chatbots, such as ChatGPT and Gemini, has expanded quickly in recent years, turning them into GenAI Chatbot Ecosystems. Yet, users' understanding of how such ecosystems work remains unknown. In this paper, we investigate users' mental models of how GenAI Chatbot Ecosystems work. This is an important question because users' mental models guide their behaviors, including making decisions that impact their privacy. Through 21 semi-structured interviews, we uncovered users' four mental models towards first-party (e.g., Google Gemini) and third-party (e.g., ChatGPT) GenAI Chatbot Ecosystems. These mental models centered around the role of the chatbot in the entire ecosystem. We further found that participants held a more consistent and simpler mental model towards third-party ecosystems than the first-party ones, resulting in higher trust and fewer concerns towards the third-party ecosystems. We discuss the design and policy implications based on our results.
Recommendation models are predominantly trained using implicit user feedback, since explicit feedback is often costly to obtain. However, implicit feedback, such as clicks, does not always reflect users' real preferences. For example, a user might click on a news article because of its attractive headline, but end up feeling uncomfortable after reading the content. In the absence of explicit feedback, such erroneous implicit signals may severely mislead recommender systems. In this paper, we propose MTRec, a novel sequential recommendation framework designed to align with real user preferences by uncovering their internal satisfaction on recommended items. Specifically, we introduce a mental reward model to quantify user satisfaction and propose a distributional inverse reinforcement learning approach to learn it. The learned mental reward model is then used to guide recommendation models to better align with users' real preferences. Our experiments show that MTRec brings significant improvements to a variety of recommendation models. We also deploy MTRec on an industrial short video platform and observe a 7 percent increase in average user viewing time.
We introduce Language World Models, a class of language-conditional generative models which interpret natural language messages by predicting latent codes of future observations. This provides a visual grounding of the message, similar to an enhanced observation of the world, which may include objects outside of the listening agent's field of view. We incorporate this "observation" into a persistent memory state, and allow the listening agent's policy to condition on it, akin to the relationship between memory and controller in a World Model. We show that this improves effective communication and task success in 2D gridworld speaker-listener navigation tasks. In addition, we develop two losses framed specifically for our model-based formulation to promote positive signalling and positive listening. Finally, because messages are interpreted in a generative model, we can visualize the model's beliefs to gain insight into how the communication channel is utilized.
Healthcare workers (HCWs) encounter challenges in hospitals, such as retrieving medical supplies quickly from crash carts, which could potentially result in medical errors and delays in patient care. Robotic crash carts (RCCs) have shown promise in assisting healthcare teams during medical tasks through guided object searches and task reminders. Limited exploration has been done to determine which communication modalities are most effective and least disruptive to patient care in real-world settings. To address this gap, we conducted a between-subjects experiment comparing the RCC's verbal and non-verbal communication of object search with a standard crash cart in resuscitation scenarios, to understand the impact of robot communication on workload and attitudes toward using robots in the workplace. Our findings indicate that verbal communication significantly reduced mental demand and effort compared to visual cues and to a traditional crash cart. Although frustration levels were slightly higher during collaborations with the robot than with a traditional cart, these research insights provide valuable implications for human-robot teamwork in high-stakes environments.
Scientists have traditionally limited the mechanisms of social cognition to one brain, but recent approaches claim that interaction also realizes cognitive work. Experiments under constrained virtual settings revealed that interaction dynamics implicitly guide social cognition. Here we show that embodied social interaction can be constitutive of agency detection and of experiencing another's presence. Pairs of participants moved their "avatars" along an invisible virtual line and could make haptic contact with three identical objects, two of which embodied the other's motions, but only one, the other's avatar, also embodied the other's contact sensor and thereby enabled responsive interaction. Co-regulated interactions were significantly correlated with identifications of the other's avatar and reports of the clearest awareness of the other's presence. These results challenge folk psychological notions about the boundaries of mind, but make sense from evolutionary and developmental perspectives: an extendible mind can offload cognitive work into its environment.
All online sharing systems gather data that reflects users' collective behaviour and their shared activities. This data can be used to extract different kinds of relationships, which can be grouped into layers, and which are basic components of the multidimensional social network proposed in the paper. The layers are created on the basis of two types of relations between humans, i.e. direct and object-based ones, which correspond respectively to social and semantic links between individuals. For better understanding of the complexity of the social network structure, layers and their profiles were identified and studied on two snapshots of the Flickr population taken at different points in time. Additionally, for each layer, a separate strength measure was proposed. The experiments on the Flickr photo sharing system revealed that the relationships between users result either from semantic links between objects they operate on or from social connections of these users. Moreover, the density of the social network increases over time. The second part of the study is devoted to building a social recommender system that supports the creation of new relations between users in a multimedia sharing system. Its main goal is to generate personalized suggestions that are continuously adapted to users' needs depending on the personal weights assigned to each layer in the multidimensional social network. The conducted experiments confirmed the usefulness of the proposed model.
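The layer-weighted scoring behind such a multidimensional social recommender can be sketched as follows; the layer names, tie strengths, and personal weights are illustrative assumptions, not the paper's data.

```python
def layered_score(layers, weights, user, candidate):
    """Combine per-layer tie strengths with personalised layer weights
    to score a candidate contact. `layers` maps a layer name to a dict
    of (user, candidate) -> strength; `weights` maps layer name to the
    user's personal weight for that layer."""
    return sum(weights.get(name, 0.0) * strengths.get((user, candidate), 0.0)
               for name, strengths in layers.items())

def recommend(layers, weights, user, candidates, top=3):
    """Rank candidate contacts by their layer-weighted score."""
    ranked = sorted(candidates,
                    key=lambda c: layered_score(layers, weights, user, c),
                    reverse=True)
    return ranked[:top]
```

Adapting the per-layer weights over time is what lets the suggestions track a user's shifting preference between, say, direct social ties and shared-tag similarity.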
The Memory-Centred Cognition perspective places an active association substrate at the heart of cognition, rather than as a passive adjunct. Consequently, it takes prediction and priming on the basis of prior experience to be inherent and fundamental aspects of processing. Social interaction is taken here to minimally require contingent and co-adaptive behaviours from the interacting parties. In this contribution, I seek to show how the memory-centred cognition approach to cognitive architectures can provide a means of addressing these functions. A number of example implementations are briefly reviewed, particularly focusing on multi-modal alignment as a function of experience-based priming. While further refinement of the theory, and of implementations based on it, is required, this approach provides an interesting alternative perspective on the foundations of cognitive architectures to support robots engaging in social interactions with humans.
We present a discovery-based, first-version, explicit model of social interaction that provides a basis for measuring the quality of interaction of a human user with a social robot. The two core elements of the social interaction model are engagement and co-regulation. Engagement emphasizes the qualitative nature of social interaction and the fact that a user needs to be drawn into the interaction with the robot. Co-regulation emphasizes the interaction process and the fact that a user and a robot need to be acting together. We argue that the quality of social interaction with a robot can be measured in terms of how efficiently engagement and co-regulation are established and maintained during the interaction and how satisfied the user is with the interaction.
AI-generated images are increasingly prevalent on social media, raising concerns about trust and authenticity. This study investigates how different levels of label detail (basic, moderate, maximum) and content stakes (high vs. low) influence user engagement with and perceptions of AI-generated images through a within-subjects experimental study with 105 participants. Our findings reveal that increasing label detail enhances user perceptions of label transparency but does not affect user engagement. However, content stakes significantly impact user engagement and perceptions, with users demonstrating higher engagement and trust in low-stakes images. These results suggest that social media platforms can adopt detailed labels to improve transparency without compromising user engagement, offering insights for effective labeling strategies for AI-generated content.
The rise of social media has fundamentally transformed how people engage in public discourse and form opinions. While these platforms offer unprecedented opportunities for democratic engagement, they have been implicated in increasing social polarization and the formation of ideological echo chambers. Previous research has primarily relied on observational studies of social media data or theoretical modeling approaches, leaving a significant gap in our understanding of how individuals respond to and are influenced by polarized online environments. Here we present a novel experimental framework for investigating polarization dynamics that allows human users to interact with LLM-based artificial agents in a controlled social network simulation. Through a user study with 122 participants, we demonstrate that this approach can successfully reproduce key characteristics of polarized online discourse while enabling precise manipulation of environmental factors. Our results provide empirical validation of theoretical predictions about online polarization, showing that polarized environments significantly increase perceived emotionality and group identity salience while reducing expressed uncertainty. These findings extend previous observational and theoretical work by providing causal evidence for how specific features of online environments influence user perceptions and behaviors. More broadly, this research introduces a powerful new methodology for studying social media dynamics, offering researchers unprecedented control over experimental conditions while maintaining ecological validity.
Social video platforms shape how people access information, while recommendation systems can narrow exposure and increase the risk of toxic interaction. Previous research has often examined text or users in isolation, overlooking the structural context in which such toxic interactions occur. Without considering who interacts with whom and around what content, it is difficult to explain why negative expressions cluster within particular communities. To address this issue, this study focuses on the Chinese social video platform Bilibili, incorporating video-level information as the environment for user expression and modeling users and videos in an interaction matrix. After normalization and dimensionality reduction, we perform separate clustering on both sides of the video-user interaction matrix with K-means. Cluster assignments facilitate comparisons of user behavior, including message length, posting frequency, and source (barrage and comment), as well as textual features such as sentiment and toxicity, and video attributes defined by uploaders. Such a clustering approach integrates structural ties with content signals to identify stable groups of videos and users. We find clear stratification in interaction style (message length, comment ratio) across user clusters, while sentiment and toxicity differences are weak or inconsistent across video clusters. Across video clusters, viewing volume exhibits a clear hierarchy, with higher-exposure groups concentrating more toxic expressions; for such groups, platforms should intervene promptly during periods of rapid growth. Across user clusters, comment ratio and message length form distinct hierarchies, and several clusters with longer, comment-oriented messages exhibit lower toxicity; for such groups, platforms should strengthen mechanisms that sustain rational dialogue and encourage engagement across topics.
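The clustering step above can be illustrated with a minimal pure-Python K-means over a toy user-video interaction matrix. The data and the choice of k are invented for illustration; the paper's actual pipeline also normalizes and reduces dimensionality before clustering.

```python
def kmeans(rows, k, iters=20):
    """Cluster `rows` (lists of floats) into k groups; returns labels."""
    centroids = [list(rows[i]) for i in range(k)]  # naive init: first k rows
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, r in enumerate(rows):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(r, centroids[c])),
            )
        # Update step: each centroid becomes the mean of its assigned rows.
        for c in range(k):
            members = [rows[i] for i in range(len(rows)) if labels[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy interaction matrix: 4 users x 3 videos (e.g. normalized view counts).
users = [
    [1.0, 0.9, 0.0],   # engages with videos 1-2
    [0.9, 1.0, 0.1],
    [0.0, 0.1, 1.0],   # engages with video 3
    [0.1, 0.0, 0.9],
]
labels = kmeans(users, k=2)
```

Clustering the transpose of the matrix the same way would group videos by their audiences, mirroring the two-sided clustering in the study.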
An increasing portion of modern socializing takes place via online social networks. Members of these communities often play distinct roles that can be deduced from observations of users' online activities. One such activity is the sharing of multimedia, the popularity of which can vary dramatically. Here we discuss our initial analysis of anonymized, scraped data from consenting Facebook users, together with associated demographic and psychological profiles. We present five clusters of users with common observed online behaviors, where these users also show correlated profile characteristics. Finally, we identify some common properties of the most popular multimedia content.
This paper deals with a fuzzy logic approach to modelling the behavioural classification of social news aggregation users. The structural peculiarities of the informational content of communities built on social news aggregations are explored. A formal model of a social news aggregation is developed, which represents users of the aggregation through fuzzy measures of their characteristics. Methods are developed for the behavioural classification of users and for structuring the sections and discussions of social news aggregations. A method is developed for determining the main characteristics of social news aggregation users: activeness, creativeness, attractiveness, reactiveness, and loyalty. Finally, a method for defining the characteristics and classification of social news aggregation users is presented.
Digital social media platforms frequently contribute to cognitive-behavioral fixation, a phenomenon in which users exhibit sustained and repetitive engagement with narrow content domains. While cognitive-behavioral fixation has been extensively studied in psychology, methods for computationally detecting and evaluating such fixation remain underexplored. To address this gap, we propose a novel framework for assessing cognitive-behavioral fixation by analyzing users' multimodal social media engagement patterns. Specifically, we introduce a multimodal topic extraction module and a cognitive-behavioral fixation quantification module that collaboratively enable adaptive, hierarchical, and interpretable assessment of user behavior. Experiments on existing benchmarks and a newly curated multimodal dataset demonstrate the effectiveness of our approach, laying the groundwork for scalable computational analysis of cognitive fixation. All code in this project is publicly available for research purposes at https://github.com/Liskie/cognitive-fixation-evaluation.
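One plausible way to quantify the fixation described above is as low diversity of a user's topic engagement, e.g. one minus the normalized Shannon entropy of the engagement distribution. This entropy-based score is an assumption of this sketch, not the paper's actual quantification module.

```python
import math

def fixation_score(topic_counts):
    """1 - normalized entropy of topic engagement; near 1 = narrow fixation."""
    total = sum(topic_counts.values())
    probs = [c / total for c in topic_counts.values() if c > 0]
    if len(probs) <= 1:
        return 1.0  # all engagement in one topic: maximal fixation
    entropy = -sum(p * math.log(p) for p in probs)
    max_entropy = math.log(len(probs))  # entropy of a uniform distribution
    return 1.0 - entropy / max_entropy

# Toy engagement counts over topics extracted from a user's posts/views.
focused = fixation_score({"gaming": 98, "news": 1, "music": 1})
diverse = fixation_score({"gaming": 34, "news": 33, "music": 33})
```

A hierarchical version could apply the same score at several topic granularities, in the spirit of the paper's adaptive, hierarchical assessment.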
Conventional economic and socio-behavioural models assume perfect symmetric access to information and rational behaviour among interacting agents in a social system. However, real-world events and observations appear to contradict such assumptions, leading to the possibility of other, more complex interaction rules existing between such agents. We investigate this possibility by creating two different models for a doctor-patient system. One retains the established assumptions, while the other incorporates principles of reflexivity theory and cognitive social structures. In addition, we utilize a microbial genetic algorithm to optimize the behaviour of the physician and patient agents in both models. The differences in results for the two models suggest that social systems may not always exhibit the behaviour or even accomplish the purpose for which they were designed and that modelling the social and cognitive influences in a social system may capture various ways a social agent balances complementary and competing information signals in making choices.
Information in networks is non-uniformly distributed, enabling individuals in certain network positions to get preferential access to information. Social scientists have developed influential theories about the role of network structure in information access. These theories were validated through numerous studies, which examined how individuals leverage their social networks for competitive advantage, such as a new job or higher compensation. It is not clear how these theories generalize to online networks, which differ from real-world social networks in important respects, including asymmetry of social links. We address this problem by analyzing how users of the social news aggregator Digg adopt stories recommended by friends, i.e., users they follow. We measure the impact that different factors, such as network position and activity rate, have on access to novel information, which in Digg's case means the set of distinct news stories. We show that a user can improve his information access by linking to active users, though this becomes less effective as the number of friends, or their activity, grows due to structural network constraints. These constraints arise because users in structurally diverse positions within the follower graph have topically diverse interests from their friends. Moreover, though in most cases a user's friends are exposed to almost all the information available in the network, after they make their recommendations, the user sees only a small fraction of the available information. Our study suggests that cognitive and structural bottlenecks limit access to novel information in online social networks.
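The diminishing returns described above can be shown with a toy set-union calculation: each added friend contributes the stories they recommend, but overlap between friends' stories shrinks the marginal gain of distinct (novel) stories. The friend/story data here are invented for illustration.

```python
def novel_stories_seen(friend_story_sets):
    """Cumulative count of distinct stories as friends are added one by one."""
    seen, cumulative = set(), []
    for stories in friend_story_sets:
        seen |= stories           # union with this friend's stories
        cumulative.append(len(seen))
    return cumulative

friends = [
    {"s1", "s2", "s3"},
    {"s2", "s3", "s4"},   # overlaps heavily with friend 1
    {"s3", "s4", "s5"},   # overlaps even more with what is already seen
]
gains = novel_stories_seen(friends)
```

The first friend adds three new stories, each later friend only one, which is the structural bottleneck in miniature.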
Our study presents a multifaceted approach to enhancing user interaction and content relevance in social media platforms through a federated learning framework. We introduce personalized GPT and Context-based Social Media LLM models, utilizing federated learning for privacy and security. Four client entities receive a base GPT-2 model and locally collected social media data, with federated aggregation ensuring up-to-date model maintenance. Subsequent modules focus on categorizing user posts, computing user persona scores, and identifying relevant posts from friends' lists. A quantifying social engagement approach, coupled with matrix factorization techniques, facilitates personalized content suggestions in real-time. An adaptive feedback loop and readability score algorithm also enhance the quality and relevance of content presented to users. Our system offers a comprehensive solution to content filtering and recommendation, fostering a tailored and engaging social media experience while safeguarding user privacy.
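The federated aggregation step above can be sketched with the standard FedAvg rule: the server averages client parameters, weighted by local dataset size. The parameter dicts below are toy stand-ins for real GPT-2 weights; the four-client setup and any training details are not reproduced here.

```python
def federated_average(client_params, client_sizes):
    """Size-weighted average of per-client parameter dicts (same keys)."""
    total = sum(client_sizes)
    keys = client_params[0].keys()
    return {
        k: sum(p[k] * n for p, n in zip(client_params, client_sizes)) / total
        for k in keys
    }

# Two toy clients with scalar "parameters" and different local data volumes.
clients = [
    {"w": 1.0, "b": 0.0},
    {"w": 3.0, "b": 1.0},
]
global_params = federated_average(clients, client_sizes=[100, 300])
```

Only parameters leave the clients, never raw posts, which is the privacy property the framework relies on.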
Social interactions promote well-being, yet barriers like geographic distance, time limitations, and mental health conditions can limit face-to-face interactions. Emotionally responsive AI systems, such as chatbots, offer new opportunities for social and emotional support, but raise critical questions about how empathy is perceived and experienced in human-AI interactions. This study examines how empathy is evaluated in AI-generated versus human responses. Using personal narratives, we explored how persona attributes (e.g., gender, empathic traits, shared experiences) and story qualities affect empathy ratings. We compared responses from standard and fine-tuned AI models with human judgments. Results show that while humans are highly sensitive to emotional vividness and shared experience, AI responses are less influenced by these cues and often lack nuance in empathic expression. These findings highlight challenges in designing emotionally intelligent systems that respond meaningfully across diverse users and contexts, and inform the design of ethically aware tools to support social connection and well-being.
Cognitive empathy, the ability to understand others' perspectives, is essential for effective communication, reducing biases, and constructive negotiation. However, this skill is declining in a performance-driven society, which prioritizes efficiency over perspective-taking. Here, the training of cognitive empathy is challenging because it is a subtle, hard-to-perceive soft skill. To address this, we developed CoEmpaTeam, a VR-based system that enables users to train their cognitive empathy by using LLM-driven avatars with different personalities. Through dynamic role play, users actively engage in perspective-taking, experiencing situations through another person's eyes. CoEmpaTeam deploys three avatars who significantly differ in their personality, validated by a technical evaluation and an online experiment (n=90). Next, we evaluated the system through a lab experiment with 32 participants who performed three sessions across two weeks, followed by a one-week diary study. Our results showed a significant increase in cognitive empathy, which, according to participants, transferred into their real lives.
In the digital era, social media has become a major conduit for information dissemination, yet it also facilitates the rapid spread of misinformation. Traditional misinformation detection methods primarily focus on surface-level features, overlooking the crucial roles of human empathy in the propagation process. To address this gap, we propose the Dual-Aspect Empathy Framework (DAE), which integrates cognitive and emotional empathy to analyze misinformation from both the creator and reader perspectives. By examining creators' cognitive strategies and emotional appeals, as well as simulating readers' cognitive judgments and emotional responses using Large Language Models (LLMs), DAE offers a more comprehensive and human-centric approach to misinformation detection. Moreover, we further introduce an empathy-aware filtering mechanism to enhance response authenticity and diversity. Experimental results on benchmark datasets demonstrate that DAE outperforms existing methods, providing a novel paradigm for multimodal misinformation detection.
Virtual and robotic agents capable of perceiving human empathy have the potential to participate in engaging and meaningful human-machine interactions that support human well-being. Prior research in computational empathy has focused on designing empathic agents that use verbal and nonverbal behaviors to simulate empathy and attempt to elicit empathic responses from humans. The challenge of developing agents with the ability to automatically perceive elicited empathy in humans remains largely unexplored. Our paper presents the first approach to modeling user empathy elicited during interactions with a robotic agent. We collected a new dataset from the novel interaction context of participants listening to a robot storyteller (46 participants, 6.9 hours of video). After each storytelling interaction, participants answered a questionnaire that assessed their level of elicited empathy during the interaction with the robot. We conducted experiments with 8 classical machine learning models and 2 deep learning models (long short-term memory networks and temporal convolutional networks) to detect empathy by leveraging patterns in participants' visual behaviors while they were listening to the robot storyteller. Our highest-performing approach, based on XGBoost, achieved an accuracy of 69% and AUC of 72% when detecting empathy in videos. We contribute insights regarding modeling approaches and visual features for automated empathy detection. Our research informs and motivates future development of empathy perception models that can be leveraged by virtual and robotic agents during human-machine interactions.
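The AUC reported above can be computed without any ML library via the rank-sum formulation: AUC is the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counted as one half. The labels and scores below are toy values, not the study's data.

```python
def roc_auc(labels, scores):
    """ROC AUC for binary labels (1 = positive) and real-valued scores."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2]  # one positive is ranked below one negative
auc = roc_auc(labels, scores)
```

Because AUC is rank-based, it complements plain accuracy when classes are imbalanced, which is why papers such as this one report both.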
Empathy is increasingly recognized as a key factor in human-AI communication, yet conventional approaches to "digital empathy" often focus on simulating internal, human-like emotional states while overlooking the inherently subjective, contextual, and relational facets of empathy as perceived by users. In this work, we propose a human-centered taxonomy that emphasizes observable empathic behaviors and introduce a new dataset, Sense-7, of real-world conversations between information workers and Large Language Models (LLMs), which includes per-turn empathy annotations directly from the users, along with user characteristics and contextual details, offering a more user-grounded representation of empathy. Analysis of 695 conversations from 109 participants reveals that empathy judgments are highly individualized, context-sensitive, and vulnerable to disruption when conversational continuity fails or user expectations go unmet. To promote further research, we provide a subset of 672 anonymized conversations and present an exploratory classification analysis, showing that an LLM-based classifier can recognize 5 levels of empathy with an encouraging average Spearman ρ = 0.369 and accuracy = 0.487 over this set. Overall, our findings underscore the need for AI designs that dynamically tailor empathic behaviors to user contexts and goals, offering a roadmap for future research and practical development of socially attuned, human-centered artificial agents.
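The agreement metric reported above, Spearman's ρ between predicted and user-annotated empathy levels, is the Pearson correlation of the ranks. A self-contained version with average ranks for ties (the example ratings are invented):

```python
def _ranks(xs):
    """1-based ranks; tied values share the average rank of their run."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = _ranks(xs), _ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

human = [1, 2, 3, 4, 5]   # toy 5-level empathy annotations
model = [2, 1, 4, 3, 5]   # classifier roughly agrees, with two swaps
rho = spearman(human, model)
```

Using a rank correlation rather than accuracy alone credits the classifier for getting the ordering of empathy levels right even when the exact level is off by one.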
The paper presents an experiment on the effects of adaptive emotional alignment between agents, considered a prerequisite for empathic communication, in Human-Robot Interaction (HRI). Using the NAO robot, we investigate the impact of an emotionally aligned, empathic dialogue on three aspects: (i) the robot's persuasive effectiveness, (ii) the user's communication style, and (iii) the attribution of mental states and empathy to the robot. In an experiment with 42 participants, two conditions were compared: one with neutral communication and another where the robot provided responses adapted to the emotions expressed by the users. The results show that emotional alignment does not influence users' communication styles or have a persuasive effect. However, it significantly influences the attribution of mental states to the robot and its perceived empathy.
The double empathy problem recasts the difficulty of forming empathy bonds in social interactions between autistic and neurotypical individuals as a bidirectional problem, rather than due to a deficit exclusive to the person on the spectrum. However, no explicit mechanism to explain such a phenomenon has been proposed. Here we build a feedback-loop mathematical model that would theoretically induce the empathy degradation observed during communication in neurotypical-autistic pairs solely due to differences in communication preferences between neurotypical and neurodivergent individuals. Numerical simulations of dyadic interactions show the model, whose mechanism is based solely on communication preferences, can illustrate the breakdown of empathic bonding observed clinically. Stability analysis of the model provides a way to predict the overall trajectory of the interaction in the empathy space. Furthermore, we suggest experimental designs to measure several parameters outlined here and discuss the future directions for testing the proposed model.
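A discrete-time toy version of such a feedback loop can be simulated in a few lines: each agent's empathy level is reinforced when the partner's communication style matches its own preference and erodes with mismatch. The update rule and parameters below are illustrative assumptions, not the paper's equations.

```python
def simulate(pref_a, pref_b, steps=50, gain=0.2, decay=0.1):
    """Empathy trajectories for two agents with style preferences in [0, 1]."""
    ea = eb = 0.5  # both start at neutral empathy
    traj = [(ea, eb)]
    mismatch = abs(pref_a - pref_b)  # fixed difference in communication style
    for _ in range(steps):
        # Matching styles push empathy toward 1; mismatch pulls it down.
        ea += gain * (1 - mismatch) * (1 - ea) - decay * mismatch * ea
        eb += gain * (1 - mismatch) * (1 - eb) - decay * mismatch * eb
        traj.append((ea, eb))
    return traj

matched = simulate(pref_a=0.5, pref_b=0.5)     # similar communication styles
mismatched = simulate(pref_a=0.1, pref_b=0.9)  # strongly differing styles
```

In this sketch the matched pair converges toward full empathic bonding while the mismatched pair settles at a lower fixed point, qualitatively mirroring the bidirectional degradation the model is meant to capture.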
The conventional GUI is mechanical and does not recognize or communicate emotions. Modern GUIs try to infer the likely emotional state and personality of the user and to communicate through a corresponding emotional state. Emotions are expressed in graphical icons, sounds, pictures and other means. Emotions are found to be especially useful in communication software, interactive learning systems, robotics and other adaptive environments. Various mechanisms have been developed to express emotions through graphical user interfaces. This article illustrates some interesting inventions selected from the US patent database.
Although a GUI largely replaces textual descriptions with graphical icons, textual items are not completely removed. They are inevitably used in window titles, message boxes, help items, menu items and popup items. Textual items are necessary for communicating messages that are beyond the limitations of graphical messages. However, it is necessary to harness the textual items on the graphical interface in such a way that the two complement each other to produce the best effect. One has to keep various considerations in mind while applying textual items in a graphical user interface. This article illustrates a few inventions on presenting textual items in a Graphical User Interface.
To facilitate high quality interaction during the regular use of computing systems, it is essential that the user interface (UI) deliver content and components in an appropriate manner. Although extended reality (XR) is emerging as a new computing platform, we still have a limited understanding of how best to design and present interactive content to users in such immersive environments. Adaptive UIs offer a promising approach for optimal presentation in XR as the user's environment, tasks, capabilities, and preferences vary under changing context. In this position paper, we present a design framework for adapting various characteristics of content presented in XR. We frame these as five considerations that need to be taken into account for adaptive XR UIs: What?, How Much?, Where?, How?, and When?. With this framework, we review literature on UI design and adaptation to reflect on approaches that have been adopted or developed in the past towards identifying current gaps and challenges, and opportunities for applying such approaches in XR. Using our framework, future work could identify and develop novel computational approaches for achieving successful adaptive user interfaces in such immersive environments.
Interactive Task Learning (ITL) systems acquire task knowledge from human instructions in natural language interaction. The interaction design of ITL agents for hierarchical tasks remains uncharted. This paper studied the Verbal Apprentice Learner (VAL) for gaming as an ITL example, and qualitatively analyzed user study data to provide design insights on dialogue language types, task instruction strategies, and error handling. We then proposed an interface design, Editable Hierarchy Knowledge (EHK), as a generic probe for ITL systems for hierarchical tasks.
Natural ways of interacting, such as speech, touch, contextual and environmental awareness, and immersive 3D experiences, all point toward the goal of a computer that can see, listen, learn, talk and act. We derive a set of trends shaping the next generation of user interface: the Natural User Interface (NUI). New technologies are pushing the boundaries of what is possible without touching or clicking an interface, paving the way from interaction to information visualization and toward opportunities for more natural human interaction than ever before. In this paper we consider the trends in computer interaction that must be taken into consideration to arrive, in the near future, at a well-designed NUI.
Although large language models (LLMs) have achieved significant success in natural language processing, they still struggle with long-context comprehension. Traditional approaches to mitigating this issue typically rely on fine-tuning or retraining, which is both resource-intensive and challenging to deploy in lightweight industrial settings. In this paper, we investigate the potential to accomplish this without any additional resources. Through an in-depth study of the attention mechanism in LLMs, we propose a method called Scaled ReAttention (SRA) to strengthen LLMs' ability to interpret and retrieve information by strategically manipulating their attention scores during inference. Through extensive experiments, we demonstrate that integrating SRA significantly boosts LLMs' performance on a variety of downstream tasks, highlighting its practical potential for enhancing language understanding without incurring the overhead of traditional training.
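The core intuition, reweighting raw attention scores at inference time so that selected (e.g. distant but relevant) positions receive more attention mass after the softmax, can be shown on toy numbers. The multiplicative boosting rule here is an assumption of this sketch, not the paper's exact SRA formulation.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def reattend(scores, boost_positions, scale=2.0):
    """Scale chosen raw attention scores, then renormalize via softmax."""
    scaled = [s * scale if i in boost_positions else s
              for i, s in enumerate(scores)]
    return softmax(scaled)

raw = [2.0, 0.5, 1.0, 0.2]   # toy pre-softmax scores for 4 positions
base = softmax(raw)
boosted = reattend(raw, boost_positions={2})  # emphasize position 2
```

Because the intervention happens on scores during the forward pass, no fine-tuning or retraining is required, which is the paper's lightweight-deployment point.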
This study presents a novel approach for intelligent user interaction interface generation and optimization, grounded in the variational autoencoder (VAE) model. With the rapid advancement of intelligent technologies, traditional interface design methods struggle to meet the evolving demands for diversity and personalization, often lacking flexibility in real-time adjustments to enhance the user experience. Human-Computer Interaction (HCI) plays a critical role in addressing these challenges by focusing on creating interfaces that are functional, intuitive, and responsive to user needs. This research leverages the RICO dataset to train the VAE model, enabling the simulation and creation of user interfaces that align with user aesthetics and interaction habits. By integrating real-time user behavior data, the system dynamically refines and optimizes the interface, improving usability and underscoring the importance of HCI in achieving a seamless user experience. Experimental findings indicate that the VAE-based approach significantly enhances the quality and precision of interface generation compared to other methods, including autoencoders (AE), generative adversarial networks (GAN), conditional GANs (cGAN), deep belief networks (DBN), and VAE-GAN. This work contributes valuable insights into HCI, providing robust technical solutions for automated interface generation and enhanced user experience optimization.
Interactive Machine Learning is concerned with creating systems that operate in environments alongside humans to achieve a task. A typical use is to extend or amplify the capabilities of a human in cognitive or physical ways, requiring the machine to adapt to the users' intentions and preferences. Often, this takes the form of a human operator providing some type of feedback to the agent, which can be explicit feedback, implicit feedback, or a combination of both. Explicit feedback, such as through a mouse click, carries a high cognitive load. The focus of this study is to extend the current state of the art in interactive machine learning by demonstrating that agents can learn a human user's behavior and adapt to preferences with a reduced amount of explicit human feedback in a mixed feedback setting. The learning agent perceives a value of its own behavior from hand gestures given via a spatial interface. This feedback mechanism is termed Spatial Interface Valuing. This method is evaluated experimentally in a simulated environment for a grasping task using a robotic arm with variable grip settings. Preliminary results indicate that learning agents using spatial interface valuing can learn a value function mapping spatial gestures to expected future rewards much more quickly as compared to those same agents just receiving explicit feedback, demonstrating that an agent perceiving feedback from a human user via a spatial interface can serve as an effective complement to existing approaches.
The evolution of Large Language Models (LLMs) has showcased remarkable capacities for logical reasoning and natural language comprehension. These capabilities can be leveraged in solutions that semantically and textually model complex problems. In this paper, we present our efforts toward constructing a framework that can serve as an intermediary between a user and their user interface (UI), enabling dynamic and real-time interactions. We employ a system that stands upon textual semantic mappings of UI components, in the form of annotations. These mappings are stored, parsed, and scaled in a custom data structure, supplementary to an agent-based prompting backend engine. Employing textual semantic mappings allows each component to not only explain its role to the engine but also provide expectations. By comprehending the needs of both the user and the components, our LLM engine can classify the most appropriate application, extract relevant parameters, and subsequently execute precise predictions of the user's expected actions. Such an integration evolves static user interfaces into highly dynamic and adaptable solutions, introducing a new frontier of intelligent and responsive user experiences.
The research field of end-user programming has largely been concerned with helping non-experts learn to code sufficiently well in order to achieve their tasks. Generative AI stands to obviate this entirely by allowing users to generate code from naturalistic language prompts. In this essay, we explore the extent to which "traditional" programming languages remain relevant for non-expert end-user programmers in a world with generative AI. We posit the "generative shift hypothesis": that generative AI will create qualitative and quantitative expansions in the traditional scope of end-user programming. We outline some reasons that traditional programming languages may still be relevant and useful for end-user programmers. We speculate whether each of these reasons might be fundamental and enduring, or whether they may disappear with further improvements and innovations in generative AI. Finally, we articulate a set of implications for end-user programming research, including the possibility of needing to revisit many well-established core concepts, such as Ko's learning barriers and Blackwell's attention investment model.
Spatial computing experiences are physically constrained by the geometry and semantics of the local user environment. This limitation is elevated in remote multi-user interaction scenarios, where finding a common virtual ground physically accessible for all participants becomes challenging. Locating a common accessible virtual ground is difficult for the users themselves, particularly if they are not aware of the spatial properties of other participants. In this paper, we introduce a framework to generate an optimal mutual virtual space for a multi-user interaction setting where remote users' room spaces can have different layout and sizes. The framework further recommends movement of surrounding furniture objects that expand the size of the mutual space with minimal physical effort. Finally, we demonstrate the performance of our solution on real-world datasets and also a real HoloLens application. Results show the proposed algorithm can effectively discover optimal shareable space for multi-user virtual interaction and hence facilitate remote spatial computing communication in various collaborative workflows.
Online service platforms (OSPs), such as search engines, news websites, ad providers, etc., serve highly personalized content to the user, based on the profile extracted from his history with the OSP. Although personalization (generally) leads to a better user experience, it also raises privacy concerns for the user: he does not know what is present in his profile and, more importantly, what is being used to personalize content for him. In this paper, we capture an OSP's personalization for a user in a new data structure called the personalization vector (η), which is a weighted vector over a set of topics, and present techniques to compute it for users of an OSP. Our approach treats OSPs as black boxes, and extracts η by mining only their output, specifically, the personalized (for a user) and vanilla (without any user information) contents served, and the differences between these contents. We formulate a new model called Latent Topic Personalization (LTP) that captures the personalization vector in a learning framework and present efficient inference algorithms for it. We perform extensive experiments on search result personalization using both data from real Google users and synthetic datasets. Our results show high accuracy (R-pre = 84%) of LTP in finding personalized topics. For the Google data, our qualitative results show how LTP also identifies evidence of personalization: queries for results on a topic with a high η value were re-ranked. Finally, we show how our approach can be used to build a new privacy evaluation framework focused on end-user privacy on commercial OSPs.
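A crude version of the personalized-vs-vanilla comparison can be sketched by differencing topic frequencies and keeping the positive, normalized differences as the personalization vector. This simple subtraction stands in for LTP's actual inference, which is a full learning framework rather than a direct difference.

```python
def personalization_vector(personalized_topics, vanilla_topics):
    """Weights over topics boosted in personalized output relative to vanilla."""
    topics = set(personalized_topics) | set(vanilla_topics)
    diffs = {
        t: max(personalized_topics.get(t, 0) - vanilla_topics.get(t, 0), 0)
        for t in topics
    }
    total = sum(diffs.values())
    return {t: d / total for t, d in diffs.items() if d > 0} if total else {}

# Toy topic counts over the top results served with and without a profile.
personalized = {"sports": 6, "tech": 4, "politics": 0}
vanilla      = {"sports": 2, "tech": 3, "politics": 5}
eta = personalization_vector(personalized, vanilla)
```

The black-box property carries over: only the served content is inspected, never the OSP's internal profile.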
Despite the acknowledgment that the perception of explanations may vary considerably between end-users, explainable recommender systems (RS) have traditionally followed a one-size-fits-all model, whereby the same level of explanation detail is provided to every user, without taking into consideration the individual user's context, i.e., goals and personal characteristics. To fill this research gap, this paper shifts from a one-size-fits-all to a personalized approach to explainable recommendation by giving users agency in deciding which explanation they would like to see. We developed a transparent Recommendation and Interest Modeling Application (RIMA) that provides on-demand personalized explanations of its recommendations at three levels of detail (basic, intermediate, advanced) to meet the demands of different types of end-users. We conducted a within-subject study (N=31) to investigate the relationship between users' personal characteristics and the explanation level of detail, and the effects of these two variables on the perception of the explainable RS with regard to different explanation goals. Our results show that the perception of explainable RS with different levels of detail is affected to different degrees by the explanation goal and user type. Consequently, we suggest theoretical and design guidelines to support the systematic design of explanatory interfaces in RS tailored to the user's context.
Video summaries or highlights are a compelling alternative for exploring and contextualizing unprecedented amounts of video material. However, the summarization process is commonly automatic, non-transparent and potentially biased towards particular aspects depicted in the original video. Therefore, our aim is to help users like archivists or collection managers to quickly understand which summaries are the most representative for an original video. In this paper, we present empirical results on the utility of different types of visual explanations to achieve transparency for end users on how representative video summaries are, with respect to the original video. We consider four types of video summary explanations, which use in different ways the concepts extracted from the original video subtitles and the video stream, and their prominence. The explanations are generated to meet target user preferences and express different dimensions of transparency: concept prominence, semantic coverage, distance and quantity of coverage. In two user studies we evaluate the utility of the visual explanations for achieving transparency for end users. Our results show that explanations representing all of the dimensions have the highest utility for transparency, and consequently, for understanding the representativeness of video summaries.
Recommender systems play a vital role in helping users discover content in streaming services, but their effectiveness depends on users understanding why items are recommended. In this study, explanations were based solely on item features rather than personalized data, simulating recommendation scenarios. We compared user perceptions of one-sided (purely positive) and two-sided (positive and negative) feature-based explanations for popular movie recommendations. Through an online study with 129 participants, we examined how explanation style affected perceived trust, transparency, effectiveness, and satisfaction. One-sided explanations consistently received higher ratings across all dimensions. Our findings suggest that in low-stakes entertainment domains such as popular movie recommendations, simpler positive explanations may be more effective. However, the results should be interpreted with caution due to potential confounding factors such as item familiarity and the placement of negative information in explanations. This work provides practical insights for explanation design in recommender interfaces and highlights the importance of context in shaping user preferences.
This paper investigates the user experience of visualizations of a machine learning (ML) system that recognizes objects in images. This is important since even good systems can fail in unexpected ways, as misclassifications on photo-sharing websites have shown. In our study, we exposed users with a background in ML to three visualizations of three systems with different levels of accuracy. In interviews, we explored how the visualization helped users assess the accuracy of systems in use and how the visualization and the accuracy of the system affected trust and reliance. We found that participants focus not only on accuracy when assessing ML systems. They also take the perceived plausibility and severity of misclassifications into account and prefer seeing the probability of predictions. Semantically plausible errors are judged as less severe than implausible ones, which means that system accuracy could be communicated through the types of errors.
This paper presents a novel teachable conversational interaction system (TAI) that learns users' preferences from a cold start by gradually adapting to personal preferences. In particular, TAI automatically identifies and labels user preferences in live interactions, manages dialogue flows for interactive teaching sessions, and reuses learned preferences for preference elicitation. We develop TAI by leveraging BERT encoder models to encode both dialogue and relevant context information, and build action prediction (AP), argument filling (AF), and named entity recognition (NER) models to understand the teaching session. We adopt a seeker-provider interaction loop mechanism to generate diverse dialogues from a cold start. TAI learns user preferences with 0.9122 turn-level accuracy on an out-of-sample dataset and has been successfully adopted in production.
News recommendation and personalization is not a solved problem. People are growing concerned about their data being collected in excess in the name of personalization and being used for purposes other than the ones they would consider reasonable. Our experience in building personalization products for publishers while safeguarding user privacy led us to investigate the user perspective on privacy and personalization. We conducted a survey to explore people's experience with personalization and privacy and the viewpoints of different age groups. In this paper, we share our major findings with publishers and the community to inform the algorithmic design and implementation of the next generation of news recommender systems, which must put the human at their core and strike a balance between personalization and privacy to reap the benefits of both.
The suggestion of Points of Interest to people with Autism Spectrum Disorder (ASD) challenges recommender systems research because these users' perception of places is influenced by idiosyncratic sensory aversions, which can undermine their experience by causing stress and anxiety. Therefore, managing individual preferences alone is not enough to provide these users with suitable recommendations. To address this issue, we propose a Top-N recommendation model that combines the user's idiosyncratic aversions with her/his preferences in a personalized way to suggest the most compatible and likable Points of Interest. We are interested in finding a user-specific balance of compatibility and interest within a recommendation model that integrates heterogeneous evaluation criteria to take these aspects into account appropriately. We tested our model on both ASD and "neurotypical" people. The evaluation results show that, on both groups, our model outperforms in accuracy and ranking capability the recommender systems based on item compatibility, on user preferences, or which integrate these two aspects by means of a uniform evaluation model.
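The user-specific balance of compatibility and interest can be sketched as a weighted blend. The linear combination and the per-user weight `alpha` below are illustrative assumptions, not the paper's actual evaluation model:

```python
def poi_score(interest, aversion, alpha):
    """Blend interest (0-1) with sensory compatibility (1 - aversion, 0-1)
    using a user-specific weight alpha; higher alpha favors pure preference,
    lower alpha prioritizes avoiding aversive places."""
    return alpha * interest + (1 - alpha) * (1 - aversion)

def top_n(pois, alpha, n):
    """Rank candidate places by blended score and return the top n names.
    Each POI is a (name, interest, aversion) tuple."""
    ranked = sorted(pois, key=lambda p: poi_score(p[1], p[2], alpha), reverse=True)
    return [name for name, _, _ in ranked[:n]]
```

With a low `alpha`, a highly interesting but sensorially aversive place (e.g., a loud market) drops below quieter alternatives, which is the behavior the compatibility-aware model aims for.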
Personalizing Large Language Model (LLM) agents requires conditioning them on user-specific data, creating a critical trade-off between task utility and data disclosure. While the utility of adding user data often exhibits diminishing returns (i.e., submodularity), enabling near-optimal greedy selection, real-world personalization is complicated by structural constraints. These include logical dependencies (e.g., selecting fact A requires fact B), categorical quotas (e.g., select at most one writing style), and hierarchical rules (e.g., select at most two social media preferences, of which at most one can be for a professional network). These constraints violate the assumptions of standard subset selection algorithms. We propose a principled method to formally model such constraints. We introduce a compilation process that transforms a user's knowledge graph with dependencies into a set of abstract macro-facets. Our central result is a proof that common hierarchical and quota-based constraints over these macro-facets form a valid laminar matroid. This theoretical characterization lets us cast structured personalization as submodular maximization under a matroid constraint, enabling greedy with constant-factor guarantees (and (1-1/e) via continuous greedy) for a much richer and more realistic class of problems.
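A minimal sketch of greedy selection under quota constraints follows. The paper's compilation to macro-facets and the continuous-greedy variant are not shown; the topic-coverage utility is an illustrative monotone submodular function, not the paper's utility:

```python
def greedy_with_quotas(items, quotas, category, gain, budget):
    """Greedy subset selection under per-category quotas. When quota groups
    nest, the feasible sets form a laminar matroid, so greedy on a monotone
    submodular gain enjoys a constant-factor approximation guarantee."""
    selected, counts = [], {}
    candidates = list(items)
    while candidates and len(selected) < budget:
        feasible = [x for x in candidates
                    if counts.get(category(x), 0) < quotas.get(category(x), budget)]
        if not feasible:
            break
        best = max(feasible, key=lambda x: gain(selected, x))
        if gain(selected, best) <= 0:
            break  # diminishing returns exhausted
        selected.append(best)
        counts[category(best)] = counts.get(category(best), 0) + 1
        candidates.remove(best)
    return selected

def coverage_gain(selected, item):
    """Marginal topic coverage: a standard monotone submodular utility."""
    covered = set().union(*(topics for _, _, topics in selected)) if selected else set()
    return len(item[2] - covered)
```

Here each fact is `(name, category, topics)`; the quota on "style" facts caps disclosure from that category regardless of utility, which is exactly the kind of categorical constraint that breaks unconstrained greedy selection.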
Novice and expert users have different systematic preferences in task-oriented dialogues. However, whether catering to these preferences actually improves user experience and task performance remains understudied. To investigate the effects of expertise-based personalization, we first built a version of an enterprise AI assistant with passive personalization. We then conducted a user study where participants completed timed exams, aided by the two versions of the AI assistant. Preliminary results indicate that passive personalization helps reduce task load and improve perception of the assistant, but they reveal task-specific limitations that can be addressed by providing more user agency. These findings underscore the importance of combining active and passive personalization to optimize user experience and effectiveness in enterprise task-oriented environments.
Trust in a recommendation system (RS) is often algorithmically incorporated using implicit or explicit feedback from user-perceived trustworthy social neighbors, and evaluated using user-reported trustworthiness of recommended items. However, real-life recommendation settings can feature group disparities in trust, power, and prerogatives. Our study examines a complementary view of trust which relies on the editorial power relationships and attitudes of all stakeholders in the RS application domain. We devise a simple, first-principles metric of editorial authority, i.e., user preferences for recommendation sourcing, veto power, and incorporating user feedback, such that one RS user group confers trust upon another by ceding or assigning editorial authority. In a mixed-methods study at Virginia Tech, we surveyed faculty, teaching assistants, and students about their preferences of editorial authority, and hypothesis-tested its relationship with trust in algorithms for a hypothetical `Suggested Readings' RS. We discover that higher RS editorial authority assigned to students is linked to the relative trust the course staff allocates to the RS algorithm and to students. We also observe that course staff favor higher control for the RS algorithm in sourcing and updating the recommendations long-term. Using content analysis, we discuss frequently staff-recommended student editorial roles and highlight their common rationales, such as perceived expertise, scaling the learning environment, professional curriculum needs, and learner disengagement. We argue that our analyses highlight critical user preferences that help detect editorial power asymmetry and identify RS use-cases for supporting teaching and research.
Personalized adaptation technology has been adopted in a wide range of digital applications such as health, training and education, e-commerce, and entertainment. Personalization systems typically build a user model, aiming to characterize the user at hand, and then use this model to personalize the interaction. Personalization and user modeling, however, are often intrinsically at odds with each other (a fact sometimes referred to as the personalization paradox). In this paper, we take a closer look at this personalization paradox and identify two ways in which it can manifest: feedback loops and moving targets. To illustrate these issues, we report results in the domain of personalized exergames (videogames for physical exercise), and describe our early steps to address some of the issues raised by the personalization paradox.
Recommender systems are essential for guiding users through the vast and diverse landscape of digital content by delivering personalized and relevant suggestions. However, improving both personalization and interpretability remains a challenge, particularly in scenarios involving limited user feedback or heterogeneous item attributes. In this article, we propose a novel hybrid recommendation framework that combines Graph Attention Networks (GATs) with Large Language Models (LLMs) to address these limitations. LLMs are first used to enrich user and item representations by generating semantically meaningful profiles based on metadata such as titles, genres, and overviews. These enriched embeddings serve as initial node features in a user-movie bipartite graph, which is processed using a GAT-based collaborative filtering model. To enhance ranking accuracy, we introduce a hybrid loss function that combines Bayesian Personalized Ranking (BPR), cosine similarity, and robust negative sampling. Post-processing involves reranking the GAT-generated recommendations using the LLM, which also generates natural-language justifications to improve transparency. We evaluated our model on benchmark datasets, including MovieLens 100k and 1M, where it consistently outperforms strong baselines. Ablation studies confirm that LLM-based embeddings and the cosine similarity term significantly contribute to performance gains. This work demonstrates the potential of integrating LLMs to improve both the accuracy and interpretability of recommender systems.
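The BPR-plus-cosine idea can be sketched on plain embedding vectors. The weighting `lam` and the exact combination below are assumptions for illustration, not the paper's published loss:

```python
import math

def hybrid_loss(u, pos, neg, lam=0.5):
    """Sketch of a BPR + cosine-similarity hybrid ranking loss on a single
    (user, positive item, negative item) triple, represented as lists."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    norm = lambda a: math.sqrt(dot(a, a))
    sigmoid = lambda x: 1.0 / (1.0 + math.exp(-x))
    # BPR term: push the positive item's score above the negative item's.
    bpr = -math.log(sigmoid(dot(u, pos) - dot(u, neg)))
    # Cosine term: pull the user embedding toward the liked item's direction.
    cos_pull = 1.0 - dot(u, pos) / (norm(u) * norm(pos))
    return bpr + lam * cos_pull
```

A well-aligned triple (user pointing at the positive item, away from the negative) yields a lower loss than a misaligned one, which is the gradient signal the GAT embeddings are trained against.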
Personas are models of users that incorporate motivations, wishes, and objectives; these models are employed in user-centred design to help design better user experiences and have recently been employed in adaptive systems to help tailor personalized user experiences. Designing with personas involves producing descriptions of fictitious users, often based on data from real users. Most data-driven persona development performed today is based on qualitative data from a limited set of interviewees, transformed into personas using labour-intensive manual techniques. In this study, we propose a method that employs the modelling of user stereotypes to automate part of the persona creation process and addresses the drawbacks of existing semi-automated methods for persona development. The description of the method is accompanied by an empirical comparison with a manual technique and a semi-automated alternative (multiple correspondence analysis). The comparison shows that manual techniques differ between human persona designers, leading to different results. The proposed algorithm provides similar results based on parameter input, but is more rigorous and finds optimal clusters while lowering the labour associated with finding the clusters in the dataset. The output of the method also represents the largest variances in the dataset identified by the multiple correspondence analysis.
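The clustering step at the heart of such automated persona creation can be sketched with plain k-means. The paper's exact algorithm and parameterization are not specified here, so this is a generic stand-in; the deterministic initialization from the first k points is an assumption that keeps the sketch reproducible:

```python
def kmeans(points, k, iters=20):
    """Plain k-means over numeric user-feature vectors, as a stand-in for
    the stereotype-clustering step of automated persona development."""
    centers = [list(p) for p in points[:k]]  # deterministic init (assumption)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each user vector to its nearest center (squared distance).
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        for j, cl in enumerate(clusters):
            if cl:  # recompute each center as the mean of its members
                centers[j] = [sum(xs) / len(cl) for xs in zip(*cl)]
    return centers, clusters
```

Each resulting cluster center is a candidate stereotype from which a persona description can be drafted, replacing the manual grouping of interviewees.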
This paper presents PediaMind-R1, a domain-specialized large language model designed to achieve active personalization in intelligent parenting scenarios. Unlike conventional systems that provide generic suggestions, PediaMind-R1 draws on insights from developmental psychology. It introduces temperament theory from the Thomas-Chess framework and builds a temperament knowledge graph for infants and toddlers (0-3 years). Our two-stage training pipeline first uses supervised fine-tuning to teach structured chain-of-thought reasoning, and then applies a GRPO-based alignment stage to reinforce logical consistency, domain expertise, and empathetic caregiving strategies. We further design an evaluation framework comprising temperament-sensitive multiple-choice tests and human assessments. The results demonstrate that PediaMind-R1 can accurately interpret early childhood temperament profiles and proactively engage in individualized reasoning. This work highlights the value of integrating vertical-domain modeling with psychological theory. It offers a novel approach to developing user-centered LLMs that advance the practice of active personalization in sensitive caregiving contexts.
Understanding user identity and behavior is central to applications such as personalization, recommendation, and decision support. Most existing approaches rely on deterministic embeddings or black-box predictive models, offering limited uncertainty quantification and little insight into what latent representations encode. We propose a probabilistic digital twin framework in which each user is modeled as a latent stochastic state that generates observed behavioral data. The digital twin is learned via amortized variational inference, enabling scalable posterior estimation while retaining a fully probabilistic interpretation. We instantiate this framework using a variational autoencoder (VAE) applied to a user-response dataset designed to capture stable aspects of user identity. Beyond standard reconstruction-based evaluation, we introduce a statistically grounded interpretation pipeline that links latent dimensions to observable behavioral patterns. By analyzing users at the extremes of each latent dimension and validating differences using nonparametric hypothesis tests and effect sizes, we demonstrate that specific dimensions correspond to interpretable traits such as opinion strength and decisiveness. Empirically, we find that user structure is predominantly continuous rather than discretely clustered, with weak but meaningful structure emerging along a small number of dominant latent axes. These results suggest that probabilistic digital twins can provide interpretable, uncertainty-aware representations that go beyond deterministic user embeddings.
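The extremes-based interpretation step can be sketched with a nonparametric effect size. The paper pairs hypothesis tests with effect sizes; Cliff's delta below is one common choice, and the quantile-based selection of extreme users is an illustrative assumption about the pipeline's details:

```python
def cliffs_delta(xs, ys):
    """Cliff's delta effect size: P(x > y) - P(x < y) over all pairs,
    ranging from -1 to 1 with 0 meaning no systematic difference."""
    gt = sum(1 for x in xs for y in ys if x > y)
    lt = sum(1 for x in xs for y in ys if x < y)
    return (gt - lt) / (len(xs) * len(ys))

def latent_extremes_effect(z, behavior, frac=0.25):
    """Compare a behavioral measure between users at the two extremes of a
    latent dimension; |delta| near 1 suggests the dimension encodes it."""
    order = sorted(range(len(z)), key=lambda i: z[i])
    k = max(1, int(len(z) * frac))
    low = [behavior[i] for i in order[:k]]
    high = [behavior[i] for i in order[-k:]]
    return cliffs_delta(high, low)
```

A dimension whose extreme users differ sharply on, say, opinion strength gets a delta near 1, while a dimension unrelated to the behavior yields a delta near 0.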
Utilizing user profiles to personalize Large Language Models (LLMs) has been shown to enhance performance on a wide range of tasks. However, the precise role of user profiles and the mechanism of their effect on LLMs remain unclear. This study first confirms that the effectiveness of user profiles is primarily due to personalization information rather than semantic information. Furthermore, we investigate how user profiles affect the personalization of LLMs. Within the user profile, we reveal that it is the historical personalized responses produced or approved by the user that play a pivotal role in personalizing LLMs. This discovery unlocks the potential of LLMs to incorporate a greater number of user profiles within the constraints of limited input length. As for the position of user profiles, we observe that profiles integrated at different positions of the input context do not contribute equally to personalization: a profile placed closer to the beginning of the context affects personalization more. Our findings reveal the role user profiles play in the personalization of LLMs and showcase how incorporating them impacts performance, providing insight into leveraging user profiles effectively.
This paper presents ReasoningRec, a reasoning-based recommendation framework that leverages Large Language Models (LLMs) to bridge the gap between recommendations and human-interpretable explanations. In contrast to conventional recommendation systems that rely on implicit user-item interactions, ReasoningRec employs LLMs to model users and items, focusing on preferences, aversions, and explanatory reasoning. The framework utilizes a larger LLM to generate synthetic explanations for user preferences, subsequently used to fine-tune a smaller LLM for enhanced recommendation accuracy and human-interpretable explanation. Our experimental study investigates the impact of reasoning and contextual information on personalized recommendations, revealing that the quality of contextual and personalized data significantly influences the LLM's capacity to generate plausible explanations. Empirical evaluations demonstrate that ReasoningRec surpasses state-of-the-art methods by up to 12.5\% in recommendation prediction while concurrently providing human-intelligible explanations. The code is available here: https://github.com/millenniumbismay/reasoningrec.
This study proposes augmenting dialog data with think-aloud utterances (TAUs) for modeling individual personalities in text chat by LLM. TAU is a verbalization of a speaker's thought before articulating the utterance. We expect "persona LLMs" trained with TAU-augmented data can mimic the speaker's personality trait better. We tested whether the trained persona LLMs obtain the human personality with respect to Big Five, a framework characterizing human personality traits from five aspects. The results showed that LLMs trained with TAU-augmented data more closely align to the speakers' Agreeableness and Neuroticism of Big Five than those trained with original dialog data. We also found that the quality of TAU-augmentation impacts persona LLM's performance.
We present an initial set of factors, features, and constraints for developing a Computational Auditory System (CAS, aka less formally an artificial ear, AE) for use by cognitive architectures. We start to define a CAS and what tasks it should be able to perform. We then outline the features of a CAS for use by a cognitive architecture and factors that influence its performance. We conclude with an update on what has been created so far and insights on how to create and use a CAS in a cognitive architecture and include a set of functionalities for an artificial ear.
The study of belief change has been an active area in philosophy and AI. In recent years two special cases of belief change, belief revision and belief update, have been studied in detail. In a companion paper (Friedman & Halpern, 1997), we introduce a new framework to model belief change. This framework combines temporal and epistemic modalities with a notion of plausibility, allowing us to examine the change of beliefs over time. In this paper, we show how belief revision and belief update can be captured in our framework. This allows us to compare the assumptions made by each method, and to better understand the principles underlying them. In particular, it shows that Katsuno and Mendelzon's notion of belief update (Katsuno & Mendelzon, 1991a) depends on several strong assumptions that may limit its applicability in artificial intelligence. Finally, our analysis allows us to identify a notion of minimal change that underlies a broad range of belief change operations including revision and update.
Explainable Artificial Intelligence (XAI) is essential for building advanced machine learning-powered applications, especially in critical domains such as medical diagnostics or autonomous driving. Legal, business, and ethical requirements motivate using effective XAI, but the increasing number of different methods makes it challenging to pick the right ones. Further, since explanations are highly context-dependent, measuring the effectiveness of XAI methods without users can only reveal a limited amount of information, excluding human factors such as the ability to understand the explanation. We propose to evaluate XAI methods via the user's ability to successfully perform a proxy task, designed such that good performance indicates that the explanation provides helpful information. In other words, we address the helpfulness of XAI for human decision-making. Further, a user study on state-of-the-art methods was conducted, showing differences in their ability to generate trust and skepticism and in users' ability to correctly judge whether an AI decision is right. Based on the results, we highly recommend using and extending this approach for more objective-based human-centered user studies to measure XAI performance in an end-to-end fashion.
This paper outlines a perspective on the future of AI, discussing directions for machine models of human-like intelligence. We explain how developmental and evolutionary theories of human cognition should further inform artificial intelligence. We emphasize the role of ecological niches in sculpting intelligent behavior, and in particular that human intelligence was fundamentally shaped to adapt to a constantly changing socio-cultural environment. We argue that a major limit of current work in AI is that it is missing this perspective, both theoretically and experimentally. Finally, we discuss the promising approach of developmental artificial intelligence, which models infant development through multi-scale interaction between intrinsically motivated learning, embodiment, and a fast-changing socio-cultural environment. This paper takes the form of an interview of Pierre-Yves Oudeyer by Manfred Eppe, organized within the context of a KI - Künstliche Intelligenz special issue on developmental robotics.
This paper leverages various philosophical and ontological frameworks to explore the concept of embodied artificial general intelligence (AGI), its relationship to human consciousness, and the key role of the metaverse in facilitating this relationship. Several theoretical frameworks underpin this exploration, such as embodied cognition, Michael Levin's computational boundary of a "Self," and Donald D. Hoffman's Interface Theory of Perception, which lead to considering human perceived outer reality as a symbolic representation of alternate inner states of being, and where AGI could embody a different form of consciousness with a larger computational boundary. The paper further discusses the necessary architecture for the emergence of an embodied AGI, how to calibrate an AGI's symbolic interface, and the key role played by the Metaverse, decentralized systems and open-source blockchain technology. The paper concludes by emphasizing the importance of achieving a certain degree of harmony in human relations and recognizing the interconnectedness of humanity at a global level, as key prerequisites for the emergence of a stable embodied AGI.
This chapter presents methodological reflections on the necessity and utility of artificial intelligence in generative design. Specifically, the chapter discusses how generative design processes can be augmented by AI to deliver in terms of a few outcomes of interest or performance indicators while dealing with hundreds or thousands of small decisions. The core of the performance-based generative design paradigm is about making statistical or simulation-driven associations between these choices and consequences for mapping and navigating such a complex decision space. This chapter will discuss promising directions in Artificial Intelligence for augmenting decision-making processes in architectural design for mapping and navigating complex design spaces.
Conceptual modeling (CM) applies abstraction to reduce the complexity of a system under study (e.g., an excerpt of reality). As a result of the conceptual modeling process, a human-interpretable, formalized representation (i.e., a conceptual model) is derived which enables understanding and communication among humans, and processing by machines. Artificial Intelligence (AI) algorithms are also applied to complex realities (regularly represented by vast amounts of data) to identify patterns or to classify entities in the data. Aside from the commonalities of both approaches, a significant difference can be observed by looking at the results. While conceptual models are comprehensible, reproducible, and explicit knowledge representations, AI techniques efficiently derive an output from a given input while acting as a black box. AI solutions often lack comprehensibility and reproducibility; even the developers of AI systems cannot explain why a certain output is derived. In the Conceptual Modeling meets Artificial Intelligence (CMAI) workshop, we are interested in tackling the intersection of the two thus far mostly isolated disciplines of CM and AI. The workshop embraces the assumption that manifold mutual benefits can be realized by i) investigating what Conceptual Modeling (CM) can contribute to AI, and ii) the other way around, what Artificial Intelligence (AI) can contribute to CM.
Dungeon Crawl Stone Soup is a popular, single-player, free and open-source rogue-like video game with a sufficiently complex decision space that makes it an ideal testbed for research in cognitive systems and, more generally, artificial intelligence. This paper describes the properties of Dungeon Crawl Stone Soup that are conducive to evaluating new approaches of AI systems. We also highlight an ongoing effort to build an API for AI researchers in the spirit of recent game APIs such as MALMO, ELF, and the Starcraft II API. Dungeon Crawl Stone Soup's complexity offers significant opportunities for evaluating AI and cognitive systems, including human user studies. In this paper we provide (1) a description of the state space of Dungeon Crawl Stone Soup, (2) a description of the components for our API, and (3) the potential benefits of evaluating AI agents in the Dungeon Crawl Stone Soup video game.
Modular Belief Updates and Confusion about Measures of Certainty in Artificial Intelligence Research
Over the last decade, there has been growing interest in the use of measures of change in belief for reasoning with uncertainty in artificial intelligence research. An important characteristic of several methodologies that reason with changes in belief, or belief updates, is a property that we term modularity. We call updates that satisfy this property modular updates. Although probabilistic measures of belief update that satisfy the modularity property were first discovered in the nineteenth century, knowledge and discussion of these quantities remain obscure in artificial intelligence research. We define modular updates and discuss their inappropriate use in two influential expert systems.
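The modular (order-independent) probabilistic update the paper refers to is the classical odds/likelihood-ratio form of Bayes' rule. A minimal sketch, assuming pieces of evidence are conditionally independent given the hypothesis:

```python
def update_odds(prior_odds, likelihood_ratios):
    """Modular Bayesian update: posterior odds equal the prior odds times
    the product of per-evidence likelihood ratios, so the order in which
    evidence arrives does not matter."""
    odds = prior_odds
    for lr in likelihood_ratios:
        odds *= lr
    return odds

def odds_to_prob(odds):
    """Convert odds back to a probability."""
    return odds / (1.0 + odds)
```

Because multiplication commutes, applying the likelihood ratios in any order yields the same posterior, which is precisely the modularity property; ad hoc certainty factors in the expert systems the paper critiques do not generally satisfy it.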
Little by little, newspapers are revealing the bright future that Artificial Intelligence (AI) is building. Intelligent machines will help everywhere. However, this bright future has a dark side: a dramatic job-market contraction before its unpredictable transformation. Hence, in the near future, large numbers of job seekers will need financial support while catching up with these novel, unpredictable jobs. This possible job-market crisis carries its own antidote. In fact, the rise of AI is sustained by the biggest knowledge theft of recent years. Learning AI machines extract knowledge from unaware skilled or unskilled workers by analyzing their interactions. By passionately doing their jobs, these workers are digging their own graves. In this paper, we propose Human-in-the-loop Artificial Intelligence (HIT-AI) as a fairer paradigm for Artificial Intelligence systems. HIT-AI will reward aware and unaware knowledge producers with a different scheme: decisions of AI systems that generate revenues will repay the legitimate owners of the knowledge used to make those decisions. As modern Robin Hoods, HIT-AI researchers should fight for a fairer Artificial Intelligence that gives back what it steals.
The merged grouping lays out the full research landscape of the theme "psychological-cognitive capabilities for understanding users and their modes of communication": starting from foundational theory (cognitive architectures and mathematical modeling), it proceeds through the core techniques (Theory-of-Mind modeling, affective computing, personalized user profiling) that capture users' deeper intentions and traits, and finally lands in interaction-interface optimization (cognitive load management, XAI) and applications in specific social/medical scenarios (mental health intervention, social dynamics). The research trend is shifting from mere "functional assistance" toward deep "psychological alignment" and "human-AI collaborative intelligence".