AI-Driven On-the-Fly Interface Reconstruction
UI Perception Modeling and Multimodal Foundational Understanding
This group of papers focuses on building foundation models that can understand, represent, and recognize user interfaces. Using multimodal large language models (MLLMs), vision-language models (VLMs), and self-supervised learning, they achieve precise detection, semantic grouping, and navigation planning over UI elements, providing the structured input that on-the-fly reconstruction depends on (a minimal illustrative sketch follows the reference list below).
- UI-UG: A Unified MLLM for UI Understanding and Generation(Hao Yang, Weijie Qiu, Ru Zhang, Zhou Fang, Ruichao Mao, Xiaoyu Lin, Maji Huang, Zhaosong Huang, Teng Guo, Shuoyang Liu, Hai Rao, 2025, ArXiv Preprint)
- Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs(Keen You, Haotian Zhang, Eldon Schoop, Floris Weers, Amanda Swearngin, Jeffrey Nichols, Yinfei Yang, Zhe Gan, 2024, ArXiv Preprint)
- Lexi: Self-Supervised Learning of the UI Language(Pratyay Banerjee, Shweti Mahajan, Kushal Arora, Chitta Baral, Oriana Riva, 2023, ArXiv Preprint)
- UI Semantic Group Detection: Grouping UI Elements with Similar Semantics in Mobile Graphical User Interface(Shuhong Xiao, Yunnong Chen, Yaxuan Song, Liuqing Chen, Lingyun Sun, Yankun Zhen, Yanfang Chang, 2024, ArXiv Preprint)
- ILuvUI: Instruction-tuned LangUage-Vision modeling of UIs from Machine Conversations(Yue Jiang, Eldon Schoop, Amanda Swearngin, Jeffrey Nichols, 2023, ArXiv Preprint)
- UI-Venus Technical Report: Building High-performance UI Agents with RFT(Zhangxuan Gu, Zhengwen Zeng, Zhenyu Xu, Xingran Zhou, Shuheng Shen, Yunfei Liu, Beitong Zhou, Changhua Meng, Tianyu Xia, Weizhi Chen, Yue Wen, Jingya Dou, Fei Tang, Jinzhen Lin, Yulin Liu, Zhenlin Guo, Yichen Gong, Heng Jia, Changlong Gao, Yuan Guo, Yong Deng, Zhenyu Guo, Liang Chen, Weiqiang Wang, 2025, ArXiv Preprint)
- MUD: Towards a Large-Scale and Noise-Filtered UI Dataset for Modern Style UI Modeling(Sidong Feng, Suyu Ma, Han Wang, David Kong, Chunyang Chen, 2024, ArXiv Preprint)
- Integrating Optical Characteristic Recognition with Conversational AI: A Multimodal Chatbot Featuring Speech and Poster Generation(C. Sai, Raguraman Purushothaman, Chinthaparthy Reddy Dhanush Reddy, Peddi Reddy Gangothri, V. Dhanush, Chintamanipeta Bhavana, 2025, 2025 Third International Conference on Augmented Intelligence and Sustainable Systems (ICAISS))
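To make the structured output these perception models target more concrete, the sketch below shows one plausible way to turn a VLM's screen analysis into typed UI elements with semantic group IDs. The `query_vlm` helper, the JSON schema, and the element roles are illustrative assumptions, not the interface of any cited model.

```python
# Minimal sketch: turning a VLM's screen analysis into structured UI elements
# that downstream re-layout steps can consume. query_vlm() is a placeholder.
from dataclasses import dataclass
import json

@dataclass
class UIElement:
    role: str        # e.g. "button", "text_field", "icon" (assumed vocabulary)
    label: str       # visible text or accessibility label
    bbox: tuple[float, float, float, float]  # normalized (x0, y0, x1, y1)
    group_id: int    # semantic group assigned by the model

def query_vlm(screenshot_png: bytes, prompt: str) -> str:
    """Placeholder for a call to a multimodal model (MLLM/VLM)."""
    raise NotImplementedError

def detect_ui_elements(screenshot_png: bytes) -> list[UIElement]:
    prompt = (
        "List every interactive element on this screen as JSON objects with "
        "fields: role, label, bbox (normalized), and group_id shared by "
        "elements that belong to the same semantic group."
    )
    raw = query_vlm(screenshot_png, prompt)
    return [UIElement(**item) for item in json.loads(raw)]
```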
Generative Layout Optimization and Dynamic Synthesis Techniques
These studies explore automatic interface generation and real-time adjustment using VAEs, GANs, diffusion models, and reinforcement learning. The emphasis is on interface malleability, automatic code synthesis (e.g., React components), and on-the-fly interface reconstruction driven by user demonstrations or instructions (see the sketch after the reference list below).
- Dynamic User Interface Generation for Enhanced Human-Computer Interaction Using Variational Autoencoders(Runsheng Zhang, Shixiao Wang, Tianfang Xie, Shiyu Duan, Mengmeng Chen, 2024, ArXiv Preprint)
- Adaptive User Interface Generation Through Reinforcement Learning: A Data-Driven Approach to Personalization and Optimization(Qi Sun, Yayun Xue, Zhijun Song, 2024, ArXiv Preprint)
- UI Layout Generation with LLMs Guided by UI Grammar(Yuwen Lu, Ziang Tong, Qinyi Zhao, Chengzhi Zhang, Toby Jia-Jun Li, 2023, ArXiv Preprint)
- Generative User Interface for the Mobile Apps: Image Synthesis with VAEs, GANs, and Stable Diffusion(Konstantin V. Kostinich, Dmitry Vidmanov, 2025, 2025 7th International Youth Conference on Radio Electronics, Electrical and Power Engineering (REEPE))
- Innovative Application of Generative Adversarial Networks in User Interface Design(Kaixi Huang, Tao Wang, 2025, Proceedings of the 2025 International Conference on Artificial Intelligence, Virtual Reality and Interaction Design)
- ReDemon UI: Reactive Synthesis by Demonstration for Web UI(Jay Lee, Gyuhyeok Oh, Joongwon Ahn, Xiaokang Qiu, 2025, ArXiv Preprint)
- Generative and Malleable User Interfaces with Generative and Evolving Task-Driven Data Model(Yining Cao, Peiling Jiang, Haijun Xia, 2025, Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems)
- Gradual Generation of User Interfaces as a Design Method for Malleable Software(Bryan Min, Peiling Jiang, Zhicheng Huang, Haijun Xia, 2026, ArXiv Preprint)
- A Reference Architecture Based on Reflection for Self-Adaptive Software: A Second Release(F. J. Affonso, Gabriel Nagassaki Campos, Guilherme Guiguer Menaldo, 2024, IEEE Access)
- Automatic Generation of Conversational Interfaces for Tabular Data Analysis(Marcos Gomez-Vazquez, Jordi Cabot, Robert Clarisó, 2023, Proceedings of the 6th ACM Conference on Conversational User Interfaces)
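A recurring mechanism in this group is constraining a generative model with an explicit layout structure, in the spirit of "UI Layout Generation with LLMs Guided by UI Grammar": an LLM proposes a layout tree, and a small production-rule grammar rejects ill-formed trees before any code (e.g., a React component) is synthesized from them. The grammar contents, node names, and `propose_layout` helper below are illustrative assumptions, not any cited system's implementation.

```python
# Toy UI grammar: allowed children per container type.
GRAMMAR: dict[str, set[str]] = {
    "screen":  {"navbar", "list", "form", "footer"},
    "navbar":  {"button", "title"},
    "list":    {"card"},
    "card":    {"image", "text", "button"},
    "form":    {"text_field", "button"},
    "footer":  {"text"},
}
LEAVES = {"button", "title", "image", "text", "text_field"}

def is_valid(node: dict) -> bool:
    """Check a layout tree {'type': str, 'children': [...]} against the grammar."""
    ntype, children = node["type"], node.get("children", [])
    if ntype in LEAVES:
        return not children
    allowed = GRAMMAR.get(ntype, set())
    return all(c["type"] in allowed and is_valid(c) for c in children)

def propose_layout(instruction: str) -> dict:
    """Placeholder for an LLM call that returns a layout tree as a dict."""
    raise NotImplementedError

def generate_layout(instruction: str, retries: int = 3) -> dict:
    for _ in range(retries):
        tree = propose_layout(instruction)
        if is_valid(tree):
            return tree
    raise ValueError("no grammar-conformant layout produced")
```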
Multimodal Conversational Interaction and Augmentation Paradigms
This group studies how traditional static graphical user interfaces (GUIs) can be turned into conversational user interfaces (CUIs) driven by natural language. By integrating LLMs, retrieval-augmented generation (RAG), and multimodal input (speech, images, eye gaze), these works improve the naturalness of interaction and the efficiency of task handling (a minimal RAG sketch follows the reference list below).
- A Large Language Model Enhanced Conversational Recommender System(Yue Feng, Shuchang Liu, Zhenghai Xue, Qingpeng Cai, Lantao Hu, Peng Jiang, Kun Gai, Fei Sun, 2023, ArXiv)
- FAVOR-GPT: a generative natural language interface to whole genome variant functional annotations(T. Li, Hufeng Zhou, Vineet Verma, Xiangru Tang, Yanjun Shao, Eric Van Buren, Zhiping Weng, Mark Gerstein, B. Neale, S. Sunyaev, Xihong Lin, 2024, Bioinformatics Advances)
- Athena: A Conversational Book Discovery Interface Combining LLM-Powered Retrieval-Augmented Generation and Interactive Graph Visualization(Matt Murtagh White, Yunkai Xu, Nicole León, Frank E. Ritter, 2025, Adjunct Proceedings of the 38th Annual ACM Symposium on User Interface Software and Technology)
- A Conceptual Framework for Conversational Search and Recommendation: Conceptualizing Agent-Human Interactions During the Conversational Search Process(Leif Azzopardi, Mateusz Dubiel, Martin Halvey, Jeffery Dalton, 2024, ArXiv Preprint)
- MuDoC: An Interactive Multimodal Document-grounded Conversational AI System(Karan Taneja, Ashok K. Goel, 2025, ArXiv)
- TalkPhoto: A Versatile Training-Free Conversational Assistant for Intelligent Image Editing(Yujie Hu, Zecheng Tang, Xu Jiang, Weiqi Li, Jian Zhang, 2026, ArXiv Preprint)
- Feedstack: Layering Structured Representations over Unstructured Feedback to Scaffold Human AI Conversation(Hannah Vy Nguyen, Yu-Chun Grace Yen, Omar Shakir, Hang Huynh, Sebastian Gutierrez, June A. Smith, Sheila Jimenez, Salma Abdelgelil, Stephen MacNeil, 2025, ArXiv Preprint)
- Integrating Conversational AI, Image Generation, and Code Generation: A Unified Platform(Joylin Priya Pinto, M. Aqib, Osama Shakeel, Emad Habibi, G. S, 2025, Proceedings of the 3rd International Conference on Futuristic Technology)
- A Developed Graphical User Interface-Based on Different Generative Pre-trained Transformers Models(Ekrem Küçük, İpek Balıkçı Çiçek, Zeynep Küçükakçalı, Cihan Yetiş, Cemil Çolak, 2024, ODÜ Tıp Dergisi)
- SLM: Bridge the Thin Gap Between Speech and Text Foundation Models(Mingqiu Wang, Wei Han, Izhak Shafran, Zelin Wu, Chung-Cheng Chiu, Yuan Cao, Yongqiang Wang, Nanxin Chen, Yu Zhang, H. Soltau, P. Rubenstein, Lukás Zilka, Dian Yu, Zhong Meng, G. Pundak, Nikhil Siddhartha, J. Schalkwyk, Yonghui Wu, 2023, 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU))
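Several systems in this group (e.g., FAVOR-GPT, Athena) follow the retrieval-augmented generation pattern; the sketch below shows that loop in its simplest form. The `embed` and `call_llm` placeholders and the in-memory corpus are assumptions, not any cited system's implementation.

```python
# Minimal RAG loop: retrieve the most relevant documents for a user utterance
# and condition the LLM response on them.
import math

def embed(text: str) -> list[float]:
    """Placeholder for a text-embedding model."""
    raise NotImplementedError

def call_llm(prompt: str) -> str:
    """Placeholder for a chat-completion call."""
    raise NotImplementedError

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def answer(query: str, corpus: list[tuple[str, list[float]]], k: int = 3) -> str:
    """corpus holds (document_text, document_embedding) pairs."""
    q = embed(query)
    top = sorted(corpus, key=lambda doc: cosine(q, doc[1]), reverse=True)[:k]
    context = "\n\n".join(text for text, _ in top)
    prompt = (f"Answer the user's question using only the context below.\n"
              f"Context:\n{context}\n\nQuestion: {query}")
    return call_llm(prompt)
```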
Agent-Driven Automation and Human-AI Collaborative Development
This group examines the role of AI agents in interface operation and software engineering: agents that autonomously execute UI tasks, AI acting as an interface reviewer or judge, and the ways AI changes how developers interact with interfaces during collaborative development (an illustrative agent-loop sketch follows the reference list below).
- Computer-Use Agents as Judges for Generative User Interface(Kevin Qinghong Lin, Siyuan Hu, Linjie Li, Zhengyuan Yang, Lijuan Wang, Philip Torr, Mike Zheng Shou, 2025, ArXiv)
- LLM-Powered AI Agent Systems and Their Applications in Industry(Guannan Liang, Qianqian Tong, 2025, 2025 IEEE World AI IoT Congress (AIIoT))
- Morae: Proactively Pausing UI Agents for User Choices(Yi-Hao Peng, Dingzeyu Li, Jeffrey P. Bigham, Amy Pavel, 2025, ArXiv Preprint)
- How Developers Interact with AI: A Taxonomy of Human-AI Collaboration in Software Engineering(Christoph Treude, M. Gerosa, 2025, 2025 IEEE/ACM Second International Conference on AI Foundation Models and Software Engineering (Forge))
- Generative AI and Empirical Software Engineering: A Paradigm Shift(Christoph Treude, M. Storey, 2025, 2025 2nd IEEE/ACM International Conference on AI-powered Software (AIware))
- Memolet: Reifying the Reuse of User-AI Conversational Memories(Ryan Yen, Jian Zhao, 2024, Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology)
- Reinforced UI Instruction Grounding: Towards a Generic UI Task Automation API(Zhizheng Zhang, Wenxuan Xie, Xiaoyi Zhang, Yan Lu, 2023, ArXiv Preprint)
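The UI-operating agents in this group share a perceive-ground-act loop: capture the screen, ground the instruction into one concrete action, execute it, and repeat until the model reports completion. The sketch below illustrates that loop under stated assumptions; the action schema and the `take_screenshot`, `ground_action`, and `execute` placeholders are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str          # "click", "type", or "done" (assumed action set)
    target: str = ""   # element label or identifier
    text: str = ""     # text to type, if any

def take_screenshot() -> bytes:
    """Placeholder: capture the current screen."""
    raise NotImplementedError

def ground_action(screenshot: bytes, instruction: str, history: list[Action]) -> Action:
    """Placeholder for an instruction-grounding model (cf. Reinforced UI Instruction Grounding)."""
    raise NotImplementedError

def execute(action: Action) -> None:
    """Placeholder: dispatch the action to the device or browser."""
    raise NotImplementedError

def run_task(instruction: str, max_steps: int = 20) -> list[Action]:
    history: list[Action] = []
    for _ in range(max_steps):
        action = ground_action(take_screenshot(), instruction, history)
        if action.kind == "done":
            break
        execute(action)
        history.append(action)
    return history
```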
Mixed Reality (MR) and Spatially Situated Adaptive Reconstruction
Targeting 3D, AR/VR, and mixed reality environments specifically, these works study how to optimize UI layout in three-dimensional space on the fly and reduce cognitive load, taking into account the physical surroundings, social cues, and user preferences (a toy placement-optimization sketch follows the reference list below).
- Preference-Guided Multi-Objective UI Adaptation(Yao Song, Christoph Gebhardt, Yi-Chi Liao, Christian Holz, 2025, ArXiv Preprint)
- SituationAdapt: Contextual UI Optimization in Mixed Reality with Situation Awareness via LLM Reasoning(Zhipeng Li, Christoph Gebhardt, Yves Inglin, Nicolas Steck, Paul Streli, Christian Holz, 2024, Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology)
- Mixed Reality UI Adaptations with Inaccurate and Incomplete Objectives(Christoph Albert Johns, João Marcelo Evangelista Belo, 2023, ArXiv Preprint)
- Cognitive-Unburdening Surveillance: Real-Time 3D Reconstruction for Distributed Spatial Awareness(Dong Yoon Kim, Rocky Kim, Jinwoo Park, Jihoon Park, Beomgeun Seo, 2025, Adjunct Proceedings of the 38th Annual ACM Symposium on User Interface Software and Technology)
- Bridging Industrial Expertise and XR with LLM-Powered Conversational Agents(Despina Tomkou, George Fatouros, Andreas Andreou, Georgios Makridis, F. Liarokapis, Dimitrios Dardanis, Athanasios Kiourtis, John Soldatos, D. Kyriazis, 2025, 2025 21st International Conference on Distributed Computing in Smart Systems and the Internet of Things (DCOSS-IoT))
- A Wearable Real-Time 2D/3D Eye-Gaze Interface to Realize Robot Assistance for Quadriplegics(Yongpeng Cao, Shouren Huang, S. Sørensen, Y. Yamakawa, Masatoshi Ishikawa, 2025, IEEE Access)
- AR-Classroom: Integrating Conversational Artificial Intelligence with Augmented Reality Technology for Learning Spatial Transformations and Their Matrix Representation(Uttamasha Monjoree, Samantha D. Aguilar, Chengyuan Qian, Carl Van Huyck, Shu-Hao Yeh, Preston Tranbarger, Luke Duane-Tessier, Leo Solitare-Renaldo, Heather Burte, Philip Yasskin, Jeffrey Liew, Dezhen Song, Francis K. H. Quek, Wei Yan, 2024, 2024 IEEE Frontiers in Education Conference (FIE))
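Many of the MR adaptation papers above frame placement as minimizing a cost that trades off visibility against environmental and social constraints. The sketch below illustrates that idea with a toy two-term objective and a brute-force grid search; the weights, keep-out radius, and candidate grid are illustrative assumptions, not any cited system's objective.

```python
import itertools
import math

def placement_cost(pos, gaze, blocked, w_gaze=1.0, w_block=4.0):
    """pos and gaze are (x, y, z); blocked is a list of keep-out centers."""
    gaze_term = math.dist(pos, gaze)                      # prefer staying near the view
    block_term = sum(max(0.0, 0.5 - math.dist(pos, b))    # penalize entering 0.5 m keep-out zones
                     for b in blocked)
    return w_gaze * gaze_term + w_block * block_term

def best_placement(gaze, blocked, step=0.25):
    """Exhaustive search over a coarse 3D grid in front of the user."""
    xs = [i * step for i in range(-4, 5)]
    ys = [i * step for i in range(-2, 3)]
    zs = [0.5 + i * step for i in range(0, 7)]
    return min(itertools.product(xs, ys, zs),
               key=lambda p: placement_cost(p, gaze, blocked))

# Example: keep a panel near the line of sight but away from a bystander.
panel = best_placement(gaze=(0.0, 0.0, 1.0), blocked=[(0.3, 0.0, 1.0)])
```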
User Cognition Theory, Trust, and Ethical Safety
From an HCI perspective, these works analyze how AI-driven interfaces affect users' mental models and cognitive flow states, and examine dark patterns in generative UI, bias mitigation, trust building, and blueprints for responsible AI design.
- User Preferences on a Generative AI User Interface Through a Choice Experiment(Jesun Yeon, Youchan Jung, Yongki Baek, Daeho Lee, Jungwoo Shin, W. Chung, 2024, International Journal of Human–Computer Interaction)
- Understanding Mental Models of Generative Conversational Search and The Effect of Interface Transparency(Chadha Degachi, Samuel Kernan Freire, E. Niforatos, Gerd Kortuem, 2025, ArXiv)
- Navigating the State of Cognitive Flow: Context-Aware AI Interventions for Effective Reasoning Support(Dinithi Dissanayake, Suranga Nanayakkara, 2025, ArXiv Preprint)
- Distributed Cognition for AI-supported Remote Operations: Challenges and Research Directions(Rune Møberg Jacobsen, Joel Wester, Helena Bøjer Djernæs, Niels van Berkel, 2025, ArXiv Preprint)
- Emergent Dark Patterns in AI-Generated User Interfaces(Daksh Pandey, 2026, ArXiv Preprint)
- DeBiasMe: De-biasing Human-AI Interactions with Metacognitive AIED (AI in Education) Interventions(Chaeyeon Lim, 2025, ArXiv Preprint)
- Towards responsible AI: an implementable blueprint for integrating explainability and social-cognitive frameworks in AI systems(Rittika Shamsuddin, H. B. Tabrizi, Pavan R. Gottimukkula, 2025, AI Perspectives & Advances)
- Promoting Real-Time Reflection in Synchronous Communication with Generative AI(Yi Wen, Meng Xia, 2025, ArXiv Preprint)
- UI Remix: Supporting UI Design Through Interactive Example Retrieval and Remixing(Junling Wang, Hongyi Lan, Xiaotian Su, Mustafa Doga Dogan, April Yi Wang, 2026, ArXiv Preprint)
- Comparing Interface Structures of Generative AI Tools : Focusing on Designer’s Creative Experience and Interaction Flow(Hyeontaek Hwang, Boram Lee, 2025, Institute of Art and Design Research)
Personalized Applications in Vertical Domains
These papers demonstrate AI-driven interface reconstruction in specific domains such as healthcare, finance, industry, autonomous driving, and education. The emphasis is on dynamically adjusting the interface according to domain knowledge, user emotion, and real-time data to support complex decision making (an illustrative adaptation sketch follows the reference list below).
- Visual-Conversational Interface for Evidence-Based Explanation of Diabetes Risk Prediction(Reza Samimi, Aditya Bhattacharya, Lucija Gosak, Gregor Stiglic, Katrien Verbert, 2025, ArXiv Preprint)
- Intellifinance - An AI Powered Assistant for Bank Statement Parsing and Conversational Financial Inquiry(G. Pugalendhi, D. R, S. K, Srinaath S S, 2025, 2025 IEEE First International Conference on Innovations in Engineering and Next-Generation Technologies for Sustainability (ICINVENTS))
- Face2Feel: Emotion-Aware Adaptive User Interface(Ismail Alihan Hadimlioglu, Siddharth Linga, 2025, ArXiv Preprint)
- A Spatially-Grounded Conversational Planner for Personalized Urban Itineraries(Chiara Pugliese, Maddalena Amendola, Raffale Perego, Chiara Renso, 2025, Proceedings of the 33rd ACM International Conference on Advances in Geographic Information Systems)
- Real Time Inventory Management System powered by Generative User Interface(Omkar Patil, 2024, International Journal of Scientific Research in Engineering and Management)
- Supporting Data-Frame Dynamics in AI-assisted Decision Making(Chengbo Zheng, Tim Miller, Alina Bialkowski, H Peter Soyer, Monika Janda, 2025, ArXiv Preprint)
- Exploring utilization of generative AI for research and education in data-driven materials science(Takahiro Misawa, Ai Koizumi, Ryo Tamura, Kazuyoshi Yoshimi, 2025, ArXiv Preprint)
- Demonstration of a Continuously Updated, Radio-Compatible Digital Twin for Robotic Integrated Sensing and Communications(Vlad-Costin Andrei, Praneeth Susarla, Aladin Djuhera, Niklas Vaara, Janne Mustaniemi, Constantino Álvarez Casado, Xinyan Li, U. Mönich, Holger Boche, Miguel Bordallo López, 2025, 2025 IEEE 5th International Symposium on Joint Communications & Sensing (JC&S))
- Generative AI Interface Design Considerations for Private Equity(Shirley Anderson, Yuanfei Zhao, 2025, Companion Proceedings of the 30th International Conference on Intelligent User Interfaces)
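A recurring pattern in this group (e.g., Face2Feel) is mapping a sensed user state plus task context onto concrete interface adjustments. The sketch below illustrates one such rule-based mapping; the emotion labels, `classify_emotion` placeholder, and specific adjustments are illustrative assumptions, not the design of any cited system.

```python
def classify_emotion(face_frame: bytes) -> str:
    """Placeholder for an emotion-recognition model; returns e.g. 'stressed', 'neutral', 'engaged'."""
    raise NotImplementedError

def adapt_interface(face_frame: bytes, task_complexity: float) -> dict:
    emotion = classify_emotion(face_frame)
    settings = {"density": "normal", "guidance": "off", "notifications": "on"}
    if emotion == "stressed" or task_complexity > 0.8:
        # Reduce visual density and surface step-by-step guidance for hard decisions.
        settings.update(density="minimal", guidance="step_by_step", notifications="muted")
    elif emotion == "engaged":
        # Expose denser data views to focused users.
        settings.update(density="dense")
    return settings
```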
The final grouping forms a complete research map spanning foundational techniques to high-level ethical concerns. The surveyed work covers UI perception modeling centered on MLLMs, on-the-fly layout synthesis with generative algorithms, and the evolution toward conversational, multimodal interaction paradigms. It also examines the role of agents in automated reconstruction, as well as adaptive optimization in complex spatial environments such as mixed reality. Finally, by integrating cognitive theory, ethical safety, and vertical-domain practice, the survey argues that AI-driven interface reconstruction is moving toward a context-aware, human-centered, domain-augmented intelligent ecosystem.
A total of 84 related references.
Abstracts for a subset of these references follow, reproduced from the source papers.
Computer-Use Agents (CUA) are becoming increasingly capable of autonomously operating digital environments through Graphical User Interfaces (GUI). Yet, most GUIs remain designed primarily for humans--prioritizing aesthetics and usability--forcing agents to adopt human-oriented behaviors that are unnecessary for efficient task execution. At the same time, rapid advances in coding-oriented language models (Coder) have transformed automatic GUI design. This raises a fundamental question: can CUAs, acting as judges, assist Coders in automatic GUI design? To investigate, we introduce AUI-Gym, a benchmark for Automatic GUI development spanning 52 applications across diverse domains. Using language models, we synthesize 1560 tasks that simulate real-world scenarios. To ensure task reliability, we further develop a verifier that programmatically checks whether each task is executable within its environment. Building on this, we propose a Coder-CUA in Collaboration framework: the Coder acts as Designer, generating and revising websites, while the CUA serves as Judge, evaluating functionality and refining designs. Success is measured not by visual appearance, but by task solvability and CUA navigation success rate. To turn CUA feedback into usable guidance, we design a CUA Dashboard that compresses multi-step navigation histories into concise visual summaries, offering interpretable guidance for iterative redesign. By positioning agents as both designers and judges, our framework shifts interface design toward agent-native efficiency and reliability. Our work takes a step toward shifting agents from passive use toward active participation in digital environments. Our code and dataset are available at https://github.com/showlab/AUI.
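The Coder-CUA collaboration described in this abstract can be pictured as a simple generate-evaluate-revise loop. The sketch below is a rough rendering of that loop, not the AUI-Gym implementation; the function names, feedback format, and success threshold are assumptions.

```python
def coder_generate(spec: str, feedback: str = "") -> str:
    """Placeholder: a code LLM returns website source for the given spec."""
    raise NotImplementedError

def cua_attempt(site_source: str, task: str) -> tuple[bool, str]:
    """Placeholder: a computer-use agent tries the task, returns (success, trace summary)."""
    raise NotImplementedError

def design_loop(spec: str, tasks: list[str], rounds: int = 3, target: float = 0.9) -> str:
    site, feedback = coder_generate(spec), ""
    for _ in range(rounds):
        results = [cua_attempt(site, t) for t in tasks]
        success_rate = sum(ok for ok, _ in results) / len(results)
        if success_rate >= target:
            break
        # Compress failed navigation traces into guidance for the next revision.
        feedback = "\n".join(trace for ok, trace in results if not ok)
        site = coder_generate(spec, feedback)
    return site
```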
Generative User Interface for the Mobile Apps: Image Synthesis with VAEs, GANs, and Stable Diffusion
Introduction of the neural network models in creating intelligent mobile user interfaces poses a new challenge for researchers and the application developers. Each year, using the generative models in automatic image generation becomes more and more important. Developing interfaces based on the element images and the entire screen is considered an innovation solution in design. The paper presents results of testing the Variational Autoencoder (VAE), Generative Adversarial Network (GAN) and Stable Diffusion models aimed at studying these approaches potential in creating the user interfaces. To compare the approaches, it analyzes the obtained images as the user interfaces based on the usability metrics. The analysis makes it possible to formulate recommendations for selecting suitable models for various applications and highlights the areas for further research.
This research explores the developing a real time inventory management system powered by a generative user interface. We are leveraging large language models like GPT4, Claude 3, and Google Gemini that support tool calling or function calling, and integrating it with the modern frontend frameworks like Next js that support streaming React Server Component (RSC), the proposed system enables interaction with the inventory through natural language prompts. We are using PostgreSQL as a choice of database and server actions are used to interact with the database in real time. The system composes and renders appropriate react components based on user prompt, providing a personalized user experience. The research discusses the system's architecture, implementation, and potential impact on inventory management systems. It showcases the potential of Large Language Models (LLMs) and conversational interfaces in enhancing enterprise software user experiences. Key Words: Inventory Management System, Generative User Interface, Generative AI, Large Language Models, Conversational Interface, Natural Language Processing
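The core mechanism in this abstract is LLM tool calling that decides which UI component to render; the sketch below illustrates that pattern. The tool schema, the `call_llm_with_tools` and `query_inventory` placeholders, and the `StockCard` component name are illustrative assumptions, not the system's actual code.

```python
INVENTORY_TOOL = {
    "name": "query_inventory",
    "description": "Look up current stock levels for a product.",
    "parameters": {
        "type": "object",
        "properties": {"product_name": {"type": "string"}},
        "required": ["product_name"],
    },
}

def call_llm_with_tools(prompt: str, tools: list[dict]) -> dict:
    """Placeholder: returns either {'text': ...} or {'tool': name, 'args': {...}}."""
    raise NotImplementedError

def query_inventory(product_name: str) -> dict:
    """Placeholder for a database lookup against the stock table."""
    raise NotImplementedError

def handle_prompt(user_prompt: str) -> dict:
    """Return a UI description: which component to render and with what props."""
    reply = call_llm_with_tools(user_prompt, [INVENTORY_TOOL])
    if reply.get("tool") == "query_inventory":
        stock = query_inventory(**reply["args"])
        return {"component": "StockCard", "props": stock}   # rendered client-side
    return {"component": "ChatMessage", "props": {"text": reply["text"]}}
```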
To enhance the efficiency and innovation of user interface design, this study explores the application of Generative Adversarial Networks through adversarial training between the generator and discriminator models. The research covers areas such as layout generation, color schemes, and interactive element design. Results indicate that UI-GAN can rapidly generate diverse design schemes, with layout module counts ranging from 5 to 8 on different devices, color scheme generation times between 1.8 and 2.3 seconds, and interactive element response times of 100 to 180 milliseconds. This significantly improves design efficiency and optimizes user experience.
Since the launch of ChatGPT in November 2022, the number of services providing generative AI has been steadily increasing. As different services enter the market, the generative AI interfaces users experience become more diverse. However, none of the services has yet established itself as the dominant tool, and which interface component affects user preferences the most has yet to be identified. This study investigates user preferences based on the interface of generative AI services currently on the market. We investigated users’ preferred interface components by setting the output data type, generative style, output variation, reference style, and generation history provided by the current generative AI service as properties. We collected data from 500 users through a survey and conducted conjoint analysis. Users preferred the provision of 10 generation histories the most, and the second most preferred the provision of reference style in footnote format. In addition, it was found that there was no preference for the creative generative style, which can be interpreted as users being aware of the problem of hallucination in generative AI. The results of this study will help future generative AI services design interfaces that consider user experience.
Generative UI is transforming interface design by facilitating AI-driven collaborative workflows between designers and computational systems. This study establishes a working definition of Generative UI through a multi-method qualitative approach, integrating insights from a systematic literature review of 127 publications, expert interviews with 18 participants, and analyses of 12 case studies. Our findings identify five core themes that position Generative UI as an iterative and co-creative process. We highlight emerging design models, including hybrid creation, curation-based workflows, and AI-assisted refinement strategies. Additionally, we examine ethical challenges, evaluation criteria, and interaction models that shape the field. By proposing a conceptual foundation, this study advances both theoretical discourse and practical implementation, guiding future HCI research toward responsible and effective generative UI design practices.
Unlike static and rigid user interfaces, generative and malleable user interfaces offer the potential to respond to diverse users’ goals and tasks. However, current approaches primarily rely on generating code, making it difficult for end-users to iteratively tailor the generated interface to their evolving needs. We propose employing task-driven data models—representing the essential information entities, relationships, and data within information tasks—as the foundation for UI generation. We leverage AI to interpret users’ prompts and generate the data models that describe users’ intended tasks, and by mapping the data models with UI specifications, we can create generative user interfaces. End-users can easily modify and extend the interfaces via natural language and direct manipulation, with these interactions translated into changes in the underlying model. The technical evaluation of our approach and user evaluation of the developed system demonstrate the feasibility and effectiveness of the proposed generative and malleable UIs.
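The key idea in this abstract is that a task-driven data model, rather than code, is the artifact the AI generates and the user edits; UI specifications are then derived from it. The sketch below shows one plausible entity-to-widget mapping; the entity structure, widget vocabulary, and example task are illustrative assumptions, not the cited system's design.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    name: str
    attributes: dict[str, str]                            # attribute -> type ("text", "date", "number", "enum")
    relations: list[str] = field(default_factory=list)    # names of related entities

def widget_for(attr_type: str) -> str:
    return {"text": "TextField", "date": "DatePicker",
            "number": "NumberInput", "enum": "Dropdown"}.get(attr_type, "TextField")

def ui_spec(model: list[Entity]) -> list[dict]:
    """One form section per entity; one widget per attribute; a link per relation."""
    return [{
        "section": e.name,
        "widgets": [{"label": a, "type": widget_for(t)} for a, t in e.attributes.items()],
        "links": e.relations,
    } for e in model]

# Example: a travel-planning task model produces a two-section interface.
trip = [Entity("Trip", {"destination": "text", "start": "date", "budget": "number"}, ["Activity"]),
        Entity("Activity", {"name": "text", "category": "enum"})]
spec = ui_spec(trip)
```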
We present a case study of using generative user interfaces, or "vibe coding," a method leveraging large language models (LLMs) for generating code via natural language prompts, to support rapid prototyping in user-centered design (UCD). Extending traditional UCD practices, we propose an AI-in-the-loop ideate-prototyping process. We share insights from an empirical experience integrating this process to develop an interactive data analytics interface for highway traffic engineers to effectively retrieve and analyze historical traffic data. With generative UIs, the team was able to elicit rich user feedback and test multiple alternative design ideas from user evaluation interviews and real-time collaborative sessions with domain experts. We discuss the advantages and pitfalls of vibe coding for bridging the gaps between design expertise and domain-specific expertise.
Commonly used methods in User-Centered Design (UCD) can face challenges in incorporating user feedback during early design stages, often resulting in extended iteration cycles. To address this, we explore the following question: “How can generative artificial intelligence (AI) be utilized to enable prototyping within user studies to facilitate immediate user feedback integration and validation?” We introduce a conceptual framework for live-prototyping, where designers modify AI-generated components of a prototype in real time through a separate control interface during user testing. This approach invites more immediate interaction between feedback and design decisions. To explore our concept, we engaged in a case study with experienced prototyping practitioners, examining how real-time prototyping might shape design processes. Participants highlighted the framework’s potential to support spontaneous insight generation and enhance collaborative dynamics. However, they also highlighted important considerations, including the need for a certain level of AI knowledge and challenges around planning and reliability. By integrating generative AI into the UCD process, our conceptual framework contributes to ongoing conversations around evolving user-centered methodologies.
BlendScape: Enabling End-User Customization of Video-Conferencing Environments through Generative AI
Today’s video-conferencing tools support a rich range of professional and social activities, but their generic meeting environments cannot be dynamically adapted to align with distributed collaborators’ needs. To enable end-user customization, we developed BlendScape, a rendering and composition system for video-conferencing participants to tailor environments to their meeting context by leveraging AI image generation techniques. BlendScape supports flexible representations of task spaces by blending users’ physical or digital backgrounds into unified environments and implements multimodal interaction techniques to steer the generation. Through an exploratory study with 15 end-users, we investigated whether and how they would find value in using generative AI to customize video-conferencing environments. Participants envisioned using a system like BlendScape to facilitate collaborative activities in the future, but required further controls to mitigate distracting or unrealistic visual elements. We implemented scenarios to demonstrate BlendScape’s expressiveness for supporting environment design strategies from prior work and propose composition techniques to improve the quality of environments.
As generative AI technologies integrate into financial workflows, understanding user interactions in the private equity domain is vital for optimizing information retrieval and user experience. This study uses a mixed-methods approach to explore interaction patterns within a generative AI chatbot prototype. Key themes, sentiments, and search behaviors were identified by analyzing 12 user interviews and 825 generative AI query inputs. Findings reveal predominant user intents—company analysis and sector-specific dynamics/market research—highlighting the importance of expertise in shaping interaction dynamics. A comprehensive product design framework tailored to private equity-specific needs is proposed, including delivering robust prompting support, offering high-level market insights and in-depth company analyses, clearly citing sources, and transparently communicating timeframes to users. This research provides actionable insights into designing intuitive, effective generative AI interfaces, advancing their application in private equity and broader financial sectors such as venture capital, asset management, hedge funds, investing banking, and M&A.
The experience and adoption of conversational search is tied to the accuracy and completeness of users' mental models -- their internal frameworks for understanding and predicting system behaviour. Thus, understanding these models can reveal areas for design interventions. Transparency is one such intervention which can improve system interpretability and enable mental model alignment. While past research has explored mental models of search engines, those of generative conversational search remain underexplored, even while the popularity of these systems soars. To address this, we conducted a study with 16 participants, who performed 4 search tasks using 4 conversational interfaces of varying transparency levels. Our analysis revealed that most user mental models were too abstract to support users in explaining individual search instances. These results suggest that 1) mental models may pose a barrier to appropriate trust in conversational search, and 2) hybrid web-conversational search is a promising novel direction for future search interface design.
As the boundaries of human computer interaction expand, Generative AI emerges as a key driver in reshaping user interfaces, introducing new possibilities for personalized, multimodal and cross-platform interactions. This integration reflects a growing demand for more adaptive and intuitive user interfaces that can accommodate diverse input types such as text, voice and video, and deliver seamless experiences across devices. This paper explores the integration of generative AI in modern user interfaces, examining historical developments and focusing on multimodal interaction, cross-platform adaptability and dynamic personalization. A central theme is the interface dilemma, which addresses the challenge of designing effective interactions for multimodal large language models, assessing the trade-offs between graphical, voice-based and immersive interfaces. The paper further evaluates lightweight frameworks tailored for mobile platforms, spotlighting the role of mobile hardware in enabling scalable multimodal AI. Technical and ethical challenges, including context retention, privacy concerns and balancing cloud and on-device processing are thoroughly examined. Finally, the paper outlines future directions such as emotionally adaptive interfaces, predictive AI driven user interfaces and real-time collaborative systems, underscoring generative AI's potential to redefine adaptive user-centric interfaces across platforms.
This study analyzes how the UI structures of key generative AI tools influence designers' creative processes. Using four criteria input, intervention, feedback, and interaction flow seven tools were examined through official documents, user reviews, and interface observations. The results show structural differences affecting creative flow, control, and output. The study offers insights for developing generative AI tools that better support designer-centered workflows.
Developing user-centred applications that address diverse user needs requires rigorous user research. This is time, effort and cost-consuming. With the recent rise of generative AI techniques based on Large Language Models (LLMs), there is a possibility that these powerful tools can be used to develop adaptive interfaces. This paper presents a novel approach to develop user personas and adaptive interface candidates for a specific domain using ChatGPT. We develop user personas and adaptive interfaces using both ChatGPT and a traditional manual process and compare these outcomes. To obtain data for the personas we collected data from 37 survey participants and 4 interviews in collaboration with a not-for-profit organisation. The comparison of ChatGPT generated content and manual content indicates promising results that encourage using LLMs in the adaptive interfaces design process.
Motivation: Functional Annotation of genomic Variants Online Resources (FAVOR) offers multi-faceted, whole genome variant functional annotations, which is essential for Whole Genome and Exome Sequencing (WGS/WES) analysis and the functional prioritization of disease-associated variants. A versatile chatbot designed to facilitate informative interpretation and interactive, user-centric summary of the whole genome variant functional annotation data in the FAVOR database is needed. Results: We have developed FAVOR-GPT, a generative natural language interface powered by integrating large language models (LLMs) and FAVOR. It is developed based on the Retrieval Augmented Generation (RAG) approach, and complements the original FAVOR portal, enhancing usability for users, especially those without specialized expertise. FAVOR-GPT simplifies raw annotations by providing interpretable explanations and result summaries in response to the user’s prompt. It shows high accuracy when cross-referencing with the FAVOR database, underscoring the robustness of the retrieval framework. Availability and implementation: Researchers can access FAVOR-GPT at FAVOR’s main website (https://favor.genohub.org).
For people with severe sensory-based motor disorders or musculoskeletal disorders, robotic assistance offers a promising solution to improve their daily living standards. In this paper, we present a wearable real-time 2-Dimensional/3-Dimensional (2D/3D) eye-gaze control interface for people with quadriplegia to enable robot-assisted locomotion, sensing, and manipulation. Compared to other modalities in human-robot interaction, gaze point information in Cartesian space has the advantage of being directly feasible for robotic control. To achieve accurate 3D gaze point estimation, we propose a method that leverages a commercially available 2D wearable eye tracker and an off-the-shelf stereo camera. Unlike traditional stereoscopic depth computation or 3D eye reconstruction approaches, which often rely on user-specific eye model calibration, our method reformulates the 2D-to-3D mapping as an online Newton-Raphson search problem that does not depend on individual eye model parameters for gaze depth estimation, allowing the system to operate effectively across varied environments and depth ranges. This results in a solution that is easy to implement, robust to individual variability, and computationally efficient for accurate 3D gaze point estimation. At the same time, 2D gaze is utilized to interact with a screen displaying robotic sensing of specific environments that are not directly visible to the user, thereby enabling extended sensing through robotic assistance. The feasibility of the proposed method is verified through quantitative evaluations of the 3D gaze point estimation. On average, the 3D gaze point estimation yields a mean Euclidean distance error of approximately 1.88 cm across the 0.5–4.0 m distance (corresponding to a 0.9 % percentage error), outperforming comparable methods. A proof-of-concept study further demonstrates successful robot-assisted locomotion, sensing, and manipulation using the proposed gaze interface. The implementation of the core method is available at https://github.com/SavickTso/ros2-3d-gaze-mapping.git.
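The depth-estimation step in this abstract is framed as an online Newton-Raphson search. The sketch below shows a generic, calibration-free 1D Newton iteration on an abstract residual along the gaze ray; it is only an illustration of the numerical idea, not the paper's actual residual function or implementation.

```python
def newton_depth(residual, d0: float = 1.0, tol: float = 1e-4, max_iter: int = 20) -> float:
    """Solve residual(d) = 0 for depth d (in meters) using a numerical derivative."""
    d, h = d0, 1e-3
    for _ in range(max_iter):
        r = residual(d)
        if abs(r) < tol:
            break
        dr = (residual(d + h) - residual(d - h)) / (2 * h)   # central difference
        if dr == 0:
            break
        d -= r / dr
        d = max(d, 0.05)   # keep the estimate in front of the user
    return d

# Toy usage: a disparity-like residual (proportional to inverse depth) whose root is 2.3 m.
depth = newton_depth(lambda d: 1.0 / d - 1.0 / 2.3)
```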
Current surveillance systems impose significant cognitive burden on operators who must monitor multiple 2D camera feeds simultaneously, leading to mental fatigue and degraded spatial awareness. We address this issue with a system that transforms disparate 2D surveillance feeds into a unified 3D spatial representation, offloading the mental mapping task from operators to the interface itself. Our approach integrates real-time 3D reconstruction with multi-camera feeds to create an intuitive spatial environment that aligns with natural human spatial cognition. Preliminary user studies with 14 participants demonstrate improved task performance and reduced cognitive load compared to traditional multi-view displays, supporting the hypothesis that externalizing spatial integration reduces vigilance decrement in monitoring tasks.
Integrated sensing and communications (ISAC) is essential for future 6G, bridging physical and digital worlds by enabling wireless systems to sense and respond to their environment. Digital twins (DTs) enhance ISAC by providing real-time, data-driven models for applications like localization and autonomous navigation. However, existing DT frameworks lack real-time modeling and multimodal sensing, limiting their use for high-resolution ISAC. This demo paper extends our recently proposed real-time DT framework for indoor ISAC-enabled robotics and introduces an LLM-driven speech interface for control and planning, a continuous 3D reconstruction scheme using RGB-D data, and a novel ray tracing approach for wireless channel modeling from point clouds. These innovations address key limitations, supporting advanced ISAC applications and immersive human-computer interaction.
The emergence of Large Language Models (LLMs) has reshaped agent systems. Unlike traditional rule-based agents with limited task scope, LLM-powered agents offer greater flexibility, cross-domain reasoning, and natural language interaction. Moreover, with the integration of multi-modal LLMs, current agent systems are highly capable of processing diverse data modalities, including text, images, audio, and structured tabular data, enabling richer and more adaptive real-world behavior. This paper comprehensively examines the evolution of agent systems from the pre-LLM era to current LLM-powered architectures. We categorize agent systems into software-based, physical, and adaptive hybrid systems, highlighting applications across customer service, software development, manufacturing automation, personalized education, financial trading, and healthcare. We further discuss the primary challenges posed by LLM-powered agents, including high inference latency, output uncertainty, lack of evaluation metrics, and security vulnerabilities, and propose potential solutions to mitigate these concerns.
Artificial intelligence (AI), including large language models and generative AI, is emerging as a significant force in software development, offering developers powerful tools that span the entire development lifecycle. Although software engineering research has extensively studied AI tools in software development, the specific types of interactions between developers and these AI-powered tools have only recently begun to receive attention. Understanding and improving these interactions has the potential to enhance productivity, trust, and efficiency in AI-driven workflows. In this paper, we propose a taxonomy of interaction types between developers and AI tools, identifying eleven distinct interaction types, such as auto-complete code suggestions, command-driven actions, and conversational assistance. Building on this taxonomy, we outline a research agenda focused on optimizing AI interactions, improving developer control, and addressing trust and usability challenges in AI-assisted development. By establishing a structured foundation for studying developer-AI interactions, this paper aims to stimulate research on creating more effective, adaptive AI tools for software development.
The adoption of large language models (LLMs) and autonomous agents in software engineering marks an enduring paradigm shift. These systems create new opportunities for tool design, workflow orchestration, and empirical observation, while fundamentally reshaping the roles of developers and the artifacts they produce. Although traditional empirical methods remain central to software engineering research, the rapid evolution of AI introduces new data modalities, alters causal assumptions, and challenges foundational constructs such as “developer”, “artifact”, and “interaction”. As humans and AI agents increasingly co-create, the boundaries between social and technical actors blur, and the reproducibility of findings becomes contingent on model updates and prompt contexts. This vision paper examines how the integration of LLMs into software engineering disrupts established research paradigms. We discuss how it transforms the phenomena we study, the methods and theories we rely on, the data we analyze, and the threats to validity that arise in dynamic AI-mediated environments. Our aim is to help the empirical software engineering community adapt its questions, instruments, and validation standards to a future in which AI systems are not merely tools, but active collaborators shaping software engineering and its study.
Automation increasingly shapes modern society, requiring artificial intelligence (AI) systems to not only perform complex tasks but also provide clear, actionable explanations of their decisions, especially in high-stakes domains. However, most contemporary AI systems struggle to explain their runtime operations in specific instances, limiting their applicability in contexts demanding stringent outcome justification. Existing approaches have attempted to address this challenge but often fall short in terms of contextual relevance, human cognitive alignment, or scalability. This paper introduces System-of-Systems Machine Learning (SoS-ML) as a novel framework to advance explainable artificial intelligence (XAI) by addressing the limitations of current methods. Drawing from insights in philosophy, cognitive science, and social sciences, SoS-ML seeks to integrate human-like reasoning processes into AI, framing explanations as contextual inferences and justifications. The research demonstrates how SoS-ML addresses key challenges in XAI, such as enhancing explanation accuracy and aligning AI reasoning with human cognition. By leveraging a multi-agent, modular design, SoS-ML encourages collaboration among machine learning models, leading to more transparent, context-aware systems. The framework’s ability to generalize across domains is demonstrated through experiments on the Pima Indian Diabetes dataset and pie chart image-to-text interpretation, showcasing its transformative potential in improving both model accuracy and explainability. The findings emphasize SoS-ML’s role in advancing responsible AI, particularly in high-stakes environments where interpretability and social accountability are paramount.
This study explores perceptions of artificial intelligence (AI) in the higher education workplace through innovative use of fiction writing workshops. Twenty-three participants took part in three workshops, imagining the application of AI assistants and chatbots to their roles. Key themes were identified, including perceived benefits and challenges of AI implementation, interface design implications, and factors influencing task delegation to AI. Participants envisioned AI primarily as a tool to enhance task efficiency rather than fundamentally transform job roles. This research contributes insights into the desires and concerns of educational users regarding AI adoption, highlighting potential barriers such as value alignment.
Today’s demand for customized service-based systems requires that industry understands the context and the particular needs of their customers. Service Oriented Dynamic Software Product Line practices enable companies to create individual products for every customer by providing an interdependent set of features presenting web services that are automatically activated and deactivated depending on the running situation. Such product lines are designed to support their self-adaptation to new contexts and requirements. Users configure personalized products by selecting desired features based on their needs. However, with large feature models, users must understand the functionalities of features and the impact of their gradual selections and their current context in order to make appropriate decisions. Thus, users need to be guided in configuring their product. To tackle this challenge, users can express their product requirements by textual language and a recommended product will be generated with respect to the described requirements. In this paper, we propose a deep neural network based recommendation approach that provides personalized recommendations to users which ease the configuration process. In detail, our proposed recommender system is based on a deep neural network that predicts to the user relevant features of the recommended product with the consideration of their requirements, contextual data and previous recommended products. In order to demonstrate the performance of our approach, we compared six different recommendation algorithms in a smart home case study.
We present a joint Speech and Language Model (SLM), a multitask, multilingual, and dual-modal model that takes advantage of pretrained foundational speech and language models. SLM freezes the pretrained foundation models to maximally preserves their capabilities, and only trains a simple adapter with just 1% (156M) of the foundation models’ parameters. This adaptation not only leads SLM to achieve strong performance on conventional tasks such as automatic speech recognition (ASR) and automatic speech translation (AST), but also unlocks the novel capability of zero-shot instruction-following for more diverse tasks. Given a speech input and a text instruction, SLM is able to perform unseen generation tasks including contextual biasing ASR using real-time context, dialog generation, speech continuation, and question answering. Our approach demonstrates that the representational gap between pretrained speech and language models is narrower than one would expect, and can be bridged by a simple adaptation mechanism. As a result, SLM is not only efficient to train, but also inherits strong capabilities already present in foundation models of different modalities.
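The adaptation mechanism in this abstract keeps both foundation models frozen and trains only a small bridging module. The sketch below illustrates that setup with a simple projection adapter in PyTorch; the layer sizes and forward pass are illustrative assumptions, not the SLM architecture.

```python
import torch
import torch.nn as nn

class SpeechToLMAdapter(nn.Module):
    """Maps frozen speech-encoder features into the language model's embedding space."""
    def __init__(self, speech_dim: int = 1024, lm_dim: int = 4096, hidden: int = 2048):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(speech_dim, hidden), nn.GELU(), nn.Linear(hidden, lm_dim))

    def forward(self, speech_feats: torch.Tensor) -> torch.Tensor:
        # speech_feats: (batch, frames, speech_dim) from a frozen speech encoder.
        return self.proj(speech_feats)          # (batch, frames, lm_dim)

def trainable_parameters(speech_encoder, language_model, adapter):
    """Freeze both foundation models; optimize only the adapter."""
    for p in speech_encoder.parameters():
        p.requires_grad = False
    for p in language_model.parameters():
        p.requires_grad = False
    return [p for p in adapter.parameters() if p.requires_grad]
```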
With the rapid advancement of artificial intelligence (AI), the field of language service studies has ushered in a paradigm shift and holds broad development prospects in the AI era. This paper first systematically reviews the global and domestic research progress in AI-driven language services: internationally, scholars focus on the integration of AI technologies with language service workflows, efficiency optimization, and quality evaluation; domestically, research leans toward addressing practical needs such as cross-cultural communication under national strategies and the localization of AI language tools. Subsequently, it examines the current applications of AI in the language service domain, covering key technologies including neural machine translation (NMT) with enhanced contextual adaptation, speech recognition and synthesis supporting real-time multilingual interaction, and large language models (LLMs) enabling intelligent content creation and multi-modal language services. Finally, the paper envisions future research directions such as cross-disciplinary integration of AI, linguistics, and communication, ethical governance of AI language services, and personalized service innovation. It further puts forward pertinent suggestions, including strengthening the construction of multilingual corpus resources, improving the evaluation system for AI-driven language services, and cultivating interdisciplinary talents, so as to promote the high-quality development of the global language service industry.
Self-adaptive systems have traditionally relied on the MAPE-K loop. It consists of a centralized, reactive, and sequential loop for monitoring, analyzing, planning, and executing system adaptations. However, the increasing complexity and dynamic nature of modern systems have exposed the limitations of MAPE-K loops, including their lack of proactivity, scalability challenges, and difficulty integrating continuous learning or distributed decision-making. We introduce AWARE (Assess, Weigh, Act, Reflect, Enrich), a distributed, goal-driven framework that addresses these limitations. AWARE employs autonomous AI agents capable of proactive adaptation, collaboration, and continuous learning to enhance decision-making and system resilience. The modular design of our framework supports dynamic agent integration and optimized resource utilization, enabling seamless scalability and adaptability. AWARE not only anticipates changes and optimizes responses but also iteratively refines its strategies based on contextual insights. Through a comparison with MAPE-K and a real-world use case, we demonstrate how AWARE distributed intelligence redefines the capabilities of self-adaptive systems, offering a solution better aligned with the demands of complex real-world systems.
The development of Self-adaptive Software (SaS) is not a trivial task because this type of software has specific features compared to traditional ones. In short, SaS can reflect on its internal and external states and propose structural, behavioral, and contextual changes that can be incorporated at runtime. Manual adaptation tasks, even if very well executed, normally become onerous in time and effort, besides being error prone because of the involuntary injection of errors by the developers. Automated processes have been used as a feasible solution to conduct software adaptation at runtime by minimizing human involvement (e.g., software engineers and developers) and quickening up the execution of tasks. In parallel, Reference Architectures (RA) have been used to aggregate knowledge and architectural artifacts, capturing the systems’ essence in specific domains. Therefore, it can be said that this type of architecture is an important way to support the development, standardization, and evolution of software systems. Considering this context, the main contribution of this paper is to present the second release of a reference architecture called RA4SaS (Reference Architecture for SaS). This architecture is based on reflection, a controlled adaptation approach, and a set of automated processes that support the development of SaS in both design and runtime. To show the applicability of our RA, we conducted a case study that explored three adaptation scenarios. As a result, we observe our RA has good potential to efficiently contribute to the SaS domain.
Mixed Reality is increasingly used in mobile settings beyond controlled home and office spaces. This mobility introduces the need for user interface layouts that adapt to varying contexts. However, existing adaptive systems are designed only for static environments. In this paper, we introduce SituationAdapt, a system that adjusts Mixed Reality UIs to real-world surroundings by considering environmental and social cues in shared settings. Our system consists of perception, reasoning, and optimization modules for UI adaptation. Our perception module identifies objects and individuals around the user, while our reasoning module leverages a Vision-and-Language Model to assess the placement of interactive UI elements. This ensures that adapted layouts do not obstruct relevant environmental cues or interfere with social norms. Our optimization module then generates Mixed Reality interfaces that account for these considerations as well as temporal constraints. For evaluation, we first validate our reasoning module’s capability of assessing UI contexts in comparison to human expert users. In an online user study, we then establish SituationAdapt’s capability of producing context-aware layouts for Mixed Reality, where it outperformed previous adaptive layout methods. We conclude with a series of applications and scenarios to demonstrate SituationAdapt’s versatility.
Despite advances in digital libraries, keyword-based search remains rigid, offering limited support for exploratory and sense-making tasks. We introduce Athena, a book discovery system that integrates LLM-powered Retrieval-Augmented Generation (RAG) with interactive graph visualization. This hybrid system allows users to engage in natural language dialogue, navigate relational graphs of retrieved books, and generate cross-book summaries, offering an alternative to static keyword search. A preliminary user study found that Athena reduced cognitive load, improved usability, and encouraged exploratory behavior, although user trust in AI-generated content varied. We outline future directions focused on scaling to larger, more diverse user studies and systematically analyzing how conversational and visual features influence trust, satisfaction, and external validation behaviors.
The new AI platform is designed and implemented by the software team at full-stack AI SaaS, which includes conversational AI, image generation, and code generation. Powering for both conversation and code uses OpenAI’s GPT-3.5-turbo and the platform will use DALL·E to generate images, thus offering a unified user experience of these different AI villages through a single cohesive interface. Consumer-side routing is seen in a number of technologies e.g. modern web technologies such as Next.js 13 and its App Router, thereby it becomes possible to have smooth, fast and responsive user experience. The era in which we live is characterized by user expectations that are always demanding, and this endeavor is related to vital user-experience, developer-usability, and scalability tasks. An AI platform will only be useful if it involves the whole process - integrating the listed elements to a solution that empowers users comprehensively across all the various domains of activities will make it one. The discussion of the layout and those that use our platform is talked about in this particular context which includes the scalability key aspect, the user interface design concern, and the development treatment to the users. Next.js 13 together with API-s as well as the machine algorithms of OpenAI presents the interlaid milieu of a technologic-creative realm which deals with today’s issues. Consequently, we verify the efficiency of our system by load testing and collecting user reviews. These results not only reaffirm the infallibility of our tool but also bring new areas for investigation and enhancement. In this paper, we hope to stimulate some subtle debates on the probable of AI to make significant changes in the user experience and enhance the productivity of the user, moving towards a scenario in which everyone has easy access to AI tools covering a large variety of walks of life.
Artificial Intelligence (AI) has progressed so far in human computer interaction that it is much more natural and interesting. Optical Character Recognition (OCR) conjointly with Conversational AI is capable of processing visual alongside the textual input and generating intelligent and context aware responses, and therefore the work on a multimodal chatbot system is introduced in this paper. The proposed system extracts text from images, Natural Language Processing (NLP) processes user queries, and enhances the interaction by speech output through text to speech synthesis. In particular, this chatbot doesn’t accept speech as input modality but tries to translate text response to speech to make the interface more accessible for visually impaired users. Additionally, there is a poster generation module in the system for visual summarization of the conversations and the extracted content. The chatbot uses state of the art deep learning models and language frameworks to handle real time processing, grammatical accuracy in real time and also across different scenarios. From education, assistive technologies, customer support and all the possibilities in between, the applications take advantage of multimodal, voice enriched and visually enhanced communication to include users.
Tabular data is the most common format to publish and exchange structured data online. A clear example is the growing number of open data portals published by public administrations. However, exploitation of these data sources is currently limited to technical people able to programmatically manipulate and digest such data. As an alternative, we propose the use of chatbots to offer a conversational interface to facilitate the exploration of tabular data sources, including support for data analytics questions that are responded via charts rendered by the chatbot. Moreover, our chatbots are automatically generated from the data source itself thanks to the instantiation of a configurable collection of conversation patterns matched to the chatbot intents and entities.
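This abstract describes chatbots generated automatically from the tabular source itself by instantiating conversation patterns as intents and entities. The sketch below shows one plausible schema-to-intent derivation; the column-typing heuristic and intent templates are illustrative assumptions, not the cited generator's actual patterns.

```python
import csv

def infer_schema(path: str) -> dict[str, str]:
    """Rough column typing: 'numeric' if every sampled value parses as a number, else 'categorical'."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))[:50]
    schema = {}
    for col in rows[0]:
        values = [r[col] for r in rows if r[col]]
        numeric = all(v.replace(".", "", 1).lstrip("-").isdigit() for v in values)
        schema[col] = "numeric" if numeric else "categorical"
    return schema

def generate_intents(schema: dict[str, str]) -> list[dict]:
    """Instantiate one analytics or filtering intent per column."""
    intents = []
    for col, kind in schema.items():
        if kind == "numeric":
            intents.append({"intent": f"stats_{col}",
                            "utterances": [f"what is the average {col}",
                                           f"show the maximum {col}"]})
        else:
            intents.append({"intent": f"filter_by_{col}",
                            "utterances": [f"show rows where {col} is <value>"]})
    return intents
```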
This study explores strategies to optimize the emotional experience of voice user interfaces (VUIs) in AI assistants, focusing on Generation Z single households in their 20s. The research proceeded in four stages: user survey, case analysis, prototype design with usability testing, and emotion recognition technology analysis, applying the Person–Artifact–Task (P-A-T) model to derive emotion-based VUI design strategies. In UI, visual feedback elements were emphasized, while in UX, conversational naturalness, emotional responsiveness, and adaptability to user states were considered. Usability testing revealed that users perceived personalized dialogue and emotional reactions as key satisfaction factors, with visual feedback enhancing immersion and emotional engagement. These findings suggest that AI assistants can evolve beyond functional tools into emotionally connected digital companions, and the study provides methodological foundations for future personalized VUI design.
As users engage more frequently with AI conversational agents, conversations may exceed their “memory” capacity, leading to failures in correctly leveraging certain memories for tailored responses. However, in finding past memories that can be reused or referenced, users need to retrieve relevant information in various conversations and articulate to the AI their intention to reuse these memories. To support this process, we introduce Memolet, an interactive object that reifies memory reuse. Users can directly manipulate Memolet to specify which memories to reuse and how to use them. We developed a system demonstrating Memolet’s interaction across various memory reuse stages, including memory extraction, organization, prompt articulation, and generation refinement. We examine the system’s usefulness with an N=12 within-subject study and provide design implications for future systems that support user-AI conversational memory reusing.
Multimodal AI is an important step towards building effective tools to leverage multiple modalities in human-AI communication. Building a multimodal document-grounded AI system to interact with long documents remains a challenge. Our work aims to fill the research gap of directly leveraging grounded visuals from documents alongside textual content in documents for response generation. We present an interactive conversational AI agent 'MuDoC' based on GPT-4o to generate document-grounded responses with interleaved text and figures. MuDoC's intelligent textbook interface promotes trustworthiness and enables verification of system responses by allowing instant navigation to source text and figures in the documents. We also discuss qualitative observations based on MuDoC responses highlighting its strengths and limitations.
This paper introduces a novel integration of Retrieval-Augmented Generation (RAG) enhanced Large Language Models (LLMs) with Extended Reality (XR) technologies to address knowledge transfer challenges in industrial environments. The proposed system embeds domain-specific industrial knowledge into XR environments through a natural language interface, enabling hands-free, context-aware expert guidance for workers. We present the architecture of the proposed system consisting of an LLM Chat Engine with dynamic tool orchestration and an XR application featuring voice-driven interaction. Performance evaluation of various chunking strategies, embedding models, and vector databases reveals that semantic chunking, balanced embedding models, and efficient vector stores deliver optimal performance for industrial knowledge retrieval. The system's potential is demonstrated through early implementation in multiple industrial use cases, including robotic assembly, smart infrastructure maintenance, and aerospace component servicing. Results indicate potential for enhancing training efficiency, remote assistance capabilities, and operational guidance in alignment with Industry 5.0's human-centric and resilient approach to industrial development.
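To make the retrieval side of such a pipeline concrete, the sketch below shows chunking plus vector search, assuming a toy hashed bag-of-words embedding in place of a real embedding model and a flat NumPy array in place of a vector database; the paper itself evaluates real chunking strategies, embedding models, and vector stores.

```python
# Minimal retrieval sketch for industrial-manual style text. The embedding and "index"
# are deliberately simplistic stand-ins.
import re
import numpy as np

DIM = 256

def embed(text: str) -> np.ndarray:
    """Toy embedding: hash each token into a fixed-size bag-of-words vector, then L2-normalize."""
    vec = np.zeros(DIM)
    for token in re.findall(r"\w+", text.lower()):
        vec[hash(token) % DIM] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def chunk(text: str, size=60, overlap=15):
    """Sliding-window chunking over words (semantic chunking would split on topic boundaries instead)."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]

def build_index(chunks):
    return np.stack([embed(c) for c in chunks])

def retrieve(query, chunks, index, k=3):
    scores = index @ embed(query)                 # cosine similarity (vectors are unit length)
    top = np.argsort(scores)[::-1][:k]
    return [(chunks[i], float(scores[i])) for i in top]

manual = ("Torque the M6 fasteners to 9 Nm before mounting the bracket. " * 5
          + "If the sensor reports a fault, power-cycle the controller and re-run calibration. " * 5)
chunks = chunk(manual)
index = build_index(chunks)
for text, score in retrieve("how do I clear a sensor fault?", chunks, index):
    print(round(score, 3), text[:60], "...")
```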
Mixed reality (MR) environments offer embodied spatial interaction, providing intuitive 3D manipulation capabilities that enhance the conceptual design process. Parametric modeling, a powerful and advanced architectural design method, enables the generation of complex, optimized geometries. However, its integration into MR environments remains limited due to precision constraints and unsuitable input modalities. Existing MR tools prioritize spatial interaction but lack the control and expressiveness required for parametric workflows, particularly for designers without formal programming backgrounds. We address this gap by introducing a novel conversational MR interface that combines speech input, gesture recognition, and a multi-agent large language model (LLM) system to support intuitive parametric modeling. Our system dynamically manages parameter states, resolves ambiguous commands through conversation and contextual prompting, and enables real-time model manipulation within immersive environments. We demonstrate how this approach reduces cognitive and operational barriers in early-stage design tasks, allowing users to refine and explore their design space. This work expands the role of MR to a generative design platform, supporting programmatic thinking in design tasks through natural, embodied interaction.
We present a demo of RAGTrip, a modular conversational system that integrates Large Language Models (LLMs), spatial reasoning, and information retrieval to generate personalized walking itineraries in urban environments. Unlike traditional route planners or closed-book LLMs, RAGTrip interprets nuanced user preferences, avoids hallucinations, and grounds its suggestions in real-world geographic and factual data. The system features an interactive conversational interface that engages users in refining both the itinerary and the attractions to visit. Through dynamic map visualizations and contextual responses, users can explore and iteratively customize their routes. The demo includes a toggle to enable or disable Retrieval-Augmented Generation (RAG), allowing direct comparison between RAG-enhanced and closed-book LLM responses. This highlights the value of combining spatial and semantic grounding in conversational itinerary recommendation.
Proactive conversational agents (CAs) are often underutilized in e-Commerce due to misalignment with user expectations and integration challenges. In this demo, we present a hybrid e-Commerce interface that combines a browsing window with a proactive conversational agent, leveraging context-aware interactions to enhance the user’s experience. The interface dynamically adapts to user actions, repositioning the CA to centralize its recommendations and foster engagement through visual design nudges. By integrating a graph-based context model with a large language model (LLM) for intent detection and response generation, the system provides precise, multi-turn recommendations and action-oriented dialogues. A formative user study (N=10) demonstrated the hybrid interface’s effectiveness, achieving higher user engagement and satisfaction compared to standalone browsing or conversational interfaces.
The increasing volume and complexity of digital financial transactions make it challenging for individuals to manually track and analyze their spending. This paper introduces Intelli-Finance, an AI-powered financial assistant designed to automate and simplify personal finance management. Our system integrates a multi-stage pipeline that begins with parsing unstructured bank statements from PDF and CSV formats. It then employs a novel hybrid classification model, combining weak supervision with a Support Vector Machine (SVM), to accurately categorize transactions without requiring large manually-labeled datasets. The classified, structured data is then made accessible through a dynamic, natural-language query interface powered by a Large Language Model (LLM) agent built with LangChain. This allows users to seamlessly ask complex questions about their finances and receive personalized, data-driven insights. Our approach demonstrates the power of combining classical machine learning for robust classification with the advanced reasoning capabilities of LLMs to create a comprehensive, intuitive, and powerful tool for personal financial management.
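A hedged sketch of the weak-supervision idea follows: keyword rules label a seed set and an SVM generalizes to unseen merchants. The rules, categories, and transactions are invented for illustration and are not the Intelli-Finance pipeline.

```python
# Weak supervision + SVM sketch using scikit-learn; character n-grams cope with noisy
# merchant strings.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

RULES = {  # hypothetical labeling rules
    "groceries": ["supermarket", "grocery", "mart"],
    "transport": ["uber", "metro", "fuel"],
    "dining":    ["restaurant", "cafe", "pizza"],
}

def weak_label(description: str):
    d = description.lower()
    for category, keywords in RULES.items():
        if any(k in d for k in keywords):
            return category
    return None  # abstain when no rule fires

transactions = [
    "UBER TRIP 1234", "CITY SUPERMARKET", "PIZZA PALACE", "METRO CARD RELOAD",
    "GREEN GROCERY", "CORNER CAFE", "FUEL STATION 77", "FRESH MART",
]
seed = [(t, weak_label(t)) for t in transactions if weak_label(t) is not None]

clf = make_pipeline(TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5)), LinearSVC())
clf.fit([t for t, _ in seed], [y for _, y in seed])

print(clf.predict(["DOWNTOWN PIZZERIA", "NIGHT BUS TICKET"]))
```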
Conversational recommender systems (CRSs) aim to recommend high-quality items to users through a dialogue interface. A CRS usually involves multiple sub-tasks, such as user preference elicitation, recommendation, explanation, and item information search. Developing effective CRSs raises several challenges: 1) how to properly manage sub-tasks; 2) how to effectively solve different sub-tasks; and 3) how to correctly generate responses that interact with users. Recently, Large Language Models (LLMs) have exhibited an unprecedented ability to reason and generate, presenting a new opportunity to develop more powerful CRSs. In this work, we propose a new LLM-based CRS, referred to as LLMCRS, to address the above challenges. For sub-task management, we leverage the reasoning ability of the LLM to manage sub-tasks effectively. For sub-task solving, we pair the LLM with expert models for the different sub-tasks to achieve enhanced performance. For response generation, we utilize the generation ability of the LLM as a language interface to better interact with users. Specifically, LLMCRS divides the workflow into four stages: sub-task detection, model matching, sub-task execution, and response generation. LLMCRS also designs schema-based instruction, demonstration-based instruction, dynamic sub-task and model matching, and summary-based generation to instruct the LLM to generate the desired results in the workflow. Finally, to adapt the LLM to conversational recommendation, we also propose fine-tuning the LLM with reinforcement learning from CRS performance feedback, referred to as RLPF. Experimental results on benchmark datasets show that LLMCRS with RLPF outperforms existing methods.
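A schematic sketch of the four-stage workflow described above follows; the keyword-based detector, toy expert models, and templated reply are stand-ins for the LLM reasoning and expert models the paper actually uses.

```python
# Dispatcher-style sketch: sub-task detection -> model matching -> execution -> response.
from typing import Callable, Dict

def detect_subtask(utterance: str) -> str:
    """Stage 1: naive keyword routing where the paper uses LLM reasoning."""
    if "recommend" in utterance or "suggest" in utterance:
        return "recommendation"
    if "why" in utterance:
        return "explanation"
    return "item_search"

EXPERTS: Dict[str, Callable[[str], str]] = {     # Stage 2: sub-task -> expert model (placeholders)
    "recommendation": lambda u: "candidate items: [Movie A, Movie B]",
    "explanation":    lambda u: "Movie A matches your preference for sci-fi.",
    "item_search":    lambda u: "Movie A (2019), directed by ...",
}

def generate_response(utterance: str, expert_output: str) -> str:
    """Stage 4: an LLM would turn the expert output into a conversational reply."""
    return f"Based on what you asked ('{utterance}'): {expert_output}"

def converse(utterance: str) -> str:
    subtask = detect_subtask(utterance)          # Stage 1
    expert = EXPERTS[subtask]                    # Stage 2
    result = expert(utterance)                   # Stage 3
    return generate_response(utterance, result)  # Stage 4

print(converse("Can you recommend a sci-fi movie?"))
```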
This research full paper describes the AR-Classroom application that utilizes augmented reality (AR) and physical and virtual manipulatives to enable undergraduate students to build intuition about the relation between spatial transformations and their mathematical representations. To further build on the app's usability and functionality, additional features are being prototyped to continue improving the user-app interaction with the AR-Classroom. Some of the challenges the students faced when using AR-Classroom were recalling basic matrix operations without geometric context, basic trigonometric functions and their applications in the two-dimensional space, loss of AR registration for not understanding the AR environment, and User Interface (UI) issues. To address these issues, a conversational Artificial Intelligence (AI)-based multi-sensory and interactive assistance has been added to the AR-Classroom. Integrating sophisticated language processing and response generation of AI with immersive three-dimensional capabilities of AR can create a more engaging learning experience than the previous versions of the app. This integration focuses on creating a symbiosis between AR and AI. It creates an elevated user experience by offering real-time, personalized assistance to students dealing with issues related to understanding mathematical concepts and functionalities of the app. A qualitative exploratory usability study was done to assess the user's interaction with the AI implemented in the AR-Classroom, aiming to explore the AI's ability to guide students in using AR technology and aid in introductory matrix algebra learning, to effectively serve the students' learning. Based on the thematic analysis of the user experiment we found four main themes related to users' perceptions of AR-Classroom AI features usability: (1) AI chatbot ease-of-use, (2) Need for answer elaboration from AI, (3) Desire for visual information, and (4) Increased understanding of the content area. The scores of ease of use indicate AI's ability to guide complex tasks in an AR environment using AI features with less concern for the cognitive load. The overall result suggests the need for further investigation on incorporating AI-guided visual cues in an AR environment.
The advancement of artificial intelligence has transformed user interface design by enabling adaptive and personalized systems. Alongside these benefits, AI driven interfaces have also enabled the emergence of dark patterns, which are manipulative design strategies that influence user behavior for financial or business gain. As AI systems learn from data that already contains deceptive practices, they can replicate and optimize these patterns in increasingly subtle and personalized ways. This paper examines AI generated dark patterns, their psychological foundations, technical mechanisms, and regulatory implications in India. We introduce DarkPatternDetector, an automated system that crawls and analyzes websites to detect dark patterns using a combination of UI heuristics, natural language processing, and temporal behavioral signals. The system is evaluated on a curated dataset of dark and benign webpages and achieves strong precision and recall. By aligning detection results with India's Digital Personal Data Protection Act, 2023, this work provides a technical and regulatory framework for identifying and mitigating deceptive interface practices. The goal is to support ethical AI design, regulatory enforcement, and greater transparency in modern digital systems.
To facilitate high quality interaction during the regular use of computing systems, it is essential that the user interface (UI) deliver content and components in an appropriate manner. Although extended reality (XR) is emerging as a new computing platform, we still have a limited understanding of how best to design and present interactive content to users in such immersive environments. Adaptive UIs offer a promising approach for optimal presentation in XR as the user's environment, tasks, capabilities, and preferences vary under changing context. In this position paper, we present a design framework for adapting various characteristics of content presented in XR. We frame these as five considerations that need to be taken into account for adaptive XR UIs: What?, How Much?, Where?, How?, and When?. With this framework, we review literature on UI design and adaptation to reflect on approaches that have been adopted or developed in the past towards identifying current gaps and challenges, and opportunities for applying such approaches in XR. Using our framework, future work could identify and develop novel computational approaches for achieving successful adaptive user interfaces in such immersive environments.
As the automotive world moves toward higher levels of driving automation, Level 3 automated driving represents a critical juncture. In Level 3 driving, vehicles can drive themselves under limited conditions, but drivers are expected to be ready to take over when the system requests. Helping the driver maintain an appropriate level of Situation Awareness (SA) in such contexts becomes a critical task. This position paper explores the potential of Attentive User Interfaces (AUIs) powered by generative Artificial Intelligence (AI) to address this need. Rather than relying on overt notifications, we argue that AUIs based on novel AI technologies such as large language models or diffusion models can be used to improve SA in an unconscious and subtle way without negative effects on drivers' overall workload. Accordingly, we propose five strategies for how generative AI can be used to improve the quality of takeovers and, ultimately, road safety.
This study presents a novel approach for intelligent user interaction interface generation and optimization, grounded in the variational autoencoder (VAE) model. With the rapid advancement of intelligent technologies, traditional interface design methods struggle to meet the evolving demands for diversity and personalization, often lacking flexibility in real-time adjustments to enhance the user experience. Human-Computer Interaction (HCI) plays a critical role in addressing these challenges by focusing on creating interfaces that are functional, intuitive, and responsive to user needs. This research leverages the RICO dataset to train the VAE model, enabling the simulation and creation of user interfaces that align with user aesthetics and interaction habits. By integrating real-time user behavior data, the system dynamically refines and optimizes the interface, improving usability and underscoring the importance of HCI in achieving a seamless user experience. Experimental findings indicate that the VAE-based approach significantly enhances the quality and precision of interface generation compared to other methods, including autoencoders (AE), generative adversarial networks (GAN), conditional GANs (cGAN), deep belief networks (DBN), and VAE-GAN. This work contributes valuable insights into HCI, providing robust technical solutions for automated interface generation and enhanced user experience optimization.
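As a rough illustration of the underlying model class, the following is a minimal PyTorch VAE over flat layout vectors; the dimensions, architecture, and random data are placeholders, not the paper's model or its RICO preprocessing.

```python
# Minimal VAE sketch: encoder -> (mu, logvar) -> reparameterized latent -> decoder,
# trained with reconstruction + KL loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LayoutVAE(nn.Module):
    def __init__(self, x_dim=80, h_dim=128, z_dim=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.mu, self.logvar = nn.Linear(h_dim, z_dim), nn.Linear(h_dim, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(), nn.Linear(h_dim, x_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
        return self.dec(z), mu, logvar

def vae_loss(x, x_hat, mu, logvar):
    recon = F.mse_loss(x_hat, x, reduction="mean")
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

model = LayoutVAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(32, 80)                       # stand-in for a batch of normalized layout vectors
opt.zero_grad()
x_hat, mu, logvar = model(x)
loss = vae_loss(x, x_hat, mu, logvar)
loss.backward()
opt.step()
print(float(loss))
```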
This study introduces an adaptive user interface generation technology, emphasizing the role of Human-Computer Interaction (HCI) in optimizing user experience. By focusing on enhancing the interaction between users and intelligent systems, this approach aims to automatically adjust interface layouts and configurations based on user feedback, streamlining the design process. Traditional interface design involves significant manual effort and struggles to meet the evolving personalized needs of users. Our proposed system integrates adaptive interface generation with reinforcement learning and intelligent feedback mechanisms to dynamically adjust the user interface, better accommodating individual usage patterns. In the experiment, the OpenAI CLIP Interactions dataset was utilized to verify the adaptability of the proposed method, using click-through rate (CTR) and user retention rate (RR) as evaluation metrics. The findings highlight the system's ability to deliver flexible and personalized interface solutions, providing a novel and effective approach for user interaction design and ultimately enhancing HCI through continuous learning and adaptation.
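Stripped to its simplest form, such an adaptation loop can be viewed as a bandit over layout variants rewarded by clicks. The sketch below is an epsilon-greedy illustration with a simulated click-through rate, not the paper's system, which uses richer state and retention signals.

```python
# Epsilon-greedy bandit over layout variants; reward = click (1) or no click (0).
import random

LAYOUTS = ["compact", "card_grid", "list_detail"]
TRUE_CTR = {"compact": 0.05, "card_grid": 0.12, "list_detail": 0.08}   # hidden simulator

counts = {l: 0 for l in LAYOUTS}
values = {l: 0.0 for l in LAYOUTS}            # running mean reward per layout

def choose(epsilon=0.1):
    if random.random() < epsilon:             # explore
        return random.choice(LAYOUTS)
    return max(LAYOUTS, key=lambda l: values[l])   # exploit

for _ in range(5000):
    layout = choose()
    click = 1.0 if random.random() < TRUE_CTR[layout] else 0.0   # reward = click-through
    counts[layout] += 1
    values[layout] += (click - values[layout]) / counts[layout]  # incremental mean update

print({l: round(values[l], 3) for l in LAYOUTS})  # estimates approach the true CTRs
```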
AI is growing increasingly capable of automatically generating user interfaces (GenUI) from user prompts. However, designing GenUI applications that enable users to discover diverse customizations while preserving GenUI's expressiveness remains challenging. Current design methods -- presenting prompt boxes and leveraging context -- lack affordances for customization discovery, while traditional menu-based approaches become overly complex given GenUI's vast customization space. We propose Gradually Generating User Interfaces -- a design method that structures customizations into intermediate UI layers that AI gradually loads during interface generation. These intermediate stages expose different customization features along specific dimensions, making them discoverable to users. Users can wind back the generation process to access customizations. We demonstrate this approach through three prototype websites, showing how designers can support GenUI's expanded customization capabilities while maintaining visual simplicity and discoverability. Our work offers a practical method for integrating customization features into GenUI applications, contributing an approach to designing malleable software.
Robots often need to convey information to human users. For example, robots can leverage visual, auditory, and haptic interfaces to display their intent or express their internal state. In some scenarios there are socially agreed upon conventions for what these signals mean: e.g., a red light indicates an autonomous car is slowing down. But as robots develop new capabilities and seek to convey more complex data, the meaning behind their signals is not always mutually understood: one user might think a flashing light indicates the autonomous car is an aggressive driver, while another user might think the same signal means the autonomous car is defensive. In this paper we enable robots to adapt their interfaces to the current user so that the human's personalized interpretation is aligned with the robot's meaning. We start with an information theoretic end-to-end approach, which automatically tunes the interface policy to optimize the correlation between human and robot. But to ensure that this learning policy is intuitive -- and to accelerate how quickly the interface adapts to the human -- we recognize that humans have priors over how interfaces should function. For instance, humans expect interface signals to be proportional and convex. Our approach biases the robot's interface towards these priors, resulting in signals that are adapted to the current user while still following social expectations. Our simulations and user study results across 15 participants suggest that these priors improve robot-to-human communication. See videos here: https://youtu.be/Re3OLg57hp8
This paper presents Face2Feel, a novel user interface (UI) model that dynamically adapts to user emotions and preferences captured through computer vision. This adaptive UI framework addresses the limitations of traditional static interfaces by integrating digital image processing, face recognition, and emotion detection techniques. Face2Feel analyzes user expressions utilizing a webcam or pre-installed camera as the primary data source to personalize the UI in real-time. Although dynamically changing user interfaces based on emotional states are not yet widely implemented, their advantages and the demand for such systems are evident. This research contributes to the development of emotion-aware applications, particularly in recommendation systems and feedback mechanisms. A case study, "Shresta: Emotion-Based Book Recommendation System," demonstrates the practical implementation of this framework, the technologies employed, and the system's usefulness. Furthermore, a user survey conducted after presenting the working model reveals a strong demand for such adaptive interfaces, emphasizing the importance of user satisfaction and comfort in human-computer interaction. The results showed that nearly 85.7% of the users found these systems to be very engaging and user-friendly. This study underscores the potential for emotion-driven UI adaptation to improve user experiences across various applications.
The chapter discusses the foundational impact of modern generative AI models on information access (IA) systems. In contrast to traditional AI, the large-scale training and superior data modeling of generative AI models enable them to produce high-quality, human-like responses, which brings brand new opportunities for the development of IA paradigms. In this chapter, we identify and introduce two of them in details, i.e., information generation and information synthesis. Information generation allows AI to create tailored content addressing user needs directly, enhancing user experience with immediate, relevant outputs. Information synthesis leverages the ability of generative AI to integrate and reorganize existing information, providing grounded responses and mitigating issues like model hallucination, which is particularly valuable in scenarios requiring precision and external knowledge. This chapter delves into the foundational aspects of generative models, including architecture, scaling, and training, and discusses their applications in multi-modal scenarios. Additionally, it examines the retrieval-augmented generation paradigm and other methods for corpus modeling and understanding, demonstrating how generative AI can enhance information access systems. It also summarizes potential challenges and fruitful directions for future studies.
While generative artificial intelligence (Gen AI) increasingly transforms academic environments, a critical gap exists in understanding and mitigating human biases in AI interactions, such as anchoring and confirmation bias. This position paper advocates for metacognitive AI literacy interventions to help university students critically engage with AI and address biases across the Human-AI interaction workflows. The paper presents the importance of considering (1) metacognitive support with deliberate friction focusing on human bias; (2) bi-directional Human-AI interaction intervention addressing both input formulation and output interpretation; and (3) adaptive scaffolding that responds to diverse user engagement patterns. These frameworks are illustrated through ongoing work on "DeBiasMe," AIED (AI in Education) interventions designed to enhance awareness of cognitive biases while empowering user agency in AI interactions. The paper invites multiple stakeholders to engage in discussions on design and evaluation methods for scaffolding mechanisms, bias visualization, and analysis frameworks. This position contributes to the emerging field of AI-augmented learning by emphasizing the critical role of metacognition in helping students navigate the complex interaction between human, statistical, and systemic biases in AI use while highlighting how cognitive adaptation to AI systems must be explicitly integrated into comprehensive AI literacy frameworks.
Generative AI has recently had a profound impact on various fields, including daily life, research, and education. To explore its efficient utilization in data-driven materials science, we organized a hackathon -- AIMHack2024 -- in July 2024. In this hackathon, researchers from fields such as materials science, information science, bioinformatics, and condensed matter physics worked together to explore how generative AI can facilitate research and education. Based on the results of the hackathon, this paper presents topics related to (1) conducting AI-assisted software trials, (2) building AI tutors for software, and (3) developing GUI applications for software. While generative AI continues to evolve rapidly, this paper provides an early record of its application in data-driven materials science and highlights strategies for integrating AI into research and education.
High stakes decision-making often requires a continuous interplay between evolving evidence and shifting hypotheses, a dynamic that is not well supported by current AI decision support systems. In this paper, we introduce a mixed-initiative framework for AI assisted decision making that is grounded in the data-frame theory of sensemaking and the evaluative AI paradigm. Our approach enables both humans and AI to collaboratively construct, validate, and adapt hypotheses. We demonstrate our framework with an AI-assisted skin cancer diagnosis prototype that leverages a concept bottleneck model to facilitate interpretable interactions and dynamic updates to diagnostic hypotheses.
This paper investigates the impact of artificial intelligence integration on remote operations, emphasising its influence on both distributed and team cognition. As remote operations increasingly rely on digital interfaces, sensors, and networked communication, AI-driven systems transform decision-making processes across domains such as air traffic control, industrial automation, and intelligent ports. However, the integration of AI introduces significant challenges, including the reconfiguration of human-AI team cognition, the need for adaptive AI memory that aligns with human distributed cognition, and the design of AI fallback operators to maintain continuity during communication disruptions. Drawing on theories of distributed and team cognition, we analyse how cognitive overload, loss of situational awareness, and impaired team coordination may arise in AI-supported environments. Based on real-world intelligent port scenarios, we propose research directions that aim to safeguard human reasoning and enhance collaborative decision-making in AI-augmented remote operations.
Real-time reflection plays a vital role in synchronous communication. It enables users to adjust their communication strategies dynamically, thereby improving the effectiveness of their communication. Generative AI holds significant potential to enhance real-time reflection due to its ability to comprehensively understand the current context and generate personalized and nuanced content. However, it is challenging to design the way of interaction and information presentation to support the real-time workflow rather than disrupt it. In this position paper, we present a review of existing research on systems designed for reflection in different synchronous communication scenarios. Based on that, we discuss design implications on how to design human-AI interaction to support reflection in real time.
Flow theory describes an optimal cognitive state where individuals experience deep focus and intrinsic motivation when a task's difficulty aligns with their skill level. In AI-augmented reasoning, interventions that disrupt the state of cognitive flow can hinder rather than enhance decision-making. This paper proposes a context-aware cognitive augmentation framework that adapts interventions based on three key contextual factors: type, timing, and scale. By leveraging multimodal behavioral cues (e.g., gaze behavior, typing hesitation, interaction speed), AI can dynamically adjust cognitive support to maintain or restore flow. We introduce the concept of cognitive flow, an extension of flow theory in AI-augmented reasoning, where interventions are personalized, adaptive, and minimally intrusive. By shifting from static interventions to context-aware augmentation, our approach ensures that AI systems support deep engagement in complex decision-making and reasoning without disrupting cognitive immersion.
Adapting the User Interface (UI) of software systems to user requirements and the context of use is challenging. The main difficulty consists of suggesting the right adaptation at the right time in the right place in order to make it valuable for end-users. We believe that recent progress in Machine Learning techniques provides useful ways in which to support adaptation more effectively. In particular, Reinforcement learning (RL) can be used to personalise interfaces for each context of use in order to improve the user experience (UX). However, determining the reward of each adaptation alternative is a challenge in RL for UI adaptation. Recent research has explored the use of reward models to address this challenge, but there is currently no empirical evidence on this type of model. In this paper, we propose a confirmatory study design that aims to investigate the effectiveness of two different approaches for the generation of reward models in the context of UI adaptation using RL: (1) by employing a reward model derived exclusively from predictive Human-Computer Interaction (HCI) models (HCI), and (2) by employing predictive HCI models augmented by Human Feedback (HCI&HF). The controlled experiment will use an AB/BA crossover design with two treatments: HCI and HCI&HF. We shall determine how the manipulation of these two treatments will affect the UX when interacting with adaptive user interfaces (AUI). The UX will be measured in terms of user engagement and user satisfaction, which will be operationalized by means of predictive HCI models and the Questionnaire for User Interaction Satisfaction (QUIS), respectively. By comparing the performance of two reward models in terms of their ability to adapt to user preferences with the purpose of improving the UX, our study contributes to the understanding of how reward modelling can facilitate UI adaptation using RL.
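To make the two treatments concrete, here is a hedged sketch contrasting a reward computed from a predictive HCI model alone (HCI) with one blended with explicit human feedback (HCI&HF); the predictive model, rating scale, and blending weight are assumptions, not the study's instruments.

```python
# Two reward-model variants for RL-driven UI adaptation.
def hci_reward(adaptation, context):
    """Stand-in for a predictive HCI model, e.g. an estimated engagement score in [0, 1]."""
    score = 0.5
    if adaptation["font_scale"] >= 1.2 and context["user_age"] > 60:
        score += 0.3   # larger text predicted to help older users
    if adaptation["menu_depth"] > 2:
        score -= 0.2   # deep menus predicted to hurt engagement
    return max(0.0, min(1.0, score))

def hci_hf_reward(adaptation, context, human_rating, alpha=0.5):
    """HCI&HF: convex combination of the model estimate and a normalized human rating (1-5)."""
    return (1 - alpha) * hci_reward(adaptation, context) + alpha * (human_rating - 1) / 4

context = {"user_age": 67}
adaptation = {"font_scale": 1.3, "menu_depth": 2}
print(hci_reward(adaptation, context))                      # reward model 1 (HCI)
print(hci_hf_reward(adaptation, context, human_rating=4))   # reward model 2 (HCI&HF)
```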
3D Mixed Reality interfaces have nearly unlimited space for layout placement, making automatic UI adaptation crucial for enhancing the user experience. Such adaptation is often formulated as a multi-objective optimization (MOO) problem, where multiple, potentially conflicting design objectives must be balanced. However, selecting a final layout is challenging since MOO typically yields a set of trade-offs along a Pareto frontier. Prior approaches often required users to manually explore and evaluate these trade-offs, a time-consuming process that disrupts the fluidity of interaction. To eliminate this manual and laborious step, we propose a novel optimization approach that efficiently determines user preferences from a minimal number of UI element adjustments. The resulting rankings are translated into priority levels, which then drive our priority-based MOO algorithm. By focusing the search on user-preferred solutions, our method not only identifies UIs that are more aligned with user preferences, but also automatically selects the final design from the Pareto frontier; ultimately, it minimizes user effort while ensuring personalized layouts. Our user study in a Mixed Reality setting demonstrates that our preference-guided approach significantly reduces manual adjustments compared to traditional methods, including fully manual design and exhaustive Pareto front searches, while maintaining high user satisfaction. We believe this work opens the door for more efficient MOO by seamlessly incorporating user preferences.
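For intuition, a small sketch of the selection step follows: an inferred priority order over objectives drives a lexicographic pick from a precomputed Pareto set. The objectives, candidate layouts, tolerance, and the way priorities are derived from adjustments are illustrative assumptions, not the paper's optimizer.

```python
# Lexicographic selection from a Pareto set under user-derived objective priorities.
OBJECTIVES = ["reachability", "visibility", "consistency"]   # all to be maximized

pareto_set = [  # Pareto-optimal candidates with objective scores in [0, 1]
    {"reachability": 0.9, "visibility": 0.6, "consistency": 0.7},
    {"reachability": 0.7, "visibility": 0.9, "consistency": 0.6},
    {"reachability": 0.6, "visibility": 0.7, "consistency": 0.95},
]

# Suppose a few manual adjustments revealed the user cares most about visibility.
priority_order = ["visibility", "reachability", "consistency"]

def lexicographic_pick(candidates, order, tolerance=0.05):
    """Filter by each objective in priority order, keeping candidates within `tolerance`
    of the best remaining score before moving on to the next objective."""
    remaining = list(candidates)
    for obj in order:
        best = max(c[obj] for c in remaining)
        remaining = [c for c in remaining if c[obj] >= best - tolerance]
        if len(remaining) == 1:
            break
    return remaining[0]

print(lexicographic_pick(pareto_set, priority_order))
```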
This position paper outlines a new approach to adapting 3D user interface (UI) layouts given the complex nature of end-user preferences. Current optimization techniques, which mainly rely on weighted sum methods, can be inflexible and result in unsatisfactory adaptations. We propose using multi-objective optimization and interactive preference elicitation to provide semi-automated, flexible, and effective adaptations of 3D UIs. Our approach is demonstrated using an example of single-element 3D layout adaptation with ergonomic objectives. Future work is needed to address questions around the presentation and selection of optimal solutions, the impact on cognitive load, and the integration of preference learning. We conclude that, to make adaptive 3D UIs truly effective, we must acknowledge the limitations of our optimization objectives and techniques and emphasize the importance of user control.
Front-end personalization has traditionally relied on static designs or rule-based adaptations, which fail to fully capture user behavior patterns. This paper presents an AI driven approach for dynamic front-end personalization, where UI layouts, content, and features adapt in real-time based on predicted user behavior. We propose three strategies: dynamic layout adaptation using user path prediction, content prioritization through reinforcement learning, and a comparative analysis of AI-driven vs. rule-based personalization. Technical implementation details, algorithms, system architecture, and evaluation methods are provided to illustrate feasibility and performance gains.
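As an example of the first strategy, the sketch below uses a first-order Markov model over page transitions to predict the next view so its layout can be prepared in advance; the pages and sessions are invented, and the paper's predictor may be richer.

```python
# First-order Markov next-page prediction from session logs.
from collections import defaultdict, Counter

sessions = [
    ["home", "catalog", "product", "cart"],
    ["home", "search", "product", "cart", "checkout"],
    ["home", "catalog", "product", "reviews"],
]

transitions = defaultdict(Counter)
for session in sessions:
    for current, nxt in zip(session, session[1:]):
        transitions[current][nxt] += 1

def predict_next(page):
    """Most likely next page, used to pre-render or prioritize that layout."""
    if not transitions[page]:
        return None
    return transitions[page].most_common(1)[0][0]

print(predict_next("product"))   # -> 'cart', so the cart layout can be prepared ahead of time
```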
Designing user interfaces (UIs) is a critical step when launching products, building portfolios, or personalizing projects, yet end users without design expertise often struggle to articulate their intent and to trust design choices. Existing example-based tools either promote broad exploration, which can cause overwhelm and design drift, or require adapting a single example, risking design fixation. We present UI Remix, an interactive system that supports mobile UI design through an example-driven design workflow. Powered by a multimodal retrieval-augmented generation (MMRAG) model, UI Remix enables iterative search, selection, and adaptation of examples at both the global (whole interface) and local (component) level. To foster trust, it presents source transparency cues such as ratings, download counts, and developer information. In an empirical study with 24 end users, UI Remix significantly improved participants' ability to achieve their design goals, facilitated effective iteration, and encouraged exploration of alternative designs. Participants also reported that source transparency cues enhanced their confidence in adapting examples. Our findings suggest new directions for AI-assisted, example-driven systems that empower end users to design with greater control, trust, and openness to exploration.
ReDemon UI synthesizes React applications from user demonstrations, enabling designers and non-expert programmers to create UIs that integrate with standard UI prototyping workflows. Users provide a static mockup sketch with event handler holes and demonstrate desired runtime behaviors by interacting with the rendered mockup and editing the sketch. ReDemon UI identifies reactive data and synthesizes a React program with correct state update logic. We utilize enumerative synthesis for simple UIs and LLMs for more complex UIs.
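A toy illustration of the enumerative-synthesis idea (not ReDemon UI's actual algorithm): enumerate small candidate update expressions for a counter's click handler and keep the one consistent with the demonstrated state trace.

```python
# Enumerative synthesis over a tiny expression grammar, checked against a demonstration.
# Demonstration: starting state 0, after three clicks the user showed states 1, 2, 3.
demonstrated_trace = [0, 1, 2, 3]

CANDIDATES = {  # candidate update expressions over the previous state
    "state + 1": lambda s: s + 1,
    "state - 1": lambda s: s - 1,
    "state * 2": lambda s: s * 2,
    "0":         lambda s: 0,
}

def consistent(update, trace):
    """True if applying the update to each state reproduces the next demonstrated state."""
    return all(update(prev) == nxt for prev, nxt in zip(trace, trace[1:]))

for expr, fn in CANDIDATES.items():
    if consistent(fn, demonstrated_trace):
        print(f"synthesized handler: setState(state => {expr})")
        break
```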
Although Multimodal Large Language Models (MLLMs) have been widely applied across domains, they are still facing challenges in domain-specific tasks, such as User Interface (UI) understanding accuracy and UI generation quality. In this paper, we introduce UI-UG (a unified MLLM for UI Understanding and Generation), integrating both capabilities. For understanding tasks, we employ Supervised Fine-tuning (SFT) combined with Group Relative Policy Optimization (GRPO) to enhance fine-grained understanding on the modern complex UI data. For generation tasks, we further use Direct Preference Optimization (DPO) to make our model generate human-preferred UIs. In addition, we propose an industrially effective workflow, including the design of an LLM-friendly domain-specific language (DSL), training strategies, rendering processes, and evaluation metrics. In experiments, our model achieves state-of-the-art (SOTA) performance on understanding tasks, outperforming both larger general-purpose MLLMs and similarly-sized UI-specialized models. Our model is also on par with these larger MLLMs in UI generation performance at a fraction of the computational cost. We also demonstrate that integrating understanding and generation tasks can improve accuracy and quality for both tasks. Code and Model: https://github.com/neovateai/UI-UG
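For reference, the standard DPO objective mentioned above can be sketched on placeholder sequence log-probabilities; the UI-UG training data, model, and hyperparameters are not reproduced here.

```python
# Standard DPO loss: prefer the human-preferred UI over the rejected one, relative to a
# frozen reference policy.
import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    chosen_ratio = logp_chosen - ref_logp_chosen
    rejected_ratio = logp_rejected - ref_logp_rejected
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Placeholder sequence log-probs for a batch of (preferred UI DSL, rejected UI DSL) pairs.
logp_c, logp_r = torch.tensor([-32.0, -41.0]), torch.tensor([-35.0, -40.0])
ref_c, ref_r = torch.tensor([-33.0, -42.0]), torch.tensor([-34.0, -41.0])
print(float(dpo_loss(logp_c, logp_r, ref_c, ref_r)))
```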
The importance of computational modeling of mobile user interfaces (UIs) is undeniable. However, these require a high-quality UI dataset. Existing datasets are often outdated, collected years ago, and are frequently noisy with mismatches in their visual representation. This presents challenges in modeling UI understanding in the wild. This paper introduces a novel approach to automatically mine UI data from Android apps, leveraging Large Language Models (LLMs) to mimic human-like exploration. To ensure dataset quality, we employ the best practices in UI noise filtering and incorporate human annotation as a final validation step. Our results demonstrate the effectiveness of LLMs-enhanced app exploration in mining more meaningful UIs, resulting in a large dataset MUD of 18k human-annotated UIs from 3.3k apps. We highlight the usefulness of MUD in two common UI modeling tasks: element detection and UI retrieval, showcasing its potential to establish a foundation for future research into high-quality, modern UIs.
Grasp User Interfaces (grasp UIs) enable dual-tasking in XR by allowing interaction with digital content while holding physical objects. However, current grasp UI design practices face a fundamental challenge: existing approaches either capture user preferences through labor-intensive elicitation studies that are difficult to scale or rely on biomechanical models that overlook subjective factors. We introduce GraspR, the first computational model that predicts user preferences for single-finger microgestures in grasp UIs. Our data-driven approach combines the scalability of computational methods with human preference modeling, trained on 1,520 preferences collected via a two-alternative forced choice paradigm across eight participants and four frequently used grasp variations. We demonstrate GraspR's effectiveness through a working prototype that dynamically adjusts interface layouts across four everyday tasks. We release both the dataset and code to support future research in adaptive grasp UIs.
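A hedged sketch of how a preference model can be fit from two-alternative forced-choice data: a Bradley-Terry style logistic regression on feature differences between the two presented options. The features and data are synthetic placeholders, not the GraspR model or dataset.

```python
# Fit preference weights from simulated 2AFC trials, then rank new candidates.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d = 400, 4                       # trials, gesture features (e.g., finger travel, comfort proxy)
true_w = np.array([1.5, -2.0, 0.5, 0.0])

A = rng.normal(size=(n, d))         # features of option A in each trial
B = rng.normal(size=(n, d))         # features of option B
diff = A - B
p_prefers_A = 1 / (1 + np.exp(-diff @ true_w))
y = (rng.random(n) < p_prefers_A).astype(int)   # 1 = participant chose A

model = LogisticRegression(fit_intercept=False).fit(diff, y)
print(np.round(model.coef_[0], 2))  # recovered weights, roughly proportional to true_w

# Rank new candidate gestures by predicted utility (higher = more preferred).
candidates = rng.normal(size=(3, d))
print(np.argsort(candidates @ model.coef_[0])[::-1])
```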
User interface (UI) agents promise to make inaccessible or complex UIs easier to access for blind and low-vision (BLV) users. However, current UI agents typically perform tasks end-to-end without involving users in critical choices or making them aware of important contextual information, thus reducing user agency. For example, in our field study, a BLV participant asked to buy the cheapest available sparkling water, and the agent automatically chose one from several equally priced options, without mentioning alternative products with different flavors or better ratings. To address this problem, we introduce Morae, a UI agent that automatically identifies decision points during task execution and pauses so that users can make choices. Morae uses large multimodal models to interpret user queries alongside UI code and screenshots, and prompt users for clarification when there is a choice to be made. In a study over real-world web tasks with BLV participants, Morae helped users complete more tasks and select options that better matched their preferences, as compared to baseline agents, including OpenAI Operator. More broadly, this work exemplifies a mixed-initiative approach in which users benefit from the automation of UI agents while being able to express their preferences.
We present UI-Venus, a native UI agent that takes only screenshots as input based on a multimodal large language model. UI-Venus achieves SOTA performance on both UI grounding and navigation tasks using only several hundred thousand high-quality training samples through reinforcement fine-tuning (RFT) based on Qwen2.5-VL. Specifically, the 7B and 72B variants of UI-Venus obtain 94.1% / 50.8% and 95.3% / 61.9% on the standard grounding benchmarks, i.e., Screenspot-V2 / Pro, surpassing the previous SOTA baselines including open-source GTA1 and closed-source UI-TARS-1.5. To show UI-Venus's summarization and planning ability, we also evaluate it on AndroidWorld, an online UI navigation arena, on which our 7B and 72B variants achieve 49.1% and 65.9% success rates, also beating existing models. To achieve this, we introduce carefully designed reward functions for both UI grounding and navigation tasks and corresponding efficient data cleaning strategies. To further boost navigation performance, we propose Self-Evolving Trajectory History Alignment & Sparse Action Enhancement, which refines historical reasoning traces and balances the distribution of sparse but critical actions, leading to more coherent planning and better generalization in complex UI tasks. Our contributions include the release of SOTA open-source UI agents, comprehensive data cleaning protocols, and a novel self-evolving framework for improving navigation performance, which we hope will encourage further research and development in the community. Code is available at https://github.com/inclusionAI/UI-Venus.
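As a generic illustration of a grounding reward of the kind used in reinforcement fine-tuning (the report's actual reward functions are more carefully designed and are not reproduced here), a sketch combining a format term with a hit term:

```python
# Toy grounding reward: small credit for emitting well-formed coordinates, larger credit
# when the predicted click point lands inside the target bounding box.
import re

def parse_point(model_output: str):
    """Expect coordinates like '(x=0.42, y=0.81)' in normalized screen space."""
    m = re.search(r"x=([\d.]+).*?y=([\d.]+)", model_output)
    return (float(m.group(1)), float(m.group(2))) if m else None

def grounding_reward(model_output: str, target_box):
    """target_box = (x0, y0, x1, y1), normalized. Format reward 0.2, hit reward 0.8."""
    point = parse_point(model_output)
    if point is None:
        return 0.0
    x0, y0, x1, y1 = target_box
    hit = x0 <= point[0] <= x1 and y0 <= point[1] <= y1
    return 0.2 + (0.8 if hit else 0.0)

print(grounding_reward("click at (x=0.42, y=0.81)", (0.35, 0.75, 0.55, 0.90)))   # 1.0
print(grounding_reward("click somewhere on the left", (0.35, 0.75, 0.55, 0.90)))  # 0.0
```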
Recent advancements in multimodal large language models (MLLMs) have been noteworthy, yet, these general-domain MLLMs often fall short in their ability to comprehend and interact effectively with user interface (UI) screens. In this paper, we present Ferret-UI, a new MLLM tailored for enhanced understanding of mobile UI screens, equipped with referring, grounding, and reasoning capabilities. Given that UI screens typically exhibit a more elongated aspect ratio and contain smaller objects of interest (e.g., icons, texts) than natural images, we incorporate "any resolution" on top of Ferret to magnify details and leverage enhanced visual features. Specifically, each screen is divided into 2 sub-images based on the original aspect ratio (i.e., horizontal division for portrait screens and vertical division for landscape screens). Both sub-images are encoded separately before being sent to LLMs. We meticulously gather training samples from an extensive range of elementary UI tasks, such as icon recognition, find text, and widget listing. These samples are formatted for instruction-following with region annotations to facilitate precise referring and grounding. To augment the model's reasoning ability, we further compile a dataset for advanced tasks, including detailed description, perception/interaction conversations, and function inference. After training on the curated datasets, Ferret-UI exhibits outstanding comprehension of UI screens and the capability to execute open-ended instructions. For model evaluation, we establish a comprehensive benchmark encompassing all the aforementioned tasks. Ferret-UI excels not only beyond most open-source UI MLLMs, but also surpasses GPT-4V on all the elementary UI tasks.
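The aspect-ratio-based division described above is straightforward to sketch; the following uses Pillow and a blank placeholder image in place of a real screenshot.

```python
# Split a screenshot into two sub-images along its longer axis before encoding.
from PIL import Image

def split_screen(img: Image.Image):
    w, h = img.size
    if h >= w:   # portrait: horizontal division into top and bottom halves
        return [img.crop((0, 0, w, h // 2)), img.crop((0, h // 2, w, h))]
    else:        # landscape: vertical division into left and right halves
        return [img.crop((0, 0, w // 2, h)), img.crop((w // 2, 0, w, h))]

screenshot = Image.new("RGB", (1080, 2340), "white")   # stand-in for a real portrait screenshot
subs = split_screen(screenshot)
print([s.size for s in subs])   # [(1080, 1170), (1080, 1170)]
```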
The recent advances in Large Language Models (LLMs) have stimulated interest among researchers and industry professionals, particularly in their application to tasks concerning mobile user interfaces (UIs). This position paper investigates the use of LLMs for UI layout generation. Central to our exploration is the introduction of UI grammar -- a novel approach we proposed to represent the hierarchical structure inherent in UI screens. The aim of this approach is to guide the generative capacities of LLMs more effectively and improve the explainability and controllability of the process. Initial experiments conducted with GPT-4 showed the promising capability of LLMs to produce high-quality user interfaces via in-context learning. Furthermore, our preliminary comparative study suggested the potential of the grammar-based approach in improving the quality of generative results in specific aspects.
Texts, widgets, and images on a UI page do not work in isolation. Instead, they are partitioned into groups that deliver particular interaction functions or convey visual information. Existing studies on grouping UI elements mainly focus on a specific single UI-related software engineering task, and their groups vary in appearance and function. To address this, we propose semantic component groups that pack adjacent text and non-text elements with similar semantics. In contrast to those task-oriented grouping methods, our semantic component groups can be adopted for multiple UI-related software tasks, such as retrieving UI perceptual groups, improving code structure for automatic UI-to-code generation, and generating accessibility data for screen readers. To recognize semantic component groups on a UI page, we propose a robust, deep learning-based vision detector, UISCGD, which extends the SOTA deformable-DETR by incorporating UI element color representation and a learned prior on group distribution. The model is trained on our dataset of 1,988 mobile GUI screenshots from more than 200 apps on both iOS and Android platforms. The evaluation shows that UISCGD performs 6.1% better than the best baseline algorithm and 5.4% better than deformable-DETR, on which it is based.
Multimodal Vision-Language Models (VLMs) enable powerful applications from their fused understanding of images and language, but many perform poorly on UI tasks due to the lack of UI training data. In this paper, we adapt a recipe for generating paired text-image training data for VLMs to the UI domain by combining existing pixel-based methods with a Large Language Model (LLM). Unlike prior art, our method requires no human-provided annotations, and it can be applied to any dataset of UI screenshots. We generate a dataset of 335K conversational examples paired with UIs that cover Q&A, UI descriptions, and planning, and use it to fine-tune a conversational VLM for UI tasks. To assess the performance of our model, we benchmark it on UI element detection tasks, evaluate response quality, and showcase its applicability to multi-step UI navigation and planning.
Humans can learn to operate the user interface (UI) of an application by reading an instruction manual or how-to guide. Along with text, these resources include visual content such as UI screenshots and images of application icons referenced in the text. We explore how to leverage this data to learn generic visio-linguistic representations of UI screens and their components. These representations are useful in many real applications, such as accessibility, voice navigation, and task automation. Prior UI representation models rely on UI metadata (UI trees and accessibility labels), which is often missing, incompletely defined, or not accessible. We avoid such a dependency, and propose Lexi, a pre-trained vision and language model designed to handle the unique features of UI screens, including their text richness and context sensitivity. To train Lexi we curate the UICaption dataset consisting of 114k UI images paired with descriptions of their functionality. We evaluate Lexi on four tasks: UI action entailment, instruction-based UI image retrieval, grounding referring expressions, and UI entity recognition.
Recent popularity of Large Language Models (LLMs) has opened countless possibilities in automating numerous AI tasks by connecting LLMs to various domain-specific models or APIs, where LLMs serve as dispatchers while domain-specific models or APIs are action executors. Despite the vast numbers of domain-specific models/APIs, they still struggle to comprehensively cover super diverse automation demands in the interaction between human and User Interfaces (UIs). In this work, we build a multimodal model to ground natural language instructions in given UI screenshots as a generic UI task automation executor. This metadata-free grounding model, consisting of a visual encoder and a language decoder, is first pretrained on well studied document understanding tasks and then learns to decode spatial information from UI screenshots in a promptable way. To facilitate the exploitation of image-to-text pretrained knowledge, we follow the pixel-to-sequence paradigm to predict geometric coordinates in a sequence of tokens using a language decoder. We further propose an innovative Reinforcement Learning (RL) based algorithm to supervise the tokens in such sequence jointly with visually semantic metrics, which effectively strengthens the spatial decoding capability of the pixel-to-sequence paradigm. Extensive experiments demonstrate our proposed reinforced UI instruction grounding model outperforms the state-of-the-art methods by a clear margin and shows the potential as a generic UI task automation API.
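A small sketch of the pixel-to-sequence idea: quantize normalized box coordinates into a vocabulary of location tokens that a language decoder can emit, and map them back. The bin count and token naming are illustrative choices, not the paper's tokenizer.

```python
# Quantize normalized coordinates to discrete location tokens and decode them back.
N_BINS = 1000

def coords_to_tokens(box):
    """box = (x0, y0, x1, y1) in [0, 1] -> token strings such as '<loc_120>'."""
    return [f"<loc_{min(int(round(c * (N_BINS - 1))), N_BINS - 1)}>" for c in box]

def tokens_to_coords(tokens):
    return [int(t[5:-1]) / (N_BINS - 1) for t in tokens]

box = (0.12, 0.30, 0.47, 0.38)
tokens = coords_to_tokens(box)
print(tokens)                       # ['<loc_120>', '<loc_300>', '<loc_470>', '<loc_380>']
print(tokens_to_coords(tokens))     # close to the original box up to quantization error
```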
Healthcare professionals need effective ways to use, understand, and validate AI-driven clinical decision support systems. Existing systems face two key limitations: complex visualizations and a lack of grounding in scientific evidence. We present an integrated decision support system that combines interactive visualizations with a conversational agent to explain diabetes risk assessments. We propose a hybrid prompt handling approach combining fine-tuned language models for analytical queries with general Large Language Models (LLMs) for broader medical questions, a methodology for grounding AI explanations in scientific evidence, and a feature range analysis technique to support deeper understanding of feature contributions. We conducted a mixed-methods study with 30 healthcare professionals and found that the conversational interactions helped healthcare professionals build a clear understanding of model assessments, while the integration of scientific evidence calibrated trust in the system's decisions. Most participants reported that the system supported both patient risk evaluation and recommendation.
Many conversational user interfaces facilitate linear conversations with turn-based dialogue, similar to face-to-face conversations between people. However, digital conversations can afford more than simple back-and-forth; they can be layered with interaction techniques and structured representations that scaffold exploration, reflection, and shared understanding between users and AI systems. We introduce Feedstack, a speculative interface that augments feedback conversations with layered affordances for organizing, navigating, and externalizing feedback. These layered structures serve as a shared representation of the conversation that can surface user intent and reveal underlying design principles. This work represents an early exploration of this vision using a research-through-design approach. We describe system features and design rationale, and present insights from two formative (n=8, n=8) studies to examine how novice designers engage with these layered supports. Rather than presenting a conclusive evaluation, we reflect on Feedstack as a design probe that opens up new directions for conversational feedback systems.
The conversational search task aims to enable a user to resolve information needs via natural language dialogue with an agent. In this paper, we aim to develop a conceptual framework of the actions and intents of users and agents explaining how these actions enable the user to explore the search space and resolve their information need. We outline the different actions and intents, before discussing key decision points in the conversation where the agent needs to decide how to steer the conversational search process to a successful and/or satisfactory conclusion. Essentially, this paper provides a conceptualization of the conversational search process between an agent and user, which provides a framework and a starting point for research, development and evaluation of conversational search agents.
Thanks to the powerful language comprehension capabilities of Large Language Models (LLMs), existing instruction-based image editing methods have introduced Multimodal Large Language Models (MLLMs) to promote information exchange between instructions and images, ensuring the controllability and flexibility of image editing. However, these frameworks often build a multi-instruction dataset to train the model to handle multiple editing tasks, which is not only time-consuming and labor-intensive but also fails to achieve satisfactory results. In this paper, we present TalkPhoto, a versatile training-free image editing framework that facilitates precise image manipulation through conversational interaction. We instruct the open-source LLM with a specially designed prompt template to analyze user needs after receiving instructions and hierarchically invoke existing advanced editing methods, all without additional training. Moreover, we implement a plug-and-play and efficient invocation of image editing methods, allowing complex and unseen editing tasks to be integrated into the current framework, achieving stable and high-quality editing results. Extensive experiments demonstrate that our method not only provides more accurate invocation with lower token consumption but also achieves higher editing quality across various image editing tasks.
The advent of LLMs means that CUIs are cool again, but what isn't so cool is that we're doomed to use them alone. The one user, one account, one device paradigm has dominated the design of CUIs and is not going away as new conversational technologies emerge. In this provocation we explore some of the technical, legal, and design difficulties that seem to make multi-user CUIs so difficult to implement. Drawing inspiration from the ways that people manage messy group discussions, such as parliamentary and consensus-based paradigms, we show how LLM-based CUIs might be well suited to bridging the gap. With any luck, this might even result in everyone having to sit through fewer poorly run meetings and agonising group discussions - truly a laudable goal!
We introduce Brain-Artificial Intelligence Interfaces (BAIs) as a new class of Brain-Computer Interfaces (BCIs). Unlike conventional BCIs, which rely on intact cognitive capabilities, BAIs leverage the power of artificial intelligence to replace parts of the neuro-cognitive processing pipeline. BAIs allow users to accomplish complex tasks by providing high-level intentions, while a pre-trained AI agent determines low-level details. This approach enlarges the target audience of BCIs to individuals with cognitive impairments, a population often excluded from the benefits of conventional BCIs. We present the general concept of BAIs and illustrate the potential of this new approach with a Conversational BAI based on EEG. In particular, we show in an experiment with simulated phone conversations that the Conversational BAI enables complex communication without the need to generate language. Our work thus demonstrates, for the first time, the ability of a speech neuroprosthesis to enable fluent communication in realistic scenarios with non-invasive technologies.
Objective: The article investigates the integration of advanced Generative Pretrained Transformer (GPT) models into a user-friendly Graphical User Interface (GUI). The primary objective of this work is to simplify access to complex Natural Language Processing (NLP) tasks for a diverse range of users, including those with limited technical background. Methods: The development process of the GUI was comprehensive and systematic:
- Needs Assessment: understanding the requirements and expectations of potential users to ensure the GUI effectively addresses their needs.
- Preliminary Design and Development: initial designs were created and developed into a functional GUI, emphasizing the integration of features supporting various NLP tasks such as text summarization, translation, and question answering.
- Iterative Refinement: continuous improvements were made based on user feedback, focusing on enhancing user experience, ease of navigation, and customization capabilities.
Results: The developed GUI successfully integrated GPT models, including GPT-4 Turbo and GPT-3.5, resulting in an intuitive and adaptable interface. It demonstrated efficiency in performing various NLP tasks, thereby making these advanced language processing tools accessible to a broader audience. The GUI's design, emphasizing user-friendliness and adaptability, was particularly noted for its ability to cater to both technical and non-technical users. Conclusion: The article illustrates the significant impact of combining advanced GPT models with a Graphical User Interface to democratize the use of NLP tools. This integration not only makes complex language processing more accessible but also marks a pivotal step in the inclusive application of AI technology across various domains. The successful implementation of the GUI highlights the potential of AI in enhancing user interaction and broadening the scope of technology usage in everyday tasks.
The final grouping constructs a complete research map spanning low-level technology to high-level ethics. It covers MLLM-centered UI perception modeling, instant layout synthesis with generative algorithms, and the evolution toward conversational, multimodal interaction paradigms. The report also examines the role of agents in automated reconstruction, as well as adaptive optimization in complex spatial environments such as mixed reality. Finally, by integrating cognitive theory, ethics and safety, and vertical-domain practice, it emphasizes that AI-driven interface reconstruction is moving toward a context-aware, human-centered, and domain-augmented intelligent ecosystem.