A Brief Analysis of AIGC-Assisted Applications in IP Design
Digital Revitalization and Symbolic Reshaping of Traditional-Culture IP
This group of studies focuses on applying AIGC techniques (e.g., Stable Diffusion, LoRA, GANs, and style transfer) to the digital preservation and creative re-creation of intangible cultural heritage, traditional patterns, myths and legends, and cultural relics. The research centers on how to extract traditional visual symbols (such as Miao embroidery, blue-and-white porcelain, Dunhuang murals, and oracle bone script) and transform them into digital IP imagery that matches contemporary aesthetics.
- Sustainable Design on Intangible Cultural Heritage: Miao Embroidery Pattern Generation and Application Based on Diffusion Models(Qi-xiang Yu, Xuyuan Tao, Jianping Wang, 2025, Sustainability)
- Batik Sketch Coloring Using Generative Adversarial Network Pix2pix(Fanky Abdilqoyyim, Muhammad Ali Syakur, Fitri Damayanti, 2025, Journal of Information Engineering and Educational Technology)
- DiffOBI: Diffusion-based Image Generation of Oracle Bone Inscription Style Characters(Xiaoxuan Xie, Xu Du, Minhao Li, Xi Yang, Haoran Xie, 2024, SIGGRAPH Asia 2024 Technical Communications)
- Region-Aware Style Transfer Between Thangka Images via Combined Segmentation and Adaptive Style Fusion(Yukai Xian, Te Shen, Yunjie Xiang, Pubu Danzeng, Yurui Lee, 2025, 2025 28th International Conference on Computer Supported Cooperative Work in Design (CSCWD))
- The Optimization of Illustration Design in Cultural and Creative Products for Liaoning Region Under Intelligent Generative Adversarial Network(Min Niu, Yingbo Zhou, 2025, IEEE Access)
- Usability Analysis of Stable Diffusion-Based Generative Model for Enriching Batik Bakaran Pattern Synthesis(Kornelius Aditya Septemedi, Yonathan Purbo Santosa, 2024, Proxies : Jurnal Informatika)
- AI-Driven Generative Design for Wringinanom Batik Pattern Innovation: A Technopreneurship Approach to Creative Industry Growth(Y. Kusumawati, Viona Putri Salim, Binus Lasmy Management, Chua Soo, Wina Permana, 2025, 2025 International Conference on Informatics, Multimedia, Cyber and Information System (ICIMCIS))
- Research on IP Image Design of Pumi Clothing Patterns Using AIGC Technology(Wei Hua, Aibin Huang, 2025, Applied and Computational Engineering)
- Chinese Opera Character Painting Style Transfer: Using AI to Generate and Preserve Art(Xirong Gao, Liza Marziana Binti Mohammad Noh, Zhongwei Huang, 2024, International Journal of Research and Innovation in Social Science)
- SRAGAN: Saliency regularized and attended generative adversarial network for Chinese ink-wash painting style transfer(Xiang Gao, Yuqi Zhang, 2024, Pattern Recognit.)
- Research on Generating Cultural Relic Images Based on a Low-Rank Adaptive Diffusion Model(Juntao Deng, Xu Cao, Bingqi Cheng, 2024, Proceedings of the 2024 Guangdong-Hong Kong-Macao Greater Bay Area International Conference on Digital Economy and Artificial Intelligence)
- Innovative Design Research on Zigong Lanterns based on Generative AI(Yue Cao, 2025, International Journal of Education and Humanities)
- A Study on the Creation of Cultural Products Image of Intangible Cultural Heritage Using Generative Artificial Intelligence : Based on the Chinese craft Cloisonne using Stable Diffusion and Midjourney(Zhuo-Xun Wu, Ji-Sung Song, 2025, korea soc pub des)
- Prompt Conditioned Batik Pattern Generation Using LoRA Weighted Diffusion Model With Classifier-Free Guidance(Rahmatulloh Daffa Izzuddin Wahid, Novanto Yudistira, Candra Dewi, Irawati Nurmala Sari, Dyanningrum Pradhikta, Fatmawati, 2025, IEEE Access)
- AI-Assisted Inheritance of Qinghua Porcelain Cultural Genes and Sustainable Design Using Low-Rank Adaptation and Stable Diffusion(Qian Bao, Jiajia Zhao, Ziqi Liu, Na Liang, 2025, Electronics)
- Fine-tuning diffusion model to generate new kite designs for the revitalization and innovation of intangible cultural heritage(Yaqin Zhou, Yu Liu, Yuxin Shao, Junming Chen, 2025, Scientific Reports)
- Research on the Application of AIGC Technology in the Imagery of Folk Flower-Bird Calligraphy(Biyue Liang, Lin Liu, 2025, Proceedings of the 2025 International Conference on Artificial Intelligence, Virtual Reality and Interaction Design)
- Open-Ended Evolution of Artistic Styles in Diffusion Models via Island-Based Genetic Algorithms(Marcel Salvenmoser, M. Affenzeller, 2025, Proceedings of the Genetic and Evolutionary Computation Conference Companion)
- Supporting the Transmission and Innovation of Intangible Cultural Heritage through AI-Generated Content (AIGC): A Case Study on Assisted Design of Qinghua Porcelain Patterns in China(Jiajia Zhao, Euitay Jung, Meile Le, 2025, Journal of Korea Multimedia Society)
- Digital Art Design and Intelligent Re-creation of Intangible Cultural Heritage Patterns Based on Diffusion Generation Model(Zhiye Zhang, Junjun Hu, 2025, Proceedings of the 2025 International Conference on Artificial Intelligence and Sustainable Development)
- A Study on Character Design Creation of Traditional Chinese Images Using Generative Artificial Intelligence : Focusing on Dunhuang Tang Dynasty's “Flying Fairy” Images via MidJourney(Chen Chen, Ji-Sung Song, 2024, korea soc pub des)
- Style Transfer of Chinese Traditional Dermatoglyphic Pattern with Wide Diffusion Model(Chengcheng Li, Xueli Chen, Jiancong Chen, Wenhui He, 2025, 2025 28th International Conference on Computer Supported Cooperative Work in Design (CSCWD))
- Digital Integration of Traditional Craft Motifs in Mobile AR/VR Interactive Art Creation(Zhijuan Chen, Xijin Li, 2025, Int. J. Interact. Mob. Technol.)
- Research on the Construction of an AI Platform for Haipai New Year Paintings Based on the Stable Diffusion Model(Yueyang Zhao, Jingru Zhang, Damin Ding, Jin Liu, 2025, Int. J. Gaming Comput. Mediat. Simulations)
- A Study on the Generation of Visual Images for Mythological Legends Using AI Generated Content(AIGC): Focusing on the Chinese Mythology Collection “Classic of Mountains and Seas” with ChatGPT and Midjourney(Zi-Yun Zhao, Ji-Sung Song, 2024, korea soc pub des)
- Design Approach Research Using Stable Diffusion AI-Generated Images : Focusing on the Buddhist Figurine Designs of Longxing Temple (龍興寺) at Qingzhou Museum(Jun-Jun Li, Ji-Sung Song, 2024, korea soc pub des)
- Digital Activation and Translation Research of Sanxingdui Bronze Bird Artifact Symbols Based on Kansei Engineering and AIGC(Fangshuyang Yu, Xinyu Zhuang, Jingyi Wang, Shiyue Yang, 2025, Proceedings of the 2025 International Conference on Computer Technology, Digital Media and Communication)
- Innovative Design Applications of Sanxingdui Culture Enabled by AI-Generated Content (AIGC)(Xinyue Li, Xian Li, Ziyan Huang, 2025, Journal of Fine Arts)
Underlying AIGC Generation Algorithms, Fine-Grained Control, and Performance Evaluation
This body of literature emphasizes technical methodology and tool evaluation. It covers low-level optimization of algorithms such as GANs and diffusion models, with particular focus on key techniques for line-art colorization, sketch-driven generation, style-consistency control, prompt engineering, and cross-modal generation. It also includes comparative analyses and quantitative evaluations of the design effectiveness of mainstream tools such as Midjourney and Stable Diffusion.
- Research on automatic generation and style transfer of IP imagery based on deep learning(Yihan Wang, 2025, No journal)
- ClipArtGAN: An Application of Pix2Pix Generative Adversarial Network for Clip Art Generation(Reham H. Elnabawy, Slim Abdennadher, O. Hellwich, Seif Eldawlatly, 2024, Multimedia Tools and Applications)
- SketchAI: A "Sketch-First" Approach to Incorporating Generative AI into Fashion Design(R. Davis, Kevin Fred Mwaita, Livia Müller, D. Tozadore, A. Novikova, Tanja Käser, Thiemo Wambsganss, 2025, Proceedings of the Extended Abstracts of the CHI Conference on Human Factors in Computing Systems)
- LouvreSAE: Sparse Autoencoders for Interpretable and Controllable Style Transfer(Raina Panda, Daniel Fein, Arpita Singhal, M. Fiore, Maneesh Agrawala, Matyáš Boháček, 2025, ArXiv)
- ReChar: Revitalising Characters with Structure Preserved and User-Specified Aesthetic Enhancements(Zhongyu Yang, Junhao Song, Zhang Luo, Zuhao Yang, Yang Xu, Jingfen Lan, Yonghan Zhang, Wei Pang, Siyang Song, Yingfang Yuan, 2025, Proceedings of the SIGGRAPH Asia 2025 Technical Communications)
- A Study on the Use of Generative AI in Design Education Environment -Case of Brand Identity Design-(J. Lee, M. Kim, 2025, The Treatise on The Plastic Media)
- Comparative analysis of neural networks Midjourney, Stable Diffusion, and DALL-E and ways of their implementation in the educational process of students of design specialities(Nataliya Derevyanko, Olena Zalevska, 2023, Scientific Bulletin of Mukachevo State University Series “Pedagogy and Psychology”)
- Neural Networks in Art as a Graphic Design Tool(Artem Antonenko, Yurii Mishkur, А. О. Твердохліб, S. Vostrikov, A. Balvak, 2024, Herald of Khmelnytskyi National University. Technical sciences)
- The Comparison of the Effectiveness and Efficiency of Fine-Tuning Models on Stable Diffusion in Creating Concept Art(Abdul Bilal Qowy, A. Ihsan, Sri Hartati, 2024, JURNAL TEKNIK INFORMATIKA)
- AI-Powered Sculpting: Reviving Digital Sculpture with Generative Workflows(Xue Bai, Allan Fowler, Athirah Zaini, 2025, Proceedings of the SIGGRAPH Asia 2025 Posters)
- Matrix-based character design systems: Parallel principles in manga matrix and lora for neural networks(Anthoni Reza Pahlevi, A. Z. Mansoor, 2025, Deskomvis: Jurnal Ilmiah Desain Komunikasi Visual, Seni Rupa dan Media)
- DreamWalk: Style Space Exploration using Diffusion Guidance(Michelle Shu, Charles Herrmann, Richard Strong Bowen, Forrester Cole, R. Zabih, 2024, No journal)
- Reverse Prompt: Cracking the Recipe Inside Text-to-Image Generation(Zhiyao Ren, Yibing Zhan, Baosheng Yu, Dacheng Tao, 2025, ArXiv)
- ArtisanFlow: Style Customization System for Illustration Generation(Zeyu Liu, Taiheng Ye, Yiming Li, Chen Song, Yi Xu, 2025, 2025 IEEE International Conference on Multimedia and Expo Workshops (ICMEW))
- esFont: A guided diffusion and multimodal distillation to enhance the efficiency and stability in font design(Weijia Zhu, Xinjin Li, Jing Pu, Jin He, Jing Tan, 2025, PLOS One)
- A Generative Adversarial Network AMS-CycleGAN for Multi-Style Image Transformation(Xiaodi Rang, Zhengyu Dong, Jiachen Han, Chaoqing Ma, Guangzhi Zhao, Wenchao Zhang, 2024, IEEE Access)
- SketcherX: AI-Driven Interactive Robotic drawing with Diffusion model and Vectorization Techniques(Jookyung Song, Mookyoung Kang, Nojun Kwak, 2024, ArXiv)
- Novel Paintings from the Latent Diffusion Model through Transfer Learning(Dayin Wang, Chong-zhu Ma, Siwen Sun, 2023, Applied Sciences)
- Research on Innovation of AIGC-Integrated Digital Media Art Creation and Communication Modes(Kaiping Zhang, Di Wu, 2025, Proceedings of the 2025 2nd International Conference on Digital Economy and Computer Science)
- A Generative AI Framework for Enhancing Human-AI Collaborative Creation in Artistic Design Services(M. Jiang, 2026, International Journal of Information Systems in the Service Sector)
- A Deep Learning-Based Generative Adversarial Network for Digital Art Style Migration(Wenting Ou, 2025, International Journal of Advanced Computer Science and Applications)
- Stable Diffusion Reference Only: Image Prompt and Blueprint Jointly Guided Multi-Condition Diffusion Model for Secondary Painting(Hao Ai, Lu Sheng, 2023, ArXiv)
- Text-Guided Synthesis of Artistic Images with Retrieval-Augmented Diffusion Models(Robin Rombach, A. Blattmann, B. Ommer, 2022, ArXiv)
- An Improved Deep Convolutional Generative Adversarial Network for Anime-Style Character Image Painting(Zhong Hua, Lin Jie, 2024, 2024 9th International Conference on Computer and Communication Systems (ICCCS))
Assisted Creative Workflows for Film, Game, and Virtual-Character IP
This group addresses IP development in the digital content industries, covering character design, 3D asset creation, animation production, long-form narrative generation, and game world-building. It focuses on how AI, integrated into production workflows, raises creative efficiency and provides end-to-end assistance from concept sketch to dynamic video and from narrative text to interactive world design.
- Leveraging AI and diffusion models for anime art creation: A study on style transfer and image quality evaluation(Chao-Chun Shen, Shun-Nian Luo, Ling Fan, Chenglin Dai, 2025, Comput. Sci. Inf. Syst.)
- Exploration of the Artistic Expression Characteristics and AIGC Technology Application Boundaries of the Animated Short Film Glimmer(Bo Zhang, 2025, Journal of Contemporary Educational Research)
- Generative AI Application in Designing Animation Characters for ‘Genset Gang’ Visual Creation(Nurul Aqmarina Ardani, T. Riyadi, Angelica Fidelia Kurniawan, 2024, 2024 10th International Conference on Computing, Engineering and Design (ICCED))
- Practical Research on Virtual Reality and Augmented Reality Technology in Min Cultural Heritage Digital IP Character Design Innovation(Honglin Li, Jiankai Weng, 2025, Applied Mathematics and Nonlinear Sciences)
- Rodin: 3D Asset Creation with a Text/Image/3D-Conditioned Large Generative Model for Creative Frontier(Qixuan Zhang, Longwen Zhang, 2024, ACM SIGGRAPH 2024 Real-Time Live!)
- Few-shot multi-token DreamBooth with LoRa for style-consistent character generation(Rubén Pascual, Mikel Sesma-Sara, A. Jurio, D. Paternain, Mikel Galar, 2025, ArXiv)
- SophiaPop: Experiments in Human-AI Collaboration on Popular Music(David Hanson, Frankie Storm, Wenwei Huang, Vytas Krisciunas, Tiger Darrow, Au S Brown, Meng-Ying Lei, M. Aylett, Adam Pickrell, Sophia the Robot, 2020, ArXiv)
- A New Paradigm for Contemporary Film and Television Characters Design in the Context of Artificial Intelligence(Han Yan, Shuangyang Tan, Kaiqi Zhang, Danrui Wang, 2024, Advances in Education, Humanities and Social Science Research)
- Generative AI and the Expansion of Visual Narrative: Focusing on the “Figure KimChorok” Project(Ah-Yeong Kim, Jin-wan Park, 2025, Journal of Digital Contents Society)
- Illustrating Classic Brazilian Books using a Text-To-Image Diffusion Model(Felipe Rodrigues Perche Mahlow, André Felipe Zanella, William Alberto Cruz Castañeda, Regilene Aparecida Sarzi-Ribeiro, 2024, IEEE Latin America Transactions)
- The application of AIGC technology in animation character creation(Yuhong Shi, 2025, No journal)
- AI and animated character design: efficiency, creativity, interactivity(Manyun Tang, Yongcai Chen, 2024, The Frontiers of Society, Science and Technology)
- AIGC Technology: Reshaping the Future of the Animation Industry(Rui Gao, 2023, Highlights in Science, Engineering and Technology)
- Application of Generative AI in Animation Character Creation: A Study on Style Transfer Design*(Xue Li, Yuqi Xiong, 2025, 2025 7th International Conference on Robotics, Intelligent Control and Artificial Intelligence (RICAI))
- The Study on 3D Character Design Utilizing Digital Human Creators and Digital Software(Sun-Hee Hwang, 2024, The Korean Society of Beauty and Art)
- Research on Anime-Style Image Generation Based on Stable Diffusion(Ming-Hsuan Yang, 2025, ITM Web of Conferences)
- THE APPLICATION AND IMPACT OF AI-GENERATED CONTENT ON ANIMATION CREATION(Lingqiong Chen, R. Khynevych, 2025, Theory and Practice of Design)
- Automated Rawstory-to-Video Generation from Nasreddin Hodja Tales via an Expert-Inspired Multi-Stage Transformation Pipeline(Merve Onaran Atalay, M. N. Alpdemir, Ayşe Gül Özkan, Oğuzcan Karaman, Y. K. Akyüz, F. T. Bülbül, E. Açıkgöz, Eda Arısoy, 2026, 2026 5th International Informatics and Software Engineering Conference (IISEC))
- Designing Fictional Worlds for Play Through Large Language Models(Ada-Rhodes Short, 2024, Volume 3B: 50th Design Automation Conference (DAC))
- AI-Assisted Game Concept Art Generation: Techniques and Workflow Integration in Modern Game Development(Qifang Zhang, 2025, Journal of Intelligence and Knowledge Engineering)
- Innovative Digital Storytelling with AIGC: Exploration and Discussion of Recent Advances(Rongzhang Gu, Hui Li, Chang Su, Wenyan Wu, 2023, ArXiv)
- Creating an Anthropomorphic Folktale Animal: A Pilot Study on Character Design Creativity Derived From Autonomous Behavior Generation Powered by Reinforcement Learning(Hongju Yang, Seung Wan Hong, 2025, Computer Animation and Virtual Worlds)
- The Application of AIGC in 3D Graphic Design Courses(Jiuchong Wang, 2025, Academic Journal of Management and Social Sciences)
- The Application Path of AIGC in Assisting Game Character Design - A Case Study of the Black Bear Monster from Journey to the West(Xin Mi, Yang Cao, QiAo Li, 2023, 2023 4th International Conference on Intelligent Design(ICID))
- Enhanced Creativity and Ideation through Stable Video Synthesis(Elijah Miller, Thomas Dupont, Mingming Wang, 2024, ArXiv)
Brand Visual Identity, Fashion, and Cross-Sector Commercial IP Applications
These studies examine the practical application of AIGC in commercial settings, including brand logos, retail space design, the renewal of time-honored brands, and cosmetics and fashion IP. They emphasize using AI to achieve personalized customization, dynamic identity design, and scalable gains in commercial efficiency while maintaining brand consistency.
- A Study on Developing a Bespoke Stationery Brand Logo through Generative AI Design Platforms : Focusing on Adobe Firefly and ChatGPT(S. Lee, Hye Rahn Seo, 2025, Korea Institute of Design Research Society)
- Design Application through Generative AI-Based Brand Identity Learning - Focusing on the Design Styles of Hyundai and BMW(Eunseo You, Jinyoung Choi, Sangyun Go, Seongwon Jeong, 2025, Design Convergence Study)
- A study on the Applicability of AI-Generated Content in Facade Design for Cosmetic Stores(Yuxue Hou, Ji-Sung Song, 2024, korea soc pub des)
- Generative AI-Driven Retail Space Branding: An Exploratory Study on Perceived Brand Image in Flagship Store Design(Chae-yeon Lee, Y. Jeong, Nayeon Kim, Soo-Yong Ahn, Nayeon Kim, 2025, Journal of the Korean Institute of Interior Design)
- Generative AI for Concept Creation in Footwear Design(J. Suessmuth, Florian Fick, Stan Van Der Vossen, 2023, ACM SIGGRAPH 2023 Talks)
- Integrating Generative AI with Design Thinking Approach to Improve Efficiency and Relevancy of MSMEs Promotional Content(Deta Nur Indriani, Riesta Devi Kumalasari, 2025, 2025 IEEE International Conference on Artificial Intelligence for Learning and Optimization (ICoAILO))
- Research on Methods and Practices of AIGC-Empowered Intelligent Design for Brand Visual Identity(Yang Wen, Dizhen Li, Kun Chen, 2025, 2025 6th International Conference on Intelligent Design (ICID))
- A Research on the Dynamization Effect of Brand Visual Identity Design: Mediated by Digital Information Smart Media(Peijie Yuan, 2024, Journal of Information Systems Engineering and Management)
- Clothing image attribute editing based on generative adversarial network, with reference to an upper garment(Wei-Zhen Wang, Hong Xiao, Yuan Fang, 2024, International Journal of Clothing Science and Technology)
- Mixing and Matching Elements for Intelligent Fashion Design: A Generative Adversarial Network With Structure and Texture Disentanglement(Han Yan, Haijun Zhang, Jianyang Shi, Jianghong Ma, 2024, IEEE Transactions on Consumer Electronics)
- Generative AI Fashion Design with LoRA Fine-tuning - Reproducing and Extending the Design Identity of RE;CODE -(GyuYeon Kang, Sunhee Park, 2025, Korean Society of Fashion Design)
- Design of personalized creation model for cultural and creative products based on evolutionary adaptive network(Dailei Hu, Enshi Wang, Muddassira Arshad, 2025, PeerJ Comput. Sci.)
- Tailoring Multimodal AIGC to Time-Honored Brands: A Stable Diffusion-Based Framework for Visual Generation and Evaluation(Xinbao Zhang, Jinjian Li, Shizhen Zhang, Yuwei Chen, 2025, Innovative Applications of AI)
- Research on Building Brand Identity Design Using Generative AI - Focusing on Creative Beverage Brands -(WeiGuang Gao, Miri Kim, 2025, Journal of Cultural Product & Design)
- A Study on Generative AI Cosmetics Packaging Design Using Korean Typography - Focusing on Comparing Sulwhasoo Brand and Consumer Perception by Country -(WeiGuang Gao, Miri Kim, 2025, Journal of Cultural Product & Design)
- A Study on Designing the City Brand Character of Hangshan City Using AIGC(Yanran Qian, Euitay Jung, 2024, JOURNAL OF THE KOREAN SOCIETY DESIGN CULTURE)
Transformation of Design Education, Human-AI Collaboration Models, and Ethical Evaluation
This group examines the impact of AIGC from sociological and pedagogical perspectives. Research topics include the transformation of design pedagogy, the enhancement of student motivation, the shifting professional roles of designers, new collaborative paradigms of human-AI co-creation, and macro-level questions of originality, authorship attribution, and ethical boundaries.
- A Theoretical Framework for AIGC-Enabled Pedagogy in Digital Media Art and the Transformation of Digital Character Design Education(Jinlong Wu, 2025, Pacific International Journal)
- Reimagining Art and Design Education: An AI-Enhanced Interdisciplinary Project-Based Pedagogical Framework(Jin Zhang, Huajie Wang, T. Miao, Fengyu Yang, 2025, International Journal of Education and Social Development)
- AI-Augmented Branding: Exploring The Impact Of Generative Design Tools On Visual Identity Development In The Creative Industry(Dhewani Indrawati, Nadya Ahla Amanina, Lailatul Maulida, 2025, International Journal of Graphic Design)
- Enhancing College Students’ Achievement, Motivation, and Engagement in Film Character Design Through AI-Driven Smart Sketchpad: A Personalized Interactive Learning Model for Art Education(Siyuan Zeng, Norsafinar Rahim, Songni Xu, 2025, IEEE Access)
- A PICTURE IS WORTH A THOUSAND PROMPTS: TOPIC MODELING OF AI ART SUBREDDIT COMMUNITIES(Jing Han, Andrew Iliadis, 2025, AoIR Selected Papers of Internet Research)
- Human-AI Co-Creation: A Framework for Collaborative Design in Intelligent Systems(Zhangqi Liu, 2025, ArXiv)
- Towards a Diffractive Analysis of Prompt-Based Generative AI(Nina Rajcic, Maria Teresa Llano Rodriguez, Jon McCormack, 2024, Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems)
- Generative AI in the Applied Arts: Workflow Transformations, Evolving Professional Roles, and Emerging Skill Sets(Sonia Andreou, Wesley Reges Soares Pereira, Omiros Panayides, Eva Korae, Aekaterini Mavri, Andri Ioannou, 2025, Informatics in Education)
- Overview of the Shared Task on Multilingual Story Illustration: Bridging Cultures through AI Artistry (MUSIA)(Krishna Tewari, Anshita Malviya, Supriya Chanda, Arjun Mukherjee, Sukomal Pal, 2025, Proceedings of the 17th annual meeting of the Forum for Information Retrieval Evaluation)
- Study on Creativity Assessment Models for Generative AI-Assisted Brand Identity Design(Shin-Hee Park, 2025, Journal of Digital Contents Society)
- A Study on the Usability of Adobe Generative AI Tools - Focusing on Brand and Graphic Design Case Studies -(Seo Yoon Cha, 2024, Journal of Cultural Product & Design)
- Design and application of art creation education system based on generative adversarial network(Xiaoxiao Shi, Yang Yu, 2025, Discover Artificial Intelligence)
- AI Image Generation: Emerging Trends and Its Impact on UI/UX Design(Deepak Durgam, Naveen Anandhan, Rashmi Pathak, 2025, International Journal on Science and Technology)
- The Influence of Midjourney AI on Animation Character Design: A Case Study of Multimedia Design Students at The University of Jordan(Fuad Khasawneh, M. Al-Shboul, Ra'ed Al-Saaideh, 2025, Int. J. Interact. Mob. Technol.)
- Judging the creative prowess of AI(Tanmoy Chakraborty, Sarah Masud, 2023, Nature Machine Intelligence)
- A Study on the Application of AI-Generated Content Technology in Urban Culture-Based Mobile Game Design(Yujie Jiang, Euitay Jung, 2025, Journal of Korea Multimedia Society)
Taken together, the merged grouping covers the full chain of AIGC in IP design: from underlying generation-algorithm optimization and tool controllability evaluation (the technical layer), to the revitalization of traditional-culture IP and creative workflows in the content industries (the application layer), to cross-sector commercial brand practice (the business layer), and finally to industry evaluation, ethics, and educational change (the reflective layer). The research trend shows AIGC evolving from a simple asset-generation tool into a collaborative system deeply embedded in IP narrative and cultural transmission, while its reshaping of designers' professional identity has become a focal concern of the academic community.
A total of 114 related references.
Artificial Intelligence–Generated Content (AIGC) is rapidly reshaping creative industries and transforming the epistemology, processes, and pedagogies of digital media art. Despite widespread adoption of generative models in concept art, digital illustration, and character design, the theoretical foundations guiding their integration into higher education remain insufficiently developed. This theoretical–framework article examines how AIGC reconfigures creative cognition, studio pedagogy, and ethical discourse within digital character design education. Synthesizing interdisciplinary literature on human–AI co-creativity, constructivist learning, studio-based pedagogy, and AI ethics, the study proposes the TAE Framework as a holistic model for understanding the multidimensional influence of AIGC on curriculum design and creative learning. The framework conceptualizes AIGC as a cognitive collaborator, an artistic catalyst, and an ethical negotiation site. It provides implications for curriculum redesign, human–AI co-creative studio instruction, assessment reform, and responsible AI literacy. This work contributes a foundational theoretical lens for future empirical research and offers practical directions for educators seeking to cultivate creativity, criticality, and ethical discernment in an intelligent learning environment.
With the continuous development of artificial intelligence technology, AIGC (AI Generated Content) technology has been widely applied. However, there is still a lack of an effective AI-assisted design pathway in the field of game character design. This paper takes a perspective from game character design and is based on the cultural connotations and aesthetic features of the Black Bear Monster from "Journey to the West". It constructs an application pathway that introduces AIGC to promote innovative thinking activities and enhance the creative abilities of designers, thereby achieving AIGC-assisted game character design. The aim of this paper is to optimize the game character production process, provide insights for the creation of monster characters in "Journey to the West", and promote Chinese monster culture and the spirit of the journey.
This study promotes the innovative development of Chinese traditional culture by designing cultural characters based on the Tang Dynasty “Flying Apsaras” from Dunhuang murals using the generative artificial intelligence (AIGC) tool MidJourney v6.1. It addresses the challenges of monotonous dissemination methods and disconnection from younger generations by focusing on Generation Z, a key IP consumer group, to explore how AIGC can align with their modern aesthetic preferences and enhance cultural appeal. Research methods include case analysis to examine MidJourney's advantages, literature review to organize Dunhuang's visual elements, experimental design to create character images, and survey with SPSS data analysis to validate effectiveness. Results show that MidJourney effectively extracts Dunhuang mural elements, creating designs that blend traditional values with modern aesthetics while meeting Generation Z's preferences. However, cultural detail expression requires further optimization. The study highlights AIGC's potential to advance traditional culture preservation and dissemination through digital innovation.
AIGC, which stands for Artificial Intelligence Generated Content, is a new form of content creation following PGC and UGC. Its rise is due to breakthroughs in deep learning technology and the increasing demand for digital content. AIGC can create new forms of digital content generation in multiple fields such as dialogue, images, and videos based on models trained with large amounts of data. In art design courses, AIGC can expand creativity, improve efficiency, and achieve personalized design. Taking the Cinema4D teaching course as an example, AIGC can quickly generate renderings based on sketches during the modeling process, reducing the debugging phase; it can output initial drafts that match the design style in a short time during the rendering stage; and it can also rapidly generate multiple posters and character portrait materials that fit the brand's tone for advertising design. Integrating AIGC into Cinema4D teaching helps students enhance their design capabilities and adapt to the new trends in industry development.
To understand the application of AIGC technology in film production, a drawing-and-modeling method for 3D virtual animation design is proposed. The paper first addresses 3D virtual animation and modeling, covering video animation, virtual animation, and 3D modeling, with virtual technology and 3D design technology used to complete the drawings and models. Second, after the characters are partitioned, the artwork is completed in three stages: design, coloring, and action. Once an animated character is complete, its model in the 3D virtual animation design is produced through three steps: analyzing the model's characteristics, determining the model, and creating the 3D animated model. The 3D models are then imported into a VR-based virtual animation simulation platform for interaction and display with users. Finally, the experimental results show that the model produced by this method reaches a frame rate of up to 98.01 f/s, an improvement of roughly 5-10 f/s over the two baseline methods. This indicates that the resulting model improves image quality under character control and yields smoother transitions between animated characters.
No abstract available
This study investigates the application and innovation pathways of AIGC technology in folk flower-bird calligraphy. Starting from the perspective of preserving and inheriting this traditional art form, it conducts multidimensional kinetic, virtualized, and interactive visual design experiments on character illustrations using AIGC technology. By reviewing the evolution of AIGC and the current state of digitalization in folk flower-bird calligraphy, along with case studies, LoRA model training experiments were carried out on the Liblib AI platform. This facilitated the construction of a semantic database and a prompt word system for folk flower-bird calligraphy, enabling intelligent deconstruction and stylized generation of stroke elements. The results demonstrate that AIGC technology can effectively achieve visual reconstruction and creative extension of the “painting within characters” characteristic of folk flower-bird calligraphy, enhancing both creative efficiency and control precision, thereby promoting a shift from human-led creation to human–AI collaboration.
This paper focuses on the original animated short film Glimmer, conducting an in-depth exploration of its artistic expression characteristics and the application boundaries of AIGC (Artificial Intelligence Generated Content) technology. Presented in a sketch style, the film creates unique visual effects and emotional atmospheres through techniques such as lines and light-dark contrast, demonstrating the distinctive charm of the sketch style in animation creation. Meanwhile, the paper reflects on the application of AIGC technology in animation creation, pointing out its optimization potential in aspects such as character expressions, scene design, storyboard scripts, and creative sound effects. The study emphasizes that although AIGC technology brings opportunities, issues related to originality and dependence on the technology (by creators) also need attention. This paper aims to provide theoretical references and practical guidance for animation education and creation, promote the integrated development of animation art and technology, and drive innovation and progress in the field of animation creation.
This paper explores the transformative role of Artificial Intelligence-assisted Generative Content (AIGC) technology in the animation industry. AIGC's key components, including Generative Adversarial Networks (GANs), Natural Language Processing (NLP), Reinforcement Learning, Virtual Reality (VR), and Augmented Reality (AR) are elucidated. The technology is seen to catalyze innovations in character and scene design, storyline construction, and scriptwriting, enhancing creativity and efficiency. AIGC's application in music and sound effects production, special effects and editing workflows, as well as in rendering and collaboration stages is also discussed, showcasing the improved work efficiency and quality of final products achieved through AI-assisted tools. Through a case study of the pioneering AIGC-assisted animation, "The Dog and the Boy," we demonstrate the potential of AIGC in driving commercial animation. Despite its current limitations, the study concludes that AIGC technology is poised to reshape the animation industry, promising a future marked by enhanced creative expression, increased efficiency, and the successful integration of AI in traditional workflows.
This paper delves into the evolution of Artificial Intelligence Generated Content (AIGC) within contemporary digital art and scrutinizes its implications for artistic modalities and production. It delineates the genesis and progression of AIGC technology, offering an analysis of its extensive applications in cinematic character design. The paper acknowledges AIGC's latent capabilities in the domain of character design and highlights the significance of fusing AI technology with the humanities in artistic production. In conclusion, it delineates the prospective opportunities and risks of future AIGC applications in character design and offers a critical analysis of their implications for the creative industry, maintaining a reverent and contemplative stance toward traditional art while pursuing technological innovation.
Character design in games involves interdisciplinary collaborations, typically between designers who create the narrative content and illustrators who realize the design vision. However, traditional workflows face challenges in communication due to the differing backgrounds of illustrators and designers, the latter often having limited artistic abilities. To overcome these challenges, we created Sketchar, a Generative AI (GenAI) tool that allows designers to prototype game characters and generate images based on conceptual input, providing visual outcomes that give immediate feedback and enhance communication with illustrators in the next step of the design cycle. We conducted a mixed-method study to evaluate the interaction between game designers and Sketchar. We showed that the reference images generated in co-creation with Sketchar fostered refinement of design details and can be incorporated into real-world workflows. Moreover, designers without artistic backgrounds found the Sketchar workflow to be more expressive and worthwhile. This research demonstrates the potential of GenAI in enhancing interdisciplinary collaboration in the game industry, enabling designers to interact beyond their own limited expertise.
This study aimed to reveal the influence of teaching using the Midjourney artificial intelligence (AI)-based intelligence platform on developing the animation character design skills among students of the multimedia design course at the University of Jordan. The study used a quasi-experimental methodology with a sample of 60 male and female students of the Multimedia Design course from the Department of Visual Arts at the University of Jordan. The study participants were intentionally chosen and randomly divided into two groups: the experimental group, which consisted of 32 male and female students who utilized the educational application based on AI Midjourney, and the control group, which consisted of 28 male and female students who used the conventional method of studying. The findings indicated that there were statistically significant differences, at a significance level of (α = 0.05), in the average scores of the two groups regarding their post-performance in animation character design skills. The experimental group, which utilized the AI-based application Midjourney as a teaching method, demonstrated superior performance compared to the control group.
With the development of virtual reality (VR) technology and augmented reality (AR), the communication media and expression of culture have been further expanded. In this paper, VR and AR technologies are applied to the digital IP character design of Min cultural heritage, focusing on the image generation and action recognition processes in the digital IP character design process. The LAFITE model is used to generate the digital IP image of Min cultural heritage, and after the pose representation of the digital IP character, a multi-class support vector machine is used to construct the action recognition model. The model tests proved that the FID and IS produced by the LAFITE model are superior to those produced by other traditional models by 30% to 316% and 6% to 48%, respectively. The output images of the Min cultural heritage digital IP characters are also of better quality. The MSVM model exhibits a high recognition rate for various actions of the IP characters, with each index value exceeding 93%, thereby facilitating effective interaction and enrichment of digital IP characters. The image output and action recognition model proposed in the study can promote the innovative design of digital IP characters of Min culture and enhance the digital creative expression and interactive forms of Min culture.
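The classification half of this pipeline, a multi-class SVM over pose features, is compact enough to sketch. The snippet below is illustrative only: the paper's MSVM features, kernel settings, and action labels are not public, so synthetic vectors stand in for pose representations.

```python
# Minimal multi-class SVM sketch for action recognition over pose features.
# Illustrative only: synthetic data stands in for the paper's pose vectors.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 34))    # placeholder: 17 joints x (x, y) coordinates
y = rng.integers(0, 5, size=600)  # placeholder: 5 hypothetical action classes

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
scaler = StandardScaler().fit(X_tr)

# One-vs-rest RBF SVM, the standard way to extend binary SVMs to multi-class.
clf = SVC(kernel="rbf", C=10.0, decision_function_shape="ovr")
clf.fit(scaler.transform(X_tr), y_tr)
print("test accuracy:", clf.score(scaler.transform(X_te), y_te))
```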
The integration of AI in character design lacks systematic frameworks that translate traditional visual design principles into computational parameters, creating a conceptual gap between designers' artistic intent and AI tool configuration. This research identifies four parallel principles—structural decomposition, modular recombination, granular control, and hierarchical organization—shared between Hiroyoshi Tsukamoto's Manga Matrix and Low-Rank Adaptation (LoRA), developing the first translation framework that maps visual design decisions to specific LoRA parameters. We reveal how matrix-based organizational principles create actionable pathways for designer-AI collaboration, establishing that these systems share fundamental operational mechanisms despite their distinct domains. Through literature review and comparative analysis, we examine how Manga Matrix's visual grid elements correspond to LoRA's rank values and weight distributions, establishing practical implementation guidelines for each principle. The research establishes foundational methodology for human-AI co-creation, contributing to more practical and consistent AI-assisted design workflows.
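To make the paper's rank-as-granularity and alpha-as-weight mapping concrete, the following self-contained PyTorch sketch implements the standard LoRA formulation, a frozen base layer plus a trainable low-rank update scaled by alpha/r. It illustrates the general technique, not code from the paper.

```python
# Minimal LoRA linear layer (standard W*x + (alpha/r)*B*A*x formulation).
# Rank r bounds how fine-grained the learned adaptation can be; alpha
# weights the update, mirroring the matrix-cell granularity in the paper.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # freeze pretrained weights
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(768, 768), r=4, alpha=8.0)
n = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(n, "trainable parameters")  # far fewer than the 768*768 base weights
```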
Popular in fantasy films, games, and extended reality, anthropomorphic animals often rely on animator creativity and real-animal observation for behavior visualization. This artistic approach captures emotional traits but fails to uncover diverse, unanticipated behaviors beyond creators' concepts. To enrich character design, this study employs reinforcement learning (RL) agent simulation to explore the autonomous behavior and unexpected responses of the nine-tailed Fox Sister from Korean folklore. As a method, the agent, with a physics-based controller and skeletal joints, uses hybrid action control to transition between bipedal and quadrupedal actions based on the environment. As a result, the RL character frequently exhibits behavioral shifts, including unexpected actions in response to training steps and terrain complexities like slopes and hurdles, distinguishing it from animation-based finite-state machines. Additionally, this study validates the impact of the RL character on character design creativity. To investigate these unknown impacts, the study conducts a comparative pilot study that recruits five character designers under use and non-use scenarios of the RL character. Analysis indicates that the RL character promotes creativity in character design, conceptualization, and the development of scenarios and character attributes. This study highlights RL's potential for visualizing diverse, inspirational behaviors of folkloric creatures by simulating interactions between body structure, motion, and environment.
As technologies such as artificial intelligence rapidly evolve, the traditional lecture-based class can no longer meet students’ learning needs. Previous research has demonstrated that AI can be essential for improving the learning experience. However, research has also pointed out that promoting AI applications in education has been challenging. Developing domain-specific applications based on AI technology is a massive challenge for the development of AI in education. Further research has pointed out that the development of AI applications in education rarely considers students’ perceptual learning motivations. This study proposes a teaching method based on an artificial intelligence tool, the Artificial Intelligence Smart Sketchpad (AISS), to assist in teaching character art design in a multimedia environment. The artificial intelligence will identify students’ drawing sketches and make corrections. This study explores the impact of AISS on achievement, engagement, and motivation among film character art design students in a multimedia environment. To this end, we adopted a quantitative research method. Over five weeks, we assessed 60 university students. Research results indicate that the AI smart sketchpad (AISS) significantly improves students’ motivation, engagement, and performance. Student feedback suggests that AI stimulates creative thinking by offering greater creative freedom and possibilities and quickly transforming sketches into high-quality design drawings. A new direction will be explored for personalized and self-learning for students.
No abstract available
No abstract available
No abstract available
No abstract available
This study investigates the potential and strategic applicability of generative AI platforms namely Adobe Firefly and ChatGPT in the development of logo designs for bespoke stationery brands. It compares traditional manual design methods with AI-driven approaches and analyses the design philosophies and visual strategies of leading Japanese bespoke stationery brands. ChatGPT was used to generate brand names and extract emotionally resonant keywords, while Adobe Firefly produced visual logo drafts. This process allowed for an examination of the operational mechanisms and outcomes of AI-based design workflows. The findings indicate that generative AI enhances design efficiency and visual diversity, and aligns well with minimalist, emotion-oriented branding strategies. However, the study also underscores the continued necessity of human designers’ intuitive judgment and strategic involvement to accurately convey brand identity. By addressing both the strengths and limitations of generative AI in design, the research provides practical insights into future human-AI collaborative branding strategies and contributes to the broader discourse on identity development in the era of automation.
Drawing on literature research, case studies, and practical verification, the article discusses brand visual identity and the design principles of motion graphics, and outlines trends in dynamic brand visual identity design enabled by digital information and AI. It explains AI generative models such as GANs and diffusion models that generate graphics and effects; examples like Stable Diffusion and Midjourney show AI's potential for diverse, abstract visuals in motion graphics, and AI could also enable interactive effects in combination with AR/VR. Overall, AI can empower dynamic, personalized graphic design and branding. Key points are that dynamic design brings interactivity and better conveys brand meaning; brand visual design is diversifying, with the core brand image and dynamic performance reinforcing each other; and AI can boost efficiency, innovation, and meaning in dynamic design. Though 2D branding remains mainstream and relevant, the article highlights the future potential of AI in motion graphics and visual storytelling, as it can generate new interpretations and experiences.
No abstract available
No abstract available
No abstract available
No abstract available
This study explores how generative artificial intelligence (AI) is reshaping the way visual designers and educators work and think. With tools like Midjourney, Adobe Firefly, and DALL·E becoming more widely used, creative processes are evolving—raising new questions about who creates, what counts as original, and how creative labor is valued. Through a qualitative approach, this research draws on interviews and document analysis involving professional designers and design educators in Indonesia. The findings reveal a complex landscape: generative AI opens up new space for experimentation, speeds up workflows, and helps spark ideas. At the same time, it brings challenges—ethical concerns, growing reliance on automation, and the risk of losing touch with foundational design skills. In the classroom, AI is both a disruption and an opportunity, pushing educators to rethink traditional studio methods while also offering fresh ways to build curricula. The study emphasizes the importance of fostering critical awareness in using AI, developing ethical guidelines for creative work, and adapting educational strategies to keep pace with technological change. By shedding light on how generative AI is influencing visual design, this research contributes to broader conversations about creativity, authorship, and innovation in today’s digital era.
Generative AI technologies are increasingly being incorporated into creativity support tools. However, most generative AI tools rely on text-based prompting, requiring users to translate visual ideas into linguistic descriptions. This approach is misaligned with the sketch-driven workflows of creative professionals. To address this gap, we introduce SketchAI, a novel sketch-first interface for diffusion models that allows practitioners to use real-time sketching on a tablet computer to guide model outputs. Through a qualitative study with 29 fashion design apprentices, we explored the interface’s potential impacts on creative workflows. While some participants identified use cases where SketchAI streamlined routine tasks, others expressed concerns about its potential to undermine creative agency and exploration. These findings unearthed hidden complexity: while generative AI can support some aspects of creativity, its core capabilities may challenge the central identity of creative practitioners. While SketchAI does not resolve this problem, it does take a meaningful step towards reconciliation.
Generative Artificial Intelligence (Gen AI) is rapidly reshaping the landscape of creative practice in the applied arts. While these tools accelerate ideation and support iterative prototyping, they also challenge traditional notions of authorship, authenticity and professional identity. This qualitative study explores how applied arts professionals integrate Gen AI into their workflows, what challenges they face, and what new skills and literacies they see as essential. Through purposive sampling, ten professionals, including designers, art directors, and filmmakers from diverse cultural contexts, were interviewed using semi-structured interviews. Thematic analysis identified two central themes: AI-driven workflow transformations and shifts in professional identity. Participants described Gen AI as a co-creator that enhances early conceptual work but also raised concerns around creative homogenization and ethical use of training data. These findings reinforce broader discussions in the literature about the dual role of AI as both a catalyst for innovation and a force that challenges creative diversity and cultural representation. The study highlights the need for a balanced approach to AI literacy in creative fields, one that integrates technical fluency with critical and ethical awareness. These insights provide a foundation for more nuanced, culturally sensitive, and ethically grounded approaches to AI adoption in the applied arts.
Generative AI has enabled new modes of expression through human-AI collaboration. The "Figure Kim Chorok" project is an artistic attempt to visualize the possibilities of human-AI co-creation in expressing the social identity of an individual in modern society through a hybrid character of a baby and a figure. By casting AI as an active partner in collaboration rather than a mere tool, the project experiments with AI suggestions and human interpretation in artistic creation, including planning, narrative, and space.
We present AI Archive, a footwear design tool using generative artificial intelligence (AI) that we successfully integrated into the design process at adidas. AI Archive is based on diffusion models and was trained on the entire archive of adidas sneakers, which dates back to the company's beginning in the 1950s. Being trained on this unique dataset enables the AI to generate new and innovative sneaker designs that draw inspiration from the archive and pay homage to the rich history of the adidas brand. AI Archive has been rolled out to our designers as a web application in 2022. The tool has since established itself as an essential ingredient in the concept-to-prototype process of many of our designers. The proposed system gives users a high level of control over the design process, enabling them to precisely guide the AI to create designs according to their direction. We believe that the use of generative AI in footwear design has the potential to transform the industry, as it allows designers to explore hundreds of different concepts in almost no time.
Batik, as one of Indonesia's most valuable cultural heritages, continues to face challenges in terms of innovation and market competitiveness, particularly in the digital era where consumer preferences are rapidly shifting toward unique and personalized designs. Wringinanom batik, with its strong cultural identity, requires new approaches to preserve its traditional essence while simultaneously adapting to modern design demands. The research problem lies in how to integrate advanced digital technology specifically Artificial Intelligence (AI) generative design into batik pattern development, while ensuring that this innovation contributes not only to cultural sustainability but also to creative industry growth through technopreneurship. This study aims to explore the potential of AI-driven generative design to create innovative variations of Wringinanom batik patterns that can strengthen cultural branding. The research employs the design thinking method, which allows a user-centered, iterative, and problem-solving approach through five stages. The output of this research is a set of newly generated Wringinanom batik patterns produced through AI models, ready to be applied in fashion and creative products. These patterns not only enrich the visual vocabulary of batik but also serve as a strategic medium to foster technopreneurship, enabling local artisans and entrepreneurs to innovate, differentiate their products, and compete more effectively in the global creative industry.
The rapid advancement of AI-generated content (AIGC) has introduced transformative capabilities to the field of brand visual identity design. This paper proposes a comprehensive intelligent design framework integrating multimodal cultural symbol mining, controllable content generation, and dynamic optimization feedback. By employing techniques such as knowledge graph-guided LoRA fine-tuning, ControlNet conditional control, and a low-threshold workflow system based on ComfyUI, the proposed method significantly enhances the cultural relevance, semantic consistency, and stylistic controllability of automated brand visual generation. A case study involving the production of a university program promotional video highlights AIGC's potential for efficient, scalable, and adaptable brand identity design, while suggesting future directions for multimodal model optimization, human-machine collaboration, and ethical standardization.
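As a rough illustration of the ControlNet-plus-LoRA combination this framework describes, the sketch below uses Hugging Face diffusers; the model IDs, LoRA file, and prompt are placeholders, and the paper's knowledge-graph mining and ComfyUI workflow are not reproduced.

```python
# Hedged sketch: structure-conditioned brand-visual generation with
# ControlNet + a style LoRA in diffusers. All model IDs and file paths
# are placeholders, not artifacts from the paper.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16).to("cuda")
pipe.load_lora_weights("brand_style_lora.safetensors")  # hypothetical adapter

edge_map = load_image("logo_outline.png")  # structural condition (edge map)
image = pipe(
    "flat vector brand mascot, clean background, corporate color palette",
    image=edge_map,
    num_inference_steps=30,
    guidance_scale=7.5,
).images[0]
image.save("brand_draft.png")
```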
Amid information saturation and aesthetic pluralism, artistic design services grapple with inefficient manual workflows and imbalanced creative diversity-semantic fidelity. To address these and advance information system integration in design, this study proposes a two-stage multi-task generative AI framework for artistic design, integrating latent space remapping, hierarchical cross-modal attention distillation, and dynamic resource scheduling. Evaluated on a 30,000-sample dataset, the framework outperforms baselines: 45% lower FID than GAN-based models, 15% higher CLIP-Score for text-image alignment, over 4.3/5 professional designer satisfaction, and 1.2 iterations/second inference on a single 3080Ti GPU. It resolves existing generative AI flaws and advances human-AI collaboration in design services, laying technical groundwork for workflow innovation, design education support, and brand development.
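The CLIP-Score alignment check reported above can be reproduced in spirit with torchmetrics; the snippet below is a generic sketch with random placeholder images, not the paper's evaluation code.

```python
# Sketch of a CLIP-Score text-image alignment check via torchmetrics.
# Random uint8 tensors stand in for generated design images.
import torch
from torchmetrics.multimodal.clip_score import CLIPScore

metric = CLIPScore(model_name_or_path="openai/clip-vit-base-patch16")
images = torch.randint(0, 255, (4, 3, 224, 224), dtype=torch.uint8)
prompts = ["minimalist poster of a mountain at dawn"] * 4  # placeholder prompts

print(metric(images, prompts))  # higher score = better text-image agreement
```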
Micro, Small, and Medium Enterprises (MSMEs) are a strategic sector in the economy that requires continuous innovation in digital marketing strategies. As the need for attractive, professional, and easily accessible promotional content increases, generative artificial intelligence (generative AI) technology is a potential solution to support the efficiency and quality of visual design. This study explores how design thinking and generative AI can be used to create attractive and professional visual content. This study uses a design thinking method with seven stages including understand, observe, define point of view, ideate, prototype, test, and reflect. Each step uses specific tools such as persona profile canvas, feedback capture grids, and retrospective boards, as well as generative AI tools used in prototypes such as Canva AI, Prome AI, Leonardo AI and others to create faster and more optimal designs. The results of the study show that the use of AI in promotional materials significantly improves the efficiency and branding of MSMEs, enabling the production of high-quality content at a lower cost. However, challenges remain, related to the limited scope of the study which only focuses on one sector, namely Muslim fashion MSMEs and the lack of AI's ability to understand the nuances of brand identity in depth. This study contributes to increasing the understanding of the synergy between design thinking and generative AI, as it can provide practical guidance for MSMEs in optimizing digital marketing, and suggests future research that focuses more on assisting MSMEs directly in implementing AI in their businesses.
With the increasing demand for the protection and dissemination of intangible cultural heritage (ICH), the digitization and innovative redesign of traditional patterns have become important directions for cultural inheritance. This paper proposes a digital art design and intelligent re-creation method for ICH patterns based on the Diffusion Model (DM). First, a database containing 4320 high-resolution ICH patterns is constructed, and semantic correspondence is achieved through feature annotation. Second, an improved Latent Diffusion Model (LDM) is adopted, introducing an attention-guided module and a cultural semantic embedding layer to enhance the model's understanding of the semantic and stylistic features of the patterns. Then, through style transfer and feature fusion algorithms, the diverse generation and innovative re-creation of ICH pattern styles are achieved. Finally, the effectiveness is evaluated using subjective aesthetic evaluation and the Structural Similarity Index (SSIM). The generated patterns achieve a structural fidelity of 0.956, a 12% improvement in style consistency, and an average user aesthetic score of 1.4. This study not only verified the feasibility and innovation of the diffusion model in the digital art design of intangible cultural heritage, but also provided a technical path with practical value for the intelligent transmission of cultural heritage.
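The SSIM fidelity check described here is straightforward to run with scikit-image; in the sketch below the filenames are hypothetical stand-ins for a source pattern and a generated re-creation.

```python
# Sketch of the SSIM structural-fidelity check. Filenames are placeholders.
from skimage.io import imread
from skimage.metrics import structural_similarity as ssim

original = imread("ich_pattern_source.png")      # hypothetical reference
generated = imread("ich_pattern_generated.png")  # hypothetical model output

# channel_axis=-1 treats the last axis as color; SSIM is 1.0 for identical images.
score = ssim(original, generated, channel_axis=-1)
print(f"SSIM structural fidelity: {score:.3f}")
```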
Traditional kite creation often relies on the hand-painting of experienced artisans, which limits the revitalization and innovation of this intangible cultural heritage. This study proposes using an AI-based diffusion model to learn kite design and generate new kite patterns, thereby promoting the revitalization and innovation of kite-making craftsmanship. Specifically, to address the lack of training data, this study collected ancient kite drawings and physical kites to create a Traditional Kite Style Patterns Dataset. The study then introduces a novel loss function that incorporates auspicious themes in style and motif composition, and fine-tunes the diffusion model using the newly created dataset. The trained model can produce batches of kite designs based on input text descriptions, incorporating specified auspicious themes, style patterns, and varied motif compositions, all of which are easily modifiable. Experiments demonstrate that the proposed AI-generated kite design can replace traditional hand-painted creation. This approach highlights a new application of AI technology in kite creation. Additionally, the method can be applied to other areas of cultural heritage preservation, offering a new technical pathway for the revitalization and innovation of intangible cultural heritage, and opening new directions for future research on the integration of AI and cultural heritage.
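The paper's auspicious-theme loss is not published, but its general shape, the standard diffusion noise-prediction objective plus a weighted auxiliary term, can be sketched as follows. The `theme_penalty` callable is a hypothetical stand-in for the authors' term; the scheduler and U-Net calls follow common diffusers conventions.

```python
# Hedged sketch of one diffusion fine-tuning step with an auxiliary theme term.
# The epsilon-prediction MSE is the standard objective; `theme_penalty` is a
# placeholder for the paper's (unpublished) auspicious-theme loss.
import torch
import torch.nn.functional as F

def training_step(unet, scheduler, latents, text_emb, theme_penalty, lam=0.1):
    noise = torch.randn_like(latents)
    t = torch.randint(0, scheduler.config.num_train_timesteps,
                      (latents.shape[0],), device=latents.device)
    noisy = scheduler.add_noise(latents, noise, t)            # forward process
    pred = unet(noisy, t, encoder_hidden_states=text_emb).sample
    loss = F.mse_loss(pred, noise)                            # standard term
    return loss + lam * theme_penalty(pred, noise)            # hypothetical term
```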
Sketches serve as fundamental blueprints in artistic creation because sketch editing is easier and more intuitive than pixel-level RGB image editing for painting artists, yet sketch generation remains unexplored despite advancements in generative models. We propose a novel framework CoProSketch, providing prominent controllability and details for sketch generation with diffusion models. A straightforward method is fine-tuning a pretrained image generation diffusion model with binarized sketch images. However, we find that the diffusion models fail to generate clear binary images, making the produced sketches chaotic. We thus propose to represent the sketches by unsigned distance field (UDF), which is continuous and can be easily decoded to sketches through a lightweight network. With CoProSketch, users can generate sketches progressively from rough to detailed, and make timely edits if unsatisfied. Additionally, we curate a large-scale text-sketch paired dataset as the training data. Experiments demonstrate superior semantic consistency and controllability over baselines, offering a solution for integrating user edits into generative workflows.
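The UDF idea, representing a binary sketch by each pixel's distance to the nearest stroke so the field stays continuous, can be illustrated with SciPy's Euclidean distance transform; simple thresholding stands in here for the paper's learned lightweight decoder.

```python
# Sketch of the unsigned-distance-field (UDF) representation: encode a binary
# sketch as per-pixel distance to the nearest stroke, then recover the sketch.
# The paper decodes with a learned network; thresholding is a stand-in.
import numpy as np
from scipy.ndimage import distance_transform_edt

sketch = np.zeros((64, 64), dtype=bool)
sketch[32, 10:54] = True                 # toy stroke: one horizontal line

# distance_transform_edt gives each pixel its distance to the nearest zero,
# so invert the mask: stroke pixels become zeros, background gets distances.
udf = distance_transform_edt(~sketch)    # continuous, diffusion-friendly field

recovered = udf < 0.5                    # trivial "decoder": threshold at zero
assert (recovered == sketch).all()
print("UDF range:", udf.min(), udf.max())
```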
This paper investigates the style transfer of Chinese Traditional Patterns based on diffusion models. Chinese Traditional Patterns boast a rich history and unique artistic style, yet their application and dissemination face challenges. We collected and organized images of Chinese Traditional Patterns, constructing a high-quality Chinese Traditional Dermatoglyphic Pattern Dataset (CTDPD), and proposed the Wide Diffusion Model (WDM) with a Wide AutoEncoder to provide comprehensive receptive-field information. The experimental results demonstrate that the Wide Diffusion Model achieves outstanding performance on the CTDPD dataset: compared to other cutting-edge methods, our WDM achieves the current SOTA performance with a Fréchet Inception Distance of 5.45 and an Inception Score of 207.5. This work thus offers significant support for the inheritance and creation of Chinese Traditional Patterns.
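Fréchet Inception Distance, as reported above, can be computed with torchmetrics along these lines; the random tensors below merely stand in for batches of real and generated patterns:

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

fid = FrechetInceptionDistance(feature=2048)  # standard Inception-v3 features

# uint8 images in NCHW layout, values 0-255; replace with real data loaders
real = torch.randint(0, 256, (16, 3, 299, 299), dtype=torch.uint8)
fake = torch.randint(0, 256, (16, 3, 299, 299), dtype=torch.uint8)

fid.update(real, real=True)
fid.update(fake, real=False)
print(float(fid.compute()))  # lower is better; the paper reports 5.45
```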
This study explores the application of the Stable Diffusion (SD) model in generating Haipai New Year paintings and investigates approaches and methods for constructing an intelligent creative platform. By leveraging digital intelligence technology, this research aims to promote the innovative development of Haipai New Year paintings. The study employs model fine-tuning and technological innovation by integrating the Low-Rank Adaptation (LoRA) and ControlNet models into SD, optimizing the quality and style of generated images to ensure consistency with the Haipai New Year painting style. Based on this model, an interactive and participatory intelligent painting creation platform is designed and developed. The key innovation of this research lies in the integration of the Stable Diffusion model with LoRA and ControlNet, introducing a novel approach to model fine-tuning through parameter blending. The significance of this study lies in its breakthrough in the digital revitalization of Haipai New Year paintings by leveraging advanced artificial intelligence technologies. The findings highlight the potential of AI in enhancing the visibility and sustainability of traditional cultural heritage. Using Haipai New Year paintings as a case study, this research offers practical insights for future interdisciplinary studies in cultural preservation.
Batik, a significant element of Indonesian cultural heritage, is renowned for its intricate patterns and profound philosophical meanings. While preserving traditional batik is crucial, the creation of modern patterns is equally encouraged to keep the art form vibrant and evolving. Current research primarily focuses on batik classification, leaving a gap in the exploration of generative models for batik pattern creation. This paper investigates the application of text-to-image (T2I) generative models to synthesize batik motifs, leveraging latent diffusion models (LDM), Low-Rank Adaptation (LoRA), and classifier-free guidance. Our methodology employed a dataset of 20,000 batik images. Multimodal models such as LLaVA and BLIP were utilized to generate detailed captions for these images. A pretrained LDM was subsequently fine-tuned on its denoising U-Net, either by naively fine-tuning all of its layers or by employing LoRA. The fine-tuning process was critical in enhancing the model's capability to generate high-quality and user-specific batik patterns. The results demonstrated that the LDM fine-tuned on the entire denoising U-Net with LLaVA-captioned images outperformed other models, achieving the lowest Fréchet Inception Distance (FID) and highest Inception Score (IS). The thoroughness of LLaVA captions proved superior to those generated by BLIP, emphasizing the significance of detailed image descriptions in generative tasks. Notably, the model not only replicated existing batik patterns but also innovatively combined multiple motifs and was even able to create entirely new designs, as verified by a batik expert. This research contributes to the field of computer-assisted batik pattern generation, providing significant advantages for batik artists, manufacturers, and users by accelerating the pattern creation process and expanding the possibilities of batik art.
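The LoRA route mentioned above can be sketched as follows, assuming a recent Hugging Face diffusers + peft stack; the rank, learning rate, and checkpoint ID are illustrative, and the data loading and training loop are elided:

```python
import torch
from diffusers import StableDiffusionPipeline
from peft import LoraConfig

# Load a pretrained LDM; the checkpoint is a stand-in for whichever base
# model the study fine-tuned.
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# Freeze the base U-Net, then inject low-rank adapters into its attention
# projections only; far fewer parameters are trained than in naive
# full-U-Net fine-tuning.
pipe.unet.requires_grad_(False)
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],
)
pipe.unet.add_adapter(lora_config)

trainable = [p for p in pipe.unet.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)
# ...the standard denoising loss over (batik image, caption) pairs goes here...
```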
In the field of artificial intelligence, large generative models like Stable Diffusion have made strides in image generation. However, they struggle to accurately generate images of cultural relics with specific historical features. This study uses Low-Rank Adaptation (LoRA) fine-tuning to optimize the Stable Diffusion model for this purpose. We collected and organized cultural relic images and descriptions to customize the model, allowing effective fine-tuning while maintaining its stability. Experimental results show that the fine-tuned model accurately generates images with historical characteristics, aligning with historical data and expert evaluations. We also explore potential applications in cultural creation and artifact restoration, aiming to inspire interdisciplinary innovations and collaborations in AI applications for the cultural sector.
Image rendering from line drawings is vital in design, and image generation technologies reduce its costs, yet professional line drawings demand the preservation of complex details. Text prompts struggle with accuracy, and image translation struggles with consistency and fine-grained control. We present LineArt, a framework that transfers complex appearance onto detailed design drawings, facilitating design and artistic creation. It generates high-fidelity appearance while preserving structural accuracy by simulating hierarchical visual cognition and integrating human artistic experience to guide the diffusion process. LineArt overcomes the limitations of current methods in terms of difficulty in fine-grained control and style degradation in design drawings. It requires no precise 3D modeling, physical property specifications, or network training, making it more convenient for design tasks. LineArt consists of two stages: a multi-frequency lines fusion module to supplement the input design drawing with detailed structural information and a two-part painting process for Base Layer Shaping and Surface Layer Coloring. We also present a new design drawing dataset, ProLines, for evaluation. The experiments show that LineArt performs better in accuracy, realism, and material precision compared to SOTAs. Project page: https://meaoxixi.github.io/LineArt/.
In recent years, Generative Artificial Intelligence (GenAI) has undergone a profound transformation in addressing intricate tasks involving diverse modalities such as textual, auditory, visual, and pictorial generation. Within this spectrum, text-to-image (TTI) models have emerged as a formidable approach to generating varied and aesthetically appealing compositions, spanning applications from artistic creation to realistic facial synthesis, and demonstrating significant advancements in computer vision, image processing, and multimodal tasks. The advent of Latent Diffusion Models (LDMs) signifies a paradigm shift in the domain of AI capabilities. This article delves into the feasibility of employing the Stable Diffusion LDM to illustrate literary works. For this exploration, seven classic Brazilian books have been selected as case studies. The objective is to ascertain the practicality of this endeavor and to evaluate the potential of Stable Diffusion in producing illustrations that augment and enrich the reader's experience. We will outline the beneficial aspects, such as the capacity to generate distinctive and contextually pertinent images, as well as the drawbacks, including any shortcomings in faithfully capturing the essence of intricate literary depictions. Through this study, we aim to provide a comprehensive assessment of the viability and efficacy of utilizing AI-generated illustrations in literary contexts, elucidating both the prospects and challenges encountered in this pioneering application of technology.
As an intangible cultural heritage of China, Cloisonne embodies rich historical and cultural significance alongside unique artistic value. However, rapid modernization has led to a growing disconnection in traditional crafts, posing challenges for both preservation and innovation. This study explores innovative visual image generation methods using generative AI technologies—specifically Stable Diffusion and Midjourney—to design and create cultural product images inspired by Cloisonne. Employing literature review, design practice and expert evaluation, the research covers generative AI tools, concepts of intangible cultural heritage and cultural goods, and new approaches to digitalization and design practice. Experimentally, a Cloisonne knowledge graph was constructed as a theoretical basis, followed by building and training a LoRA model. Stable Diffusion was then used to generate visual images of Cloisonne cultural products, which were further refined with Midjourney. Results demonstrate that these AI tools effectively produce creative and artistic cultural product visuals, revitalizing Cloisonne with fresh vitality and innovation, while advancing the digital preservation and modernization of intangible cultural heritage.
The remarkable advancements in artificial intelligence (AI)-driven image generation technologies have brought about a profound transformation across various industries, particularly in new media, video production, and gaming. AI-generated content (AIGC) has emerged as a game-changing, cost-efficient solution for companies seeking high-quality visual assets while operating within constrained budgets and having limited access to traditional human resources. Through the use of sophisticated algorithms, AIGC enables the creation of stunning visuals without relying on conventional, labor-intensive workflows. Among the most prominent techniques, diffusion models have played a pivotal role in the development of AI image generation tools, giving rise to both proprietary platforms like Midjourney and open-source alternatives such as Stable Diffusion. These technologies continue to evolve, benefiting from the collaborative contributions of global programming communities. This study focuses on advancing the capabilities of Stable Diffusion, an open-source AI image generation model, to address prevalent challenges in style consistency and image quality. By integrating Python and harnessing cutting-edge AI techniques, such as DreamBooth and embedding methods, the research aims to enhance the model's ability to replicate and embed distinct artistic styles. Specifically, the study targets the unique art style of the popular mobile game "Arknights" as a training objective, applying advanced techniques to refine the system's output. The proposed approach demonstrates significant improvements over the baseline model, showcasing enhanced performance in generating style-consistent anime imagery. This research contributes to the evolving landscape of AI-driven art generation, offering novel insights into the application of diffusion-based technologies within creative industries. By utilizing DreamBooth and embedding for style transfer and injection, the study achieves notable efficiency, drastically reducing the time required to train a new model. Ultimately, this work paves the way for more specialized and customizable AI systems in art creation, pushing the boundaries of what AI can achieve in the realm of creative expression.
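As a minimal illustration of the embedding side of this approach, a learned style embedding can be loaded into Stable Diffusion via diffusers; the weight file and trigger token below are placeholders, not assets from the study:

```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# Attach a textual-inversion embedding and bind it to a trigger token;
# "arknights-style.bin" and "<arknights-style>" are hypothetical names.
pipe.load_textual_inversion("arknights-style.bin", token="<arknights-style>")

image = pipe(
    "portrait of an operator, <arknights-style>",
    num_inference_steps=30,
).images[0]
image.save("styled_portrait.png")
```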
In the realm of personalized cultural and creative product design, the capacity for nuanced semantic expression and refined style modulation in image content exerts a pivotal influence on user experience and perceived creative value. Addressing the limitations of current generative models—particularly in maintaining stylistic coherence and accommodating individualized preferences—this article introduces a novel image synthesis framework grounded in a synergistic mechanism that integrates text-driven guidance, adaptive style modulation, and evolutionary optimization: the Evolutionary Adaptive Generative Aesthetic Network (EAGAN). Anchored in the Stable Diffusion architecture, the model incorporates a semantic text encoder and a style transfer module, augmented by the Adaptive Instance Normalization (AdaIN) mechanism, to realize image style transfer and enable precise manipulation of stylistic attributes. Concurrently, it embeds an evolutionary optimization component that iteratively refines prompt phrases, stylistic parameters, and latent noise vectors through a genetic algorithm, thereby enhancing the system's responsiveness to dynamic user tastes. Empirical evaluations on benchmark datasets demonstrate that EAGAN surpasses prevailing approaches across a suite of metrics—including Fréchet inception distance (FID), CLIPScore, and Learned Perceptual Image Patch Similarity (LPIPS)—notably excelling in the harmonious alignment of semantic fidelity and stylistic expression. Ablation studies further underscore the critical contributions of the style control and evolutionary optimization modules to overall performance gains. This work delineates a robust and adaptable technological trajectory with substantial practical promise for the intelligent, personalized generation of cultural and creative content, thus fostering the digital and individualized evolution of the creative industries.
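AdaIN, named above as the style-modulation mechanism, has a compact standard formulation (the textbook operation, not EAGAN's exact module): align the channel-wise mean and standard deviation of content features to those of style features.

```python
import torch

def adain(content: torch.Tensor, style: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Adaptive Instance Normalization over (N, C, H, W) feature maps:
    normalize the content statistics per channel, then re-scale and
    re-shift them with the style's per-channel statistics."""
    c_mean = content.mean(dim=(2, 3), keepdim=True)
    c_std = content.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style.mean(dim=(2, 3), keepdim=True)
    s_std = style.std(dim=(2, 3), keepdim=True) + eps
    return s_std * (content - c_mean) / c_std + s_mean
```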
"Rodin" leverages a Latent Diffusion Transformer and a 3D ConditionNet to facilitate the rapid generation of 3D assets from text, images, and direct inputs. This system streamlines the creation process, producing production-ready models optimized for real-time engines. This "Real-Time Live!" highlights Rodin’s innovative framework, its seamless integration into professional workflows, and its transformative impact on the 3D content creation industry.
Stable Diffusion and ControlNet have achieved excellent results in the field of image generation and synthesis. However, due to the granularity and method of their control, the efficiency improvement is limited for professional artistic creations such as comics and animation production, whose main work is secondary painting. In the current workflow, fixing characters and image styles often requires lengthy text prompts, and even further training through Textual Inversion, DreamBooth, or other methods, which is very complicated and expensive for painters. Therefore, we present a new method in this paper, Stable Diffusion Reference Only, an image-to-image self-supervised model that uses only two types of conditional images for precise control generation to accelerate secondary painting. The first type of conditional image serves as an image prompt, supplying the necessary conceptual and color information for generation. The second type is the blueprint image, which controls the visual structure of the generated image. It is natively embedded into the original UNet, eliminating the need for ControlNet. We released all the code for the module and pipeline and trained a controllable character line art coloring model at https://github.com/aihao2000/stable-diffusion-reference-only, which achieved state-of-the-art results in this field. This verifies the effectiveness of the structure and greatly improves the production efficiency of animations, comics, and fanworks.
With the rapid advancement of digital technology and mobile augmented reality (AR) or virtual reality (VR), the digital integration of traditional craft motifs into contemporary art has emerged as a significant research focus. Traditional craft motifs embody profound cultural heritage and artistic value, and their distinctive visual characteristics offer substantial potential for digital artistic expression. The immersive and interactive affordances of mobile AR/VR technologies provide a novel paradigm for artistic creation. The effective integration of traditional craft motifs into mobile AR/VR interactive art is a critical pathway for both cultural heritage preservation and the advancement of contemporary artistic expression. However, existing style transfer algorithms and techniques remain constrained by limitations in content feature preservation, stylistic fidelity, and expressive capacity in interactive art contexts. To address these challenges, a novel diffusion model-based style transfer algorithm tailored for mobile AR/VR interactive art was proposed, enabling the effective extraction and transfer of visual features from traditional craft motifs. This approach emphasizes the preservation of cultural and artistic integrity throughout the style transfer process. Furthermore, an inversion-based feature condition acquisition method was introduced, alongside a two-stage inversion strategy designed to retain essential content features, thereby overcoming prevalent issues such as content loss and insufficient style transfer effect. These innovations not only significantly enhance both the visual quality and expressive power of traditional motifs within mobile AR/VR environments but also contribute to the convergence of digital art and cultural preservation, offering new pathways for inspiration and technique in contemporary interactive art creation.
As artificial intelligence (AI) continues to evolve from a back-end computational tool into an interactive, generative collaborator, its integration into early-stage design processes demands a rethinking of traditional workflows in human-centered design. This paper explores the emergent paradigm of human-AI co-creation, where AI is not merely used for automation or efficiency gains, but actively participates in ideation, visual conceptualization, and decision-making. Specifically, we investigate the use of large language models (LLMs) like GPT-4 and multimodal diffusion models such as Stable Diffusion as creative agents that engage designers in iterative cycles of proposal, critique, and revision. Our study is grounded in a mixed-methods experimental setup involving 24 professional and novice designers from diverse backgrounds. Each participant completed two design tasks: one using a conventional digital toolset (Adobe XD, Figma, Sketch), and another with access to AI-assisted tools that provided both text-based concept ideation and image generation support. We captured all interaction data, output artifacts, and post-task interviews to understand how AI affects cognitive load, ideation fluency, and perceived creativity. The AI models were prompted using open-ended and task-specific queries, and designers could iterate on or reject outputs at will. The findings reveal several key patterns. First, AI significantly reduces the time spent in the “blank slate” phase of ideation, providing a scaffold of initial concepts that users can build upon or remix. Second, the outputs generated by AI often diverge from conventional aesthetics or functional patterns, serving as “creative dissonance” that pushes designers toward new conceptual territories. Third, participants reported a stronger sense of cognitive partnership with AI when systems provided rationale for their suggestions, suggesting that explainability is critical for trust and effective collaboration. We introduce a co-design framework that includes three levels of AI involvement: passive assistance (suggestive prompts), interactive co-creation (real-time response and refinement), and proactive collaboration (AI initiating alternative design pathways). Furthermore, we discuss the ethical and cognitive implications of relying on AI for generative input, including issues related to bias, originality, and designer agency. Our work contributes both to design theory and practical system development, providing guidelines for building next-generation design platforms that are AI-native and human-centered. In conclusion, the integration of generative AI into the design process has the potential to augment not just efficiency but also originality, inclusion, and resilience of design outputs. However, successful implementation requires a redefinition of authorship, transparency in AI behavior, and mechanisms for human oversight and reflection. This paper sets a foundation for future work in human-AI design partnerships and proposes concrete methodologies for evaluating and scaling such systems across design disciplines.
With the rapid development of generative AI technology, animation character creation is transitioning from traditional manual design to a “human-machine collaboration” model, in which style transfer, an essential application scenario, enables efficient conversion of diverse animation styles. This study focuses on the application of generative AI in animation character style transfer. First, it clarifies the principles of the core technologies (GANs, Diffusion Models) and defines the compositional elements of animation character styles as well as style transfer design objectives. Then, it elaborates on specific application paths along three dimensions: data preparation, model selection (CycleGAN, Stable Diffusion), and process implementation. Practical cases validate the technology's feasibility and creative efficiency gains. Finally, optimization strategies (e.g., model fine-tuning for precise control, human-machine collaborative copyright compliance) are proposed. This study provides a valuable reference for advancing the animation industry toward efficiency and personalization.
With the rapid development of Artificial Intelligence Generated Content (AIGC) technology, the field of digital media art has ushered in unprecedented opportunities for innovation. This study focuses on exploring the integration of AIGC into digital media art creation and the corresponding innovations in communication modes. First, it analyzes the current status of digital media art creation and communication, identifying the limitations of traditional models in terms of efficiency, personalization, and audience interaction. Then, a novel AIGC-integrated digital media art creation framework is proposed, which combines generative adversarial networks (GANs), diffusion models, and transformer-based language models to enhance the diversity and creativity of artworks. For the communication mode, a multi-channel interactive communication model is designed, leveraging social media platforms, virtual reality (VR) exhibition halls, and blockchain-based digital asset management systems to improve the reach and influence of digital media art. To verify the effectiveness of the proposed framework and model, experiments are conducted from two aspects: creation efficiency and communication effect. In the creation efficiency experiment, 50 professional digital media artists and 50 non-professional participants are invited to create artworks using the traditional method and the AIGC-integrated method respectively. The results show that the AIGC-integrated method reduces the creation time by an average of 62.3% and improves the diversity of artworks by 45.8% compared with the traditional method. In the communication effect experiment, two groups of artworks (one created with the AIGC-integrated method and the other with the traditional method) are promoted through different communication channels. The data indicate that the multi-channel interactive communication model increases the audience engagement rate by 78.5% and the artwork sharing rate by 63.2% compared with the single-channel communication model.
Chinese Opera characters ink painting, a distinctive blend of Chinese color ink painting and traditional opera, reflects the rich aesthetic heritage of Chinese culture. The advent of Artificial Intelligence Generated Content (AIGC) technology presents new opportunities for preserving and innovating this traditional art form. While style transfer techniques have been widely applied to Western art, the freehand style of Chinese ink painting remains under-explored. This paper fills this gap by constructing datasets of Chinese Opera character paintings through field visits and web crawling. It develops an automated system for transforming realistic opera character images into Chinese opera character paintings by leveraging generative adversarial network (GAN) technology. The generated results show that the GAN-based model is able to learn the key features of a style image and to distinguish the relationships between figures in the image, producing results better than an ordinary person with no training could draw. This research advances AI's application in traditional art and provides a new perspective on the preservation, dissemination, and modern reinterpretation of Chinese Opera characters ink painting.
AnimHost is an open-source, artist-focused application for real-time character animation driven by AI-generated motion data. It enables intuitive control of movement trajectories within familiar DCC (Digital Content Creation) tools such as Blender, supporting rapid iteration and seamless integration into standard production pipelines. Built on the TRACER ecosystem, AnimHost decouples animation generation from host applications and introduces a node-based compute graph for preprocessing, inference, and retargeting. By allowing artists to train and deploy custom motion models, it maintains creative authorship and transparency over training data. AnimHost bridges the gap between research and production, offering a scalable and extensible path for integrating generative AI into real-world animation workflows.
Technology development has brought us to a time when generative AI, such as Stable Diffusion in LeonardoAI, can make the work of a character designer easier. Generative AI can accelerate the character design process, especially at the empathy stage, by studying the visual preferences of the target audience and exploring alternative visual styles. In a case study of designing an animated series as content to accompany meals for people aged 25-30 who like to watch movies while eating, the authors try to shorten the character design process using LeonardoAI. The authors seek to understand the visual preferences of the target audience using a quantitative method: a questionnaire containing questions about the images generated by LeonardoAI. The questionnaire results became the basis for the character design ideas for the animated series ‘Genset Gang’, using keywords to develop the characters' visuals. The authors found that using LeonardoAI can speed up the character design process because it can produce many images quickly, although its use has challenges, because LeonardoAI relies heavily on the character designer's ability to translate ideas into prompt text and on how well the instructions are written.
Thangka is a unique intangible cultural heritage of Tibet, with a long history and numerous schools, characterized by distinctive techniques for depicting figures and expressing color. Style transfer is a popular research direction in computer vision, which can effectively extract the style from one image and incorporate it into another. Currently, most research on Thangka style transfer focuses on extracting Thangka styles and transferring them to non-Thangka images, while neglecting style transfer between different Thangka images. Style transfer between Thangka images presents two major challenges: first, the rich use of color and complex style in Thangka; second, directly applying style transfer does not adequately represent the original Thangka style in the generated image and can distort the content structure. Therefore, this paper proposes a method that combines segmentation and style transfer, enabling style transfer between corresponding regions of Thangka images. This approach contributes to the digital inheritance and preservation of Thangka art.
A diverse team of engineers, artists, and algorithms collaborated to create songs for SophiaPop via various neural networks, robotics technologies, and artistic tools, and animated the results on Sophia the Robot, a robotic celebrity and animated character. Sophia is a platform for arts, research, and other uses. To advance the art and technology of Sophia, we combine various AI with a fictional narrative of her burgeoning career as a popstar: her actual AI-generated pop lyrics, music, and paintings, and animated conversations wherein she interacts with humans in real time in narratives that discuss her experiences. To compose the music, the SophiaPop team built corpora from human- and AI-generated Sophia character personality content, along with pop music song forms, to train and provide seeds for a number of AI algorithms, including expert models and custom-trained transformer neural networks, which then generated original pop-song lyrics and melodies. Our musicians, including Frankie Storm, Adam Pickrell, and Tiger Darrow, then performed interpretations of the AI-generated musical content, including singing and instrumentation. The human-performed singing data was then processed by a neural-network-based Sophia voice, custom-trained from human performances by Cereproc. This AI then generated the unique Sophia voice singing the songs. Then we animated Sophia to sing the songs in music videos, using a variety of animation generators and human-generated animations. Being algorithms and humans working together, SophiaPop represents a human-AI collaboration, aspiring toward human-AI symbiosis. We believe that such a creative convergence of multiple disciplines, with humans and AI working together, can make AI relevant to human culture in new and exciting ways, and lead to a hopeful vision for the future of human-AI relations.
Digital storytelling, as an art form, has struggled with the cost-quality balance. The emergence of AI-generated content (AIGC) is considered a potential solution for efficient digital storytelling production. However, the specific form, effects, and impacts of this fusion remain unclear, leaving the boundaries of AIGC combined with storytelling undefined. This work explores the current state of integration of AIGC and digital storytelling, investigates the artistic value of their fusion in a sample project, and addresses common issues through interviews. Through our study, we conclude that AIGC, while proficient in image creation, voiceover production, and music composition, falls short of replacing humans due to the irreplaceable elements of human creativity and aesthetic sensibilities at present, especially in complex character animations, facial expressions, and sound effects. The research objective is to increase public awareness of the current state, limitations, and challenges arising from combining AIGC and digital storytelling.
This study aims to utilize AIGC(AI Generated Content) technology, specifically employing ChatGPT-4 and Midjourney 6.0 as the primary tools, to design and generate visual images for the Chinese mythology collection “Classic of Mountains and Seas.” In the theoretical background section, the principles of AIGC and its service architecture are elaborated upon, with a detailed account of the development trajectories of ChatGPT and Midjourney. Subsequently, an analysis of the structural framework of “Classic of Mountains and Seas” and the basic characteristics of its mythological creatures provides the theoretical foundation for the experimental design. In the experimental design section, specific experimental steps are formulated by analyzing the advantages of ChatGPT and Midjourney. The processes of prompt input, image generation, and result evaluation are meticulously described. The experimental results indicate that ChatGPT-4 and Midjourney 6.0 can generate highly realistic and artistically rich visual images, effectively reproducing the classic imagery of the “Classic of Mountains and Seas.” However, challenges remain in handling complex mythological texts and intricate details, necessitating designer assistance to enhance accuracy. This study demonstrates the potential of AIGC in mythological art creation, paving a new path for the contemporary expression of traditional culture.
Artificial intelligence-generated content (AIGC) has demonstrated its potential to provide efficient and creative design solutions through user-friendly interfaces. This capability has garnered significant attention in the design field. In recent years, cosmetic retail stores have needed differentiated facade designs to effectively convey their brand identity. Accordingly, this study aims to explore the integration potential of AIGC technology in cosmetic retail facade design. The research methodology involved a review of AIGC technology, image-generation platforms, and cosmetic retail facade design. Facade images for three Korean cosmetic retail brands (Skinfood, Innisfree, and Missha) were generated and analyzed using the Midjourney and Stable Diffusion Web UI platforms. These platforms effectively expressed both structural and decorative elements to convey brand concepts. Specifically, they generated creative images that faithfully reflected brand concepts in elements such as signboards, shop windows, and lighting, while logos were presented as substitutes due to copyright considerations. This study concludes that AIGC technology can provide innovative and competitive solutions for cosmetic retail facade design. Furthermore, AIGC is expected to offer creative and innovative application strategies in commercial space design.
In this study, Artificial Intelligence Generated Content (AIGC) is applied to the dress patterns of the Pumi people, a local ethnic minority of China, as a carrier of local culture. We extract the Lonicera pattern characteristic of Pumi dress and combine it with AIGC technology to generate IP images with ethnic characteristics. Using the Stable Diffusion model, a LoRA (Low-Rank Adaptation) model for the Pumi dress pattern is trained on a dataset of complete Lonicera patterns, and an IP image series based on the Chinese Pumi ethnic group is innovatively designed. This ultimately achieves the purpose of integrating traditional culture and technology, publicizing the local characteristics of the dress, and passing down the ethnic group's excellent traditional dress patterns. This research provides reference value for the cultural integration of the Pumi ethnic minority and promotes the digital transformation and innovative development of ethnic culture.
To address the dual requirements of cultural expression and engineering implementation in the visual design of time-honored brands, this study proposes an adaptive optimization architecture based on Stable Diffusion. The framework employs Textual Inversion to derive composable cultural tokens and utilizes LoRA/DreamBooth parameters for the efficient fine-tuning of both generic and proprietary styles. By integrating ControlNet and IP-Adapter, the system achieves a fusion of layout and style priors, while a dual-channel gating mechanism enables collaborative control over semantics and composition. During inference, reliability in prompt adherence is calibrated through CFG-Rescale, attention reweighting, and temperature scaling. Extensive experiments on publicly available multimodal datasets and real-world brand scenarios demonstrate a significant improvement in the alignment between objective metrics and human evaluations. The method's stability and necessity are confirmed through robustness tests and component ablation studies, while A/B testing reveals its distinct advantages in cost-effectiveness and operational efficiency. This research ultimately provides a replicable and verifiable technical solution for the visual generation needs of both cultural heritage and commercial brands.
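A hedged sketch of the layout-plus-style fusion this abstract describes, assuming recent diffusers versions in which the ControlNet pipeline accepts IP-Adapter weights and a guidance_rescale argument; model IDs and images are placeholders, and the paper's dual-channel gating and full calibration steps are not reproduced:

```python
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# ControlNet supplies the layout prior (here, Canny edges).
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny")
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet
)

# IP-Adapter supplies the style prior from a reference image.
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models",
                     weight_name="ip-adapter_sd15.bin")
pipe.set_ip_adapter_scale(0.6)

image = pipe(
    "time-honored brand poster, traditional motifs",
    image=load_image("layout_edges.png"),          # edge map of the layout
    ip_adapter_image=load_image("brand_style.png"),  # style reference
    guidance_scale=7.5,
    guidance_rescale=0.7,  # CFG-Rescale-style calibration of prompt adherence
).images[0]
image.save("brand_visual.png")
```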
Animation style generation technology based on artificial intelligence is gaining increasing attention in today's society, and it has shown a wide range of application prospects in creative design, game development, and advertising. Based on the Animefull-follow pre-training dataset, the effect of using Stable Diffusion technology combined with a LoRA model to generate a single anime character image was discussed. The experimental results show that the generated image is highly consistent with the original image in terms of style and detail, successfully capturing the unique characteristics of animation art. Although the Fréchet Inception Distance (FID) value of the generated images is 77.29, indicating a certain degree of visual difference, these differences usually do not significantly affect the user's perception in the animation field, and the generated images still show excellent visual effects. Combined with the advantages of the LoRA model, the resources and time required for training are significantly reduced, enabling high-quality image generation even in resource-constrained environments.
Blue-and-white porcelain, as a representative of traditional Chinese craftsmanship, embodies rich cultural genes and possesses significant research value. Against the backdrop of the generative AI era, this study aims to optimize the creative processes of blue-and-white porcelain to enhance the efficiency and accuracy of complex artistic innovations. Traditional methods of crafting blue-and-white porcelain encounter challenges in accurately and efficiently constructing intricate patterns. This research employs grounded theory in conjunction with the KANO-AHP hybrid model to classify and quantify the core esthetic features of blue-and-white porcelain, thereby establishing a multidimensional esthetic feature library of its patterns. Subsequently, leveraging the Stable Diffusion platform and utilizing Low-Rank Adaptation (LoRA) technology, a generative artificial intelligence (AIGC)-assisted workflow was proposed, capable of accurately restoring and innovating blue-and-white porcelain patterns. This workflow enhances the efficiency and precision of pattern innovation while maintaining consistency with the original artistic style. Finally, by integrating principles of sustainable design, this study explores new pathways for digital innovation in blue-and-white porcelain design, offering viable solutions for the contemporary reinvention of traditional crafts. The results indicate that AIGC technology effectively facilitates the integration of traditional and modern design approaches. It not only empowers the inheritance and continuation of the cultural genes of blue-and-white porcelain but also introduces new ideas and possibilities for the sustainable development of traditional craftsmanship.
The rapid development of technology today assists many fields of work, and batik is one field that can benefit. This research uses deep learning to process image data of batik patterns, including the patterns typical of Bakaran batik, with a generative model, Stable Diffusion, which aims to produce better and more detailed batik pattern images while preserving the original patterns. The dataset consists only of batik pattern images and typical Bakaran batik patterns. The image data is first augmented by inverting the image, resizing it to 512x512, randomly rotating it, randomly flipping it horizontally, and inverting it again. Pre-training on the image data identifies the right parameters and conditions for the training process. The results show that Stable Diffusion versions 1.4 and 2.1 both perform well in processing and creating batik pattern images, including Bakaran-style patterns. Images generated by the two versions were scored with the Inception Score and CLIP Score; version 1.4 obtained higher results than version 2.1 on both metrics because the images it produces are more abstract. Version 1.4 was therefore chosen, as its abstract output better reflects the characteristics of Bakaran batik, and it showed excellent performance in processing batik pattern images and patterns typical of Bakaran.
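A hedged reconstruction of the augmentation pipeline described above, using torchvision; the rotation range is not stated in the abstract and is assumed here, and the file name is a placeholder:

```python
from PIL import Image
from torchvision import transforms
from torchvision.transforms import functional as F

# Mirrors the described sequence: invert, resize to 512x512, random
# rotation, random horizontal flip, then invert back.
augment = transforms.Compose([
    transforms.Lambda(F.invert),            # first inversion
    transforms.Resize((512, 512)),
    transforms.RandomRotation(degrees=45),  # assumed rotation range
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.Lambda(F.invert),            # invert again
])

batik = Image.open("bakaran_batik.png").convert("RGB")
augmented = augment(batik)
augmented.save("bakaran_batik_aug.png")
```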
The implementation of neural networks in the creative design process enables original and innovative results and increased efficiency in creating a visual art product; it is therefore important to explore how various interactive tools can contribute to the development of the creative abilities of future design professionals. The purpose of this study was to investigate the capabilities and characteristics of the Midjourney, Stable Diffusion, and DALL-E neural networks in the context of their use in teaching design students. The study used analytical, comparative, generalisation, and systematisation methods. The study found that the neural networks Midjourney, Stable Diffusion, and DALL-E have prospects for implementation in the educational process for students of design specialities. The authors reveal the significant potential of artificial intelligence, specifically neural networks, in design: for creating fonts, typographic elements, posters, banners, graphics, and illustrations. By comparing the capabilities of the Midjourney, Stable Diffusion, and DALL-E neural networks, it was found that each of them has a specific purpose and architecture that is effective for performing different design tasks. The findings demonstrate the potential of neural networks to improve the education of students of design-related specialities. It was substantiated that the introduction of suitable methods and techniques can help expand the creative spectrum, ensure stability and control in generating images, and lead to a more effective implementation of ideas in visual realities. The results of this study can be useful as tools for developing educational approaches in the field of design and introducing modern technologies into the educational process.
This research aims to overcome the limitations of the Stable Diffusion model in creating conceptual works of art, covering problem identification, research objectives, methodology, and results. Even though Stable Diffusion has been recognized as a leading model, especially in the context of creating conceptual artwork, there is still a need to simplify the process of creating concept art and to find the most suitable generative model. This research used three methods: the Latent Diffusion Model, DreamBooth fine-tuning, and Stable Diffusion with Textual Inversion. The results show that the DreamBooth model produces a more real and realistic painting style, while Textual Inversion tends toward a fantasy and cartoonish style. Although the effectiveness of both is relatively high, with minimal differences, the DreamBooth model proved more effective based on the consistency of FID, PSNR, and visual perception scores. The DreamBooth model is more efficient in training time, even though it requires more memory, while the inference times of the two are relatively similar. This research makes a significant contribution to the development of artificial intelligence in the creative industries, opens up opportunities to improve the use of generative models in creating conceptual works of art, and can potentially drive positive change in the use of artificial intelligence in the creative industries more broadly.
This study focuses on utilizing Stable Diffusion technology to assist hand figurine design, exploring innovative methods through image generation AI to create unique Buddhist statue hand figurines. The research centers on the hand figurine of the Longxing Temple Buddha from the Cheongju Museum, explaining the principles of the technology and the characteristics of visual creation through literature and case studies. It analyzes the concept of hand figurines and explores the aesthetics of Buddhist statues and the SWOT analysis of the hand figurine market, identifying design challenges. Based on similar research and user analysis, the study derives user requirements and transforms the aesthetic characteristics of the Buddhist statue into a LoRA model training set, verifying its learning effectiveness. Subsequently, it generates images using prompts and LoRA hierarchical control, evaluating their usability. The results show that this hand figurine design method contributes to modernizing aesthetic characteristics and integrating trendy elements, enhancing efficiency and effectiveness, and is expected to offer a new direction for hand figurine design development.
Miao embroidery holds significant cultural, economic, and aesthetic value. However, its transmission faces numerous challenges: a high learning threshold, a lack of interest among younger generations, and low production efficiency. These factors have created obstacles to its sustainable development. In the age of artificial intelligence (AI), generative AI is expected to improve the efficiency of pattern innovation and the adaptability of the embroidery industry. Therefore, this study proposes a Miao embroidery pattern generation and application method based on Stable Diffusion and low-rank adaptation (LoRA) fine-tuning. The process includes image preprocessing, data labeling, model training, pattern generation, and embroidery production. Combining objective indicators with subjective expert review, supplemented by feedback from local artisans, we systematically evaluated five representative Miao embroidery styles, focusing on generation quality and their social and business impact. The results demonstrate that the proposed model outperforms the original diffusion model in terms of pattern quality and style consistency, with optimal results obtained under a LoRA scale of 0.8–1.2 and diffusion steps of 20–40. Generated patterns were parameterized and successfully implemented in digital embroidery. This method uses AI technology to lower the skill threshold for embroidery training. Combined with digital embroidery machines, it reduces production costs, significantly improving productivity and increasing the income of embroiderers. This promotes broader participation in embroidery practice and supports the sustainable inheritance of Miao embroidery. It also provides a replicable technical path for the intelligent generation and sustainable design of intangible cultural heritage (ICH).
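Generation under the reported sweet spot (LoRA scale 0.8-1.2, 20-40 diffusion steps) might look like the following diffusers sketch; the adapter path and prompt are hypothetical, not the study's assets:

```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe.load_lora_weights("miao-embroidery-lora")  # hypothetical fine-tuned adapter

image = pipe(
    "miao embroidery pattern, butterfly motif, symmetrical composition",
    num_inference_steps=30,                 # within the reported 20-40 range
    cross_attention_kwargs={"scale": 1.0},  # LoRA scale within 0.8-1.2
).images[0]
image.save("generated_pattern.png")
```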
We introduce SketcherX, a novel robotic system for personalized portrait drawing through interactive human-robot engagement. Unlike traditional robotic art systems that rely on analog printing techniques, SketcherX captures and processes facial images to produce vectorized drawings in a distinctive, human-like artistic style. The system comprises two 6-axis robotic arms: a face robot, equipped with a head-mounted camera and a Large Language Model (LLM) for real-time interaction, and a drawing robot, utilizing a fine-tuned Stable Diffusion model, ControlNet, and Vision-Language models for dynamic, stylized drawing. Our contributions include the development of a custom Vector Low-Rank Adaptation (LoRA) model, enabling seamless adaptation to various artistic styles, and a pair-wise fine-tuning approach to enhance stroke quality and stylistic accuracy. Experimental results demonstrate the system's ability to produce high-quality, personalized portraits within two minutes, highlighting its potential as a new paradigm in robotic creativity. This work advances the field of robotic art by positioning robots as active participants in the creative process, paving the way for future explorations in interactive, human-robot artistic collaboration.
The growing demand for efficient and scalable solutions in creative industries has spurred significant advancements in artificial intelligence, particularly in the design and generation of IP imagery. Traditional methods for IP imagery design rely heavily on manual efforts, making the process time-consuming, resource-intensive, and difficult to scale. Recent developments in deep learning, such as generative adversarial networks (GANs) and diffusion models, have enabled automation and innovation in artistic workflows. However, these methods often fail to address the unique requirements of IP imagery, such as stylistic consistency, adaptability across diverse platforms, and alignment with branding objectives. This study proposes a novel deep learning framework for the automatic generation and style transfer of IP imagery. The framework integrates a transformer-based generation model with a hierarchical style transfer mechanism, enabling the production of high-quality, stylistically consistent outputs across various artistic domains. Hierarchical latent space encoding preserves global structural coherence and local stylistic detail, while adaptive instance normalization (AdaIN) facilitates seamless blending of content and style. Extensive experiments demonstrate the framework's superiority over baseline models, achieving lower Fréchet Inception Distance (FID), improved perceptual similarity, and enhanced style consistency. A user study further validates its practicality, highlighting its ability to meet diverse artistic and branding requirements. These results underscore the framework's potential in automating creative workflows and advancing IP design.
This paper discusses the use of AI assistance during the development of game concept art and how it affects the modern game development industry. Tools such as Stable Diffusion, DALL-E, and Midjourney allow studios to establish a concept, keep the look consistent, and design it piece by piece. Case studies of current game studios show that they have been able to reduce concept iteration times by 40-60% while maintaining artistry. We propose a real-world setup that layers AI on top of a pipeline already in use: prompts need to be shaped, quality standards enforced, and artists coached. Our findings indicate that AI can increase productivity and exploration, but human direction must remain in place for coherent and moving game art; from this research, game developers can learn how to make full use of AI technology while keeping creativity up.
This paper explores the innovative application of Stable Video Diffusion (SVD), a diffusion model that revolutionizes the creation of dynamic video content from static images. As digital media and design industries accelerate, SVD emerges as a powerful generative tool that enhances productivity and introduces novel creative possibilities. The paper examines the technical underpinnings of diffusion models, their practical effectiveness, and potential future developments, particularly in the context of video generation. SVD operates on a probabilistic framework, employing a gradual denoising process to transform random noise into coherent video frames. It addresses the challenges of visual consistency, natural movement, and stylistic reflection in generated videos, showcasing high generalization capabilities. The integration of SVD in design tasks promises enhanced creativity, rapid prototyping, and significant time and cost efficiencies. It is particularly impactful in areas requiring frame-to-frame consistency, natural motion capture, and creative diversity, such as animation, visual effects, advertising, and educational content creation. The paper concludes that SVD is a catalyst for design innovation, offering a wide array of applications and a promising avenue for future research and development in the field of digital media and design.
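For the image-to-video step this abstract surveys, the openly released SVD checkpoint can be driven through diffusers roughly as follows; the input image is a placeholder, and a CUDA GPU is assumed:

```python
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt", torch_dtype=torch.float16
).to("cuda")

# SVD conditions on a single still image and denoises a short clip from it.
image = load_image("static_design.png").resize((1024, 576))
frames = pipe(image, num_frames=25, decode_chunk_size=8).frames[0]
export_to_video(frames, "animated_design.mp4", fps=7)
```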
With the development of deep learning, image synthesis has achieved unprecedented results in the past few years. Image synthesis models, represented by diffusion models, have demonstrated stable and high-fidelity image generation. However, the traditional diffusion model computes in pixel space, which is memory- and compute-heavy. Therefore, to ease the expensive computation and improve the accessibility of diffusion models, we train the diffusion model in latent space. In this paper, we are devoted to creating novel paintings from existing paintings based on powerful diffusion models. Because the cross-attention layer is adopted in the latent diffusion model, we can create novel paintings with conditional text prompts. However, directly training the diffusion model on a limited dataset is non-trivial. Therefore, inspired by transfer learning, we train the diffusion model from pre-trained weights, which eases the training process and enhances the image synthesis results. Additionally, we introduce the GPT-2 model to expand text prompts for detailed image generation. To validate the performance of our model, we train it on paintings by a specific artist from the WikiArt dataset. To make up for the missing image context descriptions of the WikiArt dataset, we adopt a pre-trained language model to generate corresponding context descriptions automatically and manually clean incorrect descriptions; we will make this data available to the public. Experimental results demonstrate the capacity and effectiveness of the model.
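The GPT-2 prompt-expansion step described above can be approximated with the transformers text-generation pipeline; a sketch of the idea, not the paper's exact setup:

```python
from transformers import pipeline

# GPT-2 continues a terse seed prompt into a more detailed description,
# which is then fed to the latent diffusion model's text encoder.
expander = pipeline("text-generation", model="gpt2")

seed_prompt = "an oil painting of a harbor at dusk"
expanded = expander(seed_prompt, max_new_tokens=40, num_return_sequences=1)
detailed_prompt = expanded[0]["generated_text"]
print(detailed_prompt)
```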
This study investigates the paradigm shift in interdisciplinary art and design education driven by artificial intelligence (AI) advancements, proposing an "AI-Interdisciplinary-Project" pedagogical model validated through the empirical case of Guyi Garden cultural product design. By integrating generative AI technologies (e.g., Stable Diffusion), a STEAM-E (Science, Technology, Engineering, Arts, Mathematics, and Ethics) knowledge framework, and agile project-based learning, we establish a dynamic pedagogical cycle comprising three innovation layers: AI-accelerated cultural symbol extraction (40% efficiency gain), human-AI co-creation workflows (reduced design iteration cycle to 5.3 days/prototype), and ethically constrained social validation (82.4 user satisfaction score). Empirical results demonstrate significant educational outcomes, with a 27% enhancement in students' interdisciplinary collaboration competence and the development of 8 commercially viable product prototypes, while effectively bridging traditional architectural motifs with contemporary design paradigms. The research further articulates a "techno-humanistic equilibrium" framework supported by an open-source toolchain ecosystem, providing replicable strategies for AI-era design education innovation. Its applicability extends to rural intangible cultural heritage revitalization and inclusive product development, catalyzing synergistic evolution among educational, industrial, and cultural ecosystems through techno-cultural hybridization.
This paper presents a minimalist hybrid workflow that couples digital sculpting with generative image synthesis for character look development. A manually sculpted bust in ZBrush serves as the authoritative base; camera-matched renders are processed with Stable Diffusion (image-to-image and inpainting) to propose stylistic variations and local ornament while preserving volumetric intent. Treating the model as an instrument beside craft accelerates exploration without displacing authorship. Qualitative studies show that prompt discipline and conservative denoise ranges yield coherent, review-ready outcomes; approved decisions are round-tripped back into geometry so they persist beyond 2D imagery. The approach demonstrates a pragmatic, reproducible pathway for co-creative sculpt workflows leveraging latent diffusion and established digital sculpt practice.
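The render-to-variation step with a conservative denoise range can be sketched in diffusers as image-to-image with low strength; the paths, prompt, and strength value here are illustrative:

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from diffusers.utils import load_image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

render = load_image("zbrush_bust_render.png").resize((512, 512))
variation = pipe(
    "weathered bronze bust, ornamental engraving",
    image=render,
    strength=0.35,       # low denoise keeps the sculpted volumes intact
    guidance_scale=7.0,
).images[0]
variation.save("lookdev_variation.png")
```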
With the rapid development of artificial intelligence, particularly AI-Generated Content (AIGC) technologies, the digital innovation of intangible cultural heritage has entered a new stage of opportunity. Zigong lantern art, as a representative form of traditional Chinese lantern craftsmanship, now faces an urgent need for transformation in terms of design methods, dissemination channels, and modes of artistic expression. This study takes AIGC as a starting point to explore its application pathways and innovative potential in the design of Zigong lanterns. The paper first reviews the traditional craftsmanship and cultural characteristics of Zigong lanterns, and, in conjunction with current developments in AIGC and image generation mechanisms, proposes a collaborative design model of "AI generation - human selection - craft transformation." In the design practice, the theme “Camel Shadows Along the Silk Road” is used to generate visual design concepts via Midjourney and Stable Diffusion, and the transition from AI-generated images to physical lantern installations is tested. Findings indicate that AIGC can effectively enhance design efficiency and the quality of creative output, offering significant advantages in visual ideation, form expression, and thematic exploration. However, challenges such as cultural misrepresentation and distorted image details persist. The paper concludes by offering strategic recommendations for the application of AIGC in lantern art, aiming to provide a reference for the integrated development of traditional crafts and intelligent design.
The integration of AI into the design industry is transforming image generation, opening novel creative opportunities alongside new difficulties for designers. This research analyses the effects of emerging AI image generation trends on user interface (UI) and user experience (UX) design and the growing role of the designer in an AI-enhanced creative workflow. The development of generative models behind text-to-image programs such as DALL·E and Stable Diffusion has led to sophisticated and personalized virtual imagery. These technologies also allow for automated UI element creation, content flexibility, and advanced design mockup construction, thereby improving the effectiveness of the design process. The application of AI in design comes with issues, including the violation of design integrity, ethical matters regarding bias and plagiarism, and dependence on automation. In this paper, we argue for the coexistence of human imagination and AI-generated content, on the premise that designers can use AI as an assistant, not as a replacement. Using a combination of literature review and case studies, we investigate the integration of AI-generated imagery with the purpose of improving user experience (UX), accessibility, and personalization of user engagement.
Recent developments in prompt-based generative AI have given rise to discourse surrounding the perceived ethical concerns, economic implications, and consequences for the future of cultural production. As generative imagery becomes pervasive in mainstream society, dominated primarily by emerging industry leaders, we argue that the role of the CHI community should be one of inquiry: to investigate the numerous ways in which generative AI has the potential to augment human creativity, and already is doing so. In this paper, we conducted a diffractive analysis exploring the potential role of prompt-based interfaces in artists’ creative practice. Over a two-week period, seven visual artists were given access to a personalised instance of Stable Diffusion, fine-tuned on a dataset of their work. In the ensuing diffractive analysis, we identified two dominant modes adopted by participants: AI for ideation and AI for production. We furthermore present a number of ethical design considerations for the future development of generative AI interfaces.
This paper explores the multifaceted relationship between artificial intelligence (AI) and animation character design, with a particular focus on efficiency, creativity, and interactivity. Adopting an interpretative phenomenological method, we analyze the application of AI in animation character design, providing a unique perspective on the perceptions and experiences of animators and audiences in this context. The exploration comprises four main sections: an introduction to AI in animation character design, an overview of phenomenological hermeneutics, a detailed analysis of efficiency, creativity, and interactivity cases, and finally a discussion of the substitutable and irreplaceable aspects of AI in animation character design.
In an era where AI image generators are being integrated across various artistic domains, character design in the beauty education field needs a directional shift to align with current trends. The graphic tools currently in use often face limitations in achieving refined expression when artistic sensibility and expression techniques are lacking. This study proposes a character design education method that enables an intuitive and fast process using tools that realistically mimic human behavior and appearance, implying that anyone with the ability to analyze scenarios can create characters with a sense of realism. The study selects the main characters Jean Valjean, Fantine, Javert, and Cosette from Les Misérables, one of the world's four major musicals, and develops the characters through the following process. First, a personality analysis of the characters is conducted through scenario analysis. Next, 3D character design using MetaHuman Creator is executed based on detailed character descriptions. Third, designs are finely adjusted in Photoshop, and Photoshop AI is used to stage backgrounds fitting the era of the play, maximizing the character images. The proposed approach aims to facilitate the creation of high-quality character designs in the beauty education field, enabling students to enhance their analytical skills without investing significant time in learning tool usage, thereby fostering them to fulfill genuine planner roles.
This study introduces the ConvNeXt-CycleGAN, a novel deep learning-based Generative Adversarial Network (GAN) designed for digital art style migration. The model addresses the time-consuming and expertise-driven nature of traditional artistic creation, aiming to automate and accelerate the style transfer process using artificial intelligence. The ConvNeXt-CycleGAN integrates ConvNeXt blocks within the CycleGAN framework, enhancing convolution capabilities and leveraging self-attention mechanisms for precise and nuanced artistic style capture. The model undergoes rigorous evaluation using multiple performance metrics, including Inception Score (IS), Peak Signal-to-Noise Ratio (PSNR), and Fréchet Inception Distance (FID), ensuring its effectiveness in generating high-quality, diverse images while retaining fidelity during style transfer. The ConvNeXt-CycleGAN surpasses traditional GAN models across key metrics: it achieves an IS of 12.7004 (higher image diversity), a PSNR of 14.0211 (better preservation of original artwork integrity), and an FID of 234.1679 (closer resemblance to real artistic distributions). Additionally, its ability to efficiently train on unpaired images via unsupervised learning enhances its real-world applicability. This research presents an architectural innovation by combining ConvNeXt blocks with the CycleGAN framework, offering robust performance across diverse datasets and artistic styles. The ConvNeXt-CycleGAN represents a significant advancement in the integration of AI with creative processes, providing a powerful tool for rapid prototyping in digital art creation and innovation.
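The abstract's core architectural move is inserting ConvNeXt blocks into CycleGAN's generator. Below is a minimal PyTorch sketch of a standard ConvNeXt block (depthwise 7×7 convolution, LayerNorm, inverted pointwise MLP); exactly how the paper wires these into CycleGAN's residual stack is not stated, so that integration is an assumption.

```python
import torch
import torch.nn as nn

class ConvNeXtBlock(nn.Module):
    """Standard ConvNeXt block, usable as a drop-in generator residual unit."""
    def __init__(self, dim: int):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)  # depthwise
        self.norm = nn.LayerNorm(dim)          # applied in channels-last layout
        self.pwconv1 = nn.Linear(dim, 4 * dim) # inverted bottleneck expansion
        self.act = nn.GELU()
        self.pwconv2 = nn.Linear(4 * dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)              # NCHW -> NHWC for LayerNorm/Linear
        x = self.pwconv2(self.act(self.pwconv1(self.norm(x))))
        x = x.permute(0, 3, 1, 2)              # back to NCHW
        return residual + x
```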
Batik, an Indonesian cultural heritage recognized by UNESCO, involves a complex and time-consuming coloring process. Digitalization offers a crucial solution for the preservation and development of batik art in the modern era. This research implements a Generative Adversarial Network (GAN), specifically the Pix2Pix model, to automate the transformation of batik sketches into colored images. The primary focus is a performance comparison between the U-Net generator architecture, which excels at preserving spatial details via skip-connections, and the ResNet architecture, which is capable of learning deeper and more complex features. This study utilized 1164 paired images, divided into 931 training, 117 validation, and 116 test data points. The models were trained with consistent hyperparameters, including an Adam optimizer and a combination of L1 and binary cross-entropy loss functions, with evaluations at 50 and 100 epochs. Quantitative evaluation was performed using Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), and Fréchet Inception Distance (FID) metrics. The results indicate that the model with the ResNet generator trained for 100 epochs achieved the most balanced and superior performance, with a PSNR of 8.11, SSIM of 0.39, and an FID of 120.72. Overall, the ResNet generator proved more capable of producing realistic and high-quality colored batik images, offering an innovative solution to enhance the efficiency of the coloring process while supporting cultural preservation.
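The abstract names the training objective directly: an Adam optimizer with a combination of L1 and binary cross-entropy losses. A minimal sketch of the corresponding Pix2Pix generator loss follows; the λ weighting of 100 is the value from the original Pix2Pix paper, assumed here rather than reported by this study.

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()  # adversarial term (binary cross-entropy)
l1 = nn.L1Loss()              # pixel-level reconstruction term
LAMBDA_L1 = 100.0             # weighting from the original Pix2Pix paper (assumed)

def generator_loss(disc_logits_on_fake: torch.Tensor,
                   fake_colored: torch.Tensor,
                   real_colored: torch.Tensor) -> torch.Tensor:
    # Fool the discriminator (target labels of one) while staying close
    # to the ground-truth colored batik image.
    adv = bce(disc_logits_on_fake, torch.ones_like(disc_logits_on_fake))
    rec = l1(fake_colored, real_colored)
    return adv + LAMBDA_L1 * rec
```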
No abstract available
This study explores the application of intelligent Generative Adversarial Networks (GANs) in illustration design and cultural and creative product design in Liaoning. It aims to enhance design efficiency and quality while promoting the inheritance and innovative development of local cultural heritage. The study employs CycleGAN combined with self-attention mechanisms to conduct style transfer experiments on cultural illustrations featuring Liaoning’s regional characteristics. It compares the performance of six models, including traditional GAN, CycleGAN, and CycleGAN + Self-attention, in terms of image quality and training efficiency. The results show that the CycleGAN + Self-attention model outperforms the others on image quality indicators (PSNR and SSIM). On the Liaoning illustration dataset, this model achieves a PSNR of 30.5 dB and an SSIM of 0.96, which are 10 dB and 0.21 higher than traditional GANs. Tests across diverse datasets also demonstrate its robust generalization and adaptability. The proposed CycleGAN + Self-attention model performs well in illustration and cultural product design, generating high-quality illustrations with local cultural features. Beyond enhancing design efficiency, this model offers new technical means for cultural inheritance. Overall, this study furnishes novel ideas and methods for integrating intelligent GANs into art design practices. It helps promote Liaoning’s cultural heritage and innovation, and supports the development of the cultural and creative industry.
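The "CycleGAN + Self-attention" combination is usually realized by inserting an attention block into the generator (and sometimes the discriminator). The sketch below follows the common SAGAN-style formulation; where exactly this paper places the block is not stated in the abstract, so treat the placement as an assumption.

```python
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """SAGAN-style self-attention block for a CycleGAN generator feature map."""
    def __init__(self, channels: int):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # B x HW x C'
        k = self.key(x).flatten(2)                     # B x C' x HW
        attn = torch.softmax(q @ k, dim=-1)            # B x HW x HW
        v = self.value(x).flatten(2)                   # B x C  x HW
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x                    # residual connection
```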
Recent style transfer problems are still largely dominated by Generative Adversarial Network (GAN) from the perspective of cross-domain image-to-image (I2I) translation, where the pivotal issue is to learn and transfer target-domain style patterns onto source-domain content images. This paper handles the problem of translating real pictures into traditional Chinese ink-wash paintings, i.e., Chinese ink-wash painting style transfer. Though a wide range of I2I models tackle this problem, a notable challenge is that the content details of the source image could be easily erased or corrupted due to the transfer of ink-wash style elements. To remedy this issue, we propose to incorporate saliency detection into the unpaired I2I framework to regularize image content, where the detected saliency map is utilized from two aspects: (i) we propose saliency IOU (SIOU) loss to explicitly regularize object content structure by enforcing saliency consistency before and after image stylization; (ii) we propose saliency adaptive normalization (SANorm) which implicitly enhances object structure integrity of the generated paintings by dynamically injecting image saliency information into the generator to guide stylization process. Besides, we also propose saliency attended discriminator which harnesses image saliency information to focus generative adversarial attention onto the drawn objects, contributing to generating more vivid and delicate brush strokes and ink-wash textures. Extensive qualitative and quantitative experiments demonstrate the superiority of our approach over related advanced image stylization methods in both GAN and diffusion model paradigms.
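The SIOU loss enforces saliency consistency before and after stylization. A minimal soft-IoU sketch follows, assuming saliency maps normalized to [0, 1]; the paper's exact formulation may differ in details.

```python
import torch

def siou_loss(sal_content: torch.Tensor,
              sal_stylized: torch.Tensor,
              eps: float = 1e-6) -> torch.Tensor:
    """Soft IoU between saliency maps of shape (B, 1, H, W); loss = 1 - IoU.

    Penalizes the generator when stylization erodes the salient object's
    structure, i.e. when the two saliency maps stop overlapping.
    """
    inter = (sal_content * sal_stylized).sum(dim=(1, 2, 3))
    union = (sal_content + sal_stylized - sal_content * sal_stylized).sum(dim=(1, 2, 3))
    return (1.0 - (inter + eps) / (union + eps)).mean()
```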
The objective of image style transfer is to create an image that has the artistic features of a reference style image while also retaining the details of the original content image. Despite the promising outcomes of current approaches, they are still susceptible to generating image information distortion or noise texture problems due to the absence of an effective style representation. As a solution to the aforementioned issues, this paper proposes AMS-CycleGAN (Attention Moment Shortcut-Cycle Generative Adversarial Network), a CycleGAN-based method that achieves style transfer, resulting in artwork that closely resembles hand-painted masterpieces by artists. Initially, the framework makes use of the Positional Normalization-Moment Shortcut (PONO-MS) module, the purpose of which is to retain and transmit structural information in the generator. Additionally, the Multi-Scale-Structural Similarity Index (MS-SSIM) loss is added to strengthen the constraint on the brightness and colour contrast of images. Finally, an attention mechanism module is introduced in the discriminator to emphasize available features and suppress irrelevant features during the style transformation process. According to the experimental results obtained, our method demonstrates a higher level of consistency with human perception when compared to current state-of-the-art methods in image style transfer.
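Positional normalization (PONO) normalizes each spatial position across channels and extracts the mean and standard deviation as structural moments; the moment shortcut (MS) re-injects them deeper in the generator. A minimal sketch following the published PONO formulation; where exactly AMS-CycleGAN places these operations is not specified in the abstract.

```python
import torch

def pono(x: torch.Tensor, eps: float = 1e-5):
    """Positional normalization: statistics over the channel dim per position."""
    mean = x.mean(dim=1, keepdim=True)                 # (B, 1, H, W)
    std = (x.var(dim=1, keepdim=True) + eps).sqrt()
    return (x - mean) / std, mean, std

def moment_shortcut(x: torch.Tensor,
                    mean: torch.Tensor,
                    std: torch.Tensor) -> torch.Tensor:
    """Re-inject the structural moments extracted earlier in the generator,
    so positional structure survives the style transformation."""
    return x * std + mean
```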
Mixing and matching design elements from different fashion items to automatically create new textures or structures is desirable for fashion designers in order to facilitate the repetitive drawing process, and this mixing and matching process may also inspire them. In this research, we propose a structure and texture disentanglement generative adversarial network (STD-GAN) to create automated mix-and-match designs. Our model is trained to disentangle fashion items into different style elements and to integrate these elements to create a new design in an unsupervised manner. More specifically, a fashion attribute encoder is developed to disentangle the features of fashion items into two representations based on structure and texture. A texture mapping network is then applied to encode the texture representation in the form of different spatial features. A fashion fusion decoder is also developed that can generate mixed-style fashion items by utilizing the structure representation and the different texture features. In addition, a multi-discriminator framework is proposed to determine the authenticity and texture similarity of the reconstructed and mixed fashion items. Extensive experimental results demonstrate the effectiveness of our STD-GAN and its potential to facilitate the fashion design process by creating different textures and structures in a mix-and-match manner.
Purpose: Nowadays, artificial intelligence (AI) technology has demonstrated extensive applications in the field of art design. Attribute editing is an important means to realize clothing style and color design via computer language, which aims to edit and control the garment image based on the specified target attributes while preserving other details from the original image. Current image attribute editing models often generate images containing missing or redundant attributes. To address this problem, a novel design method utilizing the Fashion-attribute generative adversarial network (AttGAN) model is proposed for image attribute editing specifically tailored to women's blouses. Design/methodology/approach: The proposed design method primarily focuses on optimizing the feature extraction network and loss function. To enhance the feature extraction capability of the model, the number of layers in the feature extraction network was increased, and the structural similarity index measure (SSIM) loss function was employed to keep the independent attributes of the original image consistent. The characteristic-preserving virtual try-on network (CP_VTON) dataset was used for training to enable the editing of sleeve length and color specifically for women's blouses. Findings: The experimental results demonstrate that the optimized model's outputs significantly reduce problems related to missing attributes or visual redundancy. A comparative analysis of the SSIM and peak signal-to-noise ratio (PSNR) values before and after the model refinement shows that SSIM increased substantially by 27.4% and PSNR by 2.8%, serving as empirical evidence of the effectiveness of incorporating the SSIM loss function. Originality/value: The proposed algorithm provides a promising tool for precise image editing of women's blouses based on the GAN. This introduces a new approach to eliminating semantic expression errors in image editing, thereby contributing to the development of AI in clothing design.
The application of Deep Convolutional Generative Adversarial Networks (DCGANs) has gained significant popularity in the domain of anime-style image painting. However, the model often suffers from an imbalanced training process and certain other limitations, such as the presence of blurred regions and feature misalignment. In our work, an enhanced DCGAN is proposed to tackle these challenges. First, we introduce a scoring mechanism in each convolutional layer of the discriminator to enhance its ability to capture multi-scale features for accurate rating. From another perspective, we integrate the self-attention mechanism and modify the sizes of convolutional kernels in the generator, resulting in a greater capacity to generate details. To ensure training stability, we employ the WGAN-GP loss, which also improves generation diversity. Furthermore, in order to provide a fair and thorough evaluation of the generated outputs, we employ a comprehensive human-machine scoring method. The experimental results demonstrate that our improved DCGAN exhibits stronger feature extraction capabilities and effectively enhances the quality of generated images.
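The WGAN-GP loss mentioned here stabilizes training by penalizing the critic's gradient norm on interpolates between real and generated images. A standard sketch of the gradient-penalty term:

```python
import torch

def gradient_penalty(critic, real: torch.Tensor, fake: torch.Tensor) -> torch.Tensor:
    """WGAN-GP term: push the critic's gradient norm on interpolates toward 1."""
    alpha = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    interp = (alpha * real + (1 - alpha) * fake).requires_grad_(True)
    scores = critic(interp)
    grads = torch.autograd.grad(
        outputs=scores, inputs=interp,
        grad_outputs=torch.ones_like(scores),
        create_graph=True, retain_graph=True,
    )[0]
    grad_norm = grads.flatten(1).norm(2, dim=1)
    return ((grad_norm - 1.0) ** 2).mean()

# Typical critic objective: fake.mean() - real.mean() + 10.0 * gradient_penalty(...)
```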
No abstract available
Recent text-to-video models can synthesize visually compelling clips from short prompts, but still struggle with long-form narratives, character consistency, and culturally grounded stories. Professional film production relies on intermediate artefacts such as scene lists, shot lists, and storyboards, created by experts like scriptwriters and storyboard artists. These artefacts are largely absent from most current generative pipelines. We present an LLM-orchestrated framework that automates a production-style workflow for turning short Nasreddin Hodja stories into coherent animated videos. Starting from a raw textual tale, the system performs scene breakdown, shot and sub-shot decomposition, storyboard generation, action breakdown, and transition planning, before synthesizing images, clips, and audio. Each stage is driven by role-specific prompt templates that encode expert knowledge of narrative structure and cinematography. On a set of Nasreddin Hodja stories, the resulting videos achieve high scores on chronology, character consistency, speaker consistency, pacing, and technical completeness. The framework is model-agnostic and can sit on top of current text-to-image and text-to-video backbones.
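The staged, role-driven decomposition can be pictured as a chain of prompt templates. The roles and template texts below are illustrative assumptions, not the paper's released templates, and `llm` stands in for any chat-completion wrapper.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Stage:
    role: str       # expert persona encoded in the system prompt
    template: str   # role-specific instruction with one {x} slot

# Illustrative chain; the paper's actual templates are not published here.
PIPELINE: List[Stage] = [
    Stage("scriptwriter", "Break this Nasreddin Hodja tale into numbered scenes:\n{x}"),
    Stage("director", "Decompose each scene into shots and sub-shots with camera notes:\n{x}"),
    Stage("storyboard artist", "Write one text-to-image prompt per shot:\n{x}"),
]

def run(story: str, llm: Callable[[str, str], str]) -> List[str]:
    """`llm(role, prompt)` is an assumed wrapper around any chat model.
    Each stage's output (scene list, shot list, storyboard prompts) becomes
    the next stage's input, mirroring production-style intermediate artefacts."""
    artefacts, payload = [], story
    for stage in PIPELINE:
        payload = llm(stage.role, stage.template.format(x=payload))
        artefacts.append(payload)
    return artefacts
```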
The audiovisual industry is undergoing a profound transformation as it is integrating AI developments not only to automate routine tasks but also to inspire new forms of art. This paper addresses the problem of producing a virtually unlimited number of novel characters that preserve the artistic style and shared visual traits of a small set of human-designed reference characters, thus broadening creative possibilities in animation, gaming, and related domains. Our solution builds upon DreamBooth, a well-established fine-tuning technique for text-to-image diffusion models, and adapts it to tackle two core challenges: capturing intricate character details beyond textual prompts and the few-shot nature of the training data. To achieve this, we propose a multi-token strategy, using clustering to assign separate tokens to individual characters and their collective style, combined with LoRA-based parameter-efficient fine-tuning. By removing the class-specific regularization set and introducing random tokens and embeddings during generation, our approach allows for unlimited character creation while preserving the learned style. We evaluate our method on five small specialized datasets, comparing it to relevant baselines using both quantitative metrics and a human evaluation study. Our results demonstrate that our approach produces high-quality, diverse characters while preserving the distinctive aesthetic features of the reference characters, with human evaluation further reinforcing its effectiveness and highlighting the potential of our method.
We propose ArtisanFlow, a framework designed to enable illustrators to seamlessly incorporate their personal artistic styles into AI-assisted image generation workflows. ArtisanFlow leverages Low-Rank Adaptation (LoRA) fine-tuning, achieving high stylistic fidelity while maintaining compatibility with popular generation pipelines such as ControlNet and Layer Diffusion. Our approach introduces multi-dimensional evaluation metrics, assessing style, semantic similarity, as well as generation stability and flexibility. Furthermore, we integrate tailored prompt engineering strategies to enhance layered composition and pose guidance. Experimental results demonstrate that ArtisanFlow can effectively capture artistic details, delivering outputs that closely resemble the artist’s original works. Moreover, the modular and node-based interface offers easy customization, with reasonable computational demands that do not require enterprise-grade hardware, while maintaining a user-friendly experience for both individual artists and creative teams.
Artistic style transfer in generative models remains a significant challenge, as existing methods often introduce style only via model fine-tuning, additional adapters, or prompt engineering, all of which can be computationally expensive and may still entangle style with subject matter. In this paper, we introduce a training- and inference-light, interpretable method for representing and transferring artistic style. Our approach leverages an art-specific Sparse Autoencoder (SAE) on top of latent embeddings of generative image models. Trained on artistic data, our SAE learns an emergent, largely disentangled set of stylistic and compositional concepts, corresponding to style-related elements pertaining to brushwork, texture, and color palette, as well as semantic and structural concepts. We call it LouvreSAE and use it to construct style profiles: compact, decomposable steering vectors that enable style transfer without any model updates or optimization. Unlike prior concept-based style transfer methods, our method requires no fine-tuning, no LoRA training, and no additional inference passes, enabling direct steering of artistic styles from only a few reference images. We validate our method on ArtBench10, achieving or surpassing existing methods on style evaluations (VGG Style Loss and CLIP Score Style) while being 1.7-20x faster and, critically, interpretable.
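The "style profile as steering vector" idea can be sketched abstractly: encode reference embeddings with the SAE, keep only the features identified as stylistic, and add the resulting vector in feature space before decoding. Everything below (dimensions, which features count as style) is a toy assumption; LouvreSAE's actual training and feature taxonomy are in the paper, not here.

```python
import torch
import torch.nn as nn

class StyleSAE(nn.Module):
    """Toy sparse autoencoder over image-model embeddings (hypothetical dims)."""
    def __init__(self, d_embed: int = 768, d_feat: int = 8192):
        super().__init__()
        self.enc = nn.Linear(d_embed, d_feat)
        self.dec = nn.Linear(d_feat, d_embed)

    def encode(self, z: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.enc(z))   # non-negative, sparse feature codes

    def decode(self, f: torch.Tensor) -> torch.Tensor:
        return self.dec(f)

def style_profile(sae: StyleSAE, ref_embeds: torch.Tensor,
                  style_feature_idx: torch.Tensor) -> torch.Tensor:
    """Average codes over a few reference images; keep only style features."""
    f = sae.encode(ref_embeds).mean(dim=0)
    mask = torch.zeros_like(f)
    mask[style_feature_idx] = 1.0        # indices of style-related features (assumed known)
    return f * mask                      # compact, decomposable steering vector

def steer(sae: StyleSAE, content_embed: torch.Tensor,
          profile: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    """Add the style profile in feature space: no fine-tuning, no extra passes."""
    return sae.decode(sae.encode(content_embed) + alpha * profile)
```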
Text-conditioned diffusion models can generate impressive images, but fall short when it comes to fine-grained control. Unlike direct-editing tools like Photoshop, text-conditioned models require the artist to perform "prompt engineering," constructing special text sentences to control the style or amount of a particular subject present in the output image. Our goal is to provide fine-grained control over the style and substance specified by the prompt, for example to adjust the intensity of styles in different regions of the image (Figure 1). Our approach is to decompose the text prompt into conceptual elements, and apply a separate guidance term for each element in a single diffusion process. We introduce guidance scale functions to control when in the diffusion process and where in the image to intervene. Since the method is based solely on adjusting diffusion guidance, it does not require fine-tuning or manipulating the internal layers of the diffusion model's neural network, and can be used in conjunction with LoRA- or DreamBooth-trained models (Figure 2). Project page: https://mshu1.github.io/dreamwalk.github.io/
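Applying a separate guidance term per prompt element amounts to a multi-term extension of classifier-free guidance. A sketch of the combination step follows, with time-dependent scales standing in for the paper's guidance scale functions (spatial masking per region would multiply each term by a mask as well; that part is omitted here).

```python
import torch
from typing import Callable, List

def combined_noise_pred(
    eps_uncond: torch.Tensor,
    eps_per_element: List[torch.Tensor],        # one conditional prediction per element
    scale_fns: List[Callable[[float], float]],  # guidance scale as a function of timestep
    t: float,                                   # normalized timestep in [0, 1]
) -> torch.Tensor:
    """Multi-term classifier-free guidance: sum per-element guidance directions."""
    eps = eps_uncond.clone()
    for eps_c, scale in zip(eps_per_element, scale_fns):
        eps = eps + scale(t) * (eps_c - eps_uncond)
    return eps

# Example schedules: emphasize the style element only during high-noise steps.
style_schedule = lambda t: 7.5 if t > 0.5 else 2.0
subject_schedule = lambda t: 7.5
```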
Text-to-image generation has become increasingly popular, but achieving the desired images often requires extensive prompt engineering. In this paper, we explore how to decode textual prompts from reference images, a process we refer to as image reverse prompt engineering. This technique enables us to gain insights from reference images, understand the creative processes of great artists, and generate impressive new images. To address this challenge, we propose a method known as automatic reverse prompt optimization (ARPO). Specifically, our method refines an initial prompt into a high-quality prompt through an iteratively imitative gradient prompt optimization process: 1) generating a recreated image from the current prompt to instantiate its guidance capability; 2) producing textual gradients, which are candidate prompts intended to reduce the difference between the recreated image and the reference image; 3) updating the current prompt with textual gradients using a greedy search method to maximize the CLIP similarity between prompt and reference image. We compare ARPO with several baseline methods, including handcrafted techniques, gradient-based prompt tuning methods, image captioning, and a data-driven selection method. Both quantitative and qualitative results demonstrate that our ARPO converges quickly to generate high-quality reverse prompts. More importantly, we can easily create novel images with diverse styles and content by directly editing these reverse prompts. Code will be made publicly available.
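The greedy CLIP-guided loop described in steps 1-3 can be sketched as follows. The CLIP similarity uses the real Hugging Face transformers API; `generate` (a text-to-image model) and `propose` (the "textual gradient" producer, e.g. an LLM) are assumed callables, not the paper's released code.

```python
import torch
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def clip_similarity(image, text: str) -> float:
    """Cosine similarity between a PIL image and a candidate prompt."""
    inputs = proc(text=[text], images=image, return_tensors="pt",
                  padding=True, truncation=True)
    out = clip(**inputs)
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return float((img @ txt.T).item())

def arpo_greedy_loop(reference_img, init_prompt: str,
                     propose, generate, steps: int = 5) -> str:
    """Keep the candidate prompt whose recreation best matches the reference."""
    best = init_prompt
    best_score = clip_similarity(reference_img, best)
    for _ in range(steps):
        recreation = generate(best)                       # step 1: recreate
        for cand in propose(best, reference_img, recreation):  # step 2: textual gradients
            score = clip_similarity(reference_img, cand)       # step 3: greedy update
            if score > best_score:
                best, best_score = cand, score
    return best
```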
Purpose: This study addresses the difficulty of preserving cultural features and aligning with user perception when digitally translating Sanxingdui bronze bird symbols. It develops a digital activation method that integrates Kansei Engineering and AIGC technology. Method: A perceptual image survey was conducted to identify three key dimensions of user perception: meaning, form, and decoration. The Analytic Hierarchy Process (AHP) was used to determine the weight of each dimension. Two LoRA models were trained to control cultural style and object form separately. Weight parameters were mapped into the Stable Diffusion prompt system to enable intelligent generation. Results: The AIGC system, guided by weight control, produced digital artworks with strong cultural recognition and emotional resonance. User evaluations confirmed the clear advantages of the generated solutions. Conclusion: This research establishes a “perceptual quantification-weight configuration-intelligent generation” translation pathway. It provides a systematic approach for the digital innovation of cultural heritage. It also promotes the deeper application of AIGC technology for specific cultural objects.
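Running two LoRA adapters with explicit weights maps cleanly onto the diffusers multi-adapter API (recent diffusers with PEFT installed). The checkpoint paths, adapter weights, and prompt below are illustrative stand-ins, not the study's actual artifacts; the point is how AHP-derived dimension weights could be expressed as adapter strengths.

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Hypothetical local LoRA checkpoints: one for cultural style, one for object form.
pipe.load_lora_weights("loras/sanxingdui_style", adapter_name="style")
pipe.load_lora_weights("loras/bronze_bird_form", adapter_name="form")

# AHP-derived dimension weights mapped to adapter strengths (illustrative values).
pipe.set_adapters(["style", "form"], adapter_weights=[0.55, 0.45])

image = pipe(
    "Sanxingdui bronze bird, ritual ornament, cultural and creative product render"
).images[0]
image.save("bronze_bird_concept.png")
```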
Oracle bone inscriptions are among the most valuable assets of humankind, with irreplaceable archaeological and visual value. However, it is still challenging to analyze and generate oracle bone inscription-style images for visual identification and art design. In this paper, we propose DiffOBI, a novel approach for generating images in the oracle bone inscription style using diffusion models. We first construct a dataset that aligns oracle bone inscriptions, text prompts, and object images. This dataset serves as the foundation for fine-tuning ControlNet. By inputting images of various objects along with corresponding text prompts, the model generates the corresponding images in the style of oracle bone inscription. To further enhance the quality of the generated images, we integrate a refinement module that refines the initial results, ensuring they conform more closely to the original structure and norms of oracle bone inscription. This approach ensures that the generated images conform to the given input images while preserving the unique pictographic features of oracle bone inscription.
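ControlNet inference of the kind described here is standard in diffusers. The sketch below uses the public canny checkpoint only as a stand-in for the paper's own fine-tuned ControlNet, and the conditioning image path and prompt are assumptions.

```python
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# The paper fine-tunes ControlNet on its aligned OBI dataset; the public
# canny checkpoint below is only a placeholder for that fine-tuned model.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

condition = Image.open("object_outline.png")  # object image conditioning (assumed path)
result = pipe(
    "a bird in oracle bone inscription style, carved strokes, pictographic",
    image=condition,
).images[0]
result.save("obi_bird.png")
```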
Despite recent advances in generative models, artistic character generation remains an open problem. The key challenge is to balance the preservation of character structures to ensure integrity while incorporating aesthetic enhancements, which can be broadly categorized into visual styles and user-specified decorative elements. To address this, we propose ReChar, a plug-and-play framework composed of three complementary modules that preserve structure, extract style, and generate decorative elements. These modules are integrated via a fusion model to enable precise and coherent artistic character generation. To systematically evaluate artistic character generation, we introduce ImageNet-ReChar, the first large-scale benchmark for this task, covering multiple writing systems, diverse visual styles, and over 1,000 semantically grounded decorative prompts. Extensive experiments show that ReChar outperforms state-of-the-art baselines in structural integrity, stylistic fidelity, and prompt adherence, achieving an SSIM of 0.8690 and over 93% human preference across all criteria.
The Multilingual Story Illustration Shared Task (MUSIA), conducted as part of FIRE-2025, investigates the challenge of generating culturally grounded visual narratives for short stories written in Hindi and English. As multimodal AI becomes increasingly relevant in education and creative content generation, this task evaluates how effectively current systems can interpret multilingual text and produce coherent, culturally faithful image sequences. A total of 8 teams registered for the shared task; 5 teams submitted system runs, and 4 teams ultimately submitted overview papers. Their approaches spanned narrative segmentation, translation, summarization, prompt engineering, and diffusion-based image generation using state-of-the-art models. Human evaluation measured system performance along three dimensions: relevance, consistency, and visual quality. Results showed that pipelines integrating large language models for story-aware prompt construction, combined with diffusion models, achieved the strongest performance, particularly in visual quality and cultural alignment. However, several systems struggled with maintaining character identity, stylistic continuity, and fine-grained adherence to story details. This paper presents the dataset, task formulation, methodological strategies adopted by participating teams, and comparative results, offering a foundation for advancing research in multilingual, culturally sensitive story visualization.
Text-to-image generation (AI art) has become a mainstream phenomenon since the introduction of DALL-E by OpenAI in January 2021 (Nast, 2023). On the one hand, AI art challenges definitions of creativity that center on anthropocentric values and discredits the contributions of artists in the training of these AI models (Knibbs, 2023). On the other hand, it blurs the line between artists and non-artists by enabling new ways of creating art (e.g., prompt engineering: an iterative and experimental text-based process to interact with text-to-image generation models). Regardless of one’s ethical stance, practitioners of AI art, including artists of various skills and non-artists, form and participate in online communities to showcase their wares, share practices and resources, and learn from each other. This study uses a topic modeling approach to examine topics within three subreddit communities centered on three text-to-image generation models (r/StableDiffusion, r/midjourney, and r/weirddalle). The analysis, based on the top 500 posts from each subreddit over one month, reveals distinct community foci: r/StableDiffusion emphasizes technological innovations and technical learning, r/midjourney showcases AI art and prompt learning, while r/weirddalle is more competitive, focusing on creative or entertaining results. The study further derives topics from prompts extracted from the images, revealing preferences for popular media characters, high photorealism, and surrealist styles, with a notable emphasis on portraits of women.
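A topic-modeling pass of this kind is commonly done with LDA over a bag-of-words matrix. The sketch below uses scikit-learn with a tiny stand-in corpus (the study's actual data is the top 500 posts per subreddit); the specific algorithm and preprocessing the authors used are not stated in the abstract, so LDA here is an assumption.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Tiny stand-in corpus; the study analyzed the top 500 posts per subreddit.
posts = [
    "new LoRA training workflow for stable diffusion checkpoints",
    "my midjourney prompt for a surrealist portrait of a woman",
    "photorealistic portrait, tips on negative prompts",
    "weird dalle mashup of movie characters in a renaissance painting",
    "fine tuning embeddings, which sampler and cfg scale to use",
    "prompt sharing thread: cinematic lighting styles",
]
vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(posts)
lda = LatentDirichletAllocation(n_components=3, random_state=0).fit(X)

# Print the top terms per topic to characterize each community focus.
terms = vec.get_feature_names_out()
for k, component in enumerate(lda.components_):
    top = [terms[i] for i in component.argsort()[-6:][::-1]]
    print(f"topic {k}: {', '.join(top)}")
```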
Novel architectures have recently improved generative image synthesis, leading to excellent visual quality in various tasks. Of particular note is the field of "AI-Art", which has seen unprecedented growth with the emergence of powerful multimodal models such as CLIP. By combining speech and image synthesis models, so-called "prompt-engineering" has become established, in which carefully selected and composed sentences are used to achieve a certain visual style in the synthesized image. In this note, we present an alternative approach based on retrieval-augmented diffusion models (RDMs). In RDMs, a set of nearest neighbors is retrieved from an external database during training for each training instance, and the diffusion model is conditioned on these informative samples. During inference (sampling), we replace the retrieval database with a more specialized database that contains, for example, only images of a particular visual style. This provides a novel way to prompt a general trained model after training and thereby specify a particular visual style. As shown by our experiments, this approach is superior to specifying the visual style within the text prompt. We open-source code and model weights at https://github.com/CompVis/latent-diffusion .
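The retrieval step at the heart of RDMs is nearest-neighbor search in a shared embedding space such as CLIP's. A minimal sketch of that retrieval follows, using the real transformers CLIP API; the full RDM conditioning on retrieved samples lives in the linked repository, and the function names here are illustrative.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def embed_style_database(paths):
    """Embed a style-specific image database (e.g. only ink-wash paintings)."""
    imgs = [Image.open(p).convert("RGB") for p in paths]
    feats = clip.get_image_features(**proc(images=imgs, return_tensors="pt"))
    return feats / feats.norm(dim=-1, keepdim=True)

@torch.no_grad()
def retrieve_neighbors(query: str, db: torch.Tensor, k: int = 4):
    """Nearest neighbors to condition the diffusion model on at sampling time.

    Swapping `db` for a different style-specific database re-styles the model
    without touching its weights or the text prompt."""
    q = clip.get_text_features(**proc(text=[query], return_tensors="pt"))
    q = q / q.norm(dim=-1, keepdim=True)
    return (q @ db.T).squeeze(0).topk(k).indices
```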
This paper explores the application of Large Language Models (LLMs) in addressing the challenges of procedural generation in video game design, specifically tackling the Oatmeal Problem and the Bach Faucet Problem. These issues refer to the generation of endlessly similar, unengaging content and the devaluation of content due to its infinite producibility, respectively. By integrating design thinking, computational-cognitive heuristics, and prompt engineering with modern machine intelligence tools, we propose a novel approach termed Procedural Woodworking (PWw). This methodology aims to generate unique worlds and narratives through LLMs while enhancing player engagement by embedding a player’s character into the game’s narrative in a meaningful way. We utilize OpenAI’s ChatGPT-4 to create custom GPTs for world-building and narrative generation, evaluating their novelty and logical consistency through comparative rating scales. Our results indicate that both Top-Down and Bottom-Up world-building methods can produce logically consistent and structurally coherent outputs. The Procedural Woodworking approach, in particular, shows promise in generating novel narrative content that aligns with pre-made human-designed fictional worlds, potentially mitigating the identified procedural generation problems. This study lays the groundwork for further investigation into how LLMs can enrich game design and player experience by offering a more personalized and value-driven narrative generation process.
Story visualization aims to generate coherent image sequences that faithfully depict a narrative and align with character references. Despite progress in generative models, existing benchmarks are narrow in scope, often limited to short prompts, lacking character references, or single-image cases, and fail to capture real-world storytelling complexity. This hinders a nuanced understanding of model capabilities and limitations. We present ViStoryBench, a comprehensive benchmark designed to evaluate story visualization models across diverse narrative structures, visual styles, and character settings. The benchmark features richly annotated multi-shot scripts derived from curated stories spanning literature, film, and folklore. Large language models assist in story summarization and script generation, with all outputs human-verified to ensure coherence and fidelity. Character references are carefully curated to maintain intra-story consistency across varying artistic styles. To enable thorough evaluation, ViStoryBench introduces a set of automated metrics that assess character consistency, style similarity, prompt alignment, aesthetic quality, and generation artifacts such as copy-paste behavior. These metrics are validated through human studies, and used to benchmark a broad range of open-source and commercial models. ViStoryBench offers a multi-dimensional evaluation suite that facilitates systematic analysis and fosters future progress in visual storytelling.
No abstract available
The article examines the possibilities of various neural networks as tools for the implementation of projects in graphic design, evaluates their ability to ensure quality and efficiency in the creation of visual content for various types of products. The advantages and disadvantages of each neural network are also analyzed. The work presents the opinions of scientists and practitioners about the variety of neural networks that can be used to perform graphic design tasks. In addition, the results of own practical experience of working with neural networks are given. The study confirmed the effectiveness of neural networks in creating concepts of characters and locations for computer games, illustrations for printed and electronic publications, as well as in the development of trademarks and logos, corporate style and graphic design of packaging. However, their functionality does not yet provide the necessary quality level for such products as posters created on the basis of figurative language tropes; fonts; engineering graphics in axonometric projections showing the internal structure of devices or equipment; layout for print publications, websites and mobile applications, as well as infographics based on stylized images and design solutions for packaging. Maze Guru, Midjourney, and Leonardo AI are best for graphic design content. The ChatGPT neural network is an effective tool for matching peers and gathering feedback from scientists. The advantage of using neural networks is a significant acceleration of the process of creating visual content, as well as the possibility of combining different programs to supplement and improve the results obtained. Disadvantages include mainly English-language communication between the user and the network, as well as discrepancies between the images that exist in the user's mind and those generated by the network. Works created by neural networks are easily recognizable, and for similar text queries they can give very similar results.
Font design is an area that presents a unique opportunity to blend artistic creativity and artificial intelligence. However, traditional methods are time-consuming, especially for complex fonts or large character sets. Font transfer streamlines this process by learning font transitions to generate multiple styles from a target font. Yet, existing Generative Adversarial Network (GAN) based approaches often suffer from instability. Current diffusion-based font generation methods typically depend on single-modal inputs, either visual or textual, limiting their capacity to capture detailed structural and semantic font features. Additionally, current diffusion models suffer from high computational complexity due to their deep and redundant architectures. To address these challenges, we propose esFont, a novel guided Diffusion framework. It incorporates a Contrastive Language–Image Pre-training (CLIP) based text encoder, and a Vision Transformer (ViT) based image encoder, enriching the font transfer process through multimodal guidance from text and images. Our model further integrates deep clipping and timestep optimization techniques, significantly reducing parameter complexity while maintaining superior performance. Experimental results demonstrate that esFont improves both efficiency and quality. Our model shows clear enhancements in structural accuracy (SSIM improved to 0.91), pixel-level fidelity (RMSE reduced to 2.68), perceptual quality aligned with human vision (LPIPS reduced to 0.07), and stylistic realism (FID decreased to 13.87). It reduces the model size to 100M parameters, cuts training time to just 1.3 hours, and lowers inference time to only 21 minutes. In summary, esFont achieves significant advancements in both scientific and engineering domains by the innovative combination of multimodal encoding, structural depth pruning, and timestep optimization.
No abstract available
The merged grouping system fully covers the entire chain of AIGC in IP design: from underlying generative-algorithm optimization and tool-controllability evaluation (the technical layer), to the activation of traditional-culture IP and creative workflows in the content industry (the application layer), then to cross-domain commercial brand practice (the commercial layer), and finally extending to industry evaluation, ethics, and educational transformation (the reflective layer). Research trends show that AIGC is evolving from a simple asset-generation tool into a collaborative system deeply embedded in IP narratives and cultural inheritance, while its reshaping of designers' professional identity has become a focal point of academic attention.