Provably Secure Generative Steganography in Model-Heterogeneous Scenarios
Theoretical Frameworks and Security Definitions for Provably Secure Steganography
This group of works lays the mathematical foundations of generative steganography, examining statistical indistinguishability, minimum entropy coupling, KL-divergence optimization, and public-key steganographic protocols. The focus is on delimiting perfect security from an information-theoretic standpoint, providing the theoretical guarantees on which the steganographic schemes for the various generative models below are built. (A minimal sketch of the information-theoretic security condition follows the reference list.)
- Perfectly Secure Steganography Using Minimum Entropy Coupling(Christian Schroeder de Witt, Samuel Sokota, J. Zico Kolter, Jakob Foerster, Martin Strohmeier, 2022, ArXiv Preprint)
- Formalization of statistical indistinguishability of probability distribution ensembles in Mizar(Hiroyuki Okazaki, 2016, 2016 International Symposium on Information Theory and Its Applications (ISITA))
- Achieving Efficient and Provably Secure Steganography in Practice(Aubrey Alston, 2017, ArXiv Preprint)
- Rethinking Prefix-Based Steganography for Enhanced Security and Efficiency(Chao Pan, Donghui Hu, Yaofei Wang, Kejiang Chen, Yinyin Peng, Xianjin Rong, Chen Gu, Meng Li, 2025, IEEE Transactions on Information Forensics and Security)
- Steganography Security: Principle and Practice(Yan Ke, Jia Liu, Min-qing Zhang, Ting-ting Su, Xiao-yuan Yang, 2018, ArXiv Preprint)
- Defining security in steganographic systems(S. Katzenbeisser, F. Petitcolas, 2002)
- On The Limits Of Perfect Security For Steganographic System(Khan Farhan Rafat, M. Sher, 2013, ArXiv Preprint)
- Provably Secure Public-Key Steganography Based on Admissible Encoding(Xin Zhang, Kejiang Chen, Na Zhao, Weiming Zhang, Nenghai Yu, 2025, ArXiv Preprint)
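
To ground the definitions this group studies: in the Cachin-style information-theoretic framing, a stegosystem is ε-secure against a passive warden when the KL divergence between the cover distribution and the stego distribution is at most ε, and perfectly secure when that divergence is zero; the minimum-entropy-coupling result (de Witt et al., 2022) characterizes how to reach the zero-divergence case while maximizing message throughput. Below is a minimal Python sketch of the security condition itself; the two toy distributions are illustrative and come from no cited paper.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) in bits for two discrete distributions on the same alphabet."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy cover/stego distributions over a four-symbol alphabet.
p_cover         = [0.40, 0.30, 0.20, 0.10]
p_stego_perfect = [0.40, 0.30, 0.20, 0.10]  # identical: perfectly secure
p_stego_leaky   = [0.42, 0.28, 0.21, 0.09]  # slightly skewed: epsilon-secure at best

print(kl_divergence(p_cover, p_stego_perfect))  # 0.0
print(kl_divergence(p_cover, p_stego_leaky))    # small positive epsilon
```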
Language-Model-Driven Generative Steganography and Semantic Disambiguation
For text-generation settings, this group addresses segmentation ambiguity, long-tail distribution optimization, and semantic controllability. Through adaptive probability control and enhanced cognitive imperceptibility, these works raise the embedding capacity and naturalness of linguistic steganography while keeping the statistical distribution consistent. (A sketch of the interval-based token-selection primitive shared by several of these schemes follows the list.)
- A Near-Imperceptible Disambiguating Approach via Verification for Generative Linguistic Steganography(Ruiyi Yan, T. Song, Yating Yang, 2024, 2024 IEEE International Conference on Systems, Man, and Cybernetics (SMC))
- A Secure and Disambiguating Approach for Generative Linguistic Steganography(Ruiyi Yan, Yating Yang, T. Song, 2023, IEEE Signal Processing Letters)
- ADLM-stega: A Universal Adaptive Token Selection Algorithm for Improving Steganographic Text Quality via Information Entropy(Zezheng Qin, Congcong Sun, Taiyi He, Yuke He, Azizol Abdullah, Normalia Samian, N. A. Roslan, 2024, ArXiv Preprint)
- Graph-Stega: Semantic Controllable Steganographic Text Generation Guided by Knowledge Graph(Zhongliang Yang, Baitao Gong, Yamin Li, Jinshuai Yang, Zhiwen Hu, Yongfeng Huang, 2020, ArXiv Preprint)
- Linguistic Steganography Based on Adaptive Probability Distribution(Xuejing Zhou, Wanli Peng, Boya Yang, Juan Wen, Yiming Xue, P. Zhong, 2021, IEEE Transactions on Dependable and Secure Computing)
- Provably Secure Generative Linguistic Steganography(Siyu Zhang, Zhongliang Yang, Jinshuai Yang, Yongfeng Huang, 2021, ArXiv Preprint)
- Linguistic Generative Steganography With Enhanced Cognitive-Imperceptibility(Zhongliang Yang, Lingyun Xiang, Siyu Zhang, Xingming Sun, Yongfeng Huang, 2021, IEEE Signal Processing Letters)
- STEAD: Robust Provably Secure Linguistic Steganography with Diffusion Language Model(Yuang Qi, Na Zhao, Qiyi Yao, Benlong Wu, Weiming Zhang, Nenghai Yu, Kejiang Chen, 2026, ArXiv Preprint)
- Neural Linguistic Steganography with Controllable Security(Tianhe Lu, Gongshen Liu, Ru Zhang, Tianjie Ju, 2023, 2023 International Joint Conference on Neural Networks (IJCNN))
- Provably Secure Disambiguating Neural Linguistic Steganography(Yuang Qi, Kejiang Chen, Kai Zeng, Weiming Zhang, Nenghai Yu, 2024, ArXiv Preprint)
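
A primitive shared by several schemes above (the prefix/interval-based line in particular) is to treat the message bits as the randomness for inverse-transform sampling over the next-token distribution: uniform bits make the sampled token follow the model distribution exactly, and the receiver recovers precisely the bits that the chosen token's probability interval pins down. The sketch below is a simplified single-step version under the assumption of uniform message bits; production schemes refine intervals across steps, arithmetic-coding style, for higher capacity.

```python
def bits_pinned(lo, hi):
    """Leading bits of a binary fraction r that are determined by r in [lo, hi)."""
    out, plo, phi = [], 0.0, 1.0
    while True:
        mid = (plo + phi) / 2
        if hi <= mid:
            out.append(0); phi = mid   # all of [lo, hi) lies in the lower half
        elif lo >= mid:
            out.append(1); plo = mid   # all of [lo, hi) lies in the upper half
        else:
            return out                 # interval straddles mid: next bit undecided

def embed_step(bits, cdf):
    """Select one token using the message bits as sampling randomness.
    Returns (token_index, number_of_message_bits_consumed)."""
    r = sum(b / 2 ** (i + 1) for i, b in enumerate(bits))  # r = 0.b1b2b3... in [0, 1)
    token = next(k for k in range(len(cdf)) if r < cdf[k])
    lo = cdf[token - 1] if token else 0.0
    return token, len(bits_pinned(lo, cdf[token]))

def extract_step(token, cdf):
    """The receiver sees only the token and recovers the pinned-down bits."""
    lo = cdf[token - 1] if token else 0.0
    return bits_pinned(lo, cdf[token])

cdf = [0.3, 0.45, 1.0]              # toy cumulative next-token probabilities
msg = [0, 1, 1, 0, 1]
token, used = embed_step(msg, cdf)  # uniform bits: token follows the model distribution
assert extract_step(token, cdf) == msg[:used]  # consumed bits are decodable
```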
Latent-Space Mapping Steganography with Diffusion Models and GAN Architectures
This group harnesses the generative power of diffusion models and GANs, establishing reversible mappings between secret messages and Gaussian noise or latent-space variables to achieve high-fidelity image or text steganography. The research spans inverse transform sampling, sparse sampling, and distribution-preserving invertible generation. (A sketch of the bit-to-Gaussian mapping via the inverse normal CDF follows the list.)
- GTSD: Generative Text Steganography Based on Diffusion Model(Zhengxian Wu, Juan Wen, Yiming Xue, Ziwei Zhang, Yinghan Zhou, 2025, ArXiv Preprint)
- SDS-TG: Secure Diffusion Steganography in Text-Guided Generative Images(Haozhong Yang, Hongxia Wang, Jinhe Li, Fei Zhang, 2025, 2025 IEEE International Conference on Multimedia and Expo (ICME))
- Plug-and-Hide: Provable and Adjustable Diffusion Generative Steganography(Jiahao Zhu, Zixuan Chen, Lingxiao Yang, Xiaohua Xie, Yi Zhou, 2024, ArXiv Preprint)
- Provably Secure Generative Steganography Based on Adjustable Orthogonal Mapping(Qinghua Zhang, Fangjun Huang, 2026, IEEE Transactions on Dependable and Secure Computing)
- DGADM-GIS: Deterministic Guided Additive Diffusion Model for Generative Image Steganography(Chengsheng Yuan, Zhaonan Ji, Xinting Li, Zhili Zhou, Zhihua Xia, Q. M. J. Wu, 2025, IEEE Transactions on Dependable and Secure Computing)
- Generative Steganography Diffusion(Ping Wei, Qing Zhou, Zichi Wang, Zhenxing Qian, Xinpeng Zhang, Sheng Li, 2023, ArXiv Preprint)
- TASDF-Stega: High Capacity Secure Text-Audio Joint Steganography Using Diffusion Latent Space(Zhen Yang, Yelei Wang, Yufei Luo, Xin Xu, Ru Zhang, 2026, IEEE Signal Processing Letters)
- Generative Image Steganography Based on Text-to-Image Multimodal Generative Model(Jingyuan Jiang, Zichi Wang, Zihan Yuan, Xinpeng Zhang, 2025, IEEE Transactions on Circuits and Systems for Video Technology)
- SparSamp: Efficient Provably Secure Steganography Based on Sparse Sampling(Yaofei Wang, Gang Pei, Kejiang Chen, Jinyang Ding, Chao Pan, Weilong Pang, Donghui Hu, Weiming Zhang, 2025, ArXiv Preprint)
- Reversible generative steganography with distribution-preserving(Weixuan Tang, Yuan Rao, Zuopeng Yang, Fei Peng, Xutong Cui, Junhao Huang, Peijun Zhu, 2025, Cybersecurity)
- Provably Secure Robust Image Steganography(Zijin Yang, Kejiang Chen, Kai Zeng, Weiming Zhang, Neng H. Yu, 2024, IEEE Transactions on Multimedia)
- Latent Space-based Coverless Steganography with Encryption and Multi-Metric Security Evaluation(Soud Mohamed Amen, Hassan Mahmood, 2026, Asian Journal of Research in Computer Science)
- Secure Steganography Based on Wasserstein Generative Adversarial Networks with Gradient Penalty(Fang Ren, Yiyuan Wang, Tingge Zhu, Bo Gao, 2024, 2024 6th International Conference on Natural Language Processing (ICNLP))
- Enhancing steganography capacity through multi-stage generator model in generative adversarial network based image concealment(Bisma Sultan, M. A. Wani, 2024, Journal of Electronic Imaging)
- Error-Correcting Image Steganography Method Based on Generative Adversarial Networks(Qifei Liang, 2024, Proceedings of the 2024 International Conference on Virtual Reality, Image and Signal Processing)
- ISGI steganography scheme: image steganography scheme leveraging a novel cryptography algorithm and generative adversarial network(S. Hashemi, M. Majidi, 2026, Multimedia Tools and Applications)
- A generative image steganography method based on joint encoding of multi-object semantic information(Ke Shi, Peng Liu, Songbin Li, Jingang Wang, 2025, Pattern Analysis and Applications)
- IS-DGM: an improved steganography method based on a deep generative model and hyper logistic map encryption via social media networks(Mohamed Abdel Hameed, M. Hassaballah, Tong Qiao, 2024, Multimedia Systems)
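
The recurring core of this group is a reversible bit-to-Gaussian mapping by inverse transform sampling, as described in several of the abstracts below (e.g., PARIS and PA-B2G): message bits index a cell of the unit interval, and the inverse normal CDF carries that cell into a standard-normal latent that the generator consumes. A minimal sketch follows, with BITS_PER_DIM as an assumed payload parameter; real schemes additionally dither within each cell so the latent is exactly N(0,1) rather than quantized.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()   # N(0, 1)
BITS_PER_DIM = 8            # assumed payload packed per latent dimension

def bits_to_gaussian(bits):
    """Map BITS_PER_DIM message bits to one standard-normal sample."""
    k = int("".join(map(str, bits)), 2)
    u = (k + 0.5) / 2 ** BITS_PER_DIM   # center of the k-th cell of [0, 1)
    return STD_NORMAL.inv_cdf(u)        # inverse transform sampling

def gaussian_to_bits(z):
    """Recover the bits as long as z stays within its quantization cell."""
    k = min(int(STD_NORMAL.cdf(z) * 2 ** BITS_PER_DIM), 2 ** BITS_PER_DIM - 1)
    return [int(b) for b in format(k, f"0{BITS_PER_DIM}b")]

msg = [1, 0, 1, 1, 0, 0, 1, 0]
z = bits_to_gaussian(msg)           # feed z into the generator's latent input
assert gaussian_to_bits(z) == msg   # exact round trip over a lossless channel
```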
Robust Steganographic Schemes under Model Heterogeneity and Asymmetric Resources
This group specifically examines the steganographic challenges that arise when sender and receiver hold inconsistent model parameters or unequal computational resources, and in cross-modal settings (speech, point clouds, medical imaging). The emphasis is on exploiting "model reproducibility", fixed neural networks, and distributed protocols to guarantee accurate extraction of the secret over lossy channels and in heterogeneous environments. (A sketch of the shared-key cover-reproduction idea follows the list.)
- Provably Robust and Secure Steganography in Asymmetric Resource Scenario(Minhao Bai, Jinshuai Yang, Kaiyi Pang, Xin Xu, Zhen Yang, Yongfeng Huang, 2024, ArXiv Preprint)
- Robust Generative Steganography for Image Hiding Using Concatenated Mappings(Liyan Chen, Bingwen Feng, Zhihua Xia, Wei Lu, Jian Weng, 2025, IEEE Transactions on Information Forensics and Security)
- Reversible Generative Steganography Leveraging Distribution Preserving Encoding for Enhanced Data Security and Integrity(J. Uthayakumar, S. Sreeraj, C. Sandhiya, Ram H., T. Mohanraj, R. Senthilkumar, 2025, Proceedings of the 1st International Conference on Research and Development in Information, Communication, and Computing Technologies)
- INR-Based Generative Steganography by Point Cloud Representation(Zhong Yangjie, Liu Jia, Luo Peng, Ke Yan, Cai Shen, 2024, ArXiv Preprint)
- Distribution-Preserving Steganography Based on Text-to-Speech Generative Models(Kejiang Chen, Hang Zhou, Hanqing Zhao, Dongdong Chen, Weiming Zhang, Nenghai Yu, 2022, IEEE Transactions on Dependable and Secure Computing)
- Cover Reproducible Steganography via Deep Generative Models(Kejiang Chen, Hang Zhou, Yaofei Wang, Meng Li, Weiming Zhang, Neng H. Yu, 2022, IEEE Transactions on Dependable and Secure Computing)
- The Emergence of Reproducibility and Generalizability in Diffusion Models(Huijie Zhang, Jinfan Zhou, Yifu Lu, Minzhe Guo, Peng Wang, Liyue Shen, Qing Qu, 2023, ArXiv Preprint)
- Cover-separable Fixed Neural Network Steganography via Deep Generative Models(Guobiao Li, Sheng Li, Zhenxing Qian, Xinpeng Zhang, 2024, Proceedings of the 32nd ACM International Conference on Multimedia)
- Game-theoretic distributed learning of generative models for heterogeneous data collections(D. Schlesinger, B. Flach, 2025, 2025 3rd International Conference on Foundation and Large Language Models (FLLM))
- Robust Generative Steganography Based on Image Mapping(Qinghua Zhang, Fangjun Huang, 2024, IEEE Transactions on Circuits and Systems for Video Technology)
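
The simplest way to see the model-reproducibility idea running through this group: if sender and receiver share a key and a deterministic generator, the receiver can regenerate the cover bit-for-bit and isolate the message-bearing perturbation, as in cover-reproducible and cover-separable fixed-network steganography. In the sketch below, generate_image is a hypothetical stand-in for any shared deterministic generative model, not an API from the cited papers.

```python
import numpy as np

def generate_image(key: int, shape=(8, 8)) -> np.ndarray:
    """Deterministic stand-in generator: the same key yields the same 'cover'."""
    rng = np.random.default_rng(key)
    return rng.standard_normal(shape)

def embed(key, perturbation):
    return generate_image(key) + perturbation   # stego = cover + delta

def extract(key, stego):
    return stego - generate_image(key)          # receiver reproduces the cover

shared_key = 0xC0FFEE
delta = np.zeros((8, 8)); delta[0, 0] = 0.01    # toy message-bearing perturbation
stego = embed(shared_key, delta)
assert np.allclose(extract(shared_key, stego), delta)
```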
Steganographic Robustness Enhancement, Error-Correction Mechanisms, and Anti-Detection Techniques
Addressing the lossy compression (e.g., JPEG) and editing attacks found in real-world channels, this group proposes enhancement schemes that combine error-correcting codes, adversarial training, and distance-constrained encoding. By introducing generative adversarial networks (GANs) and adaptive optimization, these works improve the survivability of steganographic systems under complex attacks. (A toy error-correction sketch follows the list.)
- Robust Generative Image Steganography based on Frequency Domain using Fourier Transform(Yujie Jiang, Jing Dong, Shutiao Luo, 2025, Proceedings of the 2025 4th International Conference on Big Data, Information and Computer Network)
- LDStega: Practical and Robust Generative Image Steganography based on Latent Diffusion Models(Yinyin Peng, Yaofei Wang, Donghui Hu, Kejiang Chen, Xianjin Rong, Weiming Zhang, 2024, Proceedings of the 32nd ACM International Conference on Multimedia)
- Establishing Robust Generative Image Steganography via Popular Stable Diffusion(Xiaoxiao Hu, Sheng Li, Qichao Ying, Wanli Peng, Xinpeng Zhang, Zhenxing Qian, 2024, IEEE Transactions on Information Forensics and Security)
- Robust Provably Secure Image Steganography via Latent Iterative Optimization(Yanan Li, Zixuan Wang, Qiyang Xiao, Yanzhen Ren, 2026, ArXiv Preprint)
- Alkaid: Resilience to Edit Errors in Provably Secure Steganography via Distance-Constrained Encoding(Zhihan Cao, Gaolei Li, Jun Wu, Jianhua Li, Hang Zhang, Mingzhe Chen, 2026, ArXiv Preprint)
- Provably Secure Robust Image Steganography via Cross-Modal Error Correction(Yuang Qi, Kejiang Chen, Na Zhao, Zijin Yang, Weiming Zhang, 2024, ArXiv Preprint)
- Provably Secure and Robust Audio Steganography Under Multi-Format Low-Bitrate Compression(Yanan Li, Qiyang Xiao, Zixuan Wang, Yanzhen Ren, Lina Wang, 2025, IEEE Transactions on Information Forensics and Security)
- Multichannel Steganography: A Provably Secure Hybrid Steganographic Model for Secure Communication(Obinna Omego, Michal Bosy, 2025, ArXiv Preprint)
- Generating Steganographic Images via Adversarial Training(Jamie Hayes, George Danezis, 2017, ArXiv Preprint)
- A Robust Generative Image Steganography Method based on Guidance Features in Image Synthesis(Youqiang Sun, Jianyi Liu, Ru Zhang, 2023, 2023 IEEE International Conference on Multimedia and Expo (ICME))
- StegaStyleGAN: Towards Generic and Practical Generative Image Steganography(Wenkang Su, J. Ni, Yiyang Sun, 2024)
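
To illustrate the error-correction layer in the simplest possible terms, the sketch below uses a k-fold repetition code with majority-vote decoding as a toy stand-in; the papers above employ stronger machinery such as Reed-Solomon codes and cross-modal error correction, but the channel model (sporadic bit flips from lossy processing) is the same.

```python
K = 5  # repetition factor: tolerates up to (K - 1) // 2 flips per message bit

def rep_encode(bits, k=K):
    return [b for b in bits for _ in range(k)]

def rep_decode(coded, k=K):
    groups = [coded[i:i + k] for i in range(0, len(coded), k)]
    return [int(sum(g) > k // 2) for g in groups]

msg = [1, 0, 1, 1]
coded = rep_encode(msg)
coded[2] ^= 1; coded[8] ^= 1       # simulate channel-induced bit flips
assert rep_decode(coded) == msg    # majority vote recovers the payload
```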
Foundational Architectures of Advanced Generative Models and Multimodal Synthesis Support
This group supplies the underlying generative techniques that serve as steganographic carriers, including high-resolution synthesis, multi-stage guidance, cross-modal (text-to-image/audio) synthesis, and generation of non-Euclidean data. These techniques provide the model tooling required for high-quality, high-capacity generative steganography. (A numerical sketch of the deterministic DDIM update, whose near-invertibility underpins many of the schemes above, follows the list.)
- AnomalyPainter: Vision-Language-Diffusion Synergy for Zero-Shot Realistic and Diverse Industrial Anomaly Synthesis(Zhangyu Lai, Yilin Lu, Xinyang Li, Jianghang Lin, Yansong Qu, Liujuan Cao, Ming Li, Rongrong Ji, 2025, ArXiv Preprint)
- DiffWave: A Versatile Diffusion Model for Audio Synthesis(Zhifeng Kong, Wei Ping, Jiaji Huang, Kexin Zhao, Bryan Catanzaro, 2020, ArXiv Preprint)
- Vector Quantized Diffusion Model for Text-to-Image Synthesis(Shuyang Gu, Dong Chen, Jianmin Bao, Fang Wen, Bo Zhang, Dongdong Chen, Lu Yuan, B. Guo, 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))
- Generative Steganography by Sampling(Zhuo Zhang, Jia Liu, Yan Ke, Yu Lei, Jun Li, Minqing Zhang, Xiaoyuan Yang, 2018, ArXiv Preprint)
- Generative Steganography Network(Ping Wei, Sheng Li, Xinpeng Zhang, Ge Luo, Zhenxing Qian, Qing Zhou, 2022, Proceedings of the 30th ACM International Conference on Multimedia)
- NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers(Kai Shen, Zeqian Ju, Xu Tan, Yanqing Liu, Yichong Leng, Lei He, Tao Qin, Sheng Zhao, Jiang Bian, 2023, ArXiv Preprint)
- Latent Diffusion Model without Variational Autoencoder(Minglei Shi, Haolin Wang, Wenzhao Zheng, Ziyang Yuan, Xiaoshi Wu, Xintao Wang, Pengfei Wan, Jie Zhou, Jiwen Lu, 2025, ArXiv Preprint)
- TendiffPure: a convolutional tensor-train denoising diffusion model for purification(Mingyuan Bai, Derun Zhou, Qibin Zhao, 2024, Frontiers of Information Technology & Electronic Engineering)
- Convergence and Stability Analysis of Self-Consuming Generative Models with Heterogeneous Human Curation(Hongru Zhao, Jinwen Fu, Tuan Pham, 2025, ArXiv Preprint)
- cWDM: Conditional Wavelet Diffusion Models for Cross-Modality 3D Medical Image Synthesis(Paul Friedrich, Alicia Durrer, Julia Wolleb, Philippe C. Cattin, 2024, ArXiv Preprint)
- Text-driven Visual Synthesis with Latent Diffusion Prior(Ting-Hsuan Liao, Songwei Ge, Yiran Xu, Yao-Chih Lee, Badour AlBahar, Jia-Bin Huang, 2023, ArXiv Preprint)
- Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos(Kun Su, Kaizhi Qian, Eli Shlizerman, Antonio Torralba, Chuang Gan, 2023, ArXiv Preprint)
- Guided Image Synthesis via Initial Image Editing in Diffusion Model(Jiafeng Mao, Xueting Wang, K. Aizawa, 2023, Proceedings of the 31st ACM International Conference on Multimedia)
- Efficient Training-Free High-Resolution Synthesis with Energy Rectification in Diffusion Models(Zhen Yang, Guibao Shen, Minyang Li, Liang Hou, Mushui Liu, Luozhou Wang, Xin Tao, Pengfei Wan, Di Zhang, Ying-Cong Chen, 2025, ArXiv Preprint)
- Multistage guidance on the diffusion model inspired by human artists’ creative thinking(W. Qi, Huanghuang Deng, Taihao Li, 2023, Frontiers of Information Technology & Electronic Engineering)
- Pard: Permutation-Invariant Autoregressive Diffusion for Graph Generation(Lingxiao Zhao, Xueying Ding, Leman Akoglu, 2024, ArXiv Preprint)
- Text Generation with Diffusion Language Models: A Pre-training Approach with Continuous Paragraph Denoise(Zhenghao Lin, Yeyun Gong, Yelong Shen, Tong Wu, Zhihao Fan, Chen Lin, Nan Duan, Weizhu Chen, 2022, ArXiv Preprint)
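
Most diffusion-based schemes in the earlier groups lean on one property of these base models: under the deterministic DDIM / probability-flow sampler, generation is approximately invertible, so a receiver can walk a stego image back to its latent. The numerical sketch below uses a toy schedule and a placeholder noise predictor (eps_model is not a trained network) to show both the update and the standard inversion approximation; the small residual error of that approximation is one reason the robustness group above adds error correction.

```python
import numpy as np

alpha_bar = np.linspace(0.9999, 0.05, 50)   # toy cumulative noise schedule

def eps_model(x, t):
    """Placeholder noise predictor; a real one is a trained U-Net or transformer."""
    return np.tanh(x + 0.01 * t)

def ddim_step(x_t, t, t_prev):
    """Deterministic DDIM update from step t to t_prev (t_prev < t)."""
    eps = eps_model(x_t, t)
    x0_hat = (x_t - np.sqrt(1 - alpha_bar[t]) * eps) / np.sqrt(alpha_bar[t])
    return np.sqrt(alpha_bar[t_prev]) * x0_hat + np.sqrt(1 - alpha_bar[t_prev]) * eps

def ddim_invert_step(x_prev, t_prev, t):
    """Standard inversion approximation: reuse the predictor at x_prev."""
    eps = eps_model(x_prev, t_prev)
    x0_hat = (x_prev - np.sqrt(1 - alpha_bar[t_prev]) * eps) / np.sqrt(alpha_bar[t_prev])
    return np.sqrt(alpha_bar[t]) * x0_hat + np.sqrt(1 - alpha_bar[t]) * eps

x = np.random.default_rng(0).standard_normal(4)
y = ddim_step(x, t=30, t_prev=29)
x_back = ddim_invert_step(y, t_prev=29, t=30)
print(np.max(np.abs(x_back - x)))   # small but nonzero: inversion is approximate
```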
Taken together, these works chart the evolution of provably secure generative steganography from theoretical definitions to deployment in complex heterogeneous scenarios. The research thread runs from rigorous proofs of statistical indistinguishability to novel latent-space mappings in large language models and diffusion models. In model-heterogeneous and resource-asymmetric environments in particular, researchers resolve the synchronization problem between senders and receivers with mismatched model architectures by introducing model reproducibility, distributed learning, and cross-modal cooperation. Meanwhile, robustness enhancements that combine error-correcting codes with adversarial training make generative steganography viable over real lossy channels, balancing security, capacity, and robustness across multimodal environments.
A total of 74 related references are listed above; selected abstracts follow.
Diffusion model-based generative image steganography (DM-GIS) is an emerging paradigm that leverages the generative power of diffusion models to conceal secret messages without requiring pre-existing cover images. In this paper, we identify a fundamental trade-off between stego image quality, steganographic security, and extraction reliability within the DM-GIS framework. Drawing on this insight, we propose PA-B2G, a Provable and Adjustable Bit-to-Gaussian mapping. Theoretically, PA-B2G guarantees the reversible encoding of arbitrary-length bit sequences into pure Gaussian noise; practically, it enables fine-grained control over the balance between image fidelity, security, and extraction accuracy. By integrating PA-B2G with probability-flow ordinary differential equations (PF-ODEs), we establish a theoretically invertible mapping between secret bitstreams and stego images. PA-B2G is model-agnostic and can be seamlessly integrated into mainstream diffusion models without additional training or fine-tuning, making it also suitable for diffusion model watermarking. Extensive experiments validate our theoretical analysis of the inherent DM-GIS trade-offs and demonstrate that our method flexibly supports arbitrary payloads while achieving competitive image quality and security. Furthermore, our method exhibits strong resilience to lossy processing in watermarking applications, highlighting its practical utility.
The recent advances in generative image steganography have drawn increasing attention due to their potential for provable security and bulk embedding capacity. However, existing generative steganographic schemes are usually tailored for specific tasks and are hardly applied to applications with practical constraints. To address this issue, this paper proposes a generic generative image steganography scheme called Steganography StyleGAN (StegaStyleGAN) that meets the practical objectives of security, capacity, and robustness within the same framework. In StegaStyleGAN, a novel Distribution-Preserving Secret Data Modulator (DP-SDM) is used to achieve provably secure generative image steganography by preserving the data distribution of the model inputs. Additionally, a generic and efficient Secret Data Extractor (SDE) is invented for accurate secret data extraction. By choosing whether to incorporate the Image Attack Simulator (IAS) during the training process, one can obtain two models with different parameters but the same structure (both generator and extractor) for lossless and lossy channel covert communication, namely StegaStyleGAN-Ls and StegaStyleGAN-Ly. Furthermore, by mating with GAN inversion, conditional generative steganography can be achieved as well. Experimental results demonstrate that, whether for lossless or lossy communication channels, the proposed StegaStyleGAN can significantly outperform the corresponding state-of-the-art schemes.
Generative steganography employs generative models to synthesize stego images directly from secret information, avoiding cover image modifications required in traditional steganography, thereby evading detection by steganalytic tools. The latest advances in image generation technology have spurred significant progress in the field of generative steganography. Nevertheless, existing generative steganography still encounters significant challenges in terms of provable security, robustness, capacity, and visual quality. To this end, we construct an adjustable orthogonal mapping (AOM) framework, and based on it propose a provably secure generative steganographic method, named GSAOM. Specifically, the sender employs AOM to convert the secret information into the latent variable that follows the standard normal distribution and then inputs it into the diffusion model to generate a high-quality stego image. Correspondingly, the receiver converts the stego image back into the latent variable through the inversion of the diffusion model, and then extracts the secret information via the inverse mapping of the AOM. Since both the latent variable exported from AOM and the latent variable randomly generated during regular image generation follow the standard normal distribution, our proposed GSAOM can achieve provable security. Additionally, AOM allows for adjustable capacity and can maximize the distinguishability of the extracted secret information, endowing GSAOM with advantages in capacity and robustness. Extensive experiments demonstrate that GSAOM performs well in capacity, visual quality, security, robustness, and generalization.
The maturity of generative models and the popularity of generated data have brought new technical means and camouflage environments to steganography. Numerous generative image steganography methods have emerged, but achieving provable security, robustness, and relatively high capacity simultaneously remains challenging. This paper proposes a provably secure robust image steganography method via the generative adversarial network (GAN), named PARIS. The sender maps the secret message, following a uniform distribution, to latent vectors conforming to a standard Gaussian distribution using inverse transform sampling. Subsequently, the latent vector is fed into the generator, producing the stego image. In this way, the stego image cannot be distinguished from the normally generated image. The receiver extracts the secret message from the recovered latent vector via gradient descent optimization. To enhance the robustness, a noise layer is introduced while recovering the latent vector to simulate potential lossy operations in real scenarios. The security of the proposed method is theoretically proven. Extensive experiments have also verified the proposed method's robustness, security, and relatively high capacity in terms of different GAN architectures, noises, and datasets.
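
A sketch of the extraction step this abstract describes: the receiver regresses the latent by gradient descent until the generator reproduces the received stego signal. A toy linear map stands in for the GAN generator here; a real implementation would backpropagate through the network.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((32, 8))   # stand-in "generator": G(z) = A @ z
z_true = rng.standard_normal(8)    # latent carrying the message bits
stego = A @ z_true                 # received stego signal (lossless channel)

z, lr = np.zeros(8), 0.01
for _ in range(2000):              # minimize ||G(z) - stego||^2 by gradient descent
    grad = 2 * A.T @ (A @ z - stego)
    z -= lr * grad

print(np.max(np.abs(z - z_true)))  # ~0: latent, and thus the bits, recovered
```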
With the rapid advancement of audio generation models, research on audio steganography has entered a new phase of opportunity. Nevertheless, most existing generative steganographic approaches focus primarily on security while neglecting the compression and transcoding processes that are common in real-world communication. This oversight leads to two major issues: the introduction of verification mechanisms would violate its security proof assumptions, and quantization-based compression markedly reduces message extraction accuracy. In this work, we propose a robust audio steganography method that preserves provable security under various compression conditions. The security of our method relies exclusively on the latent space following a fixed distribution, which is independent of the embedded message. The proposed encoding–decoding scheme supports a tunable trade-off between capacity and robustness, allowing the sacrifice of partial capacity to reinforce robustness. Theoretical analysis shows that even with redundant error-checking codes, the latent distribution remains invariant after message embedding, thereby preserving both steganographic security and generation quality while ensuring practical applicability. Experimental results demonstrate that our method maintains message extraction accuracy under both MP3 and AAC compression and re-compression across bitrates of 160 kbps, 128 kbps, 64 kbps, 48 kbps, and even 32 kbps.
Distribution-preserving encoding combined with reversible generative steganography has been a key development in safe data embedding and retrieval. Achieving lossless recovery is difficult with traditional steganographic approaches, since they frequently compromise reversibility and data integrity. In this paper, a novel framework that combines a distribution-preserving encoding technique with reversible generative steganography is proposed. The suggested technique embeds cover images with hidden messages while guaranteeing that the encoded data maintains its statistical characteristics. To ensure imperceptibility and reversibility, our method uses a deep generative model to map the secret message into a latent space while maintaining the data distribution. To recover the original secret message without distortion, a stego-image is created and decoded using an inverse generative model. We perform extensive tests on benchmark datasets to assess the efficacy of our system, demonstrating full reversibility, improved embedding capability, and security. Our methodology offers state-of-the-art performance, ensuring lossless information retrieval and preserving high quality in cover images when compared to conventional methods.
Generative steganography stands as a promising technique for information hiding, primarily due to its remarkable resistance to steganalysis detection. Despite its potential, hiding a secret image using existing generative steganographic models remains a challenge, especially in lossy or noisy communication channels. This paper proposes a robust generative steganography model for hiding full-size images. It rests on three proposed reversible concatenated mappings. The first mapping uses VQGAN with an order-preserving codebook to compress an image into a more concise representation. The second mapping incorporates error correction to further convert the representation into a robust binary representation. The third mapping devises a distribution-preserving sampling mapping that transforms the binary representation into the latent representation. This latent representation is then used as input for a text-to-image Diffusion model, which generates the final stego image. Experimental results show that our proposed scheme can freely customize the stego image content. Moreover, it simultaneously attains high stego and recovery image quality, high robustness, and provable security.
Generative steganography, a novel paradigm in information hiding, has garnered considerable attention for its potential to withstand steganalysis. However, existing generative steganography approaches suffer from the limited visual quality of generated images and are challenging to apply to lossy transmissions in real-world scenarios with unknown channel attacks. To address these issues, this paper proposes a novel robust generative image steganography scheme, facilitating zero-shot text-driven stego image generation without the need for additional training or fine-tuning. Specifically, we employ the popular Stable Diffusion model as the backbone generative network to establish a covert transmission channel. Our proposed framework overcomes the challenges of numerical instability and perturbation sensitivity inherent in diffusion models. Adhering to Kerckhoffs's principle, we propose a novel mapping module based on dual keys to enhance robustness and security under lossy transmission conditions. Experimental results showcase the superior performance of our method in terms of extraction accuracy, robustness, security, and image quality.
The security of private communication is increasingly at risk due to widespread surveillance. Steganography, a technique for embedding secret messages within innocuous carriers, enables covert communication over monitored channels. Provably Secure Steganography (PSS), which ensures computational indistinguishability between the normal model output and steganography output, is the state-of-the-art in this field. However, current PSS methods often require obtaining the explicit distributions of the model. In this paper, we propose a provably secure steganography scheme that only requires a model API that accepts a seed as input. Our core mechanism involves sampling a candidate set of tokens and constructing a map from possible message bit strings to these tokens. The output token is selected by applying this mapping to the real secret message, which provably preserves the original model's distribution. To ensure correct decoding, we address collision cases, where multiple candidate messages map to the same token, by maintaining and strategically expanding a dynamic collision set within a bounded size range. Extensive evaluations of three real-world datasets and three large language models demonstrate that our sampling-based method is comparable with existing PSS methods in efficiency and capacity.
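
A minimal sketch of the sampling-based mechanism this abstract describes, under stated assumptions: sample_token_api is a hypothetical stand-in for a seed-accepting model API whose distribution stays hidden from the sender, and collision handling (the paper's bounded dynamic collision set) is elided.

```python
import random

def sample_token_api(seed: str):
    """Stand-in for a model API that returns one token per seed;
    the weights below are internal to the 'model' and never exposed."""
    rng = random.Random(seed)
    vocab = ["the", "a", "cat", "dog", "sat", "ran"]
    return rng.choices(vocab, weights=[0.3, 0.2, 0.15, 0.15, 0.1, 0.1])[0]

def embed_bits(bits, step_key: str):
    """Draw 2^len(bits) candidates under shared seeds; the bits pick one.
    Every candidate is a genuine model sample, so the emitted token is too."""
    candidates = [sample_token_api(f"{step_key}:{i}") for i in range(2 ** len(bits))]
    return candidates[int("".join(map(str, bits)), 2)], candidates

token, cands = embed_bits([1, 0], step_key="step-0")
# The receiver regenerates `cands` from the shared key and inverts the map;
# bit strings whose candidates collide on the same token are the cases the
# paper resolves by tracking a bounded dynamic collision set.
```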
Generative steganography is a steganography method that uses a generator to convert secret messages into realistic images. It has received widespread attention due to its ability to resist steganalysis. However, existing methods suffer from poor quality of generated stego images and the inability to withstand losses during complex social media transmission processes. In response to these issues, this article proposes a new frequency-domain diffusion generative steganography method that can achieve secure and robust steganography without the need for training or fine-tuning the network. In addition, we also studied the inherent errors in the bidirectional mapping of diffusion models and proposed solutions. The experimental results demonstrate the excellent performance of our method in terms of extraction accuracy, robustness, security, and image quality.
To accommodate covert communications based on generated images, researchers hid secret information into the image generation process in an endogenous manner. Existing methods often degrade model performance by altering the image generation process or model parameters. To address these limitations, we propose a secure diffusion steganography framework in text-guided generative images (SDS-TG) based on Stable Diffusion, embedding secret information into an imperceptible latent space, thereby generating high-quality images without compromising model performance. We establish a reversible mapping mechanism between secret information and standard Gaussian noise, seamlessly integrating it into the conditional image generation process. Additionally, we fine-tune the encoder part of the diffusion models, enhancing the robustness of the steganographic algorithm and improving the accuracy of information extraction, without altering the image generation process. Experiments demonstrate that this approach offers high security and surpasses SOTA methods in terms of robustness and capacity.
Generative models have demonstrated remarkable capabilities in synthesizing realistic content, creating new opportunities for secure communication through steganography---the practice of embedding covert messages within seemingly innocuous data. While prefix-based steganography, which encodes secret messages into shared probability intervals during generative sampling, has emerged as a promising paradigm for provably secure communication, its practical adoption remains constrained by inherent tradeoffs between security, capacity, and efficiency. To address these challenges, we propose two enhancements. The first enhancement optimizes quantization distortion in existing frameworks to minimize KL divergence, thereby enhancing theoretical security. The second redesigns the sampling mechanism via distribution coupling to amplify steganographic capacity, achieving this without incurring substantial computational overhead. Experimental validation on a text generation task confirms that our enhancements substantially outperform previous implementations, demonstrating notable capacity improvements, marked security enhancements, and efficiency gains on consumer-grade hardware. Cross-task comparisons with popular provably secure steganography further establish the proposed enhancements as achieving superior security-capacity-efficiency tradeoffs across diverse generative scenarios, advancing the practical deployment of provably secure steganography systems.
Traditional steganography algorithms use procedures created by human experts to conceal the secret message inside a cover medium. Generative adversarial networks (GANs) have recently been used to automate this process. However, GAN-based steganography has some limitations: the capacity of these models is limited, increasing the steganography capacity decreases security and increases distortion, and the performance of the extractor network also decreases as capacity grows. In this work, an approach for developing a generator model for image steganography is proposed. The approach builds a generator model, called the late embedding generator model, in two stages. The first stage of the generator model uses only the flattened cover image, and the second stage uses a secret message and the first stage's output to generate the stego image. Furthermore, a dual-training strategy is employed to train the generator network: the first stage focuses on learning fundamental image features through a reconstruction loss, and the second stage is trained with three loss terms, including an adversarial loss, to incorporate the secret message. The proposed approach demonstrates that hiding data only in the deeper layers of the generator network boosts capacity without requiring complex architectures, reducing computational and storage requirements. The efficacy of the proposed approach is evaluated by varying the depth of these two stages, resulting in four generator models. A comprehensive set of experiments was performed on the CelebA dataset, which contains more than 200,000 samples. The results show that the late embedding model performs better than the state-of-the-art models. It also increases the steganography capacity to more than four times that of existing GAN-based steganography methods. The extracted payload achieves an accuracy of 99.98%, with the extractor model successfully decoding the secret message.
In the field of digital communications, ensuring the security of sensitive data, especially within images, is of paramount importance. This paper introduces the Steganographic Generative Adversarial Network based on Wasserstein Generative Adversarial Networks with Gradient Penalty (SWGAN-GP). The model comprises three networks: a generator that creates realistic cover images, a discriminator that enhances the authenticity of steganography by comparing steganographed and real images, and a steganalysis model that detects steganographic activities in images. By observing the convergence of the generator's loss function, we found that introducing a gradient penalty into the interpolation between real and steganographed images not only strengthens the model's training stability but also accelerates training convergence. Additionally, the experimental results demonstrate that our model maintains the best image quality and structural integrity after steganography. These findings suggest the great potential of our model in the realm of secure steganographic communications.
In response to the issues of low image quality and insufficient security in traditional image steganography methods, this paper proposes an error-correcting image steganography method based on generative adversarial networks (GAN). During the generation of the stego-image, a global attention mechanism is incorporated, enabling the model to effectively extract image features while the quality of the stego-image continuously improves. The use of Reed-Solomon (RS) error-correcting codes ensures a high accuracy rate for the extraction of secret messages. Experimental studies show that the peak signal-to-noise ratio (PSNR) of the stego-image reaches 44.23 dB and the structural similarity index (SSIM) reaches 0.96, with a certain increase in security as well.
Steganography usually modifies cover media to embed secret data. A new steganographic approach called generative steganography (GS) has emerged recently, in which stego images (images containing secret data) are generated from secret data directly without cover media. However, existing GS schemes are often criticized for their poor performances. In this paper, we propose an advanced generative steganography network (GSN) that can generate realistic stego images without using cover images. We firstly introduce the mutual information mechanism in GS, which helps to achieve high secret extraction accuracy. Our model contains four sub-networks, i.e., an image generator (G), a discriminator (D), a steganalyzer (S), and a data extractor (E). D and S act as two adversarial discriminators to ensure the visual quality and security of generated stego images. E is to extract the hidden secret from generated stego images. The generator G is flexibly constructed to synthesize either cover or stego images with different inputs. It facilitates covert communication by concealing the function of generating stego images in a normal generator. A module named secret block is designed to hide secret data in the feature maps during image generation, with which high hiding capacity and image fidelity are achieved. In addition, a novel hierarchical gradient decay (HGD) skill is developed to resist steganalysis detection. Experiments demonstrate the superiority of our work over existing methods.
Segmentation ambiguity in generative linguistic steganography can induce decoding errors. One existing disambiguation approach removes the tokens whose mapped words are prefixes of others in each candidate pool. However, it neglects the probability distribution of candidates and degrades imperceptibility. To enhance steganographic security while addressing segmentation ambiguity, we propose a secure and disambiguating approach for linguistic steganography. In this letter, we focus on two questions: (1) Which candidate pools should be modified? (2) Which tokens should be retained? First, we propose a secure token-selection principle: the sum of the selected tokens' probabilities is positively correlated with statistical imperceptibility. To meet both disambiguation and optimal security, we present a lightweight disambiguating approach that finds a maximum-weight independent set (MWIS) in a candidate graph only when candidate-level ambiguity occurs. Experiments show that our approach outperforms the existing method on various security metrics, improving statistical imperceptibility by 25.7% and anti-steganalysis capacity by 11.2% on average.
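
The letter's selection step can be made concrete with a small brute-force search: build a conflict graph over the candidate pool, with an edge whenever one token's surface form is a prefix of another's, and keep the independent set of maximum total probability. The pool below is invented for illustration; real candidate pools are small enough that exact search or a standard MWIS solver is feasible.

```python
from itertools import combinations

def prefix_conflict(a, b):
    return a.startswith(b) or b.startswith(a)

def mwis_candidates(pool):
    """pool: list of (token, probability). Returns the conflict-free subset
    of maximum total probability, by exhaustive search over subsets."""
    best, best_w = [], -1.0
    for r in range(1, len(pool) + 1):
        for subset in combinations(pool, r):
            toks = [t for t, _ in subset]
            if any(prefix_conflict(x, y) for x, y in combinations(toks, 2)):
                continue   # subset contains a prefix conflict: skip it
            w = sum(p for _, p in subset)
            if w > best_w:
                best, best_w = list(subset), w
    return best

pool = [("in", 0.30), ("inter", 0.25), ("net", 0.25), ("interne", 0.20)]
print(mwis_candidates(pool))   # [('in', 0.3), ('net', 0.25)]: heaviest prefix-free set
```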
This research introduces a coverless steganography framework that embeds secret messages within the latent space of images produced by a generative adversarial network (GAN). It facilitates clandestine transmission without altering any pre-existing cover media. The sender encodes the message into a 512-dimensional latent vector utilizing binary encoding with a 16-bit header, optionally employing XOR encryption to augment confidentiality. An image generated from the encoded latent vector is conveyed as the carrier, maintaining the coverless principle and enhancing resistance to steganalysis. Experimental findings demonstrate consistent end-to-end recovery for brief messages (e.g., "OK", "AI DEMO"), minimal transmission overhead (sub-kilobyte payloads), and effective decoding (average ≈0.22 s). The approach accommodates messages of up to 37 characters and maintains robustness under JPEG compression within the evaluated parameters. A comprehensive evaluation utilizing many metrics, enhanced by automated visual analytics, offers a clear assessment of security, efficiency, and reconstruction quality. The proposed technique provides a viable, reproducible, and scalable basis for AI-driven coverless covert communication.
Coverless steganography requires no modification of the cover image and can effectively resist steganalysis, which has received widespread attention from researchers in recent years. However, existing coverless image steganographic methods work by constructing a mapping between the secret information and images in a known dataset. This image dataset needs to be sent to the receiver, which consumes substantial resources and poses a risk of information leakage. In addition, existing methods cannot achieve high-accuracy extraction when facing various attacks. To address these issues, we propose a robust generative steganography based on image mapping (GSIM). This method first establishes prompts based on the topic and quantity requirements and then generates the candidate image database according to the prompts; the database can be generated independently by both the sender and receiver without the need for transmission. To improve the robustness of the algorithm, our proposed GSIM utilizes prompts and fractional-order Chebyshev-Fourier moments (FrCHFMs) to construct the mapping between the generated images and the predefined binary sequences, and uses speeded-up robust features (SURFs) as auxiliary features in the information extraction phase. The experimental results show that GSIM is superior to existing coverless image steganographic methods in terms of capacity, security, and robustness.
Existing generative steganography methods suffer from low capacity and poor stego quality. Guidance features derived from a target image are used to translate the original image into a particular one during synthesis; these guidance features are abundant and stable and contain no identity information, which makes them suitable as covers in steganography. This paper proposes a generative image steganography method that uses the guidance features in image synthesis, together with a secret fusion algorithm that addresses the errors in embedding and extracting guidance features. Owing to the robustness of styles and attributes, the embedded guidance features can be extracted directly on the receiver side from the synthesized image without a codebook or database. Compared with existing generative steganography methods, the proposed method achieves higher security and quality while maintaining a larger embedding capacity.
In recent years, linguistic generative steganography has developed greatly. Previous works mainly optimize the perceptual imperceptibility and statistical imperceptibility of the generated steganographic text, and the latest developments show that they can generate steganographic texts that look authentic enough. However, we noticed that these works generally cannot control the semantic expression of the generated steganographic text, and we believe this brings potential security risks. We name this kind of security challenge cognitive imperceptibility and regard it as a new challenge that generative steganography models must strive to overcome. In this letter, we make some preliminary attempts to address this challenge. Experimental results show that the proposed methods can further constrain the semantic expression of the generated steganographic text while ensuring a degree of perceptual and statistical imperceptibility, thereby enhancing its cognitive imperceptibility.
Information hiding is an art and science with a long history and is widely used in covert communication. There are many ways to hide secret data in image, audio, and video. However, relatively few systems can hide information in text. Generative text steganography is a promising topic in natural language text information hiding. Previous generative text steganography methods use a fixed candidate pool generation rule and cannot effectively control the security of the generated text. The conflict between perceptual imperceptibility and statistical imperceptibility also causes the poor quality of the steganographic text generated by previous methods. Moreover, previous generative text steganography approaches barely discuss the robustness of steganographic text. This paper proposes a security-controllable text steganography method that can generate natural-looking steganographic text with a statistical distribution that matches the natural language distribution. The proposed method combines the metrics of perceptual imperceptibility and statistical imperceptibility to calculate a combined distortion. It selects the tokens with the smallest combined distortion to construct a candidate pool at each time step. Moreover, a maximum combined distortion threshold is set when embedding secret messages to ensure controllable security. We conducted several experiments to evaluate the proposed model from the perspectives of embedding rate, perceptual imperceptibility, statistical imperceptibility, and anti-attack ability. The experimental results show that the proposed method can generate smooth and readable steganographic sentences with good resistance to steganalysis and high robustness.
Provably secure steganography ensures indistinguishability between stego and cover carriers through mathematical proofs. However, existing methods face limited embedding capacity and distribution-synchronization challenges, especially at high embedding rates. To address these issues, we propose TASDF-Stega, a text-audio joint steganography method based on the latent space of diffusion models, which achieves high capacity and provable security. First, we design an encrypted steganographic mapping module with adaptive arithmetic decoding, which efficiently embeds secret information into the latent space while preserving the distribution. Second, a reversible secret diffusion mechanism enables high-capacity embedding and precise extraction. Moreover, to resolve the problem of distribution-parameter synchronization in practical communication, we introduce an audio-assisted joint encoding module. This design ensures accurate reconstruction of the diffusion inverse process and avoids cumulative extraction errors. Experimental results on multiple datasets demonstrate that TASDF-Stega achieves provable security and outperforms state-of-the-art methods in embedding capacity and imperceptibility.
In the context of widespread global information sharing, information security and privacy protection have become focal points. Steganographic systems enhance information security by embedding confidential information into public carriers; however, existing generative text steganography methods face challenges in handling the long-tail distribution of candidate word pools, which impacts the imperceptibility of steganographic information. This paper proposes a quality control theory for steganographic text generation based on information entropy constraints, exploring the relationship between the imperceptibility of steganographic texts and information entropy. By controlling the information entropy of the candidate word pool within a specific range, we optimize the imperceptibility of the steganographic text. We establish upper and lower bounds for information entropy and introduce an adaptive truncation method to balance semantic coherence and lexical diversity. Experimental results demonstrate that reasonably controlling the candidate pool size and information entropy thresholds significantly enhances the quality and detection resistance of steganographic texts, showcasing broad application potential in the field of natural language processing.
Most existing text generative steganographic methods are based on coding the conditional probability distribution of each word during the generation process and then selecting specific words according to the secret information, so as to achieve information hiding. Such methods have limitations that may bring potential security risks. First, with an increasing embedding rate, these models choose words with lower conditional probability, which reduces the quality of the generated steganographic texts; second, they cannot control the semantic expression of the final generated steganographic text. This paper proposes a new text generative steganography method that is quite different from existing models. We use a Knowledge Graph (KG) to guide the generation of steganographic sentences. On the one hand, we hide the secret information by coding the path in the knowledge graph rather than the conditional probability of each generated word; on the other hand, we can control the semantic expression of the generated steganographic text to a certain extent. The experimental results show that the proposed model can guarantee both the quality of the generated text and its semantic expression, which is a supplement to and improvement on current text generation steganography.
Text has become one of the most extensively used digital media on the Internet, which provides steganography an effective carrier for confidential message hiding. Nowadays, generation-based linguistic steganography has made a significant breakthrough due to the progress of deep learning. However, previous methods based on recurrent neural networks have two deviations, exposure bias and embedding deviation, which seriously damage the security of steganography. In this article, we propose a novel linguistic steganographic model based on an adaptive probability distribution and a generative adversarial network, which hides secret messages in the generated text while guaranteeing high security performance. First, the steganographic generator is trained using a generative adversarial network to effectively tackle the exposure bias. Second, the candidate pool is obtained by a probability similarity function at each time step, which alleviates the embedding deviation by dynamically maintaining the diversity of the probability distribution. Third, to further improve security, a novel strategy that conducts information embedding during model training is put forward. We design various experiments to verify the performance of the proposed model from different aspects, including imperceptibility, statistical distribution, and anti-steganalysis ability. The results demonstrate that our proposed model outperforms the current state-of-the-art steganographic schemes.
One of the main challenges in distributed learning arises from the difficulty of handling heterogeneous local models and data. In light of the recent success of generative models, we propose to meet this challenge by building on the idea of exchanging synthetic data instead of sharing model parameters. Local models can then be treated as "black boxes" with the ability to learn their parameters from data and to generate data according to these parameters. Moreover, if the local models admit semi-supervised learning, we can extend the approach by enabling local models on different probability spaces. This allows to handle heterogeneous data with different modalities. We formulate the learning of the local models as a cooperative game starting from the principles of game theory. We prove the existence of a unique Nash equilibrium for exponential family local models and show that the proposed learning approach converges to this equilibrium. We demonstrate the advantages of our approach on standard benchmark vision datasets for image classification and conditional generation.
Self-consuming generative models have received significant attention over the last few years. In this paper, we study a self-consuming generative model with heterogeneous preferences that is a generalization of the model in Ferbach et al. (2024). The model is retrained round by round using real data and its previous-round synthetic outputs. The asymptotic behavior of the retraining dynamics is investigated across four regimes using different techniques including the nonlinear Perron--Frobenius theory. Our analyses improve upon that of Ferbach et al. (2024) and provide convergence results in settings where the well-known Banach contraction mapping arguments do not apply. Stability and non-stability results regarding the retraining dynamics are also given.
Generative image steganography has gained significant attention due to its ability to hide secret data during image generation. However, existing generative image steganography methods still face challenges in terms of controllability, usability, and robustness, making them difficult to apply in real-world scenarios. We propose a practical and robust generative image steganography based on Latent Diffusion Models, called LDStega. LDStega takes controllable condition text as input and designs an encoding strategy in the reverse process of the Latent Diffusion Models to couple latent-space generation with data hiding. The encoding strategy selects a sampling interval from a candidate pool of truncated Gaussian distributions guided by the secret data to generate the stego latent space. Subsequently, the stego latent space is fed into the Decoder to generate the stego image. The receiver extracts the secret data from the globally Gaussian distribution of the lossy-reconstructed latent space in the reverse process. Experimental results demonstrate that LDStega achieves high extraction accuracy while controllably generating image content and saving the stego image in the widely used PNG and JPEG formats. Additionally, LDStega outperforms state-of-the-art techniques in resisting common image attacks.
Image steganography is the process of hiding secret data in a cover image by subtle perturbation. Recent studies show that it is feasible to use a fixed neural network for data embedding and extraction. Such Fixed Neural Network Steganography (FNNS) demonstrates favorable performance without the need for training networks, making it more practical for real-world applications. However, the stego-images generated by existing FNNS methods exhibit high distortion, which is prone to detection by steganalysis tools. To deal with this issue, we propose a Cover-separable Fixed Neural Network Steganography, namely Cs-FNNS. In Cs-FNNS, we propose a Steganographic Perturbation Search (SPS) algorithm to directly encode the secret data into an imperceptible perturbation, which is combined with an AI-generated cover image for transmission. By accessing the same deep generative models, the receiver can reproduce the cover image using a pre-agreed key and separate the perturbation from the stego-image for data decoding. Such an encoding/decoding strategy focuses on the secret data and eliminates the disturbance of the cover images, hence achieving better performance. We apply our Cs-FNNS to the steganographic task of hiding secret images within cover images. Through comprehensive experiments, we demonstrate the superior performance of the proposed method in terms of visual quality and undetectability. Moreover, we show the flexibility of our Cs-FNNS in hiding multiple secret images for different receivers. Code is available at https://github.com/albblgb/Cs-FNNS
Steganography is the art and science of hiding secret messages in public communication so that the presence of the secret messages cannot be detected. There are two distribution-preserving steganographic frameworks: one is sampler-based and the other is compression-based. The former requires a perfect sampler that yields data following the same distribution, and the latter needs the explicit distribution of the generative objects. However, these two conditions are too strict and even unrealistic in the traditional data environment; for example, the distribution of natural images is hard to capture. Fortunately, generative models bring new vitality to distribution-preserving steganography, as they can serve as the perfect sampler or provide the explicit distribution of generative media. Taking the text-to-speech generation task as an example, we propose distribution-preserving steganography based on WaveGlow and WaveRNN, corresponding to the two categories above. Steganalysis experiments and theoretical analysis are conducted to demonstrate that the proposed methods preserve the distribution.
Whereas cryptography easily invites attacks because it encrypts a secret message into a suspicious form, steganography is advantageous for its resilience to attacks, concealing the message in an innocent-looking cover signal. Minimal-distortion steganography, one of the mainstream steganography frameworks, embeds messages while minimizing the distortion caused by modifying cover elements. Because the original cover signal is unavailable to the receiver, message embedding is realized by finding the coset leader of the syndrome function of steganographic codes migrated from channel coding, which is complex and has limited performance. Fortunately, deep generative models and the robust semantics of generated data make it possible for the receiver to perfectly reproduce the cover signal from the stego signal. With this advantage, we propose cover-reproducible steganography, where source coding, e.g., arithmetic coding, serves as the steganographic code. Specifically, the decoding process of arithmetic coding is used for message embedding, and its encoding process is regarded as message extraction. Taking text-to-speech and text-to-image synthesis tasks as two examples, we illustrate the feasibility of cover-reproducible steganography. Steganalysis experiments and theoretical analysis demonstrate that the proposed methods outperform existing methods in most cases.
Image steganography, the technique of hiding secret messages within images, has recently advanced with generative image steganography, which hides messages during image creation. However, current generative steganography methods often face criticism for their low extraction accuracy and poor robustness, particularly their vulnerability to JPEG compression. To address these challenges, we propose a novel generative image steganography method based on a text-to-image multimodal generative model (StegaMGM). StegaMGM utilizes the initial random normal distribution in the generative process of latent diffusion models (LDMs): the secret message is hidden in the generated image through message sampling, ensuring it follows the same probability distribution as typical image generation. The content of the stego image can also be controlled through the prompts. On the receiver side, using the shared prompt and diffusion inversion, the secret message can be extracted with high accuracy. In the experimental section, we conduct detailed experiments to demonstrate the advantages of our proposed StegaMGM framework in extraction accuracy, resistance to JPEG compression, and security.
In recent years, generative steganography has witnessed remarkable progress in the field of covert communication. It leverages techniques such as generative adversarial networks (GANs) or flow-based generative models (GLOW) to generate stego images. However, these approaches often grapple with the dilemma of achieving optimal steganographic capacity while ensuring accurate extraction of the hidden information. Additionally, the models occasionally generate low-quality images that are highly vulnerable to detection by steganalysis tools. To tackle these challenges and enhance the overall performance of generative image steganography, this paper proposes the deterministic guided additive diffusion model for generative image steganography (DGADM-GIS). Initially, we devise a reversible mapping function for deterministic guidance by a provided secret message and use it to construct a secret latent Gaussian vector. Moreover, the DGADM-GIS framework designs an additive sampling method based on the superposition principle of the normal distribution to obtain a Gaussian vector whose entries are independent, random, and standard-normally distributed; the diffusion model then transforms this vector into a stego image while maintaining the distribution. Furthermore, we conduct error-analysis experiments on the proposed scheme and derive methods to enhance the accuracy of secret-information extraction. The experimental results show that the proposed steganographic method exhibits robust resistance to steganalysis. When embedding 3 bits of secret information per pixel, it achieves nearly 100% extraction accuracy.
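The additive sampling step relies on the superposition property of Gaussians: the rescaled sum of two independent standard normals is again standard normal. A minimal numerical check (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
z_secret = rng.standard_normal(100_000)  # stands in for the message-derived Gaussian vector
z_fresh = rng.standard_normal(100_000)   # independent randomness added at sampling time
z_stego = (z_secret + z_fresh) / np.sqrt(2.0)  # Var = (1 + 1) / 2 = 1

print(round(z_stego.mean(), 3), round(z_stego.var(), 3))  # ~0.0, ~1.0
```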
Generative linguistic steganography aims to embed information into natural language texts to achieve covert transmission. However, in most current approaches based on subword-supporting language models, the extraction process relies on tokenizing steganographic texts into tokens, which can cause segmentation ambiguity and ultimately lead to incorrect results or outright extraction failures. Although several countermeasures (or disambiguation methods) have been proposed, they are based on removing tokens from candidate pools, which renders them incompatible with preserving imperceptibility and potentially incurs safety risks. To avoid this, we focus on tackling segmentation ambiguity while keeping candidate pools nearly intact. In this paper, we propose a near-imperceptible disambiguating approach via verification for generative linguistic steganography. First, we introduce an all-case extraction method to obtain all possible extracted results. Further, length verification and checksum verification are presented to filter out wrong extracted results caused by segmentation ambiguity. Experiments show that our disambiguating approach outperforms the existing disambiguating approaches on various criteria, including about 23.49% higher embedding capacity, about 23.46% higher imperceptibility, and about 5.73% higher anti-steganalysis capacity of steganographic texts.
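The length and checksum verifications can be pictured as a simple packing scheme (the field sizes and the hash are our illustrative choices, not the paper's):

```python
import hashlib

def pack(payload: bytes) -> bytes:
    """Prepend a 2-byte length and append a 4-byte checksum so that wrong
    tokenizations among the all-case extraction candidates can be rejected."""
    header = len(payload).to_bytes(2, "big")
    checksum = hashlib.sha256(header + payload).digest()[:4]
    return header + payload + checksum

def verify(candidate: bytes):
    """Return the payload if both the length field and checksum match, else None."""
    if len(candidate) < 6:
        return None
    n = int.from_bytes(candidate[:2], "big")
    if len(candidate) != 2 + n + 4:
        return None  # fails length verification
    if hashlib.sha256(candidate[:2 + n]).digest()[:4] != candidate[2 + n:]:
        return None  # fails checksum verification
    return candidate[2:2 + n]

packed = pack(b"secret")
assert verify(packed) == b"secret"
# A candidate from a wrong tokenization is filtered out:
assert verify(packed[:-1] + bytes([packed[-1] ^ 1])) is None
```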
We present the vector quantized diffusion (VQ-Diffusion) model for text-to-image generation. This method is based on a vector quantized variational autoencoder (VQ-VAE) whose latent space is modeled by a conditional variant of the recently developed Denoising Diffusion Probabilistic Model (DDPM). We find that this latent-space method is well-suited for text-to-image generation tasks because it not only eliminates the unidirectional bias with existing methods but also allows us to incorporate a mask-and-replace diffusion strategy to avoid the accumulation of errors, which is a serious problem with existing methods. Our experiments show that the VQ-Diffusion produces significantly better text-to-image generation results when compared with conventional autoregressive (AR) models with similar numbers of parameters. Compared with previous GAN-based text-to-image methods, our VQ-Diffusion can handle more complex scenes and improve the synthesized image quality by a large margin. Finally, we show that the image generation computation in our method can be made highly efficient by reparameterization. With traditional AR methods, the text-to-image generation time increases linearly with the output image resolution and hence is quite time consuming even for normal size images. The VQ-Diffusion allows us to achieve a better trade-off between quality and speed. Our experiments indicate that the VQ-Diffusion model with the reparameterization is fifteen times faster than traditional AR methods while achieving a better image quality. The code and models are available at https://github.com/cientgu/VQ-Diffusion.
Recent progress in diffusion-based visual generation has largely relied on latent diffusion models with variational autoencoders (VAEs). While effective for high-fidelity synthesis, this VAE+diffusion paradigm suffers from limited training efficiency, slow inference, and poor transferability to broader vision tasks. These issues stem from a key limitation of VAE latent spaces: the lack of clear semantic separation and strong discriminative structure. Our analysis confirms that these properties are crucial not only for perception and understanding tasks, but also for the stable and efficient training of latent diffusion models. Motivated by this insight, we introduce SVG, a novel latent diffusion model without variational autoencoders, which leverages self-supervised representations for visual generation. SVG constructs a feature space with clear semantic discriminability by leveraging frozen DINO features, while a lightweight residual branch captures fine-grained details for high-fidelity reconstruction. Diffusion models are trained directly on this semantically structured latent space to facilitate more efficient learning. As a result, SVG enables accelerated diffusion training, supports few-step sampling, and improves generative quality. Experimental results further show that SVG preserves the semantic and discriminative capabilities of the underlying self-supervised representations, providing a principled pathway toward task-general, high-quality visual representations. Code and interpretations are available at https://howlin-wang.github.io/svg/.
In this work, we propose DiffWave, a versatile diffusion probabilistic model for conditional and unconditional waveform generation. The model is non-autoregressive and converts the white-noise signal into a structured waveform through a Markov chain with a constant number of steps at synthesis. It is efficiently trained by optimizing a variant of the variational bound on the data likelihood. DiffWave produces high-fidelity audio in different waveform generation tasks, including neural vocoding conditioned on mel spectrograms, class-conditional generation, and unconditional generation. We demonstrate that DiffWave matches a strong WaveNet vocoder in terms of speech quality (MOS: 4.44 versus 4.43), while synthesizing orders of magnitude faster. In particular, it significantly outperforms autoregressive and GAN-based waveform models in the challenging unconditional generation task in terms of audio quality and sample diversity under various automatic and human evaluations.
Diffusion models have the ability to generate high quality images by denoising pure Gaussian noise images. While previous research has primarily focused on improving the control of image generation through adjusting the denoising process, we propose a novel direction of manipulating the initial noise to control the generated image. Through experiments on stable diffusion, we show that blocks of pixels in the initial latent images have a preference for generating specific content, and that modifying these blocks can significantly influence the generated image. In particular, we show that modifying a part of the initial image affects the corresponding region of the generated image while leaving other regions unaffected, which is useful for repainting tasks. Furthermore, we find that the generation preferences of pixel blocks are primarily determined by their values, rather than their position. By moving pixel blocks with a tendency to generate user-desired content to user-specified regions, our approach achieves state-of-the-art performance in layout-to-image generation. Our results highlight the flexibility and power of initial image manipulation in controlling the generated image.
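A minimal sketch of the block-level manipulation described above, as pure array surgery on the initial latent (shapes and coordinates are illustrative):

```python
import numpy as np

def swap_blocks(latent, src, dst, size):
    """Swap two non-overlapping square blocks of the initial latent. Since
    generation preference follows block *values* rather than position, the
    content associated with each block tends to move with it."""
    (y0, x0), (y1, x1) = src, dst
    a = latent[..., y0:y0 + size, x0:x0 + size].copy()
    latent[..., y0:y0 + size, x0:x0 + size] = latent[..., y1:y1 + size, x1:x1 + size]
    latent[..., y1:y1 + size, x1:x1 + size] = a
    return latent

latent = np.random.default_rng(1).standard_normal((4, 64, 64))  # e.g. a Stable Diffusion initial latent
latent = swap_blocks(latent, src=(0, 0), dst=(32, 32), size=16)
```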
Diffusion models are effective purification methods, where noises or adversarial attacks are removed using generative approaches before pre-existing classifiers conduct classification tasks. However, the efficiency of diffusion models remains a concern, and existing solutions based on knowledge distillation can jeopardize generation quality because of the small number of generation steps. Hence, we propose TendiffPure, a tensorized and compressed diffusion model for purification. Unlike knowledge-distillation methods, we directly compress the U-Net backbones of diffusion models using tensor-train decomposition, which reduces the number of parameters and captures more spatial information in multi-dimensional data such as images. The space complexity is reduced from O(N^2) to O(NR^2) with R ≤ 4 as the tensor-train rank and N as the number of channels. Experimental results show that TendiffPure can more efficiently obtain high-quality purification results and outperforms the baseline purification methods on the CIFAR-10, Fashion-MNIST, and MNIST datasets under two noises and one adversarial attack.
Current research on text-to-image generation has demonstrated a level comparable to that of ordinary painters, but there is still much room for improvement relative to artist-level painting; artist-level paintings typically fuse the features of multiple images into a single image to express multi-level semantic information. In a pre-experiment we confirmed this and consulted three groups with different levels of art-appreciation ability to identify what distinguishes painter-level from artist-level work. These opinions were then used to help AI painting systems advance from ordinary painter-level image generation to artist-level image generation. Specifically, we propose a text-based multi-stage guidance method, requiring no further pre-training, that helps diffusion models move toward multi-level semantic representation in the generated images. Both machine and human evaluations in our experiments verify the effectiveness of the proposed method. Moreover, unlike previous single-stage guidance methods, our method can control the degree to which each image feature is expressed in the painting by controlling the number of guidance steps between stages.
Steganography embeds confidential data within seemingly innocuous communications. Provable security in steganography, a long-sought goal, has become feasible with deep generative models. However, existing methods face a critical trade-off between security and efficiency. This paper introduces SparSamp, an efficient provably secure steganography method based on sparse sampling. SparSamp embeds messages by combining them with pseudo-random numbers to obtain message-derived random numbers for sampling. It enhances extraction accuracy and embedding capacity by increasing the sampling intervals and making the sampling process sparse. SparSamp preserves the original probability distribution of the generative model, thus ensuring security. It introduces only $O(1)$ additional complexity per sampling step, enabling the fastest embedding speed without compromising generation speed. SparSamp is designed to be plug-and-play; message embedding can be achieved by simply replacing the sampling component of an existing generative model with SparSamp. We implemented SparSamp in text, image, and audio generation models. It can achieve embedding speeds of up to 755 bits/second with GPT-2, 5,046 bits/second with DDPM, and 9,223 bits/second with WaveRNN.
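The core trick, combining message bits with a keyed pseudo-random mask so that the number driving the sampler stays uniform, can be sketched as follows (the key derivation and 16-bit block size are our assumptions, not SparSamp's exact construction):

```python
import hashlib

K = 16  # message bits consumed per sampling step

def mask(key: bytes, step: int) -> int:
    """Keyed pseudo-random K-bit mask, reproducible by the receiver."""
    return int.from_bytes(hashlib.sha256(key + step.to_bytes(4, "big")).digest()[:2], "big")

def message_derived_uniform(msg_block: int, key: bytes, step: int) -> float:
    """XOR of message bits with the mask is uniform on [0, 1) whenever the
    mask is, so sampling with it leaves the model distribution untouched."""
    return (msg_block ^ mask(key, step)) / 2 ** K

def recover_message(r: int, key: bytes, step: int) -> int:
    """The receiver re-derives the mask and undoes the XOR, assuming the
    sampling step let it recover r exactly (the 'sparse' part of the method)."""
    return r ^ mask(key, step)

msg_block = 0b1011001110001111
u = message_derived_uniform(msg_block, b"shared-key", step=7)
r = int(u * 2 ** K)
assert recover_message(r, b"shared-key", step=7) == msg_block
```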
The technique of hiding secret messages within seemingly harmless covertext to evade examination by censors with rigorous security proofs is known as provably secure steganography (PSS). PSS evolves from symmetric key steganography to public-key steganography, functioning without the requirement of a pre-shared key and enabling the extension to multi-party covert communication and identity verification mechanisms. Recently, a public-key steganography method based on elliptic curves was proposed, which uses point compression to eliminate the algebraic structure of curve points. However, this method has strict requirements on the curve parameters and is only available on half of the points. To overcome these limitations, this paper proposes a more general elliptic curve public key steganography method based on admissible encoding. By applying the tensor square function to the known well-distributed encoding, we construct admissible encoding, which can create the pseudo-random public-key encryption function. The theoretical analysis and experimental results show that the proposed provable secure public-key steganography method can be deployed on all types of curves and utilize all points on the curve.
Generative linguistic steganography mainly utilizes language models and applies steganographic sampling (stegosampling) to generate high-security steganographic text (stegotext). However, previous methods generally lead to statistical differences between the conditional probability distributions of stegotext and natural text, which brings about security risks. In this paper, to further ensure security, we present a novel provably secure generative linguistic steganographic method, ADG, which recursively embeds secret information by Adaptive Dynamic Grouping of tokens according to their probability given by an off-the-shelf language model. We not only prove the security of ADG mathematically, but also conduct extensive experiments on three public corpora to further verify its imperceptibility. The experimental results reveal that the proposed method is able to generate stegotext with nearly perfect security.
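A simplified, non-recursive picture of probability-balanced grouping (ADG's actual grouping is adaptive and recursive, but the balancing intuition is the same):

```python
def balanced_groups(probs, nbits):
    """Partition token indices into 2**nbits groups of near-equal total mass;
    the next nbits secret bits select a group, and the token is then sampled
    from that group's renormalized distribution."""
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    groups = [[] for _ in range(2 ** nbits)]
    mass = [0.0] * len(groups)
    for i in order:  # greedy: always feed the lightest group
        g = mass.index(min(mass))
        groups[g].append(i)
        mass[g] += probs[i]
    return groups, mass

probs = [0.30, 0.22, 0.18, 0.12, 0.08, 0.05, 0.03, 0.02]
groups, mass = balanced_groups(probs, nbits=2)
print(groups, [round(m, 2) for m in mass])  # four groups with masses ~0.25 each
```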
Steganography is the task of concealing a message within a medium such that the presence of the hidden message cannot be detected. Though the prospect of steganography is conceivably interesting in many contexts, and though work has been done both towards formalizing steganographic security and providing provably secure constructions, little work exists attempting to provide efficient and provably secure steganographic schemes in specific, useful domains. Beginning from the starting point of the initial definition of steganographic security, I have engaged in an exploration which has developed to include two primary tasks, both pointing towards the realization of efficient and secure steganographic systems in practice: (a) investigating the syntactic and semantic applicability of the current formalism of steganographic security to a broader range of potentially interesting domains and (b) constructing and implementing provably secure (symmetric-key) steganographic schemes in domains which are well-suited to the current formalism.
Recent research in provably secure neural linguistic steganography has overlooked a crucial aspect: the sender must detokenize stegotexts to avoid raising suspicion from the eavesdropper. The segmentation ambiguity problem, which arises when using language models based on subwords, leads to occasional decoding failures in all neural language steganography implementations based on these models. Current solutions to this issue involve altering the probability distribution of candidate words, rendering them incompatible with provably secure steganography. We propose a novel secure disambiguation method named SyncPool, which effectively addresses the segmentation ambiguity problem. We group all tokens with prefix relationships in the candidate pool before the steganographic embedding algorithm runs to eliminate uncertainty among ambiguous tokens. To enable the receiver to synchronize the sampling process of the sender, a shared cryptographically-secure pseudorandom number generator (CSPRNG) is deployed to select a token from the ambiguity pool. SyncPool does not change the size of the candidate pool or the distribution of tokens and thus is applicable to provably secure language steganography methods. We provide theoretical proofs and experimentally demonstrate the applicability of our solution to various languages and models, showing its potential to significantly improve the reliability and security of neural linguistic steganography systems.
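A simplified sketch of pooling prefix-ambiguous candidates and resolving each pool with a shared keyed PRNG (HMAC stands in for the CSPRNG; the paper's pooling handles transitive prefix chains more carefully):

```python
import hashlib
import hmac

def pool_prefix_ambiguous(candidates):
    """Merge candidates related by a prefix relation into pools, so the
    embedding step never has to distinguish ambiguous tokens."""
    pools, used = [], set()
    for tok, _ in candidates:
        if tok in used:
            continue
        members = [(t, q) for t, q in candidates
                   if t not in used and (t.startswith(tok) or tok.startswith(t))]
        used.update(t for t, _ in members)
        pools.append(members)
    return pools

def resolve(members, key: bytes, step: int):
    """Both sides draw the same uniform number from a shared keyed PRNG and
    pick proportionally, so the receiver can re-run the exact same choice."""
    d = hmac.new(key, step.to_bytes(4, "big"), hashlib.sha256).digest()
    u = int.from_bytes(d[:8], "big") / 2 ** 64
    total = sum(q for _, q in members)
    acc = 0.0
    for tok, q in members:
        acc += q / total
        if u < acc:
            return tok
    return members[-1][0]

cands = [("ban", 0.25), ("banana", 0.15), ("cat", 0.40), ("ca", 0.20)]
pools = pool_prefix_ambiguous(cands)  # [("ban", "banana"), ("cat", "ca")]
print([resolve(m, b"shared-key", step=0) for m in pools])
```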
We propose a robust and provably secure image steganography framework based on latent-space iterative optimization. Within this framework, the receiver treats the transmitted image as a fixed reference and iteratively refines a latent variable to minimize the reconstruction error, thereby improving message extraction accuracy. Unlike prior methods, our approach preserves the provable security of the embedding while markedly enhancing robustness under various compression and image processing scenarios. On benchmark datasets, the experimental results demonstrate that the proposed iterative optimization not only improves robustness against image compression while preserving provable security, but can also be applied as an independent module to further reinforce robustness in other provably secure steganographic schemes. This highlights the practicality and promise of latent-space optimization for building reliable, robust, and secure steganographic systems.
The rapid development of image generation models has facilitated the widespread dissemination of generated images on social networks, creating favorable conditions for provably secure image steganography. However, existing methods face issues such as low quality of generated images and lack of semantic control in the generation process. To leverage provably secure steganography with more effective and high-performance image generation models, and to ensure that stego images can accurately extract secret messages even after being uploaded to social networks and subjected to lossy processing such as JPEG compression, we propose a high-quality, provably secure, and robust image steganography method based on state-of-the-art autoregressive (AR) image generation models using Vector-Quantized (VQ) tokenizers. Additionally, we employ a cross-modal error-correction framework that generates stego text from stego images to aid in restoring lossy images, ultimately enabling the extraction of secret messages embedded within the images. Extensive experiments have demonstrated that the proposed method provides advantages in stego quality, embedding capacity, and robustness, while ensuring provable undetectability.
To circumvent the unbridled and ever-encroaching surveillance and censorship in cyberspace, steganography has garnered attention for its ability to hide private information in innocent-looking carriers. Current provably secure steganography approaches require a pair of encoder and decoder to hide and extract private messages, both of which must run the same model with the same input to obtain identical distributions. These requirements pose significant challenges to the practical implementation of steganography, including limited access to powerful hardware and the intolerance of any changes to the shared input. To relax the hardware limitation and solve the challenge of the vulnerable shared input, a novel and practically significant scenario with asymmetric resources should be considered, where only the encoder is high-resource and has access to powerful models, while the decoder can only read the steganographic carriers without any other model input. This paper proposes a novel provably robust and secure steganography framework for the asymmetric-resource setting. Specifically, the encoder uses various permutations of the distribution to hide secret bits, while the decoder relies on a sampling function to extract the hidden bits by guessing the permutation used. Further, the sampling function only takes the steganographic carrier as input, which makes the decoder independent of the model's input and of the model itself. A comprehensive assessment of applying our framework to generative models substantiates its effectiveness. Our implementation demonstrates robustness when transmitting over binary symmetric channels with errors.
Recent provably secure linguistic steganography (PSLS) methods rely on mainstream autoregressive language models (ARMs) to address historically challenging tasks, that is, to disguise covert communication as ``innocuous'' natural language communication. However, due to the characteristic of sequential generation of ARMs, the stegotext generated by ARM-based PSLS methods will produce serious error propagation once it changes, making existing methods unavailable under an active tampering attack. To address this, we propose a robust, provably secure linguistic steganography with diffusion language models (DLMs). Unlike ARMs, DLMs can generate text in a partially parallel manner, allowing us to find robust positions for steganographic embedding that can be combined with error-correcting codes. Furthermore, we introduce error correction strategies, including pseudo-random error correction and neighborhood search correction, during steganographic extraction. Theoretical proof and experimental results demonstrate that our method is secure and robust. It can resist token ambiguity in stegotext segmentation and, to some extent, withstand token-level attacks of insertion, deletion, and substitution.
Secure covert communication in hostile environments requires simultaneously achieving invisibility, provable security guarantees, and robustness against informed adversaries. This paper presents a novel hybrid steganographic framework that unites cover synthesis and cover modification within a unified multichannel protocol. A secret-seeded PRNG drives a lightweight Markov-chain generator to produce contextually plausible cover parameters, which are then masked with the payload and dispersed across independent channels. The masked bit-vector is imperceptibly embedded into conventional media via a variance-aware least-significant-bit algorithm, ensuring that statistical properties remain within natural bounds. We formalize a multichannel adversary model (MC-ATTACK) and prove that, under standard security assumptions, the adversary's distinguishing advantage is negligible, thereby guaranteeing both confidentiality and integrity. Empirical results corroborate these claims: local-variance-guided embedding yields near-lossless extraction (mean BER $<5\times10^{-3}$, correlation $>0.99$) with minimal perceptual distortion (PSNR $\approx 100$ dB, SSIM $>0.99$), while key-based masking drives extraction success to zero (BER $\approx 0.5$) for a fully informed adversary. Comparative analysis demonstrates that purely distortion-free or invertible schemes fail under the same threat model, underscoring the necessity of hybrid designs. The proposed approach advances high-assurance steganography by delivering an efficient, provably secure covert channel suitable for deployment in high-surveillance networks.
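A minimal version of the variance-aware LSB step (the block size and variance threshold are illustrative choices, not the paper's parameters):

```python
import numpy as np

def variance_mask(img, block=8, thresh=25.0):
    """Mark only high-local-variance blocks as embeddable: LSB flips there
    stay within the cover's natural statistical bounds."""
    mask = np.zeros(img.shape, dtype=bool)
    h, w = img.shape
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            if img[y:y + block, x:x + block].astype(float).var() > thresh:
                mask[y:y + block, x:x + block] = True
    return mask

def embed_lsb(img, bits):
    """Write bits into the least significant bit of embeddable pixels."""
    stego = img.copy()
    ys, xs = np.nonzero(variance_mask(img))
    for (y, x), b in zip(zip(ys, xs), bits):
        stego[y, x] = (stego[y, x] & 0xFE) | b
    return stego

img = np.random.default_rng(3).integers(0, 256, (64, 64), dtype=np.uint8)
stego = embed_lsb(img, [1, 0, 1, 1, 0, 0, 1, 0])
```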
This paper focuses on several theoretical issues and principles in steganography security, and defines four security levels by analyzing the corresponding algorithm instances. In the theoretical analysis, we discuss the differences between steganography security and watermarking security. The two necessary conditions for the steganography security are obtained. Under the current technology situation, we then analyze the indistinguishability of the cover and stego-cover, and consider that the steganography security should rely on the key secrecy with algorithms open. By specifying the role of key in steganography, the necessary conditions for a secure steganography algorithm in theory are formally presented. When analyzing the security instances, we have classified the steganalysis attacks according to their variable access to the steganography system, and then defined the four security levels. The higher level security one has, the higher level attacks one can resist. We have also presented algorithm instances based on current technical conditions, and analyzed their data hiding process, security level, and practice requirements.
Alkaid: Resilience to Edit Errors in Provably Secure Steganography via Distance-Constrained Encoding
While provably secure steganography provides strong concealment by ensuring stego carriers are indistinguishable from natural samples, such systems remain vulnerable to real-world edit errors (e.g., insertions, deletions, substitutions) because their decoding depends on perfect synchronization and lacks error-correcting capability. To bridge this gap, we propose Alkaid, a provably secure steganographic scheme resilient to edit errors via distance-constrained encoding. The key innovation integrates the minimum distance decoding principle directly into the encoding process by enforcing a strict lower bound on the edit distance between codewords of different messages. Specifically, if two candidate codewords violate this bound, they are merged to represent the same message, thereby guaranteeing reliable recovery. While maintaining provable security, we theoretically prove that Alkaid offers deterministic robustness against bounded errors. To implement this scheme efficiently, we adopt block-wise and batch processing. Extensive experiments demonstrate that Alkaid achieves decoding success rates of 99% to 100% across diverse error channels, delivers a payload of 0.2 bits per token for high embedding capacity, and maintains an encoding speed of 6.72 bits per second, significantly surpassing state-of-the-art (SOTA) methods in robustness, capacity, and efficiency.
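The distance-constrained encoding can be pictured as a greedy merge under edit distance (a sketch of the principle only; Alkaid's actual construction and block-wise processing are more elaborate):

```python
def edit_distance(a, b):
    """Standard single-row Levenshtein distance."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            cur = min(dp[j] + 1,          # deletion
                      dp[j - 1] + 1,      # insertion
                      prev + (ca != cb))  # substitution / match
            prev, dp[j] = dp[j], cur
    return dp[-1]

def merge_codewords(codewords, dmin):
    """Greedy merge: codewords closer than dmin are forced to share a message,
    aiming to keep distinct classes at least dmin apart so minimum-distance
    decoding can tolerate up to (dmin - 1) // 2 edits."""
    classes = []
    for w in codewords:
        for cls in classes:
            if any(edit_distance(w, v) < dmin for v in cls):
                cls.append(w)
                break
        else:
            classes.append([w])
    return classes

def decode(received, classes):
    """Minimum-distance decoding over the merged classes."""
    return min(range(len(classes)),
               key=lambda k: min(edit_distance(received, v) for v in classes[k]))

classes = merge_codewords(["aaaa", "aaab", "bbbb", "abab"], dmin=2)
assert decode("aaba", classes) == decode("aaaa", classes)  # one edit, same class
```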
Steganography is the practice of encoding secret information into innocuous content in such a manner that an adversarial third party would not realize that there is hidden meaning. While this problem has classically been studied in security literature, recent advances in generative models have led to a shared interest among security and machine learning researchers in developing scalable steganography techniques. In this work, we show that a steganography procedure is perfectly secure under Cachin (1998)'s information-theoretic model of steganography if and only if it is induced by a coupling. Furthermore, we show that, among perfectly secure procedures, a procedure maximizes information throughput if and only if it is induced by a minimum entropy coupling. These insights yield what are, to the best of our knowledge, the first steganography algorithms to achieve perfect security guarantees for arbitrary covertext distributions. To provide empirical validation, we compare a minimum entropy coupling-based approach to three modern baselines -- arithmetic coding, Meteor, and adaptive dynamic grouping -- using GPT-2, WaveRNN, and Image Transformer as communication channels. We find that the minimum entropy coupling-based approach achieves superior encoding efficiency, despite its stronger security constraints. In aggregate, these results suggest that it may be natural to view information-theoretic steganography through the lens of minimum entropy coupling.
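For intuition, here is a common greedy approximation of a minimum entropy coupling, repeatedly matching the largest remaining masses of the two marginals (the paper's algorithms come with stronger guarantees; this only illustrates what a coupling is):

```python
def greedy_coupling(p, q, eps=1e-12):
    """Build a joint distribution with the given marginals while greedily
    concentrating mass, a standard approximation to minimum entropy coupling."""
    p = [[v, i] for i, v in enumerate(p)]
    q = [[v, j] for j, v in enumerate(q)]
    joint = {}
    while p and q:
        p.sort(reverse=True)
        q.sort(reverse=True)
        (pv, i), (qv, j) = p[0], q[0]
        m = min(pv, qv)
        joint[(i, j)] = joint.get((i, j), 0.0) + m
        p[0][0] -= m
        q[0][0] -= m
        p = [t for t in p if t[0] > eps]
        q = [t for t in q if t[0] > eps]
    return joint

# Couple a message distribution p with a covertext distribution q:
joint = greedy_coupling([0.5, 0.5], [0.6, 0.3, 0.1])
print(joint)  # a mass table whose row/column sums reproduce p and q
```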
Until now, the discussion of perfect security for steganographic systems has remained confined to the realm of mathematicians and information-theory experts, whose concise, symbolic presentation of their philosophies, postulates, and inferences has made it hard for naïve academics to gain insight into the concepts. This paper is an endeavor not only to appraise the limitations of one such pioneering formulation but also to illustrate a pitfall in another scheme that claims perfect security without the use of a public or secret key. These goals are accomplished by contrasting test results of a steganographic scheme that exploits English words with corresponding acronyms for hiding bits of secret information in chat, a preferred way to exchange messages these days. The misapprehension about perfect security and the governing role of the stego key in the bit-embedding process are exposed, respectively, by launching elementary chosen-message and chosen-cover attacks, and through a proposed enhancement of the target scheme.
With the rapid development of deep learning, existing generative text steganography methods based on autoregressive models have achieved success. However, these autoregressive steganography approaches have certain limitations. Firstly, existing methods require encoding candidate words according to their output probability and generating each stego word one by one, which makes the generation process time-consuming. Secondly, encoding and selecting candidate words changes the sampling probabilities, resulting in poor imperceptibility of the stego text. Thirdly, existing methods have low robustness and cannot resist replacement attacks. To address these issues, we propose a generative text steganography method based on a diffusion model (GTSD), which improves generation speed, robustness, and imperceptibility while maintaining security. Specifically, a novel steganography scheme based on a diffusion model is proposed to embed secret information through prompt mapping and batch mapping. The prompt mapping maps secret information into a conditional prompt to guide the pre-trained diffusion model in generating batches of candidate sentences. The batch mapping selects the stego text from the batches of candidate sentences based on the secret information. Extensive experiments show that the GTSD outperforms the SOTA method in terms of generation speed, robustness, and imperceptibility while maintaining comparable anti-steganalysis performance. Moreover, we verify that the GTSD has strong potential: embedding capacity is positively correlated with prompt capacity and model batch size while maintaining security.
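The two mappings amount to plain bit-slicing over the secret stream (the prompt-table and batch sizes are illustrative):

```python
import math

def split_secret(bits, n_prompts, batch_size):
    """Prompt mapping: the first log2(n_prompts) bits pick the conditional
    prompt. Batch mapping: the next log2(batch_size) bits pick which of the
    generated candidate sentences is actually sent."""
    kp = int(math.log2(n_prompts))
    kb = int(math.log2(batch_size))
    prompt_idx = int("".join(map(str, bits[:kp])), 2)
    batch_idx = int("".join(map(str, bits[kp:kp + kb])), 2)
    return prompt_idx, batch_idx, bits[kp + kb:]

bits = [1, 0, 1, 1, 0, 1, 1, 0]
prompt_idx, batch_idx, rest = split_secret(bits, n_prompts=16, batch_size=8)
print(prompt_idx, batch_idx, rest)  # 11, 3, [0]
```

The receiver, who can regenerate the same candidate batches, inverts both indices back into bits, which is why embedding capacity grows with the prompt table and the batch size, as the abstract notes.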
In this paper, a novel data-driven information hiding scheme called generative steganography by sampling (GSS) is proposed. Unlike in traditional modification-based steganography, in our method the stego image is directly sampled by a powerful generator: no explicit cover is used. Both parties share a secret key used for message embedding and extraction. The Jensen-Shannon divergence is introduced as a new criterion for evaluating the security of generative steganography. Based on these principles, we propose a simple practical generative steganography method that uses semantic image inpainting. The message is written in advance to an uncorrupted region that needs to be retained in the corrupted image. Then, the corrupted image with the secret message is fed into a generator trained by a generative adversarial network (GAN) for semantic completion. Message-loss and prior-loss terms are proposed to penalize message-extraction error and unrealistic stego images. In our design, we first train a generator whose training target is the generation of new data samples from the same distribution as the existing training data. Next, for the trained generator, backpropagation on the message and prior losses is used to optimize the coding of the generator's input noise. The presented experiments demonstrate the potential of the proposed framework based on both qualitative and quantitative evaluations of the generated stego images.
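A toy end-to-end rendering of the two-loss noise optimization, with frozen linear stand-ins for the trained generator and the realism prior (all names and hyperparameters are illustrative, not the paper's):

```python
import torch

torch.manual_seed(0)
G = torch.nn.Linear(16, 64)               # stand-in for the trained, frozen GAN generator
D = torch.nn.Linear(64, 1)                # stand-in for the realism prior / discriminator
for p in (*G.parameters(), *D.parameters()):
    p.requires_grad_(False)

msg = torch.randint(0, 2, (8,)).float()   # bits written into the region to be kept
keep = torch.arange(8)                    # indices of the uncorrupted (message) region

z = torch.randn(16, requires_grad=True)   # only the generator's input noise is optimized
opt = torch.optim.Adam([z], lr=1e-2)
for _ in range(300):
    x = G(z)
    message_loss = torch.nn.functional.binary_cross_entropy_with_logits(x[keep], msg)
    prior_loss = -D(x).mean()             # penalize "unrealistic" outputs under the toy prior
    loss = message_loss + 0.1 * prior_loss
    opt.zero_grad()
    loss.backward()
    opt.step()

recovered = (G(z)[keep] > 0).float()
print((recovered == msg).float().mean().item())  # extraction accuracy on the toy setup
```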
Generative steganography (GS) directly generates stego-media through secret-message-driven generation. This gives GS a higher hiding capacity than traditional steganography and makes it more resistant to classical steganalysis. However, the generators and extractors of existing GS methods can only target specific formats and types of data and lack universality. Besides, the model size is usually tied to the underlying grid resolution, and the transmission of the extractor is susceptible to steganalysis suspicion. Implicit neural representation (INR) is a technique for representing data in a continuous manner. Inspired by this, we propose an INR-based generative steganography by point cloud representation (INR-GSPC). By using a function generator, we avoid the generator model size growing exponentially with the size of the gridded data, enabling the generation of a wide range of data types and breaking through the resolution limitation. To unify the data formats of the generator and the message extractor, the data is converted to a point cloud representation. We design and fix a point-cloud message extractor, and generate stego-media by iterating over the point cloud while adding small perturbations. This method avoids the training and transmission of the message extractor. To the best of our knowledge, this is the first method to apply point clouds to generative steganography. Experiments demonstrate that the stego-images generated by the scheme have an average PSNR value of more than 65, and the accuracy of message extraction reaches more than 99%.
Generative steganography (GS) is an emerging technique that generates stego images directly from secret data. Various GS methods based on GANs or Flow have been developed recently. However, existing GAN-based GS methods cannot completely recover the hidden secret data due to the lack of network invertibility, while Flow-based methods produce poor image quality due to the stringent reversibility restriction in each module. To address this issue, we propose a novel GS scheme called "Generative Steganography Diffusion" (GSD) by devising an invertible diffusion model named "StegoDiffusion". It not only generates realistic stego images but also allows for 100% recovery of the hidden secret data. The proposed StegoDiffusion model leverages a non-Markov chain with a fast sampling technique to achieve efficient stego image generation. By constructing an ordinary differential equation (ODE) based on the transition probability of the generation process in StegoDiffusion, secret data and stego images can be converted to each other through the approximate solver of the ODE, the Euler iteration formula, enabling the use of irreversible but more expressive network structures to achieve model invertibility. Our proposed GSD has the advantages of both reversibility and high performance, significantly outperforming existing GS methods in all metrics.
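The invertibility argument rests on the Euler solver being approximately reversible: each update can be undone by stepping the same drift in the opposite direction. A schematic, model-free sketch (the toy linear drift replaces the learned network):

```python
def euler_forward(x, drift, ts, dt):
    """Generate: integrate the probability-flow ODE from input to output."""
    for t in ts:
        x = x + drift(x, t) * dt
    return x

def euler_backward(x, drift, ts, dt):
    """Invert: undo each step with the same drift. Exact only in the
    small-step limit, since the drift is re-evaluated at the stepped point."""
    for t in reversed(ts):
        x = x - drift(x, t) * dt
    return x

drift = lambda x, t: -0.5 * x  # toy linear drift standing in for the network
ts = [i * 0.01 for i in range(100)]
x0 = 1.234
x1 = euler_forward(x0, drift, ts, dt=0.01)
print(abs(euler_backward(x1, drift, ts, dt=0.01) - x0))  # small, not exactly zero
```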
Diffusion models have achieved remarkable progress across various visual generation tasks. However, their performance significantly declines when generating content at resolutions higher than those used during training. Although numerous methods have been proposed to enable high-resolution generation, they all suffer from inefficiency. In this paper, we propose RectifiedHR, a straightforward and efficient solution for training-free high-resolution synthesis. Specifically, we propose a noise refresh strategy that unlocks the model's training-free high-resolution synthesis capability and improves efficiency. Additionally, we are the first to observe the phenomenon of energy decay, which may cause image blurriness during the high-resolution synthesis process. To address this issue, we introduce average latent energy analysis and find that tuning the classifier-free guidance hyperparameter can significantly improve generation performance. Our method is entirely training-free and demonstrates efficient performance. Furthermore, we show that RectifiedHR is compatible with various diffusion model techniques, enabling advanced features such as image editing, customized generation, and video synthesis. Extensive comparisons with numerous baseline methods validate the superior effectiveness and efficiency of RectifiedHR.
There has been tremendous progress in large-scale text-to-image synthesis driven by diffusion models enabling versatile downstream applications such as 3D object synthesis from texts, image editing, and customized generation. We present a generic approach using latent diffusion models as powerful image priors for various visual synthesis tasks. Existing methods that utilize such priors fail to use these models' full capabilities. To improve this, our core ideas are 1) a feature matching loss between features from different layers of the decoder to provide detailed guidance and 2) a KL divergence loss to regularize the predicted latent features and stabilize the training. We demonstrate the efficacy of our approach on three different applications, text-to-3D, StyleGAN adaptation, and layered image editing. Extensive results show our method compares favorably against baselines.
While existing anomaly synthesis methods have made remarkable progress, achieving both realism and diversity in synthesis remains a major obstacle. To address this, we propose AnomalyPainter, a zero-shot framework that breaks the diversity-realism trade-off dilemma through synergizing Vision Language Large Model (VLLM), Latent Diffusion Model (LDM), and our newly introduced texture library Tex-9K. Tex-9K is a professional texture library containing 75 categories and 8,792 texture assets crafted for diverse anomaly synthesis. Leveraging VLLM's general knowledge, reasonable anomaly text descriptions are generated for each industrial object and matched with relevant diverse textures from Tex-9K. These textures then guide the LDM via ControlNet to paint on normal images. Furthermore, we introduce Texture-Aware Latent Init to stabilize the natural-image-trained ControlNet for industrial images. Extensive experiments show that AnomalyPainter outperforms existing methods in realism, diversity, and generalization, achieving superior downstream performance.
This paper contributes to the "BraTS 2024 Brain MR Image Synthesis Challenge" and presents a conditional Wavelet Diffusion Model (cWDM) for directly solving a paired image-to-image translation task on high-resolution volumes. While deep learning-based brain tumor segmentation models have demonstrated clear clinical utility, they typically require MR scans from various modalities (T1, T1ce, T2, FLAIR) as input. However, due to time constraints or imaging artifacts, some of these modalities may be missing, hindering the application of well-performing segmentation algorithms in clinical routine. To address this issue, we propose a method that synthesizes one missing modality image conditioned on three available images, enabling the application of downstream segmentation models. We treat this paired image-to-image translation task as a conditional generation problem and solve it by combining a Wavelet Diffusion Model for high-resolution 3D image synthesis with a simple conditioning strategy. This approach allows us to directly apply our model to full-resolution volumes, avoiding artifacts caused by slice- or patch-wise data processing. While this work focuses on a specific application, the presented method can be applied to all kinds of paired image-to-image translation problems, such as CT $\leftrightarrow$ MR and MR $\leftrightarrow$ PET translation, or mask-conditioned anatomically guided image generation.
Modeling sounds emitted from physical object interactions is critical for immersive perceptual experiences in real and virtual worlds. Traditional methods of impact sound synthesis use physics simulation to obtain a set of physics parameters that could represent and synthesize the sound. However, they require fine details of both the object geometries and impact locations, which are rarely available in the real world and cannot be applied to synthesize impact sounds from common videos. On the other hand, existing video-driven deep learning-based approaches could only capture the weak correspondence between visual content and impact sounds since they lack physics knowledge. In this work, we propose a physics-driven diffusion model that can synthesize high-fidelity impact sound for a silent video clip. In addition to the video content, we propose to use additional physics priors to guide the impact sound synthesis procedure. The physics priors include both physics parameters that are directly estimated from noisy real-world impact sound examples without sophisticated setup and learned residual parameters that interpret the sound environment via neural networks. We further implement a novel diffusion model with specific training and inference strategies to combine physics priors and visual information for impact sound synthesis. Experimental results show that our model outperforms several existing systems in generating realistic impact sounds. More importantly, the physics-based representations are fully interpretable and transparent, thus enabling us to perform sound editing flexibly.
Graph generation has been dominated by autoregressive models due to their simplicity and effectiveness, despite their sensitivity to ordering. Yet diffusion models have garnered increasing attention, as they offer comparable performance while being permutation-invariant. Current graph diffusion models generate graphs in a one-shot fashion, but they require extra features and thousands of denoising steps to achieve optimal performance. We introduce PARD, a Permutation-invariant Auto Regressive Diffusion model that integrates diffusion models with autoregressive methods. PARD harnesses the effectiveness and efficiency of the autoregressive model while maintaining permutation invariance without ordering sensitivity. Specifically, we show that contrary to sets, elements in a graph are not entirely unordered and there is a unique partial order for nodes and edges. With this partial order, PARD generates a graph in a block-by-block, autoregressive fashion, where each block's probability is conditionally modeled by a shared diffusion model with an equivariant network. To ensure efficiency while being expressive, we further propose a higher-order graph transformer, which integrates transformer with PPGN. Like GPT, we extend the higher-order graph transformer to support parallel training of all blocks. Without any extra features, PARD achieves state-of-the-art performance on molecular and non-molecular datasets, and scales to large datasets like MOSES containing 1.9M molecules. Pard is open-sourced at https://github.com/LingxiaoShawn/Pard.
In this work, we investigate an intriguing and prevalent phenomenon of diffusion models which we term as "consistent model reproducibility": given the same starting noise input and a deterministic sampler, different diffusion models often yield remarkably similar outputs. We confirm this phenomenon through comprehensive experiments, implying that different diffusion models consistently reach the same data distribution and scoring function regardless of diffusion model frameworks, model architectures, or training procedures. More strikingly, our further investigation implies that diffusion models are learning distinct distributions affected by the training data size. This is supported by the fact that the model reproducibility manifests in two distinct training regimes: (i) "memorization regime", where the diffusion model overfits to the training data distribution, and (ii) "generalization regime", where the model learns the underlying data distribution. Our study also finds that this valuable property generalizes to many variants of diffusion models, including those for conditional use, solving inverse problems, and model fine-tuning. Finally, our work raises numerous intriguing theoretical questions for future investigation and highlights practical implications regarding training efficiency, model privacy, and the controlled generation of diffusion models.
In this paper, we introduce a novel dIffusion language modEl pre-training framework for text generation, which we call GENIE. GENIE is a large-scale pretrained diffusion language model that consists of an encoder and a diffusion-based decoder, which can generate text by gradually transforming a random noise sequence into a coherent text sequence. To pre-train GENIE on a large-scale language corpus, we design a new continuous paragraph denoise objective, which encourages the diffusion-decoder to reconstruct a clean text paragraph from a corrupted version, while preserving the semantic and syntactic coherence. We evaluate GENIE on four downstream text generation benchmarks, namely XSum, CNN/DailyMail, Gigaword, and CommonGen. Our experimental results show that GENIE achieves comparable performance with the state-of-the-art autoregressive models on these benchmarks, and generates more diverse text samples. The code and models of GENIE are available at https://github.com/microsoft/ProphetNet/tree/master/GENIE.
Adversarial training was recently shown to be competitive against supervised learning methods on computer vision tasks, however, studies have mainly been confined to generative tasks such as image synthesis. In this paper, we apply adversarial training techniques to the discriminative task of learning a steganographic algorithm. Steganography is a collection of techniques for concealing information by embedding it within a non-secret medium, such as cover texts or images. We show that adversarial training can produce robust steganographic techniques: our unsupervised training scheme produces a steganographic algorithm that competes with state-of-the-art steganographic techniques, and produces a robust steganalyzer, which performs the discriminative task of deciding if an image contains secret information. We define a game between three parties, Alice, Bob and Eve, in order to simultaneously train both a steganographic algorithm and a steganalyzer. Alice and Bob attempt to communicate a secret message contained within an image, while Eve eavesdrops on their conversation and attempts to determine if secret information is embedded within the image. We represent Alice, Bob and Eve by neural networks, and validate our scheme on two independent image datasets, showing our novel method of studying steganographic problems is surprisingly competitive against established steganographic techniques.
Scaling text-to-speech (TTS) to large-scale, multi-speaker, and in-the-wild datasets is important to capture the diversity in human speech such as speaker identities, prosodies, and styles (e.g., singing). Current large TTS systems usually quantize speech into discrete tokens and use language models to generate these tokens one by one, which suffer from unstable prosody, word skipping/repeating issue, and poor voice quality. In this paper, we develop NaturalSpeech 2, a TTS system that leverages a neural audio codec with residual vector quantizers to get the quantized latent vectors and uses a diffusion model to generate these latent vectors conditioned on text input. To enhance the zero-shot capability that is important to achieve diverse speech synthesis, we design a speech prompting mechanism to facilitate in-context learning in the diffusion model and the duration/pitch predictor. We scale NaturalSpeech 2 to large-scale datasets with 44K hours of speech and singing data and evaluate its voice quality on unseen speakers. NaturalSpeech 2 outperforms previous TTS systems by a large margin in terms of prosody/timbre similarity, robustness, and voice quality in a zero-shot setting, and performs novel zero-shot singing synthesis with only a speech prompt. Audio samples are available at https://speechresearch.github.io/naturalspeech2.
This group of literature systematically presents the evolution of provably secure generative steganography from theoretical definitions to deployment in complex heterogeneous scenarios. The research trajectory extends clearly from rigorous proofs of statistical indistinguishability to innovative applications of large language models and diffusion models for latent-space mapping. In particular, under model-heterogeneous and asymmetric-resource settings, researchers have introduced model reproducibility, distributed learning, and cross-modal collaboration techniques to effectively resolve the synchronization problem that arises when the sender's and receiver's model architectures differ. Meanwhile, robustness-enhancement schemes that combine error-correcting codes and adversarial training make generative steganography practical over real-world lossy channels, achieving a deep balance among security, capacity, and robustness in multimodal environments.