A Survey of GAN Generators
Heterogeneous Backbones and Integration of Modern Neural Operators
This group of work moves beyond conventional CNNs by introducing modern architectures such as Transformers, Mamba (state space models), ConvNeXt, and NODEs (neural ordinary differential equations) to strengthen the generator's ability to model global dependencies and complex nonlinear distributions. A minimal NODE-style generator sketch follows the list below.
- DSS-GAN: Directional State Space GAN with Mamba backbone for Class-Conditional Image Synthesis(Aleksander Ogonowski, Konrad Klimaszewski, Przemysław Rokita, 2026, ArXiv Preprint)
- ConvNeXt-GAN: A Generative Adversarial Network for Blueberry Leaf Disease Dataset Expansion(Mingyue Gao, 2025, 2025 2nd International Conference on Intelligent Perception and Pattern Recognition (IPPR))
- EffBaGAN: An Efficient Balancing GAN for Earth Observation in Data Scarcity Scenarios(Nicolás Vilela-Pérez, D. B. Heras, Francisco Argüello, 2025, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing)
- Transfer Learning Enabled Transformer based Generative Adversarial Networks (TT-GAN) for Terahertz Channel Modeling and Generating(Zhengdong Hu, Yuanbo Li, Chong Han, 2024, ArXiv Preprint)
- MambaRA-GAN: Underwater Image Enhancement via Mamba and Intra-Domain Reconstruction Autoencoder(Jiangyan Wu, Guanghui Zhang, Yugang Fan, 2025, Journal of Marine Science and Engineering)
- A Novel Lightweight Hybrid GAN-Transformer Architecture for Sand-Dust and Haze Image Enhancement(Muhammad Khawaja Kashif Masood, Pablo Otero, 2026, IEEE Access)
- Enhancing Generative Adversarial Network Performance with a Hybrid Operational Neural Network Generator(Kamil Dinleyici, A. Keçeli, 2025, 2025 Innovations in Intelligent Systems and Applications Conference (ASYU))
- A GAN-LSTM Architecture for ECG Anomaly Detection(Soumia Oubadi, Naçima Mellal, R. Bouchouareb, Soumia Zertal, 2024, 2024 1st International Conference on Electrical, Computer, Telecommunication and Energy Technologies (ECTE-Tech))
- Expo-GAN: A Style Transfer Generative Adversarial Network for Exhibition Hall Design Based on Optimized Cyclic and Neural Architecture Search(Qing Xie, Ruiyun Yu, 2025, Computers, Materials & Continua)
- NODE-AdvGAN: Improving the transferability and perceptual similarity of adversarial examples by dynamic-system-driven adversarial generative model(Xinheng Xie, Yue Wu, Cuiyu He, 2024, ArXiv Preprint)
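To make the NODE idea above concrete, here is a minimal PyTorch sketch of a generator stage whose features evolve under a learned vector field, integrated with fixed-step Euler. It is a generic illustration under our own assumptions (channel count, step count), not the NODE-AdvGAN implementation, which treats the whole perturbation generator as a continuous dynamical system.

```python
import torch
import torch.nn as nn

class ODEFunc(nn.Module):
    """Learned vector field f(h, t) driving the feature dynamics."""
    def __init__(self, channels):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.GroupNorm(8, channels),
            nn.SiLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, t, h):
        # t is unused in this autonomous field; kept for the ODE interface.
        return self.net(h)

class NODEBlock(nn.Module):
    """Generator stage: integrate dh/dt = f(h, t) from t=0 to t=1 with Euler steps."""
    def __init__(self, channels, steps=8):
        super().__init__()
        self.func = ODEFunc(channels)
        self.steps = steps

    def forward(self, h):
        dt = 1.0 / self.steps
        t = torch.zeros((), device=h.device)
        for _ in range(self.steps):
            h = h + dt * self.func(t, h)  # explicit Euler update
            t = t + dt
        return h

# Usage: block = NODEBlock(64); y = block(torch.randn(2, 64, 16, 16))
```

In practice an adaptive solver (e.g., the torchdiffeq package) would replace the fixed Euler loop.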
Diffusion-GAN Hybrid Architectures and Fast Sampling
This group explores combining the stepwise denoising ability of diffusion models with the one-step generation advantage of GANs, aiming to resolve the slow inference of diffusion models and the unstable training of GANs; applications include image generation and discrete layout generation. A sketch of one such training step follows the list below.
- SpecDiff-GAN: A Spectrally-Shaped Noise Diffusion GAN for Speech and Music Synthesis(Teysir Baoueb, Haocheng Liu, Mathieu Fontaine, Jonathan Le Roux, Gael Richard, 2024, ArXiv Preprint)
- Latent Denoising Diffusion GAN: Faster sampling, Higher image quality(Luan Thanh Trinh, Tomoki Hamagami, 2024, ArXiv Preprint)
- DogLayout: Denoising Diffusion GAN for Discrete and Continuous Layout Generation(Zhaoxing Gan, Guangnan Ye, 2024, ArXiv Preprint)
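The shared mechanism in these papers follows the denoising-diffusion GAN recipe: a conditional generator denoises in one shot via a predicted clean sample, and the discriminator judges posterior samples. Below is a minimal sketch under our own assumptions (a few-step linear schedule, an assumed generator signature `G(x_t, t, z)` and discriminator `D(x_prev, x_t, t)`); it is not the DogLayout or Latent Denoising Diffusion GAN code.

```python
import torch
import torch.nn.functional as F

def make_schedule(T=4):
    """Few-step diffusion schedule (denoising-diffusion GANs use small T)."""
    betas = torch.linspace(1e-2, 2e-1, T)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    return betas, alphas, alpha_bars

def q_sample(x0, t, alpha_bars):
    """Forward diffusion: draw x_t ~ q(x_t | x_0)."""
    ab = alpha_bars[t].view(-1, 1, 1, 1)
    return ab.sqrt() * x0 + (1 - ab).sqrt() * torch.randn_like(x0)

def posterior_sample(x0, xt, t, betas, alphas, alpha_bars):
    """Draw x_{t-1} ~ q(x_{t-1} | x_t, x_0), the standard DDPM posterior."""
    ab_t = alpha_bars[t].view(-1, 1, 1, 1)
    ab_prev = torch.where(t > 0, alpha_bars[t - 1],
                          torch.ones_like(alpha_bars[t])).view(-1, 1, 1, 1)
    beta_t = betas[t].view(-1, 1, 1, 1)
    alpha_t = alphas[t].view(-1, 1, 1, 1)
    mean = (ab_prev.sqrt() * beta_t * x0
            + alpha_t.sqrt() * (1 - ab_prev) * xt) / (1 - ab_t)
    var = beta_t * (1 - ab_prev) / (1 - ab_t)
    return mean + var.sqrt() * torch.randn_like(x0)

def ddgan_losses(G, D, x0_real, T=4, z_dim=128):
    """One adversarial round: G denoises x_t in a single step through a
    predicted x_0; D compares real and fake posterior samples."""
    betas, alphas, alpha_bars = make_schedule(T)
    t = torch.randint(0, T, (x0_real.size(0),))
    xt = q_sample(x0_real, t, alpha_bars)
    x_prev_real = posterior_sample(x0_real, xt, t, betas, alphas, alpha_bars)
    z = torch.randn(x0_real.size(0), z_dim)
    x0_fake = G(xt, t, z)  # one-shot denoising, unlike many-step diffusion
    x_prev_fake = posterior_sample(x0_fake, xt, t, betas, alphas, alpha_bars)
    # Non-saturating GAN losses; detach x_prev_fake for the D update in practice.
    d_loss = (F.softplus(-D(x_prev_real, xt, t)).mean()
              + F.softplus(D(x_prev_fake, xt, t)).mean())
    g_loss = F.softplus(-D(x_prev_fake, xt, t)).mean()
    return d_loss, g_loss
```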
Attention Mechanisms and Multi-Scale Feature Fusion Strategies
By introducing dual attention mechanisms, U-Net fusion, residual denoising structures, and multi-scale feature extraction, these works strengthen the generator's ability to consistently capture both local texture detail and global semantics. A generic dual-attention block is sketched after the list below.
- EA-GAN: enhanced attention generative adversarial network applied to hair removal in dermoscopy images(Ghada Halladja, Akila Djebbar, Ahmed Boulemden, A. Melouah, Yousra Mebarki, 2025, Signal, Image and Video Processing)
- FR-GAN: A Fusion Refine with Dual Attention Architecture for Denoising of Dental Panoramic X-ray Images(S. Kausar Begum, Nagaraj Yamanakkanavar, 2026, Journal of Innovative Image Processing)
- The GAN Spatiotemporal Fusion Model Based on Multiscale Convolution and Attention Mechanism for Remote Sensing Images(Youping Xie, Jun Hu, Kang He, Li Cao, Kaijun Yang, Luo Chen, 2025, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing)
- CAS‐GAN: A Novel Generative Adversarial Network‐Based Architecture for Coronary Artery Segmentation(Rawaa Hamdi, Asma Kerkeni, A. Abdallah, M. Hedi, 2024, International Journal of Imaging Systems and Technology)
- ResDeblur-GAN: A ResNet and PatchGAN Based Architecture for Image Deblurring(Ch. Jaidev, 2025, International Journal of Scientific Research in Engineering and Management)
- Image defogging algorithm based on generative adversarial networks with multi-port generator(Liuyi Ling, Bolun Hong, Yuwen Liu, Guo Wei, Shuai Xu, 2025, Signal, Image and Video Processing)
- New Approach to Underwater Image Enhancement Using Modified Residual Blocks in Generator Architecture for Improved Cycle Generative Adversarial Networks(K. Selvaraju, S. Rajamani, 2024, Proceedings of the Bulgarian Academy of Sciences)
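As a concrete reference point for the dual-attention designs above, here is a generic channel-plus-spatial attention block (CBAM-style) in PyTorch. It is a sketch of the shared pattern, not the module from EA-GAN or FR-GAN; the reduction ratio and kernel size are assumptions.

```python
import torch
import torch.nn as nn

class DualAttention(nn.Module):
    """Channel attention followed by spatial attention (CBAM-style)."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        # Channel attention: squeeze spatial dims, excite per channel.
        avg = x.mean(dim=(2, 3))
        mx = x.amax(dim=(2, 3))
        ca = torch.sigmoid(self.channel_mlp(avg) + self.channel_mlp(mx))
        x = x * ca.view(b, c, 1, 1)
        # Spatial attention: pool over channels, 7x7 conv -> spatial mask.
        pooled = torch.cat([x.mean(dim=1, keepdim=True),
                            x.amax(dim=1, keepdim=True)], dim=1)
        sa = torch.sigmoid(self.spatial_conv(pooled))
        return x * sa
```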
Architecture Evolution Driven by Quantum Computing and Evolutionary Algorithms
These works use quantum reservoir computing, hybrid quantum-classical circuits, genetic algorithms, and neuroevolution strategies to explore gradient-free or physics-inspired ways of optimizing the generator's topology and parameter space. A minimal neuroevolution loop is sketched after the list below.
- MediQ-GAN: Quantum-Inspired GAN for High Resolution Medical Image Generation(Qingyue Jiao, Yongcan Tang, Jun Zhuang, Jason Cong, Yiyu Shi, 2025, ArXiv Preprint)
- Bridging Quantum and Classical Computing in Drug Design: Architecture Principles for Improved Molecule Generation(Andrew Smith, Erhan Guven, 2025, ArXiv Preprint)
- Quantum Reservoir GAN(Hikaru Wakaura, 2025, ArXiv Preprint)
- Evolutionary-Enhanced GAN With Wavelet-Based Discrimination: A Hardware-Accelerated Architecture for Efficient Synthetic Image Generation(Charudatta Gurudas Korde, K. G. Shreeharsha, R. Siddharth, 2025, IEEE Access)
- Neuroevolution of a Multi-Generator GAN (Student Abstract)(S. Pandey, 2024, Proceedings of the AAAI Conference on Artificial Intelligence)
- PrAda-GAN: A Private Adaptive Generative Adversarial Network with Bayes Network Structure(Ke Jia, Yuheng Ma, Yang Li, Feifei Wang, 2025, AAAI Conference on Artificial Intelligence)
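Most of the evolutionary variants above share one primitive: mutate copies of the generator and keep the fittest. The sketch below shows that primitive with Gaussian weight mutation and the discriminator's score as a crude fitness proxy; it is a generic illustration (mutation scale, child count, and fitness choice are our assumptions), not any listed paper's algorithm.

```python
import copy
import torch

def evolve_generators(population, D, z_dim=128, sigma=0.01, n_children=2):
    """One neuroevolution round: Gaussian-mutate each generator's weights,
    then keep the individuals the discriminator finds most realistic."""
    children = []
    for G in population:
        for _ in range(n_children):
            child = copy.deepcopy(G)
            with torch.no_grad():
                for p in child.parameters():
                    p.add_(sigma * torch.randn_like(p))  # weight mutation
            children.append(child)

    def fitness(G):
        with torch.no_grad():
            samples = G(torch.randn(64, z_dim))
            return D(samples).mean().item()  # higher = fools D more

    ranked = sorted(population + children, key=fitness, reverse=True)
    return ranked[:len(population)]  # elitist survivor selection
```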
External Knowledge Guidance and Multi-Task Domain Adaptation
By combining semantic priors from pretrained models (e.g., CLIP), neuro-symbolic logic, physical/chemical constraints, and 3D structural information, these works equip generators for highly specialized tasks in medicine, molecular design, 3D reconstruction, and remote sensing. A CLIP-guidance loss is sketched after the list below.
- CLIP-GAN: Stacking CLIPs and GAN for Efficient and Controllable Text-to-Image Synthesis(Yingli Hou, Wei Zhang, Zhiliang Zhu, Hai Yu, 2025, IEEE Transactions on Multimedia)
- HP-GAN: Harnessing pretrained networks for GAN improvement with FakeTwins and discriminator consistency(Geonhui Son, Jeong Ryong Lee, Dosik Hwang, 2026, ArXiv Preprint)
- Semantic Loss Functions for Neuro-Symbolic Structured Prediction(Kareem Ahmed, Stefano Teso, Paolo Morettin, Luca Di Liello, Pierfrancesco Ardino, Jacopo Gobbi, Yitao Liang, Eric Wang, Kai-Wei Chang, Andrea Passerini, Guy Van den Broeck, 2024, ArXiv Preprint)
- Improved Molecular Generation through Attribute-Driven Integrative Embeddings and GAN Selectivity(Nandan Joshi, Erhan Guven, 2025, ArXiv Preprint)
- Fusion-driven semi-supervised learning-based lung nodules classification with dual-discriminator and dual-generator generative adversarial network(A. Saihood, W. R. Abdulhussien, Laith Alzubaidi, Mohamed Manoufali, Yuantong Gu, 2024, BMC Medical Informatics and Decision Making)
- Multi-Class Guided GAN for Remote-Sensing Image Synthesis Based on Semantic Labels(Zhenye Niu, Yuxia Li, Yushu Gong, Bowei Zhang, Yuan He, Jinglin Zhang, Mengyu Tian, Lei He, 2025, Remote Sensing)
- Microstructural Studies Using Generative Adversarial Network (GAN): a Case Study(Owais Ahmad, Vishal Panwar, Kaushik Das, Rajdip Mukherjee, Somnath Bhowmick, 2025, ArXiv Preprint)
- A Reinforcement Learning-Driven Transformer GAN for Molecular Generation(Chen Li, Huidong Tang, Ye Zhu, Yoshihiro Yamanishi, 2025, ArXiv Preprint)
- Target Population Synthesis using CT-GAN(Tanay Rastogi, Daniel Jonsson, 2025, ArXiv Preprint)
- Synthesis of pulses from particle detectors with a Generative Adversarial Network (GAN)(Alberto Regadío, Luis Esteban, Sebastián Sánchez-Prieto, 2024, ArXiv Preprint)
- Deformation-aware GAN for Medical Image Synthesis with Substantially Misaligned Pairs(Bowen Xin, Tony Young, Claire E Wainwright, Tamara Blake, Leo Lebrat, Thomas Gaass, Thomas Benkert, Alto Stemmer, David Coman, Jason Dowling, 2024, ArXiv Preprint)
- Triplane generator-based NeRF-GAN framework for single-view ship reconstruction(Tao Liu, Shiqi Geng, Yuchen Fu, Zhengling Lei, Yuchi Huo, Xiaocai Zhang, Fang Wang, Bing Han, Mei Sha, Zhongdai Wu, 2025, International Journal of Digital Earth)
- FreeGen: Feed-Forward Reconstruction-Generation Co-Training for Free-Viewpoint Driving Scene Synthesis(Shijie Chen, Peixi Peng, 2025, ArXiv Preprint)
- Fault Diagnosis with Extremely Limited Samples: A Generalization Generator Improves GAN-Based Methods(Cuiying Lin, Hao Shen, Junhui Qi, Jie Zhang, Guoyu Huang, Yun Kong, 2024, 2024 Global Reliability and Prognostics and Health Management Conference (PHM-Beijing))
- High-Resolution Feature Generator for Small-Ship Detection in Optical Remote Sensing Images(Haopeng Zhang, Sizhe Wen, Zhaoxiang Wei, Zhuoyi Chen, 2024, IEEE Transactions on Geoscience and Remote Sensing)
- Secure Image Steganography with GAN Architecture: A High-Fidelity and Robust Data Hiding Approach(Swapnil Shetty, P. P., 2025, 2025 9th International Conference on Electronics, Communication and Aerospace Technology (ICECA))
- Inference-based GAN Video Generation(Jingbo Yang, Adrian G. Bors, 2025, ArXiv Preprint)
- Real-world Underwater Image Enhancement Based on Histogram Equalization and Fast GAN-based Architecture(Dingding Hu, Min Si Young, C. Lin, 2025, 2025 IEEE 14th Global Conference on Consumer Electronics (GCCE))
- Block Induced Signature Generative Adversarial Network (BISGAN): Signature Spoofing Using GANs and Their Evaluation(Haadia Amjad, Kilian Goeller, Steffen Seitz, Carsten Knoll, Naseer Bajwa, Ronald Tetzlaff, Muhammad Imran Malik, 2024, ArXiv Preprint)
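A recurring ingredient in this group is using a pretrained image-text encoder as a semantic prior on the generator's output. Below is a generic CLIP-guidance loss sketch using the OpenAI `clip` package; it is in the spirit of CLIP-stacked generators but is not the CLIP-GAN training code, and normalization with CLIP's image mean/std is omitted for brevity.

```python
import torch
import torch.nn.functional as F
import clip  # OpenAI CLIP package (pip install git+https://github.com/openai/CLIP.git)

def clip_guidance_loss(model, images, prompts):
    """Cosine-distance loss pulling generated images toward text prompts."""
    images = F.interpolate(images, size=(224, 224), mode="bilinear",
                           align_corners=False)  # CLIP's expected resolution
    img_emb = model.encode_image(images)
    txt_emb = model.encode_text(clip.tokenize(prompts).to(images.device))
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    return (1.0 - (img_emb * txt_emb).sum(dim=-1)).mean()

# Usage: model, _ = clip.load("ViT-B/32")
#        loss = clip_guidance_loss(model, fake_images, ["a red car"])
```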
Latent Space Analysis, Interpretability, and Training Stability
This group focuses on the topological properties of the generator's latent space, disentangled representations (information bottleneck, KL regularization), and improving interpretability and training robustness through novel loss functions (e.g., BOLT) and visualization techniques.
- Latent Space Imaging(Matheus Souza, Yidan Zheng, Kaizhang Kang, Yogeshwar Nath Mishra, Qiang Fu, Wolfgang Heidrich, 2024, ArXiv Preprint)
- Unsupervised Panoptic Interpretation of Latent Spaces in GANs Using Space-Filling Vector Quantization(Mohammad Hassan Vali, Tom Bäckström, 2024, ArXiv Preprint)
- Adaptive GAN Architecture with Constrained Latent Embedding for Unsupervised Video Anomaly Detection(Takveer Singh, P. Chitra, Saswati Roy Chel, Mani Ramakrishnan, Rakesh Kumar Yadav, T. Manoj Kumar, 2025, 2025 IEEE 1st International Conference on Smart Innovations in Systems, Infrastructure, Mechanical, Power, AI and Computing Technologies (SISIMPACT))
- An Enhanced GAN Architecture with Latent Discriminator for Accurate and Efficient Video Anomaly Detection(T. M. Kumar, Saksham Sood, R. Gomathi, Priyanka Niranjan, Pradeep Kumar Shinde, Divyanshi Rajvanshi, 2025, 2025 IEEE 1st International Conference on Smart Innovations in Systems, Infrastructure, Mechanical, Power, AI and Computing Technologies (SISIMPACT))
- IB-GAN: Disentangled Representation Learning with Information Bottleneck Generative Adversarial Networks(Insu Jeon, Wonkwang Lee, Myeongjang Pyeon, Gunhee Kim, 2025, ArXiv Preprint)
- The Evolving Nature of Latent Spaces: From GANs to Diffusion(Ludovica Schaerf, 2025, ArXiv Preprint)
- Efficient Visualization of Neural Networks with Generative Models and Adversarial Perturbations(Athanasios Karagounis, 2024, ArXiv Preprint)
- BOLT-GAN: Bayes-Error-Motivated Objective for Stable GAN Training(Mohammadreza Tavasoli Naeini, Ali Bereyhi, Morteza Noshad, Ben Liang, Alfred O. Hero, 2025, ArXiv Preprint)
Multi-Generator Collaboration and Automated Architecture Search
This group covers dual-generator competition, co-evolutionary topologies, and engineering methods that automatically improve generator performance via neural architecture search (NAS), pruning, and downstream feedback (DSF). A minimal multi-generator pattern is sketched after the list below.
- TAP-GAN+: A Topological Adversarial Pipeline for Inference-Time Detection in Machine Learning(Galamo F. Monkam, Jie Yan, Nathaniel D. Bastian, 2025, MILCOM 2025 - 2025 IEEE Military Communications Conference (MILCOM))
- Improving Person-Re Identification with Dual-Generator and Dual-Discriminator Architecture in Conditional GANs(P. P. Ghadekar, Atharv Kinage, Aadesh Kabra, K. Agarwal, Ketan Gangwal, Kshitij Chaudhari, 2024, 2024 1st International Conference on Cognitive, Green and Ubiquitous Computing (IC-CGU))
- Catch Me If You Can: A Modified GAN Architecture Leveraging SIFT and Feature-matching for Highly Stable Training(Asraa Saeed, Ahmed A. Hashim, 2025, International Journal of Scientific Research in Science, Engineering and Technology)
- Generate more than one child in your co-evolutionary semi-supervised learning GAN(Francisco Sedeño, Jamal Toutouh, Francisco Chicano, 2025, ArXiv Preprint)
- Enhancing GAN Performance Through Neural Architecture Search and Tensor Decomposition(Prasanna Reddy Pulakurthi, Mahsa Mozaffari, S. Dianat, Majid Rabbani, Jamison Heard, Raghuveer Rao, 2024, ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP))
- Optimal Eye Surgeon: Finding Image Priors through Sparse Generators at Initialization(Avrajit Ghosh, Xitong Zhang, Kenneth K. Sun, Qing Qu, Saiprasad Ravishankar, Rongrong Wang, 2024, ArXiv Preprint)
- GANetic Loss for Generative Adversarial Networks with a Focus on Medical Applications(Shakhnaz Akhmedova, Nils Körber, 2024, ArXiv Preprint)
- A Focused Approach to Generator Superiority in Digit Generation(Pranita Mohapatro, Biswaranjan Routray, Subhasmita Pani, Akankhya Satapathi, Lipsa Priyadarsini Singh, 2025, 2025 International Conference on Responsible, Generative and Explainable AI (ResGenXAI))
- DSF-GAN: DownStream Feedback Generative Adversarial Network(Oriel Perets, Nadav Rappoport, 2024, ArXiv Preprint)
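The simplest form of the multi-generator pattern above is a categorical gate over K independent generators; a classifier trained on the returned index can then push the generators to specialize. This is a generic sketch (the factory function, the random gate, and the dimensions are our assumptions), not the architecture of any listed paper.

```python
import torch
import torch.nn as nn

class MultiGenerator(nn.Module):
    """K generators behind a random categorical gate."""
    def __init__(self, make_generator, k=4, z_dim=128):
        super().__init__()
        self.generators = nn.ModuleList(make_generator() for _ in range(k))
        self.z_dim = z_dim

    def forward(self, batch_size):
        device = next(self.parameters()).device
        z = torch.randn(batch_size, self.z_dim, device=device)
        idx = torch.randint(len(self.generators), (batch_size,))
        out = torch.stack([self.generators[int(i)](z[j:j + 1]).squeeze(0)
                           for j, i in enumerate(idx)])
        return out, idx  # idx supervises a classifier that enforces diversity
```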
This survey integrates recent advances in GAN generators across architectural innovation, computational paradigms, domain-knowledge integration, and theoretical analysis. Current research shows a clear trend toward interdisciplinary fusion: on one hand, hybrid Mamba, Transformer, and diffusion architectures keep raising the ceiling of generation quality; on the other, quantum computing and evolutionary strategies open unconventional paths for model optimization. At the same time, generators are evolving from general-purpose image synthesis toward interpretable models that deeply integrate domain knowledge and physical constraints, while automated neural architecture search balances performance against deployment efficiency.
A total of 62 related references.
Layout Generation aims to synthesize plausible arrangements from given elements. Currently, the predominant methods in layout generation are Generative Adversarial Networks (GANs) and diffusion models, each presenting its own set of challenges. GANs typically struggle with handling discrete data due to their requirement for differentiable generated samples and have historically circumvented the direct generation of discrete labels by treating them as fixed conditions. Conversely, diffusion-based models, despite achieving state-of-the-art performance across several metrics, require extensive sampling steps which lead to significant time costs. To address these limitations, we propose DogLayout (Denoising Diffusion GAN Layout model), which integrates a diffusion process into GANs to enable the generation of discrete label data and significantly reduce diffusion's sampling time. Experiments demonstrate that DogLayout considerably reduces sampling costs by up to 175 times and cuts overlap from 16.43 to 9.59 compared to existing diffusion models, while also surpassing GAN-based and other layout methods. Code is available at https://github.com/deadsmither5/DogLayout.
To address the possible lack or total absence of pulses from particle detectors during the development of their associated electronics, we propose a model that can generate them without losing the features of the real ones. This model is based on artificial neural networks, namely Generative Adversarial Networks (GANs). We describe the proposed network architecture, its training methodology, and the approach to train the GAN with real pulses from a scintillator receiving radiation from ${}^{137}$Cs and ${}^{22}$Na sources. The Generator was installed in a Xilinx System-on-Chip (SoC). We show how the network is capable of generating pulses with the same shape as the real ones that even match the data distributions of the original pulse-height histogram.
We propose a new GAN-based unsupervised model for disentangled representation learning. The new model is discovered in an attempt to apply the Information Bottleneck (IB) framework to the optimization of GANs, and is thereby named IB-GAN. The architecture of IB-GAN is partially similar to that of InfoGAN but has a critical difference: an intermediate layer of the generator is leveraged to constrain the mutual information between the input and the generated output. The intermediate stochastic layer can serve as a learnable latent distribution that is trained with the generator jointly in an end-to-end fashion. As a result, the generator of IB-GAN can harness the latent space in a disentangled and interpretable manner. With experiments on the dSprites and Color-dSprites datasets, we demonstrate that IB-GAN achieves competitive disentanglement scores to those of state-of-the-art β-VAEs and outperforms InfoGAN. Moreover, the visual quality and the diversity of samples generated by IB-GAN are often better than those of β-VAEs and InfoGAN in terms of FID score on the CelebA and 3D Chairs datasets.
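The abstract's key device, an intermediate stochastic layer whose KL term to a standard normal upper-bounds the mutual information between input and representation, reduces to a reparameterized Gaussian head. A minimal sketch follows (the dimensions and the β weighting are assumptions, not the IB-GAN code):

```python
import torch
import torch.nn as nn

class StochasticLayer(nn.Module):
    """Intermediate stochastic layer: z -> (mu, logvar) -> r ~ N(mu, sigma^2).
    The KL term to a standard normal acts as the IB constraint."""
    def __init__(self, z_dim, r_dim):
        super().__init__()
        self.mu = nn.Linear(z_dim, r_dim)
        self.logvar = nn.Linear(z_dim, r_dim)

    def forward(self, z):
        mu, logvar = self.mu(z), self.logvar(z)
        r = mu + (0.5 * logvar).exp() * torch.randn_like(mu)  # reparameterize
        kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1).sum(dim=1).mean()
        return r, kl

# Generator objective sketch: adversarial term plus beta-weighted IB penalty,
# e.g. g_loss = adv_loss(D(decoder(r))) + beta * kl
```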
The growing demand for molecules with tailored properties in fields such as drug discovery and chemical engineering has driven advancements in computational methods for molecular design. Machine learning-based approaches for de-novo molecular generation have recently garnered significant attention. This paper introduces a transformer-based vector embedding generator combined with a modified Generative Adversarial Network (GAN) to generate molecules with desired properties. The embedding generator utilizes a novel molecular descriptor, integrating Morgan fingerprints with global molecular attributes, enabling the transformer to capture local functional groups and broader molecular characteristics. Modifying the GAN generator loss function ensures the generation of molecules with specific desired properties. The transformer achieves a reconversion accuracy of 94% while translating molecular descriptors back to SMILES strings, validating the utility of the proposed embeddings for generative tasks. The approach is validated by generating novel odorant molecules using a labeled dataset of odorant and non-odorant compounds. With the modified range-loss function, the GAN exclusively generates odorant molecules. This work underscores the potential of combining novel vector embeddings with transformers and modified GAN architectures to accelerate the discovery of tailored molecules, offering a robust tool for diverse molecular design applications.
Machine learning-assisted diagnosis shows promise, yet medical imaging datasets are often scarce, imbalanced, and constrained by privacy, making data augmentation essential. Classical generative models typically demand extensive computational and sample resources. Quantum computing offers a promising alternative, but existing quantum-based image generation methods remain limited in scale and often face barren plateaus. We present MediQ-GAN, a quantum-inspired GAN with prototype-guided skip connections and a dual-stream generator that fuses classical and quantum-inspired branches. Its variational quantum circuits inherently preserve full-rank mappings, avoid rank collapse, and are theory-guided to balance expressivity with trainability. Beyond generation quality, we provide the first latent-geometry and rank-based analysis of quantum-inspired GANs, offering theoretical insight into their performance. Across three medical imaging datasets, MediQ-GAN outperforms state-of-the-art GANs and diffusion models. While validated on IBM hardware for robustness, our contribution is hardware-agnostic, offering a scalable and data-efficient framework for medical image generation and augmentation.
Hybrid quantum-classical machine learning offers a path to leverage noisy intermediate-scale quantum (NISQ) devices for drug discovery, but optimal model architectures remain unclear. We systematically optimize the quantum-classical bridge architecture of generative adversarial networks (GANs) for molecule discovery using multi-objective Bayesian optimization. Our optimized model (BO-QGAN) significantly improves performance, achieving a 2.27-fold higher Drug Candidate Score (DCS) than prior quantum-hybrid benchmarks and 2.21-fold higher than the classical baseline, while reducing parameter count by more than 60%. Key findings favor layering multiple (3-4) shallow (4-8 qubit) quantum circuits sequentially, while classical architecture shows less sensitivity above a minimum capacity. This work provides the first empirically-grounded architectural guidelines for hybrid models, enabling more effective integration of current quantum computers into pharmaceutical research pipelines.
The generative adversarial network (GAN) is one of the most widely used deep generative models for synthesizing high-quality images with the same statistics as the training set. Finite element method (FEM) based property prediction often relies on synthetically generated microstructures. The phase-field model is a computational method of generating realistic microstructures considering the underlying thermodynamics and kinetics of the material. Due to the expensive nature of the simulations, it is not always feasible to use phase-field for synthetic microstructure generation. In this work, we train a GAN with microstructures generated from the phase-field simulations. Mechanical properties calculated using the finite element method on synthetic and actual phase field microstructures show excellent agreement. Since the GAN model generates thousands of images within seconds, it has the potential to improve the quality of synthetic microstructures needed for FEM calculations or any other applications requiring a large number of realistic synthetic images at minimal computational cost.
Utility and privacy are two crucial measurements of the quality of synthetic tabular data. While significant advancements have been made in privacy measures, generating synthetic samples with high utility remains challenging. To enhance the utility of synthetic samples, we propose a novel architecture called the DownStream Feedback Generative Adversarial Network (DSF-GAN). This approach incorporates feedback from a downstream prediction model during training to augment the generator's loss function with valuable information. Thus, DSF-GAN utilizes a downstream prediction task to enhance the utility of synthetic samples. To evaluate our method, we tested it using two popular datasets. Our experiments demonstrate improved model performance when training on synthetic samples generated by DSF-GAN, compared to those generated by the same GAN architecture without feedback. The evaluation was conducted on the same validation set comprising real samples. All code and datasets used in this research will be made openly available for ease of reproduction.
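The mechanism is easy to state in code: the generator's loss gains a utility term equal to a downstream predictor's loss on the synthetic batch. A minimal sketch follows (the non-saturating adversarial term, the feature/label split, and the weight `lam` are our assumptions, not the DSF-GAN specifics):

```python
import torch.nn.functional as F

def generator_loss_with_feedback(D, downstream, x_fake, y_fake, lam=0.5):
    """Adversarial loss plus downstream-task feedback on synthetic samples."""
    adv = F.softplus(-D(x_fake)).mean()                    # fool the critic
    utility = F.cross_entropy(downstream(x_fake), y_fake)  # downstream feedback
    return adv + lam * utility
```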
Terahertz (THz) communications, ranging from 100 GHz to 10 THz, are envisioned as a promising technology for 6G and beyond wireless systems. As the foundation for designing THz communications, channel modeling and characterization are crucial to scrutinize the potential of the new spectrum. However, current channel modeling and standardization rely heavily on measurements, which are both time-consuming and costly to obtain in the THz band. Here, we propose a Transfer learning enabled Transformer based Generative Adversarial Network (TT-GAN) for THz channel modeling. Specifically, as a fundamental building block, a GAN is exploited to generate channel parameters, which can substitute for measurements. To greatly improve accuracy, the first T, a Transformer structure with a self-attention mechanism, is incorporated into the GAN. Since errors remain relative to ground-truth measurements, the second T, transfer learning, is designed to resolve the mismatch between the trained network and measurements. The proposed TT-GAN achieves high accuracy in channel modeling while requiring only a rather limited amount of measurement data, making it a promising complement to channel standardization that fundamentally differs from current techniques that rely heavily on measurements.
Generative Adversarial Networks (GANs) have made significant progress in enhancing the quality of image synthesis. Recent methods frequently leverage pretrained networks to calculate perceptual losses or utilize pretrained feature spaces. In this paper, we extend the capabilities of pretrained networks by incorporating innovative self-supervised learning techniques and enforcing consistency between discriminators during GAN training. Our proposed method, named HP-GAN, effectively exploits neural network priors through two primary strategies: FakeTwins and discriminator consistency. FakeTwins leverages pretrained networks as encoders to compute a self-supervised loss and applies this through the generated images to train the generator, thereby enabling the generation of more diverse and high-quality images. Additionally, we introduce a consistency mechanism between discriminators that evaluate feature maps extracted from Convolutional Neural Network (CNN) and Vision Transformer (ViT) feature networks. Discriminator consistency promotes coherent learning among discriminators and enhances training robustness by aligning their assessments of image quality. Our extensive evaluation across seventeen datasets, including scenarios with large, small, and limited data and covering a variety of image domains, demonstrates that HP-GAN consistently outperforms current state-of-the-art methods in terms of Fréchet Inception Distance (FID), achieving significant improvements in image diversity and quality. Code is available at: https://github.com/higun2/HP-GAN.
This paper presents a novel approach for deep visualization via a generative network, offering an improvement over existing methods. Our model simplifies the architecture by reducing the number of networks used, requiring only a generator and a discriminator, as opposed to the multiple networks traditionally involved. Additionally, our model requires less prior training knowledge and uses a non-adversarial training process, where the discriminator acts as a guide rather than a competitor to the generator. The core contribution of this work is its ability to generate detailed visualization images that align with specific class labels. Our model incorporates a unique skip-connection-inspired block design, which enhances label-directed image generation by propagating class information across multiple layers. Furthermore, we explore how these generated visualizations can be utilized as adversarial examples, effectively fooling classification networks with minimal perceptible modifications to the original images. Experimental results demonstrate that our method outperforms traditional adversarial example generation techniques in both targeted and non-targeted attacks, achieving up to a 94.5% fooling rate with minimal perturbation. This work bridges the gap between visualization methods and adversarial examples, proposing that fooling rate could serve as a quantitative measure for evaluating visualization quality. The insights from this study provide a new perspective on the interpretability of neural networks and their vulnerabilities to adversarial attacks.
Understanding adversarial examples is crucial for improving model robustness, as they introduce imperceptible perturbations to deceive models. Effective adversarial examples, therefore, offer the potential to train more robust models by eliminating model singularities. We propose NODE-AdvGAN, a novel approach that treats adversarial generation as a continuous process and employs a Neural Ordinary Differential Equation (NODE) to simulate generator dynamics. By mimicking the iterative nature of traditional gradient-based methods, NODE-AdvGAN generates smoother and more precise perturbations that preserve high perceptual similarity when added to benign images. We also propose a new training strategy, NODE-AdvGAN-T, which enhances transferability in black-box attacks by tuning the noise parameters during training. Experiments demonstrate that NODE-AdvGAN and NODE-AdvGAN-T generate more effective adversarial examples that achieve higher attack success rates while preserving better perceptual quality than baseline models.
Video generation has seen remarkable progress thanks to advancements in generative deep learning. However, generating long sequences remains a significant challenge. Generated videos should not only display coherent and continuous movement but also meaningful movement in successions of scenes. Models such as GANs, VAEs, and Diffusion Networks have been used for generating short video sequences, typically up to 16 frames. In this paper, we first propose a new type of video generator by enabling adversarial-based unconditional video generators with a variational encoder, akin to a VAE-GAN hybrid structure. The proposed model, as in other video deep learning-based processing frameworks, incorporates two processing branches, one for content and another for movement. However, existing models struggle with the temporal scaling of the generated videos. Classical approaches often result in degraded video quality when attempting to increase the generated video length, especially for significantly long sequences. To overcome this limitation, our research study extends the initially proposed VAE-GAN video generation model by employing a novel, memory-efficient approach to generate long videos composed of hundreds or thousands of frames ensuring their temporal continuity, consistency and dynamics. Our approach leverages a Markov chain framework with a recall mechanism, where each state represents a short-length VAE-GAN video generator. This setup enables the sequential connection of generated video sub-sequences, maintaining temporal dependencies and resulting in meaningful long video sequences.
We introduce Optimal Eye Surgeon (OES), a framework for pruning and training deep image generator networks. Typically, untrained deep convolutional networks, which include image sampling operations, serve as effective image priors (Ulyanov et al., 2018). However, they tend to overfit to noise in image restoration tasks due to being overparameterized. OES addresses this by adaptively pruning networks at random initialization to a level of underparameterization. This process effectively captures low-frequency image components even without training, by just masking. When trained to fit noisy images, these pruned subnetworks, which we term Sparse-DIP, resist overfitting to noise. This benefit arises from underparameterization and the regularization effect of masking, constraining them in the manifold of image priors. We demonstrate that subnetworks pruned through OES surpass other leading pruning methods, such as the Lottery Ticket Hypothesis, which is known to be suboptimal for image recovery tasks (Wu et al., 2023). Our extensive experiments demonstrate the transferability of OES-masks and the characteristics of sparse-subnetworks for image generation. Code is available at https://github.com/Avra98/Optimal-Eye-Surgeon.git.
Deep learning is actively being used in biometrics to develop efficient identification and verification systems. Handwritten signatures are a common subset of biometric data for authentication purposes. Generative adversarial networks (GANs) learn from original and forged signatures to generate forged signatures. While most GAN techniques create a strong signature verifier, which is the discriminator, there is a need to focus more on the quality of forgeries generated by the generator model. This work focuses on creating a generator that produces forged samples that achieve a benchmark in spoofing signature verification systems. We use CycleGANs infused with Inception model-like blocks with attention heads as the generator and a variation of the SigCNN model as the base Discriminator. We train our model with a new technique that results in 80% to 100% success in signature spoofing. Additionally, we create a custom evaluation technique to act as a goodness measure of the generated forgeries. Our work advocates generator-focused GAN architectures for spoofing data quality that aid in a better understanding of biometric data generation and evaluation.
We introduce BOLT-GAN, a novel framework for stable GAN training using the Bayes optimal learning threshold (BOLT). The discriminator is trained via the BOLT loss under a standard 1-Lipschitz constraint. This guides the generator to maximize the Bayes error of the discrimination task. We show that the training objective in this case represents a class of metrics on probability measures controlled by a 1-Lipschitz discriminator minimizing an integral probability metric that is upper-bounded by Wasserstein-1 distance. Across four standard image-generation benchmarks, BOLT-GAN improves FID and precision/recall over benchmark GAN frameworks under identical architectures and training budgets. Our experimental findings further confirm the advantage of linking the GAN training objective to a min-max Bayes error criterion.
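The geometry behind this claim can be written compactly. Below is a schematic statement in our own notation (an assumption about the form, not the paper's exact theorem): restricting the discriminator to a subset of 1-Lipschitz functions makes the induced training objective an integral probability metric dominated by the Wasserstein-1 distance.

```latex
% Integral probability metric induced by a discriminator class
% \mathcal{F} \subseteq \{ D : \|D\|_{\mathrm{Lip}} \le 1 \}:
d_{\mathcal{F}}(P_{\mathrm{data}}, P_G)
  = \sup_{D \in \mathcal{F}}
    \Big( \mathbb{E}_{x \sim P_{\mathrm{data}}}[D(x)]
        - \mathbb{E}_{x \sim P_G}[D(x)] \Big)
  \;\le\; W_1(P_{\mathrm{data}}, P_G),
% with equality when \mathcal{F} is the full 1-Lipschitz ball
% (Kantorovich--Rubinstein duality); the generator minimizes d_{\mathcal{F}}.
```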
Generating molecules with desired chemical properties presents a critical challenge in fields such as chemical synthesis and drug discovery. Recent advancements in artificial intelligence (AI) and deep learning have significantly contributed to data-driven molecular generation. However, challenges persist due to the inherent sensitivity of simplified molecular input line entry system (SMILES) representations and the difficulties in applying generative adversarial networks (GANs) to discrete data. This study introduces RL-MolGAN, a novel Transformer-based discrete GAN framework designed to address these challenges. Unlike traditional Transformer architectures, RL-MolGAN utilizes a first-decoder-then-encoder structure, facilitating the generation of drug-like molecules from both de novo and scaffold-based designs. In addition, RL-MolGAN integrates reinforcement learning (RL) and Monte Carlo tree search (MCTS) techniques to enhance the stability of GAN training and optimize the chemical properties of the generated molecules. To further improve the model's performance, RL-MolWGAN, an extension of RL-MolGAN, incorporates Wasserstein distance and mini-batch discrimination, which together enhance the stability of the GAN. Experimental results on two widely used molecular datasets, QM9 and ZINC, validate the effectiveness of our models in generating high-quality molecular structures with diverse and desirable chemical properties.
Generative Adversarial Networks (GANs) are very useful methods to address semi-supervised learning (SSL) datasets, thanks to their ability to generate samples similar to real data. This approach, called SSL-GAN, has attracted many researchers in the last decade. Evolutionary algorithms have been used to guide the evolution and training of SSL-GANs with great success. In particular, several co-evolutionary approaches have been applied where the two networks of a GAN (the generator and the discriminator) are evolved in separate populations. The co-evolutionary approaches published to date assume some spatial structure of the populations, based on the ideas of cellular evolutionary algorithms. They also create one single individual per generation and follow a generational replacement strategy in the evolution. In this paper, we re-consider those algorithmic design decisions and propose a new co-evolutionary approach, called Co-evolutionary Elitist SSL-GAN (CE-SSLGAN), with panmictic population, elitist replacement, and more than one individual in the offspring. We evaluate the performance of our proposed method using three standard benchmark datasets. The results show that creating more than one offspring per population and using elitism improves the results in comparison with a classical SSL-GAN.
Quantum machine learning is known as one of the promising applications of quantum computers. Many types of quantum machine learning methods have been released, such as Quantum Annealers, Quantum Neural Networks, Variational Quantum Algorithms, and Quantum Reservoir Computers. They can operate while consuming far less energy than classical networks of equivalent size. Quantum Reservoir Computers, in particular, have no limit on the size of input data. However, their accuracy is not sufficient for practical use, and efforts to improve accuracy have mainly focused on hardware. We therefore propose a software-side approach called the Quantum Reservoir Generative Adversarial Network (GAN), which uses Quantum Reservoir Computers as the generator of a GAN. We performed the generation of handwritten single digits and monochrome pictures on the CIFAR-10 and Fashion-MNIST datasets. As a result, the Quantum Reservoir GAN is confirmed to be more accurate than a Quantum GAN, a Classical Neural Network, and ordinary Quantum Reservoir Computers.
Digital imaging systems have traditionally relied on brute-force measurement and processing of pixels arranged on regular grids. In contrast, the human visual system performs significant data reduction from the large number of photoreceptors to the optic nerve, effectively encoding visual information into a low-bandwidth latent space representation optimized for brain processing. Inspired by this, we propose a similar approach to advance artificial vision systems. Latent Space Imaging introduces a new paradigm that combines optics and software to encode image information directly into the semantically rich latent space of a generative model. This approach substantially reduces bandwidth and memory demands during image capture and enables a range of downstream tasks focused on the latent space. We validate this principle through an initial hardware prototype based on a single-pixel camera. By implementing an amplitude modulation scheme that encodes into the generative model's latent space, we achieve compression ratios ranging from 1:100 to 1:1000 during imaging, and up to 1:16384 for downstream applications. This approach leverages the model's intrinsic linear boundaries, demonstrating the potential of latent space imaging for highly efficient imaging hardware, adaptable future applications in high-speed imaging, and task-specific cameras with significantly reduced hardware complexity.
Generative adversarial networks (GANs) learn a latent space whose samples can be mapped to real-world images. Such latent spaces are difficult to interpret. Some earlier supervised methods aim to create an interpretable latent space or discover interpretable directions, which requires exploiting data labels or annotated synthesized samples for training. However, we propose using a modification of vector quantization called space-filling vector quantization (SFVQ), which quantizes the data on a piece-wise linear curve. SFVQ can capture the underlying morphological structure of the latent space, making it interpretable. We apply this technique to model the latent space of pre-trained StyleGAN2 and BigGAN networks on various datasets. Our experiments show that the SFVQ curve yields a general interpretable model of the latent space such that it determines which parts of the latent space correspond to specific generative factors. Furthermore, we demonstrate that each line of the SFVQ curve can potentially refer to an interpretable direction for applying intelligible image transformations. We also demonstrate that the points located on an SFVQ line can be used for controllable data augmentation.
This paper examines the evolving nature of internal representations in generative visual models, focusing on the conceptual and technical shift from GANs and VAEs to diffusion-based architectures. Drawing on Beatrice Fazi's account of synthesis as the amalgamation of distributed representations, we propose a distinction between "synthesis in a strict sense", where a compact latent space wholly determines the generative process, and "synthesis in a broad sense," which characterizes models whose representational labor is distributed across layers. Through close readings of model architectures and a targeted experimental setup that intervenes in layerwise representations, we show how diffusion models fragment the burden of representation and thereby challenge assumptions of unified internal space. By situating these findings within media theoretical frameworks and critically engaging with metaphors such as the latent space and the Platonic Representation Hypothesis, we argue for a reorientation of how generative AI is understood: not as a direct synthesis of content, but as an emergent configuration of specialized processes.
Diffusion models are emerging as powerful solutions for generating high-fidelity and diverse images, often surpassing GANs under many circumstances. However, their slow inference speed hinders their potential for real-time applications. To address this, DiffusionGAN leveraged a conditional GAN to drastically reduce the denoising steps and speed up inference. Its advancement, Wavelet Diffusion, further accelerated the process by converting data into wavelet space, thus enhancing efficiency. Nonetheless, these models still fall short of GANs in terms of speed and image quality. To bridge these gaps, this paper introduces the Latent Denoising Diffusion GAN, which employs pre-trained autoencoders to compress images into a compact latent space, significantly improving inference speed and image quality. Furthermore, we propose a Weighted Learning strategy to enhance diversity and image quality. Experimental results on the CIFAR-10, CelebA-HQ, and LSUN-Church datasets prove that our model achieves state-of-the-art running speed among diffusion models. Compared to its predecessors, DiffusionGAN and Wavelet Diffusion, our model shows remarkable improvements in all evaluation metrics. Code and pre-trained checkpoints: https://github.com/thanhluantrinh/LDDGAN.git
Generative adversarial networks (GANs) are machine learning models that are used to estimate the underlying statistical structure of a given dataset and as a result can be used for a variety of tasks such as image generation or anomaly detection. Despite their initial simplicity, designing an effective loss function for training GANs remains challenging, and various loss functions have been proposed aiming to improve the performance and stability of the generative models. In this study, loss function design for GANs is presented as an optimization problem solved using the genetic programming (GP) approach. Initial experiments were carried out using a small Deep Convolutional GAN (DCGAN) model and the MNIST dataset, in order to search experimentally for an improved loss function. The functions found were evaluated on CIFAR10, with the best function, named GANetic loss, showing exceptionally better performance and stability compared to the losses commonly used for GAN training. To further evaluate its general applicability on more challenging problems, GANetic loss was applied to two medical applications: image generation and anomaly detection. Experiments were performed with histopathological, gastrointestinal or glaucoma images to evaluate the GANetic loss in medical image generation, resulting in improved image quality compared to the baseline models. The GANetic loss used for polyp and glaucoma images showed a strong improvement in the detection of anomalies. In summary, the GANetic loss function was evaluated on multiple datasets and applications where it consistently outperforms alternative loss functions. Moreover, GANetic loss leads to stable training and reproducible results, a known weak spot of GANs.
Structured output prediction problems are ubiquitous in machine learning. The prominent approach leverages neural networks as powerful feature extractors, otherwise assuming the independence of the outputs. These outputs, however, jointly encode an object, e.g. a path in a graph, and are therefore related through the structure underlying the output space. We discuss the semantic loss, which injects knowledge about such structure, defined symbolically, into training by minimizing the network's violation of such dependencies, steering the network towards predicting distributions satisfying the underlying structure. At the same time, it is agnostic to the arrangement of the symbols, and depends only on the semantics expressed thereby, while also enabling efficient end-to-end training and inference. We also discuss key improvements and applications of the semantic loss. One limitation of the semantic loss is that it does not exploit the association of every data point with certain features certifying its membership in a target class. We should therefore prefer minimum-entropy distributions over valid structures, which we obtain by additionally minimizing the neuro-symbolic entropy. We empirically demonstrate the benefits of this more refined formulation. Moreover, the semantic loss is designed to be modular and can be combined with both discriminative and generative neural models. This is illustrated by integrating it into generative adversarial networks, yielding constrained adversarial networks, a novel class of deep generative models able to efficiently synthesize complex objects obeying the structure of the underlying domain.
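For reference, the standard semantic loss that this line of work builds on (Xu et al., 2018) has the closed form below, where $p_i$ is the network's predicted marginal for Boolean variable $X_i$ and $\alpha$ is the symbolic constraint; the sum ranges over the satisfying assignments of $\alpha$.

```latex
L^{s}(\alpha, \mathbf{p}) \;\propto\; -\log \sum_{\mathbf{x} \,\models\, \alpha}
\;\prod_{i \,:\, \mathbf{x} \models X_i} p_i
\prod_{i \,:\, \mathbf{x} \models \lnot X_i} \bigl(1 - p_i\bigr)
```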
Closed-loop simulation and scalable pre-training for autonomous driving require synthesizing free-viewpoint driving scenes. However, existing datasets and generative pipelines rarely provide consistent off-trajectory observations, limiting large-scale evaluation and training. While recent generative models demonstrate strong visual realism, they struggle to jointly achieve interpolation consistency and extrapolation realism without per-scene optimization. To address this, we propose FreeGen, a feed-forward reconstruction-generation co-training framework for free-viewpoint driving scene synthesis. The reconstruction model provides stable geometric representations to ensure interpolation consistency, while the generation model performs geometry-aware enhancement to improve realism at unseen viewpoints. Through co-training, generative priors are distilled into the reconstruction model to improve off-trajectory rendering, and the refined geometry in turn offers stronger structural guidance for generation. Experiments demonstrate that FreeGen achieves state-of-the-art performance for free-viewpoint driving scene synthesis.
Generative adversarial network (GAN) models can synthesize high-quality audio signals while ensuring fast sample generation. However, they are difficult to train and are prone to several issues including mode collapse and divergence. In this paper, we introduce SpecDiff-GAN, a neural vocoder based on HiFi-GAN, which was initially devised for speech synthesis from mel spectrograms. In our model, training stability is enhanced by means of a forward diffusion process which consists of injecting noise from a Gaussian distribution into both real and fake samples before inputting them to the discriminator. We further improve the model by exploiting a spectrally-shaped noise distribution with the aim of making the discriminator's task more challenging. We then show the merits of our proposed model for speech and music synthesis on several datasets. Our experiments confirm that our model compares favorably with several baselines in audio quality and efficiency.
Medical image synthesis generates additional imaging modalities that are costly, invasive or harmful to acquire, which helps to facilitate the clinical workflow. When training pairs are substantially misaligned (e.g., lung MRI-CT pairs with respiratory motion), accurate image synthesis remains a critical challenge. Recent works explored the directional registration module to adjust misalignment in generative adversarial networks (GANs); however, substantial misalignment will lead to 1) suboptimal data mapping caused by correspondence ambiguity, and 2) degraded image fidelity caused by morphology influence on discriminators. To address the challenges, we propose a novel Deformation-aware GAN (DA-GAN) to dynamically correct the misalignment during the image synthesis based on multi-objective inverse consistency. Specifically, in the generative process, three levels of inverse consistency cohesively optimise symmetric registration and image generation for improved correspondence. In the adversarial process, to further improve image fidelity under misalignment, we design deformation-aware discriminators to disentangle the mismatched spatial morphology from the judgement of image fidelity. Experimental results show that DA-GAN achieved superior performance on a public dataset with simulated misalignments and a real-world lung MRI-CT dataset with respiratory motion misalignment. The results indicate the potential for a wide range of medical image synthesis tasks such as radiotherapy planning.
Agent-based models used in scenario planning for transportation and urban planning usually require detailed population information from the base as well as target scenarios. These populations are usually provided by synthesizing fake agents through deterministic population synthesis methods. However, these deterministic population synthesis methods face several challenges, such as handling high-dimensional data, scalability, and zero-cell issues, particularly when generating populations for target scenarios. This research looks into how a deep generative model called Conditional Tabular Generative Adversarial Network (CT-GAN) can be used to create target populations either directly from a collection of marginal constraints or through a hybrid method that combines CT-GAN with Fitness-based Synthesis Combinatorial Optimization (FBS-CO). The research evaluates the proposed population synthesis models against travel survey and zonal-level aggregated population data. Results indicate that the stand-alone CT-GAN model performs the best when compared with FBS-CO and the hybrid model. CT-GAN by itself can create realistic-looking groups that match single-variable distributions, but it struggles to maintain relationships between multiple variables. However, the hybrid model demonstrates improved performance compared to FBS-CO by leveraging CT-GAN's ability to generate a descriptive base population, which is then refined using FBS-CO to align with target-year marginals. This study demonstrates that CT-GAN represents an effective methodology for target population synthesis and highlights how deep generative models can be successfully integrated with conventional synthesis techniques to enhance their performance.
We present DSS-GAN, the first generative adversarial network to employ Mamba as a hierarchical generator backbone for noise-to-image synthesis. The central contribution is Directional Latent Routing (DLR), a novel conditioning mechanism that decomposes the latent vector into direction-specific subvectors, each jointly projected with a class embedding to produce a feature-wise affine modulation of the corresponding Mamba scan. Unlike conventional class conditioning that injects a global signal, DLR couples class identity and latent structure along distinct spatial axes of the feature map, applied consistently across all generative scales. DSS-GAN achieves improved FID, KID, and precision-recall scores compared to StyleGAN2-ADA across multiple tested datasets. Analysis of the latent space reveals that directional subvectors exhibit measurable specialization: perturbations along individual components produce structured, direction-correlated changes in the synthesized image.
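Read literally, Directional Latent Routing splits z into one subvector per scan direction and fuses each with a class embedding into per-channel affine parameters for that direction's scan. The sketch below follows that description under assumed dimensions; it is our reading of the abstract, not the DSS-GAN code.

```python
import torch
import torch.nn as nn

class DirectionalLatentRouting(nn.Module):
    """Per-direction latent subvectors fused with a class embedding,
    emitting feature-wise scale/shift for each scan direction."""
    def __init__(self, z_dim=256, n_dirs=4, n_classes=10, channels=128):
        super().__init__()
        assert z_dim % n_dirs == 0
        self.n_dirs = n_dirs
        self.embed = nn.Embedding(n_classes, z_dim // n_dirs)
        self.proj = nn.ModuleList(
            nn.Linear(2 * (z_dim // n_dirs), 2 * channels)
            for _ in range(n_dirs))

    def forward(self, z, y):
        subs = z.chunk(self.n_dirs, dim=1)  # direction-specific subvectors
        e = self.embed(y)
        mods = []
        for d, z_d in enumerate(subs):
            gamma, beta = self.proj[d](torch.cat([z_d, e], dim=1)).chunk(2, dim=1)
            mods.append((1 + gamma, beta))  # affine modulation for scan d
        return mods  # apply as h_d = (1 + gamma_d) * h_d + beta_d
```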
Images captured underwater frequently have low resolution as a result of several issues, including light attenuation, backscattering, and colour distortion. The restoration of underwater images, an essential building block of underwater vision research, remains a difficult endeavor; its main focus is removing the haziness and colour distortion caused by the underwater environment. In this research, we present an enhanced approach for underwater image enhancement called Improved Cycle GAN (Generative Adversarial Network). The suggested approach uses a dual architecture composed of a generator network and a discriminator network to learn the mapping between low-quality underwater photographs and high-quality images: the generator network is trained to transform the input image into an enhanced image, while the discriminator network evaluates the realism of the generated images. The suggested method outperforms state-of-the-art methods in visual quality on a real-world UFO underwater image dataset. For quantitative evaluation, the underwater image quality attributes underwater image colourfulness measure (UICM), underwater image sharpness measure (UISM), and underwater image contrast measure (UIConM) are assessed. The proposed method could be employed in various underwater image processing applications, such as underwater surveillance, marine biology research, and underwater exploration, where high-quality images are crucial for effective analysis and decision-making.
Acquiring sufficient visual information for the three-dimensional (3D) reconstruction of ships in navigation is particularly challenging. With the evolution of 3D reconstruction methodologies based on neural rendering, the computational pipeline for 3D reconstruction has been enhanced and optimized. However, this pipeline requires a substantial corpus of input images. Research into 3D reconstruction from monocular images is in its nascent stages, and to date no unsupervised deep learning approach exists for 3D reconstruction of ships from single-view UAV imagery in the navigation domain. This paper introduces a novel network architecture for reconstructing 3D representations of ships from single-view UAV images. Initially, an a priori statistical analysis of the dataset is conducted to harness color distribution information for noise generation. Subsequently, a novel generator and mask module are engineered to produce optimized feature outputs. In addition, discriminator and encoder networks, coupled with a tailored loss function, are formulated to direct model optimization. Finally, to demonstrate the effectiveness of our proposed method for single-view 3D reconstruction, we conducted experiments across three distinct datasets from various domains. Our method achieves an FID of 10.61 and an LPIPS of 0.091, the best among the six compared methods.
Recently, generative adversarial networks (GANs) have provided a powerful data augmentation means for fault diagnosis in limited samples scenarios. Nevertheless, the challenge of poor generalization ability with extremely limited training data still limits effective applications of GANs-based methods for industrial fault diagnosis. To address this challenge, a novel generalization generator is proposed for machinery fault diagnostics under extremely limited samples. It mainly consists of a random sampling module, a generating module and a generalization module to improve the quality of generated samples. Meanwhile, our proposed generalization generator can be flexibly combined with the architecture of GANs, which proves its universality and robustness. A case study for machinery fault diagnostics considering extremely limited samples has been carried out to validate the effectiveness and superiority of our proposed method. Experiment results have demonstrated that our proposed generalization generator can generate high-quality time-series samples and achieve superior diagnostic results under extremely limited training data, in contrast to some mainstream data augmentation approaches.
Evolutionary Algorithms (EA) have been leveraged to tackle the challenges faced while using GANs such as mode collapse, vanishing gradient, latent space search, etc. However, the existing techniques of using EA with GANs operate backpropagation and EA in isolation from each other, leaving ample room for further exploration. This paper creates a collaborative bridge between EA and GANs by exploring a neuroevolution method for utilising both EA and backpropagation-based optimisation, simultaneously, for a multi-generator GAN architecture. Experiments conducted using a standard dataset with variants of the proposed method highlight the towering impact of each of the components involved in the proposed method.
Underwater image enhancement and restoration pose persistent challenges in image processing. As light propagates through water, it undergoes scattering and absorption caused by depth and suspended particles, leading to blurring, haze, and color distortion—particularly in the blue and green channels. Although prior-knowledge-based and direct restoration methods offer certain improvements, they are often scene-dependent and require complex parameter tuning, limiting their practical applicability. Additionally, many deep learning-based approaches are computationally intensive and architecturally complex. To address these limitations, we propose a novel method based on generative adversarial training using real underwater images. The generator architecture is inspired by DCE-Net and enhanced with a compensation block to improve output quality. Comparative evaluations with state-of-the-art methods show that our approach delivers superior visual performance and more natural color restoration, highlighting its potential for real-world applications.
Digital communication requires strong methods to protect sensitive information from unauthorized access. Traditional image steganography techniques, like Least Significant Bit (LSB) substitution, have major drawbacks in balancing payload capacity with protection against steganalysis detection. This study introduces a new Generative Adversarial Network (GAN)-based steganography framework with three main parts: a Generator for message embedding, a Discriminator that, during training, ensures the stego image looks realistic, and an Extractor for accurate message recovery. Experimental tests on the Set14 dataset show better performance than traditional LSB and basic GAN methods: the framework achieves a Peak Signal-to-Noise Ratio (PSNR) of 39.10 dB, a Structural Similarity Index (SSIM) of 0.97, and a Bit Accuracy Rate (BAR) of 96.80%. The proposed framework provides better security and visual invisibility, making it suitable for secure communication in fields like defense, healthcare, and digital media.
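The abstract names the three components but not their internals; below is a minimal PyTorch sketch of such a Generator/Extractor pair. The 64×64 cover size, 256-bit message, layer widths, and residual embedding scheme are illustrative assumptions, not the paper's specification.

```python
# Sketch of a GAN-steganography Generator/Extractor pair (illustrative
# sizes; not the paper's exact architecture).
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Embeds a bit message into a cover image, producing a stego image."""
    def __init__(self, msg_len=256, img_size=64):
        super().__init__()
        self.img_size = img_size
        self.msg_fc = nn.Linear(msg_len, img_size * img_size)  # spread bits over a plane
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, cover, msg):
        plane = self.msg_fc(msg).view(-1, 1, self.img_size, self.img_size)
        residual = self.net(torch.cat([cover, plane], dim=1))
        return (cover + 0.1 * residual).clamp(-1, 1)  # small perturbation keeps PSNR high

class Extractor(nn.Module):
    """Recovers the embedded bits from a stego image."""
    def __init__(self, msg_len=256, img_size=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Flatten(), nn.Linear(32 * img_size * img_size, msg_len),
        )

    def forward(self, stego):
        return self.net(stego)  # logits; train with BCEWithLogitsLoss against bits
```

In training, one would add an adversarial term from the Discriminator and weigh it against the cover-distortion and bit-recovery losses.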
Convolutional deblurring is an advanced image restoration process that aims to recover sharp images from blurred ones caused by motion, defocus, or camera shake. This paper presents a deep learning approach leveraging DeblurGAN-v2, an improved generative adversarial network (GAN) architecture. The model integrates a lightweight generator with a hierarchical discriminator and utilizes attention mechanisms and dense skip connections to retain fine image details. A novel loss function is introduced to balance perceptual, structural similarity, and adversarial components, reducing oversmoothing and enhancing restoration quality.
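The loss is described only at the level of its three components; one plausible composite formulation is sketched below. The 0.5/0.3/0.2 weights, the relu3_3 VGG slice, and the single-window SSIM simplification are our assumptions, not the paper's exact design.

```python
# Sketch of a composite deblurring loss balancing perceptual, SSIM, and
# adversarial terms (weights and SSIM simplification are illustrative).
import torch
import torch.nn.functional as F
from torchvision.models import vgg16, VGG16_Weights

vgg_features = vgg16(weights=VGG16_Weights.DEFAULT).features[:16].eval()
for p in vgg_features.parameters():
    p.requires_grad_(False)

def ssim_global(x, y, c1=0.01**2, c2=0.03**2):
    """Simplified single-window SSIM computed over the whole image."""
    mx, my = x.mean(dim=(1, 2, 3)), y.mean(dim=(1, 2, 3))
    vx, vy = x.var(dim=(1, 2, 3)), y.var(dim=(1, 2, 3))
    cov = ((x - mx.view(-1, 1, 1, 1)) * (y - my.view(-1, 1, 1, 1))).mean(dim=(1, 2, 3))
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx**2 + my**2 + c1) * (vx + vy + c2))

def composite_loss(restored, sharp, disc_logits_on_restored):
    perceptual = F.mse_loss(vgg_features(restored), vgg_features(sharp))
    structural = (1 - ssim_global(restored, sharp)).mean()
    adversarial = F.binary_cross_entropy_with_logits(
        disc_logits_on_restored, torch.ones_like(disc_logits_on_restored))
    return 0.5 * perceptual + 0.3 * structural + 0.2 * adversarial
```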
Adaptive GAN Architecture with Constrained Latent Embedding for Unsupervised Video Anomaly Detection
We present CVAD-GAN, an adaptive GAN-based framework for unsupervised video anomaly detection that incorporates a constrained latent embedding. The model consists of an encoder-decoder Generator and a Discriminator, trained adversarially to distinguish normal from anomalous video frames. To improve reconstruction fidelity and anomaly discrimination, the latent space is regularised with a Kullback-Leibler (KL) divergence term that imposes a Gaussian prior. To improve generalizability and robustness, input frames are perturbed with white Gaussian noise. The Generator is tuned to reconstruct normal frames, so anomalies, never seen during training, produce distorted reconstructions. Skip connections and dilated convolutions preserve spatial features, enlarge the receptive field, and improve reconstruction quality. Anomaly scores are computed from the Discriminator's outputs and pixel-wise reconstruction errors, and the optimal threshold is found via grid search. Experiments on the UCSD Peds1 dataset show that CVAD-GAN outperforms current methods in both accuracy and speed. The framework is compact and can run on resource-limited devices for real-time surveillance.
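As one way to make the scoring step concrete, here is a small sketch combining pixel-wise reconstruction error with the Discriminator's output and grid-searching the threshold; the mixing weight, min-max normalization, and accuracy criterion are illustrative assumptions.

```python
# Sketch of anomaly scoring: convex mix of reconstruction error and the
# Discriminator's "fake" score, plus a grid search for the threshold.
import torch

def anomaly_scores(frames, generator, discriminator, alpha=0.7):
    with torch.no_grad():
        recon = generator(frames)
        pixel_err = ((frames - recon) ** 2).mean(dim=(1, 2, 3))          # per-frame MSE
        disc_score = 1 - torch.sigmoid(discriminator(recon)).squeeze(1)  # high = looks fake
    # min-max normalize each component before mixing
    norm = lambda s: (s - s.min()) / (s.max() - s.min() + 1e-8)
    return alpha * norm(pixel_err) + (1 - alpha) * norm(disc_score)

def best_threshold(scores, labels):
    """Grid search over candidate thresholds, maximizing frame-level accuracy."""
    best_t, best_acc = 0.0, 0.0
    for t in torch.linspace(0, 1, 101):
        acc = ((scores > t).long() == labels).float().mean().item()
        if acc > best_acc:
            best_t, best_acc = t.item(), acc
    return best_t, best_acc
```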
In this paper, to achieve efficient and precise video anomaly detection, an improved GAN-based framework, VALD-GAN, is introduced, in which a latent discriminator imposes a Gaussian distribution on the latent space. The model learns to reconstruct normal input video frames, and anomalies are identified via reconstruction errors and a new distance measure based on the Jeffrey divergence. The latent discriminator shapes the generator's latent space, making anomalies easier to distinguish. Experiments on two benchmark datasets, UCSD Peds2 and CUHK Avenue, show superior performance: on UCSD Peds2, VALD-GAN attains an AUC of 98.74% and an EER of 7.01%, and on CUHK Avenue an AUC of 95.03% and an EER of 9.04%, surpassing current state-of-the-art approaches. Generalization is enhanced by adding Gaussian noise ($\sigma = 0.34$), and the best reconstruction quality is obtained with hyperparameters $\lambda_{1} = 0.4$ and $\lambda_{2} = 0.8$. The mean inference time is low, making VALD-GAN applicable in real-time environments. This architecture substantially improves the localization and classification accuracy of anomalies.
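The Jeffrey (Jeffreys) divergence is the symmetrized Kullback-Leibler divergence; the sketch below computes it between intensity histograms of a frame and its reconstruction. Comparing normalized histograms is our illustrative choice, and the paper's exact distance construction may differ.

```python
# Sketch of the Jeffreys (symmetrized KL) divergence as a frame distance.
import torch

def jeffreys_divergence(p, q, eps=1e-8):
    """J(P,Q) = KL(P||Q) + KL(Q||P) for discrete distributions."""
    p, q = p + eps, q + eps
    return torch.sum(p * torch.log(p / q)) + torch.sum(q * torch.log(q / p))

def frame_distance(frame, recon, bins=64):
    """Jeffreys divergence between intensity histograms (images in [-1, 1])."""
    p = torch.histc(frame, bins=bins, min=-1, max=1)
    q = torch.histc(recon, bins=bins, min=-1, max=1)
    return jeffreys_divergence(p / p.sum(), q / q.sum())
```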
Generative Adversarial Networks (GANs) traditionally consist of two networks: a generator that creates new samples and a discriminator that evaluates the authenticity of these samples. Both networks are trained together competitively to generate samples indistinguishable from real data. This paper proposes a novel GAN architecture, Frank-GAN, which comprises two generators and one discriminator. The inspiration for this work comes from the movie "Catch Me If You Can." In the film, the main character, Frank, deceives the government by forging checks and certifications. After apprehending him, the government decides to utilize his expertise by appointing him to banking security to catch other forgers. Similarly, the second generator in Frank-GAN is specially trained using the feature-matching technique to match the activations of hidden discriminator-layer features between real and fake images, enhancing the GAN's effectiveness and stability. Unlike traditional GANs that rely on random noise as input, the input to the second generator consists of keypoints extracted from real images using the Scale-Invariant Feature Transform (SIFT). Frank-GAN was evaluated on the CelebA and Oxford 102 Flower datasets. The Fréchet inception distance (FID) improved from 48.5 to 7.04 on CelebA and from 49.84 to 6.04 on Oxford 102 Flower, while the inception score (IS) increased significantly from 2.70 to 18.68 on CelebA, when using Frank-GAN compared to traditional DCGANs. This demonstrates the strong performance of Frank-GAN in generating high-quality, realistic images at 128×128 resolution.
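Feature matching here follows the standard formulation (Salimans et al., 2016): the generator is trained to match the batch-mean activations of a hidden discriminator layer on real versus generated images. The sketch below shows this with a toy two-part discriminator; the particular layer split is an illustrative assumption.

```python
# Sketch of the feature-matching objective on a toy discriminator.
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        # hidden feature extractor whose activations are matched
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        )
        self.head = nn.Sequential(nn.Flatten(), nn.LazyLinear(1))

    def forward(self, x):
        return self.head(self.features(x))

def feature_matching_loss(disc, real, fake):
    """L2 distance between batch-mean hidden activations of real vs. fake."""
    f_real = disc.features(real).mean(dim=0)   # average over the batch
    f_fake = disc.features(fake).mean(dim=0)
    return torch.mean((f_real.detach() - f_fake) ** 2)
```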
This study presents a resource-efficient synthetic image generation architecture that integrates evolutionary optimization with wavelet-based adversarial learning. The proposed framework embeds a genetic algorithm within the generator to adaptively optimize network parameters, while the discriminator employs wavelet transform-based feature extraction to enhance classification fidelity and accelerate convergence. Experimental evaluation using the Fréchet Inception Distance (FID) across MNIST, Fashion-MNIST, and CelebA datasets demonstrates a 20% reduction in training time compared to conventional GAN architectures, with competitive generation quality. To enable practical deployment, the architecture was synthesized and implemented on the AMD Xilinx Kintex-7 FPGA KC705 platform. Benchmarking against state-of-the-art models reveals substantial computational gains: the proposed design achieves a 33%-76% reduction in inference latency (138 μs vs. 570 μs), a 36% decrease in power consumption (24 mW vs. 47 mW), and the lowest area footprint (21.7%) among all compared implementations. Resource utilization metrics further highlight architectural efficiency, requiring only 6,000 LUTs, 4,500 FFs, 10 BRAMs, and 6 DSPs on MNIST, representing a 43%-50% reduction relative to prior works. As dataset complexity increases, the model scales predictably, with deeper pipelines and expanded tiling (e.g., 64×64 for CelebA), while maintaining synthesis stability. The convergence of evolutionary search, wavelet-based discrimination, and hardware-aware design establishes a robust framework for real-time synthetic data generation in resource-constrained environments. The proposed architecture offers a reproducible and modular foundation for edge AI applications, enabling efficient deployment across diverse domains.
This study presents a method named Expo-GAN (Exposition-Generative Adversarial Network) for style transfer in exhibition hall design, using a refined version of the Cycle Generative Adversarial Network (CycleGAN). The primary goal is to enhance the transformation of image styles while maintaining visual consistency, an area where current CycleGAN models often fall short. These traditional models typically face difficulties in accurately capturing expansive features as well as the intricate stylistic details necessary for high-quality image transformation. To address these limitations, the research introduces several key modifications to the CycleGAN architecture. Enhancements to the generator involve integrating U-Net with SpecTransformer modules. This integration incorporates Fourier transform techniques coupled with multi-head self-attention mechanisms, which collectively improve the generator's ability to depict both large-scale structural patterns and minute elements in the generated images. This enhancement allows the generator to achieve a more detailed and coherent fusion of styles, essential for exhibition hall designs where both broad aesthetic strokes and detailed nuances matter significantly. The study also proposes changes to the discriminator, employing dilated convolution and global attention mechanisms. These are derived using the Differentiable Architecture Search (DARTS) neural architecture search framework to expand the receptive field, which is crucial for recognizing comprehensively styled artistic images. By broadening the ability to discern complex artistic features, the model avoids previous pitfalls associated with style inconsistency and missing detailed features. Moreover, the traditional cycle-consistency loss function is replaced with the Learned Perceptual Image Patch Similarity (LPIPS) metric. This shift significantly enhances the perceptual quality of the resultant images by prioritizing human-perceived similarities, which aligns better with user expectations and professional standards in design aesthetics. The experimental phase demonstrates that this approach consistently outperforms the conventional CycleGAN across a broad range of datasets. Complementary ablation studies and qualitative assessments underscore its superiority, particularly in maintaining detail fidelity and style continuity, which is critical for creating a visually harmonious exhibition hall design where every detail contributes to the overall aesthetic appeal. The results illustrate that this refined approach effectively bridges the gap between technical capability and artistic necessity, marking a significant advancement in computational design methodologies.
Cardiovascular diseases (CVDs) continue to be a primary cause of global mortality, requiring prompt detection and care to avert serious consequences. Electrocardiograms (ECGs) are vital diagnostic instruments for cardiovascular disorders, yet traditional diagnostic methods encounter considerable obstacles, such as class imbalance and substantial heterogeneity in the available data. This study introduces a novel ECG classification technique utilizing GAN and LSTM. Generative Adversarial Networks (GANs) are employed to produce additional synthetic ECG samples, addressing class imbalance in the training dataset. The proposed design integrates LSTM units into the GAN generator to capture and preserve temporal signal patterns, while anomaly detection is performed using a CNN-LSTM model. Experimental findings indicate a significant enhancement in classification performance, achieving an accuracy of up to 94.74%. The findings indicate that the proposed GAN-LSTM model significantly improves ECG anomaly detection and classification, offering substantial potential for the rapid and precise identification of arrhythmias.
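As a concrete illustration of placing LSTM units in the generator, the sketch below maps a per-step noise sequence through stacked LSTMs to a synthetic single-lead signal; the hidden sizes and the 187-sample segment length (typical of MIT-BIH beat segments) are illustrative assumptions.

```python
# Sketch of an LSTM-based GAN generator for ECG sequences.
import torch
import torch.nn as nn

class LSTMGenerator(nn.Module):
    def __init__(self, noise_dim=16, hidden=64, seq_len=187):
        super().__init__()
        self.seq_len = seq_len
        self.lstm = nn.LSTM(noise_dim, hidden, num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden, 1)  # one ECG amplitude per time step

    def forward(self, z):
        # z: (batch, seq_len, noise_dim) -- independent noise at each step
        h, _ = self.lstm(z)
        return torch.tanh(self.out(h)).squeeze(-1)  # (batch, seq_len)

# usage: a batch of 8 synthetic 187-sample beats
gen = LSTMGenerator()
fake_ecg = gen(torch.randn(8, 187, 16))  # (8, 187)
```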
Accurate and automated segmentation of x‐ray coronary angiography (XRCA) is crucial for both diagnosing and treating coronary artery diseases. Despite the outstanding results achieved by deep learning (DL)‐based methods in this area, this task remains challenging due to several factors such as poor image quality, the presence of motion artifacts, and inherent variability in vessel structure sizes. To address this challenge, this paper introduces a novel GAN‐based architecture for coronary artery segmentation using XRCA images. This architecture includes a novel U‐Net variant with two types of self‐attention blocks in the generator segment. An auxiliary path connects the attention block and the prediction block to enhance feature generalization, improving vessel structure delineation, especially thin vessels in low‐contrast regions. In parallel, the discriminator network employs a residual CNN with similar attention blocks for balanced performance and improved predictive capabilities. With a streamlined 6.74 M parameters, the resulting architecture surpasses existing methods in efficiency. We assess its efficacy on three coronary artery datasets: our private “CORONAR,” and the public “DCA1” and “CHUAC” datasets. Empirical results showcase our model's superiority across these datasets, utilizing both original and preprocessed images. Notably, our proposed architecture achieves the highest F1‐score of 0.7972 for the CHUAC dataset, 0.8245 for the DCA1 dataset, and 0.8333 for the CORONAR dataset.
Person re-identification, which aims to recognise a person across multiple non-overlapping camera views, is a significant challenge in computer vision. Generative Adversarial Networks (GANs) have been employed successfully in person re-identification tasks, where they can generate realistic images of a person based on a query image. In this paper, a novel Dual-Generator and Dual-Discriminator architecture for Conditional GANs is used for re-identification. The proposed architecture consists of two image generators and two discriminators that generate high-quality images and capture a person's identity information efficiently. The discriminators use the Wasserstein distance for classification between real and fake images. Additionally, the system utilizes low-light image optimization for pre-processing, which improves the quality and visibility of images captured in low-light conditions, making them more suitable for re-identification. Study results show that the Dual-Generator and Dual-Discriminator Conditional GAN achieves improved accuracy and resilience. This paper thus presents a new cutting-edge method for person re-identification.
Generative Adversarial Networks (GANs) have emerged as a powerful tool for generating high-fidelity content. This paper presents a new training procedure that leverages Neural Architecture Search (NAS) to discover the optimal architecture for image generation while employing the Maximum Mean Discrepancy (MMD) repulsive loss for adversarial training. Moreover, the generator network is compressed using tensor decomposition to reduce its computational footprint and inference time while preserving its generative performance. Experimental results show improvements of 34% and 28% in the FID score on the CIFAR-10 and STL-10 datasets, respectively, with corresponding footprint reductions of 14× and 31× compared to the best FID score method reported in the literature. The implementation code is available at: https://github.com/PrasannaPulakurthi/MMD-AdversarialNAS.
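The MMD repulsive loss adopted above follows Wang et al. (2019); a simplified single-kernel sketch on discriminator embeddings is given below. The Gaussian kernel with a fixed bandwidth is our simplification of the paper's multi-kernel setup.

```python
# Sketch of MMD^2 with a Gaussian kernel and the repulsive discriminator
# objective (single fixed-bandwidth kernel is a simplification).
import torch

def rbf_kernel(a, b, sigma=1.0):
    d2 = torch.cdist(a, b) ** 2
    return torch.exp(-d2 / (2 * sigma ** 2))

def mmd2(real_emb, fake_emb):
    """Biased MMD^2 estimate between embedding batches."""
    k_rr = rbf_kernel(real_emb, real_emb).mean()
    k_ff = rbf_kernel(fake_emb, fake_emb).mean()
    k_rf = rbf_kernel(real_emb, fake_emb).mean()
    return k_rr - 2 * k_rf + k_ff

def generator_loss(real_emb, fake_emb):
    return mmd2(real_emb, fake_emb)          # pull fake embeddings toward real

def discriminator_repulsive_loss(real_emb, fake_emb):
    # repulsive form L_D = E[k(x,x')] - E[k(y,y')]: spreads real embeddings
    # apart instead of only contrasting real vs. fake
    k_rr = rbf_kernel(real_emb, real_emb).mean()
    k_ff = rbf_kernel(fake_emb, fake_emb).mean()
    return k_rr - k_ff
```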
Denoising dental panoramic X-ray images has become a major concern in medical imaging and computer vision, particularly when Gaussian noise is high. To denoise noisy images while preserving image features, we introduce a fusion-refine generative adversarial network (FR-GAN). FR-GAN comprises a generator, a dual-attention U-Net, and a ResNet-enhanced discriminator. The generator takes noisy images as input and produces coarse representations for initial reconstruction. These coarse images serve as preliminary denoised images, which are then refined to restore structural features for image clarity. In addition, we incorporate a U-Net with a dual-attention mechanism linking the generator and discriminator, which refines coarse features into smooth features to create meaningful representations. Meanwhile, the U-Net refiner incorporates progressive feature fusion attention and low-rank local window attention to complement multi-scale feature extraction and local texture refinement. Lastly, the ResNet block-enhanced discriminator enforces perceptual realism by differentiating genuine and reconstructed images and directing the network to yield high-quality images. In our experiments, 100 dental images collected from Al-Badar Dental College and Hospital, Kalaburagi, were subjected to Gaussian noise at 20 dB and 40 dB using ImageJ software. The experimental results show that the proposed method is superior to other state-of-the-art approaches at both noise levels, achieving a PSNR of 31.705, SSIM of 0.826, MSE of 0.0007, and MAE of 0.0195 at 40 dB, and a PSNR of 32.897, SSIM of 0.861, MSE of 0.0007, and MAE of 0.0184 at 20 dB. The framework provides effective high-quality denoising with clean edges, clear textures, and few artifacts, and is applicable to precision-critical tasks such as medical and scientific imaging.
Images captured under severe environmental degradation, such as sand, dust, and haze, significantly reduce the performance of vision-based systems, including autonomous vehicles, surveillance platforms, remote sensing missions, UAVs, and other computer vision applications. To address these challenges, this paper proposes a GTD (Generator–Transformer–Discriminator) hybrid architecture for sand-dust and haze image enhancement, designed to handle multi-scale hierarchical distortions that affect both local textures and global color distributions. The proposed framework integrates lightweight generative modules for local feature extraction with Transformer-based global feature modelling. Specifically, the conventional encoder is replaced by three parallel generator–Transformer pathways operating on full-, half-, and quarter-resolution inputs, enabling effective representation of degradations across different receptive fields. Features at each scale are refined using attention fusion blocks that incorporate channel-wise and spatial attention mechanisms, followed by adaptive multi-scale feature combination and a unified discriminator for image restoration. The model is trained using a composite loss function consisting of adversarial, perceptual, L1, and SSIM losses. Experimental evaluations on sand-dust and haze datasets demonstrate that the proposed method achieves consistent improvements in terms of PSNR, SSIM, DSSIM, CIE94, visual quality, inference speed, and computational efficiency when compared with representative benchmark methods. In addition, an Energy Efficiency Index (EEI) is introduced to assess the trade-off between restoration quality and energy consumption. Owing to its lightweight design and efficient inference, the proposed framework is suitable for real-time and embedded vision applications operating under challenging environmental conditions.
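To make the three-pathway encoder concrete, here is a minimal sketch of parallel full-, half-, and quarter-resolution branches with channel-attention fusion; the branch design and the squeeze-style attention are illustrative stand-ins for the paper's generator-Transformer pathways and attention fusion blocks.

```python
# Sketch of three parallel multi-resolution pathways with attention fusion.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Pathway(nn.Sequential):
    def __init__(self, ch=32):
        super().__init__(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
        )

class MultiScaleEnhancer(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.paths = nn.ModuleList(Pathway(ch) for _ in range(3))
        # channel attention over the concatenated multi-scale features
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(3 * ch, 3 * ch, 1), nn.Sigmoid())
        self.out = nn.Conv2d(3 * ch, 3, 3, padding=1)

    def forward(self, x):
        feats = []
        for i, path in enumerate(self.paths):
            scaled = F.interpolate(x, scale_factor=1 / 2 ** i) if i else x
            f = path(scaled)
            if i:  # upsample half/quarter features back to full resolution
                f = F.interpolate(f, size=x.shape[-2:], mode="bilinear", align_corners=False)
            feats.append(f)
        fused = torch.cat(feats, dim=1)
        return torch.tanh(self.out(fused * self.attn(fused)))
```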
Ship detection in optical remote sensing (RS) images remains a persistent challenge in current research. While prevailing methods achieve satisfactory outcomes in detecting large ship objects within RS images, the identification of small-ship objects poses greater difficulty due to their limited pixel information. To address this challenge, the utilization of generative adversarial network-based (GAN-based) super-resolution (SR) techniques proves effective. Therefore, in this article, we present a high-resolution feature generator (HRFG) specifically tailored for small-ship detection. Different from previous GAN-based methods that rely on image-level SR or feature sharing between SR and detection, we design a new architecture that uses an additional network branch, that is, high-resolution feature extractor (HRFE), to extract real high-resolution (HR) feature as a feature-level supervisory signal. The intuition is that real HR features may guide the generator network to extract HR features from low-resolution (LR) images directly. Consequently, the feature for detection is extracted and enhanced at the same time so that a large amount of calculation brought by image-level SR is avoided. Additionally, we introduce a background degradation strategy within the HRFE to improve the performance of small object recognition. Extensive experiments on a self-assembled ship dataset and two other public datasets show the superiority of the proposed method in small-ship detection tasks.
No abstract available
This work focuses on designing a modified Generative Adversarial Network (GAN) to improve the generator's performance while keeping the discriminator's influence minimal. The network is trained using the MNIST dataset, which consists of handwritten digits in grayscale. A DCGAN-inspired (Deep Convolutional GAN) architecture is adopted, featuring convolutional layers in both the generator and discriminator. To ensure stable training and encourage the generator to produce more realistic images, several enhancements are applied. These include adding noise to real images, applying label smoothing, and introducing dropout layers within the discriminator to prevent it from overfitting. Additionally, the discriminator is trained with a slower learning rate compared to the generator, helping to maintain a balance between the two networks. During training, both generator and discriminator losses and accuracies are tracked and visualized. The results show that the generator becomes more effective over time, producing images that the discriminator struggles to classify correctly. As training progresses, the generator loss decreases and its accuracy increases, while the discriminator's accuracy tends toward chance level. These outcomes highlight how targeted adjustments to training and architecture can improve GAN performance by allowing the generator to learn better and replicate the characteristics of real data.
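The adjustments listed above are standard GAN stabilization tricks and translate directly into code; a minimal sketch follows, with illustrative values for the noise level, smoothed label, dropout rate, and the two learning rates.

```python
# Sketch of the stabilization tricks: instance noise on real images,
# one-sided label smoothing, discriminator dropout, slower D learning rate.
import torch
import torch.nn as nn

disc = nn.Sequential(  # toy MNIST discriminator with dropout
    nn.Conv2d(1, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2), nn.Dropout(0.3),
    nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2), nn.Dropout(0.3),
    nn.Flatten(), nn.Linear(128 * 7 * 7, 1))
gen = nn.Sequential(  # toy generator: 100-d noise -> 28x28 image
    nn.Linear(100, 128 * 7 * 7), nn.ReLU(), nn.Unflatten(1, (128, 7, 7)),
    nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(64, 1, 4, stride=2, padding=1), nn.Tanh())

opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4, betas=(0.5, 0.999))  # slower D
opt_g = torch.optim.Adam(gen.parameters(), lr=2e-4, betas=(0.5, 0.999))
bce = nn.BCEWithLogitsLoss()

def disc_step(real):
    opt_d.zero_grad()
    real_noisy = real + 0.05 * torch.randn_like(real)        # instance noise
    fake = gen(torch.randn(real.size(0), 100)).detach()
    loss = (bce(disc(real_noisy), torch.full((real.size(0), 1), 0.9))  # smoothed labels
            + bce(disc(fake), torch.zeros(real.size(0), 1)))
    loss.backward(); opt_d.step()
    return loss.item()
```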
Although Generative Adversarial Networks (GANs) have displayed a significant capacity for creating realistic data, they are frequently plagued by issues of mode collapse and instability during training. Advances such as the Wasserstein GAN with Gradient Penalty (WGAN-GP) have alleviated these problems, but improving sample quality and diversity by augmenting the generator's expressive power continues to be a vital research direction. This study puts forth a novel hybrid generator design that incorporates an Operational Neural Network (ONN) block within a WGAN-GP architecture. The central hypothesis is that the ONN's capacity for learning adaptive non-linearities—which allows neurons to define their own activation functions—endows the generator with an enhanced representational ability to model complex data distributions. A comparative analysis is performed on three models trained on the MNIST dataset: a baseline GAN, a WGAN-GP, and the proposed WGAN-GP with a hybrid ONN generator. The Fréchet Inception Distance (FID) score is used for performance evaluation. The findings consistently indicate that the hybrid ONN model attains lower FID scores, signifying a notable enhancement in the quality and diversity of the generated images and thereby underscoring the promise of learnable neuron operators in generative modeling.
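Operational layers let each neuron learn its own nonlinearity; in Self-ONNs this is commonly realized as a truncated Maclaurin series, i.e., a sum of convolutions over powers of a bounded input (Kiranyaz et al.). The sketch below shows that construction; Q = 3 and its placement in the generator are illustrative assumptions rather than the paper's configuration.

```python
# Sketch of a Self-ONN-style operational convolution layer.
import torch
import torch.nn as nn

class OperationalConv2d(nn.Module):
    """Approximates a learnable per-neuron activation via sum_q conv_q(x^q)."""
    def __init__(self, in_ch, out_ch, kernel_size=3, Q=3):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, kernel_size, padding=kernel_size // 2, bias=(q == 0))
            for q in range(Q))

    def forward(self, x):
        x = torch.tanh(x)  # keep the powers bounded, as Self-ONNs typically do
        return sum(conv(x ** (q + 1)) for q, conv in enumerate(self.convs))

# usage: drop-in replacement for a Conv2d block inside a WGAN-GP generator
layer = OperationalConv2d(64, 64)
out = layer(torch.randn(2, 64, 16, 16))
```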
Recent advances in text-to-image synthesis have captivated audiences worldwide, drawing considerable attention. Although significant progress has been made in generating photo-realistic images through large pre-trained autoregressive and diffusion models, these models face three critical constraints: (1) the requirement for extensive training data and numerous model parameters; (2) an inefficient, multi-step image generation process; and (3) difficulty in controlling output visual features, requiring intricately designed prompts to ensure text-image alignment. Addressing these challenges, we introduce the CLIP-GAN model, which integrates the pretrained CLIP model into both the generator and discriminator of the GAN. Our architecture includes a CLIP-based generator that employs visual concepts derived from CLIP through text prompts in a feature adapter module. We also propose a CLIP-based discriminator, utilizing CLIP's advanced scene understanding capabilities for more precise image quality evaluation. Additionally, our generator applies visual concepts from CLIP via the Text-based Generator Block (TG-Block) and the Polarized Feature Fusion Module (PFFM), enabling better fusion of text and image semantic information. This integration within the generator and discriminator enhances training efficiency, enabling our model to achieve evaluation results on par with large pre-trained autoregressive and diffusion models, but with a 94% reduction in learnable parameters. CLIP-GAN aims to achieve the best efficiency-accuracy trade-off in image generation given a limited resource budget. Extensive evaluations validate the superior performance of the model, demonstrating faster image generation and the potential for greater stylistic diversity within the GAN model, while still preserving a smooth latent space.
Underwater images frequently suffer from severe quality degradation due to light attenuation and scattering effects, manifesting as color distortion, low contrast, and detail blurring. These issues significantly impair the performance of downstream tasks. Therefore, underwater image enhancement (UIE) becomes a key technology to solve underwater image degradation. However, existing data-driven UIE methods typically rely on difficult-to-acquire paired data for training, severely limiting their practical applicability. To overcome this limitation, this study proposes MambaRA-GAN, a novel unpaired UIE framework built upon a CycleGAN architecture, which introduces a novel integration of Mamba and intra-domain reconstruction autoencoders. The key innovations of our work are twofold: (1) We design a generator architecture based on a Triple-Gated Mamba (TG-Mamba) block. This design dynamically allocates feature channels to three parallel branches via learnable weights, achieving optimal fusion of CNN’s local feature extraction capabilities and Mamba’s global modeling capabilities. (2) We construct an intra-domain reconstruction autoencoder, isomorphic to the generator, to quantitatively assess the quality of reconstructed images within the cycle consistency loss. This introduces more effective structural information constraints during training. The experimental results demonstrate that the proposed method achieves significant improvements across five objective performance metrics. Visually, it effectively restores natural colors, enhances contrast, and preserves rich detail information, robustly validating its efficacy for the UIE task.
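The channel-allocation idea in the TG-Mamba block can be sketched as learnable per-channel gates that softly split features across three parallel branches before fusion. Note the stand-in below replaces the actual Mamba branch with a simple global-context module, since a real SSM layer would require the mamba-ssm package; every design detail here is an illustrative assumption.

```python
# Sketch of triple-gated three-branch fusion with learnable channel gates.
import torch
import torch.nn as nn

class TripleGatedBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.gates = nn.Parameter(torch.zeros(3, ch))  # per-channel branch logits
        self.local3 = nn.Conv2d(ch, ch, 3, padding=1)
        self.local5 = nn.Conv2d(ch, ch, 5, padding=2)
        # global-context stand-in for the Mamba branch: squeeze-and-excite style
        self.glob = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Conv2d(ch, ch, 1), nn.Sigmoid())
        self.fuse = nn.Conv2d(ch, ch, 1)

    def forward(self, x):
        w = torch.softmax(self.gates, dim=0).view(3, 1, -1, 1, 1)  # weights sum to 1 per channel
        b0 = self.local3(w[0] * x)
        b1 = self.local5(w[1] * x)
        b2 = (w[2] * x) * self.glob(x)       # globally modulated branch
        return x + self.fuse(b0 + b1 + b2)   # residual fusion
```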
The detection and classification of lung nodules are crucial in medical imaging, as they significantly impact patient outcomes related to lung cancer diagnosis and treatment. However, existing models often suffer from mode collapse and poor generalizability, as they fail to capture the complete diversity of the data distribution. This study addresses these challenges by proposing a novel generative adversarial network (GAN) architecture tailored for semi-supervised lung nodule classification. The proposed DDDG-GAN model consists of dual generators and discriminators. Each generator specializes in benign or malignant nodules, generating diverse, high-fidelity synthetic images for each class. This dual-generator setup prevents mode collapse. The dual-discriminator framework enhances the model's generalization capability, ensuring better performance on unseen data. Feature fusion techniques are incorporated to refine the model's discriminatory power between benign and malignant nodules. The model is evaluated in two scenarios: (1) training and testing on the LIDC-IDRI dataset and (2) training on LIDC-IDRI, testing on the unseen LUNA16 dataset and the unseen LUNGx dataset. In Scenario 1, the DDDG-GAN achieved an accuracy of 92.56%, a precision of 90.12%, a recall of 95.87%, and an F1 score of 92.77%. In Scenario 2, the model demonstrated robust performance with an accuracy of 72.6%, a precision of 72.3%, a recall of 73.82%, and an F1 score of 73.39% when testing on LUNA16, and an accuracy of 71.23%, a precision of 67.56%, a recall of 73.52%, and an F1 score of 70.42% when testing on LUNGx. The results indicate that the proposed model outperforms state-of-the-art semi-supervised learning approaches. The DDDG-GAN model mitigates mode collapse and improves generalizability in lung nodule classification. It demonstrates superior performance on both the LIDC-IDRI and the unseen LUNA16 and LUNGx datasets, offering significant potential for improving diagnostic accuracy in clinical practice.
High spatial and temporal resolution remote sensing images are essential for monitoring vegetation, natural disasters, and changes in the ground surface. However, acquiring such images is challenging due to current technical limitations and cost constraints. Spatiotemporal fusion offers an effective and economical solution to achieve high spatial and temporal resolution simultaneously. This article introduces a new generative adversarial network (GAN) spatiotemporal fusion model based on multiscale convolution and attention mechanism for remote sensing images (MSCAM-GAN), to generate high-resolution fused images. The generator in MSCAM-GAN comprises three key components: feature extraction, feature fusion, and image reconstruction. Employing an encoder–decoder architecture, the generator effectively extracts multilevel features, accommodating significant resolution differences between high-resolution and low-resolution images. In the feature extraction stage, multiscale convolutional attention network (MSCAN) captures detailed features across multiple scales, dealing with spatial dependencies and long-distance relationships within the images. During the feature fusion stage, a dual parallel attention feature fusion mechanism is designed to fully integrate the extracted multiscale features. Different attention weights are assigned based on their contributions to the final output, resulting in more accurate predicted images. MSCAM-GAN was tested on the Coleambally irrigated area and lower Gwydir catchment datasets and compared with classic spatiotemporal fusion algorithms. Ablation experiments were conducted to evaluate the effectiveness of the various submodules in MSCAM-GAN. Experimental results and ablation analysis demonstrate the superior performance of the proposed method compared to other approaches.
No abstract available
In the scenario of limited labeled remote-sensing datasets, the model’s performance is constrained by the insufficient availability of data. Generative model-based data augmentation has emerged as a promising solution to this limitation. While existing generative models perform well in natural scene domains (e.g., faces and street scenes), their performance in remote sensing is hindered by severe data imbalance and the semantic similarity among land-cover classes. To tackle these challenges, we propose the Multi-Class Guided GAN (MCGGAN), a novel network for generating remote-sensing images from semantic labels. Our model features a dual-branch architecture with a global generator that captures the overall image structure and a multi-class generator that improves the quality and differentiation of land-cover types. To integrate these generators, we design a shared-parameter encoder for consistent feature encoding across two branches, and a spatial decoder that synthesizes outputs from the class generators, preventing overlap and confusion. Additionally, we employ perceptual loss ($L_{VGG}$) to assess perceptual similarity between generated and real images, and texture matching loss ($L_{T}$) to capture fine texture details. To evaluate the quality of image generation, we tested multiple models on two custom datasets (one from Chongzhou, Sichuan Province, and another from Wuzhen, Zhejiang Province, China) and a public dataset LoveDA. The results show that MCGGAN achieves improvements of 52.86 in FID, 0.0821 in SSIM, and 0.0297 in LPIPS compared to the Pix2Pix baseline. We also conducted comparative experiments to assess the semantic segmentation accuracy of the U-Net before and after incorporating the generated images. The results show that data augmentation with the generated images leads to an improvement of 4.47% in FWIoU and 3.23% in OA across the Chongzhou and Wuzhen datasets. Experiments show that MCGGAN can be effectively used as a data augmentation approach to improve the performance of downstream remote-sensing image segmentation tasks.
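The two auxiliary losses can be sketched as follows: $L_{VGG}$ compares VGG feature maps, and $L_{T}$ compares their Gram matrices, a common realization of texture matching. The relu3_3 VGG slice and the Gram-matrix interpretation of $L_{T}$ are our assumptions, not necessarily the paper's exact construction.

```python
# Sketch of a VGG perceptual loss (L_VGG) and a Gram-matrix texture
# matching loss (L_T).
import torch
import torch.nn.functional as F
from torchvision.models import vgg16, VGG16_Weights

feat = vgg16(weights=VGG16_Weights.DEFAULT).features[:16].eval()
for p in feat.parameters():
    p.requires_grad_(False)

def gram(f):
    b, c, h, w = f.shape
    f = f.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def perceptual_loss(fake, real):
    return F.mse_loss(feat(fake), feat(real))              # L_VGG

def texture_loss(fake, real):
    return F.mse_loss(gram(feat(fake)), gram(feat(real)))  # L_T
```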
Generative adversarial networks (GANs) can be used as a data augmentation technique in scenarios with limited labeled information and class imbalances, common issues in remote sensing datasets. The EfficientNet architecture has gained attention for achieving high accuracy with moderate computational cost. This work introduces the efficient balancing generative adversarial network (EffBaGAN), a generative network based on EfficientNet and specifically designed for the classification of multispectral remote sensing images, addressing data scarcity and class imbalance while minimizing network complexity. EffBaGAN is built upon a balancing generative adversarial network (BAGAN) architecture, incorporating a custom EfficientNet-based discriminator and generator. In particular, the discriminator is a reduced EfficientNet discriminator, a cut-down version of EfficientNet-B0 adapted to multispectral imagery. The generator, a residual EfficientNet generator, includes a residual EfficientNet-based path that enhances the quality of the generated synthetic samples. In addition, a superpixel-based sample extraction procedure further reduces the computational cost of the method. Experiments were conducted on large, very high-resolution multispectral images of vegetation, demonstrating that EffBaGAN achieves higher accuracy than other advanced classification methods, including vision transformers and residual BAGAN, while maintaining a significantly lower computational cost. In fact, EffBaGAN is more than twice as fast as residual BAGAN, making it an efficient solution for remote sensing image classification in data-scarce environments.
We revisit the problem of generating synthetic data under differential privacy. To address the core limitations of marginal-based methods, we propose the Private Adaptive Generative Adversarial Network with Bayes Network Structure (PrAda-GAN), which integrates the strengths of both GAN-based and marginal-based approaches. Our method adopts a sequential generator architecture to capture complex dependencies among variables, while adaptively regularizing the learned structure to promote sparsity in the underlying Bayes network. Theoretically, we establish diminishing bounds on the parameter distance, variable selection error, and Wasserstein distance. Our analysis shows that leveraging dependency sparsity leads to significant improvements in convergence rates. Empirically, experiments on both synthetic and real-world datasets demonstrate that PrAda-GAN outperforms existing tabular data synthesis methods in terms of the privacy–utility trade-off.
In this paper, we propose a ConvNeXt-GAN model for generating a blueberry leaf disease dataset, addressing the difficulty of acquiring such data and the scarcity of samples. The model incorporates a ConvNeXt feature extractor and a conditional mapping network into StyleGAN3's generator architecture. It combines StyleGAN's original training objectives with pixel distance and feature matching losses to improve visual fidelity and semantic consistency in high-resolution image synthesis tasks. The experimental results show that the blueberry leaf disease images generated by ConvNeXt-GAN are outstanding both visually and on key evaluation metrics, and that the performance of a disease detection model trained with the generated dataset improves significantly, providing strong data support for the blueberry leaf disease detection task.
As machine learning systems become integral to military applications, they face increasingly sophisticated adversarial threats. Conventional defenses often fail because they typically assume static adversarial strategies, lack real-time adaptation, and overlook complex data distributions and topological structures. To overcome this, we present TAP-GAN+, a real-time adversarial detection framework that integrates the Fast Fourier Transform (FFT), Topological Data Analysis, dual-generator adversarial simulation, and entropy-aware decision logic. The pipeline first converts inputs into the frequency domain, extracting robust features via FFT and capturing multi-scale topological structures using persistent homology. These features are fused into a compact representation, enabling efficient classification into real, adversarial, or suspicious categories. The training strategy includes a dual-generator architecture: G1 perturbs the pixel space, simulating clean-label attacks, while G2 introduces feature-space distortions to enhance robustness. The real-time detector evaluates fused features using entropy-based rules, ensuring low-latency decisions. Benchmark evaluations on CIFAR-10, CIFAKE, and CelebA datasets demonstrate superior detection accuracy, interpretability, and out-of-distribution generalization compared to prior methods. Optimized for mission-critical military contexts, TAP-GAN+ enhances situational awareness by detecting adversarial threats in real time.
This survey comprehensively consolidates recent advances in GAN generators across architectural innovation, computational paradigms, domain-knowledge integration, and theoretical analysis. Current research shows a clear trend toward interdisciplinary convergence: on one hand, hybrid Mamba, Transformer, and Diffusion architectures keep raising the ceiling of generation quality; on the other, quantum computing and evolutionary strategies are being explored as unconventional paths to model optimization. At the same time, generators are evolving from general-purpose image synthesis toward interpretable models that deeply integrate domain knowledge and physical constraints, while automated neural architecture search balances performance against deployment efficiency.