Diffusion Models for Fingerprint Generation
Cutting-Edge Applications of Diffusion Models in Fingerprint Generation
This group of papers focuses on applying denoising diffusion probabilistic models (DDPMs) and latent diffusion models (LDMs) to fingerprint synthesis, addressing challenges such as identity preservation, diverse sample generation, completion of partial fingerprints, and end-to-end generation of latent fingerprints.
- Universal Fingerprint Generation: Controllable Diffusion Model With Multimodal Conditions (Steven A. Grosz, Anil K. Jain, 2024, IEEE Transactions on Pattern Analysis and Machine Intelligence)
- DiffFinger: Advancing Synthetic Fingerprint Generation through Denoising Diffusion Probabilistic Models (Fred M. Grabovski, Lior Yasur, Yaniv Hacmon, Lior Nisimov, Stav Nimrod, 2024, ArXiv)
- Denoising Diffusion Probabilistic Model with Wavelet Packet Transform for Fingerprint Generation (Li Chen, Yong Chan, 2024, Jordanian Journal of Computers and Information Technology)
- Inpainting Diffusion Synthetic and Data Augment With Feature Keypoints for Tiny Partial Fingerprints (Mao-Hsiu Hsu, Yung-Ching Hsu, Ching-Te Chiu, 2025, IEEE Transactions on Biometrics, Behavior, and Identity Science)
- Diffusion Probabilistic Model Based End-to-End Latent Fingerprint Synthesis (Kejian Li, Xiao Yang, 2023, 2023 IEEE 4th International Conference on Pattern Recognition and Machine Learning (PRML))
- Fingerprint Synthesis from Diffusion Models and Generative Adversarial Networks (Weizhong Tang, Diego Andre Figueroa Llamosas, Donglin Liu, K. Johnsson, A. Sopasakis, 2025, No journal)
Theoretical Foundations of Condition-Guided and Controllable Diffusion Models
These papers supply the theoretical framework and technical details behind controllable generation with diffusion models, including ControlNet, latent diffusion models (LDMs), score-matching theory, and multi-condition guidance. These general-purpose generative techniques form the technical foundation for precise control over fingerprint identity (ID) and style.
- Conditional Image Generation with Score-Based Diffusion Models (Georgios Batzolis, Jan Stanczuk, C. Schonlieb, Christian Etmann, 2021, ArXiv)
- UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild (Can Qin, Shu Zhang, Ning Yu, Yihao Feng, Xinyi Yang, Yingbo Zhou, Haiquan Wang, Juan Carlos Niebles, Caiming Xiong, S. Savarese, Stefano Ermon, Yun Fu, Ran Xu, 2023, ArXiv)
- Layout-to-Image Generation with Localized Descriptions using ControlNet with Cross-Attention Control (Denis Lukovnikov, Asja Fischer, 2024, ArXiv)
- DMM: Building a Versatile Image Generation Model via Distillation-Based Model Merging (Tian-Shu Song, Weixin Feng, Shuai Wang, Xubin Li, Tiezheng Ge, Bo Zheng, Limin Wang, 2025, ArXiv)
- FlexControl: Computation-Aware ControlNet with Differentiable Router for Text-to-Image Generation (Zheng Fang, Li Xiang, Xu Cai, Kaicheng Zhou, Hongkai Wen, 2025, ArXiv)
- SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis (Dustin Podell, Zion English, Kyle Lacey, A. Blattmann, Tim Dockhorn, Jonas Muller, Joe Penna, Robin Rombach, 2023, ArXiv)
- Fourier Diffusion Models: A Method to Control MTF and NPS in Score-Based Stochastic Image Generation (Matthew Tivnan, Jacopo Teneggi, Tzu-Cheng Lee, Ruoqiao Zhang, K. Boedeker, L. Cai, G. Gang, Jeremias Sulam, J. Stayman, 2023, IEEE Transactions on Medical Imaging)
- Monte Carlo Score Matching for Image Generation (Nishanth Shetty, C. Seelamantula, 2025, ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP))
- High-Resolution Image Synthesis with Latent Diffusion Models (Robin Rombach, A. Blattmann, Dominik Lorenz, Patrick Esser, B. Ommer, 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))
- Long and Short Guidance in Score identity Distillation for One-Step Text-to-Image Generation (Mingyuan Zhou, Zhendong Wang, Huangjie Zheng, Hai Huang, 2024, ArXiv)
- DC-ControlNet: Decoupling Inter- and Intra-Element Conditions in Image Generation with Diffusion Models (Hongji Yang, Wencheng Han, Yucheng Zhou, Jianbing Shen, 2025, ArXiv)
- Uncertainty-Aware ControlNet: Bridging Domain Gaps with Synthetic Image Generation (Joshua Niemeijer, Jan Ehrhardt, H. Handels, Hristina Uzunova, 2025, 2025 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW))
- Multilevel Diffusion: Infinite Dimensional Score-Based Diffusion Models for Image Generation (Paul Hagemann, Lars Ruthotto, G. Steidl, Ni Yang, 2023, ArXiv)
- Layered Rendering Diffusion Model for Controllable Zero-Shot Image Synthesis (Zipeng Qi, Guoxi Huang, Chenyang Liu, Fei Ye, 2023, No journal)
- DCFace: Synthetic Face Generation with Dual Condition Diffusion Model (Minchul Kim, Feng Liu, Anil Jain, Xiaoming Liu, 2023, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))
Traditional GAN-Based Fingerprint Synthesis
Before diffusion models rose to prominence, generative adversarial networks (GANs) were the dominant approach to fingerprint synthesis. These papers cover the use of DCGAN, CycleGAN, style transfer, and related techniques to generate high-resolution fingerprints, fingerprints with Level-3 features, and multiple impressions of the same finger.
- SynFi: Automatic Synthetic Fingerprint Generation (M. Riazi, Seyed M. Chavoshian, F. Koushanfar, 2020, ArXiv)
- A deep convolutional generative adversarial network-based fake fingerprint generation method (Chenhao Zhong, Pengxin Xu, Longsheng Zhu, 2021, 2021 IEEE International Conference on Computer Science, Electronic Information Engineering and Intelligent Control Technology (CEI))
- Vikriti-ID: A Novel Approach For Real Looking Fingerprint Data-set Generation (Rishabh Shukla, Aditya Sinha, Vanshika Singh, Harkeerat Kaur, 2024, 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV))
- Level Three Synthetic Fingerprint Generation (André Brasil Vieira Wyzykowski, M. P. Segundo, R. Lemes, 2020, 2020 25th International Conference on Pattern Recognition (ICPR))
- A Lightweight GAN Network for Large Scale Fingerprint Generation (Masud An Nur Islam Fahim, H. Jung, 2020, IEEE Access)
- HQ-finGAN: High-Quality Synthetic Fingerprint Generation Using GANs (Ataher Sams, Homaira Huda Shomee, S. Rahman, 2022, Circuits, Systems, and Signal Processing)
- High Fidelity Fingerprint Generation: Quality, Uniqueness, And Privacy (Keivan Bahmani, Richard O. Plesh, Peter A. Johnson, S. Schuckers, Timothy Swyka, 2021, 2021 IEEE International Conference on Image Processing (ICIP))
- Multiresolution synthetic fingerprint generation (André Brasil Vieira Wyzykowski, M. P. Segundo, R. Lemes, 2022, IET Biometrics)
Security Evaluation and Task-Specific Applications of Fingerprint Generation
These papers address downstream applications of fingerprint generation, such as spoofing tests using generated fake fingerprints, fingerprint reconstruction from minutiae, and the use of generated images to probe security vulnerabilities in existing fingerprint recognition systems.
- Fingerprint Spoof Generation Using Style Transfer (Abdarahmane Wone, Joël Di Manno, Christophe Charrier, Christophe Rosenberger, 2025, IEEE Transactions on Biometrics, Behavior, and Identity Science)
- Fingerprint Image Generation Based on Attention-Based Deep Generative Adversarial Networks and Its Application in Deep Siamese Matching Model Security Validation (Jiahuai Ma, Xiaoyan Chen, 2024, Journal of Computational Methods in Engineering Applications)
- Deep Learning-Enhanced Fingerprint Generation and Security Verification in the Context of Siamese Network Matching Models (Junyan Guo, 2023, 2023 4th International Conference on Big Data, Artificial Intelligence and Internet of Things Engineering (ICBAIE))
- FingerGAN: A Constrained Fingerprint Generation Scheme for Latent Fingerprint Enhancement (Yanming Zhu, Xuefei Yin, Jiankun Hu, 2022, IEEE Transactions on Pattern Analysis and Machine Intelligence)
- Data-driven Reconstruction of Fingerprints from Minutiae Maps (A. Makrushin, V. Mannam, B.N Meghana Rao, J. Dittmann, 2022, 2022 IEEE 24th International Workshop on Multimedia Signal Processing (MMSP))
Performance Evaluation, Comparison, and Optimization of Generative Models
These papers focus on cross-method comparisons (e.g., GANs vs. diffusion models), studies of the suitability of evaluation metrics (e.g., FID, IS, MS-SSIM), and improving the training efficiency of fingerprint generation models through techniques such as distributed data parallel (DDP) training.
- Data augmentation-based enhanced fingerprint recognition using deep convolutional generative adversarial network and diffusion models (Yukai Liu, 2024, Applied and Computational Engineering)
- Enhancing Fingerprint Image Synthesis with GANs, Diffusion Models, and Style Transfer Techniques (W. Tang, D. Figueroa, D. Liu, K. Johnsson, A. Sopasakis, 2024, ArXiv)
- Evaluating the Suitability of Inception Score and Fréchet Inception Distance as Metrics for Quality and Diversity in Image Generation (Derrick Adrian Chan, S. Sithungu, 2024, Proceedings of the 2024 7th International Conference on Computational Intelligence and Intelligent Systems)
- Distributed Data Parallel Acceleration-Based Generative Adversarial Network for Fingerprint Generation (Shuguang Xiong, Huitao Zhang, Meng Wang, Ning Zhou, 2021, Innovations in Applied Engineering and Technology)
Taken together, these papers show fingerprint generation evolving from traditional rule-based and GAN-based methods toward diffusion models. The research focus has shifted from pursuing visual realism alone to achieving controllable generation through conditional guidance (e.g., minutiae, sensor type, quality level) and to using generated data to address core problems such as privacy protection, data scarcity, and the security evaluation of biometric systems.
A total of 38 related papers.
The utilization of synthetic data for fingerprint recognition has garnered increased attention due to its potential to alleviate privacy concerns surrounding sensitive biometric data. However, current methods for generating fingerprints have limitations in creating impressions of the same finger with useful intra-class variations. To tackle this challenge, we present GenPrint, a framework to produce fingerprint images of various types while maintaining identity and offering humanly understandable control over different appearance factors, such as fingerprint class, acquisition type, sensor device, and quality level. Unlike previous fingerprint generation approaches, GenPrint is not confined to replicating style characteristics from the training dataset alone: it enables the generation of novel styles from unseen devices without requiring additional fine-tuning. To accomplish these objectives, we developed GenPrint using latent diffusion models with multimodal conditions (text and image) for consistent generation of style and identity. Our experiments leverage a variety of publicly available datasets for training and evaluation. Results demonstrate the benefits of GenPrint in terms of identity preservation, explainable control, and universality of generated images. Importantly, the GenPrint-generated images yield comparable or even superior accuracy to models trained solely on real data and further enhance performance when augmenting the diversity of existing real fingerprint datasets.
The majority of contemporary fingerprint synthesis is based on the Generative Adversarial Network (GAN). Recently, the Denoising Diffusion Probabilistic Model (DDPM) has been demonstrated to be more effective than GAN in numerous scenarios, particularly in terms of diversity and fidelity. This research develops a model based on the enhanced DDPM for fingerprint generation. Specifically, the image is decomposed into sub-images of varying frequency sub-bands through the use of a wavelet packet transform (WPT). This method enables DDPM to operate at a more local and detailed level, thereby accurately obtaining the characteristics of the data. Furthermore, a polynomial noise schedule has been designed to replace the linear noise strategy, which can result in a smoother noise addition process. Experiments based on multiple metrics on the datasets SOCOFing and NIST4 demonstrate that the proposed model is superior to existing models.
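The abstract names two ingredients, a wavelet packet decomposition and a polynomial noise schedule, without giving exact settings. The sketch below illustrates both under stated assumptions: the cubic exponent, the db2 wavelet, and the one-level decomposition are placeholders, not the paper's choices; PyWavelets supplies the transform.

```python
import numpy as np
import pywt  # PyWavelets: provides the 2D wavelet packet transform

def polynomial_beta_schedule(T=1000, beta_min=1e-4, beta_max=0.02, power=3):
    """Hypothetical polynomial noise schedule: betas grow as t**power,
    giving a gentler early phase than the usual linear ramp."""
    t = np.linspace(0.0, 1.0, T)
    return beta_min + (beta_max - beta_min) * t ** power

def wavelet_packet_subbands(image, wavelet="db2", level=1):
    """Decompose an image into frequency sub-band images via a 2D wavelet
    packet transform; the DDPM would then operate on these sub-bands."""
    wp = pywt.WaveletPacket2D(data=image, wavelet=wavelet, maxlevel=level)
    return {node.path: node.data for node in wp.get_level(level)}

betas = polynomial_beta_schedule()
alphas_bar = np.cumprod(1.0 - betas)  # standard DDPM closed-form quantities
```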
This study explores the generation of synthesized fingerprint images using Denoising Diffusion Probabilistic Models (DDPMs). The significant obstacles in collecting real biometric data, such as privacy concerns and the demand for diverse datasets, underscore the imperative for synthetic biometric alternatives that are both realistic and varied. Despite the strides made with Generative Adversarial Networks (GANs) in producing realistic fingerprint images, their limitations prompt us to propose DDPMs as a promising alternative. DDPMs are capable of generating images with increasing clarity and realism while maintaining diversity. Our results reveal that DiffFinger not only competes with authentic training set data in quality but also provides a richer set of biometric data, reflecting true-to-life variability. These findings mark a promising stride in biometric synthesis, showcasing the potential of DDPMs to advance the landscape of fingerprint identification and authentication systems.
Generating synthetic datasets for training face recognition models is challenging because dataset generation entails more than creating high fidelity images. It involves generating multiple images of the same subjects under different factors (e.g., variations in pose, illumination, expression, aging and occlusion) which follows the real image conditional distribution. Previous works have studied the generation of synthetic datasets using GAN or 3D models. In this work, we approach the problem from the aspect of combining subject appearance (ID) and external factor (style) conditions. These two conditions provide a direct way to control the inter-class and intra-class variations. To this end, we propose a Dual Condition Face Generator (DCFace) based on a diffusion model. Our novel Patch-wise style extractor and Time-step dependent ID loss enable DCFace to consistently produce face images of the same subject under different styles with precise control. Face recognition models trained on synthetic images from the proposed DCFace provide higher verification accuracies compared to previous works by 6.11% on average in 4 out of 5 test datasets, LFW, CFP-FP, CPLFW, AgeDB and CALFW.
Achieving machine autonomy and human control often represent divergent objectives in the design of interactive AI systems. Visual generative foundation models such as Stable Diffusion show promise in navigating these goals, especially when prompted with arbitrary languages. However, they often fall short in generating images with spatial, structural, or geometric controls. The integration of such controls, which can accommodate various visual conditions in a single unified model, remains an unaddressed challenge. In response, we introduce UniControl, a new generative foundation model that consolidates a wide array of controllable condition-to-image (C2I) tasks within a singular framework, while still allowing for arbitrary language prompts. UniControl enables pixel-level-precise image generation, where visual conditions primarily influence the generated structures and language prompts guide the style and context. To equip UniControl with the capacity to handle diverse visual conditions, we augment pretrained text-to-image diffusion models and introduce a task-aware HyperNet to modulate the diffusion models, enabling the adaptation to different C2I tasks simultaneously. Trained on nine unique C2I tasks, UniControl demonstrates impressive zero-shot generation abilities with unseen visual conditions. Experimental results show that UniControl often surpasses the performance of single-task-controlled methods of comparable model sizes. This control versatility positions UniControl as a significant advancement in the realm of controllable visual generation.
Fingerprints have been crucial evidence for law enforcement agencies for a long time. Though the rapidly developing deep learning has dramatically improved the performance of the latent fingerprint recognition algorithm, a fully automated latent fingerprint identification system is still far from meeting actual needs. One major issue is the lack of publicly available latent fingerprint databases. Recently, diffusion probabilistic models have emerged as state-of-the-art deep generative methods for image synthesis. These models have better distribution coverage and less mode collapse than the popular Generative Adversarial Networks. In this paper, we propose an end-to-end latent fingerprint synthesis approach based on the improved denoising diffusion probabilistic model. The proposed approach can simultaneously generate latent, rolled, and plain fingerprints of high visual realism. Several primary degradation factors, such as various background textures, limited area of ridge patterns, and structural noise, can be directly generated without any postprocessing, unlike existing methods. We conduct NFIQ2 and perceptual analysis in the experiments to evaluate the proposed approach. The results indicate that the quality and visual realism of the proposed synthetic fingerprints are similar to those of natural ones, demonstrating the effectiveness of our approach.
We present SDXL, a latent diffusion model for text-to-image synthesis. Compared to previous versions of Stable Diffusion, SDXL leverages a three times larger UNet backbone: the increase in model parameters is mainly due to more attention blocks and a larger cross-attention context, as SDXL uses a second text encoder. We design multiple novel conditioning schemes and train SDXL on multiple aspect ratios. We also introduce a refinement model which is used to improve the visual fidelity of samples generated by SDXL using a post-hoc image-to-image technique. We demonstrate that SDXL shows drastically improved performance compared to the previous versions of Stable Diffusion and achieves results competitive with those of black-box state-of-the-art image generators. In the spirit of promoting open research and fostering transparency in large model training and evaluation, we provide access to code and model weights at https://github.com/Stability-AI/generative-models
By decomposing the image formation process into a sequential application of denoising autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image data and beyond. Additionally, their formulation allows for a guiding mechanism to control the image generation process without retraining. However, since these models typically operate directly in pixel space, optimization of powerful DMs often consumes hundreds of GPU days and inference is expensive due to sequential evaluations. To enable DM training on limited computational resources while retaining their quality and flexibility, we apply them in the latent space of powerful pretrained autoencoders. In contrast to previous work, training diffusion models on such a representation allows for the first time to reach a near-optimal point between complexity reduction and detail preservation, greatly boosting visual fidelity. By introducing cross-attention layers into the model architecture, we turn diffusion models into powerful and flexible generators for general conditioning inputs such as text or bounding boxes and high-resolution synthesis becomes possible in a convolutional manner. Our latent diffusion models (LDMs) achieve new state of the art scores for image inpainting and class-conditional image synthesis and highly competitive performance on various tasks, including unconditional image generation, text-to-image synthesis, and super-resolution, while significantly reducing computational requirements compared to pixel-based DMs.
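As a rough illustration of the core LDM idea described above, the sketch below runs one DDPM training step in the latent space of a frozen pretrained autoencoder. `encoder` and `unet` are placeholder modules with assumed call signatures, not the Stable Diffusion API.

```python
import torch
import torch.nn.functional as F

def ldm_training_step(encoder, unet, x, betas):
    """One denoising training step in latent space: encode the image to a
    compact latent, noise it at a random timestep, and train the UNet to
    predict the added noise (the epsilon-prediction objective)."""
    with torch.no_grad():
        z = encoder(x)                              # frozen autoencoder latent
    t = torch.randint(0, betas.shape[0], (z.shape[0],), device=z.device)
    a_bar = torch.cumprod(1.0 - betas, dim=0)[t].view(-1, 1, 1, 1)
    eps = torch.randn_like(z)
    z_t = a_bar.sqrt() * z + (1.0 - a_bar).sqrt() * eps  # closed-form forward
    return F.mse_loss(unet(z_t, t), eps)
```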
No abstract available
We present novel approaches involving generative adversarial networks and diffusion models in order to synthesize high quality, live and spoof fingerprint images while preserving features such as uniqueness and diversity. We generate live fingerprints from noise with a variety of methods, and we use image translation techniques to translate live fingerprint images to spoof. To generate different types of spoof images based on limited training data we incorporate style transfer techniques through a cycle autoencoder equipped with a Wasserstein metric along with Gradient Penalty (CycleWGAN-GP) in order to avoid mode collapse and instability. We find that when the spoof training data includes distinct spoof characteristics, it leads to improved live-to-spoof translation. We assess the diversity and realism of the generated live fingerprint images mainly through the Fréchet Inception Distance (FID) and the False Acceptance Rate (FAR). Our best diffusion model achieved a FID of 15.78. The comparable WGAN-GP model achieved slightly higher FID while performing better in the uniqueness assessment due to a slightly lower FAR when matched against the training data, indicating better creativity. Moreover, we give example images showing that a DDPM model clearly can generate realistic fingerprint images.
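FID, the main realism measure above, is concrete enough to write out: FID = ||mu_r - mu_f||^2 + Tr(S_r + S_f - 2(S_r S_f)^(1/2)) over Gaussian fits to Inception features. A minimal sketch (extracting the feature vectors from an Inception network is omitted):

```python
import numpy as np
from scipy import linalg

def frechet_inception_distance(feat_real, feat_fake):
    """FID between two sets of Inception feature vectors of shape (N, D)."""
    mu_r, mu_f = feat_real.mean(0), feat_fake.mean(0)
    s_r = np.cov(feat_real, rowvar=False)
    s_f = np.cov(feat_fake, rowvar=False)
    covmean = linalg.sqrtm(s_r @ s_f)   # matrix square root of the product
    if np.iscomplexobj(covmean):        # numerical noise can leave tiny imaginary parts
        covmean = covmean.real
    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(s_r + s_f - 2.0 * covmean))
```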
The progress of fingerprint recognition applications encounters substantial hurdles due to privacy and security concerns, leading to limited fingerprint data availability and stringent data quality requirements. This article endeavors to tackle the challenges of data scarcity and data quality in fingerprint recognition by implementing data augmentation techniques. Specifically, this research employed two state-of-the-art generative models in the domain of deep learning, namely Deep Convolutional Generative Adversarial Network (DCGAN) and the Diffusion model, for fingerprint data augmentation. Generative Adversarial Network (GAN), as a popular generative model, effectively captures the features of sample images and learns the diversity of the sample images, thereby generating realistic and diverse images. DCGAN, as a variant model of traditional GAN, inherits the advantages of GAN while alleviating issues such as blurry images and mode collapse, resulting in improved performance. On the other hand, Diffusion, as one of the most popular generative models in recent years, exhibits outstanding image generation capabilities and surpasses traditional GAN in some image generation tasks. The experimental results demonstrate that both DCGAN and Diffusion can generate clear, high-quality fingerprint images, fulfilling the requirements of fingerprint data augmentation. Furthermore, through the comparison between DCGAN and Diffusion, it is concluded that the quality of fingerprint images generated by DCGAN is superior to the results of Diffusion, and DCGAN exhibits higher efficiency in both training and generating images compared to Diffusion.
The advancement of fingerprint research within public academic circles has been trailing behind facial recognition, primarily due to the scarcity of extensive publicly available datasets, despite fingerprints being widely used across various domains. Recent progress has seen the application of deep learning techniques to synthesize fingerprints, predominantly focusing on large-area fingerprints within existing datasets. However, with the emergence of AIoT and edge devices, the importance of tiny partial fingerprints has been underscored for their faster and more cost-effective properties. Yet, there remains a lack of publicly accessible datasets for such fingerprints. To address this issue, we introduce publicly available datasets tailored for tiny partial fingerprints. Using advanced generative deep learning, we pioneer diffusion methods for fingerprint synthesis. By combining random sampling with inpainting diffusion guided by feature keypoint masks, we enhance data augmentation while preserving key features, achieving up to a 99.1% recognition matching rate. To demonstrate the usefulness of the fingerprint images generated with our approach, we conducted experiments involving model training for various tasks, including denoising, deblurring, and deep forgery detection. The results showed that models trained with our generated datasets outperformed those trained without our datasets or with other synthetic datasets. This indicates that our approach not only produces diverse fingerprints but also improves the model's generalization capabilities. Furthermore, our approach ensures confidentiality without compromise by partially transforming randomly sampled synthetic fingerprints, which reduces the likelihood of real fingerprints being leaked. The total number of generated fingerprints published in this article amounts to 818,077. Moving forward, we plan ongoing updates and releases to contribute to the advancement of the tiny partial fingerprint field. The code and our generated tiny partial fingerprint dataset can be accessed at https://github.com/Hsu0623/Inpainting-Diffusion-Synthetic-and-Data-Augment-with-Feature-Keypoints-for-Tiny-Partial-Fingerprints.git
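The abstract does not spell out the inpainting sampler, so the sketch below gives one plausible RePaint-style reading of "inpainting diffusion guided by feature keypoint masks": at each reverse step the masked (keypoint) pixels are re-noised from the original image while the rest is denoised by the model. All names and the scalar-timestep interface are assumptions, not the paper's code.

```python
import torch

@torch.no_grad()
def masked_reverse_step(model, x_t, x_known, mask, t, betas):
    """One inpainting reverse step; mask == 1 marks pixels to preserve
    (e.g., regions around feature keypoints). `model` predicts noise."""
    alphas = 1.0 - betas
    alphas_bar = torch.cumprod(alphas, dim=0)
    # forward-diffuse the known pixels to the current noise level
    known_t = (alphas_bar[t].sqrt() * x_known
               + (1 - alphas_bar[t]).sqrt() * torch.randn_like(x_known))
    # standard DDPM posterior mean for the unknown pixels
    eps = model(x_t, t)
    mean = (x_t - betas[t] / (1 - alphas_bar[t]).sqrt() * eps) / alphas[t].sqrt()
    unknown_t = mean + betas[t].sqrt() * torch.randn_like(x_t) if t > 0 else mean
    return mask * known_t + (1 - mask) * unknown_t
```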
This study addresses the critical need to evaluate the security of deep learning models in fingerprint recognition systems, by testing their vulnerability to misidentification. While deep learning techniques have significantly advanced biometric authentication, the potential for misclassification and unauthorized access due to synthetic fingerprints has not been thoroughly investigated. To this end, we propose an enhanced Deep Convolutional Generative Adversarial Network (DCGAN) with attention mechanisms to generate realistic synthetic fingerprint images. These images are then used to test the robustness and security of a Siamese Network employed for fingerprint matching. Experimental results demonstrate that the AE-DCGAN model outperforms traditional DCGANs in image quality and precision, achieving higher accuracy in generating realistic fingerprint textures. Additionally, the Siamese Network, when tested with synthetic fingerprints, reveals certain vulnerabilities, highlighting potential risks in security. Grad-CAM visualizations are employed to further understand the model's attention during fingerprint matching, providing insights into how the model focuses on key fingerprint features. The proposed approach aims to investigate both the generation and recognition phases, contributing to improved robustness and reliability in fingerprint-based systems.
Score-based diffusion models (SBDM) have recently emerged as state-of-the-art approaches for image generation. Existing SBDMs are typically formulated in a finite-dimensional setting, where images are considered as tensors of a finite size. This paper develops SBDMs in the infinite-dimensional setting, that is, we model the training data as functions supported on a rectangular domain. Besides the quest for generating images at ever higher resolution, our primary motivation is to create a well-posed infinite-dimensional learning problem so that we can discretize it consistently on multiple resolution levels. We thereby hope to obtain diffusion models that generalize across different resolution levels and improve the efficiency of the training process. We demonstrate how to overcome two shortcomings of current SBDM approaches in the infinite-dimensional setting. First, we modify the forward process to ensure that the latent distribution is well-defined in the infinite-dimensional setting using the notion of trace class operators. Second, we illustrate that approximating the score function with an operator network, in our case Fourier neural operators (FNOs), is beneficial for multilevel training. After deriving the forward process in the infinite-dimensional setting and reverse processes for finite approximations, we show their well-posedness, derive adequate discretizations, and investigate the role of the latent distributions. We provide first promising numerical results on two datasets, MNIST and material structures. In particular, we show that multilevel training is feasible within this framework.
Score-based diffusion models are new and powerful tools for image generation. They are based on a forward stochastic process where an image is degraded with additive white noise and optional input scaling. A neural network can be trained to estimate the time-dependent score function, and used to run the reverse-time stochastic process to generate new samples from the training image distribution. However, one issue is that sampling the reverse process requires many passes of the neural network. In this work we present Fourier Diffusion Models which replace the scalar operations of the forward process with linear shift invariant systems and additive spatially-stationary noise. This allows for a model of continuous probability flow from true images to measurements with a specific modulation transfer function (MTF) and noise power spectrum (NPS). We also derive the reverse process for posterior sampling of high-quality images given blurry noisy measurements. We conducted a computational experiment using the Lung Image Database Consortium dataset of chest CT images and simulated low-dose CT measurements with noise and system blur. Our results show that Fourier diffusion models can improve image quality for supervised diffusion posterior sampling relative to existing conditional diffusion models.
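As a toy version of the forward process just described, the snippet below replaces scalar scaling and white noise with a transfer function `H_t` and stationary noise shaped to a power spectrum `nps_t`, both given on the 2D FFT grid. This is a sketch of the idea under assumed normalization conventions, not the authors' formulation.

```python
import numpy as np

def fourier_forward_sample(x0, H_t, nps_t, rng=None):
    """Sample x_t: filter the clean image with the MTF-like response H_t
    and add stationary Gaussian noise with power spectrum nps_t."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.fft.fft2(x0, norm="ortho")
    blurred = np.fft.ifft2(H_t * X, norm="ortho").real
    # shape unit white noise in the Fourier domain (normalization glossed over)
    white = np.fft.fft2(rng.standard_normal(x0.shape), norm="ortho")
    noise = np.fft.ifft2(np.sqrt(nps_t) * white, norm="ortho").real
    return blurred + noise
```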
Score-based diffusion models have emerged as one of the most promising frameworks for deep generative modelling. In this work we conduct a systematic comparison and theoretical analysis of different approaches to learning conditional probability distributions with score-based diffusion models. In particular, we prove results which provide a theoretical justification for one of the most successful estimators of the conditional score. Moreover, we introduce a multi-speed diffusion framework, which leads to a new estimator for the conditional score, performing on par with previous state-of-the-art approaches. Our theoretical and experimental findings are accompanied by an open source library MSDiff which allows for application and further research of multi-speed diffusion models.
No abstract available
Score-based models are state-of-the-art generative models for image generation. We propose a novel loss namely the Monte Carlo Score Matching (MCSM) loss as an approximation of the original score matching loss. MCSM leverages a Taylor-series expansion of the score function to approximate the expensive calculation involved in computing the trace of the Jacobian of the score function. MCSM is competitive with models trained using the Sliced-Score Matching (SSM) loss. We validate the efficacy of the proposed technique in terms of negative log-likelihood and Fréchet Inception distance (FID) on MNIST and CelebA datasets, respectively. In particular, we show that FID of images generated with models trained using MCSM loss is on par with, and in some cases, better than, sliced score-matching for image generation.
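The MCSM loss itself is not given in the abstract, but the sliced score matching baseline it is measured against can be written compactly: the expensive trace of the score's Jacobian is replaced by a Hutchinson-style estimate with random projections. A minimal PyTorch sketch, assuming `score_net` maps a batch to scores of the same shape:

```python
import torch

def sliced_score_matching_loss(score_net, x, n_projections=1):
    """SSM (variance-reduced form): 0.5 * ||s(x)||^2 + E_v[v^T J_s(x) v],
    with the trace term estimated via random Gaussian projections v."""
    x = x.requires_grad_(True)
    s = score_net(x)
    batch_dims = tuple(range(1, x.dim()))
    loss = 0.5 * (s ** 2).sum(dim=batch_dims).mean()
    for _ in range(n_projections):
        v = torch.randn_like(x)
        grad_sv = torch.autograd.grad((s * v).sum(), x, create_graph=True)[0]
        loss = loss + (grad_sv * v).sum(dim=batch_dims).mean() / n_projections
    return loss
```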
The success of text-to-image (T2I) generation models has spurred a proliferation of numerous model checkpoints fine-tuned from the same base model on various specialized datasets. This overwhelming specialized model production introduces new challenges for high parameter redundancy and huge storage cost, thereby necessitating the development of effective methods to consolidate and unify the capabilities of diverse powerful models into a single one. A common practice in model merging adopts static linear interpolation in the parameter space to achieve the goal of style mixing. However, it neglects the features of T2I generation task that numerous distinct models cover sundry styles which may lead to incompatibility and confusion in the merged model. To address this issue, we introduce a style-promptable image generation pipeline which can accurately generate arbitrary-style images under the control of style vectors. Based on this design, we propose the score distillation based model merging paradigm (DMM), compressing multiple models into a single versatile T2I model. Moreover, we rethink and reformulate the model merging task in the context of T2I generation, by presenting new merging goals and evaluation protocols. Our experiments demonstrate that DMM can compactly reorganize the knowledge from multiple teacher models and achieve controllable arbitrary-style generation.
Variational Autoencoders (VAEs) have gained popularity as one of the main approaches for generating diverse and high-quality synthetic images. This study examines the suitability of evaluation metrics, specifically Inception Score and Fréchet Inception Distance (FID), for assessing these images. Particularly, the study focuses on the generation of synthetic images based on the MNIST handwritten digits dataset. Through the use of VAE-generated MNIST image samples, the study analyses the abovementioned metrics alongside alternative methods that can be used to assess image quality and diversity. The findings made from the study reveal the strengths and limitations of each metric in evaluating image quality and diversity. This paper underscores the need for tailored metrics to enhance the evaluation of generative models, while specifically using the performance of a VAE as the domain of investigation.
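For reference, the Inception Score examined here reduces to a few lines once class probabilities p(y|x) from an Inception-style classifier are in hand (running the classifier is omitted):

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """IS = exp( E_x KL( p(y|x) || p(y) ) ) from a (N, C) array of
    per-sample class probabilities; higher is better."""
    p_y = probs.mean(axis=0, keepdims=True)               # marginal p(y)
    kl = (probs * (np.log(probs + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))
```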
This paper introduces innovative solutions to enhance spatial controllability in diffusion models reliant on text queries. We first introduce vision guidance as a foundational spatial cue within the perturbed distribution. This significantly refines the search space in a zero-shot paradigm to focus on the image sampling process adhering to the spatial layout conditions. To precisely control the spatial layouts of multiple visual concepts with the employment of vision guidance, we propose a universal framework, Layered Rendering Diffusion (LRDiff), which constructs an image-rendering process with multiple layers, each of which applies the vision guidance to instructively estimate the denoising direction for a single object. Such a layered rendering strategy effectively prevents issues like unintended conceptual blending or mismatches while allowing for more coherent and contextually accurate image synthesis. The proposed method offers a more efficient and accurate means of synthesising images that align with specific layout and contextual requirements. Through experiments, we demonstrate that our method outperforms existing techniques, both quantitatively and qualitatively, in two specific layout-to-image tasks: bounding box-to-image and instance mask-to-image. Furthermore, we extend the proposed framework to enable spatially controllable editing.
Nowadays, biometrics is becoming more and more present in our everyday lives. They are used in ID documents, border controls, authentication, and e-payment, etc. Therefore, ensuring the security of biometric systems has become a major concern. The certification process aims at qualifying the behavior of a biometric system and verifying its conformity to international specifications. It involves the evaluation of the system performance and its robustness to attacks. Anti-spoofing tests require the creation of physical presentation attack instruments (PAIs), which are used to evaluate the robustness of biometric systems against spoofing through multiple attempts of testing on the device. In this article, we propose a new solution based on deep learning to generate synthetic fingerprint spoof images from a small dataset of real-life images acquired by a specific sensor. We artificially modify these images to simulate how they would appear if generated from known spoof materials usually involved in fingerprint spoofing tests. Experiments on LivDet datasets show first, that synthetic fingerprint spoof images give similar performance to real-life ones from a matching point of view only and second, that injection attacks succeed 50% of the time for most of the materials we tested.
Fingerprint recognition, a widely adopted technology in various domains, confronts significant challenges in both technical and practical realms. Therefore, it is essential to develop an effective algorithm to improve the accuracy of fingerprint recognition. In this study, a novel method is proposed to generate fake fingerprint images with a Deep Convolutional Generative Adversarial Network (DCGAN), utilizing FVC2002 and FVC2004 as the dataset. Furthermore, a dataset consisting of 6,000 fingerprint pairs, with the first 3,000 pairs collected from the same individual and the remaining 3,000 pairs from different individuals, is employed to train a Siamese Network. Finally, the real fingerprint image and the generated fingerprint image are used as two inputs to the Siamese Network to verify whether any two fingerprint images can be incorrectly matched. Experimental results indicate that DCGAN has an excellent ability to generate fingerprint images, although a small portion of the generated images have defects, which may be caused by the model training the blank part of the images as important features, etc. Additionally, the security verification experiment employing the Siamese Network reveals potential vulnerabilities in the fingerprint recognition system, possibly stemming from the network's focus on localized similarities between the two input fingerprint images.
In this paper, we introduce DC (Decouple)-ControlNet, a highly flexible and precisely controllable framework for multi-condition image generation. The core idea behind DC-ControlNet is to decouple control conditions, transforming global control into a hierarchical system that integrates distinct elements, contents, and layouts. This enables users to mix these individual conditions with greater flexibility, leading to more efficient and accurate image generation control. Previous ControlNet-based models rely solely on global conditions, which affect the entire image and lack the ability of element- or region-specific control. This limitation reduces flexibility and can cause condition misunderstandings in multi-conditional image generation. To address these challenges, we propose both intra-element and Inter-element Controllers in DC-ControlNet. The Intra-Element Controller handles different types of control signals within individual elements, accurately describing the content and layout characteristics of the object. For interactions between elements, we introduce the Inter-Element Controller, which accurately handles multi-element interactions and occlusion based on user-defined relationships. Extensive evaluations show that DC-ControlNet significantly outperforms existing ControlNet models and Layout-to-Image generative models in terms of control flexibility and precision in multi-condition control. Our project website is available at: https://um-lab.github.io/DC-ControlNet/
Latent fingerprint enhancement is an essential preprocessing step for latent fingerprint identification. Most latent fingerprint enhancement methods try to restore corrupted gray ridges/valleys. In this paper, we propose a new method that formulates latent fingerprint enhancement as a constrained fingerprint generation problem within a generative adversarial network (GAN) framework. We name the proposed network FingerGAN. It can enforce its generated fingerprint (i.e., the enhanced latent fingerprint) to be indistinguishable from the corresponding ground truth instance in terms of the fingerprint skeleton map weighted by minutia locations and the orientation field regularized by the FOMFE model. Because minutiae are the primary features for fingerprint recognition and can be retrieved directly from the fingerprint skeleton map, we offer a holistic framework that can perform latent fingerprint enhancement in the context of directly optimizing minutia information. This will help improve latent fingerprint identification performance significantly. Experimental results on two public latent fingerprint databases demonstrate that our method outperforms the state of the art significantly. The codes will be available for non-commercial purposes from https://github.com/HubYZ/LatentEnhancement.
ControlNet offers a powerful way to guide diffusion-based generative models, yet most implementations rely on ad-hoc heuristics to choose which network blocks to control, an approach that varies unpredictably with different tasks. To address this gap, we propose FlexControl, a novel framework that copies all diffusion blocks during training and employs a trainable gating mechanism to dynamically select which blocks to activate at each denoising step. By introducing a computation-aware loss, we encourage control blocks to activate only when doing so benefits generation quality. By eliminating manual block selection, FlexControl enhances adaptability across diverse tasks and streamlines the design pipeline, with a computation-aware training loss in an end-to-end training manner. Through comprehensive experiments on both UNet (e.g., SD1.5) and DiT (e.g., SD3.0), we show that our method outperforms existing ControlNet variants in certain key aspects of interest. As evidenced by both quantitative and qualitative evaluations, FlexControl preserves or enhances image fidelity while also reducing computational overhead by selectively activating the most relevant blocks. These results underscore the potential of a flexible, data-driven approach for controlled diffusion and open new avenues for efficient generative model design. The code will soon be available at https://github.com/Anonymousuuser/FlexControl.
Generative Models are a valuable tool for the controlled creation of high-quality image data. Controlled diffusion models like the ControlNet have allowed the creation of labeled distributions. Such synthetic datasets can augment the original training distribution when discriminative models, like semantic segmentation, are trained. However, this augmentation effect is limited since ControlNets tend to reproduce the original training distribution. This work introduces a method to utilize data from unlabeled domains to train ControlNets by introducing the concept of uncertainty into the control mechanism. The uncertainty indicates that a given image was not part of the training distribution of a downstream task, e.g., segmentation. Thus, two types of control are engaged in the final network: an uncertainty control from an unlabeled dataset and a semantic control from the labeled dataset. The resulting ControlNet allows us to create annotated data with high uncertainty from the target domain, i.e., synthetic data from the unlabeled distribution with labels. In our scenario, we consider retinal OCTs, where typically high-quality Spectralis images are available with given ground truth segmentations, enabling the training of segmentation networks. The recent development in Home-OCT devices, however, yields retinal OCTs with lower quality and a large domain shift, such that out-of-the-box segmentation networks cannot be applied to this type of data. Synthesizing annotated images from the Home-OCT domain using the proposed approach closes this gap and leads to significantly improved segmentation results without adding any further supervision. The advantage of uncertainty-guidance becomes obvious when compared to style transfer: it enables arbitrary domain shifts without any strict learning of an image style. This is also demonstrated in a traffic scene experiment. The implementation is available: https://github.com/JNiemeijer/UnAICorN.git
Fingerprint recognition research faces significant challenges due to the limited availability of extensive and publicly available fingerprint databases. Existing databases lack a sufficient number of identities and fingerprint impressions, which hinders progress in areas such as fingerprint-based access control. To address this challenge, we present Vikriti-ID, a synthetic fingerprint generator capable of generating unique fingerprints with multiple impressions. Using Vikriti-ID, we generated a large database containing 500,000 unique fingerprints, each with 10 associated impressions. We then demonstrate the effectiveness of the database generated by Vikriti-ID by evaluating it for imposter-genuine score distribution and EER score. Apart from this, we also trained a deep network to check the usability of the data. We trained the network, inspired by [13], on both Vikriti-ID-generated data and public data. The generated data achieved an Equal Error Rate (EER) of 0.16% and an AUC of 0.89%. This improvement is possible due to the limitations of existing publicly available datasets, which fall short in the number of identities or impressions.
While text-to-image diffusion models can generate high-quality images from textual descriptions, they generally lack fine-grained control over the visual composition of the generated images. Some recent works tackle this problem by training the model to condition the generation process on additional input describing the desired image layout. Arguably the most popular among such methods, ControlNet, enables a high degree of control over the generated image using various types of conditioning inputs (e.g. segmentation maps). However, it still lacks the ability to take into account localized textual descriptions that indicate which image region is described by which phrase in the prompt. In this work, we show the limitations of ControlNet for the layout-to-image task and enable it to use localized descriptions using a training-free approach that modifies the cross-attention scores during generation. We adapt and investigate several existing cross-attention control methods in the context of ControlNet and identify shortcomings that cause failure (concept bleeding) or image degradation under specific conditions. To address these shortcomings, we develop a novel cross-attention manipulation method in order to maintain image quality while improving control. Qualitative and quantitative experimental studies focusing on challenging cases are presented, demonstrating the effectiveness of the investigated general approach, and showing the improvements obtained by the proposed cross-attention control method.
No abstract available
In the expanding landscape of artificial intelligence, scaling model training to accommodate larger and more intricate neural networks and datasets is imperative. This study addresses the scaling issue by employing Distributed Data Parallel (DDP) frameworks to enhance the training of deep learning models, specifically focusing on the generation of synthetic fingerprints. Utilizing DDP enables efficient management of vast datasets essential for training generative models, ensuring comprehensive coverage of the variability inherent in fingerprints. Moreover, the application of DDP in fingerprint generation not only expedites the training process but also enhances data security by distributing computation across multiple nodes. The effectiveness of DDP is demonstrated through substantial improvements in training efficiency, as evidenced by reduced training times and balanced Graphics Processing Unit (GPU) utilization rates. However, the study reveals challenges in GPU underutilization with larger batch sizes, indicating opportunities for optimizing resource allocation. Advances in Deep Convolutional Generative Adversarial Network (DCGAN) architecture are also discussed, highlighting the model's capability to create realistic synthetic fingerprints and suggesting a future focus on algorithmic adaptability and network sophistication.
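A minimal PyTorch DDP scaffold of the kind the paper describes might look as follows, assuming a torchrun launch; the model, dataset, and loss are placeholders standing in for the DCGAN components.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

def ddp_train(model, dataset, epochs=10, batch_size=64):
    """Launch with: torchrun --nproc_per_node=<num_gpus> train.py"""
    dist.init_process_group(backend="nccl")   # reads env vars set by torchrun
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    model = DDP(model.cuda(), device_ids=[local_rank])
    sampler = DistributedSampler(dataset)     # disjoint data shard per rank
    loader = DataLoader(dataset, batch_size=batch_size, sampler=sampler)
    opt = torch.optim.Adam(model.parameters(), lr=2e-4)
    for epoch in range(epochs):
        sampler.set_epoch(epoch)              # reshuffle shards each epoch
        for x, _ in loader:
            loss = model(x.cuda()).mean()     # stand-in for the real GAN losses
            opt.zero_grad()
            loss.backward()                   # gradients are all-reduced across ranks
            opt.step()
    dist.destroy_process_group()
```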
In this work, we utilize progressive growth-based Generative Adversarial Networks (GANs) to develop the Clarkson Fingerprint Generator (CFG). We demonstrate that the CFG is capable of generating realistic, high fidelity, 512 × 512 pixel, full, plain impression fingerprints. Our results suggest that the fingerprints generated by the CFG are unique, diverse, and resemble the training dataset in terms of minutiae configuration and quality, while not revealing the underlying identities of the training data. We make the pre-trained CFG model and the synthetically generated dataset publicly available at https://github.com/keivanB/Clarkson_Finger_Gen
No abstract available
Fingerprint recognition (FPR) is considered a secure and efficient way of verifying personal identity. However, research on fake fingerprint generation with generative adversarial networks (GANs) has disclosed a potential vulnerability of some FPR systems. In this paper, we propose two networks based on the deep convolutional generative adversarial network (DCGAN) that improve DCGAN's performance by strengthening the discriminator. More specifically, the discriminators in our networks are modeled after two typical convolutional neural networks, AlexNet and VGG11 (we abbreviate the resulting models to Alex-GAN and VGG-GAN, respectively). We evaluated the output of the generators with a cosine similarity index every 10 epochs during training, which monitored generator performance and made the two models convenient to compare. The results of the cosine similarity and texture analyses show that VGG-GAN performs better than Alex-GAN. Although our models are unstable and unbalanced between discriminator and generator, we obtained notable results that demonstrate the competitiveness of our approach.
Generating fingerprint images for biometric purposes is both necessary and challenging. In this study, we present a fingerprint generation approach based on a generative adversarial network. To ensure GAN training stability, we introduce conditional loss doping that allows a continuous flow of gradients. Our study utilizes a careful combination of a residual network and spectral normalization to generate fingerprints. The proposed average residual connection shows more immunity against vanishing gradients than a simple residual connection. Spectral normalization allows our network to enjoy reduced variance in weight generation, which further stabilizes the training. The proposed scheme uses spectral bounding only in the input and the fully connected layers. Our network synthesizes fingerprints up to 256 × 256 in size. We used the multi-scale structural similarity (MS-SSIM) metric for measuring the diversity of the generated samples. Our model achieved an MS-SSIM score of 0.23 for the generated fingerprints, indicating that the proposed scheme is more likely to produce diverse images and less likely to face mode collapse.
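The MS-SSIM diversity score used above can be reproduced by averaging pairwise similarities over random pairs of generated samples; lower means more diverse. A sketch using the third-party pytorch_msssim package (an assumption; the authors' implementation may differ):

```python
import torch
from pytorch_msssim import ms_ssim  # pip install pytorch-msssim

def msssim_diversity(samples, pairs=100):
    """Mean pairwise MS-SSIM over random pairs; samples is (N, 1, H, W)
    in [0, 1], with H and W large enough for the default 5 scales."""
    n = samples.shape[0]
    idx = torch.randint(0, n, (pairs, 2))
    scores = [ms_ssim(samples[i:i+1], samples[j:j+1], data_range=1.0)
              for i, j in idx.tolist() if i != j]   # skip identical pairs
    return torch.stack(scores).mean().item()
```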
Today's legal restrictions that protect the privacy of biometric data are hampering fingerprint recognition research. For instance, all high-resolution fingerprint databases ceased to be publicly available. To address this problem, we present a novel hybrid approach to synthesize realistic, high-resolution fingerprints. First, we improved Anguli, a handcrafted fingerprint generator, to obtain dynamic ridge maps with sweat pores and scratches. Then, we trained a CycleGAN to transform these maps into realistic fingerprints. Unlike other CNN-based works, we can generate several images for the same identity. We used our approach to create a synthetic database with 7400 images in an attempt to propel further studies in this field without raising legal issues. We included sweat pore annotations in 740 images to encourage research developments in pore detection. In our experiments, we employed two fingerprint matching approaches to confirm that real and synthetic databases have similar performance. We conducted a human perception analysis in which sixty volunteers could hardly distinguish between real and synthesized fingerprints. Given that we also compare favorably with the most advanced works in the literature, our experimentation suggests that our approach is the new state of the art.
Authentication and identification methods based on human fingerprints are ubiquitous in several systems ranging from government organizations to consumer products. The performance and reliability of such systems directly rely on the volume of data on which they have been verified. Unfortunately, a large volume of fingerprint databases is not publicly available due to many privacy and security concerns. In this paper, we introduce a new approach to automatically generate high-fidelity synthetic fingerprints at scale. Our approach relies on (i) Generative Adversarial Networks to estimate the probability distribution of human fingerprints and (ii) Super-Resolution methods to synthesize fine-grained textures. We rigorously test our system and show that our methodology is the first to generate fingerprints that are computationally indistinguishable from real ones, a task that prior art could not accomplish.
In this paper we explore the power of conditional generative adversarial networks and in particular of the pix2pix network to reconstruct realistic fingerprint patterns from minutiae maps. In our considerations a minutiae map is a grayscale image that encodes minutiae locations and orientations as these are presented in a minutiae template. We propose a novel approach for minutiae encoding in a minutiae map and study to which degree the reconstruction may be successful if trained with a low number of samples. Moreover, we explore the generalization ability of the trained models in cross-dataset and cross-sensor experiments. Reconstruction from pseudo-random minutiae enables synthesis of anonymous fingerprints as well as controlling the diversity of generated samples including synthesis of mated fingerprints which is vital for compilation of large-scale public evaluation datasets.
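To make the input representation concrete: a minutiae map is a rasterization of (x, y, theta) triplets into a grayscale image. The encoding below is one simple possibility, labeled hypothetical; the paper proposes its own encoding.

```python
import numpy as np

def minutiae_map(minutiae, shape=(256, 256)):
    """Rasterize a minutiae template into a grayscale map: each (x, y, theta)
    sets one pixel whose intensity encodes orientation (theta in [0, 2*pi)).
    Intensities are kept in [0.5, 1] so every minutia stands out from the
    zero-valued background. Hypothetical encoding, not the paper's scheme."""
    m = np.zeros(shape, dtype=np.float32)
    for x, y, theta in minutiae:
        m[int(y), int(x)] = 0.5 + 0.5 * theta / (2.0 * np.pi)
    return m

# toy template: three minutiae; the map would feed a pix2pix-style generator
template = [(40, 60, 1.2), (128, 100, 4.9), (200, 180, 0.3)]
map_img = minutiae_map(template)
```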