Diffusion Models for Fingerprint Generation
Diffusion Model-Based Fingerprint Image Synthesis and Data Augmentation
This group of papers directly investigates fingerprint image generation with denoising diffusion probabilistic models (DDPMs) or their variants. The research focuses on exploiting the high fidelity and diversity of diffusion models to overcome the mode-collapse problem of traditional GANs, and applies them to latent fingerprint generation, tiny/partial fingerprint synthesis, and data augmentation, addressing the scarcity of public datasets and the privacy concerns of the fingerprint recognition field. (A minimal DDPM sketch is provided after the list below.)
- Diffusion Probabilistic Model Based End-to-End Latent Fingerprint Synthesis(Kejian Li, Xiao Yang, 2023, 2023 IEEE 4th International Conference on Pattern Recognition and Machine Learning (PRML))
- Denoising Diffusion Probabilistic Model with Wavelet Packet Transform for Fingerprint Generation(Li Chen, Yong Chan, 2024, Jordanian Journal of Computers and Information Technology)
- Inpainting Diffusion Synthetic and Data Augment With Feature Keypoints for Tiny Partial Fingerprints(Mao-Hsiu Hsu, Yung-Ching Hsu, Ching-Te Chiu, 2025, IEEE Transactions on Biometrics, Behavior, and Identity Science)
- Data augmentation-based enhanced fingerprint recognition using deep convolutional generative adversarial network and diffusion models(Yukai Liu, 2024, Applied and Computational Engineering)
- DiffFinger: Advancing Synthetic Fingerprint Generation through Denoising Diffusion Probabilistic Models(Fred M. Grabovski, Lior Yasur, Yaniv Hacmon, Lior Nisimov, Stav Nimrod, 2024, ArXiv)
- Enhancing Fingerprint Image Synthesis with GANs, Diffusion Models, and Style Transfer Techniques(W. Tang, D. Figueroa, D. Liu, K. Johnsson, A. Sopasakis, 2024, ArXiv)
- Fingerprint Synthesis from Diffusion Models and Generative Adversarial Networks(Weizhong Tang, Diego Andre Figueroa Llamosas, Donglin Liu, K. Johnsson, A. Sopasakis, 2025, No journal)
- Privacy-preserving synthetic fingerprint generation with pore-level details(Ritika Dhaneshwar, Mandeep Kaur, Manvjeet Kaur, 2025, Multimedia Tools and Applications)
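As shared background for this group, here is a minimal sketch of the DDPM forward (noising) process and the noise-prediction training objective that these fingerprint-synthesis papers build on. The `eps_model` U-Net is a placeholder; individual papers use their own architectures and noise schedules.

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)        # standard linear noise schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)    # cumulative signal level \bar{alpha}_t

def q_sample(x0, t, noise):
    """Forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    ab = alpha_bars.to(x0.device)[t].view(-1, 1, 1, 1)
    return ab.sqrt() * x0 + (1.0 - ab).sqrt() * noise

def ddpm_loss(eps_model, x0):
    """Simple DDPM objective: predict the noise added at a random timestep."""
    t = torch.randint(0, T, (x0.size(0),), device=x0.device)
    noise = torch.randn_like(x0)
    x_t = q_sample(x0, t, noise)
    return torch.nn.functional.mse_loss(eps_model(x_t, t), noise)
```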
Controllable Diffusion Generation with Multimodal Conditions
This group of papers focuses on introducing multimodal conditions (such as text, images, masks, or structural controls) into diffusion models to achieve precise, controllable generation. This is particularly important for fingerprint generation, where synthetic fingerprints must be tailored to specific sensor types, quality levels, or class labels, while also addressing the modality bias and coordination problems that arise with multimodal inputs. (A classifier-free guidance sketch with multiple optional conditions is provided after the list below.)
- Utilizing Greedy Nature for Multimodal Conditional Image Synthesis in Transformers(Sitong Su, Junchen Zhu, Lianli Gao, Jingkuan Song, 2024, IEEE Transactions on Multimedia)
- Universal Fingerprint Generation: Controllable Diffusion Model With Multimodal Conditions(Steven A. Grosz, Anil K. Jain, 2024, IEEE Transactions on Pattern Analysis and Machine Intelligence)
- MMoT: Mixture-of-Modality-Tokens Transformer for Composed Multimodal Conditional Image Synthesis(Jinsheng Zheng, Daqing Liu, Chaoyue Wang, Minghui Hu, Zuopeng Yang, Changxing Ding, Dacheng Tao, 2023, International Journal of Computer Vision)
- UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild(Can Qin, Shu Zhang, Ning Yu, Yihao Feng, Xinyi Yang, Yingbo Zhou, Haiquan Wang, Juan Carlos Niebles, Caiming Xiong, S. Savarese, Stefano Ermon, Yun Fu, Ran Xu, 2023, ArXiv)
- Multimodal Conditional Image Synthesis with Product-of-Experts GANs(Xun Huang, Arun Mallya, Ting-Chun Wang, Ming-Yu Liu, 2021, No journal)
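The papers above differ in architecture, but a common recipe for making several conditions optional and independently adjustable at sampling time is classifier-free guidance with per-condition dropout during training. The sketch below is a generic illustration of that recipe; the `denoiser` interface, condition names, and guidance weights are assumptions, not taken from any specific paper listed here.

```python
import torch

def cfg_epsilon(denoiser, x_t, t, text_emb, image_emb, w_text=3.0, w_image=1.5):
    """Combine guidance from two optional conditions against the unconditional prediction.
    During training each condition would be randomly dropped (replaced by a null embedding)
    so the model learns the conditional and unconditional cases."""
    eps_uncond = denoiser(x_t, t, text=None, image=None)
    eps_text = denoiser(x_t, t, text=text_emb, image=None)
    eps_image = denoiser(x_t, t, text=None, image=image_emb)
    # Per-modality guidance weights let the caller balance text vs. image control.
    return (eps_uncond
            + w_text * (eps_text - eps_uncond)
            + w_image * (eps_image - eps_uncond))
```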
Identity Preservation and Personalized Generation Methods
Although some of these papers target portrait generation, their core technique, namely maintaining identity consistency throughout the diffusion generation process, is a key ingredient for applying diffusion models to fingerprint generation. These studies use dual-condition control, reference-image embeddings, or in-context matching to ensure that the generated biometric images retain the same identity attributes across different styles.
- IC-Portrait: In-Context Matching for View-Consistent Personalized Portrait(Han Yang, Enis Simsar, Sotiris Anagnostidis, Yanlong Zang, Thomas Hofmann, Ziwei Liu, 2025, ArXiv)
- Advancements in Text-to-Image Diffusion Models for Personalized Image Generation: A Review of ID-Preserving Techniques of InstantID(Ardhiansyah Kurniawan, 2024, IC-ITECHS)
- DCFace: Synthetic Face Generation with Dual Condition Diffusion Model(Minchul Kim, Feng Liu, Anil Jain, Xiaoming Liu, 2023, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))
- MegaPortrait: Revisiting Diffusion Control for High-fidelity Portrait Generation(Han Yang, Sotiris Anagnostidis, Enis Simsar, Thomas Hofmann, 2024, ArXiv)
Fingerprint Domain-Knowledge Modeling and Synthesis Quality Evaluation
This group of papers focuses on low-level physical/mathematical models of fingerprints (such as ridge counting and minutiae modeling) and on methods for evaluating synthetic fingerprints. These studies supply the domain priors that diffusion models need to generate images consistent with biometric logic, and provide benchmarks for measuring how effective synthetic data is in search and recognition tasks.
- Fingerprint Model Based on Fingerprint Image Topology and Ridge Count Values(V. Gudkov, Daria Lepikhova, 2018, 2018 Global Smart Industry Conference (GloSIC))
- Machine Learning for Fingerprint Ridge Counting(Alexandre Nadolni Bonacim, R. Minetto, Maurício Pamplona Segundo, 2026, IEEE Transactions on Biometrics, Behavior, and Identity Science)
- Fingerprint Synthesis: Evaluating Fingerprint Search at Scale(Kai Cao, Anil K. Jain, 2018, 2018 International Conference on Biometrics (ICB))
- RRInf: Efficient Influence Function Estimation via Ridge Regression for Large Language Models and Text-to-Image Diffusion Models(Zhuozhuo Tu, Cheng Chen, Yuxuan Du, 2025, No journal)
Taken together, these papers trace the full evolution of diffusion models for fingerprint generation. The research direction progresses from early end-to-end synthesis of high-quality fingerprint images (using DDPMs to improve diversity and realism) toward fine-grained modeling that incorporates domain knowledge (such as ridge counting and wavelet transforms). At the same time, by borrowing advanced multimodal control and identity-preservation techniques from computer vision, the field has built controllable, customizable, and identity-consistent fingerprint synthesis systems, aiming to address data scarcity, privacy protection, and model generalization challenges in biometrics through high-quality synthetic data.
A total of 21 related papers.
The utilization of synthetic data for fingerprint recognition has garnered increased attention due to its potential to alleviate privacy concerns surrounding sensitive biometric data. However, current methods for generating fingerprints have limitations in creating impressions of the same finger with useful intra-class variations. To tackle this challenge, we present GenPrint, a framework to produce fingerprint images of various types while maintaining identity and offering humanly understandable control over different appearance factors, such as fingerprint class, acquisition type, sensor device, and quality level. Unlike previous fingerprint generation approaches, GenPrint is not confined to replicating style characteristics from the training dataset alone: it enables the generation of novel styles from unseen devices without requiring additional fine-tuning. To accomplish these objectives, we developed GenPrint using latent diffusion models with multimodal conditions (text and image) for consistent generation of style and identity. Our experiments leverage a variety of publicly available datasets for training and evaluation. Results demonstrate the benefits of GenPrint in terms of identity preservation, explainable control, and universality of generated images. Importantly, the GenPrint-generated images yield comparable or even superior accuracy to models trained solely on real data and further enhance performance when used to augment the diversity of existing real fingerprint datasets.
The majority of contemporary fingerprint synthesis is based on the Generative Adversarial Network (GAN). Recently, the Denoising Diffusion Probabilistic Model (DDPM) has been demonstrated to be more effective than GAN in numerous scenarios, particularly in terms of diversity and fidelity. This research develops a model based on the enhanced DDPM for fingerprint generation. Specifically, the image is decomposed into sub-images of varying frequency sub-bands through the use of a wavelet packet transform (WPT). This method enables DDPM to operate at a more local and detailed level, thereby accurately obtaining the characteristics of the data. Furthermore, a polynomial noise schedule has been designed to replace the linear noise strategy, which can result in a smoother noise addition process. Experiments based on multiple metrics on the datasets SOCOFing and NIST4 demonstrate that the proposed model is superior to existing models.
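The abstract above replaces the standard linear noise schedule with a polynomial one for smoother noise addition. The exact polynomial is not specified there, so the sketch below assumes the form beta_t = beta_min + (beta_max - beta_min) * (t/T)^p to illustrate how a larger exponent keeps more signal at early timesteps:

```python
import numpy as np

T = 1000
beta_min, beta_max = 1e-4, 0.02

t = np.arange(1, T + 1)
beta_linear = beta_min + (beta_max - beta_min) * (t / T)        # standard DDPM schedule
beta_poly = beta_min + (beta_max - beta_min) * (t / T) ** 3     # assumed polynomial form, p = 3

# Cumulative signal level \bar{alpha}_t shows how quickly the image content is destroyed.
abar_linear = np.cumprod(1.0 - beta_linear)
abar_poly = np.cumprod(1.0 - beta_poly)

print(f"signal kept at t=T/2: linear={abar_linear[T // 2]:.3f}, poly={abar_poly[T // 2]:.3f}")
```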
This study explores the generation of synthesized fingerprint images using Denoising Diffusion Probabilistic Models (DDPMs). The significant obstacles in collecting real biometric data, such as privacy concerns and the demand for diverse datasets, underscore the imperative for synthetic biometric alternatives that are both realistic and varied. Despite the strides made with Generative Adversarial Networks (GANs) in producing realistic fingerprint images, their limitations prompt us to propose DDPMs as a promising alternative. DDPMs are capable of generating images with increasing clarity and realism while maintaining diversity. Our results reveal that DiffFinger not only competes with authentic training set data in quality but also provides a richer set of biometric data, reflecting true-to-life variability. These findings mark a promising stride in biometric synthesis, showcasing the potential of DDPMs to advance the landscape of fingerprint identification and authentication systems.
Achieving machine autonomy and human control often represent divergent objectives in the design of interactive AI systems. Visual generative foundation models such as Stable Diffusion show promise in navigating these goals, especially when prompted with arbitrary languages. However, they often fall short in generating images with spatial, structural, or geometric controls. The integration of such controls, which can accommodate various visual conditions in a single unified model, remains an unaddressed challenge. In response, we introduce UniControl, a new generative foundation model that consolidates a wide array of controllable condition-to-image (C2I) tasks within a singular framework, while still allowing for arbitrary language prompts. UniControl enables pixel-level-precise image generation, where visual conditions primarily influence the generated structures and language prompts guide the style and context. To equip UniControl with the capacity to handle diverse visual conditions, we augment pretrained text-to-image diffusion models and introduce a task-aware HyperNet to modulate the diffusion models, enabling the adaptation to different C2I tasks simultaneously. Trained on nine unique C2I tasks, UniControl demonstrates impressive zero-shot generation abilities with unseen visual conditions. Experimental results show that UniControl often surpasses the performance of single-task-controlled methods of comparable model sizes. This control versatility positions UniControl as a significant advancement in the realm of controllable visual generation.
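UniControl's task-aware HyperNet modulates the diffusion backbone according to which condition-to-image task is being performed. The sketch below illustrates the general hypernetwork idea, mapping a task embedding to per-layer scale and shift modulation, rather than UniControl's exact design; all dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TaskHyperNet(nn.Module):
    """Maps a task embedding to per-layer (scale, shift) modulation parameters.
    Each modulated layer would then compute scale * h + shift on its feature channels."""
    def __init__(self, task_dim=64, n_layers=12, feat_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(task_dim, 512), nn.SiLU(),
            nn.Linear(512, n_layers * 2 * feat_dim),
        )
        self.n_layers, self.feat_dim = n_layers, feat_dim

    def forward(self, task_emb):                      # task_emb: (B, task_dim)
        params = self.net(task_emb).view(-1, self.n_layers, 2, self.feat_dim)
        scale, shift = params[:, :, 0], params[:, :, 1]
        return scale, shift                           # each: (B, n_layers, feat_dim)
```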
Generating synthetic datasets for training face recognition models is challenging because dataset generation entails more than creating high fidelity images. It involves generating multiple images of the same subjects under different factors (e.g., variations in pose, illumination, expression, aging and occlusion) which follow the real image conditional distribution. Previous works have studied the generation of synthetic datasets using GAN or 3D models. In this work, we approach the problem from the aspect of combining subject appearance (ID) and external factor (style) conditions. These two conditions provide a direct way to control the inter-class and intra-class variations. To this end, we propose a Dual Condition Face Generator (DCFace) based on a diffusion model. Our novel Patch-wise style extractor and Time-step dependent ID loss enable DCFace to consistently produce face images of the same subject under different styles with precise control. Face recognition models trained on synthetic images from the proposed DCFace provide higher verification accuracies compared to previous works by 6.11% on average in 4 out of 5 test datasets, LFW, CFP-FP, CPLFW, AgeDB and CALFW.
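Two of DCFace's ingredients are a dual condition (subject identity plus style) and a time-step dependent ID loss. The sketch below only illustrates the second idea under assumptions: identity is compared on the model's predicted clean image and down-weighted at high noise levels (a linear weighting is assumed here; the paper's exact formulation may differ, and `face_embedder` stands in for a frozen recognition network).

```python
import torch

def predicted_x0(x_t, t, eps_pred, alpha_bars):
    """Recover the model's current estimate of the clean image from its noise prediction."""
    ab = alpha_bars.to(x_t.device)[t].view(-1, 1, 1, 1)
    return (x_t - (1.0 - ab).sqrt() * eps_pred) / ab.sqrt()

def timestep_id_loss(face_embedder, x0_pred, id_emb_target, t, T):
    """Cosine ID loss, down-weighted at high noise levels (assumed linear weighting)."""
    weight = 1.0 - t.float() / T                      # identity matters most near t = 0
    emb = face_embedder(x0_pred)                      # frozen recognition network (placeholder)
    cos = torch.nn.functional.cosine_similarity(emb, id_emb_target, dim=-1)
    return (weight * (1.0 - cos)).mean()
```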
Fingerprints have been crucial evidence for law enforcement agencies for a long time. Though the rapidly developing deep learning has dramatically improved the performance of the latent fingerprint recognition algorithm, a fully automated latent fingerprint identification system is still far from meeting actual needs. One major issue is the lack of publicly available latent fingerprint databases. Recently, diffusion probabilistic models have emerged as state-of-the-art deep generative methods for image synthesis. These models have better distribution coverage and less mode collapse than the popular Generative Adversarial Networks. In this paper, we propose an end-to-end latent fingerprint synthesis approach based on the improved denoising diffusion probabilistic model. The proposed approach could simultaneously generate latent, rolled, and plain fingerprints of high visual realism. Several primary degradation factors, such as various background textures, limited area of ridge patterns, and structural noise, can be directly generated without any postprocessing, unlike existing methods. We conduct NFIQ2 and perceptual analysis in the experiments to evaluate the proposed approach. The results indicate that the quality and visual realism of the proposed synthetic fingerprints are similar to those of natural ones, demonstrating the effectiveness of our approach.
Inpainting Diffusion Synthetic and Data Augment With Feature Keypoints for Tiny Partial Fingerprints
The advancement of fingerprint research within public academic circles has been trailing behind facial recognition, primarily due to the scarcity of extensive publicly available datasets, despite fingerprints being widely used across various domains. Recent progress has seen the application of deep learning techniques to synthesize fingerprints, predominantly focusing on large-area fingerprints within existing datasets. However, with the emergence of AIoT and edge devices, the importance of tiny partial fingerprints has been underscored for their faster and more cost-effective properties. Yet, there remains a lack of publicly accessible datasets for such fingerprints. To address this issue, we introduce publicly available datasets tailored for tiny partial fingerprints. Using advanced generative deep learning, we pioneer diffusion methods for fingerprint synthesis. By combining random sampling with inpainting diffusion guided by feature keypoint masks, we enhance data augmentation while preserving key features, achieving up to 99.1% recognition matching rate. To demonstrate the usefulness of our fingerprint images generated using our approach, we conducted experiments involving model training for various tasks, including denoising, deblurring, and deep forgery detection. The results showed that models trained with our generated datasets outperformed those trained without our datasets or with other synthetic datasets. This indicates that our approach not only produces diverse fingerprints but also improves the model’s generalization capabilities. Furthermore, our approach ensures confidentiality without compromise by partially transforming randomly sampled synthetic fingerprints, which reduces the likelihood of real fingerprints being leaked. The total number of generated fingerprints published in this article amounts to 818,077. Moving forward, we plan ongoing updates and releases to contribute to the advancement of the tiny partial fingerprint field. The code and our generated tiny partial fingerprint dataset can be accessed at https://github.com/Hsu0623/Inpainting-Diffusion-Synthetic-and-Data-Augment-with-Feature-Keypoints-for-Tiny-Partial-Fingerprints.git
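The augmentation described above combines random sampling with inpainting diffusion guided by feature-keypoint masks. Below is a minimal, generic mask-replacement sketch of inpainting with an unconditional diffusion model (not the paper's exact procedure), where `denoise_step` is a placeholder for one reverse step of a trained fingerprint DDPM:

```python
import torch

@torch.no_grad()
def inpaint_sample(denoise_step, x_ref, mask, T, alpha_bars):
    """Mask-guided reverse diffusion: mask == 1 marks keypoint regions kept from the reference.
    `denoise_step(x_t, t) -> x_{t-1}` is a placeholder for one unconditional reverse step."""
    x = torch.randn_like(x_ref)                                    # start from pure noise
    for t in reversed(range(T)):
        ab = alpha_bars[t]
        noise = torch.randn_like(x_ref)
        x_known = ab.sqrt() * x_ref + (1.0 - ab).sqrt() * noise    # reference noised to level t
        x = mask * x_known + (1.0 - mask) * x                      # overwrite known regions
        x = denoise_step(x, t)                                     # model denoises one step
    return x
```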
No abstract available
No abstract available
We present novel approaches involving generative adversarial networks and diffusion models in order to synthesize high quality, live and spoof fingerprint images while preserving features such as uniqueness and diversity. We generate live fingerprints from noise with a variety of methods, and we use image translation techniques to translate live fingerprint images to spoof. To generate different types of spoof images based on limited training data we incorporate style transfer techniques through a cycle autoencoder equipped with a Wasserstein metric along with Gradient Penalty (CycleWGAN-GP) in order to avoid mode collapse and instability. We find that when the spoof training data includes distinct spoof characteristics, it leads to improved live-to-spoof translation. We assess the diversity and realism of the generated live fingerprint images mainly through the Fréchet Inception Distance (FID) and the False Acceptance Rate (FAR). Our best diffusion model achieved an FID of 15.78. The comparable WGAN-GP model achieved a slightly higher FID while performing better in the uniqueness assessment due to a slightly lower FAR when matched against the training data, indicating better creativity. Moreover, we give example images showing that a DDPM model clearly can generate realistic fingerprint images.
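Realism in the study above is assessed mainly with the Fréchet Inception Distance. For reference, here is a minimal NumPy/SciPy computation of FID from precomputed Inception feature matrices (the feature extraction itself is not shown):

```python
import numpy as np
from scipy import linalg

def fid(feats_real, feats_fake):
    """FID = ||mu_r - mu_f||^2 + Tr(S_r + S_f - 2 (S_r S_f)^{1/2}),
    where mu/S are the mean and covariance of Inception-pool features (shape: N x D)."""
    mu_r, mu_f = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    sigma_r = np.cov(feats_real, rowvar=False)
    sigma_f = np.cov(feats_fake, rowvar=False)
    covmean = linalg.sqrtm(sigma_r @ sigma_f)
    if np.iscomplexobj(covmean):           # numerical noise can introduce tiny imaginary parts
        covmean = covmean.real
    diff = mu_r - mu_f
    return diff @ diff + np.trace(sigma_r + sigma_f - 2.0 * covmean)
```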
No abstract available
The progress of fingerprint recognition applications encounters substantial hurdles due to privacy and security concerns, leading to limited fingerprint data availability and stringent data quality requirements. This article endeavors to tackle the challenges of data scarcity and data quality in fingerprint recognition by implementing data augmentation techniques. Specifically, this research employed two state-of-the-art generative models in the domain of deep learning, namely Deep Convolutional Generative Adversarial Network (DCGAN) and the Diffusion model, for fingerprint data augmentation. Generative Adversarial Network (GAN), as a popular generative model, effectively captures the features of sample images and learns the diversity of the sample images, thereby generating realistic and diverse images. DCGAN, as a variant model of traditional GAN, inherits the advantages of GAN while alleviating issues such as blurry images and mode collapse, resulting in improved performance. On the other hand, Diffusion, as one of the most popular generative models in recent years, exhibits outstanding image generation capabilities and surpasses traditional GAN in some image generation tasks. The experimental results demonstrate that both DCGAN and Diffusion can generate clear, high-quality fingerprint images, fulfilling the requirements of fingerprint data augmentation. Furthermore, through the comparison between DCGAN and Diffusion, it is concluded that the quality of fingerprint images generated by DCGAN is superior to the results of Diffusion, and DCGAN exhibits higher efficiency in both training and generating images compared to Diffusion.
The evolution of text-to-image diffusion models, such as GLIDE, DALL-E 2, and Stable Diffusion, has significantly enhanced image generation capabilities. However, achieving image personalization with precise facial detail retention, minimal reference images, and reduced computational costs remains challenging. Traditional methods like DreamBooth and Textual Inversion rely on extensive fine-tuning, while techniques like IP-Adapter, which avoid fine-tuning, often compromise accuracy. Addressing these gaps, InstantID introduces a novel plug-and-play module that uses a single reference image to enable efficient identity preservation with high fidelity and flexibility. InstantID departs from conventional approaches by employing ID Embedding and an Image Adapter to enhance semantic richness and facial detail fidelity. Unlike models relying on CLIP-based visual prompts, InstantID integrates ID Embedding with ControlNet to refine the cross-attention process. This involves using simplified facial keypoints for conditional input and replacing text prompts with ID Embedding. Trained on a large-scale dataset comprising LAION-Face and additional high-quality annotated images, InstantID demonstrates superior ID retention and facial detail restoration. Notably, its performance improves with multiple reference images but remains highly effective with just one. The results highlight the effectiveness of InstantID's modular components, such as IdentityNet and the Image Adapter, in ensuring exceptional generation quality and detail retention. Although currently optimized for SDXL checkpoints, InstantID offers a scalable and efficient solution for personalized image generation. By integrating with tools like ComfyUI, it provides a seamless and accessible approach to image personalization with strong ID control and adaptability.
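InstantID's central move is to inject a face-recognition ID embedding through the cross-attention pathway of the diffusion U-Net. The sketch below shows that injection pattern in a generic form; the token count, dimensions, and projection layer are illustrative assumptions rather than InstantID's actual modules.

```python
import torch
import torch.nn as nn

class IDCrossAttention(nn.Module):
    """Image features (queries) attend to tokens derived from an ID embedding (keys/values)."""
    def __init__(self, dim=320, id_dim=512, n_tokens=4, heads=8):
        super().__init__()
        self.to_tokens = nn.Linear(id_dim, n_tokens * dim)   # lift the ID embedding to a few tokens
        self.n_tokens, self.dim = n_tokens, dim
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, img_feats, id_emb):
        # img_feats: (B, N, dim) spatial features; id_emb: (B, id_dim) from a face recognizer.
        id_tokens = self.to_tokens(id_emb).view(-1, self.n_tokens, self.dim)
        out, _ = self.attn(query=img_feats, key=id_tokens, value=id_tokens)
        return img_feats + out                                # residual conditioning
```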
We propose MegaPortrait, an innovative system for creating personalized portrait images in computer vision. It has three modules: Identity Net, Shading Net, and Harmonization Net. Identity Net generates learned identity using a customized model fine-tuned with source images. Shading Net re-renders portraits using extracted representations. Harmonization Net fuses pasted faces and the reference image's body for coherent results. Our approach with off-the-shelf ControlNets is better than state-of-the-art AI portrait products in identity preservation and image fidelity. MegaPortrait has a simple but effective design, and we compare it with other methods and products to show its superiority.
Existing diffusion models show great potential for identity-preserving generation. However, personalized portrait generation remains challenging due to the diversity in user profiles, including variations in appearance and lighting conditions. To address these challenges, we propose IC-Portrait, a novel framework designed to accurately encode individual identities for personalized portrait generation. Our key insight is that pre-trained diffusion models are fast learners (e.g.,100 ~ 200 steps) for in-context dense correspondence matching, which motivates the two major designs of our IC-Portrait framework. Specifically, we reformulate portrait generation into two sub-tasks: 1) Lighting-Aware Stitching: we find that masking a high proportion of the input image, e.g., 80%, yields a highly effective self-supervisory representation learning of reference image lighting. 2) View-Consistent Adaptation: we leverage a synthetic view-consistent profile dataset to learn the in-context correspondence. The reference profile can then be warped into arbitrary poses for strong spatial-aligned view conditioning. Coupling these two designs by simply concatenating latents to form ControlNet-like supervision and modeling, enables us to significantly enhance the identity preservation fidelity and stability. Extensive evaluations demonstrate that IC-Portrait consistently outperforms existing state-of-the-art methods both quantitatively and qualitatively, with particularly notable improvements in visual qualities. Furthermore, IC-Portrait even demonstrates 3D-aware relighting capabilities.
Ridge counting is an important feature in standardized fingerprint templates, supporting interoperability, improving matching accuracy, and providing a transparent and interpretable measure to validate automated decisions. However, traditional ridge counting techniques that rely on binarization and skeletonization are prone to artifacts, limiting their accuracy and robustness. In this study, we propose and analyze machine learning methods for ridge counting based on different architectural paradigms such as fully connected residual networks, convolutional networks, and Transformer techniques, which operate directly on raw grayscale fingerprint images. As part of this study, we introduce a benchmark dataset comprising 23,724 manually annotated ridge counts from 50 subjects. Our experiments show that convolutional and Transformer-based models achieved the highest accuracy, with exact ridge count match rates of 96.6% and 95.6%, respectively, outperforming classical techniques and a commercial solution top ranked in the National Institute of Standards and Technology (NIST) Minutiae Interoperability Exchange (MINEX) III evaluation. Additionally, we investigate the influence of ridge counting on fingerprint matching performance using the Fingerprint Verification Competition 2002 DB1 A (FVC2002) benchmark, with additional cross-dataset tests on Fingerprint Verification Competition 2004 DB1 A (FVC2004) and NIST Special Database 301A (SD301). The source codes, ridge count dataset, and trained models are available on https://github.com/Bonacim/ridge-count.
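For the learning-based ridge counting described above, a minimal convolutional regressor over a raw grayscale patch between two minutiae is sketched below; the architecture and patch handling are illustrative assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

class RidgeCountRegressor(nn.Module):
    """Tiny CNN that maps a grayscale patch between two minutiae to a ridge-count estimate."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, 1)

    def forward(self, patch):                      # patch: (B, 1, H, W) grayscale
        return self.head(self.features(patch).flatten(1)).squeeze(-1)

# Training would minimize e.g. nn.SmoothL1Loss between predictions and annotated counts,
# with exact-match accuracy computed after rounding predictions to the nearest integer.
```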
In this paper the authors propose a mathematical model of fingerprint images based on vectors of ridge count values. The model takes into account the instability of ridge count values in regions of line curvature and possible mutations of minutiae. This increases the stability of the fingerprint template and the reliability of identification. The vectors of ridge count are formed using topological descriptors, which are constructed for the neighborhood of all fingerprint minutiae. In the proposed model, a template of a fingerprint image keeps the list of minutiae, the list of topological vectors, and the list of the ridge count vectors. The value of the ridge count can be represented as a fractional number.
No abstract available
Multimodal Conditional Image Synthesis (MCIS) aims to generate images according to different modalities input and their combination, which allows users to describe their requirements in complementary ways, e.g., segmentation for shapes and text for attributes. Despite satisfying results in MCIS, a non-trivial issue is neglected. Some modalities are fully optimized and dominate the generation, while other modalities are sub-optimized and fail to contribute their complementary information. We coin this phenomenon as Modality Bias. Our analysis reveals that generative models have a greedy nature. Specifically, the modality that shares less semantic gap with the synthesized modality will be greedily incorporated and thus takes a larger proportion in synthesis. The main idea of previous works in Modality Bias is to punish the greedy nature, which hurts the performance of dominant modalities and impedes their contribution to multimodal synthesis. Instead, we propose to utilize the greedy nature by setting dominant modalities as guidance for sub-optimized modalities through coordinated feature space, named Coordinated Knowledge Mining. Afterwards, improved uni-modalities are aggregated by fusing coordinated features to further boost the performance of multimodal image synthesis, called Coordinated Knowledge Fusion. Extensive experiments prove that our method not only increases uni-modal performance by a large margin, but also promotes multimodal image synthesis by fully utilizing complementary information from different modalities.
Existing multimodal conditional image synthesis (MCIS) methods generate images conditioned on combinations of various modalities but require all of them to be exactly conformed, hindering synthesis controllability and leaving the potential of cross-modality under-exploited. To this end, we propose to generate images conditioned on the compositions of multimodal control signals, where modalities are imperfectly complementary, i.e., composed multimodal conditional image synthesis (CMCIS). Specifically, we observe two challenging issues of the proposed CMCIS task, i.e., the modality coordination problem and the modality imbalance problem. To tackle these issues, we introduce a Mixture-of-Modality-Tokens Transformer (MMoT) that adaptively fuses fine-grained multimodal control signals, a multimodal balanced training loss to stabilize the optimization of each modality, and a multimodal sampling guidance to balance the strength of each modality control signal. Comprehensive experimental results demonstrate that MMoT achieves superior performance on both unimodal conditional image synthesis and MCIS tasks with high-quality and faithful image synthesis on complex multimodal conditions. The project website is available at https://jabir-zheng.github.io/MMoT.
Existing conditional image synthesis frameworks generate images based on user inputs in a single modality, such as text, segmentation, sketch, or style reference. They are often unable to leverage multimodal user inputs when available, which reduces their practicality. To address this limitation, we propose the Product-of-Experts Generative Adversarial Networks (PoE-GAN) framework, which can synthesize images conditioned on multiple input modalities or any subset of them, even the empty set. PoE-GAN consists of a product-of-experts generator and a multimodal multiscale projection discriminator. Through our carefully designed training scheme, PoE-GAN learns to synthesize images with high quality and diversity. Besides advancing the state of the art in multimodal conditional image synthesis, PoE-GAN also outperforms the best existing unimodal conditional image synthesis approaches when tested in the unimodal setting. The project website is available at https://deepimagination.github.io/PoE-GAN .
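PoE-GAN fuses the modality encoders as a product of experts. For Gaussian experts this product has a closed form (precisions add up and the mean is precision-weighted), illustrated below as a generic identity rather than the paper's exact latent-space parameterization:

```python
import numpy as np

def product_of_gaussian_experts(means, variances):
    """Product of 1-D Gaussian experts (applied element-wise to latent dimensions):
    precision adds up, and the mean is the precision-weighted average.
    Absent modalities are simply left out of the lists."""
    means = np.asarray(means, dtype=float)
    precisions = 1.0 / np.asarray(variances, dtype=float)
    var = 1.0 / precisions.sum(axis=0)
    mu = var * (precisions * means).sum(axis=0)
    return mu, var

# Example: fuse a text expert and a sketch expert over a 3-D latent.
mu, var = product_of_gaussian_experts(
    means=[[0.2, -1.0, 0.5], [0.0, -0.8, 0.9]],
    variances=[[1.0, 0.5, 2.0], [0.3, 1.0, 1.0]],
)
print(mu, var)
```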