This is the official repository for the paper titled "A Survey on Diffusion Models for Anomaly Detection", submitted to IJCAI 2025 (Paper arXiv). This survey provides a comprehensive review of the latest advancements in diffusion models for anomaly detection (DMAD). We delve into the fundamental concepts of anomaly detection and diffusion models, and analyze classic DM architectures such as DDPMs, DDIMs, and Score SDEs. The paper categorizes existing DMAD methods into reconstruction-based, density-based, and hybrid approaches, offering detailed examinations of their methodological innovations. Additionally, we explore diverse tasks across different data modalities, including image, time series, video, and multimodal data analysis. The survey also addresses critical challenges and emerging research directions, such as computational efficiency, model interpretability, robustness enhancement, edge-cloud collaboration, and integration with large language models. This repository curates existing literature, available code, public datasets, and tools to facilitate learning for beginners.
Taxonomy of diffusion models for anomaly detection
If you found this survey and repository useful, please consider starring this repository and citing our survey paper:
@misc{liu2025survey,
title = {A Survey on Diffusion Models for Anomaly Detection},
author = {Liu, Jing and Ma, Zhenchao and Wang, Zepu and Liu, Yang and Wang, Zehua and Sun, Peng and Song, Liang and Hu, Bo and Boukerche, Azzedine and Leung, Victor C. M.},
year = {2025},
month = jan,
number = {arXiv:2501.11430},
eprint = {2501.11430},
primaryclass = {cs},
doi = {10.48550/arXiv.2501.11430},
}
- Reconstruction-based Anomaly Detection
- Density-based Anomaly Detection
- Hybrid Approaches
- Image Anomaly Detection
- Time Series Anomaly Detection
- Video Anomaly Detection
- Multimodal Anomaly Detection
- Anomaly Detection Datasets for Diverse Tasks
- Anomaly Detection Tools
- Related Topics
2024
-
Unsupervised diffusion based anomaly detection for time series Zuo et al. APIN '24. [paper]
Unsupervised anomaly detection aims to construct a model that effectively detects invisible anomalies by training on and reconstructing normal data. While a significant number of reconstruction-based methods have made effective progress for time series anomaly detection, challenges still exist in aspects such as temporal feature extraction and generalization ability. Firstly, temporal features of data are subject to local information interference in reconstruction methods, which limits the long-term signal reconstruction methods. Secondly, the training dataset collector is subject to information nourishment such as collection methods, collection periods and locations, and data patterns are diverse, requiring the model to rebuild normal data according to different patterns. These issues hinder the anomaly detection capability of reconstruction-based methods. We propose an unsupervised anomaly detection model based on a diffusion model, which learns normal data patterns through noisy forward diffusion and reverse noise regression. By using a cascaded structure and combining it with a structured state space layer, long-term time series signal features can be well extracted. Different collection signals are distinguished by introducing collector entity ID embedding. The method proposed in this article significantly improves performance in experimental tests on three public datasets. Innovative aspects: (1) Utilizing the S4 method to capture long-term dependencies; (2) Employing a diffusion model for reconstruction learning; (3) Leveraging embedding techniques to enhance different pattern learning.
-
DRAD: Surface Anomaly Detection and Localization with Diffusion-based Reconstruction Sheng et al. IJCNN '24. [paper]
Surface anomaly detection aims to detect and locate anomalies in product surface images. Anomalies are mostly rare and difficult to find, so unsupervised methods trained on anomaly-free samples are popular. One of the most successful unsupervised approaches is the reconstruction-based method. This kind of method assumes that anomalies are much harder to reconstruct accurately than normal samples, so anomalies are detected from the differences between input images and reconstructed images. On the one hand, detection performance is largely determined by the quality of reconstruction, and the reconstruction ability of current models needs to be improved. On the other hand, anomalies sometimes can also be well reconstructed, leading to detection failure. To address those challenges, we propose a novel diffusion-based image reconstruction method for anomaly detection (DRAD). Leveraging the capabilities of diffusion models to generate high-quality and diverse images, DRAD achieves better reconstruction results compared to previous methods. Additionally, we propose a noise embedding process during the reconstruction, which avoids the direct copying of anomalies. Extensive experiments demonstrate that the proposed DRAD method achieves state-of-the-art performance on the MVTec AD dataset, particularly in anomaly detection with 99.1% image-level AUROC.
-
DDTAD: Anomaly detection for telemetry time series using a denoising diffusion probabilistic model Sui et al. IEEE Sensors Journal '24. [paper]
Efficient anomaly detection in telemetry time series is of great importance to ensure the safety and reliability of spacecraft. However, traditional methods are complicated to train, have a limited ability to maintain details, and do not consider temporal-spatial patterns. These problems make it still a challenge to effectively identify anomalies for multivariate time series. In this article, we propose Denoising Diffusion Time Series Anomaly Detection (DDTAD), an unsupervised reconstruction-based method using a denoising diffusion probabilistic model (DDPM). Our model offers the advantages of training stability, flexibility, and robust high-quality sample generation. We employ 1-D-U-Net architecture to capture both temporal dependencies and intervariable information. We restore the anomalous regions from the noise-corrupted input while preserving the precise features of the normal regions intact. Anomalies are identified as discrepancies between the original time series input and its corresponding reconstruction. Experiments on two public datasets demonstrate that our method outperforms the current dominant data-driven methods and enables the accurate detection of point anomalies, contextual anomalies, and subsequence anomalies.
-
MDPS: Unsupervised anomaly detection via masked diffusion posterior sampling Wu et al. Arxiv '24. [paper] [code]
Reconstruction-based methods have been commonly used for unsupervised anomaly detection, in which a normal image is reconstructed and compared with the given test image to detect and locate anomalies. Recently, diffusion models have shown promising applications for anomaly detection due to their powerful generative ability. However, these models lack strict mathematical support for normal image reconstruction and unexpectedly suffer from low reconstruction quality. To address these issues, this paper proposes a novel and highly-interpretable method named Masked Diffusion Posterior Sampling (MDPS). In MDPS, the problem of normal image reconstruction is mathematically modeled as multiple diffusion posterior sampling for normal images based on the devised masked noisy observation model and the diffusion-based normal image prior under Bayesian framework. Using a metric designed from pixel-level and perceptual-level perspectives, MDPS can effectively compute the difference map between each normal posterior sample and the given test image. Anomaly scores are obtained by averaging all difference maps for multiple posterior samples. Exhaustive experiments on MVTec and BTAD datasets demonstrate that MDPS can achieve state-of-the-art performance in normal image reconstruction quality as well as anomaly detection and localization.
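The scoring idea shared by the reconstruction-based entries above (compare the test input with a normal reconstruction, and, as in MDPS, average the difference maps over several reconstructed samples) can be illustrated with a minimal sketch. This is not any paper's implementation: the function names are hypothetical, the images are tiny hand-written grids, the reconstructions are supplied by hand rather than sampled from a diffusion model, and a plain absolute pixel difference stands in for the pixel-level and perceptual-level metric that MDPS describes.

```python
from typing import List

Image = List[List[float]]  # a grayscale image as rows of pixel intensities


def difference_map(test: Image, sample: Image) -> Image:
    """Absolute pixel-wise difference between the test image and one reconstruction."""
    return [[abs(t - s) for t, s in zip(t_row, s_row)]
            for t_row, s_row in zip(test, sample)]


def anomaly_map(test: Image, samples: List[Image]) -> Image:
    """Average the difference maps over all reconstructed samples."""
    maps = [difference_map(test, s) for s in samples]
    height, width = len(test), len(test[0])
    return [[sum(m[i][j] for m in maps) / len(maps) for j in range(width)]
            for i in range(height)]


def image_score(amap: Image) -> float:
    """Image-level score taken as the largest pixel-level score (one common choice)."""
    return max(max(row) for row in amap)


# Toy data (assumed for illustration): the bottom-right pixel is anomalous.
test_image = [[0.0, 0.0], [0.0, 1.0]]
samples = [
    [[0.0, 0.0], [0.0, 0.0]],  # first hand-written "normal" reconstruction
    [[0.0, 0.0], [0.0, 0.5]],  # second reconstruction, partly copies the anomaly
]
amap = anomaly_map(test_image, samples)
score = image_score(amap)
```

Averaging over several samples softens the effect of any single reconstruction that happens to copy the anomaly, which is the failure mode several abstracts above mention.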
2023
-
Unsupervised industrial anomaly detection with diffusion models Xu et al. J. Vis. Commun. Image R. '23. [paper] [code]
Due to the limitations of autoencoders and generative adversarial networks, the performance of reconstruction-based unsupervised image anomaly detection methods is not satisfactory. In this paper, we aim to explore the potential of a more powerful generative model, the diffusion model, in the anomaly detection problem. Specifically, we design a Reconstructed Diffusion Models (RecDMs) based on conditional denoising diffusion implicit models for image reconstruction. To eliminate the stochastic nature of the generation process, our key idea is to use a learnable encoder to extract meaningful semantic representations, which are then used as signal conditions in an iterative denoising process to guide the model in recovering the image, while avoiding falling into an "identical shortcut" to meaningless image reconstruction. To accurately locate anomaly regions, we introduce a discriminative network to obtain the pixel-level anomaly segmentation map based on the reconstructed image. Our experiments demonstrate the effectiveness of the proposed method, achieving a new state-of-the-art image-level AUC score of 98.1% and a pixel-level AUC score of 94.6% on the MVTec AD dataset, among all reconstruction-based methods. We also show the significant potential and promising future of our method on the challenging real-world dataset, the CHL AD dataset.
-
DiffusionAD: DiffusionAD: Norm-guided one-step denoising diffusion for anomaly detection Zhang et al. Arxiv '23. [paper]
Anomaly detection has garnered extensive applications in real industrial manufacturing due to its remarkable effectiveness and efficiency. However, previous generative-based models have been limited by suboptimal reconstruction quality, hampering their overall performance. A fundamental enhancement lies in our reformulation of the reconstruction process using a diffusion model into a noise-to-norm paradigm. Here, anomalous regions are perturbed with Gaussian noise and reconstructed as normal, overcoming the limitations of previous models by facilitating anomaly-free restoration. Additionally, we propose a rapid one-step denoising paradigm, significantly faster than the traditional iterative denoising in diffusion models. Furthermore, the introduction of the norm-guided paradigm elevates the accuracy and fidelity of reconstructions. The segmentation sub-network predicts pixel-level anomaly scores using the input image and its anomaly-free restoration. Comprehensive evaluations on four standard and challenging benchmarks reveal that DiffusionAD outperforms current state-of-the-art approaches, demonstrating the effectiveness and broad applicability of the proposed pipeline.
-
AnoDODE: AnoDODE: Anomaly detection with diffusion ODE Hu et al. Arxiv '23. [paper]
Anomaly detection is the process of identifying atypical data samples that significantly deviate from the majority of the dataset. In the realm of clinical screening and diagnosis, detecting abnormalities in medical images holds great importance. Typically, clinical practice provides access to a vast collection of normal images, while abnormal images are relatively scarce. We hypothesize that abnormal images and their associated features tend to manifest in low-density regions of the data distribution. Following this assumption, we turn to diffusion ODEs for unsupervised anomaly detection, given their tractability and superior performance in density estimation tasks. More precisely, we propose a new anomaly detection method based on diffusion ODEs by estimating the density of features extracted from multi-scale medical images. Our anomaly scoring mechanism depends on computing the negative log-likelihood of features extracted from medical images at different scales, quantified in bits per dimension. Furthermore, we propose a reconstruction-based anomaly localization suitable for our method. Our proposed method not only identifies anomalies but also provides interpretability at both the image and pixel levels. Through experiments on the BraTS2021 medical dataset, our proposed method outperforms existing methods. These results confirm the effectiveness and robustness of our method.
-
DiffAD: Imputation-based time-series anomaly detection with conditional weight-incremental diffusion models Xiao et al. KDD '23. [paper] [code]
Existing anomaly detection models for time series are primarily trained with normal-point-dominant data and would become ineffective when anomalous points intensively occur in certain episodes. To solve this problem, we propose a new approach, called DiffAD, from the perspective of time series imputation. Unlike previous prediction- and reconstruction-based methods that adopt either partial or complete data as observed values for estimation, DiffAD uses a density ratio-based strategy to select normal observations flexibly that can easily adapt to the anomaly concentration scenarios. To alleviate the model bias problem in the presence of anomaly concentration, we design a new denoising diffusion-based imputation method to enhance the imputation performance of missing values with conditional weight-incremental diffusion, which can preserve the information of observed values and substantially improves data generation quality for stable anomaly detection. Besides, we customize a multi-scale state space model to capture the long-term dependencies across episodes with different anomaly patterns. Extensive experimental results on real-world datasets show that DiffAD performs better than state-of-the-art benchmarks.
-
LDM: Unsupervised 3D out-of-distribution detection with latent diffusion models Graham et al. MICCAI '23. [paper] [code]
Methods for out-of-distribution (OOD) detection that scale to 3D data are crucial components of any real-world clinical deep learning system. Classic denoising diffusion probabilistic models (DDPMs) have been recently proposed as a robust way to perform reconstruction-based OOD detection on 2D datasets, but do not trivially scale to 3D data. In this work, we propose to use Latent Diffusion Models (LDMs), which enable the scaling of DDPMs to high-resolution 3D medical data. We validate the proposed approach on near- and far-OOD datasets and compare it to a recently proposed, 3D-enabled approach using Latent Transformer Models (LTMs). Not only does the proposed LDM-based approach achieve statistically significant better performance, it also shows less sensitivity to the underlying latent representation, more favourable memory scaling, and produces better spatial anomaly maps. Code is available at https://github.com/marksgraham/ddpm-ood.
2022
-
AnoDDPM: AnoDDPM: Anomaly detection with denoising diffusion probabilistic models using simplex noise Wyatt et al. CVPR '22. [paper] [code]
Generative models have been shown to provide a powerful mechanism for anomaly detection by learning to model healthy or normal reference data which can subsequently be used as a baseline for scoring anomalies. In this work we consider denoising diffusion probabilistic models (DDPMs) for unsupervised anomaly detection. DDPMs have superior mode coverage over generative adversarial networks (GANs) and higher sample quality than variational autoencoders (VAEs). However, this comes at the expense of poor scalability and increased sampling times due to the long Markov chain sequences required. We observe that within reconstruction-based anomaly detection a full-length Markov chain diffusion is not required. This leads us to develop a novel partial diffusion anomaly detection strategy that scales to high-resolution imagery, named AnoDDPM. A secondary problem is that Gaussian diffusion fails to capture larger anomalies; therefore we develop a multi-scale simplex noise diffusion process that gives control over the target anomaly size. AnoDDPM with simplex noise is shown to significantly outperform both f-AnoGAN and Gaussian diffusion for the tumorous dataset of 22 T1-weighted MRI scans (CCBS Edinburgh) qualitatively and quantitatively (improvement of +25.5% Sorensen-Dice coefficient, +17.6% IoU and +7.4% AUC).
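The partial diffusion idea in the AnoDDPM abstract (noising the input only part of the way along the chain instead of to the final step) can be illustrated with the standard DDPM closed-form forward process, x_t = sqrt(ᾱ_t) x_0 + sqrt(1 - ᾱ_t) ε. The sketch below is a hedged illustration rather than the authors' code: it uses Gaussian noise instead of simplex noise, the linear schedule values (1e-4 to 0.02 over 1000 steps) are the common DDPM defaults and are assumed here, and the partial step t = 250 and all function names are hypothetical.

```python
import math
import random
from typing import List


def alpha_bar(t: int, T: int = 1000,
              beta_start: float = 1e-4, beta_end: float = 0.02) -> float:
    """Cumulative product of (1 - beta_s) over the first t steps of a linear schedule."""
    prod = 1.0
    for s in range(t):
        beta = beta_start + (beta_end - beta_start) * s / (T - 1)
        prod *= 1.0 - beta
    return prod


def noise_to_step(x0: List[float], t: int, rng: random.Random) -> List[float]:
    """Sample x_t from q(x_t | x_0) in closed form, using Gaussian noise."""
    a = alpha_bar(t)
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]


rng = random.Random(0)                      # fixed seed so the sketch is repeatable
x0 = [0.5, -0.2, 0.9]                       # toy "image" flattened to three values
x_partial = noise_to_step(x0, t=250, rng=rng)

signal_partial = math.sqrt(alpha_bar(250))  # scaling applied to x0 at the partial step
signal_full = math.sqrt(alpha_bar(1000))    # scaling applied to x0 at the final step
```

Under these assumed schedule values, most of the input signal survives at the partial step while almost none survives at the final step, so a reverse process started from the partial step can stay close to the input while still having room to repair anomalous regions.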
2023
-
DTE: On diffusion modeling for anomaly detection Livernoche et al. ICLR '24. [paper] [code]
Known for their impressive performance in generative modeling, diffusion models are attractive candidates for density-based anomaly detection. This paper investigates different variations of diffusion modeling for unsupervised and semi-supervised anomaly detection. In particular, we find that Denoising Diffusion Probability Models (DDPM) are performant on anomaly detection benchmarks yet computationally expensive. By simplifying DDPM in application to anomaly detection, we are naturally led to an alternative approach called Diffusion Time Estimation (DTE). DTE estimates the distribution over diffusion time for a given input and uses the mode or mean of this distribution as the anomaly score. We derive an analytical form for this density and leverage a deep neural network to improve inference efficiency. Through empirical evaluations on the ADBench benchmark, we demonstrate that all diffusion-based anomaly detection methods perform competitively for both semi-supervised and unsupervised settings. Notably, DTE achieves orders of magnitude faster inference time than DDPM, while outperforming it on this benchmark. These results establish diffusion-based anomaly detection as a scalable alternative to traditional methods and recent deep-learning techniques for standard unsupervised and semi-supervised anomaly detection settings.
2021
-
Perfect Density Models Cannot Guarantee Anomaly Detection Le Lan et al. Entropy '21. [paper]
Thanks to the tractability of their likelihood, several deep generative models show promise for seemingly straightforward but important applications like anomaly detection, uncertainty estimation, and active learning. However, the likelihood values empirically attributed to anomalies conflict with the expectations these proposed applications suggest. In this paper, we take a closer look at the behavior of distribution densities through the lens of reparametrization and show that these quantities carry less meaningful information than previously thought, beyond estimation issues or the curse of dimensionality. We conclude that the use of these likelihoods for anomaly detection relies on strong and implicit hypotheses, and highlight the necessity of explicitly formulating these assumptions for reliable anomaly detection.
-
Progressive distillation for fast sampling of diffusion models Salimans et al. ICLR '22. [paper] [code]
Diffusion models have recently shown great promise for generative modeling, outperforming GANs on perceptual quality and autoregressive models at density estimation. A remaining downside is their slow sampling time: generating high quality samples takes many hundreds or thousands of model evaluations. Here we make two contributions to help eliminate this downside: First, we present new parameterizations of diffusion models that provide increased stability when using few sampling steps, compared to models in the literature. Second, we present a method to distill a trained deterministic diffusion sampler, using many steps, into a new diffusion model that takes half as many sampling steps. We then keep progressively applying this distillation procedure to our model, halving the number of required sampling steps each time. On standard image generation benchmarks like CIFAR-10, ImageNet, and LSUN, we start out with (near) state-of-the-art samplers taking 1024 or 8192 steps, and are able to distill down to models taking as little as 4 steps without losing much perceptual quality; achieving, for example, a FID of 3.0 on CIFAR-10 in 4 steps. Finally, we show that the full progressive distillation procedure does not take more time than it takes to train the original model, thus representing an efficient solution for generative modeling using diffusion at both train and test time.
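The halving procedure described in the progressive distillation abstract implies a simple schedule of sampler step counts. The sketch below only works through that arithmetic for the step counts quoted in the abstract (1024 or 8192 down to 4); it does not train or distill anything, the function name is hypothetical, and it assumes the start and target step counts differ by a power of two.

```python
from typing import List


def halving_schedule(start_steps: int, target_steps: int) -> List[int]:
    """Sampler step counts visited when each distillation round halves the steps."""
    if start_steps < target_steps or target_steps < 1:
        raise ValueError("need start_steps >= target_steps >= 1")
    schedule = [start_steps]
    while schedule[-1] > target_steps:
        schedule.append(schedule[-1] // 2)
    return schedule


schedule_1024 = halving_schedule(1024, 4)
rounds_1024 = len(schedule_1024) - 1              # distillation rounds from 1024 steps
rounds_8192 = len(halving_schedule(8192, 4)) - 1  # distillation rounds from 8192 steps
```

Going from 1024 to 4 steps takes 8 halving rounds, and going from 8192 to 4 takes 11, since each round removes one factor of two.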
2025
-
GLAD: GLAD: Towards better reconstruction with global and local adaptive diffusion models for unsupervised anomaly detection Yao et al. Arxiv '24. [paper] [code]
Diffusion models have shown superior performance on unsupervised anomaly detection tasks. Since trained with normal data only, diffusion models tend to reconstruct normal counterparts of test images with certain noises added. However, these methods treat all potential anomalies equally, which may cause two main problems. From the global perspective, the difficulty of reconstructing images with different anomalies is uneven. For example, adding back a missing element is harder than dealing with a scratch, thus requiring a larger number of denoising steps. Therefore, instead of utilizing the same setting for all samples, we propose to predict a particular denoising step for each sample by evaluating the difference between image contents and the priors extracted from diffusion models. From the local perspective, reconstructing abnormal regions differs from normal areas even in the same image. Theoretically, the diffusion model predicts a noise for each step, typically following a standard Gaussian distribution. However, due to the difference between the anomaly and its potential normal counterpart, the predicted noise in abnormal regions will inevitably deviate from the standard Gaussian distribution. To this end, we propose introducing synthetic abnormal samples in training to encourage the diffusion models to break through the limitation of standard Gaussian distribution, and a spatial-adaptive feature fusion scheme is utilized during inference. With the above modifications, we propose a global and local adaptive diffusion model (abbreviated to GLAD) for unsupervised anomaly detection, which introduces appealing flexibility and achieves anomaly-free reconstruction while retaining as much normal information as possible. Extensive experiments are conducted on three commonly used anomaly detection datasets (MVTec-AD, MPDD, and VisA) and a printed circuit board dataset (PCB-Bank) we integrated, showing the effectiveness of the proposed method. 
The source code and pre-trained models are publicly available at https://github.com/hyao1/GLAD.
2024
-
DRDC: Enhancing multi-class anomaly detection via diffusion refinement with dual conditioning Zhan et al. Arxiv '24. [paper]
Anomaly detection, the technique of identifying abnormal samples using only normal samples, has attracted widespread interest in industry. Existing one-model-per-category methods often struggle with limited generalization capabilities due to their focus on a single category, and can fail when encountering variations in product. Recent feature reconstruction methods, as representatives in one-model-all-categories schemes, face challenges including reconstructing anomalous samples and blurry reconstructions. In this paper, we creatively combine a diffusion model and a transformer for multi-class anomaly detection. This approach leverages diffusion to obtain high-frequency information for refinement, greatly alleviating the blurry reconstruction problem while maintaining the sampling efficiency of the reverse diffusion process. The task is transformed into image inpainting to disconnect the input-output correlation, thereby mitigating the "identical shortcuts" problem and preventing the model from reconstructing anomalous samples. Besides, we introduce category-awareness using dual conditioning to ensure the accuracy of prediction and reconstruction in the reverse diffusion process, preventing excessive deviation from the target category, thus effectively enabling multi-class anomaly detection. Furthermore, spatio-temporal fusion is also employed to fuse heatmaps predicted at different timesteps and scales, enhancing the performance of multi-class anomaly detection. Extensive experiments on benchmark datasets demonstrate the superior performance and exceptional multi-class anomaly detection capabilities of our proposed method compared to others.
-
MDPS: Unsupervised anomaly detection via masked diffusion posterior sampling Wu et al. Arxiv '24. [paper] [code]
Reconstruction-based methods have been commonly used for unsupervised anomaly detection, in which a normal image is reconstructed and compared with the given test image to detect and locate anomalies. Recently, diffusion models have shown promising applications for anomaly detection due to their powerful generative ability. However, these models lack strict mathematical support for normal image reconstruction and unexpectedly suffer from low reconstruction quality. To address these issues, this paper proposes a novel and highly-interpretable method named Masked Diffusion Posterior Sampling (MDPS). In MDPS, the problem of normal image reconstruction is mathematically modeled as multiple diffusion posterior sampling for normal images based on the devised masked noisy observation model and the diffusion-based normal image prior under Bayesian framework. Using a metric designed from pixel-level and perceptual-level perspectives, MDPS can effectively compute the difference map between each normal posterior sample and the given test image. Anomaly scores are obtained by averaging all difference maps for multiple posterior samples. Exhaustive experiments on MVTec and BTAD datasets demonstrate that MDPS can achieve state-of-the-art performance in normal image reconstruction quality as well as anomaly detection and localization.
-
Dynamic addition of noise in a diffusion model for anomaly detection Tebbe et al. CVPRW '24. [paper]
Diffusion models have found valuable applications in anomaly detection by capturing the nominal data distribution and identifying anomalies via reconstruction. Despite their merits, they struggle to localize anomalies of varying scales, especially larger anomalies like entire missing components. Addressing this, we present a novel framework that enhances the capability of diffusion models by extending the previously introduced implicit conditioning approach in three significant ways. First, we incorporate a dynamic step size computation that allows for variable noising steps in the forward process guided by an initial anomaly prediction. Second, we demonstrate that denoising an only scaled input without any added noise outperforms the conventional denoising process. Third, we project images in a latent space to abstract away from fine details that interfere with reconstruction of large missing components. Additionally, we propose a fine-tuning mechanism that facilitates the model to effectively grasp the nuances of the target domain. Our method undergoes rigorous evaluation on prominent anomaly detection datasets VisA, BTAD, and MVTec, yielding strong performance. Importantly, our framework effectively localizes anomalies regardless of their scale, marking a pivotal advancement in diffusion-based anomaly detection.
2023
-
DiffAD: Imputation-based time-series anomaly detection with conditional weight-incremental diffusion models Xiao et al. KDD '23. [paper] [code]
Existing anomaly detection models for time series are primarily trained with normal-point-dominant data and would become ineffective when anomalous points intensively occur in certain episodes. To solve this problem, we propose a new approach, called DiffAD, from the perspective of time series imputation. Unlike previous prediction- and reconstruction-based methods that adopt either partial or complete data as observed values for estimation, DiffAD uses a density ratio-based strategy to select normal observations flexibly that can easily adapt to the anomaly concentration scenarios. To alleviate the model bias problem in the presence of anomaly concentration, we design a new denoising diffusion-based imputation method to enhance the imputation performance of missing values with conditional weight-incremental diffusion, which can preserve the information of observed values and substantially improves data generation quality for stable anomaly detection. Besides, we customize a multi-scale state space model to capture the long-term dependencies across episodes with different anomaly patterns. Extensive experimental results on real-world datasets show that DiffAD performs better than state-of-the-art benchmarks.
-
AutoDDPM: Mask, stitch, and re-sample: Enhancing robustness and generalizability in anomaly detection through automatic diffusion models Bercea et al. Arxiv '23. [paper]
The introduction of diffusion models in anomaly detection has paved the way for more effective and accurate image reconstruction in pathologies. However, the current limitations in controlling noise granularity hinder diffusion models' ability to generalize across diverse anomaly types and compromise the restoration of healthy tissues. To overcome these challenges, we propose AutoDDPM, a novel approach that enhances the robustness of diffusion models. AutoDDPM utilizes diffusion models to generate initial likelihood maps of potential anomalies and seamlessly integrates them with the original image. Through joint noised distribution re-sampling, AutoDDPM achieves harmonization and in-painting effects. Our study demonstrates the efficacy of AutoDDPM in replacing anomalous regions while preserving healthy tissues, considerably surpassing diffusion models' limitations. It also contributes valuable insights and analysis on the limitations of current diffusion models, promoting robust and interpretable anomaly detection in medical imaging - an essential aspect of building autonomous clinical decision systems with higher interpretability.
2025
-
Image-Conditioned Diffusion Models for Medical Anomaly Detection Sudre et al. UNSURE 2024. [paper] [code]
Generating pseudo-healthy reconstructions of images is an effective way to detect anomalies, as identifying the differences between the reconstruction and the original can localise arbitrary anomalies whilst also providing interpretability for an observer by displaying what the image "should" look like. All existing reconstruction-based methods have a common shortcoming; they assume that models trained on purely normal data are incapable of reproducing pathologies yet also able to fully maintain healthy tissue. These implicit assumptions often fail, with models either not recovering normal regions or reproducing both the normal and abnormal features. We rectify this issue using image-conditioned diffusion models. Our model takes the input image as conditioning and is explicitly trained to correct synthetic anomalies introduced into healthy images, ensuring that it removes anomalies at test time. This conditioning allows the model to attend to the entire image without any loss of information, enabling it to replicate healthy regions with high fidelity. We evaluate our method across four datasets and define a new state-of-the-art performance for residual-based anomaly detection. Code is available at https://github.com/matt-baugh/img-cond-diffusion-model-ad.
2024
-
FedDiff: FedDiff: Diffusion Model Driven Federated Learning for Multi-Modal and Multi-Clients Li et al. TCSVT '24. [paper]
With the rapid development of imaging sensor technology in the field of remote sensing, multi-modal remote sensing data fusion has emerged as a crucial research direction for land cover classification tasks. While diffusion models have made great progress in generative models and image classification tasks, existing models primarily focus on single-modality and single-client control, that is, the diffusion process is driven by a single modal in a single computing node. To facilitate the secure fusion of heterogeneous data from clients, it is necessary to enable distributed multi-modal control, such as merging the hyperspectral data of organization A and the LiDAR data of organization B privately on each base station client. In this study, we propose a multi-modal collaborative diffusion federated learning framework called FedDiff. Our framework establishes a dual-branch diffusion model feature extraction setup, where the two modal data are inputted into separate branches of the encoder. Our key insight is that diffusion models driven by different modalities are inherently complementary in terms of potential denoising steps on which bilateral connections can be built. Considering the challenge of private and efficient communication between multiple clients, we embed the diffusion model into the federated learning communication structure, and introduce a lightweight communication module. Qualitative and quantitative experiments validate the superiority of our framework in terms of image quality and conditional consistency. To the best of our knowledge, this is the first instance of deploying a diffusion model into a federated learning framework, achieving optimal both privacy protection and performance for heterogeneous data. Our FedDiff surpasses existing methods in terms of performance on three multi-modal datasets, achieving a classification average accuracy of 96.77% while reducing the communication cost.
-
DualAnoDiff: DualAnoDiff: Dual-interrelated diffusion model for few-shot anomaly image generation Jin et al. arXiv '24. [paper] [code]
The performance of anomaly inspection in industrial manufacturing is constrained by the scarcity of anomaly data. To overcome this challenge, researchers have started employing anomaly generation approaches to augment the anomaly dataset. However, existing anomaly generation methods suffer from limited diversity in the generated anomalies and struggle to achieve a seamless blending of this anomaly with the original image. In this paper, we overcome these challenges from a new perspective, simultaneously generating a pair of the overall image and the corresponding anomaly part. We propose DualAnoDiff, a novel diffusion-based few-shot anomaly image generation model, which can generate diverse and realistic anomaly images by using a dual-interrelated diffusion model, where one of them is employed to generate the whole image while the other one generates the anomaly part. Moreover, we extract background and shape information to mitigate the distortion and blurriness phenomenon in few-shot image generation. Extensive experiments demonstrate the superiority of our proposed model over state-of-the-art methods in terms of both realism and diversity. Overall, our approach significantly improves the performance of downstream anomaly detection tasks, including anomaly detection, anomaly localization, and anomaly classification tasks.
-
Pancreatic tumor segmentation as anomaly detection in CT images using denoising diffusion models Babaei et al. arXiv '24. [paper]
Despite the advances in medicine, cancer has remained a formidable challenge. Particularly in the case of pancreatic tumors, characterized by their diversity and late diagnosis, early detection poses a significant challenge crucial for effective treatment. The advancement of deep learning techniques, particularly supervised algorithms, has significantly propelled pancreatic tumor detection in the medical field. However, supervised deep learning approaches necessitate extensive labeled medical images for training, yet such annotations are both scarce and costly to acquire. Conversely, weakly supervised anomaly detection methods, requiring only image-level annotations, have garnered interest. Existing methodologies predominantly hinge on generative adversarial networks (GANs) or autoencoder models, which can be complex to train and may have difficulty accurately preserving fine image details. This research presents a novel approach to pancreatic tumor detection, employing weakly supervised anomaly detection through denoising diffusion algorithms. By incorporating a deterministic iterative process of adding and removing noise along with classifier guidance, the method enables seamless translation of images between diseased and healthy subjects, resulting in detailed anomaly maps without requiring complex training protocols or segmentation masks. This study explores denoising diffusion models as a recent advancement over traditional generative models like GANs, contributing to the field of pancreatic tumor detection. Recognizing the low survival rates of pancreatic cancer, this study emphasizes the need for continued research to leverage diffusion models' efficiency in medical segmentation tasks.
-
ODEED: Detecting out-of-distribution earth observation images with diffusion models Bellier et al. CVPRW '24. [paper]
Earth Observation imagery can capture rare and unusual events, such as disasters and major landscape changes, whose visual appearance contrasts with the usual observations. Deep models trained on common remote sensing data will output drastically different features for these out-of-distribution samples, compared to those closer to their training dataset. Detecting them could therefore help anticipate changes in the observations, either geographical or environmental. In this work, we show that the reconstruction error of diffusion models, used as a plausibility score, can effectively serve as an unsupervised out-of-distribution detector for remote sensing images. Moreover, we introduce ODEED, a novel reconstruction-based scorer using the probability-flow ODE of diffusion models. We validate it experimentally on SpaceNet 8 with various scenarios, such as classical OOD detection with geographical shift and near-OOD setups: pre/post-flood and non-flooded/flooded image recognition. We show that our ODEED scorer significantly outperforms other diffusion-based and discriminative baselines on the more challenging near-OOD scenarios of flood image detection, where OOD images are close to the distribution tail. We aim to pave the way towards better use of generative models for anomaly detection in remote sensing.
-
AnomalyXFusion: AnomalyXFusion: Multi-modal anomaly synthesis with diffusion Hu et al. arXiv '24. [paper] [code]
Anomaly synthesis is one of the effective methods to augment abnormal samples for training. However, current anomaly synthesis methods predominantly rely on texture information as input, which limits the fidelity of synthesized abnormal samples, because texture information is insufficient to correctly depict the pattern of anomalies, especially for logical anomalies. To surmount this obstacle, we present the AnomalyXFusion framework, designed to harness multi-modality information to enhance the quality of synthesized abnormal samples. The AnomalyXFusion framework comprises two distinct yet synergistic modules: the Multi-modal In-Fusion (MIF) module and the Dynamic Dif-Fusion (DDF) module. The MIF module refines modality alignment by aggregating and integrating various modality features into a unified embedding space, termed X-embedding, which includes image, text, and mask features. Concurrently, the DDF module facilitates controlled generation through an adaptive adjustment of X-embedding conditioned on the diffusion steps. In addition, to reveal the multi-modality representational power of AnomalyXFusion, we propose a new dataset, called MVTec Caption. More precisely, MVTec Caption extends 2.2k accurate image-mask-text annotations for the MVTec AD and LOCO datasets. Comprehensive evaluations demonstrate the effectiveness of AnomalyXFusion, especially regarding the fidelity and diversity for logical anomalies. Project page: http://github.com/hujiecpp/MVTec-Caption
-
IgCONDA-PET: IgCONDA-PET: Implicitly-guided counterfactual diffusion for detecting anomalies in PET images Ahamed et al. arXiv '24. [paper] [code]
Minimizing the need for pixel-level annotated data for training PET anomaly segmentation networks is crucial, particularly due to time and cost constraints related to expert annotations. Current un-/weakly-supervised anomaly detection methods rely on autoencoder or generative adversarial networks trained only on healthy data, although these are more challenging to train. In this work, we present a weakly supervised and Implicitly guided COuNterfactual diffusion model for Detecting Anomalies in PET images, branded as IgCONDA-PET. The training is conditioned on image class labels (healthy vs. unhealthy) along with implicit guidance to generate counterfactuals for an unhealthy image with anomalies. The counterfactual generation process synthesizes the healthy counterpart for a given unhealthy image, and the difference between the two facilitates the identification of anomaly locations. The code is available at: https://github.com/igcondapet/IgCONDA-PET.git
-
MMCCD: Modality cycles with masked conditional diffusion for unsupervised anomaly segmentation in MRI Liang et al. arXiv '23. [paper] [code]
Unsupervised anomaly segmentation aims to detect patterns that are distinct from any patterns processed during training, commonly called abnormal or out-of-distribution patterns, without providing any associated manual segmentations. Since anomalies during deployment can lead to model failure, detecting the anomaly can enhance the reliability of models, which is valuable in high-risk domains like medical imaging. This paper introduces Masked Modality Cycles with Conditional Diffusion (MMCCD), a method that enables segmentation of anomalies across diverse patterns in multimodal MRI. The method is based on two fundamental ideas. First, we propose the use of cyclic modality translation as a mechanism for enabling abnormality detection. Image-translation models learn tissue-specific modality mappings, which are characteristic of tissue physiology. Thus, these learned mappings fail to translate tissues or image patterns that have never been encountered during training, and the error enables their segmentation. Furthermore, we combine image translation with a masked conditional diffusion model, which attempts to “imagine” what tissue exists under a masked area, further exposing unknown patterns as the generative model fails to recreate them. We evaluate our method on a proxy task by training on healthy-looking slices of BraTS2021 multi-modality MRIs and testing on slices with tumors. We show that our method compares favorably to previous unsupervised approaches based on image reconstruction and denoising with autoencoders and diffusion models. Code is available at: .
-
MAEDiff: MAEDiff: Masked autoencoder-enhanced diffusion models for unsupervised anomaly detection in brain images Xu et al. arXiv '24. [paper]
Unsupervised anomaly detection has gained significant attention in the field of medical imaging due to its capability of relieving the costly pixel-level annotation. To achieve this, modern approaches usually utilize generative models to produce healthy references of the diseased images and then identify the abnormalities by comparing the healthy references and the original diseased images. Recently, diffusion models have exhibited promising potential for unsupervised anomaly detection in medical images for their good mode coverage and high sample quality. However, the intrinsic characteristics of medical images, e.g., low contrast and the intricate anatomical structure of the human body, make the reconstruction challenging. Besides, the global information of medical images often remains underutilized. To address these two issues, we propose a novel Masked Autoencoder-enhanced Diffusion Model (MAEDiff) for unsupervised anomaly detection in brain images. The MAEDiff involves a hierarchical patch partition. It generates healthy images by overlapping upper-level patches and implements a mechanism based on the masked autoencoders operating on the sub-level patches to enhance the condition on the unnoised regions. Extensive experiments on data of tumors and multiple sclerosis lesions demonstrate the effectiveness of our method.
-
mDDPM: Unsupervised anomaly detection in medical images using masked diffusion model Iqbal et al. MLMI '23. [paper] [code]
It can be challenging to identify brain MRI anomalies using supervised deep-learning techniques due to anatomical heterogeneity and the requirement for pixel-level labeling. Unsupervised anomaly detection approaches provide an alternative solution by relying only on sample-level labels of healthy brains to generate a desired representation to identify abnormalities at the pixel level. Although generative models are crucial for generating such anatomically consistent representations of healthy brains, accurately generating the intricate anatomy of the human brain remains a challenge. In this study, we present a method called the masked-denoising diffusion probabilistic model (mDDPM), which introduces masking-based regularization to reframe the generation task of diffusion models. Specifically, we introduce Masked Image Modeling (MIM) and Masked Frequency Modeling (MFM) in our self-supervised approach that enables models to learn visual representations from unlabeled data. To the best of our knowledge, this is the first attempt to apply MFM in denoising diffusion probabilistic models (DDPMs) for medical applications. We evaluate our approach on datasets containing tumors and multiple sclerosis lesions and demonstrate the superior performance of our unsupervised method as compared to the existing fully/weakly supervised baselines. Project website: https://mddpm.github.io/.
-
THOR: Diffusion models with implicit guidance for medical anomaly detection Bercea et al. arXiv '24. [paper] [code]
Diffusion models have advanced unsupervised anomaly detection by improving the transformation of pathological images into pseudo-healthy equivalents. Nonetheless, standard approaches may compromise critical information during pathology removal, leading to restorations that do not align with unaffected regions in the original scans. Such discrepancies can inadvertently increase false positive rates and reduce specificity, complicating radiological evaluations. This paper introduces Temporal Harmonization for Optimal Restoration (THOR), which refines the reverse diffusion process by integrating implicit guidance through intermediate masks. THOR aims to preserve the integrity of healthy tissue details in reconstructed images, ensuring fidelity to the original scan in areas unaffected by pathology. Comparative evaluations reveal that THOR surpasses existing diffusion-based methods in retaining detail and precision in image restoration and detecting and segmenting anomalies in brain MRIs and wrist X-rays. Code: https://github.com/compai-lab/2024-miccai-bercea-thor.git.
-
Dif-fuse: Diffusion models for counterfactual generation and anomaly detection in brain images Fontanella et al. IEEE TMI '24. [paper]
Segmentation masks of pathological areas are useful in many medical applications, such as brain tumour and stroke management. Moreover, healthy counterfactuals of diseased images can be used to enhance radiologists’ training files and to improve the interpretability of segmentation models. In this work, we present a weakly supervised method to generate a healthy version of a diseased image and then use it to obtain a pixel-wise anomaly map. To do so, we start by considering a saliency map that approximately covers the pathological areas, obtained with ACAT. Then, we propose a technique that allows targeted modifications to these regions, while preserving the rest of the image. In particular, we employ a diffusion model trained on healthy samples and combine Denoising Diffusion Probabilistic Model (DDPM) and Denoising Diffusion Implicit Model (DDIM) at each step of the sampling process. DDPM is used to modify the areas affected by a lesion within the saliency map, while DDIM guarantees reconstruction of the normal anatomy outside of it. The two parts are also fused at each timestep, to guarantee the generation of a sample with a coherent appearance and a seamless transition between edited and unedited parts. We verify that when our method is applied to healthy samples, the input images are reconstructed without significant modifications. We compare our approach with alternative weakly supervised methods on the task of brain lesion segmentation, achieving the highest mean Dice and IoU scores among the models considered.
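The per-timestep fusion described in this abstract can be pictured as a mask-weighted blend. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the function name `fuse_step`, the flat-list image representation, and the toy values are all hypothetical, and the two inputs simply stand in for the DDPM and DDIM outputs at one sampling step.

```python
def fuse_step(x_ddpm, x_ddim, mask):
    """Blend two candidate samples for one timestep.

    mask[i] == 1 marks a pixel inside the saliency map (take the DDPM value);
    mask[i] == 0 marks a pixel outside it (take the DDIM value).
    """
    if not (len(x_ddpm) == len(x_ddim) == len(mask)):
        raise ValueError("all inputs must have the same length")
    return [m * a + (1 - m) * b for a, b, m in zip(x_ddpm, x_ddim, mask)]


# Toy 4-pixel example with hypothetical values.
x_ddpm = [0.9, 0.8, 0.7, 0.6]   # stand-in for the stochastic DDPM output
x_ddim = [0.1, 0.2, 0.3, 0.4]   # stand-in for the deterministic DDIM output
mask = [1, 1, 0, 0]             # first two pixels lie inside the saliency map
fused = fuse_step(x_ddpm, x_ddim, mask)
```

With a binary mask, every pixel is taken from exactly one of the two branches; a soft mask with values between 0 and 1 would instead interpolate between them.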
-
Masked Bernoulli Diffusion: Binary noise for binary tasks: Masked bernoulli diffusion for unsupervised anomaly detection Wolleb et al. arXiv '24. [paper] [code]
The high performance of denoising diffusion models for image generation has paved the way for their application in unsupervised medical anomaly detection. As diffusion-based methods require a lot of GPU memory and have long sampling times, we present a novel and fast unsupervised anomaly detection approach based on latent Bernoulli diffusion models. We first apply an autoencoder to compress the input images into a binary latent representation. Next, a diffusion model that follows a Bernoulli noise schedule is applied to this latent space and trained to restore binary latent representations from perturbed ones. The binary nature of this diffusion model allows us to identify entries in the latent space that have a high probability of flipping their binary code during the denoising process, which indicates out-of-distribution data. We propose a masking algorithm based on these probabilities, which improves the anomaly detection scores. We achieve state-of-the-art performance compared to other diffusion-based unsupervised anomaly detection algorithms while significantly reducing sampling time and memory consumption. The code is available at https://github.com/JuliaWolleb/Anomaly_berdiff.
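To make the idea of flip-probability masking concrete, here is a minimal sketch. It assumes the per-entry flip probabilities are already available from some model; the threshold of 0.5, the mean-probability score, and the names `flip_mask` and `anomaly_score` are illustrative assumptions and are not taken from the paper or its code.

```python
def flip_mask(flip_probs, threshold=0.5):
    """Mark latent entries whose flip probability exceeds the threshold."""
    return [1 if p > threshold else 0 for p in flip_probs]


def anomaly_score(flip_probs):
    """Summarise a latent code by its mean flip probability (an assumed score)."""
    if not flip_probs:
        raise ValueError("flip_probs must not be empty")
    return sum(flip_probs) / len(flip_probs)


# Hypothetical flip probabilities for a 4-entry binary latent code.
probs = [0.05, 0.9, 0.1, 0.7]
mask = flip_mask(probs)        # entries 1 and 3 are flagged
score = anomaly_score(probs)   # higher means more likely out-of-distribution
```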
-
DISYRE v2: Ensembled cold-diffusion restorations for unsupervised anomaly detection Naval Marimont et al. MICCAI '24. [paper]
Unsupervised Anomaly Detection (UAD) methods aim to identify anomalies in test samples by comparing them with a normative distribution learned from a dataset known to be anomaly-free. Approaches based on generative models offer interpretability by generating anomaly-free versions of test images, but are typically unable to identify subtle anomalies. Alternatively, approaches using feature modelling or self-supervised methods, such as the ones relying on synthetically generated anomalies, do not provide out-of-the-box interpretability. In this work, we present a novel method that combines the strengths of both strategies: a generative cold-diffusion pipeline (i.e., a diffusion-like pipeline which uses corruptions not based on noise) that is trained with the objective of turning synthetically-corrupted images back to their normal, original appearance. To support our pipeline we introduce a novel synthetic anomaly generation procedure, called DAG, and a novel anomaly score which ensembles restorations conditioned with different degrees of abnormality. Our method surpasses the prior state-of-the-art for unsupervised anomaly detection in three different Brain MRI datasets.
2023
-
Unsupervised industrial anomaly detection with diffusion models Xu et al. J. Vis. Commun. Image R. '23. [paper] [code]
Due to the limitations of autoencoders and generative adversarial networks, the performance of reconstruction-based unsupervised image anomaly detection methods is not satisfactory. In this paper, we aim to explore the potential of a more powerful generative model, the diffusion model, in the anomaly detection problem. Specifically, we design Reconstructed Diffusion Models (RecDMs) based on conditional denoising diffusion implicit models for image reconstruction. To eliminate the stochastic nature of the generation process, our key idea is to use a learnable encoder to extract meaningful semantic representations, which are then used as signal conditions in an iterative denoising process to guide the model in recovering the image, while avoiding falling into an “identical shortcut” to meaningless image reconstruction. To accurately locate anomaly regions, we introduce a discriminative network to obtain the pixel-level anomaly segmentation map based on the reconstructed image. Our experiments demonstrate the effectiveness of the proposed method, achieving a new state-of-the-art image-level AUC score of 98.1% and a pixel-level AUC score of 94.6% on the MVTec AD dataset, among all reconstruction-based methods. We also show the significant potential and promising future of our method on the challenging real-world dataset, the CHL AD dataset.
-
ODD: Odd: One-class anomaly detection via the diffusion model Wang et al. ICIP '23. [paper]
Anomaly detection identifies instances that deviate from the distribution of the normal class. Recently, diffusion models have shown great promise. Our research revealed that a diffusion model trained solely on normal data is able to transform both normal and anomalous samples into normal images. Employing this discovery, we propose ODD (One-Class Anomaly Detection via the Diffusion model), which consists of a diffusion model to convert both normal and anomalous data into normal data, and a similarity network enhanced with outlier exposure to measure the semantic distance between the input and output of the diffusion model. If the score is low, the input is considered an anomalous instance. ODD is evaluated on a variety of datasets. Both qualitative and quantitative results demonstrate that our method outperforms existing state-of-the-art techniques.
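The decision rule in this abstract (a low similarity between the input and the diffusion output signals an anomaly) can be sketched as follows. The paper uses a learned similarity network; plain cosine similarity is swapped in here purely for illustration, and the function names and the threshold of 0.9 are assumptions.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as maximally dissimilar
    return dot / (norm_u * norm_v)


def is_anomaly(x, x_normalised, threshold=0.9):
    """Flag the input when it is dissimilar to its 'normalised' output."""
    return cosine_similarity(x, x_normalised) < threshold


# A normal input barely changes; an anomalous input changes a lot.
normal_flag = is_anomaly([1.0, 0.0], [1.0, 0.0])
anomalous_flag = is_anomaly([1.0, 0.0], [0.0, 1.0])
```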
-
Fast non-markovian diffusion model for weakly supervised anomaly detection in brain MR images Li et al. MICCAI '23. [paper]
In medical image analysis, anomaly detection in weakly supervised settings has gained significant interest due to the high cost associated with expert-annotated pixel-wise labeling. Current methods primarily rely on auto-encoders and flow-based healthy image reconstruction to detect anomalies. However, these methods have limitations in terms of high-fidelity generation and suffer from complicated training processes and low-quality reconstructions. Recent studies have shown promising results with diffusion models in image generation. However, their practical value in medical scenarios is restricted due to their weak detail-retaining ability and low inference speed. To address these limitations, we propose a fast non-Markovian diffusion model (FNDM) with hybrid-condition guidance to detect high-precision anomalies in the brain MR images. A non-Markovian diffusion process is designed to enable the efficient transfer of anatomical information across diffusion steps. Additionally, we introduce new hybrid pixel-wise conditions as more substantial guidance on hidden states, which enables the model to concentrate more efficiently on the anomaly regions. Furthermore, to reduce computational burden during clinical applications, we have accelerated the encoding and sampling procedures in our FNDM using multi-step ODE solvers. As a result, our proposed FNDM method outperforms the previous state-of-the-art diffusion model, achieving a 9.56% and 19.98% improvement in Dice scores on the BRATS 2020 and ISLES datasets, respectively, while requiring only six times less computational cost.
-
Collaborative Diffusion: Collaborative diffusion for multi-modal face generation and editing Huang et al. CVPR '23. [paper] [code]
Diffusion models have recently emerged as a powerful generative tool. Despite the great progress, existing diffusion models mainly focus on uni-modal control, i.e., the diffusion process is driven by only one modality of condition. To further unleash the users' creativity, it is desirable for the model to be controllable by multiple modalities simultaneously, e.g., generating and editing faces by describing the age (text-driven) while drawing the face shape (mask-driven). In this work, we present Collaborative Diffusion, where pre-trained uni-modal diffusion models collaborate to achieve multi-modal face generation and editing without re-training. Our key insight is that diffusion models driven by different modalities are inherently complementary regarding the latent denoising steps, upon which bilateral connections can be established. Specifically, we propose dynamic diffuser, a meta-network that adaptively hallucinates multi-modal denoising steps by predicting the spatial-temporal influence functions for each pre-trained uni-modal model. Collaborative Diffusion not only combines the generation capabilities of uni-modal diffusion models, but also integrates multiple uni-modal manipulations to perform multi-modal editing. Extensive qualitative and quantitative experiments demonstrate the superiority of our framework in both image quality and condition consistency.
2022
-
Fast unsupervised brain anomaly detection and segmentation with diffusion models Pinaya et al. MICCAI '22. [paper]
Deep generative models have emerged as promising tools for detecting arbitrary anomalies in data, dispensing with the necessity for manual labelling. Recently, autoregressive transformers have achieved state-of-the-art performance for anomaly detection in medical imaging. Nonetheless, these models still have some intrinsic weaknesses, such as requiring images to be modelled as 1D sequences, the accumulation of errors during the sampling process, and the significant inference times associated with transformers. Denoising diffusion probabilistic models are a class of non-autoregressive generative models recently shown to produce excellent samples in computer vision (surpassing Generative Adversarial Networks), and to achieve log-likelihoods that are competitive with transformers while having relatively fast inference times. Diffusion models can be applied to the latent representations learnt by autoencoders, making them easily scalable and great candidates for application to high dimensional data, such as medical images. Here, we propose a method based on diffusion models to detect and segment anomalies in brain imaging. By training the models on healthy data and then exploring its diffusion and reverse steps across its Markov chain, we can identify anomalous areas in the latent space and hence identify anomalies in the pixel space. Our diffusion models achieve competitive performance compared with autoregressive approaches across a series of experiments with 2D CT and MRI data involving synthetic and real pathological lesions with much reduced inference times, making their usage clinically viable.
-
Diffusion models for medical anomaly detection Wolleb et al. MICCAI '22. [paper] [code]
In medical applications, weakly supervised anomaly detection methods are of great interest, as only image-level annotations are required for training. Current anomaly detection methods mainly rely on generative adversarial networks or autoencoder models. Those models are often complicated to train or have difficulty preserving fine details in the image. We present a novel weakly supervised anomaly detection method based on denoising diffusion implicit models. We combine the deterministic iterative noising and denoising scheme with classifier guidance for image-to-image translation between diseased and healthy subjects. Our method generates very detailed anomaly maps without the need for a complex training procedure. We evaluate our method on the BRATS2020 dataset for brain tumor detection and the CheXpert dataset for detecting pleural effusions.
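Several entries in this list derive an anomaly map from the difference between an input image and its generated healthy counterpart. A minimal sketch of that final step is shown below; it assumes the healthy image has already been produced by some translation model, and the function names, the nested-list image format, and the 0.5 threshold are illustrative choices rather than details from any specific paper.

```python
def anomaly_map(image, healthy):
    """Pixel-wise absolute difference between an image and its healthy version."""
    return [
        [abs(a - b) for a, b in zip(row_img, row_healthy)]
        for row_img, row_healthy in zip(image, healthy)
    ]


def segment(amap, threshold):
    """Binarise an anomaly map: 1 where the difference exceeds the threshold."""
    return [[1 if value > threshold else 0 for value in row] for row in amap]


# Toy 2x2 example with hypothetical intensities in [0, 1].
image = [[0.2, 0.9], [0.1, 0.1]]     # one bright, lesion-like pixel
healthy = [[0.2, 0.3], [0.1, 0.1]]   # stand-in for the translated healthy image
amap = anomaly_map(image, healthy)
seg = segment(amap, threshold=0.5)
```

In practice the threshold is usually tuned on validation data, or the continuous map is evaluated directly with threshold-free metrics.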
2025
-
PHAD: Prototype-oriented hypergraph representation learning for anomaly detection in tabular data Li et al. Information Processing and Management '25. [paper] [code]
Anomaly detection in tabular data holds significant importance across various industries such as manufacturing, healthcare, and finance. However, existing methods are constrained by the size and diversity of datasets, leading to poor generalization. Moreover, they primarily concentrate on feature correlations while overlooking interactions among data instances. Furthermore, the vulnerability of these methods to noisy data hinders their deployment in practical engineering applications. To tackle these issues, this paper proposes prototype-oriented hypergraph representation learning for anomaly detection in tabular data (PHAD). Specifically, PHAD employs a diffusion-based data augmentation strategy tailored for tabular data to enhance both the size and diversity of the training data. Subsequently, it constructs a hypergraph from the combined augmented and original training data to capture higher-order correlations among data instances by leveraging hypergraph neural networks. Lastly, PHAD utilizes an adaptive fusion of local and global data representations to derive the prototype of latent normal data, serving as a benchmark for detecting anomalies. Extensive experiments on twenty-six public datasets across various engineering fields demonstrate that our proposed PHAD outperforms other state-of-the-art methods in terms of performance, robustness, and efficiency.
-
Unsupervised anomaly detection for tabular data using noise evaluation Dai et al. AAAI '25. [paper]
Unsupervised anomaly detection (UAD) plays an important role in modern data analytics and it is crucial to provide simple yet effective and guaranteed UAD algorithms for real applications. In this paper, we present a novel UAD method for tabular data by evaluating how much noise is in the data. Specifically, we propose to learn a deep neural network from the clean (normal) training dataset and a noisy dataset, where the latter is generated by adding highly diverse noises to the clean data. The neural network can learn a reliable decision boundary between normal data and anomalous data when the diversity of the generated noisy data is sufficiently high so that the hard abnormal samples lie in the noisy region. Importantly, we provide theoretical guarantees, proving that the proposed method can detect anomalous data successfully, although the method does not utilize any real anomalous data in the training stage. Extensive experiments through more than 60 benchmark datasets demonstrate the effectiveness of the proposed method in comparison to 12 baselines of UAD. Our method obtains a 92.27% AUC score and a 1.68 ranking score on average. Moreover, compared to the state-of-the-art UAD methods, our method is easier to implement.
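The data-generation step described in this abstract (pairing clean rows with noisy copies at diverse noise levels) might look roughly like the sketch below. This is an assumption-laden illustration, not the authors' procedure: the Gaussian noise, the particular noise scales, the use of the scale as the training target, and the name `make_noisy_dataset` are all hypothetical.

```python
import random


def make_noisy_dataset(clean_rows, noise_scales, seed=0):
    """Return (row, noise_level) pairs: each clean row plus noisy copies of it.

    Clean rows get target 0.0; a copy perturbed with Gaussian noise of
    standard deviation s gets target s.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    samples = []
    for row in clean_rows:
        samples.append((list(row), 0.0))
        for scale in noise_scales:
            noisy = [x + rng.gauss(0.0, scale) for x in row]
            samples.append((noisy, scale))
    return samples


# Two hypothetical tabular rows and three noise levels.
clean = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
dataset = make_noisy_dataset(clean, noise_scales=[0.1, 0.5, 1.0])
```

A model trained to predict the target from the row could then use its predicted noise level on a test row as the anomaly score.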
2024
-
Retrieval augmented deep anomaly detection for tabular data Thimonier et al. CIKM '24. [paper] [code]
Deep learning for tabular data has garnered increasing attention in recent years, yet employing deep models for structured data remains challenging. While these models excel with unstructured data, their efficacy with structured data has been limited. Recent research has introduced retrieval-augmented models to address this gap, demonstrating promising results in supervised tasks such as classification and regression. In this work, we investigate using retrieval-augmented models for anomaly detection on tabular data. We propose a reconstruction-based approach in which a transformer model learns to reconstruct masked features of normal samples. We test the effectiveness of KNN-based and attention-based modules to select relevant samples to help in the reconstruction process of the target sample. Our experiments on a benchmark of 31 tabular datasets reveal that augmenting this reconstruction-based anomaly detection (AD) method with sample-sample dependencies via retrieval modules significantly boosts performance. The present work supports the idea that retrieval modules are useful to augment any deep AD method to enhance anomaly detection on tabular data. Our code to reproduce the experiments is made available on GitHub.
-
SimpDM: Self-supervision improves diffusion models for tabular data imputation Liu et al. CIKM '24. [paper] [code]
The ubiquity of missing data has sparked considerable attention and focus on tabular data imputation methods. Diffusion models, recognized as the cutting-edge technique for data generation, demonstrate significant potential in tabular data imputation tasks. However, in pursuit of diversity, vanilla diffusion models often exhibit sensitivity to initialized noises, which hinders the models from generating stable and accurate imputation results. Additionally, the sparsity inherent in tabular data poses challenges for diffusion models in accurately modeling the data manifold, impacting the robustness of these models for data imputation. To tackle these challenges, this paper introduces an advanced diffusion model named Self-supervised imputation Diffusion Model (SimpDM for brevity), specifically tailored for tabular data imputation tasks. To mitigate sensitivity to noise, we introduce a self-supervised alignment mechanism that aims to regularize the model, ensuring consistent and stable imputation predictions. Furthermore, we introduce a carefully devised state-dependent data augmentation strategy within SimpDM, enhancing the robustness of the diffusion model when dealing with limited data. Extensive experiments demonstrate that SimpDM matches or outperforms state-of-the-art imputation methods across various scenarios.
-
Unsupervised diffusion based anomaly detection for time series Zuo et al. Applied Intelligence '24. [paper] [code]
Unsupervised anomaly detection aims to construct a model that effectively detects unseen anomalies by training on and reconstructing normal data. While a significant number of reconstruction-based methods have made effective progress in time series anomaly detection, challenges remain in aspects such as temporal feature extraction and generalization ability. Firstly, the temporal features of the data are subject to local information interference in reconstruction methods, which limits long-term signal reconstruction. Secondly, the training data are affected by collection factors such as collection methods, collection periods, and locations, so data patterns are diverse, requiring the model to rebuild normal data according to different patterns. These issues hinder the anomaly detection capability of reconstruction-based methods. We propose an unsupervised anomaly detection model based on a diffusion model, which learns normal data patterns through noisy forward diffusion and reverse noise regression. By using a cascaded structure and combining it with a structured state space layer, long-term time series signal features can be well extracted. Different collection signals are distinguished by introducing collector entity ID embedding. The method proposed in this article significantly improves performance in experimental tests on three public datasets. Innovative aspects: (1) Utilizing the S4 method to capture long-term dependencies; (2) Employing a diffusion model for reconstruction learning; (3) Leveraging embedding techniques to enhance different pattern learning.
-
TSAD-C: Contaminated multivariate time-series anomaly detection with spatio-temporal graph conditional diffusion models Ho et al. Arxiv '24. [paper]
Mainstream unsupervised anomaly detection algorithms often excel in academic datasets, yet their real-world performance is restricted due to the controlled experimental conditions involving clean training data. Addressing the challenge of training with noise, a prevalent issue in practical anomaly detection, is frequently overlooked. In a pioneering endeavor, this study delves into the realm of label-level noise within sensory time-series anomaly detection (TSAD). This paper presents a novel and practical end-to-end unsupervised TSAD when the training data is contaminated with anomalies. The introduced approach, called TSAD-C, is devoid of access to abnormality labels during the training phase. TSAD-C encompasses three core modules: a Decontaminator to rectify anomalies (aka noise) present during training, a Long-range Variable Dependency Modeling module to capture long-term intra- and inter-variable dependencies within the decontaminated data that is considered as a surrogate of the pure normal data, and an Anomaly Scoring module to detect anomalies from all types. Our extensive experiments conducted on four reliable and diverse datasets conclusively demonstrate that TSAD-C surpasses existing methodologies, thus establishing a new state-of-the-art in the TSAD field.
-
TimeAutoDiff: TimeAutoDiff: Combining autoencoder and diffusion model for time series tabular data synthesizing Suh et al. Arxiv '24. [paper] [code]
In this paper, we leverage the power of latent diffusion models to generate synthetic time series tabular data. Along with the temporal and feature correlations, the heterogeneous nature of the feature in the table has been one of the main obstacles in time series tabular data modeling. We tackle this problem by combining the ideas of the variational auto-encoder (VAE) and the denoising diffusion probabilistic model (DDPM). Our model, named TimeAutoDiff, has several key advantages including (1) Generality: the ability to handle the broad spectrum of time series tabular data from single to multi-sequence datasets; (2) Good fidelity and utility guarantees: numerical experiments on six publicly available datasets demonstrating significant improvements over state-of-the-art models in generating time series tabular data, across four metrics measuring fidelity and utility; (3) Fast sampling speed: entire time series data generation as opposed to the sequential data sampling schemes implemented in the existing diffusion-based models, eventually leading to significant improvements in sampling speed; (4) Entity conditional generation: the first implementation of conditional generation of multi-sequence time series tabular data with heterogeneous features in the literature, enabling scenario exploration across multiple scientific and engineering domains. Codes are in preparation for release to the public, but available upon request.
-
TimeDiT: TimeDiT: General-purpose diffusion transformers for time series foundation model Cao et al. ICML 2024 Workshop. [paper]
Time series modeling is critical for many real-world applications, but most existing approaches are task-specific. With the unique characteristics such as missing values, irregular sampling, multi-resolution and complex temporal dependencies, it is challenging to develop general foundation models for time series. In this paper, we introduce the Time Series Diffusion Transformer (TimeDiT) equipped with three distinct masking schemes designed to facilitate a uniform training and inference pipeline across various time series tasks. TimeDiT leverages the transformer architecture for capturing temporal dependencies and employs diffusion processes for generating high-quality candidate samples without stringent assumptions on the target distribution. Extensive experiments conducted on different datasets encompassing tasks such as forecasting, imputation, and anomaly detection demonstrate the model's effectiveness. Both in-domain and zero-shot testing scenarios confirm the potential of our model to serve as a robust foundation model for multiple time series applications.
-
ProDiffAD: ProDiffAD: Progressively distilled diffusion models for multivariate time series anomaly detection in JointCloud environment Tian et al. IJCNN '24. [paper]
Anomaly detection in multivariate time series has emerged as a critical challenge in the time series research community with significant application potentials in various scenarios, ranging from fault diagnosis to system state estimation in Industrial Control Systems (ICSs). Meanwhile, the demand for high availability and extensibility of ICSs necessitates their deployment in the JointCloud environment. Therefore, the performance of the multivariate time series anomaly detection model is expected to be enhanced in the JointCloud environment when encountering dynamic network conditions among multiple clouds. Impressed by the effectiveness of diffusion models in anomaly detection, we have chosen diffusion models for empowering our anomaly detection model. Specifically, we propose Progressively Distilled Diffusion Anomaly Detection model (ProDiffAD) in the JointCloud environment to seek for the balance between effectiveness and efficiency. Moreover, our proposed model is capable of being adaptive with the dynamic network conditions in the JointCloud environment by modeling the intercloud network conditions. To validate the effectiveness and efficiency of our model, comprehensive experiments are conducted on two real and five synthetic datasets. The experimental results demonstrate that our proposed model achieves more accurate and faster multivariate time series anomaly detection in the JointCloud environment under dynamic network conditions compared to state-of-the-art models.
-
TSDE: Self-supervised learning of time series representation via diffusion process and imputation-interpolation-forecasting mask Senane et al. Arxiv '24. [paper] [code]
Time Series Representation Learning (TSRL) focuses on generating informative representations for various Time Series (TS) modeling tasks. Traditional Self-Supervised Learning (SSL) methods in TSRL fall into four main categories: reconstructive, adversarial, contrastive, and predictive, each with a common challenge of sensitivity to noise and intricate data nuances. Recently, diffusion-based methods have shown advanced generative capabilities. However, they primarily target specific application scenarios like imputation and forecasting, leaving a gap in leveraging diffusion models for generic TSRL. Our work, Time Series Diffusion Embedding (TSDE), bridges this gap as the first diffusion-based SSL TSRL approach. TSDE segments TS data into observed and masked parts using an Imputation-Interpolation-Forecasting (IIF) mask. It applies a trainable embedding function, featuring dual-orthogonal Transformer encoders with a crossover mechanism, to the observed part. We train a reverse diffusion process conditioned on the embeddings, designed to predict noise added to the masked part. Extensive experiments demonstrate TSDE's superiority in imputation, interpolation, forecasting, anomaly detection, classification, and clustering. We also conduct an ablation study, present embedding visualizations, and compare inference speed, further substantiating TSDE's efficiency and validity in learning representations of TS data.
-
DDTAD: Anomaly detection for telemetry time series using a denoising diffusion probabilistic model Sui et al. IEEE Sensors Journal '24. [paper]
Efficient anomaly detection in telemetry time series is of great importance to ensure the safety and reliability of spacecraft. However, traditional methods are complicated to train, have a limited ability to maintain details, and do not consider temporal-spatial patterns. These problems make it still a challenge to effectively identify anomalies for multivariate time series. In this article, we propose Denoising Diffusion Time Series Anomaly Detection (DDTAD), an unsupervised reconstruction-based method using a denoising diffusion probabilistic model (DDPM). Our model offers the advantages of training stability, flexibility, and robust high-quality sample generation. We employ 1-D-U-Net architecture to capture both temporal dependencies and intervariable information. We restore the anomalous regions from the noise-corrupted input while preserving the precise features of the normal regions intact. Anomalies are identified as discrepancies between the original time series input and its corresponding reconstruction. Experiments on two public datasets demonstrate that our method outperforms the current dominant data-driven methods and enables the accurate detection of point anomalies, contextual anomalies, and subsequence anomalies.
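The scoring step mentioned in the abstract above (anomalies as discrepancies between the input and its reconstruction) can be sketched as follows. This is only an illustration and not the authors' code: the trained DDPM denoiser is replaced by a hypothetical stand-in, `median_reconstruct` (a sliding median), and the function names, window size, and threshold are all assumptions.

```python
from statistics import median


def median_reconstruct(series, window=5):
    """Hypothetical stand-in for a trained diffusion denoiser.

    Returns a smoothed copy of the series using a centered sliding median,
    so isolated spikes are replaced by values typical of their neighborhood.
    """
    half = window // 2
    out = []
    for i in range(len(series)):
        lo = max(0, i - half)
        hi = min(len(series), i + half + 1)
        out.append(median(series[lo:hi]))
    return out


def anomaly_scores(series, reconstruct=median_reconstruct):
    """Point-wise score: absolute gap between the input and its reconstruction."""
    recon = reconstruct(series)
    return [abs(x - r) for x, r in zip(series, recon)]


def flag_anomalies(scores, threshold):
    """Indices whose score exceeds a user-chosen threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]


# Toy univariate signal with a single spike at index 4.
signal = [1.0, 1.1, 0.9, 1.0, 8.0, 1.0, 1.1, 0.9, 1.0]
scores = anomaly_scores(signal)
flagged = flag_anomalies(scores, threshold=3.0)
print(flagged)
```

In a real pipeline the `reconstruct` argument would wrap the noising and denoising passes of a trained model; only the comparison between input and reconstruction is shown here.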
-
TimeADDM: Unsupervised anomaly detection for multivariate time series using diffusion model Hu et al. ICASSP '24. [paper] [code]
Unsupervised anomaly detection for multivariate time series (MTS) is a challenging task due to the difficulties of precisely learning the complex data patterns of MTS. The recent progress in sample generation achieved by diffusion models (DMs) motivates us to leverage the powerful learning ability of DMs to make a breakthrough in unsupervised anomaly detection for MTS. In this paper, we make the first attempt to design a novel diffusion-based anomaly detection model (named TimeADDM) for MTS data using the effective learning mechanism of DMs. To enhance the learning effect on MTS data, we propose to apply diffusion steps to the representations that accumulate the global time correlations through recurrent embedding. To enable the model for accurate anomaly detection, we design a reconstruction strategy that uses various levels of diffusion to compute the anomaly scores from different angles. By comparing TimeADDM with the state-of-the-art benchmarks, the results demonstrate that TimeADDM outperforms all baselines in terms of detection accuracy in four real-world MTS datasets and makes an improvement on the F1 score by up to 22%. The codes of the experiments with datasets and our algorithms are available at https://github.com/Hurongyao/TIMEADDM.
-
Generative inpainting for shapley-value-based anomaly explanation Tritscher et al. xAI '24. [paper] [code]
Feature relevance explanations currently constitute the most used type of explanation in anomaly detection related tasks such as cyber security and fraud detection. Recent works have underscored the importance of optimizing hyperparameters of post-hoc explainers which show a large impact on the resulting explanation quality. In this work, we propose a new method to set the hyperparameter of replacement values within Shapley-value-based post-hoc explainers. Our method leverages ideas from the domain of generative image inpainting, where generative machine learning models are used to replace parts of a given input image. We show that these generative models can also be applied to tabular replacement value generation for Shapley-value-based feature relevance explainers. Experimentally, we train a denoising diffusion probabilistic model for generative inpainting on two tabular anomaly detection datasets from the domains of network intrusion detection and occupational fraud detection, and integrate the generative inpainting model into the SHAP explanation framework. We empirically show that generative inpainting may be used to achieve consistently strong explanation quality when explaining different anomaly detectors on tabular data.
-
NGLS-Diff: Diffusion model in normal gathering latent space for time series anomaly detection Bifet et al. ECML PKDD '24. [paper] [code]
Generative models have been widely used in time series anomaly detection, effectively identifying abnormal states within the data. Among these, diffusion models stand out for their powerful generative capabilities and have been increasingly applied to anomaly detection tasks, showcasing advantages in handling complex time series data. However, existing approaches employ diffusion models directly in the numerical space, which leads to several limitations, particularly in failing to reconstruct normal time series. To address these issues, we propose NGLS-Diff, an innovative approach that uses a diffusion model within a normal gathering latent space to enhance anomaly detection capabilities. This method introduces a novel latent space that captures the distributions of normal temporal patterns, thus rectifying the drawbacks of previous diffusion models. By operating the diffusion process in the normal gathering latent space, our approach significantly enhances the model's ability to detect anomalies within normal time series data. Extensive experiments conducted on four real-world datasets demonstrate the significant performance improvements of our NGLS-Diff compared to various methods, validating its effectiveness in time series anomaly detection.
-
Dynamic Splitting: Dynamic splitting of diffusion models for multivariate time series anomaly detection in a JointCloud environment Cao et al. KSEM 2024. [paper]
Speeding up multivariate time series anomaly detection models in the JointCloud environment is a challenging task. Typically, anomaly detection models with high computational requirements are offloaded to cross-cloud nodes with abundant computational resources, which can accelerate the inference procedure. However, this cross-cloud task offloading is hampered by the instability of network conditions in the JointCloud environment. Meanwhile, compressing the anomaly detection model to a size suitable for being deployed onto a cloud node close to the application scenario may lead to dramatic performance loss. To overcome these challenges, dynamic splitting anomaly detection models emerge as a promising solution by utilizing the computational power of various cross-cloud nodes to reduce the computational cost while maintaining their performance.
2023
-
D3R: Drift doesn't matter: Dynamic decomposition with diffusion reconstruction for unstable multivariate time series anomaly detection Wang et al. NeurIPS '23. [paper] [code]
Many unsupervised methods have recently been proposed for multivariate time series anomaly detection. However, existing works mainly focus on stable data yet often omit the drift generated from non-stationary environments, which may lead to numerous false alarms. We propose **D**ynamic **D**ecomposition with **D**iffusion **R**econstruction (D3R), a novel anomaly detection network for real-world unstable data to fill the gap. D3R tackles the drift via decomposition and reconstruction. In the decomposition procedure, we utilize data-time mix-attention to dynamically decompose long-period multivariate time series, overcoming the limitation of the local sliding window. The information bottleneck is critical yet difficult to determine in the reconstruction procedure. To avoid retraining once the bottleneck changes, we control it externally by noise diffusion and directly reconstruct the polluted data. The whole model can be trained end-to-end. Extensive experiments on various real-world datasets demonstrate that D3R significantly outperforms existing methods, with an 11% average relative improvement over the previous SOTA models.
-
Time series anomaly detection using diffusion-based models Pintilie et al. ICDMW '23. [paper] [code]
Diffusion models have been recently used for anomaly detection (AD) in images. In this paper we investigate whether they can also be leveraged for AD on multivariate time series (MTS). We test two diffusion-based models and compare them to several strong neural baselines. We also extend the PA%K protocol, by computing a ROCK-AUC metric, which is agnostic to both the detection threshold and the ratio K of correctly detected points. Our models outperform the baselines on synthetic datasets and are competitive on real-world datasets, illustrating the potential of diffusion-based methods for AD in multivariate time series.
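The PA%K protocol referred to above can be illustrated with a short sketch. It assumes the usual reading of PA%K: predictions inside a ground-truth anomaly segment are all set to 1 only when more than K percent of that segment's points were already detected. The function names and the strict comparison are assumptions made for illustration, not the paper's implementation, and the threshold-agnostic ROCK-AUC extension is not reproduced here.

```python
def segments(labels):
    """Yield (start, end) index pairs (end exclusive) of contiguous runs of 1s."""
    start = None
    for i, y in enumerate(labels):
        if y == 1 and start is None:
            start = i
        elif y != 1 and start is not None:
            yield start, i
            start = None
    if start is not None:
        yield start, len(labels)


def pa_k(preds, labels, k):
    """Point-adjust predictions: fill a true anomaly segment with 1s only if
    more than k percent of its points were already predicted anomalous."""
    adjusted = list(preds)
    for start, end in segments(labels):
        hit_ratio = sum(preds[start:end]) / (end - start)
        if hit_ratio * 100 > k:
            adjusted[start:end] = [1] * (end - start)
    return adjusted


def f1_score(preds, labels):
    """Point-wise F1 computed from binary predictions and labels."""
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


labels = [0, 1, 1, 1, 1, 0, 0, 1, 1, 0]
preds = [0, 1, 0, 0, 0, 0, 1, 0, 0, 0]

# The first segment has 1 of 4 points detected (25%); the second has none.
lenient = pa_k(preds, labels, k=20)  # 25% > 20%, so the first segment is filled
strict = pa_k(preds, labels, k=50)   # 25% <= 50%, so nothing is adjusted
print(f1_score(lenient, labels), f1_score(strict, labels))
```

Sweeping `k` from 0 to 100 shows how strongly a reported F1 depends on the adjustment, which is the sensitivity a K-agnostic metric is meant to remove.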
-
Diffusion+: Diffusion-based time series data imputation for cloud failure prediction at Microsoft 365 Yang et al. ESEC/FSE '23. [paper]
Ensuring reliability in large-scale cloud systems like Microsoft 365 is crucial. Cloud failures, such as disk and node failure, threaten service reliability, causing service interruptions and financial loss. Existing works focus on failure prediction and proactively taking action before failures happen. However, they suffer from poor data quality, like data missing in model training and prediction, which limits performance. In this paper, we focus on enhancing data quality through data imputation by the proposed Diffusion+, a sample-efficient diffusion model, to impute the missing data efficiently conditioned on the observed data. Experiments with industrial datasets and application practice show that our model contributes to improving the performance of downstream failure prediction.
-
FinDiff: FinDiff: Diffusion Models for Financial Tabular Data Generation Sattarov et al. ICAIF '23. [paper] [code]
The sharing of microdata, such as fund holdings and derivative instruments, by regulatory institutions presents a unique challenge due to strict data confidentiality and privacy regulations. These challenges often hinder the ability of both academics and practitioners to conduct collaborative research effectively. The emergence of generative models, particularly diffusion models, capable of synthesizing data mimicking the underlying distributions of real-world data presents a compelling solution. This work introduces Financial Tabular Diffusion (FinDiff), a diffusion model designed to generate real-world mixed-type financial tabular data for a variety of downstream tasks, for example, economic scenario modeling, stress tests, and fraud detection. The model uses embedding encodings to model mixed modality financial data, comprising both categorical and numeric attributes. The performance of FinDiff in generating synthetic tabular financial data is evaluated against state-of-the-art baseline models using three real-world financial datasets (including two publicly available datasets and one proprietary dataset). Empirical results demonstrate that FinDiff excels in generating synthetic tabular financial data with high fidelity, privacy, and utility.
-
DDMT: DDMT: Denoising diffusion mask transformer models for multivariate time series anomaly detection Yang et al. Arxiv '23. [paper] [code]
Anomaly detection in multivariate time series has emerged as a crucial challenge in time series research, with significant research implications in various fields such as fraud detection, fault diagnosis, and system state estimation. Reconstruction-based models have shown promising potential in recent years for detecting anomalies in time series data. However, due to the rapid increase in data scale and dimensionality, the issues of noise and Weak Identity Mapping (WIM) during time series reconstruction have become increasingly pronounced. To address this, we introduce a novel Adaptive Dynamic Neighbor Mask (ADNM) mechanism and integrate it with the Transformer and Denoising Diffusion Model, creating a new framework for multivariate time series anomaly detection, named Denoising Diffusion Mask Transformer (DDMT). The ADNM module is introduced to mitigate information leakage between input and output features during data reconstruction, thereby alleviating the problem of WIM during reconstruction. The Denoising Diffusion Transformer (DDT) employs the Transformer as an internal neural network structure for Denoising Diffusion Model. It learns the stepwise generation process of time series data to model the probability distribution of the data, capturing normal data patterns and progressively restoring time series data by removing noise, resulting in a clear recovery of anomalies. To the best of our knowledge, this is the first model that combines Denoising Diffusion Model and the Transformer for multivariate time series anomaly detection. Experimental evaluations were conducted on five publicly available multivariate time series anomaly detection datasets. The results demonstrate that the model effectively identifies anomalies in time series data, achieving state-of-the-art performance in anomaly detection.
-
NetDiffus: NetDiffus: Network traffic generation by diffusion models through time-series imaging Sivaroopan et al. Arxiv '23. [paper]
Network data analytics are now at the core of almost every networking solution. Nonetheless, limited access to networking data has been an enduring challenge due to many reasons including complexity of modern networks, commercial sensitivity, privacy and regulatory constraints. In this work, we explore how to leverage recent advancements in Diffusion Models (DM) to generate synthetic network traffic data. We develop an end-to-end framework - NetDiffus that first converts one-dimensional time-series network traffic into two-dimensional images, and then synthesizes representative images for the original data. We demonstrate that NetDiffus outperforms the state-of-the-art traffic generation methods based on Generative Adversarial Networks (GANs) by providing 66.4% increase in fidelity of the generated data and 18.1% increase in downstream machine learning tasks. We evaluate NetDiffus on seven diverse traffic traces and show that utilizing synthetic data significantly improves traffic fingerprinting, anomaly detection and traffic classification.
-
SaSDim: SaSDim: Self-adaptive noise scaling diffusion model for spatial time series imputation Zhang et al. Arxiv '23. [paper]
Spatial time series imputation is critically important to many real applications such as intelligent transportation and air quality monitoring. Although recent transformer and diffusion model based approaches have achieved significant performance gains compared with conventional statistics-based methods, spatial time series imputation remains a challenging issue due to the complex spatio-temporal dependencies and the noise uncertainty of the spatial time series data. In particular, recent diffusion process based models may introduce random noise to the imputations, and thus negatively impact model performance. To this end, we propose a self-adaptive noise scaling diffusion model named SaSDim to more effectively perform spatial time series imputation. Specifically, we propose a new loss function that can scale the noise to a similar intensity, and propose an across spatial-temporal global convolution module to more effectively capture the dynamic spatial-temporal dependencies. Extensive experiments conducted on three real-world datasets verify the effectiveness of SaSDim by comparison with current state-of-the-art baselines.
-
DiffAD: Imputation-based time-series anomaly detection with conditional weight-incremental diffusion models Xiao et al. KDD '23. [paper] [code]
Existing anomaly detection models for time series are primarily trained with normal-point-dominant data and would become ineffective when anomalous points intensively occur in certain episodes. To solve this problem, we propose a new approach, called DiffAD, from the perspective of time series imputation. Unlike previous prediction- and reconstruction-based methods that adopt either partial or complete data as observed values for estimation, DiffAD uses a density ratio-based strategy to select normal observations flexibly that can easily adapt to the anomaly concentration scenarios. To alleviate the model bias problem in the presence of anomaly concentration, we design a new denoising diffusion-based imputation method to enhance the imputation performance of missing values with conditional weight-incremental diffusion, which can preserve the information of observed values and substantially improves data generation quality for stable anomaly detection. Besides, we customize a multi-scale state space model to capture the long-term dependencies across episodes with different anomaly patterns. Extensive experimental results on real-world datasets show that DiffAD performs better than state-of-the-art benchmarks.
-
TabADM: TabADM: Unsupervised tabular anomaly detection with diffusion models Zamberg et al. Arxiv '23. [paper]
Tables are an abundant form of data with use cases across all scientific fields. Real-world datasets often contain anomalous samples that can negatively affect downstream analysis. In this work, we only assume access to contaminated data and present a diffusion-based probabilistic model effective for unsupervised anomaly detection. Our model is trained to learn the density of normal samples by utilizing a unique rejection scheme to attenuate the influence of anomalies on the density estimation. At inference, we identify anomalies as samples in low-density regions. We use real data to demonstrate that our method improves detection capabilities over baselines. Furthermore, our method is relatively stable to the dimension of the data and does not require extensive hyperparameter tuning.
-
CoDi: CoDi: Co-evolving contrastive diffusion models for mixed-type tabular synthesis Lee et al. ICML '23. [paper] [code]
With growing attention to tabular data these days, the attempt to apply a synthetic table to various tasks has been expanded toward various scenarios. Owing to the recent advances in generative modeling, fake data generated by tabular data synthesis models become sophisticated and realistic. However, there still exists a difficulty in modeling discrete variables (columns) of tabular data. In this work, we propose to process continuous and discrete variables separately (but being conditioned on each other) by two diffusion models. The two diffusion models are co-evolved during training by reading conditions from each other. In order to further bind the diffusion models, moreover, we introduce a contrastive learning method with a negative sampling method. In our experiments with 11 real-world tabular datasets and 8 baseline methods, we prove the efficacy of the proposed method, called CoDi. Our code is available at https://github.com/ChaejeongLee/CoDi.
-
SSSD: Diffusion-based time series imputation and forecasting with structured state space models Alcaraz et al. Arxiv '23. [paper]
The imputation of missing values represents a significant obstacle for many real-world data analysis pipelines. Here, we focus on time series data and put forward SSSD, an imputation model that relies on two emerging technologies, (conditional) diffusion models as state-of-the-art generative models and structured state space models as internal model architecture, which are particularly suited to capture long-term dependencies in time series data. We demonstrate that SSSD matches or even exceeds state-of-the-art probabilistic imputation and forecasting performance on a broad range of data sets and different missingness scenarios, including the challenging blackout-missing scenarios, where prior approaches failed to provide meaningful results.
2025
-
MoVideo: MoVideo: Motion-Aware Video Generation with Diffusion Model Leonardis et al. ECCV '24. [paper] [code]
While recent years have witnessed great progress on using diffusion models for video generation, most of them are simple extensions of image generation frameworks, which fail to explicitly consider one of the key differences between videos and images, i.e., motion. In this paper, we propose a novel motion-aware video generation (MoVideo) framework that takes motion into consideration from two aspects: video depth and optical flow. The former regulates motion by per-frame object distances and spatial layouts, while the latter describes motion by cross-frame correspondences that help in preserving fine details and improving temporal consistency. More specifically, given a key frame that exists or is generated from text prompts, we first design a diffusion model with spatio-temporal modules to generate the video depth and the corresponding optical flows. Then, the video is generated in the latent space by another spatio-temporal diffusion model under the guidance of depth, optical flow-based warped latent video and the calculated occlusion mask. Lastly, we use optical flows again to align and refine different frames for better video decoding from the latent space to the pixel space. In experiments, MoVideo achieves state-of-the-art results in both text-to-video and image-to-video generation, showing promising prompt consistency, frame consistency and visual quality.
2024
-
Diffusion-based normality pre-training for weakly supervised video anomaly detection Basak et al. ESWA '24. [paper]
Weakly supervised video anomaly detection is the task of detecting anomalous frames in videos where no frame-level labels are provided at training phase. Previous methods usually employed a multiple instance learning (MIL)-based ranking loss to ensure inter-class separation. However, these methods are unable to completely utilise the information from the huge amounts of normal frames. Moreover, the performance of these methods is misguided by the erroneous initial prediction of the MIL-based classifier. Taking these shortcomings into consideration, we propose a diffusion-based normality learning pretrain step, which first involves training a Global-Local Feature Encoder (GLFE) model with only normal videos to understand the feature distribution of normal frames. The resulting pre-trained Global-Local feature encoder is further optimised using Multi-Sequence Contrastive loss using both normal and anomalous videos. Our proposed GLFE model captures long- and short-range temporal features using a Transformer block and pyramid of dilated convolutions in a two-branch setup. The model adaptively learns the relation between the two branch features by introducing the Co-Attention module, which provides a learnable fusion of features. Additionally, we introduced a triplet contrastive loss to provide better separation between abnormal and normal frames in anomalous videos. The developed methodology is evaluated through extensive experiments on two public benchmark datasets (UCF-Crime and ShanghaiTech). The results obtained are comparable to or better than the existing state-of-the-art weakly supervised methods.
-
VADiffusion: VADiffusion: Compressed domain information guided conditional diffusion for video anomaly detection Liu et al. IEEE TCSVT '24. [paper] [code]
The demand for security surveillance has grown exponentially, making video anomaly detection particularly crucial. Existing image-domain based anomaly detection algorithms face implementation challenges due to several drawbacks, including latency during long-distance transmission, the need for complete decoding, and the complexity of network inference structures. Moreover, current frame prediction methods using generative models suffer from low prediction quality and mode collapse. To tackle these challenges, we propose VADiffusion, a compressed domain information guided conditional diffusion framework. VADiffusion adopts a dual-branch structure that combines motion vector reconstruction and I-frame prediction, effectively addressing the limitations of the reconstruction method in identifying sudden anomalies and the struggles of the frame prediction method in detecting persistent anomalies. Furthermore, our proposed framework incorporates the diffusion model into the realm of video anomaly detection, thereby improving the stability and accuracy of the model. Specifically, we employ sparse sampling of the compressed video, utilizing I-frames to capture appearance information and motion vectors to represent motion-related details. Different from the existing independent two-branch mechanism, we adopt a reconstruction-assisted prediction strategy, leveraging I-frames and the reconstructed motion vectors from the reconstruction branch as conditions for the diffusion model utilized in frame prediction. Ultimately, we perform decision fusion of reconstruction and prediction branches to determine anomalies. Through extensive experiments, we demonstrate that our algorithm achieves an effective trade-off between detection accuracy and model complexity. The source code is publicly released at https://github.com/LHaoooo/VADiffusion.
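The decision-fusion step at the end of this abstract can be pictured with a short sketch. It assumes each branch produces one error value per frame, rescales both with min-max normalization, and combines them with a weighted sum. The function names, the normalization, and the `weight` parameter are illustrative assumptions, not the VADiffusion implementation.

```python
def min_max(scores):
    """Rescale a list of scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_scores(recon_errors, pred_errors, weight=0.5):
    """Weighted sum of normalized per-frame errors from two branches.

    `weight` (hypothetical) is the share given to the reconstruction branch.
    """
    if len(recon_errors) != len(pred_errors):
        raise ValueError("both branches must score the same frames")
    r, p = min_max(recon_errors), min_max(pred_errors)
    return [weight * a + (1.0 - weight) * b for a, b in zip(r, p)]


# Three frames: the last one is flagged by both branches.
fused = fuse_scores([1, 2, 3], [10, 10, 30])
print(fused)  # [0.0, 0.25, 1.0]
```

Normalizing before fusing keeps one branch from dominating only because its errors live on a larger numeric scale.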
-
GiCiSAD: Graph-jigsaw conditioned diffusion model for skeleton-based video anomaly detection Karami et al. Arxiv '24. [paper]
Skeleton-based video anomaly detection (SVAD) is a crucial task in computer vision. Accurately identifying abnormal patterns or events enables operators to promptly detect suspicious activities, thereby enhancing safety. Achieving this demands a comprehensive understanding of human motions, both at body and region levels, while also accounting for the wide variations of performing a single action. However, existing studies fail to simultaneously address these crucial properties. This paper introduces a novel, practical and lightweight framework, namely Graph-Jigsaw Conditioned Diffusion Model for Skeleton-based Video Anomaly Detection (GiCiSAD) to overcome the challenges associated with SVAD. GiCiSAD consists of three novel modules: the Graph Attention-based Forecasting module to capture the spatio-temporal dependencies inherent in the data, the Graph-level Jigsaw Puzzle Maker module to distinguish subtle region-level discrepancies between normal and abnormal motions, and the Graph-based Conditional Diffusion model to generate a wide spectrum of human motions. Extensive experiments on four widely used skeleton-based video datasets show that GiCiSAD outperforms existing methods with significantly fewer training parameters, establishing it as the new state-of-the-art.
-
DHVAD: Denoising Diffusion-Augmented Hybrid Video Anomaly Detection via Reconstructing Noised Frames Cheng et al. IJCAI '24. [paper]
Video Anomaly Detection (VAD) is crucial for enhancing security and surveillance systems through automatic identification of irregular events, thereby enabling timely responses and augmenting overall situational awareness. Although existing methods have achieved decent detection performances on benchmarks, their predicted objects still remain ambiguous in terms of the semantic aspect. To overcome this limitation, we propose the Denoising diffusion-augmented Hybrid Video Anomaly Detection (DHVAD) framework. The proposed Denoising diffusion-based Reconstruction Unit (DRU) enhances the understanding of semantically accurate normality as a crucial component in DHVAD. Meanwhile, we propose a detection strategy that integrates the advantages of a prediction-based Frame Prediction Unit (FPU) with DRU by exploring the spatial-temporal consistency seamlessly. The competitive performance of DHVAD compared with state-of-the-art methods on three benchmark datasets proves the effectiveness of our framework. The extended experimental analysis demonstrates that our framework can gain a better understanding of the normality in terms of semantic accuracy for VAD and efficiently leverage the strengths of both components.
-
DiffVAD: Safeguarding Sustainable Cities: Unsupervised Video Anomaly Detection through Diffusion-based Latent Pattern Learning Zhang et al. IJCAI '24. [paper]
Sustainable cities require high-quality community management and surveillance analytics, which are supported by video anomaly detection techniques. However, mainstream video anomaly detection techniques still require manually labeled data and do not apply to real-world massive videos. Without labeling, unsupervised video anomaly detection (UVAD) is challenged by the problem of pseudo-labeled noise and the openness of anomaly detection. In response, a diffusion-based latent pattern learning UVAD framework is proposed, called DiffVAD. The method learns potential patterns by generating different patterns of the same event through diffusion models. The detection of anomalies is realized by evaluating the pattern distribution. The different patterns of normal events are diverse but correlated, while the different patterns of abnormal events are more diffuse. This manner of detection is equally effective for unseen normal events in the training set. In addition, we design a refinement strategy for pseudo-labels to mitigate the effects of the noise problem. Extensive experiments on six benchmark datasets demonstrate the design's promising generalization ability and high efficiency. Specifically, DiffVAD obtains an AUC score of 81.9% on the ShanghaiTech dataset.
-
Unsupervised conditional diffusion models in video anomaly detection for monitoring dust pollution Cai et al. Sensors '24. [paper]
Video surveillance is widely used in monitoring environmental pollution, particularly harmful dust. Currently, manual video monitoring remains the predominant method for analyzing potential pollution, which is inefficient and prone to errors. In this paper, we introduce a new unsupervised method based on latent diffusion models. Specifically, we propose a spatio-temporal network structure, which better integrates the spatial and temporal features of videos. Our conditional guidance mechanism samples frames of input videos to guide high-quality generation and obtains frame-level anomaly scores, comparing generated videos with original ones. We also propose an efficient compression strategy to reduce computational costs, allowing the model to perform in a latent space. The superiority of our method was demonstrated by numerical experiments in three public benchmarks and practical application analysis in coal mining over previous SOTA methods with better AUC, of at most over 3%. Our method accurately detects abnormal patterns in multiple challenging environmental monitoring scenarios, illustrating the potential application possibilities in the environmental protection domain and beyond.
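Frame-level AUC, the metric quoted in this and several other entries, can be computed directly from per-frame anomaly scores and binary labels. The sketch below uses the pairwise (Mann-Whitney) formulation in plain Python. `roc_auc` is a hypothetical helper written for illustration, not code from any of the listed papers.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for anomalous frames, 0 for normal frames.
    scores: higher means more anomalous.
    Equals the probability that a random anomalous frame outscores a random
    normal frame, with ties counted as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one frame of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The quadratic loop is fine for a short clip; for long videos a rank-based implementation from an established library is the more practical choice.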
-
FDAE: Flow-guided diffusion autoencoder for unsupervised video anomaly detection Liu et al. PRCV '23. [paper]
Video anomaly detection (VAD) aims to automatically detect abnormalities that deviate from expected behaviors. Due to the heavy reliance of mainstream one-class methods on labeled normal samples, some unsupervised VAD methods have emerged. However, these methods are unable to detect both appearance and motion anomalies in videos comprehensively. To address the above problem, we present for the first time a Flow-guided Diffusion AutoEncoder (FDAE) that generates objects of each frame to detect anomalies in an unsupervised manner. Our model takes foreground objects and motion information as inputs to train a conditional diffusion autoencoder for foreground reconstruction. To make our model concentrate on learning normal samples, we further design a sample refinement scheme and introduce a mixed Gaussian clustering network to enhance the capability of the diffusion model in capturing typical characteristics of normal samples. Comprehensive experiments on three publicly available datasets demonstrate that the proposed FDAE outperforms all competing unsupervised approaches.
2023
-
Stable video diffusion: Scaling latent video diffusion models to large datasets Blattmann et al. Arxiv '23. [paper] [code]
We present Stable Video Diffusion - a latent video diffusion model for high-resolution, state-of-the-art text-to-video and image-to-video generation. Recently, latent diffusion models trained for 2D image synthesis have been turned into generative video models by inserting temporal layers and finetuning them on small, high-quality video datasets. However, training methods in the literature vary widely, and the field has yet to agree on a unified strategy for curating video data. In this paper, we identify and evaluate three different stages for successful training of video LDMs: text-to-image pretraining, video pretraining, and high-quality video finetuning. Furthermore, we demonstrate the necessity of a well-curated pretraining dataset for generating high-quality videos and present a systematic curation process to train a strong base model, including captioning and filtering strategies. We then explore the impact of finetuning our base model on high-quality data and train a text-to-video model that is competitive with closed-source video generation. We also show that our base model provides a powerful motion representation for downstream tasks such as image-to-video generation and adaptability to camera motion-specific LoRA modules. Finally, we demonstrate that our model provides a strong multi-view 3D-prior and can serve as a base to finetune a multi-view diffusion model that jointly generates multiple views of objects in a feedforward fashion, outperforming image-based methods at a fraction of their compute budget. We release code and model weights at https://github.com/Stability-AI/generative-models .
-
FPDM: Feature prediction diffusion model for video anomaly detection Yan et al. ICCV '23. [paper]
Anomaly detection in the video is an important research area and a challenging task in real applications. Due to the unavailability of large-scale annotated anomaly events, most existing video anomaly detection (VAD) methods focus on learning the distribution of normal samples to detect the substantially deviated samples as anomalies. To well learn the distribution of normal motion and appearance, many auxiliary networks are employed to extract foreground object or action information. These high-level semantic features effectively filter the noise from the background to decrease its influence on detection models. However, the capability of these extra semantic models heavily affects the performance of the VAD methods. Motivated by the impressive generative and anti-noise capacity of diffusion model (DM), in this work, we introduce a novel DM-based method to predict the features of video frames for anomaly detection. We aim to learn the distribution of normal samples without any extra high-level semantic feature extraction models involved. To this end, we build two denoising diffusion implicit modules to predict and refine the features. The first module concentrates on feature motion learning, while the last focuses on feature appearance learning. To the best of our knowledge, it is the first DM-based method to predict frame features for VAD. The strong capacity of DMs also enables our method to more accurately predict the normal features than non-DM-based feature prediction-based VAD methods. Extensive experiments show that the proposed approach substantially outperforms state-of-the-art competing methods. The code is available at FPDM.
-
MoCoDAD: Multimodal Motion Conditioned Diffusion Model for Skeleton-based Video Anomaly Detection Flaborea et al. ICCV '23. [paper] [code]
Anomalies are rare and anomaly detection is often therefore framed as One-Class Classification (OCC), i.e. trained solely on normalcy. Leading OCC techniques constrain the latent representations of normal motions to limited volumes and detect as abnormal anything outside, which accounts satisfactorily for the openset'ness of anomalies. But normalcy shares the same openset'ness property since humans can perform the same action in several ways, which the leading techniques neglect. We propose a novel generative model for video anomaly detection (VAD), which assumes that both normality and abnormality are multimodal. We consider skeletal representations and leverage state-of-the-art diffusion probabilistic models to generate multimodal future human poses. We contribute a novel conditioning on the past motion of people and exploit the improved mode coverage capabilities of diffusion processes to generate different-but-plausible future motions. Upon the statistical aggregation of future modes, an anomaly is detected when the generated set of motions is not pertinent to the actual future. We validate our model on 4 established benchmarks: UBnormal, HR-UBnormal, HR-STC, and HR-Avenue, with extensive experiments surpassing state-of-the-art results.
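The scoring idea in this abstract, comparing a set of generated futures with the observed one and aggregating the result, can be sketched as follows. The flat-list motion encoding, the RMSE distance, and the choice of `min` or `statistics.median` as the aggregator are assumptions made for illustration; the paper's exact aggregation may differ.

```python
import math
import statistics


def rmse(a, b):
    """Root-mean-square error between two equally long flat motion vectors."""
    if len(a) != len(b):
        raise ValueError("motions must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def anomaly_score(generated_futures, actual_future, aggregate=min):
    """Aggregate the distances between each generated future and the real one.

    With aggregate=min, the score is low as soon as any one plausible future
    matches what actually happened.
    """
    distances = [rmse(g, actual_future) for g in generated_futures]
    return aggregate(distances)


# Three hypothetical generated futures (toy 2-D vectors, not real skeletons).
futures = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
print(anomaly_score(futures, [1.0, 1.0]))                     # 0.0
print(anomaly_score(futures, [5.0, 5.0]))                     # 3.0
print(anomaly_score(futures, [5.0, 5.0], statistics.median))  # 4.0
```

Taking the minimum rewards mode coverage: a normal action performed in an unusual but plausible way still scores low if one generated mode captures it.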
-
Exploring diffusion models for unsupervised video anomaly detection Tur et al. ICIP '23. [paper]
This paper investigates the performance of diffusion models for video anomaly detection (VAD) within the most challenging but also the most operational scenario in which the data annotations are not used. As being sparse, diverse, contextual, and often ambiguous, detecting abnormal events precisely is a very ambitious task. To this end, we rely only on the information-rich spatio-temporal data, and the reconstruction power of the diffusion models such that a high reconstruction error is utilized to decide the abnormality. Experiments performed on two large-scale video anomaly detection datasets demonstrate the consistent improvement of the proposed method over the state-of-the-art generative models while in some cases our method achieves better scores than the more complex models. This is the first study using a diffusion model and examining its parameters' influence to present guidance for VAD in surveillance scenarios.
-
Ensemble anomaly score for video anomaly detection using denoise diffusion model and motion filters Wang et al. Neurocomputing '23. [paper]
Video anomaly detection is a crucial task that aims to differentiate between normal and abnormal events. The current mainstream approach involves constructing an anomaly score based on the reconstruction error from a prediction model trained on normal frame sequences. However, this approach is limited by its deterministic nature, which may cause the anomaly score to be sensitive to underlying noise in the video. To address this limitation, this paper proposes an ensemble anomaly score constructed using a series of stochastic reconstructions of the original prediction. Specifically, we introduce the denoise diffusion model as a perturbation-denoise tool. First, the original prediction undergoes a perturbation process through a diffusion process. Then, a denoise diffusion model trained on normal predictions is used to directly reconstruct a series of noise-free predictions from the perturbed versions with different noise levels. Finally, an ensemble of all the reconstruction errors is used to provide a more generic and regularized anomaly score. Furthermore, we introduce motion filters into the detection pipeline to improve the modeling accuracy of the image distribution. The proposed method is evaluated on public datasets, and experimental results demonstrate its effectiveness, particularly in detecting performance under out-of-distribution (OOD) conditions.
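The perturb-then-denoise ensemble described here can be illustrated with a toy sketch. It perturbs an input with the standard forward diffusion step x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε at several noise levels, reconstructs with a denoiser, and averages the squared errors. `toy_denoiser` is a hypothetical stand-in that assumes normal data follow N(0, I), and the mean-squared error and plain averaging are assumptions, not the paper's trained model or exact scoring rule.

```python
import math
import random


def forward_diffuse(x0, alpha_bar, rng):
    """Sample x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    scale, sigma = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [scale * v + sigma * rng.gauss(0.0, 1.0) for v in x0]


def toy_denoiser(x_t, alpha_bar):
    """Stand-in denoiser: if x0 ~ N(0, I), then E[x0 | x_t] = sqrt(alpha_bar) * x_t."""
    scale = math.sqrt(alpha_bar)
    return [scale * v for v in x_t]


def ensemble_anomaly_score(x0, denoiser, alpha_bars, n_samples, rng):
    """Mean squared reconstruction error over noise levels and random draws."""
    errors = []
    for alpha_bar in alpha_bars:
        for _ in range(n_samples):
            x_t = forward_diffuse(x0, alpha_bar, rng)
            x_hat = denoiser(x_t, alpha_bar)
            errors.append(sum((a - b) ** 2 for a, b in zip(x0, x_hat)) / len(x0))
    return sum(errors) / len(errors)


rng = random.Random(0)
levels = [0.9, 0.7, 0.5]  # hypothetical cumulative alpha values
print(ensemble_anomaly_score([0.0] * 8, toy_denoiser, levels, 5, rng))
print(ensemble_anomaly_score([6.0] * 8, toy_denoiser, levels, 5, rng))
```

The second input lies far from the assumed normal distribution, so the denoiser pulls it back toward the origin and its averaged error comes out larger. Averaging over several stochastic reconstructions is what makes the score less sensitive to any single noise draw.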
-
Masked Diffusion: Masked diffusion with task-awareness for procedure planning in instructional videos Fang et al. ICCV '23. [paper] [code]
A key challenge with procedure planning in instructional videos lies in how to handle a large decision space consisting of a multitude of action types that belong to various tasks. To understand real-world video content, an AI agent must proficiently discern these action types (e.g., pour milk, pour water, open lid, close lid, etc.) based on brief visual observation. Moreover, it must adeptly capture the intricate semantic relation of the action types and task goals, along with the variable action sequences. Recently, notable progress has been made via the integration of diffusion models and visual representation learning to address the challenge. However, existing models employ rudimentary mechanisms to utilize task information to manage the decision space. To overcome this limitation, we introduce a simple yet effective enhancement - a masked diffusion model. The introduced mask acts akin to a task-oriented attention filter, enabling the diffusion/denoising process to concentrate on a subset of action types. Furthermore, to bolster the accuracy of task classification, we harness more potent visual representation learning techniques. In particular, we learn a joint visual-text embedding, where a text embedding is generated by prompting a pre-trained vision-language model to focus on human actions. We evaluate the method on three public datasets and achieve state-of-the-art performance on multiple metrics. Code is available at https://github.com/ffzzy840304/Masked-PDPP.
-
Unsupervised video anomaly detection with diffusion models conditioned on compact motion representations Tur et al. ICIAP '23. [paper]
This paper aims to address the unsupervised video anomaly detection (VAD) problem, which involves classifying each frame in a video as normal or abnormal, without any access to labels. To accomplish this, the proposed method employs conditional diffusion models, where the input data is the spatiotemporal features extracted from a pre-trained network, and the condition is the features extracted from compact motion representations that summarize a given video segment in terms of its motion and appearance. Our method utilizes a data-driven threshold and considers a high reconstruction error as an indicator of anomalous events. This study is the first to utilize compact motion representations for VAD and the experiments conducted on two large-scale VAD benchmarks demonstrate that they supply relevant information to the diffusion model, and consequently improve VAD performances w.r.t. the prior art. Importantly, our method exhibits better generalization performance across different datasets, notably outperforming both the state-of-the-art and baseline methods. The code of our method is available.
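A data-driven threshold on reconstruction error can take many forms. One common choice, shown here purely as an assumption since the abstract does not specify the rule, is the mean plus `k` standard deviations of the observed errors. The function names and `k` are hypothetical.

```python
import statistics


def data_driven_threshold(errors, k=1.0):
    """Threshold = mean + k * population standard deviation of the errors."""
    return statistics.fmean(errors) + k * statistics.pstdev(errors)


def label_frames(errors, k=1.0):
    """Mark a frame as anomalous when its error exceeds the threshold."""
    threshold = data_driven_threshold(errors, k)
    return [e > threshold for e in errors]


# Five frames with per-frame reconstruction errors; only the last stands out.
errors = [0.1, 0.1, 0.1, 0.1, 0.9]
print(label_frames(errors))  # [False, False, False, False, True]
```

Because the threshold is derived from the score distribution itself, no labeled frames are needed, which matches the unsupervised setting of this entry.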
-
Reuse and Diffuse: Reuse and diffuse: Iterative denoising for text-to-video generation Gu et al. Arxiv '23. [paper] [code]
Inspired by the remarkable success of Latent Diffusion Models (LDMs) for image synthesis, we study LDM for text-to-video generation, which is a formidable challenge due to the computational and memory constraints during both model training and inference. A single LDM is usually only capable of generating a very limited number of video frames. Some existing works focus on separate prediction models for generating more video frames, which suffer from additional training cost and frame-level jittering, however. In this paper, we propose a framework called "Reuse and Diffuse" dubbed VidRD to produce more frames following the frames already generated by an LDM. Conditioned on an initial video clip with a small number of frames, additional frames are iteratively generated by reusing the original latent features and following the previous diffusion process. Besides, for the autoencoder used for translation between pixel space and latent space, we inject temporal layers into its decoder and fine-tune these layers for higher temporal consistency. We also propose a set of strategies for composing video-text data that involve diverse content from multiple existing datasets including video datasets for action recognition and image-text datasets. Extensive experiments show that our method achieves good results in both quantitative and qualitative evaluations. Our project page is available at https://anonymous0x233.github.io/ReuseAndDiffuse/.
-
GD-VDM: GD-VDM: Generated depth for better diffusion-based video generation Lapid et al. Arxiv '23. [paper] [code]
The field of generative models has recently witnessed significant progress, with diffusion models showing remarkable performance in image generation. In light of this success, there is a growing interest in exploring the application of diffusion models to other modalities. One such challenge is the generation of coherent videos of complex scenes, which poses several technical difficulties, such as capturing temporal dependencies and generating long, high-resolution videos. This paper proposes GD-VDM, a novel diffusion model for video generation, demonstrating promising results. GD-VDM is based on a two-phase generation process involving generating depth videos followed by a novel diffusion Vid2Vid model that generates a coherent real-world video. We evaluated GD-VDM on the Cityscapes dataset and found that it generates more diverse and complex scenes compared to natural baselines, demonstrating the efficacy of our approach.
-
Diffusion Model: Anomaly detection in satellite videos using diffusion models Awasthi et al. Arxiv '23. [paper]
The definition of anomaly detection is the identification of an unexpected event. Real-time detection of extreme events such as wildfires, cyclones, or floods using satellite data has become crucial for disaster management. Although several earth-observing satellites provide information about disasters, satellites in the geostationary orbit provide data at intervals as frequent as every minute, effectively creating a video from space. There are many techniques that have been proposed to identify anomalies in surveillance videos; however, the available datasets do not have dynamic behavior, so we discuss an anomaly framework that can work on very high-frequency datasets to find very fast-moving anomalies. In this work, we present a diffusion model which does not need any motion component to capture the fast-moving anomalies and outperforms the other baseline methods.
-
AADiff: AADiff: Audio-aligned video synthesis with text-to-image diffusion Lee et al. Arxiv '23. [paper]
Recent advances in diffusion models have showcased promising results in the text-to-video (T2V) synthesis task. However, as these T2V models solely employ text as the guidance, they tend to struggle in modeling detailed temporal dynamics. In this paper, we introduce a novel T2V framework that additionally employs audio signals to control the temporal dynamics, empowering an off-the-shelf T2I diffusion to generate audio-aligned videos. We propose audio-based regional editing and signal smoothing to strike a good balance between the two contradicting desiderata of video synthesis, i.e., temporal flexibility and coherence. We empirically demonstrate the effectiveness of our method through experiments, and further present practical applications for content creation.
-
Align your latents: High-resolution video synthesis with latent diffusion models Blattmann et al. CVPR '23. [paper]
Latent Diffusion Models (LDMs) enable high-quality image synthesis while avoiding excessive compute demands by training a diffusion model in a compressed lower-dimensional latent space. Here, we apply the LDM paradigm to high-resolution video generation, a particularly resource-intensive task. We first pre-train an LDM on images only; then, we turn the image generator into a video generator by introducing a temporal dimension to the latent space diffusion model and fine-tuning on encoded image sequences, i.e., videos. Similarly, we temporally align diffusion model upsamplers, turning them into temporally consistent video super resolution models. We focus on two relevant real-world applications: Simulation of in-the-wild driving data and creative content creation with text-to-video modeling. In particular, we validate our Video LDM on real driving videos of resolution 512x1024, achieving state-of-the-art performance. Furthermore, our approach can easily leverage off-the-shelf pre-trained image LDMs, as we only need to train a temporal alignment model in that case. Doing so, we turn the publicly available, state-of-the-art text-to-image LDM Stable Diffusion into an efficient and expressive text-to-video model with resolution up to 1280x2048. We show that the temporal layers trained in this way generalize to different fine-tuned text-to-image LDMs. Utilizing this property, we show the first results for personalized text-to-video generation, opening exciting directions for future content creation. Project page: https://nv-tlabs.github.io/VideoLDM/
2024
-
AnomalyDiffusion: AnomalyDiffusion: Few-shot anomaly image generation with diffusion model Hu et al. AAAI '24. [paper] [code]
Anomaly inspection plays an important role in industrial manufacture. Existing anomaly inspection methods are limited in their performance due to insufficient anomaly data. Although anomaly generation methods have been proposed to augment the anomaly data, they either suffer from poor generation authenticity or inaccurate alignment between the generated anomalies and masks. To address the above problems, we propose AnomalyDiffusion, a novel diffusion-based few-shot anomaly generation model, which utilizes the strong prior information of latent diffusion model learned from large-scale dataset to enhance the generation authenticity under few-shot training data. Firstly, we propose Spatial Anomaly Embedding, which consists of a learnable anomaly embedding and a spatial embedding encoded from an anomaly mask, disentangling the anomaly information into anomaly appearance and location information. Moreover, to improve the alignment between the generated anomalies and the anomaly masks, we introduce a novel Adaptive Attention Re-weighting Mechanism. Based on the disparities between the generated anomaly image and normal sample, it dynamically guides the model to focus more on the areas with less noticeable generated anomalies, enabling generation of accurately-matched anomalous image-mask pairs. Extensive experiments demonstrate that our model significantly outperforms the state-of-the-art methods in generation authenticity and diversity, and effectively improves the performance of downstream anomaly inspection tasks. The code and data are available in https://github.com/sjtuplayer/anomalydiffusion.
-
DIAG: Exploiting multimodal latent diffusion models for accurate anomaly detection in industry 5.0 Capogrosso et al. x-IA 2024. [paper]
Defect detection is the task of identifying defects in production samples. Usually, defect detection classifiers are trained on ground-truth data formed by normal samples (negative data) and samples with defects (positive data), where the latter are consistently fewer than normal samples. State-of-the-art data augmentation procedures add synthetic defect data by superimposing artifacts to normal samples to mitigate problems related to unbalanced training data. These techniques often produce out-of-distribution images, resulting in systems that learn what is not a normal sample but cannot accurately identify what a defect looks like. In this paper, we show the research we are carrying out in collaboration with QUALYCO, a startup spin-off of the University of Verona, on multimodal Latent Diffusion Models (LDMs) for accurate anomaly detection in Industry 5.0. Unlike conventional image generation techniques, we work within a human feedback loop pipeline, where domain experts provide multimodal guidance to the model through text descriptions and region localization of the possible anomalies. This strategic shift enhances the interpretability of results and fosters a more robust human feedback loop, facilitating iterative improvements of the generated outputs. Remarkably, our approach operates in a zero-shot manner, avoiding time-consuming fine-tuning procedures while achieving superior performance. We demonstrate its efficacy and versatility on the challenging KSDD2 dataset, achieving state-of-the-art results.
2023
-
Energy-based models for anomaly detection: A manifold diffusion recovery approach Yoon et al. NeurIPS '23. [paper] [code]
We present a new method of training energy-based models (EBMs) for anomaly detection that leverages low-dimensional structures within data. The proposed algorithm, Manifold Projection-Diffusion Recovery (MPDR), first perturbs a data point along a low-dimensional manifold that approximates the training dataset. Then, EBM is trained to maximize the probability of recovering the original data. The training involves the generation of negative samples via MCMC, as in conventional EBM training, but from a different distribution concentrated near the manifold. The resulting near-manifold negative samples are highly informative, reflecting relevant modes of variation in data. An energy function of MPDR effectively learns accurate boundaries of the training data distribution and excels at detecting out-of-distribution samples. Experimental results show that MPDR exhibits strong performance across various anomaly detection tasks involving diverse data types, such as images, vectors, and acoustic signals.
-
anomalib: An anomaly detection library comprising state-of-the-art algorithms and features such as experiment management, hyper-parameter optimization, and edge inference. Project Page.
-
PyAnomaly: A PyTorch toolbox for video anomaly detection. Paper, Project Page.
- Anomaly Detection
- Diffusion Models
- Generative Models
- Unsupervised Learning
- Edge-Cloud Collaboration
- Large Language Models