
Soldier-Officer Window self-Attention (SOWA)


Description

[Figure: concept]

Visual anomaly detection is critical in industrial manufacturing, but traditional methods often rely on extensive normal datasets and custom models, which limits scalability. Recent advances in large-scale visual-language models have significantly improved zero/few-shot anomaly detection. However, these approaches may not fully exploit hierarchical features and can miss nuanced details. We introduce a window self-attention mechanism based on the CLIP model, combined with learnable prompts to process multi-level features within a Soldier-Officer Window self-Attention (SOWA) framework. Our method has been tested on five benchmark datasets and leads in 18 out of 20 metrics compared with existing state-of-the-art techniques.

[Figure: architecture]
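The core idea can be illustrated with a minimal PyTorch sketch of window self-attention over a grid of patch tokens taken from an intermediate CLIP layer. This is not the repository's actual module; the class name, window size, and feature shapes below are illustrative assumptions only.

import torch
import torch.nn as nn

class WindowSelfAttention(nn.Module):
    """Multi-head self-attention restricted to non-overlapping windows of patch tokens."""

    def __init__(self, dim, window_size, num_heads):
        super().__init__()
        self.window_size = window_size
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):
        # x: (B, H, W, C) grid of patch features from one intermediate CLIP layer
        B, H, W, C = x.shape
        ws = self.window_size
        # partition the grid into (B * num_windows, ws * ws, C) windows
        x = x.view(B, H // ws, ws, W // ws, ws, C)
        windows = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)
        # attention is computed only among tokens inside the same window
        out, _ = self.attn(windows, windows, windows)
        # undo the partition back to the (B, H, W, C) grid
        out = out.view(B, H // ws, W // ws, ws, ws, C)
        return out.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)

# hypothetical usage on a 14x14 grid of 768-dimensional ViT patch tokens
feats = torch.randn(2, 14, 14, 768)
wsa = WindowSelfAttention(dim=768, window_size=7, num_heads=8)
print(wsa(feats).shape)  # torch.Size([2, 14, 14, 768])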

Installation

Pip

# clone project
git clone https://github.com/huzongxiang/sowa
cd sowa

# [OPTIONAL] create conda environment
conda create -n sowa python=3.9
conda activate sowa

# install pytorch according to instructions
# https://pytorch.org/get-started/

# install requirements
pip install -r requirements.txt

Conda

# clone project
git clone https://github.com/huzongxiang/sowa
cd sowa

# create conda environment and install dependencies
conda env create -f environment.yaml -n sowa

# activate conda environment
conda activate sowa

How to run

Data

Process the downloaded data with the provided data scripts, then set the dataset locations in the data configuration file (e.g. sowa_mvt.yaml):

_target_: src.data.anomaly_clip_datamodule.AnomalyCLIPDataModule
data_dir:
  train: /home/hzx/Projects/Data/Visa
  valid: /home/hzx/Projects/Data/MVTec-AD
  test: /home/hzx/Projects/Data/MVTec-AD
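As a rough illustration of how this Hydra-style config is consumed (the config path below is an assumption, and the real file may contain additional keys or interpolations), the _target_ entry lets Hydra build the datamodule directly:

from omegaconf import OmegaConf
from hydra.utils import instantiate

# load the data config and point it at local dataset copies (paths are placeholders)
cfg = OmegaConf.load("configs/data/sowa_mvt.yaml")   # path is an assumption
cfg.data_dir.train = "/path/to/Visa"
cfg.data_dir.valid = "/path/to/MVTec-AD"
cfg.data_dir.test = "/path/to/MVTec-AD"

# Hydra resolves _target_ to src.data.anomaly_clip_datamodule.AnomalyCLIPDataModule
datamodule = instantiate(cfg)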

Train

Train the model with the default configuration:

# train on mvtec
python src/train.py trainer=gpu data=sowa_mvt model=sowa_hfwa

# train on visa
python src/train.py trainer=gpu data=sowa_visa model=sowa_hfwa

Inference

Pretrained weights can be downloaded from the Hugging Face project page or Baidu Cloud.

# eval on visa
python src/eval.py trainer=gpu data=sowa_visa model=sowa_hfwa ckpt_path=your_mvtec_ckpt model.k_shot=true data.dataset.kshot.k_shot=4

# eval on mvtec
python src/eval.py trainer=gpu data=sowa_mvt model=sowa_hfwa ckpt_path=your_visa_ckpt model.k_shot=true data.dataset.kshot.k_shot=4

Results

Comparison with few-shot (K=4) anomaly detection methods on the MVTec-AD, Visa, BTAD, DAGM and DTD-Synthetic datasets. AC denotes image-level anomaly classification and AS denotes pixel-level anomaly segmentation.

Metric     Dataset         WinCLIP    April-GAN   Ours
AC AUROC   MVTec-AD        95.2±1.3   92.8±0.2    96.8±0.3
AC AUROC   Visa            87.3±1.8   92.6±0.4    92.9±0.2
AC AUROC   BTAD            87.0±0.2   92.1±0.2    94.8±0.2
AC AUROC   DAGM            93.8±0.2   96.2±1.1    98.9±0.3
AC AUROC   DTD-Synthetic   98.1±0.2   98.5±0.1    99.1±0.0
AC AP      MVTec-AD        97.3±0.6   96.3±0.1    98.3±0.3
AC AP      Visa            88.8±1.8   94.5±0.3    94.5±0.2
AC AP      BTAD            86.8±0.0   95.2±0.5    95.5±0.7
AC AP      DAGM            83.8±1.1   86.7±4.5    95.2±1.7
AC AP      DTD-Synthetic   99.1±0.1   99.4±0.0    99.6±0.0
AS AUROC   MVTec-AD        96.2±0.3   95.9±0.0    95.7±0.1
AS AUROC   Visa            97.2±0.2   96.2±0.0    97.1±0.0
AS AUROC   BTAD            95.8±0.0   94.4±0.1    97.1±0.0
AS AUROC   DAGM            93.8±0.1   88.9±0.4    96.9±0.0
AS AUROC   DTD-Synthetic   96.8±0.2   96.7±0.0    98.7±0.0
AS AUPRO   MVTec-AD        89.0±0.8   91.8±0.1    92.4±0.2
AS AUPRO   Visa            87.6±0.9   90.2±0.1    91.4±0.0
AS AUPRO   BTAD            66.6±0.2   78.2±0.1    81.2±0.2
AS AUPRO   DAGM            82.4±0.3   77.8±0.9    94.4±0.1
AS AUPRO   DTD-Synthetic   90.1±0.5   92.2±0.0    96.6±0.1

Performance Comparison on MVTec-AD and Visa Datasets.

Method      Source                     MVTec-AD AC AUROC   MVTec-AD AS AUROC   MVTec-AD AS PRO   Visa AC AUROC   Visa AS AUROC   Visa AS PRO
SPADE       arXiv 2020                 84.8±2.5            92.7±0.3            87.0±0.5          81.7±3.4        96.6±0.3        87.3±0.8
PaDiM       ICPR 2021                  80.4±2.4            92.6±0.7            81.3±1.9          72.8±2.9        93.2±0.5        72.6±1.9
PatchCore   CVPR 2022                  88.8±2.6            94.3±0.5            84.3±1.6          85.3±2.1        96.8±0.3        84.9±1.4
WinCLIP     CVPR 2023                  95.2±1.3            96.2±0.3            89.0±0.8          87.3±1.8        97.2±0.2        87.6±0.9
April-GAN   CVPR 2023 VAND workshop    92.8±0.2            95.9±0.0            91.8±0.1          92.6±0.4        96.2±0.0        90.2±0.1
PromptAD    CVPR 2024                  96.6±0.9            96.5±0.2            -                 89.1±1.7        97.4±0.3        -
InCTRL      CVPR 2024                  94.5±1.8            -                   -                 87.7±1.9        -               -
SOWA        Ours                       96.8±0.3            95.7±0.1            92.4±0.2          92.9±0.2        97.1±0.0        91.4±0.0

Comparison with few-shot anomaly detection methods on the MVTec-AD, Visa, BTAD, DAGM and DTD-Synthetic datasets.

[Figure: few-shot comparison]

Visualization

Visualization results under the few-shot setting (K=4).

[Figure: visualization results]

Mechanism

Hierarchical results on the MVTec-AD dataset: real model outputs illustrating how different layers (H1 to H4) process various feature modes. Each row shows a different sample; the columns show the original image, the segmentation mask, the heatmap, the feature outputs from H1 to H4, and their fusion.

[Figure: mechanism]
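As a purely hypothetical sketch of the fusion step shown in the last column (the uniform weighting below is illustrative and is not the paper's exact fusion rule), per-hierarchy anomaly maps can be combined into a single heatmap:

import torch

def fuse_anomaly_maps(maps, weights=None):
    """Combine per-hierarchy anomaly maps, each of shape (B, H, W), into one heatmap.

    Uses a simple weighted average; the model's actual fusion rule may differ.
    """
    stacked = torch.stack(maps, dim=0)                    # (L, B, H, W)
    if weights is None:
        weights = torch.full((len(maps),), 1.0 / len(maps))  # uniform weighting
    weights = weights.view(-1, 1, 1, 1)                   # broadcast over (B, H, W)
    return (weights * stacked).sum(dim=0)                 # (B, H, W)

# hypothetical anomaly maps from hierarchies H1..H4
h_maps = [torch.rand(1, 224, 224) for _ in range(4)]
print(fuse_anomaly_maps(h_maps).shape)  # torch.Size([1, 224, 224])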

Inference Speed

Inference performance comparison of different methods on a single NVIDIA RTX 3070 (8 GB) GPU.

[Figure: inference speed comparison]
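For context, latency comparisons of this kind are typically measured with a timing loop such as the following sketch (the model, input shape, and iteration counts are placeholders, not the repository's benchmark script):

import time
import torch

@torch.no_grad()
def measure_latency_ms(model, inputs, warmup=10, iters=100):
    """Average per-batch inference time in milliseconds."""
    for _ in range(warmup):                 # warm-up passes to stabilize clocks and caches
        model(inputs)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        model(inputs)
    if torch.cuda.is_available():
        torch.cuda.synchronize()            # wait for queued GPU work before stopping the timer
    return (time.perf_counter() - start) / iters * 1000.0

# hypothetical usage:
# latency = measure_latency_ms(model.cuda().eval(), torch.randn(1, 3, 224, 224).cuda())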

Citation

Please cite the following paper if this work helps your project:

@article{hu2024sowa,
  title={SOWA: Adapting Hierarchical Frozen Window Self-Attention to Visual-Language Models for Better Anomaly Detection},
  author={Hu, Zongxiang and Zhang, Zhaosheng},
  journal={arXiv preprint arXiv:2407.03634},
  year={2024}
}

Contact

If you have any problems with this code, please feel free to contact us by email at [email protected] or on WeChat at voodoozx2015.