
Soldier-Officer Window self-Attention (SOWA)


Description

[Figure: concept]

Visual anomaly detection is critical in industrial manufacturing, but traditional methods often rely on extensive normal datasets and custom models, which limits scalability. Recent advances in large-scale visual-language models have significantly improved zero/few-shot anomaly detection. However, these approaches may not fully exploit hierarchical features and can miss nuanced details. We introduce a window self-attention mechanism based on the CLIP model, combined with learnable prompts to process multi-level features within a Soldier-Officer Window self-Attention (SOWA) framework. Our method has been tested on five benchmark datasets and leads in 18 out of 20 metrics compared with existing state-of-the-art techniques.

[Figure: architecture]
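The core idea can be illustrated with a minimal PyTorch sketch of window self-attention over a grid of patch tokens taken from an intermediate CLIP layer. This is not the repository's actual module; the class name, window size, and feature shapes below are illustrative assumptions only.

import torch
import torch.nn as nn

class WindowSelfAttention(nn.Module):
    """Multi-head self-attention restricted to non-overlapping windows of patch tokens."""

    def __init__(self, dim, window_size, num_heads):
        super().__init__()
        self.window_size = window_size
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):
        # x: (B, H, W, C) grid of patch features from one intermediate CLIP layer
        B, H, W, C = x.shape
        ws = self.window_size
        # partition the grid into (B * num_windows, ws * ws, C) windows
        x = x.view(B, H // ws, ws, W // ws, ws, C)
        windows = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)
        # attention is computed only among tokens inside the same window
        out, _ = self.attn(windows, windows, windows)
        # undo the partition back to the (B, H, W, C) grid
        out = out.view(B, H // ws, W // ws, ws, ws, C)
        return out.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)

# hypothetical usage on a 14x14 grid of 768-dimensional ViT patch tokens
feats = torch.randn(2, 14, 14, 768)
wsa = WindowSelfAttention(dim=768, window_size=7, num_heads=8)
print(wsa(feats).shape)  # torch.Size([2, 14, 14, 768])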

Installation

Pip

# clone project
git clone https://github.com/huzongxiang/sowa
cd sowa

# [OPTIONAL] create conda environment
conda create -n sowa python=3.9
conda activate sowa

# install pytorch according to instructions
# https://pytorch.org/get-started/

# install requirements
pip install -r requirements.txt

Conda

# clone project
git clone https://github.com/huzongxiang/sowa
cd sowa

# create conda environment and install dependencies
conda env create -f environment.yaml -n sowa

# activate conda environment
conda activate sowa

How to run

Data

Process the downloaded data with the provided data scripts, then set the dataset locations in the data configuration file (e.g. sowa_mvt.yaml):

_target_: src.data.anomaly_clip_datamodule.AnomalyCLIPDataModule
data_dir:
  train: /home/hzx/Projects/Data/Visa
  valid: /home/hzx/Projects/Data/MVTec-AD
  test: /home/hzx/Projects/Data/MVTec-AD
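As a rough illustration of how this Hydra-style config is consumed (the config path below is an assumption, and the real file may contain additional keys or interpolations), the _target_ entry lets Hydra build the datamodule directly:

from omegaconf import OmegaConf
from hydra.utils import instantiate

# load the data config and point it at local dataset copies (paths are placeholders)
cfg = OmegaConf.load("configs/data/sowa_mvt.yaml")   # path is an assumption
cfg.data_dir.train = "/path/to/Visa"
cfg.data_dir.valid = "/path/to/MVTec-AD"
cfg.data_dir.test = "/path/to/MVTec-AD"

# Hydra resolves _target_ to src.data.anomaly_clip_datamodule.AnomalyCLIPDataModule
datamodule = instantiate(cfg)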

Train

Train the model with the default configuration:

# train on mvtec
python src/train.py trainer=gpu data=sowa_mvt model=sowa_hfwa

# train on visa
python src/train.py trainer=gpu data=sowa_visa model=sowa_hfwa

Inference

Pretrained weights can be downloaded from the Hugging Face project page or Baidu Cloud.

# eval on visa
python src/eval.py trainer=gpu data=sowa_visa model=sowa_hfwa ckpt_path=your_mvtec_ckpt model.k_shot=true data.dataset.kshot.k_shot=4

# eval on mvtec
python src/eval.py trainer=gpu data=sowa_mvt model=sowa_hfwa ckpt_path=your_visa_ckpt model.k_shot=true data.dataset.kshot.k_shot=4

Results

Comparison with few-shot (K=4) anomaly detection methods on the MVTec-AD, Visa, BTAD, DAGM and DTD-Synthetic datasets. AC denotes image-level anomaly classification and AS denotes pixel-level anomaly segmentation.

Metric     Dataset         WinCLIP    April-GAN   Ours
AC AUROC   MVTec-AD        95.2±1.3   92.8±0.2    96.8±0.3
AC AUROC   Visa            87.3±1.8   92.6±0.4    92.9±0.2
AC AUROC   BTAD            87.0±0.2   92.1±0.2    94.8±0.2
AC AUROC   DAGM            93.8±0.2   96.2±1.1    98.9±0.3
AC AUROC   DTD-Synthetic   98.1±0.2   98.5±0.1    99.1±0.0
AC AP      MVTec-AD        97.3±0.6   96.3±0.1    98.3±0.3
AC AP      Visa            88.8±1.8   94.5±0.3    94.5±0.2
AC AP      BTAD            86.8±0.0   95.2±0.5    95.5±0.7
AC AP      DAGM            83.8±1.1   86.7±4.5    95.2±1.7
AC AP      DTD-Synthetic   99.1±0.1   99.4±0.0    99.6±0.0
AS AUROC   MVTec-AD        96.2±0.3   95.9±0.0    95.7±0.1
AS AUROC   Visa            97.2±0.2   96.2±0.0    97.1±0.0
AS AUROC   BTAD            95.8±0.0   94.4±0.1    97.1±0.0
AS AUROC   DAGM            93.8±0.1   88.9±0.4    96.9±0.0
AS AUROC   DTD-Synthetic   96.8±0.2   96.7±0.0    98.7±0.0
AS AUPRO   MVTec-AD        89.0±0.8   91.8±0.1    92.4±0.2
AS AUPRO   Visa            87.6±0.9   90.2±0.1    91.4±0.0
AS AUPRO   BTAD            66.6±0.2   78.2±0.1    81.2±0.2
AS AUPRO   DAGM            82.4±0.3   77.8±0.9    94.4±0.1
AS AUPRO   DTD-Synthetic   90.1±0.5   92.2±0.0    96.6±0.1

Performance Comparison on MVTec-AD and Visa Datasets.

Method      Source                     MVTec-AD AC AUROC   MVTec-AD AS AUROC   MVTec-AD AS PRO   Visa AC AUROC   Visa AS AUROC   Visa AS PRO
SPADE       arXiv 2020                 84.8±2.5            92.7±0.3            87.0±0.5          81.7±3.4        96.6±0.3        87.3±0.8
PaDiM       ICPR 2021                  80.4±2.4            92.6±0.7            81.3±1.9          72.8±2.9        93.2±0.5        72.6±1.9
PatchCore   CVPR 2022                  88.8±2.6            94.3±0.5            84.3±1.6          85.3±2.1        96.8±0.3        84.9±1.4
WinCLIP     CVPR 2023                  95.2±1.3            96.2±0.3            89.0±0.8          87.3±1.8        97.2±0.2        87.6±0.9
April-GAN   CVPR 2023 VAND workshop    92.8±0.2            95.9±0.0            91.8±0.1          92.6±0.4        96.2±0.0        90.2±0.1
PromptAD    CVPR 2024                  96.6±0.9            96.5±0.2            -                 89.1±1.7        97.4±0.3        -
InCTRL      CVPR 2024                  94.5±1.8            -                   -                 87.7±1.9        -               -
SOWA        Ours                       96.8±0.3            95.7±0.1            92.4±0.2          92.9±0.2        97.1±0.0        91.4±0.0

Comparison with few-shot anomaly detection methods on the MVTec-AD, Visa, BTAD, DAGM and DTD-Synthetic datasets.

[Figure: few-shot comparison]

Visualization

Visualization results under the few-shot setting (K=4).

[Figure: visualization results]

Mechanism

Hierarchical results on the MVTec-AD dataset: real model outputs illustrating how different layers (H1 to H4) process various feature modes. Each row shows a different sample; the columns show the original image, the segmentation mask, the heatmap, the feature outputs from H1 to H4, and their fusion.

[Figure: mechanism]
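As a purely hypothetical sketch of the fusion step shown in the last column (the uniform weighting below is illustrative and is not the paper's exact fusion rule), per-hierarchy anomaly maps can be combined into a single heatmap:

import torch

def fuse_anomaly_maps(maps, weights=None):
    """Combine per-hierarchy anomaly maps, each of shape (B, H, W), into one heatmap.

    Uses a simple weighted average; the model's actual fusion rule may differ.
    """
    stacked = torch.stack(maps, dim=0)                    # (L, B, H, W)
    if weights is None:
        weights = torch.full((len(maps),), 1.0 / len(maps))  # uniform weighting
    weights = weights.view(-1, 1, 1, 1)                   # broadcast over (B, H, W)
    return (weights * stacked).sum(dim=0)                 # (B, H, W)

# hypothetical anomaly maps from hierarchies H1..H4
h_maps = [torch.rand(1, 224, 224) for _ in range(4)]
print(fuse_anomaly_maps(h_maps).shape)  # torch.Size([1, 224, 224])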

Inference Speed

Inference performance comparison of different methods on a single NVIDIA RTX 3070 (8 GB) GPU.

[Figure: inference speed comparison]
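For context, latency comparisons of this kind are typically measured with a timing loop such as the following sketch (the model, input shape, and iteration counts are placeholders, not the repository's benchmark script):

import time
import torch

@torch.no_grad()
def measure_latency_ms(model, inputs, warmup=10, iters=100):
    """Average per-batch inference time in milliseconds."""
    for _ in range(warmup):                 # warm-up passes to stabilize clocks and caches
        model(inputs)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        model(inputs)
    if torch.cuda.is_available():
        torch.cuda.synchronize()            # wait for queued GPU work before stopping the timer
    return (time.perf_counter() - start) / iters * 1000.0

# hypothetical usage:
# latency = measure_latency_ms(model.cuda().eval(), torch.randn(1, 3, 224, 224).cuda())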

Citation

Please cite the following paper if this work helps your project:

@article{hu2024sowa,
  title={SOWA: Adapting Hierarchical Frozen Window Self-Attention to Visual-Language Models for Better Anomaly Detection},
  author={Hu, Zongxiang and Zhang, Zhaosheng},
  journal={arXiv preprint arXiv:2407.03634},
  year={2024}
}

Contact

If you have any problems with this code, please feel free to contact us by email at [email protected] or on WeChat at voodoozx2015.