Jpcbertoldo/mvtec ad loco 2 #553

Closed

Conversation

jpcbertoldo
Contributor

Description

Create a new dataset: MVTec LOCO Anomaly Detection.

"LOCO" stands for "LOgical COnstraints"

I based it on anomalib/data/mvtec.py.

imread_strategy

The dataset supports an imread_strategy option, which lets the user choose when the images are loaded:

  • onthefly: the behaviour I found in mvtec.py; images are loaded on demand during training;
  • preload: all images are cached in memory (RAM, not GPU) when the dataset is initialized.
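
The difference between the two strategies can be sketched as below. This is only an illustration, not the code in this PR: `ImageStore`, `read_fn`, and the constant `IMREAD_STRATEGY_ONTHEFLY` are hypothetical names (only `IMREAD_STRATEGY_PRELOAD` appears in the constructor signature quoted later in this conversation).

```python
from typing import Callable, Dict, List

IMREAD_STRATEGY_ONTHEFLY = "onthefly"  # hypothetical constant name
IMREAD_STRATEGY_PRELOAD = "preload"


class ImageStore:
    """Hypothetical helper that serves images lazily or from a RAM cache."""

    def __init__(
        self,
        paths: List[str],
        read_fn: Callable[[str], object],
        imread_strategy: str = IMREAD_STRATEGY_PRELOAD,
    ) -> None:
        if imread_strategy not in (IMREAD_STRATEGY_ONTHEFLY, IMREAD_STRATEGY_PRELOAD):
            raise ValueError(f"unknown imread_strategy: {imread_strategy!r}")
        self.paths = paths
        self.read_fn = read_fn
        self.imread_strategy = imread_strategy
        self._cache: Dict[int, object] = {}
        if imread_strategy == IMREAD_STRATEGY_PRELOAD:
            # Read every image once, while the store is being built.
            self._cache = {i: read_fn(p) for i, p in enumerate(paths)}

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, index: int) -> object:
        if self.imread_strategy == IMREAD_STRATEGY_PRELOAD:
            return self._cache[index]
        # onthefly: hit the disk (via read_fn) on every access.
        return self.read_fn(self.paths[index])
```

With preload the read cost is paid once up front and memory grows with the dataset size; with onthefly every access pays the read cost again.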

anotype and super_anotype

Besides providing the binary label, I also create the dataset with two other categorical values:

  • super_anotype: is it a logical or structural anomaly? (or a normal?)
  • anotype: "what is the problem with the image?" MVTec AD also has different types of anomalies for each category, but this is more interesting here because many types of logical violations are possible.

I included these specifically because I want to evaluate each type separately; I will open an issue for that feature later.

mask vs. masks

MVTec LOCO's logical anomalies may include several anomalies in a single image. To evaluate them properly, one needs to consider them separately, so the ground truth stores them in different mask files.

Since the rest of the library expects a tensor mask (SINGULAR), I merge them all into a single binary mask (losing information, because the regions can no longer be separated).

To allow proper evaluation later, there is a second tensor, masks (PLURAL), which encodes each anomalous region with a different value (0 is a normal pixel, and 1, 2, ..., N are anomalous pixels).
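
A minimal sketch of the encoding described above, using plain nested lists in place of image tensors so that it has no dependencies. The function name `merge_region_masks` is hypothetical, and the rule that a later region overwrites an earlier one where they overlap is an assumption, not something this PR specifies.

```python
from typing import List, Tuple

Mask = List[List[int]]


def merge_region_masks(region_masks: List[Mask]) -> Tuple[Mask, Mask]:
    """Combine per-region binary masks into (mask, masks).

    masks: 0 for normal pixels, k (1-based) for pixels of the k-th region.
    mask:  1 wherever any region is present, otherwise 0.
    Assumption: where regions overlap, the later region wins.
    """
    if not region_masks:
        raise ValueError("at least one region mask is required")
    height, width = len(region_masks[0]), len(region_masks[0][0])
    masks = [[0] * width for _ in range(height)]
    for region_id, region in enumerate(region_masks, start=1):
        for row in range(height):
            for col in range(width):
                if region[row][col]:
                    masks[row][col] = region_id
    # The binary mask discards the region identity.
    mask = [[1 if value else 0 for value in row] for row in masks]
    return mask, masks


region_a = [[1, 0, 0], [0, 0, 0]]
region_b = [[0, 0, 0], [0, 1, 1]]
mask, masks = merge_region_masks([region_a, region_b])
# mask  == [[1, 0, 0], [0, 1, 1]]
# masks == [[1, 0, 0], [0, 2, 2]]
```

The binary mask can always be recovered from masks (any non-zero value), but not the other way around, which is why both are kept.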

things in MVTecAD but not in MVTecLOCO

1) self.transform_config_val = self.transform_config_train

        if self.transform_config_train is not None and self.transform_config_val is None:
            self.transform_config_val = self.transform_config_train

Is there a good reason for assuming this?

To me it would make sense for self.transform_config_train to include light data augmentations (say, tiny brightness changes), but those should not be repeated on the validation set.

2) split_normal_images_in_train_set(samples, split_ratio, seed)

MVTec LOCO already defines fixed validation sets, so I did not include the option of creating them dynamically as in MVTec AD.


Checklists

Changes

  • Bug fix (non-breaking change which fixes an issue)
  • Refactor (non-breaking change which refactors the code base)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Checklist

  • My code follows the pre-commit style and check guidelines of this project.
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing tests pass locally with my changes


"mask_paths": str(self.samples.iloc[index]["mask_paths"]),
# TODO CHECK IF THE DOUBLE CALL TO PREPROCESS WILL WORK WITH ALBUMENTATIONS
"masks": self.pre_process(image=image, mask=mask_dict["masks"])["mask"],
"mask": self.pre_process(image=image, mask=mask_dict["mask"])["mask"],
Contributor Author

self.pre_process is being called for the 3rd time here; will that cause any problems?

I suspect the random transforms may only reuse the same parameters for two targets at a time (the image and the mask), so the extra call might be transformed differently.
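
To illustrate the concern without relying on any particular library, here is a toy random horizontal flip that draws its random decision once and applies it to every target. Everything here (`hflip`, `random_hflip_jointly`, the nested-list grids) is hypothetical and is not how `pre_process` is implemented. Albumentations documents an `additional_targets` option on `Compose` for passing extra targets through a single call; whether that fits this dataset class has not been checked here.

```python
import random
from typing import Dict, List

Grid = List[List[int]]


def hflip(grid: Grid) -> Grid:
    """Mirror a 2D grid left to right."""
    return [row[::-1] for row in grid]


def random_hflip_jointly(
    targets: Dict[str, Grid], rng: random.Random, p: float = 0.5
) -> Dict[str, Grid]:
    """Draw the flip decision once and apply it to every target."""
    flip = rng.random() < p
    return {
        name: hflip(grid) if flip else [row[:] for row in grid]
        for name, grid in targets.items()
    }
```

Calling a random transform separately for `mask` and `masks` would instead draw a fresh decision each time, so the two outputs could stop lining up with the image.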

category: str,
task: str = TASK_SEGMENTATION,
imread_strategy: str = IMREAD_STRATEGY_PRELOAD,
image_size: Optional[Union[int, Tuple[int, int]]] = None,
Contributor Author

The images in this dataset are not square.
When the image size is given as an int, the width/height ratio can end up very different from that of the original image.

Maybe we should add a warning here?
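
One way such a warning could look, as a sketch only: `resolve_image_size`, its `original_size` argument, and the 5% `tolerance` are hypothetical and not part of this PR or of anomalib.

```python
import warnings
from typing import Optional, Tuple, Union


def resolve_image_size(
    image_size: Optional[Union[int, Tuple[int, int]]],
    original_size: Tuple[int, int],
    tolerance: float = 0.05,
) -> Optional[Tuple[int, int]]:
    """Normalise image_size to (height, width); warn if the aspect ratio changes.

    original_size is (height, width) of the source images.
    """
    if image_size is None:
        return None
    if isinstance(image_size, int):
        target = (image_size, image_size)  # an int means a square output
    else:
        target = (image_size[0], image_size[1])
    original_ratio = original_size[1] / original_size[0]
    target_ratio = target[1] / target[0]
    if abs(target_ratio - original_ratio) / original_ratio > tolerance:
        warnings.warn(
            f"image_size={image_size} changes the width/height ratio from "
            f"{original_ratio:.2f} to {target_ratio:.2f}; images will be distorted.",
            UserWarning,
        )
    return target
```

Passing a `(height, width)` tuple with the original proportions avoids the warning, while an int always yields a square target.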

Contributor Author

@jpcbertoldo jpcbertoldo left a comment

ready to review

@jpcbertoldo jpcbertoldo marked this pull request as ready for review September 11, 2022 12:29
* comet benchmarking enabled

* updated BM docs

* tweaked comment

* commen changet

* fixed end of file

Co-authored-by: Samet Akcay <[email protected]>
@jpcbertoldo
Contributor Author

Please ignore for now, I will wait for PR #558.
Also, I realize that the pre-loading of data is probably unnecessary so I'll remove it to make the PR simpler.

samet-akcay and others added 12 commits September 16, 2022 15:46
…openvinotoolkit#570)

* move sample generation to datamodule instead of dataset

* move sample generation from init to setup

* remove inference stage and add base classes

* replace dataset classes with AnomalibDataset

* move setup to base class, create samples as class method

* update docstrings

* refactor btech to new format

* allow training with no anomalous data

* remove MVTec name from comment

* raise NotImplementedError in base class

* allow both png and bmp images for btech

* use label_index to check if dataset contains anomalous images

* refactor getitem in dataset class

* use iloc for indexing

* move dataloader getters to base class

* refactor to add validate stage in setup

* Add warning message when there is no config file passed

* Extract get_transforms and get_height_and_width functions

* refactor pre-processor and fix visualizer normalization issue

* Revert thenew data refactor

* rename variable

* Revert the changes not merged yet

* Fix tests

* Fix tests

* Address codacy concerns

Co-authored-by: Dick Ameln <[email protected]>
* added hpo

* lint fixed

* Update hyperparameter_optimization.rst

* fixed file lint

* fixed documentation images

* added sweep doc image

* updated hpo docs to include images

* fixed linting errors

* added config folder to store sample sweeps

* fixed docs for new location of config files

* not needed. moved to config directory

* not needed moved to config directory

* renamed to configs

* changed to "configs"

* fixed grammar
* fix patchcore image-level score computation

* docstring and comment

* remove default value for n_neighbors

* torch.Tensor -> Tensor
* Add benchmark to tutorial

* Move export to tutorials

* Move hpo to tutorials

* Move inference to tutorials

* Move logging to tutorials

* Create installation in tutorials

* Create training to tutorials

* Create tutorials index

* Update conf.py file

* Add anomalib logos to logos directory

* Add data docs

* Add algos

* Add model docs

* Add reference api

* Remove blank line in metrics

* Add reference guide

* Add how to guides

* Add developer guide

* Add blog to how-to-guide

* Remove guides directory

* Add train custom data to how-to-guides

* Fix typos

* Add notebooks to how-to-guides

* Add anomalib favicon

* Add missing algo descriptions

* Rename Reference to Reference Guide

* Add how to add a new model

* fix typos

* Merge PR 544

* Minor refactor (openvinotoolkit#587)

* 🛠 Fix PatchCore image-level score computation (openvinotoolkit#580)

* fix patchcore image-level score computation

* docstring and comment

* remove default value for n_neighbors

* torch.Tensor -> Tensor

* Minor refactor

Co-authored-by: Dick Ameln <[email protected]>
Co-authored-by: Ashwin Vaidya <[email protected]>

* Address Dicks comments

Co-authored-by: Ashwin Vaidya <[email protected]>
Co-authored-by: Dick Ameln <[email protected]>
Co-authored-by: Ashwin Vaidya <[email protected]>
* Add notebook for hpo

* Reference notebook in docs

Co-authored-by: Ashwin Vaidya <[email protected]>
* Fix comet hpo + refactoring + fix metriccallback in benchmarking

* Move sweep runners + utils to anomalib

Co-authored-by: Ashwin Vaidya <[email protected]>
* Add util to convert single value to tuple

* Update documentation

* Remove unused pytest import

* Address PR comments

* update text in documentation

Co-authored-by: Ashwin Vaidya <[email protected]>
ashwinvaidya17 and others added 12 commits October 20, 2022 10:52
* refactor export callback

* refactor export functions

* Rename export_convert to export

* Rename optimize to export + fix tests

* Fix imports

* Address tests

* Add nosec to surpress subprocess warnings

* Add nosec to surpress run
Address docs build dependency issues
* move sample generation to datamodule instead of dataset

* move sample generation from init to setup

* remove inference stage and add base classes

* replace dataset classes with AnomalibDataset

* move setup to base class, create samples as class method

* update docstrings

* refactor btech to new format

* allow training with no anomalous data

* remove MVTec name from comment

* raise NotImplementedError in base class

* allow both png and bmp images for btech

* use label_index to check if dataset contains anomalous images

* refactor getitem in dataset class

* use iloc for indexing

* move dataloader getters to base class

* refactor to add validate stage in setup

* implement alternative datamodules solution

* small improvements

* improve design

* remove unused constructor arguments

* adapt btech to new design

* add prepare_data method for mvtec

* implement more generic random splitting function

* update docstrings for folder module

* ensure type consistency when performing operations on dataset

* change imports

* change variable names

* replace pass with NotImplementedError

* allow training on folder without test images

* use relative path for normal_test_dir

* fix dataset tests

* update validation set parameter in configs

* change default argument

* use setter for samples

* hint options for val_split_mode

* update assert message and docstring

* revert name change dataset vs datamodule

* typing and docstrings

* remove samples argument from dataset constructor

* val/test -> eval

* remove Split.Full from enum

* sort samples when setting

* update warn message

* formatting

* use setter when creating samples in dataset classes

* add tests for new dataset class

* add test case for label aware random split

* update parameter name in inferencers

* move _setup implementation to base class

* address codacy issues

* fix pylint issues

* codacy

* update example dataset config in docs

* fix test

* move base classes to separate files (avoid circular import)

* add base classes

* update docstring

* fix imports

* validation_split_mode -> val_split_mode

* update docs

* Update anomalib/data/base/dataset.py

Co-authored-by: Joao P C Bertoldo <[email protected]>

* get length from self.samples

* assert unique indices

* check is_setup for individual datasets

Co-authored-by: Joao P C Bertoldo <[email protected]>

* remove assert in __getitem_\

Co-authored-by: Joao P C Bertoldo <[email protected]>

* Update anomalib/data/btech.py

Co-authored-by: Joao P C Bertoldo <[email protected]>

* clearer assert message

* clarify list inversion in comment

* comments and typing

* validate contents of samples dataframe before setting

* add file paths check

* add seed to random_split function

* fix expected columns

* fix typo

* add seed parameter to datamodules

* set global seed in test entrypoint

* add NONE option to valsplitmode

* clarify setup behaviour in docstring

* fix typo

Co-authored-by: Joao P C Bertoldo <[email protected]>

Co-authored-by: Joao P C Bertoldo <[email protected]>
@github-actions github-actions bot added Benchmarking Callbacks CI CLI Config Dependencies Pull requests that update a dependency file HPO Inference Metrics Metric Component. Post-Processing The components that are related to post-processing Pre-Processing Setup Tools labels Oct 31, 2022
@djdameln
Contributor

djdameln commented Apr 5, 2023

@jpcbertoldo I am closing this because it has been inactive for a long time and is outdated. Feel free to re-open if you resume working on this.

Labels
Benchmarking CI CLI Config Data Dependencies Pull requests that update a dependency file HPO Inference Metrics Metric Component. Notebooks Post-Processing The components that are related to post-processing Pre-Processing Setup Tests Tools
Development

Successfully merging this pull request may close these issues.

Add support for MVTEC LOCO AD
7 participants