New datamodules design #572

djdameln · 2022-09-22T08:57:58Z

Description

This is a proof of concept for a new datamodule design. The base classes and all three datasets (MVTec, Folder, BTech) have been updated according to the new design.
Fixes Support training with only normal images (no evaluation) #277

Summary of the design

The AnomalibDataset and its subclasses now have a setup method, which is called from DataModule._setup(). When called, the dataframe will be created by using the create_dataset function. setup must be called before the dataset class can be used. This is so that we can instantiate the dataset in the constructor of the datamodule, before prepare_data is called.
The DataModule is responsible for performing the random subset splitting. It first creates the fixed subsets in the constructor, and then performs any additional subset splitting in DataModule._setup().
The _setup method from the DataModule is called only once, and is independent of the stage argument.
To facilitate the dynamic subset splitting on the DataModule side, helper functions can be added to the data utils (see concatenate_datasets and random_split for an example).
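The design summarized above can be sketched roughly as follows. This is a minimal plain-Python illustration, not the code in this PR: `SketchDataset`, `SketchDataModule`, the list-of-dicts "dataframe", and the file paths are all hypothetical stand-ins, and the real classes build a pandas dataframe and inherit from the PyTorch and Lightning base classes.

```python
from typing import Dict, List, Optional


def create_dataset(root: str) -> List[Dict[str, str]]:
    """Hypothetical stand-in: returns sample records instead of a real dataframe."""
    return [
        {"image_path": f"{root}/train/good/000.png", "split": "train"},
        {"image_path": f"{root}/test/broken/000.png", "split": "test"},
    ]


class SketchDataset:
    def __init__(self, root: str, split: str) -> None:
        # The constructor does no file access, so the datamodule can
        # instantiate the dataset before prepare_data has run.
        self.root = root
        self.split = split
        self._samples: Optional[List[Dict[str, str]]] = None

    @property
    def is_setup(self) -> bool:
        return self._samples is not None

    def setup(self) -> None:
        # Sample records are only created here, not in __init__.
        if not self.is_setup:
            records = create_dataset(self.root)
            self._samples = [r for r in records if r["split"] == self.split]

    def __len__(self) -> int:
        if self._samples is None:
            raise RuntimeError("Call setup() before using the dataset.")
        return len(self._samples)


class SketchDataModule:
    def __init__(self, root: str) -> None:
        # Fixed subsets are created in the constructor.
        self.train_data = SketchDataset(root, "train")
        self.test_data = SketchDataset(root, "test")
        self._is_setup = False

    def setup(self, stage: Optional[str] = None) -> None:
        # The stage argument is deliberately ignored; _setup runs only once.
        if not self._is_setup:
            self._setup()
            self._is_setup = True

    def _setup(self) -> None:
        self.train_data.setup()
        self.test_data.setup()


datamodule = SketchDataModule("data")
datamodule.setup("fit")
datamodule.setup("test")  # no-op: everything was set up on the first call
```

Any dynamic subset splitting would go at the end of `_setup`, after the fixed subsets have been set up.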

Responsibilities of the different classes:

create_dataset functions: Create a dataframe with information about the samples, including any information about the fixed train/val/test split that may follow from the folder organization or annotation files of the dataset.
AnomalibDataset and subclasses: Prepare dataset items and ground truth labels + masks to be used as model input.
AnomalibDataModule: Create dataloaders and perform dynamic subset splitting.

Advantages

Clear responsibility of the different classes.
The Dataset classes can be used standalone.
No seeds needed, because dynamic subset splitting is done only once.

Known issues

~~Some terminology is still a bit confusing, e.g. inconsistent use of val, infer and test (transform_config_val vs pre_process_infer vs test_batch_size)~~
~~Tests need to be updated.~~

ashwinvaidya17

I went over it and I don't have any major feedback. I feel this is a much cleaner design and more flexible.

anomalib/data/base.py

jpcbertoldo

(i'm going to be a bit annoying because i think this refactor is important, and it will affect my work a lot)

this version is better than the other one IMO, but overall both are creating divergences with how LightningDatamodule is supposed to be used (and how their downstream Dataset is supposed to be designed)

There are some advantages in this design but I think following Lightning's patterns is a better way to go because then you make better use of it.

Another guideline i'm considering: the behavior of parent classes should be kept minimal (or use the template method pattern, like lightning) to allow more composability.

i am trying to keep this short but below i give you 2 reasons to not go this way

i wrote another draft in my fork (didn't open another PR yet): my branch

base.py
mvtec.py (got much cleaner)

reason 1

the order of things is not quite right

example:

LightningDatamodule.prepare_data() is supposed to not change state and only manage the files in the system to make sure they're in place
LightningDatamodule.setup() should be loading whatever necessary thing to the memory

In the way things are being done here (correct me if i'm wrong), the setup is being made even before the prepare_data().
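A minimal sketch of the ordering described above, using a plain-Python stand-in rather than the real LightningDatamodule. The class name, the file name, and the empty placeholder file (standing in for a download) are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import List, Optional


class OrderedDataModule:
    """Plain-Python stand-in for a Lightning datamodule (hypothetical)."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.samples: Optional[List[Path]] = None

    def prepare_data(self) -> None:
        # Only manages files on disk; assigns nothing on self.
        self.root.mkdir(parents=True, exist_ok=True)
        placeholder = self.root / "000.png"
        if not placeholder.exists():
            placeholder.write_bytes(b"")  # stands in for download/extract

    def setup(self, stage: Optional[str] = None) -> None:
        # Loads state into memory from the files prepare_data put in place.
        self.samples = sorted(self.root.glob("*.png"))


with tempfile.TemporaryDirectory() as tmp:
    dm = OrderedDataModule(Path(tmp) / "dataset")
    dm.prepare_data()
    state_after_prepare = dm.samples  # still None: no state was changed
    dm.setup()
    num_samples = len(dm.samples)
```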

well, because of (2), the Dataset somehow has to load things to the memory either (a) lazily (following LightningDatamodule's pattern) or (b) it should be instantiated at LightningDatamodule.setup(), which means that the data preparation should not be a method (but a staticmethod could do).

in my draft i solved that by making the Dataset follow exactly the same pattern: .prepare_data() (which calls a staticmethod) and .setup() (which will make use of the args from __init__).

reason 2

i think the current design is biased by the behavior in folder.py and is forced to fit the (already functional) code of the make_dataset-like functions

from what i understand, a Dataset should keep the knowledge of how to find and load samples from a specific pre-defined split; it should not deal with dynamic splitting (i.e. creating random subsplits to create a validation set)

i think that kind of behavior should be at the Datamodule level because it can ensure the compatibility between the splits when they've all already been set up

in my draft i used torch.utils.data.Subset to do that and transferred all this behavior to AnomalibDatamodule; two advantages:

downstream classes (see mvtec.py) become cleaner
the behavior is better factorized out

you will see that, for instance, seed and create_validation_set (smt. like that) don't need to be passed downstream to the child class (then to the function)
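To illustrate the idea of splitting at the Datamodule level with index-based views, here is a rough sketch. It does not reproduce the linked draft: `IndexSubset` is a hypothetical plain-Python stand-in that mimics the `torch.utils.data.Subset(dataset, indices)` pattern, and `split_off_validation` and the file names are made up for the example.

```python
import random
from typing import List, Sequence, Tuple


class IndexSubset:
    """Minimal stand-in for torch.utils.data.Subset: a view defined by indices."""

    def __init__(self, dataset: Sequence, indices: Sequence[int]) -> None:
        self.dataset = dataset
        self.indices: List[int] = list(indices)

    def __getitem__(self, idx: int):
        return self.dataset[self.indices[idx]]

    def __len__(self) -> int:
        return len(self.indices)


def split_off_validation(
    dataset: Sequence, ratio: float, rng: random.Random
) -> Tuple[IndexSubset, IndexSubset]:
    """Split one dataset into (remaining, validation) views that cannot overlap."""
    indices = list(range(len(dataset)))
    rng.shuffle(indices)
    n_val = int(len(indices) * ratio)
    return IndexSubset(dataset, indices[n_val:]), IndexSubset(dataset, indices[:n_val])


# The dataset itself knows nothing about seeds or validation sets.
test_set = [f"test_{i:03d}.png" for i in range(10)]
remaining, val = split_off_validation(test_set, ratio=0.3, rng=random.Random(0))
```

Because both views come from a single shuffle of the same index list, the split logic lives in one place and the dataset class needs no `seed` or `create_validation_set` argument.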

anomalib/data/base.py

jpcbertoldo · 2022-09-24T11:02:20Z

@djdameln i'm very willing to help on this refactor in particular but unfortunately i'm struggling to find time to create a neat and well explained PR like yours

maybe can we make a call? i think it would be more productive

djdameln · 2022-09-29T12:31:14Z

@jpcbertoldo Thanks for your comments. As it turns out, it's quite tricky to get this design right, so it's nice to have an extra pair of eyes on this. I agree with some, but not all of your suggestions.

In the way things are being done here (correct me if i'm wrong), the setup is being made even before the prepare_data().

The call to create_dataset can be made at the end of the prepare_data method, so in this case we would create the samples after downloading the data. But yeah, I do agree that this does not really fit the intended use of prepare_data and setup.

in my draft i solved that by making the Dataset follow exactly the same pattern: .prepare_data() (which calls a staticmethod) and .setup() (which will make use of the args from __init__).

I had a look at your draft and correct me if I'm wrong, but I don't think it's necessary to have the implementation of prepare_data in the dataset class. We could keep this functionality in the datamodule. That way it would be called only once for the entire datamodule, instead of three times (once for every subset).

I do see the added value of moving the setup method to the dataset class, because it allows us to instantiate the dataset object in the constructor of the datamodule. That way we can get rid of the awkward create_dataset method of my design.

you will see that, for instance, seed and create_validation_set (smt. like that) dont need to be passed downstream to the child class (then to the function)

This is true for MVTec, where the train/test set is fixed, but not for the Folder dataset where we create a random train/test split at runtime. When following your design, we would still need to ensure somehow that the same seed is used between train_data.setup and test_data.setup.

This was actually the main motivation behind my latest design. By starting with a common dataset object with the 'Full' split, we ensure that we only have to call create_samples once, so there's no need to pass the seed around.
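A rough sketch of that motivation, under the assumption that the samples are a plain list of records (the actual implementation uses a dataframe). `split_full`, the `calls` counter, and the file names are hypothetical and only there to make the point visible.

```python
import random
from typing import Dict, List, Tuple

calls = {"create_samples": 0}


def create_samples(n_images: int) -> List[Dict[str, str]]:
    """Stand-in for scanning a folder; counts how often it is invoked."""
    calls["create_samples"] += 1
    return [{"image_path": f"img_{i:03d}.png"} for i in range(n_images)]


def split_full(
    samples: List[Dict[str, str]], test_ratio: float
) -> Tuple[List[Dict[str, str]], List[Dict[str, str]]]:
    """Partition the full sample list once; no seed is passed around."""
    shuffled = random.sample(samples, k=len(samples))  # shuffled copy
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


full = create_samples(10)            # the common "Full" sample set
train, test = split_full(full, 0.2)  # both subsets come from one shuffle
```

Since train and test are two halves of the same shuffle, they cannot overlap, whatever the state of the random number generator.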

Anyway let's continue the discussion in a call. I'll schedule one for early next week.

Co-authored-by: Joao P C Bertoldo <[email protected]>

…inotoolkit/anomalib into da/datamodules-alternative

jpcbertoldo · 2022-10-27T16:19:21Z

When we perform the dynamic subset splitting in anomaly tasks, we often extract the validation set from the pre-supplied test set (for example, when val_split_mode is set to from_test). In this case, to create the validation set, we first need to create the test set and then randomly split the test set into validation and testing subsets. If we would use stage-related logic in the setup method, we would end up creating and splitting the test set in both the fit and test stages.

Hmm, ok I see. Makes sense.
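The from_test case quoted above can be sketched like this. It is only an illustration in plain Python: `OneShotModule`, `split_test`, and the `split_calls` counter are hypothetical, and the even 50/50 split is an arbitrary choice for the example.

```python
import random
from typing import List, Optional, Tuple


def split_test(test_files: List[str]) -> Tuple[List[str], List[str]]:
    """Unseeded random split of the test set into (val, test) halves."""
    shuffled = random.sample(test_files, k=len(test_files))
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


class OneShotModule:
    def __init__(self, test_files: List[str]) -> None:
        self._test_files = list(test_files)
        self.val_data: Optional[List[str]] = None
        self.test_data: Optional[List[str]] = None
        self.split_calls = 0

    def setup(self, stage: Optional[str] = None) -> None:
        # Stage-independent: the split happens on the first call only, so
        # the fit and test stages see exactly the same val/test subsets.
        if self.val_data is not None:
            return
        self.val_data, self.test_data = split_test(self._test_files)
        self.split_calls += 1


dm = OneShotModule([f"test_{i:03d}.png" for i in range(8)])
dm.setup("fit")
val_during_fit = list(dm.val_data)
dm.setup("test")
val_during_test = list(dm.val_data)
```

With stage-dependent logic, `split_test` would run once per stage and the two calls could only agree if a seed were threaded through both.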

This would also mean that we would have to pass a seed to the subset splitting, to ensure that we'll end up with the same val/test split during validation and testing.

Got your point but still would be nice to have a seed attribute (enforced to be not None when something random is involved, maybe?) in the datamodule, no?

Right now there is that option using the random_split but no seed being kept in the datamodule to keep a trace.
And I think it is ok to stop at the datamodule level, the datasets from the split shouldn't need to keep track of the seed IMO.

Btw, it'd be nice to have an option ValSplitMode.NONE for the case where the user doesn't care about having a validation split (my case sometimes 🙃 ). (don't forget to adapt the logic in is_setup() in that case)
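Both suggestions could look roughly like the sketch below. This is an assumption-laden illustration, not the anomalib implementation: the enum members, the `random_split` signature, and `SeededModule` are hypothetical, and a one-time setup guard is left out for brevity.

```python
import random
from enum import Enum
from typing import List, Optional, Tuple


class ValSplitMode(str, Enum):
    NONE = "none"
    FROM_TEST = "from_test"


def random_split(
    samples: List[str], ratio: float, seed: Optional[int] = None
) -> Tuple[List[str], List[str]]:
    """Split samples into two parts using a local, optionally seeded RNG."""
    rng = random.Random(seed)  # local generator; global state is untouched
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_second = int(len(shuffled) * ratio)
    return shuffled[n_second:], shuffled[:n_second]


class SeededModule:
    def __init__(
        self,
        test_files: List[str],
        val_split_mode: ValSplitMode,
        val_split_ratio: float = 0.5,
        seed: Optional[int] = None,
    ) -> None:
        self._test_files = list(test_files)
        self.val_split_mode = val_split_mode
        self.val_split_ratio = val_split_ratio
        self.seed = seed  # kept on the datamodule only, never on the datasets
        self.val_data: Optional[List[str]] = None
        self.test_data: Optional[List[str]] = None

    def setup(self, stage: Optional[str] = None) -> None:
        if self.val_split_mode == ValSplitMode.FROM_TEST:
            self.test_data, self.val_data = random_split(
                self._test_files, self.val_split_ratio, self.seed
            )
        else:
            # ValSplitMode.NONE: keep the test set whole, no validation set.
            self.test_data, self.val_data = list(self._test_files), None


files = [f"test_{i:03d}.png" for i in range(10)]
first = SeededModule(files, ValSplitMode.FROM_TEST, seed=42)
second = SeededModule(files, ValSplitMode.FROM_TEST, seed=42)
no_val = SeededModule(files, ValSplitMode.NONE)
for module in (first, second, no_val):
    module.setup()
```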

I feel that having to pass a seed around between different parts of the code to ensure non-overlapping subsets is error-prone and should be avoided.

Yup.

I guess the major drawback of this approach is that it leads to higher memory consumption, but since our dataset classes do not keep all of their images in memory but rather read the images from the file system when needed in __getitem__, I expect this to be negligible in most cases.

Yup, good point!

Are you going to make any other minor changes?

... decided to deviate from the intended use of the stage argument in setup....

I think it'd be nice to explicitly state this in the code and/or doc for the sake of the record and to have future devs not question themselves hahaha.

djdameln · 2022-10-28T13:22:57Z

Got your point but still would be nice to have a seed attribute (enforced to be not None when something random is involved, maybe?) in the datamodule, no?

Good idea, I agree that this could be useful to have. I've added the seed argument to the datamodule. Please note that we're not using this argument when running the training from the entrypoint scripts, because we already set the global seed using seed_everything. So we don't need to pass the seed to ensure consistency between runs (but I still see the added value of the seed parameter for custom use-cases).

Btw, it'd be nice to have an option ValSplitMode.NONE for the case where the user doesn't care about having a validation split (my case sometimes 🙃 ). (don't forget to adapt the logic in is_setup() in that case)

Done.

Are you going to make any other minor changes?

We're planning to merge this to a feature branch for now, as we're working on adding more functionality to the data side of the library (synthetic anomaly generation, support for video datasets), and we don't want to expose this to the main branch until it's stable. So we'll keep making incremental changes to the datamodules on the feature branch. In terms of overall design I don't expect many more changes though, so if you want to start building stuff on top of these base classes that should be fine. You could just target your PR to the feature branch.

I think it'd be nice to explicitly state this in the code and/or doc for the sake of the record and to have future devs not question themselves hahaha.

I've added an explanation to the docstring for now. We'll probably update the documentation on the datamodules at a later point.

anomalib/data/base/datamodule.py

jpcbertoldo

nice :)

So we don't need to pass the seed to ensure consistency between runs (but I still see the added value of the seed parameter for custom use-cases)

Yes, you are right.
BUT, some ways of ensuring consistency are better than others haha.
Let's leave that for another chat : )

Thanks for the changes, great work!

Co-authored-by: Joao P C Bertoldo <[email protected]>

* New datamodules design (#572) * move sample generation to datamodule instead of dataset * move sample generation from init to setup * remove inference stage and add base classes * replace dataset classes with AnomalibDataset * move setup to base class, create samples as class method * update docstrings * refactor btech to new format * allow training with no anomalous data * remove MVTec name from comment * raise NotImplementedError in base class * allow both png and bmp images for btech * use label_index to check if dataset contains anomalous images * refactor getitem in dataset class * use iloc for indexing * move dataloader getters to base class * refactor to add validate stage in setup * implement alternative datamodules solution * small improvements * improve design * remove unused constructor arguments * adapt btech to new design * add prepare_data method for mvtec * implement more generic random splitting function * update docstrings for folder module * ensure type consistency when performing operations on dataset * change imports * change variable names * replace pass with NotImplementedError * allow training on folder without test images * use relative path for normal_test_dir * fix dataset tests * update validation set parameter in configs * change default argument * use setter for samples * hint options for val_split_mode * update assert message and docstring * revert name change dataset vs datamodule * typing and docstrings * remove samples argument from dataset constructor * val/test -> eval * remove Split.Full from enum * sort samples when setting * update warn message * formatting * use setter when creating samples in dataset classes * add tests for new dataset class * add test case for label aware random split * update parameter name in inferencers * move _setup implementation to base class * address codacy issues * fix pylint issues * codacy * update example dataset config in docs * fix test * move base classes to separate files (avoid circular 
import) * add base classes * update docstring * fix imports * validation_split_mode -> val_split_mode * update docs * Update anomalib/data/base/dataset.py Co-authored-by: Joao P C Bertoldo <[email protected]> * get length from self.samples * assert unique indices * check is_setup for individual datasets Co-authored-by: Joao P C Bertoldo <[email protected]> * remove assert in __getitem_\ Co-authored-by: Joao P C Bertoldo <[email protected]> * Update anomalib/data/btech.py Co-authored-by: Joao P C Bertoldo <[email protected]> * clearer assert message * clarify list inversion in comment * comments and typing * validate contents of samples dataframe before setting * add file paths check * add seed to random_split function * fix expected columns * fix typo * add seed parameter to datamodules * set global seed in test entrypoint * add NONE option to valsplitmode * clarify setup behaviour in docstring * fix typo Co-authored-by: Joao P C Bertoldo <[email protected]> Co-authored-by: Joao P C Bertoldo <[email protected]> * Video Datamodules (#676) * move sample generation to datamodule instead of dataset * move sample generation from init to setup * remove inference stage and add base classes * replace dataset classes with AnomalibDataset * move setup to base class, create samples as class method * update docstrings * refactor btech to new format * allow training with no anomalous data * remove MVTec name from comment * raise NotImplementedError in base class * allow both png and bmp images for btech * use label_index to check if dataset contains anomalous images * refactor getitem in dataset class * use iloc for indexing * move dataloader getters to base class * refactor to add validate stage in setup * implement alternative datamodules solution * small improvements * improve design * remove unused constructor arguments * adapt btech to new design * add prepare_data method for mvtec * implement more generic random splitting function * update docstrings for folder module * 
ensure type consistency when performing operations on dataset * change imports * change variable names * replace pass with NotImplementedError * allow training on folder without test images * use relative path for normal_test_dir * fix dataset tests * update validation set parameter in configs * change default argument * use setter for samples * hint options for val_split_mode * update assert message and docstring * revert name change dataset vs datamodule * typing and docstrings * remove samples argument from dataset constructor * val/test -> eval * remove Split.Full from enum * sort samples when setting * update warn message * formatting * use setter when creating samples in dataset classes * add tests for new dataset class * add test case for label aware random split * update parameter name in inferencers * move _setup implementation to base class * address codacy issues * fix pylint issues * codacy * update example dataset config in docs * fix test * move base classes to separate files (avoid circular import) * add base classes * update docstring * fix imports * validation_split_mode -> val_split_mode * update docs * Update anomalib/data/base/dataset.py Co-authored-by: Joao P C Bertoldo <[email protected]> * get length from self.samples * assert unique indices * check is_setup for individual datasets Co-authored-by: Joao P C Bertoldo <[email protected]> * remove assert in __getitem_\ Co-authored-by: Joao P C Bertoldo <[email protected]> * Update anomalib/data/btech.py Co-authored-by: Joao P C Bertoldo <[email protected]> * clearer assert message * clarify list inversion in comment * comments and typing * validate contents of samples dataframe before setting * add file paths check * add seed to random_split function * fix expected columns * fix typo * add pedestrian and avenue datasets and video utils * add seed parameter to datamodules * set global seed in test entrypoint * add NONE option to valsplitmode * clarify setup behaviour in docstring * add basic 
visualization for video datasets * simplify ucsdped implementation * add ucsd and avenue to __all__ * add default value for task * add tests for ucsd and avenue * add tests for video dataset and utils * add download info for avenue dataset * add download info for ucsd pedestrian dataset * more consistent naming * fix path to masks folder in gt dir * pass original image in batch to facilitate visualization * convert mask files for avenue * suppress warning due to torchvision bug * fix bug in avenue masks * store visualizations for each video in separate folder * rename parameters * add warning for clip_length > 1 * fix dataset tests * fix labels tensor shape bug * add pyav to requirements * add description for avenue dataset * use pathlib * Update anomalib/data/avenue.py Co-authored-by: Samet Akcay <[email protected]> * Update anomalib/data/avenue.py Co-authored-by: Samet Akcay <[email protected]> * Update anomalib/data/utils/video.py Co-authored-by: Samet Akcay <[email protected]> * Update anomalib/data/base/video.py Co-authored-by: Samet Akcay <[email protected]> * Update anomalib/data/base/video.py Co-authored-by: Samet Akcay <[email protected]> * Update anomalib/data/ucsd_ped.py Co-authored-by: Samet Akcay <[email protected]> * import video dataset from base * fix bug when collecting ucsd samples * clean up datamodules tests * fix tests * remove redundant test cases * retrieve masks as numpy array * use pathlib * variable name * pathlib * use preprocesser from arguments * fix indexing bug Co-authored-by: Joao P C Bertoldo <[email protected]> Co-authored-by: Samet Akcay <[email protected]> * Update lightning_inference.py * Make `val split ratio` configurable (#760) * make val split ratio configurable * use DeprecationWarning, update config key * Add support for Detection task type (#732) * add basic support for detection task * use enum for task type * formatting * small bugfix * add unit tests for bounding box conversion * update error message * use as_tensor * 
typing and docstring * explicit keyword arguments * simplify bbox handling in video dataset * docstring consistency * add missing licenses * add whitespace for readability * add missing license * Update anomalib/data/utils/boxes.py Co-authored-by: Samet Akcay <[email protected]> * Revert "Update anomalib/data/utils/boxes.py" This reverts commit cec6138. * add test case for custom collate function * docstring * add integration tests for detection dataloading * extend and clean up datamodules tests * add detection task type to visualizer tests * only show pred_boxes during inference * add detection support for torch inference * add detection support for openvino inference * test inference for all task types * pylint Co-authored-by: Samet Akcay <[email protected]> * [Datamodules] Update deprecation messages (#764) * update deprecation messages * raise warnings as DeprecationWarning * Improve image source parsing for Folder dataset (#784) * mask -> mask_dir * properly handle absolute and relative paths * make root path parameter optional * formatting * path -> root * update docs * remove options hint for name parameter * refactor function * Update anomalib/config/config.py Co-authored-by: Samet Akcay <[email protected]> * Update anomalib/config/config.py Co-authored-by: Samet Akcay <[email protected]> * make root and abnormal_dir optional * Update anomalib/data/folder.py Co-authored-by: Samet Akcay <[email protected]> Co-authored-by: Samet Akcay <[email protected]> * Synthetic anomaly for testing and validation (#634) * move sample generation to datamodule instead of dataset * move sample generation from init to setup * remove inference stage and add base classes * replace dataset classes with AnomalibDataset * move setup to base class, create samples as class method * update docstrings * refactor btech to new format * allow training with no anomalous data * remove MVTec name from comment * raise NotImplementedError in base class * allow both png and bmp images for 
btech * use label_index to check if dataset contains anomalous images * refactor getitem in dataset class * use iloc for indexing * move dataloader getters to base class * refactor to add validate stage in setup * implement alternative datamodules solution * small improvements * improve design * remove unused constructor arguments * adapt btech to new design * add prepare_data method for mvtec * implement more generic random splitting function * update docstrings for folder module * ensure type consistency when performing operations on dataset * change imports * change variable names * replace pass with NotImplementedError * allow training on folder without test images * use relative path for normal_test_dir * fix dataset tests * update validation set parameter in configs * change default argument * use setter for samples * hint options for val_split_mode * update assert message and docstring * revert name change dataset vs datamodule * typing and docstrings * remove samples argument from dataset constructor * val/test -> eval * remove Split.Full from enum * sort samples when setting * update warn message * formatting * use setter when creating samples in dataset classes * add tests for new dataset class * add test case for label aware random split * update parameter name in inferencers * move _setup implementation to base class * address codacy issues * fix pylint issues * codacy * update example dataset config in docs * fix test * move base classes to separate files (avoid circular import) * add synthetic dataset class * move augmenter to data directory * add base classes * update docstring * use synthetic dataset in base datamodule * fix imports * clean up synthetic anomaly dataset implementation * fix mistake in augmenter * change default split ratio * remove accidentally added file * validation_split_mode -> val_split_mode * update docs * Update anomalib/data/base/dataset.py Co-authored-by: Joao P C Bertoldo <[email protected]> * get length from self.samples * 
assert unique indices * check is_setup for individual datasets Co-authored-by: Joao P C Bertoldo <[email protected]> * remove assert in __getitem_\ Co-authored-by: Joao P C Bertoldo <[email protected]> * Update anomalib/data/btech.py Co-authored-by: Joao P C Bertoldo <[email protected]> * clearer assert message * clarify list inversion in comment * comments and typing * validate contents of samples dataframe before setting * add file paths check * add seed to random_split function * fix expected columns * fix typo * add seed parameter to datamodules * set global seed in test entrypoint * add NONE option to valsplitmode * clarify setup behaviour in docstring * add logging message * use val_split_ratio for synthetic validation set * pathlib * make synthetic anomaly available for test set * update configs * add tests * simplify test set splitting logic * update docstring * add missing licence * split_normal_and_anomalous -> split_by_label * VideoAnomalib -> AnomalibVideo Co-authored-by: Joao P C Bertoldo <[email protected]> * Bugfixes for Datamodules feature branch (#800) * properly handle NoneType mask_dir and add test case * fix wrong deprecation handling * Deprecate PreProcessor (#795) * deprecate PreProcessor * update configs * update deprecation messages * update video dataset * update inference dataset * move transforms to data module * update and extend transform tests * fix cyclic import * add validity checks for image size and center crop * pass image size as tuple * update path to get_transforms * update error message * fix center crop tuple conversion * update inferencers * remove draem transform config * update changelog * fix cyclic import * add crop size vs image size check * improve readability * mypy * use enum to configure input normalization * update lightning inference * update inference dataset * [Datamodules] Fix bug in bbox score to image score conversion (#803) handle empty box predictions * Improve handling of `test_split_mode='none'` and 
`val_split_mode='none'` (#801) * enable none as split mode * use get to retrieve config keys * update deprecation message and config key * fix to float transform * Detection improvements (#820) * apply pixel threshold to bbox detections * allow visualizing normal boxes * normalize box scores * fix bbox logic in base anomaly module * boxes_scores -> box_scores * fix inferencers * update changelog * update csflow config to new format * remove unused imports * line length * suppress bandit warnings * use torch rng in augmenter * use tuple instead of list * add missing params to dosctring * add missing licence information * COLS -> COLUMNS * typing and variable naming * remove duplicate parameter in docstring * im_dir -> image_dir * typing and docstring * typing * ValSplitMode -> ValidationSplitMode * add missing licence * rename variable * remove empty comment * remove unused class attribute * [Detection] Compute box score when generating boxes from masks (#828) * infer box scores from anomaly maps * discard single pixel boxes * revert discard single pixel boxes * add test case for bbox scores * update torch inferencer * minor refactor * revert val_split_mode -> validation_split_mode Co-authored-by: Joao P C Bertoldo <[email protected]> Co-authored-by: Samet Akcay <[email protected]>

typing and docstring * explicit keyword arguments * simplify bbox handling in video dataset * docstring consistency * add missing licenses * add whitespace for readability * add missing license * Update anomalib/data/utils/boxes.py Co-authored-by: Samet Akcay <[email protected]> * Revert "Update anomalib/data/utils/boxes.py" This reverts commit cec6138. * add test case for custom collate function * docstring * add integration tests for detection dataloading * extend and clean up datamodules tests * add detection task type to visualizer tests * only show pred_boxes during inference * add detection support for torch inference * add detection support for openvino inference * test inference for all task types * pylint Co-authored-by: Samet Akcay <[email protected]> * [Datamodules] Update deprecation messages (#764) * update deprecation messages * raise warnings as DeprecationWarning * Improve image source parsing for Folder dataset (#784) * mask -> mask_dir * properly handle absolute and relative paths * make root path parameter optional * formatting * path -> root * update docs * remove options hint for name parameter * refactor function * Update anomalib/config/config.py Co-authored-by: Samet Akcay <[email protected]> * Update anomalib/config/config.py Co-authored-by: Samet Akcay <[email protected]> * make root and abnormal_dir optional * Update anomalib/data/folder.py Co-authored-by: Samet Akcay <[email protected]> Co-authored-by: Samet Akcay <[email protected]> * Synthetic anomaly for testing and validation (#634) * move sample generation to datamodule instead of dataset * move sample generation from init to setup * remove inference stage and add base classes * replace dataset classes with AnomalibDataset * move setup to base class, create samples as class method * update docstrings * refactor btech to new format * allow training with no anomalous data * remove MVTec name from comment * raise NotImplementedError in base class * allow both png and bmp images for 
btech * use label_index to check if dataset contains anomalous images * refactor getitem in dataset class * use iloc for indexing * move dataloader getters to base class * refactor to add validate stage in setup * implement alternative datamodules solution * small improvements * improve design * remove unused constructor arguments * adapt btech to new design * add prepare_data method for mvtec * implement more generic random splitting function * update docstrings for folder module * ensure type consistency when performing operations on dataset * change imports * change variable names * replace pass with NotImplementedError * allow training on folder without test images * use relative path for normal_test_dir * fix dataset tests * update validation set parameter in configs * change default argument * use setter for samples * hint options for val_split_mode * update assert message and docstring * revert name change dataset vs datamodule * typing and docstrings * remove samples argument from dataset constructor * val/test -> eval * remove Split.Full from enum * sort samples when setting * update warn message * formatting * use setter when creating samples in dataset classes * add tests for new dataset class * add test case for label aware random split * update parameter name in inferencers * move _setup implementation to base class * address codacy issues * fix pylint issues * codacy * update example dataset config in docs * fix test * move base classes to separate files (avoid circular import) * add synthetic dataset class * move augmenter to data directory * add base classes * update docstring * use synthetic dataset in base datamodule * fix imports * clean up synthetic anomaly dataset implementation * fix mistake in augmenter * change default split ratio * remove accidentally added file * validation_split_mode -> val_split_mode * update docs * Update anomalib/data/base/dataset.py Co-authored-by: Joao P C Bertoldo <[email protected]> * get length from self.samples * 
assert unique indices * check is_setup for individual datasets Co-authored-by: Joao P C Bertoldo <[email protected]> * remove assert in __getitem_\ Co-authored-by: Joao P C Bertoldo <[email protected]> * Update anomalib/data/btech.py Co-authored-by: Joao P C Bertoldo <[email protected]> * clearer assert message * clarify list inversion in comment * comments and typing * validate contents of samples dataframe before setting * add file paths check * add seed to random_split function * fix expected columns * fix typo * add seed parameter to datamodules * set global seed in test entrypoint * add NONE option to valsplitmode * clarify setup behaviour in docstring * add logging message * use val_split_ratio for synthetic validation set * pathlib * make synthetic anomaly available for test set * update configs * add tests * simplify test set splitting logic * update docstring * add missing licence * split_normal_and_anomalous -> split_by_label * VideoAnomalib -> AnomalibVideo Co-authored-by: Joao P C Bertoldo <[email protected]> * Bugfixes for Datamodules feature branch (#800) * properly handle NoneType mask_dir and add test case * fix wrong deprecation handling * Deprecate PreProcessor (#795) * deprecate PreProcessor * update configs * update deprecation messages * update video dataset * update inference dataset * move transforms to data module * update and extend transform tests * fix cyclic import * add validity checks for image size and center crop * pass image size as tuple * update path to get_transforms * update error message * fix center crop tuple conversion * update inferencers * remove draem transform config * update changelog * fix cyclic import * add crop size vs image size check * improve readability * mypy * use enum to configure input normalization * update lightning inference * update inference dataset * [Datamodules] Fix bug in bbox score to image score conversion (#803) handle empty box predictions * Improve handling of `test_split_mode='none'` and 
`val_split_mode='none'` (#801) * enable none as split mode * use get to retrieve config keys * update deprecation message and config key * fix to float transform * Detection improvements (#820) * apply pixel threshold to bbox detections * allow visualizing normal boxes * normalize box scores * fix bbox logic in base anomaly module * boxes_scores -> box_scores * fix inferencers * update changelog * update csflow config to new format * remove unused imports * line length * refactor make_mvtec_dataset to improve flexibility * add visa dataset * move download and extract functionality to shared location * move visa subset splitting to separate method * update changelog * add tests for visa dataset * suppress bandit url warning * update test * address PR comments * suppress bandit warnings * use torch rng in augmenter * fix logic in prepare_data * add comments * cleaner zipfile import * address PR comments * use tuple instead of list * add missing params to dosctring * add missing licence information * COLS -> COLUMNS * typing and variable naming * remove duplicate parameter in docstring * im_dir -> image_dir * typing and docstring * typing * ValSplitMode -> ValidationSplitMode * add missing licence * rename variable * remove empty comment * remove unused class attribute * [Detection] Compute box score when generating boxes from masks (#828) * infer box scores from anomaly maps * discard single pixel boxes * revert discard single pixel boxes * add test case for bbox scores * update torch inferencer * minor refactor * revert val_split_mode -> validation_split_mode * use empty string instead of nan as empty mask path * typing Co-authored-by: Joao P C Bertoldo <[email protected]> Co-authored-by: Samet Akcay <[email protected]>

* fix pylint issues * codacy * update example dataset config in docs * fix test * move base classes to separate files (avoid circular import) * add base classes * update docstring * fix imports * validation_split_mode -> val_split_mode * update docs * Update anomalib/data/base/dataset.py Co-authored-by: Joao P C Bertoldo <[email protected]> * get length from self.samples * assert unique indices * check is_setup for individual datasets Co-authored-by: Joao P C Bertoldo <[email protected]> * remove assert in __getitem_\ Co-authored-by: Joao P C Bertoldo <[email protected]> * Update anomalib/data/btech.py Co-authored-by: Joao P C Bertoldo <[email protected]> * clearer assert message * clarify list inversion in comment * comments and typing * validate contents of samples dataframe before setting * add file paths check * add seed to random_split function * fix expected columns * fix typo * add pedestrian and avenue datasets and video utils * add seed parameter to datamodules * set global seed in test entrypoint * add NONE option to valsplitmode * clarify setup behaviour in docstring * Created rbad directory * Keep refactoring region-extractor * rename new_image_sizes to transformed_image_sizes * Renamed the variables in region extractor * post-process function in region extractor * Refactored tile-boxes function * Added feature extractor * Add main.py * Added feature extractor to tests * Update the jupyter notebook * Uncomment loa weights from region.py * Add feature and region extractors * Finished feature-extractor implementation * Rename the algo as rkde * New datamodules design (#572) * move sample generation to datamodule instead of dataset * move sample generation from init to setup * remove inference stage and add base classes * replace dataset classes with AnomalibDataset * move setup to base class, create samples as class method * update docstrings * refactor btech to new format * allow training with no anomalous data * remove MVTec name from comment * raise 
NotImplementedError in base class * allow both png and bmp images for btech * use label_index to check if dataset contains anomalous images * refactor getitem in dataset class * use iloc for indexing * move dataloader getters to base class * refactor to add validate stage in setup * implement alternative datamodules solution * small improvements * improve design * remove unused constructor arguments * adapt btech to new design * add prepare_data method for mvtec * implement more generic random splitting function * update docstrings for folder module * ensure type consistency when performing operations on dataset * change imports * change variable names * replace pass with NotImplementedError * allow training on folder without test images * use relative path for normal_test_dir * fix dataset tests * update validation set parameter in configs * change default argument * use setter for samples * hint options for val_split_mode * update assert message and docstring * revert name change dataset vs datamodule * typing and docstrings * remove samples argument from dataset constructor * val/test -> eval * remove Split.Full from enum * sort samples when setting * update warn message * formatting * use setter when creating samples in dataset classes * add tests for new dataset class * add test case for label aware random split * update parameter name in inferencers * move _setup implementation to base class * address codacy issues * fix pylint issues * codacy * update example dataset config in docs * fix test * move base classes to separate files (avoid circular import) * add base classes * update docstring * fix imports * validation_split_mode -> val_split_mode * update docs * Update anomalib/data/base/dataset.py Co-authored-by: Joao P C Bertoldo <[email protected]> * get length from self.samples * assert unique indices * check is_setup for individual datasets Co-authored-by: Joao P C Bertoldo <[email protected]> * remove assert in __getitem_\ Co-authored-by: Joao P C 
Bertoldo <[email protected]> * Update anomalib/data/btech.py Co-authored-by: Joao P C Bertoldo <[email protected]> * clearer assert message * clarify list inversion in comment * comments and typing * validate contents of samples dataframe before setting * add file paths check * add seed to random_split function * fix expected columns * fix typo * add seed parameter to datamodules * set global seed in test entrypoint * add NONE option to valsplitmode * clarify setup behaviour in docstring * fix typo Co-authored-by: Joao P C Bertoldo <[email protected]> Co-authored-by: Joao P C Bertoldo <[email protected]> * add basic visualization for video datasets * simplify ucsdped implementation * TODO: Investigate torch_model * add ucsd and avenue to __all__ * add default value for task * add tests for ucsd and avenue * add tests for video dataset and utils * add download info for avenue dataset * add download info for ucsd pedestrian dataset * more consistent naming * fix path to masks folder in gt dir * pass original image in batch to facilitate visualization * convert mask files for avenue * suppress warning due to torchvision bug * fix bug in avenue masks * store visualizations for each video in separate folder * rename parameters * add warning for clip_length > 1 * fix dataset tests * fix labels tensor shape bug * add pyav to requirements * Add TODO notes * add todo notes * add description for avenue dataset * use pathlib * Update anomalib/data/avenue.py Co-authored-by: Samet Akcay <[email protected]> * Update anomalib/data/avenue.py Co-authored-by: Samet Akcay <[email protected]> * Update anomalib/data/utils/video.py Co-authored-by: Samet Akcay <[email protected]> * Update anomalib/data/base/video.py Co-authored-by: Samet Akcay <[email protected]> * Update anomalib/data/base/video.py Co-authored-by: Samet Akcay <[email protected]> * Update anomalib/data/ucsd_ped.py Co-authored-by: Samet Akcay <[email protected]> * import video dataset from base * fix bug when collecting 
ucsd samples * clean up datamodules tests * fix tests * remove redundant test cases * add test case for normality model * retrieve masks as numpy array * use pathlib * variable name * pathlib * use preprocesser from arguments * fix indexing bug * Video Datamodules (#676) * move sample generation to datamodule instead of dataset * move sample generation from init to setup * remove inference stage and add base classes * replace dataset classes with AnomalibDataset * move setup to base class, create samples as class method * update docstrings * refactor btech to new format * allow training with no anomalous data * remove MVTec name from comment * raise NotImplementedError in base class * allow both png and bmp images for btech * use label_index to check if dataset contains anomalous images * refactor getitem in dataset class * use iloc for indexing * move dataloader getters to base class * refactor to add validate stage in setup * implement alternative datamodules solution * small improvements * improve design * remove unused constructor arguments * adapt btech to new design * add prepare_data method for mvtec * implement more generic random splitting function * update docstrings for folder module * ensure type consistency when performing operations on dataset * change imports * change variable names * replace pass with NotImplementedError * allow training on folder without test images * use relative path for normal_test_dir * fix dataset tests * update validation set parameter in configs * change default argument * use setter for samples * hint options for val_split_mode * update assert message and docstring * revert name change dataset vs datamodule * typing and docstrings * remove samples argument from dataset constructor * val/test -> eval * remove Split.Full from enum * sort samples when setting * update warn message * formatting * use setter when creating samples in dataset classes * add tests for new dataset class * add test case for label aware random split * 
update parameter name in inferencers * move _setup implementation to base class * address codacy issues * fix pylint issues * codacy * update example dataset config in docs * fix test * move base classes to separate files (avoid circular import) * add base classes * update docstring * fix imports * validation_split_mode -> val_split_mode * update docs * Update anomalib/data/base/dataset.py Co-authored-by: Joao P C Bertoldo <[email protected]> * get length from self.samples * assert unique indices * check is_setup for individual datasets Co-authored-by: Joao P C Bertoldo <[email protected]> * remove assert in __getitem_\ Co-authored-by: Joao P C Bertoldo <[email protected]> * Update anomalib/data/btech.py Co-authored-by: Joao P C Bertoldo <[email protected]> * clearer assert message * clarify list inversion in comment * comments and typing * validate contents of samples dataframe before setting * add file paths check * add seed to random_split function * fix expected columns * fix typo * add pedestrian and avenue datasets and video utils * add seed parameter to datamodules * set global seed in test entrypoint * add NONE option to valsplitmode * clarify setup behaviour in docstring * add basic visualization for video datasets * simplify ucsdped implementation * add ucsd and avenue to __all__ * add default value for task * add tests for ucsd and avenue * add tests for video dataset and utils * add download info for avenue dataset * add download info for ucsd pedestrian dataset * more consistent naming * fix path to masks folder in gt dir * pass original image in batch to facilitate visualization * convert mask files for avenue * suppress warning due to torchvision bug * fix bug in avenue masks * store visualizations for each video in separate folder * rename parameters * add warning for clip_length > 1 * fix dataset tests * fix labels tensor shape bug * add pyav to requirements * add description for avenue dataset * use pathlib * Update anomalib/data/avenue.py 
Co-authored-by: Samet Akcay <[email protected]> * Update anomalib/data/avenue.py Co-authored-by: Samet Akcay <[email protected]> * Update anomalib/data/utils/video.py Co-authored-by: Samet Akcay <[email protected]> * Update anomalib/data/base/video.py Co-authored-by: Samet Akcay <[email protected]> * Update anomalib/data/base/video.py Co-authored-by: Samet Akcay <[email protected]> * Update anomalib/data/ucsd_ped.py Co-authored-by: Samet Akcay <[email protected]> * import video dataset from base * fix bug when collecting ucsd samples * clean up datamodules tests * fix tests * remove redundant test cases * retrieve masks as numpy array * use pathlib * variable name * pathlib * use preprocesser from arguments * fix indexing bug Co-authored-by: Joao P C Bertoldo <[email protected]> Co-authored-by: Samet Akcay <[email protected]> * properly handle batch processing * include batch index in rois tensor * return rkde results as lists * update default rkde config * add basic support for detection task * use enum for task type * formatting * small bugfix * add unit tests for bounding box conversion * update error message * use as_tensor * typing and docstring * explicit keyword arguments * simplify bbox handling in video dataset * docstring consistency * add missing licenses * add whitespace for readability * add missing license * Update anomalib/data/utils/boxes.py Co-authored-by: Samet Akcay <[email protected]> * Revert "Update anomalib/data/utils/boxes.py" This reverts commit cec6138. 
* add test case for custom collate function * docstring * add integration tests for detection dataloading * extend and clean up datamodules tests * add detection task type to visualizer tests * Update lightning_inference.py * only show pred_boxes during inference * add detection support for torch inference * add detection support for openvino inference * test inference for all task types * pylint * Make `val split ratio` configurable (#760) * make val split ratio configurable * use DeprecationWarning, update config key * Add support for Detection task type (#732) * add basic support for detection task * use enum for task type * formatting * small bugfix * add unit tests for bounding box conversion * update error message * use as_tensor * typing and docstring * explicit keyword arguments * simplify bbox handling in video dataset * docstring consistency * add missing licenses * add whitespace for readability * add missing license * Update anomalib/data/utils/boxes.py Co-authored-by: Samet Akcay <[email protected]> * Revert "Update anomalib/data/utils/boxes.py" This reverts commit cec6138. 
* add test case for custom collate function * docstring * add integration tests for detection dataloading * extend and clean up datamodules tests * add detection task type to visualizer tests * only show pred_boxes during inference * add detection support for torch inference * add detection support for openvino inference * test inference for all task types * pylint Co-authored-by: Samet Akcay <[email protected]> * [Datamodules] Update deprecation messages (#764) * update deprecation messages * raise warnings as DeprecationWarning * update rkde * Improve image source parsing for Folder dataset (#784) * mask -> mask_dir * properly handle absolute and relative paths * make root path parameter optional * formatting * path -> root * update docs * remove options hint for name parameter * refactor function * Update anomalib/config/config.py Co-authored-by: Samet Akcay <[email protected]> * Update anomalib/config/config.py Co-authored-by: Samet Akcay <[email protected]> * make root and abnormal_dir optional * Update anomalib/data/folder.py Co-authored-by: Samet Akcay <[email protected]> Co-authored-by: Samet Akcay <[email protected]> * Synthetic anomaly for testing and validation (#634) * move sample generation to datamodule instead of dataset * move sample generation from init to setup * remove inference stage and add base classes * replace dataset classes with AnomalibDataset * move setup to base class, create samples as class method * update docstrings * refactor btech to new format * allow training with no anomalous data * remove MVTec name from comment * raise NotImplementedError in base class * allow both png and bmp images for btech * use label_index to check if dataset contains anomalous images * refactor getitem in dataset class * use iloc for indexing * move dataloader getters to base class * refactor to add validate stage in setup * implement alternative datamodules solution * small improvements * improve design * remove unused constructor arguments * adapt 
btech to new design * add prepare_data method for mvtec * implement more generic random splitting function * update docstrings for folder module * ensure type consistency when performing operations on dataset * change imports * change variable names * replace pass with NotImplementedError * allow training on folder without test images * use relative path for normal_test_dir * fix dataset tests * update validation set parameter in configs * change default argument * use setter for samples * hint options for val_split_mode * update assert message and docstring * revert name change dataset vs datamodule * typing and docstrings * remove samples argument from dataset constructor * val/test -> eval * remove Split.Full from enum * sort samples when setting * update warn message * formatting * use setter when creating samples in dataset classes * add tests for new dataset class * add test case for label aware random split * update parameter name in inferencers * move _setup implementation to base class * address codacy issues * fix pylint issues * codacy * update example dataset config in docs * fix test * move base classes to separate files (avoid circular import) * add synthetic dataset class * move augmenter to data directory * add base classes * update docstring * use synthetic dataset in base datamodule * fix imports * clean up synthetic anomaly dataset implementation * fix mistake in augmenter * change default split ratio * remove accidentally added file * validation_split_mode -> val_split_mode * update docs * Update anomalib/data/base/dataset.py Co-authored-by: Joao P C Bertoldo <[email protected]> * get length from self.samples * assert unique indices * check is_setup for individual datasets Co-authored-by: Joao P C Bertoldo <[email protected]> * remove assert in __getitem_\ Co-authored-by: Joao P C Bertoldo <[email protected]> * Update anomalib/data/btech.py Co-authored-by: Joao P C Bertoldo <[email protected]> * clearer assert message * clarify list inversion in 
comment * comments and typing * validate contents of samples dataframe before setting * add file paths check * add seed to random_split function * fix expected columns * fix typo * add seed parameter to datamodules * set global seed in test entrypoint * add NONE option to valsplitmode * clarify setup behaviour in docstring * add logging message * use val_split_ratio for synthetic validation set * pathlib * make synthetic anomaly available for test set * update configs * add tests * simplify test set splitting logic * update docstring * add missing licence * split_normal_and_anomalous -> split_by_label * VideoAnomalib -> AnomalibVideo Co-authored-by: Joao P C Bertoldo <[email protected]> * Bugfixes for Datamodules feature branch (#800) * properly handle NoneType mask_dir and add test case * fix wrong deprecation handling * Deprecate PreProcessor (#795) * deprecate PreProcessor * update configs * update deprecation messages * update video dataset * update inference dataset * move transforms to data module * update and extend transform tests * fix cyclic import * add validity checks for image size and center crop * pass image size as tuple * update path to get_transforms * update error message * fix center crop tuple conversion * update inferencers * remove draem transform config * update changelog * fix cyclic import * add crop size vs image size check * improve readability * mypy * use enum to configure input normalization * update lightning inference * update inference dataset * expose more parameters and fix wrong return format * fix tdd tests * update config * [Datamodules] Fix bug in bbox score to image score conversion (#803) handle empty box predictions * update config * apply pixel threshold to bbox detections * remove confidence threshold parameter from rkde * hardcode steepness and offset * rename variable * remove unused parameters from config * Improve handling of `test_split_mode='none'` and `val_split_mode='none'` (#801) * enable none as split mode * 
use get to retrieve config keys * update deprecation message and config key * update config with new keys * remove unused parameter * set device in rpn stage * move prediction format conversion to lightning model * clean up torch model * move region- and feature-extractor to separate files * allow visualizing normal boxes * refactor * WIP: simplify region extractor * simplify region extractor * cleanup and docstrings * typing * expose max detections per image parameter * explain configurable parameters * fix wrong config value * remove unnecessary squeeze * box_likelihood -> rcnn_box_threshold * update comments * remove unnecessary typing * separate density estimation stage from torch model * improve readability * change default transform settings * fix to float transform * simplify feature extractor * normalize box scores * further simplify region extractor * update comment * improve prn configurability * remove unnecessary check * use enum for roi stage options * use enum for feature scaling method * re-order parameters * clean up model dir * fix bbox logic in base anomaly module * update key in output dict * boxes_scores -> box_scores * remove notebook * add comments and todo * Detection improvements (#820) * apply pixel threshold to bbox detections * allow visualizing normal boxes * normalize box scores * fix bbox logic in base anomaly module * boxes_scores -> box_scores * fix inferencers * add readme * update changelog * update changelog * update csflow config to new format * initialize max_length as empty tensor * include RKDE in model tests * remove unused imports * line length * move kde classifier to shared location * fix import * re-use RKDE classifier in DFKDE * remove old imports * docstrings * fix codacy issues * load feature extractor weights from url * suppress bandit warnings * use torch rng in augmenter * typing * add fit method to torch model * fix typo * use enum when checking stage * use tuple instead of list * add missing params to dosctring * 
add missing licence information * COLS -> COLUMNS * typing and variable naming * remove duplicate parameter in docstring * im_dir -> image_dir * typing and docstring * typing * ValSplitMode -> ValidationSplitMode * add missing licence * rename variable * remove empty comment * remove unused class attribute * [Detection] Compute box score when generating boxes from masks (#828) * infer box scores from anomaly maps * discard single pixel boxes * revert discard single pixel boxes * add test case for bbox scores * update torch inferencer * minor refactor * revert val_split_mode -> validation_split_mode Co-authored-by: Joao P C Bertoldo <[email protected]> Co-authored-by: Samet <[email protected]>

djdameln added 18 commits September 9, 2022 17:37

move sample generation to datamodule instead of dataset

ee1cfce

move sample generation from init to setup

ec5199e

remove inference stage and add base classes

9f0a35e

replace dataset classes with AnomalibDataset

dea176f

move setup to base class, create samples as class method

62a04f8

update docstrings

e91afad

refactor btech to new format

df4a805

allow training with no anomalous data

c225a83

remove MVTec name from comment

ac0dc8a

raise NotImplementedError in base class

5d90209

allow both png and bmp images for btech

c1e6724

use label_index to check if dataset contains anomalous images

2d70d89

refactor getitem in dataset class

f5f17db

use iloc for indexing

f02065f

move dataloader getters to base class

9cba9da

refactor to add validate stage in setup

5b3e841

implement alternative datamodules solution

f652227

small improvements

0e565a4

djdameln requested review from ashwinvaidya17 and samet-akcay September 22, 2022 08:57

github-actions bot added the Data label Sep 22, 2022

ashwinvaidya17 reviewed Sep 22, 2022

anomalib/data/base.py: 3 review threads (outdated, resolved)

djdameln mentioned this pull request Sep 22, 2022

sketch my suggestions #564

Closed

jpcbertoldo reviewed Sep 24, 2022

anomalib/data/base.py: 2 review threads (outdated, resolved)

djdameln added 4 commits October 7, 2022 12:20

improve design

297195a

remove unused constructor arguments

94cabb7

adapt btech to new design

1ee8a96

add prepare_data method for mvtec

7fc5483

djdameln and others added 13 commits October 21, 2022 11:59

check is_setup for individual datasets

3e77014

Co-authored-by: Joao P C Bertoldo <[email protected]>

remove assert in __getitem__

ede213a

Co-authored-by: Joao P C Bertoldo <[email protected]>

Update anomalib/data/btech.py

f5e2d24

Co-authored-by: Joao P C Bertoldo <[email protected]>

clearer assert message

d9e1369

clarify list inversion in comment

2e6bc60

comments and typing

af0cd99

Merge branch 'da/datamodules-alternative' of https://github.com/openvinotoolkit/anomalib into da/datamodules-alternative

d508786

Merge branch 'main' into da/datamodules-alternative

c85713c

validate contents of samples dataframe before setting

5ee8480

add file paths check

a5e876a

add seed to random_split function

c490e30

fix expected columns

4808287

fix typo

10bbf9c

jpcbertoldo mentioned this pull request Oct 24, 2022

Add support for Photometric stereo (PS)-AD dataset #648

Closed

djdameln changed the base branch from main to feature/datamodules October 25, 2022 12:06

djdameln added 4 commits October 28, 2022 14:10

add seed parameter to datamodules

81d3ca3

set global seed in test entrypoint

b372dd1

add NONE option to valsplitmode

e07a12c

clarify setup behaviour in docstring

ffdb47c

jpcbertoldo reviewed Oct 28, 2022

anomalib/data/base/datamodule.py: 1 review thread (outdated, resolved)

jpcbertoldo reviewed Oct 28, 2022

fix typo

dedfd4b

Co-authored-by: Joao P C Bertoldo <[email protected]>

djdameln merged commit b21045b into feature/datamodules Oct 31, 2022

djdameln deleted the da/datamodules-alternative branch October 31, 2022 10:26

monkeycc mentioned this pull request Feb 28, 2023

Support training with only normal images,How to configure #931

Closed

djdameln commented Sep 22, 2022 (edited by samet-akcay)

ashwinvaidya17 left a comment

jpcbertoldo left a comment

jpcbertoldo commented Sep 24, 2022

djdameln commented Sep 29, 2022

jpcbertoldo commented Oct 27, 2022 (edited)

djdameln commented Oct 28, 2022 (edited)

jpcbertoldo left a comment (edited)
