Add Custom Dataset Training Support #154
…ib into feature/data/btad
Thanks, great addition! I didn't manually test the custom dataset format yet, but I'll do that and will post here if I run into any issues.
README.md
Outdated
task: segmentation # classification or segmentation
mask: <path/to/mask/annotations> #optional
extensions: null
split_ratio: 0.2
Maybe add some comments for the parameters that may be hard to understand, e.g.
split_ratio: 0.2 # ratio of the normal images that will be used to create a test split
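To make the suggested comment concrete, here is a minimal sketch of what holding out a fraction of the normal images could look like. The function name `split_normal_images` and the `seed` parameter are hypothetical and do not come from the PR; the actual implementation in anomalib may differ.

```python
import random
from typing import List, Sequence, Tuple


def split_normal_images(
    paths: Sequence[str], split_ratio: float = 0.2, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Hold out `split_ratio` of the normal images as a test split.

    Hypothetical helper for illustration only; not part of the PR.
    Returns (train_paths, test_paths).
    """
    if not 0.0 <= split_ratio <= 1.0:
        raise ValueError("split_ratio must be in [0, 1]")
    shuffled = list(paths)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * split_ratio)
    return shuffled[n_test:], shuffled[:n_test]


normal_paths = [f"normal/{i:03d}.png" for i in range(10)]
train_paths, test_paths = split_normal_images(normal_paths, split_ratio=0.2)
```

With ten normal images and `split_ratio: 0.2`, two images end up in the test split and eight remain for training.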
README.md
Outdated
It is also possible to train on a custom dataset. To do so, `data` section in `config.yaml` is to be modified as follows:
```yaml
dataset:
  name: custom
```
Maybe we should use `format` here instead of `name`. For MVTec we also have a `format` field in addition to `name`. The way I see it, `format` determines which dataset class is used under the hood, while `name` can be anything that identifies the specific dataset that is used.
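The distinction could be sketched roughly as follows. This is only an illustration of the idea: the placeholder classes, the `DATASET_CLASSES` registry and `get_dataset_class` are hypothetical names, not anomalib's actual API.

```python
from typing import Dict, Mapping


class MVTecDataset:
    """Placeholder standing in for the MVTec-specific dataset class."""


class FolderDataset:
    """Placeholder standing in for the folder-structure dataset class."""


# Hypothetical registry: `format` selects the class, `name` is only a label.
DATASET_CLASSES: Dict[str, type] = {
    "mvtec": MVTecDataset,
    "folder": FolderDataset,
}


def get_dataset_class(dataset_config: Mapping[str, str]) -> type:
    """Pick the dataset class from `format`, ignoring `name`."""
    dataset_format = dataset_config["format"].lower()
    if dataset_format not in DATASET_CLASSES:
        raise ValueError(f"Unknown dataset format: {dataset_format!r}")
    return DATASET_CLASSES[dataset_format]


# `name` can be any identifier; only `format` affects the class that is used.
config = {"name": "my-pcb-images", "format": "folder"}
dataset_class = get_dataset_class(config)
```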
anomalib/config/config.py
Outdated
@@ -177,7 +177,8 @@ def get_configurable_parameters(
    config = update_input_size_config(config)

    # Project Configs
    project_path = Path(config.project.path) / config.model.name / config.dataset.name / config.dataset.category
    category = config.dataset.category if "category" in config.dataset.keys() else ""
Maybe it would be a bit clearer if we check the dataset type here, and only add the category to the path if the type is MVTec.
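A minimal sketch of that suggestion, assuming the dataset type is available as a string such as `"mvtec"`. The helper `build_project_path` and its parameters are hypothetical and are not the code that was merged.

```python
from pathlib import Path
from typing import Optional


def build_project_path(
    project_root: str,
    model_name: str,
    dataset_name: str,
    dataset_format: str,
    category: Optional[str] = None,
) -> Path:
    """Build the results directory, adding the category only for MVTec.

    Hypothetical helper for illustration; `dataset_format` is an assumed
    field that identifies the dataset type.
    """
    project_path = Path(project_root) / model_name / dataset_name
    if dataset_format.lower() == "mvtec" and category:
        project_path = project_path / category
    return project_path


mvtec_path = build_project_path("results", "padim", "mvtec", "mvtec", "bottle")
folder_path = build_project_path("results", "padim", "my-data", "folder")
```

Checking the type explicitly makes the intent visible at the call site, instead of relying on whether a `category` key happens to exist in the config.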
anomalib/data/custom.py
Outdated
    return samples


class CustomDataset(Dataset):
I'm not sure about the naming. Maybe `FolderDataset` would be more appropriate? Custom sounds a bit like users can choose their own 'custom' format. But this class represents a dataset that follows a fixed format based on the folder structure of the data.
anomalib/data/custom.py
Outdated
The dataset expects that mask annotation filenames must be same as the original filename.
To show an example, we therefore need to modify the mask filenames in MVTec dataset.

>>> # Rename MVTec mask annotations so that they are the same as image filanames
I'm afraid the example in the docstring might confuse users (why use the custom dataset class for MVTec if there is a dataset class specific to MVTec?). Maybe we could keep it simple and start the example with the assumption that the user has a folder of normal images and a folder of abnormal images, and explicitly state this at the beginning of the example.
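A simpler example along those lines might start from two folders, one with normal images and one with abnormal images. The sketch below is only an illustration: `list_samples`, the label convention (0 for normal, 1 for abnormal) and the extension list are assumptions, not the PR's actual code.

```python
from pathlib import Path
from typing import List, Sequence, Tuple

# Assumed default set of image extensions; the real default may differ.
IMG_EXTENSIONS = (".png", ".jpg", ".jpeg", ".bmp")


def list_samples(
    normal_dir: str,
    abnormal_dir: str,
    extensions: Sequence[str] = IMG_EXTENSIONS,
) -> List[Tuple[Path, int]]:
    """Collect (image_path, label) pairs from a normal and an abnormal folder.

    Hypothetical helper: label 0 marks normal images, label 1 abnormal ones.
    """
    samples: List[Tuple[Path, int]] = []
    for directory, label in ((Path(normal_dir), 0), (Path(abnormal_dir), 1)):
        # Sort so that the sample order is deterministic.
        for path in sorted(directory.iterdir()):
            if path.is_file() and path.suffix.lower() in extensions:
                samples.append((path, label))
    return samples
```

Calling `list_samples("data/good", "data/defect")` would then return every image in the two (hypothetical) folders together with its label, skipping files that are not images.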
…o feature/data/custom-dataset
…o feature/data/custom-dataset
Thanks!
Description
This PR adds custom dataset support.
Fixes #147 (Support training with custom MVTec like dataset but without masks (ground truths))
Changes
Checklist