Make cache annotation optional #1332
Conversation
So, let's first make it really, really clear what the cache annotation does. Everyone seems to feel differently about what it does, so we should clarify EXACTLY what it does. I believe it was initially intended to cache resized images on disk, but AFAIK this is never used nowadays, so that part of the code can safely be deleted as dead code.

Secondly, caching annotations is nothing but converting the format stored on disk (COCO JSON) into something more usable by the dataset itself. And this is where our problem comes from.

First issue: we have never measured (benchmarked) how much time it takes to parse a COCO annotation file and to fetch the sample at a specified index. We should check whether increasing the dataset size from 1e3 to 1e4, 1e5, 1e6, 1e7 stays O(n) for parsing and O(1) for accessing a sample. If it doesn't, that should be fixed.

Action point 1: benchmark where we are now (a sketch follows below).

Second issue: I feel the whole concept of the base detection dataset class is plainly wrong and should be rewritten from scratch. It must receive numpy arrays of boxes/labels/sample ids.

Action point 2: I believe we should aim to make the base class as simple as possible:
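A rough sketch of what such a simplified base class could look like (illustrative names only, not the actual super-gradients API): all format-specific parsing happens once, up front, and the class merely holds plain numpy arrays.

```python
from typing import List

import numpy as np
from torch.utils.data import Dataset


class SimpleDetectionDataset(Dataset):
    """Illustrative base class that receives pre-parsed annotations as arrays."""

    def __init__(self, sample_ids: List[str], boxes: List[np.ndarray], labels: List[np.ndarray]):
        # One entry per sample: boxes[i] has shape (num_boxes, 4),
        # labels[i] has shape (num_boxes,).
        assert len(sample_ids) == len(boxes) == len(labels)
        self.sample_ids = sample_ids
        self.boxes = boxes
        self.labels = labels

    def __len__(self) -> int:
        return len(self.sample_ids)

    def __getitem__(self, index: int):
        # O(1) access: all format-specific parsing happened before __init__.
        return self.sample_ids[index], self.boxes[index], self.labels[index]
```

And for action point 1, a minimal benchmark could compare parse time and per-sample access time across dataset sizes. This sketch uses pycocotools directly; the per-size annotation files are hypothetical pre-generated subsets of a real COCO JSON.

```python
import time

from pycocotools.coco import COCO

for n in (1_000, 10_000, 100_000, 1_000_000):
    ann_file = f"instances_subset_{n}.json"  # hypothetical pre-generated subset

    # Parsing the whole file: expected to grow linearly, O(n).
    t0 = time.perf_counter()
    coco = COCO(ann_file)
    parse_s = time.perf_counter() - t0

    # Fetching one sample's annotations: expected to stay roughly constant, O(1).
    img_ids = coco.getImgIds()
    t0 = time.perf_counter()
    _ = coco.loadAnns(coco.getAnnIds(imgIds=img_ids[n // 2]))
    access_s = time.perf_counter() - t0

    print(f"n={n:>9,}  parse={parse_s:.3f}s  access={access_s * 1e6:.1f}us")
```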
That's my take on how it could be improved.
See my comment in the discussion
There are two things: 1. COCO annotations loading, and 2. how we process these annotations.
I agree that it should be rewritten, but the main issue is backward compatibility.
I don't see your point about the first issue.
src/super_gradients/training/datasets/detection_datasets/detection_dataset.py
LGTM
Goal
Add an option to run without caching.

Notes
Why is it more complex than it may seem at first? `ignore_empty_annotations` requires going over the whole dataset (calling `_load_annotation` for every index) to know which indexes are empty, which in turn is needed to know the dataset length.
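To make the conflict concrete, here is a rough sketch with illustrative names (only `ignore_empty_annotations` and `_load_annotation` come from the actual code): when the flag is set, `__init__` must call `_load_annotation` on every index just to compute the length, which is exactly the full pass that annotation caching was amortizing.

```python
class LazyDetectionDataset:
    """Illustrative dataset showing why ignore_empty_annotations forces a full scan."""

    def __init__(self, n_samples: int, ignore_empty_annotations: bool = False):
        if ignore_empty_annotations:
            # O(n) pass over all annotations just to know __len__.
            self._kept_indexes = [
                i for i in range(n_samples)
                if len(self._load_annotation(i)["target"]) > 0
            ]
        else:
            # Without the filter, the length is known up front and
            # _load_annotation can be deferred to __getitem__.
            self._kept_indexes = list(range(n_samples))

    def __len__(self) -> int:
        return len(self._kept_indexes)

    def __getitem__(self, index: int) -> dict:
        return self._load_annotation(self._kept_indexes[index])

    def _load_annotation(self, index: int) -> dict:
        # Dummy stand-in for the real per-sample loading:
        # every third sample has no boxes.
        target = [] if index % 3 == 0 else [[0.0, 0.0, 1.0, 1.0]]
        return {"target": target}
```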