auto-annotation fails for large videos #1224

Closed
gitunit opened this issue Mar 2, 2020 · 10 comments
Labels
enhancement New feature or request

Comments


gitunit commented Mar 2, 2020

I attempted to auto-annotate a large video (2 hours, approx. 140,000 frames) with a YOLOv3 model (OpenVINO). The first attempt got stuck, and no progress was visible in the UI even after several days. Later I ran docker-compose down and up again and tried several more times; it always got stuck, and only recently did popups appear saying that the task failed.
After another attempt I used htop to check how much CPU and RAM the process uses. I saw that as soon as RAM is fully occupied, the auto-annotation stops making progress (although the process itself stays alive).
Long story short, long videos do not seem to work. With shorter videos there is no issue.

Author

gitunit commented Mar 2, 2020

I guess there must be some kind of memory leak. I'm currently testing with another model (SSD), which doesn't show that rapid increase in RAM usage.

Author

gitunit commented Mar 2, 2020

I've just tried YOLOv3 auto-annotation on my local machine with 32 GB of RAM, and it used all of it. Even the 30 GB of swap got fully occupied, so approx. 62 GB was used. There is obviously a big memory leak.
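
To narrow down where memory grows on the Python side, the standard-library tracemalloc module can compare two snapshots. The following is only a minimal sketch: `top_growth` and the `work` callable are hypothetical names for illustration, not part of CVAT. Note that tracemalloc only sees memory allocated through Python's allocators, so native allocations made inside an inference library would not show up.

```python
import tracemalloc


def top_growth(work, limit=3):
    """Run work() and report the allocation sites that grew the most.

    `work` is a hypothetical callable standing in for the code under test.
    Its return value is kept alive so that it is still counted in the
    second snapshot.
    """
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    kept = work()
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # Entries are sorted by the absolute size difference, largest first.
    stats = after.compare_to(before, "lineno")
    return stats[:limit], kept


def simulated_inference():
    # Stand-in for a results object that grows with every frame.
    return [bytearray(1024) for _ in range(2000)]


stats, kept = top_growth(simulated_inference)
for stat in stats:
    print(stat)
```

Each printed line names a file and line number together with the size difference, which is usually enough to point at the container that keeps growing.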

Edit: at this stage I'm considering dropping OpenVINO altogether.

Author

gitunit commented Mar 2, 2020

I've tried the native OpenVINO C++ sample from the ModelZoo called "object_detection_demo_yolov3_async", and no memory leak was observable. Can I assume that interp.py is to blame?

Author

gitunit commented Mar 2, 2020

After testing with an empty interp.py script, I think we can rule that one out, so there must be a major memory leak somewhere inside CVAT.
I have also tested "native" OpenVINO (C++ and Python) [see here] with this model, and I couldn't observe any memory leaks there. So it really must be inside CVAT.

Contributor

benhoff commented Mar 3, 2020

Wondering if the use of exec means that the interp.py code or the result object (class is Result from the Auto Annotation module) is never properly reference counted to 0. Technically the exec'd code has a copy/reference of the result as well as the main code.
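
The concern about exec can be illustrated in isolation. This is a minimal sketch, assuming a stand-in `Result` class and a hypothetical `run_interp` helper (neither is CVAT's actual code): the namespace dict passed to exec holds its own reference to the result, so the object stays alive for as long as that dict does. The behavior shown relies on CPython's reference counting.

```python
import weakref


class Result:
    """Hypothetical stand-in for the auto-annotation Result class."""


def run_interp(source, result):
    # The namespace dict references `result` for as long as the dict lives,
    # and the exec'd code can create further references inside it.
    namespace = {"results": result}
    exec(source, namespace)
    return namespace


result = Result()
ref = weakref.ref(result)          # lets us observe the object without owning it
namespace = run_interp("kept = results", result)

del result                          # the caller drops its own reference
alive_with_namespace = ref() is not None

namespace.clear()                   # drop the references held by the exec namespace
alive_after_clear = ref() is not None

print(alive_with_namespace, alive_after_clear)
```

If the namespace (or anything captured inside it) outlives the frame loop, the results it references cannot be freed.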

I thought I'd also seen references to the rq workers not cleaning up memory properly wrt auto annotation, but a quick search didn't find anything.

This is the line where the interp code gets exec'd:
https://github.com/opencv/cvat/blob/b3f7f5b8bcc40a10871a3a0aefe3d4757d78b4e3/cvat/apps/engine/utils.py#L45

The method gets called from here:

https://github.com/opencv/cvat/blob/b3f7f5b8bcc40a10871a3a0aefe3d4757d78b4e3/cvat/apps/auto_annotation/inference.py#L30

I'd probably check how the compiled interp code is getting cleaned up. I'd also check if the Result object gets cleaned up.

One could probably put a print statement in the destructor of the Result class to see whether it's getting cleaned up correctly.
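
A minimal sketch of such a check, assuming a stand-in `Result` class rather than CVAT's real one. The `was_collected` helper and the `leaked` list are hypothetical and only there to demonstrate both outcomes:

```python
import gc
import weakref


class Result:
    """Hypothetical stand-in for the auto-annotation Result class."""

    def __init__(self):
        self.shapes = []

    def __del__(self):
        # In CPython this runs as soon as the last reference goes away,
        # unless the object is part of a reference cycle.
        print("Result destroyed")


def was_collected(factory):
    """Return True if the object produced by factory() is gone afterwards."""
    obj = factory()
    ref = weakref.ref(obj)
    del obj
    gc.collect()  # also break reference cycles, if any
    return ref() is None


leaked = []  # simulates a container that accidentally keeps results alive


def leaky_factory():
    result = Result()
    leaked.append(result)
    return result


print(was_collected(Result))         # destructor message appears first
print(was_collected(leaky_factory))  # no destructor message: still referenced
```

If the destructor message never shows up while frames are being processed, something is still holding on to the object.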

But those are just idle thoughts.

Author

gitunit commented Mar 4, 2020

@benhoff I agree, it's probably the Result object: I ran a test with an empty interp.py file and the memory leak was still there.
It's also interesting that the memory leak is bigger for YOLOv3 than for SSD, which makes sense because YOLOv3 generates more results.
@bsekachev probably has more insight.

Author

gitunit commented Mar 6, 2020

I have identified the actual problem. It is in this line:
https://github.com/opencv/cvat/blob/b3f7f5b8bcc40a10871a3a0aefe3d4757d78b4e3/cvat/apps/auto_annotation/inference.py#L133

This object grows until all frames have been processed, so it stays in RAM the whole time. Maybe one way is to write it to disk and then, once inference has finished, process it in chunks. Any other ideas? @bsekachev
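
The write-to-disk idea could look roughly like the sketch below. Everything in it is an assumption for illustration: the JSON Lines file layout, the function names, and the `fake_inference` generator are not taken from CVAT.

```python
import json
import os
import tempfile
from itertools import islice


def spill_results(per_frame_results, path):
    """Write one JSON line per frame instead of growing an in-memory object."""
    count = 0
    with open(path, "w", encoding="utf-8") as fh:
        for frame_idx, detections in per_frame_results:
            fh.write(json.dumps({"frame": frame_idx, "detections": detections}) + "\n")
            count += 1
    return count


def read_chunks(path, chunk_size=100):
    """Yield lists of at most chunk_size frame records from the spill file."""
    with open(path, encoding="utf-8") as fh:
        while True:
            chunk = [json.loads(line) for line in islice(fh, chunk_size)]
            if not chunk:
                return
            yield chunk


def fake_inference(n_frames):
    # Hypothetical stand-in for the per-frame model output.
    for i in range(n_frames):
        yield i, [{"label": "car", "box": [0, 0, 10, 10]}]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "results.jsonl")
    total = spill_results(fake_inference(250), path)
    chunk_sizes = [len(chunk) for chunk in read_chunks(path, chunk_size=100)]

print(total, chunk_sizes)
```

With this layout, peak memory depends on the chunk size rather than on the length of the video.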

Contributor

benhoff commented Mar 7, 2020

> this object grows until all frames have been processed and thus always lives in RAM. maybe one way is to write it to disk and then if finished with inference, process it in chunks. any other ideas?

You're probably better off batching it in the model manager. See here:

https://github.com/opencv/cvat/blob/24130cda415f3fce28ad5b6890368f284c18746c/cvat/apps/auto_annotation/model_manager.py#L244

get_image_data is where the problem starts, because it has no idea how many frames there are.
It should probably check the number of frames available and, if that is over a threshold, grab them in batches of 50 or 100.
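
A rough sketch of the batching idea, not tied to the actual model manager code. `run_model` and `save` are hypothetical callables standing in for inference and for persisting annotations:

```python
from itertools import islice


def batched(frames, batch_size=100):
    """Yield successive lists of at most batch_size frames from any iterable."""
    iterator = iter(frames)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def annotate_in_batches(frames, run_model, save, batch_size=100):
    """Run inference batch by batch so results never cover the whole video.

    `run_model` and `save` are hypothetical callables for this sketch.
    """
    total = 0
    for batch in batched(frames, batch_size):
        results = [run_model(frame) for frame in batch]
        save(results)  # persist this batch, then let it go out of scope
        total += len(results)
    return total


saved_sizes = []
total = annotate_in_batches(
    range(230),
    run_model=lambda frame: {"frame": frame, "shapes": []},
    save=lambda results: saved_sizes.append(len(results)),
    batch_size=100,
)
print(total, saved_sizes)
```

Because each batch is saved before the next one is read, only one batch of results needs to be in memory at a time.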

This file might already have been modified in a different PR. nmanovic mentioned that one of the upcoming pull requests was going to break some of my past work. See here for that note: #934 (comment)

It would be worth checking out what happened in #787 before you go fix the problem :)

Contributor

benhoff commented Apr 14, 2020

@gitunit, would it be possible for you to test #1328 and see if it resolves your problem?

Author

gitunit commented Apr 20, 2020

@benhoff I will try in the coming days.


4 participants