Refactor grid default boxes with torch meshgrid #3799
Actually, the `default_boxes` are generated on the CPU and then migrated to the CUDA device. I've tried the following method to generate the `default_boxes` directly on the CUDA device, but it takes longer than the for-loop method.
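For context, a minimal sketch of the kind of meshgrid-based generation being discussed. The function name, `feature_sizes`/`scales` parameters, and box layout are illustrative assumptions, not torchvision's actual API; the `indexing="ij"` keyword assumes PyTorch >= 1.10.

```python
import torch

# Hypothetical sketch: build (cx, cy, w, h) default boxes for each feature
# map with torch.meshgrid instead of a nested Python for-loop. The `device`
# argument lets the caller place the boxes directly on CPU or CUDA.
def grid_default_boxes(feature_sizes, scales, device="cpu"):
    boxes = []
    for k, f in enumerate(feature_sizes):
        s = scales[k]
        # Cell indices for the k-th (f x f) feature map, created on `device`.
        ys, xs = torch.meshgrid(
            torch.arange(f, device=device, dtype=torch.float32),
            torch.arange(f, device=device, dtype=torch.float32),
            indexing="ij",
        )
        cx = (xs + 0.5) / f          # normalized box centers
        cy = (ys + 0.5) / f
        wh = torch.full_like(cx, s)  # one square box per cell, for brevity
        boxes.append(torch.stack((cx, cy, wh, wh), dim=-1).reshape(-1, 4))
    return torch.cat(boxes)
```

With `device="cuda"` every intermediate tensor is allocated on the GPU, so no host-to-device copy is needed afterwards; that is the trade-off being benchmarked here.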
I've hit cases in the past where micro-benchmarks on exactly this part of the code could be faster if running on the CPU, but would present significant slowdowns when training on multiple GPUs. Even if this might be slower on micro-benchmarks if run on a single GPU, it might still be faster on multiple GPUs
@fmassa So you recommend passing the target device to this method and creating the boxes directly on that device, right?
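The two placement strategies under discussion can be sketched as follows; the tensors here are illustrative stand-ins (8732 is the SSD300 default-box count), not the PR's actual code.

```python
import torch

# Pick CUDA when available; otherwise the comparison degenerates to CPU-only.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Pattern 1: build on CPU, then migrate once to the training device.
cpu_boxes = torch.rand(8732, 4)
moved = cpu_boxes.to(device)

# Pattern 2: allocate directly on the target device, skipping the copy.
direct = torch.rand(8732, 4, device=device)
```

Pattern 1 pays a one-off host-to-device transfer; Pattern 2 launches the generation kernels on the GPU, which can be slower for many tiny ops but avoids the copy.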
We can leave it as is for now, but I would create a follow-up issue to benchmark this and the alternative implementation on multiple GPUs to verify.
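A skeleton for that follow-up micro-benchmark might look like this. The two generator functions are placeholders for the for-loop and meshgrid variants, not torchvision internals; the synchronization calls matter because CUDA kernels launch asynchronously.

```python
import time
import torch

# Placeholder "for-loop" variant: one Python-level row at a time.
def make_boxes_loop(n):
    return torch.tensor([[i / n, i / n, 0.1, 0.1] for i in range(n)])

# Placeholder vectorized variant: same values, built with tensor ops.
def make_boxes_meshgrid(n):
    i = torch.arange(n, dtype=torch.float32) / n
    return torch.stack(
        (i, i, torch.full((n,), 0.1), torch.full((n,), 0.1)), dim=1
    )

def bench(fn, *args, repeats=10):
    # Synchronize so pending GPU work doesn't leak into the measurement.
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / repeats
```

On multiple GPUs the interesting number is not this isolated timing but the end-to-end training step, which is why the follow-up issue is worth filing.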
FYI, I've tested the inference time over the full COCO eval dataset for these two default-box generation methods on different devices, and the times are very similar.
Validated with (using one card):