Make input a dictionary for multi-modal object detection #95
Conversation
Can you write a test?
@@ -2,6 +2,7 @@
defaults:
  - COCO_TorchvisionFasterRCNN
  - override /model/[email protected]: preprocessor_multi_modal
Why is this the default now?
A preprocessor is required and I made the single-modal normalizer the default one.
But that specifies the multi-modal one?
_target_: torchvision.transforms.Compose
transforms:
  - _target_: mart.transforms.GetItems
    keys: ${datamodule.test_dataset.modalities}
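For context, a minimal Python sketch of what a GetItems-style transform plausibly does, assuming it simply pulls values out of the input dictionary in the order given by `keys` (the actual `mart.transforms.GetItems` implementation may differ):

```python
from typing import Dict, List

import torch


class GetItems:
    """Extract values from a dict input in the order given by `keys`.

    Hypothetical re-implementation for illustration only.
    """

    def __init__(self, keys: List[str]):
        self.keys = keys

    def __call__(self, inputs: Dict[str, torch.Tensor]) -> List[torch.Tensor]:
        # The output order follows self.keys, not the dict's own order,
        # which is why the key order matters downstream.
        return [inputs[key] for key in self.keys]
```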
I don't think this is correct. These keys need to be specified independently, because I could load the images in [depth, rgb] order but require that they be ordered as [rgb, depth] for the model.
It would also be nice if this could use the yaml file below (preprocessor_single_modal.yaml).
Now it uses the other yaml file.
I think it's convenient to use interpolation by default. We can always change the keys in experiment.yaml or on the command line if we encounter that rare situation.
But isn't there a silent bug if I switch the datamodule modality from [rgb, depth] to [depth, rgb]?
Sure...
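To illustrate the concern above, a small OmegaConf sketch (the config paths here are simplified stand-ins for the real MART config tree): because the preprocessor keys are interpolated from the datamodule, reordering the datamodule's modalities silently reorders the tensors handed to the model, with no error raised.

```python
from omegaconf import OmegaConf

# Simplified stand-in for the real config tree.
cfg = OmegaConf.create(
    {
        "datamodule": {"modalities": ["rgb", "depth"]},
        "preprocessor": {"keys": "${datamodule.modalities}"},
    }
)
print(cfg.preprocessor["keys"])  # ['rgb', 'depth']

# Switching the datamodule order silently changes the preprocessor
# key order too, so the model would receive depth where it expects rgb.
cfg.datamodule.modalities = ["depth", "rgb"]
print(cfg.preprocessor["keys"])  # ['depth', 'rgb']
```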
@@ -0,0 +1,12 @@
# @package model.modules.preprocessor
Why not create a modules directory? Why does this live under detection when it has nothing to do with detection?
model/detection -> model/modules
@@ -0,0 +1,6 @@
# @package model.modules.preprocessor
Why not create a modules directory? Why does this live under detection when it has nothing to do with detection?
I also think this should just be preprocessor.yaml or something like that. Perhaps 8bit_preprocessor.yaml, or something indicating that this is doing 0-255 normalization?
I named it tuple_normalizer.yaml.
LGTM!
What does this PR do?
Using dictionary input like {"rgb": tensor1, "depth": tensor2} should make it easier to compose multi-modal adversaries. The dictionary is later converted back to a tensor so that models understand the input.
This is backward compatible with single-modal object detection, because single-modal is a special case of multi-modal.
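As a rough illustration of the idea (the helper name and the concatenation step are assumptions for this sketch, not MART's actual conversion): the adversary perturbs per-modality tensors in a dict, and the dict is then flattened back into a single tensor before reaching the model.

```python
from typing import Dict, List

import torch


def dict_to_model_input(inputs: Dict[str, torch.Tensor], keys: List[str]) -> torch.Tensor:
    """Hypothetical helper: convert a modality dict back into one tensor.

    Here we concatenate along the channel dimension; the real
    conversion in MART may differ.
    """
    return torch.cat([inputs[key] for key in keys], dim=0)


# Multi-modal input: two 3-channel modalities become a 6-channel tensor.
batch = {"rgb": torch.rand(3, 224, 224), "depth": torch.rand(3, 224, 224)}
x = dict_to_model_input(batch, keys=["rgb", "depth"])
print(x.shape)  # torch.Size([6, 224, 224])

# Single-modal is just the special case of a one-key dict.
x = dict_to_model_input({"rgb": batch["rgb"]}, keys=["rgb"])
print(x.shape)  # torch.Size([3, 224, 224])
```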
Type of change
Please check all relevant options.
Testing
Please describe the tests that you ran to verify your changes. Consider listing any relevant details of your test configuration.
Before submitting
Did you run the pre-commit run -a command without errors?
Did you have fun?
Make sure you had fun coding 🙃