IndexError in gluon-cv Mask-RCNN validation on master #17485
Comments
@Kh4L was this issue fixed? If so, what was the workaround?
The issue occurs when the batch size is greater than 1 per GPU. With a batch size of 1 per GPU, validation works fine. With a batch size greater than 1 per GPU, users see this error.
I did some analysis on different
Note: Validation doesn't support multi-batch. Meaning it always runs with 1 image per GPU irrespective of |
@karan6181 we can modify this line to:
    if autograd.is_training():
        x = x.reshape((-4, self._batch_images, -1, 0, 0, 0))
    else:
        # always use batch_size = 1 for inference
        x = x.reshape((-4, 1, -1, 0, 0, 0))
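To make the shape arithmetic behind this change easier to follow, here is a minimal pure-Python sketch of what the reshape codes compute. In MXNet's reshape, -4 splits one input dimension into the two values that follow it, -1 is inferred, and 0 copies the corresponding input dimension. The helper names (split_leading_dim, mask_head_shape) and the example shape are hypothetical and are not part of gluon-cv; the sketch only mirrors the shapes and does not reproduce the IndexError itself.

```python
def split_leading_dim(shape, batch_images):
    """Shape that reshape((-4, batch_images, -1, 0, 0, 0)) would give for a 4-D input.

    Hypothetical helper: -4 splits dim 0 into (batch_images, inferred),
    and the three 0 codes copy the remaining input dimensions unchanged.
    """
    total, rest = shape[0], tuple(shape[1:])
    if len(rest) != 3:
        raise ValueError("expected a 4-D input shape")
    if total % batch_images != 0:
        raise ValueError("leading dimension is not divisible by batch_images")
    return (batch_images, total // batch_images) + rest


def mask_head_shape(shape, batch_images, is_training):
    """Mirror the proposed branch: use batch_images only while training."""
    effective = batch_images if is_training else 1  # inference always uses 1
    return split_leading_dim(shape, effective)


# Assumed example: 1000 RoI feature maps of shape (256, 14, 14), batch_images = 2.
train_shape = mask_head_shape((1000, 256, 14, 14), 2, is_training=True)
infer_shape = mask_head_shape((1000, 256, 14, 14), 2, is_training=False)
print(train_shape)  # (2, 500, 256, 14, 14)
print(infer_shape)  # (1, 1000, 256, 14, 14)
```

With the branch in place, the inference path no longer depends on the configured batch size, which matches the note above that validation runs with 1 image per GPU.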
@karan6181 tested it on EC2 instances and verified that this works.
Description
An IndexError occurs during the first validation step when training gluon-cv Mask-RCNN with horovod.
Error Message
Steps to reproduce
Environment