Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix bug when the target is empty in FCOS #5267

Merged
merged 4 commits into from
Jan 26, 2022
Merged

fix bug when the target is empty in FCOS #5267

merged 4 commits into from
Jan 26, 2022

Conversation

xiaohu2015
Copy link
Contributor

@xiaohu2015 xiaohu2015 commented Jan 24, 2022

Last week, we released the detetion model: FCOS at #4961

some users have found there is a problem when the inputs has empty targets.
#5266

same issus can also be found in detectron2: facebookresearch/detectron2#3851, facebookresearch/detectron2#3910

we fix this bug in this pr, you can test:

model = fcos_resnet50_fpn(pretrained=False)
out = model(torch.zeros((1,3,512,512)), targets=[{"boxes": torch.empty(0,4), "labels": torch.empty(0,1).to(torch.int64)}])

## 
{'classification': tensor(1.5403, grad_fn=<DivBackward0>), 'bbox_regression': tensor(0., grad_fn=<DivBackward0>), 'bbox_ctrness': tensor(0., grad_fn=<DivBackward0>)}

@facebook-github-bot
Copy link

facebook-github-bot commented Jan 24, 2022

💊 CI failures summary and remediations

As of commit 622c16c (more details on the Dr. CI page):


None of the CI failures appear to be your fault 💚



🚧 9 ongoing upstream failures:

These were probably caused by upstream breakages that are not fixed yet.


This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

@@ -59,9 +59,13 @@ def compute_loss(
all_gt_classes_targets = []
all_gt_boxes_targets = []
for targets_per_image, matched_idxs_per_image in zip(targets, matched_idxs):
gt_classes_targets = targets_per_image["labels"][matched_idxs_per_image.clip(min=0)]
if len(targets_per_image["labels"]) == 0:
Copy link
Contributor

@datumbox datumbox Jan 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might need to confirm that this will works as expected and doesn't produce errors when find_unused_params=False (see discussion at #2784 (comment))

The runtime error to be looking for is something like:

RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by (1) passing the keyword argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`; (2) making sure all `forward` function outputs participate in calculating loss. If you already have done the above two steps, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's `forward` function. Please include the loss function and the structure of the return value of `forward` of your module when reporting this issue (e.g. list, dict, iterable).

@jdsgomes Could you confirm by kicking off a run for an epoch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@datumbox thanks. I have not considered this problem. By the way, I want to know that how retinanet can handle this.

Copy link
Contributor

@datumbox datumbox Jan 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had this issue previously with a couple of models. Retina was one of them. The way we avoided it was by rewriting the loss estimations in a way that the vectors can cope with empty indeces. Let's check first if it is an issue before starting rewriting.

Edit: I found the PR with the patch: #3032

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@datumbox I think the current implementation will not have this trouble, as you see, the regression loss and centerness loss is just in the way that the vectors can cope with empty indeces. but for safe, we had better check once. but I don't have the empty datasets, so can you help to check this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's our understanding as well, that you should be OK. A single run on the scripts for 1 epoch should be enough to confirm if it's a problem (at least that was the case previously). I'll sync with Joao to confirm. :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the other user confirmed the patch works, see #5266 (comment).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can confirm that the patch works and also that after training for an epoch no runtime errors are observed.

@xiaohu2015 xiaohu2015 changed the title fix bug when the target is empty fix bug when the target is empty in FCOS Jan 24, 2022
@datumbox datumbox requested a review from jdsgomes January 24, 2022 16:39
Copy link
Contributor

@datumbox datumbox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look good to me and it seems we have confirmation that the patch works from #5266 (comment). I'm approving but I'll leave @jdsgomes merge once he reviews and gets the results of the 1 epoch run.

Thanks for the update @xiaohu2015!

Copy link
Contributor

@jdsgomes jdsgomes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thank you for the fix!

@@ -59,9 +59,13 @@ def compute_loss(
all_gt_classes_targets = []
all_gt_boxes_targets = []
for targets_per_image, matched_idxs_per_image in zip(targets, matched_idxs):
gt_classes_targets = targets_per_image["labels"][matched_idxs_per_image.clip(min=0)]
if len(targets_per_image["labels"]) == 0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can confirm that the patch works and also that after training for an epoch no runtime errors are observed.

@jdsgomes jdsgomes merged commit 11d903e into pytorch:main Jan 26, 2022
@github-actions
Copy link

Hey @jdsgomes!

You merged this PR, but no labels were added. The list of valid labels is available at https://github.com/pytorch/vision/blob/main/.github/process_commit.py

facebook-github-bot pushed a commit that referenced this pull request Feb 3, 2022
Summary:
* fix bug when the target is empty

* Add unittest for empty instance training

Reviewed By: kazhang

Differential Revision: D33927512

fbshipit-source-id: e92355380948d9181e135b7612596c5309afeeda

Co-authored-by: Zhiqiang Wang <[email protected]>
Co-authored-by: Joao Gomes <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants