fix bug when the target is empty in FCOS #5267
Conversation
💊 CI failures summary and remediations
As of commit 622c16c (more details on the Dr. CI page): ✅ None of the CI failures appear to be your fault 💚
🚧 9 ongoing upstream failures: these were probably caused by upstream breakages that are not fixed yet.
@@ -59,9 +59,13 @@ def compute_loss(
         all_gt_classes_targets = []
         all_gt_boxes_targets = []
         for targets_per_image, matched_idxs_per_image in zip(targets, matched_idxs):
-            gt_classes_targets = targets_per_image["labels"][matched_idxs_per_image.clip(min=0)]
+            if len(targets_per_image["labels"]) == 0:
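For context, here is a minimal sketch of what the guard in this hunk does. The wrapper function `build_gt_targets` and the toy usage are illustrative only, not the exact code inside `compute_loss`:

```python
import torch

def build_gt_targets(targets, matched_idxs):
    all_gt_classes_targets = []
    all_gt_boxes_targets = []
    for targets_per_image, matched_idxs_per_image in zip(targets, matched_idxs):
        if len(targets_per_image["labels"]) == 0:
            # No ground truth in this image: emit all-zero targets with the
            # expected shapes so the vectorized losses never index into an
            # empty tensor.
            gt_classes_targets = targets_per_image["labels"].new_zeros((len(matched_idxs_per_image),))
            gt_boxes_targets = targets_per_image["boxes"].new_zeros((len(matched_idxs_per_image), 4))
        else:
            # clip(min=0) maps unmatched anchors (index -1) to a valid index;
            # they are masked out later via matched_idxs_per_image < 0.
            gt_classes_targets = targets_per_image["labels"][matched_idxs_per_image.clip(min=0)]
            gt_boxes_targets = targets_per_image["boxes"][matched_idxs_per_image.clip(min=0)]
        all_gt_classes_targets.append(gt_classes_targets)
        all_gt_boxes_targets.append(gt_boxes_targets)
    return all_gt_classes_targets, all_gt_boxes_targets

# Tiny usage: one image with no annotations and 5 anchors, all unmatched (-1).
targets = [{"boxes": torch.zeros((0, 4)), "labels": torch.zeros((0,), dtype=torch.int64)}]
matched_idxs = [torch.full((5,), -1, dtype=torch.int64)]
cls_t, box_t = build_gt_targets(targets, matched_idxs)
print(cls_t[0].shape, box_t[0].shape)  # torch.Size([5]) torch.Size([5, 4])
```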
We might need to confirm that this works as expected and doesn't produce errors when find_unused_params=False
(see discussion at #2784 (comment))
The runtime error to look out for is something like:
RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by (1) passing the keyword argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`; (2) making sure all `forward` function outputs participate in calculating loss. If you already have done the above two steps, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's `forward` function. Please include the loss function and the structure of the return value of `forward` of your module when reporting this issue (e.g. list, dict, iterable).
@jdsgomes Could you confirm by kicking off a run for an epoch?
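For reference, this is the DDP flag under discussion. A minimal sketch of enabling unused-parameter detection, using a toy single-process `gloo` group rather than the actual training setup:

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Toy single-process process group, just so DDP can be constructed.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

model = nn.Linear(10, 2)
# find_unused_parameters=True makes DDP tolerate parameters that receive no
# gradient in an iteration (e.g. a branch skipped for empty targets), at the
# cost of an extra traversal of the autograd graph every iteration.
ddp_model = DDP(model, find_unused_parameters=True)

out = ddp_model(torch.randn(4, 10))
out.sum().backward()
dist.destroy_process_group()
```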
@datumbox thanks, I had not considered this problem. By the way, I would like to know how RetinaNet handles this.
We had this issue previously with a couple of models; RetinaNet was one of them. The way we avoided it was by rewriting the loss computations so that the vectorized operations can cope with empty indices. Let's first check whether it is actually an issue before we start rewriting.
Edit: I found the PR with the patch: #3032
@datumbox I think the current implementation will not have this trouble: as you can see, the regression loss and centerness loss are already written so that the vectorized operations can cope with empty indices. But to be safe, we had better check once. I don't have a dataset with empty targets, though, so could you help check this?
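To illustrate the point about empty indices, here is a standalone snippet (not code from the PR) showing why the vectorized formulation tolerates the empty case:

```python
import torch
import torch.nn.functional as F

# Indexing with an empty index tensor simply yields an empty result...
boxes = torch.zeros((0, 4))
idx = torch.zeros((0,), dtype=torch.int64)
selected = boxes[idx]  # shape (0, 4), no IndexError

# ...and a sum-reduced element-wise loss over empty tensors is just 0
# (note that reduction="mean" would give NaN instead).
loss = F.l1_loss(selected, torch.zeros_like(selected), reduction="sum")
print(selected.shape, loss)  # torch.Size([0, 4]) tensor(0.)
```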
That's our understanding as well: you should be OK. A single run of the training scripts for 1 epoch should be enough to confirm whether it's a problem (at least that was the case previously). I'll sync with Joao to confirm. :)
Looks like the other user confirmed the patch works, see #5266 (comment).
I can confirm that the patch works and also that after training for an epoch no runtime errors are observed.
Add unittest for empty instance training
The changes look good to me, and it seems we have confirmation that the patch works from #5266 (comment). I'm approving, but I'll leave it to @jdsgomes to merge once he reviews and gets the results of the 1-epoch run.
Thanks for the update @xiaohu2015!
Looks good to me, thank you for the fix!
Summary:
* fix bug when the target is empty
* Add unittest for empty instance training

Reviewed By: kazhang
Differential Revision: D33927512
fbshipit-source-id: e92355380948d9181e135b7612596c5309afeeda
Co-authored-by: Zhiqiang Wang <[email protected]>
Co-authored-by: Joao Gomes <[email protected]>
Last week, we released the detection model FCOS in #4961.
Some users have found that there is a problem when the input has empty targets (#5266).
The same issue can also be found in detectron2: facebookresearch/detectron2#3851, facebookresearch/detectron2#3910.
We fix this bug in this PR. You can test it, for example, with the snippet below.
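A minimal sketch of how one might exercise the empty-target path (the builder arguments and toy inputs here are illustrative assumptions, not taken from the original report):

```python
import torch
import torchvision

# Build FCOS without pretrained weights so the snippet runs offline.
model = torchvision.models.detection.fcos_resnet50_fpn(
    pretrained=False, pretrained_backbone=False, num_classes=2
)
model.train()

images = [torch.rand(3, 256, 256)]
# One image with no annotations: empty "boxes" and "labels".
targets = [{
    "boxes": torch.zeros((0, 4), dtype=torch.float32),
    "labels": torch.zeros((0,), dtype=torch.int64),
}]

# With the patch this should return finite classification / regression /
# centerness losses instead of failing on the empty target.
losses = model(images, targets)
print({k: v.item() for k, v in losses.items()})
```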