This repository has been archived by the owner on Feb 16, 2022. It is now read-only.

bug for ForwardModelsVal #69

Open
zongshenmu opened this issue Sep 25, 2020 · 7 comments

Comments

@zongshenmu

Why do the input size and target size of the cross-entropy loss not match?

Traceback (most recent call last):
  File "train_tasks.py", line 679, in <module>
    main()
  File "train_tasks.py", line 604, in main
    tbLogger,
  File "train_tasks.py", line 662, in evaluate
    args, task_cfg, device, task_id, batch, model, task_losses
  File "/data1/mzs/Code/vilbert-multi-task/vilbert/task_utils.py", line 155, in ForwardModelsVal
    loss = task_losses[task_id](vil_binary_prediction, target)
  File "/data0/mzs/anaconda3/envs/vilbert-mt/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/data0/mzs/anaconda3/envs/vilbert-mt/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 601, in forward
    reduction=self.reduction)
  File "/data0/mzs/anaconda3/envs/vilbert-mt/lib/python3.6/site-packages/torch/nn/functional.py", line 2124, in binary_cross_entropy_with_logits
    raise ValueError("Target size ({}) must be the same as input size ({})".format(target.size(), input.size()))
ValueError: Target size (torch.Size([6, 2])) must be the same as input size (torch.Size([12, 2]))
@vedanuj
Contributor

vedanuj commented Sep 25, 2020

Which task are you running on? Also please share your training command.

@zongshenmu
Author

zongshenmu commented Sep 26, 2020

I pulled your code from the repository and only changed the train batch size in vilbert_tasks.yml. I ran the multi-task training as described in the README.md, but it fails on the NLVR dataset.
I debugged the dataset and found an abnormal condition during evaluation: it always fails on the last iteration of the evaluation loop. When that batch is fed into the model, the batch sizes of the target and of vil_binary_prediction do not match, so the task loss, binary_cross_entropy_with_logits, cannot be computed correctly.

vil_binary_prediction torch.Size([12, 2]) target torch.Size([6, 2])
Traceback (most recent call last):
  File "train_tasks.py", line 682, in <module>
    main()
  File "train_tasks.py", line 605, in main
    tbLogger,
  File "train_tasks.py", line 665, in evaluate
    args, task_cfg, device, task_id, batch, model, task_losses
  File "/data1/mzs/Code/vilbert-multi-task/vilbert/task_utils.py", line 157, in ForwardModelsVal
    loss = task_losses[task_id](vil_binary_prediction, target)
  File "/data0/mzs/anaconda3/envs/vilbert-mt/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/data0/mzs/anaconda3/envs/vilbert-mt/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 601, in forward
    reduction=self.reduction)
  File "/data0/mzs/anaconda3/envs/vilbert-mt/lib/python3.6/site-packages/torch/nn/functional.py", line 2124, in binary_cross_entropy_with_logits
    raise ValueError("Target size ({}) must be the same as input size ({})".format(target.size(), input.size()))
ValueError: Target size (torch.Size([6, 2])) must be the same as input size (torch.Size([12, 2]))

I also suspect lines 1688-1691 of vilbert.py: when the batch size is odd, this branch is not executed.

if pooled_output.size(0) % 2 == 0:
    vil_binary_prediction = self.vil_binary_prediction(
        pooled_output.view(-1, pooled_output.size(1) * 2)
    )
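For what it's worth, the pairing step can be sketched in plain Python (toy data, not the repo's actual tensors): an odd number of pooled rows simply cannot be grouped into pairs, which is why the guarded branch above is skipped and the prediction keeps the per-image batch size instead of the per-example one.

```python
# Minimal sketch of the NLVR2 pairing step, assuming each example
# contributes two pooled vectors that get concatenated into one pair.
def pair_rows(rows):
    """Concatenate adjacent rows, mirroring pooled_output.view(-1, 2 * H)."""
    if len(rows) % 2 != 0:
        # mirrors the `% 2 == 0` guard: odd batches cannot be paired
        raise ValueError("cannot pair an odd number of rows")
    return [rows[i] + rows[i + 1] for i in range(0, len(rows), 2)]

pooled = [[float(i)] for i in range(12)]   # 12 per-image rows (toy H = 1)
pairs = pair_rows(pooled)                  # 6 per-example rows
assert len(pairs) == 6 and len(pairs[0]) == 2
```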

@zongshenmu
Author

Your code in eval_tasks.py for retrieval_datasets.py is also wrong: the dataloader cannot infer the right batch size and number of options:

    elif task_cfg[task_id]["process"] in ["retrieval"]:
        max_num_bbox = features.size(1)
        num_options = question.size(1)
       
        features = features.view(-1, features.size(2), features.size(3))
        spatials = spatials.view(-1, spatials.size(2), spatials.size(3))
        image_mask = image_mask.view(-1, image_mask.size(2))
        question = question.view(-1, question.size(2))
        input_mask = input_mask.view(-1, input_mask.size(2))
        segment_ids = segment_ids.view(-1, segment_ids.size(2))
        co_attention_mask = co_attention_mask.view(
            -1, co_attention_mask.size(2), co_attention_mask.size(3)
        )
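The flattening those views perform can be illustrated with a toy example (plain Python, hypothetical sizes): a [batch, num_options, ...] tensor is collapsed into [batch * num_options, ...] so every candidate option becomes its own row for scoring.

```python
# Toy illustration of collapsing the option dimension, assuming a batch
# of 2 examples with 3 candidate options each (hypothetical sizes).
def flatten_options(batch):
    """[batch][num_options] -> [batch * num_options], like view(-1, ...)."""
    return [option for example in batch for option in example]

questions = [["q0a", "q0b", "q0c"], ["q1a", "q1b", "q1c"]]
flat = flatten_options(questions)
assert len(flat) == 6              # 2 examples * 3 options
assert flat[:3] == questions[0]    # options stay grouped per example
```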

Can you provide your retrieval evaluation metric?
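For context, image-text retrieval is usually scored with Recall@K; this is my assumption about the intended metric, not something taken from this repo's evaluation code. A minimal version:

```python
# Minimal Recall@K sketch (an assumed metric, not this repo's code):
# 1 if the gold item appears among the top-k ranked candidates, else 0.
def recall_at_k(ranked_ids, gold_id, k):
    return int(gold_id in ranked_ids[:k])

ranked = ["img3", "img1", "img7"]          # hypothetical ranking
assert recall_at_k(ranked, "img1", 1) == 0  # gold not in top-1
assert recall_at_k(ranked, "img1", 2) == 1  # gold in top-2
```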

@ZhiyuanChen

The code quality of this repo is surprisingly low; I can't believe this is Facebook engineering. It would take less effort to rewrite it than to debug it.

@chen398936790

Dear all,

I ran into the same problem when I tried to run multi-task learning with the command in the README.md.

Training had already been running for hours, and I could even see a validation pass on refcocog complete at iteration 513.
But the code then returned the same ValueError at iteration 661.

Have you found a solution, or do you have any ideas for fixing this?

Thank you!

@enaserianhanzaei

I've been trying to run this code for three weeks; it's unbelievable how full of bugs it is. I'm starting to wonder whether the reported results are actually true.

@enaserianhanzaei

@zongshenmu @vedanuj @chen398936790 @ZhiyuanChen

I wrote a step-by-step tutorial on how to set up the environment, train and test this model. I also added a section on extracting the visiolinguistic embeddings from the image-text data.
https://naserian-elahe.medium.com/vilbert-a-model-for-learning-joint-representations-of-image-content-and-natural-language-47f56a313a79
I very much appreciate any comments or suggestions.

5 participants