
After extracting features with extract_features_vmb.py, training with movie_mcan fails with a feature dimension mismatch error #1239

Open
clearlove7-s11 opened this issue Apr 24, 2022 · 0 comments


clearlove7-s11 commented Apr 24, 2022

❓ Questions and Help

After extracting features with extract_features_vmb.py and then training with movie_mcan, the run fails with an error saying that the feature dimensions do not match:

RuntimeError: Expected 4-dimensional input for 4-dimensional weight [256, 2048, 1, 1], but got 3-dimensional input of size [32, 100, 2048] instead

'''
2022-04-24T08:36:57 | mmf.utils.configuration: Overriding option config to /home/cvpr/vqa/mmf-main/projects/movie_mcan/configs/vqa2/defaults.yaml
2022-04-24T08:36:57 | mmf.utils.configuration: Overriding option model to movie_mcan
2022-04-24T08:36:57 | mmf.utils.configuration: Overriding option datasets to vqa2
2022-04-24T08:36:57 | mmf.utils.configuration: Overriding option run_type to train_val
/root/anaconda3/envs/mm/lib/python3.7/site-packages/omegaconf/grammar_visitor.py:257: UserWarning: In the sequence MMF_LOG_DIR, some elements are missing: please replace them with empty quoted strings. See omry/omegaconf#572 for details.
category=UserWarning,
/root/anaconda3/envs/mm/lib/python3.7/site-packages/omegaconf/grammar_visitor.py:257: UserWarning: In the sequence MMF_REPORT_DIR, some elements are missing: please replace them with empty quoted strings. See omry/omegaconf#572 for details.
category=UserWarning,
/root/anaconda3/envs/mm/lib/python3.7/site-packages/omegaconf/grammar_visitor.py:257: UserWarning: In the sequence MMF_TENSORBOARD_LOGDIR, some elements are missing: please replace them with empty quoted strings. See omry/omegaconf#572 for details.
category=UserWarning,
/root/anaconda3/envs/mm/lib/python3.7/site-packages/omegaconf/grammar_visitor.py:257: UserWarning: In the sequence MMF_WANDB_LOGDIR, some elements are missing: please replace them with empty quoted strings. See omry/omegaconf#572 for details.
category=UserWarning,
/root/anaconda3/envs/mm/lib/python3.7/site-packages/omegaconf/grammar_visitor.py:257: UserWarning: In the sequence MMF_USER_DIR, some elements are missing: please replace them with empty quoted strings. See omry/omegaconf#572 for details.
category=UserWarning,
2022-04-24T08:37:01 | mmf.utils.distributed: XLA Mode:False
2022-04-24T08:37:01 | mmf.utils.distributed: Distributed Init (Rank 1): tcp://localhost:14677
/root/anaconda3/envs/mm/lib/python3.7/site-packages/omegaconf/grammar_visitor.py:257: UserWarning: In the sequence MMF_USER_DIR, some elements are missing: please replace them with empty quoted strings. See omry/omegaconf#572 for details.
category=UserWarning,
2022-04-24T08:37:01 | mmf.utils.distributed: XLA Mode:False
2022-04-24T08:37:01 | mmf.utils.distributed: Distributed Init (Rank 0): tcp://localhost:14677
2022-04-24T08:37:02 | torch.distributed.distributed_c10d: Added key: store_based_barrier_key:1 to store for rank: 1
2022-04-24T08:37:02 | torch.distributed.distributed_c10d: Added key: store_based_barrier_key:1 to store for rank: 0
2022-04-24T08:37:02 | torch.distributed.distributed_c10d: Rank 0: Completed store-based barrier for 2 nodes.
2022-04-24T08:37:02 | mmf.utils.distributed: Initialized Host dgx5 as Rank 0
2022-04-24T08:37:02 | torch.distributed.distributed_c10d: Rank 1: Completed store-based barrier for 2 nodes.
2022-04-24T08:37:02 | mmf.utils.distributed: Initialized Host dgx5 as Rank 1
2022-04-24T08:37:23 | mmf: Logging to: ./save/train.log
2022-04-24T08:37:23 | mmf_cli.run: Namespace(config_override=None, local_rank=None, opts=['config=/home/cvpr/vqa/mmf-main/projects/movie_mcan/configs/vqa2/defaults.yaml', 'model=movie_mcan', 'datasets=vqa2', 'run_type=train_val'])
2022-04-24T08:37:23 | mmf_cli.run: Torch version: 1.9.0+cu102
2022-04-24T08:37:23 | mmf.utils.general: CUDA Device 0 is: Tesla V100-SXM2-32GB
2022-04-24T08:37:23 | mmf_cli.run: Using seed 23836977
2022-04-24T08:37:23 | mmf.trainers.mmf_trainer: Loading datasets
qqqqqqqqqwqwqwwwwwwwwwwwwwwwwwwwwwwwwwwaaaaaaaaaaaaaaaaa
qqqqqqqqqwqwqwwwwwwwwwwwwwwwwwwwwwwwwwwaaaaaaaaaaaaaaaaa
2022-04-24T08:37:24 | torchtext.vocab: Loading vectors from /root/.cache/torch/mmf/glove.6B.300d.txt.pt
2022-04-24T08:37:25 | torchtext.vocab: Loading vectors from /root/.cache/torch/mmf/glove.6B.300d.txt.pt
2022-04-24T08:37:26 | mmf.datasets.multi_datamodule: Multitasking disabled by default for single dataset training
2022-04-24T08:37:26 | mmf.datasets.multi_datamodule: Multitasking disabled by default for single dataset training
2022-04-24T08:37:26 | mmf.datasets.multi_datamodule: Multitasking disabled by default for single dataset training
2022-04-24T08:37:26 | mmf.trainers.mmf_trainer: Loading model
2022-04-24T08:37:29 | mmf.trainers.mmf_trainer: Loading optimizer
2022-04-24T08:37:29 | mmf.trainers.mmf_trainer: Loading metrics
2022-04-24T08:37:29 | mmf.trainers.core.device: Using PyTorch DistributedDataParallel
WARNING 2022-04-24T08:37:29 | py.warnings: /home/cvpr/vqa/mmf-main/mmf/utils/distributed.py:412: UserWarning: You can enable ZeRO and Sharded DDP, by installing fairscale and setting optimizer.enable_state_sharding=True.
builtin_warn(*args, **kwargs)

WARNING 2022-04-24T08:37:29 | py.warnings: /home/cvpr/vqa/mmf-main/mmf/utils/distributed.py:412: UserWarning: You can enable ZeRO and Sharded DDP, by installing fairscale and setting optimizer.enable_state_sharding=True.
builtin_warn(*args, **kwargs)

2022-04-24T08:37:29 | mmf.trainers.mmf_trainer: ===== Model =====
2022-04-24T08:37:29 | mmf.trainers.mmf_trainer: DistributedDataParallel(
(module): MoVieMcan(
(word_embedding): Embedding(75505, 300)
(text_embeddings): TextEmbedding(
(module): SAEmbedding(
(lstm): LSTM(300, 1024, batch_first=True)
(self_attns): ModuleList(
(0): SelfAttention(
(multi_head_attn): MovieMcanMultiHeadAttention(
(linears): ModuleList(
(0): Linear(in_features=1024, out_features=1024, bias=True)
(1): Linear(in_features=1024, out_features=1024, bias=True)
(2): Linear(in_features=1024, out_features=1024, bias=True)
(3): Linear(in_features=1024, out_features=1024, bias=True)
)
(dropout): Dropout(p=0.1, inplace=False)
)
(fcn): Sequential(
(0): Linear(in_features=1024, out_features=4096, bias=True)
(1): ReLU(inplace=True)
(2): Dropout(p=0.1, inplace=False)
(3): Linear(in_features=4096, out_features=1024, bias=True)
)
(drop_mha): Dropout(p=0.1, inplace=False)
(ln_mha): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(drop_fcn): Dropout(p=0.1, inplace=False)
(ln_fcn): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(1): SelfAttention(
(multi_head_attn): MovieMcanMultiHeadAttention(
(linears): ModuleList(
(0): Linear(in_features=1024, out_features=1024, bias=True)
(1): Linear(in_features=1024, out_features=1024, bias=True)
(2): Linear(in_features=1024, out_features=1024, bias=True)
(3): Linear(in_features=1024, out_features=1024, bias=True)
)
(dropout): Dropout(p=0.1, inplace=False)
)
(fcn): Sequential(
(0): Linear(in_features=1024, out_features=4096, bias=True)
(1): ReLU(inplace=True)
(2): Dropout(p=0.1, inplace=False)
(3): Linear(in_features=4096, out_features=1024, bias=True)
)
(drop_mha): Dropout(p=0.1, inplace=False)
(ln_mha): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(drop_fcn): Dropout(p=0.1, inplace=False)
(ln_fcn): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(2): SelfAttention(
(multi_head_attn): MovieMcanMultiHeadAttention(
(linears): ModuleList(
(0): Linear(in_features=1024, out_features=1024, bias=True)
(1): Linear(in_features=1024, out_features=1024, bias=True)
(2): Linear(in_features=1024, out_features=1024, bias=True)
(3): Linear(in_features=1024, out_features=1024, bias=True)
)
(dropout): Dropout(p=0.1, inplace=False)
)
(fcn): Sequential(
(0): Linear(in_features=1024, out_features=4096, bias=True)
(1): ReLU(inplace=True)
(2): Dropout(p=0.1, inplace=False)
(3): Linear(in_features=4096, out_features=1024, bias=True)
)
(drop_mha): Dropout(p=0.1, inplace=False)
(ln_mha): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(drop_fcn): Dropout(p=0.1, inplace=False)
(ln_fcn): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(3): SelfAttention(
(multi_head_attn): MovieMcanMultiHeadAttention(
(linears): ModuleList(
(0): Linear(in_features=1024, out_features=1024, bias=True)
(1): Linear(in_features=1024, out_features=1024, bias=True)
(2): Linear(in_features=1024, out_features=1024, bias=True)
(3): Linear(in_features=1024, out_features=1024, bias=True)
)
(dropout): Dropout(p=0.1, inplace=False)
)
(fcn): Sequential(
(0): Linear(in_features=1024, out_features=4096, bias=True)
(1): ReLU(inplace=True)
(2): Dropout(p=0.1, inplace=False)
(3): Linear(in_features=4096, out_features=1024, bias=True)
)
(drop_mha): Dropout(p=0.1, inplace=False)
(ln_mha): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(drop_fcn): Dropout(p=0.1, inplace=False)
(ln_fcn): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(4): SelfAttention(
(multi_head_attn): MovieMcanMultiHeadAttention(
(linears): ModuleList(
(0): Linear(in_features=1024, out_features=1024, bias=True)
(1): Linear(in_features=1024, out_features=1024, bias=True)
(2): Linear(in_features=1024, out_features=1024, bias=True)
(3): Linear(in_features=1024, out_features=1024, bias=True)
)
(dropout): Dropout(p=0.1, inplace=False)
)
(fcn): Sequential(
(0): Linear(in_features=1024, out_features=4096, bias=True)
(1): ReLU(inplace=True)
(2): Dropout(p=0.1, inplace=False)
(3): Linear(in_features=4096, out_features=1024, bias=True)
)
(drop_mha): Dropout(p=0.1, inplace=False)
(ln_mha): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(drop_fcn): Dropout(p=0.1, inplace=False)
(ln_fcn): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(5): SelfAttention(
(multi_head_attn): MovieMcanMultiHeadAttention(
(linears): ModuleList(
(0): Linear(in_features=1024, out_features=1024, bias=True)
(1): Linear(in_features=1024, out_features=1024, bias=True)
(2): Linear(in_features=1024, out_features=1024, bias=True)
(3): Linear(in_features=1024, out_features=1024, bias=True)
)
(dropout): Dropout(p=0.1, inplace=False)
)
(fcn): Sequential(
(0): Linear(in_features=1024, out_features=4096, bias=True)
(1): ReLU(inplace=True)
(2): Dropout(p=0.1, inplace=False)
(3): Linear(in_features=4096, out_features=1024, bias=True)
)
(drop_mha): Dropout(p=0.1, inplace=False)
(ln_mha): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(drop_fcn): Dropout(p=0.1, inplace=False)
(ln_fcn): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
)
(attn_pool): AttnPool1d(
(linear): Sequential(
(0): Linear(in_features=1024, out_features=512, bias=True)
(1): ReLU()
(2): Dropout(p=0.1, inplace=False)
(3): Linear(in_features=512, out_features=2, bias=True)
)
)
)
)
(image_feature_encoders): Identity()
(image_feature_embeddings_list): TwoBranchEmbedding(
(sga): SGAEmbedding(
(linear): Linear(in_features=2048, out_features=1024, bias=True)
(self_guided_attns): ModuleList(
(0): SelfGuidedAttention(
(multi_head_attn): ModuleList(
(0): MovieMcanMultiHeadAttention(
(linears): ModuleList(
(0): Linear(in_features=1024, out_features=1024, bias=True)
(1): Linear(in_features=1024, out_features=1024, bias=True)
(2): Linear(in_features=1024, out_features=1024, bias=True)
(3): Linear(in_features=1024, out_features=1024, bias=True)
)
(dropout): Dropout(p=0.1, inplace=False)
)
(1): MovieMcanMultiHeadAttention(
(linears): ModuleList(
(0): Linear(in_features=1024, out_features=1024, bias=True)
(1): Linear(in_features=1024, out_features=1024, bias=True)
(2): Linear(in_features=1024, out_features=1024, bias=True)
(3): Linear(in_features=1024, out_features=1024, bias=True)
)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(fcn): Sequential(
(0): Linear(in_features=1024, out_features=4096, bias=True)
(1): ReLU(inplace=True)
(2): Dropout(p=0.1, inplace=False)
(3): Linear(in_features=4096, out_features=1024, bias=True)
)
(drop_mha): ModuleList(
(0): Dropout(p=0.1, inplace=False)
(1): Dropout(p=0.1, inplace=False)
)
(ln_mha): ModuleList(
(0): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(drop_fcn): Dropout(p=0.1, inplace=False)
(ln_fcn): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(1): SelfGuidedAttention(
(multi_head_attn): ModuleList(
(0): MovieMcanMultiHeadAttention(
(linears): ModuleList(
(0): Linear(in_features=1024, out_features=1024, bias=True)
(1): Linear(in_features=1024, out_features=1024, bias=True)
(2): Linear(in_features=1024, out_features=1024, bias=True)
(3): Linear(in_features=1024, out_features=1024, bias=True)
)
(dropout): Dropout(p=0.1, inplace=False)
)
(1): MovieMcanMultiHeadAttention(
(linears): ModuleList(
(0): Linear(in_features=1024, out_features=1024, bias=True)
(1): Linear(in_features=1024, out_features=1024, bias=True)
(2): Linear(in_features=1024, out_features=1024, bias=True)
(3): Linear(in_features=1024, out_features=1024, bias=True)
)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(fcn): Sequential(
(0): Linear(in_features=1024, out_features=4096, bias=True)
(1): ReLU(inplace=True)
(2): Dropout(p=0.1, inplace=False)
(3): Linear(in_features=4096, out_features=1024, bias=True)
)
(drop_mha): ModuleList(
(0): Dropout(p=0.1, inplace=False)
(1): Dropout(p=0.1, inplace=False)
)
(ln_mha): ModuleList(
(0): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(drop_fcn): Dropout(p=0.1, inplace=False)
(ln_fcn): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(2): SelfGuidedAttention(
(multi_head_attn): ModuleList(
(0): MovieMcanMultiHeadAttention(
(linears): ModuleList(
(0): Linear(in_features=1024, out_features=1024, bias=True)
(1): Linear(in_features=1024, out_features=1024, bias=True)
(2): Linear(in_features=1024, out_features=1024, bias=True)
(3): Linear(in_features=1024, out_features=1024, bias=True)
)
(dropout): Dropout(p=0.1, inplace=False)
)
(1): MovieMcanMultiHeadAttention(
(linears): ModuleList(
(0): Linear(in_features=1024, out_features=1024, bias=True)
(1): Linear(in_features=1024, out_features=1024, bias=True)
(2): Linear(in_features=1024, out_features=1024, bias=True)
(3): Linear(in_features=1024, out_features=1024, bias=True)
)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(fcn): Sequential(
(0): Linear(in_features=1024, out_features=4096, bias=True)
(1): ReLU(inplace=True)
(2): Dropout(p=0.1, inplace=False)
(3): Linear(in_features=4096, out_features=1024, bias=True)
)
(drop_mha): ModuleList(
(0): Dropout(p=0.1, inplace=False)
(1): Dropout(p=0.1, inplace=False)
)
(ln_mha): ModuleList(
(0): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(drop_fcn): Dropout(p=0.1, inplace=False)
(ln_fcn): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(3): SelfGuidedAttention(
(multi_head_attn): ModuleList(
(0): MovieMcanMultiHeadAttention(
(linears): ModuleList(
(0): Linear(in_features=1024, out_features=1024, bias=True)
(1): Linear(in_features=1024, out_features=1024, bias=True)
(2): Linear(in_features=1024, out_features=1024, bias=True)
(3): Linear(in_features=1024, out_features=1024, bias=True)
)
(dropout): Dropout(p=0.1, inplace=False)
)
(1): MovieMcanMultiHeadAttention(
(linears): ModuleList(
(0): Linear(in_features=1024, out_features=1024, bias=True)
(1): Linear(in_features=1024, out_features=1024, bias=True)
(2): Linear(in_features=1024, out_features=1024, bias=True)
(3): Linear(in_features=1024, out_features=1024, bias=True)
)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(fcn): Sequential(
(0): Linear(in_features=1024, out_features=4096, bias=True)
(1): ReLU(inplace=True)
(2): Dropout(p=0.1, inplace=False)
(3): Linear(in_features=4096, out_features=1024, bias=True)
)
(drop_mha): ModuleList(
(0): Dropout(p=0.1, inplace=False)
(1): Dropout(p=0.1, inplace=False)
)
(ln_mha): ModuleList(
(0): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(drop_fcn): Dropout(p=0.1, inplace=False)
(ln_fcn): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(4): SelfGuidedAttention(
(multi_head_attn): ModuleList(
(0): MovieMcanMultiHeadAttention(
(linears): ModuleList(
(0): Linear(in_features=1024, out_features=1024, bias=True)
(1): Linear(in_features=1024, out_features=1024, bias=True)
(2): Linear(in_features=1024, out_features=1024, bias=True)
(3): Linear(in_features=1024, out_features=1024, bias=True)
)
(dropout): Dropout(p=0.1, inplace=False)
)
(1): MovieMcanMultiHeadAttention(
(linears): ModuleList(
(0): Linear(in_features=1024, out_features=1024, bias=True)
(1): Linear(in_features=1024, out_features=1024, bias=True)
(2): Linear(in_features=1024, out_features=1024, bias=True)
(3): Linear(in_features=1024, out_features=1024, bias=True)
)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(fcn): Sequential(
(0): Linear(in_features=1024, out_features=4096, bias=True)
(1): ReLU(inplace=True)
(2): Dropout(p=0.1, inplace=False)
(3): Linear(in_features=4096, out_features=1024, bias=True)
)
(drop_mha): ModuleList(
(0): Dropout(p=0.1, inplace=False)
(1): Dropout(p=0.1, inplace=False)
)
(ln_mha): ModuleList(
(0): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(drop_fcn): Dropout(p=0.1, inplace=False)
(ln_fcn): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(5): SelfGuidedAttention(
(multi_head_attn): ModuleList(
(0): MovieMcanMultiHeadAttention(
(linears): ModuleList(
(0): Linear(in_features=1024, out_features=1024, bias=True)
(1): Linear(in_features=1024, out_features=1024, bias=True)
(2): Linear(in_features=1024, out_features=1024, bias=True)
(3): Linear(in_features=1024, out_features=1024, bias=True)
)
(dropout): Dropout(p=0.1, inplace=False)
)
(1): MovieMcanMultiHeadAttention(
(linears): ModuleList(
(0): Linear(in_features=1024, out_features=1024, bias=True)
(1): Linear(in_features=1024, out_features=1024, bias=True)
(2): Linear(in_features=1024, out_features=1024, bias=True)
(3): Linear(in_features=1024, out_features=1024, bias=True)
)
(dropout): Dropout(p=0.1, inplace=False)
)
)
(fcn): Sequential(
(0): Linear(in_features=1024, out_features=4096, bias=True)
(1): ReLU(inplace=True)
(2): Dropout(p=0.1, inplace=False)
(3): Linear(in_features=4096, out_features=1024, bias=True)
)
(drop_mha): ModuleList(
(0): Dropout(p=0.1, inplace=False)
(1): Dropout(p=0.1, inplace=False)
)
(ln_mha): ModuleList(
(0): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(drop_fcn): Dropout(p=0.1, inplace=False)
(ln_fcn): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
)
)
(sga_pool): AttnPool1d(
(linear): Sequential(
(0): Linear(in_features=1024, out_features=512, bias=True)
(1): ReLU()
(2): Dropout(p=0.1, inplace=False)
(3): Linear(in_features=512, out_features=1, bias=True)
)
)
(cbn): CBNEmbedding(
(layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(cbns): ModuleList(
(0): MovieBottleneck(
(conv1): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d(256, eps=1e-05)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d(256, eps=1e-05)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d(1024, eps=1e-05)
(relu): ReLU(inplace=True)
(downsample): Conv2d(2048, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(cond): Modulation(
(linear): Linear(in_features=1024, out_features=2048, bias=True)
(conv): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1))
)
(se): SEModule(
(se): Sequential(
(0): AdaptiveAvgPool2d(output_size=(1, 1))
(1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(2): ReLU(inplace=True)
(3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(4): Sigmoid()
)
(attn): Sequential(
(0): ChannelPool()
(1): Conv2d(1, 1, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), bias=False)
(2): Sigmoid()
)
)
)
(1): MovieBottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d(256, eps=1e-05)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d(256, eps=1e-05)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d(1024, eps=1e-05)
(relu): ReLU(inplace=True)
(cond): Modulation(
(linear): Linear(in_features=1024, out_features=1024, bias=True)
(conv): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
)
(se): SEModule(
(se): Sequential(
(0): AdaptiveAvgPool2d(output_size=(1, 1))
(1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(2): ReLU(inplace=True)
(3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(4): Sigmoid()
)
(attn): Sequential(
(0): ChannelPool()
(1): Conv2d(1, 1, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), bias=False)
(2): Sigmoid()
)
)
)
(2): MovieBottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d(256, eps=1e-05)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d(256, eps=1e-05)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d(1024, eps=1e-05)
(relu): ReLU(inplace=True)
(cond): Modulation(
(linear): Linear(in_features=1024, out_features=1024, bias=True)
(conv): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
)
(se): SEModule(
(se): Sequential(
(0): AdaptiveAvgPool2d(output_size=(1, 1))
(1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(2): ReLU(inplace=True)
(3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(4): Sigmoid()
)
(attn): Sequential(
(0): ChannelPool()
(1): Conv2d(1, 1, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), bias=False)
(2): Sigmoid()
)
)
)
(3): MovieBottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): FrozenBatchNorm2d(256, eps=1e-05)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): FrozenBatchNorm2d(256, eps=1e-05)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): FrozenBatchNorm2d(1024, eps=1e-05)
(relu): ReLU(inplace=True)
(cond): Modulation(
(linear): Linear(in_features=1024, out_features=1024, bias=True)
(conv): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
)
(se): SEModule(
(se): Sequential(
(0): AdaptiveAvgPool2d(output_size=(1, 1))
(1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(2): ReLU(inplace=True)
(3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(4): Sigmoid()
)
(attn): Sequential(
(0): ChannelPool()
(1): Conv2d(1, 1, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3), bias=False)
(2): Sigmoid()
)
)
)
)
)
)
(image_text_multi_modal_combine_layer): BranchCombineLayer(
(linear_cga): ModuleList(
(0): Linear(in_features=1024, out_features=2048, bias=True)
(1): Linear(in_features=1024, out_features=2048, bias=True)
)
(linear_cbn): ModuleList(
(0): Linear(in_features=1024, out_features=2048, bias=True)
(1): Linear(in_features=1024, out_features=2048, bias=True)
)
(linear_ques): ModuleList(
(0): Linear(in_features=1024, out_features=2048, bias=True)
(1): Linear(in_features=1024, out_features=2048, bias=True)
)
(layer_norm): ModuleList(
(0): LayerNorm((2048,), eps=1e-05, elementwise_affine=True)
(1): LayerNorm((2048,), eps=1e-05, elementwise_affine=True)
(2): LayerNorm((2048,), eps=1e-05, elementwise_affine=True)
)
)
(classifier): ClassifierLayer(
(module): TripleLinear(
(linears): ModuleList(
(0): Linear(in_features=2048, out_features=3129, bias=True)
(1): Linear(in_features=2048, out_features=3129, bias=True)
(2): Linear(in_features=2048, out_features=3129, bias=True)
)
)
)
(losses): Losses(
(losses): ModuleList(
(0): MMFLoss(
(loss_criterion): TripleLogitBinaryCrossEntropy()
)
)
)
)
)
2022-04-24T08:37:29 | mmf.utils.general: Total Parameters: 254918110. Trained Parameters: 254918110
2022-04-24T08:37:29 | mmf.trainers.core.training_loop: Starting training...
Traceback (most recent call last):
File "/root/anaconda3/envs/mm/bin/mmf_run", line 33, in <module>
sys.exit(load_entry_point('mmf==1.0.0rc12', 'console_scripts', 'mmf_run')())
File "/home/cvpr/vqa/mmf-main/mmf_cli/run.py", line 129, in run
nprocs=config.distributed.world_size,
File "/root/anaconda3/envs/mm/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/root/anaconda3/envs/mm/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
while not context.join():
File "/root/anaconda3/envs/mm/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 150, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/root/anaconda3/envs/mm/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
fn(i, *args)
File "/home/cvpr/vqa/mmf-main/mmf_cli/run.py", line 66, in distributed_main
main(configuration, init_distributed=True, predict=predict)
File "/home/cvpr/vqa/mmf-main/mmf_cli/run.py", line 56, in main
trainer.train()
File "/home/cvpr/vqa/mmf-main/mmf/trainers/mmf_trainer.py", line 145, in train
self.training_loop()
File "/home/cvpr/vqa/mmf-main/mmf/trainers/core/training_loop.py", line 33, in training_loop
self.run_training_epoch()
File "/home/cvpr/vqa/mmf-main/mmf/trainers/core/training_loop.py", line 91, in run_training_epoch
report = self.run_training_batch(batch, num_batches_for_this_update)
File "/home/cvpr/vqa/mmf-main/mmf/trainers/core/training_loop.py", line 166, in run_training_batch
report = self._forward(batch)
File "/home/cvpr/vqa/mmf-main/mmf/trainers/core/training_loop.py", line 200, in _forward
model_output = self.model(prepared_batch)
File "/root/anaconda3/envs/mm/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/root/anaconda3/envs/mm/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 799, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/home/cvpr/vqa/mmf-main/mmf/models/base_model.py", line 309, in __call__
model_output = super().__call__(sample_list, *args, **kwargs)
File "/root/anaconda3/envs/mm/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/cvpr/vqa/mmf-main/mmf/models/movie_mcan.py", line 266, in forward
"image", sample_list, text_embedding_total, text_embedding_vec[:, 0]
File "/home/cvpr/vqa/mmf-main/mmf/models/movie_mcan.py", line 243, in process_feature_embedding
sample_list.text_mask,
File "/root/anaconda3/envs/mm/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/cvpr/vqa/mmf-main/mmf/modules/embeddings.py", line 621, in forward
x_cbn = self.cbn(x, v)
File "/root/anaconda3/envs/mm/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/cvpr/vqa/mmf-main/mmf/modules/embeddings.py", line 589, in forward
x, _ = cbn(x, v)
File "/root/anaconda3/envs/mm/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/cvpr/vqa/mmf-main/mmf/modules/bottleneck.py", line 137, in forward
x = self.conv1(x) + self.cond(x, cond)
File "/root/anaconda3/envs/mm/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/root/anaconda3/envs/mm/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 443, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/root/anaconda3/envs/mm/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 440, in _conv_forward
self.padding, self.dilation, self.groups)
RuntimeError: Expected 4-dimensional input for 4-dimensional weight [256, 2048, 1, 1], but got 3-dimensional input of size [32, 100, 2048] instead

(mm) root@dgx5:/home/cvpr/vqa/mmf-main# Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/root/anaconda3/envs/mm/lib/python3.7/multiprocessing/spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "/root/anaconda3/envs/mm/lib/python3.7/multiprocessing/spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
_pickle.UnpicklingError: pickle data was truncated
'''
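For reference, the failing call is `conv1` of the first `MovieBottleneck` in the `cbn` branch, `Conv2d(2048, 256, kernel_size=(1, 1))`, whose weight has shape [256, 2048, 1, 1]. In PyTorch 1.9 (the version in the log) a `Conv2d` only accepts a batched 4-D input laid out as (N, C, H, W), whereas the batch here is [32, 100, 2048], which presumably means 100 region features of dimension 2048 per image as written by extract_features_vmb.py. My assumption, not confirmed, is that the CBN branch of movie_mcan was written for grid-shaped (N, 2048, H, W) features rather than region features. The sketch below is a hypothetical pre-flight check (`check_conv2d_input` is not part of mmf or PyTorch) that reproduces the same rank and channel rule in plain Python, so a feature shape can be tested before launching a run.

```python
from typing import Sequence


def check_conv2d_input(input_shape: Sequence[int], weight_shape: Sequence[int]) -> None:
    """Hypothetical helper: raise ValueError if `input_shape` cannot feed a
    non-grouped Conv2d whose weight has shape `weight_shape`.

    A Conv2d weight is laid out as (out_channels, in_channels, kH, kW), so a
    batched input must be 4-D (N, C, H, W) with C == in_channels.
    """
    if len(weight_shape) != 4:
        raise ValueError(f"expected a 4-D Conv2d weight, got {list(weight_shape)}")
    in_channels = weight_shape[1]

    if len(input_shape) != 4:
        # A 3-D (batch, num_boxes, feature_dim) tensor is what region
        # features look like; Conv2d needs a spatial (H, W) grid instead.
        raise ValueError(
            f"expected 4-D (N, C, H, W) input for weight {list(weight_shape)}, "
            f"got {len(input_shape)}-D input {list(input_shape)}"
        )
    if input_shape[1] != in_channels:
        raise ValueError(
            f"expected {in_channels} input channels, got {input_shape[1]}"
        )


# Grid-shaped features (the 10 x 10 grid is an arbitrary example): passes silently.
check_conv2d_input([32, 2048, 10, 10], [256, 2048, 1, 1])

# The shape from the traceback: rejected, mirroring the RuntimeError above.
try:
    check_conv2d_input([32, 100, 2048], [256, 2048, 1, 1])
except ValueError as err:
    print(err)
```

Only the rank and the 2048 channel dimension matter for this check; it does not say which feature extractor movie_mcan actually expects, so the project's own config and README remain the authority on that.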
