Instructions To Reproduce the Issue:
1. Full code you wrote or full changes you made (git diff):
I didn't change the code.
2. What exact command you ran:
I ran the following command inside a Singularity container.
singularity exec --nv singularity.sif mmf_run config=/scratch/UserName/hateful_meme/mmf/projects/hateful_memes/configs/visual_bert/direct.yaml model=visual_bert dataset=hateful_memes env.data_dir=/scratch/UserName/hateful_meme/data training.num_workers=1 training.fast_read=True
3. Full logs you observed:
WARNING: underlay of /usr/bin/nvidia-debugdump required more than 50 (375) bind mounts
/usr/local/lib/python3.7/dist-packages/omegaconf/dictconfig.py:252: UserWarning: Keys with dot (model.bert) are deprecated and will have different semantic meaning the next major version of OmegaConf (2.1)
See the compact keys issue for more details: https://github.com/omry/omegaconf/issues/152
You can disable this warning by setting the environment variable OC_DISABLE_DOT_ACCESS_WARNING=1
warnings.warn(message=msg, category=UserWarning)
2020-11-18T07:22:20 | mmf.utils.configuration: Overriding option config to /scratch/UserName/hateful_meme/mmf/projects/hateful_memes/configs/visual_bert/direct.yaml
2020-11-18T07:22:20 | mmf.utils.configuration: Overriding option model to visual_bert
2020-11-18T07:22:20 | mmf.utils.configuration: Overriding option datasets to hateful_memes
2020-11-18T07:22:20 | mmf.utils.configuration: Overriding option env.data_dir to /scratch/UserName/hateful_meme/data
2020-11-18T07:22:20 | mmf: Logging to: ./save/train.log
2020-11-18T07:22:20 | mmf_cli.run: Namespace(config_override=None, local_rank=None, opts=['config=/scratch/UserName/hateful_meme/mmf/projects/hateful_memes/configs/visual_bert/direct.yaml', 'model=visual_bert', 'dataset=hateful_memes', 'env.data_dir=/scratch/UserName/hateful_meme/data'])
2020-11-18T07:22:20 | mmf_cli.run: Torch version: 1.6.0+cu101
2020-11-18T07:22:20 | mmf.utils.general: CUDA Device 0 is: Tesla V100-PCIE-32GB
2020-11-18T07:22:20 | mmf_cli.run: Using seed 21259699
2020-11-18T07:22:20 | mmf.trainers.mmf_trainer: Loading datasets
[ Starting checksum for features.tar.gz]
[ Checksum successful for features.tar.gz]
Unpacking features.tar.gz
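Incidentally, the OmegaConf dot-key warning at the top is unrelated to the hang; as the warning text itself says, it can be silenced by setting the environment variable before launching, e.g. via Singularity's standard env-injection prefix:
export SINGULARITYENV_OC_DISABLE_DOT_ACCESS_WARNING=1   # makes OC_DISABLE_DOT_ACCESS_WARNING=1 visible inside the container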
Expected behavior:
No error is raised, but unpacking features.tar.gz takes forever, which is unexpected. To check whether the problem was simply slow unpacking, I manually downloaded and unpacked the archive locally; it was not. However, when I reran the command above, it skipped the download stage but got stuck again at "mmf.trainers.mmf_trainer: Loading datasets". I waited overnight to make sure it was not just slow, but nothing changed.
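To reproduce that manual check, the extraction can be timed by hand; the directory below is an assumption about where MMF places the hateful_memes download under env.data_dir:
cd /scratch/UserName/hateful_meme/data/datasets/hateful_memes/defaults   # assumed download location
time tar -xzf features.tar.gz                                            # measures raw extraction time, independent of MMF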
Environment:
WARNING: underlay of /usr/bin/nvidia-debugdump required more than 50 (375) bind mounts
Collecting environment information...
PyTorch version: 1.6.0+cu101
Is debug build: No
CUDA used to build PyTorch: 10.1
OS: Ubuntu 16.04.6 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
CMake version: Could not collect
Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: Tesla K20Xm
Nvidia driver version: 418.39
cuDNN version: Could not collect
Versions of relevant libraries:
[pip3] numpy==1.19.4
[pip3] torch==1.6.0+cu101
[pip3] torchtext==0.5.0
[pip3] torchvision==0.7.0+cu101
[conda] Could not collect
You don't need training.fast_read; it is for something else. Specifically, set CUDA_VISIBLE_DEVICES=0 to run on a single GPU and training.num_workers=0 to run with only one dataset worker.
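Putting both suggestions together with the original invocation gives something like the following (Singularity passes the host environment into the container by default, so CUDA_VISIBLE_DEVICES set on the host takes effect inside):
CUDA_VISIBLE_DEVICES=0 singularity exec --nv singularity.sif mmf_run config=/scratch/UserName/hateful_meme/mmf/projects/hateful_memes/configs/visual_bert/direct.yaml model=visual_bert dataset=hateful_memes env.data_dir=/scratch/UserName/hateful_meme/data training.num_workers=0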
Hi, I tried your suggested command, but it still gets stuck at the same place. If it is useful: I can launch the code smoothly only for the Image-Grid baseline. A similar issue always happens when it tries to unpack things, whether extras.tar.gz or features.tar.gz. The files download automatically, and during unpacking the total size of the extracted files keeps growing and then plateaus at some point, but the main process stays stuck at "Unpacking X.tar.gz" and never moves on.
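One way to check whether the extraction itself actually finished (as the plateauing size suggests) is to compare the archive's member count against what is on disk; a rough sketch, again assuming the download location and that the archive unpacks into a features/ directory:
cd /scratch/UserName/hateful_meme/data/datasets/hateful_memes/defaults
tar -tzf features.tar.gz | wc -l   # entries the archive should produce
find features -type f | wc -l     # files actually extracted so far (directory name is an assumption)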