Following the Bert-finetuning tutorial results in ImportError or IsADirectoryError: for run_squad_baseline.sh #474

Open
Santosh-Gupta opened this issue Oct 16, 2020 · 14 comments

@Santosh-Gupta

I followed the getting-started directions here:

https://www.deepspeed.ai/tutorials/bert-finetuning/

I pulled the docker image and started a container.

I ran the following commands in a Jupyter notebook (server running in the container)

%set_env CUDA_VISIBLE_DEVICES=0,1,2
%cd /home/santosh/Projects/MsZeroTS
!git clone https://github.com/microsoft/DeepSpeed
!mkdir tfModel
!mkdir hfModel

#Save a HF version of the model
!pip install transformers
from transformers import BertModel
model = BertModel.from_pretrained('bert-base-cased')
model.save_pretrained('/home/santosh/Projects/MsZeroTS/hfModel')

#Save a tf version of the model 
%cd /home/santosh/Projects/MsZeroTS/tfModel
!wget https://storage.googleapis.com/bert_models/2018_10_18/cased_L-12_H-768_A-12.zip
!unzip -q cased_L-12_H-768_A-12.zip
%cd ..

%cd DeepSpeed
!git submodule update --init --recursive
%cd DeepSpeedExamples/BingBertSquad

#Download data 
!mkdir Data
%cd Data
!wget https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json
!wget https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json
%cd ..

#try tf version
!bash run_squad_baseline.sh 3 /home/santosh/Projects/MsZeroTS/TestModel/cased_L-12_H-768_A-12 /home/santosh/Projects/MsZeroTS/DeepSpeed/DeepSpeedExamples/BingBertSquad/Data /home/santosh/Projects/MsZeroTS/output1

#try hf version 
!bash run_squad_baseline.sh 3 /home/santosh/Projects/MsZeroTS/hfModel /home/santosh/Projects/MsZeroTS/DeepSpeed/DeepSpeedExamples/BingBertSquad/Data /home/santosh/Projects/MsZeroTS/output2
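
Before launching either run, a quick sanity check on the inputs can rule out missing files. The sketch below is a hypothetical helper (`missing_inputs` is not part of DeepSpeed or the example scripts). It assumes the data directory should contain `train-v1.1.json` and `dev-v1.1.json`, the two names the launch command passes as `--train_file` and `--predict_file`.

```python
import os

# File names assumed from the --train_file / --predict_file arguments
# that the launch command builds from the data directory.
EXPECTED_FILES = ("train-v1.1.json", "dev-v1.1.json")


def missing_inputs(data_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in data_dir."""
    return [name for name in expected
            if not os.path.isfile(os.path.join(data_dir, name))]
```

Calling `missing_inputs("Data")` from the BingBertSquad folder returns an empty list when both files are in place.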

Neither the tf nor the hf version of the model works. This is a sample output from the baseline runs:

lr is 0.00003
seed is 12345
master port is 29500
dropout is 0.1
deepspeed --num_nodes 1 --num_gpus 3 --master_port=29500 --hostfile /dev/null nvidia_run_squad_deepspeed.py --bert_model bert-large-uncased --do_train --do_lower_case --predict_batch_size 3 --do_predict --train_file /home/santosh/Projects/MicrosoftZero/DeepSpeed/DeepSpeedExamples/BingBertSquad/Data/train-v1.1.json --predict_file /home/santosh/Projects/MicrosoftZero/DeepSpeed/DeepSpeedExamples/BingBertSquad/Data/dev-v1.1.json --train_batch_size 8 --learning_rate 0.00003 --num_train_epochs 2.0 --max_seq_length 384 --doc_stride 128 --output_dir /home/santosh/Projects/MicrosoftZero/DeepSpeed/output --job_name deepspeed_3GPUs_24batch_size --gradient_accumulation_steps 2 --fp16 --deepspeed --deepspeed_config onebit_deepspeed_bsz24_config.json --dropout 0.1 --model_file /home/santosh/Projects/MicrosoftZero/TestModel/cased_L-12_H-768_A-12 --seed 12345 --preln
[2020-10-16 04:30:19,447] [WARNING] [stage2.py:32:<module>] apex was installed without --cpp_ext.  Falling back to Python flatten and unflatten.
[2020-10-16 04:30:19,465] [WARNING] [engine.py:48:<module>] Warning:  apex was installed without --cpp_ext.  Falling back to Python flatten and unflatten.
[2020-10-16 04:30:19,469] [WARNING] [runner.py:117:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2020-10-16 04:30:19,533] [INFO] [runner.py:355:main] cmd = /usr/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMl19 --master_addr=127.0.0.1 --master_port=29500 nvidia_run_squad_deepspeed.py --bert_model bert-large-uncased --do_train --do_lower_case --predict_batch_size 3 --do_predict --train_file /home/santosh/Projects/MicrosoftZero/DeepSpeed/DeepSpeedExamples/BingBertSquad/Data/train-v1.1.json --predict_file /home/santosh/Projects/MicrosoftZero/DeepSpeed/DeepSpeedExamples/BingBertSquad/Data/dev-v1.1.json --train_batch_size 8 --learning_rate 0.00003 --num_train_epochs 2.0 --max_seq_length 384 --doc_stride 128 --output_dir /home/santosh/Projects/MicrosoftZero/DeepSpeed/output --job_name deepspeed_3GPUs_24batch_size --gradient_accumulation_steps 2 --fp16 --deepspeed --deepspeed_config onebit_deepspeed_bsz24_config.json --dropout 0.1 --model_file /home/santosh/Projects/MicrosoftZero/TestModel/cased_L-12_H-768_A-12 --seed 12345 --preln
[2020-10-16 04:30:20,172] [WARNING] [stage2.py:32:<module>] apex was installed without --cpp_ext.  Falling back to Python flatten and unflatten.
[2020-10-16 04:30:20,190] [WARNING] [engine.py:48:<module>] Warning:  apex was installed without --cpp_ext.  Falling back to Python flatten and unflatten.
[2020-10-16 04:30:20,193] [INFO] [launch.py:71:main] 0 NCCL_VERSION 2.6.4
[2020-10-16 04:30:20,193] [INFO] [launch.py:78:main] WORLD INFO DICT: {'localhost': [0, 1, 2]}
[2020-10-16 04:30:20,194] [INFO] [launch.py:87:main] nnodes=1, num_local_procs=3, node_rank=0
[2020-10-16 04:30:20,194] [INFO] [launch.py:99:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2]})
[2020-10-16 04:30:20,194] [INFO] [launch.py:100:main] dist_world_size=3
[2020-10-16 04:30:20,194] [INFO] [launch.py:103:main] Setting CUDA_VISIBLE_DEVICES=0,1,2
[2020-10-16 04:30:20,899] [WARNING] [stage2.py:32:<module>] apex was installed without --cpp_ext.  Falling back to Python flatten and unflatten.
[2020-10-16 04:30:20,917] [WARNING] [engine.py:48:<module>] Warning:  apex was installed without --cpp_ext.  Falling back to Python flatten and unflatten.
[2020-10-16 04:30:20,927] [WARNING] [stage2.py:32:<module>] apex was installed without --cpp_ext.  Falling back to Python flatten and unflatten.
[2020-10-16 04:30:20,945] [WARNING] [engine.py:48:<module>] Warning:  apex was installed without --cpp_ext.  Falling back to Python flatten and unflatten.
[2020-10-16 04:30:20,985] [WARNING] [stage2.py:32:<module>] apex was installed without --cpp_ext.  Falling back to Python flatten and unflatten.
[2020-10-16 04:30:21,003] [WARNING] [engine.py:48:<module>] Warning:  apex was installed without --cpp_ext.  Falling back to Python flatten and unflatten.
10/16/2020 04:30:21 - INFO - pytorch_pretrained_bert.tokenization -   loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-vocab.txt from cache at /home/santosh/.pytorch_pretrained_bert/9b3c03a36e83b13d5ba95ac965c9f9074a99e14340c523ab405703179e79fc46.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
10/16/2020 04:30:21 - INFO - pytorch_pretrained_bert.tokenization -   loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-vocab.txt from cache at /home/santosh/.pytorch_pretrained_bert/9b3c03a36e83b13d5ba95ac965c9f9074a99e14340c523ab405703179e79fc46.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
10/16/2020 04:30:21 - INFO - pytorch_pretrained_bert.tokenization -   loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-vocab.txt from cache at /home/santosh/.pytorch_pretrained_bert/9b3c03a36e83b13d5ba95ac965c9f9074a99e14340c523ab405703179e79fc46.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
Traceback (most recent call last):
  File "nvidia_run_squad_deepspeed.py", line 1143, in <module>
    main()
  File "nvidia_run_squad_deepspeed.py", line 816, in main
    model = BertForQuestionAnsweringPreLN(bert_config, args)
  File "/home/santosh/Projects/MicrosoftZero/DeepSpeed/DeepSpeedExamples/BingBertSquad/turing/nvidia_modelingpreln.py", line 1500, in __init__
    self.bert = BertModel(config, args)
  File "/home/santosh/Projects/MicrosoftZero/DeepSpeed/DeepSpeedExamples/BingBertSquad/turing/nvidia_modelingpreln.py", line 927, in __init__
    self.embeddings = BertEmbeddings(config)
  File "/home/santosh/Projects/MicrosoftZero/DeepSpeed/DeepSpeedExamples/BingBertSquad/turing/nvidia_modelingpreln.py", line 354, in __init__
    self.LayerNorm = BertLayerNorm(config.hidden_size, eps=1e-12)
  File "/usr/local/lib/python3.6/dist-packages/apex/normalization/fused_layer_norm.py", line 133, in __init__
    fused_layer_norm_cuda = importlib.import_module("fused_layer_norm_cuda")
  File "/usr/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 658, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 571, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 922, in create_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
ImportError: /usr/local/lib/python3.6/dist-packages/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN6caffe26detail37_typeMetaDataInstance_preallocated_32E
Traceback (most recent call last):
  File "nvidia_run_squad_deepspeed.py", line 1143, in <module>
    main()
  File "nvidia_run_squad_deepspeed.py", line 816, in main
    model = BertForQuestionAnsweringPreLN(bert_config, args)
  File "/home/santosh/Projects/MicrosoftZero/DeepSpeed/DeepSpeedExamples/BingBertSquad/turing/nvidia_modelingpreln.py", line 1500, in __init__
    self.bert = BertModel(config, args)
  File "/home/santosh/Projects/MicrosoftZero/DeepSpeed/DeepSpeedExamples/BingBertSquad/turing/nvidia_modelingpreln.py", line 927, in __init__
    self.embeddings = BertEmbeddings(config)
  File "/home/santosh/Projects/MicrosoftZero/DeepSpeed/DeepSpeedExamples/BingBertSquad/turing/nvidia_modelingpreln.py", line 354, in __init__
    self.LayerNorm = BertLayerNorm(config.hidden_size, eps=1e-12)
  File "/usr/local/lib/python3.6/dist-packages/apex/normalization/fused_layer_norm.py", line 133, in __init__
    fused_layer_norm_cuda = importlib.import_module("fused_layer_norm_cuda")
  File "/usr/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 658, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 571, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 922, in create_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
ImportError: /usr/local/lib/python3.6/dist-packages/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN6caffe26detail37_typeMetaDataInstance_preallocated_32E
Traceback (most recent call last):
  File "nvidia_run_squad_deepspeed.py", line 1143, in <module>
    main()
  File "nvidia_run_squad_deepspeed.py", line 816, in main
    model = BertForQuestionAnsweringPreLN(bert_config, args)
  File "/home/santosh/Projects/MicrosoftZero/DeepSpeed/DeepSpeedExamples/BingBertSquad/turing/nvidia_modelingpreln.py", line 1500, in __init__
    self.bert = BertModel(config, args)
  File "/home/santosh/Projects/MicrosoftZero/DeepSpeed/DeepSpeedExamples/BingBertSquad/turing/nvidia_modelingpreln.py", line 927, in __init__
    self.embeddings = BertEmbeddings(config)
  File "/home/santosh/Projects/MicrosoftZero/DeepSpeed/DeepSpeedExamples/BingBertSquad/turing/nvidia_modelingpreln.py", line 354, in __init__
    self.LayerNorm = BertLayerNorm(config.hidden_size, eps=1e-12)
  File "/usr/local/lib/python3.6/dist-packages/apex/normalization/fused_layer_norm.py", line 133, in __init__
    fused_layer_norm_cuda = importlib.import_module("fused_layer_norm_cuda")
  File "/usr/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 658, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 571, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 922, in create_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
ImportError: /usr/local/lib/python3.6/dist-packages/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN6caffe26detail37_typeMetaDataInstance_preallocated_32E

I tried both the hf and tf versions of the model because it looked like the error was related to the model initialization.
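
One way to narrow this down is to try the failing import in isolation. The sketch below is only a diagnostic aid: `try_import` is a hypothetical helper. It is also an assumption, not something confirmed for this container, that an undefined-symbol error like this one usually means the compiled apex extension was built against a different PyTorch build than the one now installed.

```python
import importlib


def try_import(module_name):
    """Try to import module_name and return (ok, error_message)."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        # Undefined-symbol failures from compiled extensions surface here.
        return False, str(exc)
    return True, ""
```

Calling `try_import("torch")` followed by `try_import("fused_layer_norm_cuda")` in the notebook and again in the terminal would show whether both environments fail on the same extension.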

This might be helpful: in the same notebook, I ran another PyTorch training script without any errors.

I also tried running run_squad_baseline.sh outside the Jupyter notebook, directly in a terminal. For both the hf and tf versions I get a different error; it looks like the script cannot load the model from the directory. Here is a sample output:

/home/santosh/Projects/MicrosoftZero/DeepSpeed/DeepSpeedExamples/BingBertSquad$ bash run_squad_baseline.sh 3 /home/santosh/Projects/MsZeroTS/hfModel/ /home/santosh/Projects/MsZeroTS/DeepSpeed/DeepSpeedExamples/BingBertSquad/Data /home/santosh/Projects/MsZeroTS/output12
deepspeed --num_nodes 1 --num_gpus 3 nvidia_run_squad_baseline.py --bert_model bert-large-uncased --do_train --do_lower_case --do_predict --train_file /home/santosh/Projects/MsZeroTS/DeepSpeed/DeepSpeedExamples/BingBertSquad/Data/train-v1.1.json --predict_file /home/santosh/Projects/MsZeroTS/DeepSpeed/DeepSpeedExamples/BingBertSquad/Data/dev-v1.1.json --train_batch_size 8 --learning_rate 3e-5 --num_train_epochs 2.0 --max_seq_length 384 --doc_stride 128 --output_dir /home/santosh/Projects/MsZeroTS/output12 --job_name baseline_3GPUs_24batch_size --gradient_accumulation_steps 1 --fp16 --model_file /home/santosh/Projects/MsZeroTS/hfModel/
[2020-10-16 05:23:17,877] [WARNING] [runner.py:117:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2020-10-16 05:23:17,910] [INFO] [runner.py:355:main] cmd = /usr/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMl19 --master_addr=127.0.0.1 --master_port=29500 nvidia_run_squad_baseline.py --bert_model bert-large-uncased --do_train --do_lower_case --do_predict --train_file /home/santosh/Projects/MsZeroTS/DeepSpeed/DeepSpeedExamples/BingBertSquad/Data/train-v1.1.json --predict_file /home/santosh/Projects/MsZeroTS/DeepSpeed/DeepSpeedExamples/BingBertSquad/Data/dev-v1.1.json --train_batch_size 8 --learning_rate 3e-5 --num_train_epochs 2.0 --max_seq_length 384 --doc_stride 128 --output_dir /home/santosh/Projects/MsZeroTS/output12 --job_name baseline_3GPUs_24batch_size --gradient_accumulation_steps 1 --fp16 --model_file /home/santosh/Projects/MsZeroTS/hfModel/
[2020-10-16 05:23:18,468] [INFO] [launch.py:71:main] 0 NCCL_VERSION 2.6.4
[2020-10-16 05:23:18,469] [INFO] [launch.py:78:main] WORLD INFO DICT: {'localhost': [0, 1, 2]}
[2020-10-16 05:23:18,469] [INFO] [launch.py:87:main] nnodes=1, num_local_procs=3, node_rank=0
[2020-10-16 05:23:18,469] [INFO] [launch.py:99:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2]})
[2020-10-16 05:23:18,469] [INFO] [launch.py:100:main] dist_world_size=3
[2020-10-16 05:23:18,469] [INFO] [launch.py:103:main] Setting CUDA_VISIBLE_DEVICES=0,1,2
10/16/2020 05:23:19 - INFO - __main__ -   device: cuda:1 n_gpu: 1, distributed training: True, 16-bits training: True
10/16/2020 05:23:19 - INFO - __main__ -   device: cuda:2 n_gpu: 1, distributed training: True, 16-bits training: True
10/16/2020 05:23:19 - INFO - __main__ -   device: cuda:0 n_gpu: 1, distributed training: True, 16-bits training: True
10/16/2020 05:23:19 - INFO - pytorch_pretrained_bert.tokenization -   loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-vocab.txt from cache at /home/santosh/Projects/MicrosoftZero/DeepSpeed/cache/9b3c03a36e83b13d5ba95ac965c9f9074a99e14340c523ab405703179e79fc46.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
10/16/2020 05:23:19 - INFO - pytorch_pretrained_bert.tokenization -   loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-vocab.txt from cache at /home/santosh/Projects/MicrosoftZero/DeepSpeed/cache/9b3c03a36e83b13d5ba95ac965c9f9074a99e14340c523ab405703179e79fc46.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
10/16/2020 05:23:19 - INFO - pytorch_pretrained_bert.tokenization -   loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-vocab.txt from cache at /home/santosh/Projects/MicrosoftZero/DeepSpeed/cache/9b3c03a36e83b13d5ba95ac965c9f9074a99e14340c523ab405703179e79fc46.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
VOCAB SIZE: 30528
10/16/2020 05:23:32 - INFO - __main__ -   Loading Pretrained Bert Encoder from: /home/santosh/Projects/MsZeroTS/hfModel/
Traceback (most recent call last):
  File "nvidia_run_squad_baseline.py", line 1158, in <module>
    main()
  File "nvidia_run_squad_baseline.py", line 872, in main
    map_location=torch.device("cpu"))
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 381, in load
    f = open(f, 'rb')
IsADirectoryError: [Errno 21] Is a directory: '/home/santosh/Projects/MsZeroTS/hfModel/'
VOCAB SIZE: 30528
10/16/2020 05:23:32 - INFO - __main__ -   Loading Pretrained Bert Encoder from: /home/santosh/Projects/MsZeroTS/hfModel/
Traceback (most recent call last):
  File "nvidia_run_squad_baseline.py", line 1158, in <module>
    main()
  File "nvidia_run_squad_baseline.py", line 872, in main
    map_location=torch.device("cpu"))
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 381, in load
    f = open(f, 'rb')
IsADirectoryError: [Errno 21] Is a directory: '/home/santosh/Projects/MsZeroTS/hfModel/'
VOCAB SIZE: 30528
10/16/2020 05:23:32 - INFO - __main__ -   Loading Pretrained Bert Encoder from: /home/santosh/Projects/MsZeroTS/hfModel/
Traceback (most recent call last):
  File "nvidia_run_squad_baseline.py", line 1158, in <module>
    main()
  File "nvidia_run_squad_baseline.py", line 872, in main
    map_location=torch.device("cpu"))
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 381, in load
    f = open(f, 'rb')
IsADirectoryError: [Errno 21] Is a directory: '/home/santosh/Projects/MsZeroTS/hfModel/'
@tjruwase
Contributor

@Santosh-Gupta, thanks for using DeepSpeed.

The second argument to the script is the model file itself rather than a folder. Please see here and here for details.
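
The `IsADirectoryError` in the traceback above comes from `torch.load` being handed a directory. A minimal sketch of a guard that could run before launching is shown here. `resolve_model_file` is a hypothetical helper, and the default candidate name `pytorch_model.bin` is an assumption based on what Hugging Face's `save_pretrained` typically writes. Whether the weights in that file match what the script expects is a separate question that this does not answer.

```python
import os


def resolve_model_file(path, candidates=("pytorch_model.bin",)):
    """Return a path to a checkpoint file, given a file or a directory."""
    if os.path.isfile(path):
        return path
    if os.path.isdir(path):
        # Look for a known checkpoint name inside the directory.
        for name in candidates:
            candidate = os.path.join(path, name)
            if os.path.isfile(candidate):
                return candidate
        raise IsADirectoryError(
            f"{path} is a directory with none of: {', '.join(candidates)}")
    raise FileNotFoundError(path)
```

The returned path is what would be passed as the second argument to the shell script.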

@Santosh-Gupta
Author

Thanks for the info. It looks like I need to point to the checkpoint file specifically. So for a TensorFlow model, I would point to model.ckpt.index (or is it model.ckpt.meta?), and for a Hugging Face model, just to the model.bin.

It seems that some model types need more than one file to be fully defined. I'm guessing the library searches the containing folder for any other files it needs, such as the config files. Is that what is going on, or is it somehow using just the checkpoint file?

@tjruwase
Contributor

@Santosh-Gupta Did you report a Default process group is not initialized error?

@Santosh-Gupta
Author

Santosh-Gupta commented Oct 23, 2020

@Santosh-Gupta Did you report a Default process group is not initialized error?

Yes, sorry. I noticed in the code that the model used was bert-large-cased, whereas I was using bert-base-uncased, so I wanted to see if switching the model made a difference. I'm still getting errors.

For the following, I pointed the model file path to the Hugging Face .bin file and ran run_squad_deepspeed.sh.

I first tried running the code in a Jupyter notebook, with the server running in the DeepSpeed container. This was the full output:

lr is 0.00003
seed is 12345
master port is 29500
dropout is 0.1
deepspeed --num_nodes 1 --num_gpus 3 --master_port=29500 --hostfile /dev/null nvidia_run_squad_deepspeed.py --bert_model bert-large-uncased --do_train --do_lower_case --predict_batch_size 3 --do_predict --train_file /home/santosh/projects/deepSpeed/testData/train-v1.1.json --predict_file /home/santosh/projects/deepSpeed/testData/dev-v1.1.json --train_batch_size 8 --learning_rate 0.00003 --num_train_epochs 2.0 --max_seq_length 384 --doc_stride 128 --output_dir /home/santosh/projects/deepSpeed/outputs/a1 --job_name deepspeed_3GPUs_24batch_size --gradient_accumulation_steps 2 --fp16 --deepspeed --deepspeed_config onebit_deepspeed_bsz24_config.json --dropout 0.1 --model_file /home/santosh/projects/deepSpeed/testModels/hf/pytorch_model.bin --seed 12345 --preln
[2020-10-23 08:33:23,956] [WARNING] [runner.py:117:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2020-10-23 08:33:24,011] [INFO] [runner.py:355:main] cmd = /usr/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMl19 --master_addr=127.0.0.1 --master_port=29500 nvidia_run_squad_deepspeed.py --bert_model bert-large-uncased --do_train --do_lower_case --predict_batch_size 3 --do_predict --train_file /home/santosh/projects/deepSpeed/testData/train-v1.1.json --predict_file /home/santosh/projects/deepSpeed/testData/dev-v1.1.json --train_batch_size 8 --learning_rate 0.00003 --num_train_epochs 2.0 --max_seq_length 384 --doc_stride 128 --output_dir /home/santosh/projects/deepSpeed/outputs/a1 --job_name deepspeed_3GPUs_24batch_size --gradient_accumulation_steps 2 --fp16 --deepspeed --deepspeed_config onebit_deepspeed_bsz24_config.json --dropout 0.1 --model_file /home/santosh/projects/deepSpeed/testModels/hf/pytorch_model.bin --seed 12345 --preln
[2020-10-23 08:33:24,576] [INFO] [launch.py:71:main] 0 NCCL_VERSION 2.6.4
[2020-10-23 08:33:24,576] [INFO] [launch.py:78:main] WORLD INFO DICT: {'localhost': [0, 1, 2]}
[2020-10-23 08:33:24,576] [INFO] [launch.py:87:main] nnodes=1, num_local_procs=3, node_rank=0
[2020-10-23 08:33:24,576] [INFO] [launch.py:99:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2]})
[2020-10-23 08:33:24,576] [INFO] [launch.py:100:main] dist_world_size=3
[2020-10-23 08:33:24,576] [INFO] [launch.py:103:main] Setting CUDA_VISIBLE_DEVICES=0,1,2
10/23/2020 08:33:25 - INFO - pytorch_pretrained_bert.tokenization -   loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-vocab.txt from cache at "/home/santosh"/.pytorch_pretrained_bert/9b3c03a36e83b13d5ba95ac965c9f9074a99e14340c523ab405703179e79fc46.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
10/23/2020 08:33:25 - INFO - pytorch_pretrained_bert.tokenization -   loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-vocab.txt from cache at "/home/santosh"/.pytorch_pretrained_bert/9b3c03a36e83b13d5ba95ac965c9f9074a99e14340c523ab405703179e79fc46.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
10/23/2020 08:33:25 - INFO - pytorch_pretrained_bert.tokenization -   loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-vocab.txt from cache at "/home/santosh"/.pytorch_pretrained_bert/9b3c03a36e83b13d5ba95ac965c9f9074a99e14340c523ab405703179e79fc46.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
Traceback (most recent call last):
  File "nvidia_run_squad_deepspeed.py", line 1143, in <module>
    main()
  File "nvidia_run_squad_deepspeed.py", line 816, in main
    model = BertForQuestionAnsweringPreLN(bert_config, args)
  File "/home/santosh/projects/deepSpeed/DeepSpeed/DeepSpeedExamples/BingBertSquad/turing/nvidia_modelingpreln.py", line 1500, in __init__
    self.bert = BertModel(config, args)
  File "/home/santosh/projects/deepSpeed/DeepSpeed/DeepSpeedExamples/BingBertSquad/turing/nvidia_modelingpreln.py", line 930, in __init__
    self.apply(self.init_bert_weights)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 293, in apply
    module.apply(fn)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 293, in apply
    module.apply(fn)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 293, in apply
    module.apply(fn)
  [Previous line repeated 3 more times]
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 294, in apply
    fn(self)
  File "/home/santosh/projects/deepSpeed/DeepSpeed/DeepSpeedExamples/BingBertSquad/turing/nvidia_modelingpreln.py", line 730, in init_bert_weights
    if torch.distributed.get_rank() == 0:
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/distributed_c10d.py", line 562, in get_rank
    _check_default_pg()
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/distributed_c10d.py", line 191, in _check_default_pg
    "Default process group is not initialized"
AssertionError: Default process group is not initialized
Traceback (most recent call last):
  File "nvidia_run_squad_deepspeed.py", line 1143, in <module>
    main()
  File "nvidia_run_squad_deepspeed.py", line 816, in main
    model = BertForQuestionAnsweringPreLN(bert_config, args)
  File "/home/santosh/projects/deepSpeed/DeepSpeed/DeepSpeedExamples/BingBertSquad/turing/nvidia_modelingpreln.py", line 1500, in __init__
    self.bert = BertModel(config, args)
  File "/home/santosh/projects/deepSpeed/DeepSpeed/DeepSpeedExamples/BingBertSquad/turing/nvidia_modelingpreln.py", line 930, in __init__
    self.apply(self.init_bert_weights)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 293, in apply
    module.apply(fn)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 293, in apply
    module.apply(fn)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 293, in apply
    module.apply(fn)
  [Previous line repeated 3 more times]
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 294, in apply
    fn(self)
  File "/home/santosh/projects/deepSpeed/DeepSpeed/DeepSpeedExamples/BingBertSquad/turing/nvidia_modelingpreln.py", line 730, in init_bert_weights
    if torch.distributed.get_rank() == 0:
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/distributed_c10d.py", line 562, in get_rank
    _check_default_pg()
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/distributed_c10d.py", line 191, in _check_default_pg
    "Default process group is not initialized"
AssertionError: Default process group is not initialized
Traceback (most recent call last):
  File "nvidia_run_squad_deepspeed.py", line 1143, in <module>
    main()
  File "nvidia_run_squad_deepspeed.py", line 816, in main
    model = BertForQuestionAnsweringPreLN(bert_config, args)
  File "/home/santosh/projects/deepSpeed/DeepSpeed/DeepSpeedExamples/BingBertSquad/turing/nvidia_modelingpreln.py", line 1500, in __init__
    self.bert = BertModel(config, args)
  File "/home/santosh/projects/deepSpeed/DeepSpeed/DeepSpeedExamples/BingBertSquad/turing/nvidia_modelingpreln.py", line 930, in __init__
    self.apply(self.init_bert_weights)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 293, in apply
    module.apply(fn)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 293, in apply
    module.apply(fn)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 293, in apply
    module.apply(fn)
  [Previous line repeated 3 more times]
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 294, in apply
    fn(self)
  File "/home/santosh/projects/deepSpeed/DeepSpeed/DeepSpeedExamples/BingBertSquad/turing/nvidia_modelingpreln.py", line 730, in init_bert_weights
    if torch.distributed.get_rank() == 0:
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/distributed_c10d.py", line 562, in get_rank
    _check_default_pg()
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/distributed_c10d.py", line 191, in _check_default_pg
    "Default process group is not initialized"
AssertionError: Default process group is not initialized
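
The assertion is raised by `torch.distributed.get_rank()` when no default process group exists at the time `init_bert_weights` runs. The sketch below shows one defensive pattern, not a fix confirmed in this thread: a hypothetical `safe_rank` helper that falls back to rank 0 when `torch.distributed` is unavailable or uninitialized. It relies on `torch.distributed.is_available()`, `is_initialized()`, and `get_rank()`. The optional `dist` parameter is an illustrative addition so the logic can be exercised without PyTorch installed.

```python
def safe_rank(dist=None):
    """Return the process rank, or 0 when no default process group exists."""
    if dist is None:
        try:
            import torch.distributed as dist
        except ImportError:
            # PyTorch is not installed: treat this as a single process.
            return 0
    if not dist.is_available() or not dist.is_initialized():
        # No process group yet, so get_rank() would raise the assertion.
        return 0
    return dist.get_rank()
```

A guard like this only hides the symptom. If the script is meant to run distributed, the process group still has to be initialized before the model is constructed.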

I then tried running it directly in the DeepSpeed Docker container terminal, in case there was an issue with Jupyter; the error there appears to be different.

lr is 0.00003
seed is 12345
master port is 29500
dropout is 0.1
deepspeed --num_nodes 1 --num_gpus 3 --master_port=29500 --hostfile /dev/null nvidia_run_squad_deepspeed.py --bert_model bert-large-uncased --do_train --do_lower_case --predict_batch_size 3 --do_predict --train_file /home/santosh/projects/deepSpeed/testData/train-v1.1.json --predict_file /home/santosh/projects/deepSpeed/testData/dev-v1.1.json --train_batch_size 8 --learning_rate 0.00003 --num_train_epochs 2.0 --max_seq_length 384 --doc_stride 128 --output_dir /home/santosh/projects/deepSpeed/outputs/a1ab --job_name deepspeed_3GPUs_24batch_size --gradient_accumulation_steps 2 --fp16 --deepspeed --deepspeed_config onebit_deepspeed_bsz24_config.json --dropout 0.1 --model_file /home/santosh/projects/deepSpeed/testModels/hf/pytorch_model.bin --seed 12345 --preln
[2020-10-23 09:11:34,436] [WARNING] [stage2.py:32:<module>] apex was installed without --cpp_ext.  Falling back to Python flatten and unflatten.
[2020-10-23 09:11:34,454] [WARNING] [engine.py:48:<module>] Warning:  apex was installed without --cpp_ext.  Falling back to Python flatten and unflatten.
[2020-10-23 09:11:34,458] [WARNING] [runner.py:117:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2020-10-23 09:11:34,514] [INFO] [runner.py:355:main] cmd = /usr/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMl19 --master_addr=127.0.0.1 --master_port=29500 nvidia_run_squad_deepspeed.py --bert_model bert-large-uncased --do_train --do_lower_case --predict_batch_size 3 --do_predict --train_file /home/santosh/projects/deepSpeed/testData/train-v1.1.json --predict_file /home/santosh/projects/deepSpeed/testData/dev-v1.1.json --train_batch_size 8 --learning_rate 0.00003 --num_train_epochs 2.0 --max_seq_length 384 --doc_stride 128 --output_dir /home/santosh/projects/deepSpeed/outputs/a1ab --job_name deepspeed_3GPUs_24batch_size --gradient_accumulation_steps 2 --fp16 --deepspeed --deepspeed_config onebit_deepspeed_bsz24_config.json --dropout 0.1 --model_file /home/santosh/projects/deepSpeed/testModels/hf/pytorch_model.bin --seed 12345 --preln
[2020-10-23 09:11:35,131] [WARNING] [stage2.py:32:<module>] apex was installed without --cpp_ext.  Falling back to Python flatten and unflatten.
[2020-10-23 09:11:35,150] [WARNING] [engine.py:48:<module>] Warning:  apex was installed without --cpp_ext.  Falling back to Python flatten and unflatten.
[2020-10-23 09:11:35,153] [INFO] [launch.py:71:main] 0 NCCL_VERSION 2.6.4
[2020-10-23 09:11:35,154] [INFO] [launch.py:78:main] WORLD INFO DICT: {'localhost': [0, 1, 2]}
[2020-10-23 09:11:35,154] [INFO] [launch.py:87:main] nnodes=1, num_local_procs=3, node_rank=0
[2020-10-23 09:11:35,154] [INFO] [launch.py:99:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2]})
[2020-10-23 09:11:35,154] [INFO] [launch.py:100:main] dist_world_size=3
[2020-10-23 09:11:35,154] [INFO] [launch.py:103:main] Setting CUDA_VISIBLE_DEVICES=0,1,2
[2020-10-23 09:11:35,886] [WARNING] [stage2.py:32:<module>] apex was installed without --cpp_ext.  Falling back to Python flatten and unflatten.
[2020-10-23 09:11:35,897] [WARNING] [stage2.py:32:<module>] apex was installed without --cpp_ext.  Falling back to Python flatten and unflatten.
[2020-10-23 09:11:35,905] [WARNING] [engine.py:48:<module>] Warning:  apex was installed without --cpp_ext.  Falling back to Python flatten and unflatten.
[2020-10-23 09:11:35,916] [WARNING] [engine.py:48:<module>] Warning:  apex was installed without --cpp_ext.  Falling back to Python flatten and unflatten.
[2020-10-23 09:11:35,953] [WARNING] [stage2.py:32:<module>] apex was installed without --cpp_ext.  Falling back to Python flatten and unflatten.
[2020-10-23 09:11:35,971] [WARNING] [engine.py:48:<module>] Warning:  apex was installed without --cpp_ext.  Falling back to Python flatten and unflatten.
10/23/2020 09:11:36 - INFO - pytorch_pretrained_bert.tokenization -   loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-vocab.txt from cache at /home/santosh/.pytorch_pretrained_bert/9b3c03a36e83b13d5ba95ac965c9f9074a99e14340c523ab405703179e79fc46.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
10/23/2020 09:11:36 - INFO - pytorch_pretrained_bert.tokenization -   loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-vocab.txt from cache at /home/santosh/.pytorch_pretrained_bert/9b3c03a36e83b13d5ba95ac965c9f9074a99e14340c523ab405703179e79fc46.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
10/23/2020 09:11:36 - INFO - pytorch_pretrained_bert.tokenization -   loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-vocab.txt from cache at /home/santosh/.pytorch_pretrained_bert/9b3c03a36e83b13d5ba95ac965c9f9074a99e14340c523ab405703179e79fc46.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
Traceback (most recent call last):
  File "nvidia_run_squad_deepspeed.py", line 1143, in <module>
    main()
  File "nvidia_run_squad_deepspeed.py", line 816, in main
    model = BertForQuestionAnsweringPreLN(bert_config, args)
  File "/home/santosh/projects/deepSpeed/DeepSpeed/DeepSpeedExamples/BingBertSquad/turing/nvidia_modelingpreln.py", line 1500, in __init__
    self.bert = BertModel(config, args)
  File "/home/santosh/projects/deepSpeed/DeepSpeed/DeepSpeedExamples/BingBertSquad/turing/nvidia_modelingpreln.py", line 927, in __init__
    self.embeddings = BertEmbeddings(config)
  File "/home/santosh/projects/deepSpeed/DeepSpeed/DeepSpeedExamples/BingBertSquad/turing/nvidia_modelingpreln.py", line 354, in __init__
    self.LayerNorm = BertLayerNorm(config.hidden_size, eps=1e-12)
  File "/usr/local/lib/python3.6/dist-packages/apex/normalization/fused_layer_norm.py", line 133, in __init__
    fused_layer_norm_cuda = importlib.import_module("fused_layer_norm_cuda")
  File "/usr/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 658, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 571, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 922, in create_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
ImportError: /usr/local/lib/python3.6/dist-packages/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN6caffe26detail37_typeMetaDataInstance_preallocated_32E
Traceback (most recent call last):
  File "nvidia_run_squad_deepspeed.py", line 1143, in <module>
    main()
  File "nvidia_run_squad_deepspeed.py", line 816, in main
    model = BertForQuestionAnsweringPreLN(bert_config, args)
  File "/home/santosh/projects/deepSpeed/DeepSpeed/DeepSpeedExamples/BingBertSquad/turing/nvidia_modelingpreln.py", line 1500, in __init__
    self.bert = BertModel(config, args)
  File "/home/santosh/projects/deepSpeed/DeepSpeed/DeepSpeedExamples/BingBertSquad/turing/nvidia_modelingpreln.py", line 927, in __init__
    self.embeddings = BertEmbeddings(config)
  File "/home/santosh/projects/deepSpeed/DeepSpeed/DeepSpeedExamples/BingBertSquad/turing/nvidia_modelingpreln.py", line 354, in __init__
    self.LayerNorm = BertLayerNorm(config.hidden_size, eps=1e-12)
  File "/usr/local/lib/python3.6/dist-packages/apex/normalization/fused_layer_norm.py", line 133, in __init__
    fused_layer_norm_cuda = importlib.import_module("fused_layer_norm_cuda")
  File "/usr/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 658, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 571, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 922, in create_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
ImportError: /usr/local/lib/python3.6/dist-packages/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN6caffe26detail37_typeMetaDataInstance_preallocated_32E
Traceback (most recent call last):
  File "nvidia_run_squad_deepspeed.py", line 1143, in <module>
    main()
  File "nvidia_run_squad_deepspeed.py", line 816, in main
    model = BertForQuestionAnsweringPreLN(bert_config, args)
  File "/home/santosh/projects/deepSpeed/DeepSpeed/DeepSpeedExamples/BingBertSquad/turing/nvidia_modelingpreln.py", line 1500, in __init__
    self.bert = BertModel(config, args)
  File "/home/santosh/projects/deepSpeed/DeepSpeed/DeepSpeedExamples/BingBertSquad/turing/nvidia_modelingpreln.py", line 927, in __init__
    self.embeddings = BertEmbeddings(config)
  File "/home/santosh/projects/deepSpeed/DeepSpeed/DeepSpeedExamples/BingBertSquad/turing/nvidia_modelingpreln.py", line 354, in __init__
    self.LayerNorm = BertLayerNorm(config.hidden_size, eps=1e-12)
  File "/usr/local/lib/python3.6/dist-packages/apex/normalization/fused_layer_norm.py", line 133, in __init__
    fused_layer_norm_cuda = importlib.import_module("fused_layer_norm_cuda")
  File "/usr/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 658, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 571, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 922, in create_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
ImportError: /usr/local/lib/python3.6/dist-packages/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN6caffe26detail37_typeMetaDataInstance_preallocated_32E

I get the same errors when pointing to a TensorFlow .ckpt.index file.

In both cases, the issue seems to come from loading the model. If it helps, I am able to run other PyTorch training code in the container.

I also tried running run_squad_baseline.sh and got errors there as well:

deepspeed --num_nodes 1 --num_gpus 3 nvidia_run_squad_baseline.py --bert_model bert-large-uncased --do_train --do_lower_case --do_predict --train_file /home/santosh/projects/deepSpeed/testData/train-v1.1.json --predict_file /home/santosh/projects/deepSpeed/testData/dev-v1.1.json --train_batch_size 8 --learning_rate 3e-5 --num_train_epochs 2.0 --max_seq_length 384 --doc_stride 128 --output_dir /home/santosh/projects/deepSpeed/outputs/a1a --job_name baseline_3GPUs_24batch_size --gradient_accumulation_steps 1 --fp16 --model_file /home/santosh/projects/deepSpeed/testModels/hf/pytorch_model.bin
[2020-10-23 09:23:19,668] [WARNING] [stage2.py:32:<module>] apex was installed without --cpp_ext.  Falling back to Python flatten and unflatten.
[2020-10-23 09:23:19,685] [WARNING] [engine.py:48:<module>] Warning:  apex was installed without --cpp_ext.  Falling back to Python flatten and unflatten.
[2020-10-23 09:23:19,689] [WARNING] [runner.py:117:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2020-10-23 09:23:19,744] [INFO] [runner.py:355:main] cmd = /usr/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMl19 --master_addr=127.0.0.1 --master_port=29500 nvidia_run_squad_baseline.py --bert_model bert-large-uncased --do_train --do_lower_case --do_predict --train_file /home/santosh/projects/deepSpeed/testData/train-v1.1.json --predict_file /home/santosh/projects/deepSpeed/testData/dev-v1.1.json --train_batch_size 8 --learning_rate 3e-5 --num_train_epochs 2.0 --max_seq_length 384 --doc_stride 128 --output_dir /home/santosh/projects/deepSpeed/outputs/a1a --job_name baseline_3GPUs_24batch_size --gradient_accumulation_steps 1 --fp16 --model_file /home/santosh/projects/deepSpeed/testModels/hf/pytorch_model.bin
[2020-10-23 09:23:20,360] [WARNING] [stage2.py:32:<module>] apex was installed without --cpp_ext.  Falling back to Python flatten and unflatten.
[2020-10-23 09:23:20,377] [WARNING] [engine.py:48:<module>] Warning:  apex was installed without --cpp_ext.  Falling back to Python flatten and unflatten.
[2020-10-23 09:23:20,380] [INFO] [launch.py:71:main] 0 NCCL_VERSION 2.6.4
[2020-10-23 09:23:20,380] [INFO] [launch.py:78:main] WORLD INFO DICT: {'localhost': [0, 1, 2]}
[2020-10-23 09:23:20,380] [INFO] [launch.py:87:main] nnodes=1, num_local_procs=3, node_rank=0
[2020-10-23 09:23:20,380] [INFO] [launch.py:99:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2]})
[2020-10-23 09:23:20,380] [INFO] [launch.py:100:main] dist_world_size=3
[2020-10-23 09:23:20,380] [INFO] [launch.py:103:main] Setting CUDA_VISIBLE_DEVICES=0,1,2
10/23/2020 09:23:21 - INFO - __main__ -   device: cuda:1 n_gpu: 1, distributed training: True, 16-bits training: True
10/23/2020 09:23:21 - INFO - pytorch_pretrained_bert.tokenization -   loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-vocab.txt from cache at /home/santosh/.pytorch_pretrained_bert/9b3c03a36e83b13d5ba95ac965c9f9074a99e14340c523ab405703179e79fc46.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
10/23/2020 09:23:22 - INFO - __main__ -   device: cuda:2 n_gpu: 1, distributed training: True, 16-bits training: True
10/23/2020 09:23:22 - INFO - __main__ -   device: cuda:0 n_gpu: 1, distributed training: True, 16-bits training: True
10/23/2020 09:23:22 - INFO - pytorch_pretrained_bert.tokenization -   loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-vocab.txt from cache at /home/santosh/.pytorch_pretrained_bert/9b3c03a36e83b13d5ba95ac965c9f9074a99e14340c523ab405703179e79fc46.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
10/23/2020 09:23:22 - INFO - pytorch_pretrained_bert.tokenization -   loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-vocab.txt from cache at /home/santosh/.pytorch_pretrained_bert/9b3c03a36e83b13d5ba95ac965c9f9074a99e14340c523ab405703179e79fc46.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
Traceback (most recent call last):
  File "nvidia_run_squad_baseline.py", line 1158, in <module>
    main()
  File "nvidia_run_squad_baseline.py", line 866, in main
    model = BertForQuestionAnswering(bert_config, args)
  File "/home/santosh/projects/deepSpeed/DeepSpeed/DeepSpeedExamples/BingBertSquad/turing/nvidia_modeling.py", line 1472, in __init__
    self.bert = BertModel(config, args)
  File "/home/santosh/projects/deepSpeed/DeepSpeed/DeepSpeedExamples/BingBertSquad/turing/nvidia_modeling.py", line 900, in __init__
    self.embeddings = BertEmbeddings(config)
  File "/home/santosh/projects/deepSpeed/DeepSpeed/DeepSpeedExamples/BingBertSquad/turing/nvidia_modeling.py", line 347, in __init__
    self.LayerNorm = BertLayerNorm(config.hidden_size, eps=1e-12)
  File "/usr/local/lib/python3.6/dist-packages/apex/normalization/fused_layer_norm.py", line 133, in __init__
    fused_layer_norm_cuda = importlib.import_module("fused_layer_norm_cuda")
  File "/usr/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 658, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 571, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 922, in create_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
ImportError: /usr/local/lib/python3.6/dist-packages/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN6caffe26detail37_typeMetaDataInstance_preallocated_32E
Traceback (most recent call last):
  File "nvidia_run_squad_baseline.py", line 1158, in <module>
    main()
  File "nvidia_run_squad_baseline.py", line 866, in main
    model = BertForQuestionAnswering(bert_config, args)
  File "/home/santosh/projects/deepSpeed/DeepSpeed/DeepSpeedExamples/BingBertSquad/turing/nvidia_modeling.py", line 1472, in __init__
    self.bert = BertModel(config, args)
  File "/home/santosh/projects/deepSpeed/DeepSpeed/DeepSpeedExamples/BingBertSquad/turing/nvidia_modeling.py", line 900, in __init__
    self.embeddings = BertEmbeddings(config)
  File "/home/santosh/projects/deepSpeed/DeepSpeed/DeepSpeedExamples/BingBertSquad/turing/nvidia_modeling.py", line 347, in __init__
    self.LayerNorm = BertLayerNorm(config.hidden_size, eps=1e-12)
  File "/usr/local/lib/python3.6/dist-packages/apex/normalization/fused_layer_norm.py", line 133, in __init__
    fused_layer_norm_cuda = importlib.import_module("fused_layer_norm_cuda")
  File "/usr/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 658, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 571, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 922, in create_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
ImportError: /usr/local/lib/python3.6/dist-packages/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN6caffe26detail37_typeMetaDataInstance_preallocated_32E
Traceback (most recent call last):
  File "nvidia_run_squad_baseline.py", line 1158, in <module>
    main()
  File "nvidia_run_squad_baseline.py", line 866, in main
    model = BertForQuestionAnswering(bert_config, args)
  File "/home/santosh/projects/deepSpeed/DeepSpeed/DeepSpeedExamples/BingBertSquad/turing/nvidia_modeling.py", line 1472, in __init__
    self.bert = BertModel(config, args)
  File "/home/santosh/projects/deepSpeed/DeepSpeed/DeepSpeedExamples/BingBertSquad/turing/nvidia_modeling.py", line 900, in __init__
    self.embeddings = BertEmbeddings(config)
  File "/home/santosh/projects/deepSpeed/DeepSpeed/DeepSpeedExamples/BingBertSquad/turing/nvidia_modeling.py", line 347, in __init__
    self.LayerNorm = BertLayerNorm(config.hidden_size, eps=1e-12)
  File "/usr/local/lib/python3.6/dist-packages/apex/normalization/fused_layer_norm.py", line 133, in __init__
    fused_layer_norm_cuda = importlib.import_module("fused_layer_norm_cuda")
  File "/usr/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 658, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 571, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 922, in create_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
ImportError: /usr/local/lib/python3.6/dist-packages/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN6caffe26detail37_typeMetaDataInstance_preallocated_32E

And when running from the terminal I get:

deepspeed --num_nodes 1 --num_gpus 3 nvidia_run_squad_baseline.py --bert_model bert-large-uncased --do_train --do_lower_case --do_predict --train_file /home/santosh/projects/deepSpeed/testData/train-v1.1.json --predict_file /home/santosh/projects/deepSpeed/testData/dev-v1.1.json --train_batch_size 8 --learning_rate 3e-5 --num_train_epochs 2.0 --max_seq_length 384 --doc_stride 128 --output_dir /home/santosh/projects/deepSpeed/outputs/a1ak --job_name baseline_3GPUs_24batch_size --gradient_accumulation_steps 1 --fp16 --model_file /home/santosh/projects/deepSpeed/testModels/hf/pytorch_model.bin
[2020-10-23 09:26:40,968] [WARNING] [stage2.py:32:<module>] apex was installed without --cpp_ext.  Falling back to Python flatten and unflatten.
[2020-10-23 09:26:40,987] [WARNING] [engine.py:48:<module>] Warning:  apex was installed without --cpp_ext.  Falling back to Python flatten and unflatten.
[2020-10-23 09:26:40,991] [WARNING] [runner.py:117:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2020-10-23 09:26:41,046] [INFO] [runner.py:355:main] cmd = /usr/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMl19 --master_addr=127.0.0.1 --master_port=29500 nvidia_run_squad_baseline.py --bert_model bert-large-uncased --do_train --do_lower_case --do_predict --train_file /home/santosh/projects/deepSpeed/testData/train-v1.1.json --predict_file /home/santosh/projects/deepSpeed/testData/dev-v1.1.json --train_batch_size 8 --learning_rate 3e-5 --num_train_epochs 2.0 --max_seq_length 384 --doc_stride 128 --output_dir /home/santosh/projects/deepSpeed/outputs/a1ak --job_name baseline_3GPUs_24batch_size --gradient_accumulation_steps 1 --fp16 --model_file /home/santosh/projects/deepSpeed/testModels/hf/pytorch_model.bin
[2020-10-23 09:26:41,712] [WARNING] [stage2.py:32:<module>] apex was installed without --cpp_ext.  Falling back to Python flatten and unflatten.
[2020-10-23 09:26:41,730] [WARNING] [engine.py:48:<module>] Warning:  apex was installed without --cpp_ext.  Falling back to Python flatten and unflatten.
[2020-10-23 09:26:41,733] [INFO] [launch.py:71:main] 0 NCCL_VERSION 2.6.4
[2020-10-23 09:26:41,733] [INFO] [launch.py:78:main] WORLD INFO DICT: {'localhost': [0, 1, 2]}
[2020-10-23 09:26:41,733] [INFO] [launch.py:87:main] nnodes=1, num_local_procs=3, node_rank=0
[2020-10-23 09:26:41,733] [INFO] [launch.py:99:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2]})
[2020-10-23 09:26:41,733] [INFO] [launch.py:100:main] dist_world_size=3
[2020-10-23 09:26:41,733] [INFO] [launch.py:103:main] Setting CUDA_VISIBLE_DEVICES=0,1,2
10/23/2020 09:26:42 - INFO - __main__ -   device: cuda:1 n_gpu: 1, distributed training: True, 16-bits training: True
10/23/2020 09:26:42 - INFO - __main__ -   device: cuda:2 n_gpu: 1, distributed training: True, 16-bits training: True
10/23/2020 09:26:42 - INFO - __main__ -   device: cuda:0 n_gpu: 1, distributed training: True, 16-bits training: True
10/23/2020 09:26:42 - INFO - pytorch_pretrained_bert.tokenization -   loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-vocab.txt from cache at /home/santosh/.pytorch_pretrained_bert/9b3c03a36e83b13d5ba95ac965c9f9074a99e14340c523ab405703179e79fc46.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
10/23/2020 09:26:42 - INFO - pytorch_pretrained_bert.tokenization -   loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-vocab.txt from cache at /home/santosh/.pytorch_pretrained_bert/9b3c03a36e83b13d5ba95ac965c9f9074a99e14340c523ab405703179e79fc46.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
10/23/2020 09:26:42 - INFO - pytorch_pretrained_bert.tokenization -   loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-vocab.txt from cache at /home/santosh/.pytorch_pretrained_bert/9b3c03a36e83b13d5ba95ac965c9f9074a99e14340c523ab405703179e79fc46.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
Traceback (most recent call last):
  File "nvidia_run_squad_baseline.py", line 1158, in <module>
    main()
  File "nvidia_run_squad_baseline.py", line 866, in main
    model = BertForQuestionAnswering(bert_config, args)
  File "/home/santosh/projects/deepSpeed/DeepSpeed/DeepSpeedExamples/BingBertSquad/turing/nvidia_modeling.py", line 1472, in __init__
    self.bert = BertModel(config, args)
  File "/home/santosh/projects/deepSpeed/DeepSpeed/DeepSpeedExamples/BingBertSquad/turing/nvidia_modeling.py", line 900, in __init__
    self.embeddings = BertEmbeddings(config)
  File "/home/santosh/projects/deepSpeed/DeepSpeed/DeepSpeedExamples/BingBertSquad/turing/nvidia_modeling.py", line 347, in __init__
    self.LayerNorm = BertLayerNorm(config.hidden_size, eps=1e-12)
  File "/usr/local/lib/python3.6/dist-packages/apex/normalization/fused_layer_norm.py", line 133, in __init__
    fused_layer_norm_cuda = importlib.import_module("fused_layer_norm_cuda")
  File "/usr/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 658, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 571, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 922, in create_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
ImportError: /usr/local/lib/python3.6/dist-packages/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN6caffe26detail37_typeMetaDataInstance_preallocated_32E
Traceback (most recent call last):
  File "nvidia_run_squad_baseline.py", line 1158, in <module>
    main()
  File "nvidia_run_squad_baseline.py", line 866, in main
    model = BertForQuestionAnswering(bert_config, args)
  File "/home/santosh/projects/deepSpeed/DeepSpeed/DeepSpeedExamples/BingBertSquad/turing/nvidia_modeling.py", line 1472, in __init__
    self.bert = BertModel(config, args)
  File "/home/santosh/projects/deepSpeed/DeepSpeed/DeepSpeedExamples/BingBertSquad/turing/nvidia_modeling.py", line 900, in __init__
    self.embeddings = BertEmbeddings(config)
  File "/home/santosh/projects/deepSpeed/DeepSpeed/DeepSpeedExamples/BingBertSquad/turing/nvidia_modeling.py", line 347, in __init__
    self.LayerNorm = BertLayerNorm(config.hidden_size, eps=1e-12)
  File "/usr/local/lib/python3.6/dist-packages/apex/normalization/fused_layer_norm.py", line 133, in __init__
    fused_layer_norm_cuda = importlib.import_module("fused_layer_norm_cuda")
  File "/usr/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 658, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 571, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 922, in create_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
ImportError: /usr/local/lib/python3.6/dist-packages/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN6caffe26detail37_typeMetaDataInstance_preallocated_32E
Traceback (most recent call last):
  File "nvidia_run_squad_baseline.py", line 1158, in <module>
    main()
  File "nvidia_run_squad_baseline.py", line 866, in main
    model = BertForQuestionAnswering(bert_config, args)
  File "/home/santosh/projects/deepSpeed/DeepSpeed/DeepSpeedExamples/BingBertSquad/turing/nvidia_modeling.py", line 1472, in __init__
    self.bert = BertModel(config, args)
  File "/home/santosh/projects/deepSpeed/DeepSpeed/DeepSpeedExamples/BingBertSquad/turing/nvidia_modeling.py", line 900, in __init__
    self.embeddings = BertEmbeddings(config)
  File "/home/santosh/projects/deepSpeed/DeepSpeed/DeepSpeedExamples/BingBertSquad/turing/nvidia_modeling.py", line 347, in __init__
    self.LayerNorm = BertLayerNorm(config.hidden_size, eps=1e-12)
  File "/usr/local/lib/python3.6/dist-packages/apex/normalization/fused_layer_norm.py", line 133, in __init__
    fused_layer_norm_cuda = importlib.import_module("fused_layer_norm_cuda")
  File "/usr/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 658, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 571, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 922, in create_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
ImportError: /usr/local/lib/python3.6/dist-packages/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN6caffe26detail37_typeMetaDataInstance_preallocated_32E

@tjruwase
Contributor

These new import errors suggest a mismatch in cuda, apex, or torch versions. Can you double check those?

ImportError: /usr/local/lib/python3.6/dist-packages/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN6caffe26detail37_typeMetaDataInstance_preallocated_32E
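
For anyone reading along, the undefined symbol itself can be decoded to see which library it belongs to. The following is a minimal sketch, not a full demangler: the helper name `demangle_nested_name` is hypothetical, and it only handles plain length-prefixed nested names from the Itanium C++ ABI (no templates or function signatures). The decoded name sits in the `caffe2` namespace, which would be consistent with the apex extension having been compiled against a different torch build than the one installed.

```python
import re


def demangle_nested_name(symbol):
    """Decode a simple Itanium C++ ABI nested name of the form _ZN<len><id>...E.

    Only plain length-prefixed identifiers are handled; templates,
    substitutions, and function signatures are out of scope for this sketch.
    """
    if not symbol.startswith("_ZN"):
        raise ValueError("not a nested mangled name: " + symbol)
    pos = 3
    parts = []
    while pos < len(symbol) and symbol[pos] != "E":
        match = re.match(r"\d+", symbol[pos:])
        if match is None:
            raise ValueError("unsupported mangling at offset %d" % pos)
        length = int(match.group())
        pos += len(match.group())
        # Each component is exactly `length` characters long.
        parts.append(symbol[pos:pos + length])
        pos += length
    return "::".join(parts)


missing = "_ZN6caffe26detail37_typeMetaDataInstance_preallocated_32E"
print(demangle_nested_name(missing))
# caffe2::detail::_typeMetaDataInstance_preallocated_32
```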

@Santosh-Gupta
Author

These new import errors suggest a mismatch in cuda, apex, or torch versions. Can you double check those?

ImportError: /usr/local/lib/python3.6/dist-packages/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN6caffe26detail37_typeMetaDataInstance_preallocated_32E

torch version: 1.6.0
apex version: 0.1
CUDA version: 10.0.130

I see that the latest CUDA version is 11.1; I'll upgrade and check whether that solves the issue.
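
The versions above can also be gathered in one step. This is only a sketch under a few assumptions: the function name `collect_versions` and the dictionary keys are made up for illustration, and the `nvcc` entry is simply the last line of whatever `nvcc -V` prints. It also reports `torch.version.cuda`, the CUDA version the torch binary was built with, which may differ from the toolkit that `nvcc` reports.

```python
import importlib
import shutil
import subprocess


def collect_versions():
    """Return version strings, with placeholders for missing components."""
    report = {}
    try:
        torch = importlib.import_module("torch")
        report["torch"] = str(torch.__version__)
        # CUDA version this torch binary was built with ("None" on CPU-only builds).
        report["torch_built_with_cuda"] = str(torch.version.cuda)
    except ImportError:
        report["torch"] = "not installed"
        report["torch_built_with_cuda"] = "unknown"

    nvcc = shutil.which("nvcc")
    if nvcc is None:
        report["nvcc"] = "not on PATH"
    else:
        result = subprocess.run(
            [nvcc, "-V"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
        lines = result.stdout.strip().splitlines()
        # The last line of `nvcc -V` carries the release number.
        report["nvcc"] = lines[-1] if lines else "no output"
    return report


for name, value in collect_versions().items():
    print(name, "=", value)
```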

@tjruwase
Contributor

Actually, can you try this sequence of commands in Python to test the compatibility of CUDA, torch, and apex FusedLayerNorm?

>>> import torch
>>> import apex
>>> input = torch.randn(20, 5, 10, 10)
>>> m = apex.normalization.FusedLayerNorm(input.size()[1:])
>>> output = m(input)

@Santosh-Gupta
Author

import torch
import apex
input = torch.randn(20, 5, 10, 10)
m = apex.normalization.FusedLayerNorm(input.size()[1:])
output = m(input)

Running this resulted in an error on the 4th line. Here's the output:

>>> import torch
>>> import apex
>>> input = torch.randn(20, 5, 10, 10)
>>> m = apex.normalization.FusedLayerNorm(input.size()[1:])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/apex/normalization/fused_layer_norm.py", line 133, in __init__
    fused_layer_norm_cuda = importlib.import_module("fused_layer_norm_cuda")
  File "/usr/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 658, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 571, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 922, in create_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
ImportError: /usr/local/lib/python3.6/dist-packages/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN6caffe26detail37_typeMetaDataInstance_preallocated_32E

@tjruwase
Contributor

This confirms an incompatibility issue independent of deepspeed. I vaguely recall that either torch 1.6.0 or apex 0.1 requires cuda 10.1, and so upgrading cuda should fix the problem. For reference my cuda/torch/apex versions are
cuda 10.1
torch 1.6.0
apex 0.1
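
As a rough illustration, the versions reported earlier in this thread can be compared against the set listed above. This is a sketch only: `REFERENCE`, `major_minor`, and `find_mismatches` are hypothetical names, and comparing on major.minor alone is an assumption (patch-level differences are ignored).

```python
# Known-working combination quoted in this thread.
REFERENCE = {"cuda": "10.1", "torch": "1.6.0", "apex": "0.1"}


def major_minor(version):
    """Keep only the first two dotted components, e.g. '10.0.130' -> '10.0'."""
    return ".".join(version.split(".")[:2])


def find_mismatches(installed, reference=REFERENCE):
    """Return {name: (installed, expected)} for components that differ."""
    mismatches = {}
    for name, expected in reference.items():
        actual = installed.get(name)
        if actual is None or major_minor(actual) != major_minor(expected):
            mismatches[name] = (actual, expected)
    return mismatches


# Versions reported by the issue author.
reported = {"cuda": "10.0.130", "torch": "1.6.0", "apex": "0.1"}
print(find_mismatches(reported))
# {'cuda': ('10.0.130', '10.1')}
```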

@Santosh-Gupta
Author

Great, thanks tjruwase, I'll upgrade it and report back the results.

@Santosh-Gupta
Author

This confirms an incompatibility issue independent of deepspeed. I vaguely recall that either torch 1.6.0 or apex 0.1 requires cuda 10.1, and so upgrading cuda should fix the problem. For reference my cuda/torch/apex versions are
cuda 10.1
torch 1.6.0
apex 0.1

I am wondering whether the deepspeed docker image has an outdated version of CUDA. That's what it looks like here:

https://github.com/microsoft/DeepSpeed/blob/master/docker/Dockerfile#L1

Currently, nvcc -V in the deepspeed container shows:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

This is despite our having recently installed CUDA 11.1 on our machine. I created a fresh docker container from the original image, and it still shows V10.0.130.
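
For scripted checks, the release number can be pulled out of that nvcc output. A minimal sketch, assuming the output keeps the "release X.Y, VX.Y.Z" shape shown above; `parse_nvcc_release` is a hypothetical helper name. Note that the toolkit inside a container normally comes from the image it was built from, so installing a newer toolkit on the host would not be expected to change what nvcc reports inside the container.

```python
import re

# Output copied from the container, as pasted above.
NVCC_OUTPUT = """nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130"""


def parse_nvcc_release(text):
    """Return (release, full_version) from `nvcc -V` output, or None if absent."""
    match = re.search(r"release (\d+\.\d+), V(\d+(?:\.\d+)*)", text)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(parse_nvcc_release(NVCC_OUTPUT))
# ('10.0', '10.0.130')
```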

@tjruwase
Contributor

tjruwase commented Oct 28, 2020

Yes, the deepspeed docker image is cuda 10.0, which is a bit confusing since it does not work with torch 1.6.0. However, it does work with torch 1.5.0 which the deepspeed release was tested against. So it seems the options are (1) Downgrade to torch 1.5.0 to use cuda 10.0, or (2) Upgrade docker file to cuda 10.1 to use torch 1.6.0. Do either of these options work for you?

@Santosh-Gupta
Author

Yes, the deepspeed docker image is cuda 10.0, which is a bit confusing since it does not work with torch 1.6.0. However, it does work with torch 1.5.0 which the deepspeed release was tested against. So it seems the options are (1) Downgrade to torch 1.5.0 to use cuda 10.0, or (2) Upgrade docker file to cuda 10.1 to use torch 1.6.0. Do either of these options work for you?

Ahh, I see. Yeah, downgrading torch should work; CUDA seems to be very tricky to work with on our machines. I'll downgrade torch and report back the results.

@Santosh-Gupta
Author

Yes, the deepspeed docker image is cuda 10.0, which is a bit confusing since it does not work with torch 1.6.0. However, it does work with torch 1.5.0 which the deepspeed release was tested against. So it seems the options are (1) Downgrade to torch 1.5.0 to use cuda 10.0, or (2) Upgrade docker file to cuda 10.1 to use torch 1.6.0. Do either of these options work for you?

I downgraded my torch version to 1.5.0 to work with the official docker image, but I am still getting an error from the code snippet that tests compatibility.

nvcc -V gives

release 10.0, V10.0.130

and torch.__version__ gives 1.5.0

but

import torch
import apex
input = torch.randn(20, 5, 10, 10)
m = apex.normalization.FusedLayerNorm(input.size()[1:])
output = m(input)

gives

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-8-ff19223b4d5c> in <module>
      4 import apex
      5 input = torch.randn(20, 5, 10, 10)
----> 6 m = apex.normalization.FusedLayerNorm(input.size()[1:])
      7 output = m(input)

/usr/local/lib/python3.6/dist-packages/apex/normalization/fused_layer_norm.py in __init__(self, normalized_shape, eps, elementwise_affine)
    131 
    132         global fused_layer_norm_cuda
--> 133         fused_layer_norm_cuda = importlib.import_module("fused_layer_norm_cuda")
    134 
    135         if isinstance(normalized_shape, numbers.Integral):

/usr/lib/python3.6/importlib/__init__.py in import_module(name, package)
    124                 break
    125             level += 1
--> 126     return _bootstrap._gcd_import(name[level:], package, level)
    127 
    128 

/usr/lib/python3.6/importlib/_bootstrap.py in _gcd_import(name, package, level)

/usr/lib/python3.6/importlib/_bootstrap.py in _find_and_load(name, import_)

/usr/lib/python3.6/importlib/_bootstrap.py in _find_and_load_unlocked(name, import_)

/usr/lib/python3.6/importlib/_bootstrap.py in _load_unlocked(spec)

/usr/lib/python3.6/importlib/_bootstrap.py in module_from_spec(spec)

/usr/lib/python3.6/importlib/_bootstrap_external.py in create_module(self, spec)

/usr/lib/python3.6/importlib/_bootstrap.py in _call_with_frames_removed(f, *args, **kwds)

ImportError: /usr/local/lib/python3.6/dist-packages/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN6caffe26detail37_typeMetaDataInstance_preallocated_32E
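A note for readers hitting the same `undefined symbol` error: the mangled name can be decoded by hand to see which library the missing symbol belongs to. Below is a minimal sketch that only handles simple nested names of the form `_ZN<len><name>...E` from the Itanium C++ ABI; `demangle_nested_name` is a hypothetical helper written for this illustration, not a general demangler (a tool such as `c++filt` covers the full scheme).

```python
def demangle_nested_name(symbol: str) -> str:
    """Decode a simple Itanium-ABI nested name: _ZN<len><id><len><id>...E.

    Handles only length-prefixed identifiers; templates, function
    signatures, and substitutions are out of scope for this sketch.
    """
    if not symbol.startswith("_ZN"):
        raise ValueError("not a nested Itanium-mangled name")
    pos, parts = 3, []
    while pos < len(symbol) and symbol[pos] != "E":
        start = pos
        # Read the decimal length prefix of the next identifier.
        while pos < len(symbol) and symbol[pos].isdigit():
            pos += 1
        if pos == start:
            raise ValueError("unsupported construct at offset %d" % start)
        length = int(symbol[start:pos])
        if pos + length > len(symbol):
            raise ValueError("identifier runs past the end of the symbol")
        parts.append(symbol[pos:pos + length])
        pos += length
    if pos >= len(symbol):
        raise ValueError("missing terminating 'E'")
    return "::".join(parts)


if __name__ == "__main__":
    missing = "_ZN6caffe26detail37_typeMetaDataInstance_preallocated_32E"
    print(demangle_nested_name(missing))
```

For the symbol in the traceback above this yields a name in the `caffe2::detail` namespace, meaning the apex extension expects a symbol from PyTorch's own libraries that the installed torch does not export. A common cause of that kind of mismatch is an extension compiled against a different torch version than the one currently installed, though the thread does not confirm that this is the cause here.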

jeffra added a commit that referenced this issue Apr 11, 2023
* Merge chatgpt v2 to v3 - finalized (#484)
