
Cannot install DeepSpeed on Ubuntu 20.04 #425

Closed
drfinkus opened this issue Sep 20, 2020 · 19 comments

Comments

@drfinkus

I am trying to install DeepSpeed with the following steps:

  1. Cloned the DeepSpeed repository
  2. Created a virtual environment: python3 -m venv env
  3. Activated the virtual environment: source env/bin/activate
  4. Ran the install script: ./install.sh

gcc: 8.4.0
nvcc: 10.2
g++: 8.4.0

$ pip3 list
Package       Version
------------- -------
apex          0.1    
cpufeature    0.1.1  
future        0.18.2 
numpy         1.19.2 
Pillow        7.2.0  
pip           20.0.2 
pkg-resources 0.0.0  
protobuf      3.13.0 
psutil        5.7.2  
setuptools    44.0.0 
six           1.15.0 
tensorboardX  1.8    
torch         1.6.0  
torchvision   0.7.0  
tqdm          4.49.0 
wheel         0.35.1 
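When a native build crashes, the versions that matter most are those of the compilers and of the CUDA toolkit PyTorch was built against, and `pip3 list` shows neither. The sketch below gathers them in one place using only the standard library. The helper names `package_version`, `tool_version_line`, `torch_cuda_version` and `environment_report` are illustrative and are not part of DeepSpeed.

```python
import importlib.metadata
import shutil
import subprocess
import sys
from typing import Dict, Optional


def package_version(name: str) -> Optional[str]:
    """Installed distribution version, or None if the package is absent."""
    try:
        return importlib.metadata.version(name)
    except importlib.metadata.PackageNotFoundError:
        return None


def tool_version_line(tool: str, needle: str = "") -> Optional[str]:
    """First line of `<tool> --version` containing `needle`, or None if the tool is missing."""
    if shutil.which(tool) is None:
        return None
    proc = subprocess.run([tool, "--version"], capture_output=True, text=True)
    for line in proc.stdout.splitlines():
        if needle in line:
            return line.strip()
    return None


def torch_cuda_version() -> Optional[str]:
    """CUDA version PyTorch was compiled with (torch.version.cuda), if torch imports."""
    try:
        import torch
    except (ImportError, OSError):
        return None
    return torch.version.cuda


def environment_report() -> Dict[str, Optional[str]]:
    return {
        "python": sys.version.split()[0],
        "torch": package_version("torch"),
        "torch CUDA": torch_cuda_version(),
        "cpufeature": package_version("cpufeature"),
        "gcc": tool_version_line("gcc"),
        "g++": tool_version_line("g++"),
        # nvcc prints its banner first; the useful line contains "release".
        "nvcc": tool_version_line("nvcc", "release"),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key:12} {value}")
```

A mismatch between the `nvcc` release and the `torch CUDA` value is one thing worth ruling out before digging into the build itself.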

The install script throws the error message below:

removing build/bdist.linux-x86_64/wheel
/home/user/code/DeepSpeed
Installing apex locally so that deepspeed will build
Found existing installation: apex 0.1
Uninstalling apex-0.1:
  Successfully uninstalled apex-0.1
Non-user install because user site-packages disabled
Created temporary directory: /tmp/pip-ephem-wheel-cache-8wavy60a
Created temporary directory: /tmp/pip-req-tracker-a5l1t8ki
Initialized build tracking at /tmp/pip-req-tracker-a5l1t8ki
Created build tracker: /tmp/pip-req-tracker-a5l1t8ki
Entered build tracker: /tmp/pip-req-tracker-a5l1t8ki
Created temporary directory: /tmp/pip-install-tzd3lyjo
Processing ./third_party/apex/dist/apex-0.1-cp38-cp38-linux_x86_64.whl
  Added apex==0.1 from file:///home/user/code/DeepSpeed/third_party/apex/dist/apex-0.1-cp38-cp38-linux_x86_64.whl to build tracker '/tmp/pip-req-tracker-a5l1t8ki'
  Removed apex==0.1 from file:///home/user/code/DeepSpeed/third_party/apex/dist/apex-0.1-cp38-cp38-linux_x86_64.whl from build tracker '/tmp/pip-req-tracker-a5l1t8ki'
Installing collected packages: apex
  Created temporary directory: /tmp/pip-unpacked-wheel-rdepxzch

Successfully installed apex-0.1
Cleaning up...
Removed build tracker: '/tmp/pip-req-tracker-a5l1t8ki'
Building deepspeed wheel
./install.sh: line 196: 41599 Floating point exception(core dumped) python setup.py -v bdist_wheel
Error on line 195
Fail to install deepspeed
@ShadenSmith
Contributor

Hi @drfinkus, thanks for the detailed report. Is there any other output from the build process that you could capture and share? Our setup.py does several things, including building CUDA extensions, so we'll need to narrow down where the issue is. I think some of the build output and errors are printed to STDERR.

You might also try disabling the extension build to do a Python-only install. I'd be very curious whether that works, and it would also help narrow down the issue. To do that, just prepend DS_BUILD_CUDA=0 to your installation command (e.g. DS_BUILD_CUDA=0 ./install.sh).
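Both suggestions can be combined: run the installer with `DS_BUILD_CUDA=0` in its environment and write stdout and stderr together into one file, so that anything printed to STDERR also ends up in the shared log. This is a minimal sketch assuming it is run from the repository root; `run_logged` is a hypothetical helper, while `DS_BUILD_CUDA` and `./install.sh` come from this thread.

```python
import os
import subprocess
import sys
from pathlib import Path
from typing import Dict, List, Optional


def run_logged(cmd: List[str], log_path: Path,
               extra_env: Optional[Dict[str, str]] = None) -> int:
    """Run `cmd` with `extra_env` added to the current environment.

    stderr is redirected into the stdout pipe, and every line is both echoed
    to the terminal and written to `log_path`. Children may block-buffer
    stdout when it is not a terminal, so interleaving can differ from an
    interactive run. Returns the exit status (-N if killed by signal N).
    """
    env = dict(os.environ, **(extra_env or {}))
    with open(log_path, "w") as log, subprocess.Popen(
            cmd, env=env, stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT, text=True) as proc:
        for line in proc.stdout:
            sys.stdout.write(line)
            log.write(line)
    # Popen.__exit__ has waited for the child, so returncode is set here.
    return proc.returncode


if __name__ == "__main__":
    status = run_logged(["./install.sh"], Path("install.log"),
                        extra_env={"DS_BUILD_CUDA": "0"})
    print(f"install.sh exited with status {status}")
```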

@rople380

rople380 commented Sep 23, 2020

Same error here. I traced the core dump to an import:

import cpufeature

It can be reproduced with:

python -c "import cpufeature; cpufeature.print_features()"
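A crash inside a native extension takes down the whole interpreter, which, if the import above is the culprit, is why `python setup.py` dies with "Floating point exception". Running the suspect import in a child interpreter turns the crash into an inspectable exit status: on POSIX, Python's `subprocess` reports death by signal N as return code -N, and a floating point exception is SIGFPE (signal 8 on Linux). This is a minimal sketch; `probe` and `probe_import` are hypothetical helpers.

```python
import signal
import subprocess
import sys


def probe(code: str) -> str:
    """Run `code` in a fresh interpreter and describe how that interpreter ended."""
    proc = subprocess.run([sys.executable, "-c", code],
                          capture_output=True, text=True)
    if proc.returncode == 0:
        return "ok"
    if proc.returncode < 0:
        # On POSIX, -N means the child was terminated by signal N.
        try:
            name = signal.Signals(-proc.returncode).name
        except ValueError:
            name = f"signal {-proc.returncode}"
        return f"killed by {name}"
    stderr_lines = proc.stderr.strip().splitlines()
    detail = stderr_lines[-1] if stderr_lines else "no stderr output"
    return f"exited with status {proc.returncode}: {detail}"


def probe_import(module: str) -> str:
    """Try `import module` without risking the calling interpreter."""
    return probe(f"import {module}")


if __name__ == "__main__":
    print("cpufeature:", probe_import("cpufeature"))
```

On an affected machine one would expect `killed by SIGFPE` for `cpufeature`, while a missing module shows up as an ordinary non-zero exit with the `ModuleNotFoundError` line.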

Versions:

Ubuntu 18.04
python --version
Python 3.6.10 :: Anaconda, Inc.
pip list | grep cpufeature
cpufeature             0.1.1
gcc --version | grep gcc
gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
nvcc --version | grep release
Cuda compilation tools, release 10.1, V10.1.243
g++ --version | grep g++
g++ (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
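If all that is needed is the set of instruction-set extensions the CPU advertises, Linux already exposes it in `/proc/cpuinfo`, and reading a text file involves no native code that could crash. The sketch below is a hypothetical stand-in, not a drop-in replacement for `cpufeature` or for whatever DeepSpeed's `setup.py` does with it. It assumes Linux on x86, where the field is called `flags` and extensions appear as lowercase names such as `avx2` or `avx512f`; ARM kernels use a different field (`Features`).

```python
from pathlib import Path
from typing import Set


def parse_cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Return the flags listed on the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def cpu_flags(path: str = "/proc/cpuinfo") -> Set[str]:
    return parse_cpu_flags(Path(path).read_text())


if __name__ == "__main__":
    flags = cpu_flags()
    for ext in ("sse4_2", "avx", "avx2", "avx512f"):
        print(f"{ext:8} {'yes' if ext in flags else 'no'}")
```

Keeping the parser separate from the file read makes it easy to check against a saved copy of `/proc/cpuinfo` from an affected machine.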

@drfinkus
Author

Here is the full log, as requested:

(env) user@desktop:~/code/DeepSpeed$ ./install.sh 
Attempting to remove deepspeed/git_version_info_installed.py
Attempting to remove dist
Attempting to remove build
Attempting to remove deepspeed.egg-info
Attempting to remove third_party/apex/dist
Attempting to remove third_party/apex/build
Attempting to remove third_party/apex/apex.egg-info
No hostfile exists at /job/hostfile, installing locally
Non-user install because user site-packages disabled
Created temporary directory: /tmp/pip-ephem-wheel-cache-tsk9wsb0
Created temporary directory: /tmp/pip-req-tracker-ayv7e7xe
Initialized build tracking at /tmp/pip-req-tracker-ayv7e7xe
Created build tracker: /tmp/pip-req-tracker-ayv7e7xe
Entered build tracker: /tmp/pip-req-tracker-ayv7e7xe
Created temporary directory: /tmp/pip-install-y_l89l49
Requirement already satisfied: torch>=1.2 in ./env/lib/python3.8/site-packages (from -r requirements/requirements.txt (line 1)) (1.6.0)
Requirement already satisfied: torchvision>=0.4.0 in ./env/lib/python3.8/site-packages (from -r requirements/requirements.txt (line 2)) (0.7.0)
Requirement already satisfied: tqdm in ./env/lib/python3.8/site-packages (from -r requirements/requirements.txt (line 3)) (4.49.0)
Requirement already satisfied: psutil in ./env/lib/python3.8/site-packages (from -r requirements/requirements.txt (line 4)) (5.7.2)
Requirement already satisfied: cpufeature in ./env/lib/python3.8/site-packages (from -r requirements/requirements.txt (line 5)) (0.1.1)
Requirement already satisfied: tensorboardX==1.8 in ./env/lib/python3.8/site-packages (from -r requirements/requirements.txt (line 6)) (1.8)
Requirement already satisfied: future in ./env/lib/python3.8/site-packages (from torch>=1.2->-r requirements/requirements.txt (line 1)) (0.18.2)
Requirement already satisfied: numpy in ./env/lib/python3.8/site-packages (from torch>=1.2->-r requirements/requirements.txt (line 1)) (1.19.2)
Requirement already satisfied: pillow>=4.1.1 in ./env/lib/python3.8/site-packages (from torchvision>=0.4.0->-r requirements/requirements.txt (line 2)) (7.2.0)
Requirement already satisfied: six in ./env/lib/python3.8/site-packages (from tensorboardX==1.8->-r requirements/requirements.txt (line 6)) (1.15.0)
Requirement already satisfied: protobuf>=3.2.0 in ./env/lib/python3.8/site-packages (from tensorboardX==1.8->-r requirements/requirements.txt (line 6)) (3.13.0)
Requirement already satisfied: setuptools in ./env/lib/python3.8/site-packages (from protobuf>=3.2.0->tensorboardX==1.8->-r requirements/requirements.txt (line 6)) (44.0.0)
Cleaning up...
Removed build tracker: '/tmp/pip-req-tracker-ayv7e7xe'
Checking out sub-module(s)
Building apex wheel
torch.__version__  =  1.6.0
setup.py:43: UserWarning: Option --pyprof not specified. Not installing PyProf dependencies!
  warnings.warn("Option --pyprof not specified. Not installing PyProf dependencies!")

Compiling cuda extensions with
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
from /usr/local/cuda/bin

running bdist_wheel
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/utils/cpp_extension.py:335: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
  warnings.warn(msg.format('we could not find ninja.'))
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.8
creating build/lib.linux-x86_64-3.8/apex
copying apex/__init__.py -> build/lib.linux-x86_64-3.8/apex
creating build/lib.linux-x86_64-3.8/apex/contrib
copying apex/contrib/__init__.py -> build/lib.linux-x86_64-3.8/apex/contrib
creating build/lib.linux-x86_64-3.8/apex/parallel
copying apex/parallel/LARC.py -> build/lib.linux-x86_64-3.8/apex/parallel
copying apex/parallel/distributed.py -> build/lib.linux-x86_64-3.8/apex/parallel
copying apex/parallel/__init__.py -> build/lib.linux-x86_64-3.8/apex/parallel
copying apex/parallel/optimized_sync_batchnorm.py -> build/lib.linux-x86_64-3.8/apex/parallel
copying apex/parallel/sync_batchnorm.py -> build/lib.linux-x86_64-3.8/apex/parallel
copying apex/parallel/multiproc.py -> build/lib.linux-x86_64-3.8/apex/parallel
copying apex/parallel/optimized_sync_batchnorm_kernel.py -> build/lib.linux-x86_64-3.8/apex/parallel
copying apex/parallel/sync_batchnorm_kernel.py -> build/lib.linux-x86_64-3.8/apex/parallel
creating build/lib.linux-x86_64-3.8/apex/multi_tensor_apply
copying apex/multi_tensor_apply/__init__.py -> build/lib.linux-x86_64-3.8/apex/multi_tensor_apply
copying apex/multi_tensor_apply/multi_tensor_apply.py -> build/lib.linux-x86_64-3.8/apex/multi_tensor_apply
creating build/lib.linux-x86_64-3.8/apex/normalization
copying apex/normalization/fused_layer_norm.py -> build/lib.linux-x86_64-3.8/apex/normalization
copying apex/normalization/__init__.py -> build/lib.linux-x86_64-3.8/apex/normalization
creating build/lib.linux-x86_64-3.8/apex/optimizers
copying apex/optimizers/fused_lamb.py -> build/lib.linux-x86_64-3.8/apex/optimizers
copying apex/optimizers/fused_novograd.py -> build/lib.linux-x86_64-3.8/apex/optimizers
copying apex/optimizers/__init__.py -> build/lib.linux-x86_64-3.8/apex/optimizers
copying apex/optimizers/fused_sgd.py -> build/lib.linux-x86_64-3.8/apex/optimizers
copying apex/optimizers/fused_adam.py -> build/lib.linux-x86_64-3.8/apex/optimizers
creating build/lib.linux-x86_64-3.8/apex/pyprof
copying apex/pyprof/__init__.py -> build/lib.linux-x86_64-3.8/apex/pyprof
creating build/lib.linux-x86_64-3.8/apex/RNN
copying apex/RNN/cells.py -> build/lib.linux-x86_64-3.8/apex/RNN
copying apex/RNN/__init__.py -> build/lib.linux-x86_64-3.8/apex/RNN
copying apex/RNN/RNNBackend.py -> build/lib.linux-x86_64-3.8/apex/RNN
copying apex/RNN/models.py -> build/lib.linux-x86_64-3.8/apex/RNN
creating build/lib.linux-x86_64-3.8/apex/amp
copying apex/amp/utils.py -> build/lib.linux-x86_64-3.8/apex/amp
copying apex/amp/rnn_compat.py -> build/lib.linux-x86_64-3.8/apex/amp
copying apex/amp/frontend.py -> build/lib.linux-x86_64-3.8/apex/amp
copying apex/amp/_initialize.py -> build/lib.linux-x86_64-3.8/apex/amp
copying apex/amp/amp.py -> build/lib.linux-x86_64-3.8/apex/amp
copying apex/amp/_process_optimizer.py -> build/lib.linux-x86_64-3.8/apex/amp
copying apex/amp/opt.py -> build/lib.linux-x86_64-3.8/apex/amp
copying apex/amp/__init__.py -> build/lib.linux-x86_64-3.8/apex/amp
copying apex/amp/handle.py -> build/lib.linux-x86_64-3.8/apex/amp
copying apex/amp/scaler.py -> build/lib.linux-x86_64-3.8/apex/amp
copying apex/amp/__version__.py -> build/lib.linux-x86_64-3.8/apex/amp
copying apex/amp/compat.py -> build/lib.linux-x86_64-3.8/apex/amp
copying apex/amp/_amp_state.py -> build/lib.linux-x86_64-3.8/apex/amp
copying apex/amp/wrap.py -> build/lib.linux-x86_64-3.8/apex/amp
creating build/lib.linux-x86_64-3.8/apex/reparameterization
copying apex/reparameterization/reparameterization.py -> build/lib.linux-x86_64-3.8/apex/reparameterization
copying apex/reparameterization/__init__.py -> build/lib.linux-x86_64-3.8/apex/reparameterization
copying apex/reparameterization/weight_norm.py -> build/lib.linux-x86_64-3.8/apex/reparameterization
creating build/lib.linux-x86_64-3.8/apex/fp16_utils
copying apex/fp16_utils/__init__.py -> build/lib.linux-x86_64-3.8/apex/fp16_utils
copying apex/fp16_utils/loss_scaler.py -> build/lib.linux-x86_64-3.8/apex/fp16_utils
copying apex/fp16_utils/fp16_optimizer.py -> build/lib.linux-x86_64-3.8/apex/fp16_utils
copying apex/fp16_utils/fp16util.py -> build/lib.linux-x86_64-3.8/apex/fp16_utils
creating build/lib.linux-x86_64-3.8/apex/contrib/xentropy
copying apex/contrib/xentropy/__init__.py -> build/lib.linux-x86_64-3.8/apex/contrib/xentropy
copying apex/contrib/xentropy/softmax_xentropy.py -> build/lib.linux-x86_64-3.8/apex/contrib/xentropy
creating build/lib.linux-x86_64-3.8/apex/contrib/optimizers
copying apex/contrib/optimizers/__init__.py -> build/lib.linux-x86_64-3.8/apex/contrib/optimizers
copying apex/contrib/optimizers/fp16_optimizer.py -> build/lib.linux-x86_64-3.8/apex/contrib/optimizers
copying apex/contrib/optimizers/fused_sgd.py -> build/lib.linux-x86_64-3.8/apex/contrib/optimizers
copying apex/contrib/optimizers/fused_adam.py -> build/lib.linux-x86_64-3.8/apex/contrib/optimizers
creating build/lib.linux-x86_64-3.8/apex/contrib/groupbn
copying apex/contrib/groupbn/__init__.py -> build/lib.linux-x86_64-3.8/apex/contrib/groupbn
copying apex/contrib/groupbn/batch_norm.py -> build/lib.linux-x86_64-3.8/apex/contrib/groupbn
creating build/lib.linux-x86_64-3.8/apex/pyprof/nvtx
copying apex/pyprof/nvtx/__init__.py -> build/lib.linux-x86_64-3.8/apex/pyprof/nvtx
copying apex/pyprof/nvtx/nvmarker.py -> build/lib.linux-x86_64-3.8/apex/pyprof/nvtx
creating build/lib.linux-x86_64-3.8/apex/pyprof/parse
copying apex/pyprof/parse/nvvp.py -> build/lib.linux-x86_64-3.8/apex/pyprof/parse
copying apex/pyprof/parse/__main__.py -> build/lib.linux-x86_64-3.8/apex/pyprof/parse
copying apex/pyprof/parse/parse.py -> build/lib.linux-x86_64-3.8/apex/pyprof/parse
copying apex/pyprof/parse/__init__.py -> build/lib.linux-x86_64-3.8/apex/pyprof/parse
copying apex/pyprof/parse/db.py -> build/lib.linux-x86_64-3.8/apex/pyprof/parse
copying apex/pyprof/parse/kernel.py -> build/lib.linux-x86_64-3.8/apex/pyprof/parse
creating build/lib.linux-x86_64-3.8/apex/pyprof/prof
copying apex/pyprof/prof/loss.py -> build/lib.linux-x86_64-3.8/apex/pyprof/prof
copying apex/pyprof/prof/optim.py -> build/lib.linux-x86_64-3.8/apex/pyprof/prof
copying apex/pyprof/prof/output.py -> build/lib.linux-x86_64-3.8/apex/pyprof/prof
copying apex/pyprof/prof/misc.py -> build/lib.linux-x86_64-3.8/apex/pyprof/prof
copying apex/pyprof/prof/__main__.py -> build/lib.linux-x86_64-3.8/apex/pyprof/prof
copying apex/pyprof/prof/usage.py -> build/lib.linux-x86_64-3.8/apex/pyprof/prof
copying apex/pyprof/prof/index_slice_join_mutate.py -> build/lib.linux-x86_64-3.8/apex/pyprof/prof
copying apex/pyprof/prof/pointwise.py -> build/lib.linux-x86_64-3.8/apex/pyprof/prof
copying apex/pyprof/prof/blas.py -> build/lib.linux-x86_64-3.8/apex/pyprof/prof
copying apex/pyprof/prof/embedding.py -> build/lib.linux-x86_64-3.8/apex/pyprof/prof
copying apex/pyprof/prof/reduction.py -> build/lib.linux-x86_64-3.8/apex/pyprof/prof
copying apex/pyprof/prof/prof.py -> build/lib.linux-x86_64-3.8/apex/pyprof/prof
copying apex/pyprof/prof/utility.py -> build/lib.linux-x86_64-3.8/apex/pyprof/prof
copying apex/pyprof/prof/__init__.py -> build/lib.linux-x86_64-3.8/apex/pyprof/prof
copying apex/pyprof/prof/softmax.py -> build/lib.linux-x86_64-3.8/apex/pyprof/prof
copying apex/pyprof/prof/recurrentCell.py -> build/lib.linux-x86_64-3.8/apex/pyprof/prof
copying apex/pyprof/prof/randomSample.py -> build/lib.linux-x86_64-3.8/apex/pyprof/prof
copying apex/pyprof/prof/conv.py -> build/lib.linux-x86_64-3.8/apex/pyprof/prof
copying apex/pyprof/prof/data.py -> build/lib.linux-x86_64-3.8/apex/pyprof/prof
copying apex/pyprof/prof/pooling.py -> build/lib.linux-x86_64-3.8/apex/pyprof/prof
copying apex/pyprof/prof/activation.py -> build/lib.linux-x86_64-3.8/apex/pyprof/prof
copying apex/pyprof/prof/linear.py -> build/lib.linux-x86_64-3.8/apex/pyprof/prof
copying apex/pyprof/prof/normalization.py -> build/lib.linux-x86_64-3.8/apex/pyprof/prof
copying apex/pyprof/prof/base.py -> build/lib.linux-x86_64-3.8/apex/pyprof/prof
copying apex/pyprof/prof/dropout.py -> build/lib.linux-x86_64-3.8/apex/pyprof/prof
copying apex/pyprof/prof/convert.py -> build/lib.linux-x86_64-3.8/apex/pyprof/prof
creating build/lib.linux-x86_64-3.8/apex/amp/lists
copying apex/amp/lists/torch_overrides.py -> build/lib.linux-x86_64-3.8/apex/amp/lists
copying apex/amp/lists/__init__.py -> build/lib.linux-x86_64-3.8/apex/amp/lists
copying apex/amp/lists/functional_overrides.py -> build/lib.linux-x86_64-3.8/apex/amp/lists
copying apex/amp/lists/tensor_overrides.py -> build/lib.linux-x86_64-3.8/apex/amp/lists
running build_ext
building 'apex_C' extension
creating build/temp.linux-x86_64-3.8
creating build/temp.linux-x86_64-3.8/csrc
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/TH -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/THC -I/home/user/code/DeepSpeed/env/include -I/usr/include/python3.8 -c csrc/flatten_unflatten.cpp -o build/temp.linux-x86_64-3.8/csrc/flatten_unflatten.o -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=apex_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Parallel.h:149,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/utils.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:5,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:7,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/flatten_unflatten.cpp:1:
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ParallelOpenMP.h:84: warning: ignoring #pragma omp parallel [-Wunknown-pragmas]
   84 | #pragma omp parallel for if ((end - begin) >= grain_size)
      | 
In file included from csrc/flatten_unflatten.cpp:2:
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/utils/tensor_flatten.h: In member function ‘at::DeprecatedTypeProperties& torch::utils::TensorGroup::type()’:
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/utils/tensor_flatten.h:36:28: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
   36 |     return tensors[0].type();
      |                            ^
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Tensor.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Context.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:5,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/flatten_unflatten.cpp:1:
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:268:30: note: declared here
  268 |   DeprecatedTypeProperties & type() const {
      |                              ^~~~
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.8/csrc/flatten_unflatten.o -L/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -o build/lib.linux-x86_64-3.8/apex_C.cpython-38-x86_64-linux-gnu.so
building 'amp_C' extension
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/TH -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/user/code/DeepSpeed/env/include -I/usr/include/python3.8 -c csrc/amp_C_frontend.cpp -o build/temp.linux-x86_64-3.8/csrc/amp_C_frontend.o -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Parallel.h:149,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/utils.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:5,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:7,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/amp_C_frontend.cpp:1:
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ParallelOpenMP.h:84: warning: ignoring #pragma omp parallel [-Wunknown-pragmas]
   84 | #pragma omp parallel for if ((end - begin) >= grain_size)
      | 
/usr/local/cuda/bin/nvcc -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/TH -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/user/code/DeepSpeed/env/include -I/usr/include/python3.8 -c csrc/multi_tensor_sgd_kernel.cu -o build/temp.linux-x86_64-3.8/csrc/multi_tensor_sgd_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=sm_61 -std=c++14
/usr/local/cuda/bin/nvcc -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/TH -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/user/code/DeepSpeed/env/include -I/usr/include/python3.8 -c csrc/multi_tensor_scale_kernel.cu -o build/temp.linux-x86_64-3.8/csrc/multi_tensor_scale_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=sm_61 -std=c++14
/usr/local/cuda/bin/nvcc -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/TH -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/user/code/DeepSpeed/env/include -I/usr/include/python3.8 -c csrc/multi_tensor_axpby_kernel.cu -o build/temp.linux-x86_64-3.8/csrc/multi_tensor_axpby_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=sm_61 -std=c++14
/usr/local/cuda/bin/nvcc -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/TH -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/user/code/DeepSpeed/env/include -I/usr/include/python3.8 -c csrc/multi_tensor_l2norm_kernel.cu -o build/temp.linux-x86_64-3.8/csrc/multi_tensor_l2norm_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=sm_61 -std=c++14
/usr/local/cuda/bin/nvcc -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/TH -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/user/code/DeepSpeed/env/include -I/usr/include/python3.8 -c csrc/multi_tensor_lamb_stage_1.cu -o build/temp.linux-x86_64-3.8/csrc/multi_tensor_lamb_stage_1.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=sm_61 -std=c++14
/usr/local/cuda/bin/nvcc -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/TH -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/user/code/DeepSpeed/env/include -I/usr/include/python3.8 -c csrc/multi_tensor_lamb_stage_2.cu -o build/temp.linux-x86_64-3.8/csrc/multi_tensor_lamb_stage_2.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=sm_61 -std=c++14
/usr/local/cuda/bin/nvcc -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/TH -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/user/code/DeepSpeed/env/include -I/usr/include/python3.8 -c csrc/multi_tensor_adam.cu -o build/temp.linux-x86_64-3.8/csrc/multi_tensor_adam.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=sm_61 -std=c++14
/usr/local/cuda/bin/nvcc -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/TH -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/user/code/DeepSpeed/env/include -I/usr/include/python3.8 -c csrc/multi_tensor_novograd.cu -o build/temp.linux-x86_64-3.8/csrc/multi_tensor_novograd.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=sm_61 -std=c++14
/usr/local/cuda/bin/nvcc -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/TH -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/user/code/DeepSpeed/env/include -I/usr/include/python3.8 -c csrc/multi_tensor_lamb.cu -o build/temp.linux-x86_64-3.8/csrc/multi_tensor_lamb.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=sm_61 -std=c++14
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.8/csrc/amp_C_frontend.o build/temp.linux-x86_64-3.8/csrc/multi_tensor_sgd_kernel.o build/temp.linux-x86_64-3.8/csrc/multi_tensor_scale_kernel.o build/temp.linux-x86_64-3.8/csrc/multi_tensor_axpby_kernel.o build/temp.linux-x86_64-3.8/csrc/multi_tensor_l2norm_kernel.o build/temp.linux-x86_64-3.8/csrc/multi_tensor_lamb_stage_1.o build/temp.linux-x86_64-3.8/csrc/multi_tensor_lamb_stage_2.o build/temp.linux-x86_64-3.8/csrc/multi_tensor_adam.o build/temp.linux-x86_64-3.8/csrc/multi_tensor_novograd.o build/temp.linux-x86_64-3.8/csrc/multi_tensor_lamb.o -L/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda -o build/lib.linux-x86_64-3.8/amp_C.cpython-38-x86_64-linux-gnu.so
building 'syncbn' extension
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/TH -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/user/code/DeepSpeed/env/include -I/usr/include/python3.8 -c csrc/syncbn.cpp -o build/temp.linux-x86_64-3.8/csrc/syncbn.o -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=syncbn -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Parallel.h:149,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/utils.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:5,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:7,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/syncbn.cpp:1:
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ParallelOpenMP.h:84: warning: ignoring #pragma omp parallel [-Wunknown-pragmas]
   84 | #pragma omp parallel for if ((end - begin) >= grain_size)
      | 
/usr/local/cuda/bin/nvcc -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/TH -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/user/code/DeepSpeed/env/include -I/usr/include/python3.8 -c csrc/welford.cu -o build/temp.linux-x86_64-3.8/csrc/welford.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=syncbn -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=sm_61 -std=c++14
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.8/csrc/syncbn.o build/temp.linux-x86_64-3.8/csrc/welford.o -L/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda -o build/lib.linux-x86_64-3.8/syncbn.cpython-38-x86_64-linux-gnu.so
building 'fused_layer_norm_cuda' extension
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/TH -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/user/code/DeepSpeed/env/include -I/usr/include/python3.8 -c csrc/layer_norm_cuda.cpp -o build/temp.linux-x86_64-3.8/csrc/layer_norm_cuda.o -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=fused_layer_norm_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Parallel.h:149,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/utils.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:5,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:7,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ParallelOpenMP.h:84: warning: ignoring #pragma omp parallel [-Wunknown-pragmas]
   84 | #pragma omp parallel for if ((end - begin) >= grain_size)
      | 
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/DeviceType.h:8,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Device.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Allocator.h:6,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
csrc/layer_norm_cuda.cpp: In function ‘std::vector<at::Tensor> layer_norm(at::Tensor, c10::IntArrayRef, double)’:
csrc/layer_norm_cuda.cpp:117:42: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                                          ^
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/macros/Macros.h:146:65: note: in definition of macro ‘C10_UNLIKELY’
  146 | #define C10_UNLIKELY(expr)  (__builtin_expect(static_cast<bool>(expr), 0))
      |                                                                 ^~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:330:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
  330 |   if (C10_UNLIKELY_OR_CONST(!(cond))) {                               \
      |       ^~~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:318:3: note: in expansion of macro ‘TORCH_CHECK_WITH_MSG’
  318 |   TORCH_CHECK_WITH_MSG(error_t, cond, "", __VA_ARGS__)
      |   ^~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:341:32: note: in expansion of macro ‘TORCH_CHECK_WITH’
  341 | #define TORCH_CHECK(cond, ...) TORCH_CHECK_WITH(Error, cond, __VA_ARGS__)
      |                                ^~~~~~~~~~~~~~~~
csrc/layer_norm_cuda.cpp:117:23: note: in expansion of macro ‘TORCH_CHECK’
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                       ^~~~~~~~~~~
csrc/layer_norm_cuda.cpp:119:24: note: in expansion of macro ‘CHECK_CUDA’
  119 | #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
      |                        ^~~~~~~~~~
csrc/layer_norm_cuda.cpp:129:3: note: in expansion of macro ‘CHECK_INPUT’
  129 |   CHECK_INPUT(input);
      |   ^~~~~~~~~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Tensor.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Context.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:5,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:268:30: note: declared here
  268 |   DeprecatedTypeProperties & type() const {
      |                              ^~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/DeviceType.h:8,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Device.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Allocator.h:6,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
csrc/layer_norm_cuda.cpp: In function ‘std::vector<at::Tensor> layer_norm_affine(at::Tensor, c10::IntArrayRef, at::Tensor, at::Tensor, double)’:
csrc/layer_norm_cuda.cpp:117:42: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                                          ^
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/macros/Macros.h:146:65: note: in definition of macro ‘C10_UNLIKELY’
  146 | #define C10_UNLIKELY(expr)  (__builtin_expect(static_cast<bool>(expr), 0))
      |                                                                 ^~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:330:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
  330 |   if (C10_UNLIKELY_OR_CONST(!(cond))) {                               \
      |       ^~~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:318:3: note: in expansion of macro ‘TORCH_CHECK_WITH_MSG’
  318 |   TORCH_CHECK_WITH_MSG(error_t, cond, "", __VA_ARGS__)
      |   ^~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:341:32: note: in expansion of macro ‘TORCH_CHECK_WITH’
  341 | #define TORCH_CHECK(cond, ...) TORCH_CHECK_WITH(Error, cond, __VA_ARGS__)
      |                                ^~~~~~~~~~~~~~~~
csrc/layer_norm_cuda.cpp:117:23: note: in expansion of macro ‘TORCH_CHECK’
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                       ^~~~~~~~~~~
csrc/layer_norm_cuda.cpp:119:24: note: in expansion of macro ‘CHECK_CUDA’
  119 | #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
      |                        ^~~~~~~~~~
csrc/layer_norm_cuda.cpp:149:3: note: in expansion of macro ‘CHECK_INPUT’
  149 |   CHECK_INPUT(input);
      |   ^~~~~~~~~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Tensor.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Context.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:5,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:268:30: note: declared here
  268 |   DeprecatedTypeProperties & type() const {
      |                              ^~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/DeviceType.h:8,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Device.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Allocator.h:6,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
csrc/layer_norm_cuda.cpp:117:42: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                                          ^
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/macros/Macros.h:146:65: note: in definition of macro ‘C10_UNLIKELY’
  146 | #define C10_UNLIKELY(expr)  (__builtin_expect(static_cast<bool>(expr), 0))
      |                                                                 ^~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:330:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
  330 |   if (C10_UNLIKELY_OR_CONST(!(cond))) {                               \
      |       ^~~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:318:3: note: in expansion of macro ‘TORCH_CHECK_WITH_MSG’
  318 |   TORCH_CHECK_WITH_MSG(error_t, cond, "", __VA_ARGS__)
      |   ^~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:341:32: note: in expansion of macro ‘TORCH_CHECK_WITH’
  341 | #define TORCH_CHECK(cond, ...) TORCH_CHECK_WITH(Error, cond, __VA_ARGS__)
      |                                ^~~~~~~~~~~~~~~~
csrc/layer_norm_cuda.cpp:117:23: note: in expansion of macro ‘TORCH_CHECK’
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                       ^~~~~~~~~~~
csrc/layer_norm_cuda.cpp:119:24: note: in expansion of macro ‘CHECK_CUDA’
  119 | #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
      |                        ^~~~~~~~~~
csrc/layer_norm_cuda.cpp:150:3: note: in expansion of macro ‘CHECK_INPUT’
  150 |   CHECK_INPUT(gamma);
      |   ^~~~~~~~~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Tensor.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Context.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:5,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:268:30: note: declared here
  268 |   DeprecatedTypeProperties & type() const {
      |                              ^~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/DeviceType.h:8,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Device.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Allocator.h:6,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
csrc/layer_norm_cuda.cpp:117:42: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                                          ^
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/macros/Macros.h:146:65: note: in definition of macro ‘C10_UNLIKELY’
  146 | #define C10_UNLIKELY(expr)  (__builtin_expect(static_cast<bool>(expr), 0))
      |                                                                 ^~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:330:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
  330 |   if (C10_UNLIKELY_OR_CONST(!(cond))) {                               \
      |       ^~~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:318:3: note: in expansion of macro ‘TORCH_CHECK_WITH_MSG’
  318 |   TORCH_CHECK_WITH_MSG(error_t, cond, "", __VA_ARGS__)
      |   ^~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:341:32: note: in expansion of macro ‘TORCH_CHECK_WITH’
  341 | #define TORCH_CHECK(cond, ...) TORCH_CHECK_WITH(Error, cond, __VA_ARGS__)
      |                                ^~~~~~~~~~~~~~~~
csrc/layer_norm_cuda.cpp:117:23: note: in expansion of macro ‘TORCH_CHECK’
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                       ^~~~~~~~~~~
csrc/layer_norm_cuda.cpp:119:24: note: in expansion of macro ‘CHECK_CUDA’
  119 | #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
      |                        ^~~~~~~~~~
csrc/layer_norm_cuda.cpp:151:3: note: in expansion of macro ‘CHECK_INPUT’
  151 |   CHECK_INPUT(beta);
      |   ^~~~~~~~~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Tensor.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Context.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:5,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:268:30: note: declared here
  268 |   DeprecatedTypeProperties & type() const {
      |                              ^~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/DeviceType.h:8,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Device.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Allocator.h:6,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
csrc/layer_norm_cuda.cpp: In function ‘at::Tensor layer_norm_gradient(at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::IntArrayRef, double)’:
csrc/layer_norm_cuda.cpp:117:42: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                                          ^
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/macros/Macros.h:146:65: note: in definition of macro ‘C10_UNLIKELY’
  146 | #define C10_UNLIKELY(expr)  (__builtin_expect(static_cast<bool>(expr), 0))
      |                                                                 ^~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:330:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
  330 |   if (C10_UNLIKELY_OR_CONST(!(cond))) {                               \
      |       ^~~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:318:3: note: in expansion of macro ‘TORCH_CHECK_WITH_MSG’
  318 |   TORCH_CHECK_WITH_MSG(error_t, cond, "", __VA_ARGS__)
      |   ^~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:341:32: note: in expansion of macro ‘TORCH_CHECK_WITH’
  341 | #define TORCH_CHECK(cond, ...) TORCH_CHECK_WITH(Error, cond, __VA_ARGS__)
      |                                ^~~~~~~~~~~~~~~~
csrc/layer_norm_cuda.cpp:117:23: note: in expansion of macro ‘TORCH_CHECK’
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                       ^~~~~~~~~~~
csrc/layer_norm_cuda.cpp:119:24: note: in expansion of macro ‘CHECK_CUDA’
  119 | #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
      |                        ^~~~~~~~~~
csrc/layer_norm_cuda.cpp:193:3: note: in expansion of macro ‘CHECK_INPUT’
  193 |   CHECK_INPUT(dout);
      |   ^~~~~~~~~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Tensor.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Context.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:5,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:268:30: note: declared here
  268 |   DeprecatedTypeProperties & type() const {
      |                              ^~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/DeviceType.h:8,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Device.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Allocator.h:6,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
csrc/layer_norm_cuda.cpp:117:42: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                                          ^
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/macros/Macros.h:146:65: note: in definition of macro ‘C10_UNLIKELY’
  146 | #define C10_UNLIKELY(expr)  (__builtin_expect(static_cast<bool>(expr), 0))
      |                                                                 ^~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:330:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
  330 |   if (C10_UNLIKELY_OR_CONST(!(cond))) {                               \
      |       ^~~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:318:3: note: in expansion of macro ‘TORCH_CHECK_WITH_MSG’
  318 |   TORCH_CHECK_WITH_MSG(error_t, cond, "", __VA_ARGS__)
      |   ^~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:341:32: note: in expansion of macro ‘TORCH_CHECK_WITH’
  341 | #define TORCH_CHECK(cond, ...) TORCH_CHECK_WITH(Error, cond, __VA_ARGS__)
      |                                ^~~~~~~~~~~~~~~~
csrc/layer_norm_cuda.cpp:117:23: note: in expansion of macro ‘TORCH_CHECK’
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                       ^~~~~~~~~~~
csrc/layer_norm_cuda.cpp:119:24: note: in expansion of macro ‘CHECK_CUDA’
  119 | #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
      |                        ^~~~~~~~~~
csrc/layer_norm_cuda.cpp:194:3: note: in expansion of macro ‘CHECK_INPUT’
  194 |   CHECK_INPUT(mean);
      |   ^~~~~~~~~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Tensor.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Context.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:5,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:268:30: note: declared here
  268 |   DeprecatedTypeProperties & type() const {
      |                              ^~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/DeviceType.h:8,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Device.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Allocator.h:6,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
csrc/layer_norm_cuda.cpp:117:42: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                                          ^
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/macros/Macros.h:146:65: note: in definition of macro ‘C10_UNLIKELY’
  146 | #define C10_UNLIKELY(expr)  (__builtin_expect(static_cast<bool>(expr), 0))
      |                                                                 ^~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:330:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
  330 |   if (C10_UNLIKELY_OR_CONST(!(cond))) {                               \
      |       ^~~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:318:3: note: in expansion of macro ‘TORCH_CHECK_WITH_MSG’
  318 |   TORCH_CHECK_WITH_MSG(error_t, cond, "", __VA_ARGS__)
      |   ^~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:341:32: note: in expansion of macro ‘TORCH_CHECK_WITH’
  341 | #define TORCH_CHECK(cond, ...) TORCH_CHECK_WITH(Error, cond, __VA_ARGS__)
      |                                ^~~~~~~~~~~~~~~~
csrc/layer_norm_cuda.cpp:117:23: note: in expansion of macro ‘TORCH_CHECK’
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                       ^~~~~~~~~~~
csrc/layer_norm_cuda.cpp:119:24: note: in expansion of macro ‘CHECK_CUDA’
  119 | #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
      |                        ^~~~~~~~~~
csrc/layer_norm_cuda.cpp:195:3: note: in expansion of macro ‘CHECK_INPUT’
  195 |   CHECK_INPUT(invvar);
      |   ^~~~~~~~~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Tensor.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Context.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:5,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:268:30: note: declared here
  268 |   DeprecatedTypeProperties & type() const {
      |                              ^~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/DeviceType.h:8,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Device.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Allocator.h:6,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
csrc/layer_norm_cuda.cpp:117:42: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                                          ^
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/macros/Macros.h:146:65: note: in definition of macro ‘C10_UNLIKELY’
  146 | #define C10_UNLIKELY(expr)  (__builtin_expect(static_cast<bool>(expr), 0))
      |                                                                 ^~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:330:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
  330 |   if (C10_UNLIKELY_OR_CONST(!(cond))) {                               \
      |       ^~~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:318:3: note: in expansion of macro ‘TORCH_CHECK_WITH_MSG’
  318 |   TORCH_CHECK_WITH_MSG(error_t, cond, "", __VA_ARGS__)
      |   ^~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:341:32: note: in expansion of macro ‘TORCH_CHECK_WITH’
  341 | #define TORCH_CHECK(cond, ...) TORCH_CHECK_WITH(Error, cond, __VA_ARGS__)
      |                                ^~~~~~~~~~~~~~~~
csrc/layer_norm_cuda.cpp:117:23: note: in expansion of macro ‘TORCH_CHECK’
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                       ^~~~~~~~~~~
csrc/layer_norm_cuda.cpp:119:24: note: in expansion of macro ‘CHECK_CUDA’
  119 | #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
      |                        ^~~~~~~~~~
csrc/layer_norm_cuda.cpp:196:3: note: in expansion of macro ‘CHECK_INPUT’
  196 |   CHECK_INPUT(input);
      |   ^~~~~~~~~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Tensor.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Context.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:5,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:268:30: note: declared here
  268 |   DeprecatedTypeProperties & type() const {
      |                              ^~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/DeviceType.h:8,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Device.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Allocator.h:6,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
csrc/layer_norm_cuda.cpp: In function ‘std::vector<at::Tensor> layer_norm_gradient_affine(at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::IntArrayRef, at::Tensor, at::Tensor, double)’:
csrc/layer_norm_cuda.cpp:117:42: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                                          ^
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/macros/Macros.h:146:65: note: in definition of macro ‘C10_UNLIKELY’
  146 | #define C10_UNLIKELY(expr)  (__builtin_expect(static_cast<bool>(expr), 0))
      |                                                                 ^~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:330:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
  330 |   if (C10_UNLIKELY_OR_CONST(!(cond))) {                               \
      |       ^~~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:318:3: note: in expansion of macro ‘TORCH_CHECK_WITH_MSG’
  318 |   TORCH_CHECK_WITH_MSG(error_t, cond, "", __VA_ARGS__)
      |   ^~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:341:32: note: in expansion of macro ‘TORCH_CHECK_WITH’
  341 | #define TORCH_CHECK(cond, ...) TORCH_CHECK_WITH(Error, cond, __VA_ARGS__)
      |                                ^~~~~~~~~~~~~~~~
csrc/layer_norm_cuda.cpp:117:23: note: in expansion of macro ‘TORCH_CHECK’
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                       ^~~~~~~~~~~
csrc/layer_norm_cuda.cpp:119:24: note: in expansion of macro ‘CHECK_CUDA’
  119 | #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
      |                        ^~~~~~~~~~
csrc/layer_norm_cuda.cpp:218:3: note: in expansion of macro ‘CHECK_INPUT’
  218 |   CHECK_INPUT(dout);
      |   ^~~~~~~~~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Tensor.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Context.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:5,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:268:30: note: declared here
  268 |   DeprecatedTypeProperties & type() const {
      |                              ^~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/DeviceType.h:8,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Device.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Allocator.h:6,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
csrc/layer_norm_cuda.cpp:117:42: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                                          ^
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/macros/Macros.h:146:65: note: in definition of macro ‘C10_UNLIKELY’
  146 | #define C10_UNLIKELY(expr)  (__builtin_expect(static_cast<bool>(expr), 0))
      |                                                                 ^~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:330:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
  330 |   if (C10_UNLIKELY_OR_CONST(!(cond))) {                               \
      |       ^~~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:318:3: note: in expansion of macro ‘TORCH_CHECK_WITH_MSG’
  318 |   TORCH_CHECK_WITH_MSG(error_t, cond, "", __VA_ARGS__)
      |   ^~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:341:32: note: in expansion of macro ‘TORCH_CHECK_WITH’
  341 | #define TORCH_CHECK(cond, ...) TORCH_CHECK_WITH(Error, cond, __VA_ARGS__)
      |                                ^~~~~~~~~~~~~~~~
csrc/layer_norm_cuda.cpp:117:23: note: in expansion of macro ‘TORCH_CHECK’
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                       ^~~~~~~~~~~
csrc/layer_norm_cuda.cpp:119:24: note: in expansion of macro ‘CHECK_CUDA’
  119 | #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
      |                        ^~~~~~~~~~
csrc/layer_norm_cuda.cpp:219:3: note: in expansion of macro ‘CHECK_INPUT’
  219 |   CHECK_INPUT(mean);
      |   ^~~~~~~~~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Tensor.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Context.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:5,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:268:30: note: declared here
  268 |   DeprecatedTypeProperties & type() const {
      |                              ^~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/DeviceType.h:8,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Device.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Allocator.h:6,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
csrc/layer_norm_cuda.cpp:117:42: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                                          ^
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/macros/Macros.h:146:65: note: in definition of macro ‘C10_UNLIKELY’
  146 | #define C10_UNLIKELY(expr)  (__builtin_expect(static_cast<bool>(expr), 0))
      |                                                                 ^~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:330:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
  330 |   if (C10_UNLIKELY_OR_CONST(!(cond))) {                               \
      |       ^~~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:318:3: note: in expansion of macro ‘TORCH_CHECK_WITH_MSG’
  318 |   TORCH_CHECK_WITH_MSG(error_t, cond, "", __VA_ARGS__)
      |   ^~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:341:32: note: in expansion of macro ‘TORCH_CHECK_WITH’
  341 | #define TORCH_CHECK(cond, ...) TORCH_CHECK_WITH(Error, cond, __VA_ARGS__)
      |                                ^~~~~~~~~~~~~~~~
csrc/layer_norm_cuda.cpp:117:23: note: in expansion of macro ‘TORCH_CHECK’
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                       ^~~~~~~~~~~
csrc/layer_norm_cuda.cpp:119:24: note: in expansion of macro ‘CHECK_CUDA’
  119 | #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
      |                        ^~~~~~~~~~
csrc/layer_norm_cuda.cpp:220:3: note: in expansion of macro ‘CHECK_INPUT’
  220 |   CHECK_INPUT(invvar);
      |   ^~~~~~~~~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Tensor.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Context.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:5,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:268:30: note: declared here
  268 |   DeprecatedTypeProperties & type() const {
      |                              ^~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/DeviceType.h:8,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Device.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Allocator.h:6,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
csrc/layer_norm_cuda.cpp:117:42: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                                          ^
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/macros/Macros.h:146:65: note: in definition of macro ‘C10_UNLIKELY’
  146 | #define C10_UNLIKELY(expr)  (__builtin_expect(static_cast<bool>(expr), 0))
      |                                                                 ^~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:330:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
  330 |   if (C10_UNLIKELY_OR_CONST(!(cond))) {                               \
      |       ^~~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:318:3: note: in expansion of macro ‘TORCH_CHECK_WITH_MSG’
  318 |   TORCH_CHECK_WITH_MSG(error_t, cond, "", __VA_ARGS__)
      |   ^~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:341:32: note: in expansion of macro ‘TORCH_CHECK_WITH’
  341 | #define TORCH_CHECK(cond, ...) TORCH_CHECK_WITH(Error, cond, __VA_ARGS__)
      |                                ^~~~~~~~~~~~~~~~
csrc/layer_norm_cuda.cpp:117:23: note: in expansion of macro ‘TORCH_CHECK’
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                       ^~~~~~~~~~~
csrc/layer_norm_cuda.cpp:119:24: note: in expansion of macro ‘CHECK_CUDA’
  119 | #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
      |                        ^~~~~~~~~~
csrc/layer_norm_cuda.cpp:221:3: note: in expansion of macro ‘CHECK_INPUT’
  221 |   CHECK_INPUT(input);
      |   ^~~~~~~~~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Tensor.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Context.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:5,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:268:30: note: declared here
  268 |   DeprecatedTypeProperties & type() const {
      |                              ^~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/DeviceType.h:8,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Device.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Allocator.h:6,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
csrc/layer_norm_cuda.cpp:117:42: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                                          ^
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/macros/Macros.h:146:65: note: in definition of macro ‘C10_UNLIKELY’
  146 | #define C10_UNLIKELY(expr)  (__builtin_expect(static_cast<bool>(expr), 0))
      |                                                                 ^~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:330:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
  330 |   if (C10_UNLIKELY_OR_CONST(!(cond))) {                               \
      |       ^~~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:318:3: note: in expansion of macro ‘TORCH_CHECK_WITH_MSG’
  318 |   TORCH_CHECK_WITH_MSG(error_t, cond, "", __VA_ARGS__)
      |   ^~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:341:32: note: in expansion of macro ‘TORCH_CHECK_WITH’
  341 | #define TORCH_CHECK(cond, ...) TORCH_CHECK_WITH(Error, cond, __VA_ARGS__)
      |                                ^~~~~~~~~~~~~~~~
csrc/layer_norm_cuda.cpp:117:23: note: in expansion of macro ‘TORCH_CHECK’
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                       ^~~~~~~~~~~
csrc/layer_norm_cuda.cpp:119:24: note: in expansion of macro ‘CHECK_CUDA’
  119 | #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
      |                        ^~~~~~~~~~
csrc/layer_norm_cuda.cpp:222:3: note: in expansion of macro ‘CHECK_INPUT’
  222 |   CHECK_INPUT(gamma);
      |   ^~~~~~~~~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Tensor.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Context.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:5,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:268:30: note: declared here
  268 |   DeprecatedTypeProperties & type() const {
      |                              ^~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/DeviceType.h:8,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Device.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Allocator.h:6,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
csrc/layer_norm_cuda.cpp:117:42: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                                          ^
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/macros/Macros.h:146:65: note: in definition of macro ‘C10_UNLIKELY’
  146 | #define C10_UNLIKELY(expr)  (__builtin_expect(static_cast<bool>(expr), 0))
      |                                                                 ^~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:330:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
  330 |   if (C10_UNLIKELY_OR_CONST(!(cond))) {                               \
      |       ^~~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:318:3: note: in expansion of macro ‘TORCH_CHECK_WITH_MSG’
  318 |   TORCH_CHECK_WITH_MSG(error_t, cond, "", __VA_ARGS__)
      |   ^~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:341:32: note: in expansion of macro ‘TORCH_CHECK_WITH’
  341 | #define TORCH_CHECK(cond, ...) TORCH_CHECK_WITH(Error, cond, __VA_ARGS__)
      |                                ^~~~~~~~~~~~~~~~
csrc/layer_norm_cuda.cpp:117:23: note: in expansion of macro ‘TORCH_CHECK’
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                       ^~~~~~~~~~~
csrc/layer_norm_cuda.cpp:119:24: note: in expansion of macro ‘CHECK_CUDA’
  119 | #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
      |                        ^~~~~~~~~~
csrc/layer_norm_cuda.cpp:223:3: note: in expansion of macro ‘CHECK_INPUT’
  223 |   CHECK_INPUT(beta);
      |   ^~~~~~~~~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Tensor.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Context.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:5,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:268:30: note: declared here
  268 |   DeprecatedTypeProperties & type() const {
      |                              ^~~~
/usr/local/cuda/bin/nvcc -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/TH -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/user/code/DeepSpeed/env/include -I/usr/include/python3.8 -c csrc/layer_norm_cuda_kernel.cu -o build/temp.linux-x86_64-3.8/csrc/layer_norm_cuda_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -maxrregcount=50 -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=fused_layer_norm_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=sm_61 -std=c++14
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.8/csrc/layer_norm_cuda.o build/temp.linux-x86_64-3.8/csrc/layer_norm_cuda_kernel.o -L/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda -o build/lib.linux-x86_64-3.8/fused_layer_norm_cuda.cpython-38-x86_64-linux-gnu.so
installing to build/bdist.linux-x86_64/wheel
running install
running install_lib
creating build/bdist.linux-x86_64
creating build/bdist.linux-x86_64/wheel
copying build/lib.linux-x86_64-3.8/amp_C.cpython-38-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/wheel
copying build/lib.linux-x86_64-3.8/fused_layer_norm_cuda.cpython-38-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/wheel
copying build/lib.linux-x86_64-3.8/syncbn.cpython-38-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/wheel
creating build/bdist.linux-x86_64/wheel/apex
creating build/bdist.linux-x86_64/wheel/apex/contrib
creating build/bdist.linux-x86_64/wheel/apex/contrib/xentropy
copying build/lib.linux-x86_64-3.8/apex/contrib/xentropy/__init__.py -> build/bdist.linux-x86_64/wheel/apex/contrib/xentropy
copying build/lib.linux-x86_64-3.8/apex/contrib/xentropy/softmax_xentropy.py -> build/bdist.linux-x86_64/wheel/apex/contrib/xentropy
creating build/bdist.linux-x86_64/wheel/apex/contrib/optimizers
copying build/lib.linux-x86_64-3.8/apex/contrib/optimizers/__init__.py -> build/bdist.linux-x86_64/wheel/apex/contrib/optimizers
copying build/lib.linux-x86_64-3.8/apex/contrib/optimizers/fp16_optimizer.py -> build/bdist.linux-x86_64/wheel/apex/contrib/optimizers
copying build/lib.linux-x86_64-3.8/apex/contrib/optimizers/fused_sgd.py -> build/bdist.linux-x86_64/wheel/apex/contrib/optimizers
copying build/lib.linux-x86_64-3.8/apex/contrib/optimizers/fused_adam.py -> build/bdist.linux-x86_64/wheel/apex/contrib/optimizers
copying build/lib.linux-x86_64-3.8/apex/contrib/__init__.py -> build/bdist.linux-x86_64/wheel/apex/contrib
creating build/bdist.linux-x86_64/wheel/apex/contrib/groupbn
copying build/lib.linux-x86_64-3.8/apex/contrib/groupbn/__init__.py -> build/bdist.linux-x86_64/wheel/apex/contrib/groupbn
copying build/lib.linux-x86_64-3.8/apex/contrib/groupbn/batch_norm.py -> build/bdist.linux-x86_64/wheel/apex/contrib/groupbn
creating build/bdist.linux-x86_64/wheel/apex/parallel
copying build/lib.linux-x86_64-3.8/apex/parallel/LARC.py -> build/bdist.linux-x86_64/wheel/apex/parallel
copying build/lib.linux-x86_64-3.8/apex/parallel/distributed.py -> build/bdist.linux-x86_64/wheel/apex/parallel
copying build/lib.linux-x86_64-3.8/apex/parallel/__init__.py -> build/bdist.linux-x86_64/wheel/apex/parallel
copying build/lib.linux-x86_64-3.8/apex/parallel/optimized_sync_batchnorm.py -> build/bdist.linux-x86_64/wheel/apex/parallel
copying build/lib.linux-x86_64-3.8/apex/parallel/sync_batchnorm.py -> build/bdist.linux-x86_64/wheel/apex/parallel
copying build/lib.linux-x86_64-3.8/apex/parallel/multiproc.py -> build/bdist.linux-x86_64/wheel/apex/parallel
copying build/lib.linux-x86_64-3.8/apex/parallel/optimized_sync_batchnorm_kernel.py -> build/bdist.linux-x86_64/wheel/apex/parallel
copying build/lib.linux-x86_64-3.8/apex/parallel/sync_batchnorm_kernel.py -> build/bdist.linux-x86_64/wheel/apex/parallel
creating build/bdist.linux-x86_64/wheel/apex/multi_tensor_apply
copying build/lib.linux-x86_64-3.8/apex/multi_tensor_apply/__init__.py -> build/bdist.linux-x86_64/wheel/apex/multi_tensor_apply
copying build/lib.linux-x86_64-3.8/apex/multi_tensor_apply/multi_tensor_apply.py -> build/bdist.linux-x86_64/wheel/apex/multi_tensor_apply
creating build/bdist.linux-x86_64/wheel/apex/normalization
copying build/lib.linux-x86_64-3.8/apex/normalization/fused_layer_norm.py -> build/bdist.linux-x86_64/wheel/apex/normalization
copying build/lib.linux-x86_64-3.8/apex/normalization/__init__.py -> build/bdist.linux-x86_64/wheel/apex/normalization
creating build/bdist.linux-x86_64/wheel/apex/optimizers
copying build/lib.linux-x86_64-3.8/apex/optimizers/fused_lamb.py -> build/bdist.linux-x86_64/wheel/apex/optimizers
copying build/lib.linux-x86_64-3.8/apex/optimizers/fused_novograd.py -> build/bdist.linux-x86_64/wheel/apex/optimizers
copying build/lib.linux-x86_64-3.8/apex/optimizers/__init__.py -> build/bdist.linux-x86_64/wheel/apex/optimizers
copying build/lib.linux-x86_64-3.8/apex/optimizers/fused_sgd.py -> build/bdist.linux-x86_64/wheel/apex/optimizers
copying build/lib.linux-x86_64-3.8/apex/optimizers/fused_adam.py -> build/bdist.linux-x86_64/wheel/apex/optimizers
creating build/bdist.linux-x86_64/wheel/apex/pyprof
creating build/bdist.linux-x86_64/wheel/apex/pyprof/nvtx
copying build/lib.linux-x86_64-3.8/apex/pyprof/nvtx/__init__.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/nvtx
copying build/lib.linux-x86_64-3.8/apex/pyprof/nvtx/nvmarker.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/nvtx
copying build/lib.linux-x86_64-3.8/apex/pyprof/__init__.py -> build/bdist.linux-x86_64/wheel/apex/pyprof
creating build/bdist.linux-x86_64/wheel/apex/pyprof/parse
copying build/lib.linux-x86_64-3.8/apex/pyprof/parse/nvvp.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/parse
copying build/lib.linux-x86_64-3.8/apex/pyprof/parse/__main__.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/parse
copying build/lib.linux-x86_64-3.8/apex/pyprof/parse/parse.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/parse
copying build/lib.linux-x86_64-3.8/apex/pyprof/parse/__init__.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/parse
copying build/lib.linux-x86_64-3.8/apex/pyprof/parse/db.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/parse
copying build/lib.linux-x86_64-3.8/apex/pyprof/parse/kernel.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/parse
creating build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/pyprof/prof/loss.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/pyprof/prof/optim.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/pyprof/prof/output.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/pyprof/prof/misc.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/pyprof/prof/__main__.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/pyprof/prof/usage.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/pyprof/prof/index_slice_join_mutate.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/pyprof/prof/pointwise.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/pyprof/prof/blas.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/pyprof/prof/embedding.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/pyprof/prof/reduction.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/pyprof/prof/prof.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/pyprof/prof/utility.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/pyprof/prof/__init__.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/pyprof/prof/softmax.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/pyprof/prof/recurrentCell.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/pyprof/prof/randomSample.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/pyprof/prof/conv.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/pyprof/prof/data.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/pyprof/prof/pooling.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/pyprof/prof/activation.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/pyprof/prof/linear.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/pyprof/prof/normalization.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/pyprof/prof/base.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/pyprof/prof/dropout.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/pyprof/prof/convert.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/__init__.py -> build/bdist.linux-x86_64/wheel/apex
creating build/bdist.linux-x86_64/wheel/apex/RNN
copying build/lib.linux-x86_64-3.8/apex/RNN/cells.py -> build/bdist.linux-x86_64/wheel/apex/RNN
copying build/lib.linux-x86_64-3.8/apex/RNN/__init__.py -> build/bdist.linux-x86_64/wheel/apex/RNN
copying build/lib.linux-x86_64-3.8/apex/RNN/RNNBackend.py -> build/bdist.linux-x86_64/wheel/apex/RNN
copying build/lib.linux-x86_64-3.8/apex/RNN/models.py -> build/bdist.linux-x86_64/wheel/apex/RNN
creating build/bdist.linux-x86_64/wheel/apex/amp
copying build/lib.linux-x86_64-3.8/apex/amp/utils.py -> build/bdist.linux-x86_64/wheel/apex/amp
copying build/lib.linux-x86_64-3.8/apex/amp/rnn_compat.py -> build/bdist.linux-x86_64/wheel/apex/amp
copying build/lib.linux-x86_64-3.8/apex/amp/frontend.py -> build/bdist.linux-x86_64/wheel/apex/amp
copying build/lib.linux-x86_64-3.8/apex/amp/_initialize.py -> build/bdist.linux-x86_64/wheel/apex/amp
copying build/lib.linux-x86_64-3.8/apex/amp/amp.py -> build/bdist.linux-x86_64/wheel/apex/amp
copying build/lib.linux-x86_64-3.8/apex/amp/_process_optimizer.py -> build/bdist.linux-x86_64/wheel/apex/amp
copying build/lib.linux-x86_64-3.8/apex/amp/opt.py -> build/bdist.linux-x86_64/wheel/apex/amp
copying build/lib.linux-x86_64-3.8/apex/amp/__init__.py -> build/bdist.linux-x86_64/wheel/apex/amp
copying build/lib.linux-x86_64-3.8/apex/amp/handle.py -> build/bdist.linux-x86_64/wheel/apex/amp
copying build/lib.linux-x86_64-3.8/apex/amp/scaler.py -> build/bdist.linux-x86_64/wheel/apex/amp
copying build/lib.linux-x86_64-3.8/apex/amp/__version__.py -> build/bdist.linux-x86_64/wheel/apex/amp
copying build/lib.linux-x86_64-3.8/apex/amp/compat.py -> build/bdist.linux-x86_64/wheel/apex/amp
copying build/lib.linux-x86_64-3.8/apex/amp/_amp_state.py -> build/bdist.linux-x86_64/wheel/apex/amp
creating build/bdist.linux-x86_64/wheel/apex/amp/lists
copying build/lib.linux-x86_64-3.8/apex/amp/lists/torch_overrides.py -> build/bdist.linux-x86_64/wheel/apex/amp/lists
copying build/lib.linux-x86_64-3.8/apex/amp/lists/__init__.py -> build/bdist.linux-x86_64/wheel/apex/amp/lists
copying build/lib.linux-x86_64-3.8/apex/amp/lists/functional_overrides.py -> build/bdist.linux-x86_64/wheel/apex/amp/lists
copying build/lib.linux-x86_64-3.8/apex/amp/lists/tensor_overrides.py -> build/bdist.linux-x86_64/wheel/apex/amp/lists
copying build/lib.linux-x86_64-3.8/apex/amp/wrap.py -> build/bdist.linux-x86_64/wheel/apex/amp
creating build/bdist.linux-x86_64/wheel/apex/reparameterization
copying build/lib.linux-x86_64-3.8/apex/reparameterization/reparameterization.py -> build/bdist.linux-x86_64/wheel/apex/reparameterization
copying build/lib.linux-x86_64-3.8/apex/reparameterization/__init__.py -> build/bdist.linux-x86_64/wheel/apex/reparameterization
copying build/lib.linux-x86_64-3.8/apex/reparameterization/weight_norm.py -> build/bdist.linux-x86_64/wheel/apex/reparameterization
creating build/bdist.linux-x86_64/wheel/apex/fp16_utils
copying build/lib.linux-x86_64-3.8/apex/fp16_utils/__init__.py -> build/bdist.linux-x86_64/wheel/apex/fp16_utils
copying build/lib.linux-x86_64-3.8/apex/fp16_utils/loss_scaler.py -> build/bdist.linux-x86_64/wheel/apex/fp16_utils
copying build/lib.linux-x86_64-3.8/apex/fp16_utils/fp16_optimizer.py -> build/bdist.linux-x86_64/wheel/apex/fp16_utils
copying build/lib.linux-x86_64-3.8/apex/fp16_utils/fp16util.py -> build/bdist.linux-x86_64/wheel/apex/fp16_utils
copying build/lib.linux-x86_64-3.8/apex_C.cpython-38-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/wheel
running install_egg_info
running egg_info
creating apex.egg-info
writing apex.egg-info/PKG-INFO
writing dependency_links to apex.egg-info/dependency_links.txt
writing top-level names to apex.egg-info/top_level.txt
writing manifest file 'apex.egg-info/SOURCES.txt'
'license_file' option was not specified
reading manifest file 'apex.egg-info/SOURCES.txt'
writing manifest file 'apex.egg-info/SOURCES.txt'
Copying apex.egg-info to build/bdist.linux-x86_64/wheel/apex-0.1.egg-info
Copying dependency_links.txt to build/bdist.linux-x86_64/wheel/apex-0.1.egg-info/dependency_links.txt
Copying PKG-INFO to build/bdist.linux-x86_64/wheel/apex-0.1.egg-info/PKG-INFO
Copying top_level.txt to build/bdist.linux-x86_64/wheel/apex-0.1.egg-info/top_level.txt
Copying SOURCES.txt to build/bdist.linux-x86_64/wheel/apex-0.1.egg-info/SOURCES.txt
running install_scripts
adding license file "LICENSE" (matched pattern "LICEN[CS]E*")
creating build/bdist.linux-x86_64/wheel/apex-0.1.dist-info/WHEEL
creating 'dist/apex-0.1-cp38-cp38-linux_x86_64.whl' and adding 'build/bdist.linux-x86_64/wheel' to it
adding 'amp_C.cpython-38-x86_64-linux-gnu.so'
adding 'apex_C.cpython-38-x86_64-linux-gnu.so'
adding 'fused_layer_norm_cuda.cpython-38-x86_64-linux-gnu.so'
adding 'syncbn.cpython-38-x86_64-linux-gnu.so'
adding 'apex/__init__.py'
adding 'apex/RNN/RNNBackend.py'
adding 'apex/RNN/__init__.py'
adding 'apex/RNN/cells.py'
adding 'apex/RNN/models.py'
adding 'apex/amp/__init__.py'
adding 'apex/amp/__version__.py'
adding 'apex/amp/_amp_state.py'
adding 'apex/amp/_initialize.py'
adding 'apex/amp/_process_optimizer.py'
adding 'apex/amp/amp.py'
adding 'apex/amp/compat.py'
adding 'apex/amp/frontend.py'
adding 'apex/amp/handle.py'
adding 'apex/amp/opt.py'
adding 'apex/amp/rnn_compat.py'
adding 'apex/amp/scaler.py'
adding 'apex/amp/utils.py'
adding 'apex/amp/wrap.py'
adding 'apex/amp/lists/__init__.py'
adding 'apex/amp/lists/functional_overrides.py'
adding 'apex/amp/lists/tensor_overrides.py'
adding 'apex/amp/lists/torch_overrides.py'
adding 'apex/contrib/__init__.py'
adding 'apex/contrib/groupbn/__init__.py'
adding 'apex/contrib/groupbn/batch_norm.py'
adding 'apex/contrib/optimizers/__init__.py'
adding 'apex/contrib/optimizers/fp16_optimizer.py'
adding 'apex/contrib/optimizers/fused_adam.py'
adding 'apex/contrib/optimizers/fused_sgd.py'
adding 'apex/contrib/xentropy/__init__.py'
adding 'apex/contrib/xentropy/softmax_xentropy.py'
adding 'apex/fp16_utils/__init__.py'
adding 'apex/fp16_utils/fp16_optimizer.py'
adding 'apex/fp16_utils/fp16util.py'
adding 'apex/fp16_utils/loss_scaler.py'
adding 'apex/multi_tensor_apply/__init__.py'
adding 'apex/multi_tensor_apply/multi_tensor_apply.py'
adding 'apex/normalization/__init__.py'
adding 'apex/normalization/fused_layer_norm.py'
adding 'apex/optimizers/__init__.py'
adding 'apex/optimizers/fused_adam.py'
adding 'apex/optimizers/fused_lamb.py'
adding 'apex/optimizers/fused_novograd.py'
adding 'apex/optimizers/fused_sgd.py'
adding 'apex/parallel/LARC.py'
adding 'apex/parallel/__init__.py'
adding 'apex/parallel/distributed.py'
adding 'apex/parallel/multiproc.py'
adding 'apex/parallel/optimized_sync_batchnorm.py'
adding 'apex/parallel/optimized_sync_batchnorm_kernel.py'
adding 'apex/parallel/sync_batchnorm.py'
adding 'apex/parallel/sync_batchnorm_kernel.py'
adding 'apex/pyprof/__init__.py'
adding 'apex/pyprof/nvtx/__init__.py'
adding 'apex/pyprof/nvtx/nvmarker.py'
adding 'apex/pyprof/parse/__init__.py'
adding 'apex/pyprof/parse/__main__.py'
adding 'apex/pyprof/parse/db.py'
adding 'apex/pyprof/parse/kernel.py'
adding 'apex/pyprof/parse/nvvp.py'
adding 'apex/pyprof/parse/parse.py'
adding 'apex/pyprof/prof/__init__.py'
adding 'apex/pyprof/prof/__main__.py'
adding 'apex/pyprof/prof/activation.py'
adding 'apex/pyprof/prof/base.py'
adding 'apex/pyprof/prof/blas.py'
adding 'apex/pyprof/prof/conv.py'
adding 'apex/pyprof/prof/convert.py'
adding 'apex/pyprof/prof/data.py'
adding 'apex/pyprof/prof/dropout.py'
adding 'apex/pyprof/prof/embedding.py'
adding 'apex/pyprof/prof/index_slice_join_mutate.py'
adding 'apex/pyprof/prof/linear.py'
adding 'apex/pyprof/prof/loss.py'
adding 'apex/pyprof/prof/misc.py'
adding 'apex/pyprof/prof/normalization.py'
adding 'apex/pyprof/prof/optim.py'
adding 'apex/pyprof/prof/output.py'
adding 'apex/pyprof/prof/pointwise.py'
adding 'apex/pyprof/prof/pooling.py'
adding 'apex/pyprof/prof/prof.py'
adding 'apex/pyprof/prof/randomSample.py'
adding 'apex/pyprof/prof/recurrentCell.py'
adding 'apex/pyprof/prof/reduction.py'
adding 'apex/pyprof/prof/softmax.py'
adding 'apex/pyprof/prof/usage.py'
adding 'apex/pyprof/prof/utility.py'
adding 'apex/reparameterization/__init__.py'
adding 'apex/reparameterization/reparameterization.py'
adding 'apex/reparameterization/weight_norm.py'
adding 'apex-0.1.dist-info/LICENSE'
adding 'apex-0.1.dist-info/METADATA'
adding 'apex-0.1.dist-info/WHEEL'
adding 'apex-0.1.dist-info/top_level.txt'
adding 'apex-0.1.dist-info/RECORD'
removing build/bdist.linux-x86_64/wheel
/home/user/code/DeepSpeed
Installing apex locally so that deepspeed will build
Found existing installation: apex 0.1
Uninstalling apex-0.1:
  Successfully uninstalled apex-0.1
Non-user install because user site-packages disabled
Created temporary directory: /tmp/pip-ephem-wheel-cache-9arkmem0
Created temporary directory: /tmp/pip-req-tracker-d4gfdhga
Initialized build tracking at /tmp/pip-req-tracker-d4gfdhga
Created build tracker: /tmp/pip-req-tracker-d4gfdhga
Entered build tracker: /tmp/pip-req-tracker-d4gfdhga
Created temporary directory: /tmp/pip-install-bakr9hm9
Processing ./third_party/apex/dist/apex-0.1-cp38-cp38-linux_x86_64.whl
  Added apex==0.1 from file:///home/user/code/DeepSpeed/third_party/apex/dist/apex-0.1-cp38-cp38-linux_x86_64.whl to build tracker '/tmp/pip-req-tracker-d4gfdhga'
  Removed apex==0.1 from file:///home/user/code/DeepSpeed/third_party/apex/dist/apex-0.1-cp38-cp38-linux_x86_64.whl from build tracker '/tmp/pip-req-tracker-d4gfdhga'
Installing collected packages: apex
  Created temporary directory: /tmp/pip-unpacked-wheel-t50d2vwx

Successfully installed apex-0.1
Cleaning up...
Removed build tracker: '/tmp/pip-req-tracker-d4gfdhga'
Building deepspeed wheel
./install.sh: line 196: 10132 Floating point exception(core dumped) python setup.py -v bdist_wheel
Error on line 195
Fail to install deepspeed
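"Floating point exception (core dumped)" is how bash reports a process that was terminated by SIGFPE (signal 8; bash exits with status 136, i.e. 128 + 8). So the crash happens inside the Python process running `setup.py`, not in a compiler invocation. The sketch below re-runs the failing step outside `install.sh` and reports which signal ended it. It relies on the POSIX convention that `subprocess` reports a child killed by signal N as return code -N. It assumes it is run from the root of the DeepSpeed checkout, and the helper names `describe_exit` and `run_wheel_build` are hypothetical, not part of DeepSpeed.

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a readable description.

    On POSIX systems, subprocess reports a child killed by signal N
    as the negative return code -N.
    """
    if returncode < 0:
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            # Signal number not known to this platform's signal module.
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


def run_wheel_build(cwd: str = ".") -> int:
    """Hypothetical helper: re-run the step that crashed in install.sh,
    assumed to be `python setup.py -v bdist_wheel` in the repo root."""
    proc = subprocess.run(
        [sys.executable, "setup.py", "-v", "bdist_wheel"], cwd=cwd
    )
    print(describe_exit(proc.returncode))
    return proc.returncode
```

If `run_wheel_build()` prints `killed by SIGFPE`, the crash reproduces without `install.sh`. A signal-based exit also rules out an ordinary uncaught Python exception, which would exit with status 1 and print a traceback.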

@drfinkus

drfinkus commented Sep 23, 2020

Trying with DS_BUILD_CUDA=0 gives the same result:

(env) user@desktop:~/code/DeepSpeed$ DS_BUILD_CUDA=0 ./install.sh 
Attempting to remove deepspeed/git_version_info_installed.py
Attempting to remove dist
Attempting to remove build
Attempting to remove deepspeed.egg-info
Attempting to remove third_party/apex/dist
removed 'third_party/apex/dist/apex-0.1-cp38-cp38-linux_x86_64.whl'
removed directory 'third_party/apex/dist'
Attempting to remove third_party/apex/build
removed 'third_party/apex/build/temp.linux-x86_64-3.8/csrc/multi_tensor_axpby_kernel.o'
removed 'third_party/apex/build/temp.linux-x86_64-3.8/csrc/syncbn.o'
removed 'third_party/apex/build/temp.linux-x86_64-3.8/csrc/multi_tensor_sgd_kernel.o'
removed 'third_party/apex/build/temp.linux-x86_64-3.8/csrc/multi_tensor_scale_kernel.o'
removed 'third_party/apex/build/temp.linux-x86_64-3.8/csrc/layer_norm_cuda_kernel.o'
removed 'third_party/apex/build/temp.linux-x86_64-3.8/csrc/multi_tensor_novograd.o'
removed 'third_party/apex/build/temp.linux-x86_64-3.8/csrc/multi_tensor_lamb_stage_2.o'
removed 'third_party/apex/build/temp.linux-x86_64-3.8/csrc/multi_tensor_adam.o'
removed 'third_party/apex/build/temp.linux-x86_64-3.8/csrc/welford.o'
removed 'third_party/apex/build/temp.linux-x86_64-3.8/csrc/multi_tensor_lamb_stage_1.o'
removed 'third_party/apex/build/temp.linux-x86_64-3.8/csrc/layer_norm_cuda.o'
removed 'third_party/apex/build/temp.linux-x86_64-3.8/csrc/multi_tensor_l2norm_kernel.o'
removed 'third_party/apex/build/temp.linux-x86_64-3.8/csrc/flatten_unflatten.o'
removed 'third_party/apex/build/temp.linux-x86_64-3.8/csrc/amp_C_frontend.o'
removed 'third_party/apex/build/temp.linux-x86_64-3.8/csrc/multi_tensor_lamb.o'
removed directory 'third_party/apex/build/temp.linux-x86_64-3.8/csrc'
removed directory 'third_party/apex/build/temp.linux-x86_64-3.8'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/amp_C.cpython-38-x86_64-linux-gnu.so'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/fused_layer_norm_cuda.cpython-38-x86_64-linux-gnu.so'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/syncbn.cpython-38-x86_64-linux-gnu.so'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/contrib/xentropy/__init__.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/contrib/xentropy/softmax_xentropy.py'
removed directory 'third_party/apex/build/lib.linux-x86_64-3.8/apex/contrib/xentropy'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/contrib/optimizers/__init__.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/contrib/optimizers/fp16_optimizer.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/contrib/optimizers/fused_sgd.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/contrib/optimizers/fused_adam.py'
removed directory 'third_party/apex/build/lib.linux-x86_64-3.8/apex/contrib/optimizers'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/contrib/__init__.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/contrib/groupbn/__init__.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/contrib/groupbn/batch_norm.py'
removed directory 'third_party/apex/build/lib.linux-x86_64-3.8/apex/contrib/groupbn'
removed directory 'third_party/apex/build/lib.linux-x86_64-3.8/apex/contrib'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/parallel/LARC.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/parallel/distributed.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/parallel/__init__.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/parallel/optimized_sync_batchnorm.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/parallel/sync_batchnorm.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/parallel/multiproc.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/parallel/optimized_sync_batchnorm_kernel.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/parallel/sync_batchnorm_kernel.py'
removed directory 'third_party/apex/build/lib.linux-x86_64-3.8/apex/parallel'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/multi_tensor_apply/__init__.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/multi_tensor_apply/multi_tensor_apply.py'
removed directory 'third_party/apex/build/lib.linux-x86_64-3.8/apex/multi_tensor_apply'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/normalization/fused_layer_norm.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/normalization/__init__.py'
removed directory 'third_party/apex/build/lib.linux-x86_64-3.8/apex/normalization'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/optimizers/fused_lamb.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/optimizers/fused_novograd.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/optimizers/__init__.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/optimizers/fused_sgd.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/optimizers/fused_adam.py'
removed directory 'third_party/apex/build/lib.linux-x86_64-3.8/apex/optimizers'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/pyprof/nvtx/__init__.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/pyprof/nvtx/nvmarker.py'
removed directory 'third_party/apex/build/lib.linux-x86_64-3.8/apex/pyprof/nvtx'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/pyprof/__init__.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/pyprof/parse/nvvp.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/pyprof/parse/__main__.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/pyprof/parse/parse.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/pyprof/parse/__init__.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/pyprof/parse/db.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/pyprof/parse/kernel.py'
removed directory 'third_party/apex/build/lib.linux-x86_64-3.8/apex/pyprof/parse'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/pyprof/prof/loss.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/pyprof/prof/optim.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/pyprof/prof/output.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/pyprof/prof/misc.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/pyprof/prof/__main__.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/pyprof/prof/usage.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/pyprof/prof/index_slice_join_mutate.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/pyprof/prof/pointwise.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/pyprof/prof/blas.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/pyprof/prof/embedding.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/pyprof/prof/reduction.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/pyprof/prof/prof.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/pyprof/prof/utility.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/pyprof/prof/__init__.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/pyprof/prof/softmax.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/pyprof/prof/recurrentCell.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/pyprof/prof/randomSample.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/pyprof/prof/conv.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/pyprof/prof/data.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/pyprof/prof/pooling.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/pyprof/prof/activation.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/pyprof/prof/linear.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/pyprof/prof/normalization.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/pyprof/prof/base.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/pyprof/prof/dropout.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/pyprof/prof/convert.py'
removed directory 'third_party/apex/build/lib.linux-x86_64-3.8/apex/pyprof/prof'
removed directory 'third_party/apex/build/lib.linux-x86_64-3.8/apex/pyprof'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/__init__.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/RNN/cells.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/RNN/__init__.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/RNN/RNNBackend.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/RNN/models.py'
removed directory 'third_party/apex/build/lib.linux-x86_64-3.8/apex/RNN'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/amp/utils.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/amp/rnn_compat.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/amp/frontend.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/amp/_initialize.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/amp/amp.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/amp/_process_optimizer.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/amp/opt.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/amp/__init__.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/amp/handle.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/amp/scaler.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/amp/__version__.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/amp/compat.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/amp/_amp_state.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/amp/lists/torch_overrides.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/amp/lists/__init__.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/amp/lists/functional_overrides.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/amp/lists/tensor_overrides.py'
removed directory 'third_party/apex/build/lib.linux-x86_64-3.8/apex/amp/lists'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/amp/wrap.py'
removed directory 'third_party/apex/build/lib.linux-x86_64-3.8/apex/amp'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/reparameterization/reparameterization.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/reparameterization/__init__.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/reparameterization/weight_norm.py'
removed directory 'third_party/apex/build/lib.linux-x86_64-3.8/apex/reparameterization'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/fp16_utils/__init__.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/fp16_utils/loss_scaler.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/fp16_utils/fp16_optimizer.py'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex/fp16_utils/fp16util.py'
removed directory 'third_party/apex/build/lib.linux-x86_64-3.8/apex/fp16_utils'
removed directory 'third_party/apex/build/lib.linux-x86_64-3.8/apex'
removed 'third_party/apex/build/lib.linux-x86_64-3.8/apex_C.cpython-38-x86_64-linux-gnu.so'
removed directory 'third_party/apex/build/lib.linux-x86_64-3.8'
removed directory 'third_party/apex/build/bdist.linux-x86_64'
removed directory 'third_party/apex/build'
Attempting to remove third_party/apex/apex.egg-info
removed 'third_party/apex/apex.egg-info/dependency_links.txt'
removed 'third_party/apex/apex.egg-info/PKG-INFO'
removed 'third_party/apex/apex.egg-info/top_level.txt'
removed 'third_party/apex/apex.egg-info/SOURCES.txt'
removed directory 'third_party/apex/apex.egg-info'
No hostfile exists at /job/hostfile, installing locally
Non-user install because user site-packages disabled
Created temporary directory: /tmp/pip-ephem-wheel-cache-5ffypsgg
Created temporary directory: /tmp/pip-req-tracker-ncb3y77e
Initialized build tracking at /tmp/pip-req-tracker-ncb3y77e
Created build tracker: /tmp/pip-req-tracker-ncb3y77e
Entered build tracker: /tmp/pip-req-tracker-ncb3y77e
Created temporary directory: /tmp/pip-install-zjniy548
Requirement already satisfied: torch>=1.2 in ./env/lib/python3.8/site-packages (from -r requirements/requirements.txt (line 1)) (1.6.0)
Requirement already satisfied: torchvision>=0.4.0 in ./env/lib/python3.8/site-packages (from -r requirements/requirements.txt (line 2)) (0.7.0)
Requirement already satisfied: tqdm in ./env/lib/python3.8/site-packages (from -r requirements/requirements.txt (line 3)) (4.49.0)
Requirement already satisfied: psutil in ./env/lib/python3.8/site-packages (from -r requirements/requirements.txt (line 4)) (5.7.2)
Requirement already satisfied: cpufeature in ./env/lib/python3.8/site-packages (from -r requirements/requirements.txt (line 5)) (0.1.1)
Requirement already satisfied: tensorboardX==1.8 in ./env/lib/python3.8/site-packages (from -r requirements/requirements.txt (line 6)) (1.8)
Requirement already satisfied: numpy in ./env/lib/python3.8/site-packages (from torch>=1.2->-r requirements/requirements.txt (line 1)) (1.19.2)
Requirement already satisfied: future in ./env/lib/python3.8/site-packages (from torch>=1.2->-r requirements/requirements.txt (line 1)) (0.18.2)
Requirement already satisfied: pillow>=4.1.1 in ./env/lib/python3.8/site-packages (from torchvision>=0.4.0->-r requirements/requirements.txt (line 2)) (7.2.0)
Requirement already satisfied: protobuf>=3.2.0 in ./env/lib/python3.8/site-packages (from tensorboardX==1.8->-r requirements/requirements.txt (line 6)) (3.13.0)
Requirement already satisfied: six in ./env/lib/python3.8/site-packages (from tensorboardX==1.8->-r requirements/requirements.txt (line 6)) (1.15.0)
Requirement already satisfied: setuptools in ./env/lib/python3.8/site-packages (from protobuf>=3.2.0->tensorboardX==1.8->-r requirements/requirements.txt (line 6)) (44.0.0)
Cleaning up...
Removed build tracker: '/tmp/pip-req-tracker-ncb3y77e'
Checking out sub-module(s)
Building apex wheel
torch.__version__  =  1.6.0
setup.py:43: UserWarning: Option --pyprof not specified. Not installing PyProf dependencies!
  warnings.warn("Option --pyprof not specified. Not installing PyProf dependencies!")

Compiling cuda extensions with
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
from /usr/local/cuda/bin

running bdist_wheel
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/utils/cpp_extension.py:335: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
  warnings.warn(msg.format('we could not find ninja.'))
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.8
creating build/lib.linux-x86_64-3.8/apex
copying apex/__init__.py -> build/lib.linux-x86_64-3.8/apex
creating build/lib.linux-x86_64-3.8/apex/contrib
copying apex/contrib/__init__.py -> build/lib.linux-x86_64-3.8/apex/contrib
creating build/lib.linux-x86_64-3.8/apex/parallel
copying apex/parallel/LARC.py -> build/lib.linux-x86_64-3.8/apex/parallel
copying apex/parallel/distributed.py -> build/lib.linux-x86_64-3.8/apex/parallel
copying apex/parallel/__init__.py -> build/lib.linux-x86_64-3.8/apex/parallel
copying apex/parallel/optimized_sync_batchnorm.py -> build/lib.linux-x86_64-3.8/apex/parallel
copying apex/parallel/sync_batchnorm.py -> build/lib.linux-x86_64-3.8/apex/parallel
copying apex/parallel/multiproc.py -> build/lib.linux-x86_64-3.8/apex/parallel
copying apex/parallel/optimized_sync_batchnorm_kernel.py -> build/lib.linux-x86_64-3.8/apex/parallel
copying apex/parallel/sync_batchnorm_kernel.py -> build/lib.linux-x86_64-3.8/apex/parallel
creating build/lib.linux-x86_64-3.8/apex/multi_tensor_apply
copying apex/multi_tensor_apply/__init__.py -> build/lib.linux-x86_64-3.8/apex/multi_tensor_apply
copying apex/multi_tensor_apply/multi_tensor_apply.py -> build/lib.linux-x86_64-3.8/apex/multi_tensor_apply
creating build/lib.linux-x86_64-3.8/apex/normalization
copying apex/normalization/fused_layer_norm.py -> build/lib.linux-x86_64-3.8/apex/normalization
copying apex/normalization/__init__.py -> build/lib.linux-x86_64-3.8/apex/normalization
creating build/lib.linux-x86_64-3.8/apex/optimizers
copying apex/optimizers/fused_lamb.py -> build/lib.linux-x86_64-3.8/apex/optimizers
copying apex/optimizers/fused_novograd.py -> build/lib.linux-x86_64-3.8/apex/optimizers
copying apex/optimizers/__init__.py -> build/lib.linux-x86_64-3.8/apex/optimizers
copying apex/optimizers/fused_sgd.py -> build/lib.linux-x86_64-3.8/apex/optimizers
copying apex/optimizers/fused_adam.py -> build/lib.linux-x86_64-3.8/apex/optimizers
creating build/lib.linux-x86_64-3.8/apex/pyprof
copying apex/pyprof/__init__.py -> build/lib.linux-x86_64-3.8/apex/pyprof
creating build/lib.linux-x86_64-3.8/apex/RNN
copying apex/RNN/cells.py -> build/lib.linux-x86_64-3.8/apex/RNN
copying apex/RNN/__init__.py -> build/lib.linux-x86_64-3.8/apex/RNN
copying apex/RNN/RNNBackend.py -> build/lib.linux-x86_64-3.8/apex/RNN
copying apex/RNN/models.py -> build/lib.linux-x86_64-3.8/apex/RNN
creating build/lib.linux-x86_64-3.8/apex/amp
copying apex/amp/utils.py -> build/lib.linux-x86_64-3.8/apex/amp
copying apex/amp/rnn_compat.py -> build/lib.linux-x86_64-3.8/apex/amp
copying apex/amp/frontend.py -> build/lib.linux-x86_64-3.8/apex/amp
copying apex/amp/_initialize.py -> build/lib.linux-x86_64-3.8/apex/amp
copying apex/amp/amp.py -> build/lib.linux-x86_64-3.8/apex/amp
copying apex/amp/_process_optimizer.py -> build/lib.linux-x86_64-3.8/apex/amp
copying apex/amp/opt.py -> build/lib.linux-x86_64-3.8/apex/amp
copying apex/amp/__init__.py -> build/lib.linux-x86_64-3.8/apex/amp
copying apex/amp/handle.py -> build/lib.linux-x86_64-3.8/apex/amp
copying apex/amp/scaler.py -> build/lib.linux-x86_64-3.8/apex/amp
copying apex/amp/__version__.py -> build/lib.linux-x86_64-3.8/apex/amp
copying apex/amp/compat.py -> build/lib.linux-x86_64-3.8/apex/amp
copying apex/amp/_amp_state.py -> build/lib.linux-x86_64-3.8/apex/amp
copying apex/amp/wrap.py -> build/lib.linux-x86_64-3.8/apex/amp
creating build/lib.linux-x86_64-3.8/apex/reparameterization
copying apex/reparameterization/reparameterization.py -> build/lib.linux-x86_64-3.8/apex/reparameterization
copying apex/reparameterization/__init__.py -> build/lib.linux-x86_64-3.8/apex/reparameterization
copying apex/reparameterization/weight_norm.py -> build/lib.linux-x86_64-3.8/apex/reparameterization
creating build/lib.linux-x86_64-3.8/apex/fp16_utils
copying apex/fp16_utils/__init__.py -> build/lib.linux-x86_64-3.8/apex/fp16_utils
copying apex/fp16_utils/loss_scaler.py -> build/lib.linux-x86_64-3.8/apex/fp16_utils
copying apex/fp16_utils/fp16_optimizer.py -> build/lib.linux-x86_64-3.8/apex/fp16_utils
copying apex/fp16_utils/fp16util.py -> build/lib.linux-x86_64-3.8/apex/fp16_utils
creating build/lib.linux-x86_64-3.8/apex/contrib/xentropy
copying apex/contrib/xentropy/__init__.py -> build/lib.linux-x86_64-3.8/apex/contrib/xentropy
copying apex/contrib/xentropy/softmax_xentropy.py -> build/lib.linux-x86_64-3.8/apex/contrib/xentropy
creating build/lib.linux-x86_64-3.8/apex/contrib/optimizers
copying apex/contrib/optimizers/__init__.py -> build/lib.linux-x86_64-3.8/apex/contrib/optimizers
copying apex/contrib/optimizers/fp16_optimizer.py -> build/lib.linux-x86_64-3.8/apex/contrib/optimizers
copying apex/contrib/optimizers/fused_sgd.py -> build/lib.linux-x86_64-3.8/apex/contrib/optimizers
copying apex/contrib/optimizers/fused_adam.py -> build/lib.linux-x86_64-3.8/apex/contrib/optimizers
creating build/lib.linux-x86_64-3.8/apex/contrib/groupbn
copying apex/contrib/groupbn/__init__.py -> build/lib.linux-x86_64-3.8/apex/contrib/groupbn
copying apex/contrib/groupbn/batch_norm.py -> build/lib.linux-x86_64-3.8/apex/contrib/groupbn
creating build/lib.linux-x86_64-3.8/apex/pyprof/nvtx
copying apex/pyprof/nvtx/__init__.py -> build/lib.linux-x86_64-3.8/apex/pyprof/nvtx
copying apex/pyprof/nvtx/nvmarker.py -> build/lib.linux-x86_64-3.8/apex/pyprof/nvtx
creating build/lib.linux-x86_64-3.8/apex/pyprof/parse
copying apex/pyprof/parse/nvvp.py -> build/lib.linux-x86_64-3.8/apex/pyprof/parse
copying apex/pyprof/parse/__main__.py -> build/lib.linux-x86_64-3.8/apex/pyprof/parse
copying apex/pyprof/parse/parse.py -> build/lib.linux-x86_64-3.8/apex/pyprof/parse
copying apex/pyprof/parse/__init__.py -> build/lib.linux-x86_64-3.8/apex/pyprof/parse
copying apex/pyprof/parse/db.py -> build/lib.linux-x86_64-3.8/apex/pyprof/parse
copying apex/pyprof/parse/kernel.py -> build/lib.linux-x86_64-3.8/apex/pyprof/parse
creating build/lib.linux-x86_64-3.8/apex/pyprof/prof
copying apex/pyprof/prof/loss.py -> build/lib.linux-x86_64-3.8/apex/pyprof/prof
copying apex/pyprof/prof/optim.py -> build/lib.linux-x86_64-3.8/apex/pyprof/prof
copying apex/pyprof/prof/output.py -> build/lib.linux-x86_64-3.8/apex/pyprof/prof
copying apex/pyprof/prof/misc.py -> build/lib.linux-x86_64-3.8/apex/pyprof/prof
copying apex/pyprof/prof/__main__.py -> build/lib.linux-x86_64-3.8/apex/pyprof/prof
copying apex/pyprof/prof/usage.py -> build/lib.linux-x86_64-3.8/apex/pyprof/prof
copying apex/pyprof/prof/index_slice_join_mutate.py -> build/lib.linux-x86_64-3.8/apex/pyprof/prof
copying apex/pyprof/prof/pointwise.py -> build/lib.linux-x86_64-3.8/apex/pyprof/prof
copying apex/pyprof/prof/blas.py -> build/lib.linux-x86_64-3.8/apex/pyprof/prof
copying apex/pyprof/prof/embedding.py -> build/lib.linux-x86_64-3.8/apex/pyprof/prof
copying apex/pyprof/prof/reduction.py -> build/lib.linux-x86_64-3.8/apex/pyprof/prof
copying apex/pyprof/prof/prof.py -> build/lib.linux-x86_64-3.8/apex/pyprof/prof
copying apex/pyprof/prof/utility.py -> build/lib.linux-x86_64-3.8/apex/pyprof/prof
copying apex/pyprof/prof/__init__.py -> build/lib.linux-x86_64-3.8/apex/pyprof/prof
copying apex/pyprof/prof/softmax.py -> build/lib.linux-x86_64-3.8/apex/pyprof/prof
copying apex/pyprof/prof/recurrentCell.py -> build/lib.linux-x86_64-3.8/apex/pyprof/prof
copying apex/pyprof/prof/randomSample.py -> build/lib.linux-x86_64-3.8/apex/pyprof/prof
copying apex/pyprof/prof/conv.py -> build/lib.linux-x86_64-3.8/apex/pyprof/prof
copying apex/pyprof/prof/data.py -> build/lib.linux-x86_64-3.8/apex/pyprof/prof
copying apex/pyprof/prof/pooling.py -> build/lib.linux-x86_64-3.8/apex/pyprof/prof
copying apex/pyprof/prof/activation.py -> build/lib.linux-x86_64-3.8/apex/pyprof/prof
copying apex/pyprof/prof/linear.py -> build/lib.linux-x86_64-3.8/apex/pyprof/prof
copying apex/pyprof/prof/normalization.py -> build/lib.linux-x86_64-3.8/apex/pyprof/prof
copying apex/pyprof/prof/base.py -> build/lib.linux-x86_64-3.8/apex/pyprof/prof
copying apex/pyprof/prof/dropout.py -> build/lib.linux-x86_64-3.8/apex/pyprof/prof
copying apex/pyprof/prof/convert.py -> build/lib.linux-x86_64-3.8/apex/pyprof/prof
creating build/lib.linux-x86_64-3.8/apex/amp/lists
copying apex/amp/lists/torch_overrides.py -> build/lib.linux-x86_64-3.8/apex/amp/lists
copying apex/amp/lists/__init__.py -> build/lib.linux-x86_64-3.8/apex/amp/lists
copying apex/amp/lists/functional_overrides.py -> build/lib.linux-x86_64-3.8/apex/amp/lists
copying apex/amp/lists/tensor_overrides.py -> build/lib.linux-x86_64-3.8/apex/amp/lists
running build_ext
building 'apex_C' extension
creating build/temp.linux-x86_64-3.8
creating build/temp.linux-x86_64-3.8/csrc
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/TH -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/THC -I/home/user/code/DeepSpeed/env/include -I/usr/include/python3.8 -c csrc/flatten_unflatten.cpp -o build/temp.linux-x86_64-3.8/csrc/flatten_unflatten.o -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=apex_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Parallel.h:149,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/utils.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:5,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:7,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/flatten_unflatten.cpp:1:
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ParallelOpenMP.h:84: warning: ignoring #pragma omp parallel [-Wunknown-pragmas]
   84 | #pragma omp parallel for if ((end - begin) >= grain_size)
      | 
In file included from csrc/flatten_unflatten.cpp:2:
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/utils/tensor_flatten.h: In member function ‘at::DeprecatedTypeProperties& torch::utils::TensorGroup::type()’:
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/utils/tensor_flatten.h:36:28: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
   36 |     return tensors[0].type();
      |                            ^
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Tensor.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Context.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:5,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/flatten_unflatten.cpp:1:
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:268:30: note: declared here
  268 |   DeprecatedTypeProperties & type() const {
      |                              ^~~~
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.8/csrc/flatten_unflatten.o -L/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -o build/lib.linux-x86_64-3.8/apex_C.cpython-38-x86_64-linux-gnu.so
building 'amp_C' extension
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/TH -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/user/code/DeepSpeed/env/include -I/usr/include/python3.8 -c csrc/amp_C_frontend.cpp -o build/temp.linux-x86_64-3.8/csrc/amp_C_frontend.o -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Parallel.h:149,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/utils.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:5,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:7,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/amp_C_frontend.cpp:1:
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ParallelOpenMP.h:84: warning: ignoring #pragma omp parallel [-Wunknown-pragmas]
   84 | #pragma omp parallel for if ((end - begin) >= grain_size)
      | 
/usr/local/cuda/bin/nvcc -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/TH -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/user/code/DeepSpeed/env/include -I/usr/include/python3.8 -c csrc/multi_tensor_sgd_kernel.cu -o build/temp.linux-x86_64-3.8/csrc/multi_tensor_sgd_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=sm_61 -std=c++14
/usr/local/cuda/bin/nvcc -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/TH -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/user/code/DeepSpeed/env/include -I/usr/include/python3.8 -c csrc/multi_tensor_scale_kernel.cu -o build/temp.linux-x86_64-3.8/csrc/multi_tensor_scale_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=sm_61 -std=c++14
/usr/local/cuda/bin/nvcc -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/TH -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/user/code/DeepSpeed/env/include -I/usr/include/python3.8 -c csrc/multi_tensor_axpby_kernel.cu -o build/temp.linux-x86_64-3.8/csrc/multi_tensor_axpby_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=sm_61 -std=c++14
/usr/local/cuda/bin/nvcc -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/TH -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/user/code/DeepSpeed/env/include -I/usr/include/python3.8 -c csrc/multi_tensor_l2norm_kernel.cu -o build/temp.linux-x86_64-3.8/csrc/multi_tensor_l2norm_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=sm_61 -std=c++14
/usr/local/cuda/bin/nvcc -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/TH -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/user/code/DeepSpeed/env/include -I/usr/include/python3.8 -c csrc/multi_tensor_lamb_stage_1.cu -o build/temp.linux-x86_64-3.8/csrc/multi_tensor_lamb_stage_1.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=sm_61 -std=c++14
/usr/local/cuda/bin/nvcc -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/TH -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/user/code/DeepSpeed/env/include -I/usr/include/python3.8 -c csrc/multi_tensor_lamb_stage_2.cu -o build/temp.linux-x86_64-3.8/csrc/multi_tensor_lamb_stage_2.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=sm_61 -std=c++14
/usr/local/cuda/bin/nvcc -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/TH -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/user/code/DeepSpeed/env/include -I/usr/include/python3.8 -c csrc/multi_tensor_adam.cu -o build/temp.linux-x86_64-3.8/csrc/multi_tensor_adam.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=sm_61 -std=c++14
/usr/local/cuda/bin/nvcc -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/TH -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/user/code/DeepSpeed/env/include -I/usr/include/python3.8 -c csrc/multi_tensor_novograd.cu -o build/temp.linux-x86_64-3.8/csrc/multi_tensor_novograd.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=sm_61 -std=c++14
/usr/local/cuda/bin/nvcc -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/TH -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/user/code/DeepSpeed/env/include -I/usr/include/python3.8 -c csrc/multi_tensor_lamb.cu -o build/temp.linux-x86_64-3.8/csrc/multi_tensor_lamb.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=sm_61 -std=c++14
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.8/csrc/amp_C_frontend.o build/temp.linux-x86_64-3.8/csrc/multi_tensor_sgd_kernel.o build/temp.linux-x86_64-3.8/csrc/multi_tensor_scale_kernel.o build/temp.linux-x86_64-3.8/csrc/multi_tensor_axpby_kernel.o build/temp.linux-x86_64-3.8/csrc/multi_tensor_l2norm_kernel.o build/temp.linux-x86_64-3.8/csrc/multi_tensor_lamb_stage_1.o build/temp.linux-x86_64-3.8/csrc/multi_tensor_lamb_stage_2.o build/temp.linux-x86_64-3.8/csrc/multi_tensor_adam.o build/temp.linux-x86_64-3.8/csrc/multi_tensor_novograd.o build/temp.linux-x86_64-3.8/csrc/multi_tensor_lamb.o -L/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda -o build/lib.linux-x86_64-3.8/amp_C.cpython-38-x86_64-linux-gnu.so
building 'syncbn' extension
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/TH -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/user/code/DeepSpeed/env/include -I/usr/include/python3.8 -c csrc/syncbn.cpp -o build/temp.linux-x86_64-3.8/csrc/syncbn.o -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=syncbn -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Parallel.h:149,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/utils.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:5,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:7,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/syncbn.cpp:1:
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ParallelOpenMP.h:84: warning: ignoring #pragma omp parallel [-Wunknown-pragmas]
   84 | #pragma omp parallel for if ((end - begin) >= grain_size)
      | 
/usr/local/cuda/bin/nvcc -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/TH -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/user/code/DeepSpeed/env/include -I/usr/include/python3.8 -c csrc/welford.cu -o build/temp.linux-x86_64-3.8/csrc/welford.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=syncbn -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=sm_61 -std=c++14
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.8/csrc/syncbn.o build/temp.linux-x86_64-3.8/csrc/welford.o -L/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda -o build/lib.linux-x86_64-3.8/syncbn.cpython-38-x86_64-linux-gnu.so
building 'fused_layer_norm_cuda' extension
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/TH -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/user/code/DeepSpeed/env/include -I/usr/include/python3.8 -c csrc/layer_norm_cuda.cpp -o build/temp.linux-x86_64-3.8/csrc/layer_norm_cuda.o -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=fused_layer_norm_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Parallel.h:149,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/utils.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:5,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:7,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ParallelOpenMP.h:84: warning: ignoring #pragma omp parallel [-Wunknown-pragmas]
   84 | #pragma omp parallel for if ((end - begin) >= grain_size)
      | 
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/DeviceType.h:8,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Device.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Allocator.h:6,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
csrc/layer_norm_cuda.cpp: In function ‘std::vector<at::Tensor> layer_norm(at::Tensor, c10::IntArrayRef, double)’:
csrc/layer_norm_cuda.cpp:117:42: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                                          ^
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/macros/Macros.h:146:65: note: in definition of macro ‘C10_UNLIKELY’
  146 | #define C10_UNLIKELY(expr)  (__builtin_expect(static_cast<bool>(expr), 0))
      |                                                                 ^~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:330:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
  330 |   if (C10_UNLIKELY_OR_CONST(!(cond))) {                               \
      |       ^~~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:318:3: note: in expansion of macro ‘TORCH_CHECK_WITH_MSG’
  318 |   TORCH_CHECK_WITH_MSG(error_t, cond, "", __VA_ARGS__)
      |   ^~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:341:32: note: in expansion of macro ‘TORCH_CHECK_WITH’
  341 | #define TORCH_CHECK(cond, ...) TORCH_CHECK_WITH(Error, cond, __VA_ARGS__)
      |                                ^~~~~~~~~~~~~~~~
csrc/layer_norm_cuda.cpp:117:23: note: in expansion of macro ‘TORCH_CHECK’
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                       ^~~~~~~~~~~
csrc/layer_norm_cuda.cpp:119:24: note: in expansion of macro ‘CHECK_CUDA’
  119 | #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
      |                        ^~~~~~~~~~
csrc/layer_norm_cuda.cpp:129:3: note: in expansion of macro ‘CHECK_INPUT’
  129 |   CHECK_INPUT(input);
      |   ^~~~~~~~~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Tensor.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Context.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:5,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:268:30: note: declared here
  268 |   DeprecatedTypeProperties & type() const {
      |                              ^~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/DeviceType.h:8,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Device.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Allocator.h:6,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
csrc/layer_norm_cuda.cpp: In function ‘std::vector<at::Tensor> layer_norm_affine(at::Tensor, c10::IntArrayRef, at::Tensor, at::Tensor, double)’:
csrc/layer_norm_cuda.cpp:117:42: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                                          ^
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/macros/Macros.h:146:65: note: in definition of macro ‘C10_UNLIKELY’
  146 | #define C10_UNLIKELY(expr)  (__builtin_expect(static_cast<bool>(expr), 0))
      |                                                                 ^~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:330:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
  330 |   if (C10_UNLIKELY_OR_CONST(!(cond))) {                               \
      |       ^~~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:318:3: note: in expansion of macro ‘TORCH_CHECK_WITH_MSG’
  318 |   TORCH_CHECK_WITH_MSG(error_t, cond, "", __VA_ARGS__)
      |   ^~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:341:32: note: in expansion of macro ‘TORCH_CHECK_WITH’
  341 | #define TORCH_CHECK(cond, ...) TORCH_CHECK_WITH(Error, cond, __VA_ARGS__)
      |                                ^~~~~~~~~~~~~~~~
csrc/layer_norm_cuda.cpp:117:23: note: in expansion of macro ‘TORCH_CHECK’
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                       ^~~~~~~~~~~
csrc/layer_norm_cuda.cpp:119:24: note: in expansion of macro ‘CHECK_CUDA’
  119 | #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
      |                        ^~~~~~~~~~
csrc/layer_norm_cuda.cpp:149:3: note: in expansion of macro ‘CHECK_INPUT’
  149 |   CHECK_INPUT(input);
      |   ^~~~~~~~~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Tensor.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Context.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:5,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:268:30: note: declared here
  268 |   DeprecatedTypeProperties & type() const {
      |                              ^~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/DeviceType.h:8,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Device.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Allocator.h:6,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
csrc/layer_norm_cuda.cpp:117:42: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                                          ^
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/macros/Macros.h:146:65: note: in definition of macro ‘C10_UNLIKELY’
  146 | #define C10_UNLIKELY(expr)  (__builtin_expect(static_cast<bool>(expr), 0))
      |                                                                 ^~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:330:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
  330 |   if (C10_UNLIKELY_OR_CONST(!(cond))) {                               \
      |       ^~~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:318:3: note: in expansion of macro ‘TORCH_CHECK_WITH_MSG’
  318 |   TORCH_CHECK_WITH_MSG(error_t, cond, "", __VA_ARGS__)
      |   ^~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:341:32: note: in expansion of macro ‘TORCH_CHECK_WITH’
  341 | #define TORCH_CHECK(cond, ...) TORCH_CHECK_WITH(Error, cond, __VA_ARGS__)
      |                                ^~~~~~~~~~~~~~~~
csrc/layer_norm_cuda.cpp:117:23: note: in expansion of macro ‘TORCH_CHECK’
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                       ^~~~~~~~~~~
csrc/layer_norm_cuda.cpp:119:24: note: in expansion of macro ‘CHECK_CUDA’
  119 | #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
      |                        ^~~~~~~~~~
csrc/layer_norm_cuda.cpp:150:3: note: in expansion of macro ‘CHECK_INPUT’
  150 |   CHECK_INPUT(gamma);
      |   ^~~~~~~~~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Tensor.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Context.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:5,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:268:30: note: declared here
  268 |   DeprecatedTypeProperties & type() const {
      |                              ^~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/DeviceType.h:8,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Device.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Allocator.h:6,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
csrc/layer_norm_cuda.cpp:117:42: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                                          ^
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/macros/Macros.h:146:65: note: in definition of macro ‘C10_UNLIKELY’
  146 | #define C10_UNLIKELY(expr)  (__builtin_expect(static_cast<bool>(expr), 0))
      |                                                                 ^~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:330:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
  330 |   if (C10_UNLIKELY_OR_CONST(!(cond))) {                               \
      |       ^~~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:318:3: note: in expansion of macro ‘TORCH_CHECK_WITH_MSG’
  318 |   TORCH_CHECK_WITH_MSG(error_t, cond, "", __VA_ARGS__)
      |   ^~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:341:32: note: in expansion of macro ‘TORCH_CHECK_WITH’
  341 | #define TORCH_CHECK(cond, ...) TORCH_CHECK_WITH(Error, cond, __VA_ARGS__)
      |                                ^~~~~~~~~~~~~~~~
csrc/layer_norm_cuda.cpp:117:23: note: in expansion of macro ‘TORCH_CHECK’
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                       ^~~~~~~~~~~
csrc/layer_norm_cuda.cpp:119:24: note: in expansion of macro ‘CHECK_CUDA’
  119 | #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
      |                        ^~~~~~~~~~
csrc/layer_norm_cuda.cpp:151:3: note: in expansion of macro ‘CHECK_INPUT’
  151 |   CHECK_INPUT(beta);
      |   ^~~~~~~~~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Tensor.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Context.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:5,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:268:30: note: declared here
  268 |   DeprecatedTypeProperties & type() const {
      |                              ^~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/DeviceType.h:8,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Device.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Allocator.h:6,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
csrc/layer_norm_cuda.cpp: In function ‘at::Tensor layer_norm_gradient(at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::IntArrayRef, double)’:
csrc/layer_norm_cuda.cpp:117:42: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                                          ^
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/macros/Macros.h:146:65: note: in definition of macro ‘C10_UNLIKELY’
  146 | #define C10_UNLIKELY(expr)  (__builtin_expect(static_cast<bool>(expr), 0))
      |                                                                 ^~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:330:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
  330 |   if (C10_UNLIKELY_OR_CONST(!(cond))) {                               \
      |       ^~~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:318:3: note: in expansion of macro ‘TORCH_CHECK_WITH_MSG’
  318 |   TORCH_CHECK_WITH_MSG(error_t, cond, "", __VA_ARGS__)
      |   ^~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:341:32: note: in expansion of macro ‘TORCH_CHECK_WITH’
  341 | #define TORCH_CHECK(cond, ...) TORCH_CHECK_WITH(Error, cond, __VA_ARGS__)
      |                                ^~~~~~~~~~~~~~~~
csrc/layer_norm_cuda.cpp:117:23: note: in expansion of macro ‘TORCH_CHECK’
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                       ^~~~~~~~~~~
csrc/layer_norm_cuda.cpp:119:24: note: in expansion of macro ‘CHECK_CUDA’
  119 | #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
      |                        ^~~~~~~~~~
csrc/layer_norm_cuda.cpp:193:3: note: in expansion of macro ‘CHECK_INPUT’
  193 |   CHECK_INPUT(dout);
      |   ^~~~~~~~~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Tensor.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Context.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:5,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:268:30: note: declared here
  268 |   DeprecatedTypeProperties & type() const {
      |                              ^~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/DeviceType.h:8,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Device.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Allocator.h:6,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
csrc/layer_norm_cuda.cpp:117:42: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                                          ^
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/macros/Macros.h:146:65: note: in definition of macro ‘C10_UNLIKELY’
  146 | #define C10_UNLIKELY(expr)  (__builtin_expect(static_cast<bool>(expr), 0))
      |                                                                 ^~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:330:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
  330 |   if (C10_UNLIKELY_OR_CONST(!(cond))) {                               \
      |       ^~~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:318:3: note: in expansion of macro ‘TORCH_CHECK_WITH_MSG’
  318 |   TORCH_CHECK_WITH_MSG(error_t, cond, "", __VA_ARGS__)
      |   ^~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:341:32: note: in expansion of macro ‘TORCH_CHECK_WITH’
  341 | #define TORCH_CHECK(cond, ...) TORCH_CHECK_WITH(Error, cond, __VA_ARGS__)
      |                                ^~~~~~~~~~~~~~~~
csrc/layer_norm_cuda.cpp:117:23: note: in expansion of macro ‘TORCH_CHECK’
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                       ^~~~~~~~~~~
csrc/layer_norm_cuda.cpp:119:24: note: in expansion of macro ‘CHECK_CUDA’
  119 | #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
      |                        ^~~~~~~~~~
csrc/layer_norm_cuda.cpp:194:3: note: in expansion of macro ‘CHECK_INPUT’
  194 |   CHECK_INPUT(mean);
      |   ^~~~~~~~~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Tensor.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Context.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:5,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:268:30: note: declared here
  268 |   DeprecatedTypeProperties & type() const {
      |                              ^~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/DeviceType.h:8,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Device.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Allocator.h:6,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
csrc/layer_norm_cuda.cpp:117:42: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                                          ^
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/macros/Macros.h:146:65: note: in definition of macro ‘C10_UNLIKELY’
  146 | #define C10_UNLIKELY(expr)  (__builtin_expect(static_cast<bool>(expr), 0))
      |                                                                 ^~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:330:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
  330 |   if (C10_UNLIKELY_OR_CONST(!(cond))) {                               \
      |       ^~~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:318:3: note: in expansion of macro ‘TORCH_CHECK_WITH_MSG’
  318 |   TORCH_CHECK_WITH_MSG(error_t, cond, "", __VA_ARGS__)
      |   ^~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:341:32: note: in expansion of macro ‘TORCH_CHECK_WITH’
  341 | #define TORCH_CHECK(cond, ...) TORCH_CHECK_WITH(Error, cond, __VA_ARGS__)
      |                                ^~~~~~~~~~~~~~~~
csrc/layer_norm_cuda.cpp:117:23: note: in expansion of macro ‘TORCH_CHECK’
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                       ^~~~~~~~~~~
csrc/layer_norm_cuda.cpp:119:24: note: in expansion of macro ‘CHECK_CUDA’
  119 | #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
      |                        ^~~~~~~~~~
csrc/layer_norm_cuda.cpp:195:3: note: in expansion of macro ‘CHECK_INPUT’
  195 |   CHECK_INPUT(invvar);
      |   ^~~~~~~~~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Tensor.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Context.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:5,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:268:30: note: declared here
  268 |   DeprecatedTypeProperties & type() const {
      |                              ^~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/DeviceType.h:8,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Device.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Allocator.h:6,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
csrc/layer_norm_cuda.cpp:117:42: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                                          ^
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/macros/Macros.h:146:65: note: in definition of macro ‘C10_UNLIKELY’
  146 | #define C10_UNLIKELY(expr)  (__builtin_expect(static_cast<bool>(expr), 0))
      |                                                                 ^~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:330:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
  330 |   if (C10_UNLIKELY_OR_CONST(!(cond))) {                               \
      |       ^~~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:318:3: note: in expansion of macro ‘TORCH_CHECK_WITH_MSG’
  318 |   TORCH_CHECK_WITH_MSG(error_t, cond, "", __VA_ARGS__)
      |   ^~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:341:32: note: in expansion of macro ‘TORCH_CHECK_WITH’
  341 | #define TORCH_CHECK(cond, ...) TORCH_CHECK_WITH(Error, cond, __VA_ARGS__)
      |                                ^~~~~~~~~~~~~~~~
csrc/layer_norm_cuda.cpp:117:23: note: in expansion of macro ‘TORCH_CHECK’
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                       ^~~~~~~~~~~
csrc/layer_norm_cuda.cpp:119:24: note: in expansion of macro ‘CHECK_CUDA’
  119 | #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
      |                        ^~~~~~~~~~
csrc/layer_norm_cuda.cpp:196:3: note: in expansion of macro ‘CHECK_INPUT’
  196 |   CHECK_INPUT(input);
      |   ^~~~~~~~~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Tensor.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Context.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:5,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:268:30: note: declared here
  268 |   DeprecatedTypeProperties & type() const {
      |                              ^~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/DeviceType.h:8,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Device.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Allocator.h:6,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
csrc/layer_norm_cuda.cpp: In function ‘std::vector<at::Tensor> layer_norm_gradient_affine(at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::IntArrayRef, at::Tensor, at::Tensor, double)’:
csrc/layer_norm_cuda.cpp:117:42: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                                          ^
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/macros/Macros.h:146:65: note: in definition of macro ‘C10_UNLIKELY’
  146 | #define C10_UNLIKELY(expr)  (__builtin_expect(static_cast<bool>(expr), 0))
      |                                                                 ^~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:330:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
  330 |   if (C10_UNLIKELY_OR_CONST(!(cond))) {                               \
      |       ^~~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:318:3: note: in expansion of macro ‘TORCH_CHECK_WITH_MSG’
  318 |   TORCH_CHECK_WITH_MSG(error_t, cond, "", __VA_ARGS__)
      |   ^~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:341:32: note: in expansion of macro ‘TORCH_CHECK_WITH’
  341 | #define TORCH_CHECK(cond, ...) TORCH_CHECK_WITH(Error, cond, __VA_ARGS__)
      |                                ^~~~~~~~~~~~~~~~
csrc/layer_norm_cuda.cpp:117:23: note: in expansion of macro ‘TORCH_CHECK’
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                       ^~~~~~~~~~~
csrc/layer_norm_cuda.cpp:119:24: note: in expansion of macro ‘CHECK_CUDA’
  119 | #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
      |                        ^~~~~~~~~~
csrc/layer_norm_cuda.cpp:218:3: note: in expansion of macro ‘CHECK_INPUT’
  218 |   CHECK_INPUT(dout);
      |   ^~~~~~~~~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Tensor.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Context.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:5,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:268:30: note: declared here
  268 |   DeprecatedTypeProperties & type() const {
      |                              ^~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/DeviceType.h:8,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Device.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Allocator.h:6,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
csrc/layer_norm_cuda.cpp:117:42: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                                          ^
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/macros/Macros.h:146:65: note: in definition of macro ‘C10_UNLIKELY’
  146 | #define C10_UNLIKELY(expr)  (__builtin_expect(static_cast<bool>(expr), 0))
      |                                                                 ^~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:330:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
  330 |   if (C10_UNLIKELY_OR_CONST(!(cond))) {                               \
      |       ^~~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:318:3: note: in expansion of macro ‘TORCH_CHECK_WITH_MSG’
  318 |   TORCH_CHECK_WITH_MSG(error_t, cond, "", __VA_ARGS__)
      |   ^~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:341:32: note: in expansion of macro ‘TORCH_CHECK_WITH’
  341 | #define TORCH_CHECK(cond, ...) TORCH_CHECK_WITH(Error, cond, __VA_ARGS__)
      |                                ^~~~~~~~~~~~~~~~
csrc/layer_norm_cuda.cpp:117:23: note: in expansion of macro ‘TORCH_CHECK’
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                       ^~~~~~~~~~~
csrc/layer_norm_cuda.cpp:119:24: note: in expansion of macro ‘CHECK_CUDA’
  119 | #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
      |                        ^~~~~~~~~~
csrc/layer_norm_cuda.cpp:219:3: note: in expansion of macro ‘CHECK_INPUT’
  219 |   CHECK_INPUT(mean);
      |   ^~~~~~~~~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Tensor.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Context.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:5,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:268:30: note: declared here
  268 |   DeprecatedTypeProperties & type() const {
      |                              ^~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/DeviceType.h:8,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Device.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Allocator.h:6,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
csrc/layer_norm_cuda.cpp:117:42: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                                          ^
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/macros/Macros.h:146:65: note: in definition of macro ‘C10_UNLIKELY’
  146 | #define C10_UNLIKELY(expr)  (__builtin_expect(static_cast<bool>(expr), 0))
      |                                                                 ^~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:330:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
  330 |   if (C10_UNLIKELY_OR_CONST(!(cond))) {                               \
      |       ^~~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:318:3: note: in expansion of macro ‘TORCH_CHECK_WITH_MSG’
  318 |   TORCH_CHECK_WITH_MSG(error_t, cond, "", __VA_ARGS__)
      |   ^~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:341:32: note: in expansion of macro ‘TORCH_CHECK_WITH’
  341 | #define TORCH_CHECK(cond, ...) TORCH_CHECK_WITH(Error, cond, __VA_ARGS__)
      |                                ^~~~~~~~~~~~~~~~
csrc/layer_norm_cuda.cpp:117:23: note: in expansion of macro ‘TORCH_CHECK’
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                       ^~~~~~~~~~~
csrc/layer_norm_cuda.cpp:119:24: note: in expansion of macro ‘CHECK_CUDA’
  119 | #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
      |                        ^~~~~~~~~~
csrc/layer_norm_cuda.cpp:220:3: note: in expansion of macro ‘CHECK_INPUT’
  220 |   CHECK_INPUT(invvar);
      |   ^~~~~~~~~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Tensor.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Context.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:5,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:268:30: note: declared here
  268 |   DeprecatedTypeProperties & type() const {
      |                              ^~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/DeviceType.h:8,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Device.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Allocator.h:6,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
csrc/layer_norm_cuda.cpp:117:42: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                                          ^
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/macros/Macros.h:146:65: note: in definition of macro ‘C10_UNLIKELY’
  146 | #define C10_UNLIKELY(expr)  (__builtin_expect(static_cast<bool>(expr), 0))
      |                                                                 ^~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:330:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
  330 |   if (C10_UNLIKELY_OR_CONST(!(cond))) {                               \
      |       ^~~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:318:3: note: in expansion of macro ‘TORCH_CHECK_WITH_MSG’
  318 |   TORCH_CHECK_WITH_MSG(error_t, cond, "", __VA_ARGS__)
      |   ^~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:341:32: note: in expansion of macro ‘TORCH_CHECK_WITH’
  341 | #define TORCH_CHECK(cond, ...) TORCH_CHECK_WITH(Error, cond, __VA_ARGS__)
      |                                ^~~~~~~~~~~~~~~~
csrc/layer_norm_cuda.cpp:117:23: note: in expansion of macro ‘TORCH_CHECK’
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                       ^~~~~~~~~~~
csrc/layer_norm_cuda.cpp:119:24: note: in expansion of macro ‘CHECK_CUDA’
  119 | #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
      |                        ^~~~~~~~~~
csrc/layer_norm_cuda.cpp:221:3: note: in expansion of macro ‘CHECK_INPUT’
  221 |   CHECK_INPUT(input);
      |   ^~~~~~~~~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Tensor.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Context.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:5,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:268:30: note: declared here
  268 |   DeprecatedTypeProperties & type() const {
      |                              ^~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/DeviceType.h:8,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Device.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Allocator.h:6,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
csrc/layer_norm_cuda.cpp:117:42: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                                          ^
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/macros/Macros.h:146:65: note: in definition of macro ‘C10_UNLIKELY’
  146 | #define C10_UNLIKELY(expr)  (__builtin_expect(static_cast<bool>(expr), 0))
      |                                                                 ^~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:330:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
  330 |   if (C10_UNLIKELY_OR_CONST(!(cond))) {                               \
      |       ^~~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:318:3: note: in expansion of macro ‘TORCH_CHECK_WITH_MSG’
  318 |   TORCH_CHECK_WITH_MSG(error_t, cond, "", __VA_ARGS__)
      |   ^~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:341:32: note: in expansion of macro ‘TORCH_CHECK_WITH’
  341 | #define TORCH_CHECK(cond, ...) TORCH_CHECK_WITH(Error, cond, __VA_ARGS__)
      |                                ^~~~~~~~~~~~~~~~
csrc/layer_norm_cuda.cpp:117:23: note: in expansion of macro ‘TORCH_CHECK’
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                       ^~~~~~~~~~~
csrc/layer_norm_cuda.cpp:119:24: note: in expansion of macro ‘CHECK_CUDA’
  119 | #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
      |                        ^~~~~~~~~~
csrc/layer_norm_cuda.cpp:222:3: note: in expansion of macro ‘CHECK_INPUT’
  222 |   CHECK_INPUT(gamma);
      |   ^~~~~~~~~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Tensor.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Context.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:5,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:268:30: note: declared here
  268 |   DeprecatedTypeProperties & type() const {
      |                              ^~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/DeviceType.h:8,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Device.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/core/Allocator.h:6,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
csrc/layer_norm_cuda.cpp:117:42: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                                          ^
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/macros/Macros.h:146:65: note: in definition of macro ‘C10_UNLIKELY’
  146 | #define C10_UNLIKELY(expr)  (__builtin_expect(static_cast<bool>(expr), 0))
      |                                                                 ^~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:330:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
  330 |   if (C10_UNLIKELY_OR_CONST(!(cond))) {                               \
      |       ^~~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:318:3: note: in expansion of macro ‘TORCH_CHECK_WITH_MSG’
  318 |   TORCH_CHECK_WITH_MSG(error_t, cond, "", __VA_ARGS__)
      |   ^~~~~~~~~~~~~~~~~~~~
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:341:32: note: in expansion of macro ‘TORCH_CHECK_WITH’
  341 | #define TORCH_CHECK(cond, ...) TORCH_CHECK_WITH(Error, cond, __VA_ARGS__)
      |                                ^~~~~~~~~~~~~~~~
csrc/layer_norm_cuda.cpp:117:23: note: in expansion of macro ‘TORCH_CHECK’
  117 | #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
      |                       ^~~~~~~~~~~
csrc/layer_norm_cuda.cpp:119:24: note: in expansion of macro ‘CHECK_CUDA’
  119 | #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
      |                        ^~~~~~~~~~
csrc/layer_norm_cuda.cpp:223:3: note: in expansion of macro ‘CHECK_INPUT’
  223 |   CHECK_INPUT(beta);
      |   ^~~~~~~~~~~
In file included from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Tensor.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/Context.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/ATen.h:5,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:268:30: note: declared here
  268 |   DeprecatedTypeProperties & type() const {
      |                              ^~~~
/usr/local/cuda/bin/nvcc -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/TH -I/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/user/code/DeepSpeed/env/include -I/usr/include/python3.8 -c csrc/layer_norm_cuda_kernel.cu -o build/temp.linux-x86_64-3.8/csrc/layer_norm_cuda_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -maxrregcount=50 -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=fused_layer_norm_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=sm_61 -std=c++14
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.8/csrc/layer_norm_cuda.o build/temp.linux-x86_64-3.8/csrc/layer_norm_cuda_kernel.o -L/home/user/code/DeepSpeed/env/lib/python3.8/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda -o build/lib.linux-x86_64-3.8/fused_layer_norm_cuda.cpython-38-x86_64-linux-gnu.so
installing to build/bdist.linux-x86_64/wheel
running install
running install_lib
creating build/bdist.linux-x86_64
creating build/bdist.linux-x86_64/wheel
copying build/lib.linux-x86_64-3.8/amp_C.cpython-38-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/wheel
copying build/lib.linux-x86_64-3.8/fused_layer_norm_cuda.cpython-38-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/wheel
copying build/lib.linux-x86_64-3.8/syncbn.cpython-38-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/wheel
creating build/bdist.linux-x86_64/wheel/apex
creating build/bdist.linux-x86_64/wheel/apex/contrib
creating build/bdist.linux-x86_64/wheel/apex/contrib/xentropy
copying build/lib.linux-x86_64-3.8/apex/contrib/xentropy/__init__.py -> build/bdist.linux-x86_64/wheel/apex/contrib/xentropy
copying build/lib.linux-x86_64-3.8/apex/contrib/xentropy/softmax_xentropy.py -> build/bdist.linux-x86_64/wheel/apex/contrib/xentropy
creating build/bdist.linux-x86_64/wheel/apex/contrib/optimizers
copying build/lib.linux-x86_64-3.8/apex/contrib/optimizers/__init__.py -> build/bdist.linux-x86_64/wheel/apex/contrib/optimizers
copying build/lib.linux-x86_64-3.8/apex/contrib/optimizers/fp16_optimizer.py -> build/bdist.linux-x86_64/wheel/apex/contrib/optimizers
copying build/lib.linux-x86_64-3.8/apex/contrib/optimizers/fused_sgd.py -> build/bdist.linux-x86_64/wheel/apex/contrib/optimizers
copying build/lib.linux-x86_64-3.8/apex/contrib/optimizers/fused_adam.py -> build/bdist.linux-x86_64/wheel/apex/contrib/optimizers
copying build/lib.linux-x86_64-3.8/apex/contrib/__init__.py -> build/bdist.linux-x86_64/wheel/apex/contrib
creating build/bdist.linux-x86_64/wheel/apex/contrib/groupbn
copying build/lib.linux-x86_64-3.8/apex/contrib/groupbn/__init__.py -> build/bdist.linux-x86_64/wheel/apex/contrib/groupbn
copying build/lib.linux-x86_64-3.8/apex/contrib/groupbn/batch_norm.py -> build/bdist.linux-x86_64/wheel/apex/contrib/groupbn
creating build/bdist.linux-x86_64/wheel/apex/parallel
copying build/lib.linux-x86_64-3.8/apex/parallel/LARC.py -> build/bdist.linux-x86_64/wheel/apex/parallel
copying build/lib.linux-x86_64-3.8/apex/parallel/distributed.py -> build/bdist.linux-x86_64/wheel/apex/parallel
copying build/lib.linux-x86_64-3.8/apex/parallel/__init__.py -> build/bdist.linux-x86_64/wheel/apex/parallel
copying build/lib.linux-x86_64-3.8/apex/parallel/optimized_sync_batchnorm.py -> build/bdist.linux-x86_64/wheel/apex/parallel
copying build/lib.linux-x86_64-3.8/apex/parallel/sync_batchnorm.py -> build/bdist.linux-x86_64/wheel/apex/parallel
copying build/lib.linux-x86_64-3.8/apex/parallel/multiproc.py -> build/bdist.linux-x86_64/wheel/apex/parallel
copying build/lib.linux-x86_64-3.8/apex/parallel/optimized_sync_batchnorm_kernel.py -> build/bdist.linux-x86_64/wheel/apex/parallel
copying build/lib.linux-x86_64-3.8/apex/parallel/sync_batchnorm_kernel.py -> build/bdist.linux-x86_64/wheel/apex/parallel
creating build/bdist.linux-x86_64/wheel/apex/multi_tensor_apply
copying build/lib.linux-x86_64-3.8/apex/multi_tensor_apply/__init__.py -> build/bdist.linux-x86_64/wheel/apex/multi_tensor_apply
copying build/lib.linux-x86_64-3.8/apex/multi_tensor_apply/multi_tensor_apply.py -> build/bdist.linux-x86_64/wheel/apex/multi_tensor_apply
creating build/bdist.linux-x86_64/wheel/apex/normalization
copying build/lib.linux-x86_64-3.8/apex/normalization/fused_layer_norm.py -> build/bdist.linux-x86_64/wheel/apex/normalization
copying build/lib.linux-x86_64-3.8/apex/normalization/__init__.py -> build/bdist.linux-x86_64/wheel/apex/normalization
creating build/bdist.linux-x86_64/wheel/apex/optimizers
copying build/lib.linux-x86_64-3.8/apex/optimizers/fused_lamb.py -> build/bdist.linux-x86_64/wheel/apex/optimizers
copying build/lib.linux-x86_64-3.8/apex/optimizers/fused_novograd.py -> build/bdist.linux-x86_64/wheel/apex/optimizers
copying build/lib.linux-x86_64-3.8/apex/optimizers/__init__.py -> build/bdist.linux-x86_64/wheel/apex/optimizers
copying build/lib.linux-x86_64-3.8/apex/optimizers/fused_sgd.py -> build/bdist.linux-x86_64/wheel/apex/optimizers
copying build/lib.linux-x86_64-3.8/apex/optimizers/fused_adam.py -> build/bdist.linux-x86_64/wheel/apex/optimizers
creating build/bdist.linux-x86_64/wheel/apex/pyprof
creating build/bdist.linux-x86_64/wheel/apex/pyprof/nvtx
copying build/lib.linux-x86_64-3.8/apex/pyprof/nvtx/__init__.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/nvtx
copying build/lib.linux-x86_64-3.8/apex/pyprof/nvtx/nvmarker.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/nvtx
copying build/lib.linux-x86_64-3.8/apex/pyprof/__init__.py -> build/bdist.linux-x86_64/wheel/apex/pyprof
creating build/bdist.linux-x86_64/wheel/apex/pyprof/parse
copying build/lib.linux-x86_64-3.8/apex/pyprof/parse/nvvp.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/parse
copying build/lib.linux-x86_64-3.8/apex/pyprof/parse/__main__.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/parse
copying build/lib.linux-x86_64-3.8/apex/pyprof/parse/parse.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/parse
copying build/lib.linux-x86_64-3.8/apex/pyprof/parse/__init__.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/parse
copying build/lib.linux-x86_64-3.8/apex/pyprof/parse/db.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/parse
copying build/lib.linux-x86_64-3.8/apex/pyprof/parse/kernel.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/parse
creating build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/pyprof/prof/loss.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/pyprof/prof/optim.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/pyprof/prof/output.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/pyprof/prof/misc.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/pyprof/prof/__main__.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/pyprof/prof/usage.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/pyprof/prof/index_slice_join_mutate.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/pyprof/prof/pointwise.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/pyprof/prof/blas.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/pyprof/prof/embedding.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/pyprof/prof/reduction.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/pyprof/prof/prof.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/pyprof/prof/utility.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/pyprof/prof/__init__.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/pyprof/prof/softmax.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/pyprof/prof/recurrentCell.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/pyprof/prof/randomSample.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/pyprof/prof/conv.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/pyprof/prof/data.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/pyprof/prof/pooling.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/pyprof/prof/activation.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/pyprof/prof/linear.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/pyprof/prof/normalization.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/pyprof/prof/base.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/pyprof/prof/dropout.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/pyprof/prof/convert.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.8/apex/__init__.py -> build/bdist.linux-x86_64/wheel/apex
creating build/bdist.linux-x86_64/wheel/apex/RNN
copying build/lib.linux-x86_64-3.8/apex/RNN/cells.py -> build/bdist.linux-x86_64/wheel/apex/RNN
copying build/lib.linux-x86_64-3.8/apex/RNN/__init__.py -> build/bdist.linux-x86_64/wheel/apex/RNN
copying build/lib.linux-x86_64-3.8/apex/RNN/RNNBackend.py -> build/bdist.linux-x86_64/wheel/apex/RNN
copying build/lib.linux-x86_64-3.8/apex/RNN/models.py -> build/bdist.linux-x86_64/wheel/apex/RNN
creating build/bdist.linux-x86_64/wheel/apex/amp
copying build/lib.linux-x86_64-3.8/apex/amp/utils.py -> build/bdist.linux-x86_64/wheel/apex/amp
copying build/lib.linux-x86_64-3.8/apex/amp/rnn_compat.py -> build/bdist.linux-x86_64/wheel/apex/amp
copying build/lib.linux-x86_64-3.8/apex/amp/frontend.py -> build/bdist.linux-x86_64/wheel/apex/amp
copying build/lib.linux-x86_64-3.8/apex/amp/_initialize.py -> build/bdist.linux-x86_64/wheel/apex/amp
copying build/lib.linux-x86_64-3.8/apex/amp/amp.py -> build/bdist.linux-x86_64/wheel/apex/amp
copying build/lib.linux-x86_64-3.8/apex/amp/_process_optimizer.py -> build/bdist.linux-x86_64/wheel/apex/amp
copying build/lib.linux-x86_64-3.8/apex/amp/opt.py -> build/bdist.linux-x86_64/wheel/apex/amp
copying build/lib.linux-x86_64-3.8/apex/amp/__init__.py -> build/bdist.linux-x86_64/wheel/apex/amp
copying build/lib.linux-x86_64-3.8/apex/amp/handle.py -> build/bdist.linux-x86_64/wheel/apex/amp
copying build/lib.linux-x86_64-3.8/apex/amp/scaler.py -> build/bdist.linux-x86_64/wheel/apex/amp
copying build/lib.linux-x86_64-3.8/apex/amp/__version__.py -> build/bdist.linux-x86_64/wheel/apex/amp
copying build/lib.linux-x86_64-3.8/apex/amp/compat.py -> build/bdist.linux-x86_64/wheel/apex/amp
copying build/lib.linux-x86_64-3.8/apex/amp/_amp_state.py -> build/bdist.linux-x86_64/wheel/apex/amp
creating build/bdist.linux-x86_64/wheel/apex/amp/lists
copying build/lib.linux-x86_64-3.8/apex/amp/lists/torch_overrides.py -> build/bdist.linux-x86_64/wheel/apex/amp/lists
copying build/lib.linux-x86_64-3.8/apex/amp/lists/__init__.py -> build/bdist.linux-x86_64/wheel/apex/amp/lists
copying build/lib.linux-x86_64-3.8/apex/amp/lists/functional_overrides.py -> build/bdist.linux-x86_64/wheel/apex/amp/lists
copying build/lib.linux-x86_64-3.8/apex/amp/lists/tensor_overrides.py -> build/bdist.linux-x86_64/wheel/apex/amp/lists
copying build/lib.linux-x86_64-3.8/apex/amp/wrap.py -> build/bdist.linux-x86_64/wheel/apex/amp
creating build/bdist.linux-x86_64/wheel/apex/reparameterization
copying build/lib.linux-x86_64-3.8/apex/reparameterization/reparameterization.py -> build/bdist.linux-x86_64/wheel/apex/reparameterization
copying build/lib.linux-x86_64-3.8/apex/reparameterization/__init__.py -> build/bdist.linux-x86_64/wheel/apex/reparameterization
copying build/lib.linux-x86_64-3.8/apex/reparameterization/weight_norm.py -> build/bdist.linux-x86_64/wheel/apex/reparameterization
creating build/bdist.linux-x86_64/wheel/apex/fp16_utils
copying build/lib.linux-x86_64-3.8/apex/fp16_utils/__init__.py -> build/bdist.linux-x86_64/wheel/apex/fp16_utils
copying build/lib.linux-x86_64-3.8/apex/fp16_utils/loss_scaler.py -> build/bdist.linux-x86_64/wheel/apex/fp16_utils
copying build/lib.linux-x86_64-3.8/apex/fp16_utils/fp16_optimizer.py -> build/bdist.linux-x86_64/wheel/apex/fp16_utils
copying build/lib.linux-x86_64-3.8/apex/fp16_utils/fp16util.py -> build/bdist.linux-x86_64/wheel/apex/fp16_utils
copying build/lib.linux-x86_64-3.8/apex_C.cpython-38-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/wheel
running install_egg_info
running egg_info
creating apex.egg-info
writing apex.egg-info/PKG-INFO
writing dependency_links to apex.egg-info/dependency_links.txt
writing top-level names to apex.egg-info/top_level.txt
writing manifest file 'apex.egg-info/SOURCES.txt'
'license_file' option was not specified
reading manifest file 'apex.egg-info/SOURCES.txt'
writing manifest file 'apex.egg-info/SOURCES.txt'
Copying apex.egg-info to build/bdist.linux-x86_64/wheel/apex-0.1.egg-info
Copying dependency_links.txt to build/bdist.linux-x86_64/wheel/apex-0.1.egg-info/dependency_links.txt
Copying PKG-INFO to build/bdist.linux-x86_64/wheel/apex-0.1.egg-info/PKG-INFO
Copying top_level.txt to build/bdist.linux-x86_64/wheel/apex-0.1.egg-info/top_level.txt
Copying SOURCES.txt to build/bdist.linux-x86_64/wheel/apex-0.1.egg-info/SOURCES.txt
running install_scripts
adding license file "LICENSE" (matched pattern "LICEN[CS]E*")
creating build/bdist.linux-x86_64/wheel/apex-0.1.dist-info/WHEEL
creating 'dist/apex-0.1-cp38-cp38-linux_x86_64.whl' and adding 'build/bdist.linux-x86_64/wheel' to it
adding 'amp_C.cpython-38-x86_64-linux-gnu.so'
adding 'apex_C.cpython-38-x86_64-linux-gnu.so'
adding 'fused_layer_norm_cuda.cpython-38-x86_64-linux-gnu.so'
adding 'syncbn.cpython-38-x86_64-linux-gnu.so'
adding 'apex/__init__.py'
adding 'apex/RNN/RNNBackend.py'
adding 'apex/RNN/__init__.py'
adding 'apex/RNN/cells.py'
adding 'apex/RNN/models.py'
adding 'apex/amp/__init__.py'
adding 'apex/amp/__version__.py'
adding 'apex/amp/_amp_state.py'
adding 'apex/amp/_initialize.py'
adding 'apex/amp/_process_optimizer.py'
adding 'apex/amp/amp.py'
adding 'apex/amp/compat.py'
adding 'apex/amp/frontend.py'
adding 'apex/amp/handle.py'
adding 'apex/amp/opt.py'
adding 'apex/amp/rnn_compat.py'
adding 'apex/amp/scaler.py'
adding 'apex/amp/utils.py'
adding 'apex/amp/wrap.py'
adding 'apex/amp/lists/__init__.py'
adding 'apex/amp/lists/functional_overrides.py'
adding 'apex/amp/lists/tensor_overrides.py'
adding 'apex/amp/lists/torch_overrides.py'
adding 'apex/contrib/__init__.py'
adding 'apex/contrib/groupbn/__init__.py'
adding 'apex/contrib/groupbn/batch_norm.py'
adding 'apex/contrib/optimizers/__init__.py'
adding 'apex/contrib/optimizers/fp16_optimizer.py'
adding 'apex/contrib/optimizers/fused_adam.py'
adding 'apex/contrib/optimizers/fused_sgd.py'
adding 'apex/contrib/xentropy/__init__.py'
adding 'apex/contrib/xentropy/softmax_xentropy.py'
adding 'apex/fp16_utils/__init__.py'
adding 'apex/fp16_utils/fp16_optimizer.py'
adding 'apex/fp16_utils/fp16util.py'
adding 'apex/fp16_utils/loss_scaler.py'
adding 'apex/multi_tensor_apply/__init__.py'
adding 'apex/multi_tensor_apply/multi_tensor_apply.py'
adding 'apex/normalization/__init__.py'
adding 'apex/normalization/fused_layer_norm.py'
adding 'apex/optimizers/__init__.py'
adding 'apex/optimizers/fused_adam.py'
adding 'apex/optimizers/fused_lamb.py'
adding 'apex/optimizers/fused_novograd.py'
adding 'apex/optimizers/fused_sgd.py'
adding 'apex/parallel/LARC.py'
adding 'apex/parallel/__init__.py'
adding 'apex/parallel/distributed.py'
adding 'apex/parallel/multiproc.py'
adding 'apex/parallel/optimized_sync_batchnorm.py'
adding 'apex/parallel/optimized_sync_batchnorm_kernel.py'
adding 'apex/parallel/sync_batchnorm.py'
adding 'apex/parallel/sync_batchnorm_kernel.py'
adding 'apex/pyprof/__init__.py'
adding 'apex/pyprof/nvtx/__init__.py'
adding 'apex/pyprof/nvtx/nvmarker.py'
adding 'apex/pyprof/parse/__init__.py'
adding 'apex/pyprof/parse/__main__.py'
adding 'apex/pyprof/parse/db.py'
adding 'apex/pyprof/parse/kernel.py'
adding 'apex/pyprof/parse/nvvp.py'
adding 'apex/pyprof/parse/parse.py'
adding 'apex/pyprof/prof/__init__.py'
adding 'apex/pyprof/prof/__main__.py'
adding 'apex/pyprof/prof/activation.py'
adding 'apex/pyprof/prof/base.py'
adding 'apex/pyprof/prof/blas.py'
adding 'apex/pyprof/prof/conv.py'
adding 'apex/pyprof/prof/convert.py'
adding 'apex/pyprof/prof/data.py'
adding 'apex/pyprof/prof/dropout.py'
adding 'apex/pyprof/prof/embedding.py'
adding 'apex/pyprof/prof/index_slice_join_mutate.py'
adding 'apex/pyprof/prof/linear.py'
adding 'apex/pyprof/prof/loss.py'
adding 'apex/pyprof/prof/misc.py'
adding 'apex/pyprof/prof/normalization.py'
adding 'apex/pyprof/prof/optim.py'
adding 'apex/pyprof/prof/output.py'
adding 'apex/pyprof/prof/pointwise.py'
adding 'apex/pyprof/prof/pooling.py'
adding 'apex/pyprof/prof/prof.py'
adding 'apex/pyprof/prof/randomSample.py'
adding 'apex/pyprof/prof/recurrentCell.py'
adding 'apex/pyprof/prof/reduction.py'
adding 'apex/pyprof/prof/softmax.py'
adding 'apex/pyprof/prof/usage.py'
adding 'apex/pyprof/prof/utility.py'
adding 'apex/reparameterization/__init__.py'
adding 'apex/reparameterization/reparameterization.py'
adding 'apex/reparameterization/weight_norm.py'
adding 'apex-0.1.dist-info/LICENSE'
adding 'apex-0.1.dist-info/METADATA'
adding 'apex-0.1.dist-info/WHEEL'
adding 'apex-0.1.dist-info/top_level.txt'
adding 'apex-0.1.dist-info/RECORD'
removing build/bdist.linux-x86_64/wheel
/home/user/code/DeepSpeed
Installing apex locally so that deepspeed will build
Found existing installation: apex 0.1
Uninstalling apex-0.1:
  Successfully uninstalled apex-0.1
Non-user install because user site-packages disabled
Created temporary directory: /tmp/pip-ephem-wheel-cache-4neh0e_j
Created temporary directory: /tmp/pip-req-tracker-m90min1b
Initialized build tracking at /tmp/pip-req-tracker-m90min1b
Created build tracker: /tmp/pip-req-tracker-m90min1b
Entered build tracker: /tmp/pip-req-tracker-m90min1b
Created temporary directory: /tmp/pip-install-22j4r8a3
Processing ./third_party/apex/dist/apex-0.1-cp38-cp38-linux_x86_64.whl
  Added apex==0.1 from file:///home/user/code/DeepSpeed/third_party/apex/dist/apex-0.1-cp38-cp38-linux_x86_64.whl to build tracker '/tmp/pip-req-tracker-m90min1b'
  Removed apex==0.1 from file:///home/user/code/DeepSpeed/third_party/apex/dist/apex-0.1-cp38-cp38-linux_x86_64.whl from build tracker '/tmp/pip-req-tracker-m90min1b'
Installing collected packages: apex
  Created temporary directory: /tmp/pip-unpacked-wheel-aq_him25

Successfully installed apex-0.1
Cleaning up...
Removed build tracker: '/tmp/pip-req-tracker-m90min1b'
Building deepspeed wheel
./install.sh: line 196: 11306 Floating point exception(core dumped) python setup.py -v bdist_wheel
Error on line 195
Fail to install deepspeed

@drfinkus
Author

drfinkus commented Sep 23, 2020

In addition to the logs above, I can confirm the same issue that @rople380 reported:

(env) user@desktop:~/code/DeepSpeed$ python -c "import cpufeature; cpufeature.print_features()"
Floating point exception (core dumped)

Python: 3.8.2

Also, I notice that @rople380 seems to have Anaconda installed; FWIW, I have Miniconda.

I found these issues, which may or may not be relevant:
pytorch/pytorch#32630
pytorch/pytorch#34295
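A SIGFPE terminates the Python interpreter outright, so wrapping `import cpufeature` in a try/except cannot catch it, and the install script simply dies. A minimal sketch for checking whether a given snippet triggers the crash, assuming a POSIX system and that `cpufeature` is installed in the active environment, is to run it in a child interpreter and inspect the exit status. `dies_with_sigfpe` is a hypothetical helper, not part of DeepSpeed.

```python
import signal
import subprocess
import sys


def dies_with_sigfpe(code: str) -> bool:
    """Run `code` in a fresh Python interpreter and report whether SIGFPE killed it."""
    result = subprocess.run([sys.executable, "-c", code])
    # On POSIX, subprocess reports "killed by signal N" as returncode == -N.
    return result.returncode == -signal.SIGFPE


if __name__ == "__main__":
    # Assumption: cpufeature is installed; if it is not, the child exits with
    # ImportError (returncode 1), which is not reported as a SIGFPE.
    probe = "import cpufeature; cpufeature.print_features()"
    if dies_with_sigfpe(probe):
        print("cpufeature crashes this interpreter with SIGFPE")
    else:
        print("cpufeature probe exited without SIGFPE")
```

Isolating the probe in a subprocess keeps the parent alive, so a build script could fall back to a safe default instead of dumping core.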

@ShadenSmith
Contributor

Thanks @drfinkus and @rople380 for helping diagnose this. It seems that cpufeature crashes on some systems, so I'm working on a fix that removes the dependency.
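The Floating point exception comes from native code inside cpufeature rather than from Python. One pure-Python way to detect SIMD support, assuming a Linux host that exposes `/proc/cpuinfo`, is to parse the `flags` line directly. This is an illustrative sketch, not necessarily the approach taken in DeepSpeed's fix; `read_cpu_flags` and `widest_simd` are hypothetical helpers.

```python
from pathlib import Path


def read_cpu_flags(cpuinfo_path: str = "/proc/cpuinfo") -> set:
    """Return the feature flags from the first "flags" line of a Linux cpuinfo file."""
    for line in Path(cpuinfo_path).read_text().splitlines():
        # x86 Linux lists features on lines such as "flags\t\t: fpu vme sse2 avx avx2".
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    # No "flags" line found, e.g. on non-x86 kernels that use "Features" instead.
    return set()


def widest_simd(flags: set) -> str:
    """Hypothetical helper: name the widest AVX level present in `flags`."""
    if "avx512f" in flags:
        return "avx512"
    if "avx2" in flags:
        return "avx2"
    return "none"
```

For example, `widest_simd(read_cpu_flags())` could pick a compiler flag for a CPU kernel without loading any C extension during the build.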

@drfinkus
Author

Out of curiosity, @rople380, what CPU do you have? I have an AMD Threadripper 1920X.

@tjruwase
Contributor

@drfinkus and @rople380, this has been addressed by PR #450. Can you please check whether you still encounter the issue?

@drfinkus
Author

@tjruwase thanks for the help, but unfortunately I still get the same error as before. I wonder whether @rople380 got it working.

Attempting to remove deepspeed/git_version_info_installed.py
Attempting to remove dist
Attempting to remove build
Attempting to remove deepspeed.egg-info
Attempting to remove third_party/apex/dist
Attempting to remove third_party/apex/build
Attempting to remove third_party/apex/apex.egg-info
No hostfile exists at /job/hostfile, installing locally
Using pip 20.2.3 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/pip (python 3.7)
Non-user install because site-packages writeable
Created temporary directory: /tmp/pip-ephem-wheel-cache-vw_y677y
Created temporary directory: /tmp/pip-req-tracker-nne6cy1x
Initialized build tracking at /tmp/pip-req-tracker-nne6cy1x
Created build tracker: /tmp/pip-req-tracker-nne6cy1x
Entered build tracker: /tmp/pip-req-tracker-nne6cy1x
Created temporary directory: /tmp/pip-install-vspuj3i4
Requirement already satisfied: torch>=1.2 in /home/user/miniconda3/envs/ait/lib/python3.7/site-packages (from -r requirements/requirements.txt (line 1)) (1.6.0)
Requirement already satisfied: torchvision>=0.4.0 in /home/user/miniconda3/envs/ait/lib/python3.7/site-packages (from -r requirements/requirements.txt (line 2)) (0.7.0)
Requirement already satisfied: tqdm in /home/user/miniconda3/envs/ait/lib/python3.7/site-packages (from -r requirements/requirements.txt (line 3)) (4.50.2)
Requirement already satisfied: psutil in /home/user/miniconda3/envs/ait/lib/python3.7/site-packages (from -r requirements/requirements.txt (line 4)) (5.7.2)
Requirement already satisfied: cpufeature in /home/user/miniconda3/envs/ait/lib/python3.7/site-packages (from -r requirements/requirements.txt (line 5)) (0.1.1)
Requirement already satisfied: tensorboardX==1.8 in /home/user/miniconda3/envs/ait/lib/python3.7/site-packages (from -r requirements/requirements.txt (line 6)) (1.8)
Requirement already satisfied: numpy in /home/user/miniconda3/envs/ait/lib/python3.7/site-packages (from torch>=1.2->-r requirements/requirements.txt (line 1)) (1.19.2)
Requirement already satisfied: future in /home/user/miniconda3/envs/ait/lib/python3.7/site-packages (from torch>=1.2->-r requirements/requirements.txt (line 1)) (0.18.2)
Requirement already satisfied: pillow>=4.1.1 in /home/user/miniconda3/envs/ait/lib/python3.7/site-packages (from torchvision>=0.4.0->-r requirements/requirements.txt (line 2)) (8.0.0)
Requirement already satisfied: protobuf>=3.2.0 in /home/user/miniconda3/envs/ait/lib/python3.7/site-packages (from tensorboardX==1.8->-r requirements/requirements.txt (line 6)) (3.13.0)
Requirement already satisfied: six in /home/user/miniconda3/envs/ait/lib/python3.7/site-packages (from tensorboardX==1.8->-r requirements/requirements.txt (line 6)) (1.15.0)
Requirement already satisfied: setuptools in /home/user/miniconda3/envs/ait/lib/python3.7/site-packages (from protobuf>=3.2.0->tensorboardX==1.8->-r requirements/requirements.txt (line 6)) (50.3.0.post20201006)
Removed build tracker: '/tmp/pip-req-tracker-nne6cy1x'
Checking out sub-module(s)
Submodule 'DeepSpeedExamples' (https://github.com/microsoft/DeepSpeedExamples) registered for path 'DeepSpeedExamples'
Submodule 'third_party/apex' (https://github.com/NVIDIA/apex.git) registered for path 'third_party/apex'
Cloning into '/home/user/code/DeepSpeed/DeepSpeedExamples'...
Cloning into '/home/user/code/DeepSpeed/third_party/apex'...
Submodule path 'DeepSpeedExamples': checked out 'ba63ad0fa861d28b3b33bc2c20f702647403e258'
Submodule path 'third_party/apex': checked out '494f8ab3fc1b0b26949a3bcbb2bcac78008d48c1'
Building apex wheel
torch.__version__  =  1.6.0
setup.py:43: UserWarning: Option --pyprof not specified. Not installing PyProf dependencies!
  warnings.warn("Option --pyprof not specified. Not installing PyProf dependencies!")

Compiling cuda extensions with
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
from /usr/local/cuda/bin

running bdist_wheel
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/utils/cpp_extension.py:335: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
  warnings.warn(msg.format('we could not find ninja.'))
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.7
creating build/lib.linux-x86_64-3.7/apex
copying apex/__init__.py -> build/lib.linux-x86_64-3.7/apex
creating build/lib.linux-x86_64-3.7/apex/contrib
copying apex/contrib/__init__.py -> build/lib.linux-x86_64-3.7/apex/contrib
creating build/lib.linux-x86_64-3.7/apex/parallel
copying apex/parallel/LARC.py -> build/lib.linux-x86_64-3.7/apex/parallel
copying apex/parallel/distributed.py -> build/lib.linux-x86_64-3.7/apex/parallel
copying apex/parallel/__init__.py -> build/lib.linux-x86_64-3.7/apex/parallel
copying apex/parallel/optimized_sync_batchnorm.py -> build/lib.linux-x86_64-3.7/apex/parallel
copying apex/parallel/sync_batchnorm.py -> build/lib.linux-x86_64-3.7/apex/parallel
copying apex/parallel/multiproc.py -> build/lib.linux-x86_64-3.7/apex/parallel
copying apex/parallel/optimized_sync_batchnorm_kernel.py -> build/lib.linux-x86_64-3.7/apex/parallel
copying apex/parallel/sync_batchnorm_kernel.py -> build/lib.linux-x86_64-3.7/apex/parallel
creating build/lib.linux-x86_64-3.7/apex/multi_tensor_apply
copying apex/multi_tensor_apply/__init__.py -> build/lib.linux-x86_64-3.7/apex/multi_tensor_apply
copying apex/multi_tensor_apply/multi_tensor_apply.py -> build/lib.linux-x86_64-3.7/apex/multi_tensor_apply
creating build/lib.linux-x86_64-3.7/apex/normalization
copying apex/normalization/fused_layer_norm.py -> build/lib.linux-x86_64-3.7/apex/normalization
copying apex/normalization/__init__.py -> build/lib.linux-x86_64-3.7/apex/normalization
creating build/lib.linux-x86_64-3.7/apex/optimizers
copying apex/optimizers/fused_lamb.py -> build/lib.linux-x86_64-3.7/apex/optimizers
copying apex/optimizers/fused_novograd.py -> build/lib.linux-x86_64-3.7/apex/optimizers
copying apex/optimizers/__init__.py -> build/lib.linux-x86_64-3.7/apex/optimizers
copying apex/optimizers/fused_sgd.py -> build/lib.linux-x86_64-3.7/apex/optimizers
copying apex/optimizers/fused_adam.py -> build/lib.linux-x86_64-3.7/apex/optimizers
creating build/lib.linux-x86_64-3.7/apex/pyprof
copying apex/pyprof/__init__.py -> build/lib.linux-x86_64-3.7/apex/pyprof
creating build/lib.linux-x86_64-3.7/apex/RNN
copying apex/RNN/cells.py -> build/lib.linux-x86_64-3.7/apex/RNN
copying apex/RNN/__init__.py -> build/lib.linux-x86_64-3.7/apex/RNN
copying apex/RNN/RNNBackend.py -> build/lib.linux-x86_64-3.7/apex/RNN
copying apex/RNN/models.py -> build/lib.linux-x86_64-3.7/apex/RNN
creating build/lib.linux-x86_64-3.7/apex/amp
copying apex/amp/utils.py -> build/lib.linux-x86_64-3.7/apex/amp
copying apex/amp/rnn_compat.py -> build/lib.linux-x86_64-3.7/apex/amp
copying apex/amp/frontend.py -> build/lib.linux-x86_64-3.7/apex/amp
copying apex/amp/_initialize.py -> build/lib.linux-x86_64-3.7/apex/amp
copying apex/amp/amp.py -> build/lib.linux-x86_64-3.7/apex/amp
copying apex/amp/_process_optimizer.py -> build/lib.linux-x86_64-3.7/apex/amp
copying apex/amp/opt.py -> build/lib.linux-x86_64-3.7/apex/amp
copying apex/amp/__init__.py -> build/lib.linux-x86_64-3.7/apex/amp
copying apex/amp/handle.py -> build/lib.linux-x86_64-3.7/apex/amp
copying apex/amp/scaler.py -> build/lib.linux-x86_64-3.7/apex/amp
copying apex/amp/__version__.py -> build/lib.linux-x86_64-3.7/apex/amp
copying apex/amp/compat.py -> build/lib.linux-x86_64-3.7/apex/amp
copying apex/amp/_amp_state.py -> build/lib.linux-x86_64-3.7/apex/amp
copying apex/amp/wrap.py -> build/lib.linux-x86_64-3.7/apex/amp
creating build/lib.linux-x86_64-3.7/apex/reparameterization
copying apex/reparameterization/reparameterization.py -> build/lib.linux-x86_64-3.7/apex/reparameterization
copying apex/reparameterization/__init__.py -> build/lib.linux-x86_64-3.7/apex/reparameterization
copying apex/reparameterization/weight_norm.py -> build/lib.linux-x86_64-3.7/apex/reparameterization
creating build/lib.linux-x86_64-3.7/apex/fp16_utils
copying apex/fp16_utils/__init__.py -> build/lib.linux-x86_64-3.7/apex/fp16_utils
copying apex/fp16_utils/loss_scaler.py -> build/lib.linux-x86_64-3.7/apex/fp16_utils
copying apex/fp16_utils/fp16_optimizer.py -> build/lib.linux-x86_64-3.7/apex/fp16_utils
copying apex/fp16_utils/fp16util.py -> build/lib.linux-x86_64-3.7/apex/fp16_utils
creating build/lib.linux-x86_64-3.7/apex/contrib/xentropy
copying apex/contrib/xentropy/__init__.py -> build/lib.linux-x86_64-3.7/apex/contrib/xentropy
copying apex/contrib/xentropy/softmax_xentropy.py -> build/lib.linux-x86_64-3.7/apex/contrib/xentropy
creating build/lib.linux-x86_64-3.7/apex/contrib/optimizers
copying apex/contrib/optimizers/__init__.py -> build/lib.linux-x86_64-3.7/apex/contrib/optimizers
copying apex/contrib/optimizers/fp16_optimizer.py -> build/lib.linux-x86_64-3.7/apex/contrib/optimizers
copying apex/contrib/optimizers/fused_sgd.py -> build/lib.linux-x86_64-3.7/apex/contrib/optimizers
copying apex/contrib/optimizers/fused_adam.py -> build/lib.linux-x86_64-3.7/apex/contrib/optimizers
creating build/lib.linux-x86_64-3.7/apex/contrib/groupbn
copying apex/contrib/groupbn/__init__.py -> build/lib.linux-x86_64-3.7/apex/contrib/groupbn
copying apex/contrib/groupbn/batch_norm.py -> build/lib.linux-x86_64-3.7/apex/contrib/groupbn
creating build/lib.linux-x86_64-3.7/apex/pyprof/nvtx
copying apex/pyprof/nvtx/__init__.py -> build/lib.linux-x86_64-3.7/apex/pyprof/nvtx
copying apex/pyprof/nvtx/nvmarker.py -> build/lib.linux-x86_64-3.7/apex/pyprof/nvtx
creating build/lib.linux-x86_64-3.7/apex/pyprof/parse
copying apex/pyprof/parse/nvvp.py -> build/lib.linux-x86_64-3.7/apex/pyprof/parse
copying apex/pyprof/parse/__main__.py -> build/lib.linux-x86_64-3.7/apex/pyprof/parse
copying apex/pyprof/parse/parse.py -> build/lib.linux-x86_64-3.7/apex/pyprof/parse
copying apex/pyprof/parse/__init__.py -> build/lib.linux-x86_64-3.7/apex/pyprof/parse
copying apex/pyprof/parse/db.py -> build/lib.linux-x86_64-3.7/apex/pyprof/parse
copying apex/pyprof/parse/kernel.py -> build/lib.linux-x86_64-3.7/apex/pyprof/parse
creating build/lib.linux-x86_64-3.7/apex/pyprof/prof
copying apex/pyprof/prof/loss.py -> build/lib.linux-x86_64-3.7/apex/pyprof/prof
copying apex/pyprof/prof/optim.py -> build/lib.linux-x86_64-3.7/apex/pyprof/prof
copying apex/pyprof/prof/output.py -> build/lib.linux-x86_64-3.7/apex/pyprof/prof
copying apex/pyprof/prof/misc.py -> build/lib.linux-x86_64-3.7/apex/pyprof/prof
copying apex/pyprof/prof/__main__.py -> build/lib.linux-x86_64-3.7/apex/pyprof/prof
copying apex/pyprof/prof/usage.py -> build/lib.linux-x86_64-3.7/apex/pyprof/prof
copying apex/pyprof/prof/index_slice_join_mutate.py -> build/lib.linux-x86_64-3.7/apex/pyprof/prof
copying apex/pyprof/prof/pointwise.py -> build/lib.linux-x86_64-3.7/apex/pyprof/prof
copying apex/pyprof/prof/blas.py -> build/lib.linux-x86_64-3.7/apex/pyprof/prof
copying apex/pyprof/prof/embedding.py -> build/lib.linux-x86_64-3.7/apex/pyprof/prof
copying apex/pyprof/prof/reduction.py -> build/lib.linux-x86_64-3.7/apex/pyprof/prof
copying apex/pyprof/prof/prof.py -> build/lib.linux-x86_64-3.7/apex/pyprof/prof
copying apex/pyprof/prof/utility.py -> build/lib.linux-x86_64-3.7/apex/pyprof/prof
copying apex/pyprof/prof/__init__.py -> build/lib.linux-x86_64-3.7/apex/pyprof/prof
copying apex/pyprof/prof/softmax.py -> build/lib.linux-x86_64-3.7/apex/pyprof/prof
copying apex/pyprof/prof/recurrentCell.py -> build/lib.linux-x86_64-3.7/apex/pyprof/prof
copying apex/pyprof/prof/randomSample.py -> build/lib.linux-x86_64-3.7/apex/pyprof/prof
copying apex/pyprof/prof/conv.py -> build/lib.linux-x86_64-3.7/apex/pyprof/prof
copying apex/pyprof/prof/data.py -> build/lib.linux-x86_64-3.7/apex/pyprof/prof
copying apex/pyprof/prof/pooling.py -> build/lib.linux-x86_64-3.7/apex/pyprof/prof
copying apex/pyprof/prof/activation.py -> build/lib.linux-x86_64-3.7/apex/pyprof/prof
copying apex/pyprof/prof/linear.py -> build/lib.linux-x86_64-3.7/apex/pyprof/prof
copying apex/pyprof/prof/normalization.py -> build/lib.linux-x86_64-3.7/apex/pyprof/prof
copying apex/pyprof/prof/base.py -> build/lib.linux-x86_64-3.7/apex/pyprof/prof
copying apex/pyprof/prof/dropout.py -> build/lib.linux-x86_64-3.7/apex/pyprof/prof
copying apex/pyprof/prof/convert.py -> build/lib.linux-x86_64-3.7/apex/pyprof/prof
creating build/lib.linux-x86_64-3.7/apex/amp/lists
copying apex/amp/lists/torch_overrides.py -> build/lib.linux-x86_64-3.7/apex/amp/lists
copying apex/amp/lists/__init__.py -> build/lib.linux-x86_64-3.7/apex/amp/lists
copying apex/amp/lists/functional_overrides.py -> build/lib.linux-x86_64-3.7/apex/amp/lists
copying apex/amp/lists/tensor_overrides.py -> build/lib.linux-x86_64-3.7/apex/amp/lists
running build_ext
building 'apex_C' extension
creating build/temp.linux-x86_64-3.7
creating build/temp.linux-x86_64-3.7/csrc
gcc -pthread -B /home/user/miniconda3/envs/ait/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/TH -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/THC -I/home/user/miniconda3/envs/ait/include/python3.7m -c csrc/flatten_unflatten.cpp -o build/temp.linux-x86_64-3.7/csrc/flatten_unflatten.o -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=apex_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
In file included from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/Parallel.h:149,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/utils.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:5,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/nn.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/all.h:7,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/extension.h:4,
                 from csrc/flatten_unflatten.cpp:1:
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/ParallelOpenMP.h:84: warning: ignoring #pragma omp parallel [-Wunknown-pragmas]
 #pragma omp parallel for if ((end - begin) >= grain_size)
 
In file included from csrc/flatten_unflatten.cpp:2:
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/utils/tensor_flatten.h: In member function ‘at::DeprecatedTypeProperties& torch::utils::TensorGroup::type()’:
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/utils/tensor_flatten.h:36:28: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
     return tensors[0].type();
                            ^
In file included from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/Tensor.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/Context.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/ATen.h:5,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/extension.h:4,
                 from csrc/flatten_unflatten.cpp:1:
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/core/TensorBody.h:268:30: note: declared here
   DeprecatedTypeProperties & type() const {
                              ^~~~
g++ -pthread -shared -B /home/user/miniconda3/envs/ait/compiler_compat -L/home/user/miniconda3/envs/ait/lib -Wl,-rpath=/home/user/miniconda3/envs/ait/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.7/csrc/flatten_unflatten.o -L/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -o build/lib.linux-x86_64-3.7/apex_C.cpython-37m-x86_64-linux-gnu.so
building 'amp_C' extension
gcc -pthread -B /home/user/miniconda3/envs/ait/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/TH -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/user/miniconda3/envs/ait/include/python3.7m -c csrc/amp_C_frontend.cpp -o build/temp.linux-x86_64-3.7/csrc/amp_C_frontend.o -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
In file included from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/Parallel.h:149,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/utils.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:5,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/nn.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/all.h:7,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/extension.h:4,
                 from csrc/amp_C_frontend.cpp:1:
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/ParallelOpenMP.h:84: warning: ignoring #pragma omp parallel [-Wunknown-pragmas]
 #pragma omp parallel for if ((end - begin) >= grain_size)
 
/usr/local/cuda/bin/nvcc -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/TH -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/user/miniconda3/envs/ait/include/python3.7m -c csrc/multi_tensor_sgd_kernel.cu -o build/temp.linux-x86_64-3.7/csrc/multi_tensor_sgd_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=sm_61 -std=c++14
/usr/local/cuda/bin/nvcc -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/TH -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/user/miniconda3/envs/ait/include/python3.7m -c csrc/multi_tensor_scale_kernel.cu -o build/temp.linux-x86_64-3.7/csrc/multi_tensor_scale_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=sm_61 -std=c++14
/usr/local/cuda/bin/nvcc -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/TH -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/user/miniconda3/envs/ait/include/python3.7m -c csrc/multi_tensor_axpby_kernel.cu -o build/temp.linux-x86_64-3.7/csrc/multi_tensor_axpby_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=sm_61 -std=c++14
/usr/local/cuda/bin/nvcc -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/TH -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/user/miniconda3/envs/ait/include/python3.7m -c csrc/multi_tensor_l2norm_kernel.cu -o build/temp.linux-x86_64-3.7/csrc/multi_tensor_l2norm_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=sm_61 -std=c++14
/usr/local/cuda/bin/nvcc -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/TH -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/user/miniconda3/envs/ait/include/python3.7m -c csrc/multi_tensor_lamb_stage_1.cu -o build/temp.linux-x86_64-3.7/csrc/multi_tensor_lamb_stage_1.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=sm_61 -std=c++14
/usr/local/cuda/bin/nvcc -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/TH -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/user/miniconda3/envs/ait/include/python3.7m -c csrc/multi_tensor_lamb_stage_2.cu -o build/temp.linux-x86_64-3.7/csrc/multi_tensor_lamb_stage_2.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=sm_61 -std=c++14
/usr/local/cuda/bin/nvcc -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/TH -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/user/miniconda3/envs/ait/include/python3.7m -c csrc/multi_tensor_adam.cu -o build/temp.linux-x86_64-3.7/csrc/multi_tensor_adam.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=sm_61 -std=c++14
/usr/local/cuda/bin/nvcc -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/TH -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/user/miniconda3/envs/ait/include/python3.7m -c csrc/multi_tensor_novograd.cu -o build/temp.linux-x86_64-3.7/csrc/multi_tensor_novograd.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=sm_61 -std=c++14
/usr/local/cuda/bin/nvcc -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/TH -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/user/miniconda3/envs/ait/include/python3.7m -c csrc/multi_tensor_lamb.cu -o build/temp.linux-x86_64-3.7/csrc/multi_tensor_lamb.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=sm_61 -std=c++14
g++ -pthread -shared -B /home/user/miniconda3/envs/ait/compiler_compat -L/home/user/miniconda3/envs/ait/lib -Wl,-rpath=/home/user/miniconda3/envs/ait/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.7/csrc/amp_C_frontend.o build/temp.linux-x86_64-3.7/csrc/multi_tensor_sgd_kernel.o build/temp.linux-x86_64-3.7/csrc/multi_tensor_scale_kernel.o build/temp.linux-x86_64-3.7/csrc/multi_tensor_axpby_kernel.o build/temp.linux-x86_64-3.7/csrc/multi_tensor_l2norm_kernel.o build/temp.linux-x86_64-3.7/csrc/multi_tensor_lamb_stage_1.o build/temp.linux-x86_64-3.7/csrc/multi_tensor_lamb_stage_2.o build/temp.linux-x86_64-3.7/csrc/multi_tensor_adam.o build/temp.linux-x86_64-3.7/csrc/multi_tensor_novograd.o build/temp.linux-x86_64-3.7/csrc/multi_tensor_lamb.o -L/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda -o build/lib.linux-x86_64-3.7/amp_C.cpython-37m-x86_64-linux-gnu.so
building 'syncbn' extension
gcc -pthread -B /home/user/miniconda3/envs/ait/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/TH -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/user/miniconda3/envs/ait/include/python3.7m -c csrc/syncbn.cpp -o build/temp.linux-x86_64-3.7/csrc/syncbn.o -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=syncbn -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
In file included from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/Parallel.h:149,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/utils.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:5,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/nn.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/all.h:7,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/extension.h:4,
                 from csrc/syncbn.cpp:1:
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/ParallelOpenMP.h:84: warning: ignoring #pragma omp parallel [-Wunknown-pragmas]
 #pragma omp parallel for if ((end - begin) >= grain_size)
 
/usr/local/cuda/bin/nvcc -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/TH -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/user/miniconda3/envs/ait/include/python3.7m -c csrc/welford.cu -o build/temp.linux-x86_64-3.7/csrc/welford.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=syncbn -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=sm_61 -std=c++14
g++ -pthread -shared -B /home/user/miniconda3/envs/ait/compiler_compat -L/home/user/miniconda3/envs/ait/lib -Wl,-rpath=/home/user/miniconda3/envs/ait/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.7/csrc/syncbn.o build/temp.linux-x86_64-3.7/csrc/welford.o -L/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda -o build/lib.linux-x86_64-3.7/syncbn.cpython-37m-x86_64-linux-gnu.so
building 'fused_layer_norm_cuda' extension
gcc -pthread -B /home/user/miniconda3/envs/ait/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/TH -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/user/miniconda3/envs/ait/include/python3.7m -c csrc/layer_norm_cuda.cpp -o build/temp.linux-x86_64-3.7/csrc/layer_norm_cuda.o -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=fused_layer_norm_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
In file included from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/Parallel.h:149,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/utils.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:5,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/nn.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/all.h:7,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/ParallelOpenMP.h:84: warning: ignoring #pragma omp parallel [-Wunknown-pragmas]
 #pragma omp parallel for if ((end - begin) >= grain_size)
 
In file included from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/core/DeviceType.h:8,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/core/Device.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/core/Allocator.h:6,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/ATen.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
csrc/layer_norm_cuda.cpp: In function ‘std::vector<at::Tensor> layer_norm(at::Tensor, c10::IntArrayRef, double)’:
csrc/layer_norm_cuda.cpp:117:42: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
 #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
                                          ^
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/macros/Macros.h:146:65: note: in definition of macro ‘C10_UNLIKELY’
 #define C10_UNLIKELY(expr)  (__builtin_expect(static_cast<bool>(expr), 0))
                                                                 ^~~~
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/util/Exception.h:330:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
   if (C10_UNLIKELY_OR_CONST(!(cond))) {                               \
       ^~~~~~~~~~~~~~~~~~~~~
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/util/Exception.h:318:3: note: in expansion of macro ‘TORCH_CHECK_WITH_MSG’
   TORCH_CHECK_WITH_MSG(error_t, cond, "", __VA_ARGS__)
   ^~~~~~~~~~~~~~~~~~~~
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/util/Exception.h:341:32: note: in expansion of macro ‘TORCH_CHECK_WITH’
 #define TORCH_CHECK(cond, ...) TORCH_CHECK_WITH(Error, cond, __VA_ARGS__)
                                ^~~~~~~~~~~~~~~~
csrc/layer_norm_cuda.cpp:117:23: note: in expansion of macro ‘TORCH_CHECK’
 #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
                       ^~~~~~~~~~~
csrc/layer_norm_cuda.cpp:119:24: note: in expansion of macro ‘CHECK_CUDA’
 #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
                        ^~~~~~~~~~
csrc/layer_norm_cuda.cpp:129:3: note: in expansion of macro ‘CHECK_INPUT’
   CHECK_INPUT(input);
   ^~~~~~~~~~~
In file included from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/Tensor.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/Context.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/ATen.h:5,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/core/TensorBody.h:268:30: note: declared here
   DeprecatedTypeProperties & type() const {
                              ^~~~
In file included from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/core/DeviceType.h:8,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/core/Device.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/core/Allocator.h:6,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/ATen.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
csrc/layer_norm_cuda.cpp: In function ‘std::vector<at::Tensor> layer_norm_affine(at::Tensor, c10::IntArrayRef, at::Tensor, at::Tensor, double)’:
csrc/layer_norm_cuda.cpp:117:42: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
 #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
                                          ^
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/macros/Macros.h:146:65: note: in definition of macro ‘C10_UNLIKELY’
 #define C10_UNLIKELY(expr)  (__builtin_expect(static_cast<bool>(expr), 0))
                                                                 ^~~~
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/util/Exception.h:330:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
   if (C10_UNLIKELY_OR_CONST(!(cond))) {                               \
       ^~~~~~~~~~~~~~~~~~~~~
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/util/Exception.h:318:3: note: in expansion of macro ‘TORCH_CHECK_WITH_MSG’
   TORCH_CHECK_WITH_MSG(error_t, cond, "", __VA_ARGS__)
   ^~~~~~~~~~~~~~~~~~~~
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/util/Exception.h:341:32: note: in expansion of macro ‘TORCH_CHECK_WITH’
 #define TORCH_CHECK(cond, ...) TORCH_CHECK_WITH(Error, cond, __VA_ARGS__)
                                ^~~~~~~~~~~~~~~~
csrc/layer_norm_cuda.cpp:117:23: note: in expansion of macro ‘TORCH_CHECK’
 #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
                       ^~~~~~~~~~~
csrc/layer_norm_cuda.cpp:119:24: note: in expansion of macro ‘CHECK_CUDA’
 #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
                        ^~~~~~~~~~
csrc/layer_norm_cuda.cpp:149:3: note: in expansion of macro ‘CHECK_INPUT’
   CHECK_INPUT(input);
   ^~~~~~~~~~~
In file included from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/Tensor.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/Context.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/ATen.h:5,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/core/TensorBody.h:268:30: note: declared here
   DeprecatedTypeProperties & type() const {
                              ^~~~
In file included from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/core/DeviceType.h:8,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/core/Device.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/core/Allocator.h:6,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/ATen.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
csrc/layer_norm_cuda.cpp:117:42: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
 #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
                                          ^
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/macros/Macros.h:146:65: note: in definition of macro ‘C10_UNLIKELY’
 #define C10_UNLIKELY(expr)  (__builtin_expect(static_cast<bool>(expr), 0))
                                                                 ^~~~
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/util/Exception.h:330:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
   if (C10_UNLIKELY_OR_CONST(!(cond))) {                               \
       ^~~~~~~~~~~~~~~~~~~~~
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/util/Exception.h:318:3: note: in expansion of macro ‘TORCH_CHECK_WITH_MSG’
   TORCH_CHECK_WITH_MSG(error_t, cond, "", __VA_ARGS__)
   ^~~~~~~~~~~~~~~~~~~~
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/util/Exception.h:341:32: note: in expansion of macro ‘TORCH_CHECK_WITH’
 #define TORCH_CHECK(cond, ...) TORCH_CHECK_WITH(Error, cond, __VA_ARGS__)
                                ^~~~~~~~~~~~~~~~
csrc/layer_norm_cuda.cpp:117:23: note: in expansion of macro ‘TORCH_CHECK’
 #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
                       ^~~~~~~~~~~
csrc/layer_norm_cuda.cpp:119:24: note: in expansion of macro ‘CHECK_CUDA’
 #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
                        ^~~~~~~~~~
csrc/layer_norm_cuda.cpp:150:3: note: in expansion of macro ‘CHECK_INPUT’
   CHECK_INPUT(gamma);
   ^~~~~~~~~~~
In file included from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/Tensor.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/Context.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/ATen.h:5,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/core/TensorBody.h:268:30: note: declared here
   DeprecatedTypeProperties & type() const {
                              ^~~~
In file included from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/core/DeviceType.h:8,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/core/Device.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/core/Allocator.h:6,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/ATen.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
csrc/layer_norm_cuda.cpp:117:42: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
 #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
                                          ^
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/macros/Macros.h:146:65: note: in definition of macro ‘C10_UNLIKELY’
 #define C10_UNLIKELY(expr)  (__builtin_expect(static_cast<bool>(expr), 0))
                                                                 ^~~~
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/util/Exception.h:330:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
   if (C10_UNLIKELY_OR_CONST(!(cond))) {                               \
       ^~~~~~~~~~~~~~~~~~~~~
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/util/Exception.h:318:3: note: in expansion of macro ‘TORCH_CHECK_WITH_MSG’
   TORCH_CHECK_WITH_MSG(error_t, cond, "", __VA_ARGS__)
   ^~~~~~~~~~~~~~~~~~~~
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/util/Exception.h:341:32: note: in expansion of macro ‘TORCH_CHECK_WITH’
 #define TORCH_CHECK(cond, ...) TORCH_CHECK_WITH(Error, cond, __VA_ARGS__)
                                ^~~~~~~~~~~~~~~~
csrc/layer_norm_cuda.cpp:117:23: note: in expansion of macro ‘TORCH_CHECK’
 #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
                       ^~~~~~~~~~~
csrc/layer_norm_cuda.cpp:119:24: note: in expansion of macro ‘CHECK_CUDA’
 #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
                        ^~~~~~~~~~
csrc/layer_norm_cuda.cpp:151:3: note: in expansion of macro ‘CHECK_INPUT’
   CHECK_INPUT(beta);
   ^~~~~~~~~~~
In file included from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/Tensor.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/Context.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/ATen.h:5,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/core/TensorBody.h:268:30: note: declared here
   DeprecatedTypeProperties & type() const {
                              ^~~~
In file included from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/core/DeviceType.h:8,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/core/Device.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/core/Allocator.h:6,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/ATen.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
csrc/layer_norm_cuda.cpp: In function ‘at::Tensor layer_norm_gradient(at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::IntArrayRef, double)’:
csrc/layer_norm_cuda.cpp:117:42: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
 #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
                                          ^
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/macros/Macros.h:146:65: note: in definition of macro ‘C10_UNLIKELY’
 #define C10_UNLIKELY(expr)  (__builtin_expect(static_cast<bool>(expr), 0))
                                                                 ^~~~
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/util/Exception.h:330:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
   if (C10_UNLIKELY_OR_CONST(!(cond))) {                               \
       ^~~~~~~~~~~~~~~~~~~~~
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/util/Exception.h:318:3: note: in expansion of macro ‘TORCH_CHECK_WITH_MSG’
   TORCH_CHECK_WITH_MSG(error_t, cond, "", __VA_ARGS__)
   ^~~~~~~~~~~~~~~~~~~~
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/util/Exception.h:341:32: note: in expansion of macro ‘TORCH_CHECK_WITH’
 #define TORCH_CHECK(cond, ...) TORCH_CHECK_WITH(Error, cond, __VA_ARGS__)
                                ^~~~~~~~~~~~~~~~
csrc/layer_norm_cuda.cpp:117:23: note: in expansion of macro ‘TORCH_CHECK’
 #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
                       ^~~~~~~~~~~
csrc/layer_norm_cuda.cpp:119:24: note: in expansion of macro ‘CHECK_CUDA’
 #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
                        ^~~~~~~~~~
csrc/layer_norm_cuda.cpp:193:3: note: in expansion of macro ‘CHECK_INPUT’
   CHECK_INPUT(dout);
   ^~~~~~~~~~~
In file included from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/Tensor.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/Context.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/ATen.h:5,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/core/TensorBody.h:268:30: note: declared here
   DeprecatedTypeProperties & type() const {
                              ^~~~
In file included from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/core/DeviceType.h:8,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/core/Device.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/core/Allocator.h:6,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/ATen.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
csrc/layer_norm_cuda.cpp:117:42: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
 #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
                                          ^
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/macros/Macros.h:146:65: note: in definition of macro ‘C10_UNLIKELY’
 #define C10_UNLIKELY(expr)  (__builtin_expect(static_cast<bool>(expr), 0))
                                                                 ^~~~
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/util/Exception.h:330:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
   if (C10_UNLIKELY_OR_CONST(!(cond))) {                               \
       ^~~~~~~~~~~~~~~~~~~~~
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/util/Exception.h:318:3: note: in expansion of macro ‘TORCH_CHECK_WITH_MSG’
   TORCH_CHECK_WITH_MSG(error_t, cond, "", __VA_ARGS__)
   ^~~~~~~~~~~~~~~~~~~~
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/util/Exception.h:341:32: note: in expansion of macro ‘TORCH_CHECK_WITH’
 #define TORCH_CHECK(cond, ...) TORCH_CHECK_WITH(Error, cond, __VA_ARGS__)
                                ^~~~~~~~~~~~~~~~
csrc/layer_norm_cuda.cpp:117:23: note: in expansion of macro ‘TORCH_CHECK’
 #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
                       ^~~~~~~~~~~
csrc/layer_norm_cuda.cpp:119:24: note: in expansion of macro ‘CHECK_CUDA’
 #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
                        ^~~~~~~~~~
csrc/layer_norm_cuda.cpp:194:3: note: in expansion of macro ‘CHECK_INPUT’
   CHECK_INPUT(mean);
   ^~~~~~~~~~~
In file included from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/Tensor.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/Context.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/ATen.h:5,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/core/TensorBody.h:268:30: note: declared here
   DeprecatedTypeProperties & type() const {
                              ^~~~
In file included from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/core/DeviceType.h:8,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/core/Device.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/core/Allocator.h:6,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/ATen.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
csrc/layer_norm_cuda.cpp:117:42: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
 #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
                                          ^
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/macros/Macros.h:146:65: note: in definition of macro ‘C10_UNLIKELY’
 #define C10_UNLIKELY(expr)  (__builtin_expect(static_cast<bool>(expr), 0))
                                                                 ^~~~
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/util/Exception.h:330:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
   if (C10_UNLIKELY_OR_CONST(!(cond))) {                               \
       ^~~~~~~~~~~~~~~~~~~~~
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/util/Exception.h:318:3: note: in expansion of macro ‘TORCH_CHECK_WITH_MSG’
   TORCH_CHECK_WITH_MSG(error_t, cond, "", __VA_ARGS__)
   ^~~~~~~~~~~~~~~~~~~~
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/util/Exception.h:341:32: note: in expansion of macro ‘TORCH_CHECK_WITH’
 #define TORCH_CHECK(cond, ...) TORCH_CHECK_WITH(Error, cond, __VA_ARGS__)
                                ^~~~~~~~~~~~~~~~
csrc/layer_norm_cuda.cpp:117:23: note: in expansion of macro ‘TORCH_CHECK’
 #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
                       ^~~~~~~~~~~
csrc/layer_norm_cuda.cpp:119:24: note: in expansion of macro ‘CHECK_CUDA’
 #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
                        ^~~~~~~~~~
csrc/layer_norm_cuda.cpp:195:3: note: in expansion of macro ‘CHECK_INPUT’
   CHECK_INPUT(invvar);
   ^~~~~~~~~~~
In file included from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/Tensor.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/Context.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/ATen.h:5,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/core/TensorBody.h:268:30: note: declared here
   DeprecatedTypeProperties & type() const {
                              ^~~~
In file included from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/core/DeviceType.h:8,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/core/Device.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/core/Allocator.h:6,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/ATen.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
csrc/layer_norm_cuda.cpp:117:42: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
 #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
                                          ^
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/macros/Macros.h:146:65: note: in definition of macro ‘C10_UNLIKELY’
 #define C10_UNLIKELY(expr)  (__builtin_expect(static_cast<bool>(expr), 0))
                                                                 ^~~~
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/util/Exception.h:330:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
   if (C10_UNLIKELY_OR_CONST(!(cond))) {                               \
       ^~~~~~~~~~~~~~~~~~~~~
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/util/Exception.h:318:3: note: in expansion of macro ‘TORCH_CHECK_WITH_MSG’
   TORCH_CHECK_WITH_MSG(error_t, cond, "", __VA_ARGS__)
   ^~~~~~~~~~~~~~~~~~~~
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/util/Exception.h:341:32: note: in expansion of macro ‘TORCH_CHECK_WITH’
 #define TORCH_CHECK(cond, ...) TORCH_CHECK_WITH(Error, cond, __VA_ARGS__)
                                ^~~~~~~~~~~~~~~~
csrc/layer_norm_cuda.cpp:117:23: note: in expansion of macro ‘TORCH_CHECK’
 #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
                       ^~~~~~~~~~~
csrc/layer_norm_cuda.cpp:119:24: note: in expansion of macro ‘CHECK_CUDA’
 #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
                        ^~~~~~~~~~
csrc/layer_norm_cuda.cpp:196:3: note: in expansion of macro ‘CHECK_INPUT’
   CHECK_INPUT(input);
   ^~~~~~~~~~~
In file included from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/Tensor.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/Context.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/ATen.h:5,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/core/TensorBody.h:268:30: note: declared here
   DeprecatedTypeProperties & type() const {
                              ^~~~
In file included from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/core/DeviceType.h:8,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/core/Device.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/core/Allocator.h:6,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/ATen.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
csrc/layer_norm_cuda.cpp: In function ‘std::vector<at::Tensor> layer_norm_gradient_affine(at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::IntArrayRef, at::Tensor, at::Tensor, double)’:
csrc/layer_norm_cuda.cpp:117:42: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
 #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
                                          ^
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/macros/Macros.h:146:65: note: in definition of macro ‘C10_UNLIKELY’
 #define C10_UNLIKELY(expr)  (__builtin_expect(static_cast<bool>(expr), 0))
                                                                 ^~~~
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/util/Exception.h:330:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
   if (C10_UNLIKELY_OR_CONST(!(cond))) {                               \
       ^~~~~~~~~~~~~~~~~~~~~
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/util/Exception.h:318:3: note: in expansion of macro ‘TORCH_CHECK_WITH_MSG’
   TORCH_CHECK_WITH_MSG(error_t, cond, "", __VA_ARGS__)
   ^~~~~~~~~~~~~~~~~~~~
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/util/Exception.h:341:32: note: in expansion of macro ‘TORCH_CHECK_WITH’
 #define TORCH_CHECK(cond, ...) TORCH_CHECK_WITH(Error, cond, __VA_ARGS__)
                                ^~~~~~~~~~~~~~~~
csrc/layer_norm_cuda.cpp:117:23: note: in expansion of macro ‘TORCH_CHECK’
 #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
                       ^~~~~~~~~~~
csrc/layer_norm_cuda.cpp:119:24: note: in expansion of macro ‘CHECK_CUDA’
 #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
                        ^~~~~~~~~~
csrc/layer_norm_cuda.cpp:218:3: note: in expansion of macro ‘CHECK_INPUT’
   CHECK_INPUT(dout);
   ^~~~~~~~~~~
In file included from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/Tensor.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/Context.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/ATen.h:5,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/core/TensorBody.h:268:30: note: declared here
   DeprecatedTypeProperties & type() const {
                              ^~~~
In file included from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/core/DeviceType.h:8,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/core/Device.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/core/Allocator.h:6,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/ATen.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
csrc/layer_norm_cuda.cpp:117:42: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
 #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
                                          ^
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/macros/Macros.h:146:65: note: in definition of macro ‘C10_UNLIKELY’
 #define C10_UNLIKELY(expr)  (__builtin_expect(static_cast<bool>(expr), 0))
                                                                 ^~~~
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/util/Exception.h:330:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
   if (C10_UNLIKELY_OR_CONST(!(cond))) {                               \
       ^~~~~~~~~~~~~~~~~~~~~
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/util/Exception.h:318:3: note: in expansion of macro ‘TORCH_CHECK_WITH_MSG’
   TORCH_CHECK_WITH_MSG(error_t, cond, "", __VA_ARGS__)
   ^~~~~~~~~~~~~~~~~~~~
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/util/Exception.h:341:32: note: in expansion of macro ‘TORCH_CHECK_WITH’
 #define TORCH_CHECK(cond, ...) TORCH_CHECK_WITH(Error, cond, __VA_ARGS__)
                                ^~~~~~~~~~~~~~~~
csrc/layer_norm_cuda.cpp:117:23: note: in expansion of macro ‘TORCH_CHECK’
 #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
                       ^~~~~~~~~~~
csrc/layer_norm_cuda.cpp:119:24: note: in expansion of macro ‘CHECK_CUDA’
 #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
                        ^~~~~~~~~~
csrc/layer_norm_cuda.cpp:219:3: note: in expansion of macro ‘CHECK_INPUT’
   CHECK_INPUT(mean);
   ^~~~~~~~~~~
In file included from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/Tensor.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/Context.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/ATen.h:5,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/core/TensorBody.h:268:30: note: declared here
   DeprecatedTypeProperties & type() const {
                              ^~~~
In file included from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/core/DeviceType.h:8,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/core/Device.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/core/Allocator.h:6,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/ATen.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
csrc/layer_norm_cuda.cpp:117:42: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
 #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
                                          ^
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/macros/Macros.h:146:65: note: in definition of macro ‘C10_UNLIKELY’
 #define C10_UNLIKELY(expr)  (__builtin_expect(static_cast<bool>(expr), 0))
                                                                 ^~~~
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/util/Exception.h:330:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
   if (C10_UNLIKELY_OR_CONST(!(cond))) {                               \
       ^~~~~~~~~~~~~~~~~~~~~
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/util/Exception.h:318:3: note: in expansion of macro ‘TORCH_CHECK_WITH_MSG’
   TORCH_CHECK_WITH_MSG(error_t, cond, "", __VA_ARGS__)
   ^~~~~~~~~~~~~~~~~~~~
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/util/Exception.h:341:32: note: in expansion of macro ‘TORCH_CHECK_WITH’
 #define TORCH_CHECK(cond, ...) TORCH_CHECK_WITH(Error, cond, __VA_ARGS__)
                                ^~~~~~~~~~~~~~~~
csrc/layer_norm_cuda.cpp:117:23: note: in expansion of macro ‘TORCH_CHECK’
 #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
                       ^~~~~~~~~~~
csrc/layer_norm_cuda.cpp:119:24: note: in expansion of macro ‘CHECK_CUDA’
 #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
                        ^~~~~~~~~~
csrc/layer_norm_cuda.cpp:220:3: note: in expansion of macro ‘CHECK_INPUT’
   CHECK_INPUT(invvar);
   ^~~~~~~~~~~
In file included from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/Tensor.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/Context.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/ATen.h:5,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/core/TensorBody.h:268:30: note: declared here
   DeprecatedTypeProperties & type() const {
                              ^~~~
In file included from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/core/DeviceType.h:8,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/core/Device.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/core/Allocator.h:6,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/ATen.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
csrc/layer_norm_cuda.cpp:117:42: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
 #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
                                          ^
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/macros/Macros.h:146:65: note: in definition of macro ‘C10_UNLIKELY’
 #define C10_UNLIKELY(expr)  (__builtin_expect(static_cast<bool>(expr), 0))
                                                                 ^~~~
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/util/Exception.h:330:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
   if (C10_UNLIKELY_OR_CONST(!(cond))) {                               \
       ^~~~~~~~~~~~~~~~~~~~~
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/util/Exception.h:318:3: note: in expansion of macro ‘TORCH_CHECK_WITH_MSG’
   TORCH_CHECK_WITH_MSG(error_t, cond, "", __VA_ARGS__)
   ^~~~~~~~~~~~~~~~~~~~
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/util/Exception.h:341:32: note: in expansion of macro ‘TORCH_CHECK_WITH’
 #define TORCH_CHECK(cond, ...) TORCH_CHECK_WITH(Error, cond, __VA_ARGS__)
                                ^~~~~~~~~~~~~~~~
csrc/layer_norm_cuda.cpp:117:23: note: in expansion of macro ‘TORCH_CHECK’
 #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
                       ^~~~~~~~~~~
csrc/layer_norm_cuda.cpp:119:24: note: in expansion of macro ‘CHECK_CUDA’
 #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
                        ^~~~~~~~~~
csrc/layer_norm_cuda.cpp:221:3: note: in expansion of macro ‘CHECK_INPUT’
   CHECK_INPUT(input);
   ^~~~~~~~~~~
In file included from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/Tensor.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/Context.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/ATen.h:5,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/core/TensorBody.h:268:30: note: declared here
   DeprecatedTypeProperties & type() const {
                              ^~~~
In file included from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/core/DeviceType.h:8,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/core/Device.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/core/Allocator.h:6,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/ATen.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
csrc/layer_norm_cuda.cpp:117:42: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
 #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
                                          ^
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/macros/Macros.h:146:65: note: in definition of macro ‘C10_UNLIKELY’
 #define C10_UNLIKELY(expr)  (__builtin_expect(static_cast<bool>(expr), 0))
                                                                 ^~~~
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/util/Exception.h:330:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
   if (C10_UNLIKELY_OR_CONST(!(cond))) {                               \
       ^~~~~~~~~~~~~~~~~~~~~
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/util/Exception.h:318:3: note: in expansion of macro ‘TORCH_CHECK_WITH_MSG’
   TORCH_CHECK_WITH_MSG(error_t, cond, "", __VA_ARGS__)
   ^~~~~~~~~~~~~~~~~~~~
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/util/Exception.h:341:32: note: in expansion of macro ‘TORCH_CHECK_WITH’
 #define TORCH_CHECK(cond, ...) TORCH_CHECK_WITH(Error, cond, __VA_ARGS__)
                                ^~~~~~~~~~~~~~~~
csrc/layer_norm_cuda.cpp:117:23: note: in expansion of macro ‘TORCH_CHECK’
 #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
                       ^~~~~~~~~~~
csrc/layer_norm_cuda.cpp:119:24: note: in expansion of macro ‘CHECK_CUDA’
 #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
                        ^~~~~~~~~~
csrc/layer_norm_cuda.cpp:222:3: note: in expansion of macro ‘CHECK_INPUT’
   CHECK_INPUT(gamma);
   ^~~~~~~~~~~
In file included from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/Tensor.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/Context.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/ATen.h:5,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/core/TensorBody.h:268:30: note: declared here
   DeprecatedTypeProperties & type() const {
                              ^~~~
In file included from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/core/DeviceType.h:8,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/core/Device.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/core/Allocator.h:6,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/ATen.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
csrc/layer_norm_cuda.cpp:117:42: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
 #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
                                          ^
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/macros/Macros.h:146:65: note: in definition of macro ‘C10_UNLIKELY’
 #define C10_UNLIKELY(expr)  (__builtin_expect(static_cast<bool>(expr), 0))
                                                                 ^~~~
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/util/Exception.h:330:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
   if (C10_UNLIKELY_OR_CONST(!(cond))) {                               \
       ^~~~~~~~~~~~~~~~~~~~~
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/util/Exception.h:318:3: note: in expansion of macro ‘TORCH_CHECK_WITH_MSG’
   TORCH_CHECK_WITH_MSG(error_t, cond, "", __VA_ARGS__)
   ^~~~~~~~~~~~~~~~~~~~
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/c10/util/Exception.h:341:32: note: in expansion of macro ‘TORCH_CHECK_WITH’
 #define TORCH_CHECK(cond, ...) TORCH_CHECK_WITH(Error, cond, __VA_ARGS__)
                                ^~~~~~~~~~~~~~~~
csrc/layer_norm_cuda.cpp:117:23: note: in expansion of macro ‘TORCH_CHECK’
 #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
                       ^~~~~~~~~~~
csrc/layer_norm_cuda.cpp:119:24: note: in expansion of macro ‘CHECK_CUDA’
 #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
                        ^~~~~~~~~~
csrc/layer_norm_cuda.cpp:223:3: note: in expansion of macro ‘CHECK_INPUT’
   CHECK_INPUT(beta);
   ^~~~~~~~~~~
In file included from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/Tensor.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/Context.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/ATen.h:5,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/extension.h:4,
                 from csrc/layer_norm_cuda.cpp:1:
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/ATen/core/TensorBody.h:268:30: note: declared here
   DeprecatedTypeProperties & type() const {
                              ^~~~
/usr/local/cuda/bin/nvcc -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/TH -I/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/user/miniconda3/envs/ait/include/python3.7m -c csrc/layer_norm_cuda_kernel.cu -o build/temp.linux-x86_64-3.7/csrc/layer_norm_cuda_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -maxrregcount=50 -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=fused_layer_norm_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=sm_61 -std=c++14
g++ -pthread -shared -B /home/user/miniconda3/envs/ait/compiler_compat -L/home/user/miniconda3/envs/ait/lib -Wl,-rpath=/home/user/miniconda3/envs/ait/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.7/csrc/layer_norm_cuda.o build/temp.linux-x86_64-3.7/csrc/layer_norm_cuda_kernel.o -L/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda -o build/lib.linux-x86_64-3.7/fused_layer_norm_cuda.cpython-37m-x86_64-linux-gnu.so
installing to build/bdist.linux-x86_64/wheel
running install
running install_lib
creating build/bdist.linux-x86_64
creating build/bdist.linux-x86_64/wheel
copying build/lib.linux-x86_64-3.7/fused_layer_norm_cuda.cpython-37m-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/wheel
copying build/lib.linux-x86_64-3.7/amp_C.cpython-37m-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/wheel
creating build/bdist.linux-x86_64/wheel/apex
creating build/bdist.linux-x86_64/wheel/apex/contrib
creating build/bdist.linux-x86_64/wheel/apex/contrib/xentropy
copying build/lib.linux-x86_64-3.7/apex/contrib/xentropy/__init__.py -> build/bdist.linux-x86_64/wheel/apex/contrib/xentropy
copying build/lib.linux-x86_64-3.7/apex/contrib/xentropy/softmax_xentropy.py -> build/bdist.linux-x86_64/wheel/apex/contrib/xentropy
creating build/bdist.linux-x86_64/wheel/apex/contrib/optimizers
copying build/lib.linux-x86_64-3.7/apex/contrib/optimizers/__init__.py -> build/bdist.linux-x86_64/wheel/apex/contrib/optimizers
copying build/lib.linux-x86_64-3.7/apex/contrib/optimizers/fp16_optimizer.py -> build/bdist.linux-x86_64/wheel/apex/contrib/optimizers
copying build/lib.linux-x86_64-3.7/apex/contrib/optimizers/fused_sgd.py -> build/bdist.linux-x86_64/wheel/apex/contrib/optimizers
copying build/lib.linux-x86_64-3.7/apex/contrib/optimizers/fused_adam.py -> build/bdist.linux-x86_64/wheel/apex/contrib/optimizers
copying build/lib.linux-x86_64-3.7/apex/contrib/__init__.py -> build/bdist.linux-x86_64/wheel/apex/contrib
creating build/bdist.linux-x86_64/wheel/apex/contrib/groupbn
copying build/lib.linux-x86_64-3.7/apex/contrib/groupbn/__init__.py -> build/bdist.linux-x86_64/wheel/apex/contrib/groupbn
copying build/lib.linux-x86_64-3.7/apex/contrib/groupbn/batch_norm.py -> build/bdist.linux-x86_64/wheel/apex/contrib/groupbn
creating build/bdist.linux-x86_64/wheel/apex/parallel
copying build/lib.linux-x86_64-3.7/apex/parallel/LARC.py -> build/bdist.linux-x86_64/wheel/apex/parallel
copying build/lib.linux-x86_64-3.7/apex/parallel/distributed.py -> build/bdist.linux-x86_64/wheel/apex/parallel
copying build/lib.linux-x86_64-3.7/apex/parallel/__init__.py -> build/bdist.linux-x86_64/wheel/apex/parallel
copying build/lib.linux-x86_64-3.7/apex/parallel/optimized_sync_batchnorm.py -> build/bdist.linux-x86_64/wheel/apex/parallel
copying build/lib.linux-x86_64-3.7/apex/parallel/sync_batchnorm.py -> build/bdist.linux-x86_64/wheel/apex/parallel
copying build/lib.linux-x86_64-3.7/apex/parallel/multiproc.py -> build/bdist.linux-x86_64/wheel/apex/parallel
copying build/lib.linux-x86_64-3.7/apex/parallel/optimized_sync_batchnorm_kernel.py -> build/bdist.linux-x86_64/wheel/apex/parallel
copying build/lib.linux-x86_64-3.7/apex/parallel/sync_batchnorm_kernel.py -> build/bdist.linux-x86_64/wheel/apex/parallel
creating build/bdist.linux-x86_64/wheel/apex/multi_tensor_apply
copying build/lib.linux-x86_64-3.7/apex/multi_tensor_apply/__init__.py -> build/bdist.linux-x86_64/wheel/apex/multi_tensor_apply
copying build/lib.linux-x86_64-3.7/apex/multi_tensor_apply/multi_tensor_apply.py -> build/bdist.linux-x86_64/wheel/apex/multi_tensor_apply
creating build/bdist.linux-x86_64/wheel/apex/normalization
copying build/lib.linux-x86_64-3.7/apex/normalization/fused_layer_norm.py -> build/bdist.linux-x86_64/wheel/apex/normalization
copying build/lib.linux-x86_64-3.7/apex/normalization/__init__.py -> build/bdist.linux-x86_64/wheel/apex/normalization
creating build/bdist.linux-x86_64/wheel/apex/optimizers
copying build/lib.linux-x86_64-3.7/apex/optimizers/fused_lamb.py -> build/bdist.linux-x86_64/wheel/apex/optimizers
copying build/lib.linux-x86_64-3.7/apex/optimizers/fused_novograd.py -> build/bdist.linux-x86_64/wheel/apex/optimizers
copying build/lib.linux-x86_64-3.7/apex/optimizers/__init__.py -> build/bdist.linux-x86_64/wheel/apex/optimizers
copying build/lib.linux-x86_64-3.7/apex/optimizers/fused_sgd.py -> build/bdist.linux-x86_64/wheel/apex/optimizers
copying build/lib.linux-x86_64-3.7/apex/optimizers/fused_adam.py -> build/bdist.linux-x86_64/wheel/apex/optimizers
creating build/bdist.linux-x86_64/wheel/apex/pyprof
creating build/bdist.linux-x86_64/wheel/apex/pyprof/nvtx
copying build/lib.linux-x86_64-3.7/apex/pyprof/nvtx/__init__.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/nvtx
copying build/lib.linux-x86_64-3.7/apex/pyprof/nvtx/nvmarker.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/nvtx
copying build/lib.linux-x86_64-3.7/apex/pyprof/__init__.py -> build/bdist.linux-x86_64/wheel/apex/pyprof
creating build/bdist.linux-x86_64/wheel/apex/pyprof/parse
copying build/lib.linux-x86_64-3.7/apex/pyprof/parse/nvvp.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/parse
copying build/lib.linux-x86_64-3.7/apex/pyprof/parse/__main__.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/parse
copying build/lib.linux-x86_64-3.7/apex/pyprof/parse/parse.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/parse
copying build/lib.linux-x86_64-3.7/apex/pyprof/parse/__init__.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/parse
copying build/lib.linux-x86_64-3.7/apex/pyprof/parse/db.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/parse
copying build/lib.linux-x86_64-3.7/apex/pyprof/parse/kernel.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/parse
creating build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/pyprof/prof/loss.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/pyprof/prof/optim.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/pyprof/prof/output.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/pyprof/prof/misc.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/pyprof/prof/__main__.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/pyprof/prof/usage.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/pyprof/prof/index_slice_join_mutate.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/pyprof/prof/pointwise.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/pyprof/prof/blas.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/pyprof/prof/embedding.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/pyprof/prof/reduction.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/pyprof/prof/prof.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/pyprof/prof/utility.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/pyprof/prof/__init__.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/pyprof/prof/softmax.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/pyprof/prof/recurrentCell.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/pyprof/prof/randomSample.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/pyprof/prof/conv.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/pyprof/prof/data.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/pyprof/prof/pooling.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/pyprof/prof/activation.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/pyprof/prof/linear.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/pyprof/prof/normalization.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/pyprof/prof/base.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/pyprof/prof/dropout.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/pyprof/prof/convert.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/__init__.py -> build/bdist.linux-x86_64/wheel/apex
creating build/bdist.linux-x86_64/wheel/apex/RNN
copying build/lib.linux-x86_64-3.7/apex/RNN/cells.py -> build/bdist.linux-x86_64/wheel/apex/RNN
copying build/lib.linux-x86_64-3.7/apex/RNN/__init__.py -> build/bdist.linux-x86_64/wheel/apex/RNN
copying build/lib.linux-x86_64-3.7/apex/RNN/RNNBackend.py -> build/bdist.linux-x86_64/wheel/apex/RNN
copying build/lib.linux-x86_64-3.7/apex/RNN/models.py -> build/bdist.linux-x86_64/wheel/apex/RNN
creating build/bdist.linux-x86_64/wheel/apex/amp
copying build/lib.linux-x86_64-3.7/apex/amp/utils.py -> build/bdist.linux-x86_64/wheel/apex/amp
copying build/lib.linux-x86_64-3.7/apex/amp/rnn_compat.py -> build/bdist.linux-x86_64/wheel/apex/amp
copying build/lib.linux-x86_64-3.7/apex/amp/frontend.py -> build/bdist.linux-x86_64/wheel/apex/amp
copying build/lib.linux-x86_64-3.7/apex/amp/_initialize.py -> build/bdist.linux-x86_64/wheel/apex/amp
copying build/lib.linux-x86_64-3.7/apex/amp/amp.py -> build/bdist.linux-x86_64/wheel/apex/amp
copying build/lib.linux-x86_64-3.7/apex/amp/_process_optimizer.py -> build/bdist.linux-x86_64/wheel/apex/amp
copying build/lib.linux-x86_64-3.7/apex/amp/opt.py -> build/bdist.linux-x86_64/wheel/apex/amp
copying build/lib.linux-x86_64-3.7/apex/amp/__init__.py -> build/bdist.linux-x86_64/wheel/apex/amp
copying build/lib.linux-x86_64-3.7/apex/amp/handle.py -> build/bdist.linux-x86_64/wheel/apex/amp
copying build/lib.linux-x86_64-3.7/apex/amp/scaler.py -> build/bdist.linux-x86_64/wheel/apex/amp
copying build/lib.linux-x86_64-3.7/apex/amp/__version__.py -> build/bdist.linux-x86_64/wheel/apex/amp
copying build/lib.linux-x86_64-3.7/apex/amp/compat.py -> build/bdist.linux-x86_64/wheel/apex/amp
copying build/lib.linux-x86_64-3.7/apex/amp/_amp_state.py -> build/bdist.linux-x86_64/wheel/apex/amp
creating build/bdist.linux-x86_64/wheel/apex/amp/lists
copying build/lib.linux-x86_64-3.7/apex/amp/lists/torch_overrides.py -> build/bdist.linux-x86_64/wheel/apex/amp/lists
copying build/lib.linux-x86_64-3.7/apex/amp/lists/__init__.py -> build/bdist.linux-x86_64/wheel/apex/amp/lists
copying build/lib.linux-x86_64-3.7/apex/amp/lists/functional_overrides.py -> build/bdist.linux-x86_64/wheel/apex/amp/lists
copying build/lib.linux-x86_64-3.7/apex/amp/lists/tensor_overrides.py -> build/bdist.linux-x86_64/wheel/apex/amp/lists
copying build/lib.linux-x86_64-3.7/apex/amp/wrap.py -> build/bdist.linux-x86_64/wheel/apex/amp
creating build/bdist.linux-x86_64/wheel/apex/reparameterization
copying build/lib.linux-x86_64-3.7/apex/reparameterization/reparameterization.py -> build/bdist.linux-x86_64/wheel/apex/reparameterization
copying build/lib.linux-x86_64-3.7/apex/reparameterization/__init__.py -> build/bdist.linux-x86_64/wheel/apex/reparameterization
copying build/lib.linux-x86_64-3.7/apex/reparameterization/weight_norm.py -> build/bdist.linux-x86_64/wheel/apex/reparameterization
creating build/bdist.linux-x86_64/wheel/apex/fp16_utils
copying build/lib.linux-x86_64-3.7/apex/fp16_utils/__init__.py -> build/bdist.linux-x86_64/wheel/apex/fp16_utils
copying build/lib.linux-x86_64-3.7/apex/fp16_utils/loss_scaler.py -> build/bdist.linux-x86_64/wheel/apex/fp16_utils
copying build/lib.linux-x86_64-3.7/apex/fp16_utils/fp16_optimizer.py -> build/bdist.linux-x86_64/wheel/apex/fp16_utils
copying build/lib.linux-x86_64-3.7/apex/fp16_utils/fp16util.py -> build/bdist.linux-x86_64/wheel/apex/fp16_utils
copying build/lib.linux-x86_64-3.7/syncbn.cpython-37m-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/wheel
copying build/lib.linux-x86_64-3.7/apex_C.cpython-37m-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/wheel
running install_egg_info
running egg_info
creating apex.egg-info
writing apex.egg-info/PKG-INFO
writing dependency_links to apex.egg-info/dependency_links.txt
writing top-level names to apex.egg-info/top_level.txt
writing manifest file 'apex.egg-info/SOURCES.txt'
'license_file' option was not specified
reading manifest file 'apex.egg-info/SOURCES.txt'
writing manifest file 'apex.egg-info/SOURCES.txt'
Copying apex.egg-info to build/bdist.linux-x86_64/wheel/apex-0.1-py3.7.egg-info
Copying dependency_links.txt to build/bdist.linux-x86_64/wheel/apex-0.1-py3.7.egg-info/dependency_links.txt
Copying PKG-INFO to build/bdist.linux-x86_64/wheel/apex-0.1-py3.7.egg-info/PKG-INFO
Copying top_level.txt to build/bdist.linux-x86_64/wheel/apex-0.1-py3.7.egg-info/top_level.txt
Copying SOURCES.txt to build/bdist.linux-x86_64/wheel/apex-0.1-py3.7.egg-info/SOURCES.txt
running install_scripts
adding license file "LICENSE" (matched pattern "LICEN[CS]E*")
creating build/bdist.linux-x86_64/wheel/apex-0.1.dist-info/WHEEL
creating 'dist/apex-0.1-cp37-cp37m-linux_x86_64.whl' and adding 'build/bdist.linux-x86_64/wheel' to it
adding 'amp_C.cpython-37m-x86_64-linux-gnu.so'
adding 'apex_C.cpython-37m-x86_64-linux-gnu.so'
adding 'fused_layer_norm_cuda.cpython-37m-x86_64-linux-gnu.so'
adding 'syncbn.cpython-37m-x86_64-linux-gnu.so'
adding 'apex/__init__.py'
adding 'apex/RNN/RNNBackend.py'
adding 'apex/RNN/__init__.py'
adding 'apex/RNN/cells.py'
adding 'apex/RNN/models.py'
adding 'apex/amp/__init__.py'
adding 'apex/amp/__version__.py'
adding 'apex/amp/_amp_state.py'
adding 'apex/amp/_initialize.py'
adding 'apex/amp/_process_optimizer.py'
adding 'apex/amp/amp.py'
adding 'apex/amp/compat.py'
adding 'apex/amp/frontend.py'
adding 'apex/amp/handle.py'
adding 'apex/amp/opt.py'
adding 'apex/amp/rnn_compat.py'
adding 'apex/amp/scaler.py'
adding 'apex/amp/utils.py'
adding 'apex/amp/wrap.py'
adding 'apex/amp/lists/__init__.py'
adding 'apex/amp/lists/functional_overrides.py'
adding 'apex/amp/lists/tensor_overrides.py'
adding 'apex/amp/lists/torch_overrides.py'
adding 'apex/contrib/__init__.py'
adding 'apex/contrib/groupbn/__init__.py'
adding 'apex/contrib/groupbn/batch_norm.py'
adding 'apex/contrib/optimizers/__init__.py'
adding 'apex/contrib/optimizers/fp16_optimizer.py'
adding 'apex/contrib/optimizers/fused_adam.py'
adding 'apex/contrib/optimizers/fused_sgd.py'
adding 'apex/contrib/xentropy/__init__.py'
adding 'apex/contrib/xentropy/softmax_xentropy.py'
adding 'apex/fp16_utils/__init__.py'
adding 'apex/fp16_utils/fp16_optimizer.py'
adding 'apex/fp16_utils/fp16util.py'
adding 'apex/fp16_utils/loss_scaler.py'
adding 'apex/multi_tensor_apply/__init__.py'
adding 'apex/multi_tensor_apply/multi_tensor_apply.py'
adding 'apex/normalization/__init__.py'
adding 'apex/normalization/fused_layer_norm.py'
adding 'apex/optimizers/__init__.py'
adding 'apex/optimizers/fused_adam.py'
adding 'apex/optimizers/fused_lamb.py'
adding 'apex/optimizers/fused_novograd.py'
adding 'apex/optimizers/fused_sgd.py'
adding 'apex/parallel/LARC.py'
adding 'apex/parallel/__init__.py'
adding 'apex/parallel/distributed.py'
adding 'apex/parallel/multiproc.py'
adding 'apex/parallel/optimized_sync_batchnorm.py'
adding 'apex/parallel/optimized_sync_batchnorm_kernel.py'
adding 'apex/parallel/sync_batchnorm.py'
adding 'apex/parallel/sync_batchnorm_kernel.py'
adding 'apex/pyprof/__init__.py'
adding 'apex/pyprof/nvtx/__init__.py'
adding 'apex/pyprof/nvtx/nvmarker.py'
adding 'apex/pyprof/parse/__init__.py'
adding 'apex/pyprof/parse/__main__.py'
adding 'apex/pyprof/parse/db.py'
adding 'apex/pyprof/parse/kernel.py'
adding 'apex/pyprof/parse/nvvp.py'
adding 'apex/pyprof/parse/parse.py'
adding 'apex/pyprof/prof/__init__.py'
adding 'apex/pyprof/prof/__main__.py'
adding 'apex/pyprof/prof/activation.py'
adding 'apex/pyprof/prof/base.py'
adding 'apex/pyprof/prof/blas.py'
adding 'apex/pyprof/prof/conv.py'
adding 'apex/pyprof/prof/convert.py'
adding 'apex/pyprof/prof/data.py'
adding 'apex/pyprof/prof/dropout.py'
adding 'apex/pyprof/prof/embedding.py'
adding 'apex/pyprof/prof/index_slice_join_mutate.py'
adding 'apex/pyprof/prof/linear.py'
adding 'apex/pyprof/prof/loss.py'
adding 'apex/pyprof/prof/misc.py'
adding 'apex/pyprof/prof/normalization.py'
adding 'apex/pyprof/prof/optim.py'
adding 'apex/pyprof/prof/output.py'
adding 'apex/pyprof/prof/pointwise.py'
adding 'apex/pyprof/prof/pooling.py'
adding 'apex/pyprof/prof/prof.py'
adding 'apex/pyprof/prof/randomSample.py'
adding 'apex/pyprof/prof/recurrentCell.py'
adding 'apex/pyprof/prof/reduction.py'
adding 'apex/pyprof/prof/softmax.py'
adding 'apex/pyprof/prof/usage.py'
adding 'apex/pyprof/prof/utility.py'
adding 'apex/reparameterization/__init__.py'
adding 'apex/reparameterization/reparameterization.py'
adding 'apex/reparameterization/weight_norm.py'
adding 'apex-0.1.dist-info/LICENSE'
adding 'apex-0.1.dist-info/METADATA'
adding 'apex-0.1.dist-info/WHEEL'
adding 'apex-0.1.dist-info/top_level.txt'
adding 'apex-0.1.dist-info/RECORD'
removing build/bdist.linux-x86_64/wheel
/home/user/code/DeepSpeed
Installing apex locally so that deepspeed will build
WARNING: Skipping apex as it is not installed.
Using pip 20.2.3 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/pip (python 3.7)
Non-user install because site-packages writeable
Created temporary directory: /tmp/pip-ephem-wheel-cache-ra6kifta
Created temporary directory: /tmp/pip-req-tracker-4fweqvlb
Initialized build tracking at /tmp/pip-req-tracker-4fweqvlb
Created build tracker: /tmp/pip-req-tracker-4fweqvlb
Entered build tracker: /tmp/pip-req-tracker-4fweqvlb
Created temporary directory: /tmp/pip-install-82wldn2p
Processing ./third_party/apex/dist/apex-0.1-cp37-cp37m-linux_x86_64.whl
  Added apex==0.1 from file:///home/user/code/DeepSpeed/third_party/apex/dist/apex-0.1-cp37-cp37m-linux_x86_64.whl to build tracker '/tmp/pip-req-tracker-4fweqvlb'
  Removed apex==0.1 from file:///home/user/code/DeepSpeed/third_party/apex/dist/apex-0.1-cp37-cp37m-linux_x86_64.whl from build tracker '/tmp/pip-req-tracker-4fweqvlb'
Installing collected packages: apex

Successfully installed apex-0.1
Removed build tracker: '/tmp/pip-req-tracker-4fweqvlb'
Building deepspeed wheel
./install.sh: line 196: 69900 Floating point exception(core dumped) python setup.py -v bdist_wheel
Error on line 195
Fail to install deepspeed

@tjruwase
Contributor

@drfinkus Sorry that this is still a problem. I can see the floating point exception in the log, but not its cause. Could you please rerun install.sh incrementally to reduce the log size? You can do this by adding -n to the command-line arguments: install.sh ... -n
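A minimal sketch, not part of install.sh: the log shows the crash happening in `python setup.py -v bdist_wheel` ("Floating point exception(core dumped)"). One way to get closer to the cause is to rerun only that step with Python's standard `faulthandler` module enabled (`-X faulthandler`). faulthandler handles fatal signals such as SIGFPE and prints the Python traceback that was active when the signal arrived. If the fault comes from native code, the traceback may still show which Python call led into it. The helper name `run_with_faulthandler` and the script name used below are hypothetical.

```python
import subprocess
import sys


def run_with_faulthandler(args, cwd=None):
    """Run `python -X faulthandler <args>` and return (returncode, stderr).

    faulthandler installs handlers for fatal signals (SIGFPE, SIGSEGV, ...),
    so a hard crash also prints a "Fatal Python error" banner plus the
    Python traceback that was executing when the signal was received.
    """
    cmd = [sys.executable, "-X", "faulthandler", *args]
    # stdout passes through to the console; stderr is captured so the
    # traceback can be inspected or pasted into the issue.
    proc = subprocess.run(cmd, cwd=cwd, stderr=subprocess.PIPE, text=True)
    return proc.returncode, proc.stderr


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage from the DeepSpeed checkout:
    #   python crashtrace.py setup.py -v bdist_wheel
    code, err = run_with_faulthandler(sys.argv[1:])
    print("exit code:", code)  # a negative value means "killed by signal"
    print(err)
```

On Linux, a process killed by a signal gets a negative return code from `subprocess` (for SIGFPE, `-8`). The captured stderr should then include the faulthandler traceback.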

@drfinkus
Author

drfinkus commented Oct 21, 2020

@tjruwase Sure, thanks for the help! See below. I hope this helps, and let me know if I can help in any other way!

(ait) user@desktop:~/code/DeepSpeed$ ./install.sh -n
No hostfile exists at /job/hostfile, installing locally
Using pip 20.2.4 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/pip (python 3.7)
Non-user install because site-packages writeable
Created temporary directory: /tmp/pip-ephem-wheel-cache-2o9xf9lz
Created temporary directory: /tmp/pip-req-tracker-lu7qlrs5
Initialized build tracking at /tmp/pip-req-tracker-lu7qlrs5
Created build tracker: /tmp/pip-req-tracker-lu7qlrs5
Entered build tracker: /tmp/pip-req-tracker-lu7qlrs5
Created temporary directory: /tmp/pip-install-yrc0vse_
Requirement already satisfied: torch>=1.2 in /home/user/miniconda3/envs/ait/lib/python3.7/site-packages (from -r requirements/requirements.txt (line 1)) (1.6.0)
Requirement already satisfied: torchvision>=0.4.0 in /home/user/miniconda3/envs/ait/lib/python3.7/site-packages (from -r requirements/requirements.txt (line 2)) (0.7.0)
Requirement already satisfied: tqdm in /home/user/miniconda3/envs/ait/lib/python3.7/site-packages (from -r requirements/requirements.txt (line 3)) (4.50.2)
Requirement already satisfied: psutil in /home/user/miniconda3/envs/ait/lib/python3.7/site-packages (from -r requirements/requirements.txt (line 4)) (5.7.2)
Requirement already satisfied: cpufeature in /home/user/miniconda3/envs/ait/lib/python3.7/site-packages (from -r requirements/requirements.txt (line 5)) (0.1.1)
Requirement already satisfied: tensorboardX==1.8 in /home/user/miniconda3/envs/ait/lib/python3.7/site-packages (from -r requirements/requirements.txt (line 6)) (1.8)
Requirement already satisfied: future in /home/user/miniconda3/envs/ait/lib/python3.7/site-packages (from torch>=1.2->-r requirements/requirements.txt (line 1)) (0.18.2)
Requirement already satisfied: numpy in /home/user/miniconda3/envs/ait/lib/python3.7/site-packages (from torch>=1.2->-r requirements/requirements.txt (line 1)) (1.18.5)
Requirement already satisfied: pillow>=4.1.1 in /home/user/miniconda3/envs/ait/lib/python3.7/site-packages (from torchvision>=0.4.0->-r requirements/requirements.txt (line 2)) (8.0.0)
Requirement already satisfied: six in /home/user/miniconda3/envs/ait/lib/python3.7/site-packages (from tensorboardX==1.8->-r requirements/requirements.txt (line 6)) (1.15.0)
Requirement already satisfied: protobuf>=3.2.0 in /home/user/miniconda3/envs/ait/lib/python3.7/site-packages (from tensorboardX==1.8->-r requirements/requirements.txt (line 6)) (3.13.0)
Requirement already satisfied: setuptools in /home/user/miniconda3/envs/ait/lib/python3.7/site-packages (from protobuf>=3.2.0->tensorboardX==1.8->-r requirements/requirements.txt (line 6)) (50.3.2)
Removed build tracker: '/tmp/pip-req-tracker-lu7qlrs5'
Checking out sub-module(s)
Building apex wheel
torch.__version__  =  1.6.0
setup.py:43: UserWarning: Option --pyprof not specified. Not installing PyProf dependencies!
  warnings.warn("Option --pyprof not specified. Not installing PyProf dependencies!")

Compiling cuda extensions with
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
from /usr/local/cuda/bin

running bdist_wheel
/home/user/miniconda3/envs/ait/lib/python3.7/site-packages/torch/utils/cpp_extension.py:335: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
  warnings.warn(msg.format('we could not find ninja.'))
running build
running build_py
not copying apex/__init__.py (output up-to-date)
not copying apex/contrib/__init__.py (output up-to-date)
not copying apex/parallel/LARC.py (output up-to-date)
not copying apex/parallel/distributed.py (output up-to-date)
not copying apex/parallel/__init__.py (output up-to-date)
not copying apex/parallel/optimized_sync_batchnorm.py (output up-to-date)
not copying apex/parallel/sync_batchnorm.py (output up-to-date)
not copying apex/parallel/multiproc.py (output up-to-date)
not copying apex/parallel/optimized_sync_batchnorm_kernel.py (output up-to-date)
not copying apex/parallel/sync_batchnorm_kernel.py (output up-to-date)
not copying apex/multi_tensor_apply/__init__.py (output up-to-date)
not copying apex/multi_tensor_apply/multi_tensor_apply.py (output up-to-date)
not copying apex/normalization/fused_layer_norm.py (output up-to-date)
not copying apex/normalization/__init__.py (output up-to-date)
not copying apex/optimizers/fused_lamb.py (output up-to-date)
not copying apex/optimizers/fused_novograd.py (output up-to-date)
not copying apex/optimizers/__init__.py (output up-to-date)
not copying apex/optimizers/fused_sgd.py (output up-to-date)
not copying apex/optimizers/fused_adam.py (output up-to-date)
not copying apex/pyprof/__init__.py (output up-to-date)
not copying apex/RNN/cells.py (output up-to-date)
not copying apex/RNN/__init__.py (output up-to-date)
not copying apex/RNN/RNNBackend.py (output up-to-date)
not copying apex/RNN/models.py (output up-to-date)
not copying apex/amp/utils.py (output up-to-date)
not copying apex/amp/rnn_compat.py (output up-to-date)
not copying apex/amp/frontend.py (output up-to-date)
not copying apex/amp/_initialize.py (output up-to-date)
not copying apex/amp/amp.py (output up-to-date)
not copying apex/amp/_process_optimizer.py (output up-to-date)
not copying apex/amp/opt.py (output up-to-date)
not copying apex/amp/__init__.py (output up-to-date)
not copying apex/amp/handle.py (output up-to-date)
not copying apex/amp/scaler.py (output up-to-date)
not copying apex/amp/__version__.py (output up-to-date)
not copying apex/amp/compat.py (output up-to-date)
not copying apex/amp/_amp_state.py (output up-to-date)
not copying apex/amp/wrap.py (output up-to-date)
not copying apex/reparameterization/reparameterization.py (output up-to-date)
not copying apex/reparameterization/__init__.py (output up-to-date)
not copying apex/reparameterization/weight_norm.py (output up-to-date)
not copying apex/fp16_utils/__init__.py (output up-to-date)
not copying apex/fp16_utils/loss_scaler.py (output up-to-date)
not copying apex/fp16_utils/fp16_optimizer.py (output up-to-date)
not copying apex/fp16_utils/fp16util.py (output up-to-date)
not copying apex/contrib/xentropy/__init__.py (output up-to-date)
not copying apex/contrib/xentropy/softmax_xentropy.py (output up-to-date)
not copying apex/contrib/optimizers/__init__.py (output up-to-date)
not copying apex/contrib/optimizers/fp16_optimizer.py (output up-to-date)
not copying apex/contrib/optimizers/fused_sgd.py (output up-to-date)
not copying apex/contrib/optimizers/fused_adam.py (output up-to-date)
not copying apex/contrib/groupbn/__init__.py (output up-to-date)
not copying apex/contrib/groupbn/batch_norm.py (output up-to-date)
not copying apex/pyprof/nvtx/__init__.py (output up-to-date)
not copying apex/pyprof/nvtx/nvmarker.py (output up-to-date)
not copying apex/pyprof/parse/nvvp.py (output up-to-date)
not copying apex/pyprof/parse/__main__.py (output up-to-date)
not copying apex/pyprof/parse/parse.py (output up-to-date)
not copying apex/pyprof/parse/__init__.py (output up-to-date)
not copying apex/pyprof/parse/db.py (output up-to-date)
not copying apex/pyprof/parse/kernel.py (output up-to-date)
not copying apex/pyprof/prof/loss.py (output up-to-date)
not copying apex/pyprof/prof/optim.py (output up-to-date)
not copying apex/pyprof/prof/output.py (output up-to-date)
not copying apex/pyprof/prof/misc.py (output up-to-date)
not copying apex/pyprof/prof/__main__.py (output up-to-date)
not copying apex/pyprof/prof/usage.py (output up-to-date)
not copying apex/pyprof/prof/index_slice_join_mutate.py (output up-to-date)
not copying apex/pyprof/prof/pointwise.py (output up-to-date)
not copying apex/pyprof/prof/blas.py (output up-to-date)
not copying apex/pyprof/prof/embedding.py (output up-to-date)
not copying apex/pyprof/prof/reduction.py (output up-to-date)
not copying apex/pyprof/prof/prof.py (output up-to-date)
not copying apex/pyprof/prof/utility.py (output up-to-date)
not copying apex/pyprof/prof/__init__.py (output up-to-date)
not copying apex/pyprof/prof/softmax.py (output up-to-date)
not copying apex/pyprof/prof/recurrentCell.py (output up-to-date)
not copying apex/pyprof/prof/randomSample.py (output up-to-date)
not copying apex/pyprof/prof/conv.py (output up-to-date)
not copying apex/pyprof/prof/data.py (output up-to-date)
not copying apex/pyprof/prof/pooling.py (output up-to-date)
not copying apex/pyprof/prof/activation.py (output up-to-date)
not copying apex/pyprof/prof/linear.py (output up-to-date)
not copying apex/pyprof/prof/normalization.py (output up-to-date)
not copying apex/pyprof/prof/base.py (output up-to-date)
not copying apex/pyprof/prof/dropout.py (output up-to-date)
not copying apex/pyprof/prof/convert.py (output up-to-date)
not copying apex/amp/lists/torch_overrides.py (output up-to-date)
not copying apex/amp/lists/__init__.py (output up-to-date)
not copying apex/amp/lists/functional_overrides.py (output up-to-date)
not copying apex/amp/lists/tensor_overrides.py (output up-to-date)
running build_ext
skipping 'apex_C' extension (up-to-date)
skipping 'amp_C' extension (up-to-date)
skipping 'syncbn' extension (up-to-date)
skipping 'fused_layer_norm_cuda' extension (up-to-date)
installing to build/bdist.linux-x86_64/wheel
running install
running install_lib
creating build/bdist.linux-x86_64/wheel
copying build/lib.linux-x86_64-3.7/fused_layer_norm_cuda.cpython-37m-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/wheel
copying build/lib.linux-x86_64-3.7/amp_C.cpython-37m-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/wheel
creating build/bdist.linux-x86_64/wheel/apex
creating build/bdist.linux-x86_64/wheel/apex/contrib
creating build/bdist.linux-x86_64/wheel/apex/contrib/xentropy
copying build/lib.linux-x86_64-3.7/apex/contrib/xentropy/__init__.py -> build/bdist.linux-x86_64/wheel/apex/contrib/xentropy
copying build/lib.linux-x86_64-3.7/apex/contrib/xentropy/softmax_xentropy.py -> build/bdist.linux-x86_64/wheel/apex/contrib/xentropy
creating build/bdist.linux-x86_64/wheel/apex/contrib/optimizers
copying build/lib.linux-x86_64-3.7/apex/contrib/optimizers/__init__.py -> build/bdist.linux-x86_64/wheel/apex/contrib/optimizers
copying build/lib.linux-x86_64-3.7/apex/contrib/optimizers/fp16_optimizer.py -> build/bdist.linux-x86_64/wheel/apex/contrib/optimizers
copying build/lib.linux-x86_64-3.7/apex/contrib/optimizers/fused_sgd.py -> build/bdist.linux-x86_64/wheel/apex/contrib/optimizers
copying build/lib.linux-x86_64-3.7/apex/contrib/optimizers/fused_adam.py -> build/bdist.linux-x86_64/wheel/apex/contrib/optimizers
copying build/lib.linux-x86_64-3.7/apex/contrib/__init__.py -> build/bdist.linux-x86_64/wheel/apex/contrib
creating build/bdist.linux-x86_64/wheel/apex/contrib/groupbn
copying build/lib.linux-x86_64-3.7/apex/contrib/groupbn/__init__.py -> build/bdist.linux-x86_64/wheel/apex/contrib/groupbn
copying build/lib.linux-x86_64-3.7/apex/contrib/groupbn/batch_norm.py -> build/bdist.linux-x86_64/wheel/apex/contrib/groupbn
creating build/bdist.linux-x86_64/wheel/apex/parallel
copying build/lib.linux-x86_64-3.7/apex/parallel/LARC.py -> build/bdist.linux-x86_64/wheel/apex/parallel
copying build/lib.linux-x86_64-3.7/apex/parallel/distributed.py -> build/bdist.linux-x86_64/wheel/apex/parallel
copying build/lib.linux-x86_64-3.7/apex/parallel/__init__.py -> build/bdist.linux-x86_64/wheel/apex/parallel
copying build/lib.linux-x86_64-3.7/apex/parallel/optimized_sync_batchnorm.py -> build/bdist.linux-x86_64/wheel/apex/parallel
copying build/lib.linux-x86_64-3.7/apex/parallel/sync_batchnorm.py -> build/bdist.linux-x86_64/wheel/apex/parallel
copying build/lib.linux-x86_64-3.7/apex/parallel/multiproc.py -> build/bdist.linux-x86_64/wheel/apex/parallel
copying build/lib.linux-x86_64-3.7/apex/parallel/optimized_sync_batchnorm_kernel.py -> build/bdist.linux-x86_64/wheel/apex/parallel
copying build/lib.linux-x86_64-3.7/apex/parallel/sync_batchnorm_kernel.py -> build/bdist.linux-x86_64/wheel/apex/parallel
creating build/bdist.linux-x86_64/wheel/apex/multi_tensor_apply
copying build/lib.linux-x86_64-3.7/apex/multi_tensor_apply/__init__.py -> build/bdist.linux-x86_64/wheel/apex/multi_tensor_apply
copying build/lib.linux-x86_64-3.7/apex/multi_tensor_apply/multi_tensor_apply.py -> build/bdist.linux-x86_64/wheel/apex/multi_tensor_apply
creating build/bdist.linux-x86_64/wheel/apex/normalization
copying build/lib.linux-x86_64-3.7/apex/normalization/fused_layer_norm.py -> build/bdist.linux-x86_64/wheel/apex/normalization
copying build/lib.linux-x86_64-3.7/apex/normalization/__init__.py -> build/bdist.linux-x86_64/wheel/apex/normalization
creating build/bdist.linux-x86_64/wheel/apex/optimizers
copying build/lib.linux-x86_64-3.7/apex/optimizers/fused_lamb.py -> build/bdist.linux-x86_64/wheel/apex/optimizers
copying build/lib.linux-x86_64-3.7/apex/optimizers/fused_novograd.py -> build/bdist.linux-x86_64/wheel/apex/optimizers
copying build/lib.linux-x86_64-3.7/apex/optimizers/__init__.py -> build/bdist.linux-x86_64/wheel/apex/optimizers
copying build/lib.linux-x86_64-3.7/apex/optimizers/fused_sgd.py -> build/bdist.linux-x86_64/wheel/apex/optimizers
copying build/lib.linux-x86_64-3.7/apex/optimizers/fused_adam.py -> build/bdist.linux-x86_64/wheel/apex/optimizers
creating build/bdist.linux-x86_64/wheel/apex/pyprof
creating build/bdist.linux-x86_64/wheel/apex/pyprof/nvtx
copying build/lib.linux-x86_64-3.7/apex/pyprof/nvtx/__init__.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/nvtx
copying build/lib.linux-x86_64-3.7/apex/pyprof/nvtx/nvmarker.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/nvtx
copying build/lib.linux-x86_64-3.7/apex/pyprof/__init__.py -> build/bdist.linux-x86_64/wheel/apex/pyprof
creating build/bdist.linux-x86_64/wheel/apex/pyprof/parse
copying build/lib.linux-x86_64-3.7/apex/pyprof/parse/nvvp.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/parse
copying build/lib.linux-x86_64-3.7/apex/pyprof/parse/__main__.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/parse
copying build/lib.linux-x86_64-3.7/apex/pyprof/parse/parse.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/parse
copying build/lib.linux-x86_64-3.7/apex/pyprof/parse/__init__.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/parse
copying build/lib.linux-x86_64-3.7/apex/pyprof/parse/db.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/parse
copying build/lib.linux-x86_64-3.7/apex/pyprof/parse/kernel.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/parse
creating build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/pyprof/prof/loss.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/pyprof/prof/optim.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/pyprof/prof/output.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/pyprof/prof/misc.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/pyprof/prof/__main__.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/pyprof/prof/usage.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/pyprof/prof/index_slice_join_mutate.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/pyprof/prof/pointwise.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/pyprof/prof/blas.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/pyprof/prof/embedding.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/pyprof/prof/reduction.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/pyprof/prof/prof.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/pyprof/prof/utility.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/pyprof/prof/__init__.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/pyprof/prof/softmax.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/pyprof/prof/recurrentCell.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/pyprof/prof/randomSample.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/pyprof/prof/conv.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/pyprof/prof/data.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/pyprof/prof/pooling.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/pyprof/prof/activation.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/pyprof/prof/linear.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/pyprof/prof/normalization.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/pyprof/prof/base.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/pyprof/prof/dropout.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/pyprof/prof/convert.py -> build/bdist.linux-x86_64/wheel/apex/pyprof/prof
copying build/lib.linux-x86_64-3.7/apex/__init__.py -> build/bdist.linux-x86_64/wheel/apex
creating build/bdist.linux-x86_64/wheel/apex/RNN
copying build/lib.linux-x86_64-3.7/apex/RNN/cells.py -> build/bdist.linux-x86_64/wheel/apex/RNN
copying build/lib.linux-x86_64-3.7/apex/RNN/__init__.py -> build/bdist.linux-x86_64/wheel/apex/RNN
copying build/lib.linux-x86_64-3.7/apex/RNN/RNNBackend.py -> build/bdist.linux-x86_64/wheel/apex/RNN
copying build/lib.linux-x86_64-3.7/apex/RNN/models.py -> build/bdist.linux-x86_64/wheel/apex/RNN
creating build/bdist.linux-x86_64/wheel/apex/amp
copying build/lib.linux-x86_64-3.7/apex/amp/utils.py -> build/bdist.linux-x86_64/wheel/apex/amp
copying build/lib.linux-x86_64-3.7/apex/amp/rnn_compat.py -> build/bdist.linux-x86_64/wheel/apex/amp
copying build/lib.linux-x86_64-3.7/apex/amp/frontend.py -> build/bdist.linux-x86_64/wheel/apex/amp
copying build/lib.linux-x86_64-3.7/apex/amp/_initialize.py -> build/bdist.linux-x86_64/wheel/apex/amp
copying build/lib.linux-x86_64-3.7/apex/amp/amp.py -> build/bdist.linux-x86_64/wheel/apex/amp
copying build/lib.linux-x86_64-3.7/apex/amp/_process_optimizer.py -> build/bdist.linux-x86_64/wheel/apex/amp
copying build/lib.linux-x86_64-3.7/apex/amp/opt.py -> build/bdist.linux-x86_64/wheel/apex/amp
copying build/lib.linux-x86_64-3.7/apex/amp/__init__.py -> build/bdist.linux-x86_64/wheel/apex/amp
copying build/lib.linux-x86_64-3.7/apex/amp/handle.py -> build/bdist.linux-x86_64/wheel/apex/amp
copying build/lib.linux-x86_64-3.7/apex/amp/scaler.py -> build/bdist.linux-x86_64/wheel/apex/amp
copying build/lib.linux-x86_64-3.7/apex/amp/__version__.py -> build/bdist.linux-x86_64/wheel/apex/amp
copying build/lib.linux-x86_64-3.7/apex/amp/compat.py -> build/bdist.linux-x86_64/wheel/apex/amp
copying build/lib.linux-x86_64-3.7/apex/amp/_amp_state.py -> build/bdist.linux-x86_64/wheel/apex/amp
creating build/bdist.linux-x86_64/wheel/apex/amp/lists
copying build/lib.linux-x86_64-3.7/apex/amp/lists/torch_overrides.py -> build/bdist.linux-x86_64/wheel/apex/amp/lists
copying build/lib.linux-x86_64-3.7/apex/amp/lists/__init__.py -> build/bdist.linux-x86_64/wheel/apex/amp/lists
copying build/lib.linux-x86_64-3.7/apex/amp/lists/functional_overrides.py -> build/bdist.linux-x86_64/wheel/apex/amp/lists
copying build/lib.linux-x86_64-3.7/apex/amp/lists/tensor_overrides.py -> build/bdist.linux-x86_64/wheel/apex/amp/lists
copying build/lib.linux-x86_64-3.7/apex/amp/wrap.py -> build/bdist.linux-x86_64/wheel/apex/amp
creating build/bdist.linux-x86_64/wheel/apex/reparameterization
copying build/lib.linux-x86_64-3.7/apex/reparameterization/reparameterization.py -> build/bdist.linux-x86_64/wheel/apex/reparameterization
copying build/lib.linux-x86_64-3.7/apex/reparameterization/__init__.py -> build/bdist.linux-x86_64/wheel/apex/reparameterization
copying build/lib.linux-x86_64-3.7/apex/reparameterization/weight_norm.py -> build/bdist.linux-x86_64/wheel/apex/reparameterization
creating build/bdist.linux-x86_64/wheel/apex/fp16_utils
copying build/lib.linux-x86_64-3.7/apex/fp16_utils/__init__.py -> build/bdist.linux-x86_64/wheel/apex/fp16_utils
copying build/lib.linux-x86_64-3.7/apex/fp16_utils/loss_scaler.py -> build/bdist.linux-x86_64/wheel/apex/fp16_utils
copying build/lib.linux-x86_64-3.7/apex/fp16_utils/fp16_optimizer.py -> build/bdist.linux-x86_64/wheel/apex/fp16_utils
copying build/lib.linux-x86_64-3.7/apex/fp16_utils/fp16util.py -> build/bdist.linux-x86_64/wheel/apex/fp16_utils
copying build/lib.linux-x86_64-3.7/syncbn.cpython-37m-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/wheel
copying build/lib.linux-x86_64-3.7/apex_C.cpython-37m-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/wheel
running install_egg_info
running egg_info
writing apex.egg-info/PKG-INFO
writing dependency_links to apex.egg-info/dependency_links.txt
writing top-level names to apex.egg-info/top_level.txt
'license_file' option was not specified
reading manifest file 'apex.egg-info/SOURCES.txt'
writing manifest file 'apex.egg-info/SOURCES.txt'
Copying apex.egg-info to build/bdist.linux-x86_64/wheel/apex-0.1-py3.7.egg-info
Copying dependency_links.txt to build/bdist.linux-x86_64/wheel/apex-0.1-py3.7.egg-info/dependency_links.txt
Copying PKG-INFO to build/bdist.linux-x86_64/wheel/apex-0.1-py3.7.egg-info/PKG-INFO
Copying top_level.txt to build/bdist.linux-x86_64/wheel/apex-0.1-py3.7.egg-info/top_level.txt
Copying SOURCES.txt to build/bdist.linux-x86_64/wheel/apex-0.1-py3.7.egg-info/SOURCES.txt
running install_scripts
adding license file "LICENSE" (matched pattern "LICEN[CS]E*")
creating build/bdist.linux-x86_64/wheel/apex-0.1.dist-info/WHEEL
creating 'dist/apex-0.1-cp37-cp37m-linux_x86_64.whl' and adding 'build/bdist.linux-x86_64/wheel' to it
adding 'amp_C.cpython-37m-x86_64-linux-gnu.so'
adding 'apex_C.cpython-37m-x86_64-linux-gnu.so'
adding 'fused_layer_norm_cuda.cpython-37m-x86_64-linux-gnu.so'
adding 'syncbn.cpython-37m-x86_64-linux-gnu.so'
adding 'apex/__init__.py'
adding 'apex/RNN/RNNBackend.py'
adding 'apex/RNN/__init__.py'
adding 'apex/RNN/cells.py'
adding 'apex/RNN/models.py'
adding 'apex/amp/__init__.py'
adding 'apex/amp/__version__.py'
adding 'apex/amp/_amp_state.py'
adding 'apex/amp/_initialize.py'
adding 'apex/amp/_process_optimizer.py'
adding 'apex/amp/amp.py'
adding 'apex/amp/compat.py'
adding 'apex/amp/frontend.py'
adding 'apex/amp/handle.py'
adding 'apex/amp/opt.py'
adding 'apex/amp/rnn_compat.py'
adding 'apex/amp/scaler.py'
adding 'apex/amp/utils.py'
adding 'apex/amp/wrap.py'
adding 'apex/amp/lists/__init__.py'
adding 'apex/amp/lists/functional_overrides.py'
adding 'apex/amp/lists/tensor_overrides.py'
adding 'apex/amp/lists/torch_overrides.py'
adding 'apex/contrib/__init__.py'
adding 'apex/contrib/groupbn/__init__.py'
adding 'apex/contrib/groupbn/batch_norm.py'
adding 'apex/contrib/optimizers/__init__.py'
adding 'apex/contrib/optimizers/fp16_optimizer.py'
adding 'apex/contrib/optimizers/fused_adam.py'
adding 'apex/contrib/optimizers/fused_sgd.py'
adding 'apex/contrib/xentropy/__init__.py'
adding 'apex/contrib/xentropy/softmax_xentropy.py'
adding 'apex/fp16_utils/__init__.py'
adding 'apex/fp16_utils/fp16_optimizer.py'
adding 'apex/fp16_utils/fp16util.py'
adding 'apex/fp16_utils/loss_scaler.py'
adding 'apex/multi_tensor_apply/__init__.py'
adding 'apex/multi_tensor_apply/multi_tensor_apply.py'
adding 'apex/normalization/__init__.py'
adding 'apex/normalization/fused_layer_norm.py'
adding 'apex/optimizers/__init__.py'
adding 'apex/optimizers/fused_adam.py'
adding 'apex/optimizers/fused_lamb.py'
adding 'apex/optimizers/fused_novograd.py'
adding 'apex/optimizers/fused_sgd.py'
adding 'apex/parallel/LARC.py'
adding 'apex/parallel/__init__.py'
adding 'apex/parallel/distributed.py'
adding 'apex/parallel/multiproc.py'
adding 'apex/parallel/optimized_sync_batchnorm.py'
adding 'apex/parallel/optimized_sync_batchnorm_kernel.py'
adding 'apex/parallel/sync_batchnorm.py'
adding 'apex/parallel/sync_batchnorm_kernel.py'
adding 'apex/pyprof/__init__.py'
adding 'apex/pyprof/nvtx/__init__.py'
adding 'apex/pyprof/nvtx/nvmarker.py'
adding 'apex/pyprof/parse/__init__.py'
adding 'apex/pyprof/parse/__main__.py'
adding 'apex/pyprof/parse/db.py'
adding 'apex/pyprof/parse/kernel.py'
adding 'apex/pyprof/parse/nvvp.py'
adding 'apex/pyprof/parse/parse.py'
adding 'apex/pyprof/prof/__init__.py'
adding 'apex/pyprof/prof/__main__.py'
adding 'apex/pyprof/prof/activation.py'
adding 'apex/pyprof/prof/base.py'
adding 'apex/pyprof/prof/blas.py'
adding 'apex/pyprof/prof/conv.py'
adding 'apex/pyprof/prof/convert.py'
adding 'apex/pyprof/prof/data.py'
adding 'apex/pyprof/prof/dropout.py'
adding 'apex/pyprof/prof/embedding.py'
adding 'apex/pyprof/prof/index_slice_join_mutate.py'
adding 'apex/pyprof/prof/linear.py'
adding 'apex/pyprof/prof/loss.py'
adding 'apex/pyprof/prof/misc.py'
adding 'apex/pyprof/prof/normalization.py'
adding 'apex/pyprof/prof/optim.py'
adding 'apex/pyprof/prof/output.py'
adding 'apex/pyprof/prof/pointwise.py'
adding 'apex/pyprof/prof/pooling.py'
adding 'apex/pyprof/prof/prof.py'
adding 'apex/pyprof/prof/randomSample.py'
adding 'apex/pyprof/prof/recurrentCell.py'
adding 'apex/pyprof/prof/reduction.py'
adding 'apex/pyprof/prof/softmax.py'
adding 'apex/pyprof/prof/usage.py'
adding 'apex/pyprof/prof/utility.py'
adding 'apex/reparameterization/__init__.py'
adding 'apex/reparameterization/reparameterization.py'
adding 'apex/reparameterization/weight_norm.py'
adding 'apex-0.1.dist-info/LICENSE'
adding 'apex-0.1.dist-info/METADATA'
adding 'apex-0.1.dist-info/WHEEL'
adding 'apex-0.1.dist-info/top_level.txt'
adding 'apex-0.1.dist-info/RECORD'
removing build/bdist.linux-x86_64/wheel
/home/user/code/DeepSpeed
Installing apex locally so that deepspeed will build
Found existing installation: apex 0.1
Uninstalling apex-0.1:
  Successfully uninstalled apex-0.1
Using pip 20.2.4 from /home/user/miniconda3/envs/ait/lib/python3.7/site-packages/pip (python 3.7)
Non-user install because site-packages writeable
Created temporary directory: /tmp/pip-ephem-wheel-cache-st9ti59p
Created temporary directory: /tmp/pip-req-tracker-x7xrzrg7
Initialized build tracking at /tmp/pip-req-tracker-x7xrzrg7
Created build tracker: /tmp/pip-req-tracker-x7xrzrg7
Entered build tracker: /tmp/pip-req-tracker-x7xrzrg7
Created temporary directory: /tmp/pip-install-yhmp53sr
Processing ./third_party/apex/dist/apex-0.1-cp37-cp37m-linux_x86_64.whl
  Added apex==0.1 from file:///home/user/code/DeepSpeed/third_party/apex/dist/apex-0.1-cp37-cp37m-linux_x86_64.whl to build tracker '/tmp/pip-req-tracker-x7xrzrg7'
  Removed apex==0.1 from file:///home/user/code/DeepSpeed/third_party/apex/dist/apex-0.1-cp37-cp37m-linux_x86_64.whl from build tracker '/tmp/pip-req-tracker-x7xrzrg7'
Installing collected packages: apex

Successfully installed apex-0.1
Removed build tracker: '/tmp/pip-req-tracker-x7xrzrg7'
Building deepspeed wheel
./install.sh: line 196: 193157 Floating point exception(core dumped) python setup.py -v bdist_wheel
Error on line 195
Fail to install deepspeed

@drfinkus
Author

@tjruwase did the install.sh -n log above help in any way? Please let me know if I can help debug further!

@tjruwase
Contributor

@drfinkus, thanks for sharing the log. It confirms that the error is in setup.py, but not much more. I think we should run setup.py on its own to see whether it reveals more information. Can you please share the log for the following command?
python setup.py -v build
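When a C extension dies with SIGFPE, Python prints no traceback by default, only the shell's "Floating point exception (core dumped)". A minimal sketch, assuming Linux and Python 3.7+: the standard-library faulthandler module (enabled with `-X faulthandler`, i.e. `python -X faulthandler setup.py -v build`) dumps the Python stack at the moment of the crash, which should point at the import or call in setup.py that triggers it. The helper name `run_with_faulthandler` is hypothetical, and the demo raises SIGFPE artificially with `os.kill` rather than reproducing the real crash.

```python
import signal
import subprocess
import sys


def run_with_faulthandler(script: str) -> subprocess.CompletedProcess:
    """Run a Python snippet in a child interpreter with faulthandler enabled.

    Hypothetical helper for illustration. On a fatal signal (SIGFPE, SIGSEGV,
    ...) faulthandler writes the Python traceback to stderr before the child dies.
    """
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", script],
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    # Simulate the crash: the child sends SIGFPE to itself.
    demo = "import os, signal\nos.kill(os.getpid(), signal.SIGFPE)\n"
    result = run_with_faulthandler(demo)
    # A negative return code means the child was killed by that signal number.
    print("return code:", result.returncode, "| SIGFPE is", int(signal.SIGFPE))
    print(result.stderr)
```

The traceback lands on stderr, so for the real build the equivalent is running the command above and capturing stderr alongside the rest of the log.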

@drfinkus
Author

@tjruwase absolutely, please see below!

$ python setup.py -v build
Floating point exception (core dumped)

Not very helpful, unfortunately, but I hope it helps somehow! Let me know if I can help debug further!

@Sleepychord

Sleepychord commented Oct 28, 2020

Hi @drfinkus @tjruwase, I ran into the same problem and solved it by commenting out the relevant call in setup.py:

cpu_vector_instructions = {} # available_vector_instructions()
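Commenting the call out also throws away CPU feature detection. A gentler variant, sketched here under stated assumptions, runs the probe in a child interpreter so that a native crash becomes a failed probe instead of killing setup.py. `probe_in_subprocess` and `PROBE_SNIPPET` are hypothetical names, not part of DeepSpeed; the snippet passed in is assumed to print a JSON object describing the detected features, and it would wrap whatever code currently calls into cpufeature.

```python
import json
import subprocess
import sys


def probe_in_subprocess(snippet: str, timeout: float = 60.0) -> dict:
    """Run `snippet` in a fresh interpreter and parse its stdout as JSON.

    Hypothetical helper. If the child crashes (e.g. SIGFPE inside a C
    extension), exits non-zero, times out, or prints something that is not a
    JSON object, return an empty dict so the caller falls back to the same
    "no CPU features" result as the commented-out workaround.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", snippet],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return {}
    if result.returncode != 0:
        # Negative return codes mean the child was killed by a signal.
        print(f"CPU feature probe failed (return code {result.returncode}); "
              "building without CPU-specific extensions", file=sys.stderr)
        return {}
    try:
        data = json.loads(result.stdout)
    except json.JSONDecodeError:
        return {}
    return data if isinstance(data, dict) else {}


# Hypothetical replacement for the commented-out line:
# cpu_vector_instructions = probe_in_subprocess(PROBE_SNIPPET)
```

The design choice is isolation: a signal in the child cannot take down the parent, so machines where detection works keep CPUAdam and machines where it crashes still finish the build.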

@drfinkus
Author

@tjruwase @Sleepychord thanks for the suggestion! But if I read this correctly, this would disable CPUAdam, which would cripple DeepSpeed for my use case: I was specifically looking for the optimized Adam implementation.

Is it an AMD issue? Or do you see this on Intel? Is it possible to fix cpufeature? I reported it upstream but did not get much further.

@drfinkus
Author

@tjruwase @Sleepychord @rople380, after further investigation, it seems that cpufeature was implemented using Intel-specific logic only. I fixed the implementation for AMD processors as well; you can use my fork here: https://github.com/drfinkus/cpufeature

I also opened a PR; hopefully it will make its way downstream soon (I am in contact with the author). I will check whether this fixes the install issue and close the issue if it does.
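For comparison, one vendor-neutral way to check SIMD support on Linux is to read the flags the kernel reports in /proc/cpuinfo, which are listed the same way for Intel and AMD processors. This is a sketch of that alternative under the assumption of a Linux host, not a description of what cpufeature or the fork above does; `parse_cpuinfo` and `simd_support` are hypothetical names.

```python
from pathlib import Path
from typing import Dict, Set, Tuple, Union


def parse_cpuinfo(text: str) -> Tuple[str, Set[str]]:
    """Return (vendor_id, flags) from the first processor entry of /proc/cpuinfo text."""
    vendor, flags = "", set()
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between processor entries
        key = key.strip()
        if key == "vendor_id" and not vendor:
            vendor = value.strip()
        elif key == "flags" and not flags:
            flags = set(value.split())
        if vendor and flags:
            break  # the first entry is enough; all cores report the same ISA flags
    return vendor, flags


def simd_support(text: str) -> Dict[str, Union[str, bool]]:
    """Hypothetical summary of the vector extensions relevant to a CPU Adam build."""
    vendor, flags = parse_cpuinfo(text)
    return {
        "vendor": vendor,
        "avx": "avx" in flags,
        "avx2": "avx2" in flags,
        "avx512f": "avx512f" in flags,
    }


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print(simd_support(cpuinfo.read_text()))
```

Because the kernel has already queried CPUID, this path avoids executing vendor-specific detection code in the build process at all.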

@drfinkus
Author

@tjruwase btw, as a suggestion, I see that in the master branch, cpu_adam is disabled by default:

DS_BUILD_CPU_ADAM = int(os.environ.get('DS_BUILD_CPU_ADAM', 0)) * DS_BUILD_CPU_ADAM_MASK

This is very easy to miss unless you're specifically looking for it, so perhaps document it somewhere, or turn the default back on as for the other extensions. Now that cpufeature is fixed (see above), that shouldn't be a problem anymore.
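The quoted line turns the environment variable into a build bit: an unset variable becomes 0, so CPUAdam is skipped unless the user opts in. The sketch below reproduces that arithmetic in isolation; the mask value is a made-up placeholder (the real `DS_BUILD_CPU_ADAM_MASK` is defined elsewhere in setup.py) and `cpu_adam_build_flag` is a hypothetical name.

```python
import os
from typing import Mapping

# Placeholder value; the real DS_BUILD_CPU_ADAM_MASK constant in setup.py may differ.
DS_BUILD_CPU_ADAM_MASK = 1 << 4


def cpu_adam_build_flag(environ: Mapping[str, str] = os.environ) -> int:
    """Same expression as the quoted setup.py line.

    Unset or "0" -> 0 (CPUAdam not built); "1" -> the mask bit (CPUAdam built).
    int() raises ValueError for values such as "true", and values other than
    0 or 1 scale the mask instead of setting it.
    """
    return int(environ.get("DS_BUILD_CPU_ADAM", 0)) * DS_BUILD_CPU_ADAM_MASK


if __name__ == "__main__":
    if cpu_adam_build_flag():
        print("CPUAdam will be built")
    else:
        print("CPUAdam is disabled (set DS_BUILD_CPU_ADAM=1 to enable it)")
```

Opting in would then look like `DS_BUILD_CPU_ADAM=1 ./install.sh`, assuming install.sh passes the environment through to setup.py unchanged.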

@tjruwase
Contributor

@drfinkus Thanks for your help with this issue. We have recently improved the installation further and removed the dependency on cpufeature. Can you please check whether you still see installation issues? Thanks.
