Error building k2 v1.13 with pytorch:22.01 #916

Closed
GNroy opened this issue Feb 9, 2022 · 5 comments · Fixed by #917

Comments

@GNroy

GNroy commented Feb 9, 2022

I'm trying to build k2 v1.13 from source (python3 setup.py install) inside the pytorch:22.01 container (nvcr.io/nvidia/pytorch:22.01-py3), and I get the following errors when compiling mutual_information.cu:

[100%] Building CUDA object k2/python/csrc/CMakeFiles/_k2.dir/torch/mutual_information_cuda.cu.o
/usr/local/cuda/include/thrust/system/cuda/detail/core/util.h(534): error: namespace "thrust::cub" has no member "CacheModifiedInputIterator"

/usr/local/cuda/include/thrust/system/cuda/detail/core/util.h(534): error: too few arguments for class template "thrust::detail::conditional"

/usr/local/cuda/include/thrust/system/cuda/detail/core/util.h(534): error: expected an identifier

/usr/local/cuda/include/thrust/system/cuda/detail/core/util.h(536): error: expected a ";"

/usr/local/cuda/include/thrust/system/cuda/detail/core/util.h(576): error: namespace "thrust::cub" has no member "BlockLoad"

/usr/local/cuda/include/thrust/system/cuda/detail/core/util.h(576): error: expected a ";"

/usr/local/cuda/include/thrust/system/cuda/detail/core/util.h(593): error: namespace "thrust::cub" has no member "BlockStore"

/usr/local/cuda/include/thrust/system/cuda/detail/core/util.h(593): error: expected a ";"

/usr/local/cuda/include/thrust/system/cuda/detail/core/util.h(635): error: namespace "thrust::cub" has no member "PtxVersion"

/usr/local/cuda/include/thrust/system/cuda/detail/core/util.h(642): error: namespace "thrust::cub" has no member "SyncStream"

/usr/local/cuda/include/thrust/system/cuda/detail/core/util.h(647): error: namespace "thrust::cub" has no member "CTA_SYNC"

/usr/local/cuda/include/thrust/system/cuda/detail/core/util.h(663): error: namespace "thrust::cub" has no member "UnitWord"

/usr/local/cuda/include/thrust/system/cuda/detail/core/util.h(663): error: expected an identifier

/usr/local/cuda/include/thrust/system/cuda/detail/core/util.h(667): error: identifier "DeviceWord" is undefined

/usr/local/cuda/include/thrust/system/cuda/detail/core/util.h(670): error: "DeviceWord" is not a type name

15 errors detected in the compilation of "/tmp/pip-install-g7ig1yhx/k2_5d3497e7876e4f82bd5f05bd7bdc1677/k2/python/csrc/torch/mutual_information.cu".
make[3]: *** [k2/python/csrc/CMakeFiles/_k2.dir/build.make:216: k2/python/csrc/CMakeFiles/_k2.dir/torch/mutual_information.cu.o] Error 1

With the same setup, the k2 v1.11 build succeeds.

P.S. Sorry if the issue is not related to k2.

@danpovey
Collaborator

danpovey commented Feb 9, 2022

I suspect it's a mismatch between the CUDA on your path, which is a system-installed CUDA, and whatever that version of PyTorch was intended to be used with. I believe when we include PyTorch we also get CUDA headers, including those of cub, and this can cause problems if we don't have the exact same version as the NVCC we are using.
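One way to test this hypothesis is to compare the CUDA release reported by the nvcc on the path with the CUDA version PyTorch was built against. The following is a minimal sketch, not part of k2: the helper names are hypothetical, and it assumes `nvcc --version` prints a line of the form `release X.Y` and that `torch.version.cuda` is a string such as "11.6" (or None for CPU-only builds).

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_nvcc_release(nvcc_output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from the text printed by `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def parse_torch_cuda(torch_cuda: Optional[str]) -> Optional[Tuple[int, int]]:
    """Turn a torch.version.cuda string such as "11.6" into (11, 6)."""
    if not torch_cuda:
        return None  # CPU-only PyTorch builds report None
    major, minor = torch_cuda.split(".")[:2]
    return int(major), int(minor)


def toolkit_matches_torch(nvcc_output: str, torch_cuda: Optional[str]) -> bool:
    """True only if both versions are known and have the same major.minor."""
    nvcc = parse_nvcc_release(nvcc_output)
    return nvcc is not None and nvcc == parse_torch_cuda(torch_cuda)


def check_environment() -> bool:
    """Hypothetical driver: needs both nvcc and torch to be installed."""
    import torch  # imported lazily so the helpers above work without it

    nvcc_path = shutil.which("nvcc")
    if nvcc_path is None:
        return False  # no nvcc on PATH, nothing to compare
    output = subprocess.run(
        [nvcc_path, "--version"], capture_output=True, text=True, check=True
    ).stdout
    return toolkit_matches_torch(output, torch.version.cuda)
```

Calling `check_environment()` inside the build environment returns False when the two versions differ or nvcc is missing. A True result only rules out a major.minor mismatch; it says nothing about which Thrust or CUB headers the compiler actually picks up.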

@csukuangfj
Collaborator

> I'm trying to build k2 v1.13 from source (python3 setup.py install) inside of pytorch:22.01 container

Could you provide some information about the container, e.g.

  • PyTorch version
  • CUDA version?

@GNroy
Author

GNroy commented Feb 9, 2022

> Could you provide some information about the container, e.g. pytorch version, CUDA version?

  • CUDA 11.6.r11.6
  • PyTorch 1.11.0a0+bfe5ad2

(Edit) More info on https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel_22-01.html
(Edit2) print(torch.version.cuda) gives 11.6
(Edit3) /usr/local/cuda links to /etc/alternatives/cuda -> /usr/local/cuda-11.6
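Since the toolkit and PyTorch both report CUDA 11.6 here, another thing worth checking is whether the Thrust and CUB headers under the include directory belong together, given that the errors come from Thrust failing to find CUB symbols. The sketch below is only an illustration with hypothetical helper names. It assumes the headers `thrust/version.h` and `cub/version.cuh` define `THRUST_VERSION` and `CUB_VERSION` as integers encoded as major * 100000 + minor * 100 + patch.

```python
import re
from pathlib import Path
from typing import Dict, Optional, Tuple

Version = Tuple[int, int, int]


def read_version_macro(header: Path, macro: str) -> Optional[int]:
    """Return the integer from `#define <macro> <int>` in header, if present."""
    pattern = re.compile(
        r"^\s*#\s*define\s+" + re.escape(macro) + r"\s+(\d+)", re.MULTILINE
    )
    match = pattern.search(header.read_text(errors="ignore"))
    return int(match.group(1)) if match else None


def decode_version(value: int) -> Version:
    """Split the assumed major * 100000 + minor * 100 + patch encoding."""
    return value // 100000, (value // 100) % 1000, value % 100


def header_versions(include_dir: str) -> Dict[str, Optional[Version]]:
    """Look up the Thrust and CUB versions below one include directory."""
    root = Path(include_dir)
    headers = (
        ("thrust", "thrust/version.h", "THRUST_VERSION"),
        ("cub", "cub/version.cuh", "CUB_VERSION"),
    )
    result: Dict[str, Optional[Version]] = {}
    for name, relative_path, macro in headers:
        path = root / relative_path
        value = read_version_macro(path, macro) if path.is_file() else None
        result[name] = decode_version(value) if value is not None else None
    return result
```

For example, `header_versions("/usr/local/cuda/include")` could be compared with the result for any other include directory passed to the compiler, such as those returned by `torch.utils.cpp_extension.include_paths()`. Differing CUB versions across directories would be consistent with the header conflict suspected above, though this check alone does not prove it.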

@csukuangfj
Collaborator

I will try to reproduce your issue with torch 1.11.0 + CUDA 11.5 and then fix it.
(The nightly PyTorch wheels at https://download.pytorch.org/whl/nightly/torch_nightly.html support only up to CUDA 11.5.)

@GNroy
Author

GNroy commented Feb 9, 2022

@csukuangfj I just tried the torch 1.11.0 + CUDA 11.5 combination in pytorch:21.12 (the previous container).
The k2 v1.13 build succeeded in that setup.
This suggests the issue is related either to CUDA 11.6 or to the pytorch:22.01 container itself.

To reproduce the issue:

docker run --gpus all -it nvcr.io/nvidia/pytorch:22.01-py3
K2_MAKE_ARGS="-j" pip install git+https://github.com/k2-fsa/[email protected]#egg=k2
