compilation error with GCC 11.3.0 + CUDA 11.7.0 #364

Closed
boegel opened this issue Sep 8, 2023 · 5 comments
Labels
build: For issues with building the code

Comments

@boegel

boegel commented Sep 8, 2023

I'm trying to build dorado 0.3.4 from source using GCC 11.3.0 + CUDA 11.7.0, and I'm hitting the following compilation error:

/tmp/easybuild_build/dorado/0.3.4/foss-2022a-CUDA-11.7.0/dorado/dorado/nn/CudaCRFModel.cpp: In member function void dorado::CudaCaller::cuda_thread_fn():
/tmp/easybuild_build/dorado/0.3.4/foss-2022a-CUDA-11.7.0/dorado/dorado/nn/CudaCRFModel.cpp:283:45: error: struct c10::cuda::CUDACachingAllocator::DeviceStats has no member named requested_bytes; did you mean reserved_bytes?
  283 |                     print_stat(device_stats.requested_bytes), device_stats.num_alloc_retries,
      |                                             ^~~~~~~~~~~~~~~
      |                                             reserved_bytes
make[2]: *** [CMakeFiles/dorado_lib.dir/build.make:800: CMakeFiles/dorado_lib.dir/dorado/nn/CudaCRFModel.cpp.o] Error 1

Is this a known problem, should I use a different CUDA (or GCC) version, or am I overlooking something else?

@malton-ont
Collaborator

DeviceStats::requested_bytes was introduced in libtorch 2.0. Are you linking against your own version of libtorch rather than the version that the dorado configuration process downloads?
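To illustrate why the build fails against an older libtorch, here is a minimal C++17 sketch of one way code could detect at compile time whether a stats struct has a requested_bytes member. StatsV1, StatsV2, has_requested_bytes and requested_bytes_or are hypothetical names, the structs are simplified stand-ins rather than the real c10 DeviceStats (whose members are not plain integers), and this is not necessarily how dorado handles it.

```cpp
#include <cstdint>
#include <type_traits>
#include <utility>

// Hypothetical stand-ins for the two shapes of DeviceStats discussed above.
// They are NOT the real c10 types; only the member names matter here.
struct StatsV1 {  // shaped like libtorch 1.x: no requested_bytes
    std::int64_t reserved_bytes;
};
struct StatsV2 {  // shaped like libtorch 2.x: has requested_bytes
    std::int64_t reserved_bytes;
    std::int64_t requested_bytes;
};

// Detection idiom: the primary template means "member not present".
template <typename T, typename = void>
struct has_requested_bytes : std::false_type {};

// This specialization is selected only when `t.requested_bytes` is well-formed.
template <typename T>
struct has_requested_bytes<
    T, std::void_t<decltype(std::declval<const T&>().requested_bytes)>>
    : std::true_type {};

// Returns requested_bytes when the member exists, otherwise `fallback`.
template <typename Stats>
constexpr std::int64_t requested_bytes_or(const Stats& s, std::int64_t fallback) {
    if constexpr (has_requested_bytes<Stats>::value) {
        return s.requested_bytes;
    } else {
        return fallback;  // member unavailable in this (older) struct layout
    }
}
```

With something like this, a call such as requested_bytes_or(stats, -1) compiles against either struct layout, whereas a direct stats.requested_bytes access fails on the older one, as in the error above.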

@boegel
Author

boegel commented Sep 12, 2023

@malton-ont Thanks for the feedback!

Yes, we are installing dorado on top of a PyTorch 1.12.0 that we installed ourselves here, since we prefer to control which version is used and how it gets built. We also try hard to avoid downloading anything on the fly during an installation, because that complicates reproducing the same installation later.

Is there an overview of which PyTorch versions dorado 0.3.4 is compatible with?

@tijyojwad
Collaborator

Hi @boegel - dorado depends on a custom build of PyTorch 2.0 that we host on our CDN because we need static libraries. This custom build is downloaded when the dorado build setup runs.

We still support building against the package hosted by PyTorch, but this needs to be enabled by passing -D TRY_USING_STATIC_TORCH_LIB=0 when configuring with CMake. The exact supported PyTorch version is specified here.
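As a rough illustration of pinning a minimum version, the sketch below fails compilation with a readable message when the libtorch headers are older than 2.0, instead of failing later on a missing member. It assumes the TORCH_VERSION_MAJOR and TORCH_VERSION_MINOR macros from torch/version.h. version_at_least is a hypothetical helper, the fallback values are placeholders so the snippet compiles without libtorch, and dorado itself does not necessarily do this.

```cpp
// Sketch only: a compile-time guard for the minimum libtorch version.
// Pull in the real version macros when the libtorch headers are on the
// include path.
#if defined(__has_include)
#  if __has_include(<torch/version.h>)
#    include <torch/version.h>
#  endif
#endif

// Placeholder values so the snippet still compiles without libtorch.
// They are NOT a detected version.
#ifndef TORCH_VERSION_MAJOR
#  define TORCH_VERSION_MAJOR 2
#  define TORCH_VERSION_MINOR 0
#endif

// Hypothetical helper: true when (major, minor) >= (req_major, req_minor).
constexpr bool version_at_least(int major, int minor, int req_major, int req_minor) {
    return major > req_major || (major == req_major && minor >= req_minor);
}

// Assumption taken from this thread: requested_bytes needs libtorch >= 2.0.
static_assert(version_at_least(TORCH_VERSION_MAJOR, TORCH_VERSION_MINOR, 2, 0),
              "this code path expects libtorch >= 2.0 (DeviceStats::requested_bytes)");
```

Built against PyTorch 1.12.0 headers, the static_assert would stop the build at this point with the message above.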

If your aim is to have reproducible builds, I would recommend using one of the pre-built dorado releases since all the dependencies (other than standard host libs) are packaged together. So that build is fixed and will be reproducible.

Is there a reason you're doing custom builds?

@boegel
Author

boegel commented Jan 4, 2024

@tijyojwad The main reason we're doing custom builds of Dorado and its dependencies is performance: we use compiler options like -march=native so that the binaries obtained are optimized for the CPUs on which they will be used, which can result in significant performance improvements.

In addition, especially for PyTorch, the EasyBuild community puts significant effort into getting the (massive) PyTorch test suite to pass on our custom PyTorch installation, so we're very reluctant to use a different PyTorch.

Thanks for the pointers on the requirement for -DTRY_USING_STATIC_TORCH_LIB=0, that's very helpful.

Can you elaborate on why you prefer using static libraries? Does that just make things easier w.r.t. packaging for Dorado?

@tijyojwad
Collaborator

Yes, static libraries are primarily for reducing the size of the distributed build, since we minimize which torch libraries are packaged. They also reduce the likelihood of interfering with an existing torch installation. And for a given pre-built version of Dorado, all dependencies are fixed (i.e. we don't download any on the fly).

From a user's perspective, I think dorado's dependencies should be treated as a black box (whether it depends on torch or not is a dorado implementation detail). Making dorado use your own version of torch would be a non-trivial undertaking, and I think it might be better to depend on specific dorado versions (and have a process to validate upgrades) rather than lock down dorado's dependencies.

@HalfPhoton added the build label (For issues with building the code) on May 20, 2024