compilation error with GCC 11.3.0 + CUDA 11.7.0 #364

boegel · 2023-09-08T12:24:36Z

I'm trying to build dorado 0.3.4 from source using GCC 11.3.0 + CUDA 11.7.0, and I'm hitting the following compilation error:

/tmp/easybuild_build/dorado/0.3.4/foss-2022a-CUDA-11.7.0/dorado/dorado/nn/CudaCRFModel.cpp: In member function void dorado::CudaCaller::cuda_thread_fn():
/tmp/easybuild_build/dorado/0.3.4/foss-2022a-CUDA-11.7.0/dorado/dorado/nn/CudaCRFModel.cpp:283:45: error: struct c10::cuda::CUDACachingAllocator::DeviceStats has no member named requested_bytes; did you mean reserved_bytes?
  283 |                     print_stat(device_stats.requested_bytes), device_stats.num_alloc_retries,
      |                                             ^~~~~~~~~~~~~~~
      |                                             reserved_bytes
make[2]: *** [CMakeFiles/dorado_lib.dir/build.make:800: CMakeFiles/dorado_lib.dir/dorado/nn/CudaCRFModel.cpp.o] Error 1

Is this a known problem, should I use a different CUDA (or GCC) version, or am I overlooking something else?


malton-ont · 2023-09-12T14:31:30Z

DeviceStats::requested_bytes was introduced in libtorch 2.0. Are you linking against your own version of libtorch rather than the version that the dorado configuration process downloads?
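For illustration only: code that has to compile against both older and newer libtorch could, in principle, detect the field at compile time instead of assuming it exists. The sketch below is not dorado's code. `StatsV1`, `StatsV2`, `has_requested_bytes` and `requested_bytes_if_available` are hypothetical names, and the two structs are simplified stand-ins for `DeviceStats` (the real fields are not plain integers). It assumes a C++17 compiler.

```cpp
#include <cstdint>
#include <optional>
#include <type_traits>
#include <utility>

// Hypothetical, simplified stand-ins for the two shapes of DeviceStats.
struct StatsV1 {  // shaped like the pre-2.0 struct: no requested_bytes
    std::int64_t reserved_bytes = 100;
};
struct StatsV2 {  // shaped like the 2.0 struct: has requested_bytes
    std::int64_t reserved_bytes = 100;
    std::int64_t requested_bytes = 80;
};

// Detection idiom: the primary template reports "member absent".
template <typename T, typename = void>
struct has_requested_bytes : std::false_type {};

// This partial specialization is only viable when T has a member named
// requested_bytes; otherwise substitution fails and the primary is used.
template <typename T>
struct has_requested_bytes<
        T, std::void_t<decltype(std::declval<T&>().requested_bytes)>>
        : std::true_type {};

// Returns the field when it exists, std::nullopt otherwise. The discarded
// branch of if constexpr is never instantiated, so the missing member does
// not trigger the "has no member named" error.
template <typename Stats>
std::optional<std::int64_t> requested_bytes_if_available(const Stats& s) {
    if constexpr (has_requested_bytes<Stats>::value) {
        return s.requested_bytes;
    } else {
        return std::nullopt;
    }
}
```

With these stand-ins, `requested_bytes_if_available(StatsV1{})` yields an empty optional and `requested_bytes_if_available(StatsV2{})` yields the stored value. Whether such a guard is appropriate for dorado is a separate question; the thread only establishes that the field is missing from libtorch versions before 2.0.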

boegel · 2023-09-12T15:05:06Z

@malton-ont Thanks for the feedback!

Yes, we are installing dorado on top of a PyTorch 1.12.0 that we installed ourselves, since we prefer to control which version is used and how it gets built (and because we try hard to avoid anything being downloaded on the fly during an installation, since that complicates reproducing the same installation later).

Is there an overview of which PyTorch versions dorado 0.3.4 is compatible with?

tijyojwad · 2023-09-12T19:55:08Z

Hi @boegel - dorado depends on a custom build of pytorch 2.0 that we host on our CDN because we need static libraries. This custom build gets downloaded when the dorado build setup runs.

We still support building from the PyTorch hosted package, but it needs to be enabled with -D TRY_USING_STATIC_TORCH_LIB=0 when setting up cmake. The exact supported PyTorch version is specified here.

If your aim is to have reproducible builds, I would recommend using one of the pre-built dorado releases since all the dependencies (other than standard host libs) are packaged together. So that build is fixed and will be reproducible.

Is there a reason you're doing custom builds?

boegel · 2024-01-04T10:09:07Z

@tijyojwad The main reason we're doing custom builds of Dorado and its dependencies is performance: we use compiler options like -march=native so that the binaries obtained are optimized for the CPUs on which they will be used, which can result in significant performance improvements.

In addition, especially for PyTorch, the EasyBuild community puts significant effort into getting the (massive) PyTorch test suite to pass on our custom PyTorch installation, so we're very reluctant to use a different PyTorch.

Thanks for the pointers on the requirement for -DTRY_USING_STATIC_TORCH_LIB=0, that's very helpful.

Can you elaborate why you prefer using static libraries? Does that just make things easier w.r.t. packaging for Dorado?

tijyojwad · 2024-01-04T14:58:12Z

Yes, static libraries are primarily for reducing the size of the distributed build, since we minimize which torch libraries are packaged. They also reduce the likelihood of interfering with an existing torch installation. And for a given pre-built version of Dorado, all dependencies are fixed (i.e. we don't download any on the fly).

From a user's perspective, I think dorado's dependencies should be treated as a black box (whether it depends on torch or not is a dorado implementation detail). Going down the path of making dorado use your own version of torch would be a non-trivial undertaking, and I think it might be better to depend on specific dorado versions (and have a process to validate upgrades) rather than lock down dorado's dependencies.

boegel mentioned this issue Sep 8, 2023

dorado vscentrum/vsc-software-stack#208

Closed

HalfPhoton added the build For issues with building the code label May 20, 2024

HalfPhoton closed this as completed May 20, 2024
