Build Torch from source #4554
31ce087
to
ae939c8
Compare
4b4943e
to
35e09b5
Compare
T/Torch/Torch/build_tarballs.jl
Outdated
cuda_version_minor=`echo $cuda_version | cut -d . -f 2`
cuda_full_path="$WORKSPACE/srcdir/CUDA_full.v$cuda_version/cuda"
apk del cmake
apk add 'cmake<3.17' --repository=http://dl-cdn.alpinelinux.org/alpine/v3.11/main
Wait, what?
I got fed up trying to hack the "find cuda" part of Torch - downgrading cmake seems to work.
Torch seems to be using the "old" https://cmake.org/cmake/help/latest/module/FindCUDA.html approach which was replaced with https://cmake.org/cmake/help/latest/module/FindCUDAToolkit.html in CMake v3.17 as I understand it.
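A minimal sketch of how a build script could check which side of that version boundary the installed CMake falls on. The `version_lt` and `cuda_find_module` helpers are hypothetical (they are not part of this recipe), and the sketch assumes a `sort` that supports `-V`, as GNU coreutils does.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when version $1 is strictly lower than $2.
# Assumes `sort -V` (version sort) is available.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Hypothetical helper: print the newest CUDA find-module a given CMake
# version ships. FindCUDAToolkit was added in CMake 3.17; before that only
# the older FindCUDA module exists.
cuda_find_module() {
    if version_lt "$1" "3.17"; then
        echo "FindCUDA"
    else
        echo "FindCUDAToolkit"
    fi
}
```

In a build script this might be called as `cuda_find_module "$(cmake --version | head -n 1 | cut -d ' ' -f 3)"`, which relies on the first line of `cmake --version` having the form `cmake version X.Y.Z`.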
Sigh
At least it seems more "clean" to me than patching some set of cmake files :-)
... and then it works in the BB shell (e.g. with --debug=begin), but not in auto mode...
... and then it works in the BB shell (e.g. with --debug=begin), but not in auto mode...
I'm not 100% sure what you mean here, but note that package installation and environment variable settings aren't persistent at the moment when dropping into the debug shell (packages are installed in a tmpfs which is lost when recreating the debug environment, and we don't have a way to remember all environment settings). It may be possible to address some of these issues in the future, but at the moment you have to repeat those operations manually.
Yeah - that was a bit too terse. I meant that I could get cmake to find CUDA when running interactively, but not when just doing a build. It turned out to be something weird about the (copied/forked) FindCUDA/FindCUDAToolkit cmake stuff in pytorch/cmake - at least I currently ended up with the hack about running "configure" twice/thrice.
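For illustration, a repeated-configure hack of this kind could be sketched as a small retry wrapper. This is an assumption about the general shape, not the code in the recipe: the `retry` helper is hypothetical, and the actual build script may simply invoke cmake several times in a row.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to $1 times, stopping at the first
# success. Returns 0 on success, 1 if every attempt failed.
retry() {
    max_attempts=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt of $max_attempts failed: $*" >&2
        attempt=$((attempt + 1))
    done
    return 1
}
```

A configure step would then be wrapped as `retry 3 cmake ..` followed by the recipe's usual arguments, run from the same build directory so that later attempts can reuse whatever the earlier ones left in the CMake cache.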
It might actually work with the bundled cmake v3.21 now - have to check...
29cfcc0
to
3c03c11
Compare
6cdc1c9
to
88f338a
Compare
36063d8
to
7bf84de
Compare
12f68e5
to
47ff52a
Compare
T/Torch/build_tarballs.jl
Outdated
Dependency(PackageSpec(name="CompilerSupportLibraries_jll", uuid="e66e0078-7015-5450-92f7-15fbd957f2ae")),
Dependency("blis_jll"; platforms = blis_platforms),
Dependency("CPUInfo_jll", v"0.0.20201217"),
Dependency("CUDNN_jll", v"8.2.4"; compat = "8", platforms = cuda_platforms),
Dependency("Gloo_jll", v"0.0.20210521"; platforms = filter(p -> nbits(p) == 64, platforms)),
Dependency("LAPACK_jll"; platforms = openblas_platforms),
Dependency("MKL_jll"; platforms = mkl_platforms),
BuildDependency("MKL_Headers_jll"; platforms = mkl_platforms),
Dependency("OpenBLAS_jll"; platforms = openblas_platforms),
Dependency("PThreadPool_jll", v"0.0.20210414"),
Dependency("SLEEF_jll", v"3.5.2"),
# Dependency("TensorRT_jll"; platforms = cuda_platforms), # Building with TensorRT is not supported: https://github.com/pytorch/pytorch/issues/60228
Dependency("XNNPACK_jll", v"0.0.20210622"),
BuildDependency(PackageSpec("protoc_jll", Base.UUID("c7845625-083e-5bbe-8504-b32d602b7110"), v"3.13.0")),
HostBuildDependency(PackageSpec("protoc_jll", Base.UUID("c7845625-083e-5bbe-8504-b32d602b7110"), v"3.13.0")),
When you specify the version of a runtime dependency you typically also want to specify the compat (or actually only that one since the build version is automatically inferred from the lowest compatible version)
8f0437d
to
931edf6
Compare
* Removed Torch v1.4.0 which included Torch.jl wrapper
* Skipped Torch.jl wrapper
* With MKL dependency on MKL-platforms
* Using protoc v3.13.0 JLL.
* Added protoc as a build dependency to get correct version
* Not using ONNX dependency to get past protoc issue
* Added micromamba install of pyyaml and typing_extensions - needed for build.
* Using XNNPACK JLL dependency
* Added CPUInfo and PThreadPool dependencies
* Added SLEEF dependency
* Turned off some features explicitly to silence some configure warnings
* Not using NNPACK, and QNNPACK, and limited PYTORCH_QNNPACK to x86_64.
* Disabled use of breakpad on aarch64-linux-gnu
* Enabled configure on Windows via patch and disabling breakpad
* Disabled use of TensorPipe on linux-musl
* Excluded unsupported powerpc64le and i686-windows platforms
* Disabled kineto for w64 and freebsd
* Disabled breakpad for FreeBSD
* Disabled use of MKLDNN on macOS
* Added Gloo dependency - to aid linux-musl
* Disabled MKLDNN for linux-musl
* Disabled FreeBSD as Clang v12 crashes
* Disabled MKLDNN for w64-mingw32
* Using MKL, BLIS, or OpenBLAS + LAPACK - preferring MKL or BLIS
* Restricted use of LAPACK to OpenBLAS platforms
* Set preferred BLAS for armv6l-linux-gnu
* Disabled FBGEMM for x86_64-w64-mingw32
* Added MKL_Headers as dependency
* Disabled MKL for Windows as CMake cannot find MKL
* Optimized git submodule update
* Added note about disabling MKLDNN for x86_64-apple-darwin
* Fixed a few warnings related to FBGEMM
* Fixed windows warning related to TensorPipe
* Disabled Metal to silence warning that it is only used on iOS
* Silence cmake developer warnings
* Disabled linux-musl and Windows
* Added additional library product libtorch_cpu
* Added SO version to libraries and disabled numpy
* Set GLIBCXX_USE_CXX11_ABI - like official libtorch builds.
* Added platform expansion for C++ string ABIs
* Added dep build versions and/or compat
* Disabled ARM 32-bit platforms
* Fixup for FBGEMM warning on aarch64-apple-darwin
e7dc80e
to
afc8ffa
Compare
* Using CUDA_full v11.3 to use v11.3.1+1 which includes Thrust library.
* Using CUDNN v8.2.4 for build version (similar to ONNXRuntime)
* Added patch for cmake to find CUDA
* Set CUDACXX to make cmake find CUDA
* Added CUDA libraries manually - and enabled CUDNN
* Added double-triple configure hack to make CUDA configure - To get past TRY_RUN for CUDA
* Added CUDA headers to CMAKE_INCLUDE_PATH
* Additional fixes for CUDA - and CUB
* Set TMPDIR for nvcc
* Added additional CUDA libraries
f466550
to
2dc37fa
Compare
Is this good to go now?
Yes, LGTM :-) I plan to follow-up with a PR with a recipe for building an updated version of the C wrapper in https://github.com/FluxML/Torch.jl/tree/master/build
Awesome I was just looking at whether to do the ocaml or Rust one today.
Alternative to #4477, as that bumped into FluxML/Torch.jl#17
Building Torch from source (for Linux and Linux with CUDA) would likely remedy (at least):
Related aims (but not the aim of this PR):