[WIP] Always pass `-x cu` to nvcc #1047
Conversation
Otherwise we get warnings about unused imports on Windows.
This is a bit of a hack. The integration tests under `src/test/`, specifically `test_server_compile`, started timing out at some point after the Windows builds were broken. It's not obvious to me why it should be this particular test and not others. But we're running on Windows, and presumably on some kind of VM in CI, so it seems worth bumping up the timeout here.
Tests on appveyor timeout before the server starts.
BLAKE3 is designed to be a very high performance cryptographic hash. The BLAKE3 team has shown 8.5x higher single-thread performance than SHA-512 on modern server hardware (AWS `c5.metal`). This change did not result in a significant improvement to my observed local build times, but newer hardware may see a meaningful improvement. Signed-off-by: George Hahn <[email protected]>
This is used by Chromium, for example.
Even though Actions aren't supported on the main repo, they can still be supported on personal repos, and people might proactively fix their Windows bustage if GitHub sends them emails about it. So let's keep Actions, but turn off things we haven't enabled on Travis yet.
Co-authored-by: Bert Belder <[email protected]>
Removes a chunk from the readme regarding a false positive about the rustc-wrapper entry not being used; the underlying issue has been fixed since Cargo 1.40.0. Cargo issue: rust-lang/cargo#7745
Co-authored-by: Bernhard Schuster <[email protected]>
Because the configuration is merged from both the environment and the configuration file, it's possible to forget about overriding variables related to one of the backends (e.g. by setting `SCCACHE_REDIS`). To account for that and not have to explicitly list/remember all of the supported env vars, we just don't inherit the sccache-related environment at all when running this test.
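The "don't inherit any sccache-related environment" approach can be sketched as a small filter over the inherited environment. This is a hypothetical helper (not sccache's actual test code), assuming all relevant variables share the `SCCACHE_` prefix:

```rust
use std::collections::HashMap;

/// Keep only environment variables that are NOT sccache-related.
/// Hypothetical helper: filtering by prefix avoids having to list
/// every backend variable (SCCACHE_REDIS, SCCACHE_BUCKET, ...).
fn scrubbed_env(vars: impl Iterator<Item = (String, String)>) -> HashMap<String, String> {
    vars.filter(|(k, _)| !k.starts_with("SCCACHE_")).collect()
}

fn main() {
    let vars = vec![
        ("PATH".to_string(), "/usr/bin".to_string()),
        ("SCCACHE_REDIS".to_string(), "redis://localhost".to_string()),
        ("SCCACHE_DIR".to_string(), "/tmp/cache".to_string()),
    ];
    let env = scrubbed_env(vars.into_iter());
    // Only the non-sccache variable survives the scrub.
    assert!(env.contains_key("PATH"));
    assert!(!env.contains_key("SCCACHE_REDIS"));
    assert_eq!(env.len(), 1);
}
```

A real test would start the child with `std::process::Command`, call `env_clear()`, and re-add only the filtered variables via `envs()` so the server under test starts from a clean slate.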
I needed a way to make hashes CWD-dependent, and it feels ugly to use other variable names for that purpose.
…eate custom gcc+nvcc toolchain tgz
Some tasks are failing (clippy, rustfmt, etc.); could you please fix these issues?
@sylvestre sorry for the delay over the holidays. My original goal for this PR was to make sccache always pass `-x cu` to nvcc. I'm fairly close with the full fix -- I think I just need to ensure more things are packaged into the nvcc dist toolchain. If you don't mind, I'd like to keep this PR open as a draft/work-in-progress so I can keep pushing things and testing in CI 🙏.
Sounds good to me, go for it Paul.
This is tantalizing 😛. I would also like to make sccache-dist support nvcc nicely. @trxcllnt do you mind explaining what the gaps are between this PR and the "full fix"? If you don't have time to push this one over the finish line I might have some time to try.
@suo Yeah, I'd love some help. I did some more exploring since I posted last, so I'll try to describe my current thoughts in detail. I don't think it's worth building from this branch anymore, since it doesn't have any of the work I describe below and is so far behind.

Disclaimer: I only know what's publicly available in the docs (https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html) and presentations (https://on-demand.gputechconf.com/gtc/2013/presentations/S3185-Building-GPU-Compilers-libNVVM.pdf). I don't have any special knowledge of nvcc internals or roadmap. I'm sure there are use cases/edge cases of which I'm not aware.

The main issue for sccache-dist is that nvcc is a sort of compiler launcher, not the compiler itself. The sccache client expects to be able to preprocess a file to compute a hash, then send the preprocessed file contents (plus a compiler toolchain) to a worker and compile the preprocessed file. Unfortunately, compiling preprocessed input is not a supported nvcc run mode.

Executing `nvcc <args> --dryrun` prints the list of sub-compiler invocations to the host compiler and NVIDIA compilers. Here's an example of the sub-compiler invocations generated and executed by nvcc:

```shell
# Safe to run w/o creating `/tmp/x.cu` input file due to --dryrun
/usr/local/cuda/bin/nvcc \
  --generate-code=arch=compute_60,code=[sm_60] \
  --generate-code=arch=compute_70,code=[sm_70] \
  --generate-code=arch=compute_75,code=[compute_75,sm_75] \
  --generate-code=arch=compute_80,code=[compute_80,sm_80] \
  --generate-code=arch=compute_86,code=[compute_86,sm_86] \
  -c /tmp/x.cu -o /tmp/x.cu.o -DSCCACHE_TEST_DEFINE \
  --dryrun

#$ _NVVM_BRANCH_=nvvm
#$ _SPACE_=
#$ _CUDART_=cudart
#$ _HERE_=/usr/local/cuda/bin
#$ _THERE_=/usr/local/cuda/bin
#$ _TARGET_SIZE_=
#$ _TARGET_DIR_=
#$ _TARGET_DIR_=targets/x86_64-linux
#$ TOP=/usr/local/cuda/bin/..
#$ NVVMIR_LIBRARY_DIR=/usr/local/cuda/bin/../nvvm/libdevice
#$ LD_LIBRARY_PATH=/usr/local/cuda/bin/../lib:
#$ PATH=/usr/local/cuda/bin/../nvvm/bin:/usr/local/cuda/bin:/usr/local/cuda/bin:/usr/local/cuda/nvvm/bin:/home/ptaylor/.nvm/versions/node/v16.15.1/bin:/home/ptaylor/.cargo/bin:/home/ptaylor/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/home/ptaylor/.fzf/bin:/home/ptaylor/.bin:/home/ptaylor/.local/bin
#$ INCLUDES="-I/usr/local/cuda/bin/../targets/x86_64-linux/include"
#$ LIBRARIES= "-L/usr/local/cuda/bin/../targets/x86_64-linux/lib/stubs" "-L/usr/local/cuda/bin/../targets/x86_64-linux/lib"
#$ CUDAFE_FLAGS=
#$ PTXAS_FLAGS=
#$ gcc -D__CUDA_ARCH__=860 -D__CUDA_ARCH_LIST__=600,700,750,800,860 -E -x c++ -DCUDA_DOUBLE_MATH_FUNCTIONS -D__CUDACC__ -D__NVCC__ "-I/usr/local/cuda/bin/../targets/x86_64-linux/include" -D "SCCACHE_TEST_DEFINE" -D__CUDACC_VER_MAJOR__=11 -D__CUDACC_VER_MINOR__=7 -D__CUDACC_VER_BUILD__=64 -D__CUDA_API_VER_MAJOR__=11 -D__CUDA_API_VER_MINOR__=7 -D__NVCC_DIAG_PRAGMA_SUPPORT__=1 -include "cuda_runtime.h" -m64 "/tmp/x.cu" -o "/tmp/tmpxft_0003a542_00000000-11_x.compute_86.cpp1.ii"
#$ cicc --c++14 --gnu_version=90400 --display_error_number --orig_src_file_name "/tmp/x.cu" --orig_src_path_name "/tmp/x.cu" --allow_managed -arch compute_86 -m64 --no-version-ident -ftz=0 -prec_div=1 -prec_sqrt=1 -fmad=1 --include_file_name "tmpxft_0003a542_00000000-3_x.fatbin.c" -tused --gen_module_id_file --module_id_file_name "/tmp/tmpxft_0003a542_00000000-4_x.module_id" --gen_c_file_name "/tmp/tmpxft_0003a542_00000000-6_x.compute_86.cudafe1.c" --stub_file_name "/tmp/tmpxft_0003a542_00000000-6_x.compute_86.cudafe1.stub.c" --gen_device_file_name "/tmp/tmpxft_0003a542_00000000-6_x.compute_86.cudafe1.gpu" "/tmp/tmpxft_0003a542_00000000-11_x.compute_86.cpp1.ii" -o "/tmp/tmpxft_0003a542_00000000-6_x.compute_86.ptx"
#$ gcc -D__CUDA_ARCH__=600 -D__CUDA_ARCH_LIST__=600,700,750,800,860 -E -x c++ -DCUDA_DOUBLE_MATH_FUNCTIONS -D__CUDACC__ -D__NVCC__ "-I/usr/local/cuda/bin/../targets/x86_64-linux/include" -D "SCCACHE_TEST_DEFINE" -D__CUDACC_VER_MAJOR__=11 -D__CUDACC_VER_MINOR__=7 -D__CUDACC_VER_BUILD__=64 -D__CUDA_API_VER_MAJOR__=11 -D__CUDA_API_VER_MINOR__=7 -D__NVCC_DIAG_PRAGMA_SUPPORT__=1 -include "cuda_runtime.h" -m64 "/tmp/x.cu" -o "/tmp/tmpxft_0003a542_00000000-12_x.compute_60.cpp1.ii"
#$ cicc --c++14 --gnu_version=90400 --display_error_number --orig_src_file_name "/tmp/x.cu" --orig_src_path_name "/tmp/x.cu" --allow_managed -arch compute_60 -m64 --no-version-ident -ftz=0 -prec_div=1 -prec_sqrt=1 -fmad=1 --include_file_name "tmpxft_0003a542_00000000-3_x.fatbin.c" -tused --module_id_file_name "/tmp/tmpxft_0003a542_00000000-4_x.module_id" --gen_c_file_name "/tmp/tmpxft_0003a542_00000000-10_x.compute_60.cudafe1.c" --stub_file_name "/tmp/tmpxft_0003a542_00000000-10_x.compute_60.cudafe1.stub.c" --gen_device_file_name "/tmp/tmpxft_0003a542_00000000-10_x.compute_60.cudafe1.gpu" "/tmp/tmpxft_0003a542_00000000-12_x.compute_60.cpp1.ii" -o "/tmp/tmpxft_0003a542_00000000-10_x.compute_60.ptx"
#$ ptxas -arch=sm_60 -m64 "/tmp/tmpxft_0003a542_00000000-10_x.compute_60.ptx" -o "/tmp/tmpxft_0003a542_00000000-13_x.compute_60.cubin"
#$ gcc -D__CUDA_ARCH__=700 -D__CUDA_ARCH_LIST__=600,700,750,800,860 -E -x c++ -DCUDA_DOUBLE_MATH_FUNCTIONS -D__CUDACC__ -D__NVCC__ "-I/usr/local/cuda/bin/../targets/x86_64-linux/include" -D "SCCACHE_TEST_DEFINE" -D__CUDACC_VER_MAJOR__=11 -D__CUDACC_VER_MINOR__=7 -D__CUDACC_VER_BUILD__=64 -D__CUDA_API_VER_MAJOR__=11 -D__CUDA_API_VER_MINOR__=7 -D__NVCC_DIAG_PRAGMA_SUPPORT__=1 -include "cuda_runtime.h" -m64 "/tmp/x.cu" -o "/tmp/tmpxft_0003a542_00000000-14_x.compute_70.cpp1.ii"
#$ cicc --c++14 --gnu_version=90400 --display_error_number --orig_src_file_name "/tmp/x.cu" --orig_src_path_name "/tmp/x.cu" --allow_managed -arch compute_70 -m64 --no-version-ident -ftz=0 -prec_div=1 -prec_sqrt=1 -fmad=1 --include_file_name "tmpxft_0003a542_00000000-3_x.fatbin.c" -tused --module_id_file_name "/tmp/tmpxft_0003a542_00000000-4_x.module_id" --gen_c_file_name "/tmp/tmpxft_0003a542_00000000-9_x.compute_70.cudafe1.c" --stub_file_name "/tmp/tmpxft_0003a542_00000000-9_x.compute_70.cudafe1.stub.c" --gen_device_file_name "/tmp/tmpxft_0003a542_00000000-9_x.compute_70.cudafe1.gpu" "/tmp/tmpxft_0003a542_00000000-14_x.compute_70.cpp1.ii" -o "/tmp/tmpxft_0003a542_00000000-9_x.compute_70.ptx"
#$ ptxas -arch=sm_70 -m64 "/tmp/tmpxft_0003a542_00000000-9_x.compute_70.ptx" -o "/tmp/tmpxft_0003a542_00000000-15_x.compute_70.cubin"
#$ gcc -D__CUDA_ARCH__=750 -D__CUDA_ARCH_LIST__=600,700,750,800,860 -E -x c++ -DCUDA_DOUBLE_MATH_FUNCTIONS -D__CUDACC__ -D__NVCC__ "-I/usr/local/cuda/bin/../targets/x86_64-linux/include" -D "SCCACHE_TEST_DEFINE" -D__CUDACC_VER_MAJOR__=11 -D__CUDACC_VER_MINOR__=7 -D__CUDACC_VER_BUILD__=64 -D__CUDA_API_VER_MAJOR__=11 -D__CUDA_API_VER_MINOR__=7 -D__NVCC_DIAG_PRAGMA_SUPPORT__=1 -include "cuda_runtime.h" -m64 "/tmp/x.cu" -o "/tmp/tmpxft_0003a542_00000000-16_x.compute_75.cpp1.ii"
#$ cicc --c++14 --gnu_version=90400 --display_error_number --orig_src_file_name "/tmp/x.cu" --orig_src_path_name "/tmp/x.cu" --allow_managed -arch compute_75 -m64 --no-version-ident -ftz=0 -prec_div=1 -prec_sqrt=1 -fmad=1 --include_file_name "tmpxft_0003a542_00000000-3_x.fatbin.c" -tused --module_id_file_name "/tmp/tmpxft_0003a542_00000000-4_x.module_id" --gen_c_file_name "/tmp/tmpxft_0003a542_00000000-8_x.compute_75.cudafe1.c" --stub_file_name "/tmp/tmpxft_0003a542_00000000-8_x.compute_75.cudafe1.stub.c" --gen_device_file_name "/tmp/tmpxft_0003a542_00000000-8_x.compute_75.cudafe1.gpu" "/tmp/tmpxft_0003a542_00000000-16_x.compute_75.cpp1.ii" -o "/tmp/tmpxft_0003a542_00000000-8_x.compute_75.ptx"
#$ ptxas -arch=sm_75 -m64 "/tmp/tmpxft_0003a542_00000000-8_x.compute_75.ptx" -o "/tmp/tmpxft_0003a542_00000000-17_x.compute_75.sm_75.cubin"
#$ gcc -D__CUDA_ARCH__=800 -D__CUDA_ARCH_LIST__=600,700,750,800,860 -E -x c++ -DCUDA_DOUBLE_MATH_FUNCTIONS -D__CUDACC__ -D__NVCC__ "-I/usr/local/cuda/bin/../targets/x86_64-linux/include" -D "SCCACHE_TEST_DEFINE" -D__CUDACC_VER_MAJOR__=11 -D__CUDACC_VER_MINOR__=7 -D__CUDACC_VER_BUILD__=64 -D__CUDA_API_VER_MAJOR__=11 -D__CUDA_API_VER_MINOR__=7 -D__NVCC_DIAG_PRAGMA_SUPPORT__=1 -include "cuda_runtime.h" -m64 "/tmp/x.cu" -o "/tmp/tmpxft_0003a542_00000000-18_x.compute_80.cpp1.ii"
#$ cicc --c++14 --gnu_version=90400 --display_error_number --orig_src_file_name "/tmp/x.cu" --orig_src_path_name "/tmp/x.cu" --allow_managed -arch compute_80 -m64 --no-version-ident -ftz=0 -prec_div=1 -prec_sqrt=1 -fmad=1 --include_file_name "tmpxft_0003a542_00000000-3_x.fatbin.c" -tused --module_id_file_name "/tmp/tmpxft_0003a542_00000000-4_x.module_id" --gen_c_file_name "/tmp/tmpxft_0003a542_00000000-7_x.compute_80.cudafe1.c" --stub_file_name "/tmp/tmpxft_0003a542_00000000-7_x.compute_80.cudafe1.stub.c" --gen_device_file_name "/tmp/tmpxft_0003a542_00000000-7_x.compute_80.cudafe1.gpu" "/tmp/tmpxft_0003a542_00000000-18_x.compute_80.cpp1.ii" -o "/tmp/tmpxft_0003a542_00000000-7_x.compute_80.ptx"
#$ ptxas -arch=sm_80 -m64 "/tmp/tmpxft_0003a542_00000000-7_x.compute_80.ptx" -o "/tmp/tmpxft_0003a542_00000000-19_x.compute_80.sm_80.cubin"
#$ ptxas -arch=sm_86 -m64 "/tmp/tmpxft_0003a542_00000000-6_x.compute_86.ptx" -o "/tmp/tmpxft_0003a542_00000000-20_x.compute_86.sm_86.cubin"
#$ fatbinary -64 --cicc-cmdline="-ftz=0 -prec_div=1 -prec_sqrt=1 -fmad=1 " "--image3=kind=elf,sm=60,file=/tmp/tmpxft_0003a542_00000000-13_x.compute_60.cubin" "--image3=kind=elf,sm=70,file=/tmp/tmpxft_0003a542_00000000-15_x.compute_70.cubin" "--image3=kind=ptx,sm=75,file=/tmp/tmpxft_0003a542_00000000-8_x.compute_75.ptx" "--image3=kind=elf,sm=75,file=/tmp/tmpxft_0003a542_00000000-17_x.compute_75.sm_75.cubin" "--image3=kind=ptx,sm=80,file=/tmp/tmpxft_0003a542_00000000-7_x.compute_80.ptx" "--image3=kind=elf,sm=80,file=/tmp/tmpxft_0003a542_00000000-19_x.compute_80.sm_80.cubin" "--image3=kind=ptx,sm=86,file=/tmp/tmpxft_0003a542_00000000-6_x.compute_86.ptx" "--image3=kind=elf,sm=86,file=/tmp/tmpxft_0003a542_00000000-20_x.compute_86.sm_86.cubin" --embedded-fatbin="/tmp/tmpxft_0003a542_00000000-3_x.fatbin.c"
#$ rm /tmp/tmpxft_0003a542_00000000-3_x.fatbin
#$ gcc -D__CUDA_ARCH_LIST__=600,700,750,800,860 -E -x c++ -D__CUDACC__ -D__NVCC__ "-I/usr/local/cuda/bin/../targets/x86_64-linux/include" -D "SCCACHE_TEST_DEFINE" -D__CUDACC_VER_MAJOR__=11 -D__CUDACC_VER_MINOR__=7 -D__CUDACC_VER_BUILD__=64 -D__CUDA_API_VER_MAJOR__=11 -D__CUDA_API_VER_MINOR__=7 -D__NVCC_DIAG_PRAGMA_SUPPORT__=1 -include "cuda_runtime.h" -m64 "/tmp/x.cu" -o "/tmp/tmpxft_0003a542_00000000-5_x.cpp4.ii"
#$ cudafe++ --c++14 --gnu_version=90400 --display_error_number --orig_src_file_name "/tmp/x.cu" --orig_src_path_name "/tmp/x.cu" --allow_managed --m64 --parse_templates --gen_c_file_name "/tmp/tmpxft_0003a542_00000000-6_x.compute_86.cudafe1.cpp" --stub_file_name "tmpxft_0003a542_00000000-6_x.compute_86.cudafe1.stub.c" --module_id_file_name "/tmp/tmpxft_0003a542_00000000-4_x.module_id" "/tmp/tmpxft_0003a542_00000000-5_x.cpp4.ii"
#$ gcc -D__CUDA_ARCH__=860 -D__CUDA_ARCH_LIST__=600,700,750,800,860 -c -x c++ -DCUDA_DOUBLE_MATH_FUNCTIONS "-I/usr/local/cuda/bin/../targets/x86_64-linux/include" -m64 "/tmp/tmpxft_0003a542_00000000-6_x.compute_86.cudafe1.cpp" -o "/tmp/x.cu.o"
```

I formatted the output above to highlight the compiler phases (https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#cuda-compilation-trajectory__cuda-compilation-from-cu-to-executable).

Device-side compilation

nvcc executes the following steps for each GPU arch:

1. host compiler preprocessor invocation (`gcc -E`)
2. `cicc` on the result of step 1 to produce an intermediate PTX assembly `.ptx` file
3. `ptxas` on the result of step 2 to produce a device code binary `.cubin` (for a single GPU arch)

And finally, a call to `fatbinary` to link all the `.cubin` files into a `.fatbin`.

Host-side compilation

The last three lines of the output:

1. host compiler preprocessor invocation again (`gcc -E`)
2. `cudafe++` to embed the device-side fatbin into the result of step 1
3. a host compiler invocation to compile the host `.cpp` from step 2 to an object `.o` file

sccache-dist modifications

Here's a rough outline of what we'd need to do for sccache-dist:

1. The sccache client runs `nvcc -E` (like it does today) to compute the compile hash for cache lookups
2. If no cached object exists, it runs `nvcc <original-args> --dryrun` to produce the host/device sub-compiler commands
3. The sccache client runs each `<host-compiler> -E` invocation (steps 1 above) and saves the output in the payload sent to the sccache-dist worker
4. It sends each preprocessed file, the `cicc`/`ptxas`/`fatbinary`/`cudafe++`/`<host compiler>` sub-compiler commands, and the minimal nvcc toolchain to the sccache-dist worker
5. The sccache-dist worker executes each sub-compiler command and ultimately generates the final `.o` object file

One caveat may be supporting the `--threads` option (https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#options-for-guiding-compiler-driver-threads), since that allows nvcc to compile multiple architectures in parallel. We may need to ignore that flag when we send all compile jobs to one worker, or (ideally) send each `cicc` + `ptxas` pair to separate workers, then perform the final host-linker step once they're done.

Wow, thank you for the detailed and very helpful response! This is definitely enough for me to start on. I'll have some free time in the coming weeks, so hopefully we can get it done.
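The `--dryrun` output nvcc prints is line-oriented: each line carries a `#$ ` prefix and is either an environment assignment or a sub-compiler command. A minimal sketch of the parsing that step 2 of the outline implies (hypothetical types, not sccache's actual code, and ignoring quoting edge cases):

```rust
/// One entry from `nvcc --dryrun` output: either an environment
/// assignment (e.g. `_NVVM_BRANCH_=nvvm`) or a sub-compiler command
/// line (e.g. `gcc -E ...`, `cicc ...`, `ptxas ...`).
#[derive(Debug, PartialEq)]
enum DryrunLine {
    EnvVar { name: String, value: String },
    Command(String),
}

/// Parse the `#$ `-prefixed lines nvcc prints under --dryrun.
/// Sketch only: commands such as `gcc -D__CUDA_ARCH__=860 ...` also
/// contain `=`, so a line only counts as an assignment when the text
/// before the first `=` is a single space-free name.
fn parse_dryrun(output: &str) -> Vec<DryrunLine> {
    output
        .lines()
        .filter_map(|l| l.strip_prefix("#$ "))
        .map(|l| match l.split_once('=') {
            Some((name, value)) if !name.is_empty() && !name.contains(' ') => DryrunLine::EnvVar {
                name: name.to_string(),
                value: value.to_string(),
            },
            _ => DryrunLine::Command(l.to_string()),
        })
        .collect()
}

fn main() {
    let out = "#$ _NVVM_BRANCH_=nvvm\n#$ gcc -D__CUDA_ARCH__=860 -E /tmp/x.cu\n#$ rm /tmp/x.fatbin\n";
    let lines = parse_dryrun(out);
    assert_eq!(lines.len(), 3);
    assert!(matches!(lines[0], DryrunLine::EnvVar { .. }));
    assert!(matches!(lines[1], DryrunLine::Command(_)));
    assert!(matches!(lines[2], DryrunLine::Command(_)));
}
```

A scheduler along the lines of the outline would then replay the assignments into each sub-command's environment and dispatch the command entries to workers in order.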
Paul's outline is an amazing starting point. If my memory is correct, the code paths are slightly different when compiling directly to SASS (`sm_XY`), since you don't embed the PTX. I also can't remember whether `-rdc` affects the compilation process. This isn't to take away from the effort, but we should be aware that whatever we design will most likely need changes to support all the complicated use cases.
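The SASS-versus-PTX distinction is visible in the `--generate-code` flags themselves: a `code=[sm_XY]` list requests SASS only, while `code=[compute_XY,sm_XY]` also embeds forward-compatible PTX (matching the `kind=ptx` entries fatbinary received only for those arches in the dry-run output). A hypothetical check, not sccache code:

```rust
/// Given one --generate-code value, report whether a PTX image is
/// embedded in the fatbin (any `compute_XY` entry in the code=[...]
/// list) in addition to SASS (`sm_XY` entries). Illustration only;
/// real nvcc flag syntax has more variants than handled here.
fn embeds_ptx(generate_code: &str) -> bool {
    generate_code
        .split("code=[")
        .nth(1)
        .and_then(|rest| rest.split(']').next())
        .map(|list| list.split(',').any(|c| c.trim().starts_with("compute_")))
        .unwrap_or(false)
}

fn main() {
    // SASS only: no PTX fallback is embedded for this arch.
    assert!(!embeds_ptx("arch=compute_60,code=[sm_60]"));
    // PTX + SASS: forward-compatible PTX is embedded too.
    assert!(embeds_ptx("arch=compute_75,code=[compute_75,sm_75]"));
}
```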
Sorry, I made a mistake; could you please resend/recreate it if you still want to land it?
@sylvestre no worries, it's safe to close this PR.
This PR ensures the `-x cu` language flag is not modified when constructing an nvcc compile string. When `rewrite_includes_only` is false, the `-x cu` argument is transformed to `-x cu-cpp-output`, causing nvcc to error.

Curiously, this only seems to show up when attempting a distributed compilation with `sccache-dist`. I have not encountered this issue doing local-only `sccache` builds. Does `rewrite_includes_only` only affect sccache-dist servers (or is it set to false in the server job request)?

cc: @robertmaynard
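As a sketch of the fix's intent (hypothetical helper, not the actual sccache implementation): gcc-style languages have a `*-cpp-output` form for already-preprocessed input, but nvcc has no working `cu-cpp-output` mode, so the `cu` language must pass through unchanged when rewriting `-x` for a preprocessed compile:

```rust
/// Map a -x language to the form used when compiling preprocessed
/// output. Hypothetical sketch: `c` and `c++` have gcc-recognized
/// *-cpp-output variants, but nvcc cannot consume preprocessed CUDA
/// input, so `cu` is deliberately left untouched.
fn preprocessed_language(lang: &str) -> &str {
    match lang {
        "c" => "cpp-output",
        "c++" => "c++-cpp-output",
        // The bug this PR addresses: transforming this to
        // "cu-cpp-output" makes nvcc error out.
        "cu" => "cu",
        other => other,
    }
}

fn main() {
    assert_eq!(preprocessed_language("c++"), "c++-cpp-output");
    assert_eq!(preprocessed_language("c"), "cpp-output");
    assert_eq!(preprocessed_language("cu"), "cu");
}
```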