openai-triton: update to v2.2.0, pass compiler and libcuda paths to runtime #292996
Conversation
Forgot to commit the updated hash.
This looks good after you've managed to commit the updated hash.
@@ -111,6 +112,11 @@ buildPythonPackage rec {
    # Use our linker flags
    substituteInPlace python/triton/common/build.py \
      --replace '${oldStr}' '${newStr}'
    # triton/common/build.py will be called both at build time and sometimes at runtime. This is cursed.
    substituteInPlace python/triton/common/build.py \
      --replace 'os.getenv("TRITON_LIBCUDA_PATH")' '"${cudaPackages.cuda_cudart}/lib"'
The environment variable says "libcuda" (the userspace driver), not "libcudart"? Is this a confusion on upstream's side?

> This is cursed.

Yes, and this is intentional: isn't triton literally a tool for compiling kernels on the fly from some subset of Python?
Also, wasn't there an attempt to make CUDA stuff optional in triton? In that case we don't want to refer to backendStdenv but to the conditional stdenv (otherwise the cpu-only version pulls two different GCCs into the closure)
> The environment variable says "libcuda" (the userspace driver), not "libcudart"? Is this a confusion on upstream's side?

It wants libcuda.so.1, I'm not sure where I need to look for it?

> Yes, and this is intentional: isn't triton literally a tool for compiling kernels on the fly from some subset of Python?

The cursed part is that the build and runtime steps are closely intermixed, but the build step doesn't have a way to provide some values for runtime.
It provides ptxas as a third_party binary, but for libcuda and friends it expects to run with the same binaries it was built with, which is not necessarily true, and I'm sure some things there are not ABI-compatible. I see that in version 3.0.0 the build process makes much more sense.

> Also, wasn't there an attempt to make CUDA stuff optional in triton? In that case we don't want to refer to backendStdenv but to the conditional stdenv (otherwise the cpu-only version pulls two different GCCs into the closure)

The CC variable read override is only enabled on cudaSupport; otherwise it doesn't try to call CC at runtime, and the build-time CC is enough (at least, I haven't experienced that with vLLM).
> It wants libcuda.so.1, I'm not sure where I need to look for it?

Libcuda depends on the (nvidia) kernel (module) that runs on the user's machine, so we don't link it through the nix store; we link it through /run/opengl-driver/lib, i.e. `${addDriverRunpath.driverLink}/lib`.
More specifically, we use the fake driver `${getLib cudaPackages.cuda_cudart}/lib/stubs` at build/link time, and `${addDriverRunpath.driverLink}/lib` at runtime. It's also important that at runtime we first try to `dlopen("libcuda.so", ...)`, and only then `dlopen("/run/opengl-driver/lib/libcuda.so", ...)`, because we want things to also work on FHS distributions and respect the optional LD_LIBRARY_PATH.
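To illustrate that loading order, here is a minimal sketch using Python's ctypes as a stand-in for triton's actual generated C launcher; the `load_libcuda` helper and the exact candidate list are made up for the example:

```python
import ctypes


def load_libcuda():
    # First let the dynamic loader resolve libcuda.so through the usual
    # mechanisms (LD_LIBRARY_PATH, ldconfig cache), so FHS distributions and
    # user overrides keep working...
    candidates = (
        "libcuda.so",
        "libcuda.so.1",
        # ...and only then fall back to the NixOS driver link.
        "/run/opengl-driver/lib/libcuda.so",
        "/run/opengl-driver/lib/libcuda.so.1",
    )
    for candidate in candidates:
        try:
            return ctypes.CDLL(candidate)
        except OSError:
            continue
    raise OSError("libcuda.so not found in any of the candidate locations")
```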
> The cursed part is that the build and runtime steps are closely intermixed, but the build step doesn't have a way to provide some values for runtime.

Oh right, we should probably try to explicitly track the references retained at runtime.

> The CC variable read override is only enabled on cudaSupport; otherwise it doesn't try to call CC at runtime, and the build-time CC is enough (at least, I haven't experienced that with vLLM).

Do they not use `triton.common.build` at runtime for their jit/aot?
> More specifically, we use the fake driver `${getLib cudaPackages.cuda_cudart}/lib/stubs` at build/link time, and `${addDriverRunpath.driverLink}/lib` at runtime. It's also important that at runtime we first try to `dlopen("libcuda.so", ...)`, and only then `dlopen("/run/opengl-driver/lib/libcuda.so", ...)`, because we want things to also work on FHS distributions and respect the optional LD_LIBRARY_PATH.

I don't think we can achieve that with this code? It runs both at compile time and at runtime...
Except by patching it in preInstall?..

> Do they not use `triton.common.build` at runtime for their jit/aot?

Not in vLLM on ROCm; I'm not sure about other projects using triton directly.
Well, we should open an issue asking for more fine-grained support. Note that they no longer use the variable on master, but use `whereis` instead, which is also platform-specific: https://github.com/feihugis/triton/blob/a9d1935e795cf28aa3c3be8ac5c14723e6805de5/python/triton/compiler.py#L1354-L1357
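For reference, the `whereis`-based approach at the linked lines amounts to something like the sketch below. This is not the verbatim upstream code; the `guess_libcuda_dirs` helper and the output parsing are illustrative, and the approach only works on Linux systems that ship `whereis` and install libcuda in a standard location, which is why it is platform-specific:

```python
import os
import subprocess


def guess_libcuda_dirs():
    # `whereis` output looks like:
    #   "libcuda.so: /usr/lib/x86_64-linux-gnu/libcuda.so ..."
    out = subprocess.check_output(["whereis", "libcuda.so"]).decode()
    paths = out.split()[1:]  # drop the leading "libcuda.so:" label
    return sorted({os.path.dirname(p) for p in paths})
```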
    substituteInPlace python/triton/common/build.py \
      --replace 'os.getenv("TRITON_LIBCUDA_PATH")' '"${cudaPackages.cuda_cudart}/lib"'
    substituteInPlace python/triton/common/build.py \
      --replace 'os.environ.get("CC")' '"${cudaPackages.backendStdenv.cc}/bin/cc"'
I would suggest making this (and the above) something like `os.environ.get("CC", default=${...})`. I would also suggest that upstream make this `TRITON_CC` instead of `CC`, to avoid unintentionally changing the compiler.
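A minimal sketch of what that default-fallback suggestion would leave behind in `python/triton/common/build.py`; the store path and the `_resolve_cc` helper are purely illustrative, and `TRITON_CC` is only the suggestion above, not something upstream currently implements:

```python
import os

# Illustrative placeholder for the Nix-pinned compiler substituted at build time.
_DEFAULT_CC = "/nix/store/...-backendStdenv-cc/bin/cc"


def _resolve_cc():
    # Honour a user-provided TRITON_CC (or CC) at runtime, and only fall back
    # to the pinned compiler when neither is set, instead of always overriding.
    return os.environ.get("TRITON_CC") or os.environ.get("CC", _DEFAULT_CC)
```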
So, the ptxas, cudart, and libcuda situations have been revisited in #328247. @CertainLach could you update the description wrt what issues are left?
Duplicate of #328247
Description of changes
On vLLM v0.3.3, using Mixtral causes triton to use CC at runtime, so defining the CC variable only for the build is not enough.
Cc: @happysalada
The same applies to libcuda: it also needs to be provided at runtime.
This doesn't seem to affect ROCm; however, I'm not sure about the execution path there.