
openai-triton: update to v2.2.0, pass compiler and libcuda paths to runtime #292996

Closed

Conversation

CertainLach
Member

Description of changes

On vLLM v0.3.3, running Mixtral causes Triton to invoke CC at runtime, so defining the CC variable only at build time is not enough.

Cc: @happysalada

 (_AsyncLLMEngine pid=43320)   File "/nix/store/lgc1qjfq1m8bygz0mdddfwra7cid3xdk-python3-3.11.8-env/lib/python3.11/site-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 196, in invoke_fused_moe_kernel
 (_AsyncLLMEngine pid=43320)     fused_moe_kernel[grid](
 (_AsyncLLMEngine pid=43320)   File "<string>", line 63, in fused_moe_kernel
 (_AsyncLLMEngine pid=43320)   File "/nix/store/lgc1qjfq1m8bygz0mdddfwra7cid3xdk-python3-3.11.8-env/lib/python3.11/site-packages/triton/compiler/compiler.py", line 425, in compile
 (_AsyncLLMEngine pid=43320)     so_path = make_stub(name, signature, constants)
 (_AsyncLLMEngine pid=43320)               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 (_AsyncLLMEngine pid=43320)   File "/nix/store/lgc1qjfq1m8bygz0mdddfwra7cid3xdk-python3-3.11.8-env/lib/python3.11/site-packages/triton/compiler/make_launcher.py", line 39, in make_stub
 (_AsyncLLMEngine pid=43320)     so = _build(name, src_path, tmpdir)
 (_AsyncLLMEngine pid=43320)          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 (_AsyncLLMEngine pid=43320)   File "/nix/store/lgc1qjfq1m8bygz0mdddfwra7cid3xdk-python3-3.11.8-env/lib/python3.11/site-packages/triton/common/build.py", line 80, in _build
 (_AsyncLLMEngine pid=43320)     raise RuntimeError("Failed to find C compiler. Please specify via CC environment variable.")

Same thing with libcuda, it also needs to be provided to runtime.

This doesn't seem to affect ROCm; however, I'm not sure about the execution path there.
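For context, the compiler lookup in triton/common/build.py behaves roughly like the following sketch (simplified, not Triton's exact code). The point is that it consults CC at the moment a kernel stub is compiled, which can happen at runtime, long after the Nix build environment and its CC variable are gone:

```python
import os
import shutil

def find_cc():
    """Sketch of Triton's runtime compiler lookup (simplified)."""
    cc = os.environ.get("CC")
    if cc is not None:
        return cc
    # Fall back to common compilers found on PATH at call time.
    for candidate in ("cc", "gcc", "clang"):
        path = shutil.which(candidate)
        if path is not None:
            return path
    raise RuntimeError(
        "Failed to find C compiler. Please specify via CC environment variable."
    )
```

Inside the Nix sandbox CC is set and this succeeds; in a deployed vLLM process neither CC nor a PATH compiler is guaranteed, which is exactly the RuntimeError in the traceback above.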

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 24.05 Release Notes (or backporting 23.05 and 23.11 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

@CertainLach CertainLach marked this pull request as draft March 3, 2024 11:54
@CertainLach
Member Author

Forgot to commit updated hash.

@happysalada
Contributor

This looks good once you've committed the updated hash.

@@ -111,6 +112,11 @@ buildPythonPackage rec {
# Use our linker flags
substituteInPlace python/triton/common/build.py \
--replace '${oldStr}' '${newStr}'
# triton/common/build.py will be called both on build, and sometimes in runtime. This is cursed.
substituteInPlace python/triton/common/build.py \
--replace 'os.getenv("TRITON_LIBCUDA_PATH")' '"${cudaPackages.cuda_cudart}/lib"'
Contributor

@SomeoneSerge SomeoneSerge Mar 3, 2024

The environment variable says "libcuda" (the userspace driver), not "libcudart"? Is this a confusion on upstream's side?

This is cursed.

Yes and this is intentional: isn't triton literally a tool for compiling kernels on the fly from some subset of python?

Contributor

Also, wasn't there an attempt to make CUDA stuff optional in triton? In that case we don't want to refer to backendStdenv but to the conditional stdenv (otherwise the cpu-only version pulls two different GCCs into the closure)

Member Author

The environment variable says "libcuda" (the userspace driver), not "libcudart"? Is this a confusion on upstream's side?

It wants libcuda.so.1; I'm not sure where I should look for it?

Yes and this is intentional: isn't triton literally a tool for compiling kernels on the fly from some subset of python?

The cursed part is that the build and runtime steps are closely intermixed, but the build step has no way to provide values for runtime.
It ships ptxas as a third_party binary, but for libcuda and the like it expects to run against the same binaries it was built with, which is not necessarily true; I'm sure some of those are not ABI-compatible. In version 3.0.0 the build process makes much more sense.

Also, wasn't there an attempt to make CUDA stuff optional in triton? In that case we don't want to refer to backendStdenv but to the conditional stdenv (otherwise the cpu-only version pulls two different GCCs into the closure)

The CC override is only enabled with cudaSupport; otherwise Triton doesn't try to call CC at runtime, and the build-time CC is enough (at least, I haven't run into this with vLLM).

Contributor

It wants libcuda.so.1, I'm not sure where I need to look for it?

Libcuda depends on the (NVIDIA) kernel (module) that runs on the user's machine, so we don't link it through the Nix store; we link it through /run/opengl-driver/lib: ${addDriverRunpath.driverLink}/lib.

More specifically, we use the fake driver ${getLib cudaPackages.cuda_cudart}/lib/stubs at build/link time, and ${addDriverRunpath.driverLink}/lib at runtime. It's also important that at runtime we first try dlopen("libcuda.so", ...), and only then dlopen("/run/opengl-driver/lib/libcuda.so", ...), because we want things to also work on FHS distributions and to respect an optional LD_LIBRARY_PATH.
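The lookup order described above can be sketched with ctypes (illustrative only; the helper names are made up, and the real wiring is done through linker flags and runpaths in the Nix expression, not Python):

```python
import ctypes

# Assumed value of ${addDriverRunpath.driverLink}/lib on NixOS.
DRIVER_LIB = "/run/opengl-driver/lib"

def load_first(candidates):
    """dlopen the first candidate that loads, in order."""
    for name in candidates:
        try:
            return ctypes.CDLL(name)
        except OSError:
            continue
    raise OSError("none of %r could be loaded" % (candidates,))

def load_libcuda():
    # Plain soname first, so FHS distributions and LD_LIBRARY_PATH win;
    # the NixOS driver link is only the fallback.
    return load_first(["libcuda.so.1", DRIVER_LIB + "/libcuda.so.1"])
```

The ordering is the whole point: hard-coding the /run/opengl-driver path as the first (or only) candidate would break non-NixOS hosts and ignore a user-provided LD_LIBRARY_PATH.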

The cursed part is that the build and runtime steps are closely intermixed, but the build step has no way to provide values for runtime.

Oh right, we should probably try to explicitly track the references retained at runtime.

The CC override is only enabled with cudaSupport; otherwise Triton doesn't try to call CC at runtime, and the build-time CC is enough (at least, I haven't run into this with vLLM).

Do they not use triton.common.build at runtime for their jit/aot?

Member Author

@CertainLach CertainLach Mar 3, 2024

More specifically, we use the fake driver ${getLib cudaPackages.cuda_cudart}/lib/stubs at build/link time, and ${addDriverRunpath.driverLink}/lib at runtime. It's also important that at runtime we first try dlopen("libcuda.so", ...), and only then dlopen("/run/opengl-driver/lib/libcuda.so", ...), because we want things to also work on FHS distributions and to respect an optional LD_LIBRARY_PATH.

I don't think we can achieve that with this code, since it runs both at build time and at runtime...
Except by patching it in preInstall?

Do they not use triton.common.build at runtime for their jit/aot?

Not in vLLM on ROCm, I'm not sure about other projects using triton directly.

Contributor

Well, we should open an issue asking for more fine-grained support. Note that they no longer use the variable on master, but use whereis, which is also platform-specific: https://github.com/feihugis/triton/blob/a9d1935e795cf28aa3c3be8ac5c14723e6805de5/python/triton/compiler.py#L1354-L1357

substituteInPlace python/triton/common/build.py \
--replace 'os.getenv("TRITON_LIBCUDA_PATH")' '"${cudaPackages.cuda_cudart}/lib"'
substituteInPlace python/triton/common/build.py \
--replace 'os.environ.get("CC")' '"${cudaPackages.backendStdenv.cc}/bin/cc"'
Contributor

I would suggest making this (and the one above) something like os.environ.get("CC", default=${...}). I would also suggest that upstream rename this to TRITON_CC instead of CC, to avoid unintentionally changing the compiler.
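That suggestion, sketched in plain Python (the store path below is a hypothetical stand-in for the Nix-interpolated ${cudaPackages.backendStdenv.cc}/bin/cc, and TRITON_CC is the proposed, not yet existing, variable):

```python
import os

# Hypothetical pinned compiler path; in the Nix expression this would be
# interpolated from the derivation, not hard-coded.
DEFAULT_CC = "/nix/store/example-gcc-wrapper/bin/cc"

def pick_cc():
    # CC (or a future TRITON_CC) stays available as a user override;
    # the pinned compiler is only a fallback instead of a hard failure.
    return os.environ.get("TRITON_CC") or os.environ.get("CC") or DEFAULT_CC
```

Compared to substituting the store path in unconditionally, this keeps the environment variable usable for users who genuinely need a different compiler.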

@CertainLach CertainLach force-pushed the openai-triton-runtime-compiler branch from ce5b24a to 1ce82a9 Compare March 3, 2024 14:06
@CertainLach CertainLach force-pushed the openai-triton-runtime-compiler branch from 1ce82a9 to c540b19 Compare March 3, 2024 14:17
@ofborg ofborg bot requested a review from SomeoneSerge March 3, 2024 14:51
@CertainLach CertainLach marked this pull request as ready for review March 4, 2024 12:19
@wegank wegank added the 2.status: merge conflict This PR has merge conflicts with the target branch label May 22, 2024
@SomeoneSerge
Contributor

So, the ptxas, cudart, and libcuda situations have been revisited in #328247. @CertainLach could you update the description wrt what issues are left?

@wegank wegank added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Jan 2, 2025
@aucub aucub closed this Jan 22, 2025
@aucub
Contributor

aucub commented Jan 22, 2025

Duplicate of #328247

@aucub aucub marked this as a duplicate of #328247 Jan 22, 2025
Labels
2.status: merge conflict · 2.status: stale · 6.topic: python · 10.rebuild-darwin: 11-100 · 10.rebuild-linux: 101-500