openai-triton: update to v2.2.0, pass compiler and libcuda paths to runtime #292996
Conversation
Forgot to commit the updated hash.
This looks good after you've managed to commit the updated hash.
@@ -111,6 +112,11 @@ buildPythonPackage rec {
    # Use our linker flags
    substituteInPlace python/triton/common/build.py \
      --replace '${oldStr}' '${newStr}'
    # triton/common/build.py will be called both at build time and sometimes at runtime. This is cursed.
    substituteInPlace python/triton/common/build.py \
      --replace 'os.getenv("TRITON_LIBCUDA_PATH")' '"${cudaPackages.cuda_cudart}/lib"'
The environment variable says "libcuda" (the userspace driver), not "libcudart"? Is this a confusion on upstream's side?

> This is cursed.

Yes, and this is intentional: isn't triton literally a tool for compiling kernels on the fly from some subset of Python?
Also, wasn't there an attempt to make CUDA stuff optional in triton? In that case we don't want to refer to backendStdenv but to the conditional stdenv (otherwise the cpu-only version pulls two different GCCs into the closure)
> The environment variable says "libcuda" (the userspace driver), not "libcudart"? Is this a confusion on upstream's side?

It wants libcuda.so.1, I'm not sure where I need to look for it?

> Yes, and this is intentional: isn't triton literally a tool for compiling kernels on the fly from some subset of Python?

The cursed part is that the build and runtime steps are closely intermixed, but the build step doesn't have a way to provide some values for runtime.
It provides ptxas as a third_party binary, but for libcuda and friends it expects to run with the same binaries it was built with, which is not necessarily true, and I'm sure some things there are not ABI-compatible. I see that in version 3.0.0 the build process makes much more sense.

> Also, wasn't there an attempt to make CUDA stuff optional in triton? In that case we don't want to refer to backendStdenv but to the conditional stdenv (otherwise the cpu-only version pulls two different GCCs into the closure)

The CC variable read override is only enabled on cudaSupport; otherwise it doesn't try to call CC at runtime, and the build-time CC is enough (at least, I haven't experienced that with vLLM).
> It wants libcuda.so.1, I'm not sure where I need to look for it?

Libcuda depends on the (nvidia) kernel (module) that runs on the user's machine, so we don't link it through the nix store; we link it through /run/opengl-driver/lib, i.e. `${addDriverRunpath.driverLink}/lib`.
More specifically, we use the fake driver `${getLib cudaPackages.cuda_cudart}/lib/stubs` at build/link time, and `${addDriverRunpath.driverLink}/lib` at runtime. It's also important that at runtime we first try to `dlopen("libcuda.so", ...)`, and only then `dlopen("/run/opengl-driver/lib/libcuda.so", ...)`, because we want things to also work on FHS distributions and respect the optional LD_LIBRARY_PATH.
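To illustrate that loading order, here is a minimal sketch using Python's ctypes as a stand-in for triton's actual generated C launcher; the `load_libcuda` helper and the exact candidate list are made up for the example:

```python
import ctypes


def load_libcuda():
    # First let the dynamic loader resolve libcuda.so through the usual
    # mechanisms (LD_LIBRARY_PATH, ldconfig cache), so FHS distributions and
    # user overrides keep working...
    candidates = (
        "libcuda.so",
        "libcuda.so.1",
        # ...and only then fall back to the NixOS driver link.
        "/run/opengl-driver/lib/libcuda.so",
        "/run/opengl-driver/lib/libcuda.so.1",
    )
    for candidate in candidates:
        try:
            return ctypes.CDLL(candidate)
        except OSError:
            continue
    raise OSError("libcuda.so not found in any of the candidate locations")
```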
> The cursed part is that the build and runtime steps are closely intermixed, but the build step doesn't have a way to provide some values for runtime.

Oh right, we should probably try to explicitly track the references retained at runtime.

> The CC variable read override is only enabled on cudaSupport; otherwise it doesn't try to call CC at runtime, and the build-time CC is enough (at least, I haven't experienced that with vLLM).

Do they not use `triton.common.build` at runtime for their jit/aot?
> More specifically, we use the fake driver `${getLib cudaPackages.cuda_cudart}/lib/stubs` at build/link time, and `${addDriverRunpath.driverLink}/lib` at runtime. It's also important that at runtime we first try to `dlopen("libcuda.so", ...)`, and only then `dlopen("/run/opengl-driver/lib/libcuda.so", ...)`, because we want things to also work on FHS distributions and respect the optional LD_LIBRARY_PATH.

I don't think we can achieve that with this code? It runs both at compile time and at runtime...
Except by patching it in preInstall?..

> Do they not use `triton.common.build` at runtime for their jit/aot?

Not in vLLM on ROCm; I'm not sure about other projects using triton directly.
Well, we should open an issue asking for more fine-grained support. Note that they no longer use the variable on master, but use `whereis` instead, which is also platform-specific: https://github.com/feihugis/triton/blob/a9d1935e795cf28aa3c3be8ac5c14723e6805de5/python/triton/compiler.py#L1354-L1357
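For reference, the `whereis`-based approach at the linked lines amounts to something like the sketch below. This is not the verbatim upstream code; the `guess_libcuda_dirs` helper and the output parsing are illustrative, and the approach only works on Linux systems that ship `whereis` and install libcuda in a standard location, which is why it is platform-specific:

```python
import os
import subprocess


def guess_libcuda_dirs():
    # `whereis` output looks like:
    #   "libcuda.so: /usr/lib/x86_64-linux-gnu/libcuda.so ..."
    out = subprocess.check_output(["whereis", "libcuda.so"]).decode()
    paths = out.split()[1:]  # drop the leading "libcuda.so:" label
    return sorted({os.path.dirname(p) for p in paths})
```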
    substituteInPlace python/triton/common/build.py \
      --replace 'os.getenv("TRITON_LIBCUDA_PATH")' '"${cudaPackages.cuda_cudart}/lib"'
    substituteInPlace python/triton/common/build.py \
      --replace 'os.environ.get("CC")' '"${cudaPackages.backendStdenv.cc}/bin/cc"'
I would suggest making this (and the above) something like `os.environ.get("CC", default=${...})`. I would also suggest that upstream make this `TRITON_CC` instead of `CC`, to avoid unintentionally changing the compiler.
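A minimal sketch of what that default-fallback suggestion would leave behind in `python/triton/common/build.py`; the store path and the `_resolve_cc` helper are purely illustrative, and `TRITON_CC` is only the suggestion above, not something upstream currently implements:

```python
import os

# Illustrative placeholder for the Nix-pinned compiler substituted at build time.
_DEFAULT_CC = "/nix/store/...-backendStdenv-cc/bin/cc"


def _resolve_cc():
    # Honour a user-provided TRITON_CC (or CC) at runtime, and only fall back
    # to the pinned compiler when neither is set, instead of always overriding.
    return os.environ.get("TRITON_CC") or os.environ.get("CC", _DEFAULT_CC)
```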
So, the ptxas, cudart, and libcuda situations have been revisited in #328247. @CertainLach could you update the description wrt what issues are left?
Duplicate of #328247
Description of changes
On vLLM v0.3.3, using Mixtral causes triton to use CC at runtime, so defining the CC variable only for the build is not enough.
Cc: @happysalada
The same applies to libcuda: it also needs to be provided at runtime.
This doesn't seem to affect ROCm; however, I'm not sure about the execution path there.