-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add gpu-kernel calling convention #135047
Conversation
r? @SparrowLii rustbot has assigned @SparrowLii. Use |
rust-analyzer is developed in its own repository. If possible, consider making this change to rust-lang/rust-analyzer instead. cc @rust-lang/rust-analyzer These commits modify compiler targets. Some changes occurred in compiler/rustc_codegen_cranelift cc @bjorn3 This PR changes Stable MIR |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
What is the purpose of this calling convention for, @Flakebi? What does it represent? |
This seems to be missing the actual implementation of the call conv adjustments. As is it is using the |
It maps to the To run compute kernels on AMD GPUs, one needs to compile a shared ELF object (like a For running, one can use ROCR-Runtime through the HSA interface or HIP (just learned this yesterday). The first step is to load the ELF (as executable for HSA or module for HIP). Then, query a kernel on the loaded object by symbol name ( So, on a high level, ⁰Rust always uses a linker-script when linking and that hides the |
@Flakebi Then, er, what is the calling convention? By saying "entry point" and comparing it to |
To be clear: I do not believe "there is something LLVM calls a 'calling convention', therefore we must add it as an I am not trying to suggest there is no other reason, I just am trying to determine "what does exposing this actually accomplish for programmers, and why is this the correct approach, instead of e.g. an attribute?" |
Yes, it is meant to be called externally. define amdgpu_kernel void @kernel() {
call amdgpu_kernel void @kernel()
ret void
}
; LLVM ERROR: Unsupported calling convention for call
; Stack dump…
I don’t have much of a reason to expose it as a calling convention except that
An attribute like I think a wrapping entry point like An example copy kernel looks like this (using the calling convention): // Compile with
// RUSTFLAGS='-Ctarget-cpu=gfx900' cargo +stage1 build --target amdgcn-amd-amdhsa -Zbuild-std=core
#![allow(internal_features)]
#![feature(link_llvm_intrinsics, abi_amdgpu_kernel)]
#![no_std]
unsafe extern "C" {
#[link_name = "llvm.amdgcn.workitem.id.x"]
safe fn workitem_x() -> u32;
#[link_name = "llvm.amdgcn.s.sethalt"]
safe fn halt(i: u32) -> !;
}
#[panic_handler]
fn panic(_: &core::panic::PanicInfo) -> ! {
halt(1);
}
#[no_mangle]
pub extern "amdgpu-kernel" fn kernel(input: *const u8, output: *mut u8) {
let id = workitem_x(); // This is the execution id from 0 to <size> (size is given on the CPU when launching the kernel)
unsafe {
*output.add(id as usize) = *input.add(id as usize);
}
} |
cool! are |
Seems like The following compiles: #[no_mangle]
pub unsafe extern "ptx-kernel" fn global_function(i: i32) {
if i == 0 {
global_function(i - 1);
}
} (finally managed to compile nvptx with |
weird! |
Looking at https://reviews.llvm.org/D140226#4004026 I'm not so sure that these aren't meant to behave identically and it's just lack of interest in implementing the followup steps that made them distinct. |
Opened llvm/llvm-project#121655 |
Interesting find! |
Do nvptx and amdgpu allow the same kinds of arguments to the kernel function with the same restrictions? |
On the issue of whether Even if the LLVM project emits an error on this, we should dedicate an Error message to not have users running into LLVM errors and ICEs. The error messages could at least be common between the different calling conventions.
I'm not intimately familiar with the restrictions of amdgpu kernels so I don't really know if it can be modelled as "the same thing". I like the idea of the "unification" if this is without drawbacks. I also think that there can be a case for having a "Rust kernel" calling convention where it's possible to pass enums, tuples, etc from Rust host code to Rust device code. Perhaps that can also be common between the different formats for describing GPU code (ptx, amdgcn, spir-v) calling conv? But this is probably getting ahead of things. If this is being looked into then at least also SPIR-V kernel should be looked into in the same context. |
@kjetilkjeka applied your recommended edits to the issue. |
It's fine if the ABI is implemented slightly differently, IMO, if it's conceptually the same. @Flakebi I'm fine with adding this as @kjetilkjeka SPIRV's |
I'm not as certain if SPIRV's And they can all have the same name. So... not really something we can easily map, I think, nor should try. |
Good, thanks for shedding light into nvptx :) I agree that amdgpu/nvptx kernel conceptually mean the same thing. We don’t want to guarantee that they behave exactly the same, some properties of a For SPIR-V, I’m not sure how well it can map to the same calling convention.
(Sidenot: The amdgpu LLVM backend uses different calling conventions for the different graphics shader types like pixel/vertex/compute shaders.) I think |
…gjubilee Add gpu-kernel calling convention The amdgpu-kernel calling convention was reverted in commit f6b21e9 (rust-lang#120495 and rust-lang/rust-analyzer#16463) due to inactivity in the amdgpu target. Introduce a `gpu-kernel` calling convention that translates to `ptx_kernel` or `amdgpu_kernel`, depending on the target that rust compiles for. Tracking issue: rust-lang#135467 amdgpu target tracking issue: rust-lang#135024
…gjubilee Add gpu-kernel calling convention The amdgpu-kernel calling convention was reverted in commit f6b21e9 (rust-lang#120495 and rust-lang/rust-analyzer#16463) due to inactivity in the amdgpu target. Introduce a `gpu-kernel` calling convention that translates to `ptx_kernel` or `amdgpu_kernel`, depending on the target that rust compiles for. Tracking issue: rust-lang#135467 amdgpu target tracking issue: rust-lang#135024
…iaskrgr Rollup of 7 pull requests Successful merges: - rust-lang#134940 (Make sure to scrape region constraints from deeply normalizing type outlives assumptions in borrowck) - rust-lang#135047 (Add gpu-kernel calling convention) - rust-lang#135228 (Improve `DispatchFromDyn` and `CoerceUnsized` impl validation) - rust-lang#135264 (Consider more erroneous layouts as `LayoutError::ReferencesError` to suppress spurious errors) - rust-lang#135302 (for purely return-type based searches, deprioritize clone-like functions) - rust-lang#135380 (Make sure we can produce `ConstArgHasWrongType` errors for valtree consts) - rust-lang#135425 (Do not consider traits that have unsatisfied const conditions to be conditionally const) r? `@ghost` `@rustbot` modify labels: rollup
tests/codegen/gpu-kernel-abi.rs
Outdated
//@ revisions: amdgpu nvptx | ||
//@ [amdgpu] compile-flags: --crate-type=rlib --target=amdgcn-amd-amdhsa -Ctarget-cpu=gfx900 | ||
//@ [amdgpu] needs-llvm-components: amdgpu | ||
// amdgpu target is not yet merged | ||
//@ [amdgpu] should-fail |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doesn't really work:
thread 'main' panicked at src/tools/compiletest/src/header.rs:1559:17:
missing LLVM component amdgpu, and COMPILETEST_REQUIRE_ALL_LLVM_COMPONENTS is set
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Build completed unsuccessfully in 0:37:29
local time: Wed Jan 15 03:17:37 UTC 2025
network time: Wed, 15 Jan 2025 03:17:37 GMT
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, Rust’s testing is too advanced for this to go through 😃
@bors r- |
@bors rollup=iffy |
( it might be simpler to just land the amdgpu target first ) |
The amdgpu-kernel calling convention was reverted in commit f6b21e9 due to inactivity in the amdgpu target. Introduce a `gpu-kernel` calling convention that translates to `ptx_kernel` or `amdgpu_kernel`, depending on the target that rust compiles for.
68b2639
to
e7e5202
Compare
I removed the amdgpu part of the test and squashed commits. Diff |
@bors r+ |
☀️ Test successful - checks-actions |
A job failed! Check out the build log: (web) (plain) Click to see the possible cause of the failure (guessed by this bot)
|
Finished benchmarking commit (0c2c096): comparison URL. Overall result: no relevant changes - no action needed@rustbot label: -perf-regression Instruction countThis benchmark run did not return any relevant results for this metric. Max RSS (memory usage)Results (primary -0.4%, secondary 0.0%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesThis benchmark run did not return any relevant results for this metric. Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 763.565s -> 764.62s (0.14%) |
…ubilee Add gpu-kernel calling convention The amdgpu-kernel calling convention was reverted in commit f6b21e9 (rust-lang#120495 and rust-lang/rust-analyzer#16463) due to inactivity in the amdgpu target. Introduce a `gpu-kernel` calling convention that translates to `ptx_kernel` or `amdgpu_kernel`, depending on the target that rust compiles for. Tracking issue: rust-lang#135467 amdgpu target tracking issue: rust-lang#135024
The amdgpu-kernel calling convention was reverted in commit f6b21e9 (#120495 and rust-lang/rust-analyzer#16463) due to inactivity in the amdgpu target.
Introduce a
gpu-kernel
calling convention that translates toptx_kernel
oramdgpu_kernel
, depending on the target that rust compiles for.Tracking issue: #135467
amdgpu target tracking issue: #135024