-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tracking issue for the "ptx-kernel" ABI #38788
Comments
…lexcrichton Ensure every unstable language feature has a tracking issue. Filled in the missing numbers: * `abi_ptx` → #38788 * `generators` → #43122 * `global_allocator` → #27389 Reused existing tracking issues because they were decomposed from a larger feature * `*_target_feature` → #44839 (reusing the old `target_feature` number) * `proc_macros_*` → #38356 (reusing the to-be-stabilized `proc_macros` number) Filed new issues * `exhaustive_patterns` → #51085 * `pattern_parentheses` → #51087 * `wasm_custom_section` and `wasm_import_module` → #51088
Triage: not aware of any movement on stabilizing this. |
would be very interested in this! any of y'all know what needs doing to bring this back to life? edit: following links through the metabug, and then through a project that mentioned the metabug, looks like there are in fact folks still interested. cool! |
I'm curious if I might have found a codegen bug related to this feature. It is related to how the struct is assumed to be passed to the kernel. When compiling the following Rust code#![no_std]
#![feature(abi_ptx, stdsimd)]
#[panic_handler]
fn panic(info: &core::panic::PanicInfo) -> ! {
unreachable!()
}
#[repr(C)]
pub struct Foo{
array: [f32; 9],
}
#[no_mangle]
pub unsafe extern "ptx-kernel" fn add(a: *mut f32, b: Foo) {
*a = b.array[5];
} Rustc will produce the followingWhen compiling with
On the other hand, when compiling the following C++ codestruct foo {
float array[9];
};
extern "C" __global__ void add( float *a, struct foo b) {
*a = b.array[5];
} I get the following output for nvccWhen compiling with
I get the following output for clangWhen compiling with
Here's the different compiler versions I'm using
I would be happy to try to figure out whats going on if anyone can confirm that this is a bug? If anyone would be able to point me in the right direction that would be a nice bonus as well. |
Visiting as part of a T-compiler backlog bonanza prepass. This at very least is in a "needs summary" state. But based on @RDambrosio016 's comment from 10 days ago, there may even be design concerns? (At least depending on what scope of potential parameter types we want this ABI to cover at the outset.) @rustbot label: S-tracking-design-concerns S-tracking-needs-summary |
Here's a suggestion for an update to the tracking issue to include concerns. Partially copied for japaric's original post and added concerns from and links to relevant issues. If you have the possibility you should take a look @RDambrosio016 Feature gate This ABI is intended to be used when generating code for device (GPU) targets like Public APIThe following code #![no_std]
#![feature(abi_ptx)]
#[no_mangle]
pub extern "ptx-kernel" fn foo() {} Produces
Steps / History
Unresolved Questions
Notes
|
… r=nagisa Fix codegen bug in "ptx-kernel" abi related to arg passing I found a codegen bug in the nvptx abi related to that args are passed as ptrs ([see comment](rust-lang#38788 (comment))), this is not as specified in the [ptx-interoperability doc](https://docs.nvidia.com/cuda/ptx-writers-guide-to-interoperability/) or how C/C++ does it. It will also almost always fail in practice since device/host uses different memory spaces for most hardware. This PR fixes the bug and add tests for passing structs to ptx kernels. I observed that all nvptx assembly tests had been marked as [ignore a long time ago](rust-lang#59752 (comment)). I'm not sure if the new one should be marked as ignore, it passed on my computer but it might fail if ptx-linker is missing on the server? I guess this is outside scope for this PR and should be looked at in a different issue/PR. I only fixed the nvptx64-nvidia-cuda target and not the potential code paths for the non-existing 32bit target. Even though 32bit nvptx is not a supported target there are still some code under the hood supporting codegen for 32 bit ptx. I was advised to create an MCP to find out if this code should be removed or updated. Perhaps `@RDambrosio016` would have interest in taking a quick look at this.
… r=nagisa Fix codegen bug in "ptx-kernel" abi related to arg passing I found a codegen bug in the nvptx abi related to that args are passed as ptrs ([see comment](rust-lang#38788 (comment))), this is not as specified in the [ptx-interoperability doc](https://docs.nvidia.com/cuda/ptx-writers-guide-to-interoperability/) or how C/C++ does it. It will also almost always fail in practice since device/host uses different memory spaces for most hardware. This PR fixes the bug and add tests for passing structs to ptx kernels. I observed that all nvptx assembly tests had been marked as [ignore a long time ago](rust-lang#59752 (comment)). I'm not sure if the new one should be marked as ignore, it passed on my computer but it might fail if ptx-linker is missing on the server? I guess this is outside scope for this PR and should be looked at in a different issue/PR. I only fixed the nvptx64-nvidia-cuda target and not the potential code paths for the non-existing 32bit target. Even though 32bit nvptx is not a supported target there are still some code under the hood supporting codegen for 32 bit ptx. I was advised to create an MCP to find out if this code should be removed or updated. Perhaps ``@RDambrosio016`` would have interest in taking a quick look at this.
…exception, r=workingjubilee,RalfJung NVPTX: Allow PassMode::Direct for ptx kernels for now Upgrading the nvptx toolchain to the newest nightly makes it hit the assert that links to rust-lang#115666 It seems like most targets get around this by using `PassMode::Indirect`. That is impossible for the kernel as it's not a normal call, but instead the arguments are copied from CPU to GPU and the passed pointer would be invalid when it reached the GPU. I also made an experiment with `PassMode::Cast` but at least the most simple version of this broke the assembly API tests. I added fixing the pass mode in my unofficial tracking issue list (I do not have the necessary permissions to update to official one). rust-lang#38788 (comment) Since the ptx_abi is currently unstable and have been working with `PassMode::Direct` for more than a year now, the steps above is hopefully sufficient to enable it as an exception until I can prioritize to fix it. I'm currently looking at steps to enable the CI for nvptx64 again and would prefer to finish that first.
Rollup merge of rust-lang#117247 - kjetilkjeka:nvptx_direct_passmode_exception, r=workingjubilee,RalfJung NVPTX: Allow PassMode::Direct for ptx kernels for now Upgrading the nvptx toolchain to the newest nightly makes it hit the assert that links to rust-lang#115666 It seems like most targets get around this by using `PassMode::Indirect`. That is impossible for the kernel as it's not a normal call, but instead the arguments are copied from CPU to GPU and the passed pointer would be invalid when it reached the GPU. I also made an experiment with `PassMode::Cast` but at least the most simple version of this broke the assembly API tests. I added fixing the pass mode in my unofficial tracking issue list (I do not have the necessary permissions to update to official one). rust-lang#38788 (comment) Since the ptx_abi is currently unstable and have been working with `PassMode::Direct` for more than a year now, the steps above is hopefully sufficient to enable it as an exception until I can prioritize to fix it. I'm currently looking at steps to enable the CI for nvptx64 again and would prefer to finish that first.
…, r=workingjubilee,RalfJung NVPTX: Allow PassMode::Direct for ptx kernels for now Upgrading the nvptx toolchain to the newest nightly makes it hit the assert that links to rust-lang/rust#115666 It seems like most targets get around this by using `PassMode::Indirect`. That is impossible for the kernel as it's not a normal call, but instead the arguments are copied from CPU to GPU and the passed pointer would be invalid when it reached the GPU. I also made an experiment with `PassMode::Cast` but at least the most simple version of this broke the assembly API tests. I added fixing the pass mode in my unofficial tracking issue list (I do not have the necessary permissions to update to official one). rust-lang/rust#38788 (comment) Since the ptx_abi is currently unstable and have been working with `PassMode::Direct` for more than a year now, the steps above is hopefully sufficient to enable it as an exception until I can prioritize to fix it. I'm currently looking at steps to enable the CI for nvptx64 again and would prefer to finish that first.
It seems likely that the way this ABI will work is going to be similar in its broad strokes between most GPU targets. Much like |
Here's a suggestion for an update to the tracking issue to include concerns. Partially copied for japaric's original post and added concerns from and links to relevant issues.
If you have the possibility you should take a look @RDambrosio016
Feature gate
#![feature(abi_ptx)]
This ABI is intended to be used when generating code for device (GPU) targets like
nvptx64-nvidia-cuda
. It is used to generate kernels ("global functions") that work as an entry point from host (cpu) code. Functions that do not use the "ptx-kernel" ABI are "device functions" and only callable from kernels and device functions. Device functions are specifically not usable from host (cpu) code.Public API
The following code
Produces
Steps / History
PassMode::Direct
with something else (nvptx "ptx-kernel" ABI (feature: abi_ptx) uses PassMode::Direct for Aggregates #117271)()
Unresolved Questions
nvptx64-nvidia-cuda
target and the__global__
modifier.#[repr(C)]
types seems like a good start (no slices, tuples, references, etc).&[T]
,(T, U)
, etc). Are there some types that do not have a stable representation but can be relied on having an identical representation for sequential compilation with a given rustc version? If so are there any way we could pass them safely between host and device code compiled with the same rustc version?nvptx64-nvidia-cuda
on stable Rust. The target seems to still have a few bugs (NVPTX backend metabug #38789). Should this feature be kept unstable to avoid usage ofnvptx64-nvidia-cuda
until it has been verified to be usable.Notes
#[naked]
functions as the.entry
directive needs to be emited for nvptx kernels.The text was updated successfully, but these errors were encountered: