
Investigate compilation to GPUs #1330

Open
Praetonus opened this issue Oct 17, 2016 · 6 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@Praetonus
Member

LLVM has experimental support for NVPTX and AMDGPU backends. Getting Pony to run on these as a proof of concept would be great.

@ryanai3
Contributor

ryanai3 commented Oct 19, 2016

Just want to comment saying that this would be awesome. I think the actor paradigm with capabilities maps really neatly onto GPUs and the compile-time types of memory (const, val, etc.).

@mpbagot
Contributor

mpbagot commented Jan 5, 2019

I'd really like to do some work on this, but I don't think I'm particularly qualified. If anyone can offer tips or pointers on where to start, I'd be happy to give it a go.

Though I do wonder: given that most GPU-based code (from what little research I've done) works by having a host program send processing kernels to the GPU and fetch the results later, how should this behaviour be written in Pony code?
Or should each new actor (beyond Main, which could act as the host) be forked as a new work item in the corresponding work group on the GPU? (e.g. a newly created OutStream actor becomes a new work item in the existing OutStream work group)
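To make the question concrete, here is one possible shape for the host/kernel split in Pony. Everything in it is hypothetical: the `GPU.launch` primitive and its callback-based result delivery are invented for illustration, not existing features.

```pony
// Hypothetical sketch only: GPU.launch does not exist in Pony.
actor Main
  new create(env: Env) =>
    // Main plays the host role: it ships the data and a kernel
    // lambda to the GPU, then receives the results later.
    let input: Array[U32] val = [1; 2; 3; 4]
    GPU.launch(input, {(x: U32): U32 => x * x },
      {(results: Array[U32] val) => env.out.print("kernel done") })
```

The alternative in the second paragraph, mapping each actor type to a work group, would instead push the dispatch decision into the runtime scheduler rather than the call site.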

@SeanTAllen
Member

Marked as "needs discussion during sync" to make sure that folks weigh in on @mpbagot's question.

@jemc
Member

jemc commented Jan 8, 2019

We discussed this on today's sync call. I didn't take notes on all of what was said, and I personally don't know a lot about it, but if you listen to the last five minutes of the call, you can hear @sylvanc's comments on it.

One place to start would be to introduce an annotation (using the `\annotation\` syntax) at the function level to specify that the function targets the GPU, with the parameters going into a render target / texture buffer (or communicated some other way), and the return value then "rendered" to the output render target.
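As a rough illustration of that idea (the `\gpu\` annotation and the buffer-marshalling behaviour described here are hypothetical, not existing Pony features):

```pony
// Hypothetical: \gpu\ is not a recognised annotation today.
// The compiler would marshal the parameters into an input
// buffer/texture and "render" the return value into an
// output buffer that the host reads back.
fun \gpu\ scale(x: F32, factor: F32): F32 =>
  x * factor
```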

@cjdelisle

It might be worth considering trying to target SPIR-V, either directly or through the LLVM<->SPIR-V converter ( https://github.com/KhronosGroup/SPIRV-LLVM ) because SPIR-V is an intermediate which can be used to represent OpenCL and that allows vendor drivers to take over and compile it down to native.
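For the record, the translator ships a standalone `llvm-spirv` tool, so the pipeline might look roughly like this (the ponyc side is hypothetical; ponyc has no supported way to emit a GPU-suitable bitcode module today):

```
# Illustrative pipeline only; the first step assumes ponyc could
# be made to emit an LLVM bitcode module for the GPU functions.
ponyc --emit-gpu-bitcode my_pkg      # hypothetical flag
llvm-spirv my_pkg.bc -o my_pkg.spv   # SPIRV-LLVM-Translator
# my_pkg.spv can then be consumed by an OpenCL/Vulkan driver.
```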

@SeanTAllen SeanTAllen added help wanted Extra attention is needed enhancement New feature or request and removed complexity: major effort labels May 12, 2020
@mpbagot
Contributor

mpbagot commented Oct 29, 2021

There are quite a few issues I'm finding whilst trying to conceptualise how this could be done. Assuming, for simplicity's sake, that only functions can be parallelised, you have to consider the following:

  1. How should GPU functions be called syntactically? Consider a simple function like the one below (`\gpufunc\` is a stand-in for the function annotation). This function should be called with two equal-length arrays of input, and each pair of (a, b) values would be processed in a new thread on the GPU. Syntactically, what would a function call look like?

fun \gpufunc\ func_a(a: U32, b: U32): U32 =>
  a + b

Should it be the same as all function calls, with array/iterable inputs?

let result: Array[U32] = func_a([1; 2; 3], [4; 5; 6])

Or should some other form of call be used, like the @ prefix on FFI calls, to make GPU function calls visibly distinct from CPU function calls?

  2. GPU function calls from within GPU functions. I would assume that when one GPU function calls another, it passes a single value rather than an array. However, this means the function's parameter types would change depending on where it is called from, which is both somewhat confusing and complicates the syntactic processing. Alternate syntax for parallel calls would avoid this.

  3. CPU function calls from within GPU functions. Would CPU functions be compiled twice, once for the GPU and once for the CPU? Or would GPU functions simply be restricted to calling only other GPU functions, to sidestep this?

  4. FFI calls from GPU functions. These would need to be prevented, since there is no feasible way to compile arbitrary C libraries to run on the GPU.

  5. Race conditions across multiple threads. Consider a case where a GPU function modifies a value in the class object it is called from. In traditional CUDA, all threads would write to the location in an indeterminate order; in Pony, I imagine this behaviour would violate various guarantees. It would be easiest to ensure that only pure functions can be GPU functions, as this solves both the race conditions and the issue of modifying CPU objects from the GPU.

  6. Introduction of the SPIRV-LLVM-Translator library to ponyc. As mentioned by cjdelisle, the simplest method for cross-platform GPU compilation would be to take the functions' LLVM IR and translate it to SPIR-V using https://github.com/KhronosGroup/SPIRV-LLVM-Translator/. This is the most user-friendly and simplest implementation of this functionality that I can see; however, it would pull in an additional dependency for ponyc, for functionality that only a small number of Pony programs will use.
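Pulling the syntax, purity, and restriction questions above together, a restricted design might look like this sketch. Every piece of syntax here is hypothetical: the `\gpufunc\` annotation, the pure-function restriction, and the `$` call prefix (by analogy with `@` for FFI) are invented for illustration.

```pony
// Hypothetical sketch: nothing here is valid Pony today.
// A \gpufunc\ function would be pure: no FFI calls, no field
// mutation, and calls only to other \gpufunc\ functions.
fun \gpufunc\ add(a: U32, b: U32): U32 =>
  a + b

// A distinct call form (a made-up `$` prefix, analogous to `@`
// for FFI) makes the parallel array-in/array-out semantics
// visible at the call site:
let result: Array[U32] = $add([1; 2; 3], [4; 5; 6])
```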
