
Investigate compilation to GPUs #1330

Open
Praetonus opened this issue Oct 17, 2016 · 6 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@Praetonus
Member

LLVM has experimental support for NVPTX and AMDGPU backends. Getting Pony to run on these as a proof of concept would be great.

@ryanai3
Contributor

ryanai3 commented Oct 19, 2016

Just want to comment saying that this would be awesome. I think the actor paradigm with capabilities maps really neatly onto GPUs and the compile-time types of memory (const, val, etc.).

@mpbagot
Contributor

mpbagot commented Jan 5, 2019

I'd really like to do some work on this, but I don't think I'm particularly qualified. If anyone can offer tips or pointers on where to start, I'd be happy to give it a go.

Though I do wonder: given that most GPU-based code (from what little research I've done) works by having a host program send processing kernels to the GPU and fetch the results later, how should this behaviour be written in Pony code?
Or should each new actor (beyond Main, which could act as the host) be forked as a new work item in the corresponding work group on the GPU? (e.g. a newly created OutStream actor becomes a new work item in the existing OutStream work group)
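To make the question concrete, here is one possible shape for the host/kernel split in Pony. Everything in it is hypothetical: the `GPU.launch` primitive and its callback-based result delivery are invented for illustration, not existing features.

```pony
// Hypothetical sketch only: GPU.launch does not exist in Pony.
actor Main
  new create(env: Env) =>
    // Main plays the host role: it ships the data and a kernel
    // lambda to the GPU, then receives the results later.
    let input: Array[U32] val = [1; 2; 3; 4]
    GPU.launch(input, {(x: U32): U32 => x * x },
      {(results: Array[U32] val) => env.out.print("kernel done") })
```

The alternative in the second paragraph, mapping each actor type to a work group, would instead push the dispatch decision into the runtime scheduler rather than the call site.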

@SeanTAllen
Member

Marked as "needs discussion during sync" to make sure that folks weigh in on @mpbagot's question.

@jemc
Member

jemc commented Jan 8, 2019

We discussed this on today's sync call. I didn't take notes on all of what was said, and I personally don't know a lot about it, but if you listen to the last five minutes of the call, you can hear @sylvanc's comments on it.

One place to start would be to introduce an annotation (using the `\annotation\` syntax) at the function level to specify that the function targets the GPU, with the parameters going into a render target / texture buffer (or communicated some other way), and the return value then "rendered" to the output render target.
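As a rough illustration of that idea (the `\gpu\` annotation and the buffer-marshalling behaviour described here are hypothetical, not existing Pony features):

```pony
// Hypothetical: \gpu\ is not a recognised annotation today.
// The compiler would marshal the parameters into an input
// buffer/texture and "render" the return value into an
// output buffer that the host reads back.
fun \gpu\ scale(x: F32, factor: F32): F32 =>
  x * factor
```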

@cjdelisle

It might be worth considering trying to target SPIR-V, either directly or through the LLVM<->SPIR-V converter ( https://github.com/KhronosGroup/SPIRV-LLVM ) because SPIR-V is an intermediate which can be used to represent OpenCL and that allows vendor drivers to take over and compile it down to native.
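For the record, the translator ships a standalone `llvm-spirv` tool, so the pipeline might look roughly like this (the ponyc side is hypothetical; ponyc has no supported way to emit a GPU-suitable bitcode module today):

```
# Illustrative pipeline only; the first step assumes ponyc could
# be made to emit an LLVM bitcode module for the GPU functions.
ponyc --emit-gpu-bitcode my_pkg      # hypothetical flag
llvm-spirv my_pkg.bc -o my_pkg.spv   # SPIRV-LLVM-Translator
# my_pkg.spv can then be consumed by an OpenCL/Vulkan driver.
```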

@SeanTAllen SeanTAllen added help wanted Extra attention is needed enhancement New feature or request and removed complexity: major effort labels May 12, 2020
@mpbagot
Contributor

mpbagot commented Oct 29, 2021

There are quite a few issues I'm finding whilst trying to conceptualise how this could be done. Assuming, for simplicity's sake, that only functions can be parallelised, you have to consider the following:

  1. How should GPU functions be called syntactically? Consider a simple function like the one below (`\gpufunc\` is a stand-in for the function annotation). This function should be called with two equal-length arrays of input, and each pair of (a, b) values would be processed in a new thread on the GPU. Syntactically, what would a function call look like?

fun \gpufunc\ func_a(a: U32, b: U32): U32 =>
  a + b

Should it be the same as all function calls, with array/iterable inputs?

let result: Array[U32] = func_a([1; 2; 3], [4; 5; 6])

Or should some other form of call be used, like the @ prefix on FFI calls, to make GPU function calls visibly distinct from CPU function calls?

  2. GPU function calls from within GPU functions. I would assume that when one GPU function calls another, it passes a single value rather than an array. However, this means the function's parameter types would change depending on where it is called from, which is both somewhat confusing and complicates the syntactic processing. Alternate syntax for parallel calls would avoid this.

  3. CPU function calls from within GPU functions. Would CPU functions be compiled twice, once for the GPU and once for the CPU? Or would GPU functions simply be restricted to calling only other GPU functions, to sidestep this?

  4. FFI calls from GPU functions. These would need to be prevented, since there is no feasible way to compile arbitrary C libraries to run on the GPU.

  5. Race conditions across multiple threads. Consider a case where a GPU function modifies a value in the class object it is called from. In traditional CUDA, all threads would write to the location in an indeterminate order; in Pony, I imagine this behaviour would violate various guarantees. It would be easiest to ensure that only pure functions can be GPU functions, as this solves both the race conditions and the issue of modifying CPU objects from the GPU.

  6. Introduction of the SPIRV-LLVM-Translator library to ponyc. As mentioned by cjdelisle, the simplest method for cross-platform GPU compilation would be to take the functions' LLVM IR and translate it to SPIR-V using https://github.com/KhronosGroup/SPIRV-LLVM-Translator/. This is the most user-friendly and simplest implementation of this functionality that I can see; however, it would pull in an additional dependency for ponyc, for functionality that only a small number of Pony programs will use.
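Pulling the syntax, purity, and restriction questions above together, a restricted design might look like this sketch. Every piece of syntax here is hypothetical: the `\gpufunc\` annotation, the pure-function restriction, and the `$` call prefix (by analogy with `@` for FFI) are invented for illustration.

```pony
// Hypothetical sketch: nothing here is valid Pony today.
// A \gpufunc\ function would be pure: no FFI calls, no field
// mutation, and calls only to other \gpufunc\ functions.
fun \gpufunc\ add(a: U32, b: U32): U32 =>
  a + b

// A distinct call form (a made-up `$` prefix, analogous to `@`
// for FFI) makes the parallel array-in/array-out semantics
// visible at the call site:
let result: Array[U32] = $add([1; 2; 3], [4; 5; 6])
```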
