
[OpenGL] Basic OpenGL backend #492 #495

Closed
wants to merge 53 commits

Conversation

archibate
Collaborator

Related issue id = #492

@archibate
Collaborator Author

archibate commented Feb 18, 2020

I couldn't wait to share more of the work with you!

@yuanming-hu yuanming-hu self-requested a review February 18, 2020 03:11
@yuanming-hu
Member

yuanming-hu commented Feb 18, 2020

Great! I have one small suggestion (which will make PR/code maintenance much easier): keep each PR small. This will also make your life easier since you have fewer conflicts and get quicker reviews. The PR process of the Metal backend #396 is a great example.

I'm super excited to witness the birth of a new backend! At the same time, we should also pay more attention to code maintainability, as we are getting more and more backends :-) Maybe we should consider extracting the common parts of the OpenGL and Metal (and potentially OpenCL in the future) codegen into a base class.
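A minimal sketch of that base-class idea (class and method names here are hypothetical, not Taichi's actual layout): the shared bookkeeping lives in a base class, and each backend overrides only the source-emission details.

```python
# Hedged sketch, not Taichi's actual design: shared codegen plumbing in a base
# class, with per-backend subclasses emitting backend-specific source.
class KernelCodegenBase:
    def __init__(self):
        self.lines = []

    def emit(self, line):
        self.lines.append(line)

    def source(self):
        return '\n'.join(self.lines)


class OpenGLCodegen(KernelCodegenBase):
    def emit_header(self):
        self.emit('#version 430 core')  # GLSL compute shader header


class MetalCodegen(KernelCodegenBase):
    def emit_header(self):
        self.emit('#include <metal_stdlib>')  # Metal shader header


gl = OpenGLCodegen()
gl.emit_header()
print(gl.source())  # #version 430 core
```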

@k-ye
Member

k-ye commented Feb 18, 2020

I can share some experiences when working on the Metal one, or any other big features. I think it’s better to have a working solution in your local repo before opening PRs. The local solution doesn’t have to be clean or anything, just a proof of concept. Then we can start breaking down the solution into smaller PRs, possibly doing some cleanups during this process. This way you can save a lot of the reviewer’s bandwidth :-) We also don’t run into the risk that, in case there is a major design flaw, the main codebase gets polluted by incomplete solutions.

@yuanming-hu
Member

yuanming-hu commented Feb 18, 2020

Yeah, I strongly agree on this two-stage strategy: writing a new backend inevitably has a lot of uncertainty, so it's great to have a minimal working prototype (e.g. one that runs mpm99.py, which is already pretty challenging IMO). Then clean up the code and break down the changes into small (and preferably testable, if I'm not asking for too much) PRs, each with clear meaning.

As @k-ye said, doing so allows you to get quick design feedback and thereby minimizes the risk of high-level design errors. It also minimizes the number of hours you spend before you get a working codegen & demos (and everybody gets super excited so you get more helping hands).

Last but not least, following this strategy makes the reviewer's job much easier :-)

@archibate
Collaborator Author

Thanks for your advice! I managed to use git rebase to reduce the reviewers' burden.

@archibate
Collaborator Author

@archibate archibate changed the title [WIP] Basic OpenGL backend [GLSL backend] stage 1: serial_kernel Feb 18, 2020
@archibate archibate changed the title [GLSL backend] stage 1: serial_kernel [OpenGL] stage 1: serial_kernel & simple arrays Feb 18, 2020
@archibate
Collaborator Author

archibate commented Feb 18, 2020

How should I deal with globals? How can I get raw data from SNodes? I can now pass data in and out of GL.

@archibate
Collaborator Author

You can probably reply here, and I'll see it tomorrow. Good night~~

@yuanming-hu
Member

We need to allocate something like a pixel buffer (or something related, my OpenGL knowledge is rusty), which we use as a byte array to store the global data structure :-)
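A minimal sketch of that idea (offsets and sizes here are hypothetical): treat the global data structure as one flat byte buffer. On the GL side those bytes would back an SSBO; below we model only the CPU-side view.

```python
# Hedged sketch: the global data structure as a flat byte array. On the GL
# side this buffer would be uploaded to an SSBO; here we model only the bytes.
import struct

ROOT_SIZE = 1024           # hypothetical root buffer size in bytes
root = bytearray(ROOT_SIZE)


def write_i32(offset, value):
    # little-endian i32, matching typical GPU buffer layouts
    struct.pack_into('<i', root, offset, value)


def read_i32(offset):
    return struct.unpack_from('<i', root, offset)[0]


# Storing x[None] = 233 for a 0-D i32 field placed at byte offset 0:
write_i32(0, 233)
print(read_i32(0))  # 233
```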

@yuanming-hu
Member

Btw, I found https://github.com/9ballsyndrome/WebGL_Compute_shader!

It will be great if we can do compute shader on WebGL, but my read is that it might take a couple more years for WebGL compute shader to become mature :-) For example, it's still not yet supported by default in my browser.

Inline review threads on:
taichi/backends/codegen_opengl.cpp
taichi/platform/opengl/opengl_api.cpp
@archibate
Collaborator Author

archibate commented Feb 19, 2020

I need a deeper understanding of SNodes now. When we say

x = ti.var(ti.i32, shape=())

@ti.kernel
def func():
    x[None] = 233

x isn't allocated until a kernel is called, right?
Then, is the backend responsible for allocation?
In fact, SSBOs aren't so precious; they don't need to be allocated the way cudaMallocManaged does. A plain malloc works - any region of CPU memory can be bound to an SSBO.

@yuanming-hu
Member

x isn't allocated until a kernel is called, right?

Yes, it will be allocated the first time a kernel is called, or when you access the data structure in Python scope (e.g. x[1, 2, 3] = 3 outside Taichi kernels).
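That lazy-allocation behavior can be sketched like this (class and attribute names are made up for illustration):

```python
# Hedged sketch: storage is materialized on first use, whether that is a
# kernel launch or a Python-scope access. Not Taichi's actual implementation.
class LazyField:
    def __init__(self, num_elements):
        self.num_elements = num_elements
        self.storage = None  # nothing allocated at declaration time

    def _materialize(self):
        if self.storage is None:
            self.storage = [0] * self.num_elements

    def __setitem__(self, i, value):
        self._materialize()  # a Python-scope write triggers allocation
        self.storage[i] = value

    def __getitem__(self, i):
        self._materialize()  # so does a read
        return self.storage[i]


x = LazyField(4)
assert x.storage is None  # declared, but not yet allocated
x[1] = 3                  # first access allocates
print(x[1])  # 3
```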

Then, is the backend responsible for allocation?

The struct compiler does allocation:

creator = [=]() {

The struct_metal file is also worth checking out.

@archibate
Collaborator Author

archibate commented Feb 19, 2020

or when you access the data structure in Python scope (e.g. x[1, 2, 3]=3 outside Taichi kernels)

I also noticed that x[1] = 2 or print(x[1]) is somehow translated into a kernel with an argument...

The struct_metal file is also worth checking out.

Oh, I almost forgot about that!

prog->memory_pool->set_queue((MemRequestQueue *)mem_req_queue);

It seems you have a memory pool there working for the LLVM backend? Maybe the GL backend can use it later too, as long as it doesn't use the CUDA API.

@yuanming-hu
Member

I also noticed that x[1] = 2 or print(x[1]) is translated into a kernel with an argument somehow...

Yeah, you'll have to launch a compute shader task for single-element accesses like these. A more efficient way is to use the numpy interfaces to/from_numpy() - I would suggest starting with the numpy interface first, since that's easier to implement.
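A minimal sketch of the difference (`root` here is a hypothetical stand-in for the field's SSBO-backed storage): per-element access would cost one compute-shader launch each, while the numpy interface moves the whole field in a single transfer.

```python
# Hedged sketch: bulk transfer via numpy vs. per-element access. `root`
# stands in for the SSBO storage; the helpers are illustrative, not Taichi's.
import numpy as np

root = np.zeros(16, dtype=np.int32)


def to_numpy():
    return root.copy()   # one bulk readback instead of 16 shader launches


def from_numpy(arr):
    root[:] = arr        # one bulk upload


from_numpy(np.arange(16, dtype=np.int32))
print(int(to_numpy()[3]))  # 3
```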

It seems you have a memory pool there working for the LLVM backend? Maybe the GL backend can use it later too, as long as it doesn't use the CUDA API.

The memory pool stuff relies on unified memory (which to my knowledge is not supported by OpenGL), so I would suggest starting with just dense snodes, whose storage can be directly pre-allocated.
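Why dense SNodes are easy to pre-allocate, in a small sketch (shape and element size are hypothetical): the extent is known at compile time, so a flat buffer and direct index-to-offset arithmetic suffice.

```python
# Hedged sketch: a dense SNode's storage size is known ahead of time, so the
# backend can pre-allocate a flat buffer and compute byte offsets directly.
shape = (4, 8)    # hypothetical dense field shape
DTYPE_SIZE = 4    # i32


def byte_offset(i, j):
    # row-major flattening of the 2-D index
    return (i * shape[1] + j) * DTYPE_SIZE


buffer_bytes = shape[0] * shape[1] * DTYPE_SIZE
print(buffer_bytes, byte_offset(2, 3))  # 128 76
```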

@archibate
Collaborator Author

to_numpy needs a struct_for kernel, which is not implemented yet.

@archibate
Collaborator Author

archibate commented Feb 19, 2020

Memory allocation is done. The next step is to figure out how x[0] (read_int) works - I want to see print show 233 on the screen.

[skip ci] test inout data[3] success

[skip ci] try to pass kernel arguments
@archibate
Collaborator Author

archibate commented Feb 19, 2020

We're making history! After this and stage 2 (struct_for kernel & parallel processing) are done, I think we're going to release our GPGPU programming language on Linux, OS X, and Windows, with an even wider user base!!! I've also thought of the stage 3 title: nested for & func~

@archibate
Collaborator Author

OK! I will fix the bug tomorrow. Check examples/opengl_example.py for a test, and good night.

@archibate
Collaborator Author

NOTE: 976d387 may have conflicts with #527.

@archibate
Collaborator Author

TODO: atan2 -> atan.

       genType atan(genType y, genType x);
       genType atan(genType y_over_x);
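A sketch of how the codegen could handle that TODO (the helper and table names are made up): GLSL has no atan2, but its two-argument atan(y, x) covers the same role, so the call name can simply be rewritten when emitting.

```python
# Hedged sketch: rename functions that differ between Taichi and GLSL at
# call-emission time. Hypothetical helper, not Taichi's actual codegen.
GLSL_FUNC_MAP = {'atan2': 'atan'}  # GLSL: genType atan(genType y, genType x)


def emit_call(op, args):
    glsl_name = GLSL_FUNC_MAP.get(op, op)  # fall through for sin, cos, ...
    return '{}({})'.format(glsl_name, ', '.join(args))


print(emit_call('atan2', ['y', 'x']))  # atan(y, x)
print(emit_call('sin', ['x']))         # sin(x)
```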

@archibate archibate requested a review from k-ye February 25, 2020 11:09
@archibate
Collaborator Author

archibate commented Feb 25, 2020

I simply skipped OpenGL in test_types for now, since it doesn't support many of the types involved there, such as u8. The types GLSL supports natively are: i1 (bool), i32, u32, f32, f64.
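That restriction could be enforced with a small type-mapping table (the mapping below is an illustration, not Taichi's actual table):

```python
# Hedged sketch: map Taichi scalar types to GLSL and reject ones GLSL has no
# native storage for (e.g. u8). Illustrative only.
GLSL_TYPES = {'i32': 'int', 'u32': 'uint', 'f32': 'float', 'f64': 'double'}


def glsl_type(ti_type):
    if ti_type not in GLSL_TYPES:
        raise NotImplementedError('GLSL cannot store ' + ti_type)
    return GLSL_TYPES[ti_type]


print(glsl_type('i32'))  # int
```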

@k-ye
Member

k-ye commented Feb 25, 2020

I saw that you requested a review. If you think this is ready, as suggested before, could you break it down into smaller PRs? Otherwise it's very hard to review... You may need to create new branches and open new PRs, while archibate:opengl remains an experimental branch containing a working solution for reference ;)

@archibate
Collaborator Author

archibate commented Feb 25, 2020

Yes, I will, once everything is done. But the actual reason I requested a review is that I'm afraid my changes in test_types.py could break yours.

@archibate
Collaborator Author

ALL TESTS PASSED!!!

@k-ye
Member

k-ye commented Feb 25, 2020

I'm afraid my changes in test_types.py could break yours.

Ha, thanks. Don't worry too much until it actually gets merged... And I think it's fine not to worry too much about breaking the Metal backend. After all, it's hard to test code you don't even have a way to run.

@archibate
Collaborator Author

Broken down into smaller PRs; closing this one.

3 participants