WebGL2: Asynchronous Readback with PIXEL_PACK_BUFFERs #22779

Closed
zalo opened this issue Nov 2, 2021 · 15 comments · Fixed by #28291

Comments

@zalo
Contributor

zalo commented Nov 2, 2021

Is your feature request related to a problem? Please describe.

WebGL2 (which is now supported by all modern browsers) offers powerful General Purpose GPU capabilities. three.js already implements an impressive array of these features (like WebGLMultipleRenderTargets). This GPGPU suite is nearly complete, but it's missing a performant way of reading texture data back from the GPU. renderer.readRenderTargetPixels() waits for a sync with the GPU before the download starts, costing tens of milliseconds per-call on PC and more on mobile.

Describe the solution you'd like

WebGL2 adds the ability to bind _gl.PIXEL_PACK_BUFFERs, which allow for asynchronous readback of texture data from the GPU, largely eliminating the performance impact of GPU -> CPU transfers. This should open up all kinds of GPU Physics (and other GPGPU) Solutions.
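
For illustration, here is a minimal sketch of how such a readback might be queued with the raw WebGL2 API. The function name `startReadback` and its parameters are hypothetical and not part of three.js; it assumes `gl` is a WebGL2RenderingContext, that the framebuffer to read is already bound, and that the target is RGBA / UNSIGNED_BYTE.

```javascript
// Sketch only: queue a copy of the bound framebuffer into a pixel pack buffer.
// `startReadback` is a hypothetical helper, not a three.js API.
function startReadback(gl, width, height, byteLength) {
  const buffer = gl.createBuffer();
  gl.bindBuffer(gl.PIXEL_PACK_BUFFER, buffer);
  gl.bufferData(gl.PIXEL_PACK_BUFFER, byteLength, gl.STREAM_READ);

  // With a PIXEL_PACK_BUFFER bound, the last argument is a byte offset into
  // that buffer rather than a destination array, so the call does not have
  // to wait for the GPU to finish.
  gl.readPixels(0, 0, width, height, gl.RGBA, gl.UNSIGNED_BYTE, 0);
  gl.bindBuffer(gl.PIXEL_PACK_BUFFER, null);

  // The fence lets the caller find out later when the copy has completed.
  const sync = gl.fenceSync(gl.SYNC_GPU_COMMANDS_COMPLETE, 0);
  gl.flush();

  return { buffer, sync };
}
```

The returned `buffer` and `sync` would be held until the fence signals, at which point the data can be copied into a TypedArray.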

Describe alternatives you've considered

Luma.gl implemented async readback a few years ago
Babylon.js also has some functionality for this
(One implementation in Babylon...)
Unity has a similar mechanism for this

Additional context

My immediate use-case would be to speed up this GPU Finite Element Modelling simulation.
If I can get everything running fast and stable on all platforms, I'd also like to contribute a new GPUComputationRenderer (which would support WebGLMultipleRenderTargets and separate the concept of "passes" from "variables", allowing for multiple passes over the same data).

@Cygnusfear

I am incredibly interested in this. Is there a possibility to accelerate implementation of this feature with for example a bounty?

@snsie

snsie commented Nov 7, 2021

I am not sure how to best integrate the approach for general use cases, but I am successfully retrieving pixel values by integrating the readPixelsAsync example included in the link below.

https://developer.mozilla.org/en-US/docs/Web/API/WebGL_API/WebGL_best_practices

The pixel buffer input dest type needs to match the texture data type, which was gl.FLOAT for me. Examples I have found online, such as the one linked below, use gl.UNSIGNED_BYTE as the type so I am not sure if compatibility issues will arise on older devices.

https://github.com/kainino0x/getBufferSubDataAsync-Demo/blob/master/index.html
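
As a rough sketch of the type-matching point above: the destination TypedArray can be chosen from the `type` that will be passed to readPixels. The helper name `allocateReadbackArray` is hypothetical, and the sketch assumes RGBA targets (four components per pixel) and only the three types listed.

```javascript
// Sketch only: pick a destination array that matches the readPixels type.
// Assumes four components (RGBA) per pixel; `allocateReadbackArray` is a
// hypothetical helper name.
function allocateReadbackArray(gl, type, width, height) {
  const length = width * height * 4;
  switch (type) {
    case gl.UNSIGNED_BYTE:
      return new Uint8Array(length);
    case gl.FLOAT:
      return new Float32Array(length);
    case gl.HALF_FLOAT:
      // Holds the raw 16-bit half floats; they are not decoded here.
      return new Uint16Array(length);
    default:
      throw new Error('Unhandled readback type: ' + type);
  }
}
```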

I modified the GPUComputationRenderer class to call readPixelsAsync inside of the doRenderTarget function. I'm still trying to figure out the best way to bind pixel buffer objects to specific textures rather than the base GPUComputationRenderer class.

I apologize for not providing a more robust solution, but I thought this info might be helpful. It would be exciting to experiment with a GPUComputationRenderer that incorporates both nonblocking pixel readback and WebGLMultipleRenderTargets.

zalo added a commit to zalo/TetSim that referenced this issue Nov 9, 2021
@zalo
Contributor Author

zalo commented Nov 9, 2021

@snsie Thank you for the tips! Following your instructions, I was able to get Async Readback working in my TetSim with these changes: zalo/TetSim@9696c2e

Live Version Here

It results in a dramatic performance improvement when reading texture data back from the GPU! (Especially on Mobile/iOS!)

I didn't try too many things, but the spooky(?) thing about it is that it only seems to work if it's called right after the last render() call that writes to the texture you want (I guess because then it's the active render target?) It may also be desirable to persist buf between calls so gl.createBuffer() isn't getting called all the time.

@sciecode
Contributor

sciecode commented Feb 22, 2022

Alright, let's try to get this feature live, so others can utilize it as well.

it only seems to work if it's called right after the last render() call that writes to the texture you want (I guess because then it's the active render target?)

I imagine you were bouncing render targets, so if you don't call readPixels just after the render target is complete, it would immediately get used by the renderer again, and it should give you a framebuffer incomplete error. I believe this is more of a user-level concern, and highly depends on the use-case.

It may also be desirable to persist buf between calls so gl.createBuffer() isn't getting called all the time.

Yes, this is something to be considered. However, we should allow for the user to readback more than a single RenderTarget at a time. So reusing buffers may not always be viable, as it really depends on the number of active readbacks occurring at any given time.

An internal pool of on-demand PIXEL_PACK_BUFFER buffers would be the ideal approach IMO. Those are created only if requested and no other buffers are available in the pool. As they finish copying the buffer data to the destination TypedArray, those get inserted back into the pool for later use, and we can even dispose these buffers if they stay latent for too long as well.
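
A minimal sketch of what such a pool could look like, under stated assumptions: the class name `PackBufferPool` and its methods are hypothetical, buffers are keyed by byte length, and time-based disposal of idle buffers is left out (only a `dispose()` that frees everything is shown).

```javascript
// Sketch only: an on-demand pool of PIXEL_PACK_BUFFER objects keyed by size.
// `PackBufferPool` is a hypothetical name, not part of three.js.
class PackBufferPool {
  constructor(gl) {
    this.gl = gl;
    this.free = new Map(); // byteLength -> array of idle buffers
  }

  // Reuse an idle buffer of the right size, or create one if none is free.
  acquire(byteLength) {
    const idle = this.free.get(byteLength);
    if (idle && idle.length > 0) return idle.pop();

    const gl = this.gl;
    const buffer = gl.createBuffer();
    gl.bindBuffer(gl.PIXEL_PACK_BUFFER, buffer);
    gl.bufferData(gl.PIXEL_PACK_BUFFER, byteLength, gl.STREAM_READ);
    gl.bindBuffer(gl.PIXEL_PACK_BUFFER, null);
    return buffer;
  }

  // Return a buffer once its data has been copied to the destination array.
  release(buffer, byteLength) {
    if (!this.free.has(byteLength)) this.free.set(byteLength, []);
    this.free.get(byteLength).push(buffer);
  }

  // Delete every idle buffer (for example, when the renderer is disposed).
  dispose() {
    for (const idle of this.free.values()) {
      for (const buffer of idle) this.gl.deleteBuffer(buffer);
    }
    this.free.clear();
  }
}
```

Because buffers are only created when the pool is empty for that size, several readbacks can be in flight at once while a steady-state workload still avoids calling gl.createBuffer() every frame.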

I should have enough to prep an initial PR for this feature, however I'm unsure about setting a proper fence sync API to be used alongside the rest of the lib, or if the sync procedure should be considered stand-alone just for readPixels.

Thoughts? @Mugen87 @mrdoob

@zalo
Contributor Author

zalo commented Feb 23, 2022

I'm a bit biased toward GPGPU uses, but my ideal would be that the current GPUComputationRenderer gets supplemented or replaced with a generalized version of the MultiTargetGPUComputationRenderer I put together for the Tetrahedral simulation in the above post.

The primary upgrades are that it:

  • Separates the notion of textures from passes (so different passes/shaders can be run on the same texture)
  • Populates multiple target outputs so that one pass can write to multiple texture outputs
  • Adds Asynchronous Readback functionality

At the moment, I believe it has some rough edges (hard-codes to 4 outputs, comments out gl_FragColor -> pc_FragColor define for no reason, doesn't have an updated read-me block, and other misc. style issues etc.)

A lot of these could probably get picked up and resolved while porting the current GPGPU examples to it.

I haven't had the time lately to wrap it together into a nice PR, but anyone is free to take the code and modify + submit it for that (or any other) purpose.

(I can see how it might also be desirable to do Async Readback from non-GPGPU render textures, so I suppose the functionality could also live deeper in three.js and generalize back to my use-case...)

@Mugen87
Collaborator

Mugen87 commented Feb 23, 2022

I can only say that it's best to start with something simple that can be enhanced over time. Try to not over-engineer the feature. The advantage of PIXEL_PACK_BUFFER should become clear in a single use case which is easy to understand. It's then more likely that a PR gets properly reviewed and eventually merged.

@upisfree
Contributor

This is a really needed feature! I'm using GPUComputationRenderer and really miss an async readRenderTargetPixels, especially for physics on GPU and GPU picking, so I will try to adopt @zalo's code for myself, thanks!

Wish to see this in three.js :)

@sciecode
Contributor

sciecode commented Jun 10, 2022

Will experiment with a proposal PR for this feature, this weekend. I have been running a test build in a personal project for a while and I feel like it could fit well into our API. Will be good to test on as many devices as possible, to account for Polyfill needs and edge-cases.

@mrdoob
Owner

mrdoob commented Jul 7, 2022

@sciecode any luck?

@sciecode
Contributor

sciecode commented Jul 7, 2022

Well, sort of. I noticed that the approach I used only works because I respect a few rules, which aren't necessarily things that other users might want to follow. Let me break down the procedure to illustrate what's happening.

The main goal I set was to keep the API similar to the current WebGLRenderer.readRenderTargetPixels. In WebGL 1, this blocks the main thread and guarantees that every call is always executed in the same animation frame in which it was called. That is why it is so slow, but also why we have guarantees that every readPixels call returns data from the framebuffer at that specific moment in time.

However, the async version works a bit differently. In order to prevent the main thread from being blocked, we introduce a fence command in the pipeline. This means we have the opportunity to probe whether this fence command was executed, and only after the fence has signaled do we proceed with a blocking read-back of the specialized PIXEL_PACK_BUFFER buffer. This should mean that everything runs a lot faster, because we don't need to wait for every command that was stacked on the pipeline to run before we regain control of the main thread.
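
The probe-then-read step described above could be sketched roughly as follows. The function name `tryFinishReadback` and the shape of `pending` (an object holding a `buffer` and a `sync`) are hypothetical; the WebGL2 calls themselves (clientWaitSync, deleteSync, getBufferSubData) are standard.

```javascript
// Sketch only: check the fence without waiting, and copy the data out once
// the GPU has passed it. `pending` is assumed to be { buffer, sync }.
function tryFinishReadback(gl, pending, dest) {
  // A timeout of 0 means "just report the current state, do not block".
  const status = gl.clientWaitSync(pending.sync, 0, 0);
  if (status === gl.WAIT_FAILED) throw new Error('clientWaitSync failed');
  if (status === gl.TIMEOUT_EXPIRED) return false; // not ready yet, try later

  gl.deleteSync(pending.sync);

  // The fence has signaled, so the pack buffer already holds the pixels.
  gl.bindBuffer(gl.PIXEL_PACK_BUFFER, pending.buffer);
  gl.getBufferSubData(gl.PIXEL_PACK_BUFFER, 0, dest);
  gl.bindBuffer(gl.PIXEL_PACK_BUFFER, null);
  return true;
}
```

A caller would keep invoking this until it returns true, then hand `dest` to the application.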

The tricky part here is reliably making sure that each WebGLRenderer.readRenderTargetPixelsAsync is gonna always return the expected data, and handling what happens if another call originates before the previous one was fully executed. And this is exactly why I'm struggling a bit to make the PR for it. On my projects, I'm ok waiting for the previous call to be completely executed, before letting a new one originate. But this isn't necessarily what other users might want.

In Mozilla's best practices, they probe the fence command by looping a setTimeout with a small interval, which runs independently from the animation loop. However, I chose to only perform this check/readback synced with the animation frame, so that if we block the main thread, it only happens in a time where we are expecting it to already be busy. This helps prevent jitter and avoids breaking the allotted frame budget, but it also means that some frames might not be able to perform readPixels, or in other words, we might not have the result from the readback before the next frame.
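
To illustrate the frame-synced variant (as opposed to a setTimeout loop), here is a small sketch of a queue that is polled once per animation frame. The class name `FrameSyncedReadbacks` and the `tryFinish` callback are hypothetical; `tryFinish` is assumed to return true once the fence has signaled and the data has been copied.

```javascript
// Sketch only: pending readbacks are checked once per frame, not on a timer.
// `FrameSyncedReadbacks` is a hypothetical name, not part of three.js.
class FrameSyncedReadbacks {
  constructor() {
    this.pending = [];
  }

  // Register a readback; the promise resolves in the frame where it finishes.
  enqueue(tryFinish) {
    return new Promise((resolve) => {
      this.pending.push({ tryFinish, resolve });
    });
  }

  // Call exactly once per animation frame, e.g. at the start of render().
  poll() {
    const entries = this.pending;
    this.pending = [];
    for (const entry of entries) {
      if (entry.tryFinish()) entry.resolve();
      else this.pending.push(entry); // still waiting, check again next frame
    }
  }
}
```

With this design a readback can never complete sooner than the next call to `poll()`, which is the limitation described above: the result may not be available before the following frame.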

If anyone has an idea of how to circumvent this limitation, I am open to suggestions. If this is not super important, then I can push my version as an initial PR of this feature.

@sciecode
Contributor

sciecode commented Jul 7, 2022

Thinking a bit more about it, I believe we might be fine using the described API, as long as the use-case is not a split CPU/GPU computation in a single frame. For example, performing some computation on the GPU and using that information in the same frame for a later computation on the CPU.

That is far from being a common use of Three.js. Even in that case, using the synchronous version would still be a possibility. I think I'll move forward with a PR, and if users request support for the use-case, we can try to devise a better solution to support it.

I'll do my best to work on the PR during the weekend, sorry for the delay @mrdoob.

@sguimmara
Contributor

I think it would dramatically increase the efficiency of GPU picking, as currently (in our use case in giro3D) we need to render 3 different passes to collect all the data for picking, and the main thread is frozen while readPixels() waits for the GPU to be idle.

@ligaofeng0901
Contributor

I am incredibly interested in this too, especially in GPU picking.

@ligaofeng0901
Contributor

Can this PR be merged or updated?
https://github.com/mrdoob/three.js/pull/24466/files

@mrdoob
Owner

mrdoob commented Apr 17, 2024

Are you able to use the WebGPURenderer already? #26326


9 participants