Emulate double precision for regular rendering operation when REAL_T_IS_DOUBLE #66178
Conversation
Force-pushed from ab1ed2e to b318b0f
Took me a minute to wrap my brain around it, but this is a deceptively simple solution to a gnarly problem. Really cool!
We calculate the lost precision on the CPU and pass it into the GPU so that it can calculate an error-corrected version of the vertex position
Force-pushed from b318b0f to 27a3014
I hope this isn't a ridiculous question, but should/can this be implemented for 2D as well? Based on my limited testing, it seems like 2D rendering also starts to jitter when getting millions of units out.
I never even considered 2D. I guess it can be added if there is demand, but I wouldn't add it without reason. This code gets inserted in all qualifying shaders in the doubles build of the engine, so it has the potential to reduce performance even for normal UI stuff that would never need it.
Sounds good, that makes sense!
I am not part of the rendering team which owns this, but I successfully tested with the sample project given by @Zylann, and I did the first PR on double precision.
Discussing this in chat, the feedback so far is that this is working as advertised.
I had some concerns initially, but after discussing with Clayjohn I think this is a very elegant solution.
Code-wise I have nothing to add; maybe @reduz will want to have a final say, but I think this is good to go.
Thanks!
For reference and for clarity, is this only applicable when building Godot with `bits=64`? Or am I misunderstanding something?
No, `bits=64` refers to the CPU architecture of the binary; this comes from building the engine with double-precision floats.
I agree; my question was rather whether the CPU-side transformation matrix must be computed in 64 bits in order to apply an adequate correction in the shader, and whether this is only done when Godot is compiled with `bits=64`.
This is cool! Will this be enabled by default for official stable builds?
@realkotob No, this is only applicable for "doubles" builds of the engine |
Hey, I tried to implement this in Love2D — do you guys think I've got it right? https://github.com/groverburger/g3d/pull/45/files I basically packaged the residual component into the unused bottom three zeroes of the 4x4 matrix, and unpacked it in the shader this way:

```glsl
mat4 diff = modelPacked - viewPacked; // viewPacked's third column is actually in world space
vec3 displacement = diff[3].xyz + vec3(diff[0].w, diff[1].w, diff[2].w);
mat3 modelAffine = mat3(modelPacked);
mat4 modelView = mat4(mat3(viewPacked) * mat4x3(modelAffine[0], modelAffine[1], modelAffine[2], displacement));
worldPosition = vec4(modelAffine * vertexPosition.xyz, 0.0) + modelPacked[3];
viewPosition = modelView * vertexPosition;
screenPosition = projectionMatrix * viewPosition;
```
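The packing code itself isn't quoted above; a plausible sketch consistent with that unpacking (the function name and layout here are assumptions, not the linked PR's actual code) would be:

```glsl
// Hypothetical reconstruction of the packing: columns 0-2 hold the
// rotation/scale, with the origin residual tucked into their normally-zero
// .w slots (the matrix's bottom row); column 3 holds the float-rounded
// origin as usual, so the matrix still works for ordinary transforms.
mat4 pack_transform(mat3 affine, vec3 origin_hi, vec3 origin_lo) {
    return mat4(
            vec4(affine[0], origin_lo.x),
            vec4(affine[1], origin_lo.y),
            vec4(affine[2], origin_lo.z),
            vec4(origin_hi, 1.0));
}
```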
It looks like you are missing the error-compensated sum (the 2sum step) when adding the coarse and fine parts together.
Interesting — I did some reading about the 2sum and fast2sum algorithms. I tweaked the four-vector addition to a less generalized version that discards the very last error term, as you have done in your shaders for the final position:

```glsl
// 2Sum: returns round(a + b) and writes the exact rounding error to out_p.
vec3 two_sum(vec3 a, vec3 b, out vec3 out_p) {
    vec3 s = a + b;
    vec3 v = s - a;
    out_p = (a - (s - v)) + (b - v);
    return s;
}

// Adds two coarse/fine pairs (A, a) and (B, b), discarding the final error term.
vec3 precise_sum(vec3 A, vec3 a, vec3 B, vec3 b) {
    vec3 D, d;
    vec3 C = two_sum(A, B, D); // coarse sum and its rounding error
    vec3 c = two_sum(a, b, d); // fine sum and its rounding error
    vec3 CcD = C + (c + D);
    vec3 e = (c + D) - (CcD - C);
    return CcD + (d + e);
}

vec4 position(mat4 transformProjection, vec4 vertexPosition) {
    mat3 modelAffine = mat3(modelPacked);
    vec3 modelCoarse = modelPacked[3].xyz;
    vec3 viewCoarse = viewPacked[3].xyz;
    vec3 modelFine = (transpose(modelPacked))[3].xyz; // residual from the packed bottom row
    vec3 viewFine = (transpose(viewPacked))[3].xyz;
    vec3 displacement = precise_sum(modelCoarse, modelFine, -viewCoarse, -viewFine);
    mat4 modelView = mat4(mat3(viewPacked) * mat4x3(modelAffine[0], modelAffine[1], modelAffine[2], displacement));
    viewPosition = modelView * vertexPosition;
    screenPosition = projectionMatrix * viewPosition;
    return screenPosition;
}
```

Out of curiosity, what would be the advantages and disadvantages of just pre-computing the vec3 displacements on the CPU in doubles and then sending them to the shader as the relative difference?
The reason to use the specialized addition operations is that you minimize the error introduced by the operations. You do that by tracking the error from every operation, aggregating that error, and then adding it back in during subsequent operations. The net result is an overall reduction in error. If you simply take the original error term and add it back in at the very end, you will still have the cumulative error that came from the operations. In other words, that approach would be better than nothing, but it would not be as effective as the full solution used here.
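As a toy illustration of the error term being tracked (values assumed; a GPU compiler may constant-fold this, so treat it as arithmetic rather than a benchmark):

```glsl
// At 1e8, the spacing between representable fp32 values (the ULP) is 8.0,
// so adding 1.0 with a plain + is lost entirely; two_sum (defined above)
// hands the dropped part back in its error output for later use.
void demo() {
    vec3 err;
    vec3 s = two_sum(vec3(1.0e8), vec3(1.0), err);
    // s == vec3(1.0e8): the rounded sum dropped the 1.0 entirely.
    // err == vec3(1.0): the exact rounding error, recovered.
}
```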
Thanks, but that didn't quite answer my question. I'm asking about why not just compute the displacement on the CPU in doubles and send the result to the shader.
Ah, thanks for clarifying. For certain renderers it may be fully possible to pre-multiply the transforms using doubles and only send a baked modelview matrix. This would be optimal as far as precision goes. Godot can't use that approach for two reasons: the camera-relative data would have to be recomputed and re-uploaded for every object (including static ones) whenever the camera moves, and it would substantially increase the per-frame data sent from the CPU to the GPU.
Overall, it is preferable to do a few extra vertex calculations rather than send more data from the CPU to the GPU for each mesh, every frame.
Out of curiosity, does the vertex shader compute the VP matrix every time, or is this updated once per viewport from the CPU side? Would this be a performance concern, since I suppose GPUs have a one-cycle 4x4 matrix multiplication operation in the SFU?
I would like to clarify that I did not mean calculating the MVP on the CPU, but rather purely the world-space position delta relative to the camera. Here is future me answering past me: even without needing to recalculate the matrix for every single object on the CPU, there is still a deal-breaking fact: it means you would also be updating the positions of all static objects on every change of your camera position. Even dynamic objects or skeletal animations are not necessarily updated every single frame, whereas with positions tied to your camera, simply walking around results in constantly flushing the instance data.
Fixes: #58516
Finishes @fire's 4 years, 10 months, and 30 days journey of double support: #12299
We calculate the lost precision on the CPU and pass it into the GPU so that it can calculate an error-corrected version of the vertex position. The general approach is detailed here: http://andrewthall.org/papers/df64_qf128.pdf
This allows rendering very large worlds as if we were using double precision in the shader when we are not.
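A minimal sketch of that scheme (the uniform names here are hypothetical, not Godot's actual ones): the CPU, working in doubles, splits each origin into a float-rounded part plus the rounding residual, and the shader subtracts coarse and fine parts separately so the large magnitudes cancel before precision is lost.

```glsl
// Hypothetical uniforms: double origins split on the CPU into
// hi (rounded to fp32) and lo (the rounding error, which also fits in fp32).
uniform vec3 model_origin_hi;
uniform vec3 model_origin_lo;
uniform vec3 view_origin_hi;
uniform vec3 view_origin_lo;

vec3 camera_relative_origin() {
    // hi - hi is exact (or nearly so) when the object is close to the
    // camera, so the tiny lo residuals are not swamped by large magnitudes.
    return (model_origin_hi - view_origin_hi) + (model_origin_lo - view_origin_lo);
}
```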
We don't use doubles in the shader for two reasons:
1. Double-precision floats are not supported on all target GPUs/drivers.
2. Even where supported, double-precision math is much slower than single-precision.
This method works for "normal" rendering, meaning it does not work with the render modes `skip_vertex_transform` or `world_vertex_coords`; in either case you end up doing calculations entirely in single-precision floating point and you lose the benefit of this. By the same token, any shader operations in world space will still be a problem, as the shaders are 100% floats. This means that world triplanar is still limited by the bounds of single-precision floats.

I have not implemented this yet in GLES3 as I want to flesh out the 3D renderer a bit more first. Particularly, I want to ensure that this won't conflict with our floating-point precision needs.
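To illustrate why `world_vertex_coords` can't benefit, consider this hypothetical Godot shader (the numbers are general fp32 facts, not measurements from this PR):

```glsl
shader_type spatial;
render_mode world_vertex_coords;

void vertex() {
    // With this render mode, VERTEX arrives already converted to world
    // space as fp32, so precision is gone before any correction could run:
    // at 1e9 units the spacing between representable fp32 values is about
    // 64 units, hence the jitter.
    VERTEX += vec3(0.0); // all math here is plain fp32
}
```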
Lastly, as it is written, this approach does not work with particles/multimeshes. The same general approach can be used, but when applied to particles it will result in many, many times more calculations. Right now, for smaller ranges (<500 km) this isn't a big deal, but once you approach 1,000 km the error becomes noticeable, and above 10,000 km the error is significant. I can add the relevant code for particles/multimeshes if desired.

Edit: I was wrong above. I thought that we would need to add a high-precision path to the `instance_transform` * model matrix multiplication, which would require decomposing the multiplication into a mat3 x mat3 multiplication and a high-precision dot product (which would be a ton of calculations). However, I realized that when using the normal render path, we can keep the `instance_transform` separate and add its origin offset at the same time we do the model/view multiplication. So I've added a code path that ensures the full model matrix is available if `MODEL_MATRIX` is read in the shader, if using world vertex coords, if skipping vertex transform, or if not using doubles. Finally, I made `MODEL_MATRIX` read-only in shaders. Previously you could write to it and the value would be totally ignored.

CC @Zylann @reduz @BastiaanOlij
Comparison
At origin
Before: (Ignore the triangle, it moves to show smooth particle movement)
After:
1 billion units away!
Before:
After: