-
Notifications
You must be signed in to change notification settings - Fork 143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Port recent GPU stroke expansion changes in flatten
to CPU version
#415
Conversation
This is needed to unpack the miter limit parameter in the CPU test shader version of GPU stroke flattening.
Ported the recent GPU-side stroke expansion changes to the CPU version of the `flatten` stage.
e5c58a3
to
48521ba
Compare
fn test_f16_to_f32() { | ||
const EPS: f32 = 0.001; | ||
let input: f32 = std::f32::consts::PI; | ||
assert!((input - f16_to_f32(f32_to_f16(input))).abs() < EPS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to have:
- an exported bit pattern of a 16 bit float, which we check gets the right number (i.e. test that both directions aren't wrong in the same way)
- test that values in the extreme ranges (i.e. values of the order
60_000
and1/60_000
± a factor of 2, if required - and 0) are appropriately roundtripped - special values (i.e. NaN, infinity) roundtrip as expected. This is less important, as we don't really want to see them - maybe we should
warn!
in those cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I added tests for max and min value roundtrip, nan/inf roundtrip, and an explicit conversion. Perhaps test_f16_to_f32_roundtrip
should become a proptest but I think this is good enough for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ran this locally, and it worked great, thanks.
I do unfortunately see different behaviour in tricky_strokes with use-cpu
enabled and disabled - the "bottom right" one doesn't change when zooming out when using the CPU version.
A few questions, which I believe are unlikely to be defects.
I can only really do a cursory scan of the code with my current understanding, but that hasn't brought up anything blocking
@@ -580,7 +580,6 @@ fn output_two_lines_with_transform( | |||
|
|||
struct NeighboringSegment { | |||
do_join: bool, | |||
p0: vec2f, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice if we could get warned about this being never read, but that's wishful thinking.
Although...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
**suspense**
src/cpu_shader/flatten.rs
Outdated
let n_lines = if theta <= ROBUST_EPSILON { | ||
MAX_LINES | ||
} else { | ||
MAX_LINES.min((std::f32::consts::TAU / theta).ceil() as u32) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it likely that most of this maximum will be used on paths which are off-screen? (or has some culling already occured?)
Having done some (probably incorrect) translation of this code, it seems to me that the radius
has to be much greater than 10_000
pixels1.
Footnotes
-
I tried this in two different ways, and got two different numbers, so I don't fully trust my maths here ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it likely that most of this maximum will be used on paths which are off-screen?
Yes, the really high line segment counts occur at really high zoom levels (for example if you zoom in directly to a stroke's cap). There is actually currently no culling and it's highly likely to wastefully generate off-screen line segments at high zoom levels. longpathdash with round caps is a good example of this, where zooming in actually exceeds the memory budget for LineSoup.
We've been talking about implementing at the very least viewport culling next year to handle the zoomed-in cases.
src/cpu_shader/flatten.rs
Outdated
MAX_LINES.min((std::f32::consts::TAU / theta).ceil() as u32) | ||
}; | ||
|
||
let th = angle / (n_lines as f32); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does the number of lines depend only on the radius, rather than including the angle being swept over?
I would naïvely expect th
to be the same for all circles with the same radius, as that should make equal length lines.
It feels quite likely that I'm missing something obvious here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also have this concern, and wonder if it might have something to do with the high observed memory usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, there is a bug here and thank you for taking a close look and catching that. n_lines
is the number for a full circle. This is directly transcribed from the GPU flatten code where I was getting the count for a full circle and fitting that into any sweep angle. I also noticed that the derivation of theta
is also not quite right. Given a radius
And n_lines = (angle / theta).ceil()
. We still need to guard against a division by zero / over-subdividing. I went ahead and picked an arbitrary minimum angle limit of 0.0001
radians but this could probably get fine tuned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have concerns about the number of subdivisions in the arc flattening logic. I didn't wrap my head around it fully - perhaps it's right and there's something I'm missing, in which case I'd happily accept an argument.
Other than that, this looks like a good foundation for the Euler spiral flattening, and I'm looking forward to getting that fully wired.
src/cpu_shader/flatten.rs
Outdated
let tol: f32 = 0.5; | ||
let radius = tol.max((p0 - transform.apply(center)).length()); | ||
let x = 1. - tol / radius; | ||
let theta = (2. * x * x - 1.).clamp(-1., 1.).acos(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the same as sqrt(8 tol / radius) for small tol. I wonder if the extra math is worth it. I haven't investigated in detail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new expression should be correct now and is simpler.
src/cpu_shader/flatten.rs
Outdated
MAX_LINES.min((std::f32::consts::TAU / theta).ceil() as u32) | ||
}; | ||
|
||
let th = angle / (n_lines as f32); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also have this concern, and wonder if it might have something to do with the high observed memory usage.
src/cpu_shader/flatten.rs
Outdated
) { | ||
let mut p0 = transform.apply(begin); | ||
let mut r = begin - center; | ||
let tol: f32 = 0.5; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems a little loose to me, I've generally been using 0.25. I'm also concerned that it doesn't match the flattening tolerance for curves in general.
This might be a good time for a heads up: I am not confident that the tolerance value in the existing code is accurately calibrated against the actual Fréchet distance between flattened and source curve; I may have let a constant factor of error enter. But the current work I'm doing does calibrate that, so when I land that I will be systematic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the new derivation, 0.25 leads to visible line segments. 0.1 looks decent enough to me. I'm worried that anything lower will blow up the line count but you should probably take a look and make a judgement as to whether this tolerance is visually satisfactory.
PTAL. I think the arc flattening logic should be correct now barring any adjustments to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, this makes more sense. I expect to take a closer look (hopefully visualizing the flattening output) as I adapt the Euler stuff, so I didn't go through with a fine-tooth comb, but this definitely gets us to a better place. Thanks!
let th = angle / (n_lines as f32); | ||
let c = th.cos(); | ||
let s = th.sin(); | ||
let c = theta.cos(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can write let (s, c) = theta.sin_cos()
but I'm not sure it's a significant improvement, and in any case the function is not available in WGSL.
@@ -126,6 +126,12 @@ fn write_line( | |||
bbox: &mut IntBbox, | |||
lines: &mut [LineSoup], | |||
) { | |||
assert!( | |||
!p0.is_nan() && !p1.is_nan(), | |||
"wrote line segment with NaN: p0: {:?}, p1: {:?}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can say {p0:?}
here, as of Rust 1.58.
@DJMcNab I'm not sure why there is a discrepancy but I'll investigate this further. That said the bottom rows of strokes are currently not at all handled correctly - that's awaiting @raphlinus 's Euler Spiral work (early prototype started here: https://github.com/linebender/vello/tree/euler-spiral-strokes). |
This replicates the GPU-side stroke expansion changes that were recently made to the
flatten
stage to its CPU version.Note on buffer sizes:
The
longpathdash (round caps)
andmmark
scenes assert (due to out-of-bounds buffer access) if zoomed in (the former crashes immediately). The asserts can occur in thepath_count
andflatten
stages. The root cause is that thesegments
,seg_counts
, andlines
buffers are not large enough to accommodate the required number of line segments. When run on the GPU, this manifests itself as artifacts rather than a crash.This PR does not adjust the bump buffer sizes in order to avoid hiding the problem. I would like to fix this as part of #366. Unfortunately, even an accurate buffer size estimate will require more memory than what's permitted by the WebGPU standard (the default limit for maxStorageBufferBindingSize is 128 MiB and implementations aren't guaranteed to support a higher value). We have been considering culling lines that fall outside the viewport which can alleviate the memory usage in zoomed-in scenarios.