[wgsl-in] Overhaul number lexing / parsing #1863

Merged: 1 commit merged into gfx-rs:master from the number-literal branch on Jun 10, 2022

@teoxoy teoxoy commented Apr 25, 2022

Updated the number literal format and behavior to match the latest spec version.

Reduced the number-related errors to the following three:

error: `-0x1f.p500f` numeric literal not representable by target type
  ┌─ test.wgsl:7:18
  │
7 │     var var0: f32 = -0x1f.p500f;
  │                     ^^^^^^^^^^^ numeric literal not representable by target type
error: `1h` unimplemented f16 type
  ┌─ test.wgsl:7:18
  │
7 │     var var0: f32 = 1h;
  │                     ^^ unimplemented f16 type
error: `` invalid numeric literal format
  ┌─ test.wgsl:7:18
  │
7 │     var var0: f32 = 0xx;
  │                     ^ invalid numeric literal format

(the span of the invalid error can't be more accurate than the first character since we don't know where it ends)
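
For illustration, the three error cases above could be modeled roughly as follows. This is a minimal sketch, not the code from this PR: the names `NumberError` and `narrow_to_f32` are assumptions, and only the three message strings are taken from the output shown above.

```rust
use std::fmt;

/// Hypothetical error type; the variant names are assumptions,
/// only the message strings come from the compiler output above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NumberError {
    Invalid,
    NotRepresentable,
    UnimplementedF16,
}

impl fmt::Display for NumberError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let msg = match self {
            NumberError::Invalid => "invalid numeric literal format",
            NumberError::NotRepresentable => {
                "numeric literal not representable by target type"
            }
            NumberError::UnimplementedF16 => "unimplemented f16 type",
        };
        f.write_str(msg)
    }
}

/// Hypothetical helper: narrow an already-parsed f64 to f32 and reject
/// values that do not stay finite (for example, overflow to infinity).
pub fn narrow_to_f32(value: f64) -> Result<f32, NumberError> {
    let narrowed = value as f32;
    if narrowed.is_finite() {
        Ok(narrowed)
    } else {
        Err(NumberError::NotRepresentable)
    }
}
```

Under these assumptions, a value such as -31 × 2^500 (the magnitude of `-0x1f.p500f`) fits in an f64 but overflows f32, so `narrow_to_f32` would report `NotRepresentable`.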

fixes #1843
gets us closer to gpuweb/gpuweb#2227

Related
gpuweb/gpuweb#2762
gpuweb/gpuweb#2769

@jimblandy jimblandy self-requested a review May 1, 2022 04:35

jimblandy commented May 1, 2022

I wish hexf_parse provided an API that just let you hand over the segments of text you'd identified.

edit: filed lifthrasiir/hexf#22

@jimblandy

@teoxoy Have you profiled the effect of using regular expressions to parse numeric literals? You can get some big shaders from the sources given in .github/workflows/lazy.yml.

@kvark kvark left a comment

Thank you for improving the number parsing!

We are generally trying hard to keep the code simple and dependencies to the minimum. So involving regex should come with a good justification. I suppose in case of your PR the justification looks like this:

  1. it fixes a bunch of cases
  2. WGSL spec uses regular expressions already, so we can encode it more directly

If that sounds right to you, please consider building regexes in a non-lazy way.

Review thread on src/front/wgsl/lexer.rs (outdated, resolved)
@teoxoy teoxoy force-pushed the number-literal branch from fda411b to d725977 Compare May 2, 2022 15:03

teoxoy commented May 2, 2022

Regarding the perf impact, I've done some benchmarking using the criterion benchmarks we have in the repo and the regex approach seems to be around 60% slower.

Tbh I didn't expect the difference to be this large...

> We are generally trying hard to keep the code simple and dependencies to the minimum. So involving regex should come with a good justification. I suppose in case of your PR the justification looks like this:
>
>   1. it fixes a bunch of cases
>   2. WGSL spec uses regular expressions already, so we can encode it more directly

Indeed; however, the perf impact is concerning.

I'll experiment with the manual state machine we had before and see how it goes.

@jimblandy

@teoxoy One comment in my review (I'll finish it soon) was, I wonder if the regexps ending in (.*) to capture the rest might be scanning all the way to the end of the input, each time we parse a literal. But I see your most recent version has taken those out. Was the 60% impact measured with, or without, the (.*) rest matchers?

teoxoy commented May 2, 2022

The (.*) to match the rest was a huge slowdown (up to 10x slower than master) - I thought regex engines would optimize it out (as I did in the revised commit). I took it out as soon as I saw it. So yeah... 60% without it.
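
To illustrate why the trailing capture is unnecessary: the remainder of the input can be obtained by slicing at the end of the matched token, so only the token itself is scanned. The following is a hand-written sketch without regex; `consume_digits` is a hypothetical helper, not a function from this PR.

```rust
/// Hypothetical helper: split `input` into a leading run of ASCII digits
/// and the remainder. Returns `None` if `input` does not start with a digit.
fn consume_digits(input: &str) -> Option<(&str, &str)> {
    // Find the first byte that is not an ASCII digit; if there is none,
    // the token spans the whole input.
    let end = input
        .bytes()
        .position(|b| !b.is_ascii_digit())
        .unwrap_or(input.len());
    if end == 0 {
        return None;
    }
    // `end` is a valid char boundary because every byte before it is ASCII.
    // Splitting is O(1); the rest of the input is never scanned.
    Some(input.split_at(end))
}
```

With a regex, the analogous approach would be to take the end offset of the match and slice the input there, rather than capturing the rest with `(.*)`.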

Progress on the manual state machine: it's giving me a headache 😅.

Do you guys know of other ways that we could optimize the regex approach?

jimblandy commented May 2, 2022

> Do you guys know of other ways that we could optimize the regex approach?

tl;dr: I don't think so, unfortunately.

I think it would have to entail using some sort of build-time code generator. We've generally shied away from those because of the build-time impact, since that affects developers constantly (serde and thiserror being exceptions).

I've also noticed a tendency for projects to move away from code generators in favor of hand-written parsers. This is pretty disappointing to me. The reason seems to be that generators have to get a whole range of things Just Right to be competitive with hand-written code:

  • flexibility (real-world grammars are not well-behaved; remember having to write > > in C++ all the time?)
  • error messages (i.e. statically detectable problems)
  • debuggability (i.e. dynamically detectable problems)
  • runtime performance
  • speed of the generator itself
  • generated code size (rustc is slow)

Generators let you write your code at a higher level, but coders have this "that's hard, but I could do it" mindset that undervalues that. Generators generate more correct code, but the use of safe Rust makes this less crucial than it used to be.

@teoxoy teoxoy force-pushed the number-literal branch 2 times, most recently from 3441dd6 to ff6f54b Compare May 31, 2022 19:27

teoxoy commented May 31, 2022

I rewrote the number parser and removed the regex dependency. The perf impact of this PR is now in the range of -2% to -5% from my limited benchmarking. I think it should be good to go now.

On another note, we now only have 2 instances where we need to create a new String for hexf:

  • when the exponent is missing
  • when the period is missing

I think instead of pushing for a new API for hexf (i.e. from_parts) we could instead request to relax the grammar (more specifically here).
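
The two cases listed above can be sketched as follows. This is a minimal sketch under the assumption that the hex float parser requires both a period and an exponent in its input; the helper names are hypothetical, and the call into hexf itself is deliberately left out.

```rust
/// Hypothetical helper for a literal with a period but no exponent,
/// e.g. `0x1.8`. Appending `p0` scales by 2^0, so the value is unchanged.
fn add_missing_exponent(significand: &str) -> String {
    format!("{}p0", significand)
}

/// Hypothetical helper for a literal with an exponent but no period,
/// e.g. significand `0x18` and exponent `p3`. A period after the last
/// integer digit does not change the value.
fn add_missing_period(significand: &str, exponent: &str) -> String {
    format!("{}.{}", significand, exponent)
}
```

Each helper allocates one new `String`, which is the cost the comment above refers to; a relaxed grammar in the parser would let the original slice be passed through unchanged.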

@teoxoy teoxoy requested a review from kvark May 31, 2022 19:38

teoxoy commented May 31, 2022

ping @jimblandy (since it seems I can't re-request a review)

@jimblandy jimblandy left a comment

This is pretty intricate. Thanks for taking this on!

I have a few changes I'd like to see and then re-review.

Review thread on src/front/wgsl/number.rs (outdated, resolved)
Review thread on src/front/wgsl/number.rs (outdated, resolved)
Review thread on src/front/wgsl/number.rs (outdated, resolved)
Review thread on src/front/wgsl/number.rs (outdated, resolved)
Review thread on src/front/wgsl/number.rs (outdated, resolved)

teoxoy commented Jun 2, 2022

@jimblandy thanks for the review and suggestions - they were great! I addressed the comments, simplified some things further and added a few more docs (I know you like those 😉).

@teoxoy teoxoy requested a review from jimblandy June 2, 2022 12:18

@jimblandy jimblandy left a comment

This looks really nice. Just a few more points I'd like to see addressed.

Review thread on src/front/wgsl/number.rs (outdated, resolved)
Review thread on src/front/wgsl/number.rs (resolved)
Review thread on src/front/wgsl/number.rs (outdated, resolved)
Review thread on src/front/wgsl/number.rs (outdated, resolved)
Review thread on src/front/wgsl/mod.rs (outdated, resolved)
Review thread on src/front/wgsl/mod.rs (outdated, resolved)
Review thread on src/front/wgsl/mod.rs (resolved)
Review thread on src/front/wgsl/mod.rs (outdated, resolved)
Review thread on src/front/wgsl/number_literals.rs (outdated, resolved)
Review thread on src/front/wgsl/lexer.rs (resolved)
@teoxoy teoxoy requested a review from jimblandy June 8, 2022 09:26

@jimblandy jimblandy left a comment

Just the one remaining change (in a reply to your comment) about the return type of parse.

@jimblandy

@teoxoy Since I had misunderstood some bits about the behavior of consume_number in my previous comments, this time I tried to put my money where my mouth is and actually implement my suggested change. It seems to work fine. If you'd like to look it over, I can push it to your branch (thanks, GitHub!) and if it seems all right, we can squash, rebase, and merge. If you don't like it, then we can just merge the PR as-is.

teoxoy commented Jun 9, 2022

@jimblandy feel free to push the change; I think the reasoning for it makes sense 👍

Bring the lexer's parsing of numeric literals in line with the WGSL
specification as of 86a23b83 (2022-05-10).
@jimblandy jimblandy enabled auto-merge (rebase) June 10, 2022 17:32
@jimblandy jimblandy merged commit 53aa3e2 into gfx-rs:master Jun 10, 2022
@teoxoy teoxoy deleted the number-literal branch June 10, 2022 20:09
Development

Successfully merging this pull request may close these issues.

[wgsl-in] Numeric suffixes + unsanitised related error
3 participants