Support unicode idents (matching rust) #444

ModProg · 2023-03-25T18:48:40Z

I've included my change in CHANGELOG.md

This not only changes the ident parsing, but also switches to &str instead of &[u8]
for the rest of the parsing (sometimes accessed as a byte slice through .bytes()).
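For illustration, here is a minimal sketch of what identifier parsing over `&str` could look like. It assumes Rust's identifier rule: an `XID_Start` character or `_`, followed by `XID_Continue` characters, with a lone `_` excluded. The standard library does not expose the XID properties, so `char::is_alphabetic` and `char::is_alphanumeric` serve here as an approximation. A real implementation would more likely use proper tables, for example from the `unicode-ident` crate. `parse_ident` and its helpers are hypothetical names and are not this PR's code.

```rust
/// Approximates Rust's `XID_Start` (plus `_`). std has no XID tables,
/// so `is_alphabetic` is an assumed stand-in, not an exact match.
fn is_ident_start(c: char) -> bool {
    c == '_' || c.is_alphabetic()
}

/// Approximates Rust's `XID_Continue` with `is_alphanumeric` plus `_`.
fn is_ident_continue(c: char) -> bool {
    c == '_' || c.is_alphanumeric()
}

/// Hypothetical helper: splits an identifier off the front of `input`,
/// returning `(ident, rest)`, or `None` if `input` does not start with one.
fn parse_ident(input: &str) -> Option<(&str, &str)> {
    let mut chars = input.char_indices();
    let (_, first) = chars.next()?;
    if !is_ident_start(first) {
        return None;
    }
    // Byte offset of the first non-identifier char (or end of input).
    // `char_indices` only yields offsets on char boundaries, so the
    // slicing below cannot panic.
    let end = chars
        .find(|&(_, c)| !is_ident_continue(c))
        .map_or(input.len(), |(i, _)| i);
    let ident = &input[..end];
    // A lone `_` is not an identifier in Rust.
    if ident == "_" {
        return None;
    }
    Some((ident, &input[end..]))
}
```

Because the offsets come from `char_indices`, multi-byte identifiers such as `größe` are split correctly without any manual byte arithmetic.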

If you prefer a more restricted implementation that only changes the
.*ident*() methods on Bytes, I can revert the other changes and use
from_utf8_unchecked instead.
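For comparison, the more restricted alternative could look roughly like the sketch below. The parser keeps its `&[u8]` input and only the ident method decodes UTF-8. This sketch uses the safe `std::str::from_utf8` and decodes only the longest valid prefix (via `Utf8Error::valid_up_to`). `from_utf8_unchecked` would skip that validation, but it is `unsafe` and only sound if the input is already known to be valid UTF-8. The name `ident_len` and the use of `is_alphabetic`/`is_alphanumeric` as stand-ins for the XID properties are assumptions.

```rust
/// Hypothetical byte-based helper: returns the byte length of the
/// identifier-like run at the start of `bytes`, or 0 if there is none.
/// (For brevity, a lone `_` is not rejected here.)
fn ident_len(bytes: &[u8]) -> usize {
    // Decode only the longest valid UTF-8 prefix. `valid_up_to` always
    // lands on a char boundary, so the second `from_utf8` cannot fail.
    let text = match std::str::from_utf8(bytes) {
        Ok(s) => s,
        Err(e) => std::str::from_utf8(&bytes[..e.valid_up_to()]).unwrap(),
    };
    let mut chars = text.char_indices();
    match chars.next() {
        // Approximation of XID_Start (std has no XID tables).
        Some((_, c)) if c == '_' || c.is_alphabetic() => chars
            // Approximation of XID_Continue.
            .find(|&(_, c)| !(c == '_' || c.is_alphanumeric()))
            .map_or(text.len(), |(i, _)| i),
        _ => 0,
    }
}
```

Note that this sketch validates all remaining bytes on every call, which is wasteful for large inputs. That cost is one argument for validating once up front, or for using `&str` throughout.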

However, since the benchmark showed this implementation to be more or less on
par with the current one, I wanted to at least share it:

ron-new is this PR's string-based implementation; ron is the current git
version.

```
Benchmarking Serde Deserialization/json/data/canada: Warming up for 3.0000 s
Warning: Unable to complete 10000 samples in 5.0s. You may wish to increase target time to 89.8s, or reduce sample count to 550.
Serde Deserialization/json/data/canada
                        time:   [8.9827 ms 8.9938 ms 9.0056 ms]
                        change: [-3.4193% -2.9076% -2.4310%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 860 outliers among 10000 measurements (8.60%)
  1 (0.01%) low mild
  493 (4.93%) high mild
  366 (3.66%) high severe
Benchmarking Serde Deserialization/ron/data/canada: Warming up for 3.0000 s
Warning: Unable to complete 10000 samples in 5.0s. You may wish to increase target time to 317.2s, or reduce sample count to 150.
Serde Deserialization/ron/data/canada
                        time:   [31.000 ms 31.028 ms 31.056 ms]
                        change: [-1.5615% -1.2462% -0.9532%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 521 outliers among 10000 measurements (5.21%)
  209 (2.09%) high mild
  312 (3.12%) high severe
Benchmarking Serde Deserialization/ron-new/data/canada: Warming up for 3.0000 s
Warning: Unable to complete 10000 samples in 5.0s. You may wish to increase target time to 320.1s, or reduce sample count to 150.
Serde Deserialization/ron-new/data/canada
                        time:   [30.619 ms 30.638 ms 30.658 ms]
                        change: [-2.9465% -2.7434% -2.5733%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 164 outliers among 10000 measurements (1.64%)
  63 (0.63%) high mild
  101 (1.01%) high severe
```

I haven't added any tests yet.

fixes #321


codecov-commenter · 2023-03-25T21:39:03Z

Codecov Report

Patch coverage: 86.23% and project coverage change: -2.88% ⚠️

Comparison is base (5a407f3) 86.73% compared to head (091f715) 83.85%.


Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master     #444      +/-   ##
==========================================
- Coverage   86.73%   83.85%   -2.88%     
==========================================
  Files          59       60       +1     
  Lines        7280     7561     +281     
==========================================
+ Hits         6314     6340      +26     
- Misses        966     1221     +255
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| tests/struct_integers.rs | `95.65% <ø> (-4.35%)` | ⬇️ |
| src/parse.rs | `68.40% <81.35%> (-24.00%)` | ⬇️ |
| tests/unicode.rs | `82.75% <82.35%> (-17.25%)` | ⬇️ |
| src/error.rs | `37.76% <88.00%> (-4.69%)` | ⬇️ |
| src/de/mod.rs | `74.26% <96.39%> (-2.67%)` | ⬇️ |
| src/de/tests.rs | `100.00% <100.00%> (ø)` | |
| src/extensions.rs | `100.00% <100.00%> (ø)` | |
| src/ser/mod.rs | `75.65% <100.00%> (+4.50%)` | ⬆️ |
| tests/321_unicode_ident.rs | `100.00% <100.00%> (ø)` | |
| tests/407_raw_value.rs | `100.00% <100.00%> (ø)` | |

... and 1 more


juntyr · 2023-03-25T21:40:07Z

@ModProg I've only had time to briefly skim your changes, but (formatting aside) they look good so far. Adding some tests would definitely be the next step, and I'll try to review the PR more thoroughly over the coming days. Thank you already for your work!

ModProg · 2023-03-25T22:39:31Z

Yes, def needs some cleanup.

juntyr · 2023-03-26T05:34:17Z

When you do the final cleanup, note that there are still some dbg!() macros left.

juntyr · 2023-03-27T18:04:08Z

Could you also add tests for the new crash cases you discovered to document which errors they should produce (and on which column)?

ModProg · 2023-03-27T18:32:52Z

> Could you also add tests for the new crash cases you discovered to document which errors they should produce (and on which column)?

The crashes were caused by a mistake I made with byte offsets; I checked the crash cases in by accident.
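This kind of byte-offset bug is easy to hit when moving from `&[u8]` to `&str`, because slicing a `&str` at an offset that is not a char boundary panics. The minimal sketch below shows one way to avoid it. It advances a cursor by whole characters using `char::len_utf8` and uses `str::get`, which returns `None` rather than panicking on a bad offset. `advance_char` is a hypothetical name and is not taken from the PR.

```rust
/// Hypothetical helper: skips the character starting at byte offset
/// `pos` in `input` and returns the new offset. Advancing by
/// `len_utf8` (rather than by one byte) keeps `pos` on a char boundary.
fn advance_char(input: &str, pos: usize) -> Option<usize> {
    // `get` yields `None` instead of panicking if `pos` is out of range
    // or falls inside a multi-byte character.
    let c = input.get(pos..)?.chars().next()?;
    Some(pos + c.len_utf8())
}
```

By contrast, `&s[1..]` on `"äb"` would panic, because byte 1 lies inside the two-byte encoding of `ä`.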


juntyr · 2023-03-28T08:39:17Z

I am wondering whether we could stop relying on bytes entirely and replace all "next byte" methods with "next character" methods. Would this be possible? What are your thoughts @ModProg?
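To make the question concrete, here is a minimal sketch of a cursor over `&str` that offers both kinds of methods. `Cursor`, `peek_byte`, and `peek_char` are hypothetical names. `consume_char` echoes a name from this PR's commit messages, but its signature here is an assumption. A byte peek is a plain load and is enough for ASCII punctuation. A char peek has to decode UTF-8, which may cost some performance in hot paths.

```rust
/// Hypothetical parser cursor over UTF-8 input.
struct Cursor<'a> {
    src: &'a str,
    pos: usize, // byte offset, always kept on a char boundary
}

impl<'a> Cursor<'a> {
    fn new(src: &'a str) -> Self {
        Cursor { src, pos: 0 }
    }

    /// The remaining (unparsed) input.
    fn rest(&self) -> &'a str {
        &self.src[self.pos..]
    }

    /// "Next byte": cheap, but only meaningful for ASCII tokens.
    fn peek_byte(&self) -> Option<u8> {
        self.rest().as_bytes().first().copied()
    }

    /// "Next character": decodes one UTF-8 scalar value.
    fn peek_char(&self) -> Option<char> {
        self.rest().chars().next()
    }

    /// Consumes `expected` if it is the next char, advancing by its
    /// UTF-8 length so that `pos` stays on a char boundary.
    fn consume_char(&mut self, expected: char) -> bool {
        if self.peek_char() == Some(expected) {
            self.pos += expected.len_utf8();
            true
        } else {
            false
        }
    }
}
```

For example, on the input `"é("` the byte peek returns `0xC3` (the first byte of `é`), while the char peek returns `'é'`.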

ModProg · 2023-03-28T15:54:06Z

> I am wondering whether we could stop relying on bytes entirely and replace all "next byte" methods with "next character" methods. Would this be possible? What are your thoughts @ModProg?

Probably. We should run the benchmark with that change.

ModProg · 2023-03-29T00:08:03Z

I just noticed that I made a stupid mistake while benchmarking and benchmarked the wrong versions... The performance impact of some of my changes is larger than I thought and I will have to check that tomorrow.

ModProg · 2023-03-29T00:08:57Z

> I just noticed that I made a stupid mistake while benchmarking and benchmarked the wrong versions... The performance impact of some of my changes is larger than I thought and I will have to check that tomorrow.

(In total, they ended up increasing the benchmark time by about 50%.)


juntyr · 2023-03-29T08:00:24Z

> I just noticed that I made a stupid mistake while benchmarking and benchmarked the wrong versions... The performance impact of some of my changes is larger than I thought and I will have to check that tomorrow.
>
> (In total, they ended up increasing the benchmark time by about 50%.)

That is significant... Perhaps once the code changes have stabilised a bit, we can look into where the perf hit comes from. So far your changes seem to make the code more readable, which would be a counter-benefit.

ModProg · 2023-03-29T12:31:18Z

Performance is still 30% worse.

ModProg · 2023-03-30T08:58:00Z

@juntyr I got performance back to where it was before this PR, but it would maybe be best if someone could verify.

juntyr · 2023-03-30T09:05:02Z

> @juntyr I got performance back to where it was before this PR, but it would maybe be best if someone could verify.

Ok, that sounds wonderful! I'll have a deeper look sometime later, check the performance, and see if I can find anything else :)

juntyr · 2023-04-02T10:10:14Z

@ModProg Ok, I've been doing a bit of toying around with the changes. First, I just did a bit of cleanup with clippy and extended the docs. I also tested removing all possible byte strings from parsing, please feel free to disregard those changes. You can find my experiments here:
juntyr@c7e428b

I also tested the other side of ron, serialisation. Previously we also used byte strings heavily there and just hoped that it would all come out as UTF8 on the other end. I've tested switching it to UTF8 without breaking the existing API surface in:
juntyr@c6c0997
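One common way to emit UTF-8 text while keeping an `io::Write`-based public API is to write through `std::fmt::Write` internally and bridge to the `io::Write` sink with a small adapter. The sketch below illustrates that general pattern and is an assumption on our part. `IoFmtAdapter` and `to_writer` are hypothetical names, not code from ron or the linked commit.

```rust
use std::fmt;
use std::io;

/// Hypothetical adapter: lets code that produces `&str` (via
/// `fmt::Write`) write into any `io::Write` sink. Since only `&str`
/// is accepted, everything written is valid UTF-8 by construction.
struct IoFmtAdapter<W: io::Write> {
    inner: W,
    // `fmt::Error` carries no details, so the real I/O error is kept here.
    error: Option<io::Error>,
}

impl<W: io::Write> fmt::Write for IoFmtAdapter<W> {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        self.inner.write_all(s.as_bytes()).map_err(|e| {
            self.error = Some(e);
            fmt::Error
        })
    }
}

/// Hypothetical entry point: keeps an `io::Write` API while the
/// serializing code itself only deals with `&str`.
fn to_writer<W: io::Write>(writer: W, ident: &str, value: i32) -> io::Result<W> {
    use std::fmt::Write as _;
    let mut adapter = IoFmtAdapter { inner: writer, error: None };
    if write!(adapter, "{}: {}", ident, value).is_err() {
        // Surface the stored I/O error, or a generic one if none was stored.
        return Err(adapter
            .error
            .take()
            .unwrap_or_else(|| io::Error::new(io::ErrorKind::Other, "formatting error")));
    }
    Ok(adapter.inner)
}
```

For example, `to_writer(Vec::new(), "größe", 3)` yields the bytes of `"größe: 3"`, which round-trip through `String::from_utf8` without loss.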

I'm quite happy with this PR! If you include the clippy fixes, we can definitely land the existing changes and add onto them with UTF8 serialisation and a better Value for ints in follow-up commits.

ModProg · 2023-04-02T22:13:46Z

> I also tested removing all possible byte strings from parsing, please feel free to disregard those changes. You can find my experiments here:
> juntyr@c7e428b

Those seem to add about 30% to the benchmark time on my machine (from 29 ms to 38 ms).

juntyr · 2023-04-05T19:44:26Z

(sorry for the long review delay, I'm finishing up my second thesis atm and it will occupy me for a few more days)

ModProg · 2023-04-06T13:43:46Z

> (sorry for the long review delay, I'm finishing up my second thesis atm and it will occupy me for a few more days)

no worries, I am in no rush.

juntyr · 2023-08-24T01:03:11Z

@ModProg I am really sorry that it has taken me so long to get back to this PR, life has been happening.

I want to get this PR in before v0.9, as it is the last big change I intend to include in that. I will get started with rebasing the PR (on top of #438) over the coming days and will try to get it merged soon.

I am slowly trying to make ron even more Rusty (e.g. with Rusty byte strings in #438 and typed number suffixes and underscores in floats in #481), and this PR is an important step in that direction. Thank you so much for working on it!

juntyr · 2023-09-03T06:54:17Z

#488 has now landed, which supersedes this PR

Support unicode idents (matching rust)

5e3d049

juntyr reviewed Mar 25, 2023

src/parse.rs (outdated)

juntyr reviewed Mar 25, 2023

src/parse.rs (outdated)

juntyr assigned ModProg Mar 25, 2023

juntyr self-requested a review March 25, 2023 21:41

cleanup

b8dc822

ModProg force-pushed the unicode-ident branch from de2f593 to a780aff, March 27, 2023 19:39

add test

2a64c10

ModProg force-pushed the unicode-ident branch from a780aff to 2a64c10, March 27, 2023 19:51

ModProg commented Mar 27, 2023

src/parse.rs (outdated)

juntyr reviewed Mar 28, 2023

src/parse.rs (outdated)

juntyr reviewed Mar 28, 2023

src/parse.rs

juntyr reviewed Mar 28, 2023

src/parse.rs (outdated)

ModProg added 2 commits March 29, 2023 01:50

removing many bytes

841edaf

wip

7bf607a

juntyr reviewed Mar 29, 2023

src/de/mod.rs (outdated)

juntyr reviewed Mar 29, 2023

src/parse.rs (outdated)

remove

371cab7

ModProg added 2 commits March 29, 2023 14:59

remove dbg

102e0ba

performance back where it was

105492a

remove comment

17eacc5

ModProg added 4 commits March 30, 2023 11:10

replace consume_str with consume_char where possible

0ef12ac

remove char boundry error

e45e58f

fix tests for features

95853a9

remove comments

62d74eb

Small improvements to the PR

f8b3fa8

fix fuzz

091f715

juntyr mentioned this pull request Jul 16, 2023

Add benchmarking using arbitrary fuzzing #465

Merged


juntyr mentioned this pull request Aug 25, 2023

Add full UTF-8 support in RON incl. unicode identifiers #488

Merged


juntyr marked this pull request as draft August 25, 2023 22:38

juntyr closed this Sep 3, 2023
