Support unicode idents (matching rust) #444
Codecov Report (coverage diff, master → #444):
Coverage: 86.73% → 83.85% (-2.88%)
Files: 59 → 60 (+1)
Lines: 7280 → 7561 (+281)
Hits: 6314 → 6340 (+26)
Misses: 966 → 1221 (+255)
@ModProg I've only had time to briefly skim your changes so far, but (formatting aside) they look good. Adding some tests would definitely be the next step, and I'll try to review the PR more thoroughly over the coming days. Thank you already for your work!
Yes, it definitely needs some cleanup.
Once you get through the final cleanup, there are also still some
Could you also add tests for the new crash cases you discovered, to document which errors they should produce (and on which column)?
The crashes were caused by a mistake I made with byte offsets; I checked those crash cases in by accident.
I am wondering whether we could stop relying on bytes entirely and replace all "next byte" methods with "next character" methods. Would this be possible? What are your thoughts, @ModProg?
Probably, but we should run the benchmark with that change.
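To make the "next byte" vs. "next character" distinction concrete, here is a minimal sketch of a cursor offering both styles. `Cursor`, `peek_byte`, `peek_char` and `next_char` are hypothetical names, not ron's actual `Bytes` API. The point it illustrates is that a char-based method advances by `char::len_utf8()` (1 to 4 bytes), so the position can never land inside a multi-byte character.

```rust
/// Hypothetical cursor over parser input; not ron's actual `Bytes` type.
struct Cursor<'a> {
    src: &'a str,
    /// Byte offset into `src`; the char-based methods keep it on a char boundary.
    pos: usize,
}

impl<'a> Cursor<'a> {
    fn new(src: &'a str) -> Self {
        Cursor { src, pos: 0 }
    }

    /// "Next byte" style: may return a byte from the middle of a multi-byte char.
    fn peek_byte(&self) -> Option<u8> {
        self.src.as_bytes().get(self.pos).copied()
    }

    /// "Next character" style: decodes one full Unicode scalar value.
    fn peek_char(&self) -> Option<char> {
        self.src[self.pos..].chars().next()
    }

    /// Consumes one char and advances `pos` by its UTF-8 length (1 to 4 bytes).
    fn next_char(&mut self) -> Option<char> {
        let c = self.peek_char()?;
        self.pos += c.len_utf8();
        Some(c)
    }
}
```

Decoding a `char` costs a little more than reading a single byte, so a change in this direction is worth benchmarking.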
I just noticed that I made a stupid mistake while benchmarking and benchmarked the wrong versions... The performance impact of some of my changes is larger than I thought, and I will have to look into that tomorrow.
(In total, they ended up increasing the benchmark time by about 50%.)
That is significant... Perhaps once the code changes have stabilised a bit, we can look into where the performance hit comes from. So far, your changes seem to make the code more readable, which would be a benefit to weigh against it.
Performance is still 30% worse.
@juntyr I got performance back to where it was before this PR, but it would be best if someone could verify that.
Ok, that sounds wonderful! I'll have a deeper look later, check the performance, and see if I can find anything else :)
@ModProg Ok, I've been toying around with the changes a bit. First, I did some cleanup with clippy and extended the docs. I also tested removing as many byte strings as possible from parsing; please feel free to disregard those changes. You can find my experiments here: I also tested the other side of ron, serialisation. Previously, we also used byte strings heavily there and just hoped that everything would come out as UTF-8 on the other end. I've tested switching it to UTF-8 without breaking the existing API surface in: I'm quite happy with this PR! If you include the clippy fixes, we can definitely land the existing changes and add onto them with UTF-8 serialisation and a better
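A minimal sketch of one way to guarantee UTF-8 output while still accepting an `io::Write` sink, assuming an adapter approach (this is not necessarily what the linked experiment does). The serializer writes only through `fmt::Write`, whose `write_str` takes `&str`, so invalid UTF-8 cannot be produced. `Utf8Writer` and `write_field` are hypothetical names.

```rust
use std::fmt::{self, Write as _};
use std::io;

/// Hypothetical adapter: exposes any `io::Write` sink as a `fmt::Write`.
/// Because `fmt::Write::write_str` only accepts `&str`, everything written
/// through it is valid UTF-8 by construction.
struct Utf8Writer<W: io::Write> {
    inner: W,
    /// `fmt::Error` carries no details, so the underlying I/O error is kept here.
    error: Option<io::Error>,
}

impl<W: io::Write> Utf8Writer<W> {
    fn new(inner: W) -> Self {
        Utf8Writer { inner, error: None }
    }
}

impl<W: io::Write> fmt::Write for Utf8Writer<W> {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        self.inner.write_all(s.as_bytes()).map_err(|e| {
            self.error = Some(e);
            fmt::Error
        })
    }
}

/// Hypothetical serializer step: emits `name: "value"` from `&str` input only.
fn write_field<W: fmt::Write>(out: &mut W, name: &str, value: &str) -> fmt::Result {
    write!(out, "{}: \"{}\"", name, value)
}
```

The public entry point can keep taking an `io::Write`, while the internals only ever see `&str`.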
Those seem to add about 30% to the benchmark time on my machine (from 29 ms to 38 ms).
(Sorry for the long review delay; I'm finishing up my second thesis at the moment, and it will occupy me for a few more days.)
No worries, I'm in no rush.
@ModProg I am really sorry that it has taken me so long to get back to this PR; life has been happening. I want to get this PR in before v0.9, as it is the last big change I intend to include in that release. I will start rebasing the PR (on top of #438) over the coming days and will try to get it merged soon. I am slowly trying to make ron even more Rusty (e.g. with Rusty byte strings in #438, and typed number suffixes and underscores in floats in #481), and this PR is an important step in that direction. Thank you so much for working on it!
#488, which supersedes this PR, has now landed.
This not only changes the ident parsing, but also uses `&str` instead of `&[u8]` for the rest of the parsing (sometimes accessed as bytes through `.bytes()`). If you prefer a more restricted implementation that only changes the `.*ident*()` methods on `Bytes`, I can revert the other changes and use `from_utf8_unchecked` instead.

But as I saw that the benchmark using this implementation was more or less on par with the current implementation, I wanted to share it at least:
`ron-new` is this PR's string-based implementation; `ron` is the current git version.
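The trade-off between the restricted alternative (keep `&[u8]` and convert identifier spans with `from_utf8_unchecked`) and the `&str`-based approach can be sketched as follows. The helper names are hypothetical; `std::str::from_utf8`, `std::str::from_utf8_unchecked` and `str::get` are standard-library functions.

```rust
use std::str;

/// Hypothetical byte-based helper: re-validates the span when converting back.
/// Returns `None` if the range is out of bounds or cuts a multi-byte char.
fn span_checked(bytes: &[u8], start: usize, end: usize) -> Option<&str> {
    bytes.get(start..end).and_then(|s| str::from_utf8(s).ok())
}

/// Hypothetical byte-based helper that skips validation.
/// Safety: the caller must guarantee that `bytes[start..end]` is valid UTF-8.
unsafe fn span_unchecked(bytes: &[u8], start: usize, end: usize) -> &str {
    // SAFETY: upheld by the caller, see the function contract above.
    unsafe { str::from_utf8_unchecked(&bytes[start..end]) }
}

/// `&str`-based helper: `str::get` returns `None` on a non-char-boundary range,
/// so invalid UTF-8 can never be produced; `.bytes()` remains available for
/// cheap byte-level peeking where needed.
fn span_str(src: &str, start: usize, end: usize) -> Option<&str> {
    src.get(start..end)
}
```

With `&str` input the validity guarantee comes from the type itself, so no `unsafe` is needed to hand identifier slices back to the caller.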
I haven't added any tests yet.
fixes #321
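For reference, Rust's identifier grammar is `XID_Start XID_Continue*`, or `_` followed by one or more `XID_Continue` characters (so a lone `_` is not an identifier). The standard library does not expose the XID properties, so the sketch below approximates them with `char::is_alphabetic` and `char::is_alphanumeric`. It is an illustration under that assumption, not the PR's implementation; a real implementation would more likely use a crate such as `unicode-ident` (`is_xid_start` / `is_xid_continue`). NFC normalisation is not checked either.

```rust
/// Approximates XID_Start with the Unicode Alphabetic property (not identical).
fn is_ident_start(c: char) -> bool {
    c.is_alphabetic()
}

/// Approximates XID_Continue with '_' plus Alphabetic/Numeric (not identical,
/// e.g. combining marks are handled differently).
fn is_ident_continue(c: char) -> bool {
    c == '_' || c.is_alphanumeric()
}

/// Returns true if `s` is an identifier under the approximation above:
///   XID_Start XID_Continue*  |  '_' XID_Continue+
fn is_ident(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        // A leading '_' must be followed by at least one continue char.
        Some('_') => {
            let rest = chars.as_str();
            !rest.is_empty() && rest.chars().all(is_ident_continue)
        }
        Some(c) if is_ident_start(c) => chars.all(is_ident_continue),
        _ => false,
    }
}
```

Under this sketch, identifiers such as `größe` or `名前` are accepted, while `_`, `1abc` and `a-b` are rejected, mirroring how rustc treats them.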