-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pep440: some minor refactoring, mostly around error types #780
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
konstin
approved these changes
Jan 4, 2024
276df67
to
9d982fa
Compare
I'll like expand this comment a little bit in subsequent commits, but I felt like this was useful to put in the beginning of this patch series to "set the stage." That is, this data will motivate some of the optimizations we'll do for version parsing, version representation and comparisons.
This moves LocalSegment and PreRelease after Version, and also keeps their impl blocks grouped together. No code is changed here.
It turns out that the current parser permits non-ASCII whitespace in places. We can be sneaky here to cause the existing implementation to produce incorrect offsets. The core issue here is that `unicode-width` is used to compute codepoint offsets, but its actual purpose is to compute the *visual width* of a codepoint that has been rendered. Some codepoints use more than 1 unit of visual width. While this establishes a mismatch between the implementation and the documented behavior of `Pep440Error`, we will also address this by switching to byte offsets instead of codepoint offsets. (Codepoint offsets are almost never what you want.)
We still use unicode-width, but only when rendering the error message. We also add the error message itself to the Display impl.
This error type was only be used when one attempted to parse a string as a `VersionSpecifiers`. We'll want to introduce more structured error types for parsing `Version` and `VersionSpecifier` as well, so renaming the error type helps make room for that. We also make the error type opaque. Nothing (in puffin at least) seemed to need its internals. We can always add accessor methods in the future if something else needs it. It's overall rare to need to expose the entire internal representation of an error type.
Since it's specifier to version specifier parsing, it makes sense to have it live with the corresponding type. Putting it in lib.rs I think makes it seem like an error type that applies to other things as well.
This smoothes out the error handling to using something a bit more structured. I did this because I intend to do the same for Version (eventually), and it seems good to be consistent. It also lets us nest errors a bit more easily and scrutinize the different error classes at a glance.
This makes the top-level error type small, just as the others we added in the previous commit are. This was specifically flagged by Clippy.
9d982fa
to
f619012
Compare
Prior to the refactoring in previous commits, tests were generally written against the rendered error message instead of the structured message. This was in part because many of the error types themselves were just a `String`, but also to explicitly test what an end user actually sees. That's valuable. While we keep most of the tests as rewritten to target the new structured error representation, we add some new tests that captures the value of testing the messages than humans will actually see.
f619012
to
63da3d1
Compare
BurntSushi
added a commit
that referenced
this pull request
Jan 5, 2024
This PR builds on #780 by making both version parsing faster, and perhaps more importantly, making version comparisons much faster. Overall, these changes result in a considerable improvement for the `boto3.in` workload. Here's the status quo: ``` $ time puffin pip-compile --no-build --cache-dir ~/astral/tmp/cache/ -o /dev/null ./scripts/requirements/boto3.in Resolved 31 packages in 34.56s real 34.579 user 34.004 sys 0.413 maxmem 2867 MB faults 0 ``` And now with this PR: ``` $ time puffin pip-compile --no-build --cache-dir ~/astral/tmp/cache/ -o /dev/null ./scripts/requirements/boto3.in Resolved 31 packages in 9.20s real 9.218 user 8.919 sys 0.165 maxmem 463 MB faults 0 ``` This particular workload gets stuck in pubgrub doing resolution, and thus benefits mightily from a faster `Version::cmp` routine. With that said, this change does also help a fair bit with "normal" runs: ``` $ hyperfine -w10 \ "puffin-base pip-compile --cache-dir ~/astral/tmp/cache/ -o /dev/null ./scripts/benchmarks/requirements.in" \ "puffin-cmparc pip-compile --cache-dir ~/astral/tmp/cache/ -o /dev/null ./scripts/benchmarks/requirements.in" Benchmark 1: puffin-base pip-compile --cache-dir ~/astral/tmp/cache/ -o /dev/null ./scripts/benchmarks/requirements.in Time (mean ± σ): 337.5 ms ± 3.9 ms [User: 310.5 ms, System: 73.2 ms] Range (min … max): 333.6 ms … 343.4 ms 10 runs Benchmark 2: puffin-cmparc pip-compile --cache-dir ~/astral/tmp/cache/ -o /dev/null ./scripts/benchmarks/requirements.in Time (mean ± σ): 189.8 ms ± 3.0 ms [User: 168.1 ms, System: 78.4 ms] Range (min … max): 185.0 ms … 196.2 ms 15 runs Summary puffin-cmparc pip-compile --cache-dir ~/astral/tmp/cache/ -o /dev/null ./scripts/benchmarks/requirements.in ran 1.78 ± 0.03 times faster than puffin-base pip-compile --cache-dir ~/astral/tmp/cache/ -o /dev/null ./scripts/benchmarks/requirements.in ``` There is perhaps some future work here (detailed in the commit messages), but I suspect it would be more fruitful to explore ways of making resolution itself and/or deserialization faster. Fixes #373, Closes #396
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR does a bit of refactoring to the pep440 crate, and in
particular around the erorr types. This PR is meant to be a precursor
to another PR that does some surgery (both in parsing and in
Version
representation) that benefits somewhat from this refactoring.
As usual, please review commit-by-commit.