speed up CSS class queries #2137
Merged
available as `nokogiri-builtin:css-class` Part of #2135
and consolidated the XPathVisitor tests from css/test_parser.rb into css/test_xpath_visitor.rb so we're testing xpath visitor behavior in one place.
This is consistent with the class selector ".", and should be faster because it's doing less string manipulation in the XPath query.
This allows us to continue to generate XPath that uses standard XPath functions, but also allows us to inject the preference to use the optimized builtin implementation. This also leaves open a path to extracting Nokogiri's CSS parser into a separate gem, and allowing Nokogiri to inject the builtin via a custom XPathVisitor class.
Notably, on libxml2 CSS class queries are now ~2x faster using the `nokogiri-builtin:css-class` function. Closes #2135
Code Climate has analyzed commit 5d0b7fe and detected 2 issues on this pull request. Here's the issue category breakdown:
The test coverage on the diff in this pull request is 92.0% (80% is the threshold). This pull request will bring the total coverage in the repository to 94.2% (0.0% change).
What problem is this PR intended to solve?
This PR implements native C and Java XPath functions to look for CSS class names in a string in order to speed up CSS class selectors.
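As a rough illustration of what such a function has to do, here is a minimal Ruby sketch. The implementations in this PR are in C and Java, so this is not the PR's code. `css_class_match?` and `css_class_match_via_strings?` are hypothetical names, not Nokogiri APIs, and the separator set (space, tab, CR, LF) is an assumption. The first method scans for a whole token without building intermediate strings; the second mimics a normalize-then-concatenate approach for comparison.

```ruby
# Assumed set of class-name separators for this sketch:
# space, tab, carriage return, and line feed.
CSS_WHITESPACE = " \t\r\n"

# Hypothetical helper (not Nokogiri's API): true when +needle+ appears
# as a whole whitespace-delimited token in +haystack+.
def css_class_match?(haystack, needle)
  return false if haystack.nil? || needle.nil? || needle.empty?

  pos = 0
  while (idx = haystack.index(needle, pos))
    # The match must start at the string start or right after whitespace...
    before_ok = idx.zero? || CSS_WHITESPACE.include?(haystack[idx - 1])
    # ...and end at the string end or right before whitespace.
    after = idx + needle.length
    after_ok = after == haystack.length || CSS_WHITESPACE.include?(haystack[after])
    return true if before_ok && after_ok

    pos = idx + 1
  end
  false
end

# Comparison: collapse whitespace, pad with spaces, then search.
# Every step here allocates a new string.
def css_class_match_via_strings?(haystack, needle)
  normalized = haystack.to_s.split(/[ \t\r\n]+/).reject(&:empty?).join(" ")
  " #{normalized} ".include?(" #{needle} ")
end
```

Both methods should agree on inputs such as `"big\tred\nbox"`, where the class names are separated by something other than a plain space.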
Currently (v1.10.10), Nokogiri turns the CSS query `.red` into an XPath query that does a lot of string manipulation under the hood:

- `normalize-space` creates a new string buffer and assembles it one byte at a time, cleaning up whitespace along the way (`xpath.c:xmlXPathNormalizeFunction`), then strdups that string
- `concat` is pretty expensive, allocating new strings and repeatedly calling `strlen` and `strdup`

The native implementations gave mixed results. On CRuby, it's about 2x faster; but on JRuby it's slower. Here's a benchmark script:
CRuby/libxml2:
versus JRuby/Xerces:
It's unclear to me why this is; the benchmark tries to bust any caching that might be getting done. If anybody can tell me why that is, I'd appreciate the help. But in the meantime the Xerces XPath implementation is compellingly faster than the native Java version, and so we should prefer it to the native builtin function.
To support having variations on the XPath generated from the same CSS query, this PR also extracts the optimization decision into a single method, `XPathVisitor#css_class`, and introduces new subclasses `XPathVisitorAlwaysUseBuiltins` (which will prefer the builtin to the xpath function) and `XPathVisitorOptimallyUseBuiltins` (which will generate the XPath query that will run the fastest on the user's platform).

This PR also makes some non-functional changes:
- `.` and the attribute selector `~=` now share the same logic and generate the same XPath (which is an improvement for `~=`, which was doing some unnecessary string concatenation)

Finally, this PR also fixes a long-standing bug with the `~=` operator (related to #854) which was incorrectly limiting its consideration of whitespace to `0x20` when it should have been also treating `\t`, `\r`, and `\n` as whitespace.

Have you included adequate test coverage?
You bet.
Does this change affect the behavior of either the C or the Java implementations?
It's notable that the XPath generated by default will differ because of the performance optimizations being made; however, there are no functional changes outside of this performance improvement on CRuby/libxml2.
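The way the generated XPath can vary by visitor, as described above, can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: the `Sketch*` class names are hypothetical stand-ins for the visitor classes named earlier, the exact spacing and argument order of the emitted XPath strings are assumed, and platform detection is reduced to an injectable `jruby:` flag defaulting to a `RUBY_ENGINE` check.

```ruby
# Hypothetical stand-in for the base visitor: emits only standard
# XPath 1.0 functions for a CSS class match.
class SketchXPathVisitor
  def css_class(hay, needle)
    "contains(concat(' ', normalize-space(#{hay}), ' '), ' #{needle} ')"
  end
end

# Hypothetical stand-in for a visitor that always prefers the builtin.
# The function name comes from the PR; its argument shape is assumed.
class SketchAlwaysUseBuiltins < SketchXPathVisitor
  def css_class(hay, needle)
    "nokogiri-builtin:css-class(#{hay}, '#{needle}')"
  end
end

# Hypothetical stand-in for a visitor that picks whichever form is
# expected to be faster: standard XPath on JRuby, the builtin elsewhere.
class SketchOptimallyUseBuiltins < SketchXPathVisitor
  def initialize(jruby: RUBY_ENGINE == "jruby")
    @jruby = jruby
  end

  def css_class(hay, needle)
    if @jruby
      super # fall back to the standard XPath functions
    else
      "nokogiri-builtin:css-class(#{hay}, '#{needle}')"
    end
  end
end

# Example: the same CSS class query rendered by each visitor.
[SketchXPathVisitor.new, SketchAlwaysUseBuiltins.new, SketchOptimallyUseBuiltins.new].each do |visitor|
  puts "#{visitor.class}: #{visitor.css_class('@class', 'red')}"
end
```

Keeping the decision inside one overridable method means the rest of the CSS-to-XPath translation does not need to know which form is in use.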