-
-
Notifications
You must be signed in to change notification settings - Fork 904
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use glibc strlen to speed up xmlStrlen #2144
Conversation
@ilyazub Thanks for submitting this! The CI failures aren't because of this PR, it's because your base commit is from before I merged some packaging changes. I'm updating CI now and it should go green shortly. |
@ilyazub OK, there's something strange going on with Concourse's PR resource and I think the easiest thing to do will be to have you rebase this PR onto master. (I'd do it myself by don't have permissions on your branch.) Would you mind rebasing? |
Also please see my note at https://gitlab.gnome.org/GNOME/libxml2/-/issues/212 -- I'm not detecting any noticeable performance improvement by introducing this change. Can you help me write a benchmark that will demonstrate that? Here's a benchmark skeleton:
and you run it like:
and it will output something like:
Is that helpful for you to generate some benchmark results about this change? |
8cc9381
to
34ce1ae
Compare
Code Climate has analyzed commit 7ffd48a and detected 0 issues on this pull request. The test coverage on the diff in this pull request is 100.0% (80% is the threshold). This pull request will bring the total coverage in the repository to 94.1% (0.0% change). View more on Code Climate. |
@flavorjones Yes, the benchmark skeleton is helpful for me. I have only added Benchmark resultsBefore the patch
After the patch
10% speedup with a patch. I'm not sure why it runs faster with the patch. The flame graph of the sample program from the original post looks the same for nokogiri on Flame graph of the Nokogiri programFlame graph focused on Flame graph of the entire program Flamegraph of libxml2 benchmark in C with 1 iterationBefore the patch, Overall program Focused on the first In both cases, I'm looking into |
Thanks for the further research. Let's talk for a moment about how to pitch this change upstream -- making this change in libxml2 is the ideal outcome I'd like to drive towards. I think it's confusing to talk to Nick about Nokogiri and CSS; but I think if we emphasize your real-world use case and the XPath queries being implemented in C, your data above (10% speedup) demonstrates a good case. Can I ask that you take the C benchmark and the timing results to that upstream issue to continue the conversation with Nick? |
Edit. Added benchmark, timing results, and flame graphs to the upstream issue. |
Benchmark results: xmlStrlen (entire HTML file): 926171.936981 μs glibc_xmlStrlen (entire HTML file): 36905.903992 μs delta (xmlStrlen ÷ glibc_xmlStrlen): 25.094584 times xmlStrlen (average string): 57479.204010 μs glibc_xmlStrlen (average string): 5802.069000 μs delta (xmlStrlen ÷ glibc_xmlStrlen): 9.905937 times xmlStrlen (bigger string): 388056.315979 μs glibc_xmlStrlen (bigger string): 12797.856995 μs delta (xmlStrlen ÷ glibc_xmlStrlen): 30.318382 times xmlStrlen (smallest string): 15870.046021 μs glibc_xmlStrlen (smallest string): 6282.208984 μs delta (xmlStrlen ÷ glibc_xmlStrlen): 2.527903 times See https://gitlab.gnome.org/GNOME/libxml2/-/issues/212 for reference.
34ce1ae
to
7ffd48a
Compare
@ilyazub Merged! Will be in v1.11.0.rc4. Thanks for your contribution! |
What problem is this PR intended to solve?
This PR speeds up Nokogiri by speeding up
xmlStrlen
. It'd be better to reduce the amount ofxmlStrlen
calls from Nokogiri, but this patch is an easy win.Our parsers sped up from 1.4 seconds to 1 second on big HTML documents with this patch.
xmlStrlen
benchmark results:Benchmark code
big_html.zip
Makefile
Sample program that leads to a lot of
xmlStrlen
calls and summary of CPU profiles of its executionI can't share the entire parser because it's not open source. Our usage is similar except that instead of ten same searches we have multiple
doc.css
anddoc.at_css
calls with different selectors. CSS selectors indoc.css
anddoc.at_css
are translated to XPath by Nokogiri internally.big_html.zip
Command to execute the sample program with
gperftools
.CPU profile before usage of
strlen
inxmlStrlen
.CPU profile after usage of
strlen
inxmlStrlen
.Here are callgrind-style graphs of CPU profiles of the sample program before and after
xmlStrlen
patch: before.svg, after.svg.Have you included adequate test coverage?
No.
Does this change affect the behavior of either the C or the Java implementations?
No. Only the performance of C implementation.
See https://gitlab.gnome.org/GNOME/libxml2/-/issues/212 and #2133 for reference.