Performance tweaks #220

winfriedgerlach · 2024-11-27T06:33:43Z

Woodstox already features impressive performance optimizations, to which I would like to add small ones. I will submit in individual PRs, so we can discuss the changes separately.

I benchmarked with a JMH test using namespace-aware StAX-parsing with very little value extraction, other use-cases may show less significant improvements (but still benefit slightly).

Profilers identify WstxInputData.isNameStartChar(char) and WstxInputData.isNameStartChar(char) as hotspots. I suggest to make the common case (ASCII) slightly faster there by using a boolean lookup table. This takes ~256 bytes of extra memory, but I suppose that this a reasonable trade-off. I benchmarked with different JVM versions and results varied, but my JMH test showed improvements of e.g. ~3%, sometimes better.
A low hanging fruit is replacing StringBuffer with Stringbuilder. The latter is well-known to be faster due to lack of synchronization. If my analysis is correct, Stringbuffer is never used in a place where thread-safety is relevant, so it can safely be replaced. This has already been done in stax2-api, just a leftover in woodstox I guess.
Direct field access in (now private) inner class Bucket should be fine and faster than calling getters.

Approaches that did not work (maybe someone else tries with different results?):

Use Arrays.copyOf() or just .clone() (called by Arrays.copyOf() in newer Java versions) instead of System.arrayCopy() for cloning arrays: While theoretically .clone() should be fastest (also according to Joshua Bloch in a quite old statement), my microbenchmarking showed no significant difference, often System.arrayCopy() was even faster. But the code reads nicer (shorter, easier to understand), I have to admit. (more details see comment below)
Use bit mask trick ((1 << currToken) & MASK_GET_TEXT_XXX) also for masks with just 2 different types. This seems to perform worse than just type == A || type == B.
Attribute.getQName() seems to have optimization potential, but it's not a hotspot.

The text was updated successfully, but these errors were encountered:

…StartChar() for better performance

…nferior to System.arrayCopy() + cleanup

…pyOf

cowtowncoder · 2024-11-27T23:01:12Z

Correct, all use of StringBuffer is accidental/historical left-over: nothing in core parser/generator is thread-safe (minus some of SymbolTable reuse which is explicitly synchronized as necessary) -- they are not meant to be used from multiple threads.

Cool stuff, will go through PRs.

winfriedgerlach · 2024-11-29T12:37:55Z

More details regarding System.arrayCopy() vs. Arrays.copyOf() or .clone():

Very old comments (Joshua Bloch) say .clone() is the fastest way to clone an array. This is already challenged in some of the answers to that question. Most published benchmarks of "medium age" (e.g. some years ago) find that System.arrayCopy() is fastest.

Things seem to have changed quite a lot in the JDK recently (~beginning of 2023), so different Java versions may perform considerably differently:

Java 21 has improved Arrays.copyOf() performance, see e.g. https://bugs.openjdk.org/browse/JDK-8301958
Java 22 changed code again, https://bugs.openjdk.org/browse/JDK-8309665
2023 benchmark still sees System.arrayCopy() slightly ahead of .clone(), cf. https://bugs.openjdk.org/browse/JDK-8302315 and https://gist.github.com/schlosna/975e26965ec822ad42034b3ea2b08676

Other quite new (2024) benchmarks find no significant difference between Arrays.copy() and System.arrayCopy() (note the absence of the JVM version under test...):
https://www.baeldung.com/java-system-arraycopy-arrays-copyof-performance

My own micro (and a little less micro) benchmarks showed System.arrayCopy() ahead most of the times, sometimes on same level as clone()/Arrays.copyOf().

As Jackson currently supports every Java version from Java 8 to 23, I recon it is safer to stick with System.arrayCopy() for now when optimizing for performance, even if there is probably almost no performance difference when using the most current (=2024) JVM versions.

I definitely have to agree that the code is nicer with Arrays.copyOf()though, see cfe59fb .

winfriedgerlach · 2024-11-29T12:41:25Z

@cowtowncoder in StringVector.addString()/.addStrings() I stumbled over the array size being increased by 200%:

oldSize + (oldSize << 1)

This looked a bit odd to me, as in many other places in the source code you increase array size by 50% only, e.g. DataUtil.growArrayBy50Pct():

len + (len >> 1)

Is this intentional or maybe a typo (<< instead of >>) in StringVector?

cowtowncoder · 2024-11-29T23:38:01Z

Whoa! That sounds more like a bug indeed: unless some comment explicitly states otherwise, and I don't there is.

So it should be shift-right to get +50% increase.

winfriedgerlach added a commit to winfriedgerlach/woodstox that referenced this issue Nov 27, 2024

FasterXML#220 switch to lookup tables in hotspots isNameChar()/isName…

1c1293c

…StartChar() for better performance

winfriedgerlach added a commit to winfriedgerlach/woodstox that referenced this issue Nov 27, 2024

FasterXML#220 use StringBuilder instead of StringBuffer (+ some cleanup)

89210e9

winfriedgerlach added a commit to winfriedgerlach/woodstox that referenced this issue Nov 27, 2024

FasterXML#220 do *not* use Arrays.copyOf(), performance seems to be i…

deb2507

…nferior to System.arrayCopy() + cleanup

winfriedgerlach added a commit to winfriedgerlach/woodstox that referenced this issue Nov 27, 2024

FasterXML#220 avoid unnecessary getters in Bucket class

0aa4c84

cowtowncoder pushed a commit that referenced this issue Nov 27, 2024

#220 use StringBuilder instead of StringBuffer (+ some cleanup) (#222)

e8d8f36

cowtowncoder added a commit to winfriedgerlach/woodstox that referenced this issue Nov 27, 2024

Merge branch 'master' into feature/FasterXML#220-avoid-Bucket-getters

28b89bc

cowtowncoder added a commit to winfriedgerlach/woodstox that referenced this issue Nov 27, 2024

Merge branch 'master' into feature/FasterXML#220-LUT-for-name-chars

d079096

cowtowncoder added a commit to winfriedgerlach/woodstox that referenced this issue Nov 27, 2024

Merge branch 'master' into feature/FasterXML#220-do-not-use-Arrays.co…

556455e

…pyOf

cowtowncoder pushed a commit that referenced this issue Nov 27, 2024

#220 avoid unnecessary getters in Bucket class (#223)

012a512

cowtowncoder added a commit to winfriedgerlach/woodstox that referenced this issue Nov 27, 2024

Merge branch 'master' into feature/FasterXML#220-do-not-use-Arrays.co…

d2319fe

…pyOf

winfriedgerlach added a commit to winfriedgerlach/woodstox that referenced this issue Nov 28, 2024

FasterXML#220 use Arrays.copyOf() instead of System.arrayCopy()

cfe59fb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance tweaks #220

Performance tweaks #220

winfriedgerlach commented Nov 27, 2024 •

edited

Loading

cowtowncoder commented Nov 27, 2024

winfriedgerlach commented Nov 29, 2024 •

edited

Loading

winfriedgerlach commented Nov 29, 2024 •

edited

Loading

cowtowncoder commented Nov 29, 2024

Performance tweaks #220

Performance tweaks #220

Comments

winfriedgerlach commented Nov 27, 2024 • edited Loading

cowtowncoder commented Nov 27, 2024

winfriedgerlach commented Nov 29, 2024 • edited Loading

winfriedgerlach commented Nov 29, 2024 • edited Loading

cowtowncoder commented Nov 29, 2024

winfriedgerlach commented Nov 27, 2024 •

edited

Loading

winfriedgerlach commented Nov 29, 2024 •

edited

Loading

winfriedgerlach commented Nov 29, 2024 •

edited

Loading