Backend NML Parser: Performance Improvements #4872
Merged
I had wrongly assumed that the bulk of the NML upload cost was in validateTrees.
Analysis with a 350 MB real-world NML yielded:
This PR tackles two properties of parseTrees:
(1) Accessing XMLNode attributes via the shortcut method \ is not very efficient; a new helper method getSingleAttribute replaces it (speedup factor ×2–4).
(2) The inefficient handling of comments and branchpoints is replaced by a hashmap lookup (speedup factor ×10–15 for this particular NML, which notably has 80k node comments).
So the parse trees line after this change reads with a speedup factor of ×36, with the others unchanged.
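To illustrate the idea behind (2), here is a minimal sketch of replacing a per-node scan over all comments with a single index built up front. The case classes Comment and Node and the object CommentLookup are hypothetical stand-ins, not the types used in the parser, and the assumption that the old code scanned the full comment list for every node is mine; the PR only states that the handling was inefficient.

```scala
// Hypothetical, simplified stand-ins for parsed NML elements (not the real parser types).
final case class Comment(nodeId: Int, content: String)
final case class Node(id: Int)

object CommentLookup {
  // Assumed "before" shape: scans every comment once per node, O(nodes * comments).
  def commentsByScan(nodes: Seq[Node], comments: Seq[Comment]): Map[Int, Seq[String]] =
    nodes.map(n => n.id -> comments.filter(_.nodeId == n.id).map(_.content)).toMap

  // Hashmap variant: index the comments by node id once, then do one lookup per node.
  def commentsByIndex(nodes: Seq[Node], comments: Seq[Comment]): Map[Int, Seq[String]] = {
    val byNodeId: Map[Int, Seq[String]] =
      comments.groupBy(_.nodeId).map { case (id, cs) => id -> cs.map(_.content) }
    nodes.map(n => n.id -> byNodeId.getOrElse(n.id, Seq.empty)).toMap
  }
}
```

Both functions return the same result; only the second keeps the cost roughly linear in the number of nodes plus comments, which is where a file with many node comments would be expected to benefit most.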
Other changes:
TODO
Use getSingleAttribute in remaining places (won’t bring much more speedup, just for consistency)
Steps to test: