Improve schema compiler performance #1396

Merged
merged 1 commit into from
Dec 24, 2024

@mbeckerle mbeckerle commented Dec 20, 2024

With this change, schema compilation for
Link16 drops from 2+ minutes to 18 seconds.

Added instrumentation to the compiler, which
shows that most of the time (by far, 80+
percent) is spent in Xerces validating each
schema file individually. This is unnecessary:
only the top-level schema needs to be
validated to detect UPA errors.

Also fixed another error that caused repeated
evaluation of OOLAG LV values.

DAFFODIL-2965, DAFFODIL-2781
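
As a rough illustration of the point about validating only the top-level schema, here is a minimal Java sketch using the standard JAXP `SchemaFactory` API. It is not Daffodil's code: the class name, method names, and the tiny two-file schema are hypothetical, and it assumes the JDK's built-in schema processor. It shows that a single `newSchema` call on the top-level file also loads what that file includes, so a separate `newSchema` call per file is redundant work.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical illustration only; none of these names come from Daffodil.
final class TopLevelSchemaSketch {

    // Compiles only the top-level schema file. The schema processor follows
    // xs:include itself, so included files are loaded as part of this one
    // newSchema call rather than one call per file.
    static Schema compileTopLevel(Path topLevelXsd) throws SAXException {
        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        return factory.newSchema(topLevelXsd.toFile());
    }

    // Writes a two-file schema to a temp directory, compiles the top-level
    // file once, and checks that an element declared only in the included
    // file is accepted by the resulting Schema.
    static boolean includedElementIsVisible() {
        try {
            Path dir = Files.createTempDirectory("xsd-sketch");
            Files.write(dir.resolve("child.xsd"), (
                "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
              + "<xs:element name='b' type='xs:string'/>"
              + "</xs:schema>").getBytes(StandardCharsets.UTF_8));
            Path top = dir.resolve("top.xsd");
            Files.write(top, (
                "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
              + "<xs:include schemaLocation='child.xsd'/>"
              + "<xs:element name='a' type='xs:string'/>"
              + "</xs:schema>").getBytes(StandardCharsets.UTF_8));
            Schema schema = compileTopLevel(top);
            // 'b' is declared only in child.xsd.
            schema.newValidator().validate(
                new StreamSource(new StringReader("<b>x</b>")));
            return true;
        } catch (SAXException | IOException e) {
            return false;
        }
    }
}
```

Whether every schema-level check Daffodil relies on is covered by the top-level pass is a claim of the PR description, not something this sketch demonstrates.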

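This is the output from the instrumentation before the functional improvements in this change set. The columns are unlabeled; the figures are consistent with name, total seconds, percent of total, average nanoseconds per call, and call count: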

[error] [info] validateAsDFDLSchema schemaFactory.newSchema                 66.852  59.13%    67187944     995
[error] [info] resolveCommon                                                 9.720   8.60%       17884  543499
[error] [info] loadDocument                                                  6.607   5.84%     6639796     995
[error] [info] validateAsDFDLSchema load                                     5.858   5.18%     5887291     995
[error] [info] parserFromURI                                                 5.689   5.03%     5717234     995
[error] [info] unparser                                                      3.707   3.28%  1853341986       2
[error] [info] TypeBase                                                      2.866   2.53%      431697    6639
[error] [info] constructingLoader.load                                       1.705   1.51%      856144    1991
[error] [info] elementRuntimeData                                            1.651   1.46%      248657    6638
[error] [info] areComponentsConstructed                                      1.603   1.42%  1602879249       1
[error] [info] compileSourceInternal                                         1.018   0.90%  1018485705       1
[error] [info] groupMembers                                                  0.628   0.56%       72189    8701
[error] [info] validateXML                                                   0.605   0.54%      608347     995
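
For readers unfamiliar with this kind of table, the following Java sketch shows one way such per-section figures could be accumulated. It is an assumption-laden illustration, not the instrumentation added in this PR: the class `TimingSketch` and all of its methods are hypothetical. The column meanings are inferred from the numbers above (for example, 66.852 s over 995 calls is about 67,187,944 ns per call).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical illustration only; not the PR's instrumentation code.
final class TimingSketch {
    // section name -> {total nanoseconds, call count}
    private final Map<String, long[]> stats = new LinkedHashMap<>();

    // Adds one measured call to the named section.
    synchronized void record(String name, long nanos) {
        long[] s = stats.computeIfAbsent(name, k -> new long[2]);
        s[0] += nanos;
        s[1] += 1;
    }

    // Runs body, charging its wall-clock time to the named section.
    <T> T time(String name, Supplier<T> body) {
        long start = System.nanoTime();
        try {
            return body.get();
        } finally {
            record(name, System.nanoTime() - start);
        }
    }

    synchronized long count(String name) {
        long[] s = stats.get(name);
        return s == null ? 0 : s[1];
    }

    // Average nanoseconds per call for one section.
    synchronized long averageNanos(String name) {
        long[] s = stats.get(name);
        return (s == null || s[1] == 0) ? 0 : s[0] / s[1];
    }

    // Share of the summed time of all sections, as a percentage.
    synchronized double percent(String name) {
        long all = 0;
        for (long[] s : stats.values()) {
            all += s[0];
        }
        long[] mine = stats.get(name);
        return (mine == null || all == 0) ? 0.0 : 100.0 * mine[0] / all;
    }

    // One line per section: name, seconds, percent, average ns, count.
    synchronized String report() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, long[]> e : stats.entrySet()) {
            long[] s = e.getValue();
            sb.append(String.format("%-50s %8.3f %6.2f%% %12d %7d%n",
                e.getKey(), s[0] / 1e9, percent(e.getKey()),
                s[0] / s[1], s[1]));
        }
        return sb.toString();
    }
}
```

In this sketch the percentage is relative to the sum of all recorded sections, so nested sections would be counted more than once; how the PR's instrumentation handles nesting is not stated in the conversation.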

Here is the instrumentation output after the change:

[error] [info] unparser                                                     3.199  21.79%  1599749444      2
[error] [info] TypeBase                                                     1.972  13.43%      297091   6639
[error] [info] areComponentsConstructed                                     1.696  11.55%  1696149699      1
[error] [info] elementRuntimeData                                           1.389   9.46%      209236   6638
[error] [info] constructingLoader.load                                      0.984   6.70%      988328    996
[error] [info] groupMembers                                                 0.646   4.40%       74285   8701
[error] [info] parser                                                       0.490   3.34%   244993274      2
[error] [info] expr                                                         0.427   2.91%      568419    751
[error] [info] DFDLNewVariableInstance_requiredEvaluation_1                 0.246   1.68%     2196754    112
[error] [info] compileSourceInternal                                        0.227   1.55%   227499753      1
[error] [info] repTypeMaps                                                  0.217   1.47%      410835    527
[error] [info] iiSchemaFile                                                 0.197   1.34%      198092    996
[error] [info] encoding                                                     0.177   1.20%       13484  13099
[error] [info] shortFormProperties                                          0.161   1.10%        7972  20201
[error] [info] nonDefaultPropertySources                                    0.160   1.09%        8622  18537
[error] [info] resolvedSchemaLocation                                       0.159   1.08%       40757   3907
[error] [info] notSeenThisBefore                                            0.141   0.96%       36013   3907
[error] [info] SequenceGroupRef_requiredEvaluation_7                        0.124   0.85%       29451   4225
[error] [info] enclosingElements                                            0.109   0.74%        8547  12726
[error] [info] priorAlignmentApprox                                         0.099   0.67%        8399  11755
[error] [info] combinedJustThisOneOproperties                               0.096   0.65%        4748  20201
[error] [info] longFormProperties                                           0.091   0.62%        4490  20201
[error] [info] LocalElementDecl_requiredEvaluation_24                       0.085   0.58%       13942   6105
[error] [info] defaultPropertySources                                       0.076   0.52%        4112  18537
[error] [info] hiddenGroupRefOption                                         0.074   0.50%       13526   5467

@mbeckerle mbeckerle changed the title Improve schema compiler performance WIP: Improve schema compiler performance Dec 20, 2024
@mbeckerle mbeckerle marked this pull request as draft December 20, 2024 17:37
@mbeckerle mbeckerle self-assigned this Dec 20, 2024
@mbeckerle mbeckerle changed the title WIP: Improve schema compiler performance Improve schema compiler performance Dec 20, 2024
@mbeckerle mbeckerle marked this pull request as ready for review December 20, 2024 20:40
@mbeckerle

All tests pass on my system.

@jadams-tresys jadams-tresys left a comment

There seems to be a regression somewhere in this commit that is costing us concurrency when running the test suite. Nothing is jumping out at me, but comparing this commit to the main branch, running the full test suite takes roughly 3x as long on this branch as on main.

@mbeckerle

Yeah, I think I see where this happened. isError is not a lazy val, so a bunch of work is done each time it is called. Most of that work is just finding out that all the various things have already been evaluated, but there is substantial overhead in traversing all the objects to find that out. I will investigate further.
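
To make the difference between a recomputed value and a lazily cached one concrete, here is a small Java sketch of compute-once memoization, roughly what a Scala lazy val gives. The `Memo` class is hypothetical and is not part of Daffodil or OOLAG. Nothing in this conversation says isError could safely be cached this way; the sketch only illustrates why a non-cached check pays its traversal cost on every call.

```java
import java.util.function.Supplier;

// Hypothetical illustration only; not Daffodil or OOLAG code.
// Wraps a computation so that it runs at most once; later calls to get()
// return the stored result without repeating the work.
final class Memo<T> implements Supplier<T> {
    private final Supplier<T> compute;
    private T value;
    private boolean done;

    Memo(Supplier<T> compute) {
        this.compute = compute;
    }

    @Override
    public synchronized T get() {
        if (!done) {
            value = compute.get(); // the expensive part runs only here
            done = true;
        }
        return value;
    }
}
```

A caller would wrap the expensive check once, for example `new Memo<>(() -> expensiveCheck())` with `expensiveCheck` standing in for any costly traversal, and then call `get()` as often as needed.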

@mbeckerle
Copy link
Contributor Author

@jadams-tresys the most recent commit addresses the slow-down of the daffodil test suite.

The slow-down was due to isError being called inside the synchronized block for the schema compiler, which was unnecessary. Calling isError outside the synchronized block restores the parallelism whose loss was responsible for the slow-down, or at least that's my story and I'm sticking with it :-)
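
The shape of that fix can be sketched in Java as follows. This is a simplified stand-in, not the actual Daffodil change: `LockScopeSketch`, `compilerLock`, and the trivial `isError` body are all hypothetical. The idea is to keep only the work that needs mutual exclusion inside the synchronized block and run the expensive check after the lock is released, so other threads are not serialized behind it.

```java
// Hypothetical illustration only; not the actual Daffodil change.
final class LockScopeSketch {
    private final Object compilerLock = new Object();
    private volatile boolean compiled;
    private volatile boolean checkedWhileHoldingLock;

    // Stand-in for an expensive diagnostics traversal. It records whether
    // the calling thread held the compiler lock when the check ran.
    private boolean isError() {
        checkedWhileHoldingLock = Thread.holdsLock(compilerLock);
        return !compiled;
    }

    boolean compileThenCheck() {
        synchronized (compilerLock) {
            compiled = true; // stand-in for the work that needs the lock
        }
        // The check runs here, after the lock has been released.
        return isError();
    }

    boolean checkedWhileHoldingLock() {
        return checkedWhileHoldingLock;
    }
}
```

Moving a check out of a lock is only safe when the state it reads is no longer being mutated under that lock, which the sketch approximates with volatile fields; the conversation does not spell out why this holds in Daffodil.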

With this change, schema compilation for
Link16 drops from 2+ minutes to 18 seconds.

Added instrumentation to compiler which shows
that most time (by far, like 80+ percent) is
spent in Xerces validating each schema
file individually. This is unnecessary. We
only need to validate the top level schema
to detect UPA errors.

Fixed other error causing repeated
evaluation of OOLAG LV values.

Eliminate isError inside synchronized block
Avoids slowdown of Daffodil test suite.

DAFFODIL-2965, DAFFODIL-2781

@jadams-tresys jadams-tresys left a comment

+1

@pkatlic pkatlic left a comment

+1

@mbeckerle mbeckerle merged commit cf73995 into apache:main Dec 24, 2024
12 checks passed
@mbeckerle mbeckerle deleted the daf-2781-compiler branch December 24, 2024 19:00