Improve compile times of the most expensive source modules #1806
Some source modules took minutes to compile, mostly because they
instantiate templates for many type combinations.
First, some of the functions that were instantiated for a very wide
cross-product of type combinations were changed to the variety that
handles only the common types directly and uses float as an intermediate
format for the outlier cases. That reduces total compile work substantially.
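As a rough illustration of that idea (not the code in this PR), the sketch below instantiates the templated kernel only for same-type common cases and routes every other combination through float. The names `Pixels`, `add_impl`, `to_float`, and `add` are hypothetical, the choice of uint8 and float as the "common" types is an assumption, and for brevity the fallback returns float rather than converting back to a requested result type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <variant>
#include <vector>

// Hypothetical type-erased pixel storage; a stand-in, not a real library type.
using Pixels = std::variant<std::vector<std::uint8_t>,
                            std::vector<std::uint16_t>,
                            std::vector<float>>;

// The templated kernel. Every distinct T it is used with costs compile time.
template <typename T>
std::vector<T> add_impl(const std::vector<T>& a, const std::vector<T>& b) {
    assert(a.size() == b.size());
    std::vector<T> r(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        r[i] = static_cast<T>(a[i] + b[i]);  // no clamping in this sketch
    return r;
}

// Convert any supported storage type to float.
std::vector<float> to_float(const Pixels& p) {
    return std::visit(
        [](const auto& v) { return std::vector<float>(v.begin(), v.end()); },
        p);
}

// Dispatch: only two kernel instantiations (float and uint8) instead of one
// per (A, B) pair. Which types count as "common" is an assumption here.
Pixels add(const Pixels& a, const Pixels& b) {
    if (auto fa = std::get_if<std::vector<float>>(&a))
        if (auto fb = std::get_if<std::vector<float>>(&b))
            return add_impl(*fa, *fb);
    if (auto ua = std::get_if<std::vector<std::uint8_t>>(&a))
        if (auto ub = std::get_if<std::vector<std::uint8_t>>(&b))
            return add_impl(*ua, *ub);
    // Outlier combinations: convert both inputs to float and reuse the
    // float kernel that is already instantiated.
    return add_impl(to_float(a), to_float(b));
}
```

With three storage types, a full cross-product over both inputs would need nine kernel instantiations; this dispatch needs two, at the cost of a conversion pass for the uncommon combinations.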
Second, a particularly long-compiling module can hurt parallel builds:
all the other modules of a library may finish compiling while the one
with the extra long compile is still going, and nothing else can proceed
until it is done, so only one core is doing work while the others wait.
To combat this, I took the two files with the very longest compile times
(imagebufalgo_pixelmath and imagebufalgo_copy) and split them into
multiple pieces. That doesn't reduce the total amount of compilation
work, but it does make it easier to parallelize over multiple cores.
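To make the scheduling argument concrete, here is a toy model, not a measurement: a greedy scheduler hands each compile job to whichever core frees up first. The function name `makespan` and all job durations are hypothetical, and real build tools may order jobs differently.

```cpp
#include <algorithm>
#include <functional>
#include <queue>
#include <vector>

// Wall-clock time when jobs are handed out in order, each to the core
// that becomes free earliest (a simple greedy list scheduler).
double makespan(const std::vector<double>& jobs, int cores) {
    // Min-heap of the times at which each core next becomes free.
    std::priority_queue<double, std::vector<double>, std::greater<double>> free_at;
    for (int c = 0; c < cores; ++c)
        free_at.push(0.0);

    double finish = 0.0;
    for (double duration : jobs) {
        double start = free_at.top();  // earliest available core
        free_at.pop();
        double end = start + duration;
        free_at.push(end);
        finish = std::max(finish, end);
    }
    return finish;
}
```

In this model, six 10-second jobs followed by one 120-second job on 4 cores give `makespan({10, 10, 10, 10, 10, 10, 120}, 4) == 130`, with three cores idle for most of that time. Splitting the long job into four 30-second pieces keeps the total work at 180 seconds but gives `makespan({10, 10, 10, 10, 10, 10, 30, 30, 30, 30}, 4) == 50`.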
The net result can be seen in these stats for fresh full builds (after
clearing ccache) on my 4-core laptop:
The "real" time is the important one: wall clock time drops by ~35%
for a full parallel build on a 4-core machine.
Note: the big blocks of changes are just moving code from one file to another.
Those sections didn't have any logic changes.