Improve compile times of the most expensive source modules #1806

Merged (1 commit) on Nov 21, 2017

@lgritz commented Nov 19, 2017

Some source modules took minutes to compile, mostly because they involve
lots of template instantiation for many type combinations.

So, first, I changed some of the functions that had a very wide
cross-product of type combinations to the variety that only handles the
common types directly and uses float as an intermediate format for the
outlier cases. That reduces the total compile work substantially.
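To make the first change concrete, here is a minimal C++ sketch, assuming a hypothetical three-type pixel format (`PixType`) and a hypothetical `scale` operation; it is not OIIO's actual dispatch code. A fully general dispatcher would instantiate the kernel for all 3 × 3 = 9 (destination, source) pairs, and the count grows as N² for N types with two image arguments (N³ with three). This one instantiates only three common pairs and routes every other pair through a float round trip that reuses the `<float, float>` instantiation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical pixel-type tag for this sketch (not OIIO's TypeDesc).
enum class PixType { UInt8, UInt16, Float };

// Generic kernel: every distinct (Dst, Src) pair it is called with becomes a
// separate template instantiation. No clamping: assumes results fit in Dst.
template <typename Dst, typename Src>
void scale_impl(Dst* dst, const Src* src, std::size_t n, float k) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = static_cast<Dst>(static_cast<float>(src[i]) * k);
}

// Type-erased element read, used only by the float fallback path.
float load_as_float(const void* buf, PixType t, std::size_t i) {
    switch (t) {
    case PixType::UInt8:  return static_cast<const std::uint8_t*>(buf)[i];
    case PixType::UInt16: return static_cast<const std::uint16_t*>(buf)[i];
    default:              return static_cast<const float*>(buf)[i];
    }
}

// Type-erased element write, used only by the float fallback path.
void store_from_float(void* buf, PixType t, std::size_t i, float v) {
    switch (t) {
    case PixType::UInt8:  static_cast<std::uint8_t*>(buf)[i] = static_cast<std::uint8_t>(v); break;
    case PixType::UInt16: static_cast<std::uint16_t*>(buf)[i] = static_cast<std::uint16_t>(v); break;
    default:              static_cast<float*>(buf)[i] = v; break;
    }
}

// Dispatcher: only the common pairs get their own instantiation.
// Returns true if a dedicated path ran, false if the float fallback ran.
bool scale(void* dst, PixType dt, const void* src, PixType st,
           std::size_t n, float k) {
    assert(dst != nullptr && src != nullptr);
    using U8 = std::uint8_t;
    if (dt == PixType::Float && st == PixType::Float) {
        scale_impl(static_cast<float*>(dst), static_cast<const float*>(src), n, k);
        return true;
    }
    if (dt == PixType::Float && st == PixType::UInt8) {
        scale_impl(static_cast<float*>(dst), static_cast<const U8*>(src), n, k);
        return true;
    }
    if (dt == PixType::UInt8 && st == PixType::UInt8) {
        scale_impl(static_cast<U8*>(dst), static_cast<const U8*>(src), n, k);
        return true;
    }
    // Outlier pair: widen to float, reuse scale_impl<float, float>, narrow back.
    std::vector<float> in(n), out(n);
    for (std::size_t i = 0; i < n; ++i) in[i] = load_as_float(src, st, i);
    scale_impl(out.data(), in.data(), n, k);
    for (std::size_t i = 0; i < n; ++i) store_from_float(dst, dt, i, out[i]);
    return false;
}
```

The trade-off is extra conversion work at run time for the rarely used pairs, in exchange for far fewer instantiations at compile time.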

But also, particularly long-compiling modules can screw up parallel
builds: all the other modules for a library may finish compiling while
the one with an extra-long compile is still going, and nothing else can
proceed until it's done, so only one core is doing work while the others
wait. To combat this, I took the two files with the very longest compile
times (imagebufalgo_pixelmath and imagebufalgo_copy) and split each of
them into multiple pieces. That doesn't reduce the total amount of
compilation work, but it does make it easier to parallelize over
multiple cores.
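The parallel-build effect can be illustrated with a toy model: a minimal C++ sketch assuming independent compile jobs with known durations and a scheduler that hands each job, in order, to whichever core frees up first (greedy list scheduling). `makespan` and `split_job` are hypothetical helpers, and the durations used below are made up for illustration, not measured from this build.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Greedy list scheduling: each job, in the given order, starts on the core
// that becomes free earliest. Returns the time at which the last job ends.
double makespan(const std::vector<double>& jobs, int cores) {
    assert(cores > 0);
    // Min-heap holding the time at which each core next becomes free.
    std::priority_queue<double, std::vector<double>, std::greater<double>> free_at;
    for (int c = 0; c < cores; ++c) free_at.push(0.0);
    double finish = 0.0;
    for (double d : jobs) {
        const double start = free_at.top();
        free_at.pop();
        free_at.push(start + d);
        finish = std::max(finish, start + d);
    }
    return finish;
}

// Replaces jobs[idx] with `pieces` equal parts; the total work is unchanged.
std::vector<double> split_job(std::vector<double> jobs, std::size_t idx, int pieces) {
    assert(idx < jobs.size() && pieces > 0);
    const double part = jobs[idx] / pieces;
    const auto pos = static_cast<std::ptrdiff_t>(idx);
    jobs.erase(jobs.begin() + pos);
    jobs.insert(jobs.begin() + pos, static_cast<std::size_t>(pieces), part);
    return jobs;
}
```

With twelve 10 s modules followed by one 60 s module on 4 cores, the build ends at 90 s, and three cores sit idle for the last 60 s. Splitting the long module into four 15 s pieces keeps the same 180 s of total work but ends at 45 s.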

The net result can be seen in these stats for fresh full builds (after
clearing ccache) on my 4-core laptop (times in seconds):

     before:       315.85 real       850.53 user        42.60 sys
     after:        200.44 real       725.09 user        36.21 sys

The "real" time is the important one: wall-clock time drops by more than
a third (from 315.85 to 200.44 seconds) for a full parallel build on a
4-core machine.
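The stats also show why the split helps: dividing total CPU time (user + sys) by wall-clock time gives the average number of cores kept busy. A small C++ helper for that arithmetic follows; it is a sketch, and the names `BuildTime`, `reduction`, and `avg_busy_cores` are hypothetical.

```cpp
#include <cassert>

// Output of `time` for one build, all in seconds.
struct BuildTime {
    double real;  // wall-clock time
    double user;  // user-mode CPU time, summed over the build's processes
    double sys;   // kernel CPU time, summed over the build's processes
};

// Fractional reduction from `before` to `after` (0.25 means 25% less).
double reduction(double before, double after) {
    assert(before > 0.0);
    return (before - after) / before;
}

// Average number of cores kept busy: total CPU time over wall-clock time.
double avg_busy_cores(const BuildTime& t) {
    assert(t.real > 0.0);
    return (t.user + t.sys) / t.real;
}
```

For the laptop numbers above, this gives a wall-clock reduction of about 36.5%, a user-time reduction of about 14.7%, and an average of roughly 2.83 busy cores before versus 3.80 after, out of 4.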

Note: the big blocks of changes are just moving code from one file to another.
Those sections didn't have any logic changes.

@lgritz merged commit 81ea41c into AcademySoftwareFoundation:master on Nov 21, 2017

lgritz commented Nov 23, 2017

For those curious, I reran the timings on a beefier Linux machine where I typically build with ninja -j 24, and the results were similar (times in min:sec):

before:    3:01 real   23:51 user   0:41 sys
after:     2:02 real   22:48 user   0:41 sys

So it reduces the total work only slightly (under 5%), but by balancing the load across threads better, it cuts the human time spent waiting for the build by 33%!
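Converting the 24-thread timings to seconds (3:01 = 181 s, 2:02 = 122 s, 23:51 = 1431 s, 22:48 = 1368 s) makes the percentages easy to check, and dividing CPU time (user + sys) by wall-clock time gives the average number of busy threads before and after:

$$
\frac{181 - 122}{181} \approx 32.6\%, \qquad
\frac{1431 - 1368}{1431} \approx 4.4\%, \qquad
\frac{1431 + 41}{181} \approx 8.1 \;\longrightarrow\; \frac{1368 + 41}{122} \approx 11.5
$$

Even after the change, the average stays well under 24 busy threads, which fits the observation that parts of the build still run on only 2 or 3 threads.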

Watching the graphical load meter as the build runs, I saw that there are still a good 30-45 seconds where only 2 or 3 threads are working, so I think I can get even better utilization if I jiggle things a bit more.

@lgritz deleted the lg-breakup branch on Nov 27, 2017 07:05
GerHobbelt pushed a commit to GerHobbelt/oiio that referenced this pull request Dec 10, 2024