Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

optimize build time (move stuff from hpp to cpp) #1787

Closed
wants to merge 13 commits into from

Conversation

pcaspers
Copy link
Contributor

@pcaspers pcaspers commented Sep 18, 2023

I am looking into optimizing the build time of ORE. I am using ClangBuildAnalyzer for this. Some optimizations will affect QuantLib, so I want to check first if these changes are welcome or not.

I started with "expensive headers". Both for ORE and QuantLib observable.hpp is high up in the list. For the QuantLib build

**** Expensive headers:
209293 ms: /Users/peter.caspers/QuantLib/ql/patterns/observable.hpp (included 825 times, avg 253 ms), included via:

By moving implementations from hpp to cpp the build time goes from

  Parsing (frontend):         2045.0 s
  Codegen & opts (backend):   1602.3 s

to

  Parsing (frontend):         2085.9 s
  Codegen & opts (backend):   1568.2 s

so no improvement so far! However, this change is not ready yet, I'd like to extend the optimization to the thread-safe code section and apply similar optimizations to other headers as well.

I am never sure, whether removing the inline qualifier affects the performance (or size) of the generated code. My rough understanding is that for modern compilers it should not. But I've not investigated that in any more detail.

Could you let me know whether you are okay with changes like the one in this PR. I am unsure whether we would apply those in our fork without our mothership doing the same, since a lot of stuff is moved around. @lballabio

@pcaspers pcaspers marked this pull request as draft September 18, 2023 14:41
@coveralls
Copy link

Coverage Status

coverage: 71.913% (+0.001%) from 71.912% when pulling 29eede9 on pcaspers:optimize_build_time into cd33939 on lballabio:master.

@lballabio
Copy link
Owner

I'm ok with it. As for inline, I think modern compilers can do without it but only if full-program optimization is enabled at link time.

@sweemer
Copy link
Contributor

sweemer commented Sep 19, 2023

I was also looking at this a while back with Build Insights but the rabbit hole went a bit too deep for me :) I might try to take another look soon if I can.

If I recall correctly, ql/errors.hpp was one of the expensive ones pointed out by Build Insights, so it's interesting that you actually added it to more places. Since this header is included directly or transitively by almost all translation units, I believe that speeding it up should provide bang for the buck.

@pcaspers
Copy link
Contributor Author

yes I am looking at errors.hpp right now :-)

@lballabio
Copy link
Owner

I'm guessing the problem is the inclusion of sstream to compose error messages.

@pcaspers
Copy link
Contributor Author

I'm guessing the problem is the inclusion of sstream to compose error messages.

Yes it's a big contributor:

159628 ms: /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX12.3.sdk/usr/include/c++/v1/ostream (included 1120 times, avg 142 ms), included via:
  67x: errors.hpp sstream istream 
...

@pcaspers
Copy link
Contributor Author

after simplifying errors.hpp:

  Parsing (frontend):         1982.0 s
  Codegen & opts (backend):   1493.0 s

@pcaspers
Copy link
Contributor Author

after moving some functions from cubicinterpolation.hpp -> cpp

  Parsing (frontend):         1887.5 s
  Codegen & opts (backend):   1375.4 s

@pcaspers
Copy link
Contributor Author

instrument.hpp, array.hpp, date.hpp, calendar.hpp, matrix.hpp are the next candidates to move stuff from hpp -> cpp

@pcaspers
Copy link
Contributor Author

after this:

  Parsing (frontend):         1781.8 s
  Codegen & opts (backend):   1305.7 s

@pcaspers
Copy link
Contributor Author

so far we went from 3647s to 3086s which is -15%

Copy link
Owner

@lballabio lballabio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments. Also, instead of optimizing single headers, I'd look into unity builds if I were you. You can enable them automatically in VC++ and CMake, and our configure script supports them as well. It would save a lot more time with a lot less work, and might make these changes less relevant.

@@ -49,94 +46,58 @@ namespace QuantLib {
const char* what() const noexcept override;

private:
ext::shared_ptr<std::string> message_;
std::string message_;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, having ext::shared_ptr<std::string> makes the copy constructor not throwing, as required by the standard. With std::string, it can throw. On the other hand, if it throws, it probably means we're out of memory anyway so all bets are off. I'd try and measure the effect of shared_ptr only, though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I had to look that requirement to be honest.

Comment on lines +134 to +141
Real Array::operator[](Size i) const {
#if defined(QL_EXTRA_SAFETY_CHECKS)
QL_REQUIRE(i<n_,
"index (" << i << ") must be less than " << n_ <<
": array access out of range");
#endif
return data_.get()[i];
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's at least inline element access, please. I'd also try timing some heavy finite-difference calculations to check that we're not pessimizing run time.

Comment on lines -500 to -502
inline Real &Matrix::operator()(Size i, Size j) const {
return data_[i*columns()+j];
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, let's at least inline element access. This means operator(), operator[]andcolumns`.

downside_accumulator_set downsideAcc_;
class accumulator_set;
class downside_accumulator_set;
accumulator_set* acc_;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd go for unique_ptr otherwise you need a copy constructor and operator for this to be safe.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

#ifndef QL_HIGH_RESOLUTION_DATE
// inline definitions

inline Weekday Date::weekday() const {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the effect of these on compile time?

@pcaspers
Copy link
Contributor Author

A few comments. Also, instead of optimizing single headers, I'd look into unity builds if I were you. You can enable them automatically in VC++ and CMake, and our configure script supports them as well. It would save a lot more time with a lot less work, and might make these changes less relevant.

Yes I wasn't so sure about these changes because of the relatively little gain anyhow. I'll try the unity build, that's a good idea.

@pcaspers
Copy link
Contributor Author

@lballabio we are using ccache in our internal CI and we have frequent builds with relatively few files changed. Do you have experience how this plays together with a unity build?

@lballabio
Copy link
Owner

I think that as long as you don't add or remove files, the unity builds always compile together the same ones, so ccache should keep working. At least that should hold for our configure-based unity build, which merges together all files in the same directory and gives the resulting big file the same name.

@pcaspers
Copy link
Contributor Author

Ok I'll check. Thank you!

@pcaspers
Copy link
Contributor Author

I got the unity build working for ORE. I see a reduction from 50 min -> 32 min which is much better than the gain from the single optimizations here.

@pcaspers pcaspers changed the title optimize build time optimize build time (move stuff from hpp to cpp) Oct 1, 2023
Copy link
Contributor

github-actions bot commented Dec 1, 2023

This PR was automatically marked as stale because it has been open 60 days with no activity. Remove stale label or comment, or this will be closed in two weeks.

@github-actions github-actions bot added the stale label Dec 1, 2023
@pcaspers
Copy link
Contributor Author

pcaspers commented Dec 7, 2023

superseded by precompiled header usage, at least on "our" end (ORE build)

@pcaspers pcaspers closed this Dec 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants