[BUG] Consider removing spdlog dependency for substantial compile time improvements #1300
Comments
Cross-link to RMM issue: rapidsai/rmm#1222
Thanks for doing this comparison @ahendriksen! Have you, by chance, compared the end-to-end runtimes before and after the change to using the spdlog compiled lib? I'm attaching two ninja_log files, one before and one after. I haven't done any further analysis on these files other than to notice that the end-to-end compile time only seemed to go down by about 1.5 minutes. That being said, there are a couple of stragglers that took quite some time to compile (ivf-flat, for example) which don't yet have specializations, so I think we can address those separately to reduce compile times further. Also attached are the patches for the changes to RAFT and RMM to get them to use spdlog's compiled binary.
Good point. I have analyzed your ninja logs and share the results below, with one caveat: as you point out, looking at total compile time is not always useful because of stragglers. Therefore, I have looked at the compile time of each translation unit and at the sum of compile times across translation units.
All results: (the Python script used to generate them is included below)
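A note on the input format before the script (my gloss, not from the original comment): a version-5 `.ninja_log` starts with a `# ninja log v5` header line, followed by one tab-separated record per build edge — start time (ms), end time (ms), output mtime, output path, and a command hash. That is the layout `parse_ninja_log` below assumes. An illustrative, made-up record:

```
# ninja log v5
1200	4800	1676300000	CMakeFiles/raft_objs.dir/src/distance/distance.cu.o	a1b2c3d4e5f60718
```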
```python
from pathlib import Path
from collections import Counter


def parse_ninja_log(log_path):
    text = Path(log_path).read_text()
    # Skip the "# ninja log v5" header line; each record is tab-separated.
    start, end, mtime, path, cmd = list(zip(*[line.split("\t") for line in text.splitlines()[1:]]))
    start = list(map(int, start))
    end = list(map(int, end))
    seconds = [(e - s) / 1000. for e, s in zip(end, start)]
    mtime = list(map(int, mtime))
    return dict(
        start=start,
        end=end,
        seconds=seconds,
        mtime=mtime,
        path=path,
        cmd=cmd,
    )


def discard_earlier_builds(d):
    prev_end = 0
    start_index = 0
    # end must be monotonically increasing. If we find an end value that is
    # lower than the end value on the previous row, we know that a new build
    # has started.
    for i, end in enumerate(d['end']):
        if end < prev_end:
            start_index = i
        prev_end = end
    return {k: v[start_index:] for k, v in d.items()}


def print_duplicates(d):
    # d is a dict returned by parse_ninja_log. Helper to inspect targets that
    # appear more than once in a log (not called in the analysis below).
    print(f" # {'path':<60} sec cmd hash sec other cmd hash")
    dup_paths = sorted(set(p for p, count in Counter(d['path']).items() if count > 1))
    for i, p in enumerate(dup_paths):
        print(f"{i:3d} {p[-60:]:<60}: ", end="")
        for p_other, sec, cmd in zip(d['path'], d['seconds'], d['cmd']):
            if p == p_other:
                print(f"{sec:6.1f} ({cmd})", end="")
        print()


compiled = parse_ninja_log("/home/ahendriksen/Downloads/ninja_log_spdlog/ninja_log_spdlog_compiled")
headers = parse_ninja_log("/home/ahendriksen/Downloads/ninja_log_spdlog/ninja_log_spdlog_headers")

compiled = discard_earlier_builds(compiled)
headers = discard_earlier_builds(headers)

# Print sum of compile times of each translation unit:
print(f"Sum of compile times for compiled spdlog:    {sum(compiled['seconds']):.1f} seconds")
print(f"Sum of compile times for header-only spdlog: {sum(headers['seconds']):.1f} seconds\n")

compiled_times = dict(zip(compiled['path'], compiled['seconds']))
headers_times = dict(zip(headers['path'], headers['seconds']))

print("Compile times for paths only found in headers (seconds):")
for p in set(headers['path']) - set(compiled['path']):
    print(f"{p[-80:]:<80} {headers_times[p]:6.1f}")

# Compare compile time per path between compiled and headers:
results = [(path, headers_times[path], compiled_times[path]) for path in compiled_times.keys()]
# Add absolute change and relative change as a percentage
results = [(p, hsec, csec, csec - hsec, 100. * (csec / hsec - 1)) for p, hsec, csec in results]
# Sort by relative change
results = sorted(results, key=lambda x: x[4])
# Print results
print("\nComparison of compile times between headers and compiled:")
print(f"{'path':<80} header (s) compiled (s) change (s) change (%)")
for p, hsec, csec, diff, rel in results:
    print(f"{p[-80:]:<80} {hsec:6.1f} {csec:6.1f} {diff:+5.1f} {rel:+4.1f}%")
```
I'm proposing that RMM allow the user to set whether the compiled or header-only spdlog target is used. I would honestly prefer that we default to the compiled library everywhere, except for users who "really" want fully header-only operation.
Thanks for looking into this, Corey! I agree it is a good idea to consider using the precompiled spdlog library. If we go the precompiled route, would this require adding a runtime dependency on spdlog to the conda package as well? We currently do not seem to have a conda dependency on spdlog.
Describe the bug

Including the `spdlog` headers is quite expensive. Just adding `#include <spdlog/spdlog.h>` to an empty file adds 2.8 seconds to the compilation time. For the pairwise distance kernels, removing the `spdlog` include can reduce compile times by 50%.

Steps/Code to reproduce bug
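The original reproduction snippet was lost in extraction; a minimal sketch along the same lines (the file name and compiler flags are illustrative choices, not from the report):

```cpp
// empty.cpp: an otherwise-empty translation unit. Compile it once with and
// once without the include and compare wall-clock times, e.g.:
//   time g++ -c -std=c++17 -I/path/to/spdlog/include empty.cpp
#include <spdlog/spdlog.h>  // header-only spdlog pulls in fmt and many templates

int main() { return 0; }
```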
Expected behavior

A smaller increase in compile time. For context, including `<string>` adds on the order of 100 ms to the compilation time.

Additional context
RMM

RMM also uses `spdlog`. In practice, the compile time improvements will only be obtained when RMM also removes its spdlog dependency.

Reason
The reason that compilation takes so much longer is that `spdlog` instantiates a bunch of templates in every translation unit when it is used as a header-only library. This happens in `pattern_formatter::handle_flag_`, which is instantiated here. Just adding back the `spdlog` header doubles the compile times of `cicc` (device side) and also of `gcc` on the host side.
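To make the mechanism concrete, a generic sketch (illustrative only — this is not spdlog's actual code, and `format_value` is a made-up name): a function template defined in a header is re-instantiated and re-compiled in every translation unit that uses it, whereas a compiled library pays that cost once.

```cpp
// heavy.hpp -- a hypothetical header-only library
#pragma once
#include <sstream>
#include <string>

// Because the definition lives in the header, every translation unit that
// calls format_value<T> compiles its own copy of it -- including all the
// iostreams machinery it drags in.
template <typename T>
std::string format_value(const T& value) {
    std::ostringstream os;
    os << value;
    return os.str();
}

// A compiled library would instead put the definition in a single .cpp file
// and suppress per-TU instantiation with explicit instantiations, e.g.:
//   extern template std::string format_value<int>(const int&);
// so the cost is paid once, at library build time.
```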
Precompiled library

Another option is to not use `spdlog` as a header-only library. The effect can be simulated by defining `SPDLOG_COMPILED_LIB`; when this is defined, `spdlog` adds only 0.5 seconds.
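A sketch of that simulation (my reconstruction, not the author's original snippet; note that `SPDLOG_COMPILED_LIB` only skips compiling the header-only implementation, so the resulting object file needs libspdlog at link time):

```cpp
// compiled_sim.cpp: measure compile time as if spdlog were a compiled library.
// Build with `g++ -c compiled_sim.cpp`, or pass -DSPDLOG_COMPILED_LIB on the
// command line instead of using the #define below.
#define SPDLOG_COMPILED_LIB  // tell the headers not to pull in the -inl.h bodies
#include <spdlog/spdlog.h>

int main() {
    spdlog::info("hello");  // declared here; the definition lives in libspdlog
    return 0;
}
```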