[BUG] Consider removing spdlog dependency for substantial compile time improvements #1300

ahendriksen · 2023-02-23T15:54:27Z

Describe the bug

Including the spdlog headers is quite expensive. Just adding #include <spdlog/spdlog.h> to an empty file adds 2.8 seconds to the compilation time. For the pairwise distance kernels, removing the spdlog include can reduce compile times by 50%.

Steps/Code to reproduce bug

time nvcc -arch sm_70 -I/path/to/spdlog/include -x cu -c <(echo '// empty file')
real    0m1.042s

time nvcc -arch sm_70 -I/path/to/spdlog/include -x cu -c <(echo '#include <spdlog/spdlog.h> ')
real    0m3.840s

Expected behavior
A smaller increase in compile time. For context, including <string> adds on the order of 100ms to the compilation time:

time nvcc -arch sm_70 -I/path/to/spdlog/include -x cu -c <(echo '#include <string> ')
real    0m1.160s

Additional context

RMM
RMM also uses spdlog. In practice the compile time improvements will only be obtained when RMM also removes its spdlog dependency.

Reason
The reason that compilation takes much longer is that spdlog instantiates a bunch of templates in every translation unit when used as a header only library. This happens in pattern_formatter::handle_flag_, which is instantiated here. Just adding back the spdlog header doubles the compile times of cicc (device side) and also gcc on the host side.

Precompiled-library
Another option is to not use spdlog as a header only library. The effect can be simulated by defining SPDLOG_COMPILED_LIB. When this is defined, spdlog adds only 0.5 seconds:

time nvcc -DSPDLOG_COMPILED_LIB -arch sm_70 -I/home/ahendriksen/projects/raft-spdlog-issue/cpp/build/_deps/spdlog-src/include -x cu -c <(echo '#include <spdlog/spdlog.h> ')
real    0m1.520s

time nvcc -DSPDLOG_COMPILED_LIB -arch sm_70 -I/home/ahendriksen/projects/raft-spdlog-issue/cpp/build/_deps/spdlog-src/include -x cu -c <(echo '//empty file')
real    0m1.053s

The text was updated successfully, but these errors were encountered:

ahendriksen · 2023-02-23T15:56:34Z

Cross-link to RMM issue: rapidsai/rmm#1222

cjnolet · 2023-03-08T01:41:14Z

Thanks for doing this comparison @ahendriksen!

Have you, by chance, compared the end-to-end runtimes before and after the change to using the spdlog compiled lib? I'm attaching two ninja_log files- one before and one after.

I haven't done any further analysis on these files other than to notice that the end-to-end compile time only seemed to go down by about 1.5mins. That being said, there's a couple stragglers that took quite some time to compile (ivf-flat for example) which don't yet have specializations so I think we can address those separately to reduce the compile times further.

ninja_log_spdlog.zip

Also attached are the patches for the changes to RAFT and RMM to get them to use spdlog's compiled binary.

spdlog_compiled_patches.zip

ahendriksen · 2023-03-08T11:56:24Z

Good point. I have analyzed your ninja logs and share results below.

Some caveats:

Are you sure that the ninja logs you sent are for a recent branch? I can still see the uint32_t specializations for the distance library.
The ninja log files contained previous builds. Some files in both the headers and compiled log were compiled twice. The analysis below is only for the last (second) build.
In the headers build, an additional 8 files were compiled that took an additional ~120 seconds (which is negligible).

As you point out, looking at total compile time is not always useful because of stragglers. Therefore, I have looked at the compile times per translation unit and the sum of the compile times per translation unit.

Summary of results:

The sum of compile times was reduced by 10%.
The median reduction in compile time per translation unit is 11 seconds
Translation units that already took a long time to compile therefore saw the smallest (relative) improvement

All results: (python script to generate is included below)

Sum of compile times for compiled spdlog:    36580.8 seconds
Sum of compile times for header-only spdlog: 40334.0 seconds

Compile times for paths only found in headers (seconds):
CMakeFiles/CORE_TEST.dir/test/core/nvtx.cpp.o                                      15.7
CMakeFiles/CORE_TEST.dir/test/core/span.cpp.o                                       4.4
CMakeFiles/CORE_TEST.dir/test/core/math_device.cu.o                                29.2
CMakeFiles/CORE_TEST.dir/test/core/operators_host.cpp.o                             3.9
CMakeFiles/CORE_TEST.dir/test/core/interruptible.cu.o                              24.1
CMakeFiles/CORE_TEST.dir/test/core/memory_type.cpp.o                                1.9
CMakeFiles/CORE_TEST.dir/test/core/span.cu.o                                       28.2
CMakeFiles/CORE_TEST.dir/test/core/math_host.cpp.o                                  4.1

Comparison of compile times between headers and compiled: 
path                                                                         header (s)  compiled (s)  change (s) change (%)
CMakeFiles/CORE_TEST.dir/test/core/logger.cpp.o                                    17.7           5.2       -12.5     -70.6%
CMakeFiles/CORE_TEST.dir/test/test.cpp.o                                           18.7           5.8       -12.9     -69.1%
CMakeFiles/CORE_TEST.dir/test/core/handle.cpp.o                                    22.6          11.3       -11.3     -50.1%
CMakeFiles/UTILS_TEST.dir/test/util/cudart_utils.cpp.o                             20.8          10.6       -10.2     -49.1%
CMakeFiles/UTILS_TEST.dir/test/util/pow2_utils.cu.o                                23.0          12.8       -10.2     -44.4%
istance_lib.dir/src/distance/random/rmat_rectangular_generator_int64_double.cu.o   28.6          16.9       -11.7     -40.9%
distance_lib.dir/src/distance/random/rmat_rectangular_generator_int64_float.cu.o   27.4          16.3       -11.1     -40.5%
CMakeFiles/LINALG_TEST.dir/test/linalg/svd.cu.o                                    45.6          27.8       -17.8     -39.1%
CMakeFiles/LINALG_TEST.dir/test/linalg/axpy.cu.o                                   41.8          25.8       -16.0     -38.2%
CMakeFiles/UTILS_TEST.dir/test/core/seive.cu.o                                     21.4          13.3        -8.0     -37.7%
CMakeFiles/CORE_TEST.dir/test/core/mdspan_utils.cu.o                               32.7          20.7       -11.9     -36.5%
CMakeFiles/MATRIX_TEST.dir/test/matrix/argmin.cu.o                                 42.9          27.5       -15.5     -36.1%
CMakeFiles/LABEL_TEST.dir/test/label/merge_labels.cu.o                             40.6          26.3       -14.3     -35.2%
CMakeFiles/MATRIX_TEST.dir/test/matrix/reverse.cu.o                                42.0          27.3       -14.7     -35.0%
CMakeFiles/CORE_TEST.dir/test/core/mdarray.cu.o                                    41.8          27.2       -14.6     -34.9%
distance/specializations/detail/l2_sqrt_unexpanded_double_double_double_int.cu.o   34.7          22.8       -12.0     -34.4%
istance/distance/specializations/detail/russel_rao_double_double_double_int.cu.o   35.8          23.6       -12.3     -34.2%
CMakeFiles/STATS_TEST.dir/test/stats/cov.cu.o                                      46.6          30.8       -15.9     -34.0%
LABEL_TEST                                                                          0.3           0.2        -0.1     -33.8%
CMakeFiles/STATS_TEST.dir/test/stats/rand_index.cu.o                               36.0          23.9       -12.0     -33.5%
t_distance_lib.dir/src/distance/random/rmat_rectangular_generator_int_float.cu.o   27.1          18.1        -9.0     -33.4%
CMakeFiles/UTILS_TEST.dir/test/util/device_atomics.cu.o                            25.2          16.8        -8.4     -33.3%
CMakeFiles/LINALG_TEST.dir/test/linalg/divide.cu.o                                 41.7          27.8       -13.9     -33.3%
CMakeFiles/MATRIX_TEST.dir/test/sparse/spectral_matrix.cu.o                        38.7          25.9       -12.8     -33.1%
CMakeFiles/MATRIX_TEST.dir/test/matrix/argmax.cu.o                                 44.8          29.9       -14.8     -33.1%
CMakeFiles/LINALG_TEST.dir/test/linalg/multiply.cu.o                               40.4          27.1       -13.3     -32.9%
CMakeFiles/LINALG_TEST.dir/test/linalg/strided_reduction.cu.o                      40.0          26.9       -13.2     -32.9%
CMakeFiles/MATRIX_TEST.dir/test/matrix/diagonal.cu.o                               43.4          29.3       -14.0     -32.4%
CMakeFiles/LINALG_TEST.dir/test/linalg/dot.cu.o                                    41.1          27.8       -13.2     -32.3%
CMakeFiles/STATS_TEST.dir/test/stats/sum.cu.o                                      39.4          26.7       -12.6     -32.1%
rc/distance/distance/specializations/detail/kernels/gram_matrix_base_double.cu.o   35.8          24.3       -11.5     -32.1%
CMakeFiles/LINALG_TEST.dir/test/linalg/cholesky_r1.cu.o                            38.8          26.4       -12.4     -32.0%
CMakeFiles/LINALG_TEST.dir/test/linalg/map_then_reduce.cu.o                        44.1          30.1       -14.1     -31.9%
CMakeFiles/STATS_TEST.dir/test/stats/mean_center.cu.o                              47.7          32.5       -15.2     -31.9%
ance/distance/specializations/detail/l2_unexpanded_double_double_double_int.cu.o   35.7          24.4       -11.3     -31.6%
CMakeFiles/LINALG_TEST.dir/test/linalg/subtract.cu.o                               42.5          29.0       -13.4     -31.6%
CMakeFiles/MATRIX_TEST.dir/test/matrix/norm.cu.o                                   41.3          28.3       -13.0     -31.4%
CMakeFiles/LINALG_TEST.dir/test/linalg/coalesced_reduction.cu.o                    43.9          30.3       -13.7     -31.1%
_distance_lib.dir/src/distance/random/rmat_rectangular_generator_int_double.cu.o   26.9          18.6        -8.3     -31.0%
CMakeFiles/DISTANCE_TEST.dir/test/distance/dist_l1.cu.o                            47.7          33.0       -14.7     -30.8%
CMakeFiles/RANDOM_TEST.dir/test/random/permute.cu.o                                49.1          34.0       -15.1     -30.8%
CMakeFiles/LINALG_TEST.dir/test/linalg/eig_sel.cu.o                                42.8          29.6       -13.2     -30.7%
CMakeFiles/MATRIX_TEST.dir/test/matrix/triangular.cu.o                             40.9          28.3       -12.6     -30.7%
CMakeFiles/SPARSE_TEST.dir/test/sparse/convert_coo.cu.o                            39.7          27.5       -12.2     -30.7%
CMakeFiles/STATS_TEST.dir/test/stats/entropy.cu.o                                  39.6          27.5       -12.1     -30.6%
CMakeFiles/SPARSE_TEST.dir/test/sparse/spgemmi.cu.o                                37.5          26.0       -11.4     -30.5%
CMakeFiles/SPARSE_TEST.dir/test/sparse/row_op.cu.o                                 40.9          28.5       -12.4     -30.4%
ce_lib.dir/src/distance/neighbors/specializations/refine_h_uint64_t_uint8_t.cu.o   39.8          27.8       -12.1     -30.3%
CMakeFiles/STATS_TEST.dir/test/stats/stddev.cu.o                                   43.7          30.4       -13.2     -30.3%
CMakeFiles/STATS_TEST.dir/test/stats/mean.cu.o                                     40.3          28.1       -12.1     -30.1%
ir/src/distance/distance/specializations/detail/l1_double_double_double_int.cu.o   34.5          24.2       -10.3     -29.8%
eFiles/raft_distance_lib.dir/src/distance/neighbors/refine_d_uint64_t_float.cu.o   37.9          26.7       -11.2     -29.6%
CMakeFiles/LINALG_TEST.dir/test/linalg/gemm_layout.cu.o                            44.4          31.3       -13.2     -29.6%
CMakeFiles/DISTANCE_TEST.dir/test/distance/dist_canberra.cu.o                      49.0          34.5       -14.4     -29.5%
CMakeFiles/LINALG_TEST.dir/test/linalg/gemv.cu.o                                   40.4          28.5       -11.9     -29.5%
CMakeFiles/DISTANCE_TEST.dir/test/distance/dist_jensen_shannon.cu.o                46.7          33.0       -13.7     -29.4%
CMakeFiles/LINALG_TEST.dir/test/linalg/eig.cu.o                                    44.1          31.2       -12.9     -29.3%
iles/raft_distance_lib.dir/src/distance/neighbors/refine_d_uint64_t_uint8_t.cu.o   36.7          26.0       -10.7     -29.3%
CMakeFiles/DISTANCE_TEST.dir/test/distance/dist_hamming.cu.o                       49.6          35.1       -14.5     -29.2%
CMakeFiles/RANDOM_TEST.dir/test/random/make_blobs.cu.o                             48.9          34.6       -14.3     -29.2%
CMakeFiles/LINALG_TEST.dir/test/linalg/transpose.cu.o                              41.7          29.6       -12.1     -29.0%
istance/distance/specializations/detail/l2_unexpanded_float_float_float_int.cu.o   47.2          33.6       -13.6     -28.8%
CMakeFiles/DISTANCE_TEST.dir/test/distance/dist_euc_unexp.cu.o                     49.4          35.2       -14.2     -28.7%
CMakeFiles/DISTANCE_TEST.dir/test/distance/dist_hellinger.cu.o                     45.4          32.4       -13.0     -28.6%
iles/raft_distance_lib.dir/src/distance/neighbors/refine_h_uint64_t_uint8_t.cu.o   36.6          26.2       -10.4     -28.5%
CMakeFiles/LINALG_TEST.dir/test/linalg/ternary_op.cu.o                             43.9          31.5       -12.4     -28.3%
CMakeFiles/SPARSE_TEST.dir/test/sparse/add.cu.o                                    46.1          33.0       -13.0     -28.3%
dir/src/distance/distance/specializations/detail/kernels/tanh_kernel_double.cu.o   36.3          26.1       -10.3     -28.2%
CMakeFiles/SPARSE_TEST.dir/test/sparse/csr_transpose.cu.o                          39.1          28.1       -11.0     -28.1%
CMakeFiles/LINALG_TEST.dir/test/linalg/reduce_rows_by_key.cu.o                     41.9          30.2       -11.7     -28.0%
Files/raft_distance_lib.dir/src/distance/neighbors/refine_d_uint64_t_int8_t.cu.o   36.9          26.6       -10.3     -28.0%
CMakeFiles/DISTANCE_TEST.dir/test/distance/dist_euc_exp.cu.o                       47.6          34.4       -13.2     -27.7%
CMakeFiles/SPARSE_TEST.dir/test/sparse/sort.cu.o                                   42.9          31.1       -11.8     -27.4%
CMakeFiles/STATS_TEST.dir/test/stats/information_criterion.cu.o                    39.6          28.8       -10.8     -27.3%
CMakeFiles/SPARSE_TEST.dir/test/sparse/degree.cu.o                                 38.7          28.2       -10.5     -27.1%
CMakeFiles/MATRIX_TEST.dir/test/matrix/math.cu.o                                   46.9          34.2       -12.7     -27.1%
CMakeFiles/DISTANCE_TEST.dir/test/distance/dist_eucsqrt_exp.cu.o                   47.1          34.4       -12.7     -27.0%
.dir/src/distance/distance/specializations/detail/kernels/tanh_kernel_float.cu.o   45.8          33.4       -12.3     -26.9%
CMakeFiles/DISTANCE_TEST.dir/test/distance/dist_russell_rao.cu.o                   48.2          35.3       -13.0     -26.9%
tance_lib.dir/src/distance/distance/specializations/fused_l2_nn_float_int64.cu.o   50.7          37.1       -13.6     -26.8%
CMakeFiles/RANDOM_TEST.dir/test/random/rng_int.cu.o                                50.0          36.7       -13.3     -26.6%
CMakeFiles/raft_distance_lib.dir/src/distance/cluster/kmeans_fit_float.cu.o        67.7          49.7       -18.0     -26.6%
CMakeFiles/SPARSE_TEST.dir/test/sparse/csr_to_dense.cu.o                           37.5          27.5        -9.9     -26.5%
CMakeFiles/raft_distance_lib.dir/src/distance/cluster/kmeans_fit_double.cu.o       66.5          48.9       -17.6     -26.5%
ance_lib.dir/src/distance/distance/specializations/fused_l2_nn_double_int64.cu.o   38.0          27.9       -10.1     -26.5%
CMakeFiles/LINALG_TEST.dir/test/linalg/add.cu.o                                    39.4          29.0       -10.4     -26.5%
CMakeFiles/LINALG_TEST.dir/test/linalg/unary_op.cu.o                               51.6          37.9       -13.6     -26.4%
CMakeFiles/SPARSE_TEST.dir/test/sparse/convert_csr.cu.o                            47.3          34.9       -12.4     -26.3%
stance_lib.dir/src/distance/distance/specializations/fused_l2_nn_double_int.cu.o   36.6          27.0        -9.6     -26.3%
CMakeFiles/SPARSE_TEST.dir/test/sparse/reduce.cu.o                                 44.1          32.5       -11.6     -26.3%
CMakeFiles/raft_distance_lib.dir/src/distance/distance/pairwise_distance.cu.o      30.0          22.1        -7.9     -26.2%
MakeFiles/raft_distance_lib.dir/src/distance/cluster/update_centroids_float.cu.o   37.3          27.5        -9.8     -26.2%
CMakeFiles/DISTANCE_TEST.dir/test/distance/dist_minkowski.cu.o                     47.2          34.8       -12.3     -26.1%
CMakeFiles/NEIGHBORS_TEST.dir/test/neighbors/epsilon_neighborhood.cu.o             44.0          32.5       -11.5     -26.1%
istance/distance/specializations/detail/kernels/polynomial_kernel_float_int.cu.o   47.4          35.0       -12.3     -26.0%
ce/distance/specializations/detail/l2_sqrt_unexpanded_float_float_float_int.cu.o   46.6          34.5       -12.1     -25.9%
CMakeFiles/DISTANCE_TEST.dir/test/distance/dist_correlation.cu.o                   45.3          33.6       -11.7     -25.9%
CMakeFiles/SPARSE_TEST.dir/test/sparse/csr_row_slice.cu.o                          36.4          27.0        -9.4     -25.8%
CMakeFiles/DISTANCE_TEST.dir/test/distance/dist_kl_divergence.cu.o                 47.6          35.4       -12.3     -25.8%
CMakeFiles/SPARSE_DIST_TEST.dir/test/sparse/dist_coo_spmv.cu.o                     61.3          45.5       -15.8     -25.7%
CMakeFiles/STATS_TEST.dir/test/stats/kl_divergence.cu.o                            36.5          27.1        -9.4     -25.7%
eFiles/raft_distance_lib.dir/src/distance/neighbors/refine_h_uint64_t_float.cu.o   35.0          26.0        -9.0     -25.7%
Files/raft_distance_lib.dir/src/distance/neighbors/refine_h_uint64_t_int8_t.cu.o   36.0          26.7        -9.2     -25.7%
CMakeFiles/LINALG_TEST.dir/test/linalg/binary_op.cu.o                              45.4          33.8       -11.6     -25.6%
istance/distance/specializations/detail/russel_rao_float_float_float_uint32.cu.o   41.1          30.6       -10.5     -25.5%
c/distance/distance/specializations/detail/russel_rao_float_float_float_int.cu.o   42.7          31.8       -10.8     -25.4%
CORE_TEST                                                                           0.7           0.5        -0.2     -25.1%
nce_lib.dir/src/distance/neighbors/specializations/refine_h_uint64_t_int8_t.cu.o   36.4          27.3        -9.1     -24.9%
CMakeFiles/SPARSE_TEST.dir/test/sparse/norm.cu.o                                   40.9          30.8       -10.2     -24.9%
CMakeFiles/SPARSE_TEST.dir/test/sparse/filter.cu.o                                 54.0          40.6       -13.3     -24.7%
CMakeFiles/STATS_TEST.dir/test/stats/weighted_mean.cu.o                            52.2          39.3       -12.9     -24.7%
CMakeFiles/MATRIX_TEST.dir/test/matrix/matrix.cu.o                                 44.9          33.8       -11.0     -24.6%
akeFiles/raft_distance_lib.dir/src/distance/cluster/update_centroids_double.cu.o   37.5          28.3        -9.2     -24.6%
CMakeFiles/CORE_TEST.dir/test/core/operators_device.cu.o                           35.7          26.9        -8.8     -24.6%
ance_lib.dir/src/distance/neighbors/specializations/refine_h_uint64_t_float.cu.o   36.0          27.2        -8.8     -24.5%
CMakeFiles/STATS_TEST.dir/test/stats/histogram.cu.o                                39.2          29.6        -9.6     -24.5%
CMakeFiles/RANDOM_TEST.dir/test/random/rmat_rectangular_generator.cu.o             39.8          30.1        -9.7     -24.4%
CMakeFiles/LINALG_TEST.dir/test/linalg/power.cu.o                                  42.1          31.9       -10.2     -24.2%
CMakeFiles/RANDOM_TEST.dir/test/random/multi_variable_gaussian.cu.o                50.8          38.6       -12.2     -24.0%
CMakeFiles/LINALG_TEST.dir/test/linalg/map.cu.o                                    51.6          39.3       -12.4     -24.0%
CMakeFiles/MATRIX_TEST.dir/test/matrix/gather.cu.o                                 51.2          38.9       -12.2     -23.9%
CMakeFiles/DISTANCE_TEST.dir/test/distance/dist_chebyshev.cu.o                     45.6          34.7       -10.9     -23.8%
CMakeFiles/LINALG_TEST.dir/test/linalg/rsvd.cu.o                                   53.5          40.8       -12.7     -23.7%
CMakeFiles/RANDOM_TEST.dir/test/random/sample_without_replacement.cu.o             65.2          50.0       -15.1     -23.2%
CMakeFiles/STATS_TEST.dir/test/stats/dispersion.cu.o                               43.3          33.3       -10.0     -23.1%
stance/distance/specializations/detail/l2_expanded_double_double_double_int.cu.o   46.1          35.5       -10.6     -23.0%
CMakeFiles/LABEL_TEST.dir/test/label/label.cu.o                                    42.2          32.5        -9.6     -22.9%
b.dir/src/distance/distance/specializations/detail/l1_float_float_float_int.cu.o   45.2          34.9       -10.3     -22.9%
CMakeFiles/DISTANCE_TEST.dir/test/distance/dist_cos.cu.o                           47.9          37.1       -10.9     -22.7%
e/distance/specializations/detail/l2_sqrt_expanded_double_double_double_int.cu.o   46.9          36.3       -10.6     -22.6%
CMakeFiles/STATS_TEST.dir/test/stats/contingencyMatrix.cu.o                        63.0          48.8       -14.2     -22.6%
CMakeFiles/LINALG_TEST.dir/test/linalg/mean_squared_error.cu.o                     37.5          29.1        -8.4     -22.4%
CMakeFiles/STATS_TEST.dir/test/stats/adjusted_rand_index.cu.o                      71.0          55.3       -15.7     -22.1%
CMakeFiles/STATS_TEST.dir/test/stats/minmax.cu.o                                   41.7          32.5        -9.2     -22.1%
stance/distance/specializations/detail/kernels/polynomial_kernel_double_int.cu.o   33.8          26.4        -7.3     -21.7%
CMakeFiles/RANDOM_TEST.dir/test/random/rng_discrete.cu.o                           50.7          39.6       -11.0     -21.7%
src/distance/distance/specializations/detail/kernels/gram_matrix_base_float.cu.o   44.2          34.6        -9.6     -21.7%
CMakeFiles/LINALG_TEST.dir/test/linalg/reduce_cols_by_key.cu.o                     48.3          37.9       -10.5     -21.6%
istance_lib.dir/src/distance/distance/specializations/fused_l2_nn_float_int.cu.o   46.1          36.3        -9.8     -21.3%
CMakeFiles/LINALG_TEST.dir/test/linalg/sqrt.cu.o                                   40.3          31.8        -8.5     -21.2%
distance/specializations/detail/l2_sqrt_unexpanded_float_float_float_uint32.cu.o   43.4          34.3        -9.2     -21.1%
CMakeFiles/RANDOM_TEST.dir/test/random/rng.cu.o                                    62.2          49.1       -13.1     -21.0%
CMakeFiles/raft_distance_lib.dir/src/distance/neighbors/ivfpq_deserialize.cu.o     53.6          42.3       -11.3     -21.0%
CMakeFiles/RANDOM_TEST.dir/test/random/make_regression.cu.o                        54.4          43.0       -11.4     -20.9%
CMakeFiles/STATS_TEST.dir/test/stats/meanvar.cu.o                                  40.4          32.0        -8.4     -20.8%
ir/src/distance/distance/specializations/detail/l1_float_float_float_uint32.cu.o   45.0          35.9        -9.1     -20.3%
ance/distance/specializations/detail/l2_unexpanded_float_float_float_uint32.cu.o   43.0          34.4        -8.6     -19.9%
CMakeFiles/SOLVERS_TEST.dir/test/sparse/mst.cu.o                                   65.3          52.4       -12.9     -19.8%
CMakeFiles/STATS_TEST.dir/test/stats/trustworthiness.cu.o                          78.1          62.7       -15.4     -19.7%
CMakeFiles/MATRIX_TEST.dir/test/matrix/slice.cu.o                                  38.6          31.1        -7.5     -19.5%
CMakeFiles/MATRIX_TEST.dir/test/matrix/columnSort.cu.o                             57.7          46.4       -11.3     -19.5%
CMakeFiles/NEIGHBORS_TEST.dir/test/neighbors/faiss_mr.cu.o                         56.7          45.7       -11.1     -19.5%
CMakeFiles/STATS_TEST.dir/test/stats/r2_score.cu.o                                 70.4          56.9       -13.5     -19.2%
CMakeFiles/raft_distance_lib.dir/src/distance/neighbors/ivfpq_build.cu.o          156.9         126.9       -30.0     -19.1%
CMakeFiles/LINALG_TEST.dir/test/linalg/matrix_vector.cu.o                          69.5          56.3       -13.2     -19.0%
CMakeFiles/SOLVERS_TEST.dir/test/cluster/cluster_solvers_deprecated.cu.o           69.8          56.9       -13.0     -18.6%
CMakeFiles/raft_distance_lib.dir/src/distance/cluster/cluster_cost_double.cu.o     39.9          32.7        -7.3     -18.2%
CMakeFiles/raft_distance_lib.dir/src/distance/distance/fused_l2_min_arg.cu.o       37.1          30.4        -6.7     -18.2%
CMakeFiles/STATS_TEST.dir/test/stats/completeness_score.cu.o                       65.8          53.9       -11.9     -18.1%
CMakeFiles/STATS_TEST.dir/test/stats/homogeneity_score.cu.o                        63.2          51.8       -11.4     -18.0%
CMakeFiles/SOLVERS_TEST.dir/test/lap/lap.cu.o                                      51.8          42.6        -9.3     -17.9%
CMakeFiles/CLUSTER_TEST.dir/test/cluster/kmeans.cu.o                              136.6         112.3       -24.3     -17.8%
CMakeFiles/NEIGHBORS_TEST.dir/test/neighbors/haversine.cu.o                        47.7          39.2        -8.5     -17.7%
CMakeFiles/raft_distance_lib.dir/src/distance/cluster/cluster_cost_float.cu.o      40.0          33.0        -7.0     -17.4%
CMakeFiles/SPARSE_TEST.dir/test/sparse/symmetrize.cu.o                             52.5          43.4        -9.1     -17.4%
CMakeFiles/LINALG_TEST.dir/test/linalg/normalize.cu.o                              84.0          69.5       -14.5     -17.3%
CMakeFiles/CLUSTER_TEST.dir/test/cluster/cluster_solvers.cu.o                      86.0          71.4       -14.6     -17.0%
nce_lib.dir/src/distance/distance/specializations/detail/hellinger_expanded.cu.o   62.8          52.2       -10.6     -16.9%
aft_distance_lib.dir/src/distance/distance/specializations/detail/chebyshev.cu.o   71.4          59.4       -12.0     -16.8%
CMakeFiles/STATS_TEST.dir/test/stats/v_measure.cu.o                                60.2          50.5        -9.7     -16.2%
CMakeFiles/STATS_TEST.dir/test/stats/accuracy.cu.o                                 66.7          56.3       -10.4     -15.6%
CMakeFiles/DISTANCE_TEST.dir/test/distance/gram.cu.o                               56.1          47.5        -8.7     -15.4%
CMakeFiles/raft_distance_lib.dir/src/distance/neighbors/ivfpq_serialize.cu.o       45.1          38.4        -6.7     -14.9%
CMakeFiles/DISTANCE_TEST.dir/test/distance/fused_l2_nn.cu.o                        75.3          64.2       -11.1     -14.8%
CMakeFiles/LINALG_TEST.dir/test/linalg/reduce.cu.o                                 68.3          58.2       -10.1     -14.8%
CMakeFiles/LINALG_TEST.dir/test/linalg/matrix_vector_op.cu.o                       70.5          60.2       -10.3     -14.6%
libraft_distance.so                                                                 1.4           1.2        -0.2     -14.3%
CMakeFiles/CORE_TEST.dir/test/core/numpy_serializer.cu.o                           75.9          65.2       -10.7     -14.1%
nce_lib.dir/src/distance/distance/specializations/detail/hamming_unexpanded.cu.o   67.3          58.2        -9.2     -13.6%
CMakeFiles/STATS_TEST.dir/test/stats/mutual_info_score.cu.o                        61.0          52.7        -8.3     -13.6%
CMakeFiles/MATRIX_TEST.dir/test/matrix/linewise_op.cu.o                            80.1          69.3       -10.7     -13.4%
CMakeFiles/STATS_TEST.dir/test/stats/silhouette_score.cu.o                         71.2          62.1        -9.1     -12.8%
neighbors/specializations/detail/ivfpq_compute_similarity_float_no_smem_lut.cu.o  265.5         233.6       -32.0     -12.0%
raft_nn_lib.dir/src/nn/specializations/detail/ball_cover_lowdim_pass_one_2d.cu.o  110.0          97.0       -13.0     -11.8%
neighbors/specializations/detail/ivfpq_compute_similarity_float_no_basediff.cu.o  245.2         216.5       -28.7     -11.7%
nce/distance/specializations/detail/jensen_shannon_double_double_double_int.cu.o  217.1         192.0       -25.1     -11.6%
STATS_TEST                                                                          0.5           0.5        -0.1     -11.5%
/neighbors/specializations/detail/ivfpq_compute_similarity_fp8s_no_basediff.cu.o  274.3         243.3       -31.0     -11.3%
CMakeFiles/MATRIX_TEST.dir/test/matrix/select_k.cu.o                              469.2         416.2       -53.0     -11.3%
CMakeFiles/SPARSE_DIST_TEST.dir/test/sparse/distance.cu.o                          81.7          72.5        -9.2     -11.3%
CMakeFiles/STATS_TEST.dir/test/stats/regression_metrics.cu.o                       67.6          60.3        -7.3     -10.8%
raft_nn_lib.dir/src/nn/specializations/detail/ball_cover_lowdim_pass_one_3d.cu.o  116.7         104.2       -12.5     -10.7%
/neighbors/specializations/detail/ivfpq_compute_similarity_fp8u_no_basediff.cu.o  271.2         242.6       -28.7     -10.6%
CMakeFiles/CLUSTER_TEST.dir/test/cluster/kmeans_balanced.cu.o                     206.4         184.5       -21.8     -10.6%
stance/distance/specializations/detail/l2_expanded_float_float_float_uint32.cu.o  186.3         166.7       -19.7     -10.6%
e/distance/specializations/detail/l2_sqrt_expanded_float_float_float_uint32.cu.o  184.9         165.4       -19.5     -10.5%
CMakeFiles/SPARSE_NEIGHBORS_TEST.dir/test/sparse/neighbors/brute_force.cu.o       260.7         233.4       -27.3     -10.5%
t_distance_lib.dir/src/distance/distance/specializations/detail/correlation.cu.o   79.6          71.3        -8.3     -10.4%
NEIGHBORS_TEST                                                                      1.0           0.9        -0.1     -10.1%
/raft_distance_lib.dir/src/distance/neighbors/ivfpq_search_uint8_t_uint64_t.cu.o 1043.6         937.8       -105.8     -10.1%
istance/distance/specializations/detail/kl_divergence_float_float_float_int.cu.o  197.7         178.0       -19.7     -10.0%
/distance/distance/specializations/detail/l2_expanded_float_float_float_int.cu.o  190.1         172.0       -18.2     -9.6%
CMakeFiles/NEIGHBORS_TEST.dir/test/neighbors/knn.cu.o                             367.5         332.7       -34.8     -9.5%
CMakeFiles/LINALG_TEST.dir/test/linalg/norm.cu.o                                   74.8          67.7        -7.1     -9.5%
istance/neighbors/specializations/detail/ivfpq_compute_similarity_half_fast.cu.o  259.8         235.8       -24.0     -9.3%
CMakeFiles/raft_nn_lib.dir/src/nn/specializations/ball_cover_build_index.cu.o     137.8         125.1       -12.7     -9.2%
ance/distance/specializations/detail/l2_sqrt_expanded_float_float_float_int.cu.o  180.8         164.3       -16.5     -9.1%
raft_distance_lib.dir/src/distance/distance/specializations/detail/canberra.cu.o  250.0         227.7       -22.3     -8.9%
CMakeFiles/NEIGHBORS_TEST.dir/test/neighbors/ball_cover.cu.o                      131.1         119.5       -11.6     -8.9%
ance/distance/specializations/detail/kl_divergence_double_double_double_int.cu.o  258.0         235.1       -22.8     -8.9%
istance/neighbors/specializations/detail/ivfpq_compute_similarity_fp8u_fast.cu.o  275.1         251.5       -23.6     -8.6%
stance/neighbors/specializations/detail/ivfpq_compute_similarity_float_fast.cu.o  253.1         232.1       -21.0     -8.3%
r/src/distance/neighbors/specializations/detail/ivfpq_search_float_uint32_t.cu.o  836.5         767.3       -69.2     -8.3%
r/src/distance/neighbors/specializations/detail/ivfpq_search_float_uint64_t.cu.o  964.1         888.0       -76.2     -7.9%
/neighbors/specializations/detail/ivfpq_compute_similarity_fp8u_no_smem_lut.cu.o  278.9         258.2       -20.7     -7.4%
Files/raft_nn_lib.dir/src/nn/specializations/brute_force_knn_long_float_int.cu.o  360.4         334.9       -25.6     -7.1%
CMakeFiles/NEIGHBORS_TEST.dir/test/neighbors/refine.cu.o                          163.3         152.2       -11.2     -6.8%
keFiles/raft_nn_lib.dir/src/nn/specializations/fused_l2_knn_int_float_false.cu.o  282.6         264.0       -18.6     -6.6%
ance/distance/specializations/detail/kl_divergence_float_float_float_uint32.cu.o  197.3         184.6       -12.8     -6.5%
/neighbors/specializations/detail/ivfpq_compute_similarity_fp8s_no_smem_lut.cu.o  299.8         280.6       -19.1     -6.4%
CLUSTER_TEST                                                                        0.4           0.3        -0.0     -6.3%
/neighbors/specializations/detail/ivfpq_compute_similarity_half_no_smem_lut.cu.o  255.8         241.3       -14.4     -5.6%
s/raft_nn_lib.dir/src/nn/specializations/brute_force_knn_uint32_t_float_int.cu.o  334.2         315.7       -18.6     -5.6%
CMakeFiles/UTILS_TEST.dir/test/util/bitonic_sort.cu.o                             234.7         221.7       -13.0     -5.5%
CMakeFiles/raft_nn_lib.dir/src/nn/specializations/ball_cover_knn_query.cu.o       880.9         832.3       -48.6     -5.5%
nce/distance/specializations/detail/jensen_shannon_float_float_float_uint32.cu.o  200.0         189.4       -10.6     -5.3%
akeFiles/raft_nn_lib.dir/src/nn/specializations/fused_l2_knn_int_float_true.cu.o  287.1         272.0       -15.2     -5.3%
akeFiles/NEIGHBORS_TEST.dir/test/neighbors/ann_ivf_pq/test_uint8_t_uint64_t.cu.o  747.2         708.5       -38.7     -5.2%
DISTANCE_TEST                                                                       0.4           0.3        -0.0     -5.0%
CMakeFiles/SPARSE_NEIGHBORS_TEST.dir/test/sparse/neighbors/knn_graph.cu.o         134.6         128.1        -6.6     -4.9%
s/raft_distance_lib.dir/src/distance/neighbors/ivfpq_search_int8_t_uint64_t.cu.o 1034.0         983.8       -50.2     -4.9%
s/raft_distance_lib.dir/src/distance/distance/specializations/detail/cosine.cu.o  336.6         320.3       -16.3     -4.9%
CMakeFiles/raft_nn_lib.dir/src/nn/specializations/ball_cover_all_knn_query.cu.o   941.0         897.8       -43.2     -4.6%
ance_lib.dir/src/distance/neighbors/specializations/refine_d_uint64_t_float.cu.o  517.6         494.4       -23.2     -4.5%
eFiles/raft_nn_lib.dir/src/nn/specializations/fused_l2_knn_long_float_false.cu.o  276.6         264.2       -12.3     -4.5%
libraft_nn.so                                                                       0.3           0.3        -0.0     -4.3%
raft_nn_lib.dir/src/nn/specializations/detail/ball_cover_lowdim_pass_two_3d.cu.o  168.3         161.3        -7.0     -4.1%
keFiles/raft_nn_lib.dir/src/nn/specializations/fused_l2_knn_long_float_true.cu.o  287.2         275.4       -11.8     -4.1%
iles/raft_nn_lib.dir/src/nn/specializations/brute_force_knn_long_float_uint.cu.o  354.7         340.2       -14.5     -4.1%
ance/distance/specializations/detail/lp_unexpanded_double_double_double_int.cu.o  622.1         597.9       -24.2     -3.9%
CMakeFiles/install.util                                                             0.3           0.2        -0.0     -3.8%
MakeFiles/NEIGHBORS_TEST.dir/test/neighbors/ann_ivf_pq/test_int8_t_uint64_t.cu.o  730.8         703.0       -27.8     -3.8%
CMakeFiles/DISTANCE_TEST.dir/test/distance/dist_adj.cu.o                          224.7         216.3        -8.4     -3.7%
/raft_nn_lib.dir/src/nn/specializations/brute_force_knn_uint32_t_float_uint.cu.o  333.7         321.3       -12.4     -3.7%
ance/distance/specializations/detail/lp_unexpanded_float_float_float_uint32.cu.o  487.5         470.5       -17.0     -3.5%
ir/src/distance/neighbors/specializations/detail/ivfpq_search_float_int64_t.cu.o  951.0         918.2       -32.8     -3.4%
CMakeFiles/CLUSTER_TEST.dir/test/cluster/linkage.cu.o                            1166.6        1126.6       -40.0     -3.4%
stance/distance/specializations/detail/jensen_shannon_float_float_float_int.cu.o  387.6         375.7       -11.9     -3.1%
istance/distance/specializations/detail/lp_unexpanded_float_float_float_int.cu.o  482.4         469.2       -13.1     -2.7%
istance/neighbors/specializations/detail/ivfpq_compute_similarity_fp8s_fast.cu.o  279.4         272.1        -7.2     -2.6%
/neighbors/specializations/detail/ivfpq_compute_similarity_half_no_basediff.cu.o  243.4         237.2        -6.1     -2.5%
CMakeFiles/SOLVERS_TEST.dir/test/linalg/eigen_solvers.cu.o                        815.6         795.9       -19.7     -2.4%
ce_lib.dir/src/distance/neighbors/specializations/refine_d_uint64_t_uint8_t.cu.o  811.9         792.4       -19.5     -2.4%
nce_lib.dir/src/distance/neighbors/specializations/refine_d_uint64_t_int8_t.cu.o  797.5         781.1       -16.4     -2.1%
LINALG_TEST                                                                         1.4           1.4        -0.0     -1.6%
es/raft_distance_lib.dir/src/distance/neighbors/ivfpq_search_float_uint64_t.cu.o  991.4         975.7       -15.7     -1.6%
CMakeFiles/NEIGHBORS_TEST.dir/test/neighbors/ann_ivf_pq/test_float_uint32_t.cu.o  617.6         607.8        -9.8     -1.6%
raft_nn_lib.dir/src/nn/specializations/detail/ball_cover_lowdim_pass_two_2d.cu.o  165.6         163.1        -2.4     -1.5%
CMakeFiles/NEIGHBORS_TEST.dir/test/neighbors/ann_ivf_pq/test_float_int64_t.cu.o   732.0         725.8        -6.2     -0.8%
akeFiles/SPARSE_NEIGHBORS_TEST.dir/test/sparse/neighbors/connect_components.cu.o  362.0         359.1        -2.9     -0.8%
SPARSE_DIST_TEST                                                                    0.1           0.1        -0.0     -0.7%
UTILS_TEST                                                                          0.1           0.1        +0.0     +0.0%
CMakeFiles/NEIGHBORS_TEST.dir/test/neighbors/fused_l2_knn.cu.o                    911.6         915.8        +4.2     +0.5%
CMakeFiles/NEIGHBORS_TEST.dir/test/neighbors/selection.cu.o                       389.8         392.1        +2.3     +0.6%
SOLVERS_TEST                                                                        0.1           0.1        +0.0     +0.7%
CMakeFiles/NEIGHBORS_TEST.dir/test/neighbors/ann_ivf_flat.cu.o                   1515.9        1533.0       +17.1     +1.1%
CMakeFiles/NEIGHBORS_TEST.dir/test/neighbors/ann_ivf_pq/test_float_uint64_t.cu.o  712.4         727.7       +15.3     +2.2%
CMakeFiles/UTILS_TEST.dir/test/util/integer_utils.cpp.o                             2.3           2.3        +0.1     +2.5%
SPARSE_NEIGHBORS_TEST                                                               0.2           0.2        +0.0     +7.1%
RANDOM_TEST                                                                         0.5           0.5        +0.0     +8.7%
MATRIX_TEST                                                                         0.4           0.5        +0.1     +13.3%
SPARSE_TEST                                                                         0.4           0.6        +0.2     +43.0%

from pathlib import Path
from collections import Counter

def parse_ninja_log(log_path):
    text = Path(log_path).read_text()
    start, end, mtime, path, cmd = list(zip(*[line.split("\t") for line in text.splitlines()[1:]]))
    start = list(map(int, start))
    end = list(map(int, end))
    seconds = [(e - s) / 1000. for e, s in zip(end, start)]
    mtime = list(map(int, mtime))

    return dict(
        start=start,
        end=end,
        seconds=seconds,
        mtime=mtime,
        path=path,
        cmd=cmd
    )

def discard_earlier_builds(d):
    prev_end = 0
    start_index = 0
    # end must be monotonically increasing. If we find and end value that is
    # lower than the end value on the previous row, we know that a new build has
    # started.
    for i, end in enumerate(d['end']):
        if end < prev_end:
            start_index = i
        prev_end = end

    return {k: v[start_index:] for k, v in d.items()}

def print_duplicates(d):
    # d is a dict returned by parse_ninja_log
    print(f"  # {'path':<60}     sec  cmd hash             sec  other cmd hash")
    dup_paths = sorted(set(p for p, count in Counter(d['path']).items() if count > 1))
    for i, p in enumerate(dup_paths):
        print(f"{i:3d} {p[-60:]:<60}: ", end="")
        for p_other, sec, cmd in zip(d['path'], d['seconds'], d['cmd']):
            if p == p_other:
                print(f"{sec:6.1f}  ({cmd})", end="")
        print()

compiled = parse_ninja_log("/home/ahendriksen/Downloads/ninja_log_spdlog/ninja_log_spdlog_compiled")
headers = parse_ninja_log("/home/ahendriksen/Downloads/ninja_log_spdlog/ninja_log_spdlog_headers")

compiled = discard_earlier_builds(compiled)
headers = discard_earlier_builds(headers)

# Print sum of compile times of each translation unit:
print(f"Sum of compile times for compiled spdlog:    {sum(compiled['seconds']):.1f} seconds")
print(f"Sum of compile times for header-only spdlog: {sum(headers['seconds']):.1f} seconds\n")

compiled_times = dict(zip(compiled['path'], compiled['seconds']))
headers_times = dict(zip(headers['path'], headers['seconds']))

print("Compile times for paths only found in headers (seconds):")
for p in set(headers['path']) - set(compiled['path']):
    print(f"{p[-80:]:<80} {headers_times[p]:6.1f}")


# Compare compile time per path between compiled and headers:
results = [(path, headers_times[path], compiled_times[path]) for path in compiled_times.keys()]
# Add relative change as a percentage
results = [(p, hsec, csec, csec - hsec, 100. * (csec / hsec - 1)) for p, hsec, csec in results]
# Sort by relative change
results = sorted(results, key=lambda x: x[4])

# Print results
print("\nComparison of compile times between headers and compiled: ")
print(f"{'path':<70}       header (s)  compiled (s)  change (s) change (%)")
for p, hsec, csec, diff, rel in results:
    print(f"{p[-80:]:<80} {hsec:6.1f}        {csec:6.1f}       {diff:+5.1f}     {rel:+4.1f}%")

cjnolet · 2023-03-12T19:57:24Z

I'm proposing that RMM allow the user to set whether the compiled or header-only spdlog target is used. I would honestly prefer if we just defaulted to compiled everywhere accept for users who "really" want fully header-only operation.

ahendriksen · 2023-03-13T08:37:44Z

Thanks for looking into this Corey! I agree it is a good idea to consider using the precompiled spdlog library. If we go the precompiled route, would this require adding a runtime dependency on spdlog in the conda package as well? We currently do not seem to have a Conda dependency on spdlog.

ahendriksen added the bug Something isn't working label Feb 23, 2023

ahendriksen mentioned this issue Feb 23, 2023

[BUG] Consider removing spdlog dependency for substantial compile time improvements rapidsai/rmm#1222

Open

ahendriksen mentioned this issue Feb 23, 2023

[FEA] Investigating build times and number of specializations #1201

Open

ahendriksen mentioned this issue Mar 27, 2023

[ENH] Add extern template for ivfflat_interleaved_scan #1360

Merged

ahendriksen mentioned this issue Apr 4, 2023

Move RMM_LOGGING_ASSERT into separate header rapidsai/rmm#1241

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG] Consider removing spdlog dependency for substantial compile time improvements #1300

[BUG] Consider removing spdlog dependency for substantial compile time improvements #1300

ahendriksen commented Feb 23, 2023

ahendriksen commented Feb 23, 2023 •

edited

Loading

cjnolet commented Mar 8, 2023 •

edited

Loading

ahendriksen commented Mar 8, 2023

cjnolet commented Mar 12, 2023

ahendriksen commented Mar 13, 2023

[BUG] Consider removing spdlog dependency for substantial compile time improvements #1300

[BUG] Consider removing spdlog dependency for substantial compile time improvements #1300

Comments

ahendriksen commented Feb 23, 2023

ahendriksen commented Feb 23, 2023 • edited Loading

cjnolet commented Mar 8, 2023 • edited Loading

ahendriksen commented Mar 8, 2023

cjnolet commented Mar 12, 2023

ahendriksen commented Mar 13, 2023

ahendriksen commented Feb 23, 2023 •

edited

Loading

cjnolet commented Mar 8, 2023 •

edited

Loading