
perf: reduce allocations in MergedFileDescriptors #59

Merged 2 commits into main on Apr 13, 2023

Conversation

mark-rushakoff
Member

While profiling memory usage on the SDK's sim non-determinism test, MergedFileDescriptors showed up as a hotspot. By reusing a single gzip.Reader and bytes.Buffer instead of instantiating new ones on every loop iteration, I was able to take the GC CPU fraction for that test (on a somewhat small case of 10 blocks of block size 10) from about 5.1% to about 2.6%, roughly cutting GC time in half. This change also pre-sizes a couple of slices that have an obvious final size.

In the real world, I don't think apps tend to have that many calls to NewInterfaceRegistry, but more efficient memory usage in that call certainly won't hurt.
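The pattern being described is roughly the following sketch. It is not the actual MergedFileDescriptors code; the function name, inputs, and proto types here are illustrative, and only the reuse of one gzip.Reader and one bytes.Buffer plus the pre-sized result slice reflect what the PR does.

```go
package example

import (
	"bytes"
	"compress/gzip"
	"io"

	"google.golang.org/protobuf/proto"
	"google.golang.org/protobuf/types/descriptorpb"
)

// decompressAll decompresses each gzipped file descriptor, reusing a
// single gzip.Reader and bytes.Buffer across iterations instead of
// allocating new ones per descriptor.
func decompressAll(gzipped [][]byte) ([]*descriptorpb.FileDescriptorProto, error) {
	// Pre-size the result slice; the final length is known up front.
	fds := make([]*descriptorpb.FileDescriptorProto, 0, len(gzipped))

	var zr gzip.Reader // reused: Reset repoints it at the next input
	var buf bytes.Buffer

	for _, gz := range gzipped {
		if err := zr.Reset(bytes.NewReader(gz)); err != nil {
			return nil, err
		}
		// Reset keeps the buffer's capacity, so later iterations avoid
		// re-growing it.
		buf.Reset()
		if _, err := io.Copy(&buf, &zr); err != nil {
			return nil, err
		}

		fd := &descriptorpb.FileDescriptorProto{}
		if err := proto.Unmarshal(buf.Bytes(), fd); err != nil {
			return nil, err
		}
		fds = append(fds, fd)
	}
	return fds, nil
}
```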

@mark-rushakoff requested a review from a team as a code owner on April 12, 2023 19:58
@mark-rushakoff
Member Author

I didn't look at open PRs before writing this, so I didn't see that #56 was another approach to tune this path.

In speeding up the non-determinism test, there was little CPU contention, so I didn't consider adding concurrency to this path.

To compare with the benchmark noted in #56, running from cosmos/cosmos-sdk@001c11e, I see a slight speedup in benchmarking simd q with cosmos/gogoproto at this PR, compared to a build with 0c06433.

$ hyperfine --warmup 4 -r 50 '/tmp/simd-yesperf q' '/tmp/simd-noperf q'
Benchmark 1: /tmp/simd-yesperf q
  Time (mean ± σ):     125.8 ms ±  13.2 ms    [User: 170.9 ms, System: 24.7 ms]
  Range (min … max):   117.9 ms … 195.6 ms    50 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 2: /tmp/simd-noperf q
  Time (mean ± σ):     129.7 ms ±   5.2 ms    [User: 221.1 ms, System: 25.8 ms]
  Range (min … max):   124.8 ms … 161.8 ms    50 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Summary
  '/tmp/simd-yesperf q' ran
    1.03 ± 0.12 times faster than '/tmp/simd-noperf q'

My preference would be to merge this change first, as it clearly does not risk reordering anything. Then I would make some adjustments to the approach in #56 -- in the sim tests in the SDK, I noticed that there were about 113 proto file descriptors. Instead of processing each in a distinct goroutine, I would start GOMAXPROCS goroutines, each reading from a single channel and each reusing its own gzip.Reader and bytes.Buffer. That should strike a good balance between maximizing CPU throughput and minimizing garbage creation.
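A hedged sketch of that worker-pool idea, not the actual #56 implementation: the function name, job/channel layout, and output type below are hypothetical. Each worker owns one gzip.Reader and one bytes.Buffer, so allocation stays bounded by GOMAXPROCS, and results are written by index so input order is preserved.

```go
package example

import (
	"bytes"
	"compress/gzip"
	"io"
	"runtime"
	"sync"
)

// decompressParallel fans the gzipped inputs out to GOMAXPROCS workers.
// Results land in a pre-sized slice at their original index, so no
// reordering can occur.
func decompressParallel(gzipped [][]byte) ([][]byte, error) {
	out := make([][]byte, len(gzipped)) // pre-sized: final length is known

	type job struct {
		idx int
		gz  []byte
	}
	jobs := make(chan job)

	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		firstErr error
	)
	setErr := func(err error) {
		mu.Lock()
		if firstErr == nil {
			firstErr = err
		}
		mu.Unlock()
	}

	for w := 0; w < runtime.GOMAXPROCS(0); w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Per-worker reusable state: one gzip.Reader and one buffer.
			var zr gzip.Reader
			var buf bytes.Buffer
			for j := range jobs {
				if err := zr.Reset(bytes.NewReader(j.gz)); err != nil {
					setErr(err)
					continue
				}
				buf.Reset()
				if _, err := io.Copy(&buf, &zr); err != nil {
					setErr(err)
					continue
				}
				// Copy out of the reused buffer before the next job
				// overwrites it.
				out[j.idx] = append([]byte(nil), buf.Bytes()...)
			}
		}()
	}

	for i, gz := range gzipped {
		jobs <- job{idx: i, gz: gz}
	}
	close(jobs)
	wg.Wait()

	return out, firstErr
}
```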

@julienrbrt
Member

Can you add a changelog?

@julienrbrt merged commit 7dca9a2 into main Apr 13, 2023
@julienrbrt deleted the mr/perf-reduce-gc branch April 13, 2023 12:55