
perf: reduce allocations in MergedFileDescriptors #59

Merged 2 commits into main on Apr 13, 2023

Conversation

mark-rushakoff
Member

While profiling memory usage on the SDK's sim non-determinism test, MergedFileDescriptors showed up as a hotspot. By reusing a single gzip.Reader and bytes.Buffer instead of instantiating new ones on every loop iteration, I was able to take the GC CPU fraction for that test (on a somewhat small case of 10 blocks of block size 10) from about 5.1% to about 2.6%, roughly cutting GC time in half. This change also pre-sizes a couple of slices that have an obvious final size.

In the real world, I don't think apps tend to have that many calls to NewInterfaceRegistry, but more efficient memory usage in that call certainly won't hurt.
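The pattern being described is roughly the following sketch. It is not the actual MergedFileDescriptors code; the function name, inputs, and proto types here are illustrative, and only the reuse of one gzip.Reader and one bytes.Buffer plus the pre-sized result slice reflect what the PR does.

```go
package example

import (
	"bytes"
	"compress/gzip"
	"io"

	"google.golang.org/protobuf/proto"
	"google.golang.org/protobuf/types/descriptorpb"
)

// decompressAll decompresses each gzipped file descriptor, reusing a
// single gzip.Reader and bytes.Buffer across iterations instead of
// allocating new ones per descriptor.
func decompressAll(gzipped [][]byte) ([]*descriptorpb.FileDescriptorProto, error) {
	// Pre-size the result slice; the final length is known up front.
	fds := make([]*descriptorpb.FileDescriptorProto, 0, len(gzipped))

	var zr gzip.Reader // reused: Reset repoints it at the next input
	var buf bytes.Buffer

	for _, gz := range gzipped {
		if err := zr.Reset(bytes.NewReader(gz)); err != nil {
			return nil, err
		}
		// Reset keeps the buffer's capacity, so later iterations avoid
		// re-growing it.
		buf.Reset()
		if _, err := io.Copy(&buf, &zr); err != nil {
			return nil, err
		}

		fd := &descriptorpb.FileDescriptorProto{}
		if err := proto.Unmarshal(buf.Bytes(), fd); err != nil {
			return nil, err
		}
		fds = append(fds, fd)
	}
	return fds, nil
}
```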

@mark-rushakoff requested a review from a team as a code owner on April 12, 2023 19:58
@mark-rushakoff
Member Author

I didn't look at open PRs before writing this, so I didn't see that #56 was another approach to tune this path.

In speeding up the non-determinism test, there was little CPU contention, so I didn't consider adding concurrency to this path.

To compare with the benchmark noted in #56, running from cosmos/cosmos-sdk@001c11e, I see a slight speedup in benchmarking simd q with cosmos/gogoproto at this PR, compared to a build with 0c06433.

$ hyperfine --warmup 4 -r 50 '/tmp/simd-yesperf q' '/tmp/simd-noperf q'
Benchmark 1: /tmp/simd-yesperf q
  Time (mean ± σ):     125.8 ms ±  13.2 ms    [User: 170.9 ms, System: 24.7 ms]
  Range (min … max):   117.9 ms … 195.6 ms    50 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 2: /tmp/simd-noperf q
  Time (mean ± σ):     129.7 ms ±   5.2 ms    [User: 221.1 ms, System: 25.8 ms]
  Range (min … max):   124.8 ms … 161.8 ms    50 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Summary
  '/tmp/simd-yesperf q' ran
    1.03 ± 0.12 times faster than '/tmp/simd-noperf q'

My preference would be to merge this change first, as it clearly does not risk reordering anything. Then I would make some adjustments to the approach in #56 -- in the sim tests in the SDK, I noticed that there were about 113 proto file descriptors. Instead of processing each in a distinct goroutine, I would start GOMAXPROCS goroutines, each reading from a single channel and each reusing its own gzip.Reader and bytes.Buffer. That should strike a good balance between maximizing CPU throughput and minimizing garbage creation.
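A hedged sketch of that worker-pool idea, not the actual #56 implementation: the function name, job/channel layout, and output type below are hypothetical. Each worker owns one gzip.Reader and one bytes.Buffer, so allocation stays bounded by GOMAXPROCS, and results are written by index so input order is preserved.

```go
package example

import (
	"bytes"
	"compress/gzip"
	"io"
	"runtime"
	"sync"
)

// decompressParallel fans the gzipped inputs out to GOMAXPROCS workers.
// Results land in a pre-sized slice at their original index, so no
// reordering can occur.
func decompressParallel(gzipped [][]byte) ([][]byte, error) {
	out := make([][]byte, len(gzipped)) // pre-sized: final length is known

	type job struct {
		idx int
		gz  []byte
	}
	jobs := make(chan job)

	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		firstErr error
	)
	setErr := func(err error) {
		mu.Lock()
		if firstErr == nil {
			firstErr = err
		}
		mu.Unlock()
	}

	for w := 0; w < runtime.GOMAXPROCS(0); w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Per-worker reusable state: one gzip.Reader and one buffer.
			var zr gzip.Reader
			var buf bytes.Buffer
			for j := range jobs {
				if err := zr.Reset(bytes.NewReader(j.gz)); err != nil {
					setErr(err)
					continue
				}
				buf.Reset()
				if _, err := io.Copy(&buf, &zr); err != nil {
					setErr(err)
					continue
				}
				// Copy out of the reused buffer before the next job
				// overwrites it.
				out[j.idx] = append([]byte(nil), buf.Bytes()...)
			}
		}()
	}

	for i, gz := range gzipped {
		jobs <- job{idx: i, gz: gz}
	}
	close(jobs)
	wg.Wait()

	return out, firstErr
}
```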

@julienrbrt
Member

Can you add a changelog?

@julienrbrt merged commit 7dca9a2 into main Apr 13, 2023
@julienrbrt deleted the mr/perf-reduce-gc branch April 13, 2023 12:55