Improve performance of immutable maps #1720

Merged
merged 3 commits into from
Jul 26, 2016

2opremio
Contributor

@2opremio 2opremio commented Jul 25, 2016

  • Replace github.com/mndrix/ps with github.com/weaveworks/ps (see Improve performance of maps ps#1)
  • LatestMap codec performance improvements and cleanups
    • Allocate all map entries of the intermediate representation at once
    • Use UnsafeMutableMap to improve performance of LatestMap construction
    • Remove gob encoder/decoder

Improves #1010 #1457 #971

@2opremio 2opremio changed the title [WIP] Improve performance when decoding/setting maps [WIP] Improve performance when decoding/setting immutable maps Jul 25, 2016
@tomwilkie
Contributor

I like that UnsafeMutableSet is on the ps.Tree, and not on the LatestMap.

What is the performance impact?
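
For readers unfamiliar with the distinction being discussed, the following is a toy sketch of a persistent `Set` next to an in-place `UnsafeMutableSet`. It uses a plain binary search tree with hypothetical types; it is not the ps implementation and makes no claim about its internals. It only shows why the unsafe variant allocates less while a map is still being built and nobody else holds a reference to it.

```go
package main

import "fmt"

// node is one entry of a toy persistent binary search tree.
// This is an illustration only, not the ps data structure.
type node struct {
	key         string
	value       int
	left, right *node
}

// Set returns a new tree containing key=value. It copies every node on the
// path from the root to the insertion point, so the receiver is unchanged.
func (n *node) Set(key string, value int) *node {
	if n == nil {
		return &node{key: key, value: value}
	}
	cp := *n // copy-on-write: one allocation per level visited
	switch {
	case key < n.key:
		cp.left = n.left.Set(key, value)
	case key > n.key:
		cp.right = n.right.Set(key, value)
	default:
		cp.value = value
	}
	return &cp
}

// UnsafeMutableSet updates the tree in place and allocates only for new keys.
// It is only safe while no other reference to the tree exists, for example
// while a decoder is still constructing the map.
func (n *node) UnsafeMutableSet(key string, value int) *node {
	if n == nil {
		return &node{key: key, value: value}
	}
	switch {
	case key < n.key:
		n.left = n.left.UnsafeMutableSet(key, value)
	case key > n.key:
		n.right = n.right.UnsafeMutableSet(key, value)
	default:
		n.value = value
	}
	return n
}

// Lookup returns the value stored under key, if any.
func (n *node) Lookup(key string) (int, bool) {
	for n != nil {
		switch {
		case key < n.key:
			n = n.left
		case key > n.key:
			n = n.right
		default:
			return n.value, true
		}
	}
	return 0, false
}

func main() {
	// Build phase: mutate freely, nothing else can observe the tree yet.
	var built *node
	for i, k := range []string{"m", "c", "x", "a"} {
		built = built.UnsafeMutableSet(k, i)
	}
	// Once shared, switch to the persistent Set so old versions stay valid.
	updated := built.Set("a", 42)
	before, _ := built.Lookup("a")
	after, _ := updated.Lookup("a")
	fmt.Println(before, after) // prints "3 42"
}
```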

@2opremio
Contributor Author

2opremio commented Jul 25, 2016

My measurements against the query service in the dev-c4 cluster show roughly a 10% reduction in CPU consumption. (For @rade: I initially saw about 30%, but it stabilized at ~10% once I removed load balancing from the equation by using a single replica and took longer profiles of 90s instead of 30s.)

Without the changes in this PR:

$ go tool pprof --seconds 90 http://localhost:4040/debug/pprof/profile
Fetching profile from http://localhost:4040/debug/pprof/profile?seconds=90
Please wait... (1m30s)
Saved profile in /Users/fons/pprof/pprof.localhost:4040.samples.cpu.001.pb.gz
Entering interactive mode (type "help" for commands)
(pprof) top10
21.96s of 41.59s total (52.80%)
Dropped 382 nodes (cum <= 0.21s)
Showing top 10 nodes out of 197 (cum >= 1.13s)
      flat  flat%   sum%        cum   cum%
     5.12s 12.31% 12.31%      8.14s 19.57%  runtime.scanobject
     3.32s  7.98% 20.29%      6.26s 15.05%  runtime.heapBitsSweepSpan
     2.94s  7.07% 27.36%      2.94s  7.07%  runtime.(*mspan).sweep.func1
     2.30s  5.53% 32.89%      8.21s 19.74%  runtime.mallocgc
     1.96s  4.71% 37.61%      1.98s  4.76%  runtime.heapBitsSetType
     1.57s  3.77% 41.38%      1.57s  3.77%  runtime.memmove
     1.24s  2.98% 44.36%      1.94s  4.66%  runtime.greyobject
     1.23s  2.96% 47.32%      2.38s  5.72%  github.com/weaveworks/scope/vendor/github.com/mndrix/ps.hashKey
     1.15s  2.77% 50.08%      1.15s  2.77%  runtime.stringiter2
     1.13s  2.72% 52.80%      1.13s  2.72%  runtime.heapBitsForObject
(pprof) 

With the changes:

$ go tool pprof --seconds 90 http://localhost:4040/debug/pprof/profile
Fetching profile from http://localhost:4040/debug/pprof/profile?seconds=90
Please wait... (1m30s)
Saved profile in /Users/fons/pprof/pprof.localhost:4040.samples.cpu.002.pb.gz
Entering interactive mode (type "help" for commands)
(pprof) top10
20.28s of 38.11s total (53.21%)
Dropped 409 nodes (cum <= 0.19s)
Showing top 10 nodes out of 200 (cum >= 1.01s)
      flat  flat%   sum%        cum   cum%
     4.91s 12.88% 12.88%      8.36s 21.94%  runtime.scanobject
     3.38s  8.87% 21.75%      5.54s 14.54%  runtime.heapBitsSweepSpan
     2.16s  5.67% 27.42%      2.16s  5.67%  runtime.(*mspan).sweep.func1
     1.93s  5.06% 32.48%      6.01s 15.77%  runtime.mallocgc
     1.53s  4.01% 36.50%      1.53s  4.01%  runtime.memmove
     1.44s  3.78% 40.28%      1.44s  3.78%  runtime.heapBitsSetType
     1.38s  3.62% 43.90%      2.15s  5.64%  runtime.greyobject
     1.33s  3.49% 47.39%      1.33s  3.49%  runtime.heapBitsForObject
     1.21s  3.18% 50.56%      2.22s  5.83%  github.com/weaveworks/scope/vendor/github.com/mndrix/ps.hashKey
     1.01s  2.65% 53.21%      1.01s  2.65%  runtime.stringiter2
(pprof) 

@2opremio
Contributor Author

2opremio commented Jul 25, 2016

After fixing the cut-and-paste bug in the recursive call things look better: ~18% improvement.

$ go tool pprof --seconds 90 http://localhost:4040/debug/pprof/profile
Fetching profile from http://localhost:4040/debug/pprof/profile?seconds=90
Please wait... (1m30s)
Saved profile in /Users/fons/pprof/pprof.localhost:4040.samples.cpu.005.pb.gz
Entering interactive mode (type "help" for commands)
(pprof) top10
17710ms of 34480ms total (51.36%)
Dropped 400 nodes (cum <= 172.40ms)
Showing top 10 nodes out of 198 (cum >= 970ms)
      flat  flat%   sum%        cum   cum%
    3500ms 10.15% 10.15%     6040ms 17.52%  runtime.scanobject
    2710ms  7.86% 18.01%     2710ms  7.86%  runtime.(*mspan).sweep.func1
    2520ms  7.31% 25.32%     5240ms 15.20%  runtime.heapBitsSweepSpan
    1620ms  4.70% 30.02%     6440ms 18.68%  runtime.mallocgc
    1550ms  4.50% 34.51%     1550ms  4.50%  runtime.memmove
    1340ms  3.89% 38.40%     2450ms  7.11%  github.com/weaveworks/scope/vendor/github.com/mndrix/ps.hashKey
    1320ms  3.83% 42.23%     1320ms  3.83%  runtime.heapBitsSetType
    1110ms  3.22% 45.45%     1110ms  3.22%  runtime.stringiter2
    1070ms  3.10% 48.55%     1620ms  4.70%  runtime.greyobject
     970ms  2.81% 51.36%      970ms  2.81%  runtime.heapBitsForObject
(pprof) 
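
As a rough cross-check of the quoted percentages, the total sampled CPU time of each 90s profile can be compared against the baseline. This is a sketch that treats the pprof "total" figure as a proxy for CPU consumption, which is an assumption: it ignores differences in load between runs. On the totals above it gives roughly 8% and 17%, in the same ballpark as the ~10% and ~18% mentioned.

```go
package main

import "fmt"

// reduction returns the relative drop from before to after, in percent.
func reduction(before, after float64) float64 {
	return (before - after) / before * 100
}

func main() {
	// Totals (in seconds) copied from the three pprof outputs above.
	const baseline = 41.59 // without the changes in this PR
	runs := []struct {
		label string
		total float64
	}{
		{"with the changes", 38.11},
		{"after the recursive-call fix", 34.48},
	}
	for _, r := range runs {
		fmt.Printf("%-30s %5.1f%%\n", r.label, reduction(baseline, r.total))
	}
}
```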

@tomwilkie
Contributor

That's a bit better!

@2opremio
Contributor Author

2opremio commented Jul 25, 2016

I've discovered that we are spending 10% of the app time ... parsing Unicode while hashing:

func hashKey(key string) uint64 {
    hash := offset64
    // Ranging over a string decodes it rune by rune (UTF-8 parsing).
    for _, codepoint := range key {
        hash ^= uint64(codepoint)
        hash *= prime64
    }
    return hash
}

An unsafe cast to a byte slice cuts CPU usage by another 10%:

$ go tool pprof --seconds 90 http://localhost:4040/debug/pprof/profile
Fetching profile from http://localhost:4040/debug/pprof/profile?seconds=90
Please wait... (1m30s)
Saved profile in /Users/fons/pprof/pprof.localhost:4040.samples.cpu.006.pb.gz
Entering interactive mode (type "help" for commands) 
(pprof) top5
11.88s of 31.75s total (37.42%)
Dropped 374 nodes (cum <= 0.16s)
Showing top 5 nodes out of 206 (cum >= 1.29s)
      flat  flat%   sum%        cum   cum%
     3.58s 11.28% 11.28%      6.09s 19.18%  runtime.scanobject
     3.06s  9.64% 20.91%      5.34s 16.82%  runtime.heapBitsSweepSpan
     2.28s  7.18% 28.09%      2.28s  7.18%  runtime.(*mspan).sweep.func1
     1.67s  5.26% 33.35%      6.12s 19.28%  runtime.mallocgc
     1.29s  4.06% 37.42%      1.29s  4.06%  runtime.memmove
(pprof) 
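
The idea behind the hashing change can be sketched without the unsafe cast: indexing a Go string yields raw bytes, so a byte-wise loop also skips UTF-8 decoding. This is a substitute technique chosen for the example, not the code merged in weaveworks/ps. The constant values are the standard 64-bit FNV-1a parameters and are assumed here; `hashKeyRunes` and `hashKeyBytes` are hypothetical names.

```go
package main

import "fmt"

// Standard 64-bit FNV-1a parameters (assumed values for this sketch).
const (
	offset64 uint64 = 14695981039346656037
	prime64  uint64 = 1099511628211
)

// hashKeyRunes mirrors the loop shown above: ranging over a string
// decodes UTF-8 and yields one code point per iteration.
func hashKeyRunes(key string) uint64 {
	hash := offset64
	for _, codepoint := range key {
		hash ^= uint64(codepoint)
		hash *= prime64
	}
	return hash
}

// hashKeyBytes hashes the raw bytes instead. Indexing a string returns
// bytes, so no UTF-8 decoding and no string-to-slice copy is needed.
func hashKeyBytes(key string) uint64 {
	hash := offset64
	for i := 0; i < len(key); i++ {
		hash ^= uint64(key[i])
		hash *= prime64
	}
	return hash
}

func main() {
	for _, key := range []string{"host-01;eth0", "héllo"} {
		same := hashKeyRunes(key) == hashKeyBytes(key)
		fmt.Printf("%q: rune and byte hashes equal: %v\n", key, same)
	}
}
```

The two functions agree on pure-ASCII keys but generally produce different values once a key contains multi-byte characters. That is harmless for an in-memory map, but it would matter if hash values were ever stored or compared across versions.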

@2opremio 2opremio changed the title [WIP] Improve performance when decoding/setting immutable maps [WIP] Improve immutable maps performance Jul 25, 2016
@2opremio 2opremio changed the title [WIP] Improve immutable maps performance Improve performance of immutable maps Jul 26, 2016
"github.com/ugorji/go/codec"
"github.com/weaveworks/ps"

This comment was marked as abuse.

@tomwilkie
Contributor

Other than one comment, LGTM

}
return LatestMap{out}
}

// CodecEncodeSelf implements codec.Selfer
func (m *LatestMap) CodecEncodeSelf(encoder *codec.Encoder) {
if m.Map != nil {

This comment was marked as abuse.

@paulbellamy
Contributor

You probably meant to gvt fetch github.com/weaveworks/ps

@2opremio
Contributor Author

You probably meant to gvt fetch github.com/weaveworks/ps

I forgot to commit it, thanks

Alfonso Acosta added 2 commits July 26, 2016 10:35
* Allocate all map entries of the intermediate representation at once
* Use UnsafeMutableMap to improve performance of LatestMap construction
* Remove gob encoder/decoder
@paulbellamy
Contributor

Once vendored, LGTM.

@2opremio 2opremio merged commit 2132528 into master Jul 26, 2016
@2opremio 2opremio deleted the reduce-gc branch July 26, 2016 12:28
3 participants