
storage: switch Store.{replicas,replicaQueues} from sync.Map to IntMap #17625

Merged

petermattis merged 5 commits into cockroachdb:master from pmattis/int-map on Aug 14, 2017

Conversation

petermattis (Collaborator) commented Aug 13, 2017

Motivated by seeing sync.Map.Load() high on profiles when there are large
numbers of replicas on a node. See #17609.

name               old time/op  new time/op  delta
StoreGetReplica-8  9.12ns ±10%  3.22ns ± 5%  -64.65%  (p=0.000 n=10+8)  

petermattis requested a review from a team on August 13, 2017 14:22

IntMap is a fork of sync.Map which models map[int64]unsafe.Pointer
rather than map[interface{}]interface{}. Using a concrete type for the
key and unsafe.Pointer for the value provides a significant speedup.
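
For concreteness, a minimal sketch of how callers might use such a specialized map, assuming it mirrors sync.Map's Load/Store surface with int64 keys and unsafe.Pointer values; the Replica stand-in and the syncutil import path are illustrative, not the actual code:

```go
package main

import (
	"fmt"
	"unsafe"

	"github.com/cockroachdb/cockroach/pkg/util/syncutil" // import path assumed
)

type Replica struct{ rangeID int64 } // stand-in for the real storage.Replica

func main() {
	var replicas syncutil.IntMap // effectively a concurrent map[int64]*Replica

	r := &Replica{rangeID: 7}
	// Store takes an unsafe.Pointer, so the caller casts its pointer in;
	// no interface{} boxing happens on this path.
	replicas.Store(r.rangeID, unsafe.Pointer(r))

	// Load hands the unsafe.Pointer back; the caller casts it to the
	// concrete type it knows it stored.
	if p, ok := replicas.Load(7); ok {
		fmt.Println((*Replica)(p).rangeID) // 7
	}
}
```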

Changed IntMap.read from an atomic.Value to an unsafe.Pointer that is
updated atomically. The use of atomic.Value was unnecessary because we
always know the concrete type. Will investigate upstreaming this change.
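
A hedged sketch of that read-field change, assuming the fork keeps sync.Map's internal readOnly layout; the type and field names below follow sync.Map's internals and are assumptions, not taken from this PR:

```go
package main

import (
	"sync/atomic"
	"unsafe"
)

// readOnly mirrors sync.Map's read-only snapshot struct.
type readOnly struct {
	m       map[int64]unsafe.Pointer
	amended bool // true if the dirty map holds keys not in m
}

type IntMap struct {
	// Before: read was an atomic.Value, whose Load returns interface{}
	// and forces a type assertion on every access. Since the concrete
	// type is always *readOnly, a raw pointer updated atomically avoids
	// that round trip.
	read unsafe.Pointer // *readOnly
}

func (m *IntMap) loadReadOnly() readOnly {
	if p := atomic.LoadPointer(&m.read); p != nil {
		return *(*readOnly)(p)
	}
	return readOnly{}
}

func (m *IntMap) storeReadOnly(r readOnly) {
	atomic.StorePointer(&m.read, unsafe.Pointer(&r))
}

func main() {
	var m IntMap
	m.storeReadOnly(readOnly{m: map[int64]unsafe.Pointer{}})
	_ = m.loadReadOnly()
}
```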

name                                          old time/op  new time/op  delta
LoadMostlyHits/*syncutil.DeepCopyMap-8        18.5ns ±21%   7.1ns ±24%  -61.69%  (p=0.000 n=20+20)
LoadMostlyHits/*syncutil.RWMutexMap-8         60.5ns ± 4%  63.5ns ± 1%   +4.88%  (p=0.000 n=20+19)
LoadMostlyHits/*syncutil.Map-8                20.6ns ± 8%   5.9ns ± 5%  -71.49%  (p=0.000 n=20+19)
LoadMostlyMisses/*syncutil.DeepCopyMap-8      15.0ns ±20%   6.9ns ± 1%  -54.02%  (p=0.000 n=20+20)
LoadMostlyMisses/*syncutil.RWMutexMap-8       63.0ns ± 3%  61.3ns ± 6%     ~     (p=0.082 n=20+20)
LoadMostlyMisses/*syncutil.Map-8              15.3ns ±25%   5.1ns ±35%  -66.41%  (p=0.000 n=20+20)
LoadOrStoreBalanced/*syncutil.RWMutexMap-8     424ns ± 3%   252ns ± 1%  -40.66%  (p=0.000 n=20+18)
LoadOrStoreBalanced/*syncutil.Map-8            433ns ± 1%   263ns ± 3%  -39.34%  (p=0.000 n=18+19)
LoadOrStoreUnique/*syncutil.RWMutexMap-8       688ns ± 2%   356ns ± 1%  -48.22%  (p=0.000 n=18+19)
LoadOrStoreUnique/*syncutil.Map-8              855ns ± 3%   431ns ± 2%  -49.62%  (p=0.000 n=20+20)
LoadOrStoreCollision/*syncutil.DeepCopyMap-8  7.82ns ± 3%  6.61ns ± 2%  -15.53%  (p=0.000 n=20+20)
LoadOrStoreCollision/*syncutil.RWMutexMap-8    161ns ± 3%   164ns ± 5%     ~     (p=0.050 n=19+20)
LoadOrStoreCollision/*syncutil.Map-8          8.78ns ± 1%  4.56ns ±50%  -48.13%  (p=0.000 n=18+20)
Range/*syncutil.DeepCopyMap-8                 4.42µs ± 2%  4.30µs ± 2%   -2.54%  (p=0.000 n=19+20)
Range/*syncutil.RWMutexMap-8                  67.9µs ± 6%  62.7µs ± 3%   -7.64%  (p=0.000 n=19+19)
Range/*syncutil.Map-8                         4.94µs ± 3%  4.65µs ± 3%   -5.90%  (p=0.000 n=20+19)
AdversarialAlloc/*syncutil.DeepCopyMap-8      1.05µs ± 2%  0.67µs ± 3%  -36.59%  (p=0.000 n=19+20)
AdversarialAlloc/*syncutil.RWMutexMap-8       67.0ns ± 4%  68.1ns ± 5%   +1.57%  (p=0.012 n=20+19)
AdversarialAlloc/*syncutil.Map-8               368ns ± 2%   265ns ± 1%  -27.84%  (p=0.000 n=20+15)
AdversarialDelete/*syncutil.DeepCopyMap-8      222ns ± 1%   148ns ± 0%  -33.27%  (p=0.000 n=20+20)
AdversarialDelete/*syncutil.RWMutexMap-8      81.1ns ± 8%  78.4ns ± 9%     ~     (p=0.058 n=20+20)
AdversarialDelete/*syncutil.Map-8             70.0ns ± 1%  71.0ns ± 3%   +1.49%  (p=0.000 n=17+14)
nvanbenschoten (Member)

:lgtm_strong:

The switch of IntMap.read from an atomic.Value to an unsafe.Pointer should certainly be upstreamed. Out of curiosity, did you look into how much each of the three changes (the int key specialization to prevent boxing/unboxing, the unsafe.Pointer value specialization, and the IntMap.read change) individually impacted the benchmarks? Or is it difficult to separate them out?

Also, we might as well use this for storeMap and some of the RaftTransport maps.


Reviewed 4 of 4 files at r1, 4 of 4 files at r2, 2 of 2 files at r3.

petermattis (Collaborator, Author)

The IntMap.read change provided a 14% improvement in the StoreGetReplica benchmark. I didn't break apart the specialization of the key and value, but that specialization accounts for the majority of the benefit.

> Also, we might as well use this for storeMap and some of the RaftTransport maps.

Yes, we can use it for everything except RaftTransport.handlers. Well, I suppose we could use it for that too if we store a pointer to the handler.
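
A quick sketch of that pointer-to-handler workaround; the RaftMessageHandler interface and the syncutil import path here are illustrative stand-ins for the real types:

```go
package main

import (
	"fmt"
	"unsafe"

	"github.com/cockroachdb/cockroach/pkg/util/syncutil" // import path assumed
)

// RaftMessageHandler stands in for the real handler interface; the point is
// only that an interface value can sit behind one extra pointer so it fits
// IntMap's unsafe.Pointer values.
type RaftMessageHandler interface {
	HandleRaftRequest(msg string)
}

type stdoutHandler struct{}

func (stdoutHandler) HandleRaftRequest(msg string) { fmt.Println("handled:", msg) }

func main() {
	var handlers syncutil.IntMap // keyed by store ID

	var h RaftMessageHandler = stdoutHandler{}
	// Store the address of the interface value, not the value itself.
	handlers.Store(1, unsafe.Pointer(&h))

	if p, ok := handlers.Load(1); ok {
		(*(*RaftMessageHandler)(p)).HandleRaftRequest("ping")
	}
}
```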



petermattis (Collaborator, Author)

Btw, do you happen to know how to run benchmarks in the Go runtime? This fails for me:

~/Development/go-tip/src/sync (master) $ go test -v -run - -bench .
mutex.go:14:2: use of internal package not allowed


petermattis (Collaborator, Author)

Added 2 more commits to convert RaftTransport and Stores to use IntMap.

bdarnell (Contributor)

:lgtm:



petermattis merged commit cc645f3 into cockroachdb:master on Aug 14, 2017
petermattis deleted the pmattis/int-map branch on August 14, 2017 15:49
nvanbenschoten (Member)

I'm not sure about running benchmarks in the Go runtime. I guess you could just create a wrapper benchmark around whatever you want to measure, but it seems strange that you'd need to.
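
For reference, a sketch of that wrapper approach: benchmarking sync.Map from an external package through its public API avoids the internal-package restriction that `go test` hits inside $GOROOT/src/sync. The package and benchmark names here are made up:

```go
package syncbench

import (
	"sync"
	"testing"
)

// BenchmarkSyncMapLoadHit measures Load on a key that is always present,
// roughly mirroring the LoadMostlyHits case from the stdlib benchmarks.
func BenchmarkSyncMapLoadHit(b *testing.B) {
	var m sync.Map
	m.Store(int64(1), new(int))
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			m.Load(int64(1))
		}
	})
}
```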
