Do not use syncPool for chunks when querying #427
Conversation
db.go (outdated diff)
@@ -521,7 +519,7 @@ func (db *DB) reload() (err error) {
 	// See if we already have the block in memory or open it otherwise.
 	b, ok := db.getBlock(meta.ULID)
 	if !ok {
-		b, err = OpenBlock(dir, db.chunkPool)
+		b, err = OpenBlock(dir, nil)
Are you sure this doesn't increase allocations? If you pass the same pool to OpenBlock(), shouldn't that help?
I could have been a bit more clear: this is not an allocation issue, just an optimization to avoid the overhead of using a sync.Pool. Since we never put anything into the pool except during a compaction, the pool is almost always empty, so we have to allocate a new chunk anyway.
I could look into trying to add chunks back to the pool, but it is hard to know exactly when we would be able to. Perhaps when a querier is closed?
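To make the overhead point concrete, here is a minimal standalone sketch (the XORChunk stand-in and the pool wiring are made up for illustration, not the tsdb code): with nothing ever Put on the query path, every sync.Pool.Get falls through to New, so we pay for the allocation anyway, plus the pool's own bookkeeping.

```go
package main

import (
	"fmt"
	"sync"
)

// Illustrative stand-in for a chunk type; not the real tsdb definition.
type XORChunk struct {
	b []byte
}

var newCalls int

var chunkPool = sync.Pool{
	New: func() interface{} {
		newCalls++ // every Get on an empty pool lands here
		return &XORChunk{}
	},
}

func main() {
	// Normal query path: nothing was ever Put outside of compaction,
	// so every Get allocates a fresh chunk via New, and the pool only
	// adds per-Get bookkeeping on top.
	for i := 0; i < 3; i++ {
		_ = chunkPool.Get().(*XORChunk)
	}
	fmt.Println("allocations via New:", newCalls) // 3
}
```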
After a compaction we put chunks back into the pool, and right after that we reload the db, which calls OpenBlock. So if these share the same pool, shouldn't that reduce allocations?
Maybe a dedicated benchmark would answer the question.
ping @csmarchbanks
I will do some work today testing with a dedicated pool.
Having the pool will likely help allocations a bit right after a compaction, either until the first garbage collection (see golang/go#22950) or until enough queries have run that the pool is drained. But in reality, the only allocation that will be saved is the struct (the buffer is replaced after a Get), so the savings would be a tiny amount of allocations.
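The GC interaction referenced via golang/go#22950 can be demonstrated standalone (an illustrative sketch, not tsdb code): Go releases before 1.13 dropped all pooled objects on every GC cycle, and 1.13+ keeps only a one-cycle victim cache, so anything cached around a compaction rarely survives until the next query.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// Illustrative chunk stand-in; the field is unused on purpose.
type chunk struct{ b []byte }

func main() {
	newCalls := 0
	pool := sync.Pool{New: func() interface{} {
		newCalls++
		return &chunk{}
	}}

	// Pretend a compaction just returned a chunk to the pool.
	pool.Put(&chunk{})

	// Pre-1.13 Go empties every sync.Pool on each GC cycle
	// (golang/go#22950); 1.13+ clears it after two cycles via the
	// victim cache. Either way, the cache is short-lived.
	runtime.GC()
	runtime.GC()

	pool.Get()
	fmt.Println("allocations via New:", newCalls) // 1: the pool was drained
}
```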
I tried using a combined pool, and it didn't seem to help with allocations at all. I would like to try a few more things, to make sure I am actually testing what I think I am. I will report back soon!
I did some more benchmarking, and it really seems like the overhead of syncPool is more expensive than any caching gains for the small amount of time around a compaction. I went as far as having the querier return chunks to the pool when it was done with them, but it was still slower.
It looks like the syncPool was originally implemented to reduce spikes when compacting, and that path will still use a pool (#118).
Since it's only a couple percent gain for retrieving and iterating through a chunk, I understand if it feels a bit risky to merge though!
Can this code be added to any of the test files so I can have a look?
I pushed two new commits, one doing the shared pool, and one allowing a reader to pass chunks back to the pool. I fixed one small issue with the reader writing to the pool, and it looks like writing to the pool slows the queries down a bit but also reduces allocations significantly. I am not sure which is better overall:
benchcmp with_pool.txt without_pool.txt
benchmark old ns/op new ns/op delta
BenchmarkPersistedQueries/series=10,samplesPerSeries=1000-8 1188273 1137934 -4.24%
BenchmarkPersistedQueries/series=10,samplesPerSeries=10000-8 11466663 10678957 -6.87%
BenchmarkPersistedQueries/series=10,samplesPerSeries=100000-8 114114078 107062847 -6.18%
BenchmarkPersistedQueries/series=100,samplesPerSeries=1000-8 11005831 10901434 -0.95%
BenchmarkPersistedQueries/series=100,samplesPerSeries=10000-8 108194015 112203305 +3.71%
BenchmarkPersistedQueries/series=100,samplesPerSeries=100000-8 1191467654 1167810755 -1.99%
benchmark old allocs new allocs delta
BenchmarkPersistedQueries/series=10,samplesPerSeries=1000-8 358 520 +45.25%
BenchmarkPersistedQueries/series=10,samplesPerSeries=10000-8 2037 3563 +74.91%
BenchmarkPersistedQueries/series=10,samplesPerSeries=100000-8 18999 34092 +79.44%
BenchmarkPersistedQueries/series=100,samplesPerSeries=1000-8 2807 4587 +63.41%
BenchmarkPersistedQueries/series=100,samplesPerSeries=10000-8 18202 34996 +92.26%
BenchmarkPersistedQueries/series=100,samplesPerSeries=100000-8 174065 340264 +95.48%
benchmark old bytes new bytes delta
BenchmarkPersistedQueries/series=10,samplesPerSeries=1000-8 64981 68153 +4.88%
BenchmarkPersistedQueries/series=10,samplesPerSeries=10000-8 181887 212317 +16.73%
BenchmarkPersistedQueries/series=10,samplesPerSeries=100000-8 1415349 1713858 +21.09%
BenchmarkPersistedQueries/series=100,samplesPerSeries=1000-8 269631 305181 +13.18%
BenchmarkPersistedQueries/series=100,samplesPerSeries=10000-8 1327456 1662354 +25.23%
BenchmarkPersistedQueries/series=100,samplesPerSeries=100000-8 12651272 15943688 +26.02%
Also, here's the result of current master, compared to adding a put to the Pool in the Reader:
benchcmp original.txt with_pool.txt
benchmark old ns/op new ns/op delta
BenchmarkPersistedQueries/series=10,samplesPerSeries=1000-8 1100883 1112310 +1.04%
BenchmarkPersistedQueries/series=10,samplesPerSeries=10000-8 10623663 10533229 -0.85%
BenchmarkPersistedQueries/series=10,samplesPerSeries=100000-8 105319888 106529656 +1.15%
BenchmarkPersistedQueries/series=100,samplesPerSeries=1000-8 10772258 10645391 -1.18%
BenchmarkPersistedQueries/series=100,samplesPerSeries=10000-8 105563054 105657945 +0.09%
BenchmarkPersistedQueries/series=100,samplesPerSeries=100000-8 1128303811 1103207237 -2.22%
benchmark old allocs new allocs delta
BenchmarkPersistedQueries/series=10,samplesPerSeries=1000-8 520 358 -31.15%
BenchmarkPersistedQueries/series=10,samplesPerSeries=10000-8 3563 2037 -42.83%
BenchmarkPersistedQueries/series=10,samplesPerSeries=100000-8 34092 18999 -44.27%
BenchmarkPersistedQueries/series=100,samplesPerSeries=1000-8 4587 2806 -38.83%
BenchmarkPersistedQueries/series=100,samplesPerSeries=10000-8 34997 18256 -47.84%
BenchmarkPersistedQueries/series=100,samplesPerSeries=100000-8 340262 179192 -47.34%
benchmark old bytes new bytes delta
BenchmarkPersistedQueries/series=10,samplesPerSeries=1000-8 68170 64940 -4.74%
BenchmarkPersistedQueries/series=10,samplesPerSeries=10000-8 212348 181887 -14.34%
BenchmarkPersistedQueries/series=10,samplesPerSeries=100000-8 1713962 1415349 -17.42%
BenchmarkPersistedQueries/series=100,samplesPerSeries=1000-8 305244 269624 -11.67%
BenchmarkPersistedQueries/series=100,samplesPerSeries=10000-8 1662664 1330012 -20.01%
BenchmarkPersistedQueries/series=100,samplesPerSeries=100000-8 15942680 12854432 -19.37%
@@ -603,6 +603,10 @@ func (s *populatedChunkSeries) Next() bool {
 		continue
 	}

+	for _, chk := range s.chks {
If we decide that writing to the pool on queries is the best way to go, would there be any concern about doing this at the beginning of the Next() block? Then the chunks can be reused right away.
Yeah, I have been looking at this for the past two days, and although it is an awkward place to put it, I couldn't find a better way to structure the code.
I started a benchmark to check the gains at high load.
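For concreteness, a hedged sketch of the placement being discussed (all types here are simplified stand-ins, not the real tsdb definitions): the previous iteration's chunks go back to the pool at the top of Next(), before chks is repopulated.

```go
package main

import "sync"

// Simplified stand-ins for the types in the diff above; the names and
// fields are illustrative only.
type Chunk struct{ b []byte }

type Meta struct{ Chunk *Chunk }

type populatedChunkSeries struct {
	pool *sync.Pool
	chks []Meta
}

// Next recycles the previous iteration's chunks before overwriting chks,
// so they can be reused by the very next Get. This is only safe if no
// caller still holds a reference to them, which is the subtlety that
// ultimately broke this approach.
func (s *populatedChunkSeries) Next() bool {
	for _, chk := range s.chks {
		s.pool.Put(chk.Chunk)
	}
	s.chks = s.chks[:0]
	// ... repopulate s.chks from the underlying series set as before ...
	return false
}

func main() {
	s := &populatedChunkSeries{pool: &sync.Pool{}}
	_ = s.Next()
}
```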
@csmarchbanks trying to ping you on IRC to sync on this.
I am at KubeCon this week. I will check in on IRC when I get a chance.
Results of promql benchmarks:
A couple of the promql benchmarks are failing with the new code change, making me think it is unsafe. I am investigating why, and also why no unit tests are failing.
@csmarchbanks could be because of an unrelated change already fixed in #479. BTW, I also ran some tests on an isolated VM, and here are the top results:
Yeah, looks like this is broken as it is right now, even after merging in the fix you mentioned. There are still some references pointing to a chunk that has been returned to the pool. These tests break right now:
Closing because the gains are minimal, and the bugs that could be caused are subtle.
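The subtle bug class referenced in the closing note is a use-after-Put hazard. A contrived, self-contained illustration (all names made up; this is not the failing test code): a slice still referencing a recycled chunk's buffer observes whatever the next user of the pool writes.

```go
package main

import (
	"fmt"
	"sync"
)

type chunk struct{ b []byte }

var pool = sync.Pool{New: func() interface{} {
	return &chunk{b: make([]byte, 4)}
}}

func main() {
	c := pool.Get().(*chunk)
	copy(c.b, "good")

	view := c.b // e.g. an iterator still holding the chunk's bytes

	pool.Put(c) // chunk recycled while view is still live...

	d := pool.Get().(*chunk) // ...and handed out again
	copy(d.b, "BAD!")

	// Likely prints BAD!: the data changed underneath the live reference.
	fmt.Printf("%s\n", view)
}
```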
The only time any chunks are currently put into the syncPool is during a
compaction. This means that during almost all normal operation a new
chunk is still being created, along with the extra overhead of going
through the syncPool Get process.
This keeps the pool as an optional argument so as not to change the behavior of the compactor.
Noticed while working on #236
When using the benchmarks from #425 we save a couple percent:
And if we remove the it.Next() loop (focusing only on chunk creation, not decoding) then we get:

Signed-off-by: Chris Marchbanks <[email protected]>