-
Notifications
You must be signed in to change notification settings - Fork 455
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[dbnode] Optimize block range scan in queryWithSpan #3813
Conversation
Codecov Report
@@ Coverage Diff @@
## master #3813 +/- ##
======================================
Coverage 57.0% 57.0%
======================================
Files 552 552
Lines 63540 63540
======================================
Hits 36280 36280
Misses 24054 24054
Partials 3206 3206
Flags with carried forward coverage won't be shown. Click here to find out more. Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice find!
@@ -387,6 +393,10 @@ func newEntryIndexState() entryIndexState { | |||
} | |||
} | |||
|
|||
func (s *entryIndexState) indexedRangeWithRLock() (xtime.UnixNano, xtime.UnixNano) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: why the separate method instead of inlining in IndexedRange
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IndexedRange
is on Entry
struct, indexedRangeWithRLock
is on entryIndexState
.
guess we have some work to do over the next 1000 years! |
if currentBlock.Before(minIndexed) { | ||
currentBlock = minIndexed | ||
} | ||
maxIndexedExclusive := maxIndexed.Add(time.Nanosecond) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why add the nanosecond?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To convert inclusive timestamp to exclusive one. So that we can compare and replace endExclusive
with maxIndexedExclusive
in the subsequent if
.
endExclusive = maxIndexedExclusive | ||
} | ||
|
||
for !inBlock && currentBlock.Before(endExclusive) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, I don't quite follow this for loop. queryWithSpan
is called on a block
and takes a queryIter
which iterates over index results within the same block. Given that, why do we need to do this loop over every block in the query range to check and see if the doc is indexed? Don't we only need to check that the block itself is within the query range and that doc is indexed for this same block (which it should be since it's in the queryIter
)? I'm not super familiar with the intricacies of the read path, so could be misunderstanding here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBH I'm not really familiar with this aspect, either. I saw the opportunity to optimize this code without affecting its semantics, in which case I don't have to fully understand the context of it.
I think the best bet is to ask @robskillington who wrote the original code to shed some light on the purpose of this loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, this loop behaves slightly different than before. Previously, this loop was being iterated no less than once, now it might not iterate at all. While the current implementation is more correct, maybe the less correct version was necessary for proper functioning? (Though conditions when the behavior would differ seem to be too rare for this to matter).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean the case when start == endInclusive
? I think this was a bug, and it would have been difficult to replicate with the optimized version, so I chose to fix it. I believe this is not a realistic edge case, also this is certainly not the issue that we could have seen.
What this PR does / why we need it:
The client could query for an arbitrary wide time range (eg. 1000 years) which would cause an expensive iteration over all of it in the inner most loop. This change narrows down the scanned range of blocks to at most what is actually covered by the index entry.
Special notes for your reviewer:
Does this PR introduce a user-facing and/or backwards incompatible change?:
NONE
Does this PR require updating code package or user-facing documentation?:
NONE