-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lazy initialize checkpoint tracker bit sets #27179
Conversation
This local checkpoint tracker uses collections of bit sets to track which sequence numbers are complete, eventually removing these bit sets when the local checkpoint advances. However, these bit sets were eagerly allocated so that if a sequence number far ahead of the checkpoint was marked as completed, all bit sets between the "last" bit set and the bit set needed to track the marked sequence number were allocated. If this sequence number was too far ahead, the memory requirements could be excessive. This commit opts for a different strategy for holding on to these bit sets and enables them to be lazily allocated.
9581e2d
to
21299b8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Left some nits.
FixedBitSet current = processedSeqNo.getFirst(); | ||
long bitArrayKey = getBitArrayKey(checkpoint); | ||
FixedBitSet current = processedSeqNo.get(bitArrayKey); | ||
if (current == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when can this happen? it seems we only clean bit sets here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we clean the set (e.g., the checkpoint is equal to size - 1
) but the checkpoint can not advance to size
. In this case, when size
is marked as completed, checkpoint will still be equal to size - 1
but we have already cleaned the set.
"updateCheckpoint is called but the bit following the checkpoint is not set"; | ||
try { | ||
// keep it simple for now, get the checkpoint one by one; in the future we can optimize and read words | ||
FixedBitSet current = processedSeqNo.getFirst(); | ||
long bitArrayKey = getBitArrayKey(checkpoint); | ||
FixedBitSet current = processedSeqNo.get(bitArrayKey); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since this is a FixedBitSet
I wonder why renamed everything to use bitArray? I don't mind. Just curious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because I did not like the inconsistency that in some places it's referred to as a "bit array", and in some it's referred to as a "bit set". I wanted to make them all "bit set" but the problem is the setting index.seq_no.checkpoint.bit_arrays_size
(currently never released to the world) refers to "bit array". If, and only if, you would be okay with changing the name of this setting to index.seq_no.checkpoint.bit_sets_size
(in 6.0) then I would be okay with normalizing everything to "bit set".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with renaming the setting. I think I added it just be able to control things from tests and as an extra escape hatch if the default size proved wrong. We might want to just remove it. We can solve the test part differently and I'm less concerned now about the default size (I'm also thinking that 1024 is maybe too small)..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I opened #27191.
* The checkpoint always falls in the current bit set or we have already cleaned it; if it falls on the last bit of the | ||
* current bit set, we can clean it. | ||
*/ | ||
if (checkpoint == (1 + bitArrayKey) * bitArraysSize - 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we maybe cache (1 + bitArrayKey) * bitArraysSize - 1
to something like lastSeqNoInArray ? I think it will be easier to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed 91fbab3.
* Previously this would allocate the entire chain of bit sets to the one for the sequence number being marked; for very large | ||
* sequence numbers this could lead to excessive memory usage resulting in out of memory errors. | ||
*/ | ||
tracker.markSeqNoAsCompleted(randomNonNegativeLong()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lol. can we assert we allocated just one array?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed b2cbfc5.
@@ -253,7 +271,17 @@ public void testResetCheckpoint() { | |||
tracker.resetCheckpoint(localCheckpoint); | |||
assertThat(tracker.getCheckpoint(), equalTo((long) localCheckpoint)); | |||
assertThat(tracker.getMaxSeqNo(), equalTo((long) maxSeqNo)); | |||
assertThat(tracker.processedSeqNo, empty()); | |||
assertThat(tracker.processedSeqNo, new BaseMatcher<LongObjectHashMap<FixedBitSet>>() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I despise assertTrue
. Note that this gives a nicer error message if the assertion fails:
java.lang.AssertionError:
Expected: empty
but: was <[512=>org.apache.lucene.util.FixedBitSet@98761236]>
Expected :empty
Actual :<[512=>org.apache.lucene.util.FixedBitSet@98761236]>
as opposed to
java.lang.AssertionError
at __randomizedtesting.SeedInfo.seed([F9E3DE31F5D979AE:C4C54170CE328934]:0)
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertTrue(Assert.java:52)
at org.elasticsearch.index.seqno.LocalCheckpointTrackerTests.testResetCheckpoint(LocalCheckpointTrackerTests.java:276)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
Thanks @bleskes, I've responded to your feedback. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Left one suggestion.
return seqNo / bitArraysSize; | ||
} | ||
|
||
private FixedBitSet getBitArrayForSeqNo(final long seqNo) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe use indexOf(...)
and then indexInsert(...)
and indexGet(...)
respectively to avoid determining what the slot is for a key several times?
final int slot = processedSeqNo.indexOf(bitArrayKey);
if (processedSeqNo.indexExists(slot) == false) {
processedSeqNo.indexInsert(slot, bitArrayKey, new FixedBitSet(bitArraysSize));
}
return processedSeqNo.indexGet(slot);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @martijnvg, I pushed cabda87.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And 68c4eab. 😐
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually the indexGet
at the end is not correct because the returned index is negative if the slot is not actually used. Yet, ~slot
also does not work because the table could have been resized after the insert. I pushed 97bb3ca.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, I didn't realize that. Maybe we should do this to make optimal use of the slot
?
final int slot = processedSeqNo.indexOf(bitArrayKey);
if (processedSeqNo.indexExists(slot) == false) {
FixedBitSet bitSet = new FixedBitSet(bitArraysSize));
processedSeqNo.indexInsert(slot, bitArrayKey, bitSet);
return bitSet;
} else {
return processedSeqNo.indexGet(slot);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you see a chance to avoid == false
, you seize it. 😉
I pushed f4f0dae.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:)
* master: Remove checkpoint tracker bit sets setting Fix stable BWC branch detection logic Fix logic detecting unreleased versions Enhances exists queries to reduce need for `_field_names` (elastic#26930) Added new terms_set query Set request body to required to reflect the code base (elastic#27188) Update Docker docs for 6.0.0-rc2 (elastic#27166) Add version 6.0.0 Docs: restore now fails if it encounters incompatible settings (elastic#26933) Convert index blocks to cluster block exceptions (elastic#27050) [DOCS] Link remote info API in Cross Cluster Search docs page Fix Laplace scorer to multiply by alpha (and not add) (elastic#27125) [DOCS] Clarify migrate guide and search request validation Raise IllegalArgumentException if query validation failed (elastic#26811) prevent duplicate fields when mixing parent and root nested includes (elastic#27072) TopHitsAggregator must propagate calls to `setScorer`. (elastic#27138)
This local checkpoint tracker uses collections of bit sets to track which sequence numbers are complete, eventually removing these bit sets when the local checkpoint advances. However, these bit sets were eagerly allocated so that if a sequence number far ahead of the checkpoint was marked as completed, all bit sets between the "last" bit set and the bit set needed to track the marked sequence number were allocated. If this sequence number was too far ahead, the memory requirements could be excessive. This commit opts for a different strategy for holding on to these bit sets and enables them to be lazily allocated. Relates #27179
This local checkpoint tracker uses collections of bit sets to track which sequence numbers are complete, eventually removing these bit sets when the local checkpoint advances. However, these bit sets were eagerly allocated so that if a sequence number far ahead of the checkpoint was marked as completed, all bit sets between the "last" bit set and the bit set needed to track the marked sequence number were allocated. If this sequence number was too far ahead, the memory requirements could be excessive. This commit opts for a different strategy for holding on to these bit sets and enables them to be lazily allocated.
Relates #10708