Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal for time series deletion with block storage #4274
Proposal for time series deletion with block storage #4274
Changes from 1 commit
e08af0f
8027c4a
9081eb2
30ca52d
47f06d4
0224010
73db542
ffd079e
2f30ab5
f95b002
add4da2
434df48
98abab0
bc19925
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's a timing issue here. When a new tombstone is created or deleted (cancel delete request) the querier will take some time before applying it (it's not instantaneous). However, the cache generation number is increased immediately so we're going to cache query results with the new gen number but results upon which tombstones haven't been enforced yet.
Am I missing anything?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pracucci - I was thinking of an alternative approach.
The purger would write new tombstones to 3 ingesters assigned to a user and wait until at-least 2 succeeds. While executing a query, a querier would fetch the tombstones for a user from all ingesters in the cluster. If the
tombstoneTimestamp > currentCacheGenNumber
, the querier would update thecurrentCacheGenNumber
tocurrentTimestamp
. I believe this would solve the timing issue. WDYT?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This requires that queriers have access to "place where current cache gen number" is stored, and also doesn't handle cancellation as currently proposed.
I am wondering if we can modify design:
We need to consider cancelled tombstones -- in order to make this work, we need to keep them around until at least one other tombstone request or cancellation request with newer timestamp exists. In other words, cancellation wouldn't just delete a tombstone, but create another "tombstone cancelled" file, so that queriers can include its timestamp in the response.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the suggestion. I like this idea and think it could work. I've updated the proposal to include this. One small modification I made is that instead of using the largest tombstone timestamp, I propose to use the timestamp of when the compactor writes the tombstones to the bucket index. I think this should work the same but make the delete request cancellations a bit simpler to implement.