-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enforce retention policies #2776
Conversation
7e80aaa
to
676b62c
Compare
676b62c
to
99f68d4
Compare
99f68d4
to
c9c59d0
Compare
@@ -70,6 +72,13 @@ func (s *Server) appendClusterService(c cluster.Config) { | |||
s.Services = append(s.Services, srv) | |||
} | |||
|
|||
func (s *Server) appendRetentionPolicyService(c retention.Config) { | |||
srv := retention.NewService(c) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be consistent with the current pattern, we should do:
if !c.Enabled {
return
}
Looks good. I'm assuming we are adding test coverage in a future PR. |
type ShardGroupInfo struct { | ||
ID uint64 | ||
StartTime time.Time | ||
EndTime time.Time | ||
Shards []ShardInfo | ||
Tombstone bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be better make this a time.Time
so we can later go and expunge shard groups that have been tombstoned long ago.
Also, the function names and comments refer to Delete
ing a shard group. I think it would be clearer to name this DeletedAt
personally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, using time.Time
isn't a bad idea. Renaming to DeletedAt
works for me.
Thanks for the valuable feedback @jwilder @corylanou |
662b317
to
00c262d
Compare
00c262d
to
ae3e8c7
Compare
func (s *Service) deleteShardGroups() { | ||
defer s.wg.Done() | ||
|
||
ticker := time.NewTicker(s.checkInterval) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still missing the defer ticker.Stop()
Two small items, but otherwise looks good. 👍 |
Thanks again @corylanou @jwilder -- final changes pushed up. Will perform more system test now before merging. |
Since this has passed basic unit tests and has been reviewed, I'm going to merge now. There are a couple of other small changes that are needed post-testing, and it will be easier to review that way. |
+1 |
Fixes #8819. Previously, the process of dropping expired shards according to the retention policy duration, was managed by two independent goroutines in the retention policy service. This behaviour was introduced in #2776, at a time when there were both data and meta nodes in the OSS codebase. The idea was that only the leader meta node would run the meta data deletions in the first goroutine, and all other nodes would run the local deletions in the second goroutine. InfluxDB no longer operates in that way and so we ended up with two independent goroutines that were carrying out an action that was really dependent on each other. If the second goroutine runs before the first then it may not see the meta data changes indicating shards should be deleted and it won't delete any shards locally. Shortly after this the first goroutine will run and remove the meta data for the shard groups. This results in a situation where it looks like the shards have gone, but in fact they remain on disk (and importantly, their series within the index) until the next time the second goroutine runs. By default that's 30 minutes. In the case where the shards to be removed would have removed the last occurences of some series, then it's possible that if the database was already at its maximum series limit (or tag limit for that matter), no further new series can be inserted.
This change adds a new service to the system for Retention Policy enforcement.
Enforcement is composed of two parts:
ShardGroups
and deletes anyShardGroups
that have expired. Note that with this change ShardGroups are not actually deleted. Instead they have a tombstone flag set. This is so it is explicit that aShardGroup
, and its associated shards, can be safely deleted.ShardGroup
it is deleted.