-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tools bucket replicate: add support for concurrent bucket replication #5280
Conversation
Signed-off-by: Michal Wasilewski <[email protected]>
writing down some notes:
Summary:
|
ah, I just realized that |
Signed-off-by: Michal Wasilewski <[email protected]>
Signed-off-by: Michal Wasilewski <[email protected]>
switched to a pipeline approach, but I still want to give this some more thought |
Signed-off-by: Michal Wasilewski <[email protected]>
Signed-off-by: Michal Wasilewski <[email protected]>
The bucket client methods seem to be thread safe at least for a few backends that I checked: GCS
src: https://pkg.go.dev/cloud.google.com/go/storage#hdr-Creating_a_Client MinioFilesystemthere are cases of race conditions when calling bucket methods, e.g. #5103 , but it seems that as long as the filename is unique it should be fine: https://github.com/thanos-io/thanos/blob/main/pkg/objstore/filesystem/filesystem.go#L191 |
ideas for extending this further (potentially create issues):
|
Signed-off-by: Michal Wasilewski <[email protected]>
Signed-off-by: Michal Wasilewski <[email protected]>
Signed-off-by: Michal Wasilewski <[email protected]>
@GiedriusS I think this is ready for a first review to confirm I'm heading in the right direction |
I'll take a look soon, thank you for your thorough work. A bit on vacation ATM. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this!
Implementation is great, simple and efficient - we need tests and just minor style nits fixed.
But there is a reason why this code is not concurrent. I tried to explain this in flag help which we could use to warn users. This is essentially unsafe - but for your use case it might work (e.g if you don't have a compactor turned off on the target bucket). There is even comment about this:
There is a way to make it a bit safer - we can split to groups as mentioned here like we do in the compactor itself (unless already done here).
if err := rs.ensureBlockIsReplicated(ctx, b.BlockMeta.ULID); err != nil { | ||
return errors.Wrapf(err, "ensure block %v is replicated", b.BlockMeta.ULID.String()) | ||
} | ||
// iterate over blocks and send them to a channel sequentially |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use a full sentences in comments (nit).
// iterate over blocks and send them to a channel sequentially | |
// Iterate over blocks and send them to a channel sequentially. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup, my bad, I'll fix it
I wonder why this was not picked up by the linter, golangci-lint config includes godot:
Line 36 in a6f6ce0
- godot |
.godot.yaml
with config appropriate to make it check all comments? (and probably fix lots and lots of comments across the code base)
return errors.Wrapf(err, "ensure block %v is replicated", b.BlockMeta.ULID.String()) | ||
} | ||
// iterate over blocks and send them to a channel sequentially | ||
blocksChan := func() <-chan *metadata.Meta { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does it have to be in closure (anon function)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No reason, good point, I'll simplify this.
return out | ||
}() | ||
|
||
// fan-out for concurrent replication |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrong comment format (full sentence please).
} | ||
// iterate over blocks and send them to a channel sequentially | ||
blocksChan := func() <-chan *metadata.Meta { | ||
out := make(chan *metadata.Meta) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
out := make(chan *metadata.Meta) | |
blocksChan := make(chan *metadata.Meta) |
@@ -210,6 +211,7 @@ func (tbc *bucketReplicateConfig) registerBucketReplicateFlag(cmd extkingpin.Fla | |||
cmd.Flag("matcher", "Only blocks whose external labels exactly match this matcher will be replicated.").PlaceHolder("key=\"value\"").StringsVar(&tbc.matcherStrs) | |||
|
|||
cmd.Flag("single-run", "Run replication only one time, then exit.").Default("false").BoolVar(&tbc.singleRun) | |||
cmd.Flag("concurrency-level", "Max number of go-routines to use for replication.").Default("4").IntVar(&tbc.concurrencyLvl) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately this is not as simple, so if you want to keep this implementation we have to add extra context and set it to 1.
cmd.Flag("concurrency-level", "Max number of go-routines to use for replication.").Default("4").IntVar(&tbc.concurrencyLvl) | |
cmd.Flag("concurrency-level", "Number of go-routines to use for replication. WARNING: Value bigger than one enables, concurrent, non-sequential block replication. This means that the block from 4w ago can be uploaded after the block from 2h ago. In the default Thanos compactor, with a 30m consistency delay (https://thanos.io/tip/components/compact.md/#consistency-delay) if within 30 minutes after uploading those blocks you will have still some blocks to upload between those, the compactor might do compaction assuming a gap! This is solvable with vertical compaction, but it wastes extra computation and is not enabled by default. Use with care.").Default("1").IntVar(&tbc.concurrencyLvl) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is solvable with vertical compaction, but it wastes extra computation and is not enabled by default.
I see what you mean here, even if the problem can be solved by vertical compaction, the blocks should not be uploaded out of sequence for no valid reason.
I wonder if we should have a method of acquiring a lock on the compactor while these kinds of operations are in progress. For example, the tool could could make an RPC to the compactor, or write a top-of-bucket lockfile. |
I think better to lock, would be to have a compaction planning algorithm that just deals with those "out of order blocks" reasonably. We have vertical compaction we could enable by default, but we need just more efficient planning and execution that just waste more time and resources. No need for lock IMO (: |
Some entry point for this work: #4233 |
There's no use case on my side, in our environments we don't run replication between buckets. I was simply looking for Thanos issues to work on in my spare time and looked through the "good first issues" list in the issue tracker ;)
I suspected I might be missing something about the bigger picture here. Thanks for the detailed explanation!
Thank you so much for these pointers! I wonder if having replication concurrency between compaction groups, but not within groups (basically what you suggested) would be sufficient here to address this problem. I'll need to give it some more thought, but any further feedback is welcome. |
Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward? This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. |
Hello 👋 Looks like there was no activity on this amazing PR for the last 30 days. |
Closing for now as promised, let us know if you need this to be reopened! 🤗 |
fixes: #4278
Signed-off-by: Michal Wasilewski [email protected]
Changes
The current implementation of bucket replication is serial, i.e. there's a loop which iterates over blocks and synchronizes them between a source and destination bucket, one block at a time. This PR switches to a pipeline based approach which allows for concurrent replication.
The pipeline consists of three stages. The first stage is a single goroutine which sends pointers to blocks that need to be synchronized to an unbuffered channel. The second stage consists of multiple goroutines which receive pointers to blocks from the upstream channel and execute replication for each received pointer. The level of concurrency of this stage can be controlled with a cli config option. The default concurrency level is 4 and it's an arbitrary number. This default should work even on single threaded CPUs because goroutines are used. Any errors that happen in the second stage are sent to a channel which is then read from by the third stage. The third stage aggregates the errors using a multierror object. Once all executions from the second stage are finished the function returns the multierror object which also implements the
error
interface from the stdlib.Verification