Add binlogs to backup providers #3581
This functionality can be the first step in significantly speeding up tablet provisioning. We could also provision and sync a new replica without the master being involved in the operation (saving a good amount of resources). Example:
Step 4 is technically not necessary but simply a nice-to-have. I see its value only if binlogs are uploaded very infrequently and/or are very large. Regarding the binlog backup implementation, what do you think about:
I think you'd want some way to filter those binlogs based on GTID or something. Maybe we edit the binlog filename to include beginning/ending GTID to filter on. Another option would be to store a GTID->filename mapping alongside the binlog lock.
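A minimal sketch of what such a GTID-to-filename manifest could look like if it lived alongside the binlog lock (the field names are invented for illustration, not existing Vitess code):

```go
package binlogbackup

// BinlogManifestEntry maps one uploaded binlog file to the GTID range it
// covers, so a restore can pick only the files it actually needs. All names
// here are hypothetical; nothing below is an existing Vitess type.
type BinlogManifestEntry struct {
	FileName  string `json:"file_name"`  // name of the uploaded object
	FirstGTID string `json:"first_gtid"` // first GTID written to this file
	LastGTID  string `json:"last_gtid"`  // last GTID written to this file
	SizeBytes int64  `json:"size_bytes"`
}

// BinlogManifest is the list of uploaded files, ordered oldest to newest.
type BinlogManifest struct {
	Entries []BinlogManifestEntry `json:"entries"`
}
```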
Sounds good to me. We'd need to hook into reparenting logic to make sure that the binlog replica hasn't been promoted to master.
Also seems logical. What about retention? Do we delete the local binlog immediately on successful upload?
Yeah, we definitely need some logic to download only the data that we need. We should not be worried about the "apply" as MySQL will automatically skip all transactions whose GTIDs are already in its executed GTID set.
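One way the "download only what we need" decision could be made, leaning on MySQL's built-in GTID_SUBSET() and a manifest like the one sketched above (again just a sketch with invented names):

```go
package binlogbackup

import (
	"context"
	"database/sql"
)

// needsDownload reports whether an uploaded binlog file still has to be
// fetched: if every GTID recorded for the file is already in the restored
// server's executed GTID set, the file can be skipped. Downloading a little
// too much is harmless anyway, since MySQL skips already-executed
// transactions on apply.
func needsDownload(ctx context.Context, db *sql.DB, fileGTIDSet, executedGTIDSet string) (bool, error) {
	var subset bool
	err := db.QueryRowContext(ctx,
		"SELECT GTID_SUBSET(?, ?)", fileGTIDSet, executedGTIDSet).Scan(&subset)
	if err != nil {
		return false, err
	}
	return !subset, nil
}
```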
I'm not sure I'm following. Why do we need this?
I don't think we should do that as we need the binlogs to allow other replicas to catch up. I think we should leave the file rotation/retention to MySQL.
It wouldn't necessarily break anything, but I don't think you want your master also doing your binlog backups.
Yes, we are on the same page: during reparenting the replica-now-new-master should release the lock.
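That reparenting hook could be fairly small; here is a sketch under the assumption that the upload loop holds a lock (e.g. in the topo service) and that the reparent code calls into it — the names are invented:

```go
package binlogbackup

import (
	"context"
	"sync"
)

// BinlogBackupLock is whatever lock the uploading replica holds; hypothetical.
type BinlogBackupLock interface {
	Release(ctx context.Context) error
}

type uploader struct {
	mu      sync.Mutex
	lock    BinlogBackupLock // nil when this tablet doesn't own the lock
	stopped bool             // checked by the (not shown) upload loop
}

// OnPromotedToMaster would be called from the reparenting logic: the new
// master stops shipping binlogs and releases the lock so another replica can
// take over the uploads.
func (u *uploader) OnPromotedToMaster(ctx context.Context) error {
	u.mu.Lock()
	defer u.mu.Unlock()
	u.stopped = true
	if u.lock == nil {
		return nil
	}
	err := u.lock.Release(ctx)
	u.lock = nil
	return err
}
```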
hey guys, thanks for creating this. This issue seems to focus mostly on using it for speeding up replica provisioning. If we are to build a feature, it should also be useful for Disaster Recovery. That involves additional requirements:
There may be other improvements as well, but I consider these to be the bare minimum (in addition to the spec outlined in past comments).
A future improvement to the above could allow replicas to serve different purposes. One uploads at a 5-minute interval for DR, and another uploads hourly chunks for speedy provisioning. This would be useful for those who can afford the extra storage. For now I think we can do without, though.
@bbeaudreault all ✅ Minor point regarding "tunable upload interval": I don't see a case where we shouldn't upload a binlog file right after it has been rotated by MySQL. Do you want to also upload the binlog file that MySQL is still using? Looping in @ameetkotian, @demmer, and @rafael as well, as we were all discussing this functionality yesterday.
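To make "upload right after rotation, but never the still-active file" concrete: SHOW BINARY LOGS lists files oldest first, so an uploader can simply skip the last entry. A sketch, not existing vttablet code:

```go
package binlogbackup

import (
	"context"
	"database/sql"
)

// listRotatedBinlogs returns the binlog files MySQL has already rotated away
// from, i.e. everything except the file it is still writing to.
func listRotatedBinlogs(ctx context.Context, db *sql.DB) ([]string, error) {
	rows, err := db.QueryContext(ctx, "SHOW BINARY LOGS")
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	cols, err := rows.Columns()
	if err != nil {
		return nil, err
	}
	var names []string
	for rows.Next() {
		// Only the first column (Log_name) matters; scan the rest into
		// throwaway holders so this works on 5.7 (2 columns) and 8.0 (3).
		dest := make([]interface{}, len(cols))
		var name string
		dest[0] = &name
		for i := 1; i < len(cols); i++ {
			dest[i] = new(sql.RawBytes)
		}
		if err := rows.Scan(dest...); err != nil {
			return nil, err
		}
		names = append(names, name)
	}
	if err := rows.Err(); err != nil {
		return nil, err
	}
	if len(names) > 0 {
		names = names[:len(names)-1] // drop the still-active file
	}
	return names, nil
}
```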
@guidoiaquinti actually yea, that works. I missed that part and got hung up on the second sentence here:
But to your point, this could be configurable by setting max_binlog_size appropriately in the mysql configs, rather than as a tablet config. So 👍 from me.
Oh, I realized -- what we do is we force a particular interval by using a cron job which first calls `FLUSH BINARY LOGS`. This way, regardless of write volume, you can ensure that you're always recoverable to within X minutes. I suppose we could use a separate process to do this and just let Vitess notice the new log and upload it. Would be nice to have it built-in, but I guess it's not necessary if we take that approach.
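A built-in version of that cron job could be as small as a ticker that forces a rotation and leaves the upload to whatever watches for new files. A sketch (the interval would presumably become a tablet flag):

```go
package binlogbackup

import (
	"context"
	"database/sql"
	"log"
	"time"
)

// rotateBinlogsPeriodically issues FLUSH BINARY LOGS every interval, so there
// is always a freshly rotated file to upload even when write volume is low.
// It returns when ctx is cancelled.
func rotateBinlogsPeriodically(ctx context.Context, db *sql.DB, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if _, err := db.ExecContext(ctx, "FLUSH BINARY LOGS"); err != nil {
				log.Printf("forced binlog rotation failed: %v", err)
			}
		}
	}
}
```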
Good point, this is probably the best way to enforce a valid DR policy (e.g. we can lose at most X minutes of binlogs).
I could talk for hours about this subject, and all the ramifications working on some of these features would have... Here are some related thoughts: A. High replica count use cases: In setups where there are many replicas per shard (tens of them), a simple star-shaped replication scheme gets very limiting:
Internally, we developed a tool very similar to 'binlog server': it's basically a mysql replication node that only knows how to replicate binlogs, stores them locally, and can serve them to other slaves. If such nodes are inserted in between the master and the replicas, it gives the user a much better replication profile:
But this tool only makes sense with a somewhat high replica count, which may not be a very common use case. However...

B. Binlog backups: binlogs have a very cool property: they're append-only streams. It's a very different pattern than backups (big one-time dump) or data files (random access all the time). It makes them very similar to log files. Some cloud providers have very cool, cheap storage for append-only files. Back to the 'binlog server': you could have one instance of the binlog server backed by one of these storage systems. As it gets binlogs from the master, they're added to the cloud storage, making it a backup with very little latency (write binlogs every second, or if bigger than 128k, for instance). Then you may not even need binlog streams to stand up more 'binlog servers'; they could just read the cloud storage generated by the main one (and use master election to know which one owns writing to the cloud storage... good thing we have master election code in our topo API!). Also, when taking a backup, you store the exact position of the backup in the replication stream, so when you bring up a replica, it's very easy to find the replication starting point. With both backups and filtered replication streams being stored securely in your distributed cloud storage, it also becomes very easy to stand up a replica at exactly a replication point. It's then very easy to access data at any point in time, even from days or weeks ago, depending on the retention policy of the backups and binlogs. Very cool feature to get out of a jam when a bad application change wiped out critical data.

C. Different binlog strategies: to save local processing on the replicas, only the replicas that can be elected as masters should save their binlogs. The others should not, as they don't need to. Most setups are now enabling local binlogs on all replicas. In Vitess, filtered replication connects to replicas, but it could also connect to the binlog servers, obviously. Filtering is done on the server side for these, so they'd need to know the vschema; easy enough.

D. Pre-filtered binlog streams: when we split shards, we have to split the replication stream on the fly. The destination split shards get a subset of the replication stream. But what if the replication stream was already split, by default on every shard, into multiple streams, each for a smaller keyrange? Then when splitting a shard, you can just subscribe to a smaller number of streams. The 'binlog servers' could pre-split the streams by keyrange as they store them in the cloud storage. We could also pre-split the backups, but that's harder, as we just save the entire data files for backup.

E. Reversing how we do filtered replication: somewhat related to this topic. The way we do filtered replication right now is a bit convoluted: the destination shard master vttablet connects to a source shard replica, gets a replication stream from the binlog, turns the events into SQL, and re-plays the SQL locally. An alternate solution would be to give the destination shard master a MySQL replication-protocol master that knows how to serve the filtered binlog replication stream. Serving this from a binlog server would be somewhat easy. And we'd let MySQL replication do the work of remembering where we left off for us (now that it supports multiple sources, we can do that for merges too, not just splits). Wow, that's a lot of cool stuff we could do here...
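To make point B a bit more concrete, the inner loop of a binlog server that ships the stream to append-only cloud storage might look roughly like this: buffer events, flush every second or once the buffer passes 128 KiB, and tag each chunk with the replication position it ends at. Every name below is hypothetical; this is only a sketch of the idea, not existing Vitess code:

```go
package binlogserver

import (
	"bytes"
	"context"
	"time"
)

// Event is one binlog event plus the GTID position reached after applying it.
type Event struct {
	Data     []byte
	Position string
}

// AppendStore is the append-only cloud storage; Append records the position
// each chunk ends at so restores know exactly where to resume replication.
type AppendStore interface {
	Append(ctx context.Context, chunk []byte, endPosition string) error
}

// streamToStorage buffers incoming binlog events and flushes them every
// second, or sooner if the buffer exceeds 128 KiB.
func streamToStorage(ctx context.Context, events <-chan Event, store AppendStore) error {
	const maxChunk = 128 << 10 // 128 KiB
	var buf bytes.Buffer
	var lastPos string

	flush := func() error {
		if buf.Len() == 0 {
			return nil
		}
		if err := store.Append(ctx, buf.Bytes(), lastPos); err != nil {
			return err
		}
		buf.Reset()
		return nil
	}

	ticker := time.NewTicker(time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return flush()
		case <-ticker.C:
			if err := flush(); err != nil {
				return err
			}
		case ev, ok := <-events:
			if !ok {
				return flush()
			}
			buf.Write(ev.Data)
			lastPos = ev.Position
			if buf.Len() >= maxChunk {
				if err := flush(); err != nil {
					return err
				}
			}
		}
	}
}
```

Only the elected writer (via the topo-based master election mentioned above) would run this loop; the other binlog servers could read the same objects back instead of holding a binlog stream from the master.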
and I'm not even going into Sugu's plan of using semi-sync to control master commits, which would allow multiple masters and Paxos master election... You can tell Sugu and I have been thinking about this a lot, hehe.
Great write-up @alainjobart! Knowledge... dropped. It does seem like Vitess is well positioned to provide a new binlog server binary. We know the topology and we have the lock service, two big components for such a feature. This does seem like a larger project than periodic uploads, but it is both higher fidelity and more scalable. One could recover to within seconds or less, and scale for reads by just adding more readers to the cloud storage. Would be really sweet.
Couldn't agree more. This sounds like a hugely valuable addition to the Vitess portfolio of features.
This has a lot of interesting ramifications about what Vitess is, which is obviously important to @sougou and @jvaidya now that they're building a business. This is one of a growing list of features that are indifferent to sharding: an opinionated way to operate MySQL at any scale. Even for a startup with 100 MB of data, Vitess becomes a compelling solution that integrates and automates backups, failover, and monitoring. In the same way that Kubernetes "won" the orchestration wars, features like this position Vitess to "win" MySQL deployments. Why would you ever choose to run vanilla MySQL if/when it's just as easy to spin up Vitess?
Once Vitess is aware of binlogs, we can potentially expand their usage for other operations.