RFC: Point in time recovery (PITR) Part 2 #6267
Comments
I've been confused about the binlog server part of the PITR approach. Why are we using ripple instead of VRep? What do we get out of it?
To restore back to an exact point in time, we need the continuous binlogs to be available, which any binlog server can manage (i.e., saving the binlog files to storage). VReplication, on the other hand, replicates a database: it reads the binlogs and applies them to the database. That is why we need a binlog server, and ripple can be used as that binlog server.
To elaborate a bit more on @arindamnayak's point above:
It's a known quantity, and if you're running Vitess, you're running VReplication. Certainly that's less complexity than introducing an entire new unmanaged binlog server that you have to manage outside the context of Vitess. I wouldn't want to have to deploy and learn ripple or any other binlog server when I shouldn't have to. That's how we ended up in the current undesirable Orchestrator state.
I'm not sure how this compares functionality-wise, but if we are trying to keep things simple, maybe we could make a binlog server a first-class citizen and use https://github.com/flike/kingbus, which is also written in Go.
@derekperkins we are trying to first address the situation where people already have a binlog server and we can connect to that. I agree that a VReplication based solution will be more native with fewer moving parts. |
I totally get tackling it a piece at a time, just hoping that the solution is built with that in mind. The dream would be for Vitess to own binlogs entirely, for replication purposes, backups, PITR, etc. |
Also, rereading that conversation makes me wish that @alainjobart would make the jump to PlanetScale. :) |
This is an extension to #4886.
Feature Description
With the current PITR support in Vitess, it is only possible to restore to the timestamp of the last backup; the delta of changes after that backup cannot be applied. For example, say the last backup was taken at 12:00 AM and we need to restore up to 3:15 AM. Today we can only restore to 12:00 AM. With this change, it becomes possible to restore the data up to 3:15 AM.
Use cases
The use cases remain the same as in part 1 (#4886).
Precondition
Proposed Design
- There will be a binlog server that connects to the MySQL server of the master tablet. In a sharded cluster with n shards, there will be n binlog servers.
- Scheduled backups are taken at regular intervals.
- Say we have to recover the data to 6:15 AM: we will create a recovery keyspace from the 6 AM backup, and it will connect to the binlog server to get the incremental data for the last 15 minutes. (A rough sketch of this flow follows below.)
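For concreteness, here is a minimal Go sketch of that three-step flow. The Recoverer interface and its method names are hypothetical, introduced only to name the steps above; they are not existing Vitess APIs. The two sections further down ("Getting GTID from timestamp" and "Getting the data till the desired point of time") sketch what FindStopGTID and ApplyUntil could look like.

```go
// Sketch only: Recoverer and its methods are hypothetical names for the
// three steps of the proposed design, not actual Vitess APIs.
package pitr

import "time"

// Recoverer abstracts the building blocks of the proposed design.
type Recoverer interface {
	// RestoreLastBackup restores the most recent backup taken at or
	// before t and returns the GTID position that backup corresponds to.
	RestoreLastBackup(shard string, t time.Time) (gtidPos string, err error)
	// FindStopGTID scans the binlog server starting at fromGTID and
	// returns the last GTID whose event timestamp is <= t.
	FindStopGTID(shard, fromGTID string, t time.Time) (string, error)
	// ApplyUntil replicates from the binlog server until stopGTID is reached.
	ApplyUntil(shard, stopGTID string) error
}

// RecoverToTime restores a shard to its state at recoveryTime, e.g. the
// 6 AM backup plus the binlog events up to 6:15 AM.
func RecoverToTime(r Recoverer, shard string, recoveryTime time.Time) error {
	pos, err := r.RestoreLastBackup(shard, recoveryTime)
	if err != nil {
		return err
	}
	stop, err := r.FindStopGTID(shard, pos, recoveryTime)
	if err != nil {
		return err
	}
	return r.ApplyUntil(shard, stop)
}
```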
Binlog server
There should be a binlog server that uses a reliable file storage system. It should be highly available so that we don't miss any binlogs. In a sharded environment, we need to run a separate binlog server for each shard. mysql-ripple can be used as the binlog server. The lifecycle of the binlog server has to be managed independently.
Applying binlogs
While creating the recovery keyspace, we accept a timestamp. Using that information, we extract the GTID up to which binlogs will be applied to the restored backup. The recovered replica will then replicate from the binlog server to apply the binlogs needed to reach that GTID, using the MySQL replication command START SLAVE UNTIL SQL_BEFORE_GTIDS = 'xxxx-xx-xx:y-z'.
Note: we will choose the last GTID before the provided recovery timestamp.
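As a rough illustration of this step, assuming a *sql.DB opened against the restored replica's mysqld (e.g. via the go-sql-driver/mysql driver), the replication statements could be issued like this. The function name and the host/credential parameters are placeholders, not part of the proposal:

```go
package pitr

import (
	"database/sql"
	"fmt"
)

// applyUntilGTID points the restored replica at the binlog server and
// lets the SQL thread run until the given GTID set is reached.
// binlogHost/binlogPort/user/password and stopGTIDSet are placeholders.
func applyUntilGTID(db *sql.DB, binlogHost string, binlogPort int, user, password, stopGTIDSet string) error {
	// Point replication at the binlog server using GTID auto-positioning.
	changeMaster := fmt.Sprintf(
		"CHANGE MASTER TO MASTER_HOST='%s', MASTER_PORT=%d, MASTER_USER='%s', MASTER_PASSWORD='%s', MASTER_AUTO_POSITION=1",
		binlogHost, binlogPort, user, password)
	if _, err := db.Exec(changeMaster); err != nil {
		return err
	}
	// Per the MySQL documentation, SQL_BEFORE_GTIDS stops the SQL thread
	// just before it executes the first transaction whose GTID is in the set.
	_, err := db.Exec(fmt.Sprintf("START SLAVE UNTIL SQL_BEFORE_GTIDS = '%s'", stopGTIDSet))
	return err
}
```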
Getting GTID from timestamp
While creating the recovery keyspace, we have the requested timestamp(#Ref) to restore up to. We also have the GTID of the most recent backup before that time, e.g. for a PITR to 6:15 AM the most recent backup is the 6 AM one (assuming a 6-hour backup schedule). We will then connect to the binlog server as a replica with start_pos = the GTID of that backup, and read events sequentially as long as the event timestamp is less than or equal to the requested timestamp(#Ref); once we cross that point, we note the GTID.
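One way this scan could be sketched (not necessarily how Vitess will implement it) is with a generic Go binlog client such as github.com/go-mysql-org/go-mysql: connect to the binlog server as a replica, start streaming from the backup's GTID set, and remember the last GTID seen before an event's timestamp passes the requested time. The host, credentials, and server ID below are placeholders:

```go
package pitr

import (
	"context"
	"fmt"
	"time"

	"github.com/go-mysql-org/go-mysql/mysql"
	"github.com/go-mysql-org/go-mysql/replication"
	"github.com/google/uuid"
)

// findGTIDForTime streams binlog events from the binlog server, starting at
// the backup's GTID set, and returns the last GTID whose event timestamp is
// at or before recoveryTime. Connection details are placeholders.
func findGTIDForTime(ctx context.Context, backupGTIDSet string, recoveryTime time.Time) (string, error) {
	syncer := replication.NewBinlogSyncer(replication.BinlogSyncerConfig{
		ServerID: 12345,           // placeholder replica server ID
		Flavor:   "mysql",
		Host:     "binlog-server", // placeholder binlog server address
		Port:     3306,
		User:     "repl",
		Password: "repl-password",
	})
	defer syncer.Close()

	startSet, err := mysql.ParseGTIDSet("mysql", backupGTIDSet)
	if err != nil {
		return "", err
	}
	streamer, err := syncer.StartSyncGTID(startSet)
	if err != nil {
		return "", err
	}

	lastGTID := ""
	for {
		ev, err := streamer.GetEvent(ctx)
		if err != nil {
			return lastGTID, err
		}
		// Stop once we see an event timestamped after the recovery time;
		// lastGTID then holds the last GTID before the provided timestamp.
		if int64(ev.Header.Timestamp) > recoveryTime.Unix() {
			return lastGTID, nil
		}
		if g, ok := ev.Event.(*replication.GTIDEvent); ok {
			sid, err := uuid.FromBytes(g.SID)
			if err != nil {
				return "", err
			}
			lastGTID = fmt.Sprintf("%s:%d", sid, g.GNO)
		}
	}
}
```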
Getting the data till the desired point of time
At this point we have the last available backup and the GTID (from the previous step) up to which binlogs need to be applied.
First, we will restore the last available backup. Then we will connect to the binlog server as a replica with the START SLAVE UNTIL SQL_BEFORE_GTIDS = 'xxxx-xx-xx:y-z' option, which will apply the incremental data up to the desired point in time.
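One detail worth spelling out: after issuing START SLAVE UNTIL, the restore process still has to wait for the SQL thread to actually reach the UNTIL condition before the recovery keyspace is usable. A minimal sketch of that wait, polling SHOW SLAVE STATUS; the function names and the one-second poll interval are arbitrary placeholders, and a real implementation would also bound the total wait:

```go
package pitr

import (
	"database/sql"
	"fmt"
	"time"
)

// waitUntilSQLThreadStops polls the restored replica until the SQL thread
// has stopped, which is how MySQL signals that the START SLAVE UNTIL
// condition has been reached.
func waitUntilSQLThreadStops(db *sql.DB) error {
	for {
		running, err := slaveSQLRunning(db)
		if err != nil {
			return err
		}
		if !running {
			return nil
		}
		time.Sleep(time.Second)
	}
}

// slaveSQLRunning reads the Slave_SQL_Running column of SHOW SLAVE STATUS.
func slaveSQLRunning(db *sql.DB) (bool, error) {
	rows, err := db.Query("SHOW SLAVE STATUS")
	if err != nil {
		return false, err
	}
	defer rows.Close()
	cols, err := rows.Columns()
	if err != nil {
		return false, err
	}
	if !rows.Next() {
		return false, fmt.Errorf("replication is not configured")
	}
	vals := make([]any, len(cols))
	for i := range vals {
		vals[i] = new(sql.NullString)
	}
	if err := rows.Scan(vals...); err != nil {
		return false, err
	}
	for i, c := range cols {
		if c == "Slave_SQL_Running" {
			return vals[i].(*sql.NullString).String == "Yes", nil
		}
	}
	return false, fmt.Errorf("Slave_SQL_Running column not found")
}
```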
FAQ
New configuration
While restoring the tablet, you have to specify the binlog server details as command-line arguments to the vttablet process.
If a keyspace has multiple shards, you need to spawn a binlog server per shard, and while recovering a particular shard (or shards), pass that shard's binlog server information in the command-line arguments.
Binlog server and its state management
As of now, no binlog server is provided out of the box in Vitess. You will have to spawn the binlog server yourself and connect it to the master tablet's database. Since the master tablet can change via reparenting or other means, you have to repoint the binlog server to the new master. The binlog server also needs to be highly available, since the binlog files are critical for the restore. If you have a sharded database, you will need a separate binlog server for each shard's master.