Register Log-based Backup Task to PD. #29969
For debugging, JSON would be the perfect format; however, if transmitting ever becomes the bottleneck (!), protobuf might be a better choice. 🤔

---
According to the original design:
I guess the task would look like this (the field names may be shortened while marshaling):

```rust
struct TaskInfo {
    /// Backend URL.
    storage_url: String,
    /// The last timestamp at which the task was updated.
    /// This is a simple solution for infrequent config changes:
    /// when we watch a config change (via polling or etcd watching),
    /// we perform an incremental scan over [last_update_ts, now)
    /// to fill in the diff data produced during the config change.
    /// The current implementation scans [0, now) for every newly added task range,
    /// so this field is reserved for future usage.
    last_update_ts: TimeStamp,
    /// The timestamp range for the backup task.
    start_ts: TimeStamp,
    end_ts: TimeStamp,
    /// The key ranges for the backup task.
    /// Generated by the client.
    task_ranges: BTreeSet<(Key, Key)>,
    /// The table filter, for displaying.
    table_filter: String,
    /// Credentials?
}

struct TaskInfo {
    /// Backend URL.
    storage_url: String,
    /// The object (index or data) name of the origin of the task, for displaying.
    /// The full path of the task info would be like:
    /// TaskInfo:(task_name):(object_name)
    /// (We can still perform a range scan for all tasks.)
    /// Formatted like "{schema_name}.{table_name}.{"data"|index_name}"
    object_name: String,
    /// The timestamp range for the backup task.
    start_ts: TimeStamp,
    end_ts: TimeStamp,
    /// The key range for the backup task.
    /// Generated by the client.
    task_range: (Key, Key),
}
```
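To make the two earlier points concrete (JSON for debuggability, shortened field names on the wire), here is a minimal sketch using serde; the short names and the `u64` stand-in for `TimeStamp` are illustrative assumptions, not a decided schema:

```rust
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
struct TaskInfo {
    // `rename` keeps the Rust field readable while shrinking the payload.
    #[serde(rename = "url")]
    storage_url: String,
    #[serde(rename = "start")]
    start_ts: u64, // stand-in for TimeStamp
    #[serde(rename = "end")]
    end_ts: u64,
}

fn main() -> serde_json::Result<()> {
    let task = TaskInfo {
        storage_url: "s3://bucket/prefix".to_owned(),
        start_ts: 1,
        end_ts: u64::MAX,
    };
    // JSON while debugging; the same struct could later be mapped onto a
    // protobuf message if transmission becomes the bottleneck.
    println!("{}", serde_json::to_string(&task)?);
    Ok(())
}
```

---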
The path in etcd may be similar to the path used by CDC.
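For instance, a purely hypothetical layout in the spirit of CDC's `/tidb/cdc/...` keys (none of these names are decided):

```
/tidb/br-stream/info/{task_name}                   -> marshaled TaskInfo
/tidb/br-stream/ranges/{task_name}/{start_key}     -> end_key
/tidb/br-stream/checkpoint/{task_name}/{store_id}  -> next_backup_ts
```

---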
Shall we consider integrating the storage credentials into TaskInfo?

---
The operations BR needs:

```go
type MetadataService interface {
    // Create new tasks.
    PutTask(task TaskInfo) error
    // Update ranges when meeting DDLs.
    PatchTask(task string, rangeDiff []RangeDiff) error
    // Delete the task when it stops.
    DeleteTask(task string)
    // Fetch the progress of a store, returning the min of the
    // `next backup ts` over all regions in the store.
    FetchProgressOf(store uint64) uint64
}
```

The operations TiKV needs:

```rust
enum TaskChange {
    // Maybe batch them?
    // Even now, we perform a scan over [0, now) for each change;
    // maybe adding the conf change TS here would be needed?
    AddRange {
        task_name: String,
        start_key: Key,
        end_key: Key,
    },
    RemoveRange {
        task_name: String,
        start_key: Key,
        end_key: Key,
    },
    TaskAdd {
        task_info: TaskInfo,
    },
    TaskRemoved {
        task_name: String,
    },
}
```
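For a feel of how TiKV might consume these events, a rough sketch against the `TaskChange` enum above; the `Key` alias and the registry type are assumptions for illustration:

```rust
use std::collections::{BTreeMap, HashMap};

type Key = Vec<u8>; // assumption: raw byte keys

/// Hypothetical in-memory view of the ranges each task observes.
#[derive(Default)]
struct RangeRegistry {
    /// task name -> (start_key -> end_key)
    ranges: HashMap<String, BTreeMap<Key, Key>>,
}

impl RangeRegistry {
    fn apply(&mut self, change: TaskChange) {
        match change {
            TaskChange::AddRange { task_name, start_key, end_key } => {
                self.ranges.entry(task_name).or_default().insert(start_key, end_key);
            }
            TaskChange::RemoveRange { task_name, start_key, .. } => {
                if let Some(rs) = self.ranges.get_mut(&task_name) {
                    rs.remove(&start_key);
                }
            }
            TaskChange::TaskAdd { .. } => {
                // Nothing to register yet: in this sketch the task's
                // ranges arrive through subsequent AddRange events.
            }
            TaskChange::TaskRemoved { task_name } => {
                self.ranges.remove(&task_name);
            }
        }
    }
}
```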
```rust
trait MetadataService {
    fn all_tasks(&self) -> Result<Vec<TaskInfo>>;
    fn task_by_name(&self, name: &str) -> Result<Option<TaskInfo>>;
    /// Watch the changes of tasks.
    /// Maybe also a per-task watch?
    fn watch_task(&self) -> Box<dyn Stream<Item = TaskChange>>;
    /// Update the next backup ts (aka checkpoint ts); this should be called
    /// after the backup archive has been saved successfully?
    /// The store_id should be held by the impl of MetadataService.
    /// Maybe also update a cache for querying?
    fn update_next_backup_ts(&self, task: &str, region_id: u64, next_backup_ts: TimeStamp) -> Result<()>;
    /// Fetch the `next_backup_ts` of some region,
    /// for fail-over and incremental scanning.
    fn next_backup_ts_of(&self, task: &str, region_id: u64) -> Result<TimeStamp>;
}
```
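And a rough sketch of what `watch_task` could look like over etcd, using the `etcd-client` crate; the key prefix and the event-to-change mapping are assumptions, not a settled design:

```rust
use etcd_client::{Client, EventType, WatchOptions};

// Hypothetical prefix; see the key-layout sketch above.
const TASK_PREFIX: &str = "/tidb/br-stream/info/";

async fn watch_tasks(client: &mut Client) -> Result<(), etcd_client::Error> {
    let opts = WatchOptions::new().with_prefix();
    let (_watcher, mut stream) = client.watch(TASK_PREFIX, Some(opts)).await?;
    while let Some(resp) = stream.message().await? {
        for event in resp.events() {
            if let Some(kv) = event.kv() {
                let task_name = kv.key_str()?.trim_start_matches(TASK_PREFIX).to_owned();
                match event.event_type() {
                    // PUT on the info key => TaskAdd (value holds the marshaled TaskInfo).
                    EventType::Put => println!("TaskAdd: {task_name}"),
                    // DELETE on the info key => TaskRemoved.
                    EventType::Delete => println!("TaskRemoved: {task_name}"),
                }
            }
        }
    }
    Ok(())
}
```

---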
When the range size grows (say, someone backs up a cluster with 100K tables with […]), we can watch the prefix […] instead; we may add a new method […].

---
If we watch the task key itself for pausing, things may get complex: we would have to compare each field to find the diff. With a dedicated key, the task would be treated as paused whenever that key is set.
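A tiny sketch of that alternative (key name hypothetical): existence of the pause key means paused, so no field diffing is needed:

```rust
use etcd_client::Client;

// Hypothetical key; pausing = putting the key, resuming = deleting it.
async fn is_paused(client: &mut Client, task: &str) -> Result<bool, etcd_client::Error> {
    let key = format!("/tidb/br-stream/pause/{task}");
    let resp = client.get(key, None).await?;
    Ok(!resp.kvs().is_empty())
}
```

---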
There is another problem: how do we store the store-level min backup ts and export it to the BR client? (The current implementation, #30088, scans all regions under /checkpoint// and calculates the minimal next backup TS, which must involve GC when regions split or merge, to prevent a stale next backup TS.)
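To make the GC concern concrete, a sketch of the store-level aggregation (all names hypothetical): entries for regions that no longer live on the store after a split or merge must be dropped first, or the minimum would be computed over stale checkpoints and never advance:

```rust
use std::collections::{HashMap, HashSet};

/// Minimal sketch: `checkpoints` maps region id -> next backup ts as stored
/// under the checkpoint path; `live_regions` is the store's current region set.
fn store_progress(checkpoints: &HashMap<u64, u64>, live_regions: &HashSet<u64>) -> Option<u64> {
    checkpoints
        .iter()
        // "GC": skip (or, in the real service, delete) entries for regions
        // that no longer exist on this store after split / merge.
        .filter(|(region_id, _)| live_regions.contains(region_id))
        .map(|(_, ts)| *ts)
        .min()
}
```

---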
We need: […]