Register Log-based Backup Task to PD. #29969

Open
YuJuncen opened this issue Nov 22, 2021 · 8 comments
Labels
component/br (This issue is related to BR of TiDB.) · sig/migrate · type/feature-request (Categorizes issue or PR as related to a new feature.)

Comments

@YuJuncen
Contributor

YuJuncen commented Nov 22, 2021

We need:

  1. TiKV: listen for task changes from etcd (is there a v3 client? see the sketch after this list).
  2. TiDB: a client for pushing TaskInfo.
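
For point 1, the etcd v3 API is usable from Rust; below is a minimal sketch of how TiKV could listen for task changes, assuming the `etcd-client` crate, a tokio runtime, and an illustrative key prefix (none of these are decided here):

use etcd_client::{Client, EventType, WatchOptions};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
  let mut client = Client::connect(["127.0.0.1:2379"], None).await?;
  // Watch every key under the (hypothetical) task-info prefix.
  let (_watcher, mut stream) = client
    .watch("/tidb/br-stream/info/", Some(WatchOptions::new().with_prefix()))
    .await?;
  while let Some(resp) = stream.message().await? {
    for event in resp.events() {
      match event.event_type() {
        EventType::Put => println!("task put: {:?}", event.kv().and_then(|kv| kv.key_str().ok())),
        EventType::Delete => println!("task deleted: {:?}", event.kv().and_then(|kv| kv.key_str().ok())),
      }
    }
  }
  Ok(())
}
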
YuJuncen added the component/br, sig/migrate, and type/feature-request labels Nov 22, 2021
@YuJuncen
Contributor Author

For debugging, JSON would be the perfect format; however, if transmission ever becomes the bottleneck (!), protobuf might be the better choice. 🤔
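
A minimal sketch of the debugging-friendly JSON path, assuming `serde` (with the derive feature) and `serde_json`, on a cut-down TaskInfo; protobuf (e.g. via prost) could replace it later if message size matters:

use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize, Debug)]
struct TaskInfo {
  storage_url: String,
  start_ts: u64,
  end_ts: u64,
}

fn main() -> serde_json::Result<()> {
  let task = TaskInfo {
    storage_url: "s3://backup-bucket/log".to_owned(),
    start_ts: 1,
    end_ts: u64::MAX,
  };
  // Human-readable, easy to eyeball while debugging.
  println!("{}", serde_json::to_string_pretty(&task)?);
  Ok(())
}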

@YuJuncen
Contributor Author

YuJuncen commented Nov 22, 2021

According to the original design:

TaskInfo:(task_name) → (storage_url, start_ts, end_ts, …)
NextBackupTS:(task_id, store_id, region_id) → next_backup_TS
BackupStatus:(task_id, store_id) → min_resolve_TS

I guess the task would look like this (the field names could be shortened when marshaling):

struct TaskInfo {
  /// Backend URL.
  storage_url: String,
  /// The last timestamp at which the task was updated.
  /// This is a simple solution for infrequent config changes:
  /// when we observe a config change (via polling or an etcd watch),
  /// we perform an incremental scan over [last_update_ts, now)
  /// to fill in the data that changed while the config was changing.
  /// The current implementation scans [0, now) for every newly added task range,
  /// so this field is reserved for future usage.
  last_update_ts: TimeStamp,
  /// The timestamp range of the backup task.
  start_ts: TimeStamp,
  end_ts: TimeStamp,
  /// The key ranges of the backup task.
  /// Generated by the client.
  task_ranges: BTreeSet<(Key, Key)>,
  /// The table filter, for display.
  table_filter: String,
  /// credential ?
}
Another solution, avoiding the `last_update_ts` field, would be to keep tasks immutable and give each task only one range, corresponding to each index or the data of some table (a sketch of scanning the resulting key layout follows the struct below):
struct TaskInfo {
  /// Backend URL.
  storage_url: String,
  /// The object (index or data) name of the origin of the task, for display.
  /// The full path of the task info would be like:
  /// TaskInfo:(task_name):(object_name)
  /// (We can still perform a range scan for all tasks.)
  /// Formatted like "{schema_name}.{table_name}.{"data"|index_name}".
  object_name: String,
  /// The timestamp range of the backup task.
  start_ts: TimeStamp,
  end_ts: TimeStamp,
  /// The key range of the backup task.
  /// Generated by the client.
  task_range: (Key, Key),
}
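
A sketch of scanning that composite layout, assuming the `etcd-client` crate; the helper names are hypothetical:

use etcd_client::{Client, Error, GetOptions};

/// Build the composite key `TaskInfo:(task_name):(object_name)`.
fn object_key(task_name: &str, object_name: &str) -> String {
  format!("TaskInfo:{}:{}", task_name, object_name)
}

/// Range-scan every object-level entry of one task by prefix.
async fn objects_of_task(client: &mut Client, task_name: &str) -> Result<Vec<String>, Error> {
  let prefix = format!("TaskInfo:{}:", task_name);
  let resp = client.get(prefix, Some(GetOptions::new().with_prefix())).await?;
  Ok(resp
    .kvs()
    .iter()
    .filter_map(|kv| kv.key_str().ok().map(|k| k.to_string()))
    .collect())
}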

@YuJuncen
Contributor Author

YuJuncen commented Nov 22, 2021

The etcd paths may be like the ones used by CDC.
(The semantics of NextBackupTS are pretty much like the checkpoint ts in CDC: all data before this TS has been sent to the downstream.)

/tidb/br-stream/info/<task-name> → <JSON of TaskInfo>
/tidb/br-stream/checkpoint/<task-name>/<store-id>/<region-id> → <next_backup_ts_of_region>

~~/tidb/br-stream/status/<task-name>/<store-id> → <min_resolved_ts_of_store>~~
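
A sketch of building and writing these keys, assuming the `etcd-client` crate and a big-endian u64 encoding for the checkpoint value (both are assumptions, not decisions):

use etcd_client::{Client, Error};

fn task_info_key(task_name: &str) -> String {
  format!("/tidb/br-stream/info/{}", task_name)
}

fn checkpoint_key(task_name: &str, store_id: u64, region_id: u64) -> String {
  format!("/tidb/br-stream/checkpoint/{}/{}/{}", task_name, store_id, region_id)
}

/// Persist the next_backup_ts of one region under the checkpoint prefix.
async fn put_checkpoint(
  client: &mut Client,
  task_name: &str,
  store_id: u64,
  region_id: u64,
  next_backup_ts: u64,
) -> Result<(), Error> {
  client
    .put(
      checkpoint_key(task_name, store_id, region_id),
      next_backup_ts.to_be_bytes().to_vec(),
      None,
    )
    .await?;
  Ok(())
}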

@3pointer
Contributor

Shall we consider integrating the storage credentials into TaskInfo?

@YuJuncen
Contributor Author

YuJuncen commented Nov 22, 2021

The operations BR needs:
(Assuming we use the task name as its ID.)

type MetadataService interface {
  // PutTask creates a new task.
  PutTask(task TaskInfo) error
  // PatchTask updates the task ranges when DDLs are encountered.
  PatchTask(task string, rangeDiff []RangeDiff) error
  // DeleteTask deletes a task when it is stopped.
  DeleteTask(task string) error
  // FetchProgressOf returns the progress of a store: the minimum of the
  // `next backup ts` over all regions in the store.
  FetchProgressOf(store uint64) (uint64, error)
}

The operations TiKV needs:

enum TaskChange {
  // Maybe batch them?
  // Even now, we perform a scan over [0, now) for each change;
  // maybe the conf-change TS needs to be added here?
  AddRange {
    task_name: String,
    start_key: Key,
    end_key: Key,
  },
  RemoveRange {
    task_name: String,
    start_key: Key,
    end_key: Key,
  },
  TaskAdd {
    task_info: TaskInfo,
  },
  TaskRemoved {
    task_name: String,
  }
}

trait MetadataService {
  fn all_tasks(&self) -> Result<Vec<TaskInfo>>;
  fn task_by_name(&self, name: &str) -> Result<Option<TaskInfo>>;
  /// Watch the changes of tasks.
  /// Maybe also a per-task watch?
  fn watch_task(&self) -> impl Stream<Item = TaskChange>;
  /// Update the next backup ts (aka checkpoint ts); this should be called after the backup archive is successfully saved?
  /// The store_id should be held by the impl of MetadataService.
  /// Maybe also update a cache for querying?
  fn update_next_backup_ts(&self, task: &str, region_id: u64, next_backup_ts: TimeStamp) -> Result<()>;
  /// Fetch the `next_backup_ts` of some region,
  /// for fail-over and for doing the incremental scan.
  fn next_backup_ts_of(&self, task: &str, region_id: u64) -> Result<TimeStamp>;
}

@YuJuncen
Contributor Author

When the range set grows (say, someone backs up a cluster with 100K tables using -f '*.*'), the request size limit of etcd might be exceeded.
We may extract the backup ranges out of TaskInfo. That is:

  1. remove the task_ranges field from the TaskInfo struct.
  2. add a new key family in etcd:
/tidb/br-stream/ranges/<task-name>/<range_start> → <range_end>

We can watch the prefix /tidb/br-stream/ranges/<task-name>, so whenever any range of a task changes, we get a notification.

We may add a new method get_range_intersection_of(range: (Vec<u8>, Vec<u8>)) -> Vec<(Vec<u8>, Vec<u8>)> that performs a bisection search and finds the intersection of the provided range (probably the key range of some region) with the target ranges of the task, for further filtering.
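
A minimal sketch of such a method, assuming the task's target ranges are cached locally in a BTreeMap keyed by range start (non-overlapping, end-exclusive ranges, as loaded from /tidb/br-stream/ranges/<task-name>/):

use std::collections::BTreeMap;

type Key = Vec<u8>;

/// Return the parts of the task's target ranges that overlap the queried range
/// (probably the key range of some region), clamped to that range.
fn get_range_intersection_of(
  ranges: &BTreeMap<Key, Key>, // range_start -> range_end
  start: &[u8],
  end: &[u8],
) -> Vec<(Key, Key)> {
  // Bisection: the only range that can start before `start` and still overlap it
  // is the one immediately preceding it, so step back one entry before iterating.
  let seek_from: Key = ranges
    .range::<[u8], _>(..=start)
    .next_back()
    .map(|(s, _)| s.clone())
    .unwrap_or_default();
  ranges
    .range(seek_from..)
    .take_while(|(s, _)| s.as_slice() < end)
    .filter(|(_, e)| e.as_slice() > start)
    .map(|(s, e)| {
      (
        std::cmp::max(s.as_slice(), start).to_vec(),
        std::cmp::min(e.as_slice(), end).to_vec(),
      )
    })
    .collect()
}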

@YuJuncen
Contributor Author

If we watch the task key for pausing, things may get complex: we would have to compare each field to compute the diff.
Like the ranges, I prefer extracting this into a separate key:

/tidb/br-stream/pause/<task-name> → ()

When the key is set, the task is treated as paused.
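
A minimal sketch, assuming the `etcd-client` crate; the helpers are hypothetical:

use etcd_client::{Client, Error};

fn pause_key(task_name: &str) -> String {
  format!("/tidb/br-stream/pause/{}", task_name)
}

/// Pause a task by creating its pause key; the value does not matter.
async fn pause_task(client: &mut Client, task_name: &str) -> Result<(), Error> {
  client.put(pause_key(task_name), Vec::<u8>::new(), None).await?;
  Ok(())
}

/// The task is treated as paused iff its pause key exists.
async fn is_paused(client: &mut Client, task_name: &str) -> Result<bool, Error> {
  let resp = client.get(pause_key(task_name), None).await?;
  Ok(!resp.kvs().is_empty())
}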

@YuJuncen
Contributor Author

There is another problem: how do we store the store-level min backup TS and export it to the BR client? (The current implementation, #30088, scans all regions under the /checkpoint/<task-name>/<store-id>/ prefix and calculates the minimal next backup TS, which requires GC when regions split or merge to prevent stale next backup TS entries.)
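
For reference, a sketch of that store-level aggregation, assuming the `etcd-client` crate and big-endian u64 values under the checkpoint prefix (both assumptions):

use etcd_client::{Client, Error, GetOptions};
use std::convert::TryInto;

/// Scan /tidb/br-stream/checkpoint/<task-name>/<store-id>/ and take the minimum
/// next_backup_ts over all regions of the store. Stale region keys left behind by
/// splits/merges must be GC-ed, otherwise they pin this minimum forever.
async fn store_progress(
  client: &mut Client,
  task_name: &str,
  store_id: u64,
) -> Result<Option<u64>, Error> {
  let prefix = format!("/tidb/br-stream/checkpoint/{}/{}/", task_name, store_id);
  let resp = client.get(prefix, Some(GetOptions::new().with_prefix())).await?;
  Ok(resp
    .kvs()
    .iter()
    .filter_map(|kv| kv.value().try_into().ok().map(u64::from_be_bytes))
    .min())
}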
