rfc(dm): add rfc for continuous data validation #6391
Merged
Commits (18)
- 7b80cdb add rfc (D3Hunter)
- 8858388 Update dm/docs/RFCS/20220721_continuous_data_validation.md (D3Hunter)
- 6053342 Update dm/docs/RFCS/20220721_continuous_data_validation.md (D3Hunter)
- 8980c17 Update dm/docs/RFCS/20220721_continuous_data_validation.md (D3Hunter)
- a27a6d6 Update dm/docs/RFCS/20220721_continuous_data_validation.md (D3Hunter)
- dd809de Update dm/docs/RFCS/20220721_continuous_data_validation.md (D3Hunter)
- 46e142d Update dm/docs/RFCS/20220721_continuous_data_validation.md (D3Hunter)
- a40f87d Update dm/docs/RFCS/20220721_continuous_data_validation.md (D3Hunter)
- 25eac7c Update dm/docs/RFCS/20220721_continuous_data_validation.md (D3Hunter)
- ef59e40 Update dm/docs/RFCS/20220721_continuous_data_validation.md (D3Hunter)
- d975c06 Update dm/docs/RFCS/20220721_continuous_data_validation.md (D3Hunter)
- 823a698 Update dm/docs/RFCS/20220721_continuous_data_validation.md (D3Hunter)
- 220e426 Update dm/docs/RFCS/20220721_continuous_data_validation.md (D3Hunter)
- d20af6b Update dm/docs/RFCS/20220721_continuous_data_validation.md (D3Hunter)
- 33b2a5f fix comments (D3Hunter)
- b7117cf Merge remote-tracking branch 'origin/validator-design-doc' into valid… (D3Hunter)
- 5b9b255 fix comments (D3Hunter)
- e073ce7 Merge branch 'master' into validator-design-doc (ti-chi-bot)

File: dm/docs/RFCS/20220721_continuous_data_validation.md (new file, 64 lines)

# Continuous Data Validation

## Background

TiDB already has [`sync-diff-inspector`](https://docs.pingcap.com/tidb/stable/sync-diff-inspector-overview) to validate data after a full data migration, but there is no such option for incremental data migration. We could run a full validation with `sync-diff-inspector` during a database maintenance window to cover all incremental data migrated since the last full validation, but that window is usually too short for `sync-diff-inspector` to complete, and it might be too late to find any incorrectly migrated rows. We need an alternative that validates incremental data in a more real-time way, so that incorrectly migrated rows are found earlier, and that puts less pressure on the upstream and downstream databases, so business queries can run normally during validation.

## Limitations

Some known limitations of this design:

- A table that needs validation must have a primary key or a non-null unique key.
- Between the current syncer location and the validation location, the binlog stream must contain only compatible DDL operations, so that the validator can use the current DDL in the schema-tracker to validate row changes in historical binlog:
  - no operation that changes the primary key or a non-null unique key
  - no operation that changes column order
  - no drop column operation
- The downstream table that needs validation must not be dropped.
- Validation is not supported for tasks that enable extend-column.
- Validation is not supported for tasks that enable event filtering by expression.
- TiDB implements floating-point data types differently from MySQL, so two values are considered equal when their absolute difference is < 10^-6 (see the sketch after this list).
- Validation is not supported for JSON and binary data types.
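As a concrete illustration of the floating-point tolerance above, the comparison could look like the following Go sketch; the 1e-6 epsilon comes from this document, while the helper name is hypothetical and not the actual DM code:

```go
package main

import (
	"fmt"
	"math"
)

// floatEpsilon is the tolerance mentioned above: values whose absolute
// difference is below 1e-6 are treated as equal.
const floatEpsilon = 1e-6

// floatEqual is a hypothetical helper showing the comparison rule.
func floatEqual(a, b float64) bool {
	return math.Abs(a-b) < floatEpsilon
}

func main() {
	fmt.Println(floatEqual(1.0000001, 1.0000004)) // true: |diff| ≈ 3e-7
	fmt.Println(floatEqual(1.0, 1.00001))         // false: |diff| = 1e-5
}
```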

## Concepts

- `row change`: the validator decodes upstream binlog into `row change`s (see the sketch after this list); each one contains:
  - change type (`insert`/`update`/`delete`)
  - table name
  - data before the change (missing for an `insert` change)
  - data after the change (missing for a `delete` change)
- `failed row change`: a `row change` that has failed validation but has not yet been marked as an `error row change`
- `error row change`: if a `failed row change` keeps failing validation for long enough (`delay time`), it is marked as an `error row change`
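A minimal Go sketch of the `row change` structure described above; the field and type names are illustrative, not the actual DM implementation:

```go
package validator

// ChangeType is the change type carried by a row change.
type ChangeType int

const (
	RowInsert ChangeType = iota
	RowUpdate
	RowDelete
)

// RowChange is an illustrative shape for a decoded binlog row change.
type RowChange struct {
	Type       ChangeType
	Schema     string        // upstream schema name
	Table      string        // upstream table name
	PreValues  []interface{} // data before the change; empty for insert
	PostValues []interface{} // data after the change; empty for delete
}
```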

## Detailed Design

> Review comment: its name is "Detailed" design, I expect there should also exist a "brief" introduction.
>
> Reply: it's from the google doc template.

### Life cycle of validator

The validator can be enabled together with the task or enabled on the fly; a task that enables the validator must have an incremental migration unit, i.e. a syncer. The validator is deleted when the task is deleted. An enabled validator is in the `running` state; it can be stopped manually, or it stops when it meets an error that cannot be recovered automatically, and then turns to the `stopped` state. A validator in the `stopped` state can be started again and returns to the `running` state. A sketch of these transitions follows.
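A small Go sketch of the two states and transitions described above, under the assumption that only `running` and `stopped` exist; all names are illustrative:

```go
package validator

// Stage models the validator states described above.
type Stage int

const (
	StageRunning Stage = iota
	StageStopped
)

// lifecycle is a toy holder; the real validator carries much more state.
type lifecycle struct {
	stage Stage
}

// start is used both when the validator is enabled (with the task or on
// the fly) and when a stopped validator is started again.
func (l *lifecycle) start() {
	l.stage = StageRunning
}

// stop is called manually or when an unrecoverable error is met.
func (l *lifecycle) stop() {
	l.stage = StageStopped
}
```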

### Validation process

1. A validator in the `running` state pulls binlog from upstream and decodes it into `row change`s.
   - The validator only validates `row change`s that have already been migrated by the syncer.
2. After routing and filtering with the same rules as the syncer, each `row change` is dispatched to a validator `worker`. `row change`s of the same table and primary key are dispatched to the same `worker` (see the dispatch/merge sketch after this list).
3. The `worker` merges `row change`s by table and primary key, keeping only the last `row change`, since a later change overrides earlier ones, and puts it into `pending row changes`.
4. After accumulating enough `row change`s, or after a set interval, the `worker` queries the downstream to fetch the replicated data for those `row change`s, then compares each `row change` with its downstream counterpart (see the comparison sketch below):
   - `insert`/`update` row changes are validated according to the validation mode:
     - in `full` validation mode, they are compared column by column
     - in `fast` validation mode, only their existence is checked
   - for a `delete` row change, the downstream must not contain that row.
5. `row change`s that are validated successfully are removed from `pending row changes`; those that fail validation are marked as `failed row change`s and validated again after a set interval.
6. If a `failed row change` still does not pass validation after a set time (`delay time`) since its first validation, it is marked as an `error row change` and saved into the meta database.
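A Go sketch of steps 2 and 3, reusing the illustrative `RowChange` shape from the Concepts section: changes with the same table and primary key hash to the same worker, and within a worker the latest change for a key replaces earlier ones (all names are assumptions, not the real DM code):

```go
package validator

import "hash/fnv"

// changeKey identifies a row by schema, table and primary-key values, so
// all changes to the same row share one key.
func changeKey(rc *RowChange, pkValues []string) string {
	key := rc.Schema + "." + rc.Table
	for _, v := range pkValues {
		key += "." + v
	}
	return key
}

// dispatch picks a worker index for a key; the same key always goes to
// the same worker (step 2).
func dispatch(key string, workerCount int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % uint32(workerCount))
}

// mergeIntoPending keeps only the last change for each key, since a later
// change overrides earlier ones (step 3).
func mergeIntoPending(pending map[string]*RowChange, key string, rc *RowChange) {
	pending[key] = rc
}
```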
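A Go sketch of step 4's two validation modes, again using the illustrative `RowChange`: `full` compares column by column, `fast` only checks existence, and a `delete` change expects the row to be absent downstream. The comparison helpers are assumptions for illustration:

```go
package validator

// downstreamRow is the row fetched from the downstream for the same
// primary key; nil means the row does not exist downstream.
type downstreamRow []interface{}

// validateFull compares every column value (full validation mode).
func validateFull(rc *RowChange, down downstreamRow) bool {
	if rc.Type == RowDelete {
		return down == nil // the deleted row must be absent downstream
	}
	if down == nil || len(down) != len(rc.PostValues) {
		return false
	}
	for i, v := range rc.PostValues {
		if !columnEqual(v, down[i]) {
			return false
		}
	}
	return true
}

// validateFast only checks existence (fast validation mode).
func validateFast(rc *RowChange, down downstreamRow) bool {
	if rc.Type == RowDelete {
		return down == nil
	}
	return down != nil
}

// columnEqual stands in for type-aware comparison, e.g. the 1e-6
// tolerance for floating-point columns from the limitations above.
func columnEqual(a, b interface{}) bool { return a == b }
```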

### False positive

An `error row change` produced by the validation process is not necessarily data that was migrated incorrectly; there are cases where a `row change` is marked as an error falsely. Suppose some row keeps changing on the upstream for a period longer than `delay time`: if it is marked as a `failed row change` the first time it changes, the validator may falsely mark it as an `error row change`. In real-world scenarios this is not common.

To reduce the chance of false positives, the validator does not start marking `failed row change`s until it has caught up with the progress of the syncer, or until some `initial delay time` has passed. A sketch of these checks follows.
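A Go sketch of the timing rules above: a failed row change becomes an error row change only after it has kept failing for `delay time`, and marking does not start until the validator has caught up with the syncer or the `initial delay time` has elapsed (all names are illustrative):

```go
package validator

import "time"

// failedRow records when a row change first failed validation.
type failedRow struct {
	change    *RowChange
	firstFail time.Time
}

// shouldMarkError reports whether a failed row change has kept failing
// for longer than delay time and should be saved as an error row change.
func shouldMarkError(fr failedRow, delay time.Duration, now time.Time) bool {
	return now.Sub(fr.firstFail) >= delay
}

// markingEnabled: marking only starts once the validator has reached the
// syncer's progress or the initial delay time has passed.
func markingEnabled(caughtUp bool, validatorStart time.Time, initialDelay time.Duration, now time.Time) bool {
	return caughtUp || now.Sub(validatorStart) >= initialDelay
}
```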

### Validation checkpoint

After a failover or after resuming, the validator starts validation from the previously saved location. At a set interval, the validator saves the current location, the current `pending row changes`, and the `error row change`s into the meta database. A sketch of this checkpoint follows.
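A Go sketch of what a periodic checkpoint flush could persist, per the description above; the field names and the `snapshot`/`persist` callbacks are assumptions, not the real meta-db schema:

```go
package validator

import "time"

// checkpoint is an illustrative snapshot flushed to the meta database so
// that validation can resume from it after failover or restart.
type checkpoint struct {
	BinlogFile string       // binlog location the validator has reached
	BinlogPos  uint32
	Pending    []*RowChange // current pending row changes
	Errors     []*RowChange // row changes already marked as error row changes
}

// flushLoop persists a checkpoint at a fixed interval; snapshot collects
// the in-memory state and persist writes it to the meta database.
func flushLoop(interval time.Duration, snapshot func() checkpoint, persist func(checkpoint) error, stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			_ = persist(snapshot()) // error handling omitted in this sketch
		case <-stop:
			return
		}
	}
}
```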

### Self-protection

The validator `worker` caches `pending row changes` in memory, so to avoid a potential OOM we add a self-protection mechanism: if there are too many `pending row changes`, or their overall size is too large, the validator stops automatically, as in the sketch below.
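A Go sketch of the self-protection check: stop the validator when the cached pending row changes exceed a count or total-size threshold. The concrete thresholds here are made up for illustration; real limits would be configurable:

```go
package validator

// Illustrative limits; in practice these would be configurable.
const (
	maxPendingRowCount = 500_000
	maxPendingRowBytes = 500 * 1024 * 1024 // 500 MiB
)

// shouldStopForProtection reports whether the validator should stop
// automatically to avoid running out of memory.
func shouldStopForProtection(pendingCount int, pendingBytes int64) bool {
	return pendingCount > maxPendingRowCount || pendingBytes > maxPendingRowBytes
}
```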

> Review comment: is there a workaround for the binary data type?