sink to csv: output old value to CSV files #10174
Signed-off-by: zhangjinpeng1987 <[email protected]>
cdc/api/v2/model.go
@@ -919,6 +919,7 @@ type CSVConfig struct {
 	NullString string `json:"null"`
 	IncludeCommitTs bool `json:"include_commit_ts"`
 	BinaryEncodingMethod string `json:"binary_encoding_method"`
+	EnableOldValue bool `json:"enable_old_value"`
enable_old_value has had different meanings in different versions of TiCDC and has been confusing for users, so a more explicit parameter name such as output_old_value is recommended here.
force-split-update is recommended.
output-old-value is semantically clearer, because splitting the update is just one specific way of implementing "output old value" in the context of sinking to CSV files.
There is no need to update the CSV implementation. It's recommended to split the update event: https://github.com/pingcap/tiflow/blob/master/cdc/model/sink.go#L833 Add one new variable
@3AceShowHand Thanks for the reminder. But
Signed-off-by: zhangjinpeng1987 <[email protected]>
Signed-off-by: zhangjinpeng1987 <[email protected]>
IMHO, TiCDC always fetches the old value from TiKV, and also outputs the old value if the output format allows it. In this respect, it's OK to name it as It's OK to support the
@3AceShowHand @CharlesCheung96 Thanks for your review. Please let me know if you have other concerns about this PR.
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: 3AceShowHand, CharlesCheung96
/test dm-integration-test
What problem does this PR solve?
Issue Number: close #10167
What is changed and how it works?
When the changefeed configuration sets
[sink.csv] output-old-value = true
the CSV sink, as described in #10167, replaces each UPDATE with a pair of DELETE and INSERT rows. The DELETE row records the old value of the UPDATE statement, which makes it possible to generate undo DML for some data-repair cases.
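The delete-plus-insert idea can be sketched in a few lines of Go. This is only an illustration under stated assumptions, not the tiflow implementation: the `rowChange` type, the `toCSVRecords` helper, and the column layout (operation marker, table, schema, then column values) are all hypothetical and chosen for readability.

```go
package main

import (
	"encoding/csv"
	"os"
)

// rowChange is a hypothetical, simplified row change event.
// Old is nil for an INSERT; New is nil for a DELETE.
type rowChange struct {
	Schema, Table string
	Old, New      []string
}

// toCSVRecords turns one event into CSV records. When outputOldValue is
// true, an UPDATE becomes a delete record carrying the old values followed
// by an insert record carrying the new values. The record layout here is
// an assumption made for this sketch.
func toCSVRecords(ev rowChange, outputOldValue bool) [][]string {
	record := func(op string, cols []string) []string {
		return append([]string{op, ev.Table, ev.Schema}, cols...)
	}
	switch {
	case ev.Old == nil:
		return [][]string{record("I", ev.New)}
	case ev.New == nil:
		return [][]string{record("D", ev.Old)}
	case outputOldValue:
		return [][]string{record("D", ev.Old), record("I", ev.New)}
	default:
		return [][]string{record("U", ev.New)}
	}
}

func main() {
	// update schema.table set name="def" where id=1 and name="abc"
	ev := rowChange{
		Schema: "schema", Table: "table",
		Old: []string{"1", "abc"},
		New: []string{"1", "def"},
	}
	w := csv.NewWriter(os.Stdout)
	for _, rec := range toCSVRecords(ev, true) {
		_ = w.Write(rec)
	}
	w.Flush()
}
```

With the old values present in the delete record, an undo statement for the UPDATE can be derived by swapping the roles of the two records.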
Before this change, the CSV output of the statement
update schema.table set name="def" where id=1 and name="abc"
is
After this change, the CSV output looks like:
Check List
Tests
Questions
Will it cause performance regression or break compatibility?
After setting
[sink.csv] output-old-value = true
for a sink-to-storage changefeed, each UPDATE statement produces 2 rows of data in the output CSV file. The output CSV files will be larger than before if there are many UPDATE statements.
Do you need to update user documentation, design documentation or monitoring documentation?
Release note