-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
VDiff: Use byte compare if weight_string() returns null for either source or target #7696
VDiff: Use byte compare if weight_string() returns null for either source or target #7696
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
go/vt/wrangler/vdiff.go
Outdated
// This detects if we are using weight_string() to compare this (text) column. | ||
// If either source or target weight_string is null we fallback to a byte compare | ||
if sourceRow[col].IsNull() && sourceRow[i].IsText() && col > i { | ||
col = i | ||
} | ||
if targetRow[col].IsNull() && targetRow[i].IsText() && col > i { | ||
col = i | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I confess I'm completely lost in this logic. What is the logical connection between col
(column) and i
(array index)? What does it mean where col > i
and what does it mean to assign col = i
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VDiff adds additional pseudo-columns to the rows which contain the weight_string() for text fields and points to that column for comparison purposes. When those are null (for reasons mentioned in #7296) the PR falls back to the original column)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! I still don't understand the connection between col
and i
. They seem to be in different spaces. i
is an array index. col
is, ... col. Why would you test col > i
or assign col = i`?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
col
is initialized to be the index i
and then overwritten for text fields by pointing to the weight_string(): https://github.com/vitessio/vitess/blob/98e4e0e9bfae1e1483e2f7fed952df455fe528af/go/vt/wrangler/vdiff.go#L436
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that compare
is being called with td.comparePKs
in one case and td.cmpareCols
in the other. I feel like this logic may not work correctly in the case of comparePKs.
Also, I think it will become more readable if we changed cols
to be a struct that contains a compareCol
and an originalCol
. If compareCol
fails, then we can fall back to originalCol
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that
compare
is being called withtd.comparePKs
in one case andtd.cmpareCols
in the other. I feel like this logic may not work correctly in the case of comparePKs.Also, I think it will become more readable if we changed
cols
to be a struct that contains acompareCol
and anoriginalCol
. IfcompareCol
fails, then we can fall back tooriginalCol
.
Refactored based on your suggestion. @sougou, please review both this change and the updated tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! I still don't understand the connection between
col
andi
. They seem to be in different spaces.i
is an array index.col
is, ... col. Why would you testcol > i
or assign col = i`?
@shlomi-noach, when you get a chance please review the changes and see if it easier to understand now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
asking for clarifications, please see inline
98e4e0e
to
a506a80
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this is now more readable
… target Signed-off-by: Rohit Nayak <[email protected]>
…f tests needed changes. Signed-off-by: Rohit Nayak <[email protected]>
a506a80
to
921c225
Compare
Description
VDiff compares text columns using the weight_string() of the value. However there are times when weight_string() can return a null value. Per the mysql reference
It is possible that just one of the weight_string(), for source or target is null or both are null, depending on the configuration of the source and target servers. Currently if one is null vdiff detects a mismatch and if both are null then it is considered a match. This is incorrect since, in such cases, vdiff needs to compare the actual data.
This PR detects such a condition and uses a byte compare instead.
See related issue for more details and repro.
Signed-off-by: Rohit Nayak [email protected]
Related Issue(s)
Checklist
Impacted Areas in Vitess
Components that this PR will affect: