-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
VReplication: Pad binlog values for binary() columns to match the value returned by mysql selects #7969
Conversation
91b04bf
to
57842d7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
// the data without this padding. In particular, this causes an issue if a binary(n) column is part of the | ||
// sharding key: the keyspace_id() returned during the copy phase (where the value is the result of a mysql query) | ||
// is different from the one during replication (where the value is the one from the binlogs) | ||
// Hence we need to add the padding here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What a catch this one!!
Just reviewed with Sugu and he recommended that the padding should be done right when it is read from the binlog and not just for computing the keyspace id. That way the padded value, which is also the result of a mysql select, will be sent to all consumers of the vstream. So moving it to draft for now. |
… a select query. This also ensures that if such columns are used as sharding keys we get the same keyspace_id Signed-off-by: Rohit Nayak <[email protected]>
…rs see the padded value instead of doing it later in vstreamer or doint it just for keyspace id computation Signed-off-by: Rohit Nayak <[email protected]>
966763e
to
91357c6
Compare
Signed-off-by: Rohit Nayak <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's painful to see how small the change is :)
paddedData := make([]byte, max) | ||
copy(paddedData[:l], mdata) | ||
mdata = paddedData | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a bug here for CHAR columns using binary collations. If you have e.g. CHAR (3) COLLATE UTF8MB4_BIN
then the padding goes out to the max byte length of 12 and you get this in the vreplication stream:
'FOO\0\0\0\0\0\0\0\0\0'
and the SQL fails because it's too long for the column.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like a similar bug may exist with CHAR(N), where we may need to pad with spaces.
}, | ||
"tables": { | ||
"tables": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit nit - extra whitespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Let us plan to backport this to 10.0, 9.0 and 8.0. |
Also, just realized there was an almost identical recent contribution to |
…ue returned by mysql selects Backport of vitessio#7969 * Pad binlog values for binary() columns to match the value returned by a select query. This also ensures that if such columns are used as sharding keys we get the same keyspace_id * Pad binary() values in the binlog reader directly so that all consumers see the padded value instead of doing it later in vstreamer or doint it just for keyspace id computation Signed-off-by: Rohit Nayak <[email protected]> Signed-off-by: Andres Taylor <[email protected]>
Signed-off-by: Rohit Nayak <[email protected]>
Signed-off-by: Rohit Nayak [email protected]
Description
For fixed length binary columns, mysql internally pads the values on the right will nulls. However the binlogs contain the original value. This leads to a problem in vreplication workflows if a binary column is used as a sharding key.
During the copy phase the row is read using a mysql select and the column has the padded value. However while replicating we get the unpadded value. This means that we get different keyspace_ids. We found bugs where the copy phase inserted the row in one shard but the update during the catchup phase routed this row to another shard where, of course, the update failed.
The PR pads the value found in the binlog to match the value returned my mysql to fix this issue.
Related Issue(s)
#3984
Checklist