Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Take MySQL Column Type Into Account in VStreamer #9331

Merged
merged 2 commits into from
Dec 9, 2021

Conversation

mattlord
Copy link
Contributor

@mattlord mattlord commented Dec 7, 2021

Description

In #7969 we added right side null-byte padding to BINARY columns after processing the binlog event in order to match the MySQL behavior and value so that we correctly calculate the keyspace IDs for BINARY columns in vindex functions like binary_md5 and correctly apply vreplication filters using those columns.

It turns out, however, that row based binlog events make no distinction between a BINARY(4) column and a CHAR(4) column with a binary collation like utf8mb4_bin (you can see a detailed discussion here). So after #7969 we were also, incorrectly, adding right side null-byte padding to CHAR columns with binary collations.

Although we ensured we did not add more padding than the actual column would hold in #8730, we really shouldn't be adding any padding at all to CHAR columns in order to match the MySQL behavior (otherwise we have discrepancies as described here). But in order to do this we needed to thread the target MySQL column type info through the vstreamer and RBR binlog event processing components so that we ONLY add the padding to values for actual BINARY columns and nothing else (CHAR columns being the known problematic case today).

Now we only add the padding if the underlying MySQL column on the target is BINARY and then it’s just bytes, not chars made of up N bytes, so the subsequent pad trimming based on the charset was removed.

The manual test case here also now passes in this branch:

vitess@163dba62a8fb:/vt/local$ mysql -h 127.0.0.1 -P 15306 customer -e "select customer_id, hex(md5) from customer;"
+-------------+------------------------------------------------------------------+
| customer_id | hex(md5)                                                         |
+-------------+------------------------------------------------------------------+
|           1 |                                                                  |
|           2 | 6132613733393661326666396365333930373461613932643233663339633362 |
|           3 |                                                                  |
|           4 | 6132613733393661326666396365333930373461613932643233663339633362 |
+-------------+------------------------------------------------------------------+

vitess@163dba62a8fb:/vt/local$ vtctlclient -server localhost:15999 VDiff customer.commerce2customer
Summary for table customer:
	ProcessedRows: 4
	MatchingRows: 4
	MismatchedRows: 0
	ExtraRowsSource: 0
	ExtraRowsTarget: 0
Summary for table corder:
	ProcessedRows: 0
	MatchingRows: 0
	MismatchedRows: 0
	ExtraRowsSource: 0
	ExtraRowsTarget: 0

vitess@163dba62a8fb:/vt/local$ vtgate --version
Version: 13.0.0-SNAPSHOT (Git revision 3ff9fbdad7 branch 'NullBytePaddingFinalFinal') built on Thu Dec  9 04:04:01 UTC 2021 by vitess@09daf1423251 using go1.17 linux/amd6

Related Issue(s)

Fixes: #9207

Checklist

  • Should this PR be backported? Yes, to 12.0
  • Tests were added
  • Documentation is not required

@mattlord mattlord force-pushed the NullBytePaddingFinalFinal branch 10 times, most recently from 1882a7c to 99dc068 Compare December 8, 2021 05:19
@mattlord mattlord changed the title Null byte padding final final Take MySQL Column Type Into Account in VStreamer and RBR Binlog Event Processing Dec 8, 2021
@mattlord mattlord changed the title Take MySQL Column Type Into Account in VStreamer and RBR Binlog Event Processing Take MySQL Column Type Into Account in VReplication Dec 8, 2021
@mattlord mattlord force-pushed the NullBytePaddingFinalFinal branch from 99dc068 to ffd97a4 Compare December 8, 2021 05:40
@mattlord mattlord added this to the v12.0.1 milestone Dec 8, 2021
@mattlord mattlord self-assigned this Dec 8, 2021
@mattlord mattlord force-pushed the NullBytePaddingFinalFinal branch from ffd97a4 to 348ae55 Compare December 8, 2021 06:09
@mattlord mattlord changed the title Take MySQL Column Type Into Account in VReplication Take MySQL Column Type Into Account in VStreamer Dec 8, 2021
@mattlord mattlord force-pushed the NullBytePaddingFinalFinal branch 4 times, most recently from f79ea0f to 529da1b Compare December 8, 2021 16:12
This is required when we need to match MySQL behavior for data
that requires column type information as well. For example, the
binlog event metadata makes no distinction between events for a
BINARY(4) column and events for a CHAR(4) column with a binary
collation like utf8mb4_bin. So we need to know the underlying
MySQL column type in order to handle them disctinctly -- MySQL
pads (fixed length) binary columns on the right side with null
bytes, but it does NOT do that for (fixed lengthed) CHARo
columns, regardless of the collation.

Signed-off-by: Matt Lord <[email protected]>
@mattlord mattlord force-pushed the NullBytePaddingFinalFinal branch 3 times, most recently from 83048d6 to 13ea4ca Compare December 8, 2021 21:52
@mattlord mattlord force-pushed the NullBytePaddingFinalFinal branch 7 times, most recently from 8fd8f3f to 421f4d1 Compare December 9, 2021 02:27
And use ToLower when looking for BINARY types to be safe.

Signed-off-by: Matt Lord <[email protected]>
@mattlord mattlord force-pushed the NullBytePaddingFinalFinal branch from 421f4d1 to 3ff9fbd Compare December 9, 2021 03:39
@mattlord mattlord marked this pull request as ready for review December 9, 2021 04:03
@mattlord mattlord requested review from deepthi and removed request for systay, shlomi-noach and harshit-gangal December 9, 2021 04:03
Copy link
Contributor

@rohit-nayak-ps rohit-nayak-ps left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm.
Very nice! An elegant solution to another binlog parser edge case.

Copy link
Member

@deepthi deepthi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very thorough code comments and nice test case. 💯

@deepthi deepthi merged commit be2d337 into vitessio:main Dec 9, 2021
@deepthi deepthi deleted the NullBytePaddingFinalFinal branch December 9, 2021 23:24
@mattlord mattlord removed this from the v12.0.1 milestone Dec 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

VReplication discrepancy between copy vs catchup phases for binary columns
3 participants