WIP: Vreplication streamer to convert textual columns to UTF8 #8356
Experimental
Description
In #8322 we introduced support for non-UTF8 characters. There, it was the responsibility of the initiator of the stream (Online DDL in this case) to analyze which columns had to be converted to UTF8, and to signal the identities of those columns to the streamer. This was done by applying `CONVERT(col USING utf8mb4)`. The streamer analyzed the query, found a `ConvertUsing` expression, and re-applied `CONVERT(col USING utf8mb4)` in `SendQuery`.
This PR moves the burden to the streamer. The streamer reads columns from a table and has direct access to it, so it is in the best position to determine which columns are textual and not UTF8.
In this PR the streamer issues a query on `information_schema.columns` to determine which columns need to be converted to UTF8. Online DDL doesn't need to hint any more, and we are able to strip out some excess code.
But there's a problem
This doesn't work yet. Curiously, it is apparently still required that `vrepl.go` issues `sb.WriteString(fmt.Sprintf("convert(%s using utf8mb4)", escapeName(name)))`. If I replace that with `sb.WriteString(escapeName(name))`, then `latin1` text in a `latin1` character set gets garbled and is decoded incorrectly.
To elaborate, what `vrepl.go` does is generate the Filter/Rule query for VReplication.
But this confuses me. I thought the filter query in VReplication is mostly meaningless. That is, it is not the actual query that runs on the source table: vstreamer completely rewrites the query after analyzing its expressions. Assuming vstreamer always reads textual columns via `convert(col using utf8mb4)`, I don't see why it matters whether the filter query does or does not have `convert(col using utf8mb4)`.
The code as it is today works. But I do want to clean it up and make the filter query as simple as possible, so I'd like to pursue this issue.
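For reference, here is a minimal sketch in Go of the two steps discussed above: classifying columns by character set, then building the select list with or without `convert(... using utf8mb4)`. It is not the code in this PR. The `columnInfo` type, the `columnsQuery` text, the `needsUTF8Conversion` rule, and this stand-in `escapeName` are all hypothetical and only illustrate the idea; the real vstreamer and `vrepl.go` logic may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// columnInfo is a hypothetical, simplified view of one row of
// information_schema.columns. Charset is empty for non-textual columns,
// where MySQL reports CHARACTER_SET_NAME as NULL.
type columnInfo struct {
	Name    string
	Charset string
}

// columnsQuery is an illustrative query only; the query actually issued
// by the streamer in this PR may look different.
const columnsQuery = `select column_name, character_set_name
from information_schema.columns
where table_schema = ? and table_name = ?
order by ordinal_position`

// needsUTF8Conversion reports whether a column is textual and not already
// in a UTF8 character set. The list of charsets is an assumption.
func needsUTF8Conversion(c columnInfo) bool {
	switch strings.ToLower(c.Charset) {
	case "", "utf8", "utf8mb3", "utf8mb4":
		return false
	}
	return true
}

// escapeName is a stand-in for the helper of the same name in vrepl.go:
// it backtick-quotes an identifier and doubles any embedded backticks.
func escapeName(name string) string {
	return "`" + strings.ReplaceAll(name, "`", "``") + "`"
}

// buildSelectList wraps only the columns that need conversion.
func buildSelectList(cols []columnInfo) string {
	var sb strings.Builder
	for i, c := range cols {
		if i > 0 {
			sb.WriteString(", ")
		}
		if needsUTF8Conversion(c) {
			sb.WriteString(fmt.Sprintf("convert(%s using utf8mb4)", escapeName(c.Name)))
		} else {
			sb.WriteString(escapeName(c.Name))
		}
	}
	return sb.String()
}

func main() {
	cols := []columnInfo{
		{Name: "id"},
		{Name: "name", Charset: "latin1"},
		{Name: "note", Charset: "utf8mb4"},
	}
	fmt.Println(buildSelectList(cols))
}
```

If the streamer applied a rule like this on its own, the filter query could in principle list plain column names, which is the simplification this PR is aiming for.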
Related Issue(s)
#8322
Checklist
Deployment Notes