Collations Module Integration #9018
So I ran some integration tests for this yesterday, and we got some bad results. Here's a log:
As you can see, we're creating a simple table where each column has a different charset + collation and inserting the same string into each column. When performing a […] This is bad! It looks like a bug in MySQL, but this behavior is not going to change in the future, so we need to work around it. @harshit-gangal had some ideas on using Vitess' table schemas to detect these collations instead of depending on the Field data. Harshit, could you please explain these to @king-11? How confident are you feeling about this approach?
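The schema-based idea could look roughly like the sketch below. Every name here (CollationID, Field, SchemaCollations, ResolveCollation) is hypothetical and not the Vitess API, and the collation numbers are made-up illustrative values. The point is only the lookup order: prefer the collation declared in the tracked table schema, and fall back to the Field metadata when the schema has no entry.

```go
package main

import "fmt"

// CollationID is a MySQL collation number (e.g. 45 = utf8mb4_general_ci, 63 = binary).
type CollationID uint32

// Unknown marks "no collation information available".
const Unknown CollationID = 0

// Field is a hypothetical, trimmed-down result column as seen by vtgate.
// FieldCollation holds whatever the column metadata reported.
type Field struct {
	Table          string
	Name           string
	FieldCollation CollationID
}

// SchemaCollations is a hypothetical schema-tracking view:
// table name -> column name -> collation declared in CREATE TABLE.
type SchemaCollations map[string]map[string]CollationID

// ResolveCollation prefers the collation declared in the table schema and only
// falls back to the (possibly misleading) value reported in the Field metadata.
func ResolveCollation(schema SchemaCollations, f Field) CollationID {
	if cols, ok := schema[f.Table]; ok {
		if id, ok := cols[f.Name]; ok && id != Unknown {
			return id
		}
	}
	return f.FieldCollation
}

func main() {
	// Illustrative values only.
	schema := SchemaCollations{"t1": {"c_general": 45, "c_bin": 46}}

	// The schema knows t1.c_general, so its declared collation wins.
	fmt.Println(ResolveCollation(schema, Field{Table: "t1", Name: "c_general", FieldCollation: 33}))
	// No schema entry for t2.c: fall back to the Field value.
	fmt.Println(ResolveCollation(schema, Field{Table: "t2", Name: "c", FieldCollation: 63}))
}
```

Result columns can be aliased or computed, so a real implementation would also need to map each result column back to its source table and column; the sketch assumes that mapping is already known.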
General comment: this PR is all about collations and not so much about charsets. We should use the word collation, not charset, for this. https://dev.mysql.com/doc/refman/8.0/en/charset-general.html
Another general comment: since we need to do planner work to make sure that the engine primitives and evalengine get the right info, what about focusing this PR on making it work well with grouping (…
@systay: Yes, but do note that the field in …
This sounds like a great idea. Let's make grouping work perfectly and leave the other callsites for another time. This means that everywhere else we don't pass a collation ID to the comparison function. That should be an easy way to unblock @king-11. |
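A rough sketch of "collation-aware grouping, unchanged behavior elsewhere": a null-safe comparator that takes a collation ID, where ID 0 (no collation passed) falls back to plain byte-wise comparison. The Collation interface, the registry map, and both collations are hypothetical toy stand-ins, not the Vitess collations package; in particular the case-insensitive collation only folds case and ignores MySQL's real weight tables and padding rules.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

type CollationID uint32

// Collation is a hypothetical minimal interface for comparing two strings.
type Collation interface {
	Collate(left, right []byte) int
}

// binaryCollation compares raw bytes, like MySQL's binary collation.
type binaryCollation struct{}

func (binaryCollation) Collate(l, r []byte) int { return bytes.Compare(l, r) }

// caseInsensitive is a toy stand-in for a *_general_ci collation: it only
// lowercases both sides before comparing.
type caseInsensitive struct{}

func (caseInsensitive) Collate(l, r []byte) int {
	return strings.Compare(strings.ToLower(string(l)), strings.ToLower(string(r)))
}

// registry plays the role of a lookup-by-ID table (hypothetical contents).
var registry = map[CollationID]Collation{
	45: caseInsensitive{}, // utf8mb4_general_ci (approximated)
	63: binaryCollation{}, // binary
}

// nullSafeCompare orders NULL before any value. An ID of 0 (or any unknown ID)
// falls back to byte-wise comparison, i.e. the behavior of call sites that
// do not pass a collation.
func nullSafeCompare(v1, v2 *string, id CollationID) int {
	switch {
	case v1 == nil && v2 == nil:
		return 0
	case v1 == nil:
		return -1
	case v2 == nil:
		return 1
	}
	coll, ok := registry[id]
	if !ok {
		coll = binaryCollation{}
	}
	return coll.Collate([]byte(*v1), []byte(*v2))
}

// sameGroup reports whether two rows fall into the same GROUP BY bucket, using
// one collation ID per column (e.g. taken from the result's fields).
// It assumes r1, r2 and colls all have the same length.
func sameGroup(r1, r2 []*string, colls []CollationID) bool {
	for i := range r1 {
		if nullSafeCompare(r1[i], r2[i], colls[i]) != 0 {
			return false
		}
	}
	return true
}

func ptr(s string) *string { return &s }

func main() {
	a, b := ptr("Vitess"), ptr("VITESS")
	fmt.Println(nullSafeCompare(a, b, 45))   // 0: equal under the case-insensitive stand-in
	fmt.Println(nullSafeCompare(a, b, 0))    // 1: no collation, byte-wise ('i' > 'I')
	fmt.Println(nullSafeCompare(nil, a, 45)) // -1: NULL sorts first
	fmt.Println(sameGroup([]*string{a, nil}, []*string{b, nil}, []CollationID{45, 63})) // true
}
```

Keeping the fallback inside the comparator means the non-grouping call sites can simply pass 0 and keep their current results until they are migrated.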
My reading of https://dev.mysql.com/doc/internals/en/character-set.html#packet-Protocol::CharacterSet makes me think that it is the charset, and not the collation, the …
You are right, it's all very confusing. Anyway, does it make sense to not perpetuate this confusion more than necessary in our code base? :)
Nothing to add beyond what @systay wrote. His comments need to be addressed; otherwise LGTM.
Signed-off-by: Lakshya Singh <[email protected]>
Reduce map size and also compare against a constant rather than a literal.
Collate returns arbitrary positive and negative values; add a switch on its result so that only -1, 0, or 1 is returned.
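The normalization described in the commit above ("Collate returns positive and negative values, switch on it to return -1, 0 or 1") can be sketched as follows. The function name normalize is hypothetical; only the switch-on-sign idea comes from the commit message.

```go
package main

import "fmt"

// normalize collapses an arbitrary signed comparison result (as returned by a
// Collate-style function, which may return any positive or negative value)
// into exactly -1, 0 or 1.
func normalize(cmp int) int {
	switch {
	case cmp < 0:
		return -1
	case cmp > 0:
		return 1
	default:
		return 0
	}
}

func main() {
	fmt.Println(normalize(-42), normalize(0), normalize(7)) // -1 0 1
}
```

Callers that test for equality with == -1 or == 1 rely on this normalization, which is why returning the raw Collate value would be a subtle bug.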
@systay, I made the changes per your review.
🎉
I'm not sure how, but this seems to have broken vitess/go/vt/vtgate/evalengine/arithmetic.go (line 227 at ab2f0b4). It does not compile:
Right now, unit tests and some VReplication tests are failing.
#9131 contains the fix for main. ☝️
Description
Integration of Collation Module into Vitess Engine
Adds the collation module to the Vitess codebase, where it can then be used to replace the instances of the weight_string function used throughout Vitess. The major change in this PR is to NullSafeComparator (vitess/go/vt/vtgate/evalengine/arithmetic.go, line 187 at 315d907), which now takes an additional collation parameter. That parameter is passed to collations.LookupByID to find the collation, which is then used to compare varchar strings according to their collation.
Update Function Calls
Because NullSafeComparator needs access to collation information, we provide it by updating the function calls: we fetch the charset info from the result or Fields available in the nearest parent function call and pass it down through the succeeding calls.
Related Issue(s)
Checklist
Deployment Notes