-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
expression: add check for utf8 charset when decode to avoid invalid character #29765
Conversation
[REVIEW NOTIFICATION] This pull request has not been approved. To complete the pull request process, please ask the reviewers in the list to review by filling The full list of commands accepted by this bot can be found here. Reviewer can indicate their review by submitting an approval review. |
/cc @xiongjiwei @tangenta @zimulala @wjhuang2016 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it affected by sql_mode?
Sql_mode = "", the result is below
sql_mode=default, the result is below
Seems like |
parser/charset/encoding.go
Outdated
// Decode convert bytes from a specific charset to utf-8 charset. | ||
func (e *Encoding) Decode(dest, src []byte) ([]byte, error) { | ||
if e.name == "utf-8" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use a const variable.
// we should convert target into utf8 internal. | ||
exprInternal, _, _ := transform.String(encoding.NewDecoder(), target) | ||
return exprInternal, false, nil | ||
return target, false, nil | ||
} | ||
if types.IsBinaryStr(b.tp) { | ||
enc := charset.NewEncoding(b.args[0].GetType().Charset) | ||
expr, err = enc.EncodeString(expr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it ok to return err directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine. But not sure. PTAL @xiongjiwei
c80e9d7
to
2d64254
Compare
@Defined2014: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@@ -140,8 +141,40 @@ func (e *Encoding) EncodeInternal(dest, src []byte) []byte { | |||
return dest | |||
} | |||
|
|||
// validUTF8 checks whether there are illegal utf8 characters in []byte. | |||
func (e *Encoding) validUTF8(src []byte) ([]byte, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is utf8.Valid()
, we can avoid write it by ourself
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implement this func by ourself is for compatible err msg
.
// Decode convert bytes from a specific charset to utf-8 charset. | ||
func (e *Encoding) Decode(dest, src []byte) ([]byte, error) { | ||
if e.name == encodings[CharsetUTF8].name { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tangenta PTAL, Do we need to check the utf8 here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason is #29685
@@ -150,6 +183,9 @@ func (e *Encoding) Decode(dest, src []byte) ([]byte, error) { | |||
|
|||
// DecodeString convert a string from a specific charset to utf-8 charset. | |||
func (e *Encoding) DecodeString(src string) (string, error) { | |||
if e.name == encodings[CharsetUTF8].name { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WIll an extra check cause a performance regression?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DecodeString will called by the lexer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tangenta @Defined2014 Do you still want to go on with this issue, or leave it with #30288?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I will close it.
It's be covered by #30288. |
What problem does this PR solve?
Issue Number: close #29685, close #29735
Problem Summary: check the binary string when decode them into utf8
What is changed and how it works?
Check List
Tests
Side effects
Documentation
Release note