digester: avoid infinite loop when an invalid token is seen #313
What problem does this PR solve?
Starting from pingcap/tidb#10284, executing the following statement would cause TiDB to enter an infinite loop:

``SELECT * FROM `🥳`;``
This is caused by multiple issues:

1. MySQL does not permit non-BMP characters like 🥳 in an identifier, whether quoted or not, but TiDB's and MySQL's behaviors differ on such characters:
    * MySQL accepts both the unquoted 🥳 and the quoted `` `🥳` ``, but translates both into a question mark `?`.
    * TiDB treats the unquoted 🥳 as a lexer error, but accepts the quoted `` `🥳` `` as-is.
2. `parser.Normalize` strips away the backquotes around the identifier, while MySQL adds the backquotes in all cases.
3. Since tidb#10284 (*: support `select`/`explain select` using bind info), executing a SELECT statement calls `GetBindRecord`. The problem is that `GetBindRecord` calls `parser.DigestHash` on an already-normalized SQL string (*: support "add session binding", tidb#10247), so the statement is normalized twice before the hash is computed.

So, up to this point, the statement ``SELECT * FROM `🥳`;`` is (1) accepted by the parser, (2) normalized into `SELECT * FROM 🥳;`, and (3) normalized again before calculating the hash.

4. In the second normalization, the unquoted 🥳 is treated as a lexer error. The lexer just returns the invalid token without advancing the cursor, which throws the `normalize` function into an infinite loop.

What is changed and how it works?
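The fix targets the second normalization pass. To see where that pass gets stuck, the pipeline described above can be modeled with a toy normalizer (a drastically simplified sketch; `toyNormalize` is hypothetical, not the real `parser.Normalize`): pass one strips the backquotes from the accepted `` `🥳` ``, and pass two then meets the bare 🥳 as a lexer error.

```go
package main

import "fmt"

// toyNormalize is a hypothetical stand-in for normalization. It accepts
// a backquoted identifier but strips the backquotes (mirroring issue 2),
// while a bare non-ASCII rune such as the non-BMP 🥳 is a lexer error
// (mirroring issue 1).
func toyNormalize(sql string) string {
	var out []rune
	runes := []rune(sql)
	for pos := 0; pos < len(runes); {
		switch {
		case runes[pos] == '`': // quoted identifier: copy contents, drop quotes
			pos++
			for pos < len(runes) && runes[pos] != '`' {
				out = append(out, runes[pos])
				pos++
			}
			pos++ // skip the closing backquote
		case runes[pos] > 0x7f: // bare non-ASCII rune: lexer error
			return string(out) // without a guard here, pos never advances
		default:
			out = append(out, runes[pos])
			pos++
		}
	}
	return string(out)
}

func main() {
	once := toyNormalize("SELECT * FROM `🥳`;")
	fmt.Println(once)               // SELECT * FROM 🥳;
	fmt.Println(toyNormalize(once)) // SELECT * FROM   (stops at the bare 🥳)
}
```

In this model, the second pass is exactly where the real lexer reports an invalid token, so that is where the loop must be broken.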
Break the infinite loop when a lexer error is encountered, so that at least running ``SELECT * FROM `🥳`;`` won't DoS the server.

We can look into fixing the other three problems in follow-up PRs.
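The loop-breaking guard itself can be sketched as follows (a toy lexer with hypothetical names; the real change lives in the parser's digester, not reproduced here):

```go
package main

import "fmt"

const tokenInvalid = -1 // hypothetical sentinel for a lexer error

// nextToken is a toy lexer: it returns a token kind and the number of
// bytes consumed. Like the real lexer, it consumes nothing on an
// invalid token (here: any non-ASCII byte).
func nextToken(s string) (kind, width int) {
	if s[0] > 0x7f {
		return tokenInvalid, 0
	}
	return int(s[0]), 1
}

// normalize walks the input token by token. Before the fix, a lexer
// error left width at 0, so pos never advanced and the loop spun
// forever. Checking for tokenInvalid breaks the loop instead.
func normalize(sql string) string {
	var out []byte
	for pos := 0; pos < len(sql); {
		kind, width := nextToken(sql[pos:])
		if kind == tokenInvalid {
			break // the fix: stop on a lexer error
		}
		out = append(out, sql[pos:pos+width]...)
		pos += width
	}
	return string(out)
}

func main() {
	fmt.Println(normalize("SELECT * FROM 🥳;")) // SELECT * FROM   (scan stops before the invalid rune)
}
```

The guard changes nothing for valid input, since `nextToken` always advances there; it only turns the pathological spin into an early exit.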
Check List
Tests
Code changes
Side effects
Related changes