support sql digest #32
Yes, I think it is important for compatibility. To clarify, though: matching MySQL's digests exactly should be a non-requirement. It sounds far too difficult.
Two functions to be aware of:
Although they are from 8.0, you might find adding them useful for writing tests.
Interesting. On MySQL 8.0 the hash function is changed from MD5 to SHA-256, but the output definitely isn't the SHA-256 of the normalized string.
>>> hashlib.sha256(b'SELECT ?').hexdigest()
'66cbb3a40d4bbd150b75825ad291a6545399f3098fc1079e4d8b5bb061a6a481'
Apparently MySQL hashes the parsed token stream directly (we could do the same here 😉), meaning we're never going to match the digest as discussed before.
@kennytm wow, looks like it did change to
LGTM
normalize.go
Outdated
// Digest generates a digest (or sql-id) for a SQL statement.
// The purpose of the digest is to identify a group of similar SQL statements, so that other logic can be built on it.
func Digest(sql string) string {
	d := digesterPool.Get().(*digester)
How about:
	d := digesterPool.Get().(*digester)
	defer func() {
		d.buffer.Reset()
		digesterPool.Put(d)
	}()
and remove line 48~49?
We had better avoid defer where we can, since this is a hot code path. I have done some refactoring, PTAL :D
https://medium.com/i0exception/runtime-overhead-of-using-defer-in-go-7140d5c40e32
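For readers following the defer discussion, here is a minimal sketch of the pattern being argued for: borrow an object from a `sync.Pool`, use it, then reset and return it with explicit calls on the normal return path. The `digester` type and `upperNoDefer` function are hypothetical stand-ins, not the PR's actual code.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
	"sync"
)

// digester is a hypothetical stand-in for the pooled type in the PR.
type digester struct {
	buffer bytes.Buffer
}

var digesterPool = sync.Pool{
	New: func() interface{} { return &digester{} },
}

// upperNoDefer borrows a digester, writes into its buffer, and returns it
// to the pool with explicit calls instead of a deferred closure.
func upperNoDefer(sql string) string {
	d := digesterPool.Get().(*digester)
	d.buffer.WriteString(strings.ToUpper(sql))
	out := d.buffer.String() // String copies the bytes, so Reset below is safe
	d.buffer.Reset()
	digesterPool.Put(d)
	return out
}

func main() {
	fmt.Println(upperNoDefer("select 1"))
	fmt.Println(upperNoDefer("select 2"))
}
```

The cost of this style is that a panic between `Get` and `Put` skips the cleanup. Go 1.14 and later also made `defer` considerably cheaper, so on a current toolchain the difference is worth benchmarking before choosing either form.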
We may need more comments for this commit, which would make this PR easier to understand.
digester.go
Outdated
}

func (d *sqlDigester) isPrefixByUnary(currTok int) (isUnary bool) {
	if !isNumLit(currTok) {
It seems that isNumLit(-1) will return false. So does isPrefixByUnary(-1) return false?
SignedLiteral:
	Literal
	{
		$$ = ast.NewValueExpr($1)
	}
|	'+' NumLiteral
	{
		$$ = &ast.UnaryOperationExpr{Op: opcode.Plus, V: ast.NewValueExpr($2)}
	}
|	'-' NumLiteral
	{
		$$ = &ast.UnaryOperationExpr{Op: opcode.Minus, V: ast.NewValueExpr($2)}
	}
NumLiteral:
	intLit
|	floatLit
|	decLit
@XuHuaiyu No, this PR only uses the lexer, not the parser, so it always sees a token stream like [1], [-, 1], or [+, 1].
digester.go
Outdated
}

const (
	genericSymbol = -1
We need to add comments for these 2 constants.
LGTM
LGTM
What problem does this PR solve?
Currently it is hard to identify a group of similar SQL statements, yet in real products most statements executed by programs are similar statements that differ only in their parameters (and yes, they may not use prepared statements either).
This PR generates a digest for each such group of SQL statements, so that later we can build useful features on top of it.
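As an illustration of the idea only, the sketch below replaces literals with `?`, collapses whitespace, lowercases the result, and hashes the normalized text. It uses regular expressions in place of the lexer-based approach this PR takes, and the names `normalize` and `digest` are hypothetical. Escaped quotes, comments, and many other SQL details are deliberately not handled.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"regexp"
	"strings"
)

var (
	// Matches quoted strings and standalone numbers. Illustrative only.
	literalRe = regexp.MustCompile(`'[^']*'|"[^"]*"|\b\d+(\.\d+)?\b`)
	spaceRe   = regexp.MustCompile(`\s+`)
)

// normalize rewrites a statement so that statements differing only in
// their literal values produce the same text.
func normalize(sql string) string {
	s := literalRe.ReplaceAllString(sql, "?")
	s = spaceRe.ReplaceAllString(strings.TrimSpace(s), " ")
	return strings.ToLower(s)
}

// digest hashes the normalized text. SHA-256 is an arbitrary choice here.
func digest(sql string) string {
	sum := sha256.Sum256([]byte(normalize(sql)))
	return hex.EncodeToString(sum[:])
}

func main() {
	a := "SELECT * FROM t WHERE id = 1 AND name = 'alice'"
	b := "select *  from t where id = 42 and name = 'bob'"
	fmt.Println(normalize(a))
	fmt.Println(digest(a) == digest(b))
}
```

Two statements that differ only in parameter values map to the same digest, which is what lets them be grouped without relying on prepared statements.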
What is changed and how it works?
Check List
Tests
Code changes