Several fixes for parsing issues in mysql_query_digest_and_first_comment
#3680
Merged
renecannao changed the title from "Several fixes for parsing issues in 'mysql_query_digest_and_first_comment'" to "Several fixes for parsing issues in mysql_query_digest_and_first_comment" on Oct 29, 2021
…r of deletion for avoiding unnecessary removals
… char has been replaced
…en followed by number
…ses covered by digests specification
retest this please. |
…t' implementation
…digest_and_first_comment'
…ql_query_digest_and_first_comment_one_it' as part of query digests rework
…fferent parsing stages of query digests rework
…stead of skipping characters copy
…e and some functions params order
…or grouping and groups grouping
… grouping support
retest this please |
…d_first_comment_2' + Improved stage parsing to get the maximum compression from the maximum digest size imposed by 'mysql_thread___query_digests_max_query_length'. + Homogenized staging API and improved offset computation between stages. + Improved documentation (WIP). + Simplified literal parsing logic for some states of stage 1 parsing.
…t' to 'test_tokenizer-t'
…est result buffer
…and_first_comment' into old one
…lementation * Added missing documentation, including stages implementation reasoning. * Refactored common stages operations into isolated functions. * Fixed issue with operator collapsing for 'stage 2'. * Fixed compression edge cases founds with latest testing.
…plementation * Improved testing format and introduced new version of grouping features testing using random query generation. * Provided general test description and improved documentation. * Allowed to individually execute the different kind of tests supported.
…s_stages-t.cpp' left as doc
…nd 'process_cmnt_type_2'
…on 'mysql_query_digest_and_first_comment_2'
retest this please. |
This PR provides a new implementation of query digest generation, addresses multiple issues with the current implementation, and adds a new configuration option for digests:
Issues fixed
1. Queries ending with a comment that is not preceded by a space don't get the comment removed properly.
# random_comment\n select 1.1# final_comment \n
select ?# final comment
2. Numbers in scientific notation are not properly parsed because the exponent marker is matched case-sensitively.
SELECT 1.2E3, 1.2E-3, -1.2E3, -1.2E-3
SELECT 12E3, 12E-?,-12E3,-12E-?
3. For query grouping, NULL values don't reset the grouping count; this leads to the cases described in the limitations of the 'Grouping section'.
4. Commas preceding NULL values are not properly replaced in the digest. Commas are currently not copied while a query is being grouped; this conflicts with NULL values that may be found but that won't be replaced if 'mysql-query_digests_replace_null' is 0.
5. Arithmetic operators break grouping:
6. When 'no digits' is enabled, if there is a space between an identifier whose name ends with a number and a closing parenthesis, the space is collapsed. This is not consistent with the regular behavior:
7. Spaces are not removed after a parenthesis when literal strings are preceded by '+' or '-'.
8. Operators are not removed when an extra space precedes the value.
9. Buffer overrun detected when a comment isn't terminated by the '*/' mark:
Query:
ASAN Output:
10. Double spaces are not properly suppressed for some comments of the kind '/* */':
Actual:
Expected:
11. Signs are not properly removed when preceding literal numbers if they are surrounded by spaces.
In addition to these, other minor issues found during development were also fixed.
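Two of the issues above (2 and 9) can be illustrated with a small sketch. This is not the code from the PR: 'numeric_literal_length' and 'skip_block_comment' are hypothetical helper names, and the sketch only shows the two ideas involved, accepting both 'e' and 'E' as exponent markers and never scanning past the end of the buffer when a comment is left unterminated.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical helper (not ProxySQL's API): length of the unsigned numeric
// literal starting at s[pos], or 0 if none starts there. The exponent marker
// is accepted in either case, so "1.2E3" and "1.2e3" are treated alike.
std::size_t numeric_literal_length(std::string_view s, std::size_t pos) {
    auto is_digit = [&](std::size_t k) {
        return k < s.size() && std::isdigit(static_cast<unsigned char>(s[k]));
    };
    std::size_t i = pos;
    std::size_t int_digits = 0;
    while (is_digit(i)) { ++i; ++int_digits; }

    std::size_t frac_digits = 0;
    if (i < s.size() && s[i] == '.') {
        std::size_t j = i + 1;
        while (is_digit(j)) { ++j; ++frac_digits; }
        if (int_digits + frac_digits > 0) i = j;  // a lone "." is not a number
    }
    if (int_digits + frac_digits == 0) return 0;

    // Optional exponent: marker in either case, optional sign, then digits.
    if (i < s.size() && (s[i] == 'e' || s[i] == 'E')) {
        std::size_t j = i + 1;
        if (j < s.size() && (s[j] == '+' || s[j] == '-')) ++j;
        if (is_digit(j)) {
            while (is_digit(j)) ++j;
            i = j;  // consume the exponent only when it has digits
        }
    }
    return i - pos;
}

// Hypothetical helper: s[pos] must be the start of a "/*" comment. Returns
// the index just past the closing "*/", or s.size() when the comment is never
// closed, so the caller cannot read beyond the end of the query.
std::size_t skip_block_comment(std::string_view s, std::size_t pos) {
    assert(pos + 2 <= s.size());
    std::size_t end = s.find("*/", pos + 2);
    return end == std::string_view::npos ? s.size() : end + 2;
}
```

Signs are deliberately left out of 'numeric_literal_length'; in this sketch a caller would decide separately whether a preceding '+' or '-' belongs to the literal.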
New configuration option
This PR introduces the new configuration variable 'query_digests_groups_grouping_limit'. When enabled together with 'query_digests_grouping_limit', it replaces the repeated group pattern produced by 'query_digests_grouping_limit' once the number of groups exceeds the value set by 'query_digests_groups_grouping_limit'. Ex:
This enables a new level of compression for digests in which literal values are not relevant. The feature is disabled by default.
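The idea of collapsing repeated groups can be sketched as a post-processing step over an already-built digest. This is only an illustration under assumptions: 'collapse_groups' is a hypothetical function, the real implementation works during parsing rather than on the finished string, and the ',...' suffix used to mark the collapsed groups is assumed here rather than taken from the PR.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Illustrative only (hypothetical helper, assumed output format): collapses a
// run of identical, comma-separated groups such as "(?,?)" once more than
// `limit` of them appear in a row. A limit of 0 leaves the digest untouched.
std::string collapse_groups(std::string_view digest,
                            std::string_view group,
                            std::size_t limit) {
    if (limit == 0 || group.empty()) return std::string(digest);

    std::string out;
    std::size_t i = 0;
    while (i < digest.size()) {
        // Copy characters verbatim until a group starts at position i.
        if (digest.compare(i, group.size(), group) != 0) {
            out.push_back(digest[i++]);
            continue;
        }
        // Count how many copies of the group follow each other, separated
        // by a single comma.
        std::size_t count = 0;
        std::size_t j = i;
        for (;;) {
            ++count;
            j += group.size();
            bool more = j < digest.size() && digest[j] == ',' &&
                        digest.compare(j + 1, group.size(), group) == 0;
            if (!more) break;
            ++j;  // skip the separating comma
        }
        // Keep at most `limit` groups and mark the rest as collapsed.
        std::size_t kept = count > limit ? limit : count;
        for (std::size_t k = 0; k < kept; ++k) {
            if (k != 0) out.push_back(',');
            out.append(group);
        }
        if (count > limit) out.append(",...");
        i = j;
    }
    return out;
}
```

With a limit of 2, a digest containing four consecutive '(?,?)' groups keeps the first two and replaces the remainder with the assumed ',...' marker, so inserts that differ only in the number of rows map to the same digest text.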
Improved testing tooling
Introduced a new test, 'test_mysql_query_digests_stages-t.cpp', which supports multiple test payload formats for checking the behavior across the different parsing stages and configuration options.
A fuzz testing setup using AFL++ was also included in the PR to check the stability of the new implementation.