Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Several fixes for parsing issues in mysql_query_digest_and_first_comment #3680

Merged
merged 30 commits into from
Jan 6, 2022

Conversation

JavierJF
Copy link
Collaborator

@JavierJF JavierJF commented Oct 29, 2021

This PR provides a new implementation for query digests generation, address multiple issues with the current implementation, and provides a new configuration option for the digests:

Issues fixed

1. Queries ending with comment without spaces doesn't get the comment removed properly.

  • Q: # random_comment\n select 1.1# final_comment \n
  • D: select ?# final comment

2. Numbers in scientific notation are not properly parsed because exponent is camel sensitive.

  • Q: SELECT 1.2E3, 1.2E-3, -1.2E3, -1.2E-3
  • D: SELECT 12E3, 12E-?,-12E3,-12E-?

3. For query grouping NULL values doesn't reset the grouping count, this leads to cases as described in
'Grouping section' limitations description.

4. Commas preceding NULL values are not properly replaced in the digest.

SELECT 7,null,null, null;
SELECT ?,?,?,null

5. Commas are currently not copied if a grouping query is taking place, this conflicts with possible NULL
values that can be found, but that won't be replaced if mysql-query_digests_replace_null is 0:

Q: `INSERT INTO db.table VALUES ( Null , NULL, '',NULL, 'a', 'b', 'z',nuLL)`
D: `INSERT INTO db.table VALUES ( Null,NULL,?,NULL,?,?,...nuLL)`
                                                         ^ missing comma

5. Arithmetics operators break grouping:

Q: `INSERT INTO db.table ( col1, col2,col3,col4, col5 ) VALUES ('val',2,3,'foo', 5 + 10, 6 - 9)`
D: `INSERT INTO db.table ( col1,col2,col3,col4,col5 ) VALUES (?,?,?,... + -)`

6. When no digits is enabled, if there is any space between a identifier which name finish with a number,
and a closing parenthesis, the space is collapsed.

Q: `INSERT INTO db.table ( col1 )  VALUES ( 'val' )`
D: `INSERT INTO db.table ( col?) VALUES (?)`
                              ^ missing space after '?' mark

This is not consistent with regular behavior:

Q: `INSERT INTO db.table ( col1 )  VALUES ( 'val' )`
D: `INSERT INTO db.table ( col1 ) VALUES (?)`
                               ^ preserved space after '?' mark

7. Spaces not removed after parenthesis when literal strings are preceded by '+|-'

Q: `INSERT INTO db.table (col1)  VALUES ( +'val' )`
D: `INSERT INTO db.table (col1) VALUES ( ?)`
                                        ^ preserved space

8. Operators not removed when extra space precedes the value.

Q: `select  + 1`
D: `select +?`

Q: `select CONCAT(- '89')`
D: `select CONCAT(-?)`

9. Buffer overrun detected when comment isn't finished by */ mark:

Query:

/* SELECT 1

ASAN Output:

=27486==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x619003c01880 at pc 0x7fb1597c472b bp 0x7fb153cf6ec0 sp 0x7fb153cf6668
READ of size 1025 at 0x619003c01880 thread T6
    #0 0x7fb1597c472a in __interceptor_strlen /build/gcc/src/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:389
    #1 0x56214beb851a in tokenizer /home/javjarfer/Projects/proxysql/lib/c_tokenizer.c:24
    #2 0x56214c197aad in Query_Processor::query_parser_first_comment(Query_Processor_Output*, char*) /home/javjarfer/Projects/proxysql/lib/Query_Processor.cpp:2440
    #3 0x56214c19258d in Query_Processor::process_mysql_query(MySQL_Session*, void*, unsigned int, Query_Info*) /home/javjarfer/Projects/proxysql/lib/Query_Processor.cpp:1730
    #4 0x56214c10c214 in MySQL_Session::get_pkts_from_client(bool&, _PtrSize_t&) /home/javjarfer/Projects/proxysql/lib/MySQL_Session.cpp:3535
    #5 0x56214c115085 in MySQL_Session::handler() /home/javjarfer/Projects/proxysql/lib/MySQL_Session.cpp:4269
    #6 0x56214c08f26a in MySQL_Thread::process_all_sessions() /home/javjarfer/Projects/proxysql/lib/MySQL_Thread.cpp:3762
    #7 0x56214c0890b1 in MySQL_Thread::run() /home/javjarfer/Projects/proxysql/lib/MySQL_Thread.cpp:3271
    #8 0x56214be9ff6e in mysql_worker_thread_func(void*) /home/javjarfer/Projects/proxysql/src/main.cpp:412
    #9 0x7fb15956e258 in start_thread (/usr/lib/libpthread.so.0+0x9258)
    #10 0x7fb158fe55e2 in __GI___clone (/usr/lib/libc.so.6+0xfe5e2)

10. Double spaces not properly suppressed for some comments of kind /* */

Actual:

Q: `SELECT /* COMMENT */ 1`
D: `SELECT  ?`
           ^ preserved space

Expected:

Q: `SELECT /* COMMENT */ 1`
D: `SELECT ?`
           ^ non-preserved space

11. Signs are not properly removed when preceding literal numbers if they are surrounded by spaces.

Q: `SELECT 5 +  6,  -  6,- 7;`
D: `SELECT ?+?,-?,-? `
               ^ extra sign

Aside of these and other minor fixes found during development.

New configuration option

This PR introduces the new configuration variable 'query_digests_groups_grouping_limit'. When enabled together with 'query_digests_grouping_limit' performs a replacement of the groups pattern found described by 'query_digests_grouping_limit' when the number of groups exceeds the value imposed by 'query_digests_groups_grouping_limit'. Ex:

  • query_digests_grouping_limit: 1
  • query_digests_groups_grouping_limit: 2
Q: `INSERT INTO db.table (col1,col2,col3) VALUES (1,2,3),(3,5,6),(7,8,9) ON DUPLICATE KEY UPDATE col1 = VALUES(col2)`
D: `INSERT INTO db.table (col1,col2,col3) VALUES (?,...),(?,...),... ON DUPLICATE KEY UPDATE col1 = VALUES(col2)`

This enables a new level of compression for digests in which literal values are not relevant, this feature is disabled by default.

Improved testing tooling

Introduced a new test 'test_mysql_query_digests_stages-t.cpp' which supports multiple testing payloads formats for checking the behavior between the different parsing stages, and different configuration options.

A fuzzy testing solution using AFL++ was also included in the PR for checking the new implementation stability.

@renecannao renecannao changed the title Several fixes for parsing issues in 'mysql_query_digest_and_first_comment' Several fixes for parsing issues in mysql_query_digest_and_first_comment Oct 29, 2021
@JavierJF
Copy link
Collaborator Author

JavierJF commented Nov 4, 2021

retest this please.

@renecannao
Copy link
Contributor

retest this please

…d_first_comment_2'

+ Improved stage parsing to get the maximum compression from the maximum
  digest size imposed by 'mysql_thread___query_digests_max_query_length'.
+ Homogenized staging API and improved offset computation between
  stages.
+ Improved documentation (WIP).
+ Simplified literal parsing logic for some states of stage 1 parsing.
…lementation

 * Added missing documentation, including stages implementation
   reasoning.
 * Refactored common stages operations into isolated functions.
 * Fixed issue with operator collapsing for 'stage 2'.
 * Fixed compression edge cases founds with latest testing.
…plementation

 * Improved testing format and introduced new version of grouping
   features testing using random query generation.
 * Provided general test description and improved documentation.
 * Allowed to individually execute the different kind of tests
   supported.
@JavierJF
Copy link
Collaborator Author

retest this please.

1 similar comment
@JavierJF
Copy link
Collaborator Author

retest this please.

@JavierJF JavierJF marked this pull request as ready for review December 2, 2021 10:05
@renecannao
Copy link
Contributor

retest this please

@renecannao renecannao merged commit cb34b82 into v2.x Jan 6, 2022
@renecannao renecannao deleted the v2.x-fixes_for_query_diggest_issues branch January 11, 2023 10:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants