-
Notifications
You must be signed in to change notification settings - Fork 14.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(CRUD/listviews): Errors with rison and search strings using special characters #18056
fix(CRUD/listviews): Errors with rison and search strings using special characters #18056
Conversation
Hi @corbinrobb thanks for the initial fix. Is it possible to add some unit tests for this? |
Hey @pkdotson thanks for taking a look. I had to think about it a lot but I believe I may have a good idea of how I can do some unit tests on this. I will try to implement it and get it pushed up soon! I am going to try to add an end to end test for it as well. I have only a little bit of experience writing cypress tests so I am not sure how long that will take to add |
Codecov Report
@@ Coverage Diff @@
## master #18056 +/- ##
==========================================
- Coverage 66.34% 66.31% -0.04%
==========================================
Files 1569 1620 +51
Lines 61685 63075 +1390
Branches 6240 6370 +130
==========================================
+ Hits 40927 41827 +900
- Misses 19161 19591 +430
- Partials 1597 1657 +60
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
/testenv up |
@pkdotson Ephemeral environment spinning up at http://34.212.43.212:8080. Credentials are |
@@ -33,6 +33,35 @@ import { FetchDataConfig } from 'src/components/ListView'; | |||
import SupersetText from 'src/utils/textUtils'; | |||
import { Dashboard, Filters } from './types'; | |||
|
|||
// Modifies the rison encoding slightly to match the backend's | |||
// rison encoding/decoding. Applies globally. Code pulled from rison.js | |||
(() => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this called everytime rison.encode_uri
is called?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function should only run once and since it doesn't get named, exported, or called anywhere else, it shouldn't run again. This just takes advantage of the rison object being global and mutable, and changes the regex variable on it that decides what to do with certain characters when it encodes/decodes. That regex variable gets used every time encode_uri, encode or decode is called but this function itself doesn't
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awesome, @corbinrobb! Thanks for troubleshooting and fixing this!
@@ -33,6 +33,35 @@ import { FetchDataConfig } from 'src/components/ListView'; | |||
import SupersetText from 'src/utils/textUtils'; | |||
import { Dashboard, Filters } from './types'; | |||
|
|||
// Modifies the rison encoding slightly to match the backend's | |||
// rison encoding/decoding. Applies globally. Code pulled from rison.js |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit, can you add here that rison.js
is licensed under the MIT license, so we know we can reuse it? We also need to have a link to the original project, since the MIT license requires crediting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing! I will have that pushed up soon
const actualEncoding = rison.encode(testObject); | ||
|
||
const expectedEncoding = | ||
"('\"':'\"','#':'#','&':'&','+':'+','=':'=','?':'?','[':'[',']':']','^':'^','`':'`','{':'{','|':'|','}':'}')"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make this more readable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can have it make and encode an object for each one of the characters being tested instead of having it all in one. It will get rid of this big string and should be easier to read. I am pushing it up now so let me know if it needs more work and I can give it another go
Ephemeral environment shutdown and build artifacts deleted. |
🏷 preset:2022.7 |
…al characters (apache#18056) * fix errors with rison and useQueryParams * add test for encode/decode * add rison link and make test case more readable Co-authored-by: Corbin Robb <[email protected]> (cherry picked from commit c8df849)
Should this PR also have updated this usage too:? superset/superset-frontend/src/components/Datasource/DatasourceEditor.jsx Lines 642 to 644 in 2491b89
|
SUMMARY
Short description:
Fixes an issue with rison caused by differences of encoding/decoding between front end and back end dependencies.
Also, fixes some issues caused by sending and/or retrieving URLs without cleaning/encoding potential URL breaking syntax added by the user.
Involves these issues #17288 and #13708
Detailed Description:
This mostly comes down to three problems:
1. Rison encode/decode:
This might seem complicated at first but it really comes down to a difference in a single Regular Expression (regex).
To put it very simply, rison is a way to make a string from an object. It aims to make this string as compact, human-readable, and URL-friendly as possible. Read more about it here.
It will take this:
and turn it into this:
So what is happening?
The answer comes down to the regex that I mentioned.
Both of them have a function that builds a string of characters to be used in the regex that decides whether a character can be a valid "id". An id is any value being used as a rison value. (filters, value, 'string', and 100 are all ids in the previous examples)
Rison has reserved values that can't be used as ids but a "string" value that is passed can have any characters inside of it as long as it has 'single quotes' around it. Rison encode will provide these 'single quotes' around any value that contains a character that it has reserved or restricted for its syntax. This helps rison know a value is a string and thus parse the reserved characters correctly.
The regex expression is what decides this and will fail to decode the string if it determines that single quotes weren't added correctly because it sees it as invalid syntax.
So why is the regex different?
I don't know the motivation behind this but as I mentioned, there is a loop that builds the string in the JS rison and the Python rison. Both of these loops build the same string.
But for some reason, the JS rison overwrites this string with a different hard typed one immediately after it is made.
JS:
Python:
So all we need to do is change that id_char variable on the javascript rison and update it's regex. Then we have the javascript rison encoding the same way that the python rison is expecting it to be.
This fixes the issues that cause a 400 error when searching for these symbols by themselves
My implementation
I will admit that I do not believe this is the best possible implementation. I just wanted to provide a solution and get feedback to hopefully find a better way. This is a pretty unique problem and there are a ton of different directions we could go with the fix.
What I have in this PR is a fix that will take the rison global object that gets created and mutates the regex properties on it to make it work with the backend decoding.
I just took the code from rison where it is creating the regex and then overwrote the properties that use it.
My problem with my solution is that it is feels hacky and it doesn't feel like there is a good place for this code to live. I put it in
CRUD/utils.tsx
because it seemed like the right choice.There is also the issue of this code being immediately executed out in the open and not being clearly contained anywhere.
Alternative options:
2. Rison encode being URL friendly
Rison encode is URL friendly but it is not perfect when it comes to URLs and rison knows this. So it provided a way to safely send URLs in the form of an alternate encode method called
encode_uri
. Using this is not always necessary but it is useful when you are unsure about the data that is being provided in your string like when user input is involved. Here is where it is talked about in the README.So I switched to encode_uri for the API calls that are building strings that may have user input.
This fixes the Fatal Errors that are returned from the backend when a search string includes & or # or includes a percent encoded value like a single quote (%27). This also prevents the user from being able to search using percent encoded values. For example, if you currently search %27%26%27 ('&') in any of the list-views it will actually return the results that include an ampersand.
3. URL encoding for QueryParams
Another simple explanation here. The list views use QueryParams to put the query string in the browser URL for search/filter persistence. This is done so you can do things like refresh the page or press the forward and back button without it completely removing the filters, sort orders, etc that you set. At least that's what I believe it is being used for.
Looks like this:
This part of it is rison:
QueryParams allows you to set custom encoding/decoding for certain Params but it will still always read from the params as a URL before it decodes. So this means that non percent encoded symbols that can break URLs ( &, #, and unexpected percent encoded strings) will cause issues.
This is what is causing the current page breaking Fatal Error that happens on the list views when searching for an & by itself.
This is the same fatal error that was happening on the backend but this time it is on the front end.
I opted to not use the rison.encode_uri method for this because it doesn't keep the string that clean and will percent encode a little too much if certain characters are passed to it. I felt that since the user can see this query in their browser that the best approach would be to keep the percent encoding minimal.
So the fix here is to encode only the characters that cause problems by using some String.replace() methods after using rison.encode()
BEFORE
Searching for &
Searching for !"#$%&'()*+,-./:;<=?>@\[]^_`{|}~
Searching for #
Searching for ?
Time range for & - 500 response
Time range for # - 500 response
AFTER
Searching for &
Searching for !"#$%&'()*+,-./:;<=?>@\[]^_`{|}~
Searching for #
Searching for ?
Time range for &
Time range for #
TESTING INSTRUCTIONS
#
should no longer cause a fatal error crash on client side after refreshing the page following a search for #ADDITIONAL INFORMATION