Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(CRUD/listviews): Errors with rison and search strings using special characters #18056

Conversation

corbinrobb
Copy link
Contributor

@corbinrobb corbinrobb commented Jan 14, 2022

SUMMARY

Short description:

Fixes an issue with rison caused by differences of encoding/decoding between front end and back end dependencies.

Also, fixes some issues caused by sending and/or retrieving URLs without cleaning/encoding potential URL breaking syntax added by the user.

Involves these issues #17288 and #13708


Detailed Description:

This mostly comes down to three problems:

  1. The JavaScript rison encodes/decodes in a slightly different way than the Python rison does
  2. rison's encode method doesn't URL encode by default so URL queries can be broken by the user's input
  3. When we retrieve a query string appended to a route (Refresh the page during a search), it gets read as a URL. So certain characters can break the URL and cause rison to try to read incomplete or improperly encoded/decoded strings.

1. Rison encode/decode:

This might seem complicated at first but it really comes down to a difference in a single Regular Expression (regex).

To put it very simply, rison is a way to make a string from an object. It aims to make this string as compact, human-readable, and URL-friendly as possible. Read more about it here.

It will take this:

{ filters: [{ value: 'string' }, { value: 100 }] }

and turn it into this:

(filters:!((value:string),(value:100)))
  • Both the JS rison and Python rison can encode any object thrown at it
  • Both the JS rison and Python rison can decode any string that they encoded
  • Both the JS rison and Python rison can't decode all strings that the other has encoded

So what is happening?

The answer comes down to the regex that I mentioned.

Both of them have a function that builds a string of characters to be used in the regex that decides whether a character can be a valid "id". An id is any value being used as a rison value. (filters, value, 'string', and 100 are all ids in the previous examples)

Rison has reserved values that can't be used as ids but a "string" value that is passed can have any characters inside of it as long as it has 'single quotes' around it. Rison encode will provide these 'single quotes' around any value that contains a character that it has reserved or restricted for its syntax. This helps rison know a value is a string and thus parse the reserved characters correctly.

The regex expression is what decides this and will fail to decode the string if it determines that single quotes weren't added correctly because it sees it as invalid syntax.

So why is the regex different?

I don't know the motivation behind this but as I mentioned, there is a loop that builds the string in the JS rison and the Python rison. Both of these loops build the same string.

But for some reason, the JS rison overwrites this string with a different hard typed one immediately after it is made.

JS:

(function () {
    var l = [];
    for (var hi = 0; hi < 16; hi++) {
        for (var lo = 0; lo < 16; lo++) {
            if (hi+lo === 0) continue;
            var c = String.fromCharCode(hi*16 + lo);
            if (! /\w|[-_.\/~]/.test(c))
                l.push('\\u00' + hi.toString(16) + lo.toString(16));
        }
    }
    rison.not_idchar = l.join('');  // <<--- Where it is added
})();
//rison.not_idchar  = " \t\r\n\"<>[]{}'!=:(),*@$;&";
rison.not_idchar  = " '!:(),*@$"; // <<--- Where it is reassigned

Python:

NOT_IDCHAR = ''.join([c for c in (chr(i) for i in range(127))
                      if not (c.isalnum()
                              or c in IDCHAR_PUNCTUATION)])

# ^^^ Good ol python code not doing anything weird ^^^

So all we need to do is change that id_char variable on the javascript rison and update it's regex. Then we have the javascript rison encoding the same way that the python rison is expecting it to be.

This fixes the issues that cause a 400 error when searching for these symbols by themselves

& # ? ^ { } [ ] | " = + `

My implementation

I will admit that I do not believe this is the best possible implementation. I just wanted to provide a solution and get feedback to hopefully find a better way. This is a pretty unique problem and there are a ton of different directions we could go with the fix.

What I have in this PR is a fix that will take the rison global object that gets created and mutates the regex properties on it to make it work with the backend decoding.

I just took the code from rison where it is creating the regex and then overwrote the properties that use it.

// Modifies the rison encoding slightly to match the backend's
// rison encoding/decoding. Applies globally. Code pulled from rison.js
(() => {
  const risonRef: {
    not_idchar: string;
    not_idstart: string;
    id_ok: RegExp;
    next_id: RegExp;
  } = rison as any;

  const l = [];
  for (let hi = 0; hi < 16; hi += 1) {
    for (let lo = 0; lo < 16; lo += 1) {
      if (hi + lo === 0) continue;
      const c = String.fromCharCode(hi * 16 + lo);
      if (!/\w|[-_./~]/.test(c))
        l.push(`\\u00${hi.toString(16)}${lo.toString(16)}`);
    }
  }

  risonRef.not_idchar = l.join('');
  risonRef.not_idstart = '-0123456789';

  const idrx = `[^${risonRef.not_idstart}${risonRef.not_idchar}][^${risonRef.not_idchar}]*`;

  risonRef.id_ok = new RegExp(`^${idrx}$`);
  risonRef.next_id = new RegExp(idrx, 'g');
})();

My problem with my solution is that it is feels hacky and it doesn't feel like there is a good place for this code to live. I put it in CRUD/utils.tsx because it seemed like the right choice.

There is also the issue of this code being immediately executed out in the open and not being clearly contained anywhere.

Alternative options:

  • Somehow intercept and modify rison before it ever gets imported into react ( I couldn't figure this out )
  • Fork the rison repo and modify that one line to get this to work as expected ( who will maintain the fork and the node module? )
  • Have a local Superset specific rison package live inside the project and change that one line of code ( feels like overkill for a single line of difference )
  • Use Json instead of rison for these calls ( not sure how this would look but Flask App Builder rison decorator has a fall back to json so this is an option )
  • Parse the encoded rison strings and modify them to work correctly. ( This will have to be done on all calls containing user input. Also if a new call containing user input is added we will need to do this to it, in other words it's repetitive )

2. Rison encode being URL friendly

Rison encode is URL friendly but it is not perfect when it comes to URLs and rison knows this. So it provided a way to safely send URLs in the form of an alternate encode method called encode_uri. Using this is not always necessary but it is useful when you are unsure about the data that is being provided in your string like when user input is involved. Here is where it is talked about in the README.

So I switched to encode_uri for the API calls that are building strings that may have user input.

This fixes the Fatal Errors that are returned from the backend when a search string includes & or # or includes a percent encoded value like a single quote (%27). This also prevents the user from being able to search using percent encoded values. For example, if you currently search %27%26%27 ('&') in any of the list-views it will actually return the results that include an ampersand.


3. URL encoding for QueryParams

Another simple explanation here. The list views use QueryParams to put the query string in the browser URL for search/filter persistence. This is done so you can do things like refresh the page or press the forward and back button without it completely removing the filters, sort orders, etc that you set. At least that's what I believe it is being used for.

Looks like this:

chart/list/?filters=(slice_name:hello)&pageIndex=0&sortColumn=changed_on_delta_humanized&sortOrder=desc&viewMode=card

This part of it is rison:

(slice_name:hello)

QueryParams allows you to set custom encoding/decoding for certain Params but it will still always read from the params as a URL before it decodes. So this means that non percent encoded symbols that can break URLs ( &, #, and unexpected percent encoded strings) will cause issues.

This is what is causing the current page breaking Fatal Error that happens on the list views when searching for an & by itself.

This is the same fatal error that was happening on the backend but this time it is on the front end.

I opted to not use the rison.encode_uri method for this because it doesn't keep the string that clean and will percent encode a little too much if certain characters are passed to it. I felt that since the user can see this query in their browser that the best approach would be to keep the percent encoding minimal.

So the fix here is to encode only the characters that cause problems by using some String.replace() methods after using rison.encode()

          rison.encode(data)
          .replace(/%/g, '%25')
          .replace(/&/g, '%26')
          .replace(/\+/g, '%2B')
          .replace(/#/g, '%23'),

BEFORE

Searching for &
Screen Shot 2022-01-31 at 12 09 59 PM

Searching for !"#$%&'()*+,-./:;<=?>@\[]^_`{|}~
Screen Shot 2022-01-31 at 12 12 41 PM

Searching for #
Screen Shot 2022-01-31 at 12 56 21 PM

Searching for ?
Screen Shot 2022-01-31 at 12 58 09 PM

Time range for & - 500 response
Screen Shot 2022-02-04 at 12 44 24 PM

Time range for # - 500 response
Screen Shot 2022-02-04 at 12 45 19 PM

AFTER

Searching for &
Screen Shot 2022-01-31 at 11 55 28 AM

Searching for !"#$%&'()*+,-./:;<=?>@\[]^_`{|}~
Screen Shot 2022-01-31 at 11 56 11 AM

Searching for #
Screen Shot 2022-01-31 at 1 03 28 PM

Searching for ?
Screen Shot 2022-01-31 at 1 03 45 PM

Time range for &
Screen Shot 2022-02-04 at 12 48 48 PM

Time range for #
Screen Shot 2022-02-04 at 12 49 02 PM

TESTING INSTRUCTIONS

  • Open up Superset
  • Navigate to Charts/Dashboards by clicking the tab in the top navigation bar
  • You can also go to:
    • Data -> Databases
    • Data -> Datasets
    • SQL Lab -> Saved Queries
    • SQL Lab -> Query History
    • Settings -> CSS Templates
    • Settings -> Annotation Layers
  • Try out the search bar and verify that special characters can be used in a search ( Ex: & # ? ^ { } [ ] | " = + ` )
    • & by itself shouldn't cause a fatal error crash on the client side
    • # should no longer cause a fatal error crash on client side after refreshing the page following a search for #
    • All the symbols should no longer cause 500 fatal error responses or 400 not a valid rison/json responses
    • All symbols should be searched by correctly - ( Except % and _ when searched by themselves. Both return all results when searched alone. % is an issue with the backend and I am not sure what is going on with _ but it could be the same issue. These already existed and were not introduced with this fix )

ADDITIONAL INFORMATION

@lyndsiWilliams lyndsiWilliams added the Superset-Community-Partners Preset community partner program participants label Feb 1, 2022
@pkdotson
Copy link
Member

pkdotson commented Feb 1, 2022

Hi @corbinrobb thanks for the initial fix. Is it possible to add some unit tests for this?

@corbinrobb
Copy link
Contributor Author

Hey @pkdotson thanks for taking a look. I had to think about it a lot but I believe I may have a good idea of how I can do some unit tests on this. I will try to implement it and get it pushed up soon! I am going to try to add an end to end test for it as well. I have only a little bit of experience writing cypress tests so I am not sure how long that will take to add

@corbinrobb corbinrobb marked this pull request as ready for review February 3, 2022 18:34
@lyndsiWilliams lyndsiWilliams changed the title fix(CRUD/listviews): [WIP] errors with rison and search strings using special characters fix(CRUD/listviews): Errors with rison and search strings using special characters Feb 3, 2022
@codecov
Copy link

codecov bot commented Feb 3, 2022

Codecov Report

Merging #18056 (a91bba7) into master (14b9298) will decrease coverage by 0.03%.
The diff coverage is n/a.

❗ Current head a91bba7 differs from pull request most recent head eee8975. Consider uploading reports for the commit eee8975 to get more accurate results

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18056      +/-   ##
==========================================
- Coverage   66.34%   66.31%   -0.04%     
==========================================
  Files        1569     1620      +51     
  Lines       61685    63075    +1390     
  Branches     6240     6370     +130     
==========================================
+ Hits        40927    41827     +900     
- Misses      19161    19591     +430     
- Partials     1597     1657      +60     
Flag Coverage Δ
javascript 51.24% <ø> (+0.32%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
superset/examples/multi_line.py 0.00% <0.00%> (-53.85%) ⬇️
superset/commands/importers/v1/examples.py 0.00% <0.00%> (-38.64%) ⬇️
superset/examples/big_data.py 0.00% <0.00%> (-35.00%) ⬇️
superset/examples/misc_dashboard.py 0.00% <0.00%> (-33.34%) ⬇️
superset/examples/utils.py 0.00% <0.00%> (-28.58%) ⬇️
superset/examples/tabbed_dashboard.py 0.00% <0.00%> (-27.59%) ⬇️
superset/db_engine_specs/teradata.py 62.75% <0.00%> (-27.25%) ⬇️
superset/examples/bart_lines.py 0.00% <0.00%> (-25.81%) ⬇️
superset/utils/mock_data.py 0.00% <0.00%> (-25.19%) ⬇️
superset/examples/paris.py 0.00% <0.00%> (-25.00%) ⬇️
... and 322 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 14b9298...eee8975. Read the comment docs.

@pkdotson
Copy link
Member

pkdotson commented Feb 4, 2022

/testenv up

@github-actions
Copy link
Contributor

github-actions bot commented Feb 4, 2022

@pkdotson Ephemeral environment spinning up at http://34.212.43.212:8080. Credentials are admin/admin. Please allow several minutes for bootstrapping and startup.

@@ -33,6 +33,35 @@ import { FetchDataConfig } from 'src/components/ListView';
import SupersetText from 'src/utils/textUtils';
import { Dashboard, Filters } from './types';

// Modifies the rison encoding slightly to match the backend's
// rison encoding/decoding. Applies globally. Code pulled from rison.js
(() => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this called everytime rison.encode_uri is called?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function should only run once and since it doesn't get named, exported, or called anywhere else, it shouldn't run again. This just takes advantage of the rison object being global and mutable, and changes the regex variable on it that decides what to do with certain characters when it encodes/decodes. That regex variable gets used every time encode_uri, encode or decode is called but this function itself doesn't

Copy link
Member

@betodealmeida betodealmeida left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome, @corbinrobb! Thanks for troubleshooting and fixing this!

@@ -33,6 +33,35 @@ import { FetchDataConfig } from 'src/components/ListView';
import SupersetText from 'src/utils/textUtils';
import { Dashboard, Filters } from './types';

// Modifies the rison encoding slightly to match the backend's
// rison encoding/decoding. Applies globally. Code pulled from rison.js
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, can you add here that rison.js is licensed under the MIT license, so we know we can reuse it? We also need to have a link to the original project, since the MIT license requires crediting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing! I will have that pushed up soon

const actualEncoding = rison.encode(testObject);

const expectedEncoding =
"('\"':'\"','#':'#','&':'&','+':'+','=':'=','?':'?','[':'[',']':']','^':'^','`':'`','{':'{','|':'|','}':'}')";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this more readable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can have it make and encode an object for each one of the characters being tested instead of having it all in one. It will get rid of this big string and should be easier to read. I am pushing it up now so let me know if it needs more work and I can give it another go

@lyndsiWilliams lyndsiWilliams merged commit c8df849 into apache:master Feb 15, 2022
@lyndsiWilliams lyndsiWilliams deleted the corbin/fix-listview-search-errors branch February 15, 2022 22:08
@github-actions
Copy link
Contributor

Ephemeral environment shutdown and build artifacts deleted.

@rosemarie-chiu
Copy link
Contributor

🏷 preset:2022.7

rosemarie-chiu pushed a commit to preset-io/superset that referenced this pull request Feb 16, 2022
…al characters (apache#18056)

* fix errors with rison and useQueryParams

* add test for encode/decode

* add rison link and make test case more readable

Co-authored-by: Corbin Robb <[email protected]>
(cherry picked from commit c8df849)
@cancan101
Copy link
Contributor

Should this PR also have updated this usage too:?

const endpoint = `/datasource/external_metadata_by_name/?q=${rison.encode(
params,
)}`;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels preset:2022.7 size/M Superset-Community-Partners Preset community partner program participants 🚢 1.5.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants