Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(native-filters): Fetch only the required dataset fields #23303

Merged
merged 2 commits into from
Mar 20, 2023

Conversation

john-bodley
Copy link
Member

@john-bodley john-bodley commented Mar 7, 2023

SUMMARY

This PR partially addresses recommendation #⁠1 in #14383 by requesting—via the FAB API—only the required fields (columns) from the /api/v1/dataset/{pk} RESTful GET endpoint.

Fetching the entirety of dataset (especially a very large datasets) can be rather expensive which. #22413 made significant performance improvements (reducing the response time from > 300 seconds to ~ 5 seconds) however later #23113 was introduced to address some performance tradeoffs (in other contexts) by disabling eager loading (increasing the response time from ~ 5 seconds to ~ 30 seconds). The bulk of this increased cost was from the repeated lazy loading of the back referenced table i.e., self.table, for every dataset column (even though this is always the same table; possibly a SQLAlchemy quirk) for a subset of the TableColumn fields including the advanced data type.

This PR updates—in the context of the native filters modal—the /api/v1/dataset/{pk} requests to fetch only the subset of columns required in the response. This helps to ensure the we're:

  1. Not building overly complex SQLAlchemy queries on the backend.
  2. Reducing the response payload.

For the dataset in question said change reduced the response time from ~ 30 seconds to ~ 3 seconds which significantly improves the UX when interfacing with the modal.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

CI.

ADDITIONAL INFORMATION

  • Has associated issue: [SIP-64] Migrate filter_box to Dashboard Native Filter Component #14383
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@codecov
Copy link

codecov bot commented Mar 7, 2023

Codecov Report

Merging #23303 (2fd632d) into master (da3791a) will increase coverage by 9.66%.
The diff coverage is 78.61%.

❗ Current head 2fd632d differs from pull request most recent head a1c5cf6. Consider uploading reports for the commit a1c5cf6 to get more accurate results

@@            Coverage Diff             @@
##           master   #23303      +/-   ##
==========================================
+ Coverage   56.27%   65.93%   +9.66%     
==========================================
  Files        1907     1907              
  Lines       73495    73590      +95     
  Branches     7977     7982       +5     
==========================================
+ Hits        41356    48524    +7168     
+ Misses      30091    23018    -7073     
  Partials     2048     2048              
Flag Coverage Δ
javascript 53.77% <78.46%> (-0.03%) ⬇️
unit ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...chart-echarts/src/Timeseries/Area/controlPanel.tsx 40.00% <ø> (ø)
...charts/src/Timeseries/Regular/Bar/controlPanel.tsx 35.71% <ø> (ø)
...harts/src/Timeseries/Regular/Line/controlPanel.tsx 33.33% <ø> (ø)
...ts/src/Timeseries/Regular/Scatter/controlPanel.tsx 40.00% <ø> (ø)
...src/Timeseries/Regular/SmoothLine/controlPanel.tsx 40.00% <ø> (ø)
...chart-echarts/src/Timeseries/Step/controlPanel.tsx 33.33% <ø> (ø)
...s/plugin-chart-echarts/src/Timeseries/constants.ts 100.00% <ø> (ø)
...gin-chart-echarts/src/Timeseries/transformProps.ts 57.14% <0.00%> (-1.81%) ⬇️
...tersConfigModal/FiltersConfigForm/ColumnSelect.tsx 77.14% <ø> (ø)
...onfigModal/FiltersConfigForm/FiltersConfigForm.tsx 55.17% <ø> (-4.49%) ⬇️
... and 20 more

... and 341 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@john-bodley john-bodley force-pushed the john-bodley--native-filters-dataset-columns branch 2 times, most recently from 31a91aa to 1b53e5d Compare March 7, 2023 23:50
@john-bodley
Copy link
Member Author

Closing (for now) in favor of #23314.

@john-bodley john-bodley closed this Mar 8, 2023
@john-bodley john-bodley deleted the john-bodley--native-filters-dataset-columns branch March 8, 2023 18:41
@john-bodley john-bodley restored the john-bodley--native-filters-dataset-columns branch March 14, 2023 22:04
@john-bodley john-bodley reopened this Mar 14, 2023
@john-bodley john-bodley force-pushed the john-bodley--native-filters-dataset-columns branch 5 times, most recently from 419f484 to cce1165 Compare March 15, 2023 17:12
@pull-request-size pull-request-size bot added size/M and removed size/S labels Mar 15, 2023
@john-bodley john-bodley changed the title chore(native-filters): Fetch only the dataset columns chore(native-filters): Fetch only the required dataset columns Mar 15, 2023
@john-bodley john-bodley changed the title chore(native-filters): Fetch only the required dataset columns chore(native-filters): Fetch only the required dataset fields Mar 15, 2023
@john-bodley john-bodley force-pushed the john-bodley--native-filters-dataset-columns branch 3 times, most recently from b602228 to 01ffa3b Compare March 15, 2023 19:00
@pull-request-size pull-request-size bot added size/S and removed size/M labels Mar 15, 2023
@john-bodley john-bodley force-pushed the john-bodley--native-filters-dataset-columns branch from 01ffa3b to 8a64476 Compare March 15, 2023 19:34
@john-bodley john-bodley marked this pull request as ready for review March 15, 2023 20:33
@john-bodley john-bodley requested a review from rusackas March 16, 2023 01:11
Copy link
Member

@michael-s-molina michael-s-molina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR @john-bodley. I left some comments.

@@ -85,7 +86,9 @@ export function ColumnSelect({
}
if (datasetId != null) {
cachedSupersetGet({
endpoint: `/api/v1/dataset/${datasetId}`,
endpoint: `/api/v1/dataset/${datasetId}?q=${rison.encode({
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The columns state is later used to get currentColumn which is then passed to the filterValues function. If you check filterValues references, you will see that is_dttm and type_generic attributes are also used.

@@ -654,7 +655,20 @@ const FiltersConfigForm = (
useEffect(() => {
if (datasetId) {
cachedSupersetGet({
endpoint: `/api/v1/dataset/${datasetId}`,
endpoint: `/api/v1/dataset/${datasetId}?q=${rison.encode({
columns: [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found other attributes that need to be queried when analyzing the following files:

database.id
datasource_name
schema
is_sqllab_view

columns.type

columns.is_dttm
columns.name
main_dttm_col

return !!datasource && 'results' in datasource && 'sql' in datasource;

results
sql

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@michael-s-molina it seem like results isn't an attribute returned per here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@john-bodley You're right. Given that results is not listed in show_select_columns, I'm assuming that it's populated elsewhere and you projections won't affect this behavior. Tagging @zhaoyongjie and @villebro in case they know something different.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I think this is needed in the Explore control panel only, hence shouldn't affect dashboards/native filters.

@john-bodley john-bodley force-pushed the john-bodley--native-filters-dataset-columns branch from 8a64476 to 61dd061 Compare March 16, 2023 20:51
@pull-request-size pull-request-size bot added size/M and removed size/S labels Mar 16, 2023
@michael-s-molina
Copy link
Member

@villebro @zhaoyongjie Can you double check if there are no missing attributes or invalid projection names?

Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested to work as expected and LGTM after @michael-s-molina 's database_name comment has been addressed.

Comment on lines +92 to +93
'columns.is_dttm',
'columns.type_generic',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100 % sure on this (more like 90 %), but I feel like is_dttm is no longer necessary, and type_generic === GenericDataType.TEMPORAL should always be true for any col that has is_dttm === true, and is the preferred way of checking for temporal columns in the frontend. So if we want to reduce payload size, this is kind of a low hanging fruit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@villebro thanks for the review. If your hypothesis is correct then it seems like a follow up PR would be to deprecate the is_dttm column everywhere.

@@ -654,7 +655,20 @@ const FiltersConfigForm = (
useEffect(() => {
if (datasetId) {
cachedSupersetGet({
endpoint: `/api/v1/dataset/${datasetId}`,
endpoint: `/api/v1/dataset/${datasetId}?q=${rison.encode({
columns: [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I think this is needed in the Explore control panel only, hence shouldn't affect dashboards/native filters.

…rsConfigModal/FiltersConfigForm/FiltersConfigForm.tsx

Co-authored-by: Michael S. Molina <[email protected]>
@@ -664,7 +664,7 @@ const FiltersConfigForm = (
'columns.type',
'columns.verbose_name',
'database.id',
'database_name',
'database.database_name',
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's somewhat weird how FAB doesn't barf when you give it invalid columns.

Copy link
Member

@michael-s-molina michael-s-molina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@john-bodley john-bodley merged commit ffc0a81 into master Mar 20, 2023
@john-bodley john-bodley deleted the john-bodley--native-filters-dataset-columns branch March 20, 2023 18:43
john-bodley added a commit to airbnb/superset-fork that referenced this pull request Mar 20, 2023
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 3.0.0 labels Mar 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/M 🚢 3.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants