feat: handle subtle bug with load-examples #16052

Merged

betodealmeida
Member

@betodealmeida betodealmeida commented Aug 3, 2021

SUMMARY

This PR fixes a bug caused by several small errors that are harmless in isolation:

Bug 1. When running load-examples the datasets created from configs (https://github.com/apache/superset/tree/master/superset/examples/configs/datasets/examples) were loaded with schema set to null. The bug was fixed in #16041, but there are many datasets in the wild with NULL as their schema.

Bug 2. When following the dashboard creation tutorial the user is instructed to create a dataset called cleaned_sales_data in the public schema. This results in two datasets with the same name:

  1. [NULL].cleaned_sales_data with UUID A (added without a schema by load-examples)
  2. public.cleaned_sales_data with UUID B (added with a schema by the user)

This shouldn't be possible, because tables had a uniqueness constraint of database_id, table_name at the time:

__table_args__ = (UniqueConstraint("database_id", "table_name"),)

But the logic was enforced by the application, not the DB.
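
To illustrate the difference between application-level and database-level enforcement, here is a minimal sketch using Python's built-in sqlite3 module. The table names `tables_app_only` and `tables_db_enforced` are hypothetical and chosen for illustration only; this is not Superset's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table with no database-level constraint:
# uniqueness is left entirely to the application.
conn.execute(
    'CREATE TABLE tables_app_only (database_id INTEGER, table_name TEXT, "schema" TEXT)'
)
# Same columns, but the database itself enforces (database_id, table_name).
conn.execute(
    'CREATE TABLE tables_db_enforced (database_id INTEGER, table_name TEXT, "schema" TEXT, '
    "UNIQUE (database_id, table_name))"
)

# The two datasets from the scenario above: one without a schema, one with.
rows = [(1, "cleaned_sales_data", None), (1, "cleaned_sales_data", "public")]


def insert_all(table):
    """Insert both rows and return how many the database accepted."""
    accepted = 0
    for row in rows:
        try:
            conn.execute(f"INSERT INTO {table} VALUES (?, ?, ?)", row)
            accepted += 1
        except sqlite3.IntegrityError:
            # The database rejected the duplicate (database_id, table_name) pair.
            pass
    return accepted


app_only_count = insert_all("tables_app_only")
db_enforced_count = insert_all("tables_db_enforced")
print(app_only_count, db_enforced_count)  # prints: 2 1
```

Without the database-level constraint both rows are accepted, which is how the two datasets with the same name could coexist.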

If the user now runs load-examples again (with the fix from #16041) we'll try to import the dataset with the schema, public.cleaned_sales_data, with the same UUID ("A"). The helper import_from_dict will then run a query similar to this to check for uniqueness:

SELECT * FROM tables WHERE (name='cleaned_sales_data' AND `schema`='public') OR uuid='A'

And this returns both datasets.
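
The collision can be reproduced with a small sketch using Python's built-in sqlite3 module. The table layout below is a simplified assumption that mirrors the column names in the query above, not Superset's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, assumed layout: only the columns the query above touches.
conn.execute(
    'CREATE TABLE tables (id INTEGER PRIMARY KEY, name TEXT, "schema" TEXT, uuid TEXT)'
)
conn.executemany(
    'INSERT INTO tables (name, "schema", uuid) VALUES (?, ?, ?)',
    [
        ("cleaned_sales_data", None, "A"),      # added without a schema by load-examples
        ("cleaned_sales_data", "public", "B"),  # added with a schema by the user
    ],
)

# The uniqueness check: match on (name, schema) OR on the UUID being imported.
rows = conn.execute(
    'SELECT "schema", name, uuid FROM tables '
    'WHERE (name = ? AND "schema" = ?) OR uuid = ? ORDER BY id',
    ("cleaned_sales_data", "public", "A"),
).fetchall()
print(rows)
# prints: [(None, 'cleaned_sales_data', 'A'), ('public', 'cleaned_sales_data', 'B')]
```

The first row matches on the UUID and the second on name and schema, so a lookup that expects at most one result finds two.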

A solution would be to delete all the datasets that have schema set to NULL (assuming they should have one), and then run load-examples again. But this would overwrite any custom datasets created by users.

Instead, I changed the load-examples script to skip an import when duplicates are found. This should affect a small number of datasets, and since we have made no changes to the cleaned_sales_data dataset it's fine to skip it.
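
The skip-on-duplicate behavior can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `MultipleResultsFound`, `find_existing`, `load_examples`, the simplified table layout, and the `other_dataset` row are all hypothetical names introduced here.

```python
import sqlite3


class MultipleResultsFound(Exception):
    """Hypothetical error raised when a single-row lookup matches several rows."""


def find_existing(conn, config):
    """Return the id of the one matching dataset, None if absent; raise on duplicates."""
    rows = conn.execute(
        'SELECT id FROM tables WHERE (name = ? AND "schema" = ?) OR uuid = ?',
        (config["table_name"], config["schema"], config["uuid"]),
    ).fetchall()
    if len(rows) > 1:
        raise MultipleResultsFound(config["table_name"])
    return rows[0][0] if rows else None


def load_examples(conn, configs):
    """Import each config, skipping any that matches more than one existing dataset."""
    imported, skipped = [], []
    for config in configs:
        try:
            existing_id = find_existing(conn, config)
        except MultipleResultsFound:
            # Ambiguous match: leave the existing rows untouched and move on.
            skipped.append(config["table_name"])
            continue
        if existing_id is None:
            conn.execute(
                'INSERT INTO tables (name, "schema", uuid) VALUES (?, ?, ?)',
                (config["table_name"], config["schema"], config["uuid"]),
            )
        else:
            # Single match: update it in place (here, only the schema).
            conn.execute(
                'UPDATE tables SET "schema" = ? WHERE id = ?',
                (config["schema"], existing_id),
            )
        imported.append(config["table_name"])
    return imported, skipped


conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE tables (id INTEGER PRIMARY KEY, name TEXT, "schema" TEXT, uuid TEXT)'
)
conn.executemany(
    'INSERT INTO tables (name, "schema", uuid) VALUES (?, ?, ?)',
    [
        ("cleaned_sales_data", None, "A"),
        ("cleaned_sales_data", "public", "B"),
        ("other_dataset", None, "C"),
    ],
)
configs = [
    {"table_name": "cleaned_sales_data", "schema": "public", "uuid": "A"},
    {"table_name": "other_dataset", "schema": "public", "uuid": "C"},
]
imported, skipped = load_examples(conn, configs)
print(imported, skipped)  # prints: ['other_dataset'] ['cleaned_sales_data']
```

In this sketch the ambiguous dataset is skipped and both of its rows stay as they were, while the dataset with a single match still receives its schema.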

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A

TESTING INSTRUCTIONS

To replicate:

  1. Run load-examples at a SHA from before #16041 ("fix: set correct schema on config import") was merged.
  2. Add a dataset called cleaned_sales_data in a given schema.
  3. Check that there are 2 datasets called cleaned_sales_data, one with a schema and one without.
  4. Upgrade to a SHA from after #16041 ("fix: set correct schema on config import") was merged.
  5. Run load-examples again; it should now succeed.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

codecov bot commented Aug 3, 2021

Codecov Report

Merging #16052 (4593a5c) into master (4cb79e5) will decrease coverage by 0.24%.
The diff coverage is 57.77%.

❗ Current head 4593a5c differs from pull request most recent head fef02d0. Consider uploading reports for the commit fef02d0 to get more accurate results

@@            Coverage Diff             @@
##           master   #16052      +/-   ##
==========================================
- Coverage   76.90%   76.66%   -0.25%     
==========================================
  Files         995      995              
  Lines       52842    52869      +27     
  Branches     6709     6712       +3     
==========================================
- Hits        40640    40533     -107     
- Misses      11976    12110     +134     
  Partials      226      226              
Flag       Coverage Δ
hive       ?
mysql      81.56% <63.15%> (-0.07%) ⬇️
postgres   81.62% <63.15%> (-0.03%) ⬇️
presto     ?
python     81.71% <63.15%> (-0.46%) ⬇️
sqlite     81.22% <63.15%> (-0.07%) ⬇️

Impacted Files                                         Coverage Δ
...c/views/CRUD/data/database/DatabaseModal/index.tsx  44.53% <0.00%> (-0.36%) ⬇️
...set-frontend/src/views/CRUD/data/database/types.ts  100.00% <ø> (ø)
superset/commands/importers/v1/examples.py             35.44% <10.00%> (-2.59%) ⬇️
superset/databases/commands/export.py                  90.47% <66.66%> (-3.81%) ⬇️
superset/databases/schemas.py                          98.38% <83.33%> (-0.38%) ⬇️
...erset-frontend/src/datasource/DatasourceEditor.jsx  74.25% <100.00%> (+0.09%) ⬆️
.../CRUD/data/database/DatabaseModal/ExtraOptions.tsx  93.18% <100.00%> (ø)
superset/connectors/druid/models.py                    82.95% <100.00%> (+0.01%) ⬆️
superset/connectors/sqla/models.py                     88.08% <100.00%> (-1.66%) ⬇️
superset/db_engines/hive.py                            0.00% <0.00%> (-82.15%) ⬇️
... and 9 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4cb79e5...fef02d0.

@betodealmeida betodealmeida merged commit 69c5cd7 into apache:master Aug 3, 2021
henryyeh pushed a commit to preset-io/superset that referenced this pull request Aug 3, 2021
opus-42 pushed a commit to opus-42/incubator-superset that referenced this pull request Nov 14, 2021
cccs-RyanS pushed a commit to CybercentreCanada/superset that referenced this pull request Dec 17, 2021
QAlexBall pushed a commit to QAlexBall/superset that referenced this pull request Dec 29, 2021
cccs-rc pushed a commit to CybercentreCanada/superset that referenced this pull request Mar 6, 2024
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 1.3.0 labels Mar 13, 2024
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/S 🚢 1.3.0
3 participants