feat: enable metadata sync for virtual tables #10645

villebro · 2020-08-19T20:43:00Z

SUMMARY

This PR adds column metadata syncing support for SQL-based virtual tables (both legacy and React CRUD). In addition, the query is checked for DML and for multiple statements, either of which raises an exception. This PR is blocked by #10658, which fixes a bug that this PR exposes.

SCREENSHOTS

Syncing column metadata for virtual table:

Trying to execute a DELETE FROM query:

Trying to execute multiple SELECTs:

On legacy CRUD view, refreshing does the same:

TEST PLAN

ADDITIONAL INFORMATION

codecov-commenter · 2020-08-21T08:24:31Z

Codecov Report

Merging #10645 into master will decrease coverage by 0.13%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master   #10645      +/-   ##
==========================================
- Coverage   64.32%   64.18%   -0.14%     
==========================================
  Files         784      784              
  Lines       36952    36955       +3     
  Branches     3529     3524       -5     
==========================================
- Hits        23769    23721      -48     
- Misses      13074    13126      +52     
+ Partials      109      108       -1

Flag	Coverage Δ
#cypress	`54.61% <0.00%> (+0.09%)`	⬆️
#javascript	`60.83% <100.00%> (+<0.01%)`	⬆️
#python	`59.56% <100.00%> (-0.23%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
...erset-frontend/src/datasource/DatasourceEditor.jsx	`73.01% <100.00%> (+0.33%)`	⬆️
superset/connectors/sqla/models.py	`90.28% <100.00%> (+0.55%)`	⬆️
superset/views/datasource.py	`94.82% <100.00%> (+1.38%)`	⬆️
superset/db_engine_specs/presto.py	`70.56% <0.00%> (-12.14%)`	⬇️
superset/examples/world_bank.py	`97.10% <0.00%> (-2.90%)`	⬇️
superset/examples/birth_names.py	`97.36% <0.00%> (-2.64%)`	⬇️
superset/views/database/mixins.py	`80.70% <0.00%> (-1.76%)`	⬇️
superset/models/core.py	`87.22% <0.00%> (-0.28%)`	⬇️
...rontend/src/SqlLab/components/QueryAutoRefresh.jsx	`72.72% <0.00%> (+6.81%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0f44d3e...104bda0. Read the comment docs.

villebro · 2020-08-21T08:47:28Z

superset-frontend/src/datasource/DatasourceEditor.jsx

-    // Handle carefully when the schema is empty
-    const endpoint =
-      `/datasource/external_metadata/${
-        datasource.type || datasource.datasource_type
-      }/${datasource.id}/` +
-      `?db_id=${datasource.database.id}` +
-      `&schema=${datasource.schema || ''}` +
-      `&table_name=${datasource.datasource_name || datasource.table_name}`;
+    const endpoint = `/datasource/external_metadata/${
+      datasource.type || datasource.datasource_type
+    }/${datasource.id}/`;


With the simplification of the code in the endpoint, the slightly hackish query params (db_id, schema and table_name) are now redundant.

villebro · 2020-08-21T08:57:22Z

superset/connectors/sqla/models.py

+        else:
+            db_dialect = self.database.get_dialect()
+            cols = self.database.get_columns(
+                self.table_name, schema=self.schema or None


I had to add checking for empty strings here (schema=self.schema or None) to get CI to pass, as I wasn't able to track down why CI was always creating empty schema names instead of None on some of the examples databases. I later noticed that similar logic is being applied elsewhere (e.g. here and here), so I figured it's ok to solve the symptoms here with this simple workaround instead of ensuring undefined schemas are always None.
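A minimal sketch of the `or None` workaround described above. The helper name `normalize_schema` is hypothetical and only illustrates the idiom; in the PR the expression is written inline.

```python
from typing import Optional


def normalize_schema(schema: Optional[str]) -> Optional[str]:
    """Return None for a missing or empty schema name, else the name unchanged."""
    # `or None` collapses every falsy value ("" as well as None) into None.
    return schema or None


print(normalize_schema(""))      # None
print(normalize_schema(None))    # None
print(normalize_schema("main"))  # main
```

The trade-off is the one mentioned in the comment: this treats the symptom at the call site and does not guarantee that undefined schemas are stored as None everywhere.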

villebro · 2020-08-21T08:59:53Z

superset/connectors/sqla/models.py

-            try:
-                datatype = db_engine_spec.column_datatype_to_string(
-                    col.type, db_dialect
-                )
-            except Exception as ex:  # pylint: disable=broad-except
-                datatype = "UNKNOWN"
-                logger.error("Unrecognized data type in %s.%s", new_table, col.name)
-                logger.exception(ex)


This compilation step was moved to external_metadata() which is called earlier in the method, hence making this step redundant.

villebro · 2020-08-21T09:02:05Z

superset/views/datasource.py

-        elif datasource_type == "table":
-            database = (
-                db.session.query(Database).filter_by(id=request.args.get("db_id")).one()
-            )
-            table_class = ConnectorRegistry.sources["table"]
-            datasource = table_class(
-                database=database,
-                table_name=request.args.get("table_name"),
-                schema=request.args.get("schema") or None,
-            )
-        else:
-            raise Exception(f"Unsupported datasource_type: {datasource_type}")


I don't know why the table was created like this instead of just using the available functionality that populates all fields like sql.

Mmmmh, oh it could be that we use this for tables that don't have associated datasets yet, for example in the SQL Lab code base where we show the schema of a table on the left panel.

I just checked and it looks like we call /api/v1/database/1/table/ and /superset/extra_table_metadata/ from SQL Lab, and I think this endpoint used to be called in place of /api/v1/database/1/table/

NOTE: looked a bit deeper and it turns out that the /api/v1/database/1/table/ endpoint used for SQL Lab does much more, like getting indexes, comments and more. Eventually we could reuse that endpoint here and surface more of that metadata in this context, as it's somewhat relevant here too, but for now I think your approach is a better path forward.

villebro · 2020-08-21T10:19:19Z

superset/connectors/sqla/models.py

+            parsed_query = ParsedQuery(self.sql)
+            if not parsed_query.is_readonly():
+                raise SupersetSecurityException(
+                    SupersetError(
+                        error_type=SupersetErrorType.DATASOURCE_SECURITY_ACCESS_ERROR,
+                        message=_("Only `SELECT` statements are allowed"),
+                        level=ErrorLevel.ERROR,
+                    )
+                )
+            statements = parsed_query.get_statements()
+            if len(statements) > 1:
+                raise SupersetSecurityException(
+                    SupersetError(
+                        error_type=SupersetErrorType.DATASOURCE_SECURITY_ACCESS_ERROR,
+                        message=_("Only single queries supported"),
+                        level=ErrorLevel.ERROR,
+                    )
+                )


While this checking isn't being done when rendering the query in get_sqla_query(), I doubt anyone should be attempting DML or multiple queries here. It's possible that someone might be executing stored procedures or similar here along with a select on an engine that supports it, but we can deal with that later when the use case comes to light.
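For readers without the codebase at hand, the two checks can be illustrated with a standard-library-only sketch. This is not how `ParsedQuery` works internally: `QueryNotAllowed`, `split_statements` and `validate_virtual_table_sql` are hypothetical names, and the parsing is deliberately naive (no comment handling, no CTE support).

```python
from typing import List


class QueryNotAllowed(Exception):
    """Hypothetical stand-in for the security exception raised in the PR."""


def split_statements(sql: str) -> List[str]:
    """Split on semicolons that sit outside single-quoted string literals."""
    statements: List[str] = []
    current: List[str] = []
    in_string = False
    for char in sql:
        if char == "'":
            in_string = not in_string
        if char == ";" and not in_string:
            statements.append("".join(current).strip())
            current = []
        else:
            current.append(char)
    statements.append("".join(current).strip())
    # Drop empty fragments, e.g. the one after a trailing semicolon.
    return [stmt for stmt in statements if stmt]


def validate_virtual_table_sql(sql: str) -> str:
    """Return the single SELECT statement in `sql`, or raise QueryNotAllowed."""
    statements = split_statements(sql)
    # Naive read-only test: every statement must start with SELECT.
    for stmt in statements:
        if stmt.split(None, 1)[0].upper() != "SELECT":
            raise QueryNotAllowed("Only `SELECT` statements are allowed")
    if len(statements) != 1:
        raise QueryNotAllowed("Only single queries supported")
    return statements[0]


try:
    validate_virtual_table_sql("DELETE FROM some_table")
except QueryNotAllowed as ex:
    print(ex)  # Only `SELECT` statements are allowed
```

As in the diff, the read-only check runs first, so a mix of a SELECT and a DELETE is reported as a non-SELECT query, not as a multi-statement query.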

villebro · 2020-08-21T11:10:40Z

superset/db_engine_specs/base.py

@@ -929,23 +929,25 @@ def _truncate_label(cls, label: str) -> str:

    @classmethod
    def column_datatype_to_string(
-        cls, sqla_column_type: TypeEngine, dialect: Dialect
+        cls, column_type: Union[TypeEngine, str], dialect: Dialect


@bkyryliuk it appears that the get_columns() method in the Presto spec is sometimes returning types as strings and not native SQLAlchemy type objects, which was causing my new tests to fail (see comment below). This made me wonder how we hadn't bumped into this problem before, as this method should be called every time we add a new table.
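A rough sketch of what accepting either a type object or a plain string could look like. The body of the real method is not shown in this thread, so everything below is an assumption: `FakeType` is a hypothetical stand-in for a SQLAlchemy type object, and `datatype_to_string` is a hypothetical name.

```python
from typing import Optional, Union


class FakeType:
    """Hypothetical stand-in for a SQLAlchemy type object."""

    def __init__(self, name: str) -> None:
        self.name = name

    def compile(self, dialect: Optional[object] = None) -> str:
        # A real type object would render itself for the given dialect.
        return self.name


def datatype_to_string(
    column_type: Union[FakeType, str], dialect: Optional[object] = None
) -> str:
    """Return an upper-case type name for a type object or a raw string."""
    if isinstance(column_type, str):
        # Some drivers hand back the type name as a plain string already.
        return column_type.upper()
    return column_type.compile(dialect=dialect).upper()


print(datatype_to_string(FakeType("integer")))  # INTEGER
print(datatype_to_string("varchar(3)"))         # VARCHAR(3)
```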

villebro · 2020-08-21T11:12:36Z

tests/datasource_tests.py

+            schema="main",
+            table_name="dummy_sql_table",
+            database=get_example_database(),
+            sql="select 123 as intcol, 'abc' as strcol",


@bkyryliuk this query was raising an exception, as either the type for intcol or strcol was returned as OTHER, indicating that the type wasn't found in models/sql_types/presto_sql_types.py:type_map. See https://github.com/apache/incubator-superset/blob/878f06d1339bb32f74a70e3c6c5d338c86a6f5c6/superset/db_engine_specs/presto.py#L337-L358

I see the same in our deployment:

Looks like pyhive / sqlalchemy is not happy with the varchar(3)

SHOW COLUMNS FROM bogdankyryliuk.bogdan_simple_test -- intcol integer strcol varchar(3)
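One possible way to make a type lookup tolerant of length suffixes such as `varchar(3)` is to strip the parenthesised part before looking the name up. This is only an illustration of the idea, not the fix applied in the PR: `TYPE_MAP` and `lookup_base_type` are hypothetical, and the thread does not settle whether the lookup or the driver was at fault.

```python
import re

# Hypothetical lookup table; the real mapping lives in the Presto engine spec.
TYPE_MAP = {"integer": "INTEGER", "varchar": "VARCHAR", "double": "FLOAT"}


def lookup_base_type(raw_type: str) -> str:
    """Map a raw type string such as 'varchar(3)' to a known type name."""
    # Drop everything from the first "(" onwards, e.g. "(3)" or "(10, 2)".
    base = re.sub(r"\(.*$", "", raw_type).strip().lower()
    return TYPE_MAP.get(base, "OTHER")


print(lookup_base_type("integer"))     # INTEGER
print(lookup_base_type("varchar(3)"))  # VARCHAR
```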

mistercrunch · 2020-08-21T16:42:37Z

superset/connectors/sqla/models.py

+                    )
+                )
+            with closing(engine.raw_connection()) as conn:
+                with closing(conn.cursor()) as cursor:


I'd like to have a single code path that does this. Is there a way we can refactor/share code with the sqllab modules here?
https://github.com/apache/incubator-superset/blob/master/superset/sql_lab.py#L337

I'm also wondering if this should run on an async worker when possible, but that makes things more complex here.

In this particular case the work is very much synchronous, but I agree that the single code path is desirable (this solution was a compromise for quick delivery as I feel sql_lab.py and result_set.py are in need of more comprehensive refactoring outside the scope of this PR). I have a proposal in mind that should be a small step in the right direction without having to derail this PR too much. Will update this PR shortly.

@mistercrunch I looked into joining these code paths and making it possible to make it async, but came to the conclusion that that refactoring is best done once we start working on the async query framework. I added a todo with my name next to it stating that the metadata fetching should be merged with the SQL Lab code, and will be happy to do that once we have the necessary structures in place.
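The synchronous pattern under discussion (open a raw connection, open a cursor, run the query, read the column metadata) can be tried out with the standard library alone. The sketch below uses an in-memory SQLite database instead of a Superset engine, and `fetch_column_names` is a hypothetical helper name.

```python
import sqlite3
from contextlib import closing
from typing import List


def fetch_column_names(sql: str) -> List[str]:
    """Run `sql` on an in-memory SQLite database and return its column names."""
    with closing(sqlite3.connect(":memory:")) as conn:
        with closing(conn.cursor()) as cursor:
            cursor.execute(sql)
            # DB-API cursors expose one 7-item tuple per result column;
            # the first item is the column name.
            return [column[0] for column in cursor.description]


print(fetch_column_names("select 123 as intcol, 'abc' as strcol"))
# ['intcol', 'strcol']
```

`closing()` guarantees that the cursor and the connection are closed even if the query raises, which is why the nested `with` blocks appear in the diff.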

WenQiangW · 2020-09-16T08:17:38Z

superset/connectors/sqla/models.py

-                col["type"] = "UNKNOWN"
+        db_engine_spec = self.database.db_engine_spec
+        if self.sql:
+            engine = self.database.get_sqla_engine()


engine = self.database.get_sqla_engine(schema=self.schema)

I wonder if there are missing parameters?

WenQiangW · 2020-09-16T09:32:17Z

superset/connectors/sqla/models.py

+                )
+            with closing(engine.raw_connection()) as conn:
+                with closing(conn.cursor()) as cursor:
+                    query = statements[0]


query = self.database.apply_limit_to_sql(query, limit)

Respect

I think it's better to add a limit here, because only a small amount of data needs to be queried.
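To illustrate the suggestion: since only the column metadata is needed, the query can be capped before it is executed. The sketch below wraps the statement in a subquery, which is just one strategy and is not a description of how `apply_limit_to_sql` is implemented. `wrap_with_limit` is a hypothetical helper, demonstrated against SQLite.

```python
import sqlite3
from contextlib import closing


def wrap_with_limit(sql: str, limit: int) -> str:
    """Wrap `sql` in a subquery so that at most `limit` rows are fetched."""
    inner = sql.strip().rstrip(";")
    return f"SELECT * FROM ({inner}) AS limited_query LIMIT {int(limit)}"


with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute("create table t (x integer)")
    conn.executemany("insert into t values (?)", [(1,), (2,), (3,)])
    with closing(conn.cursor()) as cursor:
        cursor.execute(wrap_with_limit("select x from t;", 1))
        rows = cursor.fetchall()
        names = [column[0] for column in cursor.description]

print(len(rows), names)  # 1 ['x']
```

The column names survive the wrapping, so the metadata sync would see the same columns while fetching far fewer rows.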

villebro · 2020-10-21T07:39:43Z

Thanks @WenQiangW for the review comments!

codecov-io · 2020-10-21T08:13:06Z

Codecov Report

Merging #10645 into master will decrease coverage by 4.17%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master   #10645      +/-   ##
==========================================
- Coverage   65.78%   61.60%   -4.18%     
==========================================
  Files         838      838              
  Lines       39841    39843       +2     
  Branches     3655     3650       -5     
==========================================
- Hits        26208    24544    -1664     
- Misses      13532    15119    +1587     
- Partials      101      180      +79

Flag	Coverage Δ
#cypress	`?`
#javascript	`62.64% <100.00%> (+<0.01%)`	⬆️
#python	`60.97% <100.00%> (+0.05%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
...erset-frontend/src/datasource/DatasourceEditor.jsx	`63.72% <100.00%> (-7.65%)`	⬇️
superset/connectors/sqla/models.py	`90.45% <100.00%> (+0.70%)`	⬆️
superset/views/datasource.py	`94.82% <100.00%> (+1.38%)`	⬆️
superset-frontend/src/SqlLab/App.jsx	`0.00% <0.00%> (-100.00%)`	⬇️
superset-frontend/src/explore/App.jsx	`0.00% <0.00%> (-100.00%)`	⬇️
superset-frontend/src/dashboard/App.jsx	`0.00% <0.00%> (-100.00%)`	⬇️
superset-frontend/src/explore/index.jsx	`0.00% <0.00%> (-100.00%)`	⬇️
superset-frontend/src/dashboard/index.jsx	`0.00% <0.00%> (-100.00%)`	⬇️
superset-frontend/src/setup/setupColors.js	`0.00% <0.00%> (-100.00%)`	⬇️
superset-frontend/src/chart/ChartContainer.jsx	`0.00% <0.00%> (-100.00%)`	⬇️
... and 171 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cae54ac...9cbf857. Read the comment docs.

villebro · 2020-10-21T10:56:29Z

This has been rebased + comments addressed + added support for templating. If there are no further bugs here, I propose merging this as-is.

villebro · 2020-10-26T21:50:23Z

Ping @robdiciuccio . IMO the refactoring proposed here is best taken care of when we start working on the async query framework, during which I assume we'll end up refactoring many parts of the SQL Lab codebase relevant to this functionality.

bkyryliuk

Everything looks good to me. The only caveat is that we don't use virtual tables at Dropbox, so I won't be able to test it out.

mistercrunch

I'm supportive of merging this to fix the "dead end" issue that currently exists when following the explore flow (can't add columns to your query).

villebro · 2020-10-27T06:57:20Z

Thanks @mistercrunch and @bkyryliuk for your help pushing this across the finish line! 🏁

* feat: enable metadata sync for virtual tables
* add migration and check for empty schema name
* simplify request
* truncate trailing column attributes for MySQL
* add unit test
* use db_engine_spec func to truncate collation and charset
* Remove redundant migration
* add more tests
* address review comments and apply templating to query
* add todo for refactoring
* remove schema from tests
* check column datatype

superset-github-bot bot added the preset-io label Aug 19, 2020

pull-request-size bot added the size/M label Aug 19, 2020

villebro force-pushed the villebro/virtual-table-metadata branch 2 times, most recently from af7e124 to 4f1d41f Compare August 19, 2020 20:55

pull-request-size bot added size/L and removed size/M labels Aug 19, 2020

villebro force-pushed the villebro/virtual-table-metadata branch 3 times, most recently from 031df5d to 0483a1d Compare August 21, 2020 07:04

villebro commented Aug 21, 2020

villebro changed the title ~~[WIP] feat: enable metadata sync for virtual tables~~ feat: enable metadata sync for virtual tables Aug 21, 2020

villebro requested review from etr2460, bkyryliuk, dpgaspar and john-bodley August 21, 2020 10:16

villebro commented Aug 21, 2020

villebro force-pushed the villebro/virtual-table-metadata branch from d532fdb to 0b5ecd5 Compare August 21, 2020 11:22

mistercrunch reviewed Aug 21, 2020

villebro changed the title ~~feat: enable metadata sync for virtual tables~~ [WIP] feat: enable metadata sync for virtual tables Aug 22, 2020

villebro force-pushed the villebro/virtual-table-metadata branch 3 times, most recently from dda5494 to a8749f0 Compare August 25, 2020 05:05

WenQiangW reviewed Sep 16, 2020

villebro added 8 commits October 21, 2020 10:32

feat: enable metadata sync for virtual tables

142bb8f

add migration and check for empty schema name

e0822b2

simplify request

eec7e59

truncate trailing column attributes for MySQL

b117c75

add unit test

e513305

use db_engine_spec func to truncate collation and charset

3dac92d

Remove redundant migration

6587292

add more tests

9948233

villebro force-pushed the villebro/virtual-table-metadata branch from a8749f0 to 9948233 Compare October 21, 2020 07:33

address review comments and apply templating to query

45d031a

villebro changed the title ~~[WIP] feat: enable metadata sync for virtual tables~~ feat: enable metadata sync for virtual tables Oct 21, 2020

benceorlai mentioned this pull request Oct 21, 2020

Improve the SQL Lab to Explore flow UX apache-superset/superset-roadmap#19

Closed

add todo for refactoring

c0b918b

benceorlai mentioned this pull request Oct 21, 2020

Attempting to edit a table column redirects to another table #10965

Closed

3 tasks

villebro added 2 commits October 21, 2020 11:47

remove schema from tests

5148ded

check column datatype

9cbf857

villebro requested a review from mistercrunch October 21, 2020 10:56

bkyryliuk approved these changes Oct 27, 2020

mistercrunch reviewed Oct 27, 2020

villebro merged commit ecdff72 into apache:master Oct 27, 2020

villebro deleted the villebro/virtual-table-metadata branch October 27, 2020 06:56

villebro mentioned this pull request Nov 17, 2020

fix: do not drop calculated column on metadata sync #11731

Merged

6 tasks

mistercrunch added 🏷️ bot and 🚢 1.0.0 labels Mar 12, 2024
