feat: welcome presto to the suite of tested databases #10498

bkyryliuk · 2020-08-01T19:46:50Z

Goal: add Presto to the CI.
Non-goal: cleaning up inconsistencies in how different databases are handled.

Follow-up: more PRs will clean up the inconsistent handling of the Presto db, move load_examples into a pytest fixture, etc.

SUMMARY

Introduces a test suite with Presto and the memory connector.
Splits the main db from the examples db in the tests.

Based on: #10487
Test-only change.

TEST PLAN

CI

willbarrett · 2020-08-04T20:47:35Z

requirements-dev.txt

-pyhive==0.6.2
+# Enable in-place multirow inserts
+# TODO(bkyryliuk): release new version of pyhive
+git+https://github.com/dropbox/PyHive@master


I think this will be a blocker - the community has not been accepting of installing dependencies from GitHub.

willbarrett · 2020-08-04T20:48:24Z

superset/examples/birth_names.py

@@ -54,19 +54,26 @@ def gen_filter(

 def load_data(tbl_name: str, database: Database, sample: bool = False) -> None:
    pdf = pd.read_json(get_example_data("birth_names.json.gz"))
-    pdf.ds = pd.to_datetime(pdf.ds, unit="ms")
+    if database.backend != "presto":


Nit: switching the order of the conditions might make this more readable.

willbarrett · 2020-08-04T20:49:43Z

superset/utils/core.py

@@ -1022,6 +1022,13 @@ def get_example_database() -> "Database":
    return get_or_create_db("examples", db_uri)


+def get_main_database() -> "Database":


We usually refer to this as the "metadata" database - should that be in the function name?

bkyryliuk · 2020-08-05

It is called the main database in Superset on the dev installation. I am fine with renaming it; however, we should probably do that across the board in a separate PR.

willbarrett · 2020-08-05

Ah, forgive the confusion on my part. This is fine as-is then.

villebro

I think we should aim to put as much db-specific logic in db_engine_specs, especially in the examples loading logic. We need to add similar logic for mutating column names, e.g. BigQuery doesn't support columns starting with a number (there are other similar problems for other dbs).

WRT the test assertions, I think those are fine left as-is. Also curious about that extra cache key test failing, that seems like a potential security problem, so I'm happy to help track down what's causing that.

villebro · 2020-08-06T17:03:40Z

superset/examples/birth_names.py

+    if database.backend == "presto":
+        pdf.ds = pd.to_datetime(pdf.ds, unit="ms")
+        pdf.ds = pdf.ds.dt.strftime("%Y-%m-%d %H:%M%:%S")
+    else:
+        pdf.ds = pd.to_datetime(pdf.ds, unit="ms")


This logic should ideally be moved out to db_engine_specs/presto.py to keep this clean of db-specific logic. Something along the lines of BaseEngineSpec.convert_examples_datetime() or similar (there are other similar methods there).

bkyryliuk · 2020-08-06

My plan here is a bit different. We have talked about getting rid of load_examples for the tests; I plan to move the initialization code into a pytest fixture, and will do that cleanup and restructure the code to be more database-generic at that point.

The scope of these changes is test-only; I prefer not to modify production code.
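
The engine-spec approach suggested in this thread could look roughly like the sketch below. It is illustrative only: `ExamplesSpecSketch` and `PrestoExamplesSpecSketch` are hypothetical stand-ins rather than Superset's actual `db_engine_specs` classes, `convert_examples_datetime` is the name floated in the review and not an existing method, and plain `datetime` values stand in for the pandas column. The sketch formats with `"%Y-%m-%d %H:%M:%S"`; the format string in the diff above appears to carry a stray `%` before the colon.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


class ExamplesSpecSketch:
    """Hypothetical default: keep example timestamps as datetime objects."""

    @classmethod
    def convert_examples_datetime(cls, epoch_ms: int):
        # Epoch milliseconds -> naive UTC datetime.
        return EPOCH + timedelta(milliseconds=epoch_ms)


class PrestoExamplesSpecSketch(ExamplesSpecSketch):
    """Hypothetical Presto override: store timestamps as formatted strings."""

    @classmethod
    def convert_examples_datetime(cls, epoch_ms: int):
        value = super().convert_examples_datetime(epoch_ms)
        return value.strftime("%Y-%m-%d %H:%M:%S")


# Assumed lookup keyed on database.backend; unknown backends use the default.
SPECS = {"presto": PrestoExamplesSpecSketch}


def spec_for(backend: str):
    return SPECS.get(backend, ExamplesSpecSketch)
```

With something along these lines, the example loader would call `spec_for(database.backend).convert_examples_datetime(...)` and would not need its own `database.backend == "presto"` branch.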

villebro · 2020-08-06T17:04:37Z

superset/examples/birth_names.py

+            # TODO(bkyryliuk): use TIMESTAMP type for presto
+            "ds": DateTime if database.backend != "presto" else String(255),


Same here, e.g. get_examples_datetime_type(). There are a few places below to which I feel the same applies.

villebro · 2020-08-06T17:16:31Z

tests/celery_tests.py

+        expected_result = []
+        if backend == "presto":
+            expected_result = (
+                [{"rows": 1}] if ctas_method == CtasMethod.TABLE else [{"result": True}]
+            )
+        self.assertEqual(expected_result, result["data"])
+        expected_columns = []
+        if backend == "presto":
+            expected_columns = [
+                {
+                    "name": "rows" if ctas_method == CtasMethod.TABLE else "result",
+                    "type": "BIGINT" if ctas_method == CtasMethod.TABLE else "BOOLEAN",
+                    "is_date": False,
+                }
+            ]


This also runs the risk of becoming convoluted if we introduce db-specific logic for all supported dbs. As this logic is repeated below, I think there is merit to centralizing this somewhere, too, but db_engine_specs probably isn't the right place for test assertion related stuff. I'm open to suggestions, but feel we can probably leave this as-is until we come up with a more scalable solution.

bkyryliuk · 2020-08-06

Agreed, I will tackle it in upcoming PRs; I do not have a clear solution yet.
Most likely it would be possible to introduce a TestDBSpec and hide that complexity there.
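
One possible shape for the TestDBSpec idea, as a minimal sketch: the PR does not define this class, so `DefaultTestSpec`, `PrestoTestSpec`, and `get_test_spec` are hypothetical names, and the small `CtasMethod` enum is a local stand-in for the one the tests import. The expected values are copied from the diff above.

```python
from enum import Enum


class CtasMethod(Enum):
    """Local stand-in for the CtasMethod enum used by the tests."""

    TABLE = "TABLE"
    VIEW = "VIEW"


class DefaultTestSpec:
    """Hypothetical default expectations: CTAS returns no rows or columns."""

    @staticmethod
    def ctas_expected_data(method: CtasMethod) -> list:
        return []

    @staticmethod
    def ctas_expected_columns(method: CtasMethod) -> list:
        return []


class PrestoTestSpec(DefaultTestSpec):
    """Hypothetical Presto expectations, taken from the assertions in the diff."""

    @staticmethod
    def ctas_expected_data(method: CtasMethod) -> list:
        return [{"rows": 1}] if method == CtasMethod.TABLE else [{"result": True}]

    @staticmethod
    def ctas_expected_columns(method: CtasMethod) -> list:
        is_table = method == CtasMethod.TABLE
        return [
            {
                "name": "rows" if is_table else "result",
                "type": "BIGINT" if is_table else "BOOLEAN",
                "is_date": False,
            }
        ]


# Backends without an entry fall back to the default expectations.
TEST_SPECS = {"presto": PrestoTestSpec}


def get_test_spec(backend: str):
    return TEST_SPECS.get(backend, DefaultTestSpec)
```

A test body could then assert against `get_test_spec(backend).ctas_expected_data(ctas_method)` and keep the per-backend branching in one place.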

villebro · 2020-08-06T17:21:12Z

tests/database_api_tests.py

@@ -222,7 +223,7 @@ def test_get_select_star_not_found_table(self):
            return
        uri = f"api/v1/database/{example_db.id}/select_star/table_does_not_exist/"
        rv = self.client.get(uri)
-        self.assertEqual(rv.status_code, 404)
+        self.assertEqual(rv.status_code, 404 if example_db.backend != "presto" else 500)


I'd be interested in understanding why this is returning a 500. Perhaps add a TODO here so we can follow up on it.

bkyryliuk · 2020-08-06

PyHive raises the exception, and it is probably not caught. Left a TODO.
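
For the follow-up, one way to get a 404 regardless of driver is to check that the table exists before any driver call can raise. The sketch below is an assumption about a possible fix, not what Superset does: the thread does not identify the exact exception or code path, so `DriverError`, `select_star_status`, and the two callables are all hypothetical.

```python
class DriverError(Exception):
    """Hypothetical stand-in for an error raised inside a database driver."""


def select_star_status(table_name, list_tables, build_select_star):
    """Return an HTTP-like status code for a select_star request.

    list_tables and build_select_star are assumed callables supplied by
    the caller; they are not Superset APIs.
    """
    # Check existence up front so a missing table never reaches the driver call.
    if table_name not in list_tables():
        return 404
    try:
        build_select_star(table_name)
    except DriverError:
        # Whatever the driver still raises is treated as a server-side failure.
        return 500
    return 200
```

The design choice here is to avoid parsing driver error messages, which differ between backends, at the cost of one extra metadata lookup per request.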

villebro · 2020-08-06T17:22:26Z

tests/sqla_models_tests.py

+        # TODO: make it work with presto
+        if get_example_database().backend == "presto":
+            assert extra_cache_keys == []
+        else:
+            assert extra_cache_keys == ["abc"]


Hmm, I wonder why this is failing for Presto. This really shouldn't have anything to do with the underlying analytical db type.

bkyryliuk · 2020-08-06

Agreed, I did not investigate. Worth looking into.

villebro · 2020-08-06T17:22:36Z

tests/sqla_models_tests.py

+        # TODO: make it work with presto
+        if get_example_database().backend == "presto":
+            assert extra_cache_keys == []
+        else:
+            assert extra_cache_keys == ["abc"]


Same here.

bkyryliuk · 2020-08-06T17:52:58Z

> I think we should aim to put as much db-specific logic in db_engine_specs, especially in the examples loading logic. We need to add similar logic for mutating column names, e.g. BigQuery doesn't support columns starting with a number (there are other similar problems for other dbs).
>
> WRT the test assertions, I think those are fine left as-is. Also curious about that extra cache key test failing, that seems like a potential security problem, so I'm happy to help track down what's causing that.

I agree with all the suggestions. My only comment is that this PR is already fairly large, so I am happy to tackle them in follow-up PRs.

villebro

Sounds good, and I agree it makes sense to break it up into several smaller PRs rather than aim for a solve-it-all mega PR. LGTM.

* Add presto to the CI
  Sample test data Datetime conversion Sample test data Fix tests
* TODO to switch to timestamps
* Address feedback
* Update requirements
* Add TODOs

Co-authored-by: bogdan kyryliuk

pull-request-size bot added the size/XL label Aug 1, 2020

bkyryliuk marked this pull request as draft August 1, 2020 19:46

bkyryliuk force-pushed the bogdan/add_presto_to_tests branch 6 times, most recently from 9172b7e to e0841a1 Compare August 3, 2020 18:22

bkyryliuk marked this pull request as ready for review August 3, 2020 18:44

bkyryliuk requested review from ktmud, villebro, john-bodley and etr2460 August 3, 2020 18:44

bkyryliuk force-pushed the bogdan/add_presto_to_tests branch 2 times, most recently from 1c17b53 to e0841a1 Compare August 4, 2020 16:59

willbarrett reviewed Aug 4, 2020

bogdan-dbx added 2 commits August 5, 2020 11:07

Add presto to the CI

131309d

Sample test data Datetime conversion Sample test data Fix tests

TODO to switch to timestamps

1c86d56

bkyryliuk force-pushed the bogdan/add_presto_to_tests branch 2 times, most recently from 695c296 to e90deaa Compare August 5, 2020 18:15

Address feedback

2e1c8a8

bkyryliuk force-pushed the bogdan/add_presto_to_tests branch 2 times, most recently from cc2b2e0 to d45cbf9 Compare August 5, 2020 21:58

Update requirements

8fee355

bkyryliuk force-pushed the bogdan/add_presto_to_tests branch from d45cbf9 to 8fee355 Compare August 5, 2020 22:20

villebro reviewed Aug 6, 2020

Add TODOs

9f1efa6

villebro approved these changes Aug 6, 2020

bkyryliuk merged commit 62b873e into apache:master Aug 6, 2020

mistercrunch added the 🏷️ bot and 🚢 0.38.0 labels Mar 12, 2024
