Feature/python model v1 #209
Conversation
```python
models__simple_python_model_v2 = """
import pandas

def model(dbt):
```
What the model code looks like
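For readers following along: the diff view truncates this fixture. A minimal sketch of how the full model string might read, keeping the `def model(dbt)` signature shown in this version of the PR; the body is illustrative, not the PR's actual code:

```python
models__simple_python_model_v2 = """
import pandas

def model(dbt):
    dbt.config(materialized="table")
    # Illustrative body: return a small DataFrame for the test to materialize
    return pandas.DataFrame({"id": [1, 2, 3]})
"""
```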
```python
spark = SparkSession.builder.appName('smallTest').getOrCreate()

spark.conf.set("viewsEnabled", "true")
spark.conf.set("temporaryGcsBucket", "python-model-test")
```
TODO in the comment above
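For context on the two settings above: `viewsEnabled` lets the spark-bigquery connector read BigQuery views, and `temporaryGcsBucket` names the staging bucket the connector's indirect write path uses. A hedged sketch of the write that depends on them, continuing from the `spark` session above (the target table is hypothetical):

```python
# Continuing from the SparkSession above. With temporaryGcsBucket set, the
# spark-bigquery connector stages the DataFrame in GCS, then loads it into
# the target table.
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
(
    df.write.format("bigquery")
    .option("table", "my_dataset.my_table")  # hypothetical target table
    .mode("overwrite")
    .save()
)
```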
dbt/adapters/bigquery/impl.py
Outdated
```python
from google.cloud import dataproc_v1
from google.cloud import storage
```
- Let's add these as an `extras_require` extra, e.g. `pip install dbt-bigquery[dataproc]`
- Let's raise a clearer error below if a user wants to use Python models and hasn't installed the extra (sketch of both below)
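A sketch of both pieces, assuming an extra named `dataproc` (all names here are illustrative, not the PR's final code). `setup.py` would gain something like `extras_require={"dataproc": ["google-cloud-dataproc", "google-cloud-storage"]}`, and the import in `impl.py` could then fail with a pointed message:

```python
# impl.py (sketch): tolerate the missing extra at import time, then raise
# a clear error only when Python models are actually used.
try:
    from google.cloud import dataproc_v1  # type: ignore
    from google.cloud import storage  # type: ignore
except ImportError:
    dataproc_v1 = None
    storage = None


def _require_dataproc_deps() -> None:
    """Raise a friendly error when the optional Dataproc deps are absent."""
    if dataproc_v1 is None or storage is None:
        raise RuntimeError(
            "Python models on BigQuery require the optional Dataproc "
            "dependencies. Install them with: pip install dbt-bigquery[dataproc]"
        )
```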
dbt/adapters/bigquery/impl.py
Outdated
```python
matches = re.match("gs://(.*?)/(.*)", response.driver_output_resource_uri)
output = (
    storage.Client()
    .get_bucket(matches.group(1))
    .blob(f"{matches.group(2)}.000000000")
    .download_as_string()
)
```
There's no way to pass a full GCS resource URL into `storage.Client()`? That's sorta surprising to me. This isn't the worst regex, so I'm not strictly opposed to taking this approach if it's necessary.
I am not sure! I just followed a tutorial Google provided for this; we can look into it more if needed.
ope ok makes sense! :)
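For what it's worth, `google-cloud-storage` does have a helper that parses `gs://` URIs directly, so the regex could likely be dropped. A minimal sketch, assuming the same `driver_output_resource_uri` and `.000000000` first-chunk suffix as in the diff:

```python
from google.cloud import storage


def download_driver_output(driver_output_uri: str) -> bytes:
    """Fetch the first chunk of Dataproc driver output from GCS.

    Blob.from_string understands gs://bucket/path URIs, replacing the
    manual regex split; ".000000000" names the first output chunk.
    """
    blob = storage.Blob.from_string(
        f"{driver_output_uri}.000000000", client=storage.Client()
    )
    return blob.download_as_string()
```

Usage would be `download_driver_output(response.driver_output_resource_uri)`, matching the call site above.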
dbt/adapters/bigquery/impl.py
Outdated
```python
# Dataproc job output is saved to the Cloud Storage bucket
# allocated to the job. Use regex to obtain the bucket and blob info.
matches = re.match("gs://(.*?)/(.*)", response.driver_output_resource_uri)
output = (
```
I take it that these are Spark logs being streamed back to GCS: https://cloud.google.com/dataproc/docs/guides/dataproc-job-output
If we felt motivated, we could try parsing these logs to infer metadata for the result. Not a priority right now
Yes, I looked into it briefly and didn't find much worth surfacing right now. We can take another look later on.
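If someone does pick this up later, a rough sketch of the shape it could take; the pattern below is purely hypothetical, since nothing in this thread pins down what the driver output actually contains:

```python
import re
from typing import Optional


def rows_from_driver_output(output: str) -> Optional[int]:
    """Scan Dataproc driver output for a row count, if one was logged.

    Hypothetical pattern: the real Spark / connector log lines would need
    to be inspected before committing to anything like this.
    """
    match = re.search(r"(\d+)\s+rows?", output)
    return int(match.group(1)) if match else None
```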
* Use same auth for Dataproc + GCS
* remove fetch result

Co-authored-by: Chenyu Li <[email protected]>

* fix test and add clear install instruction
* rename and fix format
Nothing stands out 👍
I don't see why the `dataproc_region` wasn't found. Once that is working, looks fine.
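In case it helps with debugging, a sketch of a functional-test profile target with the Dataproc fields spelled out; the field names are assumed from this PR's configuration, and all values are placeholders:

```python
import pytest


@pytest.fixture(scope="class")
def dbt_profile_target():
    # Placeholder values throughout; dataproc_region is the field
    # reported as not found above.
    return {
        "type": "bigquery",
        "method": "service-account",
        "keyfile": "/path/to/keyfile.json",
        "project": "my-gcp-project",
        "dataset": "my_test_dataset",
        "gcs_bucket": "python-model-test",
        "dataproc_region": "us-central1",
        "dataproc_cluster_name": "my-dataproc-cluster",
    }
```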
This change currently includes the table materialization.
We're also super happy to hear any feedback, and anything you think we missed.
TODO: