[CT-1340] [Feature] Support Python models (dbt-py) on Redshift/AWS #204

ChenyuLInx · 2022-10-13T00:30:33Z

Is this your first time submitting a feature request?

I have read the expectations for open source contributors
I have searched the existing issues, and I could not find an existing issue for this feature
I am requesting a straightforward extension of existing dbt-redshift functionality, rather than a Big Idea better suited to a discussion

Describe the feature

Background:

There's a Spark Redshift connector. This would allow users to run Python transformation code on an EMR cluster that loads data from Redshift and writes the transformed data back to Redshift. The whole process is very similar to using Dataproc to run Python models on GCP/BigQuery.
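A rough sketch of what that round trip might look like from the cluster side, assuming the community spark-redshift connector and its `url`, `dbtable`, `tempdir` and `aws_iam_role` options. The function names (`redshift_options`, `transform`, `run`) and every value passed in are placeholders for illustration, not anything that exists in dbt-redshift today. The Spark session is passed in, so the sketch itself does not import PySpark.

```python
# Sketch only: the format and option names follow the community
# spark-redshift connector; everything else here is a placeholder.
REDSHIFT_FORMAT = "io.github.spark_redshift_community.spark.redshift"


def redshift_options(jdbc_url, table, tempdir, iam_role):
    """Build the connector options shared by the read and the write."""
    return {
        "url": jdbc_url,
        "dbtable": table,
        "tempdir": tempdir,        # S3 location the connector stages data in
        "aws_iam_role": iam_role,  # role Redshift assumes to reach that S3 path
    }


def transform(df):
    """Placeholder for the user's Python transformation."""
    return df


def run(spark, jdbc_url, source, target, tempdir, iam_role):
    """Load `source` from Redshift, transform it, and write it to `target`."""
    df = (
        spark.read.format(REDSHIFT_FORMAT)
        .options(**redshift_options(jdbc_url, source, tempdir, iam_role))
        .load()
    )
    (
        transform(df)
        .write.format(REDSHIFT_FORMAT)
        .options(**redshift_options(jdbc_url, target, tempdir, iam_role))
        .mode("overwrite")  # assumption: a table-style model replaces its target
        .save()
    )
```

On a real cluster, `spark` would be the active `SparkSession`, and `transform` would be replaced by the body of the user's Python model.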

Items needed for implementation:

If additional profile information is needed for the EMR cluster, we can add it as optional attributes on Credentials (existing example for BigQuery).
We need one macro to generate the final code to run on the EMR cluster. Previous example for dbt-bigquery here.
Now that we have the profile info and the macro to generate the final code, we need submission classes to submit Python code to the cluster. See the existing submission code for dbt-bigquery, the function to define in impl.py (link1, link2), and how those classes are used by dbt-core (this doesn't need to be changed).
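To make the three items concrete, here is a minimal sketch of how they might fit together. Every name in it is hypothetical: `EmrSettings` stands in for the optional profile attributes, `build_final_code` stands in for the macro, and `EmrJobHelper` with its `submit` method stands in for a submission class. The `client` object and its `run` method are also assumptions, used in place of a real AWS SDK call so the example stays self-contained. The `schema` and `alias` keys read from the parsed model are assumed as well.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class EmrSettings:
    """Hypothetical optional profile attributes for the cluster."""
    emr_cluster_id: Optional[str] = None
    emr_s3_staging_dir: Optional[str] = None


def build_final_code(compiled_code: str, target_relation: str) -> str:
    """Stand-in for the macro: append a write-back step to the compiled model."""
    return (
        f"{compiled_code}\n\n"
        "df = model(dbt, spark)\n"
        # write_to_redshift is a hypothetical helper in the generated script
        f"write_to_redshift(df, {target_relation!r})\n"
    )


class EmrJobHelper:
    """Hypothetical submission class.

    `client` is any object with a `run(cluster_id, code)` method; a real
    implementation would wrap whichever AWS API ends up being used.
    """

    def __init__(self, parsed_model: Dict[str, Any], settings: EmrSettings, client: Any):
        self.relation = f"{parsed_model['schema']}.{parsed_model['alias']}"
        self.settings = settings
        self.client = client

    def submit(self, compiled_code: str) -> Any:
        """Generate the final script and hand it to the cluster client."""
        if self.settings.emr_cluster_id is None:
            raise ValueError("emr_cluster_id is required to run Python models")
        final_code = build_final_code(compiled_code, self.relation)
        return self.client.run(self.settings.emr_cluster_id, final_code)
```

Keeping the cluster client injectable means the code-generation and validation logic can be tested without touching AWS.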

Describe alternatives you've considered

No response

Who will this benefit?

No response

Are you interested in contributing this feature?

No response

Anything else?

No response

lostmygithubaccount · 2022-11-01T17:12:33Z

some relevant community discussion:

per @colin-rogers-dbt, it may be easier to run on Glue than EMR. I personally have no preference -- whatever is easier for users to set up and faster to run on

colin-rogers-dbt · 2022-11-01T18:06:15Z

I wonder if the right thing to do here is just pick one to implement, with a long-term goal of supporting both. Can we leverage the existing spark/glue adapters?

lostmygithubaccount · 2022-11-30T17:27:00Z

new redshift integration with Apache Spark announced: https://aws.amazon.com/blogs/aws/new-amazon-redshift-integration-with-apache-spark/

saraleon1 · 2022-12-01T12:17:07Z

+1 dbt Cloud Enterprise Customer - This team is using Redshift and is really interested in leveraging Python models in their dbt project. Their AWS contact passed this resource along as a possible way to run Python + Redshift.

colin-rogers-dbt · 2022-12-01T18:42:52Z

@lostmygithubaccount thanks for sharing, we should definitely look into leveraging this!

This is particularly interesting as it means we don't need to recreate the issues with the dbt-bigquery adapter where we shoehorned dataproc in. Going forward we should adhere to the rough principle that an adapter should only know how to leverage the capabilities of a single data transformation tool. If we want to support emr/glue we should focus on multi-adapter/project so folk can use the existing adapters for those tools.

lostmygithubaccount · 2022-12-01T18:49:40Z

@saraleon1 great to hear! on the Python connector there, we should use that in dbt-redshift in general for making the connection in the way we use the similar Snowflake connector. however, I don't think that'll get us Python models -- or at least not the ones we want. I see it can read into numpy or pandas locally, but we want to maintain the principle of executing Python code remotely (in the "warehouse")

@colin-rogers-dbt very much agreed! as fyi BigQuery was working on more native integration (forwarded you an email)

cc: @jtcohen6

ryadav03 · 2023-01-13T08:13:39Z

Any ETA on this?

github-actions · 2023-07-13T02:11:29Z

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

zdravis · 2023-08-07T14:54:09Z

Hi, is this feature going to be implemented? @colin-rogers-dbt

viniciusnunest · 2023-08-11T18:03:43Z

+1 Great idea!

dnascimento · 2023-09-04T04:41:01Z

:+1: dbt-fal supports this, but not with incremental

joshua-pgatour · 2023-09-12T18:50:55Z

Would love this feature please.

ipcleary · 2023-09-22T14:00:50Z

+1

cschouten · 2023-09-25T16:00:44Z

+1

rohaldb · 2023-10-28T01:50:50Z

+1

rohaldb · 2023-10-28T01:56:33Z

I would really like to move my pipelines into dbt but don't want to lock myself into having to use SQL for transformations. So unfortunately, until this is done, I'm going to have to stick to Glue :/

erees-embarkvet · 2023-12-08T20:35:24Z

+1

qoqajr · 2024-01-11T13:45:35Z

+1

SkinnyPigeon · 2024-01-18T12:52:32Z

We would love this feature

marzaccaro · 2024-05-09T15:00:06Z

Is this going to be implemented?

MICHAELFOLA · 2024-05-15T21:14:39Z

This will be a great addition. I was looking to use this today, but unfortunately I will have to use SQL again.

spagnoloe-amenitiz · 2024-08-21T08:51:37Z

+1, Redshift is one of the main DWHs in the market and this feature is a great tool for Data Science/Analytics. It is a pity not to have it there...

ChenyuLInx added the enhancement (New feature or request) and triage labels Oct 13, 2022

github-actions bot changed the title ~~[Feature] Python model~~ [CT-1340] [Feature] Python model Oct 13, 2022

ChenyuLInx added the help_wanted (Extra attention is needed) label and removed the triage label Oct 13, 2022

lostmygithubaccount added the python_models label Oct 13, 2022

lostmygithubaccount changed the title ~~[CT-1340] [Feature] Python model~~ [CT-1340] [Feature] Support Python models (dbt-py) on Redshift/AWS Nov 1, 2022

This was referenced Nov 14, 2022

[CT-1504] [Feature] Replace psycopg2 with redshift_connector #219

Closed

[CT-1185] [Feature] [Spike] Support dbt Python models on Redshift/AWS #188

Closed

dbeatty10 mentioned this issue Dec 19, 2022

[CT-1679] Python Materialization dbt-labs/dbt-core#6459

Closed

2 tasks

dbeatty10 mentioned this issue Mar 20, 2023

[ADAP-381] Enable tagging of query_group in configs #376

Open

3 tasks

trouze mentioned this issue May 15, 2023

ADAP-381: adds query tagging support and materializations to dbt-redshift #379

Closed

6 tasks

github-actions bot added the Stale label Jul 13, 2023

github-actions bot closed this as completed Jul 21, 2023

jessedobbelaere mentioned this issue Aug 12, 2024

feat: Implement Python Models Using EMR Serverless dbt-labs/dbt-athena#700

Closed

4 tasks
