[CT-1340] [Feature] Support Python models (dbt-py) on Redshift/AWS #204
Comments
some relevant community discussion:
per @colin-rogers-dbt, it may be easier to run on Glue than EMR. I personally have no preference -- whatever is easier for users to set up and faster to run on
I wonder if the right thing to do here is just pick one to implement, with a long-term goal of supporting both. Can we leverage the existing Spark/Glue adapters?
new Redshift integration with Apache Spark announced: https://aws.amazon.com/blogs/aws/new-amazon-redshift-integration-with-apache-spark/
+1 dbt Cloud Enterprise Customer - This team is using Redshift and is really interested in leveraging Python models in their dbt project. Their AWS contact passed this resource along as a possible way to run Python + Redshift.
@lostmygithubaccount thanks for sharing, we should definitely look into leveraging this! This is particularly interesting as it means we don't need to recreate the issues with the dbt-bigquery adapter, where we shoehorned Dataproc in. Going forward we should adhere to the rough principle that an adapter should only know how to leverage the capabilities of a single data transformation tool. If we want to support EMR/Glue, we should focus on multi-adapter/project so folks can use the existing adapters for those tools.
@saraleon1 great to hear! on the Python connector there, we should use that in
@colin-rogers-dbt very much agreed! as fyi BigQuery was working on more native integration (forwarded you an email) cc: @jtcohen6
Any ETA on this?
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.
Hi, is this feature going to be implemented? @colin-rogers-dbt
+1 Great idea!
+1, dbt-fal supports this, but not with incremental models
Would love this feature please.
+1
+1
+1
I would really like to move my pipelines into dbt but don't want to lock myself into having to use SQL for transformations. So unfortunately, until this is done, I'm going to have to stick to Glue :/
+1
We would love this feature
Is this going to be implemented?
This will be a great addition. I was looking to use this today but unfortunately I will have to use SQL again.
+1, Redshift is one of the main DWHs in the market and this feature is a great tool for Data Science/Analytics. It is a pity not to have it there...
Is this your first time submitting a feature request?
Describe the feature
Background:
There's a Spark Redshift connector. This would allow users to run Python transformation code on an EMR cluster that loads data from Redshift and writes the transformed data back to Redshift. The whole process is very similar to using Dataproc to run Python models on GCP/BigQuery.
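For illustration, the load / transform / write-back flow described above could look roughly like the sketch below. It assumes the community Spark Redshift connector, with its `io.github.spark_redshift_community.spark.redshift` format name and its `url`, `dbtable`, `tempdir`, and `aws_iam_role` options. The helper names (`redshift_options`, `read_table`, `write_table`, `run_model`) are hypothetical and not part of dbt or any adapter. `spark` is expected to be an existing SparkSession on the cluster.

```python
# Sketch only: the helper names are hypothetical. The format string and the
# option keys are assumed to match the community Spark Redshift connector.
REDSHIFT_FORMAT = "io.github.spark_redshift_community.spark.redshift"


def redshift_options(jdbc_url, table, tempdir, iam_role):
    """Build the connector options for a single Redshift table."""
    return {
        "url": jdbc_url,           # JDBC URL of the Redshift cluster
        "dbtable": table,          # table to read from or write to
        "tempdir": tempdir,        # S3 location used for staging data
        "aws_iam_role": iam_role,  # role Redshift assumes to access S3
    }


def read_table(spark, opts):
    """Load a Redshift table into a Spark DataFrame."""
    return spark.read.format(REDSHIFT_FORMAT).options(**opts).load()


def write_table(df, opts, mode="overwrite"):
    """Write a Spark DataFrame back to Redshift."""
    df.write.format(REDSHIFT_FORMAT).options(**opts).mode(mode).save()


def run_model(spark, transform, source_opts, target_opts):
    """Read the source table, apply the user's transform, write the result."""
    df = read_table(spark, source_opts)
    write_table(transform(df), target_opts)
```

In this shape, `transform` plays the role of the user's Python model body, and everything around it is what an adapter (or a submitted EMR job) would have to supply.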
Items needed for implementation:
Describe alternatives you've considered
No response
Who will this benefit?
No response
Are you interested in contributing this feature?
No response
Anything else?
No response