GlueJobOperator with local script location fails on consecutive runs #38959
Labels
area:providers
kind:bug
needs-triage
Apache Airflow Provider(s)
amazon
Versions of Apache Airflow Providers
apache-airflow-providers-amazon 8.19.0
Apache Airflow version
2.8.3
Operating System
Amazon Linux 2; Kernel Version: 5.10.209-198.812.amzn2.x86_64
Deployment
Official Apache Airflow Helm Chart
Deployment details
We deploy Airflow on EKS using the official Helm chart.
What happened
We are deploying a Glue Job using the GlueJobOperator with the following configuration:
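The task looks roughly like this (job name, bucket, role, region, and paths are placeholders; the relevant parts are the local `script_location` and the `s3_bucket` the operator uploads it to):

```python
from airflow.providers.amazon.aws.operators.glue import GlueJobOperator

prepare_weather_data = GlueJobOperator(
    task_id="prepare_weather_data",
    job_name="weather_data_prepared",
    # Local path inside the Airflow image; the operator uploads it to
    # s3://<s3_bucket>/artifacts/glue-scripts/weather_data_prepared.py
    script_location="/opt/airflow/dags/glue/weather_data_prepared.py",
    s3_bucket="my-glue-artifacts-bucket",
    iam_role_name="GlueJobRole",
    create_job_kwargs={"GlueVersion": "4.0", "WorkerType": "G.1X", "NumberOfWorkers": 2},
    region_name="eu-central-1",
)
```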
This works fine for the first run of our DAG, and the script file is uploaded to
`artifacts/glue-scripts/weather_data_prepared.py`.
However, when we trigger the DAG a second time, it fails because the file already exists on S3. The operator's upload step apparently does not pass `replace=True` to `S3Hook.load_file`, so the existing key causes the task to error out.
What you think should happen instead
We think the file on S3 should be overwritten on subsequent DAG executions, so that consecutive runs of GlueJobOperator tasks using local script locations do not fail.
This would let us keep our script files under version control and deploy them through CI/CD pipelines.
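As a workaround until the operator can overwrite the script, we upload the file ourselves with `S3Hook.load_file(..., replace=True)` and pass an `s3://` URI as `script_location`, which the operator should then use as-is instead of uploading. A minimal sketch, with bucket, paths, role, and connection id as placeholders:

```python
from __future__ import annotations

import pendulum
from airflow.decorators import dag, task
from airflow.providers.amazon.aws.hooks.s3 import S3Hook
from airflow.providers.amazon.aws.operators.glue import GlueJobOperator

# Placeholders; adjust to your environment.
S3_BUCKET = "my-glue-artifacts-bucket"
S3_KEY = "artifacts/glue-scripts/weather_data_prepared.py"
LOCAL_SCRIPT = "/opt/airflow/dags/glue/weather_data_prepared.py"


@dag(schedule=None, start_date=pendulum.datetime(2024, 1, 1), catchup=False)
def weather_data_glue():
    @task
    def upload_glue_script() -> None:
        """Upload the Glue script, overwriting any previous version."""
        S3Hook(aws_conn_id="aws_default").load_file(
            filename=LOCAL_SCRIPT,
            key=S3_KEY,
            bucket_name=S3_BUCKET,
            replace=True,  # overwrite on every run instead of failing
        )

    run_glue_job = GlueJobOperator(
        task_id="run_glue_job",
        job_name="weather_data_prepared",
        # Already on S3, so the operator does not attempt its own upload.
        script_location=f"s3://{S3_BUCKET}/{S3_KEY}",
        iam_role_name="GlueJobRole",
    )

    upload_glue_script() >> run_glue_job


weather_data_glue()
```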
How to reproduce
Define a DAG with a GlueJobOperator whose script_location points to a local file, run it once (the script is uploaded to S3), then trigger the same DAG again: the second run fails during the script upload because the S3 key already exists.
Anything else
No response
Are you willing to submit PR?
Code of Conduct