Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1446 dag run failed 979s #1464

Merged
merged 5 commits into from
Dec 2, 2024
Merged

1446 dag run failed 979s #1464

merged 5 commits into from
Dec 2, 2024

Conversation

jgreben
Copy link
Contributor

@jgreben jgreben commented Nov 27, 2024

Fixes #1446

It is not possible to simply update the state of a dag to "QUEUED" (or anything else) and have it run again (see apache/airflow#35729 (comment)). This PR makes it so that a new DAG is launched with the previous DAG's conf. Also, the DAG that finds the previously failed DAGS will only look for ones that are 30 days old, so that it will not keep trying to re-rerun the subsequently failed ones. I made that number 30 days because I vaguely remember that was the period we thought we would run this new DAG. If you want, I can make that number into a Variable or change it for this PR.

Using the airflow.models.DagRun clear_dags method now.

@jgreben jgreben marked this pull request as draft November 27, 2024 01:05
@shelleydoljack
Copy link
Contributor

It's kind of confusing to separate out commits that were previously merged but I managed. I like the idea of making the 30 days a variable. Especially since we are almost at that point from when we first ran the digital bookplate dags. We'd want to retry the failed ones monthly, so 30 days from the original run seems okay. The MARC records that we can add a 979 to may not appear for a whole year so I guess the same one would get rerun every 30 days, the only difference being the execution_date.

Does it not work to clear the state (rather than setting it to queued) and just let the scheduler queue them?

@jgreben jgreben force-pushed the 1446-dag-run-failed-979s branch from f0a618a to a4cb4e1 Compare November 27, 2024 01:54
@jgreben
Copy link
Contributor Author

jgreben commented Nov 27, 2024

The draft PR was not actually ready for a review, but it is now. I didn't see a way using the DagRunState model to remove or delete a state. I will add the Variable for the start_date.

@jgreben jgreben marked this pull request as ready for review November 27, 2024 01:59
@jgreben jgreben marked this pull request as draft November 27, 2024 02:01
@jgreben
Copy link
Contributor Author

jgreben commented Nov 27, 2024

https://sul-libsys-airflow-dev.stanford.edu/api/v1/dags/{dag_id}/dagRuns/{dag_run_id}/clear does work if we want to create a https function instead of using the DagRun class. I also found a clear method in the Dag model https://airflow.apache.org/docs/apache-airflow/stable/_api/airflow/models/dag/index.html#airflow.models.dag.DAG.clear that I can try.

@shelleydoljack
Copy link
Contributor

https://sul-libsys-airflow-dev.stanford.edu/api/v1/dags/{dag_id}/dagRuns/{dag_run_id}/clear does work if we want to create a https function instead of using the DagRun class. I also found a clear method in the Dag model https://airflow.apache.org/docs/apache-airflow/stable/_api/airflow/models/dag/index.html#airflow.models.dag.DAG.clear that I can try.

I thought you could use the DagRun.set_state() to None but maybe not?

@jgreben
Copy link
Contributor Author

jgreben commented Nov 27, 2024

https://sul-libsys-airflow-dev.stanford.edu/api/v1/dags/{dag_id}/dagRuns/{dag_run_id}/clear does work if we want to create a https function instead of using the DagRun class. I also found a clear method in the Dag model https://airflow.apache.org/docs/apache-airflow/stable/_api/airflow/models/dag/index.html#airflow.models.dag.DAG.clear that I can try.

I thought you could use the DagRun.set_state() to None but maybe not?

This works:
dag.clear_dags( dags=[dag], only_failed=True, dry_run=False, )

@jgreben jgreben marked this pull request as ready for review November 27, 2024 22:21
Copy link
Contributor

@shelleydoljack shelleydoljack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I'm not sure about the execution_date for dag runs more than 30 days ago.

@jgreben jgreben merged commit 609fbdb into main Dec 2, 2024
4 checks passed
@jgreben jgreben deleted the 1446-dag-run-failed-979s branch December 2, 2024 22:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Trigger polling dag for re-running failed digital_bookplate_979 dag runs
3 participants