OpenLineage: Add dag_id when generating run_id for task instance. #36659

Merged
merged 1 commit into from
Jan 9, 2024

Conversation

kacpermuda
Contributor

@kacpermuda kacpermuda commented Jan 8, 2024

We might get the same run_id for corresponding tasks even though they belong to different DAGs. This happens when two DAGs are nearly identical (same task ids, same schedules).

For example, here the same DAG is defined twice, with only the dag_id changed in the second copy. When both DAGs are allowed to backfill, we receive events from both of them, but the run_id in the task events is identical in both cases.

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.utils.dates import days_ago

with DAG(
    dag_id='test',
    start_date=days_ago(2),
    schedule_interval="@daily",
    catchup=True
) as dag1:
    task1 = BashOperator(
        task_id='run',
        bash_command="echo 'test'; "
    )


with DAG(
    dag_id='test_1',
    start_date=days_ago(2),
    schedule_interval="@daily",
    catchup=True
) as dag2:
    task2 = BashOperator(
        task_id='run',
        bash_command="echo 'test'; "
    )

To fix this, I added dag_id to the function that generates the run_id for a task instance. I also added tests for the macros and adjusted the existing tests.
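The collision and the fix can be illustrated with a minimal sketch. It assumes the run_id is a deterministic UUID derived from a string of identifying fields; the function names, the namespace string, and the exact field layout below are hypothetical and are not taken from the provider's code.

```python
import uuid

# Hypothetical namespace prefix; the value used by the provider may differ.
NAMESPACE = "example.airflow.namespace"


def task_run_id_without_dag(task_id: str, execution_date: str, try_number: int) -> str:
    """Derive a deterministic id from task-level fields only (collision-prone)."""
    name = f"{NAMESPACE}.{task_id}.{execution_date}.{try_number}"
    return str(uuid.uuid3(uuid.NAMESPACE_URL, name))


def task_run_id_with_dag(dag_id: str, task_id: str, execution_date: str, try_number: int) -> str:
    """Derive a deterministic id that also depends on the dag_id."""
    name = f"{NAMESPACE}.{dag_id}.{task_id}.{execution_date}.{try_number}"
    return str(uuid.uuid3(uuid.NAMESPACE_URL, name))


execution_date = "2024-01-07T00:00:00+00:00"

# Both DAGs have a task called 'run' on the same schedule, so the ids collide.
old_a = task_run_id_without_dag("run", execution_date, 1)
old_b = task_run_id_without_dag("run", execution_date, 1)

# Including the dag_id makes the ids distinct while keeping them reproducible.
new_a = task_run_id_with_dag("test", "run", execution_date, 1)
new_b = task_run_id_with_dag("test_1", "run", execution_date, 1)

print(old_a == old_b)
print(new_a == new_b)
```

Because the id is still a pure function of its inputs, emitting the start and complete events for the same task instance yields the same run_id, which is the property a deterministic scheme is meant to preserve.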



@kacpermuda kacpermuda marked this pull request as draft January 8, 2024 13:48
@kacpermuda kacpermuda force-pushed the ol/fix/task_run_id branch 3 times, most recently from 5ac1fd6 to 12463d3 Compare January 8, 2024 17:43
@kacpermuda kacpermuda marked this pull request as ready for review January 8, 2024 18:02
@kacpermuda kacpermuda requested a review from josh-fell as a code owner January 8, 2024 18:02
@eladkal eladkal requested a review from mobuchowski January 9, 2024 06:32
@mobuchowski mobuchowski merged commit 95a8310 into apache:main Jan 9, 2024
53 checks passed
@kacpermuda kacpermuda deleted the ol/fix/task_run_id branch January 9, 2024 12:23