Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

utilize map_index for deterministic generation of OpenLineage's run_id #43936

Merged
merged 1 commit into from
Nov 25, 2024

Conversation

mobuchowski
Copy link
Contributor

Use queued_dttm to differentiate between sensors with mode=reschedule runs.
Use map_index to differentiate between mapped tasks runs.

@mobuchowski mobuchowski added the full tests needed We need to run full set of tests for this PR to merge label Nov 15, 2024
Copy link
Contributor

@kacpermuda kacpermuda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, let's run all the compat tests to make sure it works for past Airflow versions as well.

@mobuchowski mobuchowski force-pushed the more-unique-data-for-ol-runids branch 3 times, most recently from ec9f51c to 3aa017d Compare November 18, 2024 13:00
@mobuchowski mobuchowski force-pushed the more-unique-data-for-ol-runids branch 3 times, most recently from 482115b to 9cb56cc Compare November 19, 2024 19:14
Copy link
Contributor

@dstandish dstandish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you might look into whether you can use TaskReschedule to know whether you are in a reschedule. if so, could limit unnec queries by checking whether you inherit from basesensor and have mode==reschedule. and then, on the exit side, you could check state == up-for-reschedule.
although, this could be an issue with deferrables as well. with deferrables though you could check whether next_method / next kwargs populated to infer coming out of deferral. i think we do that somewhere and selectively emit a log message. then on exit, you could look at state deferred perhaps.
but fundamentally, you are OL master so, i defer to you

@mobuchowski mobuchowski force-pushed the more-unique-data-for-ol-runids branch 2 times, most recently from eafa59b to 8cf16c8 Compare November 25, 2024 10:57
@mobuchowski
Copy link
Contributor Author

@dstandish at the end just

        session.query(
            exists().where(
                TaskReschedule.dag_id == ti.dag_id,
                TaskReschedule.task_id == ti.task_id,
                TaskReschedule.run_id == ti.run_id,
                TaskReschedule.map_index == ti.map_index,
                TaskReschedule.try_number == ti.try_number,
            )
        ).scalar()
        is True

works - the additional query isn't a big problem because I can perform those just for reschedulable sensors.

@mobuchowski mobuchowski force-pushed the more-unique-data-for-ol-runids branch from 8cf16c8 to 7e32fe5 Compare November 25, 2024 12:00
@mobuchowski mobuchowski merged commit 05f935d into apache:main Nov 25, 2024
89 of 91 checks passed
@mobuchowski mobuchowski changed the title utilize more information for deterministic generation of OpenLineage's run_id utilize map_index for deterministic generation of OpenLineage's run_id Nov 25, 2024
@dstandish
Copy link
Contributor

@dstandish at the end just

        session.query(
            exists().where(
                TaskReschedule.dag_id == ti.dag_id,
                TaskReschedule.task_id == ti.task_id,
                TaskReschedule.run_id == ti.run_id,
                TaskReschedule.map_index == ti.map_index,
                TaskReschedule.try_number == ti.try_number,
            )
        ).scalar()
        is True

works - the additional query isn't a big problem because I can perform those just for reschedulable sensors.

cool @mobuchowski

@potiuk
Copy link
Member

potiuk commented Nov 26, 2024

Is not that something that should be back-ported to 2.10.4 ? It certainly looks like

@potiuk potiuk added this to the Airflow 2.10.4 milestone Nov 26, 2024
@potiuk
Copy link
Member

potiuk commented Nov 26, 2024

I provisionally added 2.10.4 milestone now but @mobuchowski -> maybe you can use the new cherry-picker manual flow to back-port it (it's already merged, so we missed the opportunity to auto cherry-pick it) https://github.com/apache/airflow/blob/main/dev/README_AIRFLOW3_DEV.md#how-to-backport-pr-with-cherry-picker-cli

@potiuk
Copy link
Member

potiuk commented Nov 26, 2024

Ah. STupid me. It's provider-only :). Forget it.

LefterisXefteris pushed a commit to LefterisXefteris/airflow that referenced this pull request Jan 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers full tests needed We need to run full set of tests for this PR to merge provider:dbt-cloud provider:openlineage AIP-53
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants