From f73149a05972ca67fe6fe9d11f70b2b13dd3af37 Mon Sep 17 00:00:00 2001
From: Ellie O'Neil <110510035+eboneil@users.noreply.github.com>
Date: Wed, 31 Jul 2024 04:31:09 -0700
Subject: [PATCH] docs(airflow): example query to get datajobs for a dataflow (#11034)

---
 docs/api/graphql/getting-started.md |  1 +
 docs/lineage/airflow.md             | 28 ++++++++++++++++++++++++++++
 2 files changed, 29 insertions(+)

diff --git a/docs/api/graphql/getting-started.md b/docs/api/graphql/getting-started.md
index 98aeca196600d..dfa556051bd4d 100644
--- a/docs/api/graphql/getting-started.md
+++ b/docs/api/graphql/getting-started.md
@@ -27,6 +27,7 @@ For more information on, please refer to the following links."
 - [Querying for Domain of a Dataset](/docs/api/tutorials/domains.md#read-domains)
 - [Querying for Glossary Terms of a Dataset](/docs/api/tutorials/terms.md#read-terms)
 - [Querying for Deprecation of a dataset](/docs/api/tutorials/deprecation.md#read-deprecation)
+- [Querying for all DataJobs that belong to a DataFlow](/docs/lineage/airflow.md#get-all-datajobs-associated-with-a-dataflow)
 
 ### Search
 
diff --git a/docs/lineage/airflow.md b/docs/lineage/airflow.md
index 8680e36e2baf3..9d838ef8a4404 100644
--- a/docs/lineage/airflow.md
+++ b/docs/lineage/airflow.md
@@ -266,6 +266,34 @@ with DAG(
 
 - ingest this DAG, and it will remove all the obsolete pipelines and tasks from the Datahub based on the `cluster` value set in the `airflow.cfg`
 
+## Get all dataJobs associated with a dataFlow
+
+If you are looking to find all tasks (aka DataJobs) that belong to a specific pipeline (aka DataFlow), you can use the following GraphQL query:
+
+```graphql
+query {
+  dataFlow(urn: "urn:li:dataFlow:(airflow,db_etl,prod)") {
+    childJobs: relationships(
+      input: {
+        types: ["IsPartOf"],
+        direction: INCOMING,
+        start: 0,
+        count: 100
+      }
+    ) {
+      total
+      relationships {
+        entity {
+          ... on DataJob {
+            urn
+          }
+        }
+      }
+    }
+  }
+}
+```
+
 ## Emit Lineage Directly
 
 If you can't use the plugin or annotate inlets/outlets, you can also emit lineage using the `DatahubEmitterOperator`.
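
As a follow-up to the query added in this patch, here is a minimal Python sketch of how it might be sent to a DataHub GraphQL endpoint and how the returned DataJob URNs could be collected. The endpoint URL `http://localhost:8080/api/graphql`, the bearer-token header, and the helper names `build_payload`, `extract_job_urns`, and `fetch_child_jobs` are assumptions for illustration and are not part of the patch; adjust them to your deployment.

```python
import json
import urllib.request

# Same query shape as in the patch; the URN and page window are substituted in.
QUERY_TEMPLATE = """
query {
  dataFlow(urn: __URN__) {
    childJobs: relationships(
      input: {
        types: ["IsPartOf"],
        direction: INCOMING,
        start: __START__,
        count: __COUNT__
      }
    ) {
      total
      relationships {
        entity {
          ... on DataJob {
            urn
          }
        }
      }
    }
  }
}
"""


def build_payload(flow_urn, start=0, count=100):
    """Return the JSON body for a GraphQL POST (hypothetical helper)."""
    query = (
        QUERY_TEMPLATE
        .replace("__URN__", json.dumps(flow_urn))  # json.dumps adds the quotes
        .replace("__START__", str(int(start)))
        .replace("__COUNT__", str(int(count)))
    )
    return {"query": query}


def extract_job_urns(response):
    """Pull DataJob URNs out of a decoded GraphQL response (hypothetical helper)."""
    flow = (response.get("data") or {}).get("dataFlow")
    if not flow:
        return []  # unknown DataFlow URN or an error response
    relationships = flow["childJobs"]["relationships"]
    # Entities that are not DataJobs come back without a `urn` field.
    return [r["entity"]["urn"] for r in relationships if (r.get("entity") or {}).get("urn")]


def fetch_child_jobs(flow_urn, endpoint="http://localhost:8080/api/graphql", token=None):
    """POST the query and return the DataJob URNs. Endpoint and auth are assumptions."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    body = json.dumps(build_payload(flow_urn)).encode("utf-8")
    request = urllib.request.Request(endpoint, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(request) as resp:
        return extract_job_urns(json.load(resp))
```

Calling `fetch_child_jobs("urn:li:dataFlow:(airflow,db_etl,prod)")` against a running instance would return the task URNs for that DAG. Because the query asks for at most 100 relationships, a DAG with more tasks would need repeated calls with a larger `start` until `total` is reached.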