
openlineage, snowflake: add OpenLineage support for Snowflake #31696

Merged: 2 commits, merged on Jul 21, 2023


JDarDagran (Contributor):

This PR adds OpenLineage support for SnowflakeOperator.
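For context on what OpenLineage support ultimately produces, here is a rough sketch of an OpenLineage run event as a plain dict. It fills in only a subset of the top-level RunEvent fields (schemaURL and all facet contents are omitted) and is not the provider's code. The helper name `make_run_event` and every namespace, job, and dataset name below are hypothetical placeholders.

```python
import uuid
from datetime import datetime, timezone


def make_run_event(event_type, job_namespace, job_name, inputs, outputs, producer):
    """Build a simplified OpenLineage-style run event as a plain dict.

    Hypothetical helper: only a subset of top-level RunEvent fields is set,
    schemaURL is omitted, and every facets dict is left empty.
    `inputs` and `outputs` are lists of (namespace, name) tuples.
    """
    return {
        "eventType": event_type,  # e.g. "START" or "COMPLETE"
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": str(uuid.uuid4()), "facets": {}},
        "job": {"namespace": job_namespace, "name": job_name, "facets": {}},
        "inputs": [{"namespace": ns, "name": name, "facets": {}} for ns, name in inputs],
        "outputs": [{"namespace": ns, "name": name, "facets": {}} for ns, name in outputs],
        "producer": producer,
    }


# Placeholder names only; the real dataset naming format is defined by
# the OpenLineage naming conventions, not by this sketch.
event = make_run_event(
    "COMPLETE",
    "my_airflow_namespace",
    "my_dag.my_snowflake_task",
    inputs=[("snowflake://my_account", "MY_DB.PUBLIC.SOURCE_TABLE")],
    outputs=[("snowflake://my_account", "MY_DB.PUBLIC.TARGET_TABLE")],
    producer="https://example.com/my-producer",
)
```

The operator's job in this picture is to supply the inputs, outputs, and any extra facets; the surrounding event envelope is the same for every operator.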

Depends on: #31398

@sunank200 (Collaborator) left a review comment:


Documentation for the operator's OpenLineage support should be added.

authority = f"{parsed.hostname}:{parsed.port}"
return authority

def get_default_schema(self):
Collaborator:


Have we tested this?
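Since the question is whether this was tested, here is a minimal sketch that reproduces the two expressions visible in this review's diff context (the `host:port` authority and the `self.__schema or "public"` fallback) so their behavior can be checked in isolation. The class `HookSketch` and its constructor are hypothetical stand-ins, not the actual hook.

```python
from urllib.parse import urlparse


class HookSketch:
    """Hypothetical stand-in for the hook whose helpers appear in the diff."""

    def __init__(self, uri, schema=None):
        self.uri = uri
        # The double underscore triggers name mangling (stored as
        # _HookSketch__schema), mirroring `self.__schema` in the diff.
        self.__schema = schema

    def get_authority(self):
        # Same expression as the diff context: "host:port".
        # If the URI has no explicit port, parsed.port is None and the
        # result ends in ":None".
        parsed = urlparse(self.uri)
        return f"{parsed.hostname}:{parsed.port}"

    def get_default_schema(self):
        # Falls back to "public" when no schema is configured; `or` also
        # treats an empty string as "not configured".
        return self.__schema or "public"


hook = HookSketch("snowflake://xy12345.snowflakecomputing.com:443/db")
print(hook.get_authority())       # xy12345.snowflakecomputing.com:443
print(hook.get_default_schema())  # public
```

One thing such a test might check: Snowflake stores unquoted identifiers in uppercase and names each database's default schema PUBLIC, so it is worth confirming that a lowercase "public" fallback matches the dataset names emitted elsewhere.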

"""
return self.__schema or "public"

def get_database_specific_lineage(self, task_instance):
Collaborator:


Isn't there Snowflake-specific lineage information we could include, e.g. query run time?

@JDarDagran (Contributor, Author), Jun 20, 2023:


For now I'm trying to mirror what the openlineage-airflow package currently sends.
I could add more information, but I'd appreciate suggestions on what would be useful and actually available to retrieve.

@JDarDagran (Contributor, Author):


On the other hand, per https://docs.snowflake.com/en/sql-reference/account-usage/query_history#usage-notes, latency for the QUERY_HISTORY view can be up to 45 minutes, so always fetching the query run time does not look sustainable.
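To make the latency concern concrete, a small sketch of a hypothetical helper (not part of the PR) that checks whether the documented maximum latency has passed since a query finished. Assuming lineage is emitted when the task completes, this check would almost always be False at that moment, which is why relying on QUERY_HISTORY for run time on every run looks impractical.

```python
from datetime import datetime, timedelta, timezone

# Documented upper bound on ACCOUNT_USAGE.QUERY_HISTORY latency
# (up to 45 minutes, per the Snowflake usage notes linked above).
QUERY_HISTORY_MAX_LATENCY = timedelta(minutes=45)


def query_history_guaranteed(query_end, now=None):
    """Hypothetical helper: True once the maximum documented latency has elapsed.

    Before that point, the row for the query may or may not be visible yet.
    Both datetimes must be timezone-aware.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - query_end >= QUERY_HISTORY_MAX_LATENCY


finished = datetime(2023, 6, 20, 12, 0, tzinfo=timezone.utc)
# Right after the task completes, the row is not guaranteed to be there.
print(query_history_guaranteed(finished, now=finished + timedelta(seconds=5)))  # False
```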

Resolved (outdated) review thread: airflow/providers/common/sql/operators/sql.py
Resolved (outdated) review thread: airflow/providers/common/sql/operators/sql.py
Resolved (outdated) review thread: airflow/providers/openlineage/extractors/base.py
@JDarDagran force-pushed the aip-53-snowflake branch 2 times, most recently from edb89d4 to 3c4844f, on July 7, 2023 07:02.
@JDarDagran (Contributor, Author):

One test is failing, but it looks flaky.

@pankajkoti removed their request for review on July 12, 2023 12:23.