Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include Column Lineage Info for All Queries #23322

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

behrooz-stripe
Copy link

@behrooz-stripe behrooz-stripe commented Sep 6, 2024

Description

Column lineage info is extremely helpful but is only available for Create View, Table etc. not Select queries. This PR simply expands the support for lineage for other queries as well.

Additional context and related issues

adds column lineage to updateTarget so it'll be available in EventListener without any other plumbing/changes required. Please see assertLineage for testing, it'll force all the select queries to go through the same tests and assumptions.

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
(x) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

Copy link

cla-bot bot commented Sep 6, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

4 similar comments
Copy link

cla-bot bot commented Sep 6, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

Copy link

cla-bot bot commented Sep 6, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

Copy link

cla-bot bot commented Sep 10, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

Copy link

cla-bot bot commented Sep 10, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@behrooz-stripe behrooz-stripe marked this pull request as ready for review September 10, 2024 00:10
@behrooz-stripe behrooz-stripe marked this pull request as draft September 10, 2024 00:11
Copy link

cla-bot bot commented Sep 10, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

14 similar comments
Copy link

cla-bot bot commented Sep 10, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

Copy link

cla-bot bot commented Sep 10, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

Copy link

cla-bot bot commented Sep 10, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

Copy link

cla-bot bot commented Sep 10, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

Copy link

cla-bot bot commented Sep 10, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

Copy link

cla-bot bot commented Sep 10, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

Copy link

cla-bot bot commented Sep 10, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

Copy link

cla-bot bot commented Sep 12, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

Copy link

cla-bot bot commented Sep 12, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

Copy link

cla-bot bot commented Sep 13, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

Copy link

cla-bot bot commented Sep 13, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

Copy link

cla-bot bot commented Sep 13, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

Copy link

cla-bot bot commented Sep 13, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

Copy link

cla-bot bot commented Sep 13, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@behrooz-stripe behrooz-stripe marked this pull request as ready for review September 16, 2024 19:25
@behrooz-stripe
Copy link
Author

Hi, I work for Stripe and I've double checked on our end and we already should have a Corporate CLA signed. Please verify on your end. Thanks!

@mosabua
Copy link
Member

mosabua commented Sep 16, 2024

@behrooz-stripe you still need to sign the individual contributor license agreement

@behrooz-stripe
Copy link
Author

@behrooz-stripe you still need to sign the individual contributor license agreement

thanks @mosabua ! I just emailed the individual one too.

@wendigo
Copy link
Contributor

wendigo commented Sep 17, 2024

@behrooz-stripe Can you squash commits so that the commit is a logical unit of a change that fully compiles on its own?

Copy link

cla-bot bot commented Sep 17, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

1 similar comment
Copy link

cla-bot bot commented Sep 17, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@behrooz-stripe
Copy link
Author

@wendigo done.

@wendigo
Copy link
Contributor

wendigo commented Sep 17, 2024

@behrooz-stripe Please drop merge commit and rebase on master and I'll review it

Copy link

cla-bot bot commented Sep 17, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@behrooz-stripe
Copy link
Author

@behrooz-stripe Please drop merge commit and rebase on master and I'll review it

alright, should be good now! I think I accidentally clicked on sync button in GitHub UI an it committed the merge one.

@behrooz-stripe
Copy link
Author

@wendigo seems like after I rebased this testCacheFileOperations is failing but I can't seem to be able to reproduce it locally and it succeeds. Could it be it's flakey?
image

@behrooz-stripe
Copy link
Author

nvm, I can see the master is failing with the same error too.

@behrooz-stripe
Copy link
Author

Hey @wendigo would you please take a look at this PR and provide feedback? thanks!

@wendigo
Copy link
Contributor

wendigo commented Oct 7, 2024

@behrooz-stripe sorry I don't have context here

@wendigo wendigo removed their assignment Oct 7, 2024
@behrooz-stripe
Copy link
Author

@electrum might have the best context on this based on discussions in this thread: https://trinodb.slack.com/archives/CP1MUNEUX/p1715720365582019

@hashhar hashhar requested a review from Praveen2112 October 8, 2024 10:02
@patstevens4
Copy link

Bump, eagerly waiting for this feature! Thanks for all the work on this great idea

@hashhar
Copy link
Member

hashhar commented Nov 18, 2024

@Praveen2112 PTAL. This looks good to go in my opinion but you're the expert here.

@behrooz-stripe
Copy link
Author

Bump @Praveen2112 @hashhar

Copy link

This pull request has gone a while without any activity. Tagging for triage help: @mosabua

@github-actions github-actions bot added the stale label Dec 25, 2024
@Praveen2112
Copy link
Member

Sorry for the delay. I'll take a look this week.

@github-actions github-actions bot removed the stale label Dec 26, 2024
analysis.setScope(node, queryScope);
ImmutableList<OutputColumn> outputColumns = outputColumnsBuilder.build();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: List

}
}
}
analysis.setUpdateTarget(version, qualifiedName, Optional.empty(), Optional.of(outputColumns));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if using updateTarget is a right approach here as it is used for a different set of operations and we can't bind it to the first table we encounter as a part of SELECT query

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did think about this and tried to first add a dedicated target but it required quite a bit of plumbing and changes creeped in across many classes and ultimately the EventListener as well; hence the snippet in the description of the PR. So I wasn't sure if we want to go down a major change.
I do agree it's somewhat awkward to tie a select to a first table but ultimately binding a select to table is kinda irrelevant conceptually and should be just ignored. I am happy to change course on this as well.

if (!outputColumns.isEmpty()) {
QualifiedObjectName qualifiedName = new QualifiedObjectName("", "", "");
CatalogHandle.CatalogVersion version = new CatalogHandle.CatalogVersion("1");
if (node.getQueryBody() instanceof QuerySpecification querySpecification) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is a bit brittle as we capture them only for simple queries and not for queries which has WITH clause in the begining.

@@ -122,7 +123,10 @@ public void testSplitsForNormalQuery()
QueryCompletedEvent queryCompletedEvent = queryEvents.getQueryCompletedEvent();
assertThat(queryCompletedEvent.getContext().getResourceGroupId().isPresent()).isTrue();
assertThat(queryCompletedEvent.getContext().getResourceGroupId().get()).isEqualTo(createResourceGroupId("global", "user-user"));
assertThat(queryCompletedEvent.getIoMetadata().getOutput()).isEqualTo(Optional.empty());
Optional<QueryOutputMetadata> output = queryCompletedEvent.getIoMetadata().getOutput();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of capturing this event for SELECT in QueryOutputMetadata ? Can we add additional field in QueryIOMetadata where there could be a Optional<List<OutputColumnMetadata>> which could be set only for SELECT statements in this way the logic could be simplified.

@martint - Is QueryIOMetadata a right place to add them or is there a better place we could store them ?

Copy link
Author

@behrooz-stripe behrooz-stripe Jan 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did take a stab at storing these info separately from output and as I mentioned here it (changes) quickly leaks to many places (QueryInfo.java, QueryStateMachine.java, QueryMonitor.java, QueryCompletedEvent.java, ...) to plumb this info from analysis all the way to to event listener.
So basically my idea was to avoid that and reuse update target for carrying the same info. In any case, I put a gist of the actual diff if I were to hold this in a separate variable in the analysis, CompletedEvent, and QueryInfo. I am happy to update this PR to reflect that if you folks prefer that.
CC: @martint @Praveen2112

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

7 participants