Add support for absolute path in Delta Lake connector #17038

Merged

Conversation

vinay-kl
Contributor

@vinay-kl vinay-kl commented Apr 14, 2023

Description

Support reading absolute paths from the Delta transaction log
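As an illustration of the idea, here is a minimal sketch of how a reader might turn the `path` of an `add` entry into a full file location. The Delta protocol allows that path to be either relative to the table root or absolute, and a shallow clone records absolute paths that point into the source table. The class name `DeltaPathSketch` and the method `resolveDataFilePath` are hypothetical and are not taken from this PR; the sketch also assumes the path has already been URI-decoded.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not the connector's actual code.
public final class DeltaPathSketch
{
    // Matches a URI scheme prefix such as "s3://", "abfss://" or "file:/"
    private static final Pattern SCHEME_PREFIX = Pattern.compile("^[a-zA-Z][a-zA-Z0-9+.-]*:/");

    private DeltaPathSketch() {}

    // A path counts as absolute if it starts with "/" or carries a URI scheme
    public static boolean isAbsolute(String path)
    {
        return path.startsWith("/") || SCHEME_PREFIX.matcher(path).find();
    }

    // Absolute paths are returned unchanged; relative paths are appended to the table location
    public static String resolveDataFilePath(String tableLocation, String addEntryPath)
    {
        if (isAbsolute(addEntryPath)) {
            return addEntryPath;
        }
        String base = tableLocation.endsWith("/") ? tableLocation : tableLocation + "/";
        return base + addEntryPath;
    }
}
```

With this sketch, a regular table entry such as `part-0000.parquet` resolves under the table location, while an entry from a shallow clone that already names a file in the source table is used as is.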

Additional context and related issues

Fixes #17011

Release notes

(x) Release notes are required, with the following suggested text:

# Delta Lake
* Add support for reading [shallow cloned tables](https://docs.databricks.com/sql/language-manual/delta-clone.html). ({issue}`17011`)

@cla-bot

cla-bot bot commented Apr 14, 2023

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: 300070018.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. Check that your git client is configured with an email to sign commits: git config --list | grep email
  2. If not, set it up using git config --global user.email [email protected]
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

@findinpath
Contributor

@vinay-kl please follow the guidelines at https://github.com/trinodb/trino/blob/master/.github/DEVELOPMENT.md#format-git-commit-messages for commit messages.

@vinay-kl vinay-kl changed the title from "Databricks shollow cloned tables read support on Trino" to "Support reading of _shallow_ cloned tables" Apr 14, 2023
@findinpath findinpath requested a review from krvikash April 14, 2023 15:12
@github-actions github-actions bot added the delta-lake Delta Lake connector label Apr 14, 2023
@vinay-kl vinay-kl changed the title from "Support reading of _shallow_ cloned tables" to "Support reading of Shallow Cloned tables" Apr 17, 2023
@vinay-kl
Contributor Author

@ebyhr @alexjo2144 could you please review this PR?

@findinpath findinpath self-requested a review April 21, 2023 10:45
@findinpath
Contributor

findinpath commented Apr 21, 2023

There are a few small checkstyle issues in your PR:

Error:  src/main/java/io/trino/plugin/deltalake/DeltaLakeSplitManager.java:[48,1] (imports) ImportOrder: Wrong order for 'java.util.Objects' import.
Error:  src/main/java/io/trino/plugin/deltalake/DeltaLakeSplitManager.java:[311,88] (blocks) LeftCurly: '{' at column 88 should be on a new line.
Error:  src/main/java/io/trino/plugin/deltalake/DeltaLakeSplitManager.java:[331,9] (blocks) RightCurly: '}' at column 9 should be alone on a line.
Error:  Failed to execute goal org.apache.maven.plugins:maven-checkstyle-plugin:3.2.1:check (checkstyle) on project trino-delta-lake: You have 3 Checkstyle violations. -> [Help 1]
Error:  

Address them and do a build before pushing:

mvn install -P errorprone-compiler -DskipTests -nsu -pl :trino-delta-lake

At this stage do consider a rebase.

git rebase -i $(git merge-base HEAD master)

Member

@alexjo2144 alexjo2144 left a comment

Thanks for proposing a fix! There are a couple of other places that use AddFileEntry.getPath() that probably also need an update, but I don't think any of them are used on the read path, so we can look at them separately.

@findepi @findinpath we probably need to think through whether all table operations make sense on clones: DML, ANALYZE, OPTIMIZE, VACUUM, etc.

Member

@alexjo2144 alexjo2144 left a comment

Changes are looking pretty good to me, just one more small clean up thing. Can you squash your changes and also include "Delta Lake" in the commit message?

@ebyhr @findepi I think this is ready for a maintainer to take a look

@alexjo2144
Member

@vinay-kl the CLA bot doesn't seem happy, have you filled one out?

@vinay-kl
Contributor Author

@vinay-kl the CLA bot doesn't seem happy, have you filled one out?

@alexjo2144 I have. The CLA was accepted and confirmation mail from martin was received yesterday. I'm not sure why the CLA issue still persists.

@alexjo2144
Member

The CLA was accepted and confirmation mail from martin was received yesterday

Awesome. In that case, can you double check the couple steps listed in the bot's messages to make sure that your commits are tagged appropriately?

@vinay-kl vinay-kl closed this Apr 21, 2023
@vinay-kl
Contributor Author

Sorry, I closed this PR by mistake and don't know how to reopen it. I will raise another PR. Extremely sorry for the inconvenience @alexjo2144 @findinpath

@krvikash krvikash reopened this Apr 21, 2023
@krvikash
Contributor

Hi @vinay-kl, Reopened the PR.

@vinay-kl vinay-kl force-pushed the databricks-shollow-cloned-tables-support branch from 8978a0a to ec3d68e Compare April 21, 2023 21:23
@vinay-kl
Contributor Author

@krvikash Thank you so much for re-opening the PR.

@ebyhr
Member

ebyhr commented Apr 21, 2023

Your account isn't yet registered in https://github.com/trinodb/cla. Please wait for a while.
cc: @martint

@github-actions github-actions bot removed the stale label Mar 4, 2024
@vinay-kl
Contributor Author

vinay-kl commented Mar 15, 2024

@mosabua I will continue working on this PR, will submit the latest commit changes soon.

@vinay-kl vinay-kl force-pushed the databricks-shollow-cloned-tables-support branch 2 times, most recently from d2f563d to 1ea810e Compare March 21, 2024 14:02
@vinay-kl vinay-kl force-pushed the databricks-shollow-cloned-tables-support branch from 1ea810e to 12102c7 Compare March 21, 2024 16:09
@findinpath
Contributor

@vinay-kl please rebase on master to address the conflicts.

@vinay-kl vinay-kl force-pushed the databricks-shollow-cloned-tables-support branch from 12102c7 to 4ed0460 Compare March 29, 2024 10:06
dropDeltaTableWithRetry("default." + clonedTable);
}
}

Contributor

can you please add a simple test showcasing the following scenario:

  • create table t1
  • spark: create table t2 shallow clone t1
  • ensure that t2 and t1 have the same content
  • trino: drop table t2
  • ensure that the t1 table has the content intact
  • trino: drop table t1
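
The scenario above can be sketched with a toy in-memory model. This is not a Trino product test and uses none of the project's test helpers. The class `ShallowCloneDropSketch` and all of its methods are hypothetical, and the drop behavior shown (removing only files under the dropped table's own location) is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Toy model for illustration only; names and behavior are assumptions
public final class ShallowCloneDropSketch
{
    // Toy object store: absolute file path -> file content
    private final Map<String, String> files = new TreeMap<>();
    // Toy transaction logs: table location -> absolute paths of referenced data files
    private final Map<String, List<String>> logs = new HashMap<>();

    // Writes one data file per row under the table location and records it in the log
    public void createTable(String location, List<String> rows)
    {
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            String path = location + "/part-" + i + ".parquet";
            files.put(path, rows.get(i));
            paths.add(path);
        }
        logs.put(location, paths);
    }

    // A shallow clone copies only the file references, not the data files
    public void shallowClone(String sourceLocation, String targetLocation)
    {
        logs.put(targetLocation, new ArrayList<>(logs.get(sourceLocation)));
    }

    // Reads the table content by following the paths recorded in its log
    public List<String> read(String location)
    {
        return logs.get(location).stream().map(files::get).collect(Collectors.toList());
    }

    // Assumed drop behavior: remove the log and only the files under this table's location
    public void dropTable(String location)
    {
        logs.remove(location);
        files.keySet().removeIf(path -> path.startsWith(location + "/"));
    }
}
```

In this model, creating `t1`, cloning it to `t2`, and dropping `t2` leaves every file of `t1` in place, because the clone owns no data files under its own location.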

@vinay-kl vinay-kl force-pushed the databricks-shollow-cloned-tables-support branch from 4ed0460 to 7a8dd02 Compare March 29, 2024 11:47
Contributor

@findinpath findinpath left a comment

LGTM % open nits

@vinay-kl vinay-kl force-pushed the databricks-shollow-cloned-tables-support branch from 7a8dd02 to ff0f200 Compare March 31, 2024 18:23
@ebyhr
Member

ebyhr commented Apr 1, 2024

/test-with-secrets sha=ff0f20084d0e10b159893dd577b3d094e4792864


github-actions bot commented Apr 1, 2024

The CI workflow run with tests that require additional secrets has been started: https://github.com/trinodb/trino/actions/runs/8507516113

row(2, "a", "update_postimage", 2L),
row(2, "b", "update_preimage", 2L),
row(3, "b", "update_postimage", 2L));
// table_changes function from trino isn't considering `base table inserts on shallow cloned table` as CDF as of v422
Member

as of v422

It's different from the current version. I would recommend removing.

@vinay-kl vinay-kl force-pushed the databricks-shollow-cloned-tables-support branch from ff0f200 to 3de4fdc Compare April 2, 2024 23:24
@ebyhr ebyhr merged commit cadf38e into trinodb:master Apr 3, 2024
49 checks passed
@github-actions github-actions bot added this to the 444 milestone Apr 3, 2024
Labels
cla-signed delta-lake Delta Lake connector
Development

Successfully merging this pull request may close these issues.

Trino is unable to read shallow cloned tables using Databricks Runtime
8 participants