Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace usage of hadoop.fs.Path with String in Delta Lake #16256

Merged
merged 2 commits into from
Mar 20, 2023
Merged

Replace usage of hadoop.fs.Path with String in Delta Lake #16256

merged 2 commits into from
Mar 20, 2023

Conversation

pajaks
Copy link
Member

@pajaks pajaks commented Feb 24, 2023

Description

Origin of this change is to migrate to TrinoFileSystem
Relates to #16020

Some hadoop.fs.Path will be migrated in following PR as it needs some additional path resolution functions.

Release notes

(x) This is not user-visible or docs only and no release notes are required.

@cla-bot cla-bot bot added the cla-signed label Feb 24, 2023
@pajaks
Copy link
Member Author

pajaks commented Feb 28, 2023

@ebyhr Could you run tests with secretes?

@ebyhr
Copy link
Member

ebyhr commented Feb 28, 2023

/test-with-secrets sha=fe0df4c21f640800bb37143aba3cc6e24f43f686

@github-actions github-actions bot added the delta-lake Delta Lake connector label Mar 1, 2023
@pajaks pajaks marked this pull request as ready for review March 2, 2023 09:30
@findepi findepi requested review from electrum and removed request for findepi March 2, 2023 14:54
Copy link
Member

@alexjo2144 alexjo2144 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, couple small things.

I'd also dd Delta or Delta Lake to the commit message somewhere.

@@ -724,13 +724,13 @@ public void createTable(ConnectorSession session, ConnectorTableMetadata tableMe
}
Path targetPath = new Path(location);
ensurePathExists(session, targetPath);
Path deltaLogDirectory = getTransactionLogDir(targetPath);
String deltaLogDirectory = getTransactionLogDir(location);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we migrate targetPath two lines up as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's still used in code which will be removed in #16200. So I would wait for that PR to land and remove it during rebase.

@@ -107,7 +107,7 @@ public void flush()
}

String clusterId = commitInfoEntry.get().getCommitInfo().getClusterId();
logSynchronizer.write(session, clusterId, logEntry, bos.toByteArray());
logSynchronizer.write(session, clusterId, new Path(logEntry), bos.toByteArray());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we migrate logSynchronizer as well? can be follow up, I know there are a lot of these

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was planning to do that in separate commit as some implementation of LogSynchronizer are strongly coupled with org.apache.hadoop.fs.Path

@pajaks
Copy link
Member Author

pajaks commented Mar 7, 2023

rebased to resolve conflicts

@pajaks
Copy link
Member Author

pajaks commented Mar 15, 2023

Rebased to use

public static String appendPath(String location, String path)

@@ -825,7 +824,7 @@ public DeltaLakeOutputTableHandle beginCreateTable(ConnectorSession session, Con
Path targetPath = new Path(location);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you omit this occurence of org.apache.hadoop.fs.Path ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those changes are in follow up PR: #16469

@pajaks
Copy link
Member Author

pajaks commented Mar 15, 2023

@ebyhr @findepi Could you run with secrets?

@ebyhr
Copy link
Member

ebyhr commented Mar 15, 2023

/test-with-secrets sha=1fb60c69ab92fdbc7487c74d8944b581bb12b9d4

@pajaks pajaks self-assigned this Mar 15, 2023
@github-actions
Copy link

The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/4425399001

@findinpath findinpath added the no-release-notes This pull request does not require release notes entry label Mar 16, 2023
@ebyhr
Copy link
Member

ebyhr commented Mar 16, 2023

/test-with-secrets sha=94a4e3e6bcb2eb8c5f031290930d0c75a66bcac1

@github-actions
Copy link

The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/4435274449

@pajaks
Copy link
Member Author

pajaks commented Mar 16, 2023

CI hit: #13199 and also download github action timeout
@ebyhr Could you rerun?

@ebyhr
Copy link
Member

ebyhr commented Mar 16, 2023

/test-with-secrets sha=94a4e3e6bcb2eb8c5f031290930d0c75a66bcac1

@ebyhr ebyhr changed the title Replace hadoop.fs.Path with String Replace usage of hadoop.fs.Path with String in Delta Lake Mar 16, 2023
@pajaks
Copy link
Member Author

pajaks commented Mar 17, 2023

Rebased to include fix for skipped tests in CI

@ebyhr ebyhr merged commit 760375b into trinodb:master Mar 20, 2023
@github-actions github-actions bot added this to the 411 milestone Mar 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed delta-lake Delta Lake connector no-release-notes This pull request does not require release notes entry
Development

Successfully merging this pull request may close these issues.

4 participants