Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow for absolute paths instead of assuming defaultFS in UDF's required files #33

Merged
merged 1 commit into from
Jan 14, 2020

Conversation

SurenNihalani
Copy link
Contributor

@SurenNihalani SurenNihalani commented Dec 17, 2019

Problem statement: I want fully qualified URIs to work in required files function of a UDF. Currently, to resolve #LATEST, the code assumes that the filesystem in use is coming from the configuration option fs.defaultFS. This change does by switching usages of FileSystem.get(conf) to FileSystem.get(URI, conf). This change is similar to the internal RB (rb no. 1901530) I posted recently.

Following steps were used for testing:

  1. setting defaultFS to an abfs directory
  2. setting the protocol to be file:/// in unit tests.
  3. running the FileSystemUtilTest

@shardulm94
Copy link
Contributor

shardulm94 commented Dec 17, 2019

Can you provide more information? What is the change that you are making? Why is this required/What is the issue that you are facing? What testing was performed?

@wmoustafa
Copy link
Contributor

wmoustafa commented Dec 18, 2019

Can you provide more information? What is the change that you are making? Why is this required/What is the issue that you are facing? What testing was performed?

I addition to what @shardulm94 mentioned, commit message should start with the imperative part (e.g., Allow). Also, I am not clear on how "absolute path" and "defaultFS" compare. Seem to be different concepts. Is the change backward compatible?

@SurenNihalani SurenNihalani changed the title UDF's files: allow for absolute paths instead of assuming defaultFS Allow for absolute paths instead of assuming defaultFS in UDF's required files Dec 18, 2019
@SurenNihalani
Copy link
Contributor Author

@wmoustafa , I missed the backward incompatibility thought. Thanks for pointing it out. If we consider FileSystemUtils as public API, it's not. I'd like us to think of this as backwards compatible. Reasons:

  1. FileSystemUtils is an implementation class than an api class
  2. If LinkedIn is the only user we are aware of, I have code searched it internally and it's not being used anywhere else

re: absolute path and defaultFS: absolute path is a URI that has protocol, scheme and authority. If it doesn't have those, we assume that the path is actually existing on the filesystem defined by fs.defaultFS. However, the way it's currently being used here, it's enforcing that all paths be relative to fs.defaultFS. This PR changes that.

lmk if jumping on bluejeans will be helpful to move this PR faster!

Copy link
Contributor

@shardulm94 shardulm94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks okay to me. Have a minor comment. @wmoustafa Thoughts?

FileSystem fs;
JobConf conf = new JobConf();
try {
// Checks if currently we are in local mode, which is basically when running unit tests
if (isLocalEnvironment(conf)) {
fs = FileSystem.getLocal(conf);
} else {
fs = FileSystem.get(conf);
fs = FileSystem.get(new java.net.URI(filePath), conf);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer Path.getFileSystem(conf) since the Path constructor does some handling for normalization and unescaped characters.

@@ -44,17 +45,17 @@ public static boolean isLocalEnvironment(Configuration conf) {
*
* @return the HDFS FileSystem if we are not in local mode, local FileSystem if we are.
*/
public static FileSystem getHDFSFileSystem() {
public static FileSystem getHDFSFileSystem(String filePath) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this name still valid?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think the name is still valid.

@wmoustafa
Copy link
Contributor

This looks okay to me. Have a minor comment. @wmoustafa Thoughts?

Just added another minor comment. Looks okay in general.

@SurenNihalani
Copy link
Contributor Author

Addressed @shardulm94 's point in the updated commit

@shardulm94 shardulm94 merged commit a3906a4 into linkedin:master Jan 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants