-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow for absolute paths instead of assuming defaultFS in UDF's required files #33
Conversation
Can you provide more information? What is the change that you are making? Why is this required/What is the issue that you are facing? What testing was performed? |
I addition to what @shardulm94 mentioned, commit message should start with the imperative part (e.g., Allow). Also, I am not clear on how "absolute path" and "defaultFS" compare. Seem to be different concepts. Is the change backward compatible? |
@wmoustafa , I missed the backward incompatibility thought. Thanks for pointing it out. If we consider FileSystemUtils as public API, it's not. I'd like us to think of this as backwards compatible. Reasons:
re: absolute path and defaultFS: absolute path is a URI that has protocol, scheme and authority. If it doesn't have those, we assume that the path is actually existing on the filesystem defined by fs.defaultFS. However, the way it's currently being used here, it's enforcing that all paths be relative to fs.defaultFS. This PR changes that. lmk if jumping on bluejeans will be helpful to move this PR faster! |
cabaf16
to
8893dd3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks okay to me. Have a minor comment. @wmoustafa Thoughts?
FileSystem fs; | ||
JobConf conf = new JobConf(); | ||
try { | ||
// Checks if currently we are in local mode, which is basically when running unit tests | ||
if (isLocalEnvironment(conf)) { | ||
fs = FileSystem.getLocal(conf); | ||
} else { | ||
fs = FileSystem.get(conf); | ||
fs = FileSystem.get(new java.net.URI(filePath), conf); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefer Path.getFileSystem(conf) since the Path constructor does some handling for normalization and unescaped characters.
@@ -44,17 +45,17 @@ public static boolean isLocalEnvironment(Configuration conf) { | |||
* | |||
* @return the HDFS FileSystem if we are not in local mode, local FileSystem if we are. | |||
*/ | |||
public static FileSystem getHDFSFileSystem() { | |||
public static FileSystem getHDFSFileSystem(String filePath) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this name still valid?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think the name is still valid.
Just added another minor comment. Looks okay in general. |
Addressed @shardulm94 's point in the updated commit |
Problem statement: I want fully qualified URIs to work in required files function of a UDF. Currently, to resolve #LATEST, the code assumes that the filesystem in use is coming from the configuration option fs.defaultFS. This change does by switching usages of FileSystem.get(conf) to FileSystem.get(URI, conf). This change is similar to the internal RB (rb no. 1901530) I posted recently.
Following steps were used for testing: