Explicitly set input dir in job conf instead of FileInputFormat.setInputPaths, which makes an IO call #16640
Can we have any test cases that fail without this fix?
@@ -529,7 +529,7 @@ private ListenableFuture<Void> loadPartition(HivePartitionMetadata partition)
 }

 JobConf jobConf = toJobConf(configuration);
-FileInputFormat.setInputPaths(jobConf, path);
+hdfsEnvironment.doAs(hdfsContext.getIdentity(), () -> FileInputFormat.setInputPaths(jobConf, path));
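To illustrate what wrapping the call in doAs is meant to achieve, here is a stdlib-only model, not Trino's HdfsEnvironment or Hadoop's UserGroupInformation. Everything in it is hypothetical: an identity is bound to the current thread for the duration of an action, and a fake NameNode call fails when no identity is bound, loosely mimicking the "Failed to find any Kerberos tgt" error.

```java
import java.util.concurrent.Callable;

// Hypothetical model of the doAs pattern; class and method names are illustrative only.
public class DoAsSketch {
    private static final ThreadLocal<String> CURRENT_IDENTITY = new ThreadLocal<>();

    // Runs the action with the identity bound, restoring the previous binding afterwards.
    static <T> T doAs(String identity, Callable<T> action) throws Exception {
        String previous = CURRENT_IDENTITY.get();
        CURRENT_IDENTITY.set(identity);
        try {
            return action.call();
        } finally {
            if (previous == null) {
                CURRENT_IDENTITY.remove();
            } else {
                CURRENT_IDENTITY.set(previous);
            }
        }
    }

    // Fake remote call: succeeds only when an identity is bound to this thread.
    static String fetchWorkingDirectory() {
        String identity = CURRENT_IDENTITY.get();
        if (identity == null) {
            throw new IllegalStateException("No valid credentials provided");
        }
        return "hdfs://namenode/user/" + identity;
    }

    public static void main(String[] args) throws Exception {
        try {
            fetchWorkingDirectory();
        } catch (IllegalStateException e) {
            System.out.println("without doAs: " + e.getMessage());
        }
        System.out.println("with doAs: " + doAs("alice", DoAsSketch::fetchWorkingDirectory));
    }
}
```

The wrapper makes the remote call succeed, but it still performs the remote call; avoiding the call altogether is a separate option.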
Why is this needed? It seems to only modify the JobConf, not do any IO.
That's what I thought too, but internally it calls the NameNode to fetch the working directory and also sets mapreduce.job.working.dir in the JobConf. That call started to fail:
Caused by: java.lang.RuntimeException: java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "<hostname>/<ip>"; destination host is: "<namenode-hostname>":<port>;
at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:665)
at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:452)
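The stack trace shows that setInputPaths asks the JobConf for its working directory, and that this lookup reaches the NameNode. The sketch below is a simplified, hypothetical stdlib model of that path, not the real Hadoop classes. It assumes the JobConf returns mapreduce.job.working.dir when it is set and otherwise "fetches" and caches it, and that the lookup happens even when the input path is already fully qualified. Path qualification is approximated with java.net.URI resolution.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the code path in the stack trace; not Hadoop's implementation.
public class WorkingDirSketch {
    static final String WORKING_DIR = "mapreduce.job.working.dir";
    static final String INPUT_DIR = "mapreduce.input.fileinputformat.inputdir";

    // Stand-in for JobConf: a property map plus a counter of simulated NameNode calls.
    static final class FakeJobConf {
        final Map<String, String> props = new HashMap<>();
        int remoteCalls = 0;

        // Returns the cached working directory, or "fetches" it remotely and caches it.
        String getWorkingDirectory() {
            String dir = props.get(WORKING_DIR);
            if (dir == null) {
                remoteCalls++; // where the real code would contact the NameNode
                dir = "hdfs://namenode/user/current";
                props.put(WORKING_DIR, dir);
            }
            return dir;
        }
    }

    // The working directory is looked up unconditionally, even for a fully
    // qualified path; the path is then resolved against it.
    static void setInputPaths(FakeJobConf conf, String path) {
        URI base = URI.create(conf.getWorkingDirectory() + "/");
        conf.props.put(INPUT_DIR, base.resolve(path).toString());
    }

    public static void main(String[] args) {
        FakeJobConf conf = new FakeJobConf();
        setInputPaths(conf, "hdfs://namenode/warehouse/t/p=1");
        System.out.println(conf.props.get(INPUT_DIR) + " (remote calls: " + conf.remoteCalls + ")");
    }
}
```

In this model, an absolute partition path still costs one remote lookup. That lookup is the step that failed without Kerberos credentials.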
Ah, I see. We never set the working directory, so we could replace both usages of this method:
jobConf.set(FileInputFormat.INPUT_DIR, StringUtils.escapeString(path.toString()));
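The suggested one-liner writes the property directly, so no working-directory lookup happens. The value is escaped because the property holds a comma-separated list of paths. Here is a minimal stdlib sketch of that idea. It uses a plain Map in place of JobConf and assumes the default behaviour of Hadoop's StringUtils.escapeString is to prefix ',' and '\' with '\'. The method names are hypothetical. It also assumes the path is already fully qualified, since nothing here resolves relative paths.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of setting the input dir directly; a Map stands in for JobConf.
public class DirectInputDirSketch {
    static final String INPUT_DIR = "mapreduce.input.fileinputformat.inputdir";

    // Mimics the assumed default of StringUtils.escapeString: escape ',' and '\' with '\'.
    static String escapeString(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            if (c == ',' || c == '\\') {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    // Writes the property directly; no working-directory lookup, so the
    // caller must pass an already fully qualified path.
    static void setInputDir(Map<String, String> conf, String qualifiedPath) {
        conf.put(INPUT_DIR, escapeString(qualifiedPath));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        setInputDir(conf, "hdfs://namenode/warehouse/t/a,b");
        System.out.println(conf.get(INPUT_DIR)); // prints hdfs://namenode/warehouse/t/a\,b
    }
}
```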
There is another usage inside createHiveSymlinkSplits() which also needs to be fixed.
That should also work and may be better. Thanks for the feedback; let me test it and update the PR.
One other thing: FileInputFormat.INPUT_DIR isn't defined in the version of hadoop-apache (3.2.0-18) we use, so I'll expose the property name ("mapreduce.input.fileinputformat.inputdir") as a constant in this class and use it in both places.
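Defining the key locally means a typo would silently write a property that nothing reads. The sketch below is a hypothetical stdlib illustration. The local constant name is an assumption, and the reader is a simplified stand-in for how the comma-separated value is parsed back: split on unescaped commas, then drop the escape characters. A round-trip check like this catches a mismatched key or broken escaping.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a locally exposed key plus a simplified reader for its value.
public class InputDirConstantSketch {
    // Same string as the property name quoted in the discussion.
    static final String FILE_INPUT_FORMAT_INPUT_DIR = "mapreduce.input.fileinputformat.inputdir";

    // Splits the value on unescaped commas and removes the '\' escape characters.
    static List<String> readInputDirs(Map<String, String> conf) {
        List<String> dirs = new ArrayList<>();
        String value = conf.get(FILE_INPUT_FORMAT_INPUT_DIR);
        if (value == null) {
            return dirs;
        }
        StringBuilder current = new StringBuilder();
        boolean escaped = false;
        for (char c : value.toCharArray()) {
            if (escaped) {
                current.append(c); // escaped character is taken literally
                escaped = false;
            } else if (c == '\\') {
                escaped = true;
            } else if (c == ',') {
                dirs.add(current.toString()); // unescaped comma ends one path
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        dirs.add(current.toString());
        return dirs;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(FILE_INPUT_FORMAT_INPUT_DIR, "hdfs://nn/t/a\\,b,hdfs://nn/t/c");
        System.out.println(readInputDirs(conf));
    }
}
```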
@electrum I have updated the PR based on the feedback above.
The CI failure on an earlier commit was caused by a checkstyle violation, which is fixed in the latest commit.
Description
Fixes #16639
Release notes
(x) This is not user-visible or docs only and no release notes are required.