Kerberos error when using custom input format via UseFileSplitsFromInputFormat #16639

Closed
akshayrai opened this issue Mar 20, 2023 · 0 comments · Fixed by #16640


akshayrai commented Mar 20, 2023

We integrated a custom input format using the "UseFileSplitsFromInputFormat" annotation and deployed the changes to our cluster. Things worked fine for a few days, and then queries relying on that input format started failing with a Kerberos error (error message and full stack trace below).
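
For context, a rough sketch of the kind of integration described above. Everything here is illustrative (the class, package, and key/value types are hypothetical, not our actual code); the relevant point is that the Hive connector recognizes the annotation by its simple name and then takes the splits from the input format's getSplits method.

```java
package com.example.hive;

import java.io.IOException;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

// Only the simple name of the annotation matters to the connector, so it can
// be declared in the custom input format's own package. RUNTIME retention is
// required for it to be visible via reflection.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface UseFileSplitsFromInputFormat {}

// Hypothetical custom input format; with the annotation present, split
// generation is delegated to getSplits() instead of Trino's own file listing.
@UseFileSplitsFromInputFormat
public class CustomInputFormat
        extends FileInputFormat<NullWritable, Text>
{
    @Override
    public InputSplit[] getSplits(JobConf job, int numSplits)
            throws IOException
    {
        // Custom split logic would go here; falling back to the default
        // file-based splits keeps the sketch short.
        return super.getSplits(job, numSplits);
    }

    @Override
    public RecordReader<NullWritable, Text> getRecordReader(InputSplit split, JobConf job, Reporter reporter)
            throws IOException
    {
        // Reader omitted; not relevant to the split-loading problem.
        throw new UnsupportedOperationException("record reader omitted in this sketch");
    }
}
```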

Caused by: java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "<hostname>/<ip>"; destination host is: "<namenode-hostname>":<port>;

Full stacktrace:

Query 20230310_213839_92645_7sjqm failed: java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "<hostname>/<ip>"; destination host is: "<namenode-hostname>":<port>;
io.trino.spi.TrinoException: java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "<hostname>/<ip>"; destination host is: "<namenode-hostname>":<port>;
	at io.trino.plugin.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:281)
	at io.trino.plugin.hive.util.ResumableTasks$1.run(ResumableTasks.java:38)
	at io.trino.$gen.Trino_li_352_base_155_g961211d_dirty____20230303_011254_2.run(Unknown Source)
	at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:80)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.lang.RuntimeException: java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "<hostname>/<ip>"; destination host is: "<namenode-hostname>":<port>;
	at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:665)
	at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:452)
	at io.trino.plugin.hive.BackgroundHiveSplitLoader.loadPartition(BackgroundHiveSplitLoader.java:486)
	at io.trino.plugin.hive.BackgroundHiveSplitLoader.loadSplits(BackgroundHiveSplitLoader.java:345)
	at io.trino.plugin.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:274)
	... 6 more
Caused by: java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "<hostname>/<ip>"; destination host is: "<namenode-hostname>":<port>;
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:808)
	at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1558)
	at org.apache.hadoop.ipc.Client.call(Client.java:1500)
	at org.apache.hadoop.ipc.Client.call(Client.java:1397)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
	at jdk.proxy5/jdk.proxy5.$Proxy367.getBlockLocations(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:263)
	at jdk.internal.reflect.GeneratedMethodAccessor1345.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
	at jdk.proxy5/jdk.proxy5.$Proxy368.getBlockLocations(Unknown Source)
	at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:849)
	at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:838)
	at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:827)
	at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:332)
	at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:292)
	at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:276)
	at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1066)
	at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:327)
	at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:323)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:335)
	at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:919)
	at org.apache.hadoop.fs.FileBasedMountTableLoader.loadMountTableFromLocation(FileBasedMountTableLoader.java:203)
	at org.apache.hadoop.fs.FileBasedMountTableLoader.loadMountTable(FileBasedMountTableLoader.java:107)
	at org.apache.hadoop.fs.MountTableLoaderFactory.loadMountTable(MountTableLoaderFactory.java:83)
	at org.apache.hadoop.fs.GridMountTableState.fetchLatestMountTableConfiguration(GridMountTableState.java:148)
	at org.apache.hadoop.fs.GridFilesystem.loadMountTableConfigs(GridFilesystem.java:303)
	at org.apache.hadoop.fs.viewfs.ViewFileSystemOverloadScheme.initialize(ViewFileSystemOverloadScheme.java:108)
	at org.apache.hadoop.fs.GridFilesystem.reloadInternalMountState(GridFilesystem.java:1289)
	at org.apache.hadoop.fs.GridMountTableState.reload(GridMountTableState.java:64)
	at org.apache.hadoop.fs.GridFilesystem.initialize(GridFilesystem.java:182)
	at io.trino.plugin.hive.fs.TrinoFileSystemCache.createFileSystem(TrinoFileSystemCache.java:150)
	at io.trino.plugin.hive.fs.TrinoFileSystemCache$FileSystemHolder.createFileSystemOnce(TrinoFileSystemCache.java:332)
	at io.trino.plugin.hive.fs.TrinoFileSystemCache.getInternal(TrinoFileSystemCache.java:129)
	at io.trino.plugin.hive.fs.TrinoFileSystemCache.get(TrinoFileSystemCache.java:88)
	at org.apache.hadoop.fs.ForwardingFileSystemCache.get(ForwardingFileSystemCache.java:38)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:475)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:228)
	at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:661)
	... 10 more
Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
	at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:783)
	at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
	at java.base/javax.security.auth.Subject.doAs(Subject.java:439)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1905)
	at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:743)
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:840)
	at org.apache.hadoop.ipc.Client$Connection.access$3700(Client.java:423)
	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1615)
	at org.apache.hadoop.ipc.Client.call(Client.java:1444)
	... 53 more
Caused by: javax.security.sasl.SaslException: GSS initiate failed
	at jdk.security.jgss/com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:228)
	at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:414)
	at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:629)
	at org.apache.hadoop.ipc.Client$Connection.access$2200(Client.java:423)
	at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:827)
	at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:823)
	at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
	at java.base/javax.security.auth.Subject.doAs(Subject.java:439)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1905)
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:823)
	... 56 more
Caused by: org.ietf.jgss.GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
	at java.security.jgss/sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:164)
	at java.security.jgss/sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:126)
	at java.security.jgss/sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:195)
	at java.security.jgss/sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:205)
	at java.security.jgss/sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:230)
	at java.security.jgss/sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:196)
	at jdk.security.jgss/com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:209)
	... 65 more

Root cause

  • Remote debugging showed that FileInputFormat.setInputPaths, which is expected only to set parameters on the jobConf, internally makes a namenode call to fetch the working directory and update mapreduce.job.working.dir. This call started failing once the Kerberos ticket expired.
  • When that call is not wrapped inside Trino's hdfsEnvironment, the UGI information comes from the UGI.loginUserFromSubject(null) call. This sets the externalKeyTab flag to true and injects only the TGT ticket into the UGI, which means the keytab is treated as externally managed and UGI will neither log in with the keytab nor renew the ticket.
  • When we wrap the call inside hdfsEnvironment.doAs, Trino's KerberosAuthentication code creates a login context and injects the Presto keytab into it. A UGI created this way ensures the ticket gets renewed correctly (see the sketch after this list).
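
A rough sketch of the direction of the fix inside BackgroundHiveSplitLoader.loadPartition (not the exact code in the PR; the doAs overloads on HdfsEnvironment and the surrounding variable names vary between Trino versions and are assumed here):

```java
// FileInputFormat.setInputPaths() internally calls JobConf.getWorkingDirectory(),
// which performs a namenode RPC. Running it (and getSplits) inside doAs makes the
// RPC use the UGI built from Trino's Kerberos login context, whose TGT is renewed
// from the keytab, instead of the externally managed UGI whose ticket expires.
InputSplit[] splits = hdfsEnvironment.doAs(hdfsContext.getIdentity(), () -> {
    FileInputFormat.setInputPaths(jobConf, path);
    return targetInputFormat.getSplits(jobConf, 0);
});
```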

Will open a PR for this.
