We integrated a custom input format using the "UseFileSplitsFromInputFormat" annotation and deployed the change to our cluster. Everything worked fine for a few days, and then queries relying on the input format started failing with the Kerberos error shown below.
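For context, the integration looks roughly like this (a minimal sketch: the class name and record reader are hypothetical; as far as we can tell from the 352-era plugin code, BackgroundHiveSplitLoader matches the annotation by its simple name, so the annotation can be declared in user code):

import java.io.IOException;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

// Trino checks annotations by simple name, so declaring our own
// UseFileSplitsFromInputFormat annotation is enough to opt in.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface UseFileSplitsFromInputFormat {}

// Hypothetical custom input format: with the annotation present, Trino
// delegates split generation to getSplits() instead of listing files itself.
@UseFileSplitsFromInputFormat
public class CustomSplitInputFormat extends FileInputFormat<LongWritable, Text>
{
    @Override
    public RecordReader<LongWritable, Text> getRecordReader(InputSplit split, JobConf job, Reporter reporter)
            throws IOException
    {
        throw new UnsupportedOperationException("record reader omitted in this sketch");
    }
}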
Caused by: java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "<hostname>/<ip>"; destination host is: "<namenode-hostname>":<port>;
Full stacktrace:
Query 20230310_213839_92645_7sjqm failed: java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "<hostname>/<ip>"; destination host is: "<namenode-hostname>":<port>;
io.trino.spi.TrinoException: java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "<hostname>/<ip>"; destination host is: "<namenode-hostname>":<port>;
at io.trino.plugin.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:281)
at io.trino.plugin.hive.util.ResumableTasks$1.run(ResumableTasks.java:38)
at io.trino.$gen.Trino_li_352_base_155_g961211d_dirty____20230303_011254_2.run(Unknown Source)
at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:80)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.lang.RuntimeException: java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "<hostname>/<ip>"; destination host is: "<namenode-hostname>":<port>;
at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:665)
at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:452)
at io.trino.plugin.hive.BackgroundHiveSplitLoader.loadPartition(BackgroundHiveSplitLoader.java:486)
at io.trino.plugin.hive.BackgroundHiveSplitLoader.loadSplits(BackgroundHiveSplitLoader.java:345)
at io.trino.plugin.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:274)
... 6 more
Caused by: java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "<hostname>/<ip>"; destination host is: "<namenode-hostname>":<port>;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:808)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1558)
at org.apache.hadoop.ipc.Client.call(Client.java:1500)
at org.apache.hadoop.ipc.Client.call(Client.java:1397)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
at jdk.proxy5/jdk.proxy5.$Proxy367.getBlockLocations(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:263)
at jdk.internal.reflect.GeneratedMethodAccessor1345.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
at jdk.proxy5/jdk.proxy5.$Proxy368.getBlockLocations(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:849)
at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:838)
at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:827)
at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:332)
at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:292)
at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:276)
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1066)
at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:327)
at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:323)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:335)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:919)
at org.apache.hadoop.fs.FileBasedMountTableLoader.loadMountTableFromLocation(FileBasedMountTableLoader.java:203)
at org.apache.hadoop.fs.FileBasedMountTableLoader.loadMountTable(FileBasedMountTableLoader.java:107)
at org.apache.hadoop.fs.MountTableLoaderFactory.loadMountTable(MountTableLoaderFactory.java:83)
at org.apache.hadoop.fs.GridMountTableState.fetchLatestMountTableConfiguration(GridMountTableState.java:148)
at org.apache.hadoop.fs.GridFilesystem.loadMountTableConfigs(GridFilesystem.java:303)
at org.apache.hadoop.fs.viewfs.ViewFileSystemOverloadScheme.initialize(ViewFileSystemOverloadScheme.java:108)
at org.apache.hadoop.fs.GridFilesystem.reloadInternalMountState(GridFilesystem.java:1289)
at org.apache.hadoop.fs.GridMountTableState.reload(GridMountTableState.java:64)
at org.apache.hadoop.fs.GridFilesystem.initialize(GridFilesystem.java:182)
at io.trino.plugin.hive.fs.TrinoFileSystemCache.createFileSystem(TrinoFileSystemCache.java:150)
at io.trino.plugin.hive.fs.TrinoFileSystemCache$FileSystemHolder.createFileSystemOnce(TrinoFileSystemCache.java:332)
at io.trino.plugin.hive.fs.TrinoFileSystemCache.getInternal(TrinoFileSystemCache.java:129)
at io.trino.plugin.hive.fs.TrinoFileSystemCache.get(TrinoFileSystemCache.java:88)
at org.apache.hadoop.fs.ForwardingFileSystemCache.get(ForwardingFileSystemCache.java:38)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:475)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:228)
at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:661)
... 10 more
Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:783)
at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
at java.base/javax.security.auth.Subject.doAs(Subject.java:439)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1905)
at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:743)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:840)
at org.apache.hadoop.ipc.Client$Connection.access$3700(Client.java:423)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1615)
at org.apache.hadoop.ipc.Client.call(Client.java:1444)
... 53 more
Caused by: javax.security.sasl.SaslException: GSS initiate failed
at jdk.security.jgss/com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:228)
at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:414)
at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:629)
at org.apache.hadoop.ipc.Client$Connection.access$2200(Client.java:423)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:827)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:823)
at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
at java.base/javax.security.auth.Subject.doAs(Subject.java:439)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1905)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:823)
... 56 more
Caused by: org.ietf.jgss.GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
at java.security.jgss/sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:164)
at java.security.jgss/sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:126)
at java.security.jgss/sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:195)
at java.security.jgss/sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:205)
at java.security.jgss/sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:230)
at java.security.jgss/sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:196)
at jdk.security.jgss/com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:209)
... 65 more
Root cause
Remote debugging showed that FileInputFormat.setInputPaths, which is only supposed to set some parameters in the JobConf, internally makes a call to the NameNode to fetch the working directory and populate mapreduce.job.working.dir. This call started failing once the Kerberos ticket expired.
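For reference, here is the relevant Hadoop code path, abridged from org.apache.hadoop.mapred.JobConf (exact lines vary by Hadoop version). setInputPaths resolves its paths against getWorkingDirectory(), and on first use that asks the default FileSystem for the working directory; with our GridFilesystem that initialization reads the mount table from HDFS, which is the Kerberos-authenticated NameNode RPC in the stack trace above:

// Abridged from org.apache.hadoop.mapred.JobConf
public Path getWorkingDirectory() {
    String name = get("mapreduce.job.working.dir");
    if (name != null) {
        return new Path(name);
    }
    try {
        // FileSystem.get() may initialize the filesystem; in our setup that
        // opens the mount table file on HDFS -> NameNode RPC -> Kerberos.
        Path dir = FileSystem.get(this).getWorkingDirectory();
        set("mapreduce.job.working.dir", dir.toString());
        return dir;
    } catch (IOException e) {
        // This is the RuntimeException at JobConf.getWorkingDirectory(JobConf.java:665)
        // in the stack trace above.
        throw new RuntimeException(e);
    }
}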
When this call is not wrapped inside Trino's HdfsEnvironment, it picks up UGI information via a UGI.loginUserFromSubject(null) call. That sets the externalKeyTab flag to true and injects only the TGT into the UGI: the keytab is treated as externally managed, so UGI will not attempt to log in from the keytab or renew the ticket.
When we instead wrap the call inside hdfsEnvironment.doAs, Trino's KerberosAuthentication code creates a login context and injects the Presto keytab into it. A UGI created this way renews the ticket correctly.
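A minimal sketch of the fix we have in mind (abridged from BackgroundHiveSplitLoader.loadPartition; the exact doAs signature differs across Trino versions, and ours takes the user name):

// Before: setInputPaths() runs outside any doAs, so the NameNode RPC it
// triggers sees an ambient UGI with no renewable keytab login.
FileInputFormat.setInputPaths(jobConf, path);

// After (sketch): run it under hdfsEnvironment.doAs so the UGI created by
// Trino's KerberosAuthentication, with the keytab attached, is in effect.
hdfsEnvironment.doAs(hdfsContext.getIdentity().getUser(), () ->
        FileInputFormat.setInputPaths(jobConf, path));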
Will open a PR for this.