[Bug]: BEARER auth not working with REST API #10215
Comments
Could you show your complete Spark session config for the REST API use case (masking the secret stuff, of course)?
Certainly:
spark.broadcast.compress true
spark.checkpoint.compress true
#spark.driver.log.dfsDir s3a://XXXXXXXX
#spark.driver.log.persistToDfs.enabled true
spark.driver.maxResultSize 2g
spark.dynamicAllocation.shuffleTracking.enabled true
spark.eventLog.compress true
spark.eventLog.compression.codec snappy
#spark.eventLog.dir s3a://XXXXXXXX
spark.eventLog.enabled false
spark.eventLog.rolling.enabled true
spark.eventLog.rolling.maxFileSize 20m
spark.executor.memoryOverhead 2g
spark.hadoop.fs.s3a.aws.credentials.provider com.amazonaws.auth.WebIdentityTokenCredentialsProvider
spark.hadoop.fs.s3a.fast.upload true
spark.hadoop.fs.s3a.impl org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3a.path.style.access true
spark.hadoop.parquet.avro.write-old-list-structure false
spark.io.compression.codec snappy
spark.jars.ivySettings file:///opt/spark/config/ivy-settings.xml
spark.kryoserializer.buffer.max 1024m
spark.memory.fraction 0.2
spark.memory.storageFraction 0.2
spark.rdd.compress true
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.shuffle.compress true
# leave disabled as kubernetes doesn't support external shuffle service:
spark.shuffle.service.enabled false
spark.shuffle.spill.compress true
spark.speculation false
#spark.streaming.concurrentJobs 2
spark.streaming.stopGracefullyOnShutdown true
spark.sql.catalog.iceberg org.apache.iceberg.spark.SparkCatalog
spark.sql.defaultCatalog iceberg
spark.sql.catalog.iceberg.authentication.type BEARER
spark.sql.catalog.iceberg.authentication.token XXXXXXXX
#
# REST API settings:
#
spark.sql.catalog.iceberg.type rest
spark.sql.catalog.iceberg.uri http://nessie.nessie:19120/iceberg
#
# Nessie API settings:
#
##spark.sql.catalog.iceberg.warehouse s3a://XXXXXXXX
##spark.sql.catalog.iceberg.type nessie
##spark.sql.catalog.iceberg.uri http://nessie.nessie:19120/api/v2
##spark.sql.catalog.iceberg.catalog-impl org.apache.iceberg.nessie.NessieCatalog
#
spark.sql.catalogImplementation in-memory
spark.sql.catalog.spark_catalog org.apache.iceberg.spark.SparkSessionCatalog
spark.sql.extensions org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,org.projectnessie.spark.extensions.NessieSparkSessionExtensions
spark.sql.parquet.enableVectorizedReader true
spark.sql.shuffle.partitions 500
spark.sql.sources.partitionOverwriteMode dynamic
spark.sql.streaming.stateStore.compression.codec snappy
spark.sql.streaming.stateStore.providerClass org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreProvider
spark.sql.warehouse.dir /tmp/warehouse
spark.ui.prometheus.enabled true
#
# S3 optimization settings (see https://spark.apache.org/docs/latest/cloud-integration.html):
#
spark.hadoop.fs.s3a.committer.name directory
spark.hadoop.parquet.enable.summary-metadata false
spark.sql.hive.metastorePartitionPruning true
spark.sql.orc.cache.stripe.details.size 10000
spark.sql.orc.filterPushdown true
spark.sql.orc.splits.include.file.footer true
spark.sql.parquet.filterPushdown true
spark.sql.parquet.mergeSchema false
spark.sql.parquet.output.committer.class org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter
spark.sql.sources.commitProtocolClass org.apache.spark.internal.io.cloud.PathOutputCommitProtocol
spark.sql.streaming.checkpointFileManagerClass org.apache.spark.internal.io.cloud.AbortableStreamBasedCheckpointFileManager
#
# S3A performance settings from https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/performance.html:
#
spark.hadoop.fs.s3a.vectored.read.min.seek.size 4K
spark.hadoop.fs.s3a.vectored.read.max.merged.size 1M
spark.hadoop.fs.s3a.vectored.active.ranged.reads 4
spark.hadoop.fs.s3a.experimental.input.fadvise random
spark.hadoop.fs.s3a.performance.flag *
spark.hadoop.fs.s3a.block.size 128M
spark.hadoop.fs.s3a.retry.throttle.limit 20
spark.hadoop.fs.s3a.retry.throttle.interval 500ms
I should mention that I turned on debug logging in Spark. The actual call to the Nessie server does not appear to contain the bearer token. When I attempt the same call using curl, it also hangs unless I add the "Authorization: Bearer" header (a curl sketch follows the log below). Here's the output from the HTTP request call:
25/01/18 02:25:49 DEBUG headers: http-outgoing-0 >> GET /iceberg/v1/config HTTP/1.1
25/01/18 02:25:49 DEBUG headers: http-outgoing-0 >> Accept: application/json
25/01/18 02:25:49 DEBUG headers: http-outgoing-0 >> Content-Type: application/json
25/01/18 02:25:49 DEBUG headers: http-outgoing-0 >> Accept-Encoding: gzip, x-gzip, deflate
25/01/18 02:25:49 DEBUG headers: http-outgoing-0 >> Host: nessie.nessie:19120
25/01/18 02:25:49 DEBUG headers: http-outgoing-0 >> X-Client-Git-Commit-Short: 5f7c992
25/01/18 02:25:49 DEBUG headers: http-outgoing-0 >> X-Client-Version: Apache Iceberg 1.7.0 (commit 5f7c992ca673bf41df1d37543b24d646c24568a9)
25/01/18 02:25:49 DEBUG headers: http-outgoing-0 >> Connection: keep-alive
25/01/18 02:25:49 DEBUG headers: http-outgoing-0 >> User-Agent: Apache-HttpClient/5.4 (Java/17.0.13)
25/01/18 02:25:49 DEBUG headers: http-outgoing-0 >> Upgrade: TLS/1.2
25/01/18 02:25:49 DEBUG headers: http-outgoing-0 >> Connection: Upgrade
25/01/18 02:25:49 DEBUG wire: http-outgoing-0 >> "GET /iceberg/v1/config HTTP/1.1[\r][\n]"
25/01/18 02:25:49 DEBUG wire: http-outgoing-0 >> "Accept: application/json[\r][\n]"
25/01/18 02:25:49 DEBUG wire: http-outgoing-0 >> "Content-Type: application/json[\r][\n]"
25/01/18 02:25:49 DEBUG wire: http-outgoing-0 >> "Accept-Encoding: gzip, x-gzip, deflate[\r][\n]"
25/01/18 02:25:49 DEBUG wire: http-outgoing-0 >> "Host: nessie.nessie:19120[\r][\n]"
25/01/18 02:25:49 DEBUG wire: http-outgoing-0 >> "X-Client-Git-Commit-Short: 5f7c992[\r][\n]"
25/01/18 02:25:49 DEBUG wire: http-outgoing-0 >> "X-Client-Version: Apache Iceberg 1.7.0 (commit 5f7c992ca673bf41df1d37543b24d646c24568a9)[\r][\n]"
25/01/18 02:25:49 DEBUG wire: http-outgoing-0 >> "Connection: keep-alive[\r][\n]"
25/01/18 02:25:49 DEBUG wire: http-outgoing-0 >> "User-Agent: Apache-HttpClient/5.4 (Java/17.0.13)[\r][\n]"
25/01/18 02:25:49 DEBUG wire: http-outgoing-0 >> "Upgrade: TLS/1.2[\r][\n]"
25/01/18 02:25:49 DEBUG wire: http-outgoing-0 >> "Connection: Upgrade[\r][\n]"
25/01/18 02:25:49 DEBUG wire: http-outgoing-0 >> "[\r][\n]"
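For reference, here is roughly what that curl check looks like ($TOKEN is a placeholder for the actual bearer token):

# hangs until timeout: no Authorization header is sent
curl http://nessie.nessie:19120/iceberg/v1/config
# returns the catalog config once the bearer token is supplied
curl -H "Authorization: Bearer $TOKEN" http://nessie.nessie:19120/iceberg/v1/config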
This looks like an Iceberg Catalog (in Spark) configuration issue to me (i.e. not Nessie 🙂). I think that since authentication.type and authentication.token are Nessie client properties, the Iceberg REST client does not pick them up, which would explain why no Authorization header appears in the request.
Not sure why the call hangs... If that is still an issue, it might be best to take that discussion to Nessie's Zulip Chat.
You are correct. It works when I set
The latter (longer) property is probably required because Nessie's support for authentication options is much broader than Iceberg's. Hence, it is probably not practical for Nessie's SQL extensions to even try to integrate with Iceberg's auth properties at this time. There will always be rough edges on that path. If you need to use Nessie's SQL extensions with the Iceberg REST Catalog in the same Spark session, I think the best approach is to configure authentication for both APIs separately for now. Reconciliation of auth features between the Nessie Client and Iceberg REST Client may be possible later, depending on how apache/iceberg#11995 (and related PRs) go on the Iceberg side.
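For anyone landing here with the same symptom, here is a minimal sketch of configuring both clients separately in one Spark session. The Iceberg-side token property is an assumption based on Iceberg's REST catalog options; verify it against your Iceberg version.

# Iceberg REST client: assumed to accept a static bearer token via "token"
spark.sql.catalog.iceberg.type rest
spark.sql.catalog.iceberg.uri http://nessie.nessie:19120/iceberg
spark.sql.catalog.iceberg.token XXXXXXXX
# Nessie client (SQL extensions): authenticated via the Nessie-specific properties
spark.sql.catalog.iceberg.authentication.type BEARER
spark.sql.catalog.iceberg.authentication.token XXXXXXXX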
What happened
BEARER authentication seems to work fine when hitting the Nessie API. However, when I switch to REST, the endpoint simply freezes. I've tried making requests with curl and it just sits there waiting for a result until it times out.
The HTTP log entry looks like this:
2025-01-17 16:44:54,710 INFO [io.qua.htt.access-log] (vert.x-eventloop-thread-0) 10.239.21.45 - - [17/Jan/2025:16:44:54 +0000] "GET /iceberg/v1/config HTTP/1.1" 200 -
So the request is making it through to the server. Something is causing it to hang, though, and I'm not sure how to debug further. Like I said, switching to the Nessie API makes it work, and if I turn off authentication with REST enabled, it also works.
How to reproduce it
Here's how I enable the REST API and disable Nessie from Spark (see the "REST API settings" and the commented-out "Nessie API settings" in the config posted above):
Nessie server type (docker/uber-jar/built from source) and version
docker/helm 0.101.3
Client type (Ex: UI/Spark/pynessie ...) and version
spark 3.5.4
Additional information
No response