
[Bug]: BEARER auth not working with REST API #10215

Open

meatheadmike opened this issue Jan 17, 2025 · 7 comments
What happened

BEARER authentication works fine when hitting the Nessie API. However, when I switch to REST, the endpoint simply freezes. I've tried making requests with curl, and they just sit there waiting for a result until timeout.

The HTTP access log entry looks like this:

2025-01-17 16:44:54,710 INFO [io.qua.htt.access-log] (vert.x-eventloop-thread-0) 10.239.21.45 - - [17/Jan/2025:16:44:54 +0000] "GET /iceberg/v1/config HTTP/1.1" 200 -

So the request is making it through to the server. Something is causing it to hang, though, and I'm not sure how to debug it further.

As I said, switching to the Nessie API makes it work, and if I turn off authentication with REST enabled, it also works.

How to reproduce it

Here's how I enable the REST API and disable Nessie in Spark:

spark.sql.catalog.iceberg.type rest
spark.sql.catalog.iceberg.uri http://nessie.nessie:19120/iceberg/
#
##spark.sql.catalog.iceberg.warehouse s3a://my-bucket/iceberg-warehouse
##spark.sql.catalog.iceberg.type nessie
##spark.sql.catalog.iceberg.uri http://nessie.nessie:19120/api/v2
##spark.sql.catalog.nessie.catalog-impl org.apache.iceberg.nessie.NessieCatalog

Nessie server type (docker/uber-jar/built from source) and version

docker/helm 0.101.3

Client type (Ex: UI/Spark/pynessie ...) and version

spark 3.5.4

Additional information

No response

dimas-b (Member) commented Jan 17, 2025

Could you show your complete Spark session config for the REST API use case (masking the secret stuff, of course)?

meatheadmike (Author) commented:

Certainly:

spark.broadcast.compress true
spark.checkpoint.compress true
#spark.driver.log.dfsDir s3a://XXXXXXXX
#spark.driver.log.persistToDfs.enabled true
spark.driver.maxResultSize 2g
spark.dynamicAllocation.shuffleTracking.enabled true
spark.eventLog.compress true
spark.eventLog.compression.codec snappy
#spark.eventLog.dir s3a://XXXXXXXX
spark.eventLog.enabled false
spark.eventLog.rolling.enabled true
spark.eventLog.rolling.maxFileSize 20m
spark.executor.memoryOverhead 2g
spark.hadoop.fs.s3a.aws.credentials.provider com.amazonaws.auth.WebIdentityTokenCredentialsProvider
spark.hadoop.fs.s3a.fast.upload true
spark.hadoop.fs.s3a.impl org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3a.path.style.access true
spark.hadoop.parquet.avro.write-old-list-structure false
spark.io.compression.codec snappy
spark.jars.ivySettings file:///opt/spark/config/ivy-settings.xml
spark.kryoserializer.buffer.max 1024m
spark.memory.fraction 0.2
spark.memory.storageFraction 0.2
spark.rdd.compress true
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.shuffle.compress true
# leave disabled as kubernetes doesn't support external shuffle service:
spark.shuffle.service.enabled false
spark.shuffle.spill.compress true
spark.speculation false
#spark.streaming.concurrentJobs 2
spark.streaming.stopGracefullyOnShutdown true
spark.sql.catalog.iceberg org.apache.iceberg.spark.SparkCatalog
spark.sql.defaultCatalog iceberg
spark.sql.catalog.iceberg.authentication.type BEARER
spark.sql.catalog.iceberg.authentication.token XXXXXXXX
#
# REST API settings:
#
spark.sql.catalog.iceberg.type rest
spark.sql.catalog.iceberg.uri http://nessie.nessie:19120/iceberg
#
# Nessie API settings:
#
##spark.sql.catalog.iceberg.warehouse s3a://XXXXXXXX
##spark.sql.catalog.iceberg.type nessie
##spark.sql.catalog.iceberg.uri http://nessie.nessie:19120/api/v2
##spark.sql.catalog.iceberg.catalog-impl org.apache.iceberg.nessie.NessieCatalog
#
spark.sql.catalogImplementation in-memory
spark.sql.catalog.spark_catalog org.apache.iceberg.spark.SparkSessionCatalog
spark.sql.extensions org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,org.projectnessie.spark.extensions.NessieSparkSessionExtensions
spark.sql.parquet.enableVectorizedReader true
spark.sql.shuffle.partitions 500
spark.sql.sources.partitionOverwriteMode dynamic
spark.sql.streaming.stateStore.compression.codec snappy
spark.sql.streaming.stateStore.providerClass org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreProvider
spark.sql.warehouse.dir /tmp/warehouse
spark.ui.prometheus.enabled true
#
# S3 optimization settings (see https://spark.apache.org/docs/latest/cloud-integration.html):
#
spark.hadoop.fs.s3a.committer.name directory
spark.hadoop.parquet.enable.summary-metadata false
spark.sql.hive.metastorePartitionPruning true
spark.sql.orc.cache.stripe.details.size 10000
spark.sql.orc.filterPushdown true
spark.sql.orc.splits.include.file.footer true
spark.sql.parquet.filterPushdown true
spark.sql.parquet.mergeSchema false
spark.sql.parquet.output.committer.class org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter
spark.sql.sources.commitProtocolClass org.apache.spark.internal.io.cloud.PathOutputCommitProtocol
spark.sql.streaming.checkpointFileManagerClass org.apache.spark.internal.io.cloud.AbortableStreamBasedCheckpointFileManager
#
# S3A performance settings from https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/performance.html:
#
spark.hadoop.fs.s3a.vectored.read.min.seek.size 4K
spark.hadoop.fs.s3a.vectored.read.max.merged.size 1M
spark.hadoop.fs.s3a.vectored.active.ranged.reads 4
spark.hadoop.fs.s3a.experimental.input.fadvise random
spark.hadoop.fs.s3a.performance.flag *
spark.hadoop.fs.s3a.block.size 128M
spark.hadoop.fs.s3a.retry.throttle.limit 20
spark.hadoop.fs.s3a.retry.throttle.interval 500ms

meatheadmike (Author) commented:

I should mention that I turned on debug logging in Spark. The actual call to the Nessie server does not appear to contain the bearer token. When I attempt the same call using curl, it also hangs unless I add the "Authorization: Bearer" header (see the curl sketch after the log). Here's the output from the HTTP request:

25/01/18 02:25:49 DEBUG headers: http-outgoing-0 >> GET /iceberg/v1/config HTTP/1.1
25/01/18 02:25:49 DEBUG headers: http-outgoing-0 >> Accept: application/json
25/01/18 02:25:49 DEBUG headers: http-outgoing-0 >> Content-Type: application/json
25/01/18 02:25:49 DEBUG headers: http-outgoing-0 >> Accept-Encoding: gzip, x-gzip, deflate
25/01/18 02:25:49 DEBUG headers: http-outgoing-0 >> Host: nessie.nessie:19120
25/01/18 02:25:49 DEBUG headers: http-outgoing-0 >> X-Client-Git-Commit-Short: 5f7c992
25/01/18 02:25:49 DEBUG headers: http-outgoing-0 >> X-Client-Version: Apache Iceberg 1.7.0 (commit 5f7c992ca673bf41df1d37543b24d646c24568a9)
25/01/18 02:25:49 DEBUG headers: http-outgoing-0 >> Connection: keep-alive
25/01/18 02:25:49 DEBUG headers: http-outgoing-0 >> User-Agent: Apache-HttpClient/5.4 (Java/17.0.13)
25/01/18 02:25:49 DEBUG headers: http-outgoing-0 >> Upgrade: TLS/1.2
25/01/18 02:25:49 DEBUG headers: http-outgoing-0 >> Connection: Upgrade
25/01/18 02:25:49 DEBUG wire: http-outgoing-0 >> "GET /iceberg/v1/config HTTP/1.1[\r][\n]"
25/01/18 02:25:49 DEBUG wire: http-outgoing-0 >> "Accept: application/json[\r][\n]"
25/01/18 02:25:49 DEBUG wire: http-outgoing-0 >> "Content-Type: application/json[\r][\n]"
25/01/18 02:25:49 DEBUG wire: http-outgoing-0 >> "Accept-Encoding: gzip, x-gzip, deflate[\r][\n]"
25/01/18 02:25:49 DEBUG wire: http-outgoing-0 >> "Host: nessie.nessie:19120[\r][\n]"
25/01/18 02:25:49 DEBUG wire: http-outgoing-0 >> "X-Client-Git-Commit-Short: 5f7c992[\r][\n]"
25/01/18 02:25:49 DEBUG wire: http-outgoing-0 >> "X-Client-Version: Apache Iceberg 1.7.0 (commit 5f7c992ca673bf41df1d37543b24d646c24568a9)[\r][\n]"
25/01/18 02:25:49 DEBUG wire: http-outgoing-0 >> "Connection: keep-alive[\r][\n]"
25/01/18 02:25:49 DEBUG wire: http-outgoing-0 >> "User-Agent: Apache-HttpClient/5.4 (Java/17.0.13)[\r][\n]"
25/01/18 02:25:49 DEBUG wire: http-outgoing-0 >> "Upgrade: TLS/1.2[\r][\n]"
25/01/18 02:25:49 DEBUG wire: http-outgoing-0 >> "Connection: Upgrade[\r][\n]"
25/01/18 02:25:49 DEBUG wire: http-outgoing-0 >> "[\r][\n]"
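
For reference, the curl calls look roughly like this ($NESSIE_TOKEN is a placeholder for the real bearer token; the host is the in-cluster service name from my config):

# hangs until timeout, no Authorization header:
curl http://nessie.nessie:19120/iceberg/v1/config
# completes once the bearer token is supplied ($NESSIE_TOKEN is a placeholder):
curl -H "Authorization: Bearer $NESSIE_TOKEN" http://nessie.nessie:19120/iceberg/v1/config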

dimas-b (Member) commented Jan 18, 2025

This looks like an Iceberg Catalog (in Spark) configuration issue to me (i.e. not Nessie 🙂 ).

I think that since spark.sql.catalog.iceberg.type=rest, spark.sql.catalog.iceberg.authentication.token=*** should probably be spark.sql.catalog.iceberg.token=***.

Cf. https://github.com/apache/iceberg/blob/63af974efe51486c89bff8df5416781ab3181976/core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java#L984
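
If that's right, the REST portion of the config above would look something like this (token masked):

spark.sql.catalog.iceberg.type rest
spark.sql.catalog.iceberg.uri http://nessie.nessie:19120/iceberg
spark.sql.catalog.iceberg.token XXXXXXXX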

dimas-b (Member) commented Jan 18, 2025

Not sure why the call hangs... If that is still an issue, it might be best to take that discussion to Nessie's Zulip Chat.

meatheadmike (Author) commented:

You are correct. It works when I set spark.sql.catalog.iceberg.token. BUT - I have to set BOTH spark.sql.catalog.iceberg.token and spark.sql.catalog.iceberg.authentication.token or I get an error message!
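
For reference, the combination that works (tokens masked) looks like this:

spark.sql.catalog.iceberg.type rest
spark.sql.catalog.iceberg.uri http://nessie.nessie:19120/iceberg
# used by the Iceberg REST client:
spark.sql.catalog.iceberg.token XXXXXXXX
# still required alongside it, otherwise an error is raised:
spark.sql.catalog.iceberg.authentication.type BEARER
spark.sql.catalog.iceberg.authentication.token XXXXXXXX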

dimas-b (Member) commented Jan 20, 2025

The latter (longer) property is probably required for NessieSparkSessionExtensions, because the Nessie SQL extensions work via the Nessie API (not the Iceberg REST API).

Nessie's support for authentication options is much broader than Iceberg's. Hence, it is probably not practical for Nessie's SQL extensions to even try to integrate with Iceberg's auth properties at this time. There will always be rough edges on that path.

If you need to use Nessie's SQL extensions with the Iceberg REST catalog in the same Spark session, I think the best approach is to configure authentication for both APIs separately for now.

Reconciliation of auth features between the Nessie Client and Iceberg REST Client may be possible later, depending on how apache/iceberg#11995 (and related PRs) go on the Iceberg side.
