
[Bug]: BEARER auth not working with REST API #10215

Open

meatheadmike opened this issue Jan 17, 2025 · 7 comments
What happened

BEARER authentication works fine when hitting the Nessie API. However, when I switch to REST, the endpoint simply freezes. I've tried making requests with curl, and they just sit there waiting for a result until timeout.

The HTTP access log entry looks like this:

2025-01-17 16:44:54,710 INFO [io.qua.htt.access-log] (vert.x-eventloop-thread-0) 10.239.21.45 - - [17/Jan/2025:16:44:54 +0000] "GET /iceberg/v1/config HTTP/1.1" 200 -

So the request is making it through to the server. Something is causing it to hang, though, and I'm not sure how to debug it further.

As I said, switching to the Nessie API makes it work, and if I turn off authentication with REST enabled, it also works.

How to reproduce it

Here's how I enable the REST API and disable Nessie in Spark:

spark.sql.catalog.iceberg.type rest
spark.sql.catalog.iceberg.uri http://nessie.nessie:19120/iceberg/
#
##spark.sql.catalog.iceberg.warehouse s3a://my-bucket/iceberg-warehouse
##spark.sql.catalog.iceberg.type nessie
##spark.sql.catalog.iceberg.uri http://nessie.nessie:19120/api/v2
##spark.sql.catalog.nessie.catalog-impl org.apache.iceberg.nessie.NessieCatalog

Nessie server type (docker/uber-jar/built from source) and version

docker/helm 0.101.3

Client type (Ex: UI/Spark/pynessie ...) and version

spark 3.5.4

Additional information

No response

dimas-b (Member) commented Jan 17, 2025

Could you show your complete Spark session config for the REST API use case (masking the secret stuff, of course)?

meatheadmike (Author) commented:

Certainly:

spark.broadcast.compress true
spark.checkpoint.compress true
#spark.driver.log.dfsDir s3a://XXXXXXXX
#spark.driver.log.persistToDfs.enabled true
spark.driver.maxResultSize 2g
spark.dynamicAllocation.shuffleTracking.enabled true
spark.eventLog.compress true
spark.eventLog.compression.codec snappy
#spark.eventLog.dir s3a://XXXXXXXX
spark.eventLog.enabled false
spark.eventLog.rolling.enabled true
spark.eventLog.rolling.maxFileSize 20m
spark.executor.memoryOverhead 2g
spark.hadoop.fs.s3a.aws.credentials.provider com.amazonaws.auth.WebIdentityTokenCredentialsProvider
spark.hadoop.fs.s3a.fast.upload true
spark.hadoop.fs.s3a.impl org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3a.path.style.access true
spark.hadoop.parquet.avro.write-old-list-structure false
spark.io.compression.codec snappy
spark.jars.ivySettings file:///opt/spark/config/ivy-settings.xml
spark.kryoserializer.buffer.max 1024m
spark.memory.fraction 0.2
spark.memory.storageFraction 0.2
spark.rdd.compress true
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.shuffle.compress true
# leave disabled as kubernetes doesn't support external shuffle service:
spark.shuffle.service.enabled false
spark.shuffle.spill.compress true
spark.speculation false
#spark.streaming.concurrentJobs 2
spark.streaming.stopGracefullyOnShutdown true
spark.sql.catalog.iceberg org.apache.iceberg.spark.SparkCatalog
spark.sql.defaultCatalog iceberg
spark.sql.catalog.iceberg.authentication.type BEARER
spark.sql.catalog.iceberg.authentication.token XXXXXXXX
#
# REST API settings:
#
spark.sql.catalog.iceberg.type rest
spark.sql.catalog.iceberg.uri http://nessie.nessie:19120/iceberg
#
# Nessie API settings:
#
##spark.sql.catalog.iceberg.warehouse s3a://XXXXXXXX
##spark.sql.catalog.iceberg.type nessie
##spark.sql.catalog.iceberg.uri http://nessie.nessie:19120/api/v2
##spark.sql.catalog.iceberg.catalog-impl org.apache.iceberg.nessie.NessieCatalog
#
spark.sql.catalogImplementation in-memory
spark.sql.catalog.spark_catalog org.apache.iceberg.spark.SparkSessionCatalog
spark.sql.extensions org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,org.projectnessie.spark.extensions.NessieSparkSessionExtensions
spark.sql.parquet.enableVectorizedReader true
spark.sql.shuffle.partitions 500
spark.sql.sources.partitionOverwriteMode dynamic
spark.sql.streaming.stateStore.compression.codec snappy
spark.sql.streaming.stateStore.providerClass org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreProvider
spark.sql.warehouse.dir /tmp/warehouse
spark.ui.prometheus.enabled true
#
# S3 optimization settings (see https://spark.apache.org/docs/latest/cloud-integration.html):
#
spark.hadoop.fs.s3a.committer.name directory
spark.hadoop.parquet.enable.summary-metadata false
spark.sql.hive.metastorePartitionPruning true
spark.sql.orc.cache.stripe.details.size 10000
spark.sql.orc.filterPushdown true
spark.sql.orc.splits.include.file.footer true
spark.sql.parquet.filterPushdown true
spark.sql.parquet.mergeSchema false
spark.sql.parquet.output.committer.class org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter
spark.sql.sources.commitProtocolClass org.apache.spark.internal.io.cloud.PathOutputCommitProtocol
spark.sql.streaming.checkpointFileManagerClass org.apache.spark.internal.io.cloud.AbortableStreamBasedCheckpointFileManager
#
# S3A performance settings from https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/performance.html:
#
spark.hadoop.fs.s3a.vectored.read.min.seek.size 4K
spark.hadoop.fs.s3a.vectored.read.max.merged.size 1M
spark.hadoop.fs.s3a.vectored.active.ranged.reads 4
spark.hadoop.fs.s3a.experimental.input.fadvise random
spark.hadoop.fs.s3a.performance.flag *
spark.hadoop.fs.s3a.block.size 128M
spark.hadoop.fs.s3a.retry.throttle.limit 20
spark.hadoop.fs.s3a.retry.throttle.interval 500ms

meatheadmike (Author) commented:

I should mention that I turned on debug logging in Spark. The actual call to the Nessie server does not appear to contain the bearer token. When I attempt the same call using curl, it also hangs unless I add the "Authorization: Bearer" header (see the curl sketch after the log). Here's the output from the HTTP request:

25/01/18 02:25:49 DEBUG headers: http-outgoing-0 >> GET /iceberg/v1/config HTTP/1.1
25/01/18 02:25:49 DEBUG headers: http-outgoing-0 >> Accept: application/json
25/01/18 02:25:49 DEBUG headers: http-outgoing-0 >> Content-Type: application/json
25/01/18 02:25:49 DEBUG headers: http-outgoing-0 >> Accept-Encoding: gzip, x-gzip, deflate
25/01/18 02:25:49 DEBUG headers: http-outgoing-0 >> Host: nessie.nessie:19120
25/01/18 02:25:49 DEBUG headers: http-outgoing-0 >> X-Client-Git-Commit-Short: 5f7c992
25/01/18 02:25:49 DEBUG headers: http-outgoing-0 >> X-Client-Version: Apache Iceberg 1.7.0 (commit 5f7c992ca673bf41df1d37543b24d646c24568a9)
25/01/18 02:25:49 DEBUG headers: http-outgoing-0 >> Connection: keep-alive
25/01/18 02:25:49 DEBUG headers: http-outgoing-0 >> User-Agent: Apache-HttpClient/5.4 (Java/17.0.13)
25/01/18 02:25:49 DEBUG headers: http-outgoing-0 >> Upgrade: TLS/1.2
25/01/18 02:25:49 DEBUG headers: http-outgoing-0 >> Connection: Upgrade
25/01/18 02:25:49 DEBUG wire: http-outgoing-0 >> "GET /iceberg/v1/config HTTP/1.1[\r][\n]"
25/01/18 02:25:49 DEBUG wire: http-outgoing-0 >> "Accept: application/json[\r][\n]"
25/01/18 02:25:49 DEBUG wire: http-outgoing-0 >> "Content-Type: application/json[\r][\n]"
25/01/18 02:25:49 DEBUG wire: http-outgoing-0 >> "Accept-Encoding: gzip, x-gzip, deflate[\r][\n]"
25/01/18 02:25:49 DEBUG wire: http-outgoing-0 >> "Host: nessie.nessie:19120[\r][\n]"
25/01/18 02:25:49 DEBUG wire: http-outgoing-0 >> "X-Client-Git-Commit-Short: 5f7c992[\r][\n]"
25/01/18 02:25:49 DEBUG wire: http-outgoing-0 >> "X-Client-Version: Apache Iceberg 1.7.0 (commit 5f7c992ca673bf41df1d37543b24d646c24568a9)[\r][\n]"
25/01/18 02:25:49 DEBUG wire: http-outgoing-0 >> "Connection: keep-alive[\r][\n]"
25/01/18 02:25:49 DEBUG wire: http-outgoing-0 >> "User-Agent: Apache-HttpClient/5.4 (Java/17.0.13)[\r][\n]"
25/01/18 02:25:49 DEBUG wire: http-outgoing-0 >> "Upgrade: TLS/1.2[\r][\n]"
25/01/18 02:25:49 DEBUG wire: http-outgoing-0 >> "Connection: Upgrade[\r][\n]"
25/01/18 02:25:49 DEBUG wire: http-outgoing-0 >> "[\r][\n]"
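
For reference, the curl calls look roughly like this ($NESSIE_TOKEN is a placeholder for the real bearer token; the host is the in-cluster service name from my config):

# hangs until timeout, no Authorization header:
curl http://nessie.nessie:19120/iceberg/v1/config
# completes once the bearer token is supplied ($NESSIE_TOKEN is a placeholder):
curl -H "Authorization: Bearer $NESSIE_TOKEN" http://nessie.nessie:19120/iceberg/v1/config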

dimas-b (Member) commented Jan 18, 2025

This looks like an Iceberg Catalog (in Spark) configuration issue to me (i.e. not Nessie 🙂 ).

I think that since spark.sql.catalog.iceberg.type=rest, spark.sql.catalog.iceberg.authentication.token=*** should probably be spark.sql.catalog.iceberg.token=***.

Cf. https://github.com/apache/iceberg/blob/63af974efe51486c89bff8df5416781ab3181976/core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java#L984
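
If that's right, the REST portion of the config above would look something like this (token masked):

spark.sql.catalog.iceberg.type rest
spark.sql.catalog.iceberg.uri http://nessie.nessie:19120/iceberg
spark.sql.catalog.iceberg.token XXXXXXXX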

dimas-b (Member) commented Jan 18, 2025

Not sure why the call hangs... If that is still an issue, it might be best to take that discussion to Nessie's Zulip Chat.

meatheadmike (Author) commented:

You are correct. It works when I set spark.sql.catalog.iceberg.token. BUT - I have to set BOTH spark.sql.catalog.iceberg.token and spark.sql.catalog.iceberg.authentication.token or I get an error message!
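
For reference, the combination that works (tokens masked) looks like this:

spark.sql.catalog.iceberg.type rest
spark.sql.catalog.iceberg.uri http://nessie.nessie:19120/iceberg
# used by the Iceberg REST client:
spark.sql.catalog.iceberg.token XXXXXXXX
# still required alongside it, otherwise an error is raised:
spark.sql.catalog.iceberg.authentication.type BEARER
spark.sql.catalog.iceberg.authentication.token XXXXXXXX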

dimas-b (Member) commented Jan 20, 2025

The latter (longer) property is probably required for NessieSparkSessionExtensions, because the Nessie SQL extensions work via the Nessie API (not the Iceberg REST API).

Nessie's support for authentication options is much broader than Iceberg's. Hence, it is probably not practical for Nessie's SQL extensions to even try to integrate with Iceberg's auth properties at this time. There will always be rough edges on that path.

If you need to use Nessie's SQL extensions with the Iceberg REST catalog in the same Spark session, I think the best approach is to configure authentication for both APIs separately for now.

Reconciliation of auth features between the Nessie Client and Iceberg REST Client may be possible later, depending on how apache/iceberg#11995 (and related PRs) go on the Iceberg side.
