
Support to skip schema inference #45

Merged: 4 commits, Mar 20, 2024

devendrasr

The current version does not support parsing complex data types when inferring the schema from the snapshot.
Until support for complex data types lands, I am introducing a flag that skips this step. It offloads schema parsing to the underlying parquet extension. Here is how to use it:

scan data:

SELECT * FROM iceberg_scan('s3://my-bucket/icebergwh/someschema/t01', skip_schema_inference = true) LIMIT 10;

scan metadata:

SELECT * FROM iceberg_metadata('s3://my-bucket/icebergwh/someschema/t01', skip_schema_inference = true) LIMIT 10;

scan snapshots:

SELECT * FROM iceberg_snapshots('s3://my-bucket/icebergwh/someschema/t01', skip_schema_inference = true) LIMIT 10;
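One way to see which column types the parquet reader reports when schema inference is skipped is to wrap the same scan in DuckDB's DESCRIBE statement. This is a sketch, not part of the PR: it reuses the placeholder bucket path from the examples above and assumes the iceberg extension is installed and that S3 access (via httpfs) and credentials are already configured.

```sql
-- Assumption: iceberg is installed and s3:// access is configured (httpfs + credentials).
LOAD iceberg;

-- Returns one row per column (column_name, column_type, ...) as reported
-- by the underlying parquet reader, since Iceberg schema inference is skipped.
DESCRIBE SELECT *
FROM iceberg_scan('s3://my-bucket/icebergwh/someschema/t01', skip_schema_inference = true);
```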

Note: I am closing an earlier PR, #43, which proposed these changes but was harder to follow.

@samansmink (Collaborator)

looks good, thanks!

samansmink merged commit 6bda4df into duckdb:main on Mar 20, 2024
16 checks passed
@harel-e commented Mar 26, 2024

@samansmink Hi, I downloaded the DuckDB nightly build and couldn't find this feature (skip_schema_inference).
Will it be part of the upcoming 0.10.2 release?
Thanks

@samansmink (Collaborator)

@harel-e are you sure? It works for me:

force install iceberg from 'http://nightly-extensions.duckdb.org';
load iceberg;
FROM iceberg_metadata('my_iceberg_table', skip_schema_inference = true);

@harel-e commented Mar 28, 2024

@samansmink I wasn't aware of force install, but it still fails.

Using the nightly build binary:

./duckdb
v0.10.2-dev265 2687e2d6d9
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.

D force install iceberg from 'http://nightly-extensions.duckdb.org';
HTTP Error: Failed to download extension "iceberg" at URL "http://nightly-extensions.duckdb.org/2687e2d6d9/osx_arm64/iceberg.duckdb_extension.gz"
Extension "iceberg" is an existing extension.

Are you using a development build? In this case, extensions might not (yet) be uploaded.

@samansmink (Collaborator)

@harel-e yeah, we don't have good update semantics for extensions (yet). Force installing overwrites your current installation with whatever you provide; without it, DuckDB will not update because it considers iceberg already installed.

> Using the nightly build binary

That's a bit quirky at the moment: we distribute nightly extension binaries that target the latest stable release of DuckDB, and nightly DuckDB binaries that ship with stable versions of the extensions. However, we do not automatically build nightly extensions for nightly DuckDB binaries, so those can sometimes lag behind.

I will bump the iceberg extension in DuckDB main, which should resolve this.
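To check which DuckDB build is running and whether (and from where) the iceberg extension is installed after a force install, the built-in PRAGMA version and duckdb_extensions() table function can be queried. This is a minimal diagnostic sketch, not something proposed in the thread; the source_id column should correspond to the commit hash that appears in the nightly extension download URL (2687e2d6d9 in the error above).

```sql
-- Library version (e.g. a -dev nightly) and the source commit id of the running binary
PRAGMA version;

-- Whether iceberg is installed and loaded, and where the installed file lives on disk
SELECT extension_name, installed, loaded, install_path
FROM duckdb_extensions()
WHERE extension_name = 'iceberg';
```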

@harel-e commented Apr 7, 2024

@samansmink Thank you for making this change available in the extensions.
Will this PR be included in the stable iceberg extension for the upcoming 0.10.2 release (i.e. available with a plain 'install iceberg')?

mike-luabase pushed a commit to definite-app/duckdb_iceberg that referenced this pull request on Oct 27, 2024: "Support to skip schema inference"

3 participants