Support to skip schema inference #45

devendrasr · 2024-03-19T11:56:31Z

The current version does not support parsing complex data types when inferring the schema from the snapshot.
Until support for complex data types lands, I am introducing a flag that skips this step. With the flag set, schema parsing is offloaded to the underlying Parquet extension. Usage:

scan data:

SELECT * FROM iceberg_scan('s3://my-bucket/icebergwh/someschema/t01', skip_schema_inference = true) LIMIT 10;

scan metadata:

SELECT * FROM iceberg_metadata('s3://my-bucket/icebergwh/someschema/t01', skip_schema_inference = true) LIMIT 10;

scan snapshots:

SELECT * FROM iceberg_snapshots('s3://my-bucket/icebergwh/someschema/t01', skip_schema_inference = true) LIMIT 10;

Note: I am closing an earlier PR (#43) that proposed these changes but was harder to follow.

samansmink · 2024-03-20T08:28:17Z

looks good, thanks!

harel-e · 2024-03-26T04:52:01Z

@samansmink Hi, I downloaded the DuckDB nightly build and couldn't find this feature (skip_schema_inference).
Will it be part of the upcoming 0.10.2 release?
Thanks

samansmink · 2024-03-28T09:04:44Z

@harel-e Are you sure? It works for me:

force install iceberg from 'http://nightly-extensions.duckdb.org';
load iceberg;
FROM iceberg_metadata('my_iceberg_table', skip_schema_inference = true);

harel-e · 2024-03-28T23:45:09Z

@samansmink I wasn't aware of force install, but it still fails.

Using the nightly build binary:

./duckdb
v0.10.2-dev265 2687e2d6d9
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.

D force install iceberg from 'http://nightly-extensions.duckdb.org';
HTTP Error: Failed to download extension "iceberg" at URL "http://nightly-extensions.duckdb.org/2687e2d6d9/osx_arm64/iceberg.duckdb_extension.gz"
Extension "iceberg" is an existing extension.

Are you using a development build? In this case, extensions might not (yet) be uploaded.

samansmink · 2024-03-29T08:12:11Z

@harel-e Yeah, we don't have good update semantics for extensions (yet). Force installing overwrites your current installation with whatever you point it at. Without it, DuckDB will not update, because it considers iceberg already installed.

> Using the nightly build binary

That's a bit quirky at the moment. We distribute nightly builds of extensions that target the latest stable DuckDB release, and nightly builds of DuckDB that ship with stable versions of the extensions. However, we do not automatically build nightly extensions for nightly DuckDB binaries, so the extensions available to a nightly binary can sometimes lag behind.

I will bump the iceberg extension in DuckDB main, which should resolve this.

harel-e · 2024-04-07T06:02:38Z

@samansmink Thank you for making this change available in the extensions.
Will this PR be included in the stable iceberg extension for the upcoming 0.10.2 release (i.e. available with a plain 'install iceberg')?

Devendra added 4 commits March 7, 2024 11:46

support to skip schema inference along with reading gz compressed metadata files
c02a3d3

Merge branch 'feature/read_gz_compressed_metadata' into feature/skip_schema_inference
0390051

Merge branch 'feature/read_gz_compressed_metadata' into feature/skip_schema_inference
e6e6023

Merge branch 'feature/read_gz_compressed_metadata' into feature/skip_schema_inference
8686446

devendrasr mentioned this pull request Mar 19, 2024

Support to skip schema inference - Archived #43

Closed

samansmink merged commit 6bda4df into duckdb:main Mar 20, 2024
16 checks passed

devendrasr mentioned this pull request Mar 20, 2024

Unable to read complex data types(e.g. Map, Struct) after upgrading to latest (0.10.0) version #41

Closed

samansmink mentioned this pull request Mar 28, 2024

IOException when querying table with a list<int> column. #47

Closed

mike-luabase pushed a commit to definite-app/duckdb_iceberg that referenced this pull request Oct 27, 2024

Merge pull request duckdb#45 from devendrasr/main

80f496b

Support to skip schema inference
