Setting produce_arrow_string_view has no effect, still returns Utf8/LargeUtf8 arrow types #396

Open
2 tasks done
jhorstmann opened this issue Nov 12, 2024 · 0 comments

What happens?

I am using the duckdb-rs crate to query an embedded duckdb and return the results as arrow record batches. I would like string types to be returned as arrow StringView types for more efficient memory usage, but it seems the produce_arrow_string_view setting has no effect on the returned data type.
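
For context, here is a minimal sketch of the 16-byte view layout that the Arrow columnar format describes for Utf8View: a 4-byte length, followed either by up to 12 bytes of inline data or by a 4-byte prefix plus a buffer index and an offset. The names `encode_view` and `is_inline` are hypothetical helpers written for illustration. They are not part of duckdb-rs or arrow-rs, and the sketch uses only the standard library.

```rust
/// Hypothetical helper (not an arrow-rs or duckdb-rs API): encodes one string
/// as a 16-byte view in the style of Arrow's variable-size binary view layout.
fn encode_view(s: &str, buffer_index: u32, offset: u32) -> [u8; 16] {
    let bytes = s.as_bytes();
    let mut view = [0u8; 16];
    // Bytes 0..4: length, little-endian (the format uses a 32-bit integer).
    view[0..4].copy_from_slice(&(bytes.len() as u32).to_le_bytes());
    if bytes.len() <= 12 {
        // Short string: data is stored inline in bytes 4..16, zero padded.
        view[4..4 + bytes.len()].copy_from_slice(bytes);
    } else {
        // Long string: 4-byte prefix, then the index of the data buffer
        // and the offset of the string within that buffer.
        view[4..8].copy_from_slice(&bytes[..4]);
        view[8..12].copy_from_slice(&buffer_index.to_le_bytes());
        view[12..16].copy_from_slice(&offset.to_le_bytes());
    }
    view
}

/// Returns true when the view carries its string data inline.
fn is_inline(view: &[u8; 16]) -> bool {
    u32::from_le_bytes([view[0], view[1], view[2], view[3]]) <= 12
}

fn main() {
    // The same values the reproduction query produces.
    for s in ["0.0", "10.0", "200.0", "3000.0", "40000.0"] {
        let view = encode_view(s, 0, 0);
        println!("{s:?}: inline = {}", is_inline(&view));
    }
}
```

All five strings produced by the query in the reproduction are at most 12 bytes long, so under this layout each would fit entirely inside its view.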

To Reproduce

The issue can be reproduced with the following Rust program. I did not see any type conversions in the Rust wrapper itself, so I assume the issue is in the duckdb core.

[package]
name = "duckdb-arrow-test"
version = "0.1.0"
edition = "2021"

[dependencies]
duckdb = { version = "1.0.0", features = ["bundled"] }

use duckdb::Connection;

fn main() {
    let conn = Connection::open_in_memory().unwrap();

    let setup_script = r"
        SET arrow_output_list_view = true;
        SET produce_arrow_string_view = true;
        ";

    conn.execute_batch(setup_script).unwrap();

    let mut query = conn
        .prepare("SELECT (i*10^i)::varchar AS str FROM range(5) tbl(i)")
        .unwrap();

    let arrow = query.query_arrow([]).unwrap();

    for batch in arrow {
        dbg!(batch.schema().field(0).data_type());
        dbg!(batch.column(0));
    }
}

Output:

$ cargo run
     Running `target/debug/duckdb-arrow-test`
[src/main.rs:20:9] batch.schema().field(0).data_type() = Utf8
[src/main.rs:21:9] batch.column(0) = StringArray
[
  "0.0",
  "10.0",
  "200.0",
  "3000.0",
  "40000.0",
]

The arrow_large_buffer_size setting correctly changes the data type to LargeUtf8 instead of Utf8.

OS:

x86_64 linux ubuntu

DuckDB Version:

1.1.1

DuckDB Client:

rust (duckdb-rs)

Hardware:

No response

Full Name:

Jörn Horstmann

Affiliation:

SAP SE

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a stable release

Did you include all relevant data sets for reproducing the issue?

Not applicable - the reproduction does not require a data set

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

  • Yes, I have

szarnyasg transferred this issue from duckdb/duckdb Nov 12, 2024