Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is it possible to pass query parameters? (:param or ?) #513

Open
shner-elmo opened this issue Oct 9, 2023 · 7 comments · May be fixed by #964
Open

Is it possible to pass query parameters? (:param or ?) #513

shner-elmo opened this issue Oct 9, 2023 · 7 comments · May be fixed by #964
Labels
enhancement New feature or request

Comments

@shner-elmo
Copy link

Hey, I'm trying to port a query from duckdb to datafusion using the Python connector, the issue I'm facing, is adding query parameters.
Is there currently a way that I can pass query parameters ?

import datafusion
import pyarrow.dataset as ds

dataset = ds.dataset(...)

ctx = datafusion.SessionContext()
ctx.register_dataset('dataset', dataset)
table = ctx.sql("SELECT * FROM dataset WHERE col BETWEEN ? AND ?", parameters=[1, 10]).to_arrow_table()
## TypeError: SessionContext.sql() got an unexpected keyword argument 'parameters'
@shner-elmo shner-elmo added the enhancement New feature or request label Oct 9, 2023
@MrPowers
Copy link
Contributor

Yea, parameterized queries would be awesome, especially syntax like this:

ctx.sql(
    "select country, count(*) as num_ppl from {person_df} group by country",
    person_df=person_df,
).show()

@timsaucer
Copy link
Contributor

I don't fully understand what is needed here. Can you not just do

ctx.sql(
    f"select country, count(*) as num_ppl from {person_df} group by country"
).show()

@MrPowers
Copy link
Contributor

@timsaucer - In this example, person_df is a DataFrame object, not a table name. This saves the user from having to register the table (the engine just does this under the hood). It was recently added to Spark and is a surprisingly nice dev quality of life improvement!

@timsaucer
Copy link
Contributor

Oh, interesting. As a pure dataframe user, I hadn't understood the request. I appreciate the clarification!

@shner-elmo
Copy link
Author

Parametrized queries are even more important for when you want to do: WHERE col IN big_list_of_values.

A year later, I again need to choose a library for reading parquet files, but I had to settle with ducks because data fusion doesn't support this

@timsaucer
Copy link
Contributor

timsaucer commented Dec 6, 2024

I just tried a quick proof of concept. To get this to work, I did need a name for the table. Would the following be an acceptable solution?

from datafusion import SessionContext
ctx = SessionContext()
ctx.register_parquet("customers", "examples/tpch/data/customer.parquet")
df_customer = ctx.table("customers")
ctx.sql("select c_custkey, c_name from {df}", df=df_customer)

@timsaucer
Copy link
Contributor

I think that if we change all of the read_x like read_parquet over to call register_x and auto generate a table name, then we can support removing the register line and make this work. I tested with parquet with success.

@timsaucer timsaucer linked a pull request Dec 6, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants