Feat/parameterized sql queries #964

Open
wants to merge 3 commits into
base: main

timsaucer
Contributor

Which issue does this PR close?

Closes #513

Rationale for this change

Users would like to pass DataFrames as parameters inside an SQL query. With this change, you can do the following:

from datafusion import SessionContext
ctx = SessionContext()
df_customer = ctx.read_parquet("examples/tpch/data/customer.parquet")
ctx.sql("select c_custkey, c_name from {df}", df=df_customer)

The string {df} in the query will be replaced with the SQL equivalent of the logical plan of the DataFrame.
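
To illustrate the placeholder idea in isolation, here is a minimal sketch of named substitution in plain Python. It is not the code in this PR: `render_query` is a hypothetical helper, and the SQL text standing in for the DataFrame is passed as an ordinary string, whereas the PR derives it from the DataFrame's logical plan.

```python
import re


def render_query(query: str, **named_sql: str) -> str:
    """Replace each {name} placeholder with a parenthesized subquery.

    Hypothetical helper for illustration only. Each keyword argument is
    assumed to already hold the SQL text for one DataFrame.
    """

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in named_sql:
            raise KeyError(f"no value supplied for placeholder {{{name}}}")
        # Wrap in parentheses so the fragment is usable as a subquery.
        return f"({named_sql[name]})"

    return re.sub(r"\{(\w+)\}", replace, query)


rendered = render_query(
    "select c_custkey, c_name from {df}",
    df="SELECT * FROM customer",  # assumed stand-in for the plan's SQL
)
print(rendered)
```

Raising on an unknown placeholder, rather than leaving `{name}` in the query, is a design choice of this sketch so that typos fail early.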

What changes are included in this PR?

The read_parquet, read_avro, read_json, and read_csv methods now call the corresponding register_ method with a generated table name. This table name is the file name. If a table with that name already exists, a generated UUID is used instead.
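
The naming rule described above can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: `choose_table_name` is a hypothetical helper, the set of existing names is passed in explicitly, the file name is assumed to keep its extension, and the UUID is assumed to be rendered as a hex string.

```python
import uuid
from pathlib import Path


def choose_table_name(path: str, existing: set[str]) -> str:
    """Pick a table name for a file being read (hypothetical helper).

    Uses the file name when it is free, otherwise falls back to a
    freshly generated UUID.
    """
    # Assumption: "file name" means the final path component,
    # extension included.
    name = Path(path).name
    if name in existing:
        # Assumption: the fallback is a random UUID in hex form.
        return uuid.uuid4().hex
    return name


first = choose_table_name("examples/tpch/data/customer.parquet", set())
second = choose_table_name("examples/tpch/data/customer.parquet", {first})
```

Here `first` is the file name itself, while `second` is a UUID because that name is already taken.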

One unit test is included.

Are there any user-facing changes?

Each of the read_ functions above gains an optional table name argument. This is a non-breaking change for users.

@MrPowers
Contributor

MrPowers commented Dec 6, 2024

This user interface looks nice 😎

Development

Successfully merging this pull request may close these issues.

Is it possible to pass query parameters? (:param or ?)
2 participants