Support for SQLContext #999
Hi @ceyhunkerti! Quick question before I dig into the code. You say:
Would |
Hey, thanks for the quick feedback, |
Ah I see. A function that only works on one frame certainly won't enable multi-frame joins! Thanks for clarifying the goal. I'll take a look soon. Note: we were/are hesitant about supporting Polars-specific SQL interfaces. You can read the discussion here: We went with |
Hmm, thinking out loud 🤔 to understand backend independence.
# pseudo
df1 = DF.sql("select ...", backend: polars) |> compute
df2 = DF.some_action(df1, backend: my_new_backend)

I believe the above one is not possible unless I think by backend independence we just imply the If we are worrying about the PS. When thinking of another backend, first one comes to my mind is btw, I do these for learning purposes and am happy to support if I can. We can park this one for now if you'd like, and I can focus on something else that you think might be more helpful or a higher priority. |
What if we allow |
Or this could be even better:
|
@josevalim You read my mind! Perhaps even: DF.sql(%{foo: df1, bar: df2, baz: df3}, "select") to prevent duplicates? Also I think behind the scenes we'd need to individually register each table with SQLContext. |
If we use a map, we don't have the concept of "first". |
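To make the trade-off between the two container choices concrete, here is a minimal plain-Elixir sketch. It makes no Explorer calls: the atoms `:df1`, `:df2`, and `:df3` are placeholders standing in for data frames, and the variable names are purely illustrative.

```elixir
# Placeholder atoms stand in for real data frames; nothing here touches Explorer.
as_keyword = [foo: :df1, bar: :df2, foo: :df3]
as_map = Map.new(as_keyword)

# A keyword list is an ordered list of pairs, so "first" is well defined,
# but the duplicate :foo key survives.
[{first_name, _first_frame} | _rest] = as_keyword
IO.inspect(first_name)                # :foo
IO.inspect(Keyword.keys(as_keyword))  # [:foo, :bar, :foo]

# A map keeps a single value per key (the last one wins), but it does not
# promise any iteration order, so there is no meaningful "first" table.
IO.inspect(Map.get(as_map, :foo))     # :df3
IO.inspect(map_size(as_map))          # 2
```

So a map rules out duplicate table names for free, while a keyword list keeps a notion of a "first" frame at the cost of having to check for duplicates explicitly.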
I also mentioned something similar here. Back to the original issue.
DF.sql("select", %{}) seems more natural to me like |
Also, this referenced issue here In the issue author mentioned in order to achieve complex

result =
all_pairs
|> DataFrame.join(all_relationships_1, how: :left,
on: [
{"table_name_1", "table_name"},
{"column_name_1", "column_name"},
{"table_name_2", "referenced_table_name"},
{"column_name_2", "referenced_column_name"}
])
|> DataFrame.join(all_relationships_2, how: :left,
on: [
{"table_name_2", "table_name"},
{"column_name_2", "column_name"},
{"table_name_1", "referenced_table_name"},
{"column_name_1", "referenced_column_name"}
])
|> DataFrame.mutate(has_relationship: coalesce(has_relationship_1, has_relationship_2))

So not exactly sure what At the end, this case, IMHO, justifies having a powerful |
sql_context_test.exs file to demonstrate the usage pattern.

Request for Feedback

@billylanchantin, @josevalim, @philss, @anyone_maintaining_the_repo :). I would greatly appreciate it if you could review this and provide feedback, pointing me in the right direction for any improvements.

Notes

Thank you for your time and guidance!