What happened:
When projecting a column that has been cast to a different dtype, unexpected behavior can occur because DataFusion seems to map cast operations to the name of the original column (e.g. the key for cast(df.a as date) would be df.a).
This causes significant issues when projecting both a cast column and the original: the two collide in our named projections, so the same column ends up being used for both.
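A minimal sketch of the collision described above, assuming projections are stored in a mapping keyed by the name of the underlying column. The helpers key_for and build_named_projections are hypothetical and are not dask-sql or DataFusion APIs; the tuple encoding of a cast is also an illustrative assumption.

```python
# Hypothetical model of the reported bug: each projected expression is
# stored under a key, and a cast expression's key is the name of the
# column it wraps (as the issue describes DataFusion doing).
def key_for(expr):
    # expr is either a column name ("df.a") or a ("cast", column, dtype) tuple
    if isinstance(expr, tuple) and expr[0] == "cast":
        return expr[1]  # cast(df.a as date) -> "df.a"
    return expr


def build_named_projections(exprs):
    named = {}
    for expr in exprs:
        # A later expression with the same key silently overwrites an earlier one
        named[key_for(expr)] = expr
    return named


projections = build_named_projections(["df.a", ("cast", "df.a", "date")])
print(projections)  # {'df.a': ('cast', 'df.a', 'date')}
```

Only one entry survives, so any lookup for either output column resolves to the same expression.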
What you expected to happen:
I would have expected cast operations to be mapped to an alias that distinguishes them from the original column, so that these collisions don't occur.
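As a sketch of the expected behavior, the key could be derived from the full expression rather than from the wrapped column. The CAST(df.a AS Date32) format below mirrors the expression name shown in the ParsingException in the example that follows; key_for is a hypothetical helper, not the actual dask-sql or DataFusion naming logic.

```python
# Hypothetical: derive a key from the whole expression so a cast no longer
# shares its key with the column it wraps.
def key_for(expr):
    # expr is either a column name ("df.a") or a ("cast", column, dtype) tuple
    if isinstance(expr, tuple) and expr[0] == "cast":
        _, column, dtype = expr
        return f"CAST({column} AS {dtype})"
    return expr


named = {key_for(e): e for e in ["df.a", ("cast", "df.a", "Date32")]}
print(sorted(named))  # ['CAST(df.a AS Date32)', 'df.a']
```

With distinct keys, both the original column and its cast survive as separate projections.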
Minimal Complete Verifiable Example:
Projecting both the original column and its cast without an alias raises a parsing error:
import pandas as pd
from dask_sql import Context

df = pd.DataFrame({"a": ["1999-06-21"]})
c = Context()
c.create_table("df", df)
c.sql("""
select a, cast(a as date) from df
""")
# ParsingException: Plan("Projections require unique expression names but the expression \"df.a\" at position 0 and \"CAST(df.a AS Date32)\" at position 1 have the same name. Consider aliasing (\"AS\") one of them.")
With an alias the query runs, but the same column is used for both projections (both a and b come back as datetime64[ns]):
c.sql(""" select a, cast(a as date) as b from df """)
# Dask DataFrame Structure:
#                             a               b
# npartitions=1
# 0              datetime64[ns]  datetime64[ns]
# 0                         ...             ...
# Dask Name: rename, 15 graph layers
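One plausible reading of the aliased output, sketched with plain dictionaries and assuming that both output names resolve to the shared key df.a, which the cast result overwrote. The variables computed and output_to_key are hypothetical and stand in for whatever dask-sql does internally.

```python
import datetime

# Hypothetical: computed expressions are stored by key, and the cast result
# lands on the same key ("df.a") as the raw column.
row = {"df.a": "1999-06-21"}
computed = {}
computed["df.a"] = row["df.a"]  # select a
computed["df.a"] = datetime.date.fromisoformat(row["df.a"])  # cast(a as date) overwrites it

# Both output columns ("a" and the alias "b") are resolved through the same key
output_to_key = {"a": "df.a", "b": "df.a"}
result = {name: computed[key] for name, key in output_to_key.items()}
print({name: type(value).__name__ for name, value in result.items()})  # {'a': 'date', 'b': 'date'}
```

This matches the symptom above, where a has the cast's dtype even though it was selected unchanged.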
Anything else we need to know?:
I'm fairly sure this is the underlying issue behind failures we were seeing in q21 and q40 before merging in #924, as the failures seemed to indicate that a cast column wasn't the expected dtype (cc @ayushdg).
Environment:
dask-sql version: latest
Python version: 3.9
Operating System: ubuntu
Install method (conda, pip, source): source