[FEA] Improved error message when calling query() on expressions involving columns of string values #9894

Closed
rlratzel opened this issue Dec 13, 2021 · 1 comment · Fixed by #9921
Labels
feature request New feature or request numba Numba issue Python Affects Python cuDF API.

Comments

@rlratzel
Contributor

DataFrame.query() calls using expressions that involve columns of string values are not supported in cuDF. Adding support for this would be ideal, but given that strings are not fully supported for a variety of reasons, a better error message would be the next best improvement.
As an example, in pandas the following works as expected:

>>> DF4.query("type_name=='merchants' and merchant_size>80 and merchant_location==44145") # pandas version
   vertex  type_name  merchant_location  merchant_size  distro_size  user_location  vertical
2      21  merchants            44145.0           83.0          NaN            NaN       NaN

However, the same code using cudf produces the following:

>>> DF4.query("type_name=='merchants' and merchant_size>80 and merchant_location==44145") # cudf version
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/dataframe.py", line 3915, in query
    boolmask = queryutils.query_execute(self, expr, callenv)
  File "/opt/conda/envs/rapids/lib/python3.8/site-packages/cudf/utils/queryutils.py", line 220, in query_execute
    colarrays = [
  File "/opt/conda/envs/rapids/lib/python3.8/site-packages/cudf/utils/queryutils.py", line 221, in <listcomp>
    cudf.core.dataframe.extract_col(df, col).data_array_view
  File "/opt/conda/envs/rapids/lib/python3.8/site-packages/cudf/core/column/string.py", line 5179, in data_array_view
    raise ValueError("Cannot get an array view of a StringColumn")
ValueError: Cannot get an array view of a StringColumn

Perhaps raising TypeError(f"query expression including string column {col} is not supported") or something similar might be better.
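
To make the suggestion concrete, here is a minimal sketch of the kind of up-front check I have in mind. The helper name `check_query_columns`, its arguments, and the set of dtype names treated as "string" are all hypothetical and not existing cuDF API; the real check would presumably sit in queryutils.query_execute before the array views are extracted.

```python
# Hypothetical helper, not part of cuDF: validate the dtypes of the columns
# referenced by a query expression before trying to build array views.

# Assumption: string columns report one of these dtype names.
STRING_DTYPE_NAMES = {"object", "str", "string"}


def check_query_columns(column_dtypes, referenced_columns):
    """Raise TypeError if any referenced column holds string values.

    column_dtypes: mapping of column name -> dtype (anything whose str()
        gives the dtype name, e.g. dict(df.dtypes)).
    referenced_columns: iterable of column names used in the expression.
    """
    for col in referenced_columns:
        if str(column_dtypes[col]) in STRING_DTYPE_NAMES:
            raise TypeError(
                f"query expression including string column {col} is not supported"
            )


# Example with dtypes resembling the DataFrame above.
dtypes = {"type_name": "object", "merchant_size": "float64"}
check_query_columns(dtypes, ["merchant_size"])  # passes silently
try:
    check_query_columns(dtypes, ["type_name", "merchant_size"])
except TypeError as err:
    print(err)
```

Failing early like this would name the offending column in the message, rather than surfacing the internal "Cannot get an array view of a StringColumn" error.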

NOTE: for those interested, the proposed workaround for the above example is:

DF4[DF4["type_name"]=='merchants'].query("merchant_size>80 and merchant_location==44145")

cc @vyasr @brandon-b-miller

@rlratzel rlratzel added feature request New feature or request Needs Triage Need team to review and classify labels Dec 13, 2021
@brandon-b-miller brandon-b-miller self-assigned this Dec 13, 2021
@vyasr
Contributor

vyasr commented Dec 14, 2021

For context, supporting the query itself requires addressing #9639.

@brandon-b-miller brandon-b-miller added Python Affects Python cuDF API. numba Numba issue and removed Needs Triage Need team to review and classify labels Dec 14, 2021
rapids-bot bot pushed a commit that referenced this issue Jan 11, 2022