
on_compare not properly handling non-boolean values #131

Closed
OliverCWY opened this issue Aug 17, 2024 · 9 comments

Comments

@OliverCWY

In some libraries (such as polars), the `__bool__` method raises an exception other than ValueError (polars, for example, raises TypeError). This causes the try-except block

try:
  if not res:
    break
except ValueError:
  pass

to let the TypeError propagate uncaught.

Example code snippet that demonstrates the above:

import polars as pl
from asteval import Interpreter

aeval = Interpreter()
aeval("pl.col('a') > 1")

I assume any exception raised in the try block would come from the `__bool__` method, so it would be safe to catch all exception types?
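A minimal, stdlib-only sketch of the mismatch described above, so it can be reproduced without polars or numpy installed. `NumpyLikeArray`, `PolarsLikeExpr`, and `truth_test` are hypothetical names: the first two imitate how NumPy arrays and polars expressions fail when truth-tested, and `truth_test` only mirrors the shape of the quoted try/except (it is not asteval's code).

```python
class NumpyLikeArray:
    """Hypothetical stand-in for a NumPy array: truth-testing raises ValueError."""
    def __bool__(self):
        raise ValueError("truth value of an array is ambiguous")


class PolarsLikeExpr:
    """Hypothetical stand-in for a polars Expr: truth-testing raises TypeError."""
    def __bool__(self):
        raise TypeError("the truth value of an Expr is ambiguous")


def truth_test(res, catch):
    """Return 'stop' if res is falsy, 'continue' if truthy, or 'ambiguous'
    if bool(res) raises an exception of type `catch`."""
    try:
        if not res:
            return "stop"
        return "continue"
    except catch:
        return "ambiguous"


# Catching only ValueError handles the NumPy-like object...
print(truth_test(NumpyLikeArray(), ValueError))   # ambiguous

# ...but the TypeError from the polars-like object escapes.
try:
    truth_test(PolarsLikeExpr(), ValueError)
except TypeError as exc:
    print("uncaught:", exc)

# Catching Exception handles both.
print(truth_test(PolarsLikeExpr(), Exception))    # ambiguous
```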

@newville
Member

@OliverCWY Um, your example never gets to the comparison. It raises a NameError:

  pl.col('a') > 1
NameError: name 'pl' is not defined

Yup: pl is not defined in the Interpreter.

If you still think there is a problem, post actual working code that actually shows the problem, and the full traceback.
Spare the conjecture about the cause of any problem until that problem has been identified.

@OliverCWY
Author

Sorry, I forgot to pass the symbol table when modifying my code.

import polars as pl
from asteval import Interpreter

aeval = Interpreter({"pl": pl})
aeval("pl.col('a') > 1")

and the traceback:

   pl.col('a') > 1
TypeError: the truth value of an Expr is ambiguous

Hint: use '&' or '|' to logically combine Expr, not 'and'/'or', and use `x.is_in([y,z])` instead of `x in [y,z]` to check membership.

@newville
Member

@OliverCWY Indeed, from Python:

>>> import polars as pl
>>> if (pl.col('a') > 1): print('Yes')
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../python3.12/site-packages/polars/expr/expr.py", line 152, in __bool__
    raise TypeError(msg)
TypeError: the truth value of an Expr is ambiguous

You probably got here by using a Python standard library function instead of the native expressions API.
Here are some things you might want to try:
- instead of `pl.col('a') and pl.col('b')`, use `pl.col('a') & pl.col('b')`
- instead of `pl.col('a') in [y, z]`, use `pl.col('a').is_in([y, z])`
- instead of `max(pl.col('a'), pl.col('b'))`, use `pl.max_horizontal(pl.col('a'), pl.col('b'))`

Asteval just raises this exception more aggressively (at the first "Compare" instead of at "If"). But if you do (in Python):

>>> (pl.col('a') > 1) or (pl.col('b') < 0)

That will raise the same kind of TypeError exception.

It sort of seems like you would want to follow polars' advice and use its methods instead of the Python standard library.

I do not have much experience with polars, but this seems like a not-very-effective sales pitch ;). Like, it has a top-level function called col(), and col('a') is supposed to be comparable to an integer, only sometimes it is not going to be comparable??

What would you expect to happen?

@OliverCWY
Author

OliverCWY commented Aug 17, 2024


Apologies for not explaining the use case. If you simply evaluate pl.col('a') > 1 instead of testing its truth value, you get a polars expression, which can then be used to filter the DataFrame.

Following the previous snippet:

expr = pl.col('a') > 1          # works fine
expr = aeval("pl.col('a') > 1") # fails
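To illustrate the use case with the standard library only, here is a toy lazy-expression class. `Expr`, `col`, and `filter_rows` are hypothetical names, and this is not how polars is implemented; it only shows that comparing an expression can build another expression instead of returning a bool, which works as long as nothing asks for its truth value.

```python
import operator


class Expr:
    """Hypothetical lazy expression: comparisons build new Expr objects
    instead of returning bool, loosely imitating pl.col()."""

    def __init__(self, func):
        self.func = func  # maps a row (dict) to a value

    def evaluate(self, row):
        return self.func(row)

    def _compare(self, other, op):
        # Build a new expression; nothing is evaluated yet.
        return Expr(lambda row: op(self.evaluate(row), other))

    def __gt__(self, other):
        return self._compare(other, operator.gt)

    def __lt__(self, other):
        return self._compare(other, operator.lt)

    def __bool__(self):
        # Like polars, refuse to collapse an expression to True/False.
        raise TypeError("the truth value of an Expr is ambiguous")


def col(name):
    return Expr(lambda row: row[name])


def filter_rows(rows, predicate):
    """Keep the rows for which the predicate expression evaluates truthy."""
    return [row for row in rows if predicate.evaluate(row)]


rows = [{"a": 0}, {"a": 2}, {"a": 5}]
expr = col("a") > 1              # fine: an Expr, never truth-tested
print(filter_rows(rows, expr))   # [{'a': 2}, {'a': 5}]
```

Anything that calls `bool(expr)` (an `if`, `and`/`or`, or an early-exit check in an interpreter) raises the TypeError.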

@newville
Member

@OliverCWY Thanks -- that helps.

Yeah, we do use a special case there that maybe should be relaxed. As this example shows (and others do too, notably numpy), x > y does not necessarily return a bool or even a bool-like value.

The challenge is that comparisons may have multiple operators: x > y > z results in one Compare node with multiple operators and operands. In that case, you'd like to return False or raise an exception as early as possible.
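The standard `ast` module shows this structure directly: a chained comparison parses to a single `ast.Compare` node with parallel `ops` and `comparators` lists. This sketch assumes Python 3.9+ for `ast.unparse`.

```python
import ast

# A chained comparison parses to ONE ast.Compare node holding a list of
# operators and a list of right-hand operands.
node = ast.parse("x > y > z", mode="eval").body
print(type(node).__name__)                          # Compare
print(ast.unparse(node.left))                       # x
print([type(op).__name__ for op in node.ops])       # ['Gt', 'Gt']
print([ast.unparse(c) for c in node.comparators])   # ['y', 'z']

# A single comparison is the same node type with just one operator.
single = ast.parse("x > y", mode="eval").body
print(len(single.ops))                              # 1
```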

And indeed,

>>> import polars as pl
>>> pl.col('a') < 10 > 2

raises the same TypeError exception. Python evaluates this chain like (pl.col('a') < 10) and (10 > 2), so the result of pl.col('a') < 10 has to be truth-tested, and that truth test fails.

A similar case is

>>> import numpy as np
>>> np.arange(10) > 7
array([False, False, False, False, False, False, False, False,  True,
        True])
>>> np.arange(10) > 4 < 9
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
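Both failures come from how Python evaluates a chain: `a > 4 < 9` behaves like `(a > 4) and (4 < 9)` (with the middle operand evaluated only once), so the intermediate result must be truth-tested, while a lone `a > 4` is simply returned. A stdlib-only sketch, with hypothetical stand-in classes `ArrayLike` and `AmbiguousResult` in place of NumPy:

```python
class AmbiguousResult:
    """Stand-in for an elementwise comparison result (like a NumPy bool
    array): truth-testing it raises ValueError."""
    def __bool__(self):
        raise ValueError("truth value is ambiguous")


class ArrayLike:
    """Hypothetical array-like whose comparisons return AmbiguousResult."""
    def __gt__(self, other):
        return AmbiguousResult()

    def __lt__(self, other):
        return AmbiguousResult()


a = ArrayLike()

single = a > 4                   # fine: returned as-is, never truth-tested
print(type(single).__name__)     # AmbiguousResult

try:
    a > 4 < 9                    # evaluated like (a > 4) and (4 < 9)
except ValueError as exc:
    print("chained comparison failed:", exc)
```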

Anyway, I think we can fix this so it better matches Python behavior.

@OliverCWY
Author

@newville Yes, I have read the source code and understand the reasoning. In the try-except block, any error would come from converting res to bool, so it should be safe to catch all exceptions rather than only ValueError, which is specific to numpy.

@newville
Member

@OliverCWY Yeah, I agree with that. And maybe for the case of a single comparison, we should just return the result without testing its "true-ness". That would still fail at the "If" and behave more like Python. Looking into it...
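A hypothetical sketch of evaluating an `ast.Compare` node with plain-Python semantics, to make the idea above concrete. It is not asteval's code and not the change in 7e2050d; `eval_compare`, `eval_operand`, and `OPS` are illustrative names, and the toy operand evaluator only handles names and literals. A single comparison is returned without any truth test; in a chain, each intermediate result is truth-tested to decide whether to short-circuit, the final result is returned untouched, and any exception from `__bool__` propagates just as it would outside the interpreter.

```python
import ast
import operator

# Map AST comparison operator types to functions.
OPS = {
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
    ast.Lt: operator.lt, ast.LtE: operator.le,
    ast.Gt: operator.gt, ast.GtE: operator.ge,
    ast.Is: operator.is_, ast.IsNot: operator.is_not,
    ast.In: lambda a, b: a in b, ast.NotIn: lambda a, b: a not in b,
}


def eval_operand(node, names):
    """Toy operand evaluator (hypothetical): names and literals only."""
    if isinstance(node, ast.Name):
        return names[node.id]
    if isinstance(node, ast.Constant):
        return node.value
    raise NotImplementedError(type(node).__name__)


def eval_compare(node, names):
    """Evaluate an ast.Compare node the way Python evaluates comparisons."""
    left = eval_operand(node.left, names)
    last = len(node.ops) - 1
    for i, (op, comp) in enumerate(zip(node.ops, node.comparators)):
        right = eval_operand(comp, names)  # evaluated only if reached
        result = OPS[type(op)](left, right)
        # Truth-test intermediate results only; this may raise, as in Python.
        if i < last and not result:
            return result
        left = right
    return result


class Expr:
    """Hypothetical object whose comparison result cannot be truth-tested."""
    def __gt__(self, other):
        return self

    def __bool__(self):
        raise TypeError("the truth value of an Expr is ambiguous")


def parse(src):
    return ast.parse(src, mode="eval").body


names = {"x": 5, "e": Expr()}
print(eval_compare(parse("1 < x < 10"), names))              # True
print(eval_compare(parse("10 < x < 20"), names))             # False
print(type(eval_compare(parse("e > 1"), names)).__name__)    # Expr
```

With these semantics `e > 1` returns the expression object, while `e > 1 > 0` still raises TypeError, exactly as it does in plain Python.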

@newville
Member

@OliverCWY OK, I think this should be fixed (that is, "match Python") in the master branch with 7e2050d

@OliverCWY
Author

Thanks a lot for this great project. I will close this issue.
