on_compare not properly handling non-boolean values #131

OliverCWY · 2024-08-17T02:06:02Z

In some libraries (such as polars), the __bool__ methods do not raise ValueError (e.g. polars raises TypeError). This causes the try-except block

try:
  if not res:
    break
except ValueError:
  pass

to let the TypeError propagate uncaught.

Example code snippet that demonstrates the above:

import polars as pl
from asteval import Interpreter

aeval = Interpreter()
aeval("pl.col('a') > 1")

I assume that any exceptions in the try block would come from the __bool__ method and thus it would be safe to catch all types of error?
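
A minimal sketch of the mismatch, without needing polars installed. `AmbiguousExpr`, `truth_test`, and `run` are hypothetical names made up for illustration; they only mirror the shape of the try-except block quoted above and are not asteval or polars code:

```python
class AmbiguousExpr:
    """Hypothetical stand-in for an object whose __bool__ raises TypeError."""

    def __bool__(self):
        raise TypeError("the truth value of an AmbiguousExpr is ambiguous")


def truth_test(res, caught):
    """Mimic the quoted try-except; return a label for what happened."""
    try:
        if not res:
            return "break"
    except caught:
        return "swallowed"
    return "continue"


def run(res, caught):
    """Report whether an exception escaped the try-except."""
    try:
        return truth_test(res, caught)
    except Exception as exc:
        return "escaped: " + type(exc).__name__


print(run(AmbiguousExpr(), ValueError))  # TypeError is not caught by `except ValueError`
print(run(AmbiguousExpr(), Exception))   # a broader except clause catches it
print(run(0, ValueError))                # an ordinary falsy value takes the break path
```

Whether a broad `except Exception` is the right fix depends on what else could raise inside the block; here only the truth test can.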


newville · 2024-08-17T13:48:12Z

@OliverCWY Um, your example never gets to the comparison. It raises a NameError:

  pl.col('a') > 1
NameError: name 'pl' is not defined

Yup: pl is not defined in the Interpreter.

If you still think there is a problem, post actual working code that actually shows the problem, and the full traceback.
Spare the conjecture about the cause of any problem until that problem has been identified.

OliverCWY · 2024-08-17T14:52:28Z

Sorry, I forgot to pass the symbol table when modifying my code.

import polars as pl
from asteval import Interpreter

aeval = Interpreter({"pl": pl})
aeval("pl.col('a') > 1")

and the traceback:

   pl.col('a') > 1
TypeError: the truth value of an Expr is ambiguous

Hint: use '&' or '|' to logically combine Expr, not 'and'/'or', and use `x.is_in([y,z])` instead of `x in [y,z]` to check membership.

newville · 2024-08-17T16:32:53Z

@OliverCWY Indeed, from Python:

>>> import polars as pl
>>> if (pl.col('a') > 1): print('Yes')
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../python3.12/site-packages/polars/expr/expr.py", line 152, in __bool__
    raise TypeError(msg)
TypeError: the truth value of an Expr is ambiguous

You probably got here by using a Python standard library function instead of the native expressions API.
Here are some things you might want to try:
- instead of `pl.col('a') and pl.col('b')`, use `pl.col('a') & pl.col('b')`
- instead of `pl.col('a') in [y, z]`, use `pl.col('a').is_in([y, z])`
- instead of `max(pl.col('a'), pl.col('b'))`, use `pl.max_horizontal(pl.col('a'), pl.col('b'))`

Asteval just raises this exception more aggressively (at the first "Compare" instead of at "If"). But if you do (in Python):

>>> (pl.col('a') > 1) or (pl.col('b') < 0)

That will raise the same kind of TypeError exception.

It sort of seems like you would want to follow polars advice and use its methods instead of Python standard library.

I do not have much experience with polars, but this seems like a not very effective sales pitch ;). Like, it has a top-level function called col(), and col('a') is supposed to be comparable to an integer, only sometimes that is going to not be comparable??

What would you expect to happen?

OliverCWY · 2024-08-17T16:47:28Z

Apologies for not explaining the use case. If you simply run pl.col('a') < 1 instead of testing its truth value, you will get a polars expression which can then be used to filter the dataframe.

Following the previous snippet:

expr = pl.col('a') > 1          # works fine
expr = aeval("pl.col('a') > 1") # fails

newville · 2024-08-18T00:21:23Z

@OliverCWY Thanks -- that helps.

Yeah, we do use a special case there that maybe should be relaxed. As in this example (and others, notably numpy),
x > y does not necessarily return a bool or even a bool-like value.

The challenge is that Comparisons may have multiple operators: x > y > z results in one Comparison with multiple operator/values. In that case, you'd like to return False or raise an exception as early as possible.

And indeed,

>>> import polars as pl
>>> pl.col('a') < 10 > 2

raises the same TypeError exception. Python evaluates this as (pl.col('a') < 10) and (10 > 2), so the result of pl.col('a') < 10 is truth-tested, and that is what raises.

A similar case is

>>> import numpy as np
>>> np.arange(10) > 7
array([False, False, False, False, False, False, False, False,  True,
        True])
>>> np.arange(10) > 4 < 9
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

anyway, I think we can fix this so it better matches Python behavior.
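
For reference, Python evaluates a chained comparison `x < y > z` as `(x < y) and (y > z)`, so the first result is truth-tested before the second comparison runs, while a single comparison is never truth-tested. A small sketch that records the calls, using a hypothetical `Tracer` class invented for this illustration:

```python
calls = []


class Tracer:
    """Hypothetical value that records comparison and truth-test calls."""

    def __init__(self, name):
        self.name = name

    def __lt__(self, other):
        calls.append(f"{self.name}.__lt__")
        return Tracer("result")  # a non-bool result, like numpy or polars

    def __bool__(self):
        calls.append(f"{self.name}.__bool__")
        return True


single = Tracer("x") < 10       # single comparison: result returned untested
single_calls = list(calls)

calls.clear()
chained = Tracer("x") < 10 > 2  # evaluated as (x < 10) and (10 > 2)
chained_calls = list(calls)

print(single_calls)   # only the comparison itself
print(chained_calls)  # the comparison, then a truth test of its result
```

If `__bool__` raised instead of returning True, only the chained form would fail, which is the behavior seen with polars and numpy above.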

OliverCWY · 2024-08-18T00:32:00Z

@newville Yes, I have read the source code and understand the reasoning. I think in the try-except block, any error would come from converting res to bool, so it would be safe to simply catch all exceptions rather than only ValueError, which is specific to numpy.

newville · 2024-08-18T00:38:23Z

@OliverCWY Yeah, I agree with that. And maybe for the case of a single comparison, we should just return the result without testing "true-ness". That would still fail on the "If" and behave more like Python. Looking into it...
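
One way that idea could look, as a rough sketch only. `eval_compare` and `NoBool` are hypothetical names and this is not the code in asteval; it just illustrates truth-testing intermediate results while returning the final result untested:

```python
import operator


def eval_compare(left, ops, comparators):
    """Hypothetical chained-comparison evaluator (not asteval's actual code).

    ops: list of two-argument callables, e.g. operator.gt
    comparators: list of right-hand operands, same length as ops
    """
    result = True
    last = len(ops) - 1
    for i, (op, right) in enumerate(zip(ops, comparators)):
        result = op(left, right)
        # As in Python, only intermediate results are truth-tested. The
        # final result is returned as-is, so a single comparison never
        # calls __bool__; any error is deferred to the caller (e.g. an `if`).
        if i < last and not result:
            break
        left = right
    return result


class NoBool:
    """Hypothetical object: comparison returns a non-bool, bool() raises."""

    def __gt__(self, other):
        return self

    def __bool__(self):
        raise TypeError("truth value is ambiguous")


expr = eval_compare(NoBool(), [operator.gt], [1])  # returned without a truth test
print(type(expr).__name__)
print(eval_compare(1, [operator.lt, operator.lt], [2, 3]))  # 1 < 2 < 3
print(eval_compare(3, [operator.lt, operator.lt], [2, 5]))  # 3 < 2 < 5 stops early
```

In this sketch an exception from `__bool__` in a chained comparison still propagates, which is also what plain Python does.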

newville · 2024-08-18T14:17:28Z

@OliverCWY OK, I think this should be fixed (that is, "match Python") in the master branch with 7e2050d

OliverCWY · 2024-08-18T15:08:03Z

Thanks a lot for this great project. I will close this issue.

OliverCWY closed this as completed Aug 18, 2024

newville mentioned this issue Sep 22, 2024

NameError oddity, undefined python variable reference #133

Closed
