Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More convenient syntax instead of FieldOneOfPredicate #3239

Open
joelostblom opened this issue Oct 26, 2023 · 10 comments · May be fixed by #3505
Open

More convenient syntax instead of FieldOneOfPredicate #3239

joelostblom opened this issue Oct 26, 2023 · 10 comments · May be fixed by #3505
Labels
enhancement v6 Aim to be completed for version 6

Comments

@joelostblom
Copy link
Contributor

Writing a condition that matches a single value in relatively straightforward:

import altair as alt
from vega_datasets import data


source = data.cars()
alt.Chart(source).mark_point().encode(
    x='Horsepower',
    y='Miles_per_Gallon',
    color=alt.condition(
        alt.datum.Origin == 'Japan',
        alt.value('red'),
        alt.value('grey')
    )
)

But when we want to match multiple values, the alt.datum.Origin == 'Japan', needs to be replaced with the notably more compllicated:

        alt.FieldOneOfPredicate(
            'Origin',
            ['Japan', 'Europe']
        ),

It would be nice to support the in operator here and allow the syntax to just be alt.datum.Origin in ['Japan', 'Europe']. I haven't looked into the change necessary to make this happen but noting it down as something to revisit in the future.

@mattijn
Copy link
Contributor

mattijn commented Oct 28, 2023

I think that you would need a custom Altair list instead of using a python list to control this behaviour in the predicate.
You could add a custom magic method for the membership operator (eg. __contains__) to the OperatorMixin, but the equivalent in and not in membership operators will only be applied to the right-hand-side operand and not on the left-hand-side operand. Basically the list decide what it contains and since this is a python list it will return a boolean.

@joelostblom
Copy link
Contributor Author

...the equivalent in and not in membership operators will only be applied to the right-hand-side operand and not on the left-hand-side operand. Basically the list decide what it contains and since this is a python list it will return a boolean.

I was wondering about this. What made me think that it might be possible was that we do use the standard Python operators for the other comparisons, but I hear that this might not work the same for the in operator. I don't fully understand why it would only be applied to the right-hand side from just skimming the code, but I will try to have a closer look at that section of the code base in the future. Thanks for pointing this out and linking to where the logic is!

@dangotbanned
Copy link
Member

dangotbanned commented Jul 11, 2024

I have a couple of alternative ideas for this one.

1 - A wrapper function (one_of)

Note: the ...TimeUnit_T aliases are from https://github.com/dangotbanned/altair/blob/ac23fd51460e69b1a7189da28b9a10e28eda2ac7/altair/vegalite/v5/schema/core.py#L16331-L16333

Code block
from __future__ import annotations

from typing import TYPE_CHECKING

from altair import FieldOneOfPredicate, Undefined

if TYPE_CHECKING:
    from altair import Optional, SchemaBase
    from altair.vegalite.v5.schema._typing import (
        BinnedTimeUnit_T,
        Map,
        MultiTimeUnit_T,
        SingleTimeUnit_T,
    )


def one_of(
    field: str,
    /,
    *oneOf: bool | float | Map | SchemaBase,
    timeUnit: Optional[
        dict | SchemaBase | MultiTimeUnit_T | BinnedTimeUnit_T | SingleTimeUnit_T
    ] = Undefined,
) -> FieldOneOfPredicate:
    return FieldOneOfPredicate(field=field, oneOf=oneOf, timeUnit=timeUnit)

Compared

This has a very similar appearance to @joelostblom's original:

alt.datum.Origin in ['Japan', 'Europe'] # original

alt.one_of("Origin", "Japan", "Europe") # proposed

Pros

  • field is positional-only
    • so alt.one_of(field=origin,...) would raise a TypeError
    • This would be implicit without the /, I'm just using that to highlight
  • oneOf is variadic
    • No need to spell out a list
  • timeUnit
    • Specifying this wouldn't have been possible in a hypothetical operator-based solution
  • Doesn't require modifying any existing code

Cons

  • Without reading the parameter names, it might be unintuitive to figure out what this function is one_of-of
  • Given the current number of symbols in the top level namespace, alt.one_of could be difficult to find among hundreds
  • This only addresses 1/8 of the Field...Predicate classes https://vega.github.io/vega-lite/docs/predicate.html#field-predicate

2 - Utility class for all field predicates (field)

Note: This is enough to demo the idea, but I've left it as stubs to avoid repeating parts

Code block
class field:
    @classmethod
    def one_of(
        cls,
        field: str,
        /,
        *oneOf: bool | float | Map | SchemaBase,
        timeUnit: Optional[
            dict | SchemaBase | MultiTimeUnit_T | BinnedTimeUnit_T | SingleTimeUnit_T
        ] = Undefined,
    ) -> FieldOneOfPredicate:
        return FieldOneOfPredicate(field=field, oneOf=oneOf, timeUnit=timeUnit)

    @classmethod
    def eq(
        cls,
        field: str,
        value: Any,
        /,
        *,
        timeUnit: Optional[
            dict | SchemaBase | MultiTimeUnit_T | BinnedTimeUnit_T | SingleTimeUnit_T
        ] = Undefined,
    ) -> FieldEqualPredicate:
        return FieldEqualPredicate(equal=value, field=field, timeUnit=timeUnit)

    @classmethod
    def lt(
        cls,
        field: str,
        value: Any,
        /,
        *,
        timeUnit: Optional[
            dict | SchemaBase | MultiTimeUnit_T | BinnedTimeUnit_T | SingleTimeUnit_T
        ] = Undefined,
    ) -> FieldLTPredicate:
        return FieldLTPredicate(lt=value, field=field, timeUnit=timeUnit)

    @classmethod
    def lte(cls, field: str, value: Any, /, **kwds) -> FieldLTEPredicate: ...
    @classmethod
    def gt(cls, field: str, value: Any, /, **kwds) -> FieldGTPredicate: ...
    @classmethod
    def gte(cls, field: str, value: Any, /, **kwds) -> FieldGTEPredicate: ...
    @classmethod
    def valid(cls, field: str, valid: bool, **kwds) -> FieldValidPredicate: ...
    @classmethod
    def range(
        cls,
        field: str,
        range: Optional[
            dict
            | Parameter
            | SchemaBase
            | Sequence[dict | None | float | Parameter | SchemaBase]
        ] = Undefined,
        /,
        **kwds,
    ) -> FieldRangePredicate: ...

Compared

alt.datum.Origin in ['Japan', 'Europe'] # original

alt.one_of("Origin", "Japan", "Europe") # proposed (1)

alt.field.one_of("Origin", "Japan", "Europe") # proposed (2)

Autocomplete

image

Pros

  • More explicit, without being overly verbose
  • All 8 of the field predicate types are accessible under a single namespace
  • Using positional-only parameters avoids lots of verbose repetition
    • alt.FieldValidPredicate(field=..., valid=,...)
  • Would align closer to the proposed expr changes in feat: Reimplement expr as a class that is understood by IDEs #3466

Cons

  • Still may be difficult to find, but easy to teach like alt.expr, alt.datum
  • Some of these operations overlap with alt.datum

Summary

Personally think alt.field.one_of would be the least controversial and, in the future, could allow removing those wrapped classes from the top-level.
Rather than special-casing one_of alone.

I'd be happy to work on this if there's any interest

@joelostblom
Copy link
Contributor Author

Thanks for these great ideas @dangotbanned ! I think I agree that a submodule is better than top module to avoid cluttering the top level api and allow easy discovery of related functionality. However, of the existing field predicates, most are already covered by the regular comparison operators >, ==, etc, so there is really only a need to introduce one_of and range (and maybe valid) which makes the grouping/discoverability advantages from the submodule less noticeable.

I wonder if we could be closer to how the syntax looks in Python (since this is what is used for the already existing comparison operators). I'm a bit short of time at the moment to dive deep into things, but do you think something along these lines would be possible or at they both susceptible to the constraints Mattijn mentioned above?

# 1
alt.datum.Origin in alt.list('Japan', 'Europe')
alt.datum.Origin in alt.range(1, 10)

# 2
alt.datum.Origin alt.in ['Japan', 'Europe']
alt.datum.Origin alt.in range(1, 10)

@dangotbanned
Copy link
Member

dangotbanned commented Jul 17, 2024

Thanks for the response @joelostblom

However, of the existing field predicates, most are already covered by the regular comparison operators >, ==, etc, so there is really only a need to introduce one_of and range (and maybe valid) which makes the grouping/discoverability advantages from the submodule less noticeable.

Yeah I agree. I've personally not used any of them before, so mostly just included them for completeness.

Continuation of Utility class for all field predicates (field)

One additional thing I thought of, but seemed a bit much for the initial idea.
Changing the classmethod definitions to wrap the result in api.SelectionPredicateComposition.

So this allows using ~, &, | with compositions of selection objects.

# Previously ref as (2)
class field_old:
    @classmethod
    def one_of(
        cls,
        field: str,
        /,
        *oneOf: bool | float | Map | SchemaBase,
        timeUnit= Undefined,
    ) -> FieldOneOfPredicate:
        return FieldOneOfPredicate(field=field, oneOf=oneOf, timeUnit=timeUnit)
	@classmethod
    def range(
        cls,
        field: str,
        range: Optional[
            dict
            | Parameter
            | SchemaBase
            | Sequence[dict | None | float | Parameter | SchemaBase]
        ] = Undefined,
        /,
        **kwds,
    ) -> FieldRangePredicate: ...

# ----------------------------------------------------
# Extending for composition (3)
class field:
	@classmethod
    def one_of(
        cls,
        field: str,
        /,
        *values: bool | float | dict[str, Any] | SchemaBase,
        timeUnit: TimeUnitType = Undefined,
    ) -> SelectionPredicateComposition:
		m = FieldOneOfPredicate(field=field, oneOf=values, timeUnit=timeUnit).to_dict()
		return SelectionPredicateComposition(m)

	@classmethod
    def range(
        cls, field: str, value: RangeType, /, *, timeUnit: TimeUnitType = Undefined
    ) -> SelectionPredicateComposition:
        m = FieldRangePredicate(field=field, range=value, timeUnit=timeUnit).to_dict()
        return SelectionPredicateComposition(m)

I was curious on if this would actually render, and surpisingly it did!

altair.utils._shorthand_ns isn't part of any public branch at the moment.

image


Feedback on @joelostblom's idea

I wonder if we could be closer to how the syntax looks in Python (since this is what is used for the already existing comparison operators). I'm a bit short of time at the moment to dive deep into things, but do you think something along these lines would be possible or at they both susceptible to the constraints Mattijn mentioned above?

# 1
alt.datum.Origin in alt.list('Japan', 'Europe')
alt.datum.Origin in alt.range(1, 10)

My first thoughts on # 1 were issues like:

Since both list and range are builtins

But apparently shadowing a builtin with a function isn't an issue?

Personally, I think it would be error prone since a user could do this and accidentally change the meaning of their code:

from altair import list, range

...

list(...)
range(...)

# 2
alt.datum.Origin alt.in ['Japan', 'Europe']
alt.datum.Origin alt.in range(1, 10)

For # 2 there are some major issues:

Cell In[11],   [line 22](vscode-notebook-cell:?execution_count=11&line=22)
    alt.datum.Origin alt.in range(1, 10)
                     ^
SyntaxError: invalid syntax

Also VSCode yelling at me:

SyntaxError: Simple statements must be separated by newlines or semicolons (Ruff)
Statements must be separated by newlines or semicolons (Pylance)
Expected attribute name after "." (Pylance)
SyntaxError: Expected an identifier, but found a keyword 'in' that cannot be used here (Ruff)
Statements must be separated by newlines or semicolons (Pylance)
Expected expression (Pylance)
SyntaxError: Simple statements must be separated by newlines or semicolons (Ruff)
Missing whitespace around operator (Ruff)

in is a reserved keyword https://docs.python.org/3/reference/lexical_analysis.html#keywords

Which is the same reason you'll see as_ used in the TopLevelMixin.transform_ methods:

def transform_bin(
self,
as_: Optional[str | FieldName | list[str | FieldName]] = Undefined,
field: Optional[str | FieldName] = Undefined,
bin: Literal[True] | BinParams = True,

@mattijn
Copy link
Contributor

mattijn commented Jul 17, 2024

I quite like the alt.field.* approach! Thanks for this exploration @dangotbanned!

@joelostblom
Copy link
Contributor Author

Personally, I think it would be error prone since a user could do this and accidentally change the meaning of their code:

Yup, definitely. We would have to append an underscore like we do with alt.expr.if_ but I've always thought that trailing underscore looks a bit unofficial/private, so I'm not super sold on alt.list_/alt.range_.

So this allows using ~, &, | with compositions of selection objects

I like this! Maybe just go ahead with the field submodule approach then as mattijn is on board too. One modification I wonder if we can do is that instead of passing the items of the list as parameters, could we just pass the list? So the syntax would be alt.field.one_of("Origin", ["Japan", "Europe"]). I think that is more intuitive, aligns with the syntax for field.range in your screenshot, and it would make it easier if the list is predefined as well since we avoid unpacking it, ie avoid alt.field.one_of("Origin", *predefined_list). Unless I'm missing some key functionality enables by having these as params instead?

@dangotbanned
Copy link
Member

dangotbanned commented Jul 17, 2024

I quite like the alt.field.* approach! Thanks for this exploration @dangotbanned!

Thanks @mattijn


So this allows using ~, &, | with compositions of selection objects

I like this! Maybe just go ahead with the field submodule approach then as mattijn is on board too. One modification I wonder if we can do is that instead of passing the items of the list as parameters, could we just pass the list? So the syntax would be alt.field.one_of("Origin", ["Japan", "Europe"]). I think that is more intuitive,

@joelostblom You can absolutely have both for field.one_of, if you feel its more intuitive!

Most of the polars API that accepts *args does so like LazyFrame.select.


aligns with the syntax for field.range in your screenshot, and it would make it easier if the list is predefined as well since we avoid unpacking it, ie avoid alt.field.one_of("Origin", *predefined_list). Unless I'm missing some key functionality enables by having these as params instead?

So for field.range, which wraps FieldRangePredicate - the range parameter type is a bit more complex:

https://github.com/vega/altair/blob/f8c9776a72afa433f3e6b311bdaf5133d38f127b/altair/vegalite/v5/schema/core.py#L16545-L16576

But I think for most cases it would be tuple[float, float].
The most important part being 2 elements (min/max), so *args might be misleading as a specific length is expected.

@mattijn
Copy link
Contributor

mattijn commented Jul 25, 2024

I saw #3440 (comment) again.
How would the proposed syntax looks like here?

import altair as alt
from vega_datasets import data

source = data.disasters()
columns_sorted = ['Drought', 'Epidemic', 'Earthquake', 'Flood']

alt.Chart(source, height=200).transform_filter(
    {'and': [
        alt.FieldOneOfPredicate(field='Entity', oneOf=columns_sorted), # Filter data to show only disasters in columns_sorted
        alt.FieldRangePredicate(field='Year', range=[1900, 2000]) # Filter data to show only 20th century
    ]}
).mark_line(interpolate='monotone').encode(x=alt.X('Year:Q').axis(format='d'), color='Entity',column='Entity', y='sum(Deaths)')

Will it be as such?

alt.field.one_of(field='Entity', one_of=columns_sorted) & alt.field.range(field='Year', range=[1900, 200])

Also linking #695 here. Is this applicable here too?

dangotbanned added a commit to dangotbanned/altair that referenced this issue Jul 26, 2024
`field` proposed in vega#3239 (comment)
`agg` was developed during vega#3427 (comment) as a solution to part of vega#3476
dangotbanned added a commit to dangotbanned/altair that referenced this issue Jul 26, 2024
@dangotbanned dangotbanned linked a pull request Jul 26, 2024 that will close this issue
@dangotbanned
Copy link
Member

dangotbanned commented Jul 26, 2024

I saw #3440 (comment) again. How would the proposed syntax looks like here?

import altair as alt
from vega_datasets import data

source = data.disasters()
columns_sorted = ['Drought', 'Epidemic', 'Earthquake', 'Flood']

alt.Chart(source, height=200).transform_filter(
    {'and': [
        alt.FieldOneOfPredicate(field='Entity', oneOf=columns_sorted), # Filter data to show only disasters in columns_sorted
        alt.FieldRangePredicate(field='Year', range=[1900, 2000]) # Filter data to show only 20th century
    ]}
).mark_line(interpolate='monotone').encode(x=alt.X('Year:Q').axis(format='d'), color='Entity',column='Entity', y='sum(Deaths)')

@mattijn I've opened #3505 to allow you (and anyone really) to test out what I've presented so far.

Will it be as such?

alt.field.one_of(field='Entity', one_of=columns_sorted) & alt.field.range(field='Year', range=[1900, 2000])

It would be the slightly shorter:

# Also valid
# alt.field.one_of('Entity', 'Drought', 'Epidemic', 'Earthquake', 'Flood')
alt.field.one_of('Entity', columns_sorted) & alt.field.range('Year', [1900, 2000])

Each field argument is positional-only, and what would be one_of and range are as well.

See #3239 (comment) for extra detail on that choice

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement v6 Aim to be completed for version 6
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants