Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The "collect" decorator #557

Closed
maxme1 opened this issue Dec 25, 2022 · 9 comments
Closed

The "collect" decorator #557

maxme1 opened this issue Dec 25, 2022 · 9 comments

Comments

@maxme1
Copy link

maxme1 commented Dec 25, 2022

Hi!
I often encounter the same pattern in my code:

def to_power(array, powers):
    result = []
    for entry in array:
        power = powers.get(entry)
        if power is not None and power >= 0:
            result.append(entry ** power)

    return result


def reverse_mapping(expensive_func, values):
    result = {}
    for entry in values:
        value = expensive_func(entry)
        if value is not None:
            result[value] = entry

    return result

The examples are somewhat simplistic, but you get the idea:

  1. create a container
  2. iterate some data, do some branching, etc and fill the container
  3. return the container

I came up with several decorators that reduce this to:

@collect
def to_power(array, powers):
    for entry in array:
        power = powers.get(entry)
        if power is not None and power >= 0:
            yield entry ** power


@composed(dict)
def reverse_mapping(expensive_func, values):
    for entry in values:
        value = expensive_func(entry)
        if value is not None:
            yield value, entry

composed(func) simply applies func to the result of the decorated function, which effectively gathers the generator. And collect is just a shorter version of composed(list)

I can create a PR with my implementation, if you are interested in adding it to toolz.

@mentalisttraceur
Copy link

mentalisttraceur commented Jan 13, 2023

This seems like a really worthwhile ergonomics boon. If I was the maintainer of toolz I would be leaning hard towards "yes" on this.

Goes well with #486 - if the starcall function described there is available, then this creates a nicer way of defining functions that return args or kwargs for the next function in the composition.

I'm imagining @composed(Arguments.wrap) on top of @collect or @composed(dict).

Could maybe even enhance the Arguments in that other issue enable something like @composed(Arguments.build) on functions that do yield value, yield 'name', value, or both.

@maxme1
Copy link
Author

maxme1 commented Feb 12, 2023

I decided to add a PR in case you decide to add it to toolz

@ZeroBomb
Copy link

ZeroBomb commented Feb 17, 2023

So... I'm not a big fan.

In my mind, the functional style of programming is about separating the "business" logic from the iteration logic that feeds them data.

These decorators encourage people to stop before achieving such separation.

consider the following:

@curry
def raise(powers, entry):
    power = powers.get(entry)
    return entry ** power if power is not None and power >= 0 else None

def to_power(powers):
    return compose_left(
      map(raise(powers))
    , remove(identity)
    , list
    )
def reverse_mapping(expensive_func):
    return compose_left( 
         map(juxt(expensive_func, identity))
        , dict
        , keyfilter(identity)
    )

@mentalisttraceur
Copy link

@ZeroBomb for what it's worth, I find your first alternative harder to understand than either the manual list-appending version or this "collect into list" decorator proposal.

(I had the luck of trying to understand your alternatives from a clean slate, because I totally forgot the examples from the beginning of this issue.)

The biggest reason is that the relevant information gets spread out further apart and I feel myself having to think and remember more across the two functions to re-connect it.

@mentalisttraceur
Copy link

mentalisttraceur commented Feb 17, 2023

I suggest looking at this proposal more abstractly:

  1. composed is a decorator version of compose + partial. That can be pretty nice shorthand. Some of the arguments from PEP-318 in favor of decorators apply: some things are clearer if they immediately precede a function rather than being buried below it.

    Example: @composed(log_result, prefix="whatever") on top of def foo instead of foo = compose_left(foo, partial(log_result, prefix="whatever")) below the whole body of foo.

  2. "Collect yielded values into list/dict/whatever" adds ergonomics for building arbitrary collections. A good option in situations where literals, list/dict/set comprehensions, or map/filter start to get awkward. Sometimes there's a lot of coupling between multiple loops; sometimes logic can be more readably expressed in a full function body.

@maxme1
Copy link
Author

maxme1 commented Feb 17, 2023

@mentalisttraceur you're reading my mind!
I mostly use these decorators as a cleaner way to write "comprehension-like" code, when I have to chew through some data.

@ZeroBomb
Consider this code (almost verbatim from one of my projects):

@collect
def interpolate_dict(slices, size, default):
    for index in range(size):
        # find_nearest_keys's contents are not relevant for the example
        start, stop = find_nearest_keys(slices, index)
        if index in slices:
            yield slices[index]

        elif start is not None and stop is not None:
            try:
                # interpolate's contents are not relevant for the example
                yield interpolate(slices[start], slices[stop], (index - start) / (stop - start))
            except InterpolationError:
                yield default

        else:
            yield []

Here you could do some comprehensions and exceptions rerouting and maybe define a tiny function that calls find_nearest_keys inside it (because we need a scope for start and stop). But would it be more readable? In my experience when you're working with some irregular data it's much cleaner to write a loop, than an one-off-function.

@ZeroBomb
Copy link

ZeroBomb commented Feb 18, 2023

So I will admit to writing decorators like the ones in this proposal in the past, but ultimately I would end up removing them because I was using them in ways that ended up being counter-productive.

The pattern I observed in my own coding would go like this:

  1. I write my functions simple and focused on the problem at hand (good!)
  2. Next, I want to use them someplace, but it would be more convenient if they (automatically mapped themselves over dataframes/formatted their results a different way/did some kind of post processing/etc)
  3. I know! I'll decorate the simple functions so they do that!
  4. Wait, now I need the functionality of the underlying function, but without the decoration, or with the results formatted a different way...
  5. I remove the decoration and put the responsibility for pre- or post- processing the function's data where the pre- or post- processing is needed, instead of permanently attaching it to the simple function.

In @maxme1 's example, I would gravitate towards something like this:

@curry
def interpolate_index(slices, index, default=None):
    # business code, and nothing but.  
    # This describes exactly the process for calculating the interpolated value at an index
    if index in slices:
        return slices[index]
    start, stop = find_nearest_keys(slices, index)
    if start and stop:
       try:
           return interpolate(slices[start], slices[stop], (index - start) / (stop - start))
       except InterpolationError:
           return default
    return []

def interpolate_dict(slices, size, default=None):
    # This describes the process for interpolating over a range of values
    return map(interpolate_index(slices, default=default), range(size))

# More generally
def apply_interpolator(interpolator, size):
    # This describes the process for interpolating over a range of values
    return map(interpolator, range(size))

There is a sense in which you have to keep two things in mind (the mapping function and the mapped function), but I would argue that:

  1. Mapping functions over collections of values is a fundamental concept in the functional style of programming, and the toolz library should encourage this style.
  2. The interpolate_index function is the basic "unit of work" in this algorithm, and setting it apart actually clarifies what you are trying to accomplish.
  3. Putting the iteration in a separate function gives you the ability to trivially add e.g. a parallelized version of the iterpolate_dict function without touching your business logic at all.

@maxme1
Copy link
Author

maxme1 commented Feb 18, 2023

Isn't writing code like this a bit too much of future-proofing?
Yes, in perspective I could restructure my code as you suggested, but if I need the function right now, writing a single function with a decorator is simpler and (arguably) more readable.

But I see your point. You're basically saying that decorators like these are against the philosophy of toolz, and this is a strong argument 🙂

@ZeroBomb @mentalisttraceur maybe you could suggest another package where this functionality will fit?

@maxme1
Copy link
Author

maxme1 commented Feb 21, 2023

Thanks for the feedback, everyone!
I didn't think of anything simpler than moving these 2 decorators to a separate library - jboc (Just a Bunch Of Code).
Pretty dumb, but gets the job done 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants