Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow Base.filter to work with all iterators #31188

Open
pdeffebach opened this issue Feb 27, 2019 · 17 comments
Open

Allow Base.filter to work with all iterators #31188

pdeffebach opened this issue Feb 27, 2019 · 17 comments
Labels
collections Data structures holding multiple items, e.g. sets missing data Base.missing and related functionality

Comments

@pdeffebach
Copy link
Contributor

This issue is motivated by this discussion on discourse.

My example of when this is a problem is when you are working with a subset of data that, by chance, does not contain missing values. When you work with the full data, you now have missing values and have to use skipmissing at various places in your code. Now all your filter commands are broken.

A heavy and breaking change would be to make Base.filter work on any iterator, and not just arrays. Other alternatives include pointing more people towards Iterators.filter for general use, or special-casing skipmissing so a collect call is included somewhere.

@ararslan ararslan added the collections Data structures holding multiple items, e.g. sets label Feb 28, 2019
@ararslan
Copy link
Member

One issue here is that filter is eager and, generally speaking, typeof(filter(f, x)) == typeof(x). That isn't really possible to implement in a general case, but I think it would be okay to define filter for SkipMissing to materialize and filter.

@nalimilan nalimilan added the missing data Base.missing and related functionality label Mar 1, 2019
@nalimilan
Copy link
Member

So I guess in that proposal filter(::SkipMissing{<:AbstractArray}) would work, but not with other wrapped iterators?

@ararslan
Copy link
Member

ararslan commented Mar 1, 2019

That would make sense to me. SkipMissing is an odd enough case itself that I think special treatment of it would be fine. We could define it as

filter(f, s::SkipMissing) = filter(x->!ismissing(x) && f(x), s.x)

then let filter fail if the underlying iterator doesn't have filter defined on it.

@pdeffebach
Copy link
Contributor Author

An alternative is that people could just change their filter commands to have ismissing, which is a bit more explicit anyways.

This could just be a documentation / tutorial problem.

@ararslan
Copy link
Member

ararslan commented Mar 2, 2019

I was looking at implementing what I said above verbatim, but I realized that that definition leaves you with a container without any missing values but with an element type that allows missing. That seems suboptimal and is inconsistent with the result of filter(f, collect(s)).

This is difficult to do generically, since ideally you'd want to be able to use filter on any SkipMissing{T} where T supports filter and get a T back, but to provide useful and consistent behavior, I'm not sure how we'd swing it for anything but arrays...

@nalimilan
Copy link
Member

Supporting filter only for AbstractArray sounds OK to me.

@ararslan
Copy link
Member

ararslan commented Mar 2, 2019

Implemented in #31235.

@nalimilan
Copy link
Member

One possible rationale for supporting filter(::SkipMissing{<:AbstractArray}) and not other methods is that SkipMissing{<:AbstractArray} is indexable. If an Indexable trait was added as discussed at #31020 (comment), it would make sense for filter to support all types with that trait. That would include in particular Broadcasted objects.

@ararslan
Copy link
Member

ararslan commented Mar 7, 2019

We can't support filter for arbitary iterables and we now have filter for SkipMissing-wrapped arrays, so is there anything actionable left or can this be closed now that #31235 is merged?

@nalimilan
Copy link
Member

As I noted it could make sense to also support Broadcasted. Not sure whether that warrants keeping this open.

@ararslan
Copy link
Member

ararslan commented Mar 7, 2019

Ah right, sorry. It might make sense to have that as a separate issue, since it's very specific (and a difficult situation to find yourself in, since presumably a broadcast will materialize before filter is called).

@pdeffebach
Copy link
Contributor Author

Broadcasted would materialize an array while lifting missing elements? Would be great!

@nalimilan
Copy link
Member

Yes that's in the context of #31088.

@ssfrr
Copy link
Contributor

ssfrr commented Mar 5, 2020

I just ran into this trying to filter a zip, which also doesn't work.

filter(x->x[2] > 0.5, zip(rand(10), rand(10)))

Is the main objection to having filter work for arbitrary iterators that we can't then guarantee that typeof(filter(f, x)) == typeof(x)? It doesn't seem that map makes that guarantee:

map(sum, zip(rand(10), rand(10)))

This returns a Vector{Float64}. It seems like returning a Vector{eltype(x)} in the case that x is an iterator seems like a pretty reasonable behavior.

@ararslan
Copy link
Member

ararslan commented Mar 5, 2020

It doesn't seem that map makes that guarantee

Sure, but that's not the same operation. map is explicitly a transformation of the data into something different, while filter gives you back a subset of the existing data.

It seems like returning a Vector{eltype(x)} in the case that x is an iterator seems like a pretty reasonable behavior

The problem is defining what an "iterator" is. For example, one can iterate a tuple, but I'd be surprised if e.g. filter(isodd, (1,2,3)) gave me back an array.

@nalimilan
Copy link
Member

The problem is defining what an "iterator" is. For example, one can iterate a tuple, but I'd be surprised if e.g. filter(isodd, (1,2,3)) gave me back an array.

But why would it return an array? Just like map returns a tuple for tuples, filter could fall back to an array for general iterators, but return another type for some iterable types.

@ssfrr
Copy link
Contributor

ssfrr commented Mar 5, 2020

Sure, but that's not the same operation. map is explicitly a transformation of the data into something different, while filter gives you back a subset of the existing data.

map transforms the individual elements, so I'd expect that the eltype would change, but that seems separate from whether if affects the container type.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
collections Data structures holding multiple items, e.g. sets missing data Base.missing and related functionality
Projects
None yet
Development

No branches or pull requests

4 participants