Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify iterator behaviour (filter is half-lazy) #13712

Closed
cstjean opened this issue Oct 22, 2015 · 4 comments
Closed

Clarify iterator behaviour (filter is half-lazy) #13712

cstjean opened this issue Oct 22, 2015 · 4 comments
Labels
docs This change adds or pertains to documentation

Comments

@cstjean
Copy link
Contributor

cstjean commented Oct 22, 2015

Filter does not evaluate the filtering function immediately:

f = filter(x->begin print("here"); true end, keys(Dict(:x=>10)))
> Filter{Function,Base.KeyIterator{Dict{Symbol,Int64}}}((anonymous function),[:x])

However it may evaluate it multiple times,

for x=1:5 collect(f) end
> hereherehereherehere

This is not specified in the docstring:

filter(function, collection)
Return a copy of collection, removing elements for which function is false. For associative collections, the function is passed two arguments (key and value).

Jeff mentions in a comment that iterators should be side-effect free. Fair enough. However, my filtering functions take a long time per-element, and I do not want them called many times needlessly. I can put a collect in front, but this was a surprise to me, since map is not lazy. Is there a general policy to know where (half-)laziness is to be expected? Thank you.

@JeffBezanson
Copy link
Member

I don't understand the multiple evaluation problem. Why is it surprising that collecting a filter of a 1-element collection 5 times calls the function 5 times? That's exactly what I would expect.

@cstjean
Copy link
Contributor Author

cstjean commented Oct 22, 2015

Because it's not symmetric with map's behaviour? Why doesn't map(f, ::Iterator) return an iterator? The example that caused me issues is similar to this:

remove_non_primes(iterable) = filter(is_prime, iterable)

numbers_without_primes = remove_non_primes(numbers)

#  `is_prime` will be called twice for each number in `numbers`
variance = mean(map(x->x^2, numbers_without_primes)) - mean(numbers_without_primes)^2

I had an (iterator over an) array of objects that got filtered by an expensive function, then my iterative algorithm went over the filtered array N times, and I did not realize that the expensive function was called N times per object until I profiled.

Granted, the above code only has an issue if numbers is an iterator to begin with. I suppose that I should have reasoned more carefully about iterator behavior? From my experience, I expect either eager evaluation, or lazy evaluation that saves the results of expensive computations (like Haskell), but if I am alone in finding Julia's behavior surprising, I'll close the issue tomorrow.

@JeffBezanson
Copy link
Member

Ok, that example makes it clearer. In fact I think filter is the only function that is sometimes eager and sometimes returns a lazy iterator. That definitely doesn't seem so good.

@kshyatt kshyatt added the docs This change adds or pertains to documentation label Nov 25, 2015
@cstjean
Copy link
Contributor Author

cstjean commented Mar 7, 2017

Fixed by #18839

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs This change adds or pertains to documentation
Projects
None yet
Development

No branches or pull requests

3 participants