Filter is not documented #16742

Closed
cossio opened this issue Jun 3, 2016 · 6 comments
Labels
docs This change adds or pertains to documentation

Comments

@cossio
Contributor

cossio commented Jun 3, 2016

After some experimentation, I found that Base.Filter creates an iterator from a predicate and another iterator. Example:

Filter(iseven, 1:7)

creates an iterator over the integers 2, 4, 6.

I couldn't find this in the documentation. But I think it's very convenient.

Note that this is distinct from the function filter, which instantiates a filtered collection, instead of an iterator.
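A minimal sketch of the distinction described above, assuming a current Julia release (1.x), where the lazy version is spelled `Iterators.filter` rather than `Base.Filter`. The variable names `lazy` and `eager` are only illustrative:

```julia
# Assumption: Julia 1.x, where the lazy filter is `Iterators.filter`.
lazy  = Iterators.filter(iseven, 1:7)  # wraps the range; nothing is computed yet
eager = filter(iseven, 1:7)            # allocates a Vector right away

# The lazy object is an iterator, not an array.
is_array = lazy isa AbstractArray

# Elements are produced only when the iterator is consumed.
for x in lazy
    println(x)
end

# Materialize the lazy iterator explicitly when a collection is needed.
materialized = collect(lazy)
```

Both forms yield the same elements; the difference is when the work happens and whether an intermediate array is allocated.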

@berndbohmeier

berndbohmeier commented Jun 4, 2016

A comment on your note: according to the source, this is not true.
filter(flt, itr) = Filter(flt, itr)
If none of the more specialized filter implementations match, filter is implemented as Filter.

I am new to Julia, so this is just a thought, but this is inconsistent and should at least be better documented.
Why should filter(flt, 1:4) return an array but filter(flt, it) an iterator? I would expect 1:4 and a custom iterator to be treated the same way and both return an iterator.

@krcools

krcools commented Jun 4, 2016

I also noticed this inconsistency. Why not have filter always return an iterator? You can always force the Array to materialize by calling collect(filter(flt, it)) if required.

Since Julia aims to be a language for HPC scientific computing, I think it is important to give client programmers control over memory complexity. Returning an Array results in O(N) memory complexity; returning an iterator lets the user process the returned objects one at a time, keeping the memory complexity down to O(1).
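The memory argument can be sketched as follows, assuming Julia 1.x and its `Iterators.filter`. The size `n` and the variable names are arbitrary choices for illustration, not anything from the thread:

```julia
# Assumption: Julia 1.x (`Iterators.filter`); `n` is an arbitrary illustrative size.
n = 1_000_000

# Eager: builds an intermediate Vector holding n ÷ 2 elements, then sums it.
total_eager = sum(filter(iseven, 1:n))

# Lazy: elements are produced one at a time and folded into the sum,
# so no intermediate array is allocated.
total_lazy = sum(Iterators.filter(iseven, 1:n))
```

The two totals are equal; only the peak memory use differs, which is the trade-off being argued for here.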

@zhmz90
Contributor

zhmz90 commented Jun 5, 2016

julia> methods(filter)
# 7 methods for generic function "filter":
filter(f, a::Array{T<:Any,1}) at array.jl:952
filter(f, Bs::BitArray) at bitarray.jl:1746
filter(f, As::AbstractArray) at array.jl:937
filter(f, d::Associative) at dict.jl:273
filter(f, s::Set) at set.jl:166
filter(f, s::AbstractString) at strings/basic.jl:279
filter(flt, itr) at iterator.jl:112
julia> filter(iseven,1:4) # the type of 1:4 is a subtype of AbstractArray, so filter returns an array
2-element Array{Int64,1}:
 2
 4

julia> filter(iseven,(i for i in 1:4))  # the input is a generic iterator, so filter returns an iterator
Filter{Base.#iseven,Base.Generator{UnitRange{Int64},##7#8}}(iseven,Base.Generator{UnitRange{Int64},##7#8}(#7,1:4))

julia> filter(iseven,[i for i in 1:4]) # the input is an array, so filter returns an array
2-element Array{Int64,1}:
 2
 4

@berndbohmeier

> Why should filter(flt, 1:4) return an array but filter(flt,it) an iterator. I would expect that 1:4 and an own iterator are treated the same way and return an iterator.

julia> typeof(1:4)
UnitRange{Int64}

julia> UnitRange{Int64} <: AbstractArray
true

UnitRange is a subtype of AbstractArray in Julia, so filter returns an array.

@krcools

> Why not have filter always return an iterator? You can always force the Array to precipitate by calling collect(filter(flt,it)) if required.
> Since Julia aims to be a language for HPC scientific computing I think it is important to give client programmers control over memory complexity.

In my opinion, small or medium-sized arrays are used more frequently than large arrays among Julia users. If you want to control memory, an iterator can be used.

I am also new to Julia and am just trying to offer what I know. I hope it helps.

@cstjean
Contributor

cstjean commented Jun 5, 2016

We discussed filter's behaviour over iterators in #13712. Returning an iterator is a performance gotcha, because if the result is iterated over N times, the filtering function will be called N times per element.

I would suggest that filter always return a collection, and Iterators.ifilter (currently non-existent) return an iterator. This would mirror map and Iterators.imap.
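The gotcha can be made concrete with a small sketch, assuming Julia 1.x and `Iterators.filter`. The names `calls` and `counting_iseven` are hypothetical helpers introduced only to count predicate invocations:

```julia
# Assumption: Julia 1.x (`Iterators.filter`). `calls` and `counting_iseven`
# are hypothetical names used only for this illustration.
calls = Ref(0)
counting_iseven(x) = (calls[] += 1; iseven(x))

# Iterating the lazy filter three times re-runs the predicate on every pass.
lazy = Iterators.filter(counting_iseven, 1:4)
for _ in 1:3
    foreach(identity, lazy)
end
lazy_calls = calls[]            # 3 passes × 4 elements = 12 calls

# Collecting once runs the predicate once per element; later passes
# only read the stored results.
calls[] = 0
collected = collect(Iterators.filter(counting_iseven, 1:4))
for _ in 1:3
    foreach(identity, collected)
end
collected_calls = calls[]       # 4 calls, all made during `collect`
```

Which behaviour is preferable depends on whether the result is traversed once (where laziness saves memory) or many times (where collecting saves predicate calls).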

vtjnash added the docs label Jun 13, 2016
@krcools

krcools commented Jul 3, 2016

If you plan to iterate over a filtered collection more than once and do not want to apply the filter multiple times, you should collect the outcome of filter. This is explicit in the code and indicates that the author is more worried about CPU cycles than memory. I strongly believe that in computational science the CPU and memory complexity should be part of the API, meaning you should be able to read it from the code.

I agree that there might be cases where a single function doing the filtering and collecting is more efficient than calling collect(filter(...)). Going by the Julia style docs, the suggested name for such a function would be collect_filter...

@TotalVerb
Contributor

Iterators.filter is still not documented. Now that this is exported, it should get a docstring.

TotalVerb added a commit to TotalVerb/julia that referenced this issue Apr 25, 2017