Skip to content
This repository has been archived by the owner on Sep 1, 2020. It is now read-only.

length for chain (fixes #84) #85

Merged
merged 1 commit into from
Oct 30, 2016
Merged

Conversation

CarloLucibello
Copy link
Contributor

@CarloLucibello CarloLucibello commented Oct 11, 2016

Adds a length function for chains when they are supposed to have one

Probably this is not the right way to do if, since iteratorsize is type unstable. Does anyone have better ideas?

@@ -118,6 +118,16 @@ ch2 = chain(1:0, 1:2:5, 0.2:0.1:1.6)
@test eltype(ch2) == typejoin(Int, Float64)
@test collect(ch2) == [1:2:5; 0.2:0.1:1.6]

ch3 = chain(1:10,1:10, 1:10)
@test length(ch3) == 30
@test Base.iteratorsize(ch3) == Base.HasLength()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iteratorsize is imported (if available) and otherwise defined in this package, so it shouldn't be prefixed with Base. (Causing tests to fail.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like I introduced some typo at the last moment just before pushing it (Meth instead of MethodErrod), that was causing the failure

Base.iteratorsize is doing the right job on my machine, i.e. using the method defined in Iterators.jl. Should I still change the test?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, I see, sorry, I was looking at 0.5 travis build

I can switch Base.Iteratorsize to Interface.Iteratorsize but we still have a problem with Base.HasLength(). Should I put the whole test inside a

if VERSION >= v"0.5.0-dev+3305"
...
end

?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think that sounds good.

@CarloLucibello
Copy link
Contributor Author

the test should be good also on 0.4, thanks @kmsquire for the review. Also, I left the prefix Base., tell me if you want it to be Interators. I can't tell the difference at this point.

Do you thinks that this should be merged even if iteratorsize becomes type unstable?

julia> c=chain(1:10,1:10)
Iterators.Chain{Tuple{UnitRange{Int64},UnitRange{Int64}}}((1:10,1:10))

julia> @code_warntype Base.iteratorsize(c)
Variables:
  #self#::Base.#iteratorsize
  x::Iterators.Chain{Tuple{UnitRange{Int64},UnitRange{Int64}}}

Body:
  begin 
      return $(Expr(:invoke, LambdaInfo for iteratorsize(::Type{Iterators.Chain{Tuple{UnitRange{Int64},UnitRange{Int64}}}}), :(Base.iteratorsize), Iterators.Chain{Tuple{UnitRange{Int64},UnitRange{Int64}}}))
  end::Union{Base.HasLength,Base.SizeUnknown}

@iamed2
Copy link
Collaborator

iamed2 commented Oct 11, 2016

IMO the way forward is to determine this when constructing Chain and include the iterator size as a type parameter. That way length could also be defined only when it would return a length.

@CarloLucibello
Copy link
Contributor Author

IMO the way forward is to determine this when constructing Chain and include the iterator size as a type parameter. That way length could also be defined only when it would return a length.

nice! I'll try to cook something to also preserve julia 0.4 compatibility

if VERSION >= v"0.5.0-dev+3305"
r = ( x for x in 1:10 if isodd(x))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though this wouldn't actually run on v0.4, the compiler still attempts to parse it, and unfortunately, this syntax is a parse error on v0.4. 😢

For now, my suggestion is to comment out this test, explaining that it is a parse error on Julia v0.4, and that it should be uncommented when support for v0.4 is dropped.

@codecov-io
Copy link

codecov-io commented Oct 12, 2016

Current coverage is 94.15% (diff: 100%)

Merging #85 into master will increase coverage by 0.13%

@@             master        #85   diff @@
==========================================
  Files             1          1          
  Lines           351        359     +8   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits            330        338     +8   
  Misses           21         21          
  Partials          0          0          

Powered by Codecov. Last update dd9024c...efe829d

@CarloLucibello
Copy link
Contributor Author

ok, the tests are passing now, but...

on a second thought the problem of type instability has shifted from the function iteratorsize to the function chain, so I don't know if we can call that a progress

@CarloLucibello
Copy link
Contributor Author

I figured a way out of the type instability issue, this should be good now (if tests pass :)

Copy link
Collaborator

@iamed2 iamed2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Base.iteratorsize should be defined on the iterator type, not the instance.

_chain_is should return IsInfinite() as soon as it hits an iterator with IsInfinite().

IteratorSize type import is unused.

There should be more tests for more iteratorsize iterator combinations.

@CarloLucibello
Copy link
Contributor Author

Base.iteratorsize should be defined on the iterator type, not the instance.

will do

_chain_is should return IsInfinite() as soon as it hits an iterator with IsInfinite().

it does (the line after #fallback)

IteratorSize type import is unused.

will remove

There should be more tests for more iteratorsize iterator combinations.

I'm kinda impaired here by the 0.4 compatibility requirement (e.g. cannot use generators and test for return type of itersize can be run only on 0.5)

@iamed2
Copy link
Collaborator

iamed2 commented Oct 17, 2016

it does (the line after #fallback)

SizeUnknown() and IsInfinite() are different. Unless maybe you didn't push a change?

I'm kinda impaired here by the 0.4 compatibility requirement (e.g. cannot use generators and test for return type of itersize can be run only on 0.5)

Feel free to write more 0.5-only tests when necessary. There should be enough available in terms of data structures and other iterators that you shouldn't need generators.

@CarloLucibello
Copy link
Contributor Author

it does (the line after #fallback)

SizeUnknown() and IsInfinite() are different. Unless maybe you didn't push a change?

you're right, sorry, I misread

Everything should be fixed now (crossing fingers)

@oxinabox
Copy link
Member

oxinabox commented Oct 18, 2016

This is a good plan, and I like the idea of having iteratorsize defined on more things,
A few issues though:

It has been suggested that
HasLength should only be used by things that have a length that can be found in O(1) time.
And further that iteratorsize itself should always be O(1)

This (like #60) is has the property that iteratorsize and length are is O(n) when n is the number of iterators being chained together.

I personally think that is probably Ok, as n is kinda expected to be smallish.

But maybe it isn't
Consider the case of: Chain((rand(rand(1:10)) for ii in 1:1_000)...) for example.
Indeed, I myself often collect up sequences of words into sentences,
and then I might like to chain all the sentences into the words of the whole document.
For me, than n might be 100_000, or even 10_000_000
In this case, though perhaps splatting has already killed me.
So it may not be anything to worry about

IMO the way forward is to determine this when constructing Chain and include the iterator size as a type parameter

Adding that extra type parameter would let you calculate the iteratorsize at construction time -- the constructor is already O(n) where n is the number of iterators being chained.
(In 0.4 that parameter could be set to Void, I guess).
It could also let you calculate the length at construction time and store that in a field.


I note that you have define the iteratorsize recursively.
Which is very elegant (I really like defining things that way).
But julia does not support TCO, right now.
So that will be slower than a loop.
And can stackoverflow.
It should be rewritten to be done with a loop.

@CarloLucibello
Copy link
Contributor Author

This is a good plan, and I like the idea of having iteratorsize defined on more things,
A few issues though:

It has been suggested that
HasLength should only be used by things that have a length that can be found in O(1) time.
And further that iteratorsize itself should always be O(1)

This (like #60) is has the property that iteratorsize and length are is O(n) when n is the number of iterators being chained together.

I personally think that is probably Ok, as n is kinda expected to be smallish.
I'm not sure iteratorsize is O(n), It should be O(1) if the compiler is smart, since it's all static evaluation

Length is O(n), but is we compute it at creation time we have a problem:

r = rand(10)
c = chain(r, r)
push!(r, 1.)
length(c) != length(collect(c)) # problem here

I think chain should be optimized for small n. It is a good thing also that the constructor chain(x...) enforces n to be small, in some sense

btw: length is also implemented for Product (still O(n))

But maybe it isn't
Consider the case of: Chain((rand(rand(1:10)) for ii in 1:1_000)...) for example.
Indeed, I myself often collect up sequences of words into sentences,
and then I might like to chain all the sentences into the words of the whole document.
For me, than n might be 100_000, or even 10_000_000
In this case, though perhaps splatting has already killed me.
So it may not be anything to worry about
You may want to use Base.flatten with that

IMO the way forward is to determine this when constructing Chain and include the iterator size as a type parameter
Adding that extra type parameter would let you calculate the iteratorsize at construction time -- the constructor is already O(n) where n is the number of iterators being chained.
(In 0.4 that parameter could be set to Void, I guess).
It could also let you calculate the length at construction time and store that in a field.

I note that you have define the iteratorsize recursively.
Which is very elegant (I really like defining things that way).
But julia does not support TCO, right now.
So that will be slower than a loop.
And can stackoverflow.
It should be rewritten to be done with a loop.

runtime evaluation of iteratorsize, as done with a loop, even at construction time, would cause type instability (see comments above). I fear much more that perfomancewise that recursion. Do you agree?

@oxinabox
Copy link
Member

oxinabox commented Oct 18, 2016

I think chain should be optimized for small n. It is a good thing also that the constructor chain(x...) enforces n to be small, in some sense

I agree.
Maybe we can in fact enforce that,
say by making chain throw up a warning if you chain more than 32 things (unless you a pass a override kwargument?)
And that warning can direct people to some new iterator: LongChain,
that takes an iterator of iterators as input (see #34),
and which is SizeUnknown

Wouldn't a generated function be type stable?
I think it should be.
We do know that iteratorsize in theory at compile time, no?

using Iterators
import Base: IsInfinite,SizeUnknown, SizeUnknown, iteratorsize, HasLength
using Base.Test

@generated function _gen_iteratorsize(args...)
    for itype in args
        if iteratorsize(itype)==Base.IsInfinite()
            return :(IsInfinite())
        elseif iteratorsize(itype)==Base.SizeUnknown()
            return :(SizeUnknown())
        end
    end
    return :(HasLength())
end

a1=[1,2,3]
a2=(2,3,4)
a3=cycle([23])
@test _gen_iteratorsize(a1,a2) == HasLength()
@test _gen_iteratorsize(a1,a3) == IsInfinite()

What do you think?

@CarloLucibello
Copy link
Contributor Author

I've never @generated anything yet :) I'll look into that

@CarloLucibello
Copy link
Contributor Author

and here it is the third or fourth life for this pr, now with generated iteratorsize

@oxinabox do you know of any downsides of generated functions? All I could find are some, almost creepy, "don't you use them if you don't need them"

@oxinabox
Copy link
Member

@oxinabox do you know of any downsides of generated functions? All I could find are some, almost creepy, "don't you use them if you don't need them"

They are spooky pseudo-macros.
Just like macro's you should not use them with you can avoid it.
But here I think they are the right tool for the job

There are a bunch of things you are not allowed to do with them.
In particular: JuliaLang/julia#18568
but also JuliaLang/julia#18747

There purpose is for writing type generic code, that depends only on the types on the arguments.
It seems like they are a perfect match for traits.
(and that as you found, it is nearly impossible to write code to deal with traits in a type stable way, without them)

@iamed2
Copy link
Collaborator

iamed2 commented Oct 19, 2016

The downside to this is that it will compile a function every time you call chain.

All these options will need some empirical measurement.

@CarloLucibello
Copy link
Contributor Author

CarloLucibello commented Oct 19, 2016

The downside to this is that it will compile a function every time you call chain.

only if you chain different types. But that would happen in any case using dispatch I guess.

All these options will need some empirical measurement.

Yeah, it would be nice to set up the benchmarking framework. I'm sure this is the best approach for runtime speed. I'm not sure about compile speed, though it may be not that relevant.
In any case, the only two viable approaches after the whole discussion are the present one and recursive dispatch.
Merge and eventually revisit later?

Copy link
Collaborator

@iamed2 iamed2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the coverage! I think I'm alright with this conceptually for now if it doesn't kill benchmarks.


c = chain(ch1, ch2, ch3)
@test length(c) == length(ch1) + length(ch2) + length(ch3)
#@test collect(c) == [ch1; ch2; ch3] # why doesn't work?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this commented-out line

@@ -106,7 +109,6 @@ end

# chain
# -----

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a nice blank line.

using Iterators, Base.Test

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bring back this whitespace.

@@ -1,5 +1,8 @@
include("../src/Iterators.jl")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this please.

iteratorsize{T<:Chain}(::Type{T}) = SizeUnknown()
if VERSION >= v"0.5.0-dev+3305"
# corner case of empty chain
iteratorsize(c::Chain) = iteratorsize(typeof(c))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this; there is already a generic function in Base which does this.

end

r = countfrom(1)
ch4 = chain(1:10, countfrom(1))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deserves an eltype test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

end

if VERSION >= v"0.5.0-dev+3305"
r = distinct(1:10)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use distinct(collect(1:10)); it's possible someone might make a no-op UnitRange optimization for distinct.

@CarloLucibello
Copy link
Contributor Author

why travis is taking so long? it's been like 9 hours

@iamed2
Copy link
Collaborator

iamed2 commented Oct 20, 2016

I'm not sure, but I think there are a finite number of julia workers so if a ton of Julia projects push things we'll be in a long line.

@kmsquire
Copy link
Member

Specifically, everything in the JuliaLang organization is run serially. This is why a number of packages have been recently migrated over to other organizations (JuliaMath, JuliaIO`, etc).

@CarloLucibello
Copy link
Contributor Author

is something missing for this?

@iamed2 iamed2 merged commit 509fb8b into JuliaCollections:master Oct 30, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants