-
Notifications
You must be signed in to change notification settings - Fork 246
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add IntSet to DataStructures #114
Conversation
Cc: @StephenVavasis |
(Some branches are missing coverage since they would require 2^32 or 2^64 bits of storage in order to test completely)
Note that this behaves way better than Base with regards to complement IntSets, which is precisely why I wanted to eliminate Base: julia> c = complement(IntSet(0:255))
WARNING: storing zero in IntSets is deprecated
in depwarn at /Users/mbauman/Applications/julia-0.4/lib/julia/sys.dylib
in push! at /Users/mbauman/Applications/julia-0.4/lib/julia/sys.dylib
in call at intset.jl:10
while loading no file, in expression starting on line 0
WARNING: complement IntSets are deprecated
in depwarn at /Users/mbauman/Applications/julia-0.4/lib/julia/sys.dylib
in complement! at deprecated.jl:732
in complement at deprecated.jl:739
while loading no file, in expression starting on line 0
IntSet([256, ..., 9223372036854775806])
julia> pop!(c, 256)
ERROR: InexactError()
in sizehint! at /Users/mbauman/Applications/julia-0.4/lib/julia/sys.dylib
in pop! at intset.jl:80
in pop! at intset.jl:101
julia> pop!(c, 256)
256
julia> c
IntSet([384, ..., 9223372036854775806])
julia> first(c)
ERROR: ArgumentError: set must be non-empty
in first at intset.jl:186
julia> last(c)
ERROR: ArgumentError: set has no last element
in last at intset.jl:202 This PR: julia> using DataStructures
const IntSet = DataStructures.IntSet
DataStructures.IntSet
julia> c = complement(IntSet(0:255))
IntSet([256, 257, 258, ..., 9223372036854775806])
julia> pop!(c, 256)
256
julia> c
IntSet([257, 258, 259, ..., 9223372036854775806])
julia> first(c)
257
julia> last(c)
9223372036854775806 |
It looks good to me, but shouldn't it be documented in DataStructures' README.rst? Can the IntSet documentation from the Julia manual just be copied and pasted? |
Yes, definitely. The language should be tightened up as compared to the base documentation, but it can serve as a good start. This version doesn't hold non-negative integers, it holds |
Alright, I've added a blurb to README.rst. Any thoughts about the name and/or export? It's actually a sorted set — should it be called |
Matt, IntSet is not a sorted set in the sense of the sorted containers of DataStructures.jl. A sorted container has the property that its entries can be enumerated in sort-order in time proportional to O(n) or O(n log n), where n is the number entries. To enumerate an IntSet in order requires O(m), where m is the maximum entry. But meanwhile, I am confused about the general question of renaming. I thought that the point of moving IntSet to DataStructures was because you and others are designing a new 'core' bitvector type for managing indices, but you want to leave the old IntSet around for users who depend on it in their codes. In this case, we should preserve the same name IntSet and give a different name to the new bitvector type? |
The elements are enumerated in sorted order, but you're right that the algorithmic complexities are different from the other Like I said, I don't feel strongly. I'm just trying to find a solution that will work the best during this migration time where there's both
In any of the cases above, compatibility is pretty straight-forward… just use DataStructures version if you want stored zeros and complements. Otherwise, Base's version will do you just fine. Eventually, Base's name will be changed to |
|
I'd rather an |
I vote for Matt's first option (call it DataStructures.IntSet but don't export it). It is easier to add two statements to each module than to change a name everywhere it occurs. In addition, that option is easier on people writing packages that are supposed to support Julia 0.3, 0.4 and 0.5. Package writers could simply start their program with @compat import DataStructures.IntSet and the compat package would do the right thing. (@stevengj?) |
No need for |
Matt, I think that compat or something similar will be needed for a package-writer who wants to use IntSet and wants his/her program to work correctly for 0.3, 0.4, 0.5. In 0.3 and earlier 0.4 masters, IntSet is in Base. In upcoming 0.4 masters, it will issue deprecation warnings, In 0.5, it may be gone from Base completely and only in DataStructures. So it is seems like some metaprogramming is necessary for a package that uses IntSet and is supposed to work in all three Julia versions. Note that my sorted containers code falls into this category. |
Right, my point is that regardless of the Julia version, folks should just choose to use this implementation if they rely on 0s or complements. In fact, that's what happens for the sorted containers here. On both 0.3 and 0.4, they use this implementation. There's no reason to use Base.IntSet on 0.3 and this on 0.4 -- just use this on both! Then even if it disappears from base in 0.5, you'll already have been using the replacement. |
As far as I'm concerned, this is ready to merge. I'm not a heavy user of or contributor to DataStructures, however, so I'd prefer someone else to review and pull the button here. |
There also a warning during testing atm
|
Hi @mbauman, sorry to ignore this for so long. Can you fix the conflicts, and I'll merge? |
Bump. |
Merged this in #160. |
This moves my work at JuliaLang/julia#10065 into DataStructures. I don't export the name DataStructures.IntSet since I'm not sure what the best way to do this would be. Please feel free to push modifications to this branch. We could either rename the type in DataStructures, or we could export it (forcing folks to manually choose
Base.IntSet
orDataStructures.IntSet
), or we could leave it as is (which means that you useBase.IntSet
by default unless you explicitly importDataStructures.IntSet
).Note that this is the pure Julia refactor, not the current Base implementation (which relies upon some C code in base). That means that it's a little slower in a few cases, but for the most part it's very competitive or way faster.
We could also easily add a second BitVector to the type in the future to add support for negative
Int
s, making this a trueIntSet
(well, except we still wouldn't be able to reachtypemax(Int)
ortypemin(Int)
, but you can't do that on a 64 bit machine and it's a terrible idea in any case).