Faster intersection of Indices #69

pevnak · 2021-10-06T19:22:45Z

I think that the intersection of Indices can be made trivially faster by ordering of arguments.

 a = distinct(rand(Int, 10000))
 b = distinct(rand(Int, 100))
julia> @btime intersect(a, b)
  134.240 μs (8 allocations: 284.55 KiB)
0-element Indices{Int64}

julia> @btime intersect(b, a)
  1.238 μs (5 allocations: 3.94 KiB)
0-element Indices{Int64}

julia> @btime length(a) > length(b) ? intersect(b, a) : intersect(a, b)
  1.283 μs (6 allocations: 3.95 KiB)
0-element Indices{Int64}

Would it make sense?

The text was updated successfully, but these errors were encountered:

pevnak · 2021-10-06T19:37:17Z

This did the job for me

function Base.intersect(s1::AbstractIndices, s2::AbstractIndices)
    length(s1) > length(s2) ? filter!(in(s2), copy(s1)) : filter!(in(s1), copy(s2))
end

andyferris · 2021-10-06T21:35:23Z

Thanks @pevnak

One issue I had was trying to get the order of elements to come out the same either way. I don’t know if that is important, but so far I’ve been conservative with ordering issues.

Not sure what other people’s opinions are?

pevnak · 2021-10-07T03:08:56Z

I cannot judge that. I am usually concerned with speed.

pevnak · 2021-10-07T04:19:00Z

Since there is a stable_sort and sort, would it make sense to have something similar? I apologize for abusing the comments, but the difference in speed I am getting in my application is enormous.

I guess it might be possible to implement filtering for the opposite case, but it might be awful. One would first invalidate everything and then turn on only items that are in the other set. In this way, I guess, the iteration might go over the other set. I guess you got my idea, but not sure if it something that should be done.

andyferris · 2021-10-07T04:21:59Z

Yes perhaps.

I apologize for abusing the comments, but the difference in speed I am getting in my application is enormous.

There's no abuse here :) I, too, take performance seriously - we'll get this sorted.

One consideration might be, when length(s1) > length(s2), to record the tokens in s1, and sort them and give that back.

andyferris · 2021-10-07T04:27:32Z

Another complication is the fact that either collection might be SizeUnknown (i.e. a FilteredIndices) so even length can be expensive.

pevnak · 2021-10-07T05:25:08Z

But it can be narrowed for types where this will be performant.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Faster intersection of Indices #69

Faster intersection of Indices #69

pevnak commented Oct 6, 2021 •

edited

Loading

pevnak commented Oct 6, 2021 •

edited

Loading

andyferris commented Oct 6, 2021 •

edited

Loading

pevnak commented Oct 7, 2021

pevnak commented Oct 7, 2021 •

edited

Loading

andyferris commented Oct 7, 2021 •

edited

Loading

andyferris commented Oct 7, 2021

pevnak commented Oct 7, 2021

Faster intersection of Indices #69

Faster intersection of Indices #69

Comments

pevnak commented Oct 6, 2021 • edited Loading

pevnak commented Oct 6, 2021 • edited Loading

andyferris commented Oct 6, 2021 • edited Loading

pevnak commented Oct 7, 2021

pevnak commented Oct 7, 2021 • edited Loading

andyferris commented Oct 7, 2021 • edited Loading

andyferris commented Oct 7, 2021

pevnak commented Oct 7, 2021

pevnak commented Oct 6, 2021 •

edited

Loading

pevnak commented Oct 6, 2021 •

edited

Loading

andyferris commented Oct 6, 2021 •

edited

Loading

pevnak commented Oct 7, 2021 •

edited

Loading

andyferris commented Oct 7, 2021 •

edited

Loading