Speed up `group_by_color` (#116)

Conversation
Codecov Report: all modified and coverable lines are covered by tests ✅

```
@@ Coverage Diff @@
##              main     #116   +/- ##
=========================================
  Coverage   100.00%  100.00%
=========================================
  Files           12       12
  Lines          878      884     +6
=========================================
+ Hits           878      884     +6
```

View full report in Codecov by Sentry.
Benchmark Results
On second thought, it is a kind of bucket sort; maybe we could just use that.
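For reference, the grouping pass is essentially a counting sort (a bucket sort with unit-width buckets). Here is a minimal standalone sketch of the idea, with hypothetical names, assuming integer keys in `1:kmax`:

```julia
# Counting sort: bucket integer keys in 1:kmax, then emit them in order.
function counting_sort(keys::Vector{Int}, kmax::Int)
    counts = zeros(Int, kmax)
    for k in keys
        counts[k] += 1          # first pass: histogram of the keys
    end
    sorted = Vector{Int}(undef, length(keys))
    i = 1
    for k in 1:kmax, _ in 1:counts[k]
        sorted[i] = k           # second pass: emit each key counts[k] times
        i += 1
    end
    return sorted
end

counting_sort([3, 1, 2, 1], 3)  # -> [1, 1, 2, 3]
```

The two functions below follow the same two-pass pattern: one pass to count group sizes, one pass to place elements.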
An example of the benefits, for the following code:

```julia
using SparseMatrixColorings
using SparseArrays

problem = ColoringProblem(; structure=:nonsymmetric, partition=:column)
algo = GreedyColoringAlgorithm(; decompression=:direct)
A = sprand(Bool, 1000, 1000, 0.02)
coloring(A, problem, algo)
@profview_allocs for _ in 1:10000; coloring(A, problem, algo); end
```

Before and after: allocation count and allocation size (flame graph screenshots).
@gdalle Your argument is that we get a better flame graph? Sincerely, I don't think we should do this. I think we should focus on more important issues for now and potentially revisit this PR later if you really want to keep it.
First of all, let's remain civil please.

No, it's a combination of things (profiling was just one of those):

- Benchmarks are noisy and they don't tell the whole story, otherwise your manual transposition would have been clearly superior in #107. On certain use cases this approach is faster.
- I haven't seen you make a strong case for this either: what are your arguments?

Here's the thing though: the right way to decide about important performance issues is to profile the code to find the bottlenecks. If the profile is hard to read because one source of allocation takes up all the space, that makes our life harder for no reason.
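As an illustration of that workflow, Julia's built-in `Profile` stdlib can locate hot spots before deciding what to optimize; a minimal sketch with a hypothetical workload function `work`:

```julia
using Profile

# Hypothetical workload whose hot spots we want to locate.
work(n) = sum(sqrt(i) for i in 1:n)

work(10)                     # run once first so compilation is not profiled
Profile.clear()
@profile work(10^7)
Profile.print(; maxdepth=8)  # inspect the hottest call paths
```

Tools like `@profview_allocs` (used earlier in this thread) apply the same idea to allocations instead of time.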
TLDR: my approach is slightly worse when there are very few colors (around 3), and faster otherwise (>= 10). Here's a benchmark taking only the grouping into account:

```julia
using BenchmarkTools

function compute_group_sizes(colors::Vector{Int})
    cmax = maximum(colors)
    group_sizes = zeros(Int, cmax)
    for c in colors
        group_sizes[c] += 1
    end
    return group_sizes
end
```
```julia
function split_vecvec(colors::Vector{Int})
    group_sizes = compute_group_sizes(colors)
    # one vector allocated per color group
    groups = [Vector{Int}(undef, group_sizes[c]) for c in eachindex(group_sizes)]
    fill!(group_sizes, 0)
    for (k, c) in enumerate(colors)
        group_sizes[c] += 1
        pos = group_sizes[c]
        groups[c][pos] = k
    end
    return groups
end
```
```julia
function split_vecview(colors::Vector{Int})
    group_sizes = compute_group_sizes(colors)
    group_offsets = cumsum(group_sizes)
    groups_flat = similar(colors)
    for (k, c) in enumerate(colors)
        i = group_offsets[c] - group_sizes[c] + 1
        groups_flat[i] = k
        group_sizes[c] -= 1
    end
    TV = typeof(view(groups_flat, 1:1))
    groups = Vector{TV}(undef, length(group_sizes))  # allocation 4, size cmax
    for c in eachindex(group_sizes)
        i = 1 + (c == 1 ? 0 : group_offsets[c - 1])
        j = group_offsets[c]
        groups[c] = view(groups_flat, i:j)
    end
    return groups
end
```

And the benchmarking results (a ratio > 1 means the approach with views is better):

```julia
julia> for n in 10 .^ (2, 3, 4, 5), cmax in (3, 10, 30, 100)
           yield()
           bench_vecvec = @benchmark split_vecvec(_colors) setup = (_colors = rand(1:($cmax), $n))
           bench_vecview = @benchmark split_vecview(_colors) setup = (
               _colors = rand(1:($cmax), $n)
           )
           ratios = (
               time=minimum(bench_vecvec).time / minimum(bench_vecview).time,
               memory=minimum(bench_vecvec).memory / minimum(bench_vecview).memory,
               allocs=minimum(bench_vecvec).allocs / minimum(bench_vecview).allocs,
           )
           @info "Vecvec / vecview ratios - n=$n, cmax=$cmax" ratios.time ratios.memory ratios.allocs
       end
```
Vecvec / vecview ratios (time and memory rounded to two decimals):

| n      | cmax | time | memory | allocs |
|-------:|-----:|-----:|-------:|-------:|
| 100    | 3    | 0.99 | 0.92   | 1.25   |
| 100    | 10   | 1.43 | 0.97   | 3.0    |
| 100    | 30   | 1.98 | 1.10   | 7.5    |
| 100    | 100  | 2.65 | 1.24   | 23.0   |
| 1000   | 3    | 1.12 | 1.01   | 1.25   |
| 1000   | 10   | 1.10 | 1.01   | 3.0    |
| 1000   | 30   | 1.19 | 1.03   | 8.0    |
| 1000   | 100  | 1.53 | 1.12   | 25.5   |
| 10000  | 3    | 0.95 | 1.00   | 1.6    |
| 10000  | 10   | 1.05 | 1.01   | 2.4    |
| 10000  | 30   | 1.25 | 1.03   | 6.4    |
| 10000  | 100  | 1.26 | 1.05   | 20.4   |
| 100000 | 3    | 0.96 | 1.00   | 1.6    |
| 100000 | 10   | 1.05 | 1.00   | 4.4    |
| 100000 | 30   | 1.03 | 1.00   | 12.4   |
| 100000 | 100  | 1.02 | 1.01   | 20.4   |
@amontoison thoughts on this one? When we benchmark the grouping function on its own, as you see above, the benefits are clear as soon as we go beyond a few colors (`cmax >= 10`).
Closing temporarily because bicoloring will require rethinking this grouping function. We can reoptimize it afterwards.

Actually, I figured out a way to keep the same grouping behavior in the bicoloring branch, so we can merge this one.
@amontoison this was a very quick review, and while the global benchmarks don't show much of a difference, the specific benchmarks in this comment strongly support this change. What do you think?
Merged 😉
In `group_by_color`, instead of allocating one vector per color group, allocate a single common vector and return one view for each group. That way we only need a constant number of allocations, independent of the number of colors.
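As a self-contained illustration of that layout, here is a compact sketch of the same idea (with hypothetical names, not the package's actual implementation):

```julia
# Flat-vector + views grouping: a constant number of allocations
# regardless of how many colors there are.
function group_by_views(colors::Vector{Int})
    cmax = maximum(colors)
    sizes = zeros(Int, cmax)              # allocation 1: group sizes
    for c in colors
        sizes[c] += 1
    end
    offsets = cumsum(sizes)               # allocation 2: prefix sums
    flat = similar(colors)                # allocation 3: one shared buffer
    for (k, c) in enumerate(colors)
        # place index k at the next free slot of its color's segment
        flat[offsets[c] - sizes[c] + 1] = k
        sizes[c] -= 1
    end
    # allocation 4: one small vector holding a view per color
    return [view(flat, (c == 1 ? 1 : offsets[c - 1] + 1):offsets[c]) for c in 1:cmax]
end

groups = group_by_views([2, 1, 2, 3, 1])
# groups[1] == [2, 5], groups[2] == [1, 3], groups[3] == [4]
```

Each returned view aliases the shared buffer, so the caller gets the same "vector of index groups" interface as before without one heap allocation per color.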