Multithreaded custom grouped operations with single-row result #2588

nalimilan · 2020-12-27T14:05:18Z

This is a continuation of (though it's mostly orthogonal to) #2574.

Spawn one task per thread in _combine_rows_with_first! so that custom grouped operations that return a single row are run in parallel. This is optimal if operations take about the same time for all groups. Spawning one task per group could be faster if these times vary a lot, but the overhead would be larger: we could add this as an option later.

The implementation is somewhat tricky as output columns need to be reallocated when a new return type is detected.

Benchmarks for a relatively fast operation such as (non-fast-path) sum (which should be the worst case, since threading overhead is largest relative to the work done) show nice speedups with multiple threads even for a small number of rows, and limited overhead when a single thread is available.

using Revise, DataFrames, BenchmarkTools, Random
Random.seed!(1);
for N in (1_000, 10_000, 1_000_000, 100_000_000)
    for k in (10, 100, 1000, N)
        @show N, k
        df = DataFrame(x=rand(1:k, N), y=rand(N));
        gd = groupby(df, :x);
        @btime combine($gd, :y => y -> sum(y));
        @btime combine($gd, :y => (y -> sum(y)) => :y1, :y => maximum => :y2);
        @btime combine($gd, :y => (y -> sum(y)) => :y1, :y => (y -> maximum(y)) => :y2);
    end
end

# main, 10 threads
(N, k) = (1000, 10)
  96.437 μs (297 allocations: 25.08 KiB)
  49.339 μs (328 allocations: 27.02 KiB)
  100.763 μs (443 allocations: 37.94 KiB)
(N, k) = (1000, 100)
  97.507 μs (1016 allocations: 44.30 KiB)
  91.216 μs (1048 allocations: 46.98 KiB)
  99.144 μs (1884 allocations: 75.03 KiB)
(N, k) = (1000, 1000)
  282.916 μs (5553 allocations: 164.27 KiB)
  235.435 μs (5591 allocations: 171.31 KiB)
  250.699 μs (10956 allocations: 306.42 KiB)
(N, k) = (1000, 1000)
  228.583 μs (5524 allocations: 163.50 KiB)
  276.059 μs (5561 allocations: 170.45 KiB)
  220.080 μs (10897 allocations: 304.98 KiB)
(N, k) = (10000, 10)
  108.358 μs (296 allocations: 95.86 KiB)
  209.613 μs (329 allocations: 97.86 KiB)
  249.938 μs (444 allocations: 179.59 KiB)
(N, k) = (10000, 100)
  128.388 μs (1016 allocations: 118.12 KiB)
  176.558 μs (1049 allocations: 120.84 KiB)
  167.717 μs (1883 allocations: 222.66 KiB)
(N, k) = (10000, 1000)
  404.830 μs (9203 allocations: 323.69 KiB)
  425.396 μs (9241 allocations: 333.55 KiB)
  388.231 μs (18256 allocations: 619.64 KiB)
(N, k) = (10000, 10000)
  1.794 ms (62557 allocations: 1.61 MiB)
  1.797 ms (62594 allocations: 1.66 MiB)
  1.722 ms (124961 allocations: 3.11 MiB)
(N, k) = (1000000, 10)
  5.607 ms (307 allocations: 7.65 MiB)
  4.789 ms (338 allocations: 7.65 MiB)
  8.015 ms (463 allocations: 15.28 MiB)
(N, k) = (1000000, 100)
  5.890 ms (1117 allocations: 7.67 MiB)
  6.125 ms (1149 allocations: 7.67 MiB)
  13.363 ms (2083 allocations: 15.32 MiB)
(N, k) = (1000000, 1000)
  6.655 ms (9203 allocations: 7.94 MiB)
  6.808 ms (9241 allocations: 7.95 MiB)
  13.974 ms (18256 allocations: 15.85 MiB)
(N, k) = (1000000, 1000000)
  232.730 ms (6325576 allocations: 160.56 MiB)
  226.783 ms (6325614 allocations: 165.38 MiB)
  269.170 ms (12651002 allocations: 311.45 MiB)
(N, k) = (100000000, 10)
  1.228 s (306 allocations: 762.96 MiB)
  1.254 s (339 allocations: 762.96 MiB)
  2.909 s (464 allocations: 1.49 GiB)
(N, k) = (100000000, 100)
  1.161 s (1117 allocations: 762.98 MiB)
  1.182 s (1148 allocations: 762.98 MiB)
  2.864 s (2084 allocations: 1.49 GiB)
(N, k) = (100000000, 1000)
  1.205 s (10203 allocations: 763.20 MiB)
  1.215 s (10241 allocations: 763.21 MiB)
  3.004 s (20256 allocations: 1.49 GiB)
(N, k) = (100000000, 100000000)
  22.971 s (632119236 allocations: 15.67 GiB)
  23.605 s (632119274 allocations: 16.14 GiB)
  32.581 s (1264238322 allocations: 30.39 GiB)

# nl/threadedops2, 10 threads
(N, k) = (1000, 10)
  64.665 μs (381 allocations: 30.89 KiB)
  71.676 μs (415 allocations: 32.89 KiB)
  82.551 μs (615 allocations: 49.64 KiB)
(N, k) = (1000, 100)
  66.255 μs (664 allocations: 43.78 KiB)
  74.061 μs (698 allocations: 46.50 KiB)
  83.251 μs (1181 allocations: 73.98 KiB)
(N, k) = (1000, 1000)
  81.080 μs (2285 allocations: 118.19 KiB)
  76.983 μs (2324 allocations: 125.23 KiB)
  91.562 μs (4422 allocations: 214.28 KiB)
(N, k) = (1000, 1000)
  77.146 μs (2276 allocations: 117.72 KiB)
  84.048 μs (2315 allocations: 124.70 KiB)
  91.479 μs (4404 allocations: 213.47 KiB)
(N, k) = (10000, 10)
  71.757 μs (381 allocations: 101.70 KiB)
  100.601 μs (415 allocations: 103.70 KiB)
  89.618 μs (615 allocations: 191.27 KiB)
(N, k) = (10000, 100)
  68.208 μs (663 allocations: 117.58 KiB)
  103.174 μs (698 allocations: 120.33 KiB)
  88.001 μs (1181 allocations: 221.64 KiB)
(N, k) = (10000, 1000)
  106.391 μs (3386 allocations: 237.78 KiB)
  108.113 μs (3425 allocations: 247.64 KiB)
  131.220 μs (6624 allocations: 447.84 KiB)
(N, k) = (10000, 10000)
  188.207 μs (19409 allocations: 976.09 KiB)
  226.279 μs (19449 allocations: 1.00 MiB)
  291.965 μs (38668 allocations: 1.80 MiB)
(N, k) = (1000000, 10)
  1.837 ms (391 allocations: 7.65 MiB)
  4.315 ms (425 allocations: 7.65 MiB)
  2.858 ms (635 allocations: 15.29 MiB)
(N, k) = (1000000, 100)
  1.018 ms (764 allocations: 7.67 MiB)
  4.213 ms (799 allocations: 7.67 MiB)
  2.552 ms (1381 allocations: 15.32 MiB)
(N, k) = (1000000, 1000)
  1.031 ms (3386 allocations: 7.85 MiB)
  4.094 ms (3426 allocations: 7.86 MiB)
  2.855 ms (6624 allocations: 15.68 MiB)
(N, k) = (1000000, 1000000)
  24.385 ms (1898314 allocations: 93.01 MiB)
  21.404 ms (1898354 allocations: 97.83 MiB)
  40.751 ms (3796481 allocations: 176.35 MiB)
(N, k) = (100000000, 10)
  498.879 ms (392 allocations: 762.96 MiB)
  405.193 ms (425 allocations: 762.96 MiB)
  960.752 ms (637 allocations: 1.49 GiB)
(N, k) = (100000000, 100)
  282.862 ms (764 allocations: 762.98 MiB)
  414.542 ms (799 allocations: 762.98 MiB)
  817.869 ms (1382 allocations: 1.49 GiB)
(N, k) = (100000000, 1000)
  257.536 ms (4387 allocations: 763.12 MiB)
  398.462 ms (4425 allocations: 763.13 MiB)
  734.363 ms (8623 allocations: 1.49 GiB)
(N, k) = (100000000, 100000000)
  7.108 s (189636413 allocations: 9.07 GiB)
  9.061 s (189636454 allocations: 9.54 GiB)
  13.085 s (379272679 allocations: 17.21 GiB)

# main, 1 thread
(N, k) = (1000, 10)
  29.853 μs (295 allocations: 25.02 KiB)
  38.357 μs (324 allocations: 26.89 KiB)
  47.261 μs (439 allocations: 37.81 KiB)
(N, k) = (1000, 100)
  47.884 μs (1014 allocations: 44.23 KiB)
  56.884 μs (1044 allocations: 46.86 KiB)
  83.894 μs (1879 allocations: 74.88 KiB)
(N, k) = (1000, 1000)
  153.230 μs (5551 allocations: 164.20 KiB)
  165.389 μs (5586 allocations: 171.16 KiB)
  301.295 μs (10952 allocations: 306.30 KiB)
(N, k) = (1000, 1000)
  152.689 μs (5521 allocations: 163.41 KiB)
  164.037 μs (5556 allocations: 170.30 KiB)
  298.169 μs (10892 allocations: 304.83 KiB)
(N, k) = (10000, 10)
  56.893 μs (294 allocations: 95.80 KiB)
  97.518 μs (324 allocations: 97.70 KiB)
  123.263 μs (439 allocations: 179.44 KiB)
(N, k) = (10000, 100)
  68.243 μs (1014 allocations: 118.06 KiB)
  109.575 μs (1044 allocations: 120.69 KiB)
  150.286 μs (1879 allocations: 222.53 KiB)
(N, k) = (10000, 1000)
  264.377 μs (9201 allocations: 323.62 KiB)
  314.388 μs (9236 allocations: 333.39 KiB)
  552.166 μs (18252 allocations: 619.52 KiB)
(N, k) = (10000, 10000)
  1.339 ms (62554 allocations: 1.61 MiB)
  1.416 ms (62590 allocations: 1.66 MiB)
  2.735 ms (124956 allocations: 3.10 MiB)
(N, k) = (1000000, 10)
  5.500 ms (304 allocations: 7.65 MiB)
  8.886 ms (334 allocations: 7.65 MiB)
  13.079 ms (459 allocations: 15.28 MiB)
(N, k) = (1000000, 100)
  5.923 ms (1114 allocations: 7.67 MiB)
  9.290 ms (1144 allocations: 7.67 MiB)
  18.235 ms (2079 allocations: 15.32 MiB)
(N, k) = (1000000, 1000)
  6.315 ms (9201 allocations: 7.94 MiB)
  9.693 ms (9236 allocations: 7.95 MiB)
  18.640 ms (18252 allocations: 15.85 MiB)
(N, k) = (1000000, 1000000)
  202.962 ms (6325574 allocations: 160.56 MiB)
  213.000 ms (6325610 allocations: 165.38 MiB)
  457.345 ms (12650996 allocations: 311.45 MiB)
(N, k) = (100000000, 10)
  1.235 s (304 allocations: 762.96 MiB)
  1.585 s (334 allocations: 762.96 MiB)
  2.874 s (459 allocations: 1.49 GiB)
(N, k) = (100000000, 100)
  1.157 s (1114 allocations: 762.98 MiB)
  1.503 s (1144 allocations: 762.98 MiB)
  3.926 s (2079 allocations: 1.49 GiB)
(N, k) = (100000000, 1000)
  1.274 s (10201 allocations: 763.20 MiB)
  1.617 s (10236 allocations: 763.21 MiB)
  4.295 s (20252 allocations: 1.49 GiB)
(N, k) = (100000000, 100000000)
  23.238 s (632119234 allocations: 15.67 GiB)
  25.330 s (632119270 allocations: 16.14 GiB)
  44.467 s (1264238316 allocations: 30.39 GiB)

# nl/threadedops2, 1 thread
(N, k) = (1000, 10)
  36.437 μs (275 allocations: 25.38 KiB)
  44.645 μs (304 allocations: 27.25 KiB)
  57.421 μs (399 allocations: 38.53 KiB)
(N, k) = (1000, 100)
  47.241 μs (544 allocations: 37.56 KiB)
  55.669 μs (574 allocations: 40.19 KiB)
  78.339 μs (939 allocations: 61.53 KiB)
(N, k) = (1000, 1000)
  104.285 μs (2159 allocations: 111.88 KiB)
  113.739 μs (2194 allocations: 118.83 KiB)
  192.765 μs (4168 allocations: 201.64 KiB)
(N, k) = (1000, 1000)
  103.111 μs (2150 allocations: 111.41 KiB)
  113.578 μs (2185 allocations: 118.30 KiB)
  191.765 μs (4150 allocations: 200.83 KiB)
(N, k) = (10000, 10)
  63.930 μs (274 allocations: 96.16 KiB)
  103.461 μs (304 allocations: 98.06 KiB)
  136.040 μs (399 allocations: 180.16 KiB)
(N, k) = (10000, 100)
  67.273 μs (544 allocations: 111.39 KiB)
  107.447 μs (574 allocations: 114.02 KiB)
  145.129 μs (939 allocations: 209.19 KiB)
(N, k) = (10000, 1000)
  176.896 μs (3254 allocations: 231.38 KiB)
  218.655 μs (3289 allocations: 241.14 KiB)
  359.632 μs (6358 allocations: 435.02 KiB)
(N, k) = (10000, 10000)
  751.112 μs (19262 allocations: 969.45 KiB)
  798.386 μs (19298 allocations: 1020.86 KiB)
  1.462 ms (38372 allocations: 1.79 MiB)
(N, k) = (1000000, 10)
  5.508 ms (284 allocations: 7.65 MiB)
  8.913 ms (314 allocations: 7.65 MiB)
  13.053 ms (419 allocations: 15.28 MiB)
(N, k) = (1000000, 100)
  5.877 ms (644 allocations: 7.66 MiB)
  9.230 ms (674 allocations: 7.66 MiB)
  17.750 ms (1139 allocations: 15.31 MiB)
(N, k) = (1000000, 1000)
  6.152 ms (3254 allocations: 7.85 MiB)
  9.551 ms (3289 allocations: 7.86 MiB)
  18.222 ms (6358 allocations: 15.67 MiB)
(N, k) = (1000000, 1000000)
  117.085 ms (1898168 allocations: 93.00 MiB)
  124.986 ms (1898204 allocations: 97.83 MiB)
  241.569 ms (3796184 allocations: 176.33 MiB)
(N, k) = (100000000, 10)
  1.237 s (284 allocations: 762.96 MiB)
  1.582 s (314 allocations: 762.96 MiB)
  2.857 s (419 allocations: 1.49 GiB)
(N, k) = (100000000, 100)
  1.175 s (644 allocations: 762.97 MiB)
  1.558 s (674 allocations: 762.97 MiB)
  3.895 s (1139 allocations: 1.49 GiB)
(N, k) = (100000000, 1000)
  1.226 s (4254 allocations: 763.11 MiB)
  1.570 s (4289 allocations: 763.12 MiB)
  4.215 s (8358 allocations: 1.49 GiB)
(N, k) = (100000000, 100000000)
  16.099 s (189636266 allocations: 9.07 GiB)
  18.843 s (189636302 allocations: 9.54 GiB)
  31.667 s (379272380 allocations: 17.21 GiB)

bkamins · 2020-12-28T13:43:28Z

I agree that it is better to statically split work. In rare cases one thread will get much more work, but I think most of the time it is good.

This is complex. Let me summarize the logic here:

1. We get a row for the first group to preallocate columns with a proper eltype and size.
2. Each thread tries to fill its portion of the columns.
3. If some thread needs to widen a column, it does so and signals the other threads that they need to sync and also copy their data to the new columns (we avoid reallocation if a widening done by another thread was already enough).
4. At the end it is still possible that some threads have finished but not copied their data, so one last clean-up pass is required.
5. Finally we push the first row to the collection.
6. We split columns into contiguous chunks so we do not have a problem with false sharing.

So the invariants that the threading code outside of the locked part guarantees are:

- Each thread writes to a different area of the arrays, so no overlapping writes can happen.
- We clearly know whether we are writing to the "latest" version of the vector, and if not, we take care to copy the data to the correct vector the first time we notice the widening of the type.

Type widening will not happen often anyway (but we still need to make sure that the tests correctly cover different combinations of widenings in different threads).
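
The logic summarized above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code from this PR: `fits`, `threaded_map_widen` and the `ntasks` keyword are hypothetical names, a single output vector stands in for the output columns, and "signaling" other tasks is reduced to each task comparing its own vector with the shared one while holding the lock.

```julia
# Illustrative sketch only: `fits` and `threaded_map_widen` are hypothetical
# names, not DataFrames.jl internals. One output vector stands in for the columns.

# A value fits if storing it does not require widening the element type
fits(v, col::AbstractVector) =
    v isa eltype(col) || promote_type(typeof(v), eltype(col)) <: eltype(col)

function threaded_map_widen(f, groups::AbstractVector;
                            ntasks::Integer=Threads.nthreads())
    len = length(groups)
    firstval = f(groups[1])             # the first group fixes the initial eltype
    shared = Ref{AbstractVector}(Vector{typeof(firstval)}(undef, len))
    widen_lock = ReentrantLock()
    basesize = max(1, cld(len - 1, ntasks))   # at most `ntasks` contiguous chunks
    tasks = map(Iterators.partition(2:len, basesize)) do idxs
        Threads.@spawn begin
            col = lock(() -> shared[], widen_lock)  # vector this task writes to
            for i in idxs
                v = f(groups[i])
                if !fits(v, col)
                    lock(widen_lock)
                    try
                        latest = shared[]
                        if !fits(v, latest)   # no other task widened enough yet
                            T = promote_type(eltype(latest), typeof(v))
                            latest = Vector{T}(undef, len)
                            shared[] = latest
                        end
                        for j in first(idxs):(i - 1)  # move what we already wrote
                            latest[j] = col[j]
                        end
                        col = latest
                    finally
                        unlock(widen_lock)
                    end
                end
                col[i] = v              # each task only touches its own indices
            end
            (col, idxs)
        end
    end
    results = fetch.(tasks)
    final = shared[]
    # Clean-up pass: tasks that finished on an older vector still need copying
    for (col, idxs) in results
        col === final && continue
        for j in idxs
            final[j] = col[j]
        end
    end
    final[1] = firstval                 # finally store the first row
    return final
end
```

For instance, `threaded_map_widen(i -> i % 10 == 0 ? missing : (i % 7 == 0 ? i / 2 : i), 1:100)` exercises widening from `Int` to `Float64` and to a `Missing` union. With several threads the order of the widenings varies between runs, so the sketch relies on `promote_type` giving the same final type regardless of order.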

src/groupeddataframe/complextransforms.jl

nalimilan · 2020-12-28T13:56:34Z

That's right. This also assumes that the order in which we call promote_type doesn't make a difference, i.e. that it's associative. That's not documented AFAIK but I can't think of examples where this fails. We commonly do this when using it with reduce.

bkamins · 2020-12-28T14:03:22Z

Yes - I assumed it is associative 😄.
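
As a small illustration of the assumption discussed above (a check on one example, not a proof that `promote_type` is associative in general), folding the same types from the left and from the right gives the same result. The choice of `Int`, `Float64` and `Missing` is arbitrary.

```julia
# Example types chosen for illustration only
types = (Int, Float64, Missing)

# promote_type(promote_type(Int, Float64), Missing)
left = foldl(promote_type, types)

# promote_type(Int, promote_type(Float64, Missing))
right = foldr(promote_type, types)

# A different ordering of the same types
shuffled = reduce(promote_type, (Missing, Int, Float64))

@show left right shuffled
```

If a user-defined type had promotion rules for which these orderings disagreed, the final column eltype could depend on which thread widens first.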

bkamins · 2021-01-21T08:36:46Z

CI fails

docs/src/man/split_apply_combine.md

bkamins · 2021-01-21T18:37:47Z

Do you have some comparable benchmarks of e.g.:

combine($gd, :y => y -> sum(y), :y => (y -> sum(y)) => :y2)

or

combine($gd, :y => y -> sum(y), :y => sum => :y2)

(and with more similar arguments) to see the performance?

If not, I can run some.

nalimilan · 2021-01-21T18:41:02Z

No I haven't tested that yet. I can try later if you don't beat me to it.

bkamins · 2021-01-21T18:50:00Z

A quick test shows that we get the benefits, but in normal situations everything is swamped by the compilation cost for a new anonymous function when working interactively.

bkamins · 2021-01-24T22:05:45Z

Can you please also add a manual section for threading (maybe even a separate page)? (I mean that threading is such an important topic that it should be somewhere at the top of the manual, not buried deep in the description of grouping operations, and potentially in the future we will have more things implemented there; e.g. select et al. can also use threading for a data frame source.)

src/groupeddataframe/complextransforms.jl

bkamins · 2021-01-24T22:22:08Z

src/groupeddataframe/complextransforms.jl

+    # This has lower overhead than creating one task per group,
+    # but is optimal only if operations take roughly the same time for all groups
+    @static if VERSION >= v"1.4"
+        basesize = max(1, (len - 1) ÷ Threads.nthreads())


This is incorrect, as div rounds down and we spawn one task too many. E.g.:

julia> max(1, (20 - 1) ÷ 4)
4

julia> collect(Iterators.partition(2:20, 4))
5-element Array{UnitRange{Int64},1}:
 2:5
 6:9
 10:13
 14:17
 18:20

nalimilan · 2021-01-25

Woops. This probably slows down things significantly.

bkamins · 2021-01-25

When you change this could you please report the benchmarks? Thank you.
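
One way to avoid the extra chunk is to round the chunk size up rather than down, for example with `cld`. The following is a minimal sketch of that idea; the helper name `chunk_ranges` is made up for illustration and this is not necessarily how the PR was fixed.

```julia
# Hypothetical helper: split rows 2:len into at most `ntasks` contiguous chunks.
function chunk_ranges(len::Integer, ntasks::Integer)
    # cld rounds up, so the number of chunks never exceeds ntasks
    basesize = max(1, cld(len - 1, ntasks))
    return collect(Iterators.partition(2:len, basesize))
end

# Same inputs as in the example above: 19 rows split across 4 tasks
chunks = chunk_ranges(20, 4)
@show length(chunks) chunks
```

With `len = 20` and 4 tasks the chunk size becomes 5, so the 19 rows are covered by 4 chunks instead of 5.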

src/groupeddataframe/complextransforms.jl

bkamins · 2021-01-24T22:27:43Z

src/groupeddataframe/complextransforms.jl

+    else
+        partitions = (2:len,)
+    end
+    widen_type_lock = ReentrantLock()


Why do you create a new lock and not use the GroupedDataFrame lock? Is the other lock used in the part of the code that does parallel processing of different transformations?

nalimilan · 2021-01-25

In theory AFAICT we don't have to take gd.lazy_lock, i.e. it should be possible to compute indices in parallel without conflicting with what we're doing here. But yeah, since the code here requires indices to have been computed, gd.lazy_lock will never be locked when we are here, so it doesn't make a difference and I can reuse gd.lazy_lock for simplicity.

bkamins · 2021-01-25

After thinking about it, it is actually OK to use a separate lock, I think. The reason is that if you run 2 transformations that each produce one row then you want separate locks for both of them.

Still - as commented below - it would be good to have a benchmark of something like:

combine($gd, :y => (y -> sum(y)) => :y1, :y => (y -> sum(y)) => :y2);

(so that we can see what happens to the performance when we run in parallel two transformations that themselves get run in parallel)

nalimilan · 2021-01-25

Ah yes that's probably why I used separate locks. :-)

I'll run more benchmarks after fixing the PR.

test/grouping.jl

bkamins · 2021-01-24T22:29:05Z

PR looks good - there is one problem I think (with the number of tasks spawned). The rest of the comments are minor.

nalimilan · 2021-01-29T10:54:42Z

I've updated the benchmarks. The PR seems consistently faster than main, except with 1000 rows and 10 groups, where some cases (but not all) regressed a lot in relative terms (but these operations are super fast so I'm not sure it matters). Even with only 1000 rows the PR is faster when the number of groups is large -- what's funny is that it's the case even with a single thread. Not sure what has been improved, maybe some type instability was fixed.

bkamins · 2021-01-31T20:18:04Z

Looks good. Thank you

> I'll run more benchmarks after fixing the PR.

Just to be sure: apart from benchmarking have you run more correctness tests?

Also I think it would be good to add printing

@show Threads.nthreads()

or something similar around https://github.com/JuliaData/DataFrames.jl/blob/main/test/runtests.jl#L11 (so that we not only see a warning when threading is not tested, but also, when threading actually is tested, information on how many threads are used).

bkamins · 2021-02-06T14:43:46Z

What is left for this PR? (except maybe the corner case of threading with CategoricalArrays.jl we discussed recently - but I guess you concluded we do not need to fix it - right?)

nalimilan · 2021-02-06T17:33:47Z

Actually I thought it was merged already. :-D

I've added @show Threads.nthreads().

Regarding the CategoricalArrays issue, I had implemented a check to disable threading when input and output columns contained CategoricalArrays with different levels, but I wasn't able to find a way to trigger a corruption. So doing further checks I realized we are actually safe! The reason is that setindex!(::CategoricalArray, ::CategoricalValue, ...) always replaces the pool when adding new levels instead of updating it in place. This is actually not the most efficient implementation, since the pool could be kept when only adding levels at the end, but IIRC I decided to use that to make the copying behavior more predictable (remember that sometimes we add levels at the beginning, e.g. if the assigned value has levels 1:100 but the target array has levels 10:20, and since we add all levels at once it won't be repeated).

EDIT: I should have mentioned that another necessary safety condition, which is satisfied by this PR, is that the only place where we read existing data is when copying after widening, and that operation is protected by a lock (otherwise one thread could have added levels at the front of the pool while the other is reading refs that haven't yet been updated to match the new levels).

Of course this implementation doesn't actually fix the thread safety issue in general, since assigning a value other than CategoricalValue can still update the pool in place (that's needed as replacing the pool for each new level would be terribly slow). But for the particular case of this PR, we only return a CategoricalArray when all return values are CategoricalValue: if another type is returned for one group, promotion will give another type and the result will be a Vector. That's good news.

What remains problematic is if CategoricalValue objects from more than two different pools are returned, as threads may try to add levels to the pool at the same time. But that's quite unlikely and it can be fixed by using locks in CategoricalArrays since adding levels is rare and expensive.

bkamins · 2021-02-06T17:37:29Z

OK - thank you. Will you add the lock you discuss in CategoricalArrays.jl? (probably we only need it if more than 1 thread is present)

nalimilan · 2021-02-07T13:39:13Z

Actually I spoke too soon. These issues are really tricky. Even if we replace the pool, we have to recode existing refs if a level is added at the front of the pool, and while one thread is doing that, another one may continue to assign values based on the old pool. And even if it used the new pool, values may get recoded even if they were already correct.

I've still not been able to trigger this kind of bug, but it's probably safer to disable multithreading when the output is CategoricalArray and the input columns contain at least two CategoricalArrays with different pools. That's not a very common use case for grouping anyway (I can't imagine a scenario where I would do that...).

bkamins · 2021-02-07T20:22:34Z

> disable multithreading when the output is CategoricalArray

I agree - let us disable multithreading in this case.

nalimilan force-pushed the nl/threadedops2 branch 2 times, most recently from 1cc8880 to d2bef41 on December 27, 2020 14:31

bkamins reviewed Dec 28, 2020

src/groupeddataframe/complextransforms.jl

nalimilan mentioned this pull request Jan 14, 2021

Enable multithreading with several operations in combine/select/transform #2574

Merged

Base automatically changed from nl/threadedops to main January 21, 2021 08:10

nalimilan added 3 commits January 21, 2021 09:13

Add tests, fix and simplify code

13f4f76

Drop tforeach

135512b

nalimilan force-pushed the nl/threadedops2 branch from 35eacfa to 135512b on January 21, 2021 08:18

nalimilan added 2 commits January 21, 2021 10:11

Fix failure, small cleanup

c5930a9

Docs

30a650c

bkamins reviewed Jan 21, 2021

docs/src/man/split_apply_combine.md

Fixes

d94539f

nalimilan marked this pull request as ready for review January 22, 2021 21:30

bkamins reviewed Jan 24, 2021

src/groupeddataframe/complextransforms.jl

bkamins reviewed Jan 24, 2021

src/groupeddataframe/complextransforms.jl

bkamins reviewed Jan 24, 2021

test/grouping.jl

nalimilan added 2 commits January 28, 2021 22:58

Review fixes

77a0d72

Better test

7c97723

bkamins approved these changes Jan 31, 2021

Print number of threads

7d26ba2

Add tests for CategoricalArrays thread safety

9f97dfb

Disable multithreading with CategoricalArrays with different levels

8b6fe9b

nalimilan mentioned this pull request Feb 10, 2021

Ensure thread safety where possible JuliaData/CategoricalArrays.jl#326

Open

nalimilan merged commit ecfc733 into main Feb 10, 2021

nalimilan deleted the nl/threadedops2 branch February 10, 2021 21:00
