More LoopVectorization tests & checks #57
CI on Julia 1.4 picks CUDA v0.1.0 which fails:
Perhaps because it has SpecialFunctions v1.1.0, and KernelAbstractions v0.2.4? Testing locally on 1.4, the resolver picks CUDA v1.3.3 & it passes. It also has KernelAbstractions v0.4.5, and SpecialFunctions v0.10.3. Maybe adding |
I would say: merge #55 first, and then rebase this PR on master. |
I was going to suggest the reverse... but either will work ultimately. I am keen to keep tests on 1.4; I dropped 1.3 when I got tired of fighting the resolver. But 1.4 forces LoopVectorization 0.8, which it seems a bit soon to drop completely, given that this package can't bound the version actually used outside of tests. There are, however, still some test bugs to track down: the recent 1.4 pass here still has some tests disabled. There is also a surprising slowdown. Somehow LV takes 2410.5 seconds on Julia 1.4 (LV 0.8) vs 6149.6 seconds on 1.5 (LV 0.9), and there aren't that many tests disabled. From JuliaSIMD/LoopVectorization.jl#171 I learn that this may be caused by coverage; I wonder if that can be run more selectively?
LoopVectorization's tests are 99.9%+ compilation. That may be a Julia 1.4 vs 1.5 difference. I recall 1.4 being faster than 1.5 on the same LoopVectorization version. Running tests for both LoopVectorization 0.8.26 and 0.9.6 on Julia 1.5, I get (clipping off the gemm tests):
#= /home/chriselrod/.julia/packages/LoopVectorization/pHMnJ/test/printmethods.jl:2 =# @__LINE__() = 2
2.300284 seconds (5.02 M allocations: 256.232 MiB, 2.01% gc time)
(Float64, #= /home/chriselrod/.julia/packages/LoopVectorization/pHMnJ/test/fallback.jl:4 =# @__LINE__()) = (Float64, 4)
6.730764 seconds (12.41 M allocations: 626.128 MiB, 3.69% gc time)
0.035563 seconds (69.39 k allocations: 3.810 MiB)
2.574874 seconds (8.74 M allocations: 446.962 MiB, 3.17% gc time)
0.517646 seconds (2.19 M allocations: 110.348 MiB, 3.07% gc time)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/pHMnJ/test/offsetarrays.jl:204 =# @__LINE__()) = (Float32, 204)
r = -1:1
r = -2:2
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/pHMnJ/test/offsetarrays.jl:204 =# @__LINE__()) = (Float64, 204)
r = -1:1
r = -2:2
186.278399 seconds (323.83 M allocations: 24.462 GiB, 5.23% gc time)
(Float64, #= /home/chriselrod/.julia/packages/LoopVectorization/pHMnJ/test/tensors.jl:51 =# @__LINE__()) = (Float64, 51)
5.293765 seconds (13.67 M allocations: 746.867 MiB, 5.85% gc time)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/pHMnJ/test/map.jl:4 =# @__LINE__()) = (Float32, 4)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/pHMnJ/test/map.jl:4 =# @__LINE__()) = (Float64, 4)
1.959536 seconds (7.39 M allocations: 376.356 MiB, 4.18% gc time)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/pHMnJ/test/filter.jl:4 =# @__LINE__()) = (Float32, 4)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/pHMnJ/test/filter.jl:4 =# @__LINE__()) = (Float64, 4)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/pHMnJ/test/filter.jl:4 =# @__LINE__()) = (Int32, 4)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/pHMnJ/test/filter.jl:4 =# @__LINE__()) = (Int64, 4)
0.306723 seconds (724.39 k allocations: 37.850 MiB, 2.75% gc time)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/pHMnJ/test/mapreduce.jl:19 =# @__LINE__()) = (Int32, 19)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/pHMnJ/test/mapreduce.jl:19 =# @__LINE__()) = (Int64, 19)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/pHMnJ/test/mapreduce.jl:19 =# @__LINE__()) = (Float32, 19)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/pHMnJ/test/mapreduce.jl:19 =# @__LINE__()) = (Float64, 19)
49.398566 seconds (448.45 M allocations: 29.106 GiB, 10.44% gc time)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/pHMnJ/test/ifelsemasks.jl:366 =# @__LINE__()) = (Float32, 366)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/pHMnJ/test/ifelsemasks.jl:366 =# @__LINE__()) = (Float64, 366)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/pHMnJ/test/ifelsemasks.jl:366 =# @__LINE__()) = (Int32, 366)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/pHMnJ/test/ifelsemasks.jl:366 =# @__LINE__()) = (Int64, 366)
20.996625 seconds (56.05 M allocations: 2.809 GiB, 8.57% gc time)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/pHMnJ/test/dot.jl:234 =# @__LINE__()) = (Float32, 234)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/pHMnJ/test/dot.jl:234 =# @__LINE__()) = (Float64, 234)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/pHMnJ/test/dot.jl:234 =# @__LINE__()) = (Int32, 234)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/pHMnJ/test/dot.jl:234 =# @__LINE__()) = (Int64, 234)
12.680781 seconds (44.60 M allocations: 2.271 GiB, 4.08% gc time)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/pHMnJ/test/special.jl:339 =# @__LINE__()) = (Float32, 339)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/pHMnJ/test/special.jl:339 =# @__LINE__()) = (Float64, 339)
4.091667 seconds (13.12 M allocations: 633.796 MiB, 2.57% gc time)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/pHMnJ/test/gemv.jl:211 =# @__LINE__()) = (Float32, 211)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/pHMnJ/test/gemv.jl:211 =# @__LINE__()) = (Float64, 211)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/pHMnJ/test/gemv.jl:211 =# @__LINE__()) = (Int32, 211)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/pHMnJ/test/gemv.jl:211 =# @__LINE__()) = (Int64, 211)
15.978444 seconds (52.62 M allocations: 2.395 GiB, 2.73% gc time)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/pHMnJ/test/miscellaneous.jl:789 =# @__LINE__()) = (Float32, 789)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/pHMnJ/test/miscellaneous.jl:789 =# @__LINE__()) = (Float64, 789)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/pHMnJ/test/miscellaneous.jl:1070 =# @__LINE__()) = (Float32, 1070)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/pHMnJ/test/miscellaneous.jl:1070 =# @__LINE__()) = (Float64, 1070)
30.148872 seconds (127.87 M allocations: 6.813 GiB, 8.33% gc time)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/pHMnJ/test/copy.jl:129 =# @__LINE__()) = (Float32, 129)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/pHMnJ/test/copy.jl:129 =# @__LINE__()) = (Float64, 129)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/pHMnJ/test/copy.jl:129 =# @__LINE__()) = (Int32, 129)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/pHMnJ/test/copy.jl:129 =# @__LINE__()) = (Int64, 129)
3.025664 seconds (9.11 M allocations: 447.853 MiB, 4.00% gc time)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/pHMnJ/test/broadcast.jl:5 =# @__LINE__()) = (Float32, 5)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/pHMnJ/test/broadcast.jl:5 =# @__LINE__()) = (Float64, 5)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/pHMnJ/test/broadcast.jl:5 =# @__LINE__()) = (Int32, 5)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/pHMnJ/test/broadcast.jl:5 =# @__LINE__()) = (Int64, 5)
98.177554 seconds (140.41 M allocations: 7.892 GiB, 4.08% gc time)
0.9.6:
#= /home/chriselrod/.julia/packages/LoopVectorization/5Eosy/test/printmethods.jl:2 =# @__LINE__() = 2
2.346917 seconds (5.03 M allocations: 256.636 MiB, 2.05% gc time)
0.010064 seconds (9.74 k allocations: 628.211 KiB)
(Float64, #= /home/chriselrod/.julia/packages/LoopVectorization/5Eosy/test/fallback.jl:4 =# @__LINE__()) = (Float64, 4)
9.985701 seconds (19.52 M allocations: 987.222 MiB, 4.43% gc time)
0.032851 seconds (67.26 k allocations: 3.726 MiB)
2.540316 seconds (9.26 M allocations: 477.631 MiB, 3.24% gc time)
0.834916 seconds (3.25 M allocations: 166.502 MiB, 2.88% gc time)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/5Eosy/test/offsetarrays.jl:211 =# @__LINE__()) = (Float32, 211)
r = -1:1
r = -2:2
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/5Eosy/test/offsetarrays.jl:211 =# @__LINE__()) = (Float64, 211)
r = -1:1
r = -2:2
5.230486 seconds (16.37 M allocations: 794.435 MiB, 2.26% gc time)
(Float64, #= /home/chriselrod/.julia/packages/LoopVectorization/5Eosy/test/tensors.jl:51 =# @__LINE__()) = (Float64, 51)
5.337574 seconds (16.69 M allocations: 905.243 MiB, 9.21% gc time)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/5Eosy/test/map.jl:4 =# @__LINE__()) = (Float32, 4)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/5Eosy/test/map.jl:4 =# @__LINE__()) = (Float64, 4)
2.939322 seconds (10.46 M allocations: 519.786 MiB, 4.44% gc time)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/5Eosy/test/filter.jl:4 =# @__LINE__()) = (Float32, 4)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/5Eosy/test/filter.jl:4 =# @__LINE__()) = (Float64, 4)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/5Eosy/test/filter.jl:4 =# @__LINE__()) = (Int32, 4)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/5Eosy/test/filter.jl:4 =# @__LINE__()) = (Int64, 4)
0.476314 seconds (1.19 M allocations: 62.302 MiB, 1.93% gc time)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/5Eosy/test/mapreduce.jl:19 =# @__LINE__()) = (Int32, 19)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/5Eosy/test/mapreduce.jl:19 =# @__LINE__()) = (Int64, 19)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/5Eosy/test/mapreduce.jl:19 =# @__LINE__()) = (Float32, 19)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/5Eosy/test/mapreduce.jl:19 =# @__LINE__()) = (Float64, 19)
49.636117 seconds (485.14 M allocations: 30.130 GiB, 10.02% gc time)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/5Eosy/test/ifelsemasks.jl:366 =# @__LINE__()) = (Float32, 366)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/5Eosy/test/ifelsemasks.jl:366 =# @__LINE__()) = (Float64, 366)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/5Eosy/test/ifelsemasks.jl:366 =# @__LINE__()) = (Int32, 366)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/5Eosy/test/ifelsemasks.jl:366 =# @__LINE__()) = (Int64, 366)
20.505279 seconds (60.46 M allocations: 3.053 GiB, 9.45% gc time)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/5Eosy/test/dot.jl:234 =# @__LINE__()) = (Float32, 234)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/5Eosy/test/dot.jl:234 =# @__LINE__()) = (Float64, 234)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/5Eosy/test/dot.jl:234 =# @__LINE__()) = (Int32, 234)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/5Eosy/test/dot.jl:234 =# @__LINE__()) = (Int64, 234)
13.784316 seconds (50.99 M allocations: 2.600 GiB, 5.45% gc time)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/5Eosy/test/special.jl:339 =# @__LINE__()) = (Float32, 339)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/5Eosy/test/special.jl:339 =# @__LINE__()) = (Float64, 339)
4.632350 seconds (15.96 M allocations: 765.493 MiB, 3.58% gc time)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/5Eosy/test/gemv.jl:211 =# @__LINE__()) = (Float32, 211)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/5Eosy/test/gemv.jl:211 =# @__LINE__()) = (Float64, 211)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/5Eosy/test/gemv.jl:211 =# @__LINE__()) = (Int32, 211)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/5Eosy/test/gemv.jl:211 =# @__LINE__()) = (Int64, 211)
16.232091 seconds (53.92 M allocations: 2.471 GiB, 3.38% gc time)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/5Eosy/test/miscellaneous.jl:792 =# @__LINE__()) = (Float32, 792)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/5Eosy/test/miscellaneous.jl:792 =# @__LINE__()) = (Float64, 792)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/5Eosy/test/miscellaneous.jl:1075 =# @__LINE__()) = (Float32, 1075)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/5Eosy/test/miscellaneous.jl:1075 =# @__LINE__()) = (Float64, 1075)
33.904497 seconds (138.12 M allocations: 7.276 GiB, 8.18% gc time)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/5Eosy/test/copy.jl:129 =# @__LINE__()) = (Float32, 129)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/5Eosy/test/copy.jl:129 =# @__LINE__()) = (Float64, 129)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/5Eosy/test/copy.jl:129 =# @__LINE__()) = (Int32, 129)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/5Eosy/test/copy.jl:129 =# @__LINE__()) = (Int64, 129)
2.892922 seconds (9.38 M allocations: 471.123 MiB, 4.23% gc time)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/5Eosy/test/broadcast.jl:8 =# @__LINE__()) = (Float32, 8)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/5Eosy/test/broadcast.jl:8 =# @__LINE__()) = (Float64, 8)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/5Eosy/test/broadcast.jl:8 =# @__LINE__()) = (Int32, 8)
(T, #= /home/chriselrod/.julia/packages/LoopVectorization/5Eosy/test/broadcast.jl:8 =# @__LINE__()) = (Int64, 8)
101.429469 seconds (151.64 M allocations: 8.432 GiB, 3.62% gc time)
Gemm total, and test total:
179.480013 seconds (388.27 M allocations: 23.585 GiB, 5.36% gc time)
Test Summary: | Pass Total
LoopVectorization.jl | 1724 1724
620.346936 seconds (1.66 G allocations: 103.007 GiB, 5.61% gc time)
Testing LoopVectorization tests passed
0.9.6:
174.757699 seconds (417.31 M allocations: 25.917 GiB, 5.65% gc time)
Test Summary: | Pass Total
LoopVectorization.jl | 20429 20429
447.869647 seconds (1.47 G allocations: 85.253 GiB, 5.85% gc time)
Testing LoopVectorization tests passed
Overall, locally, the 0.9.6 tests are almost 3 minutes (roughly 28%) faster than the 0.8.26 tests on the same Julia version. So, I think the regressions are Julia-related.
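The comparison above can be checked with a few lines of arithmetic. This is only a sketch: the two totals are copied from the logs, and attributing the first total to 0.8.26 and the second to 0.9.6 is my reading of the order in which the logs were pasted. The variable names are made up for illustration.

```julia
# Total test times in seconds, copied from the two local runs on Julia 1.5.
# Assumption: the first total belongs to 0.8.26, the second to 0.9.6.
t_lv08 = 620.346936   # LoopVectorization 0.8.26
t_lv09 = 447.869647   # LoopVectorization 0.9.6

saved_seconds   = t_lv08 - t_lv09          # absolute difference
saved_minutes   = saved_seconds / 60
relative_saving = saved_seconds / t_lv08   # fraction of the 0.8.26 time

println("saved: ", round(saved_minutes; digits = 2), " min, ",
        round(100 * relative_saving; digits = 1), "% of the 0.8.26 total")
```

Measured against the 0.8.26 total, the saving comes out a little under three minutes and a little over a quarter of the run time.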
Thanks for digging, as always. Am I right to think that these local tests don't include coverage? And is there, or should there be, a Julia issue about such regressions? They could be an inevitable consequence of progress elsewhere, but they could also be bugs.
Yes, it was without coverage. Another package that makes heavy use of DifferentialEquations.jl and ForwardDiff.jl hit a pretty severe compile-time regression in some benchmarks on Julia 1.6, which some Julia core folks are looking into. I'm not aware of an issue, but it probably wouldn't hurt. Couldn't find any documentation about it anywhere, so I'm not sure what to do from here. EDIT: While building Julia from source, it keeps dumping these summaries now. So, I guess it'll do that automatically.
640.984369 seconds (379.33 M allocations: 25.899 GiB, 1.58% gc time, 99.55% compilation time)
Test Summary: | Pass Total
LoopVectorization.jl | 20429 20429
1572.714641 seconds (1.61 G allocations: 107.668 GiB, 2.26% gc time, 99.16% compilation time)
ROOT : 0.07 % 3378911825
GC : 2.26 % 106785484285
LOWERING : 2.82 % 133262113044
PARSING : 0.01 % 445595991
INFERENCE : 5.07 % 239589938800
CODEGEN : 75.02 % 3544299937574
METHOD_LOOKUP_SLOW : 0.01 % 393193036
METHOD_LOOKUP_FAST : 0.46 % 21748429238
LLVM_OPT : 10.95 % 517271539743
LLVM_MODULE_FINISH : 0.11 % 5067346249
METHOD_MATCH : 0.30 % 14070103600
TYPE_CACHE_LOOKUP : 0.78 % 36937850996
TYPE_CACHE_INSERT : 0.00 % 85812602
STAGED_FUNCTION : 1.28 % 60412449084
MACRO_INVOCATION : 0.00 % 167520658
AST_COMPRESS : 0.37 % 17331975652
AST_UNCOMPRESS : 0.30 % 14386285566
SYSIMG_LOAD : 0.00 % 155666605
ADD_METHOD : 0.01 % 677747888
LOAD_MODULE : 0.00 % 149572157
INIT_MODULE : 0.00 % 12006711
Testing LoopVectorization tests passed
So most of the time is spent in codegen. I'm not really sure what that means. Also, I discovered a regression where
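To make the "most of the time is spent in codegen" reading concrete, here is a small sketch that ranks the phases from the dump. The numbers are copied from the table above; the unit of the raw counters is not stated in the thread, so they are treated as opaque counts, and the shares are computed against the sum of the listed rows (an assumption, so they can differ slightly from the percentages printed in the dump).

```julia
# (phase name => raw counter) pairs copied from the timing dump above.
timings = [
    "ROOT" => 3378911825,
    "GC" => 106785484285,
    "LOWERING" => 133262113044,
    "PARSING" => 445595991,
    "INFERENCE" => 239589938800,
    "CODEGEN" => 3544299937574,
    "METHOD_LOOKUP_SLOW" => 393193036,
    "METHOD_LOOKUP_FAST" => 21748429238,
    "LLVM_OPT" => 517271539743,
    "LLVM_MODULE_FINISH" => 5067346249,
    "METHOD_MATCH" => 14070103600,
    "TYPE_CACHE_LOOKUP" => 36937850996,
    "TYPE_CACHE_INSERT" => 85812602,
    "STAGED_FUNCTION" => 60412449084,
    "MACRO_INVOCATION" => 167520658,
    "AST_COMPRESS" => 17331975652,
    "AST_UNCOMPRESS" => 14386285566,
    "SYSIMG_LOAD" => 155666605,
    "ADD_METHOD" => 677747888,
    "LOAD_MODULE" => 149572157,
    "INIT_MODULE" => 12006711,
]

total  = sum(last, timings)                     # sum of all listed counters
sorted = sort(timings; by = last, rev = true)   # largest phase first
counts = Dict(timings)

# Print the five most expensive phases with their share of the listed total.
for (name, count) in sorted[1:5]
    println(rpad(name, 12), round(100 * count / total; digits = 2), " %")
end

codegen_share = counts["CODEGEN"] / total
backend_share = (counts["CODEGEN"] + counts["LLVM_OPT"]) / total
```

On these numbers CODEGEN is by far the largest row, and CODEGEN together with LLVM_OPT accounts for well over four fifths of the listed total, while INFERENCE and LOWERING are comparatively small.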
src/macro.jl
@@ -494,7 +494,8 @@ padmodclamp_pair(A, inds, store) = begin
     elseif ex.args[1] == :pad && length(ex.args) >= 2
         i = ex.args[2]
         if !all(==(0), ex.args[3:end]) || length(ex.args) == 2
-            push!(nopadif, :($i ∈ $axes($A,$d)))
+            # push!(nopadif, :($i ∈ $axes($A,$d)))
+            push!(nopadif, :($i >= first(axes($A,$d))), :($i <= Base.last(axes($A,$d)))) # allows avx? Weirdly, deleting "Base." causes errors
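The change above swaps a membership test for a pair of comparisons, and the diff comment suggests the motivation is `@avx` compatibility. The following minimal sketch checks that the two forms agree for integer indices on ordinary arrays. The helper names `in_axis` and `in_bounds` are hypothetical and not part of the package; the equivalence assumes an integer `i` and unit-step axes, and would not hold for a non-integer `i` such as `1.5`.

```julia
# Hypothetical helpers for illustration only; they are not part of the package.
# Membership form, as in the removed line:
in_axis(A, i, d) = i ∈ axes(A, d)
# Comparison form, as in the added line:
in_bounds(A, i, d) = (i >= first(axes(A, d))) & (i <= last(axes(A, d)))

A = zeros(3, 4)

# Compare both forms on indices inside and outside each axis.
for d in 1:2, i in -2:7
    @assert in_axis(A, i, d) == in_bounds(A, i, d)
end
```

The comparison form only uses `>=`, `<=`, `first`, and `last`, which are simple scalar operations on the endpoints of the axis.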
Weirdly, deleting "Base." causes errors
That's odd, wouldn't mind an issue.
Will do, I was trying to isolate it a bit, but so far the simple ones all work just fine!
FYI, on Julia 1.5, |
Previous CI run passed, will this one?
Let's call this done.
This is part II of #53.
Closes #77, closes #75, closes #72.