[WIP] Use Tullio for pairwise distances #386

Draft
wants to merge 23 commits into
base: master

Member

@theogf theogf commented Oct 19, 2021

Summary
We have a long-standing problem: binary operations like DotProduct do not satisfy the requirements of the Distances.jl framework (they are not proper metrics). Additionally, Distances.jl is largely incompatible with GPU operations (see JuliaStats/Distances.jl#143 and JuliaStats/Distances.jl#137).
Using Tullio.jl should solve both problems. Some quick benchmarks show that Tullio is both faster and more GPU-friendly than Distances.jl.

There is a longer discussion about this PR in #380
This should also close #98 and replace #194

Proposed changes

  • Tullio is used for computing all pairwise and colwise operations.
  • The only things used from Distances.jl are its types (we stop using Distances.pairwise).
  • Add special implementations of pairwise for ColVecs and RowVecs where possible to improve speed (and GPU compatibility).
  • Create an AbstractBinaryOp abstract type for objects like DotProduct and Delta, and combine them with Distances types using BinaryOp = Union{AbstractBinaryOp,Distances.PreMetric}.
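
A minimal sketch of how the proposed union could look, assuming Distances.jl is installed. Only the BinaryOp alias is taken from the description above; the struct definitions and the pairwise_generic helper are illustrative and may differ from the PR's actual code.

```julia
using Distances
using LinearAlgebra: dot

# Supertype for binary operations that are not (pre)metrics.
abstract type AbstractBinaryOp end

# Illustrative definitions; the structs in this PR may differ.
struct DotProduct <: AbstractBinaryOp end
(::DotProduct)(a, b) = dot(a, b)

struct Delta <: AbstractBinaryOp end
(::Delta)(a, b) = a == b

# Alias from the PR description.
const BinaryOp = Union{AbstractBinaryOp,Distances.PreMetric}

# Hypothetical generic fallback: one method serves both families.
function pairwise_generic(d::BinaryOp, x::AbstractVector, y::AbstractVector)
    return [d(xi, yj) for xi in x, yj in y]
end

x = [[1.0, 2.0], [0.0, 1.0]]
y = [[1.0, 1.0]]
K_dot = pairwise_generic(DotProduct(), x, y)  # 2×1 matrix of dot products
K_sq = pairwise_generic(SqEuclidean(), x, y)  # 2×1 matrix of squared distances
```

With this alias, a single method signature accepts both a Distances.jl metric and one of the package's own operations, which is the point of the union.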

What alternatives have you considered?
Dropping the Distances.jl operations without using Tullio, but the benchmarks show that Tullio is faster.

Breaking changes

  • It's technically not breaking, since the API does not change; only the internal computations change.

@willtebbutt
Member

This is cool, but I would want to see more extensive benchmarking before adopting it -- I wonder whether Tullio's performance drops off for quite large problems? Either way, I feel like we need some graphs -- maybe a good thing to add to the benchmarking day list?

@theogf theogf mentioned this pull request Oct 19, 2021
@theogf
Member Author

theogf commented Oct 21, 2021

Tests are failing because of AD.
Looking at some of the errors, one of the first problems is that Tullio automatically generates differentiation rules for the expression it is given, using DiffRules.jl. This works well most of the time, but not for a function like:

function colwise(d::BinaryOp, x::AbstractVector, y::AbstractVector)
    return @tullio out[i] := d(x[i], y[i])
end

Here the derivative of an arbitrary d is not known to Tullio. More generally, using functors does not make things easy...
In this example, adding verbose=true prints:

┌ Warning: symbolic gradient failed
│   err = "no diffrule found for function d(_, _)."
└ @ Tullio ~/.julia/packages/Tullio/qPZkO/src/macro.jl:1264

We could use evaluate(::Metric, x, y) instead and add a @define_diffrule for some of the metrics/binary ops, but this only solves the problem partially.

@devmotion
Member

You can disable the symbolic differentiation with grad=false. It is also possible to compute derivatives with ForwardDiff instead of DiffRules. From the README:

The macro also tries to provide a gradient for use with Tracker or (via ChainRules) for Zygote, Yota, etc. (Disable with grad=false, or nograd=A.) This is done in one of two ways:
- By default it takes a symbolic derivative of the right hand side expression. This works for reductions over + or min/max. The functions as typed must be known, mostly from DiffRules.
- The option grad=Dual uses instead ForwardDiff to differentiate the right hand side (only for reductions over +). This allows for more complicated expressions.

I think evaluate is quite cumbersome (it just calls the functor version in Distances by the way) and we should use d(x, y) instead.
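
For reference, a minimal sketch of the grad=false option applied to the colwise example, assuming Tullio.jl is installed. AbsDiff and colwise_nograd are hypothetical names used only for illustration. This shows the option syntax only; the follow-up comments in this thread report that it did not, on its own, fix the Zygote failure.

```julia
using Tullio

# Hypothetical functor standing in for a metric or binary op.
struct AbsDiff end
(::AbsDiff)(a, b) = abs(a - b)

function colwise_nograd(d, x::AbstractVector, y::AbstractVector)
    # grad=false asks Tullio not to generate a gradient for this expression.
    return @tullio grad=false out[i] := d(x[i], y[i])
end

r = colwise_nograd(AbsDiff(), [1.0, 5.0], [3.0, 2.0])
```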

@codecov

codecov bot commented Oct 26, 2021

Codecov Report

Merging #386 (5f2b08b) into master (f9bbd84) will decrease coverage by 61.19%.
The diff coverage is 19.67%.

@@             Coverage Diff             @@
##           master     #386       +/-   ##
===========================================
- Coverage   93.09%   31.89%   -61.20%     
===========================================
  Files          52       53        +1     
  Lines        1202     1182       -20     
===========================================
- Hits         1119      377      -742     
- Misses         83      805      +722     
| Impacted Files | Coverage Δ |
|---|---|
| src/KernelFunctions.jl | 100.00% <ø> (ø) |
| src/distances/delta.jl | 0.00% <ø> (-100.00%) ⬇️ |
| src/distances/dotproduct.jl | 0.00% <0.00%> (-90.00%) ⬇️ |
| src/utils.jl | 63.63% <0.00%> (-26.51%) ⬇️ |
| src/distances/euclidean.jl | 26.66% <26.66%> (ø) |
| src/distances/pairwise.jl | 33.33% <30.00%> (-66.67%) ⬇️ |
| src/distances/sinus.jl | 18.18% <100.00%> (-63.64%) ⬇️ |
| src/basekernels/sm.jl | 0.00% <0.00%> (-100.00%) ⬇️ |
| src/basekernels/gabor.jl | 0.00% <0.00%> (-100.00%) ⬇️ |
| src/kernels/kernelsum.jl | 0.00% <0.00%> (-100.00%) ⬇️ |
... and 32 more

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f9bbd84...5f2b08b. Read the comment docs.

theogf and others added 3 commits October 26, 2021 13:44
@theogf
Member Author

theogf commented Oct 28, 2021

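As far as I can tell, there is no concrete solution with Tullio for this problem.
In any case, we should write colwise and pairwise explicitly for as many Metrics as possible (Euclidean, SqEuclidean, SqMahalanobis) and for our own operations (Delta, DotProduct). For the remaining cases, I see two options:

  • Write a default rrule that bypasses Tullio's implementation and differentiates a more standard method:
function ChainRulesCore.rrule(::typeof(pairwise), metric, x, y)
    # Differentiate a plain broadcast instead of Tullio's generated code.
    val, back = Zygote.pullback(metric, x, y) do m, a, b
        m.(a, permutedims(b))
    end
    # `back(Δ)` yields tangents for (metric, x, y); `pairwise` itself has none.
    pairwise_pullback(Δ) = (ChainRulesCore.NoTangent(), back(Δ)...)
    return val, pairwise_pullback
end
  • Use the standard Distances.jl implementation for colwise/pairwise and hope for the best.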
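
To make the broadcast fallback in the first option concrete, here is a small sketch in plain Julia of what metric.(x, permutedims(y)) computes. The names pairwise_broadcast and sqdist are hypothetical stand-ins, not functions from KernelFunctions.jl or Distances.jl.

```julia
# Pairwise evaluation by broadcasting a column of inputs against a row of inputs.
# No arrays are mutated, which is what makes this form attractive for Zygote.
pairwise_broadcast(metric, x::AbstractVector, y::AbstractVector) =
    metric.(x, permutedims(y))

# Hypothetical stand-in for a metric: squared Euclidean distance.
sqdist(a, b) = sum(abs2, a .- b)

x = [[0.0, 0.0], [1.0, 1.0]]
y = [[1.0, 0.0], [0.0, 0.0], [2.0, 2.0]]

# Entry (i, j) holds sqdist(x[i], y[j]); the result is length(x) × length(y).
K = pairwise_broadcast(sqdist, x, y)
```

permutedims is used rather than transpose because transpose is recursive and would also try to transpose the inner vectors.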

@devmotion
Member

devmotion commented Oct 28, 2021

@devmotion So applying grad=false does not solve the problem, because one still hits this line when using Zygote: https://github.com/mcabbott/Tullio.jl/blob/93278c6bf0441382fde9c52fedac8dc41e3e4648/src/eval.jl#L52 nograd is not useful either, because it completely ignores the arguments given to it.

I think this could be fixed by replacing https://github.com/mcabbott/Tullio.jl/blob/93278c6bf0441382fde9c52fedac8dc41e3e4648/src/eval.jl#L49-L58 with

function ChainRulesCore.rrule(ev::Eval, args...)
    Z = ev.fwd(args...)
    function tullio_back(Δ)
        dxs = map(ev.rev(Δ, Z, args...)) do dx
            dx === nothing ? ChainRulesCore.ZeroTangent() : dx
        end
        return (ChainRulesCore.ZeroTangent(), dxs...)
    end
    return Z, tullio_back
end

# without gradient definition
ChainRulesCore.@opt_out rrule(ev::Eval{<:Any,Nothing}, args...)

@devmotion
Member

Even if it does not fix our specific use case here, I think it deserves a PR. I'll make one later today.

@devmotion
Member

Hmm, this works but it does not really help. The current error is gone, but since Zygote can't differentiate mutating functions, it still can't differentiate through the Tullio call. It's helpful, though, for AD systems that support mutation 🤷

@github-actions
Contributor

github-actions bot commented Nov 4, 2021

Benchmark result

Judge result

Benchmark Report for /home/runner/work/KernelFunctions.jl/KernelFunctions.jl

Job Properties

  • Time of benchmarks:
    • Target: 4 Nov 2021 - 13:31
    • Baseline: 4 Nov 2021 - 13:32
  • Package commits:
    • Target: cc7237
    • Baseline: 33d64d
  • Julia commits:
    • Target: ae8452
    • Baseline: ae8452
  • Julia command flags:
    • Target: None
    • Baseline: None
  • Environment variables:
    • Target: None
    • Baseline: None

Results

A ratio greater than 1.0 denotes a possible regression (marked with ❌), while a ratio less
than 1.0 denotes a possible improvement (marked with ✅). Only significant results - results
that indicate possible regressions or improvements - are shown below (thus, an empty table means that all
benchmark results remained invariant between builds).

| ID | time ratio | memory ratio |
|---|---|---|
| ["Exponential", "ColVecs", "kernelmatrixX"] | 1.00 (5%) | 0.97 (1%) ✅ |
| ["Exponential", "ColVecs", "kernelmatrixXY"] | 1.15 (5%) ❌ | 0.93 (1%) ✅ |
| ["Exponential", "ColVecs", "kernelmatrix_diagX"] | 1.12 (5%) ❌ | 1.00 (1%) |
| ["Exponential", "ColVecs", "kernelmatrix_diagXY"] | 1.15 (5%) ❌ | 1.20 (1%) ❌ |
| ["Exponential", "RowVecs", "kernelmatrixX"] | 1.28 (5%) ❌ | 0.48 (1%) ✅ |
| ["Exponential", "RowVecs", "kernelmatrixXY"] | 1.01 (5%) | 0.31 (1%) ✅ |
| ["Exponential", "RowVecs", "kernelmatrix_diagXY"] | 1.37 (5%) ❌ | 1.30 (1%) ❌ |
| ["Exponential", "Vecs", "kernelmatrixXY"] | 0.90 (5%) ✅ | 1.00 (1%) |
| ["Exponential", "Vecs", "kernelmatrix_diagXY"] | 0.83 (5%) ✅ | 1.13 (1%) ❌ |
| ["SqExponential", "ColVecs", "kernelmatrixX"] | 0.88 (5%) ✅ | 0.97 (1%) ✅ |
| ["SqExponential", "ColVecs", "kernelmatrixXY"] | 0.98 (5%) | 0.93 (1%) ✅ |
| ["SqExponential", "ColVecs", "kernelmatrix_diagX"] | 0.91 (5%) ✅ | 1.00 (1%) |
| ["SqExponential", "ColVecs", "kernelmatrix_diagXY"] | 1.02 (5%) | 1.13 (1%) ❌ |
| ["SqExponential", "RowVecs", "kernelmatrixX"] | 1.05 (5%) | 0.48 (1%) ✅ |
| ["SqExponential", "RowVecs", "kernelmatrixXY"] | 1.12 (5%) ❌ | 0.31 (1%) ✅ |
| ["SqExponential", "RowVecs", "kernelmatrix_diagXY"] | 1.22 (5%) ❌ | 1.20 (1%) ❌ |
| ["SqExponential", "Vecs", "kernelmatrixX"] | 0.82 (5%) ✅ | 0.99 (1%) |
| ["SqExponential", "Vecs", "kernelmatrixXY"] | 0.89 (5%) ✅ | 0.99 (1%) |
| ["SqExponential", "Vecs", "kernelmatrix_diagXY"] | 0.76 (5%) ✅ | 1.07 (1%) ❌ |

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["Exponential", "ColVecs"]
  • ["Exponential", "RowVecs"]
  • ["Exponential", "Vecs"]
  • ["SqExponential", "ColVecs"]
  • ["SqExponential", "RowVecs"]
  • ["SqExponential", "Vecs"]

Julia versioninfo

Target

Julia Version 1.6.3
Commit ae8452a9e0 (2021-09-23 17:34 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 20.04.3 LTS
  uname: Linux 5.11.0-1020-azure #21~20.04.1-Ubuntu SMP Mon Oct 11 18:54:28 UTC 2021 x86_64 x86_64
  CPU: Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz: 
              speed         user         nice          sys         idle          irq
       #1  2294 MHz       1714 s          2 s        177 s        828 s          0 s
       #2  2294 MHz       1041 s          1 s        152 s       1560 s          0 s
       
  Memory: 6.788990020751953 GB (3398.17578125 MB free)
  Uptime: 280.0 sec
  Load Avg:  1.07  0.9  0.43
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, broadwell)

Baseline

Julia Version 1.6.3
Commit ae8452a9e0 (2021-09-23 17:34 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 20.04.3 LTS
  uname: Linux 5.11.0-1020-azure #21~20.04.1-Ubuntu SMP Mon Oct 11 18:54:28 UTC 2021 x86_64 x86_64
  CPU: Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz: 
              speed         user         nice          sys         idle          irq
       #1  2294 MHz       1797 s          2 s        183 s       1589 s          0 s
       #2  2294 MHz       1803 s          1 s        163 s       1639 s          0 s
       
  Memory: 6.788990020751953 GB (3362.47265625 MB free)
  Uptime: 366.0 sec
  Load Avg:  1.03  0.95  0.49
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, broadwell)

Target result

Benchmark Report for /home/runner/work/KernelFunctions.jl/KernelFunctions.jl

Job Properties

  • Time of benchmark: 4 Nov 2021 - 13:31
  • Package commit: cc7237
  • Julia commit: ae8452
  • Julia command flags: None
  • Environment variables: None

Results

Below is a table of this job's results, obtained by running the benchmarks.
The values listed in the ID column have the structure [parent_group, child_group, ..., key], and can be used to
index into the BaseBenchmarks suite to retrieve the corresponding benchmarks.
The percentages accompanying time and memory values in the below table are noise tolerances. The "true"
time/memory value for a given benchmark is expected to fall within this percentage of the reported value.
An empty cell means that the value was zero.

| ID | time | GC time | memory | allocations |
|---|---|---|---|---|
| ["Exponential", "ColVecs", "kernelmatrixX"] | 7.075 μs (5%) | | 6.50 KiB (1%) | 2 |
| ["Exponential", "ColVecs", "kernelmatrixXY"] | 7.475 μs (5%) | | 6.50 KiB (1%) | 2 |
| ["Exponential", "ColVecs", "kernelmatrix_diagX"] | 210.179 ns (5%) | | 480 bytes (1%) | 2 |
| ["Exponential", "ColVecs", "kernelmatrix_diagXY"] | 435.176 ns (5%) | | 576 bytes (1%) | 6 |
| ["Exponential", "RowVecs", "kernelmatrixX"] | 3.178 μs (5%) | | 1.75 KiB (1%) | 2 |
| ["Exponential", "RowVecs", "kernelmatrixXY"] | 2.933 μs (5%) | | 1.75 KiB (1%) | 2 |
| ["Exponential", "RowVecs", "kernelmatrix_diagX"] | 129.431 ns (5%) | | 320 bytes (1%) | 2 |
| ["Exponential", "RowVecs", "kernelmatrix_diagXY"] | 414.141 ns (5%) | | 416 bytes (1%) | 6 |
| ["Exponential", "Vecs", "kernelmatrixX"] | 8.300 μs (5%) | | 6.56 KiB (1%) | 4 |
| ["Exponential", "Vecs", "kernelmatrixXY"] | 7.733 μs (5%) | | 6.56 KiB (1%) | 4 |
| ["Exponential", "Vecs", "kernelmatrix_diagX"] | 199.202 ns (5%) | | 480 bytes (1%) | 2 |
| ["Exponential", "Vecs", "kernelmatrix_diagXY"] | 425.773 ns (5%) | | 544 bytes (1%) | 4 |
| ["SqExponential", "ColVecs", "kernelmatrixX"] | 6.420 μs (5%) | | 6.50 KiB (1%) | 2 |
| ["SqExponential", "ColVecs", "kernelmatrixXY"] | 6.400 μs (5%) | | 6.50 KiB (1%) | 2 |
| ["SqExponential", "ColVecs", "kernelmatrix_diagX"] | 185.609 ns (5%) | | 480 bytes (1%) | 2 |
| ["SqExponential", "ColVecs", "kernelmatrix_diagXY"] | 374.000 ns (5%) | | 544 bytes (1%) | 5 |
| ["SqExponential", "RowVecs", "kernelmatrixX"] | 2.544 μs (5%) | | 1.75 KiB (1%) | 2 |
| ["SqExponential", "RowVecs", "kernelmatrixXY"] | 3.033 μs (5%) | | 1.75 KiB (1%) | 2 |
| ["SqExponential", "RowVecs", "kernelmatrix_diagX"] | 131.376 ns (5%) | | 320 bytes (1%) | 2 |
| ["SqExponential", "RowVecs", "kernelmatrix_diagXY"] | 364.322 ns (5%) | | 384 bytes (1%) | 5 |
| ["SqExponential", "Vecs", "kernelmatrixX"] | 6.825 μs (5%) | | 6.53 KiB (1%) | 3 |
| ["SqExponential", "Vecs", "kernelmatrixXY"] | 7.025 μs (5%) | | 6.53 KiB (1%) | 3 |
| ["SqExponential", "Vecs", "kernelmatrix_diagX"] | 195.385 ns (5%) | | 480 bytes (1%) | 2 |
| ["SqExponential", "Vecs", "kernelmatrix_diagXY"] | 382.915 ns (5%) | | 512 bytes (1%) | 3 |

Baseline result

Benchmark Report for /home/runner/work/KernelFunctions.jl/KernelFunctions.jl

Job Properties

  • Time of benchmark: 4 Nov 2021 - 13:32
  • Package commit: 33d64d
  • Julia commit: ae8452
  • Julia command flags: None
  • Environment variables: None

Results

| ID | time | GC time | memory | allocations |
|---|---|---|---|---|
| ["Exponential", "ColVecs", "kernelmatrixX"] | 7.050 μs (5%) | | 6.73 KiB (1%) | 3 |
| ["Exponential", "ColVecs", "kernelmatrixXY"] | 6.475 μs (5%) | | 6.97 KiB (1%) | 4 |
| ["Exponential", "ColVecs", "kernelmatrix_diagX"] | 188.036 ns (5%) | | 480 bytes (1%) | 2 |
| ["Exponential", "ColVecs", "kernelmatrix_diagXY"] | 379.392 ns (5%) | | 480 bytes (1%) | 2 |
| ["Exponential", "RowVecs", "kernelmatrixX"] | 2.478 μs (5%) | | 3.67 KiB (1%) | 4 |
| ["Exponential", "RowVecs", "kernelmatrixXY"] | 2.900 μs (5%) | | 5.59 KiB (1%) | 6 |
| ["Exponential", "RowVecs", "kernelmatrix_diagX"] | 134.396 ns (5%) | | 320 bytes (1%) | 2 |
| ["Exponential", "RowVecs", "kernelmatrix_diagXY"] | 302.525 ns (5%) | | 320 bytes (1%) | 2 |
| ["Exponential", "Vecs", "kernelmatrixX"] | 8.433 μs (5%) | | 6.59 KiB (1%) | 4 |
| ["Exponential", "Vecs", "kernelmatrixXY"] | 8.567 μs (5%) | | 6.59 KiB (1%) | 4 |
| ["Exponential", "Vecs", "kernelmatrix_diagX"] | 198.004 ns (5%) | | 480 bytes (1%) | 2 |
| ["Exponential", "Vecs", "kernelmatrix_diagXY"] | 513.918 ns (5%) | | 480 bytes (1%) | 2 |
| ["SqExponential", "ColVecs", "kernelmatrixX"] | 7.260 μs (5%) | | 6.73 KiB (1%) | 3 |
| ["SqExponential", "ColVecs", "kernelmatrixXY"] | 6.560 μs (5%) | | 6.97 KiB (1%) | 4 |
| ["SqExponential", "ColVecs", "kernelmatrix_diagX"] | 203.965 ns (5%) | | 480 bytes (1%) | 2 |
| ["SqExponential", "ColVecs", "kernelmatrix_diagXY"] | 366.500 ns (5%) | | 480 bytes (1%) | 2 |
| ["SqExponential", "RowVecs", "kernelmatrixX"] | 2.433 μs (5%) | | 3.67 KiB (1%) | 4 |
| ["SqExponential", "RowVecs", "kernelmatrixXY"] | 2.700 μs (5%) | | 5.59 KiB (1%) | 6 |
| ["SqExponential", "RowVecs", "kernelmatrix_diagX"] | 136.342 ns (5%) | | 320 bytes (1%) | 2 |
| ["SqExponential", "RowVecs", "kernelmatrix_diagXY"] | 297.487 ns (5%) | | 320 bytes (1%) | 2 |
| ["SqExponential", "Vecs", "kernelmatrixX"] | 8.325 μs (5%) | | 6.59 KiB (1%) | 4 |
| ["SqExponential", "Vecs", "kernelmatrixXY"] | 7.850 μs (5%) | | 6.59 KiB (1%) | 4 |
| ["SqExponential", "Vecs", "kernelmatrix_diagX"] | 189.423 ns (5%) | | 480 bytes (1%) | 2 |
| ["SqExponential", "Vecs", "kernelmatrix_diagXY"] | 505.528 ns (5%) | | 480 bytes (1%) | 2 |

Runtime information

Runtime Info
BLAS #threads 2
BLAS.vendor() openblas64
Sys.CPU_THREADS 2

lscpu output:

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          2
On-line CPU(s) list:             0,1
Thread(s) per core:              1
Core(s) per socket:              2
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           79
Model name:                      Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
Stepping:                        1
CPU MHz:                         2294.687
BogoMIPS:                        4589.37
Hypervisor vendor:               Microsoft
Virtualization type:             full
L1d cache:                       64 KiB
L1i cache:                       64 KiB
L2 cache:                        512 KiB
L3 cache:                        50 MiB
NUMA node0 CPU(s):               0,1
Vulnerability Itlb multihit:     KVM: Mitigation: VMX unsupported
Vulnerability L1tf:              Mitigation; PTE Inversion
Vulnerability Mds:               Mitigation; Clear CPU buffers; SMT Host state unknown
Vulnerability Meltdown:          Mitigation; PTI
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Full generic retpoline, STIBP disabled, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Mitigation; Clear CPU buffers; SMT Host state unknown
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt md_clear
Cpu Property Value
Brand Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
Vendor :Intel
Architecture :Broadwell
Model Family: 0x06, Model: 0x4f, Stepping: 0x01, Type: 0x00
Cores 2 physical cores, 2 logical cores (on executing CPU)
No Hyperthreading hardware capability detected
Clock Frequencies Not supported by CPU
Data Cache Level 1:3 : (32, 256, 51200) kbytes
64 byte cache line size
Address Size 48 bits virtual, 46 bits physical
SIMD 256 bit = 32 byte max. SIMD vector size
Time Stamp Counter TSC is accessible via rdtsc
TSC increased at every clock cycle (non-invariant TSC)
Perf. Monitoring Performance Monitoring Counters (PMC) are not supported
Hypervisor Yes, Microsoft

@github-actions
Contributor

Benchmark result

Judge result

Benchmark Report for /home/runner/work/KernelFunctions.jl/KernelFunctions.jl

Job Properties

  • Time of benchmarks:
    • Target: 12 Jan 2022 - 13:41
    • Baseline: 12 Jan 2022 - 13:42
  • Package commits:
    • Target: 1ddebb
    • Baseline: 93d33c
  • Julia commits:
    • Target: 905826
    • Baseline: 905826
  • Julia command flags:
    • Target: None
    • Baseline: None
  • Environment variables:
    • Target: None
    • Baseline: None

Results

| ID | time ratio | memory ratio |
|---|---|---|
| ["Exponential", "ColVecs", "kernelmatrixX"] | 1.04 (5%) | 0.97 (1%) ✅ |
| ["Exponential", "ColVecs", "kernelmatrixXY"] | 0.98 (5%) | 0.93 (1%) ✅ |
| ["Exponential", "ColVecs", "kernelmatrix_diagX"] | 1.13 (5%) ❌ | 1.00 (1%) |
| ["Exponential", "ColVecs", "kernelmatrix_diagXY"] | 1.23 (5%) ❌ | 1.20 (1%) ❌ |
| ["Exponential", "RowVecs", "kernelmatrixX"] | 1.26 (5%) ❌ | 0.48 (1%) ✅ |
| ["Exponential", "RowVecs", "kernelmatrixXY"] | 1.15 (5%) ❌ | 0.31 (1%) ✅ |
| ["Exponential", "RowVecs", "kernelmatrix_diagXY"] | 1.26 (5%) ❌ | 1.30 (1%) ❌ |
| ["Exponential", "Vecs", "kernelmatrixX"] | 0.91 (5%) ✅ | 1.00 (1%) |
| ["Exponential", "Vecs", "kernelmatrixXY"] | 0.93 (5%) ✅ | 1.00 (1%) |
| ["Exponential", "Vecs", "kernelmatrix_diagXY"] | 0.84 (5%) ✅ | 1.13 (1%) ❌ |
| ["SqExponential", "ColVecs", "kernelmatrixX"] | 0.99 (5%) | 0.97 (1%) ✅ |
| ["SqExponential", "ColVecs", "kernelmatrixXY"] | 1.05 (5%) ❌ | 0.93 (1%) ✅ |
| ["SqExponential", "ColVecs", "kernelmatrix_diagXY"] | 1.21 (5%) ❌ | 1.13 (1%) ❌ |
| ["SqExponential", "RowVecs", "kernelmatrixX"] | 1.19 (5%) ❌ | 0.48 (1%) ✅ |
| ["SqExponential", "RowVecs", "kernelmatrixXY"] | 1.00 (5%) | 0.31 (1%) ✅ |
| ["SqExponential", "RowVecs", "kernelmatrix_diagXY"] | 1.24 (5%) ❌ | 1.20 (1%) ❌ |
| ["SqExponential", "Vecs", "kernelmatrixX"] | 0.93 (5%) ✅ | 0.99 (1%) |
| ["SqExponential", "Vecs", "kernelmatrix_diagXY"] | 0.84 (5%) ✅ | 1.07 (1%) ❌ |

Julia versioninfo

Target

Julia Version 1.6.5
Commit 9058264a69 (2021-12-19 12:30 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 20.04.3 LTS
  uname: Linux 5.11.0-1022-azure #23~20.04.1-Ubuntu SMP Fri Nov 19 10:20:52 UTC 2021 x86_64 x86_64
  CPU: Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz: 
              speed         user         nice          sys         idle          irq
       #1  2294 MHz       1763 s          1 s        164 s        683 s          0 s
       #2  2294 MHz        787 s          1 s        148 s       1679 s          0 s
       
  Memory: 6.788982391357422 GB (3591.40234375 MB free)
  Uptime: 267.7 sec
  Load Avg:  1.04  0.74  0.34
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, broadwell)

Baseline

Julia Version 1.6.5
Commit 9058264a69 (2021-12-19 12:30 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 20.04.3 LTS
  uname: Linux 5.11.0-1022-azure #23~20.04.1-Ubuntu SMP Fri Nov 19 10:20:52 UTC 2021 x86_64 x86_64
  CPU: Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz: 
              speed         user         nice          sys         idle          irq
       #1  2294 MHz       2514 s          1 s        175 s        710 s          0 s
       #2  2294 MHz        819 s          1 s        152 s       2430 s          0 s
       
  Memory: 6.788982391357422 GB (3443.5234375 MB free)
  Uptime: 346.7 sec
  Load Avg:  1.01  0.81  0.39
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, broadwell)

Target result

Benchmark Report for /home/runner/work/KernelFunctions.jl/KernelFunctions.jl

Job Properties

  • Time of benchmark: 12 Jan 2022 - 13:41
  • Package commit: 1ddebb
  • Julia commit: 905826
  • Julia command flags: None
  • Environment variables: None

Results

| ID | time | GC time | memory | allocations |
|---|---|---|---|---|
| ["Exponential", "ColVecs", "kernelmatrixX"] | 7.125 μs (5%) | | 6.50 KiB (1%) | 2 |
| ["Exponential", "ColVecs", "kernelmatrixXY"] | 6.800 μs (5%) | | 6.50 KiB (1%) | 2 |
| ["Exponential", "ColVecs", "kernelmatrix_diagX"] | 223.093 ns (5%) | | 480 bytes (1%) | 2 |
| ["Exponential", "ColVecs", "kernelmatrix_diagXY"] | 467.010 ns (5%) | | 576 bytes (1%) | 6 |
| ["Exponential", "RowVecs", "kernelmatrixX"] | 3.038 μs (5%) | | 1.75 KiB (1%) | 2 |
| ["Exponential", "RowVecs", "kernelmatrixXY"] | 3.150 μs (5%) | | 1.75 KiB (1%) | 2 |
| ["Exponential", "RowVecs", "kernelmatrix_diagX"] | 129.006 ns (5%) | | 320 bytes (1%) | 2 |
| ["Exponential", "RowVecs", "kernelmatrix_diagXY"] | 407.071 ns (5%) | | 416 bytes (1%) | 6 |
| ["Exponential", "Vecs", "kernelmatrixX"] | 7.833 μs (5%) | | 6.56 KiB (1%) | 4 |
| ["Exponential", "Vecs", "kernelmatrixXY"] | 7.767 μs (5%) | | 6.56 KiB (1%) | 4 |
| ["Exponential", "Vecs", "kernelmatrix_diagX"] | 201.321 ns (5%) | | 480 bytes (1%) | 2 |
| ["Exponential", "Vecs", "kernelmatrix_diagXY"] | 455.897 ns (5%) | | 544 bytes (1%) | 4 |
| ["SqExponential", "ColVecs", "kernelmatrixX"] | 6.750 μs (5%) | | 6.50 KiB (1%) | 2 |
| ["SqExponential", "ColVecs", "kernelmatrixXY"] | 6.950 μs (5%) | | 6.50 KiB (1%) | 2 |
| ["SqExponential", "ColVecs", "kernelmatrix_diagX"] | 209.302 ns (5%) | | 480 bytes (1%) | 2 |
| ["SqExponential", "ColVecs", "kernelmatrix_diagXY"] | 423.121 ns (5%) | | 544 bytes (1%) | 5 |
| ["SqExponential", "RowVecs", "kernelmatrixX"] | 2.844 μs (5%) | | 1.75 KiB (1%) | 2 |
| ["SqExponential", "RowVecs", "kernelmatrixXY"] | 2.811 μs (5%) | | 1.75 KiB (1%) | 2 |
| ["SqExponential", "RowVecs", "kernelmatrix_diagX"] | 133.914 ns (5%) | | 320 bytes (1%) | 2 |
| ["SqExponential", "RowVecs", "kernelmatrix_diagXY"] | 376.005 ns (5%) | | 384 bytes (1%) | 5 |
| ["SqExponential", "Vecs", "kernelmatrixX"] | 7.200 μs (5%) | | 6.53 KiB (1%) | 3 |
| ["SqExponential", "Vecs", "kernelmatrixXY"] | 7.100 μs (5%) | | 6.53 KiB (1%) | 3 |
| ["SqExponential", "Vecs", "kernelmatrix_diagX"] | 208.772 ns (5%) | | 480 bytes (1%) | 2 |
| ["SqExponential", "Vecs", "kernelmatrix_diagXY"] | 422.111 ns (5%) | | 512 bytes (1%) | 3 |

Baseline result

Benchmark Report for /home/runner/work/KernelFunctions.jl/KernelFunctions.jl

Job Properties

  • Time of benchmark: 12 Jan 2022 - 13:42
  • Package commit: 93d33c
  • Julia commit: 905826
  • Julia command flags: None
  • Environment variables: None

Results

| ID | time | GC time | memory | allocations |
|---|---|---|---|---|
| ["Exponential", "ColVecs", "kernelmatrixX"] | 6.875 μs (5%) | | 6.73 KiB (1%) | 3 |
| ["Exponential", "ColVecs", "kernelmatrixXY"] | 6.925 μs (5%) | | 6.97 KiB (1%) | 4 |
| ["Exponential", "ColVecs", "kernelmatrix_diagX"] | 198.184 ns (5%) | | 480 bytes (1%) | 2 |
| ["Exponential", "ColVecs", "kernelmatrix_diagXY"] | 378.680 ns (5%) | | 480 bytes (1%) | 2 |
| ["Exponential", "RowVecs", "kernelmatrixX"] | 2.413 μs (5%) | | 3.67 KiB (1%) | 4 |
| ["Exponential", "RowVecs", "kernelmatrixXY"] | 2.737 μs (5%) | | 5.59 KiB (1%) | 6 |
| ["Exponential", "RowVecs", "kernelmatrix_diagX"] | 133.684 ns (5%) | | 320 bytes (1%) | 2 |
| ["Exponential", "RowVecs", "kernelmatrix_diagXY"] | 322.222 ns (5%) | | 320 bytes (1%) | 2 |
| ["Exponential", "Vecs", "kernelmatrixX"] | 8.567 μs (5%) | | 6.59 KiB (1%) | 4 |
| ["Exponential", "Vecs", "kernelmatrixXY"] | 8.367 μs (5%) | | 6.59 KiB (1%) | 4 |
| ["Exponential", "Vecs", "kernelmatrix_diagX"] | 205.716 ns (5%) | | 480 bytes (1%) | 2 |
| ["Exponential", "Vecs", "kernelmatrix_diagXY"] | 544.103 ns (5%) | | 480 bytes (1%) | 2 |
| ["SqExponential", "ColVecs", "kernelmatrixX"] | 6.825 μs (5%) | | 6.73 KiB (1%) | 3 |
| ["SqExponential", "ColVecs", "kernelmatrixXY"] | 6.600 μs (5%) | | 6.97 KiB (1%) | 4 |
| ["SqExponential", "ColVecs", "kernelmatrix_diagX"] | 200.777 ns (5%) | | 480 bytes (1%) | 2 |
| ["SqExponential", "ColVecs", "kernelmatrix_diagXY"] | 350.251 ns (5%) | | 480 bytes (1%) | 2 |
| ["SqExponential", "RowVecs", "kernelmatrixX"] | 2.400 μs (5%) | | 3.67 KiB (1%) | 4 |
| ["SqExponential", "RowVecs", "kernelmatrixXY"] | 2.800 μs (5%) | | 5.59 KiB (1%) | 6 |
| ["SqExponential", "RowVecs", "kernelmatrix_diagX"] | 133.683 ns (5%) | | 320 bytes (1%) | 2 |
| ["SqExponential", "RowVecs", "kernelmatrix_diagXY"] | 303.000 ns (5%) | | 320 bytes (1%) | 2 |
| ["SqExponential", "Vecs", "kernelmatrixX"] | 7.775 μs (5%) | | 6.59 KiB (1%) | 4 |
| ["SqExponential", "Vecs", "kernelmatrixXY"] | 7.425 μs (5%) | | 6.59 KiB (1%) | 4 |
| ["SqExponential", "Vecs", "kernelmatrix_diagX"] | 208.598 ns (5%) | | 480 bytes (1%) | 2 |
| ["SqExponential", "Vecs", "kernelmatrix_diagXY"] | 500.005 ns (5%) | | 480 bytes (1%) | 2 |

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["Exponential", "ColVecs"]
  • ["Exponential", "RowVecs"]
  • ["Exponential", "Vecs"]
  • ["SqExponential", "ColVecs"]
  • ["SqExponential", "RowVecs"]
  • ["SqExponential", "Vecs"]

Julia versioninfo

Julia Version 1.6.5
Commit 9058264a69 (2021-12-19 12:30 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 20.04.3 LTS
  uname: Linux 5.11.0-1022-azure #23~20.04.1-Ubuntu SMP Fri Nov 19 10:20:52 UTC 2021 x86_64 x86_64
  CPU: Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz: 
              speed         user         nice          sys         idle          irq
       #1  2294 MHz       2514 s          1 s        175 s        710 s          0 s
       #2  2294 MHz        819 s          1 s        152 s       2430 s          0 s
       
  Memory: 6.788982391357422 GB (3443.5234375 MB free)
  Uptime: 346.7 sec
  Load Avg:  1.01  0.81  0.39
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, broadwell)

Runtime information

Runtime Info
BLAS #threads 2
BLAS.vendor() openblas64
Sys.CPU_THREADS 2

lscpu output:

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          2
On-line CPU(s) list:             0,1
Thread(s) per core:              1
Core(s) per socket:              2
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           79
Model name:                      Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
Stepping:                        1
CPU MHz:                         2294.686
BogoMIPS:                        4589.37
Hypervisor vendor:               Microsoft
Virtualization type:             full
L1d cache:                       64 KiB
L1i cache:                       64 KiB
L2 cache:                        512 KiB
L3 cache:                        50 MiB
NUMA node0 CPU(s):               0,1
Vulnerability Itlb multihit:     KVM: Mitigation: VMX unsupported
Vulnerability L1tf:              Mitigation; PTE Inversion
Vulnerability Mds:               Mitigation; Clear CPU buffers; SMT Host state unknown
Vulnerability Meltdown:          Mitigation; PTI
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Full generic retpoline, STIBP disabled, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Mitigation; Clear CPU buffers; SMT Host state unknown
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt md_clear
Cpu Property Value
Brand Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
Vendor :Intel
Architecture :Broadwell
Model Family: 0x06, Model: 0x4f, Stepping: 0x01, Type: 0x00
Cores 2 physical cores, 2 logical cores (on executing CPU)
No Hyperthreading hardware capability detected
Clock Frequencies Not supported by CPU
Data Cache Level 1:3 : (32, 256, 51200) kbytes
64 byte cache line size
Address Size 48 bits virtual, 46 bits physical
SIMD 256 bit = 32 byte max. SIMD vector size
Time Stamp Counter TSC is accessible via rdtsc
TSC increased at every clock cycle (non-invariant TSC)
Perf. Monitoring Performance Monitoring Counters (PMC) are not supported
Hypervisor Yes, Microsoft

src/distances/euclidean.jl (2 outdated review threads, resolved)
@github-actions
Contributor

Benchmark result

Judge result

Benchmark Report for /home/runner/work/KernelFunctions.jl/KernelFunctions.jl

Job Properties

  • Time of benchmarks:
    • Target: 13 Jan 2022 - 18:29
    • Baseline: 13 Jan 2022 - 18:30
  • Package commits:
    • Target: a8417d
    • Baseline: 93d33c
  • Julia commits:
    • Target: 905826
    • Baseline: 905826
  • Julia command flags:
    • Target: None
    • Baseline: None
  • Environment variables:
    • Target: None
    • Baseline: None

Results

A ratio greater than 1.0 denotes a possible regression (marked with ❌), while a ratio less
than 1.0 denotes a possible improvement (marked with ✅). Only significant results - results
that indicate possible regressions or improvements - are shown below (thus, an empty table means that all
benchmark results remained invariant between builds).

| ID | time ratio | memory ratio |
|---|---|---|
| ["Exponential", "ColVecs", "kernelmatrixX"] | 1.23 (5%) ❌ | 0.97 (1%) ✅ |
| ["Exponential", "ColVecs", "kernelmatrixXY"] | 1.24 (5%) ❌ | 0.93 (1%) ✅ |
| ["Exponential", "ColVecs", "kernelmatrix_diagXY"] | 1.17 (5%) ❌ | 1.20 (1%) ❌ |
| ["Exponential", "RowVecs", "kernelmatrixX"] | 1.40 (5%) ❌ | 0.48 (1%) ✅ |
| ["Exponential", "RowVecs", "kernelmatrixXY"] | 1.23 (5%) ❌ | 0.31 (1%) ✅ |
| ["Exponential", "RowVecs", "kernelmatrix_diagXY"] | 1.22 (5%) ❌ | 1.30 (1%) ❌ |
| ["Exponential", "Vecs", "kernelmatrix_diagXY"] | 0.86 (5%) ✅ | 1.13 (1%) ❌ |
| ["SqExponential", "ColVecs", "kernelmatrixX"] | 1.06 (5%) ❌ | 0.97 (1%) ✅ |
| ["SqExponential", "ColVecs", "kernelmatrixXY"] | 1.18 (5%) ❌ | 0.93 (1%) ✅ |
| ["SqExponential", "ColVecs", "kernelmatrix_diagXY"] | 1.07 (5%) ❌ | 1.13 (1%) ❌ |
| ["SqExponential", "RowVecs", "kernelmatrixX"] | 1.23 (5%) ❌ | 0.48 (1%) ✅ |
| ["SqExponential", "RowVecs", "kernelmatrixXY"] | 1.15 (5%) ❌ | 0.31 (1%) ✅ |
| ["SqExponential", "RowVecs", "kernelmatrix_diagXY"] | 1.25 (5%) ❌ | 1.20 (1%) ❌ |
| ["SqExponential", "Vecs", "kernelmatrixX"] | 0.95 (5%) ✅ | 0.99 (1%) |
| ["SqExponential", "Vecs", "kernelmatrix_diagXY"] | 0.81 (5%) ✅ | 1.07 (1%) ❌ |

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["Exponential", "ColVecs"]
  • ["Exponential", "RowVecs"]
  • ["Exponential", "Vecs"]
  • ["SqExponential", "ColVecs"]
  • ["SqExponential", "RowVecs"]
  • ["SqExponential", "Vecs"]

Julia versioninfo

Target

Julia Version 1.6.5
Commit 9058264a69 (2021-12-19 12:30 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 20.04.3 LTS
  uname: Linux 5.11.0-1025-azure #27~20.04.1-Ubuntu SMP Fri Jan 7 15:02:06 UTC 2022 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz: 
              speed         user         nice          sys         idle          irq
       #1  2593 MHz       1493 s          0 s        127 s       1272 s          0 s
       #2  2593 MHz        523 s          0 s        111 s       2261 s          0 s
       
  Memory: 6.788974761962891 GB (3641.1796875 MB free)
  Uptime: 292.99 sec
  Load Avg:  1.0  0.6  0.26
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake-avx512)

Baseline

Julia Version 1.6.5
Commit 9058264a69 (2021-12-19 12:30 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 20.04.3 LTS
  uname: Linux 5.11.0-1025-azure #27~20.04.1-Ubuntu SMP Fri Jan 7 15:02:06 UTC 2022 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz: 
              speed         user         nice          sys         idle          irq
       #1  2593 MHz       2019 s          0 s        136 s       1369 s          0 s
       #2  2593 MHz        624 s          0 s        117 s       2784 s          0 s
       
  Memory: 6.788974761962891 GB (3555.3984375 MB free)
  Uptime: 356.14 sec
  Load Avg:  1.0  0.68  0.31
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake-avx512)

Target result

Benchmark Report for /home/runner/work/KernelFunctions.jl/KernelFunctions.jl

Job Properties

  • Time of benchmark: 13 Jan 2022 - 18:29
  • Package commit: a8417d
  • Julia commit: 905826
  • Julia command flags: None
  • Environment variables: None

Results

Below is a table of this job's results, obtained by running the benchmarks.
The values listed in the ID column have the structure [parent_group, child_group, ..., key], and can be used to
index into the BaseBenchmarks suite to retrieve the corresponding benchmarks.
The percentages accompanying time and memory values in the below table are noise tolerances. The "true"
time/memory value for a given benchmark is expected to fall within this percentage of the reported value.
An empty cell means that the value was zero.

| ID | time | GC time | memory | allocations |
|---|---|---|---|---|
| ["Exponential", "ColVecs", "kernelmatrixX"] | 6.760 μs (5%) | | 6.50 KiB (1%) | 2 |
| ["Exponential", "ColVecs", "kernelmatrixXY"] | 6.740 μs (5%) | | 6.50 KiB (1%) | 2 |
| ["Exponential", "ColVecs", "kernelmatrix_diagX"] | 220.004 ns (5%) | | 480 bytes (1%) | 2 |
| ["Exponential", "ColVecs", "kernelmatrix_diagXY"] | 450.510 ns (5%) | | 576 bytes (1%) | 6 |
| ["Exponential", "RowVecs", "kernelmatrixX"] | 2.767 μs (5%) | | 1.75 KiB (1%) | 2 |
| ["Exponential", "RowVecs", "kernelmatrixXY"] | 2.767 μs (5%) | | 1.75 KiB (1%) | 2 |
| ["Exponential", "RowVecs", "kernelmatrix_diagX"] | 142.078 ns (5%) | | 320 bytes (1%) | 2 |
| ["Exponential", "RowVecs", "kernelmatrix_diagXY"] | 390.557 ns (5%) | | 416 bytes (1%) | 6 |
| ["Exponential", "Vecs", "kernelmatrixX"] | 7.834 μs (5%) | | 6.56 KiB (1%) | 4 |
| ["Exponential", "Vecs", "kernelmatrixXY"] | 7.850 μs (5%) | | 6.56 KiB (1%) | 4 |
| ["Exponential", "Vecs", "kernelmatrix_diagX"] | 220.557 ns (5%) | | 480 bytes (1%) | 2 |
| ["Exponential", "Vecs", "kernelmatrix_diagXY"] | 471.434 ns (5%) | | 544 bytes (1%) | 4 |
| ["SqExponential", "ColVecs", "kernelmatrixX"] | 6.040 μs (5%) | | 6.50 KiB (1%) | 2 |
| ["SqExponential", "ColVecs", "kernelmatrixXY"] | 6.180 μs (5%) | | 6.50 KiB (1%) | 2 |
| ["SqExponential", "ColVecs", "kernelmatrix_diagX"] | 220.367 ns (5%) | | 480 bytes (1%) | 2 |
| ["SqExponential", "ColVecs", "kernelmatrix_diagXY"] | 433.673 ns (5%) | | 544 bytes (1%) | 5 |
| ["SqExponential", "RowVecs", "kernelmatrixX"] | 2.522 μs (5%) | | 1.75 KiB (1%) | 2 |
| ["SqExponential", "RowVecs", "kernelmatrixXY"] | 2.567 μs (5%) | | 1.75 KiB (1%) | 2 |
| ["SqExponential", "RowVecs", "kernelmatrix_diagX"] | 143.230 ns (5%) | | 320 bytes (1%) | 2 |
| ["SqExponential", "RowVecs", "kernelmatrix_diagXY"] | 366.837 ns (5%) | | 384 bytes (1%) | 5 |
| ["SqExponential", "Vecs", "kernelmatrixX"] | 7.050 μs (5%) | | 6.53 KiB (1%) | 3 |
| ["SqExponential", "Vecs", "kernelmatrixXY"] | 7.175 μs (5%) | | 6.53 KiB (1%) | 3 |
| ["SqExponential", "Vecs", "kernelmatrix_diagX"] | 220.160 ns (5%) | | 480 bytes (1%) | 2 |
| ["SqExponential", "Vecs", "kernelmatrix_diagXY"] | 415.588 ns (5%) | | 512 bytes (1%) | 3 |

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["Exponential", "ColVecs"]
  • ["Exponential", "RowVecs"]
  • ["Exponential", "Vecs"]
  • ["SqExponential", "ColVecs"]
  • ["SqExponential", "RowVecs"]
  • ["SqExponential", "Vecs"]

Julia versioninfo

Julia Version 1.6.5
Commit 9058264a69 (2021-12-19 12:30 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 20.04.3 LTS
  uname: Linux 5.11.0-1025-azure #27~20.04.1-Ubuntu SMP Fri Jan 7 15:02:06 UTC 2022 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz: 
              speed         user         nice          sys         idle          irq
       #1  2593 MHz       1493 s          0 s        127 s       1272 s          0 s
       #2  2593 MHz        523 s          0 s        111 s       2261 s          0 s
       
  Memory: 6.788974761962891 GB (3641.1796875 MB free)
  Uptime: 292.99 sec
  Load Avg:  1.0  0.6  0.26
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake-avx512)

Baseline result

Benchmark Report for /home/runner/work/KernelFunctions.jl/KernelFunctions.jl

Job Properties

  • Time of benchmark: 13 Jan 2022 - 18:30
  • Package commit: 93d33c
  • Julia commit: 905826
  • Julia command flags: None
  • Environment variables: None

Results

Below is a table of this job's results, obtained by running the benchmarks.
The values listed in the ID column have the structure [parent_group, child_group, ..., key], and can be used to
index into the BaseBenchmarks suite to retrieve the corresponding benchmarks.
The percentages accompanying time and memory values in the below table are noise tolerances. The "true"
time/memory value for a given benchmark is expected to fall within this percentage of the reported value.
An empty cell means that the value was zero.

| ID | time | GC time | memory | allocations |
|---|---|---|---|---|
| ["Exponential", "ColVecs", "kernelmatrixX"] | 5.500 μs (5%) | | 6.73 KiB (1%) | 3 |
| ["Exponential", "ColVecs", "kernelmatrixXY"] | 5.420 μs (5%) | | 6.97 KiB (1%) | 4 |
| ["Exponential", "ColVecs", "kernelmatrix_diagX"] | 221.513 ns (5%) | | 480 bytes (1%) | 2 |
| ["Exponential", "ColVecs", "kernelmatrix_diagXY"] | 384.354 ns (5%) | | 480 bytes (1%) | 2 |
| ["Exponential", "RowVecs", "kernelmatrixX"] | 1.978 μs (5%) | | 3.67 KiB (1%) | 4 |
| ["Exponential", "RowVecs", "kernelmatrixXY"] | 2.245 μs (5%) | | 5.59 KiB (1%) | 6 |
| ["Exponential", "RowVecs", "kernelmatrix_diagX"] | 145.923 ns (5%) | | 320 bytes (1%) | 2 |
| ["Exponential", "RowVecs", "kernelmatrix_diagXY"] | 320.900 ns (5%) | | 320 bytes (1%) | 2 |
| ["Exponential", "Vecs", "kernelmatrixX"] | 7.933 μs (5%) | | 6.59 KiB (1%) | 4 |
| ["Exponential", "Vecs", "kernelmatrixXY"] | 7.950 μs (5%) | | 6.59 KiB (1%) | 4 |
| ["Exponential", "Vecs", "kernelmatrix_diagX"] | 225.103 ns (5%) | | 480 bytes (1%) | 2 |
| ["Exponential", "Vecs", "kernelmatrix_diagXY"] | 551.031 ns (5%) | | 480 bytes (1%) | 2 |
| ["SqExponential", "ColVecs", "kernelmatrixX"] | 5.680 μs (5%) | | 6.73 KiB (1%) | 3 |
| ["SqExponential", "ColVecs", "kernelmatrixXY"] | 5.220 μs (5%) | | 6.97 KiB (1%) | 4 |
| ["SqExponential", "ColVecs", "kernelmatrix_diagX"] | 220.972 ns (5%) | | 480 bytes (1%) | 2 |
| ["SqExponential", "ColVecs", "kernelmatrix_diagXY"] | 405.035 ns (5%) | | 480 bytes (1%) | 2 |
| ["SqExponential", "RowVecs", "kernelmatrixX"] | 2.044 μs (5%) | | 3.67 KiB (1%) | 4 |
| ["SqExponential", "RowVecs", "kernelmatrixXY"] | 2.222 μs (5%) | | 5.59 KiB (1%) | 6 |
| ["SqExponential", "RowVecs", "kernelmatrix_diagX"] | 143.936 ns (5%) | | 320 bytes (1%) | 2 |
| ["SqExponential", "RowVecs", "kernelmatrix_diagXY"] | 293.274 ns (5%) | | 320 bytes (1%) | 2 |
| ["SqExponential", "Vecs", "kernelmatrixX"] | 7.450 μs (5%) | | 6.59 KiB (1%) | 4 |
| ["SqExponential", "Vecs", "kernelmatrixXY"] | 7.425 μs (5%) | | 6.59 KiB (1%) | 4 |
| ["SqExponential", "Vecs", "kernelmatrix_diagX"] | 221.726 ns (5%) | | 480 bytes (1%) | 2 |
| ["SqExponential", "Vecs", "kernelmatrix_diagXY"] | 512.065 ns (5%) | | 480 bytes (1%) | 2 |

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["Exponential", "ColVecs"]
  • ["Exponential", "RowVecs"]
  • ["Exponential", "Vecs"]
  • ["SqExponential", "ColVecs"]
  • ["SqExponential", "RowVecs"]
  • ["SqExponential", "Vecs"]

Julia versioninfo

Julia Version 1.6.5
Commit 9058264a69 (2021-12-19 12:30 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 20.04.3 LTS
  uname: Linux 5.11.0-1025-azure #27~20.04.1-Ubuntu SMP Fri Jan 7 15:02:06 UTC 2022 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz: 
              speed         user         nice          sys         idle          irq
       #1  2593 MHz       2019 s          0 s        136 s       1369 s          0 s
       #2  2593 MHz        624 s          0 s        117 s       2784 s          0 s
       
  Memory: 6.788974761962891 GB (3555.3984375 MB free)
  Uptime: 356.14 sec
  Load Avg:  1.0  0.68  0.31
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake-avx512)

Runtime information

Runtime Info
BLAS #threads 2
BLAS.vendor() openblas64
Sys.CPU_THREADS 2

lscpu output:

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          2
On-line CPU(s) list:             0,1
Thread(s) per core:              1
Core(s) per socket:              2
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           85
Model name:                      Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Stepping:                        7
CPU MHz:                         2593.906
BogoMIPS:                        5187.81
Hypervisor vendor:               Microsoft
Virtualization type:             full
L1d cache:                       64 KiB
L1i cache:                       64 KiB
L2 cache:                        2 MiB
L3 cache:                        35.8 MiB
NUMA node0 CPU(s):               0,1
Vulnerability Itlb multihit:     KVM: Mitigation: VMX unsupported
Vulnerability L1tf:              Mitigation; PTE Inversion
Vulnerability Mds:               Mitigation; Clear CPU buffers; SMT Host state unknown
Vulnerability Meltdown:          Mitigation; PTI
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Full generic retpoline, STIBP disabled, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Mitigation; Clear CPU buffers; SMT Host state unknown
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt avx512cd avx512bw avx512vl xsaveopt xsavec xsaves md_clear
Cpu Property Value
Brand Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Vendor :Intel
Architecture :Skylake
Model Family: 0x06, Model: 0x55, Stepping: 0x07, Type: 0x00
Cores 2 physical cores, 2 logical cores (on executing CPU)
No Hyperthreading hardware capability detected
Clock Frequencies Not supported by CPU
Data Cache Level 1:3 : (32, 1024, 36608) kbytes
64 byte cache line size
Address Size 48 bits virtual, 46 bits physical
SIMD 512 bit = 64 byte max. SIMD vector size
Time Stamp Counter TSC is accessible via rdtsc
TSC increased at every clock cycle (non-invariant TSC)
Perf. Monitoring Performance Monitoring Counters (PMC) are not supported
Hypervisor Yes, Microsoft

@github-actions
Contributor

Benchmark result

Judge result

Benchmark Report for /home/runner/work/KernelFunctions.jl/KernelFunctions.jl

Job Properties

  • Time of benchmarks:
    • Target: 13 Jan 2022 - 18:38
    • Baseline: 13 Jan 2022 - 18:38
  • Package commits:
    • Target: ec1863
    • Baseline: 93d33c
  • Julia commits:
    • Target: 905826
    • Baseline: 905826
  • Julia command flags:
    • Target: None
    • Baseline: None
  • Environment variables:
    • Target: None
    • Baseline: None

Results

A ratio greater than 1.0 denotes a possible regression (marked with ❌), while a ratio less
than 1.0 denotes a possible improvement (marked with ✅). Only significant results - results
that indicate possible regressions or improvements - are shown below (thus, an empty table means that all
benchmark results remained invariant between builds).

| ID | time ratio | memory ratio |
|---|---|---|
| ["Exponential", "ColVecs", "kernelmatrixX"] | 1.20 (5%) ❌ | 0.97 (1%) ✅ |
| ["Exponential", "ColVecs", "kernelmatrixXY"] | 1.27 (5%) ❌ | 0.93 (1%) ✅ |
| ["Exponential", "ColVecs", "kernelmatrix_diagXY"] | 1.18 (5%) ❌ | 1.20 (1%) ❌ |
| ["Exponential", "RowVecs", "kernelmatrixX"] | 1.34 (5%) ❌ | 0.48 (1%) ✅ |
| ["Exponential", "RowVecs", "kernelmatrixXY"] | 1.19 (5%) ❌ | 0.31 (1%) ✅ |
| ["Exponential", "RowVecs", "kernelmatrix_diagXY"] | 1.21 (5%) ❌ | 1.30 (1%) ❌ |
| ["Exponential", "Vecs", "kernelmatrixX"] | 0.88 (5%) ✅ | 1.00 (1%) |
| ["Exponential", "Vecs", "kernelmatrixXY"] | 0.86 (5%) ✅ | 1.00 (1%) |
| ["Exponential", "Vecs", "kernelmatrix_diagXY"] | 0.82 (5%) ✅ | 1.13 (1%) ❌ |
| ["SqExponential", "ColVecs", "kernelmatrixX"] | 1.14 (5%) ❌ | 0.97 (1%) ✅ |
| ["SqExponential", "ColVecs", "kernelmatrixXY"] | 1.19 (5%) ❌ | 0.93 (1%) ✅ |
| ["SqExponential", "ColVecs", "kernelmatrix_diagX"] | 0.92 (5%) ✅ | 1.00 (1%) |
| ["SqExponential", "ColVecs", "kernelmatrix_diagXY"] | 1.10 (5%) ❌ | 1.13 (1%) ❌ |
| ["SqExponential", "RowVecs", "kernelmatrixX"] | 1.25 (5%) ❌ | 0.48 (1%) ✅ |
| ["SqExponential", "RowVecs", "kernelmatrixXY"] | 1.11 (5%) ❌ | 0.31 (1%) ✅ |
| ["SqExponential", "RowVecs", "kernelmatrix_diagX"] | 0.94 (5%) ✅ | 1.00 (1%) |
| ["SqExponential", "RowVecs", "kernelmatrix_diagXY"] | 1.21 (5%) ❌ | 1.20 (1%) ❌ |
| ["SqExponential", "Vecs", "kernelmatrixX"] | 0.83 (5%) ✅ | 0.99 (1%) |
| ["SqExponential", "Vecs", "kernelmatrixXY"] | 0.83 (5%) ✅ | 0.99 (1%) |
| ["SqExponential", "Vecs", "kernelmatrix_diagX"] | 0.90 (5%) ✅ | 1.00 (1%) |
| ["SqExponential", "Vecs", "kernelmatrix_diagXY"] | 0.91 (5%) ✅ | 1.07 (1%) ❌ |

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["Exponential", "ColVecs"]
  • ["Exponential", "RowVecs"]
  • ["Exponential", "Vecs"]
  • ["SqExponential", "ColVecs"]
  • ["SqExponential", "RowVecs"]
  • ["SqExponential", "Vecs"]

Julia versioninfo

Target

Julia Version 1.6.5
Commit 9058264a69 (2021-12-19 12:30 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 20.04.3 LTS
  uname: Linux 5.11.0-1025-azure #27~20.04.1-Ubuntu SMP Fri Jan 7 15:02:06 UTC 2022 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz: 
              speed         user         nice          sys         idle          irq
       #1  2593 MHz        438 s          0 s        101 s       2898 s          0 s
       #2  2593 MHz       1488 s          0 s        135 s       1836 s          0 s
       
  Memory: 6.788978576660156 GB (3642.44140625 MB free)
  Uptime: 349.6 sec
  Load Avg:  1.18  0.61  0.26
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake-avx512)

Baseline

Julia Version 1.6.5
Commit 9058264a69 (2021-12-19 12:30 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 20.04.3 LTS
  uname: Linux 5.11.0-1025-azure #27~20.04.1-Ubuntu SMP Fri Jan 7 15:02:06 UTC 2022 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz: 
              speed         user         nice          sys         idle          irq
       #1  2593 MHz        923 s          0 s        112 s       3016 s          0 s
       #2  2593 MHz       1605 s          0 s        141 s       2324 s          0 s
       
  Memory: 6.788978576660156 GB (3558.45703125 MB free)
  Uptime: 410.88 sec
  Load Avg:  1.1  0.7  0.31
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake-avx512)

Target result

Benchmark Report for /home/runner/work/KernelFunctions.jl/KernelFunctions.jl

Job Properties

  • Time of benchmark: 13 Jan 2022 - 18:38
  • Package commit: ec1863
  • Julia commit: 905826
  • Julia command flags: None
  • Environment variables: None

Results

Below is a table of this job's results, obtained by running the benchmarks.
The values listed in the ID column have the structure [parent_group, child_group, ..., key], and can be used to
index into the BaseBenchmarks suite to retrieve the corresponding benchmarks.
The percentages accompanying time and memory values in the below table are noise tolerances. The "true"
time/memory value for a given benchmark is expected to fall within this percentage of the reported value.
An empty cell means that the value was zero.

| ID | time | GC time | memory | allocations |
|---|---|---|---|---|
| ["Exponential", "ColVecs", "kernelmatrixX"] | 6.800 μs (5%) | | 6.50 KiB (1%) | 2 |
| ["Exponential", "ColVecs", "kernelmatrixXY"] | 6.800 μs (5%) | | 6.50 KiB (1%) | 2 |
| ["Exponential", "ColVecs", "kernelmatrix_diagX"] | 213.589 ns (5%) | | 480 bytes (1%) | 2 |
| ["Exponential", "ColVecs", "kernelmatrix_diagXY"] | 452.294 ns (5%) | | 576 bytes (1%) | 6 |
| ["Exponential", "RowVecs", "kernelmatrixX"] | 2.644 μs (5%) | | 1.75 KiB (1%) | 2 |
| ["Exponential", "RowVecs", "kernelmatrixXY"] | 2.645 μs (5%) | | 1.75 KiB (1%) | 2 |
| ["Exponential", "RowVecs", "kernelmatrix_diagX"] | 138.377 ns (5%) | | 320 bytes (1%) | 2 |
| ["Exponential", "RowVecs", "kernelmatrix_diagXY"] | 390.599 ns (5%) | | 416 bytes (1%) | 6 |
| ["Exponential", "Vecs", "kernelmatrixX"] | 6.875 μs (5%) | | 6.56 KiB (1%) | 4 |
| ["Exponential", "Vecs", "kernelmatrixXY"] | 6.875 μs (5%) | | 6.56 KiB (1%) | 4 |
| ["Exponential", "Vecs", "kernelmatrix_diagX"] | 213.514 ns (5%) | | 480 bytes (1%) | 2 |
| ["Exponential", "Vecs", "kernelmatrix_diagXY"] | 440.919 ns (5%) | | 544 bytes (1%) | 4 |
| ["SqExponential", "ColVecs", "kernelmatrixX"] | 6.400 μs (5%) | | 6.50 KiB (1%) | 2 |
| ["SqExponential", "ColVecs", "kernelmatrixXY"] | 6.420 μs (5%) | | 6.50 KiB (1%) | 2 |
| ["SqExponential", "ColVecs", "kernelmatrix_diagX"] | 200.808 ns (5%) | | 480 bytes (1%) | 2 |
| ["SqExponential", "ColVecs", "kernelmatrix_diagXY"] | 401.505 ns (5%) | | 544 bytes (1%) | 5 |
| ["SqExponential", "RowVecs", "kernelmatrixX"] | 2.467 μs (5%) | | 1.75 KiB (1%) | 2 |
| ["SqExponential", "RowVecs", "kernelmatrixXY"] | 2.478 μs (5%) | | 1.75 KiB (1%) | 2 |
| ["SqExponential", "RowVecs", "kernelmatrix_diagX"] | 132.660 ns (5%) | | 320 bytes (1%) | 2 |
| ["SqExponential", "RowVecs", "kernelmatrix_diagXY"] | 353.343 ns (5%) | | 384 bytes (1%) | 5 |
| ["SqExponential", "Vecs", "kernelmatrixX"] | 6.660 μs (5%) | | 6.53 KiB (1%) | 3 |
| ["SqExponential", "Vecs", "kernelmatrixXY"] | 6.660 μs (5%) | | 6.53 KiB (1%) | 3 |
| ["SqExponential", "Vecs", "kernelmatrix_diagX"] | 199.192 ns (5%) | | 480 bytes (1%) | 2 |
| ["SqExponential", "Vecs", "kernelmatrix_diagXY"] | 453.817 ns (5%) | | 512 bytes (1%) | 3 |

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["Exponential", "ColVecs"]
  • ["Exponential", "RowVecs"]
  • ["Exponential", "Vecs"]
  • ["SqExponential", "ColVecs"]
  • ["SqExponential", "RowVecs"]
  • ["SqExponential", "Vecs"]

Julia versioninfo

Julia Version 1.6.5
Commit 9058264a69 (2021-12-19 12:30 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 20.04.3 LTS
  uname: Linux 5.11.0-1025-azure #27~20.04.1-Ubuntu SMP Fri Jan 7 15:02:06 UTC 2022 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz: 
              speed         user         nice          sys         idle          irq
       #1  2593 MHz        438 s          0 s        101 s       2898 s          0 s
       #2  2593 MHz       1488 s          0 s        135 s       1836 s          0 s
       
  Memory: 6.788978576660156 GB (3642.44140625 MB free)
  Uptime: 349.6 sec
  Load Avg:  1.18  0.61  0.26
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake-avx512)

Baseline result

Benchmark Report for /home/runner/work/KernelFunctions.jl/KernelFunctions.jl

Job Properties

  • Time of benchmark: 13 Jan 2022 - 18:38
  • Package commit: 93d33c
  • Julia commit: 905826
  • Julia command flags: None
  • Environment variables: None

Results

Below is a table of this job's results, obtained by running the benchmarks.
The values listed in the ID column have the structure [parent_group, child_group, ..., key], and can be used to
index into the BaseBenchmarks suite to retrieve the corresponding benchmarks.
The percentages accompanying time and memory values in the below table are noise tolerances. The "true"
time/memory value for a given benchmark is expected to fall within this percentage of the reported value.
An empty cell means that the value was zero.

| ID | time | GC time | memory | allocations |
|---|---|---|---|---|
| ["Exponential", "ColVecs", "kernelmatrixX"] | 5.660 μs (5%) | | 6.73 KiB (1%) | 3 |
| ["Exponential", "ColVecs", "kernelmatrixXY"] | 5.360 μs (5%) | | 6.97 KiB (1%) | 4 |
| ["Exponential", "ColVecs", "kernelmatrix_diagX"] | 218.872 ns (5%) | | 480 bytes (1%) | 2 |
| ["Exponential", "ColVecs", "kernelmatrix_diagXY"] | 381.731 ns (5%) | | 480 bytes (1%) | 2 |
| ["Exponential", "RowVecs", "kernelmatrixX"] | 1.978 μs (5%) | | 3.67 KiB (1%) | 4 |
| ["Exponential", "RowVecs", "kernelmatrixXY"] | 2.222 μs (5%) | | 5.59 KiB (1%) | 6 |
| ["Exponential", "RowVecs", "kernelmatrix_diagX"] | 143.416 ns (5%) | | 320 bytes (1%) | 2 |
| ["Exponential", "RowVecs", "kernelmatrix_diagXY"] | 321.787 ns (5%) | | 320 bytes (1%) | 2 |
| ["Exponential", "Vecs", "kernelmatrixX"] | 7.850 μs (5%) | | 6.59 KiB (1%) | 4 |
| ["Exponential", "Vecs", "kernelmatrixXY"] | 7.950 μs (5%) | | 6.59 KiB (1%) | 4 |
| ["Exponential", "Vecs", "kernelmatrix_diagX"] | 216.846 ns (5%) | | 480 bytes (1%) | 2 |
| ["Exponential", "Vecs", "kernelmatrix_diagXY"] | 540.414 ns (5%) | | 480 bytes (1%) | 2 |
| ["SqExponential", "ColVecs", "kernelmatrixX"] | 5.621 μs (5%) | | 6.73 KiB (1%) | 3 |
| ["SqExponential", "ColVecs", "kernelmatrixXY"] | 5.400 μs (5%) | | 6.97 KiB (1%) | 4 |
| ["SqExponential", "ColVecs", "kernelmatrix_diagX"] | 219.006 ns (5%) | | 480 bytes (1%) | 2 |
| ["SqExponential", "ColVecs", "kernelmatrix_diagXY"] | 365.505 ns (5%) | | 480 bytes (1%) | 2 |
| ["SqExponential", "RowVecs", "kernelmatrixX"] | 1.978 μs (5%) | | 3.67 KiB (1%) | 4 |
| ["SqExponential", "RowVecs", "kernelmatrixXY"] | 2.233 μs (5%) | | 5.59 KiB (1%) | 6 |
| ["SqExponential", "RowVecs", "kernelmatrix_diagX"] | 141.783 ns (5%) | | 320 bytes (1%) | 2 |
| ["SqExponential", "RowVecs", "kernelmatrix_diagXY"] | 292.386 ns (5%) | | 320 bytes (1%) | 2 |
| ["SqExponential", "Vecs", "kernelmatrixX"] | 8.040 μs (5%) | | 6.59 KiB (1%) | 4 |
| ["SqExponential", "Vecs", "kernelmatrixXY"] | 8.000 μs (5%) | | 6.59 KiB (1%) | 4 |
| ["SqExponential", "Vecs", "kernelmatrix_diagX"] | 221.107 ns (5%) | | 480 bytes (1%) | 2 |
| ["SqExponential", "Vecs", "kernelmatrix_diagXY"] | 500.518 ns (5%) | | 480 bytes (1%) | 2 |

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["Exponential", "ColVecs"]
  • ["Exponential", "RowVecs"]
  • ["Exponential", "Vecs"]
  • ["SqExponential", "ColVecs"]
  • ["SqExponential", "RowVecs"]
  • ["SqExponential", "Vecs"]

Julia versioninfo

Julia Version 1.6.5
Commit 9058264a69 (2021-12-19 12:30 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 20.04.3 LTS
  uname: Linux 5.11.0-1025-azure #27~20.04.1-Ubuntu SMP Fri Jan 7 15:02:06 UTC 2022 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz: 
              speed         user         nice          sys         idle          irq
       #1  2593 MHz        923 s          0 s        112 s       3016 s          0 s
       #2  2593 MHz       1605 s          0 s        141 s       2324 s          0 s
       
  Memory: 6.788978576660156 GB (3558.45703125 MB free)
  Uptime: 410.88 sec
  Load Avg:  1.1  0.7  0.31
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake-avx512)

Runtime information

Runtime Info
BLAS #threads 2
BLAS.vendor() openblas64
Sys.CPU_THREADS 2

lscpu output:

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          2
On-line CPU(s) list:             0,1
Thread(s) per core:              1
Core(s) per socket:              2
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           85
Model name:                      Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Stepping:                        7
CPU MHz:                         2593.908
BogoMIPS:                        5187.81
Hypervisor vendor:               Microsoft
Virtualization type:             full
L1d cache:                       64 KiB
L1i cache:                       64 KiB
L2 cache:                        2 MiB
L3 cache:                        35.8 MiB
NUMA node0 CPU(s):               0,1
Vulnerability Itlb multihit:     KVM: Mitigation: VMX unsupported
Vulnerability L1tf:              Mitigation; PTE Inversion
Vulnerability Mds:               Mitigation; Clear CPU buffers; SMT Host state unknown
Vulnerability Meltdown:          Mitigation; PTI
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Full generic retpoline, STIBP disabled, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Mitigation; Clear CPU buffers; SMT Host state unknown
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt avx512cd avx512bw avx512vl xsaveopt xsavec xsaves md_clear
Cpu Property Value
Brand Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Vendor :Intel
Architecture :Skylake
Model Family: 0x06, Model: 0x55, Stepping: 0x07, Type: 0x00
Cores 2 physical cores, 2 logical cores (on executing CPU)
No Hyperthreading hardware capability detected
Clock Frequencies Not supported by CPU
Data Cache Level 1:3 : (32, 1024, 36608) kbytes
64 byte cache line size
Address Size 48 bits virtual, 46 bits physical
SIMD 512 bit = 64 byte max. SIMD vector size
Time Stamp Counter TSC is accessible via rdtsc
TSC increased at every clock cycle (non-invariant TSC)
Perf. Monitoring Performance Monitoring Counters (PMC) are not supported
Hypervisor Yes, Microsoft

@github-actions
Contributor

Benchmark result

Judge result

Benchmark Report for /home/runner/work/KernelFunctions.jl/KernelFunctions.jl

Job Properties

  • Time of benchmarks:
    • Target: 14 Jan 2022 - 11:34
    • Baseline: 14 Jan 2022 - 11:35
  • Package commits:
    • Target: 8c95e2
    • Baseline: d1c68a
  • Julia commits:
    • Target: 905826
    • Baseline: 905826
  • Julia command flags:
    • Target: None
    • Baseline: None
  • Environment variables:
    • Target: None
    • Baseline: None

Results

A ratio greater than 1.0 denotes a possible regression (marked with ❌), while a ratio less
than 1.0 denotes a possible improvement (marked with ✅). Only significant results - results
that indicate possible regressions or improvements - are shown below (thus, an empty table means that all
benchmark results remained invariant between builds).

| ID | time ratio | memory ratio |
|---|---|---|
| ["Exponential", "ColVecs", "kernelmatrixX"] | 1.15 (5%) ❌ | 0.97 (1%) ✅ |
| ["Exponential", "ColVecs", "kernelmatrixXY"] | 1.24 (5%) ❌ | 0.93 (1%) ✅ |
| ["Exponential", "ColVecs", "kernelmatrix_diagXY"] | 1.15 (5%) ❌ | 1.20 (1%) ❌ |
| ["Exponential", "RowVecs", "kernelmatrixX"] | 1.34 (5%) ❌ | 0.48 (1%) ✅ |
| ["Exponential", "RowVecs", "kernelmatrixXY"] | 1.24 (5%) ❌ | 0.31 (1%) ✅ |
| ["Exponential", "RowVecs", "kernelmatrix_diagX"] | 0.95 (5%) ✅ | 1.00 (1%) |
| ["Exponential", "RowVecs", "kernelmatrix_diagXY"] | 1.24 (5%) ❌ | 1.30 (1%) ❌ |
| ["Exponential", "Vecs", "kernelmatrixX"] | 0.85 (5%) ✅ | 1.00 (1%) |
| ["Exponential", "Vecs", "kernelmatrixXY"] | 0.89 (5%) ✅ | 1.00 (1%) |
| ["Exponential", "Vecs", "kernelmatrix_diagXY"] | 0.82 (5%) ✅ | 1.13 (1%) ❌ |
| ["SqExponential", "ColVecs", "kernelmatrixX"] | 1.12 (5%) ❌ | 0.97 (1%) ✅ |
| ["SqExponential", "ColVecs", "kernelmatrixXY"] | 1.18 (5%) ❌ | 0.93 (1%) ✅ |
| ["SqExponential", "ColVecs", "kernelmatrix_diagX"] | 0.95 (5%) ✅ | 1.00 (1%) |
| ["SqExponential", "ColVecs", "kernelmatrix_diagXY"] | 1.11 (5%) ❌ | 1.13 (1%) ❌ |
| ["SqExponential", "RowVecs", "kernelmatrixX"] | 1.30 (5%) ❌ | 0.48 (1%) ✅ |
| ["SqExponential", "RowVecs", "kernelmatrixXY"] | 1.18 (5%) ❌ | 0.31 (1%) ✅ |
| ["SqExponential", "RowVecs", "kernelmatrix_diagXY"] | 1.27 (5%) ❌ | 1.20 (1%) ❌ |
| ["SqExponential", "Vecs", "kernelmatrixX"] | 0.92 (5%) ✅ | 0.99 (1%) |
| ["SqExponential", "Vecs", "kernelmatrixXY"] | 0.93 (5%) ✅ | 0.99 (1%) |
| ["SqExponential", "Vecs", "kernelmatrix_diagXY"] | 0.81 (5%) ✅ | 1.07 (1%) ❌ |

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["Exponential", "ColVecs"]
  • ["Exponential", "RowVecs"]
  • ["Exponential", "Vecs"]
  • ["SqExponential", "ColVecs"]
  • ["SqExponential", "RowVecs"]
  • ["SqExponential", "Vecs"]

Julia versioninfo

Target

Julia Version 1.6.5
Commit 9058264a69 (2021-12-19 12:30 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 20.04.3 LTS
  uname: Linux 5.11.0-1025-azure #27~20.04.1-Ubuntu SMP Fri Jan 7 15:02:06 UTC 2022 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz: 
              speed         user         nice          sys         idle          irq
       #1  2593 MHz        567 s          1 s        118 s       2866 s          0 s
       #2  2593 MHz       1392 s          1 s        120 s       2072 s          0 s
       
  Memory: 6.788978576660156 GB (3580.671875 MB free)
  Uptime: 361.96 sec
  Load Avg:  1.05  0.67  0.3
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake-avx512)

Baseline

Julia Version 1.6.5
Commit 9058264a69 (2021-12-19 12:30 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 20.04.3 LTS
  uname: Linux 5.11.0-1025-azure #27~20.04.1-Ubuntu SMP Fri Jan 7 15:02:06 UTC 2022 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz: 
              speed         user         nice          sys         idle          irq
       #1  2593 MHz        916 s          1 s        125 s       3137 s          0 s
       #2  2593 MHz       1663 s          1 s        128 s       2420 s          0 s
       
  Memory: 6.788978576660156 GB (3501.64453125 MB free)
  Uptime: 424.65 sec
  Load Avg:  1.02  0.73  0.35
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake-avx512)

Target result

Benchmark Report for /home/runner/work/KernelFunctions.jl/KernelFunctions.jl

Job Properties

  • Time of benchmark: 14 Jan 2022 - 11:34
  • Package commit: 8c95e2
  • Julia commit: 905826
  • Julia command flags: None
  • Environment variables: None

Results

Below is a table of this job's results, obtained by running the benchmarks.
The values listed in the ID column have the structure [parent_group, child_group, ..., key], and can be used to
index into the BaseBenchmarks suite to retrieve the corresponding benchmarks.
The percentages accompanying time and memory values in the below table are noise tolerances. The "true"
time/memory value for a given benchmark is expected to fall within this percentage of the reported value.
An empty cell means that the value was zero.

| ID | time | GC time | memory | allocations |
|---|---|---|---|---|
| ["Exponential", "ColVecs", "kernelmatrixX"] | 6.340 μs (5%) | | 6.50 KiB (1%) | 2 |
| ["Exponential", "ColVecs", "kernelmatrixXY"] | 6.340 μs (5%) | | 6.50 KiB (1%) | 2 |
| ["Exponential", "ColVecs", "kernelmatrix_diagX"] | 213.522 ns (5%) | | 480 bytes (1%) | 2 |
| ["Exponential", "ColVecs", "kernelmatrix_diagXY"] | 472.964 ns (5%) | | 576 bytes (1%) | 6 |
| ["Exponential", "RowVecs", "kernelmatrixX"] | 2.645 μs (5%) | | 1.75 KiB (1%) | 2 |
| ["Exponential", "RowVecs", "kernelmatrixXY"] | 2.645 μs (5%) | | 1.75 KiB (1%) | 2 |
| ["Exponential", "RowVecs", "kernelmatrix_diagX"] | 136.334 ns (5%) | | 320 bytes (1%) | 2 |
| ["Exponential", "RowVecs", "kernelmatrix_diagXY"] | 390.552 ns (5%) | | 416 bytes (1%) | 6 |
| ["Exponential", "Vecs", "kernelmatrixX"] | 6.825 μs (5%) | | 6.56 KiB (1%) | 4 |
| ["Exponential", "Vecs", "kernelmatrixXY"] | 6.875 μs (5%) | | 6.56 KiB (1%) | 4 |
| ["Exponential", "Vecs", "kernelmatrix_diagX"] | 212.956 ns (5%) | | 480 bytes (1%) | 2 |
| ["Exponential", "Vecs", "kernelmatrix_diagXY"] | 439.909 ns (5%) | | 544 bytes (1%) | 4 |
| ["SqExponential", "ColVecs", "kernelmatrixX"] | 5.900 μs (5%) | | 6.50 KiB (1%) | 2 |
| ["SqExponential", "ColVecs", "kernelmatrixXY"] | 5.867 μs (5%) | | 6.50 KiB (1%) | 2 |
| ["SqExponential", "ColVecs", "kernelmatrix_diagX"] | 197.588 ns (5%) | | 480 bytes (1%) | 2 |
| ["SqExponential", "ColVecs", "kernelmatrix_diagXY"] | 399.505 ns (5%) | | 544 bytes (1%) | 5 |
| ["SqExponential", "RowVecs", "kernelmatrixX"] | 2.522 μs (5%) | | 1.75 KiB (1%) | 2 |
| ["SqExponential", "RowVecs", "kernelmatrixXY"] | 2.522 μs (5%) | | 1.75 KiB (1%) | 2 |
| ["SqExponential", "RowVecs", "kernelmatrix_diagX"] | 133.185 ns (5%) | | 320 bytes (1%) | 2 |
| ["SqExponential", "RowVecs", "kernelmatrix_diagXY"] | 375.495 ns (5%) | | 384 bytes (1%) | 5 |
| ["SqExponential", "Vecs", "kernelmatrixX"] | 6.640 μs (5%) | | 6.53 KiB (1%) | 3 |
| ["SqExponential", "Vecs", "kernelmatrixXY"] | 6.660 μs (5%) | | 6.53 KiB (1%) | 3 |
| ["SqExponential", "Vecs", "kernelmatrix_diagX"] | 199.037 ns (5%) | | 480 bytes (1%) | 2 |
| ["SqExponential", "Vecs", "kernelmatrix_diagXY"] | 419.101 ns (5%) | | 512 bytes (1%) | 3 |

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["Exponential", "ColVecs"]
  • ["Exponential", "RowVecs"]
  • ["Exponential", "Vecs"]
  • ["SqExponential", "ColVecs"]
  • ["SqExponential", "RowVecs"]
  • ["SqExponential", "Vecs"]

Baseline result

Benchmark Report for /home/runner/work/KernelFunctions.jl/KernelFunctions.jl

Job Properties

  • Time of benchmark: 14 Jan 2022 - 11:35
  • Package commit: d1c68a
  • Julia commit: 905826
  • Julia command flags: None
  • Environment variables: None

Results

Below is a table of this job's results, obtained by running the benchmarks.
The values listed in the ID column have the structure [parent_group, child_group, ..., key], and can be used to
index into the BaseBenchmarks suite to retrieve the corresponding benchmarks.
The percentages accompanying time and memory values in the below table are noise tolerances. The "true"
time/memory value for a given benchmark is expected to fall within this percentage of the reported value.
An empty cell means that the value was zero.

| ID | time | GC time | memory | allocations |
|---|---|---|---|---|
| ["Exponential", "ColVecs", "kernelmatrixX"] | 5.500 μs (5%) | | 6.73 KiB (1%) | 3 |
| ["Exponential", "ColVecs", "kernelmatrixXY"] | 5.100 μs (5%) | | 6.97 KiB (1%) | 4 |
| ["Exponential", "ColVecs", "kernelmatrix_diagX"] | 221.485 ns (5%) | | 480 bytes (1%) | 2 |
| ["Exponential", "ColVecs", "kernelmatrix_diagXY"] | 410.719 ns (5%) | | 480 bytes (1%) | 2 |
| ["Exponential", "RowVecs", "kernelmatrixX"] | 1.978 μs (5%) | | 3.67 KiB (1%) | 4 |
| ["Exponential", "RowVecs", "kernelmatrixXY"] | 2.133 μs (5%) | | 5.59 KiB (1%) | 6 |
| ["Exponential", "RowVecs", "kernelmatrix_diagX"] | 143.947 ns (5%) | | 320 bytes (1%) | 2 |
| ["Exponential", "RowVecs", "kernelmatrix_diagXY"] | 314.433 ns (5%) | | 320 bytes (1%) | 2 |
| ["Exponential", "Vecs", "kernelmatrixX"] | 8.000 μs (5%) | | 6.59 KiB (1%) | 4 |
| ["Exponential", "Vecs", "kernelmatrixXY"] | 7.750 μs (5%) | | 6.59 KiB (1%) | 4 |
| ["Exponential", "Vecs", "kernelmatrix_diagX"] | 221.147 ns (5%) | | 480 bytes (1%) | 2 |
| ["Exponential", "Vecs", "kernelmatrix_diagXY"] | 536.879 ns (5%) | | 480 bytes (1%) | 2 |
| ["SqExponential", "ColVecs", "kernelmatrixX"] | 5.260 μs (5%) | | 6.73 KiB (1%) | 3 |
| ["SqExponential", "ColVecs", "kernelmatrixXY"] | 4.967 μs (5%) | | 6.97 KiB (1%) | 4 |
| ["SqExponential", "ColVecs", "kernelmatrix_diagX"] | 208.538 ns (5%) | | 480 bytes (1%) | 2 |
| ["SqExponential", "ColVecs", "kernelmatrix_diagXY"] | 359.005 ns (5%) | | 480 bytes (1%) | 2 |
| ["SqExponential", "RowVecs", "kernelmatrixX"] | 1.944 μs (5%) | | 3.67 KiB (1%) | 4 |
| ["SqExponential", "RowVecs", "kernelmatrixXY"] | 2.133 μs (5%) | | 5.59 KiB (1%) | 6 |
| ["SqExponential", "RowVecs", "kernelmatrix_diagX"] | 139.187 ns (5%) | | 320 bytes (1%) | 2 |
| ["SqExponential", "RowVecs", "kernelmatrix_diagXY"] | 295.103 ns (5%) | | 320 bytes (1%) | 2 |
| ["SqExponential", "Vecs", "kernelmatrixX"] | 7.200 μs (5%) | | 6.59 KiB (1%) | 4 |
| ["SqExponential", "Vecs", "kernelmatrixXY"] | 7.180 μs (5%) | | 6.59 KiB (1%) | 4 |
| ["SqExponential", "Vecs", "kernelmatrix_diagX"] | 209.182 ns (5%) | | 480 bytes (1%) | 2 |
| ["SqExponential", "Vecs", "kernelmatrix_diagXY"] | 518.598 ns (5%) | | 480 bytes (1%) | 2 |

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["Exponential", "ColVecs"]
  • ["Exponential", "RowVecs"]
  • ["Exponential", "Vecs"]
  • ["SqExponential", "ColVecs"]
  • ["SqExponential", "RowVecs"]
  • ["SqExponential", "Vecs"]

Runtime information

Runtime Info
BLAS #threads 2
BLAS.vendor() openblas64
Sys.CPU_THREADS 2

lscpu output:

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          2
On-line CPU(s) list:             0,1
Thread(s) per core:              1
Core(s) per socket:              2
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           85
Model name:                      Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Stepping:                        7
CPU MHz:                         2593.905
BogoMIPS:                        5187.81
Hypervisor vendor:               Microsoft
Virtualization type:             full
L1d cache:                       64 KiB
L1i cache:                       64 KiB
L2 cache:                        2 MiB
L3 cache:                        35.8 MiB
NUMA node0 CPU(s):               0,1
Vulnerability Itlb multihit:     KVM: Mitigation: VMX unsupported
Vulnerability L1tf:              Mitigation; PTE Inversion
Vulnerability Mds:               Mitigation; Clear CPU buffers; SMT Host state unknown
Vulnerability Meltdown:          Mitigation; PTI
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Full generic retpoline, STIBP disabled, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Mitigation; Clear CPU buffers; SMT Host state unknown
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt avx512cd avx512bw avx512vl xsaveopt xsavec xsaves md_clear
Cpu Property Value
Brand Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Vendor :Intel
Architecture :Skylake
Model Family: 0x06, Model: 0x55, Stepping: 0x07, Type: 0x00
Cores 2 physical cores, 2 logical cores (on executing CPU)
No Hyperthreading hardware capability detected
Clock Frequencies Not supported by CPU
Data Cache Level 1:3 : (32, 1024, 36608) kbytes
64 byte cache line size
Address Size 48 bits virtual, 46 bits physical
SIMD 512 bit = 64 byte max. SIMD vector size
Time Stamp Counter TSC is accessible via rdtsc
TSC increased at every clock cycle (non-invariant TSC)
Perf. Monitoring Performance Monitoring Counters (PMC) are not supported
Hypervisor Yes, Microsoft

@github-actions
Contributor

Benchmark result

Judge result

Benchmark Report for /home/runner/work/KernelFunctions.jl/KernelFunctions.jl

Job Properties

  • Time of benchmarks:
    • Target: 14 Jan 2022 - 16:13
    • Baseline: 14 Jan 2022 - 16:14
  • Package commits:
    • Target: 594c32
    • Baseline: d1c68a
  • Julia commits:
    • Target: 905826
    • Baseline: 905826
  • Julia command flags:
    • Target: None
    • Baseline: None
  • Environment variables:
    • Target: None
    • Baseline: None

Results

A ratio greater than 1.0 denotes a possible regression (marked with ❌), while a ratio less
than 1.0 denotes a possible improvement (marked with ✅). Only significant results - results
that indicate possible regressions or improvements - are shown below (thus, an empty table means that all
benchmark results remained invariant between builds).

| ID | time ratio | memory ratio |
|---|---|---|
| ["Exponential", "ColVecs", "kernelmatrixX"] | 1.18 (5%) ❌ | 0.97 (1%) ✅ |
| ["Exponential", "ColVecs", "kernelmatrixXY"] | 1.32 (5%) ❌ | 0.93 (1%) ✅ |
| ["Exponential", "ColVecs", "kernelmatrix_diagXY"] | 0.93 (5%) ✅ | 1.00 (1%) |
| ["Exponential", "RowVecs", "kernelmatrixX"] | 1.16 (5%) ❌ | 0.48 (1%) ✅ |
| ["Exponential", "RowVecs", "kernelmatrixXY"] | 1.03 (5%) | 0.31 (1%) ✅ |
| ["Exponential", "RowVecs", "kernelmatrix_diagXY"] | 0.93 (5%) ✅ | 1.00 (1%) |
| ["Exponential", "Vecs", "kernelmatrixX"] | 0.87 (5%) ✅ | 1.00 (1%) |
| ["Exponential", "Vecs", "kernelmatrixXY"] | 0.87 (5%) ✅ | 1.00 (1%) |
| ["Exponential", "Vecs", "kernelmatrix_diagX"] | 1.10 (5%) ❌ | 1.00 (1%) |
| ["Exponential", "Vecs", "kernelmatrix_diagXY"] | 0.71 (5%) ✅ | 1.13 (1%) ❌ |
| ["SqExponential", "ColVecs", "kernelmatrixX"] | 1.05 (5%) ❌ | 0.97 (1%) ✅ |
| ["SqExponential", "ColVecs", "kernelmatrixXY"] | 1.12 (5%) ❌ | 0.93 (1%) ✅ |
| ["SqExponential", "ColVecs", "kernelmatrix_diagX"] | 0.92 (5%) ✅ | 1.00 (1%) |
| ["SqExponential", "ColVecs", "kernelmatrix_diagXY"] | 0.88 (5%) ✅ | 1.00 (1%) |
| ["SqExponential", "RowVecs", "kernelmatrixX"] | 1.24 (5%) ❌ | 0.48 (1%) ✅ |
| ["SqExponential", "RowVecs", "kernelmatrixXY"] | 1.20 (5%) ❌ | 0.31 (1%) ✅ |
| ["SqExponential", "RowVecs", "kernelmatrix_diagX"] | 0.94 (5%) ✅ | 1.00 (1%) |
| ["SqExponential", "RowVecs", "kernelmatrix_diagXY"] | 1.11 (5%) ❌ | 1.00 (1%) |
| ["SqExponential", "Vecs", "kernelmatrixX"] | 0.83 (5%) ✅ | 0.99 (1%) |
| ["SqExponential", "Vecs", "kernelmatrixXY"] | 0.73 (5%) ✅ | 0.99 (1%) |
| ["SqExponential", "Vecs", "kernelmatrix_diagX"] | 0.90 (5%) ✅ | 1.00 (1%) |
| ["SqExponential", "Vecs", "kernelmatrix_diagXY"] | 0.83 (5%) ✅ | 1.07 (1%) ❌ |

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["Exponential", "ColVecs"]
  • ["Exponential", "RowVecs"]
  • ["Exponential", "Vecs"]
  • ["SqExponential", "ColVecs"]
  • ["SqExponential", "RowVecs"]
  • ["SqExponential", "Vecs"]

Julia versioninfo

Target

Julia Version 1.6.5
Commit 9058264a69 (2021-12-19 12:30 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 20.04.3 LTS
  uname: Linux 5.11.0-1025-azure #27~20.04.1-Ubuntu SMP Fri Jan 7 15:02:06 UTC 2022 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz: 
              speed         user         nice          sys         idle          irq
       #1  2095 MHz       1720 s          2 s        153 s        370 s          0 s
       #2  2095 MHz        419 s          0 s        125 s       1718 s          0 s
       
  Memory: 6.788978576660156 GB (3615.23046875 MB free)
  Uptime: 230.41 sec
  Load Avg:  1.1  0.81  0.36
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake-avx512)

Baseline

Julia Version 1.6.5
Commit 9058264a69 (2021-12-19 12:30 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 20.04.3 LTS
  uname: Linux 5.11.0-1025-azure #27~20.04.1-Ubuntu SMP Fri Jan 7 15:02:06 UTC 2022 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz: 
              speed         user         nice          sys         idle          irq
       #1  2095 MHz       2305 s          2 s        164 s        478 s          0 s
       #2  2095 MHz        532 s          0 s        131 s       2301 s          0 s
       
  Memory: 6.788978576660156 GB (3546.81640625 MB free)
  Uptime: 300.81 sec
  Load Avg:  1.03  0.86  0.41
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake-avx512)

Target result

Benchmark Report for /home/runner/work/KernelFunctions.jl/KernelFunctions.jl

Job Properties

  • Time of benchmark: 14 Jan 2022 - 16:13
  • Package commit: 594c32
  • Julia commit: 905826
  • Julia command flags: None
  • Environment variables: None

Results

Below is a table of this job's results, obtained by running the benchmarks.
The values listed in the ID column have the structure [parent_group, child_group, ..., key], and can be used to
index into the BaseBenchmarks suite to retrieve the corresponding benchmarks.
The percentages accompanying time and memory values in the below table are noise tolerances. The "true"
time/memory value for a given benchmark is expected to fall within this percentage of the reported value.
An empty cell means that the value was zero.

| ID | time | GC time | memory | allocations |
|---|---|---|---|---|
| ["Exponential", "ColVecs", "kernelmatrixX"] | 6.640 μs (5%) | | 6.50 KiB (1%) | 2 |
| ["Exponential", "ColVecs", "kernelmatrixXY"] | 6.660 μs (5%) | | 6.50 KiB (1%) | 2 |
| ["Exponential", "ColVecs", "kernelmatrix_diagX"] | 178.598 ns (5%) | | 480 bytes (1%) | 2 |
| ["Exponential", "ColVecs", "kernelmatrix_diagXY"] | 346.335 ns (5%) | | 480 bytes (1%) | 2 |
| ["Exponential", "RowVecs", "kernelmatrixX"] | 2.244 μs (5%) | | 1.75 KiB (1%) | 2 |
| ["Exponential", "RowVecs", "kernelmatrixXY"] | 2.267 μs (5%) | | 1.75 KiB (1%) | 2 |
| ["Exponential", "RowVecs", "kernelmatrix_diagX"] | 118.974 ns (5%) | | 320 bytes (1%) | 2 |
| ["Exponential", "RowVecs", "kernelmatrix_diagXY"] | 298.635 ns (5%) | | 320 bytes (1%) | 2 |
| ["Exponential", "Vecs", "kernelmatrixX"] | 6.720 μs (5%) | | 6.56 KiB (1%) | 4 |
| ["Exponential", "Vecs", "kernelmatrixXY"] | 6.800 μs (5%) | | 6.56 KiB (1%) | 4 |
| ["Exponential", "Vecs", "kernelmatrix_diagX"] | 204.170 ns (5%) | | 480 bytes (1%) | 2 |
| ["Exponential", "Vecs", "kernelmatrix_diagXY"] | 380.910 ns (5%) | | 544 bytes (1%) | 4 |
| ["SqExponential", "ColVecs", "kernelmatrixX"] | 5.833 μs (5%) | | 6.50 KiB (1%) | 2 |
| ["SqExponential", "ColVecs", "kernelmatrixXY"] | 5.817 μs (5%) | | 6.50 KiB (1%) | 2 |
| ["SqExponential", "ColVecs", "kernelmatrix_diagX"] | 193.085 ns (5%) | | 480 bytes (1%) | 2 |
| ["SqExponential", "ColVecs", "kernelmatrix_diagXY"] | 312.269 ns (5%) | | 480 bytes (1%) | 2 |
| ["SqExponential", "RowVecs", "kernelmatrixX"] | 2.400 μs (5%) | | 1.75 KiB (1%) | 2 |
| ["SqExponential", "RowVecs", "kernelmatrixXY"] | 2.400 μs (5%) | | 1.75 KiB (1%) | 2 |
| ["SqExponential", "RowVecs", "kernelmatrix_diagX"] | 130.918 ns (5%) | | 320 bytes (1%) | 2 |
| ["SqExponential", "RowVecs", "kernelmatrix_diagXY"] | 317.801 ns (5%) | | 320 bytes (1%) | 2 |
| ["SqExponential", "Vecs", "kernelmatrixX"] | 6.480 μs (5%) | | 6.53 KiB (1%) | 3 |
| ["SqExponential", "Vecs", "kernelmatrixXY"] | 5.640 μs (5%) | | 6.53 KiB (1%) | 3 |
| ["SqExponential", "Vecs", "kernelmatrix_diagX"] | 192.610 ns (5%) | | 480 bytes (1%) | 2 |
| ["SqExponential", "Vecs", "kernelmatrix_diagXY"] | 404.505 ns (5%) | | 512 bytes (1%) | 3 |

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["Exponential", "ColVecs"]
  • ["Exponential", "RowVecs"]
  • ["Exponential", "Vecs"]
  • ["SqExponential", "ColVecs"]
  • ["SqExponential", "RowVecs"]
  • ["SqExponential", "Vecs"]

Baseline result

Benchmark Report for /home/runner/work/KernelFunctions.jl/KernelFunctions.jl

Job Properties

  • Time of benchmark: 14 Jan 2022 - 16:14
  • Package commit: d1c68a
  • Julia commit: 905826
  • Julia command flags: None
  • Environment variables: None

Results

Below is a table of this job's results, obtained by running the benchmarks.
The values listed in the ID column have the structure [parent_group, child_group, ..., key], and can be used to
index into the BaseBenchmarks suite to retrieve the corresponding benchmarks.
The percentages accompanying time and memory values in the below table are noise tolerances. The "true"
time/memory value for a given benchmark is expected to fall within this percentage of the reported value.
An empty cell means that the value was zero.

| ID | time | GC time | memory | allocations |
|---|---|---|---|---|
| ["Exponential", "ColVecs", "kernelmatrixX"] | 5.620 μs (5%) | | 6.73 KiB (1%) | 3 |
| ["Exponential", "ColVecs", "kernelmatrixXY"] | 5.040 μs (5%) | | 6.97 KiB (1%) | 4 |
| ["Exponential", "ColVecs", "kernelmatrix_diagX"] | 184.740 ns (5%) | | 480 bytes (1%) | 2 |
| ["Exponential", "ColVecs", "kernelmatrix_diagXY"] | 373.399 ns (5%) | | 480 bytes (1%) | 2 |
| ["Exponential", "RowVecs", "kernelmatrixX"] | 1.933 μs (5%) | | 3.67 KiB (1%) | 4 |
| ["Exponential", "RowVecs", "kernelmatrixXY"] | 2.200 μs (5%) | | 5.59 KiB (1%) | 6 |
| ["Exponential", "RowVecs", "kernelmatrix_diagX"] | 120.458 ns (5%) | | 320 bytes (1%) | 2 |
| ["Exponential", "RowVecs", "kernelmatrix_diagXY"] | 319.635 ns (5%) | | 320 bytes (1%) | 2 |
| ["Exponential", "Vecs", "kernelmatrixX"] | 7.720 μs (5%) | | 6.59 KiB (1%) | 4 |
| ["Exponential", "Vecs", "kernelmatrixXY"] | 7.780 μs (5%) | | 6.59 KiB (1%) | 4 |
| ["Exponential", "Vecs", "kernelmatrix_diagX"] | 185.335 ns (5%) | | 480 bytes (1%) | 2 |
| ["Exponential", "Vecs", "kernelmatrix_diagXY"] | 533.678 ns (5%) | | 480 bytes (1%) | 2 |
| ["SqExponential", "ColVecs", "kernelmatrixX"] | 5.533 μs (5%) | | 6.73 KiB (1%) | 3 |
| ["SqExponential", "ColVecs", "kernelmatrixXY"] | 5.183 μs (5%) | | 6.97 KiB (1%) | 4 |
| ["SqExponential", "ColVecs", "kernelmatrix_diagX"] | 210.081 ns (5%) | | 480 bytes (1%) | 2 |
| ["SqExponential", "ColVecs", "kernelmatrix_diagXY"] | 354.722 ns (5%) | | 480 bytes (1%) | 2 |
| ["SqExponential", "RowVecs", "kernelmatrixX"] | 1.933 μs (5%) | | 3.67 KiB (1%) | 4 |
| ["SqExponential", "RowVecs", "kernelmatrixXY"] | 2.000 μs (5%) | | 5.59 KiB (1%) | 6 |
| ["SqExponential", "RowVecs", "kernelmatrix_diagX"] | 139.412 ns (5%) | | 320 bytes (1%) | 2 |
| ["SqExponential", "RowVecs", "kernelmatrix_diagXY"] | 287.292 ns (5%) | | 320 bytes (1%) | 2 |
| ["SqExponential", "Vecs", "kernelmatrixX"] | 7.800 μs (5%) | | 6.59 KiB (1%) | 4 |
| ["SqExponential", "Vecs", "kernelmatrixXY"] | 7.760 μs (5%) | | 6.59 KiB (1%) | 4 |
| ["SqExponential", "Vecs", "kernelmatrix_diagX"] | 213.554 ns (5%) | | 480 bytes (1%) | 2 |
| ["SqExponential", "Vecs", "kernelmatrix_diagXY"] | 488.005 ns (5%) | | 480 bytes (1%) | 2 |

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["Exponential", "ColVecs"]
  • ["Exponential", "RowVecs"]
  • ["Exponential", "Vecs"]
  • ["SqExponential", "ColVecs"]
  • ["SqExponential", "RowVecs"]
  • ["SqExponential", "Vecs"]

Runtime information

Runtime Info
BLAS #threads 2
BLAS.vendor() openblas64
Sys.CPU_THREADS 2

lscpu output:

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          2
On-line CPU(s) list:             0,1
Thread(s) per core:              1
Core(s) per socket:              2
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           85
Model name:                      Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
Stepping:                        4
CPU MHz:                         2095.079
BogoMIPS:                        4190.15
Hypervisor vendor:               Microsoft
Virtualization type:             full
L1d cache:                       64 KiB
L1i cache:                       64 KiB
L2 cache:                        2 MiB
L3 cache:                        35.8 MiB
NUMA node0 CPU(s):               0,1
Vulnerability Itlb multihit:     KVM: Mitigation: VMX unsupported
Vulnerability L1tf:              Mitigation; PTE Inversion
Vulnerability Mds:               Mitigation; Clear CPU buffers; SMT Host state unknown
Vulnerability Meltdown:          Mitigation; PTI
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Full generic retpoline, STIBP disabled, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Mitigation; Clear CPU buffers; SMT Host state unknown
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt avx512cd avx512bw avx512vl xsaveopt xsavec xsaves md_clear
Cpu Property Value
Brand Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
Vendor :Intel
Architecture :Skylake
Model Family: 0x06, Model: 0x55, Stepping: 0x04, Type: 0x00
Cores 2 physical cores, 2 logical cores (on executing CPU)
No Hyperthreading hardware capability detected
Clock Frequencies Not supported by CPU
Data Cache Level 1:3 : (32, 1024, 36608) kbytes
64 byte cache line size
Address Size 48 bits virtual, 46 bits physical
SIMD 512 bit = 64 byte max. SIMD vector size
Time Stamp Counter TSC is accessible via rdtsc
TSC increased at every clock cycle (non-invariant TSC)
Perf. Monitoring Performance Monitoring Counters (PMC) are not supported
Hypervisor Yes, Microsoft

@theogf
Member Author

theogf commented Jan 21, 2022

This is an extended discussion of the status of this PR and of using Tullio to compute pairwise distances. It follows up on the discussion during our last meeting.

Distances vs Tullio

  • Distances.jl is definitely outdated and has a large technical debt: in-place operations, no GPU compatibility, hard-coded types all over the code.
  • Tullio.jl makes it possible to write this operation in Einstein notation (similar to einsum in some Python frameworks). It is not as fast as Distances.jl, but the slowdown is negligible (~5%) compared to the gain in flexibility. It also allocates a lot less, which can make a difference for very large kernel matrices. It is also automatically compatible with GPU computations (and, I guess, parallelization and so on). We basically benefit from everything stated in the Tullio.jl README.
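
As a rough illustration of the Einstein-notation style mentioned above, here is a minimal sketch for squared Euclidean distances between matrix columns. It assumes Tullio.jl is installed; the function names pairwise_sqeuclidean and colwise_sqeuclidean are hypothetical and are not the functions defined in this PR.

```julia
using Tullio  # assumed to be installed; not part of Base Julia

# Hypothetical helper: pairwise squared Euclidean distances between the
# columns of X and the columns of Y.
# The index k appears only on the right-hand side, so Tullio sums over it.
function pairwise_sqeuclidean(X::AbstractMatrix, Y::AbstractMatrix)
    @tullio D[i, j] := (X[k, i] - Y[k, j])^2
    return D
end

# Hypothetical helper: distances between matching columns only.
function colwise_sqeuclidean(X::AbstractMatrix, Y::AbstractMatrix)
    @tullio d[i] := (X[k, i] - Y[k, i])^2
    return d
end

X = rand(3, 5)   # 5 points in 3 dimensions, one per column
Y = rand(3, 4)   # 4 points in 3 dimensions
D = pairwise_sqeuclidean(X, Y)   # 5×4 matrix
d = colwise_sqeuclidean(X, X)    # length-5 vector of zeros
```

The `:=` form asks Tullio to allocate the output array, whereas `=` writes into an array that already exists.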

The current issues with Tullio (and its solutions)

  • The biggest problem of all is AD. Tullio tries to symbolically derive its own rules and forbids all other alternatives. @devmotion wrote "Opt out of CR.rrule if pullback is not defined" (mcabbott/Tullio.jl#130) to partially solve this problem, but we don't know if it would work well anyway.
  • This means that a generic fallback like P[i, j] := d(X[i], Y[j]) is not differentiable...
  • A solution to that is to write custom rrule (and frule) for pairwise and colwise, however that requires writing rules for every case (AVector, ColVecs, RowVecs) etc...
  • Another solution: due to the nature of the pairwise (and colwise) functions, it might be possible to have a very generic rrule and to only define the scalar pushforward dP_ij/dX_kl for each metric. I will try to investigate that with @niklasschmitz (who will hear about this just now).
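
One possible reading of that last idea, as a rough sketch only: the metric supplies just the gradients of the scalar d(x, y), and a single generic routine accumulates them into the cotangents of the inputs. Everything here (SqEuclideanSketch, grad_xy, pairwise_sketch, pairwise_pullback_sketch) is a hypothetical name defined for the example, the code is written in reverse mode, and it is not wired into ChainRules.

```julia
# Hypothetical stand-in for a metric type; this is not the Distances.jl type.
struct SqEuclideanSketch end

# Primal value d(x, y) for two observation vectors.
(::SqEuclideanSketch)(x, y) = sum(abs2, x .- y)

# The only metric-specific piece: gradients of d(x, y) with respect to x and y.
grad_xy(::SqEuclideanSketch, x, y) = (2 .* (x .- y), -2 .* (x .- y))

# Generic primal: P[i, j] = d(X[:, i], Y[:, j]), observations stored as columns.
function pairwise_sketch(d, X::AbstractMatrix, Y::AbstractMatrix)
    return [d(view(X, :, i), view(Y, :, j)) for i in axes(X, 2), j in axes(Y, 2)]
end

# Generic pullback: given the cotangent dP of P, accumulate
# dP[i, j] * ∂d/∂x into column i of dX and dP[i, j] * ∂d/∂y into column j of dY.
function pairwise_pullback_sketch(d, X::AbstractMatrix, Y::AbstractMatrix, dP::AbstractMatrix)
    dX = zeros(float(eltype(X)), size(X))
    dY = zeros(float(eltype(Y)), size(Y))
    for j in axes(Y, 2), i in axes(X, 2)
        gx, gy = grad_xy(d, view(X, :, i), view(Y, :, j))
        dX[:, i] .+= dP[i, j] .* gx
        dY[:, j] .+= dP[i, j] .* gy
    end
    return dX, dY
end

X = [0.0 1.0; 0.0 1.0]
Y = [1.0 2.0; 0.0 2.0]
P = pairwise_sketch(SqEuclideanSketch(), X, Y)
# Cotangent of ones corresponds to differentiating sum(P).
dX, dY = pairwise_pullback_sketch(SqEuclideanSketch(), X, Y, ones(size(P)))
```

Under this design a new metric would only need its own grad_xy method; whether the same structure carries over to ColVecs, RowVecs and GPU arrays without extra work is exactly what would need investigating.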

@willtebbutt
Member

willtebbutt commented Jan 21, 2022

A solution to that is to write custom rrule (and frule) for pairwise and colwise, however that requires writing rules for every case (AVector, ColVecs, RowVecs) etc...

One slightly different option is to implement things as a mix of Tullio and standard functionality, in a way that doesn't require implementing new rrules etc.

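For example, suppose that d(x, y) = f(sum((x - y).^2)), where Tullio doesn't know how to differentiate f.
You could implement pairwise for this operation as something like

function pairwise(d, x::RowVecs, y::RowVecs)
    X, Y = x.X, y.X  # underlying matrices, one observation per row
    # `:=` allocates D; the index k appears only on the right, so it is summed over
    @tullio D[i, j] := (X[i, k] - Y[j, k])^2
    return map(f, D)
end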

Yes, you've had to implement a specialised method of pairwise for RowVecs, which isn't any better than what we have at the minute, but it should be automatically differentiable and GPU-able. In this sense, it's better than what we have at the minute -- current code is differentiable, but not GPU-able -- modulo any small performance loss relative to using Distances on the CPU.

Alternatively, presumably there's a way to teach Tullio about any particular f?

@devmotion
Member

Thanks for the write-up! It makes me wonder though what exactly the benefits from switching to Tullio would be.

Unfortunately, IIRC my PR wouldn't help with the issues that you faced here. And if we have to implement ChainRules derivatives, then one of the main selling points of Tullio (AD) would not be helpful in our case, and we could just implement and optimize rules for Distances.

I'm also curious in what cases Tullio reduces allocations. I assume you talk about pairwise and colwise? For primal computations Distances should (usually) be impossible to beat since it only allocates the output array. So I guess you refer to AD? In this case maybe just a rule, or an improved rule, is missing for Distances. And if we have to add them anyway we could just fix these issues with Distances.

Speaking about pairwise and colwise, is the einsum notation actually an argument for Tullio? So far we only need these two it seems but not any other operations.

So it seems the main argument for Tullio would be GPU compatibility. IIRC people tried to address this already with packages such as DistancesCUDA (not sure about the name...). In defense of Distances, my impression is not that it is completely outdated. It's actively maintained and there were many improvements and refactorings in the last months. In my possibly biased opinion its source code is also much cleaner and simpler and easier to read than Tullio (probably not too surprising - to some extent - given the different goals of both packages).

@devmotion
Member

Regarding GPU support, we or some separate package could just restrict our custom pairwise etc implementations such that we only forward specific array types to Distances and handle the rest in a non-mutable way (or the other way around).

@theogf
Member Author

theogf commented Jan 21, 2022

One slightly different option is to implement things as a mix of Tullio and standard functionality, in a way that doesn't require implementing new rrules etc.

@willtebbutt The problem with this is that it does not solve the problem of the generic fallback. Of course, one could just use broadcasting.
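
For reference, a broadcasting-based generic fallback can be sketched in a couple of lines of plain Julia. The names pairwise_broadcast and dot_sketch are hypothetical, and the inputs are assumed to be vectors whose elements are the observations.

```julia
# Generic fallback sketch: x and y are vectors whose elements are observations.
# Broadcasting a column of x against a row of y evaluates d on every pair.
# `pairwise_broadcast` is a hypothetical name used for illustration only.
pairwise_broadcast(d, x::AbstractVector, y::AbstractVector) = d.(x, permutedims(y))

# A binary operation that is not a metric, in the spirit of DotProduct.
dot_sketch(a, b) = sum(a .* b)

x = [[0.0, 0.0], [1.0, 1.0]]
y = [[1.0, 0.0], [2.0, 2.0], [0.0, 3.0]]
K = pairwise_broadcast(dot_sketch, x, y)   # 2×3 matrix, K[i, j] = d(x[i], y[j])
```

This works for any callable d, which is the appeal of a fallback, but it gives no control over allocations or over how the work is mapped to a GPU.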

I'm also curious in what cases Tullio reduces allocations. I assume you talk about pairwise and colwise?

I was wrong about this (I confused it with the other expected_loglike benchmarks). The allocation savings are relative to the standard broadcast approach. Here is the Gist where I saved all the computations: https://gist.github.com/theogf/50426b2e991bba8868f6728d1325518b

Regarding GPU support, we or some separate package could just restrict our custom pairwise etc implementations such that we only forward specific array types to Distances and handle the rest in a non-mutable way (or the other way around).

That sounds more like a temporary solution, right? And also a lot more work! I think Tullio would at least allow us to have a general solution (and, if my AD proposal works, very little work on our side).

@devmotion
Member

That sounds more like a temporary solution right? And also a lot more work!

I think that's what ideally you want to do. Based on specific traits of arrays such as the ones in ArrayInterface you use different implementations (e.g. in-place or out-of-place). That's what SciML does all the time, e.g. all ODE methods are implemented twice, once for in-place and once for out-of-place functions. So if such traits (would) become available in base, then this would be what Distances should and probably would do. Unfortunately, ArrayInterface is a quite heavy dependency so currently such traits are not available in Distances.
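
A minimal sketch of that kind of trait-based dispatch, assuming a locally defined trait: MutabilityStyle, Mutable, Immutable, mutability and colwise_sqdist are all hypothetical names made up for this example and are not the ArrayInterface or Distances API.

```julia
# Hypothetical trait, defined locally for illustration; not the ArrayInterface API.
abstract type MutabilityStyle end
struct Mutable <: MutabilityStyle end
struct Immutable <: MutabilityStyle end

mutability(::Type{<:Array}) = Mutable()
mutability(::Type{<:AbstractArray}) = Immutable()   # conservative default

# Entry point: choose an implementation from the trait of the first argument.
colwise_sqdist(x::AbstractMatrix, y::AbstractMatrix) =
    colwise_sqdist(mutability(typeof(x)), x, y)

# In-place path: preallocate the output and fill it with setindex!.
function colwise_sqdist(::Mutable, x, y)
    out = similar(x, size(x, 2))
    for j in axes(x, 2)
        out[j] = sum(abs2, view(x, :, j) .- view(y, :, j))
    end
    return out
end

# Out-of-place path: no mutation, only broadcasting and a reduction.
colwise_sqdist(::Immutable, x, y) = vec(sum(abs2, x .- y; dims=1))

a = colwise_sqdist([0.0 1.0; 0.0 1.0], [1.0 2.0; 0.0 2.0])           # Array: in-place path
b = colwise_sqdist(reshape(1.0:4.0, 2, 2), reshape(2.0:5.0, 2, 2))   # range-backed: out-of-place path
```

Both paths return the same values for the same data; the trait only decides which implementation strategy is used.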

@github-actions
Contributor

Benchmark result

Judge result

Benchmark Report for /home/runner/work/KernelFunctions.jl/KernelFunctions.jl

Job Properties

  • Time of benchmarks:
    • Target: 31 Jan 2022 - 11:31
    • Baseline: 31 Jan 2022 - 11:32
  • Package commits:
    • Target: eb1a6c
    • Baseline: f9bbd8
  • Julia commits:
    • Target: 905826
    • Baseline: 905826
  • Julia command flags:
    • Target: None
    • Baseline: None
  • Environment variables:
    • Target: None
    • Baseline: None

Results

A ratio greater than 1.0 denotes a possible regression (marked with ❌), while a ratio less
than 1.0 denotes a possible improvement (marked with ✅). Only significant results - results
that indicate possible regressions or improvements - are shown below (thus, an empty table means that all
benchmark results remained invariant between builds).

| ID | time ratio | memory ratio |
|----|------------|--------------|
| ["Exponential", "ColVecs", "kernelmatrixX"] | 1.15 (5%) ❌ | 0.97 (1%) ✅ |
| ["Exponential", "ColVecs", "kernelmatrixXY"] | 1.23 (5%) ❌ | 0.93 (1%) ✅ |
| ["Exponential", "ColVecs", "kernelmatrix_diagXY"] | 1.06 (5%) ❌ | 1.00 (1%) |
| ["Exponential", "RowVecs", "kernelmatrixX"] | 1.34 (5%) ❌ | 0.48 (1%) ✅ |
| ["Exponential", "RowVecs", "kernelmatrixXY"] | 1.25 (5%) ❌ | 0.31 (1%) ✅ |
| ["Exponential", "RowVecs", "kernelmatrix_diagXY"] | 1.11 (5%) ❌ | 1.00 (1%) |
| ["Exponential", "Vecs", "kernelmatrixX"] | 0.87 (5%) ✅ | 1.00 (1%) |
| ["Exponential", "Vecs", "kernelmatrixXY"] | 0.89 (5%) ✅ | 1.00 (1%) |
| ["Exponential", "Vecs", "kernelmatrix_diagXY"] | 0.81 (5%) ✅ | 1.13 (1%) ❌ |
| ["SqExponential", "ColVecs", "kernelmatrixX"] | 1.12 (5%) ❌ | 0.97 (1%) ✅ |
| ["SqExponential", "ColVecs", "kernelmatrixXY"] | 1.18 (5%) ❌ | 0.93 (1%) ✅ |
| ["SqExponential", "RowVecs", "kernelmatrixX"] | 1.30 (5%) ❌ | 0.48 (1%) ✅ |
| ["SqExponential", "RowVecs", "kernelmatrixXY"] | 1.20 (5%) ❌ | 0.31 (1%) ✅ |
| ["SqExponential", "RowVecs", "kernelmatrix_diagXY"] | 1.08 (5%) ❌ | 1.00 (1%) |
| ["SqExponential", "Vecs", "kernelmatrixX"] | 0.92 (5%) ✅ | 0.99 (1%) |
| ["SqExponential", "Vecs", "kernelmatrixXY"] | 0.93 (5%) ✅ | 0.99 (1%) |
| ["SqExponential", "Vecs", "kernelmatrix_diagXY"] | 0.89 (5%) ✅ | 1.07 (1%) ❌ |
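
As a sanity check on how to read these ratios, the following sketch recomputes one row from the values reported in the Target result and Baseline result tables of this report. It assumes the ratio is target divided by baseline and that a result counts as significant when it falls outside 1 ± tolerance; classify_ratio is a hypothetical helper, not a BenchmarkTools function, and the inputs are the rounded values shown in the report.

```julia
# Classify a target/baseline ratio against a noise tolerance.
# `classify_ratio` is a hypothetical helper, not part of BenchmarkTools.
function classify_ratio(target, baseline; tolerance)
    ratio = target / baseline
    status = if ratio > 1 + tolerance
        :regression
    elseif ratio < 1 - tolerance
        :improvement
    else
        :invariant
    end
    return round(ratio; digits=2), status
end

# ["Exponential", "ColVecs", "kernelmatrixX"]: times in μs, memory in KiB.
time_result = classify_ratio(6.340, 5.520; tolerance=0.05)
memory_result = classify_ratio(6.50, 6.73; tolerance=0.01)
```

With these inputs the time ratio comes out as a regression and the memory ratio as an improvement, which matches the marks on that row.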

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["Exponential", "ColVecs"]
  • ["Exponential", "RowVecs"]
  • ["Exponential", "Vecs"]
  • ["SqExponential", "ColVecs"]
  • ["SqExponential", "RowVecs"]
  • ["SqExponential", "Vecs"]

Julia versioninfo

Target

Julia Version 1.6.5
Commit 9058264a69 (2021-12-19 12:30 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 20.04.3 LTS
  uname: Linux 5.11.0-1027-azure #30~20.04.1-Ubuntu SMP Wed Jan 12 20:56:50 UTC 2022 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz: 
              speed         user         nice          sys         idle          irq
       #1  2593 MHz        500 s          1 s        108 s       1350 s          0 s
       #2  2593 MHz       1464 s          1 s        136 s        396 s          0 s
       
  Memory: 6.788978576660156 GB (3555.484375 MB free)
  Uptime: 202.46 sec
  Load Avg:  1.1  0.72  0.31
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake-avx512)

Baseline

Julia Version 1.6.5
Commit 9058264a69 (2021-12-19 12:30 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 20.04.3 LTS
  uname: Linux 5.11.0-1027-azure #30~20.04.1-Ubuntu SMP Wed Jan 12 20:56:50 UTC 2022 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz: 
              speed         user         nice          sys         idle          irq
       #1  2593 MHz        597 s          1 s        112 s       1868 s          0 s
       #2  2593 MHz       1980 s          1 s        147 s        490 s          0 s
       
  Memory: 6.788978576660156 GB (3466.890625 MB free)
  Uptime: 264.59 sec
  Load Avg:  1.08  0.79  0.36
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake-avx512)

Target result

Benchmark Report for /home/runner/work/KernelFunctions.jl/KernelFunctions.jl

Job Properties

  • Time of benchmark: 31 Jan 2022 - 11:31
  • Package commit: eb1a6c
  • Julia commit: 905826
  • Julia command flags: None
  • Environment variables: None

Results

Below is a table of this job's results, obtained by running the benchmarks.
The values listed in the ID column have the structure [parent_group, child_group, ..., key], and can be used to
index into the BaseBenchmarks suite to retrieve the corresponding benchmarks.
The percentages accompanying time and memory values in the below table are noise tolerances. The "true"
time/memory value for a given benchmark is expected to fall within this percentage of the reported value.
An empty cell means that the value was zero.

| ID | time | GC time | memory | allocations |
|----|------|---------|--------|-------------|
| ["Exponential", "ColVecs", "kernelmatrixX"] | 6.340 μs (5%) | | 6.50 KiB (1%) | 2 |
| ["Exponential", "ColVecs", "kernelmatrixXY"] | 6.300 μs (5%) | | 6.50 KiB (1%) | 2 |
| ["Exponential", "ColVecs", "kernelmatrix_diagX"] | 214.314 ns (5%) | | 480 bytes (1%) | 2 |
| ["Exponential", "ColVecs", "kernelmatrix_diagXY"] | 403.000 ns (5%) | | 480 bytes (1%) | 2 |
| ["Exponential", "RowVecs", "kernelmatrixX"] | 2.644 μs (5%) | | 1.75 KiB (1%) | 2 |
| ["Exponential", "RowVecs", "kernelmatrixXY"] | 2.644 μs (5%) | | 1.75 KiB (1%) | 2 |
| ["Exponential", "RowVecs", "kernelmatrix_diagX"] | 143.484 ns (5%) | | 320 bytes (1%) | 2 |
| ["Exponential", "RowVecs", "kernelmatrix_diagXY"] | 353.052 ns (5%) | | 320 bytes (1%) | 2 |
| ["Exponential", "Vecs", "kernelmatrixX"] | 6.875 μs (5%) | | 6.56 KiB (1%) | 4 |
| ["Exponential", "Vecs", "kernelmatrixXY"] | 6.875 μs (5%) | | 6.56 KiB (1%) | 4 |
| ["Exponential", "Vecs", "kernelmatrix_diagX"] | 214.132 ns (5%) | | 480 bytes (1%) | 2 |
| ["Exponential", "Vecs", "kernelmatrix_diagXY"] | 439.904 ns (5%) | | 544 bytes (1%) | 4 |
| ["SqExponential", "ColVecs", "kernelmatrixX"] | 5.867 μs (5%) | | 6.50 KiB (1%) | 2 |
| ["SqExponential", "ColVecs", "kernelmatrixXY"] | 5.833 μs (5%) | | 6.50 KiB (1%) | 2 |
| ["SqExponential", "ColVecs", "kernelmatrix_diagX"] | 200.328 ns (5%) | | 480 bytes (1%) | 2 |
| ["SqExponential", "ColVecs", "kernelmatrix_diagXY"] | 364.428 ns (5%) | | 480 bytes (1%) | 2 |
| ["SqExponential", "RowVecs", "kernelmatrixX"] | 2.522 μs (5%) | | 1.75 KiB (1%) | 2 |
| ["SqExponential", "RowVecs", "kernelmatrixXY"] | 2.522 μs (5%) | | 1.75 KiB (1%) | 2 |
| ["SqExponential", "RowVecs", "kernelmatrix_diagX"] | 136.903 ns (5%) | | 320 bytes (1%) | 2 |
| ["SqExponential", "RowVecs", "kernelmatrix_diagXY"] | 325.004 ns (5%) | | 320 bytes (1%) | 2 |
| ["SqExponential", "Vecs", "kernelmatrixX"] | 6.620 μs (5%) | | 6.53 KiB (1%) | 3 |
| ["SqExponential", "Vecs", "kernelmatrixXY"] | 6.640 μs (5%) | | 6.53 KiB (1%) | 3 |
| ["SqExponential", "Vecs", "kernelmatrix_diagX"] | 202.717 ns (5%) | | 480 bytes (1%) | 2 |
| ["SqExponential", "Vecs", "kernelmatrix_diagXY"] | 453.812 ns (5%) | | 512 bytes (1%) | 3 |

Baseline result

Benchmark Report for /home/runner/work/KernelFunctions.jl/KernelFunctions.jl

Job Properties

  • Time of benchmark: 31 Jan 2022 - 11:32
  • Package commit: f9bbd8
  • Julia commit: 905826
  • Julia command flags: None
  • Environment variables: None

Results

Below is a table of this job's results, obtained by running the benchmarks.
The values listed in the ID column have the structure [parent_group, child_group, ..., key], and can be used to
index into the BaseBenchmarks suite to retrieve the corresponding benchmarks.
The percentages accompanying time and memory values in the below table are noise tolerances. The "true"
time/memory value for a given benchmark is expected to fall within this percentage of the reported value.
An empty cell means that the value was zero.

| ID | time | GC time | memory | allocations |
|----|------|---------|--------|-------------|
| ["Exponential", "ColVecs", "kernelmatrixX"] | 5.520 μs (5%) | | 6.73 KiB (1%) | 3 |
| ["Exponential", "ColVecs", "kernelmatrixXY"] | 5.140 μs (5%) | | 6.97 KiB (1%) | 4 |
| ["Exponential", "ColVecs", "kernelmatrix_diagX"] | 210.277 ns (5%) | | 480 bytes (1%) | 2 |
| ["Exponential", "ColVecs", "kernelmatrix_diagXY"] | 378.500 ns (5%) | | 480 bytes (1%) | 2 |
| ["Exponential", "RowVecs", "kernelmatrixX"] | 1.978 μs (5%) | | 3.67 KiB (1%) | 4 |
| ["Exponential", "RowVecs", "kernelmatrixXY"] | 2.122 μs (5%) | | 5.59 KiB (1%) | 6 |
| ["Exponential", "RowVecs", "kernelmatrix_diagX"] | 137.717 ns (5%) | | 320 bytes (1%) | 2 |
| ["Exponential", "RowVecs", "kernelmatrix_diagXY"] | 317.845 ns (5%) | | 320 bytes (1%) | 2 |
| ["Exponential", "Vecs", "kernelmatrixX"] | 7.925 μs (5%) | | 6.59 KiB (1%) | 4 |
| ["Exponential", "Vecs", "kernelmatrixXY"] | 7.725 μs (5%) | | 6.59 KiB (1%) | 4 |
| ["Exponential", "Vecs", "kernelmatrix_diagX"] | 209.727 ns (5%) | | 480 bytes (1%) | 2 |
| ["Exponential", "Vecs", "kernelmatrix_diagXY"] | 541.924 ns (5%) | | 480 bytes (1%) | 2 |
| ["SqExponential", "ColVecs", "kernelmatrixX"] | 5.217 μs (5%) | | 6.73 KiB (1%) | 3 |
| ["SqExponential", "ColVecs", "kernelmatrixXY"] | 4.933 μs (5%) | | 6.97 KiB (1%) | 4 |
| ["SqExponential", "ColVecs", "kernelmatrix_diagX"] | 197.891 ns (5%) | | 480 bytes (1%) | 2 |
| ["SqExponential", "ColVecs", "kernelmatrix_diagXY"] | 355.769 ns (5%) | | 480 bytes (1%) | 2 |
| ["SqExponential", "RowVecs", "kernelmatrixX"] | 1.944 μs (5%) | | 3.67 KiB (1%) | 4 |
| ["SqExponential", "RowVecs", "kernelmatrixXY"] | 2.100 μs (5%) | | 5.59 KiB (1%) | 6 |
| ["SqExponential", "RowVecs", "kernelmatrix_diagX"] | 132.689 ns (5%) | | 320 bytes (1%) | 2 |
| ["SqExponential", "RowVecs", "kernelmatrix_diagXY"] | 300.000 ns (5%) | | 320 bytes (1%) | 2 |
| ["SqExponential", "Vecs", "kernelmatrixX"] | 7.160 μs (5%) | | 6.59 KiB (1%) | 4 |
| ["SqExponential", "Vecs", "kernelmatrixXY"] | 7.160 μs (5%) | | 6.59 KiB (1%) | 4 |
| ["SqExponential", "Vecs", "kernelmatrix_diagX"] | 199.043 ns (5%) | | 480 bytes (1%) | 2 |
| ["SqExponential", "Vecs", "kernelmatrix_diagXY"] | 512.188 ns (5%) | | 480 bytes (1%) | 2 |

Runtime information

Runtime Info
BLAS #threads 2
BLAS.vendor() openblas64
Sys.CPU_THREADS 2

lscpu output:

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          2
On-line CPU(s) list:             0,1
Thread(s) per core:              1
Core(s) per socket:              2
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           85
Model name:                      Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Stepping:                        7
CPU MHz:                         2593.906
BogoMIPS:                        5187.81
Hypervisor vendor:               Microsoft
Virtualization type:             full
L1d cache:                       64 KiB
L1i cache:                       64 KiB
L2 cache:                        2 MiB
L3 cache:                        35.8 MiB
NUMA node0 CPU(s):               0,1
Vulnerability Itlb multihit:     KVM: Mitigation: VMX unsupported
Vulnerability L1tf:              Mitigation; PTE Inversion
Vulnerability Mds:               Mitigation; Clear CPU buffers; SMT Host state unknown
Vulnerability Meltdown:          Mitigation; PTI
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Full generic retpoline, STIBP disabled, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Mitigation; Clear CPU buffers; SMT Host state unknown
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt avx512cd avx512bw avx512vl xsaveopt xsavec xsaves md_clear
Cpu Property Value
Brand Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Vendor :Intel
Architecture :Skylake
Model Family: 0x06, Model: 0x55, Stepping: 0x07, Type: 0x00
Cores 2 physical cores, 2 logical cores (on executing CPU)
No Hyperthreading hardware capability detected
Clock Frequencies Not supported by CPU
Data Cache Level 1:3 : (32, 1024, 36608) kbytes
64 byte cache line size
Address Size 48 bits virtual, 46 bits physical
SIMD 512 bit = 64 byte max. SIMD vector size
Time Stamp Counter TSC is accessible via rdtsc
TSC increased at every clock cycle (non-invariant TSC)
Perf. Monitoring Performance Monitoring Counters (PMC) are not supported
Hypervisor Yes, Microsoft

@sharanry
Contributor

I have been trying to get this minimally working, but I am observing that it is terribly slow on the GPU. Am I doing something completely wrong?

julia> using Tullio, BenchmarkTools, CUDA

julia> x = rand(5, 5);

julia> test(x) = @tullio D[i, j] := (x[k, i] - x[k, j])^2 grad=false;

julia> test2(x) = @tullio D[i, j] := (x[i, k] - x[j, k])^2 grad=false;

julia> @btime test($x);
  104.748 ns (1 allocation: 256 bytes)

julia> @btime test($(x |> cu));
  2.989 ms (834 allocations: 129.17 KiB)

julia> @btime test2($x);
  118.719 ns (1 allocation: 256 bytes)

julia> @btime test2($(x |> cu));
  2.973 ms (834 allocations: 129.17 KiB)

Labels
performance critical Triggers benchmarking CI
Development

Successfully merging this pull request may close these issues.

DotProduct "metric"
5 participants