benchmark ideas #240

StefanKarpinski · 2018-11-07T21:18:09Z

I could have sworn I'd previously opened an issue here with an idea for a benchmark but now I can't find it. There was a discussion of some algorithms that are hard/impossible to do in a vectorized way, and a couple that came up were:

PDE solver (@ChrisRackauckas)
counting the frequency of all substrings of a large string (@jakobnissen)

Please post and discuss more ideas here!

KristofferC · 2018-11-07T22:09:54Z

I could do a PDE solver (FEM) benchmark. A funny thing about PDEs is that a solver for the simplest possible case (regular grid with diffusion) is a couple of lines, while a general framework can be arbitrarily big. But something that is kinda realistic should be possible in a few hundred lines.
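
A minimal sketch of how short the simplest case can be, assuming a 1D regular grid, explicit Euler time stepping, a finite-difference stencil (not FEM), and fixed boundary values. The names `diffuse!` and `diffuse` and the parameter `r = D*dt/dx^2` are hypothetical, chosen for illustration only:

```julia
# One explicit Euler step of u_t = D*u_xx on a regular 1D grid.
# r = D*dt/dx^2 (assumed parameter); r <= 1/2 is needed for stability.
function diffuse!(unew, u, r)
    unew[1] = u[1]          # boundary values are held fixed
    unew[end] = u[end]
    @inbounds for i in 2:length(u)-1
        unew[i] = u[i] + r * (u[i-1] - 2u[i] + u[i+1])
    end
    return unew
end

# Take nsteps steps, swapping two buffers instead of allocating each step.
function diffuse(u0, r, nsteps)
    u = copy(u0)
    unew = similar(u)
    for _ in 1:nsteps
        diffuse!(unew, u, r)
        u, unew = unew, u
    end
    return u
end

u0 = zeros(101); u0[51] = 1.0   # a single spike in the middle of the grid
u = diffuse(u0, 0.25, 1000)
```

Even this toy version exercises the loop and array re-use that a benchmark would care about, while a realistic FEM solver needs meshing, assembly, and a linear solve on top.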

ChrisRackauckas · 2018-11-07T23:33:53Z

> A funny thing about PDEs is that a solver for the simplest possible case (regular grid with diffusion) is a couple of lines, while a general framework can be arbitrarily big.

Very true. Though I wonder if we should just link over to @johnfgibson's https://github.com/johnfgibson/julia-pde-benchmark which is really well done already. Instead of a PDE, we could also do something like the 4th order Runge-Kutta method.

https://en.wikipedia.org/wiki/Runge%E2%80%93Kutta_methods#The_Runge%E2%80%93Kutta_method

A quick Julia implementation that uses it to simulate the Lorenz equations is:

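# Lorenz right-hand side, written in place into du
function f(du,u)
  du[1] = 10.0*(u[2]-u[1])
  du[2] = u[1]*(28.0-u[3]) - u[2]
  du[3] = u[1]*u[2] - (8/3)*u[3]
end

function rk4_solve(u,dt,n)
  k1 = similar(u); k2 = similar(u)
  k3 = similar(u); k4 = similar(u)
  tmp = similar(u)
  for i in 1:n
    f(k1,u)
    @. tmp = u + dt*k1/2
    f(k2,tmp)   # each later stage is evaluated at the intermediate state tmp
    @. tmp = u + dt*k2/2
    f(k3,tmp)
    @. tmp = u + dt*k3
    f(k4,tmp)
    @. u = u + dt*(k1 + 2k2 + 2k3 + k4)/6   # weighted stage average, scaled by dt
  end
  u
end

u = [1.0,0.0,0.0]
dt = 1/2   # a large step: fine for timing the loop, too coarse for an accurate trajectory
n = 100
u = rk4_solve(u,dt,n)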

Some things that this example immediately highlights:

This code is pretty much impossible to "MATLAB-vectorize" effectively, to the point that the researcher who made the MATLAB ODE suite (one of the most prominent in the field) had a specific project to investigate the possibility of more vectorizable methods. (Note that this can still be done functionally as a fold, though that breaks down when you add adaptivity of dt.)
Array re-use and small-array performance are key. Or use StaticArrays if possible.
Loop performance is key.
The speed of higher-order function calls is key.
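
The fold formulation mentioned above can be sketched as follows. This is an illustration under assumptions, not code from the thread: it uses plain tuples in place of StaticArrays, an out-of-place right-hand side, and a smaller fixed step (`dt = 0.01`) than the example above. The names `lorenz`, `rk4_step`, and `rk4_fold` are hypothetical.

```julia
# Out-of-place Lorenz right-hand side on a tuple (no heap allocation).
lorenz(u) = (10.0 * (u[2] - u[1]),
             u[1] * (28.0 - u[3]) - u[2],
             u[1] * u[2] - (8 / 3) * u[3])

# One classical RK4 step; broadcasting over tuples returns a tuple.
function rk4_step(f, u, dt)
    k1 = f(u)
    k2 = f(u .+ (dt / 2) .* k1)
    k3 = f(u .+ (dt / 2) .* k2)
    k4 = f(u .+ dt .* k3)
    return u .+ (dt / 6) .* (k1 .+ 2 .* k2 .+ 2 .* k3 .+ k4)
end

# Fixed-step integration as a left fold over the step indices.
rk4_fold(f, u0, dt, n) = foldl((u, _) -> rk4_step(f, u, dt), 1:n; init = u0)

u_end = rk4_fold(lorenz, (1.0, 0.0, 0.0), 0.01, 100)
```

The fold works only because every step uses the same `dt`; with an adaptive step the number of iterations is not known in advance, which is where this formulation breaks down.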

jakobnissen · 2018-11-08T09:12:20Z

To give some context for the "counting substring problem": I work in bioinformatics, where we often have long sequences of DNA, hundreds of thousands to millions of bases:
TAGTGATAGTGCTTCGGGAAAACC ...
A kmer is a DNA sequence of length k. For small values of k, we can pack them into unsigned integers, which is often the only way to process sequences efficiently enough to handle millions of sequences. So over time, kmer analysis has become a very common procedure, almost a small subfield.
For example, a common thing to do is choose some small k (typically 4) and count the frequency of all 4^k possible kmers (4^4 = 256 for k = 4). Any sequence, no matter the length, can then be represented by a 256-length vector of frequencies.
To make things worse, some of the characters in the sequence are undetermined (marked by "N"). Any kmer containing an N at any position is invalid and must be ignored.

So the challenge is: given a string 250,000 characters long, tally up each 4-mer and count its frequency, skipping any 4-mer that contains an invalid character.
In BioJulia, the heavy lifting is done by a generic kmer iterator protocol, and the count finishes in ~450 microseconds.
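
For readers unfamiliar with the packing trick, here is a minimal sketch of the challenge in plain Julia. It is not the BioJulia implementation; it assumes uppercase input, a 2-bit code per base (A=0, C=1, G=2, T=3), and treats every other character, including N, as invalid. The names `encode` and `count_kmers` are hypothetical.

```julia
# 2-bit code for A, C, G, T; 0xff marks any other byte (e.g. N) as invalid.
function encode(b::UInt8)
    if b == UInt8('A')
        return 0x00
    elseif b == UInt8('C')
        return 0x01
    elseif b == UInt8('G')
        return 0x02
    elseif b == UInt8('T')
        return 0x03
    else
        return 0xff
    end
end

# Count all kmers with a rolling 2k-bit window packed into one unsigned integer.
function count_kmers(seq::AbstractString, k::Int = 4)
    counts = zeros(Int, 4^k)
    mask = UInt(4^k - 1)      # keeps only the lowest 2k bits
    kmer = UInt(0)
    valid = 0                 # consecutive valid bases seen so far
    for b in codeunits(seq)
        c = encode(b)
        if c == 0xff
            valid = 0         # an invalid base resets the window
            kmer = UInt(0)
        else
            kmer = ((kmer << 2) | c) & mask
            valid += 1
            if valid >= k
                counts[Int(kmer) + 1] += 1
            end
        end
    end
    return counts
end

counts = count_kmers("TAGTGATAGTGCTTCGGGAAAACC")
```

Resetting `valid` on an invalid base means no window overlapping an N is ever counted, without having to re-scan the previous k-1 characters.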

StefanKarpinski · 2018-11-08T14:41:04Z

So I guess the microbenchmark version of the k-mer counting code would be to fix a specific length like 4-mers and then count the frequency in a long, random fake DNA sequence, and have some simpler specialized code for that. Or maybe an example DNA sequence, although for microbenchmarks we generally haven't had data files, but there's no reason we couldn't.
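
A random fake DNA input of the kind described here could be generated along these lines. This is only a sketch: the seed, the variable names, and the choice to omit N characters are assumptions, not something settled in the thread.

```julia
using Random

# Fixed seed so every benchmark run sees the same sequence (seed value is arbitrary).
rng = MersenneTwister(1234)

# 250_000 bases drawn uniformly from A, C, G, T; no N characters in this sketch.
seq = randstring(rng, "ACGT", 250_000)
```

Generating the input in code avoids shipping a data file, at the cost of a base composition that is less realistic than a real genome.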

KristofferC · 2018-11-08T14:46:48Z

We have some data files in this repo https://github.com/JuliaCI/BaseBenchmarks.jl/tree/master/src/problem/data.

remove sorting of Trial

Keno pushed a commit that referenced this issue Feb 4, 2022

Merge pull request #240 from JuliaCI/kc/sort

a7375d0