
Sensible Default values? #12

Closed
kronocharia opened this issue Feb 12, 2015 · 2 comments

Comments

@kronocharia

"You should think about what "sensible default" means: the TBB user guide gives some guidance in the form of a "rule of thumb". Your rule of thumb should probably take into account the size of the inner loop: remember that the work is O(n^2), and the work within each original parfor iteration is O(n). But on the other side, you always want enough tasks to keep the processors occupied (no matter how many there are), so you can't set the chunk size to K=n."

The guide suggests timing to make sure tasks run for at least several thousand clock cycles, but presumably that wasn't what we were meant to do?

and later on

"As before, if HPCE_FFT_LOOP_K is not set, choose a sensible default based on your analysis of the scaling with n, and/or experiments. Though remember, it should be a sensible default for all machines (even those with 64 cores)."

I'm not sure how to evaluate the performance in order to tune these 'sensible defaults' for all kinds of machines.

@m8pple
Contributor

m8pple commented Feb 12, 2015

Exactly how many instructions a given chunk of code takes is going to vary
from machine to machine and compiler to compiler, so you can't be exact
without timing.

But, you can look at a piece of code and make a decent estimate. So for
example, if I look at:

float *wibble = ...;
for(unsigned i=0; i<n; i+=K){
    for(unsigned j=i; j<i+K; j++){ // assuming K is a factor of n
        wibble[j] = wibble[j]*wibble[j];
    }
}

Then I guess that the inner loop is a compare, branch, load, multiply, and a store.
Given this code is probably vectorisable,
and a super-scalar architecture can issue multiple instructions at once, I might play it safe
and assume one instruction per iteration. So in that case, I might choose K=16384 as a
reasonable point to stop splitting.
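
(For concreteness, here is roughly how that kind of choice maps onto TBB. This is just a sketch, with wibble and n as the same placeholders as above; the third argument of blocked_range is the grain size, and simple_partitioner makes TBB stop subdividing once a sub-range has at most K iterations.)

#include <tbb/parallel_for.h>
#include <tbb/blocked_range.h>

void square_all(float *wibble, unsigned n, unsigned K)
{
    // K is the grain size: simple_partitioner stops splitting once a
    // sub-range has at most K iterations, so it plays the role of the
    // "stop splitting" threshold discussed above.
    tbb::parallel_for(
        tbb::blocked_range<unsigned>(0u, n, K),
        [=](const tbb::blocked_range<unsigned> &r){
            for(unsigned j=r.begin(); j<r.end(); j++){
                wibble[j] = wibble[j]*wibble[j];
            }
        },
        tbb::simple_partitioner()
    );
}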

If I had something like:

float split(int begin, int n)
{
    if(n==1){
        return leaf(begin);             // base case: do the work for one index
    }else{
        float a=split(begin, n/2);      // recurse on each half...
        float b=split(begin+n/2, n/2);
        return branch(a,b);             // ...then combine the two results
    }
}

then I just need to estimate what value of n (i.e. end-begin) I think will result
in just over the threshold. If I look at leaf and it takes about 16 instructions,
I could make the approximation that for n=1024 I have about the right amount
of work. If I thought that branch took a lot of time, I might go crazy and apply
the master theorem, but it really isn't needed.
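
(A sketch of what that cutoff might look like using tbb::parallel_invoke; leaf and branch are the same placeholder functions as above, and K=1024 is just the estimate from this paragraph, not a prescribed value.)

#include <tbb/parallel_invoke.h>

float split_par(int begin, int n)
{
    const int K = 1024;   // threshold estimated above (~16 instructions per leaf)
    if(n==1){
        return leaf(begin);
    }else if(n<=K){
        // Below the threshold: plain serial recursion, no task overhead
        float a=split_par(begin, n/2);
        float b=split_par(begin+n/2, n/2);
        return branch(a,b);
    }else{
        // Above the threshold: each half is big enough to be its own task
        float a, b;
        tbb::parallel_invoke(
            [&]{ a=split_par(begin, n/2); },
            [&]{ b=split_par(begin+n/2, n/2); }
        );
        return branch(a,b);
    }
}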

As you vary K from K=1 up to K=n (i.e. n/K=1), you usually find there is quite a large region
where K gives good speedup. Whether a task has 1e4 or 1e6 instructions
doesn't matter much; what matters is whether it has only 1e2 instructions, or whether there are only
two tasks when you have 8 CPUs.
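
(If you want to see where that region is experimentally, a rough sweep over K is usually enough. A sketch, using the earlier squaring loop as a stand-in for the real kernel:)

#include <tbb/parallel_for.h>
#include <tbb/blocked_range.h>
#include <chrono>
#include <cstdio>
#include <vector>

int main()
{
    const unsigned n = 1u<<24;
    std::vector<float> wibble(n, 1.0f);

    // Sweep K over powers of four and print the wall-clock time for each run.
    for(unsigned K=1; K<=n; K*=4){
        auto t0=std::chrono::high_resolution_clock::now();
        tbb::parallel_for(
            tbb::blocked_range<unsigned>(0u, n, K),
            [&](const tbb::blocked_range<unsigned> &r){
                for(unsigned j=r.begin(); j<r.end(); j++){
                    wibble[j]=wibble[j]*wibble[j];
                }
            },
            tbb::simple_partitioner()
        );
        auto t1=std::chrono::high_resolution_clock::now();
        std::printf("K=%u : %.6f s\n", K, std::chrono::duration<double>(t1-t0).count());
    }
    return 0;
}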

@m8pple m8pple closed this as completed Feb 12, 2015
@kronocharia
Author

It seems like the parfor K keeps getting better the larger it is. This was from K=1 to 5096, with the trendlines for larger K further to the right of the graph. This can't be correct?
[attached screenshot: screen shot 2015-02-12 at 19 40 53]
