Skip to content

Commit

Permalink
Scaling thread distribution.
Browse files Browse the repository at this point in the history
SkipPhase removal was tested by protonspring (official-stockfish#1835).

STC (+1 offset) (3 threads)
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 28428 W: 6278 L: 6170 D: 15980
http://tests.stockfishchess.org/tests/view/5bfe01c20ebc5902bceda021

STC (+1 offset) (8 threads)
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 26002 W: 5082 L: 4970 D: 15950
http://tests.stockfishchess.org/tests/view/5bfe132c0ebc5902bceda12f

SkipSize for threads 1-20 can be captured exactly by
skipSize = int(std::log(idx + 1) / std::log(1.92));
where logarithmic growth seems natural (with a base similar to the branching factor).
The formula extends to larger thread counts.
  • Loading branch information
vondele committed Jan 13, 2019
1 parent 5446e6f commit 5a59835
Showing 1 changed file with 3 additions and 10 deletions.
13 changes: 3 additions & 10 deletions src/search.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -61,10 +61,6 @@ namespace {
// Different node types, used as a template parameter
enum NodeType { NonPV, PV };

// Sizes and phases of the skip-blocks, used for distributing search depths across the threads
constexpr int SkipSize[] = { 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4 };
constexpr int SkipPhase[] = { 0, 1, 0, 1, 2, 3, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 6, 7 };

// Razor and futility margins
constexpr int RazorMargin = 600;
Value futility_margin(Depth d, bool improving) {
Expand Down Expand Up @@ -358,12 +354,9 @@ void Thread::search() {
&& !(Limits.depth && mainThread && rootDepth / ONE_PLY > Limits.depth))
{
// Distribute search depths across the helper threads
if (idx > 0)
{
int i = (idx - 1) % 20;
if (((rootDepth / ONE_PLY + SkipPhase[i]) / SkipSize[i]) % 2)
continue; // Retry with an incremented rootDepth
}
int skipSize = int(std::log(idx + 1) / std::log(1.92));
if (skipSize && (rootDepth / ONE_PLY + idx + 1) / skipSize % 2)
continue; // Retry with an incremented rootDepth

// Age out PV variability metric
if (mainThread)
Expand Down

15 comments on commit 5a59835

@vondele
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@noobpwnftw @CoffeeOne If you have a chance to add some >=31 core machines to fishtest, it would help with this patch. TIA.

@CoffeeOne
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added one machine. Nevertheless in my opinion that should only be committed when it gains Elo, not as a simplification.

@vondele
Copy link
Owner Author

@vondele vondele commented on 5a59835 Jan 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, hope is a patch along these lines brings elo. The current scheme being hardcoded to 20 threads is not satisfactory, especially not in light of the cluster branch. Note that this patch is exactly equal to master for 1-7 threads, and very similar (identical skipsizes) to 20 threads.... However, so far, it doesn't seem better to have more threads with higher skipsizes.

@mstembera
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test showed that even skip size 4 isn't of very much benefit so even larger makes little sense.
http://tests.stockfishchess.org/tests/view/5b3d3e940ebc5902b9ffdf06

@vondele
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mstembera thanks for the links, that's useful info I had forgotten.

Let's reflect however on what happens with e.g. 512 threads (think cluster branch).. at that point it is not so obvious that skipSize 4, 5... would not be beneficial. I don't know the answer, so we need to get some insight in that question.

@CoffeeOne
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vondele I think you are overestimating the value of the skipping scheme.
Not very long time ago, there was a test series on @noobpwnftw 's 192c/384t machine. I remember that jumping from 192 => to 383 threads still brought some Elo, and the gain from 64 threads to 384 threads was huge. That means stockfish's lazy SMP is not strongly dependent on the skipping scheme. The skipping scheme repeats after 20 threads, right? So with 383 threads there are always 19(!) threads with the same skipping scheme, but performance is still great.

Also I remember that the tests when introducing the skipping scheme were really lengthy. There were tests with super short time control 2" +0.02 on 32 threads, that passed very quickly with huge elo gain, but on slower time control things changed.

Let's go in the other direction and try only maximum skip size of 2 (?)

@vondele
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really overestimating the value, I'm just wondering what is optimal and why. The current choice of stopping at 20 seems arbitrary.

The motivation for this are the results of the cluster branch: official-stockfish#1931 (comment)
where the scaling around 512 threads is sub-optimal (at low threads Elo increases linearly with the log of the number of threads added, i.e. much like doubling time roughly gives constant Elo gain).

We might be touching the limits of lazySMP, but we need tests to figure out why, and eventually try fix this.. It is obviously not going to be easy. However, better is to have a simple model one can reason about (I think this patch goes in this direction), which one can test, falsify or verify.

@CoffeeOne
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we might be touching the limits of SMP, also cluster's scaling can be different to normal (1-node) stockfish's scaling.
When this test fails (assuming that the change gave an Elo loss), I will submit a test will smaller skip sizes, for example repeating the current scheme with more than 8 threads, I would like to know how it performs in comparison.

@vondele
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on a single node, scaling is similar with threading and MPI (not quite as good however, similar to the how threading is worse than equivalent 'time odds' ).

As a next test, I wanted to increase the 'prefactor', i.e. the 1.92 in the formula (increasing it to 2.4 will limit the skipSize to 3 for up to 31 threads). I'd like to experiment with 2.26 (limits skipSize to 3 for up to 25 threads).

@vondele
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

@CoffeeOne
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean you will submit a test with 31 threads, where threads 1-25 have similar skipping scheme than current master, and only threads 26, 27, 28, 29, 30, 31 have another skipping scheme?

I do not think that's a good idea, because the effect of that change will be very small. Therefore the test results will be very noisy, I expect a high number of games.

@CoffeeOne
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, OK, according to the figure, I was wrong, so the test can make sense :)

@pb00068
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the time when the skipping scheme was introduced we had a shared continuation-history (countermoves & follow-moves) between threads and no capture-history at all. Thus without spreading the threads over different depths we ran into the risk to have almost identical search-trees in various threads and wasting resources in doing duplicate work.
Now that each thread has it's own sophisticated histories I think the benefit of the skipping scheme has less impact than it had at it's introduction,

@vondele
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be indeed that it has become less important. However, I think the problem of too similar search trees still exists at many threads. We should aim for reaching higher depth with more threads.

@31m059
Copy link

@31m059 31m059 commented on 5a59835 Jan 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vondele Your test is approved. (I'm letting you know in a comment, because under the current system, checking is a bit difficult.)

Another trick: to view/approve tests in general, not just from a specific user, Actions is still up:
http://tests.stockfishchess.org/actions

Please sign in to comment.