Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
SkipPhase removal was tested by protonspring (official-stockfish#1835). STC (+1 offset) (3 threads) LLR: 2.95 (-2.94,2.94) [-3.00,1.00] Total: 28428 W: 6278 L: 6170 D: 15980 http://tests.stockfishchess.org/tests/view/5bfe01c20ebc5902bceda021 STC (+1 offset) (8 threads) LLR: 2.95 (-2.94,2.94) [-3.00,1.00] Total: 26002 W: 5082 L: 4970 D: 15950 http://tests.stockfishchess.org/tests/view/5bfe132c0ebc5902bceda12f SkipSize for threads 1-20 can be captured exactly by skipSize = int(std::log(idx + 1) / std::log(1.92)); where logarithmic growth seems natural (with a base similar to the branching factor). The formula extends to larger thread counts.
- Loading branch information
5a59835
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@noobpwnftw @CoffeeOne If you have a chance to add some >=31 core machines to fishtest, it would help with this patch. TIA.
5a59835
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added one machine. Nevertheless in my opinion that should only be committed when it gains Elo, not as a simplification.
5a59835
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, hope is a patch along these lines brings elo. The current scheme being hardcoded to 20 threads is not satisfactory, especially not in light of the cluster branch. Note that this patch is exactly equal to master for 1-7 threads, and very similar (identical skipsizes) to 20 threads.... However, so far, it doesn't seem better to have more threads with higher skipsizes.
5a59835
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test showed that even skip size 4 isn't of very much benefit so even larger makes little sense.
http://tests.stockfishchess.org/tests/view/5b3d3e940ebc5902b9ffdf06
5a59835
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mstembera thanks for the links, that's useful info I had forgotten.
Let's reflect however on what happens with e.g. 512 threads (think cluster branch).. at that point it is not so obvious that skipSize 4, 5... would not be beneficial. I don't know the answer, so we need to get some insight in that question.
5a59835
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vondele I think you are overestimating the value of the skipping scheme.
Not very long time ago, there was a test series on @noobpwnftw 's 192c/384t machine. I remember that jumping from 192 => to 383 threads still brought some Elo, and the gain from 64 threads to 384 threads was huge. That means stockfish's lazy SMP is not strongly dependent on the skipping scheme. The skipping scheme repeats after 20 threads, right? So with 383 threads there are always 19(!) threads with the same skipping scheme, but performance is still great.
Also I remember that the tests when introducing the skipping scheme were really lengthy. There were tests with super short time control 2" +0.02 on 32 threads, that passed very quickly with huge elo gain, but on slower time control things changed.
Let's go in the other direction and try only maximum skip size of 2 (?)
5a59835
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really overestimating the value, I'm just wondering what is optimal and why. The current choice of stopping at 20 seems arbitrary.
The motivation for this are the results of the cluster branch: official-stockfish#1931 (comment)
where the scaling around 512 threads is sub-optimal (at low threads Elo increases linearly with the log of the number of threads added, i.e. much like doubling time roughly gives constant Elo gain).
We might be touching the limits of lazySMP, but we need tests to figure out why, and eventually try fix this.. It is obviously not going to be easy. However, better is to have a simple model one can reason about (I think this patch goes in this direction), which one can test, falsify or verify.
5a59835
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we might be touching the limits of SMP, also cluster's scaling can be different to normal (1-node) stockfish's scaling.
When this test fails (assuming that the change gave an Elo loss), I will submit a test will smaller skip sizes, for example repeating the current scheme with more than 8 threads, I would like to know how it performs in comparison.
5a59835
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on a single node, scaling is similar with threading and MPI (not quite as good however, similar to the how threading is worse than equivalent 'time odds' ).
As a next test, I wanted to increase the 'prefactor', i.e. the 1.92 in the formula (increasing it to 2.4 will limit the skipSize to 3 for up to 31 threads). I'd like to experiment with 2.26 (limits skipSize to 3 for up to 25 threads).
5a59835
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
5a59835
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean you will submit a test with 31 threads, where threads 1-25 have similar skipping scheme than current master, and only threads 26, 27, 28, 29, 30, 31 have another skipping scheme?
I do not think that's a good idea, because the effect of that change will be very small. Therefore the test results will be very noisy, I expect a high number of games.
5a59835
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, OK, according to the figure, I was wrong, so the test can make sense :)
5a59835
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the time when the skipping scheme was introduced we had a shared continuation-history (countermoves & follow-moves) between threads and no capture-history at all. Thus without spreading the threads over different depths we ran into the risk to have almost identical search-trees in various threads and wasting resources in doing duplicate work.
Now that each thread has it's own sophisticated histories I think the benefit of the skipping scheme has less impact than it had at it's introduction,
5a59835
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be indeed that it has become less important. However, I think the problem of too similar search trees still exists at many threads. We should aim for reaching higher depth with more threads.
5a59835
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vondele Your test is approved. (I'm letting you know in a comment, because under the current system, checking is a bit difficult.)
Another trick: to view/approve tests in general, not just from a specific user, Actions is still up:
http://tests.stockfishchess.org/actions