Update NNUE architecture to SFNNv8: L1-2560 nn-ac1dbea57aa3.nnue #4795
Conversation
Creating this net involved:
- a 6-stage training process from scratch. The datasets used in stages 1-5 were fully minimized.
- permuting L1 weights with official-stockfish/nnue-pytorch#254

A strong epoch after each training stage was chosen for the next. The 6 stages were:

```
1. 400 epochs, lambda 1.0, default LR and gamma
   UHOx2-wIsRight-multinet-dfrc-n5000 (135G)
     nodes5000pv2_UHO.binpack
     data_pv-2_diff-100_nodes-5000.binpack
     wrongIsRight_nodes5000pv2.binpack
     multinet_pv-2_diff-100_nodes-5000.binpack
     dfrc_n5000.binpack

2. 800 epochs, end-lambda 0.75, LR 4.375e-4, gamma 0.995, skip 12
   LeelaFarseer-T78juntoaugT79marT80dec.binpack (141G)
     T60T70wIsRightFarseerT60T74T75T76.binpack
     test78-junjulaug2022-16tb7p.no-db.min.binpack
     test79-mar2022-16tb7p.no-db.min.binpack
     test80-dec2022-16tb7p.no-db.min.binpack

3. 800 epochs, end-lambda 0.725, LR 4.375e-4, gamma 0.995, skip 20
   leela93-v1-dfrc99-v2-T78juntosepT80jan-v6dd-T78janfebT79aprT80aprmay.min.binpack
     leela93-filt-v1.min.binpack
     dfrc99-16tb7p-filt-v2.min.binpack
     test78-juntosep2022-16tb7p-filter-v6-dd.min-mar2023.binpack
     test80-jan2023-3of3-16tb7p-filter-v6-dd.min-mar2023.binpack
     test78-janfeb2022-16tb7p.min.binpack
     test79-apr2022-16tb7p.min.binpack
     test80-apr2022-16tb7p.min.binpack
     test80-may2022-16tb7p.min.binpack

4. 800 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 24
   leela96-dfrc99-v2-T78juntosepT79mayT80junsepnovjan-v6dd-T80mar23-v6-T60novdecT77decT78aprmayT79aprT80may23.min.binpack
     leela96-filt-v2.min.binpack
     dfrc99-16tb7p-filt-v2.min.binpack
     test78-juntosep2022-16tb7p-filter-v6-dd.min-mar2023.binpack
     test79-may2022-16tb7p.filter-v6-dd.min.binpack
     test80-jun2022-16tb7p.filter-v6-dd.min.binpack
     test80-sep2022-16tb7p.filter-v6-dd.min.binpack
     test80-nov2022-16tb7p.filter-v6-dd.min.binpack
     test80-jan2023-3of3-16tb7p-filter-v6-dd.min-mar2023.binpack
     test80-mar2023-2tb7p.v6-sk16.min.binpack
     test60-novdec2021-16tb7p.min.binpack
     test77-dec2021-16tb7p.min.binpack
     test78-aprmay2022-16tb7p.min.binpack
     test79-apr2022-16tb7p.min.binpack
     test80-may2023-2tb7p.min.binpack

5. 960 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 28
   Increased max-epoch to 960 near the end of the first 800 epochs
   5af11540bbfe dataset: official-stockfish#4635

6. 1000 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 28
   Increased max-epoch to 1000 near the end of the first 800 epochs
   1ee1aba5ed dataset: official-stockfish#4782
```

L1 weights permuted with:

```bash
python3 serialize.py $nnue $nnue_permuted \
  --features=HalfKAv2_hm \
  --ft_optimize \
  --ft_optimize_data=/data/fishpack32.binpack \
  --ft_optimize_count=10000
```

Speed measurements from 100 bench runs at depth 13 with profile-build x86-64-avx2:

```
sf_base = 1329051 +/- 2224 (95%)
sf_test = 1163344 +/- 2992 (95%)
diff    = -165706 +/- 4913 (95%)
speedup = -12.46807% +/- 0.370% (95%)
```

Training data can be found at:
https://robotmoon.com/nnue-training-data/
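For orientation, each stage above corresponds to one nnue-pytorch training run. The command below is a hedged sketch only, patterned on typical easy_train.py invocations; the flag names and the experiment/dataset names are assumptions for illustration, not the exact command used for this net.

```bash
# Hedged sketch: launching a stage roughly like stage 5 above via
# nnue-pytorch's easy_train.py. Flag names are assumptions and may differ
# between trainer versions; experiment/dataset names are placeholders.
python3 easy_train.py \
    --experiment-name sfnnv8-stage5 \
    --training-dataset /data/stage5-interleaved.binpack \
    --start-lambda 1.0 \
    --end-lambda 0.7 \
    --lr 4.375e-4 \
    --gamma 0.995 \
    --early-fen-skipping 28 \
    --max_epoch 960 \
    --gpus 0
```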
Local elo at 25k nodes per move (vs. L1-2048 nn-1ee1aba5ed4c.nnue)
ep959 : 16.2 +/- 2.3

Failed 10+0.1 STC:
https://tests.stockfishchess.org/tests/view/6501beee2cd016da89abab21
LLR: -2.92 (-2.94,2.94) <0.00,2.00>
Total: 13184 W: 3285 L: 3535 D: 6364
Ptnml(0-2): 85, 1662, 3334, 1440, 71

Failed 180+1.8 VLTC:
https://tests.stockfishchess.org/tests/view/6505cf9a72620bc881ea908e
LLR: -2.94 (-2.94,2.94) <0.00,2.00>
Total: 64248 W: 16224 L: 16374 D: 31650
Ptnml(0-2): 26, 6788, 18640, 6650, 20

Passed 60+0.6 th 8 VLTC SMP (STC bounds):
https://tests.stockfishchess.org/tests/view/65084a4618698b74c2e541dc
LLR: 2.95 (-2.94,2.94) <0.00,2.00>
Total: 90630 W: 23372 L: 23033 D: 44225
Ptnml(0-2): 13, 8490, 27968, 8833, 11

Passed 60+0.6 th 8 VLTC SMP:
https://tests.stockfishchess.org/tests/view/6501d45d2cd016da89abacdb
LLR: 2.95 (-2.94,2.94) <0.50,2.50>
Total: 137804 W: 35764 L: 35276 D: 66764
Ptnml(0-2): 31, 13006, 42326, 13522, 17

bench 1246812
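For reference, the bench signature ("bench 1246812") and the speed numbers above come from Stockfish's built-in bench command. A minimal sketch for reproducing a single-binary measurement is below; the paired sf_base/sf_test statistics with confidence intervals come from a separate comparison script, not from this loop.

```bash
# Build an optimized binary, print the deterministic bench signature
# ("Nodes searched"), and sample nodes/second over repeated runs at the
# default bench depth of 13.
make -j profile-build ARCH=x86-64-avx2
./stockfish bench 2>&1 | grep "Nodes searched"
for i in $(seq 1 100); do
    ./stockfish bench 2>&1 | grep "Nodes/second"
done
```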
So I see this wasn't trained with official-stockfish/nnue-pytorch#259? I don't see the benefits at this point. The process stays complicated, the net gets larger, but the gains are within noise. I'd be in favor of this only if it simplifies the training process, even if it's as little as getting rid of these large interleaved binpacks. Could we maybe simplify with the current arch?
Correct, this net is based on a training that started in May, about 2 months before that PR was opened.
These are simplifications vs. the current L1-2048 master training run:
It's possible there would have been more elo gains if this training had instead used larger/more-randomized binpacks and the previous complicated weight permutation process.
If a simpler training process can't pass SPRT, what are the criteria for whether it can be accepted? There's a balance between simplifying training and maximizing elo. We have to pick 2 of 3:
As long as gaining elo is the top priority, training simplifications will naturally follow sometime later, unless one is willing to spend significant time trying to optimize both at once.
At this pace we will end up with an 8192-wide L1 before anyone else is able to reproduce the network.
Let me post an additional measurement Sopel did (https://tests.stockfishchess.org/tests/view/650c77c6fb151d43ae6d51dd) showing the master net is roughly 30 Elo stronger than an old master net with a simpler training procedure. I believe that shows significant progress has indeed been made: the training protocol is complex and the datasets large, but the Elo results are quite impressive. The larger network sizes have so far shown quite consistently good scaling with TC, i.e. seemingly growing benefit at larger TC, which is consistent with intuition, and they are clearly strong at fixed nodes. This could be contributing to the good performance in some of the ongoing tournaments. Reducing nps is actually also a good thing when it comes to hash pressure, i.e. less hash is needed for the same analysis time.

Having said all these positive things about the evolution of the nets, clearly, picking up training for new contributors, or for people who had a break in training (like myself), is pretty difficult. It is essential that we are able to keep the process reproducible, and simple enough that we can improve on it. While I think linrock does a great job of describing in words what the process is, and of providing the needed data, this really is a software engineering task. Ideally, the whole process could be reproduced starting from a single declarative file (e.g. a json that documents all datasets and parameters). Our easy_train.py is a first step, and I know we have pending PRs on nnue-pytorch that make good steps in that direction (e.g. official-stockfish/nnue-pytorch#257). I can only encourage this effort, and I will, in a couple of months, pick up training again.
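To make that idea a bit more concrete, here is a purely hypothetical sketch of what such a declarative file could contain for a run like the one in this PR. The schema is invented and no tool consumes it today; dataset lists are abbreviated.

```bash
# Hypothetical sketch of a single declarative training description
# (invented schema; nothing in nnue-pytorch reads such a file today).
cat > sfnnv8-training.json <<'EOF'
{
  "architecture": "SFNNv8, L1-2560",
  "feature_set": "HalfKAv2_hm",
  "stages": [
    {"epochs": 400, "lambda": 1.0,
     "datasets": ["nodes5000pv2_UHO.binpack", "dfrc_n5000.binpack"]},
    {"epochs": 800, "end_lambda": 0.75, "lr": 4.375e-4, "gamma": 0.995, "skip": 12,
     "datasets": ["test78-junjulaug2022-16tb7p.no-db.min.binpack"]}
  ],
  "l1_permutation": {"ft_optimize_data": "/data/fishpack32.binpack",
                     "ft_optimize_count": 10000}
}
EOF
```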
I am probably not the first to have this idea, but we could have a second small/fast net to use for our simple eval when the material advantage already looks decisive.
Yes, the idea is around, but nobody has implemented and tried it.
Like, none of these files exist. How do I form this dataset?
https://www.kaggle.com/datasets/linrock/leela96-filt-v2-min
https://www.kaggle.com/datasets/linrock/t80augtooctt79aprt78aprtosep-v6-mar2023min
https://www.kaggle.com/datasets/linrock/0dd1cebea57-misc-v6-dd
https://www.kaggle.com/datasets/linrock/0dd1cebea57-test80-v6-dd/versions/2
https://www.kaggle.com/datasets/linrock/test80-mar2023-2tb7p-v6-sk16
https://www.kaggle.com/datasets/linrock/nn-1e7ca356472e-t60-t79
https://www.kaggle.com/datasets/linrock/test77-dec2021-16tb7p-84p
https://www.kaggle.com/datasets/linrock/test78-aprmayjunjul2022-16tb7p
https://www.kaggle.com/datasets/linrock/test79-apr2022-16tb7p
https://www.kaggle.com/datasets/linrock/1ee1aba5ed-test80-martojul2023-2tb7p

The filenames may vary a bit between this description and whatever was uploaded to kaggle. Aside from small differences in filenames, the main things to notice are:
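Separately from those notes, each of the links above is a public Kaggle dataset, so it can be fetched with the kaggle CLI (requires a Kaggle API token in ~/.kaggle/kaggle.json), for example:

```bash
# Fetch one of the datasets listed above with the kaggle CLI.
pip install kaggle
kaggle datasets download -d linrock/leela96-filt-v2-min
unzip leela96-filt-v2-min.zip   # the archive name follows the dataset slug
```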
Also, I know the dataset situation is quite messy. It would be amazing if we could host public datasets by simply rsync'ing onto a remote server; that would free up a lot of time for keeping the datasets tidy. Unfortunately, having to manually manage data for uploading to kaggle is kind of a grind. It's currently hard to prioritize keeping the dataset simple vs. elo-gainer research, since I'm handling the datasets mostly manually and large portions of the dataset are constantly changing.
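As a sketch of that wished-for rsync workflow (the host and paths below are made-up placeholders, not a real service), publishing a directory of binpacks would be a one-liner:

```bash
# Hypothetical example only: the server and paths are placeholders.
rsync -avP ./binpacks/ nnue-data@example.org:/srv/nnue-datasets/binpacks/
```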
This is a later epoch from the same experiment that led to the previous master net. In training stage 6, max-epoch was raised to 1,200 near the end of the first 1,000 epochs. For more details, see official-stockfish#4795.

Local elo at 25k nodes per move (vs. L1-2048 nn-1ee1aba5ed4c.nnue)
ep1079 : 15.6 +/- 1.2

Passed STC:
https://tests.stockfishchess.org/tests/view/651503b3b3e74811c8af1e2a
LLR: 2.94 (-2.94,2.94) <0.00,2.00>
Total: 29408 W: 7607 L: 7304 D: 14497
Ptnml(0-2): 97, 3277, 7650, 3586, 94

Passed LTC:
https://tests.stockfishchess.org/tests/view/651585ceb3e74811c8af2a5f
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 73164 W: 18828 L: 18440 D: 35896
Ptnml(0-2): 30, 7749, 20644, 8121, 38

bench 1306282
This is a later epoch from the same experiment that led to the previous master net. In training stage 6, max-epoch was raised to 1,200 near the end of the first 1,000 epochs. For more details, see #4795.

Local elo at 25k nodes per move (vs. L1-2048 nn-1ee1aba5ed4c.nnue)
ep1079 : 15.6 +/- 1.2

Passed STC:
https://tests.stockfishchess.org/tests/view/651503b3b3e74811c8af1e2a
LLR: 2.94 (-2.94,2.94) <0.00,2.00>
Total: 29408 W: 7607 L: 7304 D: 14497
Ptnml(0-2): 97, 3277, 7650, 3586, 94

Passed LTC:
https://tests.stockfishchess.org/tests/view/651585ceb3e74811c8af2a5f
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 73164 W: 18828 L: 18440 D: 35896
Ptnml(0-2): 30, 7749, 20644, 8121, 38

closes #4810

Bench: 1453057
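As a rough way to read results like the LTC run above, the win/loss/draw totals can be turned into a score fraction and a logistic Elo estimate. Fishtest computes this properly, with pentanomial error bars; the snippet below is only a back-of-the-envelope check.

```bash
# Back-of-the-envelope Elo estimate from the "Passed LTC" totals above.
W=18828 L=18440 D=35896
awk -v w=$W -v l=$L -v d=$D 'BEGIN {
    n = w + l + d
    s = (w + 0.5 * d) / n                  # score fraction from W/L/D
    elo = -400 * log(1 / s - 1) / log(10)  # logistic Elo from the score
    printf "score = %.5f, elo = %+.2f\n", s, elo
}'
```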