FT optimization utility. Integrate with serialization. #254

Sopel97 · 2023-06-28T11:15:55Z

This is a small rewrite of https://github.com/Ergodice/nnue-pytorch/tree/ftperm. Big thanks to @Ergodice for the great FT weight permutation optimization algorithm. I kept it as is, only reducing batch size in one place - needs revisiting later to clean up and potentially support cpu-only optimization.

ftperm.py can be used standalone to go through the optimization process step by step. serialize.py integration allows optionally including the full process (without intermediate files) to be performed during serialization. The optimization is NOT performed by default, as it takes very long and requires additional parameters.

Usage notes copied from notes in ftperm.py:


NOTE: This script uses CUDA and may requires large amounts of VRAM. Decrease --count if encountering problems.

Example use:

1. Generate the activation matrix for some sample dataset.

python ftperm.py gather --data=data\fishpack32.binpack --net=networks\nn-5af11540bbfe.nnue --count=1000000 --features=HalfKAv2_hm --out ftact1m.npy

python ftperm.py gather --data=noob_master_leaf_static_d12_85M_0.binpack --net=nn-5af11540bbfe.nnue --count=10000 --features=HalfKAv2_hm --out ftact1m.npy

2. Find a permutation

python ftperm.py find_perm --data=ftact1m.npy --out=ftact.perm

3. Test the permutation against the baseline

python ftperm.py eval_perm --data=ftact1m.npy --perm=ftact.perm

4. Apply permutation and save
python serialize.py nn-5af11540bbfe.nnue permuted.nnue --features=HalfKAv2_hm --ft_perm=ftact.perm

----------------------------------------------------------------

OR do the whole process in one step

python serialize.py networks\nn-5af11540bbfe.nnue permuted.nnue --features=HalfKAv2_hm --ft_optimize --ft_optimize_data=data\fishpack32.binpack --ft_optimize_count=1000000

python serialize.py nn-5af11540bbfe.nnue permuted.nnue --features=HalfKAv2_hm --ft_optimize --ft_optimize_data=noob_master_leaf_static_d12_85M_0.binpack --ft_optimize_count=10000

After this is merged the following things should be looked at:

How many data points we actually need. I got good results with 10000, 1M maybe excessive.
Find out ideal batch size in get_score_change. Should be smallest possible as it affects VRAM usage. I reduced it to 10000 already. With 1GB of VRAM limit I had to reduce it to 100, didn't notice any issues.
Figure out if make_swaps_3 can be optimized with cupy. It takes by far the longest.
Add a cpu-only mode for VRAM limited systems.

Sopel97 · 2023-06-30T12:22:09Z

I have improved the readability and maintainability of the algorithm code. I also made sure it works both with cupy and numpy. Initially I was wrong, cupy was being used for make_swaps_3, it was numpy that wasn't.

@Ergodice could you please check if the comments I added make sense? I wasn't sure in all places.

I also believe that make_swaps_3 could be made faster by reusing old max_values and only updating it along the changed blocks (as this is quite sparse given the size of it). But this is for another time to expore.

Ergodice · 2023-06-30T19:39:26Z

Very elegant refactor!

A couple things:
On lines 95 and 159 the shape of actmat is not (N, L1) but (N, L1 // 2).
The zeroing out on lines 265-268 is not done to prevent redundancy but because the entries are only computed correctly when they are all from different blocks.

faster permutation of master net weights Activation data taken from https://drive.google.com/drive/folders/1Ec9YuuRx4N03GPnVPoQOW70eucOKngQe?usp=sharing Permutation found using https://github.com/Ergodice/nnue-pytorch/blob/836387a0e5e690431d404158c46648710f13904d/ftperm.py See also official-stockfish/nnue-pytorch#254 The algorithm greedily selects 2- and 3-cycles that can be permuted to increase the number of runs of zeroes. The percent of zero runs from the master net increased from 68.46 to 70.11 from 2-cycles and only increased to 70.32 when considering 3-cycles. Interestingly, allowing both halves of L1 to intermix when creating zero runs can give another 0.5% zero-run density increase with this method. Measured speedup: ``` CPU: 16 x AMD Ryzen 9 3950X 16-Core Processor Result of 50 runs base (./stockfish.master ) = 1561556 +/- 5439 test (./stockfish.patch ) = 1575788 +/- 5427 diff = +14231 +/- 2636 speedup = +0.0091 P(speedup > 0) = 1.0000 ``` closes official-stockfish#4640 No functional change

vondele · 2023-07-06T16:53:07Z

this seems to have conflicts, and unaddressed review comments?

Sopel97 · 2023-07-06T17:12:23Z

yes, I'll address them later

…th serialize.py

Sopel97 · 2023-07-07T13:26:51Z

Resolved conflics and addressed the comments.

@Ergodice

faster permutation of master net weights Activation data taken from https://drive.google.com/drive/folders/1Ec9YuuRx4N03GPnVPoQOW70eucOKngQe?usp=sharing Permutation found using https://github.com/Ergodice/nnue-pytorch/blob/836387a0e5e690431d404158c46648710f13904d/ftperm.py See also official-stockfish/nnue-pytorch#254 The algorithm greedily selects 2- and 3-cycles that can be permuted to increase the number of runs of zeroes. The percent of zero runs from the master net increased from 68.46 to 70.11 from 2-cycles and only increased to 70.32 when considering 3-cycles. Interestingly, allowing both halves of L1 to intermix when creating zero runs can give another 0.5% zero-run density increase with this method. Measured speedup: ``` CPU: 16 x AMD Ryzen 9 3950X 16-Core Processor Result of 50 runs base (./stockfish.master ) = 1561556 +/- 5439 test (./stockfish.patch ) = 1575788 +/- 5427 diff = +14231 +/- 2636 speedup = +0.0091 P(speedup > 0) = 1.0000 ``` closes #4640 No functional change master stockfish-dev-20230706-e699fee5 @Ergodice @vondele Ergodice authored

Creating this net involved: - a 6-stage training process from scratch - permuting L1 weights with official-stockfish/nnue-pytorch#254 A strong epoch after each training stage was chosen for the next. The 6 stages were: 1. 400 epochs, lambda 1.0, default LR and gamma UHOx2-wIsRight-multinet-dfrc-n5000 (135G) nodes5000pv2_UHO.binpack data_pv-2_diff-100_nodes-5000.binpack wrongIsRight_nodes5000pv2.binpack multinet_pv-2_diff-100_nodes-5000.binpack dfrc_n5000.binpack 2. 800 epochs, end-lambda 0.75, LR 4.375e-4, gamma 0.995, skip 12 LeelaFarseer-T78juntoaugT79marT80dec.binpack (141G) T60T70wIsRightFarseerT60T74T75T76.binpack test78-junjulaug2022-16tb7p.no-db.min.binpack test79-mar2022-16tb7p.no-db.min.binpack test80-dec2022-16tb7p.no-db.min.binpack 3. 800 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 20 leela93-v1-dfrc99-v2-T78juntosepT80jan-v6dd-T78janfebT79aprT80aprmay.min.binpack leela93-filt-v1.min.binpack dfrc99-16tb7p-filt-v2.min.binpack test78-juntosep2022-16tb7p-filter-v6-dd.min-mar2023.min.binpack test80-jan2023-16tb7p.v6-sk20.min.binpack test78-janfeb2022-16tb7p.min.binpack test79-apr2022-16tb7p.min.binpack test80-apr2022-16tb7p.min.binpack test80-may2022-16tb7p.min.binpack 4. 800 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 24 leela96-dfrc99-v2-T78juntosepT79mayT80junsepnovjan-v6dd-T80mar23-v6-T60novdecT77decT78aprmayT79aprT80may23.min.binpack leela96-filt-v2.min.binpack dfrc99-16tb7p-filt-v2.min.binpack test78-juntosep2022-16tb7p-filter-v6-dd.min.binpack test79-may2022-16tb7p.filter-v6-dd.min.binpack test80-jun2022-16tb7p.filter-v6-dd.min.binpack test80-sep2022-16tb7p.filter-v6-dd.min.binpack test80-nov2022-16tb7p.filter-v6-dd.min.binpack test80-jan2023-2tb7p.filter-v6-dd.min.binpack test80-mar2023-2tb7p.v6-sk16.min.binpack test60-novdec2021-16tb7p.min.binpack test77-dec2021-16tb7p.min.binpack test78-aprmay2022-16tb7p.min.binpack test79-apr2022-16tb7p.min.binpack test80-may2023-2tb7p.min.binpack 5. 960 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 28 Increased max-epoch to 960 near the end of the first 800 epochs 5af11540bbfe dataset: official-stockfish#4635 6. 1000 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 28 Increased max-epoch to 1000 near the end of the first 800 epochs 1ee1aba5ed dataset: official-stockfish#4782 L1 weights permuted with: ``` python3 serialize.py $nnue $nnue_permuted \ --features=HalfKAv2_hm \ --ft_optimize \ --ft_optimize_data=/data/fishpack32.binpack \ --ft_optimize_count=10000 ``` Speed measurements from 100 bench runs at depth 13 with profile-build x86-64-avx2: ``` sf_base = 1329051 +/- 2224 (95%) sf_test = 1163344 +/- 2992 (95%) diff = -165706 +/- 4913 (95%) speedup = -12.46807% +/- 0.370% (95%) ``` Training data can be found at: https://robotmoon.com/nnue-training-data/ Local elo at 25k nodes per move (vs. L1-2048 nn-1ee1aba5ed4c.nnue) ep959 : 16.2 +/- 2.3 Failed 10+0.1 STC: https://tests.stockfishchess.org/tests/view/6501beee2cd016da89abab21 LLR: -2.92 (-2.94,2.94) <0.00,2.00> Total: 13184 W: 3285 L: 3535 D: 6364 Ptnml(0-2): 85, 1662, 3334, 1440, 71 Failed 180+1.8 VLTC: https://tests.stockfishchess.org/tests/view/6505cf9a72620bc881ea908e LLR: -2.94 (-2.94,2.94) <0.00,2.00> Total: 64248 W: 16224 L: 16374 D: 31650 Ptnml(0-2): 26, 6788, 18640, 6650, 20 Passed 60+0.6 th 8 VLTC SMP (STC bounds): https://tests.stockfishchess.org/tests/view/65084a4618698b74c2e541dc LLR: 2.95 (-2.94,2.94) <0.00,2.00> Total: 90630 W: 23372 L: 23033 D: 44225 Ptnml(0-2): 13, 8490, 27968, 8833, 11 Passed 60+0.6 th 8 VLTC SMP: https://tests.stockfishchess.org/tests/view/6501d45d2cd016da89abacdb LLR: 2.95 (-2.94,2.94) <0.50,2.50> Total: 137804 W: 35764 L: 35276 D: 66764 Ptnml(0-2): 31, 13006, 42326, 13522, 17 bench 1246812

Creating this net involved: - a 6-stage training process from scratch - permuting L1 weights with official-stockfish/nnue-pytorch#254 A strong epoch after each training stage was chosen for the next. The 6 stages were: 1. 400 epochs, lambda 1.0, default LR and gamma UHOx2-wIsRight-multinet-dfrc-n5000 (135G) nodes5000pv2_UHO.binpack data_pv-2_diff-100_nodes-5000.binpack wrongIsRight_nodes5000pv2.binpack multinet_pv-2_diff-100_nodes-5000.binpack dfrc_n5000.binpack 2. 800 epochs, end-lambda 0.75, LR 4.375e-4, gamma 0.995, skip 12 LeelaFarseer-T78juntoaugT79marT80dec.binpack (141G) T60T70wIsRightFarseerT60T74T75T76.binpack test78-junjulaug2022-16tb7p.no-db.min.binpack test79-mar2022-16tb7p.no-db.min.binpack test80-dec2022-16tb7p.no-db.min.binpack 3. 800 epochs, end-lambda 0.725, LR 4.375e-4, gamma 0.995, skip 20 leela93-v1-dfrc99-v2-T78juntosepT80jan-v6dd-T78janfebT79aprT80aprmay.min.binpack leela93-filt-v1.min.binpack dfrc99-16tb7p-filt-v2.min.binpack test78-juntosep2022-16tb7p-filter-v6-dd.min-mar2023.min.binpack test80-jan2023-3of3-16tb7p-filter-v6-dd.min-mar2023.binpack test78-janfeb2022-16tb7p.min.binpack test79-apr2022-16tb7p.min.binpack test80-apr2022-16tb7p.min.binpack test80-may2022-16tb7p.min.binpack 4. 800 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 24 leela96-dfrc99-v2-T78juntosepT79mayT80junsepnovjan-v6dd-T80mar23-v6-T60novdecT77decT78aprmayT79aprT80may23.min.binpack leela96-filt-v2.min.binpack dfrc99-16tb7p-filt-v2.min.binpack test78-juntosep2022-16tb7p-filter-v6-dd.min.binpack test79-may2022-16tb7p.filter-v6-dd.min.binpack test80-jun2022-16tb7p.filter-v6-dd.min.binpack test80-sep2022-16tb7p.filter-v6-dd.min.binpack test80-nov2022-16tb7p.filter-v6-dd.min.binpack test80-jan2023-2tb7p.filter-v6-dd.min.binpack test80-mar2023-2tb7p.v6-sk16.min.binpack test60-novdec2021-16tb7p.min.binpack test77-dec2021-16tb7p.min.binpack test78-aprmay2022-16tb7p.min.binpack test79-apr2022-16tb7p.min.binpack test80-may2023-2tb7p.min.binpack 5. 960 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 28 Increased max-epoch to 960 near the end of the first 800 epochs 5af11540bbfe dataset: official-stockfish#4635 6. 1000 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 28 Increased max-epoch to 1000 near the end of the first 800 epochs 1ee1aba5ed dataset: official-stockfish#4782 L1 weights permuted with: ``` python3 serialize.py $nnue $nnue_permuted \ --features=HalfKAv2_hm \ --ft_optimize \ --ft_optimize_data=/data/fishpack32.binpack \ --ft_optimize_count=10000 ``` Speed measurements from 100 bench runs at depth 13 with profile-build x86-64-avx2: ``` sf_base = 1329051 +/- 2224 (95%) sf_test = 1163344 +/- 2992 (95%) diff = -165706 +/- 4913 (95%) speedup = -12.46807% +/- 0.370% (95%) ``` Training data can be found at: https://robotmoon.com/nnue-training-data/ Local elo at 25k nodes per move (vs. L1-2048 nn-1ee1aba5ed4c.nnue) ep959 : 16.2 +/- 2.3 Failed 10+0.1 STC: https://tests.stockfishchess.org/tests/view/6501beee2cd016da89abab21 LLR: -2.92 (-2.94,2.94) <0.00,2.00> Total: 13184 W: 3285 L: 3535 D: 6364 Ptnml(0-2): 85, 1662, 3334, 1440, 71 Failed 180+1.8 VLTC: https://tests.stockfishchess.org/tests/view/6505cf9a72620bc881ea908e LLR: -2.94 (-2.94,2.94) <0.00,2.00> Total: 64248 W: 16224 L: 16374 D: 31650 Ptnml(0-2): 26, 6788, 18640, 6650, 20 Passed 60+0.6 th 8 VLTC SMP (STC bounds): https://tests.stockfishchess.org/tests/view/65084a4618698b74c2e541dc LLR: 2.95 (-2.94,2.94) <0.00,2.00> Total: 90630 W: 23372 L: 23033 D: 44225 Ptnml(0-2): 13, 8490, 27968, 8833, 11 Passed 60+0.6 th 8 VLTC SMP: https://tests.stockfishchess.org/tests/view/6501d45d2cd016da89abacdb LLR: 2.95 (-2.94,2.94) <0.50,2.50> Total: 137804 W: 35764 L: 35276 D: 66764 Ptnml(0-2): 31, 13006, 42326, 13522, 17 bench 1246812

Creating this net involved: - a 6-stage training process from scratch - permuting L1 weights with official-stockfish/nnue-pytorch#254 A strong epoch after each training stage was chosen for the next. The 6 stages were: ``` 1. 400 epochs, lambda 1.0, default LR and gamma UHOx2-wIsRight-multinet-dfrc-n5000 (135G) nodes5000pv2_UHO.binpack data_pv-2_diff-100_nodes-5000.binpack wrongIsRight_nodes5000pv2.binpack multinet_pv-2_diff-100_nodes-5000.binpack dfrc_n5000.binpack 2. 800 epochs, end-lambda 0.75, LR 4.375e-4, gamma 0.995, skip 12 LeelaFarseer-T78juntoaugT79marT80dec.binpack (141G) T60T70wIsRightFarseerT60T74T75T76.binpack test78-junjulaug2022-16tb7p.no-db.min.binpack test79-mar2022-16tb7p.no-db.min.binpack test80-dec2022-16tb7p.no-db.min.binpack 3. 800 epochs, end-lambda 0.725, LR 4.375e-4, gamma 0.995, skip 20 leela93-v1-dfrc99-v2-T78juntosepT80jan-v6dd-T78janfebT79aprT80aprmay.min.binpack leela93-filt-v1.min.binpack dfrc99-16tb7p-filt-v2.min.binpack test78-juntosep2022-16tb7p-filter-v6-dd.min-mar2023.min.binpack test80-jan2023-3of3-16tb7p-filter-v6-dd.min-mar2023.binpack test78-janfeb2022-16tb7p.min.binpack test79-apr2022-16tb7p.min.binpack test80-apr2022-16tb7p.min.binpack test80-may2022-16tb7p.min.binpack 4. 800 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 24 leela96-dfrc99-v2-T78juntosepT79mayT80junsepnovjan-v6dd-T80mar23-v6-T60novdecT77decT78aprmayT79aprT80may23.min.binpack leela96-filt-v2.min.binpack dfrc99-16tb7p-filt-v2.min.binpack test78-juntosep2022-16tb7p-filter-v6-dd.min.binpack test79-may2022-16tb7p.filter-v6-dd.min.binpack test80-jun2022-16tb7p.filter-v6-dd.min.binpack test80-sep2022-16tb7p.filter-v6-dd.min.binpack test80-nov2022-16tb7p.filter-v6-dd.min.binpack test80-jan2023-3of3-16tb7p-filter-v6-dd.min-mar2023.binpack test80-mar2023-2tb7p.v6-sk16.min.binpack test60-novdec2021-16tb7p.min.binpack test77-dec2021-16tb7p.min.binpack test78-aprmay2022-16tb7p.min.binpack test79-apr2022-16tb7p.min.binpack test80-may2023-2tb7p.min.binpack 5. 960 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 28 Increased max-epoch to 960 near the end of the first 800 epochs 5af11540bbfe dataset: official-stockfish#4635 6. 1000 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 28 Increased max-epoch to 1000 near the end of the first 800 epochs 1ee1aba5ed dataset: official-stockfish#4782 ``` L1 weights permuted with: ```bash python3 serialize.py $nnue $nnue_permuted \ --features=HalfKAv2_hm \ --ft_optimize \ --ft_optimize_data=/data/fishpack32.binpack \ --ft_optimize_count=10000 ``` Speed measurements from 100 bench runs at depth 13 with profile-build x86-64-avx2: ``` sf_base = 1329051 +/- 2224 (95%) sf_test = 1163344 +/- 2992 (95%) diff = -165706 +/- 4913 (95%) speedup = -12.46807% +/- 0.370% (95%) ``` Training data can be found at: https://robotmoon.com/nnue-training-data/ Local elo at 25k nodes per move (vs. L1-2048 nn-1ee1aba5ed4c.nnue) ep959 : 16.2 +/- 2.3 Failed 10+0.1 STC: https://tests.stockfishchess.org/tests/view/6501beee2cd016da89abab21 LLR: -2.92 (-2.94,2.94) <0.00,2.00> Total: 13184 W: 3285 L: 3535 D: 6364 Ptnml(0-2): 85, 1662, 3334, 1440, 71 Failed 180+1.8 VLTC: https://tests.stockfishchess.org/tests/view/6505cf9a72620bc881ea908e LLR: -2.94 (-2.94,2.94) <0.00,2.00> Total: 64248 W: 16224 L: 16374 D: 31650 Ptnml(0-2): 26, 6788, 18640, 6650, 20 Passed 60+0.6 th 8 VLTC SMP (STC bounds): https://tests.stockfishchess.org/tests/view/65084a4618698b74c2e541dc LLR: 2.95 (-2.94,2.94) <0.00,2.00> Total: 90630 W: 23372 L: 23033 D: 44225 Ptnml(0-2): 13, 8490, 27968, 8833, 11 Passed 60+0.6 th 8 VLTC SMP: https://tests.stockfishchess.org/tests/view/6501d45d2cd016da89abacdb LLR: 2.95 (-2.94,2.94) <0.50,2.50> Total: 137804 W: 35764 L: 35276 D: 66764 Ptnml(0-2): 31, 13006, 42326, 13522, 17 bench 1246812

Creating this net involved: - a 6-stage training process from scratch - permuting L1 weights with official-stockfish/nnue-pytorch#254 A strong epoch after each training stage was chosen for the next. The 6 stages were: ``` 1. 400 epochs, lambda 1.0, default LR and gamma UHOx2-wIsRight-multinet-dfrc-n5000 (135G) nodes5000pv2_UHO.binpack data_pv-2_diff-100_nodes-5000.binpack wrongIsRight_nodes5000pv2.binpack multinet_pv-2_diff-100_nodes-5000.binpack dfrc_n5000.binpack 2. 800 epochs, end-lambda 0.75, LR 4.375e-4, gamma 0.995, skip 12 LeelaFarseer-T78juntoaugT79marT80dec.binpack (141G) T60T70wIsRightFarseerT60T74T75T76.binpack test78-junjulaug2022-16tb7p.no-db.min.binpack test79-mar2022-16tb7p.no-db.min.binpack test80-dec2022-16tb7p.no-db.min.binpack 3. 800 epochs, end-lambda 0.725, LR 4.375e-4, gamma 0.995, skip 20 leela93-v1-dfrc99-v2-T78juntosepT80jan-v6dd-T78janfebT79aprT80aprmay.min.binpack leela93-filt-v1.min.binpack dfrc99-16tb7p-filt-v2.min.binpack test78-juntosep2022-16tb7p-filter-v6-dd.min-mar2023.binpack test80-jan2023-3of3-16tb7p-filter-v6-dd.min-mar2023.binpack test78-janfeb2022-16tb7p.min.binpack test79-apr2022-16tb7p.min.binpack test80-apr2022-16tb7p.min.binpack test80-may2022-16tb7p.min.binpack 4. 800 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 24 leela96-dfrc99-v2-T78juntosepT79mayT80junsepnovjan-v6dd-T80mar23-v6-T60novdecT77decT78aprmayT79aprT80may23.min.binpack leela96-filt-v2.min.binpack dfrc99-16tb7p-filt-v2.min.binpack test78-juntosep2022-16tb7p-filter-v6-dd.min-mar2023.binpack test79-may2022-16tb7p.filter-v6-dd.min.binpack test80-jun2022-16tb7p.filter-v6-dd.min.binpack test80-sep2022-16tb7p.filter-v6-dd.min.binpack test80-nov2022-16tb7p.filter-v6-dd.min.binpack test80-jan2023-3of3-16tb7p-filter-v6-dd.min-mar2023.binpack test80-mar2023-2tb7p.v6-sk16.min.binpack test60-novdec2021-16tb7p.min.binpack test77-dec2021-16tb7p.min.binpack test78-aprmay2022-16tb7p.min.binpack test79-apr2022-16tb7p.min.binpack test80-may2023-2tb7p.min.binpack 5. 960 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 28 Increased max-epoch to 960 near the end of the first 800 epochs 5af11540bbfe dataset: official-stockfish#4635 6. 1000 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 28 Increased max-epoch to 1000 near the end of the first 800 epochs 1ee1aba5ed dataset: official-stockfish#4782 ``` L1 weights permuted with: ```bash python3 serialize.py $nnue $nnue_permuted \ --features=HalfKAv2_hm \ --ft_optimize \ --ft_optimize_data=/data/fishpack32.binpack \ --ft_optimize_count=10000 ``` Speed measurements from 100 bench runs at depth 13 with profile-build x86-64-avx2: ``` sf_base = 1329051 +/- 2224 (95%) sf_test = 1163344 +/- 2992 (95%) diff = -165706 +/- 4913 (95%) speedup = -12.46807% +/- 0.370% (95%) ``` Training data can be found at: https://robotmoon.com/nnue-training-data/ Local elo at 25k nodes per move (vs. L1-2048 nn-1ee1aba5ed4c.nnue) ep959 : 16.2 +/- 2.3 Failed 10+0.1 STC: https://tests.stockfishchess.org/tests/view/6501beee2cd016da89abab21 LLR: -2.92 (-2.94,2.94) <0.00,2.00> Total: 13184 W: 3285 L: 3535 D: 6364 Ptnml(0-2): 85, 1662, 3334, 1440, 71 Failed 180+1.8 VLTC: https://tests.stockfishchess.org/tests/view/6505cf9a72620bc881ea908e LLR: -2.94 (-2.94,2.94) <0.00,2.00> Total: 64248 W: 16224 L: 16374 D: 31650 Ptnml(0-2): 26, 6788, 18640, 6650, 20 Passed 60+0.6 th 8 VLTC SMP (STC bounds): https://tests.stockfishchess.org/tests/view/65084a4618698b74c2e541dc LLR: 2.95 (-2.94,2.94) <0.00,2.00> Total: 90630 W: 23372 L: 23033 D: 44225 Ptnml(0-2): 13, 8490, 27968, 8833, 11 Passed 60+0.6 th 8 VLTC SMP: https://tests.stockfishchess.org/tests/view/6501d45d2cd016da89abacdb LLR: 2.95 (-2.94,2.94) <0.50,2.50> Total: 137804 W: 35764 L: 35276 D: 66764 Ptnml(0-2): 31, 13006, 42326, 13522, 17 bench 1246812

Creating this net involved: - a 6-stage training process from scratch - permuting L1 weights with official-stockfish/nnue-pytorch#254 The datasets used in stages 1-5 were fully minimized. A strong epoch after each training stage was chosen for the next. The 6 stages were: ``` 1. 400 epochs, lambda 1.0, default LR and gamma UHOx2-wIsRight-multinet-dfrc-n5000 (135G) nodes5000pv2_UHO.binpack data_pv-2_diff-100_nodes-5000.binpack wrongIsRight_nodes5000pv2.binpack multinet_pv-2_diff-100_nodes-5000.binpack dfrc_n5000.binpack 2. 800 epochs, end-lambda 0.75, LR 4.375e-4, gamma 0.995, skip 12 LeelaFarseer-T78juntoaugT79marT80dec.binpack (141G) T60T70wIsRightFarseerT60T74T75T76.binpack test78-junjulaug2022-16tb7p.no-db.min.binpack test79-mar2022-16tb7p.no-db.min.binpack test80-dec2022-16tb7p.no-db.min.binpack 3. 800 epochs, end-lambda 0.725, LR 4.375e-4, gamma 0.995, skip 20 leela93-v1-dfrc99-v2-T78juntosepT80jan-v6dd-T78janfebT79aprT80aprmay.min.binpack leela93-filt-v1.min.binpack dfrc99-16tb7p-filt-v2.min.binpack test78-juntosep2022-16tb7p-filter-v6-dd.min-mar2023.binpack test80-jan2023-3of3-16tb7p-filter-v6-dd.min-mar2023.binpack test78-janfeb2022-16tb7p.min.binpack test79-apr2022-16tb7p.min.binpack test80-apr2022-16tb7p.min.binpack test80-may2022-16tb7p.min.binpack 4. 800 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 24 leela96-dfrc99-v2-T78juntosepT79mayT80junsepnovjan-v6dd-T80mar23-v6-T60novdecT77decT78aprmayT79aprT80may23.min.binpack leela96-filt-v2.min.binpack dfrc99-16tb7p-filt-v2.min.binpack test78-juntosep2022-16tb7p-filter-v6-dd.min-mar2023.binpack test79-may2022-16tb7p.filter-v6-dd.min.binpack test80-jun2022-16tb7p.filter-v6-dd.min.binpack test80-sep2022-16tb7p.filter-v6-dd.min.binpack test80-nov2022-16tb7p.filter-v6-dd.min.binpack test80-jan2023-3of3-16tb7p-filter-v6-dd.min-mar2023.binpack test80-mar2023-2tb7p.v6-sk16.min.binpack test60-novdec2021-16tb7p.min.binpack test77-dec2021-16tb7p.min.binpack test78-aprmay2022-16tb7p.min.binpack test79-apr2022-16tb7p.min.binpack test80-may2023-2tb7p.min.binpack 5. 960 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 28 Increased max-epoch to 960 near the end of the first 800 epochs 5af11540bbfe dataset: official-stockfish#4635 6. 1000 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 28 Increased max-epoch to 1000 near the end of the first 800 epochs 1ee1aba5ed dataset: official-stockfish#4782 ``` L1 weights permuted with: ```bash python3 serialize.py $nnue $nnue_permuted \ --features=HalfKAv2_hm \ --ft_optimize \ --ft_optimize_data=/data/fishpack32.binpack \ --ft_optimize_count=10000 ``` Speed measurements from 100 bench runs at depth 13 with profile-build x86-64-avx2: ``` sf_base = 1329051 +/- 2224 (95%) sf_test = 1163344 +/- 2992 (95%) diff = -165706 +/- 4913 (95%) speedup = -12.46807% +/- 0.370% (95%) ``` Training data can be found at: https://robotmoon.com/nnue-training-data/ Local elo at 25k nodes per move (vs. L1-2048 nn-1ee1aba5ed4c.nnue) ep959 : 16.2 +/- 2.3 Failed 10+0.1 STC: https://tests.stockfishchess.org/tests/view/6501beee2cd016da89abab21 LLR: -2.92 (-2.94,2.94) <0.00,2.00> Total: 13184 W: 3285 L: 3535 D: 6364 Ptnml(0-2): 85, 1662, 3334, 1440, 71 Failed 180+1.8 VLTC: https://tests.stockfishchess.org/tests/view/6505cf9a72620bc881ea908e LLR: -2.94 (-2.94,2.94) <0.00,2.00> Total: 64248 W: 16224 L: 16374 D: 31650 Ptnml(0-2): 26, 6788, 18640, 6650, 20 Passed 60+0.6 th 8 VLTC SMP (STC bounds): https://tests.stockfishchess.org/tests/view/65084a4618698b74c2e541dc LLR: 2.95 (-2.94,2.94) <0.00,2.00> Total: 90630 W: 23372 L: 23033 D: 44225 Ptnml(0-2): 13, 8490, 27968, 8833, 11 Passed 60+0.6 th 8 VLTC SMP: https://tests.stockfishchess.org/tests/view/6501d45d2cd016da89abacdb LLR: 2.95 (-2.94,2.94) <0.50,2.50> Total: 137804 W: 35764 L: 35276 D: 66764 Ptnml(0-2): 31, 13006, 42326, 13522, 17 bench 1246812

Creating this net involved: - a 6-stage training process from scratch. The datasets used in stages 1-5 were fully minimized. - permuting L1 weights with official-stockfish/nnue-pytorch#254 A strong epoch after each training stage was chosen for the next. The 6 stages were: ``` 1. 400 epochs, lambda 1.0, default LR and gamma UHOx2-wIsRight-multinet-dfrc-n5000 (135G) nodes5000pv2_UHO.binpack data_pv-2_diff-100_nodes-5000.binpack wrongIsRight_nodes5000pv2.binpack multinet_pv-2_diff-100_nodes-5000.binpack dfrc_n5000.binpack 2. 800 epochs, end-lambda 0.75, LR 4.375e-4, gamma 0.995, skip 12 LeelaFarseer-T78juntoaugT79marT80dec.binpack (141G) T60T70wIsRightFarseerT60T74T75T76.binpack test78-junjulaug2022-16tb7p.no-db.min.binpack test79-mar2022-16tb7p.no-db.min.binpack test80-dec2022-16tb7p.no-db.min.binpack 3. 800 epochs, end-lambda 0.725, LR 4.375e-4, gamma 0.995, skip 20 leela93-v1-dfrc99-v2-T78juntosepT80jan-v6dd-T78janfebT79aprT80aprmay.min.binpack leela93-filt-v1.min.binpack dfrc99-16tb7p-filt-v2.min.binpack test78-juntosep2022-16tb7p-filter-v6-dd.min-mar2023.binpack test80-jan2023-3of3-16tb7p-filter-v6-dd.min-mar2023.binpack test78-janfeb2022-16tb7p.min.binpack test79-apr2022-16tb7p.min.binpack test80-apr2022-16tb7p.min.binpack test80-may2022-16tb7p.min.binpack 4. 800 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 24 leela96-dfrc99-v2-T78juntosepT79mayT80junsepnovjan-v6dd-T80mar23-v6-T60novdecT77decT78aprmayT79aprT80may23.min.binpack leela96-filt-v2.min.binpack dfrc99-16tb7p-filt-v2.min.binpack test78-juntosep2022-16tb7p-filter-v6-dd.min-mar2023.binpack test79-may2022-16tb7p.filter-v6-dd.min.binpack test80-jun2022-16tb7p.filter-v6-dd.min.binpack test80-sep2022-16tb7p.filter-v6-dd.min.binpack test80-nov2022-16tb7p.filter-v6-dd.min.binpack test80-jan2023-3of3-16tb7p-filter-v6-dd.min-mar2023.binpack test80-mar2023-2tb7p.v6-sk16.min.binpack test60-novdec2021-16tb7p.min.binpack test77-dec2021-16tb7p.min.binpack test78-aprmay2022-16tb7p.min.binpack test79-apr2022-16tb7p.min.binpack test80-may2023-2tb7p.min.binpack 5. 960 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 28 Increased max-epoch to 960 near the end of the first 800 epochs 5af11540bbfe dataset: official-stockfish#4635 6. 1000 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 28 Increased max-epoch to 1000 near the end of the first 800 epochs 1ee1aba5ed dataset: official-stockfish#4782 ``` L1 weights permuted with: ```bash python3 serialize.py $nnue $nnue_permuted \ --features=HalfKAv2_hm \ --ft_optimize \ --ft_optimize_data=/data/fishpack32.binpack \ --ft_optimize_count=10000 ``` Speed measurements from 100 bench runs at depth 13 with profile-build x86-64-avx2: ``` sf_base = 1329051 +/- 2224 (95%) sf_test = 1163344 +/- 2992 (95%) diff = -165706 +/- 4913 (95%) speedup = -12.46807% +/- 0.370% (95%) ``` Training data can be found at: https://robotmoon.com/nnue-training-data/ Local elo at 25k nodes per move (vs. L1-2048 nn-1ee1aba5ed4c.nnue) ep959 : 16.2 +/- 2.3 Failed 10+0.1 STC: https://tests.stockfishchess.org/tests/view/6501beee2cd016da89abab21 LLR: -2.92 (-2.94,2.94) <0.00,2.00> Total: 13184 W: 3285 L: 3535 D: 6364 Ptnml(0-2): 85, 1662, 3334, 1440, 71 Failed 180+1.8 VLTC: https://tests.stockfishchess.org/tests/view/6505cf9a72620bc881ea908e LLR: -2.94 (-2.94,2.94) <0.00,2.00> Total: 64248 W: 16224 L: 16374 D: 31650 Ptnml(0-2): 26, 6788, 18640, 6650, 20 Passed 60+0.6 th 8 VLTC SMP (STC bounds): https://tests.stockfishchess.org/tests/view/65084a4618698b74c2e541dc LLR: 2.95 (-2.94,2.94) <0.00,2.00> Total: 90630 W: 23372 L: 23033 D: 44225 Ptnml(0-2): 13, 8490, 27968, 8833, 11 Passed 60+0.6 th 8 VLTC SMP: https://tests.stockfishchess.org/tests/view/6501d45d2cd016da89abacdb LLR: 2.95 (-2.94,2.94) <0.50,2.50> Total: 137804 W: 35764 L: 35276 D: 66764 Ptnml(0-2): 31, 13006, 42326, 13522, 17 bench 1246812

Creating this net involved: - a 6-stage training process from scratch. The datasets used in stages 1-5 were fully minimized. - permuting L1 weights with official-stockfish/nnue-pytorch#254 A strong epoch after each training stage was chosen for the next. The 6 stages were: ``` 1. 400 epochs, lambda 1.0, default LR and gamma UHOx2-wIsRight-multinet-dfrc-n5000 (135G) nodes5000pv2_UHO.binpack data_pv-2_diff-100_nodes-5000.binpack wrongIsRight_nodes5000pv2.binpack multinet_pv-2_diff-100_nodes-5000.binpack dfrc_n5000.binpack 2. 800 epochs, end-lambda 0.75, LR 4.375e-4, gamma 0.995, skip 12 LeelaFarseer-T78juntoaugT79marT80dec.binpack (141G) T60T70wIsRightFarseerT60T74T75T76.binpack test78-junjulaug2022-16tb7p.no-db.min.binpack test79-mar2022-16tb7p.no-db.min.binpack test80-dec2022-16tb7p.no-db.min.binpack 3. 800 epochs, end-lambda 0.725, LR 4.375e-4, gamma 0.995, skip 20 leela93-v1-dfrc99-v2-T78juntosepT80jan-v6dd-T78janfebT79aprT80aprmay.min.binpack leela93-filt-v1.min.binpack dfrc99-16tb7p-filt-v2.min.binpack test78-juntosep2022-16tb7p-filter-v6-dd.min-mar2023.binpack test80-jan2023-3of3-16tb7p-filter-v6-dd.min-mar2023.binpack test78-janfeb2022-16tb7p.min.binpack test79-apr2022-16tb7p.min.binpack test80-apr2022-16tb7p.min.binpack test80-may2022-16tb7p.min.binpack 4. 800 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 24 leela96-dfrc99-v2-T78juntosepT79mayT80junsepnovjan-v6dd-T80mar23-v6-T60novdecT77decT78aprmayT79aprT80may23.min.binpack leela96-filt-v2.min.binpack dfrc99-16tb7p-filt-v2.min.binpack test78-juntosep2022-16tb7p-filter-v6-dd.min-mar2023.binpack test79-may2022-16tb7p.filter-v6-dd.min.binpack test80-jun2022-16tb7p.filter-v6-dd.min.binpack test80-sep2022-16tb7p.filter-v6-dd.min.binpack test80-nov2022-16tb7p.filter-v6-dd.min.binpack test80-jan2023-3of3-16tb7p-filter-v6-dd.min-mar2023.binpack test80-mar2023-2tb7p.v6-sk16.min.binpack test60-novdec2021-16tb7p.min.binpack test77-dec2021-16tb7p.min.binpack test78-aprmay2022-16tb7p.min.binpack test79-apr2022-16tb7p.min.binpack test80-may2023-2tb7p.min.binpack 5. 960 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 28 Increased max-epoch to 960 near the end of the first 800 epochs 5af11540bbfe dataset: official-stockfish#4635 6. 1000 epochs, end-lambda 0.7, LR 4.375e-4, gamma 0.995, skip 28 Increased max-epoch to 1000 near the end of the first 800 epochs 1ee1aba5ed dataset: official-stockfish#4782 ``` L1 weights permuted with: ```bash python3 serialize.py $nnue $nnue_permuted \ --features=HalfKAv2_hm \ --ft_optimize \ --ft_optimize_data=/data/fishpack32.binpack \ --ft_optimize_count=10000 ``` Speed measurements from 100 bench runs at depth 13 with profile-build x86-64-avx2: ``` sf_base = 1329051 +/- 2224 (95%) sf_test = 1163344 +/- 2992 (95%) diff = -165706 +/- 4913 (95%) speedup = -12.46807% +/- 0.370% (95%) ``` Training data can be found at: https://robotmoon.com/nnue-training-data/ Local elo at 25k nodes per move (vs. L1-2048 nn-1ee1aba5ed4c.nnue) ep959 : 16.2 +/- 2.3 Failed 10+0.1 STC: https://tests.stockfishchess.org/tests/view/6501beee2cd016da89abab21 LLR: -2.92 (-2.94,2.94) <0.00,2.00> Total: 13184 W: 3285 L: 3535 D: 6364 Ptnml(0-2): 85, 1662, 3334, 1440, 71 Failed 180+1.8 VLTC: https://tests.stockfishchess.org/tests/view/6505cf9a72620bc881ea908e LLR: -2.94 (-2.94,2.94) <0.00,2.00> Total: 64248 W: 16224 L: 16374 D: 31650 Ptnml(0-2): 26, 6788, 18640, 6650, 20 Passed 60+0.6 th 8 VLTC SMP (STC bounds): https://tests.stockfishchess.org/tests/view/65084a4618698b74c2e541dc LLR: 2.95 (-2.94,2.94) <0.00,2.00> Total: 90630 W: 23372 L: 23033 D: 44225 Ptnml(0-2): 13, 8490, 27968, 8833, 11 Passed 60+0.6 th 8 VLTC SMP: https://tests.stockfishchess.org/tests/view/6501d45d2cd016da89abacdb LLR: 2.95 (-2.94,2.94) <0.50,2.50> Total: 137804 W: 35764 L: 35276 D: 66764 Ptnml(0-2): 31, 13006, 42326, 13522, 17 closes official-stockfish#4795 bench 1246812

Created by training an L1-128 net from scratch with a wider range of evals in the training data and wld-fen-skipping disabled during training. The differences in this training data compared to the first dual nnue PR are: - removal of all positions with 3 pieces - when piece count >= 16, keep positions with simple eval above 750 - when piece count < 16, remove positions with simple eval above 3000 The asymmetric data filtering was meant to flatten the training data piece count distribution, which was previously heavily skewed towards positions with low piece counts. Additionally, the simple eval range where the smallnet is used was widened to cover more positions previously evaluated by the big net and simple eval. ```yaml experiment-name: 128--S1-hse-S7-v4-S3-v1-no-wld-skip training-dataset: - /data/hse/S3/leela96-filt-v2.min.high-simple-eval-1k.binpack - /data/hse/S3/dfrc99-16tb7p-eval-filt-v2.min.high-simple-eval-1k.binpack - /data/hse/S3/test80-apr2022-16tb7p.min.high-simple-eval-1k.binpack - /data/hse/S7/test60-2020-2tb7p.v6-3072.high-simple-eval-v4.binpack - /data/hse/S7/test60-novdec2021-12tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-v4.binpack - /data/hse/S7/test77-nov2021-2tb7p.v6-3072.min.high-simple-eval-v4.binpack - /data/hse/S7/test77-dec2021-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-v4.binpack - /data/hse/S7/test77-jan2022-2tb7p.high-simple-eval-v4.binpack - /data/hse/S7/test78-jantomay2022-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-v4.binpack - /data/hse/S7/test78-juntosep2022-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-v4.binpack - /data/hse/S7/test79-apr2022-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-v4.binpack - /data/hse/S7/test79-may2022-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-v4.binpack - /data/hse/S7/test80-may2022-16tb7p.high-simple-eval-v4.binpack - /data/hse/S7/test80-jun2022-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-v4.binpack - /data/hse/S7/test80-jul2022-16tb7p.v6-dd.min.high-simple-eval-v4.binpack - /data/hse/S7/test80-aug2022-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-v4.binpack - /data/hse/S7/test80-sep2022-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-v4.binpack - /data/hse/S7/test80-oct2022-16tb7p.v6-dd.high-simple-eval-v4.binpack - /data/hse/S7/test80-nov2022-16tb7p-v6-dd.min.high-simple-eval-v4.binpack - /data/hse/S7/test80-jan2023-3of3-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-v4.binpack - /data/hse/S7/test80-feb2023-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-v4.binpack - /data/hse/S7/test80-mar2023-2tb7p.v6-sk16.min.high-simple-eval-v4.binpack - /data/hse/S7/test80-apr2023-2tb7p-filter-v6-sk16.min.high-simple-eval-v4.binpack - /data/hse/S7/test80-may2023-2tb7p.v6.min.high-simple-eval-v4.binpack - /data/hse/S7/test80-jun2023-2tb7p.v6-3072.min.high-simple-eval-v4.binpack - /data/hse/S7/test80-jul2023-2tb7p.v6-3072.min.high-simple-eval-v4.binpack - /data/hse/S7/test80-aug2023-2tb7p.v6.min.high-simple-eval-v4.binpack - /data/hse/S7/test80-sep2023-2tb7p.high-simple-eval-v4.binpack - /data/hse/S7/test80-oct2023-2tb7p.high-simple-eval-v4.binpack wld-fen-skipping: False start-from-engine-test-net: False nnue-pytorch-branch: linrock/nnue-pytorch/L1-128 engine-test-branch: linrock/Stockfish/L1-128-nolazy engine-base-branch: linrock/Stockfish/L1-128 num-epochs: 500 start-lambda: 1.0 end-lambda: 1.0 ``` Experiment yaml configs converted to easy_train.sh commands with: https://github.com/linrock/nnue-tools/blob/4339954/yaml_easy_train.py Binpacks interleaved at training time with: official-stockfish/nnue-pytorch#259 FT weights permuted with 10k positions from fishpack32.binpack with: official-stockfish/nnue-pytorch#254 Data filtered for high simple eval positions (v4) with: https://github.com/linrock/Stockfish/blob/b9c8440/src/tools/transform.cpp#L640-L675 Training data can be found at: https://robotmoon.com/nnue-training-data/ Local elo at 25k nodes per move of L1-128 smallnet (nnue-only eval) vs. L1-128 trained on standard S1 data: nn-epoch319.nnue : -241.7 +/- 3.2 Passed STC vs. 36db936: https://tests.stockfishchess.org/tests/view/6576b3484d789acf40aabbfe LLR: 2.94 (-2.94,2.94) <0.00,2.00> Total: 21920 W: 5680 L: 5381 D: 10859 Ptnml(0-2): 82, 2488, 5520, 2789, 81 Passed LTC vs. DualNNUE official-stockfish#4915: https://tests.stockfishchess.org/tests/view/65775c034d789acf40aac7e3 LLR: 2.95 (-2.94,2.94) <0.50,2.50> Total: 147606 W: 36619 L: 36063 D: 74924 Ptnml(0-2): 98, 16591, 39891, 17103, 120 Bench: 1438336

Created by training an L1-128 net from scratch with a wider range of evals in the training data and wld-fen-skipping disabled during training. The differences in this training data compared to the first dual nnue PR are: - removal of all positions with 3 pieces - when piece count >= 16, keep positions with simple eval above 750 - when piece count < 16, remove positions with simple eval above 3000 The asymmetric data filtering was meant to flatten the training data piece count distribution, which was previously heavily skewed towards positions with low piece counts. Additionally, the simple eval range where the smallnet is used was widened to cover more positions previously evaluated by the big net and simple eval. ```yaml experiment-name: 128--S1-hse-S7-v4-S3-v1-no-wld-skip training-dataset: - /data/hse/S3/leela96-filt-v2.min.high-simple-eval-1k.binpack - /data/hse/S3/dfrc99-16tb7p-eval-filt-v2.min.high-simple-eval-1k.binpack - /data/hse/S3/test80-apr2022-16tb7p.min.high-simple-eval-1k.binpack - /data/hse/S7/test60-2020-2tb7p.v6-3072.high-simple-eval-v4.binpack - /data/hse/S7/test60-novdec2021-12tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-v4.binpack - /data/hse/S7/test77-nov2021-2tb7p.v6-3072.min.high-simple-eval-v4.binpack - /data/hse/S7/test77-dec2021-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-v4.binpack - /data/hse/S7/test77-jan2022-2tb7p.high-simple-eval-v4.binpack - /data/hse/S7/test78-jantomay2022-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-v4.binpack - /data/hse/S7/test78-juntosep2022-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-v4.binpack - /data/hse/S7/test79-apr2022-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-v4.binpack - /data/hse/S7/test79-may2022-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-v4.binpack - /data/hse/S7/test80-may2022-16tb7p.high-simple-eval-v4.binpack - /data/hse/S7/test80-jun2022-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-v4.binpack - /data/hse/S7/test80-jul2022-16tb7p.v6-dd.min.high-simple-eval-v4.binpack - /data/hse/S7/test80-aug2022-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-v4.binpack - /data/hse/S7/test80-sep2022-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-v4.binpack - /data/hse/S7/test80-oct2022-16tb7p.v6-dd.high-simple-eval-v4.binpack - /data/hse/S7/test80-nov2022-16tb7p-v6-dd.min.high-simple-eval-v4.binpack - /data/hse/S7/test80-jan2023-3of3-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-v4.binpack - /data/hse/S7/test80-feb2023-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-v4.binpack - /data/hse/S7/test80-mar2023-2tb7p.v6-sk16.min.high-simple-eval-v4.binpack - /data/hse/S7/test80-apr2023-2tb7p-filter-v6-sk16.min.high-simple-eval-v4.binpack - /data/hse/S7/test80-may2023-2tb7p.v6.min.high-simple-eval-v4.binpack - /data/hse/S7/test80-jun2023-2tb7p.v6-3072.min.high-simple-eval-v4.binpack - /data/hse/S7/test80-jul2023-2tb7p.v6-3072.min.high-simple-eval-v4.binpack - /data/hse/S7/test80-aug2023-2tb7p.v6.min.high-simple-eval-v4.binpack - /data/hse/S7/test80-sep2023-2tb7p.high-simple-eval-v4.binpack - /data/hse/S7/test80-oct2023-2tb7p.high-simple-eval-v4.binpack wld-fen-skipping: False start-from-engine-test-net: False nnue-pytorch-branch: linrock/nnue-pytorch/L1-128 engine-test-branch: linrock/Stockfish/L1-128-nolazy engine-base-branch: linrock/Stockfish/L1-128 num-epochs: 500 start-lambda: 1.0 end-lambda: 1.0 ``` Experiment yaml configs converted to easy_train.sh commands with: https://github.com/linrock/nnue-tools/blob/4339954/yaml_easy_train.py Binpacks interleaved at training time with: official-stockfish/nnue-pytorch#259 FT weights permuted with 10k positions from fishpack32.binpack with: official-stockfish/nnue-pytorch#254 Data filtered for high simple eval positions (v4) with: https://github.com/linrock/Stockfish/blob/b9c8440/src/tools/transform.cpp#L640-L675 Training data can be found at: https://robotmoon.com/nnue-training-data/ Local elo at 25k nodes per move of L1-128 smallnet (nnue-only eval) vs. L1-128 trained on standard S1 data: nn-epoch319.nnue : -241.7 +/- 3.2 Passed STC vs. 36db936: https://tests.stockfishchess.org/tests/view/6576b3484d789acf40aabbfe LLR: 2.94 (-2.94,2.94) <0.00,2.00> Total: 21920 W: 5680 L: 5381 D: 10859 Ptnml(0-2): 82, 2488, 5520, 2789, 81 Passed LTC vs. DualNNUE #4915: https://tests.stockfishchess.org/tests/view/65775c034d789acf40aac7e3 LLR: 2.95 (-2.94,2.94) <0.50,2.50> Total: 147606 W: 36619 L: 36063 D: 74924 Ptnml(0-2): 98, 16591, 39891, 17103, 120 closes #4919 Bench: 1438336

Created by training an L1-128 net from scratch with a wider range of evals in the training data and wld-fen-skipping disabled during training. The differences in this training data compared to the first dual nnue PR are: - removal of all positions with 3 pieces - when piece count >= 16, keep positions with simple eval above 750 - when piece count < 16, remove positions with simple eval above 3000 The asymmetric data filtering was meant to flatten the training data piece count distribution, which was previously heavily skewed towards positions with low piece counts. Additionally, the simple eval range where the smallnet is used was widened to cover more positions previously evaluated by the big net and simple eval. ```yaml experiment-name: 128--S1-hse-S7-v4-S3-v1-no-wld-skip training-dataset: - /data/hse/S3/leela96-filt-v2.min.high-simple-eval-1k.binpack - /data/hse/S3/dfrc99-16tb7p-eval-filt-v2.min.high-simple-eval-1k.binpack - /data/hse/S3/test80-apr2022-16tb7p.min.high-simple-eval-1k.binpack - /data/hse/S7/test60-2020-2tb7p.v6-3072.high-simple-eval-v4.binpack - /data/hse/S7/test60-novdec2021-12tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-v4.binpack - /data/hse/S7/test77-nov2021-2tb7p.v6-3072.min.high-simple-eval-v4.binpack - /data/hse/S7/test77-dec2021-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-v4.binpack - /data/hse/S7/test77-jan2022-2tb7p.high-simple-eval-v4.binpack - /data/hse/S7/test78-jantomay2022-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-v4.binpack - /data/hse/S7/test78-juntosep2022-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-v4.binpack - /data/hse/S7/test79-apr2022-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-v4.binpack - /data/hse/S7/test79-may2022-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-v4.binpack - /data/hse/S7/test80-may2022-16tb7p.high-simple-eval-v4.binpack - /data/hse/S7/test80-jun2022-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-v4.binpack - /data/hse/S7/test80-jul2022-16tb7p.v6-dd.min.high-simple-eval-v4.binpack - /data/hse/S7/test80-aug2022-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-v4.binpack - /data/hse/S7/test80-sep2022-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-v4.binpack - /data/hse/S7/test80-oct2022-16tb7p.v6-dd.high-simple-eval-v4.binpack - /data/hse/S7/test80-nov2022-16tb7p-v6-dd.min.high-simple-eval-v4.binpack - /data/hse/S7/test80-jan2023-3of3-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-v4.binpack - /data/hse/S7/test80-feb2023-16tb7p-filter-v6-dd.min-mar2023.unmin.high-simple-eval-v4.binpack - /data/hse/S7/test80-mar2023-2tb7p.v6-sk16.min.high-simple-eval-v4.binpack - /data/hse/S7/test80-apr2023-2tb7p-filter-v6-sk16.min.high-simple-eval-v4.binpack - /data/hse/S7/test80-may2023-2tb7p.v6.min.high-simple-eval-v4.binpack - /data/hse/S7/test80-jun2023-2tb7p.v6-3072.min.high-simple-eval-v4.binpack - /data/hse/S7/test80-jul2023-2tb7p.v6-3072.min.high-simple-eval-v4.binpack - /data/hse/S7/test80-aug2023-2tb7p.v6.min.high-simple-eval-v4.binpack - /data/hse/S7/test80-sep2023-2tb7p.high-simple-eval-v4.binpack - /data/hse/S7/test80-oct2023-2tb7p.high-simple-eval-v4.binpack wld-fen-skipping: False start-from-engine-test-net: False nnue-pytorch-branch: linrock/nnue-pytorch/L1-128 engine-test-branch: linrock/Stockfish/L1-128-nolazy engine-base-branch: linrock/Stockfish/L1-128 num-epochs: 500 start-lambda: 1.0 end-lambda: 1.0 ``` Experiment yaml configs converted to easy_train.sh commands with: https://github.com/linrock/nnue-tools/blob/4339954/yaml_easy_train.py Binpacks interleaved at training time with: official-stockfish/nnue-pytorch#259 FT weights permuted with 10k positions from fishpack32.binpack with: official-stockfish/nnue-pytorch#254 Data filtered for high simple eval positions (v4) with: https://github.com/linrock/Stockfish/blob/b9c8440/src/tools/transform.cpp#L640-L675 Training data can be found at: https://robotmoon.com/nnue-training-data/ Local elo at 25k nodes per move of L1-128 smallnet (nnue-only eval) vs. L1-128 trained on standard S1 data: nn-epoch319.nnue : -241.7 +/- 3.2 Passed STC vs. 36db936: https://tests.stockfishchess.org/tests/view/6576b3484d789acf40aabbfe LLR: 2.94 (-2.94,2.94) <0.00,2.00> Total: 21920 W: 5680 L: 5381 D: 10859 Ptnml(0-2): 82, 2488, 5520, 2789, 81 Passed LTC vs. DualNNUE official-stockfish#4915: https://tests.stockfishchess.org/tests/view/65775c034d789acf40aac7e3 LLR: 2.95 (-2.94,2.94) <0.50,2.50> Total: 147606 W: 36619 L: 36063 D: 74924 Ptnml(0-2): 98, 16591, 39891, 17103, 120 closes official-stockfish#4919 Bench: 1438336

Sopel97 force-pushed the ftperm_ergodice branch from dc21871 to b814eb6 Compare June 28, 2023 11:19

Sopel97 force-pushed the ftperm_ergodice branch from 45f608c to f6d5274 Compare June 30, 2023 12:27

Sopel97 mentioned this pull request Jul 6, 2023

Update default net to nn-c38c3d8d3920.nnue official-stockfish/Stockfish#4666

Closed

Sopel97 added 2 commits July 7, 2023 15:21

Optimizer for feature transformer weights permutation. Integration wi…

91090ad

…th serialize.py

Tidy up the algorithm code. Allow selection between numpy and cupy.

b341816

Sopel97 force-pushed the ftperm_ergodice branch from f6d5274 to b341816 Compare July 7, 2023 13:26

vondele merged commit 5107928 into official-stockfish:master Jul 12, 2023

linrock mentioned this pull request Sep 21, 2023

Update NNUE architecture to SFNNv8: L1-2560 nn-ac1dbea57aa3.nnue official-stockfish/Stockfish#4795

Closed

linrock mentioned this pull request Jan 7, 2024

Update smallnet to nn-baff1ede1f90.nnue with wider eval range official-stockfish/Stockfish#4919

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FT optimization utility. Integrate with serialization. #254

FT optimization utility. Integrate with serialization. #254

Sopel97 commented Jun 28, 2023 •

edited

Loading

Sopel97 commented Jun 30, 2023 •

edited

Loading

Ergodice commented Jun 30, 2023

vondele commented Jul 6, 2023

Sopel97 commented Jul 6, 2023

Sopel97 commented Jul 7, 2023

FT optimization utility. Integrate with serialization. #254

FT optimization utility. Integrate with serialization. #254

Conversation

Sopel97 commented Jun 28, 2023 • edited Loading

Sopel97 commented Jun 30, 2023 • edited Loading

Ergodice commented Jun 30, 2023

vondele commented Jul 6, 2023

Sopel97 commented Jul 6, 2023

Sopel97 commented Jul 7, 2023

Sopel97 commented Jun 28, 2023 •

edited

Loading

Sopel97 commented Jun 30, 2023 •

edited

Loading