Compare many onnx files in one go
carsten-wenderdel committed Dec 11, 2023
1 parent 07bad18 commit 7781a74
Showing 2 changed files with 76 additions and 2 deletions.
65 changes: 65 additions & 0 deletions crates/coach/src/bin/benchmark-evaluators.rs
@@ -0,0 +1,65 @@
use coach::duel::Duel;
use engine::complex::ComplexEvaluator;
use engine::dice::FastrandDice;
use engine::probabilities::{Probabilities, ResultCounter};
use rayon::prelude::*;
use std::fs;
use std::io::{stdout, Write};

/// Compare the current evaluator against each neural net in the folder `training-data`.
fn main() {
let folder_name = "training-data";
println!("Start benchmarking, read contents of {}", folder_name);
let mut paths = fs::read_dir(folder_name)
.unwrap()
.map(|x| x.unwrap().file_name().into_string().unwrap())
.filter(|x| x.ends_with(".onnx"))
.collect::<Vec<_>>();
paths.sort();

for file_name in paths {
print!("Load current neural nets");
stdout().flush().unwrap();
let current = ComplexEvaluator::from_file_paths_optimized(
"neural-nets/contact.onnx",
"neural-nets/race.onnx",
)
.expect("Could not find nets for current");

let path_string = folder_name.to_string() + "/" + file_name.as_str();
print!("\rTry {}", path_string);
stdout().flush().unwrap();
let contender =
ComplexEvaluator::from_file_paths_optimized(&path_string, "neural-nets/race.onnx")
.expect("Failed creating neural net for contender");

let duel = Duel::new(contender, current);

let mut dice_gen = FastrandDice::new();

let number_of_games = 100_000;

// If we create n seeds, then n duels are played in parallel, which gives us 2*n GameResults.
let seeds: Vec<u64> = (0..number_of_games / 2).map(|_| dice_gen.seed()).collect();
let counter = seeds
.into_par_iter()
.map(|seed| duel.duel(&mut FastrandDice::with_seed(seed)))
.reduce(ResultCounter::default, |a, b| a.combine(&b));

let probabilities = Probabilities::from(&counter);
let winning_or_losing = if probabilities.equity() > 0.0 {
"winning"
} else {
" losing"
};
println!(
"\r{} is {}. After {} games the equity is {:7.4}. {:?}",
file_name.strip_suffix(".onnx").unwrap(),
winning_or_losing,
counter.sum(),
probabilities.equity(),
probabilities,
);
}
println!("Finished benchmarking");
}
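The equity printed after each duel can be read as the average number of points won per game. A minimal sketch of how cubeless money-game equity is derived from outcome counts, using the standard backgammon weighting of 1 point for a plain win, 2 for a gammon and 3 for a backgammon (this `equity` helper and its argument names are hypothetical, not the actual `Probabilities` implementation in wildbg):

```rust
/// Cubeless money-game equity from outcome counts: plain wins/losses count
/// 1 point, gammons 2 points, backgammons 3 points. Positive means the
/// first player wins points on average.
fn equity(win_n: u32, win_g: u32, win_b: u32, lose_n: u32, lose_g: u32, lose_b: u32) -> f64 {
    let total = (win_n + win_g + win_b + lose_n + lose_g + lose_b) as f64;
    let points = (win_n as i64 - lose_n as i64)
        + 2 * (win_g as i64 - lose_g as i64)
        + 3 * (win_b as i64 - lose_b as i64);
    points as f64 / total
}

fn main() {
    // 55 plain wins and 5 gammon wins vs 38 plain losses and 2 gammon losses
    // over 100 games: (55 - 38) + 2 * (5 - 2) = 23 points, so equity 0.23.
    let e = equity(55, 5, 0, 38, 2, 0);
    println!("equity: {:7.4}", e); // positive -> contender is "winning"
}
```

A positive value corresponds to the `winning` label printed above, a negative one to `losing`.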
13 changes: 11 additions & 2 deletions docs/dev/training.md
@@ -45,8 +45,17 @@ defined, it should be something like `mode = "contact"`.
- You might want to edit various hyperparameters. Number of epochs, optimizer and loss function should be ok, but maybe you find better ones.
In any case you should try various learning rates, they have a big impact on the quality of the net.
- Go to the folder `training` and execute `./src/train-on-rollout-data.py` - this will create several new nets in the `training-data` folder. It should take only a few minutes.
- Check the quality of the net: Edit [`compare-evaluators.rs`](../../crates/coach/src/bin/compare-evaluators.rs) and pick
different nets you want to compare.

### Compare neural nets
Before deciding which new neural net is the best, you should compare it to the current best net. This is done by letting two evaluators play against each other.
To have a baseline, copy existing onnx files from https://github.com/carsten-wenderdel/wildbg-training to the folder `neural-nets`. The nets committed to this repository are smaller and weaker.
- Check the quality of the nets with one of the following approaches:

#### Compare just two evaluators
- Edit [`compare-evaluators.rs`](../../crates/coach/src/bin/compare-evaluators.rs) and pick different nets you want to compare.
- Execute `cargo run -r -p coach --bin compare-evaluators`. This starts two evaluators with different nets playing against each other.
After several tens of thousands of games a difference in equity should be visible. This helps to pick the strongest net.

#### Compare all neural nets in the `training-data` folder
- Edit [`benchmark-evaluators.rs`](../../crates/coach/src/bin/benchmark-evaluators.rs) and pick the number of games that should be played per comparison. Even with 300,000 games the results can easily fluctuate by 0.04 equity points.
- Execute `cargo run -r -p coach --bin benchmark-evaluators`. After having results, you might want to repeat this with fewer onnx files in the `training-data` folder and more games.
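The fluctuation mentioned above only shrinks with the square root of the number of games played. A minimal sketch of that scaling (`sigma`, the per-game standard deviation of the equity outcome, is a hypothetical placeholder, not a value measured from wildbg):

```rust
/// Standard error of the mean equity after `games` independent games,
/// given a per-game standard deviation `sigma` of the equity outcome.
/// Noise shrinks with the square root of the sample size, so halving
/// the fluctuation requires four times as many games.
fn standard_error(sigma: f64, games: u64) -> f64 {
    sigma / (games as f64).sqrt()
}

fn main() {
    let sigma = 1.5; // hypothetical per-game standard deviation
    println!("{:.4}", standard_error(sigma, 100_000)); // noise at 100k games
    println!("{:.4}", standard_error(sigma, 400_000)); // 4x the games, half the noise
}
```

This is why repeating a benchmark with fewer nets but more games per duel gives more trustworthy rankings.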
