From 1c1cf48d3cf88f861f777d7a764f730d69f1ff29 Mon Sep 17 00:00:00 2001
From: carschandler <92899389+carschandler@users.noreply.github.com>
Date: Wed, 13 Nov 2024 07:07:02 -0600
Subject: [PATCH] Confusing wording in self-play.mdx

---
 units/en/unit7/self-play.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/units/en/unit7/self-play.mdx b/units/en/unit7/self-play.mdx
index f35d0336..7ebfe9e9 100644
--- a/units/en/unit7/self-play.mdx
+++ b/units/en/unit7/self-play.mdx
@@ -37,7 +37,7 @@ The theory behind self-play is not something new. It was already used by Arthur
 
 Self-Play is integrated into the MLAgents library and is managed by multiple hyperparameters that we’re going to study. But the main focus, as explained in the documentation, is the **tradeoff between the skill level and generality of the final policy and the stability of learning**.
 
-Training against a set of slowly changing or unchanging adversaries with low diversity **results in more stable training. But a risk to overfit if the change is too slow.**
+Training against a set of slowly changing or unchanging adversaries with low diversity **results in more stable training. But there is a risk of overfitting if the change is too slow.**
 
 So we need to control:
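
For context, the hyperparameters referenced in the patched paragraph live under the `self_play` section of an ML-Agents trainer configuration. The sketch below is illustrative only and is not part of the patch: the behavior name and values are placeholders, and it simply shows the knobs that govern the stability-vs-generality tradeoff discussed above.

```yaml
behaviors:
  SoccerTwos:          # placeholder behavior name
    trainer_type: ppo
    # ... usual PPO hyperparameters omitted ...
    self_play:
      # How often (in trainer steps) a snapshot of the current policy
      # is saved into the pool of past opponents.
      save_steps: 50000
      # How often the learning team switches (for asymmetric games).
      team_change: 200000
      # How often the opponent is swapped for another saved snapshot.
      swap_steps: 2000
      # Number of past snapshots kept as potential opponents; a larger
      # window means more diverse (but less stable) opponents.
      window: 10
      # Probability of facing the latest opponent policy instead of a
      # randomly sampled older snapshot.
      play_against_latest_model_ratio: 0.5
      initial_elo: 1200.0
```

With these knobs, a small `window` and infrequent swaps correspond to the stable-but-overfitting-prone regime described in the changed line, while a larger window and more frequent swaps trade some stability for a more general final policy.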