Replicating Civil Comments results with standard deviation #133
Unanswered
SharanyaMohan-30
asked this question in
Q&A
Replies: 1 comment
For each seed, we pick an "early stopping" epoch based on val worst-group accuracy, and then we calculate the average and worst-group accuracies for both val and test at that epoch. We then average those per-seed results across seeds, and we calculate stddevs from those per-seed results as well. Does that make sense?
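In case it helps, here is a minimal sketch of that procedure in Python. It is not code from the repository: the dictionary keys (`val_wg_acc`, `test_avg_acc`), the function names, and the numbers are all made up for illustration. It also assumes that ties go to the earliest epoch and that the stddev is the population standard deviation; neither is stated above, so swap `pstdev` for `stdev` if the sample version is wanted.

```python
from statistics import mean, pstdev


def select_epoch(val_worst_group_acc):
    """Return the index of the epoch with the highest val worst-group accuracy.

    Ties resolve to the earliest epoch (an assumption, not confirmed above).
    """
    return max(range(len(val_worst_group_acc)), key=lambda e: val_worst_group_acc[e])


def aggregate(runs, metric):
    """Read `metric` at each seed's early-stopping epoch, then summarize across seeds."""
    picked = []
    for run in runs:
        epoch = select_epoch(run["val_wg_acc"])
        picked.append(run[metric][epoch])
    # Population stddev is an assumption; use statistics.stdev for the sample version.
    return mean(picked), pstdev(picked)


# Hypothetical per-epoch logs for two seeds and three epochs (made-up numbers).
runs = [
    {"val_wg_acc": [0.60, 0.70, 0.65], "test_avg_acc": [0.91, 0.90, 0.92]},
    {"val_wg_acc": [0.62, 0.61, 0.68], "test_avg_acc": [0.93, 0.92, 0.88]},
]

avg, std = aggregate(runs, "test_avg_acc")
print(f"test avg acc: {avg:.3f} ({std:.3f})")
```

Note that the selected epoch can differ from seed to seed, so the reported number is generally not the last-epoch accuracy and not the per-seed maximum of average accuracy either.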
Dear Team,
I am trying to replicate the leaderboard results for the Civil Comments dataset using GROUP DRO - LABEL (i.e. group by = 'Y').
The test average accuracy is listed as 90.2 (0.3) and the validation average accuracy as 90.4 (0.4). I do not understand how these values were obtained. For each seed, is the maximum average accuracy over the 5 epochs used, or only the average accuracy from the last epoch (i.e. the 5th epoch)?
I tried averaging the 5th-epoch average accuracy over all 5 seeds, but I did not reach 90.2 for the test average accuracy.
Could you please help me replicate the results, including the standard deviation?