Group logs online evals #708
group_logs_client.open(resume=True)

# Restore existing metrics data
data = list_existing_group_logs_metrics(group_logs_client.wandb)
We run many eval tasks at the same time, so there's a chance of a race condition here, right? Maybe we could instead build an automatic report from the metrics we already write to the runs.
This is indeed possible, and would mean that the second eval metric overrides the first one. However, the chances of this happening are small: listing takes ~1.5s and publishing takes ~2s (which is small compared to the runtime of eval tasks). Relying on runs would probably be a better approach here.
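To make the failure mode concrete, here is a minimal sketch of the list-then-publish race. `FakeTableStore` is a hypothetical in-memory stand-in for the published table, not the real client, and the task names and metric values are made up for illustration.

```python
class FakeTableStore:
    """Hypothetical in-memory stand-in for the published group-logs table."""

    def __init__(self):
        self.rows = []

    def list_rows(self):
        # Mirrors the "list existing metrics" step: returns a snapshot copy.
        return list(self.rows)

    def publish(self, rows):
        # Mirrors the "publish" step: replaces the whole table.
        self.rows = list(rows)


def simulate_lost_update():
    """Both tasks list before either publishes, so one row is lost."""
    store = FakeTableStore()
    snapshot_a = store.list_rows()
    snapshot_b = store.list_rows()
    store.publish(snapshot_a + [{"task": "eval-a", "bleu": 30.1}])
    # Task B publishes from its stale snapshot and overrides task A's row.
    store.publish(snapshot_b + [{"task": "eval-b", "bleu": 28.4}])
    return store.rows


def simulate_sequential():
    """Each task lists and publishes before the next one starts."""
    store = FakeTableStore()
    for row in ({"task": "eval-a", "bleu": 30.1}, {"task": "eval-b", "bleu": 28.4}):
        store.publish(store.list_rows() + [row])
    return store.rows
```

In the interleaved case only the last publisher's row survives, which is the override described above. The window for it is roughly the listing time plus the publishing time.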
I agree the chances are low, but as we discussed, let's explore the report-based approach so that we don't have to republish the evals.
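A rough sketch of what the report-based approach could look like, assuming each eval run already stores its metrics in a per-run summary. `build_report_rows` and the input shape are hypothetical; how the summaries are fetched from W&B is left out.

```python
def build_report_rows(run_summaries, metric_names):
    """Derive report rows from metrics already written to each run.

    run_summaries: dict mapping a run name to its metrics dict
                   (hypothetical shape, for illustration only).
    metric_names:  metrics to show as columns, in order.
    """
    rows = []
    # Sort by run name so the report is stable regardless of finish order.
    for run_name in sorted(run_summaries):
        summary = run_summaries[run_name]
        # A metric missing from a run shows up as None instead of failing.
        rows.append([run_name] + [summary.get(name) for name in metric_names])
    return rows
```

Since the rows are derived read-only from the runs, no eval task writes to a shared table, so the ordering of concurrent tasks no longer matters.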
82e7c53 to e0cee4c
Based on #660
Closes #697
This was more complex than I thought, as it is not possible to append rows to an existing table from the Python client.
I ran the eval.py script for every evaluation task of group N1O85rIASLmCwKfUpCvTlw. Results are displayed on https://wandb.ai/teklia/da-en/runs/5ljd4a4n