Models are missing in group logs #874

Open
Tracked by #164
eu9ene opened this issue Oct 10, 2024 · 1 comment
Labels
bug (Something is broken or not correct), weights and biases (Integration with Weights and Biases)

eu9ene commented Oct 10, 2024

Some models are missing from the final evals table in the group logs. This seems to be the case for both online and offline uploading. Here's an example for zh-en. It was tracked online with slightly outdated code, so I'm not 100% sure.

https://wandb.ai/moz-translations/zh-en/runs/group_logs_dziji?nw=nwuserepavlov

The same group was uploaded recently by @vrigal with the latest offline uploader:

https://wandb.ai/teklia/zh-en-taskcluster/runs/group_logs_dziji?nw=nwuserepavlov

Both are missing some models. If we look at the group in Taskcluster, we can see it includes evals for: teacher1, teacher2, teacher-ensemble, student, and finetuned-student. Only some of those models are present in the tables.

https://firefox-ci-tc.services.mozilla.com/tasks/groups/dzijiL-PQ4ScKBB3oIjGQg#eval

I wonder if this could be due to W&B display issues. See #716


vrigal commented Oct 11, 2024

The example you mention ran on 2024-09-25T17:32:07.715Z, so I think the code was up to date, except for the COMET score, which has since been patched (x100). The charts published with the runs seem correct, except that I cannot see quantized metrics in the reupload. I'm not sure why this happens with the Taskcluster group publication.

Concerning the missing lines in the group table from the online run, it looks like an issue with how the table is incremented. Successive versions of the table artifact seem to overwrite the previous row. This is particularly visible with versions 17→19.
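One way to check this would be to download two versions of the table artifact and list the rows that exist in the older one but not in the newer one. This is only a minimal sketch: it assumes each version is a JSON object with `columns` and `data` keys, the `lost_rows` helper is hypothetical, and the sample rows below are made up for illustration rather than taken from the real run.

```python
def lost_rows(older, newer):
    """Return rows present in the older table version but absent from the newer one."""
    newer_rows = {tuple(row) for row in newer["data"]}
    return [row for row in older["data"] if tuple(row) not in newer_rows]


# Made-up sample data standing in for two successive table versions.
older_version = {
    "columns": ["model", "metric", "value"],
    "data": [["teacher1", "bleu", 30.1]],
}
newer_version = {
    "columns": ["model", "metric", "value"],
    "data": [["student", "bleu", 27.4]],
}

# If the table were truly incremental, this list would be empty.
print(lost_rows(older_version, newer_version))
```

A non-empty result between consecutive versions would support the idea that rows are being replaced instead of appended.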

Concerning the missing runs, the table is published once but contains only 7 entries (out of 40):
https://api.wandb.ai/files/teklia/zh-en-taskcluster/group_logs_dziji/media/table/metrics_0_2f49ec3ae32495b07610.table.json
This will require more investigation. It may be due to tasks that are not supported by the parse_tc_group entrypoint.
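To narrow down which models are affected, the published table JSON could be compared with the list of models that have evals in the Taskcluster group. The sketch below assumes the file has `columns` and `data` keys and that one column holds the model name. The column name `"model"`, the `missing_models` helper, and the sample rows are all hypothetical and would need adjusting to the real file.

```python
# Models with evals in the Taskcluster group, as listed in the issue description.
EXPECTED_MODELS = (
    "teacher1",
    "teacher2",
    "teacher-ensemble",
    "student",
    "finetuned-student",
)


def missing_models(table, model_column, expected=EXPECTED_MODELS):
    """Return the expected models that have no row in the table."""
    idx = table["columns"].index(model_column)
    present = {row[idx] for row in table["data"]}
    return [model for model in expected if model not in present]


# Made-up sample standing in for the downloaded table.json content.
sample_table = {
    "columns": ["model", "dataset", "chrf"],
    "data": [
        ["teacher1", "flores", 0.5],
        ["student", "flores", 0.4],
    ],
}

print(len(sample_table["data"]), "rows")
print("missing:", missing_models(sample_table, "model"))
```

With the real file loaded via `json.load`, the same call would show whether the gap is limited to specific task types.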

I noticed another important issue concerning the reuploading workflow: #875
