Publishing from a Taskcluster group with the --overide-runs argument deletes the group's existing runs, but then fails to create the new ones:
wandb: ERROR Error while calling W&B API: run teacher-1_dziji was previously created and deleted; try a new run name (<Response [409]>)
Note: it is the ID that conflicts here, not the name as the message above suggests.
Furthermore, the client then hangs for 90 seconds:
wandb.errors.CommError: Run initialization has timed out after 90.0 sec.
This is annoying because we cannot both identify runs by a unique ID (<name>_<group_id>) and allow overriding a run in an existing project. Unfortunately, deleting all artifacts from the project does not seem to fix it. A possible quick fix would be to detect this exception and retry with a suffix (the name and ID would then be teacher-1_dziji_1, teacher-1_dziji_2…). That should work, although the display is not ideal and may be confusing, so we should at least document it.
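The retry-with-suffix idea could look roughly like the sketch below. It is only an illustration: `candidate_ids`, `init_with_suffix`, and the `init_run` callable are hypothetical names, not part of the existing publication code or of the W&B client, and the cap on attempts is an assumed choice.

```python
from typing import Callable, Iterator, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def candidate_ids(base_id: str, max_attempts: int) -> Iterator[str]:
    """Yield base_id, then base_id_1, base_id_2, ... (max_attempts values in total)."""
    yield base_id
    for index in range(1, max_attempts):
        yield f"{base_id}_{index}"


def init_with_suffix(
    init_run: Callable[[str], T],
    base_id: str,
    max_attempts: int = 5,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> Tuple[str, T]:
    """Call init_run with successive candidate IDs until one is accepted.

    init_run is a hypothetical callable that creates the run for a given ID
    and raises one of the retry_on exceptions when that ID is refused.
    Returns the ID that worked together with whatever init_run returned.
    """
    last_error: Optional[BaseException] = None
    for run_id in candidate_ids(base_id, max_attempts):
        try:
            return run_id, init_run(run_id)
        except retry_on as error:
            # Remember the failure and try the next suffixed ID.
            last_error = error
    raise RuntimeError(
        f"no usable run ID found for {base_id!r} after {max_attempts} attempts"
    ) from last_error
```

With the W&B client, `init_run` might be something like `lambda run_id: wandb.init(project=..., name=run_id, id=run_id)` with `retry_on=(wandb.errors.CommError,)`. Since each refused ID currently appears to cost the full 90 second timeout, a small `max_attempts` is probably wise.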
I think W&B disallows overriding a run because it keeps the data so that deleted runs can be restored for 7 days (see this issue: wandb/wandb#6395). In the worst case, we could clean everything now (with --overide-runs), then hope that re-uploading works in a week. It would be worth contacting the W&B team about this.
I suppose we never noticed it since we began using similar names and IDs to identify runs in the bar charts.
Unfortunately, the web interface seems to behave the same way (as suggested in their issue).
I suggest discussing this with the W&B team, at least to confirm that we can override a run after 7 days. They may also be able to delete the run from the database directly so that we can publish again, or suggest a short-term alternative (for example, erasing the content of a run and then using the client's resume option).