Wandb logging bug using Iteration based runner #2069
Comments
@ayulockin can probably help here :) |
Hey @levan92, is the issue that the WandbLoggerHook is not logging validation metrics, or is it that the steps (x axis) are not correct (meaningful)? Also, since you are using MMDetection, can you give |
Thanks @ayulockin! Will try out the MMDetWandbHook. Yup, the issue is that Wandb is not logging any of the val values at all, as wandb's step is higher than the given step (because of the reason I gave at the end of my original post). |
@ayulockin I've tried out |
Hey @levan92, I have faced the same issue but didn't dig deeper. I have a hunch that it has something to do with MMDetection's I can confirm that both |
Yup, in A solution is to call To reproduce, you can use the config snippet I provided above in the original post (append it to the existing cfg file I referenced). |
Here's an ugly fix that works: In
@master_only
def log(self, runner) -> None:
    tags = self.get_loggable_tags(runner)
    if tags:
        step = self.get_iter(runner)
        if not self.by_epoch and (step % self.eval_interval) == 0 and not self.eval_step:
            commit = False
            self.eval_step = True  # eval step next
        else:
            commit = self.commit
            self.eval_step = False
        if self.with_step:
            self.wandb.log(
                tags, step=step, commit=commit)
        else:
            tags['global_step'] = step
            self.wandb.log(tags, commit=commit)
What do you think? There's probably a more elegant way of doing this, but this works for me for now. |
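For readers following the fix above, the commit decision can be isolated into a small pure function and traced over a few log calls. This is only a sketch: `choose_commit` and its parameters are hypothetical names, not part of mmcv, and it assumes a training log call lands on every evaluation step (otherwise the deferred commit would attach to the wrong call).

```python
def choose_commit(step, eval_interval, by_epoch, eval_pending, default_commit=True):
    """Hypothetical helper mirroring the hot-fix's branch.

    Returns (commit, eval_pending) for a single log call.
    """
    if not by_epoch and step % eval_interval == 0 and not eval_pending:
        # Training log on an eval step: hold the commit so the
        # validation metrics can be merged into the same step.
        return False, True
    # Any other call (including the validation log that follows)
    # uses the hook's configured commit value.
    return default_commit, False


# Sequence of log calls: train@50, train@100, val@100, train@150
calls = [50, 100, 100, 150]
eval_pending = False
decisions = []
for step in calls:
    commit, eval_pending = choose_commit(step, 100, False, eval_pending)
    decisions.append((step, commit))

print(decisions)
```

Only the training call at step 100 is held back; the validation call at the same step then commits both sets of metrics together.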
Hey @levan92, I don't think that should be an issue. W&B doesn't care about the step mismatch. In the UI the validation metric will be at nth step where For a more elegant solution you should check out |
@ayulockin Ah I see. However, when I ran it originally, this warning message was showing after each validation
Also, none of these val metrics were logged to the wandb run. See original wandb run here. However, after applying my hot-fix above, the warning message no longer shows up and the val metrics are successfully logged to wandb. See wandb run after fix here. [Update] I tried the same experiments with an upgraded wandb (pip version 0.12.0 to 0.12.9). The warning message does not show on the newer version, but the behaviour remains the same: val metrics are only logged to wandb after applying the fix above. |
Thanks for checking it out @levan92. I will investigate more in this direction and make a PR to fix it. |
Hi @levan92 , as a workaround, you can set mmcv/mmcv/runner/hooks/logger/wandb.py Line 35 in 1f25001
|
Describe the Issue
Validation metrics are not logged to wandb when using IterBasedRunner.
Reproduction
Here's a simple reproduction of the bug.
Using mmdetection, in config file faster_rcnn/faster_rcnn_r50_caffe_c4_1x_coco.py with the following edits:
Environment
Python 3.8.8
mmcv 1.4.5
mmdet 2.25.0
wandb 0.12.0
Error traceback
If applicable, paste the error traceback here.
Bug fix
In the wandb hook's log method, self.wandb.log is always called with commit=True by default. Therefore, the log call from the last training step (before validation) causes wandb to increment its step by one. Then, when wandb.log is called for the validation metrics, wandb's step is ahead of the current step (at validation) by one.
Is there a good way to commit only after each validation is done?
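To make the off-by-one concrete, here is a toy model of the step counter as described above. `ToyRun` is a hypothetical stand-in written for illustration, not wandb's actual implementation; it only assumes that a committed log advances the internal step and that a log with an older step is dropped.

```python
class ToyRun:
    """Toy stand-in for a run's step counter (NOT the real wandb code)."""

    def __init__(self):
        self.step = 0        # internal step counter
        self.pending = {}    # metrics accumulated for the current step
        self.history = {}    # committed rows, keyed by step
        self.dropped = []    # log calls rejected for having an old step

    def _flush(self):
        if self.pending:
            self.history[self.step] = dict(self.pending)
            self.pending = {}

    def log(self, data, step=None, commit=True):
        if step is not None:
            if step < self.step:
                # Assumed behaviour: steps must not go backwards.
                self.dropped.append((step, dict(data)))
                return
            if step > self.step:
                self._flush()
                self.step = step
        self.pending.update(data)
        if commit:
            self._flush()
            self.step += 1   # a commit moves the counter past this step


# Current behaviour: both calls commit, so the val metrics are dropped.
run = ToyRun()
run.log({"train/loss": 0.5}, step=100, commit=True)
run.log({"val/mAP": 0.3}, step=100, commit=True)

# Deferring the commit keeps both sets of metrics on step 100.
fixed = ToyRun()
fixed.log({"train/loss": 0.5}, step=100, commit=False)
fixed.log({"val/mAP": 0.3}, step=100, commit=True)

print(run.dropped, fixed.history)
```

In this model the validation metrics survive only when the preceding training log at the same step is left uncommitted.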