Describe the Issue
Metrics calculated by the evaluation hook are sometimes logged as `train/{metric_name}` and sometimes as `val/{metric_name}`.

More precisely, imagine that the evaluation interval is 250 iterations and the logging interval is 20 iterations. On the 250th iteration your evaluation results are logged as `val/{metric_name}`. After that, on the 500th iteration both train-loss logging and evaluation occur, and your evaluation results are logged as `train/{metric_name}`.
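For concreteness, a setup like the one above would typically come from config values along these lines (an illustrative snippet; the hook names are the usual mmcv logger hooks, but the surrounding config layout is project-specific):

```python
# Illustrative config: evaluate every 250 iterations,
# log training metrics every 20 iterations.
evaluation = dict(interval=250)
log_config = dict(
    interval=20,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook'),
    ])
```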
It is extremely frustrating, especially if you use the TensorBoard logger, which builds two different charts for `train/{metric_name}` and `val/{metric_name}`.

Bug fix
This issue is caused by this line of code: https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/logger/base.py#L61
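For context, the logic at that line is roughly the following (paraphrased from memory, so treat it as an approximation of the actual source rather than a verbatim copy):

```python
# Paraphrase of the mode-detection logic in LoggerHook.get_mode: while the
# runner is in 'train' mode, the tag is guessed from whether 'time' is present
# in the log buffer, and 'time' is only written when a training iteration is
# being logged.
def get_mode(runner):
    if runner.mode == 'train':
        if 'time' in runner.log_buffer.output:
            return 'train'   # a train-loss logging step: tag metrics as train/
        return 'val'         # no timing info in the buffer: assume evaluation
    return 'val'             # runner is already in validation mode
```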
When a training iteration is logged, `time` is in the log buffer; otherwise it is not. That line comes from this PR, which applied the `get_mode` method everywhere, even though it was previously used only in the text logger, which logs the mode for the whole iteration rather than for each data point separately. Some of the issues caused by that PR have already been fixed.

Before that pull request was merged, evaluation metrics were always logged with the `train` tag. If such behavior is acceptable, I am willing to create a PR.

However, if you think we need to always log evaluation results with the `val` tag, it will require a lot of redesigning, because we will either need to create separate hook methods for evaluation or make `EvalHook` explicitly set `val` mode and flush the logger in the same manner as the runner's `val(...)` method currently does.
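To make the second option more concrete, here is a rough sketch of what forcing `val` mode could look like. This is only an outline of the idea, not the actual mmcv `EvalHook`: `eval_fn` is an assumed user-supplied callable standing in for the real evaluation logic, and the flush step simply reuses the existing `after_val_epoch` hook point.

```python
from mmcv.runner import Hook


class ValModeEvalHook(Hook):
    """Hypothetical eval hook that logs its metrics under the val/ tag.

    `eval_fn` is an assumed callable taking the runner and returning a dict of
    metrics; the real EvalHook instead runs the test pipeline over a dataloader.
    """

    def __init__(self, eval_fn, interval=250):
        self.eval_fn = eval_fn
        self.interval = interval

    def after_train_iter(self, runner):
        if not self.every_n_iters(runner, self.interval):
            return
        metrics = self.eval_fn(runner)          # e.g. {'mAP': 0.42}
        prev_mode = runner.mode
        runner.mode = 'val'                     # logger hooks now pick the val/ tag
        runner.log_buffer.update(metrics)       # push metrics into the log buffer
        runner.log_buffer.ready = True          # mark them ready to be written out
        runner.call_hook('after_val_epoch')     # flush logger hooks, as runner.val(...) does
        runner.mode = prev_mode                 # hand control back to training
```

The design choice here is the same one described above: the hook temporarily puts the runner into `val` mode and flushes the loggers immediately, instead of leaving the evaluation metrics in the buffer to be tagged by whatever mode the next logging step happens to detect.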