This repository has been archived by the owner on May 1, 2024. It is now read-only.

Possible 1-image difference in test accuracy between logger file and manual calculation #8

Open
bbartoldson opened this issue Apr 9, 2021 · 0 comments

Hi Jonathan,

Thank you for sharing this awesome code!

I loaded 8 models from checkpoint.pth files created by lottery experiments, tested them on CIFAR10, and compared those test accuracies to the accuracies in the logger files. For 3/8 models, the two accuracies differed by 0.0001 (e.g., 78.53% vs. 78.54%), which is a single image on the 10,000-image CIFAR10 test set. The logger accuracy could be higher or lower than my manually calculated accuracy; i.e., neither source was systematically higher.

If you know of a potential reason for this, please let me know, and I will investigate. For example, I thought a float() or str() conversion in the MetricLogger might be responsible, but I don't think that is it. It's also possible that I'm not instantiating the PrunedModel properly when I build it manually from the checkpoint.pth files; if that were the case, though, I'm not sure why the accuracies would be exactly equal for 5/8 models and differ by only 0.0001 for the other 3/8. In case it's helpful, these 8 models come from 8 levels of pruning during a lottery experiment (so they have the same architecture).
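
A minimal sketch of how the two numbers can be compared as integer counts of correct predictions rather than as formatted accuracies. The helper names `accuracy_to_correct_count` and `image_difference` are hypothetical and not part of the repository, and the sketch assumes the accuracies are stored as plain Python floats in [0, 1] over the 10,000-image CIFAR10 test set.

```python
# Hypothetical helpers for illustration only; not taken from the repository.
TEST_SET_SIZE = 10_000  # number of images in the CIFAR10 test set


def accuracy_to_correct_count(accuracy, n=TEST_SET_SIZE):
    """Convert a fractional accuracy to the nearest integer count of correct predictions."""
    return round(accuracy * n)


def image_difference(logged_accuracy, manual_accuracy, n=TEST_SET_SIZE):
    """Number of test images by which two accuracies disagree."""
    return abs(
        accuracy_to_correct_count(logged_accuracy, n)
        - accuracy_to_correct_count(manual_accuracy, n)
    )


# The example from above: 78.53% vs. 78.54% is a one-image difference.
print(image_difference(0.7853, 0.7854))  # 1

# For a plain Python 3 float, str() followed by float() round-trips exactly,
# so this conversion alone should not shift the count.
x = 7853 / 10_000
print(float(str(x)) == x)  # True
```

If the integer counts differ by one image, the discrepancy comes from the predictions themselves rather than from how the accuracy is formatted or parsed.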

Thanks!

Brian
