🚀 Feature
When a ckpt_path is passed to the test/validate/predict functions of the Trainer, they should load the weights from that checkpoint even if a model is also provided.
Motivation
I noticed that one of our DeepSpeed tests was incorrect (see here). resume_from_checkpoint does not reload the weights for test/validate/predict, which is probably the right thing to do. However, when I modified the test to pass ckpt_path to the test function, I noticed that the weights are not loaded, which is the current default behaviour.
As described by @carmocca, I suggest we change the behaviour as follows:
BEFORE
```python
trainer.test(model, ckpt_path=None)  # use provided model
trainer.test(model, ckpt_path='best')  # use provided model, ignore ckpt_path
trainer.test(model, ckpt_path='my_path')  # use provided model, ignore ckpt_path
trainer.fit(model)
# then
trainer.test(ckpt_path=None)  # use latest model
trainer.test(ckpt_path='my_path')  # load path
```
AFTER
```python
trainer.test(model, ckpt_path=None)  # use provided model
trainer.test(model, ckpt_path='best')  # load best model
trainer.test(model, ckpt_path='my_path')  # load path
trainer.fit(model)
# then
trainer.test(ckpt_path=None)  # load best model
trainer.test(ckpt_path='my_path')  # load path
```
In my opinion this brings the behaviour in line with what users expect, and it allows DeepSpeed to be used as an engine in cases where inference cannot happen without the Trainer (e.g. when sharding has to be orchestrated).
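The proposed AFTER rules can be expressed as a small decision function. This is a minimal sketch written to run without Lightning: the function name resolve_weights_source, its parameters, the PROVIDED_MODEL sentinel and the error raised when no best checkpoint exists are all hypothetical and do not reflect Lightning's actual internals. best_model_path stands in for whatever path the checkpoint callback tracks as "best" (an assumption).

```python
from typing import Optional

PROVIDED_MODEL = "provided model"  # hypothetical sentinel: keep in-memory weights


def resolve_weights_source(
    model_passed: bool,
    ckpt_path: Optional[str],
    best_model_path: Optional[str],
) -> str:
    """Hypothetical sketch of the proposed rule for test/validate/predict.

    Returns PROVIDED_MODEL to keep the passed model's weights, otherwise
    the checkpoint path whose weights should be loaded.
    """
    if ckpt_path is None:
        if model_passed:
            # Model given and no checkpoint requested: use the model as-is.
            return PROVIDED_MODEL
        # No model passed (e.g. after fit): fall back to the best checkpoint.
        ckpt_path = "best"
    if ckpt_path == "best":
        # Assumption: fail loudly when no best checkpoint is tracked.
        if not best_model_path:
            raise ValueError("no best checkpoint is available to load")
        return best_model_path
    # Any explicit path is loaded, even when a model was also passed.
    return ckpt_path


if __name__ == "__main__":
    best = "checkpoints/best.ckpt"  # hypothetical path
    cases = [
        (True, None),
        (True, "best"),
        (True, "my_path"),
        (False, None),
        (False, "my_path"),
    ]
    for model_passed, ckpt in cases:
        print(model_passed, ckpt, "->", resolve_weights_source(model_passed, ckpt, best))
```

Keeping ckpt_path=None distinct from an explicit 'best' is what lets trainer.test(model) keep the in-memory weights while trainer.test(model, ckpt_path='best') reloads them.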