-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CLI] Can't launch test
command from checkpoint because "fit" key added to top level of CLI config
#11463
Comments
To not have |
@mauvilsa would you like to work on fixing this? |
I was just wondering. Correct me if I am wrong, but using self.save_hyperparameters() is recommended practice and so the checkpoint should contain the hyperparams necessary to load the model. Would it be possible to do something like that:
This would allow one to only have to run: Maybe 1 and 2 could be done with load_from_checkpoint when the model is given to the CLI: CLI(
BoringModel,
RandomDataModule,
save_config_overwrite=True,
) However, when dealing with |
I could but not sure when. Independent from this, is there any reason why we shouldn't do this or what else needs to be done? One minor I could think of is that by removing the subcommand key there is no way to know which subcommand was executed. |
I think it would be great if the model hyperparameters are saved in the checkpoint so that for test/predict the model can be loaded without the need to give a config. Though, note that |
So there's two topics being discussed here:
I don't think so.
This should be discussed in a separate issue, however, saving hypeparameters is not required with the CLI, perhaps even discouraged since the CLI config can do much more as @mauvilsa mentioned above.
We need the Trainer to be set up for this. This is out of the scope of the CLI. |
I opened #11532 which fixes:
|
Thank you very much @carmocca and @mauvilsa ! I agree with you for the two points you mentioned @carmocca. How do we, as users, use the Lightning CLI to run the standard model usage pipeline (train-val, test and predict) ? What parameters do I have to provide the CLI at each stage (configs, checkpoint, config overwrite...) ? Why do I need to configure the optimizer and scheduler for the test stage (if using linked arguments, it throws an error in the test stage if not configured) ? The current documentation is not clear about what we should do to execute the test and predict commands through the CLI. maybe related are: I'm currently in the process of publishing a research paper, and I would really like to use your CLI in my released code. Many thanks again and have a good day ! I am looking forward to the merge of the PR 👀 |
For these use cases, you have 3 options:
We should totally skip this in that case. Can you open a separate issue?
The patch will be included with the 1.5.9 release |
🐛 Bug
Hello everyone and many thanks for your awesome work.
The example given here is a dummy one. I reduced my issue to a simple, reproducible example.
Let's consider the
fit
andtest
stage. Between those two stages, the model have the same parameters. However, the data might be different (different split).When using the CLI for the
test
command, I expected to only give a config overwrite for the data, but I have to give the config for everything.Followed by
won't work as is and will raise a
TypeError: empty(): ....
I can't use the config from the log file:
as it will raise
test_trainer.py: error: 'Configuration check failed :: No action for destination key "fit.model.chans" to check its value.'
because a "fit" is added by the CLI at the top of the original config file.I have to use:
Which makes a trained network (or its ckpt, or log dir) not self-contained, and it can be easy to use the wrong config at test time.
To Reproduce
config.yaml
test_data.yaml
In reality, the parameters of data would be very different (e.g., split_file: train.txt/split_file=test.txt). Here it is just to illustrate a point.
trainer.py
Expected behavior
I would expect the following sequence of command to work:
train
test
or (for test):
but lightning_logs/version_0/config.yaml has now an additional "fit" key:
Environment
Additional context
Related issues:
#10460
Have a good day.
cc @carmocca @mauvilsa @rbracco
The text was updated successfully, but these errors were encountered: