[CLI] Trainer pipelines (tune+fit, fit+test) using the CLI #8385
Comments
If it helps simplify the interface, I think it is reasonable that if you are doing multiple things (e.g. tune, train, then test), the config files are all applied prior to any of these stages, and only once. The stages in the chain are then themselves responsible for any modifications. It would also be much nicer to have the tuning options inside the underlying config, so that you only need to save out the one config file. If people want to run the stages in some order with weird stuff in between, they can call the […] themselves. The other thing to keep in mind is that the default […].

Right now, the way I had to add tuning was the following:

import os
from pathlib import Path

from pytorch_lightning.utilities.cli import LightningCLI


class MyLightningCLI(LightningCLI):
    def add_arguments_to_parser(self, parser):
        parser.add_argument('--tune', action="store_true")
        parser.add_argument('--tune_max_lr', type=float, default=1e-2)
        parser.add_argument('--tune_save_results', action="store_true")

    def before_fit(self):
        if self.config["tune"]:
            # run the learning-rate finder before fitting
            lr_finder = self.trainer.tuner.lr_find(self.model, max_lr=self.config["tune_max_lr"])  # could add more?
            suggested_lr = lr_finder.suggestion()
            print(f"Changing learning rate to {suggested_lr}\n")
            self.model.hparams.learning_rate = suggested_lr
            if self.config["tune_save_results"]:
                # save the LR finder plot next to the other logs
                fig = lr_finder.plot(suggest=True)
                logdir = Path(self.trainer.log_dir)
                if not os.path.exists(logdir):
                    os.makedirs(logdir)
                fig.savefig(logdir / "lr_results.png")

I am not saying that this should necessarily be any simpler in the code (though it could be if the […]). I am not sure what […]. One last, lower-priority thing about these "chains": there are many cases where having an additional stage might be useful (e.g. the choice to "pretrain" a model). For that, I am worried it is too particular to the problems that I solve and not sufficiently universal to be worth formalizing, but if you add in user-defined actions it might make it more convenient. Otherwise I can hijack […].
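For reference, a minimal sketch of how the customized CLI above might be wired into a script; MyModel, MyDataModule, and the script/config names are placeholders rather than anything from the comment:

# hypothetical entry point (e.g. train.py) using the MyLightningCLI defined above
if __name__ == "__main__":
    cli = MyLightningCLI(MyModel, MyDataModule)

It would then be invoked along the lines of python train.py --config config.yaml --tune --tune_save_results, with fit running afterwards as usual.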
I'd be interested in such a feature, too. I usually execute train + test to have all the necessary data present. I'm using Comet ML, and at the moment I store the experiment key so that I don't get a new experiment run when I want to test the model I've just trained. It works, but it would be convenient to be able to call multiple subcommands at once.
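A rough sketch of the workaround described above, assuming pytorch_lightning's CometLogger accepts an experiment_key argument for attaching to an existing experiment (the project name and how the key is stored are illustrative):

from pytorch_lightning.loggers import CometLogger

# during training: create the logger and keep its experiment key around
train_logger = CometLogger(project_name="my-project")
saved_key = train_logger.experiment.get_key()
# ... Trainer(logger=train_logger).fit(model) ...

# later, for testing: reuse the stored key so no new Comet experiment is created
test_logger = CometLogger(project_name="my-project", experiment_key=saved_key)
# ... Trainer(logger=test_logger).test(model) ...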
I currently enhance LightningCLI myself to do this; if anyone is interested, please refer to https://github.com/tshu-w/deep-learning-project-template/blob/main/src/lit_cli.py#L70
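For a rough idea of what such an enhancement can look like (a hedged sketch, not the linked implementation), a fit+test chain can be approximated by reusing the after_fit hook, analogous to the before_fit hook shown in the earlier comment, assuming that hook is available in your LightningCLI version:

from pytorch_lightning.utilities.cli import LightningCLI

class FitThenTestCLI(LightningCLI):
    def after_fit(self):
        # run test immediately after fit, on the same trainer/logger;
        # assumes the datamodule defines a test set
        self.trainer.test(self.model, datamodule=self.datamodule)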
Yeah, that's definitely a possibility; I still think it would be nice to have it supported directly :)
CLI users already get overwhelmed with merging config files and the difference between an argument placed before and after a subcommand. This proposal would make things much more complex in this regard and would push the limits of what one can do with command-line input or a configuration file. The suggested alternatives are: […]
🚀 Feature
Add CLI subcommands to chain trainer functions in a single CLI command
Motivation
As a user, I might want to run multiple trainer commands one after the other from the CLI, without needing to save the hyperparameters found by tune and set them manually, and while using a single TensorBoard run to compare fit/test results.
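For comparison, achieving this today without CLI support means writing a small script that calls the trainer stages in sequence; a minimal sketch, with MyModel and MyDataModule as placeholder classes:

from pytorch_lightning import Trainer

model = MyModel()
datamodule = MyDataModule()

trainer = Trainer(auto_lr_find=True, max_epochs=5)
trainer.tune(model, datamodule=datamodule)  # e.g. run the learning-rate finder
trainer.fit(model, datamodule=datamodule)
trainer.test(model, datamodule=datamodule)  # same trainer, so the same logger is reused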
For example, the chained invocation could look like:

python script.py tune --trainer.auto_lr_find=True --config=tune_config.yaml fit --max_epochs=5 --config=fit_config.yaml

Risks
It's not obvious whether a single trainer should be instantiated for both subcommands with auto_lr_find=True, max_epochs=5, ... and the config files merged, or whether the subcommands should be kept separate, with a different trainer instance created for each.
The many possible permutations could be confusing for users not familiar with this chaining structure.
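To make the ambiguity concrete, the two readings of the example command differ roughly as follows (illustrative Python dicts, not actual CLI behaviour):

# (a) one shared Trainer: arguments and config files from both subcommands are merged
shared_trainer_kwargs = dict(auto_lr_find=True, max_epochs=5)

# (b) one Trainer per subcommand, each seeing only its own arguments/config file
tune_trainer_kwargs = dict(auto_lr_find=True)  # from tune args + tune_config.yaml
fit_trainer_kwargs = dict(max_epochs=5)        # from fit args + fit_config.yaml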