[CLI] Trainer pipelines (tune+fit, fit+test) using the CLI #8385
Comments
If it helps simplify the interface, I think it is reasonable that if you are doing multiple things (e.g. tune, train, then test), the config files are all applied prior to any of these stages, and only once. The stages in the chain are then themselves responsible for any modifications. It would also be much nicer to have the tuning options inside the underlying config, so that you only need to save out the one config file. If people want to run the stages in some order with weird stuff in between, they can call the […] themselves. The other thing to keep in mind is that the default […].

Right now, the way I had to add tuning was the following:

import os
from pathlib import Path

from pytorch_lightning.utilities.cli import LightningCLI


class MyLightningCLI(LightningCLI):
    def add_arguments_to_parser(self, parser):
        parser.add_argument('--tune', action="store_true")
        parser.add_argument('--tune_max_lr', type=float, default=1e-2)
        parser.add_argument('--tune_save_results', action="store_true")

    def before_fit(self):
        if self.config["tune"]:
            # run the learning-rate finder before fitting
            lr_finder = self.trainer.tuner.lr_find(self.model, max_lr=self.config["tune_max_lr"])  # could add more?
            suggested_lr = lr_finder.suggestion()
            print(f"Changing learning rate to {suggested_lr}\n")
            self.model.hparams.learning_rate = suggested_lr
            if self.config["tune_save_results"]:
                # save the LR finder plot next to the other logs
                fig = lr_finder.plot(suggest=True)
                logdir = Path(self.trainer.log_dir)
                if not os.path.exists(logdir):
                    os.makedirs(logdir)
                fig.savefig(logdir / "lr_results.png")

I am not saying that this should necessarily be any simpler in the code (though it could be if the […]). I am not sure what […]. One last, lower-priority thing about these "chains": there are many cases where having an additional stage might be useful (e.g. the choice to "pretrain" a model). For that, I am worried it is too particular to the problems that I solve and not sufficiently universal to be worth formalizing, but if you add in user-defined actions it might make it more convenient. Otherwise I can hijack […].
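For reference, a minimal sketch of how the customized CLI above might be wired into a script; MyModel, MyDataModule, and the script/config names are placeholders rather than anything from the comment:

# hypothetical entry point (e.g. train.py) using the MyLightningCLI defined above
if __name__ == "__main__":
    cli = MyLightningCLI(MyModel, MyDataModule)

It would then be invoked along the lines of python train.py --config config.yaml --tune --tune_save_results, with fit running afterwards as usual.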
I'd be interested in such a feature, too. I usually execute train + test to have all the necessary data present. I'm using Comet ML, and at the moment I store the experiment key so that I don't get a new experiment run when I want to test the model I've just trained. It works, but it would be convenient to be able to call multiple subcommands at once.
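A rough sketch of the workaround described above, assuming pytorch_lightning's CometLogger accepts an experiment_key argument for attaching to an existing experiment (the project name and how the key is stored are illustrative):

from pytorch_lightning.loggers import CometLogger

# during training: create the logger and keep its experiment key around
train_logger = CometLogger(project_name="my-project")
saved_key = train_logger.experiment.get_key()
# ... Trainer(logger=train_logger).fit(model) ...

# later, for testing: reuse the stored key so no new Comet experiment is created
test_logger = CometLogger(project_name="my-project", experiment_key=saved_key)
# ... Trainer(logger=test_logger).test(model) ...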
I currently enhance LightningCLI myself to do this; if anyone is interested, please refer to https://github.com/tshu-w/deep-learning-project-template/blob/main/src/lit_cli.py#L70
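For a rough idea of what such an enhancement can look like (a hedged sketch, not the linked implementation), a fit+test chain can be approximated by reusing the after_fit hook, analogous to the before_fit hook shown in the earlier comment, assuming that hook is available in your LightningCLI version:

from pytorch_lightning.utilities.cli import LightningCLI

class FitThenTestCLI(LightningCLI):
    def after_fit(self):
        # run test immediately after fit, on the same trainer/logger;
        # assumes the datamodule defines a test set
        self.trainer.test(self.model, datamodule=self.datamodule)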
Yeah, that's definitely a possibility; I still think it would be nice to have it supported directly :)
CLI users already get overwhelmed with merging config files and the difference between an argument placed before and after a subcommand. This proposal would make things much more complex in this regard and would push the limits of what one can do with command-line input or a configuration file. The suggested alternatives are: […]
🚀 Feature
Add CLI subcommands to chain trainer functions in a single CLI command
Motivation
As a user, I might want to run multiple trainer commands one after the other from the CLI, without needing to save the hyperparameters found by tune and set them manually, and while using a single TensorBoard run to compare fit/test results.
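For comparison, achieving this today without CLI support means writing a small script that calls the trainer stages in sequence; a minimal sketch, with MyModel and MyDataModule as placeholder classes:

from pytorch_lightning import Trainer

model = MyModel()
datamodule = MyDataModule()

trainer = Trainer(auto_lr_find=True, max_epochs=5)
trainer.tune(model, datamodule=datamodule)  # e.g. run the learning-rate finder
trainer.fit(model, datamodule=datamodule)
trainer.test(model, datamodule=datamodule)  # same trainer, so the same logger is reused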
For example, the chained invocation could look like:

python script.py tune --trainer.auto_lr_find=True --config=tune_config.yaml fit --max_epochs=5 --config=fit_config.yaml

Risks
It's not obvious whether a single trainer should be instantiated for both subcommands with auto_lr_find=True, max_epochs=5, ... and the config files merged, or whether the subcommands should be kept separate, with a different trainer instance created for each.
The many possible permutations could be confusing for users not familiar with this chaining structure.
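To make the ambiguity concrete, the two readings of the example command differ roughly as follows (illustrative Python dicts, not actual CLI behaviour):

# (a) one shared Trainer: arguments and config files from both subcommands are merged
shared_trainer_kwargs = dict(auto_lr_find=True, max_epochs=5)

# (b) one Trainer per subcommand, each seeing only its own arguments/config file
tune_trainer_kwargs = dict(auto_lr_find=True)  # from tune args + tune_config.yaml
fit_trainer_kwargs = dict(max_epochs=5)        # from fit args + fit_config.yaml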