
[CLI] Trainer pipelines (tune+fit, fit+test) using the CLI #8385

Closed
edenlightning opened this issue Jul 12, 2021 · 5 comments
Labels
  • argparse (removed): Related to argument parsing (argparse, Hydra, ...)
  • feature: Is an improvement or enhancement
  • help wanted: Open to be worked on

Comments
edenlightning (Contributor) commented Jul 12, 2021

🚀 Feature

Add CLI subcommand chaining so that multiple trainer functions can be run from a single CLI command

Motivation

As a user, I might want to run multiple trainer commands one after the other from the CLI, without having to save the hyperparameters found by tune and set them manually, and with a single TensorBoard run so that fit and test results can be compared.
For example:

  • tune -> fit
  • tune -> fit -> {validate,test}
  • fit -> {validate,test}

Risks

  • One of the main points of the CLI is reproducibility, so passing multiple commands would make the command-line input very complex whenever different options (or configs) are needed per function. For example, in this chaining idea:
    python script.py tune --trainer.auto_lr_find=True --config=tune_config.yaml fit --max_epochs=5 --config=fit_config.yaml
    it is not obvious whether the trainer should be instantiated once with auto_lr_find=True, max_epochs=5, ... and the config files merged, or whether the options should be kept separate and a different trainer instance created per subcommand.
    The many possible permutations would be confusing for users not familiar with this chaining structure.
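
For reference, a minimal sketch of what chaining fit and test by hand looks like, assuming a LightningCLI version with instantiation-only mode (run=False); MyModel and MyDataModule are placeholders for the user's own classes, and the import path is the one used around the time of this issue:

from pytorch_lightning.utilities.cli import LightningCLI

from my_project import MyDataModule, MyModel  # hypothetical user code

# Parse the CLI arguments/config but do not run any subcommand,
# then chain the trainer functions manually.
cli = LightningCLI(MyModel, MyDataModule, run=False)
cli.trainer.fit(cli.model, datamodule=cli.datamodule)
cli.trainer.test(cli.model, datamodule=cli.datamodule, ckpt_path="best")  # "best" assumes checkpointing is enabled
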
edenlightning added the feature, help wanted, and argparse (removed) labels on Jul 12, 2021
edenlightning added this to the v1.5 milestone on Jul 12, 2021
jlperla commented Jul 13, 2021

If it helps simplify the interface, I think it is reasonable that if you are doing multiple things (e.g. tune, train, then test) the config files are all applied once, prior to any of them. The chained steps themselves are then responsible for any modifications.

It also would be much nicer to have the tuning options inside the underlying config, so that you only need to save out the one YAML file to reproduce things. Saving out multiple files and reorganizing them sounds like a lot of confusion, and the same goes for the testing options. The last thing I want to do is deal with splitting them across multiple files, as it then becomes impossible to easily reconstruct the original CLI invocation.

If people want to run the steps in some order with weird stuff in between, they can call python script.py twice... but maybe I am missing a key use case here.

The other thing to keep in mind is that the default tune isn't necessarily very useful on its own, so it would be important to let users modify it and load things from CLI parameters, etc. auto_lr_find may be insufficient for the tuning (and is ignored without tuning anyway).

Right now the way I had to add tuning was the following:

import os
from pathlib import Path

from pytorch_lightning.utilities.cli import LightningCLI


class MyLightningCLI(LightningCLI):
    def add_arguments_to_parser(self, parser):
        parser.add_argument("--tune", action="store_true")
        parser.add_argument("--tune_max_lr", type=float, default=1e-2)
        parser.add_argument("--tune_save_results", action="store_true")

    def before_fit(self):
        if self.config["tune"]:
            # Run the learning-rate finder and apply its suggestion before fitting (could pass more options).
            lr_finder = self.trainer.tuner.lr_find(self.model, max_lr=self.config["tune_max_lr"])
            suggested_lr = lr_finder.suggestion()
            print(f"Changing learning rate to {suggested_lr}\n")
            self.model.hparams.learning_rate = suggested_lr
            if self.config["tune_save_results"]:
                # Save the LR finder plot next to the trainer logs.
                fig = lr_finder.plot(suggest=True)
                logdir = Path(self.trainer.log_dir)
                if not os.path.exists(logdir):
                    os.makedirs(logdir)
                fig.savefig(logdir / "lr_results.png")
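
With a CLI defined like this, the invocation is just the usual script call plus the extra flags, e.g. python script.py --config config.yaml --tune --tune_save_results (flag names as defined above; this form assumes the pre-subcommand LightningCLI available at the time).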

I am not saying that this should necessarily be any simpler in the code (though it could be, if the tune function were written with some standardized settings/implementation/etc.; there is nothing specific to my model in that code), but hopefully it gives an example of a common use case. Consider using it to figure out where that code might live in a refactoring of the hooks/etc.

I am not sure what validate vs. test in {validate,test} means in the actions above. Lightning already runs validation as it goes through the training process (with a sanity check at the beginning to help debugging). Why would I want to turn that on/off as an "action", or are you talking about something different? Anyway, unless you are talking about a new class of data in the DataModule/LightningModule callbacks, I think you mean test?

One last, lower-priority thing about these "chains": there are many cases where an additional stage would be useful (e.g. the option to "pretrain" a model). I am worried that is too particular to the problems I solve and not sufficiently universal to be worth formalizing, but if you add user-defined actions it might become more convenient. Otherwise I can hijack before_fit the way I have done above and emulate my own. Most of my models don't work very well unless I basically "pretrain" them with a simpler loss function for a few epochs before starting the real fitting process.
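
For illustration, a minimal sketch of how such a pretrain stage could be emulated through before_fit; the --pretrain_epochs argument and the model's use_simple_loss flag are hypothetical names, not existing Lightning options:

import pytorch_lightning as pl
from pytorch_lightning.utilities.cli import LightningCLI


class PretrainCLI(LightningCLI):
    def add_arguments_to_parser(self, parser):
        parser.add_argument("--pretrain_epochs", type=int, default=0)

    def before_fit(self):
        epochs = self.config["pretrain_epochs"]
        if epochs > 0:
            # Hypothetical flag the LightningModule checks to switch to a simpler loss.
            self.model.use_simple_loss = True
            pretrainer = pl.Trainer(max_epochs=epochs)  # short pretraining run
            pretrainer.fit(self.model, datamodule=self.datamodule)
            self.model.use_simple_loss = False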

awaelchli modified the milestone from v1.5 to v1.6 on Nov 4, 2021
florianblume commented:

I'd be interested in such a feature, too. I usually execute train + test to have all necessary data present. I'm using Comet ML and at the moment am storing the experiment key so that I don't have a new experiment run when I want to test the model that I've just trained. It works, but it would be convenient to be able to call multiple subcommands at once.
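
Roughly what that looks like (a sketch; the COMET_EXPERIMENT_KEY environment variable and the project name are placeholders I picked, not Comet or Lightning conventions):

import os

from pytorch_lightning.loggers import CometLogger

# Reuse a stored experiment key (if any) so the test run logs into the experiment created during fit.
logger = CometLogger(
    api_key=os.environ["COMET_API_KEY"],
    project_name="my-project",  # placeholder
    experiment_key=os.environ.get("COMET_EXPERIMENT_KEY"),
)
# After fit, persist the experiment key (e.g. logger.experiment.get_key())
# and export it as COMET_EXPERIMENT_KEY before the test invocation.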

tshu-w (Contributor) commented Jan 7, 2022

florianblume commented:

Yeah, that's definitely a possibility; still, I think it would be nice to have it supported directly :)

carmocca (Contributor) commented Feb 1, 2022

CLI users already get overwhelmed by merging config files and by the difference between arguments placed before and after a subcommand.

This proposal would make things much more complex in this regard and would push the limits of what one can do with command line input or a configuration file.

The suggested alternatives are:

carmocca closed this as completed on Feb 1, 2022