exp init: refactor and simplify interactive mode #7396

skshetry · 2022-02-17T09:03:49Z

- Move the workspace tree in interactive mode so that it is shown after the prompts. Previously it was displayed before them.
- Remove the YAML preview and the confirmation prompt in interactive mode.
- Simplify init_interactive().

Before

This command will guide you to set up a train stage in dvc.yaml.
See https://s.dvc.org/g/pipeline-files.

Command to execute: true

Enter the paths for dependencies and outputs of the command.
DVC assumes the following workspace structure:
├── data
├── metrics.json
├── models
├── params.yaml
├── plots
└── src

Path to a code file/directory [src, n to omit]:
'src' does not exist in the workspace. "exp run" may fail.
Path to a data file/directory [data, n to omit]:
'data' does not exist in the workspace. "exp run" may fail.
Path to a model file/directory [models, n to omit]:
Path to a parameters file [params.yaml, n to omit]:
Path to a metrics file [metrics.json, n to omit]:
Path to a plots file/directory [plots, n to omit]:
────────────────────────────────────────────────────────────────────────────────
train:
  cmd: 'true'
  deps:
  - data
  - src
  params:
  - foo
  outs:
  - models
  metrics:
  - metrics.json:
      cache: false
  plots:
  - plots:
      cache: false

Do you want to add the above contents to dvc.yaml? [Y/n]: y

Created src and data.
Created train stage in dvc.yaml. To run, use "dvc exp run".
See https://s.dvc.org/g/exp/run.

After

This command will guide you to set up a train stage in dvc.yaml.
See https://s.dvc.org/g/pipeline-files.

Command to execute: true

Enter the paths for dependencies and outputs of the command.
Path to a code file/directory [src, n to omit]:
'src' does not exist, the directory will be created.
Path to a data file/directory [data, n to omit]:
'data' does not exist, the directory will be created.
Path to a model file/directory [models, n to omit]:
Path to a parameters file [params.yaml, n to omit]:
Path to a metrics file [metrics.json, n to omit]:
Path to a plots file/directory [plots, n to omit]:

Using experiment project structure:
├── data
├── metrics.json
├── models
├── params.yaml
├── plots
└── src

Created src and data.
Created train stage in dvc.yaml. To run, use "dvc exp run".

See #7331 (comment).
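The reordering shown in the "After" output can be sketched as follows. This is only an illustration of the flow (collect every answer first, then print the tree), not DVC's actual `init_interactive()`; the `DEFAULTS` mapping, the prompt wording, and both function names are assumptions made for the example.

```python
from typing import Callable, Dict, List

# Hypothetical defaults mirroring the prompts in the output above.
DEFAULTS = {
    "code": "src",
    "data": "data",
    "models": "models",
    "params": "params.yaml",
    "metrics": "metrics.json",
    "plots": "plots",
}


def render_flat_tree(paths: List[str]) -> str:
    """Render paths as a flat, sorted tree like the one in the PR output."""
    ordered = sorted(paths)
    lines = []
    for i, path in enumerate(ordered):
        branch = "└── " if i == len(ordered) - 1 else "├── "
        lines.append(branch + path)
    return "\n".join(lines)


def init_interactive_sketch(ask: Callable[[str], str]) -> str:
    """Ask for every path first, then return the tree text to display."""
    chosen: Dict[str, str] = {}
    for key, default in DEFAULTS.items():
        answer = ask(f"Path to {key} [{default}, n to omit]: ").strip()
        if answer == "n":
            continue  # the user chose to omit this entry
        chosen[key] = answer or default  # empty input keeps the default
    tree = render_flat_tree(list(chosen.values()))
    return "Using experiment project structure:\n" + tree


if __name__ == "__main__":
    # Accept every default by answering with an empty string.
    print(init_interactive_sketch(lambda _prompt: ""))
```

Passing `ask` as a parameter keeps the sketch testable without a terminal; in a real CLI it would be backed by an input prompt.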

❗ I have followed the Contributing to DVC checklist.
📖 If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.



pmrowla

LGTM from the engineering side

dberenbaum · 2022-02-17T16:40:00Z

Looks good! There are a few issues that are more follow-ups to #7331 than specific to changes in this PR.

Distinguishing dependencies and outputs.

Especially without showing dvc.yaml, it's unclear which paths are dependencies and which are outputs. Since only dependencies are created, this gets confusing:

$ dvc exp init -f --data raw --metrics metrics python src/train.py
Using experiment project structure:
├── metrics
├── models
├── params.yaml
├── plots
├── raw
└── src

Created raw.
Created train stage in dvc.yaml. To run, use "dvc exp run".
See https://s.dvc.org/g/exp/run.

It's not obvious why raw is created but metrics is not.
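The asymmetry can be illustrated with a small sketch, assuming (as the thread describes) that only missing dependencies are created and outputs are left for the experiment command to produce. The helper names below are hypothetical and are not part of DVC.

```python
import os
import tempfile
from typing import Iterable, List


def create_missing_dependencies(root: str, deps: Iterable[str]) -> List[str]:
    """Create each missing dependency as a directory; return the ones created."""
    created = []
    for dep in deps:
        full = os.path.join(root, dep)
        if not os.path.exists(full):
            os.makedirs(full)
            created.append(dep)
    return created


def created_message(created: List[str]) -> str:
    """Format a 'Created a and b.' style message; empty if nothing was created."""
    if not created:
        return ""
    if len(created) == 1:
        return f"Created {created[0]}."
    return "Created " + ", ".join(created[:-1]) + " and " + created[-1] + "."


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "src"))  # pretend src already exists
        # Outputs such as "metrics" are never passed in, so they are not created.
        print(created_message(create_missing_dependencies(root, ["src", "raw"])))
```

Because outputs never reach `create_missing_dependencies`, `raw` appears in the message while `metrics` does not, which is the behaviour the comment finds unclear.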

In interactive mode, we could at least separate the prompts:

$ dvc exp init -if python src/train.py
This command will guide you to set up a train stage in dvc.yaml.
See https://s.dvc.org/g/pipeline-files.

Enter the paths for dependencies of the command.
Path to a code file/directory [src, n to omit]:
Path to a data file/directory [data, n to omit]:
Path to a parameters file [params.yaml, n to omit]:

Enter the paths for outputs of the command.
Path to a model file/directory [models, n to omit]:
Path to a metrics file [metrics.json, n to omit]:
Path to a plots file/directory [plots, n to omit]:

This doesn't help for non-interactive mode, though. Maybe we can mark outputs in the workspace tree and preface it with something like "Using experiment project structure (paths in bold/red/whatever are to be created by the experiment command)".
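A rough sketch of marking outputs in the tree, assuming a flat sorted listing as in the current output. The function name, the `(output)` suffix, and the choice of ANSI bold are illustrative assumptions, not an existing DVC API.

```python
from typing import Iterable, List

BOLD = "\033[1m"   # ANSI escape for bold text
RESET = "\033[0m"  # ANSI escape to reset styling


def render_tree(deps: Iterable[str], outs: Iterable[str], use_ansi: bool = False) -> str:
    """Render a flat tree, marking outputs in bold or with a plain-text suffix."""
    out_set = set(outs)
    paths = sorted(set(deps) | out_set)
    lines: List[str] = []
    for i, path in enumerate(paths):
        branch = "└── " if i == len(paths) - 1 else "├── "
        if path in out_set:
            label = f"{BOLD}{path}{RESET}" if use_ansi else f"{path} (output)"
        else:
            label = path
        lines.append(branch + label)
    return "\n".join(lines)


if __name__ == "__main__":
    print("Using experiment project structure (marked paths are created by the experiment):")
    print(render_tree(["raw", "src", "params.yaml"], ["metrics", "models", "plots"]))
```

The plain-text suffix is a fallback for terminals or logs where styling is unavailable.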

Minor inconsistency between the prompts and the argument overrides.

The prompts give a warning if the input doesn't exist, but the argument overrides don't. It makes sense since you have the option to quit interactive mode if you don't want to create those paths, but it might be confusing, especially when combining them:

$ dvc exp init -if --data raw python src/train.py
This command will guide you to set up a train stage in dvc.yaml.
See https://s.dvc.org/g/pipeline-files.

Enter the paths for dependencies and outputs of the command.
Path to a code file/directory [src, n to omit]: code
'code' does not exist, the directory will be created.
Path to a model file/directory [models, n to omit]:
Path to a parameters file [params.yaml, n to omit]:
Path to a metrics file [metrics.json, n to omit]:
Path to a plots file/directory [plots, n to omit]:

Using experiment project structure:
├── code
├── metrics.json
├── models
├── params.yaml
├── plots
└── raw

Created code and raw.
Created train stage in dvc.yaml. To run, use "dvc exp run".
See https://s.dvc.org/g/exp/run.

There was a warning about creating code but not about creating raw. If you want to keep the overrides, maybe they should always produce a warning as well? Alternatively, should we drop either the warnings or the final "Created code and raw." statement so that the information isn't duplicated?
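One way to make the two sources consistent, sketched under the assumption that an override simply skips its prompt, is to run the existence check after the value is resolved, wherever it came from. The `PROMPTS` table and `resolve_dependencies` are hypothetical names used only for this illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical subset of the dependency prompts and their defaults.
PROMPTS = {
    "code": ("Path to a code file/directory", "src"),
    "data": ("Path to a data file/directory", "data"),
}


def resolve_dependencies(
    overrides: Dict[str, str],
    ask: Callable[[str], str],
    exists: Callable[[str], bool],
) -> Tuple[Dict[str, str], List[str]]:
    """Resolve each path from an override or a prompt, then check it uniformly."""
    chosen: Dict[str, str] = {}
    notices: List[str] = []
    for key, (text, default) in PROMPTS.items():
        if key in overrides:
            value = overrides[key]  # an override skips the prompt entirely
        else:
            answer = ask(f"{text} [{default}, n to omit]: ").strip()
            if answer == "n":
                continue
            value = answer or default
        chosen[key] = value
        # The same check runs for prompted and overridden values alike.
        if not exists(value):
            notices.append(f"'{value}' does not exist, the directory will be created.")
    return chosen, notices
```

With `--data raw` as an override and `code` typed at the prompt, this variant would report both `code` and `raw`, instead of only the prompted path.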

Workspace tree for subdirectories

Take this example:

$ dvc exp init -f --data data/raw python src/train.py
Using experiment project structure:
├── data/raw
├── metrics.json
├── models
├── params.yaml
├── plots
└── src

Created data/raw.
Created train stage in dvc.yaml. To run, use "dvc exp run".
See https://s.dvc.org/g/exp/run.

This actually doesn't look that bad since it keeps the tree structure flat, but we should probably print out the actual tree branches:

├── data
│   └── raw
├── metrics.json
├── models
├── params.yaml
├── plots
└── src
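A nested rendering like the one proposed above can be sketched by splitting each path on `/` into a dictionary of dictionaries and walking it recursively. This is a minimal illustration with hypothetical function names; it assumes POSIX-style separators and is not how DVC renders its tree.

```python
from typing import Dict, Iterable, List


def build_tree(paths: Iterable[str]) -> Dict[str, dict]:
    """Turn 'a/b' style paths into nested dictionaries keyed by path part."""
    root: Dict[str, dict] = {}
    for path in paths:
        node = root
        for part in path.split("/"):
            node = node.setdefault(part, {})
    return root


def render(tree: Dict[str, dict], prefix: str = "") -> List[str]:
    """Render the nested dictionaries with box-drawing branches."""
    lines: List[str] = []
    names = sorted(tree)
    for i, name in enumerate(names):
        last = i == len(names) - 1
        lines.append(prefix + ("└── " if last else "├── ") + name)
        # Children of a non-final entry keep the vertical bar going.
        child_prefix = prefix + ("    " if last else "│   ")
        lines.extend(render(tree[name], child_prefix))
    return lines


if __name__ == "__main__":
    paths = ["data/raw", "metrics.json", "models", "params.yaml", "plots", "src"]
    print("\n".join(render(build_tree(paths))))
```

The extra structure is the trade-off raised later in the thread: the nested form is more faithful, while the flat form is simpler to implement and scan.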

dberenbaum · 2022-02-17T16:55:50Z

In interactive mode, we could at least separate the prompts:
$ dvc exp init -if python src/train.py
This command will guide you to set up a train stage in dvc.yaml.
See https://s.dvc.org/g/pipeline-files.

Enter the paths for dependencies of the command.
Path to a code file/directory [src, n to omit]:
Path to a data file/directory [data, n to omit]:
Path to a parameters file [params.yaml, n to omit]:

Enter the paths for outputs of the command.
Path to a model file/directory [models, n to omit]:
Path to a metrics file [metrics.json, n to omit]:
Path to a plots file/directory [plots, n to omit]:
This doesn't help for non-interactive mode, though. Maybe we can mark outputs in the workspace tree and preface it with something like "Using experiment project structure (paths in bold/red/whatever are to be created by the experiment command)".

Very minor, but the usage of "command" and "experiment" throughout the output is a little inconsistent. What do you think about using "experiment" everywhere, like:

$ dvc exp init -if
This command will guide you to set up a train stage in dvc.yaml.
See https://s.dvc.org/g/pipeline-files.

Command to execute:  python src/train.py

Enter experiment dependencies.
Path to a code file/directory [src, n to omit]:
Path to a data file/directory [data, n to omit]:
Path to a parameters file [params.yaml, n to omit]:

Enter experiment outputs.
Path to a model file/directory [models, n to omit]:
Path to a metrics file [metrics.json, n to omit]:
Path to a plots file/directory [plots, n to omit]:

Project structure (paths in bold must be created by the experiment):
├── data
├── metrics.json
├── models
├── params.yaml
├── plots
└── src

Created train stage in dvc.yaml. To run, use "dvc exp run".
See https://s.dvc.org/g/exp/run.
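The grouped prompts in this proposal could be driven by a small table, sketched below. The `GROUPS` structure and `prompt_grouped` are assumptions for illustration; only the headings, prompt texts, and defaults are taken from the output above.

```python
from typing import Callable, Dict, List, Tuple

# (heading, [(key, prompt text, default), ...]) for each group of prompts.
GROUPS: List[Tuple[str, List[Tuple[str, str, str]]]] = [
    ("Enter experiment dependencies.", [
        ("code", "Path to a code file/directory", "src"),
        ("data", "Path to a data file/directory", "data"),
        ("params", "Path to a parameters file", "params.yaml"),
    ]),
    ("Enter experiment outputs.", [
        ("models", "Path to a model file/directory", "models"),
        ("metrics", "Path to a metrics file", "metrics.json"),
        ("plots", "Path to a plots file/directory", "plots"),
    ]),
]


def prompt_grouped(
    ask: Callable[[str], str],
    say: Callable[[str], None] = print,
) -> Dict[str, str]:
    """Print each group heading, then ask for the paths that belong to it."""
    chosen: Dict[str, str] = {}
    for heading, entries in GROUPS:
        say(heading)
        for key, text, default in entries:
            answer = ask(f"{text} [{default}, n to omit]: ").strip()
            if answer == "n":
                continue  # omit this entry
            chosen[key] = answer or default
        say("")  # blank line between groups
    return chosen
```

Keeping the groups in data rather than in control flow makes it straightforward to reuse the same dependency/output split when rendering the tree.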

skshetry · 2022-02-22T11:14:03Z

Looks good! There are a few issues that are more follow-ups to #7331 than specific to changes in this PR.

Distinguishing dependencies and outputs.

In interactive mode, we could at least separate the prompts:

That makes sense. Thanks. I'll handle that in the next PR.

Minor inconsistency between the prompts and the argument overrides.

Since this is now expected behaviour, i.e. inputs are created if they don't exist, I don't think we need to show any warnings. I think it is clear from the "Created ..." message. Interactive mode is just trying to be more chatty, so the inconsistency does not bother me, at least.

Workspace tree for subdirectories

This is something that I did think of, but I did not want to complicate the implementation any further. Also, I find the flat tree easier to scan through.

What do you think about using experiment everywhere:

Makes sense. Thanks.

exp init: refactor and simplify interactive mode

4c3ae8a

- Moves workspace tree for interactive mode after prompts. Previously this was displayed before the prompts.
- Remove the yaml contents and the confirmation prompt in an interactive mode.
- Simplify `init_interactive()`.

skshetry requested a review from a team as a code owner February 17, 2022 09:03

skshetry requested a review from pmrowla February 17, 2022 09:03

skshetry self-assigned this Feb 17, 2022

skshetry added the "A: cli", "A: experiments", "enhancement", and "refactoring" labels Feb 17, 2022

skshetry requested a review from dberenbaum February 17, 2022 09:39

pmrowla approved these changes Feb 17, 2022


skshetry merged commit fcddf08 into iterative:main Feb 22, 2022

skshetry deleted the simplify-exp-init branch February 22, 2022 11:14

skshetry restored the simplify-exp-init branch April 27, 2022 03:56
