exp init: refactor and simplify interactive mode #7396

Merged: skshetry merged 1 commit into iterative:main from simplify-exp-init on Feb 22, 2022

skshetry
Member

@skshetry skshetry commented Feb 17, 2022

  • Moves the workspace tree in interactive mode to after the prompts.
    Previously it was displayed before the prompts.
  • Removes the YAML contents and the confirmation prompt in interactive
    mode.
  • Simplifies init_interactive().
Before
This command will guide you to set up a train stage in dvc.yaml.
See https://s.dvc.org/g/pipeline-files.

Command to execute: true

Enter the paths for dependencies and outputs of the command.
DVC assumes the following workspace structure:
├── data
├── metrics.json
├── models
├── params.yaml
├── plots
└── src

Path to a code file/directory [src, n to omit]:
'src' does not exist in the workspace. "exp run" may fail.
Path to a data file/directory [data, n to omit]:
'data' does not exist in the workspace. "exp run" may fail.
Path to a model file/directory [models, n to omit]:
Path to a parameters file [params.yaml, n to omit]:
Path to a metrics file [metrics.json, n to omit]:
Path to a plots file/directory [plots, n to omit]:
────────────────────────────────────────────────────────────────────────────────
train:
  cmd: 'true'
  deps:
  - data
  - src
  params:
  - foo
  outs:
  - models
  metrics:
  - metrics.json:
      cache: false
  plots:
  - plots:
      cache: false

Do you want to add the above contents to dvc.yaml? [Y/n]: y

Created src and data.
Created train stage in dvc.yaml. To run, use "dvc exp run".
See https://s.dvc.org/g/exp/run.
After
This command will guide you to set up a train stage in dvc.yaml.
See https://s.dvc.org/g/pipeline-files.

Command to execute: true

Enter the paths for dependencies and outputs of the command.
Path to a code file/directory [src, n to omit]:
'src' does not exist, the directory will be created.
Path to a data file/directory [data, n to omit]:
'data' does not exist, the directory will be created.
Path to a model file/directory [models, n to omit]:
Path to a parameters file [params.yaml, n to omit]:
Path to a metrics file [metrics.json, n to omit]:
Path to a plots file/directory [plots, n to omit]:

Using experiment project structure:
├── data
├── metrics.json
├── models
├── params.yaml
├── plots
└── src

Created src and data.
Created train stage in dvc.yaml. To run, use "dvc exp run".

See #7331 (comment).

@skshetry skshetry requested a review from a team as a code owner February 17, 2022 09:03
@skshetry skshetry requested a review from pmrowla February 17, 2022 09:03
@skshetry skshetry self-assigned this Feb 17, 2022
@skshetry skshetry added the A: cli, A: experiments, enhancement, and refactoring labels Feb 17, 2022
@skshetry skshetry requested a review from dberenbaum February 17, 2022 09:39
Contributor

@pmrowla pmrowla left a comment

LGTM from the engineering side

@dberenbaum
Collaborator

Looks good! There are a few issues that are more follow-ups to #7331 than specific to changes in this PR.

Distinguishing dependencies and outputs.

Especially without showing dvc.yaml, it's unclear which paths are dependencies and which are outputs. Since only dependencies are created, this gets confusing:

$ dvc exp init -f --data raw --metrics metrics python src/train.py
Using experiment project structure:
├── metrics
├── models
├── params.yaml
├── plots
├── raw
└── src

Created raw.
Created train stage in dvc.yaml. To run, use "dvc exp run".
See https://s.dvc.org/g/exp/run.

It's not obvious why raw is created but metrics is not.

In interactive mode, we could at least separate the prompts:

$ dvc exp init -if python src/train.py
This command will guide you to set up a train stage in dvc.yaml.
See https://s.dvc.org/g/pipeline-files.

Enter the paths for dependencies of the command.
Path to a code file/directory [src, n to omit]:
Path to a data file/directory [data, n to omit]:
Path to a parameters file [params.yaml, n to omit]:

Enter the paths for outputs of the command.
Path to a model file/directory [models, n to omit]:
Path to a metrics file [metrics.json, n to omit]:
Path to a plots file/directory [plots, n to omit]:

This doesn't help for non-interactive mode, though. Maybe we can signify outputs in the workspace tree and preface it like Using experiment project structure (paths in bold/red/whatever to be created by the experiment command).
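
One way to picture the suggestion above is a flat tree in which output paths carry a visible marker. The following is only a rough sketch, not DVC's implementation: render_flat_tree and the " (output)" suffix are hypothetical, and the suffix stands in for the bold/red styling mentioned in the comment.

```python
def render_flat_tree(deps, outs, marker=" (output)"):
    """Return flat tree lines, tagging every output path with `marker`.

    Hypothetical helper for illustration; not part of DVC.
    """
    outs = set(outs)
    entries = sorted(set(deps) | outs)
    lines = []
    for index, path in enumerate(entries):
        # The last entry closes the tree with a corner branch.
        branch = "└── " if index == len(entries) - 1 else "├── "
        lines.append(branch + path + (marker if path in outs else ""))
    return lines


# Paths taken from the non-interactive example above.
deps = ["raw", "src", "params.yaml"]
outs = ["metrics", "models", "plots"]
print("Using experiment project structure:")
print("\n".join(render_flat_tree(deps, outs)))
```

With a marker like this, it would be visible at a glance that raw is a dependency (and so gets created) while metrics is an output left for the experiment command to produce.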

Minor inconsistency between the prompts and the argument overrides.

The prompts give a warning if the input doesn't exist, but the argument overrides don't. It makes sense since you have the option to quit interactive mode if you don't want to create those paths, but it might be confusing, especially when combining them:

$ dvc exp init -if --data raw python src/train.py
This command will guide you to set up a train stage in dvc.yaml.
See https://s.dvc.org/g/pipeline-files.

Enter the paths for dependencies and outputs of the command.
Path to a code file/directory [src, n to omit]: code
'code' does not exist, the directory will be created.
Path to a model file/directory [models, n to omit]:
Path to a parameters file [params.yaml, n to omit]:
Path to a metrics file [metrics.json, n to omit]:
Path to a plots file/directory [plots, n to omit]:

Using experiment project structure:
├── code
├── metrics.json
├── models
├── params.yaml
├── plots
└── raw

Created code and raw.
Created train stage in dvc.yaml. To run, use "dvc exp run".
See https://s.dvc.org/g/exp/run.

There was a warning about creating code but not raw. If you want to keep the overrides, maybe they should always have a warning as well? Should we drop either the warnings or the final Created code and raw statement so that the info isn't duplicated?
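
If the warnings were kept, one option would be to run every dependency path through the same check, whether it came from a prompt or from a flag such as --data. A minimal sketch under that assumption follows; missing_path_notes is a hypothetical name, and the message text is copied from the prompt output above.

```python
import os


def missing_path_notes(paths, exists=os.path.exists):
    """Return one notice per dependency path that does not exist yet.

    Hypothetical helper: the caller passes prompt answers and flag
    overrides alike, so both sources get the same message.
    """
    return [
        f"'{path}' does not exist, the directory will be created."
        for path in paths
        if not exists(path)
    ]
```

Calling missing_path_notes(["code", "raw"]) in an empty workspace would then yield a notice for both code and raw, which removes the inconsistency at the cost of repeating what the final "Created ..." line already says.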

Workspace tree for subdirectories

Take this example:

~/repo/ dvc exp init -f --data data/raw python src/train.py
Using experiment project structure:
├── data/raw
├── metrics.json
├── models
├── params.yaml
├── plots
└── src

Created data/raw.
Created train stage in dvc.yaml. To run, use "dvc exp run".
See https://s.dvc.org/g/exp/run.

This actually doesn't look that bad since it keeps the tree structure flat, but we should probably print out the actual tree branches:

├── data
│   └── raw
├── metrics.json
├── models
├── params.yaml
├── plots
└── src
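
For reference, nesting the paths before rendering is a small amount of code. This is a sketch only, assuming slash-separated relative paths; build_tree and render_tree are hypothetical names, not functions from DVC.

```python
def build_tree(paths):
    """Nest slash-separated paths into a dict of dicts."""
    root = {}
    for path in paths:
        node = root
        for part in path.strip("/").split("/"):
            node = node.setdefault(part, {})
    return root


def render_tree(tree, prefix=""):
    """Return the lines of a box-drawing tree, entries sorted by name."""
    lines = []
    names = sorted(tree)
    for index, name in enumerate(names):
        last = index == len(names) - 1
        lines.append(prefix + ("└── " if last else "├── ") + name)
        # Children of a non-last entry keep a vertical guide line.
        child_prefix = prefix + ("    " if last else "│   ")
        lines.extend(render_tree(tree[name], child_prefix))
    return lines


paths = ["data/raw", "metrics.json", "models", "params.yaml", "plots", "src"]
print("\n".join(render_tree(build_tree(paths))))
```

The trade-off is the one raised in this thread: the nested form mirrors the filesystem, while the flat form keeps each configured path on a single line.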

@dberenbaum
Collaborator

In interactive mode, we could at least separate the prompts:

$ dvc exp init -if python src/train.py
This command will guide you to set up a train stage in dvc.yaml.
See https://s.dvc.org/g/pipeline-files.

Enter the paths for dependencies of the command.
Path to a code file/directory [src, n to omit]:
Path to a data file/directory [data, n to omit]:
Path to a parameters file [params.yaml, n to omit]:

Enter the paths for outputs of the command.
Path to a model file/directory [models, n to omit]:
Path to a metrics file [metrics.json, n to omit]:
Path to a plots file/directory [plots, n to omit]:

This doesn't help for non-interactive mode, though. Maybe we can signify outputs in the workspace tree and preface it like Using experiment project structure (paths in bold/red/whatever to be created by the experiment command).

Very minor, but usage of command and experiment throughout the output is a little inconsistent. What do you think about using experiment everywhere, like:

$ dvc exp init -if
This command will guide you to set up a train stage in dvc.yaml.
See https://s.dvc.org/g/pipeline-files.

Command to execute:  python src/train.py

Enter experiment dependencies.
Path to a code file/directory [src, n to omit]:
Path to a data file/directory [data, n to omit]:
Path to a parameters file [params.yaml, n to omit]:

Enter experiment outputs.
Path to a model file/directory [models, n to omit]:
Path to a metrics file [metrics.json, n to omit]:
Path to a plots file/directory [plots, n to omit]:

Project structure (paths in bold must be created by the experiment):
├── data
├── metrics
├── models
├── params.yaml
├── plots
└── src

Created train stage in dvc.yaml. To run, use "dvc exp run".
See https://s.dvc.org/g/exp/run.
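
The grouped prompts proposed above could be driven by a small table of headings and questions. The sketch below is illustrative and is not how init_interactive() is written: PROMPT_GROUPS, collect_paths, and the ask/echo parameters are hypothetical, while the prompt texts and defaults are taken from the transcript.

```python
# Each group is (heading, [(key, prompt text, default path), ...]).
PROMPT_GROUPS = [
    ("Enter experiment dependencies.", [
        ("code", "Path to a code file/directory", "src"),
        ("data", "Path to a data file/directory", "data"),
        ("params", "Path to a parameters file", "params.yaml"),
    ]),
    ("Enter experiment outputs.", [
        ("models", "Path to a model file/directory", "models"),
        ("metrics", "Path to a metrics file", "metrics.json"),
        ("plots", "Path to a plots file/directory", "plots"),
    ]),
]


def collect_paths(ask=input, echo=print):
    """Ask every question group by group and return the chosen paths.

    An empty reply keeps the default; "n" omits the entry entirely.
    """
    answers = {}
    for heading, prompts in PROMPT_GROUPS:
        echo(heading)
        for key, text, default in prompts:
            reply = ask(f"{text} [{default}, n to omit]: ").strip()
            if reply.lower() == "n":
                continue
            answers[key] = reply or default
        echo("")  # blank line between groups
    return answers
```

Passing ask and echo as parameters keeps the function easy to exercise without a terminal, for example collect_paths(ask=lambda prompt: "") returns all defaults.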

@skshetry
Member Author

Looks good! There are a few issues that are more follow-ups to #7331 than specific to changes in this PR.

Distinguishing dependencies and outputs.

In interactive mode, we could at least separate the prompts:

That makes sense. Thanks. I'll handle that in the next PR.

Minor inconsistency between the prompts and the argument overrides.

Since this is now the expected behaviour, i.e. inputs are created if they don't exist, I don't think we need to throw any warnings. I think it is clear from the message "Created ...". Interactive mode is just trying to be more chatty, so the inconsistency does not bother me, at least.

Workspace tree for subdirectories

This is something that I did think of, but I did not want to complicate the implementation any further. Also, I find the flat tree easier to scan through.

What do you think about using experiment everywhere:

Makes sense. Thanks.

@skshetry skshetry merged commit fcddf08 into iterative:main Feb 22, 2022
@skshetry skshetry deleted the simplify-exp-init branch February 22, 2022 11:14
@skshetry skshetry restored the simplify-exp-init branch April 27, 2022 03:56