exp init: create workspace structure by default, show workspace tree #7331

skshetry · 2022-02-01T11:32:46Z

This PR creates the workspace structure by default. That means that the code paths and data paths are created if they don't exist. If the path has any extension, the empty file will be created, otherwise, a directory is created.

Initially, this was to be created for the defaults only, but I have added the magic everywhere, so this is applied to passed arguments, defaults or config values too.

Second thing that PR adds is this output so that users are aware of the changes and their workspace structure:

Creating experiment project structure:
├── data
├── metrics.json
├── models
├── params.yaml
├── plots
└── src

Closes #7137, closes #7138 and closes #7139.

skshetry · 2022-02-07T08:39:01Z

We may want to reconsider the output for interactive mode now, especially for the one that shows workspace structure, since we create some of the files by default:

This command will guide you to set up a train stage in dvc.yaml.
See https://s.dvc.org/g/pipeline-files.

Enter the paths for dependencies and outputs of the command.
DVC assumes the following workspace structure:
├── data
├── metrics.json
├── models
├── params.yaml
├── plots
└── src

Path to a code file/directory [src, n to omit]:
'src' does not exist, the directory will be created.
Path to a data file/directory [data, n to omit]:
'data' does not exist, the directory will be created.
Path to a model file/directory [models, n to omit]:
Path to a parameters file [params.yaml, n to omit]:
Path to a metrics file [metrics.json, n to omit]:
Path to a plots file/directory [plots, n to omit]:

dberenbaum · 2022-02-08T18:27:05Z

Thanks, @skshetry! What about params? Will they be added separately after #4112? It might be confusing to explain what paths do and don't get created automatically. At least if params are created, it can be simplified to "inputs get created, outputs don't."

dberenbaum · 2022-02-08T21:22:12Z

Looks good!

Two issues to consider:

"Creating experiment project structure" may be misleading because outputs are not created, and inputs are only created if they don't already exist.
I agree that changes to the interactive output would be helpful, and specifically we should try to make interactive and non-interactive mode more consistent with each other.

Maybe we can just make the output more general?

$ dvc exp init --data raw python src/train.py
Experiment project structure:
├── metrics.json
├── models
├── params.yaml
├── plots
├── raw
└── src

'raw' does not exist, the directory will be created.
'src' does not exist, the directory will be created.
...

$ dvc exp init -i python src/train.py
Experiment project structure:
├── data
├── metrics.json
├── models
├── params.yaml
├── plots
└── src

Path to a code file/directory [src, n to omit]:
'src' does not exist, the directory will be created.
Path to a data file/directory [data, n to omit]: raw
'raw' does not exist, the directory will be created.
Path to a model file/directory [models, n to omit]:
Path to a parameters file [params.yaml, n to omit]:
Path to a metrics file [metrics.json, n to omit]:
Path to a plots file/directory [plots, n to omit]:
...

skshetry · 2022-02-09T05:27:37Z

Thanks, @skshetry! What about params? Will they be added separately after #4112? It might be confusing to explain what paths do and don't get created automatically. At least if params are created, it can be simplified to "inputs get created, outputs don't."

yeah, I was thinking of just doing that after #4112 as it'll be noop. But could do that now too, as params.yaml file is still read by default and is still usable.

skshetry · 2022-02-09T08:55:00Z

'raw' does not exist, the directory will be created.
'src' does not exist, the directory will be created.

For non-interactive mode, I'll just merge these into a single message:

Creating raw, src and params.yaml

skshetry · 2022-02-09T11:23:12Z

Also, should the message be following?

Using/Expecting/... Experiment project structure:

skshetry · 2022-02-09T12:01:08Z

I have decided to not create params.yaml by default in this PR as it requires some changes and thoughts on interactive mode. Let's do that in a successive PR. @dberenbaum

dberenbaum · 2022-02-09T14:00:55Z

For interactive mode, I'll just merge these into a single message:

Do you mean for non-interactive mode, or am I misunderstanding?

Also, I would just keep them separate unless you have a strong reason to merge them. I don't see an exact guideline for this, but I prefer a single piece of information per line. It keeps the clarity of whether each path is a file or directory, it's more consistent, and it's easier to parse.

Using/Expecting/... Experiment project structure:

I think "Using" is fine.

skshetry · 2022-02-09T14:48:59Z

For interactive mode, I'll just merge these into a single message:

Do you mean for non-interactive mode, or am I misunderstanding?

Yeah sorry, I meant non-interactive mode.

Also, I would just keep them separate unless you have a strong reason to merge them. I don't see an exact guideline for this, but I prefer a single piece of information per line. It keeps the clarity of whether each path is a file or directory, it's more consistent, and it's easier to parse.

Right, but is something being a file or a directory important information? It may cause visual noise without adding any meaningful information. I think that we created something is more important.

This is what the current implementation displays:

Creating data and src

dberenbaum · 2022-02-09T17:02:37Z

I have decided to not create params.yaml by default in this PR as it requires some changes and thoughts on interactive mode.

You could also address that in a separate PR first and come back to this if it's easier since it would make params have less special behavior that needs to be handled.

I'm seeing inconsistent behavior when using flags with/without interactive mode:

Example 1:

$ dvc exp init -f --data raw python src/train.py
Using experiment project structure:
├── metrics.json
├── models
├── params.yaml
├── plots
├── raw
└── src

Creating # What happened here? I expected it to say 'Creating raw.'
Created train stage in dvc.yaml. To run, use "dvc exp run".
See https://s.dvc.org/g/exp/run.

$ rmdir raw

$ dvc exp init -if --data raw python src/train.py
This command will guide you to set up a train stage in dvc.yaml.
See https://s.dvc.org/g/pipeline-files.

Enter the paths for dependencies and outputs of the command.
Using experiment project structure:
├── metrics.json
├── models
├── params.yaml
├── plots
├── raw
└── src

Path to a code file/directory [src, n to omit]:
Path to a model file/directory [models, n to omit]:
Path to a parameters file [params.yaml, n to omit]:
Path to a metrics file [metrics.json, n to omit]:
Path to a plots file/directory [plots, n to omit]:

─────────────────────────────────────────────────────────────────────────────────────────
train:
  cmd: python src/train.py
  deps:
  - raw
  - src
  outs:
  - models
  metrics:
  - metrics.json:
      cache: false
  plots:
  - plots:
      cache: false

Do you want to add the above contents to dvc.yaml? [Y/n]:
Created train stage in dvc.yaml. To run, use "dvc exp run".
See https://s.dvc.org/g/exp/run.

$ ls raw # This is created, but why wasn't there a message about it during `dvc exp init -i`?

Example 2:

$ dvc exp init -f --params params.txt python src/train.py
ERROR: Parameters file 'params.txt' does not exist.

$ This command will guide you to set up a train stage in dvc.yaml.
See https://s.dvc.org/g/pipeline-files.

Enter the paths for dependencies and outputs of the command.
Using experiment project structure:
├── data
├── metrics.json
├── models
├── params.txt
├── plots
└── src

Path to a code file/directory [src, n to omit]:
Path to a data file/directory [data, n to omit]:
Path to a model file/directory [models, n to omit]:
Path to a metrics file [metrics.json, n to omit]:
Path to a plots file/directory [plots, n to omit]:
ERROR: Parameters file 'params.txt' does not exist. # Why wait until the end to validate this?

We spent a lot of time developing the interactive mode initially, but I think that led to problems with the non-interactive mode and now inconsistencies between them. Do we need such different implementations for interactive and non-interactive modes? I think it creates a more consistent UI and reduces the risk of bugs or odd behavior in either mode if we combine as much as possible.

I would prefer a unified workflow like:

Get template to determine expected defaults. Print default tree if in interactive mode. I would disallow --data and other override flags in interactive mode (I think it's confusing, overly complicated, and I can't imagine why someone would need this).
Iterate over each dependency and output to collect overrides (from prompts in interactive mode) and create dependencies as needed (output should show creation of each as it happens).
Pass context to stage create method. I think we should print the same thing in both modes here, most likely the updated tree.

I know this is asking for large changes, but hopefully it will be simpler and raise fewer issues/questions once complete. Let me know if you want to have a call to discuss.

skshetry · 2022-02-16T13:50:16Z

I have decided to not create params.yaml by default in this PR as it requires some changes and thoughts on interactive mode.

You could also address that in a separate PR first and come back to this if it's easier since it would make params have less special behavior that needs to be handled.

This PR handles "all inputs are created", so this can already create params by empty. We just need to get rid of those checks that we have before. I don't want to make too many decisions or changes in a single PR. And I need this PR before we move to params, as it unifies input creations.

I'm seeing inconsistent behavior when using flags with/without interactive mode:

Great catch with the UI messages. The first one is an oversight on this PR. "no message" on overrides has been fixed, it'll be shown even in interactive and non-interactive mode.

The params issue will be fixed after we remove the checks for params file existence.

I would prefer a unified workflow like:

1. Get template to determine expected defaults. Print default tree if in interactive mode. I would disallow `--data` and other override flags in interactive mode (I think it's confusing, overly complicated, and I can't imagine why someone would need this).

This is something that will rarely be used. I prefer it as being an override, as ignoring them may be more confusing. Although I have no strong opinion.

2. Iterate over each dependency and output to collect overrides (from prompts in interactive mode) and create dependencies as needed (output should show creation of each as it happens).

Yeah, fixed this. Thanks.

3. Pass context to stage create method. I think we should print the same thing in both modes here, most likely the updated tree.

Makes sense, but how should we improve message before the prompts in an interactive mode?

dberenbaum · 2022-02-16T20:16:12Z

1. Get template to determine expected defaults. Print default tree if in interactive mode. I would disallow `--data` and other override flags in interactive mode (I think it's confusing, overly complicated, and I can't imagine why someone would need this).
This is something that will rarely be used. I prefer it as being an override, as ignoring them may be more confusing. Although I have no strong opinion.

Yeah, I agree that ignoring the arguments is a bad option. I meant that --interactive should be mutually exclusive with --data and other overrides (including --explicit). It signals that the overrides are intended for non-interactive mode.

It can also reduce some of the init_interactive() logic. It wouldn't be necessary to adjust the prompts or workspace tree. That would help ensure consistency between interactive and non-interactive modes.

3. Pass context to stage create method. I think we should print the same thing in both modes here, most likely the updated tree.
Makes sense, but how should we improve message before the prompts in an interactive mode?

IMO we can get rid of printing the tree before the prompts to simplify, especially if we are printing the updated tree at the end.

I also think we can get rid of all the if interactive/if not interactive conditionals after repo.stage.create() and always print:

Using experiment project structure:
├── metrics.json
├── models
├── mydata
├── params.yaml
├── plots
└── src

Created train stage in dvc.yaml. To run, use "dvc exp run".
See https://s.dvc.org/g/exp/run.

Having a confirmation prompt in interactive mode after printing the tree seems okay if you think it's helpful to keep that, but I don't know that it's necessary.

skshetry · 2022-02-17T08:30:55Z

Yeah, I agree that ignoring the arguments is a bad option. I meant that --interactive should be mutually exclusive with --data and other overrides (including --explicit). It signals that the overrides are intended for non-interactive mode.

It can also reduce some of the init_interactive() logic. It wouldn't be necessary to adjust the prompts or workspace tree. That would help ensure consistency between interactive and non-interactive modes.

Right, but if you are using --data with --interactive, I think it's better to honor their request and think of it as an override, considering we already have support for that and documented the behavior in CLI help as:

  --interactive, -i     Prompt for values that are not provided

IMO we can get rid of printing the tree before the prompts to simplify, especially if we are printing the updated tree at the end.

I'll do this in the next PR.

I also think we can get rid of all the if interactive/if not interactive conditionals after repo.stage.create() and always print:

Yeah, let's get rid of it then. Also in the next PR. Thanks.

skshetry · 2022-02-17T08:33:03Z

I am merging this PR. This is a release-blocker Blocks a release now until few issues that we are discussing are merged, especially the params one.

cc @efiop

skshetry force-pushed the exp-init-workspace branch from 50e1c0f to dc50914 Compare February 7, 2022 08:26

skshetry marked this pull request as ready for review February 7, 2022 08:27

skshetry requested a review from a team as a code owner February 7, 2022 08:27

skshetry requested review from pared and dberenbaum February 7, 2022 08:27

skshetry added A: experiments Related to dvc exp enhancement Enhances DVC ui user interface / interaction labels Feb 7, 2022

skshetry self-assigned this Feb 7, 2022

skshetry force-pushed the exp-init-workspace branch from dc50914 to 7df1085 Compare February 7, 2022 14:01

pared approved these changes Feb 8, 2022

View reviewed changes

dberenbaum mentioned this pull request Feb 8, 2022

exp init experience #7130

Closed

10 tasks

skshetry mentioned this pull request Feb 15, 2022

localfs: make it fsspec-compatible #7352

Open

skshetry added 3 commits February 16, 2022 17:11

exp init: create workspace structure by default, show workspace tree

f1a6140

exp init: log created dependencies in non-interactive mode

4055de5

exp init: update message

e341e70

fix UI message

6ec7b7d

skshetry force-pushed the exp-init-workspace branch from 00ac3fc to 6ec7b7d Compare February 16, 2022 13:53

s/Creating/Created

50cd26e

skshetry merged commit 497eac9 into iterative:main Feb 17, 2022

skshetry deleted the exp-init-workspace branch February 17, 2022 08:33

skshetry mentioned this pull request Feb 17, 2022

exp init: refactor and simplify interactive mode #7396

Merged

2 tasks

skshetry restored the exp-init-workspace branch April 27, 2022 03:56

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

exp init: create workspace structure by default, show workspace tree #7331

exp init: create workspace structure by default, show workspace tree #7331

skshetry commented Feb 1, 2022 •

edited

Loading

skshetry commented Feb 7, 2022

dberenbaum commented Feb 8, 2022

dberenbaum commented Feb 8, 2022

skshetry commented Feb 9, 2022

skshetry commented Feb 9, 2022 •

edited

Loading

skshetry commented Feb 9, 2022 •

edited

Loading

skshetry commented Feb 9, 2022 •

edited

Loading

dberenbaum commented Feb 9, 2022

skshetry commented Feb 9, 2022 •

edited

Loading

dberenbaum commented Feb 9, 2022

skshetry commented Feb 16, 2022 •

edited

Loading

dberenbaum commented Feb 16, 2022

skshetry commented Feb 17, 2022

skshetry commented Feb 17, 2022 •

edited

Loading

exp init: create workspace structure by default, show workspace tree #7331

exp init: create workspace structure by default, show workspace tree #7331

Conversation

skshetry commented Feb 1, 2022 • edited Loading

skshetry commented Feb 7, 2022

dberenbaum commented Feb 8, 2022

dberenbaum commented Feb 8, 2022

skshetry commented Feb 9, 2022

skshetry commented Feb 9, 2022 • edited Loading

skshetry commented Feb 9, 2022 • edited Loading

skshetry commented Feb 9, 2022 • edited Loading

dberenbaum commented Feb 9, 2022

skshetry commented Feb 9, 2022 • edited Loading

dberenbaum commented Feb 9, 2022

skshetry commented Feb 16, 2022 • edited Loading

dberenbaum commented Feb 16, 2022

skshetry commented Feb 17, 2022

skshetry commented Feb 17, 2022 • edited Loading

skshetry commented Feb 1, 2022 •

edited

Loading

skshetry commented Feb 9, 2022 •

edited

Loading

skshetry commented Feb 9, 2022 •

edited

Loading

skshetry commented Feb 9, 2022 •

edited

Loading

skshetry commented Feb 9, 2022 •

edited

Loading

skshetry commented Feb 16, 2022 •

edited

Loading

skshetry commented Feb 17, 2022 •

edited

Loading