[WIP] cmd-ref: document exp init #3015

Merged
merged 9 commits into from
Dec 7, 2021
Changes from 8 commits
24 changes: 24 additions & 0 deletions content/docs/command-reference/config.md
@@ -221,6 +221,30 @@ connection settings, and configuring a remote is the way that can be done.
> hash overlaps: the hash of an external <abbr>output</abbr> could collide with
> that of a local file with different content.

### exp

This section overrides the default workspace paths used by `dvc exp init`,
which helps avoid repeating these paths if all of your projects share a
similar structure.

The section contains the following options, which are only used as defaults and
can be overridden explicitly through CLI arguments or through responses to
prompts (in `--interactive` mode).

- `exp.code` - path to your source file or directory.

- `exp.data` - path to your data file or directory to track.

- `exp.models` - path to your models file or directory.

- `exp.metrics` - path to your metrics file.

- `exp.params` - path to your parameters file.

- `exp.plots` - path to your plots file or directory.

- `exp.live` - path to your dvclive outputs.

### state

> 📖 See
6 changes: 4 additions & 2 deletions content/docs/command-reference/exp/index.md
@@ -3,7 +3,8 @@
_New in DVC 2.0 (see `dvc version`)_

A set of commands to generate and manage <abbr>experiments</abbr>:
[run](/doc/command-reference/exp/run), [show](/doc/command-reference/exp/show),
[init](/doc/command-reference/exp/init), [run](/doc/command-reference/exp/run),
[show](/doc/command-reference/exp/show),
[diff](/doc/command-reference/exp/diff),
[apply](/doc/command-reference/exp/apply),
[branch](/doc/command-reference/exp/branch),
@@ -20,7 +21,7 @@ A set of commands to generate and manage <abbr>experiments</abbr>:

```usage
usage: dvc exp [-h] [-q | -v]
{show,apply,diff,run,gc,branch,list,push,pull,remove}
{show,apply,diff,run,gc,branch,list,push,pull,remove,init}
...

positional arguments:
@@ -37,6 +38,7 @@ positional arguments:
push Push a local experiment to a Git remote.
pull Pull an experiment from a Git remote.
remove Remove local experiments.
init Initialize experiments.
Contributor

@jorgeorpinel jorgeorpinel Nov 11, 2021

Uninformative and sounds like a requirement (like dvc init). How can we better describe what this does for the user? (I need to read the rest of the changes before I can suggest something.)

This comment was marked as resolved.

Contributor

@jorgeorpinel jorgeorpinel Nov 11, 2021

For the help output maybe a short form of a specific desc.

Codify a project variation and run it as an experiment

p.s. and re "sounds like a requirement (like dvc init)" - can we still reconsider the name? 😅
Maybe dvc exp new

```

## Description
158 changes: 158 additions & 0 deletions content/docs/command-reference/exp/init.md
@@ -0,0 +1,158 @@
# exp init

Initializes experiments.
jorgeorpinel marked this conversation as resolved.

## Synopsis

```usage
usage: dvc exp init [-h] [-q | -v] [--run] [--interactive] [-f]
[--explicit] [--name NAME] [--code CODE]
[--data DATA] [--models MODELS] [--params PARAMS]
[--metrics METRICS] [--plots PLOTS] [--live LIVE]
[--type {default,live}]
[command]
```

## Description

`dvc exp init` helps you quickly get started with experiments. It reduces
boilerplate for initializing [pipeline](/doc/command-reference/dag) stages in a
`dvc.yaml` file by assuming sane defaults about the location of your data,
[parameters](/doc/command-reference/params), source code,
[models](/doc/command-reference/), [metrics](/doc/command-reference/metrics) and
[plots](/doc/command-reference/plots), which can be customized through config.

It also offers a guided `--interactive` mode for creating a stage to be run
later with [`exp run`](/doc/command-reference/exp/run). `dvc exp init` supports
creating different types of stages, e.g. `live` if you are using
[dvclive](/doc/dvclive) to monitor and checkpoint progress during training of
machine learning models.

This command is intended to be lightweight and simple, and lacks many of the
bells and whistles that `dvc stage add` provides.

### The `command` argument

The `command` argument is optional if you are using `--interactive` mode. The
`command` sent to `dvc exp init` can be anything your terminal would accept and
Comment on lines +34 to +37
Contributor

Should we put this under Options like for targets in https://dvc.org/doc/command-reference/repro#options (and other places I think) ? That was an initiative of @skshetry actually 🙂

Member Author

Let's keep it as it is for now, as we also need to do something for stage add/run?

Contributor

Why keep it as a section here when we moved it in repro ? We had lots of good reasons (which you proposed), what's different?

> do something for stage add/run

Out of scope but we can create an issue.

Member Author

Because it's better to do it at the same time for all commands. I am going with what we have right now, let's handle it later?

Contributor

@jorgeorpinel jorgeorpinel Dec 10, 2021

p.s. we don't always make changes for all commands at once, we can take them as they come. BTW I actually like a section better than having it under Options, I'm just confused since we discussed this a lot for repro and you (and Ivan) strongly argued for putting it under Options I think 🤷 now it's unclear which one is inconsistent.

Contributor

> Because it's better to do it at the same time for all commands. I am going with what we have right now, let's handle it later?

#3071 (review)

run directly, for example a shell built-in, expression, or binary found in
`PATH`. Please remember that any flags sent after the `command` are interpreted
by the command itself, not by `dvc exp init`.

⚠️ While DVC is platform-agnostic, the commands defined in your
[pipeline](/doc/command-reference/dag) stages may only work on some operating
systems and require certain software packages to be installed.

Wrap the command in double quotes `"` if it contains special characters like
`|` (pipe) or `<`, `>` (redirection); otherwise they would apply to
`dvc exp init` itself. Use single quotes `'` instead if it contains environment
variables that should be evaluated dynamically (when the stage runs, rather
than when `dvc exp init` is invoked). Examples:

```dvc
$ dvc exp init "./a_script.sh > /dev/null 2>&1"
$ dvc exp init './another_script.sh $MYENVVAR'
```
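
To illustrate why the quoting matters, here is a minimal sketch that uses Python's standard `shlex` module, which applies POSIX-style word splitting, to show how each form is tokenized. It only demonstrates word splitting and is not how DVC parses its arguments; a real shell would additionally treat the unquoted `>` and `2>&1` as redirections applied to `dvc exp init`.

```python
import shlex

# Same command line, with and without the double quotes.
quoted = 'dvc exp init "./a_script.sh > /dev/null 2>&1"'
unquoted = 'dvc exp init ./a_script.sh > /dev/null 2>&1'

quoted_words = shlex.split(quoted)
unquoted_words = shlex.split(unquoted)

# Quoted: the whole command, redirections included, is a single argument.
print(quoted_words)
# Unquoted: the redirection operators become separate words, which a shell
# would consume itself instead of passing them on as part of the command.
print(unquoted_words)
```

In the quoted form the fourth word carries the full command string, which is what ends up stored in the stage.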

## Options

- `-i`, `--interactive` - prompts the user for the command to execute and for
the paths of outputs and dependencies to track, unless they are provided
explicitly through CLI arguments. In interactive mode, users can accept the
default value for each location, set a different one, or omit it.

- `--explicit` - `dvc exp init` assumes default locations for your outputs and
dependencies (which can be overridden from the config). With `--explicit`,
it does not use those default values while initializing experiments. In
`--interactive` mode, the prompts won't offer default values, and every value
needs to be explicitly provided or omitted.

- `--code` - override the path to your source file or directory, which your
experiment depends on. The default is the `src` directory.

- `--data` - override the path to your data file or directory to track, which
your experiment depends on. The default is the `data` directory.

- `--models` - override the path to your models file or directory to track,
which your experiment produces. `dvc exp init` assumes the `models` directory
by default.

- `--params` - override the path to
[parameter dependencies](/doc/command-reference/params) which your experiment
depends on. The default parameters file name is `params.yaml`. Note that
`dvc exp init` may fail if the parameters file does not exist at the time of
the invocation, as DVC reads the file to find parameters to track for the
stage.

- `--metrics` - override the path to the metrics file to track, which your
experiment produces. The default is `metrics.json`.

- `--plots` - override the path to the plots file or directory, which your
experiment produces. The default is `plots`.

- `--live` - override the directory path for [DVCLive](/doc/dvclive), which
your experiment will write logs to. The default is the `dvclive` directory,
which only takes effect when used with `--type=live`.

- `--type` - selects the type of stage to create. Currently there are two
kinds of stages: `default` and `live`. If unspecified, a `default` stage is
created.

`default` creates a stage with `metrics` and `plots` tracked by DVC itself,
and does not track live-created artifacts (unless explicitly specified).

`live` is intended for deep-learning scenarios, where metrics and plots are
tracked by [dvclive](/doc/dvclive). It supports tracking progress while
training a deep-learning model with
[checkpoints](/doc/command-reference/exp/run#checkpoints).

- `-n <stage>`, `--name <stage>` - specify a custom name for the stage generated
by this command (e.g. `-n train`). By default, the name of the stage depends
on the `--type` of the stage being created: with `--type=default`, the stage
is named `default`, and with `--type=live`, it is named `live`.

Note that the stage name can only contain letters, numbers, dash `-` and
underscore `_`.

- `-f`, `--force` - overwrite an existing stage in `dvc.yaml` file without
asking for confirmation.

- `--run` - runs the experiment after initializing it.

- `-h`, `--help` - prints the usage/help message, and exits.

- `-q`, `--quiet` - do not write anything to standard output. Exit with 0 if no
problems arise, otherwise 1.

- `-v`, `--verbose` - displays detailed tracing information.
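
The stage naming rules described under `--name` and `--type` can be sketched as follows. This is an illustration under stated assumptions, not DVC's implementation: `resolve_stage_name` and `STAGE_NAME` are hypothetical names, and "letters" is assumed to mean ASCII letters.

```python
import re
from typing import Optional

# Assumption: only ASCII letters, digits, dash and underscore are allowed.
# DVC's own check may be stricter or looser than this pattern.
STAGE_NAME = re.compile(r"[A-Za-z0-9_-]+")


def resolve_stage_name(name: Optional[str], stage_type: str = "default") -> str:
    """Return the custom name if given, otherwise fall back to the stage type."""
    if stage_type not in ("default", "live"):
        raise ValueError(f"unknown stage type: {stage_type!r}")
    chosen = name if name is not None else stage_type
    if not STAGE_NAME.fullmatch(chosen):
        raise ValueError(f"invalid stage name: {chosen!r}")
    return chosen


print(resolve_stage_name(None))           # falls back to the type: default
print(resolve_stage_name(None, "live"))   # falls back to the type: live
print(resolve_stage_name("train"))        # custom name wins
```

A name containing a space or a slash, such as `my stage`, is rejected by this sketch with a `ValueError`.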

### Setting up custom paths

`dvc exp init` supports
[custom workspace paths set through DVC config](/doc/command-reference/config#exp):
you can add an `exp` section to the config file that points to the paths for
your code, data, models, parameters, metrics, plots (either images or tabular
data to be plotted), and dvclive outputs.

```dvc
$ dvc config exp.data datasets/
$ dvc config exp.params config.yaml
$ dvc config exp.code scripts/train.py
$ dvc config exp.models trained_models/
$ dvc config exp.metrics reports.json
$ dvc config exp.plots viz/
```

Options you leave unset keep their default values. Setting these options in a
system or global config is useful when all of your projects share a similar
structure, since it avoids repeating the paths in each one.
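
How the three sources of a path interact can be sketched roughly as below. The function `resolve_paths` and its dictionary inputs are hypothetical and only model the behavior described in this document: CLI arguments take precedence over `exp.*` config values, which take precedence over the built-in defaults. That `--explicit` also skips the config values is an assumption drawn from the option's description, not something verified against DVC's code.

```python
from typing import Dict, Optional

# Built-in defaults as listed under Options.
# Note: `live` only matters when the stage is created with --type=live.
DEFAULTS: Dict[str, str] = {
    "code": "src",
    "data": "data",
    "models": "models",
    "params": "params.yaml",
    "metrics": "metrics.json",
    "plots": "plots",
    "live": "dvclive",
}


def resolve_paths(
    config: Dict[str, str],
    cli: Dict[str, Optional[str]],
    explicit: bool = False,
) -> Dict[str, Optional[str]]:
    """Pick each path from CLI args first, then config, then the defaults."""
    resolved: Dict[str, Optional[str]] = {}
    for key, default in DEFAULTS.items():
        if cli.get(key) is not None:
            resolved[key] = cli[key]   # explicit CLI argument wins
        elif explicit:
            resolved[key] = None       # assumed: no fallback with --explicit
        else:
            resolved[key] = config.get(key, default)
    return resolved


config = {"data": "datasets/", "params": "config.yaml"}  # e.g. from exp.* options
cli = {"data": "raw/"}                                   # e.g. from --data raw/
paths = resolve_paths(config, cli)
print(paths["data"], paths["params"], paths["code"])
```

Here `data` comes from the CLI, `params` from the config, and `code` from the built-in default.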

## Non-interactive mode

## Guided/Interactive mode

Member Author

I don't really want to show all the prompts. And I think most of the information are already mentioned elsewhere, so I am thinking of leaving these fields.

Contributor

@jorgeorpinel jorgeorpinel Dec 10, 2021

Good call. If we ever make a detailed guide we can include all prompts but it's probably unnecessary (and a lot to maintain) either way.

## Examples

#### Default stage

#### DVCLive stage
jorgeorpinel marked this conversation as resolved.
4 changes: 4 additions & 0 deletions content/docs/sidebar.json
@@ -260,6 +260,10 @@
"label": "exp show",
"slug": "show"
},
{
"label": "exp init",
"slug": "init"
},
{
"label": "exp diff",
"slug": "diff"