ref: document dvc queue #3715

Merged: 33 commits, Aug 3, 2022
Commits
4e68869
ref: add queue section
pmrowla Jul 1, 2022
dbe354c
ref: dvc queue start
pmrowla Jul 1, 2022
fb61d31
ref: dvc queue stop
pmrowla Jul 1, 2022
817afb7
ref: dvc queue kill
pmrowla Jul 1, 2022
a26ee1d
ref: dvc queue status
pmrowla Jul 1, 2022
e5feaf4
ref: dvc queue logs
pmrowla Jul 1, 2022
8a4b3d8
ref: dvc queue remove
pmrowla Jul 1, 2022
75e21a3
ref: update `exp run` reference for queue changes
pmrowla Jul 5, 2022
01ee250
ref: add queue logs and queue status examples
pmrowla Jul 5, 2022
ea22d10
review fixes
pmrowla Jul 6, 2022
89041fa
add gif animation for logs --follow
pmrowla Jul 6, 2022
f5fef4d
update idle worker note
pmrowla Jul 6, 2022
29760dc
add queue to sidebar
pmrowla Jul 6, 2022
533f175
Update content/docs/command-reference/queue/logs.md
jorgeorpinel Jul 11, 2022
6677c18
Update content/docs/user-guide/experiment-management/running-experime…
jorgeorpinel Jul 11, 2022
f337cad
Update content/docs/command-reference/exp/remove.md
jorgeorpinel Jul 11, 2022
ecd46d4
Update content/docs/command-reference/exp/run.md
jorgeorpinel Jul 11, 2022
d230f9a
Update content/docs/command-reference/exp/run.md
jorgeorpinel Jul 11, 2022
7ae6413
Update content/docs/command-reference/exp/run.md
jorgeorpinel Jul 11, 2022
8f1681b
Update content/docs/command-reference/exp/remove.md
jorgeorpinel Jul 11, 2022
09a23bc
Restyled by prettier (#3757)
restyled-io[bot] Jul 11, 2022
7df232e
Update content/docs/command-reference/queue/index.md
jorgeorpinel Jul 11, 2022
c87cc2a
Update content/docs/command-reference/queue/index.md
jorgeorpinel Jul 11, 2022
6a17f18
Update content/docs/command-reference/queue/start.md
jorgeorpinel Jul 11, 2022
7264c94
Update content/docs/command-reference/queue/start.md
jorgeorpinel Jul 11, 2022
e571871
Restyled by prettier (#3758)
restyled-io[bot] Jul 11, 2022
68d34e4
Merge branch 'main' into cmdref-dvc-queue
jorgeorpinel Jul 29, 2022
8fd6a29
drop `exp remove --queue` deprecation warning
pmrowla Aug 2, 2022
5d556a1
cmdref: reorder queue subcommands sidebar
pmrowla Aug 3, 2022
45183e7
replace quote blocks with admon
pmrowla Aug 3, 2022
f6061dd
remove setup block from examples
pmrowla Aug 3, 2022
ed2e412
lint fixes
pmrowla Aug 3, 2022
e1930b4
remove logs --follow gif
pmrowla Aug 3, 2022
8 changes: 6 additions & 2 deletions content/docs/command-reference/exp/remove.md
@@ -27,8 +27,12 @@ With `--queue`, the list of experiments awaiting execution is cleared instead.
- `--queue` - remove all experiments that haven't been run yet (defined via
`dvc exp run --queue`).

- `-A`, `--all` - remove all experiments that have been run. Use `--queue` to
remove queued ones.
> ⚠️ `dvc exp remove --queue` is now an alias for `dvc queue remove --queued`.
> The `--queue` option will likely be deprecated and removed in a future DVC
> release. Refer to the `dvc queue` documentation for more details.

- `-A`, `--all` - remove all experiments that have been run. Use
`dvc queue remove` to remove queued experiment tasks.

- `--rev <commit>` - remove experiments derived from the specified `<commit>` as
baseline.
20 changes: 11 additions & 9 deletions content/docs/command-reference/exp/run.md
@@ -34,13 +34,9 @@ Use the `--set-param` (`-S`) option as a shortcut to change
<abbr>parameter</abbr> values [on-the-fly] before running the experiment.

It's possible to [queue experiments] for later execution with the `--queue`
flag. To actually run them, use `dvc exp run --run-all`. Queued experiments are
run sequentially by default, but can be run in parallel using the `--jobs`
option.

> ⚠️ Parallel runs are experimental and may be unstable. Make sure you're using
> a number of jobs that your environment can handle (no more than the CPU
> cores).
flag. Queued experiments can be run using `dvc queue start`. Refer to the
`dvc queue` documentation for more information on managing the experiment task
queue.

It's also possible to run special [checkpoint experiments] that log the
execution progress (useful for deep learning ML). The `--rev` and `--reset`
@@ -82,8 +78,7 @@ committing them to the Git repo. Unnecessary ones can be [cleared] with
runs.

- `--queue` - place this experiment at the end of a line for future execution,
but don't actually run it yet. Use `dvc exp run --run-all` to process the
queue.
but don't actually run it yet. Use `dvc queue start` to process the queue.

> For checkpoint experiments, this implies `--reset` unless a `--rev` is
> provided.
@@ -100,6 +95,13 @@ committing them to the Git repo. Unnecessary ones can be [cleared] with
> stages may sometimes be executed several times depending on the state of the
> [run-cache] at that time.

> ⚠️ `dvc exp run --run-all [--jobs]` is now a shortcut for
> `dvc queue start [--jobs]` followed by `dvc queue logs -f`. The `--run-all`
> and `--jobs` options will likely be deprecated and removed in a future DVC
> release. It is recommended to migrate your workflows to use
> `dvc queue start` and `dvc queue logs`. Refer to the `dvc queue`
> documentation for more details.

Contributor:

Should we also include this kind of warning in core DVC (i.e. actually logging DeprecationWarning)?

Contributor Author:

Eventually yes, but there haven't been any formal decisions about what and what not to deprecate w/the queueing changes. I think the current plan is to not rush formally deprecating the existing experiments UI since the dvc queue workflow will probably change once we get some user feedback

Contributor:

It's fine to say that this will be deprecated (we are already doing it here). There shouldn't be a hard requirement to include deprecation of experiment features since they are "experimental" in 2.X, but it would be nice.

- `-r <commit>`, `--rev <commit>` - resume an experiment from a specific
checkpoint name or hash (`commit`) in `--queue` or `--temp` runs.

38 changes: 38 additions & 0 deletions content/docs/command-reference/queue/index.md
@@ -0,0 +1,38 @@
# queue

A set of commands to manage the
[DVC experiments](/doc/user-guide/experiment-management/experiments-overview)
task queue: [start](/doc/command-reference/queue/start),
Comment on lines +3 to +5
Contributor @jorgeorpinel, Jul 11, 2022:

I'm not sure about this terminology: "experiment task queue"

Why not just "experiment queue" ? Are "tasks" important enough to differentiate from experiments? (At first glance it sounds like an implementation detail)

WDYT @dberenbaum ? Could impact strings in the core codebase.

Contributor Author:

I think it's important to differentiate between "experiment" vs "queue task" due to commands where there's overlap (like exp remove vs queue remove).

exp remove removes DVC experiment data, meaning experiment git refs and their associated DVC cache data. queue remove removes task queue entry data, meaning the queue entry itself, and associated queue-related artifacts (i.e. logs)

The current plan is also to emphasize this in other commands. exp show will be modified so that by default it does not display queued or failed experiments by default (so the default view only shows things that can be removed with exp remove). There will still be optional flags for displaying queued/failed exps in the table (since it is useful to be able to see the params/deps for them).

Contributor:

@jorgeorpinel Maintaining two separate concepts is important, but feel free to suggest more useful terminology 🙏 .

Contributor @jorgeorpinel, Jul 12, 2022:

OK I see there's some rationale and justification for the special terminology. I still think "task" is an implementation detail and could be avoided...

> feel free to suggest more useful terminology

I'm thinking that it should be clear that exp commands deal with experiment data while queue commands deal with experiment queue entries (I mean that was the whole point of separating the commands I think 🙂) so term "experiment" can be used interchangeably in many places. Where disambiguation or emphasis are needed, we can use descriptive wording like "entry from the queue", "worker logs", etc.

But it's something we can follow-up on later if needed (still unsure about the release timeline we're looking at here).

Contributor:

It is in main now and should be in the next DVC release.

Contributor:

Everyone raises good points here. Agreed that dvc queue should be used infrequently. No functionality was removed from dvc exp here. For example, you can remove queued experiments without using dvc queue:

$ dvc exp run --queue -S train.min_split=0.5
Queued experiment 'd522711' for future execution.
$ dvc queue status
Task     Name    Created    Status
d522711          02:11 PM   Queued

Worker status: 0 active, 0 idle
$ dvc exp show
 ──────────────────────────────────────────────────────────────────────────────────────>
  Experiment              Created        State    avg_prec   roc_auc   prepare.split   >
 ──────────────────────────────────────────────────────────────────────────────────────>
  workspace               -              -           0.925   0.94602   0.2             >
  10-bigrams-experiment   May 19, 2022   -           0.925   0.94602   0.2             >
  └── d522711             02:11 PM       Queued          -         -   0.2             >
 ──────────────────────────────────────────────────────────────────────────────────────>
$ dvc exp remove d522711
Removed experiments: d522711

In this case, dvc exp remove and dvc queue remove are aliases, but dvc queue remove is needed to do things like drop successful experiments from dvc queue status and logs (and it seemed better to allow flexibility to remove any task from the queue as long as we have the command).

When the dvc queue commands are needed, it often is helpful to distinguish between the experiment and the queue task. Examples include completed experiments where you might want to clear them and their logs from the queue but keep the results, or checkpoints where there are multiple experiment rows for a single queue task (note that you can use the finished experiment names in the dvc queue commands, like dvc queue logs exp-7002e).

Ideas to further improve:

  • If there's functionality missing from dvc exp regarding the queue, we should add it where possible (for example, I notice dvc exp remove doesn't work for failed experiments, so I'll open an issue).
  • We might want to drop the suggested deprecations here (like dropping dvc exp remove --queued).

Contributor @jorgeorpinel, Jul 17, 2022:

My only specific suggestion for now is to avoid term "task". Be descriptive instead e.g. "queued experiment", "experiment from queue", even "entry from exps queue" if needed.

Member:

@jorgeorpinel if queue is considered a low level building block (and eventually can do things besides experiments for example), it's fine to use task here to my mind.

Contributor @jorgeorpinel, Jul 21, 2022:

I'm not strongly opposed but it seems unnecessary to me: the whole task management aspect of this is an implementation detail (ultimately irrelevant for users).

Contributor @jorgeorpinel, Aug 25, 2022:

> queue is considered a low level building block

But "queue" is the name of the command. Hard to avoid that term.

p.s. term "worker process" also seems redundant and too deep. We already have "jobs" for the option name, I'd stick to that.
In some places we even have the special combo "task queue worker process" -- words have lost all meaning 😋

[stop](/doc/command-reference/queue/stop),
[status](/doc/command-reference/queue/status),
[logs](/doc/command-reference/queue/logs),
[remove](/doc/command-reference/queue/remove),
[kill](/doc/command-reference/queue/kill)

## Synopsis

```usage
usage: dvc queue [-h] [-q | -v] {start,stop,status,logs,remove,kill} ...

positional arguments:
COMMAND
start Start experiments queue workers.
stop Stop experiments queue workers.
status List the status of the queue tasks and workers.
logs Show output logs for a task in the experiments queue.
remove Remove tasks in experiments queue.
kill Kill tasks in experiments queue.
```

## Description

`dvc queue` subcommands provide specialized ways to manage queued experiment
tasks.
Comment on lines +28 to +31
Contributor:

Let's provide just a bit more context instead like "You can use exp run --queue to queue experiments and then..." instead of repeating the definition of the command (already in the top of the page). Thanks

Contributor:

@pmrowla What do you think? Could you add some phrasing like this?


## Options

- `-h`, `--help` - prints the usage/help message and exits.

- `-q`, `--quiet` - do not write anything to standard output.

- `-v`, `--verbose` - displays detailed tracing information.
31 changes: 31 additions & 0 deletions content/docs/command-reference/queue/kill.md
@@ -0,0 +1,31 @@
## queue kill

Kill actively running
[DVC Experiment](/doc/user-guide/experiment-management/experiments-overview)
tasks.

## Synopsis

```usage
usage: dvc queue kill [-h] [-q | -v] [<task> ...]

positional arguments:
<task> Tasks in queue to kill.
```

## Description

Forcefully stops execution of the specified (running) experiment tasks. Killed
tasks will be considered as failed runs.

This command does not stop the queue worker process. After the specified task
has been killed, the worker process will consume and execute the next experiment
task in the queue.

## Options

- `-h`, `--help` - prints the usage/help message and exits.

- `-q`, `--quiet` - do not write anything to standard output.

- `-v`, `--verbose` - displays detailed tracing information.
191 changes: 191 additions & 0 deletions content/docs/command-reference/queue/logs.md
@@ -0,0 +1,191 @@
## queue logs

Show output logs for running and completed tasks in the
[DVC Experiment](/doc/user-guide/experiment-management/experiments-overview)
task queue.

## Synopsis

```usage
usage: dvc queue logs [-h] [-q | -v] [-e <encoding>] [-f] <task>

positional arguments:
<task> Task to show.
```

## Description

Shows output logs for the specified running or completed experiment task.

By default, this command will show any available log data and then exit. For
tasks which are still running, the `--follow` option can be used to attach to
the task and continuously show live log output, until the task has completed.

When using the `--follow` option, it is safe to stop following output using
`Ctrl+C` (or `SIGINT`). This will only cause the logs command to exit, and the
experiment task will continue to be run in the background.

## Options

- `-e <encoding>`, `--encoding <encoding>` - text encoding for log output.
Defaults to the system locale encoding.

> ⚠️ Note that this option is used to specify the encoding of the experiment
> task output (i.e. the output of pipeline stage commands), which may not
> always match the encoding of your system terminal.

- `-f`, `--follow` - attach to task and follow additional live output. Only
applicable if the task is still running.

- `-h`, `--help` - prints the usage/help message and exits.

- `-q`, `--quiet` - do not write anything to standard output.

- `-v`, `--verbose` - displays detailed tracing information.

## Examples

> This is based on our [Get Started](/doc/start/experiments), where you can find
> the actual source code.

<details>

### Expand to prepare the example ML project

Clone the DVC repo and download the data it <abbr>depends</abbr> on:

```dvc
$ git clone git@github.com:iterative/example-get-started.git
$ cd example-get-started
$ dvc pull
```

Let's also install the Python requirements:

> We **strongly** recommend creating a
> [virtual environment](https://python.readthedocs.io/en/stable/library/venv.html)
> first.

```dvc
$ pip install -r src/requirements.txt
```

</details>

Let's say we have previously run some experiments:

```dvc
$ dvc queue status
Task     Name    Created    Status
192a13c          04:15 PM   Failed
753b005          04:01 PM   Success
0bbb118          04:01 PM   Success
1ae8b65          04:01 PM   Success

Worker status: 0 active, 0 idle
```

We can view the output for both failed and successfully completed experiment
tasks:

```dvc
$ dvc queue logs 192a13c
'data/data.xml.dvc' didn't change, skipping
Running stage 'prepare':
> python src/prepare.py data/data.xml
Traceback (most recent call last):
  File "/Users/pmrowla/git/example-get-started/.dvc/tmp/exps/tmp217n0tjv/src/prepare.py", line 10, in <module>
    raise AssertionError
AssertionError
ERROR: failed to reproduce 'prepare': failed to run: python src/prepare.py data/data.xml, exited with 1
```

```dvc
$ dvc queue logs 0bbb118
'data/data.xml.dvc' didn't change, skipping
Stage 'prepare' is cached - skipping run, checking out outputs
Updating lock file 'dvc.lock'

Stage 'featurize' is cached - skipping run, checking out outputs
Updating lock file 'dvc.lock'

Stage 'train' is cached - skipping run, checking out outputs
Updating lock file 'dvc.lock'

Stage 'evaluate' is cached - skipping run, checking out outputs
Updating lock file 'dvc.lock'

To track the changes with git, run:

git add dvc.yaml scores.json roc.json params.yaml data/prepared data/data.xml prc.json src/featurization.py data/features src/evaluate.py model.pkl dvc.lock src/train.py src/prepare.py

To enable auto staging, run:

dvc config core.autostage true
```

Let's queue a new experiment and view the output while it is running:

```dvc
$ dvc exp run --queue -S prepare.split=0.40 -S featurize.max_features=4000
Queued experiment '93cfa70' for future execution.
$ dvc queue start
Started '1' new experiments task queue worker.
$ dvc queue logs 93cfa70
'data/data.xml.dvc' didn't change, skipping
Running stage 'prepare':
> python src/prepare.py data/data.xml
Updating lock file 'dvc.lock'

Running stage 'featurize':
> python src/featurization.py data/prepared data/features
```

We can see that by default, `dvc queue logs` displays any available output and
then exits. In this case, our `featurize` stage is still running, so no
additional output is available at this time.

Now let's use the `--follow` option to continue viewing all live output from the
running experiment task, until it has completed:

```dvc
$ dvc queue logs -f 93cfa70
Following logs for experiment '93cfa70'. Use Ctrl+C to stop following logs (experiment execution will continue).

'data/data.xml.dvc' didn't change, skipping
Running stage 'prepare':
> python src/prepare.py data/data.xml
Updating lock file 'dvc.lock'

Running stage 'featurize':
> python src/featurization.py data/prepared data/features
The input data frame data/prepared/train.tsv size is (14945, 3)
The output matrix data/features/train.pkl size is (14945, 4002) and data type is float64
The input data frame data/prepared/test.tsv size is (10055, 3)
The output matrix data/features/test.pkl size is (10055, 4002) and data type is float64
Updating lock file 'dvc.lock'

Running stage 'train':
> python src/train.py data/features model.pkl
Input matrix size (14945, 4002)
X matrix size (14945, 4000)
Y matrix size (14945,)
Updating lock file 'dvc.lock'

Running stage 'evaluate':
> python src/evaluate.py model.pkl data/features scores.json prc.json roc.json
Updating lock file 'dvc.lock'

To track the changes with git, run:

git add prc.json model.pkl roc.json params.yaml src/train.py src/prepare.py data/features src/evaluate.py data/data.xml data/prepared dvc.yaml dvc.lock scores.json src/featurization.py

To enable auto staging, run:

dvc config core.autostage true
```

We can see that output for the full experiment pipeline is displayed. We are
also notified that we can safely use `Ctrl+C` if we want to exit the
`dvc queue logs` command, without affecting the execution of our running
experiment task.
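
The status and logs examples above can also be combined in a script. The following is a minimal Python sketch and is not part of this PR or of the DVC codebase: `parse_queue_status` and `show_failed_logs` are hypothetical helper names, and the parser assumes the plain-text table layout shown in the `dvc queue status` example (task ID first, status last, followed by a `Worker status` summary line), which may differ between DVC versions.

```python
import subprocess


def parse_queue_status(text):
    """Parse `dvc queue status` table text into a list of task dicts.

    Assumes the layout from the example above: a header row starting with
    "Task", one row per task with the task ID first and the status last,
    and a trailing "Worker status: ..." summary line.
    """
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, the header row, and the worker summary line.
        if not parts or parts[0] in ("Task", "Worker"):
            continue
        rows.append({"task": parts[0], "status": parts[-1]})
    return rows


def show_failed_logs():
    """Print the logs of every failed task using the documented subcommands."""
    status = subprocess.run(
        ["dvc", "queue", "status"], capture_output=True, text=True, check=True
    )
    for row in parse_queue_status(status.stdout):
        if row["status"] == "Failed":
            # Same call as `dvc queue logs <task>` in the examples above.
            subprocess.run(["dvc", "queue", "logs", row["task"]], check=False)
```

Only `parse_queue_status` is pure; `show_failed_logs` needs a DVC repository with a populated queue. Keeping the parser separate makes it possible to check the table assumptions against real output before relying on them.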
44 changes: 44 additions & 0 deletions content/docs/command-reference/queue/remove.md
@@ -0,0 +1,44 @@
## queue remove

Remove queued and completed tasks from the
[DVC Experiment](/doc/user-guide/experiment-management/experiments-overview)
task queue.

## Synopsis

```usage
usage: dvc queue remove [-h] [-q | -v]
[--all] [--queued] [--success] [--failed]
[<task> ...]

positional arguments:
<task> Tasks in queue to remove.
```

## Description

Removes the specified queued or completed experiment tasks from the queue. For
completed tasks, this will also remove any associated output logs.

> ⚠️ Note that for successfully completed tasks, this command is not the same as
> `dvc exp remove`. This command does not remove any Git or DVC data associated
> with a successful DVC experiment. It only removes the task queue entry and any
> associated output logs for that task.

## Options

- `--all` - remove all (queued and completed) experiment tasks from the queue.

- `--queued` - remove all queued experiment tasks from the queue.

- `--success` - remove all successfully completed tasks (and associated output
logs) from the queue.
Comment on lines +36 to +39
Contributor @jorgeorpinel, Jul 11, 2022:

Probably too late to discuss now but just adding a note for a possible design follow-up:

queue remove --queued sounds redundant. Maybe --unprocessed/waiting or something like that?
Also, queue remove --success -> --successful, I think.

Contributor:

This one was just an idea for you @dberenbaum, no follow-up needed in #3894.

Contributor:

It's a good point but not a high priority for me. Feel free to open an issue and maybe we can change it or make --queued a hidden alias when we have time.


- `--failed` - remove all failed tasks (and associated output logs) from the
queue.

- `-h`, `--help` - prints the usage/help message and exits.

- `-q`, `--quiet` - do not write anything to standard output.

- `-v`, `--verbose` - displays detailed tracing information.
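
For scripted cleanup, the options above can be assembled programmatically. This is a small Python sketch under stated assumptions, not an official DVC API: `queue_remove_cmd` and `BULK_FLAGS` are hypothetical names, and the only flags it emits are the four listed in the synopsis for this command.

```python
# Bulk-selection flags listed in the `dvc queue remove` synopsis above.
BULK_FLAGS = ("all", "queued", "success", "failed")


def queue_remove_cmd(tasks=(), **flags):
    """Build the argument list for a `dvc queue remove` invocation.

    `tasks` are task IDs passed as positional arguments; keyword flags
    must be among BULK_FLAGS and are emitted in synopsis order.
    """
    unknown = sorted(set(flags) - set(BULK_FLAGS))
    if unknown:
        raise ValueError(f"unsupported flags: {unknown}")
    cmd = ["dvc", "queue", "remove"]
    cmd += [f"--{name}" for name in BULK_FLAGS if flags.get(name)]
    cmd += list(tasks)
    return cmd
```

Passing the result of `queue_remove_cmd(success=True, failed=True)` to `subprocess.run` would clear finished tasks and their logs from the queue. As the note in the description says, this leaves the Git and DVC data of successful experiments in place.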