ref: document dvc queue #3715

Merged 33 commits on Aug 3, 2022. Changes shown from 26 commits.

Commits
4e68869
ref: add queue section
pmrowla Jul 1, 2022
dbe354c
ref: dvc queue start
pmrowla Jul 1, 2022
fb61d31
ref: dvc queue stop
pmrowla Jul 1, 2022
817afb7
ref: dvc queue kill
pmrowla Jul 1, 2022
a26ee1d
ref: dvc queue status
pmrowla Jul 1, 2022
e5feaf4
ref: dvc queue logs
pmrowla Jul 1, 2022
8a4b3d8
ref: dvc queue remove
pmrowla Jul 1, 2022
75e21a3
ref: update `exp run` reference for queue changes
pmrowla Jul 5, 2022
01ee250
ref: add queue logs and queue status examples
pmrowla Jul 5, 2022
ea22d10
review fixes
pmrowla Jul 6, 2022
89041fa
add gif animation for logs --follow
pmrowla Jul 6, 2022
f5fef4d
update idle worker note
pmrowla Jul 6, 2022
29760dc
add queue to sidebar
pmrowla Jul 6, 2022
533f175
Update content/docs/command-reference/queue/logs.md
jorgeorpinel Jul 11, 2022
6677c18
Update content/docs/user-guide/experiment-management/running-experime…
jorgeorpinel Jul 11, 2022
f337cad
Update content/docs/command-reference/exp/remove.md
jorgeorpinel Jul 11, 2022
ecd46d4
Update content/docs/command-reference/exp/run.md
jorgeorpinel Jul 11, 2022
d230f9a
Update content/docs/command-reference/exp/run.md
jorgeorpinel Jul 11, 2022
7ae6413
Update content/docs/command-reference/exp/run.md
jorgeorpinel Jul 11, 2022
8f1681b
Update content/docs/command-reference/exp/remove.md
jorgeorpinel Jul 11, 2022
09a23bc
Restyled by prettier (#3757)
restyled-io[bot] Jul 11, 2022
7df232e
Update content/docs/command-reference/queue/index.md
jorgeorpinel Jul 11, 2022
c87cc2a
Update content/docs/command-reference/queue/index.md
jorgeorpinel Jul 11, 2022
6a17f18
Update content/docs/command-reference/queue/start.md
jorgeorpinel Jul 11, 2022
7264c94
Update content/docs/command-reference/queue/start.md
jorgeorpinel Jul 11, 2022
e571871
Restyled by prettier (#3758)
restyled-io[bot] Jul 11, 2022
68d34e4
Merge branch 'main' into cmdref-dvc-queue
jorgeorpinel Jul 29, 2022
8fd6a29
drop `exp remove --queue` deprecation warning
pmrowla Aug 2, 2022
5d556a1
cmdref: reorder queue subcommands sidebar
pmrowla Aug 3, 2022
45183e7
replace quote blocks with admon
pmrowla Aug 3, 2022
f6061dd
remove setup block from examples
pmrowla Aug 3, 2022
ed2e412
lint fixes
pmrowla Aug 3, 2022
e1930b4
remove logs --follow gif
pmrowla Aug 3, 2022
11 changes: 9 additions & 2 deletions content/docs/command-reference/exp/remove.md
@@ -27,8 +27,15 @@ With `--queue`, the list of experiments awaiting execution is cleared instead.
- `--queue` - remove all experiments that haven't been run yet (defined via
`dvc exp run --queue`).

- `-A`, `--all` - remove all experiments that have been run. Use `--queue` to
remove queued ones.
<admon type="warn">

`dvc exp remove --queue` is now an alias for `dvc queue remove --queued`. The
`--queue` flag will be deprecated in a future DVC release.

</admon>

- `-A`, `--all` - remove all experiments that have been run. Use
`dvc queue remove` to remove queued experiment tasks.

- `--rev <commit>` - remove experiments derived from the specified `<commit>` as
baseline.
23 changes: 11 additions & 12 deletions content/docs/command-reference/exp/run.md
@@ -34,13 +34,9 @@ Use the `--set-param` (`-S`) option as a shortcut to change
<abbr>parameter</abbr> values [on-the-fly] before running the experiment.

It's possible to [queue experiments] for later execution with the `--queue`
flag. To actually run them, use `dvc exp run --run-all`. Queued experiments are
run sequentially by default, but can be run in parallel using the `--jobs`
option.

> ⚠️ Parallel runs are experimental and may be unstable. Make sure you're using
> a number of jobs that your environment can handle (no more than the CPU
> cores).
flag. Queued experiments can be run using `dvc queue start`. Refer to the
`dvc queue` documentation for more information on managing the experiment task
queue.

It's also possible to run special [checkpoint experiments] that log the
execution progress (useful for deep learning ML). The `--rev` and `--reset`
@@ -82,8 +78,7 @@ committing them to the Git repo. Unnecessary ones can be [cleared] with
runs.

- `--queue` - place this experiment at the end of a line for future execution,
but don't actually run it yet. Use `dvc exp run --run-all` to process the
queue.
but don't actually run it yet. Use `dvc queue start` to process the queue.

> For checkpoint experiments, this implies `--reset` unless a `--rev` is
> provided.
@@ -96,9 +91,13 @@ committing them to the Git repo. Unnecessary ones can be [cleared] with
parallel. Only has an effect along with `--run-all`. Defaults to 1 (the queue
is processed serially).

> Note that since queued experiments are run isolated from each other, common
> stages may sometimes be executed several times depending on the state of the
> [run-cache] at that time.
<admon type="warn">

`dvc exp run --run-all [--jobs]` is now a shortcut for
`dvc queue start [--jobs]` followed by `dvc queue logs -f`. The `--run-all`
and `--jobs` options will be deprecated in a future DVC release.

</admon>
Comment on lines +94 to +100
Contributor

Maybe this should be under --run-all ? Or split into 2 admonitions, even if they're similar.

Contributor

@pmrowla What do you think about moving or splitting this one?

Contributor Author

@pmrowla Aug 3, 2022

I don't really think it's necessary to repeat this block after 2 consecutive options - run-all and jobs will appear one after another and both options + the info block fit into a single screen on most devices

Contributor

True. But this is much more about --run-all than about --jobs so I moved it up in 26d28e8.


- `-r <commit>`, `--rev <commit>` - resume an experiment from a specific
checkpoint name or hash (`commit`) in `--queue` or `--temp` runs.
39 changes: 39 additions & 0 deletions content/docs/command-reference/queue/index.md
@@ -0,0 +1,39 @@
# queue

A set of commands to manage the
[DVC experiments](/doc/user-guide/experiment-management/experiments-overview)
task queue: [start](/doc/command-reference/queue/start),
Comment on lines +3 to +5
Contributor

@jorgeorpinel Jul 11, 2022

I'm not sure about this terminology: "experiment task queue"

Why not just "experiment queue" ? Are "tasks" important enough to differentiate from experiments? (At first glance it sounds like an implementation detail)

WDYT @dberenbaum ? Could impact strings in the core codebase.

Contributor Author

I think it's important to differentiate between "experiment" vs "queue task" due to commands where there's overlap (like exp remove vs queue remove).

exp remove removes DVC experiment data, meaning experiment git refs and their associated DVC cache data. queue remove removes task queue entry data, meaning the queue entry itself, and associated queue-related artifacts (i.e. logs)

The current plan is also to emphasize this in other commands. exp show will be modified so that by default it does not display queued or failed experiments (so the default view only shows things that can be removed with exp remove). There will still be optional flags for displaying queued/failed exps in the table (since it is useful to be able to see the params/deps for them).

Contributor

@jorgeorpinel Maintaining two separate concepts is important, but feel free to suggest more useful terminology 🙏 .

Contributor

@jorgeorpinel Jul 12, 2022

OK I see there's some rationale and justification for the special terminology. I still think "task" is an implementation detail and could be avoided...

> feel free to suggest more useful terminology

I'm thinking that it should be clear that exp commands deal with experiment data while queue commands deal with experiment queue entries (I mean that was the whole point of separating the commands I think 🙂) so term "experiment" can be used interchangeably in many places. Where disambiguation or emphasis are needed, we can use descriptive wording like "entry from the queue", "worker logs", etc.

But it's something we can follow-up on later if needed (still unsure about the release timeline we're looking at here).

Contributor

It is in main now and should be in the next DVC release.

Contributor

Everyone raises good points here. Agreed that dvc queue should be used infrequently. No functionality was removed from dvc exp here. For example, you can remove queued experiments without using dvc queue:

$ dvc exp run --queue -S train.min_split=0.5
Queued experiment 'd522711' for future execution.
$ dvc queue status
Task     Name    Created    Status
d522711          02:11 PM   Queued

Worker status: 0 active, 0 idle
$ dvc exp show
 ──────────────────────────────────────────────────────────────────────────────────────>
  Experiment              Created        State    avg_prec   roc_auc   prepare.split   >
 ──────────────────────────────────────────────────────────────────────────────────────>
  workspace               -              -           0.925   0.94602   0.2             >
  10-bigrams-experiment   May 19, 2022   -           0.925   0.94602   0.2             >
  └── d522711             02:11 PM       Queued          -         -   0.2             >
 ──────────────────────────────────────────────────────────────────────────────────────>
$ dvc exp remove d522711
Removed experiments: d522711

In this case, dvc exp remove and dvc queue remove are aliases, but dvc queue remove is needed to do things like drop successful experiments from dvc queue status and logs (and it seemed better to allow flexibility to remove any task from the queue as long as we have the command).

When the dvc queue commands are needed, it often is helpful to distinguish between the experiment and the queue task. Examples include completed experiments where you might want to clear them and their logs from the queue but keep the results, or checkpoints where there are multiple experiment rows for a single queue task (note that you can use the finished experiment names in the dvc queue commands, like dvc queue logs exp-7002e).

Ideas to further improve:

  • If there's functionality missing from dvc exp regarding the queue, we should add it where possible (for example, I notice dvc exp remove doesn't work for failed experiments, so I'll open an issue).
  • We might want to drop the suggested deprecations here (like deprecating dvc exp remove --queue).

Contributor

@jorgeorpinel Jul 17, 2022

My only specific suggestion for now is to avoid term "task". Be descriptive instead e.g. "queued experiment", "experiment from queue", even "entry from exps queue" if needed.

Member

@jorgeorpinel if queue is considered a low level building block (and eventually can do things besides experiments for example), it's fine to use task here to my mind.

Contributor

@jorgeorpinel Jul 21, 2022

I'm not strongly opposed but it seems unnecessary to me: the whole task management aspect of this is an implementation detail (ultimately irrelevant for users).

Contributor

@jorgeorpinel Aug 25, 2022

> queue is considered a low level building block

But "queue" is the name of the command. Hard to avoid that term.

p.s. term "worker process" also seems redundant and too deep. We already have "jobs" for the option name, I'd stick to that.
In some places we even have the special combo "task queue worker process" -- words have lost all meaning 😋

[stop](/doc/command-reference/queue/stop),
[status](/doc/command-reference/queue/status),
[logs](/doc/command-reference/queue/logs),
[remove](/doc/command-reference/queue/remove),
[kill](/doc/command-reference/queue/kill)

## Synopsis

```usage
usage: dvc queue [-h] [-q | -v]
                 {start,stop,status,logs,remove,kill} ...

positional arguments:
  COMMAND
    start            Start experiments queue workers.
    stop             Stop experiments queue workers.
    status           List the status of the queue tasks and workers.
    logs             Show output logs for a task in the experiments queue.
    remove           Remove tasks in experiments queue.
    kill             Kill tasks in experiments queue.
```

## Description

`dvc queue` subcommands provide specialized ways to manage queued experiment
tasks.
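
As a rough illustration of how these subcommands fit together with `dvc exp run --queue`, the sketch below strings together commands that appear elsewhere in this PR. The `run` helper, the `queue_workflow` function, and the `DRY_RUN` variable are hypothetical conveniences, not part of DVC. By default the script only prints each command, so it can be read without a DVC project at hand.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN=1 (the default here),
# otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

queue_workflow() {
  # Queue an experiment without running it (parameter values borrowed from
  # the `dvc queue logs` example in this PR).
  run dvc exp run --queue -S prepare.split=0.40 -S featurize.max_features=4000
  # Start a worker that processes the queue.
  run dvc queue start
  # List the status of the queue tasks and workers.
  run dvc queue status
}

queue_workflow
```

Setting `DRY_RUN=0` inside a DVC project would execute the three commands instead of printing them.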
Comment on lines +28 to +31
Contributor

Let's provide just a bit more context instead like "You can use exp run --queue to queue experiments and then..." instead of repeating the definition of the command (already in the top of the page). Thanks

Contributor

@pmrowla What do you think? Could you add some phrasing like this?


## Options

- `-h`, `--help` - print the usage/help message and exit.

- `-q`, `--quiet` - do not write anything to standard output.

- `-v`, `--verbose` - display detailed tracing information.
37 changes: 37 additions & 0 deletions content/docs/command-reference/queue/kill.md
@@ -0,0 +1,37 @@
## queue kill

Kill actively running
[DVC Experiment](/doc/user-guide/experiment-management/experiments-overview)
tasks.

## Synopsis

```usage
usage: dvc queue kill [-h] [-q | -v] [<task> ...]

positional arguments:
  <task>         Tasks in queue to kill.
```

## Description

Forcefully stops execution of the specified (running) experiment tasks. Killed
tasks are treated as failed runs.

This command does not stop the queue worker process. After the specified task
has been killed, the worker process will consume and execute the next experiment
task in the queue.

To kill all running experiment tasks and also stop queue processing, you can use
`dvc queue stop --kill`.

> ⚠️ Note that killed experiment tasks will be considered failed runs and will
> not be re-added to the queue for future execution.
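
The behavior described above (a killed task counts as failed, and the worker moves on to the next task) can be pictured with a toy model. This is only a sketch under stated assumptions and is not how DVC implements its queue: each "task" is a short `sleep`, and `run_worker` and `KILL_TARGET` are hypothetical names.

```shell
#!/bin/sh
# Toy model only; NOT DVC's implementation. Each task is a short sleep, and
# KILL_TARGET (hypothetical) names the task that gets killed mid-run.
run_worker() {
  for task in "$@"; do
    sleep 0.2 &                 # stand-in for a running experiment task
    pid=$!
    if [ "$task" = "${KILL_TARGET:-}" ]; then
      kill -KILL "$pid"         # stand-in for `dvc queue kill <task>`
    fi
    # A killed task is recorded as failed and is not put back in the queue.
    if wait "$pid" 2>/dev/null; then
      echo "$task Success"
    else
      echo "$task Failed"
    fi
    # The loop continues: the worker consumes the next task.
  done
}

KILL_TARGET=b run_worker a b c
```

On standard output this prints `a Success`, `b Failed`, `c Success`: the worker loop itself is never stopped by the kill.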

## Options

- `-h`, `--help` - print the usage/help message and exit.

- `-q`, `--quiet` - do not write anything to standard output.

- `-v`, `--verbose` - display detailed tracing information.
160 changes: 160 additions & 0 deletions content/docs/command-reference/queue/logs.md
@@ -0,0 +1,160 @@
## queue logs

Show output logs for running and completed tasks in the
[DVC Experiment](/doc/user-guide/experiment-management/experiments-overview)
task queue.

## Synopsis

```usage
usage: dvc queue logs [-h] [-q | -v] [-e <encoding>] [-f] <task>

positional arguments:
  <task>         Task to show.
```

## Description

Shows output logs for the specified running or completed experiment task.

By default, this command will show any available log data and then exit. For
tasks which are still running, the `--follow` option can be used to attach to
the task and continuously show live log output, until the task has completed.

When using the `--follow` option, it is safe to stop following output using
`Ctrl+C` (or `SIGINT`). This will only cause the logs command to exit, and the
experiment task will continue to be run in the background.

## Options

- `-e <encoding>`, `--encoding <encoding>` - text encoding for log output.
Defaults to the system locale encoding.

> ⚠️ Note that this option is used to specify the encoding of the experiment
> task output (i.e. the output of pipeline stage commands), which may not
> always match the encoding of your system terminal.

- `-f`, `--follow` - attach to task and follow additional live output. Only
applicable if the task is still running.

- `-h`, `--help` - print the usage/help message and exit.

- `-q`, `--quiet` - do not write anything to standard output.

- `-v`, `--verbose` - display detailed tracing information.
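
The note on `--encoding` can be illustrated outside of DVC. In the sketch below, `iconv` stands in for the decoding step (an assumption for illustration; it is not claimed to be what DVC uses), and `emit_latin1` and `decode_as` are hypothetical helpers. The same bytes only read correctly when decoded with the encoding the stage command actually wrote.

```shell
#!/bin/sh
# Bytes as a stage command might emit them in ISO-8859-1:
# "caf" followed by octal 351, which is "e with acute accent" in that encoding.
emit_latin1() {
  printf 'caf\351\n'
}

# Decode stdin using the encoding the producer used ($1) and re-encode it as
# UTF-8 for display.
decode_as() {
  iconv -f "$1" -t UTF-8
}

emit_latin1 | decode_as ISO-8859-1
```

With DVC itself, the analogous step would be passing the producer's encoding as `dvc queue logs -e <encoding> <task>`, per the synopsis above.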

## Examples

> This is based on our [Get Started](/doc/start/experiments), where you can find
> the actual source code.

<details>

### Expand to prepare the example ML project

Clone the DVC repo and download the data it <abbr>depends</abbr> on:

```dvc
$ git clone git@github.com:iterative/example-get-started.git
$ cd example-get-started
$ dvc pull
```

Let's also install the Python requirements:

> We **strongly** recommend creating a
> [virtual environment](https://python.readthedocs.io/en/stable/library/venv.html)
> first.

```dvc
$ pip install -r src/requirements.txt
```

</details>

## Example: View logs for completed experiment tasks

Let's say we have previously run some queued experiment tasks:

```dvc
$ dvc queue status
Task     Name    Created    Status
192a13c          04:15 PM   Failed
753b005          04:01 PM   Success
0bbb118          04:01 PM   Success
1ae8b65          04:01 PM   Success

Worker status: 0 active, 0 idle
```

We can view the output for both failed and successfully completed experiment
tasks:

```dvc
$ dvc queue logs 192a13c
'data/data.xml.dvc' didn't change, skipping
Running stage 'prepare':
> python src/prepare.py data/data.xml
Traceback (most recent call last):
  File "/Users/pmrowla/git/example-get-started/.dvc/tmp/exps/tmp217n0tjv/src/prepare.py", line 10, in <module>
    raise AssertionError
AssertionError
ERROR: failed to reproduce 'prepare': failed to run: python src/prepare.py data/data.xml, exited with 1
```

```dvc
$ dvc queue logs 0bbb118
'data/data.xml.dvc' didn't change, skipping
Stage 'prepare' is cached - skipping run, checking out outputs
Updating lock file 'dvc.lock'

Stage 'featurize' is cached - skipping run, checking out outputs
Updating lock file 'dvc.lock'

Stage 'train' is cached - skipping run, checking out outputs
Updating lock file 'dvc.lock'

Stage 'evaluate' is cached - skipping run, checking out outputs
Updating lock file 'dvc.lock'

To track the changes with git, run:

git add dvc.yaml scores.json roc.json params.yaml data/prepared data/data.xml prc.json src/featurization.py data/features src/evaluate.py model.pkl dvc.lock src/train.py src/prepare.py

To enable auto staging, run:

dvc config core.autostage true
```
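
When many tasks have finished, it can help to pull the failed task IDs out of the status table before viewing their logs. The sketch below assumes the column layout printed by `dvc queue status` in this example, where the status is the last field on each row. `failed_tasks` is a hypothetical helper, and the sample text is copied from the output above.

```shell
#!/bin/sh
# Hypothetical helper: read `dvc queue status` output on stdin and print the
# ID (first column) of every row whose last field is "Failed".
failed_tasks() {
  awk '$NF == "Failed" { print $1 }'
}

# Sample output copied from the example above.
sample='Task     Name    Created    Status
192a13c          04:15 PM   Failed
753b005          04:01 PM   Success
0bbb118          04:01 PM   Success
1ae8b65          04:01 PM   Success

Worker status: 0 active, 0 idle'

printf '%s\n' "$sample" | failed_tasks
```

In a real project, something like `dvc queue status | failed_tasks | while read -r task; do dvc queue logs "$task"; done` might then show each failed log in turn, assuming the table format stays as shown.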

## Example: View logs for running experiment tasks

Let's queue a new experiment and view the output while it is running:

```dvc
$ dvc exp run --queue -S prepare.split=0.40 -S featurize.max_features=4000
Queued experiment '93cfa70' for future execution.
$ dvc queue start
Started '1' new experiments task queue worker.
$ dvc queue logs 93cfa70
'data/data.xml.dvc' didn't change, skipping
Running stage 'prepare':
> python src/prepare.py data/data.xml
Updating lock file 'dvc.lock'

Running stage 'featurize':
> python src/featurization.py data/prepared data/features
```

We can see that by default, `dvc queue logs` displays any available output and
then exits. In this case, our `featurize` stage is still running, so no
additional output is available at this time.

If we wanted to continuously view live output from the running task (until it
completes), we could also have used the `--follow` option:

![](/img/queue-logs-follow.gif)

We can see that output for the full experiment pipeline is displayed when using
`--follow`. We are also notified that we can safely use `Ctrl+C` if we want to
exit the `dvc queue logs` command, without affecting the execution of our
running experiment task.
44 changes: 44 additions & 0 deletions content/docs/command-reference/queue/remove.md
@@ -0,0 +1,44 @@
## queue remove

Remove queued and completed tasks from the
[DVC Experiment](/doc/user-guide/experiment-management/experiments-overview)
task queue.

## Synopsis

```usage
usage: dvc queue remove [-h] [-q | -v]
                        [--all] [--queued] [--success] [--failed]
                        [<task> ...]

positional arguments:
  <task>         Tasks in queue to remove.
```

## Description

Removes the specified queued or completed experiment tasks from the queue. For
completed tasks, this will also remove any associated output logs.

> ⚠️ Note that for successfully completed tasks, this command is not the same as
> `dvc exp remove`. `dvc queue remove` does not remove any Git or DVC data
> associated with a successful DVC experiment. It only removes the task queue
> entry and any associated output logs for that task.

## Options

- `--all` - remove all (queued and completed) experiment tasks from the queue.

- `--queued` - remove all queued experiment tasks from the queue.

- `--success` - remove all successfully completed tasks (and associated output
logs) from the queue.
Comment on lines +36 to +39
Contributor

@jorgeorpinel Jul 11, 2022

Probably too late to discuss now but just adding a note for a possible design follow-up:

queue remove --queued sounds redundant. Maybe --unprocessed/waiting or something like that?
Also, queue remove --success -> --successful, I think.

Contributor

This one was just an idea for you @dberenbaum, no follow-up needed in #3894.

Contributor

It's a good point but not a high priority for me. Feel free to open an issue and maybe we can change it or make --queued a hidden alias when we have time.


- `--failed` - remove all failed tasks (and associated output logs) from the
queue.

- `-h`, `--help` - print the usage/help message and exit.

- `-q`, `--quiet` - do not write anything to standard output.

- `-v`, `--verbose` - display detailed tracing information.
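
The bulk options (`--all`, `--queued`, `--success`, `--failed`) select tasks by status. As a sketch of that selection only (not DVC's implementation), the hypothetical `select_for_removal` helper below reads a status table in the format printed by `dvc queue status` and prints the task IDs each flag would cover. It assumes the status values `Queued`, `Success`, and `Failed` seen in this PR's examples; the sample table is an illustrative mix of rows from those examples.

```shell
#!/bin/sh
# Hypothetical helper: map a `dvc queue remove` bulk flag to the statuses it
# covers, then print matching task IDs from a status table read on stdin.
select_for_removal() {
  case "$1" in
    --queued)  pattern='^Queued$' ;;
    --success) pattern='^Success$' ;;
    --failed)  pattern='^Failed$' ;;
    --all)     pattern='^(Queued|Success|Failed)$' ;;
    *) echo "unknown option: $1" >&2; return 2 ;;
  esac
  # The status is the last field of each row; the ID is the first.
  awk -v pat="$pattern" '$NF ~ pat { print $1 }'
}

# Illustrative sample combining rows from the examples in this PR.
sample='Task     Name    Created    Status
192a13c          04:15 PM   Failed
753b005          04:01 PM   Success
d522711          02:11 PM   Queued'

printf '%s\n' "$sample" | select_for_removal --success
```

Note that, as the description above says, removing a successful task this way would only drop its queue entry and logs, not the experiment's Git or DVC data.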