start: Experiments Trail #2574

Merged
merged 90 commits into from
Oct 18, 2021
Changes from 31 commits
a84c2be
experiments draft 1 moved to experiments-trail/
Jul 2, 2021
81c4052
modifications for downloading the data and setting up the experiment …
Jul 2, 2021
e80af13
Fix project installation and commands
iesahin Jul 3, 2021
bb56fb7
added second draft for experiments
Jul 5, 2021
29c3ae0
Restyled by prettier (#2608)
restyled-io[bot] Jul 5, 2021
e12f222
Update version 1 by some command outputs
iesahin Jul 5, 2021
1a646bb
Added naming section and two screenshots to draft 1
iesahin Jul 6, 2021
5c37246
adding separate section for experiment preparation in draft 2
iesahin Jul 6, 2021
5756cfc
fixed link to the project and added some explanations to installation
iesahin Jul 6, 2021
1645b10
some minor edits in draft 1
iesahin Jul 6, 2021
0acdd13
minor edits in pipeline explanation
iesahin Jul 6, 2021
0f43b57
carried over changes in draft-1 and updated the install instructions
iesahin Jul 6, 2021
2f1cd50
merged and simplified the first paragraph and added emojis
iesahin Jul 8, 2021
51aa9da
added links to the sidebar (tentative)
iesahin Jul 8, 2021
9f25143
changes after jorge's review
iesahin Jul 9, 2021
e24d163
carried over changes in alt-1 to alt-2
iesahin Jul 12, 2021
da4d617
added 3rd alternative to experiments trail
iesahin Jul 12, 2021
77c36c2
minor changes
iesahin Jul 12, 2021
e76ff2b
removed alt-1 and alt-2
iesahin Jul 16, 2021
4a6cba7
renamed experiments trail alt-3 and modified the sidebar
iesahin Jul 16, 2021
3b8cbe1
minor changes for experiment commands
iesahin Jul 20, 2021
4c6b54d
updated the experiments and removed some text
iesahin Jul 22, 2021
a2063cc
Updated and simplified initial paragraphs
iesahin Jul 28, 2021
7a6a70b
updated the details section scope to hide all installation info
iesahin Jul 30, 2021
e07584c
moved params diff explanation to hidden section + minor edits
iesahin Jul 30, 2021
cb777d3
replaced some emoji
iesahin Aug 19, 2021
a1e6b6b
removed the "note on experiments names" section
iesahin Aug 19, 2021
c10457f
removed "how dvc updates parameters" section
iesahin Aug 19, 2021
aa30fdf
removed dvc exp apply
iesahin Aug 19, 2021
a8f0741
removed screenshots
iesahin Aug 20, 2021
51afaf5
edits for clarification
iesahin Aug 20, 2021
96406ec
revised the doc per reviews
iesahin Oct 4, 2021
543596e
Update content/docs/start/experiments-trail/experiments.md
iesahin Oct 4, 2021
b4fc360
minor edits
iesahin Oct 4, 2021
cadf03a
some emoji changes
iesahin Oct 4, 2021
9dde3f7
put the video
iesahin Oct 5, 2021
9a56f80
replaced older gs:experiments with the trail doc
iesahin Oct 5, 2021
d1f929a
Removed emoji and moved persisting to details section
iesahin Oct 5, 2021
d085d77
added a sentence for data management
iesahin Oct 5, 2021
797fc1d
added a brief `dvc exp show` after the first run.
iesahin Oct 6, 2021
34c165f
added emoji to detail headers
iesahin Oct 6, 2021
46411bf
added an initial text
iesahin Oct 6, 2021
72cca7c
some suggestions of Ivan applied.
iesahin Oct 8, 2021
dad083d
experiments draft 1 moved to experiments-trail/
Jul 2, 2021
7f768e2
modifications for downloading the data and setting up the experiment …
Jul 2, 2021
761c864
Fix project installation and commands
iesahin Jul 3, 2021
506f0fa
added second draft for experiments
Jul 5, 2021
e7c1127
Restyled by prettier (#2608)
restyled-io[bot] Jul 5, 2021
05fef9e
Update version 1 by some command outputs
iesahin Jul 5, 2021
e9d94c5
Added naming section and two screenshots to draft 1
iesahin Jul 6, 2021
518cb57
adding separate section for experiment preparation in draft 2
iesahin Jul 6, 2021
8e3f186
fixed link to the project and added some explanations to installation
iesahin Jul 6, 2021
14518f9
some minor edits in draft 1
iesahin Jul 6, 2021
ac24222
minor edits in pipeline explanation
iesahin Jul 6, 2021
fcf8aea
carried over changes in draft-1 and updated the install instructions
iesahin Jul 6, 2021
b0d48d3
merged and simplified the first paragraph and added emojis
iesahin Jul 8, 2021
fbf8c5e
added links to the sidebar (tentative)
iesahin Jul 8, 2021
f12d1b8
changes after jorge's review
iesahin Jul 9, 2021
91ee97b
carried over changes in alt-1 to alt-2
iesahin Jul 12, 2021
b7ad284
added 3rd alternative to experiments trail
iesahin Jul 12, 2021
0b97728
minor changes
iesahin Jul 12, 2021
5a925e9
removed alt-1 and alt-2
iesahin Jul 16, 2021
3c6a9b2
renamed experiments trail alt-3 and modified the sidebar
iesahin Jul 16, 2021
8433c8f
minor changes for experiment commands
iesahin Jul 20, 2021
f6b870b
updated the experiments and removed some text
iesahin Jul 22, 2021
b2050b9
Updated and simplified initial paragraphs
iesahin Jul 28, 2021
690b391
updated the details section scope to hide all installation info
iesahin Jul 30, 2021
4be6138
moved params diff explanation to hidden section + minor edits
iesahin Jul 30, 2021
c2e00f5
replaced some emoji
iesahin Aug 19, 2021
f4dae65
removed the "note on experiments names" section
iesahin Aug 19, 2021
77ff0c5
removed "how dvc updates parameters" section
iesahin Aug 19, 2021
397cc53
removed dvc exp apply
iesahin Aug 19, 2021
065fa71
removed screenshots
iesahin Aug 20, 2021
5c2725b
edits for clarification
iesahin Aug 20, 2021
b7d37dc
revised the doc per reviews
iesahin Oct 4, 2021
4e81e46
Update content/docs/start/experiments-trail/experiments.md
iesahin Oct 4, 2021
02cd702
minor edits
iesahin Oct 4, 2021
62edbd8
some emoji changes
iesahin Oct 4, 2021
10e6ec3
put the video
iesahin Oct 5, 2021
bd4b6ad
replaced older gs:experiments with the trail doc
iesahin Oct 5, 2021
8dfe709
Removed emoji and moved persisting to details section
iesahin Oct 5, 2021
56ccd4c
added a sentence for data management
iesahin Oct 5, 2021
2a8b42e
added a brief `dvc exp show` after the first run.
iesahin Oct 6, 2021
769c2e6
added emoji to detail headers
iesahin Oct 6, 2021
c7ddcf8
added an initial text
iesahin Oct 6, 2021
915c61a
some suggestions of Ivan applied.
iesahin Oct 8, 2021
ea94da6
modifications after Dave's review
iesahin Oct 15, 2021
2b9ebd9
Merge branch 'iesahin/new-gs-experiments' of github.com:iterative/dvc…
iesahin Oct 15, 2021
ec3d132
merged
iesahin Oct 15, 2021
d3cae7a
revised after Dave's review.
iesahin Oct 18, 2021
3 changes: 2 additions & 1 deletion content/docs/sidebar.json
Expand Up @@ -60,7 +60,8 @@
}
},
{
"slug": "experiments",
"slug": "experiments-trail/experiments",
"label": "Experiments Trail",
"tutorials": {
"katacoda": "https://katacoda.com/dvc/courses/get-started/experiments"
}
255 changes: 255 additions & 0 deletions content/docs/start/experiments-trail/experiments.md
@@ -0,0 +1,255 @@
---
title: 'Get Started: Experiments'
---

# Get Started with Experiments

In machine learning projects, the number of <abbr>experiments</abbr> grows
rapidly. DVC can track these experiments, list their most relevant parameters
and metrics, and commit only the ones that we need to Git.

In this section, we explore the basic features of DVC experiment management with
the [`get-started-experiments`][gse] project.

[gse]: https://github.com/iterative/get-started-experiments

<details>

## ⚙️ Installing the Example Project

These commands are run in the [`get-started-experiments`][gse] project. You can
run the commands in this document after cloning the repository and installing
the requirements.

### ⚡ Clone the project and create a virtual environment

Please clone the project and create a virtual environment.

> A virtual environment keeps the libraries used here isolated from the rest of
> your system, which prevents version conflicts.

```dvc
$ git clone https://github.com/iterative/get-started-experiments -b get-started
$ cd get-started-experiments
$ virtualenv .venv
$ . .venv/bin/activate
$ python -m pip install -r requirements.txt
```

Review comments on `$ virtualenv .venv`:

Member: it won't work on Mac (need to check Windows) - you need to provide Python 3

I would simplify this with We strongly recommend creating a virtual environment first. - the way we handle this in the current example get started

Contributor Author: virtualenv requires a separate installation in all systems. The standard one in Py3 is venv. We discussed this in iterative/example-repos-dev#26 and decided to continue with virtualenv.

I can also strongly recommend, if that's the point :)

Contributor Author: Even venv requires a separate apt install python3-venv in Debian and derivatives.

Member: My question was about not providing so much details. We can keep git clone and maybe pip install -r requirements ... we can put a link nearby to the virtualenv like we do in the current get started?

### ✅ Get the data set

The repository we cloned doesn't contain the dataset. We use `dvc pull` to
download the missing data files.

Review comments on this paragraph:

Member: here we need a reference to the data management

Contributor Author: I believe we'll revise all the docs when we'll have done other trails. All steps will link to other relevant parts.

Member: can we include it now? not even the link itself, but some phrase - that dvc pull is one of those commands to save, load, manage datasets and ML models ... read here ....

```dvc
$ dvc pull
```

The repository already contains the necessary configuration to run the
experiments.

## 💡 Preparing a project for DVC experiments

If DVC is not installed on your system, please refer to [install](/doc/install).

To simplify the introduction, this page assumes that a configured DVC project
already exists. DVC experiments require a DVC pipeline defined in the project.

If DVC has not been initialized in the project yet, you can do so with:

```dvc
$ dvc init
```

DVC also requires the commands to run, and their dependencies, to be defined as
stages. We use `dvc stage add` to add a stage and set its dependencies.

```dvc
$ dvc stage add -n train \
-p model.conv_units \
-p train.epochs \
-d data/images \
-m metrics.json \
python3 src/train.py
```

The command tells DVC to create a stage named `train`. Whenever the
`data/images/` directory or the `model.conv_units` or `train.epochs` parameters
change, DVC (re)runs the experiment using `src/train.py`, which produces a new
`metrics.json` file.
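
The idea that only the tracked parameters trigger a new run can be sketched in a
few lines of Python. This is an illustration only, not DVC's implementation: the
helper names `lookup` and `changed_params` are hypothetical, and the nested
layout of the parameter values is an assumption.

```python
from typing import Any, Dict, List


def lookup(params: Dict[str, Any], dotted_key: str) -> Any:
    """Follow a dotted key such as 'model.conv_units' through nested dicts."""
    value: Any = params
    for part in dotted_key.split("."):
        value = value[part]
    return value


def changed_params(
    previous: Dict[str, Any], current: Dict[str, Any], tracked: List[str]
) -> List[str]:
    """Return the tracked keys whose values differ between two snapshots."""
    return [key for key in tracked if lookup(previous, key) != lookup(current, key)]


# The two parameters passed with -p in the stage definition.
TRACKED = ["model.conv_units", "train.epochs"]

# Hypothetical snapshots of the parameter values (layout assumed).
previous = {"model": {"conv_units": 16}, "train": {"epochs": 10}}
current = {"model": {"conv_units": 24}, "train": {"epochs": 10}}

print(changed_params(previous, current, TRACKED))
```

An empty result would mean that, as far as parameters are concerned, nothing
needs to be rerun.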

You can learn more about [pipelines] and [parameters] in other sections of the
documentation.

[pipelines]: /doc/start/data-pipelines
[parameters]: /doc/start/metrics-parameters-plots

</details>

## 👟 Running the experiment with default parameters

The purpose of the `dvc exp` subcommands is to run experiments without
committing parameter and dependency changes to Git. Artifacts such as models
and metrics produced by each experiment are tracked by DVC and persisted on
demand.

Running the experiment with the default project settings requires only the
command:

```dvc
$ dvc exp run
...
Reproduced experiment(s): exp-b28f0
Experiment results have been applied to your workspace.
...
```

It runs the command specified in `dvc.yaml` (`python3 src/train.py`). That
command writes the metric values to `metrics.json`.

DVC then associates the values found in the parameters file (`params.yaml`) and
the other dependencies (`data/images/`) with the metrics this experiment
produced.
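
As a rough sketch of the metrics side, a training script could write its results
as a small JSON file. The function name `write_metrics` is hypothetical, the
`loss` and `acc` keys are assumed from the metric names used in this project,
and the values are placeholders; the real script may structure this differently.

```python
import json
import tempfile
from pathlib import Path


def write_metrics(path: Path, loss: float, acc: float) -> None:
    """Write the final metric values as a flat JSON object."""
    path.write_text(json.dumps({"loss": loss, "acc": acc}, indent=2))


# Use a temporary directory so the sketch does not touch a real project.
with tempfile.TemporaryDirectory() as tmp:
    metrics_path = Path(tmp) / "metrics.json"
    write_metrics(metrics_path, loss=0.23282, acc=0.9152)
    saved = json.loads(metrics_path.read_text())
    print(saved)
```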

<details>

### 📖 If you used `dvc repro` before

Earlier versions of DVC use `dvc repro` to run the pipeline. If you already
have a DVC project, you may already be using `dvc repro`.

In DVC 2.0, `dvc exp run` supersedes `dvc repro`. `dvc repro` runs the pipeline
as found in the <abbr>workspace</abbr>: all the parameters and dependencies are
retrieved from the current workspace, and no special objects are used to track
the experiments or associate parameters with metrics. When you have a large
number of experiments and don't want to commit all of them to Git, it's better
to use `dvc exp run`. It lets you change the parameters quickly, tracks the
history of artifacts, and makes it easy to compare experiments.

`dvc repro` is still available to run the pipeline when these extra features are
not needed.

</details>

## ⏲ Running the experiment by setting parameters

Now let's do some more experimentation.

DVC lets you update the parameters defined in the pipeline without modifying
the files manually. We use this feature to set the number of convolutional
units used by `src/train.py`.

```dvc
$ dvc exp run --set-param model.conv_units=24
...
Reproduced experiment(s): exp-7b56f
Experiment results have been applied to your workspace.
...
```
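
Conceptually, `--set-param model.conv_units=24` overrides one value addressed by
a dotted key. The sketch below mimics that on a plain Python dictionary. It is
not DVC's code: `set_param` is a hypothetical helper, and the nested layout of
the parameters is an assumption.

```python
import copy
from typing import Any, Dict


def set_param(params: Dict[str, Any], dotted_key: str, value: Any) -> Dict[str, Any]:
    """Return a copy of params with the value at dotted_key replaced."""
    updated = copy.deepcopy(params)
    node = updated
    *parents, leaf = dotted_key.split(".")
    for part in parents:
        # Walk down, creating intermediate sections if they are missing.
        node = node.setdefault(part, {})
    node[leaf] = value
    return updated


# Hypothetical default parameter values (layout assumed).
defaults = {"model": {"conv_units": 16}, "train": {"epochs": 10}}
overridden = set_param(defaults, "model.conv_units", 24)
print(overridden)
```

Returning a copy keeps the defaults untouched, so several overrides can be
derived from the same starting point.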

<details>

## 👟👟 Run multiple experiments in parallel

Instead of running the experiments one by one, we can define them to run in a
batch. This is especially handy when you have long-running experiments.

We add experiments to the queue using the `--queue` option of `dvc exp run`. We
also use `-S` (`--set-param`) to set a value for the parameter.

```dvc
$ dvc exp run --queue -S model.conv_units=32
Queued experiment '3cac8c6' for future execution.
$ dvc exp run --queue -S model.conv_units=64
Queued experiment '23660b6' for future execution.
$ dvc exp run --queue -S model.conv_units=128
Queued experiment '6591a57' for future execution.
$ dvc exp run --queue -S model.conv_units=256
Queued experiment '9109ea9' for future execution.
```

Next, run all (`--run-all`) queued experiments in parallel. You can specify the
number of parallel processes using `--jobs`:

```dvc
$ dvc exp run --run-all --jobs 2
```
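
When the list of values grows, the queueing commands can be generated instead of
typed. The sketch below only builds and prints the command lines, using the same
`--queue` and `-S` options shown above; `queue_commands` is a hypothetical
helper, and nothing is executed.

```python
import shlex
from typing import List


def queue_commands(param: str, values: List[int]) -> List[List[str]]:
    """Build one `dvc exp run --queue` argument list per parameter value."""
    return [
        ["dvc", "exp", "run", "--queue", "-S", f"{param}={value}"]
        for value in values
    ]


commands = queue_commands("model.conv_units", [32, 64, 128, 256])
for argv in commands:
    # shlex.join renders the argument list as a shell-safe command line.
    print(shlex.join(argv))
```

Inside a DVC project, each argument list could then be passed to
`subprocess.run(argv, check=True)` to queue the experiments for real.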

</details>

## ↔️ Comparing experiments

We have now run the experiment several times with different parameters. We use
`dvc exp show` to compare all of these experiments. This command presents the
parameters and metrics of each experiment in a formatted table.

```dvc
$ dvc exp show
```

```dvctable
┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┓
┃ white:**Experiment** ┃ white:**Created**┃ yellow:**loss** ┃ yellow:**acc** ┃ blue:**train.epochs** ┃ blue:**model.conv_units** ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━┩
│ workspace │ - │ 0.23508 │ 0.9151 │ 10 │ 24 │
│ 7317bc6 │ Jul 18, 2021 │ - │ - │ 10 │ 16 │
│ ├── e2647ef [exp-ee8a4] │ 05:14 PM │ 0.23146 │ 0.9145 │ 10 │ 64 │
│ ├── 15c9451 [exp-a9be6] │ 05:14 PM │ 0.25231 │ 0.9102 │ 10 │ 32 │
│ ├── 9c32227 [exp-17dd9] │ 04:46 PM │ 0.23687 │ 0.9167 │ 10 │ 256 │
│ ├── 8a9cb15 [exp-29d93] │ 04:46 PM │ 0.24459 │ 0.9134 │ 10 │ 128 │
│ ├── dfc536f [exp-a1bd9] │ 03:35 PM │ 0.23508 │ 0.9151 │ 10 │ 24 │
│ └── 1a1d858 [exp-6dccf] │ 03:21 PM │ 0.23282 │ 0.9152 │ 10 │ 16 │
└─────────────────────────┴──────────────┴─────────┴────────┴──────────────┴──────────────────┘
```

By default, it shows all the parameters and metrics along with the timestamp.
If you have a large number of parameters, metrics, or experiments, this may lead
to a cluttered view. You can limit the table to specific metrics or parameters,
or hide the timestamp column, with the `--include-metrics`, `--include-params`,
or `--no-timestamp` options of the command, respectively.

```dvc
$ dvc exp show --no-timestamp \
--include-params model.conv_units --include-metrics acc
```

```dvctable
┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━┓
┃ white:**Experiment** ┃ yellow:**acc** ┃ blue:**model.conv_units** ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━┩
│ workspace │ 0.9151 │ 24 │
│ 7317bc6 │ - │ 16 │
│ ├── e2647ef [exp-ee8a4] │ 0.9145 │ 64 │
│ ├── 15c9451 [exp-a9be6] │ 0.9102 │ 32 │
│ ├── 9c32227 [exp-17dd9] │ 0.9167 │ 256 │
│ ├── 8a9cb15 [exp-29d93] │ 0.9134 │ 128 │
│ ├── dfc536f [exp-a1bd9] │ 0.9151 │ 24 │
│ └── 1a1d858 [exp-6dccf] │ 0.9152 │ 16 │
└─────────────────────────┴────────┴──────────────────┘
```
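
Choosing an experiment from such a table amounts to a simple selection. As a
small sketch, the rows above are copied by hand into Python and the entry with
the highest `acc` is picked; the `Row` type is hypothetical and exists only for
this illustration.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    name: str
    conv_units: int
    acc: float


# Values copied from the table above.
rows: List[Row] = [
    Row("exp-ee8a4", 64, 0.9145),
    Row("exp-a9be6", 32, 0.9102),
    Row("exp-17dd9", 256, 0.9167),
    Row("exp-29d93", 128, 0.9134),
    Row("exp-a1bd9", 24, 0.9151),
    Row("exp-6dccf", 16, 0.9152),
]

# Pick the experiment with the highest accuracy.
best = max(rows, key=lambda row: row.acc)
print(best.name, best.conv_units, best.acc)
```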

## 🔏 Persisting experiments

After selecting an experiment from the table, you can create a Git branch that
contains the experiment with all its related files.

Review comment on this paragraph:

Member: we should create a sense that it's only one of the possible ways I think

```dvc
$ dvc exp branch exp-05e87 "cnn-256"
Git branch 'cnn-256' has been created from experiment 'exp-05e87'.
To switch to the new branch run:

git checkout cnn-256
```

You can then check out this branch and continue working from it as usual.

## ⭐ Go Further

There are many other features of `dvc exp`, like cleaning up the unused
experiments, sharing them without committing into Git or getting differences
between two experiments.

Please see the section on
[Experiment Management](/doc/user-guide/experiment-management) in the User's
Guide, or `dvc exp` and its subcommands in the Command Reference.