params: roll back changes for now...
jorgeorpinel committed Aug 26, 2022
1 parent dc16807 commit 23cd9c6
Showing 6 changed files with 81 additions and 129 deletions.
2 changes: 1 addition & 1 deletion content/docs/command-reference/exp/run.md
@@ -224,7 +224,7 @@ train_config.json train.weight_decay - 0.001

Note that `exp run --set-param` (`-S`) doesn't update your `dvc.yaml`. When
appending or removing <abbr>parameters</abbr>, make sure to update the
[`params` section](https://dvc.org/doc/user-guide/project-structure/dvcyaml-files#parameters)
[`params` section](https://dvc.org/doc/user-guide/project-structure/dvcyaml-files#parameter-dependencies)
of your `dvc.yaml` accordingly.

</admon>
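For illustration only, here is a minimal Python sketch of what a `-S` override does to a nested params dictionary. It assumes the override has the form `dotted.name=value` and parses the value as JSON, falling back to a plain string. DVC's own value parsing may differ, and `apply_set_param` is a hypothetical helper. Nothing in it touches a stage's `params` list, which is the point of the note above: the params data changes, but `dvc.yaml` does not.

```python
import json
from typing import Any, Dict


def apply_set_param(params: Dict[str, Any], override: str) -> Dict[str, Any]:
    """Apply a "dotted.name=value" override in place, creating nested dicts as needed."""
    name, sep, raw = override.partition("=")
    if not sep:
        raise ValueError(f"expected NAME=VALUE, got {override!r}")
    try:
        value: Any = json.loads(raw)  # assumption: JSON-style value parsing
    except json.JSONDecodeError:
        value = raw  # not valid JSON, keep it as a plain string
    *parents, leaf = name.split(".")
    node = params
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value
    return params


# Hypothetical contents of train_config.json, already loaded into a dict
config = {"train": {"weight_decay": 0.0005, "epochs": 10}}
apply_set_param(config, "train.weight_decay=0.001")
print(config)
```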
14 changes: 6 additions & 8 deletions content/docs/command-reference/params/diff.md
@@ -1,7 +1,8 @@
# params diff

Show changes in `dvc params` between commits in the <abbr>DVC repository</abbr>,
or between a commit and the <abbr>workspace</abbr>.
Show changes in [parameters](/doc/command-reference/params) between commits in
the <abbr>DVC repository</abbr>, or between a commit and the
<abbr>workspace</abbr>.

> Requires that Git is being used to version the project.
@@ -20,15 +21,12 @@ positional arguments:

## Description

Provides a quick way to compare <abbr>parameters</abbr> among experiments in the
Provides a quick way to compare parameter values among experiments in the
repository history. The differences shown by this command include the old and
new param values, along with the param name.
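The comparison described here can be pictured with a short Python sketch. It is not DVC's implementation, and `flatten` and `diff_params` are hypothetical names: both sides are flattened into dot-separated param names, and only names whose values differ are reported as `(name, old, new)` rows. `None` marks a param that is missing on one side, so this sketch cannot tell a missing param from one explicitly set to null.

```python
from typing import Any, Dict, List, Optional, Tuple


def flatten(tree: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn nested dicts into {"a.b": value} pairs; lists stay as leaf values."""
    flat: Dict[str, Any] = {}
    for key, value in tree.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def diff_params(
    old: Dict[str, Any], new: Dict[str, Any]
) -> List[Tuple[str, Optional[Any], Optional[Any]]]:
    """Return (name, old_value, new_value) rows for params whose values differ."""
    old_flat, new_flat = flatten(old), flatten(new)
    rows = []
    for name in sorted(old_flat.keys() | new_flat.keys()):
        before, after = old_flat.get(name), new_flat.get(name)
        if before != after:
            rows.append((name, before, after))
    return rows


# Hypothetical values: last commit vs. uncommitted workspace changes
committed = {"epochs": 900, "train": {"weight_decay": 0.001}}
workspace = {"epochs": 1000, "train": {"weight_decay": 0.001, "seed": 42}}

for name, before, after in diff_params(committed, workspace):
    print(name, before, after)
```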

<admon type="info">

Parameters are defined in the `params` field of `dvc.yaml`. See `dvc params`.

</admon>
> Parameter dependencies are defined in the `params` field of `dvc.yaml` (e.g.
> with the the `-p` (`--params`) option of `dvc stage add`).
Without arguments, `dvc params diff` compares parameters currently present in
the <abbr>workspace</abbr> (uncommitted changes) with the latest committed
112 changes: 47 additions & 65 deletions content/docs/command-reference/params/index.md
@@ -1,6 +1,6 @@
# params

Contains a command to show changes in <abbr>parameters</abbr>:
Contains a command to show changes in parameters:
[diff](/doc/command-reference/params/diff).

## Synopsis
@@ -16,70 +16,62 @@ positional arguments:

## Description

Parameters are simple values used inside your code to influence the results
(e.g. machine learning [hyperparameters]). DVC can track these as key/value
pairs from structured YAML 1.2, JSON, TOML 1.0,
[or Python](#examples-python-parameters-file) files (`params.yaml` by default),
from which your code should also read them.
In order to track parameters and hyperparameters associated to machine learning
experiments in <abbr>DVC projects</abbr>, DVC provides a different type of
dependencies: _parameters_. They usually have simple names like `epochs`,
`learning-rate`, `batch_size`, etc.

Multiple stages of a <abbr>pipeline</abbr> can [use the same params file] as
<abbr>dependency</abbr>, but only certain values will affect each
<abbr>stage</abbr>. Params usually have simple names like `epochs`,
`learning-rate`, `batch_size`, etc. Example:

```yaml
epochs: 900
tuning:
- learning-rate: 0.945
- max_depth: 7
```
To start tracking parameters, list their names under the `params` field of
`dvc.yaml` (manually or with the the `-p`/`--params` option of `dvc stage add`).
For example:
To start tracking parameters, list them under the `params` field of `dvc.yaml`
stages (manually or with the the `-p`/`--params` option of `dvc stage add`). For
example:

```yaml
stages:
learn:
cmd: python deep.py
cmd: ./deep.py
params:
- epochs # specific param from params.yaml
- tuning.learning-rate # from params.yaml
- myparams.toml:
- batch_size # specific param from custom file
- config.json: # all params in this file
- epochs # track specific parameter (from params.yaml)
- tuning.learning-rate
- myparams.toml: # track specific params from custom file
- batch_size
- config.json: # track all parameters in this file
```
<admon type="info">

See [more details] about this syntax.

</admon>

The `dvc params diff` command is available to show parameter changes, displaying
their current and previous values.

<abbr type="tip">

Parameters can also be used for [templating] `dvc.yaml` itself.

</abbr>
In contrast to a regular <abbr>dependency</abbr>, a parameter dependency is not
a file or directory. Instead, it consists of a _parameter name_ (or key) in a
_parameters file_, where the _parameter value_ should be found. This allows you
to define [stage](/doc/command-reference/run) dependencies more granularly:
changes to other parts of the params file will not affect the stage. Parameter
dependencies also prevent situations where several stages share a regular
dependency (e.g. a config file), and any change in it invalidates all of them
(see `dvc status`), causing unnecessary re-executions upon `dvc repro`.

The default **parameters file** name is `params.yaml`, but any other YAML 1.2,
JSON, TOML 1.0, or [Python](#examples-python-parameters-file) files can be used
additionally (listed under `params:` as shown in the sample above). These files
are typically written manually (or they can be generated) and they can be
versioned directly with Git.

**Parameter values** should be organized in tree-like hierarchies (dictionaries)
inside params files (see [Examples](#examples)). DVC will interpret param names
as the tree path to find those values. Supported types are: string, integer,
float, boolean, and arrays (groups of params). Note that DVC does not ascribe
any specific meaning to these values.
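To make the "tree path" idea concrete, here is a minimal Python sketch. It assumes the params file has already been parsed into nested dictionaries (JSON is used so the example needs only the standard library). `resolve_param` is a hypothetical helper, not DVC's lookup code. It only walks mappings, so it assumes `tuning` is a dictionary rather than the list shown in the YAML sample above.

```python
import json
from collections.abc import Mapping
from typing import Any


def resolve_param(tree: Mapping, name: str) -> Any:
    """Follow a dot-separated param name (e.g. "tuning.learning-rate") through nested dicts."""
    node: Any = tree
    for part in name.split("."):
        if not isinstance(node, Mapping) or part not in node:
            raise KeyError(f"param {name!r} not found (stopped at {part!r})")
        node = node[part]
    return node


# Assumed params file content; "tuning" is a mapping here, not a list
params = json.loads('{"epochs": 900, "tuning": {"learning-rate": 0.945, "max_depth": 7}}')

print(resolve_param(params, "epochs"))
print(resolve_param(params, "tuning.learning-rate"))
```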

DVC saves parameter names and values to `dvc.lock` in order to track them over
time. They will be compared to the latest params files to determine if the stage
is outdated upon `dvc repro` (or `dvc status`).

DVC does not pass the parameter values to [stage commands]. The commands
executed by DVC should load them by itself, for example using
`dvc.api.params_show()`.
> Note that DVC does not pass the parameter values to stage commands. The
> commands executed by DVC will have to load and parse the parameters file by
> itself.

The `dvc params diff` command is available to show parameter changes, displaying
their current and previous values.

[hyperparameters]:
/doc/user-guide/experiment-management/running-experiments#tuning-hyperparameters
[use the same params file]:
/doc/user-guide/data-pipelines/defining-pipelines#parameter-dependencies
[more details]: /doc/user-guide/project-structure/dvcyaml-files#parameters
[templating]: /doc/user-guide/project-structure/dvcyaml-files#templating
[stage commands]: /doc/user-guide/project-structure/dvcyaml-files#stage-commands
💡 Parameters can also be used for
[templating](/doc/user-guide/project-structure/dvcyaml-files#templating)
`dvc.yaml` itself.

## Options

@@ -120,7 +112,7 @@ $ dvc stage add -n train -d train.py -d users.csv -o model.pkl \
> Python parameters files.

The `train.py` script will have some code to parse and load the needed
parameters. You can use `dvc.api.params_show()` for this:
parameters. For example, you can use `dvc.api.params_show()`:

```py
import dvc.api
@@ -205,13 +197,9 @@ previous version, which is why all `Old` values are `—`.

## Examples: Python parameters file

<admon type="warn">

See Note that complex expressions (unsupported by
[ast.literal_eval](https://docs.python.org/3/library/ast.html#ast.literal_eval))
won't be parsed as DVC parameters.

</admon>
> ⚠️ Note that complex expressions (unsupported by
> [ast.literal_eval](https://docs.python.org/3/library/ast.html#ast.literal_eval))
> won't be parsed as DVC parameters.

Consider this Python parameters file named `params.py`:

@@ -306,9 +294,3 @@ $ dvc stage add -n train -d train.py -d users.csv -o model.pkl \
-p params.py:BOOL,INT,TestConfig \
python train.py
```

<admon type="tip">

See also `dvc.api.params_show()` to load parameters in Python code.

</admon>
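The `ast.literal_eval` restriction can be illustrated with a rough Python sketch. It is not DVC's parser. `load_python_params` is a hypothetical name, and it only looks at simple module-level `NAME = value` assignments, so class-based groups such as `TestConfig` are ignored. Any value that `ast.literal_eval` rejects, like a function call, is skipped.

```python
import ast
from typing import Any, Dict


def load_python_params(source: str) -> Dict[str, Any]:
    """Collect module-level NAME = <literal> assignments; skip non-literal values."""
    params: Dict[str, Any] = {}
    for node in ast.parse(source).body:
        if (
            isinstance(node, ast.Assign)
            and len(node.targets) == 1
            and isinstance(node.targets[0], ast.Name)
        ):
            try:
                params[node.targets[0].id] = ast.literal_eval(node.value)
            except ValueError:
                # e.g. a call or a variable reference: not a literal, so not a param here
                pass
    return params


SOURCE = """
BOOL = True
INT = 5
FLOAT = 0.001
LIST = [1, 2, 3]
SEED = int("42")  # function call: rejected by ast.literal_eval
"""

print(load_python_params(SOURCE))
```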
12 changes: 6 additions & 6 deletions content/docs/user-guide/basic-concepts/parameter.md
@@ -1,9 +1,9 @@
---
name: 'Parameters'
match: [parameter, parameters]
name: 'Parameter Dependency'
match: [parameter, parameters, param, params, hyperparameter, hyperparameters]
tooltip: >-
Simple values that your code can depend on, loaded from a a structured file
(`params.yaml` by default). DVC can track them as granular dependencies for
pipeline stages (defined in `dvc.yaml`). These are especially useful for
machine learning hyperparameter tuning. See `dvc params`.
Pipeline stages (defined in `dvc.yaml`) can depend on specific values inside
an arbitrary YAML, JSON, TOML, or Python file (`params.yaml` by default).
Stages are invalid (considered outdated) when any of their parameter values
change. See [`dvc params`](/doc/command-reference/params).
---
@@ -20,8 +20,8 @@ experiment(s). These files codify _pipelines_ that specify one or more
### Running the pipeline(s)

You can run the experiment <abbr>pipelines</abbr> using `dvc exp run`. It uses
`./dvc.yaml` (in the current directory) by default.
You can run the experiment pipeline using `dvc exp run`. It uses `./dvc.yaml`
(in the current directory) by default.

```dvc
$ dvc exp run
@@ -45,20 +45,19 @@ once.
> 📖 `dvc exp run` is an experiment-specific alternative to `dvc repro`.
[reproduction targets]: /doc/command-reference/repro#options
[dependency graph]: /doc/user-guide/data-pipelines/defining-pipelines
[dependency graph]: /doc/command-reference/dag#directed-acyclic-graph

## Tuning (hyper)parameters

Parameters represent simple values used inside your code to tune modeling
attributes, or that affect experiment results in any other way. For example, a
[random forest classifier] may require a _maximum depth_ value. Machine learning
experimentation often involves defining and searching hyperparameter spaces to
improve the resulting model metrics.
Parameters are the values that modify the behavior of coded processes -- in this
case producing different experiment results. Machine learning experimentation
often involves defining and searching hyperparameter spaces to improve the
resulting model metrics.

Your source code should read params from structured [parameters files]
(`params.yaml` by default). Define them with the `params` field of `dvc.yaml`
for DVC to track them. When a param value has changed, `dvc exp run` invalidates
any stages that depend on it, and reproduces them.
In DVC project source code, <abbr>parameters</abbr> should be read from _params
files_ (`params.yaml` by default) and defined in `dvc.yaml`. When a tracked
param value has changed, `dvc exp run` invalidates any stages that depend on it,
and reproduces them.

> 📖 See `dvc params` for more details.
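Here is a minimal Python sketch of that invalidation rule. It assumes each stage's recorded values (as DVC keeps them in `dvc.lock`) and the current params file are available as flat name/value dicts. `outdated_stages` and the stage names are hypothetical. Real `dvc exp run` and `dvc status` also check other dependencies and propagate changes downstream, which this sketch does not model.

```python
from typing import Any, Dict, List


def outdated_stages(
    stage_params: Dict[str, List[str]],
    locked: Dict[str, Dict[str, Any]],
    current: Dict[str, Any],
) -> List[str]:
    """Return stages whose tracked param values differ from the values recorded for them."""
    stale = []
    for stage, names in stage_params.items():
        recorded = locked.get(stage, {})
        if any(recorded.get(name) != current.get(name) for name in names):
            stale.append(stage)
    return stale


# Both stages read the same params file, but each tracks different keys
stage_params = {"prepare": ["split"], "train": ["learning_rate", "units"]}
locked = {
    "prepare": {"split": 0.2},
    "train": {"learning_rate": 0.01, "units": 64},
}
# e.g. after `dvc exp run -S learning_rate=0.001`
current = {"split": 0.2, "learning_rate": 0.001, "units": 64}

print(outdated_stages(stage_params, locked, current))
```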
@@ -80,11 +79,6 @@ $ dvc exp run -S learning_rate=0.001 -S units=128 # set multiple params
...
```

[random forest classifier]:
https://medium.com/all-things-ai/in-depth-parameter-tuning-for-random-forest-d67bb7e920d
[parameters files]:
/doc/user-guide/project-structure/dvcyaml-files#parameters-files

## Experiment results

The results of the last `dvc exp run` can be seen in the <abbr>workspace</abbr>.
42 changes: 10 additions & 32 deletions content/docs/user-guide/project-structure/dvcyaml-files.md
@@ -87,21 +87,22 @@ $ dvc stage add -n a_stage "./a_script.sh > /dev/null 2>&1"
$ dvc exp init './another_script.sh $MYENVVAR'
```

### Parameters
### Parameter dependencies

<abbr>Parameters</abbr> are simple key/value pairs consumed by the `command`
code from a structured [parameters file](#parameters-files). They are defined
per-stage in the `params` field of `dvc.yaml` and should contain one of these:
[Parameters](/doc/command-reference/params) are a special type of stage
dependency. They consist of a list of params to track in one of these formats:

1. A param name that can be found in `params.yaml` (default params file);
1. A param key/value pair that can be found in `params.yaml` (default params
file);
2. A dictionary named by the file path to a custom params file, and with a list
of param key/value pairs to find in it;
3. An empty set (give no value or use `null`) named by the file path to a params
file: to track all the params in it dynamically.

<admon type="info">

Dot-separated param names become tree paths to locate values in the params file.
Note that file paths used must be to valid YAML, JSON, TOML, or Python
parameters file.

</admon>

@@ -113,39 +114,16 @@ stages:
- raw.txt
params:
- threshold # track specific param (from params.yaml)
- nn.batch_size
- passes
- myparams.yaml: # track specific params from custom file
- epochs
- config.json: # track all parameters in this file
outs:
- clean.txt
```

<admon type="tip">

Params are a more granular type of stage dependency: multiple `stages` can use
the same params file, but only certain values will affect their state (see
`dvc status`).

</admon>

#### Parameters files

The supported params file formats are YAML 1.2, JSON, TOML 1.0, [and Python].
[Parameter](#parameters) key/value pairs should be organized in tree-like
hierarchies inside. Supported value types are: string, integer, float, boolean,
and arrays (groups of params).

These files are typically written manually (or generated) and they can be
versioned directly with Git along with other <abbr>workspace</abbr> files.

[and python]: /doc/command-reference/params#examples-python-parameters-file

<admon type="tip">

See also `dvc params diff` to compare params across project version.

</admon>
This allows several stages to depend on values of a shared structured file
(which can be versioned directly with Git). See also `dvc params diff`.
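The three accepted forms can be summarized with a small Python sketch that groups a stage's `params` list by file. It assumes the list has already been loaded from `dvc.yaml` into Python strings and dicts. `normalize_params` is a hypothetical helper, not part of DVC. In its result, `None` means "track every param in this file".

```python
from typing import Dict, List, Optional, Union

DEFAULT_PARAMS_FILE = "params.yaml"

ParamsEntry = Union[str, Dict[str, Optional[List[str]]]]


def normalize_params(entries: List[ParamsEntry]) -> Dict[str, Optional[List[str]]]:
    """Group a stage's `params` entries by file path."""
    grouped: Dict[str, Optional[List[str]]] = {}
    for entry in entries:
        if isinstance(entry, str):
            # Form 1: bare param name, looked up in the default params file
            keys = grouped.setdefault(DEFAULT_PARAMS_FILE, [])
            if keys is not None:
                keys.append(entry)
            continue
        for path, names in entry.items():
            if not names:
                # Form 3: no value / null -> track all params in this file
                grouped[path] = None
            else:
                # Form 2: explicit list of param names in a custom file
                existing = grouped.setdefault(path, [])
                if existing is not None:
                    existing.extend(names)
    return grouped


# Mirrors the shape of the `params` sample above
entries = ["threshold", "passes", {"myparams.yaml": ["epochs"]}, {"config.json": None}]
print(normalize_params(entries))
```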

### Metrics and Plots outputs
