Plots: some refactoring (guide vs ref vs file spec) (#3903)
* guide: edits
per #3830 (review)

* guide: improvements to Plots

* guide: improve Plot spec

* ref: improve plots a little and
other edits in Plots guides

* Update content/docs/user-guide/visualizing-plots.md

* guide: list top-level plots before stage

* guide: combine jorge and dave edits for plots

* guide: consolidate plots guide

Co-authored-by: dberenbaum <[email protected]>
jorgeorpinel and dberenbaum authored Sep 7, 2022
1 parent f85cc08 commit bdbcd72
Showing 10 changed files with 221 additions and 350 deletions.
14 changes: 8 additions & 6 deletions content/docs/command-reference/plots/diff.md
@@ -20,8 +20,7 @@ positional arguments:

## Description

This command is a way to visualize the "difference" between
[certain metrics](/doc/user-guide/visualizing-plots#supported-file-formats)
This command is a way to visualize the "difference" between [certain metrics]
among versions of the <abbr>repository</abbr>, by overlaying them in a single
plot.

@@ -39,17 +38,20 @@ all of them in a single image).
All plots defined in `dvc.yaml` are used by default, but specific files can be
specified with the `--targets` option (any valid plots file is accepted).

The plot style can be customized with
[plot templates](/doc/user-guide/visualizing-plots#plot-templates-data-series-only),
using the `--template` option. To learn more about plots files and templates
please see `dvc plots`.
The plot style can be customized with [plot templates], using the `--template`
option. To learn more about plots files and templates please see `dvc plots`.

> Note that the default behavior of this command can be modified per metrics
> file with `dvc plots modify`.
Another way to display plots is the `dvc plots show` command, which just lists
all the current plots, without comparisons.

[certain metrics]: /doc/user-guide/visualizing-plots#supported-plot-file-formats

[plot templates]:
/doc/user-guide/visualizing-plots#plot-templates-data-series-only

## Options

- `--targets <paths>` - specific plots files to visualize. It accepts `paths` to
17 changes: 13 additions & 4 deletions content/docs/command-reference/plots/index.md
@@ -25,11 +25,20 @@ positional arguments:

## Description

`dvc plots` subcommands help you visualize and compare data or images produced
by machine learning projects, letting you fully customize graph settings and
style.
You can visualize and compare JSON, YAML 1.2, CSV, TSV data files or JPEG, GIF,
PNG images found in your project. Typically these are artifacts of an [ML
pipeline] or performance logs produced by [DVCLive].

📖 See [Visualizing Plots](/doc/user-guide/visualizing-plots) for more info.
`dvc plots` subcommands help you customize and generate these plots.

📖 See [Visualizing Plots] as well as the top-level plots definition
[specification] for more details.

[ml pipeline]: /doc/start/data-management/pipelines
[dvclive]: /doc/dvclive/dvclive-with-dvc
[visualizing plots]: /doc/user-guide/visualizing-plots
[specification]:
/doc/user-guide/project-structure/dvcyaml-files#top-level-plot-definitions

## Options

8 changes: 4 additions & 4 deletions content/docs/command-reference/plots/show.md
@@ -21,9 +21,8 @@ positional arguments:

## Description

This command provides a quick way to visualize
[certain data](/doc/user-guide/visualizing-plots#supported-file-formats) such as
loss functions, AUC curves, confusion matrices, etc.
This command provides a quick way to visualize [certain data] such as loss
functions, AUC curves, confusion matrices, etc.

All plots defined in `dvc.yaml` are used by default, but specific plots files or
[top-level plot] IDs can be specified as `targets` (note that target files don't
@@ -39,6 +38,7 @@ The default behavior of this command can be modified per [stage plot] file with

</admon>

[certain data]: /doc/user-guide/visualizing-plots#supported-plot-file-formats
[plot templates]:
/doc/user-guide/visualizing-plots#plot-templates-data-series-only
[top-level plot]: /doc/user-guide/visualizing-plots#top-level-plots
@@ -91,7 +91,7 @@ The default behavior of this command can be modified per [stage plot] file with

## Example: Hierarchical data

We'll use tabular metrics file `train.json` for this example:
We'll use hierarchical metrics file `train.json` for this example:

```json
{
102 changes: 26 additions & 76 deletions content/docs/user-guide/project-structure/dvcyaml-files.md
@@ -475,94 +475,44 @@ validation and auto-completion.

## Top-level plot definitions

The list of `plots` contains one or more user-defined top-level plots. Every
plot has to have a unique identifiear, which may be either a file path (relative
to the location of `dvc.yaml`) or an arbitrary string. Optional configuration
can be given as a dictionary.
The `plots` dictionary contains one or more user-defined `dvc plots`
configurations. Every plot needs a unique ID, which may be either a file path
(relative to the location of `dvc.yaml`) or an arbitrary string. Optional
configuration fields can be provided as well.

In the simplest use case, a user can provide the file path as the plot ID and
not provide configuration at all:
📖 Refer to [Visualizing Plots] and `dvc plots show` for examples.

In the simplest use, you can provide the plot's file path and no configuration.
In that case, the default behavior will be applied. In the example below, DVC
will take data from `logs.csv` and apply the `linear` plot
[template](/doc/user-guide/visualizing-plots#plot-templates-data-series-only) to
the last found column:
[visualizing plots]: /doc/user-guide/visualizing-plots#top-level-plots

```yaml
# dvc.yaml
---
plots:
logs.csv:
```
### Available configuration fields

We can customize the plot by adding appropriate fields to the configuration.
Below, we provided `confusion_matrix` as a plot ID. It will be displayed in the
plot as a title, unless we override it with `title` field. We also provided the
data source in the `y` axis definition. Data will be sourced from
`confusion_matrix_data.csv`. As `y` axis we will use `predicted_class` field. On
the `x` axis we will have the `actual_class` field. Note that DVC will assume
that `actual_class` is inside `confusion_matrix_data.csv`:
- `x` (string) - column/field name that the X axis data comes from. An
auto-generated _step_ field is used by default.

```yaml
# dvc.yaml
---
plots:
confusion_matrix:
y:
confusion_matrix_data.csv: predicted_class
x: actual_class
template: confusion
```
- `y` - source of the Y axis data:

We can provide multiple columns/fields from the same file. In this case, we will
take `accuracy` and `loss` fields and display them against the `epoch` column,
all coming from the `logs.csv` file:
- Top-level plots: Accepts a string, list, or dictionary. For strings and
lists, the plot ID is used as the path to the data source, and the string
(or each list element) names a column/field within that file. For
dictionaries, the keys are used as paths to data sources, and the values
(strings or lists) are treated as the column/field names in each source.

```yaml
#dvc.yaml
---
plots:
multiple_series:
y:
logs.csv: [accuracy, loss]
x: epoch
```
- Plot outputs: column/field name found in the source plots file.

We can source the data from multiple files too. In this case we will plot the
`accuracy` field from both `train_logs.csv` and `test_logs.csv` against the
`epoch`. Note that both files have to have the `epoch` field:
- `x_label` (string) - X axis label. Defaults to the X field name.

```yaml
#dvc.yaml
---
plots:
multiple_files:
y:
train_logs.csv: accuracy
test_logs.csv: accuracy
x: epoch
```
- `y_label` (string) - Y axis label. If all `y` data sources have the same field
name, that will be the default. Otherwise, it's "y".

### Available configuration fields
- `title` (string) - header for the plot(s). Defaults:

- `x` - field name from which the X axis data comes from. An auto-generated
_step_ field is used by default. It has to be a string.

- `y` - field name from which the Y axis data comes from.
- Top-level plots: It can be a string, list or dictionary. If its a string or
list, it is assumed that plot ID will be the path to the data source.
String, or list elements will be the names of data columns or fields withing
the source file. If this field is a dictionary, it is assumed that its keys
are paths to data sources. The values have to be either strings or lists,
and are treated as column(s)/field(s) within respective files.
- Plot outputs: It is a field name from which the Y axis data comes from.
- `x_label` - X axis label. The X field name is the default.
- `y_label` - Y axis label. If all provided Y entries have the same field name,
this name will be the default, `y` string otherwise.
- `title` - Plot title. Defaults:
- Top-level plots: `path/to/dvc.yaml::plot_id`
- Plot outputs: Path to the file.
- Plot outputs: `path/to/data.csv`

- `template` (string) - [plot template]. Defaults to `linear`.

[plot template]:
https://dvc.org/doc/user-guide/visualizing-plots#plot-templates-data-series-only
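The `y` rules for top-level plots are easier to follow with a concrete model. The Python below is a minimal, hypothetical sketch (not DVC's actual code) that normalizes a top-level plot's `y` value into a mapping of data-source paths to field names, then derives the documented `y_label` and `title` defaults. The function names `normalize_y`, `default_y_label`, and `default_title`, and the handling of a missing `y` value, are illustrative assumptions.

```python
"""Hypothetical model of the top-level `plots` field rules described above.

This is an illustration only, not DVC's implementation."""
from typing import Dict, List, Optional, Union

# A `y` value may be absent, a string, a list, or a dict of path -> str/list.
YSpec = Optional[Union[str, List[str], Dict[str, Union[str, List[str]]]]]


def normalize_y(plot_id: str, y: YSpec) -> Dict[str, List[str]]:
    """Map each data-source path to the list of column/field names to plot."""
    if y is None:
        # Assumption: with no `y`, the plot ID is the data source and the
        # field choice is left to the default behavior (empty list here).
        return {plot_id: []}
    if isinstance(y, str):
        # String: the plot ID is the path; the string names one field.
        return {plot_id: [y]}
    if isinstance(y, list):
        # List: the plot ID is the path; each element names a field.
        return {plot_id: list(y)}
    if isinstance(y, dict):
        # Dict: keys are paths; values (str or list) name the field(s).
        return {
            path: [fields] if isinstance(fields, str) else list(fields)
            for path, fields in y.items()
        }
    raise TypeError(f"unsupported y value: {y!r}")


def default_y_label(sources: Dict[str, List[str]]) -> str:
    """Shared field name if every `y` source uses the same one, else "y"."""
    names = {field for fields in sources.values() for field in fields}
    return names.pop() if len(names) == 1 else "y"


def default_title(dvcyaml_path: str, plot_id: str) -> str:
    """Default title of a top-level plot: `path/to/dvc.yaml::plot_id`."""
    return f"{dvcyaml_path}::{plot_id}"


if __name__ == "__main__":
    # Two files sharing one field name, as in the `multiple_files` example.
    sources = normalize_y(
        "multiple_files",
        {"train_logs.csv": "accuracy", "test_logs.csv": "accuracy"},
    )
    print(sources)
    print(default_y_label(sources))
    print(default_title("dvc.yaml", "multiple_files"))
```

With both files plotting `accuracy`, the sketch's default label is `accuracy`; mixing fields (for example `[accuracy, loss]` from `logs.csv`) falls back to `y`, matching the `y_label` description.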

## dvc.lock file
