docs: cleanup of plots content #3902

Merged 1 commit, Aug 31, 2022
5 changes: 1 addition & 4 deletions content/docs/sidebar.json
@@ -154,17 +154,14 @@
"share-a-dvc-cache"
]
},
{
"slug": "visualizing-plots"
},
"visualizing-plots",
"setup-google-drive-remote",
"large-dataset-optimization",
"external-dependencies",
{
"label": "Managing External Data",
"slug": "managing-external-data"
},

"running-dvc-on-windows",
"troubleshooting",
"related-technologies",
50 changes: 23 additions & 27 deletions content/docs/user-guide/project-structure/dvcyaml-files.md
@@ -475,28 +475,34 @@ validation and auto-completion.

## Top-level plot definitions

The list of `plots` contains one or more user-defined top-level plots (paths
relative to the location of `dvc.yaml`).

Every plot has to have its own ID. Configuration, if provided, should be a
dictionary.
The list of `plots` contains one or more user-defined top-level plots. Every
plot has to have a unique identifier, which may be either a file path (relative
to the location of `dvc.yaml`) or an arbitrary string. Optional configuration
can be given as a dictionary.

In the simplest use case, a user can provide the file path as the plot ID and
not provide configuration at all:

In the simplest use, you can provide the plot's file path and no configuration.
In that case, the default behavior will be applied. In the example below, DVC
will take data from `logs.csv` and apply the `linear` plot
[template](/doc/user-guide/visualizing-plots#plot-templates-data-series-only) to
the last found column:

```yaml
# dvc.yaml
---
plots:
logs.csv:
```

In that case the default behavior will be applied. DVC will take data from
`logs.csv` file and apply `linear` plot
[template](/doc/user-guide/visualizing-plots#plot-templates-data-series-only) to
the last found column (CSV, TSV files) or field (JSON, YAML).
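
To check which column that default would pick for a CSV file, you can inspect the header yourself. The following is a minimal Python sketch, not DVC's implementation; the helper name `last_column` and the sample data are assumptions made for illustration:

```python
import csv
import io


def last_column(csv_text):
    """Return the name of the last column in the CSV header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # the first row holds the column names
    return header[-1]


# Hypothetical contents of a logs.csv file.
sample = "epoch,loss,accuracy\n0,0.9,0.41\n1,0.6,0.63\n"
print(last_column(sample))  # accuracy
```

With this sample, the last column is `accuracy`, so that is the column the default behavior described above would be expected to plot.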

We can customize the plot by adding appropriate fields to the configuration:
We can customize the plot by adding appropriate fields to the configuration.
Below, we provide `confusion_matrix` as the plot ID. It will be displayed in the
plot as a title, unless we override it with the `title` field. We also provide
the data source in the `y` axis definition: data will be sourced from
`confusion_matrix_data.csv`. For the `y` axis we will use the `predicted_class`
field, and for the `x` axis the `actual_class` field. Note that DVC will assume
that `actual_class` is inside `confusion_matrix_data.csv`:

```yaml
# dvc.yaml
@@ -509,14 +515,9 @@ plots:
template: confusion
```

In this case we provided `confusion_matrix` as a plot ID. It will be displayed
in the plot as a title, unless we override it with `title` field. In this case
we provided data source in `y` axis definition. Data will be sourced from
`confusion_matrix_data.csv`. As `y` axis we will use `predicted_class` field. On
`x` axis we will have `actual_class` field. Note that DVC will assume that
`actual_class` is inside `confusion_matrix_data.csv`.
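
A data file of this shape could be produced as sketched below. This is only an illustration: the helper name `confusion_rows_to_csv` and the class labels are assumptions, and the only requirement taken from the text above is that both `actual_class` and `predicted_class` live in the same file:

```python
import csv
import io


def confusion_rows_to_csv(actual, predicted):
    """Build CSV text with one row per sample: actual_class, predicted_class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["actual_class", "predicted_class"])
    writer.writerows(zip(actual, predicted))
    return buffer.getvalue()


# Hypothetical labels from a classifier evaluation.
text = confusion_rows_to_csv(["cat", "dog"], ["cat", "cat"])
print(text)
```

Writing the returned text to `confusion_matrix_data.csv` would give the plot definition both fields it refers to.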

We can provide multiple columns/fields from the same file:
We can provide multiple columns/fields from the same file. In this case, we will
take `accuracy` and `loss` fields and display them against the `epoch` column,
all coming from the `logs.csv` file:

```yaml
#dvc.yaml
@@ -528,10 +529,9 @@ plots:
x: epoch
```

In this case, we will take the `accuracy` and `loss` fields and display them
against the `epoch` column, all coming from the `logs.csv` file.

We can source the data from multiple files too:
We can source the data from multiple files too. In this case we will plot the
`accuracy` field from both `train_logs.csv` and `test_logs.csv` against the
`epoch`. Note that both files have to have the `epoch` field:

```yaml
#dvc.yaml
@@ -544,10 +544,6 @@ plots:
x: epoch
```

In this case we will plot `accuracy` field from both `train_logs.csv` and
`test_logs.csv` against the `epoch`. Note that both files have to have `epoch`
field.
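
Since every source file needs the shared `x` field, a quick check before plotting can save a confusing result. The sketch below is not part of DVC; the helper name `missing_field` and the sample file contents are assumptions for illustration:

```python
import csv
import io


def missing_field(sources, field):
    """Return the names of CSV sources whose header row lacks `field`."""
    missing = []
    for name, text in sources.items():
        header = next(csv.reader(io.StringIO(text)), [])
        if field not in header:
            missing.append(name)
    return missing


# Hypothetical file contents; the second file uses `step` instead of `epoch`.
sources = {
    "train_logs.csv": "epoch,accuracy\n0,0.5\n",
    "test_logs.csv": "step,accuracy\n0,0.4\n",
}
print(missing_field(sources, "epoch"))  # ['test_logs.csv']
```

An empty list means all sources contain the field and can be plotted against it.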

### Available configuration fields

- `x` - field name from which the X axis data comes. An auto-generated
76 changes: 38 additions & 38 deletions content/docs/user-guide/visualizing-plots.md
@@ -2,7 +2,9 @@

A typical workflow for DVC plots is:

1. Save data to a [supported file format](#supported-file-formats).
1. [Save data](#generating-plots-files) to a
[supported file format](#supported-file-formats) (for example, as a
[pipeline output](#stage-plots)).

```csv
fpr,tpr,threshold
@@ -30,7 +32,7 @@ plots:
![](/img/plots_cm_get_started_show.svg)

4. Run [experiments](/doc/user-guide/experiment-management/experiments-overview)
and [compare](/doc/command-reference/plots/diff) plots.
and [compare](#comparing-plots) plots.

![](/img/plots_prc_get_started_diff.svg)
![](/img/plots_roc_get_started_diff.svg)
@@ -40,12 +42,12 @@ plots:

To generate the data files for plots, you can:

1. Use [DVCLive](/doc/dvclive/dvclive-with-dvc) in your Python code to log the
data in the expected format for you.
2. Save data yourself in one of the
[supported file formats](#supported-file-formats).
3. Save an image file of the visualization (helpful for custom visualizations
that would be hard to configure in DVC).
- Use [DVCLive](/doc/dvclive/dvclive-with-dvc) in your Python code to log the
data in the expected format for you.
- Save data yourself in one of the
[supported file formats](#supported-file-formats).
- Save an image file of the visualization (helpful for custom visualizations
that would be hard to configure in DVC).

## Supported file formats

@@ -197,33 +199,21 @@ file:///Users/usr/src/dvc_plots/index.html

In order to create visualizations, users need to provide the data and
(optionally) configuration that will help customize the plot. DVC provides two
ways to configure visualizations. Users can mark specific stage
<abbr>outputs</abbr> as plots or define top-level `plots` in `dvc.yaml`.

### Stage plots

When using `dvc stage add`, instead of using `--outs/--outs-no-cache` particular
outputs can be marked with `--plots/--plots-no-cache`. This will tell DVC that
they are intended for visualizations.

Upon running `dvc plots show/diff`, DVC will collect stage plots alongside the
[top-level plots](#top-level-plots) and display them according to their
configuration. Note that if there are stage plots in the project and they are
also used in some top-level definitions, DVC will create separate renderings for
the stage plots and for all definitions using them.

This special type of output might come in handy if users want to visually
compare experiment results with other experiment versions and not bother with
writing top-level plot definitions in `dvc.yaml`.
ways to configure visualizations. Users can define top-level plots in `dvc.yaml`
or mark specific stage <abbr>outputs</abbr> as plots. Upon running
`dvc plots show/diff`, DVC will collect both top-level plots and stage plots and
display them according to their configuration.

### Top-level plots

Plots can also be defined in a top-level `plots` key in `dvc.yaml`. Unlike
[stage plots](#stage-plots), these definitions let you overlay plots from
different data sources, for example training vs. test results (on the current
project version). Conversely, you can create multiple plots from a single source
file. You can also use any plot file in the project, regardless of whether it's
a stage output. This creates a separation between visualization and outputs.
Plots can be defined in a
[top-level `plots` key](/doc/user-guide/project-structure/dvcyaml-files#top-level-plot-definitions)
in `dvc.yaml`. Unlike [stage plots](#stage-plots), these definitions let you
overlay plots from different data sources, for example training vs. test results
(on the current project version). Conversely, you can create multiple plots from
a single source file. You can also use any plot file in the project, regardless
of whether it's a stage output. This creates a separation between visualization
and outputs.

In order to define the plot users need to provide data and an optional
configuration for the plot. The plots should be defined in `dvc.yaml` file under
@@ -267,14 +257,27 @@ configuration fields] for the full specification.
[available configuration fields]:
/doc/user-guide/project-structure/dvcyaml-files#available-configuration-fields

### Stage plots

When using `dvc stage add`, instead of using `--outs/--outs-no-cache` particular
outputs can be marked with `--plots/--plots-no-cache`. This will tell DVC that
they are intended for visualizations.

If the same file is used in stage plots and in some top-level plot definitions,
DVC will render each of them separately.

Stage plots can come in handy if you do not want to write top-level plot
definitions in `dvc.yaml`.

## Plot templates (data-series only)

DVC uses [Vega-Lite](https://vega.github.io/vega-lite/) JSON specifications to
create plots from user data. A set of built-in _plot templates_ is included.

The `linear` template is the default. It can be changed with the `--template`
(`-t`) option of `dvc plots show` and `dvc plots diff`. The argument provided to
`--template` can be a (built-in) template name or a path to a [custom template].
`--template` can be a (built-in) template name or a path to a
[custom template](#custom-templates).

<admon type="tip">

@@ -293,17 +296,14 @@ DVC has the following built-in plot templates:
- `smooth` - linear plot with LOESS smoothing, see
[example](#example-smooth-plot)
- `confusion` - confusion matrix, see [example](#example-confusion-matrix)

[custom templates]: /doc/command-reference/plots/templates

- `confusion_normalized` - confusion matrix with values normalized to <0, 1>
range

Note that in the case of CSV/TSV metrics files, column names from the table
header (first row) are equivalent to field names.

Refer to [`templates`](/doc/command-reference/plots/templates) command for more
information on how to prepare your own template from pre-defined ones.
Refer to [`templates`](#custom-templates) command for more information on how to
prepare your own template from pre-defined ones.

### Example: Smooth plot
