From 0597ebeacda0c6de2cc7293e0ef6887795682598 Mon Sep 17 00:00:00 2001
From: dberenbaum
Date: Tue, 30 Aug 2022 12:33:06 -0400
Subject: [PATCH] docs: cleanup of plots content

---
 content/docs/sidebar.json                    |  5 +-
 .../project-structure/dvcyaml-files.md       | 50 ++++++------
 content/docs/user-guide/visualizing-plots.md | 76 +++++++++----------
 3 files changed, 62 insertions(+), 69 deletions(-)

diff --git a/content/docs/sidebar.json b/content/docs/sidebar.json
index 661c4e8806..fae64d9bcb 100644
--- a/content/docs/sidebar.json
+++ b/content/docs/sidebar.json
@@ -154,9 +154,7 @@
           "share-a-dvc-cache"
         ]
       },
-      {
-        "slug": "visualizing-plots"
-      },
+      "visualizing-plots",
       "setup-google-drive-remote",
       "large-dataset-optimization",
       "external-dependencies",
@@ -164,7 +162,6 @@
         "label": "Managing External Data",
         "slug": "managing-external-data"
       },
-
       "running-dvc-on-windows",
       "troubleshooting",
       "related-technologies",
diff --git a/content/docs/user-guide/project-structure/dvcyaml-files.md b/content/docs/user-guide/project-structure/dvcyaml-files.md
index f76454f31e..44ae121e09 100644
--- a/content/docs/user-guide/project-structure/dvcyaml-files.md
+++ b/content/docs/user-guide/project-structure/dvcyaml-files.md
@@ -475,15 +475,20 @@ validation and auto-completion.
 
 ## Top-level plot definitions
 
-The list of `plots` contains one or more user-defined top-level plots (paths
-relative to the location of `dvc.yaml`).
-
-Every plot has to have its own ID. Configuration, if provided, should be a
-dictionary.
+The list of `plots` contains one or more user-defined top-level plots. Every
+plot has to have a unique identifier, which may be either a file path (relative
+to the location of `dvc.yaml`) or an arbitrary string. Optional configuration
+can be given as a dictionary.
 
 In the simplest use case, a user can provide the file path as the plot ID and
 not provide configuration at all:
 
+In the simplest use, you can provide the plot's file path and no configuration.
+In that case, the default behavior will be applied. In the example below, DVC
+will take data from `logs.csv` and apply the `linear` plot
+[template](/doc/user-guide/visualizing-plots#plot-templates-data-series-only) to
+the last found column:
+
 ```yaml
 # dvc.yaml
 ---
@@ -491,12 +496,13 @@ plots:
   logs.csv:
 ```
 
-In that case the default behavior will be applied. DVC will take data from
-`logs.csv` file and apply `linear` plot
-[template](/doc/user-guide/visualizing-plots#plot-templates-data-series-only) to
-the last found column (CSV, TSV files) or field (JSON, YAML).
-
-We can customize the plot by adding appropriate fields to the configuration:
+We can customize the plot by adding appropriate fields to the configuration.
+Below, we use `confusion_matrix` as the plot ID. It will be displayed as the
+plot title unless we override it with the `title` field. We also provide the
+data source in the `y` axis definition: data will be sourced from
+`confusion_matrix_data.csv`, with the `predicted_class` field on the `y` axis
+and the `actual_class` field on the `x` axis. Note that DVC will assume that
+`actual_class` is also inside `confusion_matrix_data.csv`:
 
 ```yaml
 # dvc.yaml
@@ -509,14 +515,9 @@ plots:
     template: confusion
 ```
 
-In this case we provided `confusion_matrix` as a plot ID. It will be displayed
-in the plot as a title, unless we override it with `title` field. In this case
-we provided data source in `y` axis definition. Data will be sourced from
-`confusion_matrix_data.csv`. As `y` axis we will use `predicted_class` field. On
-`x` axis we will have `actual_class` field. Note that DVC will assume that
-`actual_class` is inside `confusion_matrix_data.csv`.
-
-We can provide multiple columns/fields from the same file:
+We can provide multiple columns/fields from the same file. In this case, we will
+take the `accuracy` and `loss` fields and display them against the `epoch`
+column, all coming from the `logs.csv` file:
 
 ```yaml
 #dvc.yaml
@@ -528,10 +529,9 @@ plots:
     x: epoch
 ```
 
-In this case, we will take `accuracy` and `loss` fields and display them agains
-`epoch` column, all coming from `logs.csv` file.
-
-We can source the data from multiple files too:
+We can source the data from multiple files too. In this case, we will plot the
+`accuracy` field from both `train_logs.csv` and `test_logs.csv` against the
+`epoch` field, which both files must contain:
 
 ```yaml
 #dvc.yaml
@@ -544,10 +544,6 @@ plots:
     x: epoch
 ```
 
-In this case we will plot `accuracy` field from both `train_logs.csv` and
-`test_logs.csv` against the `epoch`. Note that both files have to have `epoch`
-field.
-
 ### Available configuration fields
 
 - `x` - field name from which the X axis data comes from. An auto-generated
diff --git a/content/docs/user-guide/visualizing-plots.md b/content/docs/user-guide/visualizing-plots.md
index 8551bd3741..772afbecfb 100644
--- a/content/docs/user-guide/visualizing-plots.md
+++ b/content/docs/user-guide/visualizing-plots.md
@@ -2,7 +2,9 @@
 
 A typical workflow for DVC plots is:
 
-1. Save data to a [supported file format](#supported-file-formats).
+1. [Save data](#generating-plots-files) to a
+   [supported file format](#supported-file-formats) (for example, as a
+   [pipeline output](#stage-plots)).
 
 ```csv
 fpr,tpr,threshold
@@ -30,7 +32,7 @@ plots:
 ![](/img/plots_cm_get_started_show.svg)
 
 4. Run [experiments](/doc/user-guide/experiment-management/experiments-overview)
-   and [compare](/doc/command-reference/plots/diff) plots.
+   and [compare](#comparing-plots) plots.
 
 ![](/img/plots_prc_get_started_diff.svg)
 ![](/img/plots_roc_get_started_diff.svg)
@@ -40,12 +42,12 @@ plots:
 
 To generate the data files for plots, you can:
 
-1. Use [DVCLive](/doc/dvclive/dvclive-with-dvc) in your Python code to log the
-   data in the expected format for you.
-2. Save data yourself in one of the
-   [supported file formats](#supported-file-formats).
-3. Save an image file of the visualization (helpful for custom visualizations
-   that would be hard to configure in DVC).
+- Use [DVCLive](/doc/dvclive/dvclive-with-dvc) in your Python code to log the
+  data in the expected format for you.
+- Save data yourself in one of the
+  [supported file formats](#supported-file-formats).
+- Save an image file of the visualization (helpful for custom visualizations
+  that would be hard to configure in DVC).
 
 ## Supported file formats
 
@@ -197,33 +199,21 @@ file:///Users/usr/src/dvc_plots/index.html
 
 In order to create visualizations, users need to provide the data and
 (optionally) configuration that will help customize the plot. DVC provides two
-ways to configure visualizations. Users can mark specific stage
-outputs as plots or define top-level `plots` in `dvc.yaml`.
-
-### Stage plots
-
-When using `dvc stage add`, instead of using `--outs/--outs-no-cache` particular
-outputs can be marked with `--plots/--plots-no-cache`. This will tell DVC that
-they are intended for visualizations.
-
-Upon running `dvc plots show/diff` DVC will collect stage plots alongside the
-[top-level plots](#top-level-plots) and display them conforming to their
-configuration. Note, that if there are stage plots in the project and they are
-also used in some top-level definitions, DVC will create separate rendering for
-the stage plots and all definitions using them.
-
-This special type of outputs might come in handy if users want to visually
-compare experiments results with other experiments versions and not bother with
-writing top-level plot definitions in `dvc.yaml`.
+ways to configure visualizations. Users can define top-level plots in `dvc.yaml`
+or mark specific stage outputs as plots. Upon running
+`dvc plots show/diff`, DVC will collect both top-level plots and stage plots and
+display them according to their configuration.
 
 ### Top-level plots
 
-Plots can also be defined in a top-level `plots` key in `dvc.yaml`. Unlike
-[stage plots](#stage-plots), these definitions let you overlay plots from
-different data sources, for example training vs. test results (on the current
-project version). Conversely, you can create multiple plots from a single source
-file. You can also use any plot file in the project, regardless of whether it's
-a stage outputs. This creates a separation between visualization and outputs.
+Plots can be defined in a
+[top-level `plots` key](/doc/user-guide/project-structure/dvcyaml-files#top-level-plot-definitions)
+in `dvc.yaml`. Unlike [stage plots](#stage-plots), these definitions let you
+overlay plots from different data sources, for example training vs. test results
+(on the current project version). Conversely, you can create multiple plots from
+a single source file. You can also use any plot file in the project, regardless
+of whether it's a stage output. This creates a separation between visualization
+and outputs.
 
 In order to define the plot users need to provide data and an optional
 configuration for the plot. The plots should be defined in `dvc.yaml` file under
@@ -267,6 +257,18 @@ configuration fields] for the full specification.
 
 [available configuration fields]: /doc/user-guide/project-structure/dvcyaml-files#available-configuration-fields
 
+### Stage plots
+
+When using `dvc stage add`, instead of using `--outs/--outs-no-cache`,
+particular outputs can be marked with `--plots/--plots-no-cache`. This will tell
+DVC that they are intended for visualizations.
+
+If the same file is used in stage plots and some top-level plot definitions,
+DVC will render each of them separately.
+
+Stage plots might come in handy if users don't want to bother writing
+top-level plot definitions in `dvc.yaml`.
+
 ## Plot templates (data-series only)
 
 DVC uses [Vega-Lite](https://vega.github.io/vega-lite/) JSON specifications to
@@ -274,7 +276,8 @@ create plots from user data. A set of built-in _plot templates_ are included.
 
 The `linear` template is the default. It can be changed with the `--template`
 (`-t`) option of `dvc plots show` and `dvc plots diff`. The argument provided to
-`--template` can be a (built-in) template name or a path to a [custom template].
+`--template` can be a (built-in) template name or a path to a
+[custom template](#custom-templates).
@@ -293,17 +296,14 @@ DVC has the following built-in plot templates:
 - `smooth` - linear plot with LOESS smoothing, see
   [example](#example-smooth-plot)
 - `confusion` - confusion matrix, see [example](#example-confusion-matrix)
-
-[custom templates]: /doc/command-reference/plots/templates
-
 - `confusion_normalized` - confusion matrix with values normalized to <0, 1>
   range
 
 Note that in the case of CSV/TSV metrics files, column names from the table
 header (first row) are equivalent to field names.
 
-Refer to [`templates`](/doc/command-reference/plots/templates) command for more
-information on how to prepare your own template from pre-defined ones.
+Refer to the [`templates`](#custom-templates) command for more information on
+how to prepare your own template from pre-defined ones.
 
 ### Example: Smooth plot