From 4b35a28d075daf921e0d4def73b8f153ae253241 Mon Sep 17 00:00:00 2001
From: Dmitry Petrov
Date: Mon, 4 May 2020 23:15:37 -0700
Subject: [PATCH] new plot options & custom templates

---
 content/docs/command-reference/plot/diff.md  | 50 ++++++++++-----
 content/docs/command-reference/plot/index.md | 67 +++++++++++++++++---
 content/docs/command-reference/plot/show.md  | 67 ++++++++++++++------
 3 files changed, 138 insertions(+), 46 deletions(-)

diff --git a/content/docs/command-reference/plot/diff.md b/content/docs/command-reference/plot/diff.md
index 0d1ab9ea06..596a2f3f15 100644
--- a/content/docs/command-reference/plot/diff.md
+++ b/content/docs/command-reference/plot/diff.md
@@ -7,10 +7,9 @@ them in a single image.
 ## Synopsis
 
 ```usage
-usage: dvc plot diff [-h] [-q | -v] [-t [TEMPLATE]] [-d [DATAFILE]]
-                     [-r RESULT] [--no-html] [-f FIELDS] [-o]
-                     [--no-csv-header]
-                     [revisions [revisions ...]]
+usage: dvc plot diff [-h] [-q | -v] [-t [TEMPLATE]] [-d [DATAFILE]] [-f FILE]
+                     [-s SELECT] [-x X] [-y Y] [--stdout] [--no-csv-header]
+                     [--no-html] [--title TITLE] [--xlab XLAB] [--ylab YLAB]
 
 positional arguments:
   revisions             Git revisions to plot from
@@ -44,21 +43,37 @@ an output.
 
 ## Options
 
-- `-t [TEMPLATE], --template [TEMPLATE]` - File to be injected with data.
-
 - `-d [DATAFILE], --datafile [DATAFILE]` - Continuous metrics file to visualize.
 
-- `-r RESULT, --result RESULT` - Name of the generated file.
+- `-t [TEMPLATE], --template [TEMPLATE]` - File to be injected with data. The
+  default template is `.dvc/plot/default.json`. See more details in `dvc plot`.
 
-- `--no-html` - Do not wrap vega plot json with HTML.
+- `-f FILE, --file FILE` - Name of the generated file. By default, the output
+  file name is equal to the input filename with additional `.html` suffix or
+  `.json` suffix for `--no-html` mode.
 
-- `-f FIELDS, --fields FIELDS` - Choose which fileds or jsonpath to put into
-  plot.
+- `--no-html` - Do not wrap output vega plot json with HTML.
 
-- `--no-csv-header` - Provided CSV or TSV datafile does not have a header.
+- `-s SELECT, --select SELECT` - Select which fields or jsonpath to put into
+  plot. All the fields will be included by default with DVC generated `index`
+  field - see `dvc plot`.
+
+- `-x X` - Field name for x axis. `index` is the default field for X.
+
+- `-y Y` - Field name for y axis. The default field is the last field found in
+  the input file: the last column in CSV file or the last field in the JSON
+  array object (the first object).
+
+- `--xlab XLAB` - X axis title. The X column name is the default title.
+
+- `--ylab YLAB` - Y axis title. The Y column name is the default title.
+
+- `--title TITLE` - Plot title.
 
 - `-o, --stdout` - Print plot content to stdout.
 
+- `--no-csv-header` - Provided CSV or TSV datafile does not have a header.
+
 - `-h`, `--help` - prints the usage/help message, and exit.
 
 - `-q`, `--quiet` - do not write anything to standard output. Exit with 0 if no
@@ -73,18 +88,19 @@ one:
 
 ```dvc
 $ dvc plot diff -d logs.csv
-file:///Users/dmitry/src/plot/logs.html
+file:///Users/dmitry/src/plot/logs.csv.html
 ```
 
-A new file `logs.html` was generated. User can open it in a web browser.
+A new file `logs.csv.html` was generated. User can open it in a web browser.
 
 ![](/img/plot_diff_workspace.svg)
 
-The difference between two specified commits:
+The difference between two specified commits (multiple commits, tags or branches
+can be specified):
 
 ```dvc
 $ dvc plot diff -d logs.csv HEAD 11c0bf1
-file:///Users/dmitry/src/plot/logs.html
+file:///Users/dmitry/src/plot/logs.csv.html
 ```
 
 ![](/img/plot_diff.svg)
@@ -107,8 +123,8 @@ turtle,cat
 ```
 
 ```dvc
-$ dvc plot diff -d classes.csv -t confusion_matrix
-file:///Users/dmitry/src/test/plot_old/classes.html
+$ dvc plot diff -d classes.csv -t confusion
+file:///Users/dmitry/src/test/plot_old/classes.csv.html
 ```
 
 ![](/img/plot_diff_confusion.svg)
diff --git a/content/docs/command-reference/plot/index.md b/content/docs/command-reference/plot/index.md
index e246f443a4..1cb3672d89 100644
--- a/content/docs/command-reference/plot/index.md
+++ b/content/docs/command-reference/plot/index.md
@@ -23,9 +23,9 @@ DVC provides a set of commands to visualize _continuous metrics_ of machine
 learning experiments. Usual examples of plots are AUC curves, loss functions,
 and confusion matrices.
 
-Continous metrics represents a plot and should be stored as data series in one
+Continuous metrics represent plots, and should be stored as data series in one
 of the supported [file formats](#file-formats). These files are usually created
-by users or generated by user's modeling or data processing code.
+by users or generated by user modeling or data processing code.
 
 The plot commands can work with these continuous metrics files that are commited
 to a repository history, data files controlled by DVC or files from workspace.
@@ -113,8 +113,57 @@ programming language or environment which allows DVC stay programming language
 agnostic.
 
 Plot templates are stored in `.dvc/plot/` directory as json files. A user can
-define it's own templates or modify the existing ones. Please see more details
-in `dvc plot show` and `dvc plot diff`.
+define their own templates or modify the existing ones. The default template is
+`.dvc/plot/default.json`. Users can change the template with the `--template`
+or `-t` option of the `dvc plot show` or `dvc plot diff` commands by specifying
+a file name.
+
+For templates in the templates directory the path and the json extension are not
+required. User can specify only `--template scatter` instead of
+`--template .dvc/plot/scatter.json`. Any custom template can be added to the
+template directory.
+
+### Custom templates
+
+Users can define their own templates for specific plot types. Any template file
+is a JSON specification with predefined DVC anchors that help DVC to inject
+user's data properly.
+
+All input JSON files of `dvc plot show` and `dvc plot diff` commands are
+combined together into a single array for the injection to a template file.
+
+There are two important additional signals or fields that DVC adds:
+
+- `rev` - specified revision, tag or branch of input file. This field helps to
+  distinguish between different revisions of the file in `dvc plot diff`
+  command.
+
+- `index` - an ordering number in the file. In many cases it corresponds to
+  machine learning training epoch or step number.
+
+DVC applies the same logic to all input CSV files but first transforms all CSV
+data into JSON. DVC uses the column names from the CSV header for the JSON
+conversion.
+
+DVC template anchors:
+
+- `` - Plotting command input data from either CSV or JSON
+  files is converted to JSON array and injected instead of this anchor. Two
+  additional signals will be added: `index` and `rev` - revision (see above).
+
+- `` - A plot title that can be defined by `--title` option.
+
+- `` - a field name for Y axis of the plot. It can be defined by
+  `-y` option of the commands. The default field is the last field found in the
+  input file: the last column in CSV file or the last field in the JSON array
+  object.
+
+- `` - a field name for X axis. It can be defined by `-x` option.
+  `index` is the default field for X.
+
+- `` - a displayed field label for Y.
+
+- `` - a displayed field label for X.
 
 ## Options
 
@@ -142,7 +191,7 @@ epoch,accuracy,loss,val_accuracy,val_loss
 
 ```dvc
 $ dvc plot show logs.csv
-file:///Users/dmitry/src/plot/logs.html
+file:///Users/dmitry/src/plot/logs.csv.html
 ```
 
 ![](/img/plot_show.svg)
@@ -151,7 +200,7 @@ Difference between the current file and the previous commited one:
 
 ```dvc
 $ dvc plot diff -d logs.csv HEAD^
-file:///Users/dmitry/src/plot/logs.html
+file:///Users/dmitry/src/plot/logs.csv.html
 ```
 
 ![](/img/plot_diff.svg)
@@ -159,7 +208,7 @@ file:///Users/dmitry/src/plot/logs.html
 Visualize a specific field:
 
 ```dvc
-$ dvc plot show --field loss logs.csv
+$ dvc plot show -y loss logs.csv
 file:///Users/dmitry/src/plot/logs.html
 ```
 
@@ -183,8 +232,8 @@ turtle,cat
 ```
 
 ```dvc
-$ dvc plot show classes.csv --template confusion_matrix
-file:///Users/dmitry/src/plot/classes.html
+$ dvc plot show classes.csv --template confusion -x actual -y predicted
+file:///Users/dmitry/src/plot/classes.csv.html
 ```
 
 ![](/img/plot_show_confusion.svg)
diff --git a/content/docs/command-reference/plot/show.md b/content/docs/command-reference/plot/show.md
index 486b58e6c0..70ae6f22c8 100644
--- a/content/docs/command-reference/plot/show.md
+++ b/content/docs/command-reference/plot/show.md
@@ -6,9 +6,9 @@ Generate a plot image from from a
 ## Synopsis
 
 ```usage
-usage: dvc plot show [-h] [-q | -v] [-t [TEMPLATE]] [-r RESULT] [--show-json]
-                     [-f FIELDS]
-                     [datafile]
+usage: dvc plot show [-h] [-q | -v] [-t [TEMPLATE]] [-f FILE] [-s SELECT]
+                     [-x X] [-y Y] [--stdout] [--no-csv-header] [--no-html]
+                     [--title TITLE] [--xlab XLAB] [--ylab YLAB]
 
 positional arguments:
   datafile              Data to be visualized.
@@ -22,14 +22,30 @@ information.
 
 ## Options
 
-- `-t [TEMPLATE], --template [TEMPLATE]` - File to be injected with data.
+- `-t [TEMPLATE], --template [TEMPLATE]` - File to be injected with data. The
+  default template is `.dvc/plot/default.json`. See more details in `dvc plot`.
 
-- `-r RESULT, --result RESULT` - Name of the generated file.
+- `-f FILE, --file FILE` - Name of the generated file. By default, the output
+  file name is equal to the input filename with additional `.html` suffix or
+  `.json` suffix for `--no-html` mode.
 
-- `--no-html` - Do not wrap vega plot json with HTML.
+- `--no-html` - Do not wrap output vega plot json with HTML.
 
-- `-f FIELDS, --fields FIELDS` - Choose which fileds or jsonpath to put into
-  plot.
+- `-s SELECT, --select SELECT` - Select which fields or jsonpath to put into
+  plot. All the fields will be included by default with DVC generated `index`
+  field - see `dvc plot`.
+
+- `-x X` - Field name for x axis. `index` is the default field for X.
+
+- `-y Y` - Field name for y axis. The default field is the last field found in
+  the input file: the last column in CSV file or the last field in the JSON
+  array object (the first object).
+
+- `--xlab XLAB` - X axis title. The X column name is the default title.
+
+- `--ylab YLAB` - Y axis title. The Y column name is the default title.
+
+- `--title TITLE` - Plot title.
 
 - `-o, --stdout` - Print plot content to stdout.
 
@@ -58,30 +74,41 @@ epoch,accuracy,loss,val_accuracy,val_loss
 7,0.9954,0.01396906608727198,0.9802,0.07247738889862157
 ```
 
-By default, the command plots the last column of the tabular file.
+By default, the command plots the last column of the tabular file. Please look
+at the default behaviour of `-y` option.
 
 ```dvc
 $ dvc plot show logs.csv
-file:///Users/dmitry/src/plot/logs.html
+file:///Users/dmitry/src/plot/logs.csv.html
 ```
 
 ![](/img/plot_show.svg)
 
-Use `--field` option to changing column to visualize:
+Use `-y` option to change column to visualize:
 
 ```dvc
-$ dvc plot show --field loss logs.csv
-file:///Users/dmitry/src/plot/logs.html
+$ dvc plot show -y loss logs.csv
+file:///Users/dmitry/src/plot/logs.csv.html
 ```
 
 ![](/img/plot_show_field.svg)
 
+In the previous example all the columns (or fields) were included into the
+output file. You can select only specified subset of the columns by `--select`
+option which might be important for reducing the output file size. In this case
+the default `index` column will be still included.
+
+```dvc
+$ dvc plot show -y loss --select loss logs.csv
+file:///Users/dmitry/src/plot/logs.csv.html
+```
+
 A tabular file without header can be plotted with `--no-csv-header` option. A
 field can be specified through column number (starting with 0):
 
 ```dvc
 $ dvc plot show --no-csv-header --field 2 logs.csv
-file:///Users/dmitry/src/plot/logs.html
+file:///Users/dmitry/src/plot/logs.csv.html
 ```
 
 In many automation scenarios (like CI/CD for ML), it is convinient to have Vega
@@ -92,7 +119,7 @@ Note, the result file extension changes to JSON:
 
 ```
 $ dvc plot show --no-html logs.csv
-file:///Users/dmitry/src/plot/logs.json
+file:///Users/dmitry/src/plot/logs.csv.json
 ```
 
 JSON file plotting example:
@@ -116,15 +143,15 @@ find.
 
 ```dvc
 $ dvc plot show train.json
-file:///Users/dmitry/src/plot/train.html
+file:///Users/dmitry/src/plot/train.json.html
 ```
 
 ![](/img/plot_show.svg)
 
-The field name can be specified with the same `--field` option. The signal from
-the first JSON array with the specified name will be showned:
+The field name can be specified with the same `-y` option. The signal from the
+first JSON array with the specified name will be shown:
 
 ```dvc
-$ dvc plot show --field accuracy logs.json
-file:///Users/dmitry/src/plot/logs.html
+$ dvc plot show -y accuracy logs.json
+file:///Users/dmitry/src/plot/logs.json.html
 ```
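
Reviewer sketch (not part of the patch): the data flow this patch documents for custom templates can be illustrated roughly as below. It assumes CSV rows become JSON records, each record gains `index` and `rev`, the default Y field is the last CSV column, and the combined array replaces a data anchor in the template. The anchor string `"<DATA_ANCHOR>"`, the zero-based `index`, and the helper names `csv_to_records`, `default_y_field` and `inject` are all hypothetical and are not DVC's actual names or implementation.

```python
import csv
import io
import json

# Hypothetical anchor (quotes included so the result stays valid JSON);
# the real DVC anchor names are not reproduced here.
DATA_ANCHOR = '"<DATA_ANCHOR>"'


def csv_to_records(csv_text, rev):
    """Turn CSV text with a header row into JSON-ready records.

    Each record gets two extra fields, as the patch describes:
    `index` (row order within the file; zero-based here by assumption)
    and `rev` (the revision label the data came from).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for index, row in enumerate(reader):
        record = dict(row)
        record["index"] = index
        record["rev"] = rev
        records.append(record)
    return records


def default_y_field(csv_text):
    """Return the last column name, the documented default for `-y`."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return header[-1]


def inject(template_text, records):
    """Replace the data anchor in a template with the JSON array of records."""
    return template_text.replace(DATA_ANCHOR, json.dumps(records))


csv_text = "epoch,accuracy,loss\n0,0.91,0.31\n1,0.95,0.18\n"

# Records from several revisions are combined into a single array.
records = csv_to_records(csv_text, "HEAD") + csv_to_records(csv_text, "workspace")

template = '{"data": {"values": "<DATA_ANCHOR>"}}'
plot_spec = json.loads(inject(template, records))
```

The `rev` field is what lets a single template draw one series per revision in `dvc plot diff`, while `index` gives a usable X axis when the input has no explicit step column.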