From 82b8e0bc38cb1df41dd3864309e84376272d1374 Mon Sep 17 00:00:00 2001 From: Emre Sahin Date: Thu, 8 Jul 2021 17:37:40 +0300 Subject: [PATCH 01/15] copied to content from gs and added section headers --- .../experiment-management/sharing.md | 78 +++++++++++++++++++ 1 file changed, 78 insertions(+) create mode 100644 content/docs/user-guide/experiment-management/sharing.md diff --git a/content/docs/user-guide/experiment-management/sharing.md b/content/docs/user-guide/experiment-management/sharing.md new file mode 100644 index 0000000000..030f2f3e37 --- /dev/null +++ b/content/docs/user-guide/experiment-management/sharing.md @@ -0,0 +1,78 @@ + +# Sharing Experiments + +DVC has storing and sharing facilities like remotes or shared cache for tracked objects. In this section we discuss an alternative way to share the experiments without committing them to Git history or branch. + +## Prepare remotes to share experiments + +There are two types of remotes to store experiment objects. Git remotes are the locations that store Git repositories. A Github/Gitlab/Bitbucket repository is an example for Git remote. + +The other type of remote is the DVC remote which we add to a project using `dvc remote add` and manage using `dvc remote` subcommands. Basically DVC remotes have the same structure as cache, but live in the cloud. DVC uses these central locations to store and fetch binary files that doesn't normally fit into Git repositories. + +DVC experiments use both kinds of these remotes to store objects. + +Experiment objects that are normally tracked in Git are shared using Git remotes, and files tracked via DVC are shared using (Q: What about neither, `cache: false` objects and objects tracked both DVC and Git?) 
+ +## Pushing experiments to remotes + +## Listing experiments in remotes + +## Pulling experiments from remotes + +---- + +BELOW is from GS:Experiments + +## Sharing Experiment + +After committing the best experiments to our Git branch, we can +[store and share](/doc/start/data-and-model-versioning#storing-and-sharing) them +remotely like any other iteration of the pipeline. + +```dvc +dvc push +git push +``` + +
+ +### 💡 Important information on storing experiments remotely. + +The commands in this section require both a `dvc remote default` and a +[Git remote](https://git-scm.com/book/en/v2/Git-Basics-Working-with-Remotes). A +DVC remote stores the experiment data, and a Git remote stores the code, +parameters, and other metadata associated with the experiment. DVC supports +various types of remote storage (local file system, SSH, Amazon S3, Google Cloud +Storage, HTTP, HDFS, etc.). The Git remote is often a central Git server +(GitHub, GitLab, BitBucket, etc.). + +
+ +Experiments that have not been made persistent will not be stored or shared +remotely through `dvc push` or `git push`. + +`dvc exp push` enables storing and sharing any experiment remotely. + +```dvc +$ dvc exp push gitremote exp-bfe64 +Pushed experiment 'exp-bfe64' to Git remote 'gitremote'. +``` + +`dvc exp list` shows all experiments that have been saved. + +```dvc +$ dvc exp list gitremote --all +72ed9cd: + exp-bfe64 +``` + +`dvc exp pull` retrieves the experiment from a Git remote. + +```dvc +$ dvc exp pull gitremote exp-bfe64 +Pulled experiment 'exp-bfe64' from Git remote 'gitremote'. +``` + +> All these commands take a Git remote as an argument. A `dvc remote default` is +> also required to share the experiment data. + From 7f611e6c05a113f400ca62de91ca9c5b166b9bd5 Mon Sep 17 00:00:00 2001 From: Emre Sahin Date: Fri, 9 Jul 2021 22:53:12 +0300 Subject: [PATCH 02/15] added remotes section and an example to clone/copy experiments --- .../experiment-management/sharing.md | 105 ++++++++++++++++-- 1 file changed, 96 insertions(+), 9 deletions(-) diff --git a/content/docs/user-guide/experiment-management/sharing.md b/content/docs/user-guide/experiment-management/sharing.md index 030f2f3e37..5012368882 100644 --- a/content/docs/user-guide/experiment-management/sharing.md +++ b/content/docs/user-guide/experiment-management/sharing.md @@ -1,17 +1,52 @@ - # Sharing Experiments -DVC has storing and sharing facilities like remotes or shared cache for tracked objects. In this section we discuss an alternative way to share the experiments without committing them to Git history or branch. +DVC has storing and sharing facilities like remotes or shared cache for tracked +objects. In this section we discuss an alternative way to share the experiments +without committing them to Git history or branch. ## Prepare remotes to share experiments -There are two types of remotes to store experiment objects. Git remotes are the locations that store Git repositories. 
A Github/Gitlab/Bitbucket repository is an example for Git remote. +There are two types of remotes to store experiment objects. Git remotes are the +locations that store Git repositories. A Github/Gitlab/Bitbucket repository is +an example for Git remote. + +The other type of remote is the DVC remote which we add to a project using +`dvc remote add` and manage using `dvc remote` subcommands. Basically DVC +remotes have the same structure as cache, but live in the cloud. +DVC uses these central locations to store and fetch binary files that doesn't +normally fit into Git repositories. + +DVC experiments use both kinds of these remotes to store objects. + +Experiment objects that are normally tracked in Git are shared using Git +remotes, and files tracked with DVC are shared using DVC remotes. Therefore both +of these sharing facilities should be set up for experiment sharing to work +correctly. + +Normally, there should already be a Git repository set up as `origin` when you +clone the project. To view Git remotes in a project, you can use `git remote -v` +command. + +```dvc +$ git remote -v +origin https://github.com/iterative/get-started-experiments (fetch) +origin https://github.com/iterative/get-started-experiments (push) +``` + +On the other hand, cached DVC files are stored in DVC remotes. You can get the +location of DVC remotes in a project using `dvc remote list` command. -The other type of remote is the DVC remote which we add to a project using `dvc remote add` and manage using `dvc remote` subcommands. Basically DVC remotes have the same structure as cache, but live in the cloud. DVC uses these central locations to store and fetch binary files that doesn't normally fit into Git repositories. +```dvc +$ dvc remote list +storage https://remote.dvc.org/get-started-experiments +``` -DVC experiments use both kinds of these remotes to store objects. 
+If there is not a DVC remote set up for your project, please refer to +`dvc remote add` documentation to add a remote to share DVC-cached files in the +experiments. -Experiment objects that are normally tracked in Git are shared using Git remotes, and files tracked via DVC are shared using (Q: What about neither, `cache: false` objects and objects tracked both DVC and Git?) +(Q: What about neither, `cache: false` objects and objects tracked both DVC and +Git?) ## Pushing experiments to remotes @@ -19,11 +54,64 @@ Experiment objects that are normally tracked in Git are shared using Git remotes ## Pulling experiments from remotes ----- +## Creating a separate directory for an experiment + +A very common use case for experiments is to create a separate local directory +for your work. You can do so by `dvc exp apply` and `dvc exp branch` commands, +but here we'll see how to use `dvc exp pull` to copy an experiment. + +Suppose there is project in `~/my-project` that you have many experiments and +would like to have a copy of a particular experiment named `exp-abc12` in this +project. + +You first clone the repository to another directory: + +```dvc +$ git clone ~/my-project ~/my-successful-experiment +$ cd ~/my-successful-experiment +``` + +Git sets `origin` of cloned repository to `~/my-project`, so when you list the +experiments in this new clone, you can see your all experiments in +`~/my-project`. + +```dvc +$ dvc exp list origin +main: + ... + exp-abc12 +``` + +If there is no central remote, and there is no means to set up one, you can +define the original repository's DVC cache as a _remote_ in the clone. 
+ +```dvc +$ dvc remote add --local --default storage ~/my-project/.dvc/cache +``` + +If there is central remote for the project, assuming all DVC cache in +`~/my-project` repository is pushed to it, you can pull an experiment in the +clone: + +```dvc +$ dvc exp pull origin exp-abc12 +``` + +Then we can apply this experiment and get a workspace that contains all your +experiment files: + +```dvc +$ dvc exp apply exp-abc12 +``` + +Now you have a dedicated directory for your experiment that contains all your +artifacts. + +--- BELOW is from GS:Experiments -## Sharing Experiment +## Sharing Experiment After committing the best experiments to our Git branch, we can [store and share](/doc/start/data-and-model-versioning#storing-and-sharing) them @@ -75,4 +163,3 @@ Pulled experiment 'exp-bfe64' from Git remote 'gitremote'. > All these commands take a Git remote as an argument. A `dvc remote default` is > also required to share the experiment data. - From 7f5220d90c03d965d03ea78eb1862a3483bdde16 Mon Sep 17 00:00:00 2001 From: Emre Sahin Date: Sat, 10 Jul 2021 12:47:29 +0300 Subject: [PATCH 03/15] added dvc exp push section --- .../experiment-management/sharing.md | 30 +++++++++++++++++++ 1 file changed, 30 insertions(+) diff --git a/content/docs/user-guide/experiment-management/sharing.md b/content/docs/user-guide/experiment-management/sharing.md index 5012368882..b7cf2ce2dd 100644 --- a/content/docs/user-guide/experiment-management/sharing.md +++ b/content/docs/user-guide/experiment-management/sharing.md @@ -50,6 +50,36 @@ Git?) ## Pushing experiments to remotes +You can push an experiment to a Git repository using `dvc exp push`. + +```dvc +$ dvc exp push origin exp-abc123 +``` + +It requires the Git remote name and experiment name as arguments. + +It pushes the DVC tracked files in DVC cache to DVC remote automatically. If you +want to prevent this behavior and not push these files, you can use `--no-cache` +flag. 
+ +DVC uses the default remote for pushing files in the DVC cache. If there is not +a default DVC remote, it asks to define one by `dvc remote default `. If +you don't want to have a default remote, or if there are more than one DVC +remote defined in the project, you can select the remote that will be used by +`--remote / -r` option. + +DVC is able to use multiple processes to push DVC-cached files. Set the number +of jobs with `--jobs / -j` option. Please note that increase in performance is +dependent to available bandwidth and remote (cloud) server configurations. For +very large number of jobs, you may have side effects in your local network or +system. + +DVC has a caching mechanism called _Run-Cache_ that stores the artifacts from +intermediate stages. For example, if there is an intermediate step that applies +data-augmentation on the dataset and you would like to push these artifacts as +well as the end products of the experiments, you can use `--run-cache` flag to +push all of these to the DVC remote. 
+ ## Listing experiments in remotes ## Pulling experiments from remotes From e44c25072ce9fa4bb2817fe6dc82dbe40f0262b9 Mon Sep 17 00:00:00 2001 From: Emre Sahin Date: Tue, 13 Jul 2021 09:54:52 +0300 Subject: [PATCH 04/15] added link to the sidebar --- content/docs/sidebar.json | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/docs/sidebar.json b/content/docs/sidebar.json index 3ba08943ce..af187313b1 100644 --- a/content/docs/sidebar.json +++ b/content/docs/sidebar.json @@ -139,7 +139,7 @@ "label": "Experiment Management", "slug": "experiment-management", "source": "experiment-management/index.md", - "children": ["checkpoints"] + "children": ["sharing", "checkpoints"] }, "setup-google-drive-remote", "large-dataset-optimization", From e69d2cf628eac2efbf275522d8bf135dd358d952 Mon Sep 17 00:00:00 2001 From: Emre Sahin Date: Tue, 13 Jul 2021 09:56:48 +0300 Subject: [PATCH 05/15] added dvc exp list explanation --- .../experiment-management/sharing.md | 49 +++++++++++++++++++ 1 file changed, 49 insertions(+) diff --git a/content/docs/user-guide/experiment-management/sharing.md b/content/docs/user-guide/experiment-management/sharing.md index b7cf2ce2dd..276d55c5cb 100644 --- a/content/docs/user-guide/experiment-management/sharing.md +++ b/content/docs/user-guide/experiment-management/sharing.md @@ -82,6 +82,55 @@ push all of these to the DVC remote. ## Listing experiments in remotes +DVC stores the experiments in Git repositories. In order to list an experiment +in a repository, you can use `dvc exp list` command. + +With no command line options, this command lists the experiments in the current +repository. You can supply a Git remote name to list the experiments. + +```dvc +$ dvc exp list origin +main: + cnn-128 + cnn-32 + cnn-64 + cnn-96 +``` + +`dvc exp list ` lists the experiments that are referenced by the +_current commit._ If you would like to list all experiments referenced from +other branches and commit, use `--all` flag. 
+ +```dvc +$ dvc exp list origin --all +0b5bedd: + exp-9edbe +0f73830: + exp-280e9 + exp-4cd96 + exp-65d0a +172b1b9: + exp-7424d +190e697: + exp-ec039 +3426c9e: + exp-0680e +39afbbc: + exp-21155 + ... +``` + +When you don't need the parent commits', you can just get the names with +`--names-only` option. + +```dvc +$ dvc exp list origin --names-only +cnn-128 +cnn-32 +cnn-64 +cnn-96 +``` + ## Pulling experiments from remotes ## Creating a separate directory for an experiment From 7c07d9ed71ff6ff7422a7d87482390bb89c7c189 Mon Sep 17 00:00:00 2001 From: Emre Sahin Date: Tue, 13 Jul 2021 10:45:07 +0300 Subject: [PATCH 06/15] added dvc exp pull section --- .../experiment-management/sharing.md | 24 +++++++++++++++++++ 1 file changed, 24 insertions(+) diff --git a/content/docs/user-guide/experiment-management/sharing.md b/content/docs/user-guide/experiment-management/sharing.md index 276d55c5cb..ca6448f2a0 100644 --- a/content/docs/user-guide/experiment-management/sharing.md +++ b/content/docs/user-guide/experiment-management/sharing.md @@ -133,6 +133,30 @@ cnn-96 ## Pulling experiments from remotes +When you clone a DVC repository from a Git remote, it doesn't clone any +experiments. In order to get the experiments, use `dvc exp pull` command with +the Git remote and the experiment name. + +```dvc +$ dvc exp pull origin cnn-64 +``` + +It pulls all the text files from Git repository and DVC tracked files from DVC +remote. You need to have both of these remote configured in your project. See +[above](#prepare-remotes-to-share-experiments) for information regarding remote +configuration. + +When you don't have a default DVC remote, or would like to ask DVC to use a +particular remote, you can specify it with `--remote` / `-r` option. + +DVC can use more than one process to pull DVC tracked files from remotes. You +can set the number of processes to pull by `--jobs` / `-j` option. By default +DVC uses `4 * cpu_count()` processes for non-SSH remotes, and `4` for SSH +remotes. 
+ +If there is already an experiment in the current repository with the name you're +trying to pull, DVC won't overwrite it unless you supply `--force` flag. + ## Creating a separate directory for an experiment A very common use case for experiments is to create a separate local directory From abad7fd352cc9166d4138d9b843c50dbe7acc63d Mon Sep 17 00:00:00 2001 From: Emre Sahin Date: Tue, 13 Jul 2021 20:13:24 +0300 Subject: [PATCH 07/15] minor modifications --- .../experiment-management/sharing.md | 57 ++++++++++++------- 1 file changed, 36 insertions(+), 21 deletions(-) diff --git a/content/docs/user-guide/experiment-management/sharing.md b/content/docs/user-guide/experiment-management/sharing.md index ca6448f2a0..5afaab673d 100644 --- a/content/docs/user-guide/experiment-management/sharing.md +++ b/content/docs/user-guide/experiment-management/sharing.md @@ -1,19 +1,19 @@ # Sharing Experiments DVC has storing and sharing facilities like remotes or shared cache for tracked -objects. In this section we discuss an alternative way to share the experiments +files. In this section we discuss an alternative way to share the experiments without committing them to Git history or branch. ## Prepare remotes to share experiments -There are two types of remotes to store experiment objects. Git remotes are the +There are two types of remotes to store experiment objects: Git remotes are the locations that store Git repositories. A Github/Gitlab/Bitbucket repository is -an example for Git remote. +an example of a Git remote. The other type of remote is the DVC remote which we add to a project using `dvc remote add` and manage using `dvc remote` subcommands. Basically DVC -remotes have the same structure as cache, but live in the cloud. -DVC uses these central locations to store and fetch binary files that doesn't +remotes have the same structure as DVC cache, but live in the +cloud. 
DVC uses these locations to store and fetch binary files that doesn't normally fit into Git repositories. DVC experiments use both kinds of these remotes to store objects. @@ -66,24 +66,26 @@ DVC uses the default remote for pushing files in the DVC cache. If there is not a default DVC remote, it asks to define one by `dvc remote default `. If you don't want to have a default remote, or if there are more than one DVC remote defined in the project, you can select the remote that will be used by -`--remote / -r` option. +`--remote` / `-r` option. -DVC is able to use multiple processes to push DVC-cached files. Set the number -of jobs with `--jobs / -j` option. Please note that increase in performance is -dependent to available bandwidth and remote (cloud) server configurations. For -very large number of jobs, you may have side effects in your local network or -system. +DVC is uses multiple processes to push DVC-cached files. By default DVC uses `4 + +- cpu_count()`processes to push the files. You can set the number of processes with`--jobs`/`-j` + option. Please note that increase in performance is dependent to available + bandwidth and remote (cloud) server configurations. For very large number of + jobs, you may have side effects in your local network or system. DVC has a caching mechanism called _Run-Cache_ that stores the artifacts from intermediate stages. For example, if there is an intermediate step that applies -data-augmentation on the dataset and you would like to push these artifacts as +data-augmentation on your dataset and you would like to push these artifacts as well as the end products of the experiments, you can use `--run-cache` flag to -push all of these to the DVC remote. +push all of these to the DVC remote. `--run-cache` flag pushes all artifacts +referenced in `dvc.lock` file. ## Listing experiments in remotes -DVC stores the experiments in Git repositories. In order to list an experiment -in a repository, you can use `dvc exp list` command. 
+DVC stores the experiments in Git repositories. In order to list experiments in +a repository, you can use `dvc exp list` command. With no command line options, this command lists the experiments in the current repository. You can supply a Git remote name to list the experiments. @@ -151,12 +153,23 @@ particular remote, you can specify it with `--remote` / `-r` option. DVC can use more than one process to pull DVC tracked files from remotes. You can set the number of processes to pull by `--jobs` / `-j` option. By default -DVC uses `4 * cpu_count()` processes for non-SSH remotes, and `4` for SSH -remotes. +DVC uses `4 * cpu_count()` processes for non-SSH remotes, and `4` processes for +SSH remotes. If there is already an experiment in the current repository with the name you're trying to pull, DVC won't overwrite it unless you supply `--force` flag. +## Pulling all experiments from a remote + +Assuming all experiments have distinct names, you can create a loop to pull all +experiments from `origin` like the following. + +```dvc +dvc exp list --all --names-only | while read -r expname ; do + dvc exp pull origin ${expname} +done +``` + ## Creating a separate directory for an experiment A very common use case for experiments is to create a separate local directory @@ -183,16 +196,18 @@ $ dvc exp list origin main: ... exp-abc12 + ... ``` -If there is no central remote, and there is no means to set up one, you can -define the original repository's DVC cache as a _remote_ in the clone. +If there is no DVC remote in the original repository, and there is no means to +set up one, you can define the original repository's DVC cache as a _remote_ in +the clone. 
```dvc $ dvc remote add --local --default storage ~/my-project/.dvc/cache ``` -If there is central remote for the project, assuming all DVC cache in +If there is a DVC remote for the project, assuming all DVC cache in `~/my-project` repository is pushed to it, you can pull an experiment in the clone: @@ -212,7 +227,7 @@ artifacts. --- -BELOW is from GS:Experiments +> BELOW is from GS:Experiments and will be removed ## Sharing Experiment From 59afc43562568768920abee00f345895e2f12985 Mon Sep 17 00:00:00 2001 From: Emre Sahin Date: Wed, 14 Jul 2021 10:34:50 +0300 Subject: [PATCH 08/15] minor fixes + process -> thread --- .../experiment-management/sharing.md | 22 +++++++++---------- 1 file changed, 10 insertions(+), 12 deletions(-) diff --git a/content/docs/user-guide/experiment-management/sharing.md b/content/docs/user-guide/experiment-management/sharing.md index 5afaab673d..6d3d5c51e6 100644 --- a/content/docs/user-guide/experiment-management/sharing.md +++ b/content/docs/user-guide/experiment-management/sharing.md @@ -68,12 +68,11 @@ you don't want to have a default remote, or if there are more than one DVC remote defined in the project, you can select the remote that will be used by `--remote` / `-r` option. -DVC is uses multiple processes to push DVC-cached files. By default DVC uses `4 - -- cpu_count()`processes to push the files. You can set the number of processes with`--jobs`/`-j` - option. Please note that increase in performance is dependent to available - bandwidth and remote (cloud) server configurations. For very large number of - jobs, you may have side effects in your local network or system. +DVC uses multiple threads to push DVC-cached files. By default DVC uses 4x +`cpu_count()` threads to push the files. You can set the number of threads +with`--jobs`/`-j` option. Please note that increase in performance is dependent +to available bandwidth and remote (cloud) server configurations. 
For very large +number of jobs, you may have side effects in your local network or system. DVC has a caching mechanism called _Run-Cache_ that stores the artifacts from intermediate stages. For example, if there is an intermediate step that applies @@ -151,10 +150,9 @@ configuration. When you don't have a default DVC remote, or would like to ask DVC to use a particular remote, you can specify it with `--remote` / `-r` option. -DVC can use more than one process to pull DVC tracked files from remotes. You -can set the number of processes to pull by `--jobs` / `-j` option. By default -DVC uses `4 * cpu_count()` processes for non-SSH remotes, and `4` processes for -SSH remotes. +DVC can use more than one thread to pull DVC tracked files from remotes. You can +set the number of threads to pull by `--jobs` / `-j` option. By default DVC uses +4 x `cpu_count()` threads for non-SSH remotes, and 4 threads for SSH remotes. If there is already an experiment in the current repository with the name you're trying to pull, DVC won't overwrite it unless you supply `--force` flag. @@ -165,8 +163,8 @@ Assuming all experiments have distinct names, you can create a loop to pull all experiments from `origin` like the following. 
```dvc -dvc exp list --all --names-only | while read -r expname ; do - dvc exp pull origin ${expname} +$ dvc exp list --all --names-only | while read -r expname ; do \ + dvc exp pull origin ${expname} \ done ``` From e4a5cfbfeabc15c611c40a3535ec8a024988cc02 Mon Sep 17 00:00:00 2001 From: Emre Sahin Date: Fri, 16 Jul 2021 08:40:11 +0300 Subject: [PATCH 09/15] changed sidebar label --- content/docs/sidebar.json | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/content/docs/sidebar.json b/content/docs/sidebar.json index af187313b1..78267116bd 100644 --- a/content/docs/sidebar.json +++ b/content/docs/sidebar.json @@ -139,7 +139,13 @@ "label": "Experiment Management", "slug": "experiment-management", "source": "experiment-management/index.md", - "children": ["sharing", "checkpoints"] + "children": [ + { + "slug": "sharing", + "label": "Sharing Experiments" + }, + "checkpoints" + ] }, "setup-google-drive-remote", "large-dataset-optimization", @@ -535,7 +541,6 @@ "label": "User Guide", "slug": "user-guide", "source": "user-guide/index.md", - "children": [ { "label": "Prepare Your Repositories", @@ -555,7 +560,6 @@ "share-view" ] }, - { "label": "Explore ML Experiments", "slug": "explore-experiments" From c633ce52e91901146573484c9bc8cb905685d493 Mon Sep 17 00:00:00 2001 From: Emre Sahin Date: Fri, 16 Jul 2021 08:41:43 +0300 Subject: [PATCH 10/15] removed copied part from gs:experiments --- .../experiment-management/sharing.md | 57 ------------------- 1 file changed, 57 deletions(-) diff --git a/content/docs/user-guide/experiment-management/sharing.md b/content/docs/user-guide/experiment-management/sharing.md index 6d3d5c51e6..916bb70b57 100644 --- a/content/docs/user-guide/experiment-management/sharing.md +++ b/content/docs/user-guide/experiment-management/sharing.md @@ -222,60 +222,3 @@ $ dvc exp apply exp-abc12 Now you have a dedicated directory for your experiment that contains all your artifacts. 
- ---- - -> BELOW is from GS:Experiments and will be removed - -## Sharing Experiment - -After committing the best experiments to our Git branch, we can -[store and share](/doc/start/data-and-model-versioning#storing-and-sharing) them -remotely like any other iteration of the pipeline. - -```dvc -dvc push -git push -``` - -
- -### 💡 Important information on storing experiments remotely. - -The commands in this section require both a `dvc remote default` and a -[Git remote](https://git-scm.com/book/en/v2/Git-Basics-Working-with-Remotes). A -DVC remote stores the experiment data, and a Git remote stores the code, -parameters, and other metadata associated with the experiment. DVC supports -various types of remote storage (local file system, SSH, Amazon S3, Google Cloud -Storage, HTTP, HDFS, etc.). The Git remote is often a central Git server -(GitHub, GitLab, BitBucket, etc.). - -
- -Experiments that have not been made persistent will not be stored or shared -remotely through `dvc push` or `git push`. - -`dvc exp push` enables storing and sharing any experiment remotely. - -```dvc -$ dvc exp push gitremote exp-bfe64 -Pushed experiment 'exp-bfe64' to Git remote 'gitremote'. -``` - -`dvc exp list` shows all experiments that have been saved. - -```dvc -$ dvc exp list gitremote --all -72ed9cd: - exp-bfe64 -``` - -`dvc exp pull` retrieves the experiment from a Git remote. - -```dvc -$ dvc exp pull gitremote exp-bfe64 -Pulled experiment 'exp-bfe64' from Git remote 'gitremote'. -``` - -> All these commands take a Git remote as an argument. A `dvc remote default` is -> also required to share the experiment data. From c1b56fe741ab2eb0edeb4756fab16ee9c1245e5b Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Tue, 20 Jul 2021 06:52:09 +0000 Subject: [PATCH 11/15] guide: move exp/sharing to full slug per https://github.com/iterative/dvc.org/pull/2618#pullrequestreview-710227061 --- content/docs/sidebar.json | 8 +------- .../{sharing.md => sharing-experiments.md} | 0 2 files changed, 1 insertion(+), 7 deletions(-) rename content/docs/user-guide/experiment-management/{sharing.md => sharing-experiments.md} (100%) diff --git a/content/docs/sidebar.json b/content/docs/sidebar.json index 78267116bd..89ec1a042d 100644 --- a/content/docs/sidebar.json +++ b/content/docs/sidebar.json @@ -139,13 +139,7 @@ "label": "Experiment Management", "slug": "experiment-management", "source": "experiment-management/index.md", - "children": [ - { - "slug": "sharing", - "label": "Sharing Experiments" - }, - "checkpoints" - ] + "children": ["sharing-experiments", "checkpoints"] }, "setup-google-drive-remote", "large-dataset-optimization", diff --git a/content/docs/user-guide/experiment-management/sharing.md b/content/docs/user-guide/experiment-management/sharing-experiments.md similarity index 100% rename from content/docs/user-guide/experiment-management/sharing.md rename to 
content/docs/user-guide/experiment-management/sharing-experiments.md From df8ad04317390a7ef259112cc355752a02042718 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Tue, 20 Jul 2021 08:09:03 +0000 Subject: [PATCH 12/15] guide: copy edit exp/sharing --- .../sharing-experiments.md | 196 ++++++++---------- 1 file changed, 87 insertions(+), 109 deletions(-) diff --git a/content/docs/user-guide/experiment-management/sharing-experiments.md b/content/docs/user-guide/experiment-management/sharing-experiments.md index 916bb70b57..760839d5c3 100644 --- a/content/docs/user-guide/experiment-management/sharing-experiments.md +++ b/content/docs/user-guide/experiment-management/sharing-experiments.md @@ -1,31 +1,27 @@ # Sharing Experiments -DVC has storing and sharing facilities like remotes or shared cache for tracked -files. In this section we discuss an alternative way to share the experiments -without committing them to Git history or branch. +There are two types of remotes that can store experiments. Git remotes are +distributed copies of the Git repository, for example on GitHub or GitLab. -## Prepare remotes to share experiments +[DVC remotes](/doc/command-reference/remote) on the other hand are +storage-specific locations (e.g. Amazon S3 or Google Drive) which we can +configure with `dvc remote`. DVC uses them to store and fetch large files that +don't normally fit inside Git repos. -There are two types of remotes to store experiment objects: Git remotes are the -locations that store Git repositories. A Github/Gitlab/Bitbucket repository is -an example of a Git remote. +DVC needs both kinds of remotes for backing up and sharing experiments. -The other type of remote is the DVC remote which we add to a project using -`dvc remote add` and manage using `dvc remote` subcommands. Basically DVC -remotes have the same structure as DVC cache, but live in the -cloud. DVC uses these locations to store and fetch binary files that doesn't -normally fit into Git repositories. 
+Experiment files that are normally tracked in Git (like code versions) are +shared using Git remotes, and files or directories tracked with DVC (like +datasets) are shared using DVC remotes. -DVC experiments use both kinds of these remotes to store objects. +> See [Git remotes guide] and `dvc remote add` for information on setting them +> up. -Experiment objects that are normally tracked in Git are shared using Git -remotes, and files tracked with DVC are shared using DVC remotes. Therefore both -of these sharing facilities should be set up for experiment sharing to work -correctly. +[git remotes guide]: + https://git-scm.com/book/en/v2/Git-Basics-Working-with-Remotes -Normally, there should already be a Git repository set up as `origin` when you -clone the project. To view Git remotes in a project, you can use `git remote -v` -command. +Normally, there should already be a Git remote called `origin` when you clone a +repo. Use `git remote -v` to list your Git remotes: ```dvc $ git remote -v @@ -33,61 +29,55 @@ origin https://github.com/iterative/get-started-experiments (fetch) origin https://github.com/iterative/get-started-experiments (push) ``` -On the other hand, cached DVC files are stored in DVC remotes. You can get the -location of DVC remotes in a project using `dvc remote list` command. +Similarly, you can see the DVC remotes in you project using `dvc remote list`: ```dvc $ dvc remote list storage https://remote.dvc.org/get-started-experiments ``` -If there is not a DVC remote set up for your project, please refer to -`dvc remote add` documentation to add a remote to share DVC-cached files in the -experiments. +_Q: What about neither, `cache: false` objects, and objects tracked both by DVC +and Git?_ -(Q: What about neither, `cache: false` objects and objects tracked both DVC and -Git?) +## Uploading experiments to remotes -## Pushing experiments to remotes - -You can push an experiment to a Git repository using `dvc exp push`. 
+You can upload an experiment and its files to both remotes using `dvc exp push` +(requires the Git remote name and experiment name as arguments). ```dvc $ dvc exp push origin exp-abc123 ``` -It requires the Git remote name and experiment name as arguments. +> Use `dvc exp show` to find experiment names. -It pushes the DVC tracked files in DVC cache to DVC remote automatically. If you -want to prevent this behavior and not push these files, you can use `--no-cache` -flag. +This pushes the necessary DVC-tracked files from the cache to the default DVC +remote (similar to `dvc push`). You can prevent this behavior by using the +`--no-cache` option to the command above. -DVC uses the default remote for pushing files in the DVC cache. If there is not -a default DVC remote, it asks to define one by `dvc remote default `. If -you don't want to have a default remote, or if there are more than one DVC -remote defined in the project, you can select the remote that will be used by -`--remote` / `-r` option. +If there's no default DVC remote, it will ask you to define one with +`dvc remote default`. If you don't want a default remote, or if you want to use +a different remote, you can specify one with the `--remote` (`-r`) option. -DVC uses multiple threads to push DVC-cached files. By default DVC uses 4x -`cpu_count()` threads to push the files. You can set the number of threads -with`--jobs`/`-j` option. Please note that increase in performance is dependent -to available bandwidth and remote (cloud) server configurations. For very large -number of jobs, you may have side effects in your local network or system. +DVC can use multiple threads to upload files (4 per CPU core by default). You +can set the number with `--jobs` (`-j`). Please note that increases in +performance also depend on the connection bandwidth and remote configurations. 
-DVC has a caching mechanism called _Run-Cache_ that stores the artifacts from +DVC has a mechanism called the [run-cache] that stores the artifacts from intermediate stages. For example, if there is an intermediate step that applies data-augmentation on your dataset and you would like to push these artifacts as well as the end products of the experiments, you can use `--run-cache` flag to push all of these to the DVC remote. `--run-cache` flag pushes all artifacts referenced in `dvc.lock` file. -## Listing experiments in remotes +[run-cache]: /doc/user-guide/project-structure/internal-files#run-cache + +## Listing experiments remotely -DVC stores the experiments in Git repositories. In order to list experiments in -a repository, you can use `dvc exp list` command. +In order to list experiments in a DVC project, you can use the `dvc exp list` +command. With no command line options, it lists the experiments in the current +project. -With no command line options, this command lists the experiments in the current -repository. You can supply a Git remote name to list the experiments. +You can supply a Git remote name to list the experiments: ```dvc $ dvc exp list origin @@ -98,9 +88,9 @@ main: cnn-96 ``` -`dvc exp list ` lists the experiments that are referenced by the -_current commit._ If you would like to list all experiments referenced from -other branches and commit, use `--all` flag. +Note that by default this only lists experiments derived from the current commit +(local `HEAD` or default remote branch). You can list all the experiments +(derived from every branch and commit) with the `--all` option: ```dvc $ dvc exp list origin --all @@ -109,20 +99,14 @@ $ dvc exp list origin --all 0f73830: exp-280e9 exp-4cd96 - exp-65d0a -172b1b9: - exp-7424d -190e697: - exp-ec039 -3426c9e: - exp-0680e -39afbbc: - exp-21155 - ... + ... +main: + cnn-128 + ... ``` -When you don't need the parent commits', you can just get the names with -`--names-only` option.
+When you don't need to see the parent commits, you can list experiment names +only, with `--names-only`: ```dvc $ dvc exp list origin --names-only @@ -132,35 +116,32 @@ cnn-64 cnn-96 ``` -## Pulling experiments from remotes +## Downloading experiments from remotes -When you clone a DVC repository from a Git remote, it doesn't clone any -experiments. In order to get the experiments, use `dvc exp pull` command with -the Git remote and the experiment name. +When you clone a DVC repository, it doesn't fetch any experiments by default. In +order to get them, use `dvc exp pull` (with the Git remote and the experiment +name), for example: ```dvc $ dvc exp pull origin cnn-64 ``` -It pulls all the text files from Git repository and DVC tracked files from DVC -remote. You need to have both of these remote configured in your project. See -[above](#prepare-remotes-to-share-experiments) for information regarding remote -configuration. +This pulls all the necessary files from both remotes. Again, you need to have +both of these configured (see the +beginning of this guide). -When you don't have a default DVC remote, or would like to ask DVC to use a -particular remote, you can specify it with `--remote` / `-r` option. +You can specify a remote to pull from with `--remote` (`-r`). -DVC can use more than one thread to pull DVC tracked files from remotes. You can -set the number of threads to pull by `--jobs` / `-j` option. By default DVC uses -4 x `cpu_count()` threads for non-SSH remotes, and 4 threads for SSH remotes. +DVC can use multiple threads to download files (4 per CPU core typically). You +can set the number with `--jobs` (`-j`). -If there is already an experiment in the current repository with the name you're -trying to pull, DVC won't overwrite it unless you supply `--force` flag. +If an experiment being pulled already exists in the local project, DVC won't +overwrite it unless you supply `--force`.
-## Pulling all experiments from a remote +### Pulling all experiments -Assuming all experiments have distinct names, you can create a loop to pull all -experiments from `origin` like the following. +You can create a loop to pull all experiments from `origin` (Git remote) like +this: ```dvc $ dvc exp list --all --names-only | while read -r expname ; do \ @@ -168,57 +149,54 @@ $ dvc exp list --all --names-only | while read -r expname ; do \ done ``` -## Creating a separate directory for an experiment +## Example: Creating a directory for an experiment -A very common use case for experiments is to create a separate local directory -for your work. You can do so by `dvc exp apply` and `dvc exp branch` commands, -but here we'll see how to use `dvc exp pull` to copy an experiment. +A good way to isolate experiments is to create a separate home directory for +each one. -Suppose there is project in `~/my-project` that you have many experiments and -would like to have a copy of a particular experiment named `exp-abc12` in this -project. +> Another alternative is to use `dvc exp apply` and `dvc exp branch`, but here +> we'll see how to use `dvc exp pull` to copy an experiment. + +Suppose there is a DVC repository in `~/my-project` with multiple +experiments. Let's create a copy of experiment `exp-abc12` from there. -You first clone the repository to another directory: +First, clone the repo into another directory: ```dvc -$ git clone ~/my-project ~/my-successful-experiment -$ cd ~/my-successful-experiment +$ git clone ~/my-project ~/my-experiment +$ cd ~/my-experiment ``` -Git sets `origin` of cloned repository to `~/my-project`, so when you list the -experiments in this new clone, you can see your all experiments in -`~/my-project`. +Git sets the `origin` remote of the cloned repo to `~/my-project`, so you can +see all your experiments from `~/my-experiment` like this: ```dvc $ dvc exp list origin main: - ... - exp-abc12 - ... + exp-abc12 + ...
``` -If there is no DVC remote in the original repository, and there is no means to -set up one, you can define the original repository's DVC cache as a _remote_ in -the clone. +If there is no DVC remote in the original repository, you can define its +cache as a `dvc remote` in the clone: ```dvc $ dvc remote add --local --default storage ~/my-project/.dvc/cache ``` -If there is a DVC remote for the project, assuming all DVC cache in -`~/my-project` repository is pushed to it, you can pull an experiment in the -clone: +If there is a DVC remote for the project, assuming the experiments have been +pushed there, you can pull the one in question: ```dvc $ dvc exp pull origin exp-abc12 ``` -Then we can apply this experiment and get a workspace that contains all your -experiment files: +Then we can `dvc exp apply` this experiment and get a workspace that +contains all of its files: ```dvc $ dvc exp apply exp-abc12 ``` -Now you have a dedicated directory for your experiment that contains all your -artifacts. +Now you have a dedicated directory for your experiment, containing all its +artifacts!
From 93ac4406a02ebd6f5fdf9b2fb239f7c53142d667 Mon Sep 17 00:00:00 2001 From: Emre Sahin Date: Tue, 20 Jul 2021 20:00:15 +0300 Subject: [PATCH 13/15] tabs to spaces --- .../sharing-experiments.md | 23 ++++++++----------- 1 file changed, 10 insertions(+), 13 deletions(-) diff --git a/content/docs/user-guide/experiment-management/sharing-experiments.md b/content/docs/user-guide/experiment-management/sharing-experiments.md index 760839d5c3..8f79285af2 100644 --- a/content/docs/user-guide/experiment-management/sharing-experiments.md +++ b/content/docs/user-guide/experiment-management/sharing-experiments.md @@ -36,9 +36,6 @@ $ dvc remote list storage https://remote.dvc.org/get-started-experiments ``` -_Q: What about neither, `cache: false` objects, and objects tracked both by DVC -and Git?_ - ## Uploading experiments to remotes You can upload an experiment and its files to both remotes using `dvc exp push` @@ -82,10 +79,10 @@ You can supply a Git remote name to list the experiments: ```dvc $ dvc exp list origin main: - cnn-128 - cnn-32 - cnn-64 - cnn-96 + cnn-128 + cnn-32 + cnn-64 + cnn-96 ``` Note that by default this only lists experiments derived from the current commit @@ -95,14 +92,14 @@ Note that by default this only lists experiments derived from the current commit ```dvc $ dvc exp list origin --all 0b5bedd: - exp-9edbe + exp-9edbe 0f73830: - exp-280e9 - exp-4cd96 - ... + exp-280e9 + exp-4cd96 + ... main: - cnn-128 - ... + cnn-128 + ... 
``` When you don't need to see the parent commits, you can list experiment names From c9eb1882ddf2f2e51db02c83453c3a3436c877a3 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 21 Jul 2021 06:22:03 +0000 Subject: [PATCH 14/15] guide: no run-cache details in Sharing Exps and make example about exp push/pull --all per https://github.com/iterative/dvc.org/pull/2618#pullrequestreview-710334057 and https://github.com/iterative/dvc.org/pull/2618#pullrequestreview-710339863 --- .../experiment-management/sharing-experiments.md | 15 ++++++--------- 1 file changed, 6 insertions(+), 9 deletions(-) diff --git a/content/docs/user-guide/experiment-management/sharing-experiments.md b/content/docs/user-guide/experiment-management/sharing-experiments.md index 8f79285af2..bc772553fa 100644 --- a/content/docs/user-guide/experiment-management/sharing-experiments.md +++ b/content/docs/user-guide/experiment-management/sharing-experiments.md @@ -59,12 +59,7 @@ DVC can use multiple threads to upload files (4 per CPU core by default). You can set the number with `--jobs` (`-j`). Please note that increases in performance also depend on the connection bandwidth and remote configurations. -DVC has a mechanism called the [run-cache] that stores the artifacts from -intermediate stages. For example, if there is an intermediate step that applies -data-augmentation on your dataset and you would like to push these artifacts as -well as the end products of the experiments, you can use `--run-cache` flag to -push all of these to the DVC remote. `--run-cache` flag pushes all artifacts -referenced in `dvc.lock` file. +> 📖 See also the [run-cache] mechanism. [run-cache]: /doc/user-guide/project-structure/internal-files#run-cache @@ -135,10 +130,9 @@ can set the number with `--jobs` (`-j`). If an experiment being pulled already exists in the local project, DVC won't overwrite it unless you supply `--force`. 
-### Pulling all experiments +### Example: Pushing or pulling multiple experiments -You can create a loop to pull all experiments from `origin` (Git remote) like -this: +You can create a loop to upload or download all experiments like this: ```dvc $ dvc exp list --all --names-only | while read -r expname ; do \ @@ -146,6 +140,9 @@ $ dvc exp list --all --names-only | while read -r expname ; do \ done ``` +> Without `--all`, only the experiments derived from the current commit will be +> pushed/pulled. ## Example: Creating a directory for an experiment A good way to isolate experiments is to create a separate home directory for From fbdb8c83834f31fdf78c07e83536fb5c10680b51 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Date: Wed, 21 Jul 2021 06:32:34 +0000 Subject: [PATCH 15/15] cases: note importance of remote add --local in example about cloning individual experiments per https://github.com/iterative/dvc.org/pull/2618#pullrequestreview-710342736 --- .../user-guide/experiment-management/sharing-experiments.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/content/docs/user-guide/experiment-management/sharing-experiments.md b/content/docs/user-guide/experiment-management/sharing-experiments.md index bc772553fa..cd6103b019 100644 --- a/content/docs/user-guide/experiment-management/sharing-experiments.md +++ b/content/docs/user-guide/experiment-management/sharing-experiments.md @@ -178,7 +178,10 @@ If there is no DVC remote in the original repository, you can define its $ dvc remote add --local --default storage ~/my-project/.dvc/cache ``` -If there is a DVC remote for the project, assuming the experiments have been +> ⚠️ `--local` is important here: it keeps this configuration change out of Git, +> so it can't accidentally reach the original repo. + +If there's a DVC remote for the project, assuming the experiments have been pushed there, you can pull the one in question: ```dvc