iterative · dashohoxha · Nov 12, 2019 · Nov 25, 2019 · Nov 25, 2019 · shcheklein
diff --git a/src/Documentation/sidebar.json b/src/Documentation/sidebar.json
@@ -119,6 +119,29 @@
         "label": "Managing External Data",
         "slug": "managing-external-data"
       },
+      {
+        "label": "Data Sharing",
+        "slug": "data-sharing",
+        "source": "data-sharing/index.md",
+        "children": [
+          {
+            "label": "Remote DVC Storage",
+            "slug": "remote-storage"
+          },
+          {
+            "label": "Shared Development Server",
+            "slug": "shared-server"
+          },
+          {
+            "label": "Mounted DVC Storage",
+            "slug": "mounted-storage"
+          },
+          {
+            "label": "Synced DVC Storage",
+            "slug": "synced-storage"
+          }
+        ]
+      },
       {
         "label": "Contributing",
         "slug": "contributing",

diff --git a/static/docs/user-guide/data-sharing/index.md b/static/docs/user-guide/data-sharing/index.md
@@ -0,0 +1,43 @@
+# Data Sharing and Collaboration with DVC
+
+Like Git, DVC facilitates collaboration and data sharing in a distributed
+environment. It makes it easy to consistently get all your data files and
+directories to any machine, along with matching source code.
+
+![](/static/img/model-sharing-digram.png)
+
+There are several ways to set up data sharing with DVC. We will discuss the
+most common scenarios.
+
+- [Sharing Data Through a Remote DVC Storage](/doc/user-guide/data-sharing/remote-storage)
+
+  This is the recommended and most common case of data sharing. In this case we
+  set up a [remote storage](/doc/command-reference/remote) on a data storage
+  provider, to store data files online, where others can reach them. Currently
+  DVC supports Amazon S3, Google Cloud Storage, Microsoft Azure Blob Storage,
+  SSH, HDFS, and other remote locations, and the list is constantly growing.
+
+- [Using Local Storage on a Shared Development Server](/doc/user-guide/data-sharing/shared-server)
+
+  Some teams may prefer using a single shared machine to run their experiments.
+  This allows them to have better resource utilization such as the ability to
+  use multiple GPUs, etc. In this case we can use a local data storage, which
+  allows the team to store and share data very efficiently, with no duplication
+  of data files and instantaneous transfer.
+
+- [Sharing Data Through a Mounted DVC Storage](/doc/user-guide/data-sharing/mounted-storage)
+
+  If the data storage server (or provider) has a protocol that is not supported
+  yet by DVC, but it allows us to mount a remote directory on the local
+  filesystem, then we can still make a setup for data sharing with DVC. This
+  case might be useful for example when the data files are located on a
+  network-attached storage (NAS) and can be accessed through protocols like NFS,
+  Samba, SSHFS, etc.
+
+- [Sharing Data Through a Synchronized DVC Storage](/doc/user-guide/data-sharing/synced-storage)
+
+  Some cloud data storage providers are not supported yet by DVC, but this does
+  not mean that we cannot use them to share data with the help of DVC. If it is
+  possible to synchronize a local directory with a remote one (which almost all
+  storage providers support), then we can make a setup that allows us to share
+  DVC data.
diff --git a/static/docs/user-guide/data-sharing/mounted-storage.md b/static/docs/user-guide/data-sharing/mounted-storage.md
@@ -0,0 +1,115 @@
+# Sharing Data Through a Mounted DVC Storage
+
+If the data storage server (or provider) has a protocol that is not supported
+yet by DVC, but it allows us to mount a remote directory on the local
+filesystem, then we can still make a setup for data sharing with DVC.
+
+This case might be useful when the data files are located on a network-attached
+storage (NAS), for example, and can be accessed through protocols like NFS,
+Samba, SSHFS, etc.
+
+## SSHFS Mounted Storage Example
+
+In this example we will see how to share data with the help of a storage
+directory that is mounted through SSHFS. Normally we don't need to do this,
+since we can
+[use an SSH remote storage](https://katacoda.com/dvc/courses/examples/ssh-storage)
+directly. But we are using it just as an example, since it is easy to
+network-mount a directory with SSHFS. Once you understand how it works, it
+should be easy to implement it for other types of mounted storage (like NFS,
+Samba, etc.).
+
+> For more detailed instructions check out this
+> [interactive example](https://katacoda.com/dvc/courses/examples/mounted-storage).
+
+<p align="center">
+<img src="/static/img/user-guide/data-sharing/mounted-storage.png"/>
+</p>
+
+### Set up the server
+
+We have to do these configurations on the SSH server:
+
+- Create accounts for each user and add them to groups for accessing the Git
+  repository and the DVC storage.
+- Create a bare Git repository (for example in `/srv/project.git/`) and an empty
+  directory for the DVC storage (for example in `/srv/project.cache/`).
+- Grant users read/write access to these directories (through the groups).
+
+### Set up each user
+
+When we have to access an SSH server, we definitely want to generate SSH key
+pairs and set up the SSH config so that we can access the server without a
+password.
+
+Let's assume that for each user we can use the private SSH key
+`~/.ssh/dvc-server` to access the server without a password, and we have also
+added lines like these to `~/.ssh/config`:
+
+```
+Host dvc-server
+    HostName host01
+    User user1
+    IdentityFile ~/.ssh/dvc-server
+    IdentitiesOnly yes
+```
+
+Here `dvc-server` is the name or alias that we can use for our server, `host01`
+can actually be the IP or the FQDN of the server, and `user1` is the username of
+the first user on the server.
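+
+As a rough sketch of how such a key could be created and installed, assuming
+OpenSSH is available on the client and that password login is still enabled on
+`host01` for the initial copy (both are assumptions, not requirements stated
+above):
+
+```dvc
+$ ssh-keygen -t ed25519 -f ~/.ssh/dvc-server
+$ ssh-copy-id -i ~/.ssh/dvc-server.pub user1@host01
+$ ssh dvc-server
+```
+
+If the last command opens a shell without asking for the account password, the
+configuration works.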
+
+### Set up the DVC storage
+
+First of all we have to mount the remote storage directory on a local directory.
+With SSHFS (and the SSH configuration from the section above) it is as simple as
+this:
+
+```dvc
+$ mkdir ~/project.cache
+$ sshfs \
+      dvc-server:/srv/project.cache \
+      ~/project.cache
+```
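+
+To check that the mount succeeded, or to detach it when it is no longer needed,
+something like the following should work on Linux (`fusermount` ships with
+FUSE; on macOS the equivalent is usually `umount`):
+
+```dvc
+$ mount | grep project.cache
+$ fusermount -u ~/project.cache
+```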
+
+Once it is mounted, the default storage configuration of the project can be done
+like this:
+
+```dvc
+$ dvc remote add --local --default \
+      mounted-cache $HOME/project.cache
+$ dvc remote list --local
+mounted-cache /home/username/project.cache
+```
+
+Note that this configuration is specific to each user, so we have used the
+`--local` option in order to save it in `.dvc/config.local`, which is ignored by
+Git. Now this configuration file should have content like this:
+
+```
+['remote "mounted-cache"']
+url = /home/username/project.cache
+[core]
+remote = mounted-cache
+```
+
+### Sharing data
+
+After adding data to the project with `dvc add` or `dvc run`, it is stored in
+`.dvc/cache`. We can push both the code changes and the data like this:
+
+```dvc
+$ git push
+$ dvc push
+```
+
+The command `dvc push` copies the cached files from `.dvc/cache/` to
+`~/project.cache/`. However, since this is a mounted directory, the cached files
+are immediately copied to the server as well, and they become available on the
+mounted directories of the other users. So, all the other users have to do in
+order to receive the code changes and the data files is this:
+
+```dvc
+$ git pull
+$ dvc pull
+```
diff --git a/static/docs/user-guide/data-sharing/remote-storage.md b/static/docs/user-guide/data-sharing/remote-storage.md
@@ -0,0 +1,193 @@
+# Sharing Data Through a Remote DVC Storage
+
+This is the recommended and most common case of data sharing. In this case we
+set up a [remote storage](/doc/command-reference/remote) on a data storage
+provider, to store data files online, where others can reach them. Currently DVC
+supports Amazon S3, Google Cloud Storage, Microsoft Azure Blob Storage, SSH,
+HDFS, and other remote locations, and the list is constantly growing.
+
+## S3 Remote Example
+
+As an example, let's take a look at how you could set up an S3
+[remote storage](/doc/command-reference/remote) for a <abbr>DVC project</abbr>,
+and push/pull to/from it.
+
+### Create an S3 bucket
+
+If you don't already have one available in your S3 account, follow the
+instructions in
+[Create a Bucket](https://docs.aws.amazon.com/AmazonS3/latest/gsg/CreatingABucket.html).
+As an advanced alternative, you may use the
+[`aws s3 mb`](https://docs.aws.amazon.com/cli/latest/reference/s3/mb.html)
+command instead.
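+
+A minimal sketch of the command-line route, assuming the AWS CLI is installed
+and configured with credentials, and using `mybucket` and `us-east-2` as a
+placeholder bucket name and region:
+
+```dvc
+$ aws s3 mb s3://mybucket --region us-east-2
+```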
+
+### Set up DVC remote
+
+To actually configure an S3 remote in the <abbr>project</abbr>, supply the URL
+of the bucket where the data should be stored to the `dvc remote add` command.
+For example:
+
+```dvc
+$ dvc remote add -d myremote s3://mybucket/myproject
+Setting 'myremote' as a default remote.
+```
+
+> The `-d` (`--default`) option sets `myremote` as the default remote storage
+> for this project.
+
+This will add `myremote` to your `.dvc/config`. The `config` file now has a
+section like this:
+
+```
+['remote "myremote"']
+url = s3://mybucket/myproject
+[core]
+remote = myremote
+```
+
+`dvc remote` provides a wide variety of options to configure an S3 remote. For
+more information see `dvc remote modify`.
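+
+For instance, the following sketch sets a region and a named AWS credentials
+profile for the remote. The values `us-east-2` and `myprofile` are placeholders;
+check `dvc remote modify` for the options available in your DVC version:
+
+```dvc
+$ dvc remote modify myremote region us-east-2
+$ dvc remote modify myremote profile myprofile
+```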
+
+Let's commit your changes and push your code:
+
+```dvc
+$ git add .dvc/config
+$ git commit -m 'Configure remote storage'
+$ git push
+```
+
+### Upload data and code
+
+After adding data to the <abbr>project</abbr> with `dvc run` or other commands,
+it should be stored in your local <abbr>cache</abbr>. Upload it to remote
+storage with the `dvc push` command:
+
+```dvc
+$ dvc push
+```
+
+Code and [DVC-files](/doc/user-guide/dvc-file-format) should be committed and
+pushed with Git.
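+
+Put together, a typical round might look like the sketch below. The file
+`data/data.xml` is a hypothetical example; `dvc add` creates the matching
+DVC-file `data/data.xml.dvc` next to it and lists the data file in
+`data/.gitignore`:
+
+```dvc
+$ dvc add data/data.xml
+$ git add data/data.xml.dvc data/.gitignore
+$ git commit -m 'Add raw data'
+$ git push
+$ dvc push
+```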
+
+### Download code
+
+Please use regular Git commands to download code and DVC-files from your Git
+servers. For example:
+
+```dvc
+$ git clone https://github.com/myaccount/myproject.git
+$ cd myproject
+```
+
+or
+
+```dvc
+$ git pull
+```
+
+### Download data
+
+To download data files for your <abbr>project</abbr>, run:
+
+```dvc
+$ dvc pull
+```
+
+`dvc pull` will download the missing data files from the default remote storage
+configured in the `.dvc/config` file.
+
+## SSH Remote Example
+
+As another example, let's see how to set up an SSH remote storage for a project
+and share data through it.
+
+> For more detailed instructions check out this
+> [interactive example](https://katacoda.com/dvc/courses/examples/ssh-storage).
+
+In this example we will assume a central data storage server that can be
+accessed through SSH by two different users. For the sake of example the
+central Git repository will be located on this server too, but in general it can
+be anywhere; it doesn't have to be on the same server as the DVC data storage.
+
+<p align="center">
+<img src="/static/img/user-guide/data-sharing/ssh-storage.png"/>
+</p>
+
+### Set up the server
+
+Usually we need to do these configurations on an SSH server:
+
+- Create accounts for each user and add them to groups for accessing the Git
+  repository and the DVC storage.
+- Create a bare Git repository (for example in `/srv/project.git/`) and an empty
+  directory for the DVC storage (for example in `/srv/project.cache/`).
+- Grant users read/write access to these directories (through the groups).
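+
+The steps above could be sketched as follows, run as root on a Linux server.
+The group name `dvc-users` and the account `user1` are placeholders, and the
+exact account-management commands vary between distributions:
+
+```dvc
+$ groupadd dvc-users
+$ useradd -m user1
+$ usermod -aG dvc-users user1
+$ git init --bare --shared=group /srv/project.git
+$ mkdir -p /srv/project.cache
+$ chgrp -R dvc-users /srv/project.git /srv/project.cache
+$ chmod -R g+rwX /srv/project.git /srv/project.cache
+$ chmod g+s /srv/project.cache
+```
+
+The `--shared=group` option asks Git to keep the repository group-writable, and
+the setgid bit on `/srv/project.cache` makes new entries inherit the group.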
+
+### Set up each user
+
+When we have to access an SSH server, we definitely want to generate SSH key
+pairs and set up the SSH config so that we can access the server without a
+password.
+
+Let's assume that for each user we can use the private SSH key
+`~/.ssh/dvc-server` to access the server without a password, and we have also
+added lines like these to `~/.ssh/config`:
+
+```
+Host dvc-server
+    HostName host01
+    User user1
+    IdentityFile ~/.ssh/dvc-server
+    IdentitiesOnly yes
+```
+
+Here `dvc-server` is the name or alias that we can use for our server, `host01`
+can actually be the IP or the FQDN of the server, and `user1` is the username of
+the first user on the server.
+
+### Set up DVC remote
+
+The configuration of the project with the SSH remote storage can be done with a
+command like this:
+
+```dvc
+$ dvc remote add --default \
+      ssh-cache ssh://dvc-server:/srv/project.cache
+```
+
+This command will add a default remote configuration to `.dvc/config` that looks
+like this:
+
+```
+['remote "ssh-cache"']
+url = ssh://dvc-server:/srv/project.cache
+[core]
+remote = ssh-cache
+```
+
+Note that this configuration is the same for all the users, so we can add it to
+Git in order to share it with the other users:
+
+```dvc
+$ git add .dvc/config
+$ git commit -m 'Add an SSH remote cache'
+$ git push
+```
+
+### Sharing data
+
+After adding data to the project with `dvc add` or `dvc run`, it is stored in
+`.dvc/cache`. We can upload both the code changes and the data to the server
+like this:
+
+```dvc
+$ git push
+$ dvc push
+```
+
+On the other end, we can receive the code changes and data like this:
+
+```dvc
+$ git pull
+$ dvc pull
+```