From 9fd1462523662e969ce3a38ff9d8a5609a968531 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Perez Date: Sun, 12 Mar 2023 10:36:10 -0600 Subject: [PATCH 1/6] guide: start improving SSH remote page (intro) --- content/docs/sidebar.json | 2 +- .../importing-external-data.md | 10 +++-- .../data-management/managing-external-data.md | 10 +++-- .../data-management/remote-storage/index.md | 4 +- .../data-management/remote-storage/ssh.md | 37 ++++++++++++++++--- 5 files changed, 46 insertions(+), 17 deletions(-) diff --git a/content/docs/sidebar.json b/content/docs/sidebar.json index 9f5d06aa6f..799851280a 100644 --- a/content/docs/sidebar.json +++ b/content/docs/sidebar.json @@ -137,7 +137,7 @@ "slug": "aliyun-oss" }, { - "label": "SSH", + "label": "SSH & SFTP", "slug": "ssh" }, { diff --git a/content/docs/user-guide/data-management/importing-external-data.md b/content/docs/user-guide/data-management/importing-external-data.md index 37bf226da3..3d680bf677 100644 --- a/content/docs/user-guide/data-management/importing-external-data.md +++ b/content/docs/user-guide/data-management/importing-external-data.md @@ -99,11 +99,13 @@ $ dvc stage add -n download_file \ scp user@example.com:/path/to/data.txt data.txt ``` -⚠️ DVC requires both SSH and SFTP access to work with remote SSH locations. -Check that you can connect both ways with tools like `ssh` and `sftp` -(GNU/Linux). + -> Note that your server's SFTP root might differ from its physical root (`/`). +DVC requires both SSH and SFTP access to work with SSH remote storage. Check +that you can connect both ways with tools like `ssh` and `sftp` (GNU/Linux). +Note that your server's SFTP root might differ from its physical root (`/`). 
+ + diff --git a/content/docs/user-guide/data-management/managing-external-data.md b/content/docs/user-guide/data-management/managing-external-data.md index f33918e7d4..666ce04fe3 100644 --- a/content/docs/user-guide/data-management/managing-external-data.md +++ b/content/docs/user-guide/data-management/managing-external-data.md @@ -125,11 +125,13 @@ $ dvc stage add -d data.txt \ scp data.txt user@example.com:/data.txt ``` -⚠️ DVC requires both SSH and SFTP access to work with remote SSH locations. -Check that you can connect both ways with tools like `ssh` and `sftp` -(GNU/Linux). + -> Note that your server's SFTP root might differ from its physical root (`/`). +DVC requires both SSH and SFTP access to work with SSH remote storage. Check +that you can connect both ways with tools like `ssh` and `sftp` (GNU/Linux). +Note that your server's SFTP root might differ from its physical root (`/`). + + diff --git a/content/docs/user-guide/data-management/remote-storage/index.md b/content/docs/user-guide/data-management/remote-storage/index.md index 67f5f01129..af1dc169b1 100644 --- a/content/docs/user-guide/data-management/remote-storage/index.md +++ b/content/docs/user-guide/data-management/remote-storage/index.md @@ -112,7 +112,7 @@ team. ### Self-hosted / On-premises -- [SSH]; Like `scp` +- [SSH] & SFTP (like `scp`) - [HDFS] & [WebHDFS] - [HTTP] - [WebDAV] @@ -125,7 +125,7 @@ team. ## File systems (local remotes) - + Not related to the `--local` option of `dvc remote` and `dvc config`! diff --git a/content/docs/user-guide/data-management/remote-storage/ssh.md b/content/docs/user-guide/data-management/remote-storage/ssh.md index 32855b2d9a..e4a4f7ed52 100644 --- a/content/docs/user-guide/data-management/remote-storage/ssh.md +++ b/content/docs/user-guide/data-management/remote-storage/ssh.md @@ -1,20 +1,45 @@ -# SSH +# SSH and SFTP -Start with `dvc remote add` to define the remote: +
+ +### Click to learn about SSH and SFTP. + +[SSH] (Secure Shell) is a protocol that uses encryption to secure a connection +with a remote computer, which lets you safely transfer files to and from it +(like [`scp`]), among other features. Other protocols run on top of SSH, such +as [SFTP] (SSH File Transfer Protocol), a secure alternative to plain FTP. + +[SSH]: https://www.ssh.com/academy/ssh +[SFTP]: https://www.ssh.com/academy/ssh/sftp-ssh-file-transfer-protocol +[`scp`]: https://www.ssh.com/academy/ssh/scp + +
+ +DVC will act as an SSH/SFTP client, which means that the remote storage should +be located in an [SSH server]. Use `dvc remote add` to define the remote by +setting a name and valid [SSH URL] (may include basic auth info. like a user +name): ```cli $ dvc remote add -d myremote ssh://user@example.com/path ``` -⚠️ DVC requires both SSH and SFTP access to work with remote SSH locations. -Check that you can connect both ways with tools like `ssh` and `sftp` -(GNU/Linux). +[ssh server]: https://www.ssh.com/academy/ssh/server +[SSH URL]: https://www.ietf.org/archive/id/draft-salowey-secsh-uri-00.html + + + +DVC requires both SSH and SFTP access to work with SSH remote storage. Check +that you can connect both ways with tools like [`ssh`] and `sftp` (GNU/Linux). +Note that your server's SFTP root might differ from its physical root (`/`). + +[`ssh`]: https://www.ssh.com/academy/ssh/command -> Note that the server's SFTP root might differ from its physical root (`/`). + ## Configuration parameters From 6e9a475c5f96dbc30c7e50c70b4d2910481efc73 Mon Sep 17 00:00:00 2001 From: Jorge Orpinel Perez Date: Mon, 13 Mar 2023 02:56:40 -0700 Subject: [PATCH 2/6] guide: make SSH auth more guide-like --- .../remote-storage/amazon-s3.md | 4 +- .../data-management/remote-storage/ssh.md | 139 +++++++----------- 2 files changed, 59 insertions(+), 84 deletions(-) diff --git a/content/docs/user-guide/data-management/remote-storage/amazon-s3.md b/content/docs/user-guide/data-management/remote-storage/amazon-s3.md index 5376c9f3f1..e0612ea8eb 100644 --- a/content/docs/user-guide/data-management/remote-storage/amazon-s3.md +++ b/content/docs/user-guide/data-management/remote-storage/amazon-s3.md @@ -58,8 +58,8 @@ in your project directory locally. ## Custom authentication -If you don't have the AWS CLI configured in your machine or if you want to -change the auth method for some reason. 
+Use these configuration options if you don't have the AWS CLI setup in your +environment, if you want to override those values, or to change the auth method. diff --git a/content/docs/user-guide/data-management/remote-storage/ssh.md b/content/docs/user-guide/data-management/remote-storage/ssh.md index e4a4f7ed52..597c08f6a4 100644 --- a/content/docs/user-guide/data-management/remote-storage/ssh.md +++ b/content/docs/user-guide/data-management/remote-storage/ssh.md @@ -21,15 +21,15 @@ SSH, like FTP (simple file transfer protocol) which becomes secure or [SFTP]. DVC will act as an SSH/SFTP client, which means that the remote storage should be located in an [SSH server]. Use `dvc remote add` to define the remote by -setting a name and valid [SSH URL] (may include basic auth info. like a user -name): +setting a name and valid [SSH URL] (which may include some auth info. like user +name or port): ```cli -$ dvc remote add -d myremote ssh://user@example.com/path +$ dvc remote add -d myremote ssh://user@example.com:2222/path ``` [ssh server]: https://www.ssh.com/academy/ssh/server -[SSH URL]: https://www.ietf.org/archive/id/draft-salowey-secsh-uri-00.html +[SSH URL]: https://tools.ietf.org/id/draft-salowey-secsh-uri-00.html#sshsyntax @@ -41,102 +41,77 @@ Note that your server's SFTP root might differ from its physical root (`/`). -## Configuration parameters +By default, authentication credentials (user name, password or private key, +etc.) not found in the URL are loaded from [SSH configuration]. You can also set +them directly with DVC. -> If any values given to the parameters below contain sensitive user info, add -> them with the `--local` option, so they're written to a Git-ignored config -> file. +[ssh configuration]: https://www.ssh.com/academy/ssh/config -- `url` - remote location, in a regular - [SSH format](https://tools.ietf.org/id/draft-salowey-secsh-uri-00.html#sshsyntax). 
- Note that this can already include the `user` parameter, embedded into the - URL: +## Custom authentication - ```cli - $ dvc remote modify myremote url \ - ssh://user@example.com:1234/path - ``` +2 parameters that are commonly included in an SSH URL are user name and +sometimes port. These can be set (or overridden) as follows: - ⚠️ DVC requires both SSH and SFTP access to work with remote SSH locations. - Please check that you are able to connect both ways with tools like `ssh` and - `sftp` (GNU/Linux). - - > Note that your server's SFTP root might differ from its physical root (`/`). - -- `user` - user name to access the remote: - - ```cli - $ dvc remote modify --local myremote user myuser - ``` - - The order in which DVC picks the user name: - - 1. `user` parameter set with this command (found in `.dvc/config`); - 2. User defined in the URL (e.g. `ssh://user@example.com/path`); - 3. User defined in the SSH config file (e.g. `~/.ssh/config`) for this host - (URL); - 4. Current system user - -- `port` - port to access the remote. +```cli +$ dvc remote modify myremote user myuser +$ dvc remote modify myremote port 2222 +``` - ```cli - $ dvc remote modify myremote port 2222 - ``` +Order in which DVC picks these values when defined in multiple places: - The order in which DVC decide the port number: +1. Value set in these `user`/`port` params (DVC-specific config) +2. User/port embedded in the `url`, if any (e.g. `ssh://user@example.com:2222`) +3. `User`/`Port` defined for the host in SSH config +4. Default values: Current system user; Standard SSH port 22 - 1. `port` parameter set with this command (found in `.dvc/config`); - 2. Port defined in the URL (e.g. `ssh://example.com:1234/path`); - 3. Port defined in the SSH config file (e.g. `~/.ssh/config`) for this host - (URL); - 4. Default SSH port 22 + -- `keyfile` - path to private key to access the remote. 
+The `dvc remote modify --local` flag is needed to write sensitive user info to a +Git-ignored config file (`.dvc/config.local`) so that no secrets are leaked +through Git. See `dvc config`. - ```cli - $ dvc remote modify --local myremote keyfile /path/to/keyfile - ``` + -- `password` - a password to access the remote +Using a private key is usually the recommended way to auth an SSH connection, +and it should be saved in a key file. You can set it's path as shown below. +Often these require a passphrase to use as well: You can set up DVC to ask for +it each time, or set it directly. - ```cli - $ dvc remote modify --local myremote password mypassword - ``` +```cli +$ dvc remote modify --local myremote keyfile /path/to/keyfile +# and (if needed) +$ dvc remote modify myremote ask_passphrase true +# or +$ dvc remote modify --local myremote passphrase mypassphrase +``` -- `ask_password` - ask for a password to access the remote. +Another popular way to authenticate an SSH connection is with a simple password. +It can be set directly or you can set up DVC to ask for it when needed: - ```cli - $ dvc remote modify myremote ask_password true - ``` +```cli +$ dvc remote modify --local myremote password mypassword +# or +$ dvc remote modify myremote ask_password true +``` -- `passphrase` - a private key passphrase to access the remote +## More configuration parameters - ```cli - $ dvc remote modify --local myremote passphrase mypassphrase - ``` +- `url` - modify the remote location ([scroll up](#amazon-s3) for details) -- `ask_passphrase` - ask for a private key passphrase to access the remote. +- `allow_agent` - whether to use [SSH agents] (`true` by default). Setting this + to `false` is useful when `ssh-agent` is causing problems, e.g. "No existing + session" errors. - ```cli - $ dvc remote modify myremote ask_passphrase true - ``` +- `gss_auth` - use Generic Security Service auth if available on host (for + example, [with Kerberos]). 
`false` by default -- `gss_auth` - use Generic Security Services authentication if available on host - (for example, - [with kerberos](https://en.wikipedia.org/wiki/Generic_Security_Services_Application_Program_Interface#Relationship_to_Kerberos)). - Using this param requires `paramiko[gssapi]`, which is currently only - supported by our pip package, and could be installed with - `pip install 'dvc[ssh_gssapi]'`. Other packages (Conda, Windows, and macOS - PKG) do not support it. + - ```cli - $ dvc remote modify myremote gss_auth true - ``` + Using GSS requires `paramiko[gssapi]`, which is only supported currently by + the DVC pip package (installed with `pip install 'dvc[ssh_gssapi]'`). -- `allow_agent` - whether to use [SSH agents](https://www.ssh.com/ssh/agent) - (`true` by default). Setting this to `false` is useful when `ssh-agent` is - causing problems, such as a "No existing session" error: + - ```cli - $ dvc remote modify myremote allow_agent false - ``` +[ssh agents]: https://www.ssh.com/academy/ssh/agent +[with kerberos]: + https://en.wikipedia.org/wiki/Generic_Security_Services_Application_Program_Interface#Relationship_to_Kerberos From dd8153cc0adcd0b578e036ee89ddce821225015c Mon Sep 17 00:00:00 2001 From: Dave Berenbaum Date: Mon, 13 Mar 2023 12:15:48 -0400 Subject: [PATCH 3/6] Update content/docs/user-guide/data-management/remote-storage/ssh.md --- content/docs/user-guide/data-management/remote-storage/ssh.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/content/docs/user-guide/data-management/remote-storage/ssh.md b/content/docs/user-guide/data-management/remote-storage/ssh.md index 597c08f6a4..c393a48107 100644 --- a/content/docs/user-guide/data-management/remote-storage/ssh.md +++ b/content/docs/user-guide/data-management/remote-storage/ssh.md @@ -72,8 +72,8 @@ through Git. See `dvc config`. -Using a private key is usually the recommended way to auth an SSH connection, -and it should be saved in a key file. 
You can set it's path as shown below. +Using a private key is usually the recommended way to authenticate an SSH connection, +and it should be saved in a key file. You can set its path as shown below. Often these require a passphrase to use as well: You can set up DVC to ask for it each time, or set it directly. From cd991b5f4b474e90c972f4e9d5f003e6e10ff309 Mon Sep 17 00:00:00 2001 From: "restyled-io[bot]" <32688539+restyled-io[bot]@users.noreply.github.com> Date: Mon, 13 Mar 2023 12:18:12 -0400 Subject: [PATCH 4/6] Restyled by prettier (#4388) Co-authored-by: Restyled.io --- .../docs/user-guide/data-management/remote-storage/ssh.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/content/docs/user-guide/data-management/remote-storage/ssh.md b/content/docs/user-guide/data-management/remote-storage/ssh.md index c393a48107..a1d3e5d01c 100644 --- a/content/docs/user-guide/data-management/remote-storage/ssh.md +++ b/content/docs/user-guide/data-management/remote-storage/ssh.md @@ -72,10 +72,10 @@ through Git. See `dvc config`.
-Using a private key is usually the recommended way to authenticate an SSH connection, -and it should be saved in a key file. You can set its path as shown below. -Often these require a passphrase to use as well: You can set up DVC to ask for -it each time, or set it directly. +Using a private key is usually the recommended way to authenticate an SSH +connection, and it should be saved in a key file. You can set its path as shown +below. Often these require a passphrase to use as well: You can set up DVC to +ask for it each time, or set it directly. ```cli $ dvc remote modify --local myremote keyfile /path/to/keyfile From e3e457ee1dbd85ca6754fbb62268ccea3c5e294b Mon Sep 17 00:00:00 2001 From: Dave Berenbaum Date: Thu, 16 Mar 2023 12:20:32 -0400 Subject: [PATCH 5/6] Update content/docs/user-guide/data-management/remote-storage/ssh.md --- content/docs/user-guide/data-management/remote-storage/ssh.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/docs/user-guide/data-management/remote-storage/ssh.md b/content/docs/user-guide/data-management/remote-storage/ssh.md index a1d3e5d01c..b4c910eb41 100644 --- a/content/docs/user-guide/data-management/remote-storage/ssh.md +++ b/content/docs/user-guide/data-management/remote-storage/ssh.md @@ -68,7 +68,7 @@ Order in which DVC picks these values when defined in multiple places: The `dvc remote modify --local` flag is needed to write sensitive user info to a Git-ignored config file (`.dvc/config.local`) so that no secrets are leaked -through Git. See `dvc config`. +through Git. See [Configuration](/doc/user-guide/project-structure/configuration#config-file-locations).
From a6c01d250d2697088bbc12860316291b932d6f68 Mon Sep 17 00:00:00 2001 From: "restyled-io[bot]" <32688539+restyled-io[bot]@users.noreply.github.com> Date: Thu, 16 Mar 2023 12:37:56 -0400 Subject: [PATCH 6/6] Restyled by prettier (#4397) Co-authored-by: Restyled.io --- content/docs/user-guide/data-management/remote-storage/ssh.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/content/docs/user-guide/data-management/remote-storage/ssh.md b/content/docs/user-guide/data-management/remote-storage/ssh.md index b4c910eb41..03748244e1 100644 --- a/content/docs/user-guide/data-management/remote-storage/ssh.md +++ b/content/docs/user-guide/data-management/remote-storage/ssh.md @@ -68,7 +68,8 @@ Order in which DVC picks these values when defined in multiple places: The `dvc remote modify --local` flag is needed to write sensitive user info to a Git-ignored config file (`.dvc/config.local`) so that no secrets are leaked -through Git. See [Configuration](/doc/user-guide/project-structure/configuration#config-file-locations). +through Git. See +[Configuration](/doc/user-guide/project-structure/configuration#config-file-locations).
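
The precedence order that patch 2/6 documents for `user` and `port` (DVC remote config, then the URL, then SSH config, then defaults) can be sketched as follows. This is only an illustration of the documented order, not DVC's actual implementation: the function name `resolve_user_port`, the helper `_first_set`, and the plain dictionaries standing in for the DVC and SSH config are all hypothetical.

```python
import getpass
from urllib.parse import urlsplit

DEFAULT_SSH_PORT = 22  # standard SSH port, the documented last-resort value


def _first_set(*candidates):
    """Return the first candidate that is not None, or None if all are unset."""
    for value in candidates:
        if value is not None:
            return value
    return None


def resolve_user_port(url, dvc_config=None, ssh_config=None, system_user=None):
    """Pick user and port following the order described in the docs.

    dvc_config and ssh_config are hypothetical stand-ins: plain dicts with
    "user"/"port" and "User"/"Port" keys respectively.
    """
    dvc_config = dvc_config or {}
    ssh_config = ssh_config or {}
    parts = urlsplit(url)  # exposes .username and .port from the URL, if present

    user = _first_set(
        dvc_config.get("user"),   # 1. DVC-specific remote config
        parts.username,           # 2. embedded in the URL
        ssh_config.get("User"),   # 3. SSH config entry for the host
    )
    if user is None:
        # 4. fall back to the current system user
        user = system_user if system_user is not None else getpass.getuser()

    port = _first_set(
        dvc_config.get("port"),
        parts.port,
        ssh_config.get("Port"),
        DEFAULT_SSH_PORT,         # 4. fall back to the standard port
    )
    return user, int(port)
```

For example, with `ssh://user@example.com:2222/path` and a DVC config that sets `user` to `myuser`, the sketch returns `myuser` with port 2222, because the DVC-level value wins for the user and the URL supplies the port.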