diff --git a/content/docs/sidebar.json b/content/docs/sidebar.json index 0da64a4add..3aaed700b7 100644 --- a/content/docs/sidebar.json +++ b/content/docs/sidebar.json @@ -138,7 +138,7 @@ "slug": "aliyun-oss" }, { - "label": "SSH", + "label": "SSH & SFTP", "slug": "ssh" }, { diff --git a/content/docs/user-guide/data-management/importing-external-data.md b/content/docs/user-guide/data-management/importing-external-data.md index 37bf226da3..3d680bf677 100644 --- a/content/docs/user-guide/data-management/importing-external-data.md +++ b/content/docs/user-guide/data-management/importing-external-data.md @@ -99,11 +99,13 @@ $ dvc stage add -n download_file \ scp user@example.com:/path/to/data.txt data.txt ``` -⚠️ DVC requires both SSH and SFTP access to work with remote SSH locations. -Check that you can connect both ways with tools like `ssh` and `sftp` -(GNU/Linux). + -> Note that your server's SFTP root might differ from its physical root (`/`). +DVC requires both SSH and SFTP access to work with SSH remote storage. Check +that you can connect both ways with tools like `ssh` and `sftp` (GNU/Linux). +Note that your server's SFTP root might differ from its physical root (`/`). + + diff --git a/content/docs/user-guide/data-management/managing-external-data.md b/content/docs/user-guide/data-management/managing-external-data.md index f33918e7d4..666ce04fe3 100644 --- a/content/docs/user-guide/data-management/managing-external-data.md +++ b/content/docs/user-guide/data-management/managing-external-data.md @@ -125,11 +125,13 @@ $ dvc stage add -d data.txt \ scp data.txt user@example.com:/data.txt ``` -⚠️ DVC requires both SSH and SFTP access to work with remote SSH locations. -Check that you can connect both ways with tools like `ssh` and `sftp` -(GNU/Linux). + -> Note that your server's SFTP root might differ from its physical root (`/`). +DVC requires both SSH and SFTP access to work with SSH remote storage. 
Check +that you can connect both ways with tools like `ssh` and `sftp` (GNU/Linux). +Note that your server's SFTP root might differ from its physical root (`/`). + + diff --git a/content/docs/user-guide/data-management/remote-storage/amazon-s3.md b/content/docs/user-guide/data-management/remote-storage/amazon-s3.md index 5376c9f3f1..e0612ea8eb 100644 --- a/content/docs/user-guide/data-management/remote-storage/amazon-s3.md +++ b/content/docs/user-guide/data-management/remote-storage/amazon-s3.md @@ -58,8 +58,8 @@ in your project directory locally. ## Custom authentication -If you don't have the AWS CLI configured in your machine or if you want to -change the auth method for some reason. +Use these configuration options if you don't have the AWS CLI setup in your +environment, if you want to override those values, or to change the auth method. diff --git a/content/docs/user-guide/data-management/remote-storage/index.md b/content/docs/user-guide/data-management/remote-storage/index.md index 990eb2d2df..aa0bbcf255 100644 --- a/content/docs/user-guide/data-management/remote-storage/index.md +++ b/content/docs/user-guide/data-management/remote-storage/index.md @@ -112,7 +112,7 @@ team. ### Self-hosted / On-premises -- [SSH]; Like `scp` +- [SSH] & SFTP (like `scp`) - [HDFS] & [WebHDFS] - [HTTP] - [WebDAV] @@ -125,7 +125,7 @@ team. ## File systems (local remotes) - + Not related to the `--local` option of `dvc remote` and `dvc config`! diff --git a/content/docs/user-guide/data-management/remote-storage/ssh.md b/content/docs/user-guide/data-management/remote-storage/ssh.md index 32855b2d9a..03748244e1 100644 --- a/content/docs/user-guide/data-management/remote-storage/ssh.md +++ b/content/docs/user-guide/data-management/remote-storage/ssh.md @@ -1,117 +1,118 @@ -# SSH +# SSH and SFTP -Start with `dvc remote add` to define the remote: +
-```cli -$ dvc remote add -d myremote ssh://user@example.com/path -``` +### Click to learn about SSH and SFTP. + +[SSH] (Secure Shell) is a protocol that uses encryption to secure a connection +with a remote computer, which lets you safely transfer files to and from it +(like [`scp`]), among other features. Other operations can be used on top of +SSH, such as [SFTP] (SSH File Transfer Protocol) for secure file transfers. -⚠️ DVC requires both SSH and SFTP access to work with remote SSH locations. -Check that you can connect both ways with tools like `ssh` and `sftp` -(GNU/Linux). +[SSH]: https://www.ssh.com/academy/ssh +[SFTP]: https://www.ssh.com/academy/ssh/sftp-ssh-file-transfer-protocol +[`scp`]: https://www.ssh.com/academy/ssh/scp -> Note that the server's SFTP root might differ from its physical root (`/`). +
-## Configuration parameters +DVC will act as an SSH/SFTP client, which means that the remote storage should +be located in an [SSH server]. Use `dvc remote add` to define the remote by +setting a name and valid [SSH URL] (which may include some auth info. like user +name or port): + +```cli +$ dvc remote add -d myremote ssh://user@example.com:2222/path +``` -> If any values given to the parameters below contain sensitive user info, add -> them with the `--local` option, so they're written to a Git-ignored config -> file. +[ssh server]: https://www.ssh.com/academy/ssh/server +[SSH URL]: https://tools.ietf.org/id/draft-salowey-secsh-uri-00.html#sshsyntax -- `url` - remote location, in a regular - [SSH format](https://tools.ietf.org/id/draft-salowey-secsh-uri-00.html#sshsyntax). - Note that this can already include the `user` parameter, embedded into the - URL: + - ```cli - $ dvc remote modify myremote url \ - ssh://user@example.com:1234/path - ``` +DVC requires both SSH and SFTP access to work with SSH remote storage. Check +that you can connect both ways with tools like [`ssh`] and `sftp` (GNU/Linux). +Note that your server's SFTP root might differ from its physical root (`/`). - ⚠️ DVC requires both SSH and SFTP access to work with remote SSH locations. - Please check that you are able to connect both ways with tools like `ssh` and - `sftp` (GNU/Linux). +[`ssh`]: https://www.ssh.com/academy/ssh/command - > Note that your server's SFTP root might differ from its physical root (`/`). + -- `user` - user name to access the remote: +By default, authentication credentials (user name, password or private key, +etc.) not found in the URL are loaded from [SSH configuration]. You can also set +them directly with DVC. - ```cli - $ dvc remote modify --local myremote user myuser - ``` +[ssh configuration]: https://www.ssh.com/academy/ssh/config - The order in which DVC picks the user name: +## Custom authentication - 1. 
`user` parameter set with this command (found in `.dvc/config`); - 2. User defined in the URL (e.g. `ssh://user@example.com/path`); - 3. User defined in the SSH config file (e.g. `~/.ssh/config`) for this host - (URL); - 4. Current system user +2 parameters that are commonly included in an SSH URL are user name and +sometimes port. These can be set (or overridden) as follows: -- `port` - port to access the remote. +```cli +$ dvc remote modify myremote user myuser +$ dvc remote modify myremote port 2222 +``` - ```cli - $ dvc remote modify myremote port 2222 - ``` +Order in which DVC picks these values when defined in multiple places: - The order in which DVC decide the port number: +1. Value set in these `user`/`port` params (DVC-specific config) +2. User/port embedded in the `url`, if any (e.g. `ssh://user@example.com:2222`) +3. `User`/`Port` defined for the host in SSH config +4. Default values: Current system user; Standard SSH port 22 - 1. `port` parameter set with this command (found in `.dvc/config`); - 2. Port defined in the URL (e.g. `ssh://example.com:1234/path`); - 3. Port defined in the SSH config file (e.g. `~/.ssh/config`) for this host - (URL); - 4. Default SSH port 22 + -- `keyfile` - path to private key to access the remote. +The `dvc remote modify --local` flag is needed to write sensitive user info to a +Git-ignored config file (`.dvc/config.local`) so that no secrets are leaked +through Git. See +[Configuration](/doc/user-guide/project-structure/configuration#config-file-locations). - ```cli - $ dvc remote modify --local myremote keyfile /path/to/keyfile - ``` + -- `password` - a password to access the remote +Using a private key is usually the recommended way to authenticate an SSH +connection, and it should be saved in a key file. You can set its path as shown +below. Often these require a passphrase to use as well: You can set up DVC to +ask for it each time, or set it directly. 
- ```cli - $ dvc remote modify --local myremote password mypassword - ``` +```cli +$ dvc remote modify --local myremote keyfile /path/to/keyfile +# and (if needed) +$ dvc remote modify myremote ask_passphrase true +# or +$ dvc remote modify --local myremote passphrase mypassphrase +``` -- `ask_password` - ask for a password to access the remote. +Another popular way to authenticate an SSH connection is with a simple password. +It can be set directly or you can set up DVC to ask for it when needed: - ```cli - $ dvc remote modify myremote ask_password true - ``` +```cli +$ dvc remote modify --local myremote password mypassword +# or +$ dvc remote modify myremote ask_password true +``` -- `passphrase` - a private key passphrase to access the remote +## More configuration parameters - ```cli - $ dvc remote modify --local myremote passphrase mypassphrase - ``` +- `url` - modify the remote location ([scroll up](#ssh-and-sftp) for details) -- `ask_passphrase` - ask for a private key passphrase to access the remote. +- `allow_agent` - whether to use [SSH agents] (`true` by default). Setting this + to `false` is useful when `ssh-agent` is causing problems, e.g. "No existing + session" errors. - ```cli - $ dvc remote modify myremote ask_passphrase true - ``` +- `gss_auth` - use Generic Security Service auth if available on host (for + example, [with Kerberos]). `false` by default -- `gss_auth` - use Generic Security Services authentication if available on host - (for example, - [with kerberos](https://en.wikipedia.org/wiki/Generic_Security_Services_Application_Program_Interface#Relationship_to_Kerberos)). - Using this param requires `paramiko[gssapi]`, which is currently only - supported by our pip package, and could be installed with - `pip install 'dvc[ssh_gssapi]'`. Other packages (Conda, Windows, and macOS - PKG) do not support it. 
+ - ```cli - $ dvc remote modify myremote gss_auth true - ``` + Using GSS requires `paramiko[gssapi]`, which is only supported currently by + the DVC pip package (installed with `pip install 'dvc[ssh_gssapi]'`). -- `allow_agent` - whether to use [SSH agents](https://www.ssh.com/ssh/agent) - (`true` by default). Setting this to `false` is useful when `ssh-agent` is - causing problems, such as a "No existing session" error: + - ```cli - $ dvc remote modify myremote allow_agent false - ``` +[ssh agents]: https://www.ssh.com/academy/ssh/agent +[with kerberos]: + https://en.wikipedia.org/wiki/Generic_Security_Services_Application_Program_Interface#Relationship_to_Kerberos