guide: (better) SSH remotes #4384

Merged: 7 commits merged on Mar 16, 2023
Changes from 3 commits
2 changes: 1 addition & 1 deletion content/docs/sidebar.json
@@ -137,7 +137,7 @@
"slug": "aliyun-oss"
},
{
"label": "SSH",
"label": "SSH & SFTP",
"slug": "ssh"
},
{
@@ -99,11 +99,13 @@ $ dvc stage add -n download_file \
scp user@example.com:/path/to/data.txt data.txt
```

⚠️ DVC requires both SSH and SFTP access to work with remote SSH locations.
Check that you can connect both ways with tools like `ssh` and `sftp`
(GNU/Linux).
<admon type="warn">

> Note that your server's SFTP root might differ from its physical root (`/`).
DVC requires both SSH and SFTP access to work with SSH remote storage. Check
that you can connect with both protocols, using tools like `ssh` and `sftp`
(GNU/Linux).
Note that your server's SFTP root might differ from its physical root (`/`).

</admon>

</details>

@@ -125,11 +125,13 @@ $ dvc stage add -d data.txt \
scp data.txt user@example.com:/data.txt
```

⚠️ DVC requires both SSH and SFTP access to work with remote SSH locations.
Check that you can connect both ways with tools like `ssh` and `sftp`
(GNU/Linux).
<admon type="warn">

> Note that your server's SFTP root might differ from its physical root (`/`).
DVC requires both SSH and SFTP access to work with SSH remote storage. Check
that you can connect with both protocols, using tools like `ssh` and `sftp`
(GNU/Linux).
Note that your server's SFTP root might differ from its physical root (`/`).

</admon>

</details>

@@ -58,8 +58,8 @@ in your project directory locally.

## Custom authentication

If you don't have the AWS CLI configured in your machine or if you want to
change the auth method for some reason.
Use these configuration options if you don't have the AWS CLI set up in your
environment, or if you want to override its values or change the auth method.
Comment on lines 59 to +62 (Contributor, Author):


Unrelated change, sorry.


<admon type="warn">

@@ -112,7 +112,7 @@ team.

### Self-hosted / On-premises

- [SSH]; Like `scp`
- [SSH] & SFTP (like `scp`)
- [HDFS] & [WebHDFS]
- [HTTP]
- [WebDAV]
@@ -125,7 +125,7 @@ team.

## File systems (local remotes)

<admon type="tip">
<admon type="info">

Not related to the `--local` option of `dvc remote` and `dvc config`!

162 changes: 81 additions & 81 deletions content/docs/user-guide/data-management/remote-storage/ssh.md
@@ -1,117 +1,117 @@
# SSH
# SSH and SFTP

<!--
## SSH
-->

Start with `dvc remote add` to define the remote:
<details>

```cli
$ dvc remote add -d myremote ssh://user@example.com/path
```
### Click to learn about SSH and SFTP.

[SSH] (Secure Shell) is a protocol that uses encryption to secure a connection
with a remote computer, which lets you safely transfer files to and from it
(for example with [`scp`]), among other features. Other protocols can run on
top of SSH, such as [SFTP] (SSH File Transfer Protocol), a secure alternative
to FTP (File Transfer Protocol).

⚠️ DVC requires both SSH and SFTP access to work with remote SSH locations.
Check that you can connect both ways with tools like `ssh` and `sftp`
(GNU/Linux).
[SSH]: https://www.ssh.com/academy/ssh
[SFTP]: https://www.ssh.com/academy/ssh/sftp-ssh-file-transfer-protocol
[`scp`]: https://www.ssh.com/academy/ssh/scp

> Note that the server's SFTP root might differ from its physical root (`/`).
</details>

## Configuration parameters
DVC acts as an SSH/SFTP client, so the remote storage must be located on an
[SSH server]. Use `dvc remote add` to define the remote by setting a name and a
valid [SSH URL] (which may include some auth info, like the user name or port):

```cli
$ dvc remote add -d myremote ssh://user@example.com:2222/path
```

> If any values given to the parameters below contain sensitive user info, add
> them with the `--local` option, so they're written to a Git-ignored config
> file.
[ssh server]: https://www.ssh.com/academy/ssh/server
[SSH URL]: https://tools.ietf.org/id/draft-salowey-secsh-uri-00.html#sshsyntax
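To see how the parts of an SSH URL map to the settings discussed on this page,
here is a small Python illustration using the standard library's
`urllib.parse.urlsplit`. It only shows the URL structure and is not how DVC
parses remotes internally; `user`, `example.com`, `2222` and `/path` are
placeholder values.

```python
from urllib.parse import urlsplit

# Placeholder URL in the same shape as the `dvc remote add` example above.
url = "ssh://user@example.com:2222/path"

parts = urlsplit(url)
print(parts.scheme)    # protocol: ssh
print(parts.username)  # user name embedded in the URL: user
print(parts.hostname)  # SSH server: example.com
print(parts.port)      # port embedded in the URL (as an int): 2222
print(parts.path)      # location of the storage on the server: /path
```

If the URL has no user name or port (e.g. `ssh://example.com/path`),
`parts.username` and `parts.port` are `None`, so those values have to come from
somewhere else, such as the DVC config or the SSH config.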

- `url` - remote location, in a regular
[SSH format](https://tools.ietf.org/id/draft-salowey-secsh-uri-00.html#sshsyntax).
Note that this can already include the `user` parameter, embedded into the
URL:
<admon type="warn">

```cli
$ dvc remote modify myremote url \
ssh://user@example.com:1234/path
```
DVC requires both SSH and SFTP access to work with SSH remote storage. Check
that you can connect with both protocols, using tools like [`ssh`] and `sftp`
(GNU/Linux).
Note that your server's SFTP root might differ from its physical root (`/`).

⚠️ DVC requires both SSH and SFTP access to work with remote SSH locations.
Please check that you are able to connect both ways with tools like `ssh` and
`sftp` (GNU/Linux).
[`ssh`]: https://www.ssh.com/academy/ssh/command

> Note that your server's SFTP root might differ from its physical root (`/`).
</admon>
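A minimal way to run both checks from a script is sketched below in Python with
`subprocess`, calling the OpenSSH `ssh` and `sftp` clients. It assumes OpenSSH
is installed and that key- or agent-based login works (`BatchMode=yes` makes
both clients fail instead of prompting for a password). `check_commands` and
`run_checks` are hypothetical helpers, and the user, host and port are
placeholders. Listing `/` over both protocols makes it easy to spot an SFTP
root that differs from the physical root.

```python
import subprocess

# Both clients: fail instead of prompting for a password, give up after 10 s.
COMMON_OPTS = ["-o", "BatchMode=yes", "-o", "ConnectTimeout=10"]


def check_commands(user: str, host: str, port: int = 22) -> dict:
    """Hypothetical helper: build the argv lists for the SSH and SFTP checks."""
    target = f"{user}@{host}"
    return {
        # ssh takes the port with a lowercase -p and runs `ls /` remotely.
        "ssh": ["ssh", "-p", str(port), *COMMON_OPTS, target, "ls", "/"],
        # sftp uses an uppercase -P; "-b -" reads batch commands from stdin.
        "sftp": ["sftp", "-P", str(port), *COMMON_OPTS, "-b", "-", target],
    }


def run_checks(user: str, host: str, port: int = 22) -> None:
    """Hypothetical helper: run both checks and print what each sees as `/`."""
    cmds = check_commands(user, host, port)
    results = {
        "SSH": subprocess.run(cmds["ssh"], capture_output=True, text=True),
        "SFTP": subprocess.run(
            cmds["sftp"], input="ls /\n", capture_output=True, text=True
        ),
    }
    for name, res in results.items():
        status = "ok" if res.returncode == 0 else f"failed: {res.stderr.strip()}"
        print(f"{name} {status}")
        print(res.stdout)


# Example (placeholder values; needs OpenSSH and key/agent-based login):
# run_checks("user", "example.com", 2222)
```

Note the different port flags: `ssh` uses `-p`, while `sftp` (like `scp`) uses
`-P`.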

- `user` - user name to access the remote:
By default, connection settings and credentials not found in the URL (such as
the user name, port, or private key file) are loaded from your
[SSH configuration]. You can also set them directly with DVC.

```cli
$ dvc remote modify --local myremote user myuser
```
[ssh configuration]: https://www.ssh.com/academy/ssh/config

The order in which DVC picks the user name:
## Custom authentication

1. `user` parameter set with this command (found in `.dvc/config`);
2. User defined in the URL (e.g. `ssh://user@example.com/path`);
3. User defined in the SSH config file (e.g. `~/.ssh/config`) for this host
(URL);
4. Current system user
Two parameters commonly included in an SSH URL are the user name and,
sometimes, the port. These can be set (or overridden) as follows:

- `port` - port to access the remote.
```cli
$ dvc remote modify myremote user myuser
$ dvc remote modify myremote port 2222
```

```cli
$ dvc remote modify myremote port 2222
```
The order in which DVC picks these values when they are defined in multiple
places:

The order in which DVC decides the port number:
1. Value set in these `user`/`port` params (DVC-specific config)
2. User/port embedded in the `url`, if any (e.g. `ssh://user@example.com:2222`)
3. `User`/`Port` defined for the host in SSH config
4. Default values: the current system user and the standard SSH port (22)

1. `port` parameter set with this command (found in `.dvc/config`);
2. Port defined in the URL (e.g. `ssh://example.com:1234/path`);
3. Port defined in the SSH config file (e.g. `~/.ssh/config`) for this host
(URL);
4. Default SSH port 22
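The precedence rules above boil down to "the first value found wins". The
Python sketch below mirrors that lookup order for the user name and port. It is
an illustration, not DVC's actual implementation: `resolve_user_port` and its
arguments (values already read from the DVC config and from the SSH config for
the host) are hypothetical.

```python
import getpass
from typing import Optional, Tuple
from urllib.parse import urlsplit

DEFAULT_SSH_PORT = 22


def first_set(*candidates):
    """Return the first candidate that is not None (or None if all are)."""
    return next((c for c in candidates if c is not None), None)


def resolve_user_port(
    url: str,
    dvc_user: Optional[str] = None,  # `user` param set with `dvc remote modify`
    dvc_port: Optional[int] = None,  # `port` param set with `dvc remote modify`
    ssh_config_user: Optional[str] = None,  # `User` for the host in ~/.ssh/config
    ssh_config_port: Optional[int] = None,  # `Port` for the host in ~/.ssh/config
) -> Tuple[str, int]:
    """Hypothetical helper mirroring the documented lookup order."""
    parts = urlsplit(url)
    # 1. DVC config, 2. URL, 3. SSH config, 4. current system user.
    user = first_set(dvc_user, parts.username, ssh_config_user)
    if user is None:
        user = getpass.getuser()  # only looked up when nothing else is set
    # 1. DVC config, 2. URL, 3. SSH config, 4. standard SSH port.
    port = first_set(dvc_port, parts.port, ssh_config_port, DEFAULT_SSH_PORT)
    return user, port
```

For example, `resolve_user_port("ssh://user@example.com:2222/path", dvc_port=1234)`
returns `("user", 1234)`: the port set in the DVC config overrides the one in
the URL, while the user name still comes from the URL.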
<admon type="warn">

- `keyfile` - path to private key to access the remote.
The `dvc remote modify --local` flag is needed to write sensitive user info to a
Git-ignored config file (`.dvc/config.local`) so that no secrets are leaked
through Git. See `dvc config`.

```cli
$ dvc remote modify --local myremote keyfile /path/to/keyfile
```
</admon>
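DVC applies `.dvc/config.local` on top of `.dvc/config`, so values written with
`--local` still take effect even though they are never committed. The Python
sketch below imitates that layering with the standard `configparser` module. The
file contents are made-up examples, `merged_config` is a hypothetical helper,
and DVC's own config handling (which also involves global and system levels) is
more involved.

```python
import configparser

# Made-up contents: committed project config vs. Git-ignored local config.
PROJECT_CONFIG = """
[core]
remote = myremote
['remote "myremote"']
url = ssh://user@example.com:2222/path
"""

LOCAL_CONFIG = """
['remote "myremote"']
keyfile = /path/to/keyfile
passphrase = mypassphrase
"""


def merged_config(*texts: str) -> dict:
    """Hypothetical helper: merge config texts; later texts override earlier."""
    merged: dict = {}
    for text in texts:
        parser = configparser.ConfigParser(interpolation=None)
        parser.read_string(text)
        for section in parser.sections():
            merged.setdefault(section, {}).update(parser[section])
    return merged


config = merged_config(PROJECT_CONFIG, LOCAL_CONFIG)
remote = config["'remote \"myremote\"'"]
print(remote["url"])      # value from the committed .dvc/config
print(remote["keyfile"])  # value from the Git-ignored .dvc/config.local
```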

- `password` - a password to access the remote
Using a private key is usually the recommended way to authenticate an SSH
connection. The key is saved in a key file, whose path you can set as shown
below. Keys often require a passphrase as well: you can set up DVC to ask for it
each time, or set it directly.

```cli
$ dvc remote modify --local myremote password mypassword
```
```cli
$ dvc remote modify --local myremote keyfile /path/to/keyfile
# and (if needed)
$ dvc remote modify myremote ask_passphrase true
# or
$ dvc remote modify --local myremote passphrase mypassphrase
```

- `ask_password` - ask for a password to access the remote.
Another common way to authenticate an SSH connection is with a password. You
can set it directly, or set up DVC to ask for it when needed:

```cli
$ dvc remote modify myremote ask_password true
```
```cli
$ dvc remote modify --local myremote password mypassword
# or
$ dvc remote modify myremote ask_password true
```
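Conceptually, each secret (password or key passphrase) either comes from a
stored value or from an interactive prompt. The Python sketch below illustrates
that choice with the standard `getpass` module. `resolve_secret` is a
hypothetical helper, and giving a stored value priority over the `ask_*` flag
is an assumption made for the example, not documented DVC behavior.

```python
import getpass
from typing import Callable, Optional


def resolve_secret(
    stored: Optional[str],
    ask: bool,
    prompt: str,
    prompt_fn: Callable[[str], str] = getpass.getpass,
) -> Optional[str]:
    """Hypothetical helper: pick a password or passphrase for the connection.

    Assumption: a stored value (e.g. `password` or `passphrase` set with
    `--local`) wins; otherwise prompt only if `ask_password`/`ask_passphrase`
    is true; otherwise return None (no secret configured).
    """
    if stored is not None:
        return stored
    if ask:
        return prompt_fn(prompt)  # getpass does not echo what is typed
    return None
```

In an interactive session, `resolve_secret(None, ask=True, prompt="Passphrase: ")`
would prompt on the terminal without echoing the input.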

- `passphrase` - a private key passphrase to access the remote
## More configuration parameters

```cli
$ dvc remote modify --local myremote passphrase mypassphrase
```
- `url` - modify the remote location ([scroll up](#ssh-and-sftp) for details)

- `ask_passphrase` - ask for a private key passphrase to access the remote.
- `allow_agent` - whether to use [SSH agents] (`true` by default). Setting this
to `false` is useful when `ssh-agent` is causing problems, e.g. "No existing
session" errors.

```cli
$ dvc remote modify myremote ask_passphrase true
```
- `gss_auth` - use Generic Security Services (GSS) authentication if available
on the host (for example, [with Kerberos]). `false` by default.

- `gss_auth` - use Generic Security Services authentication if available on host
(for example,
[with kerberos](https://en.wikipedia.org/wiki/Generic_Security_Services_Application_Program_Interface#Relationship_to_Kerberos)).
Using this param requires `paramiko[gssapi]`, which is currently only
supported by our pip package, and could be installed with
`pip install 'dvc[ssh_gssapi]'`. Other packages (Conda, Windows, and macOS
PKG) do not support it.
<admon type="warn">

```cli
$ dvc remote modify myremote gss_auth true
```
Using GSS requires `paramiko[gssapi]`, which is currently only supported by
the DVC pip package (installed with `pip install 'dvc[ssh_gssapi]'`).

- `allow_agent` - whether to use [SSH agents](https://www.ssh.com/ssh/agent)
(`true` by default). Setting this to `false` is useful when `ssh-agent` is
causing problems, such as a "No existing session" error:
</admon>

```cli
$ dvc remote modify myremote allow_agent false
```
[ssh agents]: https://www.ssh.com/academy/ssh/agent
[with kerberos]:
https://en.wikipedia.org/wiki/Generic_Security_Services_Application_Program_Interface#Relationship_to_Kerberos