Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add file download description #5523

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from
Draft
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
30 changes: 24 additions & 6 deletions docs/working-with-files.md
Original file line number Diff line number Diff line change
Expand Up @@ -228,29 +228,47 @@ In general, you should not need to manually copy files, because Nextflow will au

## Remote files

Nextflow can work with many kinds of remote files and objects using the same interface as for local files. The following protocols are supported:
Nextflow works with many types of remote files and objects using the same interface as for local files. The following protocols are supported:

- HTTP(S) / FTP (`http://`, `https://`, `ftp://`)
- HTTP(S)/FTP (`http://`, `https://`, `ftp://`)
- Amazon S3 (`s3://`)
- Azure Blob Storage (`az://`)
- Google Cloud Storage (`gs://`)

To reference a remote file, simple specify the URL when opening the file:
Nextflow downloads remote files when tasks that reference them are created and they do not exist on the same filesystem as the work directory. When possible, standard libraries are used to download files. For example, HttpURLConnection is used for HTTP, and AWS Java SDK is used for S3. Implementations can be viewed under FileSystemProvider in the Nextflow codebase.

To reference a remote file, simply specify the URL when opening the file:

```nextflow
pdb = file('http://files.rcsb.org/header/5FID.pdb')
```

You can then access it as a local file as described previously:
It can then be accessed as a local file:

```nextflow
println pdb.text
```

By default, downloaded files are staged in a subdirectory of the work directory. The subdirectory is named using the prefix `stage-`, followed by a hash. For example, `stage-XXXXXXXX`.

<!---
Details of hash generation.
--->

Remote files are cached using the aforementioned hash. If multiple tasks request the same remote file at the same time, Nextflow will likely download a separate copy to separate folders.

<!---
Details of caching behavior.
--->

:::{note}
Not all operations are supported for all protocols. For example, writing and directory listing is not supported for HTTP(S) and FTP paths.
:::

:::{note}
Not all operations are supported for all protocols. In particular, writing and directory listing are not supported for HTTP(S) and FTP paths.
A custom process can be used to download a file into a task directory instead of using built-in remote file staging. To be staged by Nextflow, the file name must be provided to the process as a val input instead of a path input.
:::

:::{note}
Additional configuration may be required to work with cloud object storage (e.g. to authenticate with a private bucket). Refer to the respective page for each cloud storage provider for more information.
Additional configuration may be required to work with cloud object storage. For example, to authenticate with a private bucket. Refer to the respective page for each cloud storage provider for more information.
:::
Loading