Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* guide: draft structure of Data Mgmt and some updates around the topic in existing docs
* guide: full text for draft intro to DM
* guide: hide cloud versioning info per #4042 (review)
* guide: clarify Data Mgmt parts and add prospective figure titles
* guide: add figure drafts to Data Mgmt
* guide: SCM->VC (Data Mgmt)
* guide: update 2 figs and add 1 more (Data Mgmt)
* guide: roll back unrelated changes per #4042 (review)
* guide: mention clouds first (DM) and and update fig. 1 per #4042 (review)
* guide: flatten DM index per #4042 (review)
* guide: udpates to DM/ DV moved from #4053 (review)
* guide: add DM/ Data Versioning page per #4042 (comment)
* guide: update outdated link
* guide: revert more unrelatedly chaqnged files per #4042 (review)
* guide: remove unused ref link
* guide: DM/ Remote Storage (not just Setup) and and some links from cmd refs and avoid term "data remote" and some admons nearby...
* guide: remove a comment
* guide: draft for DM/ Remote Storage content
* ref: expand config.remote and link to/from Remotes guide
* ref: fix remote config file examples
* guide: complete Remote Config section and and add Project config section to DM/ DV guide
* guide: complete list of supported storage types
* guide: clarify `remote modify` phrase in in the Remote config section of DM/ Remote Storage
* Update content/docs/user-guide/data-management/data-versioning.md
* guide: update versioning config per #4058 (review)
* guide: don't call remote storage "additional" here (in the DM/ Remote Storage guide) per #4058 (review) Co-authored-by: Dave Berenbaum <[email protected]>
* guide: pull -> download (DM/ RS intro)
* guide: remove "optional" from Remote Storage nav & title per #4058 (review)
* guide: splits and notes around Data Mgmt index page rel. #4042 (comment)
* guide: Data Mgmt intro + note updates
* guide: draft of all contents + + remove comments
* guide: small impros to Data Mgmt in prep for #4042 (review)
* guide: rewrite Data Mgmt index in before/after form per #4042 (review)
* guide: add draft figure for Data Mgmt
* guide: simplify/refocus data mgmt index per #4042 (review)
* work around commented header bug
* guide: drop DM/ DV page
* guide: rewrite DM intro and - hide benefits (for now) - remove codification comment block
* guide: use DM table instead of figure for now
* guide: rewrite Data Mgmt story
* guide: add draft figures to Data Mgmt
* guide: simplify Data Mgmt story and benefits
* guide: remove unused images (DM)
* guide: update Data Mgmt figures (v1)
* guide: rewrite text of Data Mgmt index
* guide: update Data Mgmt figures
* guide: iterate on Data Mgmt again
* guide: update Data Mgmt figs
* guide: more supporting info about Data Mgmt
* guide: update figures (much more concrete) and and matching text updates
* guide: edits to How it works (Data Mgmt)
* guide: update Data Mgmt figures Rel. #4042 (comment)
* guide: emphaisze dataset versions in UG fig 1 Rel. #4042 (comment)
* guide: update Data Mgmt figures (with notes), expand img captions, and update text accordingly.
* guide: more updates to text and figure styles, esp. to the first half and comment some stuff out (temporary)
* guide: update figures and text (Data Mgmt) ... Using a tabs toggle for the 2nd fig.
* guide: Data Management text (section 1) finalized for this version of figures
* guide: Data Management (main text) finalized for this version of figures
* guide: Data Management (secondary text) pending diagram and code sample(s)
* guide: add DVC data mgmt technical diagram & dummy sample CLI blocks
* guide: update Data Mgmt text
* guide: udpate text and 2nd figure (Data Mgmt)
* guide: draft 2nd and 3rd figures
* guide: rewrite Data Mgmt/ How it works & and Benefits/ Tradeoffs Probably still unfinished... Missing more data versioning info? See HTML comments.
* guide: update drafts of Data Mgmt figures 2, 3
* guide: Data Mgmt improvements and hide the benefits list for now
* guide: separate from Data Mgmt work Rel. #4042
* guide: remove hidden Storage locations page for now
* guide: small cleanup of Remote storage page

Co-authored-by: Dave Berenbaum <[email protected]>
Co-authored-by: rogermparent <[email protected]>
1 parent 3b4dda5, commit 4d4cbd4
Showing 11 changed files with 205 additions and 50 deletions.
113 changes: 113 additions & 0 deletions
content/docs/user-guide/data-management/remote-storage.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,113 @@ | ||
# Remote Storage | ||
|
||
_DVC remotes_ provide optional storage to back up and share your data and ML | ||
models. For example, you can download data artifacts created by colleagues | ||
without spending time and resources to regenerate them locally. See `dvc push` | ||
and `dvc pull`. | ||
|
||
<admon type="info"> | ||
|
||
DVC remotes are similar to [Git remotes], but for <abbr>cached</abbr> data. | ||
|
||
[git remotes]: https://git-scm.com/book/en/v2/Git-Basics-Working-with-Remotes | ||
|
||
</admon> | ||
|
||
This is somewhat like GitHub or GitLab providing hosting for source code | ||
repositories. However, DVC does not provide or recommend a specific storage | ||
service. Instead, it adopts a bring-your-own-platform approach, supporting a | ||
wide variety of [storage types](#supported-storage-types). | ||
|
||
The main uses of remote storage are: | ||
|
||
- Synchronize DVC-tracked data (previously <abbr>cached</abbr>). | ||
- Centralize or distribute large file storage for sharing and collaboration. | ||
- Back up different versions of your data and models. | ||
- Save space in your working environment (by deleting pushed files/directories). | ||
|
||
## Configuration | ||
|
||
You can set up one or more remote storage locations, mainly with the | ||
`dvc remote add` and `dvc remote modify` commands. These read and write to the | ||
[`remote`] section of the project's configuration file (`.dvc/config`), which | ||
you can also edit manually. | ||
|
||
Typically, you'll first register a DVC remote by adding its name and URL (or | ||
file path), e.g.: | ||
|
||
```cli | ||
$ dvc remote add mybucket s3://my-bucket | ||
``` | ||
|
||
Then, you'll usually need to configure the remote's authentication credentials | ||
or other properties. For example: | ||
|
||
```cli | ||
$ dvc remote modify --local \ | ||
mybucket credentialpath ~/.aws/alt | ||
$ dvc remote modify mybucket connect_timeout 300 | ||
``` | ||
|
||
<admon type="warn"> | ||
|
||
Make sure to use the `--local` flag when writing secrets to configuration. This | ||
creates a second config file at `.dvc/config.local` that is ignored by Git, so | ||
your secrets stay out of the repository. See `dvc config` for more info. | ||
|
||
This also means each copy of the <abbr>DVC repository</abbr> may have to | ||
re-configure remote storage authentication. | ||
|
||
</admon> | ||
|
||
<details> | ||
|
||
### Click to see the resulting config files. | ||
|
||
```ini | ||
# .dvc/config | ||
['remote "mybucket"'] | ||
url = s3://my-bucket | ||
connect_timeout = 300 | ||
``` | ||
|
||
```ini | ||
# .dvc/config.local | ||
['remote "mybucket"'] | ||
credentialpath = ~/.aws/alt | ||
``` | ||
|
||
```ini | ||
# .gitignore | ||
.dvc/config.local | ||
``` | ||
|
||
</details> | ||
|
||
Finally, you can `git commit` the changes to share the general configuration of | ||
your remote (`.dvc/config`) via the Git repo. | ||
|
||
[`remote`]: /doc/command-reference/config#remote | ||
|
||
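The way `.dvc/config` and `.dvc/config.local` combine can be sketched in a few lines. This is only an illustration of the layering idea (local values are applied on top of the project values), using Python's standard `configparser` on the example file contents from this section. DVC's own parser may behave differently, and the helper name `effective_remote_config` is hypothetical.

```python
import configparser

# File contents copied from the example config files in this section.
PROJECT_CONFIG = """\
['remote "mybucket"']
url = s3://my-bucket
connect_timeout = 300
"""

LOCAL_CONFIG = """\
['remote "mybucket"']
credentialpath = ~/.aws/alt
"""


def effective_remote_config(name, *layers):
    """Merge config layers in order; later layers override earlier ones.

    Hypothetical helper for illustration only, not part of DVC.
    """
    parser = configparser.ConfigParser(interpolation=None)
    for text in layers:
        parser.read_string(text)
    # configparser keeps the single quotes as part of the section name.
    section = "'remote \"{}\"'".format(name)
    return dict(parser[section])


settings = effective_remote_config("mybucket", PROJECT_CONFIG, LOCAL_CONFIG)
for key, value in settings.items():
    print(f"{key} = {value}")
```

Only the first layer (`.dvc/config`) is meant to be committed to Git, which is why a fresh clone would see `url` and `connect_timeout` but not `credentialpath`.
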
## Supported storage types | ||
|
||
> See more [details](/doc/command-reference/remote/add#supported-storage-types). | ||
### Cloud providers | ||
|
||
- Amazon S3 (AWS) | ||
- S3-compatible storage, e.g. MinIO | ||
- Microsoft Azure Blob Storage | ||
- Google Drive | ||
- Google Cloud Storage (GCP) | ||
- Aliyun OSS | ||
|
||
### Self-hosted / On-premises | ||
|
||
- SSH servers (like `scp`) | ||
- HDFS & WebHDFS | ||
- HTTP | ||
- WebDAV | ||
- Local directories or mounted drives (like `rsync`) | ||
> Includes network resources e.g. network-attached storage (NAS) or other | ||
> external devices |
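
As a rough illustration of how a remote's URL relates to the storage types listed above, the sketch below maps URL schemes to list entries. The scheme table is an assumption made for this example and should be checked against the `dvc remote add` reference; `storage_type` is a hypothetical helper, not a DVC function.

```python
from urllib.parse import urlparse

# Assumed scheme-to-storage mapping, for illustration only.
SCHEME_TO_STORAGE = {
    "s3": "Amazon S3 or S3-compatible",
    "azure": "Microsoft Azure Blob Storage",
    "gdrive": "Google Drive",
    "gs": "Google Cloud Storage",
    "oss": "Aliyun OSS",
    "ssh": "SSH server",
    "hdfs": "HDFS",
    "webhdfs": "WebHDFS",
    "http": "HTTP",
    "https": "HTTP",
    "webdav": "WebDAV",
    "webdavs": "WebDAV",
}


def storage_type(url):
    """Guess the storage type of a remote from its URL or file path."""
    scheme = urlparse(url).scheme.lower()
    if not scheme:
        # No scheme: treat the value as a plain file path.
        return "Local directory or mounted drive"
    return SCHEME_TO_STORAGE.get(scheme, "unknown")


print(storage_type("s3://my-bucket"))
print(storage_type("/mnt/nas/dvc-store"))
```

Note that this simple check would misread Windows paths such as `C:\data` as having the scheme `c`, so a real implementation would need extra handling for them.
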