guide: remote storage #4058
Merged
Changes from 38 commits
100 commits
7350938
guide: draft structure of Data Mgmt and
jorgeorpinel 203f6a6
guide: full text for draft intro to DM
jorgeorpinel 90eaa5d
Merge branch 'main' into guide/data-mgmt-flows
jorgeorpinel eb246bb
guide: hide cloud versioning info
jorgeorpinel a3687ec
guide: clarify Data Mgmt parts and
jorgeorpinel fad0bad
guide: add figure drafts to Data Mgmt
jorgeorpinel 4e3c3da
guide: SCM->VC (Data Mgmt)
jorgeorpinel 7f02c15
guide: update 2 figs and add 1 more (Data Mgmt)
jorgeorpinel f41d16e
Merge branch 'main' into guide/data-mgmt-flows
jorgeorpinel 3a9a045
Merge branch 'main' into guide/data-mgmt-flows
jorgeorpinel df40521
Merge branch 'main' into guide/data-mgmt-flows
jorgeorpinel adc13ee
Merge branch 'guide/data-mgmt-flows' into guide/data-mgmt/remote-config
jorgeorpinel c0b92f1
guide: roll back unrelated changes
jorgeorpinel 636872a
Merge branch 'guide/data-mgmt-flows' into guide/data-mgmt/remote-config
jorgeorpinel c2303c0
guide: mention clouds first (DM) and
jorgeorpinel 62997ab
guide: flatten DM index
jorgeorpinel fc74c53
guide: udpates to DM/ DV
jorgeorpinel 8c40a03
guide: add DM/ Data Versioning page
jorgeorpinel 1a8ca61
guide: update outdated link
jorgeorpinel 27be87f
guide: revert more unrelatedly chaqnged files
jorgeorpinel aaee7af
guide: remove unused ref link
jorgeorpinel dd99f21
Merge branch 'guide/data-mgmt-flows' into guide/data-mgmt/remote-config
jorgeorpinel 118e3eb
guide: DM/ Remote Storage (not just Setup) and
jorgeorpinel 24c331a
guide: remove a comment
jorgeorpinel ff85dcc
Merge branch 'guide/data-mgmt-flows' into guide/data-mgmt/remote-config
jorgeorpinel 266a8f7
guide: draft for DM/ Remote Storage content
jorgeorpinel b04f20a
ref: expand config.remote and link to/from Remotes guide
jorgeorpinel 1c77de4
ref: fix remote config file examples
jorgeorpinel 8e7c320
guide: complete Remote Config section and
jorgeorpinel 9b904f5
guide: complete list of supported storage types
jorgeorpinel 3b5e520
guide: clarify `remote modify` phrase in
jorgeorpinel 73e2f55
Merge branch 'main' into guide/data-mgmt-flows
jorgeorpinel 7fc7fa3
Merge branch 'guide/data-mgmt-flows' into guide/data-mgmt/remote-config
jorgeorpinel ff7e666
Update content/docs/user-guide/data-management/data-versioning.md
c0026fc
guide: update versioning config
jorgeorpinel 71b599c
guide: don't call remote storage "additional" here
jorgeorpinel 9774855
guide: pull -> download (DM/ RS intro)
jorgeorpinel e5c6f13
guide: remove "optional" from Remote Storage nav & title
jorgeorpinel ec1af6d
Merge branch 'main' into guide/data-mgmt-flows
jorgeorpinel 2f31bb6
guide: splits and notes around Data Mgmt index page
jorgeorpinel a84c442
guide: Data Mgmt intro + note updates
jorgeorpinel ab55389
guide: draft of all contents +
jorgeorpinel 31d5288
Merge branch 'main' into guide/data-mgmt-flows
jorgeorpinel a13f989
Merge branch 'main' into guide/data-mgmt-flows
jorgeorpinel 601c99e
guide: small impros to Data Mgmt
jorgeorpinel a8bad84
guide: rewrite Data Mgmt index in before/after form
jorgeorpinel c8cc17b
guide: add draft figure for Data Mgmt
jorgeorpinel 3cb84cb
Merge branch 'main' into guide/data-mgmt-flows
jorgeorpinel a13cb0f
guide: simplify/refocus data mgmt index
jorgeorpinel e3ba70b
Merge branch 'main' into guide/data-mgmt-flows
jorgeorpinel c29d9ec
work around commented header bug
jorgeorpinel 875fba3
Merge branch 'main' into guide/data-mgmt-flows
jorgeorpinel 831ad1d
Merge branch 'main' into guide/data-mgmt-flows
jorgeorpinel 8ddda9c
guide: drop DM/ DV page
jorgeorpinel 28322e5
guide: rewrite DM intro and
jorgeorpinel 179d172
guide: use DM table instead of figure for now
jorgeorpinel d979a5e
Merge branch 'main' into guide/data-mgmt-flows
jorgeorpinel 74bc156
guide: rewrite Data Mgmt story
jorgeorpinel e138096
guide: add draft figures to Data Mgmt
jorgeorpinel f904038
guide: simplify Data Mgmt story and benefits
jorgeorpinel e1772ea
guide: remove unused images (DM)
jorgeorpinel cc0390e
guide: update Data Mgmt figures (v1)
jorgeorpinel 4ee3223
guide: rewrite text of Data Mgmt index
jorgeorpinel 149599b
Merge branch 'main' of github.com:iterative/dvc.org into guide/data-m…
rogermparent f2acb66
guide: update Data Mgmt figures
jorgeorpinel 723eb50
guide: iterate on Data Mgmt again
jorgeorpinel 4b67b64
guide: update Data Mgmt figs
jorgeorpinel 9eb7143
guide: more supporting info about Data Mgmt
jorgeorpinel e598839
Merge branch 'main' into guide/data-mgmt-flows
jorgeorpinel dd4466e
guide: update figures (much more concrete) and
jorgeorpinel d637179
guide: edits to How it works (Data Mgmt)
jorgeorpinel c007817
Merge branch 'main' into guide/data-mgmt-flows
jorgeorpinel 5a0fd57
Merge branch 'main' into guide/data-mgmt-flows
jorgeorpinel 3eb81ff
guide: update Data Mgmt figures
jorgeorpinel 98e73ff
Merge branch 'main' into guide/data-mgmt-flows
jorgeorpinel 67b1717
Merge branch 'main' into guide/data-mgmt-flows
jorgeorpinel f3af183
guide: emphaisze dataset versions in UG fig 1
jorgeorpinel 206ce77
Merge branch 'main' into guide/data-mgmt-flows
jorgeorpinel 075aaf3
guide: update Data Mgmt figures (with notes),
jorgeorpinel 7377500
guide: more updates to text and figure styles,
jorgeorpinel baf5b4c
guide: update figures and text (Data Mgmt) ...
jorgeorpinel fb35df5
Merge branch 'main' into guide/data-mgmt-flows
jorgeorpinel 4475f78
guide: Data Management text (section 1)
jorgeorpinel 20fbaae
guide: Data Management (main text)
jorgeorpinel 1da7b8a
guide: Data Management (secondary text)
jorgeorpinel 61e2865
Merge branch 'guide/data-mgmt-flows' of github.com:iterative/dvc.org …
jorgeorpinel ed63127
guide: add DVC data mgmt technical diagram &
jorgeorpinel 0109cf3
guide: update Data Mgmt text
jorgeorpinel 77330cc
Merge branch 'main' into guide/data-mgmt-flows
jorgeorpinel 956b03d
Merge branch 'main' into guide/data-mgmt-flows
jorgeorpinel 7152ad3
guide: udpate text and 2nd figure (Data Mgmt)
jorgeorpinel f29da1e
guide: draft 2nd and 3rd figures
jorgeorpinel 8f49a72
guide: rewrite Data Mgmt/ How it works &
jorgeorpinel f876c17
guide: update drafts of Data Mgmt figures 2, 3
jorgeorpinel ee3f721
guide: Data Mgmt improvements and
jorgeorpinel 061a918
Merge branch 'main' into guide/data-mgmt-flows
jorgeorpinel ac50c94
Merge branch 'guide/data-mgmt-flows' into guide/data-mgmt/remote-config
jorgeorpinel d781fdd
guide: separate from Data Mgmt work
jorgeorpinel a8acb25
guide: remove hidden Storage locations page for now
jorgeorpinel 882170a
guide: small cleanup of Remote storage page
jorgeorpinel
@@ -1,9 +1,13 @@
 # remote add

-Add a new [data remote](/doc/command-reference/remote).
+Register a new [DVC remote](/doc/user-guide/data-management/remote-storage).

-> Depending on your storage type, you may also need `dvc remote modify` to
-> provide credentials and/or configure other remote parameters.
+<admon type="tip">
+
+Depending on your storage type, you may also need `dvc remote modify` to provide
+credentials and/or configure other remote parameters.
+
+</admon>

 ## Synopsis

@@ -26,9 +30,9 @@ for the first remote):

 ```ini
 ['remote "myremote"']
-url = /tmp/dvcstore
+url = /tmp/dvcstore
 [core]
-remote = myremote
+remote = myremote
 ```

 > 💡 Default remotes are expected by commands that accept a `-r`/`--remote`
@@ -379,10 +383,10 @@ Using an absolute path (recommended):
 ```cli
 $ dvc remote add -d myremote /tmp/dvcstore
 $ cat .dvc/config
-...
-['remote "myremote"']
-url = /tmp/dvcstore
-...
+...
+['remote "myremote"']
+url = /tmp/dvcstore
+...
 ```

 > Note that the absolute path `/tmp/dvcstore` is saved as is.
@@ -393,10 +397,10 @@ directory, but saved **relative to the config file location**:
 ```cli
 $ dvc remote add -d myremote ../dvcstore
 $ cat .dvc/config
-...
-['remote "myremote"']
-url = ../../dvcstore
-...
+...
+['remote "myremote"']
+url = ../../dvcstore
+...
 ```

 > Note that `../dvcstore` has been resolved relative to the `.dvc/` dir,
@@ -423,10 +427,10 @@ The <abbr>project</abbr>'s config file (`.dvc/config`) now looks like this:

 ```ini
 ['remote "myremote"']
-url = s3://mybucket/path
-region = us-east-2
+url = s3://mybucket/path
+region = us-east-2
 [core]
-remote = myremote
+remote = myremote
 ```

 The list of remotes should now be:
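
A note on the relative-path hunk above (`@@ -393,10 +397,10 @@`): the stored value `../../dvcstore` follows from rewriting `../dvcstore` so that it is relative to the `.dvc/` directory instead of the project root. The sketch below only models that path arithmetic. It is not DVC's implementation, and the function name, the `/home/user/project` location, and the assumption that the command runs from the project root are all hypothetical.

```python
import posixpath


def stored_remote_url(project_root: str, url_arg: str, config_dir: str = ".dvc") -> str:
    """Model how a local remote path might appear in the config file.

    Hypothetical helper: absolute paths are kept as is, relative paths are
    re-expressed relative to the directory that holds the config file.
    """
    if posixpath.isabs(url_arg):
        return url_arg
    # Resolve the argument against the directory the command was run from
    # (assumed here to be the project root).
    target = posixpath.normpath(posixpath.join(project_root, url_arg))
    # Express the result relative to the config file's directory.
    return posixpath.relpath(target, start=posixpath.join(project_root, config_dir))


print(stored_remote_url("/home/user/project", "/tmp/dvcstore"))  # /tmp/dvcstore
print(stored_remote_url("/home/user/project", "../dvcstore"))    # ../../dvcstore
```

Storing the path relative to the config file rather than to the current directory means the value keeps pointing at the same location regardless of where later commands are run from.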
70 changes: 70 additions & 0 deletions
content/docs/user-guide/data-management/data-versioning.md
@@ -0,0 +1,70 @@
+# Data Versioning
+
+DVC enables [version control] for data science. But DVC does not actually
+implement versioning directly! Instead, DVC focuses on [codifying your data]:
+generating small [metafiles] that you can handle with standard [Git workflows]
+(commits, branching, pull requests, etc.).
+
+The resulting projects are neatly organized in the "space dimension", having
+only the files and directories needed at the time and without complicated, ad
+hoc file names like `2022-10-20_linear-model_v2-Carl`. Project versions live in
+the "time dimension" ([Git history]).
+
+![Versioned ML project](/img/versioned-project.png) _Navigate versions with Git
+commits_
+
+**Data version control** is the unifying trait across DVC features (data
+management and beyond).
+
+<admon icon="book">
+
+Refer to [Versioning Data and Models] to learn more.
+
+[versioning data and models]: /doc/use-cases/versioning-data-and-models
+
+</admon>
+
+[version control]:
+  https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
+[codifying your data]: /doc/use-cases/versioning-data-and-models
+[metafiles]: /doc/user-guide/project-structure
+[git workflows]: https://www.atlassian.com/git/tutorials/comparing-workflows
+[git history]:
+  https://git-scm.com/book/en/v2/Git-Basics-Viewing-the-Commit-History
+
+<!--
+## Cloud versioning
+
+_New in DVC 2.30.0 (see `dvc version`)_
+
+To simplify remote data operations, DVC now supports native versioning of files
+and directories on several cloud providers. This means that you can browse your
+files normally as you would see them in your local workspace.
+-->
+
+## Project configuration
+
+Besides metafiles, <abbr>DVC projects</abbr> may contain a config file
+(`.dvc/config`) that can also be treated as code when it comes to version
+control.
+
+<admon icon="book">
+
+See `dvc config` for more information on DVC config.
+
+</admon>
+
+Sometimes it's important to version configuration changes along with
+corresponding data updates. Most notably, if you [set up remote storage] and
+`dvc push` data for others to `dvc pull` later, you should `git commit` both the
+metafile(s) and `.dvc/config` to the repo.
+
+Advanced situations where this may also be necessary:
+
+- When migrating to a [shared cache]
+- If you change a `dvc config parsing` option, which impacts how `dvc.yaml` files
+  get parsed.
+
+[set up remote storage]:
+  /doc/user-guide/data-management/remote-storage#configuration
+[shared cache]: /doc/user-guide/how-to/share-a-dvc-cache
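
To illustrate why `.dvc/config` is worth committing alongside the metafiles, here is a minimal sketch that reads the default remote out of a config file like the S3 example shown earlier in this diff. It uses Python's standard `configparser` purely for illustration; DVC's own config loading may differ, and `SAMPLE_CONFIG` and `default_remote` are hypothetical names.

```python
import configparser

# Sample contents modeled on the `.dvc/config` example from the diff above.
SAMPLE_CONFIG = """\
['remote "myremote"']
url = s3://mybucket/path
region = us-east-2
[core]
remote = myremote
"""


def default_remote(config_text: str) -> dict:
    """Return the default remote's name and settings (illustrative only)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    # `core.remote` names the default remote.
    name = parser.get("core", "remote")
    # Remote sections are written as ['remote "<name>"'], quotes included.
    section = f"'remote \"{name}\"'"
    return {"name": name, **dict(parser.items(section))}


info = default_remote(SAMPLE_CONFIG)
print(info["name"], info["url"])  # myremote s3://mybucket/path
```

A collaborator who clones the repo without this file would have the metafiles but no record of where the data was pushed.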
@@ -0,0 +1,39 @@
+# Data Management with DVC
+
+DVC helps you manage and share arbitrarily large files, datasets, and ML models
+anywhere: cloud storage, SSH servers, network resources (e.g. NAS), mounted
+drives, local file systems, etc. You work with your project files normally in
+your local workspace; DVC tracks, restores, and synchronizes them across locations.
+
+![Storage locations](/img/storage-locations.png) _Local, external, and remote
+storage locations_
+
+Every <abbr>DVC project</abbr> starts with 2 locations. The
+<abbr>workspace</abbr> is the main project directory, containing your data,
+models, source code, etc. DVC also creates a <abbr>data cache</abbr> (found
+locally in `.dvc/cache` by default), which will be used as fast-access storage
+for DVC operations.
+
+<admon type="tip">
+
+The cache can be moved to an external location in the file system or network,
+for example to [share it] among several projects. It could even be set up in a
+remote system (Internet access), but this is typically too slow for working with
+data regularly.
+
+</admon>
+
+[share it]: /doc/user-guide/how-to/share-a-dvc-cache
+
+Optionally, DVC supports additional storage locations such as cloud services
+(Amazon S3, Google Drive, Azure Blob Storage, etc.), SSH servers,
+network-attached storage, etc. These are called [DVC remotes], and help you to
+share or back up copies of your data assets.
+
+<admon type="info">
+
+DVC remotes are similar to Git remotes, but for <abbr>cached</abbr> data.
+
+</admon>
+
+[dvc remotes]: /doc/user-guide/data-management/remote-storage
Updated `core.remote` description, linked to/from Remotes guide, and added a simple example.