diff --git a/content/docs/command-reference/import-url.md b/content/docs/command-reference/import-url.md
index 0346f418da..6787def8c9 100644
--- a/content/docs/command-reference/import-url.md
+++ b/content/docs/command-reference/import-url.md
@@ -146,7 +146,7 @@ $ git checkout 2-remote
 $ mkdir data
 ```

-You should now have a blank workspace, just before the
+You should now have a blank workspace, just before the
 [Add Files](/doc/tutorials/get-started/add-files) chapter.

diff --git a/content/docs/command-reference/remote/add.md b/content/docs/command-reference/remote/add.md
index b0bc12d5e5..5e40bd22fc 100644
--- a/content/docs/command-reference/remote/add.md
+++ b/content/docs/command-reference/remote/add.md
@@ -355,9 +355,10 @@ $ dvc remote add myremote https://example.com/path/to/dir
 A "local remote" is a directory in the machine's file system.

 > While the term may seem contradictory, it doesn't have to be. The "local" part
-> refers to the machine where the project is stored, so it can be any directory
-> accessible to the same system. The "remote" part refers specifically to the
-> project/repository itself. Read "local, but external" storage.
+> refers to the machine where the project is stored, so it can be
+> any directory accessible to the same system. The "remote" part refers
+> specifically to the project/repository itself. Read "local, but external"
+> storage.

 Using an absolute path (recommended):

diff --git a/content/docs/command-reference/remote/index.md b/content/docs/command-reference/remote/index.md
index aa56881e8f..09a006936b 100644
--- a/content/docs/command-reference/remote/index.md
+++ b/content/docs/command-reference/remote/index.md
@@ -76,9 +76,9 @@ For the typical process to share the project via remote, see
 ### What is a "local remote" ?

 While the term may seem contradictory, it doesn't have to be. The "local" part
-refers to the machine where the project is stored, so it can be any directory
-accessible to the same system. The "remote" part refers specifically to the
-project/repository itself. Read "local, but external" storage.
+refers to the machine where the project is stored, so it can be any
+directory accessible to the same system. The "remote" part refers specifically
+to the project/repository itself. Read "local, but external" storage.

diff --git a/content/docs/command-reference/remote/list.md b/content/docs/command-reference/remote/list.md
index fcc37b126f..a18510f0aa 100644
--- a/content/docs/command-reference/remote/list.md
+++ b/content/docs/command-reference/remote/list.md
@@ -46,9 +46,9 @@ Let's for simplicity add a _default_ local remote:
 ### What is a "local remote" ?

 While the term may seem contradictory, it doesn't have to be. The "local" part
-refers to the machine where the project is stored, so it can be any directory
-accessible to the same system. The "remote" part refers specifically to the
-project/repository itself. Read "local, but external" storage.
+refers to the machine where the project is stored, so it can be any
+directory accessible to the same system. The "remote" part refers specifically
+to the project/repository itself. Read "local, but external" storage.

diff --git a/content/docs/tutorials/get-started/add-files.md b/content/docs/tutorials/get-started/add-files.md
index 048aafa213..438ef9495d 100644
--- a/content/docs/tutorials/get-started/add-files.md
+++ b/content/docs/tutorials/get-started/add-files.md
@@ -40,7 +40,7 @@ Committing DVC-files with Git allows us to track different versions of the
 ### Expand to learn about DVC internals

-`dvc add` moves the actual data file to the cache directory (see
+`dvc add` moves the actual data file to the cache directory (see
 [DVC Files and Directories](/doc/user-guide/dvc-files-and-directories)), while
 the entries in the workspace may be file links to the actual files in the DVC
 cache.
diff --git a/content/docs/tutorials/get-started/configure.md b/content/docs/tutorials/get-started/configure.md
index 753e8ea89d..cb3b429f50 100644
--- a/content/docs/tutorials/get-started/configure.md
+++ b/content/docs/tutorials/get-started/configure.md
@@ -16,9 +16,9 @@ For simplicity, let's setup a local remote:
 ### What is a "local remote" ?

 While the term may seem contradictory, it doesn't have to be. The "local" part
-refers to the machine where the project is stored, so it can be any directory
-accessible to the same system. The "remote" part refers specifically to the
-project/repository itself. Read "local, but external" storage.
+refers to the machine where the project is stored, so it can be any
+directory accessible to the same system. The "remote" part refers specifically
+to the project/repository itself. Read "local, but external" storage.

diff --git a/content/docs/tutorials/pipelines.md b/content/docs/tutorials/pipelines.md
index 1ba072eb11..537351863e 100644
--- a/content/docs/tutorials/pipelines.md
+++ b/content/docs/tutorials/pipelines.md
@@ -104,7 +104,7 @@ When we run `dvc add` `Posts.xml.zip`, DVC creates a

 At DVC initialization, a new `.dvc/` directory is created for internal
 configuration and cache
-[files and directories](/doc/user-guide/dvc-files-and-directories), that are
+[files and directories](/doc/user-guide/dvc-files-and-directories) that are
 hidden from the user. This directory is automatically staged with `git add`, so
 it can be easily committed with Git.

@@ -126,9 +126,9 @@ This file can be committed with Git instead of the data file itself.

 The data file `Posts.xml.zip` is linked (or copied) from
 `.dvc/cache/ce/68b98d82545628782c66192c96f2d2`, and added to `.gitignore`. Even
-if you remove it from the workspace, or `git checkout` a different commit, the
-data is not lost if a corresponding DVC-file is committed. It's enough to run
-`dvc checkout` or `dvc pull` to restore data files.
+if you remove it from the workspace, or `git checkout` a different
+commit, the data is not lost if a corresponding DVC-file is committed. It's
+enough to run `dvc checkout` or `dvc pull` to restore data files.

@@ -183,10 +183,10 @@ outs:
 ```

 Just like the DVC-file we created earlier with `dvc add`, this stage file uses
-`md5` hashes (that point to the cache) to describe and version control
-dependencies and outputs. Output `data/Posts.xml` file is saved as
+`md5` hashes (that point to the cache) to describe and version
+control dependencies and outputs. Output `data/Posts.xml` file is saved as
 `.dvc/cache/a3/04afb96060aad90176268345e10355` and linked (or copied) to the
-workspace, as well as added to `.gitignore`.
+workspace, as well as added to `.gitignore`.

 Two things are worth noticing here. First, by analyzing dependencies and outputs
 that DVC-files describe, we can restore the full series of commands (pipeline
diff --git a/content/docs/tutorials/versioning.md b/content/docs/tutorials/versioning.md
index 20234ffddb..06eb3a8624 100644
--- a/content/docs/tutorials/versioning.md
+++ b/content/docs/tutorials/versioning.md
@@ -163,9 +163,9 @@ $ git tag -a "v1.0" -m "model v1.0, 1000 images"
 ### Expand to learn more about DVC internals

 As we mentioned briefly, DVC does not commit the `data/` directory and
-`model.h5` file with Git. Instead, `dvc add` stores them in the cache (usually
-in `.dvc/cache`) and adds them to `.gitignore`. We then `git commit` DVC-files
-that contain file hashes that point to cached data.
+`model.h5` file with Git. Instead, `dvc add` stores them in the
+cache (usually in `.dvc/cache`) and adds them to `.gitignore`. We
+then `git commit` DVC-files that contain file hashes that point to cached data.

 In this case we created `data.dvc` and `model.h5.dvc`. Refer to
 [DVC-File Format](/doc/user-guide/dvc-file-format) to learn more about how these
@@ -281,8 +281,8 @@ the `v2.0` tag.
 ### Expand to learn more about DVC internals

 As we have learned already, DVC keeps data files out of Git (by adjusting
-`.gitignore`) and puts them into the cache (usually it's a `.dvc/cache`
-directory inside the repository). Instead, DVC creates
+`.gitignore`) and puts them into the cache (usually it's a
+`.dvc/cache` directory inside the repository). Instead, DVC creates
 [DVC-files](/doc/user-guide/dvc-file-format). These text files serve as data
 placeholders that point to the cached files, and they can be easily version
 controlled with Git.
diff --git a/content/docs/user-guide/basic-concepts/dvc-project.md b/content/docs/user-guide/basic-concepts/dvc-project.md
new file mode 100644
index 0000000000..486e994379
--- /dev/null
+++ b/content/docs/user-guide/basic-concepts/dvc-project.md
@@ -0,0 +1,20 @@
+---
+name: 'DVC Project'
+match:
+  [
+    'DVC project',
+    'DVC projects',
+    project,
+    projects,
+    'DVC repository',
+    'DVC repositories',
+    repository,
+    repositories,
+  ]
+---
+
+Initialized by running `dvc init` in the **workspace** (typically in a Git
+repository). It will contain the
+[`.dvc/` directory](/doc/user-guide/dvc-files-and-directories) and
+[DVC-files](/doc/user-guide/dvc-file-format) created with commands such as
+`dvc add` or `dvc run`.