start: Data Access and Data Versioning to mention Model in titles (#2096) (#2214)

* guide: disclaim x data (impro #2104)

* Added changes from PR #2188 and modified paths & titles

- Changes title of "Data Access" to "Data and Model Access"
- Changes title of "Data Versioning" to "Data and Model Versioning"
- Renames path of Data Access and Data Versioning to
  `data-and-model-access.md` and `data-and-model-versioning.md`
  respectively.
- Adds redirects
-- `/doc/start/data-access` -> `/doc/start/data-and-model-access`
-- `/doc/start/data-versioning` ->
`/doc/start/data-and-model-versioning`
- Replaces links in `/doc/start` with the new links.

* Update redirects-list.json with fixed subsection redirects.

Co-authored-by: Jorge Orpinel <[email protected]>

* Fixed incomplete looking sentence

* merged into a single paragraph

* Divided models sentence and added "large files" phrase.

* Adds new paths to sidebar

* Updated links to data-access and data-versioning cmd ref

* updated links to data-access and data-versioning in blog

* Updated links to data-access and data-versioning in UC

* Updated links to data-access and data-versioning in UG

* updated yarn.lock

* Update content/docs/start/data-and-model-versioning.md

Co-authored-by: Jorge Orpinel <[email protected]>

* Restyled by prettier

* fixes hardcoded links to data-and-model-access in the blog

* minor fixes

* guide: revert Exp Outs guide rename
per #2154 (review)

* start: emphasize models are files (assumption)

* start: roll back unnecessary changes

unnecessary for #2214

Co-authored-by: Jorge Orpinel <[email protected]>
Co-authored-by: Jorge Orpinel <[email protected]>
Co-authored-by: Emre Sahin <iex@levinas>
Co-authored-by: Restyled.io <[email protected]>
5 people authored Mar 29, 2021
1 parent 939a025 commit 1544cd5
Showing 17 changed files with 61 additions and 56 deletions.
2 changes: 1 addition & 1 deletion content/blog/2020-10-12-october-20-dvc-heartbeat.md
@@ -107,7 +107,7 @@ few weeks, so stay tuned. Another big initative is adding videos to our docs:
since video seems like a popular format for a lot of learners, we're working to
supplement our official docs with embedded videos. Check out our first
installment on the
-[Getting Started with Data Versioning](https://dvc.org/doc/start/data-versioning).
+[Getting Started with Data Versioning](/doc/start/data-and-model-versioning).

https://youtu.be/kLKBcPonMYw

2 changes: 1 addition & 1 deletion content/blog/2020-11-11-november-20-dvc-heartbeat.md
@@ -64,7 +64,7 @@ welcome referrals if you know a good candidate)!

We're continuing to develop our video docs, and now half of our "Getting
Started" section has video accompaniments. Check out our latest release on
-[data access with DVC](https://dvc.org/doc/start/data-access):
+[data access with DVC](/doc/start/data-and-model-access):

https://youtu.be/EE7Gk84OZY8

14 changes: 7 additions & 7 deletions content/blog/2020-12-18-december-20-dvc-heartbeat.md
@@ -53,17 +53,17 @@ As you may have heard
on adding complete video docs to the "Getting Started" section of the DVC site.
We now have 100% coverage! We have videos that mirror the tutorials for:

-- [Data versioning](https://dvc.org/doc/start/data-versioning) - how to use Git
-and DVC together to track different versions of a dataset
+- [Data versioning](/doc/start/data-and-model-versioning) - how to use Git and
+DVC together to track different versions of a dataset

-- [Data access](https://dvc.org/doc/start/data-access) - how to share models and
+- [Data access](/doc/start/data-and-model-access) - how to share models and
datasets across projects and environments

-- [Pipelines](https://dvc.org/doc/start/data-pipelines) - how to create
-reproducible pipelines to transform datasets to features to models
+- [Pipelines](/doc/start/data-pipelines) - how to create reproducible pipelines
+to transform datasets to features to models

-- [Experiments](https://dvc.org/doc/start/experiments) - how to do a `git diff`
-for models that compares and visualizes metrics
+- [Experiments](/doc/start/experiments) - how to do a `git diff` for models that
+compares and visualizes metrics

https://media.giphy.com/media/L4ZZNbDpOCfiX8uYSd/giphy.gif

5 changes: 3 additions & 2 deletions content/docs/command-reference/diff.md
@@ -123,8 +123,9 @@ $ dvc diff

Let's checkout the
[2-track-data](https://github.com/iterative/example-get-started/releases/tag/2-track-data)
-tag, corresponding to the [Data Versioning](/doc/start/data-versioning) _Get
-Started_ chapter, right after we added `data.xml` file with DVC:
+tag, corresponding to the
+[Data Versioning](/doc/start/data-and-model-versioning) _Get Started_ chapter,
+right after we added `data.xml` file with DVC:

```dvc
$ git checkout 2-track-data
2 changes: 1 addition & 1 deletion content/docs/command-reference/get.md
@@ -151,7 +151,7 @@ file or directory from. It also has the `--out` option to specify the location
to place the target data within the workspace. Combining these two options
allows us to do something we can't achieve with the regular `git checkout` +
`dvc checkout` process – see for example the
-[Get Older Data Version](/doc/start/data-versioning#switching-between-versions)
+[Get Older Data Version](/doc/start/data-and-model-versioning#switching-between-versions)
chapter of our _Get Started_.

Let's use the
4 changes: 2 additions & 2 deletions content/docs/command-reference/import-url.md
@@ -190,8 +190,8 @@ $ git checkout 3-config-remote
## Example: Tracking a file from the web

An advanced alternate to the intro of the
-[Versioning Basics](/doc/start/data-versioning) part of the _Get Started_ is to
-use `dvc import-url`:
+[Versioning Basics](/doc/start/data-and-model-versioning) part of the _Get
+Started_ is to use `dvc import-url`:

```dvc
$ dvc import-url https://data.dvc.org/get-started/data.xml \
4 changes: 2 additions & 2 deletions content/docs/command-reference/import.md
@@ -67,8 +67,8 @@ data `path`, and the `outs` field contains the corresponding local path in the
<abbr>workspace</abbr>. It records enough metadata about the imported data to
enable DVC efficiently determining whether the local copy is out of date.

-To actually [version the data](/doc/start/data-versioning), `git add` (and
-`git commit`) the import `.dvc` file.
+To actually [version the data](/doc/start/data-and-model-versioning), `git add`
+(and `git commit`) the import `.dvc` file.

Note that `dvc repro` doesn't check or update import `.dvc` files (see
`dvc freeze`), use `dvc update` to bring the import up to date from the data
4 changes: 2 additions & 2 deletions content/docs/sidebar.json
@@ -35,13 +35,13 @@
},
"children": [
{
-"slug": "data-versioning",
+"slug": "data-and-model-versioning",
"tutorials": {
"katacoda": "https://katacoda.com/dvc/courses/get-started/versioning"
}
},
{
-"slug": "data-access",
+"slug": "data-and-model-access",
"tutorials": {
"katacoda": "https://katacoda.com/dvc/courses/get-started/accessing"
}
Original file line number Diff line number Diff line change
@@ -1,13 +1,13 @@
---
-title: 'Get Started: Data Access'
+title: 'Get Started: Data and Model Access'
---

-# Get Started: Data Access
+# Get Started: Data and Model Access

-Okay, now that we've learned how to _track_ data and models with DVC and how to
-version them with Git, next question is how can we _use_ these artifacts outside
-of the project? How do I download a model to deploy it? How do I download a
-specific version of a model? How do I reuse datasets across different projects?
+Okay, we've learned how to _track_ data and models with DVC, and how to commit
+their versions to Git. The next questions are: How can we _use_ these artifacts
+outside of the project? How do I download a model to deploy it? How to download
+a specific version of a model? Or reuse datasets across different projects?

> These questions tend to come up when you browse the files that DVC saves to
> remote storage, e.g.
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
---
-title: 'Get Started: Data Versioning'
-description: 'Get started with data versioning in DVC. Learn how to use a
+title: 'Get Started: Data and Model Versioning'
+description: 'Get started with data and model versioning in DVC. Learn how to use a
regular Git workflow for datasets and ML models, without storing large files in
Git.'
---
@@ -14,8 +14,8 @@ to a different version of a 100Gb file in less than a second with a
`git checkout`.

The foundation of DVC consists of a few commands that you can run along with
-`git` to track large files, directories, or ML models. Think "Git for data".
-Read on or watch our video to learn about versioning data with DVC!
+`git` to track large files, directories, or ML model files. Think "Git for
+data". Read on or watch our video to learn about versioning data with DVC!

https://youtu.be/kLKBcPonMYw

@@ -34,8 +34,8 @@ $ dvc get https://github.com/iterative/dataset-registry \
```

We use the fancy `dvc get` command to jump ahead a bit and show how Git repo
-becomes a source for datasets or models - what we call "data registry" or "model
-registry". `dvc get` can download any file or directory tracked in a <abbr>DVC
+becomes a source for datasets or models - what we call "data/model registry".
+`dvc get` can download any file or directory tracked in a <abbr>DVC
repository</abbr>. It's like `wget`, but for DVC or Git repos. In this case we
download the latest version of the `data.xml` file from the
[dataset registry](https://github.com/iterative/dataset-registry) repo as the
@@ -90,10 +90,10 @@ outs:

## Storing and sharing

-You can upload DVC-tracked data or models with `dvc push`, so they're safely
-stored [remotely](/doc/command-reference/remote). This also means they can be
-retrieved on other environments later with `dvc pull`. First, we need to setup a
-storage:
+You can upload DVC-tracked data or model files with `dvc push`, so they're
+safely stored [remotely](/doc/command-reference/remote). This also means they
+can be retrieved on other environments later with `dvc pull`. First, we need to
+setup a storage:

```dvc
$ dvc remote add -d storage s3://mybucket/dvcstore
@@ -154,9 +154,9 @@ a3

## Retrieving

-Having DVC-tracked data stored remotely, it can be downloaded when needed in
-other copies of this <abbr>project</abbr> with `dvc pull`. Usually, we run it
-after `git clone` and `git pull`.
+Having DVC-tracked data and models stored remotely, it can be downloaded when
+needed in other copies of this <abbr>project</abbr> with `dvc pull`. Usually, we
+run it after `git clone` and `git pull`.

<details>

4 changes: 2 additions & 2 deletions content/docs/start/data-pipelines.md
@@ -143,8 +143,8 @@ stages:
There's no need to use `dvc add` for DVC to track stage outputs (`data/prepared`
in this case); `dvc run` already took care of this. You only need to run
`dvc push` if you want to save them to
-[remote storage](/doc/start/data-versioning#storing-and-sharing), (usually along
-with `git commit` to version `dvc.yaml` itself).
+[remote storage](/doc/start/data-and-model-versioning#storing-and-sharing),
+(usually along with `git commit` to version `dvc.yaml` itself).

## Dependency graphs (DAGs)

4 changes: 2 additions & 2 deletions content/docs/start/experiments.md
@@ -172,8 +172,8 @@ $ git commit -a -m "Preserve best random forest experiment"
## Sharing experiments

After committing the best experiments to our Git branch, we can
-[store and share](/doc/start/data-versioning#storing-and-sharing) them remotely
-like any other iteration of the pipeline.
+[store and share](/doc/start/data-and-model-versioning#storing-and-sharing) them
+remotely like any other iteration of the pipeline.

```dvc
dvc push
19 changes: 10 additions & 9 deletions content/docs/start/index.md
@@ -53,15 +53,16 @@ Now you're ready to DVC!
DVC's features can be grouped into functional components. We'll explore them one
by one in the next few pages:

-- [**Data versioning**](/doc/start/data-versioning) (try this next) is the base
-layer of DVC for large files, datasets, and machine learning models. Use a
-regular Git workflow, but without storing large files in the repo (think "Git
-for data"). Data is stored separately, which allows for efficient sharing.
-
-- [**Data access**](/doc/start/data-access) shows how to use data artifacts from
-outside of the project and how to import data artifacts from another DVC
-project. This can help to download a specific version of an ML model to a
-deployment server or import a model to another project.
+- [**Data and model versioning**](/doc/start/data-and-model-versioning) (try
+this next) is the base layer of DVC for large files, datasets, and machine
+learning models. Use a regular Git workflow, but without storing large files
+in the repo (think "Git for data"). Data is stored separately, which allows
+for efficient sharing.
+
+- [**Data and model access**](/doc/start/data-and-model-access) shows how to use
+data artifacts from outside of the project and how to import data artifacts
+from another DVC project. This can help to download a specific version of an
+ML model to a deployment server or import a model to another project.

- [**Data pipelines**](/doc/start/data-pipelines) describe how models and other
data artifacts are built, and provide an efficient way to reproduce them.
8 changes: 4 additions & 4 deletions content/docs/use-cases/data-registries.md
@@ -2,10 +2,10 @@

One of the main uses of <abbr>DVC repositories</abbr> is the
[versioning of data and model files](/doc/use-cases/data-and-model-files-versioning).
-DVC also enables cross-project [reusability](/doc/start/data-access) of these
-<abbr>data artifacts</abbr>. This means that your projects can depend on data
-from other DVC repositories — like a **package management system for data
-science**.
+DVC also enables cross-project [reusability](/doc/start/data-and-model-access)
+of these <abbr>data artifacts</abbr>. This means that your projects can depend
+on data from other DVC repositories — like a **package management system for
+data science**.

![](/img/data-registry.png) _Data management middleware_

Original file line number Diff line number Diff line change
@@ -65,7 +65,7 @@ Benefits of our approach include:
- **Collaboration**: Easily distribute your project development and share its
data [internally](/doc/use-cases/shared-development-server) and
[remotely](/doc/use-cases/sharing-data-and-model-files), or
-[reuse](/doc/start/data-access) it in other places.
+[reuse](/doc/start/data-and-model-access) it in other places.

- **Data compliance**: Review data modification attempts as Git
[pull requests](https://www.dummies.com/web-design-development/what-are-github-pull-requests/).
3 changes: 2 additions & 1 deletion content/docs/user-guide/project-structure/dvc-files.md
@@ -3,7 +3,8 @@
You can use `dvc add` to track data files or directories located in your current
<abbr>workspace</abbr>\*. Additionally, `dvc import` and `dvc import-url` let
you bring data from external locations to your project, and start tracking it
-locally. See [Data Versioning](/doc/start/data-versioning) for more info.
+locally. See [Data Versioning](/doc/start/data-and-model-versioning) for more
+info.

> \* Certain [external locations](/doc/user-guide/managing-external-data) are
> also supported.
2 changes: 2 additions & 0 deletions redirects-list.json
@@ -23,6 +23,8 @@
"^/(?:docs|documentation)(/.*)?$ /doc$1",

"^/doc/get-started(/.*)?$ /doc/start",
+"^/doc/start/data-versioning$ /doc/start/data-and-model-versioning",
+"^/doc/start/data-access$ /doc/start/data-and-model-access",
"^/doc/tutorial(/.*)?$ /doc/start",
"^/doc/tutorials/get-started(/.*)?$ /doc/start",
"^/doc/tutorials/versioning(/.*)?$ /doc/use-cases/versioning-data-and-model-files/tutorial",
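Reviewer note: the two new redirect rules in this hunk can be sanity-checked with a short Python sketch. It assumes each entry is a regular expression and a target separated by a single space, that `$1` in a target refers to the first capture group (as the neighboring entries suggest), and that the first matching rule wins. The `apply_redirects` helper is hypothetical and is not the site's actual redirect code.

```python
import re

# Rules copied from the redirects-list.json hunk in this commit.
RULES = [
    "^/(?:docs|documentation)(/.*)?$ /doc$1",
    "^/doc/get-started(/.*)?$ /doc/start",
    "^/doc/start/data-versioning$ /doc/start/data-and-model-versioning",
    "^/doc/start/data-access$ /doc/start/data-and-model-access",
]


def apply_redirects(path, rules=RULES):
    """Return the redirect target for `path`, or None if no rule matches.

    Hypothetical helper: assumes "<regex> <target>" entries and that the
    first matching rule wins.
    """
    for rule in rules:
        pattern, target = rule.split(" ", 1)
        match = re.search(pattern, path)
        if match:
            # Translate "$1"-style group references into Python's "\1" form.
            template = re.sub(r"\$(\d+)", r"\\\1", target)
            return match.expand(template)
    return None
```

Under these assumptions, `apply_redirects("/doc/start/data-access")` returns `/doc/start/data-and-model-access`, while a path that already uses the new slug returns `None`, so the new rules should not loop.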