added a BC: workspace document #2197

iesahin · 2021-02-15T08:35:33Z

This doesn't fix any issue completely.

Added workspace.md doc to basic-concepts dir. Expanded this from previous iterations.
Added Basic Concepts section to the sidebar and added this document.
The tooltip definition is not changed.

This PR closes #1127 and is related to #550.

content/docs/sidebar.json

jorgeorpinel · 2021-02-15T18:01:42Z

content/docs/user-guide/basic-concepts/workspace.md

@@ -6,3 +6,46 @@ tooltip: >-
  models, etc. Typically, it's also a Git repository. It will contain your DVC
  project.
 ---
+
+<!-- keywords: data science project architecture, machine learning project architecture, machine learning workflow, data science workflow, machine learning file system, data science file system, data science project structure, machine learning project structure, notebook version control -->


Just a note to remove this comment later before merging (but for now it's useful to have the list during review 👍)

content/docs/user-guide/basic-concepts/workspace.md

jorgeorpinel

Please bear with us as we review this first concept as we will probably establish some criteria for all concepts along the way. Some general comments below to begin with:

jorgeorpinel · 2021-02-15T18:41:46Z

content/docs/user-guide/basic-concepts/workspace.md

+# Workspace
+
+A data science project consists of data obtained from many different sources.
+Most of the time it needs to convert the the format of this data into a form
+that is required by the training models and supplying to data science / machine
+learning workflows. Sometimes it needs to be split into multiple files or


OK to have some context but it should probably be a short intro paragraph only? Maybe just say something about project structures.

I think in BC documents, we need to make separation of concerns. If our aim is a quick introduction to DVC concepts, we don't need such story material to begin with. This was solely for SEO purposes.

IMHO I can write blog posts / tutorials and tell stories with all kinds of SEO phrases there instead of trying to use these in docs. As you can see it's possible but I don't think it increases the quality of docs.

I can paraphrase into a single paragraph or remove them completely. I'm fine with all.

> we need to make separation of concerns...

We do already have tooltips for short definitions. But you can't search for "pipeline dvc" on Google and find a tooltip (i.e. we want to have landing pages for our docs).

Another goal of these is to extract explanations out of the cmd refs (e.g. in https://dvc.org/doc/command-reference/dag which shouldn't be where pipelines are explained)

In my mind most concept pages are 2-3 paragraph long in general (we'll see as they come). There could also be some cool diagrams potentially (feel free to contribute sketches, we can fix them up later). They can still have a story, maybe just a very straightforward one? (motivation -> explanation -> uses/implications). WDYT @shcheklein ?

I'm putting the link on the last of these highlighted "project" words.

p.s. I usually "tooltipy" just one instance of the term per page or big section, but I try to make it the first or 2nd one (as long as it doesn't distract e.g. no other links are nearby).

p.p.s motivation -> explanation -> uses/implications - I see that's more or less the structure here already 👍

jorgeorpinel · 2021-02-27T02:36:54Z

content/docs/user-guide/basic-concepts/workspace.md

+contents through DVC commands.
+
+Files and directories in the workspace can be added to DVC (`dvc add`) or they
+can be downloaded from external sources (`dvc get`, `dvc import`,


But actually `add` can also download data to the workspace (see the `--out` and `--to-remote` options). Also, `import*` commands download AND track data. You may want to rephrase this part accordingly 🙂

Sometimes I think there is no command that hasn't got a duplicate, somehow :) I try to mention commands only in passing; if we considered each and every option of every command, we'd need to duplicate the command reference here, IMHO.

We may just delete the commands if you would like.

I can also list all possibilities for each functionality, like

> In the workspace, you can
>
> - Import files and directories (`dvc add --out`, `dvc import-url`...)

but I think this will turn the document into a list of commands and options.

No need to list every command usage of course, agreed!

My point was that these 3 commands mentioned actually overlap in a way that makes the current text slightly incorrect. In any case, the main use case of `add` is not to "add" but to "track", actually. Please check each cmd ref to try to find the right terms when needed 🙂

"Download" is correct for `get`/`import` but `add` can also download (and they can all "transfer") so I'd avoid that term probably. And in fact I wouldn't even mention `get` here, since it doesn't require a DVC project/workspace. For `import` I'd try to use the cmd name as the relevant action (to "import") I guess...

> the main use case of `add` is not to "add" but to "track"
>
> For `import` I'd try to use the cmd name as the relevant action (to "import")

Then again `import*` also track the downloaded data 😅 ("adds"). Maybe it should be a single sentence about tracking and put all `add`, `import`, `import-url` in the same parenthesis.

jorgeorpinel · 2021-03-15T07:43:14Z

OK I see I neglected this accidentally. Sorry, checking now @iesahin 👍

jorgeorpinel · 2021-03-15T07:51:30Z

content/docs/user-guide/basic-concepts/workspace.md

+These may be split into multiple files or directories or (as the project
+structure needs) have different versions for different requirements, e.g., a
+smaller / simplified version might be required in prototyping for faster
+feedback and shorter training times. A single workspace to manage all artifacts


> or have different versions for different requirements...

Let's not go into versioning here, I think. At least not by implying they're all in the workspace, because in DVC the workspace only holds one version (the rest are cached and managed via Git, metafiles, etc.).

(Mentioned in #2197 (comment))

> versioning needs and managing dependencies make it increasingly difficult

p.s. This is a better way to very subtly mention versioning (could even link to the corresponding Use Case doc).
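
To illustrate the point above that the workspace holds only one visible version while the others live in the cache, here is a minimal Python sketch. It is a toy model and not DVC's implementation: the `track`, `checkout`, and `cache_path` helpers, the two-character subdirectory split, and the file names are all assumptions made for illustration.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def cache_path(cache_dir: Path, digest: str) -> Path:
    # Assumed layout: the first two hex characters become a subdirectory.
    return cache_dir / digest[:2] / digest[2:]


def track(workspace_file: Path, cache_dir: Path) -> str:
    """Copy the file's current contents into the cache and return its MD5 digest."""
    digest = hashlib.md5(workspace_file.read_bytes()).hexdigest()
    target = cache_path(cache_dir, digest)
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():
        shutil.copyfile(workspace_file, target)
    return digest


def checkout(digest: str, workspace_file: Path, cache_dir: Path) -> None:
    """Replace the visible workspace file with a previously cached version."""
    shutil.copyfile(cache_path(cache_dir, digest), workspace_file)


def demo() -> dict:
    """Track two versions of one file, then restore the first one."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        workspace, cache = root / "workspace", root / "cache"
        workspace.mkdir()
        data = workspace / "data.csv"  # hypothetical file name

        data.write_text("a,b\n1,2\n")
        v1 = track(data, cache)
        data.write_text("a,b\n1,2\n3,4\n")
        v2 = track(data, cache)

        checkout(v1, data, cache)  # the workspace shows v1 again
        return {
            "workspace_files": sorted(p.name for p in workspace.iterdir()),
            "cached_versions": sum(1 for p in cache.rglob("*") if p.is_file()),
            "restored": data.read_text(),
            "distinct": v1 != v2,
        }
```

In this sketch the workspace directory always contains a single `data.csv`, while both versions remain retrievable from the cache by their digest, which is roughly the "visible part" versus hidden part distinction discussed in this thread.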

jorgeorpinel · 2021-03-15T08:08:57Z

content/docs/user-guide/basic-concepts/workspace.md

+of a project is desirable, although versioning needs and managing dependencies
+make it increasingly difficult.
+
+DVC allows a single directory to contain all your project artifacts. The


> a single directory to contain all your project artifacts

Not exactly. File contents are org'd in the cache with a special file structure (see https://dvc.org/doc/user-guide/project-structure/internal-files#structure-of-the-cache-directory)

> The workspace is the directory containing the visible part of your project

That part is correct. And contradicts the previous part 🙂 (because "visible part" implies there's a hidden part which must be in other dirs).

Let's open this p with that sentence.

jorgeorpinel · 2021-03-15T08:12:08Z

content/docs/user-guide/basic-concepts/workspace.md

+workspace is the directory containing the _visible_ part of your
+<abbr>project</abbr>, e.g., the raw data, source code, model files. You can have


"e.g., the raw data, source code, model files you're currently using"

jorgeorpinel · 2021-03-15T08:12:44Z

content/docs/user-guide/basic-concepts/workspace.md

+<abbr>project</abbr>, e.g., the raw data, source code, model files. You can have
+multiple versions of data, models, and other kinds of artifacts within the
+workspace and limit your focus to a subset of these. You can record your


> You can have multiple versions of data ... within the workspace

Again contradicting 😕

jorgeorpinel · 2021-03-15T08:15:35Z

content/docs/user-guide/basic-concepts/workspace.md

+workspace and limit your focus to a subset of these. You can record your
+progress in a commit and analyze your data and model history. DVC provides a


> record your progress in a commit and analyze...

Again probably too many details about versioning. Doesn't really fall within the 'workspace' concept, I think. This can be simpler.

> No need to rename your models for minor changes...
> or save tens of different renamed files for training

Those are better mentions of versioning (clear benefits i.e. how much you'd have to suffer without DVC)

> save cleaned up data in different directories

That specific example isn't great because that's still pretty common even with DVC (e.g. in our own example-get-started repo we have a `prepared/` dir).

jorgeorpinel · 2021-03-15T08:20:23Z

content/docs/user-guide/basic-concepts/workspace.md

+progress in a commit and analyze your data and model history. DVC provides a
+_machine learning file system_ to manipulate your data and models using its


This "ML file system" keyword is pretty tricky. No need to force it (just skip it if you can't find a correct way to use it).

I can only think of something like "DVC turns your project into a sort of machine learning file system for..." but not sure.

jorgeorpinel · 2021-03-15T08:21:19Z

content/docs/user-guide/basic-concepts/workspace.md

+programs. DVC can keep track of all of these in a single directory called the
+workspace.


> DVC can keep track of all of these in a single directory called the workspace.

Again contradicting and also, repetitive at this point.

jorgeorpinel · 2021-03-15T08:23:16Z

content/docs/user-guide/basic-concepts/workspace.md

+DVC supports all typical operations of a versioned data file system through its
+commands. Behind the scene these operations use <abbr>metafiles</abbr> like the


> DVC supports all typical operations of a versioned data file system through its commands.

Maybe open the previous paragraph with that?

p.s. having this I think def. no need for the "ml file system" keyword. But keeping "machine learning" somewhere would be nice.

Let's try to incorporate the metafile mentions into the main (2nd) paragraph somehow. After simplifying it per my previous comments, there should be enough room in there. That way there's no need for this 4th p.

jorgeorpinel

More feedback 👍

jorgeorpinel · 2021-05-06T16:48:12Z

I guess I should take this over 👍

iesahin · 2021-05-07T18:36:54Z

Context change makes me miserable. I can take care of this after the GS project if you have other tasks. Thank you @jorgeorpinel

jorgeorpinel · 2021-05-10T03:02:51Z

For some reason I can't fetch iesahin/issue53 so I'm going to merge this to a new branch I pushed instead (concept/workspace) to preserve @iesahin's commits, and reopen the PR from there to wrap it up.

iesahin added 2 commits February 15, 2021 11:26

Added basic concepts: workspace document.

168a290

added a basic concepts section to the sidebar and added workspace document to the links

3130952

shcheklein temporarily deployed to dvc-org-iesahin-issue53-hfg3kw February 15, 2021 08:35 Inactive

jorgeorpinel reviewed Feb 15, 2021

content/docs/sidebar.json

jorgeorpinel reviewed Feb 15, 2021

shcheklein reviewed Feb 15, 2021

content/docs/user-guide/basic-concepts/workspace.md (outdated)

jorgeorpinel reviewed Feb 15, 2021

iesahin mentioned this pull request Feb 16, 2021

term: ambiguous use of "external" and "workspace" #1127

Closed

iesahin added 4 commits February 16, 2021 19:36

Merge branch 'master' into iesahin/issue53

c8d77e6

fixed typos

d119729

Revised initial two paragraphs into a single one and abbr link for project

e0d915e

modified the text to mention checkout and removed run and gc

bdbcc97

shcheklein temporarily deployed to dvc-org-iesahin-issue53-hfg3kw February 16, 2021 17:21 Inactive

iesahin mentioned this pull request Feb 16, 2021

concepts: links to related docs in concepts pages #2171

Closed

jorgeorpinel reviewed Feb 17, 2021

content/docs/user-guide/basic-concepts/workspace.md (outdated)

jorgeorpinel reviewed Feb 17, 2021

content/docs/user-guide/basic-concepts/workspace.md (outdated)

jorgeorpinel reviewed Feb 17, 2021

content/docs/user-guide/basic-concepts/workspace.md (outdated)

iesahin added 2 commits February 18, 2021 11:06

Removed listed keywords and some other minor modifications

7f2e93d

styled

523f1c1

shcheklein temporarily deployed to dvc-org-iesahin-issue53-hfg3kw February 18, 2021 08:08 Inactive

Modified the definition for more clear sentence.

50964b1

shcheklein temporarily deployed to dvc-org-iesahin-issue53-hfg3kw February 18, 2021 08:49 Inactive

Added dvc filenames to the last paragraph.

46fe4bb

shcheklein temporarily deployed to dvc-org-iesahin-issue53-hfg3kw February 18, 2021 08:59 Inactive

iesahin mentioned this pull request Feb 21, 2021

guide: new DVC Concepts section #550

Closed

14 tasks

jorgeorpinel reviewed Feb 22, 2021

content/docs/user-guide/basic-concepts/workspace.md (outdated)

jorgeorpinel reviewed Feb 22, 2021

content/docs/user-guide/basic-concepts/workspace.md (outdated)

Modified the paragraph to emphasize the evolution aspects

44af0a9

shcheklein had a problem deploying to dvc-org-iesahin-issue53-hfg3kw February 22, 2021 18:35 Failure

Modified the paragraph to reduce "file, directory" repeats

42559e2

shcheklein had a problem deploying to dvc-org-iesahin-issue53-hfg3kw February 22, 2021 18:43 Failure

Removed pipelines etc. sentences

6cc0f6c

shcheklein had a problem deploying to dvc-org-iesahin-issue53-hfg3kw February 22, 2021 19:34 Failure

Moved typical ops sentence to the next paragraph and some rephrasing

e85a0d7

shcheklein had a problem deploying to dvc-org-iesahin-issue53-hfg3kw February 22, 2021 19:43 Failure

iesahin added 2 commits February 26, 2021 19:46

Merge branch 'master' of https://github.com/iterative/dvc.org into iesahin/issue53

5674991

Modified some sentences and the tooltip for clarity

0b099f6

shcheklein temporarily deployed to dvc-org-iesahin-issue53-hfg3kw February 26, 2021 17:06 Inactive

iterative deleted a comment from iesahin Feb 27, 2021

jorgeorpinel reviewed Mar 15, 2021

jorgeorpinel self-assigned this May 6, 2021

jorgeorpinel changed the base branch from master to concept/workspace May 10, 2021 03:01

jorgeorpinel merged commit b2c9169 into concept/workspace May 10, 2021

jorgeorpinel mentioned this pull request May 10, 2021

concepts: workspace #2453

Merged

1 task

iesahin deleted the iesahin/issue53 branch July 14, 2021 12:39
