Update dvc push documentation #203

Merged
merged 4 commits into from
Mar 18, 2019
robogeek
Contributor

No description provided.


With the first `dvc push` we specified a stage in the middle of the pipeline
while using `--with-deps`. This started with the named stage and searched
backwards through the pipeline for data files to upload. Because the stage
Member

Can we use a single space after periods everywhere? :) (By the way, this document mixes different styles.)

Contributor Author

Does anyone read the raw Markdown? What people read is the Markdown rendered as HTML, either on the website or in the GitHub repository. The raw Markdown is for editing. Once it is rendered, the number of spaces after a period, how lines are wrapped into paragraphs, and so on all disappear into the rendered HTML.

Member

I consider writing docs (at least with Markdown or LaTeX) the same as coding. It's easier for everyone to write/code when there is a common style guide. In this specific case, some editors automatically remove extra spaces (especially trailing ones). So if someone creates a PR to fix a simple mistake, chances are we end up with a lot of unnecessary changes all over the file.
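To make the concern about whitespace-stripping editors concrete, here is a minimal Python sketch. The three-line Markdown source and the helper names `strip_trailing` and `changed_lines` are made up for illustration and do not come from the DVC docs; the sketch only assumes an editor that strips trailing whitespace from every line on save.

```python
# Hypothetical Markdown source; two of the three lines end in trailing spaces.
original = [
    "Push data files to remote storage.  ",
    "The command uploads cached files. ",
    "It reads checksums from DVC files.",
]


def strip_trailing(lines):
    """Mimic an editor that removes trailing whitespace on save."""
    return [line.rstrip() for line in lines]


def changed_lines(before, after):
    """Count lines that differ between two equally long versions."""
    return sum(1 for a, b in zip(before, after) if a != b)


# The contributor intends to fix only the last line.
intended = list(original)
intended[2] = "It reads checksums from the DVC files."

# On save, the editor also rewrites every line that had trailing whitespace.
saved = strip_trailing(intended)

print(changed_lines(original, intended))  # 1
print(changed_lines(original, saved))     # 3
```

A one-line fix shows up as a three-line diff, which is the kind of noise a shared whitespace convention avoids.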

Contributor Author

Good points, and I'll keep this in mind. On the other hand consider how documentation editing is different than code editing. Inserting a few words in the middle of a paragraph causes the rest of the paragraph to reflow -- if one is manually adjusting the text to fit into 80 characters per line. Meaning, the one line with inserted words will overflow 80 columns, then every following line in that paragraph probably also overflows, resulting in an excessive diff.

For DVC docs I'm making sure to remove trailing spaces and to fit things into 80 columns. And I've adjusted the settings to insert spaces rather than tabs when hitting TAB.
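The reflow effect described above can be sketched with Python's standard `textwrap` module. The paragraph text and the helper `count_changed` are hypothetical, and automatic wrapping stands in for the manual 80-column adjustment, so treat this as an illustration rather than a measurement of the real docs.

```python
import textwrap

WIDTH = 80  # the column limit mentioned for the DVC docs

# Hypothetical paragraph, not taken from the actual documentation.
paragraph = (
    "The push command uploads cached data files to remote storage. "
    "It reads the checksums recorded in each stage file and copies the "
    "matching cache entries. Files that already exist on the remote are "
    "skipped, so repeated pushes are cheap. Run it after committing new "
    "outputs so collaborators can pull them."
)

# Insert three words near the start of the paragraph.
edited = paragraph.replace("uploads cached", "uploads all of the cached", 1)

before = textwrap.wrap(paragraph, width=WIDTH)
after = textwrap.wrap(edited, width=WIDTH)


def count_changed(old, new):
    """Count wrapped lines that differ, including added or removed lines."""
    paired = sum(1 for a, b in zip(old, new) if a != b)
    return paired + abs(len(old) - len(new))


print(len(before), "lines before;", count_changed(before, after), "lines changed")
```

The inserted words push text onto the following lines, so the diff touches lines well beyond the one that was actually edited.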

This command pushes all data file caches related to the current Git branch to
the remote storage.
Uploads files and directories from the current branch in the local workspace to
the [remote storage]('doc/commands-reference/remote').
Member

We push data from the cache based on the DVC files in the workspace. For example (let's double-check this), if I run something with --no-commit and then dvc push, data from the workspace won't be uploaded to the remote. Again, let's confirm this and come up with a better summary.


## Description

The `dvc push` command is the twin pair to the `dvc pull` command, and together
Member

I see that a lot of users still don't understand how all these commands work (dvc status -c, dvc pull/push/fetch, etc.). Could we think about an explanation similar to what we have in dvc add? It might help to make the examples more detailed: show the DVC file content, and explain that the command extracts checksums from it and pushes/pulls only those files between the cache and the remote.
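One possible way to picture the explanation requested here is a toy model of the three things involved: DVC files, the cache, and the remote. This is a sketch of the reviewer's description only, not DVC's implementation. The class `ToyRepo`, its fields, and the checksum strings are all hypothetical, and the treatment of data that never reached the cache is an assumption the thread itself flags as needing confirmation.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ToyRepo:
    """Toy model (not DVC's real code) of DVC files, cache, and remote."""

    # Maps a DVC file name to the checksums of the outputs it lists.
    dvc_files: Dict[str, Set[str]] = field(default_factory=dict)
    # Checksums of objects present in the local cache.
    cache: Set[str] = field(default_factory=set)
    # Checksums of objects present on the remote storage.
    remote: Set[str] = field(default_factory=set)

    def referenced(self) -> Set[str]:
        """Collect every checksum mentioned by any DVC file in scope."""
        found: Set[str] = set()
        for checksums in self.dvc_files.values():
            found |= checksums
        return found

    def push(self) -> Set[str]:
        """Upload referenced objects that are cached but not yet remote."""
        to_upload = (self.referenced() & self.cache) - self.remote
        self.remote |= to_upload
        return to_upload


repo = ToyRepo(
    dvc_files={"data.dvc": {"aaa111"}, "model.dvc": {"bbb222"}},
    cache={"aaa111", "ccc333"},  # "bbb222" was never saved to the cache
)

print(repo.push())  # {'aaa111'}
print(repo.push())  # set()
```

In this model `bbb222` is referenced but not cached, so there is nothing to upload for it, and `ccc333` is cached but not referenced by any DVC file in scope, so it stays local. A second push finds nothing new to upload.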


## Examples

Using the `dvc push` command remote storage must be defined. For an existing
Member

See the comment above ^^. I think it's worth explaining and illustrating it in more detail: show the state (with tree .) before and after, show the DVC file content, show whether a referenced file is in the cache or not, etc. It's definitely worth explaining in at least one of those examples.

Contributor Author

@robogeek robogeek Mar 15, 2019

I have put together an example focusing explicitly on what happens in the cache from dvc push operations. I've pushed it to the pull request so we can discuss whether this form is useful or not.

Member

@shcheklein shcheklein left a comment

Looks good! Thanks!
Since it repeats a lot of the dvc pull document, let's spend a few cycles providing more details and polishing the explanation. It still feels hard to understand that all these commands deal with three things: DVC files (the scope is determined via various options), the cache, and the remote. All three are in play, and we need to come up with some language (similar to dvc add?) to explain this.

@shcheklein shcheklein merged commit fc492a0 into master Mar 18, 2019
@efiop efiop deleted the push branch March 18, 2019 17:30