Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

repro: 1.x update #1572

Merged
merged 11 commits into from
Jul 22, 2020
Merged

repro: 1.x update #1572

merged 11 commits into from
Jul 22, 2020

Conversation

sarthakforwet
Copy link
Contributor

A PR as an outcome of the dicsussion at #1445(review).

This PR intends to completely update dvc repro documentation to sync with the 1.0 version of DVC.

@sarthakforwet

This comment has been minimized.

@jorgeorpinel

This comment has been minimized.

@sarthakforwet

This comment has been minimized.

@jorgeorpinel

This comment has been minimized.

Copy link
Contributor

@jorgeorpinel jorgeorpinel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the initiative @sarthakforwet , definitely a much needed update.

  • What about the Examples though? Need to test them all and update if needed — can be done in a separate PR though so no pressure. - Misc. updates #1615

content/docs/command-reference/repro.md Outdated Show resolved Hide resolved
content/docs/command-reference/repro.md Outdated Show resolved Hide resolved
content/docs/command-reference/repro.md Outdated Show resolved Hide resolved
content/docs/command-reference/repro.md Outdated Show resolved Hide resolved
content/docs/command-reference/repro.md Outdated Show resolved Hide resolved
content/docs/command-reference/repro.md Outdated Show resolved Hide resolved
content/docs/command-reference/repro.md Outdated Show resolved Hide resolved
sarthakforwet and others added 2 commits July 14, 2020 02:31
Co-authored-by: Jorge Orpinel <[email protected]>
Co-authored-by: Jorge Orpinel <[email protected]>
@jorgeorpinel

This comment has been minimized.

Copy link
Contributor

@jorgeorpinel jorgeorpinel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting there @sarthakforwet but a few items are pending from the previous review, and here's a few more:

content/docs/command-reference/repro.md Outdated Show resolved Hide resolved
content/docs/command-reference/repro.md Outdated Show resolved Hide resolved
@sarthakforwet
Copy link
Contributor Author

sarthakforwet commented Jul 14, 2020

I also said "targets doesn't support dirs." but actually I think that's wrong

Yeah actually, targets do support dirs:

So basically we have the following difference between usage of -R and -c option -

R_ cdiff

Directory structure -

directory

I think the -c option should be updated by removing the following line -
" Instead of using --cwd, one can alternately specify a target in a subdirectory as path/to/target.dvc."

and adding an example of using dvc repro with specific targets.

@jorgeorpinel

This comment has been minimized.

Comment on lines 106 to 107
- `-c <path>`, `--cwd <path>` - directory within the project to reproduce from.
Instead of using `--cwd`, one can alternately specify a target in a
Copy link
Contributor

@jorgeorpinel jorgeorpinel Jul 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like I just suggested, I'd like to test whether this option even makes provides any value now that dvc repro accepts directories as targets. But even if it does, the paragraph is wrong or outdated, for example "specify a target in a subdirectory as path/to/target.dvc" — target.dvc is not a subdir.

This comment was marked as resolved.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how it's obvious

Oh sorry! Maybe I interpreted the review incorrectly but I think you are asking for this -

image

Thanks

Copy link
Contributor

@jorgeorpinel jorgeorpinel Jul 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. You are correct for this scenario (./ which isn't the only scenario). The thing is you should be able to use dvc repro path/to/dir (without flags) if path/to/dir is tracked by DVC, I think (maybe not). That's what I'd like you to play with please, and if it does work, compare using or not using -c and -R in that scenario, and find a good way to express all your findings in the description (as well, probably, as the new Specific files or directories example you're suggesting).

Copy link
Contributor

@jorgeorpinel jorgeorpinel Jul 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • If you haven't done this, let's leave it for a separate PR @sarthakforwet

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing is you should be able to use dvc repro path/to/dir (without flags) if path/to/dir is tracked by DVC,

We can specify the corresponding .dvc file of the DVC tracked directory in dvc repro (which would update any change in the directory) but not the directory name itself. Moreover we cannot build a pipeline in DVC tracked directories ( as per my experimentation dunno if documented already). Also I think directories have no logical use as a dependency in a pipeline. (although we can add them if we wish). And hence I think it is not incorrect if dvc repro does not take directories as targets (without flags).

Since we cannot specify a directory in dvc repro, to reproduce pipelines inside a directory/sub-directory (which is untracked), we make use of -c and -R options.

However no example in the doc for using dvc repro with a specific target stage or .dvc file is provided and hence I think this should be done. WDYT @jorgeorpinel?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can specify the corresponding .dvc file ... but not the directory name itself

(or a stage name). OK got it. I think you're right. Except when you use -R.

  • This should be clarified in the -R option desc: It expands what the targets argument accepts.

we cannot build a pipeline in DVC tracked directories

I never thought about that... I guess it's to be expected. May need a core repo issue, will think about it 🧠 thanks

I think directories have no logical use as a dependency in a pipeline

Why not? Of course you can use a directory as a dependency. They can also be outputs. I don't see how this affects our conversation though.

to reproduce pipelines inside a directory/sub-directory (which is untracked), we make use of -c and -R options

And -P.

  • no example in the doc for using dvc repro with a specific target stage or .dvc file is provided and hence I think this should be done

Agree. I'll create an issue about these 2 check boxes, ans possibly something else... I'll tag you there. Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #1632

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(or a stage name)

I am referencing DVC tracked directories specifically in #1572(review) and I think "stage name" should not be mentioned with that..

And -P.

Yeah sure, but I was talking specifically in comparison of dvc repro with/without -R and -c options.

Agree. I'll create an issue about these 2 check boxes, ans possibly something else...

Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am referencing DVC tracked directories specifically

Stages in dvc.yaml can output directories too. It't not exclusive to .dvc files.

@iterative iterative deleted a comment from sarthakforwet Jul 17, 2020
@sarthakforwet

This comment has been minimized.

@sarthakforwet
Copy link
Contributor Author

@jorgeorpinel Although we cannot perform parallel stage execution but we can perform sequential stage execution (even when having different branches in the pipeline) like this-

image

would it be beneficial to mention this too?

@jorgeorpinel
Copy link
Contributor

jorgeorpinel commented Jul 17, 2020

Although we cannot perform parallel stage execution but we can perform sequential stage execution

UPDATE: Let's leave this for another PR. See #1572 (comment) also.

command: by specifying stage or `.dvc` file to reproduce, or by using the
command: by specifying a stage or a `.dvc` file to reproduce, or by using the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only one stage or one .dvc file is accepted as targets?

Copy link
Contributor Author

@sarthakforwet sarthakforwet Jul 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, there can be many. The sentence should be "specifying stages or .dvc files..." but only the grammar fix in the current text was asked so only did that. Should I update it?

Copy link
Contributor

@jorgeorpinel jorgeorpinel Jul 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"specifying stages or .dvc files" is exactly the grammar fix I was looking for.

@sarthakforwet
Copy link
Contributor Author

What about the Examples though? Need to test them all and update if needed.

The "Example: Downstream" probably needs updation @jorgeorpinel. Following updates could be needed-

a.) The "Example: Downstream" describes text.txt as the dependency in the last stage as - "The reason being that the text.txt file is a dependency in the last stage of the pipeline " but it is a dependency in the stage "filter". So we should update it accordingly.

image

b. Performing dvc repro --downstream in the example would reproduce the complete pipeline as no stages are specified and text.txt is also changed. So I think we should specify a stage to depict its true usage.

c. Running dvc repro --downstream does not update .dvc file as -

image

So should we also mention it in the doc?

I would open a new PR and mention this discussion if the changes are valid.

@jorgeorpinel jorgeorpinel changed the title Updated dvc repro documentation to sync with DVC 1.0 release repro: 1.x update Jul 22, 2020
@sarthakforwet
Copy link
Contributor Author

@jorgeorpinel Are we ready to merge this PR?

jorgeorpinel added a commit that referenced this pull request Jul 22, 2020
Copy link
Contributor

@jorgeorpinel jorgeorpinel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

A few items are pending which I marked with checkboxes in the discussion above. And I'm about to reply again to the question of dvc repro examples... ⏳

@jorgeorpinel jorgeorpinel merged commit 5fb3b4a into iterative:master Jul 22, 2020
@jorgeorpinel
Copy link
Contributor

Thanks again @sarthakforwet

On the examples, I've started updating them in #1615 since it's mostly typos and corrections, not really 1.x-related updates. Feel free to contribute to that branch (although you would need to open a PR from your fork to my branch... or you can leave suggestions in the changes).

b. Performing dvc repro --downstream in the example would reproduce the complete pipeline as no stages are specified and text.txt is also changed. So I think we should specify a stage to depict its true usage.

Are you sure? Meaning have you actually tried this? If so, that would probably be 1.x-related, actually.

c. Running dvc repro --downstream does not update .dvc file as -

Interesting finding... I don't know if that's expected, or some bug! Please open a report in the core repo (link here for context).

@jorgeorpinel jorgeorpinel mentioned this pull request Jul 22, 2020
@sarthakforwet
Copy link
Contributor Author

Are you sure?

Yeah I'm sure about this. It is evident from the screenshot in the next point. If we take another example -

image

We can say that by-default dvc repro --downstream starts with the first stage of the pipeline.

@jorgeorpinel
Copy link
Contributor

jorgeorpinel commented Jul 22, 2020

In that case like I said the --downstream option description definitely needs an update, as well as that example itself (see #1615 (review)). Would you like to submit a PR to my jorge branch with that? (since I already started #1615 using that branch)

Feel free to also include the 2 pending items listed under #1572 (review) above.

@sarthakforwet
Copy link
Contributor Author

Would you like to submit a PR to my jorge branch with that?

Yeah absolutely.

Feel free to also include the 2 pending items listed under #1572 (review) above.

Sure!

@sarthakforwet sarthakforwet deleted the repro branch July 22, 2020 20:56
command: by specifying stage file `targets`, or by using the `--single-item`,
`--cwd`, or other options.
There are a few ways to restrict the stages that will be regenerated by this
command: by specifying stages or `.dvc` files to reproduce, or by using the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can it accept an output?

stages -> stage names

.dvc - probably it's better not to metion this? what is the use case for this? can it be done w/o repro?

Copy link
Contributor Author

@sarthakforwet sarthakforwet Jul 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can it be done w/o repro?

Yes, .dvc files can be updated via dvc add.

Copy link
Contributor

@jorgeorpinel jorgeorpinel Jul 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can it accept an output?

Not supported by repro

.dvc - what is the use case for this?

Same as using dvc add again on a changed data file.

Keeping it in this paragraph because it's not an edge case to have the raw data as .dvc files at the beginning of a pipeline and dvc repro will by default update these if the data changed.

Updated in #1615.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if it the case (repro updates .dvc) - it's a side effect, not a main goal of the dvc repro ... dvc repro something.dvc for dvc add-ed files is confusing and not worth complicating docs for I think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dvc repro something.dvc for dvc add-ed files is confusing

Good point. Generalized in ebc1560.

@@ -53,7 +56,8 @@ corresponding commands. <abbr>Outputs</abbr> are deleted from the

It saves all the data files, intermediate or final results into the <abbr>DVC
cache</abbr> (unless the `--no-commit` option is used), and updates the hash
values of changed dependencies and outputs in the corresponding stage files.
values of changed dependencies and outputs in the corresponding stages (or
`.dvc` files).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to complicate everything with .dvc files?

Copy link
Contributor

@jorgeorpinel jorgeorpinel Jul 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately this is a pretty common thing as I just explained. And we no longer call .dvc files "stages", so we need to keep mentioning both some times... In fact I removed the parentheses and changed or->and.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's just use ... in the DVC files (.dvc, dvc.lock, etc) ... essence first, details second

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Updated to "in the DVC files (dvc.lock and .dvc)" (those are the only 2 that apply here) also in ebc1560.

Copy link
Contributor

@jorgeorpinel jorgeorpinel Jul 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DVC files (dvc.lock and .dvc)

Wouldn't it be better to just have "dvc.lock and .dvc files" though? Shorter

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, can be that as well

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed. I like it also because so far I've managed to avoid the term "DVC file" It's confusing vs. .dvc file, which is just one type of DVC file...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would even rename the DVC Files and Dirs guide to Internal Files and Dirs... But anyway, moving on!


- `-P`, `--all-pipelines` - reproduce all pipelines, for all the stage files
present in `DVC` repository.
- `-P`, `--all-pipelines` - reproduce all pipelines, for all the stages present
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it reproduce all pipelines within one dvc.yaml or within all of them? (search recusrsively)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It reproduces all of the pipelines (recursive search).

This comment was marked as resolved.

This comment was marked as resolved.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just tested. -P makes it recursive indeed (checks all dvc.yaml files). I rewrote the description anyway since it didn't make much sense. See #1615.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The funny think is that status is recursive by default, but repro needs -P... Is that a bug? Cc @efiop

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @sarthakforwet, for testing this. I would make the note more explicit then

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already updated the note.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

status is recursive by default, but repro needs -P

Extacted this matter iterative/dvc/issues/4273 with much more detail. PTAL

jorgeorpinel added a commit that referenced this pull request Jul 23, 2020
jorgeorpinel added a commit that referenced this pull request Jul 23, 2020
jorgeorpinel added a commit that referenced this pull request Jul 23, 2020
shcheklein pushed a commit that referenced this pull request Aug 5, 2020
* cmd: review repro examples
per #1572 (comment)

* cmd: fic tyupo in get-url

* cmd: updates to repro
per #1572 (review)

* cmd: rewrite repro -P desc
rel #1572 (review)

* cmd: simplified and generalize repro targets desc and DVC file mention
per #1572 (comment)
and #1572 (comment)

* cmd: minor update for repro desc wording
per #1572 (comment)

* term: don't use "synchronize" in the context of checkout

* cmd: rewrite Downstream example and added info for sequential execution of stages

* Update content/docs/command-reference/repro.md

* Update content/docs/command-reference/repro.md

* Update content/docs/command-reference/repro.md

* cmd: Updated Downstream example

* Update content/docs/command-reference/repro.md

* repro: Updated Downstream example

* Update content/docs/command-reference/repro.md

* cmd: updated last para for the description of --downstream and improved formatting

* cmd: review language of init --subdir

* term: revuew usage of "granular", esp. around init --subdir

* repro.md: updated Downstream example

* cmd: improve init --subdir explanation

* cmd: add info about nested subrepos to init

* cmd: fix -P option desc.
per #1615 (review)

* cmd: improve explanation on how --subdir affects commands
per #1615 (review)

* cmd: simplify nested structures explanation in init
per #1615 (comment)

* guide: add note aboud `cp` not being a download in external deps
per #1643 (review)

* cmd: add note about what --cwd means to repro
per iterative/dvc#4292 (comment)

* guide: nvmd! removing that note in external deps
per 42b670f#r41087633

* Update content/docs/command-reference/repro.md

* Update content/docs/command-reference/repro.md

* Update content/docs/command-reference/repro.md

* Update content/docs/command-reference/repro.md

* Update content/docs/command-reference/repro.md

* cmd: more small updates to init

* Update content/docs/command-reference/repro.md

* Update content/docs/command-reference/repro.md

* Restyled by prettier

* cmd: rewrap metrics diff usage paragraph

* term: remove "just" from -j desc in 3 refs
per eda27fc

* cmd: add command examples to init --subdir use cases
per #1615

* cmd: explain nested repo and projects of all kinds outside of --subdir
per #1615 (review)

* cmd: remove bold names to nested and not-nested structure examples in init --subdir
per #1615 (review)

* cmd: standardize --jobs option in all refs
per #1615 (review)
et al.

* cmd: add speed note to --jobs desc in all refs.
per #1615 (review)

* cmd: change versioning command example in init
per #1615 (review)

* cd: change repo comments in init --subdir examples
for #1615 (review)

* cmd: improve note on DVC submodules a little
for #1615 (review)

* cmd: better explain why isolation is important in --subdir bullet
per #1615 (review)

* cmd: split last --subdir cases explicitly as 2 bullets
per #1615 (comment)

* cmd: remove most notes and code block examples about nesting projects/repos in init
per #1615 (review)

Co-authored-by: sarthakforwet <[email protected]>
Co-authored-by: Restyled.io <[email protected]>
shcheklein pushed a commit that referenced this pull request Aug 5, 2020
* cmd: review repro examples
per #1572 (comment)

* cmd: fic tyupo in get-url

* cmd: updates to repro
per #1572 (review)

* cmd: rewrite repro -P desc
rel #1572 (review)

* cmd: simplified and generalize repro targets desc and DVC file mention
per #1572 (comment)
and #1572 (comment)

* cmd: minor update for repro desc wording
per #1572 (comment)

* term: don't use "synchronize" in the context of checkout

* cmd: rewrite Downstream example and added info for sequential execution of stages

* Update content/docs/command-reference/repro.md

* Update content/docs/command-reference/repro.md

* Update content/docs/command-reference/repro.md

* cmd: Updated Downstream example

* Update content/docs/command-reference/repro.md

* repro: Updated Downstream example

* Update content/docs/command-reference/repro.md

* cmd: updated last para for the description of --downstream and improved formatting

* cmd: review language of init --subdir

* term: revuew usage of "granular", esp. around init --subdir

* repro.md: updated Downstream example

* cmd: improve init --subdir explanation

* cmd: add info about nested subrepos to init

* cmd: fix -P option desc.
per #1615 (review)

* cmd: improve explanation on how --subdir affects commands
per #1615 (review)

* cmd: simplify nested structures explanation in init
per #1615 (comment)

* guide: add note aboud `cp` not being a download in external deps
per #1643 (review)

* cmd: add note about what --cwd means to repro
per iterative/dvc#4292 (comment)

* guide: nvmd! removing that note in external deps
per 42b670f#r41087633

* Update content/docs/command-reference/repro.md

* Update content/docs/command-reference/repro.md

* Update content/docs/command-reference/repro.md

* Update content/docs/command-reference/repro.md

* Update content/docs/command-reference/repro.md

* cmd: more small updates to init

* Update content/docs/command-reference/repro.md

* Update content/docs/command-reference/repro.md

* Restyled by prettier

* cmd: rewrap metrics diff usage paragraph

* term: remove "just" from -j desc in 3 refs
per eda27fc

* cmd: add command examples to init --subdir use cases
per #1615

* cmd: explain nested repo and projects of all kinds outside of --subdir
per #1615 (review)

* cmd: remove bold names to nested and not-nested structure examples in init --subdir
per #1615 (review)

* cmd: standardize --jobs option in all refs
per #1615 (review)
et al.

* cmd: add speed note to --jobs desc in all refs.
per #1615 (review)

* cmd: change versioning command example in init
per #1615 (review)

* cd: change repo comments in init --subdir examples
for #1615 (review)

* cmd: improve note on DVC submodules a little
for #1615 (review)

* cmd: better explain why isolation is important in --subdir bullet
per #1615 (review)

* cmd: split last --subdir cases explicitly as 2 bullets
per #1615 (comment)

* cmd: remove most notes and code block examples about nesting projects/repos in init
per #1615 (review)

* cmd: add basic text about nesting to init

* cmd: revert nested block examples back into --subdir
per #1661 (review)

* cmd: remove new section about nesting to revert more

Co-authored-by: sarthakforwet <[email protected]>
Co-authored-by: Restyled.io <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants