Exp User Guide #2269

Closed
dberenbaum opened this issue Mar 5, 2021 · 9 comments
Labels
A: docs Area: user documentation (gatsby-theme-iterative) type: enhancement Something is not clear, small updates, improvement suggestions

Comments

@dberenbaum
Contributor

dberenbaum commented Mar 5, 2021

Child issue of #2266

Proposed requirements

  • Suggest helpful workflows for doing experiments
  • Include enough detail that users can implement suggested workflows without additional context

Proposal

Phase 1

Added an "Experiment Management" section to the user guide (https://dvc.org/doc/user-guide/experiment-management). This guide explains various workflows for different purposes:

  1. Running independent experiments
  2. Running dependent checkpoints
  3. Persisting experiments

Further, it notes different ways to organize experiments and how the run-cache helps avoid re-running experiments.

Status: Completed

Phase 2

This page likely doesn't need a full rewrite but might have a few additions.

One major addition could be suggested workflows for templating:

  • When to use template vars vs foreach vs experiment parameters
    • vars: value that needs to be set in multiple places (for example, across stages or in both deps and outs of a stage)
    • foreach: multiple values to run and keep separately every time you run the pipeline (for example, running on different language datasets)
    • exp: values that you want to try out but only keep the optimal one (optimization)
  • When to integrate templating with experiments (for example, changing the data dep used for each experiment)
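
To make the distinction above more concrete, here is a minimal sketch of a dvc.yaml that combines all three. It is only an illustration, not taken from the DVC docs: the stage names, scripts, paths, and the train.lr parameter are hypothetical, and it assumes a params.yaml that defines train.lr.

```yaml
# Hypothetical dvc.yaml; all names and paths are made up for illustration.
vars:
  # vars: one value reused in several places (here in cmd and deps)
  - data_dir: data

stages:
  prepare:
    # foreach: every listed value is run and kept on each pipeline run
    foreach:
      - en
      - fr
    do:
      cmd: python prepare.py ${data_dir}/${item}.csv prepared/${item}
      deps:
        - prepare.py
        - ${data_dir}/${item}.csv
      outs:
        - prepared/${item}

  train:
    cmd: python train.py prepared/en
    deps:
      - train.py
      - prepared/en
    # exp: a value to try out and keep only the best of, e.g. with
    #   dvc exp run --set-param train.lr=0.01
    params:
      - train.lr
    outs:
      - model.pkl
```

In this sketch the foreach stage expands into one stage per language, while the learning rate is left to experiments, so only the chosen value ends up committed.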

See other ideas in the questions and comments.

Status: Under consideration

Questions

  • Should this page suggest using exp run over repro, or at least address the overlap (i.e. repro predates exp run and may be more natural for some but can be replaced by exp run)?
  • Should dvclive integration be mentioned, or should that be left to dvclive docs?
  • Are there suggested workflows for sharing experiments worth mentioning?
  • Are there suggested workflows for cleaning up experiments worth mentioning?

Comments

We might want to rethink how we organize the User Guide in the navbar. It's getting to be a long list without much structure. Maybe some pages can be collapsed under another umbrella to make pages like this stand out more.

@jorgeorpinel jorgeorpinel added A: docs Area: user documentation (gatsby-theme-iterative) type: enhancement Something is not clear, small updates, improvement suggestions labels Mar 9, 2021
@jorgeorpinel
Contributor

jorgeorpinel commented Mar 9, 2021

When to use template vars vs foreach vs experiment parameters...

Sounds a bit more like content for the "best practices" section (#72), linked from the guide.

Should this page suggest using exp run over repro

The guide already focuses on exp run. Maybe I didn't get the Q.
BTW checkpoint stages are incompatible with repro.

(repro) can be replaced by exp run

They have different use cases. Technically yes, you can always use exp run and simply ignore the experiments (custom Git refs), but it's a discussion similar to "do we need import --to-remote and add --to-remote?". Conceptually they're totally different even if the results are pretty similar. For 3.0 we can def. discuss combining/merging them if it makes sense, but I'd take that to the core repo.

Should dvclive integration be mentioned

I vote for leaving that for the LogML (is that the name?) docs. It's an independent project and it could be misleading to casually document it in regular DVC docs as if it was a built-in feature. Also it may be confusing since it's similar to checkpoints in some ways.

Thanks

suggested workflows for sharing experiments / cleaning up experiments

Do you mean the info we already mention? Otherwise what workflows do you have in mind?

rethink how we organize the User Guide

Yes! #144

@dberenbaum
Contributor Author

dberenbaum commented Mar 9, 2021

Sounds a bit more like content for the "best practices" section (#72), linked from the guide.

Are best practices not part of the user guide? Are there different types of best practices that belong in different sections? This might be part of thinking through how to organize the user guide.

The guide already focuses on exp run...
They have different use cases...
I vote for leaving that for the LogML (is that the name?) docs...

👍

Do you mean the info we already mention? Otherwise what workflows do you have in mind?

I mean for exp push/pull/gc - topics that aren't covered now in Experiments Management.

Yes! #144

🎉

@jorgeorpinel
Contributor

Are best practices not part of the user guide?

Yes, it will probably be part of the UG (with subpages if needed). I meant that maybe we can put that info there instead of in https://dvc.org/doc/user-guide/experiment-management.

I mean for exp push/pull/gc - topics that aren't covered now

  • Ah OK, yes, we should probably emphasize those features a bit more in the intro guide indeed, perhaps instead of the run-cache section.

@jorgeorpinel
Contributor

I see that some of the ideas here come from #2266 (comment):

it should be section that will be answering the "workflow" questions and goes into some technical details if needed. E.g. when do I commit an experiment? Are commits also experiments? What are the best practices.

@dberenbaum
Contributor Author

Yes, it will probably be part of the UG (with subpages if needed). I meant that maybe we can put that info there instead of in https://dvc.org/doc/user-guide/experiment-management.

Right, including templating suggestions under experiment management might not make sense. Should we put the discussion of when to use vars vs foreach vs exp in another issue then? And leave this one for adding push, pull, gc and any other commands not covered by experiment management now?

@jorgeorpinel
Contributor

jorgeorpinel commented Mar 11, 2021

discussion of when to use vars vs foreach vs exp

Depending on the answer to that question — which I'm not sure about TBH 🙂 . I guess

  • vars: parameterize pipeline definitions - could be interesting to combine with exps
  • foreach: partition (wide) stages or abbreviate (long) stages - doesn't seem esp. meaningful for exps
  • exp+params: track all attempts without worrying about Git

adding push, pull, gc

Agree about adding a Sharing section to the UG (prob instead of run-cache). Not sure we need to cover removing experiments though, as those are secondary utilities and already mentioned/linked from exp examples, whenever appropriate.

@dberenbaum
Contributor Author

Related: #2313

@jorgeorpinel
Contributor

I think this had some actionable items, may just need to summarize it 🙂

@dberenbaum
Contributor Author

I added an issue for exp sharing. There's also some discussion above of adding UG docs for templating, but I think that can wait until we have more specific stories like #2313.
