Revise experiments docs? #2266

Closed
dberenbaum opened this issue Mar 4, 2021 · 12 comments
Labels
A: docs Area: user documentation (gatsby-theme-iterative) ✨ epic Placeholder ticket for multi-sprint direction, use story, improvement type: discussion Requires active participation to reach a conclusion.

Comments

@dberenbaum
Contributor

Using this ticket to discuss experiments docs, with a focus on:

  1. Clarity - Experiments have a lot of overlap with other commands in both DVC and Git (for example, dvc exp run compared to dvc run and dvc repro). Some exp commands also have overlap with each other (for example, apply and branch). There are also certain keywords that we adopted (for example, "persisting" experiments), but I don't think they are defined anywhere. All of this may both confuse users and obfuscate the value of experiments.

  2. Consistency - Experiments are documented in https://dvc.org/doc/command-reference/exp, https://dvc.org/doc/user-guide/experiment-management, and https://dvc.org/doc/start/experiments (and references elsewhere). @pmrowla, @jorgeorpinel, and I wrote these sections separately. Despite collaboration and review, there are different perspectives and likely inconsistencies.

Now that experiments are an officially released feature, we can gather any feedback from users and discuss here.

@dberenbaum
Contributor Author

In #2243 (comment), I mentioned that the differences between exp apply and exp branch are unclear, and that their behavior is not sufficiently explained by high-level descriptions. My inclination is to more directly address how experiments work under the hood.

Here's the intro to the getting started doc:

Experiments proliferate quickly in ML projects where there are many parameters to tune or other permutations of the code. We can organize such projects and only keep what we ultimately need with dvc experiments. DVC can track experiments for you so there's no need to commit each one to Git. This way your repo doesn't become polluted with all of them. You can discard experiments once they're no longer needed.

That intro raises questions:

  • What exactly is being tracked?
  • What does "tracking" mean?
  • What benefits do I get by having DVC track my experiments?

An alternative approach would be more explicit. For example:

Experiments proliferate quickly in ML projects where there are many parameters to tune or other permutations of the code. With DVC experiments, you can run hundreds of experiments without navigating between Git branches or commits, and you can compare the metrics and parameters for all experiments in a single table.

DVC experiments create Git commits that are stored so that DVC will track them but Git will ignore them. If you want to retrieve the code and outputs for an experiment, DVC can put them in your workspace or your Git history. All the other experiments remain hidden, and DVC can delete them with a single command.
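
To make the model in the proposed intro concrete, here is a minimal toy sketch of it. Everything in it is an assumption for illustration: the `ToyRepo` class, its method names, and the `refs/exps/` namespace are hypothetical stand-ins, not DVC's implementation. It only shows the idea that experiments are snapshots kept where ordinary branch listings do not look, and how "apply" (workspace only) differs from "branch" (enters Git history).

```python
from dataclasses import dataclass, field

# Assumed namespaces for this toy model only.
HEADS = "refs/heads/"  # where regular branches live
EXPS = "refs/exps/"    # hypothetical hidden namespace for experiments


@dataclass
class ToyRepo:
    refs: dict = field(default_factory=dict)       # ref name -> snapshot
    workspace: dict = field(default_factory=dict)  # what the user currently sees

    def exp_run(self, name, params, metrics):
        # Save the experiment outside the branch namespace.
        self.refs[EXPS + name] = {"params": params, "metrics": metrics}

    def git_branches(self):
        # A plain branch listing only looks under refs/heads/.
        return sorted(r[len(HEADS):] for r in self.refs if r.startswith(HEADS))

    def exp_show(self):
        # One table of every stored experiment.
        return {r[len(EXPS):]: s for r, s in self.refs.items() if r.startswith(EXPS)}

    def exp_apply(self, name):
        # Copy the experiment into the workspace; no branch is created.
        self.workspace = dict(self.refs[EXPS + name])

    def exp_branch(self, name, branch):
        # Promote the experiment to a regular branch.
        self.refs[HEADS + branch] = self.refs[EXPS + name]

    def exp_gc(self):
        # Delete all hidden experiments in one step; branches survive.
        self.refs = {r: s for r, s in self.refs.items() if not r.startswith(EXPS)}


repo = ToyRepo()
repo.exp_run("exp-a", {"lr": 0.1}, {"acc": 0.81})
repo.exp_run("exp-b", {"lr": 0.01}, {"acc": 0.86})
print(repo.git_branches())      # no branches yet: experiments stay hidden
repo.exp_apply("exp-b")         # workspace now holds exp-b's params and metrics
repo.exp_branch("exp-b", "best-lr")
repo.exp_gc()
print(repo.git_branches(), repo.exp_show())
```

In this sketch, applying an experiment changes only the workspace, branching is the step that makes it visible to regular Git tooling, and garbage collection removes whatever was never promoted.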

@shcheklein shcheklein added 2.0 release type: discussion Requires active participation to reach a conclusion. A: docs Area: user documentation (gatsby-theme-iterative) labels Mar 5, 2021
@shcheklein
Member

@dberenbaum thanks for creating this ticket to do another round of reviews and improvements. A few notes from me (I haven't had time to review some parts of the docs though).

Strategic points:

  • Experiments get started - feels like it should become a self-sufficient entry point (it should not depend on the data management sections or come last, after them). We'll probably need a better DL project for this.
  • User guide experiments - my take on this is that it should be the section that answers the "workflow" questions and goes into some technical details if needed. E.g. when do I commit an experiment? Are commits also experiments? What are the best practices?
  • We'll need to write a Use Case - a "sales" material. Similar to the Data Registry one.
  • Do we need a tutorial besides the get started one? Get started should stay simple to my mind. It's been a constant fight back and forth - simplifying and growing the complexity back in it. People want to squeeze "everything" into it, and it doesn't work well.
  • Video - we need to do the new video

Tactics:

  • A lot of those small things you mentioned, we can start creating a list with checkboxes?

Also, a meta discussion :) it might make sense to even create a separate ticket per checkbox (and link those in the list in this ticket), otherwise it seems it will be hard to discuss so many points in this single thread?

@dberenbaum
Contributor Author

I made separate issues for each strategic point. I realized this probably isn't quite what you meant @shcheklein, but maybe it's better to think about our docs as products and focus the issues on each product instead of technical questions.

Here's a checklist for the strategic points (see links to the issues above):

  • Get started
  • User guide experiments
  • Use Case
  • Tutorial
  • Video

@shcheklein
Member

but maybe it's better to think about our docs as products and focus the issues on each product instead of technical questions.

totally agree! it's just sometimes it's hard to define the line - docs itself is a product (that we have Get Started, Use Cases, User Guide, Cmd Ref in the first place), then each section can be potentially a product (but we should keep in mind other sections and decisions made there). Not pushing to any actions here, just sharing my experience. I'm totally fine to try to iterate in a way you suggested.

I made separate issues for each strategic point

yeah, I had in mind probably tactical points (like should we use a specific term or not) - so that we don't discuss all of them simultaneously in one place (it's hard to do on GH in a single ticket). We can actually try to use GH discussions for this?

@dberenbaum
Contributor Author

Yeah, after thinking about it more, I thought we might need to address those strategic points before getting to the tactics. If we want to make major changes to some of the docs, not worth getting caught in the details now. It would be great to propose what we expect from each doc before starting it or revising it. That was the hope with the new issues (once they get more fleshed out), but maybe it would be better in something like Notion, Google Drive, a markdown file in a PR for comment, etc.

yeah, I had in mind probably tactical points (like should we use a specific term or not) - so that we don't discuss all of them simultaneously in one place (it's hard to do on GH in a single ticket). We can actually try to use GH discussions for this?

Good idea. I was just looking at those today and thinking about when they might be useful instead of issues. I'll probably try that once we get a better sense of the strategic points.

@shcheklein
Member

Notion, Google Drive, a markdown file in a PR for comment, etc.

Any mechanism works for me personally (I would be happy to keep it on GH for public visibility, but it can be too painful? not sure. though an RFC PR sounds like an option? maybe GH discussions? maybe our forum - discuss.dvc.org?)

@dberenbaum
Contributor Author

I got intrigued seeing some of the great proposals from our users, like https://www.notion.so/Change-dvc-lock-hashes-eb027be2df044ce382183a14718ec48b 😄 , but I agree that GH is great to keep public visibility and public comment.

@dberenbaum
Contributor Author

@shcheklein @jorgeorpinel Interested in your thoughts on the structure of #2268 so that I can decide whether to do something similar for the other issues. Of course, please also provide any feedback on the substance in the issue itself.

@jorgeorpinel
Contributor

jorgeorpinel commented Mar 10, 2021

dvc exp run compared to dvc run and dvc repro

@dberenbaum on that specifically, dvc run is being phased out in favor of stage add. repro is quite similar but has a different purpose in principle. Probably a core discussion on how/whether to redesign the UI?

apply and branch...

I'll update docs per #2243 (comment) but there's also a possible core prod discussion on this, I think: why not let users do git commands instead of providing these exact wrappers?

Any other specific overlaps you think we can ameliorate with docs?

keywords that we adopted (for example, "persisting

I hope "persistent" is a general-enough term that doesn't need a special definition but we can make a tooltip if needed in those cases. Any other special terms you've detected?

Consistency... there are different perspectives and likely inconsistencies

The docs sections are meant to have different perspectives though, keeping in mind website traffic funnels can start in any page (which unfort we haven't studied well enough yet). Let's def. address any specific inconsistencies though.

address how experiments work under the hood

We try not to go into implementation details except very carefully (often hidden), as probably only very few users can get value from that. It may not help most people understand, but it can scare them off. Also, implementation can change much more often than high-level features or even the UI, so tech details make docs harder to maintain. We also have GH tickets, support history, and the codebase itself for that (different types of docs).

Interested in your thoughts on the structure of #2268 so that I can decide whether to do something similar for the other issues

That struct is great for discussions but may need significant updates or further breaking down into smaller actionable tasks (and keep it readable in the future if it lingers).

💡 We can also consider enabling Discussions in this repo instead BTW, if we start treating docs sections more as products or prod features.


I've left more specific comments in the sub-issues here. @dberenbaum feel free to summarize what is still pending to decide for this parent ticket. Or if you intend to keep it as "epic" there's actually a label for that (feel free to apply it). Thanks

@dberenbaum dberenbaum added the ✨ epic Placeholder ticket for multi-sprint direction, use story, improvement label Mar 10, 2021
@dberenbaum
Contributor Author

Thanks, @jorgeorpinel! I'll keep it as an epic then for now and keep actual discussion in this issue to a minimum.

@dberenbaum
Contributor Author

This may have been overly ambitious 🤣 . Thoughts on closing this and the related issues for now? We could always reopen if we prioritize them in the future.

@shcheklein
Member

@dberenbaum no concerns on my end, we can discuss the priorities for the next iteration together.
