guide: Data Management #2856
Comments
Re the title: I'd just avoid the phrase "model management" since it has a specific meaning (not what this is about), but if you want to include "model", maybe use "data and model file management". I don't think we need to include "model" in the title, but "model file(s)" can be included in the content.
I'm not sure we have it written down anywhere :) What kind of meaning do you have in mind? What is so different between (I can see, for example, that if we include
Like, related to the ML model lifecycle? I mentioned this in the PR (https://www.dominodatalab.com/solutions/model-management/) and it wasn't contested, so I assumed I was correct, but you guys are the experts! If it doesn't have a special meaning then it doesn't matter. But if it does, users and search engines could get confused.
I'm not an expert on naming things :) I put "model" in the title because the current docs have it, and we added "model" after a user requested it. I understand the meaning described in https://www.dominodatalab.com/solutions/model-management/ and how it differs from the way we use it, but in this new domain people often use the same words to mean different things. I have no strong opinion here, and honestly, writing a specific "model management" document for the UG might be more appropriate. But until then, we can have "model" in the title and say in the text that "models are files that can be tracked by DVC".
Data Mgmt is simple enough that covering it in the Get Started and Command Reference has thus far been enough. But having a group of existing content under this "Category" could achieve some goals:
So just doing that reorg of existing content could be a good and quick first step, I think. Then we reconsider all the material proposed above. WDYT?
Given that all that is already covered (albeit maybe disorganized) and not really the general goal of the UG (explanation-type docs), here's a new plan for the Data Management user guide:
My only problem with this idea is that we should drive the value of the product and feature earlier than the user guide. This should be in use cases or in the Get Started if needed, even in README as well. OK to repeat in the UG as well, but as a quick recap. @shcheklein
UPDATE: #2856 (comment)
This is the plan for the data management trail, which focuses on:
& Versioning data in DVC projects
See also guide: extract remote add/modify details from cmd ref. #2866
Adding data to DVC projects
Initialize a DVC repository and use dvc add to add files.
We'll assume MNIST data exist in a folder and will add it.
Versioning data in DVC projects
Creating remotes
Add a Google Drive folder as a remote.
Make it default
Pushing to/pulling from remotes
Accessing public datasets and registries
Removing data from DVC projects
UPDATE: start with a reorg, see #2856 (comment) below (may be enough).