-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add mlem get started #109
Add mlem get started #109
Conversation
Converting this to draft since actual docs are not live on mlem.ai, but any feedback is still welcomed |
example-mlem-get-started/generate.sh
Outdated
dvc remote add myremote azure://example-mlem --default | ||
mlem config set default_storage.type dvc | ||
python train.py | ||
dvc add .mlem/model/rf .mlem/dataset/train.csv .mlem/dataset/test_x.csv .mlem/dataset/test_y.csv |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's not very common for DVC to track each independently (esp dataset) - is it intended?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are .mlem files in this dir. Can we do something like echo *.mlem > .dvcignore && dvc add data
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the correct pattern was found and it is /**/?*.mlem
. Btw, does dvcignore support ! negation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this does not seem to work, since we cant dvc add .mlem/dataset/
since we lose .mlem
files, even though they ignored. So going with dvc add .mlem/dataset/*.csv
. Would be great to have some way to do "add everything except dvc-ignored"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, does dvcignore support ! negation?
yes, it does support it
it should be working, otherwise it's a bug to my mind
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$ cat .dvcignore
/**/*.mlem
!/.mlem/
$ dvc check-ignore .mlem -d
.dvcignore:2:!/.mlem/ .mlem
I was trying to make an exception for .mlem
dir in a root, but this is what I've got
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
q - why do you need an exception like this? (curious)
does the same combination work for gitignore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My goal is to exclude all *.mlem
files, but not .mlem
dir in repo root. But simple /**/*.mlem
adds .mlem
dir to ignore list. With git this exclusion works works
example-mlem-get-started/generate.sh
Outdated
|
||
|
||
cp $HERE/code/src/requirements.txt . | ||
cp $HERE/code/src/prepare.py . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you don it similar to DVC get-started you need to bundle this code and share from S3 so that people could curl or wget it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not get it from raw.github.com?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raw.github.com works as well, I don't see any particular problems with that .. I would bundle though everything as a single tar (or may github has a way to download revisions as a tar with curl, I don't know)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it contains some extra files that are not needed initially though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it contains whole repo, yes. But I don't see any point of maintaining them separately, since they are very small and easy and also will be available in the docs themselves
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's not about the size, it's about the workflow. In the dvc.org get started you would have a command:
wget code.zip
unzip code.zip
git add -a -m "add code"
https://dvc.org/doc/start/data-pipelines#expand-to-download-example-code
including any extra files there would confuse and ruin the flow of the document.
so, your call here - if you want something similar to DVC then consider creating a clean code package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will consider this, thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thinking second time - I remember now why I didn't use GH (most likely) - I wanted to keep the command as short and nice as possible wget https://code.dvc.org/get-started/code.zip
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for codifying this 🙏
I put some comments here and there to address, but also:
- should we create a branch in the regular example-get-started that converts models into mlem and uses gto + deploys it. That would be cool.
- in this example - should we have something about client, deployment, gto? Just saving model might be not very interesting (not clear why would I want this at all)
|
Thanks, @mike0sv for addressing some things. I left a bit more comments, but nothing major. |
Co-authored-by: Alexander Guschin <[email protected]>
Per #109 (comment) - we don't need gitpython anymore here
Per #109 (comment) - we don't need gitpython anymore here
Fixed and resolved some stuff, answered other comments. Waiting for your follow-up on the ones we need to address. @shcheklein |
Ok, now I think I've addressed everything we've discussed, @shcheklein :) If you agree, then please approve the PR. |
[Get Started](https://mlem.ai/doc/get-started). It is a step-by-step quick | ||
introduction into basic MLEM concepts. | ||
|
||
- The `main` branch contains only the code needed to start the Get Started. This |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's quite unusual - downside is that people who come to the repo won't see anything in it. This is not good to my mind. What is the purpose of keeping the main like this at all if we have the corresponding tags? Can it be done this way:
simple
becomesmain
dvc
- can it be done in way that builds on top of simple - we can make a PR fromdvc
tomain
(ex-simple) to DVC-ify the project?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did that and committed changes. Please see iterative/mlem.ai#132 for PR updating documentation.
@aguschin thanks! we are getting there :) I still have some questions though. Please see some unresolved comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor comments left and I didn't try to run it.
Good progress, we are almost there.
Overall, still look too complicated for the MLEM get started. I think as we come to the updates on docs we can optimize this further.
Thanks for the feedback. Fixed those @shcheklein. |
@aguschin one unresolved item left - https://github.com/iterative/example-repos-dev/pull/109/files#r919031250 - up to you to address or merge! Good stuff. |
All changes addressed. I've just updated https://github.com/iterative/example-mlem-get-started Now merging iterative/mlem.ai#132 and #109 Thanks for your help and feedback @jorgeorpinel and @shcheklein ! |
No description provided.