Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add mlem get started #109

Merged
merged 26 commits into from
Jul 21, 2022
Merged

Add mlem get started #109

merged 26 commits into from
Jul 21, 2022

Conversation

mike0sv
Copy link
Contributor

@mike0sv mike0sv commented Apr 19, 2022

No description provided.

@mike0sv mike0sv requested a review from casperdcl April 19, 2022 20:20
@mike0sv mike0sv marked this pull request as draft April 19, 2022 20:22
@mike0sv
Copy link
Contributor Author

mike0sv commented Apr 19, 2022

Converting this to draft since actual docs are not live on mlem.ai, but any feedback is still welcomed

dvc remote add myremote azure://example-mlem --default
mlem config set default_storage.type dvc
python train.py
dvc add .mlem/model/rf .mlem/dataset/train.csv .mlem/dataset/test_x.csv .mlem/dataset/test_y.csv
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's not very common for DVC to track each independently (esp dataset) - is it intended?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are .mlem files in this dir. Can we do something like echo *.mlem > .dvcignore && dvc add data?

Copy link
Contributor Author

@mike0sv mike0sv Apr 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the correct pattern was found and it is /**/?*.mlem. Btw, does dvcignore support ! negation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this does not seem to work, since we cant dvc add .mlem/dataset/ since we lose .mlem files, even though they ignored. So going with dvc add .mlem/dataset/*.csv. Would be great to have some way to do "add everything except dvc-ignored"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, does dvcignore support ! negation?

yes, it does support it

it should be working, otherwise it's a bug to my mind

Copy link
Contributor Author

@mike0sv mike0sv Apr 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$ cat .dvcignore 
/**/*.mlem
!/.mlem/
$ dvc check-ignore .mlem -d
.dvcignore:2:!/.mlem/   .mlem

I was trying to make an exception for .mlem dir in a root, but this is what I've got

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

q - why do you need an exception like this? (curious)

does the same combination work for gitignore?

Copy link
Contributor Author

@mike0sv mike0sv Apr 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My goal is to exclude all *.mlem files, but not .mlem dir in repo root. But simple /**/*.mlem adds .mlem dir to ignore list. With git this exclusion works works



cp $HERE/code/src/requirements.txt .
cp $HERE/code/src/prepare.py .
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don it similar to DVC get-started you need to bundle this code and share from S3 so that people could curl or wget it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not get it from raw.github.com?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

raw.github.com works as well, I don't see any particular problems with that .. I would bundle though everything as a single tar (or may github has a way to download revisions as a tar with curl, I don't know)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it contains some extra files that are not needed initially though?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it contains whole repo, yes. But I don't see any point of maintaining them separately, since they are very small and easy and also will be available in the docs themselves

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not about the size, it's about the workflow. In the dvc.org get started you would have a command:

wget code.zip
unzip code.zip
git add -a -m "add code"

https://dvc.org/doc/start/data-pipelines#expand-to-download-example-code

including any extra files there would confuse and ruin the flow of the document.

so, your call here - if you want something similar to DVC then consider creating a clean code package.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will consider this, thanks

Copy link
Member

@shcheklein shcheklein Apr 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thinking second time - I remember now why I didn't use GH (most likely) - I wanted to keep the command as short and nice as possible wget https://code.dvc.org/get-started/code.zip

Copy link
Member

@shcheklein shcheklein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for codifying this 🙏

I put some comments here and there to address, but also:

  • should we create a branch in the regular example-get-started that converts models into mlem and uses gto + deploys it. That would be cool.
  • in this example - should we have something about client, deployment, gto? Just saving model might be not very interesting (not clear why would I want this at all)

@mike0sv
Copy link
Contributor Author

mike0sv commented Apr 20, 2022

  1. Yes, but this could wait until MLEM release I guess
  2. I am planning to add something like that later, this was for first two sections of get-started https://github.com/iterative/mlem.ai/pull/43/files#diff-1cd29580767fad4637f461df8460373ba6ec523dc598d7d3550657ed56cab101

@shcheklein
Copy link
Member

Thanks, @mike0sv for addressing some things. I left a bit more comments, but nothing major.

shcheklein added a commit that referenced this pull request Apr 21, 2022
Per #109 (comment) - we don't need gitpython anymore here
shcheklein added a commit that referenced this pull request Apr 21, 2022
Per #109 (comment) - we don't need gitpython anymore here
@shcheklein shcheklein changed the title add mlem get started Add mlem get started May 12, 2022
@aguschin
Copy link
Contributor

aguschin commented Jul 8, 2022

Fixed and resolved some stuff, answered other comments. Waiting for your follow-up on the ones we need to address. @shcheklein

@aguschin aguschin self-assigned this Jul 8, 2022
@aguschin
Copy link
Contributor

Ok, now I think I've addressed everything we've discussed, @shcheklein :) If you agree, then please approve the PR.

[Get Started](https://mlem.ai/doc/get-started). It is a step-by-step quick
introduction into basic MLEM concepts.

- The `main` branch contains only the code needed to start the Get Started. This
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's quite unusual - downside is that people who come to the repo won't see anything in it. This is not good to my mind. What is the purpose of keeping the main like this at all if we have the corresponding tags? Can it be done this way:

  • simple becomes main
  • dvc - can it be done in way that builds on top of simple - we can make a PR from dvc to main (ex-simple) to DVC-ify the project?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did that and committed changes. Please see iterative/mlem.ai#132 for PR updating documentation.

@shcheklein
Copy link
Member

@aguschin thanks! we are getting there :) I still have some questions though. Please see some unresolved comments.

Copy link
Member

@shcheklein shcheklein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments left and I didn't try to run it.

Good progress, we are almost there.

Overall, still look too complicated for the MLEM get started. I think as we come to the updates on docs we can optimize this further.

@aguschin
Copy link
Contributor

Thanks for the feedback. Fixed those @shcheklein.

@shcheklein
Copy link
Member

@aguschin one unresolved item left - https://github.com/iterative/example-repos-dev/pull/109/files#r919031250 - up to you to address or merge! Good stuff.

@aguschin aguschin dismissed shcheklein’s stale review July 21, 2022 11:11

Changes addressed :)

@aguschin
Copy link
Contributor

All changes addressed. I've just updated https://github.com/iterative/example-mlem-get-started

Now merging iterative/mlem.ai#132 and #109

Thanks for your help and feedback @jorgeorpinel and @shcheklein !

@aguschin aguschin merged commit d0b3914 into master Jul 21, 2022
@aguschin aguschin deleted the add-mlem-get-started branch July 21, 2022 11:14
@jorgeorpinel jorgeorpinel added A: docs Area: user documentation (related) A: example-mlem-get-started MLEM examples labels Jul 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A: docs Area: user documentation (related) A: example-mlem-get-started MLEM examples
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

4 participants