docs: add 'Using Shared Server For The Team' #56
@@ -0,0 +1,94 @@
# Using Shared Server For The Team
It took me a while to recall what this is about. "Shared Server" is not a very obvious name for this scenario. Let's use a straightforward name, something like: Multiple Data Scientists on a Single Machine
Ok
@@ -0,0 +1,94 @@
# Using Shared Server For The Team

The key principle that every data science team should aim for is to make
Let's try to start with the description of the scenario first:
It's pretty common to see that teams prefer using one single shared machine to run their experiments ...
Then we need to mention briefly why this is happening:
- Better resource utilization - I can utilize multiple GPUs, for example
- Probably, less expensive
- Data locality
- something I am missing?
Ok
data is super valuable as it enables your colleagues and you to take advantage
of the data processing that you've already done, and understand the context
behind results you create and surface potential caveats. In order to share
common data, many teams setup a shared server for the whole team. With DVC,
Maybe we should omit or simplify the description above? It's too long, and I'm not sure it's to the point for this use case.
Ok
$ git push
```

And now you can just as easilly get his work appear in your workspace by:
Should we mention garbage collection, and that we are working on an LRU cache strategy to keep the size limited?
I don't think so; it is beyond the scope of this article.
Good stuff. I would simplify the title and intro. I'll try to come up with a picture for this.
@shcheklein I have actually used this https://www.forbes.com/sites/quora/2017/04/04/what-are-best-practices-for-collaboration-between-data-scientists/#babcee4335ee for the introduction to this use case and its naming.
Fixes iterative#44 Signed-off-by: Ruslan Kuprieiev <[email protected]>
👍 let's merge, and polish online