shared cache and dvc import #4476
Comments
And just to add: I tried configuring all of the DVC projects (project_A, project_B, project_C, project_D) to point to a single shared cache, and it appears to work. Any concerns?
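A minimal sketch of that setup, assuming one cache directory at /network/storage/shared_dvc/cache (a hypothetical parent of the per-project caches listed in the original question). It uses only the documented DVC commands dvc cache dir, dvc config cache.shared and dvc config cache.type. The helper names shared_cache_commands and configure_project are illustrative. By default it only prints the commands instead of running them. The symlink link type is one common choice for a cache on network storage, but it is an assumption here, not a requirement.

```python
from __future__ import annotations

import shlex
import subprocess
from pathlib import Path

# Assumption: every project shares this single cache directory on the network storage.
SHARED_CACHE = "/network/storage/shared_dvc/cache"
PROJECTS = ["project_A", "project_B", "project_C", "project_D"]


def shared_cache_commands(cache_dir: str) -> list[list[str]]:
    """DVC CLI calls that point one project at a shared cache (hypothetical helper)."""
    return [
        ["dvc", "cache", "dir", cache_dir],          # set cache.dir in the project's .dvc/config
        ["dvc", "config", "cache.shared", "group"],  # make cache files group-writable for all users
        ["dvc", "config", "cache.type", "symlink"],  # link workspace files to the cache, no copies
    ]


def configure_project(project_dir: Path, cache_dir: str = SHARED_CACHE,
                      dry_run: bool = True) -> list[str]:
    """Return the shell-equivalent commands; execute them only when dry_run is False."""
    lines = []
    for cmd in shared_cache_commands(cache_dir):
        lines.append(f"(cd {shlex.quote(str(project_dir))} && {shlex.join(cmd)})")
        if not dry_run:
            subprocess.run(cmd, cwd=project_dir, check=True)
    return lines


if __name__ == "__main__":
    for name in PROJECTS:
        for line in configure_project(Path(name)):
            print(line)
```

Running it once per project leaves all four pointing at the same cache, so an object that is already cached by one project is not stored again for another.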
@wdixon Using the same shared cache dir is a good approach. The only thing that you need to be aware of is that
Thank you. It might be nice to show an example of two projects using the same cache in the documentation. On first reading, it wasn't clear that the shared cache was meant to be shared across projects. Going back and reading it again, I do see that it mentions "everybody's projects".
@wdixon You mean https://dvc.org/doc/use-cases/shared-development-server ? So your case is multiple projects and multiple users, right?
Yes, that is the correct URL, and yes: multiple projects and multiple users. Initially I had set up a separate cache for each project (which would be fine when the projects are completely independent). However, when you pull in part of a data registry through dvc import, that is where I ran into cache duplication. I ended up using a single cache directory for all the projects, which seems to be what you recommend, and I no longer have any duplicate cache storage. Thanks for the help.
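Why a single cache removes the duplication can be shown with a simplified model of a content-addressed cache: each file is stored under a path derived from the hash of its content, so identical content maps to one entry. The layout below (MD5, split into the first two hex characters and the rest, as in DVC 1.x-era caches) is an assumption made for illustration. This is not DVC's code, and the function names are hypothetical.

```python
from __future__ import annotations

import hashlib
import tempfile
from pathlib import Path


def cache_path(cache_dir: Path, data: bytes) -> Path:
    """Simplified cache entry location: <cache>/<md5[:2]>/<md5[2:]>."""
    digest = hashlib.md5(data).hexdigest()
    return cache_dir / digest[:2] / digest[2:]


def add_to_cache(cache_dir: Path, data: bytes) -> Path:
    """Store data once per unique content and return its cache path."""
    path = cache_path(cache_dir, data)
    if not path.exists():  # same content already cached: nothing new is written
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
    return path


def cached_bytes(*cache_dirs: Path) -> int:
    """Total bytes stored across the given distinct cache directories."""
    return sum(f.stat().st_size
               for d in set(cache_dirs) for f in d.rglob("*") if f.is_file())


if __name__ == "__main__":
    image = b"\x89PNG fake image bytes" * 1000  # stand-in for one dataset file
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Separate caches: the registry (project_B) and the importer (project_D) each keep a copy.
        add_to_cache(root / "cache_B", image)
        add_to_cache(root / "cache_D", image)
        separate = cached_bytes(root / "cache_B", root / "cache_D")
        # One shared cache: the import finds the object already present.
        add_to_cache(root / "shared", image)
        add_to_cache(root / "shared", image)
        shared = cached_bytes(root / "shared")
        print(f"separate caches: {separate} bytes, shared cache: {shared} bytes")
```

With separate caches, the stored bytes double for every imported file. With one shared cache, the second add is a no-op.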
@wdixon btw, just in case you missed this, you might want to enable
@jorgeorpinel should we update the doc a bit to make it explicit that shared cache is also about sharing cache across different projects?
Sure. I should be getting to that use case soon. Will keep this in mind 👍
I do think a bit more in the docs about sharing a cache across projects would be helpful.
I'm adding the info here: https://github.com/iterative/dvc.org/pull/1724/files#diff-7b8425b522dc0dcb5f8845ed84d12ce6L10-R18 PTAL
This is more of a question, related to setting up data registries and the implications of a shared cache with dvc import.
Presently I have a few datasets, each created as a separate Git/DVC project (each roughly in the 1000 GB range).
Each dataset contains a group of specific images, along with several different annotation types.
Each dataset has been configured to use a separate (independent) shared cache on network-attached storage, visible to several shared development servers:
/network/storage/shared_dvc/cache/project_A
/network/storage/shared_dvc/cache/project_B
/network/storage/shared_dvc/cache/project_C
This part is working.
Now the question arises from consuming these registries with a fourth project (project_D). This project contains the code defining a DL network and a training script. The network consumes a composite of the information contained in registries project_B and project_C (accomplished with dvc import).
It would seem unnecessary to duplicate the cache storage.
The datasets eat up storage fairly quickly, so I'm looking for guidance on minimizing the impact of duplicate copies.
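Before consolidating, one way to measure how much the separate caches duplicate is to look for the same cache-relative object path in more than one cache. This sketch assumes content addressing (identical content lands at the same relative path in every cache) and that each listed directory is a DVC cache. The project_D cache path and the function names are hypothetical. It compares paths only and does not re-hash files. It only reports and never moves or deletes anything.

```python
from __future__ import annotations

from collections import defaultdict
from pathlib import Path


def duplicate_objects(cache_dirs: list[Path]) -> dict[str, list[Path]]:
    """Map each cache-relative object path to the caches holding it, if more than one."""
    holders: dict[str, list[Path]] = defaultdict(list)
    for cache in cache_dirs:
        for f in cache.rglob("*"):
            if f.is_file():
                holders[f.relative_to(cache).as_posix()].append(cache)
    return {rel: caches for rel, caches in holders.items() if len(caches) > 1}


def reclaimable_bytes(cache_dirs: list[Path]) -> int:
    """Bytes freed if every duplicated object were kept only once."""
    total = 0
    for rel, caches in duplicate_objects(cache_dirs).items():
        size = (caches[0] / rel).stat().st_size
        total += size * (len(caches) - 1)  # all copies beyond the first are redundant
    return total


if __name__ == "__main__":
    root = Path("/network/storage/shared_dvc/cache")
    # project_D's cache location is an assumption; the other three come from the question.
    caches = [root / p for p in ("project_A", "project_B", "project_C", "project_D")]
    existing = [c for c in caches if c.is_dir()]
    dups = duplicate_objects(existing)
    print(f"{len(dups)} duplicated objects, "
          f"{reclaimable_bytes(existing) / 1e9:.1f} GB reclaimable")
```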