DVC fails to push data from external cache to default remote #4686
Comments
And in documents
Maybe we have to pull it down first? |
@EmmaBYPeng What is your use case? Maybe we can work around that for now. @karajan1001 PS |
Thanks for the response!
Re the same location for cache and remote: #3703 seems to suggest that it's bad for the external cache and remote storage to be the same thing.
@pared I guess what we are looking for is to use DVC as a data registry to track data stored on GCS, which multiple developers (including CI) can read and write. We don't want to store the data locally since 1) the data is big, and 2) our ML pipelines need to read data directly from GCS. Our workflow should look something like:
My questions are:
|
Your workflow makes perfect sense. If you want to store your data on
Yes, let's remember that DVC in this case is tracking an external file - so any
No, it's actually better to have a single cache. |
For the record: |
So this is an issue of |
I guess that, since we already mention the external cache requirement for external outputs, we could also mention there that it is the cache that stores the dependencies, and that pushing them to other remotes has no effect. |
Yes, and in the |
@karajan1001 @EmmaBYPeng @efiop I created an issue on docs to clarify this use case. |
Closing in favor of iterative/dvc.org#1865 |
Bug Report
Please provide information about your setup
Output of dvc version:
Use case
We want to track data on GCS using DVC (without downloading it to local machines), with an external cache and remote storage.
Steps
Issues
After the push, we got "Everything is up to date.", while nothing showed up in the default storage bucket (our cache bucket did have the cached data, though).
I'm new to DVC so I might have misunderstood the external data workflow. Please let me know if I missed anything in the above steps!
Reference: https://dvc.org/doc/user-guide/managing-external-data
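For readers trying to picture the setup described in this report, here is a minimal sketch of what such a configuration could look like, assuming DVC 1.x and the commands from the guide referenced above. These are not the reporter's actual steps, which are not shown in this thread. The bucket URLs and the remote names `storage` and `gscache` are hypothetical placeholders, and the helper only prints the commands unless `dry_run` is turned off.

```python
import shlex
import subprocess


def external_data_commands(data_url, cache_url, remote_url):
    """Return the DVC CLI calls for tracking `data_url` in place (DVC 1.x assumed)."""
    return [
        # Default remote, i.e. the one a plain `dvc push` targets.
        ["dvc", "remote", "add", "-d", "storage", remote_url],
        # A second remote that is used as the external cache for gs:// outputs.
        ["dvc", "remote", "add", "gscache", cache_url],
        ["dvc", "config", "cache.gs", "gscache"],
        # Track the data where it lives instead of downloading it.
        ["dvc", "add", "--external", data_url],
        ["dvc", "push"],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print("$ " + shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


# Hypothetical bucket names, shown as a dry run only.
run(external_data_commands(
    data_url="gs://example-data/dataset",
    cache_url="gs://example-cache/cache",
    remote_url="gs://example-remote/store",
))
```

In the report, a final `dvc push` in a setup like this printed "Everything is up to date." and left the default remote empty. The discussion in the comments attributes that to the external cache being the place where such data is stored, so pushing to another remote has no effect.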