More granular control on caching of logs with dvclive #77

Closed
Benjamin-Etheredge opened this issue May 15, 2021 · 6 comments · Fixed by iterative/dvc.org#3411
Labels
enhancement question Further information is requested

Comments

@Benjamin-Etheredge

The `cache` flag under `live` in `dvc.yaml` seems to be all-or-nothing: either all of the outputs (log files, summaries, and HTML) are cached, or none of them are. The logging directories, which contain the data logged at every iteration, are less likely to be checked into Git than the summaries, because the per-iteration data will always produce a large number of differences. Summaries are generally small and thus prime candidates for being tracked with Git.

Currently, this can be done by setting `cache` to `true` and removing the summary files from `.gitignore`. This seems to run counter to the purpose of DVC's `cache` option, and I'm also unsure of the implications of doing it. Does DVC still track that item? Is it now tracked by both Git and DVC?

Adding options to cache each output of the `live` section individually would make workflows easier when only the summaries from logging are meant to be tracked.
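A minimal sketch of the requested behavior, assuming a hypothetical per-output `cache` flag. DVC's `live` section has no such option; `LiveOutput`, `split_outputs`, `gitignore_entries` and the paths `logs`, `logs.json`, `logs.html` are illustrative names only. Outputs with `cache=True` go to the DVC cache and get a `.gitignore` entry, while the rest are left for Git to track.

```python
from dataclasses import dataclass


# Hypothetical model of one dvclive output declared under `live` in dvc.yaml.
# The per-output `cache` flag is the feature requested here, not an existing option.
@dataclass(frozen=True)
class LiveOutput:
    path: str
    cache: bool  # True: cache with DVC and git-ignore it; False: track with Git


def split_outputs(outputs):
    """Return (dvc_cached, git_tracked) lists of output paths."""
    dvc_cached = [o.path for o in outputs if o.cache]
    git_tracked = [o.path for o in outputs if not o.cache]
    return dvc_cached, git_tracked


def gitignore_entries(outputs):
    """Only DVC-cached outputs are ignored; Git-tracked ones must stay visible to Git."""
    dvc_cached, _ = split_outputs(outputs)
    return ["/" + path for path in dvc_cached]


if __name__ == "__main__":
    outputs = [
        LiveOutput("logs", cache=True),        # per-step logs: large, noisy diffs
        LiveOutput("logs.json", cache=False),  # small summary: a good fit for Git
        LiveOutput("logs.html", cache=False),  # HTML report
    ]
    print(split_outputs(outputs))
    print(gitignore_entries(outputs))
```

Keeping the decision per output (rather than one flag for the whole `live` section) is exactly what lets the summary JSON stay out of `.gitignore` while the per-step logs stay in the DVC cache.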

@pmrowla
Contributor

pmrowla commented May 17, 2021

> Currently, this can be done by setting cache to true and removing summary files from .gitignore. This seems counter to the intentions of DVC providing the cache option. I'm also unsure of the implications of doing this. Does DVC still track that item? Is it now duplicated in git and DVC tracking?

This will cause those files to be tracked by both DVC (as cached objects) and Git, which is a problem. It will eventually cause certain DVC commands to fail, since we do not allow adding or tracking a file that is already tracked by Git.
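A rough sketch of the double-tracking problem described above, assuming the DVC output paths and the list of Git-tracked files (for example, as printed by `git ls-files`) are already available as relative POSIX paths. `find_double_tracked` is a hypothetical helper for illustration, not DVC's actual check; a directory output is treated as covering every file beneath it.

```python
def find_double_tracked(dvc_outputs, git_tracked_files):
    """Return, sorted, the Git-tracked files that are also covered by a DVC output.

    An output matches a file if the paths are equal, or if the output is a
    directory that contains the file (prefix match on "<output>/").
    """
    conflicts = set()
    for tracked in git_tracked_files:
        for out in dvc_outputs:
            prefix = out.rstrip("/") + "/"
            if tracked == out or tracked.startswith(prefix):
                conflicts.add(tracked)
    return sorted(conflicts)


if __name__ == "__main__":
    dvc_outputs = ["logs", "logs.json"]             # both cached by DVC
    git_files = ["dvc.yaml", "train.py", "logs.json"]  # summary also committed to Git
    print(find_double_tracked(dvc_outputs, git_files))
```

Any non-empty result corresponds to the situation above: a file that DVC caches but that was also committed to Git after being removed from `.gitignore`.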

As you noted, what you really need is the ability to set specific dvclive outputs to `cache: false` so they can be tracked via Git instead.

Transferring this issue to the dvclive repo for now, but there may be a design reason for this limitation that I'm not aware of. @pared @dberenbaum

@pmrowla pmrowla transferred this issue from iterative/dvc May 17, 2021
@pared
Contributor

pared commented May 17, 2021

HTML files are not tracked at all (but can be put under Git control). The DVC cache can track only the live directory itself, not the summary JSON. The JSON itself should not be put into `.gitignore`, and one should be able to put it under Git control. The design is flawed; it was chosen this way to avoid imposing large changes on the DVC side just to support dvclive. I guess we need to get back to this problem and consider whether we can fix it once and for all here.

@dberenbaum
Collaborator

@Benjamin-Etheredge What do you want to cache with DVC and what do you want to track with Git? It reads to me like you want to cache the logging directory with DVC and track the other live outputs with Git, which I believe is the default behavior.

@pared pared added the awaiting response we are waiting for your reply, please respond! :) label Jun 15, 2021
@jorgeorpinel jorgeorpinel removed the awaiting response we are waiting for your reply, please respond! :) label Apr 30, 2022
@jorgeorpinel

jorgeorpinel commented Apr 30, 2022

Since the `live` section is being removed in favor of DVC metrics and plots, it sounds like this is no longer relevant? Cc @daavoo, if you could explain/confirm/close that would be great, thanks!

@daavoo
Contributor

daavoo commented Apr 30, 2022

Indeed. I was waiting for the docs to be merged before closing this.

@jorgeorpinel

jorgeorpinel commented May 1, 2022

OK, I see it's linked, so yes, it will get closed. But for clarity, maybe you want to explicitly explain here how granular caching of logs works now?

P.S. Maybe even move it to the dvclive repo as a Q.

@jorgeorpinel jorgeorpinel added the question Further information is requested label May 1, 2022