exp queue: live metrics #8478
Comments
It should be possible to collect this from the metrics files even if they have not yet been written to dvc.lock |
On top of the above information, the VS Code extension needs some way to map the temp directory that the experiment is being run in back to the experiment. We require this information so that we can set up file watchers to:
[Q] Will |
No. VS Code could run @karajan1001 @daavoo Any ideas here? |
I think the main issue is that we don't know those temp dirs, and that's what @mattseddon was asking about. If we know them we can set up watchers (need to make sure that we have all the project information from the |
Yup, makes sense, just checking if that would be enough once he is able to determine those temp dirs, since AFAIK the extension relies entirely on |
To achieve this, we need to make |
@karajan1001 mentioned in #8787:
To clarify, this issue is about non-checkpoint updates that happen in the tmp workspace. AFAIK, the live metrics in the video are only possible because that demo repo uses checkpoints. |
@karajan1001 could you clarify please? I'm not sure I understand the point |
Hi @dberenbaum, so what you mean here is that users continuously update
Hi @shcheklein , the current method to gather remote execution results is that we use |
Correct. |
For the local temp dir executor, we can just monitor the temp workspace, but for experiments running on a remote server it would be much harder to implement this. |
Yup, I think it's fine to focus on local execution when we get to working on this issue. |
The directory being used can be determined based on the hash for the experiment. Files related to the temporary execution process/dir are stored in So for something like:
The full exp hash will be visible in Inside the
The file that vscode should be looking at is
It's just a json file with a single level dictionary. The
To get file-watcher based live metrics updates without relying on The cli equivalent to get real-time/live metrics for the running exp would just be:
or
or even
@iterative/vs-code |
For reference, all of the files in a given queued exp's
{"pid": 81538, "stdin": null, "stdout": "/Users/pmrowla/git/example-get-started/.dvc/tmp/exps/run/f684e2a8963ee2c47590cb8fa2823fee4129753b/f684e2a8963ee2c47590cb8fa2823fee4129753b.out", "stderr": null, "returncode": 255}
|
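As a rough illustration of how a consumer could read the process file shown above, here is a minimal Python sketch. It parses the sample JSON from this thread and derives the run directory and experiment hash from the `stdout` path. The function name `summarize_process_info` is hypothetical, and treating a non-null `returncode` as "process has exited" is an assumption, not something confirmed in the discussion.

```python
import json
from pathlib import PurePosixPath

# Sample copied verbatim from the comment above.
SAMPLE = (
    '{"pid": 81538, "stdin": null, "stdout": '
    '"/Users/pmrowla/git/example-get-started/.dvc/tmp/exps/run/'
    'f684e2a8963ee2c47590cb8fa2823fee4129753b/'
    'f684e2a8963ee2c47590cb8fa2823fee4129753b.out", '
    '"stderr": null, "returncode": 255}'
)


def summarize_process_info(raw: str) -> dict:
    """Parse a single-level process-info dict (hypothetical helper)."""
    info = json.loads(raw)
    stdout = info.get("stdout")
    out_path = PurePosixPath(stdout) if stdout else None
    return {
        "pid": info["pid"],
        # In the sample, the .out file is named after the exp hash.
        "exp_hash": out_path.stem if out_path else None,
        "run_dir": str(out_path.parent) if out_path else None,
        # Assumption: a non-null returncode means the process has exited.
        "finished": info.get("returncode") is not None,
    }


summary = summarize_process_info(SAMPLE)
print(summary)
```

The sketch only relies on the keys visible in the sample; any other files in the run directory are left alone because their names are not shown here.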
We can consider revisiting how |
@pmrowla Why do you think it's easier for VS Code to do this than DVC? Top priority should be having this in VS Code, but I think a user could rightfully ask why they can't get the same info collected easily in the CLI |
Because they can reuse their existing code, and because using file watchers to wait for a metrics file to change is the correct way to handle this in a gui application (as opposed to repeatedly calling We can support this in |
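To make the "react to a metrics file changing" idea concrete, here is a small stdlib-only Python sketch. It is not how the VS Code extension works: a GUI would normally use OS-level file notifications, and this stand-in just compares `(mtime, size)` snapshots. The helper names `snapshot` and `changed_paths` and the file name `metrics.json` are illustrative assumptions.

```python
import os
import tempfile
from pathlib import Path


def snapshot(paths):
    """Map each existing path to its (mtime_ns, size) signature."""
    state = {}
    for p in paths:
        try:
            st = os.stat(p)
        except FileNotFoundError:
            continue  # file not written yet; skip it
        state[str(p)] = (st.st_mtime_ns, st.st_size)
    return state


def changed_paths(before, after):
    """Return paths that are new or whose signature differs."""
    return sorted(p for p, sig in after.items() if before.get(p) != sig)


# Demo: simulate a running experiment rewriting its metrics file.
with tempfile.TemporaryDirectory() as tmp:
    metrics = Path(tmp) / "metrics.json"  # hypothetical file name
    metrics.write_text('{"acc": 0.1}')
    before = snapshot([metrics])
    metrics.write_text('{"acc": 0.25, "loss": 1.5}')
    after = snapshot([metrics])
    demo_changed_count = len(changed_paths(before, after))

print(demo_changed_count)
```

Only when a path shows up in `changed_paths` would the consumer need to re-read the metrics, which is the point made above about not refreshing on a fixed schedule.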
Agreed, VS Code should be responsible for deciding when it's needed to get updates. Once they determine an update is needed, why not have DVC collect the experiments in one place? When |
@mattseddon can correct me, but we definitely don't plan to do it this way. We want to run this with watchers. We need to know temporary locations. I think that should be enough for VS Code.
If that works and it's stable enough, I think VS Code can read this information. @mattseddon wdyt? All of this doesn't change the discussion on the DVC side though. I think @dberenbaum point is that we'd like to be able to see live metric updates as an HTML report in DVC? Etc. |
Let's separate plots and the exp table. I don't really see a use case to have For the exp table, it makes sense to me to have live updates from queued experiments collected as part of |
Tl;dr - yes we can do it on the VS Code side, no it would not be trivial. Min 2 weeks effort to get either plots or experiments updates then a little more to get the other. As plots are not on the roadmap we will probably have to do this anyway.

exp show updates

In order to implement on the VS Code side I will need to:
[Q]s
There are plenty of points of failure in the above and we seem to be relying more and more on the internals of DVC. 5/6 will require some heavy lifting to fit in with the code that is already there. An optimistic estimate would be 2 weeks of effort to get this ironed out. This will more than likely turn into 3 and could blow out even further.

plots diff updates

If DVC isn't going to provide plots updates then I will still have to build out most of the steps above anyway. The main difference being for step 6 the task is easier because the data for each experiment is more or less held separately. |
To add some further context to the above comment the first task in iterative/vscode-dvc#3178 is
|
Yes, the status is mapped to celery task states, so the individual values probably aren't meaningful to vscode, but anything dvc/dvc/repo/experiments/executor/base.py Lines 58 to 65 in 37761a2
It's not garbage collected right now, but will be eventually. This is related to the performance issues with
No, in this case from DVC's perspective you will be running |
Makes sense to me. @mattseddon @pmrowla Do you see any way around VS Code watching files to decide when to update? I assumed this has to be handled by VS Code since DVC is not running any kind of daemon and can only check for updates when called.
I'm open to discussing what DVC can do. I'm not sure it's worth extending the plots syntax, but for For @pmrowla Are there other concerns you have? |
@pmrowla how much effort would it take to add the |
I don't think it belongs in What we could do is add some other plumbing command that walks |
Would it be more appropriate to extend |
Thinking about this some more, I think maybe we can keep everything vscode wants/needs in a single dvc command (whether or not that's The issue right now is that we started from the standpoint of "just dump the dvc structures dicts in json, in a way that also sort of looks like what gets displayed in the CLI table". This was fine from a "we need something that works for vscode right now" standpoint, but what we have 2 years later is a giant mess because what we really do is:
This problem isn't even specific to vscode and the json output. The CLI table has changed over time as well. exp show started from "just display single-commit experiments and flat git branches when a user manually runs a command from the CLI" but over time we've had to add more and more stuff like
and the Collecting the live metrics stuff from the tempdirs is straightforward for DVC, and the dvc-task/celery stuff was all done in a way to make that kind of collection easier than it was before. But Instead of continuing to try and work around the current setup, we should just get around to coming up with a data schema that is actually sane. This will require changes in both dvc and vscode, but it will give us a schema that allows future additions to be made easily and makes sense for both dvc and consumers (whether that's vscode, studio, or something else entirely). |
If I was going to put the executor/process stuff in a separate subcommand it probably belongs in What we really want is something separate that is intended specifically for vscode |
I am on board with this 100% and I am sorry that we have contributed to the code/output being so hacked together. If it helps I can easily get together a list of things that we need/use from the output and how we use it. I will happily rewrite whatever I need to on the VS Code side to accommodate these changes and I'd like to contribute on the DVC side too. |
I think it would be a good idea to break the original problem into two separate parts:
VS Code can fairly easily handle 1 and call DVC for updates at the correct time. |
Talked to @pmrowla today, and he is planning to research (up until early next week) the direction he mentioned above, which could potentially collect all info (params, metrics, plots, etc.) for any set of experiments. |
This would be great to get from the vscode side @mattseddon |
So the tl;dr is that we currently use everything that isn't From the exp show data we extract the following information:
Outside of
Having a Please let me know if you need more information. Happy to go into greater details on any of the points above. |
For long-running queued/temp experiments, I'd like to see metrics that are being written in the temp dir while it's running, even if checkpoints aren't enabled. DVC collects these for the workspace but not for experiments running in temp dirs.
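One practical wrinkle with reading metrics while an experiment is still running is that the file may be missing or only partially written at the moment it is read. The Python sketch below shows one way a reader could tolerate that by falling back to the last good value. The helper `read_metrics` and the file name `metrics.json` are hypothetical; nothing here reflects how DVC itself collects metrics.

```python
import json
import tempfile
from pathlib import Path


def read_metrics(path, previous=None):
    """Return the metrics dict at `path`, or `previous` if unreadable."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        # Missing or mid-write file: keep showing the last good values.
        return previous
    return data if isinstance(data, dict) else previous


# Demo in a throwaway directory standing in for an experiment temp dir.
with tempfile.TemporaryDirectory() as tmp:
    metrics_file = Path(tmp) / "metrics.json"  # hypothetical file name
    missing = read_metrics(metrics_file)
    metrics_file.write_text('{"acc": 0.5}')
    first = read_metrics(metrics_file)
    metrics_file.write_text('{"acc": 0.')  # simulate a partial write
    during_write = read_metrics(metrics_file, previous=first)

print(missing, first, during_write)
```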