Queue: wrap up e2e workflow #3178

shcheklein · 2023-01-30T19:05:44Z

Followup after #3091

Tasks


Watch and keep updating plots
When something runs in background - can we allow running an experiment in workspace?
Attach and follow standard output / error
Failed / stopped - add resume and / or run again action
Add to a queue action is very slow - can we add a progress bar and notification? (fixed by "Remove pausing of data updates when running/queueing experiments" #3815, without adding a notification)

See #3091 (comment)

shcheklein · 2023-02-04T23:43:20Z

I've updated the ticket.

mattseddon · 2023-02-05T22:19:33Z

Watch and keep updating plots

Depends on iterative/dvc#8478. Unactionable until that is closed.

mattseddon · 2023-02-08T23:20:57Z

@shcheklein can you please reorder the checkboxes in order of priority? I will add some comments.

shcheklein · 2023-02-08T23:35:43Z

@mattseddon done (also, trying a new beta feature - tasklists).

mattseddon · 2023-02-10T01:48:43Z

When something runs in background - can we allow running an experiment in workspace?

We need to update the extension's package.json and the experiments table to make this happen. It should be achievable to enable. However, I have manually tested this and the experience feels very unstable.

Attach and follow standard output / error

It can be done with dvc queue logs <exp-id> -f but will need a sizeable amount of wiring built around it. We would need to use the PseudoTerminal and build a wrapper around it that looks a lot like the DvcRunner. Should be enabled from the right-click context menu in the table which will bring its own set of challenges (can it be run twice for the same experiment, should it be disabled, under what circumstances, etc).
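One of the open questions above ("can it be run twice for the same experiment") could be handled with a small registry that tracks which experiments already have a log follower attached. This is only a minimal sketch under stated assumptions: `LogFollowers`, `StartProcess` and `buildFollowLogsArgs` are hypothetical names, not part of the extension or of the DvcRunner, and the process spawner is injected so that the pseudo-terminal wiring stays out of the picture. The only thing taken from the comment above is the `dvc queue logs <exp-id> -f` command.

```typescript
// Hypothetical sketch: none of these names are taken from the extension's code.
type StopFollowing = () => void
type OnOutput = (chunk: string) => void

// Injected process starter (assumption). A real implementation would spawn
// `dvc` with the given args and forward its output to a terminal-like view.
type StartProcess = (args: string[], onOutput: OnOutput) => StopFollowing

// Arguments for `dvc queue logs <exp-id> -f`
const buildFollowLogsArgs = (expId: string): string[] => [
  'queue',
  'logs',
  expId,
  '-f'
]

class LogFollowers {
  private readonly active = new Map<string, StopFollowing>()
  private readonly startProcess: StartProcess

  constructor(startProcess: StartProcess) {
    this.startProcess = startProcess
  }

  // Starts following logs for an experiment.
  // Returns false (and starts nothing) if a follower is already attached.
  follow(expId: string, onOutput: OnOutput): boolean {
    if (this.active.has(expId)) {
      return false
    }
    const stop = this.startProcess(buildFollowLogsArgs(expId), onOutput)
    this.active.set(expId, stop)
    return true
  }

  // Stops the follower for an experiment, if there is one.
  stop(expId: string): void {
    const stop = this.active.get(expId)
    if (stop) {
      stop()
      this.active.delete(expId)
    }
  }

  // Could drive whether the context menu entry is enabled.
  isFollowing(expId: string): boolean {
    return this.active.has(expId)
  }
}
```

With something like this, the context menu action could be disabled while `isFollowing(expId)` is true, which answers the "run twice" question without deciding anything about the terminal UI.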

Failed / stopped - add resume and / or run again action

I don't think this is currently possible. Maybe we could do something with a combination of apply, re-queue and start the queue but I feel like it is unlikely that it will work as expected. Feels like this needs a FR against DVC.

Add to a queue action is very slow - can we add a progress bar and notification?

What would the progress notification show and when would it update? From the data we get back, it would go from being empty to being full and that would be it (we have no information about intermediate steps). My suggestion for this would be to raise an issue in DVC to make the command more performant.

mattseddon · 2023-02-10T04:07:50Z

I don't think we can mark this as done until iterative/dvc#8763 is resolved.

shcheklein · 2023-02-11T00:47:25Z

Thanks @mattseddon, for giving more color / details to the list.

However, I have manually tested this and the experience feels very unstable.

Could you please add more color/details to this?

It can be done with dvc queue logs <exp-id> -f but will need a sizeable amount of wiring built around it. We would need to use the PseudoTerminal and build a wrapper around it that looks a lot like the DvcRunner. Should be enabled from the right-click context menu in the table which will bring its own set of challenges (can it be run twice for the same experiment, should it be disabled, under what circumstances, etc).

Can we cut the scope? Show a tooltip with static output, dump the logs to a file and open it, etc.? Clearly people will have to run it again to get the most recent result. But at least it's something. It's way better than nothing.

Failed / stopped - add resume and / or run again action
I don't think this is currently possible. Maybe we could do something with a combination of apply, re-queue and start the queue but I feel like it is unlikely that it will work as expected. Feels like this needs a FR against DVC.

@dberenbaum could you please chime in? What's your take on being able to resume / queue again a failed / killed experiment(s)?

My suggestion for this would be to raise an issue in DVC to make the command more performant.

It's not possible I think. We can get it only to a certain level. I see that it's problematic w/o DVC support. We need to start thinking about improving this though - for data commands, for queue, for all commands that take a long time to run.

dberenbaum · 2023-02-14T16:05:03Z

@dberenbaum could you please chime in? What's your take on being able to resume / queue again a failed / killed experiment(s)?

Resuming a killed experiment and re-queuing a failed experiment seem like two different scenarios to me:

Resuming a killed experiment: this is strictly a bug/regression that's tracked in "exp run: --temp --rev does not properly resume from the target revision" (dvc#7813). We haven't prioritized yet because we need to solidify the entire checkpoints/resume workflow and this is merely a bandaid, but I don't expect it would be too hard to address if needed.
Re-queuing a failed experiment: I thought there was more discussion in DVC, but for now I only found "Repeat an experiment" (dvc#8867). I think it's important but complicated, and for now let's start with exposing the logs so at least people know what happened. If the experiment failed, how likely is it that someone wants to re-run it again as is? Agree with @mattseddon that the current suggestion would be to apply it, review what was wrong, and then queue it again. I think it's too complicated for VS Code to do in one action. Let's discuss in "Repeat an experiment" (dvc#8867), and if queuing becomes useful enough, we should start getting more user feedback here to decide on next steps.

shcheklein · 2023-02-15T00:53:35Z

how likely is it that someone wants to re-run it again as is?

Let's say I made a silly mistake and 100 experiments failed? Let's say I forgot to pull data? Let's say it happens in the last stage of the pipeline and it took a lot of time to run them?

I think I would want a way to fix something and re-run / resume things.

dberenbaum · 2023-02-16T18:01:11Z

Yep, agree that it's important. I guess the p3 in iterative/dvc#8867 doesn't reflect that, but I did note there that it's important and is only p3 because we already have too many competing priorities.

Let's say I made a silly mistake and 100 experiments failed? Let's say I forgot to pull data? Let's say it happens in the last stage of the pipeline and it took a lot of time to run them?

I don't mean to discount this, but I wasn't sure this was painful enough to make the queue feel broken. In that example, do you think it's likely that the 100 experiments got queued one at a time or by a grid search like dvc exp run -S lr=range(...)? Is there a common scenario where it would be really painful to re-queue them?

What if I find that I needed to change something in my code or dvc.yaml?

I think it's hard to imagine all the ways that experiments will fail and what the most useful ways will be to "fix" them without much user feedback. I wasn't sure this made sense to prioritize just yet, but of course we can change that if you think it's essential right now and feel confident that a relatively simple fix like exp run --failed would be useful enough.

shcheklein · 2023-02-17T00:37:30Z

@dberenbaum I put it also almost at the bottom of the list above. So, it's not critical, probably not p3 either :)

mattseddon · 2023-02-24T01:19:56Z

Watch and keep updating plots

Spent some time thinking about and replying on iterative/dvc#8478 (iterative/dvc#8478 (comment)). Looks likely that we'll need to build out the mechanics in this repo (at least to begin with).

I'll start working on the rest of the list while I wait for a response.

mattseddon · 2023-03-02T00:10:24Z

Marking this as blocked pending the discussion in iterative/dvc#8478

cc @pmrowla @dberenbaum

mattseddon · 2023-04-24T04:37:33Z

2 of the remaining 4 points are currently being discussed elsewhere.

iterative/dvc#8867

Discussion happening in #3676.

Failed / stopped - add resume and / or run again action

The discussion will happen in iterative/dvc#8867. Do we need this for this issue or can we cut the scope?

The other 2 I should get to shortly.

mattseddon · 2023-05-23T08:15:24Z

unblocked by iterative/dvc#9432 / 2.58.1 of DVC

shcheklein added the 🔍 review (A placeholder issue to review certain part of the product or story) and priority-p1 (Regular product backlog) labels Jan 30, 2023

shcheklein self-assigned this Jan 30, 2023

shcheklein added the 📦 product (Needs product input or is being actively worked on) label Jan 30, 2023

shcheklein changed the title ~~Review queue functionality~~ Queue: wrap up e2e workflow Feb 4, 2023

shcheklein added the story (Product feature aka epic. Discussion, progress, checkboxes for implementation, etc) label and removed the 🔍 review (A placeholder issue to review certain part of the product or story) label Feb 4, 2023

shcheklein removed their assignment Feb 11, 2023

mattseddon self-assigned this Feb 24, 2023

mattseddon mentioned this issue Feb 24, 2023

exp queue: live metrics iterative/dvc#8478

Closed

This was referenced Feb 24, 2023

Add show logs to context menu of experiments running in the queue #3347

Closed

Add viewable cli process class #3358

Merged

Add DvcViewer class #3359

Merged

Add show logs to context menu of experiments running in the queue #3360

Merged

mattseddon added the blocked (Issue or pull request blocked due to other dependencies or issues) label Mar 2, 2023

mattseddon mentioned this issue Mar 9, 2023

Plots from temporary experiments are first updated when finished #3436

Closed

mattseddon mentioned this issue Mar 29, 2023

Remove checkpoint experiment support from extension UI #3577

Closed

shcheklein removed the 📦 product (Needs product input or is being actively worked on) label Apr 18, 2023

mattseddon mentioned this issue May 3, 2023

Remove pausing of data updates when running/queueing experiments #3815

Merged

mattseddon mentioned this issue May 8, 2023

Enable experiment operations when experiment(s) are running in the queue #3832

Merged

mattseddon removed the blocked (Issue or pull request blocked due to other dependencies or issues) label May 23, 2023

This was referenced May 24, 2023

exp show: experiment order randomly changing iterative/dvc#9504

Closed

Bump min version of DVC to 2.58.1 (Enable live plots for experiments running outside of the workspace) #3965

Merged

mattseddon closed this as completed in #3965 May 28, 2023

mattseddon mentioned this issue May 28, 2023

exp show: provide executor information for finished experiments iterative/dvc#9425

Closed
