Queue: wrap up e2e workflow #3178
I've updated the ticket.
Depends on iterative/dvc#8478; not actionable until that is closed.
@shcheklein can you please reorder the checkboxes in order of priority? I will add some comments.
@mattseddon done (also, trying a new beta feature - tasklists). |
We need to update the extension's
It can be done with
I don't think this is currently possible. Maybe we could do something with a combination of apply, re-queue, and start the queue, but I feel it is unlikely to work as expected. This feels like it needs a feature request against DVC.
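For context, the combination doubted above would look roughly like the following on the DVC CLI. This is only a sketch: as noted, it may not behave as expected, and `<exp-rev>` is a placeholder for an actual experiment revision or name.

```shell
# Hypothetical workaround sketch: restore an experiment's state,
# queue a new run from it, then restart the queue workers.
dvc exp apply <exp-rev>   # bring the experiment's results into the workspace
dvc exp run --queue       # queue a new run from the current workspace state
dvc queue start           # start processing queued tasks again
```

The concern above is that chaining these three commands gives no guarantee the re-queued run picks up exactly where the original left off, which is why a DVC feature request seems warranted.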
What would the progress notification show, and when would it update? From the data we get back, it would go from empty to full and that would be it (we have no information about intermediate steps). My suggestion would be to raise an issue in DVC to make the command more performant.
I don't think we can mark this as done until iterative/dvc#8763 is resolved.
Thanks @mattseddon for adding more color/details to the list.
Could you please add more color/details to this?
Can we cut the scope? Show a tooltip with static output, dump the logs to a file and open it, etc.? Clearly, people will have to run it again to get the most recent result, but at least it's something, and it's way better than nothing.
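The "dump logs as a file and open it" scope cut suggested above could be sketched like this, assuming DVC's `dvc queue logs` command and a hypothetical queued task name; the output file name is illustrative.

```shell
# Hypothetical scope cut: capture a static snapshot of a queued task's
# output instead of streaming live logs.
dvc queue logs <task-name> > task-logs.txt   # dump the captured output
code task-logs.txt                           # open it via the VS Code CLI
```

As the comment says, the snapshot goes stale as soon as the task produces more output, but it is still better than showing nothing.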
@dberenbaum could you please chime in? What's your take on being able to resume or re-queue failed/killed experiments?
I don't think it's possible; we can only get it to a certain level, and I see that it's problematic without DVC support. We need to start thinking about improving this, though: for data commands, for the queue, for all commands that take a long time to run.
Resuming a killed experiment and re-queuing a failed experiment seem like two different scenarios to me:
Let's say I made a silly mistake and 100 experiments failed? Let's say I forgot to pull data? Let's say it happens in the last stage of the pipeline, after a lot of time was spent running them? I think I would want a way to fix something and re-run/resume things.
Yep, agree that it's important. I guess the p3 in iterative/dvc#8867 doesn't reflect that, but I did note there that it's important and is only p3 because we already have too many competing priorities.
I don't mean to discount this, but I wasn't sure this was painful enough to make the queue feel broken. In that example, do you think it's likely that the 100 experiments got queued one at a time or by a grid search like What if I find that I needed to change something in my code or dvc.yaml? I think it's hard to imagine all the ways that experiments will fail, and what the most useful ways to "fix" them will be, without much user feedback. I wasn't sure this made sense to prioritize just yet, but of course we can change that if you think it's essential right now and feel confident in a relatively simple fix like
@dberenbaum I also put it almost at the bottom of the list above. So it's not critical, but probably not p3 either :)
Spent some time thinking about and replying in iterative/dvc#8478 (iterative/dvc#8478 (comment)). It looks likely that we'll need to build out the mechanics in this repo (at least to begin with). I'll start working on the rest of the list while I wait for a response.
Marking this as blocked pending the discussion in iterative/dvc#8478.
Two of the remaining four points are currently being discussed elsewhere: one in #3676, the other in iterative/dvc#8867. Do we need the latter for this issue, or can we cut the scope? I should get to the other two shortly.
Unblocked by iterative/dvc#9432.
Follow-up after #3091
Tasks
See #3091 (comment)