Quick listing of stages #3743

Closed
mribeirodantas opened this issue May 5, 2020 · 12 comments
Labels
feature request Requesting a new feature

Comments

@mribeirodantas
Contributor

Before the last release, it was easy to reproduce my stages thanks to filename completion on the command line, since the stages were files in the directory. All I had to do was type dvc repro, start typing the name, and hit TAB.

Now, stages are entries in a single YAML-formatted file. If I don't know the exact stage name, I must open the file, find the stage I am looking for, and check or copy-paste its name from there. Then I leave the file and type the command.

The ideal feature would be to auto-complete with TAB, just like before, but this may be outside the technical scope of DVC (a dirty fix would be to have empty files with stage names, but I don't think that's a good solution...). Therefore, I think a feature that could improve usability would be a listing of stages. There could be a new option in the repro command such as dvc repro -l. This command would parse the dvc.yaml and list the stage names so that the user could type them by seeing the desired stage name on the same screen since it has just been printed out.
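
A minimal sketch of the listing idea, assuming dvc.yaml has already been loaded into a dictionary (for example by a YAML library) and has a top-level stages mapping. The function name list_stage_names and the sample data are hypothetical and not part of DVC.

```python
def list_stage_names(dvc_yaml):
    """Return stage names, in file order, from a parsed dvc.yaml mapping."""
    # A missing or empty "stages" key is treated as "no stages".
    stages = dvc_yaml.get("stages") or {}
    return list(stages)


# Hypothetical parsed dvc.yaml contents, used for illustration only.
parsed = {
    "stages": {
        "prepare": {"cmd": "python prepare.py"},
        "train": {"cmd": "python train.py"},
    }
}

for name in list_stage_names(parsed):
    print(name)
```

Printing one name per line keeps the output easy to read on screen and easy to reuse from a script.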

@triage-new-issues triage-new-issues bot added the triage Needs to be triaged label May 5, 2020
@efiop
Contributor

efiop commented May 8, 2020

Hi @mribeirodantas !

Sorry for the delay. We've changed the defaults in 1.0.0a1 to make dvc repro use dvc.yaml by default. Maybe that could help.

Regarding the shell completion, I think we can definitely implement that by doing something like dvc pipeline list inside the shell completion, but it is obviously not implemented yet, unfortunately 🙁
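
As a rough illustration of what the completion side would need, the sketch below filters a list of stage names by the text typed so far. It assumes the names have already been obtained from some listing command; complete_stage and the sample names are hypothetical.

```python
def complete_stage(prefix, stage_names):
    """Return the stage names that start with the text typed so far."""
    return sorted(name for name in stage_names if name.startswith(prefix))


# Hypothetical stage names, for illustration only.
stage_names = ["prepare", "featurize", "train", "evaluate"]

print(complete_stage("tr", stage_names))
```

A real bash or zsh completion script would do the same filtering with the shell's own completion helpers; only the prefix-matching logic is shown here.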

There could be a new option in the repro command such as dvc repro -l. This command would parse the dvc.yaml and list the stage names so that the user could type them by seeing the desired stage name on the same screen since it has just been printed out.

I'm not sure if this is a good idea, seems like something that repro shouldn't bother with. If you think that shell completion is a more desirable feature, then we should go straight to implementing it, it shouldn't be too hard to do.

@efiop efiop added the awaiting response we are waiting for your reply, please respond! :) label May 8, 2020
@triage-new-issues triage-new-issues bot removed the triage Needs to be triaged label May 8, 2020
@mribeirodantas
Contributor Author

Yeah, I talked to @shcheklein about shell completion and I read through the shell completion files (bash/zsh). I think that's something I would like to try to implement myself. Is that fine?

@shcheklein shcheklein added the feature request Requesting a new feature label May 8, 2020
@shcheklein
Member

@mribeirodantas that would be really cool and useful! :) btw, does dvc list show what we need in 1.0a? we can consider implementing specific options to make output machine parsable (json, or a pure list w/o headers) if needed.

@mribeirodantas
Contributor Author

For the dvc list, I think it does.

@shcheklein
Member

It looks like dvc pipeline list is not exactly what we want:

  • It also outputs DVC-files (inputs to the pipeline) - do we want to pass them to repro?
  • Output includes some delimiters, summary, etc - some option like --show-json is needed after all?

That's what it looks like for me right now:

dvc.yaml:prepare
dvc.yaml:featurize
dvc.yaml:train
dvc.yaml:evaluate
data/data.xml.dvc
================================================================================
1 pipelines total

@mribeirodantas
Contributor Author

I think we do want to pass them to repro. The argument -p in dvc repro reproduces the stage that contains the specified dvc-tracked file, so it would be nice if dvc repro could also tab-complete the name of files contained in a stage.

About the format, we will have to parse it anyway, so whatever it's printed, we can parse that and make sure it's tab-completable. What do you think?
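
For illustration, parsing output shaped like the dvc pipeline list sample earlier in this thread could look roughly like this. It assumes delimiter lines consist only of = characters and that the summary line ends with "pipelines total"; both are inferred from that one sample, and parse_pipeline_list is a hypothetical helper.

```python
def parse_pipeline_list(output):
    """Extract repro targets from text shaped like the sample in this thread."""
    targets = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines, '=====' delimiters, and the summary line.
        if not line or set(line) == {"="} or line.endswith("pipelines total"):
            continue
        targets.append(line)
    return targets


sample = """\
dvc.yaml:prepare
dvc.yaml:featurize
dvc.yaml:train
dvc.yaml:evaluate
data/data.xml.dvc
================================================================================
1 pipelines total
"""

for target in parse_pipeline_list(sample):
    print(target)
```

Note that this kind of parser breaks as soon as the human-readable layout changes.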

@shcheklein
Member

... could also tab-complete the name of files contained in a stage.

it's a good feature and I've been thinking about this. e.g. running dvc repro model.pkl would actually find the stage that corresponds to that output and reproduce the stage. As far as I remember it's not implemented yet - we can create a feature request - it sounds very reasonable to me, and definitely easier than dvc repro dvc.yaml:train that we have in 1.0a (cc @dmpetrov )
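
A minimal sketch of the lookup this would need, assuming the stages are available as a mapping from stage name to a dictionary with an outs list of plain path strings. The helper find_stage_for_output and the sample stages are hypothetical and do not reflect DVC internals.

```python
def find_stage_for_output(path, stages):
    """Return the name of the stage that lists `path` in its outs, or None."""
    for name, stage in stages.items():
        if path in stage.get("outs", []):
            return name
    return None


# Hypothetical stages, for illustration only.
stages = {
    "featurize": {"outs": ["data/features"]},
    "train": {"outs": ["model.pkl"]},
}

print(find_stage_for_output("model.pkl", stages))
```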

The argument -p in dvc repro reproduces the stage that contains the specified dvc-tracked file

I think it actually expects the stage DVC-file, not one of the outputs. Unless I'm missing something.

But even if it were the case, I would have expected something like dvc repro data/data.xml, not dvc repro data/data.xml.dvc. Or at least both of those, like I mentioned above.

About the format, we will have to parse it anyway, so whatever it's printed, we can parse that and make sure it's tab-completable. What do you think?

The usual problem here is that it means we make this output an API that we'll have to guarantee. Also, parsing will be pretty ad-hoc and weird. It is usually done with a special command. Here is a good guide on how to write good output - https://devcenter.heroku.com/articles/cli-style-guide#human-readable-output-vs-machine-readable-output . I would say it makes sense to completely redo the default output for this command, as well as introduce:

... When needed, commands should offer a --json and/or a --terse flag when valuable to allow users to easily parse and script the CLI. ...
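
To make the distinction concrete, here is a sketch of one listing function with a human-readable default and a machine-readable JSON mode. The function render_stages, its as_json parameter, and the footer text are all hypothetical; this is not how DVC formats its output.

```python
import json


def render_stages(stage_names, as_json=False):
    """Render stage names for people (default) or for scripts (JSON)."""
    if as_json:
        # Stable, parseable output that scripts can rely on.
        return json.dumps(list(stage_names))
    # Human-oriented output: free to change layout without breaking scripts.
    lines = list(stage_names)
    lines.append("=" * 80)
    lines.append(f"{len(lines) - 1} stages total")
    return "\n".join(lines)


names = ["prepare", "train"]
print(render_stages(names))
print(render_stages(names, as_json=True))
```

Only the JSON mode would need to be treated as a guaranteed interface, which leaves the default output open to redesign.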

@skshetry
Member

skshetry commented Nov 26, 2020

We now have desc and size keywords in the stages, which we can use to our advantage to provide a --help-like message for the stages.

$ dvc stages
build-us: Builds a US specific model  (prepare -> process -> build-us)
build-gb: Builds a UK specific model  (prepare -> process -> build-gb)

We could even provide a default message, if desc does not exist, like:

$ dvc stages
build-us: Produces `model-us.hdf5` (7M), depends on `us-markets.csv`

Or, maybe both of those to create a verbose output and maybe even with more fields.

$ dvc stages
build-us: Builds a US specific model
          Produces model-us.hdf5 (7M)
          Depends on: `us-markets.csv`, etc.
build-gb: Builds a UK specific model
          Produces model-gb.hdf5 (7M)
          Depends on: `gb-markets.csv`, etc.

I might have gone over the top here in the suggestion, but the core of it is to list stages and provide a snippet of a helpful message (preferably with beautiful colours). :)
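
The fallback idea above (use desc when present, otherwise build a default message) could be sketched as follows. It assumes each stage is a dictionary with optional desc, outs, and deps keys holding plain strings, and it leaves out file sizes; describe_stage is a hypothetical helper.

```python
def describe_stage(name, stage):
    """Return a one-line summary: the desc if set, else a generated message."""
    desc = stage.get("desc")
    if desc:
        return f"{name}: {desc}"
    # Default message built from outputs and dependencies.
    outs = ", ".join(f"`{out}`" for out in stage.get("outs", []))
    deps = ", ".join(f"`{dep}`" for dep in stage.get("deps", []))
    return f"{name}: Produces {outs}, depends on {deps}"


# Hypothetical stages, for illustration only.
with_desc = {"desc": "Builds a US specific model"}
without_desc = {"outs": ["model-us.hdf5"], "deps": ["us-markets.csv"]}

print(describe_stage("build-us", with_desc))
print(describe_stage("build-us", without_desc))
```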

@efiop
Contributor

efiop commented Nov 27, 2020

Let's start with something simple like dvc stages <target> that lists all stages in the <target>. We could then use it in our autocompletion scripts (not necessarily part of the first step).

E.g.

$ dvc stages
data.dvc
dvc.yaml:stage1
dvc.yaml:stage2
path/to/dvc.yaml:stage3
path/to/other/dvc.yaml:stage4

or with target:

$ dvc stages dvc.yaml
dvc.yaml:stage1
dvc.yaml:stage2
$ dvc stages path
path/to/dvc.yaml:stage3
path/to/other/dvc.yaml:stage4

We can start with this being a default behavior for now, and we'll change it to something more verbose later as noted by @skshetry .
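
The target behavior in the examples above can be sketched as a filter over stage addresses: a target matches when it is the file itself or a directory containing it. This assumes POSIX-style paths and the file:stage address format shown above; filter_by_target is a hypothetical helper, not DVC code.

```python
from pathlib import PurePosixPath


def filter_by_target(addresses, target=None):
    """Keep addresses whose file is `target` or lives under directory `target`."""
    if target is None:
        return list(addresses)
    target_path = PurePosixPath(target)
    selected = []
    for address in addresses:
        # "path/to/dvc.yaml:stage3" -> "path/to/dvc.yaml"; "data.dvc" stays as is.
        file_path = PurePosixPath(address.split(":", 1)[0])
        if file_path == target_path or target_path in file_path.parents:
            selected.append(address)
    return selected


addresses = [
    "data.dvc",
    "dvc.yaml:stage1",
    "dvc.yaml:stage2",
    "path/to/dvc.yaml:stage3",
    "path/to/other/dvc.yaml:stage4",
]

print(filter_by_target(addresses, "dvc.yaml"))
print(filter_by_target(addresses, "path"))
```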

@jorgeorpinel
Contributor

maybe both of those to create a verbose output and maybe even with more fields.

$ dvc stages
build-us: Builds a US specific model
          Produces model-us.hdf5 (7M)
          Depends on ...

This looks a lot like opening dvc.lock (maybe just copy over desc to the lockfile if we don't do so yet). It's a bit more human-friendly for sure though.

@skshetry
Member

skshetry commented Jan 9, 2021

It's a bit more human-friendly for sure though.

@dberenbaum
Collaborator

@skshetry Can we close this one as completed or as a duplicate of #5390?

@efiop efiop closed this as completed Dec 8, 2023
6 participants