repro: support glob/foreach-group to run at once through CLI #4976
@@ -165,6 +166,7 @@ def __init__(

        self.cache = Cache(self)
        self.cloud = DataCloud(self)
        self.stage = StageLoad(self)
Suggestion for better naming would be appreciated.
`StageResolver`? Also, I think that naming the object `stage` while we have a class of this name will get confusing at some point. Maybe `stage_load` or `stage_resolver` will be ok, depending on what name we choose in the end.
@pared, will make it `StageLoader`, and change the other one to `MultiStageLoader`.
> Maybe stage_load or stage_resolver will be ok, depending on what name we choose in the end.
`stage_load.load_one()` looks a bit odd to me. :) But, I understand your concern.
@@ -133,6 +133,11 @@ def __len__(self):
    def __contains__(self, name):
        return name in self.resolved_data

    def is_foreach_generated(self, name: str):
Temporary solution for now, will be fixed during optimization.
if not file:
    # parsing is ambiguous when it does not have a colon
    # or if it's not a dvcfile, as it can be a stage name
    # in `dvc.yaml` or, an output in a stage.
    logger.debug(
        "Checking if stage '%s' is in '%s'", target, PIPELINE_FILE
    )
    if not (recursive and os.path.isdir(target)):
`os.path.isdir` might have been a bug. Though, we don't collect one single file when using `brancher`.
Implemented real globbing, rather than a regex.
There's no need for a glob here
Nice! But the same option name in |
It comes from the UNIX shell |
Thanks Johny, good to confirm. My comment is more about the UX. We can't assume users have such a specific UNIX/Linux or Python experience (or a Ph.D 😉).
@jorgeorpinel, our users might be more familiar with |
I agree with @jorgeorpinel that it's a technical term being used in UI, which is not ideal. You have to explain it in docs using usual, less technical terms - e.g. path matches pattern. Then why not call it off-topic @skshetry it reminds me the |
@jorgeorpinel @shcheklein term |
I didn't know the |
For the record: we've discussed --glob in #4864 (comment) before. Just keeping it as is until we add support for it to more commands and enable it by default.
When will we know? Is there an issue to follow on this? I can create it otherwise. Please lmk
We have but specifically for add. My first impression here is that it does something quite different for repro (using glob internally, but that's an implementation detail).
This trend suggests otherwise, but it's also not definitive proof. In any case, it's also about intuitiveness, I think.
OK, let's move on! But sometimes I worry that we may underestimate important details that can hinder the product in the long run, and may be very hard to even detect later. To me the takeaway is whether we consider UI/UX part of the dev process, or whether it's OK that it's secondary. My bias is towards the former because, like Ivan mentioned, it's harder to ignore these confusions when writing/reviewing docs anyway. But I realize that we would need a slightly different inter-team workflow (and maybe timelines) so that everyone has time to think and give feedback on these things, rather than having to discuss after the PR is merged. Food for thought.
I think it should either be
@jorgeorpinel I understand your perspective, let's indeed try to sync on this. The issue for --glob is #4816 . We are just waiting for |
This PR adds a way to run a stage `foreach` group through a single target, e.g. `build`:
$ dvc repro build
Also, this PR adds a `--glob` flag so that you can achieve the same with:
$ dvc repro "build@*" --glob
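To illustrate how the two forms of target could resolve to the same set of stages, here is a minimal sketch. It is not DVC's implementation: `select_stages` and the stage names `build@us`, `build@uk` and `train` are hypothetical, and only the `<group>@<item>` naming is taken from the `build@*` example above.

```python
from fnmatch import fnmatchcase


def select_stages(target, stage_names, glob=False):
    """Return the stage names that a single CLI target refers to.

    Hypothetical helper for illustration; not part of DVC's API.
    """
    if glob:
        # With --glob, the target is treated as a shell-style pattern.
        return [name for name in stage_names if fnmatchcase(name, target)]
    if target in stage_names:
        return [target]
    # A bare group name selects every generated "<group>@<item>" stage.
    prefix = target + "@"
    return [name for name in stage_names if name.startswith(prefix)]


names = ["build@us", "build@uk", "train"]
print(select_stages("build", names))               # ['build@us', 'build@uk']
print(select_stages("build@*", names, glob=True))  # ['build@us', 'build@uk']
```

`fnmatchcase` is used instead of `fnmatch` so that matching stays case-sensitive on every platform.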
Also note that globbing is only allowed for the stage name, not for the file.
So, the following would work:
$ dvc repro "dvc.yaml:build*" --glob
But the one below won't:
$ dvc repro "**/dvc.yaml:build*" --glob
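The restriction can be sketched as follows, assuming the target is split on its last colon and only the stage part is matched as a pattern. `parse_target`, `glob_stages` and the `stages_by_file` mapping are hypothetical names for illustration; the actual `StageLoad` code may behave differently (for instance, it may raise an error rather than return nothing).

```python
from fnmatch import fnmatchcase


def parse_target(target, default_file="dvc.yaml"):
    """Split "<file>:<stage>" into its parts; a bare name uses default_file."""
    file, sep, name = target.rpartition(":")
    return (file if sep else default_file), name


def glob_stages(target, stages_by_file):
    """Match the pattern against stage names only; the file is looked up literally."""
    file, pattern = parse_target(target)
    names = stages_by_file.get(file, [])
    return [name for name in names if fnmatchcase(name, pattern)]


# Hypothetical project layout: file path -> stage names defined in it.
stages_by_file = {
    "dvc.yaml": ["build@us", "build@uk", "train"],
    "sub/dvc.yaml": ["build-docs"],
}
print(glob_stages("dvc.yaml:build*", stages_by_file))     # ['build@us', 'build@uk']
print(glob_stages("**/dvc.yaml:build*", stages_by_file))  # []
```

Because the file part is never expanded, `**/dvc.yaml` is treated as a literal path and finds nothing. The sketch ignores paths that themselves contain a colon.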
Regarding the implementation, I have moved all of the target parsing logic inside `dvc.repo.stage` in a `StageLoad` class.
So, enabling these features on a `checkout` or similar commands should be straightforward.
Fixes #4912
Fixes #4886
Fixes #4958
Tests will follow tomorrow.
❗ I have followed the Contributing to DVC checklist.
📖 If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.
repro: document `--glob` flag dvc.org#1975
Thank you for the contribution - we'll try to review it as soon as possible. 🙏