Future dvc.yaml 2.0 improvements #5312
Comments
Cc @dmpetrov @shcheklein @skshetry @dberenbaum @efiop 👋 (anyone else welcome of course) |
The proposed syntax
You can escape with For other concerns, I think I have already replied to them. |
This is intentional, suggested by @dmpetrov. The reasoning was that our users will be more familiar with foreach loops than other functional/declarative "wordings" such as |
This summary helps me a lot, @jorgeorpinel ! 1. What about allowing users to choose either approach? If users want to be concise and are familiar with the merge resolution order, they can use plain 2.3.2 I'm confused about whether the |
@skshetry concerns that are (partially) new here, or didn't have a reply from you in previous issues: 1 (and thus 3.2), 2.1 specific suggestions (see grayed note too), and 3.1 (improve error msg) |
@dberenbaum correct, that is the idea! The full/explicit tree paths would be optional generally, but if there are conflicting
Oops, that's an error in docs (will fix in iterative/dvc.org/pull/2098). But yes, that's an alternative syntax I'm suggesting: TBH I don't have a strong opinion on this (as long as we're keeping
Yes, everything that usually goes into single stages goes under |
Merging is a feature. It's based on a scenario that @dmpetrov provided, that the user wants to build 3 models for 3 different markets and they share the basic configuration but are trained with market-dependent parameters additionally. eg:

# params.yaml
train:
  epochs: 10

# models/us/params.yaml (for example)
train:
  threshold: 2

# dvc.yaml
stages:
  train-model:
    foreach: [us, gb, cn]
    do:
      wdir: models/${item}
      vars:
      - params.yaml # from `models/${item}/params.yaml`
      # this would have been easier if `vars` supported parametrization
      cmd: python train.py --threshold ${train.threshold} --epochs ${train.epochs}
      outs:
      - model-${item}

It's a bit complicated example though. (3.2) is based on the order of evaluation. (2.1) - We started with (3.1) - Not a bug, just an enhancement. It's working as intended. |
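To make the merging rule in the example above concrete, here is a minimal Python sketch of how the global `params.yaml` and a per-model `params.yaml` could combine. It is not DVC's actual implementation: `merge_params` is a hypothetical helper, and the only rule it encodes is the one described in this thread (objects merge, redefining a leaf value fails).

```python
def merge_params(base, override, path=""):
    """Merge `override` into a copy of `base`; fail on leaf conflicts.

    Hypothetical helper for illustration only, not part of DVC.
    """
    merged = dict(base)
    for key, value in override.items():
        key_path = f"{path}.{key}" if path else key
        if key not in merged:
            # New key: no conflict possible.
            merged[key] = value
        elif isinstance(merged[key], dict) and isinstance(value, dict):
            # Both sides are objects: merge them recursively.
            merged[key] = merge_params(merged[key], value, key_path)
        else:
            # Same path defined twice with a non-object value.
            raise ValueError(f"cannot redefine '{key_path}'")
    return merged


global_params = {"train": {"epochs": 10}}   # params.yaml
us_params = {"train": {"threshold": 2}}     # models/us/params.yaml

merged = merge_params(global_params, us_params)
print(merged)
```

With these inputs both `${train.epochs}` and `${train.threshold}` would be available to `cmd`, while a per-model file that also set `train.epochs` would be rejected instead of silently overriding the global value.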
This was loosened when introducing parametrization. But, `foreach`..`do` should be considered first before stage's regular structure. And, cmd is made required (as it should have been) Related: iterative#5371, iterative#5370, iterative#5312
OKdk. Thanks for the explanations and sorry about the delay. Some final notes from me:
But the feature was called parameterization (now "templating" though).
I thought we agreed that
I didn't say it was a bug. I said the message may not be helpful. And yep, this issue is labeled |
I would add that it's not set in stone :) I would still love to see what is the best for users in this case, how do people call it. The only concern we had is the overlap with |
FWIW we've been using this pretty much since it became available as a beta feature, and everyone very naturally called it "parameterized dvc pipeline". |
Another user ref. on terminology - they call it "multistage feature" in https://discuss.dvc.org/t/versioning-predictions/656 (even though we've already switched to "foreach stages" in the docs; maybe they read that doc before the change). |
Closing. We can reopen a new issue with any of the remaining questions if they come up again. |
UPDATE: Jump to #5312 (comment) for remaining discussions
Some remaining concerns in order of relevance (only 1 and 2.1 seem more or less worth considering in the immediate term).
1. Behavior: default params file & merging of values
We decided to always include * from params.yaml in `vars`. Trying to redefine values (via other file includes or write-in vars) fails except if they're objects that can be merged (no leaf node conflicts).
What about reconsidering the "merging" of vals/objects? Instead we could rename tree paths so there's no possibility of conflicts (@shcheklein's idea). E.g.:
2. Syntax (minor)
Is it worth renaming certain keywords for accuracy? The suggestions below come from the terminology that we ended up using (so far) in docs:
2.1 `vars:` aren't really variables (not using that term in docs). How about `include:` or `load:` to invoke an action; `values:` (we use that term a lot in docs), `const:`, or `globals/locals:` for a descriptive term?
2.2 The `$` sign in the `${}` expression makes it extra tricky to use in `cmd` (at least on Linux). I understand this was discussed already so no strong opinion, but from our docs-related research, the most common syntax for this is `{{ }}`. Of course `{}` (or any other brackets) can also be problematic, so users will need to worry about escaping anyway...
UPDATE: `{}` has a meaning in YAML so we can't use that anyway.
2.3 `foreach`: "for each {items} do {stage details}" is a great construct. My only concern is that the given order of the items is not respected (according to #5181 (comment)) so the "loop" analogy isn't that precise. My only alternative ideas are `items:`, `set:`, `multi:` (we currently use the term "multi-stage" in docs).
2.3.2 `do`: If we keep `foreach`, maybe `gen/yield:` could a) be slightly more accurate and b) hint that you're not using a typical imperative language loop.
If we depart from the loop analogy (use `set/multi/expand:`), `do` could just stay, or just be skipped to keep the YAML structure a bit shorter.
A possible recombination:
3. Current/Known limitations (future)
3.1 It's not possible to put `vars` (or any other field for that matter) before `foreach`. `dvc repro` gives the following error msg:
format error: extra keys not allowed @ data['stages']['mystage']['foreach']
Perhaps we can improve the message so users get that they should either use foreach/do OR a regular stage structure.
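As a rough illustration of what a friendlier message could look like, the sketch below checks a stage definition and returns a hint when `foreach`/`do` is mixed with regular stage fields. It is only a sketch under assumptions: `check_stage` and `FOREACH_KEYS` are hypothetical names, the wording of the hint is invented, and none of this reflects how DVC actually validates `dvc.yaml`.

```python
FOREACH_KEYS = {"foreach", "do"}


def check_stage(name, definition):
    """Return a hint if foreach/do is mixed with regular fields, else None.

    Hypothetical helper for illustration only, not part of DVC.
    """
    keys = set(definition)
    if not keys & FOREACH_KEYS:
        return None  # regular stage structure: nothing to check here

    problems = []
    missing = sorted(FOREACH_KEYS - keys)
    if missing:
        problems.append(f"missing {missing}")
    extra = sorted(keys - FOREACH_KEYS)
    if extra:
        problems.append(f"move {extra} under 'do'")
    if not problems:
        return None

    return (
        f"stage '{name}' must use either foreach/do or a regular stage "
        "structure, not both: " + "; ".join(problems)
    )


# The case described in 3.1: `vars` placed next to `foreach`.
hint = check_stage(
    "mystage",
    {
        "vars": [{"threshold": 2}],
        "foreach": ["us", "gb"],
        "do": {"cmd": "python train.py"},
    },
)
print(hint)
```

The point of the design is that the message names the offending keys and the fix (moving them under `do`), rather than only reporting that extra keys are not allowed.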
3.2 `wdir` can't use `${values}` from local write-in `vars` (because `wdir` is evaluated first, needed for file-based local `vars`)? But if we address 1 (no merging of objects) then maybe this can be implemented?
3.3 `dvc run` doesn't pre-process the commands sent to it (not compatible with `vars`). Should it? Probably not (stating here mainly for the record).