Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stage name incorrectly parsed with parametrization for dict #4957

Closed
johnnychen94 opened this issue Nov 24, 2020 · 7 comments · Fixed by #4961 or #5077
Closed

stage name incorrectly parsed with parametrization for dict #4957

johnnychen94 opened this issue Nov 24, 2020 · 7 comments · Fixed by #4961 or #5077
Assignees
Labels
A: templating Related to the templating feature bug Did we break something?

Comments

@johnnychen94
Copy link
Contributor

Bug Report

Please provide information about your setup

List version works:

stages:
  build:
    foreach:
      - 15
      - 25
      - 35
    do:
      cmd: echo ${item}
jc@mathai3:~/Downloads/dvc_test$ dvc repro "build@15"
Running callback stage 'build@15' with command:
	echo 15
15

However, the dict version doesn't

stages:
  build:
    foreach:
      - level: 15
      - level: 25
      - level: 35
    do:
      cmd: echo ${item.level}
jc@mathai3:~/Downloads/dvc_test$ dvc repro --dry
Running callback stage 'build@{'level': 15}' with command:
	echo 15

Running callback stage 'build@{'level': 25}' with command:
	echo 25

Running callback stage 'build@{'level': 35}' with command:
	echo 35
Use `dvc push` to send your updates to remote storage.
jc@mathai3:~/Downloads/dvc_test$ dvc repro -f "build@{'level': 15}"
ERROR: 'build@{'level'' does not exist.

According to the error output, I guess the stage name is incorrectly parsed somewhere. I tried to look at the source code but failed to find a clue.

Output of dvc version:

DVC version: 1.10.1+abb78c
---------------------------------
Platform: Python 3.8.3 on Linux-5.4.0-42-generic-x86_64-with-glibc2.10
Supports: All remotes
Cache types: symlink
Cache directory: nfs4 on storage:/home
Caches: local
Remotes: None
Workspace directory: nfs4 on storage:/home
Repo: dvc, git
@skshetry
Copy link
Member

skshetry commented Nov 24, 2020

@johnnychen94, this is not supported. List in foreach data is not supported to have dict or any complex object inside it.

Can you refactor it to use dict than a list:

stages:
  build:
    foreach:
      level1:
        level: 15
      level2:
        level: 35
    do:
      cmd: echo ${item.level}

@skshetry skshetry added feature request Requesting a new feature A: templating Related to the templating feature bug Did we break something? and removed feature request Requesting a new feature labels Nov 24, 2020
@skshetry skshetry self-assigned this Nov 24, 2020
@johnnychen94
Copy link
Contributor Author

Good to know the alternative and yes it works.

# dvc.yaml
stages:
  build:
    foreach: ${models}
    do:
      cmd: "echo type: ${item.noise_type} level: ${item.noise_level}"
# params.yml
models:
  model_1:
    noise_type: "gaussian"
    noise_level: 15
  model_2:
    noise_type: "gaussian"
    noise_level: 25
jc@mathai3:~/Downloads/dvc_test$ dvc repro --dry -f
Running stage 'build@model_1' with command:
	echo type: gaussian level: 15

Running stage 'build@model_2' with command:
	echo type: gaussian level: 25
Use `dvc push` to send your updates to remote storage.

Should I update the wiki? I guess this is something that people will occasionally hit because it's simpler to write

# params.yml
models:
  - noise_type: "gaussian"
    noise_level: 15
  - noise_type: "gaussian"
    noise_level: 25

@skshetry
Copy link
Member

skshetry commented Nov 25, 2020

Hi, @johnnychen94. I created #4961 to fix this issue, but I still recommend using dictionary for better experience.

Should I update the wiki? I guess this is something that people will occasionally hit because it's simpler to write
I don't think you have access to the wiki, let me add a note there.

I'd still like to keep this issue open for future considerations, so please don't consider it as officially supported yet.

@skshetry
Copy link
Member

skshetry commented Dec 11, 2020

# params.yml
models:
  - noise_type: "gaussian"
    noise_level: 15
  - noise_type: "gaussian"
    noise_level: 25

@johnnychen94 , Coming to this again, would generating stages as build@0, build@1 ... work for you? This will be incompatible change, however.
Or, would that be confusing? And, will you need the key/index of those lists (lists don't have key right now :), only dicts do).

Thanks.

@johnnychen94
Copy link
Contributor Author

sounds good to me. Is the id order guaranteed, I guess not?

@skshetry
Copy link
Member

skshetry commented Dec 11, 2020

@johnnychen94, good question. I think it's good to assume list as ordered and generate on the basis of that index, what do you think?

We could also do that on the basis of its content, but it'll complicate internals (which might require hashing?) and making it that way, DVC won't be able to work if you change/update any item inside. I think this is more suitable for a dictionary.

Another question, should it be 0-indexed or 1-indexed? 😄

@johnnychen94
Copy link
Contributor Author

johnnychen94 commented Dec 11, 2020

I believe our current parametrization workflow does not (and should not) rely on the index order, so perhaps some careful documentation on this would be sufficient, e.g., "the index order is undefined and depends on internal implementation".

I can't foresee what extra benefits hash could give us compared to the simple list order. So might be a good enough reason to stick to the simple version.

Another question, should it be 0-indexed or 1-indexed?

As far as it's well documented, it doesn't matter much for me. I know that there are people who dislike another language because it uses 1-indexed but that makes no sense to me.

I prefer this to be 0-indexed because it might be better for future job scheduling tasks. For example, suppose that we have 20 jobs, then with the glob feature #4976 we could split it into two balanced batches: dvc run build@? --glob and dvc run build@1? --glob. (Does ? work as a single character match here?)

Another example that 0-indexed might help, is that nvidia devices are ordered by 0-indexed (pcie bus order), so it can be convenient to schedule the first job build@0 to the first gpu card CUDA_VISIBLE_DEVICES=0, and so on.

skshetry added a commit to skshetry/dvc that referenced this issue Dec 14, 2020
minor refactors, code comments

Rename interim to definitions

provide location of entry during init for definition

small fixups to set_temporarily

Refactor loop to use dict comprehension

Provide same interface for definitions

Make it work for composite lists

Check for sequence rather than a list

refactor tests

Fixes iterative#4955
Fixes iterative#4957
Fixes iterative#4963
skshetry added a commit that referenced this issue Dec 14, 2020
minor refactors, code comments

Rename interim to definitions

provide location of entry during init for definition

small fixups to set_temporarily

Refactor loop to use dict comprehension

Provide same interface for definitions

Make it work for composite lists

Check for sequence rather than a list

refactor tests

Fixes #4955
Fixes #4957
Fixes #4963
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A: templating Related to the templating feature bug Did we break something?
Projects
None yet
2 participants