Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modified recursive dictionary parsing to correctly handle lists in dvc plot templates #134

Merged
merged 3 commits into from
May 20, 2023

Conversation

alexk101
Copy link
Contributor

Previously, the way in which lists were parsed from dvc templates caused the dict_find_value function to end early without continuing to parse the rest of the template once it encountered a list. Additionally, the structure of vega-lite plots when using lists allows for the case of a list of strings. Because of this, when called recursively as was done in 635eaff , the function may fail when it encounters something like a list of strings, as it tries to access the values of d, which is no longer a dictionary, but a string. I have fixed this by allowing for d to be a string, but this may not be desired/sensible since the function is called dict_find_value. It may make sense to change the name of the function or to call another function to handle the string case.

I include a template which previously would not parse as reference for the changes made.

{
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "repeat": ["quintile"],
    "config": {"view": {"stroke": "transparent"}},
    "spec": {
        "title": {"text": "<DVC_METRIC_TITLE>", "anchor": "middle"},
        "encoding": {
            "column": {"field": "category", "header": {"orient": "bottom"}},
            "row": {"field": "quintile", "title": "Quintile"},
            "y": {"field": "<DVC_METRIC_Y>", "type": "quantitative"},
            "x": {"field": "<DVC_METRIC_X>", "axis": null},
            "color": {"field": "<DVC_METRIC_X>"}
            },
        "mark": "bar",
        "data": {
            "values": "<DVC_METRIC_DATA>"
        }
    }
}

return True
if isinstance(v, list):
return any(dict_find_value(e, value) for e in v)
def dict_find_value(d: Union[dict, str], value: str) -> bool:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to generalize: can it be a list also? E.g. in case of some nested lists.

@shcheklein
Copy link
Member

Thanks for the fix @alexk101 ! It looks reasonable to me, we can keep the name also (it's still trying to find a name).

Can we add some tests for this please?

@dberenbaum
Copy link
Contributor

@alexk101 Thanks for catching this and the changes look good! Since I was also working on this recently (causing this bug), my feeling was that this felt a bit complicated just to check for the anchor, and it feels even more complicated now.

An alternative approach would be to search the raw text for <DVC_METRIC_DATA> before parsing it as json. It looks like the template initially gets read here:

with _open(template_path, encoding="utf-8") as f:
content = json.load(f)
return Template(content, name=template)

You could replace that with something like:

        with _open(template_path, encoding="utf-8") as f:
            raw = f.read()
        content = json.loads(raw)
        template = Template(content, name=template)
        anchor = template.anchor("data")
        if anchor not in raw:
            raise BadTemplateError(
                f"Template '{template.name}' "
                f"is not using '{anchor}' anchor"
            )
        return template

Then we could drop dict_find_value and the validation logic that uses it.

Let me know if that makes sense. The approach you have also works if we have some better tests to avoid a bug like the one I introduced.

@codecov-commenter
Copy link

codecov-commenter commented May 20, 2023

Codecov Report

Patch coverage: 86.36% and project coverage change: -0.24 ⚠️

Comparison is base (635eaff) 96.26% compared to head (fb6631e) 96.03%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #134      +/-   ##
==========================================
- Coverage   96.26%   96.03%   -0.24%     
==========================================
  Files          19       19              
  Lines         777      781       +4     
  Branches      125      129       +4     
==========================================
+ Hits          748      750       +2     
- Misses         16       17       +1     
- Partials       13       14       +1     
Impacted Files Coverage Δ
src/dvc_render/vega_templates.py 94.19% <81.25%> (-1.14%) ⬇️
tests/test_templates.py 100.00% <100.00%> (ø)
tests/test_vega.py 100.00% <100.00%> (ø)

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@daavoo
Copy link
Contributor

daavoo commented May 20, 2023

Then we could drop dict_find_value and the validation logic that uses it.

Actually, I think that we should be validating more strictly.
We are very opinionated about where the anchor data should be put (inside data->values) but the current validation (and the raw string matching) passes as long as the anchor is somewhere in the template.

@daavoo daavoo merged commit e11b61c into iterative:main May 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants