Modeling mappings as child elements? #179

Open
zygi opened this issue Apr 4, 2024 · 1 comment
Labels
question Further information is requested


zygi commented Apr 4, 2024

Imagine I have the following XML:

<article>
<title>Hello</title>
<metadata>
<md_key_1>text_content_1</md_key_1>
<md_key_2>text_content_2</md_key_2>
...
</metadata>
</article>

That is, the metadata consists of a dynamic number of elements with dynamic tags and no attributes, each of which contains just text.

Ideally, I would like this to map to the following Python model:

from typing import Dict

class Article:
    title: str
    metadata: Dict[str, str]
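For comparison, the `Dict[str, str]` this model should end up with can be computed from the XML using only the standard library. This is a minimal sketch assuming `xml.etree.ElementTree`; `parse_metadata` is a hypothetical helper (not a pydantic-xml API), and normalizing missing text to `""` is an arbitrary choice made here:

```python
import xml.etree.ElementTree as ET
from typing import Dict


def parse_metadata(xml_text: str) -> Dict[str, str]:
    """Collect each child of <metadata> as tag -> text (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    metadata = root.find("metadata")
    if metadata is None:
        return {}
    # Each child's tag becomes a key; an element without text maps to "".
    return {child.tag: (child.text or "") for child in metadata}


ARTICLE_XML = """\
<article>
<title>Hello</title>
<metadata>
<md_key_1>text_content_1</md_key_1>
<md_key_2>text_content_2</md_key_2>
</metadata>
</article>
"""

print(parse_metadata(ARTICLE_XML))
# {'md_key_1': 'text_content_1', 'md_key_2': 'text_content_2'}
```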

Is there a way to achieve that with pydantic-xml? The closest I have got so far is making metadata a raw field, but then working with it from the Python side gets a little annoying: how do I construct an instance of Article when metadata is an ET.Element? I could add an alternative constructor as a class method, but then I'd have to remember that, for this specific model only, I shouldn't use the regular constructor.
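The awkward part of the raw-field approach is the reverse direction: constructing an Article means building the `<metadata>` element from a dict by hand. A minimal sketch of that conversion, assuming the standard library's `xml.etree.ElementTree` (`lxml.etree` provides the same `Element` and `SubElement` calls); `build_metadata` is a hypothetical helper, not a pydantic-xml API:

```python
import xml.etree.ElementTree as ET
from typing import Dict


def build_metadata(mapping: Dict[str, str]) -> ET.Element:
    """Build a <metadata> element with one child per key (hypothetical helper)."""
    metadata = ET.Element("metadata")
    for tag, text in mapping.items():
        # Keys are used as tag names as-is; they are not validated as XML names.
        child = ET.SubElement(metadata, tag)
        child.text = text
    return metadata


element = build_metadata({"md_key_1": "text_content_1", "md_key_2": "text_content_2"})
print(ET.tostring(element, encoding="unicode"))
# <metadata><md_key_1>text_content_1</md_key_1><md_key_2>text_content_2</md_key_2></metadata>
```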

The other approach I expected to work was setting metadata=Field(exclude=True), implementing a @computed_element for serialization and a @field_validator for deserialization. Unfortunately, the @field_validator part doesn't work:

from typing import Any, Dict, Optional

from pydantic import Field, field_validator
from pydantic_xml import BaseXmlModel


class Article(BaseXmlModel, tag="article"):
    title: str
    metadata: Dict[str, str] = Field(exclude=True)

    @field_validator('metadata', mode='before')
    @classmethod
    def decode_content(cls, value: Any) -> Optional[Dict[str, str]]:
        # Show what the validator actually receives during from_xml().
        print(value)
        assert False


if __name__ == "__main__":
    TEST_INPUT = """\
<article>
<title>Hello</title>
<metadata>
<md_key_1>text_content_1</md_key_1>
<md_key_2>text_content_2</md_key_2>
</metadata>
</article>
"""
    Article.from_xml(TEST_INPUT)

This prints:

  [line -1]: Assertion failed,  [type=assertion_error, input_value={}, input_type=dict]

That is, the validator receives an empty dict, nothing from which the inner elements could be reconstructed.

Is there a currently supported approach that I'm missing?

Thanks!

dapper91 added the question (Further information is requested) label on Jun 29, 2024
dapper91 (Owner) commented Jun 29, 2024

@zygi Hi,

Right now there is no way to model an element with dynamic tags. The workaround I see is the following:

from typing import Any

from lxml import etree

from pydantic_xml import BaseXmlModel, element
from pydantic import model_validator


class Article(BaseXmlModel, tag="article", arbitrary_types_allowed=True):
    title: str = element()
    metadata_raw: etree._Element = element(tag='metadata', default=None)

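    @property
    def metadata(self) -> dict[str, str]:
        # metadata_raw is None when the <metadata> element is absent.
        if self.metadata_raw is None:
            return {}
        return {el.tag: el.text for el in self.metadata_raw}

    @model_validator(mode='before')
    @classmethod
    def set_metadata_raw(cls, data: Any) -> Any:
        # Accept metadata={...} in the constructor and convert it into
        # the raw <metadata> element that pydantic-xml serializes.
        if isinstance(data, dict) and (metadata := data.pop('metadata', None)):
            data['metadata_raw'] = metadata_raw = etree.Element('metadata')
            for tag, text in metadata.items():
                sub = etree.SubElement(metadata_raw, tag)
                sub.text = text

        return data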


if __name__ == "__main__":
    TEST_INPUT = """\
<article>
<title>Hello</title>
<metadata>
<md_key_1>text_content_1</md_key_1>
<md_key_2>text_content_2</md_key_2>
</metadata>
</article>
"""
    article = Article.from_xml(TEST_INPUT)
    print(article)
    print(article.metadata)
    print(article.to_xml().decode())

    article = Article(title='Hello', metadata={'md_key_1': 'text_content_1', 'md_key_2': 'text_content_2'})
    print(article)
    print(article.metadata)
    print(article.to_xml().decode())

output:

title='Hello' metadata_raw=<Element metadata at 0x1057376c0>
{'md_key_1': 'text_content_1', 'md_key_2': 'text_content_2'}
<article><title>Hello</title><metadata>
<md_key_1>text_content_1</md_key_1>
<md_key_2>text_content_2</md_key_2>
</metadata>
</article>
title='Hello' metadata_raw=<Element metadata at 0x105811b80>
{'md_key_1': 'text_content_1', 'md_key_2': 'text_content_2'}
<article><title>Hello</title><metadata><md_key_1>text_content_1</md_key_1><md_key_2>text_content_2</md_key_2></metadata></article>
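One caveat with the metadata property in this workaround: the dict comprehension keeps only the last value when a tag repeats, and an element with no text maps to None, so the result is not strictly dict[str, str]. A small sketch of the same comprehension applied to a standard-library element, assuming lxml behaves the same way for these two cases; `to_dict` is a hypothetical name:

```python
import xml.etree.ElementTree as ET


def to_dict(metadata_raw: ET.Element) -> dict:
    # Same comprehension as the `metadata` property in the workaround.
    return {el.tag: el.text for el in metadata_raw}


raw = ET.fromstring("<metadata><k>first</k><k>second</k><empty/></metadata>")
print(to_dict(raw))
# {'k': 'second', 'empty': None}
```

If repeated tags or empty elements can occur in real data, the property would need an explicit policy for them (for example, collecting lists or defaulting to an empty string).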
