Mixed Content? #176

Open
npmccallum opened this issue Mar 22, 2024 · 3 comments
Labels
question Further information is requested

Comments

@npmccallum

I'm not sure if this is a feature request, documentation request or a user question.

I have some XML like this:

<foo>
    first text
    <bar>second text</bar>
    third text
</foo>

How can I model this? The ordering of the text is significant. So I basically need something like:

class Bar(BaseXmlModel):
    body: str

class Foo(BaseXmlModel):
    body: list[str | Bar]

But of course that doesn't work. What can I do?

@dapper91
Owner

@npmccallum Hi,

"first text" and "second text" can be extracted like this:

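from pydantic_xml import BaseXmlModel

# The document from the question above.
xml = '''<foo>
    first text
    <bar>second text</bar>
    third text
</foo>'''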

class Bar(BaseXmlModel, tag='bar'):
    text: str


class Foo(BaseXmlModel, tag='foo'):
    text: str
    bar: Bar

foo = Foo.from_xml(xml)
assert foo.text == '\n    first text\n    '
assert foo.bar.text == 'second text'

Unfortunately, element tails are not supported yet. The simplest way to extract "third text" right now is to use a raw element:

from lxml.etree import _Element as Element
from pydantic_xml import BaseXmlModel, element


class Foo(BaseXmlModel, tag='foo', arbitrary_types_allowed=True):
    text: str
    bar: Element = element('bar')

    @property
    def bar_text(self):
        return self.bar.text

    @bar_text.setter
    def bar_text(self, text: str):
        self.bar.text = text

    @property
    def bar_tail(self):
        return self.bar.tail

    @bar_tail.setter
    def bar_tail(self, tail: str):
        self.bar.tail = tail


foo = Foo.from_xml(xml)
assert foo.text == '\n    first text\n    '
assert foo.bar_text == 'second text'
assert foo.bar_tail == '\n    third text\n'
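
The workaround above relies on how lxml and the standard library's ElementTree store text that follows a child element: it lives on that child's `tail`, not on the parent. A minimal sketch of that behavior, using only `xml.etree.ElementTree` and the document from the question (the `XML` variable name is just for illustration):

```python
import xml.etree.ElementTree as ET

# The document from the question.
XML = """<foo>
    first text
    <bar>second text</bar>
    third text
</foo>"""

root = ET.fromstring(XML)
bar = root.find("bar")

# Text before the first child belongs to the parent element.
print(repr(root.text))
# Text inside <bar> belongs to <bar>.
print(repr(bar.text))
# Text after </bar> is stored on <bar> as its tail, not on <foo>.
print(repr(bar.tail))
# The root element has no following text, so its tail is None.
print(repr(root.tail))
```

This is why "third text" is reachable through `bar.tail` in the raw-element model and not through any field on `Foo` itself.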

@dapper91 dapper91 added the question Further information is requested label Mar 22, 2024
@npmccallum
Author

@dapper91 Thanks for the quick response. My real use case is significantly more complex than the simple one I gave. I have dozens of child tags that are interspersed with text. So I really need something like list[str | TypeOne | TypeTwo ... TypeN]. Do you know how difficult this might be to implement?

@dapper91
Owner

@npmccallum I think it is possible to add support for element tails. The problem is that XML parsers (etree, lxml) attach the tail text to the sub-element it follows, not to the parent element. In your example the tail will be bound to Bar, not to Foo.

So the models would be described like this, where tail() is the proposed new field type and does not exist yet:

from pydantic_xml import BaseXmlModel

class Bar(BaseXmlModel, tag='bar'):
    text: str
    tail: str = tail()


class Foo(BaseXmlModel, tag='foo'):
    text: str
    bars: list[Bar]

foo = Foo.from_xml(xml)
assert foo.text == '\n    first text\n    '
assert foo.bars[0].text == 'second text'
assert foo.bars[0].tail == '\n    third text\n'
assert foo.bars[1].text == 'fourth text'
assert foo.bars[1].tail == '\n    fifth text\n'
# and so on

Will that be helpful?
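
For the ordered `list[str | TypeOne | ...]` shape asked about earlier, the parent's text plus each child's tail is enough to rebuild the sequence in document order. The following is only a rough sketch on top of the standard library's ElementTree, not pydantic-xml functionality; `mixed_content`, `Node`, and the `<baz>` tag are hypothetical names introduced for illustration, and whitespace is stripped by default as an assumption about what is wanted.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Union

# A mixed-content item is either a text run or a child element.
Node = Union[str, ET.Element]


def mixed_content(parent: ET.Element, strip: bool = True) -> List[Node]:
    """Return the parent's text runs and child elements in document order."""
    items: List[Node] = []

    def add_text(text: Optional[str]) -> None:
        # Skip missing text and, when stripping, whitespace-only runs.
        if text is None:
            return
        if strip:
            text = text.strip()
        if text:
            items.append(text)

    add_text(parent.text)          # text before the first child
    for child in parent:
        items.append(child)        # the child element itself
        add_text(child.tail)       # text between this child and the next
    return items


root = ET.fromstring(
    "<foo>first <bar>second</bar> third <baz>fourth</baz> fifth</foo>"
)
parts = [p if isinstance(p, str) else (p.tag, p.text) for p in mixed_content(root)]
print(parts)
```

Each element in the resulting list could then be handed to the model matching its tag, which keeps the text and the typed children interleaved in their original order.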
