Mixed Content? #176

npmccallum · 2024-03-22T11:48:09Z

I'm not sure if this is a feature request, documentation request or a user question.

I have some XML like this:

<foo>
    first text
    <bar>second text</bar>
    third text
</foo>

How can I model this? The ordering of the text is significant. So I basically need something like:

class Bar(BaseXmlModel):
    body: str

class Foo(BaseXmlModel):
    body: list[str | Bar]

But of course that doesn't work. What can I do?

dapper91 · 2024-03-22T13:06:47Z

@npmccallum Hi,

"first text" and "second text" can be extracted like this:

from pydantic_xml import BaseXmlModel

class Bar(BaseXmlModel, tag='bar'):
    text: str


class Foo(BaseXmlModel, tag='foo'):
    text: str
    bar: Bar

# xml holds the <foo> document from the question
foo = Foo.from_xml(xml)
assert foo.text == '\n    first text\n    '
assert foo.bar.text == 'second text'

Unfortunately, element tails are not supported yet. The simplest way to extract "third text" right now is to use a raw element:

from lxml.etree import _Element as Element
from pydantic_xml import BaseXmlModel, element


class Foo(BaseXmlModel, tag='foo', arbitrary_types_allowed=True):
    text: str
    bar: Element = element('bar')

    @property
    def bar_text(self):
        return self.bar.text

    @bar_text.setter
    def bar_text(self, text: str):
        self.bar.text = text

    @property
    def bar_tail(self):
        return self.bar.tail

    @bar_tail.setter
    def bar_tail(self, tail: str):
        self.bar.tail = tail


# xml holds the <foo> document from the question
foo = Foo.from_xml(xml)
assert foo.text == '\n    first text\n    '
assert foo.bar_text == 'second text'
assert foo.bar_tail == '\n    third text\n'

npmccallum · 2024-03-22T13:16:28Z

@dapper91 Thanks for the quick response. My real use case is significantly more complex than the simple one I gave. I have dozens of child tags that are interspersed with text. So I really need something like list[str | TypeOne | TypeTwo ... TypeN]. Do you know how difficult this might be to implement?

dapper91 · 2024-03-31T15:01:24Z

@npmccallum I think it is possible to add support for element tails. The problem is that in XML parsers (etree, lxml) the tail text belongs to the sub-element it follows, not to the enclosing element. In your example, the tail would be bound to Bar, not to Foo.
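
The text/tail behaviour can be checked directly with the standard library's xml.etree.ElementTree (lxml exposes the same .text and .tail attributes). This is only a small illustration using the XML from the question, not pydantic_xml code:

```python
import xml.etree.ElementTree as ET

# The document from the question, with its original 4-space indentation.
xml = "<foo>\n    first text\n    <bar>second text</bar>\n    third text\n</foo>"

foo = ET.fromstring(xml)
bar = foo.find("bar")

print(repr(foo.text))  # text before the first child is stored on <foo>
print(repr(bar.text))  # text inside <bar>
print(repr(bar.tail))  # text after </bar> is stored on <bar>, not on <foo>
print(repr(foo.tail))  # nothing follows </foo>, so the root has no tail
```

Here "third text" is only reachable through bar.tail, which is why a tail field would naturally live on the Bar model.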

So the models would be described like this, with tail() as the proposed new helper:

from pydantic_xml import BaseXmlModel

class Bar(BaseXmlModel, tag='bar'):
    text: str
    tail: str = tail()


class Foo(BaseXmlModel, tag='foo'):
    text: str
    bars: list[Bar]

foo = Foo.from_xml(xml)
assert foo.text == '\n    first text\n    '
assert foo.bars[0].text == 'second text'
assert foo.bars[0].tail == '\n    third text\n'
assert foo.bars[1].text == 'fourth text'
assert foo.bars[1].tail == '\n    fifth text\n'
# and so on

Will that be helpful?
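
For the many-child case raised earlier in the thread, the same text/tail pairs can be walked to recover document order from a raw element. The following is a rough sketch using only xml.etree.ElementTree. The helper name mixed_content and the <baz> tag are made up for illustration, and stripping whitespace is an assumption that may not suit every document:

```python
import xml.etree.ElementTree as ET
from typing import List, Union


def mixed_content(parent: ET.Element) -> List[Union[str, ET.Element]]:
    """Return text chunks and child elements of `parent` in document order."""
    items: List[Union[str, ET.Element]] = []
    # Text before the first child is stored on the parent itself.
    if parent.text and parent.text.strip():
        items.append(parent.text.strip())
    for child in parent:
        items.append(child)
        # Text following a child is stored as that child's tail.
        if child.tail and child.tail.strip():
            items.append(child.tail.strip())
    return items


# <baz> is a hypothetical second child type, added only for this example.
xml = (
    "<foo>\n    first text\n    <bar>second text</bar>\n    third text\n"
    "    <baz>fourth text</baz>\n    fifth text\n</foo>"
)
parts = mixed_content(ET.fromstring(xml))
summary = [p if isinstance(p, str) else f"<{p.tag}>" for p in parts]
print(summary)  # ['first text', '<bar>', 'third text', '<baz>', 'fifth text']
```

Each element in the result could then be dispatched on its tag to whichever model type fits, which approximates list[str | TypeOne | ... | TypeN] outside the model layer.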

dapper91 added the "question" label (Further information is requested) · Mar 22, 2024
