Severe memory leak #222

Closed
asrenzo opened this issue Nov 7, 2024 · 5 comments
Labels
bug (Something isn't working), v2 (Version 2 related)

Comments

@asrenzo

asrenzo commented Nov 7, 2024

Hi,

With the latest version (2.13.1) of pydantic-xml, we are experiencing a severe memory leak. The problem may also be present in previous versions.

We were able to build a minimal working example that reproduces the problem, which seems to be linked to field_validator and/or model_validator decorators in child models of a main model.

Here is a simple way to visualize the leak with objgraph.
It looks like XmlElement, State and XmlEntityInfo objects are not cleaned up when a model is loaded from an invalid dataset.

This example is based on your own documentation sample.

from datetime import date
from enum import Enum
from typing import Dict, List, Literal, Optional, Set, Tuple

import objgraph
import pydantic as pd
from pydantic import HttpUrl, ValidationError, conint
from pydantic_xml import BaseXmlModel, RootXmlModel, attr, element, wrapped
from pydantic_xml.element.native import XmlElement

DATA = """<Company trade-name="SpaceX" type="Private" xmlns:pd="http://www.company.com/prod">
    <Founder name="Elon" surname="Musk"/>
    <Founded>2002-03-14</Founded>
    <Employees>12000</Employees>
    <WebSite>https://www.spacex.com</WebSite>
    <Industries>
        <Industry>space</Industry>
        <Industry>communications</Industry>
    </Industries>
    <key-people>
        <person position="CEO" name="Elon Musk"/>
        <person position="CTO" name="Elon Musk"/>
        <person position="COO" name="Gwynne Shotwell"/>
    </key-people>
    <hq:headquarters xmlns:hq="http://www.company.com/hq">
        <hq:country>US</hq:country>
        <hq:state>California</hq:state>
        <hq:city>Hawthorne</hq:city>
    </hq:headquarters>
    <co:contacts xmlns:co="http://www.company.com/contact">
        <co:socials>
            <co:social co:type="linkedin">https://www.linkedin.com/company/spacex</co:social>
            <co:social co:type="twitter">https://twitter.com/spacex</co:social>
            <co:social co:type="youtube">https://www.youtube.com/spacex</co:social>
        </co:socials>
    </co:contacts>
    <pd:product pd:status="running" pd:launched="2013">Several launch vehicles</pd:product>
    <pd:product pd:status="running" pd:launched="2019">Starlink</pd:product>
    <pd:product pd:status="development">Starship</pd:product>
</Company>"""

DATA_KO = """<Company trade-name="SpaceX" type="Private" xmlns:pd="http://www.company.com/prod">
    <Founder name="Elon" surname="Musk"/>
    <Founded>2002-03-14</Founded>
    <Employees>12000</Employees>
    <WebSite>https://www.spacex.com</WebSite>
    <Industries>
        <Industry>space</Industry>
        <Industry>communications</Industry>
    </Industries>
    <key-people>
        <person position="CEO" name="Elon Musk"/>
        <person position="CTO" name="Elon Musk"/>
        <person position="COO" name="Gwynne Shotwell"/>
    </key-people>
    <hq:headquarters xmlns:hq="http://www.company.com/hq">
        <hq:country>USA</hq:country>
        <hq:state>California</hq:state>
        <hq:city>Hawthorne</hq:city>
    </hq:headquarters>
    <co:contacts xmlns:co="http://www.company.com/contact">
        <co:socials>
            <co:social co:type="linkedin">https://www.linkedin.com/company/spacex</co:social>
            <co:social co:type="twitter">https://twitter.com/spacex</co:social>
            <co:social co:type="youtube">https://www.youtube.com/spacex</co:social>
        </co:socials>
    </co:contacts>
    <pd:product pd:status="running" pd:launched="2013">Several launch vehicles</pd:product>
    <pd:product pd:status="running" pd:launched="2019">Starlink</pd:product>
    <pd:product pd:status="development">Starship</pd:product>
</Company>"""

NSMAP = {
    "co": "http://www.company.com/contact",
    "hq": "http://www.company.com/hq",
    "pd": "http://www.company.com/prod",
}


class Headquarters(BaseXmlModel, ns="hq", nsmap=NSMAP):
    country: str = element()
    state: str = element()
    city: str = element()

    @pd.field_validator("country")
    @classmethod
    def validate_country(cls, value: str) -> str:
        if len(value) > 2:
            raise ValueError("country must be of 2 characters")
        return value


class Industries(RootXmlModel):
    root: Set[str] = element(tag="Industry")


class Social(BaseXmlModel, ns_attrs=True, ns="co", nsmap=NSMAP):
    type: str = attr()
    url: HttpUrl


class Product(BaseXmlModel, ns_attrs=True, ns="pd", nsmap=NSMAP):
    status: Literal["running", "development"] = attr()
    launched: Optional[int] = attr(default=None)
    title: str


class Person(BaseXmlModel):
    name: str = attr()


class CEO(Person):
    position: Literal["CEO"] = attr()


class CTO(Person):
    position: Literal["CTO"] = attr()


class COO(Person):
    position: Literal["COO"] = attr()


class Company(BaseXmlModel, tag="Company", nsmap=NSMAP):
    class CompanyType(str, Enum):
        PRIVATE = "Private"
        PUBLIC = "Public"

    trade_name: str = attr(name="trade-name")
    type: CompanyType = attr()
    founder: Dict[str, str] = element(tag="Founder")
    founded: Optional[date] = element(tag="Founded")
    employees: conint(gt=0) = element(tag="Employees")
    website: HttpUrl = element(tag="WebSite")

    industries: Industries = element(tag="Industries")

    key_people: Tuple[CEO, CTO, COO] = wrapped("key-people", element(tag="person"))
    headquarters: Headquarters
    socials: List[Social] = wrapped(
        "contacts/socials",
        element(tag="social", default_factory=list),
        ns="co",
        nsmap=NSMAP,
    )

    products: Tuple[Product, ...] = element(tag="product", ns="pd")


# Load data with a valid XML
try:
    company = Company.from_xml(DATA)
except ValidationError:
    print("Sounds like something is wrong with this dataset")
    # Making sure company is None
    company = None

print("Remaining XmlElement objects in memory")
try:
    objgraph.show_most_common_types(filter=lambda x: isinstance(x, XmlElement))
except Exception:
    print("XmlElement 0")

# => As expected, no XmlElement obj left in memory


# Now load data with an invalid XML
try:
    company = Company.from_xml(DATA_KO)
except ValidationError:
    print("Sounds like something is wrong with this dataset")
    # Making sure company is None
    company = None

print("Remaining XmlElement objects in memory")
try:
    objgraph.show_most_common_types(filter=lambda x: isinstance(x, XmlElement))
except Exception:
    print("XmlElement 0")

# => 24 XmlElement objects are still in memory
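
For readers who want to run the same kind of check without objgraph, here is a minimal sketch that counts live instances of a type using only the standard library. `count_instances` and `Node` are hypothetical names introduced for illustration (`Node` stands in for a type such as XmlElement); they are not part of pydantic-xml or objgraph.

```python
import gc


def count_instances(cls):
    """Count live, GC-tracked instances of cls after a full collection."""
    gc.collect()
    return sum(1 for obj in gc.get_objects() if isinstance(obj, cls))


class Node:
    """Hypothetical stand-in for a type such as XmlElement."""


kept = [Node() for _ in range(3)]
before = count_instances(Node)  # the three instances are still referenced
del kept
after = count_instances(Node)   # nothing references them any more
print(before, after)
```

A count that stays above zero after every reference has been dropped is the symptom described in this report.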

I don't know if you are already aware of this problem; I didn't find anything in the open tickets.

Regards,

Laurent

dapper91 added the bug and v2 labels Nov 9, 2024
@dapper91
Owner

dapper91 commented Nov 9, 2024

@asrenzo Hi,

Thanks for the report!

From a brief investigation, it seems to me that this is a bug in pydantic itself, not pydantic-xml.

I was able to reproduce identical behavior in pure pydantic:

>>> import gc
... 
... import objgraph
... import pydantic as pd
... 
... 
... class Company(pd.BaseModel):
...     country: str
... 
...     @pd.field_validator("country")
...     @classmethod
...     def validate_country(cls, value: str) -> str:
...         if len(value) > 2:
...             raise ValueError("country must be of 2 characters")
...         return value
... 
... 
... def create_company():
...     err = None
...     try:
...         return Company.model_validate({'country': 'USA'}, strict=False)
...     except pd.ValidationError as e:
...         err = e
... 
...     raise err
... 
... 
... for _ in range(10000):
...     try:
...         company = create_company()
...     except pd.ValidationError as e:
...         company = None
... 
... gc.collect()
... print(objgraph.show_most_common_types())
... 
traceback          50000
frame              30002
ValueError         10000
ValidationError    10000
function           5142
tuple              3141
dict               2695
ReferenceType      1395
wrapper_descriptor 1330
method_descriptor  1183

It looks like the error traceback leaks when pydantic.field_validator is used and the error is saved in the traceback frame's f_locals; I don't know why yet.

You get XmlElement as the most common type because it is actively allocated on the function frame; the leaked traceback object seems to be the actual problem.
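
The reference pattern described above can be sketched in plain Python, without pydantic. This is only an illustration of the general mechanism: an exception kept in a local variable refers to its traceback, the traceback refers to the frame, and the frame's locals refer back to the exception, so everything else on that frame stays alive too. `Payload`, `keep_error_in_local` and `demo` are hypothetical names, not code from pydantic or pydantic-xml.

```python
import gc
import weakref


class Payload:
    """Stand-in for a large object (such as a parsed element) held by the frame."""


def keep_error_in_local():
    payload = Payload()
    probe = weakref.ref(payload)  # lets the caller observe when payload is freed
    err = None
    try:
        raise ValueError("country must be of 2 characters")
    except ValueError as e:
        # `e` is unbound when the except block ends, but `err` is not:
        # err -> __traceback__ -> this frame -> local `err` forms a cycle.
        err = e
    return probe


def demo():
    gc_was_enabled = gc.isenabled()
    gc.disable()  # only reference counting runs for the first check
    try:
        probe = keep_error_in_local()
        alive_after_return = probe() is not None
        gc.collect()  # the cyclic collector can break this pure-Python cycle
        alive_after_collect = probe() is not None
    finally:
        if gc_was_enabled:
            gc.enable()
    return alive_after_return, alive_after_collect


print(demo())
```

On CPython this should print `(True, False)`: the payload survives the function's return and is reclaimed only by the cyclic collector. In the pydantic output above the objects are still counted after `gc.collect()`, so something beyond this plain cycle appears to be involved; the sketch does not explain that part.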

@canardoFR

Hi,

Thanks for the quick answer. The problem also occurs with model_validator.

What is the next step? Shall we file an issue on pydantic, or will you relay it?

Regards

dapper91 added a commit that referenced this issue Nov 9, 2024
This was referenced Nov 9, 2024
Merged
@dapper91
Owner

dapper91 commented Nov 9, 2024

@canardoFR I made a fix in 2.14.0 that gets rid of keeping ValidationErrors in function frame locals.
Please check that the problem is no longer reproduced in the new release.

I think I need more time to find the origin of the problem before opening an issue on pydantic.
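
As an illustration of the general idea behind such a fix (this is not the actual 2.14.0 change, whose code is not shown in this thread), the sketch below re-raises a stored exception but unbinds the local before leaving the frame, so the frame no longer points back at the exception. All names here (`Payload`, `probes`, `validate_and_reraise`, `freed_without_gc`) are hypothetical.

```python
import gc
import weakref

probes = []  # weak references used only to observe object lifetime


class Payload:
    """Stand-in for data held on the validating function's frame."""


def validate_and_reraise():
    payload = Payload()
    probes.append(weakref.ref(payload))
    err = None
    try:
        try:
            raise ValueError("country must be of 2 characters")
        except ValueError as e:
            err = e  # kept so it can be re-raised after the except block
        raise err
    finally:
        # Unbind the local before the frame is left, so the frame no longer
        # refers to the exception whose traceback refers to the frame.
        del err


def freed_without_gc():
    gc_was_enabled = gc.isenabled()
    gc.disable()  # rely on reference counting alone
    try:
        try:
            validate_and_reraise()
        except ValueError:
            pass  # the exception is released when this block ends
        return probes[-1]() is None
    finally:
        if gc_was_enabled:
            gc.enable()


print(freed_without_gc())
```

On CPython this should print `True`: once the caller has finished handling the exception, the frame and its payload are released by reference counting alone, with no garbage collection pass needed.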

@canardoFR

Will test, but I can't before Tuesday, sorry.
Cheers

@canardoFR

Version 2.14.0 did the trick.

No more memory leakage problems.

Thanks a lot.
