Pydantic is the most widely used data validation library for Python. It uses type hints/type annotations to define data models and has quite a nice "feel" to it. Pydantic V2 was released in June 2023 and brings many changes and improvements, including a new Rust-based engine for serializing and validating data.
This package, pydantic-cereal
, is a small extension package that enables users to serialize Pydantic
models with "arbitrary" (non-JSON-fiendly) types to "arbitrary" file-system-like locations.
It uses fsspec
to support generic file systems.
Writing a custom writer (serializer) and reader (loader) with fsspec
URIs is quite straightforward.
You can also use universal-pathlib
's
UPath
with pydantic-cereal
.
📘 See the full documentation here. 📘
See the minimal pure-Python example to learn how to wrap your own type. Below is a preview of this example.
from fsspec import AbstractFileSystem
from pydantic import BaseModel, ConfigDict
from pydantic_cereal import Cereal
cereal = Cereal() # This is a global variable
# Create and "register" a custom type
class MyType(object):
"""My custom type, which isn't a Pydantic model."""
def __init__(self, value: str):
self.value = str(value)
def __repr__(self) -> str:
return f"MyType({self.value})"
def my_reader(fs: AbstractFileSystem, path: str) -> MyType:
"""Read a MyType from an fsspec URI."""
return MyType(value=fs.read_text(path)) # type: ignore
def my_writer(obj: MyType, fs: AbstractFileSystem, path: str) -> None:
"""Write a MyType object to an fsspec URI."""
fs.write_text(path, obj.value)
MyWrappedType = cereal.wrap_type(MyType, reader=my_reader, writer=my_writer)
# Use type within Pydantic model
class MyModel(BaseModel):
"""My custom Pydantic model."""
config = ConfigDict(arbitrary_types_allowed=True) # Pydantic configuration
fld: MyWrappedType
mdl = MyModel(fld=MyType("my_field"))
# We can save the whole model to an fsspec URI, such as this MemoryFileSystem
uri = "memory://my_model"
cereal.write_model(mdl, uri)
# And we can read it back later
obj = cereal.read_model(uri)
assert isinstance(obj, MyModel)
assert isinstance(obj.fld, MyType)
For wrapping 3rd-party libraries, see the Pandas dataframe example.