
Type-safe data interchange for Python

JSON is a popular message interchange format employed in API design for its simplicity, readability, flexibility and wide support. However, json.dump and json.load offer no direct support when working with Python data classes employing type annotations. This package offers services for working with strongly-typed Python classes: serializing objects to JSON, deserializing JSON to objects, and producing a JSON schema that matches the data class, e.g. to be used in an OpenAPI specification.

Unlike orjson, this package supports both serializing and deserializing complex types such as data classes, UUIDs and decimals, and allows specifying custom serialization and deserialization hooks. It doesn't require introducing custom classes into your class inheritance chain (such as BaseModel in pydantic), making it suitable for operating on classes defined in third-party modules.

Features

This package offers the following services:

  • JSON serialization and de-serialization
    • Generate a JSON object from a Python object (serialization.object_to_json)
    • Parse a JSON object into a Python object (serialization.json_to_object)
  • JSON schema
    • Generate a JSON schema from a Python type (schema.classdef_to_schema)
    • Validate a JSON object against a Python type (schema.validate_object)
  • Type information
    • Extract documentation strings (a.k.a. docstrings) from types (docstring.parse_type)
    • Inspect types, including generics (package inspection)

These services come with full support for complex types like data classes, named tuples and generics.

In the context of this package, a JSON object is the (intermediate) Python object representation produced by json.loads from a JSON string. In contrast, a JSON string is the string representation generated by json.dumps from the (intermediate) Python object representation.
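The distinction can be illustrated with the standard library alone. This sketch does not use the package; it only shows which side of json.loads and json.dumps each term refers to.

```python
import json

# A JSON string: the textual form exchanged over the wire.
json_string = '{"int_value": 23, "tags": ["a", "b"], "ok": true}'

# A JSON object: the intermediate Python representation built from
# dict, list, str, int, float, bool and None.
json_obj = json.loads(json_string)
assert json_obj == {"int_value": 23, "tags": ["a", "b"], "ok": True}

# json.dumps goes the other way, from the JSON object back to a JSON string.
round_trip = json.dumps(json_obj)
print(round_trip)  # {"int_value": 23, "tags": ["a", "b"], "ok": true}
```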

Use cases

  • Writing a cloud function (lambda) that communicates with JSON messages received as HTTP payloads or WebSocket text messages
  • Verifying whether an API endpoint receives well-formed input
  • Generating a type schema for an OpenAPI specification to impose constraints on what messages an API can receive (see python-openapi)
  • Parsing JSON configuration files into a Python object

Usage

Consider the following class definition:

import datetime
import uuid
from dataclasses import dataclass

@dataclass
class Example:
    "A simple data class with multiple properties."

    bool_value: bool = True
    int_value: int = 23
    float_value: float = 4.5
    str_value: str = "string"
    datetime_value: datetime.datetime = datetime.datetime(1989, 10, 23, 1, 45, 50)
    guid_value: uuid.UUID = uuid.UUID("f81d4fae-7dec-11d0-a765-00a0c91e6bf6")

First, we serialize the object to JSON with

from strong_typing.serialization import json_to_object, object_to_json

source = Example()
json_obj = object_to_json(source)

Here, the variable json_obj has the value:

{
    "bool_value": True,
    "int_value": 23,
    "float_value": 4.5,
    "str_value": "string",
    "datetime_value": "1989-10-23T01:45:50",
    "guid_value": "f81d4fae-7dec-11d0-a765-00a0c91e6bf6",
}

Next, we restore the object from JSON with

target = json_to_object(Example, json_obj)

Here, target holds the restored data class object:

Example(
    bool_value=True,
    int_value=23,
    float_value=4.5,
    str_value="string",
    datetime_value=datetime.datetime(1989, 10, 23, 1, 45, 50),
    guid_value=uuid.UUID("f81d4fae-7dec-11d0-a765-00a0c91e6bf6"),
)
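To make the round trip concrete, here is a minimal standard-library sketch of the same idea for the flat Example class. It is not the package's implementation: the helpers to_json_object and from_json_object are hypothetical and only handle the field types used above (bool, int, float, str, datetime and UUID).

```python
import dataclasses
import datetime
import typing
import uuid
from dataclasses import dataclass


@dataclass
class Example:
    "A simple data class with multiple properties."

    bool_value: bool = True
    int_value: int = 23
    float_value: float = 4.5
    str_value: str = "string"
    datetime_value: datetime.datetime = datetime.datetime(1989, 10, 23, 1, 45, 50)
    guid_value: uuid.UUID = uuid.UUID("f81d4fae-7dec-11d0-a765-00a0c91e6bf6")


def to_json_object(obj: typing.Any) -> typing.Any:
    "Hypothetical helper: converts a flat data class instance to a JSON object."
    if isinstance(obj, datetime.datetime):
        return obj.isoformat()  # ISO 8601 string
    if isinstance(obj, uuid.UUID):
        return str(obj)  # canonical hyphenated form
    if dataclasses.is_dataclass(obj):
        return {f.name: to_json_object(getattr(obj, f.name)) for f in dataclasses.fields(obj)}
    return obj  # bool, int, float and str are already JSON-compatible


def from_json_object(cls: type, data: dict) -> typing.Any:
    "Hypothetical helper: restores a flat data class, guided by its type hints."
    hints = typing.get_type_hints(cls)
    kwargs = {}
    for f in dataclasses.fields(cls):
        value = data[f.name]
        field_type = hints[f.name]
        if field_type is datetime.datetime:
            value = datetime.datetime.fromisoformat(value)
        elif field_type is uuid.UUID:
            value = uuid.UUID(value)
        kwargs[f.name] = value
    return cls(**kwargs)


json_obj = to_json_object(Example())
restored = from_json_object(Example, json_obj)
```

The type hints, not the JSON values, decide how each string is parsed back; that is why a typed target class is needed for deserialization.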

We can also produce the JSON schema corresponding to the Python class:

import json
from strong_typing.schema import classdef_to_schema

json_schema = json.dumps(classdef_to_schema(Example), indent=4)

which yields

{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "properties": {
        "bool_value": {
            "type": "boolean",
            "default": true
        },
        "int_value": {
            "type": "integer",
            "default": 23
        },
        "float_value": {
            "type": "number",
            "default": 4.5
        },
        "str_value": {
            "type": "string",
            "default": "string"
        },
        "datetime_value": {
            "type": "string",
            "format": "date-time",
            "default": "1989-10-23T01:45:50"
        },
        "guid_value": {
            "type": "string",
            "format": "uuid"
        }
    },
    "additionalProperties": false,
    "required": [
        "bool_value",
        "int_value",
        "float_value",
        "str_value",
        "datetime_value",
        "guid_value"
    ],
    "title": "A simple data class with multiple properties."
}

If a type has a Python docstring, then title and description fields in the JSON schema are populated from the text in the documentation string.
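The exact rule for splitting a docstring into title and description is not spelled out here. The sketch below assumes the first line becomes the title and any remaining text becomes the description, which agrees with the single-line docstring of Example producing only a title. The helper docstring_to_schema_fields is hypothetical.

```python
import inspect


def docstring_to_schema_fields(cls: type) -> dict:
    """Hypothetical helper (assumed rule): first docstring line -> title,
    remaining text -> description."""
    raw = cls.__dict__.get("__doc__")  # only the class's own docstring, not inherited
    if not raw:
        return {}
    doc = inspect.cleandoc(raw)  # strips common indentation
    title, _, rest = doc.partition("\n")
    fields = {"title": title.strip()}
    description = rest.strip()
    if description:
        fields["description"] = description
    return fields


class Example:
    "A simple data class with multiple properties."


class Documented:
    """Short summary.

    Longer explanation that spans
    more than one line.
    """
```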

Standards

For producing a JSON schema, the following JSON schema standards are supported:

Conversion table

The following table shows how the package maps Python types to JSON schema types:

| Python type | JSON schema type | Behavior |
|---|---|---|
| None | null | |
| bool | boolean | |
| int | integer | |
| float | number | |
| str | string | |
| decimal.Decimal | number | |
| bytes | string | represented with Base64 content encoding |
| datetime | string | constrained to match ISO 8601 format 2018-11-13T20:20:39+00:00 |
| date | string | constrained to match ISO 8601 format 2018-11-13 |
| time | string | constrained to match ISO 8601 format 20:20:39+00:00 |
| UUID | string | constrained to match UUID format f81d4fae-7dec-11d0-a765-00a0c91e6bf6 |
| Enum | value type | stores the enumeration value type (typically integer or string) |
| Optional[T] | depends on inner type | reads and writes T if present |
| Union[T1, T2, ...] | depends on concrete type | serializes to the appropriate inner type; deserializes from the first matching type |
| List[T] | array | recursive in T |
| Dict[K, V] | object | recursive in V, keys are coerced into string |
| Dict[Enum, V] | object | recursive in V, keys are of enumeration value type and coerced into string |
| Set[T] | array | recursive in T, container has uniqueness constraint |
| Tuple[T1, T2, ...] | array | array has fixed length, each element has specific type |
| Literal[const] | type matching const | exports the literal value as a constant value |
| data class | object | iterates over fields of data class |
| named tuple | object | iterates over fields of named tuple |
| regular class | object | iterates over dir(obj) |
| JsonArray | array | untyped JSON array |
| JsonObject | object | untyped JSON object |
| Any | oneOf | a union of all basic JSON schema types |
| Annotated[T, ...] | depends on T | outputs value for T, applies constraints and format based on auxiliary type information |
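The Behavior column can be read as a set of value conversions. The following sketch applies several of them with the standard library; the function to_json_value is hypothetical, covers only a subset of the rows, and is not the package's code.

```python
import base64
import datetime
import enum
import typing
import uuid


class Side(enum.Enum):
    LEFT = "L"
    RIGHT = "R"


def to_json_value(value: typing.Any) -> typing.Any:
    "Hypothetical helper: converts a Python value following some table rows."
    if value is None:
        return None
    if isinstance(value, enum.Enum):  # checked first: IntEnum members are also ints
        return value.value  # the enumeration value, typically int or str
    if isinstance(value, (bool, int, float, str)):
        return value
    if isinstance(value, bytes):
        return base64.b64encode(value).decode("ascii")  # Base64 content encoding
    if isinstance(value, (datetime.datetime, datetime.date, datetime.time)):
        return value.isoformat()  # ISO 8601
    if isinstance(value, uuid.UUID):
        return str(value)
    if isinstance(value, dict):
        # Keys are coerced into strings; enumeration keys use their value first.
        return {
            str(k.value if isinstance(k, enum.Enum) else k): to_json_value(v)
            for k, v in value.items()
        }
    if isinstance(value, (list, tuple, set, frozenset)):
        return [to_json_value(item) for item in value]  # all become JSON arrays
    raise TypeError(f"unsupported type: {type(value).__name__}")
```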

JSON schema examples

Simple basic types

| Python type | JSON schema |
|---|---|
| bool | {"type": "boolean"} |
| int | {"type": "integer"} |
| float | {"type": "number"} |
| str | {"type": "string"} |
| bytes | {"type": "string", "contentEncoding": "base64"} |

Simple built-in types

| Python type | JSON schema |
|---|---|
| decimal.Decimal | {"type": "number"} |
| datetime.date | {"type": "string", "format": "date"} |
| uuid.UUID | {"type": "string", "format": "uuid"} |

Enumeration types

import enum

class Side(enum.Enum):
    LEFT = "L"
    RIGHT = "R"

{"enum": ["L", "R"], "type": "string"}
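One way to derive this schema is to collect the member values and, when they all share one type, add the matching JSON type. The function enum_to_schema below is a hypothetical sketch of that idea, not the package's implementation.

```python
import enum


class Side(enum.Enum):
    LEFT = "L"
    RIGHT = "R"


def enum_to_schema(enum_type: type) -> dict:
    "Hypothetical helper: lists member values and their common JSON type."
    values = [member.value for member in enum_type]  # definition order
    value_types = {type(v) for v in values}
    schema: dict = {"enum": values}
    if value_types == {str}:
        schema["type"] = "string"
    elif value_types == {int}:
        schema["type"] = "integer"
    return schema  # mixed value types: no "type" key is emitted
```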

Container types

| Python type | JSON schema |
|---|---|
| List[int] | {"type": "array", "items": {"type": "integer"}} |
| Dict[str, int] | {"type": "object", "additionalProperties": {"type": "integer"}} |
| Set[int] | {"type": "array", "items": {"type": "integer"}, "uniqueItems": true} |
| Tuple[int, str] | {"type": "array", "minItems": 2, "maxItems": 2, "prefixItems": [{"type": "integer"}, {"type": "string"}]} |
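The container rows follow a recursive pattern: inspect the generic origin, then build the schema of each type argument. This sketch uses typing.get_origin and typing.get_args from the standard library; type_to_schema is a hypothetical name and only basic scalar types are supported as leaves.

```python
import typing

_BASIC = {bool: "boolean", int: "integer", float: "number", str: "string"}


def type_to_schema(tp: typing.Any) -> dict:
    "Hypothetical helper: recursive schema for the container types in the table."
    if tp in _BASIC:
        return {"type": _BASIC[tp]}
    origin = typing.get_origin(tp)  # e.g. list for List[int]
    args = typing.get_args(tp)  # e.g. (int,) for List[int]
    if origin is list:
        return {"type": "array", "items": type_to_schema(args[0])}
    if origin is set:
        return {"type": "array", "items": type_to_schema(args[0]), "uniqueItems": True}
    if origin is dict:
        # JSON object keys are always strings, so only the value type is described.
        return {"type": "object", "additionalProperties": type_to_schema(args[1])}
    if origin is tuple:
        # Fixed length, with one schema per position.
        return {
            "type": "array",
            "minItems": len(args),
            "maxItems": len(args),
            "prefixItems": [type_to_schema(arg) for arg in args],
        }
    raise TypeError(f"unsupported type: {tp!r}")
```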

Annotated types

Range:

Annotated[int, IntegerRange(23, 82)]
{
    "type": "integer",
    "minimum": 23,
    "maximum": 82
}

Precision:

Annotated[decimal.Decimal, Precision(9, 6)]
{
    "type": "number",
    "multipleOf": 0.000001,
    "exclusiveMinimum": -1000,
    "exclusiveMaximum": 1000
}
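The numbers above are consistent with reading Precision(9, 6) as 9 significant digits, 6 of which follow the decimal point (this reading is an assumption inferred from the output). That leaves 9 - 6 = 3 integer digits, so the magnitude must stay below 10^3 = 1000, and the step size is 10^-6. The hypothetical function below reproduces that arithmetic, using decimal.Decimal to keep the step exact.

```python
from decimal import Decimal


def precision_to_schema(significant_digits: int, decimal_digits: int) -> dict:
    "Hypothetical helper: numeric bounds implied by an assumed Precision(p, s)."
    integer_digits = significant_digits - decimal_digits  # 9 - 6 = 3
    bound = 10 ** integer_digits  # 1000: largest magnitude that is excluded
    return {
        "type": "number",
        "multipleOf": Decimal(1).scaleb(-decimal_digits),  # exactly 10^-6
        "exclusiveMinimum": -bound,
        "exclusiveMaximum": bound,
    }


schema = precision_to_schema(9, 6)
```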

Fixed-width types

Fixed-width integer (e.g. uint64) and floating-point (e.g. float32) types are annotated types defined in the module strong_typing.auxiliary. When a schema is generated, their signature is recognized, and a format property is written instead of minimum and maximum constraints.

int32:

int32 = Annotated[int, Signed(True), Storage(4), IntegerRange(-2147483648, 2147483647)]
{"format": "int32", "type": "integer"}

uint64:

uint64 = Annotated[int, Signed(False), Storage(8), IntegerRange(0, 18446744073709551615)]
{"format": "uint64", "type": "integer"}
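The ranges in these definitions follow from the storage size: a signed n-byte integer spans -2^(8n-1) to 2^(8n-1) - 1, and an unsigned one spans 0 to 2^(8n) - 1. The sketch below computes those bounds and emits a format name only when the declared range covers the full width. Both functions are hypothetical illustrations of the recognition step, not the package's code.

```python
from typing import Dict, Tuple


def integer_range(signed: bool, storage_bytes: int) -> Tuple[int, int]:
    "Smallest and largest value representable in storage_bytes bytes."
    bits = 8 * storage_bytes
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


def fixed_width_schema(signed: bool, storage_bytes: int, minimum: int, maximum: int) -> Dict[str, object]:
    "Hypothetical helper: format name for full-width ranges, explicit bounds otherwise."
    if (minimum, maximum) == integer_range(signed, storage_bytes):
        prefix = "int" if signed else "uint"
        return {"format": f"{prefix}{8 * storage_bytes}", "type": "integer"}
    return {"type": "integer", "minimum": minimum, "maximum": maximum}
```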

Any type

{
    "oneOf": [
        {"type": "null"},
        {"type": "boolean"},
        {"type": "number"},
        {"type": "string"},
        {"type": "array"},
        {"type": "object"}
    ]
}

Custom serialization and de-serialization

If a composite object (e.g. a dataclass or a plain Python class) has a to_json member function, then this function is invoked to produce a JSON object representation from an instance.

If a composite object has a from_json class method (declared with @classmethod), then this method is invoked, passing the JSON object as an argument, to produce an instance of the corresponding type.
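The hook lookup can be sketched as a duck-typed check before falling back to generic handling. Only the hook names to_json and from_json come from the text above; the dispatchers serialize and deserialize, and the Point class, are hypothetical.

```python
import dataclasses


def serialize(obj):
    "Hypothetical dispatcher: prefer a to_json member function when present."
    to_json = getattr(obj, "to_json", None)
    if callable(to_json):
        return to_json()
    if dataclasses.is_dataclass(obj):
        return {f.name: serialize(getattr(obj, f.name)) for f in dataclasses.fields(obj)}
    return obj


def deserialize(cls, data):
    "Hypothetical dispatcher: prefer a from_json class method when present."
    from_json = getattr(cls, "from_json", None)
    if callable(from_json):
        return from_json(data)
    return cls(**data)  # generic fallback for plain JSON objects


@dataclasses.dataclass
class Point:
    x: int
    y: int

    def to_json(self) -> str:
        return f"{self.x},{self.y}"  # compact custom representation

    @classmethod
    def from_json(cls, data: str) -> "Point":
        x, y = data.split(",")
        return cls(int(x), int(y))
```

Because the hooks are discovered by name rather than by base class, a class needs no special ancestor to customize its JSON form.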

Custom types

It is possible to declare custom types when generating a JSON schema. For example, the following class definition has the decorator @json_schema_type, which registers a JSON schema subtype definition under the path #/definitions/AzureBlob; other parts of the schema then reference this definition with $ref:

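import re
from dataclasses import dataclass

# Blob is a base class defined elsewhere (not shown here).

_regexp_azure_url = re.compile(
    r"^https?://([^.]+)\.blob\.core\.windows\.net/([^/]+)/(.*)$")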

@dataclass
@json_schema_type(
    schema={
        "type": "object",
        "properties": {
            "mimeType": {"type": "string"},
            "blob": {
                "type": "string",
                "pattern": _regexp_azure_url.pattern,
            },
        },
        "required": ["mimeType", "blob"],
        "additionalProperties": False,
    }
)
class AzureBlob(Blob):
    ...

You can use @json_schema_type without the schema parameter to register the type name but have the schema definition automatically derived from the Python type. This is useful if the type is reused across the type hierarchy:

@json_schema_type
class Image:
    ...

class Study:
    left: Image
    right: Image

Here, the two properties of Study (left and right) will refer to the same subtype #/definitions/Image.
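The effect of registration can be sketched with a small registry: registered classes are emitted once under definitions and referenced with $ref, while unregistered classes are inlined. The decorator register_schema_type and the functions below are hypothetical stand-ins, not the package's json_schema_type.

```python
import dataclasses
import typing

_registry: typing.Dict[str, type] = {}
_BASIC = {bool: "boolean", int: "integer", float: "number", str: "string"}


def register_schema_type(cls: type) -> type:
    "Hypothetical decorator: marks cls as a named, reusable schema definition."
    _registry[cls.__name__] = cls
    return cls


def class_to_schema(cls: type, definitions: dict) -> dict:
    "Builds an object schema; registered property types become $ref links."
    properties = {
        name: property_schema(field_type, definitions)
        for name, field_type in typing.get_type_hints(cls).items()
    }
    return {"type": "object", "properties": properties}


def property_schema(tp: type, definitions: dict) -> dict:
    if tp in _BASIC:
        return {"type": _BASIC[tp]}
    name = tp.__name__
    if _registry.get(name) is tp:
        if name not in definitions:
            definitions[name] = {}  # placeholder guards against self-reference
            definitions[name] = class_to_schema(tp, definitions)
        return {"$ref": f"#/definitions/{name}"}
    return class_to_schema(tp, definitions)  # unregistered types are inlined


@register_schema_type
@dataclasses.dataclass
class Image:
    url: str


@dataclasses.dataclass
class Study:
    left: Image
    right: Image


definitions: dict = {}
schema = class_to_schema(Study, definitions)
```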

Union types

Serializing a union type entails serializing the active member type.

De-serializing discriminated (tagged) union types is based on a disjoint set of property values with type annotation Literal[...]. Consider the following example:

from dataclasses import dataclass
from typing import Literal


@dataclass
class ClassA:
    name: Literal["A", "a"]
    value: str


@dataclass
class ClassB:
    name: Literal["B", "b"]
    value: str

Here, the JSON representations of ClassA and ClassB are indistinguishable based on property names alone. However, the name property of ClassA can only take the values "A" and "a", and the name property of ClassB can only take the values "B" and "b". Hence a JSON object such as

{ "name": "A", "value": "string" }

uniquely identifies ClassA, and can never match ClassB. The de-serializer can instantiate the appropriate class, and populate properties of the newly created instance.

Each member type of a tagged union must have at least one property of a literal type, and the literal values of that property must be disjoint across member types.

When de-serializing regular union types that have no type tags, the first successfully matching type is selected. It is a parse error if all union member types have been exhausted without finding a match.
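Both strategies can be sketched with typing.get_type_hints, typing.get_origin and typing.get_args. The functions literal_tags and parse_union are hypothetical; in particular, treating a failed constructor call (TypeError) as "no match" for untagged members is a simplification for this sketch.

```python
import dataclasses
import typing
from typing import Literal


@dataclasses.dataclass
class ClassA:
    name: Literal["A", "a"]
    value: str


@dataclasses.dataclass
class ClassB:
    name: Literal["B", "b"]
    value: str


def literal_tags(cls: type) -> dict:
    "Maps each Literal-typed property name to its tuple of allowed values."
    tags = {}
    for name, tp in typing.get_type_hints(cls).items():
        if typing.get_origin(tp) is Literal:
            tags[name] = typing.get_args(tp)
    return tags


def parse_union(candidates: list, data: dict):
    "Hypothetical union parser: tags first, then the first constructible type."
    for cls in candidates:
        tags = literal_tags(cls)
        if tags and all(data.get(prop) in allowed for prop, allowed in tags.items()):
            return cls(**data)
    for cls in candidates:
        if literal_tags(cls):
            continue  # tagged members only match through their tags
        try:
            return cls(**data)
        except TypeError:
            continue
    raise ValueError("all union member types exhausted without finding a match")
```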

Name mangling

If a Python class has a property whose name carries a trailing underscore (_), as recommended by PEP 8 to avoid a conflict with a Python keyword (e.g. for_ or in_), the underscore is removed when reading from or writing to JSON.
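A minimal sketch of this mapping uses keyword.iskeyword from the standard library. It assumes the underscore is stripped only when the remaining name is a keyword, so an ordinary name such as value_ passes through unchanged; that restriction is an assumption, and both helper names are hypothetical.

```python
import keyword


def python_to_json_name(name: str) -> str:
    "Hypothetical helper: for_ -> for, but only when the stem is a keyword."
    if name.endswith("_") and keyword.iskeyword(name[:-1]):
        return name[:-1]
    return name


def json_to_python_name(name: str) -> str:
    "Hypothetical helper: re-appends the underscore to keyword property names."
    return name + "_" if keyword.iskeyword(name) else name
```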