Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix issue with unsupported list values and tuple #103

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

ppease
Copy link

@ppease ppease commented Feb 26, 2024

In this PR, I am fixing two issues when going from pydantic to avro.

The first issue is for pydantic fields that are a list with certain value types like list[str | None] or list[dict[str, str]]. This fixes one of the issues currently posted (this issue), but it fixes more than just nullable elements in an array. We also saw an issue with certain dicts as the value for an array. For the issues with a dict, we were getting, TypeError: unhashable type: 'dict' when we tried to parse the schema.

In debugging the issue, it seemed like several of the cases in the get_type for when t == "array" could be simplified to just inline the type from the dictionary that represents the types of the elements.

The second issue fixed in this PR is adding support for tuple (referenced in this issue). The JSON schema that pydantic generates for tuple looks like this:

Example class:

class TupleTestModel(AvroBase):
    c1: tuple[int]    

Generated JSON schema from pydantic:

{
  "properties": {
    "c1": {
      "maxItems": 1,
      "minItems": 1,
      "prefixItems": [
        {
          "type": "integer"
        }
      ],
      "title": "C1",
      "type": "array"
    }
  },
  "required": ["c1"],
  "title": "TupleTestModel",
  "type": "object"
}

A tuple is an array, but it uses the prefixItems to define that. Previously we would fall into the array case within get_type and blow up because there are no items. In this MR, I am adding a special case to check for prefixItems and handle that.

Test cases were added for both the new handling with list and tuple. All existing tests pass.

if (
isinstance(tn, dict)
and isinstance(tn.get("type", {}), dict)
and tn.get("type", {}).get("logicalType") is not None
and isinstance(tn.get("type", None), (dict, list))
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change to pull the type out of the dict representing the types of the elements seemed similar to the logic we are using in the case that handles just a raw dictionary:

if isinstance(value_type, dict) and len(value_type) == 1:
    value_type = value_type.get("type")

The dict case is the equivalent for what we had with the previous if statements handling logicalType and a list of lists. However, it also handles list[dict[str,str]]. In the case that it is a list, we end up with just a list of the types which is valid to set directly as the items type.

@ppease ppease changed the title Fix issue with unsupported list values Fix issue with unsupported list values and tuple Feb 27, 2024
@@ -84,27 +84,31 @@ def get_type(value: dict) -> dict:
}

classes_seen.add(class_name)
elif t == "array" and "prefixItems" in value:
# Handle tuple since it is considered an array
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The best we can do for a tuple with the AVRO schema is figure out all of the unique types and say the given field can be an array that is one of those unique types. We figure out the minimum set of possible types based on the prefixItems and return that as the type for the array.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants