-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix issue with unsupported list values and tuple #103
base: main
Are you sure you want to change the base?
Conversation
if ( | ||
isinstance(tn, dict) | ||
and isinstance(tn.get("type", {}), dict) | ||
and tn.get("type", {}).get("logicalType") is not None | ||
and isinstance(tn.get("type", None), (dict, list)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change to pull the type
out of the dict
representing the types of the elements seemed similar to the logic we are using in the case that handles just a raw dictionary:
if isinstance(value_type, dict) and len(value_type) == 1:
value_type = value_type.get("type")
The dict
case is the equivalent for what we had with the previous if
statements handling logicalType
and a list of lists. However, it also handles list[dict[str,str]]
. In the case that it is a list, we end up with just a list of the types which is valid to set directly as the items type.
@@ -84,27 +84,31 @@ def get_type(value: dict) -> dict: | |||
} | |||
|
|||
classes_seen.add(class_name) | |||
elif t == "array" and "prefixItems" in value: | |||
# Handle tuple since it is considered an array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The best we can do for a tuple with the AVRO schema is figure out all of the unique types and say the given field can be an array that is one of those unique types. We figure out the minimum set of possible types based on the prefixItems
and return that as the type for the array.
In this PR, I am fixing two issues when going from pydantic to avro.
The first issue is for pydantic fields that are a list with certain value types like
list[str | None]
orlist[dict[str, str]]
. This fixes one of the issues currently posted (this issue), but it fixes more than just nullable elements in an array. We also saw an issue with certaindict
s as the value for an array. For the issues with adict
, we were getting,TypeError: unhashable type: 'dict'
when we tried to parse the schema.In debugging the issue, it seemed like several of the cases in the
get_type
for whent == "array"
could be simplified to just inline thetype
from the dictionary that represents the types of the elements.The second issue fixed in this PR is adding support for
tuple
(referenced in this issue). The JSON schema that pydantic generates fortuple
looks like this:Example class:
Generated JSON schema from pydantic:
A tuple is an array, but it uses the prefixItems to define that. Previously we would fall into the array case within
get_type
and blow up because there are noitems
. In this MR, I am adding a special case to check forprefixItems
and handle that.Test cases were added for both the new handling with
list
andtuple
. All existing tests pass.