Dynamic and Discriminated Unions #1467
Comments
This concept was discussed at some length in this thread. |
Thanks for the link @erictraut! A few thoughts from reading that thread: The
|
Pyright already supports an option called `reportMatchNotExhaustive`:

```python
# pyright: reportMatchNotExhaustive=true
from typing import Literal
from pydantic import BaseModel


class StatusBase(BaseModel):
    status: str


class StatusA(StatusBase):
    status: Literal["a"]


class StatusB(StatusBase):
    status: Literal["b"]


class StatusC(StatusBase):
    status: Literal["c"]


Status = StatusA | StatusB | StatusC


def func(status: Status) -> None:
    match status:  # Pyright error: Cases within match statement do not handle all values; Unhandled type: "StatusC"
        case StatusA():
            ...
        case StatusB():
            ...
```

Pyright also has experimental support for an inline TypedDict syntax. I added this support to inform the discussion about this feature. I recommend not taking a dependency on it because it's likely to change or be removed depending on the final resolution of this topic, but you can play with it to get a sense for what this approach might feel like. Here's what your example would look like with inlined TypedDict support and PEP 695 syntax.

```python
# pyright: enableExperimentalFeatures=true
from typing import Literal

type Shape = (
    dict[{ "kind": Literal["circle"], "radius": float }] |
    dict[{ "kind": Literal["square"], "x": float }] |
    dict[{ "kind": Literal["triangle"], "x": float, "y": float }]
)
```
|
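For comparison, here is a minimal sketch of the same `Shape` union written with ordinary class-based `TypedDict` definitions, which needs no experimental flag. The class names `Circle`, `Square` and `Triangle`, the `area` helper, and the reading of `x` and `y` as a triangle's base and height are assumptions made for illustration; none of them come from the thread.

```python
import math
from typing import Literal, TypedDict, Union


class Circle(TypedDict):
    kind: Literal["circle"]
    radius: float


class Square(TypedDict):
    kind: Literal["square"]
    x: float


class Triangle(TypedDict):
    kind: Literal["triangle"]
    x: float
    y: float


# The "kind" key acts as the discriminator for the union.
Shape = Union[Circle, Square, Triangle]


def area(shape: Shape) -> float:
    # Comparing the discriminator key against a literal narrows the union.
    if shape["kind"] == "circle":
        return math.pi * shape["radius"] ** 2
    if shape["kind"] == "square":
        return shape["x"] ** 2
    # Only Triangle remains; x and y are assumed to be base and height.
    return 0.5 * shape["x"] * shape["y"]
```

The trade-off against the inline syntax is verbosity: each member needs its own named class, but the definitions work with any type checker that understands `TypedDict`.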
Just for completeness, there's really nothing new in Python 3.11 here. We added What we did in 3.11 (put this in |
Sorry for the delay in my reply :) Thanks @JelleZijlstra , I thought And thanks @erictraut for I thought to document Python/TypeScript parity regarding this feature because it felt sensible given the comparison to Scala and Kotlin. However, my use case extends beyond TypeScript-style TypedDict unions.
The same with a metaclass
Hoping to better illustrate my use case by highlighting an alternate solution... I was taught that you should really only be using a metaclass if you're:
Is there a canonical metaclass implementation that registers class declarations of a base class? Any time I did it, it looked something like this:

```python
# Registry of every subclass, keyed by class name.
classes = {}


class MyMetaclass(type):
    def __new__(meta, name, bases, attrs):
        cls = super().__new__(meta, name, bases, attrs)
        # Register every class except the base class itself.
        if name != "MyBaseClass":
            classes[name] = cls
        return cls


class MyBaseClass(metaclass=MyMetaclass):
    pass


class A(MyBaseClass):
    pass


class B(MyBaseClass):
    pass


print(classes)  # {'A': <class '__main__.A'>, 'B': <class '__main__.B'>}
```

As described in the OP motivation, and as implemented in Kotlin (see dropdown), this paradigm allows for declaring subclasses in separate files. And the subclasses need not only be dataclasses, they can fill abstract methods too. It’s an easily accessible way to add new functionality in a plugin-based architecture, without
Otherwise…
Is the only suggested pattern for fulfilling this use case to manually define the Union? If there's no way to statically detect a union of a dynamic collection of objects, neither as a union of subclasses, nor as registered by a metaclass, could there be a way to raise an error at static-analysis time if a subclass is declared, but not written down as part of a union's definition? Ideally I would also like to have the ability to raise an error if a subclass is declared outside of a specified file or package, but given how the |
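The class-registration pattern shown above does not strictly need a metaclass. As a minimal sketch (not something proposed in the thread), the same registry can be built with the standard `__init_subclass__` hook; the name `registry` is an assumption chosen for illustration.

```python
# Hypothetical registry name; maps subclass name to subclass.
registry: dict[str, type] = {}


class MyBaseClass:
    def __init_subclass__(cls, **kwargs):
        # Called once for every subclass, direct or indirect, at class
        # creation time. The base class itself never triggers the hook.
        super().__init_subclass__(**kwargs)
        registry[cls.__name__] = cls


class A(MyBaseClass):
    pass


class B(MyBaseClass):
    pass
```

Like the metaclass version, this registry is only populated at runtime, so a static type checker still cannot derive a union from it.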
That's the pattern I recommend. I use it extensively in TypeScript and Python, and I find that it works great, especially when paired with exhaustion verification techniques. Any mechanism that involves dynamic registration is going to be a problem for static type analysis, especially if those registrations can be performed anywhere by any code. This is why, for example, |
Alright, thanks for the confirmation. Regarding the DiscriminatedUnion proposal, given that PEP 727 got added to Unrelated sidenote regarding inlined typed dicts
|
1️⃣ Dynamic Unions
I'm not sure your proposal makes much sense to me, surely you should just make
2️⃣ Discriminated unions
👍 👍 👍 👍 Yes, we would love this. I haven't (yet) read through the whole of this issue; let me know if you need me to read it thoroughly and/or respond to specific points. I actually asked for it at the typing summit at PyCon; in that case it was to help mypy's performance (see python/mypy#14034). But it would also be very helpful in pydantic itself. A few unordered thoughts:
|
The policy of typing-extensions is to add new objects when there is a PEP for them. We'll add I haven't thought about DiscriminatedUnion much recently, but I saw Samuel's talk at PyCon and as I recall it wasn't clear to me how this feature would affect type checker behavior, since a sufficiently strong type checker could already discriminate unions this way, even without the annotations. If a PEP is going to propose |
I agree with both @samuelcolvin and @JelleZijlstra: I think Pydantic would benefit from a shared So overall I don't see a huge benefit to adding anything to typing{_extensions} or annotated-types. |
From a type-checking perspective, I'd just want for there to be an error raised upon declaring the type marked as discriminated. Pass
Fail
Not sure what the precedent is on static analysis over marker types. Sounds like a great avenue for community plugins. By the way, why did |
Your examples would fail at runtime with Pydantic I believe, if that helps.
Because it had a PEP. |
Just to throw some more cents in here (and while we're comparing to other languages). I come from Swift programming where 'enum' acts as a really powerful discriminated union. It's the number 1 feature I miss from Swift as it allows you to be super expressive and type safe with return types, data etc etc.
In Swift a discriminated union is done with an 'enum with associated value'. I think it's quite an intuitive way to represent the feature as we're already familiar with the idea of an Enum having a limited number of options and 'switching' across them (I see a bunch of thoughts about using
Example:

```swift
enum Barcode {
    case upc(Int, Int, Int, Int)
    case qrCode(String)
}

var productBarcode = Barcode.upc(8, 85909, 51226, 3)

switch productBarcode {
// Here the .upc case is matched and its inner values are bound to the 4 new variables.
case .upc(let numberSystem, let manufacturer, let product, let check):
    print("UPC: \(numberSystem), \(manufacturer), \(product), \(check).")
case .qrCode(let productCode):
    print("QR code: \(productCode).")
}
```

Multiple values in a case are like having a tuple in Python:

```python
Barcode = tuple[int, int, int, int] | str
```

What's interesting is the ability to mix a 'valueless' enum with those with a value:

```swift
enum BlogPostStatus {
    case draft
    case published(Date)
    case removed(String)
}
```

...the ability to have the same type discriminated:

```swift
enum SomeStringRequestResponse {
    case success(String)
    case failure(String)
}
```

...the ability to name the values for better readability:

```swift
enum Pizza {
    case small(inches: Int) // without 'inches' here, you might be left guessing what 'Int' means
}
```

We could improve the Barcode enum in the same way:

```swift
enum Barcode {
    case upc(numberSystem: Int, manufacturer: Int, product: Int, check: Int)
    case qrCode(String)
}
```

And to do more complex matching in a switch statement using the 'where' clause:

```swift
func handle(_ error: Error) {
    switch error {
    // Matching against a group of offline-related errors:
    case URLError.notConnectedToInternet,
         URLError.networkConnectionLost,
         URLError.cannotLoadFromNetwork:
        showOfflineView()
    // Matching against a specific error:
    case let error as HTTPError where error == .unauthorized:
        logOut()
    // Matching against our networking error type:
    case is HTTPError:
        showNetworkErrorView()
    // Fallback for other kinds of errors:
    default:
        showGenericErrorView(for: error)
    }
}
```

Not to mention all the usual niceties of encapsulation that an Enum provides (utility functions etc):

```swift
enum CustomTextFieldTypes {
    case cardType
    case cardNumber
    case cardExpiryDate
    case cardName
    case ccvNumber

    func inputCardNumber(cardNumber: String!, cardNumberTextField: XCUIElement?) {
        if case .cardNumber = self {
            cardNumberTextField?.typeText(cardNumber)
        }
    }
}
```

These are all 1st class language features and have made using other languages feel a little jarring/limiting for me and others I know - either indicating the Swift enum has become a crutch or something very useful... ;)
I'm not knowledgeable or involved enough with the Python language to think about writing a PEP or contributing to implementation or understand the nuances of how proposals interact with other language features, but as a 'user' this thread makes me excited for the direction of the language and I wanted to show the Swift perspective to hopefully act as inspiration/conversation :)
|
How Documentation Metadata in Typing resolves will likely inform the resolution for this issue |
When modeling semi-structured data, discriminated unions are an invaluable tool in your arsenal.
They let you model a variable unit of data, where each unit concisely specifies its type and payload. This is a common pattern in event-driven architectures, where you have a stream of "events", "commands", or "actions", each of which is a discriminated union of event types.
I'd like to propose two new features to the typing module that formalize dynamic and discriminated unions.
1️⃣ Dynamic Unions
Current state
Assume we have a `Status` model that is a discriminated union of `StatusA` and `StatusB`:
To define a union that is understood by the type-checker at static-analysis time, you declare it by mentioning every member of the union:
Whenever adding a new subclass of `StatusBase`, this declaration must be updated.
I often abuse a pattern of lazily declaring a dynamic union instead of maintaining a static one, by tying all the members of the union to the subclasses of a parent class (in this case `StatusBase`):
The type-checker is not able to infer the type of `Status`, because it's only compiled at runtime.
Proposed feature
I propose a new feature that would allow the type-checker to infer a dynamic union of all `StatusBase` subclasses, and finally error if more subclasses are declared afterward.
(Maybe call it `SubclassUnion` instead of `DynamicUnion`?)
The type-checker should detect if a subclass of `StatusBase` is declared after `Status`, and raise an error.
This could work like the `@sealed` decorator was proposed to work in the Sealed Typing PEP draft. More useful information found on this email chain.
The `@sealed` decorator was designed as part of PEP 622 (structural pattern matching). Think of it as confining a type and its subclasses to a single module, so that the type-checker can dynamically infer a union of the subclasses. These are also known as algebraic data types.
I'd like to also explore the possibility of scoping algebraic data types to a package, or a dynamic scope bound between the parent class declaration and the union declaration.
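To make the "Current state" description concrete, here is a minimal sketch of a static union next to a runtime-built one. It uses plain dataclasses as a stand-in for the pydantic models so that it runs without third-party packages. The names `StaticStatus` and `DynamicStatus` and the exact `Union[tuple(...)]` expression are assumptions, since the post's own snippets are not available here.

```python
from dataclasses import dataclass
from typing import Literal, Union, get_args


@dataclass
class StatusBase:
    status: str


@dataclass
class StatusA(StatusBase):
    status: Literal["a"] = "a"


@dataclass
class StatusB(StatusBase):
    status: Literal["b"] = "b"


# Static union: every member is spelled out, so a type checker understands it.
StaticStatus = Union[StatusA, StatusB]

# Dynamic union (assumed spelling): built at runtime from the subclasses
# that exist at this point, so a type checker cannot evaluate it.
DynamicStatus = Union[tuple(StatusBase.__subclasses__())]


# A subclass declared after the union was built is silently left out of it.
@dataclass
class StatusC(StatusBase):
    status: Literal["c"] = "c"
```

At this point `get_args(DynamicStatus)` contains `StatusA` and `StatusB` but not `StatusC`, which is the situation the proposal wants a type checker to flag.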
2️⃣ Discriminated unions
Current state
For lack of a formalization of discriminated unions in core python, data (de)serialization libraries turn to their own objects. For example, pydantic specifies a discriminator value with `Field`, and uses `Annotated` to tag the union:
This does not raise an error at runtime if any member of the `Status` union does not have a `status` field UNTIL `AnnotatedStatus` is used as a type on the field of a pydantic model declaration.
`Annotated` is an "escape hatch" that allows clients to express things that type-checkers might not understand.
However, there is a push by third-party libraries for objects in the `typing` module that would canonically be used with the `Annotated` type. It signals a willingness to move from redundant proprietary internals toward ubiquitous core python. See PEP 727 and related discussion.
Proposed feature
I propose a new feature that would allow the type-checker to infer the type of `AnnotatedStatus`, by declaring it as a dynamic discriminated union:
This would grant data (de)serialization libraries unified plumbing to declare discriminated unions, and would allow the type-checker to throw an error at static-analysis time if any member of the `Status` union does not have a `status` field, or if any member of the `Status` union has a `status` field with a value that is not unique among all members of the union.
The latter hinges on type-checkers being able to implicitly detect disjoint unions. See pyright#5933 for more information on mutable field invariance and implicit disjoint unions.
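The two checks described above (every member has the discriminator field, and every discriminator value is unique) can be approximated at runtime today. The sketch below is only an illustration of those rules, not the proposed typing feature; `check_discriminated_union` is a hypothetical helper, and the plain annotated classes stand in for real models.

```python
from typing import Literal, Union, get_args, get_origin, get_type_hints


def check_discriminated_union(union, discriminator: str) -> dict:
    """Hypothetical helper: map each discriminator value to its union member."""
    seen = {}
    for member in get_args(union):
        hints = get_type_hints(member)
        # Rule 1: every member must declare the discriminator field.
        if discriminator not in hints:
            raise TypeError(f"{member.__name__} has no {discriminator!r} field")
        annotation = hints[discriminator]
        if get_origin(annotation) is not Literal:
            raise TypeError(f"{member.__name__}.{discriminator} is not a Literal")
        # Rule 2: each literal value may belong to only one member.
        for value in get_args(annotation):
            if value in seen:
                raise TypeError(
                    f"{value!r} is used by both {seen[value].__name__} "
                    f"and {member.__name__}"
                )
            seen[value] = member
    return seen


class StatusA:
    status: Literal["a"]


class StatusB:
    status: Literal["b"]


class Duplicate:
    status: Literal["a"]  # clashes with StatusA


class NoTag:
    code: int  # missing the discriminator field
```

Calling `check_discriminated_union(Union[StatusA, StatusB], "status")` returns the value-to-class mapping, while unions containing `Duplicate` or `NoTag` raise `TypeError`. The proposal asks for the same feedback at static-analysis time instead.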
Motivation
When I'm writing JSONSchemas or deserializing in event-driven architectures, and I'm iterating fast, I gravitate toward easily being able to add new members to a union.
I usually do this in one of two ways. Either:
`base.py`, each of the subclasses in a separate file, and the union in the module’s `__init__.py`
While this approach feels more magical than maintaining a static union yourself, it facilitates an accessible plug-in architecture, where users can implement new subclasses of the parent class in a new file, and have it automatically become part of the union without deeply understanding the internals of the library.