Replies: 8 comments 65 replies
-
Thanks for reaching out. This discussion is timely because a draft PEP 681 is about to be posted for this functionality. We have a small window of time to complete this PEP because Python 3.11 is locked (in terms of new functionality) in May. @debonte is the primary author, and he's shepherding it through the feedback and review process. It would probably be best to move this discussion to the python/typing discussion forum so the broader Python typing community benefits from it and has the opportunity to weigh in.
I presume that the behavior you describe above is specific to sqlalchemy 2.x. I don't see these classes/types in the sqlalchemy 1.x code base. Am I correct in understanding that sqlalchemy 2.x is still under development and that there is willingness to make some breaking changes from sqlalchemy 1.x? If so, I encourage you to consider switching to something that more closely matches the behavior of the stdlib `dataclass`.
As for your proposed solution, I see a few issues:
I wonder if it would make sense to apply the transform in the opposite direction. In other words, have the user specify the field type without the descriptor class, but then apply the descriptor class as part of the transform. This has the benefit of hiding some of the implementation details that users probably don't care about. The transform could be specified as part of the `dataclass_transform` decorator.
You asked whether `@dataclass` treats `Optional` fields as optional parameters. Here is its behavior:
```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Model:
    a: int                   # No default value; an `int` argument must be passed to __init__
    b: Optional[int]         # No default value; an `int` or `None` argument must be passed to __init__
    c: int = 3               # Default value provided, so it can be omitted when calling __init__
    d: Optional[int] = None  # Default value provided, so it can be omitted when calling __init__

reveal_type(Model.__init__)
# Type of "Model.__init__" is "(self, a: int, b: int | None, c: int = 3, d: int | None = None) -> None"
```
-
Thanks for your feedback, @zzzeek. And sorry for the delayed response.
Yes, this is possible:
```python
from typing import Any, Literal, overload

@overload
def mapped_column(*, primary_key: Literal[True], init: Literal[False] = False) -> Any: ...
@overload
def mapped_column(*, primary_key: bool = False, init: bool = True) -> Any: ...
```
I was unable to find a way to configure overloads such that the type of the field would influence which overload we choose. @erictraut, is there some way to make this work with bidirectional type inference and generics, maybe? For the first issue I tried having the above overload return … Btw, is bidirectional type inference standardized, or is it a Pyright-specific thing?
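For readers following along, "bidirectional type inference" refers to the expected type of an expression (for example, a declared variable or field type) influencing how the expression itself is inferred. A small illustration that assumes nothing about SQLAlchemy:
```python
from typing import Optional

# With no expected type, [] would be inferred as an empty list of unknown
# element type. The declared annotation supplies an expected type, which
# flows "backwards" into the inference of the right-hand side.
values: list[Optional[int]] = []

# The analogous question above is whether a field's declared type
# (e.g. Mapped[int] vs. plain int) could flow into the call on the
# right-hand side and select a different mapped_column() overload.
```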
We could accept a field type transform as a parameter to the `dataclass_transform` decorator. Specifying the transform as part of the field descriptor would give more flexibility, though. Maybe as a new "transform" field descriptor parameter? Something like this:
```python
# region Library code

class Mapped(Generic[_T], util.TypingOnly): ...

# Library author provides a function (with no implementation?) to perform
# the desired field type transformation
def transform(input: Type[_T]) -> Mapped[_T]: ...

# The function is specified as the default value of a new "transform" field
# descriptor parameter
def mapped_column(*, init: bool = True, transform: Callable[[Type], Type] = transform) -> Mapped: ...

# endregion

# region User's code

@sqlalchemy_dataclass
class MyClass:
    # The type checker sees the default value of the transform parameter on
    # mapped_column and treats foo as if its type were Mapped[int]. At runtime
    # it is just int.
    foo: int = mapped_column()

# The synthesized __init__ method parameter's type would be int both during
# type checking and at runtime:
#
# def __init__(self, foo: int):
#     ...

# endregion
```
Questions:
-
Wouldn't … ? Btw, assuming this worked somehow, since …
Good point.
I think there was a logical leap you didn't explicitly write out here. Are you saying that requiring the field descriptor to be a class (rather than a function) is OK in this case because the …
-
Thanks for the fast turnaround. I'll try to have a look tomorrow.
- mike
On Mon, Feb 14, 2022, at 11:15 PM, Eric Traut wrote:
Here's the commit <471fba1> that adds provisional support for this new feature in pyright. I typically publish a new version of pyright on Tuesday evening in preparation for the weekly Wednesday release of Pylance. So watch for pyright 1.1.222 in the next 24 hours or so.
-
Hi Erik -
Great, I hope to be able to try this, hopefully today, although it's very difficult to find time. I know this is very time sensitive, so I'll try to hit it today.
- mike
On Fri, Feb 18, 2022, at 2:13 AM, Erik De Bonte wrote:
@zzzeek <https://github.com/zzzeek>, the new `transform_descriptor_types` <debonte/peps#3> `dataclass_transform` parameter is now available in Pyright 1.1.222 <https://github.com/microsoft/pyright/releases/tag/1.1.222> and Pylance 2022.2.3 <https://github.com/microsoft/pylance-release/releases/tag/2022.2.3>.
Give it a try and let us know how it goes!
-
Hi Erik -
The first one surprises me in that it wasn't doing that already; the second change I don't really understand.
For the first change, we have users put the descriptor type explicitly on the fields which are "mapped". Suppose HasDataClass includes a metaclass passed into dataclass_transform():
```python
class MyClass(HasDataClass):
    id: Mapped[int]
    name: Mapped[str]
```
The Mapped class is a descriptor, with `__get__` and `__set__` methods. It was my understanding that the type checker already considers MyClass.id and MyClass.name to be descriptors, so that at the instance level, MyClass().id and MyClass().name would operate the same as they would if the @dataclass_transform weren't present. In our case, Mapped[_T] will always return a _T at the instance level in any case, so perhaps we just didn't notice this. But I would say yes, if the class has a descriptor type like I have above at the class level, `__get__` and `__set__` should be honored at the instance attribute access level in all cases.
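For reference, here is a minimal sketch of the descriptor shape being described; it is an illustrative stand-in, not SQLAlchemy's actual Mapped implementation:
```python
from typing import Any, Generic, Optional, TypeVar, overload

_T = TypeVar("_T")

class Mapped(Generic[_T]):
    # Illustrative descriptor: class-level access returns the descriptor
    # itself, instance-level access returns the stored value.

    def __set_name__(self, owner: type, name: str) -> None:
        self._name = name

    @overload
    def __get__(self, instance: None, owner: Any) -> "Mapped[_T]": ...
    @overload
    def __get__(self, instance: object, owner: Any) -> _T: ...
    def __get__(self, instance: Optional[object], owner: Any) -> Any:
        if instance is None:
            return self                        # MyClass.id -> Mapped[int]
        return instance.__dict__[self._name]   # MyClass().id -> int

    def __set__(self, instance: object, value: _T) -> None:
        instance.__dict__[self._name] = value
```
With this shape, a type checker that honors the descriptor protocol gives MyClass().id the type int while MyClass.id remains a Mapped[int], which is the instance-level versus class-level behavior described above.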
For the second case, that one doesn't make much sense to me, both in terms of what it would do as well as why it would want to do... something.
If I make a class like this:
```python
class MyClass(HasDataClass):
    x: int
    y: int
    id: Mapped[int]
    name: Mapped[str]
```
I would expect that MyClass().id and MyClass().name have descriptor behavior. I would expect that MyClass().x and MyClass().y do not; they should be treated as they normally would be on any dataclass, meaning they can be passed to the constructor and will be present as instance variables. I don't see what introducing ClassVar into the type would accomplish. If the user wants x and y to be class variables, they should specify ClassVar, as would always be the case, as below, where "y" is a ClassVar:
```python
class MyClass(HasDataClass):
    x: int
    y: ClassVar[int]
    id: Mapped[int]
    name: Mapped[str]
```
If someone specifies the above, that's what they should get. As has been mentioned in the past, we are trying to replicate the "dataclasses" pattern here as closely as possible; I was going to sell this new feature as "as close as you can get to dataclasses", including support for generating __repr__(), __eq__() and all the rest just like dataclasses do (I'm using the dataclasses API directly to generate the methods). This way I can point users to how dataclasses work when there's confusion, and I don't have to teach a new API that is slightly different in its expected behavior for the parts that look exactly the same as they do on a standard dataclass.
You can see some tests for my current API at https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/3597/10/test/orm/declarative/test_dc_transforms.py. The test called test_integrated_dc exercises the above ideas with a class like this:
```python
class A(dc_decl_base):
    __tablename__ = "a"
    ctrl_one: str
    id: Mapped[int] = mapped_column(primary_key=True, init=False)
    data: Mapped[str]
    some_field: int = dataclasses.field(default=5)
    some_none_field: Optional[str] = None
```
Above, if someone writes a class like that, I think it's clear they are expecting ctrl_one, some_field and some_none_field to do the same thing they would do on a normal dataclass, that is, to be part of the __init__ method as well as the other methods like __repr__, __eq__, etc. One thing I've observed about users of dataclasses: they are very enthusiastic about being explicit where there is a choice. I'm not sure what the bigger rationale for assuming ClassVar would be, but let me know.
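For illustration, the __init__ that expectation implies would look roughly like the following sketch; this is hand-written to mirror the description above, not generated output:
```python
from typing import Optional

# Hypothetical signature a type checker would synthesize for class A above,
# assuming Mapped[X] fields contribute X-typed parameters and that the
# init=False field (id) is omitted:
def __init__(
    self,
    ctrl_one: str,
    data: str,
    some_field: int = 5,
    some_none_field: Optional[str] = None,
) -> None: ...
```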
- mike
On Fri, Mar 4, 2022, at 6:05 PM, Erik De Bonte wrote:
@zzzeek <https://github.com/zzzeek>, we're considering two changes based on a recent email from Carl Meyer on the Typing SIG mailing list (archive message FC3RIWDWQZTMXPDJT7UM6TKT5BBSXF3V):
1. When `transform_descriptor_types` is `True`, in addition to changing the `__init__` parameter types for descriptor fields, type checkers would understand that the descriptors' `__get__` and `__set__` methods will be used when those fields are accessed. So they would expect that getting the value of such a field would return the `__get__` method's return type. And similarly, when setting the field, a value compatible with `__set__` should be provided.
2. When `transform_descriptor_types` is `True`, *all* fields would be treated as class variables, not just descriptor fields. If we made this change, we would rename `transform_descriptor_types` to something else.
What do you think about these two proposals? Do they feel right to you? Would they improve things for SQLAlchemy at all?
-
OK (sorry for email, I'm only on my phone today), so the parameter would mean "all fields would be treated as class variables" (not ClassVar). OK.
In that case I don't think I understand what this particular change means. Say I have a normal dataclass with three fields and no descriptor types on it, but it can also make use of this flag in some way. What is the change incurred by having the flag set or not set, if no descriptors are present?
On Sat, Mar 5, 2022, at 12:00 PM, Eric Traut wrote:
@edebonte, I don't think it's necessary to document the first point. That's how descriptors always work, and dataclass_transform doesn't propose to make any change to the way descriptors work. So I think it would be confusing to say anything about `__get__` and `__set__` in the dataclass_transform PEP (other than the fact that the `__set__` value is used when synthesizing the `__init__` parameter for that field).
@zzzeek <https://github.com/zzzeek>, on the second point, there's an important distinction between `ClassVar` and a "class variable". We have three classes of variables to consider:
1. `ClassVar`: A class variable that is explicitly declared as such and cannot be overridden by a local variable without triggering a type checker error.
2. A normal "class variable": Can be (and often is) overwritten by an instance variable of the same name, unless it's a descriptor in which case the descriptor's `__set__` logic handles any assignments to the variable through a member access expression.
3. An instance variable only: Stored in the dictionary private to the instance. Cannot be accessed through the class. Descriptors do not work as expected if they are stored as instance variables.
```python
from typing import ClassVar

class MyDescriptor:
    # Minimal stand-in descriptor so the example runs; not part of the original snippet.
    def __get__(self, instance, owner):
        return 42
    def __set__(self, instance, value):
        pass

class MyClass:
    v1: ClassVar[int] = 0  # "ClassVar"
    v2: int = 1            # "class variable"
    v4 = MyDescriptor()

    def __init__(self):
        self.v3: int = 2   # "instance variable"

m = MyClass()

print(MyClass.v1)  # 0
print(MyClass.v2)  # 1
# print(MyClass.v3)  # Runtime error

print(m.v1)  # 0
print(m.v2)  # 1
print(m.v3)  # 2

# m.v1 = 1  # Type error - cannot overwrite ClassVar with instance variable
m.v2 = 10
m.v3 = 11

print(m.v1)  # 0
print(m.v2)  # 10
print(m.v3)  # 11
print(MyClass.v1)  # 0
print(MyClass.v2)  # 1
```
So I think Erik's proposal matches all of the desired behaviors you've outlined above. The tricky part is how to explain it all in a way that's clear in the PEP.
-
Post-mortem: mypy 1.1.1 is released and they did descriptors exactly wrong, the way I was afraid they would, which is why I really wanted this language to be in the specification. See python/mypy#14868.
-
So I just realized that dataclass transforms are available now using a magic name, `__dataclass_transform__`, so we could in theory use this now if it were compatible with our use case. However, the way things work in SQLAlchemy, it's not compatible right now. A SQLAlchemy mapped class defines behaviors not just at the instance level, but also at the class level, using Python descriptors. That is, a mapped class is not fully typed if it looks like this:
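A minimal sketch of that shape; the specific columns and Base setup here are illustrative assumptions, not taken from the original post:
```python
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class MyClass(Base):
    __tablename__ = "my_table"

    # Plain annotations: the type checker sees ordinary int/str attributes
    # and learns nothing about the class-level descriptor behavior.
    id: int = Column(Integer, primary_key=True)
    name: str = Column(String)
```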
That of course is a form we've made "work" using the mypy plugin, but it's not actually correct. The correct form is:
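Again as an illustrative sketch, with the same assumed columns but Mapped annotations (the Mapped import location is assumed; it is defined in orm/base.py per the link below):
```python
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import Mapped, declarative_base  # Mapped import location assumed

Base = declarative_base()

class MyClass(Base):
    __tablename__ = "my_table"

    # Mapped[...] declares the class-level descriptor behavior while still
    # conveying the instance-level value type.
    id: Mapped[int] = Column(Integer, primary_key=True)
    name: Mapped[str] = Column(String)
```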
Where `Mapped` is a descriptor that supplies all the correct behaviors for the mapped class, including SQL expressions at the class level like `MyClass.id > 5`. The `Mapped` type is a regular descriptor without too much going on, except that it has class-level behaviors defined: https://github.com/sqlalchemy/sqlalchemy/blob/f24a34140f6007cada900a8ae5ed03fe40ce2631/lib/sqlalchemy/orm/base.py#L608
So out of the gate, for dataclass_transforms to be useful to us, we'd need some way to have it extract the `_T` of the attributes, that is, to produce the effective `__init__` method below:
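Something along these lines; this is a sketch of the signature being described (keyword-only parameters and Optional defaults reflect the description that follows), not SQLAlchemy's actual generated code:
```python
from typing import Optional

# Every mapped attribute becomes an optional keyword argument defaulting to
# None, with the parameter type extracted from Mapped[_T]:
def __init__(self, *, id: Optional[int] = None, name: Optional[str] = None) -> None: ...
```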
Note also that SQLAlchemy's `__init__` method considers all attributes to be optional up front as well. Unlike dataclasses, SQLAlchemy's objects are highly dynamic: we assume some attributes come from database-generated defaults later on, other attributes are assigned at runtime but not within the constructor, etc.
At the moment it does not appear that we'd need to make use of the specially named parameters on the `field()` construct, which is good because that form is also not quite how our APIs look anyway; the ORM mapping process doesn't create any `__init__`-level default values, everything defaults to `None`, and everything is an optional keyword argument. (Ah, here I'd also ask: does dataclass transform consider `Optional[typ]` to mean that it's an optional keyword argument, as opposed to "the parameter is required but can be `None`"? That's another part.) The high degree of specificity on those parameter names for `field()`, including that these can't be attributes on the object or anything like that, makes me worry a bit, so hopefully if this proposal can be accommodated I won't need to get into that.
So here I propose at least one way our altered "synthesis" of `__init__` parameters can be made to work for us in a way that shouldn't get in anyone's way, since the whole purpose of the function is to model "runtime synthesis": the annotated type given in the body of the class also has a method that can be decorated / magically named, a name like `__dataclass_field_transform__`; for us it would look like the sketch at the end of this comment.
So when pyright looks at the fields for `__dataclass_transform__` in order to determine how it would "synthesize" an `__init__` method, it would also look at `__dataclass_field_transform__` on the declared type of each field to see what type of object is actually expected within the synthesized `__init__` method. There are probably other ways to do this too, but that's the basic thing we'd need. Let me know what you think.