The if TYPE_CHECKING problem #1
Comments
In order to make this pattern work well, I believe we will eventually have to move beyond the If we have that, then the runtime can preserve the fully qualified path to the type-only import instead of just its name, and record that (along with perhaps an AST-like representation of additional operations performed on it, such as indexing, bit-or, etc), allowing the lazy (I recognize that this is probably a bridge too far for Python 3.10, but I hope if we find a plan for 3.10, it can be a step on the way to something like this.) |
I'm very interested in this fairly common idiom. I don't have a giant Python code base that uses static type analysis, so I just don't understand the use case. I'd appreciate it if you could tell me:
|
Also, both PEP 649 and the current stringized annotations in 3.10a should speed up importing, by making the annotations faster to calculate. Do either of those speed up importing |
One reason for the idiom is that typing frequently results in import cycles that would not otherwise occur. E.g. if Also import expense is relevant. In a very large system (or even a smaller CLI tool) you may have code paths that are unlikely to be followed in a given process, but may be. There is value in having an expensive import required for that code path not occur immediately at process startup (delaying initial responsiveness of the entire system.) |
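For readers unfamiliar with the idiom, here is a minimal sketch of the pattern under discussion. The use of `decimal` as a stand-in for an expensive or cycle-inducing module, and the function name `total`, are illustrative assumptions and do not come from any code base mentioned in this thread.

```python
import typing
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only static type checkers follow this import; it never runs.
    from decimal import Decimal


def total(prices: "list[Decimal]") -> "Decimal":
    # Hand-stringized annotations keep the def statement from evaluating Decimal.
    return sum(prices[1:], prices[0])


# The raw annotations are plain strings, so defining the function needs no import.
raw_annotations = dict(total.__annotations__)

# Resolving them at runtime fails, because Decimal was never imported here.
try:
    typing.get_type_hints(total)
    resolved = True
except NameError:
    resolved = False
```

The function still works with real `Decimal` values passed in by a caller; only runtime introspection of the annotations is affected.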
It's a little harder to answer the runtime uses for The problem discussed here is already an issue for us in that API schema library use case. Today In that sense this is not a new problem, your PEP just shifts it from being a |
Making |
Circular imports are a gnarly problem. But maybe PEP 649 permits a gnarly solution? Let's say module First, we pick one of
I mean, it's not great, but it seems like it might solve the circular import problem. |
Are these large code bases with And, if so, what did these large codebases do before they could import stringized annotations? Were they just slow, and everyone groused about it? |
I agree with what Carl wrote above; In your gnarly solution, I'm guessing that Another workaround is to just tell people to keep using string annotations in cases where they're using |
If you used my proposed solution for circular imports for runtime use, could you also continue the current |
You mean, hand-stringized annotations? I'm hoping that, if PEP 649 gets accepted, we can deprecate and eventually remove |
We have a large codebase with a lot of type annotations (not quite as large as Carl's, but still). Importing it is slow because it's a massive amount of code, not because of the annotations. I'm sure the annotations make it a bit worse, but it doesn't make enough of a difference to care about. We already set things up to cache the import and minimize the number of times we have to import the codebase.
We could, good point. It's still going to be a bit fragile because our import cycles tend to involve more than just a few files, but maybe we can live with it.
Yes, I mean hand-stringized annotations, like the mypy function that has |
I can only speak for one or two large code bases, but our path was roughly: we started introducing type annotations, we had more and more of them, we noticed they were a significant contributor to startup time and overall CPU, we started using more hand-stringized annotations, then we were happy to see PEP 563 which allowed us to add Edit: I should note that a lot of the runtime CPU cost that became a problem was due to the Python 3.6 GenericMeta implementation, which is much improved now. But we still wouldn't want to go back to pre-563 days, for reasons of import cycles and delaying expensive imports that are likely not to be needed. |
I think from our perspective the "module c" workaround wouldn't meet the usability or maintainability bar and we wouldn't adopt it. But the PEP also wouldn't make things any worse for us today, it just would make it harder for us to implement our own extension to I do think the "module c" workaround points in the direction of what a longer term better resolution would be, as I was getting at in my first comment. Really we just want efficient lazy imports (probably via some form of module-level "cached property"), which I think Neil Schemenauer had a proof of concept of at one point. |
Sorry to state the perhaps obvious here, I have limited screen time available and this thread has already ballooned, so I'm treating it as append-only. :-) IIUC the issue with import recursion (and to some extent large code bases) is that before annotations were in use, one could often get away with not importing a module even though an instance of a class defined in that module was used as an argument, because of duck typing. But with annotations you need to have the class in your namespace so you can name it in the annotation. And that means you have to import the module containing the class. So now you are adding imports, and before you know it you either have a new circular import (in large, mature code bases those occur frequently) or you import everything that could be used, regardless of whether it is actually needed. For example, suppose you have a UserAccount class whose instances have a list of BackupFile objects. But most users never use backup files, so that list is nearly always empty, and the BackupFile code just manipulates the list directly. But when adding type annotations, you have to add |
Misquoting something I heard from a friend: there's no problem you can't solve in computer science with another layer of indirection, except for the "too many layers of indirection" problem.
Would that sufficiently solve the circular import problem? |
Type checkers will generally reject that as an invalid annotation: you can't use a call expression in an annotation. Personally I'm now OK with just recommending that you write |
In brief, no :) I think you might be operating under a misapprehension that these cyclic imports are a rare or unusual case, such that it's acceptable to introduce five or ten lines of extra code for each one where it should have otherwise been a simple annotation. This is not so -- these are very common. So "write a three-line wrapper function for every would-be cyclic import" is as much a non-starter as "introduce a new third module for every cyclic import case." Also, I don't think any of the current type checkers are able to handle such annotations, so this option requires updates to every type checker. Again, I think you're pushing in the right semantic direction (lazy imports), but it needs to be much more transparent and less syntactically burdensome to be a practical option. |
Oh, right, yeah. Sorry. I forget static type checkers are so finicky. |
I agree with @JelleZijlstra that it's ok to move forward without solving this, not because the cyclic import case isn't common (for us at least it's quite common), but because the "cyclic import plus need to resolve annotations at runtime" case isn't common. And most important, that case already doesn't work today, since |
What about a mock object replacing
|
Here's a slightly more elaborate version that supports
|
I think that requiring an The status quo today with PEP 563 is that you can get If we could arrange such that if you access Failing that, I would go back to hand-stringifying annotations in this situation before choosing any other manual workaround that's been proposed in this thread so far. |
For those annotations that can't be evaluated, we'd have to stringify it,
right? But that would mean that we'll still have to have all the
stringifying code...
|
Yeah, the rub is what |
Oh, I see. Your PR catches NameErrors and replaces them with something that reports back the name. Yes, that should work. |
So inject the mock objects into
If you moved this injection code into a central module, let's call it
Now you only have to create the mock modules--not every imported object--and only in one place. And if you forget to add one, your code still works, it just imports the original expensive module.
While "delayed annotations using descriptors that magically abolish NameErrors" is definitely preferable in my mind to "annotations are automatically turned into strings", I still hold out hope we can find another less-magical approach. |
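The code from the comment above did not survive in this copy of the thread. As a rough sketch of the general idea only (registering a lightweight stub in `sys.modules` before anything imports the real module), something like the following could work. The stub contents and the `Widget` class are hypothetical; `expensive_module` is just the placeholder name used later in this discussion.

```python
import sys
import types

# Hypothetical name for the module we want to avoid importing eagerly.
STUB_NAME = "expensive_module"

stub = types.ModuleType(STUB_NAME)
stub.__doc__ = "Lightweight stand-in; only names used in annotations exist."


class Widget:
    """Placeholder so annotations such as expensive_module.Widget resolve."""


stub.Widget = Widget

# Register the stub only if nothing has claimed the name yet.
sys.modules.setdefault(STUB_NAME, stub)

import expensive_module  # finds the stub in sys.modules; no file is loaded


def render(widget: expensive_module.Widget) -> None:
    pass
```

Note that every later `import expensive_module` in the same process also receives the stub, which is the behavior the next comment objects to.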
I don't see "stuffing sys.modules with stub replacements as an import side effect" as either workable for real use, or as less magical than the NameError replacement approach. The cycle problem does not impact one specific "expensive module" that one can choose in advance to stub out. It could impact any module in the code base, and a different set of modules over time. Also, who is then responsible for replacing this stub with the real module? Import won't do it, it'll find I'm afraid I can't see any possibility of this stub approach being an acceptable solution for prod use. |
It is obviously less magical than the proposed "swallow NameError and replace with a new Reference object" approach, because it doesn't require changing the language. You could do it today in Python 3.9. And, if it works, surely it is by definition workable?
Right, but the set of modules is known at compile-time. Obviously it's known, because it's hard-coded in the source tree. It's the set of modules that are supposed to go in the So you stub out that set of modules in my proposed It's actually an improvement over the current approach in that respect, where that set of imports is moved into / out of an You could even have some error-checking in there. If somebody imported one of the modules that's supposed to get stubbed-out before
You can't replace the stub with the real module. But I thought the whole goal was to avoid importing the module at runtime. The problem I was trying to solve was "we need to not import this module in production, because it's expensive, but we need our code to still run, and we want the annotations to minimally work because sometimes people look at them".
If the problem is "we have circular imports," you're right, this won't solve your problem. Circular imports are a pretty gnarly problem in Python and so far nobody has come up with any good solutions. And type hints seem to make them worse. But the initial post in this issue is about "we don't want to Perhaps it would be best if we moved the discussion about circular imports into a second issue on this repo. Attempting to solve two different problems ("import expensive_module" vs "circular imports") in one conversation is not lending clarity to either discussion. |
Y'know, I just tried it, and a simple example of circular dependencies and circular imports worked first try. My code:
I ran
I'm prepared to believe that there are gnarlier circular dependencies / circular import problems where PEP 649 falls down. But at a minimum, it sure seems to work correctly with this simple example. Can someone give me an example where they can get a |
Hi Larry!
I think we've successfully uncovered at least one miscommunication! As far as I'm aware, this problem statement does not describe a problem that anyone has. If you never want a module imported at runtime, why would you have the module around and accessible to import at all? And why would you have functions annotated to take/return instances of classes from that module, if at runtime such instances could not exist? The problem described above that most closely resembles this is when I said "In a very large system (or even a smaller CLI tool) you may have code paths that are unlikely to be followed in a given process, but may be. There is value in having an expensive import required for that code path not occur immediately at process startup (delaying initial responsiveness of the entire system.)" But you'll note that this describes a use case where the "expensive module" is still needed at runtime, just not needed in all (or the most common) execution paths. And this miscommunication leads to another one:
The "set of modules that are supposed to go in the
I don't think so. Once we resolve the misunderstandings above, there is only one real problem here, import cycles and import expense are just two symptoms of it. It would be best to entirely set aside the "an expensive module" framing of the problem; as @JelleZijlstra mentioned early in this thread, that was an overly-simplified and non-representative initial framing. A better core statement of the problem is something like this: "Type annotations tend to greatly increase the import dependency fanout of a typical module. Increasing the import dependency fanout when there is no runtime need for the increased fanout is a bad thing, because it leads to many more import cycles and it unnecessarily front-loads import expense (even if there is no singular "expensive module" but rather just many many modules in the transitive dependency chain which in aggregate are expensive to import)."
In general this is true. But for the problem that "type annotations make import cycles a lot more common," there is already a really excellent solution (from the end-user perspective), which is already in wide use: PEP 563 and
I hope the above has clarified that the reason I was negative about it is that it doesn't work to solve any problem that I have :)
Yes, this is generally how cyclic imports work in Python. If you import the entire module (rather than You might reasonably object that in that case we should just use the form of imports that allows cycles to work! And I would agree in general, but in more complex cases with deeply nested submodules this isn't ergonomic, and as discussed above it's still not desirable to add more spurious runtime dependencies to your modules just for type-checking purposes.
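To make the difference between the two import forms concrete, here is a small self-contained sketch. It writes two pairs of mutually importing modules to a temporary directory and tries to import each pair. All module names are hypothetical, and the annotations are hand-stringized so the result does not depend on how a given Python version evaluates annotations.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module names; each pair forms a two-module cycle.
SOURCES = {
    "whole_a": 'import whole_b\n\nclass A:\n    def method(self, other: "whole_b.B"): pass\n',
    "whole_b": 'import whole_a\n\nclass B:\n    def method(self, other: "whole_a.A"): pass\n',
    "from_a": 'from from_b import B\n\nclass A:\n    def method(self, other: "B"): pass\n',
    "from_b": 'from from_a import A\n\nclass B:\n    def method(self, other: "A"): pass\n',
}


def try_import(name):
    """Return "ok" if the import succeeds, else the exception type name."""
    try:
        importlib.import_module(name)
        return "ok"
    except ImportError as exc:
        return type(exc).__name__


with tempfile.TemporaryDirectory() as tmp:
    for name, source in SOURCES.items():
        Path(tmp, name + ".py").write_text(source)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        results = {name: try_import(name) for name in ("whole_a", "from_a")}
    finally:
        sys.path.remove(tmp)
```

The `import whole_b` form only needs the partially initialized module object to exist, so the cycle resolves. The `from from_a import A` form needs the attribute `A` to exist already, which it does not while `from_a` is still executing.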
Bit of a digression maybe; "magical" is not a well-specified term. I only pursue the digression because it might help clarify why we find different approaches more or less objectionable. IMO "requires changing the language" is a pretty arbitrary definition of what constitutes "magical" (in a negative use of the term), and I don't think it's a particularly apt definition. I would say something along the lines of "leads to unexpected behavior" or "likely to leak as an abstraction and cause unintended problems" are better definitions, and these are mostly orthogonal to "requires changes to the language." Python as it exists today has plenty of scope for "magic" already (for better or worse), and some changes to the language may allow for less "magical" solutions to some problems. |
Bike shedding some syntax and semantics here, perhaps we could introduce a Within an annotation, these values (and member lookups) would be ignored (eg: "stringized" like PEP 563 or deferred as here/PEP 649). Accessing those values at runtime would trigger the import to be "resolved" - it doesn't magically solve circular imports. Explicitly: runtime annotation inspection ( I'm not very familiar with syntax parsing, but given Outcome:
-- It might look something like this:

from __future__ import co_annotations
deferred import b import B  # -> `B = Ref("b.B")`

class A:
    def method(self, b: B): pass

from __future__ import co_annotations
deferred import a  # -> `a = Ref("a")`

class B:
    def method(self, a: a.A): pass

edit: Ah, for some reason I thought descriptors did work on modules, but that appears not to be the case. 😁

A `lazy_object_proxy` example

from importlib import import_module
from lazy_object_proxy import Proxy

def ref(name, ismember=False):
    def resolve():
        if ismember:
            module_name, _, member_name = name.rpartition(".")
            return getattr(import_module(module_name), member_name)
        return import_module(name)
    return Proxy(resolve)

path = ref("os.path")
Path = ref("pathlib.Path", ismember=True)

print("marker 1")
print(path)
print(path.join("/", "/home"))
print("marker 2")
print(Path)
print(Path("/") / "home")
# marker 1
# <module 'posixpath' from '/Users/jacobhayes/.pyenv/versions/3.9.0/lib/python3.9/posixpath.py'>
# /home
# marker 2
# <class 'pathlib.Path'>
# /home

With |
@JacobHayes Yup, I think opt-in deferred imports are a good direction for solving this problem (as well as providing a nicer general solution for cyclic imports than Python has ever previously had.) I think for performance reasons the implementation should probably be native in the runtime rather than implemented in pure Python, and ideally totally transparent to language semantics (other than changing when import side effects occur). A co-worker has already been playing around with an implementation of this, there are some tricky issues but it looks promising. |
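For comparison with the third-party proxy above, the standard library already has a pure-Python lazy import mechanism in `importlib.util.LazyLoader`. The sketch below follows the recipe in the `importlib` documentation. The helper name `lazy_import` and the choice of `colorsys` as the demo module are assumptions for illustration, and this is not the native implementation mentioned in the comment above.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return module `name`, deferring its execution until first attribute access."""
    if name in sys.modules:
        return sys.modules[name]  # already imported; nothing to defer
    spec = importlib.util.find_spec(name)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot find module {name!r}")
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # prepares the lazy module; the body has not run yet
    return module


colorsys = lazy_import("colorsys")
# The first attribute access below is what triggers the real import.
hue, saturation, value = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)
```

This defers import cost but is opt-in per call site and does nothing for `from module import name`, so it only partly addresses the ergonomics concerns raised in this thread.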
I think the idea described at #2 (comment) also provides a workable approach to this problem. |
With the release of Python 3.11, browsing the What's New had me re-reading Łukasz's article at https://lukasz.langa.pl/61df599c-d9d8-4938-868b-36b67fdb4448/. That article got me thinking again along the lines of @JelleZijlstra's idea of making name lookup in annotations inherently lazy so As @carljm noted in the comments on #2, that doesn't technically require compiler changes, it just requires the use of a non-standard globals dictionary when doing the evaluation. Writing such a replacement globals isn't entirely trivial (if you want to avoid implicitly wrapping all builtin references in
And then as Jelle noted in the original post, |
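The example code from the comment above was lost in this copy. A minimal sketch of the general technique, under stated assumptions, might look like this: a `dict` subclass whose `__missing__` hook leaves builtins alone and returns a `typing.ForwardRef` for any other unknown name. The names `LazyNames` and `evaluate_annotation` are hypothetical, and the namespace is passed as the `locals` mapping of `eval()` rather than as globals, which is a simplification chosen here because `eval()` accepts any mapping for locals.

```python
import builtins
from typing import ForwardRef


class LazyNames(dict):
    """Namespace that yields a ForwardRef for any name it cannot find."""

    def __missing__(self, name):
        if hasattr(builtins, name):
            # Let eval() fall through to the real builtin (list, int, ...).
            raise KeyError(name)
        return ForwardRef(name)


def evaluate_annotation(source, namespace):
    # Name lookups consult LazyNames first, then globals, then builtins.
    return eval(source, {}, LazyNames(namespace))


hint = evaluate_annotation("list[SomeType]", {})
known = evaluate_annotation("dict[str, int]", {})
```

Here `hint` is `list` parameterized with a forward reference to the undefined name, while `known` evaluates normally. Expressions that apply further operations directly to the placeholder, such as subscripting it, would need a richer object than `ForwardRef`.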
One use case I think has not been discussed much here is the use of type hints purely for benefit of the IDE / linter, like for code completion. The AWS client library
The benefit of botostubs is that the IDE will complete the methods on |
The accepted version of PEP 649 covers this. |
This was brought up by Joseph Perez on the mailing list. The problem is that this is a fairly common idiom:
My only idea for a solution is to make an undefined name produce some special object, like typing.ForwardRef, in an annotation. But that may introduce more complexity, because annotations aren't just names.
I can think of three more operations we'd have to support with current standard library typing:
SomeType may be generic so we'll have to support SomeType[int]
SomeType | int
list[SomeType], and get caught up by overzealous runtime typechecking. For example, typing.Union would currently reject it.
(I opened this issue because I feel like it's an easier way to have a focused discussion on a single problem. If you disagree, feel free to let me know.)
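As a rough illustration of what such a special object would need to handle, here is a sketch of a placeholder that records subscription and `|` instead of failing. The class name `Ref` and the helper `_name` are hypothetical; this is not `typing.ForwardRef` and not the mechanism any PEP adopted.

```python
class Ref:
    """Hypothetical placeholder for a name that is undefined at runtime."""

    def __init__(self, expr):
        self.expr = expr  # source text of the annotation built so far

    def __repr__(self):
        return f"Ref({self.expr!r})"

    def __getitem__(self, params):
        # Handles SomeType[int] as well as SomeType[int, str].
        if not isinstance(params, tuple):
            params = (params,)
        inner = ", ".join(_name(p) for p in params)
        return Ref(f"{self.expr}[{inner}]")

    def __or__(self, other):
        # Handles SomeType | int.
        return Ref(f"{self.expr} | {_name(other)}")


def _name(obj):
    """Best-effort source text for an operand."""
    if isinstance(obj, Ref):
        return obj.expr
    return getattr(obj, "__name__", repr(obj))


SomeType = Ref("SomeType")
generic = SomeType[int]       # records "SomeType[int]"
union = SomeType | int        # records "SomeType | int"
container = list[SomeType]    # builtin generics accept arbitrary arguments
```

The first two cases are under the placeholder's control. The third depends on the outer type accepting a non-type argument, which is where stricter runtime checks in other typing constructs could get in the way.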