Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Search procedure for inline annotations #1190

Closed
bdarnell opened this issue Jan 29, 2016 · 34 comments
Closed

Search procedure for inline annotations #1190

bdarnell opened this issue Jan 29, 2016 · 34 comments

Comments

@bdarnell
Copy link
Contributor

(moved from python/typeshed#52)

Currently mypy (AFAICT) looks for type information in the current directory/location of file being checked, directories on MYPYPATH, and typeshed, in that order, and takes the first file it finds. This will become problematic as more libraries adopt inline type annotations: the user will need to maintain a MYPYPATH that includes all such libraries. The easiest way to do that is to pip install -r requirements.txt and use that environment's site-packages as mypy's path, but this will include packages that do not have type annotations and these source files may mask stubs for those libraries in typeshed.

I propose that rather than stopping the search at the first file that exists, mypy examine source files to see whether they have usable type annotations, and if not, continue the search. If the search concludes without any usable type information but it did find some source files, then the first one can be used (to at least get the list of top-level function names, etc)

Completely:

  1. Construct a search path, mimicking python's sys.path initialization: an initial element based on the command line (the current directory in -m mode, the directory containing the given file if a file is given), $MYPYPATH, $PYTHONPATH, typeshed. I think it would be convenient if there were a shortcut to add the site-packages of a given virtualenv (perhaps defaulting to the environment in which mypy is running); this would come in between $PYTHONPATH and typeshed.
  2. For each directory in this search path:
    2a. Look for a .pyi stub file. If one is found, we're done.
    2b. If no .pyi file, look for a .py source file. If it can be parsed and contains type information, we're done. If it can be parsed but does not contain types, and it's the first one we've seen, remember it.
  3. If no type information was found but we remembered a file in 2b, use whatever we can squeeze out of it.
@JukkaL
Copy link
Collaborator

JukkaL commented Jan 30, 2016

I agree that it would be nice if mypy could do the right thing automatically when a library module has inline annotations. However, this proposal has a few problems:

  • If a library is partially annotated, mypy could pick some files from the implementation and some from stubs. As these could refer to different versions of a library, this could be confusing or generate bogus errors.
  • Some modules may only contain variable definitions without any functions, and for these it's not easy to see whether a module is "annotated", as all annotations are optional.
  • Parsing a file to decide whether is has annotations would be slow, especially with the current parser implementation in mypy. We could cache the results but this is still not optimal.
  • The target Python version/configuration may be different from the one running mypy. Thus determining sys.path is difficult in general. One potential way to do this is to give mypy a path to a Python interpreter and have mypy run /target/path/python -c 'import sys; print(sys.path)' or similar.
  • Even if a file has type annotations, there is no guarantee that mypy can process the file without errors. Maybe the annotations are only for PyCharm or human readers, and mypy would generate type errors.

Here are some further ideas:

  • Maybe a library with inline annotations should ship with .pyi files automatically generated from the inline annotations. If we have a convention for where to look for them, they could always be considered as more authoritative than stubs from typeshed.
  • A library with inline annotations could ship with some marker file or other well-known flag that signals that mypy should look at the implementation. Also, this signals that the library author has verified that mypy can cleanly process the module implementation. This has the issue of potential mypy version conflicts. If this doesn't work for some reason, users should be able to not use automatic sys.path traversal and manually decide where to look for annotations.
  • We could ask authors of libraries with inline annotations to generate stubs using the aforementioned (hypothetical) tool and contribute them to typeshed. This would have known challenges such as dealing with multiple library versions, and deciding what to do if typeshed is out of date for some reason (e.g. everyone who can process PRs is busy).

@bdarnell What do you think?

@bdarnell
Copy link
Contributor Author

I'd be happy to include some sort of package-level marker to indicate that the package contains PEP 484 type annotations and these should be used instead of typeshed (and that these decisions should be made at package scope instead of file-by-file). I'd rather not make this mypy-specific since mypy is not supposed to be the only consumer of PEP 484 annotations. If mypy can't handle the annotations in a package then it should either fail or fall back to typeshed for that package.

It would be unfortunate if the only use of inline annotations was in a tool that extracted them to .pyi stub files. If that were the case I would probably write stub files by hand instead of wrestling with inline annotations (which require runtime dependencies on typing and other intrusions on the production code).

Requiring all stubs to be submitted to typeshed seems like the worst possible outcome - I was under the impression that typeshed was essentially a transitional thing, and the long-term goal was for type information to migrate from typeshed into the respective packages. I think centralizing everything in typeshed would discourage participation by package authors.

@gvanrossum
Copy link
Member

I haven't forgotten this (I've just been distracted). We just encountered an issue here where there's a discrepancy between the way PyCharm and mypy search for stubs. In this case it was about an extension module, for which the user had created a stub file (but not added it to typeshed). Mypy found it fine on the default module search path. But PyCharm would only find it if there was a corresponding .py file in the same directory. In their case the fix was to add a .py file that simple re-exported the extension module. But it would be nice to have some kind of standardization for this.

@vlasovskikh
Copy link
Member

PEP 484 is not clear about several things about the search path for stubs. I've mentioned some of them in a comment to python/typing#184.

@bdarnell Shouldn't *.pyi files from typeshed override the *.py files from installed packages? (according to the PEP). If there is no installed top-level package for the module in typeshed, then I believe the type checker could skip the typeshed stub for it.

@bdarnell
Copy link
Contributor Author

@vlasovskikh Why would typeshed annotations take precedence over annotations in the code itself? The files in typeshed may describe a different version of the package and the annotations closer to the code are more likely to be correct for that version. If typeshed takes precedence then it is impossible to transition from annotations in typeshed to annotations in the code itself, and packages are being added to typeshed without coordinating with the package authors.

@vlasovskikh
Copy link
Member

@bdarnell Given the following variants (please correct me if I'm missing something):

  1. Typeshed annotations first, then the annotations inside the code (until the user changes the priority of annotations lookup)
  2. Only the annotations inside the code (until the user changes the priority of annotations lookup)
  3. The annotations inside the code first if there are any (may be time-consuming to detect), then typeshed annotations (until the user changes the priority of annotations lookup)

I would prefer the variant (1) here. Speaking of (3) it may take a while to determine if any files inside a package contain type hints, so it will slow the code analysis down making it hard to run it on-the-fly. Also (3) implies that any annotation added to a module (say, in the form of a # type: ... comment) forces a type checker to switch from typeshed to the real code, this might confuse the user since all the sudden the code analysis for a slightly updated version of the library starts to miss many errors inside their code.

@bdarnell
Copy link
Contributor Author

bdarnell commented Mar 1, 2016

You're assuming that typeshed annotations will be higher-quality than in-code annotations; I am assuming the opposite. More to the point, if I as the author of a package add type annotations to that package, that's how I want my users' code to be type-checked. I don't want people coming to my support forums with type-checking issues that are caused by version skew between typeshed and the actual code they have installed.

Typeshed currently contains near-useless stubs for my project which were added without my knowledge. The presence of these stubs stands in the way of me adding my own annotations to the code. This is frustrating and discourages me from exploring type annotations further.

Your option 3 is the only one that makes sense to me as a library author. I'm fine with adding some piece of metadata somewhere to say "this package uses PEP 484 annotations which should be used by type checkers" so that you don't have to actually load the code to find the annotations (and to address the possibility of incompletely-annotated projects that aren't ready to be checked without typeshed support).

@vlasovskikh
Copy link
Member

@bdarnell I see your point. Forcing library authors to mark their libraries with a piece of metadata that tells that the annotations should override typeshed seems a bit bureaucratic. We might be better off with (2) + a type checker-dependent notification that typeshed has a stub for a particular library so the user might want to use the stub if they want to.

@gvanrossum
Copy link
Member

I like (2) as well. Can one of you formulate a PR to the PEP?

@gvanrossum
Copy link
Member

In #1372 (comment) @bdarnell wrote

I need to be able to specify that my .py files contain annotations that should be considered as valid as .pyi files. -s categorically considers .py files as inferior which undermines the rationale for inline annotations.

Actually mypy's search algorithm takes things one directory at a time. If the .py file occurs in a directory that's earlier in MYPYPATH than the .pyi file, the .py file always wins (with -s, it's marked as not found, but mypy doesn't fall back on the .pyi file if it's in a later directory).

When a .py and .pyi are in the same directory, the .pyi always wins (regardless of -s).

I personally find this a very reasonable algorithm. If you want to have independent control over whether your .py or your .pyi files win, put them in different directories, and flip the directory order in MYPYPATH.

@bdarnell
Copy link
Contributor Author

It's perfectly reasonable that when both .pyi and .py files exist as peers, the .pyi file wins. When I said that -s considers .py files as "inferior" I didn't just mean lower priority in the search process, I meant that it doesn't consider them at all.

Here's my concern: I want to publish a library that contains PEP 484 annotations in the source code, and I want users of this library to be able to type check their code using my annotations. If -s becomes common (it is broadly encouraged in e.g. #1339), then I would need to also publish annotations in .pyi form, which I really don't want to do.

If there were a tool to automate the creation of .pyi stubs from an annotated .py file then I could live with that, but in that case the PEP should be clear that the .pyi files are the source of truth for type information (for public interfaces) and inline annotations are just an option for authors who prefer to write their annotations that way. The PEP currently gives the opposite impression: type hints are meant to be provided inline, and stubs are a workaround for when it is impractical to use inline annotations.

@gvanrossum
Copy link
Member

gvanrossum commented Apr 16, 2016

[-s] doesn't consider [.py files] at all

That's not completely true. The presence of a .py file is still noted and aborts the search right then (from the type analysis POV it's as if the import failed; the module is replaced with a dummy object of type Any).

The way to make it note them is to pass them explicitly on the command line -- or the directory containing the package. All .py files in the directory will be considered.

If -s becomes common (it is broadly encouraged in e.g. #1339), then I would need to also publish annotations in .pyi form, which I really don't want to do.

OK, this is the crux of the matter. I think "encouraged" is too strong a word, and we should really document it in a way that doesn't let people fall in that trap. It's also possible that we'll have to change something, but first I'd like to explore how close we can get without changes to mypy first.

Today, without changes to mypy or typeshed, the way to arrange for what you want without passing the tornado package on the command line would be to download tornado and copy the files into a separate directory, and point MYPYPATH to that directory. The reason for the separate directory is that the version of tornado that's actually used is installed in site-packages, and that presumably contains other packages that are not mypy-clean (maybe they have stubs in typeshed, maybe not).

But just passing the tornado package on the command line (together with the app itself) would be much simpler! In that case pointing to the copy installed in site-packages works just fine, and does the right thing regardless of whether -s is used. The problem with this solution is that it doesn't scale very well if an app uses lots of packages that use the same strategy as tornado -- and hopefully most packages will eventually adopt that strategy, because it's the ideal strategy!

Another problem is that we'd like to have a solution that lets the user set an environment variable once, rather than having to pass things (other than their own app and the flags they like) on the command line. At Dropbox so far we side-step these issues by having a script that runs mypy with the right arguments checked into our repos, but I realize this is not an ideal solution for most use cases.

Perhaps a solution can be found using a central config file as suggested in #1249.

Another possible solution would be to extend the meaning of MYPYPATH so that if you point it to a top-level package (instead of to a directory full of packages, like site-packages) it will do the right thing: trying to import it will just load the package rather than search it as a directory (which would be nonsensical) and .py files in the package will be loaded regardless of -s. Actually this should probably be a separate environment variable (MYPYPACKAGES?) to avoid confusion -- Python 3 has a concept called implicit namespace packages (PEP 420) that would make it hard to distinguish between a package and a directory full of packages. Does that work for you?

@gvanrossum
Copy link
Member

A problem with just passing the tornado package on the command line is that this forces mypy to analyze the entire package. Using the -i flag (which caches the symbol tables of successfully type-checked modules) should speed that up the second time around, but it's still pointless.

@bdarnell
Copy link
Contributor Author

[-s] doesn't consider [.py files] at all
That's not completely true. The presence of a .py file is still noted and aborts the search right then (from the type analysis POV it's as if the import failed; the module is replaced with a dummy object of type Any).

OK, -s doesn't consider annotations in .py files at all.

I believe that -s should be limited to suppressing errors, and if something would have succeeded without error in the absence of -s, it should work the same with it.

Passing tornado on the command line at the same level as the code I'm working on seems like a non-starter - I want mypy file_i_am_editing.py to be fast and do no more work than necessary to find errors in that file.

A MYPYPACKAGES environment variable (or equivalent command-line flag that distinguishes these packages from the main input files would work), although it would be weird if this setting only had any effect if -s were used. If this setting were added I think I'd want to make -s effectively the default, and never consider packages unless they're either on the command line or in MYPYPACKAGES.

This new setting still seems like a lot of unnecessary manual maintenance, though. I'm envisioning something like a line in setup.py (a PEP 426 extension) to declare that a package contains PEP 484 annotations, and then mypy could be given a site-packages directory and use this metadata to decide which packages it can look into.

@gvanrossum
Copy link
Member

OK, -s doesn't consider annotations in .py files at all.

True but not the whole truth -- -s explicitly doesn't look in .py files at all.

-s should be limited to suppressing errors

Sorry, that's just not what it's for. At this stage of mypy's development, we need -s to avoid wasting time analyzing code that's not ready for mypy. For example, where I work we have a codebase that takes over a minute to process without -s, producing over a thousand error messages, but with -s, pointing mypy only at the package of interest, it completes in under 10 seconds (and we've made that package mypy-clean). And that's just analyzing the app's code -- in either case it uses stubs for all 3rd party code. You could argue that the problem is lack of modularity in that code base (and I would have to agree :-), but that's what we have to work with -- almost every (interesting) file ends up with dependencies amounting to about 200K LOC, most of which hasn't been made mypy-clean yet. And that's what -s is meant to address.

if something would have succeeded without error in the absence of -s, it should work the same with it

That's more or less the case (modulo some bugs), but it does this by suppressing type-checking errors whenever Any is involved, and propagating Any as the result of operations that have Any as an operand -- not by analyzing the files that -s forbids and then silencing error messages for them. That would be a nice feature to have, but it would be too slow.

I want mypy file_i_am_editing.py to be fast and do no more work than necessary to find errors in that file.

The reality of mypy's implementation is that for it to find the errors in that file, it either needs stubs for every library package that it imports, or source code for those library packages. Analyzing the source code is slow. It would be nice to have a middle ground, but realistically that's not going to happen soon. However, add tweaking the way imports are processed is pretty easy.

it would be weird if this setting only had any effect if -s were used

Right, I wasn't proposing it to work that way.

make -s effectively the default

No way. It has serious problems.

never consider packages unless they're either on the command line or in MYPYPACKAGES

Or in typeshed.

Note that the normal behavior of mypy is not to search PYTHONPATH or the default sys.path at all, which already has the desired effect (of not considering site-packages). The problems here are caused by insufficient documentation, which leads people to attempt to set MYPYPATH to the contents of sys.path. (Also the flag --use-python-path is probably a red herring.)

TBH so far you're unique in wanting to use inline annotations for Tornado. Most package authors have expressed little interest in this (they're either too busy or worried about compatibility) and are happy with imperfects stubs in typeshed maintained by others. Hopefully in the future your approach will prevail!

@bdarnell
Copy link
Contributor Author

Sorry, that's just not what it's for. At this stage of mypy's development, we need -s to avoid wasting time analyzing code that's not ready for mypy.

OK, that's making more sense now. My introduction to -s was a comment like "if you're having import errors, add -s", which gave me the wrong impression. I don't envy you trying to retrofit static typing into that beast of a codebase.

@JukkaL
Copy link
Collaborator

JukkaL commented Jul 7, 2016

I'm trying to synthesize a more concrete proposal from the above discussion. Here's my proposal:

If you use inline annotations in a library that you maintain, you'd have a few different options for enabling mypy to use them:

  1. Ship .pyi files with your library and tell your users to point MYPYPATH to the directory containing the stub files. It would be nice to have a tool for automatically generating them, but it doesn't exist yet.
  2. Contribute .pyi files to typeshed (or wait for others to contribute them). This could be a good option if your API is stable, since typeshed currently can only have a single version of stubs for a particular package, and typeshed is shipped as part of mypy so they don't normally get updated unless mypy gets updated.
  3. Point mypy to the install directory for the package (likely under site-packages) using the MYPYPACKAGES environment variable (this is a new mypy feature that doesn't exist yet). You'd add a directory containing an __init__.py file or a path to a .py file (it doesn't have to be __init__.py if it's a top-level module) to the environment variable. Mypy would silence any errors from these packages, as they may be checked using different mypy options or a different mypy version compared to the rest of the program that is being type checked.

The first option is not very user friendly, as there's no standard location for 3rd party .pyi files that don't live in typeshed.

The third option is also a little clunky. Looking up the right path can be done through something like python -c 'import mypkg; print(mypkg.__file__)', though this can refer to a .pyc file.

@gvanrossum
Copy link
Member

Here's an idea to make (3) a little easier to use: add a repeatable mypy option that names a module or package name. Mypy will then look this module/package up using sys.path for the Python interpreter corresponding to --python-version (possibly forking a different Python interpreter, using a snippet similar to what you showed). It then arranges to include that module or all files in that package in the type checking, as if those files were listed on the command line, BUT it will suppress errors about them (as Jukka proposed). E.g. for an app using Tornado and Flask this could look like this:

mypy --python-version 2.7 --library-module tornado --library-module flask my_app_main.py

This assumes that both Tornado and Flask have (Python 2 style) inline annotations in the installed package.

NOTE: We will soon support a global setup file; the --library-module flags could be supplied from there instead of from the command line, making this much less of a burden.

@JukkaL
Copy link
Collaborator

JukkaL commented Jul 7, 2016

I like the interface, except for the fact that it's a little unclear which Python interpreter will be used by mypy. For example, if using 2.7, user might expect python2 or python to be the interpreter, but these aren't necessarily the same executable. Maybe support something like --python-interpreter python2?

@gvanrossum
Copy link
Member

Sounds right.

I guess until all that is implemented, we can recommend something like the following:

M=tornado
P=$(python2.7 -c "import $M; print($M.__file__)" | sed 's|/__init__\.pyc*||')
mypy my_app_main.py "$P"

@gvanrossum
Copy link
Member

I think you'll like #1895 once it's landed -- it'll let anybody (well, any setup.py :-) install their own additions to typeshed.

@bdarnell
Copy link
Contributor Author

#1895 gives me a place to put .pyi files, but I don't have .pyi files (maybe one or two, but not for the bulk of my code). My intention for this issue is that I need a way to tell mypy that my .py files have usable inline type annotations.

@gvanrossum
Copy link
Member

Oh, yeah, sorry, my previous comments suggest that I knew that (once :-).

But I haven't heard your opinion on the suggestion I made there. I think it would be reasonable for there to be an option that can be used to name packages that are searched without also searching their sibling packages, so that you point mypy at a package installed in e.g. site-packages without causing mypy to search the site-packages directory.

This option could be specified on the command line; once we have a global setup file it could also be specified there.

@bdarnell Does that address your concern, or do you need more?

PS. In my earlier example I think I meant:

mypy my_app_main.py --library-module "$P"

@bdarnell
Copy link
Contributor Author

--library-module doesn't address my concerns, because it still puts a burden on the application developer to know which of their dependencies have type information available. I want something that lets me point mypy at either a requirements.txt file or an installed environment and makes it figure out what type information it should use. I'm picturing a setting in setup.py (typehints="inline"?) which gets written out into the dist-info metadata.

@gvanrossum
Copy link
Member

gvanrossum commented Jul 30, 2016 via email

@bdarnell
Copy link
Contributor Author

My knowledge of setup.py is thin as well; I was hoping that someone else would be able to fill in the gap behind my hand-waving. A verbose way to do what I'm talking about would be the entry_points feature. Packages can contain an entry_points section in their setup.py, and then other packages (in the linked example, sqlalchemy) can query the system for all installed entry points with a given name. A typehints entry point would be one way to do what I'm asking for. I'd like for there to be a less wordy way to do this, but that would be beyond my knowledge of the setup.py world.

@tharvik
Copy link
Contributor

tharvik commented Aug 2, 2016

Usually, entrypoint should be executable. There is some more obscure arguments, such as package_data which can be useful here.

@cpennington
Copy link
Contributor

cpennington commented Oct 25, 2016

Could we salvage option 1 by providing some minimal tooling in mypy (or some other common place) for packaging up .pyi files at package build time, and then providing a standard place for installing them (via package_data)? At that point, mypy could look at the installed packages for that those .pyi files.

Upsides I see to this approach:

  1. Library authors don't need to compile their own stub files (those would be built by a tool at package time)
  2. Mypy doesn't need to parse all the files during imports, it can just look for the pre-packaged .pyi files, which will be faster to parse.

@cpennington
Copy link
Contributor

I just re-read PEP-484 and found https://www.python.org/dev/peps/pep-0484/#storing-and-distributing-stub-files, which seems fairly close to what I'm already suggesting, but absent the additional tooling to auto-build .pyi files at package time.

@cpennington
Copy link
Contributor

We could use http://setuptools.readthedocs.io/en/latest/setuptools.html#adding-setup-arguments, and add such an entrypoint for mypy, so that you can just add something like typehints = True to setup(...), and it would compile the .py files as .pyi files into a package_data entry (say, in a typehints directory, relative to the package), and then mypy could use the pkg_resources api to find the relevant directory (http://peak.telecommunity.com/DevCenter/PythonEggs#accessing-package-resources).

Alternately, perhaps the typehints = True flag could just compile the .pyi files adjacent to the .py files, at which point the existing mypy lookup methods would Just Work.

@77118cce
Copy link

Hi. Sorry to bother you like this here but is there any official example with stub-only package?

@gvanrossum gvanrossum removed this from the 0.5 milestone Mar 29, 2017
@emmatyping emmatyping mentioned this issue Mar 7, 2018
9 tasks
JukkaL pushed a commit that referenced this issue Apr 9, 2018
This makes it possible to use inline types from installed packages
and use installed stubs according to PEP 561.

Adds `--python-executable` and `--no-site-packages` 
command-line options.

PEP 561: https://www.python.org/dev/peps/pep-0561/

Fixes #2625, #1190, #965.
@emmatyping
Copy link
Collaborator

Fixed by #4693.

@kiwi0fruit
Copy link

@ethanhs So now there is a way to make mypy use inline annotations without duplicating code to .pyi? How?

@kiwi0fruit
Copy link

kiwi0fruit commented Sep 16, 2019

@ethanhs I've added py.typed empty file to the custom ./pkg module folder near __init__.py (module that is part of the project but without .pyi files) as suggeted in PEP 561 but mypy still says Import of 'pkg' ignored when I try to use it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants