Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add type annotations to astropy.utils.parsing #16519

Merged
merged 1 commit into from
Jun 1, 2024

Conversation

eerovaher
Copy link
Member

Description

The parsing utilities are used by unit formatters, so I am asking the units maintainers to review.

  • By checking this box, the PR author has requested that maintainers do NOT use the "Squash and Merge" button. Maintainers should respect this when possible; however, the final decision is at the discretion of the maintainer that merges the PR.

Copy link
Contributor

Thank you for your contribution to Astropy! 🌌 This checklist is meant to remind the package maintainers who will review this pull request of some common things to look for.

  • Do the proposed changes actually accomplish desired goals?
  • Do the proposed changes follow the Astropy coding guidelines?
  • Are tests added/updated as required? If so, do they follow the Astropy testing guidelines?
  • Are docs added/updated as required? If so, do they follow the Astropy documentation guidelines?
  • Is rebase and/or squash necessary? If so, please provide the author with appropriate instructions. Also see instructions for rebase and squash.
  • Did the CI pass? If no, are the failures related? If you need to run daily and weekly cron jobs as part of the PR, please apply the "Extra CI" label. Codestyle issues can be fixed by the bot.
  • Is a change log needed? If yes, did the change log check pass? If no, add the "no-changelog-entry-needed" label. If this is a manual backport, use the "skip-changelog-checks" label unless special changelog handling is necessary.
  • Is this a big PR that makes a "What's new?" entry worthwhile and if so, is (1) a "what's new" entry included in this PR and (2) the "whatsnew-needed" label applied?
  • At the time of adding the milestone, if the milestone set requires a backport to release branch(es), apply the appropriate "backport-X.Y.x" label(s) before merge.

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks all OK to me!

@mhvk
Copy link
Contributor

mhvk commented May 30, 2024

Hmm, the readthedocs failure is related:

/home/docs/checkouts/readthedocs.org/user_builds/astropy/conda/16519/lib/python3.11/site-packages/astropy/utils/parsing.py:docstring of astropy.utils.parsing.ThreadSafeParser:1: WARNING: py:class reference target not found: LRParser [ref.class]
/home/docs/checkouts/readthedocs.org/user_builds/astropy/conda/16519/lib/python3.11/site-packages/astropy/utils/parsing.py:docstring of astropy.utils.parsing.lex:1: WARNING: py:class reference target not found: Lexer [ref.class]

@mhvk mhvk added the typing related to type annotations label May 30, 2024
@mhvk
Copy link
Contributor

mhvk commented May 30, 2024

p.s. I added a new typing label to try to keep typing-related discussion together.

@@ -37,7 +47,7 @@ def _add_tab_header(filename, package):


@contextlib.contextmanager
def _patch_get_caller_module_dict(module):
def _patch_get_caller_module_dict(module: ModuleType) -> Generator[None, None, None]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't wait for Python 3.13 (https://docs.python.org/3/library/typing.html#typing.Generator)

Changed in version 3.13: Default values for the send and return types were added.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

@@ -27,7 +37,7 @@
_LOCK = threading.RLock()


def _add_tab_header(filename, package):
def _add_tab_header(filename: str, package: str) -> None:
Copy link
Contributor

@neutrinoceros neutrinoceros May 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These annotations are correct, but I note that internally, the function binds the name f twice to objects with different types (file buffers with different permissions), which mypy will flag as soon as we turn it on, so I suggest fixing it now rather than later (it suffices to use 2 different variable names).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to keep type annotation pull requests free from any code changes. If the code should be updated then it's better to do that in a separate pull request. Besides, changing the name of one variable does not sound like a big improvement to me. I would rather rewrite this function in its entirety:

def _add_tab_header(filename: Path, package: str) -> None:
    contents = filename.read_text()
    filename.write_text(_TAB_HEADER.format(package=package))
    filename.open('a').write(contents)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed with @eerovaher here; better to have no code changes in typing PRs (and, really, if mypy cannot handle temporary variables like here, then perhaps it needs to be fixed...)

Copy link
Member

@nstarman nstarman May 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think any static analyzer handles mutating types without a prior declaration of a type union.

e..g

a: Union[str, int]
a = "a"
a = 1

I also don't think that's something that should be fixed!
Dynamism, like most tools, is both useful and dangerous. I think mutating types in-place is more on the dangerous side than useful. Renaming a variable (out-of place mutation) is both easy and safe. Static type checks should catch this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed with @nstarman . I mostly wanted to raise awareness that by adding annotations before enabling type-checking would create technical debt in the form of more errors to be fixed to get type-checking started instead of less (well, not sure what the net change will be). But the current approach is still better than not addressing type hints at all, and it reached some kind of consensus.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also note, for closure, that @eerovaher's effort in reducing long functions might compensate for this specific kind of new "errors" at type-checking time: the probability that a variable name be used for more than one type is greatly increased in long scopes.

@eerovaher
Copy link
Member Author

The documentation build failures are indeed related. Usually we would deal with this by adding more exceptions to docs/nitpick-exceptions, but perhaps in this case it might instead be better to prevent items from this file from being included in the documentation at all. The main problem here is that I can't tell if we can simply remove the documentation because I can't tell if this module is public or private. It does use underscores to mark some things unambiguously as private, it has an explicit __all__ and it is included in the documentation, which all indicate that lex(), yacc() and TheadSafeParser are public, but they cannot be imported directly from astropy.utils which might mean they are actually private.

@nstarman
Copy link
Member

A good example found in the wild.

@mhvk
Copy link
Contributor

mhvk commented May 31, 2024

utils is documented as having different modules from which one can import, with masked, iers, and data as sub-sub-packages. It is also noted that everything except compat is included in the documentation and is public, though with the stated proviso that it is mostly for internal use (so we deprecate things in utils a bit more readily elsewhere, though the process is the same).

But what I don't understand is why it being in the documentation is a problem? I thought the issue elsewhere is that we have two places where classes are documented, and sphinx+typing cannot deal with that. But here we have only one place, don't we? So what actually goes wrong?

@eerovaher
Copy link
Member Author

But what I don't understand is why it being in the documentation is a problem? I thought the issue elsewhere is that we have two places where classes are documented, and sphinx+typing cannot deal with that. But here we have only one place, don't we? So what actually goes wrong?

There is more than one reason why type annotation pull requests might have to wrestle with Sphinx, but it looks to me like in this case the problem is our fault.

If there are type annotations in a function signature then Sphinx tries to turn them into hyperlinks to the relevant types (see e.g. astropy.units.format.Console.to_string()). Sphinx has a bug which means that (sometimes?) it can't figure out what type an annotation is referring to if the type was imported in an if TYPE_CHECKING: block, but in this case that's not really the problem.

We have two documentation warnings and the causes are similar. One is that lex() returns a Lexer instance, but that class comes from astropy.extern, so it's private (right?) and Sphinx cannot create a hyperlink to its documentation because there isn't any. This is not specific to type annotations, the same also applies to docstrings, its just that the lex() docstring conveniently avoids mentioning the function's return type. It should be pointed out that a public function cannot return a private value by definition, so either lex() cannot be public or Lexer cannot be private.

Similarly, we seem to have declared ThreadSafeParser to be public, which means that users are allowed to initialize instances of that class directly, but doing so would require an LRParser instance and that class comes from astropy.extern so it's private. Again, this would be a problem for the docstring too, but the class docstring conveniently avoids mentioning how to create instances of it.

@mhvk
Copy link
Contributor

mhvk commented May 31, 2024

Ah, thanks for digging into that! I am sure we should not start documenting stuff in extern, so this seems to be one case where adding a nitpick exception is actually reasonable (note that eventually we really should use existing ply - see #13399, #7177 - at which point the exceptions could be removed, since one could link to the relevant classes more easily).

p.s. This module is also one of the least public ones of all of utils, since really it exists only to add locks around the parser/lexer stuff. But in itself it doesn't seem bad to have that documented.

@eerovaher
Copy link
Member Author

Adding nitpick exceptions would indeed suppress the warnings during documentation build, but as I've already explained it is a bad solution. To highlight the real problem again (regarding Lexer here): on one hand the result of a lex() call is public because it is the return value of a public function, but on the other hand it is private because it is an instance of a private class. It cannot be both public and private by definition but nonetheless we are claiming it is.

@mhvk
Copy link
Contributor

mhvk commented May 31, 2024

Yes, it is not quite right, but do we really care? At some point perfection becomes the enemy of good enough. I don't think it makes sense to remove the module from the documentation, since that effectively makes this PR an API change (and even if we do it separately feels very much like the tail wagging the dog), so one is left with the nitpick exceptions or not adding the relevant typing information. My sense is that a nitpick exception is the best way forward - they can be removed when we get around to unbundling ply.

@nstarman
Copy link
Member

nstarman commented May 31, 2024

My sense is that a nitpick exception is the best way forward - they can be removed when we get around to unbundling ply.

I agree that this PR doesn't need to tackle this question and a nitpick exception sounds like a good-enough temporary solution. It's good in and of itself to get this module typed, even if that uncovered some problems.
@eerovaher can you open a new Issue with your explanation from above? Then we have a record and can fix this later, which sounds like will require an API change either by updating our lexer, or by being more careful about public/private API, or both.

@eerovaher eerovaher force-pushed the annotate-utils-parsing branch from 1566f33 to 563a219 Compare May 31, 2024 21:52
@eerovaher eerovaher force-pushed the annotate-utils-parsing branch from 563a219 to 1df89ab Compare May 31, 2024 21:55
@eerovaher
Copy link
Member Author

I can agree that this pull request should focus on adding type annotations, but having a paradoxical API is not good.

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, let's get this in!

@mhvk mhvk merged commit 8164c3a into astropy:main Jun 1, 2024
27 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

4 participants