Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Are type annotations worth having? #533

Open
adam2392 opened this issue Aug 28, 2020 · 45 comments
Open

Are type annotations worth having? #533

adam2392 opened this issue Aug 28, 2020 · 45 comments

Comments

@adam2392
Copy link
Member

Should we have type annotations within the code base, per @hoechenberger comments here: #532 (review)?

@hoechenberger
Copy link
Member

Definitely +10 on having type annotations, I've been using them in my own small projects for two years now or so and I love them. Only annoying things:

  • if one wants to be exhaustive and has several logical ORs, it becomes a little ugly to read sometimes (but this is rare)
  • not so easy for beginners to understand what's going on, since it's a relatively new language feature
  • documentation rendering may not always be super pretty

This is how I got it to render on my QUEST+ documentation:
https://questplus.readthedocs.io/en/latest/qp.html

Cool thing is that you declare the type once, in the signature, and then don't have to repeat / remember it in the rest of the docstring, see e.g.

https://github.com/hoechenberger/questplus/blob/9e1d0b824f9f7831492b6808e8f275f7e081c5ec/questplus/qp.py#L10-L36

@hoechenberger
Copy link
Member

hoechenberger commented Aug 28, 2020

  • if one wants to be exhaustive and has several logical ORs, it becomes a little ugly to read sometimes (but this is rare)

I just remembered that the reason why in my QUEST+ code I switched back to an "Optional[dict]" annotation from something like Optional[dict[list[Union[int, str]]]] … well… you're already seeing the reason, I suppose ;) So we gotta decide to what detail we want to annotate.

@hoechenberger
Copy link
Member

Optional[dict[list[Union[int, str]]]]

... which probably means that I should just have introduced some new types :)

@adam2392
Copy link
Member Author

We could also incorporate mypy in the CI?

@hoechenberger
Copy link
Member

hoechenberger commented Aug 28, 2020

We could also incorporate mypy in the CI?

One step after the other ;) But yes, this might be useful sometime

@sappelhoff
Copy link
Member

sappelhoff commented Aug 28, 2020

I have not much experience with type annotations, but just today I ran into a bug that could have been prevented with type annotations. (I was debugging for a long time until I noticed that at some point I had written "True" instead of True 🤦‍♂️ )

from what it looks like, the benefit would be mostly for the developers, is that right? The documentation rendering seems slightly less appealing (but still good) ... can we still do things like list of str, or ndarray, shape (3, n)?

I am curious to see this in action, and I don't see that it could hurt to have type annotations - apart from the initial extra work.

edit:

One step after the other ;) But yes, this might be useful sometime

what's the benefit of type annotations if we do not run a program (like mypy) that checks these annotations? --> is the only benefit then the "improved readability" of the code?

@hoechenberger
Copy link
Member

hoechenberger commented Aug 28, 2020

from what it looks like, the benefit would be mostly for the developers, is that right?

Depends on what you consider a developer ;) Any user of MNE-BIDS that uses the library via Python code would benefit if their editor supports type annotations.

The documentation rendering seems slightly less appealing (but still good) ...

Wait until you see this:
https://questplus.readthedocs.io/en/latest/psychometric_function.html#questplus.psychometric_function.csf

💩😅

can we still do things like list of str, or ndarray, shape (3, n)?

So far I've only used sphinx-autodoc-typehints and this wouldn't support this out of the box. You'd have to use list[str] for the first example; for the ndarray, I'm not sure. But there's got to be solutions, I believe the numpy folks were also working on type hints for numpydoc.

what's the benefit of type annotations if we do not run a program (like mypy) that checks these annotations? --> is the only benefit then the "improved readability" of the code?

Any good IDE (e.g. PyCharm, VS Code with pylance) will tell you upon typing a line of code that you're about to submit a value of the wrong type to a function. Doesn't always work, of course, because sometimes types have to be inferred; but I've had good experiences with this so far.

@hoechenberger
Copy link
Member

hoechenberger commented Aug 28, 2020

@sappelhoff

consider the following example code:

from typing import Optional, List


def f(a: str = 'foo',
      b: int = 0,
      c: Optional[List[str]] = None) -> str:

    return 'Hello!'


f(a=1, b=None, c=1)

Let's see VS Code with Pylance in action, with the following setting:
Screenshot 2020-08-28 at 21 16 45

Ok we're set, let's go!

ezgif com-video-to-gif

You can see that during typing the function arguments, I already get info as to which type would be expected; and I get those red wiggly lines if I try to pass anything the function doesn't expect.

I can also find an overview of all issues in the Problems tab:
Screenshot 2020-08-28 at 21 37 45

@sappelhoff
Copy link
Member

that's really neat, thanks for the example!

but it seems that the state of adoption in big projects is still very early.

issue for support in numpydoc: numpy/numpydoc#196

@agramfort
Copy link
Member

agramfort commented Aug 29, 2020 via email

@hoechenberger
Copy link
Member

hoechenberger commented Aug 29, 2020

working in Python for > 10 years I've never used it ;)

It's relatively new, pops, you're too old and wouldn't understand! It's not a phase! ;)

No but seriously, I think it can be super helpful and help catch or prevent bugs. I also switched my JavaScript code to TypeScript, which uses similar type declarations, and it's really worth the initial pain of documenting the interfaces.

@hoechenberger
Copy link
Member

Things are really still in the process of evolving. For example, NumPy just recently added "type stubs" to their main repository, which help static Python type checkers to know about the large set of compiled C functions they use.

For pandas, this still seems to be kind of work-in-progress.

And then there is the data-science-types project, aiming to generate stubs for NumPy, SciPy, and Matplotlib. Also WIP :)

So I believe we don't need to rush anything here; however, I get the impression that the scientific Python ecosystem is definitely moving in the direction of using type hints. As it's easier to start early adding type hints to a small / growing code base than to an old, fuzzy, big code base, I think it's worth our while to seriously consider adding type hints. :)

@hoechenberger
Copy link
Member

hoechenberger commented Jun 12, 2022

I've been working on this a bit and I think my main pitch for why we must have this is basically what is shown in the below screencast: this is one of our functions whose return type depends on the value of an input parameter:

Screen.Recording.2022-06-12.at.19.45.25.mov

I will try to finalize most type annotations within the next two weeks or so, and try to get this merged into typeshed (I already got in touch with them and asked for permission to include default parameter values, which is normally against their rules, but they said they might make an exception for us!), so all users of VS Code with Pylance will automatically profit from it. Will ping interested parties once I think it's ready for testing.

The beauty of this approach is that type annotations remain entirely optional and can live outside this project's repository until we've reached a consensus.

cc @cbrnr @agramfort @larsoner

@cbrnr
Copy link
Contributor

cbrnr commented Jun 13, 2022

To be honest, although this is great and all, I don't really trust Pylance. It's not because Pylance per se, but more often than not Python code is not properly annotated, so it cannot infer the type. Now of course your argument is that we will properly annotate (externally at the moment), but I can't and won't trust Pylance unless all type suggestions are 100%. And this is never going to happen, so I'd rather look at the source code (docstring in most cases).

@hoechenberger
Copy link
Member

hoechenberger commented Jun 13, 2022

These days, basically all "main" scientific Python packages are extensively annotated (NumPy, SciPy, Pandas)
And if we annotate MNE-BIDS and MNE-Python, there will be a real benefit for both users and developers. I literally catch multiple bugs a day when working with annotated code and enabling type checking -- sometimes I simply forget that certain functions may have different return types depending on the input. And then I quickly forget to special-case a None :)

But the good thing for you is that you can always just disable type checking and rely on the (also sometimes incomplete and buggy) docstrings instead 😅

@cbrnr
Copy link
Contributor

cbrnr commented Jun 13, 2022

In principle yes, but someone has to make sure that our annotations are always in sync with the code base. I can only imagine this happening when annotations live directly in our code and not in an external package.

In a smaller package like this one, I'd be okay with including annotations (yes, you read that correctly). I already maintain a package which has type annotations (https://github.com/cbrnr/sleepecg), and it's not too bad (although they never helped me in any way either). It would be something else to decide whether or not we should annotate in MNE-Python, which is a much larger package with a ton of existing code, types, functions, modoules, etc.

@hoechenberger
Copy link
Member

hoechenberger commented Jun 13, 2022

In principle yes, but someone has to make sure that our annotations are always in sync with the code base. I can only imagine this happening when annotations live directly in our code and not in an external package.

Typeshed does that automatically, they've set up a rather sophisticated system to my understanding, which regularly checks whether the annotations and the upstream code base are still in sync.

That's why I'd definitely want to get these annotations into typeshed if we don't want to annotate the code base itself. Otherwise everything will get messy and out of sync real quick.

I already maintain a package which has type annotations (https://github.com/cbrnr/sleepecg), and it's not too bad (although they never helped me in any way either).

Hehe I saw that already a while ago and my first thought was, "damn, that must've been hard for Clemens to reach a consensus here!" 😅 I find it interesting you say it never really helped you – did you enable strict type checking in your IDE? Or just "basic" checks?

I first came across type annotations & checking when working with TypeScript a few years back, and this is how I learned to appreciate this. 👍 But then again, my (working) memory doesn't seem to be all that great, so I tend to forget about those little details like varying return types a lot, and a type checker can be of great help to me there.

@cbrnr
Copy link
Contributor

cbrnr commented Jun 13, 2022

If typeshed takes care of syncing that's great! One less thing to worry about.

Hehe I saw that already a while ago and my first thought was, "damn, that must've been hard for Clemens to reach a consensus here!" 😅

I'm not that stubborn 😄 – the project was mainly implemented by a master's student, and he wanted to have annotations. Simple as that.

I find it interesting you say it never really helped you – did you enable strict type checking in your IDE? Or just "basic" checks?

I have to enable something? I thought Pylance is just doing its thing in the background? I never actively enabled something.

@hoechenberger
Copy link
Member

hoechenberger commented Jun 13, 2022

I'm not that stubborn 😄 – the project was mainly implemented by a master's student, and he wanted to have annotations. Simple as that.

Oh wow, this was a Master's project? Super sophisticated!!! I'm genuinely impressed, I thought it's something on a PhD level.

I have to enable something? I thought Pylance is just doing its thing in the background? I never actively enabled something.

Well yes 😂 I believe the default is "off" for type checking!

The setting you'll want to check is python.analysis.typeCheckingMode. I have it usually set to basic, but when working with well-typed code bases, I switch it to strict.

But even at basic level it produces lots of "false alarms" with MNE, so I'm sure you will not like it there :)

@cbrnr
Copy link
Contributor

cbrnr commented Jun 13, 2022

I enabled basic type checking and opened SleepECG only to find stuff like this:

filename = filename + '.csv'
Operator "+" not supported for types "str | Path" and "Literal['.csv']"
  Operator "+" not supported for types "Path" and "Literal['.csv']" when expected type is "str | Path"

That's exactly the nonsense I am not willing to deal with. Maybe I did not annotate correctly (I don't think so), but I want to continue programming Python and not something that approaches Java.

@hoechenberger
Copy link
Member

hoechenberger commented Jun 13, 2022

@cbrnr
But this either highlights a bug in your code above, or in the type annotation!

What this tells you is:

"Hey, filename can in some circumstances be a Path, and a Path cannot be +'d with another string!"

This is exactly the kind of bug that type checking is meant to uncover! This is precisely what I'm talking about when I say, it helps me prevent several bugs a day :)

@hoechenberger
Copy link
Member

hoechenberger commented Jun 13, 2022

Btw there is a way to work around this: you could use a "type guard" like:

assert isinstance(filename, str)
filename = filename + '.csv'

shouldn't spawn the error, right?

Or like this:

if isinstance(filename, str):
    filename += '.csv'
else:  # it's a Path
    filename = filename.with_suffix('.csv')

Edit: incorrect code

@hoechenberger
Copy link
Member

but I want to continue programming Python and not something that approaches Java.

You still have duck typing, and you already need to know whether you're dealing with a string or a Path, so the only thing that type annotations & type checkers do is provide some additional help, is all…

@cbrnr
Copy link
Contributor

cbrnr commented Jun 13, 2022

Alright, I really thought that you can + a Path and a str! In that case, you are absolutely correct! I never checked this myself because I assumed that the person who wanted to use type annotations would also use these annotations (i.e. turn on basic checks).

I might have to revise my opinion on type annotations. As long as everything stays optional, I think this could be useful indeed.

@hoechenberger
Copy link
Member

hoechenberger commented Jun 13, 2022

I might have to revise my opinion on type annotations.

Wow! I never expected this! 😄

My mind is totally blown right now hahaha.

As long as everything stays optional, I think this could be useful

Yep that's what I believe too!

I also sometimes turn off all type checking when working with code bases that are either not or just poorly annotated, as the amount of red squiggly lines due to false alarms is driving me nuts (this is actually sometimes the case with MNE-Python). So it's super nice we are free to choose, depending on our preferences and requirements!

@cbrnr
Copy link
Contributor

cbrnr commented Jun 13, 2022

Wow! I never expected this! 😄

My mind is totally blown right now hahaha.

😆 – as I mentioned previously: I am not that stubborn! Thanks for bearing with me and showing me my first very own example where type annotations really help!

@cbrnr
Copy link
Contributor

cbrnr commented Jun 13, 2022

I just encountered a case where I think the error is actually a fluke (sorry this is OT, please let me know if you would be available e.g. on Discord for 10 minutes to help me go over the errors that I find weird):

The backend parameter can have only three possible values, which I check here:

    if backend not in _all_backends:
        raise ValueError(
            f'Invalid backend for detect_heartbeats: {backend!r}. '
            f'Possible options are: {_all_backends}.',
        )

Further down, I have an if/elif block for each of the three conditions (no else). And then I return something that's computed in any of these three blocks. But I get an UnboundError, because Pylance doesn't seem to see that I already checked for this at the beginning of the function. Any suggestions?

@hoechenberger
Copy link
Member

hoechenberger commented Jun 13, 2022

@cbrnr I'm quite busy right now, but a) yes there are sometimes bugs like these in Pylance & pyright, which get fixed once reported, and b) sometimes one overlooks something, so it might not necessarily always be a bug in Pylance

Like I said I'm a bit busy, but maybe you can condense this code down to a MWE (edit: i just saw you linked to the original code, so I guess I'll take a look at this)

I'd be happy to help get this stuff sorted out. Maybe you can make me a list of these issues and I can try to dig into this once I have time again? (maybe tonight, maybe in a few days, cannot promise anything!)

@cbrnr
Copy link
Contributor

cbrnr commented Jun 13, 2022

Sure, I will try to solve these things by myself as part of cbrnr/sleepecg#93, so anything I can't fix or I have question I will post there (and tag you in case you have time to take a look).

Yes, in this particular example I've linked to the code, if you know how to best deal with it please LMK (and if you don't have time just don't respond, that's perfectly OK).

@hoechenberger
Copy link
Member

hoechenberger commented Jun 13, 2022

Cool! And like I said, be prepared to get bitten by bugs in Pylance itself once in a while – see this list of user-reported bugs that have been fixed over the course of the past couple of months alone:
https://github.com/microsoft/pylance-release/issues?q=is%3Aissue+label%3A%22fixed+in+next+version%22+is%3Aclosed+label%3Abug

and a similarly long list exists for pyright, which is the type checker pylance is built upon…

Edit: Fixed URL

@hoechenberger
Copy link
Member

I just encountered a case where I think the error is actually a fluke (sorry this is OT, please let me know if you would be available e.g. on Discord for 10 minutes to help me go over the errors that I find weird):

The backend parameter can have only three possible values, which I check here:

    if backend not in _all_backends:
        raise ValueError(
            f'Invalid backend for detect_heartbeats: {backend!r}. '
            f'Possible options are: {_all_backends}.',
        )

Further down, I have an if/elif block for each of the three conditions (no else). And then I return something that's computed in any of these three blocks. But I get an UnboundError, because Pylance doesn't seem to see that I already checked for this at the beginning of the function. Any suggestions?

It's these lines that make it difficult for Pylance to keep track:

https://github.com/cbrnr/sleepecg/blob/cf18ddf052ba2987cb50f38f4d31164842214920/sleepecg/heartbeats.py#L105-L108

    if backend not in _available_backends:
        fallback = _available_backends[0]
        warnings.warn(f'Backend {backend!r} not available, using {fallback!r} instead.')
        backend = fallback

I assume it's because _available_backends is dependent on run-time, errm, stuff (you remove from it depending on which dependencies are available)

I would think this is a bug or at least an annoying limitation in Pylance, and maybe we could turn this thing into a MWE and report it to the devs.

For now, this (ugly) workaround fixes the problem for me: I'm adding an assert as a manual "type guard":

    if backend not in _available_backends:
        fallback = _available_backends[0]
        warnings.warn(f'Backend {backend!r} not available, using {fallback!r} instead.')
        backend = fallback
        assert backend in _all_backends  # <-- ugly but seems to do the trick

@hoechenberger
Copy link
Member

hoechenberger commented Jun 13, 2022

@cbrnr Ok I found a cleaner solution that you're probably going to hate even more :)

At the top of your module, where you define _all_backends, do it like this:

_all_backends: tuple[
    Literal['c'], Literal['numba'], Literal['python']
] = ('c', 'numba', 'python')

Then Pylance is able to deduce that the line that follows,

_available_backends = list(_all_backends)

will create an _available_backends instance with only the strings we specified above. Otherwise, i.e. in main, where it reads as:

_all_backends = ('c', 'numba', 'python')
_available_backends = list(_all_backends)

Pylance says that _avilable_backends is just a list of arbitrary strings. This is what causes the error you reported further down, where one would think it's clear that the type/content must've been narrowed down, but obviously Pylance is not smart enough to get this.

I think we might have a MWE here, now, eh? Pylance should know that converting a tuple of strings to a list can only create a list of the exact same strings, and nothing else… So I'd label this a bug!! And an annoying one too!

@hoechenberger
Copy link
Member

@cbrnr Unfortunately maybe not a bug, but a feature (??!!)

  • Create a file type_test.py with the following content, and ideally in its own directory:
    # %%
    x = ('foo', 'bar', 'baz')
    y = list(x)
    
    reveal_type(x)
    reveal_type(y)
  • Install mypy into your Python env
  • run mypy type_test.py

Produces:

❯ mypy type_test.py
type_test.py:5: note: Revealed type is "Tuple[builtins.str, builtins.str, builtins.str]"
type_test.py:6: note: Revealed type is "builtins.list[builtins.str]"
Success: no issues found in 1 source file

So MyPy itself isn't even narrowing down the content of the tuple to the explicit strings we're assigning. This seems to be a Pylance-specific feature (?)
And then in line 6, it's just a list of generic strings.

So now I'm unsure whether we should ask the Pylance or the MyPy folks for advice.

@hoechenberger
Copy link
Member

hoechenberger commented Jun 13, 2022

But even with the following adjustment, mypy cannot keep track of the contents of the list:

# %%
from typing import Literal

x:  tuple[
    Literal['foo'],
    Literal['bar'],
    Literal['baz']
] = ('foo', 'bar', 'baz')
y = list(x)

reveal_type(x)
reveal_type(y)
❯ mypy type_test.py
type_test.py:11: note: Revealed type is "Tuple[Literal['foo'], Literal['bar'], Literal['baz']]"
type_test.py:12: note: Revealed type is "builtins.list[builtins.str]"
Success: no issues found in 1 source file

@hoechenberger
Copy link
Member

… stupid typing!! 😡

@hoechenberger
Copy link
Member

@cbrnr If you want I can ask on the MyPy Gitter chat. They've been very helpful in the past :)

@cbrnr
Copy link
Contributor

cbrnr commented Jun 13, 2022

OMG – I just wanted to write an elaborate response (I had actually already written but then deleted it), but the TLDR is that this exactly shows why I do not want to use type annotations. Thanks for digging into this issue, I guess I'll just turn off type checking again. It's better for my sanity.

@hoechenberger
Copy link
Member

I guess I'll just turn off type checking again. It's better for my sanity.

Hehe you do that

As for me – all hope is lost, there's not much sanity to lose anymore, so I'm just going to go all-in and ask the MyPy folks for help. Will keep you posted ;)

@hoechenberger
Copy link
Member

hoechenberger commented Jun 13, 2022

@cbrnr
But in all seriousness, I totally understand your frustration.

Meanwhile, I got in touch with the folks from the #python/typing gitter channel and received many good pointers, ideas, and feedback. I will read up on this a bit more and revisit things when I have time.

In the meantime, I have a super quick & dirty solution for annoying lines like the one above:

    return np.where(beat_mask)[0]  # type: ignore

Tada! ;)

@cbrnr
Copy link
Contributor

cbrnr commented Jun 13, 2022

Can you also do that ignore globally 😅?

@agramfort
Copy link
Member

agramfort commented Jun 13, 2022 via email

@hoechenberger
Copy link
Member

Can you also do that ignore globally 😅?

I see what you did there 🤓

In fact, you can:

python/mypy#545 (comment)

@hoechenberger
Copy link
Member

and just see this as a lot of characters to type for little gain...

Fewer bugs if you have to rely on young folks like me writing code 😅😅😅

@agramfort
Copy link
Member

agramfort commented Jun 13, 2022 via email

@sappelhoff
Copy link
Member

tackling this would potentially entail:

@sappelhoff sappelhoff modified the milestones: 0.13, future (or never) Jul 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants