Consider reverting lazy imports #13121

Open
cbrnr opened this issue Feb 20, 2025 · 23 comments · May be fixed by #13124

Comments

@cbrnr
Contributor

cbrnr commented Feb 20, 2025

I would like to discuss reverting the changes that introduced the lazy-loader package. I have been unhappy with this change almost from the start (I did make a positive comment in the initial discussion, but that was before I fully understood how lazy loading was implemented), and I think it is not good for the project's long-term health, for the following reasons:

  1. Lazy loading is essentially a sophisticated workaround rather than a supported Python feature. The relevant proposal, PEP 690, was rejected, and the related discussion suggests it's unlikely to be officially adopted in the future. As a result, the mechanism is not guaranteed to work in all cases and has already broken in some instances.
  2. The package relies on .pyi files for runtime behavior, which contradicts their intended purpose. Official references, such as PEP 561 and PEP 484, explicitly state that stub files are strictly meant for static type checking and not execution.
  3. Lazy loading can be useful for CLI tools or packages with many submodules, where startup time is a major concern. However, I think the advantages for MNE-Python are questionable:
    • The import time wasn't particularly long to begin with.
    • While a 50% reduction in relative terms sounds significant, the absolute gain is just 175 milliseconds, which is negligible.
    • We already used nested imports for optional dependencies, and nested imports can also be used to import packages on demand without the need for the non-standard lazy-loader mechanism.
    • Delaying imports until first use might be more disruptive than a slightly longer initial import time.
  4. The lazy-loader package does not appear to be very actively maintained beyond some tooling updates, and there are several unresolved issues such as all packages being loaded at once (issue #131) and eager imports not working as expected (issue #128).
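To make point 2 concrete: with stub-based lazy loading, the .pyi file is parsed at import time to find out which names live in which submodule. As far as we know, lazy_loader exposes this through attach_stub(__name__, __file__), which returns __getattr__, __dir__ and __all__ for the package. The sketch below is only a simplified illustration of that parsing step using the standard library's ast module; parse_stub_imports and the stub contents are hypothetical, not lazy_loader's actual code.

```python
import ast


def parse_stub_imports(stub_source: str) -> tuple[set[str], dict[str, list[str]]]:
    """Collect lazily loadable names from the text of a .pyi stub (hypothetical helper)."""
    submodules: set[str] = set()
    submod_attrs: dict[str, list[str]] = {}
    for node in ast.parse(stub_source).body:
        # Only single-dot relative imports such as "from .io import read_raw" matter here.
        if not isinstance(node, ast.ImportFrom) or node.level != 1:
            continue
        names = [alias.name for alias in node.names]
        if node.module is None:
            # "from . import viz": each imported name is itself a submodule.
            submodules.update(names)
        else:
            # "from .io import read_raw": each name lives in the submodule "io".
            submod_attrs.setdefault(node.module, []).extend(names)
    return submodules, submod_attrs


# A made-up stub in the style of a package __init__.pyi.
STUB = '''
__all__ = ["Epochs", "read_raw", "viz"]

from . import viz
from .epochs import Epochs
from .io import read_raw
'''

submodules, submod_attrs = parse_stub_imports(STUB)
print(submodules)    # {'viz'}
print(submod_attrs)  # {'epochs': ['Epochs'], 'io': ['read_raw']}
```

This is exactly the kind of runtime use of a stub file that point 2 objects to: the same text that type checkers read statically also drives what gets imported.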

Given these concerns, I strongly suggest that we at least reconsider our decision to adopt the lazy-loader package. Since this will be a rather large change involving many files and probably a lot of manual work, I'm happy to submit a PR if there is general agreement that this is the right direction. It is also not super urgent, but I think the sooner we address this, the easier it will be to revert the changes.

@agramfort
Member

agramfort commented Feb 21, 2025 via email

@drammock
Member

This comment has the relevant backstory links that @cbrnr forgot to include:

#12388 (comment)

We started with #11838 (but did it a dumb way, without .pyi files). This caused problems with IDE completion hints, etc., as discussed in #12059 (that's where all the opinions are aired). The IDE problem was fixed by #12072.

@cbrnr
Contributor Author

cbrnr commented Feb 21, 2025

It's not the entire backstory, just some aspect. I think I summarized all important points in my initial post.

@drammock
Member

It's not the entire backstory, just some aspect.

Sure. I wasn't trying to imply that your description wasn't relevant. Just that there are some other things that are also relevant that you didn't include.

I think I summarized all important points in my initial post.

I disagree. Past discussions / decisions are relevant and important context. I'm surprised that you seem to be saying that you omitted them intentionally.

@cbrnr
Contributor Author

cbrnr commented Feb 21, 2025

Of course I'm not saying I omitted something intentionally, come on!

@drammock
Member

Of course I'm not saying I omitted something intentionally, come on!

I guess I misunderstood. Saying "I think I summarized all important points in my initial post" implied that you didn't think linking to past discussions was important, so you chose not to do it. Which, again, was surprising to me, especially since it's a frequent request when old decisions get revisited --- I assumed you'd have thought of doing so.

@mscheltienne
Member

mscheltienne commented Feb 23, 2025

I agree with @cbrnr: I don't see much value in lazy loading, and I'm also concerned about the package's maintenance and adoption. +1 to revert it.

@hoechenberger
Member

hoechenberger commented Feb 23, 2025

4. The lazy-loader package does not appear to be very actively maintained beyond some tooling updates, and there are several unresolved issues such as all packages being loaded at once (issue #131) and eager imports not working as expected (issue #128).

I find this concerning and I never liked the fact that we now critically depend upon a hack that didn't pass standardization (via a PEP) – for IMHO very good reasons. If forced to vote, I'd decide against the current lazy loader implementation.

That said, so far I'm not running into problems (anymore) so I actually don't mind (for now).

@hoechenberger
Member

hoechenberger commented Feb 23, 2025

FWIW I just briefly checked some of the projects that are listed as having "endorsed" SPEC-0001: NumPy, SciPy, xarray.

In none of the above could I find traces of the lazy-loader dependency!? Either I'm missing something or the way the SPEC website presents itself is kind of misleading, suggesting adoption where there is none. Happy to be corrected, of course, but right now I'm kind of … confused.

@cbrnr cbrnr linked a pull request Feb 24, 2025 that will close this issue
@larsoner
Member

I still like the lazy loading approach but can live with the majority decision here. I guess we'll see what problems and other hard decisions come up in #13124. One we've already hit, which people should hopefully discuss, is how to handle the sklearn dependency in mne.decoding: #13124 (comment)

@larsoner
Member

In none of the above could I find traces of the lazy-loader dependency!? Either I'm missing something or the way the SPEC website presents itself is kind of misleading, suggesting adoption where there is none. Happy to be corrected, of course, but right now I'm kind of … confused.

FWIW just looking into one, SciPy uses a lazy loading approach even if it's not via lazy_loader, see scipy/scipy#15230 / https://github.com/scipy/scipy/blob/b9b8b8171fd1453b42fc4492a279c71b54141c51/scipy/__init__.py#L129 . I think for DRY/community adoption it would be better to use lazy_loader but I haven't looked into why they use their own code. I'm assuming the same would probably hold for NumPy, xarray, etc.
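Approaches like this generally build on the module-level __getattr__ and __dir__ hooks from PEP 562 (Python 3.7+). A minimal sketch of that pattern, assuming a hypothetical package lazypkg with a single submodule heavy, written to a temporary directory so the example runs on its own; it is not SciPy's or lazy_loader's actual code.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# __init__.py of the hypothetical package: submodules load on first attribute access.
INIT_SOURCE = textwrap.dedent(
    """
    import importlib

    _submodules = {"heavy"}

    def __getattr__(name):
        if name in _submodules:
            # Importing the submodule also binds it as a package attribute,
            # so later accesses no longer go through this function.
            return importlib.import_module(f"{__name__}.{name}")
        raise AttributeError(f"module {__name__!r} has no attribute {name!r}")

    def __dir__():
        # Advertise lazy submodules so dir() still lists them.
        return sorted(set(globals()) | _submodules)
    """
)


def demo_lazy_submodule() -> tuple[bool, bool, bool]:
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "lazypkg"
        pkg.mkdir()
        (pkg / "__init__.py").write_text(INIT_SOURCE)
        (pkg / "heavy.py").write_text("VALUE = 42\n")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            lazypkg = importlib.import_module("lazypkg")
            loaded_before = "lazypkg.heavy" in sys.modules
            listed_in_dir = "heavy" in dir(lazypkg)
            value = lazypkg.heavy.VALUE  # first access triggers the real import
            loaded_after = "lazypkg.heavy" in sys.modules and value == 42
        finally:
            # Undo all global state so the demo can be run repeatedly.
            sys.path.remove(tmp)
            sys.path_importer_cache.pop(tmp, None)
            for name in ("lazypkg.heavy", "lazypkg"):
                sys.modules.pop(name, None)
    return loaded_before, listed_in_dir, loaded_after


print(demo_lazy_submodule())  # (False, True, True)
```

Defining __dir__ is what is meant to keep lazily loaded submodules visible to dir() and, in typical setups, to interactive tab completion.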

@cbrnr
Contributor Author

cbrnr commented Feb 24, 2025

Maybe they have issues with the package similar to the ones I raised here?

@larsoner
Member

Could be, or could be that nobody has taken the time to switch over. Someone would have to look into it and investigate further.

Lazy loading can be useful for CLI tools or packages with many submodules, ...

One point missed here, I think, is that anything we change has consequences for any downstream package that does import mne. So if our import time goes up, so does theirs. We either force them to live with the import time bump, or ask them to nest their imports or do some lazy importing of their own.

The import time wasn't particularly long to begin with.

One more point of data here... the original PR that added it had timings that went from ~500 ms to ~160 ms (#2001). Looking a bit now:

$ time python -c "import mne"
real	0m0.315s
$ time python -c "import mne; import scipy.linalg"
real	0m0.433s
$ time python -c "import mne; import scipy.linalg; from sklearn.base import BaseEstimator"
real	0m0.943s
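The time numbers above include interpreter startup and vary from run to run. A rough Python sketch that takes the best of several runs, each in a fresh subprocess, and subtracts bare interpreter startup; time_statement and import_overhead are hypothetical helper names, and standard-library modules stand in for statements like "import mne; import scipy.linalg" so the example runs anywhere.

```python
import subprocess
import sys
import time


def time_statement(statement: str, repeats: int = 5) -> float:
    """Best-of-N wall-clock seconds to run `statement` in a fresh interpreter."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # A new process every time, so nothing is already cached in sys.modules.
        subprocess.run([sys.executable, "-c", statement], check=True)
        best = min(best, time.perf_counter() - start)
    return best


def import_overhead(statement: str, repeats: int = 5) -> float:
    """Seconds attributable to `statement` beyond bare interpreter startup."""
    return time_statement(statement, repeats) - time_statement("pass", repeats)


for stmt in ("import json", "import json; import decimal"):
    print(f"{stmt!r}: {import_overhead(stmt) * 1000:.1f} ms")
```

Taking the minimum rather than the mean reduces the influence of disk caching and background load; differences of a few milliseconds are still within noise.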

As someone who uses MNE in other packages that I launch from the CLI a lot (i.e., psychophysics experiments), having import mne creep up to half a second (or a full second!) is pretty painful. So if we do revert lazy loading, I think we need to re-nest a lot of imports (probably the same list as before we added lazy_loader in the first place, plus the test that checked it). And making sklearn a hard dependency that gets imported on import mne seems like a bad idea.

@cbrnr
Contributor Author

cbrnr commented Feb 24, 2025

No objection to nesting the slow imports as we did previously (including sklearn).
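For reference, "nesting" means moving an expensive import from module level into the function or method that needs it. A minimal sketch, with the standard library's statistics module standing in for a slow dependency such as scipy.linalg; summarize is a hypothetical function.

```python
def summarize(values: list[float]) -> float:
    """Return the median, importing the dependency only when actually needed."""
    # Nested import: the cost is paid on the first call, not when the package
    # itself is imported. Later calls find the module cached in sys.modules.
    import statistics

    return statistics.median(values)


print(summarize([3.0, 1.0, 2.0]))  # 2.0
```

This works for functions, but not for a class that must inherit from a dependency's base class when its module is imported, which is the mne.decoding/sklearn problem raised in the next comment.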

@larsoner
Member

sklearn isn't as simple. Copying from #13124 (comment) so we can discuss in one place:

For sklearn we have a harder decision. Without lazy loading mne.decoding we basically have two choices:

  1. Require people to import mne.decoding before accessing anything in that namespace. So you could no longer do:
    import mne
    mne.decoding.CSP(...)
    
    You would need to add import mne.decoding instead (or from mne.decoding import CSP, etc.). A lot of other packages work this way, such as sklearn (and SciPy did at some point, and might still).
  2. Make sklearn a hard requirement of MNE.

I guess I'd vote for (1). It's a bit weird if we do it just for one submodule, but I guess it's justified because it's the only one that requires an optional dependency for its class hierarchy.

Beyond that I'm not sure how we'd get classes in mne.decoding to properly subclass sklearn classes (e.g., TransformerMixin, MetaEstimatorMixin, etc.). Before lazy loading we had our own duplicated, half-baked, partially incorrect variants of these subclasses, which we definitely should not go back to. It's a big maintenance burden and constantly breaking in addition to being a bad violation of DRY, not working with sklearn class validation tools, etc.
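Option (1) is ordinary Python behavior: a subpackage that the package's __init__.py does not import is not available as an attribute until something imports it. A self-contained sketch, with a hypothetical package eagerpkg standing in for mne, built in a temporary directory:

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_explicit_submodule_import() -> tuple[bool, bool]:
    """Show that a subpackage not imported by __init__.py is absent until imported."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "eagerpkg"
        (pkg / "decoding").mkdir(parents=True)
        # __init__.py deliberately does NOT import the decoding subpackage
        # (in the MNE case, that subpackage would pull in sklearn).
        (pkg / "__init__.py").write_text("")
        (pkg / "decoding" / "__init__.py").write_text("class CSP:\n    pass\n")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            eagerpkg = importlib.import_module("eagerpkg")
            before = hasattr(eagerpkg, "decoding")  # like mne.decoding.CSP failing
            importlib.import_module("eagerpkg.decoding")  # like "import mne.decoding"
            after = hasattr(eagerpkg, "decoding")
        finally:
            # Undo all global state so the demo can be run repeatedly.
            sys.path.remove(tmp)
            sys.path_importer_cache.pop(tmp, None)
            for name in ("eagerpkg.decoding", "eagerpkg"):
                sys.modules.pop(name, None)
    return before, after


print(demo_explicit_submodule_import())  # (False, True)
```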

@hoechenberger
Member

Before lazy loading we had our own duplicated, half-baked, partially incorrect variants of these subclasses, which we definitely should not go back to.

+100

@drammock
Member

drammock commented Feb 24, 2025

Well, the ship may have already sailed, but I'd like to take the time to enumerate some arguments on both sides of this issue.

in favor of keeping lazy loader

  1. import mne is faster. Even if it's less than 200 ms, abandoning it now will be a performance regression. In a single interactive session you might not notice, but over dozens of CI runs per day (many of which we pay for) those seconds will add up. MNE import time also impacts import time of all downstream projects that themselves import MNE.
  2. we get the benefit of nesting expensive dependency imports without actually having to nest them. Previously there were many PRs where a contributor failed to nest a SciPy import, which caused a test failure, which often they didn't know how to interpret (or even access) meaning maintainer time was spent explaining, coaching, and/or fixing the problem. With lazy_loader, there's no longer a need to nest external dependencies, which aligns with contributors' default expectations.
  3. It's not causing us any problems. MNE-Python isn't experiencing any of the problems cited by @cbrnr:
  4. Less churn for contributors. As mentioned in (2) above, our import preference / policy now aligns with what people want/expect to do anyway (put external imports at the top of the file). It's a simpler system than what we had before (only some external imports got nested), and for folks who have been contributing in the last few years, I worry that we'll look a bit scattered (and contributors will be confused) going from nested to non-nested and back to nested in the span of less than 2 years.
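Point (2) refers to a test that failed when a contributor forgot to nest an expensive import. A minimal sketch of how such a check can work: run an import in a fresh interpreter and inspect sys.modules. The helper names are hypothetical, and the standard library's json package stands in for mne so the example runs anywhere.

```python
import subprocess
import sys


def modules_after(statement: str) -> set[str]:
    """Names in sys.modules after running `statement` in a fresh interpreter."""
    code = f"import sys\n{statement}\nprint('\\n'.join(sorted(sys.modules)))"
    result = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True, check=True
    )
    return set(result.stdout.split())


def unexpected_imports(statement: str, must_stay_unloaded: set[str]) -> set[str]:
    """Return the modules from `must_stay_unloaded` that `statement` pulled in."""
    return modules_after(statement) & must_stay_unloaded


# For MNE, a hypothetical CI check might assert that
# unexpected_imports("import mne", {"scipy.linalg", "sklearn"}) == set()
print(unexpected_imports("import json", {"json.decoder", "json.tool"}))
# {'json.decoder'}: json's __init__ imports it eagerly, json.tool stays unloaded
```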

against keeping lazy loader

(I've kept these in the same order as in @cbrnr's original post.)

  1. It's not a built-in feature of Python (and likely never will be).
  2. It uses .pyi files for purposes they weren't intended for.
  3. The speedup doesn't matter. "Delaying imports until first use might be more disruptive than a slightly longer initial import time."
  4. lazy_loader is not actively maintained.

response to arguments against keeping it

It's not a built-in feature of Python.

I can't fathom why this should hold any weight. Lots of things will never make it into the standard library.

It uses .pyi files for purposes they weren't intended for

This is the only argument I find somewhat convincing. It seems unlikely, but it is possible that Python will someday set stricter rules about .pyi files that would interfere with them being used in this way. More concerning to me, though, is what might happen when we decide to make MNE fully type-annotated, and decide that we want/need the .pyi files for, e.g., multiple dispatch definitions.

The speedup doesn't matter

See point (1) in favor of keeping lazy_loader.

lazy_loader is not actively maintained

Maybe. I'd say it's a bit too early to tell. In scientific-python/lazy-loader#128 Stéfan said "ping me if you don't get it sorted out" and he was never pinged again, so I feel it's unfair to count that as evidence of a lack of maintenance. There's also a bit of a built-in backstop, in that lazy_loader is housed within https://github.com/scientific-python which has a lot of very competent people who could step in / take over if Stéfan ever did go missing.

@cbrnr
Contributor Author

cbrnr commented Feb 25, 2025

I think the ideal solution would be to not import everything into the global mne namespace, so having to import mne.decoding or even import mne.io would be perfectly fine for me. The package has grown a lot, and rarely anyone needs all functionality in their work. This would be a Pythonic solution in literally every way:

  • It would make import mne at least as fast as with lazy loader.
  • We don't need to nest any imports.
  • It is the most expected way to do imports in Python, which is the least amount of churn for developers.
  • It does not misuse .pyi stubs.

I don't want to reiterate my points, they haven't changed, but I do want a consensus solution and I am definitely not forcing anything onto the project. If anyone sees any other options, please let us know! Maybe there is a way to keep the current lazy loader, but get rid of the .pyi stubs (I would not prefer this, but it is better than the status quo)?

@cbrnr
Contributor Author

cbrnr commented Feb 25, 2025

Just one quick response to the point on CI runs: if we switch from pip to uv pip (not even plain uv), we get the following install times for mne[dev] (with cached downloads of course):

  • pip install "mne[dev]": 22s
  • uv pip install "mne[dev]": 0.35s

So that's about a 63x speedup, which I think makes arguing about 200 ms of extra import time a moot point.

@hoechenberger
Member

Marginally (?) related, but has anyone considered that the metrics we're using here aren't really useful anyway? Just comparing import mne times is pointless; one needs to measure the cumulative import time until everything I need for a given action or analysis has been loaded. Making everything import "lazily", i.e., deferring most imports to "later", and then simply leaving them out of the measurements is probably not a valid approach.
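One way to get that cumulative view is CPython's -X importtime flag (Python 3.7+), which reports, on stderr, the self and cumulative import time of every module loaded. A minimal sketch that sums the per-module self times for a whole statement; total_import_time_us is a hypothetical helper, and for MNE you would pass the full set of imports an analysis needs (e.g., "import mne; import mne.decoding"). Interpreter startup imports are included, so compare against "pass" as a baseline.

```python
import subprocess
import sys


def total_import_time_us(statement: str) -> tuple[int, dict[str, int]]:
    """Sum CPython's per-module "self" import times (in microseconds) for `statement`."""
    result = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", statement],
        capture_output=True,
        text=True,
        check=True,
    )
    self_us: dict[str, int] = {}
    # Each line looks like "import time: <self> | <cumulative> | <module>".
    for line in result.stderr.splitlines():
        if not line.startswith("import time:"):
            continue
        fields = line[len("import time:"):].split("|")
        if len(fields) != 3 or not fields[0].strip().isdigit():
            continue  # skip the header row
        self_us[fields[2].strip()] = int(fields[0])
    return sum(self_us.values()), self_us


baseline, _ = total_import_time_us("pass")
total, per_module = total_import_time_us("import json; import decimal")
print(f"extra import time: {(total - baseline) / 1000:.1f} ms")
```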

@drammock
Member

I think the ideal solution would be to not import everything into the global mne namespace, so having to import mne.decoding or even import mne.io would be perfectly fine for me.

To me this is a non-starter. Interactive API exploration is something I use a lot with both MNE (where sometimes I forget the exact name of a func, so I mne.viz.<TAB> to find it), or just to explore what an unfamiliar package offers.

the metrics we're using here are not really useful

It's true that the import "savings" doesn't just disappear; if you use mne.decoding then the import overhead of that submodule will occur when it is first used. But as a maintainer, most of my interactive MNE sessions are "quick reproducers" for debugging; I'd say more than 90% of my sessions never use mne.decoding (or beamformer, or inverse_sparse, or ...). To spell it out: each session suffers the import time of only the modules needed / used.

(side note: this will become a much bigger deal when #13059 lands, because hedtools has an import time between 1.5 and 2 seconds. Sure, it can be nested inside the HEDAnnotations class, but there are other reasons mentioned above why nesting imports is undesirable.)

TL;DR: comparing the import mne time is not a 100% fair comparison, but it's also not "pointless". It tells you the maximum time savings per import. I don't have data on which submodules are most commonly used in the interactive sessions of all of our users, but my personal experience suggests that most of the time most of the submodules aren't needed.

@cbrnr
Contributor Author

cbrnr commented Feb 25, 2025

To me this is a non-starter. Interactive API exploration is something I use a lot with both MNE (where sometimes I forget the exact name of a func, so I mne.viz.<TAB> to find it), or just to explore what an unfamiliar package offers.

I don't use it at all. I guess we won't be able to come up with statistics on how many users need that feature. In addition, you can still interactively explore if you first import mne.viz (to stick with your example, it would be sufficient to just not export the largest/slowest modules to the mne namespace).

@drammock
Member

you can still interactively explore if you first import mne.viz

which is even slower than importing everything eagerly in the first place.

(to stick with your example, it would be sufficient to just not export the largest/slowest modules to the mne namespace).

which breaks interactive API exploration. How is that "sufficient" exactly?


6 participants