Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework Session and Package collection #11646

Merged
merged 3 commits into from
Dec 30, 2023
Merged

Conversation

bluetech
Copy link
Member

This PR implements the proposals in #7777.

@bluetech bluetech added this to the 8.0 milestone Nov 27, 2023
@bluetech bluetech force-pushed the pkg-collect branch 5 times, most recently from 6de3469 to 39f8ae6 Compare November 29, 2023 11:45
Copy link
Member

@RonnyPfannschmidt RonnyPfannschmidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its amazing to see this massive undertaking take shape

there are some detail devils and i noted 2 nitpicks

i will try to do some deeper digging into the python collector changes

i'll avoid bikesheding the Dir/Directory naming - naming rights for the implementor !

testing/test_config.py Show resolved Hide resolved
@@ -489,7 +489,7 @@ def test_collect_topdir(self, pytester: Pytester) -> None:
# assert root2 == rcol, rootid
colitems = rcol.perform_collect([rcol.nodeid], genitems=False)
assert len(colitems) == 1
assert colitems[0].path == p
assert colitems[0].path == topdir
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one seems off - the path there should match the nodeid

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old tree was

<Session>
  <Module test_collect_topdir.py>

Now it's

<Session>
  <Dir test_collect_topdir0>
    <Module test_collect_topdir.py>

So now the collection (with genitems=False) collects the Dir not the Module.

I'm not sure what was the original purpose of this test (it goes back to e2683f4) but the change here is intentional. Do you think there is something that relies on this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perform collect towards nodes ids yielded the concrete files, my understanding is that it ought to keep doing that

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean that you think it should not emit Session -> Dir -> Module here, but Session -> Module like before?

I'll try to explain the rationale. Consider for example

repo/
  pytest.ini
  tests/
     test_it.py

Before this PR, the tree is

<Module tests/test_it.py>
  <Function test_it>

but if I add a file tests/__init__.py (i.e. make tests a package), it becomes

<Package tests>
  <Module test_it.py>
    <Function test_it>

The idea in this PR is to remove the inconsistency with Package. In this PR, without __init__.py the tree is

<Dir repo>
  <Dir tests>
    <Module test_it.py>
      <Function test_it>

and with __init__.py

<Dir repo>
  <Package tests>
    <Module test_it.py>
      <Function test_it>

Now Package is the python-plugin-ified Directory.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of that function is to go from a list of nodeids/paths to the initial collectors

Aka the concrete "top-level" fs collectors that are files

The alternative is to ensure all directory collection is limited to filtering of the initial args

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I understand what you're saying now. perform_collect still does this in this PR.

The test is a bit weird. Basically it runs session.perform_collect([""]). What is the initial collector for this? From a quick test, pytest treats this like pytest . i.e. current directory. So the new behavior seems right to me. Previously it picked out the test file simply because it was the first and there was no intermediary node.

For illustration, if I add an __init__.py file in main (not this PR),

diff --git a/testing/test_collection.py b/testing/test_collection.py
index ca2e2b731..424bb9933 100644
--- a/testing/test_collection.py
+++ b/testing/test_collection.py
@@ -478,6 +478,7 @@ class TestCustomConftests:
 class TestSession:
     def test_collect_topdir(self, pytester: Pytester) -> None:
         p = pytester.makepyfile("def test_func(): pass")
+        pytester.makepyfile(__init__="")
         id = "::".join([p.name, "test_func"])
         # XXX migrate to collectonly? (see below)
         config = pytester.parseconfig(id)

then colitems[0] is the Package.

testing/acceptance_test.py Show resolved Hide resolved
Copy link
Member

@nicoddemus nicoddemus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work @bluetech! Thanks a lot for working on this. 👍

Left a bunch of minor suggestions, please take a look.

if direntry.is_dir():
if direntry.name == "__pycache__":
continue
ihook = self.ihook
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given self.ihook is a property which calls self.session.gethookproxy, would it be better to move this outside the loop?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ihook returns a FSHookProxy with a static remove_mods, and is thus very sensitive to its placement. It's something I'm hoping to improve in the future (though it's not easy to make something fast for this in pluggy), but with how pytest is now ihook cannot be safely hoisted.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i believe within the loop it is safe as the remove_mods depends on self.fspath

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC the exact problem was with conftest plugins that are loaded after the ihook and are then not filtered out by remove_mods

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my rough understanding is, with the usage if directories, conftests don't get added until after the collect phase,
as such it presents as safe from my perspective

do we consider conftests in packages at their construction or at their collection?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After taking a second and third look at this, I've made a change which removes an ugly hack from this PR, allows the hoisting, and removes conftest loading from gethookproxy << this has been a real pain in my neck.

Recap (a little complicated, sorry):

Current main

In current main, calls to gethookproxy/ihook are the trigger for loading non-initial conftests. This basically means that conftest loading is done almost as a random side-effect, uncontrolled and very non-obvious. And it also dashes any hope of making gethookproxy faster (gethookproxy shows up prominently in pytest profiles).

I've wanted to improve this for a while, #11268 was the latest step towards that.

PR before change

In this PR, I ran into a problem.

Previously, Session and Package did all of the directory traversals inside of their collect, which loaded the conftests as a side effect. If the conftest loading failed, it will occur inside of the collect() and cause it to be reported as a failure.

Now I've changed things such that Session.collect and Package.collect no longer recurse, but just collect their immediate descendants, and genitems does the recursive expansion work.

The problem though is that genitems() doesn't run inside of specific collector's collect context. So when it loads a conftest, and the conftest loading fails, the exception isn't handled by any CollectReport and causes an internal error instead.

The way I've fixed this problem is by loading the conftests eagerly in a pytest_collect_directory post-wrapper, but only during genitems to make sure the directory is actually selected.

This solution in turn caused the conftests to be collected too early; specifically, the plugins are loaded during the parent's collect(), one after the other as the directory entries are collected. So when the ihook is hoisted out of the loop, new plugins are loaded inside the loop, and due to the way the hook proxy works, they are added to the ihook even though they're supposed to be scoped to the child collectors. So no hoisting.

PR after change

Now I've come up with a better solution: since now the collection tree actually reflects the filesystem tree, what we really want is to load the conftest of a directory right before we run its collect(). A conftest can affect a directory's collect() (e.g. with a pytest_ignore_collect hookimpl), but it cannot affect how the directory node itself is collected. So I just moved the conftest loading to be done right before calling collect, but still inside the CollectReport context.

This allows the hoisting, and also removes conftest loading from gethookproxy since it's no longer necessary. And it will probably enable further cleanups. So I'm happy with it.

src/_pytest/main.py Outdated Show resolved Hide resolved
changelog/7777.breaking.rst Outdated Show resolved Hide resolved
changelog/7777.breaking.rst Show resolved Hide resolved
doc/en/example/customdirectory.rst Outdated Show resolved Hide resolved
src/_pytest/main.py Outdated Show resolved Hide resolved
src/_pytest/main.py Show resolved Hide resolved
src/_pytest/python.py Outdated Show resolved Hide resolved
src/_pytest/python.py Outdated Show resolved Hide resolved
def sort_key(entry: "os.DirEntry[str]") -> object:
return (entry.name != "__init__.py", entry.name)

config = self.config
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From this point onward Dir.collect and Package.collect are essentially identical: can we move this to a common function somewhere and reuse it in both places? They are non-trivial and worth avoiding the duplication IMHO.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I live by the Rule of three 😀

But more seriously, I started out with this, with some common private function on Directory, but this stuff shouldn't be on Directory so I relented.

My goal is to simplify this loop such that it doesn't need deduplication because it's simple enough (which will also help plugins to implement it correctly). This things that pop out are:

  • The __pycache__ check can move to pytest_ignore_collect -- slight performance hit but probably minor.
  • If we handle the ihook issue we can hoist it out.
  • If Path(direntry.path) is not super slow we can hoist it out
  • Maybe the isinitpath check can move to pytest_ignore_collect as well?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good!

@bluetech
Copy link
Member Author

bluetech commented Dec 2, 2023

Thanks for the review @nicoddemus, I pushed a commit addressing some your comments, and replied to the others (let me know if I missed any).

Copy link
Member

@nicoddemus nicoddemus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! There's one minor comment to address, but is minor and up to you. 👍

Great work on this, this clears up a lot of the code and makes the collection tree much more sane and predictable. 👍

src/_pytest/nodes.py Show resolved Hide resolved
Will be used in upcoming commit.
--- Current main

In current main (before pervious commit), calls to gethookproxy/ihook
are the trigger for loading non-initial conftests. This basically means
that conftest loading is done almost as a random side-effect,
uncontrolled and very non-obvious. And it also dashes any hope of making
gethookproxy faster (gethookproxy shows up prominently in pytest
profiles).

I've wanted to improve this for a while, pytest-dev#11268 was the latest step
towards that.

--- PR before change

In this PR, I ran into a problem.

Previously, Session and Package did all of the directory traversals
inside of their collect, which loaded the conftests as a side effect. If
the conftest loading failed, it will occur inside of the collect() and
cause it to be reported as a failure.

Now I've changed things such that Session.collect and Package.collect no
longer recurse, but just collect their immediate descendants, and
genitems does the recursive expansion work.

The problem though is that genitems() doesn't run inside of specific
collector's collect context. So when it loads a conftest, and the
conftest loading fails, the exception isn't handled by any CollectReport
and causes an internal error instead.

The way I've fixed this problem is by loading the conftests eagerly in a
pytest_collect_directory post-wrapper, but only during genitems to make
sure the directory is actually selected.

This solution in turn caused the conftests to be collected too early;
specifically, the plugins are loaded during the parent's collect(), one
after the other as the directory entries are collected. So when the
ihook is hoisted out of the loop, new plugins are loaded inside the
loop, and due to the way the hook proxy works, they are added to the
ihook even though they're supposed to be scoped to the child collectors.
So no hoisting.

--- PR after change

Now I've come up with a better solution: since now the collection tree
actually reflects the filesystem tree, what we really want is to load
the conftest of a directory right before we run its collect(). A
conftest can affect a directory's collect() (e.g. with a
pytest_ignore_collect hookimpl), but it cannot affect how the directory
node itself is collected. So I just moved the conftest loading to be
done right before calling collect, but still inside the CollectReport
context.

This allows the hoisting, and also removes conftest loading from
gethookproxy since it's no longer necessary. And it will probably enable
further cleanups. So I'm happy with it.
@bluetech
Copy link
Member Author

Had to rebase due to some conflicts. I left the "Different fix for conftest loading" separate, I think it will be useful to have the diff in the history.

@bluetech
Copy link
Member Author

@RonnyPfannschmidt do you intend to review this or should I just merge it? It's currently holding the pytest 8.0 release so it would be good to resolve it one way or the other.

@RonnyPfannschmidt
Copy link
Member

Im Still down with a Cold, so don't block on me

@bluetech
Copy link
Member Author

OK, hope you'll feel better soon!

@BeyondEvil
Copy link

This PR made some of my tests start to fail with PermissionError: [Errno 13] Permission denied: '/tmp/snap-private-tmp/__init__.py'

https://github.com/pytest-dev/pytest-metadata/actions/runs/7365057317/job/20046008077

The tests pass against c7ee5599, but fail on this merge.

Running it locally I also get some warnings:

warnings

=============================== warnings summary ===============================
../../../../../../../../../Users/jimbrannlund/dev/pytest/pytest/src/_pytest/config/__init__.py:1211
  /Users/jimbrannlund/dev/pytest/pytest/src/_pytest/config/__init__.py:1211: PytestAssertRewriteWarning: Module already imported so cannot be rewritten: pytest_metadata
    self._mark_plugins_for_rewrite(hook)

../../../../../../../../../Users/jimbrannlund/dev/pytest/pytest/src/_pytest/config/__init__.py:1211
  /Users/jimbrannlund/dev/pytest/pytest/src/_pytest/config/__init__.py:1211: PytestAssertRewriteWarning: Module already imported so cannot be rewritten: tests
    self._mark_plugins_for_rewrite(hook)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

I tried to figure out exactly where it goes wrong, but failed. It's entirely possible there's something wrong in my plugin that was just surfaced by the changes in this PR.

Please let me know what information I can contribute with.

Ping @bluetech

@bluetech
Copy link
Member Author

bluetech commented Jan 6, 2024

@BeyondEvil thanks for testing, I'll check it. Can you report it as a separate issue please?

@BeyondEvil
Copy link

BeyondEvil commented Jan 6, 2024

@BeyondEvil thanks for testing, I'll check it. Can you report it as a separate issue please?

Sure, I wasn't sure if I should do that given it's not a release, just a commit off main branch. 😅

@BeyondEvil
Copy link

#11781

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants