Cache: Crash on Windows when using pytest-xdist #12671

Closed
criemen opened this issue Jul 30, 2024 · 7 comments
Labels
stale; status: needs information (reporter needs to provide more information; can be closed after 2 or more weeks of inactivity)

Comments

criemen (Contributor) commented Jul 30, 2024

pytest: 8.3.2
OS: Windows Server 2022 (GH Actions)

We're seeing the following spurious failure on Windows:

pytest -n auto -vv --durations=15 --durations-min=5 --codeql=built --shard-id=$((${SHARD%/*} - 1)) --num-shards=${SHARD#*/} integration-tests/all-platforms/python
============================= test session starts =============================
platform win32 -- Python 3.12.4, pytest-8.3.2, pluggy-1.5.0 -- C:\Users\runneradmin\AppData\Local\pypoetry\Cache\virtualenvs\non-package-mode-kOw55a-5-py3.12\Scripts\python.exe
cachedir: .pytest_cache
rootdir: C:\a\semmle-code\semmle-code
configfile: pyproject.toml
plugins: shard-0.1.2, timeout-2.3.1, xdist-3.6.1
timeout: 1200.0s
timeout method: thread
timeout func_only: False
created: 8/8 workers
collected 1 item
Running 1 items in this shard: integration-tests/all-platforms/python/database-create/test.py::test
codeql: v2.18.2+202407301318 (eded3f82a27ca3c201d6512c071e1956af30cce2) at C:\a\semmle-code\semmle-code\target\intree\codeql  pytest  dist\codeql.EXE
8 workers [1 item]

scheduling tests via LoadScheduling

=================================== ERRORS ====================================
________________________ ERROR collecting test session ________________________
<frozen genericpath>:112: in samefile
    ???
E   FileNotFoundError: [WinError 2] The system cannot find the file specified: 'C:\\a\\semmle-code\\semmle-code\\pytest-cache-files-u0joql63'
____________________________ ERROR collecting gw3 _____________________________
Different tests were collected between gw0 and gw3. The difference is:
--- gw0

+++ gw3

@@ -1 +0,0 @@

-integration-tests/all-platforms/python/database-create/test.py::test
To see why this happens see Known limitations in documentation
=========================== short test summary info ===========================
ERROR  - FileNotFoundError: [WinError 2] The system cannot find the file specified: 'C:\\a\\semmle-code\\semmle-code\\pytest-cache-files-u0joql63'
ERROR gw3
============================== 2 errors in 2.63s ==============================
Error: Process completed with exit code 1.

Note that we're collecting one test (so test collection is fast), and we have code that is creating cache entries at the same time.
Probably, this is in the same area as #12580 (which I filed and fixed), but it's not entirely clear to me what's happening here.

I'm mainly looking for advice on how to get a proper stack trace here; after that I'm happy to investigate further on my own.
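
One generic way to capture a fuller trace from worker processes is to wrap the suspect code and write the complete traceback to a per-worker file before re-raising. The following is only a sketch under assumptions: `dump_traceback` and the log directory are hypothetical names, and it can only record exceptions raised inside code you wrap yourself (for example in `conftest.py`), not ones raised elsewhere in pytest. `PYTEST_XDIST_WORKER` is the environment variable pytest-xdist sets in each worker.

```python
import os
import traceback
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def dump_traceback(log_dir: Path):
    """Write the full traceback of any exception to a per-worker file, then re-raise."""
    # pytest-xdist sets PYTEST_XDIST_WORKER (e.g. "gw3") in each worker process;
    # fall back to "main" when running without xdist.
    worker = os.environ.get("PYTEST_XDIST_WORKER", "main")
    try:
        yield
    except Exception:
        log_dir.mkdir(parents=True, exist_ok=True)
        log_file = log_dir / f"traceback-{worker}-{os.getpid()}.txt"
        log_file.write_text(traceback.format_exc(), encoding="utf-8")
        raise  # keep the original failure visible to pytest
```

Usage would look like `with dump_traceback(Path("tb-logs")): ...` around the cache-related calls; the resulting files could then be uploaded as CI artifacts.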

nicoddemus (Member) commented Jul 30, 2024

This error means that two workers collected different sets of tests. That is usually caused by an error occurring during collection.

> creating cache entries at the same time

Can you elaborate on this? Is that related to the shard plugin that appears in the terminal output?

If that plugin does something during collection/initialization, it probably needs to be adjusted in order to account for xdist collecting workers in parallel.
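
As an illustration of what "accounting for parallel workers" can mean for shared on-disk state, here is a minimal sketch of a race-tolerant directory setup: each process prepares a private staging directory and tries to rename it into place, treating a failed rename as "another worker got there first". The names `publish_dir` and `ensure_shared_dir` are hypothetical, and this is not the code of pytest, pytest-xdist, or the shard plugin.

```python
import shutil
import tempfile
from pathlib import Path


def publish_dir(staging: Path, target: Path) -> bool:
    """Try to move a fully prepared staging directory into place.

    Returns True if this process published it, False if another one won.
    """
    try:
        staging.rename(target)
    except OSError:
        # Renaming onto an existing, non-empty directory fails on both
        # Windows and POSIX. Treat that as "another worker won the race".
        shutil.rmtree(staging, ignore_errors=True)
        if not target.is_dir():
            raise  # some other failure; do not hide it
        return False
    return True


def ensure_shared_dir(target: Path) -> bool:
    """Create `target` with its initial contents exactly once across processes."""
    if target.is_dir():
        return False
    target.parent.mkdir(parents=True, exist_ok=True)
    # Stage next to the target so the rename stays on one filesystem.
    staging = Path(tempfile.mkdtemp(prefix="staging-", dir=target.parent))
    (staging / "README.txt").write_text(
        "shared state for test workers\n", encoding="utf-8"
    )
    return publish_dir(staging, target)
```

The point of the design is that losing the race is a normal outcome rather than an error, and that no code path assumes the staging directory still exists after the rename attempt.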

@nicoddemus nicoddemus added the status: needs information label Jul 30, 2024
criemen (Contributor, Author) commented Jul 30, 2024

The problem I see is that one worker crashes with

E   FileNotFoundError: [WinError 2] The system cannot find the file specified: 'C:\\a\\semmle-code\\semmle-code\\pytest-cache-files-u0joql63'

which is (presumably) during a call to this function (the directory name matches what that code is doing). I don't see which function there would be throwing a FileNotFoundError though, hence my question of how to get a better backtrace.

Because one of the workers crashes with this error, it doesn't finish test collection, which is why the second error is reported. The root cause, though, is that cache initialization fails for some reason. I already fixed one related problem in #12580.

We are indeed doing things in the worker init/during test collection using the pytest cache in our conftest.py, but as far as I'm aware, none of that is unsafe in conjunction with xdist.
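
For context on the `samefile` frame in the traceback: `os.path.samefile` stats both of its arguments, so it raises `FileNotFoundError` if either path has disappeared by the time it is called, for example because another process renamed or removed a temporary directory. The sketch below only demonstrates that behaviour; `samefile_or_none` is a hypothetical helper, and the log does not show which caller invoked `samefile`.

```python
import os


def samefile_or_none(path_a: str, path_b: str):
    """Like os.path.samefile, but return None if either path has vanished."""
    try:
        # samefile calls os.stat() on both paths before comparing them.
        return os.path.samefile(path_a, path_b)
    except FileNotFoundError:
        # One of the paths no longer exists (or never did).
        return None
```

Whether swallowing the error like this is appropriate depends on the caller; the sketch is meant to show where such an exception can come from, not to suggest a fix.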

RonnyPfannschmidt (Member) commented

More details are needed to figure out whether we hit a platform-specific race condition.

github-actions bot (Contributor) commented

This issue is stale because it has the status: needs information label and requested follow-up information was not provided for 14 days.

@github-actions github-actions bot added the stale label Aug 14, 2024
criemen (Contributor, Author) commented Aug 14, 2024

The easy way to reproduce this doesn't work: https://github.com/criemen/pytest-crash-win doesn't crash. I'll need to put more effort into distilling what we're doing internally into an external reproducer.

@nicoddemus nicoddemus removed the stale label Aug 14, 2024
github-actions bot (Contributor) commented

This issue is stale because it has the status: needs information label and requested follow-up information was not provided for 14 days.

@github-actions github-actions bot added the stale label Aug 29, 2024
github-actions bot (Contributor) commented Sep 5, 2024

This issue was closed because it has the status: needs information label and follow-up information has not been provided for 7 days since being marked as stale.

@github-actions github-actions bot closed this as not planned Sep 5, 2024