Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-106176, GH-104702: Fix reference leak when importing across multiple threads #108497

Merged

Conversation

brettcannon
Copy link
Member

@brettcannon brettcannon commented Aug 25, 2023

When importing, import deadlock detection tracks which threads are importing which modules in a list. Prior to this change, a list was created per thread but never disposed of. Now, the list is disposed of properly using weakrefs to guarantee GC doesn't interrupt the cleanup.

Co-Authored-By: Kirill Podoprigora [email protected]

@brettcannon
Copy link
Member Author

!buildbot .*Windows11.*Refleaks

@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @brettcannon for commit 50e4c5d 🤖

The command will test the builders whose names match following regular expression: .*Windows11.*Refleaks

The builders matched are:

  • AMD64 Windows11 Refleaks PR

@Eclips4
Copy link
Member

Eclips4 commented Aug 25, 2023

This is probably what we should have done.
I'll take a closer look later today (UTC+3).
However, I'd like to hear @exarkun opinion if he has free time.

@brettcannon
Copy link
Member Author

FYI the Windows 11 refleak builder has not come back yet, so the green CI is a little misleading until we can get confirmation this PR doesn't leak under Windows.

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm now curious if it does fix the issue.

Lib/importlib/_bootstrap.py Outdated Show resolved Hide resolved
Lib/importlib/_bootstrap.py Outdated Show resolved Hide resolved
Lib/importlib/_bootstrap.py Outdated Show resolved Hide resolved
@vstinner
Copy link
Member

I'm still able to reproduce the leak in the main branch:

vstinner@WIN C:\victor\python\main>python -m test test_import -m test_concurrency -R 3:3

test_import leaked [38, 36, 40] references, sum=114
test_import leaked [38, 36, 40] memory blocks, sum=114

And with this PR: this is no leak, it works!

vstinner@WIN C:\victor\python\main>python -m test test_import -m test_concurrency -R 3:3
Running Debug|x64 interpreter...
0:00:00 Run tests sequentially
0:00:00 [1/1] test_import
beginning 6 repetitions
123456
......

== Tests result: SUCCESS ==

1 test OK.

Total duration: 7.7 sec
Tests result: SUCCESS

Well done @brettcannon and @Eclips4!

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM since it does fix the leak :-) I left a review, feel free to address it, or ignore my remarks ;-) Thanks for fixing Windows Refleaks buildbots!

Copy link
Member

@Eclips4 Eclips4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to all of Victor's words.
LGTM!
(I've check it on my windows setup and this solution seems fine, refleaks are gone)

@Eclips4
Copy link
Member

Eclips4 commented Aug 26, 2023

I would prefer to backport it to 3.12, because:

  1. Original issue was introduced in 3.12
  2. This problem looks like one of real situations which can happen.

@Eclips4
Copy link
Member

Eclips4 commented Aug 27, 2023

This also solves the #104702.
(and that's wonderful!)
For explanation why test_pickle is also affected by this problem, see my comment: #106207 (comment)

@brettcannon brettcannon added the needs backport to 3.12 bug and security fixes label Aug 28, 2023
@brettcannon
Copy link
Member Author

Thanks for the reviews and verification! I should be able to get this merged at worst by Friday, but hopefully sooner.

@brettcannon brettcannon changed the title GH-106176: Fix reference leak when importing across multiple threads GH-106176, GH-104702: Fix reference leak when importing across multiple threads Aug 28, 2023
@brettcannon
Copy link
Member Author

I addressed the review comments from Victor, updated against main, flagged this for 3.12, and now I'm waiting for CI to go green again before I merge.

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the update!

@vstinner vstinner merged commit 5f85b44 into python:main Aug 29, 2023
@miss-islington
Copy link
Contributor

Thanks @brettcannon for the PR, and @vstinner for merging it 🌮🎉.. I'm working now to backport this PR to: 3.12.
🐍🍒⛏🤖

@bedevere-bot
Copy link

GH-108612 is a backport of this pull request to the 3.12 branch.

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Aug 29, 2023
…cross multiple threads (pythonGH-108497)

(cherry picked from commit 5f85b44)

Co-authored-by: Brett Cannon <[email protected]>
@bedevere-bot bedevere-bot removed the needs backport to 3.12 bug and security fixes label Aug 29, 2023
@vstinner
Copy link
Member

@brettcannon:

I addressed the review comments from Victor, updated against main, flagged this for 3.12, and now I'm waiting for CI to go green again before I merge.

Thanks. I took the freedom of merging your PR. It became difficult to read buildbot results these days, since there are more and more known failures. This fix should reduce the number of failed buildbots!

@vstinner
Copy link
Member

Again, thanks for the fix @brettcannon and @Eclips4!

@bedevere-bot
Copy link

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot aarch64 Fedora Stable LTO 3.x has failed when building commit 5f85b44.

What do you need to do:

  1. Don't panic.
  2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
  3. Go to the page of the buildbot that failed (https://buildbot.python.org/all/#builders/336/builds/3857) and take a look at the build logs.
  4. Check if the failure is related to this commit (5f85b44) or if it is a false positive.
  5. If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/all/#builders/336/builds/3857

Failed tests:

  • test.test_concurrent_futures.test_shutdown

Failed subtests:

  • test_interpreter_shutdown - test.test_concurrent_futures.test_shutdown.ProcessPoolSpawnProcessPoolShutdownTest.test_interpreter_shutdown

Summary of the results of the build (if available):

== Tests result: FAILURE then FAILURE ==

448 tests OK.

10 slowest tests:

  • test_gdb: 4 min 17 sec
  • test.test_multiprocessing_spawn.test_processes: 1 min 14 sec
  • test.test_concurrent_futures.test_wait: 1 min 12 sec
  • test_socket: 1 min 10 sec
  • test_signal: 1 min
  • test_subprocess: 56.7 sec
  • test.test_multiprocessing_forkserver.test_processes: 52.9 sec
  • test_io: 44.7 sec
  • test.test_multiprocessing_spawn.test_misc: 43.5 sec
  • test.test_asyncio.test_tasks: 41.3 sec

1 test failed:
test.test_concurrent_futures.test_shutdown

14 tests skipped:
test.test_asyncio.test_windows_events
test.test_asyncio.test_windows_utils test_devpoll test_ioctl
test_kqueue test_launcher test_startfile test_tkinter test_ttk
test_winconsoleio test_winreg test_winsound test_wmi
test_zipfile64

1 re-run test:
test.test_concurrent_futures.test_shutdown

Total duration: 4 min 44 sec

Click to see traceback logs
Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-aarch64.lto/build/Lib/test/test_concurrent_futures/test_shutdown.py", line 49, in test_interpreter_shutdown
    self.assertFalse(err)
AssertionError: b'Exception in thread Thread-1:\nTraceback (most recent call last):\n  File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-aarch64.lto/build/Lib/threading.py", line 1059, in _bootstrap_inner\n    self.run()\n  File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-aarch64.lto/build/Lib/concurrent/futures/process.py", line 339, in run\n    self.add_call_item_to_queue()\n  File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-aarch64.lto/build/Lib/concurrent/futures/process.py", line 394, in add_call_item_to_queue\n    self.call_queue.put(_CallItem(work_id,\n  File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-aarch64.lto/build/Lib/multiprocessing/queues.py", line 94, in put\n    self._start_thread()\n  File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-aarch64.lto/build/Lib/multiprocessing/queues.py", line 177, in _start_thread\n    self._thread.start()\n  File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-aarch64.lto/build/Lib/threading.py", line 978, in start\n    _start_new_thread(self._bootstrap, ())\nRuntimeError: can\'t create new thread at interpreter shutdown\nTraceback (most recent call last):\n  File "<string>", line 1, in <module>\n  File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-aarch64.lto/build/Lib/multiprocessing/spawn.py", line 122, in spawn_main\n    exitcode = _main(fd, parent_sentinel)\n               ^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-aarch64.lto/build/Lib/multiprocessing/spawn.py", line 132, in _main\n    self = reduction.pickle.load(from_parent)\n           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File "/home/buildbot/buildarea/3.x.cstratak-fedora-stable-aarch64.lto/build/Lib/multiprocessing/synchronize.py", line 115, in __setstate__\n    self._semlock = _multiprocessing.SemLock._rebuild(*state)\n                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nFileNotFoundError: [Errno 2] No such file or directory\n' is not false

Yhg1s pushed a commit that referenced this pull request Aug 29, 2023
… multiple threads (GH-108497) (#108612)

GH-106176, GH-104702: Fix reference leak when importing across multiple threads (GH-108497)
(cherry picked from commit 5f85b44)

Co-authored-by: Brett Cannon <[email protected]>
# Dictionary protected by the global import lock
# For a list that can have a weakref to it.
class _List(list):
pass
Copy link
Member

@Yhg1s Yhg1s Aug 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I think it would be nice to use __slots__ for this class, to reduce the size of each instance (especially since the setdefault call creates one each time, even when it isn't necessary).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As in you're going to make a PR to make that change, or you're hoping someone else will open an issue on your behalf to track the idea?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latter (although not on my behalf) -- I don't know if there's a reason not to do it, and I don't have the spare cycles to dig in and find out.

@brettcannon brettcannon deleted the issue-106176-_blocking_on-WeakValueDictionary branch August 29, 2023 17:44
@Yhg1s
Copy link
Member

Yhg1s commented Sep 18, 2023

FWIW, this does not seem to have resolved the refleak in test_import in 3.12 (see e.g. https://buildbot.python.org/all/#/builders/1125/builds/115) even though it does seem to have fixed them in main. Does anyone want to take a look at why test_import is still leaking in 3.12?

@Yhg1s
Copy link
Member

Yhg1s commented Sep 18, 2023

Nevermind, the leak was from a different issue (still not resolved on main, but more cleverly hidden).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants