
Inconsistent cross platform behavior when using multiprocessing in spawn mode #746

Closed
TCatshoek opened this issue Nov 14, 2022 · 5 comments
Labels
documentation Improvements or additions to documentation

Comments

@TCatshoek
Contributor

TCatshoek commented Nov 14, 2022

Hi Delgan! First of all, thanks for the awesome library.

I'm currently developing a tool that uses multiprocessing and needs to run on Windows, Linux, and macOS. To guarantee consistent behavior across platforms, we use multiprocessing in "spawn" mode everywhere.

However, when using multiprocessing in spawn mode on Linux and setting up logging as suggested here, logging anything in the subprocesses seems to freeze or kill them.

The following minimal example should reproduce the issue on Linux:

main.py:

import multiprocessing
import sys

from loguru import logger
import workers


def main():
    logger.remove()
    logger.add(sys.stderr, enqueue=True)

    with multiprocessing.get_context('spawn').Pool(4, initializer=workers.set_logger,
                                                   initargs=(logger,)) as pool:
        pool.map(
            workers.log_test,
            list(range(10))
        )


if __name__ == "__main__":
    main()

workers.py:

from loguru import logger

def set_logger(logger_):
    global logger
    logger = logger_


def log_test(num):
    print("before log")
    logger.info(f"hi {num}!")
    print("do we reach this?")

Only "before log" is printed; the log message never appears, and neither does the print statement after it. I have also run the same code on a Windows machine, and it works as expected there. Running in "fork" mode on Linux also works as expected.

The subprocesses spawned by Python actually appear to segfault. In case it helps, here's what coredumpctl says:

           PID: 21760 (python)
           UID: 1000 (tom)
           GID: 1000 (tom)
        Signal: 11 (SEGV)
     Timestamp: Mon 2022-11-14 15:57:36 CET (1min 22s ago)
  Command Line: /home/tom/projects/loguru_report/venv/bin/python -c $'from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=16, pipe_handle=26)' --multiprocessing-fork
    Executable: /usr/bin/python3.10
  Size on Disk: 2.4M
       Message: Process 21760 (python) of user 1000 dumped core.
                
                Stack trace of thread 21760:
                #0  0x00007f60404a7724 sem_trywait (libc.so.6 + 0x8e724)
                #1  0x00007f6040ad9514 n/a (_multiprocessing.cpython-310-x86_64-linux-gnu.so + 0x2514)
                #2  0x00007f6040758204 n/a (libpython3.10.so.1.0 + 0x158204)
                #3  0x00007f6040744f4a _PyEval_EvalFrameDefault (libpython3.10.so.1.0 + 0x144f4a)
                #4  0x00007f6040760da2 n/a (libpython3.10.so.1.0 + 0x160da2)
                #5  0x00007f60407472da _PyEval_EvalFrameDefault (libpython3.10.so.1.0 + 0x1472da)
                #6  0x00007f6040755959 _PyFunction_Vectorcall (libpython3.10.so.1.0 + 0x155959)
                #7  0x00007f6040744f4a _PyEval_EvalFrameDefault (libpython3.10.so.1.0 + 0x144f4a)
                #8  0x00007f6040755959 _PyFunction_Vectorcall (libpython3.10.so.1.0 + 0x155959)
                #9  0x00007f6040744f4a _PyEval_EvalFrameDefault (libpython3.10.so.1.0 + 0x144f4a)
                #10 0x00007f6040755959 _PyFunction_Vectorcall (libpython3.10.so.1.0 + 0x155959)
                #11 0x00007f6040744f4a _PyEval_EvalFrameDefault (libpython3.10.so.1.0 + 0x144f4a)
                #12 0x00007f6040755959 _PyFunction_Vectorcall (libpython3.10.so.1.0 + 0x155959)
                #13 0x00007f6040744f4a _PyEval_EvalFrameDefault (libpython3.10.so.1.0 + 0x144f4a)
                #14 0x00007f6040755959 _PyFunction_Vectorcall (libpython3.10.so.1.0 + 0x155959)
                #15 0x00007f60407ac9c9 n/a (libpython3.10.so.1.0 + 0x1ac9c9)
                #16 0x00007f6040783f95 n/a (libpython3.10.so.1.0 + 0x183f95)
                #17 0x00007f60407b688b n/a (libpython3.10.so.1.0 + 0x1b688b)
                #18 0x00007f6040744b16 _PyEval_EvalFrameDefault (libpython3.10.so.1.0 + 0x144b16)
                #19 0x00007f6040755959 _PyFunction_Vectorcall (libpython3.10.so.1.0 + 0x155959)
                #20 0x00007f6040747585 _PyEval_EvalFrameDefault (libpython3.10.so.1.0 + 0x147585)
                #21 0x00007f6040755959 _PyFunction_Vectorcall (libpython3.10.so.1.0 + 0x155959)
                #22 0x00007f6040747585 _PyEval_EvalFrameDefault (libpython3.10.so.1.0 + 0x147585)
                #23 0x00007f6040755959 _PyFunction_Vectorcall (libpython3.10.so.1.0 + 0x155959)
                #24 0x00007f6040744f4a _PyEval_EvalFrameDefault (libpython3.10.so.1.0 + 0x144f4a)
                #25 0x00007f6040755959 _PyFunction_Vectorcall (libpython3.10.so.1.0 + 0x155959)
                #26 0x00007f6040744f4a _PyEval_EvalFrameDefault (libpython3.10.so.1.0 + 0x144f4a)
                #27 0x00007f6040755959 _PyFunction_Vectorcall (libpython3.10.so.1.0 + 0x155959)
                #28 0x00007f6040744b16 _PyEval_EvalFrameDefault (libpython3.10.so.1.0 + 0x144b16)
                #29 0x00007f6040755959 _PyFunction_Vectorcall (libpython3.10.so.1.0 + 0x155959)
                #30 0x00007f6040745c56 _PyEval_EvalFrameDefault (libpython3.10.so.1.0 + 0x145c56)
                #31 0x00007f6040743760 n/a (libpython3.10.so.1.0 + 0x143760)
                #32 0x00007f60407f0e04 PyEval_EvalCode (libpython3.10.so.1.0 + 0x1f0e04)
                #33 0x00007f60408015b3 n/a (libpython3.10.so.1.0 + 0x2015b3)
                #34 0x00007f60407fcd0a n/a (libpython3.10.so.1.0 + 0x1fcd0a)
                #35 0x00007f60407f4031 PyRun_StringFlags (libpython3.10.so.1.0 + 0x1f4031)
                #36 0x00007f60407f3f90 PyRun_SimpleStringFlags (libpython3.10.so.1.0 + 0x1f3f90)
                #37 0x00007f604080dad9 Py_RunMain (libpython3.10.so.1.0 + 0x20dad9)
                #38 0x00007f60407e25eb Py_BytesMain (libpython3.10.so.1.0 + 0x1e25eb)
                #39 0x00007f604043c290 n/a (libc.so.6 + 0x23290)
                #40 0x00007f604043c34a __libc_start_main (libc.so.6 + 0x2334a)
                #41 0x000056101580d045 _start (python3.10 + 0x1045)
                ELF object binary architecture: AMD x86-64
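Incidentally, a negative Process.exitcode in the parent encodes the signal that killed the child, so a segfault like the one above shows up as -11 (SIGSEGV). A stdlib-only sketch of how to detect such a silent child crash from the parent; the crash() helper is made up purely to force a SIGSEGV for the demonstration:

```python
import ctypes
import multiprocessing
import signal


def crash():
    # Deliberately dereference a NULL pointer to trigger SIGSEGV,
    # standing in for the semaphore crash described above.
    ctypes.string_at(0)


if __name__ == "__main__":
    ctx = multiprocessing.get_context("spawn")
    p = ctx.Process(target=crash)
    p.start()
    p.join()
    # A negative exitcode is the negated signal number: -11 == -signal.SIGSEGV.
    print(p.exitcode)
```

On Windows the exit code of a crashed process is a positive status value instead, so the negative-signal convention above applies to POSIX platforms only.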

Version info:

Python 3.10.8

Package    Version
---------- -------
loguru     0.6.0
pip        22.2.2
setuptools 63.2.0
@Delgan
Owner

Delgan commented Nov 15, 2022

Hey @TCatshoek, glad you like Loguru, and thanks for the fully reproducible example!

So, I haven't investigated much yet, but a wild guess would be that Loguru's internal multiprocessing.SimpleQueue() (required when enqueue=True) is created with the default context and thus can't be used from a different context. In your case, on Linux, it's created with the "fork" context when logger.add() is called, but it is then used with the "spawn" context from inside the Pool, which causes a crash.

Quoting the multiprocessing documentation:

Note that objects related to one context may not be compatible with processes for a different context. In particular, locks created using the fork context cannot be passed to processes started using the spawn or forkserver start methods.
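The rule in that note can be followed with the stdlib alone: create every primitive from one explicit context, so the queue and the processes agree on the start method. A minimal sketch (no Loguru involved; this mirrors what enqueue=True does internally with its queue):

```python
import multiprocessing


def child(q):
    # Runs in a "spawn" child; the queue's internal locks were created
    # by the same "spawn" context, so they are compatible with it.
    q.put("ok")


if __name__ == "__main__":
    ctx = multiprocessing.get_context("spawn")
    q = ctx.SimpleQueue()  # context-bound, unlike a bare multiprocessing.SimpleQueue()
    p = ctx.Process(target=child, args=(q,))
    p.start()
    msg = q.get()  # blocks until the child has put its message
    p.join()
    print(msg)
```

Had q been created with the default ("fork" on Linux) context while the Process used "spawn", the child would hit exactly the kind of semaphore incompatibility described above.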

A quick workaround is to use multiprocessing.set_start_method() before calling logger.add().

For now, I can't think of a better solution. I'll keep thinking about it, but I'm not sure it's even fixable from within Loguru.

@TCatshoek
Contributor Author

Awesome! Thanks for getting back to me so quickly. It didn't occur to me that I was setting things up in a different context than the one that was being used by the subprocesses.

Changing main.py to:

import multiprocessing
import sys

from loguru import logger
import workers


def main():
    logger.remove()
    logger.add(sys.stderr, enqueue=True)

    with multiprocessing.Pool(4, initializer=workers.set_logger,
                              initargs=(logger,)) as pool:
        pool.map(
            workers.log_test,
            list(range(10))
        )


if __name__ == "__main__":
    multiprocessing.set_start_method("spawn")
    main()

Does indeed fix the problem!
Thanks for your help. Would it be worth adding a note about this to the documentation? I could make a PR if you want.

@Delgan
Owner

Delgan commented Nov 15, 2022

Yeah, adding a note in the documentation is the very least we can do. PR is welcome for sure! :)

@Delgan Delgan added the documentation Improvements or additions to documentation label Nov 15, 2022
@TCatshoek
Contributor Author

Added the note in #750!

@Delgan
Owner

Delgan commented Apr 22, 2023

For information, I implemented a new context argument for logger.add(), because I was dissatisfied with the multiprocessing.set_start_method() workaround.
See #851.

It will be available in the next Loguru release, and should be used this way (a start method name such as "spawn" is also accepted as a str):

import multiprocessing

from loguru import logger

context = multiprocessing.get_context("spawn")

logger.add("file.log", enqueue=True, context=context)  # Use "spawn" instead of the default "fork" on Linux.

with context.Pool(4) as pool:
    ...
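Without depending on the new Loguru release, the mechanism behind that fix can be illustrated with the stdlib alone: a queue created from an explicit "spawn" context, shared with Pool workers created from that same context via an initializer, just like the logger is shared in the original report. The init/work helper names here are made up for the illustration:

```python
import multiprocessing


def init(q):
    # Runs once in each worker; stores the context-bound queue globally,
    # much like workers.set_logger stores the logger above.
    global queue
    queue = q


def work(n):
    queue.put(n * n)


if __name__ == "__main__":
    ctx = multiprocessing.get_context("spawn")
    q = ctx.SimpleQueue()  # created from the same context as the Pool below
    with ctx.Pool(2, initializer=init, initargs=(q,)) as pool:
        pool.map(work, range(4))
    results = sorted(q.get() for _ in range(4))
    print(results)
```

Because the queue and the pool share one context, the workers can use the queue's locks safely regardless of the platform's default start method.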
