docs: rewrite the subprocess page

Now multiprocessing is first, with an example of how to use Pool properly.
nedbat · Dec 26, 2024 · 81c5e43
1 parent 878410c
commit 81c5e43

Showing 12 changed files with 85 additions and 67 deletions.
diff --git a/CHANGES.rst b/CHANGES.rst
@@ -37,6 +37,10 @@ Unreleased
   understand the problem or the solution, but ``git bisect`` helped find it,
   and now it's fixed.
 
+- Docs: re-wrote the :ref:`subprocess` page to put multiprocessing first and to
+  highlight the correct use of :class:`multiprocessing.Pool
+  <python:multiprocessing.pool.Pool>`.
+
 .. _issue 1874: https://github.com/nedbat/coveragepy/issues/1874
 .. _issue 1875: https://github.com/nedbat/coveragepy/issues/1875
 .. _issue 1902: https://github.com/nedbat/coveragepy/issues/1902

diff --git a/Makefile b/Makefile
@@ -255,6 +255,7 @@ cogdoc: $(DOCBIN)			## Run docs through cog.
 
 dochtml: cogdoc $(DOCBIN)		## Build the docs HTML output.
 	$(SPHINXBUILD) -b html doc doc/_build/html
+	@echo "Start at: doc/_build/html/index.html"
 
 docdev: dochtml				## Build docs, and auto-watch for changes.
 	PATH=$(DOCBIN):$(PATH) $(SPHINXAUTOBUILD) -b html doc doc/_build/html

diff --git a/coverage/control.py b/coverage/control.py
@@ -301,7 +301,7 @@ def __init__(                       # pylint: disable=too-many-arguments
             context=context,
         )
 
-        # If we have sub-process measurement happening automatically, then we
+        # If we have subprocess measurement happening automatically, then we
         # want any explicit creation of a Coverage object to mean, this process
         # is already coverage-aware, so don't auto-measure it.  By now, the
         # auto-creation of a Coverage object has already happened.  But we can

diff --git a/coverage/multiproc.py b/coverage/multiproc.py
@@ -94,7 +94,7 @@ def patch_multiprocessing(rcfile: str) -> None:
 
     # When spawning processes rather than forking them, we have no state in the
     # new process.  We sneak in there with a Stowaway: we stuff one of our own
-    # objects into the data that gets pickled and sent to the sub-process. When
+    # objects into the data that gets pickled and sent to the subprocess. When
     # the Stowaway is unpickled, its __setstate__ method is called, which
     # re-applies the monkey-patch.
     # Windows only spawns, so this is needed to keep Windows working.

diff --git a/doc/changes.rst b/doc/changes.rst
@@ -1060,12 +1060,12 @@ Work from the PyCon 2016 Sprints!
 - The ``concurrency`` option can now take multiple values, to support programs
   using multiprocessing and another library such as eventlet.  This is only
   possible in the configuration file, not from the command line. The
-  configuration file is the only way for sub-processes to all run with the same
+  configuration file is the only way for subprocesses to all run with the same
   options.  Fixes `issue 484`_.  Thanks to Josh Williams for prototyping.
 
 - Using a ``concurrency`` setting of ``multiprocessing`` now implies
   ``--parallel`` so that the main program is measured similarly to the
-  sub-processes.
+  subprocesses.
 
 - When using `automatic subprocess measurement`_, running coverage commands
   would create spurious data files.  This is now fixed, thanks to diagnosis and

diff --git a/doc/cmd.rst b/doc/cmd.rst
@@ -176,10 +176,10 @@ You can combine multiple values for ``--concurrency``, separated with commas.
 You can specify ``thread`` and also one of ``eventlet``, ``gevent``, or
 ``greenlet``.
 
-If you are using ``--concurrency=multiprocessing``, you must set other options
-in the configuration file.  Options on the command line will not be passed to
-the processes that multiprocessing creates.  Best practice is to use the
-configuration file for all options.
+If you are using ``--concurrency=multiprocessing``, you must set your other
+options in the configuration file.  Options on the command line will not be
+passed to the processes that multiprocessing creates.  Best practice is to use
+the configuration file for all options.
 
 .. _multiprocessing: https://docs.python.org/3/library/multiprocessing.html
 .. _greenlet: https://greenlet.readthedocs.io/

diff --git a/doc/config.rst b/doc/config.rst
@@ -25,7 +25,7 @@ specification of options that are otherwise only available in the
 :ref:`API <api>`.
 
 Configuration files also make it easier to get coverage testing of spawned
-sub-processes.  See :ref:`subprocess` for more details.
+subprocesses.  See :ref:`subprocess` for more details.
 
 The default name for the configuration file is ``.coveragerc``, in the same
 directory coverage.py is being run in.  Most of the settings in the
@@ -443,11 +443,12 @@ need to know the source origin.
 
 (boolean, default False) if true, register a SIGTERM signal handler to capture
 data when the process ends due to a SIGTERM signal.  This includes
-:meth:`Process.terminate <python:multiprocessing.Process.terminate>`, and other
+:meth:`Process.terminate <python:multiprocessing.Process.terminate>` and other
 ways to terminate a process.  This can help when collecting data in usual
 situations, but can also introduce problems (see `issue 1310`_).
 
-Only on Linux and Mac.
+The signal handler is only registered on Linux and Mac.  On Windows, this
+setting has no effect.
 
 .. _issue 1310: https://github.com/nedbat/coveragepy/issues/1310
 

diff --git a/doc/subprocess.rst b/doc/subprocess.rst
@@ -3,57 +3,75 @@
 
 .. _subprocess:
 
-=======================
-Measuring sub-processes
-=======================
+======================
+Measuring subprocesses
+======================
 
-Complex test suites may spawn sub-processes to run tests, either to run them in
-parallel, or because sub-process behavior is an important part of the system
-under test. Measuring coverage in those sub-processes can be tricky because you
-have to modify the code spawning the process to invoke coverage.py.
+If your system under test spawns subprocesses, you'll have to take extra steps
+to measure coverage in those processes.  There are a few ways to ensure they
+get measured.  The approach you use depends on how you create the processes.
 
-There's an easier way to do it: coverage.py includes a function,
-:func:`coverage.process_startup` designed to be invoked when Python starts.  It
-examines the ``COVERAGE_PROCESS_START`` environment variable, and if it is set,
-begins coverage measurement. The environment variable's value will be used as
-the name of the :ref:`configuration file <config>` to use.
+No matter how your subprocesses are created, you will need the :ref:`parallel
+option <config_run_parallel>` to collect separate data for each process, and
+the :ref:`coverage combine <cmd_combine>` command to combine them together
+before reporting.
 
-.. note::
+To successfully write a coverage data file, the Python subprocess under
+measurement must shut down cleanly and have a chance for coverage.py to run its
+termination code.  It will do that when the process ends naturally, or when a
+SIGTERM signal is received.
 
-    The subprocess only sees options in the configuration file.  Options set on
-    the command line will not be used in the subprocesses.
+If your processes are ending with SIGTERM, you must enable the
+:ref:`config_run_sigterm` setting to configure coverage to catch SIGTERM
+signals and write its data.
+
+Other ways of ending a process, like SIGKILL or :func:`os._exit
+<python:os._exit>`, will prevent coverage.py from writing its data file,
+leaving you with incomplete or non-existent coverage data.
 
 .. note::
 
-    If you have subprocesses created with :mod:`multiprocessing
-    <python:multiprocessing>`, the ``--concurrency=multiprocessing``
-    command-line option should take care of everything for you.  See
-    :ref:`cmd_run` for details.
+    Subprocesses will only see coverage options in the configuration file.
+    Options set on the command line will not be visible to subprocesses.
+
+
+Using multiprocessing
+---------------------
 
-When using this technique, be sure to set the parallel option to true so that
-multiple coverage.py runs will each write their data to a distinct file.
+The :mod:`multiprocessing <python:multiprocessing>` module in the Python
+standard library provides high-level tools for managing subprocesses.  If you
+use it, the :ref:`concurrency=multiprocessing <config_run_concurrency>` and
+:ref:`sigterm <config_run_sigterm>` settings will configure coverage to measure
+the subprocesses.
 
+Even with multiprocessing, you have to be careful that all subprocesses
+terminate cleanly or they won't record their coverage measurements.  For
+example, the correct way to use a Pool requires closing and joining the pool
+before terminating::
 
-Configuring Python for sub-process measurement
-----------------------------------------------
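+    import multiprocessing
+
+    with multiprocessing.Pool() as pool:
+        # ... use any of the pool methods ...
+        pool.close()    # no more tasks will be submitted
+        pool.join()     # wait for the workers to exit and write their data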
 
-Measuring coverage in sub-processes is a little tricky.  When you spawn a
-sub-process, you are invoking Python to run your program.  Usually, to get
-coverage measurement, you have to use coverage.py to run your program.  Your
-sub-process won't be using coverage.py, so we have to convince Python to use
-coverage.py even when not explicitly invoked.
 
-To do that, we'll configure Python to run a little coverage.py code when it
-starts.  That code will look for an environment variable that tells it to start
-coverage measurement at the start of the process.
+Implicit coverage
+-----------------
+
+If you are starting subprocesses another way, you can configure Python to start
+coverage when it runs.  Coverage.py includes a function designed to be invoked
+when Python starts: :func:`coverage.process_startup`.  It examines the
+``COVERAGE_PROCESS_START`` environment variable, and if it is set, begins
+coverage measurement. The environment variable's value will be used as the name
+of the :ref:`configuration file <config>` to use.
 
 To arrange all this, you have to do two things: set a value for the
 ``COVERAGE_PROCESS_START`` environment variable, and then configure Python to
 invoke :func:`coverage.process_startup` when Python processes start.
 
 How you set ``COVERAGE_PROCESS_START`` depends on the details of how you create
-sub-processes.  As long as the environment variable is visible in your
-sub-process, it will work.
+subprocesses.  As long as the environment variable is visible in your
+subprocess, it will work.
 
 You can configure your Python installation to invoke the ``process_startup``
 function in two ways:
@@ -84,17 +102,11 @@ start-up.  Be sure to remove the change when you uninstall coverage.py, or use
 a more defensive approach to importing it.
 
 
-Process termination
--------------------
-
-To successfully write a coverage data file, the Python sub-process under
-analysis must shut down cleanly and have a chance for coverage.py to run its
-termination code.  It will do that when the process ends naturally, or when a
-SIGTERM signal is received.
-
-Coverage.py uses :mod:`atexit <python:atexit>` to handle usual process ends,
-and a :mod:`signal <python:signal>` handler to catch SIGTERM signals.
+Explicit coverage
+-----------------
 
-Other ways of ending a process, like SIGKILL or :func:`os._exit
-<python:os._exit>`, will prevent coverage.py from writing its data file,
-leaving you with incomplete or non-existent coverage data.
+Another option for running coverage on your subprocesses is to run coverage
+explicitly as the command for your subprocess instead of using "python" as the
+command.  This isn't recommended, since it requires running different code
+when running coverage than when not, which can complicate your test
+environment.
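
The "Implicit coverage" section in the doc/subprocess.rst hunk above says the ``COVERAGE_PROCESS_START`` variable only has to be visible in the subprocess. A minimal sketch of one way to arrange that from the parent process is shown here. The helper names ``coverage_env`` and ``run_child`` are hypothetical and are not part of coverage.py, and using an absolute path for the configuration file is an assumption made so that a child running in a different working directory can still find it. Measurement still depends on Python having been configured to call ``coverage.process_startup()`` at startup, as the page describes.

```python
import os
import subprocess
import sys


def coverage_env(rcfile, base_env=None):
    """Return a copy of the environment with COVERAGE_PROCESS_START set.

    Hypothetical helper for illustration; not part of coverage.py.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Assumption: store an absolute path so the child can locate the
    # configuration file even if it runs in another working directory.
    env["COVERAGE_PROCESS_START"] = os.path.abspath(rcfile)
    return env


def run_child(script, rcfile=".coveragerc"):
    """Run `script` in a new Python process that can see the variable."""
    # The child is started with plain "python"; coverage only begins there
    # if the interpreter invokes coverage.process_startup() on startup.
    return subprocess.run([sys.executable, script], env=coverage_env(rcfile))
```

Because each process writes its own data, this would normally be paired with the parallel option in the configuration file and a ``coverage combine`` step before reporting.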
diff --git a/igor.py b/igor.py
@@ -180,7 +180,7 @@ def run_tests_with_coverage(core, *runner_args):
             context = os.environ[context[1:]]
         os.environ["COVERAGE_CONTEXT"] = context + "." + core
 
-    # Create the .pth file that will let us measure coverage in sub-processes.
+    # Create the .pth file that will let us measure coverage in subprocesses.
     # The .pth file seems to have to be alphabetically after easy-install.pth
     # or the sys.path entries aren't created right?
     # There's an entry in "make clean" to get rid of this file.

diff --git a/tests/coveragetest.py b/tests/coveragetest.py
@@ -377,11 +377,11 @@ def command_line(self, args: str, ret: int = OK) -> None:
     coverage_command = "coverage"
 
     def run_command(self, cmd: str) -> str:
-        """Run the command-line `cmd` in a sub-process.
+        """Run the command-line `cmd` in a subprocess.
 
-        `cmd` is the command line to invoke in a sub-process. Returns the
+        `cmd` is the command line to invoke in a subprocess. Returns the
         combined content of `stdout` and `stderr` output streams from the
-        sub-process.
+        subprocess.
 
         See `run_command_status` for complete semantics.
 
@@ -394,7 +394,7 @@ def run_command(self, cmd: str) -> str:
         return output
 
     def run_command_status(self, cmd: str) -> tuple[int, str]:
-        """Run the command-line `cmd` in a sub-process, and print its output.
+        """Run the command-line `cmd` in a subprocess, and print its output.
 
         Use this when you need to test the process behavior of coverage.
 
@@ -420,7 +420,7 @@ def run_command_status(self, cmd: str) -> tuple[int, str]:
         command_args = split_commandline[1:]
 
         if command_name == "python":
-            # Running a Python interpreter in a sub-processes can be tricky.
+            # Running a Python interpreter in a subprocess can be tricky.
             # Use the real name of our own executable. So "python foo.py" might
             # get executed as "python3.3 foo.py". This is important because
             # Python 3.x doesn't install as "python", so you might get a Python

diff --git a/tests/helpers.py b/tests/helpers.py
@@ -33,7 +33,7 @@
 
 
 def run_command(cmd: str) -> tuple[int, str]:
-    """Run a command in a sub-process.
+    """Run a command in a subprocess.
 
     Returns the exit status code and the combined stdout and stderr.
 

diff --git a/tests/test_process.py b/tests/test_process.py
@@ -606,7 +606,7 @@ def test_deprecation_warnings(self) -> None:
             """)
 
         # Some of our testing infrastructure can issue warnings.
-        # Turn it all off for the sub-process.
+        # Turn it all off for the subprocess.
         self.del_environ("COVERAGE_TESTING")
 
         out = self.run_command("python allok.py")
@@ -1197,9 +1197,9 @@ def test_removing_directory_with_error(self) -> None:
         assert all(line in out for line in lines)
 
 
-@pytest.mark.skipif(env.METACOV, reason="Can't test sub-process pth file during metacoverage")
+@pytest.mark.skipif(env.METACOV, reason="Can't test subprocess pth file during metacoverage")
 class ProcessStartupTest(CoverageTest):
-    """Test that we can measure coverage in sub-processes."""
+    """Test that we can measure coverage in subprocesses."""
 
     def setUp(self) -> None:
         super().setUp()