Refactor parallelism utilities for public API #12412

Open
wants to merge 1 commit into
base: main

jakelishman
Member

@jakelishman jakelishman commented May 15, 2024

Summary

should_run_in_parallel was added in a stable manner to enable backport to the 1.1 series, but from 1.3 onwards, we want this to be part of the public interface so that others can rely on it too.

As part of this, the parallelisation configuration was made more robust and controllable with context managers. This is convenient beyond just for users - it makes it far easier to control the parallelism during the test suite runs. Several instances where different parts of Qiskit and its test suite reached into deep internals of the parallelism utilities and made significant assumptions about the internal logic are refactored to use public interfaces to achieve what they wanted to.

The multiprocessing detection is changed from making OS-based assumptions about what Python does to simply querying the module for its configuration. This makes it more robust to changes in Python's handling (especially important since 3.14 will change the default start method on Unix). In the future, we may want to change to making these assumptions only if the user hasn't configured the multiprocessing start method themselves.
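
To make the change concrete, here is a minimal sketch of asking `multiprocessing` for its start method instead of inferring it from the OS. The helper name and the choice to treat only `fork` as parallel-by-default are assumptions for illustration, not necessarily the rule this PR implements.

```python
import multiprocessing


def default_parallel_from_start_method(method=None):
    """Return True if the given (or current) start method is treated as cheap.

    Assumption: only "fork" counts as cheap enough to parallelise by default.
    """
    if method is None:
        # Note: with no arguments this also fixes the default context if none
        # has been chosen yet; pass allow_none=True to merely peek instead.
        method = multiprocessing.get_start_method()
    return method == "fork"
```

Because the result depends on the configured start method rather than on `sys.platform`, a change in Python's platform defaults does not require touching this logic.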

Details and comments

This was elided from #12410 to make that PR backwards compatible. This PR exposes the feature as part of the public API, so it will be new for 1.2.

Depends on #12410.

@jakelishman jakelishman added the "on hold" and "Changelog: New Feature" labels May 15, 2024
@jakelishman jakelishman added this to the 1.2.0 milestone May 15, 2024
@jakelishman jakelishman requested a review from a team as a code owner May 15, 2024 16:24
@qiskit-bot
Collaborator

One or more of the following people are requested to review this:

  • @Qiskit/terra-core

@coveralls

coveralls commented May 15, 2024

Pull Request Test Coverage Report for Build 11633906056

Details

  • 81 of 93 (87.1%) changed or added relevant lines in 8 files are covered.
  • 8 unchanged lines in 3 files lost coverage.
  • Overall coverage increased (+0.01%) to 88.737%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| qiskit/circuit/quantumcircuit.py | 2 | 3 | 66.67% |
| qiskit/transpiler/preset_passmanagers/builtin_plugins.py | 1 | 2 | 50.0% |
| qiskit/utils/parallel.py | 68 | 78 | 87.18% |

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| crates/qasm2/src/expr.rs | 1 | 94.02% |
| qiskit/user_config.py | 1 | 86.87% |
| crates/qasm2/src/lex.rs | 6 | 91.98% |

| Totals | |
|---|---|
| Change from base Build 11632855301 | 0.01% |
| Covered Lines | 76394 |
| Relevant Lines | 86090 |

💛 - Coveralls

@jakelishman jakelishman force-pushed the parallel-check-public branch from d6d1da9 to 0bdea9c Compare May 15, 2024 17:40
@jakelishman jakelishman removed the "on hold" label May 15, 2024
@jakelishman
Member Author

Now rebased over #12410.

Comment on lines 6 to 7
decision is dependent on how many CPUs are available to Qiskit, what the :mod:`multiprocessing`
start method is, how many processes were requested.
Member

I mean, sort of: we don't explicitly check what multiprocessing is set to (i.e. we ignore it if the user explicitly sets this to spawn), but the value of PARALLEL_DEFAULT is based on what the OS default start method is.

Member Author

Yeah, I think I was being more aspirational than correct here. We should do the check based on the multiprocessing start method if that's what we care about, but we don't. I can change the wording.

Member

Maybe we should just do this; it seems totally in scope to update the logic in the function to call multiprocessing.get_start_method() and get rid of the OS-based logic here for 1.2.

Contributor

> Maybe we should just do this; it seems totally in scope to update the logic in the function to call multiprocessing.get_start_method() and get rid of the OS-based logic here for 1.2.

@mtreinish is this still something we ought to do prior to merging this for 1.3?

Member Author

Yeah, we still intended to do this

Member Author

I did this, which then of course broke a bunch of weird other assumptions we were making thirty miles away in the library, so I ended up really rejigging the public API of the parallelism utilities so we didn't need to make internal assumptions across file boundaries.

I'm not entirely sold on the way I wrote the multiprocessing test (see the commit message) - I was trying to match the spirit of the previous assumption-heavy code, but I'm not certain we couldn't do a shade better.

@ElePT ElePT modified the milestones: 1.2.0, 1.3.0 Jul 30, 2024
@jakelishman jakelishman force-pushed the parallel-check-public branch from 0bdea9c to e4fea41 Compare November 1, 2024 17:36
@jakelishman jakelishman changed the title from "Add should_run_in_parallel to public API" to "Refactor parallelism utilities for public API" Nov 1, 2024
`should_run_in_parallel` was added in a stable manner to enable backport
to the 1.1 series, but from 1.3 onwards, we want this to be part of the
public interface so that others can rely on it too.

As part of this, the parallelisation configuration was made more robust
and controllable with context managers.  This is convenient beyond just
for users - it makes it far easier to control the parallelism during the
test suite runs.  Several instances where different parts of Qiskit and
its test suite reached into deep internals of the parallelism utilities
and made significant assumptions about the internal logic are refactored
to use public interfaces to achieve what they wanted to.

The multiprocessing detection is changed from making OS-based
assumptions about what Python does to simply querying the module for its
configuration.  This makes it more robust to changes in Python's
handling (especially important since 3.14 will change the default start
method on Unix).  In the future, we may want to change to making these
assumptions only if the user hasn't configured the `multiprocessing`
start method themselves.
@jakelishman
Member Author

I've force-pushed a major new commit that properly refactors a bunch of the parallelism utilities to better support should_run_in_parallel at the level of robustness we'd expect from the public interface. This then removes a bunch of our own library code and test code that was reaching into the belly of the internals of the parallel code and layering assumptions on top of assumptions, and wraps the necessary components in safe interfaces.

I also added a QISKIT_IGNORE_USER_SETTINGS environment variable and configured it to be used in all tox and CI runs by default, which lets us isolate the test suite from the environment - that should make the new tests of what I added reliable, but it's probably something we should have had before anyway.
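
As a sketch of the isolation idea only: the values that `QISKIT_IGNORE_USER_SETTINGS` accepts are not spelled out in this thread, so the truthy set below is an assumption, and `ignore_user_settings` and `effective_settings` are hypothetical helper names rather than functions in Qiskit.

```python
import os

# Assumption: which strings count as "on" is a guess made for this example.
_TRUTHY = {"1", "true", "yes", "on"}


def ignore_user_settings(environ=None):
    """Return True if the environment asks for user settings to be ignored."""
    environ = os.environ if environ is None else environ
    value = environ.get("QISKIT_IGNORE_USER_SETTINGS", "")
    return value.strip().lower() in _TRUTHY


def effective_settings(user_settings, defaults, environ=None):
    """Merge user settings over the defaults unless the override is active."""
    if ignore_user_settings(environ):
        return dict(defaults)
    return {**defaults, **user_settings}
```

With the variable set in tox and CI, a developer's local configuration can no longer change what the test suite observes.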

@jakelishman
Member Author

Given how major the changes I made here were, and that we're after the feature freeze deadline, I'm fine if we choose to leave this for Qiskit 2.0.

@mtreinish mtreinish modified the milestones: 1.3.0, 2.0.0 Nov 6, 2024
@mtreinish
Member

I agree, given the scope of the changes and that we're only one day out from rc1, let's defer this to 2.0. We can move forward with it pretty soon after rc1 though.

6 participants