Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

I4087-multiprocessing-2 #4449

Merged
merged 15 commits into from
Sep 18, 2024
Merged

I4087-multiprocessing-2 #4449

merged 15 commits into from
Sep 18, 2024

Conversation

martinjrobins
Copy link
Contributor

@martinjrobins martinjrobins commented Sep 17, 2024

Description

  • idaklu solver can run multiple simulations in parallel using openmp
  • Previously this was done using Python multiprocessing, but this has been removed and instead a list of inputs can be passed directly to the _integrate function of the idaklu solver, the solver then runs these simulations in parallel using openmp

Partially fixes #4087 (just for IDAKLU solver). This is a redo of some of #4260 (plan is to split this old PR into multiple smaller PRs, starting with the openmp parallisation of the IDAKLU solver).

Type of change

Please add a line in the relevant section of CHANGELOG.md to document the change (include PR #) - note reverse order of PR #s. If necessary, also add to the list of breaking changes.

  • New feature (non-breaking change which adds functionality)

Key checklist:

  • No style issues: $ pre-commit run (or $ nox -s pre-commit) (see CONTRIBUTING.md for how to set this up to run automatically when committing locally, in just two lines of code)
  • All tests pass: $ python run-tests.py --all (or $ nox -s tests)
  • The documentation builds: $ python run-tests.py --doctest (or $ nox -s doctests)

You can run integration tests, unit tests, and doctests together at once, using $ python run-tests.py --quick (or $ nox -s quick).

Further checks:

  • Code is commented, particularly in hard-to-understand areas
  • Tests added that prove fix is effective or that feature works

Copy link

codecov bot commented Sep 17, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.46%. Comparing base (f47c12e) to head (727efe2).
Report is 186 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #4449      +/-   ##
===========================================
+ Coverage    99.42%   99.46%   +0.04%     
===========================================
  Files          292      292              
  Lines        22340    22350      +10     
===========================================
+ Hits         22211    22230      +19     
+ Misses         129      120       -9     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@agriyakhetarpal
Copy link
Member

@martinjrobins, I'm unsure how the Homebrew OpenMP detection on macOS was working before this PR (my guess is that CVODE was able to find it on its own, but maybe since we now use the header directly, we need to pass its path?). Could you also trigger the wheel build from this branch? We use a different libomp.dylib there rather than the one from Homebrew, so that the wheels don't receive a higher macOS deployment target.

@MarcBerliner
Copy link
Member

I'm unsure how the Homebrew OpenMP detection on macOS was working before this PR (my guess is that CVODE was able to find it on its own, but maybe since we now use the header directly, we need to pass its path?). Could you also trigger the wheel build from this branch? We use a different libomp.dylib there rather than the one from Homebrew, so that the wheels don't receive a higher macOS deployment target.

I think I'm running into this issue. I'm trying to install this on macOS, and I get the error

/PyBaMM/src/pybamm/solvers/c_solvers/idaklu/IDAKLUSolverGroup.cpp:2:10: fatal error: 'omp.h' file not found
      #include <omp.h>

@agriyakhetarpal
Copy link
Member

@MarcBerliner, I think passing -I$(brew --prefix libomp)/include should work

@agriyakhetarpal
Copy link
Member

agriyakhetarpal commented Sep 18, 2024

CXXFLAGS="-I $(brew --prefix libomp)/include" nox -s dev works for me locally. I guess we should set OpenMP_ROOT (or CXXFLAGS) in the noxfile when running on macOS, so that we get passing builds for both CI tests and locally.

Copy link
Member

@MarcBerliner MarcBerliner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @martinjrobins, this will be really nice for optimization.

One high-level question: how does this handle solver convergence errors? If we batch several simulations, we don't want a single ill-conditioned set of parameters to ruin the whole batch with an exception. Do we always receive a solution for the simulations that converge?

Also, this fix to the noxfile.py solved my local build issue with macOS (thanks @agriyakhetarpal):

PYBAMM_ENV = {
    "LD_LIBRARY_PATH": f"{homedir}/.local/lib",
    "PYTHONIOENCODING": "utf-8",
    "MPLBACKEND": "Agg",
    # Expression evaluators (...EXPR_CASADI cannot be fully disabled at this time)
    "PYBAMM_IDAKLU_EXPR_CASADI": os.getenv("PYBAMM_IDAKLU_EXPR_CASADI", "ON"),
    "PYBAMM_IDAKLU_EXPR_IREE": set_iree_state(),
    "IREE_INDEX_URL": os.getenv(
        "IREE_INDEX_URL", "https://iree.dev/pip-release-links.html"
    ),
    "CXXFLAGS": f"-I {os.popen('brew --prefix libomp').read().strip()}/include",
}

src/pybamm/solvers/base_solver.py Outdated Show resolved Hide resolved
src/pybamm/solvers/base_solver.py Outdated Show resolved Hide resolved
@@ -31,8 +31,8 @@
#include <pybind11/stl.h>

namespace py = pybind11;
using np_array = py::array_t<realtype>;
using np_array_dense = py::array_t<realtype, py::array::c_style | py::array::forcecast>;
// note: we rely on c_style ordering for numpy arrays so don't change this!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is outside the scope of this PR, but we should be more careful about this in pybamm. We transpose the y when we store it with the python Solution object, so it's f-contiguous there. Not a big deal, but it'd make life simpler to keep it consistent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I had to write the code assuming a particular ordering hence the specification as c_style. If an f_style ordered array is passed in then it gets converted to c_style by pybind11 (according to their docs: https://pybind11.readthedocs.io/en/stable/advanced/pycpp/numpy.html). This isn't ideal since it would involve a copy, but I wasn't sure about a better way of doing it. Let me know if you have any ideas

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, I didn't realize pybind11 made a copy here. IMO, I think we should stick with c_style and make python-side changes to y to avoid the transpose and change its ordering from y[time, var] to y[var, time]

src/pybamm/solvers/idaklu_solver.py Show resolved Hide resolved
@martinjrobins
Copy link
Contributor Author

CXXFLAGS="-I $(brew --prefix libomp)/include" nox -s dev works for me locally. I guess we should set OpenMP_ROOT (or CXXFLAGS) in the noxfile when running on macOS, so that we get passing builds for both CI tests and locally.

Yea, I put this into the CI and it works fine (see last commit). Let me know if that is the best way to do it?

@martinjrobins
Copy link
Contributor Author

One high-level question: how does this handle solver convergence errors?

It should be the same behaviour as currently. If the retval from IDASolve indicates it has failed the individual solver will stop early and return. All the other solves, if they converge ok, carry on and solve for the complete duration and return normally. The retval is stored in the solution so the user can tell if that individual solve has failed or not.

@agriyakhetarpal
Copy link
Member

CXXFLAGS="-I $(brew --prefix libomp)/include" nox -s dev works for me locally. I guess we should set OpenMP_ROOT (or CXXFLAGS) in the noxfile when running on macOS, so that we get passing builds for both CI tests and locally.

Yea, I put this into the CI and it works fine (see last commit). Let me know if that is the best way to do it?

Yes. Though, this would fail if someone tries to install without our nox -s dev helper (i.e., with pip install -e .). Could we include this in our CMake subprocess call in setup.py? That should work everywhere.

I'm off to a conference this week, so I haven't re-reviewed #4352 fully yet. We'll have to think how to do it there, because setup.py would be removed with that PR.

@MarcBerliner
Copy link
Member

It should be the same behaviour as currently. If the retval from IDASolve indicates it has failed the individual solver will stop early and return. All the other solves, if they converge ok, carry on and solve for the complete duration and return normally. The retval is stored in the solution so the user can tell if that individual solve has failed or not.

It looks like we throw an error for any simulation failure in _post_process_solution, so the user wouldn't be able to examine the retvals

if sol.flag == 2:
termination = "event"
elif sol.flag >= 0:
termination = "final time"
else:
raise pybamm.SolverError(f"FAILURE {self._solver_flag(sol.flag)}")

We could change this to a warning (or make error vs. warning a solver option).

@martinjrobins
Copy link
Contributor Author

We could change this to a warning (or make error vs. warning a solver option).

hmmm, this would be a big change to the API if someone was relying on the solve throwing an error if it failed. Plus the other solvers raise errors as well (even for multiple input parameters). I agree it would sometime be better to carry on if a solve fails though, but I'd say this would be a different issue (and probably one to discuss at the developers meeting)

@martinjrobins
Copy link
Contributor Author

Yes. Though, this would fail if someone tries to install without our nox -s dev helper (i.e., with pip install -e .). Could we include this in our CMake subprocess call in setup.py? That should work everywhere.

I'll see if I can do it in CMakeLists.txt, since we're getting rid of setup.py

@MarcBerliner
Copy link
Member

hmmm, this would be a big change to the API if someone was relying on the solve throwing an error if it failed. Plus the other solvers raise errors as well (even for multiple input parameters). I agree it would sometime be better to carry on if a solve fails though, but I'd say this would be a different issue (and probably one to discuss at the developers meeting)

Yeah, if this behavior is consistent with the other solvers' multiprocessing, then let's table this for now and bring this up at the meeting. Otherwise LGTM (besides the build issue)

@kratman
Copy link
Contributor

kratman commented Sep 18, 2024

@martinjrobins Did you trigger the wheel build to make sure that works?

@martinjrobins martinjrobins merged commit 48dbb68 into develop Sep 18, 2024
25 checks passed
@martinjrobins martinjrobins deleted the i4087-multiprocessing-2 branch September 18, 2024 21:21
MarcBerliner added a commit to MarcBerliner/PyBaMM that referenced this pull request Sep 22, 2024
commit d362c98
Merge: 7be637c f5717ff
Author: Marc Berliner <[email protected]>
Date:   Fri Sep 20 10:27:14 2024 -0400

    Merge pull request pybamm-team#4440 from pipliggins/output_vars_extrapolation

    Fix IDAKLU output_variables to work with extrapolation events

commit f5717ff
Merge: b012684 7be637c
Author: Marc Berliner <[email protected]>
Date:   Fri Sep 20 09:21:26 2024 -0400

    Merge branch 'develop' into output_vars_extrapolation

commit b012684
Author: Pip Liggins <[email protected]>
Date:   Thu Sep 19 17:16:35 2024 -0700

    Switch test to triggered event

commit 7be637c
Author: Marc Berliner <[email protected]>
Date:   Thu Sep 19 15:39:27 2024 -0400

    Faster (re)initialization of ODEs in `IDA` (pybamm-team#4453)

    * Fast ODE reinitialization

    * Update CHANGELOG.md

    * remove redundant `IDAReInit` call

    * address comments

    * Update IDAKLUSolverOpenMP.inl

commit cee2cad
Author: Pip Liggins <[email protected]>
Date:   Thu Sep 19 10:22:21 2024 -0700

    update changelog

commit 37c94f9
Author: Pip Liggins <[email protected]>
Date:   Thu Sep 19 10:10:32 2024 -0700

    Add test

commit 06d7ecc
Merge: f47c12e 4cda488
Author: Pip Liggins <[email protected]>
Date:   Thu Sep 19 09:25:57 2024 -0700

    Merge branch 'develop' into output_vars_extrapolation

commit 4cda488
Merge: 48dbb68 5bb146f
Author: Valentin Sulzer <[email protected]>
Date:   Thu Sep 19 02:15:59 2024 -0700

    Merge pull request pybamm-team#4330 from parkec3/ocvr_ecm

    ECM with split OCV

commit 5bb146f
Merge: 712a3ee 48dbb68
Author: Valentin Sulzer <[email protected]>
Date:   Thu Sep 19 00:19:24 2024 -0700

    Merge branch 'develop' into ocvr_ecm

commit 48dbb68
Author: Martin Robinson <[email protected]>
Date:   Wed Sep 18 22:21:38 2024 +0100

    feat: add OpenMP parallelization to IDAKLU solver for lists of input parameters (pybamm-team#4449)

    * new solver option `num_solvers`, indicates how many solves run in parallel
    * existing `num_threads` gives total number of threads which are distributed among `num_solvers`

commit e1118ec
Author: Marc Berliner <[email protected]>
Date:   Wed Sep 18 12:35:47 2024 -0400

    Update CODEOWNERS (pybamm-team#4452)

commit f47c12e
Merge: d6b213c 05a0b24
Author: Pip Liggins <[email protected]>
Date:   Mon Sep 16 17:36:13 2024 -0700

    Merge branch 'develop' into output_vars_extrapolation

commit 05a0b24
Author: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Date:   Mon Sep 16 22:08:33 2024 +0100

    chore: update pre-commit hooks (pybamm-team#4445)

    updates:
    - [github.com/astral-sh/ruff-pre-commit: v0.6.4 → v0.6.5](astral-sh/ruff-pre-commit@v0.6.4...v0.6.5)

    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

commit 8e3eb31
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Mon Sep 16 16:00:04 2024 -0400

    Build(deps): bump github/codeql-action in the actions group (pybamm-team#4444)

    Bumps the actions group with 1 update: [github/codeql-action](https://github.com/github/codeql-action).

    Updates `github/codeql-action` from 3.26.6 to 3.26.7
    - [Release notes](https://github.com/github/codeql-action/releases)
    - [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
    - [Commits](github/codeql-action@4dd1613...8214744)

    ---
    updated-dependencies:
    - dependency-name: github/codeql-action
      dependency-type: direct:production
      update-type: version-update:semver-patch
      dependency-group: actions
    ...

    Signed-off-by: dependabot[bot] <[email protected]>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

commit d6b213c
Author: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Date:   Fri Sep 13 23:14:56 2024 +0000

    style: pre-commit fixes

commit 343cea4
Author: Pip Liggins <[email protected]>
Date:   Fri Sep 13 15:51:17 2024 -0700

    change check_extrapolation to use t/y_event

commit 712a3ee
Merge: 3b21bf6 6c1815b
Author: Valentin Sulzer <[email protected]>
Date:   Sat Sep 7 13:10:30 2024 -0700

    Merge branch 'develop' into ocvr_ecm

commit 3b21bf6
Author: Eric G. Kratz <[email protected]>
Date:   Tue Sep 3 16:45:19 2024 -0400

    Move changelog update to unreleased

commit 4786443
Merge: 13270c2 ac93806
Author: Eric G. Kratz <[email protected]>
Date:   Tue Sep 3 16:44:14 2024 -0400

    Merge branch 'develop' into ocvr_ecm

commit 13270c2
Merge: f23ac77 1ab27d1
Author: Eric G. Kratz <[email protected]>
Date:   Tue Sep 3 13:20:33 2024 -0400

    Merge branch 'develop' into ocvr_ecm

commit f23ac77
Author: Caitlin Parke <[email protected]>
Date:   Tue Sep 3 13:11:22 2024 -0400

    updated tests and naming convention

commit b820e59
Merge: 9a996c4 ac6c450
Author: Eric G. Kratz <[email protected]>
Date:   Wed Aug 28 16:20:47 2024 -0400

    Merge branch 'develop' into ocvr_ecm

commit 9a996c4
Author: Caitlin Parke <[email protected]>
Date:   Wed Aug 21 16:46:28 2024 -0400

    updated for pytest and better coverage

commit e4504a6
Author: Caitlin Parke <[email protected]>
Date:   Tue Aug 20 18:18:42 2024 -0400

    added docs

commit b4c1897
Merge: 3f1e7f5 977dcf9
Author: Caitlin D. Parke <[email protected]>
Date:   Mon Aug 19 20:09:15 2024 -0400

    Merge branch 'pybamm-team:develop' into ocvr_ecm

commit 3f1e7f5
Author: Caitlin Parke <[email protected]>
Date:   Mon Aug 19 20:07:44 2024 -0400

    added default plotting variables

commit 84fba69
Merge: b1de9d1 99e3119
Author: Caitlin Parke <[email protected]>
Date:   Fri Aug 16 00:44:34 2024 -0400

    Merge branch 'ocvr_ecm' of https://github.com/parkec3/PyBaMM into ocvr_ecm

commit b1de9d1
Author: Caitlin Parke <[email protected]>
Date:   Fri Aug 16 00:44:19 2024 -0400

    pre-commit

commit 99e3119
Author: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Date:   Fri Aug 16 04:42:19 2024 +0000

    style: pre-commit fixes

commit 758175f
Author: Caitlin Parke <[email protected]>
Date:   Fri Aug 16 00:42:04 2024 -0400

    added tests

commit 1fc5670
Merge: 3daabce 1e3f139
Author: Caitlin D. Parke <[email protected]>
Date:   Fri Aug 16 00:40:38 2024 -0400

    Merge branch 'pybamm-team:develop' into ocvr_ecm

commit 3daabce
Author: Caitlin Parke <[email protected]>
Date:   Wed Aug 14 12:01:58 2024 -0400

    parameter updates

commit a502d17
Merge: dbeaeb5 9691d09
Author: Caitlin D. Parke <[email protected]>
Date:   Wed Aug 14 11:58:58 2024 -0400

    Merge branch 'pybamm-team:develop' into ocvr_ecm

commit dbeaeb5
Author: Caitlin Parke <[email protected]>
Date:   Tue Aug 13 14:45:42 2024 -0400

    still working

commit 3bc77c2
Merge: 2a945f2 83239a3
Author: Caitlin Parke <[email protected]>
Date:   Mon Aug 12 11:25:01 2024 -0400

    Merged remote into local changes

commit 2a945f2
Author: Caitlin Parke <[email protected]>
Date:   Mon Aug 12 11:17:11 2024 -0400

    Added parameter objects in model

commit 83239a3
Author: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Date:   Fri Aug 9 20:43:34 2024 +0000

    style: pre-commit fixes

commit f93e67b
Author: Caitlin Parke <[email protected]>
Date:   Fri Aug 9 16:39:33 2024 -0400

    pre-commit changes

commit 80428c9
Merge: f3faca4 b1fc595
Author: Caitlin D. Parke <[email protected]>
Date:   Fri Aug 9 16:34:30 2024 -0400

    Merge branch 'pybamm-team:develop' into ocvr_ecm

commit f3faca4
Author: Caitlin Parke <[email protected]>
Date:   Fri Aug 9 16:33:10 2024 -0400

    Added ECM + split OCV model
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

refactor multiprocessing and multiple inputs
4 participants