
Enable multithreading in IDAKLU #2947

Merged (4 commits), May 17, 2023


@jsbrittain (Contributor) commented May 12, 2023

Description

Enable multithreading in IDAKLU by replacing Serial vectors with OpenMP [OMP] vectors, and exposing a num_threads parameter to the user.

Serial vectors have been replaced with OMP vectors in the C solver. The default, num_threads=1, is equivalent to the previous Serial vector implementation. OMP vectors were introduced in the SUNDIALS CVODE solver in v2.8.0, released in 2015 (see the changelog: https://sundials.readthedocs.io/en/latest/History_link.html); the current release is v6.5.1.
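To illustrate why num_threads=1 reduces to the serial case, here is a pure-Python sketch (not SUNDIALS code) of a chunked element-wise linear sum z = a*x + b*y, the operation SUNDIALS provides as N_VLinearSum. The function name linear_sum is hypothetical, the contiguous chunking is only loosely modelled on a static OpenMP loop schedule, and because of CPython's global interpreter lock the sketch shows how the work is partitioned rather than delivering a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def linear_sum(a, x, b, y, num_threads=1):
    """Hypothetical sketch: z = a*x + b*y, split into num_threads contiguous chunks."""
    n = len(x)
    if len(y) != n:
        raise ValueError("x and y must have the same length")
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    z = [0.0] * n

    def work(start, stop):
        # Each worker writes only to its own slice of z, so no locking is needed.
        for i in range(start, stop):
            z[i] = a * x[i] + b * y[i]

    if num_threads == 1:
        # A single chunk covering the whole vector: exactly the plain serial loop.
        work(0, n)
    else:
        # Chunk k covers indices [bounds[k], bounds[k + 1]); together they cover 0..n.
        bounds = [n * k // num_threads for k in range(num_threads + 1)]
        with ThreadPoolExecutor(max_workers=num_threads) as pool:
            futures = [pool.submit(work, bounds[k], bounds[k + 1])
                       for k in range(num_threads)]
            for future in futures:
                future.result()  # re-raise any exception from a worker
    return z


print(linear_sum(2.0, [1, 2, 3], 1.0, [10, 20, 30], num_threads=2))
```

Because each element is computed by the same expression whichever chunk it lands in, the result is identical for any thread count; only the division of the loop changes.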

Usage example:

solver = pybamm.IDAKLUSolver(options={'num_threads': 4})
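When picking a value for the num_threads option, one option is to cap the request at the number of CPUs the process may use, so the solver does not oversubscribe the machine. The helper below, choose_num_threads, is hypothetical (it is not part of PyBaMM) and counts logical CPUs, not physical cores.

```python
import os


def choose_num_threads(requested=None):
    """Hypothetical helper: pick a value for the IDAKLU "num_threads" option.

    Returns the requested count capped at the number of CPUs this process can
    use, or all usable CPUs if nothing is requested; never less than 1.
    """
    if hasattr(os, "sched_getaffinity"):
        # CPUs this process is allowed to run on (available on Linux).
        available = len(os.sched_getaffinity(0))
    else:
        # os.cpu_count() may return None if the count cannot be determined.
        available = os.cpu_count() or 1
    if requested is None:
        return max(1, available)
    return max(1, min(int(requested), available))


print(choose_num_threads(4))
```

It would then be passed in as, for example, pybamm.IDAKLUSolver(options={"num_threads": choose_num_threads(4)}).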

(Partially) Fixes #2645.
This implements shared-memory multithreading only, not distributed-memory parallelism.

Type of change

Please add a line in the relevant section of CHANGELOG.md to document the change (include PR #) - note reverse order of PR #s. If necessary, also add to the list of breaking changes.

  • New feature (non-breaking change which adds functionality)
  • Optimization (back-end change that speeds up the code)
  • Bug fix (non-breaking change which fixes an issue)

Key checklist:

  • No style issues: $ pre-commit run (see CONTRIBUTING.md for how to set this up to run automatically when committing locally, in just two lines of code)
  • All tests pass: $ python run-tests.py --all
  • The documentation builds: $ python run-tests.py --doctest

You can run unit and doctests together at once, using $ python run-tests.py --quick.

Further checks:

  • Code is commented, particularly in hard-to-understand areas
  • Tests added that prove fix is effective or that feature works [existing tests remain in place]

*** NOTE: ***
This requires SUNDIALS to be recompiled with OpenMP enabled. Although this is now the default build behaviour, it is not compatible with existing installs built without OpenMP.

@jsbrittain (Contributor, Author) commented:

Sample performance (based on the DAE solver example):

import pybamm
import numpy as np

# construct model
model = pybamm.lithium_ion.DFN()
geometry = model.default_geometry
param = model.default_parameter_values
param.process_model(model)
param.process_geometry(geometry)
n = 500  # mesh points per domain; larger n gives a larger discretised system
var_pts = {"x_n": n, "x_s": n, "x_p": n, "r_n": round(n/10), "r_p": round(n/10)}
mesh = pybamm.Mesh(geometry, model.default_submesh_types, var_pts)
disc = pybamm.Discretisation(mesh, model.default_spatial_methods)
disc.process_model(model)
t_eval = np.linspace(0, 3600, 100)

# solve using IDAKLU with 1 and then 4 threads, 5 repeats each
for num_threads in (1, 4):
    options = {'num_threads': num_threads}
    for _ in range(5):
        klu_sol = pybamm.IDAKLUSolver(atol=1e-8, rtol=1e-8, options=options).solve(model, t_eval)
        print(f"Solve time: {klu_sol.solve_time.value*1000} msecs [{num_threads} threads]")

Output:

Solve time: 7708.164755254984 msecs [1 threads]
Solve time: 7715.674336999655 msecs [1 threads]
Solve time: 7623.938228935003 msecs [1 threads]
Solve time: 7630.4152719676495 msecs [1 threads]
Solve time: 7748.058792203665 msecs [1 threads]
Solve time: 4278.668664395809 msecs [4 threads]
Solve time: 4256.6283363848925 msecs [4 threads]
Solve time: 4241.460246965289 msecs [4 threads]
Solve time: 4225.298915058374 msecs [4 threads]
Solve time: 4225.85211135447 msecs [4 threads]
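The timings above can be summarised as a mean per thread count and a speed-up. The sketch below parses the printed lines, then, assuming Amdahl's law applies (an idealisation that ignores memory bandwidth and threading overhead), back-calculates the fraction of solve time that was parallelised. The function names are illustrative only. For these runs it gives a speed-up of roughly 1.8x on 4 threads, corresponding to roughly 60% of the work being parallelised.

```python
import re
from collections import defaultdict
from statistics import mean

# Matches lines such as "Solve time: 7708.16 msecs [1 threads]".
LINE_RE = re.compile(r"Solve time: ([\d.]+) msecs \[(\d+) threads\]")


def mean_times_by_threads(log_text):
    """Return {num_threads: mean solve time in ms} over every matching line."""
    times = defaultdict(list)
    for match in LINE_RE.finditer(log_text):
        times[int(match.group(2))].append(float(match.group(1)))
    return {n: mean(ts) for n, ts in times.items()}


def speedup_and_parallel_fraction(t_serial, t_parallel, num_threads):
    """Speed-up S = t_serial / t_parallel, plus the parallel fraction p implied
    by Amdahl's law, S = 1 / ((1 - p) + p / N), solved for p."""
    if num_threads < 2:
        raise ValueError("need at least 2 threads to estimate p")
    s = t_serial / t_parallel
    p = (1 - 1 / s) / (1 - 1 / num_threads)
    return s, p


# Output copied from the benchmark run above.
log = """
Solve time: 7708.164755254984 msecs [1 threads]
Solve time: 7715.674336999655 msecs [1 threads]
Solve time: 7623.938228935003 msecs [1 threads]
Solve time: 7630.4152719676495 msecs [1 threads]
Solve time: 7748.058792203665 msecs [1 threads]
Solve time: 4278.668664395809 msecs [4 threads]
Solve time: 4256.6283363848925 msecs [4 threads]
Solve time: 4241.460246965289 msecs [4 threads]
Solve time: 4225.298915058374 msecs [4 threads]
Solve time: 4225.85211135447 msecs [4 threads]
"""

means = mean_times_by_threads(log)
s, p = speedup_and_parallel_fraction(means[1], means[4], num_threads=4)
print(f"1 thread: {means[1]:.0f} ms, 4 threads: {means[4]:.0f} ms")
print(f"speed-up: {s:.2f}x, implied parallel fraction: {p:.0%}")
```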

@codecov bot commented May 12, 2023

Codecov Report

Patch and project coverage have no change.

Comparison is base (634f7c7) 99.71% compared to head (1b9a5cd) 99.71%.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #2947   +/-   ##
========================================
  Coverage    99.71%   99.71%           
========================================
  Files          273      273           
  Lines        19002    19002           
========================================
  Hits         18947    18947           
  Misses          55       55           
Impacted Files                     Coverage Δ
pybamm/solvers/idaklu_solver.py    98.99% <ø> (ø)


@jsbrittain (Contributor, Author) commented:

@martinjrobins I've implemented OpenMP as it provides an immediate speed-up while I look further into MPI.

@jsbrittain jsbrittain marked this pull request as ready for review May 12, 2023 15:33
@jsbrittain jsbrittain requested a review from martinjrobins May 12, 2023 15:33
@martinjrobins (Contributor) left a comment

Thanks @jsbrittain, this is great! Did you check that N_VNew_OpenMP with num_threads=1 gives the same performance as N_VNew_Serial?

@jsbrittain (Contributor, Author) commented:

@martinjrobins Performance on feature branch (OpenMP):

Solve time: 7588.103104382753 msecs [1 threads]
Solve time: 7615.54960347712 msecs [1 threads]
Solve time: 7599.450569599867 msecs [1 threads]
Solve time: 7598.879043012857 msecs [1 threads]
Solve time: 7601.809248328209 msecs [1 threads]

Solve time: 4217.907031998038 msecs [4 threads]
Solve time: 4232.315663248301 msecs [4 threads]
Solve time: 4210.8050510287285 msecs [4 threads]
Solve time: 4219.707608222961 msecs [4 threads]
Solve time: 4265.8341228961945 msecs [4 threads]

Performance on develop branch (Serial):

Solve time: 7707.045676186681 msecs [1 threads]
Solve time: 7676.297169178724 msecs [1 threads]
Solve time: 7766.72700420022 msecs [1 threads]
Solve time: 7683.037415146828 msecs [1 threads]
Solve time: 7660.65208427608 msecs [1 threads]

Average 1-thread solve times: approximately 7600 vs 7700 msecs (OpenMP vs Serial). I may have mentioned some overhead/slowdown previously, but that was because I had missed some vector definitions and was mixing the OpenMP and Serial vector types. Now that this is fixed there is no slowdown (if anything, OpenMP is fractionally quicker).
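A quick way to check the "no slowdown" observation from the two sets of 1-thread runs is to compare their means, standard deviations and Welch's t statistic (which does not assume equal variances). This is a rough sketch: five runs per branch on one machine is a small sample, and the function names are illustrative. For these numbers the OpenMP mean comes out roughly 1.3% lower than the Serial mean.

```python
from statistics import mean, stdev

# 1-thread solve times in ms, copied from the two runs above.
openmp_ms = [7588.103104382753, 7615.54960347712, 7599.450569599867,
             7598.879043012857, 7601.809248328209]
serial_ms = [7707.045676186681, 7676.297169178724, 7766.72700420022,
             7683.037415146828, 7660.65208427608]


def summarise(samples):
    """Return (mean, sample standard deviation) of a list of timings."""
    return mean(samples), stdev(samples)


def welch_t(a, b):
    """Welch's t statistic for the difference in means of samples a and b."""
    (ma, sa), (mb, sb) = summarise(a), summarise(b)
    return (ma - mb) / ((sa ** 2 / len(a) + sb ** 2 / len(b)) ** 0.5)


m_omp, s_omp = summarise(openmp_ms)
m_ser, s_ser = summarise(serial_ms)
rel_diff = (m_omp - m_ser) / m_ser  # negative means OpenMP was faster
print(f"OpenMP (1 thread): {m_omp:.0f} ± {s_omp:.0f} ms")
print(f"Serial:            {m_ser:.0f} ± {s_ser:.0f} ms")
print(f"relative difference: {rel_diff:+.1%}, Welch t = {welch_t(openmp_ms, serial_ms):.1f}")
```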

@jsbrittain jsbrittain requested a review from martinjrobins May 16, 2023 08:28
@martinjrobins martinjrobins merged commit 137ae23 into pybamm-team:develop May 17, 2023
Successfully merging this pull request may close this issue: use sundials parallel vectors for parallel solves
2 participants