Excessive RAM consumption when running multiple simulations (Solver Issue) #1442

Closed
huegi opened this issue Mar 22, 2021 · 30 comments · Fixed by #2823

huegi commented Mar 22, 2021

Problem description

When using the same solver object for multiple simulations, memory consumption increases every simulation. The issue is triggered by the solver object. (If this is desired behaviour, please close the issue.)

Link to Slack Discussion

This minimal example triggers the issue:

import pybamm as pb
import gc

solver = pb.CasadiSolver(mode="safe")
para = pb.ParameterValues(chemistry=pb.parameter_sets.Chen2020)
model = pb.lithium_ion.SPM()
exp = pb.Experiment(["Discharge at 0.2C until 2.9V (1 minutes period)"])

for i in range(30):
    print(i)
    sim = pb.Simulation(model, experiment=exp, parameter_values=para, solver=solver)
    sim.solve()

    del sim
    gc.collect()

This minimal example is a workaround for the issue:

import pybamm as pb
import gc

para = pb.ParameterValues(chemistry=pb.parameter_sets.Chen2020)
exp = pb.Experiment(["Discharge at 0.2C until 2.9V (1 minutes period)"])

model = pb.lithium_ion.SPM()

for i in range(30):
    print(i)
    solver = pb.CasadiSolver(mode="safe")  # creating a new solver object every iteration
    sim = pb.Simulation(model, experiment=exp, parameter_values=para, solver=solver)
    sim.solve()

    del sim, solver  # deleting the objects every iteration frees the space
    gc.collect()
@valentinsulzer (Member)

I have never done much with memory profiling. Do you know a workflow for this?

@valentinsulzer (Member)

Ok, I think I understand what is going on (but not 100% sure)

What's happening

  1. When the simulation solves the model, it has to first set parameters and discretize. When doing this, it creates a new copy of the model, so that the model that you pass in is not unexpectedly changed
  2. When solving, the solver first performs a "set up" step where it does some preprocessing on the model (calculating jacobians, converting to casadi, etc). To avoid having to do this every time the same model is solved, it saves which models it has already preprocessed in a dictionary self.models_set_up.
  3. Hence, in each iteration of the loop the solver sees a new model (since the simulation creates a new model each time), performs the "set up" step again, and stores the result. These set-up models are the really memory-expensive objects: they stay stored in the solver and are not removed by garbage collection when the simulation is deleted (unless the solver is also deleted). See the sketch after this list.
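
For illustration, here is a minimal sketch of that caching pattern (hypothetical names, not the actual PyBaMM internals): the solver keeps a reference to every model it has preprocessed, so those models outlive the simulations that created them.

class CachingSolver:
    def __init__(self):
        self.models_set_up = {}  # model -> expensive preprocessed quantities

    def set_up(self, model):
        # stand-in for the expensive set-up step (jacobians, casadi conversion, ...)
        return {"name": type(model).__name__}

    def solve(self, model):
        # a model seen for the first time is set up and cached; the cache entry
        # keeps the model alive until the solver itself is deleted
        if model not in self.models_set_up:
            self.models_set_up[model] = self.set_up(model)
        return self.models_set_up[model]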

Workaround

I don't think it's worth doing anything about this at this stage. Depending on your use case some possible fixes are:

  • Define the simulation outside the for loop so that the discretization only happens once (this will get you a big performance gain anyway)
  • If that's not possible, and you don't want to delete the solver each time, clear which models are saved in the solver:
solver.models_set_up = {}

Let me know if either of these fix the excessive RAM consumption issue for you


huegi commented Mar 26, 2021

I quickly tried your solution:

import sys

import pybamm as pb


def workaround_sulzer_01():
    model = pb.lithium_ion.SPM()
    para = pb.ParameterValues(chemistry=pb.parameter_sets.Chen2020)
    exp = pb.Experiment(["Discharge at 0.2C until 2.9V (1 minutes period)"])
    solver = pb.CasadiSolver(mode="safe")

    for i in range(1000):
        sim = pb.Simulation(model, experiment=exp, parameter_values=para, solver=solver)
        sim.solve()

        solver.models_set_up = {}

        print(f"cycle {i:>4},   solver_size: {sys.getsizeof(solver):>4}B")

Unfortunately, the RAM consumption continues to increase with this workaround.

Do you think initialising a new solver every time costs a lot of computing time? With that approach the issue gets solved (see the second code example in the first post).

@valentinsulzer (Member)

What about defining the simulation outside the for loop? Or is that not possible for your use case?


huegi commented Mar 26, 2021

I am updating the model every iteration based on data from a PV system. It is a kind of lifetime simulation for PV batteries based on real data, and I feed my data in on an hourly basis as an experiment.

But even if I define the simulation outside, this would not solve the issue with the solver. RAM consumption is certainly increasing due to the solver and not the simulation.

If it helps, I can check which variable in the solver is causing the increase.

@valentinsulzer (Member)

It would still be good to check what happens if you put the simulation outside the for loop. The simulation creating a new model each time might be the problem.

We have pybamm.InputParameter which allows you to do things like you are doing more efficiently (without redefining the model each time). You make something an input parameter instead of a regular Parameter or Scalar and then you can specify its value at solve time. If you want to DM me a MWE I can help set this up
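
For illustration, a hedged sketch of that input-parameter pattern (the parameter name and current values below are placeholders, not taken from this issue):

import pybamm as pb

# Mark a quantity as an input so its value can be supplied at solve time,
# instead of rebuilding the model each iteration.
para = pb.ParameterValues(chemistry=pb.parameter_sets.Chen2020)
para["Current function [A]"] = "[input]"  # equivalent to using pb.InputParameter

model = pb.lithium_ion.SPM()
sim = pb.Simulation(model, parameter_values=para)

for current in [0.5, 1.0, 1.5]:
    sim.solve([0, 3600], inputs={"Current function [A]": current})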

Yes, would be great to pinpoint the variable in the solver that is causing issues.


martinjrobins commented Mar 26, 2021

There are a couple more storage variables in the casadi solver, solver.integrators and solver.integrator_specs. Clearing these as well worked a lot better for me, but there is still a slow increase in RAM usage:

import sys

import pybamm as pb

model = pb.lithium_ion.SPM()
para = pb.ParameterValues(chemistry=pb.parameter_sets.Chen2020)
exp = pb.Experiment(["Discharge at 0.2C until 2.9V (1 minutes period)"])
solver = pb.CasadiSolver(mode="safe")

for i in range(1000):
    sim = pb.Simulation(model, experiment=exp, parameter_values=para, solver=solver)
    sim.solve()

    solver.models_set_up = {}
    solver.integrators = {}
    solver.integrator_specs = {}

    print(f"cycle {i:>4},   solver_size: {sys.getsizeof(solver):>4}B")

[memory profile plot: slower but still increasing RAM usage]

@valentinsulzer (Member)

Ah yeah I forgot about those. How did you generate that plot?

@martinjrobins (Contributor)

I used mprof https://pypi.org/project/memory-profiler/

@valentinsulzer (Member)

Any update on this @huegi ?


huegi commented Mar 30, 2021

def workaround_RAM_loss():
    model = pb.lithium_ion.SPM()
    para = pb.ParameterValues(chemistry=pb.parameter_sets.Chen2020)
    exp = pb.Experiment(["Rest for 5 minutes"])
    solver = pb.CasadiSolver(mode="safe")
    sim = pb.Simulation(model, experiment=exp, parameter_values=para, solver=solver)

    for i in range(100):
        sim.solve()
        sim.set_up_experiment(
            pb.lithium_ion.SPM(), pb.Experiment(["Rest for 5 minutes"])
        )

        sim.solver.models_set_up = {}
        sim.solver.integrators = {}
        sim.solver.integrator_specs = {}

        print(f"cycle {i:>4}")

Running the code above without resetting the three solver variables results in:
[memory profile plot: RAM usage keeps increasing]

Running the code with those three variables reset works totally fine:
[memory profile plot: RAM usage stays flat]

Then I tried to find the variable(s) causing the problem:
[memory profile plots with the solver variables reset individually]

Perhaps one of you can explain why the memory is only released when all three variables are deleted.

PS: If you select a longer experiment, the memory consumption increases in smaller steps, as in @martinjrobins's picture. So you could probably also free up memory there.

@martinjrobins (Contributor)

It could be that the three variables reference the memory in the others, so deleting one has no effect as the memory is still referenced somewhere (this is a guess!). I agree that there is still a (small) memory leak somewhere. This may be specific to the casadi solver, as I also tried the scipy solver and did not see a memory increase over time.


huegi commented Mar 30, 2021

I found out that the method solver.step triggers the additional memory leak:
[memory profile plot]

I was able to track it down to the base_solver class. I see that this func is a vector. But I can't find it afterwards. Could it be that wrong memory associations are made there from time to time (it doesn't happen every cycle!)?
[screenshot]

@valentinsulzer (Member)

> Perhaps one of you can explain why the memory is only released when all three variables are deleted

All three dictionaries have the model as their key, so the model's memory only gets released when all three are deleted
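
As a small self-contained illustration of that point (generic Python, not PyBaMM code):

import gc
import weakref


class Model:
    pass


model = Model()
alive = weakref.ref(model)  # lets us check whether the model has been freed

models_set_up = {model: "set-up data"}
integrators = {model: "integrator"}
integrator_specs = {model: "integrator spec"}

del model
models_set_up.clear()
integrators.clear()
gc.collect()
print(alive() is None)  # False: integrator_specs still holds the model as a key

integrator_specs.clear()
gc.collect()
print(alive() is None)  # True: the last reference is gone and the memory is freed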

> I was able to track it down to the base_solver class. I see that this func is a vector. But I can't find it afterwards. Could it be that wrong memory associations are made there from time to time (it doesn't happen every cycle!)?

I know nothing about memory leaks, but plausible. This func object is returned by the process function, so it becomes rhs, algebraic, etc. However in https://github.com/pybamm-team/PyBaMM/blob/develop/pybamm/solvers/base_solver.py#L400-L402 it doesn't assign the func to anything, so maybe that is the problem? Can you try with model.events = [] to check?


huegi commented Mar 30, 2021

Thank you for your idea with model.events = [], @tinosulzer.
I added it here in the base_solver. Now it seems to free up the memory after every iteration correctly:
[memory profile plot: memory is freed after every iteration]

Funny how Python deals with unallocated memory in this case. I have never seen anything like that. But I'm also relatively new to the game ;)

def workaround_RAM_loss():
    sim = pb.Simulation(
        model=pb.lithium_ion.SPM(),
        experiment=pb.Experiment(["Discharge at 0.2C until 3.4V"]),
        parameter_values=pb.ParameterValues(chemistry=pb.parameter_sets.Chen2020),
        solver=pb.CasadiSolver(mode="safe"),
    )

    for i in range(100):
        sim = myCycleFunc(sim)
        print(f"cycle: {i}")


def myCycleFunc(sim: pb.Simulation):
    sim.solver.models_set_up = {}
    sim.solver.integrators = {}
    sim.solver.integrator_specs = {}

    sim.set_up_experiment(
        pb.lithium_ion.SPM(), pb.Experiment(["Discharge at 0.2C until 3.4V"])
    )
    sim.solve()
    # --> normally I would analyze the results here
    return sim

Do you think that we have now found all the factors that led to the increase in memory? The latter fix with model.events can probably be included in the code.
And regarding the first issue in the solver: what would you think of a solver.clean method that deletes the three dicts?

@valentinsulzer (Member)

Can you try replacing https://github.com/pybamm-team/PyBaMM/blob/develop/pybamm/solvers/base_solver.py#L400-L402 with

_, event_eval, _ = process(
    event.expression, f"event_{n}", use_jacobian=False
)

?
Maybe the assignment to _ will force it to reuse that memory? Removing model.events is not a good general solution to put in base_solver, as in some cases we do need the casadi form of the events.


valentinsulzer commented Mar 30, 2021

Also, I will stress again that you're inducing some (probably unnecessary) overheads by running sim.set_up_experiment inside the for loop.

And feel free to add a solver.reset() function which clears those dictionaries


huegi commented Mar 30, 2021

I'm using sim.set_up_experiment because I am trying to emulate the behaviour of my actual code. There I'm running new experiments on an hourly basis and therefore have to update the model and call set_up_experiment. I've updated my minimal example so that it makes more sense:

def workaround_RAM_loss():
    model = pb.lithium_ion.SPM()
    exp_charge = pb.Experiment(["Charge at 0.2C for 30 min or until 4V"])
    exp_discharge = pb.Experiment(["Discharge at 0.2C for 30 min or until 3V"])

    sim = pb.Simulation(
        model=model,
        experiment=pb.Experiment(["Discharge at 0.2C for 120 min or until 3V"]),
        parameter_values=pb.ParameterValues(chemistry=pb.parameter_sets.Chen2020),
        solver=pb.CasadiSolver(mode="safe"),
    )
    sim.solve()
    for i in range(1000):
        model.set_initial_conditions_from(solution=sim.solution, inplace=True)
        sim.reset() # new proposal to fix the issue

        if i % 2 == 0:
            sim.set_up_experiment(model, exp_charge)
        else:
            sim.set_up_experiment(model, exp_discharge)
        sim.solve()

        last_voltage = sim.solution["Terminal voltage [V]"].entries[-1]
        print(f"cycle: {i}; last_voltage: {last_voltage}")

In the following you can find the different memory profiles for 100 cycles:
[four memory profile plots for 100 cycles, with different combinations of the solver variables and model.events reset]

As you can see, it needs the model.events = [] expression to fix the problem.
I think the easiest fix to implement is a method in the Simulation class that resets the three solver attributes and the events of the model:

def reset(self):
    self.solver.models_set_up = {}
    self.solver.integrators = {}
    self.solver.integrator_specs = {}
    self.model.events = []

In this last image you can see what's possible with this reset method (at 1000 cycles). There is still some memory leakage, but I think it is small enough to run all applications without problems.
[memory profile plot for 1000 cycles with the reset method]

@valentinsulzer (Member)

Removing the events of the model is not the way to go; it will almost certainly break something else.
Actually, your reset function is not removing the events, since self.model is a copy of model and you pass the original model back in when you set up the experiment. So the memory leak in the last plot is still from the events.

@chuckliu1979

I encounter a similar problem when solving with the parameter "starting_solution": memory increases to 7.2 GB by loop 574.
I tried the same workarounds here, but they did not help.

@valentinsulzer (Member)

@chuckliu1979 can you post your code?

@chuckliu1979

> @chuckliu1979 can you post your code?

Sure:

import time
import numpy
import pybamm

def constant_t_eval():
    return numpy.linspace(0, 5760, 576)


def constant_var_pts():
    var = pybamm.standard_spatial_vars
    return {var.x_n: 20, var.x_s: 20, var.x_p: 20, var.r_n: 10, var.r_p: 10}


class PVExperiment(pybamm.Simulation):
    def __init__(
        self,
        model=None,
        experiment=None,
        parameter_values=None,
        var_pts=None,
        solver=None,
        options=None,
        power_supply=None,
        electrodes=None,
        cells=None,
    ):
        self._t_eval = None
        options = options or {"thermal": "lumped"}
        model = model or pybamm.lithium_ion.DFN(options=options)
        var_pts = var_pts or constant_var_pts()
        solver = solver or pybamm.CasadiSolver(mode="safe")
        solver.max_step = 100000
        parameter_values = parameter_values or pybamm.ParameterValues(chemistry=pybamm.parameter_sets.Chen2020)
        parameter_values.update(
            {
                "Number of electrodes connected in parallel to make a cell": electrodes or 4,
                "Number of cells connected in series to make a battery": cells or 1,
            },
            check_already_exists=False,
        )
        if experiment is None:
            parameter_values.update(
                {
                    "Power function [W]": power_supply or 39.620,
                },
                check_already_exists=False,
            )
        super().__init__(model=model, parameter_values=parameter_values, experiment=experiment, var_pts=var_pts, solver=solver)

    @property
    def t_eval(self):
        if self._t_eval is None and self.solution is not None:
            self._t_eval = self.solution["Time [s]"].entries
        return self._t_eval

    def solve(
        self,
        t_eval=None,
        solver=None,
        check_model=True,
        save_at_cycles=None,
        starting_solution=None,
        **kwargs,
    ):
        self._t_eval, t_eval = None, None
        return super().solve(
            t_eval=t_eval,
            solver=solver,
            check_model=check_model,
            save_at_cycles=save_at_cycles,
            starting_solution=starting_solution,
            **kwargs,
        )

if __name__ == '__main__':
    solution = None
    parameter_values = None

    for i in range(0, 574):
        print(f"loop {i}:")
        parameter_values = parameter_values or pybamm.ParameterValues(chemistry=pybamm.parameter_sets.Chen2020)
        vmin = parameter_values["Lower voltage cut-off [V]"]
        sim = PVExperiment(experiment=pybamm.Experiment([tuple([f"Discharge at 39.620 W for 10 seconds or until {vmin} V"])]), parameter_values=parameter_values)
        solution = sim.solve(starting_solution=solution)

    if solution is not None:
        print(solution["Terminal voltage [V]"].entries)
    print("Sleep...")
    time.sleep(600)

@jsbrittain (Contributor)

Hi - I've been looking into this issue today and a major (but apparently not the only) problem appears to have been caused by a failure to release both the Simulation and ElectrodeSOHSolver instances for garbage collection due to the use of lru_cache decorators in simulation.py and full_battery_models/lithium_ion/electrode_soh.py. Removal of the decorators allows garbage collection to occur as normal.

The lru_cache containers can instead be retained by wrapping the functions during class construction, which makes the containers local to the instance (so they are destroyed when the instance goes out of scope or is marked for deletion). Here's a nice explanation of the issue that I found while searching for a suitable solution: https://rednafi.github.io/reflections/dont-wrap-instance-methods-with-functoolslru_cache-decorator-in-python.html.
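
A minimal sketch of the two patterns (hypothetical class and method names, not the actual PyBaMM code):

from functools import lru_cache


class Leaky:
    # A class-level @lru_cache stores `self` in its global cache on every call,
    # so instances are never released for garbage collection.
    @lru_cache(maxsize=None)
    def get_solver(self, option):
        return f"expensive solver for {option}"


class NonLeaky:
    def __init__(self):
        # Wrapping during construction keeps the cache on the instance,
        # so it is destroyed together with the instance.
        self.get_solver = lru_cache(maxsize=None)(self._get_solver)

    def _get_solver(self, option):
        return f"expensive solver for {option}"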

As a demonstration, here are some memory_profiler results for @huegi's original minimal example (slightly updated due to the deprecation of the 'chemistry' argument in ParameterValues).

Code:

import pybamm as pb
import gc

solver = pb.CasadiSolver(mode="safe")
para = pb.ParameterValues('Chen2020')
model = pb.lithium_ion.SPM()
exp = pb.Experiment(["Discharge at 0.2C until 2.9V (1 minutes period)"])

for i in range(30):
    print(i)
    sim = pb.Simulation(model, experiment=exp, parameter_values=para, solver=solver)
    sim.solve()

    del sim
    gc.collect()

Using decorators:
[memory profile plot: with decorators]

Using instance wrappers:
[memory profile plot: with instance wrappers]

Note that memory consumption increases much more slowly, and drops from a peak of ~310 MB to ~180 MB. This does not entirely resolve the problem, but I can confirm that the destructors are now being called on each loop.

There is also one other use of lru_cache as an instance decorator in the repository (expression_tree/symbol.py).

@martinjrobins (Contributor)

Thanks @jsbrittain, nice catch. You would think the lru_cache docs would put some sort of warning on that decorator


agriyakhetarpal commented Mar 24, 2023

@martinjrobins I remember implementing @lru_cache earlier when I was memoizing those methods in #2465: iirc, it was used because @cache was only introduced in Python 3.9, so using it would break compatibility since we still support Python 3.8. I am not sure whether @cache's handling of maxsize is different in this case, because it essentially runs @lru_cache(maxsize=None) as a wrapper internally.
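
For reference, a small sketch of that equivalence (illustrative only):

import functools
import sys

# functools.cache exists only on Python 3.9+; on 3.8 the equivalent spelling
# is lru_cache(maxsize=None), which is what @cache wraps internally.
if sys.version_info >= (3, 9):
    memoize = functools.cache
else:
    memoize = functools.lru_cache(maxsize=None)


@memoize
def square(x):
    return x * x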

@martinjrobins (Contributor)

@jsbrittain's solution of wrapping the functions during class construction doesn't seem to have a downside (apart from an extra line of code), so it sounds like we should do this for all the instances of @lru_cache unless anyone has objections...

@valentinsulzer (Member)

Nice catch @jsbrittain, happy to revert the change to get rid of @lru_cache. Though it doesn't completely fix this issue (which predates @lru_cache), since the memory shouldn't increase at all in this example.


jsbrittain commented Mar 24, 2023

@martinjrobins @tinosulzer - yes, as you say this is not the only issue...

The other issue arising from @huegi's minimal sample comes from the solver (as was originally identified), due to an accumulation of items in the integrators and integrator_specs dictionaries. As others have suggested, resetting these (or constructing a new solver on each iteration of the loop) resolves the problem (image below, run for 100 iterations). One possible solution here would be to implement a flag that discards old integrators, possibly by discarding the least recently used items (much like the LRU cache implementations used elsewhere); see the sketch after the plot below.

[memory profile plot over 100 iterations]
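
A hedged sketch of that least-recently-used idea (illustrative names, not the actual PyBaMM implementation): a size-limited dictionary that evicts the oldest entry once a limit is reached.

from collections import OrderedDict


class BoundedCache(OrderedDict):
    # A size-limited dict that evicts the least recently used entry.
    def __init__(self, maxsize=2):
        super().__init__()
        self.maxsize = maxsize

    def __setitem__(self, key, value):
        if key in self:
            self.move_to_end(key)
        super().__setitem__(key, value)
        if len(self) > self.maxsize:
            self.popitem(last=False)  # drop the oldest entry


integrators = BoundedCache(maxsize=2)
for name in ["model_a", "model_b", "model_c"]:
    integrators[name] = f"integrator for {name}"
print(list(integrators))  # ['model_b', 'model_c']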


@chuckliu1979's code sample, on the other hand, produces a growing solution object on each iteration, which is then captured by sim through the starting_solution parameter, so both sim and solution increase in size steadily over time (solution contains a number of lists which grow linearly with each successive iteration, as does solution._summary_variables). Discarding old estimates on each iteration of the loop seems to resolve the problem (below are results from the original code, and from truncating the lists to the most recent 2 items):

[memory profile plot: original code]

[memory profile plot: lists truncated to the most recent 2 items]

However, since the ability to run simulations iteratively seems desirable, one option would be to provide a user-adjustable limit on the length/history of these lists, which could default to retaining only the current and previous value sets, for example keep_history=1. Thus it seems that limiting the history of the solvers is likely to resolve both problem cases. Note that this suggestion also appears related to #2796.


Summarising:

  1. I would recommend associating the lru_cache methods with the class instance instead of the class object, since this seems like the original desired behaviour.
  2. Should old integrators be discarded past some limit? This seems reasonable.
  3. Should the solution history be discarded, past some limit? This also seems reasonable. However, I am not familiar enough with the codebase just yet to have a good feel as to whether this would prove problematic in any use-cases. I can certainly implement these changes and run tests to find out.

@jsbrittain (Contributor)

Summary point 3 (above) is also likely to be resolvable by forwarding solution.last_state as the starting_solution (instead of the full solution profile); however, this would first require resolution of outstanding issue #2788.

@valentinsulzer (Member)

I agree points 1 and 2 should be resolved as you suggest.
For point 3, I would prefer to keep the full history by default, and document that for RAM-intensive cases the user should pass in starting_solution.last_state (after we fix #2788); i.e. if the user passes in starting_solution, we would not truncate it to starting_solution.last_state, leaving that truncation up to the user if they so desire.
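
Once #2788 is fixed, that documented usage might look roughly like this (a hedged sketch, not tested against the current API; experiment string and loop are placeholders):

import pybamm as pb

sim = pb.Simulation(
    pb.lithium_ion.SPM(),
    experiment=pb.Experiment(["Discharge at 0.2C for 10 minutes"]),
)

solution = None
for i in range(10):
    # pass only the final state forward instead of the full solution history
    starting = solution.last_state if solution is not None else None
    solution = sim.solve(starting_solution=starting)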

To give some more context, solving a model gives the outputs t and y, but often there are other variables of interest f(t, y). The way Solution currently works is by storing only t and y by default, and evaluating any other variable lazily when it is requested from the Solution dictionary, by evaluating f(t[idx], y[:, idx]) for all idx. Therefore, it is important to keep the full history of t and y so that users have the possibility of evaluating any variable over the entire history.
The exception will be when running as described in #2796, in which case we know which variables to save and won't be saving solution.y anyway (though we should keep solution.last_state in that case for initializing future solutions).

Another bug is that Solution contains Solution.cycles, a list of sub-solutions, and Solution.cycles[idx].steps, a list of sub-steps. This allows users to easily call a specific step, e.g. solution.cycles[3].steps[0]. However, the solution, cycles, and steps all hold a copy of the y matrix, which means (for large y matrices) the memory usage is 3x larger than it needs to be. Using views instead would alleviate this problem (can that be done in Python?)
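
On the views question, a small NumPy sketch (outside PyBaMM): basic slicing already returns views that share memory with the parent array, so cycle and step sub-solutions could reference slices of a single y matrix instead of holding copies.

import numpy as np

y = np.arange(12.0).reshape(3, 4)

cycle_view = y[:, 1:3]         # basic slicing returns a view; no data is copied
print(cycle_view.base is y)    # True

cycle_copy = y[:, 1:3].copy()  # an explicit copy owns its own data
print(cycle_copy.base is y)    # False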
