Track down and exclude non-deterministic benchmarks #2954

Saransh-cpp · 2023-05-15T18:28:44Z

Currently, several benchmarks pass or fail randomly on different runs. These benchmarks should be tracked and excluded to ensure the credibility of the remaining benchmarking suite.

Note: random regressions are okay because GH Actions can be a bit noisy sometimes, but solvers failing in a particular benchmark randomly should not be okay.

jsbrittain · 2023-05-16T08:26:02Z

This sounds similar to an issue identified for the integration tests, which was caused by random initial conditions and resolved by wrapping the unittest class to fix the random seeds across all tests; see #2844
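As a rough illustration of that approach (this is not the code from #2844; the class names and the seed value are made up for the sketch, and only the standard-library generator is seeded), a base test class can reseed the generator before every test so that each test sees the same random sequence regardless of run order:

```python
import random
import unittest


class SeededTestCase(unittest.TestCase):
    """Hypothetical base class: reseed the RNG before every test method."""

    SEED = 42  # arbitrary fixed value, chosen for illustration only

    def setUp(self):
        # Reseeding here makes every test start from the same RNG state.
        # A suite that relies on NumPy would seed numpy.random here as well.
        random.seed(self.SEED)


def expected_first_draw():
    # Independent generator with the same seed, used as the reference value.
    return random.Random(SeededTestCase.SEED).random()


class TestRandomInitialCondition(SeededTestCase):
    def test_draw_a(self):
        self.assertEqual(random.random(), expected_first_draw())

    def test_draw_b(self):
        # Same first draw as test_draw_a, whichever test runs first.
        self.assertEqual(random.random(), expected_first_draw())
```

Any test class that inherits from the seeded base picks up the fixed seed without further changes.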

rtimms · 2023-05-16T08:49:57Z

These non-deterministic issues worry me. They come from perturbing the initial conditions in the casadi solver, so we know why they happen, but I think for the user it is very unexpected behaviour. My vote would be to make this option False by default and suggest that users try setting it to True if the model struggles to get started. @tinosulzer, in your experience, how much difference did it make to the robustness of the solver?

valentinsulzer · 2023-05-16T16:39:56Z

It was really useful for avoiding errors at t=0 (IDA_INIT or something like that). But that was with casadi 3.5, so 3.6 might be more robust

Saransh-cpp · 2023-06-09T17:43:16Z

I haven't had any time to work on this. I'll unassign myself and close the PR (which I don't think has anything meaningful in it as of now). I'll pick it up again in a couple of weeks if it is still open by then.

martinjrobins · 2023-06-12T15:33:51Z

It looks like this benchmark has been failing regularly because it times out (the timeout is 60s):

         [ 91.43%] ··· time_solve_models.TimeSolveDFN.time_solve_model        1/28 failed
         False       ORegan2022    pybamm.solvers.casadi_solver.CasadiSolver     failed

I'll look at this locally and see if I can see what is happening.


[ 91.43%] ··· time_solve_models.TimeSolveDFN.time_solve_model        1/28 failed
[ 91.43%] ··· ============= ============== =========================================== ============
               solve first    parameter                    solver_class                            
              ------------- -------------- ------------------------------------------- ------------
                  False      Marquis2019    pybamm.solvers.casadi_solver.CasadiSolver    672±20ms  
                  False      Marquis2019    pybamm.solvers.idaklu_solver.IDAKLUSolver    485±30ms  
                  False       ORegan2022    pybamm.solvers.casadi_solver.CasadiSolver     failed   
                  False       ORegan2022    pybamm.solvers.idaklu_solver.IDAKLUSolver   4.55±0.07s 
                  False       Prada2013     pybamm.solvers.casadi_solver.CasadiSolver    527±6ms   
                  False       Prada2013     pybamm.solvers.idaklu_solver.IDAKLUSolver    490±20ms  
                  False         Ai2020      pybamm.solvers.casadi_solver.CasadiSolver   4.10±0.1s  
                  False         Ai2020      pybamm.solvers.idaklu_solver.IDAKLUSolver   1.29±0.05s 
                  False      Ramadass2004   pybamm.solvers.casadi_solver.CasadiSolver    498±40ms  
                  False      Ramadass2004   pybamm.solvers.idaklu_solver.IDAKLUSolver    402±8ms   
                  False        Chen2020     pybamm.solvers.casadi_solver.CasadiSolver    590±40ms  
                  False        Chen2020     pybamm.solvers.idaklu_solver.IDAKLUSolver    466±20ms  
                  False       Ecker2015     pybamm.solvers.casadi_solver.CasadiSolver   3.12±0.1s  
                  False       Ecker2015     pybamm.solvers.idaklu_solver.IDAKLUSolver    731±20ms  
                   True      Marquis2019    pybamm.solvers.casadi_solver.CasadiSolver    447±20ms  
                   True      Marquis2019    pybamm.solvers.idaklu_solver.IDAKLUSolver    308±9ms   
                   True       ORegan2022    pybamm.solvers.casadi_solver.CasadiSolver      n/a     
                   True       ORegan2022    pybamm.solvers.idaklu_solver.IDAKLUSolver    4.32±0s   
                   True       Prada2013     pybamm.solvers.casadi_solver.CasadiSolver    339±20ms  
                   True       Prada2013     pybamm.solvers.idaklu_solver.IDAKLUSolver    369±20ms  
                   True         Ai2020      pybamm.solvers.casadi_solver.CasadiSolver   3.20±0.06s 
                   True         Ai2020      pybamm.solvers.idaklu_solver.IDAKLUSolver   1.06±0.03s 
                   True      Ramadass2004   pybamm.solvers.casadi_solver.CasadiSolver    305±10ms  
                   True      Ramadass2004   pybamm.solvers.idaklu_solver.IDAKLUSolver    292±10ms  
                   True        Chen2020     pybamm.solvers.casadi_solver.CasadiSolver    414±20ms  
                   True        Chen2020     pybamm.solvers.idaklu_solver.IDAKLUSolver    294±6ms   
                   True       Ecker2015     pybamm.solvers.casadi_solver.CasadiSolver   2.65±0.07s 
                   True       Ecker2015     pybamm.solvers.idaklu_solver.IDAKLUSolver    449±20ms  
              ============= ============== =========================================== ============

[ 91.43%] ···· For parameters: False, 'ORegan2022', <class 'pybamm.solvers.casadi_solver.CasadiSolver'>
               
               
               asv: benchmark timed out (timeout 60.0s)

martinjrobins · 2023-06-12T15:49:45Z

note that this particular test takes waaaaay longer than the rest (when it succeeds)

False       ORegan2022    pybamm.solvers.casadi_solver.CasadiSolver   38.4±0.3s

martinjrobins · 2023-06-12T15:55:03Z

suggestion from @DrSOKane: try bumping the number of particle grid points to 30

martinjrobins · 2023-06-12T15:56:05Z

suggestion from @brosaplanella: Looking at asv.conf.json, it seems we can specify different regression thresholds for various benchmarks (final lines of the json file):

// "regressions_thresholds": {
// "some_benchmark": 0.01, // Threshold of 1%
// "another_benchmark": 0.5, // Threshold of 50%
// },

Might be a way around it: tight thresholds for "unit" benchmarks, looser for "integration" benchmarks
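A minimal sketch of how such per-benchmark thresholds could be organised, assuming the keys are regular expressions matched against benchmark names. The `unit_benchmarks` pattern, the threshold values, and the fallback are placeholders rather than settings from the PyBaMM repository, and the lookup below is this sketch's own logic, not asv's:

```python
import re

# Hypothetical mapping in the shape of asv's "regressions_thresholds":
# regex on the benchmark name -> allowed relative slowdown.
REGRESSION_THRESHOLDS = {
    r"unit_benchmarks\..*": 0.05,        # tight: 5% (placeholder name)
    r"time_solve_models\..*": 0.5,       # loose: 50%
    r"time_sims_experiments\..*": 0.5,   # loose: 50%
}
FALLBACK_THRESHOLD = 0.1  # placeholder used when no pattern matches


def threshold_for(benchmark_name):
    """Return the threshold of the first pattern matching the name."""
    for pattern, threshold in REGRESSION_THRESHOLDS.items():
        if re.match(pattern, benchmark_name):
            return threshold
    return FALLBACK_THRESHOLD


def is_regression(benchmark_name, old_time, new_time):
    """Flag a slowdown whose relative change exceeds the threshold."""
    relative_change = (new_time - old_time) / old_time
    return relative_change > threshold_for(benchmark_name)
```

With these placeholder values, a 30% slowdown would be flagged for a "unit" benchmark but tolerated for `time_solve_models.TimeSolveDFN.time_solve_model`.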

brosaplanella · 2023-06-12T15:59:23Z

> note that this particular test takes waaaaay longer than the rest (when it succeeds)
>
> False       ORegan2022    pybamm.solvers.casadi_solver.CasadiSolver   38.4±0.3s

This is quite a tricky parameter set, as its diffusion coefficient is very nonlinear. I would be happy to skip the Casadi benchmark for it, or skip it altogether for the time being.

It is also likely the reason the benchmarks take so long to run...

martinjrobins · 2023-06-12T16:02:53Z

> > note that this particular test takes waaaaay longer than the rest (when it succeeds)
> >
> > False       ORegan2022    pybamm.solvers.casadi_solver.CasadiSolver   38.4±0.3s
>
> This is quite a tricky parameter set, as its diffusion coefficient is very nonlinear. I would be happy to skip the Casadi benchmark for it, or skip it altogether for the time being.
>
> It is also likely the reason the benchmarks take so long to run...

I notice we're already skipping it for solve_first=True. We are also already using 30 particle grid points, so I'll turn it off for now (for the casadi solver only; we're still running it for idaklu)
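For reference, one way to turn off a single parameter combination in asv is to raise `NotImplementedError` from `setup`, which asv reports as a skipped benchmark. The class below is only a sketch of that mechanism, not the actual PyBaMM benchmark: the class name is hypothetical, the parameter grid is shortened, and solver classes are represented by name so it runs without PyBaMM installed.

```python
class TimeSolveDFNSketch:
    """Hypothetical asv-style benchmark with one combination skipped."""

    # Shortened parameter grid; the real suite covers more parameter sets.
    params = (
        [False, True],
        ["Marquis2019", "ORegan2022"],
        ["CasadiSolver", "IDAKLUSolver"],
    )
    param_names = ["solve first", "parameter", "solver_class"]

    # Combinations to skip, keyed on (parameter, solver_class).
    SKIPPED = {("ORegan2022", "CasadiSolver")}

    def setup(self, solve_first, parameter, solver_class):
        if (parameter, solver_class) in self.SKIPPED:
            # asv treats NotImplementedError in setup as "skip this case".
            raise NotImplementedError
        self.config = (solve_first, parameter, solver_class)

    def time_solve_model(self, solve_first, parameter, solver_class):
        # Placeholder for the timed work (building and solving the model).
        return self.config
```

The skipped combination is reported as n/a rather than as a failure, and the IDAKLU case for the same parameter set still runs.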

martinjrobins · 2023-06-13T11:45:27Z

This benchmark is also occasionally timing out:

    GITT      Marquis2019   pybamm.models.full_battery_models.lithium_ion.dfn.DFN   pybamm.solvers.idaklu_solver.IDAKLUSolver   23.7±0.01s

We do not do the corresponding benchmark for casadi (this has previously been turned off). Should we do the same for idaklu?

[ 95.00%] ··· time_sims_experiments.TimeSimulation.time_solve                 ok
[ 95.00%] ··· ============ ============= ======================================================= =========================================== ============
               experiment    parameter                         model_class                                       solver_class                            
              ------------ ------------- ------------------------------------------------------- ------------------------------------------- ------------
                  CCCV      Marquis2019   pybamm.models.full_battery_models.lithium_ion.spm.SPM   pybamm.solvers.casadi_solver.CasadiSolver   6.16±0.01s 
                  CCCV      Marquis2019   pybamm.models.full_battery_models.lithium_ion.spm.SPM   pybamm.solvers.idaklu_solver.IDAKLUSolver    6.19±0s   
                  CCCV      Marquis2019   pybamm.models.full_battery_models.lithium_ion.dfn.DFN   pybamm.solvers.casadi_solver.CasadiSolver    8.84±0s   
                  CCCV      Marquis2019   pybamm.models.full_battery_models.lithium_ion.dfn.DFN   pybamm.solvers.idaklu_solver.IDAKLUSolver   8.34±0.01s 
                  CCCV        Chen2020    pybamm.models.full_battery_models.lithium_ion.spm.SPM   pybamm.solvers.casadi_solver.CasadiSolver    2.77±0s   
                  CCCV        Chen2020    pybamm.models.full_battery_models.lithium_ion.spm.SPM   pybamm.solvers.idaklu_solver.IDAKLUSolver    2.78±0s   
                  CCCV        Chen2020    pybamm.models.full_battery_models.lithium_ion.dfn.DFN   pybamm.solvers.casadi_solver.CasadiSolver    5.54±0s   
                  CCCV        Chen2020    pybamm.models.full_battery_models.lithium_ion.dfn.DFN   pybamm.solvers.idaklu_solver.IDAKLUSolver    4.66±0s   
                  GITT      Marquis2019   pybamm.models.full_battery_models.lithium_ion.spm.SPM   pybamm.solvers.casadi_solver.CasadiSolver   20.2±0.01s 
                  GITT      Marquis2019   pybamm.models.full_battery_models.lithium_ion.spm.SPM   pybamm.solvers.idaklu_solver.IDAKLUSolver    20.4±0s   
                  GITT      Marquis2019   pybamm.models.full_battery_models.lithium_ion.dfn.DFN   pybamm.solvers.casadi_solver.CasadiSolver      n/a     
                  GITT      Marquis2019   pybamm.models.full_battery_models.lithium_ion.dfn.DFN   pybamm.solvers.idaklu_solver.IDAKLUSolver   23.7±0.01s 
                  GITT        Chen2020    pybamm.models.full_battery_models.lithium_ion.spm.SPM   pybamm.solvers.casadi_solver.CasadiSolver   7.59±0.01s 
                  GITT        Chen2020    pybamm.models.full_battery_models.lithium_ion.spm.SPM   pybamm.solvers.idaklu_solver.IDAKLUSolver    7.73±0s   
                  GITT        Chen2020    pybamm.models.full_battery_models.lithium_ion.dfn.DFN   pybamm.solvers.casadi_solver.CasadiSolver    12.9±0s   
                  GITT        Chen2020    pybamm.models.full_battery_models.lithium_ion.dfn.DFN   pybamm.solvers.idaklu_solver.IDAKLUSolver   11.0±0.01s 
              ============ ============= ======================================================= =========================================== ============

martinjrobins · 2023-06-13T11:46:06Z

or I could increase the timeout if this benchmark is important
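If the timeout route is preferred, asv reads a `timeout` attribute (in seconds) from the benchmark. A minimal sketch, with a hypothetical class name and an arbitrary value of 120 s:

```python
class TimeSimulationSketch:
    """Hypothetical asv-style benchmark with a raised timeout."""

    # Seconds asv allows before it kills the benchmark (60 s in the logs above).
    timeout = 120  # arbitrary illustrative value

    def time_solve(self):
        # Placeholder for the timed work (solving the experiment).
        return sum(range(1000))
```

This only hides the symptom, so it probably makes sense only if the long-running case is worth keeping.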

brosaplanella · 2023-06-13T12:09:23Z

If it is GITT, maybe we can reduce the number of pulses tested. We currently do

"GITT": [("Discharge at C/20 for 1 hour", "Rest for 1 hour")] * 20,

so maybe we could do 10 pulses instead?
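To make the effect concrete, here is a small sketch of the two variants. The helper names are invented for illustration; only the step strings come from the benchmark definition quoted above.

```python
# One GITT pulse: a 1 hour discharge followed by a 1 hour rest.
GITT_PULSE = ("Discharge at C/20 for 1 hour", "Rest for 1 hour")


def gitt_steps(n_pulses):
    """Build the experiment as a list of repeated pulse tuples."""
    return [GITT_PULSE] * n_pulses


def simulated_hours(steps):
    """Each pulse covers 2 hours of simulated time (discharge + rest)."""
    return 2 * len(steps)


current = gitt_steps(20)  # present benchmark: 40 simulated hours
reduced = gitt_steps(10)  # proposal: 20 simulated hours
```

Halving the number of pulses halves the simulated time, and at C/20 the 10-pulse version discharges about half the nominal capacity instead of all of it.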

Saransh-cpp · 2023-06-13T14:26:14Z

(Some of the non-deterministic benchmarks were excluded in #2784.)


Saransh-cpp added the difficulty: easy ("A good issue for someone new. Can be done in a few hours") and priority: high ("To be resolved as soon as possible") labels May 15, 2023

Saransh-cpp self-assigned this May 15, 2023

martinjrobins mentioned this issue May 19, 2023

I2858-init-cond-sundials #2920

Merged

6 tasks

Saransh-cpp mentioned this issue May 28, 2023

Deterministic benchmarks #2995

Closed

8 tasks

Saransh-cpp removed their assignment Jun 9, 2023

valentinsulzer assigned martinjrobins Jun 12, 2023

valentinsulzer added the in-progress ("Assigned in the core dev monthly meeting") label Jun 12, 2023

martinjrobins added a commit that referenced this issue Jun 12, 2023

#2954 turn off benchmark: ORegan2022 DFN solver with the casadi model

13c9c91

martinjrobins mentioned this issue Jun 12, 2023

#2954 turn off benchmark: ORegan2022 DFN solver with the casadi model #3031

Merged

6 tasks

martinjrobins added a commit that referenced this issue Jun 19, 2023

#2954 reduce GITT benchmark to 10 cycles

8c455d6

martinjrobins closed this as completed in #3031 Jul 6, 2023

martinjrobins added a commit that referenced this issue Jul 6, 2023

Merge pull request #3031 from pybamm-team/i2954-benchmark-intermitant…

6b642a6

…-failure #2954 turn off benchmark: ORegan2022 DFN solver with the casadi model
