Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Track down and exclude non-deterministic benchmarks #2954

Closed
Saransh-cpp opened this issue May 15, 2023 · 14 comments · Fixed by #3031
Closed

Track down and exclude non-deterministic benchmarks #2954

Saransh-cpp opened this issue May 15, 2023 · 14 comments · Fixed by #3031
Assignees
Labels
difficulty: easy A good issue for someone new. Can be done in a few hours in-progress Assigned in the core dev monthly meeting priority: high To be resolved as soon as possible

Comments

@Saransh-cpp
Copy link
Member

Currently, several benchmarks pass or fail randomly on different runs. These benchmarks should be tracked and excluded to ensure the credibility of the remaining benchmarking suite.

Note: random regressions are okay because GH Actions can be a bit noisy sometimes, but solvers failing in a particular benchmark randomly should not be okay.

@Saransh-cpp Saransh-cpp added difficulty: easy A good issue for someone new. Can be done in a few hours priority: high To be resolved as soon as possible labels May 15, 2023
@Saransh-cpp Saransh-cpp self-assigned this May 15, 2023
@jsbrittain
Copy link
Contributor

This sounds like a similar issue identified for integration tests due to random initial conditions, which was resolved by wrapping the unittest class to fix the random seeds across all tests; see #2844

@rtimms
Copy link
Contributor

rtimms commented May 16, 2023

These non-deterministic issues worry me. It comes from perturbing the initial conditions in the casadi solver, so we know why it happens, but I think for the user it is very unexpected behaviour. My vote would be to make this option False by default and suggest that users try setting it to True if the model struggles to get started. @tinosulzer, in your experience, how much difference did it make to the robustness of the solver?

@valentinsulzer
Copy link
Member

It was really useful for avoiding errors at t=0 (IDA_INIT or something like that). But that was with casadi 3.5, so 3.6 might be more robust

@Saransh-cpp
Copy link
Member Author

I haven't been getting any time to work on this. I'll unassign myself and close the PR (which I don't think has something meaningful as of now). I'll pick it up again in a couple of weeks if it is still open by then.

@Saransh-cpp Saransh-cpp removed their assignment Jun 9, 2023
@valentinsulzer valentinsulzer added the in-progress Assigned in the core dev monthly meeting label Jun 12, 2023
@martinjrobins
Copy link
Contributor

Looks like this benchmark has been failing regularly due to it timing out (timeout is 60s)

         [ 91.43%] ··· time_solve_models.TimeSolveDFN.time_solve_model        1/28 failed
         False       ORegan2022    pybamm.solvers.casadi_solver.CasadiSolver     failed   

I'll look at this locally and see if I can see what is happening.


[ 91.43%] ··· time_solve_models.TimeSolveDFN.time_solve_model        1/28 failed
[ 91.43%] ··· ============= ============== =========================================== ============
               solve first    parameter                    solver_class                            
              ------------- -------------- ------------------------------------------- ------------
                  False      Marquis2019    pybamm.solvers.casadi_solver.CasadiSolver    672±20ms  
                  False      Marquis2019    pybamm.solvers.idaklu_solver.IDAKLUSolver    485±30ms  
                  False       ORegan2022    pybamm.solvers.casadi_solver.CasadiSolver     failed   
                  False       ORegan2022    pybamm.solvers.idaklu_solver.IDAKLUSolver   4.55±0.07s 
                  False       Prada2013     pybamm.solvers.casadi_solver.CasadiSolver    527±6ms   
                  False       Prada2013     pybamm.solvers.idaklu_solver.IDAKLUSolver    490±20ms  
                  False         Ai2020      pybamm.solvers.casadi_solver.CasadiSolver   4.10±0.1s  
                  False         Ai2020      pybamm.solvers.idaklu_solver.IDAKLUSolver   1.29±0.05s 
                  False      Ramadass2004   pybamm.solvers.casadi_solver.CasadiSolver    498±40ms  
                  False      Ramadass2004   pybamm.solvers.idaklu_solver.IDAKLUSolver    402±8ms   
                  False        Chen2020     pybamm.solvers.casadi_solver.CasadiSolver    590±40ms  
                  False        Chen2020     pybamm.solvers.idaklu_solver.IDAKLUSolver    466±20ms  
                  False       Ecker2015     pybamm.solvers.casadi_solver.CasadiSolver   3.12±0.1s  
                  False       Ecker2015     pybamm.solvers.idaklu_solver.IDAKLUSolver    731±20ms  
                   True      Marquis2019    pybamm.solvers.casadi_solver.CasadiSolver    447±20ms  
                   True      Marquis2019    pybamm.solvers.idaklu_solver.IDAKLUSolver    308±9ms   
                   True       ORegan2022    pybamm.solvers.casadi_solver.CasadiSolver      n/a     
                   True       ORegan2022    pybamm.solvers.idaklu_solver.IDAKLUSolver    4.32±0s   
                   True       Prada2013     pybamm.solvers.casadi_solver.CasadiSolver    339±20ms  
                   True       Prada2013     pybamm.solvers.idaklu_solver.IDAKLUSolver    369±20ms  
                   True         Ai2020      pybamm.solvers.casadi_solver.CasadiSolver   3.20±0.06s 
                   True         Ai2020      pybamm.solvers.idaklu_solver.IDAKLUSolver   1.06±0.03s 
                   True      Ramadass2004   pybamm.solvers.casadi_solver.CasadiSolver    305±10ms  
                   True      Ramadass2004   pybamm.solvers.idaklu_solver.IDAKLUSolver    292±10ms  
                   True        Chen2020     pybamm.solvers.casadi_solver.CasadiSolver    414±20ms  
                   True        Chen2020     pybamm.solvers.idaklu_solver.IDAKLUSolver    294±6ms   
                   True       Ecker2015     pybamm.solvers.casadi_solver.CasadiSolver   2.65±0.07s 
                   True       Ecker2015     pybamm.solvers.idaklu_solver.IDAKLUSolver    449±20ms  
              ============= ============== =========================================== ============

[ 91.43%] ···· For parameters: False, 'ORegan2022', <class 'pybamm.solvers.casadi_solver.CasadiSolver'>
               
               
               asv: benchmark timed out (timeout 60.0s)
               

@martinjrobins
Copy link
Contributor

note that this particular test takes waaaaay longer than the rest (when it succeeds)

False       ORegan2022    pybamm.solvers.casadi_solver.CasadiSolver   38.4±0.3s 

@martinjrobins
Copy link
Contributor

suggestion from @DrSOKane: try bumping the number of particle grid points to 30

@martinjrobins
Copy link
Contributor

suggestion from @brosaplanella: Looking at asv.conf.json, it seems we can specify different regression thresholds for various benchmarks (final lines of the json file):

// "regressions_thresholds": {
// "some_benchmark": 0.01, // Threshold of 1%
// "another_benchmark": 0.5, // Threshold of 50%
// },

Might be a way around it: tight thresholds for "unit" benchmarks, looser for "integration" benchmarks

@brosaplanella
Copy link
Member

brosaplanella commented Jun 12, 2023

note that this particular test takes waaaaay longer than the rest (when it succeeds)

False       ORegan2022    pybamm.solvers.casadi_solver.CasadiSolver   38.4±0.3s 

This is a quite tricky parameter set as it has the diffusion coefficient is very nonlinear. I would be happy to skip it the Casadi benchmark for it, or skip it altogether for the time being.

Also because it is likely the cause of why benchmarks take so long to solve...

@martinjrobins
Copy link
Contributor

martinjrobins commented Jun 12, 2023

note that this particular test takes waaaaay longer than the rest (when it succeeds)

False       ORegan2022    pybamm.solvers.casadi_solver.CasadiSolver   38.4±0.3s 

This is a quite tricky parameter set as it has the diffusion coefficient is very nonlinear. I would be happy to skip it the Casadi benchmark for it, or skip it altogether for the time being.

Also because it is likely the cause of why benchmarks take so long to solve...

I notice we're already skipping it for solve_first=true. We are also already using 30 particle grid points, so I'll turn it off for now (for the casadi solver, we're still doing it for idaklu)

@martinjrobins
Copy link
Contributor

This benchmark is also occasionally timing out:

    GITT      Marquis2019   pybamm.models.full_battery_models.lithium_ion.dfn.DFN   pybamm.solvers.idaklu_solver.IDAKLUSolver   23.7±0.01s 

We do not do the corresponding benchmark for casadi (this has previously been turned off). Should we do the same for idaklu?

[ 95.00%] ··· time_sims_experiments.TimeSimulation.time_solve                 ok
[ 95.00%] ··· ============ ============= ======================================================= =========================================== ============
               experiment    parameter                         model_class                                       solver_class                            
              ------------ ------------- ------------------------------------------------------- ------------------------------------------- ------------
                  CCCV      Marquis2019   pybamm.models.full_battery_models.lithium_ion.spm.SPM   pybamm.solvers.casadi_solver.CasadiSolver   6.16±0.01s 
                  CCCV      Marquis2019   pybamm.models.full_battery_models.lithium_ion.spm.SPM   pybamm.solvers.idaklu_solver.IDAKLUSolver    6.19±0s   
                  CCCV      Marquis2019   pybamm.models.full_battery_models.lithium_ion.dfn.DFN   pybamm.solvers.casadi_solver.CasadiSolver    8.84±0s   
                  CCCV      Marquis2019   pybamm.models.full_battery_models.lithium_ion.dfn.DFN   pybamm.solvers.idaklu_solver.IDAKLUSolver   8.34±0.01s 
                  CCCV        Chen2020    pybamm.models.full_battery_models.lithium_ion.spm.SPM   pybamm.solvers.casadi_solver.CasadiSolver    2.77±0s   
                  CCCV        Chen2020    pybamm.models.full_battery_models.lithium_ion.spm.SPM   pybamm.solvers.idaklu_solver.IDAKLUSolver    2.78±0s   
                  CCCV        Chen2020    pybamm.models.full_battery_models.lithium_ion.dfn.DFN   pybamm.solvers.casadi_solver.CasadiSolver    5.54±0s   
                  CCCV        Chen2020    pybamm.models.full_battery_models.lithium_ion.dfn.DFN   pybamm.solvers.idaklu_solver.IDAKLUSolver    4.66±0s   
                  GITT      Marquis2019   pybamm.models.full_battery_models.lithium_ion.spm.SPM   pybamm.solvers.casadi_solver.CasadiSolver   20.2±0.01s 
                  GITT      Marquis2019   pybamm.models.full_battery_models.lithium_ion.spm.SPM   pybamm.solvers.idaklu_solver.IDAKLUSolver    20.4±0s   
                  GITT      Marquis2019   pybamm.models.full_battery_models.lithium_ion.dfn.DFN   pybamm.solvers.casadi_solver.CasadiSolver      n/a     
                  GITT      Marquis2019   pybamm.models.full_battery_models.lithium_ion.dfn.DFN   pybamm.solvers.idaklu_solver.IDAKLUSolver   23.7±0.01s 
                  GITT        Chen2020    pybamm.models.full_battery_models.lithium_ion.spm.SPM   pybamm.solvers.casadi_solver.CasadiSolver   7.59±0.01s 
                  GITT        Chen2020    pybamm.models.full_battery_models.lithium_ion.spm.SPM   pybamm.solvers.idaklu_solver.IDAKLUSolver    7.73±0s   
                  GITT        Chen2020    pybamm.models.full_battery_models.lithium_ion.dfn.DFN   pybamm.solvers.casadi_solver.CasadiSolver    12.9±0s   
                  GITT        Chen2020    pybamm.models.full_battery_models.lithium_ion.dfn.DFN   pybamm.solvers.idaklu_solver.IDAKLUSolver   11.0±0.01s 
              ============ ============= ======================================================= =========================================== ============

@martinjrobins
Copy link
Contributor

or I could increase the timeout if this benchmark is important

@brosaplanella
Copy link
Member

If it is GITT, maybe we can reduce the number of pulses tested. We currently do

"GITT": [("Discharge at C/20 for 1 hour", "Rest for 1 hour")] * 20,

so maybe we could do 10 pulses instead?

@Saransh-cpp
Copy link
Member Author

(Some of the non-deterministic benchmarks were excluded in #2784.)

martinjrobins added a commit that referenced this issue Jul 6, 2023
…-failure

#2954 turn off benchmark: ORegan2022 DFN solver with the casadi model
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
difficulty: easy A good issue for someone new. Can be done in a few hours in-progress Assigned in the core dev monthly meeting priority: high To be resolved as soon as possible
Projects
None yet
6 participants