WIP: BUG: ABNORMAL_TERMINATION test from real data #212

Eh2406 · 2018-05-25T18:17:34Z

At yesterday's meeting with @semcogli and @janowicz we discuss whether #211 would have a practical impact on our models. I thought I would run an experiment. On our computer, I updated our copy of UrbanSim to include #211, and refit all the elcm models.

Indeed many of our elcm models are reliably throwing a non-convergence warning 'ABNORMAL_TERMINATION_IN_LNSRCH'. According to the internet, that diagnostic code is usually caused by programer error. Specifically, a bad gradient calculation.

To dig in I tried to make a minimal reproducible example, I took one of our models and reduced the sample size and removed columns until the ABNORMAL_TERMINATION stopped. That became the new test_mnl_estimate_temp. It needs a better name, or even better to be in the same form as the parametric tests.

With a MRE in hand I wanted to look deeper. Is the gradient calculation correct? So I added a wrapper function mnl_loglik_print that print the difference between the gradient returned by mnl_loglik and the gradient as approximated by scipy.optimize.approx_fprime. Indeed the gradient does not look correct! In fact it doesn't look good on most of the existing test.

Eh2406 · 2018-05-25T20:03:51Z

Edit: Closed by accident.

I am having trouble getting the calculus to agree with the code,
@smmaurer, @timothyb0912 you are more familiar with this, dose the math look correct?

smmaurer · 2018-05-28T23:04:16Z

Hi @Eh2406, this is really interesting. I'm not familiar enough with the estimation code to have a good sense of what's going on, though.

One thing I'm curious about is how much the non-convergence is affecting the estimation results. When there's a non-converging model and you reduce the sample size until the error disappears, do the coefficients and standard errors change or are they similar?

Another thing to try could be to estimate some of the same models using PyLogit, which is a completely different codebase, and compare the results. This is a bit of a pain because the data formats and model expressions are different, but here's a notebook with an example. ("ChoiceModels engine" is essentially urbansim.urbanchoice.mnl and "PyLogit engine" is pylogit.mnl.MNL.)

https://github.com/UDST/choicemodels/blob/master/notebooks/Destination-choice-models-02.ipynb

I'm going to take a look and see if I can get non-convergence with some Bay Area data..

Eh2406 · 2018-05-29T16:39:56Z

One thing I'm curious about is how much the non-convergence is affecting the estimation results.

I believe the technically correct answer is "unspecified", as in scipy makes no warranties about what the value will be. It can return the initial point, we are seeing this in some of our data sets. It can return a point related to the stopping condition, in this case the last place tested to determine that our gradient is incorrect. Most perniciously it can return the correct answer, as in #211, witch makes it hard to notice that there is a problem.

Another thing to try could be to estimate some of the same models using PyLogit.

That does seem like a good test. Different code, different bugs. From a brief review of the PyLogit code, I don't see them testing the success flag, so thay are alsow not reporting on non-convergence bugs.

I'm going to take a look and see if I can get non-convergence with some Bay Area data.

Thank you. And thank you for taking this bug report seriously.

My attempt to work thru the calculus, will be coming in the next post.

timothyb0912 · 2018-05-29T17:00:21Z

@Eh2406 I don't test the success flag as I've found it to often be meaningless.

I've seen many examples where the gradient is essentially zero (e.g. all terms being on the order of 1e-07) but the success flag says that the optimizer did not converge. I've also seen cases where the optimizer converged, as in the change in the function evaluations was less than a small threshold, yet the gradient shows that the local optimum wasn't yet reached.

I've checked my gradients using automatic differentiation (which has higher accuracy than the finite difference approximations used in approx_fprime) and they've all been correct to every reported decimal place. You can also find the analytic derivations for the MNL's gradient and hessian here

If you do test against PyLogit, I'd suggest directly examining the reported gradient instead of paying attention to the success flag. The gradient can be viewed by accessing the gradient attribute on a fitted model object.

Eh2406 · 2018-05-29T18:45:54Z

I don't test the success flag as I've found it to often be meaningless.

Indeed, I stand corrected. My previous post is overstated. It is more accurate to say that most forms of non-convergence return a reasonable approximation to the correct result. My impression is that ABNORMAL_TERMINATION_IN_LNSRCH is not one of them.

I've checked my gradients using automatic differentiation

Automatic differentiation would be a much better test of the gradient implementation! It is immensely comforting to know you have tested PyLogit so thoroughly! Do you have advice on how to use AD to test the urbansim implementation?

You can also find the analytic derivations for the MNL's gradient

Thank you! I will go read that now! My aforementioned attempts to work thru the calculus on my own have not been going well.

Eh2406 · 2018-06-04T15:25:22Z

Work got bizzy last week and I did not have time two dig in as much as I'd like, but hear is what I have found so far.

The example reproduces with only the first column of data.

The gradient calculation matches the one from @timothyb0912's paper. So maybe it has to do with the underflow checks, or something else that is not considered by that paper? Specifically log(prob) == -inf. @timothyb0912 how does PyLogit handle underflow?

Eh2406 added 2 commits May 25, 2018 13:30

example of: "mnl did not converge correctly"

2fabf69

diagnostic tool to identify errors in gradarr

c69ff0d

Eh2406 closed this May 25, 2018

smmaurer reopened this May 28, 2018

sean-wallace mentioned this pull request Jan 26, 2023

Improve optimization UDST/choicemodels#76

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP: BUG: ABNORMAL_TERMINATION test from real data #212

WIP: BUG: ABNORMAL_TERMINATION test from real data #212

Eh2406 commented May 25, 2018

Eh2406 commented May 25, 2018 •

edited

Loading

smmaurer commented May 28, 2018

Eh2406 commented May 29, 2018

timothyb0912 commented May 29, 2018 •

edited

Loading

Eh2406 commented May 29, 2018

Eh2406 commented Jun 4, 2018

WIP: BUG: ABNORMAL_TERMINATION test from real data #212

Are you sure you want to change the base?

WIP: BUG: ABNORMAL_TERMINATION test from real data #212

Conversation

Eh2406 commented May 25, 2018

Eh2406 commented May 25, 2018 • edited Loading

smmaurer commented May 28, 2018

Eh2406 commented May 29, 2018

timothyb0912 commented May 29, 2018 • edited Loading

Eh2406 commented May 29, 2018

Eh2406 commented Jun 4, 2018

Eh2406 commented May 25, 2018 •

edited

Loading

timothyb0912 commented May 29, 2018 •

edited

Loading