obj$report() Output #190
Closed
Njoselson opened this issue Jun 18, 2016 · 15 comments
@Njoselson

Hello,

Putting REPORT() statements in the C++ code works easily when running a one-off calculation with MakeADFun(). The values of these variables at the point they were reported can then be accessed with obj$report().

However, I am unclear how the REPORT() values behave during and after the optimization. I thought that they updated with each recalculation of the objective function, corresponding to the updated parameter values. This doesn't seem to be the case, however.

> system.time(obj <- MakeADFun(data, parameters, DLL="hake2015"))
Constructing atomic pnorm1
   user  system elapsed 
  60.79    1.12   65.07 
> obj$report()$LLF
[1] 559.783828279
> system.time(opt<-optim(obj$par, obj$fn, obj$gr, method="BFGS", control=list(trace=1,maxit=100000))) 
initial  value 559.783828 
outer mgc:  765010.199228 
outer mgc:  53420.6635511 
outer mgc:  14784.867205 
outer mgc:  12030.8542397 
outer mgc:  9271.62520181 
outer mgc:  12248.2638561 
outer mgc:  18152.7617859 
outer mgc:  14203.7212133 
outer mgc:  10396.9323153 
outer mgc:  4842.49433734 
iter  10 value -189.331166
...
iter 620 value -208.385431
outer mgc:  196.326659407 
outer mgc:  205.212652605 
outer mgc:  394.724306879 
outer mgc:  406.511578734 
outer mgc:  405.170340324 
outer mgc:  155.061316761 
outer mgc:  195.392121382 
final  value -208.386673 
converged
   user  system elapsed 
 185.50    4.51  192.97 
> obj$report()$LLF
[1] -194.526013359

The reported LLF value after running MakeADFun corresponds to the initial value of the optimization. However, if the report values updated after every iteration, the reported value should equal the final converged value of the optimization, yet the two are very far apart. This would indicate that the other reported values also don't correspond to the optimized iteration.

Is there a different way of getting the optimized run's output? I know you can get the optimized parameters with opt$par, but how do you get the values of reported variables at the optimum?

Thanks so much!

Nathaniel

@kaskr
Owner

kaskr commented Jun 19, 2016

Try obj$report(obj$env$last.par.best)
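
A minimal usage sketch (assuming the obj and opt objects from the original post): obj$env$last.par.best holds the best full parameter vector encountered during optimization, so passing it re-evaluates the report there rather than at the last point the optimizer happened to visit.

opt <- optim(obj$par, obj$fn, obj$gr, method = "BFGS")
rep <- obj$report(obj$env$last.par.best)  # report at the best parameters seen so far
rep$LLF                                   # reported value at those parameters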

@tylerjoedonaldson

tylerjoedonaldson commented Jun 22, 2016

In summary:
If obj$fn(opt$par) is run it agrees with opt$value, but not obj$report()$LLF.
Furthermore, setting initial parameters = opt$par and running optim results in the 'initial value' being equal to obj$report()$LLF.
Finally, running it in ADMB with the opt$par parameters agrees with obj$report()$LLF.

The cpp reads:
REPORT(LLF)
return(LLF)   [so obj$fn(opt$par) = opt$value makes sense]

  • For the final converged parameters, REPORT(LLF) != return(LLF)
  • For the initial parameters, REPORT(LLF) == return(LLF)

Perhaps there is some sort of post-optimization parameter manipulation that optim() performs?

@HaikunXu

HaikunXu commented Jul 6, 2016

I also found that, for the final converged parameters, REPORT(LLF) != return(LLF).

If I want to calculate AIC, which LLF should I use in this case, and why? Moreover, how can I get the sub-components of LLF (LLF = LLF1 + LLF2 + ...) after the model has converged? If I use REPORT on these sub-components in the cpp, their sum REPORT(LLF1) + REPORT(LLF2) + ... = REPORT(LLF) is different from opt$objective. I would appreciate it if someone could answer these two questions.
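
(For reference, a hedged sketch of the usual AIC computation in R, assuming a converged fit; with nlminb the marginal negative log-likelihood is opt$objective, with optim it is opt$value.)

nll <- opt$objective           # marginal negative log-likelihood from the optimizer
k   <- length(opt$par)         # number of estimated (fixed-effect) parameters
aic <- 2 * k + 2 * nll         # AIC = 2k + 2 * negative log-likelihood
aic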

Haikun

@Njoselson
Author

Hello,

The obj$report(obj$env$last.par.best) solution doesn't work. This gives the same variable values as obj$report() after optimization.

The common theme through all of these posts is understanding the link between obj$fn(opt$par), which agrees with opt$objective, and obj$report()$LLF, which does not.

This can be seen if a REPORT() vector is inserted in the orange_big example.

cpp file:

#include <TMB.hpp>

template<class Type>
Type objective_function<Type>::operator() ()
{
  DATA_INTEGER(n);
  DATA_VECTOR(y);
  DATA_VECTOR(t);
  DATA_INTEGER(M);
  DATA_FACTOR(ngroup);
  DATA_INTEGER(multiply);
  PARAMETER_VECTOR(beta);
  PARAMETER(log_sigma);
  PARAMETER(log_sigma_u);
  PARAMETER_VECTOR(u);

  Type sigma=exp(log_sigma);
  Type sigma_u=exp(log_sigma_u);

  ADREPORT(sigma);
  ADREPORT(sigma_u);

  using namespace density;
  int i,j,k,ii;

  Type g=0;

  for(k=0;k< multiply;k++)
  {
    ii = 0;
    for(i=0;i< M;i++)
    {
      // Random effects contribution
      Type u1 = u[i+k*M];
      g -= -(log_sigma_u);
      g -= -.5*pow(u1/sigma_u,2);

      vector<Type> a(3);
      a[0] = 192.0 + beta[0] + u1;
      a[1] = 726.0 + beta[1];
      a[2] = 356.0 + beta[2];

      Type tmp;
      Type f;

      for(j=0;j<ngroup(i);j++)
      {
        f = a[0]/(1+exp(-(t[ii]-a[1])/a[2]));
        tmp = (y[ii] - f)/sigma;
        g -= -log_sigma - 0.5*tmp*tmp;
        ii++;
      }
    }
  }

  REPORT(g);
  return g;
}

R file

require(TMB)

# Read data
source("orange_data.R")

compile("orange_big.cpp")
dyn.load(dynlib("orange_big"))
Mmultiply = data_orange$M*data_orange$multiply

obj <- MakeADFun(data=data_orange,
                 parameters=list(
                   beta=c(0,0,0),
                   log_sigma=1,
                   log_sigma_u=2,
                   u = rep(0,Mmultiply)),
                 random=c("u")
                 )
obj$control <- list(trace=1,parscale=c(1,1,1,1,1)*1e-2,REPORT=1,reltol=1e-12,maxit=100)
obj$hessian <- F
newtonOption(obj,smartsearch=TRUE)
(opt<-nlminb(obj$par,obj$fn,obj$gr,lower=c(-10.0,-10,-10,-5,-5),upper=c(10.0,10,10,5.0,5.0)))

#Comparison of outputs:
obj$report()$g     # =106859.9
opt$objective    # =94814.34
obj$fn(opt$par)  # =94814.34

The question then comes, which is the correct value for the LLF?

If the obj$report() value is incorrect, then are all reported vectors also incorrect for the optimized value?

If the obj$report() value is correct, though, then what does the likelihood value that the optimizer spits out mean?

What is the logic behind obj$report()? Is the REPORT() call placed at the wrong spot? Is it not sensible to try to report on the LLF value? If it is, is there a more TMB-compatible way to report on all the added parts of the LLF, as remarked above?

Sorry for all the questions, but this seems like a key point for understanding the optimization procedure of TMB, and is essential for using it effectively.

Thank you so much!

Nathaniel

@Njoselson
Author

Njoselson commented Jul 25, 2016

Hi, one more comment.

This disparity of values seems even more strange because, if one adds a REPORT() macro for the parameters, the parameter values in the output of obj$report() match the values of opt$par.

This means there is something specific about reporting on the LLF which confuses the REPORT() macro.

I have been looking at the source code on this page but I can't seem to figure out why the REPORT() would be incorrect for the objective function value but correct for all of the other variables it reports on.

Could you please help shed some light on this confusion?

Thanks in advance,

Nathaniel

@calbertsen
Contributor

I think the value you are reporting is the joint log-likelihood of parameters and random effects, whereas the value given by obj$fn is the marginal log-likelihood of the parameters.
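
To make the distinction concrete, a hedged sketch against the orange_big example above: the reported g is the joint negative log-likelihood over fixed and random effects, evaluated at a full parameter vector, while obj$fn() returns the Laplace-approximated marginal negative log-likelihood, so the two are not expected to match when random is specified.

rep <- obj$report(obj$env$last.par.best)  # joint nll at the best full parameter vector
rep$g                                     # joint value (includes the random effects u)
opt$objective                             # marginal (Laplace-approximated) nll
obj$fn(opt$par)                           # same as opt$objective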

@kaskr
Owner

kaskr commented Jul 25, 2016

@Njoselson
Perhaps you have overlooked the documentation of ?MakeADFun especially for the random argument and for the REPORT() macro?

@Njoselson
Author

@kaskr @calbertsen
Thank you for this. I understand this answer in the context of the orange_big example, where random effects are specified.

However, the initial issue that caused me to open this question was: if no random effects are specified, what would cause the reported likelihood components not to equal the opt$objective value?

In the initial comment: Why is the final value -208.386673 from optim not equal to

> obj$report()$LLF
[1] -194.526013359

I think @HaikunXu also mentioned a problem like this, as did @tylerjoedonaldson.

Are there any other factors that would be added to the REPORT, but excluded from the opt$objective or vice versa?

Nathaniel

@calbertsen
Contributor

It might help to provide a small example (without random effects) where opt$objective, obj$fn(opt$par), and obj$report(obj$env$last.par.best) do not give the same function value.
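
For reference, a skeleton of such a check (assuming a model built without the random argument that REPORTs a value named LLF, as in the original post); all three quantities are evaluated at, or reported for, the same parameter vector and would be expected to agree:

opt <- nlminb(obj$par, obj$fn, obj$gr)
opt$objective                            # value reported by the optimizer
obj$fn(opt$par)                          # objective re-evaluated at the optimum
obj$report(obj$env$last.par.best)$LLF    # reported value at the best parameters seen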

@Njoselson
Author

I will see if I can do that.

The model I am working on right now is very large, so I will see if the problem is reproduced on a smaller example.

Thank you so much @calbertsen

@Njoselson
Author

Thanks so much for keeping up with this thread, I know it must be frustrating.

@calbertsen, I haven't been able to come up with a small example where obj$report()$LLF doesn't match the opt$objective value after optimization. Nevertheless, when we optimize our code we run into exactly this problem.

Is there any other feature of the objective function which will be integrated out in the optimization, the way random effects are?

Also, I notice that none of the TMB examples calculate the objective function by computing values LLF1, ..., LLFn and then adding them at the end of the program. Instead, all components of the objective function are accumulated as they are calculated, with repeated lines of LLF += ...

Is there a reason this method would be preferred? I ask because it seems that @HaikunXu also had a problem when breaking the LLF into sub-components for reporting.
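
As a sketch of the sub-component question (assuming, hypothetically, that the .cpp file REPORTs components named LLF1 and LLF2 alongside LLF), the pieces can be pulled out of a single report call and summed in R; with random effects present their sum will still differ from opt$objective, since the latter is the marginal value:

rep <- obj$report(obj$env$last.par.best)
rep$LLF1 + rep$LLF2    # sum of the reported components; should equal rep$LLF
rep$LLF
opt$objective          # marginal value; differs once random effects are integrated out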

I understand that this is a rather cryptic discussion without specific code, but any help is highly valued.

Thanks!
Nathaniel

@alexfun

alexfun commented Mar 21, 2017

I am getting the same issue, where obj$fn(par) evaluates to a different value than the one obtained via obj$report().

Here is an MWE. It requires the skellam package to generate Skellam random variables. The objective function is not continuous, but I would not have thought that this would lead to a difference between the function's return value and the actual calculated value. Here is the objective function:

#include <TMB.hpp>

template<class Type>
Type objective_function<Type>::operator() () {
    // data:
    DATA_MATRIX(cubic);
    DATA_VECTOR(x1);
    DATA_VECTOR(x2);
    
    
    // response variables
    DATA_VECTOR(Z);
    
    
    // parameters:
    PARAMETER_VECTOR(cubic_param); 
    PARAMETER(x1_param);
    PARAMETER(x2_param);
    
    
    // regression equation: 
    vector<Type> f = cubic * cubic_param; 
    int n = Z.size(); // get number of data points to loop over
    vector<Type> mu1(n);
    vector<Type> mu2(n);
    
    for (int i = 0; i < n; i++){
        mu1[i] = x1[i] * x1_param + f[i];
        if (mu1[i] <= 0) mu1[i] = 0.01;
        mu2[i] = x2[i] * x2_param;
    }
    
    Type nll = 0.0; // initialize negative log likelihood
    
    
    for(int i = 0; i < n; i++){ 
        
        nll -= -(mu1[i] + mu2[i] - 0.5 * Z[i] * (log(mu1[i]/mu2[i])) - log(besselI(2 * sqrt(mu1[i] * mu2[i]), Z[i])));
        
    }

    // Report for diagnostics
    REPORT(mu1);
    REPORT(mu2);    
    REPORT(nll);

    // return objective function value
    return nll;
    
}

Generate data and fit model:

# Generate the data
set.seed(0)
N <- 10000


# covariates
error_term_1 <- rnorm(N)
error_term_2 <- rnorm(N)
cubic <- runif(N, min = -2, max = 2)
cubic <- cbind(cubic^3, cubic)
f <- cubic %*% c(1, -1)
x_1 <- runif(N, min = 0.01, max = 5)
x_2 <- runif(N, min = 3, max = 8)


X_1 <- 0.5 * x_1 + (f > 0) * f 
X_1[X_1 <= 0] <- 0.01
X_2 <- 0.7 * x_2 


Z <- skellam::rskellam(N, X_1, X_2)

# Fit model
# dyn.unload('cpp_files/tmp_obj.dll')
compile('cpp_files/tmp_obj.cpp')
dyn.load('cpp_files/tmp_obj.dll')

optim_in <- MakeADFun(data = list(Z = Z,
                                  sigma = 2,
                                  cubic = cubic,
                                  x1 = x_1,
                                  x2 = x_2),
                      parameters = list(cubic_param = c(1, -1),
                                        x1_param = 1,
                                        x2_param = 1),
                      DLL = "tmp_obj")


optim_out <- optim(optim_in$par, optim_in$fn, optim_in$gr, method = "BFGS", control = list(trace = 0))
print(paste0('NLL value returned from optim() is: ', optim_out$value))
print(paste0('Parameters returned from optim() is: ', paste0(optim_out$par, collapse = ',')))
print(paste0('NLL value returned using report() is: ', optim_in$report()$nll))


# now check with BFGS with no gradient, wrapping objective function
nll_wrapper <-function(par){
        optim_in$fn(par)
        # print(optim_in$report()$nll)
        
        return(optim_in$report()$nll)
        
}

optim_out_good <- optim(optim_in$par, nll_wrapper, method = "BFGS", control = list(trace = 0))
print(paste0('NLL value returned from optim() with nll_wrapper is: ', optim_out_good$value))
print(paste0('NLL value returned from fn() with parameters found above is: ', optim_in$fn(optim_out_good$par)))
print(paste0('Parameters returned from optim() with nll_wrapper is: ', paste0(optim_out_good$par, collapse = ',')))
print(paste0('Actual log-likelihood values calculated from fitted means from report() is: ', 
             sum(-log(skellam::dskellam(Z, optim_in$report()$mu1, optim_in$report()$mu2)))))

Now, diagnostics:

> print(paste0('NLL value returned from optim() is: ', optim_out$value))
[1] "NLL value returned from optim() is: 23713.4614016981"
> print(paste0('Parameters returned from optim() is: ', paste0(optim_out$par, collapse = ',')))
[1] "Parameters returned from optim() is: 0.681834654588285,-0.687026053315376,0.706245411467174,0.72590157685166"
> print(paste0('NLL value returned using report() is: ', optim_in$report()$nll))
[1] "NLL value returned using report() is: 23604.3501271491"

It is clear that the nll values returned are different depending on whether you call fn() or get the report() value.

I believe that the report() value is correct. In this case, one can directly calculate the nll value by writing a wrapper:

> print(paste0('NLL value returned from optim() with nll_wrapper is: ', optim_out_good$value))
**[1] "NLL value returned from optim() with nll_wrapper is: 23604.3501271491"**
> print(paste0('NLL value returned from fn() with parameters found above is: ', optim_in$fn(optim_out_good$par)))
**[1] "NLL value returned from fn() with parameters found above is: NaN"**
> print(paste0('Parameters returned from optim() with nll_wrapper is: ', paste0(optim_out_good$par, collapse = ',')))
[1] "Parameters returned from optim() with nll_wrapper is: 0.739865698020581,-1.07489320653228,0.765535524439825,0.72142470937947"
> print(paste0('Actual log-likelihood values calculated from fitted means from report() is: ', 
+              sum(-log(skellam::dskellam(Z, optim_in$report()$mu1, optim_in$report()$mu2)))))
[1] "Actual log-likelihood values calculated from fitted means from report() is: 23604.3501271491"

Now you see that, even though the parameters can be found with the wrapper, evaluating fn() at those parameters gives an invalid value. Yet if you query the fitted means from report() and compute the likelihood in R, you obtain the same nll value as found using nll_wrapper.

@kaskr
Owner

kaskr commented Mar 21, 2017

The difference is most likely a result of parameter-dependent branching:

if (mu1[i] <= 0) mu1[i] = 0.01;

See https://github.com/kaskr/adcomp/wiki/Things-you-should-NOT-do-in-TMB and #112 .

If you really want branching on the tape (not recommended as it is not differentiable) here's the equivalent using conditional expressions:

mu1[i] = CppAD::CondExpLe(mu1[i], Type(0), Type(0.01), mu1[i]);

@alexfun

alexfun commented Mar 22, 2017

Thanks, can't believe I forgot about that. Implementing a smoothed version of the if statement has fixed the issue.

However, I am still confused why the values obtained from REPORT() differ from the value returned by the C++ function (since, to my limited knowledge, they seem to target the same variable). I would have thought that both of them would be equal, even if both were wrong.

@gosselinf

I am unsure whether this is the same issue, but what about calculating obj$fn(par) or obj$fn() twice after optimisation or after MakeADFun? I have observed in some cases that the first value of obj$fn() was not the same as the second one (which was generally the same as the ones afterwards). I have no idea why.
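
A hedged illustration of that check (assuming an already-optimized obj and an opt result as in the earlier posts):

v1 <- obj$fn(opt$par)   # first evaluation after optimization
v2 <- obj$fn(opt$par)   # second evaluation at the same parameters
c(v1, v2)               # reported to differ on the first call in some models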
