Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calling Julia functions in R very slow due to overhead of juliaGet() #12

Open
woodwards opened this issue Mar 16, 2018 · 7 comments
Open

Comments

@woodwards
Copy link

I am testing calibration in R of a model in Julia. It works, but having to use juliaGet() for each model call makes it extremely slow. Is there a faster way to do it, where the Julia functions act like native R functions without the overhead?

My Julia module is

# linmod_module.jl
module linmod_module

export linmod, linmodvec, loglikelihood

function linmod(x, a, b)
  y = a + b * x
  return y
end

function linmodvec(datax, a, b)
  y = linmod(datax, a, b)
  return y
end

function loglikelihood(datay, datax, a, b, error)
  datan = size(datax, 1)
  y = linmod(datax, a, b)
  z = (datay - y) / error
  ll = - 0.5 * datan * log(2 * pi * error * error) - 0.5 * sum(z .* z);
  return ll
end

end

My R Script is

# try BayesianTools with Julia model

library(XRJulia)
library(BayesianTools)
# library(coda)

# make some data
x_data <- seq(0, 1, length.out=1000)
y_data <- 0.5+0.5*x_data+rnorm(length(x_data),0,0.1)
model_se <- 0.1

# find Julia functions
findJulia(test=TRUE)
ev <- RJulia() # create Julia evaluator
ev$AddToPath(getwd()) # find module linmod

# test Julia model
linmod <- JuliaFunction("linmod", module="linmod_module")
test <- linmod(0.5, 0.5, 0.5)

# test vectorised Julia model
linmodvec <- JuliaFunction("linmodvec", module="linmod_module")
test <- linmodvec(x_data, 0.5, 0.5)
juliaGet(test) # but this is very slow

# test Julia loglikelihood
loglikelihood <- JuliaFunction("loglikelihood", module="linmod_module")
test <- loglikelihood(y_data, x_data, 0.5, 0.5, model_se)

# BayesianTools setup
bt_names <- c("a", "b")
bt_prior <- createUniformPrior(c(0,0), c(1,1))
bt_likelihood <- function(params){
  loglikelihood(y_data, x_data, params[1], params[2], model_se)
}
bt_parallel <- FALSE
bt_setup <- createBayesianSetup(likelihood=bt_likelihood,
                                prior=bt_prior,
                                parallel=bt_parallel,
                                names=bt_names)
nBurnin <- 900
nSample <- 900
nTotal <-  nSample + nBurnin
nChains <- 3
nInternal <- 3
bt_settings <- list(startValue=nInternal, # DREAMzs internal chains
                    iterations=nTotal/nChains, # iterations per external chain
                    nrChains=nChains, 
                    burnin=nBurnin/nChains+1, # discard these ones
                    parallel=bt_parallel,
                    message=TRUE) 
bt_out <- runMCMC(bayesianSetup=bt_setup, 
                  sampler = "DREAMzs", 
                  settings=bt_settings)

suppressWarnings(summary(bt_out))
tracePlot(bt_out)
correlationPlot(bt_out)

bt_predict <- function(params){
  juliaGet(linmodvec(x_data, params[1], params[2]))
}

bt_error <- function(mean, params){
  return(rnorm(length(mean), mean=mean, sd=model_se))
}

bt_samples <- getSample(bt_out, start=2)
summary(bt_samples)

# this throws an error
suppressMessages({
  plotTimeSeriesResults(sampler=bt_samples,
                      model=bt_predict,
                      observed=y_data,
                      error=bt_error,
                      plotResiduals=TRUE,
                      main="Residual Analysis")
})
@johnmchambers
Copy link
Owner

If I understand your R script, it's not juliaGet() per se, but the fact that you are running MCMC (which is not fast in R anyway) with a 2-way interface with data conversion between R and Julia on each call for the likelihood.

Unfortunately, this is just the opposite of what I recommend (in Extending R) for good interface computation: give the other language a substantial computation to do.

@vh-d
Copy link
Contributor

vh-d commented Mar 17, 2018

Also I doubt that computation of this particular likelihood function can be significantly improved by "outsourcing" it to a different language than R. All the code is vectorized and carried out by highly optimized fortran code called from R. So even using faster interface such as Rcpp would not help too much I am afraid.

@woodwards
Copy link
Author

woodwards commented Mar 18, 2018

Thank you gentlemen.

The context is that I have a large biological model in ACSLX which we want to rewrite, since ACSLX is deprecated, and I am looking at options. We want to run this model both from R and also as a "DLL" within another, even larger model. I am looking at the readability, support, communication, speed, of a number of languages, including Fortran, C++, Python, Julia.

This is just a test script that I am learning with. It runs like lightning when I do exactly the same with Rcpp. Maybe this is because Rcpp does a more efficient job of passing the arrays? So are you saying that XRJulia is just relatively slow at passing data, maybe because some conversion is needed?

@vh-d
Copy link
Contributor

vh-d commented Mar 19, 2018

The context is that I have a large biological model in ACSLX which we want to rewrite, since ACSLX is deprecated, and I am looking at options. We want to run this model both from R and also as a "DLL" within another, even larger model. I am looking at the readability, support, communication, speed, of a number of languages, including Fortran, C++, Python, Julia.

I don't know ACSLX but it looks a bit like what Modelica is used for these days. It is open source and has quite some momentum I think. There is also a Modia.jl (alternative to Modelica written by the creator of Modelica on top of Julia). In general, I think Julia is a great candidate but you have to write a lot of code yourself until the ecosystem of packages covers your field.

This is just a test script that I am learning with. It runs like lightning when I do exactly the same with Rcpp. Maybe this is because Rcpp does a more efficient job of passing the arrays? So are you saying that XRJulia is just relatively slow at passing data, maybe because some conversion is needed?

Yes, Rcpp uses R's C interface so it does not need to copy data very often. XRJulia starts Julia as a separate process and exchange data converted to/from JSON via sockets.

My point in the previous comment was that I expect Rcpp and native R solution to run at very similar speeds. It is probably the MCMC part that is the bottleneck.

@woodwards
Copy link
Author

woodwards commented Mar 19, 2018

Yes ASCLX is a bit like Modelica. I had a brief look into Modelica, but it looked a bit niche, I couldn't find much in the way of forums etc,.

@johnmchambers
Copy link
Owner

Vaclav's summary is very likely the point. Rcpp has two speed advantages over XRJulia in terms of general interfaces (e.g section 12.3 of Extending R): it's at the subroutine level not the language level, so doesn't have to work with an expression; and it's embedded rather than working over connections so doesn't have to format and then parse data every time. In this case, it's pretty certain to be the second aspect that matters.

On top of that, the Rcpp implementers have done a lot of work on the C++ side to mesh data structures. It seems to work well for lots of applications, judging by the number of packages that use it.

@woodwards
Copy link
Author

Thank you for your help!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants