Approximate Inference #193
Comments
I wonder whether we actually just want to keep the same function names and always pass an `approximation` argument. For example,

```julia
posterior(approximation, fx, y)
logpdf(approximation, fx, y)
rand(rng, approximation, fx)
```

etc.? This would mean that users wouldn't have to learn a new API -- they'd just add an extra argument to the existing one. It would also create a nice way to think about the kinds of operations that we might like to implement on approximations, and provide a framework for characterising which operations different approximations are able to provide. For example, the vanilla version of the variational pseudo-point approximation doesn't provide a performant implementation of …
@rossviljoen @willtebbutt is this sufficiently resolved by #194?
My inclination is to say no, because we're not completely satisfied with the result of #194, although it's an improvement on what we had before. I'll refer future readers to this and subsequent comments: #194 (comment)
@willtebbutt could you summarise the remaining issues in here?
Certainly. For both approximations that we've encountered so far (Titsias (2009) and Hensman (2013)), things are straightforward once the approximate posterior has been constructed. The first question is what to do about the API for generating it. For Titsias (2009), something like `posterior(VFE(f(z)), f(x), y)` (or similar) makes sense, whereas for Hensman (2013) you could get away with something like `posterior(VFE(f(z), q))`, because the approximate posterior is mediated by `q`. Of course, the Titsias (2009) approximation is just the Hensman (2013) approximation with the optimal choice of `q`, so we could have gone with `posterior(VFE(f(z), f(x), y))` or something, and it would make sense. Maybe we should have done that...

The second question is what to do about the elbo. In both cases, you can make sense of something like `elbo(VFE(...), f(x), y)`, potentially with some extra arguments. The solution we went with was

```julia
posterior(VFE(f(z)), f(x), y)
elbo(VFE(f(z)), f(x), y)
```

for Titsias (2009), and presumably a similar thing will happen in SparseGPs for Hensman (2013). This was a slightly hurried design choice for the sake of getting something that was an improvement on what we previously had. I'm confident that we can find a better solution; it's just that no one has found it yet.
I wonder whether something like

```julia
approx = VFE(f(z), f(x), y)
elbo(approx) # returns a scalar
approx_posterior = posterior(approx) # returns an ApproxPosteriorGP
```

and

```julia
approx = VFE(f(z), f(x), y, q; config...) # config contains things about batch sizes etc.
elbo(rng, approx) # returns an estimator of the ELBO
approx_posterior_gp = posterior(approx) # returns an ApproxPosteriorGP
```

would make more sense? Still doesn't feel quite right though...
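One way to flesh this idea out is a single container holding the data and variational state, so that `elbo` and `posterior` become (nearly) single-argument functions. A hypothetical sketch — the struct and its fields are assumptions, not an existing API:

```julia
# Hypothetical container bundling data and variational state together.
struct VFEState{Tfz, Tfx, Ty, Tq}
    fz::Tfz  # FiniteGP at the pseudo-inputs
    fx::Tfx  # FiniteGP at the training inputs
    y::Ty    # observations
    q::Tq    # variational distribution q(u), or `nothing` in the saturated case
end

elbo(state::VFEState) = nothing       # deterministic objective, returns a scalar
elbo(rng, state::VFEState) = nothing  # stochastic estimator (minibatching etc.)
posterior(state::VFEState) = nothing  # would return an ApproxPosteriorGP
```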
Reviving this discussion: another question is where the computation/optimisation should go. E.g. for `SparseVariationalApproximation` we need to optimise to find the optimal `q(u)`, and currently this is left as an exercise for the user.
Yeah -- I think we ought to be able to do this in the deterministic-objective case (no minibatching; quadrature / exact reconstruction-term computation) at the very least. I've done something like this in …

edit: if you've got particular ideas for the stochastic-objective case, I'd be interested to know what they are, though.
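For reference, in the deterministic-objective, Gaussian-likelihood setting the optimal `q(u)` is available in closed form (Titsias, 2009), so the optimisation could in principle be done for the user. A minimal sketch, assuming precomputed kernel matrices `Kuu = cov(u, u)` and `Kuf = cov(u, f)` and noise variance `σ²` (the function name is made up):

```julia
using LinearAlgebra

# Closed-form optimal q(u) = N(m, S) for a Gaussian likelihood (Titsias, 2009):
#   S = Kuu (Kuu + σ⁻² Kuf Kfu)⁻¹ Kuu
#   m = σ⁻² Kuu (Kuu + σ⁻² Kuf Kfu)⁻¹ Kuf y
function optimal_q(Kuu::AbstractMatrix, Kuf::AbstractMatrix, y::AbstractVector, σ²::Real)
    B = Kuu + Kuf * Kuf' / σ²
    m = Kuu * (B \ (Kuf * y)) / σ²  # mean of q(u)
    S = Symmetric(Kuu * (B \ Kuu))  # covariance of q(u)
    return m, S
end
```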
I believe that this is now stale, so am closing. |
Currently, we have

```julia
approx_posterior(approximation, fx, y, u)
```

As pointed out by @st-- and @rossviljoen in SparseGPs.jl, we should consider reducing this to a 3-arg function in which `approximation` contains `u`, since `u` is really a component of the approximation. This kind of thing would generalise more elegantly to what @rossviljoen is doing in `SparseGPs.jl`, as it will be natural in that case to put the variational parameters associated with `q(u)` inside `approximation` as well. More generally, there are approximate inference algorithms which don't involve pseudo-points, and it would be nice to generalise to them also.

So the new `approx_posterior` function for the saturated `VFE` approximation would be something like

and for the unsaturated would be something like

(or something a bit like that).
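A hypothetical illustration of the proposed 3-arg form — the type and field names below are placeholders, not the actual snippets from this issue:

```julia
# Hypothetical: the approximation carries u (and, in SparseGPs.jl, the
# variational parameters of q(u)), so approx_posterior needs only 3 arguments.
struct VFEApproximation{Tfz, Tu}
    fz::Tfz  # FiniteGP at the pseudo-inputs
    u::Tu    # pseudo-point component / variational parameters
end

approx_posterior(approx::VFEApproximation, fx, y) = nothing  # would build the posterior GP
```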