Approximate Inference #193

Closed
willtebbutt opened this issue Jul 30, 2021 · 9 comments

@willtebbutt
Member

willtebbutt commented Jul 30, 2021

Currently, we have

approx_posterior(approximation, fx, y, u)

As pointed out by @st-- and @rossviljoen in SparseGPs.jl, we should consider reducing this to a 3-arg function in which approximation contains u, since u is really a component of the approximation. This kind of thing would generalise more elegantly to what @rossviljoen is doing in SparseGPs.jl, as it will be natural in that case to put the variational parameters associated with q(u) inside approximation as well. More generally, there are approximate inference algorithms which don't involve pseudo-points, and it would be nice to generalise to them also.

So the new approx_posterior function for the saturated VFE approximation would be something like

approx_posterior(VFE(u), fx, y)

and for the unsaturated would be something like

approx_posterior(VFE(u, qu), fx, y)

(or something a bit like that).
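
A minimal sketch of what the 3-arg form could look like, using toy stand-in types. The `VFE` struct, the `is_saturated` helper, and the placeholder return value below are assumptions made for illustration only; they are not the AbstractGPs.jl or SparseGPs.jl implementation, and `fx` and `y` are plain vectors standing in for the real objects.

```julia
# Hypothetical stand-in: the approximation object owns the pseudo-points `u`
# and, optionally, the variational parameters `qu` associated with q(u).
struct VFE{Tu,Tq}
    u::Tu
    qu::Tq
end

# Saturated case: no explicit q(u) is supplied, so `qu` is `nothing`.
VFE(u) = VFE(u, nothing)

is_saturated(approx::VFE) = approx.qu === nothing

# 3-arg form: everything that describes the approximation travels in `approx`.
# The return value is a placeholder NamedTuple, not a real posterior object.
function approx_posterior(approx::VFE, fx, y)
    return (approx=approx, fx=fx, y=y, saturated=is_saturated(approx))
end

fx = [0.0, 0.5, 1.0]
y = [0.1, 0.2, 0.3]
p_sat = approx_posterior(VFE([0.0, 1.0]), fx, y)
p_unsat = approx_posterior(VFE([0.0, 1.0], (m=zeros(2), S=[1.0 0.0; 0.0 1.0])), fx, y)
```

The point of the sketch is only that the call site stays the same in both cases; which case applies is decided by what the approximation object carries.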

@willtebbutt
Member Author

willtebbutt commented Aug 1, 2021

I wonder whether we actually just want to keep the same function names and always pass an approximation argument?

For example,

posterior(approximation, fx, y)
logpdf(approximation, fx, y)
rand(rng, approximation, fx)

etc?

This would mean that users wouldn't have to learn a new API -- they'd just add an extra argument to the existing one. It would also create a nice way to think about the kinds of operations that we might like to implement on approximations, and provide a framework for characterising what kinds of operations different approximations are able to provide. For example, the vanilla version of the variational pseudo-point approximation doesn't provide a performant implementation of rand, but the pathwise-sampling version would. You could imagine a table of ticks and crosses characterising each of the approximations on offer.
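
One way the "table of ticks and crosses" could be expressed is with dispatch-based capability queries. This is only a sketch: the type names `VanillaPseudoPoint` and `PathwisePseudoPoint` and the `supports` function are hypothetical, and the `:posterior` and `:logpdf` entries are placeholder assumptions. Only the `:rand` column reflects the claim made above about performant sampling.

```julia
abstract type AbstractApproximation end

# Hypothetical approximations, named only for this illustration.
struct VanillaPseudoPoint <: AbstractApproximation end
struct PathwisePseudoPoint <: AbstractApproximation end

# Each operation is a capability that defaults to "not provided".
supports(::AbstractApproximation, ::Val) = false

# Assumed for illustration: every approximation offers these two.
supports(::AbstractApproximation, ::Val{:posterior}) = true
supports(::AbstractApproximation, ::Val{:logpdf}) = true

# Only the pathwise-sampling variant opts in to a performant `rand`.
supports(::PathwisePseudoPoint, ::Val{:rand}) = true

ops = (:posterior, :logpdf, :rand)
mark(b::Bool) = b ? "✓" : "✗"

# Print one row of ticks and crosses per approximation.
println(rpad("", 22), join(ops, "  "))
for approx in (VanillaPseudoPoint(), PathwisePseudoPoint())
    row = join((mark(supports(approx, Val(op))) for op in ops), "  ")
    println(rpad(string(nameof(typeof(approx))), 22), row)
end
```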

@st--
Member

st-- commented Aug 23, 2021

@rossviljoen @willtebbutt is this sufficiently resolved by #194?

@willtebbutt
Member Author

My inclination is to say no, because we're not completely satisfied with the result of #194, although it's an improvement on what we had before. I'll refer future readers to this and subsequent comments: #194 (comment)

@st--
Member

st-- commented Aug 26, 2021

@willtebbutt could you summarise the remaining issues here?

@willtebbutt
Member Author

willtebbutt commented Aug 26, 2021

Certainly. For both approximations that we've encountered so far (Titsias (2009) and Hensman (2013)), once the ApproxPosteriorGP has been produced, it's clear that we want to implement the regular AbstractGPs API on it.

The first question is what to do about the API for generating it. For Titsias (2009), something like

posterior(VFE(f(z)), f(x), y)

(or similar) makes sense, whereas for Hensman (2013) you could get away with something like

posterior(VFE(f(z), q))

because the approximate posterior is mediated by q(u).

Of course, the Titsias (2009) approximation is just the Hensman (2013) approximation with the optimal choice of q(u), so we could write the Titsias (2009) implementation as

posterior(VFE(f(z), f(x), y))

or something, and it would make sense. Maybe we should have done that...
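
For reference, the relationship can be written out under some standard assumptions that are not stated in this thread: a zero-mean prior, u = f(z), and a Gaussian likelihood with noise variance \sigma^2. The symbols K_{uu}, K_{uf}, K_{ff}, and Q_{ff} are notation introduced here.

```latex
\begin{align}
  \Sigma &= K_{uu} + \sigma^{-2} K_{uf} K_{fu}, \\
  q^{*}(u) &= \mathcal{N}\!\left(u;\; \sigma^{-2} K_{uu} \Sigma^{-1} K_{uf}\, y,\;
              K_{uu} \Sigma^{-1} K_{uu}\right), \\
  \mathrm{ELBO}(q^{*}) &= \log \mathcal{N}\!\left(y;\; 0,\; Q_{ff} + \sigma^{2} I\right)
              - \frac{1}{2\sigma^{2}} \operatorname{tr}\!\left(K_{ff} - Q_{ff}\right),
  \qquad Q_{ff} = K_{fu} K_{uu}^{-1} K_{uf}.
\end{align}
```

Since q^{*}(u) depends only on f(z), f(x), and y, those three arguments are enough to determine the Titsias (2009) approximation, which is why the single-argument form above is coherent.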

The second question is what to do about the elbo. In both cases, you can make sense of something like

elbo(VFE(...), f(x), y)

potentially with some extra arguments.

The solution we went with was

posterior(VFE(f(z)), f(x), y)
elbo(VFE(f(z)), f(x), y)

for Titsias (2009), and presumably a similar thing will happen in SparseGPs for Hensman (2013).

This was a slightly hurried design choice for the sake of getting something that was an improvement on what we currently had.

I'm confident that we can find a better solution; it's just that no one has found it yet.

@willtebbutt
Member Author

I wonder whether something like

approx = VFE(f(z), f(x), y)
elbo(approx) # returns a scalar
approx_posterior = posterior(approx) # returns an ApproxPosteriorGP

and

approx = VFE(f(z), f(x), y, q; config...) # config contains things about batch sizes etc.
elbo(rng, approx) # returns an estimator of the ELBO
approx_posterior_gp = posterior(approx) # returns an ApproxPosteriorGP

would make more sense? Still doesn't feel quite right though...
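
A rough, self-contained sketch of the object-first design for the deterministic (Titsias (2009)) case. It assumes a zero-mean GP, a unit-variance squared-exponential kernel on 1-D inputs, and Gaussian noise. The names `VFESketch`, `QuSketch`, `kernelmatrix`, and `noise_var` are made up for this illustration, plain input vectors stand in for f(z) and f(x), and `posterior` returns only the optimal q(u) rather than a full ApproxPosteriorGP.

```julia
using LinearAlgebra

# Unit-variance squared-exponential kernel matrix (an illustrative choice).
kernelmatrix(a, b) = [exp(-abs2(ai - bj) / 2) for ai in a, bj in b]

# Hypothetical container: pseudo-inputs, inputs, observations, noise variance.
struct VFESketch
    z::Vector{Float64}
    x::Vector{Float64}
    y::Vector{Float64}
    noise_var::Float64
end

# Hypothetical container for the mean and covariance of q(u).
struct QuSketch
    m::Vector{Float64}
    S::Matrix{Float64}
end

# Collapsed bound: log N(y; 0, Qff + s2*I) - tr(Kff - Qff) / (2*s2).
function elbo(approx::VFESketch)
    z, x, y, s2 = approx.z, approx.x, approx.y, approx.noise_var
    n = length(y)
    Kuu = kernelmatrix(z, z) + 1e-9 * I          # small jitter for stability
    Kuf = kernelmatrix(z, x)
    Qff = Kuf' * (cholesky(Symmetric(Kuu)) \ Kuf)
    C = cholesky(Symmetric(Qff + s2 * I))
    fit_term = -0.5 * (n * log(2π) + logdet(C) + dot(y, C \ y))
    trace_term = -(n - tr(Qff)) / (2 * s2)       # diag(Kff) is 1 for this kernel
    return fit_term + trace_term
end

# Optimal q(u) for the same model.
function posterior(approx::VFESketch)
    z, x, y, s2 = approx.z, approx.x, approx.y, approx.noise_var
    Kuu = kernelmatrix(z, z) + 1e-9 * I
    Kuf = kernelmatrix(z, x)
    A = cholesky(Symmetric(Kuu + Kuf * Kuf' / s2))
    m = Kuu * (A \ (Kuf * y)) / s2
    S = Kuu * (A \ Kuu)
    return QuSketch(m, S)
end

z = [0.0, 1.0, 2.0]
x = [0.0, 0.5, 1.0, 1.5, 2.0]
y = sin.(x)
approx = VFESketch(z, x, y, 0.1)
value = elbo(approx)     # a scalar
q = posterior(approx)    # the optimal q(u) under the assumptions above
```

In this shape neither `elbo` nor `posterior` needs extra arguments, because the data already live in the approximation object; the stochastic variant would additionally need an `rng` and the minibatching configuration.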

@st--
Member

st-- commented Mar 21, 2022

Reviving this discussion, another question is where the computation/optimisation should go. E.g. for SparseVariationalApproximation we need to optimise to find the optimal q(u); currently this is left as an exercise for the user, and posterior(sva, lfx, y) is then super fast (and ignores the last two arguments entirely). In contrast, posterior(LaplaceApproximation(), lfx, y) actually computes the mode of the posterior. From a user perspective, it'd be nice if this were consistent (and easy...).
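
To make the inconsistency concrete, here is a toy sketch of the two conventions side by side. Everything in it is hypothetical: `PreOptimised` and `OptimiseInside` are invented names, the "model" is just a least-squares fit of a single scalar, and neither type corresponds to SparseVariationalApproximation or LaplaceApproximation.

```julia
# Convention A: the caller optimises beforehand and stores the result.
# `posterior` only reads the stored parameter and ignores `fx` and `y`.
struct PreOptimised
    q_mean::Float64
end

posterior(a::PreOptimised, fx, y) = (mean=a.q_mean,)

# Convention B: `posterior` does the optimisation itself.
struct OptimiseInside
    steps::Int
    stepsize::Float64
end

function posterior(a::OptimiseInside, fx, y)
    m = 0.0
    for _ in 1:a.steps
        # Gradient step on the toy objective 0.5 * sum((m .- y) .^ 2).
        m -= a.stepsize * sum(m .- y)
    end
    return (mean=m,)
end

y = [1.0, 2.0, 3.0]
p_a = posterior(PreOptimised(2.0), nothing, y)        # cheap, data unused
p_b = posterior(OptimiseInside(200, 0.1), nothing, y) # does the work itself
```

Both calls have the same signature, but the cost and the role of `y` differ completely, which is the consistency problem raised above.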

@willtebbutt
Member Author

willtebbutt commented Mar 23, 2022

Yeah, I think we ought to be able to do this at least in the deterministic-objective case (no minibatching, and quadrature or exact computation of the reconstruction term). I've done something like this in ConjugateComputationVI.jl (which I should move over to here and align with what you've done with the Laplace approximation).

edit: if you've got particular ideas for the stochastic-objective case, I'd be interested to know what they are though.

@willtebbutt
Member Author

I believe that this is now stale, so am closing.
