
MethodError: no method matching bijector(::AbstractGPs.FiniteGP) #122

Closed
andreaskoher opened this issue Mar 30, 2021 · 20 comments · Fixed by #123

@andreaskoher
Contributor

Hey all,

first of all, great package! I work a lot with Turing and now try to add GPs in order to estimate latent functions. That said, the simple example below breaks

using AbstractGPs, KernelFunctions
using Turing

X = 1:100
f = GP( Matern32Kernel() )
u = rand( f(X) )
λ = exp.(u)
y = rand.(Poisson.(λ))

@model function latent_gp_model(X, y)
    f  = GP(Matern32Kernel())
    u ~ f(X)
    λ = exp.(u)
    y .~ Poisson.(λ)
end
c = sample( latent_gp_model(X, y), NUTS(), 10)

MethodError: no method matching bijector(::AbstractGPs.FiniteGP)

As a workaround, I can define a struct that inherits from AbstractMvNormal and everything works fine:

using Random  # for Random.AbstractRNG below

struct MvNormalGP{T<:AbstractGPs.FiniteGP} <: AbstractMvNormal
    f::T
end
AbstractGPs.rand(rng::Random.AbstractRNG, f::MvNormalGP) = rand(rng, f.f)
AbstractGPs.logpdf(f::MvNormalGP, y::AbstractVector{<:Real}) = logpdf(f.f, y)

@model function latent_gp_model(X, y)
    f  = GP(Matern32Kernel())
    u ~ MvNormalGP( f(X) )
    λ = exp.(u)
    y .~ Poisson.(λ)
end
c = sample( latent_gp_model(X, y), NUTS(), 10)

That's not a serious issue but, as a beginner to both Turing and AbstractGPs, it took me a while to figure out the problem. It is also somewhat inconsistent in the case of Stheno: with the recent PR, sampling of sparse latent GPs with Stheno works out of the box, but sampling from a non-sparse GP has the same issue as above since the implementation is based on AbstractGPs.

Do you think it would be possible to derive GPs from AbstractMvNormal instead of ContinuousMultivariateDistribution (at least in the case of Gaussian likelihoods)?

@devmotion
Member

Do you think it would be possible to derive GPs from AbstractMvNormal instead of ContinuousMultivariateDistribution (at least in the case of Gaussian likelihoods)?

Yes, I think this would make sense and would fix the issues in your example.

@willtebbutt
Member

Seconded. We should definitely be doing that. I think when I implemented this originally I was unaware that Distributions provided an AbstractMvNormal type.

@andreaskoher
Contributor Author

Cool. Should I make a PR?

@willtebbutt
Member

Please :)

@theogf
Member

theogf commented Apr 1, 2021

By the way @willtebbutt @devmotion, what is the reason for FiniteGP not inheriting from AbstractGP, and how come there is no AbstractFiniteGP?

@devmotion
Member

IMO technically it's not a GP, it's just a multivariate normal distribution which happens to be a projection of a GP. This projection can be applied to really any GP. Concrete implementations of GPs should be subtypes of AbstractGP, and then if you want to work with the finite-dimensional projection you construct a FiniteGP, e.g., with gp(x) where gp is the GP and x are the points of the projection.
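For concreteness, a minimal sketch of that projection (nothing here beyond standard AbstractGPs usage):

using AbstractGPs, KernelFunctions

f = GP(Matern32Kernel())          # infinite-dimensional GP: a subtype of AbstractGP
x = range(0.0, 10.0; length=20)   # points at which to project
fx = f(x, 0.1)                    # FiniteGP: the marginal of f at x, plus observation noise 0.1
y = rand(fx)                      # sample from the finite-dimensional (multivariate normal) marginal
logpdf(fx, y)                     # and evaluate its log density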

@theogf
Member

theogf commented Apr 1, 2021

Ok, so what you are saying is that AbstractGPs does not aim at building an interface for finite-dimensional projections, right?
I find this a bit sad, since building things around the infinite-dimensional GP is pretty limited and not the most interesting thing to expand (at least code-wise).

@devmotion
Member

Nothing prevents you from specializing methods of FiniteGP for specific GPs. Or design a different finite-dimensional projection.

@theogf
Member

theogf commented Apr 1, 2021

You mean I can define specific methods for FiniteGP{<:MySpecialGP}, or have MySpecialFiniteGP{<:AbstractGP}? For the latter, I find it unfortunate that nothing connects MySpecialFiniteGP and FiniteGP.

EDIT: Once again, maybe I am missing the point. I am going to check what Will did for Stheno/TemporalGPs to understand it better.

@devmotion
Member

Yes, exactly, you can define specific methods for FiniteGP{<:MyGP}. That's what Will did in TemporalGPs as well it seems: https://github.com/JuliaGaussianProcesses/TemporalGPs.jl/blob/24343744cf60a50e09b301dee6f14b03cba7ccba/src/gp/lti_sde.jl#L20-L55

If you implement (gp::MyGP)(x::AbstractVector) = MyFiniteGP(x) you could still hook into the ecosystem. But, of course, you'd not get any of the functionality of FiniteGP.
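As a rough sketch of that pattern -- MyGP and its specialised logpdf below are purely illustrative, not part of AbstractGPs:

using AbstractGPs, Distributions, Statistics

# Hypothetical GP type; in practice it would carry whatever structure enables a fast path.
struct MyGP <: AbstractGPs.AbstractGP end

# Implement the AbstractGP interface: prior statistics at a finite collection of inputs.
Statistics.mean(::MyGP, x::AbstractVector) = zeros(length(x))
Statistics.cov(::MyGP, x::AbstractVector) = [exp(-abs(a - b)) for a in x, b in x]
Statistics.var(::MyGP, x::AbstractVector) = ones(length(x))

# Specialise a method on the finite-dimensional projection of this particular GP.
function Distributions.logpdf(fx::AbstractGPs.FiniteGP{<:MyGP}, y::AbstractVector{<:Real})
    # A structure-exploiting implementation would go here; this stub just builds the
    # dense multivariate normal from the projection's mean and covariance.
    return logpdf(MvNormal(mean(fx), cov(fx)), y)
end

# MyGP then hooks into the ecosystem like any other AbstractGP:
fx = MyGP()(randn(10), 0.1)
logpdf(fx, rand(fx))   # dispatches to the specialised method above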

@willtebbutt
Member

willtebbutt commented Apr 1, 2021

Yes, exactly, you can define specific methods for FiniteGP{<:MyGP}.

Indeed -- this is what I found most convenient in practice. In Stheno I didn't find the need to specialise anything (I don't think?) but for TemporalGPs I needed to implement logpdf, rand, etc directly because cov(f(x)) isn't really something you want for state-space stuff (could do lazy stuff, but that runs into other issues)

By the way @willtebbutt @devmotion, what is the reason for FiniteGP not inheriting from AbstractGP, and how come there is no AbstractFiniteGP?

In addition to what's already been said to this point, the main reason that FiniteGP doesn't subtype AbstractGP is a practical one. Making FiniteGP subtype AbstractMvNormal (previously ContinuousMultivariateDistribution) means that it plays nicely with code that expects a Distributions.jl type.

When I first implemented Stheno.jl, I didn't actually have a FiniteGP type -- cov(f) was defined for all GPs, it just errored if your input space wasn't finite-dimensional, and if you wrote f(x) you got another GP. I still find this approach moderately appealing on an aesthetic level, but in practice it became messy really quickly. The FiniteGP construct really helped to clean up my code a lot.

Additionally, the AbstractGP API is agnostic regarding the index set / domain / input space of your GP, so there's really nothing stopping you from constructing GPs on finite dimensional spaces. Indeed, we kind of already do this with multi-output GPs, where one of the dimensions is finite (the one that indexes the outputs).

Ok, so what you are saying is that AbstractGPs does not aim at building an interface for finite-dimensional projections, right?

I think this is only partly true. The interface itself is all about computing the statistics of finite-dimensional projections of infinite-dimensional objects (cov(f, x), mean(f, x), etc).

GP packages that are happy to implement cov(f, x) etc shouldn't really have to know anything about FiniteGPs -- FiniteGPs are really just book-keeping objects.

Are there any particular aspects of the interface that you're concerned about @theogf ? AFAICT, these are really the only things that you need to implement if you've got a standard kind of GP -- everything else can be built on top of that (e.g. pseudo-point approximations).

@theogf
Member

theogf commented Apr 1, 2021

Thanks for the lengthy description. This is not for nothing as I am making a PR for the docs to explain the interface :)

Are there any particular aspects of the interface that you're concerned about @theogf ? AFAICT, these are really the only things that you need to implement if you've got a standard kind of GP -- everything else can be built on top of that (e.g. pseudo-point approximations).

So the reason for my questioning is of course about the docs PR, but also because I tried to make a quick implementation of VI for GPs (using my augmented method, but that's not relevant), and I was confused when creating a VariationalGP,
which is basically:

struct VariationalGP
  lgp::LatentFiniteGP
  mu::Vector
  Sigma::Matrix
end

It would be weird to make it inherit from AbstractGP since it's obviously not infinite-dimensional. However, ApproxPosteriorGP does inherit from AbstractGP, and while I agree that the methods implemented "make sense", it still feels odd given the explanations you gave above...

@willtebbutt
Member

I see. Assuming that I've understood correctly, and your VariationalGP is a GP + non-Gaussian likelihood, then I would say that it's inappropriate to make it subtype AbstractGP, because it's not itself a GP, in the sense that the joint distribution over all of the random variables isn't Gaussian. This is why LatentGP doesn't subtype AbstractGP.

On the finite- vs infinite-dimensional stuff, I am curious though -- is there a particular reason that your VariationalGP is a finite-dimensional thing? I would have thought that, given that it can presumably make predictions wherever you like, it would be an infinite-dimensional thing?

@theogf
Member

theogf commented Apr 1, 2021

Well in my view it represents a variational distribution q(f(X)) which is a (finite) multivariate normal distribution.
Actually, you pinpoint the thing which I am uneasy with: "given that it can presumably make predictions wherever you like, it would be an infinite-dimensional thing?". This does not sound right to me. You can also make predictions wherever you want with a linear regression model, yet it's finite.

@willtebbutt
Member

willtebbutt commented Apr 1, 2021

Well in my view it represents a variational distribution q(f(X)) which is a (finite) multivariate normal distribution.

Fair enough. So can I not use it to make predictions at test locations?

This does not sound right to me. You can also make predictions wherever you want with a linear regression model, yet it's finite.

Okay, that's fair. I guess there are two kinds of infinite going on here. I'm referring to infinite in the sense that you're distributing over functions, which are infinite-dimensional objects. In this sense, a linear regression model is infinite; it's just that it's possible to parametrise said infinite-dimensional object using a finite-dimensional vector.

edit: so by infinite I mean that the index set / input domain isn't finite.

@theogf
Member

theogf commented Apr 1, 2021

Ok, maybe it would be easier to understand what's NOT an infinite-dimensional object.
FiniteGP fits this definition because you obviously cannot make any prediction from it.
LatentFiniteGP is just a generalization of FiniteGP, so it's the same thing.
LatentGP is a tricky one because although you can make predictions, you would not project a GP but a data output, or something like this.

Ok, I think I get the gist of it. Basically, if you can consider the object as a map to a GP realization, then it fits the description.

@willtebbutt
Member

LatentGP is a tricky one because although you can make predictions, you would not project a GP but a data output, or something like this.

Yeah, I think I would call a LatentGP an infinite-dimensional thing if the AbstractGP it wraps is infinite-dimensional.

My conception of the LatentGP is that it's simply a hierarchical model:

f ~ GP(m, k)
y | f ~ D(f)

for some family of distributions D. The usual p(y(x_{1:N}) | f) = \prod_{n=1}^N p(y(x_n) | f(x_n)) being a special case of this.

Basically if you can consider the object as a map to a GP realization then it fits the description.

Yeah, provided that the space your GP distributes over is infinite-dimensional (I do think it's reasonable to refer to a multivariate Normal as a GP with a finite-dimensional index set).

@willtebbutt
Member

willtebbutt commented Apr 3, 2021

@theogf I've been thinking a bit about your VariationalGP object (above), and was wondering what you think about the following idea.

Firstly, I'm assuming that the approximate posterior you're winding up with over your entire process f is something like

q(f) = q(u) p(f_{\neq u} | u)

i.e. the usual Titsias approximate posterior, but in your case the u might just be the RVs that you make noisy (non-Gaussian) observations of.

The way I'm interpreting your VariationalGP object above is that mu and Sigma give you q(u):

q(u) := N(u; mu, Sigma)

and the AbstractGP that lives inside lgp gives you p(f | u) via its mean and kernel.

Given this structure, I wonder whether we could implement your posterior in terms of existing data structures. Specifically,

qf = LatentGP(
    ApproxPosteriorGP(TheoApprox(), lgp.f, (mu=mu, Sigma=Sigma)),
    lgp.lik,
    lgp.Σy,
)

Then you're back inside the LatentGP framework, but I think this object faithfully represents your approximate posterior, no? If so, you could then just implement a function

qf = theos_approximate_inference(lgp::FiniteLatentGP, y::AbstractVector{<:Real})

that does inference using your methodology, and returns the approximate posterior in the proposed form.

The reason I'm keen on this is that, assuming I've understood your approximation properly, loads of approximate inference algorithms produce approximate posteriors that are representable by the family of processes that qf is a member of. For example, the older variational inference approaches (natural-gradient based etc.) fall inside this framework, as does the usual Laplace approximation, and possibly EP etc. -- they essentially only differ in the values of mu and Sigma that they return.

What do you think?

edit: probably we'd want a different name from TheoApprox for the type that we use to distinguish this kind of approximate posterior from other kinds, but that's a naming discussion that we can have later.

theogf reopened this Apr 3, 2021
@theogf
Member

theogf commented Apr 3, 2021

Reopening although I know the discussion diverged quite a lot.

So to make things more concrete, I started to sketch a package for ApproximateGPs and implement my augmented approach:
https://github.com/theogf/ApproximateGPs.jl (needs to be updated given this discussion)

In my example I was not even considering Titsias, just variational GPs. Now I understand what is aimed at with ApproximateGPPosterior (it was not super clear either :D); a lot of docs are definitely needed :)

So in your proposition, one would just provide some inference functions that optimize the variational parameters and finally return an ApproximateGPPosterior (which could eventually be further optimized). It's very elegant indeed!

About wrapping in a LatentGP afterwards, I think this should be discussed, because one might be interested in getting the latent GP instead of the output prediction. Should we get a similar interface to MLJ.jl, namely a fit! and predict?
In the sense that predict(qf, X*) provides p(y*|q), and we can still use qf(X*) for computing p(f*|q).

I will push till the end to get TheoApprox to be the standard!

@willtebbutt
Member

willtebbutt commented Apr 3, 2021

Apologies in advance, this is a longer one.

In my example I was not even considering Titsias, just variational GPs. Now I understand what is aimed at with ApproximateGPPosterior (it was not super clear either :D); a lot of docs are definitely needed :)

It's really nice that the same form of the approximate posterior works for both regular variational GPs and the Titsias formulation of pseudo-point approximations, and that said approximate posterior GP is literally just a (slightly strange looking) GP. Also, yes, we always need more docs 😂

About wrapping in a LatentGP afterwards, I think this should be discussed, because one might be interested in getting the latent GP instead of the output prediction

Agreed -- you can always pull the latent GP out of a LatentGP though, so we wouldn't lose information either way. I suspect that we need to work through some examples to get a feel for what is likely to be most convenient in practice.

Should we get a similar interface to MLJ.jl, namely a fit! and predict?

We shouldn't need a predict function -- we've already got the FiniteGP / FiniteLatentGP stuff, so given some kernel parameters, you should just be able to do something like

qf = approx_posterior(MyFancyPosteriorApproximationMethod(), lgp_x::FiniteLatentGP, y::AbstractVector)
marginals(qf(X*))
rand(qf(X*))

Am I missing anything here?

Regarding hyperparameter optimisation, I think we have quite different API preferences, and I would like to figure out whether it's possible to support both.

Specifically, I get the impression that you favour a mutating fit! API, whereas I prefer the kind of non-mutating thing that ParameterHandling.jl lets you support. If you take a look at this bit of Stheno.jl's docs, you'll see what I'm on about.

As an example of the kind of API that I would like to have, let's just consider approximate inference algorithms where the basic scheme for doing inference and hyperparameter learning is

  1. Compute optimal approximate posterior qf for the current hyperparameters
  2. Differentiate through an objective function (approximation to log marginal likelihood) at the optimal qf to obtain the gradient w.r.t. the kernel parameters
  3. Update kernel parameters. If not converged, go to step 1.

There are obviously other kinds of algorithms which, for example, jointly update the variational and kernel parameters at the same time, but let's stick with this for now because it's simple. I'm also going to be assuming that we're not mini-batching.

To implement an algorithm like the one described above, you need two functions:

  1. the approx_posterior function discussed above, which computes the optimal qf given the current kernel parameters,
  2. some kind of approximate_log_marginal_likelihood(qf, lgp, y) function, which computes an approximation to the log marginal likelihood, given an approximate posterior qf, a latent GP lgp, and some observations y.

Given these components, you can build a learning algorithm using e.g. Zygote.jl + Optim.jl + ParameterHandling.jl, and perform inference once you've got good kernel parameters just using approx_posterior.
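Here's a rough sketch of what that could look like. approx_posterior and approximate_log_marginal_likelihood are the proposed building blocks from above (they don't exist yet), MyApprox is a placeholder approximation type, and only the ParameterHandling / Zygote / Optim usage is standard:

using AbstractGPs, KernelFunctions, Distributions
using ParameterHandling, Zygote, Optim
using ParameterHandling: positive, value, flatten

x = 1.0:20.0
y = rand.(Poisson.(exp.(sin.(x))))      # toy Poisson counts
lik(v) = Product(Poisson.(exp.(v)))

# Constrained kernel parameters as a nested, flattenable structure.
raw_params = (variance=positive(1.0), lengthscale=positive(2.0))
flat, unflatten = flatten(raw_params)

function objective(flat_params)
    p = value(unflatten(flat_params))
    k = p.variance * with_lengthscale(Matern32Kernel(), p.lengthscale)
    lgp_x = LatentGP(GP(k), lik, 1e-6)(x)
    # hypothetical building blocks from the discussion above:
    qf = approx_posterior(MyApprox(), lgp_x, y)                 # 1. optimal q for these parameters
    return -approximate_log_marginal_likelihood(qf, lgp_x, y)   # 2./3. maximise the approximate LML
end

# Non-mutating loop: Zygote supplies gradients, Optim updates the kernel parameters.
result = optimize(objective, θ -> only(Zygote.gradient(objective, θ)), flat, LBFGS(); inplace=false)
learned_params = value(unflatten(Optim.minimizer(result)))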

I think you could also use these functions as building blocks inside a fit! function if that's what you prefer. I definitely want to have access to the building blocks and not to be forced to use the Functors.jl interface though :)

What do you think?

I will push till the end to get TheoApprox to be the standard!

😂
