Introduce DiffPt for the covariance function of derivatives #508

Draft
wants to merge 17 commits into
base: master

FelixBenning

Summary

This is a minimal implementation that enables the simulation of gradients (and higher-order derivatives) of GPs (see also #504).

Proposed changes

For a covariance kernel k of GP Z, i.e.

k(x,y) # = Cov(Z(x), Z(y)),

a DiffPt allows the differentiation of Z, i.e.

k(DiffPt(x, partial=1), y) # = Cov(∂₁Z(x), Z(y))

For higher-order derivatives, partial can be any iterable, e.g.

k(DiffPt(x, partial=(1,2)), y) # = Cov(∂₁∂₂Z(x), Z(y))

The code for this feature is minimal but allows the simulation of arbitrary derivatives of Gaussian processes. It only contains

  • DiffPt
  • an _evaluate(::T, x::DiffPt, y::DiffPt) where {T<: Kernel} function which calls partial functions that take the derivatives.
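For reference, the calls above rely on the standard identity that the covariance of derivatives is the derivative of the covariance. This assumes that $k$ is smooth enough for differentiation and covariance to be interchanged; $\partial_i$ denotes the partial derivative in the $i$-th input direction:

$$\operatorname{Cov}\bigl(\partial_i Z(x), Z(y)\bigr) = \frac{\partial}{\partial x_i} k(x, y), \qquad \operatorname{Cov}\bigl(\partial_i Z(x), \partial_j Z(y)\bigr) = \frac{\partial^2}{\partial x_i \, \partial y_j} k(x, y)$$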

What alternatives have you considered?

This is the implementation with the smallest footprint, but not the most performant one. What essentially happens here is the simulation of the multivariate GP $f = (Z, \nabla Z)$, which is a $(d+1)$-dimensional GP if $Z$ is a univariate GP with input dimension $d$. Due to the "no multi-variate kernels" design philosophy of KernelFunctions.jl, we are forced to calculate the entries of the covariance matrix one by one. It would be more performant to calculate the entire matrix in one go, using reverse-mode differentiation for the first pass and forward-mode differentiation for the second derivative.

It might be possible to specialize on ranges to get this performance back, but it is not clear how. We do not call

k.(1:d, 1:d)

which could easily be caught by specializing on broadcast. In reality we do something like

k.([(x, 1),...(x,d)], [(y,1),...(y,d)])

And even this is not quite accurate, because we consider all pairs of entries from these two lists and not just a zip.
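The difference between a zip and all pairs can be illustrated with plain broadcasting. This is only a sketch: the symbols `:x` and `:y` stand in for input locations, and the tuples `(point, i)` stand in for "i-th partial derivative at point". None of these names are part of the PR.

```julia
d = 3

# (point, i) stands for "i-th partial derivative at point" (illustrative only).
xs = [(:x, i) for i in 1:d]
ys = [(:y, j) for j in 1:d]

# Broadcasting two vectors pairs them elementwise, like a zip: d results.
zipped = tuple.(xs, ys)

# Broadcasting a column against a row visits every pair: a d×d matrix,
# the same shape as the covariance block between ∇Z(x) and ∇Z(y).
all_pairs = tuple.(xs, permutedims(ys))
```

A specialization that only intercepts the elementwise case would therefore miss the d×d block that is actually needed.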

Breaking changes

None.

Comment on lines +78 to +85
This is a hack to work around the fact that the `where {T<:Kernel}` clause is
not allowed for the `(::T)(x,y)` syntax. If we were to only implement
```julia
(::Kernel)(::DiffPt,::DiffPt)
```
then julia would not know whether to use
`(::SpecialKernel)(x,y)` or `(::Kernel)(x::DiffPt, y::DiffPt)`
Author

To avoid this hack, no kernel type T should implement

	(::T)(x,y)

and instead implement

	_evaluate(k::T, x, y)

Then there should be only a single

	(k::Kernel)(x,y) = _evaluate(k, x, y)

which all the kernels would fall back to.

This ensures that _evaluate(k::T, x::DiffPt{Dim}, y::DiffPt{Dim}) where T<:Kernel is always
more specialized and called.

Author

but this is a much more intrusive change, so to keep the number of changed lines small, I have not done this yet.

Author

this is also why "detect ambiguities" fails right now
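The ambiguity described in the docstring can be reproduced in isolation. The sketch below uses hypothetical stand-in types (`Kernel`, `SpecialKernel`, and a one-field `DiffPt`), not the definitions from KernelFunctions.jl or this PR. One method is more specific in the callable's type, the other in the argument types, so neither wins.

```julia
# Hypothetical stand-ins, not the KernelFunctions.jl types.
abstract type Kernel end
struct SpecialKernel <: Kernel end

struct DiffPt{T}
    pos::T
end

# More specific in the kernel type, generic in the arguments.
(::SpecialKernel)(x, y) = exp(-abs2(x - y) / 2)

# Generic in the kernel type, more specific in the arguments.
(::Kernel)(x::DiffPt, y::DiffPt) = :derivative_branch

k = SpecialKernel()

# Plain points dispatch without problems.
plain = k(0.0, 1.0)

# Two DiffPts match both methods equally well, so Julia throws a
# MethodError reporting the ambiguity.
is_ambiguous = try
    k(DiffPt(0.0), DiffPt(1.0))
    false
catch err
    err isa MethodError
end
```

This is the kind of method pair that an ambiguity check in a test suite flags.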

Member

I don't think we should switch to _evaluate(k, x, y). Instead of changing all other kernels, I think you should just create a single wrapper of existing kernels.

Author (FelixBenning, May 17, 2023)

The ugly thing about wrapping existing kernels is that you essentially say: this is a different Gaussian process model.

I.e.

f = GP(MaternKernel())

is a different (non-differentiable) model from

g = GP(DiffWrapper(MaternKernel())),

which is differentiable. But fundamentally the Matérn kernel implies that the Gaussian process is always $(\lceil \nu \rceil - 1)$ times (mean-square) differentiable. So f ought to be differentiable.

And with this abstraction it is. I.e. you use

x = 1:10
fx = f(x)
y = rand(fx)

to simulate f at points 1:10. If you now wanted to simulate its gradient at point 0 too, you would just have to modify its input

x = [DiffPt(0, partial=1), 1:10... ]
fx = f(x)
y0_grad, y... =  rand(fx)

and y0_grad would be the gradient in 0 as expected.

Author

Also note how this abstraction lets you mix and match normal points and DiffPts

Member

Also note how this abstraction lets you mix and match normal points and DiffPts

You can perform exactly the same computations with a wrapper type.

The ugly thing about wrapping existing kernels is that you essentially say: this is a different Gaussian process model.

I don't view it this way. The wrapper does not change the mathematical model, it just allows you to query derivatives as well.
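To make the wrapper idea concrete, here is a minimal sketch under strong simplifying assumptions: one-dimensional inputs, at most one derivative per argument, and central finite differences standing in for an autodiff backend. `SqExp`, `DiffWrapper`, the `order` field, `lift`, and `central_diff` are all hypothetical names chosen for illustration; none of them are taken from the PR or from KernelFunctions.jl.

```julia
# Hypothetical stand-ins, not the KernelFunctions.jl types.
abstract type Kernel end

struct SqExp <: Kernel end
(::SqExp)(x, y) = exp(-abs2(x - y) / 2)

# order == 0: evaluate Z itself; any other value: first derivative of Z.
struct DiffPt{T}
    pos::T
    order::Int
end

# The wrapper is itself a Kernel, so generic kernel code can accept it.
struct DiffWrapper{K<:Kernel} <: Kernel
    kernel::K
end

# Central finite difference as a dependency-free stand-in for autodiff.
central_diff(f, x; h=1e-4) = (f(x + h) - f(x - h)) / (2h)

# Plain points are treated as "no derivative" points.
lift(x) = DiffPt(x, 0)
lift(x::DiffPt) = x

function (w::DiffWrapper)(x, y)
    px, py = lift(x), lift(y)
    k = w.kernel
    # Differentiate in the first argument if requested ...
    in_x = px.order == 0 ? k : (s, t) -> central_diff(u -> k(u, t), s)
    # ... and then in the second argument if requested.
    in_xy = py.order == 0 ? in_x : (s, t) -> central_diff(v -> in_x(s, v), t)
    return in_xy(px.pos, py.pos)
end

w = DiffWrapper(SqExp())
value = w(0.3, 1.0)                          # Cov(Z(x), Z(y))
mixed = w(DiffPt(0.3, 1), 1.0)               # Cov(Z'(x), Z(y))
both = w(DiffPt(0.3, 1), DiffPt(1.0, 1))     # Cov(Z'(x), Z'(y))
```

Because all methods are attached to the concrete `DiffWrapper` type, no method on the wrapped kernel has to change, and plain points and `DiffPt`s can still be mixed.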

Author (FelixBenning, May 17, 2023)

You can perform exactly the same computations with a wrapper type.

well, if the wrapped kernel has a superset of the capabilities of the original kernel (without performance cost), why would you ever use the unwrapped kernel? And if you only ever use the wrapped kernel, then
GP(DiffWrapper(MaternKernel()))
is kind of superfluous. You would then probably start to write convenience functions to get the wrapped kernel immediately. But for the wrapped kernel the compositions like +, ... are not implemented, so you would have to pass all of those through. It seems like a pointless effort to get a capability that the original kernel should already have. I mean, the capability does not collide with anything.

Member

No, it's not superfluous as the wrapped kernel has a different API (i.e., you have to provide different types/structure of inputs) that is more inconvenient to work with if you're not interested in evaluating those. There's a difference - but IMO it's not a mathematical one but rather regarding user experience.

As you've already noticed I think it's also just not feasible to extend every implementation of differentiable kernels out there without making implementations of kernels more inconvenient for people that do not want to work with derivatives. So clearly separating these use cases seems simpler to me from a design perspective.

The wrapper would be a Kernel as well, of course, so if the existing implementations in KernelFunctions such as sums etc. are written as generally as they were intended no new definitions for compositions are needed. You would add them only if there is a clear performance gain that outweighs code complexity.

Author

no, you do not have to provide different types/structures of inputs, so it does not make things more inconvenient for people who are not interested in gradients. That is the entire point of specializing on DiffPt: the kernel function still works on everything that is not a DiffPt.

@codecov

codecov bot commented May 17, 2023

Codecov Report

Patch coverage has no change and project coverage change: -16.75 ⚠️

Comparison is base (ef6d459) 94.16% compared to head (deebf0c) 77.41%.

Additional details and impacted files
@@             Coverage Diff             @@
##           master     #508       +/-   ##
===========================================
- Coverage   94.16%   77.41%   -16.75%     
===========================================
  Files          52       54        +2     
  Lines        1387     1430       +43     
===========================================
- Hits         1306     1107      -199     
- Misses         81      323      +242     
| Impacted Files | Coverage Δ |
|---|---|
| src/KernelFunctions.jl | 100.00% <ø> (ø) |
| src/diffKernel.jl | 0.00% <0.00%> (ø) |
| src/mokernels/differentiable.jl | 0.00% <0.00%> (ø) |

... and 19 files with indirect coverage changes


@Crown421
Member

I don't think ForwardDiff should be an explicit dependency for KernelFunctions. To me this would make more sense as an extension (which might also allow for different implementations, e.g. Enzyme in forward mode).

@@ -8,10 +8,12 @@ Compat = "34da2185-b29b-5c13-b0c7-acf172513d20"
CompositionsBase = "a33af91c-f02d-484b-be07-31d278c5ca2b"
Distances = "b4f34e82-e78d-54a5-968a-f98e89d6e8f7"
FillArrays = "1a297f60-69ca-5386-bcde-b61e274b549b"
ForwardDiff = "f6369f11-7733-5829-9624-2563aa707210"
Member

I think we don't want a dependency on ForwardDiff. We tried hard to avoid it so far.

Author (FelixBenning, May 17, 2023)

yeah maybe this should be a plugin somehow, but writing this as a plugin would have caused a bunch of boilerplate to review and I wanted to make it easier to grasp the core idea

Author

Maybe AbstractDifferentiation.jl would be the right abstraction?

Member

IMO it's not ready for proper use. But hopefully it will at some point.

@@ -125,6 +127,7 @@ include("chainrules.jl")
include("zygoterules.jl")

include("TestUtils.jl")
include("diffKernel.jl")
Member

Kernels are contained in the kernel subfolder.

Author (FelixBenning, May 17, 2023)

Well it is not a new kernel - it is extending the functionality of all kernels which is why I did not put it here. But maybe it makes sense to put it there anyway.

Member

I think it should be a new kernel. That's how you can nicely fit it into the KernelFunctions ecosystem.

import LinearAlgebra as LA

"""
DiffPt(x; partial=())
Member

Why is a separate type needed? Wouldn't it be better to use the existing input formats for multi-output kernels?

Author

Where can I find the existing input formats? https://juliagaussianprocesses.github.io/KernelFunctions.jl/stable/design/#inputs_for_multiple_outputs here it says that you would use Tuple{T, Int}, and while you could in principle convert a partial tuple to an Int, you really don't want to.

I mean even if we are gracious and start counting with zero to make this less of a mess:

() -> 0 # no derivative
(1) -> 1 # partial derivative in direction 1
(2) -> 2
...
(dim) -> dim # partial derivative in direction dim
(1,1) -> dim+1 # twice partial derivative in direction 1 
(1,2) -> dim +2
...
(k, j) -> k * dim + j

So the user would have to do this conversion from Cartesian coordinates to linear coordinates. And then the implementation would have to invert this transformation, going from linear coordinates back to Cartesian coordinates. Cartesian coordinates with ex ante unknown tuple length, that is.

That all seems annoying without being really necessary. But sure - you could do it.
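The round trip described above can be sketched as follows. The functions `encode_partial` and `decode_partial` are hypothetical helpers written for this illustration; they generalize the `(k, j) -> k * dim + j` rule from the list to tuples of any length, assuming every entry lies in `1:dim`.

```julia
# Hypothetical helper: map a tuple of partial directions (each in 1:dim)
# to a single non-negative integer, following (k, j) -> k * dim + j.
encode_partial(partial, dim) = foldl((acc, p) -> acc * dim + p, partial; init=0)

# Hypothetical helper: invert the encoding. The tuple length is not known in
# advance, so directions are peeled off until nothing is left.
function decode_partial(n, dim)
    partial = Int[]
    while n > 0
        p = mod1(n, dim)          # last direction, in 1:dim
        pushfirst!(partial, p)
        n = (n - p) ÷ dim
    end
    return Tuple(partial)
end

dim = 3
no_derivative = encode_partial((), dim)       # 0
second_order = encode_partial((1, 2), dim)    # dim + 2
recovered = decode_partial(second_order, dim)
```

The encoding is a bijection, so it is possible, but the user-facing integer carries no readable meaning, which is the usability concern raised here.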

Member

Don't you just need X \times \{0, \ldots, dim\} to represent all desired partials?

Author (FelixBenning, May 17, 2023)

what is X? (btw mathjax is enabled: $ works for inline and ```math works for multiline maths.)

Author (FelixBenning, May 17, 2023)

And this is a bit tricky

$$\begin{aligned} f &: \mathbb{R}^d\to \mathbb{R}\\ \nabla f(x) &\in \mathbb{R}^d\\ \nabla^2 f(x) &\in \mathbb{R}^{d\times d} \end{aligned}$$

So $(f(x), \nabla f(x), \nabla^2 f(x)) \in \mathbb{R} \times \mathbb{R}^d \times \mathbb{R}^{d\times d} = \mathbb{R}^{1+d + d^2}$

So more generally $(f(x), \nabla f(x), ..., f^{(n)}(x)) \in \mathbb{R}^m$ with

$$m=\sum_{k=0}^{n} d^k = \frac{d^{n+1}-1}{d-1} \quad (d \neq 1)$$

In essence a variable length tuple carries more information than just the longest tuple. Plus there is no longest tuple.


implements `(k::T)(x::DiffPt{Dim}, y::DiffPt{Dim})` for all kernel types. But since
generics are not allowed in the syntax above by the dispatch system, this
redirection over `_evaluate` is necessary
Member

See my other comment, I think this should not be done and a simple wrapper would be sufficient.

i.e. 2*dim dimensional input
"""
function partial(k, dim; partials_x=(), partials_y=())
local f(x, y) = partial(t -> k(t, y), dim, partials_x)(x)
Member

local is not needed?

Author

I guess - felt right to explicitly say that this function is only temporary and is immediately going to be transformed again

@Crown421
Member

I have been playing around with the ideas in this PR, and realized that there are some open questions that need answering to make this work.

The first issue is that kernelmatrix is specialized for SimpleKernels and MOKernels, which would require some additional thought and changes to make work.
For SimpleKernels just adding the _evaluate method is insufficient, as kernelmatrix uses pairwise(metric(...), ...). In principle one could go deeper and start extending all the necessary methods for Distances.jl, but then the question is whether it is worth it.

In that case a wrapper might be easier, because then the only things needed are some additional methods. GP(DiffWrapper(Kernel)) would then indeed be a different object, since it might use less specialized methods for the kernelmatrix.

Additionally, a GP gd that expresses the derivative of some GP g is not quite the same object. At least for the exact GP posteriors defined in AbstractGPs.jl, each instance stores the Cholesky decomposition C as well as C\y for the undifferentiated input.kernel, which can then be efficiently re-used each time the mean or var are computed.
To get the variance var(::GP, ::DiffPt) we can't use the existing C, so we would need to compute the whole C for $\partial^2/\partial x_1 \partial x_2 \, k(x_1, x_2)$ or store this matrix in addition.

@FelixBenning
Author

FelixBenning commented May 22, 2023

kernelmatrix

@Crown421 the kernelmatrix thing is an issue I had not considered. Taking derivatives breaks isotropy, leaving only stationarity intact. But I am also not sure why this specialization for kernelmatrix

function kernelmatrix(κ::SimpleKernel, x::AbstractVector)
    return map(x -> kappa(κ, x), pairwise(metric(κ), x))
end

is more performant than

function kernelmatrix(κ::SimpleKernel, x::AbstractVector)
    return broadcast(κ, x, permutedims(x))
end

I mean pairwise(metric(κ), x) = broadcast(metric(κ), x, permutedims(x)). So the specialized implementation does essentially

K = broadcast(x, permutedims(x)) do x, y
      metric(κ)(x, y)
end # first pass over K
map(d -> kappa(κ, d), K) # second pass over K

which accesses the elements of K twice. On the other hand

K = broadcast(x, permutedims(x)) do x, y
      κ(x, y)
end

only requires one access. Since memory access is typically the bottleneck, the general definition should be more performant. That is unless

(κ::SimpleKernel)(x,y) = kappa(κ, metric(κ)(x,y))

is not inlined and causes more function calls. But in that case it probably makes more sense to force-inline the code above with @inline. This should ensure that the general implementation is reduced to

K = broadcast(x, permutedims(x)) do x, y
      kappa(κ, metric(κ)(x, y))
end

by the compiler, which should be faster than the two-pass version.

Issues with a wrapper

While DiffWrapper(kernel) may be of type Kernel, its compositions are not obvious. For sums it is fine, since sum and differentiation commute. But for a function transform you do not have

$$\frac{d}{d x_i} k(f(x), f(y)) = (\frac{\partial}{\partial x_i} k) (f(x), f(y))$$

So

DiffWrapper(kernel) ∘ FunctionTransform(f) != DiffWrapper(kernel ∘ FunctionTransform(f))
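What holds instead is the chain rule. Writing $\partial_j k$ for the partial derivative of $k$ with respect to the $j$-th component of its first argument, and assuming $f$ is differentiable with components $f_j$:

$$\frac{\partial}{\partial x_i} k(f(x), f(y)) = \sum_{j} (\partial_j k)(f(x), f(y)) \, \frac{\partial f_j}{\partial x_i}(x)$$

The two sides of the inequality above therefore differ by the Jacobian of $f$, unless that Jacobian happens to be the identity.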

So if you wanted to treat k=DiffWrapper(SqExponentialKernel()) as "the" mathematical squared exponential kernel

$$k(x,y) = \exp\Bigl(-\frac{(x-y)^2}{2}\Bigr)$$

which is differentiable, then you would expect the behavior of DiffWrapper(kernel ∘ FunctionTransform(f)). So you would have to specialize all the function composition for DiffWrapper. And composition feels like the main selling point of KernelFunctions.jl to me. Of course you could tell people to only use DiffWrapper at the very end. But man is that ugly for zero reason.

Caching the cholesky decomposition

I do not understand this point. This would automatically happen with this implementation too. I mean DiffPt(x, partial=i) is just a special point $x +\partial_i$ which is not in $\mathbb{R}^n$. But it still has an evaluation $y$ and a row in the Cholesky matrix which can be cached. Everything should just work as is with AbstractGPs.jl

@FelixBenning
Author

FelixBenning commented May 22, 2023

This is really weird...

julia> @btime kernelmatrix(k, Xc);
  23.836 ms (5 allocations: 61.05 MiB)

julia> @btime map(x -> KernelFunctions.kappa(k, x), KernelFunctions.pairwise(KernelFunctions.metric(k), Xc));
  103.780 ms (8000009 allocations: 183.12 MiB)

julia> @btime k.(Xc, permutedims(Xc));
  78.818 ms (4 allocations: 30.52 MiB)

julia> size(Xc)
(2000,)

julia> size(first(Xc))
(2,)

julia> k
Squared Exponential Kernel (metric = Distances.Euclidean(0.0))

Why is the performance of the implementation

function kernelmatrix(κ::SimpleKernel, x::AbstractVector)
    return map(x -> kappa(κ, x), pairwise(metric(κ), x))
end

https://github.com/JuliaGaussianProcesses/KernelFunctions.jl/blob/master/src/matrix/kernelmatrix.jl#L149-L151
worse than the function itself?

@devmotion
Member

worse than the function itself?

Not sure what function you mean here and what you expect to be worse/better.

The main issue is that your benchmarking is flawed: variables etc. have to be interpolated, since otherwise you suffer, sometimes massively, from type instabilities and inference issues introduced by global variables.

So instead you should perform benchmarks such as

julia> using KernelFunctions, BenchmarkTools

julia> Xc = ColVecs(randn(2, 2000));

julia> k = GaussianKernel();

julia> @btime kernelmatrix($k, $Xc);
  38.594 ms (5 allocations: 61.05 MiB)

julia> @btime kernelmatrix($k, $Xc);
  35.585 ms (5 allocations: 61.05 MiB)

julia> @btime map(x -> KernelFunctions.kappa($k, x), KernelFunctions.pairwise(KernelFunctions.metric($k), $Xc));
  37.478 ms (5 allocations: 61.05 MiB)

julia> @btime map(x -> KernelFunctions.kappa($k, x), KernelFunctions.pairwise(KernelFunctions.metric($k), $Xc));
  33.321 ms (5 allocations: 61.05 MiB)

julia> @btime $k.($Xc, permutedims($Xc));
  45.019 ms (2 allocations: 30.52 MiB)

julia> @btime $k.($Xc, permutedims($Xc));
  45.339 ms (2 allocations: 30.52 MiB)

I mean pairwise(metric(κ), x) = broadcast(metric(κ), x, permutedims(x)).

No, not generally. pairwise for standard distances (such as Euclidean) is implemented in highly optimized ways in Distances (e.g., by exploiting and ensuring symmetry of the distance matrix).
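To illustrate what exploiting symmetry means, here is a sketch that evaluates the distance only for the strict upper triangle and mirrors it. This is not the Distances.jl implementation, only a hand-written illustration; `pairwise_symmetric` and `euclid` are hypothetical names, and the code assumes a symmetric distance with `d(x, x) == 0`.

```julia
# Illustrative only: not how Distances.jl implements `pairwise`.
# Assumes d(a, b) == d(b, a) and d(a, a) == 0.
function pairwise_symmetric(d, xs)
    n = length(xs)
    D = Matrix{Float64}(undef, n, n)
    for j in 1:n
        D[j, j] = 0.0
        for i in 1:(j - 1)
            v = d(xs[i], xs[j])   # one evaluation ...
            D[i, j] = v           # ... fills two entries
            D[j, i] = v
        end
    end
    return D
end

euclid(a, b) = sqrt(sum(abs2, a .- b))

xs = [[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]]
D_sym = pairwise_symmetric(euclid, xs)         # n(n-1)/2 distance evaluations
D_full = euclid.(xs, permutedims(xs))          # n^2 distance evaluations
```

Both results agree, but the symmetric version performs roughly half of the distance evaluations.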

Since memory access is typically the bottleneck, the general definition should be more performant.

Therefore this statement also does not hold generally. If you are concerned about memory allocations, you probably also want to use kernelmatrix! instead of kernelmatrix, which minimizes allocations:

julia> # Continued from above

julia> K = Matrix{Float64}(undef, length(Xc), length(Xc));

julia> @btime kernelmatrix!($K, $k, $Xc);
  25.775 ms (1 allocation: 15.75 KiB)

julia> @btime kernelmatrix!($K, $k, $Xc);
  30.012 ms (1 allocation: 15.75 KiB)

Another disadvantage of broadcasting is that generally it means more work for the compiler (the whole broadcasting machinery is very involved and complicated) and hence increases compilation times.

@FelixBenning
Author

@devmotion ahh 🤦 I only looked into the distances/pairwise.jl file for the pairwise function. I did not know that Distances.jl defines this as well. This is why I hate Julia's `using` import mechanism and the practice of `include`-ing files instead of importing them. You never know where functions are coming from. It is basically like `from module import *` in Python, which everyone dislikes for the same reason.

I guess if pairwise actually uses the symmetry of distances, then I see where the speedup in the isotropic case comes from.

@devmotion
Member

Yes, I try to avoid using XX in packages nowadays and rather use import XX or using XX: f, g, h to make such relations clearer (still convenient to use using XX in the REPL IMO).

@Crown421
Member

Caching the cholesky decomposition

I do not understand this point. This would automatically happen with this implementation too. I mean DiffPt(x, partial=i) is just a special point $x +\partial_i$ which is not in $\mathbb{R}^n$. But it still has an evaluation $y$ and a row in the Cholesky matrix which can be cached. Everything should just work as is with AbstractGPs.jl

My apologies, I had an error in thinking here, I was convinced that an additional matrix would need to be cached, not sure why.

Issues with a wrapper

While DiffWrapper(kernel) may be of type Kernel, its compositions are not obvious. For sums it is fine, since sum and differentiation commute. But for a function transform you do not have

$$\frac{d}{d x_i} k(f(x), f(y)) = (\frac{\partial}{\partial x_i} k) (f(x), f(y))$$

So

DiffWrapper(kernel) ∘ FunctionTransform(f) != DiffWrapper(kernel ∘ FunctionTransform(f))

So if you wanted to treat k=DiffWrapper(SqExponentialKernel()) as "the" mathematical squared exponential kernel

$$k(x,y) = \exp\Bigl(-\frac{(x-y)^2}{2}\Bigr)$$

which is differentiable, then you would expect the behavior of DiffWrapper(kernel ∘ FunctionTransform(f)). So you would have to specialize all the function composition for DiffWrapper. And composition feels like the main selling point of KernelFunctions.jl to me. Of course you could tell people to only use DiffWrapper at the very end. But man is that ugly for zero reason.

Well, not zero reason. There are multiple reasons for using a wrapper in this PR, and therefore it comes down to opinion. It would be easy to define some fallback functions that throw an error in problematic cases, advising users to use the wrapper at the end.

Given that differentiable kernels would not be a core feature, but rather an extension that is loaded together with a compatible autodiff package, any changes in the main part of KernelFunctions should be minimal and should not reduce performance.

Therefore I would personally prefer starting with a wrapper, at least for now, to have the key functionality available and see additional issues during use. For example, I have already wondered:

  1. How DiffPt should be treated in combination with "normal" points. You mention above mixing the two, but what should, for example, vcat(ColVecs(X), DiffPt(x, partial=1)) look like? We get performance benefits from storing points as columns/rows of a matrix of concrete types (i.e. Matrix{Float64}). Where do we put the partial "annotation" of a DiffPt? One option could be to define new types and a load of convenience functions to make it seamless to combine them with existing ones.

  2. How does DiffPt combine with MOInputs?

  3. For MOInputs there is a prepare_isotopic_multi_output_data method, should there be something similar for DiffPts?

For me these are important usability questions, with a much higher "ugliness" potential than where one can put a wrapper. During a normal session, I manipulate a lot of inputs and input collections, but only define a GP/ kernel once.

@FelixBenning
Author

FelixBenning commented May 23, 2023

@Crown421

Well, not zero reason. There are multiple reasons for using a wrapper in this PR, and therefore it comes down to opinion. It would be easy to define some fallback functions that throw an error in problematic cases, advising users to use the wrapper at the end.

I am starting to agree, given that I can not come up with a good solution to the kernelmatrix problem at the moment.

How DiffPt should be treated in combination with "normal" points. You mention above mixing the two, but what should, for example, vcat(ColVecs(X), DiffPt(x, partial=1)) look like? We get performance benefits from storing points as columns/rows of a matrix of concrete types (i.e. Matrix{Float64}). Where do we put the partial "annotation" of a DiffPt? One option could be to define new types and a load of convenience functions to make it seamless to combine them with existing ones.

That is something I am currently thinking about a lot. I would think that custom composite types would be a good idea. Storing

(x, 2)
(x, n)
(y, 1)
...
(y, n)

could be replaced and emulated by some sort of dictionary

x => [2, n]
y => 1:n

The advantage is that you could specialize on index ranges to take more than one partial derivative (and use reverse-mode differentiation to get the entire gradient).

But you would still need the ability to interleave points

(x,1)
(y,2)
(x,2)

and I am not yet sure how to fix the abstract order of the points.

Basically what should probably happen is something akin to an SQL join:

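TABLE: Entries

| ID | PosID | Partial1 | Partial2 |
|---|---|---|---|
| 1 | 1 | NULL | NULL |
| 2 | 1 | 1 | 2 |
| 3 | 2 | 2 | NULL |
| ... | | | |

TABLE: Positions

| ID | Coord1 | Coord2 | Coord3 |
|---|---|---|---|
| 1 | 0.04 | 1.34 | 2.6 |
| 2 | 42.7 | 1.0 | 3.4 |
| 3 | 2.1 | 0.3 | 4.5 |
| ... | | | |

A left join on (Entries, Positions) would then result in the theoretical list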

[
    DiffPt(pos1, ()),
    DiffPt(pos1, (1,2)),
    DiffPt(pos2, (2,)),
     ...
]

But now I don't have the ranges yet...
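The join idea can be sketched in a few lines. Everything here is hypothetical and only mirrors the two tables from this comment: the positional `DiffPt(pos, partial)` struct, the `positions` vector, and the `entries` vector of `(position id, partial)` pairs are stand-ins for illustration, not the PR's types.

```julia
# Hypothetical stand-in for the PR's DiffPt, with a positional constructor.
struct DiffPt{T,P<:Tuple}
    pos::T
    partial::P
end

# "Positions" table: each row is stored once.
positions = [[0.04, 1.34, 2.6], [42.7, 1.0, 3.4], [2.1, 0.3, 4.5]]

# "Entries" table: (position id, partial directions); () means no derivative.
entries = [(1, ()), (1, (1, 2)), (2, (2,))]

# The left join: look up each entry's position and attach its partials.
# The DiffPts share the position vectors by reference, so nothing is copied.
points = [DiffPt(positions[pos_id], partial) for (pos_id, partial) in entries]
```

This keeps the coordinates in one concretely typed collection while the ordering of evaluation points lives in `entries`. It does not yet address specializing on ranges of partials.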
