Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restructuring for optimization #67

Closed
redhog opened this issue Dec 2, 2020 · 8 comments
Closed

Restructuring for optimization #67

redhog opened this issue Dec 2, 2020 · 8 comments

Comments

@redhog
Copy link
Collaborator

redhog commented Dec 2, 2020

I've been thinking a bit about the class structure and how things could/should be structured to make more optimizations possible, esp in the case where you have multiple variograms for different measurements at the same positions:

This is not intended as a "let's hack this now", but as a discussion starter to see what you think, and maybe there's a totally different better way to structure the API.

We could have a class that models a set of positions together with their distance metric:

pos = MetricSpace(coordinates, metric="euclidean")
new_pos = MetricSpacePair(pos, new_coordinates)

v1 = Variogram(pos, values1, ...)
ok1 = OrdinaryKriging(pos, values, v1.describe())
v2 = Variogram(pos, values2, ...)
ok2 = OrdinaryKriging(pos, values, v2.describe())
...

interp_values1 = ok1.transform(new_pos)
interp_values1 = ok2.transform(new_pos)
...

The idea here is that MetricSpace calculates the (possibly sparse) distance matrix once of all points in coordinates to all other points in coordinates, regardless of how many variograms or krigings you use it for, and MetricSpacePair does the same for the distance matrix of all points in coordinates and all points in new_coordinates.

@mmaelicke
Copy link
Owner

Thanks for the suggestions.

So far I didn't put too much effort and thinking into the kriging class, so improvements are welcome. I think if we design the MetricSpacePair class wisely, the current code is not too far away from integrating the behavior you describe above. Given that the OrdinaryKriging class is refactored on input attributes as described in #63 , it's only the OrdinaryKriging.transform method that needs some rework. Right now, it needs N 1D-arrays, one for each coordinate dimension.
So either the transform method signature is changed to also accept a MetricSpacePair, or the MetricSpacePair needs a function to export the new positions in the correct format.
The advantage of accepting a MetricSpacePair instance would be, that OrdinaryKriging could pass around an internal flag that it received a MetricSpacePair instance and prevents itself from re-calculating distances, as MetricSpacePair would hold a distance matrix for the new positions, right?

That would be cool, as it is a real improvement, but OrdinaryKriging would still accept a single variogram and calculate the distance matrix itself, if no MetricSpacePair is given.


A few more thoughts that might be interesting here:

  1. In the past, there was always the idea to write a proper interface between skgstat.Variogram and pykrige.

an experimental interface function is here:

def pykrige_model(variogram):

I am not sure if this is still compatible with the latest pykrige version. Since pykrige was moved to the GeoStat Framework and saw some major improvements since.

Whatever route we take here, it would be great if any new classes/interfaces for kriging would at least not close that door.

  1. I had always in mind to wrap sklean.Estimator classes (https://scikit-learn.org/stable/developers/develop.html) around the skgstat.Variogram and skgstat.OrdinaryKriging. Main reasons were to integrate scikit-gstat into existing workflows with sklean.Pipeline and make use of the GridSearch from sklearn.

The experimental interface can be found here:

class VariogramEstimator(BaseEstimator):

Maybe it is possible to integrate / realize the workflow you have in mind by using the sklearn estimator APIs?

As a note: With @MuellerSeb from Geostat framework I discussed in the past if this would also be a feasible way to 'export' skgstat.Variogram to gstools to make use of their great random field generation stuff. And I think we agreed on at least not dropping this idea completely... :)

@redhog
Copy link
Collaborator Author

redhog commented Dec 2, 2020

For the sklearn estimator, shouldn't we have a class

vk = VariogramKriging(params)
vk.fit(coordinates, values)
new_values = vk.predict(new_coordinates)

That's a quite different interface, but could be provided as a wrapper around a Variogram and an OrdinaryKriging instance?
In that case we can easily create MetricSpace/MetricSpacePair instances for the user "under the hood"...

@mmaelicke
Copy link
Owner

That's also a nice idea. Then we would have variogram and kriging in one step.

I always struggled with turning Variogram itself into a proper estimator, as one of the most fundamental design decisions had always been, that "there is no variogram without data sample", which is kind of the logic behind an sklearn estimator. First you give the hyper-parameters, fit to data to obtain the estimated parameters...

But at the end of the day we could have both. Variogram, Kriging and MetricSpace along with sklearn.Estimator wrapper(s). The sklearn estimators are then interfaces to other python packages and would thus justify the different design. That's how I would argue.

I still would love the idea of having an estimator around Variogram and Kriging and then VariogramKriging could be an Pipeline... On the other hand: combining both, variogram and kriging into a single estimator will be easier to implement...

not sure.

@MuellerSeb
Copy link
Collaborator

Hey there,

my two cents on this topic:
GSTools provides solid kriging routines (running in parallel in cython) with a lot of options. See this base class:
https://geostat-framework.readthedocs.io/projects/gstools/en/latest/generated/gstools.krige.Krige.html#gstools.krige.Krige

PyKrige will serve as the place for fancy kriging stuff in the future relying on the GSTools implementations. So, more interfaces to sklearn (like these: GeoStat-Framework/PyKrige#165, GeoStat-Framework/PyKrige#158), and more stuff like CoKriging, Local kriging (with searchradius), spatio temporal kriging all implemented in cython (to be fast and use OpenMP).

So it could be a good idea to join forces here. GSTools will bring a new release in the next weeks introducing these new kriging routines, directional variogram estimation, and latlon support along with examples. For the next year I plan to work on PyKrige as described above.

I keept the possibility to define GSTools CovModels by setting the variogram method to keep the interface to scikit-gstat alive for now as discussed here: GeoStat-Framework/GSTools#90. But I can imagine better integration.

Cheers, Sebastian

@mmaelicke
Copy link
Owner

Hey Sebastian,
Thanks for your input. I am really looking forward to see the updates in action.

I will have a look at the new stuff hopefully sometimes soon. For us, it will be interesting to decide, wether the skgstat.Variogram will interface with gstools.CovModel or a pykrige Kriging class directly.

I believe at some point in the discussion in gstools I started realizing that interfacing CovModel with Variogram directly is not as straightforward as I first thought. The main challenge is that the two classes are designed very different in terms of their data flow and usage. Also, by design, there are no skgstat.Variogram instances that have not been fitted to a sample, as that happens on instantiation. This makes it quite complicated to create a CovModel from a skgstat.Variogram.

But I personally understand this difference as an advantage, because flavors are different and there are cases in exploratory data analysis where I personally choose skgstat and (many) cases in which I am into random field creation and analysis, which can only be done with gstools.

I believe that I didn't understand the interplay of the rescale, len_scale and integral_scale parameters introduced in GeoStat-Framework/GSTools#109 so far and especially how to map them into the skgstat language. I just need the time to look into that. Then a proper export function from skgstat.Variogram to gstools can be implemented and RFs and Kriging can be done with the Geostat-Framework.
Nevertheless, skgstat has the OrdinaryKriging class already implemented and I think it's worth it keeping this class (and improving it). Otherwise, skgstat would be kind of half-baked.
This connection between skgstat and gstools is something that I have on my agenda, but I still don't find the time necessary to do that properly.

But again, thanks for your always-welcome comments, Sebastian!

Best, Mirko

@MuellerSeb
Copy link
Collaborator

MuellerSeb commented Dec 2, 2020

Given that almost all variogram models of scikit-gstat are present in GSTools, I think the best way of an interface would be to map these models onto each other. rescale is especially useful for this task, since there are some minor differences in the model definitions.

For example scikit-gstats gaussian model is defined as: exp(-(r/(l/2))^2), so the rescale factor needs to be 2 (That is the whole point of this factor). I choose the rescale factor to be sqrt(pi/2) so integral scale and length-scale coincide (often done in literature). Integral scale is a model-agnostic representation of a length-scale to gain comparability, since the length-scale can be anything (it is the integral of the correlation function to be precise).

Only the cubic model is missing AFAICT. I can add it if you want ;-)

@mmaelicke
Copy link
Owner

Well, a mapping of the models might make sense anyway, but why would one not use gstools directly then.

I think, from my point of view, the interesting part would be if skgstat can be used instead of gstools.vario_estimate... procedure in more uncommon situations (as an added alternative, not a replacement). Here, the uneven or custom binning, the different and customizable fitting parameterization, the different and customizable semivariance estimators and so on, that skgstat has, can be helpful. What I always had in mind was to use skgstat as an alternative to produce your fitted models and then use it along with all the other great tools gstools has.
In case one wants to use gstools for estimating and fitting a covariance function or variogram in a way that gstools already offers, my advise would be to use gstools and not skgstat, right?
From my point of view, if that works good and there is some way of how a skgstat Variogram can be used in pykrige or gstools kriging, utilizing which interface or whatsoever, then skgstat could add some concise value to the geostatistical-python-universe.

@redhog
Copy link
Collaborator Author

redhog commented Apr 20, 2021

@redhog redhog closed this as completed Apr 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants