cKDTree optimization #62

redhog · 2020-11-25T17:52:11Z

Closes #58

redhog · 2020-11-25T17:57:08Z

skgstat/Kriging.py

@@ -389,16 +395,23 @@ def _krige(self, idx):
        dists = self.transform_dists[idx,:]

        # find all points within the search distance
-        idx = np.where(dists <= self.range)[0]
+        if isinstance(dists, scipy.sparse.spmatrix):
+            idx = np.array([k[1] for k in dists.keys()])


For non-sparse datasets this might actually be a performance bottleneck, and we should use toarray() or somesuch solution to speed it up.
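As a rough illustration of the concern above, the sketch below reads the neighbour indices from a sparse row without a Python-level loop over `keys()`. The helper name `indices_within_range` is hypothetical and not part of the PR. It assumes the sparse row only stores distances that are already within the search range.

```python
import numpy as np
import scipy.sparse


def indices_within_range(dists, search_range):
    """Column indices of the points within search_range of one target point.

    `dists` is either a dense 1-D array of distances, or a 1 x M sparse row
    that stores only the distances already known to be within range
    (an assumption of this sketch).
    """
    if scipy.sparse.issparse(dists):
        # CSR keeps the column index of every stored entry in `.indices`,
        # so the indices come out as one array instead of a loop over keys.
        return np.sort(dists.tocsr().indices)
    # Dense case: the comparison from the original code.
    return np.where(dists <= search_range)[0]


dense_row = np.array([0.5, 3.0, 1.2, 7.0])

sparse_row = scipy.sparse.dok_matrix((1, 4))
sparse_row[0, 0] = 0.5
sparse_row[0, 2] = 1.2

print(indices_within_range(dense_row, 2.0))
print(indices_within_range(sparse_row, 2.0))
```

Both calls select the same two points here, which is the property a sparse/non-sparse regression test would rely on.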

redhog · 2020-12-15T08:29:22Z

Heyas @mmaelicke ! Would you have time to review this?

Note: There is a new flag "sparse". If set to false, all distances are calculated using pdist, which is the fastest way, at least for semi-small datasets, but it quickly eats a lot of RAM (N*M, where N is the number of points kriged to and M is the number of points kriged from). If set to true, distances are calculated using cKDTree only for points within range, and stored in a sparse matrix. This takes considerably less storage, but it is slightly slower for lookup, so for smaller datasets this is a disadvantage. For larger datasets it saves your machine from swapping, which would quickly lower your performance.
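The trade-off described above can be sketched as follows. This is a minimal illustration, not the code from the PR: the function name `distance_matrix` and its parameters are hypothetical, and `cdist` stands in for the dense path because two separate point sets are involved.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.spatial.distance import cdist


def distance_matrix(targets, sources, max_dist, sparse=False):
    """Distances from N target points to M source points.

    sparse=False: dense N x M array holding every pairwise distance.
    sparse=True:  N x M CSR matrix that stores only pairs no further
                  apart than max_dist.
    """
    if sparse:
        target_tree = cKDTree(targets)
        source_tree = cKDTree(sources)
        # Only pairs within max_dist are computed and stored.
        return target_tree.sparse_distance_matrix(
            source_tree, max_dist, output_type="coo_matrix"
        ).tocsr()
    # Dense path: all N * M distances are held in memory at once.
    return cdist(targets, sources)


rng = np.random.default_rng(0)
targets = rng.random((5, 2))
sources = rng.random((200, 2))

dense = distance_matrix(targets, sources, max_dist=0.2)
sparse = distance_matrix(targets, sources, max_dist=0.2, sparse=True)

print(dense.size)   # number of stored values in the dense matrix
print(sparse.nnz)   # number of stored values in the sparse matrix
```

The sparse matrix stores far fewer values when the search range is small compared to the extent of the data, at the cost of slower element lookup.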

mmaelicke · 2020-12-15T09:26:55Z

@redhog, awesome, thanks a lot!
Yeah, I will review it. I'll do my very best to do it this evening.

redhog · 2020-12-15T10:00:50Z

Maybe the default for sparse needs to be False... Hm...

mmaelicke

Nice work, thanks a lot!
I think we need to sort out the default value for sparse, then I'm completely fine with it.

mmaelicke · 2020-12-16T18:24:16Z

skgstat/Kriging.py

+                selected_dists = dists[0, idx].toarray()[0,:]
+            else:
+                selected_dists = dists[idx]
+            sorted_idx = np.argsort(selected_dists, kind="stable")


just out of curiosity: why stable sort here?

Because without that I couldn't write a regression test that gave the same result for sparse and non-sparse (when point coords coincided exactly for some pair of points, which happened in my test dataset).
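A small sketch of the tie behaviour discussed above, using made-up distances. With `kind="stable"`, equal distances keep their original index order, so two code paths that present the same candidates in the same order will pick the same neighbours. NumPy's default sort kind does not guarantee how ties are ordered.

```python
import numpy as np

# Three points share exactly the same distance (indices 1, 2 and 4),
# as happens when point coordinates coincide.
selected_dists = np.array([2.0, 1.0, 1.0, 0.5, 1.0])

sorted_idx = np.argsort(selected_dists, kind="stable")
print(sorted_idx)  # [3 1 2 4 0]

# Keeping only the two nearest points is now deterministic:
# of the tied points, the one with the lowest index is chosen.
nearest_two = sorted_idx[:2]
print(nearest_two)  # [3 1]
```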

redhog · 2020-12-17T08:31:47Z

This is superseded by #68 which implements the same feature, but in a separate class that can also be used by Variogram.

mmaelicke · 2020-12-17T08:39:35Z

This is superseded by #68 which implements the same feature, but in a separate class that can also be used by Variogram.

So, you want to only merge #68 instead of this PR, or does #68 just replace the utility function after this PR was merged?

redhog · 2020-12-17T08:49:12Z

Only merge the other one. Sorry for coding faster than I talked to you...

redhog · 2020-12-17T08:49:49Z

I had the other one in parallel, but didn't want to push it due to a bug that I have now squished...

mmaelicke · 2020-12-17T08:50:57Z

NP. I will just wait until you want me to review or merge something. Or both. :)

cKDTree based optimization for the euclidean case

9a9466c

redhog force-pushed the ckdtree-optimization branch from 6af812c to 9a9466c Compare November 25, 2020 17:54

Egil added 4 commits December 3, 2020 09:27

Optimized dist_mat calculation

082b930

Bugfix for the explicit zero handling og sparse dok format

e2c051a

Use a faster sparse format

1f8a7a7

Merge branch 'master' of github.com:mmaelicke/scikit-gstat into ckdtr…

3cbf2ff

…ee-optimization

mmaelicke self-requested a review December 15, 2020 09:27

mmaelicke requested changes Dec 16, 2020

Changed the default for sparse

4d37646

redhog pushed a commit to emerald-geomodelling/upstream-scikit-gstat that referenced this pull request Dec 17, 2020

Updated default value for spare to match what was discussed in mmaeli…

dc5366a

…cke#62

redhog closed this Dec 17, 2020

redhog reopened this Dec 17, 2020

redhog closed this Dec 17, 2020
