Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cKDTree optimization #62

Closed

Conversation

redhog
Copy link
Collaborator

@redhog redhog commented Nov 25, 2020

Closes #58

@redhog redhog force-pushed the ckdtree-optimization branch from 6af812c to 9a9466c Compare November 25, 2020 17:54
@@ -389,16 +395,23 @@ def _krige(self, idx):
dists = self.transform_dists[idx,:]

# find all points within the search distance
idx = np.where(dists <= self.range)[0]
if isinstance(dists, scipy.sparse.spmatrix):
idx = np.array([k[1] for k in dists.keys()])
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For non-sparse datasets this might actually be a performance bottleneck, and we should use toarray() or somesuch solution to speed it up.

@redhog
Copy link
Collaborator Author

redhog commented Dec 15, 2020

Heyas @mmaelicke ! Would you have time to review this?

Note: There is a new flag "sparse" added. If set to false, all distances are calculated using pdist, which is the fastest way, at least for semi-small datasets, but quickly eats a lot of ram (N*M, where N is number of points kriged to and M is number of points kriged from). If set to true, distances are calculated using ckDTree only for points withing range, and stored in a sparse matrix. This takes considerably less storage, but it slightly slower for lookup, so for smaller datasets this is a disadvantage. For larger datasets this saves your machine from swapping, which would quickly lower your performance.

@mmaelicke
Copy link
Owner

@redhog , awesome, thank a lot!
Yeah, I will review it. I'll do my very best to do it this evening.

@mmaelicke mmaelicke self-requested a review December 15, 2020 09:27
@redhog
Copy link
Collaborator Author

redhog commented Dec 15, 2020

Maybe the default for sparse needs to be False... Hm...

Copy link
Owner

@mmaelicke mmaelicke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, thanks a lot!
I think, we need to sort the default value for sparseout, then I'm completely fine with it.

skgstat/Kriging.py Outdated Show resolved Hide resolved
skgstat/Kriging.py Outdated Show resolved Hide resolved
selected_dists = dists[0, idx].toarray()[0,:]
else:
selected_dists = dists[idx]
sorted_idx = np.argsort(selected_dists, kind="stable")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just out of curiosity: why stable sort here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because without that I couldn't do a regression test that gave the same result with sparse/non-sparse (when point coords coincided exactly for some pair of points, which some did in my test dataset)..

@redhog
Copy link
Collaborator Author

redhog commented Dec 17, 2020

This is superseded by #68 which implements the same feature, but in a separate class that can also be used by Variogram.

redhog pushed a commit to emerald-geomodelling/upstream-scikit-gstat that referenced this pull request Dec 17, 2020
@mmaelicke
Copy link
Owner

This is superseded by #68 which implements the same feature, but in a separate class that can also be used by Variogram.

So, you want to only merge #68 instead of this PR, or does #68 just replace the utility function after this PR was merged?

@redhog
Copy link
Collaborator Author

redhog commented Dec 17, 2020

Only merge the other one. Sorry for coding faster than I talked to you...

@redhog redhog closed this Dec 17, 2020
@redhog
Copy link
Collaborator Author

redhog commented Dec 17, 2020

I had the other one in parallel, but didn't want to push it due to a bug that I have now squished...

@redhog redhog reopened this Dec 17, 2020
@redhog redhog closed this Dec 17, 2020
@mmaelicke
Copy link
Owner

NP. I will just wait until you want me to review or merge something. Or both. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimize distance function in kriging
2 participants