Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Metric space #68

Merged
merged 38 commits into from
Apr 12, 2021
Merged

Conversation

redhog
Copy link
Collaborator

@redhog redhog commented Dec 17, 2020

First steps towards extracting all distance metrics representation into a separate object as outlined in #67

@redhog redhog mentioned this pull request Dec 17, 2020
@redhog
Copy link
Collaborator Author

redhog commented Dec 17, 2020

Hi @mmaelicke !

Please take a look. There is one issue with this code: I'm not sure how to serialize the parameters of a harmonized model to the describe() output in such a way that the model function could be recreated in Kriging. I haven't worked with harmonized models, so I'm not sure how it's supposed to work. Any thoughts?

harmonize = False
if not callable(model):
if model == "harmonize":
# FIXME: How do we serialize a harmonized model in describe()?
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where I don't really know what to do...

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will have a look at it as soon as possible.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @redhog, I don't find the time right now to solve this (lockdown here, kids at home etc.)...
Would it be a possible pathway for you to not include the parameters of harmonized models for now? Then, the Kriging class would raise a NotImplementedError if model params are missing. Then, we could merge this PR and open an issue concerning the harmonized model, to keep it on the agenda.
I just don't want to hold you up any longer, sry about that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't need harmonized models myself, so for me that'd work. However, it would break the workflow for anyone who do. If you think that's not too big a deal, we could merge anyway. Alternatively I guess I could add a hack workaround: Copy the model function (not just name) here:

variogram = variogram.describe()

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I would go for hacky workaround along with an issue opened to fix it at any later point in time.

@redhog
Copy link
Collaborator Author

redhog commented Apr 8, 2021

Heyas @mmaelicke!
Just wanted to let you know I'm working on this again, and have fixed the harmonized model problem.
I also updated the Variogram model to use MetricSpace, so that when you set maxlag it can take advantage of this to not consider all pairs.
I do still have some faulty unittests, mostly because they assume coords is a ndarray, not a MetricSpace.
Btw, this is the shape of my current test dataset:

                   X             Y
count   16887.000000  1.688700e+04
mean   105876.886644  1.256407e+06
std       162.765830  2.139842e+02
min    105374.220678  1.256050e+06
25%    105757.755000  1.256232e+06
50%    105904.848719  1.256376e+06
75%    106002.500000  1.256596e+06
max    106212.660078  1.256800e+06

And I use a maxlag of 50... Without that, it eats > 4Gb RAM to generate the variogram. This is a smallish dataset among the ones we're working with. Just to give you an idea about the scale I'm trying to handle.

@mmaelicke
Copy link
Owner

Hey, thank man! Looks nice!
There is a lot of code outside of skgstat that actually assumes coordinates to be a ndarray. Maybe it's a good idea to store the MetricSpace as a new attribute and let Variogram.coordinates return the coords just like before, to not break existing code.
It should be a property function anyway, so implementing it like this should be straightforward.
I can have a more detailed look on the code, if you want me to. I just quickly scanned the changed files, to be honest.

@redhog
Copy link
Collaborator Author

redhog commented Apr 9, 2021

Variogram.coordinates is actually already a ndarray (it's a property that points to self._X.coords)

@codecov
Copy link

codecov bot commented Apr 12, 2021

Codecov Report

Merging #68 (aaefd81) into master (ef963b8) will decrease coverage by 1.62%.
The diff coverage is 79.77%.

❗ Current head aaefd81 differs from pull request most recent head 544157e. Consider uploading reports for the commit 544157e to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##           master      #68      +/-   ##
==========================================
- Coverage   90.72%   89.10%   -1.63%     
==========================================
  Files          14       16       +2     
  Lines        1650     1808     +158     
==========================================
+ Hits         1497     1611     +114     
- Misses        153      197      +44     
Impacted Files Coverage Δ
skgstat/VariogramResult.py 38.29% <38.29%> (ø)
skgstat/DirectionalVariogram.py 91.09% <81.81%> (+0.12%) ⬆️
skgstat/MetricSpace.py 85.71% <85.71%> (ø)
skgstat/Kriging.py 85.53% <86.48%> (+1.75%) ⬆️
skgstat/Variogram.py 95.35% <92.13%> (-1.07%) ⬇️
skgstat/SpaceTimeVariogram.py 83.03% <100.00%> (+0.07%) ⬆️
skgstat/__init__.py 100.00% <100.00%> (ø)
skgstat/stmodels.py 100.00% <100.00%> (ø)
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ef963b8...544157e. Read the comment docs.

@redhog
Copy link
Collaborator Author

redhog commented Apr 12, 2021

@mmaelicke

Heyas! I think this is ready for merging now. Added some unit tests.
In general for testing Variogram(), it's good to have test data that is actually spatially correlated. I hacked that together for testing maxlag, but it's a bit of a hack...

@mmaelicke
Copy link
Owner

mmaelicke commented Apr 12, 2021

Nice, I'll have a look at it somewhere this week.
We're still losing some test coverage. But yeah I agree, there should be some tests that use data that is actually spatial correlated.
BTW., I merged a new property today, that returns the spatial dimensionality, which is right now relying on _X. That will need fixing after this merge, as well.
@redhog, did you check if the examples in the docs are still compiling? There is a lot of quite old code in there, possibly some of the examples are using _X directly and need to call Variogram.coordinates to get the ndarray of coordinates. That would be a useful update anyway. This is something that I can do myself..

EDIT: I just had really quickly scanned the changes and it looks like you took care of all that already. Really nice!
About the unittests: You could close this PR and reopen onto a new branch, also called metric-space within this repo, cause then I can add some tests myself and you don't have to do all the work. Second: The CI is running pytest nowadays. That might make it a bit easier to add more complicated tests, than unittest itself. Meaning, we can add unittests or a pytest compatible script.

@redhog redhog changed the base branch from master to metric-space April 12, 2021 19:15
@redhog redhog merged commit 6d61e49 into mmaelicke:metric-space Apr 12, 2021
@redhog redhog mentioned this pull request Apr 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants