-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Empirical Variogram: Update and Refactoring #106
Conversation
* implemented a prototype for the unstructured function with angles. * minor fix with variable name * added docstring and arguments for angle estimation * added a working version which tests the basic functionality * docstring adaption * changed for automatic testcase data generation and split up the test cases * changed default angle tolerance to 25deg * added option to also return the counts (number of pairs) from unstructured * implemented a meaningful test for 2d variogram estimation * bugfix for 3d case when elevation is 90° or 270° * implemented some basic 3d test cases * vario: cleanup cython routines; use greate-circle for tolerance in 3D; check both directions between point pairs * vario: doc update; correct intervals for angles; general formatting of angles array * vario: better handling of angle ranges * vario: fix wrong assumption about hemisphere for angles Co-authored-by: MuellerSeb <[email protected]>
…es (pos array; nD dist; angle checks)
…ariant to vario_estimate
…eter to provide an external mask
@LSchueler one question: should a mask, that masks all values raise an ValueError, our should we return a 0 valued variogram and 0 counts for each bin? ATM it raises an ValueError. EDIT: in order to make it consistent (if field is a masked array and the mask there is entirely True, a 0 valued variogram is returned), I removed the raised Error and return a 0 valued variogram without further calculations. |
if not (isnan(f[m,k]) or isnan(f[m,j])): | ||
counts[d, i] += 1 | ||
variogram[d, i] += estimator_func(f[m,k] - f[m,j]) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One could argue to incorporate a break
here (for d
), since if a point pair matches one direction, it (maybe) shouldn't be used in another direction as well. This only happens if the angels_tol is big enough, so two directions have an overlapping search area. @LSchueler your opinion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Every hardcore Bayesian would probably cry "heresy!", but maybe it is exactly the intention of someone using a large tolerance to do a preliminary test without any knowledge about the main directions of the field. Then, I guess it would be helpful to use as many data points as possible, even if they overlap and are redundant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For speed up reasons we could check all angles between directions and if their minimum is greater than 2 * angels_tol
, we could break ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed that by checking if the directions are separated. If the search bands overlap, point pairs are counted repeatedly.
With what would this be consistent? - I'm not sure how meaningful a variogram of a completely invalid field is... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once again a tremendous effort! Thanks for that.
I think I like the renaming of the estimation functions, but it could be that users get even more confused about the struct, unstruct stuff, let's see.
I hope you don't mind that I pushed some small typo-fixes directly onto this branch.
I think if you update the changelog, I'd be happy about a merge.
vec[:, i] *= np.cos(angles[:, (i - 1)]) | ||
if dim in [2, 3]: | ||
vec[:, [0, 1]] = vec[:, [1, 0]] # to match convention in 2D and 3D | ||
return vec |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope I will never have to fix a bug in here ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I said: rotation in higher dimensions (>1) is a pain in the neck!
if not (isnan(f[m,k]) or isnan(f[m,j])): | ||
counts[d, i] += 1 | ||
variogram[d, i] += estimator_func(f[m,k] - f[m,j]) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Every hardcore Bayesian would probably cry "heresy!", but maybe it is exactly the intention of someone using a large tolerance to do a preliminary test without any knowledge about the main directions of the field. Then, I guess it would be helpful to use as many data points as possible, even if they overlap and are redundant.
If the field is a masked array and the mask there is entirely True, a 0 valued variogram is returned. If an external mask was given that is enterly true with an ordinary field, an Error was raised. That was the inconsistency. The now returned variogram is constantly 0 and the counts are 0 as well. I think this is meaningful, since it says: No data! |
Thanks ❤️
I think it is now in line with all other routines, since the
Of course not. My english is not se goodest.
I'll do a changelog later in develop, so I can mention the PRs. |
…es in cython; some code refactoring
This PR reworks the whole variogram estimation subpackage:
vario_estimate
instead ofvario_estimate_unstructured
(old kept for legacy code) for simplicityvario_estimate_axis
instead ofvario_estimate_structured
(old kept for legacy code) for simplicityvario_estimate
no_data
option added to allow missing valuesmask
keyword was added to provide an external maskvario_estimate_axis
no_data
option added to allow missing values (sovles vario_estimate_structured returns only nan if nan points in the field #83)"x"
,"y"
,"z"
) or axis number (0
,1
,2
,3
, ...)Thanks to @TobiasGlaubach for starting this (#87) and to @LSchueler for providing the first implementation of the new distance calculation in cython (#82). This also re-implements #104, which makes to original PR obsolete.