Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Empirical Variogram: Update and Refactoring #106

Merged
merged 45 commits into from
Nov 11, 2020
Merged

Conversation

MuellerSeb
Copy link
Member

@MuellerSeb MuellerSeb commented Nov 7, 2020

This PR reworks the whole variogram estimation subpackage:

  • new routine name vario_estimate instead of vario_estimate_unstructured (old kept for legacy code) for simplicity
  • new routine name vario_estimate_axis instead of vario_estimate_structured (old kept for legacy code) for simplicity
  • vario_estimate
    • allow to pass multiple fields for joint variogram estimation (e.g. for daily precipitation) on same mesh
    • no_data option added to allow missing values
    • masked fields
      • user can now pass a masked array (or a list of masked arrays) to deselect data points.
      • in addition, a mask keyword was added to provide an external mask
    • directional variograms
      • diretional variograms can now be estimated
      • either provide a list of directions or angles for directions (spherical coordinates)
      • can be controlled by given angle tolerance and (optional) bandwidth
      • prepared for nD
    • an structured field (pos tuple describes axes) can now be passed to estimate an isotropic or directional variogram
    • distance calculation in cython routines in now independent of dimension
  • vario_estimate_axis

Thanks to @TobiasGlaubach for starting this (#87) and to @LSchueler for providing the first implementation of the new distance calculation in cython (#82). This also re-implements #104, which makes to original PR obsolete.

TobiasGlaubach and others added 12 commits November 6, 2020 16:35
* implemented a prototype for the unstructured function with angles.

* minor fix with variable name

* added docstring and arguments for angle estimation

* added a working version which tests the basic functionality

* docstring adaption

* changed for automatic testcase data generation and split up the test cases

* changed default angle tolerance to 25deg

* added option to also return the counts (number of pairs) from unstructured

* implemented a meaningful test for 2d variogram estimation

* bugfix for 3d case when elevation is 90° or 270°

* implemented some basic 3d test cases

* vario: cleanup cython routines; use greate-circle for tolerance in 3D; check both directions between point pairs

* vario: doc update; correct intervals for angles; general formatting of angles array

* vario: better handling of angle ranges

* vario: fix wrong assumption about hemisphere for angles

Co-authored-by: MuellerSeb <[email protected]>
@MuellerSeb MuellerSeb added enhancement New feature or request Documentation Performance Performance related stuff. labels Nov 7, 2020
@MuellerSeb MuellerSeb added this to the 1.3 milestone Nov 7, 2020
@MuellerSeb MuellerSeb requested a review from LSchueler November 7, 2020 15:47
@MuellerSeb MuellerSeb self-assigned this Nov 7, 2020
@MuellerSeb MuellerSeb linked an issue Nov 8, 2020 that may be closed by this pull request
@MuellerSeb
Copy link
Member Author

MuellerSeb commented Nov 9, 2020

@LSchueler one question: should a mask, that masks all values raise an ValueError, our should we return a 0 valued variogram and 0 counts for each bin? ATM it raises an ValueError.

EDIT: in order to make it consistent (if field is a masked array and the mask there is entirely True, a 0 valued variogram is returned), I removed the raised Error and return a 0 valued variogram without further calculations.

if not (isnan(f[m,k]) or isnan(f[m,j])):
counts[d, i] += 1
variogram[d, i] += estimator_func(f[m,k] - f[m,j])

Copy link
Member Author

@MuellerSeb MuellerSeb Nov 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One could argue to incorporate a break here (for d), since if a point pair matches one direction, it (maybe) shouldn't be used in another direction as well. This only happens if the angels_tol is big enough, so two directions have an overlapping search area. @LSchueler your opinion?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every hardcore Bayesian would probably cry "heresy!", but maybe it is exactly the intention of someone using a large tolerance to do a preliminary test without any knowledge about the main directions of the field. Then, I guess it would be helpful to use as many data points as possible, even if they overlap and are redundant.

Copy link
Member Author

@MuellerSeb MuellerSeb Nov 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For speed up reasons we could check all angles between directions and if their minimum is greater than 2 * angels_tol, we could break ;-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed that by checking if the directions are separated. If the search bands overlap, point pairs are counted repeatedly.

@LSchueler
Copy link
Member

@LSchueler one question: should a mask, that masks all values raise an ValueError, our should we return a 0 valued variogram and 0 counts for each bin? ATM it raises an ValueError.

EDIT: in order to make it consistent (if field is a masked array and the mask there is entirely True, a 0 valued variogram is returned), I removed the raised Error and return a 0 valued variogram without further calculations.

With what would this be consistent? - I'm not sure how meaningful a variogram of a completely invalid field is...

Copy link
Member

@LSchueler LSchueler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once again a tremendous effort! Thanks for that.
I think I like the renaming of the estimation functions, but it could be that users get even more confused about the struct, unstruct stuff, let's see.

I hope you don't mind that I pushed some small typo-fixes directly onto this branch.

I think if you update the changelog, I'd be happy about a merge.

vec[:, i] *= np.cos(angles[:, (i - 1)])
if dim in [2, 3]:
vec[:, [0, 1]] = vec[:, [1, 0]] # to match convention in 2D and 3D
return vec
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope I will never have to fix a bug in here ;-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I said: rotation in higher dimensions (>1) is a pain in the neck!

if not (isnan(f[m,k]) or isnan(f[m,j])):
counts[d, i] += 1
variogram[d, i] += estimator_func(f[m,k] - f[m,j])

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every hardcore Bayesian would probably cry "heresy!", but maybe it is exactly the intention of someone using a large tolerance to do a preliminary test without any knowledge about the main directions of the field. Then, I guess it would be helpful to use as many data points as possible, even if they overlap and are redundant.

@MuellerSeb
Copy link
Member Author

@LSchueler one question: should a mask, that masks all values raise an ValueError, our should we return a 0 valued variogram and 0 counts for each bin? ATM it raises an ValueError.
EDIT: in order to make it consistent (if field is a masked array and the mask there is entirely True, a 0 valued variogram is returned), I removed the raised Error and return a 0 valued variogram without further calculations.

With what would this be consistent? - I'm not sure how meaningful a variogram of a completely invalid field is...

If the field is a masked array and the mask there is entirely True, a 0 valued variogram is returned. If an external mask was given that is enterly true with an ordinary field, an Error was raised.

That was the inconsistency. The now returned variogram is constantly 0 and the counts are 0 as well. I think this is meaningful, since it says: No data!

@MuellerSeb
Copy link
Member Author

Once again a tremendous effort! Thanks for that.

Thanks ❤️

I think I like the renaming of the estimation functions, but it could be that users get even more confused about the struct, unstruct stuff, let's see.

I think it is now in line with all other routines, since the mesh_type arguments was added and you can feed struct and unstruct fields to vario_estimate. The vario_estimate_axis was originally not for structured fields (rectilinear grids in our case) but for "structured points" (in the sence of VTK), where every axis is equidistant. So I think with the axis hint (like axis of numpy arrays), the intention of this routine is clearer now.

I hope you don't mind that I pushed some small typo-fixes directly onto this branch.

Of course not. My english is not se goodest.

I think if you update the changelog, I'd be happy about a merge.

I'll do a changelog later in develop, so I can mention the PRs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Documentation enhancement New feature or request Performance Performance related stuff.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

vario_estimate_structured returns only nan if nan points in the field
3 participants