Empirical Variogram: Update and Refactoring #106

MuellerSeb · 2020-11-07T15:47:10Z

This PR reworks the whole variogram estimation subpackage:

new routine name vario_estimate instead of vario_estimate_unstructured (old kept for legacy code) for simplicity
new routine name vario_estimate_axis instead of vario_estimate_structured (old kept for legacy code) for simplicity
vario_estimate
- multiple fields can now be passed for joint variogram estimation (e.g. for daily precipitation) on the same mesh
- no_data option added to allow missing values
- masked fields
  - user can now pass a masked array (or a list of masked arrays) to deselect data points.
  - in addition, a mask keyword was added to provide an external mask
- directional variograms
  - directional variograms can now be estimated
  - either provide a list of directions or angles for directions (spherical coordinates)
  - can be controlled by given angle tolerance and (optional) bandwidth
  - prepared for nD
- a structured field (pos tuple describes axes) can now be passed to estimate an isotropic or directional variogram
- distance calculation in cython routines is now independent of dimension
vario_estimate_axis
- estimation along array axis now possible in arbitrary dimensions
- no_data option added to allow missing values (solves vario_estimate_structured returns only nan if nan points in the field #83)
- axis can be given by name ("x", "y", "z") or axis number (0, 1, 2, 3, ...)
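
As a rough illustration of the joint estimation with missing values listed above, the following pure-Python sketch bins point pairs by distance and accumulates the Matheron estimator over several fields that share one mesh. The name `joint_variogram` and its signature are hypothetical and are not the GSTools API; NaN is assumed as the no_data marker.

```python
import math


def joint_variogram(pos, fields, bin_edges):
    """Isotropic Matheron estimate shared by several fields on one mesh.

    pos: list of coordinate tuples, fields: list of value lists (one per
    field), bin_edges: increasing distances. NaN marks missing values.
    Hypothetical helper for illustration, not the GSTools implementation.
    """
    n_bins = len(bin_edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for k in range(len(pos)):
        for j in range(k + 1, len(pos)):
            dist = math.dist(pos[k], pos[j])  # works in any dimension
            for i in range(n_bins):
                if bin_edges[i] <= dist < bin_edges[i + 1]:
                    for f in fields:
                        # skip pairs where either value is missing
                        if math.isnan(f[k]) or math.isnan(f[j]):
                            continue
                        counts[i] += 1
                        sums[i] += (f[k] - f[j]) ** 2
                    break  # a distance falls into at most one bin
    # gamma = sum / (2 * count); bins without pairs stay at 0
    gamma = [s / (2 * c) if c > 0 else 0.0 for s, c in zip(sums, counts)]
    return gamma, counts


pos = [(0.0,), (1.0,), (2.0,)]
fields = [[0.0, 1.0, 2.0], [1.0, float("nan"), 3.0]]
gamma, counts = joint_variogram(pos, fields, [0.5, 1.5, 2.5])
```

In this sketch every field contributes its own pairs to the same bins, so a missing value in one field only removes the pairs of that field.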

Thanks to @TobiasGlaubach for starting this (#87) and to @LSchueler for providing the first implementation of the new distance calculation in cython (#82). This also re-implements #104, which makes the original PR obsolete.

* implemented a prototype for the unstructured function with angles.
* minor fix with variable name
* added docstring and arguments for angle estimation
* added a working version which tests the basic functionality
* docstring adaption
* changed for automatic testcase data generation and split up the test cases
* changed default angle tolerance to 25deg
* added option to also return the counts (number of pairs) from unstructured
* implemented a meaningful test for 2d variogram estimation
* bugfix for 3d case when elevation is 90° or 270°
* implemented some basic 3d test cases
* vario: cleanup cython routines; use greate-circle for tolerance in 3D; check both directions between point pairs
* vario: doc update; correct intervals for angles; general formatting of angles array
* vario: better handling of angle ranges
* vario: fix wrong assumption about hemisphere for angles

Co-authored-by: MuellerSeb <[email protected]>

MuellerSeb · 2020-11-09T22:37:21Z

@LSchueler one question: should a mask that masks all values raise a ValueError, or should we return a 0 valued variogram and 0 counts for each bin? ATM it raises a ValueError.

EDIT: in order to make it consistent (if field is a masked array and the mask there is entirely True, a 0 valued variogram is returned), I removed the raised Error and return a 0 valued variogram without further calculations.

MuellerSeb · 2020-11-10T15:41:40Z

gstools/variogram/estimator.pyx

+                                if not (isnan(f[m,k]) or isnan(f[m,j])):
+                                    counts[d, i] += 1
+                                    variogram[d, i] += estimator_func(f[m,k] - f[m,j])
+


One could argue for a break here (for d), since a point pair that matches one direction (maybe) shouldn't be used in another direction as well. This only happens if angles_tol is big enough that two directions have an overlapping search area. @LSchueler your opinion?

Every hardcore Bayesian would probably cry "heresy!", but maybe it is exactly the intention of someone using a large tolerance to do a preliminary test without any knowledge about the main directions of the field. Then, I guess it would be helpful to use as many data points as possible, even if they overlap and are redundant.

To speed things up, we could check all angles between directions and, if their minimum is greater than 2 * angles_tol, break ;-)

Fixed that by checking if the directions are separated. If the search bands overlap, point pairs are counted repeatedly.
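
A minimal sketch of such a separation check, assuming unit direction vectors and ignoring the bandwidth: two search cones with half-angle angles_tol cannot overlap if the angle between their axes exceeds 2 * angles_tol. The name `dirs_are_separated` is hypothetical and not taken from the PR.

```python
import math


def dirs_are_separated(directions, angles_tol):
    """Return True if no two search cones of half-angle angles_tol overlap.

    directions: list of unit vectors (tuples). Directions are treated as
    axes, so v and -v count as the same direction.
    Hypothetical helper; bandwidth is ignored in this sketch.
    """
    for a in range(len(directions) - 1):
        for b in range(a + 1, len(directions)):
            dot = sum(x * y for x, y in zip(directions[a], directions[b]))
            # clip to 1 to guard acos against rounding errors
            angle = math.acos(min(abs(dot), 1.0))
            if angle <= 2 * angles_tol:
                return False
    return True


x_and_y = [(1.0, 0.0), (0.0, 1.0)]
narrow = dirs_are_separated(x_and_y, math.pi / 8)  # 90 deg apart, 22.5 deg tol
wide = dirs_are_separated(x_and_y, math.pi / 3)  # 60 deg tol: cones overlap
```

When the check passes, the loop over directions can stop at the first match; otherwise every direction has to be tested for each point pair.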

LSchueler · 2020-11-10T18:08:05Z

> @LSchueler one question: should a mask that masks all values raise a ValueError, or should we return a 0 valued variogram and 0 counts for each bin? ATM it raises a ValueError.
>
> EDIT: in order to make it consistent (if field is a masked array and the mask there is entirely True, a 0 valued variogram is returned), I removed the raised Error and return a 0 valued variogram without further calculations.

With what would this be consistent? - I'm not sure how meaningful a variogram of a completely invalid field is...

LSchueler

Once again a tremendous effort! Thanks for that.
I think I like the renaming of the estimation functions, but it could be that users get even more confused about the struct, unstruct stuff, let's see.

I hope you don't mind that I pushed some small typo-fixes directly onto this branch.

I think if you update the changelog, I'd be happy about a merge.

LSchueler · 2020-11-10T19:01:01Z

gstools/tools/geometric.py

+        vec[:, i] *= np.cos(angles[:, (i - 1)])
+    if dim in [2, 3]:
+        vec[:, [0, 1]] = vec[:, [1, 0]]  # to match convention in 2D and 3D
+    return vec


I hope I will never have to fix a bug in here ;-)

As I said: rotation in higher dimensions (>1) is a pain in the neck!
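
For orientation, here is a small sketch of the angle-to-direction conversion restricted to 2D and 3D. It assumes one azimuth angle in 2D and azimuth plus inclination (measured from +z) in 3D; that convention and the name `angles_to_direction` are assumptions of this sketch, not a copy of ang2dir.

```python
import math


def angles_to_direction(angles):
    """Convert spherical angles (radians) to a unit vector in 2D or 3D.

    One angle: azimuth in the xy-plane (counter-clockwise from +x).
    Two angles: azimuth and inclination (measured from +z).
    Assumed convention for this sketch; check the library docs before
    relying on it.
    """
    if len(angles) == 1:
        (azimuth,) = angles
        return (math.cos(azimuth), math.sin(azimuth))
    if len(angles) == 2:
        azimuth, inclination = angles
        return (
            math.cos(azimuth) * math.sin(inclination),
            math.sin(azimuth) * math.sin(inclination),
            math.cos(inclination),
        )
    raise ValueError("this sketch only covers 2D and 3D")


vec_2d = angles_to_direction([math.pi / 2])  # points along +y
vec_3d = angles_to_direction([0.0, math.pi / 2])  # points along +x
```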

LSchueler · 2020-11-10T19:11:46Z

Every hardcore Bayesian would probably cry "heresy!", but maybe it is exactly the intention of someone using a large tolerance to do a preliminary test without any knowledge about the main directions of the field. Then, I guess it would be helpful to use as many data points as possible, even if they overlap and are redundant.

MuellerSeb · 2020-11-10T19:36:15Z

> > @LSchueler one question: should a mask that masks all values raise a ValueError, or should we return a 0 valued variogram and 0 counts for each bin? ATM it raises a ValueError.
> > EDIT: in order to make it consistent (if field is a masked array and the mask there is entirely True, a 0 valued variogram is returned), I removed the raised Error and return a 0 valued variogram without further calculations.
>
> With what would this be consistent? - I'm not sure how meaningful a variogram of a completely invalid field is...

If the field is a masked array and the mask there is entirely True, a 0 valued variogram is returned. If an external mask was given that is entirely True with an ordinary field, an Error was raised.

That was the inconsistency. The now returned variogram is constantly 0 and the counts are 0 as well. I think this is meaningful, since it says: No data!
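
The behaviour described here can be sketched for a single bin that collects all point pairs. The function `masked_semivariance` is a hypothetical stand-in, not the GSTools routine: masked or NaN values are dropped first, and if no pair remains the result is 0 with a count of 0.

```python
import math


def masked_semivariance(values, mask):
    """Matheron estimate over all pairs of unmasked, non-NaN values.

    mask[i] is True where values[i] should be ignored.
    Returns (gamma, count); with no valid pair, gamma is 0.0 and count is 0.
    Hypothetical helper using a single bin for all distances.
    """
    valid = [v for v, m in zip(values, mask) if not m and not math.isnan(v)]
    total, count = 0.0, 0
    for k in range(len(valid)):
        for j in range(k + 1, len(valid)):
            total += (valid[k] - valid[j]) ** 2
            count += 1
    # avoid dividing by zero when nothing is left to compare
    gamma = total / (2 * count) if count > 0 else 0.0
    return gamma, count


values = [1.0, 3.0, 5.0]
some = masked_semivariance(values, [False, False, True])
none = masked_semivariance(values, [True, True, True])
```

Returning the count alongside the value lets a caller tell "no data" apart from a genuinely vanishing variogram.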

MuellerSeb · 2020-11-10T19:46:02Z

> Once again a tremendous effort! Thanks for that.

Thanks ❤️

> I think I like the renaming of the estimation functions, but it could be that users get even more confused about the struct, unstruct stuff, let's see.

I think it is now in line with all other routines, since the mesh_type argument was added and you can feed struct and unstruct fields to vario_estimate. The vario_estimate_axis was originally not for structured fields (rectilinear grids in our case) but for "structured points" (in the sense of VTK), where every axis is equidistant. So I think with the axis hint (like axis of numpy arrays), the intention of this routine is clearer now.
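
To make the "structured points" idea concrete, here is a sketch of an estimate along one axis of an equidistant 2D grid, where the lag is simply an index offset. The name `variogram_along_axis` is hypothetical, the grid is a plain list of rows, and NaN is assumed to mark missing values.

```python
import math


def variogram_along_axis(field, axis=0):
    """Matheron estimate per integer lag along one axis of a 2D grid.

    field: list of rows (axis 0 runs over rows, axis 1 over columns),
    equidistant spacing assumed; NaN marks missing values.
    Hypothetical sketch, not the GSTools implementation.
    """
    if axis == 1:
        lines = field  # each row is a line along axis 1
    else:
        lines = [list(col) for col in zip(*field)]  # columns run along axis 0
    n = len(lines[0])
    sums = [0.0] * n
    counts = [0] * n
    for line in lines:
        for i in range(n):
            for lag in range(1, n - i):
                a, b = line[i], line[i + lag]
                if math.isnan(a) or math.isnan(b):
                    continue  # skip pairs with missing values
                sums[lag] += (a - b) ** 2
                counts[lag] += 1
    # index = lag in grid steps; lag 0 and empty lags stay at 0
    return [s / (2 * c) if c > 0 else 0.0 for s, c in zip(sums, counts)]


grid = [[0.0, 1.0, 2.0], [0.0, float("nan"), 4.0]]
along_columns = variogram_along_axis(grid, axis=1)
along_rows = variogram_along_axis(grid, axis=0)
```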

> I hope you don't mind that I pushed some small typo-fixes directly onto this branch.

Of course not. My english is not se goodest.

> I think if you update the changelog, I'd be happy about a merge.

I'll do a changelog later in develop, so I can mention the PRs.

TobiasGlaubach and others added 12 commits November 6, 2020 16:35

blackened

2ab7504

tests: blackened

0ed1c15

tools.geometric.ang2dir: new converter from angles to directional vector

04c0a71

vario: introduce directional vectors+bandwith; simplify cython routines (pos array; nD dist; angle checks)

73f1b68

tests/vario: skip directional tests for now

90bdd99

tools: add ang2dir to init

b8fb4a0

vario: add multi field option; add nodata option

c044b62

example: add multi field vario estimation example

a4b4e6b

tests/vario: add test for multi field and nodata

e3461cf

vario: add posibility to give structured field; rename unstructured variant to vario_estimate

89e83bb

vario: fix 'bandwidth' typo; add hints in documentation

c3a1eab

MuellerSeb added the enhancement, Documentation, and Performance labels Nov 7, 2020

MuellerSeb added this to the 1.3 milestone Nov 7, 2020

MuellerSeb requested a review from LSchueler November 7, 2020 15:47

MuellerSeb self-assigned this Nov 7, 2020

MuellerSeb added 7 commits November 7, 2020 16:57

vario: reorder input parameters for backward compatibility

8f9c436

vario: angles doc fix

eac5e87

tests: cleanup

92d888a

doc: use 'vario_estimate' in all examples

66bb632

vario: cleanup mask handling in structured vario-estimate

e1cc3c5

vario: allow masked array for field in vario_estimate; add mask parameter to provide an external mask

73801c9

vario: add no_data option to vario_estimate_structured (solves #83)

0abedea

MuellerSeb linked an issue Nov 8, 2020 that may be closed by this pull request

vario_estimate_structured returns only nan if nan points in the field #83

Closed

MuellerSeb added 3 commits November 8, 2020 20:51

vario: allow arbitrary dim. in structured estimation

5b3dae2

vario: rename vario_estimate_structured to vario_estimate_axis

8343566

vario: don't alter count array, if count==0

036667d

test: vario assertions and no_data checks added

3e3a07f

MuellerSeb added 11 commits November 9, 2020 23:56

vario: no valueError for full masked arrays; test wrong estimator Error

89e0e93

test: test fully masked vario estimate

ddbf148

examples: 2d example for directional variogram

f0e9c61

examples: remove lagacy call

d834c0a

examples: 3d example for directional variogram

0c323ec

tests: vario finally at 100%

9e88943

BUGFIX: 3D contourf plots in gstools not working with mpl 3.3 -> use 2D

539aff4

examples: switch to 2D plot for 3D field

849d49c

examples: use plt.show instead of fig.show

f22a7c3

examples: fix plotting order for sphinx gallery

9302564

examples: typo

1fadb67

MuellerSeb commented Nov 10, 2020

Fix some very minor typos in comments

80f9f32

LSchueler requested changes Nov 10, 2020

LSchueler approved these changes Nov 10, 2020

MuellerSeb added 3 commits November 11, 2020 14:54

Vario: add separate_dirs argument to directional variogram est routines in cython; some code refactoring

2f4b7f1

vario: check if direcitons are separated to optimize search

6ad2229

merged

b993c59

MuellerSeb merged commit 7ae9393 into develop Nov 11, 2020

MuellerSeb mentioned this pull request Nov 11, 2020

vario_estimate_structured returns only nan if nan points in the field #83

Closed

MuellerSeb deleted the direct-vario branch November 11, 2020 22:41

MuellerSeb mentioned this pull request Apr 7, 2021

GSTools 1.3 *Pure Pink* release #110

Merged
