Discrete transform #70

LSchueler · 2020-02-19T14:42:22Z

I added a transform function to the collection. It takes values and, optionally, thresholds, and applies them to an existing field to create a field made up of a number of discrete values.
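
As a minimal sketch of what such a transform does (this is not the code from this PR), the hypothetical helper `discretize` below uses `np.digitize` to map every field value to the class bounded by the given thresholds. The function name and the sample numbers are assumptions chosen for illustration.

```python
import numpy as np


def discretize(field, values, thresholds):
    """Map each entry of `field` to one of `values`, split at `thresholds`.

    Hypothetical helper for illustration; it is not the GSTools API.
    `thresholds` is assumed to be monotonically increasing.
    """
    field = np.asarray(field, dtype=float)
    values = np.asarray(values)
    thresholds = np.asarray(thresholds, dtype=float)
    if len(values) != len(thresholds) + 1:
        raise ValueError("len(values) != len(thresholds) + 1")
    # np.digitize returns the class index of each entry: 0 below the first
    # threshold, len(thresholds) at or above the last threshold
    return values[np.digitize(field, thresholds)]


field = np.array([-1.2, -0.1, 0.3, 2.5])
result = discretize(field, values=[10, 20, 30], thresholds=[-0.5, 1.0])
print(result)
```

The two thresholds split the field into three classes, so three values are needed.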

MuellerSeb

Thanks for that addition! I really think this is a good extension to the binary transformation.
I have some issues with the standard handling of the input and the restrictions to the given values:

1) Why is it necessary for the thresholds to be related to the given values?
2) I think a discrete transformation could also be called with a single number that says how many values there should be in the end. For example: divide it into 5 values, following the original field.
3) When values and thresholds are both given, I would only check that len(values) == len(thresholds) + 1, not that each threshold lies between neighboring values, because the given values can be chosen independently of the given field.

What do you think?
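
One possible reading of the single-number suggestion above, as a rough sketch: derive the discrete values from the range of the field and only check the length relation between values and thresholds. The helper name `values_from_count` and the choice of midpoints as thresholds are assumptions, not something settled in this thread.

```python
import numpy as np


def values_from_count(field, n):
    """Return n equally spaced values spanning the range of `field`.

    Hypothetical helper; the name and behavior are assumptions.
    """
    field = np.asarray(field, dtype=float)
    return np.linspace(field.min(), field.max(), n)


field = np.array([0.0, 0.4, 1.1, 2.9, 4.0])
values = values_from_count(field, 5)
# midpoints between neighboring values serve as thresholds here
thresholds = (values[1:] + values[:-1]) / 2
# the only consistency check proposed in the review
lengths_match = len(values) == len(thresholds) + 1
```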

MuellerSeb · 2020-02-20T13:42:56Z

examples/07_transformations/02_discrete.py

+srf = gs.SRF(model, seed=20170519)
+srf.structured([x, y])
+# create 5 equidistantly spaced values
+discrete_values = np.linspace(np.min(srf.field), np.max(srf.field), 5)


The values are equidistant, yes. But they don't need to be equally distributed in the end. I think this could be misleading.

You mean it's misleading for this specific example? Would "create 5 equidistantly spaced values for this example" be better? Or would you prefer an example with non-equidistantly spaced values?

MuellerSeb · 2020-02-20T13:43:58Z

gstools/transform/field.py

+    else:
+        if thresholds is None:
+            # just in case, sort the values
+            values = np.sort(values)


What if I want the values to BE unsorted? Why do the given values need to be in relation to the values of the given field?

Well, at least for the way I implemented the transformation, the values have to be monotonically increasing. I'm also not sure how to interpret values that are not monotonically increasing.

MuellerSeb · 2020-02-20T13:44:18Z

gstools/transform/field.py

+            values = np.array(values)
+            thresholds = np.array(thresholds)
+            for i in range(len(thresholds)):
+                if not (values[i] <= thresholds[i] < values[i + 1]):


Why this restriction?

How else would you define an array of thresholds, which subdivides an array of values? - It's a bit like the nodes and the edges of a graph.

The binary transformation does the following:

1) get a divide value to select "lower" and "upper" values
2) replace the lower and upper values with given values

The divide value is unrelated to the given lower and upper values, which will be set in the field. It is only related to the input field.
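
The two steps of the binary transformation can be sketched as follows. This is an illustration under assumptions, not the GSTools implementation: the function name `binary_sketch` is hypothetical, and which side the value exactly equal to `divide` lands on is an arbitrary choice here.

```python
import numpy as np


def binary_sketch(field, divide, lower, upper):
    """Set entries above `divide` to `upper` and all others to `lower`."""
    field = np.asarray(field, dtype=float)
    # `divide` is compared against the input field only; `lower` and
    # `upper` are free and need not bracket `divide`
    return np.where(field > divide, upper, lower)


field = np.array([-0.7, 0.2, 1.5])
# the assigned values are deliberately "reversed" relative to the field
result = binary_sketch(field, divide=0.0, lower=100.0, upper=-100.0)
print(result)
```

The example assigns a large value to the low part of the field and a small value to the high part, which shows that the assigned values are independent of the threshold.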

MuellerSeb · 2020-02-20T13:45:37Z

gstools/transform/field.py

+        if thresholds is None:
+            # just in case, sort the values
+            values = np.sort(values)
+            thresholds = (values[1:] + values[:-1]) / 2


I think the natural way would be to use thresholds chosen so that the given values occur in equal proportions in the result. Maybe I don't get the aim of this transformation...

Yeah, I also wasn't exactly sure how best to handle the input arguments.
But at least in my use case, I'm just interested in the number of "value classes". The non-arithmetic-mean thresholds are just a small generalisation, which was easy to implement.

If the function just took an array of thresholds, what values would you assign to all the field values lying between two thresholds?

LSchueler · 2020-02-20T14:40:55Z

Thanks for the input.

I think I've addressed 1) with my other comments.

At first I also wanted to simply take an integer, but taking an array of values makes it so much more flexible, and I simply hope that functions like np.linspace are known to our users; otherwise, I've given the example.

But if the values are not related to the field anymore, then this should not be a field transformation, but a new kind of field generation. If you want to extend the value range of the field, it could simply be multiplied by a constant factor before transforming it.
The binary transformation would also only give a globally constant field if the divide keyword were larger than max(field) or less than min(field).

Now, arbitrary values can be assigned to the value classes which are separated by the thresholds.

LSchueler · 2020-02-21T10:41:14Z

Wow, yesterday I had a pretty big brain fart...
I finally got your point, removed the check and added another example illustrating the new capability.

Ready for merging?

MuellerSeb · 2020-02-21T12:31:01Z

> Wow, yesterday I had a pretty big brain fart...
> I finally got your point, removed the check and added another example illustrating the new capability.
>
> Ready for merging?

No problem :-)

The only thing I would dispute is the default calculation of the thresholds. I would propose the following way:

from scipy.special import erfinv
n = len(values)
p = np.arange(1, n) / n  # n-1 equal subdivisions of [0, 1]
# use quantile of the normal distribution to get equal portions for each value
thresholds = fld.mean + fld.model.sill * np.sqrt(2) * erfinv(2 * p - 1)

Then we also don't need to sort the values.

MuellerSeb · 2020-02-21T12:37:38Z

Or we provide both ways with a switch. Then we could both be happy.

LSchueler · 2020-02-21T12:50:41Z

I get your point in wanting to relate the thresholds to a Gaussian process. But do you think there are application cases for discrete distributions where one would want that?

At least in my use case, I'd only want to have the arithmetic mean of the given values as a threshold.

Do you have an idea of how to concisely describe the difference of the two threshold calculation methods?

MuellerSeb · 2020-02-21T13:32:26Z

> I get your point in wanting to relate the thresholds to a Gaussian process. But do you think there are application cases for discrete distributions where one would want that?

For example, to get a facies distribution where each facies has a number, the facies should be equally distributed, and you want them to be next to each other in the order you have given them.

> At least in my use case, I'd only want to have the arithmetic mean of the given values as a threshold.

Then the switch could state that, like th_mode="arithmetic" or th_mode="equal".

> Do you have an idea of how to concisely describe the difference of the two threshold calculation methods?

One connects the given values to the field and does something like a nearest-neighbor interpolation (dividing areas by the arithmetic mean of the given values). The other chunks the given field into equal portions and assigns each area the corresponding given value.
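
The two threshold rules described above can be sketched side by side. The helper names are hypothetical, and the "equal" rule assumes the field is Gaussian with a known mean and variance; it uses `statistics.NormalDist` from the standard library rather than the `erfinv` expression from the proposal.

```python
from statistics import NormalDist

import numpy as np


def arithmetic_thresholds(values):
    """Midpoints between sorted values (nearest-value behavior)."""
    values = np.sort(np.asarray(values, dtype=float))
    return (values[1:] + values[:-1]) / 2


def equal_thresholds(n_values, mean, variance):
    """Quantiles that split a Gaussian field into n_values equal portions."""
    dist = NormalDist(mu=mean, sigma=variance ** 0.5)
    return np.array([dist.inv_cdf(k / n_values) for k in range(1, n_values)])


values = [0.0, 1.0, 4.0]
# depends only on the given values
th_arith = arithmetic_thresholds(values)
# depends only on the field statistics and the number of values
th_equal = equal_thresholds(len(values), mean=0.0, variance=1.0)
```

With the arithmetic rule the thresholds move when the values change; with the equal rule they stay fixed for a given field, so the values do not need to be sorted.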

EDIT

Or we could use the thresholds keyword itself: thresholds="arithmetic"

MuellerSeb · 2020-02-21T13:42:59Z

If we use the thresholds keyword, only a minimal modification is needed:

if thresholds == "arithmetic":
    # just in case, sort the values
    values = np.sort(values)
    thresholds = (values[1:] + values[:-1]) / 2
elif thresholds == "equal":
    n = len(values)
    p = np.arange(1, n) / n  # n-1 equal subdivisions of [0, 1]
    # use quantile of the normal distribution to get equal portions for each value
    thresholds = fld.mean + fld.model.sill * np.sqrt(2) * erfinv(2 * p - 1)
else:
    if len(values) != len(thresholds) + 1:
        raise ValueError(
            "discrete transformation: len(values) != len(thresholds) + 1"
        )
    values = np.array(values)
    thresholds = np.array(thresholds, dtype=float)

EDIT

I made a mistake in the calculation, the term fld.model.sill * np.sqrt(2) should be np.sqrt(fld.model.sill * 2)
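
To see why the square root matters, here is a small standard-library check, written as a sketch with made-up numbers. It relies on the identity that the standard normal quantile z(p) equals sqrt(2) * erfinv(2 * p - 1), so the corrected thresholds are mean + sqrt(sill) * z(p), which is the same as mean + sqrt(2 * sill) * erfinv(2 * p - 1). The values of `mean`, `sill` and `n` are assumptions.

```python
import math
from statistics import NormalDist

mean, sill = 1.0, 4.0  # sill is the variance, so the standard deviation is 2
n = 4  # number of discrete values
p = [k / n for k in range(1, n)]  # n - 1 equal subdivisions of [0, 1]

# standard normal quantiles; each equals sqrt(2) * erfinv(2 * p - 1)
z = [NormalDist().inv_cdf(pk) for pk in p]
# corrected thresholds: scale by the standard deviation, not the variance
thresholds = [mean + math.sqrt(sill) * zk for zk in z]

# check: the normal CDF with variance `sill` at each threshold recovers p
cdf = [
    0.5 * (1 + math.erf((t - mean) / math.sqrt(2 * sill)))
    for t in thresholds
]
```

Scaling by `sill` instead of `math.sqrt(sill)` would place the thresholds too far from the mean whenever the variance is larger than 1, so the portions would no longer be equal.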

LSchueler · 2020-02-21T16:51:14Z

Okay, I've added the possibility to add more automatic threshold calculations, but your suggestion is still pretty buggy. I'll have a look at it next week.

MuellerSeb · 2020-02-21T19:25:19Z

I just found a bug in the Zinn&Harvey transformation, where I made the same mistake. \sigma is the standard deviation, not the variance. That's why we have to take the square root.


LSchueler · 2020-02-25T10:26:26Z

Damn, such a minor thing I added to GSTools and so many worries with it :-)
Let's finally merge this thing!

LSchueler added 3 commits February 19, 2020 15:33

Add discrete transform function

d0c270d

Add unittest to discrete transform function

c6f6d4a

Update examples

c4e0319

LSchueler requested a review from MuellerSeb February 19, 2020 14:42

LSchueler added the Documentation and enhancement labels Feb 19, 2020

LSchueler added this to the 1.1.2 milestone Feb 19, 2020

LSchueler self-assigned this Feb 19, 2020

Add item to changelog

f2cbee0

MuellerSeb reviewed Feb 20, 2020


MuellerSeb modified the milestones: 1.1.2, 1.2 Feb 20, 2020

Remove unnecessary input check and add an example

9a88da7

Now, arbitrary values can be assigned to the value classes which are separated by the thresholds.

Add possibility to add more threshold calculations

bbb8ec6

MuellerSeb mentioned this pull request Feb 21, 2020

Bug in Zinn&Harvey transformation #71

Closed

MuellerSeb and others added 4 commits February 22, 2020 15:34

transform: Add equal thresholds option to discrete; bugfix in discret…

00df6a4

…e; th check in discrete; call discrete from binary; zinnharvey bugfix

examples: add equal thresholds example for discrete transform

4a8c9b8

tests: test equal thresholds in discrete transform

a3580c7

Update docstring

ddb6e5f

LSchueler merged commit 66ea6a8 into develop Feb 25, 2020

LSchueler deleted the discrete_transform branch February 25, 2020 10:30

MuellerSeb mentioned this pull request Mar 20, 2020

1.2.0 release #73

Merged
