[Enhancement] add normalization and trend to Field #94

Closed
MuellerSeb opened this issue Jun 2, 2020 · 8 comments · Fixed by #124

@MuellerSeb
Member

Normalizer

The existing transform submodule only serves as a post-processor for fields.

It would be nice to provide another way of transforming fields: a Normalizer class that bundles a transformation with its inverse (the normalization), which maps the field values back to a normal distribution.

Examples:

  • log-normal fields
  • boxcox transformation
  • uniform

Unlike the existing transformations, these are not destructive and can be inverted. This would be useful for kriging: the conditioning values could be inverse-transformed (normalized) before interpolation and the resulting field transformed afterwards, so we could provide log-normal kriging, for example.
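
The workflow described above could be sketched roughly as follows. This is only an illustration and not GSTools code: inverse-distance weighting stands in for kriging, and the names `idw` and `log_normal_interpolate` are hypothetical.

```python
import math


def idw(cond_pos, cond_val, x, power=2.0):
    """Inverse-distance weighting in 1D (a simple stand-in for kriging)."""
    num = den = 0.0
    for p, v in zip(cond_pos, cond_val):
        d = abs(x - p)
        if d == 0.0:
            return v  # exact at conditioning points
        w = d ** -power
        num += w * v
        den += w
    return num / den


def log_normal_interpolate(cond_pos, cond_val, x):
    """Interpolate positive data in log space and map the result back."""
    normal_val = [math.log(v) for v in cond_val]  # normalize the conditions
    estimate = idw(cond_pos, normal_val, x)       # interpolate normal values
    return math.exp(estimate)                     # transform the result


cond_pos = [0.0, 1.0, 3.0]
cond_val = [0.5, 2.0, 8.0]
print(log_normal_interpolate(cond_pos, cond_val, 2.0))
```

Because the interpolation happens on the normalized values, the back-transformed estimate stays positive, which a direct interpolation of the original values followed by a destructive transformation could not guarantee in the same way.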

Trend

As already implemented in the kriging routines, we could simply allow a functional mean to be provided, which is then applied to the field. We would need a switch to state whether the trend should be added before or after normalization.

@MuellerSeb MuellerSeb added the enhancement New feature or request label Jun 2, 2020
@MuellerSeb MuellerSeb added this to the 1.3 milestone Jun 2, 2020
@MuellerSeb MuellerSeb self-assigned this Jun 2, 2020
@MuellerSeb
Member Author

@LSchueler: What do you think about simply allowing a callable for mean, so we don't have to introduce another keyword?
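
A minimal sketch of what accepting either a constant or a callable for mean might look like. The helper names `eval_mean` and `add_mean` are hypothetical and not part of GSTools; positions are plain 1D floats to keep the example short.

```python
def eval_mean(mean, pos):
    """Evaluate a mean given as a constant or as a callable of the position."""
    if callable(mean):
        return [mean(x) for x in pos]
    return [float(mean)] * len(pos)


def add_mean(field, pos, mean=0.0):
    """Add a constant or functional mean to the field values."""
    return [f + m for f, m in zip(field, eval_mean(mean, pos))]


pos = [0.0, 1.0, 2.0]
field = [0.1, -0.2, 0.3]
print(add_mean(field, pos, mean=1.0))                # constant mean
print(add_mean(field, pos, mean=lambda x: 2.0 * x))  # linear trend
```

With this approach a single `mean` argument covers both cases, so no separate trend keyword is needed.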

@LSchueler
Member

Sorry, I forgot about this issue!

I really like the idea!
But I'm not 100% sure what the most convenient way of providing the transformations would be.
At least syntactically, I think something like
ln_field = LogNormal(field)
and
norm_field = LogNormal(ln_field, invert=True)
would be the nicest solution.

Or we provide some kind of plugin solution which would make it possible to call
field.log_normal()
but then it would be difficult if someone wants to also apply the same transformation to e.g. a numpy array. I guess I favor the first idea.
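
A rough sketch of the first, function-style variant, assuming a hypothetical `log_normal` function with an `invert` switch. Since it operates on plain values, the same call works for field values and for any other sequence of numbers.

```python
import math


def log_normal(values, invert=False):
    """Map normal values to log-normal ones, or back with invert=True."""
    func = math.log if invert else math.exp
    return [func(v) for v in values]


field = [-1.0, 0.0, 0.5]
ln_field = log_normal(field)                    # forward transformation
norm_field = log_normal(ln_field, invert=True)  # back to normal values
print(ln_field, norm_field)
```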

I also dig the idea of allowing callable functions for mean.

Do you think there would be an application where one needs a trend before the transformation and one afterwards?
Hmm, would adding an argument mean or trend to the transformations for adding it afterwards solve the problem of supporting trends before and after transformations?

@MuellerSeb
Member Author

I had the following idea:

"""Normalizer demonstration."""
import numpy as np
import gstools as gs


class Normalizer:
    """Normalizer class."""

    def __init__(self, **kwargs):
        # only use values, that have a provided default value
        for key, value in self.default_para().items():
            setattr(self, key, kwargs.get(key, value))

    def default_para(self):
        """Get default parameters."""
        return {}

    def transform(self, values):
        """Transform to target distribution."""
        pass

    def normalize(self, values):
        """Transform to normal distribution."""
        pass


class LogNormal(Normalizer):
    """Log-normal fields."""

    def transform(self, values):
        """Transform to log-normal distribution."""
        return np.exp(values)

    def normalize(self, values):
        """Transform to normal distribution."""
        return np.log(values)


class BoxCox(Normalizer):
    """Box-Cox transformation with an optional shift."""

    def default_para(self):
        """Get default parameters."""
        return {"shift": 0, "lmbda": 1}

    def transform(self, values):
        """Transform to target distribution."""
        if np.isclose(self.lmbda, 0):
            return np.exp(values) - self.shift
        return (1 + self.lmbda * values) ** (1 / self.lmbda) - self.shift

    def normalize(self, values):
        """Transform to normal distribution."""
        # work on a shifted copy so the caller's array is not modified in place
        values = np.asarray(values) + self.shift
        if np.isclose(self.lmbda, 0):
            return np.log(values)
        return (values ** self.lmbda - 1) / self.lmbda


norm_ln = LogNormal()
print(norm_ln.transform(-1))

norm_bc = BoxCox(lmbda=0)
a = norm_bc.transform(-1)
b = norm_bc.normalize(a)
print(a, b)
0.36787944117144233
0.36787944117144233 -1.0

This could then be used in the Krige or SRF class:

model = gs.Gaussian()
srf = gs.SRF(model, normalizer=norm_bc)

Then, the rest is done internally. This is mostly important for Krige where the conditioning points need the inverse transformation (that is the whole motivation for the normalizer). In SRF it only adds convenience since it should be the same as applying the transform in the legacy way.

I also had second thoughts about the trend. I would always apply the trend to the normal data, since residuals in regression are assumed to be normally distributed. If somebody wants it another way, they can always do it by hand.
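
To illustrate the order of operations this implies, here is a hedged sketch with hypothetical helpers `to_residuals` and `from_residuals` (not GSTools functions): conditioning data is normalized first and then detrended, and generated values get the trend added before the transformation.

```python
import math


def to_residuals(values, pos, trend, normalize):
    """Normalize the data first, then remove the trend."""
    return [normalize(v) - trend(x) for v, x in zip(values, pos)]


def from_residuals(residuals, pos, trend, transform):
    """Add the trend to the normal residuals, then transform."""
    return [transform(r + trend(x)) for r, x in zip(residuals, pos)]


def trend(x):
    """Example linear trend acting on the normal data."""
    return 0.1 * x


pos = [0.0, 1.0, 2.0]
data = [1.0, 0.8, 2.5]  # positive, assumed log-normal values
res = to_residuals(data, pos, trend, normalize=math.log)
back = from_residuals(res, pos, trend, transform=math.exp)
print(res, back)
```

The two helpers are inverse to each other, so the round trip reproduces the original data up to floating-point error.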

@LSchueler
Member

What do you think about providing not only the classes, but also instances of them? Something like

from gstools.transform import log_normal
norm_ln = log_normal.transform(-1)

Then the user wouldn't have to instantiate the standard transformation classes.

@LSchueler
Member

Then let's just support trends on the normal data!

@MuellerSeb
Member Author

> What do you think about providing not only the classes, but also instances of them? Something like
>
> from gstools.transform import log_normal
> norm_ln = log_normal.transform(-1)
>
> Then the user wouldn't have to instantiate the standard transformation classes.

I don't see a big difference from:

from gstools.normalize import LogNormal
norm_ln = LogNormal().transform(-1)

And it would only be "simple" for parameterless transformations.
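
The trade-off between the two suggestions can be sketched like this. The classes below are simplified scalar stand-ins written for this comparison, not the proposed implementation.

```python
import math


class LogNormal:
    """Parameterless normalizer (simplified scalar stand-in)."""

    def transform(self, value):
        return math.exp(value)


class BoxCox:
    """Normalizer with a parameter (simplified scalar stand-in)."""

    def __init__(self, lmbda=1.0):
        self.lmbda = lmbda

    def transform(self, value):
        if self.lmbda == 0:
            return math.exp(value)
        return (1 + self.lmbda * value) ** (1 / self.lmbda)


# a ready-made instance only makes sense for fixed (default) parameters
log_normal = LogNormal()
print(log_normal.transform(-1) == LogNormal().transform(-1))

# any other parameter set still requires instantiating the class
print(BoxCox(lmbda=0.5).transform(1.0))
```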

@MuellerSeb
Member Author

Another nice addition would be a fit method to fit the parameters to data:

norm_bc = BoxCox()
norm_bc.fit(data)

or with a shortcut:

norm_bc = BoxCox(data=data)
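
One possible shape for such a fit method, as a sketch under stated assumptions: it maximizes the standard Box-Cox profile log-likelihood by a plain grid search over lmbda, ignores the shift parameter for brevity, and works on lists of positive floats. The `loglikelihood` method and the `grid` argument are hypothetical names chosen for this illustration.

```python
import math


class BoxCox:
    """Box-Cox normalizer with a grid-search fit (illustrative sketch)."""

    def __init__(self, data=None, lmbda=1.0):
        self.lmbda = lmbda
        if data is not None:  # shortcut: BoxCox(data=data)
            self.fit(data)

    def normalize(self, values):
        """Transform positive values towards a normal distribution."""
        if abs(self.lmbda) < 1e-12:
            return [math.log(v) for v in values]
        return [(v ** self.lmbda - 1) / self.lmbda for v in values]

    def loglikelihood(self, data):
        """Profile log-likelihood of the current lmbda (up to a constant)."""
        y = self.normalize(data)
        mean = sum(y) / len(y)
        var = sum((v - mean) ** 2 for v in y) / len(y)
        log_sum = sum(math.log(v) for v in data)
        return (self.lmbda - 1) * log_sum - 0.5 * len(y) * math.log(var)

    def fit(self, data, grid=None):
        """Pick the lmbda from the grid that maximizes the likelihood."""
        grid = grid or [i / 100 for i in range(-200, 201)]
        best = None
        for lmbda in grid:
            self.lmbda = lmbda
            llf = self.loglikelihood(data)
            if best is None or llf > best[0]:
                best = (llf, lmbda)
        self.lmbda = best[1]
        return self


# sample that is symmetric in log space, so lmbda close to 0 is expected
data = [math.exp(z / 2) for z in range(-4, 5)]
norm_bc = BoxCox(data=data)
print(norm_bc.lmbda)
```

A real implementation would more likely use a proper optimizer instead of a fixed grid and would also fit the shift.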

@LSchueler
Member

Reminder: Rewrite the precipitation tutorial after this is merged.

@MuellerSeb MuellerSeb linked a pull request Dec 21, 2020 that will close this issue