This IKATS operator implements a scaling (also called normalization).
This operator only takes one input of the functional type ts_list.
It also takes an optional parameter from the user:
- scaler: The scaler used to normalize the data:
  - Z-Norm: Center and scale data: result = (X - mean) / correct_std, where correct_std = sqrt(1/(N-1) * sum(X - mean)^2)
  - MinMax: Scale values between 0 and 1: result = (X - min) / (max - min). If min = max (constant TS), result = max / 2
  - MaxAbs: Scale TS by its maximum absolute value: result = X / max(abs(X.max), abs(X.min))
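The three formulas can be summarised in the following NumPy sketch. This is not the operator's actual code: the helper scale_values and its signature are hypothetical, and it works on a single 1-D array of values rather than a ts_list.

```python
import numpy as np

def scale_values(x, scaler="Z-Norm"):
    """Illustrative (hypothetical) scaling of a 1-D array of TS values."""
    x = np.asarray(x, dtype=float)
    if scaler == "Z-Norm":
        # Corrected sample standard deviation (denominator N - 1), as described above
        return (x - x.mean()) / x.std(ddof=1)
    if scaler == "MinMax":
        x_min, x_max = x.min(), x.max()
        if x_min == x_max:  # constant TS
            return np.full_like(x, x_max / 2)
        return (x - x_min) / (x_max - x_min)
    if scaler == "MaxAbs":
        return x / max(abs(x.max()), abs(x.min()))
    raise ValueError("Unknown scaler: %s" % scaler)
```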
The operator has one output:
- TS list: The resulting list of time series
- In case of Z-Norm usage, correct_std corresponds to the corrected sample standard deviation: the square root of the unbiased sample variance, whose denominator is the number of observations minus 1 (N - 1). It differs from the "classic" population standard deviation, which divides by N; the two values converge for large datasets.
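As a small illustration (assuming NumPy; the values below are only an example), the two standard deviations correspond to the ddof=1 and ddof=0 settings of numpy.std:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])   # N = 4 observations

corrected_std  = x.std(ddof=1)   # denominator N - 1: corrected sample std
population_std = x.std(ddof=0)   # denominator N: "classic" population std

# corrected_std  ~ 1.2910
# population_std ~ 1.1180
# population_std / corrected_std == sqrt((N - 1) / N)
```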
The Spark implementation of Z-Norm differs from the sklearn one:
- Spark behaviour: uses the corrected sample standard deviation. See this doc for more details about the implementation.
- Sklearn behaviour: uses the "classic" population standard deviation. See this doc for more details about the implementation.
- Operator implementation: uses Spark's behaviour: result = (X - mean) / correct_std, where correct_std = sqrt(1/(N-1) * sum(X - mean)^2). A coefficient sqrt((N-1)/N) is applied to correct the calculation when sklearn is used.
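A minimal sketch of that correction, assuming scikit-learn's StandardScaler and NumPy (the array and variable names are illustrative, not the operator's code):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

x = np.array([[2.0], [4.0], [6.0], [8.0]])   # one TS, shaped as a single column
n = x.shape[0]

# sklearn: (X - mean) / population_std  (denominator N)
sklearn_result = StandardScaler().fit_transform(x)

# Correction toward Spark's behaviour: (X - mean) / correct_std  (denominator N - 1)
spark_like_result = sklearn_result * np.sqrt((n - 1) / n)

# Same values computed directly with the corrected sample standard deviation
expected = (x - x.mean()) / x.std(ddof=1)
assert np.allclose(spark_like_result, expected)
```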