Merge pull request #49 from RoyalHaskoningDHV/linear
SAM 3.1.0 : Lasso model, Validators, ClipTransformer and some utils
abontsema authored Aug 25, 2022
2 parents aa4dcf6 + 8540482 commit 59e96fb
Showing 44 changed files with 28,762 additions and 327 deletions.
3 changes: 1 addition & 2 deletions .gitignore
@@ -64,7 +64,6 @@ wheels/
MANIFEST
.idea


# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
@@ -145,4 +144,4 @@ venv.bak/

# Exceptions
!docs/requirements.txt
-!data/rainbow_beach.parquet
+!sam/datasets/data/*.csv
15 changes: 15 additions & 0 deletions CHANGELOG.md
@@ -9,6 +9,20 @@ Version X.Y.Z stands for:

-------------

## Version 3.1.0

### New features

- New class `sam.models.LassoTimeseriesRegressor` to create a Lasso regression model for time series data incl. quantile predictions.
- New class `sam.preprocessing.ClipTransformer` to clip input values to the range from the train set, making models more robust against outliers.
- New abstract base class `sam.validation.BaseValidator` for all validators.
- Renamed `sam.validation.RemoveFlatlines` to `sam.validation.FlatlineValidator`. `sam.validation.RemoveFlatlines` is still available, but will be removed in a future version.
- Renamed `sam.validation.RemoveExtremeValues` to `sam.validation.MADValidator`. `sam.validation.RemoveExtremeValues` is still available, but will be removed in a future version.
- New class `sam.validation.OutsideRangeValidator` for checking / removing data outside of a range.
- New function `datetime_train_test_split` to split pandas dataframes and series based on a datetime.
- New `sam.datasets` module containing functions for loading ready-to-use datasets: `sam.datasets.load_rainbow_beach` and `sam.datasets.load_sewage_data`.

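The datetime-based split listed above can be illustrated with a minimal, dependency-free sketch. `split_by_datetime` is a hypothetical helper working on plain `(timestamp, value)` tuples; it is not the signature of `datetime_train_test_split`, which operates on pandas objects. Putting the cutoff row itself in the test set is an assumption made for this example.

```python
from datetime import datetime


def split_by_datetime(rows, split_at):
    """Split (timestamp, value) rows into train and test parts.

    Hypothetical helper for illustration only, not SAM's API.
    Assumption: rows strictly before `split_at` are train, the rest are test.
    """
    train = [row for row in rows if row[0] < split_at]
    test = [row for row in rows if row[0] >= split_at]
    return train, test


rows = [
    (datetime(2022, 8, 1), 20.1),
    (datetime(2022, 8, 2), 20.4),
    (datetime(2022, 8, 3), 19.8),
    (datetime(2022, 8, 4), 21.0),
]
train, test = split_by_datetime(rows, datetime(2022, 8, 3))
print(len(train), len(test))  # 2 2
```

Splitting on a timestamp rather than on a random sample keeps the test set strictly later than the train set, which is usually what a time series evaluation needs.
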
## Version 3.0.4

### Changes
@@ -19,6 +33,7 @@ Version X.Y.Z stands for:
- Updated package dependencies for scikit-learn
- Changed the DeepExplainer to the model agnostic KernelExplainer, so we can remove all the v1 dependencies on tensorflow
- Fixed pytest MPL bug by temporarily setting it to a previous version

## Version 3.0.3

### New features
4 changes: 2 additions & 2 deletions README.md
@@ -32,11 +32,11 @@ Keep in mind that the sam package is updated frequently, and after a while, your
Below you can find a simple example on how to use one of our timeseries models. For more examples, check our [example notebooks](https://github.com/RoyalHaskoningDHV/sam/tree/main/examples)

```python
-import pandas as pd
+from sam.datasets import load_rainbow_beach
from sam.models import MLPTimeseriesRegressor
from sam.feature_engineering import SimpleFeatureEngineer

-data = pd.read_parquet("../data/rainbow_beach.parquet") # Requires `pyarrow` package
+data = load_rainbow_beach()
X, y = data, data["water_temperature"]

# Easily create rolling and time features to be used by the model
Binary file removed data/rainbow_beach.parquet
Binary file not shown.
18 changes: 18 additions & 0 deletions docs/source/datasets.rst
@@ -0,0 +1,18 @@
.. _datasets:

=============
Data Sets
=============

This is the documentation for available datasets.

Rainbow Beach
-------------
.. autofunction:: sam.datasets.load_rainbow_beach


Sewage data
-----------
.. autofunction:: sam.datasets.load_sewage_data


1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -16,6 +16,7 @@ Welcome to SAM's documentation!
data
examples
data_sources
datasets
preprocessing
exploration
feature_engineering
7 changes: 7 additions & 0 deletions docs/source/preprocessing.rst
@@ -6,6 +6,13 @@ Preprocessing

This is the documentation for preprocessing functions.

Clipping data
-------------
.. autoclass:: sam.preprocessing.ClipTransformer
    :members:
    :undoc-members:
    :show-inheritance:

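The idea behind clipping to the train-set range can be sketched as follows. `RangeClipper` is a hypothetical stand-in written for this note with plain Python lists; it is not the implementation or the signature of `sam.preprocessing.ClipTransformer`.

```python
class RangeClipper:
    """Hypothetical stand-in for illustration, not sam.preprocessing.ClipTransformer."""

    def fit(self, values):
        # Remember the range observed in the training data.
        self.min_ = min(values)
        self.max_ = max(values)
        return self

    def transform(self, values):
        # Pull anything outside the training range back to the nearest bound.
        return [min(max(v, self.min_), self.max_) for v in values]


clipper = RangeClipper().fit([2.0, 5.0, 3.5])
clipped = clipper.transform([1.0, 4.0, 9.0])
print(clipped)  # [2.0, 4.0, 5.0]
```

Values the model never saw during training are mapped to the closest value it did see, which limits how far a single outlier can push a prediction.
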
Normalize timestamps
--------------------
.. warning::
18 changes: 16 additions & 2 deletions docs/source/validation.rst
@@ -6,9 +6,23 @@ Data Validation

This is the documentation for the validation functions.

Base Validation class
---------------------
.. autoclass:: sam.validation.BaseValidator
    :members:
    :undoc-members:
    :show-inheritance:

Detect Outside Range
--------------------
.. autoclass:: sam.validation.OutsideRangeValidator
    :members:
    :undoc-members:
    :show-inheritance:

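A range check of this kind can be sketched in a few lines. `mask_outside_range` and its `min_value` / `max_value` parameters are hypothetical names chosen for this example, not the interface of `sam.validation.OutsideRangeValidator`; replacing invalid points with NaN is likewise an assumption about what "removing" means.

```python
import math


def mask_outside_range(values, min_value=None, max_value=None):
    """Flag values outside [min_value, max_value] and replace them with NaN.

    Hypothetical helper for illustration only. A bound of None is not checked.
    """
    flags = [
        (min_value is not None and v < min_value)
        or (max_value is not None and v > max_value)
        for v in values
    ]
    cleaned = [math.nan if bad else v for v, bad in zip(values, flags)]
    return flags, cleaned


flags, cleaned = mask_outside_range(
    [0.5, 12.0, -3.0, 7.0], min_value=0.0, max_value=10.0
)
print(flags)  # [False, True, True, False]
```

Returning the flags separately from the cleaned values lets a caller either inspect which points were invalid or continue directly with the masked data.
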
Detect Extreme Values
---------------------------
-.. autoclass:: sam.validation.RemoveExtremeValues
+.. autoclass:: sam.validation.MADValidator
    :members:
    :undoc-members:
    :show-inheritance:
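
Assuming "MAD" refers to the median absolute deviation, extreme-value detection can be sketched as below. `mad_outliers` and its `threshold` parameter are hypothetical; the sketch uses one global median over the whole series, which is a simplification and not necessarily how `sam.validation.MADValidator` computes it.

```python
from statistics import median


def mad_outliers(values, threshold=3.0):
    """Flag points whose distance to the median exceeds threshold * MAD.

    Hypothetical helper for illustration only, not SAM's API.
    """
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # All deviations are (mostly) zero: no meaningful scale, flag nothing.
        return [False] * len(values)
    return [abs(v - center) / mad > threshold for v in values]


outliers = mad_outliers([10, 11, 9, 10, 12, 50])
print(outliers)  # [False, False, False, False, False, True]
```

Because both the center and the scale are medians, a single extreme point such as 50 barely shifts them, so it stands out clearly instead of inflating its own threshold.
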
@@ -23,7 +37,7 @@ Testset image:

Detect Flatlines
---------------------------
-.. autoclass:: sam.validation.RemoveFlatlines
+.. autoclass:: sam.validation.FlatlineValidator
    :members:
    :undoc-members:
    :show-inheritance:
5 changes: 2 additions & 3 deletions examples/feature_engineering.ipynb
@@ -130,10 +130,9 @@
],
"source": [
"import pandas as pd\n",
"from sam.datasets import load_rainbow_beach\n",
"\n",
"data = pd.read_parquet('../data/rainbow_beach.parquet')\n",
"\n",
"data.head()"
"data = load_rainbow_beach()"
]
},
{