SAM 3.1.0 : Lasso model, Validators, ClipTransformer and some utils #49

Merged
merged 46 commits into from
Aug 25, 2022
Changes from 28 commits
8cc7a72
linear model
abontsema Aug 2, 2022
fe19e88
LassoTimeseriesRegressor version 1
abontsema Aug 2, 2022
91da8c5
dump/load implemented for lasso
abontsema Aug 2, 2022
4a62000
docstrings
abontsema Aug 2, 2022
cd679fa
docstrings
abontsema Aug 2, 2022
7dba64a
Merge branch 'main' into linear
abontsema Aug 2, 2022
3d17dd7
model parameters as dict
abontsema Aug 5, 2022
8c4a222
Merge branch 'linear' of https://github.com/RoyalHaskoningDHV/sam int…
abontsema Aug 5, 2022
1f1556d
decorators for seed setting and deprecation of linear model
abontsema Aug 16, 2022
be9642a
Merge branch 'main' into linear
abontsema Aug 16, 2022
0f52328
no tensorflow for linear model
abontsema Aug 16, 2022
6e6cf4c
Merge branch 'linear' of https://github.com/RoyalHaskoningDHV/sam int…
abontsema Aug 16, 2022
847f8ce
workaround for adding seed decorator to testing
abontsema Aug 16, 2022
3459fe7
Introducing BaseValidator class and consistent validator names
abontsema Aug 16, 2022
59b261b
documentation
abontsema Aug 16, 2022
d86c91a
unused imports and commented code
abontsema Aug 16, 2022
94960f0
remove images
abontsema Aug 16, 2022
83ddfe9
formatting
abontsema Aug 16, 2022
86cf80c
New ClipTransformer class
abontsema Aug 17, 2022
5477f79
remove debug print statement
abontsema Aug 17, 2022
2885a59
unused import
abontsema Aug 17, 2022
dd6aa51
correct module in changelog
abontsema Aug 17, 2022
7a7f77b
incorrect docs
abontsema Aug 17, 2022
628b8a8
New OutsideRangeValidator class
abontsema Aug 17, 2022
0e5b659
docs
abontsema Aug 17, 2022
a0902c0
flake8 checks and imports
abontsema Aug 17, 2022
c9d5abc
dataset functions and some utils + docs
abontsema Aug 18, 2022
c021a08
Merge pull request #58 from RoyalHaskoningDHV/new_validation_methods
abontsema Aug 18, 2022
b28b447
Merge pull request #59 from RoyalHaskoningDHV/datasets
abontsema Aug 18, 2022
21d3b5b
merge changelog
abontsema Aug 18, 2022
b3ccc0a
comment length
abontsema Aug 18, 2022
a3d919e
update notebooks
abontsema Aug 18, 2022
f480914
merge
abontsema Aug 18, 2022
1d03c8d
merge
abontsema Aug 18, 2022
9930404
comments ruben
abontsema Aug 18, 2022
fbed39d
assertion for using y in predictions
abontsema Aug 18, 2022
26830da
unit tests for datasets and datetime_train_test_split
abontsema Aug 19, 2022
e4b6840
Merge branch 'main' into linear
abontsema Aug 19, 2022
1252707
flake8
abontsema Aug 19, 2022
e2aa4a9
Merge branch 'linear' of https://github.com/RoyalHaskoningDHV/sam int…
abontsema Aug 19, 2022
0a55af0
Merge branch 'main' into linear
abontsema Aug 19, 2022
fff9645
Merge branch 'main' into linear
abontsema Aug 24, 2022
8aefbcf
optional arg
abontsema Aug 24, 2022
8989914
Merge branch 'linear' of https://github.com/RoyalHaskoningDHV/sam int…
abontsema Aug 24, 2022
0fd152f
use datasets in readme
abontsema Aug 24, 2022
8540482
incorrect autoimport
abontsema Aug 24, 2022
3 changes: 1 addition & 2 deletions .gitignore
@@ -64,7 +64,6 @@ wheels/
MANIFEST
.idea


# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
@@ -145,4 +144,4 @@ venv.bak/

# Exceptions
!docs/requirements.txt
!data/rainbow_beach.parquet
!sam/datasets/data/*.csv
20 changes: 19 additions & 1 deletion CHANGELOG.md
@@ -12,11 +12,29 @@ Version X.Y.Z stands for:
## Version 3.1.0

### New features

- New class `sam.models.LassoTimeseriesRegressor` to create a Lasso regression model for time series data, including quantile predictions.
- New class `sam.preprocessing.ClipTransformer` to clip input values to the range from the train set, making models more robust against outliers.
- New abstract base class `sam.validation.BaseValidator` for all validators.
- Renamed `sam.validation.RemoveFlatlines` to `sam.validation.FlatlineValidator`. `sam.validation.RemoveFlatlines` is still available, but will be removed in a future version.
- Renamed `sam.validation.RemoveExtremeValues` to `sam.validation.MADValidator`. `sam.validation.RemoveExtremeValues` is still available, but will be removed in a future version.
- New class `sam.validation.OutsideRangeValidator` for checking / removing data outside of a range.
- New function `datetime_train_test_split` to split pandas dataframes and series based on a datetime.
- New `sam.datasets` module containing functions for loading ready-to-use datasets: `sam.datasets.load_rainbow_beach` and `sam.datasets.load_sewage_data`.
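
To illustrate what splitting on a datetime means, here is a minimal sketch using plain pandas. It is not the `datetime_train_test_split` implementation: the helper name `split_on_datetime`, its arguments, and the choice to put the cutoff timestamp itself in the test set are all assumptions made for this example.

```python
import pandas as pd


def split_on_datetime(X, y, cutoff):
    """Hypothetical helper: rows strictly before `cutoff` go to train, the rest to test."""
    cutoff = pd.Timestamp(cutoff)
    # Boolean mask over the DatetimeIndex shared by X and y.
    is_train = X.index < cutoff
    return X[is_train], X[~is_train], y[is_train], y[~is_train]


# Six daily observations, made up for the example.
index = pd.date_range("2022-08-01", periods=6, freq="D")
X = pd.DataFrame({"flow": range(6)}, index=index)
y = pd.Series(range(6), index=index, name="target")

X_train, X_test, y_train, y_test = split_on_datetime(X, y, "2022-08-04")
```

Unlike a random split, this keeps every test observation later in time than every train observation, which is usually what a time series evaluation needs.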

## Version 3.0.3

### New features
- Data collection function `sam.data_sources.read_regenradar` now accepts `batch_size` and collects data in batches to avoid timeouts.

## Version 3.0.2

No changes, version bump only.

## Version 3.0.1

No changes, bumped number for release.
No changes, version bump only.

## Version 3.0.0

Binary file removed data/rainbow_beach.parquet
Binary file not shown.
18 changes: 18 additions & 0 deletions docs/source/datasets.rst
@@ -0,0 +1,18 @@
.. _datasets:

=============
Data Sets
=============

This is the documentation for available datasets.

Rainbow Beach
-------------
.. autofunction:: sam.datasets.load_rainbow_beach


Sewage data
-----------
.. autofunction:: sam.datasets.load_sewage_data


1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -16,6 +16,7 @@ Welcome to SAM's documentation!
data
examples
data_sources
datasets
preprocessing
exploration
feature_engineering
7 changes: 7 additions & 0 deletions docs/source/preprocessing.rst
@@ -6,6 +6,13 @@ Preprocessing

This is the documentation for preprocessing functions.

Clipping data
-------------
.. autoclass:: sam.preprocessing.ClipTransformer
    :members:
    :undoc-members:
    :show-inheritance:
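
The idea of clipping to the train-set range can be sketched with NumPy alone. The class below is a hypothetical stand-in that borrows the scikit-learn `fit` / `transform` naming; it is not the `ClipTransformer` API, and its name and attributes are assumptions made for this example.

```python
import numpy as np


class MinMaxClipSketch:
    """Hypothetical stand-in, not sam's ClipTransformer."""

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Remember the per-column range seen in the train set.
        self.min_ = np.nanmin(X, axis=0)
        self.max_ = np.nanmax(X, axis=0)
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        # Values outside the train range are pulled back to its edges.
        return np.clip(X, self.min_, self.max_)


train = np.array([[0.0, 10.0], [5.0, 20.0]])
test = np.array([[-3.0, 15.0], [9.0, 99.0]])

clipped = MinMaxClipSketch().fit(train).transform(test)
```

Here the first column is limited to the range 0 to 5 and the second to 10 to 20, so a model never sees inputs further out than anything it was trained on.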

Normalize timestamps
--------------------
.. warning::
18 changes: 16 additions & 2 deletions docs/source/validation.rst
@@ -6,9 +6,23 @@ Data Validation

This is the documentation for the validation functions.

Base Validation class
---------------------
.. autoclass:: sam.validation.BaseValidator
    :members:
    :undoc-members:
    :show-inheritance:

Detect Outside Range
--------------------
.. autoclass:: sam.validation.OutsideRangeValidator
    :members:
    :undoc-members:
    :show-inheritance:
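
As a rough illustration of checking and removing data outside a range, the sketch below flags values outside fixed bounds and replaces them with NaN using pandas. The helper `mask_outside_range` and its parameters are assumptions made for this example and do not describe the `OutsideRangeValidator` signature.

```python
import pandas as pd


def mask_outside_range(data, min_value, max_value):
    """Hypothetical helper: flag values outside [min_value, max_value] and set them to NaN."""
    invalid = (data < min_value) | (data > max_value)
    # DataFrame.mask replaces entries with NaN wherever the condition is True.
    return data.mask(invalid), invalid


# Made-up sensor readings; 12.0 and -4.0 fall outside the allowed range.
readings = pd.DataFrame({"level": [0.5, 12.0, -4.0, 3.2]})
cleaned, invalid = mask_outside_range(readings, min_value=0.0, max_value=10.0)
```

Returning the boolean mask next to the cleaned data keeps the "checking" and "removing" steps separate, so the flagged rows can be inspected before they are discarded.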

Detect Extreme Values
---------------------------
.. autoclass:: sam.validation.RemoveExtremeValues
.. autoclass:: sam.validation.MADValidator
    :members:
    :undoc-members:
    :show-inheritance:
@@ -23,7 +37,7 @@ Testset image:

Detect Flatlines
---------------------------
.. autoclass:: sam.validation.RemoveFlatlines
.. autoclass:: sam.validation.FlatlineValidator
    :members:
    :undoc-members:
    :show-inheritance:
5 changes: 2 additions & 3 deletions examples/feature_engineering.ipynb
@@ -130,10 +130,9 @@
],
"source": [
"import pandas as pd\n",
"from sam.datasets import load_rainbow_beach\n",
"\n",
"data = pd.read_parquet('../data/rainbow_beach.parquet')\n",
"\n",
"data.head()"
"data = load_rainbow_beach()"
]
},
{
31 changes: 13 additions & 18 deletions examples/lasso.ipynb

Large diffs are not rendered by default.

46 changes: 22 additions & 24 deletions examples/mlp.ipynb

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions pyproject.toml
@@ -6,6 +6,7 @@ build-backend = "setuptools.build_meta"
packages = [
"sam",
"sam.data_sources",
"sam.datasets",
"sam.exploration",
"sam.feature_engineering",
"sam.logging_functions",
1 change: 1 addition & 0 deletions sam/__init__.py
@@ -25,6 +25,7 @@

__all__ = [
"data_sources",
"datasets",
"exploration",
"feature_engineering",
"logging_functions",
7 changes: 7 additions & 0 deletions sam/datasets/__init__.py
@@ -0,0 +1,7 @@
from .datasets import load_rainbow_beach, load_sewage_data


__all__ = [
    "load_rainbow_beach",
    "load_sewage_data",
]