diff --git a/README.md b/README.md
index ce4e4cc..07fbaa1 100755
--- a/README.md
+++ b/README.md
@@ -8,9 +8,9 @@ AutoNormalize is a Python library for automated datatable normalization. It allo
## Getting Started
-* [Install](#install)
-* [Demos](#demos)
-* [API Reference](#api-reference)
+- [Install](#install)
+- [Demos](#demos)
+- [API Reference](#api-reference)
## Install
@@ -26,11 +26,11 @@ pip uninstall autonormalize
## Demos
-* [Blog Post](https://blog.featurelabs.com/automatic-dataset-normalization-for-feature-engineering-in-python/)
-* [Machine Learning Demo with Featuretools](https://github.com/FeatureLabs/autonormalize/blob/master/autonormalize/demos/AutoNormalize%20%2B%20FeatureTools%20Demo.ipynb)
-* [Kaggle Liquor Sales Dataset Demo](https://github.com/FeatureLabs/autonormalize/blob/master/autonormalize/demos/Kaggle%20Liquor%20Sales%20Dataset%20Demo.ipynb)
-* [Demo with Editing Dependencies](https://github.com/FeatureLabs/autonormalize/blob/master/autonormalize/demos/Editing%20Dependnecies%20Demo.ipynb)
-* [Kaggle Food Production Dataset Demo](https://github.com/FeatureLabs/autonormalize/blob/master/autonormalize/demos/Kaggle%20Food%20%20Dataset%20Demo.ipynb)
+- [Blog Post](https://blog.featurelabs.com/automatic-dataset-normalization-for-feature-engineering-in-python/)
+- [Machine Learning Demo with Featuretools](https://github.com/FeatureLabs/autonormalize/blob/master/autonormalize/demos/AutoNormalize%20%2B%20FeatureTools%20Demo.ipynb)
+- [Kaggle Liquor Sales Dataset Demo](https://github.com/FeatureLabs/autonormalize/blob/master/autonormalize/demos/Kaggle%20Liquor%20Sales%20Dataset%20Demo.ipynb)
+- [Demo with Editing Dependencies](https://github.com/FeatureLabs/autonormalize/blob/master/autonormalize/demos/Editing%20Dependnecies%20Demo.ipynb)
+- [Kaggle Food Production Dataset Demo](https://github.com/FeatureLabs/autonormalize/blob/master/autonormalize/demos/Kaggle%20Food%20%20Dataset%20Demo.ipynb)
## API Reference
@@ -44,19 +44,19 @@ Creates a normalized entityset from a dataframe.
**Arguments:**
-* `df` (pd.Dataframe) : the dataframe containing data
+- `df` (pd.DataFrame) : the dataframe containing data
-* `accuracy` (0 < float <= 1.00; default = 0.98) : the accuracy threshold required in order to conclude a dependency (i.e. with accuracy = 0.98, 0.98 of the rows must hold true the dependency LHS --> RHS)
+- `accuracy` (0 < float <= 1.00; default = 0.98) : the accuracy threshold required to conclude a dependency (e.g. with accuracy = 0.98, the dependency LHS --> RHS must hold for at least 98% of the rows)
-* `index` (str, optional) : name of column that is intended index of df
+- `index` (str, optional) : name of the column intended as the index of df
-* `name` (str, optional) : the name of created EntitySet
+- `name` (str, optional) : the name of the created EntitySet
-* `time_index` (str, optional) : name of time column in the dataframe.
+- `time_index` (str, optional) : name of time column in the dataframe.
**Returns:**
-* `entityset` (ft.EntitySet) : created entity set
+- `entityset` (ft.EntitySet) : created entity set
### `find_dependencies`
@@ -68,7 +68,7 @@ Finds dependencies within dataframe with the DFD search algorithm.
**Returns:**
-* `dependencies` (Dependencies) : the dependencies found in the data within the contraints provided
+- `dependencies` (Dependencies) : the dependencies found in the data within the constraints provided
### `normalize_dataframe`
@@ -78,13 +78,13 @@ normalize_dataframe(df, dependencies)
Normalizes dataframe based on the dependencies given. Keys for the newly created DataFrames can only be columns that are strings, ints, or categories. Keys are chosen according to the priority:
-1) shortest lenghts
-2) has "id" in some form in the name of an attribute
-3) has attribute furthest to left in the table
+1. shortest length
+2. has "id" in some form in the name of an attribute
+3. has attribute furthest to left in the table
**Returns:**
-* `new_dfs` (list[pd.DataFrame]) : list of new dataframes
+- `new_dfs` (list[pd.DataFrame]) : list of new dataframes
@@ -98,25 +98,25 @@ Creates a normalized EntitySet from dataframe based on the dependencies given. K
**Returns:**
-* `entityset` (ft.EntitySet) : created EntitySet
+- `entityset` (ft.EntitySet) : created EntitySet
-### `normalize_entity`
+### `normalize_entityset`
```shell
-normalize_entity(es, accuracy=0.98)
+normalize_entityset(es, accuracy=0.98)
```
Returns a new normalized `EntitySet` from an `EntitySet` with a single entity.
**Arguments:**
-* `es` (ft.EntitySet) : EntitySet with a single entity to normalize
+- `es` (ft.EntitySet) : EntitySet with a single dataframe to normalize
**Returns:**
-* `new_es` (ft.EntitySet) : new normalized EntitySet
+- `new_es` (ft.EntitySet) : new normalized EntitySet
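The key-selection priority listed under `normalize_dataframe` (shortest length, then an "id"-like attribute name, then the leftmost attribute) can be read as a lexicographic ranking. The following is a minimal sketch of that reading, not the library's implementation: the function `choose_key`, its arguments, and the sample column names are hypothetical, and the substring test for "id" is a deliberately crude assumption.

```python
from typing import List, Sequence


def choose_key(candidates: Sequence[Sequence[str]],
               column_order: Sequence[str]) -> List[str]:
    """Pick one candidate key using the three-step priority from the README.

    `candidates` is a list of candidate keys (each a list of column names);
    `column_order` is the left-to-right column order of the table.
    """
    position = {name: i for i, name in enumerate(column_order)}

    def rank(key: Sequence[str]):
        # Assumption: "has id in some form" is approximated by a substring test.
        has_id = any("id" in attr.lower() for attr in key)
        leftmost = min(position[attr] for attr in key)
        # Tuples compare element by element, so earlier entries take priority:
        # 1) fewer attributes, 2) contains "id", 3) leftmost attribute.
        return (len(key), not has_id, leftmost)

    return list(min(candidates, key=rank))


columns = ["city", "order_id", "zip_code", "state"]
candidates = [["city", "state"], ["zip_code"], ["order_id"]]
print(choose_key(candidates, columns))
```

In this example the two single-column candidates tie on length, and `order_id` wins on the second criterion. A substring test would also match names such as `width`, so a real implementation would likely need a stricter rule.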
diff --git a/autonormalize/autonormalize.py b/autonormalize/autonormalize.py
index 244bbc9..278b315 100644
--- a/autonormalize/autonormalize.py
+++ b/autonormalize/autonormalize.py
@@ -85,24 +85,31 @@ def make_entityset(df, dependencies, name=None, time_index=None):
normalize.normalize_dataframe(depdf)
normalize.make_indexes(depdf)
- entities = {}
+ dataframes = {}
relationships = []
stack = [depdf]
while stack != []:
current = stack.pop()
+ if current.df.ww.schema is None:
+ current.df.ww.init(index=current.index[0], name=current.index[0])
+
+ current_df_name = current.df.ww.name
if time_index in current.df.columns:
- entities[current.index[0]] = (current.df, current.index[0], time_index)
+ dataframes[current_df_name] = (current.df, current.index[0], time_index)
else:
- entities[current.index[0]] = (current.df, current.index[0])
+ dataframes[current_df_name] = (current.df, current.index[0])
for child in current.children:
+ if child.df.ww.schema is None:
+ child.df.ww.init(index=child.index[0], name=child.index[0])
+ child_df_name = child.df.ww.name
# add to stack
# add relationship
stack.append(child)
- relationships.append((child.index[0], child.index[0], current.index[0], child.index[0]))
+ relationships.append((child_df_name, child.index[0], current_df_name, child.index[0]))
- return ft.EntitySet(name, entities, relationships)
+ return ft.EntitySet(name, dataframes, relationships)
def auto_entityset(df, accuracy=0.98, index=None, name=None, time_index=None):
@@ -141,9 +148,9 @@ def auto_normalize(df):
return normalize_dataframe(df, find_dependencies(df))
-def normalize_entity(es, accuracy=0.98):
+def normalize_entityset(es, accuracy=0.98):
"""
- Returns a new normalized EntitySet from an EntitySet with a single entity.
+ Returns a new normalized EntitySet from an EntitySet with a single dataframe.
Arguments:
es (ft.EntitySet) : EntitySet to normalize
@@ -152,13 +159,14 @@ def normalize_entity(es, accuracy=0.98):
Returns:
new_es (ft.EntitySet) : new normalized EntitySet
"""
- # TO DO: add option to pass an EntitySet with more than one entity, and specify which one
+ # TO DO: add option to pass an EntitySet with more than one dataframe, and specify which one
# to normalize while preserving existing relationships
- if len(es.entities) > 1:
- raise ValueError('There is more than one entity in this EntitySet')
- if len(es.entities) == 0:
+ if len(es.dataframes) > 1:
+ raise ValueError('There is more than one dataframe in this EntitySet')
+ if len(es.dataframes) == 0:
raise ValueError('This EntitySet is empty')
- entity = es.entities[0]
- new_es = auto_entityset(entity.df, accuracy, index=entity.index, name=es.id, time_index=entity.time_index)
+
+ df = es.dataframes[0]
+ new_es = auto_entityset(df, accuracy, index=df.ww.index, name=es.id, time_index=df.ww.time_index)
return new_es
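The stack-based walk in the `make_entityset` hunk above can be illustrated without featuretools or Woodwork. This is a rough sketch under stated assumptions: `Node` is a hypothetical stand-in for the dependency-tree objects, a plain list of column names replaces the real dataframe, and `collect` is an illustrative name rather than part of the library.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for one normalized table in the dependency tree."""
    name: str
    index: str
    columns: List[str]
    children: List["Node"] = field(default_factory=list)


def collect(root: Node, time_index: Optional[str] = None):
    """Walk the tree depth-first, gathering table entries and relationships."""
    dataframes: Dict[str, Tuple] = {}
    relationships: List[Tuple[str, str, str, str]] = []
    stack = [root]
    while stack:
        current = stack.pop()
        # Only tables that actually contain the time column get a time index.
        if time_index in current.columns:
            dataframes[current.name] = (current.columns, current.index, time_index)
        else:
            dataframes[current.name] = (current.columns, current.index)
        for child in current.children:
            stack.append(child)
            # Same tuple order as the diff: child name, key, current name, key.
            relationships.append((child.name, child.index, current.name, child.index))
    return dataframes, relationships


customers = Node("customer_id", "customer_id", ["customer_id", "zip"])
sessions = Node("session_id", "session_id", ["session_id", "customer_id"], [customers])
root = Node("transaction_id", "transaction_id",
            ["transaction_id", "session_id", "time"], [sessions])

dataframes, relationships = collect(root, time_index="time")
print(sorted(dataframes))
print(relationships)
```

Keying both the dictionary and the relationship tuples on one name per table, as the diff does with `ww.name`, keeps the two structures consistent when a table's name differs from its index column.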
diff --git a/autonormalize/tests/test_example.py b/autonormalize/tests/test_example.py
index d8664fd..ac42a63 100644
--- a/autonormalize/tests/test_example.py
+++ b/autonormalize/tests/test_example.py
@@ -1,5 +1,8 @@
import featuretools as ft
+import pandas as pd
+from unittest.mock import patch
+import pytest
import autonormalize as an
@@ -21,3 +24,30 @@ def test_ft_mock_customer():
assert set([str(rel) for rel in entityset.relationships]) == set([' session_id.session_id>',
' product_id.product_id>',
' customer_id.customer_id>'])
+
+
+@patch("autonormalize.autonormalize.auto_entityset")
+def test_normalize_entityset(auto_entityset):
+ df1 = pd.DataFrame({"test": [0, 1, 2]})
+ df2 = pd.DataFrame({"test": [0, 1, 2]})
+ accuracy = 0.98
+
+ es = ft.EntitySet()
+
+ error = "This EntitySet is empty"
+ with pytest.raises(ValueError, match=error):
+ an.normalize_entityset(es, accuracy)
+
+ es.add_dataframe(df1, "df")
+
+ df_out = es.dataframes[0]
+
+ an.normalize_entityset(es, accuracy)
+
+ auto_entityset.assert_called_with(df_out, accuracy, index=df_out.ww.index, name=es.id, time_index=df_out.ww.time_index)
+
+ es.add_dataframe(df2, "df2")
+
+ error = "There is more than one dataframe in this EntitySet"
+ with pytest.raises(ValueError, match=error):
+ an.normalize_entityset(es, accuracy)
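The new test patches `auto_entityset` and checks only that `normalize_entityset` forwards its arguments. The same pattern is shown below using the standard library alone. Everything here is hypothetical scaffolding: `lib`, `build`, and `wrapper` are illustrative stand-ins for the module, `auto_entityset`, and `normalize_entityset`.

```python
from types import SimpleNamespace
from unittest.mock import patch

# Hypothetical module-like namespace holding the function to be patched.
lib = SimpleNamespace()


def build(data, accuracy, name=None):
    """Stand-in for the expensive call that the test replaces with a mock."""
    return (len(data), accuracy, name)


def wrapper(data, accuracy=0.98):
    """Stand-in for the thin wrapper under test; looks up `build` at call time."""
    return lib.build(data, accuracy, name="wrapped")


lib.build = build

# While the patch is active, `lib.build` is a MagicMock that records calls.
with patch.object(lib, "build") as fake_build:
    wrapper([1, 2, 3], 0.9)
    fake_build.assert_called_with([1, 2, 3], 0.9, name="wrapped")

# After the block exits, the original function is restored.
print(wrapper([1, 2, 3], 0.9))
```

Patching works here because `wrapper` resolves `lib.build` when it is called, just as the test patches the name in the module where `normalize_entityset` looks it up.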
diff --git a/dev-requirements.txt b/dev-requirements.txt
index f26d883..2940ec9 100644
--- a/dev-requirements.txt
+++ b/dev-requirements.txt
@@ -3,9 +3,9 @@ codecov==2.1.8
flake8==3.7.8
autopep8==1.4.4
isort==4.3.21
-nbsphinx==0.8.5
-pydata-sphinx-theme==0.4.0
-Sphinx==3.2.1
+nbsphinx==0.8.7
+pydata-sphinx-theme==0.7.1
+Sphinx==4.2.0
nbconvert==6.0.2
ipython==7.16.3
pygments==2.8.1
diff --git a/docs/source/api_reference.rst b/docs/source/api_reference.rst
index 270fd48..88c3af9 100755
--- a/docs/source/api_reference.rst
+++ b/docs/source/api_reference.rst
@@ -16,7 +16,7 @@ Autonormalize
make_entityset
auto_entityset
auto_normalize
- normalize_entity
+ normalize_entityset
Dependencies
======================
diff --git a/docs/source/release_notes.rst b/docs/source/release_notes.rst
index 11edc63..445182f 100755
--- a/docs/source/release_notes.rst
+++ b/docs/source/release_notes.rst
@@ -3,34 +3,41 @@
Release Notes
-------------
-.. Future Release
- ==============
+Future Release
+==============
* Enhancements
* Fixes
+ * Fix compatibility issues with featuretools (:pr:`41`)
* Changes
+ * Rename ``normalize_entity`` to ``normalize_entityset`` (:pr:`41`)
* Documentation Changes
* Testing Changes
-.. Thanks to the following people for contributing to this release:
+ Thanks to the following people for contributing to this release:
+ :user:`dvreed77`
+
+Breaking Changes
+++++++++++++++++
+ * :pr:`41`: The function ``normalize_entity`` has been renamed to ``normalize_entityset``.
v1.0.1 Jan 7, 2022
==================
* Documentation Changes
- * Update release notes and release format (:pr:`37`)
- * Updated sphinx documentation and guides (:pr:`35`)
+ * Update release notes and release format (:pr:`37`)
+ * Updated sphinx documentation and guides (:pr:`35`)
* Testing Changes
- * Updated tests to work with featuretools 1.0 (:pr:`35`)
+ * Updated tests to work with featuretools 1.0 (:pr:`35`)
- Thanks to the following people for contributing to this release:
- :user:`gsheni`, :user:`tuethan1999`
+ Thanks to the following people for contributing to this release:
+ :user:`gsheni`, :user:`tuethan1999`
v1.0.0 Aug 15, 2019
===================
* Initial Release
- Thanks to the following people for contributing to this release:
- :user:`allisonportis`
+ Thanks to the following people for contributing to this release:
+ :user:`allisonportis`
.. command
.. git log --pretty=oneline --abbrev-commit