Add functionality for linked DynamicTables #645

Merged
merged 63 commits into from
Jul 21, 2021
63 commits
d3e8843
Updated DynamicTable to set the name of DataFrames and allow introspe…
oruebel Jul 16, 2021
d04c676
Added module to convert nested DataFrames to a flat DataFrame
oruebel Jul 16, 2021
3c58e46
Added tests for the new functionality for nested DataFrames and Dynam…
oruebel Jul 16, 2021
8a15d83
Updated Changelog
oruebel Jul 16, 2021
8fe6546
Fix flake8 on tests
oruebel Jul 16, 2021
53680ec
Added PR info in Changelog
oruebel Jul 16, 2021
26d0abb
Added first draft of to_hierarchical_dataframe and to_denormalized_dat…
oruebel Jul 16, 2021
28f513e
Added AlignedDynamicTable.get function to Fix #646
oruebel Jul 16, 2021
0fdbbd0
Add functions to drop id columns and flatten the columns of a hierarc…
oruebel Jul 16, 2021
204c595
Remove to_denormalized_dataframe and drop flatten_index parameter fro…
oruebel Jul 17, 2021
5880b8d
Fix Flake8 and Changelog
oruebel Jul 17, 2021
f6e465a
Added TODO item
oruebel Jul 17, 2021
f0d2191
Add missing docstrings for extend and get methods for Data and Vector…
oruebel Jul 17, 2021
2d45ef3
Remove unit test for removed function
oruebel Jul 17, 2021
9dae845
Use docval to document hierarchicaltable.py functions
oruebel Jul 18, 2021
8281895
Fix docstring formatting
oruebel Jul 18, 2021
b442917
Document allow_extra and allow_positional in docval
oruebel Jul 18, 2021
6a108ae
Minor spelling error in docstring
oruebel Jul 18, 2021
0f15c02
Added AlignedDynamicTable.has_foreign_columns and corresponding test
oruebel Jul 18, 2021
4d5a0f3
Added AlignedDynamicTable.get_foreign_columns
oruebel Jul 19, 2021
2901d46
Implemented AlignedDynamicTable.get_linked_tables and added tests
oruebel Jul 19, 2021
dc6f223
Add test to make sure DynamicTableRegion to AlignedDynamicTable is wo…
oruebel Jul 19, 2021
15ab0d3
Fix case where DynamicTableRegion is a regular VectorData not a Vecto…
oruebel Jul 19, 2021
36ecc81
Add a test for to_hierarchical_dataframe
oruebel Jul 19, 2021
19dba59
Merge branch 'dev' into add/hierarchical_table_funcs
oruebel Jul 19, 2021
24416a3
Add unit test for drop id columns
oruebel Jul 19, 2021
19d8a28
Fix bugs in drop_id_columns and flatten_column_index
oruebel Jul 19, 2021
76de9f1
Add tests for drop_id_columns and flatten_column_index
oruebel Jul 19, 2021
a43fba2
Add no-level to_hierarchical_dataframe test to improve coverage
oruebel Jul 19, 2021
ff8e5cc
Add test for to_hierarchical_dataframe with multiple levels and using…
oruebel Jul 19, 2021
7810b6e
Remove unnecessary error check
oruebel Jul 19, 2021
a7f0bef
Fix bug in flatten_column_index where column names got shortened when…
oruebel Jul 19, 2021
5f54d3c
Testing the corner case of to_hierarchical_dataframe on empty tables
oruebel Jul 19, 2021
64b2603
Fix spelling error in test
oruebel Jul 19, 2021
6a00315
Test to_hierarchical_dataframe without vectorindex on the top dtr
oruebel Jul 19, 2021
9dc947a
Cover the case where index dtr is at the last level in convert to_hie…
oruebel Jul 19, 2021
5b1ba20
Fix name of test cases for to_hierarchical_dataframe
oruebel Jul 19, 2021
9749477
Remove unused logic in to_hierarchical_dataframe
oruebel Jul 19, 2021
d04c213
Fix flake8
oruebel Jul 19, 2021
f561604
Update docstring of get_foreign_columns
oruebel Jul 20, 2021
a67673d
Update docstring of to_hierarchical_dataframe
oruebel Jul 20, 2021
d068ec7
Fix spelling in src/hdmf/common/hierarchicaltable.py
oruebel Jul 20, 2021
2f0da57
Update spelling in src/hdmf/common/hierarchicaltable.py
oruebel Jul 20, 2021
c6fffa9
Update spelling in src/hdmf/common/hierarchicaltable.py
oruebel Jul 20, 2021
dcac2f8
Removed old comment src/hdmf/common/hierarchicaltable.py
oruebel Jul 20, 2021
ca0b3e0
Update docstring src/hdmf/common/table.py
oruebel Jul 20, 2021
2f1cc79
Update docstring src/hdmf/data_utils.py
oruebel Jul 20, 2021
6a8b447
Clean up comments and duplicate logic to simplify to_hierarchical_dat…
oruebel Jul 20, 2021
ec37ab8
Merge branch 'dev' into add/hierarchical_table_funcs
oruebel Jul 20, 2021
90b4499
Added AlignedDynamicTable.get_colnames function and corresponding tests
oruebel Jul 20, 2021
f7962d2
Updated to_hierarchical_dataframe to use new AlignedDynamicTable.get_…
oruebel Jul 20, 2021
d5fd522
Update src/hdmf/common/hierarchicaltable.py
oruebel Jul 20, 2021
2c0f94b
Update src/hdmf/common/hierarchicaltable.py
oruebel Jul 20, 2021
72427ca
Update src/hdmf/common/hierarchicaltable.py
oruebel Jul 20, 2021
6234b1c
Remove comments from old tests
oruebel Jul 20, 2021
c5e6b2b
Fix #651 Support [int, str], [int, (str, str)] type slicing for Align…
oruebel Jul 20, 2021
58b8f43
Fix bug in to_hierarchical_dataframe when converting AlignedDynamicTa…
oruebel Jul 20, 2021
165229c
Fix spelling in src/hdmf/common/alignedtable.py
oruebel Jul 21, 2021
0ff0aae
Clarify return value for AlignedDynamicTable.get_colnames
oruebel Jul 21, 2021
e954668
Added documentation for the various cases for AlignedDynamicTable.get
oruebel Jul 21, 2021
2a633f8
Update failing test
oruebel Jul 21, 2021
76fdccd
Update src/hdmf/common/alignedtable.py
rly Jul 21, 2021
15480ae
Update src/hdmf/common/alignedtable.py
rly Jul 21, 2021
24 changes: 24 additions & 0 deletions CHANGELOG.md
@@ -2,8 +2,32 @@

## Upcoming (TBD)

### New features
- Added several features to simplify interaction with ``DynamicTable`` objects that link to other tables via
  ``DynamicTableRegion`` columns. @oruebel (#645)
  - Added ``DynamicTable.get_foreign_columns`` to find all columns in a table that are a ``DynamicTableRegion``
  - Added ``DynamicTable.has_foreign_columns`` to identify if a ``DynamicTable`` contains ``DynamicTableRegion`` columns
  - Added ``DynamicTable.get_linked_tables`` to retrieve all tables linked to either directly or indirectly from
    the current table via ``DynamicTableRegion``
  - Implemented the new ``get_foreign_columns``, ``has_foreign_columns``, and ``get_linked_tables`` also for
    ``AlignedDynamicTable``
  - Added new module ``hdmf.common.hierarchicaltable`` with helper functions to facilitate conversion of
    hierarchically nested ``DynamicTable`` objects via the following new functions:
    - ``to_hierarchical_dataframe`` to merge linked tables into a single consolidated pandas DataFrame.
    - ``drop_id_columns`` to remove "id" columns from a DataFrame.
    - ``flatten_column_index`` to replace a ``pandas.MultiIndex`` with a regular ``pandas.Index``
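
To illustrate what "linked to either directly or indirectly" means for ``get_linked_tables``, here is a minimal sketch in plain Python. It does not use HDMF: the ``TABLES`` dictionary and the three functions are hypothetical stand-ins that only borrow the method names and the result keys (``source_table``, ``source_column``, ``target_table``) from this PR. Unlike the real methods, the sketch stores table and column names rather than the objects themselves.

```python
from collections import deque

# Hypothetical stand-in data (not HDMF objects): each table maps a column name
# to the name of the table it references, or to None for a plain data column.
TABLES = {
    "sweeps": {"stimulus": None, "recordings": "recordings"},
    "recordings": {"electrode": "electrodes", "gain": None},
    "electrodes": {"location": None},
}


def get_foreign_columns(table_name, tables=TABLES):
    """Return the names of the columns that reference another table."""
    return [col for col, target in tables[table_name].items() if target is not None]


def has_foreign_columns(table_name, tables=TABLES):
    """Return True if the table has at least one referencing column."""
    return len(get_foreign_columns(table_name, tables)) > 0


def get_linked_tables(table_name, tables=TABLES):
    """Collect every link reachable directly or indirectly from table_name."""
    links = []
    visited = {table_name}
    queue = deque([table_name])
    while queue:
        source = queue.popleft()
        for column in get_foreign_columns(source, tables):
            target = tables[source][column]
            links.append({"source_table": source,
                          "source_column": column,
                          "target_table": target})
            # Follow each target only once so that cyclic links terminate.
            if target not in visited:
                visited.add(target)
                queue.append(target)
    return links
```

Calling ``get_linked_tables("sweeps")`` on this toy data reports the direct link to ``recordings`` first and then the indirect link from ``recordings`` to ``electrodes``.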

### Bug fixes
- Do not build wheels compatible with Python 2 because HDMF requires Python 3.7. @rly (#642)
- ``AlignedDynamicTable`` did not overwrite its ``get`` function. When using ``DynamicTableRegion`` to reference ``AlignedDynamicTable`` this led to cases where the columns of the category subtables were omitted during data access (e.g., conversion to pandas.DataFrame). This fix adds the ``AlignedDynamicTable.get`` based on the existing ``AlignedDynamicTable.__getitem__``. @oruebel (#645)
- Fixed #651 to support selection of cells in an ``AlignedDynamicTable`` via slicing with ``[int, (str, str)]`` (and ``[int, str, str]``) to select a single cell, and ``[int, str]`` to select a single row of a category table. @oruebel (#645)
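
The tuple forms named in the #651 fix can be summarized by a small normalization routine. The sketch below is an illustration only: ``normalize_selection`` is a hypothetical helper, not part of HDMF, and it simplifies the real dispatch, which delegates part of the work to the category table's own indexing. It maps each accepted tuple to a ``(row, category, column)`` triple, using ``None`` for a part that was not given.

```python
def normalize_selection(item):
    """Map a tuple selection to (row, category, column); None means "not given"."""
    if not isinstance(item, tuple) or len(item) not in (2, 3):
        raise ValueError("Expected a tuple of length 2 or 3")
    row_first = isinstance(item[0], int)
    if len(item) == 2:
        if row_first and isinstance(item[1], tuple):
            # [row, (category, column)] selects a single cell
            category, column = item[1]
            return (item[0], category, column)
        if row_first:
            # [row, category] selects one row of a category table
            return (item[0], item[1], None)
        if isinstance(item[1], int):
            # [category, row] also selects one row of a category table
            return (item[1], item[0], None)
        # [category, column] selects a whole column
        return (None, item[0], item[1])
    if row_first:
        # [row, category, column] selects a single cell
        return item
    # [category, column, row] selects a single cell
    return (item[2], item[0], item[1])
```

Under these assumptions the three cell forms ``(0, ("cat", "col"))``, ``(0, "cat", "col")`` and ``("cat", "col", 0)`` all normalize to the same triple, which is the equivalence the fix is meant to provide.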

### Minor improvements
- Updated ``DynamicTable.to_dataframe()`` and ``DynamicTable.get`` functions to set the ``.name`` attribute
on generated pandas DataFrame objects. @oruebel (#645)
- Added ``AlignedDynamicTable.get_colnames(...)`` to support look-up of the full list of columns, since the
``AlignedDynamicTable.colnames`` property only includes the columns of the main table for compliance with
``DynamicTable``. @oruebel (#645)
- Fix documentation for `DynamicTable.get` and `DynamicTableRegion.get`. @rly (#650)
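
The two return shapes of ``AlignedDynamicTable.get_colnames`` mentioned above can be sketched without HDMF. The function below is a hypothetical stand-alone version that takes the table name, the main-table column names, and a dictionary of category columns as plain arguments; only the two keyword names and the ordering of the result are taken from the diff in this PR.

```python
def get_colnames(main_name, main_cols, category_tables,
                 include_category_tables=False, ignore_category_ids=False):
    """Sketch of the column look-up for a main table plus category tables.

    main_cols: list of column names of the main table
    category_tables: dict mapping category name -> list of column names
    """
    if not include_category_tables:
        # Same information as the plain colnames attribute
        return list(main_cols)
    columns = [(main_name, c) for c in main_cols]
    for category, cols in category_tables.items():
        if not ignore_category_ids:
            # Each category table carries its own id column
            columns.append((category, "id"))
        columns.extend((category, c) for c in cols)
    return columns


# Hypothetical example data
example = get_colnames("trials", ["start", "stop"], {"stimulus": ["amplitude"]},
                       include_category_tables=True)
```

With the example data, ``example`` holds ``("trials", "start")``, ``("trials", "stop")``, ``("stimulus", "id")`` and ``("stimulus", "amplitude")``, in that order.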

## HDMF 3.0.1 (July 7, 2021)
182 changes: 168 additions & 14 deletions src/hdmf/common/alignedtable.py
@@ -20,6 +20,10 @@ class AlignedDynamicTable(DynamicTable):
defines a 2-level table in which the main data is stored in the main table implemented by this type
and additional columns of the table are grouped into categories, with each category being
represented by a separate DynamicTable stored within the group.

NOTE: To remain compatible with DynamicTable, the attribute colnames represents only the
columns of the main table (not including the category tables). To get the full list of
column names, use the get_colnames() function instead.
"""
__fields__ = ({'name': 'category_tables', 'child': True}, )

@@ -209,6 +213,28 @@ def add_row(self, **kwargs):
for category, values in category_data.items():
self.category_tables[category].add_row(**values)

@docval({'name': 'include_category_tables', 'type': bool,
'doc': "Include the columns of the category tables in addition to those of the main table", 'default': False},
{'name': 'ignore_category_ids', 'type': bool,
'doc': "Ignore id columns of sub-category tables", 'default': False})
def get_colnames(self, **kwargs):
"""Get the full list of names of columns for this table

:returns: List of tuples (str, str) where the first string is the name of the DynamicTable
that contains the column and the second string is the name of the column. If
include_category_tables is False, then a list of column names is returned.
"""
if not getargs('include_category_tables', kwargs):
return self.colnames
else:
ignore_category_ids = getargs('ignore_category_ids', kwargs)
columns = [(self.name, c) for c in self.colnames]
for category in self.category_tables.values():
if not ignore_category_ids:
columns += [(category.name, 'id'), ]
columns += [(category.name, c) for c in category.colnames]
return columns

@docval({'name': 'ignore_category_ids', 'type': bool,
'doc': "Ignore id columns of sub-category tables", 'default': False})
def to_dataframe(self, **kwargs):
@@ -225,21 +251,62 @@ def to_dataframe(self, **kwargs):

def __getitem__(self, item):
"""
:param item: Selection defining the items of interest. This may be a
Called to implement standard array slicing syntax.

* **int, list, array, slice** : Return one or multiple row of the table as a DataFrame
* **string** : Return a single category table as a DynamicTable or a single column of the
primary table as a
* **tuple**: Get a column, row, or cell from a particular category. The tuple is expected to consist
of (category, selection) where category may be a string with the name of the sub-category
or None (or the name of this AlignedDynamicTable) if we want to slice into the main table.
Same as ``self.get(item)``. See :py:meth:`~hdmf.common.alignedtable.AlignedDynamicTable.get` for details.
"""
return self.get(item)

:returns: DataFrame when retrieving a row or category. Returns scalar when selecting a cell.
Returns a VectorData/VectorIndex when retrieving a single column.
def get(self, item, **kwargs):
"""
Access elements (rows, columns, category tables etc.) from the table. Instead of calling
this function directly, the class also implements standard array slicing syntax
via :py:meth:`~hdmf.common.alignedtable.AlignedDynamicTable.__getitem__`
(which calls this function). For example, instead of calling
``self.get(item=slice(2,5))`` we may use the often more convenient form of ``self[2:5]`` instead.

:param item: Selection defining the items of interest. This may be either a:

* **int, list, array, slice** : Return one or more rows of the table as a pandas.DataFrame. For example:
* ``self[0]`` : Select the first row of the table
* ``self[[0,3]]`` : Select the first and fourth rows of the table
* ``self[1:4]`` : Select the rows with index 1,2,3 from the table

* **string** : Return a column from the main table or a category table. For example:
* ``self['column']`` : Return the column from the main table.
* ``self['my_category']`` : Returns a DataFrame of the ``my_category`` category table.
This is a shorthand for ``self.get_category('my_category').to_dataframe()``.

* **tuple**: Get a column, row, or cell from a particular category table.
The tuple is expected to consist of the following elements:

* ``category``: string with the name of the category. To select from the main
table use ``self.name`` or ``None``.
* ``column``: string with the name of the column, and
* ``row``: integer index of the row.

The tuple itself then may take the following forms:

* Select a single column from a table via:
* ``self[category, column]``
* Select a single full row of a given category table via:
* ``self[row, category]`` (recommended, for consistency with DynamicTable)
* ``self[category, row]``
* Select a single cell via:
* ``self[row, (category, column)]`` (recommended, for consistency with DynamicTable)
* ``self[row, category, column]``
* ``self[category, column, row]``

:returns: Depending on the type of selection the function returns a:

* **pandas.DataFrame**: when retrieving a row or category table
* **array** : when retrieving a single column
* **single value** : when retrieving a single cell. The data type and shape will depend on the
data type and shape of the cell/column.
"""
if isinstance(item, (int, list, np.ndarray, slice)):
# get a single full row from all tables
dfs = ([super().__getitem__(item).reset_index(), ] +
dfs = ([super().get(item, **kwargs).reset_index(), ] +
[category[item].reset_index() for category in self.category_tables.values()])
names = [self.name, ] + list(self.category_tables.keys())
res = pd.concat(dfs, axis=1, keys=names)
@@ -248,14 +315,101 @@ def __getitem__(self, item):
elif isinstance(item, str) or item is None:
if item in self.colnames:
# get a specific column
return super().__getitem__(item)
return super().get(item, **kwargs)
else:
# get a single category
return self.get_category(item).to_dataframe()
elif isinstance(item, tuple):
if len(item) == 2:
return self.get_category(item[0])[item[1]]
# DynamicTable allows selection of cells via the syntax [int, str], i.e., [row_index, columnname]
# We support this syntax here as well with the additional caveat that in AlignedDynamicTable
# columns are identified by tuples of strings. As such [int, str] refers not to a cell but
# a single row in a particular category table (i.e., [row_index, category]). To select a cell
# the second part of the item then is a tuple of strings, i.e., [row_index, (category, column)]
if isinstance(item[0], (int, np.integer)):
# Select a single cell or row of a sub-table based on the row index (item[0])
# and the category (if item[1] is a string) or column (if item[1] is a tuple of (category, column))
re = self[item[0]][item[1]]
# re is a pandas.Series or pandas.DataFrame. If we selected a single cell
# (i.e., item[1] was a tuple defining a particular column) then return the value of the cell
if re.size == 1:
re = re.values[0]
# If we selected a single cell from a ragged column then we need to change the list to a tuple
if isinstance(re, list):
re = tuple(re)
# We selected a row of a whole table (i.e., item[1] identified only the category table,
# but not a particular column).
# Change the result from a pandas.Series to a pandas.DataFrame for consistency with DynamicTable
if isinstance(re, pd.Series):
re = re.to_frame()
return re
else:
return self.get_category(item[0])[item[1]]
elif len(item) == 3:
return self.get_category(item[0])[item[1]][item[2]]
if isinstance(item[0], (int, np.integer)):
return self.get_category(item[1])[item[2]][item[0]]
else:
return self.get_category(item[0])[item[1]][item[2]]
else:
raise ValueError("Expected tuple of length 2 or 3 with (category, column, row) as value.")
raise ValueError("Expected tuple of length 2 of the form [category, column], [row, category], "
"[row, (category, column)] or a tuple of length 3 of the form "
"[category, column, row], [row, category, column]")

@docval({'name': 'ignore_category_tables', 'type': bool,
'doc': "Ignore the category tables and only check in the main table columns", 'default': False},
allow_extra=False)
def has_foreign_columns(self, **kwargs):
"""
Does the table contain DynamicTableRegion columns?

:returns: True if the table or any of the category tables contains a DynamicTableRegion column, else False
"""
ignore_category_tables = getargs('ignore_category_tables', kwargs)
if super().has_foreign_columns():
return True
if not ignore_category_tables:
for table in self.category_tables.values():
if table.has_foreign_columns():
return True
return False

@docval({'name': 'ignore_category_tables', 'type': bool,
'doc': "Ignore the category tables and only check in the main table columns", 'default': False},
allow_extra=False)
def get_foreign_columns(self, **kwargs):
"""
Determine the names of all columns that link to another DynamicTable, i.e.,
find all DynamicTableRegion type columns. Similar to a foreign key in a
database, a DynamicTableRegion column references elements in another table.

:returns: List of tuples (str, str) where the first string is the name of the
category table (or None if the column is in the main table) and the
second string is the column name.
"""
ignore_category_tables = getargs('ignore_category_tables', kwargs)
col_names = [(None, col_name) for col_name in super().get_foreign_columns()]
if not ignore_category_tables:
for table in self.category_tables.values():
col_names += [(table.name, col_name) for col_name in table.get_foreign_columns()]
return col_names

@docval(*get_docval(DynamicTable.get_linked_tables),
{'name': 'ignore_category_tables', 'type': bool,
'doc': "Ignore the category tables and only check in the main table columns", 'default': False},
allow_extra=False)
def get_linked_tables(self, **kwargs):
"""
Get the full list of all tables that are linked to directly or indirectly
from this table via DynamicTableRegion columns included in this table or in any table that
can be reached through DynamicTableRegion columns


Returns: List of dicts with the following keys:
* 'source_table' : The source table containing the DynamicTableRegion column
* 'source_column' : The relevant DynamicTableRegion column in the 'source_table'
* 'target_table' : The target DynamicTable; same as source_column.table.

"""
ignore_category_tables = getargs('ignore_category_tables', kwargs)
other_tables = None if ignore_category_tables else list(self.category_tables.values())
return super().get_linked_tables(other_tables=other_tables)
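
The ``(category, column)`` convention used by the aligned-table versions of ``get_foreign_columns`` and ``has_foreign_columns`` can be illustrated with a short, HDMF-free sketch. The functions and the example data below are hypothetical: columns are represented by a dictionary that maps each column name to a flag saying whether it references another table. Only the ``ignore_category_tables`` keyword and the use of ``None`` for main-table columns follow the diff above.

```python
def get_foreign_columns(main_columns, category_tables, ignore_category_tables=False):
    """Return (category, column) tuples for all referencing columns.

    main_columns: dict mapping column name -> True if it references another table
    category_tables: dict mapping category name -> dict of the same form
    """
    # Columns of the main table are reported with None as the category
    col_names = [(None, name) for name, is_ref in main_columns.items() if is_ref]
    if not ignore_category_tables:
        for category, columns in category_tables.items():
            col_names += [(category, name) for name, is_ref in columns.items() if is_ref]
    return col_names


def has_foreign_columns(main_columns, category_tables, ignore_category_tables=False):
    """Return True if any inspected table has a referencing column."""
    found = get_foreign_columns(main_columns, category_tables, ignore_category_tables)
    return len(found) > 0


# Hypothetical example data
main = {"start": False, "probe": True}
categories = {"stimulus": {"waveform": True, "amplitude": False},
              "response": {"voltage": False}}
```

With this data, ``get_foreign_columns(main, categories)`` lists the main-table column first as ``(None, "probe")`` and then ``("stimulus", "waveform")``. Passing ``ignore_category_tables=True`` restricts the result to the main table.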