✨ Add parameter IO support for more formats supported by pandas #896

Merged
merged 32 commits into from
Jan 30, 2022
Commits
5306ea6
renaming parameter package from "csv" to "pandas"
patrickhaetti Nov 8, 2021
1896579
change csv to pandas
patrickhaetti Nov 8, 2021
e3ab7bd
creating xlsx parameter, including it to setup.cfg and set up test_lo…
patrickhaetti Nov 9, 2021
523f710
save/load xlsx parameter: replace inf / -inf with NaN when saving an…
patrickhaetti Nov 10, 2021
8ddf3a1
add module to requirements
patrickhaetti Nov 10, 2021
89d8ded
updated path in test_pandas_parameters.py
patrickhaetti Nov 11, 2021
ca86878
Updated file locations.
patrickhaetti Nov 12, 2021
788e260
Update test_pandas_parameters
patrickhaetti Nov 12, 2021
674679a
builtin/io/pandas/tests: csv added
patrickhaetti Nov 12, 2021
f0054dc
Added tab-separated-values parameter
s-weigand Nov 19, 2021
16fbf82
setup-cfg add requires
patrickhaetti Nov 22, 2021
ed5cf52
tsv.py: removed separators
patrickhaetti Nov 22, 2021
5ff3cbb
Change parameter files in /test/data
patrickhaetti Nov 22, 2021
ae28756
csv parameter inf/-inf: convert minimum/maximum columns
patrickhaetti Nov 22, 2021
9267fbb
Adjusted tests to replacing minimum and maximum default values with e…
patrickhaetti Nov 24, 2021
8f6cdf0
Activate mypy and doc linter for pandas subpackage
s-weigand Nov 24, 2021
1404967
Updated Docstrings in pandas parameter package
patrickhaetti Nov 26, 2021
db5b508
Added functions to ensure minimum/maximum columns exist
patrickhaetti Nov 29, 2021
93d00cb
Take out additional parameter file
patrickhaetti Dec 3, 2021
71de262
Added ods parameter to xlsx.py
Dec 15, 2021
294d200
Use functions safe_parameters_fillna + safe_parameters_replace in xl…
Jan 11, 2022
417a2c3
Added tests in utils/tests/test_io.py
Jan 25, 2022
5631c1a
👌Rename io.pandas unit test data
jsnel Jan 26, 2022
1087038
♻️🧪 Renamed functions in utils.io and raised test coverage to 100%
s-weigand Jan 30, 2022
5f70690
♻️🧪 Refactored tests to reduce code duplication
s-weigand Jan 30, 2022
d50b73b
🧪 Adjusted reference files to pass direct comparison test
s-weigand Jan 30, 2022
501e110
👌 Added as_optimized argument to save_parameters methods
s-weigand Jan 30, 2022
e09eaf5
📚 Updated module descriptions
s-weigand Jan 30, 2022
fc9479f
🚧 Added change to changelog
s-weigand Jan 30, 2022
6203cfe
👌 Made as_optimized keyword only
s-weigand Jan 30, 2022
37bb74a
👌 Made csv and tsv drop inf values consistently by default
s-weigand Jan 30, 2022
9602f53
🧪🗑️ Restored original behaviour of deprecated ParameterGroup.to_csv
s-weigand Jan 30, 2022
2 changes: 1 addition & 1 deletion .github/CODEOWNERS
@@ -33,7 +33,7 @@ LICENSE @glotaran/pyglotaran_creators
 # builtin module:
 /glotaran/builtin/io/* @glotaran/admins
 /glotaran/builtin/io/ascii @jsnel @glotaran/maintainers
-/glotaran/builtin/io/csv @glotaran/maintainers
+/glotaran/builtin/io/pandas @glotaran/maintainers
 /glotaran/builtin/io/netCDF @glotaran/maintainers
 /glotaran/builtin/io/sdt @glotaran/maintainers
 /glotaran/builtin/megacomplexes/ @jsnel @joernweissenborn
4 changes: 2 additions & 2 deletions .pre-commit-config.yaml
@@ -101,15 +101,15 @@ repos:
         args:
           - "--select=D,DAR"
         name: "flake8 lint docstrings"
-        files: "^glotaran/(plugin_system|utils|deprecation|testing|parameter|project|model/property.py)"
+        files: "^glotaran/(plugin_system|utils|deprecation|testing|parameter|project|model/property.py|builtin/io/pandas)"
         exclude: "docs|tests?/"
         additional_dependencies: [flake8-docstrings, darglint==1.8.0]
 
   - repo: https://github.com/pre-commit/mirrors-mypy
     rev: v0.931
     hooks:
       - id: mypy
-        files: "^glotaran/(plugin_system|utils|deprecation|testing|parameter|project|model/property.py)"
+        files: "^glotaran/(plugin_system|utils|deprecation|testing|parameter|project|model/property.py|builtin/io/pandas)"
         exclude: "docs"
         additional_dependencies: [types-all]
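
The effect of extending the `files` pattern can be checked outside of pre-commit. The following is a minimal sketch, assuming the hook selects a path when `files` matches and `exclude` does not, with both treated as Python regular expressions searched against the repository-relative path. The helper name `is_selected` is hypothetical and only exists for this illustration.

```python
import re

# Patterns copied from the two hooks changed in this diff.
FILES = (
    r"^glotaran/(plugin_system|utils|deprecation|testing|parameter|project"
    r"|model/property.py|builtin/io/pandas)"
)
EXCLUDE_DOCSTRING_LINT = r"docs|tests?/"
EXCLUDE_MYPY = r"docs"


def is_selected(path: str, files: str, exclude: str) -> bool:
    """Return True if ``path`` matches ``files`` and does not match ``exclude``."""
    return bool(re.search(files, path)) and not re.search(exclude, path)


module_path = "glotaran/builtin/io/pandas/csv.py"
test_path = "glotaran/builtin/io/pandas/test/test_pandas_parameters.py"
other_plugin_path = "glotaran/builtin/io/netCDF/netCDF.py"

# The new subpackage is linted, its tests only by mypy, other io plugins not at all.
lint_module = is_selected(module_path, FILES, EXCLUDE_DOCSTRING_LINT)
lint_tests = is_selected(test_path, FILES, EXCLUDE_DOCSTRING_LINT)
mypy_tests = is_selected(test_path, FILES, EXCLUDE_MYPY)
lint_other = is_selected(other_plugin_path, FILES, EXCLUDE_DOCSTRING_LINT)
```

Under these assumptions the docstring linter picks up `csv.py` but skips the test module because of the `tests?/` exclusion, while mypy also checks the tests.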
1 change: 1 addition & 0 deletions changelog.md
@@ -12,6 +12,7 @@

 - 👌🎨 Add proper repr for DatasetMapping (#957)
 - 👌 Add SavingOptions to save_result API (#966)
+- ✨ Add parameter IO support for more formats supported by pandas (#896)

 ### 🩹 Bug fixes

Empty file.
18 changes: 0 additions & 18 deletions glotaran/builtin/io/csv/csv.py

This file was deleted.

1 change: 1 addition & 0 deletions glotaran/builtin/io/pandas/__init__.py
@@ -0,0 +1 @@
"""Pandas io package."""
66 changes: 66 additions & 0 deletions glotaran/builtin/io/pandas/csv.py
@@ -0,0 +1,66 @@
"""Module containing CSV io support."""

from __future__ import annotations

import numpy as np
import pandas as pd

from glotaran.io import ProjectIoInterface
from glotaran.io import register_project_io
from glotaran.parameter import ParameterGroup
from glotaran.utils.io import safe_dataframe_fillna
from glotaran.utils.io import safe_dataframe_replace


@register_project_io(["csv"])
class CsvProjectIo(ProjectIoInterface):
    """Plugin for CSV data io."""

    def load_parameters(self, file_name: str, sep: str = ",") -> ParameterGroup:
        """Load parameters from CSV file.

        Parameters
        ----------
        file_name : str
            Name of file to be loaded.
        sep : str
            Column separator used in the file, by default ','

        Returns
        -------
        :class:`ParameterGroup`
        """
        df = pd.read_csv(file_name, skipinitialspace=True, na_values=["None", "none"], sep=sep)
        # Missing bounds mean "unbounded".
        safe_dataframe_fillna(df, "minimum", -np.inf)
        safe_dataframe_fillna(df, "maximum", np.inf)
        return ParameterGroup.from_dataframe(df, source=file_name)

    def save_parameters(
        self,
        parameters: ParameterGroup,
        file_name: str,
        *,
        sep: str = ",",
        as_optimized: bool = True,
        replace_infinfinity: bool = True,
    ) -> None:
        """Save a :class:`ParameterGroup` to a CSV file.

        Parameters
        ----------
        parameters : ParameterGroup
            Parameters to be saved to file.
        file_name : str
            File to write the parameters to.
        sep : str
            Column separator to write, by default ','
        as_optimized : bool
            Whether to include properties which are the result of optimization.
        replace_infinfinity : bool
            Whether to replace infinity values with empty strings.
        """
        df = parameters.to_dataframe(as_optimized=as_optimized)
        if replace_infinfinity is True:
            safe_dataframe_replace(df, "minimum", -np.inf, "")
            safe_dataframe_replace(df, "maximum", np.inf, "")
        df.to_csv(file_name, na_rep="None", index=False, sep=sep)
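
The bound handling above can be tried without glotaran. The sketch below is an illustration only: `fillna_if_present` and `replace_if_present` are hypothetical stand-ins for `safe_dataframe_fillna` and `safe_dataframe_replace`, whose implementations are not part of this diff, and they assume that "safe" means "do nothing when the column is missing". The data frame is made up for the example.

```python
import io

import numpy as np
import pandas as pd


def fillna_if_present(df: pd.DataFrame, column: str, fill_value: float) -> None:
    """Fill NaN values of ``column`` in place; do nothing if the column is absent."""
    if column in df.columns:
        df[column] = df[column].fillna(fill_value)


def replace_if_present(df: pd.DataFrame, column: str, to_replace, value) -> None:
    """Replace ``to_replace`` in ``column`` in place; do nothing if the column is absent."""
    if column in df.columns:
        df[column] = df[column].replace(to_replace, value)


# Made-up parameter table: the first row is unbounded, the second is bounded.
bounds = pd.DataFrame(
    {
        "label": ["a.1", "a.2"],
        "value": [1.0, 2.0],
        "minimum": [-np.inf, 0.0],
        "maximum": [np.inf, 10.0],
    }
)

# Saving: infinite bounds become empty cells instead of "-inf"/"inf".
to_write = bounds.copy()
replace_if_present(to_write, "minimum", -np.inf, "")
replace_if_present(to_write, "maximum", np.inf, "")
csv_text = to_write.to_csv(index=False, na_rep="None")

# Loading: empty cells are read as NaN and turned back into infinite bounds.
loaded = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True, na_values=["None", "none"])
fillna_if_present(loaded, "minimum", -np.inf)
fillna_if_present(loaded, "maximum", np.inf)
fillna_if_present(loaded, "standard-error", 0.0)  # column absent: silently skipped
```

Writing empty cells keeps the file readable in a spreadsheet, and the round trip restores the original bounds because an empty bound is interpreted as "unbounded" on load.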
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
label,value,expression,minimum,maximum,non-negative,vary,standard-error
pure_list.1,1.0,None,,,False,True,None
pure_list.2,2.0,None,,,False,True,None
list_with_options.1,3.0,None,,,False,False,None
list_with_options.2,4.0,None,,,False,False,None
verbose_list.all_defaults,5.0,None,,,False,True,None
verbose_list.no_defaults,6.0,None,,,True,False,None
verbose_list.expression_only,11.0,$verbose_list.all_defaults + $verbose_list.no_defaults,,,False,False,None
Binary file not shown.
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
label	value	expression	minimum	maximum	non-negative	vary	standard-error
pure_list.1	1.0	None			False	True	None
pure_list.2	2.0	None			False	True	None
list_with_options.1	3.0	None			False	False	None
list_with_options.2	4.0	None			False	False	None
verbose_list.all_defaults	5.0	None			False	True	None
verbose_list.no_defaults	6.0	None			True	False	None
verbose_list.expression_only	11.0	$verbose_list.all_defaults + $verbose_list.no_defaults			False	False	None
Binary file not shown.
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
pure_list: [1.0, 2.0]

list_with_options: [3.0, 4.0, {vary: False}]

verbose_list:
- ["all_defaults", 5.0]
- ["no_defaults", 6.0, {non-negative: True, vary: False, minimum: -1, maximum: 1}]
- ["expression_only", {expr: $verbose_list.all_defaults + $verbose_list.no_defaults}]
105 changes: 105 additions & 0 deletions glotaran/builtin/io/pandas/test/test_pandas_parameters.py
@@ -0,0 +1,105 @@
from __future__ import annotations

from pathlib import Path

import numpy as np
import pandas as pd
import pytest
from pandas.testing import assert_frame_equal

from glotaran.io import load_parameters
from glotaran.io import save_parameters
from glotaran.parameter import ParameterGroup

PANDAS_TEST_DATA = Path(__file__).parent / "data"
PATH_XLSX = PANDAS_TEST_DATA / "reference_parameters.xlsx"
PATH_ODS = PANDAS_TEST_DATA / "reference_parameters.ods"
PATH_CSV = PANDAS_TEST_DATA / "reference_parameters.csv"
PATH_TSV = PANDAS_TEST_DATA / "reference_parameters.tsv"


@pytest.fixture(scope="module")
def yaml_reference() -> ParameterGroup:
    """Fixture for yaml reference data."""
    return load_parameters(PANDAS_TEST_DATA / "reference_parameters.yaml")


@pytest.mark.parametrize("reference_path", (PATH_XLSX, PATH_ODS, PATH_CSV, PATH_TSV))
def test_references(yaml_reference: ParameterGroup, reference_path: Path):
    """References are the same"""
    result = load_parameters(reference_path)
    assert result == yaml_reference


@pytest.mark.parametrize(
    "format_name,reference_path",
    (("xlsx", PATH_XLSX), ("ods", PATH_ODS), ("csv", PATH_CSV), ("tsv", PATH_TSV)),
)
def test_roundtrips(
    yaml_reference: ParameterGroup, tmp_path: Path, format_name: str, reference_path: Path
):
    """Roundtrip via save and load have the same data."""
    format_reference = load_parameters(reference_path)
    parameter_path = tmp_path / f"test_parameters.{format_name}"
    save_parameters(file_name=parameter_path, format_name=format_name, parameters=yaml_reference)
    parameters_roundtrip = load_parameters(parameter_path)

    assert parameters_roundtrip == yaml_reference
    assert parameters_roundtrip == format_reference

    if format_name in {"csv", "tsv"}:
        assert parameter_path.read_text() == reference_path.read_text()

        first_data_line = parameter_path.read_text().splitlines()[1]
        sep = "," if format_name == "csv" else "\t"

        assert f"{sep}-inf" not in first_data_line
        assert f"{sep}inf" not in first_data_line
    else:
        assert_frame_equal(
            pd.read_excel(parameter_path, na_values=["None", "none"]),
            pd.read_excel(reference_path, na_values=["None", "none"]),
        )


@pytest.mark.parametrize("format_name", ("xlsx", "ods", "csv", "tsv"))
def test_as_optimized_false(yaml_reference: ParameterGroup, tmp_path: Path, format_name: str):
    """Column 'standard-error' is missing if as_optimized==False"""
    parameter_path = tmp_path / f"test_parameters.{format_name}"
    save_parameters(
        file_name=parameter_path,
        format_name=format_name,
        parameters=yaml_reference,
        as_optimized=False,
    )

    if format_name in {"csv", "tsv"}:
        assert "standard-error" not in parameter_path.read_text().splitlines()[0]
    else:
        assert (
            "standard-error"
            not in pd.read_excel(parameter_path, na_values=["None", "none"]).columns
        )


@pytest.mark.parametrize("format_name,sep", (("csv", ","), ("tsv", "\t")))
def test_replace_infinfinity(
    yaml_reference: ParameterGroup, tmp_path: Path, format_name: str, sep: str
):
    """Infinity values are written to the file if replace_infinfinity==False"""
    parameter_path = tmp_path / f"test_parameters.{format_name}"
    save_parameters(
        file_name=parameter_path,
        format_name=format_name,
        parameters=yaml_reference,
        replace_infinfinity=False,
    )
    df = pd.read_csv(parameter_path, sep=sep)
    assert all(df["minimum"] == -np.inf)
    assert all(df["maximum"] == np.inf)

    first_data_line = parameter_path.read_text().splitlines()[1]
    assert f"{sep}-inf" in first_data_line
    assert f"{sep}inf" in first_data_line

    assert load_parameters(parameter_path) == yaml_reference
62 changes: 62 additions & 0 deletions glotaran/builtin/io/pandas/tsv.py
@@ -0,0 +1,62 @@
"""Module containing TSV io support."""

from __future__ import annotations

from typing import TYPE_CHECKING

from glotaran.io import ProjectIoInterface
from glotaran.io import load_parameters
from glotaran.io import register_project_io
from glotaran.io import save_parameters

if TYPE_CHECKING:
    from glotaran.parameter import ParameterGroup


@register_project_io(["tsv"])
class TsvProjectIo(ProjectIoInterface):
    """Plugin for TSV data io."""

    def load_parameters(self, file_name: str) -> ParameterGroup:
        """Load parameters from TSV file.

        Parameters
        ----------
        file_name : str
            Name of file to be loaded.

        Returns
        -------
        :class:`ParameterGroup`
        """
        # TSV is handled by the CSV plugin with a tab separator.
        return load_parameters(file_name, format_name="csv", sep="\t")

    def save_parameters(
        self,
        parameters: ParameterGroup,
        file_name: str,
        *,
        as_optimized: bool = True,
        replace_infinfinity: bool = True,
    ) -> None:
        """Save a :class:`ParameterGroup` to a TSV file.

        Parameters
        ----------
        parameters : ParameterGroup
            Parameters to be saved to file.
        file_name : str
            File to write the parameters to.
        as_optimized : bool
            Whether to include properties which are the result of optimization.
        replace_infinfinity : bool
            Whether to replace infinity values with empty strings.
        """
        save_parameters(
            parameters,
            file_name,
            format_name="csv",
            sep="\t",
            as_optimized=as_optimized,
            replace_infinfinity=replace_infinfinity,
        )
53 changes: 53 additions & 0 deletions glotaran/builtin/io/pandas/xlsx.py
@@ -0,0 +1,53 @@
"""Module containing Excel-like io support."""

from __future__ import annotations

import numpy as np
import pandas as pd

from glotaran.io import ProjectIoInterface
from glotaran.io import register_project_io
from glotaran.parameter import ParameterGroup
from glotaran.utils.io import safe_dataframe_fillna
from glotaran.utils.io import safe_dataframe_replace


@register_project_io(["xlsx", "ods"])
class ExcelProjectIo(ProjectIoInterface):
    """Plugin for Excel-like data io."""

    def load_parameters(self, file_name: str) -> ParameterGroup:
        """Load parameters from an Excel-like (XLSX or ODS) file.

        Parameters
        ----------
        file_name : str
            Name of file to be loaded.

        Returns
        -------
        :class:`ParameterGroup`
        """
        df = pd.read_excel(file_name, na_values=["None", "none"])
        # Missing bounds mean "unbounded".
        safe_dataframe_fillna(df, "minimum", -np.inf)
        safe_dataframe_fillna(df, "maximum", np.inf)
        return ParameterGroup.from_dataframe(df, source=file_name)

    def save_parameters(
        self, parameters: ParameterGroup, file_name: str, *, as_optimized: bool = True
    ):
        """Save a :class:`ParameterGroup` to an Excel-like file.

        Parameters
        ----------
        parameters : ParameterGroup
            Parameters to be saved to file.
        file_name : str
            File to write the parameters to.
        as_optimized : bool
            Whether to include properties which are the result of optimization.
        """
        df = parameters.to_dataframe(as_optimized=as_optimized)
        safe_dataframe_replace(df, "minimum", -np.inf, "")
        safe_dataframe_replace(df, "maximum", np.inf, "")
        df.to_excel(file_name, na_rep="None", index=False)
8 changes: 7 additions & 1 deletion glotaran/parameter/parameter_group.py
@@ -323,7 +323,13 @@ def to_csv(self, filename: str, delimiter: str = ",") -> None:
         delimiter : str
             Character to separate columns., by default ","
         """
-        save_parameters(self, file_name=filename, allow_overwrite=True, sep=delimiter)
+        save_parameters(
+            self,
+            file_name=filename,
+            allow_overwrite=True,
+            sep=delimiter,
+            replace_infinfinity=False,
+        )
 
     def add_parameter(self, parameter: Parameter | list[Parameter]):
         """Add a :class:`Parameter` to the group.
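
The `to_csv` hunk above pins `replace_infinfinity=False` so that the deprecated method keeps writing `-inf`/`inf` while the new API defaults to empty cells. A minimal sketch of that "new default, old behaviour preserved in the legacy wrapper" pattern follows. `format_bounds` and `legacy_format_bounds` are hypothetical functions written for this illustration, and the explicit `DeprecationWarning` is an assumption about how such a wrapper is typically marked, not a copy of glotaran's deprecation machinery.

```python
import math
import warnings


def format_bounds(minimum: float, maximum: float, *, replace_infinity: bool = True) -> str:
    """Format a bounds pair as two CSV cells; infinite bounds become empty by default."""

    def cell(value: float) -> str:
        if replace_infinity and math.isinf(value):
            return ""
        return str(value)

    return f"{cell(minimum)},{cell(maximum)}"


def legacy_format_bounds(minimum: float, maximum: float) -> str:
    """Deprecated entry point that keeps the output it always produced."""
    warnings.warn(
        "legacy_format_bounds is deprecated, use format_bounds instead",
        DeprecationWarning,
        stacklevel=2,
    )
    # Opt out of the new default so existing callers see unchanged files.
    return format_bounds(minimum, maximum, replace_infinity=False)


new_cells = format_bounds(-math.inf, math.inf)
with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)  # silence the warning for this demo
    old_cells = legacy_format_bounds(-math.inf, math.inf)
```

Here `new_cells` holds two empty cells and `old_cells` holds `-inf,inf`, which mirrors the split between `save_parameters` and the deprecated `ParameterGroup.to_csv`.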