Bump pandas from 1.5.3 to 2.0.3 #422

Status: Open. Wants to merge 20 commits into base: main.
Note: the diff below shows changes from 2 of the 20 commits.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -46,6 +46,7 @@ Inspired from [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
- updating listing file with three v2 sparse model - by @dhrubo-os ([#412](https://github.com/opensearch-project/opensearch-py-ml/pull/412))
- Update model upload history - opensearch-project/opensearch-neural-sparse-encoding-doc-v2-mini (v.1.0.0)(TORCH_SCRIPT) by @dhrubo-os ([#417](https://github.com/opensearch-project/opensearch-py-ml/pull/417))
- Update model upload history - opensearch-project/opensearch-neural-sparse-encoding-v2-distill (v.1.0.0)(TORCH_SCRIPT) by @dhrubo-os ([#419](https://github.com/opensearch-project/opensearch-py-ml/pull/419))
- Bump pandas from 1.5.3 to 2.0.3 bu @yerzhaisang ([#422](https://github.com/opensearch-project/opensearch-py-ml/pull/422))

Reviewer: by*

Author (@Yerzhaisang): Fixed

### Fixed
- Fix the wrong final zip file name in model_uploader workflow, now will name it by the upload_prefix alse.([#413](https://github.com/opensearch-project/opensearch-py-ml/pull/413/files))
2 changes: 1 addition & 1 deletion docs/requirements-docs.txt
@@ -1,5 +1,5 @@
 opensearch-py>=2
-pandas>=1.5,<3
+pandas==2.0.3

Reviewer: Can we upgrade to a more recent version? Any reason specifically for 2.0.3?

Author (@Yerzhaisang, Nov 3, 2024): Bumping to a later version introduces datatype issues, including an ImportError like this:
ImportError: cannot import name 'is_datetime_or_timedelta_dtype' from 'pandas.core.dtypes.common'
Given that the issue only asked for an upgrade to a 2.x version, I thought 2.0.3 would be sufficient.

Collaborator: Sure, let's focus on bumping to 2.0.3 for now; we can create another issue to upgrade further if needed.

matplotlib>=3.6.0,<4
nbval
sphinx
8 changes: 7 additions & 1 deletion opensearch_py_ml/common.py
@@ -55,14 +55,20 @@


 def build_pd_series(
-    data: Dict[str, Any], dtype: Optional["DTypeLike"] = None, **kwargs: Any
+    data: Dict[str, Any],
+    dtype: Optional["DTypeLike"] = None,
+    index_name: Optional[str] = None,
+    **kwargs: Any,
 ) -> pd.Series:
    """Builds a pd.Series while squelching the warning
    for unspecified dtype on empty series
    """
    dtype = dtype or (EMPTY_SERIES_DTYPE if not data else dtype)
    if dtype is not None:
        kwargs["dtype"] = dtype
    if index_name:

Reviewer: Let's keep an explicit check for None: `if index_name is not None:`. Also, what happens if the index is not found?

Author (@Yerzhaisang): If we pass an index, the result shows the column name whose values we are counting, matching how value_counts works in pandas 2.0.3. [screenshot] If the index is not found, we don't see the column name, and an assertion error occurs when we compare our built-in value_counts() method with the pandas one. [screenshot]

Author (@Yerzhaisang): > Let's keep an explicit check for None: `if index_name is not None:`

Done

        index = pd.Index(data.keys(), name=index_name)
        kwargs["index"] = index
    return pd.Series(data, **kwargs)
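For illustration, a minimal sketch (values are made up) of what passing `index_name` produces: the index carries the original column's name, matching the shape `value_counts()` returns in pandas 2.x.

```python
import pandas as pd

# Sketch of what build_pd_series does when index_name is given:
# the index is named after the original column, and the Series
# itself gets a separate name ("count" in the value_counts case).
data = {"Logstash Airways": 3331, "JetBeats": 3274}
index = pd.Index(data.keys(), name="Carrier")
s = pd.Series(data, index=index, name="count")

assert s.index.name == "Carrier"
assert s.name == "count"
assert s["JetBeats"] == 3274
```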


37 changes: 31 additions & 6 deletions opensearch_py_ml/dataframe.py
@@ -424,9 +424,36 @@ def drop(
            axis = pd.DataFrame._get_axis_name(axis)
            axes = {axis: labels}
        elif index is not None or columns is not None:

Reviewer: Kind of confused here: the parent branch already checks that one of them is not None, but inside we check again (lines 431 and 440). Maybe this could be simplified with the convertToList wrapper @pyek-bot suggested.

Author (@Yerzhaisang): Fixed

-            axes, _ = pd.DataFrame()._construct_axes_from_arguments(
-                (index, columns), {}
-            )
+            # axes, _ = pd.DataFrame()._construct_axes_from_arguments(
+            #     (index, columns), {}
+            # )

Reviewer: Can remove these comments if no longer used.

Author (@Yerzhaisang): Done

            axes = {}
            if index is not None:
                if isinstance(index, pd.Index):
                    index = index.tolist()  # Convert Index to list
                elif not is_list_like(index):
                    index = [index]  # Convert to list if it's not list-like already
                axes["index"] = index
            else:
                axes["index"] = None

            if columns is not None:
                if isinstance(columns, pd.Index):
                    columns = columns.tolist()  # Convert Index to list

Reviewer: columns to list

Author (@Yerzhaisang, Nov 4, 2024): Fixed

                elif not is_list_like(columns):
                    columns = [columns]  # Convert to list if it's not list-like already
                axes["columns"] = columns
            else:
                axes["columns"] = None

Reviewer: Can this be wrapped in a method? Then we could do something like:

axes = {
    "index": to_list_if_needed(index),
    "columns": pd.Index(to_list_if_needed(columns)) if columns is not None else None
}

Author (@Yerzhaisang): Done


            if columns is not None:
                if not is_list_like(columns):
                    columns = [columns]

Reviewer: Repeated logic? This is handled in lines 443 and 444, right?

Author (@Yerzhaisang, Nov 4, 2024): Fixed

                axes["columns"] = (
                    pd.Index(columns) if isinstance(columns, list) else columns
                )
            else:
                axes["columns"] = None
        else:
            raise ValueError(
                "Need to specify at least one of 'labels', 'index' or 'columns'"
@@ -440,7 +467,7 @@ def drop(
            axes["index"] = [axes["index"]]
        if errors == "raise":
            # Check if axes['index'] values exists in index
-            count = self._query_compiler._index_matches_count(axes["index"])
+            count = self._query_compiler._index_matches_count(list(axes["index"]))
            if count != len(axes["index"]):
                raise ValueError(
                    f"number of labels {count}!={len(axes['index'])} not contained in axis"
@@ -1326,7 +1353,6 @@ def to_csv(
        compression="infer",
        quoting=None,
        quotechar='"',
-        line_terminator=None,
Collaborator: Why are we removing this?

Author (@Yerzhaisang): I restored it as lineterminator to align with recent pandas updates, but it's still not actively used elsewhere in the code.

Collaborator: Can we keep it as line_terminator, not lineterminator?

        chunksize=None,
        tupleize_cols=None,
        date_format=None,
@@ -1355,7 +1381,6 @@
            "compression": compression,
            "quoting": quoting,
            "quotechar": quotechar,
-            "line_terminator": line_terminator,
            "chunksize": chunksize,
            "date_format": date_format,
            "doublequote": doublequote,
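Context for the `line_terminator` removal above: pandas renamed this `to_csv` keyword to `lineterminator` in 1.5.0 and dropped the old spelling in 2.0, so keeping `line_terminator` would fail on pandas 2.x. A minimal sketch:

```python
import io

import pandas as pd

df = pd.DataFrame({"a": [1, 2]})

buf = io.StringIO()
# pandas 2.x spelling; passing line_terminator= here raises TypeError.
df.to_csv(buf, lineterminator="\n", index=False)

assert buf.getvalue() == "a\n1\n2\n"
```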
15 changes: 10 additions & 5 deletions opensearch_py_ml/operations.py
@@ -475,7 +475,7 @@ def _terms_aggs(
        except IndexError:
            name = None

-        return build_pd_series(results, name=name)
+        return build_pd_series(results, index_name=name, name="count")

Reviewer: How come it's using "count" here, but before it was using name?

Author (@Yerzhaisang): In pandas 2.0.3, a change in the value_counts method resulted in the following behavior: the method now uses "count" as the name for the values column, while the original column name (e.g., "Carrier") is used for the index name. This differs from earlier versions, where the values column would inherit the name of the original column. [screenshots]

Reviewer: Hey @dhrubo-os, what are your thoughts? Thanks for the info, @Yerzhaisang.

Collaborator: @Yerzhaisang could you please share any documentation about the changed behavior of the value_counts method between 1.5.3 and 2.0.3? I think the value_counts() method has remained consistent in functionality between pandas 1.5.3 and 2.0.3. Please let me know if you think otherwise.

Author (@Yerzhaisang, Nov 5, 2024): In pandas 1.5.3, the series name is used for the result series. In pandas 2.0.3, the series name is used for the index name, while the result series is named "count".
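The naming change under discussion is easy to reproduce; on pandas 2.x the result Series is named "count" and the original name moves to the index (guarded here so the sketch also runs on 1.x):

```python
import pandas as pd

s = pd.Series(["JetBeats", "JetBeats", "ES-Air"], name="Carrier")
counts = s.value_counts()

# Counting itself is unchanged across versions.
assert counts["JetBeats"] == 2

major = int(pd.__version__.split(".")[0])
if major >= 2:
    # pandas 2.x: column name becomes the index name, result is named "count"
    assert counts.index.name == "Carrier"
    assert counts.name == "count"
else:
    # pandas 1.x: the result inherits the column name
    assert counts.name == "Carrier"
```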


    def _hist_aggs(
        self, query_compiler: "QueryCompiler", num_bins: int
@@ -1205,7 +1205,7 @@ def describe(self, query_compiler: "QueryCompiler") -> pd.DataFrame:

        df1 = self.aggs(
            query_compiler=query_compiler,
-            pd_aggs=["count", "mean", "std", "min", "max"],
+            pd_aggs=["count", "mean", "min", "max", "std"],
            numeric_only=True,
        )
        df2 = self.quantile(
@@ -1219,9 +1219,14 @@
        # Convert [.25,.5,.75] to ["25%", "50%", "75%"]
        df2 = df2.set_index([["25%", "50%", "75%"]])

-        return pd.concat([df1, df2]).reindex(
-            ["count", "mean", "std", "min", "25%", "50%", "75%", "max"]
-        )
+        df = pd.concat([df1, df2])
+
+        if df.shape[1] == 1:
+            return df.reindex(
+                ["count", "mean", "std", "min", "25%", "50%", "75%", "max"]
+            )
+
+        return df.reindex(["count", "mean", "min", "25%", "50%", "75%", "max", "std"])

    def to_pandas(
        self, query_compiler: "QueryCompiler", show_progress: bool = False
3 changes: 2 additions & 1 deletion opensearch_py_ml/series.py
@@ -312,11 +312,12 @@ def value_counts(self, os_size: int = 10) -> pd.Series:

>>> df = oml.DataFrame(OPENSEARCH_TEST_CLIENT, 'flights')
>>> df['Carrier'].value_counts()
Carrier
Logstash Airways 3331
JetBeats 3274
Kibana Airlines 3234
ES-Air 3220
-Name: Carrier, dtype: int64
+Name: count, dtype: int64
Collaborator: This is what we don't want, right? Carrier is the column name, and we are changing it to count; that's not right.

Author (@Yerzhaisang): Hey @dhrubo, this is actually correct. In pandas 2.0.3, Carrier is set as the index name, and count is the column name.

        """
        if not isinstance(os_size, int):
            raise TypeError("os_size must be a positive integer.")
2 changes: 1 addition & 1 deletion requirements-dev.txt
@@ -1,7 +1,7 @@
#
# Basic requirements
#
-pandas>=1.5.2,<2
+pandas==2.0.3
matplotlib>=3.6.2,<4
numpy>=1.24.0,<2
opensearch-py>=2.2.0
2 changes: 1 addition & 1 deletion requirements.txt
@@ -1,7 +1,7 @@
#
# Basic requirements
#
-pandas>=1.5.2,<2
+pandas==2.0.3
matplotlib>=3.6.2,<4
numpy>=1.24.0,<2
opensearch-py>=2.2.0
2 changes: 1 addition & 1 deletion setup.py
@@ -84,7 +84,7 @@
    },
    install_requires=[
        "opensearch-py>=2",
-        "pandas>=1.5,<3",
+        "pandas==2.0.3",
        "matplotlib>=3.6.0,<4",
        "numpy>=1.24.0,<2",
        "deprecated>=1.2.14,<2",
2 changes: 1 addition & 1 deletion tests/dataframe/test_describe_pytest.py
@@ -34,7 +34,7 @@ def test_flights_describe(self):
        pd_flights = self.pd_flights()
        oml_flights = self.oml_flights()

-        pd_describe = pd_flights.describe()
+        pd_describe = pd_flights.describe().drop(["timestamp"], axis=1)
Collaborator: Why are we removing this?

Author (@Yerzhaisang): In recent pandas versions, the timestamp column is included in the describe method. I could adapt our built-in method to the pandas one, but I think that behavior is arguably a bug. [screenshots]

        # We remove bool columns to match pandas output
        oml_describe = oml_flights.describe().drop(
            ["Cancelled", "FlightDelay"], axis="columns"
32 changes: 29 additions & 3 deletions tests/dataframe/test_groupby_pytest.py
@@ -106,10 +106,18 @@ def test_groupby_aggs_mad_var_std(self, pd_agg, dropna):
        pd_flights = self.pd_flights().filter(self.filter_data)
        oml_flights = self.oml_flights().filter(self.filter_data)

-        pd_groupby = getattr(pd_flights.groupby("Cancelled", dropna=dropna), pd_agg)()
+        if pd_agg == "mad":

Reviewer: Is it possible to extract these values like "mad", "var", "std" as constants? It could remove friction for newcomers, for example MEAN_ABSOLUTE_DEVIATION = "mad". Not sure if that would take a lot of effort, but something to think about.

Author (@Yerzhaisang): Done
+            pd_groupby = pd_flights.groupby("Cancelled", dropna=dropna).agg(
+                lambda x: (x - x.mean()).abs().mean()

Reviewer: I see this lambda used in other places; could we have a util class that dispatches these common function names?

Author (@Yerzhaisang): Fixed

+            )
+        else:
+            pd_groupby = getattr(
+                pd_flights.groupby("Cancelled", dropna=dropna), pd_agg
+            )()
        oml_groupby = getattr(oml_flights.groupby("Cancelled", dropna=dropna), pd_agg)(
            numeric_only=True
        )
        pd_groupby = pd_groupby[oml_groupby.columns]
Collaborator: Why do we need this? We shouldn't use any oml resource/info in pd, as the goal is to verify that pd functionality matches oml functionality.

Author (@Yerzhaisang): OK, fixed.
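For reference, `Series.mad`/`DataFrame.mad` (mean absolute deviation) was deprecated in pandas 1.5 and removed in 2.0, which is why the tests inline the formula. A quick sanity check of the replacement:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0])

# Equivalent of the removed s.mad(): mean absolute deviation about the mean.
mad = (s - s.mean()).abs().mean()

# mean = 2.5, deviations = [1.5, 0.5, 0.5, 1.5], so mad = 1.0
assert mad == 1.0
```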


        # checking only values because dtypes are checked in aggs tests
        assert_frame_equal(
@@ -224,14 +232,32 @@ def test_groupby_dataframe_mad(self):
        pd_flights = self.pd_flights().filter(self.filter_data + ["DestCountry"])
        oml_flights = self.oml_flights().filter(self.filter_data + ["DestCountry"])

-        pd_mad = pd_flights.groupby("DestCountry").mad()
+        pd_mad = pd_flights.groupby("DestCountry").apply(
Collaborator: Can we get some ideas from this PR: https://github.com/elastic/eland/pull/602/files? Also, shouldn't we keep BWC in mind?

Author (@Yerzhaisang): I believe we don't need to worry about backward compatibility, as we haven't modified our built-in methods. We only customized the deprecated pandas methods within the tests.

Collaborator: What I was wondering: currently we pin pandas==2.0.3, but can't we do pandas>=2.0.3 so that we can support other versions of pandas too?

Author (@Yerzhaisang): Got it, let me do some research.

Author (@Yerzhaisang, Dec 3, 2024): @dhrubo-os thank you for this good point. Yeah, we shouldn't force customers to install only one version. I did some research, and the current functionality is compatible with pandas>=1.5.2,<2.1. What do you think?

            lambda x: x.select_dtypes(include="number").apply(
                lambda x: (x - x.mean()).abs().mean()
            )
        )

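On the version-range question above, the window the author proposes could be expressed as `pandas>=1.5.2,<2.1`. A hypothetical stdlib-only helper (names are illustrative, and it assumes plain `X.Y.Z` version strings) to check that window:

```python
def version_tuple(version: str) -> tuple:
    """Parse a plain 'X.Y.Z' version string into a comparable tuple."""
    return tuple(int(part) for part in version.split(".")[:3])


def pandas_supported(version: str) -> bool:
    """True if `version` falls in the window found compatible: >=1.5.2, <2.1."""
    return (1, 5, 2) <= version_tuple(version) < (2, 1)


assert pandas_supported("1.5.3")
assert pandas_supported("2.0.3")
assert not pandas_supported("2.1.0")
assert not pandas_supported("1.5.1")
```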
        # Re-merge non-numeric columns back, with suffixes to avoid column overlap
        non_numeric_columns = (
            pd_flights.select_dtypes(exclude="number").groupby("DestCountry").first()
        )
        pd_mad = pd_mad.join(
            non_numeric_columns, lsuffix="_numeric", rsuffix="_non_numeric"
        )[self.filter_data]
        if "Cancelled" in pd_mad.columns:
            pd_mad["Cancelled"] = pd_mad["Cancelled"].astype(float)
        oml_mad = oml_flights.groupby("DestCountry").mad()

        assert_index_equal(pd_mad.columns, oml_mad.columns)
        assert_index_equal(pd_mad.index, oml_mad.index)
        assert_series_equal(pd_mad.dtypes, oml_mad.dtypes)

-        pd_min_mad = pd_flights.groupby("DestCountry").aggregate(["min", "mad"])
+        pd_min_mad = pd_flights.groupby("DestCountry").agg(
+            ["min", lambda x: (x - x.median()).abs().mean()]
+        )

        pd_min_mad.columns = pd_min_mad.columns.set_levels(["min", "mad"], level=1)
        oml_min_mad = oml_flights.groupby("DestCountry").aggregate(["min", "mad"])

        assert_index_equal(pd_min_mad.columns, oml_min_mad.columns)
20 changes: 14 additions & 6 deletions tests/dataframe/test_metrics_pytest.py
@@ -81,9 +81,10 @@ def test_flights_extended_metrics(self):
        logger.setLevel(logging.DEBUG)

        for func in self.extended_funcs:
-            pd_metric = getattr(pd_flights, func)(
-                **({"numeric_only": True} if func != "mad" else {})
-            )
+            if func == "mad":

Reviewer: I noticed you introduced branching. What if some day we need to reimplement another function from scratch? Instead of creating a new branch every time, can we extract this behavior out? Perhaps a utility class with our own custom implementations, like:

customFunctionMap = {"mad": lambda x: (x - x.median()).abs().mean()}

and then dispatch it instead of branching for every method we need to reimplement:

if func in customFunctionMap:
    # apply the custom function
else:
    # use the already given functionality

Author (@Yerzhaisang): Done

+                pd_metric = (pd_flights - pd_flights.mean()).abs().mean()
+            else:
+                pd_metric = getattr(pd_flights, func)(**({"numeric_only": True}))
            oml_metric = getattr(oml_flights, func)(numeric_only=True)

            pd_value = pd_metric["AvgTicketPrice"]
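The dispatch-table idea the reviewer sketches above could look like this (a rough sketch; `CUSTOM_AGGS` and `apply_metric` are hypothetical names, and `mad` here uses the mean-based formula the tests adopt):

```python
import pandas as pd

# Aggregations that pandas 2.x no longer provides, with custom equivalents.
CUSTOM_AGGS = {
    "mad": lambda x: (x - x.mean()).abs().mean(),
}


def apply_metric(obj, func: str):
    """Use a custom implementation when available, else the pandas method."""
    if func in CUSTOM_AGGS:
        return CUSTOM_AGGS[func](obj)
    return getattr(obj, func)()


s = pd.Series([1.0, 2.0, 3.0, 4.0])
assert apply_metric(s, "mad") == 1.0
assert apply_metric(s, "sum") == 10.0
```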
@@ -101,7 +102,10 @@ def test_flights_extended_metrics_nan(self):
]

        for func in self.extended_funcs:
-            pd_metric = getattr(pd_flights_1, func)()
+            if func == "mad":
+                pd_metric = (pd_flights_1 - pd_flights_1.mean()).abs().mean()
+            else:
+                pd_metric = getattr(pd_flights_1, func)()
            oml_metric = getattr(oml_flights_1, func)(numeric_only=False)

            assert_series_equal(pd_metric, oml_metric, check_exact=False)
@@ -111,7 +115,10 @@
        oml_flights_0 = oml_flights[oml_flights.FlightNum == "XXX"][["AvgTicketPrice"]]

        for func in self.extended_funcs:
-            pd_metric = getattr(pd_flights_0, func)()
+            if func == "mad":
+                pd_metric = (pd_flights_0 - pd_flights_0.mean()).abs().mean()
+            else:
+                pd_metric = getattr(pd_flights_0, func)()
            oml_metric = getattr(oml_flights_0, func)(numeric_only=False)

            assert_series_equal(pd_metric, oml_metric, check_exact=False)
@@ -498,7 +505,8 @@ def test_flights_agg_quantile(self, numeric_only):
            ["AvgTicketPrice", "FlightDelayMin", "dayOfWeek"]
        )

-        pd_quantile = pd_flights.agg(["quantile", "min"], numeric_only=numeric_only)
+        pd_quantile = pd_flights.agg([lambda x: x.quantile(0.5), lambda x: x.min()])
+        pd_quantile.index = ["quantile", "min"]
        oml_quantile = oml_flights.agg(["quantile", "min"], numeric_only=numeric_only)

assert_frame_equal(
4 changes: 1 addition & 3 deletions tests/series/test_arithmetics_pytest.py
@@ -80,9 +80,7 @@ def to_pandas(self):
        # "type cast" to modified class (inherits from ed.Series) that overrides the `to_pandas` function
        oml_series.__class__ = ModifiedOMLSeries

-        assert_pandas_opensearch_py_ml_series_equal(
-            pd_series, oml_series, check_less_precise=True
-        )
+        assert_pandas_opensearch_py_ml_series_equal(pd_series, oml_series)

    def test_ecommerce_series_invalid_div(self):
        pd_df = self.pd_ecommerce()
10 changes: 8 additions & 2 deletions tests/series/test_metrics_pytest.py
@@ -49,7 +49,10 @@ def test_flights_metrics(self):
        oml_flights = self.oml_flights()["AvgTicketPrice"]

        for func in self.all_funcs:
-            pd_metric = getattr(pd_flights, func)()
+            if func == "mad":
+                pd_metric = (pd_flights - pd_flights.mean()).abs().mean()
+            else:
+                pd_metric = getattr(pd_flights, func)()
            oml_metric = getattr(oml_flights, func)()

            self.assert_almost_equal_for_agg(func, pd_metric, oml_metric)
@@ -94,7 +97,10 @@ def test_ecommerce_selected_all_numeric_source_fields(self):
        oml_ecommerce = self.oml_ecommerce()[column]

        for func in self.all_funcs:
-            pd_metric = getattr(pd_ecommerce, func)()
+            if func == "mad":
+                pd_metric = (pd_ecommerce - pd_ecommerce.mean()).abs().mean()
+            else:
+                pd_metric = getattr(pd_ecommerce, func)()
            oml_metric = getattr(oml_ecommerce, func)(
                **({"numeric_only": True} if (func != "nunique") else {})
            )