Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FEAT-#6498: Make Fold operator more flexible #54

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 8 additions & 0 deletions modin/core/dataframe/algebra/fold.py
Original file line number Diff line number Diff line change
Expand Up @@ -49,6 +49,8 @@ def register(
def caller(
query_compiler: PandasQueryCompiler,
fold_axis: Optional[int] = None,
new_index=None,
new_columns=None,
Comment on lines +52 to +53
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add validation for new_index and new_columns.

Consider adding checks to ensure that new_index and new_columns are valid before passing them to the fold method. This will prevent potential runtime errors.

Add validation as follows:

if new_index is not None and not isinstance(new_index, (list, pandas.Index)):
    raise ValueError("new_index must be list-like or a pandas Index.")
if new_columns is not None and not isinstance(new_columns, (list, pandas.Index)):
    raise ValueError("new_columns must be list-like or a pandas Index.")

*args: tuple,
**kwargs: dict,
) -> PandasQueryCompiler:
Comment on lines 49 to 56
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

category Functionality

The new parameters new_index and new_columns have been added to the caller function, but there's no validation or error handling for these parameters. Consider adding checks to ensure that these parameters are valid before passing them to the fold method. Also, it would be helpful to add documentation explaining how these parameters affect the resulting DataFrame and what constraints they might have (e.g., length requirements).

Chat with Korbit by mentioning @korbit-ai, and give a 👍 or 👎 to help Korbit improve your reviews.

Expand All @@ -62,6 +64,10 @@ def caller(
fold_axis : int, optional
0 or None means apply across full column partitions. 1 means
apply across full row partitions.
new_index : list-like, optional
The index of the result.
new_columns : list-like, optional
The columns of the result.
*args : tuple
Additional arguments passed to `fold_function`.
**kwargs: dict
Expand All @@ -77,6 +83,8 @@ def caller(
query_compiler._modin_frame.fold(
cls.validate_axis(fold_axis),
lambda x: fold_function(x, *args, **kwargs),
new_index=new_index,
new_columns=new_columns,
)
)

Expand Down
29 changes: 15 additions & 14 deletions modin/core/dataframe/pandas/dataframe/dataframe.py
Original file line number Diff line number Diff line change
Expand Up @@ -2321,7 +2321,7 @@ def window(
pass

@lazy_metadata_decorator(apply_axis="both")
def fold(self, axis, func, new_columns=None):
def fold(self, axis, func, new_index=None, new_columns=None):
"""
Perform a function across an entire axis.

Expand All @@ -2331,35 +2331,36 @@ def fold(self, axis, func, new_columns=None):
The axis to apply over.
func : callable
The function to apply.
new_index : list-like, optional
The index of the result.
new_columns : list-like, optional
The columns of the result.
Must be the same length as the columns' length of `self`.
The column labels of `self` may change during an operation so
we may want to pass the new column labels in (e.g., see `cat.codes`).

Returns
-------
PandasDataframe
A new dataframe.

Notes
-----
The data shape is not changed (length and width of the table).
"""
if new_index is not None:
if self.has_materialized_index and len(self.index) == len(new_index):
copy_lengths = True
else:
copy_lengths = False
self.set_index_cache(new_index)
if new_columns is not None:
if self.has_materialized_columns:
assert len(self.columns) == len(
new_columns
), "The length of `new_columns` doesn't match the columns' length of `self`"
if self.has_materialized_columns and len(self.columns) == len(new_columns):
copy_widths = True
else:
copy_widths = False
self.set_columns_cache(new_columns)

new_partitions = self._partition_mgr_cls.map_axis_partitions(
axis, self._partitions, func, keep_partitioning=True
)
return self.__constructor__(
new_partitions,
self.copy_index_cache(copy_lengths=True),
self.copy_columns_cache(copy_lengths=True),
self.copy_index_cache(copy_lengths=copy_lengths),
self.copy_columns_cache(copy_lengths=copy_widths),
self._row_lengths_cache,
self._column_widths_cache,
pandas_backend=self._pandas_backend,
Expand Down