REFACTOR-#5424: Replace dtypes="copy" with copy_dtypes flag #5426

Closed
wants to merge 4 commits into from

Conversation

noloerino
Collaborator

@noloerino noloerino commented Dec 13, 2022

What do these changes do?

This replaces the dtypes="copy" argument (used in some operators and internal dataframe apply/broadcast methods) with a boolean flag copy_dtypes.
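The shape of this change can be sketched as follows. This is a minimal illustration, not Modin's code: `resolve_dtypes_old` and `resolve_dtypes_new` are hypothetical helper names, and only the `dtypes="copy"` sentinel and the `copy_dtypes` flag come from the description above.

```python
import pandas as pd


def resolve_dtypes_old(source_dtypes, dtypes=None):
    """Hypothetical 'before' style: `dtypes` may be the string "copy", a Series, or None."""
    # The isinstance check matters: comparing a Series to a string is elementwise,
    # so a bare `dtypes == "copy"` cannot be used directly in an `if`.
    if isinstance(dtypes, str) and dtypes == "copy":
        return source_dtypes.copy()
    return dtypes


def resolve_dtypes_new(source_dtypes, dtypes=None, copy_dtypes=False):
    """Hypothetical 'after' style: a boolean flag replaces the string sentinel."""
    # Assumption of this sketch: the flag takes precedence if both are given.
    if copy_dtypes:
        return source_dtypes.copy()
    return dtypes


# Per-column dtypes of a small frame, as pandas itself reports them.
source = pd.DataFrame({"a": [1, 2], "b": [1.5, 2.5]}).dtypes

old_result = resolve_dtypes_old(source, dtypes="copy")
new_result = resolve_dtypes_new(source, copy_dtypes=True)
```

With the flag, `dtypes` only ever holds dtype information (or `None`), so callers no longer need to distinguish a sentinel string from real dtypes.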

  • first commit message and PR title follow format outlined here

    NOTE: If you edit the PR title to match this format, you need to add another commit (even if it's empty) or amend your last commit for the CI job that checks the PR title to pick up the new PR title.

  • passes flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
  • passes black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
  • signed commit with git commit -s
  • Resolves REFACTOR: Replace dtypes="copy" with copy_dtypes flag #5424
  • tests added and passing
  • module layout described at docs/development/architecture.rst is up-to-date

@noloerino noloerino requested a review from a team as a code owner December 13, 2022 00:41
Collaborator

@anmyachev anmyachev left a comment

Why is the extra parameter better than what it was before?

@@ -269,7 +276,7 @@ def reduce(
 The axis to perform the reduce over.
 function : callable(row|col) -> single value
 The reduce function to apply to each column.
-dtypes : str, optional
+dtypes : pandas.Series, optional
Collaborator

Is scalar type not accepted here?

Collaborator Author

Scalar types are not accepted here. The control flow is reduce -> _compute_tree_reduce_metadata -> dataframe constructor, which directly sets self.dtypes on the new dataframe. In contrast, map is written to duplicate a scalar dtype into a pandas.Series.
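The scalar-to-Series duplication mentioned for map can be sketched roughly like this. The helper name `dtypes_as_series` and its signature are assumptions made for illustration; the sketch only shows the general idea of broadcasting one dtype across all column labels so that a constructor expecting a pandas.Series receives one.

```python
import numpy as np
import pandas as pd


def dtypes_as_series(dtypes, columns):
    """Hypothetical helper: return per-column dtypes as a pandas.Series.

    A Series (or None) passes through unchanged; a scalar dtype such as
    "float64" or np.int64 is repeated once per column label.
    """
    if dtypes is None or isinstance(dtypes, pd.Series):
        return dtypes
    # np.dtype() normalizes strings and NumPy scalar types to a dtype object.
    return pd.Series([np.dtype(dtypes)] * len(columns), index=columns)


broadcast = dtypes_as_series("float64", ["a", "b"])
unchanged = dtypes_as_series(broadcast, ["a", "b"])
```

A path like reduce, which hands `dtypes` straight to the constructor, would skip such a step and therefore needs a Series from the caller.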

@@ -2637,10 +2659,13 @@ def broadcast_apply_full_axis(
 enumerate_partitions : bool, default: False
 Whether pass partition index into applied `func` or not.
 Note that `func` must be able to obtain `partition_idx` kwarg.
-dtypes : list-like, default: None
+dtypes : list-like, optional
Collaborator

For consistency, is it possible to use the same type everywhere, either list-like or pandas.Series?

Collaborator Author

I'll change it to pandas.Series, since there are some locations where the dtype eventually gets passed to the constructor where it has to be a series.

@noloerino
Collaborator Author

noloerino commented Dec 17, 2022 via email

@anmyachev
Collaborator

Happy holiday!

noloerino and others added 4 commits January 3, 2023 13:50
Signed-off-by: Jonathan Shi <[email protected]>
Signed-off-by: Jonathan Shi <[email protected]>
Co-authored-by: Anatoly Myachev <[email protected]>
Development

Successfully merging this pull request may close these issues.

REFACTOR: Replace dtypes="copy" with copy_dtypes flag
2 participants