REFACTOR-#5424: Replace dtypes="copy" with copy_dtypes flag #5426
Conversation
Why is the extra parameter better than what it was before?
@@ -269,7 +276,7 @@ def reduce(
         The axis to perform the reduce over.
     function : callable(row|col) -> single value
         The reduce function to apply to each column.
-    dtypes : str, optional
+    dtypes : pandas.Series, optional
Is scalar type not accepted here?
Scalar types are not accepted here. The control flow here is `reduce` -> `_compute_tree_reduce_metadata` -> dataframe constructor, which directly sets `self.dtypes` on the new dataframe. In contrast, `map` is written to duplicate scalar datatypes into a series here.
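For illustration, a minimal sketch of the difference described above, assuming made-up helper names rather than Modin's exact internals: `map` can take a scalar dtype and duplicate it into a `pandas.Series` keyed by the result columns, while `reduce` hands `dtypes` straight to the constructor, so it must already be a Series.

```python
import numpy as np
import pandas

def broadcast_scalar_dtype(dtype, columns):
    # One entry per result column, all sharing the same scalar dtype;
    # this mirrors the behavior described for `map` above.
    return pandas.Series([np.dtype(dtype)] * len(columns), index=columns)

print(broadcast_scalar_dtype("float64", ["a", "b", "c"]))
# a    float64
# b    float64
# c    float64
# dtype: object
```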
@@ -2637,10 +2659,13 @@ def broadcast_apply_full_axis(
     enumerate_partitions : bool, default: False
         Whether pass partition index into applied `func` or not.
         Note that `func` must be able to obtain `partition_idx` kwarg.
-    dtypes : list-like, default: None
+    dtypes : list-like, optional
Is it possible to use the `list-like` type everywhere, or the `pandas.Series` one? To be more consistent.
I'll change it to `pandas.Series`, since there are some locations where the dtype eventually gets passed to the constructor, where it has to be a series.
Thanks for reviewing @anmyachev, I'll make the changes after coming back from holiday in January.
Why is the extra parameter better than what it was before?
The extra parameter makes it clearer when a dtype hint is being passed, as opposed to when it's being copied. I've previously had some code changes where I ran into errors because I neglected to handle the case where `dtypes="copy"`, and keeping control flow (whether to infer or copy dtype) separate from data (the actual resulting dtype) makes more sense to me.
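As a hedged before/after sketch of that distinction (signatures simplified; the real Modin methods take more arguments), the same intent expressed both ways:

```python
def reduce_before(axis, function, dtypes=None):
    # Old style: `dtypes` was overloaded, so every consumer had to remember
    # to special-case the sentinel string "copy".
    return "reuse self.dtypes" if dtypes == "copy" else dtypes

def reduce_after(axis, function, dtypes=None, copy_dtypes=False):
    # New style: `copy_dtypes` is pure control flow and `dtypes` is pure data.
    return "reuse self.dtypes" if copy_dtypes else dtypes

# Same request, expressed the old way and the new way.
print(reduce_before(0, sum, dtypes="copy"))
print(reduce_after(0, sum, copy_dtypes=True))
```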
On Thu, Dec 15, 2022 at 06:15, Anatoly Myachev wrote:
@anmyachev commented on this pull request.
Why is the extra parameter better than what it was before?
------------------------------
In modin/core/dataframe/base/dataframe/dataframe.py
<#5426 (comment)>:
> The data types for the result. This is an optimization
because there are functions that always result in a particular data
type, and this allows us to avoid (re)computing it.
⬇️ Suggested change
- The data types for the result. This is an optimization
- because there are functions that always result in a particular data
- type, and this allows us to avoid (re)computing it.
+ The data types for the result. This is an optimization
+ because there are functions that always result in a particular data
+ type, and this allows us to avoid (re)computing it.
+ If the argument is a scalar type, then that type is assigned to each result column.
------------------------------
In modin/core/dataframe/base/dataframe/dataframe.py
<#5426 (comment)>:
> @@ -269,7 +276,7 @@ def reduce(
The axis to perform the reduce over.
function : callable(row|col) -> single value
The reduce function to apply to each column.
- dtypes : str, optional
+ dtypes : pandas.Series, optional
Is scalar type not accepted here?
------------------------------
In modin/core/dataframe/base/dataframe/dataframe.py
<#5426 (comment)>:
> @@ -308,7 +315,7 @@ def tree_reduce(
The map function to apply to each column.
reduce_func : callable(row|col) -> single value, optional
The reduce function to apply to the results of the map function.
- dtypes : str, optional
+ dtypes : pandas.Series, optional
Is scalar type not accepted here?
------------------------------
In modin/core/dataframe/pandas/dataframe/dataframe.py
<#5426 (comment)>:
> @@ -1604,6 +1604,8 @@ def _compute_tree_reduce_metadata(self, axis, new_parts):
The axis on which reduce function was applied.
new_parts : NumPy 2D array
Partitions with the result of applied function.
+ dtypes : pandas.Series, optional
Is scalar type not accepted here?
------------------------------
In modin/core/dataframe/pandas/dataframe/dataframe.py
<#5426 (comment)>:
> @@ -1643,7 +1644,7 @@ def reduce(
The axis to perform the reduce over.
function : callable(row|col) -> single value
The reduce function to apply to each column.
- dtypes : str, optional
+ dtypes : pandas.Series, optional
Is scalar type not accepted here?
------------------------------
In modin/core/dataframe/pandas/dataframe/dataframe.py
<#5426 (comment)>:
> @@ -2637,10 +2659,13 @@ def broadcast_apply_full_axis(
enumerate_partitions : bool, default: False
Whether pass partition index into applied `func` or not.
Note that `func` must be able to obtain `partition_idx` kwarg.
- dtypes : list-like, default: None
+ dtypes : list-like, optional
Is it possible to use list-like type everywhere or pandas.Series one? To
be more consistent.
------------------------------
In modin/core/dataframe/pandas/dataframe/dataframe.py
<#5426 (comment)>:
> @@ -1684,7 +1685,7 @@ def tree_reduce(
reduce_func : callable(row|col) -> single value, optional
Callable function to reduce the dataframe.
If none, then apply map_func twice.
- dtypes : str, optional
+ dtypes : pandas.Series, optional
Is scalar type not accepted here?
Happy holiday!
Signed-off-by: Jonathan Shi <[email protected]>
Signed-off-by: Jonathan Shi <[email protected]>
Co-authored-by: Anatoly Myachev <[email protected]>
Signed-off-by: Jonathan Shi <[email protected]>
What do these changes do?
This replaces the dtypes="copy" argument (used in some operators and internal dataframe apply/broadcast methods) with a boolean flag copy_dtypes.

- flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
- black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
- git commit -s
- Replace dtypes="copy" with copy_dtypes flag #5424
- docs/development/architecture.rst is up-to-date
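To make the description concrete, here is a hypothetical sketch (not Modin's actual implementation) of how an internal method might resolve result dtypes once the copy behavior is a separate flag rather than a sentinel value:

```python
import numpy as np
import pandas

def resolve_result_dtypes(own_dtypes, dtypes=None, copy_dtypes=False):
    # `copy_dtypes` carries the control flow: reuse the existing dtypes
    # unchanged rather than recomputing or inferring them.
    if copy_dtypes:
        return own_dtypes.copy()
    # Otherwise `dtypes` is only ever real dtype data; None means "infer later".
    return dtypes

own = pandas.Series({"a": np.dtype("int64"), "b": np.dtype("float64")})
print(resolve_result_dtypes(own, copy_dtypes=True))
```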