[ArrayManager] DataFrame constructor from ndarray #40441

Merged


jorisvandenbossche
Member

@jorisvandenbossche jorisvandenbossche commented Mar 15, 2021

xref #39146

Currently, DataFrame(ndarray) construction for ArrayManager still goes through BlockManager followed by a mgr_to_mgr conversion, which is less efficient.
This PR adds a check in ndarray_to_mgr to create an ArrayManager directly.

It's still WIP, because some validation steps that are currently done inside the BlockManager/Block inits need to be factored out, shared, or added to the ArrayManager path.
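
To illustrate the difference in layout, here is a minimal sketch using only NumPy. The helper names `split_into_columns` and `as_transposed_block` are hypothetical and are not pandas functions, and copying each column is an assumption of the sketch. The idea is that a column store holds one 1D array per column, while a block store holds a single 2D array of shape (n_columns, n_rows).

```python
import numpy as np


def split_into_columns(values):
    """Hypothetical helper: return one independent 1D array per column."""
    if values.ndim != 2:
        raise ValueError("expected a 2D array")
    # Copy each column so the resulting arrays do not share memory with the input.
    return [values[:, i].copy() for i in range(values.shape[1])]


def as_transposed_block(values):
    """Hypothetical helper: return a single 2D block of shape (n_columns, n_rows)."""
    # The transpose is a view, so no data is copied here.
    return values.T


values = np.arange(6).reshape(3, 2)  # 3 rows, 2 columns
columns = split_into_columns(values)
block = as_transposed_block(values)

print(len(columns), columns[0].shape)  # number of columns, shape of one column
print(block.shape)                     # shape of the transposed block
```

Building the list of columns directly avoids creating the 2D block first and splitting it afterwards, which is the intermediate step the description above calls less efficient.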

@jorisvandenbossche jorisvandenbossche added Refactor Internal refactoring of code Internals Related to non-user accessible pandas implementation Constructors Series/DataFrame/Index/pd.array Constructors labels Mar 15, 2021
@github-actions
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@jorisvandenbossche jorisvandenbossche marked this pull request as ready for review April 20, 2021 08:27
@@ -304,10 +305,26 @@ def ndarray_to_mgr(
     index, columns = _get_axes(
         values.shape[0], values.shape[1], index=index, columns=columns
     )
-    values = values.T
+
+    _check_values_indices_shape_match(values, index, columns)
Member Author

By moving values = values.T below _check_values_indices_shape_match, that function doesn't need to take the transposed shapes into account (and can be used for both AM and BM).
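
A minimal sketch of that ordering, assuming a simplified check: `check_shape_match` below is a hypothetical stand-in, not the actual `_check_values_indices_shape_match`, and its error message is illustrative. Because it runs on the untransposed (n_rows, n_columns) array, the comparison reads naturally as rows against the index and columns against the column labels.

```python
import numpy as np


def check_shape_match(values, index, columns):
    """Hypothetical check: values must have shape (len(index), len(columns))."""
    implied = (len(index), len(columns))
    if values.shape != implied:
        raise ValueError(
            f"Shape of passed values is {values.shape}, indices imply {implied}"
        )


values = np.zeros((3, 2))
index = range(3)
columns = ["a", "b"]

# Validate first, on the untransposed array.
check_shape_match(values, index, columns)

# Only the block-based path needs the transposed layout afterwards.
values_t = values.T
```

Either storage path can call the same check before it diverges, so no transposed-shape special case is needed.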

@jorisvandenbossche jorisvandenbossche added this to the 1.3 milestone Apr 21, 2021
@jorisvandenbossche jorisvandenbossche merged commit ead9404 into pandas-dev:master Apr 26, 2021
@jorisvandenbossche jorisvandenbossche deleted the am-constructor-ndarray branch April 26, 2021 09:34
+                for i in range(values.shape[1])
+            ]
+        else:
+            if is_datetime_or_timedelta_dtype(values.dtype):
Member

this check is redundant

yeshsurya pushed a commit to yeshsurya/pandas that referenced this pull request May 6, 2021
JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021