Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: create dataframe from 2D numpy array and column names #1456

Merged
merged 9 commits into from
Nov 29, 2024

Conversation

raisadz
Copy link
Contributor

@raisadz raisadz commented Nov 28, 2024

What type of PR is this? (check all applicable)

  • πŸ’Ύ Refactor
  • ✨ Feature
  • πŸ› Bug Fix
  • πŸ”§ Optimization
  • πŸ“ Documentation
  • βœ… Test
  • 🐳 Other

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

@github-actions github-actions bot added the enhancement New feature or request label Nov 28, 2024
Copy link
Member

@FBruzzesi FBruzzesi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @raisadz thanks for the PR! I hope it's ok if I am reviewing even if it's in draft πŸ˜‡

I left a couple of comments, but the main concern is the orient behavior. Apologies if you were going to address these πŸ™ˆ

Comment on lines 450 to 451
native_namespace: The native library to use for DataFrame creation. Only
necessary if inputs are not Narwhals Series.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
native_namespace: The native library to use for DataFrame creation. Only
necessary if inputs are not Narwhals Series.
native_namespace: The native library to use for DataFrame creation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed this

@@ -430,6 +432,163 @@ def _from_dict_impl(
return from_native(native_frame, eager_only=True)


def from_numpy(
data: np.ndarray,
schema: dict[str, DType] | Schema | None = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be nice to support list[str] as column names. In my opinion is way more common to specify the names of the columns without their dtypes when creating a dataframe

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I added this as an option for schema

}
native_frame = native_namespace.from_numpy(data, schema=schema_pl)
else:
native_frame = native_namespace.from_numpy(data, orient="col")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we defaulting to orient="col"? To me this seems the opposite of what polars does if nothing else is provided:

import numpy as np
import polars as pl

data = np.array([[5, 2, 1], [1, 4, 3]])
data
array([[5, 2, 1],
       [1, 4, 3]])

pl.from_numpy(data)
shape: (2, 3)
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ column_0 ┆ column_1 ┆ column_2 β”‚
β”‚ ---      ┆ ---      ┆ ---      β”‚
β”‚ i64      ┆ i64      ┆ i64      β”‚
β•žβ•β•β•β•β•β•β•β•β•β•β•ͺ══════════β•ͺ══════════║
β”‚ 5        ┆ 2        ┆ 1        β”‚
β”‚ 1        ┆ 4        ┆ 3        β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

while:

pl.from_numpy(data, orient="col")
shape: (3, 2)
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ column_0 ┆ column_1 β”‚
β”‚ ---      ┆ ---      β”‚
β”‚ i64      ┆ i64      β”‚
β•žβ•β•β•β•β•β•β•β•β•β•β•ͺ══════════║
β”‚ 5        ┆ 1        β”‚
β”‚ 2        ┆ 4        β”‚
β”‚ 1        ┆ 3        β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

This translates to the array transpose in pandas

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @FBruzzesi, thanks for your review! I fixed the orientation from columns to rows

@raisadz raisadz force-pushed the from-numpy branch 5 times, most recently from b8feada to c93a78f Compare November 29, 2024 11:42
@raisadz raisadz marked this pull request as ready for review November 29, 2024 11:49
Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @raisadz !

@MarcoGorelli MarcoGorelli merged commit 5b24161 into narwhals-dev:main Nov 29, 2024
24 checks passed
@MarcoGorelli
Copy link
Member

and thanks @FBruzzesi for reviewing!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

enh: create dataframe from 2D numpy array and column names
3 participants