feat: create dataframe from 2D numpy array and column names #1456
Hey @raisadz, thanks for the PR! I hope it's OK that I'm reviewing even though it's still in draft.
I left a couple of comments, but my main concern is the `orient` behavior. Apologies if you were already going to address these.
narwhals/functions.py
native_namespace: The native library to use for DataFrame creation. Only
    necessary if inputs are not Narwhals Series.
Suggested change:
- native_namespace: The native library to use for DataFrame creation. Only
-     necessary if inputs are not Narwhals Series.
+ native_namespace: The native library to use for DataFrame creation.
Fixed this
narwhals/functions.py
@@ -430,6 +432,163 @@ def _from_dict_impl(
    return from_native(native_frame, eager_only=True)


def from_numpy(
    data: np.ndarray,
    schema: dict[str, DType] | Schema | None = None,
I think it would be nice to support `list[str]` as column names. In my opinion, it is far more common to specify the column names without their dtypes when creating a dataframe.
Agreed, I added this as an option for schema
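To make the request concrete, here is a minimal sketch of how a `schema` argument could accept a mapping of names to dtypes, a plain list of names, or nothing at all. The helper name `split_schema` and its return shape are hypothetical, introduced only for illustration; this is not the code that was merged in the PR.

```python
from __future__ import annotations

from collections.abc import Mapping, Sequence
from typing import Any


def split_schema(
    schema: Mapping[str, Any] | Sequence[str] | None,
    n_columns: int,
) -> tuple[list[str], list[Any] | None]:
    """Hypothetical helper: return (column names, dtypes or None)."""
    if schema is None:
        # No schema given: fall back to polars-style default names.
        return [f"column_{i}" for i in range(n_columns)], None
    if isinstance(schema, Mapping):
        # Mapping of name -> dtype: keep both parts.
        names, dtypes = list(schema.keys()), list(schema.values())
    else:
        # Plain list of names: dtypes are left for the backend to infer.
        names, dtypes = list(schema), None
    if len(names) != n_columns:
        msg = f"expected {n_columns} column names, got {len(names)}"
        raise ValueError(msg)
    return names, dtypes


names_only = split_schema(["a", "b"], n_columns=2)
names_and_dtypes = split_schema({"a": "Int64", "b": "Float64"}, n_columns=2)
```

Keeping names and dtypes separate lets the caller rename columns when only names were given, and cast only when dtypes were actually supplied.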
narwhals/functions.py
        }
        native_frame = native_namespace.from_numpy(data, schema=schema_pl)
    else:
        native_frame = native_namespace.from_numpy(data, orient="col")
Why are we defaulting to `orient="col"`? To me this seems the opposite of what polars does if nothing else is provided:
import numpy as np
import polars as pl
data = np.array([[5, 2, 1], [1, 4, 3]])
data
array([[5, 2, 1],
       [1, 4, 3]])
pl.from_numpy(data)
shape: (2, 3)
┌──────────┬──────────┬──────────┐
│ column_0 ┆ column_1 ┆ column_2 │
│ ---      ┆ ---      ┆ ---      │
│ i64      ┆ i64      ┆ i64      │
╞══════════╪══════════╪══════════╡
│ 5        ┆ 2        ┆ 1        │
│ 1        ┆ 4        ┆ 3        │
└──────────┴──────────┴──────────┘
while:
pl.from_numpy(data, orient="col")
shape: (3, 2)
┌──────────┬──────────┐
│ column_0 ┆ column_1 │
│ ---      ┆ ---      │
│ i64      ┆ i64      │
╞══════════╪══════════╡
│ 5        ┆ 1        │
│ 2        ┆ 4        │
│ 1        ┆ 3        │
└──────────┴──────────┘
In pandas terms, `orient="col"` corresponds to building the frame from the transposed array.
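The difference between the two orientations can be sketched with NumPy alone. The helper `columns_from_array` below is hypothetical and only mimics the row/column interpretation discussed in this comment; it is not part of polars, pandas, or Narwhals.

```python
from __future__ import annotations

import numpy as np


def columns_from_array(data: np.ndarray, orient: str = "row") -> dict[str, list]:
    """Hypothetical helper: map a 2D array to {column name: values}."""
    if data.ndim != 2:
        raise ValueError("expected a 2D array")
    if orient == "row":
        # Each inner array is a row, so columns are read down axis 0.
        columns = data.T
    elif orient == "col":
        # Each inner array is already a full column.
        columns = data
    else:
        raise ValueError(f"unknown orient: {orient!r}")
    return {f"column_{i}": col.tolist() for i, col in enumerate(columns)}


data = np.array([[5, 2, 1], [1, 4, 3]])
by_row = columns_from_array(data, orient="row")  # 3 columns, 2 rows
by_col = columns_from_array(data, orient="col")  # 2 columns, 3 rows
```

Reading the same array with `orient="col"` gives the same columns as reading its transpose with `orient="row"`, which is why a column-oriented default would surprise users coming from pandas.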
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @FBruzzesi, thanks for your review! I fixed the orientation from columns to rows
thanks @raisadz!
and thanks @FBruzzesi for reviewing!