feat(python): support DataFrame export to numpy structured/record arrays #8628

alexander-beedie (Collaborator) commented May 1, 2023

Closes #8564 (in conjunction with #8620, which provides structured array init support).

Adds an optional "structured" parameter to to_numpy, allowing straightforward export to structured arrays [1] (and, via a zero-copy .view(np.recarray) call, to record arrays).

Example

import numpy as np
import polars as pl

df = pl.DataFrame(
    {
        "foo": [1, 2, 3],
        "bar": [6.5, 7.0, 8.5],
        "ham": ["a", "b", "c"],
    },
    schema_overrides={"foo": pl.UInt8, "bar": pl.Float32},
)

Standard export to 2D array:

df.to_numpy()
# array([[1, 6.5, 'a'],
#        [2, 7.0, 'b'],
#        [3, 8.5, 'c']], dtype=object)

Structured array export:

df.to_numpy(structured=True)
# array([(1, 6.5, 'a'), (2, 7. , 'b'), (3, 8.5, 'c')],
#       dtype=[('foo', 'u1'), ('bar', '<f4'), ('ham', '<U1')])
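
As a rough illustration of what the structured result offers over the 2D object array, the sketch below reads individual columns back by field name. It uses a hand-built NumPy array as a stand-in for the output above (an assumption, so that it runs without polars); the variable names are illustrative only.

```python
import numpy as np

# Stand-in for df.to_numpy(structured=True): built by hand with the
# same field names and dtypes as the output shown above (assumption).
arr = np.array(
    [(1, 6.5, "a"), (2, 7.0, "b"), (3, 8.5, "c")],
    dtype=[("foo", "u1"), ("bar", "<f4"), ("ham", "<U1")],
)

# Each field keeps its own dtype instead of collapsing to object.
foo = arr["foo"]          # uint8 column
bar = arr["bar"]          # float32 column
names = arr.dtype.names   # field names, in column order

# Indexing by position yields one row; fields are still addressable by name.
first_row = arr[0]
first_ham = first_row["ham"]
```

Each field retains its numeric dtype here, whereas the standard 2D export of mixed columns falls back to dtype=object.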

Structured array to record array:

df.to_numpy(structured=True).view(np.recarray)
# rec.array([(1, 6.5, 'a'), (2, 7. , 'b'), (3, 8.5, 'c')],
#           dtype=[('foo', 'u1'), ('bar', '<f4'), ('ham', '<U1')])
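
The "zero-copy" nature of the view can be checked on the NumPy side with a small sketch. It again uses a hand-built structured array in place of the polars export (an assumption), and only demonstrates that .view(np.recarray) shares memory with the array it is called on; it says nothing about whether the export from polars itself copies data.

```python
import numpy as np

# Hand-built stand-in for df.to_numpy(structured=True) (assumption).
arr = np.array(
    [(1, 6.5, "a"), (2, 7.0, "b"), (3, 8.5, "c")],
    dtype=[("foo", "u1"), ("bar", "<f4"), ("ham", "<U1")],
)

# Reinterpret the same buffer as a record array; no data is copied.
rec = arr.view(np.recarray)
shares = np.shares_memory(arr, rec)

# Record arrays expose fields as attributes.
bar_values = rec.bar.tolist()

# Writing through the view is visible in the original array.
rec.foo[0] = 9
```

After the last line, arr["foo"][0] is also 9, since both objects refer to the same underlying buffer.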

Footnotes

  1. NumPy structured arrays: https://numpy.org/doc/stable/user/basics.rec.html

github-actions bot added the enhancement and python labels May 1, 2023
alexander-beedie force-pushed the export-numpy-structured-arrays branch 5 times, most recently from 189270c to 9af1b4d May 2, 2023 06:01
alexander-beedie force-pushed the export-numpy-structured-arrays branch from 9af1b4d to 35a0cb9 May 2, 2023 06:06
alexander-beedie merged commit 07b9f2a into pola-rs:main May 2, 2023
alexander-beedie deleted the export-numpy-structured-arrays branch May 2, 2023 08:21