Add support for numpy structured array conversion to and from #8564

deanm0000 · 2023-04-28T11:52:54Z

Problem description

A while back there was this question which would have benefited from being able to convert a structured array into a polars df.

Another use case for a to_structured_array method is feeding the result to something like statsmodels so that names and dtypes are preserved.

Right now I can do something like

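from numpy.lib.recfunctions import unstructured_to_structured

x_cols = ['col1', 'col2', 'col3']
# to_numpy() yields a 2D array with a single dtype, so every field shares it
X = unstructured_to_structured(df.select(x_cols).to_numpy())
# replace the default field names (f0, f1, ...) with the column names
X.dtype.names = x_cols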

but then the dtypes are all floats. I don't think that matters in the context of OLS but in other contexts it might be more important to retain dtypes.
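To illustrate why the dtypes collapse, here is a minimal NumPy-only sketch. The two columns are made-up data, and `np.column_stack` stands in for what a 2D `to_numpy()` export has to do: pick one common dtype for the whole array. It also assumes the `names` argument of `unstructured_to_structured` is an acceptable substitute for assigning `dtype.names` afterwards.

```python
import numpy as np
from numpy.lib.recfunctions import unstructured_to_structured

# Hypothetical data: one integer column and one float column.
ints = np.array([1, 2, 3], dtype=np.int64)
floats = np.array([0.5, 1.5, 2.5], dtype=np.float64)

# A 2D array can hold only one dtype, so the integers are upcast to float64.
unstructured = np.column_stack([ints, floats])

# Field names can be passed directly rather than patched in afterwards.
X = unstructured_to_structured(unstructured, names=["col1", "col2"])

# Both fields now report float64; the original int64 information is lost.
print(X.dtype)
```

The loss happens before `unstructured_to_structured` is ever called, which is why a per-column export is needed to keep the original types.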

Alternatively, other packages like patsy and formulaic produce their own Matrix datatype which is just an np.ndarray but it retains the column names in a way that statsmodels recognizes. I haven't dug into the source code enough to see how that works but that would be enough for me although perhaps not as extensible.


s-banach · 2023-04-28T13:03:14Z

Can you use df.to_pandas()?

deanm0000 · 2023-04-28T13:31:11Z

In a literal sense, yes, but I'm trying to reduce dependence on pandas.

tikkanz · 2023-04-29T10:50:15Z

Does the answer to this stackoverflow question help?

zundertj · 2023-04-29T18:24:31Z

For the statsmodels + formulaic interface, it seems rather easy to create your own wrapper; see "working with large datasets" in the user guide. Something along the lines of (note: not tested):

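class DataSet(dict):
    def __init__(self, df):
        self._df = df

    def __getitem__(self, key):
        # look the column up on the wrapped frame, not on a global `df`
        try:
            return self._df[key].to_numpy()
        except Exception as exc:
            raise KeyError(key) from exc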

deanm0000 · 2023-05-01T14:04:32Z

@tikkanz yes, it's helpful, although their return-trip method uses the structured array that they started from. That being said, we can fix that like this:

import numpy as np
# one field per column, typed from each column's numpy dtype
npdf = np.empty(df.height, dtype=[(x, df.head(0).get_column(x).to_numpy().dtype.str) for x in df.columns])
for x in df.columns:
    npdf[x] = df.get_column(x).to_numpy()

alexander-beedie · 2023-05-02T08:22:21Z

Done; we'll have native init/export support for numpy structured/record arrays in the upcoming 0.17.12 release 👍

deanm0000 · 2023-05-02T10:10:45Z

What's the syntax to export a structured array? Also, thanks for your efforts, they're much appreciated.

alexander-beedie · 2023-05-02T12:19:17Z

What's the syntax to export a structured array?

Have a look at the pull request - it's all detailed there ;)
TLDR: df.to_numpy(structured=True)

deanm0000 added the enhancement New feature or an improvement of an existing feature label Apr 28, 2023

deanm0000 mentioned this issue Apr 28, 2023

Add method to return two numpy arrays where first one is the first column and the second is all other columns #8565

Closed

alexander-beedie self-assigned this May 1, 2023

This was referenced May 1, 2023

feat(python): support transparent DataFrame init from numpy structured/record arrays. #8620

Merged

feat(python): support DataFrame export to numpy structured/record arrays #8628

Merged

alexander-beedie closed this as completed in #8628 May 2, 2023
