Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support mrecarrays in DataFrame constructor #3479

Closed
wants to merge 3 commits into from

Conversation

jnothman
Copy link
Contributor

This is intended as a fix for #3478. Sorry I do not have the time to write rigorous tests.

@jnothman
Copy link
Contributor Author

To illustrate my point, this is what the condition where all MaskedArray stuff is handled together looks like:

        elif isinstance(data, ma.MaskedArray):
            def unmask(arr):
                mask = ma.getmaskarray(data)
                if mask.any():
                    data, fill_value = _maybe_upcast(data, copy=True)
                    data[mask] = fill_value
                    return data
                return data.copy()

            if data.dtype.names:
                data_columns, data = _rec_to_dict(data)
                data = dict((k, unmask(v)) for k, v in data.iteritems())
                if columns is None:
                    columns = data_columns
                mgr = self._init_dict(data, index, columns, dtype=dtype)
            else:
                mgr = self._init_ndarray(unmask(data), index, columns, dtype=dtype,
                                         copy=copy)

Doesn't that second half look stupendously like the if isinstance(..., ndarray) case?? With an unnecessary dict creation thrown in for fun?

I still contend this is a case where code quality should not be sacrificed for marginal efficiency.

@jreback
Copy link
Contributor

jreback commented Apr 30, 2013

You could easily make a function to combine best of both worlds here (also would put unmask outside of Frame in any event). certainly try if you would like. Still need tests though. There are lots of cases and need to avoid breaking anything

@jnothman
Copy link
Contributor Author

The point is: you couldn't easily factor out the is/isn't rec-array case, because in unmasking already requires a non-rec array. You can easily factor out the unmasking, which is what I did.

Of course it requires testing (or alternatively, test smaller units like _unmask and _rec_to_dict and you can get away with fewer tests for the same coverage). But I don't have the time to put to that.

@jreback
Copy link
Contributor

jreback commented Apr 30, 2013

what is your use case for masked record arrays? why u creating them in the first place (rather than directly in pandas)? existing code?

@jnothman
Copy link
Contributor Author

I am not sure the use-case is relevant, seeing as it would be natural for pandas to support them, being the equivalent numpy data structure to what pandas works with, and given that pandas already supports both numpy masking and numpy records / structured arrays. But I am considering mrecarray output from a model selection procedure in scikit-learn: to handle some cases (where the set of parameters set in each iteration is not constant) masking or equivalent is required; leaving it in a numpy structure avoids binding users to pandas and avoids the project including a pandas dependency.

@jnothman
Copy link
Contributor Author

Okay. Here is a test that ensures input with mrecarray is identical to equivalent input with dicts.

@jreback
Copy link
Contributor

jreback commented Sep 13, 2013

closing in favor of #4836

@jreback jreback closed this Sep 13, 2013
@jnothman
Copy link
Contributor Author

I don't get why you prefer more specialisation rather than code reuse, but I'm glad the feature is now supported.

@jreback
Copy link
Contributor

jreback commented Sep 14, 2013

your solution forced unmasking on evey ndarray which is not good

also the style was different than the rest

your tests were gr8!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants