-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support mrecarrays in DataFrame constructor #3479
Conversation
To illustrate my point, this is what the condition where all elif isinstance(data, ma.MaskedArray):
def unmask(arr):
mask = ma.getmaskarray(data)
if mask.any():
data, fill_value = _maybe_upcast(data, copy=True)
data[mask] = fill_value
return data
return data.copy()
if data.dtype.names:
data_columns, data = _rec_to_dict(data)
data = dict((k, unmask(v)) for k, v in data.iteritems())
if columns is None:
columns = data_columns
mgr = self._init_dict(data, index, columns, dtype=dtype)
else:
mgr = self._init_ndarray(unmask(data), index, columns, dtype=dtype,
copy=copy) Doesn't that second half look stupendously like the I still contend this is a case where code quality should not be sacrificed for marginal efficiency. |
You could easily make a function to combine best of both worlds here (also would put |
The point is: you couldn't easily factor out the is/isn't rec-array case, because in unmasking already requires a non-rec array. You can easily factor out the unmasking, which is what I did. Of course it requires testing (or alternatively, test smaller units like |
what is your use case for masked record arrays? why u creating them in the first place (rather than directly in pandas)? existing code? |
I am not sure the use-case is relevant, seeing as it would be natural for pandas to support them, being the equivalent numpy data structure to what pandas works with, and given that pandas already supports both numpy masking and numpy records / structured arrays. But I am considering mrecarray output from a model selection procedure in scikit-learn: to handle some cases (where the set of parameters set in each iteration is not constant) masking or equivalent is required; leaving it in a numpy structure avoids binding users to pandas and avoids the project including a pandas dependency. |
Okay. Here is a test that ensures input with mrecarray is identical to equivalent input with dicts. |
closing in favor of #4836 |
I don't get why you prefer more specialisation rather than code reuse, but I'm glad the feature is now supported. |
your solution forced unmasking on evey ndarray which is not good also the style was different than the rest your tests were gr8! |
This is intended as a fix for #3478.
Sorry I do not have the time to write rigorous tests.