BUG: Series.map({}) should retain the original dtype #18509

jreback · 2017-11-27T00:23:52Z

In [1]: pd.Series(pd.date_range("2012-01-01", periods=3)).map({})
Out[1]: 
0   NaN
1   NaN
2   NaN
dtype: float64

In [2]: pd.date_range("2012-01-01", periods=3).map({})
Out[2]: DatetimeIndex(['NaT', 'NaT', 'NaT'], dtype='datetime64[ns]', freq=None)

[1] should infer datetime64 as well, as [2] does.
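Until `Series.map` itself settles on a dtype for an all-missing result, a caller who needs datetime64 can convert explicitly. This is a minimal sketch that assumes a reasonably recent pandas. `pd.to_datetime` turns float NaN into NaT and leaves an existing datetime64 result as datetime64, so the final dtype does not depend on what `map({})` returned:

```python
import pandas as pd

s = pd.Series(pd.date_range("2012-01-01", periods=3))

# An empty dict matches no key, so every mapped element is missing; which
# dtype pandas assigns to that all-missing result is what this issue is about.
mapped = s.map({})

# Explicit conversion yields a datetime64 Series of NaT either way.
restored = pd.to_datetime(mapped)
print(restored.dtype)
```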

@jschendel comments

On master:

In [2]: pd.__version__
Out[2]: '0.22.0.dev0+241.gf745e52'

In [3]: pd.date_range('20170101', periods=4).map({})
Out[3]: DatetimeIndex(['NaT', 'NaT', 'NaT', 'NaT'], dtype='datetime64[ns]', freq=None)

IntervalIndex, CategoricalIndex, and Index with object dtype get coerced to Float64Index:

In [4]: pd.interval_range(0, 5).map({})
Out[4]: Float64Index([nan, nan, nan, nan, nan], dtype='float64')

In [5]: pd.CategoricalIndex(list('abca')).map({})
Out[5]: Float64Index([nan, nan, nan, nan], dtype='float64')

In [6]: pd.Index(list('abca')).map({})
Out[6]: Float64Index([nan, nan, nan, nan], dtype='float64')

PeriodIndex and TimedeltaIndex get coerced to DatetimeIndex:

In [7]: pd.period_range('2017Q1', periods=4, freq='Q').map({})
Out[7]: DatetimeIndex(['NaT', 'NaT', 'NaT', 'NaT'], dtype='datetime64[ns]', freq=None)

In [8]: pd.timedelta_range(1, periods=4).map({})
Out[8]: DatetimeIndex(['NaT', 'NaT', 'NaT', 'NaT'], dtype='datetime64[ns]', freq=None)
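You can rerun the survey above against whatever pandas is installed using a short loop. This sketch only prints the container type and dtype of each `map({})` result. The output reflects the installed version and need not match the 0.22.0.dev0 session shown here:

```python
import pandas as pd

# Same constructors as in the session above.
indexes = {
    "DatetimeIndex": pd.date_range("20170101", periods=4),
    "IntervalIndex": pd.interval_range(0, 5),
    "CategoricalIndex": pd.CategoricalIndex(list("abca")),
    "Index": pd.Index(list("abca")),
    "PeriodIndex": pd.period_range("2017Q1", periods=4, freq="Q"),
    "TimedeltaIndex": pd.timedelta_range(1, periods=4),
}

for name, idx in indexes.items():
    # An empty dict matches nothing, so every mapped value is missing;
    # only the container type and dtype of the result are of interest.
    result = idx.map({})
    print(f"{name:<17} {str(idx.dtype):<32} -> {type(result).__name__} ({result.dtype})")
```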

jreback · 2017-11-27T02:06:46Z

@jschendel yeah, [7] and [8] 'work' but should return the correct type (a bug I introduced). The others (cat / interval) are not well tested, so it's not surprising they are wrong. [6] is also not well tested.

jorisvandenbossche · 2017-11-27T08:47:31Z

Repeating my comment from #18491 here:

I am not sure I find this special casing a good idea. I would rather keep .map 'dummy' and simply create a new index/series from the newly created values, without trying to infer anything more than pd.Index(..) / pd.Series(..) already does.

Do you have a specific use case for this (i.e., for wanting to be smart here)?
I think the user can always change the dtype after doing the map if they want a specific dtype or want to preserve the index type.
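The "dummy" map proposed here can be sketched as a plain element-wise loop followed by `pd.Index(...)`, with no special-casing of all-NaN output. `dummy_map` is a hypothetical helper for illustration, not a pandas API, and it handles only dict and callable mappers:

```python
import numpy as np
import pandas as pd


def dummy_map(index: pd.Index, mapper) -> pd.Index:
    """Hypothetical 'dummy' map: apply mapper element-wise, then let
    pd.Index infer the result dtype with nothing extra on top.

    Simplification: a dict mapper yields np.nan for missing keys; dict
    subclasses defining __missing__ are not treated specially here.
    """
    if isinstance(mapper, dict):
        values = [mapper.get(x, np.nan) for x in index]
    else:
        values = [mapper(x) for x in index]
    return pd.Index(values)


dti = pd.date_range("2012-01-01", periods=3)
print(dummy_map(dti, {}))                                    # float64 Index of NaN
print(dummy_map(dti, lambda ts: ts + pd.Timedelta(days=1)))  # datetime64 Index
print(dummy_map(dti, lambda ts: ts.day))                     # integer Index
```

Under this design, a user who wants the original index type back wraps or converts the result explicitly.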

jorisvandenbossche · 2017-11-27T09:03:42Z

To further clarify: I think we are just trying to be too smart here, introducing a lot of corner cases. I would say: just let the user handle this (if he/she wants to retain the index type, he/she can simply wrap the map call in the appropriate index type).
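Wrapping the map call in the desired index type, as suggested here, might look like this for the CategoricalIndex case from earlier in the thread. This minimal sketch assumes `map({})` returns plain missing values. Passing the original `categories` restores the same `CategoricalDtype`, and every element becomes NaN because no mapped value is one of the categories:

```python
import pandas as pd

ci = pd.CategoricalIndex(list("abca"))

# Let map do its plain inference first...
mapped = ci.map({})

# ...then wrap the result in the index type (and categories) you actually want.
ci_restored = pd.CategoricalIndex(mapped, categories=ci.categories)
print(ci_restored.dtype)
```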

So what is the rationale for trying to keep the index type in the all-NaN case? Is it because you think a user typically uses map to create a new index of the same type but with different values, and in such a case a DatetimeIndex becoming a FloatIndex is surprising?
In my personal use cases of map, I am typically not interested in preserving the index type, as my functions typically return something completely different (in terms of dtype). In that case, this "trying to preserve the dtype" can also give unexpected results the other way around. Assume I have a function that returns floats, but that, for the particular index I have, happens to return only np.nan:

In [44]: pd.date_range("2010-01-01", periods=3, freq='A').map(lambda x: np.nan if x.year > 2010 else 1)
Out[44]: Float64Index([1.0, nan, nan], dtype='float64')

In [45]: pd.date_range("2011-01-01", periods=3, freq='A').map(lambda x: np.nan if x.year > 2010 else 1)
Out[45]: DatetimeIndex(['NaT', 'NaT', 'NaT'], dtype='datetime64[ns]', freq=None)

In that case I, unexpectedly, get a DatetimeIndex instead of a FloatIndex.
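The asymmetry between Out[44] and Out[45] follows from any rule of the form "if every result is missing, keep the original dtype". The toy model below makes that concrete in plain Python. `map_preserving_all_nan`, `is_missing`, and `year_ends` are hypothetical helpers, not pandas code, and dtypes are represented as plain strings:

```python
import math
from datetime import date

NAN = float("nan")


def is_missing(value) -> bool:
    """True for float NaN, the only missing marker this toy model knows."""
    return isinstance(value, float) and math.isnan(value)


def map_preserving_all_nan(values, func, original_dtype):
    """Hypothetical rule: keep original_dtype when every result is missing.

    Otherwise use crude inference: all-numeric results become 'float64',
    anything else becomes 'object'.
    """
    results = [func(v) for v in values]
    if all(is_missing(r) for r in results):
        return original_dtype, results
    if all(isinstance(r, (int, float)) for r in results):
        return "float64", [float(r) for r in results]
    return "object", results


def year_ends(first_year, periods):
    # Year-end dates, like the freq='A' ranges in the session above.
    return [date(first_year + i, 12, 31) for i in range(periods)]


def func(d):
    return NAN if d.year > 2010 else 1


dtype_2010, _ = map_preserving_all_nan(year_ends(2010, 3), func, "datetime64[ns]")
dtype_2011, _ = map_preserving_all_nan(year_ends(2011, 3), func, "datetime64[ns]")
print(dtype_2010, dtype_2011)  # float64 datetime64[ns]
```

The same function yields a different result dtype depending only on whether one input happens to map to a non-missing value.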

To put it differently, we could no longer distinguish a function that returns NaN from one that returns NaT:

In [39]: pd.date_range("2012-01-01", periods=3).map(lambda x: pd.NaT)
Out[39]: DatetimeIndex(['NaT', 'NaT', 'NaT'], dtype='datetime64[ns]', freq=None)

In [40]: pd.date_range("2012-01-01", periods=3).map(lambda x: np.nan)
Out[40]: DatetimeIndex(['NaT', 'NaT', 'NaT'], dtype='datetime64[ns]', freq=None)
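By contrast, building the result with plain `pd.Index` inference, as proposed earlier in the thread, keeps the two cases apart. An all-NaT list infers a datetime64 dtype, while an all-NaN list infers float64. This is a minimal sketch that assumes current pandas inference rules:

```python
import numpy as np
import pandas as pd

dti = pd.date_range("2012-01-01", periods=3)

# Collect what each function would return and let pd.Index infer the dtype.
from_nat = pd.Index([pd.NaT for _ in dti])
from_nan = pd.Index([np.nan for _ in dti])

print(from_nat.dtype, from_nan.dtype)
```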

Also, you added a special case that tries to preserve the uint64 dtype for UInt64Index. Shouldn't we then do this for Series with other dtypes as well?

And about [1] (the first example at the top) being a bug: this has always been the behaviour, so if we change it I would see it as an API-breaking change rather than a bug fix.

jreback added the Bug, Dtype Conversions, Missing-data, and Reshaping labels Nov 27, 2017

jreback added this to the 0.22.0 milestone Nov 27, 2017

jreback self-assigned this Nov 27, 2017

jreback added a commit to jreback/pandas that referenced this issue Nov 27, 2017

API: empty map should not infer

c4fdd71

closes pandas-dev#18509

jreback mentioned this issue Nov 27, 2017

API: empty map should not infer #18517

Merged

jreback added a commit to jreback/pandas that referenced this issue Dec 2, 2017

API: empty map should not infer

2758022

closes pandas-dev#18509

jreback closed this as completed in #18517 Dec 2, 2017

jreback added a commit that referenced this issue Dec 2, 2017

API: empty map should not infer (#18517)

e1ba19a

closes #18509
