PERF: regression in Series constructor #30564

TomAugspurger · 2019-12-30T20:42:33Z

This is captured in the SeriesConstructor.time_constructor asv.

In [8]: import pandas as pd
   ...: import numpy as np
   ...:
   ...: data = np.arange(1000)
   ...: index = pd.date_range('2000', periods=len(data))
   ...: data = dict(zip(index, data))
   ...: s = pd.Series(data, index=index)

On master

848 ms ± 10.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

On 0.25.x

   ...: %timeit pd.Series(data, index=index)
82.5 ms ± 2.05 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Looking into it a bit now. We're spending a lot more time in ensure_index -> is_period_dtype / is_dtype / construct_from_string.
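For anyone who wants to see where the time goes on their own build, a call chain like the one above is the sort of thing cProfile surfaces when sorted by cumulative time. This is a minimal stdlib-only sketch, not the exact profiling session used here: `profile_cumulative` and `slow_repr_workload` are hypothetical names, and the workload is a stand-in for the constructor call.

```python
import cProfile
import io
import pstats


def profile_cumulative(func, limit=10):
    """Run func() under cProfile and return a report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()


def slow_repr_workload():
    # Hypothetical stand-in workload: formats a large tuple several times.
    keys = tuple(range(5000))
    for _ in range(20):
        "{}".format(keys)


report = profile_cumulative(slow_repr_workload)
print(report)
```

With pandas installed, passing `lambda: pd.Series(data, index=index)` as `func` should profile the constructor from the reproducer instead of the stand-in.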


TomAugspurger · 2019-12-30T20:56:33Z

Hmm, so in Series._init_dict we convert the dict's keys (the DatetimeIndex values) to a tuple:

        if data:
            keys, values = zip(*data.items())
            values = list(values)

and then go back through Series.__init__ with index=keys, the tuple of Timestamps, which is slow.
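To make the conversion concrete, here is a small sketch of what `zip(*data.items())` does to the keys. It uses stdlib `datetime` objects as a stand-in for pandas Timestamps (an assumption, to keep the example dependency-free); the unpacking behaves the same way for any dict.

```python
from datetime import datetime, timedelta

# Stand-in for the DatetimeIndex: five consecutive days.
start = datetime(2000, 1, 1)
index = [start + timedelta(days=i) for i in range(5)]
data = dict(zip(index, range(5)))

# Same unpacking as in the snippet above: keys come back as a plain tuple.
keys, values = zip(*data.items())
values = list(values)

print(type(keys).__name__)
print(keys[0], values[0])
```

Whatever index type the keys came from is lost at this point: the second pass through the constructor only sees a tuple of scalars and has to infer the index type again.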

TomAugspurger · 2019-12-30T21:18:26Z

Oh fun.

In Index.__init__ we check is_period_dtype(data), where data is the tuple of Timestamps. That eventually calls PeriodDtype.construct_from_string with the data, which exists to support string inputs such as is_period_dtype("Period[D]"). But of course a tuple of timestamps isn't a PeriodDtype, so we raise a TypeError.

In #3047, we include the string in the error message, which takes a long time to format for the long tuples. A solution is to raise with a different error message for non-string inputs.
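The cost difference can be sketched without pandas. This is an illustration of the idea only, not the actual pandas code: `parse_eager`, `parse_guarded`, and `error_message` are hypothetical names, the message wording is made up, and `datetime` objects stand in for Timestamps.

```python
import timeit
from datetime import datetime, timedelta


def parse_eager(value):
    # Hypothetical stand-in: always interpolates the input into the message,
    # so a long tuple gets fully formatted before the error is raised.
    raise TypeError("Could not construct dtype from '{}'".format(value))


def parse_guarded(value):
    # Hypothetical stand-in: non-strings get a short message with only the type name.
    if not isinstance(value, str):
        raise TypeError("expected a string, got {}".format(type(value).__name__))
    raise TypeError("Could not construct dtype from '{}'".format(value))


def error_message(parser, value):
    """Return the TypeError message raised by parser(value)."""
    try:
        parser(value)
    except TypeError as exc:
        return str(exc)
    raise AssertionError("parser unexpectedly succeeded")


start = datetime(2000, 1, 1)
keys = tuple(start + timedelta(days=i) for i in range(1000))

eager_message = error_message(parse_eager, keys)
guarded_message = error_message(parse_guarded, keys)
print(len(eager_message), len(guarded_message))

# Timings vary by machine; compare the two numbers rather than their absolute values.
eager_time = timeit.timeit(lambda: error_message(parse_eager, keys), number=50)
guarded_time = timeit.timeit(lambda: error_message(parse_guarded, keys), number=50)
print("eager: {:.4f}s, guarded: {:.4f}s".format(eager_time, guarded_time))
```

The caller swallows the TypeError either way, so the long message is never shown to anyone. Checking the type first keeps the raise cheap for non-string inputs while leaving the message for real strings unchanged.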


TomAugspurger added the Performance (Memory or execution speed performance) label Dec 30, 2019

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Dec 30, 2019

PERF: Fixed performance regression in Series init

dc9a20d

Closes pandas-dev#30564

TomAugspurger mentioned this issue Dec 30, 2019

PERF: Fixed performance regression in Series init #30571

Merged

jbrockmendel closed this as completed in #30571 Dec 31, 2019

jbrockmendel pushed a commit that referenced this issue Dec 31, 2019

PERF: Fixed performance regression in Series init (#30571)

ee42275

* PERF: Fixed performance regression in Series init Closes #30564 * avoid calling

hweecat pushed a commit to hweecat/pandas that referenced this issue Jan 1, 2020

PERF: Fixed performance regression in Series init (pandas-dev#30571)

7e13b52

* PERF: Fixed performance regression in Series init Closes pandas-dev#30564 * avoid calling
