
REF: Special case NumericIndex._append_same_dtype() #17307

Merged
merged 2 commits into from
Oct 28, 2017

Conversation

toobaz
Member

@toobaz toobaz commented Aug 22, 2017

  • tests passed
  • passes `git diff master -u -- "*.py" | flake8 --diff`

Simple patch which results in a modest speedup (~8%) of the following:

In [2]: idxes = [pd.Int64Index(range(i)) for i in range(1, 10)]*200
In [3]: %timeit idxes[0].append(idxes[1:])

I don't know whether this is worth the extra code... just inquiring so that I can finalize #16236.
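The fast path being special-cased can be sketched roughly as follows (a hypothetical simplification, not the actual patch, which modifies `NumericIndex._concat_same_dtype` in pandas/core/indexes/numeric.py): when all inputs share one numeric dtype, concatenate the underlying ndarrays directly instead of routing through object dtype.

```python
import numpy as np
import pandas as pd

def concat_same_dtype_sketch(indexes, name=None):
    # All inputs share the same numeric dtype, so we can concatenate the
    # underlying ndarrays directly rather than converting to object dtype.
    arrays = [idx.values for idx in indexes]
    return pd.Index(np.concatenate(arrays), name=name)

# Mirrors the benchmark above (Int64Index is spelled via dtype= on newer pandas).
idxes = [pd.Index(range(i), dtype="int64") for i in range(1, 10)] * 200
result = concat_same_dtype_sketch(idxes)
```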

@toobaz toobaz mentioned this pull request Aug 22, 2017
2 tasks
@toobaz toobaz force-pushed the numericindex_append branch 2 times, most recently from 42acf07 to b6c0f19 Compare August 22, 2017 09:16
@codecov

codecov bot commented Aug 22, 2017

Codecov Report

Merging #17307 into master will decrease coverage by 0.01%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master   #17307      +/-   ##
==========================================
- Coverage   91.03%   91.01%   -0.02%     
==========================================
  Files         162      162              
  Lines       49567    49572       +5     
==========================================
- Hits        45121    45117       -4     
- Misses       4446     4455       +9
Flag Coverage Δ
#multiple 88.79% <100%> (ø) ⬆️
#single 40.25% <100%> (-0.06%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/numeric.py 97.22% <100%> (+0.03%) ⬆️
pandas/core/dtypes/concat.py 98.29% <100%> (+0.01%) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.72% <0%> (-0.1%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2f00159...b6c0f19.

@codecov

codecov bot commented Aug 22, 2017

Codecov Report

Merging #17307 into master will decrease coverage by 0.01%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master   #17307      +/-   ##
==========================================
- Coverage   91.24%   91.23%   -0.02%     
==========================================
  Files         163      163              
  Lines       50168    50176       +8     
==========================================
  Hits        45777    45777              
- Misses       4391     4399       +8
Flag Coverage Δ
#multiple 89.04% <100%> (ø) ⬆️
#single 40.28% <80%> (-0.06%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/numeric.py 97.27% <100%> (+0.03%) ⬆️
pandas/core/dtypes/concat.py 99.14% <100%> (+0.01%) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.75% <0%> (-0.1%) ⬇️
pandas/core/indexes/datetimes.py 95.5% <0%> (+0.09%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4489389...31c8a52.

@jreback
Contributor

jreback commented Aug 22, 2017

looks fine
can u add some asv for concat of Index
ones that hit your various paths would be good
IOW concat RangeIndex, IntIndex and Index cases would be great

@toobaz
Member Author

toobaz commented Aug 22, 2017

can u add some asv for concat of Index

pd.concat() doesn't accept Index inputs, so I guess you mean Index.append(): but then I think my previous PR already addressed all the cases you mention. I'm pretty sure at least it touches the code paths I added in #16213, in #16236 and in this PR.

@jorisvandenbossche
Member

@toobaz indeed append, but question was about asv benchmark, not tests

@toobaz
Member Author

toobaz commented Aug 22, 2017

@toobaz indeed append, but question was about asv benchmark, not tests

Aha, OK, I had no idea what "asv" meant. Looking into it.

@jreback jreback added Performance Memory or execution speed performance Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Aug 22, 2017
@toobaz toobaz force-pushed the numericindex_append branch from 47fa09f to 1b00189 Compare August 22, 2017 18:28
@toobaz
Member Author

toobaz commented Aug 23, 2017

I guess the benchmarking part is ready... but I wasn't able to test it (by the way, can you confirm that the content of this page is obsolete?). Is it automatically run anywhere?

@TomAugspurger
Contributor

TomAugspurger commented Aug 23, 2017

@toobaz yeah, that page is out of date. New docs are here. I'm tempted to just remove that wiki page. Any objections?

Results are run nightly and posted to http://pandas.pydata.org/speed/pandas/ (still setting this up)

@jorisvandenbossche
Member

I removed the content of the page and put a link to the contributing docs for now.

Member

@jorisvandenbossche jorisvandenbossche left a comment


small comment about the location of the benchmarks

from .pandas_vb_common import *


class Int64Indexing(object):
Member

Can you put this one in index_object.py? And maybe call the class IndexOps or something (not 'Indexing', as append has nothing to do with indexing)

def setup(self):
idx = Index(range(10))
self.ridx = [idx] * 10
self.iidx = [idx.astye(int)] * 10
Contributor

astype

class Int64Indexing(object):
goal_time = 0.2

def setup(self):
Contributor

make this bigger like 1000

Member

@toobaz to see what size is best (without being able to run asv), just do the timing yourself with %timeit, and set the size so that it takes something in the range of a few milliseconds to few tens of milliseconds
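The sizing advice above can be followed with a quick standalone timing loop (a sketch using the stdlib `timeit`; the choice of N = 1000 here is illustrative):

```python
import timeit
import pandas as pd

# Pick N so that one append of N small indexes lands in the range of a
# few milliseconds to a few tens of milliseconds, as suggested.
N = 1000
idx = pd.Index(range(10))
iidx = [idx.astype("int64")] * N

# Average over a handful of runs rather than relying on a single call.
elapsed = timeit.timeit(lambda: iidx[0].append(iidx[1:]), number=10) / 10
print(f"append of {N} indexes: {elapsed * 1e3:.2f} ms")
```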

@toobaz toobaz force-pushed the numericindex_append branch 2 times, most recently from bcc85e8 to 3827072 Compare August 23, 2017 16:06
@toobaz
Copy link
Member Author

toobaz commented Aug 23, 2017

OK, the benchmarks now take ~5 ms each.

idx = Index(range(10))
self.ridx = [idx] * N
self.iidx = [idx.astype(int)] * N
self.oidx = [idx.astype(str)] * N
Contributor

can you post these results (commit before your recent fix of range index concat) compared to this commit

@toobaz toobaz force-pushed the numericindex_append branch from 3827072 to 6324b86 Compare August 24, 2017 07:01
@pep8speaks

pep8speaks commented Aug 24, 2017

Hello @toobaz! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on October 28, 2017 at 06:48 Hours UTC

@toobaz
Member Author

toobaz commented Aug 24, 2017

can you post these results (commit before your recent fix of range index concat) compared to this commit

You mean here?

(notice I changed the benchmarks so the RangeIndex special-casing is actually tested - indexes are consecutive)

Before special-casing RangeIndex.append() (with 94ef7b6 ):

In [19]: %timeit ridx[0].append(ridx[1:])
100 loops, best of 3: 10.3 ms per loop

In [20]: %timeit iidx[0].append(iidx[1:])
100 loops, best of 3: 9.5 ms per loop

In [21]: %timeit oidx[0].append(oidx[1:])
100 loops, best of 3: 10.6 ms per loop

Before special-casing NumericIndex._concat_same_dtype() (with 2f00159 ):

In [6]: %timeit ridx[0].append(ridx[1:])
100 loops, best of 3: 6.15 ms per loop

In [7]: %timeit iidx[0].append(iidx[1:])
100 loops, best of 3: 10.9 ms per loop

In [8]: %timeit oidx[0].append(oidx[1:])
100 loops, best of 3: 11.7 ms per loop

Now (with 6324b86 ):

In [6]: %timeit ridx[0].append(ridx[1:])
100 loops, best of 3: 6.08 ms per loop

In [7]: %timeit iidx[0].append(iidx[1:])
100 loops, best of 3: 9.71 ms per loop

In [8]: %timeit oidx[0].append(oidx[1:])
100 loops, best of 3: 11.7 ms per loop

@toobaz toobaz force-pushed the numericindex_append branch from 6324b86 to b941cce Compare August 24, 2017 09:47
goal_time = 0.2

def setup(self):
N = 1000
Contributor

make this 10000
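Putting the review comments together, the benchmark class presumably ends up looking something like this (a hypothetical reconstruction, not the exact committed code): placed in index_object.py, named IndexOps, with N = 10000, timing the public Index.append() API, and using consecutive ranges so the RangeIndex special case is actually exercised.

```python
import pandas as pd

class IndexOps(object):
    goal_time = 0.2

    def setup(self):
        N = 10000
        # Consecutive ranges, so RangeIndex.append() can fuse them
        # into a single RangeIndex instead of falling back.
        self.ridx = [pd.RangeIndex(i * 10, (i + 1) * 10) for i in range(N)]
        self.iidx = [idx.astype("int64") for idx in self.ridx]
        self.oidx = [idx.astype(str) for idx in self.ridx]

    def time_concat_range(self):
        self.ridx[0].append(self.ridx[1:])

    def time_concat_int(self):
        self.iidx[0].append(self.iidx[1:])

    def time_concat_obj(self):
        self.oidx[0].append(self.oidx[1:])
```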

@jreback
Contributor

jreback commented Aug 24, 2017

@toobaz I wanted to see the asv results for these benchmarks:
`asv continuous master thisbranch --bench concat` (you might want to name these index_concat actually)

and I don't care about the internal benchmarks (a `_*` function), rather the public API `.append()`

@toobaz toobaz force-pushed the numericindex_append branch from b941cce to c45d2a5 Compare August 24, 2017 10:28
@toobaz
Member Author

toobaz commented Aug 24, 2017

@toobaz I wanted to see the asv results for these benchmarks:
`asv continuous master thisbranch --bench concat` (you might want to name these index_concat actually)

OK, when I find the time to set up asv.

and I don't care about the internal benchmarks (a _*) function, rather the public API .append()

Not following you here... indeed I'm testing only public methods.

@jreback
Contributor

jreback commented Aug 25, 2017

asv_bench/benchmarks/index_concat.py is included but no changes.

@jorisvandenbossche
Member

asv_bench/benchmarks/index_concat.py is included but no changes.

that's a leftover that should be removed, but the actual benchmarks are in the other file and time append

@toobaz toobaz force-pushed the numericindex_append branch from c45d2a5 to fb05c39 Compare August 25, 2017 22:12
@toobaz
Member Author

toobaz commented Aug 25, 2017

asv_bench/benchmarks/index_concat.py is included but no changes.

oops fixed, sorry

@toobaz toobaz force-pushed the numericindex_append branch 4 times, most recently from 11b3172 to 9f69db2 Compare August 26, 2017 10:37
@toobaz
Member Author

toobaz commented Aug 26, 2017

@jreback @jorisvandenbossche
Does anyone have a hint about what's happening in pandas.tests.indexes.test_category.TestCategoricalIndex.test_reindexing? It fails randomly on Travis (on other branches too) and I really can't understand what could cause the error.

@toobaz toobaz force-pushed the numericindex_append branch from 9f69db2 to 795305b Compare August 26, 2017 15:12
@jreback
Contributor

jreback commented Aug 26, 2017

#17323

@toobaz toobaz force-pushed the numericindex_append branch from 795305b to 3123fcf Compare October 24, 2017 11:27
@toobaz
Member Author

toobaz commented Oct 24, 2017

Rebased and ran asv:

· Running 16 total benchmarks (2 commits * 1 environments * 8 benchmarks)
[  0.00%] · For pandas commit hash 3123fcff:
[  0.00%] ·· Building for conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt.......................................................
[  0.00%] ·· Benchmarking conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[  6.25%] ··· Running categoricals.Categoricals.time_concat                                                                                                                          4.67±0.07ms
[ 12.50%] ··· Running index_object.IndexOps.time_concat_int                                                                                                                           116±0.01ms
[ 18.75%] ··· Running index_object.IndexOps.time_concat_obj                                                                                                                            148±0.6ms
[ 25.00%] ··· Running index_object.IndexOps.time_concat_range                                                                                                                        67.2±0.06ms
[ 31.25%] ··· Running join_merge.Concat.time_concat_empty_frames1                                                                                                                        207±1μs
[ 37.50%] ··· Running join_merge.Concat.time_concat_empty_frames2                                                                                                                        211±2μs
[ 43.75%] ··· Running join_merge.Concat.time_concat_series_axis1                                                                                                                         111±1ms
[ 50.00%] ··· Running join_merge.Concat.time_concat_small_frames                                                                                                                      38.1±0.1ms
[ 50.00%] · For pandas commit hash e1dabf37:
[ 50.00%] ·· Building for conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt...
[ 50.00%] ·· Benchmarking conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 56.25%] ··· Running categoricals.Categoricals.time_concat                                                                                                                          4.88±0.05ms
[ 62.50%] ··· Running index_object.IndexOps.time_concat_int                                                                                                                            131±0.2ms
[ 68.75%] ··· Running index_object.IndexOps.time_concat_obj                                                                                                                            144±0.7ms
[ 75.00%] ··· Running index_object.IndexOps.time_concat_range                                                                                                                         69.6±0.7ms
[ 81.25%] ··· Running join_merge.Concat.time_concat_empty_frames1                                                                                                                        208±2μs
[ 87.50%] ··· Running join_merge.Concat.time_concat_empty_frames2                                                                                                                        206±1μs
[ 93.75%] ··· Running join_merge.Concat.time_concat_series_axis1                                                                                                                         110±1ms
[100.00%] ··· Running join_merge.Concat.time_concat_small_frames                                                                                                                      38.2±0.2ms

       before           after         ratio
     [e1dabf37]       [3123fcff]
-       131±0.2ms       116±0.01ms     0.88  index_object.IndexOps.time_concat_int

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.

@jreback
Contributor

jreback commented Oct 24, 2017

IIRC, the purpose of this PR was to make the code more consistent, with a small perf bump as a result?

@toobaz
Member Author

toobaz commented Oct 24, 2017

The patch avoids an int -> object -> int roundtrip by not relying on _concat_index_asobject, hence yes, resulting in the modest speedup.
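The roundtrip being avoided can be illustrated in isolation (a sketch with plain NumPy arrays, not the actual pandas internals):

```python
import numpy as np

parts = [np.arange(1000, dtype="int64") for _ in range(100)]

# What going through _concat_index_asobject implies:
# int64 -> object -> int64.
roundtrip = np.concatenate([p.astype(object) for p in parts]).astype("int64")

# What the special case does instead: stay in int64 throughout.
direct = np.concatenate(parts)

# Both routes give the same result; the direct one just skips
# the per-element boxing and unboxing.
assert roundtrip.dtype == direct.dtype == np.dtype("int64")
assert (roundtrip == direct).all()
```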

@jreback jreback added this to the 0.22.0 milestone Oct 28, 2017
@jreback
Contributor

jreback commented Oct 28, 2017

lgtm. pls rebase just to make sure. ping on green.

@toobaz toobaz force-pushed the numericindex_append branch from 3123fcf to 31c8a52 Compare October 28, 2017 06:48
@toobaz
Member Author

toobaz commented Oct 28, 2017

@jreback ping

@jreback jreback merged commit 46826ad into pandas-dev:master Oct 28, 2017
@jreback
Contributor

jreback commented Oct 28, 2017

thanks!

@toobaz toobaz deleted the numericindex_append branch October 28, 2017 17:06
peterpanmj pushed a commit to peterpanmj/pandas that referenced this pull request Oct 31, 2017
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
5 participants