ENH: _dir_additions returns also the first level of a MultiIndex #16326

BibMartin · 2017-05-11T09:29:24Z

tests added / passed
passes git diff upstream/master --name-only -- '*.py' | flake8 --diff
whatsnew entry

Assuming a DataFrame with MultiIndex columns like this:

>> df
    foo   bar      
          foo   bar
0  0.32  0.15  0.45
1  0.62  0.73  0.36
2  0.21  0.68  0.10
3  0.05  0.36  0.90

one can access df['foo'] and df['bar'] with the shortcuts df.foo and df.bar,
but one doesn't benefit from autocompletion because 'foo' and 'bar' are not listed in dir(df).

>> 'bar' in dir(df)
False

This PR extends df._dir_additions so that the first level of a MultiIndex is listed in dir(df).

>> 'bar' in dir(df)
True
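
For readers who want to see the mechanism in isolation, here is a minimal pandas-free sketch. `ToyFrame` is a hypothetical stand-in for a DataFrame (it is not pandas code), and tuples model MultiIndex column labels; only the idea of feeding first-level labels into `__dir__` comes from this PR.

```python
class ToyFrame:
    """Hypothetical stand-in for a DataFrame; not the pandas implementation."""

    def __init__(self, columns):
        # Tuples model MultiIndex labels, plain values model a flat Index.
        self.columns = list(columns)

    def _dir_additions(self):
        # Use the first level of tuple labels, the label itself otherwise.
        first_level = [c[0] if isinstance(c, tuple) else c for c in self.columns]
        # Keep only strings that are valid Python identifiers.
        return {c for c in first_level if isinstance(c, str) and c.isidentifier()}

    def __dir__(self):
        # Merge the regular attributes with the column-derived names.
        return sorted(set(super().__dir__()) | self._dir_additions())


# Same column layout as the example above: foo, bar/foo, bar/bar.
df = ToyFrame([("foo", ""), ("bar", "foo"), ("bar", "bar")])
has_bar = "bar" in dir(df)
```

Tab completion in most shells relies on `dir()`, which is why extending `__dir__` is enough to surface the names.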

codecov · 2017-05-11T10:05:37Z

Codecov Report

Merging #16326 into master will increase coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #16326      +/-   ##
==========================================
+ Coverage   90.37%   90.37%   +<.01%     
==========================================
  Files         161      161              
  Lines       50863    50863              
==========================================
+ Hits        45966    45969       +3     
+ Misses       4897     4894       -3

Flag	Coverage Δ
#multiple	`88.16% <ø> (+0.02%)`	⬆️
#single	`40.33% <ø> (-0.11%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/generic.py	`91.94% <ø> (ø)`	⬆️
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/core/frame.py	`97.59% <0%> (-0.1%)`	⬇️
pandas/plotting/_converter.py	`65.05% <0%> (+1.81%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0607e03...456a932. Read the comment docs.

codecov · 2017-05-11T10:05:43Z

Codecov Report

❗ No coverage uploaded for pull request base (master@e909ea0). Click here to learn what that means.
The diff coverage is 100%.

@@            Coverage Diff            @@
##             master   #16326   +/-   ##
=========================================
  Coverage          ?   91.56%           
=========================================
  Files             ?      153           
  Lines             ?    51272           
  Branches          ?        0           
=========================================
  Hits              ?    46947           
  Misses            ?     4325           
  Partials          ?        0

Flag	Coverage Δ
#multiple	`89.42% <100%> (?)`
#single	`40.68% <0%> (?)`

Impacted Files	Coverage Δ
pandas/core/generic.py	`95.9% <100%> (ø)`

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e909ea0...4ee5b9f. Read the comment docs.

jreback · 2017-05-11T11:21:30Z

I am +0 on this. This adds some needless complexity, though from a user POV it might be nice. Would need some additional tests to consider.

BibMartin · 2017-05-11T11:31:45Z

@jreback

Would need some additional tests to consider.

Sure. I plan to add tests if the principle is accepted. Any ideas are welcome.
Btw, I've not seen whether there are tests about dir behavior.

jreback · 2017-05-11T11:36:57Z

Btw, I've not seen whether there are tests about dir behavior.

bash-3.2$ grep -R 'dir(' pandas/tests/

BibMartin · 2017-05-13T11:10:53Z

Added test_tab_completion for DataFrames, inspired by the Series one.

jreback · 2017-08-01T22:55:32Z

@jorisvandenbossche thoughts here

jreback · 2017-09-23T16:58:48Z

this is ok in principle. can you rebase / update

jreback · 2017-11-12T19:13:50Z

can you rebase

BibMartin · 2017-11-29T06:02:00Z

Sorry, it took me a long time to rebase.

Do I need to add a whatsnew entry? In which version file?

jreback · 2017-11-29T12:06:14Z

pandas/core/generic.py

@@ -196,7 +196,11 @@ def __unicode__(self):
    def _dir_additions(self):
        """ add the string-like attributes from the info_axis """
        additions = set([c for c in self._info_axis
-                         if isinstance(c, string_types) and isidentifier(c)])
+                         if isinstance(c, string_types) and isidentifier(c)] +
+                        [c[0] for c in self._info_axis


I would rather move both of these to (private) methods on Index, overridden on MultiIndex; this becomes much simpler

e.g.

def _to_identifiers(self):
    return [c for c in self if isinstance(c, string_types) and isidentifier(c)]

with some doc-strings & tests

then this method becomes really simple.

BibMartin · 2017-11-30

Now that I understand the code better, I realize that another simple implementation could be:

additions = set([c for c in self._info_axis.get_level_values(0) if isinstance(c, string_types) and isidentifier(c)])

Please tell me which method would be best.

jreback · 2017-12-01

yes this looks reasonable. pls add a comment on what is going on.
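
The two designs discussed in this thread can be compared with a small sketch. `ToyIndex` and `ToyMultiIndex` are hypothetical classes, not pandas ones, and `str` / `str.isidentifier` stand in for the `string_types` / `isidentifier` compat helpers used in the diff.

```python
def _is_identifier(label):
    # Python 3 stand-in for: isinstance(c, string_types) and isidentifier(c)
    return isinstance(label, str) and label.isidentifier()


class ToyIndex:
    """Hypothetical flat index holding a list of labels."""

    def __init__(self, labels):
        self.labels = list(labels)

    def _to_identifiers(self):
        # Labels that can be reached as attributes.
        return [c for c in self.labels if _is_identifier(c)]


class ToyMultiIndex(ToyIndex):
    """Hypothetical multi-level index; each label is a tuple."""

    def get_level_values(self, level):
        return [label[level] for label in self.labels]

    def _to_identifiers(self):
        # Override: only first-level values are reachable as attributes.
        return [c for c in self.get_level_values(0) if _is_identifier(c)]


flat = ToyIndex(["a", "b c", 1])
multi = ToyMultiIndex([("A", "E"), ("B", "F")])
flat_names = flat._to_identifiers()
multi_names = multi._to_identifiers()
```

With the override in place the caller only needs `set(index._to_identifiers())`; the `get_level_values(0)` one-liner reaches the same result without adding a method, on the assumption that level 0 of a flat index is the index itself.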

BibMartin · 2017-12-01T10:09:08Z

Changes are done, and rebased. Btw, I've added a test on Series auto-completion.

jreback

can you add a note in 0.22, other enhancements.

jreback · 2017-12-01T11:32:21Z

pandas/tests/series/test_api.py

@@ -234,6 +234,39 @@ def test_tab_completion(self):
        assert 'str' not in dir(s)
        assert 'dt' in dir(s)  # as it is a datetime categorical

+    def test_index_tab_completion(self):
+        # dir contains string-like values of the Index.


you can parametrize this with index

jreback · 2017-12-01T11:33:00Z

pandas/tests/series/test_api.py

+                assert (not isinstance(x, string_types) or
+                        not isidentifier(x) or x in dir_s)
+
+        # dir contains string-like values of the MultiIndex first level.


you can add these in the parameterization above

jreback · 2017-12-01T11:35:40Z

one additional thing I think we need. If you have a very large index, _dir_additions actually takes quite a bit of time (this is not specific to this change).

So what I would do is: if the index has, say, < 100 elements, use the current _dir_additions, otherwise return an empty list! (it's essentially too big to use tab completion for anyhow). can you make this change and add an asv for this (could be a separate PR as well)
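
A rough sketch of the size guard proposed here, as a standalone function rather than the pandas method. The name `guarded_dir_additions` and the `max_size` parameter are illustrative assumptions, and an empty set is returned instead of an empty list so both branches have the same type.

```python
def guarded_dir_additions(labels, max_size=100):
    """Return identifier-like labels, or nothing when the index is large."""
    if len(labels) >= max_size:
        # Too many labels for tab completion to be useful; skip the scan.
        return set()
    return {c for c in labels if isinstance(c, str) and c.isidentifier()}


small = guarded_dir_additions(["a", "b", "not valid", 3])
large = guarded_dir_additions(["col{}".format(i) for i in range(10000)])
```

The guard costs one `len()` call, so the expensive filtering is skipped entirely for large indexes.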

jreback · 2017-12-01T11:37:35Z

xref #18587

jreback · 2017-12-01T14:49:03Z

you can actually use .unique(level=0) to make this efficient
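
Why deduplicating the first level helps can be shown without pandas: in a MultiIndex many tuples share the same first-level label, so the identifier check only needs to run once per distinct label. The helper below is a hypothetical pure-Python stand-in for that idea, not the pandas `unique` implementation.

```python
def unique_first_level(labels):
    """Order-preserving unique first-level values of tuple labels."""
    # dict.fromkeys keeps the first occurrence of each key, in order.
    return list(dict.fromkeys(label[0] for label in labels))


# 2000 tuples, but only two distinct first-level labels.
labels = [("bar", i) for i in range(1000)] + [("foo", i) for i in range(1000)]
firsts = unique_first_level(labels)
additions = {c for c in firsts if isinstance(c, str) and c.isidentifier()}
```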

BibMartin · 2017-12-05T18:28:05Z

I think it's ready to go; please tell me if I should squash the commits.

jreback · 2017-12-06T01:16:37Z

doc/source/whatsnew/v0.22.0.txt

@@ -77,6 +77,8 @@ Other Enhancements
 - :func:`Series.fillna` now accepts a Series or a dict as a ``value`` for a categorical dtype (:issue:`17033`)
 - :func:`pandas.read_clipboard` updated to use qtpy, falling back to PyQt5 and then PyQt4, adding compatibility with Python3 and multiple python-qt bindings (:issue:`17722`)
 - Improved wording of ``ValueError`` raised in :func:`read_csv` when the ``usecols`` argument cannot match all columns. (:issue:`17301`)
+- :func:`NDFrame._dir_additions` (tab completion) also returns identifiers in the first level of a :func:`MultiIndex`. (:issue:`16326`)
+- :func:`NDFrame._dir_additions` (tab completion) limits to 100 values, for better performance. (:issue:`18587`)


can you move 2nd to performance. don't use NDFrame. just say Series/DataFrame tab completion

jreback · 2017-12-06T01:17:31Z

pandas/tests/frame/test_api.py

+    def test_tab_completion(self):
+        # DataFrame whose columns are identifiers shall have them in __dir__.
+        df = pd.DataFrame([list('abcd'), list('efgh')], columns=list('ABCD'))
+        assert 'A' in dir(df)


actually you can assert all of the items are in the dir() (use a loop)

jreback · 2017-12-06T01:18:00Z

pandas/tests/frame/test_api.py

+            columns=pd.MultiIndex.from_tuples(list(zip('ABCD', 'EFGH'))))
+        assert 'A' in dir(df)
+        assert isinstance(df.__getitem__('A'), pd.DataFrame)
+


assert EFGH are NOT in the result (in a loop) and assert ABCD there

jreback · 2017-12-06T01:18:52Z

pandas/tests/series/test_api.py

@@ -10,7 +10,7 @@
 from pandas import Index, Series, DataFrame, date_range
 from pandas.core.indexes.datetimes import Timestamp

-from pandas.compat import range
+from pandas.compat import range, lzip, isidentifier, string_types


can you add a test that has 101 columns and assert first 100 there and last 1 is not.

jreback · 2017-12-09

can you do this

BibMartin · 2017-12-11

I've put this in the other test (See here and there)
Please tell me if you prefer that I put it in a separate test.
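
The requested check (101 labels, first 100 present, last one absent) can be sketched against a toy function. `capped_additions` is a hypothetical helper that mirrors the "first 100 values" behaviour described in the whatsnew entry; it is not the pandas test or implementation.

```python
def capped_additions(labels, limit=100):
    """Identifier-like labels among the first `limit` unique labels."""
    first_unique = list(dict.fromkeys(labels))[:limit]
    return {c for c in first_unique if isinstance(c, str) and c.isidentifier()}


def test_first_100_only():
    labels = ["col{}".format(i) for i in range(101)]
    additions = capped_additions(labels)
    # The first 100 labels are offered for completion...
    for name in labels[:100]:
        assert name in additions
    # ...and the 101st is not.
    assert labels[100] not in additions


test_first_100_only()
```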

jreback · 2017-12-06T01:20:10Z

pandas/core/generic.py

+        """ add the string-like attributes from the info_axis.
+        If info_axis is a MultiIndex, it's first level values are used.
+        """
+        additions = set(


can you add an asv that does dir() on a Series/DataFrame with say 10000 elements (we might already have one of these)

BibMartin · 2017-12-08T19:01:20Z

Rebased and ready

jreback · 2017-12-11T11:23:52Z

thanks @BibMartin nice patch!

BibMartin · 2017-12-11T13:16:34Z

Thanks to you, @jreback: it's been a great learning opportunity for me.

jreback added the MultiIndex and Output-Formatting (__repr__ of pandas objects, to_string) labels May 11, 2017

BibMartin force-pushed the feature/multiindex-columns-autocompletion branch from b4748b2 to f4c763f Compare October 14, 2017 16:49

BibMartin force-pushed the feature/multiindex-columns-autocompletion branch from f4c763f to 43e716c Compare November 29, 2017 01:57

jreback requested changes Nov 29, 2017

BibMartin force-pushed the feature/multiindex-columns-autocompletion branch from 43e716c to 7a31d11 Compare December 1, 2017 09:25

jreback requested changes Dec 1, 2017

jreback mentioned this pull request Dec 1, 2017

PERF: tab completion with a large index #18587

Open

BibMartin force-pushed the feature/multiindex-columns-autocompletion branch from da084ac to 33ace7b Compare December 5, 2017 17:05

BibMartin pushed a commit to BibMartin/pandas that referenced this pull request Dec 5, 2017

DOC: Update whatsnew about NDFrame._dir_additions enhancements (pandas-dev#16326, pandas-dev#18587)

33ace7b

jreback requested changes Dec 6, 2017

BibMartin force-pushed the feature/multiindex-columns-autocompletion branch from b2646aa to 1c4109f Compare December 6, 2017 14:22

BibMartin pushed a commit to BibMartin/pandas that referenced this pull request Dec 6, 2017

DOC: Update whatsnew about NDFrame._dir_additions enhancements (pandas-dev#16326, pandas-dev#18587)

308ea2b

BibMartin force-pushed the feature/multiindex-columns-autocompletion branch from 1c4109f to f4cf5f8 Compare December 6, 2017 16:54

BibMartin pushed a commit to BibMartin/pandas that referenced this pull request Dec 6, 2017

DOC: Update whatsnew about NDFrame._dir_additions enhancements (pandas-dev#16326, pandas-dev#18587)

46eb051

Martin Journois added 2 commits December 8, 2017 19:53

ENH: _dir_additions returns only the 100 first values of the index' first level

6840e11

TST: Add test on DataFrame columns auto-completion

4ea43bf

BibMartin force-pushed the feature/multiindex-columns-autocompletion branch from f4cf5f8 to b199d66 Compare December 8, 2017 18:58

BibMartin pushed a commit to BibMartin/pandas that referenced this pull request Dec 8, 2017

DOC: Update whatsnew about NDFrame._dir_additions enhancements (pandas-dev#16326, pandas-dev#18587)

1724c72

Martin Journois added 3 commits December 8, 2017 20:13

TST: ADD test on Series index auto-completion

bee67d1

DOC: Update whatsnew about NDFrame._dir_additions enhancements (pandas-dev#16326, pandas-dev#18587)

edb184a

PERF: Add ASV on Series dir

a46e9e9

BibMartin force-pushed the feature/multiindex-columns-autocompletion branch from 1a3f348 to a46e9e9 Compare December 8, 2017 19:14

jreback added this to the 0.22.0 milestone Dec 11, 2017

jreback added 2 commits December 11, 2017 06:20

Merge branch 'master' into PR_TOOL_MERGE_PR_16326

829bdf4

fix whatsnew

4ee5b9f

jreback approved these changes Dec 11, 2017

jreback merged commit 2aa4aa9 into pandas-dev:master Dec 11, 2017
