ENH: _dir_additions returns also the first level of a MultiIndex #16326
Codecov Report
@@ Coverage Diff @@
## master #16326 +/- ##
==========================================
+ Coverage 90.37% 90.37% +<.01%
==========================================
Files 161 161
Lines 50863 50863
==========================================
+ Hits 45966 45969 +3
+ Misses 4897 4894 -3
Codecov Report
@@ Coverage Diff @@
## master #16326 +/- ##
=========================================
Coverage ? 91.56%
=========================================
Files ? 153
Lines ? 51272
Branches ? 0
=========================================
Hits ? 46947
Misses ? 4325
Partials ? 0
I am +0 on this. This adds some needless complexity, though from a user POV it might be nice. It would need some additional tests to be considered.
Sure. I plan to add tests if the principle is accepted. Any ideas are welcome.
Added |
@jorisvandenbossche thoughts here |
this is ok in principle. can you rebase / update |
b4748b2
to
f4c763f
Compare
can you rebase |
f4c763f
to
43e716c
Compare
Sorry, it took me a long time to rebase. Do I need to add a whatsnew entry? In which version file?
pandas/core/generic.py
Outdated
@@ -196,7 +196,11 @@ def __unicode__(self): | |||
def _dir_additions(self): | |||
""" add the string-like attributes from the info_axis """ | |||
additions = set([c for c in self._info_axis | |||
if isinstance(c, string_types) and isidentifier(c)]) | |||
if isinstance(c, string_types) and isidentifier(c)] + | |||
[c[0] for c in self._info_axis |
I would rather move both of these to a (private) method on Index, overridden on MultiIndex; this becomes much simpler, e.g.
def _to_identifiers(self):
    return [c for c in self if isinstance(c, string_types) and isidentifier(c)]
with some docstrings and tests. Then this method becomes really simple.
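As a rough illustration of this suggestion, the sketch below uses two toy classes rather than pandas' real Index and MultiIndex. The names ToyIndex and ToyMultiIndex are hypothetical, and Python 3's str.isidentifier stands in for the string_types and isidentifier compat helpers.

```python
class ToyIndex:
    """Hypothetical stand-in for a flat index; not the pandas class."""

    def __init__(self, labels):
        self.labels = list(labels)

    def _to_identifiers(self):
        # Keep only labels that are strings and valid Python identifiers.
        return [c for c in self.labels
                if isinstance(c, str) and c.isidentifier()]


class ToyMultiIndex(ToyIndex):
    """Hypothetical stand-in whose labels are tuples, one entry per level."""

    def _to_identifiers(self):
        # Override: only the first element of each tuple is considered,
        # and each first-level label is reported once.
        seen = []
        for label in self.labels:
            first = label[0]
            if (isinstance(first, str) and first.isidentifier()
                    and first not in seen):
                seen.append(first)
        return seen


flat = ToyIndex(["foo", "not valid", 3, "bar"])
multi = ToyMultiIndex([("foo", "x"), ("foo", "y"), ("bar", "x"), (1, "z")])
print(flat._to_identifiers())   # ['foo', 'bar']
print(multi._to_identifiers())  # ['foo', 'bar']
```

With this split, a caller such as _dir_additions would only need to wrap the result in a set, whatever the index type.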
Now that I understand the code better, I realize that another simple implementation could be:
additions = set([c for c in self._info_axis.get_level_values(0)
if isinstance(c, string_types) and isidentifier(c)])
Please tell me which method would be best.
yes this looks reasonable. pls add a comment on what is going on.
43e716c
to
7a31d11
Compare
Changes are done, and rebased. Btw, I've added a test on Series auto-completion.
can you add a note in 0.22, other enhancements.
pandas/tests/series/test_api.py
Outdated
@@ -234,6 +234,39 @@ def test_tab_completion(self): | |||
assert 'str' not in dir(s) | |||
assert 'dt' in dir(s) # as it is a datetime categorical | |||
|
|||
def test_index_tab_completion(self): | |||
# dir contains string-like values of the Index. |
you can parametrize this with index
pandas/tests/series/test_api.py
Outdated
assert (not isinstance(x, string_types) or | ||
not isidentifier(x) or x in dir_s) | ||
|
||
# dir contains string-like values of the MultiIndex first level. |
you can add these in the parameterization above
one additional thing I think we need. If you have a very large index, So what I would do is if the index is say < 100, use the currently |
xref #18587 |
you can actually use |
da084ac
to
33ace7b
Compare
I think it's ready to go; please tell me if I should squash the commits.
doc/source/whatsnew/v0.22.0.txt
Outdated
@@ -77,6 +77,8 @@ Other Enhancements | |||
- :func:`Series.fillna` now accepts a Series or a dict as a ``value`` for a categorical dtype (:issue:`17033`) | |||
- :func:`pandas.read_clipboard` updated to use qtpy, falling back to PyQt5 and then PyQt4, adding compatibility with Python3 and multiple python-qt bindings (:issue:`17722`) | |||
- Improved wording of ``ValueError`` raised in :func:`read_csv` when the ``usecols`` argument cannot match all columns. (:issue:`17301`) | |||
- :func:`NDFrame._dir_additions` (tab completion) also returns identifiers in the first level of a :func:`MultiIndex`. (:issue:`16326`) | |||
- :func:`NDFrame._dir_additions` (tab completion) limits to 100 values, for better performance. (:issue:`18587`) |
can you move 2nd to performance. don't use NDFrame. just say Series/DataFrame tab completion
pandas/tests/frame/test_api.py
Outdated
def test_tab_completion(self): | ||
# DataFrame whose columns are identifiers shall have them in __dir__. | ||
df = pd.DataFrame([list('abcd'), list('efgh')], columns=list('ABCD')) | ||
assert 'A' in dir(df) |
actually you can assert all of the items are in the dir() (use a loop)
pandas/tests/frame/test_api.py
Outdated
columns=pd.MultiIndex.from_tuples(list(zip('ABCD', 'EFGH')))) | ||
assert 'A' in dir(df) | ||
assert isinstance(df.__getitem__('A'), pd.DataFrame) | ||
|
assert EFGH are NOT in the result (in a loop) and assert ABCD there
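One way the requested loop-based check could look, reusing the DataFrame from the diff above. This is only a sketch: it assumes a pandas version that already includes this change, and it is not the test that was finally merged.

```python
import pandas as pd

# Two-level columns: first level 'A'..'D', second level 'E'..'H'.
columns = pd.MultiIndex.from_tuples(list(zip("ABCD", "EFGH")))
df = pd.DataFrame([list("abcd"), list("efgh")], columns=columns)

dir_df = dir(df)
for key in "ABCD":
    # First-level labels are expected in dir(df).
    assert key in dir_df
for key in "EFGH":
    # Second-level labels are expected to be absent.
    assert key not in dir_df
```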
@@ -10,7 +10,7 @@ | |||
from pandas import Index, Series, DataFrame, date_range | |||
from pandas.core.indexes.datetimes import Timestamp | |||
|
|||
from pandas.compat import range | |||
from pandas.compat import range, lzip, isidentifier, string_types |
can you add a test that has 101 columns and assert first 100 there and last 1 is not.
can you do this
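A minimal sketch of the requested 101-column check. The function dir_additions and its limit parameter are hypothetical stand-ins written for this example; they only mimic the idea of scanning the first 100 labels and are not the pandas implementation.

```python
def dir_additions(labels, limit=100):
    """Return identifier-like labels among the first `limit` entries."""
    return {c for c in list(labels)[:limit]
            if isinstance(c, str) and c.isidentifier()}


labels = ["col{}".format(i) for i in range(101)]
additions = dir_additions(labels)

# The first 100 labels are kept, the 101st is dropped.
for name in labels[:100]:
    assert name in additions
assert labels[100] not in additions
```

The same shape of assertion could be applied to dir() on a real Series or DataFrame built with 101 string labels.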
""" add the string-like attributes from the info_axis. | ||
If info_axis is a MultiIndex, its first level values are used. | ||
""" | ||
additions = set( |
can you add an asv that does dir() on a Series/DataFrame with say 10000 elements (we might already have one of these)
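A possible shape for such a benchmark, following the asv convention of a class with a setup method and methods prefixed with time_. The class name TabCompletion and the label format are made up for this sketch; it is not the benchmark that was added to the pandas suite.

```python
import numpy as np
import pandas as pd


class TabCompletion:
    """Hypothetical asv-style benchmark: asv times methods prefixed `time_`."""

    def setup(self):
        n = 10000
        labels = ["label_{}".format(i) for i in range(n)]
        # A Series with 10000 string index labels ...
        self.series = pd.Series(np.arange(n), index=labels)
        # ... and a DataFrame with 10000 string column labels.
        self.frame = pd.DataFrame(np.zeros((1, n)), columns=labels)

    def time_dir_series(self):
        dir(self.series)

    def time_dir_frame(self):
        dir(self.frame)
```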
b2646aa
to
1c4109f
Compare
1c4109f
to
f4cf5f8
Compare
f4cf5f8
to
b199d66
Compare
Rebased and ready |
1a3f348
to
a46e9e9
Compare
thanks @BibMartin nice patch!
Thank you, @jreback: it's been a great opportunity for me to learn.
git diff upstream/master --name-only -- '*.py' | flake8 --diff
Assuming a DataFrame with MultiIndex columns whose first level contains 'foo' and 'bar', one can access df['foo'] and df['bar'] with the shortcuts df.foo and df.bar, but one doesn't benefit from autocompletion because 'foo' and 'bar' are not listed in dir(df).
This PR extends df._dir_additions so that the first level of a MultiIndex is listed in dir(df).
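A small example of the situation described above. The column layout (first level 'foo' and 'bar', second level 'x' and 'y') is invented for illustration, and the last check assumes a pandas version that includes this change.

```python
import numpy as np
import pandas as pd

# MultiIndex columns: ('foo', 'x'), ('foo', 'y'), ('bar', 'x'), ('bar', 'y').
columns = pd.MultiIndex.from_product([["foo", "bar"], ["x", "y"]])
df = pd.DataFrame(np.zeros((2, 4)), columns=columns)

# Attribute access is a shortcut for selecting on the first level.
assert df.foo.equals(df["foo"])
assert df.bar.equals(df["bar"])

# With this change, the first-level labels also show up in dir(df),
# which is what tab completion relies on.
first_level_listed = "foo" in dir(df) and "bar" in dir(df)
print(first_level_listed)
```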