use boolean indexing via getitem to trigger masking; add inplace keyword to where #2230
changed method __getitem__ to use .mask directly (e.g. df.mask(df > 0) is equivalent semantically to df[df>0])
added inplace keyword to where method (to update the dataframe in place, default is NOT to use inplace, and return a new dataframe)
changed method _boolean_set_ to use where and inplace=True (this allows alignment of the passed values and is slightly less strict than the current method)
all tests pass (as well as an added test in boolean frame indexing)
…al sized frame thus we now allow: df[df[:-1]<0] = 2 (essentially partial boolean indexing)
all tests continue to pass (added new test to test partial boolean indexing, removed test requiring an equal indexed frame)
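As a rough illustration of the two commits above, here is a minimal sketch against a current pandas release. It uses the where naming that the thread settles on later, and the frame, its values, and the variable names are made up for the example. For the partial case the condition is built explicitly so that the last row is never selected, rather than relying on how any particular pandas version aligns a shorter condition frame.

```python
import pandas as pd

# Hypothetical example data (not from the PR).
df = pd.DataFrame({"A": [1.0, -2.0, 3.0], "B": [-4.0, 5.0, -6.0]})

# Default behaviour: where returns a new frame and leaves df untouched.
kept = df.where(df > 0)

# inplace=True updates the frame it is called on instead.
work = df.copy()
work.where(work > 0, inplace=True)

# Partial boolean setting: only rows before the last one may be selected.
# The condition is made full-sized by hand, with the last row set to False.
cond = df < 0
cond.iloc[-1] = False
partial = df.copy()
partial[cond] = 2.0
```

The last row of partial keeps its negative value because the condition never selects it, which is the effect the relaxed __setitem__ restriction is after.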
The files here were made executable again (notably, this causes nose to exclude them from runs). Something must be wrong with your env-- as a band-aid you can globally configure git so it doesn't change file permissions |
git config core.filemode false |
I also think something like this should be added to the docs: selecting with a single column of a df (eg df[df['A']>0]) while masking with the entire frame (eg df[df>0]) |
@jseabold :) |
I'm unconvinced about the
vs
? |
actually no; I use mask like this:
so I am 'applying' a series of 'filters' to df, but I want to avoid reindexing at each step (as I like to preserve the shape). I am fine with making users call df.mask explicitly; I was actually thinking that since df[df > 0] = 2 is supported, the converse operation should directly yield a frame (rather than an ndarray)
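The shape-preserving filter chain described here might look like the following sketch. It is written with where as it exists in current pandas (keep values where the condition is True), and the two filters and the data are hypothetical, chosen only to show that the shape survives each step.

```python
import pandas as pd

# Hypothetical example data (not from the PR).
df = pd.DataFrame({"A": [1.0, -2.0, 3.0], "B": [-4.0, 5.0, -6.0]})

# Two made-up filters applied one after the other.
# Cells that fail a filter become NaN; no rows or columns are dropped.
filtered = df.where(df > 0).where(df < 4)

# For contrast, selecting with a boolean Series drops whole rows.
row_subset = df[df["A"] > 0]
```

Here filtered has the same index and columns as df, while row_subset has lost the row where A is negative.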
is equivalent to the more verbose (assume that df[df >0] returns a frame rather than an ndarray)
|
in response to @changhiskhan
|
@changhiskhan inplace = true is setitem, while inplace = false (e.g. use np.where) is getitem. here's what I would do:
so then syntax is pretty clean: I have tested and am ready to push if this looks ok |
removed mask method
made other optional kw parm in where
changed __setitem__ to use where (rather than mask)
added condition testing to where that raised ValueError on an invalid condition (e.g. not an ndarray like object)
added tests for same
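To give a feel for condition checking in where, the sketch below passes a condition that cannot be matched to the frame. This is only one kind of invalid condition, a shape mismatch as handled by current pandas, and is not necessarily the exact check added in this commit. The data and names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical example data (not from the PR).
df = pd.DataFrame({"A": [1.0, -2.0, 3.0], "B": [-4.0, 5.0, -6.0]})

# A bare array condition must have the same shape as the frame.
bad_cond = np.array([True, False])  # shape (2,), frame is (3, 2)

try:
    df.where(bad_cond)
    raised = False
except ValueError:
    # Current pandas rejects array conditions whose shape differs from the frame.
    raised = True
```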
I don't think the semantics for |
ok...make sense.....only (minor) issue....in numpy pretty sure that negation (-) is completely equivalent of invert (~) when applied to boolean? in mask should change ~cond to -cond ? |
Yeah, no biggie either way |
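For anyone reading this later: the equivalence of - and ~ on boolean arrays depends on the NumPy version. NumPy releases from 1.13 onward reject unary minus on boolean arrays with a TypeError, so ~cond (or np.logical_not) is the form that keeps working. A small sketch, with a made-up array, that does not assume which behaviour the installed NumPy has:

```python
import numpy as np

cond = np.array([True, False, True])

# Bitwise invert is the supported way to flip a boolean array.
inverted = ~cond

# Unary minus may or may not be accepted, depending on the NumPy version.
try:
    negated = -cond
    minus_supported = True
except TypeError:
    minus_supported = False
```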
these might be helpful examples for whatsnew and/or docs.....
standard frame selection (with a boolean series as the condition),
standard frame selection (but with a boolean frame as the condition),
where is the underlying mechanism
substitute values that meet the condition
setting values
masking is the inverse boolean operation of where
advanced: partial selection and setting
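The code that went with these captions did not survive on this page. The following is a stand-in sketch of the same operations against current pandas, not the commenter's original examples. The frame and all variable names are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical example data (not from the PR).
df = pd.DataFrame({"A": [1.0, -2.0, 3.0], "B": [-4.0, 5.0, -6.0]})

# Standard selection: a boolean Series picks whole rows.
rows = df[df["A"] > 0]

# A boolean frame as the condition keeps the shape; other cells become NaN.
cells = df[df > 0]

# where is the underlying mechanism, so this gives the same result.
via_where = df.where(df > 0)

# Substitution: cells where the condition is False are taken from `other`.
flipped = df.where(df > 0, -df)

# Setting values through a boolean frame.
clipped = df.copy()
clipped[clipped < 0] = 0.0

# mask is the inverse of where: it blanks the cells where the condition is True.
masked = df.mask(df > 0)
```

In this sketch flipped ends up holding the absolute values of df, and masked matches df.where(~(df > 0)).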
|
if someone wants to do a PR to add these to the docs/what's new that'd be great. i am doing too many other things to do it myself |
done....added a small pull-request with the changes for whatsnew for 0.9.1...let me know if you need anything further! |
* commit 'v0.9.1rc1-27-ge374f0f': (52 commits)
  BUG: axes.color_cycle from mpl rcParams should not be joined as single string
  BUG: icol duplicate columns with integer sequence failure. close pandas-dev#2228
  TST: unit test for pandas-dev#2214
  BUG: coerce ndarray dtype to object when comparing series
  ENH: make vbench_suite/run_suite executable
  ENH: Use __file__ to determine REPO_PATH in vb_suite/suite.py
  BUG: 1 ** NA issue in computing new fill value in SparseSeries. close pandas-dev#2220
  BUG: make inplace semantics of DataFrame.where consistent. pandas-dev#2230
  BUG: fix internal error in constructing DataFrame.values with duplicate column names. close pandas-dev#2236
  added back mask method that does condition inversion added condition testing to where that raised ValueError on an invalid condition (e.g. not an ndarray like object) added tests for same in core/frame.py
  TST: getting column from and applying op to a df should commute
  TST: add dual ( x op y <-> y op x ) tests for arith operators
  BUG: Incorrect error message due to zero based levels. close pandas-dev#2226
  fixed file modes for core/frame.py, test/test_frame.py relaxed __setitem__ restriction on boolean indexing a frame on an equal sized frame in core/frame.py
  ENH: warn user when invoking to_dict() on df with non-unique columns
  BUG: modify df.iteritems to support duplicate column labels pandas-dev#2219
  TST: df.iteritems() should yield Series even with non-unique column labels
  ...
in core/frame.py
changed method __getitem__ to use mask directly (e.g. df.mask(df > 0) is equivalent semantically to df[df>0])
this would be a small API change as before df[df >0] returned a boolean np array
added inplace keyword to where method (to update the dataframe in place, default is NOT to use inplace, and return a new dataframe)
changed method _boolean_set_ to use where and inplace=True (this allows alignment of the passed values and is slightly less strict than the current method)
all tests pass (as well as an added test in boolean frame indexing)
if included in 0.9.1 would be great (sorry for the late addition)