Pairwise versions for rolling_cov, ewmcov and expanding_cov #4950
It looks like these have nearly the same signature as the existing |
I agree with you. I provided these implementations under the *_pairwise names to fit in with the current API as described on the Computational Tools page. While the behaviour of However I clearly don't know where people are using this so I rather provided these pairwise implementations under different function names in line with the current |
Ok, I've been thinking more about this and think that rather than taking guidance from my own biased preferences, a better approach is to try and minimise cognitive impedance in the API along the lines of the "Don't make me think" principle. Now To deal with the case where the non-pairwise version is desired, I just learned that for |
@snth It could be hard sell to change the existing functions in the way you describe (since they'd produce different behavior with current defaults). if you couldn't change the existing API (i.e., It looks like there already is a That said, why couldn't we change the signature such that if you only passed one object, it computes the rolling pairwise correlation and otherwise computes the correlation between the two objects. Shouldn't be that hard to implement. The only quirk would be that you'd have to either change the signature to: |
@jtratner I like your approach of overloading the signature. I can probably do this inside |
@jtratner I've tried to incorporate the API changes as you suggested. I still need to update the documentation but I ran out of time. I'll look at that tomorrow.
@jtratner I've updated the documentation for the API changes. Along the way I unified the 3 doc templates. Hopefully the docstrings of all unaffected functions are still intact. I did a couple of spot checks and everything looked ok to me. Would be good to have some more eyeballs on this. I noticed that the center keyword argument isn't documented but appears in the function signatures. Should we add this? The only other thing I'd like to add is to make these functions part of the DataFrame API as I'd quite like to be able to say things like
@snth can you rebase this....looks pretty good
I tried a rebase of this onto one of the 0.13 release candidates some time ago and there were quite a few conflicts already then. I'm a little busy at the moment so I'll only be able to look at this next week at the earliest.
@snth when you have a chance
@jreback I finally got around to doing that rebase. I rebased off of master as of this morning, for which the whole test suite passed on my machine. The post-rebase tests all look ok as well, but let's see what Travis says. @jorisvandenbossche I tried to incorporate your docstring changes, but as I had changed the template slightly I could have missed copying some changes across. Would you mind having a look to check that everything looks ok?
this looks great!
One remark: due to your (useful!) refactoring of the docstrings, now the two notes I added (on "by default, result is set to right edge of window ..." and on "the freq keyword is used ...") are added to all functions, also to the expanding_ functions. However, there the
Thanks @jorisvandenbossche. Please check if the last commit fixed this. @jreback If @jorisvandenbossche is happy with the docstrings then I'll squash down and add a release note.
A quick note about the additional behaviour I introduced. Basically I took the pairwise behaviour from rolling_corr_pairwise() and moved it into _flex_binary_moment() in order to make it available to rolling_cov(), ewmcov(), expanding_cov(), ... I am no statistician and cannot promise that this is statistically sound. In particular if there are missing values in the data then a different number of datapoints will be used in the calculation of different entries of the covariance matrices and these covariance matrices are not guaranteed to be positive semi-definite. I just want to point that out as often the availability of something is taken as an implicit guarantee that it is correct, especially by novice users who are not trained in the field. In this case the user is responsible for ensuring that the results are suitable for his or her use-case. |
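To make the caveat above concrete, here is a minimal pure-Python sketch of pairwise-complete covariance. It is not the pandas implementation; `pairwise_cov`, the toy `columns` data and the test vector `v` are made up for illustration. Each entry of the matrix is computed from only the rows where both columns are observed, so different entries use different data points.

```python
def pairwise_cov(a, b):
    """Sample covariance (ddof=1) using only rows where both a and b are observed."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in pairs) / (n - 1)


# Hypothetical data: None marks a missing value. Each pair of columns
# overlaps on a different block of three rows.
columns = {
    "x": [1, 2, 3, None, None, None, 1, 2, 3],
    "y": [1, 2, 3, 1, 2, 3, None, None, None],
    "z": [None, None, None, 1, 2, 3, 3, 2, 1],
}
names = list(columns)
cov = [[pairwise_cov(columns[r], columns[c]) for c in names] for r in names]

# A positive semi-definite matrix satisfies v' C v >= 0 for every vector v.
v = [1, -1, 1]
quad_form = sum(v[i] * v[j] * cov[i][j] for i in range(3) for j in range(3))
print(quad_form)  # negative, so cov is not positive semi-definite
```

Here x and y move together on their shared rows, as do y and z, while x and z move in opposite directions on theirs. No complete data set could produce that combination, which is why the assembled matrix fails the check.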
thanks otherwise looks good
I'm busy putting in release note. Should be done tomorrow.
I added a release note and some example usage for I also marked
I think you need to rebase (it can't be automatically merged)
correls[df.index[-50]]

.. note::

   This was previously available through ``rolling_corr_pairwise`` which is
put a prior to version 0.14
minor comments, pls rebase and I think good to go
The deprecation is fine (I marked a reminder to remove in #6581 at some point in the future)
looks good...ping when green
@jtratner Travis build is green.
Pairwise versions for rolling_cov, ewmcov and expanding_cov
@snth awesome! thanks!
http://pandas-docs.github.io/pandas-docs-travis/computation.html#moving-rolling-statistics-moments the mention of the pairwise should be removed from the list for rolling and expanding yes?
I deleted those 2 references, pls review and submit a fix if the docs don't look correct http://pandas-docs.github.io/pandas-docs-travis/computation.html#moving-rolling-statistics-moments
You're right, the mention of the 2 *_corr_pairwise() functions should have been deleted. Thanks for sorting it out. Looks good to me.
I added the functions rolling_cov_pairwise(), expanding_cov_pairwise() and ewmcov_pairwise(). I also modified the behaviour of rolling_corr_pairwise(), expanding_corr_pairwise() and ewmcorr_pairwise() slightly so that now these can be computed between different DataFrames and the resulting matrices at each time slice will be rectangular rather than square.
I think this is what was asked for in the original discussion that gave rise to issue #1853.
Fixes #1853
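The rectangular result mentioned in the description can be illustrated with a small pure-Python sketch for a single window. This is only an illustration under assumed names (`sample_cov`, `cross_cov`, `left`, `right` are hypothetical) and is not the code in this PR: with two columns on one side and three on the other, the covariance matrix for that time slice has shape 2 x 3 rather than being square.

```python
def sample_cov(xs, ys):
    """Sample covariance (ddof=1) of two equal-length sequences without gaps."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def cross_cov(left, right):
    """Covariance of every column in `left` with every column in `right`."""
    return {
        lname: {rname: sample_cov(lcol, rcol) for rname, rcol in right.items()}
        for lname, lcol in left.items()
    }


# Hypothetical data for one window: two "frames" with different columns.
left = {"a": [1.0, 2.0, 3.0, 4.0], "b": [4.0, 3.0, 2.0, 1.0]}
right = {
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.0, 4.0, 6.0, 8.0],
    "z": [1.0, 1.0, 2.0, 2.0],
}

matrix = cross_cov(left, right)
for lname, row in matrix.items():
    print(lname, row)  # one row per left column, one entry per right column
```

A rolling version would repeat this for each window position, giving one such rectangular matrix per time slice.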