API: Gather and other inspiration from tidyr #10109
Comments
Hadley Wickham is brilliant at API design, so I'm always happy to use his work for inspiration. Concrete suggestions would be helpful. At the very least,
This already exists (sort of), as melt.
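A minimal sketch of that gather/melt correspondence, assuming a small made-up frame (the column names are illustrative, not from the thread):

import pandas as pd

# a small wide frame: one row per case, one column per measurement
wide = pd.DataFrame({"case": [1, 2], "x": [0.1, 0.2], "y": [1.5, 2.5]})

# tidyr:  gather(wide, key, value, x, y)
# pandas: melt keeps the id columns and stacks the rest into key/value pairs
long = pd.melt(wide, id_vars="case", var_name="key", value_name="value")
print(long)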
I just wanted to give the same reference. In that sense, I am -1 on just adding 'yet another reshape-like' function before thinking it through a bit more (but I fully agree that the current reshape functionality could be improved).
The problem is, a bit, that pandas is becoming a monolithic package. Hadley Wickham indeed has very nice APIs (and we can learn a lot from them to strive for in pandas). But if he has a new idea, he just starts a new package. For example, the current tidyr is already his third take on reshaping (reshape -> reshape2 -> tidyr).
-1 on my own suggestion and further overloading the pandas namespace, and +1 on using simple, composable building-block abstractions and maybe starting a new package. Pydata is great for consistency, but R has more rapid, diffuse, iterative innovation... I wonder if we can help foster the latter.
That's well put, but I'm forced to strongly disagree. While, as you say, R unarguably benefits from rapid, diffuse, iterative innovation, if you really examine the issue closely you must realize that pydata tools tend to intentionally embrace a more focused, decentralized, convergent, behaviour-driven amalgamation approach, one that is inherently aspect-oriented and in line with the best-of-breed theories of cloud-first, which reign supreme over this exciting new age of "stuff".
Sorry, I don't understand what that means.
Indeed, a separate package is probably a better place to start. The only unfortunate bit about doing this outside of pandas is that users can't do method chaining with third-party packages. @kay1793 Please don't troll.
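(As a hedged aside: pandas did later add a DataFrame.pipe method, which lets a third-party function slot into a method chain. A minimal sketch, where gather is a hypothetical helper standing in for a tidyr-style verb:)

import pandas as pd

def gather(df, key, value, value_vars):
    # hypothetical tidyr-style verb implemented as a plain function
    id_vars = [c for c in df.columns if c not in value_vars]
    return pd.melt(df, id_vars=id_vars, value_vars=value_vars,
                   var_name=key, value_name=value)

df = pd.DataFrame({"name": ["Wilbur", "Petunia", "Gregory"],
                   "a": [67, 80, 64], "b": [56, 90, 50]})

# .pipe(f, ...) calls f(df, ...), so third-party verbs chain like methods
out = (df
       .pipe(gather, "TREATMENT", "HEART RATE", ["a", "b"])
       .sort_values("name"))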
But what good trolling it was :) No, I'm just kidding; it was rude, and it had me going because he almost had a point there. This method-chaining issue perfectly illustrates my somewhat densely articulated point: in R, new packages spring up all the time, iterate on other packages, and are connected using pipes. In Python (statsmodels, for example), contributors go through a long and arduous PR process to get code included, which increases the maintenance burden and decreases the motivation and sense of ownership that come from maintaining one's own code. While the code quality is more variable and the APIs are less consistent, R's vibrant package landscape, bound together by CRAN and piping, makes up for it in a sense. Sure, we can write our own libraries in Python, but without a CRAN-like thing they end up languishing unmaintained in corners of GitHub. In the meantime, it's harder to push to the primary packages, and innovation there diminishes as maintenance takes up a higher proportion of reviewer time. Forgive me for the digression, but I think this is tangentially related and critical for pydata. The end result is that the landscape in R seems to be advancing much faster (aided in no small part by Dr. Wickham, of course). Other variables include additional R MOOCs, but my point stands regardless.
@datnamer I agree. I recently made a similar argument as part of a push by @mrocklin for adding macros to Python: https://mail.python.org/pipermail/python-ideas/2015-March/032822.html There may also be less extreme ways to achieve the same result... if you have ideas about things we can do, I'm all ears.
@shoyer: Hmmm... I really think the core pandas devs, Matt Rocklin, Travis, and the rest of the pydata people need to get together for a serious brainstorming session on this if Python is to keep up in the near and distant future. We need to encourage innovation, modularity, and ease of use. I think the low-hanging fruit is to improve the sense of connectivity, idea dispersion, and utilization of third-party packages in the pydata community. Some sort of pydata blogger network and a CRAN-like task-view database would be important. Regarding the chaining issue... I'm not so up on the technical details, but this looks promising: https://github.com/dalejung/naginpy Is there any reason it can't be built out and/or work interactively? Can context managers be used in some way? Is Matt Rocklin still pursuing this macro idea?
This seems like a great topic for a BoF session at SciPy 2015... anyone else interested in co-organizing?
In my free time, which is to say "not at the moment". But I was pleasantly surprised by a warm response to the idea from a number of people at PyCon. I'll send out feelers to see if anyone is gung-ho about pushing it forward. I think the next step is to spec out a design and actually implement a proof of concept. CPython hackers welcome.
Interesting. I wonder if @dalejung has thoughts on this?
Ever heard of PyPI? You are WAY underestimating the importance of consistent APIs. R has succeeded to some extent IN SPITE of this major, major problem. In fact, I would argue that they are moving more and more toward a curated type of package (e.g.
The point of a 'curated' model is that you get not only consistency but also a best-practices, one-way-to-do-it approach. You don't have to search around and figure out 'how do I do X'. You get support and bug fixes. How many one-of-a-kind R packages have this? Sure, they may have some value to a small group of people; great. But is this truly a package system that you would want to actually rely upon? The biggest benefit, however, of a package like pandas is that you get distribution. Once a feature is accepted into pandas, it immediately becomes available to a pretty large community, is announced at release time, and is supported. I think you'd have a hard time saying the same about virtually any grass-roots package (in R or Python), unless it is more 'mainstream'. My 2c. (And I do agree that statsmodels is not iterating fast enough for the community, but there also is not a lot of community support there, as compared to, say, scikit-learn or many R packages.)
I've gone the way of adding the wackiness through integrated tooling. I'm not sure what the likelihood is of Python adding macro capabilities, and even then I imagine they'd be too sensible for my tastes. The features I want out of a lab environment are commonly bad practice for library development :/
For late-comers, here is the transformation with datar:
>>> from datar.all import c, f, tibble, pivot_longer
>>> df = tibble(
... name = c("Wilbur", "Petunia", "Gregory"),
... a = c(67, 80, 64),
... b = c(56, 90, 50)
... )
>>> df
name a b
<object> <int64> <int64>
0 Wilbur 67 56
1 Petunia 80 90
2 Gregory 64 50
>>> df >> pivot_longer(~f.name, names_to="TREATMENT", values_to="HEART RATE")
name TREATMENT HEART RATE
<object> <object> <int64>
0 Wilbur a 67
1 Petunia a 80
2 Gregory a 64
3 Wilbur b 56
4 Petunia b 90
5  Gregory         b          50
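For comparison, a sketch of the same reshape in plain pandas (melt is the built-in analogue of pivot_longer):

import pandas as pd

df = pd.DataFrame({"name": ["Wilbur", "Petunia", "Gregory"],
                   "a": [67, 80, 64], "b": [56, 90, 50]})

# everything except `name` is stacked into TREATMENT / HEART RATE columns
out = df.melt(id_vars="name", var_name="TREATMENT", value_name="HEART RATE")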
Discussed on today's dev call; the consensus was that if wide_to_long already handles this, we don't want another function. Closing.
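For the record, a sketch of how wide_to_long covers a case like this when the value columns share a stub name (the rate_a/rate_b renaming is an assumption for illustration, since wide_to_long keys on such stub patterns):

import pandas as pd

df = pd.DataFrame({"name": ["Wilbur", "Petunia", "Gregory"],
                   "rate_a": [67, 80, 64], "rate_b": [56, 90, 50]})

# stubnames="rate" matches rate_a/rate_b; j names the new suffix column
long = pd.wide_to_long(df, stubnames="rate", i="name", j="treatment",
                       sep="_", suffix=r"\w+").reset_index()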
http://connor-johnson.com/2014/08/28/tidyr-and-pandas-gather-and-melt/
In the spirit of the excellent assign method, I'm wondering if there is support for some tidyr-style transformations?