Add access example to DataFrame docs (#1001) #1004
Add an example showing what happens when trying to access a data frame column that does not exist.
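A minimal sketch of the kind of access this example covers, assuming Explorer can be fetched with `Mix.install/1`. The version requirement and the column names `a`, `b`, and `class` are made up for illustration, and the exact exception type and message are deliberately not asserted:

```elixir
# Assumption: a network connection is available and this version requirement resolves.
Mix.install([{:explorer, "~> 0.9"}])

# Hypothetical data frame with two columns, "a" and "b".
df = Explorer.DataFrame.new(a: [1, 2, 3], b: ["x", "y", "z"])

# Accessing an existing column by name returns a series.
IO.inspect(df["a"])

# Accessing a column that does not exist raises instead of returning nil.
try do
  df["class"]
rescue
  e -> IO.puts("raised: " <> Exception.message(e))
end
```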
I requested a slight simplification we can do.
What other aspects of the Access implementation did you maybe want to highlight?
Mainly some of the aspects from this discussion: https://elixirforum.com/t/access-behaviour-for-explorer-dataframe/66691. (Though, looking at the username, I think you might already be in that discussion!) To summarize that thread, I was surprised by the fact that DataFrame had a different
I wasn't sure if this doc change was the spot to get into all of that though.
Yep that's me! I wanted to clarify because some points (like not needing to implement If we do want to include more than what's
I'm not married to any of that language. But explaining the "why" might be nice (if the team agrees!).
I like the idea of adding some clarification of why things were designed this way. For me, it was pretty surprising to see the same action (trying to access a column) have different results depending on the argument (sometimes raising, sometimes returning an empty data frame, etc.).

(note: this got a bit rambling, there is a summary at the bottom)

More examples

To be honest, I don't understand the reasoning behind sometimes raising and sometimes returning an empty data frame. E.g., for this data frame:
This raises:
But this returns an empty data frame:
Asking for the 2nd column raises:
But this returns a data frame with just column
Comparing to R's tidyverse

I'm not making a value judgement on this, rather trying to explain what I found confusing. Consider this R tidyverse example. Here is a data frame.
And here attempting to access a column that doesn't exist:
That gives an error similar to Explorer. And the result of selecting columns from a data frame with regex also behaves like Explorer, by giving an "empty" dataframe.
Same behavior when accessing a column by index when that index is out of bounds (R using 1-based indexing):
But doing the equivalent of the
That raises an error in R, whereas Explorer gives the data frame with just column

Summary

Explorer's access behavior is surprising to me in some ways. Not making a judgement on how it should be, since I don't have enough experience with Explorer, just trying to think of ways that docs could make it clearer.

I think some of this stuff is out of the scope for this small PR though. In fact, it would probably make sense as part of a cheat sheet or something like Explorer for Tidyverse users. I may have a go at putting something like that together, but it can be a big task. (There was a discussion here about cheat sheets too.)
To summarize and expand a little, here are your uses in the three relevant libraries:
And here are their outputs:
So there's only a mismatch on the range one if I'm following. And I agree it's a bit borderline (and admit I was surprised ranges didn't raise).

But when designing the API, the Explorer team is balancing a few tensions: what makes sense in Elixir, what data scientists are used to, what's feasible to implement today via Polars, etc. In this case, though I can't say for sure, I think we just defaulted to how Polars handles ranges rather than the tidyverse. Both seem somewhat reasonable to me.

But regardless of the reasoning behind individual design decisions, clearer/better docs are almost always the right call. And as for the cheatsheet stuff, it would be appreciated! Even a start that can be improved upon later.
I think we should raise for ranges out of bounds, for consistency within DataFrames and also with Nx.
Issue for raising with ranges: #1005 Thanks for pointing it out, @mooreryan!
This sort of info would be pretty cool to have in docs somewhere. I'm just not sure where it would ideally go.
Yeah I'm not sure. Maybe our contributor's guidelines?
The trailing newline causes issues in doctests. See comments of PR elixir-explorer#1004 for context.
💚 💙 💜 💛 ❤️
To start, I added one example that shows what happens when trying to access a data frame column that does not exist.
I'm not sure how much to go into some of the comments in #1001, e.g., the fact that an error is raised instead of returning nil and other aspects of the Access behaviour that might be surprising.

What do you think, should it be kept simple like this or should I add something specifically about Access behaviour?
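
For reference, a minimal sketch of the baseline that makes this surprising, using only plain Elixir maps (no Explorer involved). The map contents and the key `"missing"` are made up for illustration:

```elixir
# A plain map with a single string key.
map = %{"a" => [1, 2, 3]}

# Bracket access on a map returns nil for a missing key rather than raising.
nil = map["missing"]

# Map.fetch!/2 is the strict alternative: it raises KeyError for a missing key.
try do
  Map.fetch!(map, "missing")
rescue
  e in KeyError -> IO.puts("raised: " <> Exception.message(e))
end
```

Someone used to the first form might expect a missing data frame column to come back as nil as well, which is the expectation the new doc example is meant to correct.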