-
-
Notifications
You must be signed in to change notification settings - Fork 18.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Fix by
in DataFrame.plot.hist and DataFrame.plot.box
#28373
Conversation
@charlesdong1991 Thanks for the PR! What issue does this relate to/close? |
Hello @charlesdong1991! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2021-07-12 06:22:22 UTC |
ahh, sorry, it's still in progress, so i didn't add that Added! @simonjayhawkins |
by
in hist plot (will change once pr is ready for review)by
in hist plot
by
in hist plotby
in hist plot
by
in hist plotby
in hist plot
by
in hist plotby
in hist plot
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking through reviews here, I noticed some in which
if self.by is None
was asked to be changed to
if self.by
for the sake of readability
Just for reference, here's an example of bugs in pandas visualisation which were going unnoticed because if labelsize
was being used instead of if labelsize is None
: #34768 - so, strong preference for checking is None
from me :)
Anyway, I didn't notice tests here in which by
isn't None
but would pass if self.by
(e.g. an empty list), could one be added?
Hi, @MarcoGorelli Thanks very much for your reviews! I agree we should be cautious here, I didn't notice that kind of bugs. However, in this case, I think it is fine to use As you mentioned, indeed there is no tests for this, and I have added two tests: |
Awesome, thanks! |
Agree with @MarcoGorelli about |
thank both of your comment @datapythonista @MarcoGorelli Just double check a bit before making the change: Let me just summarize a bit: so in this case, no matter if it is Or do we expect to return an error if an empty list is assigned to Hope we have consensus on the expected behaviour ^^ |
The empty list could be good if it's the same as But I'd check what's the current behavior in |
ahh, okay, yeah |
Hi, thanks for your comments @MarcoGorelli @datapythonista Sorry for late updates again, I was still busy with some exams in the past week. As discussed, I have changed it to The behavior now is when users put not None input for Please let me know if you have further comments. |
From a quick look, no obvious objections from me - I won't have a chance to do a full review til next week though |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks pretty reasonable, a couple of questions.
do the viz docs / doc-strings need updating with examples of this? (could be a followon, if so, can you open an issue)
pandas/plotting/_core.py
Outdated
@@ -1277,6 +1277,9 @@ def hist(self, by=None, bins=10, **kwargs): | |||
---------- | |||
by : str or sequence, optional | |||
Column in the DataFrame to group by. | |||
|
|||
.. versionadded:: 1.3.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
versionchanged right? can you add a line on what is changing here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I was doubting about this. We do accept by
right now but don't do anything about that. Now we do start supporting by
to make plots for groups, so i put added
here instead of changed
.
I think then i will change to versionchanged
.
:context: close-figs | ||
|
||
>>> age_list = [8, 10, 12, 14, 72, 74, 76, 78, 20, 25, 30, 35, 60, 85] | ||
>>> df = pd.DataFrame({"gender": list("MMMMMMMMFFFFFF"), "age": age_list}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was by allowed before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we do allow by
, but don't do anything on that. i will also change to versionchanged
pandas/plotting/_matplotlib/core.py
Outdated
self.by = by | ||
|
||
# if users assign an empty list or tuple, treat them as None | ||
# then no group-by will be conducted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is by allowed to be an empty list/tuple?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are right. I initially thought that is the behaviour of current df.hist
and df.box
, but I am wrong. I change to raise an error exactly the same as current df.hist
and df.box
.
I also changed the inline comment above this line and align with the change, and also add tests to reflect the changes.
level = 1 | ||
else: | ||
raise ValueError( | ||
f"create_iter_data_given_by can only be used with " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this hit in a test? this looks internal yes? or is it user facing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, this is just an internal function, and only used in hist
and box
plot function, so literally this line should never be hit, maybe let me just remove it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point of the comment was if we should add a test, to make sure the exception is raised as expected. Probably not needed, but feel free to add. But better don't remove this. If this is ever called for the wrong plot, better to have this clear message, that one about level not being defined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree! Will change later and maybe add a small test for this also
@jreback thanks very much for your reviews! I made several changes to reflect all your comments! And I will open a PR to update docs/vis with examples of this, and can work on it as a follow-up! @MarcoGorelli No rush at all, I have completed my exams last week ^^, so I will have time to work on reviews quickly. It is nice to have several pairs of eyes! |
going to merge this, but @MarcoGorelli pls have a look when you get a chance. |
thanks @charlesdong1991 sorry for the long delay. nice feature! |
Great job @charlesdong1991. Would be great if you can create issues, or work on the PRs if you've got time, to delete all the duplicate code among |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff