-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: DataFrame.plot.scatter argument c
now accepts a column of strings, where rows with the same string are colored identically
#59239
Conversation
import pandas as pd
import random as rand
df = pd.DataFrame({
'dataX': [rand.randint(0, 100) for _ in range(100)],
'dataY': [rand.randint(0, 100) for _ in range(100)],
'fav_fruit': [rand.choice(['Apples', 'Bananas', 'Grapes', 'Peaches']) for _ in range(100)],
})
df.plot.scatter('dataX', 'dataY', c='fav_fruit') |
c
now accepts a column of strings, where rows with the same string are colored identically
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Dr-Irv in case you have any feedback as well
pandas/plotting/_matplotlib/core.py
Outdated
@@ -1390,6 +1407,38 @@ def _get_c_values(self, color, color_by_categorical: bool, c_is_column: bool): | |||
c_values = c | |||
return c_values | |||
|
|||
def _are_valid_colors(self, c_values: np.ndarray | list): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In what instances is c_values
a list? Might be misreading but would be better if we only worked with a pd.Series and could call .unique on that, instead of checking every single value in a loop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to take a pd.Series
, not np.ndarray | list
pandas/plotting/_matplotlib/core.py
Outdated
except (TypeError, ValueError) as _: | ||
return False | ||
|
||
def _uniquely_color_strs( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am no matplotlib expert but I think we need to defer to that somehow to get the desired colors, instead of trying to write this out ourselves
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did some research looking into how other libs which support this functionality, including seaborn, handle this workflow. Each utilize the current mpl.colormap and draw colors from a linear space across this map. This allows users to change the choice colors of the same way they would with all other graphs
Additionally, I've added an automatic legend to this type of plot since the chosen colors are not exposed to the user, similar to how a colorbar is drawn in some cases of the same function
with tm.assert_produces_warning(None): | ||
ax = df.plot.scatter(x=0, y=1, c="state") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not 100% sure that the test here should be in the tm.assert_produces_warning(None)
context. My intuition is that should be removed.
There should also be tests where the column state
contains values that are colors, as well as a mix of colors and non-colors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the tm.assert_produces_warning(None)
. I added 2 test cases: 1 containing only strings and 1 containing both valid mpl strings and invalid mpl strings.
Additionally, I updated some other functionality tests to confirm they are not executing the new logic added and coloring properly
Not sure if I need to add this as part of this PR, but there is some documentation (and maybe more) about using the Should I update this to include this new feature? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm OK with this. Will leave it to @WillAyd to decide/merge
Thanks @michaelmannino - very impressive work! Thanks for all the efforts on the refactors / feedback |
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.