Meaning of color argument in DataFrame.plot.scatter() #16485
Comments
Oh interesting, I might try to see if I can pick this up.
Hi, I think my comment is related (tell me if I'm wrong). Up to version 0.19.2, the `color` argument in `df.plot` could be specified as an RGBA tuple. This is not supported in newer versions. Is that intentional?
I get the following error in newer versions:
But it was supported before.
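For reference, matplotlib accepts a color as a name, a hex string, or an RGB(A) tuple of floats in [0, 1], and `matplotlib.colors.to_rgba` normalizes all of these to the same 4-tuple. The minimal sketch below assumes matplotlib is installed. It only shows what an RGBA tuple is and how matplotlib reads it; it does not exercise `df.plot` itself, whose tuple handling is the version-dependent part reported here.

```python
from matplotlib.colors import is_color_like, to_rgba

# Three spellings of "red"; the tuple form also sets 50% transparency (alpha).
specs = ["red", "#ff0000", (1.0, 0.0, 0.0, 0.5)]

for spec in specs:
    # to_rgba returns an (r, g, b, a) tuple of floats in [0, 1].
    print(repr(spec), is_color_like(spec), to_rgba(spec))
```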
I just ran into the original bug (color names not being recognised/used; cryptic error message) and would like to resurrect this issue. It's frustrating because this kind of example should work according to many Stack Overflow answers (e.g. https://stackoverflow.com/questions/41069676/make-scatter-plot-and-color-points-with-colors-stored-in-data-frame), and for a new pandas user this is going to cause a lot of confusion. I've traced the problem (I think) to `_compute_plot_data` in `pandas/plotting/_core.py` (line 340 at commit 2d491c3).
AFAICT this function throws out non-numeric columns, including the column containing the string color values, so after it runs the DataFrame no longer contains the 'color' column. The minimal example in the original issue still results in:
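A rough illustration of the mechanism described above: keeping only numeric columns silently drops a column of color names. The sketch uses the public `DataFrame.select_dtypes` as a stand-in; it approximates the effect of `_compute_plot_data` and is not that function's actual code.

```python
import pandas as pd

df = pd.DataFrame({
    "x": [0, 1],
    "y": [0, 1],
    "color": ["red", "green"],  # strings, not numbers
})

# Keep only numeric columns, roughly as the plotting code does internally.
numeric = df.select_dtypes(include="number")

print(list(df.columns))       # ['x', 'y', 'color']
print(list(numeric.columns))  # ['x', 'y']
```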
I think it would make more sense if the API would interpret either the …
Current behaviour:
Should generate something like:
This would be way more practical than the current behaviour, IMO.
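One possible reading of this proposal, sketched under assumptions: if `c` names a column whose values are all valid colors, pass those values to matplotlib unchanged; if the column is numeric, keep the current colormap behavior; otherwise fail with a clear message instead of a cryptic one. `resolve_scatter_colors` is a hypothetical helper, not part of pandas. It uses matplotlib's `is_color_like` for the check.

```python
import pandas as pd
from matplotlib.colors import is_color_like
from pandas.api.types import is_numeric_dtype


def resolve_scatter_colors(df, c):
    """Hypothetical helper: decide what to hand to matplotlib's scatter(c=...).

    Returns (values, uses_colormap).
    """
    if isinstance(c, str) and c in df.columns:
        values = df[c]
        if is_numeric_dtype(values):
            # Numeric column: values are mapped through a colormap (current behavior).
            return values.to_numpy(), True
        if all(is_color_like(v) for v in values):
            # Column of literal colors ("red", "#00ff00", ...): pass through as-is.
            return list(values), False
        raise ValueError(f"column {c!r} is neither numeric nor a column of valid colors")
    # Not a column name: treat c as a color or sequence of colors, as matplotlib does.
    return c, False


df = pd.DataFrame({"x": [0, 1], "y": [0, 1], "color": ["red", "green"], "z": [0.1, 0.9]})
print(resolve_scatter_colors(df, "color"))  # (['red', 'green'], False)
```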
take
Hi to everyone still interested in this topic: I have completed the general functionality, and it is in my PR if you would like to take a look. My only question is: what is the best way to choose default colors for strings here? Currently, I am pulling the largest list of matplotlib's colors and choosing from it at random, because iterating through it in order tends to pick colors that are too similar.
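A deterministic alternative to random choice, sketched under the assumption that each distinct string should get its own color: assign colors to the unique values in order of first appearance, cycling through a fixed qualitative palette. The hex codes are matplotlib's default "tab10" cycle, hard-coded so the sketch has no dependencies. `category_colors` is a hypothetical name.

```python
from itertools import cycle

# matplotlib's default "tab10" color cycle, hard-coded
TAB10 = [
    "#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd",
    "#8c564b", "#e377c2", "#7f7f7f", "#bcbd22", "#17becf",
]


def category_colors(values, palette=TAB10):
    """Map each distinct value to a palette color, in order of first appearance.

    Colors repeat once there are more categories than palette entries.
    Returns (per-point colors, value -> color mapping).
    """
    mapping = {}
    colors = cycle(palette)
    for v in values:
        if v not in mapping:
            mapping[v] = next(colors)
    return [mapping[v] for v in values], mapping


point_colors, legend = category_colors(["a", "b", "a", "c"])
print(point_colors)  # ['#1f77b4', '#ff7f0e', '#1f77b4', '#2ca02c']
```

Qualitative palettes like tab10 are meant for telling categories apart, so walking them in order avoids near-duplicate neighbours without randomness, and the same data always gets the same colors.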
Code Sample, a copy-pastable example if possible
Problem description
The issue here is that it is not clear what the values in the column corresponding to the argument `c` of `scatter` should be. The example given at http://pandas.pydata.org/pandas-docs/stable/visualization.html#scatter-plot uses numerical values, but in this example I just want red and green dots. With matplotlib, you can supply the colors as a vector. IMHO, the API should be consistent: you should be able to specify the column names corresponding to the values of x, y, and the color. This would be especially useful if you have a pattern such as:
where you produce a scatter plot of selected rows.
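Until `c` accepts a column of color names, a workaround consistent with the matplotlib behavior mentioned above is to pass the column's values directly to `Axes.scatter`. This is a minimal sketch that assumes a DataFrame with hypothetical columns `x`, `y` and `color`, plus a hypothetical boolean mask `keep` that selects the rows to plot. With this data it yields the expected two points, one red and one green.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "x": [0, 1, 2],
    "y": [0, 1, 4],
    "color": ["red", "green", "blue"],
})

keep = df["x"] < 2   # select a subset of rows
subset = df[keep]

fig, ax = plt.subplots()
# matplotlib treats a sequence of color strings as one color per point
points = ax.scatter(subset["x"], subset["y"], c=subset["color"].tolist())
```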
The code in the simple example generates an error:
Expected Output
A plot with 2 points, one red and one green.
Output of `pd.show_versions()`
pandas: 0.20.1
pytest: None
pip: 9.0.1
setuptools: 27.2.0
Cython: None
numpy: 1.12.1
scipy: None
xarray: None
IPython: 6.0.0
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.0
openpyxl: None
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None