scatterplot throws an error when referencing int column by name #2279
Comments
Hm, I think this is the same story as #2263: the refactoring/standardization of the input data processing appears to have broken a couple of styles of data specification that are in a sort of gray zone, where they weren't quite documented as part of the API but did work given how the old code functioned. As in my answer to #2263, it may be possible to rework the new code to handle this case, although it will make things more complicated/unpredictable than treating x/y/etc. as keys if they're strings and as data otherwise. How important are numeric names to your use case?
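To make the rule described above concrete ("keys if they're strings and data otherwise"), here is a minimal sketch in plain Python. It is not seaborn's actual code: the function name resolve_variable and the dict-based table are hypothetical stand-ins, used only to show why an integer such as 0 ends up being read as data rather than as a column name.

```python
from collections.abc import Mapping
from typing import Any


def resolve_variable(data: Mapping[Any, list], value: Any) -> list:
    """Hypothetical resolver: strings are keys into `data`, anything else is data."""
    if isinstance(value, str):
        # A string is looked up as a column name.
        return list(data[value])
    if isinstance(value, (list, tuple)):
        # A sequence is used directly as the data vector.
        return list(value)
    # Any other scalar (including an int) is treated as a single data value,
    # never as a key, even if `data` happens to contain that key.
    return [value]


table = {"a": [1, 2, 3], 0: [10, 20, 30]}
by_name = resolve_variable(table, "a")  # looked up as a key
by_int = resolve_variable(table, 0)     # read as the scalar 0, not column 0
```

Under this rule the integer-named column is unreachable by name, which is the trade-off being discussed: the behaviour stays predictable, at the cost of not supporting non-string keys.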
Interesting!
Just a note on my confusion with the API here - the current docs just say
Personally I find this really nice because I can do the following:
It is convenient because when my data columns have no real meaning other than "First dimension", "Second dimension", etc. (which is often the case when I'm just starting from an array), the default in pandas is to make the column names integers if you don't pass in column names. Obviously it's not essential (I can work around it by making the columns strings), but it is super convenient! I imagine it could matter even more if you were plotting a dataframe that was constructed by a pivot, as the column names could easily end up being
If you end up deciding to support this, let me know if I can try to help in some way!
Right, the ambiguity of "key" there is why I say it's in a gray area. I meant that it's not documented in the sense that none of the API examples (to my knowledge) use non-string values for keyed data. I have personally always considered integer pandas labels to be a bit of an antipattern, because it feels like it muddies the distinction between positional and label-based indexing. But that may be a personal preference.
Additionally, seaborn can interpret scalars as data, e.g. sns.scatterplot(x=0, y=np.arange(10)).
For now you could do sns.scatterplot(x=plot_df[0], y=plot_df[1], hue=labels).
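For anyone landing here with the same problem, the two workarounds mentioned in this thread can be sketched as follows. The frame contents and the labels list are made-up illustration data, and named_df is a hypothetical variable name; only the pandas calls themselves are real API.

```python
import pandas as pd

# Building a frame without column names gives integer labels 0, 1, ...
plot_df = pd.DataFrame([[0.1, 1.0], [0.2, 2.0], [0.3, 3.0]])
labels = ["a", "b", "a"]  # hypothetical hue values, one per row

# Option 1: select the columns yourself and pass the vectors directly.
x_values, y_values = plot_df[0], plot_df[1]

# Option 2: convert the integer labels to strings so they can act as keys.
named_df = plot_df.rename(columns=str)
```

Assuming seaborn is imported as sns, the first option corresponds to sns.scatterplot(x=x_values, y=y_values, hue=labels) and the second to sns.scatterplot(data=named_df, x="0", y="1").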
Summary
scatterplot throws an error when trying to reference column by name if the column name is an int.
In seaborn 0.10 and lower, I was able to pass column names to scatterplot that were integers (e.g. scatterplot(data=data, x=0, y=1)). After updating to 0.11, I'm getting the error below.
Apologies if already posted but I did not see anything about this in the issue tracker. Thanks for the awesome package!
Environment
seaborn version: 0.11.0 (with 0.10.1, the code below will run fine)
matplotlib version: 3.1.3
matplotlib backend: TkAgg
Code to reproduce
Output