Better integration with CategoricalArrays.jl in keyword attributes #2173

Closed
juliohm opened this issue Sep 6, 2019 · 10 comments

Comments

@juliohm

juliohm commented Sep 6, 2019

It seems like the basic series types don't support categorical values. Categorical values are relevant in classification tasks, for example with MLJ.jl.

One option is to add explicit support in Plots.jl for types from CategoricalArrays.jl. Another option is to modify the plot recipes in my own packages to handle categorical values: convert them manually to Integer types and then plot. Since this feature would be useful to many more users in the ecosystem, I think the first option is the way to go.

Could you please clarify the current level of support for CategoricalArrays.jl types in Plots.jl? If support is not there yet, could you point me to where in the Plots.jl codebase I could add it?

@mkborregaard
Member

I think Plots should mostly support CategoricalArrays out of the box, no? Can you give an example where Plots does not have the desired behaviour?

@juliohm
Author

juliohm commented Sep 11, 2019

Hi @mkborregaard, I am trying to find some time to reproduce the issue I had last week. I got busy with some other projects but should return to it soon. Sorry for the delay.

@juliohm
Author

juliohm commented Sep 26, 2019

Finally found the time to come back to this. The following code reproduces the issue:

```julia
using Plots
using CategoricalArrays

x = rand(100)
y = rand(100)
c = rand(Int, 100)

scatter(x, y, zcolor=c) # works fine
scatter(x, y, zcolor=categorical(c)) # throws error
```

Could you please advise on how to fix it? I am doing a presentation tomorrow, and will have to hack my plot recipes temporarily to convert the categorical values to integers.
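For a temporary hack like this, a minimal sketch of the integer conversion in plain Julia (no CategoricalArrays dependency) could look like the block below. `integer_codes` is a hypothetical name, and the sketch assumes the category values can be sorted, so the codes follow sorted order rather than any custom level order. With an actual `CategoricalArray`, CategoricalArrays exports `levelcode`, which returns each value's index in `levels(...)` and therefore respects the array's own level order.

```julia
# Hypothetical helper (not part of Plots.jl or CategoricalArrays.jl):
# map each value to the 1-based position of its level among the sorted
# distinct values, producing integers that zcolor accepts.
function integer_codes(values::AbstractVector)
    levels = sort!(unique(values))                   # sorted distinct levels
    index = Dict(l => i for (i, l) in enumerate(levels))
    return [index[v] for v in values]                # one Int code per value
end

codes = integer_codes(["b", "a", "c", "a"])          # [2, 1, 3, 1]
```

In the reproduction above, this would be used as `scatter(x, y, zcolor=integer_codes(c))`.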

mkborregaard changed the title from "Better integration with CategoricalArrays.jl" to "Better integration with CategoricalArrays.jl in keyword attributes" on Sep 27, 2019
@mkborregaard
Member

Ah, I see: categorical values for attributes, not data.
In this case, what would you expect? Categories can be nominal, and thus not obviously plottable with zcolor. Instead you'd expect it to act like scatter(x, y, group = c), right?

@juliohm
Author

juliohm commented Sep 27, 2019

Maybe we could separate the two cases as you suggest: when the categorical values are ordered, map them to colors with zcolor, and when they are not, give them the group meaning. What do you think?

My real issue is that I want to color points properly regardless of the type of the variable. For example, I have a point set in the 2D plane where the coloring variable can be continuous, categorical, etc. Is there a keyword in Plots.jl that represents this general coloring goal?
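The rule proposed in this comment (zcolor for ordered categories, group for nominal ones) can be sketched as a single helper so the decision is not repeated in every recipe. This is an illustration only: `color_keywords` is a hypothetical name, the `ordered` flag stands in for CategoricalArrays' `isordered`, and `levels` is assumed to list the categories in their intended order.

```julia
# Hypothetical helper (not part of Plots.jl): pick the Plots.jl keyword
# that should carry a categorical color variable.
function color_keywords(values::AbstractVector, levels::AbstractVector; ordered::Bool)
    if ordered
        # Ordered categories: encode each value by the rank of its level
        # so that zcolor can place it on a color gradient.
        rank = Dict(l => i for (i, l) in enumerate(levels))
        return (zcolor = [rank[v] for v in values],)
    else
        # Nominal categories have no natural order, so pass them to
        # `group`, which creates one series (and color) per level.
        return (group = values,)
    end
end

kw = color_keywords(["low", "high", "mid"], ["low", "mid", "high"]; ordered = true)
# kw == (zcolor = [1, 3, 2],)
```

The returned NamedTuple can be splatted into a call such as `scatter(x, y; kw...)`, which keeps the if/else in one place.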

@juliohm
Author

juliohm commented Sep 27, 2019

Let's say we are looking at the traditional data types in statistics:

  • Numerical: continuous and discrete (countable)
  • Categorical: nominal and ordered

Do we have a good mechanism for writing recipes that accept arrays of any type, figure out the corresponding scitype of the entries with ScientificTypes.jl, and then map them to the correct keyword (e.g. zcolor, group, ...)?
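Julia's multiple dispatch can express a simplified version of this mapping. The sketch below classifies a vector by its element type as a rough stand-in for ScientificTypes.jl's `scitype`. The trait types `ContinuousKind`, `CountKind`, `NominalKind` and the functions `kind` and `color_keyword` are hypothetical, and unlike real scitypes this version cannot tell ordered categoricals from nominal ones.

```julia
# Hypothetical trait types mirroring the categories listed above
# (ScientificTypes.jl defines its own types, which are not used here).
abstract type Kind end
struct ContinuousKind <: Kind end
struct CountKind <: Kind end
struct NominalKind <: Kind end

# Classify a vector by its element type, ignoring `missing` in the eltype.
kind(v::AbstractVector) = kind(nonmissingtype(eltype(v)))
kind(::Type{<:AbstractFloat}) = ContinuousKind()
kind(::Type{<:Integer}) = CountKind()
kind(::Type{<:Union{AbstractString,Symbol}}) = NominalKind()

# Numerical kinds go on a color gradient; nominal ones become groups.
color_keyword(::ContinuousKind) = :zcolor
color_keyword(::CountKind) = :zcolor
color_keyword(::NominalKind) = :group
color_keyword(v::AbstractVector) = color_keyword(kind(v))
```

At a call site, the chosen keyword could be passed as `scatter(x, y; Dict(color_keyword(c) => c)...)`. Element types with no matching `kind` method raise a `MethodError`, which is arguably preferable to guessing.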

@mkborregaard
Member

Not really, but I can see how that would be useful. The design predates CategoricalArrays.

@juliohm
Author

juliohm commented Sep 29, 2019

Thank you @mkborregaard. What can be done in the meantime? I was using zcolor as the keyword to color points in a scatter plot. Is the workaround to introduce an if/else in every recipe whenever I pass categorical values?

@juliohm
Author

juliohm commented Nov 22, 2019

Below is a snippet of code that converts a categorical array into an integer coding for plotting:

```julia
using CategoricalArrays

a = categorical(["AB", "CD"])

b = CategoricalArrays.order(a.pool)[a.refs]
# 2-element Array{UInt32,1}:
#  0x00000001
#  0x00000002
```
So Plots.jl could handle this conversion internally in its plot recipes.

Another issue I can't figure out how to solve is the presence of missing values:

```julia
a = categorical(["AB", missing, "CD"])
```

In this case the snippet of code above does not work.

Where can I add that first code snippet in Plots.jl so that it handles categorical arrays without crashing? Ideally the plot recipe would be redesigned completely, but that is not urgent.

cc: @mlubin
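For the missing-value case raised in this comment, one option is to compute the levels over the non-missing values only and let `missing` pass through unchanged. CategoricalArrays stores missing entries with reference code 0, which is presumably why indexing with `a.refs` fails there; `levelcode` returns `missing` for missing entries. The plain-Julia sketch below uses a hypothetical `codes_with_missing` helper. It does not check whether the plotting backend accepts `missing` in `zcolor`.

```julia
# Hypothetical helper: integer codes for a vector that may contain
# `missing`. Levels are the sorted distinct non-missing values, and
# missing entries stay `missing` in the output.
function codes_with_missing(values::AbstractVector)
    levels = sort!(unique(skipmissing(values)))
    index = Dict(l => i for (i, l) in enumerate(levels))
    return [ismissing(v) ? missing : index[v] for v in values]
end

codes = codes_with_missing(["AB", missing, "CD"])
# isequal(codes, [1, missing, 2])
```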

@juliohm
Author

juliohm commented Mar 4, 2020

This issue could instead be discussed in detail at vizcon as part of the new recipe system: recipes and plots that consider scientific types in addition to raw types.

juliohm closed this as completed Mar 4, 2020

2 participants