Categorical! to include Char #1282

Nosferican · 2017-11-21T17:34:56Z

The current "default" behavior is to make AbstractVector{T} where T <: AbstractString a CategoricalArrays.CategoricalVector and more specifically CategoricalArrays.CategoricalVector{T} where {T <: CategoricalArrays.CategoricalString, ...}

However, I believe that a better default would be to include columns that are eltype(col) <: Char as well.

The text was updated successfully, but these errors were encountered:

nalimilan · 2017-11-22T09:13:11Z

So, to rephrase, you suggest that eltype(eltype(categorical(["a"]))) give Char rather than String. That kind of makes sense, since that would be consistent with getindex. We would need to find another function to access the type of the wrapped element (but I guess it's mostly used internally, at least for now).

Nosferican · 2017-11-22T18:49:20Z

Currently Categorical Arrays only treats String differently.

df = DataFrame(a = 'a':'z', b = vcat(repeat(["a","b","c","d","e"], inner = 5), "f"))
categorical!(df, :a)
categorical!(df)

That should be fine as for developing
eltype(df[:a]) <: CategoricalValue and
eltype(df[:b]) <: AbstractString.

My point is that categorical!(df) will only pool eltype(col) <: AbstractString. If one has a column with eltype(col) <: Char it won't be pooled by default. I find this strange as Char would most likely be used as a categorical variable. In essence, the current default is

function categorical!(df::DataFrame)
    for i in 1:size(df, 2)
        if eltype(df[i]) <: AbstractString
            df[i] = CategoricalVector(df[i])
        end
    end
    df
end

why not,

function categorical!(df::DataFrame)
    for i in 1:size(df, 2)
        T = eltype(df[i])
        if T <: AbstractString | T <: Char
            df[i] = CategoricalVector(df[i])
        end
    end
    df
end

I am unfamiliar if people actually use Char in dataframes, but as a robust feature I would include it in the default behavior.

nalimilan · 2017-11-22T19:14:47Z

Ah, you refer to that issue. It's hard to decide which types should be considered as categorical and which types shouldn't. I'd tend to think we should either convert all types, or only strings by default. Any other combination would be too subtle (what about symbols? dates? custom types). We could add an argument which would default to AbstractString, and which could easily be set to Char or Any depending on your needs.

@cjprybol What do you think?

cjprybol · 2017-11-28T18:36:51Z

We could add an argument which would default to AbstractString, and which could easily be set to Char or Any depending on your needs.

I think that sounds like a fantastic approach. Given that String vectors benefit the most (are the only vector types that benefit at all?) from being converted to CategoricalVectors, keeping that as the default makes sense. But if any users prefer to convert multiple column types to CategoricalVectors so they can interact with them using a consistent API, and given the great work that went into JuliaData/CategoricalArrays.jl#87 to support non-string values, enabling support on whatever column-types users want makes sense

nalimilan mentioned this issue Nov 22, 2017

Change behavior of eltype() JuliaData/CategoricalArrays.jl#102

Merged

nalimilan added the intro issue label Sep 20, 2018

nalimilan added the Hacktoberfest label Oct 2, 2018

bkamins mentioned this issue Jan 15, 2019

DataFrames.jl roadmap #1678

Closed

31 tasks

bkamins mentioned this issue Aug 11, 2019

add type argument to categorical and categorical! #1918

Merged

bkamins closed this as completed in #1918 Aug 24, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Categorical! to include Char #1282

Categorical! to include Char #1282

Nosferican commented Nov 21, 2017

nalimilan commented Nov 22, 2017

Nosferican commented Nov 22, 2017

nalimilan commented Nov 22, 2017

cjprybol commented Nov 28, 2017

Categorical! to include Char #1282

Categorical! to include Char #1282

Comments

Nosferican commented Nov 21, 2017

nalimilan commented Nov 22, 2017

Nosferican commented Nov 22, 2017

nalimilan commented Nov 22, 2017

cjprybol commented Nov 28, 2017