missing vs nothing #1854

Closed
s-celles opened this issue Jun 20, 2019 · 10 comments

s-celles commented Jun 20, 2019

Hello,

After asking on SO https://stackoverflow.com/questions/56684447/convert-a-julia-dataframe-column-with-string-to-one-with-int-and-missing-values/56685891?noredirect=1#comment99940131_56685891 I think this should in fact be discussed here.

I need to convert the following DataFrame

julia> df = DataFrame(:A=>["", "2", "3"], :B=>[1.1, 2.2, 3.3])

which looks like

3×2 DataFrame
│ Row │ A      │ B       │
│     │ String │ Float64 │
├─────┼────────┼─────────┤
│ 1   │        │ 1.1     │
│ 2   │ 2      │ 2.2     │
│ 3   │ 3      │ 3.3     │

I would like to convert column A from Array{String,1} to an array of Int with missing values.

I tried

julia> df.A = tryparse.(Int, df.A)
3-element Array{Union{Nothing, Int64},1}:
  nothing
 2
 3

julia> df
3×2 DataFrame
│ Row │ A      │ B       │
│     │ Union… │ Float64 │
├─────┼────────┼─────────┤
│ 1   │        │ 1.1     │
│ 2   │ 2      │ 2.2     │
│ 3   │ 3      │ 3.3     │

julia> eltype(df.A)
Union{Nothing, Int64}

but I get column A with elements of type Union{Nothing, Int64}.

nothing (of type Nothing) and missing (of type Missing) seem to be two different kinds of values.
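
To make the difference concrete, here is a small sketch using only Base (the variable names are illustrative, not from any package): missing propagates through arithmetic and can be skipped in reductions, while nothing has no arithmetic methods defined.

```julia
# `missing` and `nothing` are singleton values of two distinct types.
@assert typeof(missing) === Missing
@assert typeof(nothing) === Nothing

# `missing` propagates through arithmetic instead of throwing.
propagated = missing + 1

# `nothing` has no `+` method, so the same operation throws a MethodError.
threw = try
    nothing + 1
    false
catch err
    err isa MethodError
end

# Reductions can ignore `missing` entries with `skipmissing`.
total = sum(skipmissing([missing, 2, 3]))

println(propagated)  # missing
println(threw)       # true
println(total)       # 5
```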

After asking on SO, it seems that a solution could be

julia> df.A = map(x->begin val = tryparse(Int, x)
                           ifelse(typeof(val) == Nothing, missing, val)
                      end, df.A)
3-element Array{Union{Missing, Int64},1}:
  missing
 2
 3

Although this answers my question perfectly, I don't think we can expect DataFrames users to write this themselves.

Maybe we should have a function that replaces nothing with missing, or maybe another approach would be to have another definition of the tryparse function (one that could return missing).

What is your opinion?

Kind regards

Author

s-celles commented Jun 20, 2019

I noticed that nothing can be replaced with missing using:

df.A = replace(df.A, nothing=>missing)
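
As a rough sketch of the whole round trip on a plain vector (no DataFrames needed; `raw`, `parsed` and `cleaned` are illustrative names, and Julia 1.x is assumed):

```julia
# String values, one of which cannot be parsed as an integer.
raw = ["", "2", "3"]

# `tryparse` returns `nothing` for values it cannot parse.
parsed = tryparse.(Int, raw)

# Swap every `nothing` for `missing`.
cleaned = replace(parsed, nothing => missing)

println(cleaned)
```

The same two calls can be applied to a DataFrame column by using df.A in place of raw.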

Maybe the docs should provide such a DataFrame example (String values, tryparse to parse them as Int, then replace).

Author

s-celles commented Jun 20, 2019

Having tryparse return missing directly when a String can't be parsed would simplify this: JuliaLang/julia#32378

Member

quinnj commented Jun 20, 2019

How about:

tryparsem(T, str) = something(tryparse(T, str), missing)
df.A = tryparsem.(Int, df.A)

Author

s-celles commented Jun 20, 2019

I didn't know about the something function. Thanks for the idea.
Should tryparsem be included in Base, in DataFrames.jl, or in user code?
This idea is too clever not to be part of a package or the language itself 😉

Member

quinnj commented Jun 20, 2019

I think the reason it isn't included is because of how simple it is. As long as you're aware of the something function (and its missing counterpart coalesce), there are some really quick ways to switch between things.
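
For illustration, a minimal sketch of both directions on a plain vector, assuming the target type is passed explicitly (`raw`, `col` and `filled` are illustrative names; the fallback value 0 is an arbitrary choice):

```julia
# `something` returns its first argument that is not `nothing`,
# so a failed parse falls through to `missing`.
tryparsem(T, str) = something(tryparse(T, str), missing)

raw = ["", "2", "3"]
col = tryparsem.(Int, raw)

# `coalesce` is the counterpart for `missing`: it returns its first
# argument that is not `missing`, here substituting 0.
filled = coalesce.(col, 0)

println(col)
println(filled)
```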

Member

bkamins commented Jun 20, 2019

@scls19fr - can this be closed given the solution given by @quinnj?

@s-celles
Author

I'm still wondering what should be done here, and I definitely think that simply closing this is not the best action.

At the very least, the docs should be improved to mention this idea.

But I still don't know why we couldn't or shouldn't add such a function (even if it's so simple).

Providing such a function in Base or in DataFrames would encourage developers to use the same function name, which is (imho) good practice for code readability.

Member

bkamins commented Jun 20, 2019

I am asking because this functionality is not related to DataFrames.jl. It should live in Base or Missings.jl (you can probably discuss it in Missings.jl first, as this is where experimental missing-related functionality is implemented before it is introduced in Base).

@s-celles
Author

OK, it seems you opened a quite similar issue: JuliaData/Missings.jl#61

Member

bkamins commented Jun 20, 2019

Yes - but we were not sure what was the best way to do it 😞.

s-celles reopened this Jun 20, 2019