error when combining a grouped empty dataframe using `first` #3426

ctarn · 2024-03-01T09:12:31Z

It would be better if `combine` returned an empty dataframe with the same columns instead of raising an error. Thanks!

This works, for both a non-empty and an empty dataframe:

df = DataFrames.DataFrame(x=Int[1, 1, 2, 2], y=Int[1, 2, 3, 4])
gd = DataFrames.groupby(df, :x)
DataFrames.combine(gd, :y => Ref)
"""
2×2 DataFrame
 Row │ x      y_Ref     
     │ Int64  SubArray… 
─────┼──────────────────
   1 │     1  [1, 2]
   2 │     2  [3, 4]
"""

df = DataFrames.DataFrame(x=Int[], y=Int[])
gd = DataFrames.groupby(df, :x)
DataFrames.combine(gd, :y => Ref)
"""
0×2 DataFrame
 Row │ x      y_Ref     
     │ Int64  SubArray… 
─────┴──────────────────
"""

This raises an error when the dataframe is empty:

df = DataFrames.DataFrame(x=Int[1, 1, 2, 2], y=Int[1, 2, 3, 4])
gd = DataFrames.groupby(df, :x)
DataFrames.combine(gd, :y => first) |> display
"""
2×2 DataFrame
 Row │ x      y_first 
     │ Int64  Int64   
─────┼────────────────
   1 │     1        1
   2 │     2        3
"""

df = DataFrames.DataFrame(x=Int[], y=Int[])
gd = DataFrames.groupby(df, :x)
DataFrames.combine(gd, :y => first) |> display
"""
ERROR: BoundsError: attempt to access 0-element view(::Vector{Int64}, Int64[]) with eltype Int64 at index [1]
Stacktrace:
 [1] _combine(gd::DataFrames.GroupedDataFrame{…}, cs_norm::Vector{…}, optional_transform::Vector{…}, copycols::Bool, keeprows::Bool, renamecols::Bool, threads::Bool)
   @ DataFrames ~/.julia/packages/DataFrames/58MUJ/src/groupeddataframe/splitapplycombine.jl:755
 [2] _combine_prepare_norm(gd::DataFrames.GroupedDataFrame{…}, cs_vec::Vector{…}, keepkeys::Bool, ungroup::Bool, copycols::Bool, keeprows::Bool, renamecols::Bool, threads::Bool)
   @ DataFrames ~/.julia/packages/DataFrames/58MUJ/src/groupeddataframe/splitapplycombine.jl:87
 [3] _combine_prepare(gd::DataFrames.GroupedDataFrame{…}, ::Base.RefValue{…}; keepkeys::Bool, ungroup::Bool, copycols::Bool, keeprows::Bool, renamecols::Bool, threads::Bool)
   @ DataFrames ~/.julia/packages/DataFrames/58MUJ/src/groupeddataframe/splitapplycombine.jl:52
 [4] _combine_prepare
   @ ~/.julia/packages/DataFrames/58MUJ/src/groupeddataframe/splitapplycombine.jl:26 [inlined]
 [5] combine(gd::DataFrames.GroupedDataFrame{…}, args::Union{…}; keepkeys::Bool, ungroup::Bool, renamecols::Bool, threads::Bool)
   @ DataFrames ~/.julia/packages/DataFrames/58MUJ/src/groupeddataframe/splitapplycombine.jl:857
 [6] top-level scope
   @ Untitled-1:7
...

bkamins · 2024-03-01T10:34:38Z

This is an error raised by `first`, not by DataFrames.jl. You need to write e.g.:

julia> DataFrames.combine(gd, :y => v -> isempty(v) ? v : first(v))
0×2 DataFrame
 Row │ x      y_function
     │ Int64  Int64
─────┴───────────────────
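
To make the cause concrete, here is a minimal sketch. It first checks that `first` itself throws on an empty vector, then wraps the guard from the snippet above in a named function. The names `first_or_empty` and `:y_first` are made up for illustration and are not part of DataFrames.jl.

```julia
using DataFrames

# `first` throws on an empty vector, independent of DataFrames.jl.
threw = try
    first(Int[])
    false
catch err
    err isa BoundsError
end

# Hypothetical helper: return the (empty) input unchanged,
# otherwise return its first element.
first_or_empty(v) = isempty(v) ? v : first(v)

df = DataFrame(x=Int[], y=Int[])
gd = groupby(df, :x)

# Name the output column explicitly so it does not depend on the helper's name.
res = combine(gd, :y => first_or_empty => :y_first)
```

Returning the empty vector in the empty case means the result column carries the element type of `:y` and contributes zero rows.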

ctarn · 2024-03-01T10:46:57Z

Thank you very much! May I ask how the type of y_function is determined? Is Int64 the default type?

bkamins · 2024-03-01T11:09:13Z

No, it is determined by the type of the :y column:

julia> @code_warntype (v -> isempty(v) ? v : first(v))([1,2,3])
MethodInstance for (::var"#5#6")(::Vector{Int64})
  from (::var"#5#6")(v) @ Main REPL[3]:1
Arguments
  #self#::Core.Const(var"#5#6"())
  v::Vector{Int64}
Body::Union{Int64, Vector{Int64}}
1 ─ %1 = Main.isempty(v)::Bool
└──      goto #3 if not %1
2 ─      return v
3 ─ %4 = Main.first(v)::Int64
└──      return %4

As you can see, the compiler can infer what is needed.
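
The two branches of that function can also be checked directly in plain Julia. This is only a sketch of what the function returns for empty and non-empty input; it does not claim to show how DataFrames.jl picks the column type internally.

```julia
f = v -> isempty(v) ? v : first(v)

# Empty input: the `isempty` branch returns the vector itself,
# so the element type of the input is preserved.
empty_result = f(Int64[])

# Non-empty input: the `first` branch returns a scalar.
nonempty_result = f(Int64[1, 2, 3])

empty_eltype = eltype(empty_result)
```

In both branches the element type is `Int64`, which matches the `Int64` column shown in the output above.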

ctarn · 2024-03-01T11:21:52Z

Can we process combine in the following way?

result = initial empty dataframe with the specified (empty) cols
for group in groups
    row = process(group)
    add row to result
end
return result

We can also process it col by col instead of row by row.

Since the types of all cols can be inferred no matter whether a grouped dataframe is empty or not, generating an initial empty dataframe of consistent types would be possible, and users don't have to use other functions to handle specific cases.
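
The proposal above could look roughly like the following sketch. It is not how DataFrames.jl implements `combine`. The function `combine_by_loop` is hypothetical, it handles a single grouping column only, and it assumes that `Base.promote_op` (which relies on compiler inference and can return an abstract type for less simple functions) yields a usable concrete type.

```julia
using DataFrames

# Hypothetical helper, not part of DataFrames.jl: apply `f` to column `col`
# of every group and collect one row per group.
function combine_by_loop(f, gd::GroupedDataFrame, col::Symbol)
    parent_df = parent(gd)
    key = only(groupcols(gd))  # this sketch supports one grouping column
    # Assumption: ask the compiler for the return type instead of calling `f`
    # on an empty view of the column.
    T = Base.promote_op(f, typeof(view(parent_df[!, col], Int[])))
    out_name = Symbol(col, "_", nameof(f))
    # Start from an empty dataframe with typed columns.
    result = DataFrame(key => similar(parent_df[!, key], 0), out_name => T[])
    for g in gd
        # Groups are never empty, so `f` only ever sees non-empty data.
        push!(result, (g[1, key], f(g[!, col])))
    end
    return result
end

df = DataFrame(x=Int[1, 1, 2, 2], y=Int[1, 2, 3, 4])
res_full = combine_by_loop(first, groupby(df, :x), :y)

empty_df = DataFrame(x=Int[], y=Int[])
res_empty = combine_by_loop(first, groupby(empty_df, :x), :y)
```

With an empty dataframe the loop body never runs, so `first` is never called and the result keeps its typed, zero-row columns.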

ctarn · 2024-03-01T11:23:54Z

Since the grouped dataframe is empty, functions such as `first` are never actually called, so such an error should not be raised?

bkamins · 2024-03-02T09:56:18Z

> Can we process combine in the following way?

yes

> thus such an error should not be raised?

yes

bkamins closed this as completed Mar 1, 2024
