`isempty(df)` should return true if either dimension == 0. #1231

rofinn · 2017-09-12T01:31:28Z

coveralls · 2017-09-12T02:08:54Z

Coverage remained the same at 86.815% when pulling 50c7917 on rofinn:rf/isempty-fix into 5dce05f on JuliaData:master.

nalimilan · 2017-09-12T10:09:13Z

src/abstractdataframe/abstractdataframe.jl

@@ -254,7 +254,7 @@ end

 Base.haskey(df::AbstractDataFrame, key::Any) = haskey(index(df), key)
 Base.get(df::AbstractDataFrame, key::Any, default::Any) = haskey(df, key) ? df[key] : default
-Base.isempty(df::AbstractDataFrame) = ncol(df) == 0
+Base.isempty(df::AbstractDataFrame) = size(df, 2) == 0 || size(df, 1) == 0


Do you actually need to check size(df, 2) == 0? If there are no columns, there can't be any rows, so the second condition will hold already (and I think size already has a special case for that which returns 0).

@nalimilan Currently, I don't think there is a way to construct a data frame that has rows without columns, but I could imagine several ways that assumption might be broken in the future. I personally like that this is very explicit and makes as few assumptions as possible about the data frame. I'm not aware of any special cases for size; is this a special case for DataFrames specifically?

FWIW, Base Julia does this by checking that the length (number of elements) equals 0, but that isn't a reasonable option given the debate around what length(df) should return (#1200).
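For reference, a small sketch of the Base array behaviour mentioned above, using plain matrices rather than data frames. The variable names are only illustrative.

```julia
# For Base arrays, length(A) is the product of the dimensions,
# so an array with any zero-sized dimension counts as empty.
A = zeros(0, 3)   # 0 rows, 3 columns
B = zeros(3, 0)   # 3 rows, 0 columns
C = zeros(2, 2)   # 2 rows, 2 columns

@assert length(A) == 0 && isempty(A)
@assert length(B) == 0 && isempty(B)
@assert !isempty(C)
```

This is the "either dimension == 0" rule the PR title asks for, only expressed through length instead of size.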

See the definition of nrow, which is used by size:

nrow(df::DataFrame) = ncol(df) > 0 ? length(df.columns[1])::Int : 0

So you don't need to check the number of columns again.
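To make the special case concrete, here is a minimal sketch that models a data frame as a plain vector of columns. `toy_ncol` and `toy_nrow` are hypothetical stand-ins, not the package's functions; only the shape of the `nrow` definition is taken from the line quoted above.

```julia
# Hypothetical stand-in for a data frame: a plain vector of equal-length columns.
toy_ncol(columns::Vector) = length(columns)
# Mirrors the quoted definition: with no columns, report zero rows.
toy_nrow(columns::Vector) = toy_ncol(columns) > 0 ? length(columns[1]) : 0

no_cols = Vector{Int}[]            # no columns at all
no_rows = [Int[], Int[]]           # two columns, zero rows
filled  = [[1, 2, 3], [4, 5, 6]]   # two columns, three rows

@assert toy_nrow(no_cols) == 0
@assert toy_nrow(no_rows) == 0 && toy_ncol(no_rows) == 2
@assert toy_nrow(filled) == 3
```

Under this model, a row count of zero already covers the no-columns case.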

Again, while I agree that the extra col check in isempty currently isn't needed, I'd like to minimize the number of assumptions being made about the structure of the dataframe moving forward.

I could see an argument for reversing the order, though, if we want to short-circuit on nrows. That would currently bypass the extra check while remaining explicit and less dependent on the underlying structure.
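A rough sketch of what short-circuiting on the row check means, assuming nothing about the package internals. The counter dictionary and the helper names (`rows_is_zero`, `cols_is_zero`, `toy_isempty`) are hypothetical and exist only to show which checks run.

```julia
# Hypothetical counters that record how often each size check runs.
calls = Dict(:rows => 0, :cols => 0)
rows_is_zero(n) = (calls[:rows] += 1; n == 0)
cols_is_zero(n) = (calls[:cols] += 1; n == 0)

# Rows first, then columns: || stops as soon as the left side is true.
toy_isempty(nrows, ncols) = rows_is_zero(nrows) || cols_is_zero(ncols)

@assert toy_isempty(0, 3)      # decided by the row check alone
@assert calls[:cols] == 0      # the column check never ran
@assert !toy_isempty(2, 3)     # both checks run for a non-empty shape
@assert calls[:rows] == 2 && calls[:cols] == 1
```

With rows checked first, the column check is evaluated only when the row count is nonzero.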

FWIW, the addition of the extra col check seems fairly minimal compared to checking nrows.

julia> @benchmark size($df, 2) == 0
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     6.684 ns (0.00% GC)
  median time:      6.690 ns (0.00% GC)
  mean time:        6.896 ns (0.00% GC)
  maximum time:     29.170 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

julia> @benchmark size($df, 1) == 0
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     39.164 ns (0.00% GC)
  median time:      39.681 ns (0.00% GC)
  mean time:        40.402 ns (0.00% GC)
  maximum time:     68.512 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     991

julia> @benchmark size($df, 1) == 0 || size($df, 2) == 0
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     44.666 ns (0.00% GC)
  median time:      47.513 ns (0.00% GC)
  mean time:        48.973 ns (0.00% GC)
  maximum time:     68.798 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     988

These are probably going to be inlined and optimized out anyway; I was just thinking in terms of consistency of the code. Honestly, I don't see how in the future a data frame could be allowed to have no columns and yet contain multiple rows, but if you're afraid of that, why not...


coveralls · 2017-09-13T04:20:01Z

Coverage increased (+0.3%) to 87.117% when pulling 18cdfd8 on rofinn:rf/isempty-fix into 5dce05f on JuliaData:master.

isempty(df) should return true if either dimension == 0.

50c7917

Fixes JuliaData#1230.

ararslan approved these changes Sep 12, 2017


nalimilan reviewed Sep 12, 2017


Reversed isempty conditions ordering to short-circuit on nrows vs n…

18cdfd8

…cols.

nalimilan merged commit fb72599 into JuliaData:master Sep 13, 2017
