-
Notifications
You must be signed in to change notification settings - Fork 370
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
isempty(df)
should return true if either dimension == 0.
#1231
Conversation
@@ -254,7 +254,7 @@ end | |||
|
|||
Base.haskey(df::AbstractDataFrame, key::Any) = haskey(index(df), key) | |||
Base.get(df::AbstractDataFrame, key::Any, default::Any) = haskey(df, key) ? df[key] : default | |||
Base.isempty(df::AbstractDataFrame) = ncol(df) == 0 | |||
Base.isempty(df::AbstractDataFrame) = size(df, 2) == 0 || size(df, 1) == 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you actually need to check size(df, 2) == 0
? If there are no columns, there can't be any rows, so the second condition will hold already (and I think size
already has a special case for that which returns 0).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nalimilan Currently, I don't think there is a a way to construct a dataframe where there are rows without columns, but I could imagine several ways that assumption might be broken in the future. I personally like that this is very explicit and makes as few assumption about the dataframe. I'm not aware of any special cases for size
, is this a special case for Dataframes specifically?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, base julia does this by checking that the length (# of element) equals 0, but that isn't a reasonable option given the debate around what length(df)
should return ( #1200).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the definition of nrows
, which is used by size
:
nrow(df::DataFrame) = ncol(df) > 0 ? length(df.columns[1])::Int : 0
So you don't need to check the number of columns again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, while I agree that the extra col check in isempty
currently isn't needed, I'd like to minimize the number of assumptions being made about the structure of the dataframe moving forward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could see an argument for reversing the order though if we want to short circuit on nrows? That would currently bypass the extra check while remaining explicit and less dependent on the underlying structure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, the addition of the extra col check seems fairly minimal compared to checking nrows.
julia> @benchmark size($df, 2) == 0
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 6.684 ns (0.00% GC)
median time: 6.690 ns (0.00% GC)
mean time: 6.896 ns (0.00% GC)
maximum time: 29.170 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 1000
julia> @benchmark size($df, 1) == 0
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 39.164 ns (0.00% GC)
median time: 39.681 ns (0.00% GC)
mean time: 40.402 ns (0.00% GC)
maximum time: 68.512 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 991
julia> @benchmark size($df, 1) == 0 || size($df, 2) == 0
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 44.666 ns (0.00% GC)
median time: 47.513 ns (0.00% GC)
mean time: 48.973 ns (0.00% GC)
maximum time: 68.798 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 988
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are probably going to be inlined and optimized out anyway, I was just thinking in terms of consistency of the code. Honestly, I don't see how in the future a data frame could be allowed to have no columns and yet contain multiple rows, but if you're afraid of that, why not...
Fixes #1230.