Extend unnecessary_lambda_linter to look for "inner comparisons" #2300
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

    @@           Coverage Diff           @@
    ##             main    #2300   +/-   ##
    =======================================
      Coverage   99.39%   99.40%
    =======================================
      Files         122      122
      Lines        5495     5523    +28
    =======================================
    + Hits         5462     5490    +28
      Misses         33       33
Is it really more efficient to not produce a logical vector directly via vapply?
well, there's the cases where it's really obviously worse:
But I guess you have in mind more general cases:

    library(microbenchmark)
    # replicate()'s count argument is `n`, not `times`
    DF = data.frame(replicate(n = 20L, rnorm(1000L), simplify = FALSE))
    microbenchmark(times = 10000L, vapply(DF, sum, numeric(1L)) > 0, vapply(DF, function(x) sum(x) > 0, logical(1L)))
    # Unit: microseconds
    #                                             expr    min     lq     mean median      uq      max neval cld
    #                 vapply(DF, sum, numeric(1L)) > 0 29.647 31.276 32.87099 31.749 32.2575 2662.433 10000   a
    #  vapply(DF, function(x) sum(x) > 0, logical(1L)) 36.162 38.470 40.01192 39.138 39.9675 2746.665 10000   b

In cases where the lambda can't be eliminated, though, it's basically a toss-up:

    microbenchmark(times = 10000L, vapply(DF, function(x) sum(abs(x)), numeric(1L)) > 10, vapply(DF, function(x) sum(abs(x)) > 10, logical(1L)))
    # Unit: microseconds
    #                                                   expr    min     lq     mean   median       uq       max neval cld
    #  vapply(DF, function(x) sum(abs(x)), numeric(1L)) > 10 56.019 60.498 98.04062 101.6445 103.8465 10194.537 10000   a
    #  vapply(DF, function(x) sum(abs(x)) > 10, logical(1L)) 55.706 60.430 96.81288 101.4660 103.7045  8070.142 10000   a
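For reference, the two spellings compared in the first benchmark return the same result, so the choice is only about speed and readability. A minimal base-R sketch (the seed and the names `inner_cmp` / `outer_cmp` are illustrative, not from this thread):

```r
set.seed(1L)  # illustrative seed, only so the sketch is reproducible
DF <- data.frame(replicate(20L, rnorm(1000L), simplify = FALSE))

# Comparison inside the lambda: FUN returns one logical per column.
inner_cmp <- vapply(DF, function(x) sum(x) > 0, logical(1L))

# Comparison hoisted out: FUN is plain sum(), and `>` runs once, vectorized.
outer_cmp <- vapply(DF, sum, numeric(1L)) > 0

# Same values and same names either way.
stopifnot(identical(inner_cmp, outer_cmp))
```

Note that hoisting the comparison also changes FUN.VALUE from logical(1L) to numeric(1L).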
Funny enough

    microbenchmark(times = 10000L, vapply(DF, sum, numeric(1L)) > 0, vapply(DF, function(x) sum(x) > 0, logical(1L)), colSums(DF) > 0)
    # Unit: microseconds
    #                                             expr    min       lq      mean   median       uq      max neval
    #                 vapply(DF, sum, numeric(1L)) > 0 20.243  21.6950  23.40762  22.4530  23.4465   68.499 10000
    #  vapply(DF, function(x) sum(x) > 0, logical(1L)) 24.900  26.6695  28.85965  27.5250  28.7680 1169.621 10000
    #                                  colSums(DF) > 0 96.006 101.3280 123.05545 105.1525 109.8010 5607.288 10000
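One plausible contributor to the colSums() timing (an assumption, not measured in this thread) is that colSums() coerces a data.frame with as.matrix() before summing, which copies every column. The three spellings still agree on the result; a small sketch on a toy data frame (all names here are illustrative):

```r
# Toy input with known column sums: a = -1, b = 7.
toy <- data.frame(a = c(1, -2), b = c(3, 4))

via_vapply  <- vapply(toy, sum, numeric(1L)) > 0
via_lambda  <- vapply(toy, function(x) sum(x) > 0, logical(1L))
via_colsums <- colSums(toy) > 0  # goes through as.matrix(toy) internally

# All three are named logical vectors: a = FALSE, b = TRUE.
stopifnot(
  identical(via_vapply, via_lambda),
  identical(via_vapply, via_colsums)
)
```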
From the experiments - should we make the linter more conservative in the sense that it only lints lambdas of the form
I am getting a small but consistently better performance for

    microbenchmark(times = 10000L, vapply(DF, function(x) sum(abs(x)), numeric(1L)) > 10, vapply(DF, function(x) sum(abs(x)) > 10, logical(1L)))
    # Unit: microseconds
    #                                                   expr    min     lq     mean  median     uq      max neval
    #  vapply(DF, function(x) sum(abs(x)), numeric(1L)) > 10 36.435 38.459 48.00805 39.7680 40.877 3371.193 10000
    #  vapply(DF, function(x) sum(abs(x)) > 10, logical(1L)) 36.440 38.406 47.02529 39.6865 40.861 3849.850 10000
Just for Anyway, looks like my comment was lost or I never hit enter -- I am wondering if we should just merge this logic into
Why? It doesn't seem too bad for readability?
SGTM
Interpreting as "merge with Marking as draft until I can look at the recent
OK, now
Part of #884
No hits on {lintr}.
Here's another one that's a bit in limbo since we ultimately decided against using it internally, the argument being that sometimes having the comparison as close as possible to its usage is preferable for readability.
If we think this is still on balance useful enough to include, there's a few clean-up steps that need to be added, e.g. customizing the lint message for sapply() vs. vapply().
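To illustrate why the message might need to differ (my reading, not something settled in this PR): for sapply() the suggested rewrite only moves the comparison, while for vapply() FUN.VALUE has to change as well. A base-R sketch with made-up input and names:

```r
x <- list(c(1, 2), c(-3, 1))  # illustrative input; sums are 3 and -2

# sapply(): hoisting the comparison is the whole rewrite.
s_inner <- sapply(x, function(xi) sum(xi) > 0)
s_outer <- sapply(x, sum) > 0

# vapply(): FUN.VALUE must switch from logical(1L) to numeric(1L) too,
# since FUN now returns the sum rather than the comparison result.
v_inner <- vapply(x, function(xi) sum(xi) > 0, logical(1L))
v_outer <- vapply(x, sum, numeric(1L)) > 0

stopifnot(
  identical(s_inner, s_outer),
  identical(v_inner, v_outer),
  identical(s_outer, v_outer)
)
```

A vapply()-specific message could mention the FUN.VALUE change explicitly, since leaving logical(1L) in place after the rewrite would make vapply() error.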