Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend unnecessary_lambda_linter to look for "inner comparisons" #2300

Merged
merged 18 commits into from
Nov 21, 2023

Conversation

MichaelChirico
Copy link
Collaborator

Part of #884

No hits on {lintr}.

Here's another one that's a bit in limbo since we ultimately decided against using it internally, the argument being that sometimes having the comparison as close as possible to its usage is preferable for readability.

If we think this is still on balance useful enough to include, there's a few clean-up steps that need to be added, e.g. customizing the lint message for sapply() vs. vapply().

@codecov-commenter
Copy link

codecov-commenter commented Nov 17, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (ef82966) 99.39% compared to head (624c3f2) 99.40%.

❗ Current head 624c3f2 differs from pull request most recent head 9620a8c. Consider uploading reports for the commit 9620a8c to get more accurate results

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2300   +/-   ##
=======================================
  Coverage   99.39%   99.40%           
=======================================
  Files         122      122           
  Lines        5495     5523   +28     
=======================================
+ Hits         5462     5490   +28     
  Misses         33       33           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@AshesITR
Copy link
Collaborator

Is it really more efficient to not produce a logical vector directly via vapply?

@MichaelChirico
Copy link
Collaborator Author

Is it really more efficient to not produce a logical vector directly via vapply?

well, there's the cases where it's really obviously worse:

v = sample(letters, 1e7, TRUE)

microbenchmark(times = 10L, vapply(v, function(vi) vi == 'a', logical(1L)), vapply(v, `==`, 'a', FUN.VALUE=logical(1L)), v == 'a')
# Unit: milliseconds
#                                            expr        min         lq
#  vapply(v, function(vi) vi == "a", logical(1L)) 6218.61526 6272.04090
#   vapply(v, `==`, "a", FUN.VALUE = logical(1L)) 3660.20617 3745.33906
#                                        v == "a"   46.01302   46.97116
#        mean     median         uq        max neval cld
#  6486.86968 6457.15351 6526.12283 7031.29944    10 a  
#  3950.88407 3879.94950 4201.16618 4397.80781    10  b 
#    48.88616   47.88342   49.99104   56.04004    10   c

But I guess you have in mind more general cases:

DF = data.frame(replicate(times = 20L, rnorm(1000L), simplify = FALSE))

microbenchmark(times = 10000L, vapply(DF, sum, numeric(1L)) > 0, vapply(DF, function(x) sum(x) > 0, logical(1L)))
# Unit: microseconds
#                                             expr    min     lq     mean median      uq      max neval cld
#                 vapply(DF, sum, numeric(1L)) > 0 29.647 31.276 32.87099 31.749 32.2575 2662.433 10000  a 
#  vapply(DF, function(x) sum(x) > 0, logical(1L)) 36.162 38.470 40.01192 39.138 39.9675 2746.665 10000   b

In cases where the lambda can't be eliminated, though, it's basically a toss-up:

microbenchmark(times = 10000L, vapply(DF, function(x) sum(abs(x)), numeric(1L)) > 10, vapply(DF, function(x) sum(abs(x)) > 10, logical(1L)))
# Unit: microseconds
#                                                   expr    min     lq     mean   median       uq       max neval cld
#  vapply(DF, function(x) sum(abs(x)), numeric(1L)) > 10 56.019 60.498 98.04062 101.6445 103.8465 10194.537 10000   a
#  vapply(DF, function(x) sum(abs(x)) > 10, logical(1L)) 55.706 60.430 96.81288 101.4660 103.7045  8070.142 10000   a

@AshesITR
Copy link
Collaborator

Funny enough colSums is actually even worse in the example:

microbenchmark(times = 10000L, vapply(DF, sum, numeric(1L)) > 0, vapply(DF, function(x) sum(x) > 0, logical(1L)), colSums(DF) > 0)
# Unit: microseconds
#                                             expr    min       lq      mean   median       uq      max neval
#                 vapply(DF, sum, numeric(1L)) > 0 20.243  21.6950  23.40762  22.4530  23.4465   68.499 10000
#  vapply(DF, function(x) sum(x) > 0, logical(1L)) 24.900  26.6695  28.85965  27.5250  28.7680 1169.621 10000
#                                  colSums(DF) > 0 96.006 101.3280 123.05545 105.1525 109.8010 5607.288 10000

@AshesITR
Copy link
Collaborator

AshesITR commented Nov 19, 2023

From the experiments - should we make the linter more conservative in the sense that it only lints lambdas of the form function(x) foo(x) > y with arbitrary comparison and a one or zero calls foo()?

I am getting a small but consistently better performance for vapply(DF, function(x) sum(abs(x)) > 10 too

microbenchmark(times = 10000L, vapply(DF, function(x) sum(abs(x)), numeric(1L)) > 10, vapply(DF, function(x) sum(abs(x)) > 10, logical(1L)))
# Unit: microseconds
#                                                  expr    min     lq     mean  median     uq      max neval
#  vapply(DF, function(x) sum(abs(x)), numeric(1L)) > 10 36.435 38.459 48.00805 39.7680 40.877 3371.193 10000
#  vapply(DF, function(x) sum(abs(x)) > 10, logical(1L)) 36.440 38.406 47.02529 39.6865 40.861 3849.850 10000

@MichaelChirico
Copy link
Collaborator Author

should we make the linter more conservative in the sense that it only lints lambdas of the form function(x) foo(x) > y with arbitrary comparison and a one or zero calls foo()?

Just for vapply() though, right? Or we can just drop vapply() anyway (since the FUN.VALUE= argument gets in the way of making the "inner" vs. "outer" versions look close enough for readability.

Anyway, looks like my comment was lost or I never hit enter -- I am wondering if we should just merge this logic into unnecessary_lambda_linter() so that foo(x) {op} {const} and x {op} {const} get lints.

@AshesITR
Copy link
Collaborator

AshesITR commented Nov 19, 2023

Just for vapply() though, right? Or we can just drop vapply() anyway (since the FUN.VALUE= argument gets in the way of making the "inner" vs. "outer" versions look close enough for readability.

Why? It doesn't seem too bad for readability?

Anyway, looks like my comment was lost or I never hit enter -- I am wondering if we should just merge this logic into unnecessary_lambda_linter() so that foo(x) {op} {const} and x {op} {const} get lints.

SGTM

@MichaelChirico
Copy link
Collaborator Author

MichaelChirico commented Nov 19, 2023

SGTM

Interpreting as "merge with unnecessary_lambda_linter()."

Marking as draft until I can look at the recent unnecessary_lambda_linter() issue, if there's something to fix there I'd prefer to do that first before attempting the merge.

@MichaelChirico MichaelChirico marked this pull request as draft November 19, 2023 19:24
@MichaelChirico MichaelChirico changed the base branch from main to ul-rhs-in November 20, 2023 06:00
@MichaelChirico MichaelChirico marked this pull request as ready for review November 20, 2023 06:36
@MichaelChirico
Copy link
Collaborator Author

MichaelChirico commented Nov 20, 2023

OK, now based against #2320 and incorporated into unnecessary_lambda_linter(). It's still in an immature state, but pretty close to ready for initial merge.

Base automatically changed from ul-rhs-in to main November 20, 2023 06:46
NEWS.md Outdated Show resolved Hide resolved
@MichaelChirico MichaelChirico changed the title New inner_comparison_linter Extend unnecessary_lambda_linter to look for "inner comparisons" Nov 20, 2023
AshesITR
AshesITR previously approved these changes Nov 21, 2023
@MichaelChirico MichaelChirico merged commit 209e250 into main Nov 21, 2023
20 checks passed
@MichaelChirico MichaelChirico deleted the inner_comparison branch November 21, 2023 07:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants