
Remove unused transformers before processing #711

Merged: 31 commits into r-lib:master, Jan 25, 2021

Conversation

@lorenzwalthert (Collaborator) commented Jan 3, 2021

Some transformers, like the one turning a;b into a\nb, are rarely used, as the token ; rarely occurs in code. But we apply the associated transformer force_assignment_op() on every nest, knowing in advance that we'll never use it. This is expensive. Quite a few transformers are known to be used infrequently. The idea of this PR is to extend create_style_guide() to take a new transformers_drop argument: a list that maps the names of transformers to the tokens they act on, so a rule can be excluded from the final set of transformers when none of its tokens occur:

transformers_drop <- list(
    force_assignment_op = "EQ_ASSIGN",
    add_line_break_after_pipe = "SPECIAL-PIPE",
    wrap_if_else_while_for_fun_multi_line_in_curly = c("IF", "WHILE", "FOR", "FUNCTION"),
    ...
)

If the parse table contains no EQ_ASSIGN token, for example, the force_assignment_op transformer will be removed from the set of transformers before these are applied at every nest.
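The dropping step could look roughly like the following (a minimal sketch with a hypothetical subset_transformers() helper; the merged implementation may differ):

```r
# Hypothetical sketch of the dropping logic, not the exact styler code.
transformers_drop <- list(
  force_assignment_op = "EQ_ASSIGN",
  add_line_break_after_pipe = "SPECIAL-PIPE"
)

subset_transformers <- function(transformers, parse_table, transformers_drop) {
  for (rule in names(transformers_drop)) {
    trigger_tokens <- transformers_drop[[rule]]
    # if none of the tokens the rule acts on occur, the rule is a no-op
    if (!any(trigger_tokens %in% parse_table$token)) {
      transformers[[rule]] <- NULL
    }
  }
  transformers
}
```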

Idea originated in #558 (comment).

Todo:

  • example in create_style_guide().
  • harmonise scope and arguments in create_style_guide().
  • add type stability with specify_*() instead of list() as input for subset_transformers().
  • transformers_drop = specify_transformer_dropping().

Here is how a naive removal of the edge-case rules would change the timings:

Remove all edge case transformers (f048a33):

  • cache_applying: 0.03 -> 0.03 (0%)
  • cache_recording: 0.89 -> 0.79 (-10.9%)
  • without_cache: 2.7 -> 2.34 (-13.4%)

Bump CI (3fdea5a):

  • cache_applying: 0.03 -> 0.03 (-11.5%)
  • cache_recording: 0.96 -> 0.86 (-10.5%)
  • without_cache: 3 -> 2.7 (-10.1%)

First draft implementation for touchstone (a87c7f9):

  • cache_applying: 0.03 -> 0.03 (-8.7%)
  • cache_recording: 0.89 -> 0.8 (-10.2%)
  • without_cache: 2.56 -> 2.41 (-6%)

fix tests (6624000) (includes a more elegant approach with reduce, and some tests):

  • cache_applying: 0.03 -> 0.03 (-8%)
  • cache_recording: 0.99 -> 0.92 (-7.8%)
  • without_cache: 2.99 -> 2.8 (-6.3%)

bump CI (3061e30):

  • cache_applying: 0.03 -> 0.03 (-7.8%)
  • cache_recording: 1.03 -> 0.97 (-6.6%)
  • without_cache: 3.07 -> 2.86 (-6.9%)

replace loop with walk for speed (3061e30):

  • cache_applying: 0.03 -> 0.03 (-7.8%)
  • cache_recording: 1.01 -> 0.99 (-1.7%)
  • without_cache: 2.97 -> 3.06 (3%)

resort to simple loop (4c1f95a):

  • cache_applying: 0.03 -> 0.03 (-9.5%)
  • cache_recording: 0.96 -> 0.89 (-7.7%)
  • without_cache: 2.88 -> 2.6 (-9.7%)
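For reference, the two variants compared in the last two entries look roughly like this (illustrative only, not the actual styler code): purrr::walk() is designed for side effects, so mutating a local list from inside it requires <<- and pays one closure call per element, while a plain for loop avoids that overhead.

```r
# purrr::walk() variant: needs <<- to modify `out` in the enclosing scope
out <- list(a = 1, b = 2, c = 3)
purrr::walk(c("a", "c"), function(nm) out[[nm]] <<- NULL)

# plain for loop: same result, no closure-call overhead
out <- list(a = 1, b = 2, c = 3)
for (nm in c("a", "c")) out[[nm]] <- NULL
```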

loop 2, no material change (04c609c):

  • cache_applying: 0.03 -> 0.03 (-11.1%)
  • cache_recording: 0.9 -> 0.84 (-7%)
  • without_cache: 2.66 -> 2.42 (-9.2%)

loop 3, no material change (781f796):

  • cache_applying: 0.03 -> 0.03 (-9.3%)
  • cache_recording: 1.02 -> 0.9 (-11.6%)
  • without_cache: 3.04 -> 2.74 (-9.8%)

also include indent (f71deae):

  • cache_applying: 0.03 -> 0.03 (-6.8%)
  • cache_recording: 1.01 -> 0.92 (-9.3%)
  • without_cache: 3.03 -> 2.81 (-7.3%)

another hack for old R version (5523a8f):

  • cache_applying: 0.04 -> 0.05 (18.7%)
  • cache_recording: 1.45 -> 1.97 (35.8%)
  • without_cache: 4.17 -> 3.62 (-13.2%)

if over ifelse (35976ef):

  • cache_applying: 0.03 -> 0.03 (-6.6%)
  • cache_recording: 1.03 -> 0.97 (-5.5%)
  • without_cache: 3.01 -> 2.83 (-6%)
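The "if over ifelse" change exploits a general R fact: for a scalar condition, a plain if () else is cheaper than ifelse(), which is vectorized and does considerably more work internally (checking and recycling its arguments). An illustrative comparison:

```r
x <- 1
# scalar branch: cheap
y <- if (x > 0) "pos" else "non-pos"
# ifelse() runs vectorized machinery even for a length-1 condition: slower
y <- ifelse(x > 0, "pos", "non-pos")
```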

save another assignment (fd4440c):

  • cache_applying: 0.04 -> 0.04 (-5.5%)
  • cache_recording: 1.33 -> 1.24 (-6.9%)
  • without_cache: 3.92 -> 3.7 (-5.7%)

get rid of condition that always holds (4e7d97f):

  • cache_applying: 0.03 -> 0.03 (-5.9%)
  • cache_recording: 0.93 -> 0.85 (-9%)
  • without_cache: 2.77 -> 2.58 (-6.8%)

don't remove some transformers that won't ever be used (1b5498e):

  • cache_applying: 0.03 -> 0.03 (-7.2%)
  • cache_recording: 1.06 -> 0.95 (-10.4%)
  • without_cache: 3.07 -> 2.88 (-6.3%)

list is so much faster than lst (62ebd4b):

  • cache_applying: 0.04 -> 0.03 (-16.5%)
  • cache_recording: 1.37 -> 1.3 (-5.3%)
  • without_cache: 3.97 -> 3.66 (-7.8%)
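The list() vs lst() gap is plausible because base list() is a primitive, while tibble::lst() quotes its arguments and evaluates them sequentially (so later elements can refer to earlier ones), which costs time on every call. Illustrative only:

```r
# base primitive: fast
l1 <- list(a = 1, b = 2)
# tibble::lst() supports self-referencing arguments, at a cost
l2 <- tibble::lst(a = 1, b = a + 1)
```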

bump CI (b80a0f2):

  • cache_applying: 0.03 -> 0.03 (-19.9%)
  • cache_recording: 1.02 -> 0.94 (-8%)
  • without_cache: 3.05 -> 2.75 (-9.8%)

can't use an if condition in assignment because it would assign NULL (c430adc):

  • cache_applying: 0.03 -> 0.03 (-18%)
  • cache_recording: 1 -> 0.94 (-6.3%)
  • without_cache: 2.96 -> 2.73 (-7.7%)

  • cache_applying: 0.03 -> 0.03 (-18.9%)
  • cache_recording: 0.99 -> 0.92 (-6.6%)
  • without_cache: 2.23 -> 2.08 (-6.6%)
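The pitfall referred to here is a standard R behavior: an if without an else returns invisible NULL when the condition is FALSE, so wrapping the whole assignment in the condition silently assigns NULL. An illustrative example:

```r
# an if without else evaluates to invisible NULL when FALSE:
x <- if (FALSE) 1   # x is now NULL
# so the assignment must stay inside the condition instead:
if (FALSE) x <- 1   # x is left untouched
```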

more docs (96ede0b):

  • cache_applying: 0.03 -> 0.02 (-21.4%)
  • cache_recording: 0.93 -> 0.85 (-8.2%)
  • without_cache: 2.76 -> 2.49 (-9.9%)

(4e8a248):

  • cache_applying: 0.03 -> 0.02 (-20.1%)
  • cache_recording: 0.92 -> 0.84 (-8.5%)
  • without_cache: 2.73 -> 2.46 (-10.2%)

This is promising. Now we'll see whether the overhead from filtering the rules is small enough that we are left with a net gain.

  • Document.
  • Add tests for transformer_subset() directly.
  • Consider doing the subsetting for every block separately instead of once, to gain even more speed.
  • Allow subset_transformers to be NULL or similar.
  • Input validation for subset_transformers() (must match names of rules).
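The validation item above could be sketched like this (hypothetical helper name assert_transformers_drop(), not code from the PR):

```r
# Hypothetical sketch: names in transformers_drop must correspond to
# rules that actually exist in the style guide's transformer set.
assert_transformers_drop <- function(transformers_drop, transformers) {
  unknown <- setdiff(names(transformers_drop), names(transformers))
  if (length(unknown) > 0) {
    stop(
      "Unknown transformers in transformers_drop: ",
      paste(unknown, collapse = ", ")
    )
  }
}
```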

@codecov-io commented Jan 10, 2021

Codecov Report

Merging #711 (4e8a248) into master (f1e6d9e) will increase coverage by 0.17%.
The diff coverage is 99.13%.


@@            Coverage Diff             @@
##           master     #711      +/-   ##
==========================================
+ Coverage   90.49%   90.67%   +0.17%     
==========================================
  Files          47       47              
  Lines        2273     2326      +53     
==========================================
+ Hits         2057     2109      +52     
- Misses        216      217       +1     
Impacted Files Coverage Δ
R/rules-spaces.R 98.30% <ø> (-0.07%) ⬇️
R/style-guides.R 99.39% <98.85%> (-0.61%) ⬇️
R/testing.R 78.29% <100.00%> (+1.82%) ⬆️
R/transform-code.R 95.55% <100.00%> (ø)
R/transform-files.R 100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f1e6d9e...4e8a248.

@lorenzwalthert lorenzwalthert changed the title Benchmark study: remove edge case transformers Remove unused transformers before processing Jan 23, 2021
@lorenzwalthert lorenzwalthert merged commit 3337939 into r-lib:master Jan 25, 2021
@lorenzwalthert lorenzwalthert deleted the subset-transformers branch January 25, 2021 22:30
@lorenzwalthert lorenzwalthert restored the subset-transformers branch February 20, 2021 22:48