
Remove unused transformers before processing #711

Merged: 31 commits into r-lib:master, Jan 25, 2021

Conversation

@lorenzwalthert (Collaborator) commented Jan 3, 2021

Some transformers, like the one turning a;b into a\nb, are rarely used, as the token ; rarely occurs in code. But we apply the associated transformer force_assignment_op() on every nest, knowing in advance that we'll never use it. This is expensive. Quite a few transformers are known to be used infrequently. The idea of this PR is to extend create_style_guide() to take a new transformers_drop argument: a list that maps the names of transformers to the tokens they act on, so a rule can be excluded from the final set of transformers when none of its tokens occur:

transformers_drop <- list(
    force_assignment_op = "EQ_ASSIGN",
    add_line_break_after_pipe = "SPECIAL-PIPE",
    wrap_if_else_while_for_fun_multi_line_in_curly = c("IF", "WHILE", "FOR", "FUNCTION"),
    ...
)

If the parse table contains no EQ_ASSIGN token, for example, the force_assignment_op transformer will be removed from the set of transformers before these are applied at every nest.
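The dropping step could look roughly like the following (a minimal sketch with a hypothetical subset_transformers() helper; the merged implementation may differ):

```r
# Hypothetical sketch of the dropping logic, not the exact styler code.
transformers_drop <- list(
  force_assignment_op = "EQ_ASSIGN",
  add_line_break_after_pipe = "SPECIAL-PIPE"
)

subset_transformers <- function(transformers, parse_table, transformers_drop) {
  for (rule in names(transformers_drop)) {
    trigger_tokens <- transformers_drop[[rule]]
    # if none of the tokens the rule acts on occur, the rule is a no-op
    if (!any(trigger_tokens %in% parse_table$token)) {
      transformers[[rule]] <- NULL
    }
  }
  transformers
}
```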

Idea originated in #558 (comment).

Todo:

  • example in create_style_guide().
  • harmonise scope and arguments in create_style_guide().
  • add type stability with specify_*() instead of list() as input for subset_transformers().
  • transformers_drop = specify_transformer_dropping().

Here is how a naive removal of the edge-case rules would change the timings:

Remove all edge case transformers (f048a33):

  • cache_applying: 0.03 -> 0.03 (0%)
  • cache_recording: 0.89 -> 0.79 (-10.9%)
  • without_cache: 2.7 -> 2.34 (-13.4%)

Bump CI (3fdea5a):

  • cache_applying: 0.03 -> 0.03 (-11.5%)
  • cache_recording: 0.96 -> 0.86 (-10.5%)
  • without_cache: 3 -> 2.7 (-10.1%)

First draft implementation for touchstone (a87c7f9):

  • cache_applying: 0.03 -> 0.03 (-8.7%)
  • cache_recording: 0.89 -> 0.8 (-10.2%)
  • without_cache: 2.56 -> 2.41 (-6%)

fix tests (6624000) (includes a more elegant approach with reduce, and some tests):

  • cache_applying: 0.03 -> 0.03 (-8%)
  • cache_recording: 0.99 -> 0.92 (-7.8%)
  • without_cache: 2.99 -> 2.8 (-6.3%)

bump CI (3061e30):

  • cache_applying: 0.03 -> 0.03 (-7.8%)
  • cache_recording: 1.03 -> 0.97 (-6.6%)
  • without_cache: 3.07 -> 2.86 (-6.9%)

replace loop with walk for speed (3061e30):

  • cache_applying: 0.03 -> 0.03 (-7.8%)
  • cache_recording: 1.01 -> 0.99 (-1.7%)
  • without_cache: 2.97 -> 3.06 (3%)

resort to simple loop (4c1f95a):

  • cache_applying: 0.03 -> 0.03 (-9.5%)
  • cache_recording: 0.96 -> 0.89 (-7.7%)
  • without_cache: 2.88 -> 2.6 (-9.7%)
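For reference, the two variants compared in the last two entries look roughly like this (illustrative only, not the actual styler code): purrr::walk() is designed for side effects, so mutating a local list from inside it requires <<- and pays one closure call per element, while a plain for loop avoids that overhead.

```r
# purrr::walk() variant: needs <<- to modify `out` in the enclosing scope
out <- list(a = 1, b = 2, c = 3)
purrr::walk(c("a", "c"), function(nm) out[[nm]] <<- NULL)

# plain for loop: same result, no closure-call overhead
out <- list(a = 1, b = 2, c = 3)
for (nm in c("a", "c")) out[[nm]] <- NULL
```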

loop 2, no material change (04c609c):

  • cache_applying: 0.03 -> 0.03 (-11.1%)
  • cache_recording: 0.9 -> 0.84 (-7%)
  • without_cache: 2.66 -> 2.42 (-9.2%)

loop 3, no material change (781f796):

  • cache_applying: 0.03 -> 0.03 (-9.3%)
  • cache_recording: 1.02 -> 0.9 (-11.6%)
  • without_cache: 3.04 -> 2.74 (-9.8%)

also include indent (f71deae):

  • cache_applying: 0.03 -> 0.03 (-6.8%)
  • cache_recording: 1.01 -> 0.92 (-9.3%)
  • without_cache: 3.03 -> 2.81 (-7.3%)

another hack for old R version (5523a8f):

  • cache_applying: 0.04 -> 0.05 (18.7%)
  • cache_recording: 1.45 -> 1.97 (35.8%)
  • without_cache: 4.17 -> 3.62 (-13.2%)

if over ifelse (35976ef):

  • cache_applying: 0.03 -> 0.03 (-6.6%)
  • cache_recording: 1.03 -> 0.97 (-5.5%)
  • without_cache: 3.01 -> 2.83 (-6%)
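The "if over ifelse" change exploits a general R fact: for a scalar condition, a plain if () else is cheaper than ifelse(), which is vectorized and does considerably more work internally (checking and recycling its arguments). An illustrative comparison:

```r
x <- 1
# scalar branch: cheap
y <- if (x > 0) "pos" else "non-pos"
# ifelse() runs vectorized machinery even for a length-1 condition: slower
y <- ifelse(x > 0, "pos", "non-pos")
```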

save another assignment (fd4440c):

  • cache_applying: 0.04 -> 0.04 (-5.5%)
  • cache_recording: 1.33 -> 1.24 (-6.9%)
  • without_cache: 3.92 -> 3.7 (-5.7%)

get rid of condition that always holds (4e7d97f):

  • cache_applying: 0.03 -> 0.03 (-5.9%)
  • cache_recording: 0.93 -> 0.85 (-9%)
  • without_cache: 2.77 -> 2.58 (-6.8%)

don't remove some transformers that won't ever be used (1b5498e):

  • cache_applying: 0.03 -> 0.03 (-7.2%)
  • cache_recording: 1.06 -> 0.95 (-10.4%)
  • without_cache: 3.07 -> 2.88 (-6.3%)

list is so much faster than lst (62ebd4b):

  • cache_applying: 0.04 -> 0.03 (-16.5%)
  • cache_recording: 1.37 -> 1.3 (-5.3%)
  • without_cache: 3.97 -> 3.66 (-7.8%)
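The list() vs lst() gap is plausible because base list() is a primitive, while tibble::lst() quotes its arguments and evaluates them sequentially (so later elements can refer to earlier ones), which costs time on every call. Illustrative only:

```r
# base primitive: fast
l1 <- list(a = 1, b = 2)
# tibble::lst() supports self-referencing arguments, at a cost
l2 <- tibble::lst(a = 1, b = a + 1)
```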

bump CI (b80a0f2):

  • cache_applying: 0.03 -> 0.03 (-19.9%)
  • cache_recording: 1.02 -> 0.94 (-8%)
  • without_cache: 3.05 -> 2.75 (-9.8%)

can't use an if condition in assignment because it would assign NULL (c430adc):

  • cache_applying: 0.03 -> 0.03 (-18%)
  • cache_recording: 1 -> 0.94 (-6.3%)
  • without_cache: 2.96 -> 2.73 (-7.7%)

  • cache_applying: 0.03 -> 0.03 (-18.9%)
  • cache_recording: 0.99 -> 0.92 (-6.6%)
  • without_cache: 2.23 -> 2.08 (-6.6%)
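The pitfall referred to here is a standard R behavior: an if without an else returns invisible NULL when the condition is FALSE, so wrapping the whole assignment in the condition silently assigns NULL. An illustrative example:

```r
# an if without else evaluates to invisible NULL when FALSE:
x <- if (FALSE) 1   # x is now NULL
# so the assignment must stay inside the condition instead:
if (FALSE) x <- 1   # x is left untouched
```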

more docs (96ede0b):

  • cache_applying: 0.03 -> 0.02 (-21.4%)
  • cache_recording: 0.93 -> 0.85 (-8.2%)
  • without_cache: 2.76 -> 2.49 (-9.9%)

(4e8a248):

  • cache_applying: 0.03 -> 0.02 (-20.1%)
  • cache_recording: 0.92 -> 0.84 (-8.5%)
  • without_cache: 2.73 -> 2.46 (-10.2%)

This is promising. Now we'll see whether the overhead from filtering the rules is small enough that we are left with a net gain.

  • Document.
  • Add tests for transformer_subset() directly.
  • Consider doing the subsetting for every block separately instead of once, to gain even more speed.
  • Allow subset_transformers to be NULL or similar.
  • Input validation for subset_transformers() (must match names of rules).
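The validation item above could be sketched like this (hypothetical helper name assert_transformers_drop(), not code from the PR):

```r
# Hypothetical sketch: names in transformers_drop must correspond to
# rules that actually exist in the style guide's transformer set.
assert_transformers_drop <- function(transformers_drop, transformers) {
  unknown <- setdiff(names(transformers_drop), names(transformers))
  if (length(unknown) > 0) {
    stop(
      "Unknown transformers in transformers_drop: ",
      paste(unknown, collapse = ", ")
    )
  }
}
```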

@codecov-io commented Jan 10, 2021

Codecov Report

Merging #711 (4e8a248) into master (f1e6d9e) will increase coverage by 0.17%.
The diff coverage is 99.13%.


@@            Coverage Diff             @@
##           master     #711      +/-   ##
==========================================
+ Coverage   90.49%   90.67%   +0.17%     
==========================================
  Files          47       47              
  Lines        2273     2326      +53     
==========================================
+ Hits         2057     2109      +52     
- Misses        216      217       +1     
Impacted Files Coverage Δ
R/rules-spaces.R 98.30% <ø> (-0.07%) ⬇️
R/style-guides.R 99.39% <98.85%> (-0.61%) ⬇️
R/testing.R 78.29% <100.00%> (+1.82%) ⬆️
R/transform-code.R 95.55% <100.00%> (ø)
R/transform-files.R 100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f1e6d9e...4e8a248.

@lorenzwalthert lorenzwalthert changed the title Benchmark study: remove edge case transformers Remove unused transformers before processing Jan 23, 2021
@lorenzwalthert lorenzwalthert merged commit 3337939 into r-lib:master Jan 25, 2021
@lorenzwalthert lorenzwalthert deleted the subset-transformers branch January 25, 2021 22:30
@lorenzwalthert lorenzwalthert restored the subset-transformers branch February 20, 2021 22:48