Caching top level expressions #578

lorenzwalthert · 2020-01-19T10:41:30Z

The cache is currently invalidated if any expression is modified. If we manage to cache top level expressions separately, we can still use the cache for all non-modified expressions, which should improve speed in many cases. Closes #570.

Note: The diff is so large because all trees are rewritten as we use includeText = TRUE in getParseData(). Otherwise it would be roughly 5k lines smaller.
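
For illustration, a minimal sketch of what includeText = TRUE changes, using only base R's parse() and utils::getParseData(). The variable names are made up for this example and this is not styler's code. With includeText = TRUE, the rows of non-terminal tokens also carry their source text, so the text of each top level expression can be read straight from the parse table.

```r
# Sketch only: not styler's implementation.
code <- c("a <- 1", "b  <-  2")

# keep.source = TRUE is needed so that parse data is recorded at all
# (it defaults to FALSE in non-interactive sessions).
exprs <- parse(text = code, keep.source = TRUE)

# includeText = TRUE also fills `text` for non-terminal tokens such as `expr`.
pd <- utils::getParseData(exprs, includeText = TRUE)

# Top level expressions are the `expr` rows without a parent.
top_level <- pd[pd$parent == 0 & pd$token == "expr", ]
top_level_text <- top_level$text
print(top_level_text)
```

Because the text is taken from the source lines, the original spacing of each expression is preserved, which is what returning cached text requires.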

codecov-io · 2020-01-19T15:43:02Z

Codecov Report

Merging #578 into master will increase coverage by 0.29%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #578      +/-   ##
==========================================
+ Coverage   90.65%   90.95%   +0.29%     
==========================================
  Files          46       47       +1     
  Lines        2044     2111      +67     
==========================================
+ Hits         1853     1920      +67     
  Misses        191      191

Impacted Files	Coverage Δ
R/nested-to-tree.R	`100% <ø> (ø)`	⬆️
R/relevel.R	`96.92% <ø> (ø)`	⬆️
R/initialize.R	`97.05% <ø> (ø)`	⬆️
R/transform-files.R	`100% <100%> (ø)`	⬆️
R/utils-cache.R	`100% <100%> (ø)`	⬆️
R/token-create.R	`95.74% <100%> (+0.18%)`	⬆️
R/serialize.R	`100% <100%> (ø)`	⬆️
R/utils.R	`82.75% <100%> (+1.98%)`	⬆️
R/nest.R	`100% <100%> (ø)`	⬆️
R/parse.R	`85.55% <100%> (ø)`	⬆️
... and 2 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b7c4d9b...4f6a890.

has not resolved following problems:
- Cache each expression separately, processing all at once. Currently, each expression is processed separately, which takes very long. Probable problem: need to generate cache per expression before it is merged with other expressions, or parse generated code at the very end to get expressions again (probably cheap but cognitive overhead).

Infrastructure:
- Can't remember why get_parse_data should also add the variable is_cached (and drag along transformers).

…look different. This is responsible for 153 files changed, 5204 insertions(+), 4856 deletions(-). We include the text so we can, if we use the cache, return text. Previously, this was not needed because we did not process by expressions, so the input text was available at the time we checked the cache (function returned from make_transformer).

probably not faster, but much more complicated. Includes quite slow creation of the text of individual expressions with getParseData. parse(deparse()) was not used because it also styles the code at the same time (indentation of function declarations, for example). Could not find a quick way to turn that off and make it return the code as is.
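
To make the parse(deparse()) remark concrete, here is a small base R sketch (the input string is made up for this example). deparse() regenerates code from the parsed expression and discards the original spacing, so the round trip does not return the text as written.

```r
# Sketch only: shows why a parse/deparse round trip cannot recover source text.
original <- "x<-1+1"

# Parse without source references, then turn the expression back into text.
expr <- parse(text = original, keep.source = FALSE)[[1]]
roundtrip <- deparse(expr)
print(roundtrip)

# deparse() applies its own spacing, so the result differs from the input.
changed <- !identical(roundtrip, original)
print(changed)
```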

rillig · 2020-02-05T03:28:26Z

R/initialize.R

+
+
+
+


What's the semantic difference between 1 empty line, 3 empty lines (as in line 22) and 5 empty lines?

rillig · 2020-02-05T03:30:36Z

R/nest.R

+#' Drop all children of a top level expression that are cached
+#'
+#' Note that we do cache top-level comments. Because package code has a lot of
+#' roxygen comments and each of them is a top level expresion, so checking is


The word "so" is wrong since the sentence starts with "Because".

rillig · 2020-02-05T03:31:20Z

R/nest.R

+#' [parse_transform_serialize_r_block()], we simply return `text` for the top
+#' level token. For that
+#' reason, the nested parse table can, at the rows where these expressions are
+#' located, be shallow, i.e. it does not have to contain a children, because it


typo: "a children" -> "children"

rillig · 2020-02-05T03:31:30Z

R/nest.R

+#' level token. For that
+#' reason, the nested parse table can, at the rows where these expressions are
+#' located, be shallow, i.e. it does not have to contain a children, because it
+#' will neighter be transformerd nor serialized anytime. This function drop all


typo: "neighter" -> "neither"
typo: "transformerd" -> "transformed"
typo: "drop" -> "drops"

rillig · 2020-02-05T03:40:53Z

R/transform-block.R

+#' Every expression is an expression itself, Expressions on same line are in
+#' same block.
+#' Multiple expressions can sit on one row, e.g. in line comment and commands
+#' seperated with ";". This creates a problem when processing each expression


typo: "seperated" -> "separated"
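
The rule described in this hunk (expressions that share a line end up in the same block) could be sketched roughly as below. assign_blocks_sketch() and its arguments are hypothetical names for illustration, not functions from styler. line1 and line2 stand for the first and last line of each top level expression, in source order.

```r
# Sketch only: hypothetical helper, not part of styler.
assign_blocks_sketch <- function(line1, line2) {
  stopifnot(length(line1) == length(line2), length(line1) > 0)
  # An expression opens a new block only if it starts on a later line than
  # the one on which the previous expression ended.
  starts_new_block <- c(TRUE, line1[-1] > line2[-length(line2)])
  cumsum(starts_new_block)
}

# `a <- 1; b <- 2` on line 1, then `c <- 3` on line 2.
blocks <- assign_blocks_sketch(line1 = c(1L, 1L, 2L), line2 = c(1L, 1L, 2L))
print(blocks)
```

The first two expressions share block 1 because the second one starts on the line where the first one ends, while the third expression gets its own block.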

rillig · 2020-02-05T03:42:19Z

R/transform-block.R

+#' @param pd A top level nest.
+find_blank_lines_to_next_block <- function(pd) {
+  block_boundary <- pd$block != lag(pd$block, default = 0)
+  # TODO everywhere: block is not ambiguous. use cache block since we also have


typo: "not ambiguous" -> "ambiguous"

rillig · 2020-02-05T03:43:03Z

R/utils-cache.R

@@ -13,11 +13,32 @@ hash_standardize <- function(text) {
    list()
 }

+#' Check if text is cached
+#'
+#' This boilds down to check if the hash exists at the caching dir as a file.


typo: "boilds" -> "boils"
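
The idea in this doc comment (a text counts as cached when a file named after its hash exists in the cache directory) can be sketched with base R alone. Everything below is an assumption for illustration: the helper names are hypothetical, hashing goes through tools::md5sum() on a temporary file, and styler's real cache backend may work differently.

```r
# Sketch only: hypothetical helpers, not styler's cache implementation.
hash_text_sketch <- function(text) {
  # tools::md5sum() hashes files, so write the text to a temporary file first.
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeLines(text, tmp)
  unname(tools::md5sum(tmp))
}

is_cached_sketch <- function(text, cache_dir) {
  # Cached means: an (empty) file named after the hash exists.
  file.exists(file.path(cache_dir, hash_text_sketch(text)))
}

cache_write_sketch <- function(text, cache_dir) {
  invisible(file.create(file.path(cache_dir, hash_text_sketch(text))))
}

cache_dir <- tempfile("cache-sketch-")
dir.create(cache_dir)

before <- is_cached_sketch("a <- 1", cache_dir)
cache_write_sketch("a <- 1", cache_dir)
after <- is_cached_sketch("a <- 1", cache_dir)
other <- is_cached_sketch("b <- 2", cache_dir)
```

Only the file name matters in this sketch. The file itself stays empty, so a lookup is a single file.exists() call.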

lorenzwalthert · 2020-02-05T06:27:47Z

Thanks @rillig, I will have a look.

lorenzwalthert force-pushed the caching-top-level-expr branch 2 times, most recently from 0fef796 to 4a45b62 Compare January 19, 2020 14:36

lorenzwalthert force-pushed the caching-top-level-expr branch 2 times, most recently from b6745e5 to 3efe926 Compare February 4, 2020 20:48

lorenzwalthert added 25 commits February 4, 2020 21:49

cache on expression level, part 1

b640559

Not really active on expression level, just some infrastructure and code running. Must now process each block separately, i.e. parse_transform_serialize and parse_transform_serialize_block from branch caching@48a9a6

only compute information about caching in parse_transform_serialize_r…

8ea1205

… instead of get_parse_data() more coherent conceptually and less dragging of arguments

remove unneeded args

8881173

initialize on all levels to have parse data

d569b5c

consistently formatted and all formatting comes from initializer (that is in style guide), nothing hardcoded in styler infra.

apply start line after stylerignore

aa7aaae

Because the line break that is applied to the first token with apply_stylerignore may contain a line break, but since all blocks are separated by a line break there would be an extra line break added.

Put expressions on the same line into the same block

2b82c61

fix r cmd check

97be061

attributes can be initialized earlier, before we ever call create_tok…

6e10ea4

…ens, so there is no error when parser version < 2 in relocate_eq_sign when a newly created parse table is bound together with the previous one.

to make caching work, you must only add start_line after cache.

e9bba1b

styling plus don't run on blocks if cache is deactivated

bc83cd5

move caching and roundtrip verification out of parse_transform_serial…

85dc320

…ize_r_block to enable processing by blocks of expressions but caching by expression (but not implemented in practice with cache_find_block())

document

9750da5

fix r cmd check

037aabf

only cache if cache is activated

aa59e41

include comments

43f7676

make it work but not in general

644d0ed

alternative approach to making pd shallow

2921c7e

deactivate the cache for this vignette because it makes things more c…

1147d7f

…omplicated

document

a5f6dab

add high-level tests

7c6e7c6

do not cache comments because it's expensive

b3ce4e9

refactor and extend tests

9e6a99f

lorenzwalthert added 2 commits February 4, 2020 21:49

refactor writing to cache

8564e74

document

7679c2a

lorenzwalthert force-pushed the caching-top-level-expr branch from 3efe926 to 7679c2a Compare February 4, 2020 20:49

lorenzwalthert added 6 commits February 4, 2020 21:53

bump version for cache invalidation

fc22511

r cmd check

e9e7f9d

update news

b3c5ba7

Add vignette that describes caching feature

4ae8fe5

make things work for parser version < 2 (relocate eq_assign moves blo…

bcbfe2d

…ck = NA up)

don't eval

c5bd163

lorenzwalthert force-pushed the caching-top-level-expr branch from 3bc86a6 to c5bd163 Compare February 4, 2020 22:48

lorenzwalthert added 2 commits February 4, 2020 23:54

need block anyways on all levels

f1b3e4a

stricter benchmarks

824e2a8

lorenzwalthert mentioned this pull request Feb 4, 2020

Styler is slow #558

Open

rillig reviewed Feb 5, 2020

lorenzwalthert added 2 commits February 5, 2020 08:12

typos found by @rillig.

cdbe1bf

document and more tests

4f6a890

lorenzwalthert marked this pull request as ready for review February 5, 2020 19:09

lorenzwalthert merged commit a87198f into r-lib:master Feb 5, 2020

This was referenced Feb 6, 2020

revert destructive CI-Commit cynkra/dm#267

Closed

make R.cache an import #589

Merged

lorenzwalthert deleted the caching-top-level-expr branch February 11, 2020 12:35

lorenzwalthert restored the caching-top-level-expr branch November 28, 2020 11:31
