Caching top level expressions #578
Codecov Report
@@            Coverage Diff             @@
##           master     #578      +/-   ##
==========================================
+ Coverage   90.65%   90.95%   +0.29%
==========================================
  Files          46       47       +1
  Lines        2044     2111      +67
==========================================
+ Hits         1853     1920      +67
  Misses        191      191
Not really active on expression level, just some infrastructure and code running. Must now process each block separately, i.e. parse_transform_serialize and parse_transform_serialize_block from branch caching@48a9a6
has not resolved following problems:
- Cache each expression separately, processing all at once. Currently, each expression is processed separately, which takes very long. Probable problem: need to generate cache per expression before it is merged with other expressions, or parse generated code at the very end to get expressions again (probably cheap but cognitive overhead).
Infrastructure:
- Can't remember why get_parse_data should also add the variable is_cached (and drag along transformers).
… instead of get_parse_data() more coherent conceptually and less dragging of arguments
consistently formatted and all formatting comes from initializer (that is in style guide), nothing hardcoded in styler infra.
Because the line break that is applied to the first token with apply_stylerignore may contain a line break, but since all blocks are separated by a line break there would be an extra line break added.
…look different. This is responsible for 153 files changed, 5204 insertions(+), 4856 deletions(-). We include the text so we can, if we use the cache, return text. Previously, this was not needed because we did not process by expressions, so the input text was available at the time we checked the cache (function returned from make_transformer).
…ens, so there is no error when parser version < 2 in relocate_eq_sign when a newly created parse table is bound together with the previous one.
…ize_r_block to enable processing by blocks of expressions but caching by expression (but not implemented in practice with cache_find_block())
probably not faster, but much more complicated; includes quite slow creation of text of individual expressions with getParseData. parse(deparse()) was not used because it also styles the code at the same time (indentation of function declaration, for example). Could not find a quick way to turn that off and make it return the code as is.
R/initialize.R
What's the semantic difference between 1 empty line, 3 empty lines (as in line 22) and 5 empty lines?
R/nest.R
#' Drop all children of a top level expression that are cached
#'
#' Note that we do cache top-level comments. Because package code has a lot of
#' roxygen comments and each of them is a top level expresion, so checking is
The word "so" is wrong since the sentence starts with "Because".
R/nest.R
#' [parse_transform_serialize_r_block()], we simply return `text` for the top
#' level token. For that
#' reason, the nested parse table can, at the rows where these expressions are
#' located, be shallow, i.e. it does not have to contain a children, because it
typo: "a children" -> "children"
R/nest.R
#' level token. For that
#' reason, the nested parse table can, at the rows where these expressions are
#' located, be shallow, i.e. it does not have to contain a children, because it
#' will neighter be transformerd nor serialized anytime. This function drop all
typo: "neighter" -> "neither"
typo: "transformerd" -> "transformed"
typo: "drop" -> "drops"
R/transform-block.R
#' Every expression is an expression itself, Expressions on same line are in
#' same block.
#' Multiple expressions can sit on one row, e.g. in line comment and commands
#' seperated with ";". This creates a problem when processing each expression
typo: "seperated" -> "separated"
R/transform-block.R
#' @param pd A top level nest.
find_blank_lines_to_next_block <- function(pd) {
  block_boundary <- pd$block != lag(pd$block, default = 0)
  # TODO everywhere: block is not ambiguous. use cache block since we also have
typo: "not ambiguous" -> "ambiguous"
R/utils-cache.R
@@ -13,11 +13,32 @@ hash_standardize <- function(text) {
  list()
}

#' Check if text is cached
#'
#' This boilds down to check if the hash exists at the caching dir as a file.
typo: "boilds" -> "boils"
Thanks @rillig, I will have a look.
The cache is currently invalidated if any expression is modified. If we manage to cache top level expressions separately, we can still use the cache for all non-modified expressions, which should often improve speed. Closes #570.
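The idea of caching each top level expression separately can be sketched roughly as below. This is only a minimal illustration in base R: the helper names (`hash_text()`, `is_cached()`, `cache_write()`), the temporary cache directory and the md5-of-a-tempfile hashing are assumptions made for this example, not the implementation in this PR.

```r
# Hypothetical helpers for illustration only.
hash_text <- function(text) {
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeLines(text, tmp)
  # tools::md5sum() hashes files, so the text is written to a temp file first.
  unname(tools::md5sum(tmp))
}

# An expression counts as cached if a file named after its hash exists.
is_cached <- function(text, cache_dir) {
  file.exists(file.path(cache_dir, hash_text(text)))
}

cache_write <- function(text, cache_dir) {
  file.create(file.path(cache_dir, hash_text(text)))
}

cache_dir <- tempfile("cache-")
dir.create(cache_dir)

# Pretend these three top level expressions were styled and cached before.
exprs <- c("a <- 1", "f <- function(x) x + 1", "print(a)")
for (e in exprs) cache_write(e, cache_dir)

# Modifying one expression only invalidates the cache for that expression.
exprs[2] <- "f <- function(x) x + 2"
hits <- vapply(
  exprs, is_cached, logical(1),
  cache_dir = cache_dir, USE.NAMES = FALSE
)
hits # TRUE FALSE TRUE
```

With a cache keyed on the whole text, all three expressions would have to be restyled after this change; with one key per expression, only the second one does.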
Only add the variables block and is_cached (if at all) after
get_parse_data (but before parse_transform_serialize_r_block, because when we
do block processing this might be too late), so that transformers are not
dragged along to totally unrelated places. When processing blocks, make sure
to cache individual values where they are created.
process in block, not every expression.
expressions on the same line must be processed together, i.e. when separated
by semi-colon or comment, they belong to the same block.
think about the interaction with create_tokens and that each column should
potentially exist on every level, not just top level.
the text attribute of a top level nest is not used to create a hash for
caching, but is used to return text in case no styling is needed because the
value is cached.
resolve line break problem.
Add tests. Both low-level and high-level.
compute_parse_data_nested() should not compute the nested parse data for expressions that are cached already, because for them we just return text of the top level expression.
Put all consecutive non-cached expressions into one block to speed things up. Maybe this does not help as much as expected. Need more comprehensive benchmarks.
nice to have: Cache after styling each expression separately, so styling a file with ten identical expressions will use cache for 9 of them. Does not seem to work because we must have blocks first.
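The two block rules from the list above (expressions on the same line belong together, and consecutive non-cached expressions are merged into one block) could look roughly like this. The functions `assign_blocks()` and `merge_uncached()` and the hand-written table `top` are hypothetical and only meant as a sketch; the code in this PR works on the nested parse table instead.

```r
# Hypothetical table of top level expressions: "a <- 1; b <- 2" share line 1,
# the function spans lines 2 to 4, and "g()" on line 5 is already cached.
top <- data.frame(
  text = c("a <- 1", "b <- 2", "f <- function(x) {\n  x\n}", "g()"),
  line1 = c(1L, 1L, 2L, 5L),
  line2 = c(1L, 1L, 4L, 5L),
  is_cached = c(FALSE, FALSE, FALSE, TRUE)
)

# A new block starts whenever an expression begins on a later line than the
# one on which the previous expression ended.
assign_blocks <- function(line1, line2) {
  prev_line2 <- c(0L, head(line2, -1))
  cumsum(line1 > prev_line2)
}

# Merge runs of consecutive blocks that contain non-cached expressions.
# A block only counts as cached if all of its expressions are cached.
merge_uncached <- function(block, is_cached) {
  block_cached <- as.vector(tapply(is_cached, block, all))
  prev_cached <- c(TRUE, head(block_cached, -1))
  # Cached blocks stay on their own; a non-cached block opens a new merged
  # block only if the block before it was cached.
  merged <- cumsum(block_cached | prev_cached)
  merged[block]
}

block <- assign_blocks(top$line1, top$line2)
block # 1 1 2 3
merged <- merge_uncached(block, top$is_cached)
merged # 1 1 1 2
```

Here the first three expressions end up in one block that has to be styled, while the cached `g()` forms its own block for which the stored text can be returned.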
Note: The diff is so large because all trees are rewritten as we use includeText = TRUE in getParseData(). Otherwise it would be 5k lower.
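For reference, a small base R sketch of what includeText = TRUE changes: the parse data then also carries the full source text for non-terminal rows, so the text of a top level expression can be returned directly on a cache hit. The input `code` is made up for this example, and the filter on `parent == 0` and `token == "expr"` is a simplification that ignores top level comments.

```r
code <- "x <- 1\nf <- function(a) {\n  a + 1\n}"

# keep.source = TRUE is needed so that parse data is recorded at all.
exprs <- parse(text = code, keep.source = TRUE)
pd <- utils::getParseData(exprs, includeText = TRUE)

# Rows with parent 0 are the top level expressions; with includeText = TRUE
# their text column holds the complete source of the expression.
top <- pd[pd$parent == 0 & pd$token == "expr", ]
top$text[1] # "x <- 1"
cat(top$text[2], "\n")
```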