Templating within a remakefile #2
```yaml
output/%.csv:
  depends: %
  rule: export_csv
  target_argument_name: filename
```

Would that be an option?
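To make the pattern-rule idea concrete, here is a minimal base-R sketch of how a rule keyed on `output/%.csv` might be expanded into one concrete target per stem. `expand_pattern_rule()` and the list layout are hypothetical (not remake's API), and the sketch assumes the set of stems is known up front.

```r
# Hypothetical expansion of a make-style pattern rule: `%` in the target
# pattern and in `depends` is replaced by each stem in turn.
expand_pattern_rule <- function(pattern, rule, stems) {
  stopifnot(grepl("%", pattern, fixed = TRUE))
  targets <- lapply(stems, function(stem) {
    list(depends = gsub("%", stem, rule$depends, fixed = TRUE),
         rule = rule$rule,
         target_argument_name = rule$target_argument_name)
  })
  # Name each expanded target after its concrete file path.
  names(targets) <- vapply(stems,
                           function(stem) gsub("%", stem, pattern, fixed = TRUE),
                           character(1), USE.NAMES = FALSE)
  targets
}

rule <- list(depends = "%", rule = "export_csv", target_argument_name = "filename")
expanded <- expand_pattern_rule("output/%.csv", rule, stems = c("iris", "mtcars"))
names(expanded)
```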
That's really nice, yeah. And I like the connection to the make rules. The use case that triggered this was slightly more complicated though (and done manually). The repo is at dfalster/baad:
Not very pretty, but it worked. Getting the wildcard bit (#70) in place could work for file-based patterns, but for looping over sets of objects it could all get a bit nastier.
At this point we might want to consider supporting a DSL in addition to the .yml format. It could be as simple as turning

```yaml
packages:
  - magrittr
  - tibble
  - dplyr
  - ggplot2
sources:
  - src/
targets:
  target:
    command: command_to_create_target(dep1, dep2)
```

into

```r
remake() %>%
  add_library(c("magrittr", "tibble", "dplyr", "ggplot2")) %>%
  add_sources("src/") %>%
  add_target("target", command = ~command_to_create_target(dep1, dep2))
```

If we give the user a bit more flexibility when specifying the rules, we don't need to add all the logic ourselves. (Thanks @hadley for suggesting a DSL here.)
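A minimal sketch of how such a pipe-able interface could work: each function takes a specification as its first argument and returns a modified copy. `remake_spec()`, `add_library()`, `add_sources()` and `add_target()` here are illustrative names, not remake's API, and dependencies are naively guessed from the free variables in the command formula. The native `|>` pipe needs R >= 4.1; magrittr's `%>%` would work the same way with these functions.

```r
# Hypothetical pipe-friendly builder: every function takes the spec first
# and returns a modified copy, so nothing is mutated in place.
remake_spec <- function() {
  structure(list(packages = character(), sources = character(), targets = list()),
            class = "remake_spec")
}

add_library <- function(spec, packages) {
  spec$packages <- union(spec$packages, packages)
  spec
}

add_sources <- function(spec, sources) {
  spec$sources <- union(spec$sources, sources)
  spec
}

# `command` is a one-sided formula; as a naive assumption, its free
# variables are taken to be the names of the targets it depends on.
add_target <- function(spec, name, command) {
  stopifnot(inherits(command, "formula"), length(command) == 2L)
  expr <- command[[2L]]
  spec$targets[[name]] <- list(command = expr, depends = all.vars(expr))
  spec
}

spec <- remake_spec() |>
  add_library(c("magrittr", "tibble", "dplyr", "ggplot2")) |>
  add_sources("src/") |>
  add_target("target", command = ~command_to_create_target(dep1, dep2))

spec$targets$target$depends
```

Because nothing relies on the pipe, the same spec can also be built with nested calls, e.g. `add_sources(add_library(remake_spec(), "dplyr"), "src/")`.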
I think it would be better to make the remake object implicit:

```r
remake_needs_package("magrittr", "tibble", "dplyr", "ggplot2")
remake_uses_source("src/")
target <- remake_target(~ command_to_create_target(dep1, dep2))
```

(I'm speculating wildly on the rest of the DSL. I'm happy to give more feedback.)
Thanks. It seems that using the assignment operator will make it difficult to create similar rules programmatically. How about:

```r
remake_needs_package("magrittr", "tibble", "dplyr", "ggplot2")
remake_uses_source("src/")
remake_target(~ target, ~ command_to_create_target(dep1, dep2))
remake_target("file/target.txt", ~ command_to_create_file_target(dep3, dep4))
```
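One way to read this proposal, sketched under assumptions: a hypothetical `remake_target()` records each rule in an implicit registry (the `.registry` environment is also hypothetical), and because the target name is passed as a value (a string or a one-sided formula) rather than through `<-`, similar rules can be generated in an ordinary loop. `export_site()` and the site names are placeholders; commands are stored unevaluated, so they never need to exist.

```r
# Hypothetical implicit registry: remake_target() and .registry are
# illustrative names, not part of remake.
.registry <- new.env(parent = emptyenv())
.registry$targets <- list()

# Record a rule. `target` is a one-sided formula (~ name) or a string
# (e.g. a file path); `command` is a one-sided formula or an unevaluated call.
remake_target <- function(target, command) {
  name <- if (inherits(target, "formula")) deparse(target[[2L]]) else target
  cmd <- if (inherits(command, "formula")) command[[2L]] else command
  stopifnot(is.character(name), length(name) == 1L, is.call(cmd))
  .registry$targets[[name]] <- cmd  # environments are modified in place
  invisible(name)
}

remake_target(~ target, ~ command_to_create_target(dep1, dep2))
remake_target("file/target.txt", ~ command_to_create_file_target(dep3, dep4))

# Because target names are plain values, similar rules come from a loop.
# bquote() splices each site name into the stored (unevaluated) command.
for (site in c("site_a", "site_b")) {
  remake_target(paste0("output/", site, ".csv"), bquote(export_site(.(site))))
}

names(.registry$targets)
```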
How did I not see this issue before? I wrote a whole package to deal with it!

```r
# install_github("wlandau/remakeGenerator")
> library(remakeGenerator)
> df = commands(data = simulate(center = MU, scale = SIGMA))
> df
  target                                  command
1   data simulate(center = MU, scale = SIGMA)
```

Add multiple reps:

```r
> df = expand(df, values = c("rep1", "rep2"))
> df
     target                                  command
1 data_rep1 simulate(center = MU, scale = SIGMA)
2 data_rep2 simulate(center = MU, scale = SIGMA)
```

Evaluate wildcard patterns:

```r
> evaluate(df, rules = list(MU = 1:2, SIGMA = c(0.1, 1)), expand = FALSE)
     target                           command
1 data_rep1 simulate(center = 1, scale = 0.1)
2 data_rep2   simulate(center = 2, scale = 1)
```

Expand over wildcard patterns:

```r
> evaluate(df, rules = list(MU = 1:2, SIGMA = c(0.1, 1)), expand = TRUE)
           target                           command
1 data_rep1_1_0.1 simulate(center = 1, scale = 0.1)
2   data_rep1_1_1   simulate(center = 1, scale = 1)
3 data_rep1_2_0.1 simulate(center = 2, scale = 0.1)
4   data_rep1_2_1   simulate(center = 2, scale = 1)
5 data_rep2_1_0.1 simulate(center = 1, scale = 0.1)
6   data_rep2_1_1   simulate(center = 1, scale = 1)
7 data_rep2_2_0.1 simulate(center = 2, scale = 0.1)
8   data_rep2_2_1   simulate(center = 2, scale = 1)
```

Write remake.yml and the Makefile:

```r
targ = targets(stage1 = df, some_other_stage = similar_data_frame)
workflow(targets = targ, sources = my_sources, packages = my_packages, ...)
```
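For comparison, the core of the `evaluate(..., expand = TRUE)` step can be sketched in a few lines of base R. `expand_wildcards()` is hypothetical and is not how remakeGenerator implements it: it substitutes each wildcard as a whole word in the command string, one row per combination of values, and appends the values to the target name, which gives the same eight target names and commands as the output above.

```r
# Hypothetical expand_wildcards(): a base-R sketch of the idea, not
# remakeGenerator's implementation.
expand_wildcards <- function(df, rules) {
  # Build the grid so the last wildcard varies fastest, then restore the
  # original column order.
  grid <- expand.grid(rev(rules), stringsAsFactors = FALSE)[names(rules)]
  rows <- list()
  for (i in seq_len(nrow(df))) {
    for (j in seq_len(nrow(grid))) {
      values <- vapply(grid[j, , drop = FALSE], as.character, character(1))
      command <- df$command[i]
      for (w in names(rules)) {
        # Replace whole-word occurrences of the wildcard only.
        command <- gsub(paste0("\\b", w, "\\b"), values[[w]], command, perl = TRUE)
      }
      rows[[length(rows) + 1L]] <- data.frame(
        target = paste(c(df$target[i], values), collapse = "_"),
        command = command, stringsAsFactors = FALSE)
    }
  }
  do.call(rbind, rows)
}

df <- data.frame(target = c("data_rep1", "data_rep2"),
                 command = "simulate(center = MU, scale = SIGMA)",
                 stringsAsFactors = FALSE)
out <- expand_wildcards(df, rules = list(MU = 1:2, SIGMA = c(0.1, 1)))
out
```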
Thanks @wlandau, will try! I still think we need to be able to specify rules without a round trip through a .yml file.
Thanks for the thoughts, all; I have been thinking on them for a while. There are a number of things here that might be better broken up into a series of separate issues. This is going to be a bit of a wall of text, I'm afraid.

templating vs DSL

I think these things are actually fairly orthogonal, and while a DSL might remove some of the need for templating, it won't remove all of it. I think it could work for the case that I linked above, where files "appear" in a set of directories independently of remake. But if there is a target that generates a set of files, then some concept of wildcards is needed in the underlying machinery. Whatever the solution is, it definitely should not involve a yaml round trip; that should really not be needed.

A DSL

I think that the idea of adding a DSL is interesting and worth pursuing. @hadley emailed me about remake almost 2 years ago making suggestions along similar lines 😀, and my position on this is largely unchanged:
My previous approach to the DSL looked like:

```r
m <- maker({
  library(testthat)
  source("code.R")
  file("data.csv", cleanup_level="purge") <- download_data(target_name)
  processed <- process_data("data.csv")
  plot("plot.pdf", width=8, height=4) <- myplot(processed)
})
```

(see here). I think this followed directly from thinking about Hadley's email. In the current sources (though I think it's deleted in the refactor branch, and I need to work out where it stands at the moment), there's an alternative bit of experimentation that looks like:

```r
m <- remake()
m$add <- "package:testthat"
m$add <- "code.R"
m$add <- target("data.csv", download_data(target_name),
                cleanup_level="purge")
m$add <- target("processed", process_data("data.csv"))
m$add <- target("plot.pdf", myplot(processed),
                plot=list(width=8, height=4))
m$make("plot.pdf")
```

This is probably not far from what @krlmlr is imagining above, though the implementation differs. Making the remake object implicit would be easy, and there's already a cache of remake objects. The trick, from memory, was to build the object up and then do the validation at the last minute, just before anything uses the remake object to build something. Done right, that approach (sequentially adding things to a remake object, validating, running) could also be used by the yaml interface, at which point things are properly decoupled.
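A rough sketch of the "build up, then validate at the last minute" idea, using closures to get a mutable object. `new_builder()`, `add()` and `build_order()` are hypothetical (not remake's internals), and the sketch assumes every free variable in a command names another target. Targets can be added in any order; unknown dependencies and cycles are only reported when a build order is requested.

```r
# Hypothetical builder: targets are accumulated without any checking, and
# validation happens only when a build order is requested.
new_builder <- function() {
  targets <- list()

  add <- function(name, command) {
    expr <- substitute(command)  # capture the command unevaluated
    targets[[name]] <<- list(command = expr, depends = all.vars(expr))
    invisible(name)
  }

  build_order <- function() {
    known <- names(targets)
    for (nm in known) {
      missing <- setdiff(targets[[nm]]$depends, known)
      if (length(missing) > 0) {
        stop(sprintf("target '%s' depends on unknown target(s): %s",
                     nm, paste(missing, collapse = ", ")), call. = FALSE)
      }
    }
    # Kahn-style topological sort: repeatedly take the targets whose
    # dependencies are already ordered; if none qualify, there is a cycle.
    ordered <- character()
    remaining <- known
    while (length(remaining) > 0) {
      ready <- remaining[vapply(remaining, function(nm) {
        all(targets[[nm]]$depends %in% ordered)
      }, logical(1))]
      if (length(ready) == 0) {
        stop("dependency cycle among: ", paste(remaining, collapse = ", "),
             call. = FALSE)
      }
      ordered <- c(ordered, ready)
      remaining <- setdiff(remaining, ready)
    }
    ordered
  }

  list(add = add, build_order = build_order)
}

b <- new_builder()
b$add("plot", myplot(processed))       # added before its dependency
b$add("processed", process_data(raw))
b$add("raw", download_data())
b$build_order()
```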
Over the past two years, my preference for an internal DSL (i.e. something pipe-y) over an external DSL (e.g. YAML) has grown stronger. remake is fundamentally about running code, which, IMO, means that you should be in R scripts as much as possible (and then templates just become functions, for loops, etc.). I think YAML is better for "string-y" operations (like pkgdown) with straightforward hierarchies, not graphs like the ones remake models. The advantage of a pipe-based DSL is that the pipe is optional: you can use it with your preferred style of function invocation. It sounds like you're arguing more for a non-functional (i.e. mutable object) approach. I think that's generally sub-optimal because it's different to the majority of R code that most R users will see.
This was suggested by @cboettig, and I immediately ran into it on dfalster/tree-p#9 (private repo currently). We have a makerfile that includes targets like:
It might be nice to be able to remove the duplication in a couple of places.
Most simply, the output directory (ideally there won't be that many file targets in a maker workflow, but repetition is bad form). The simplest option I can think of there is to use whisker, so that we'd have:
and then another section in the makerfile:
The only real sticking point here is that this will fail miserably if a whisker variable is missing, because the mustache spec says that, by default, missing variables render as the empty string. This issue suggests that throwing errors might be a possibility.
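If errors on missing variables are wanted, a strict renderer is easy to sketch. `render_strict()` below is hypothetical and is not the whisker package: it only handles plain `{{name}}` tags (no sections, partials or HTML escaping) and stops with an error naming any variable absent from the data, rather than rendering it as an empty string.

```r
# Hypothetical strict renderer for plain {{name}} tags; errors on any
# variable that is missing from `data` instead of substituting "".
render_strict <- function(template, data) {
  pattern <- "\\{\\{\\s*([A-Za-z_][A-Za-z0-9_.]*)\\s*\\}\\}"
  tags <- unique(regmatches(template, gregexpr(pattern, template, perl = TRUE))[[1]])
  vars <- gsub(pattern, "\\1", tags, perl = TRUE)
  missing <- setdiff(vars, names(data))
  if (length(missing) > 0) {
    stop("missing template variable(s): ", paste(missing, collapse = ", "),
         call. = FALSE)
  }
  out <- template
  for (i in seq_along(tags)) {
    # Literal (fixed) replacement of each distinct tag with its value.
    out <- gsub(tags[i], as.character(data[[vars[i]]]), out, fixed = TRUE)
  }
  out
}

render_strict("{{outdir}}/plot.pdf", list(outdir = "output"))
```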
A more complicated form of templating (which is then more prone to odd corner cases) would be to define whole template rules. So we'd have:
somehow fill that in for the two cases above.
Of course, it should be fairly easy for users to manually template their own files prior to running maker, so the simplest solution might be to trial some forms of templating outside the package and incorporate what works.