Templating within a remakefile #2

Open · richfitz opened this issue Sep 26, 2014 · 9 comments

@richfitz (Owner)

This is something suggested by @cboettig, and which I immediately ran into on dfalster/tree-p#9 (private repo currently). We have a makerfile that includes targets like:

  output/leaf_traits.csv:
    depends: leaf_traits
    rule: export_csv
    target_argument_name: filename

  output/growth_mortality_traits.csv:
    depends: growth_mortality_traits
    rule: export_csv
    target_argument_name: filename

It might be nice to be able to remove the duplication in a couple of places.

Most simply, the output directory could be factored out (ideally there won't be that many file targets in a maker workflow, but repetition is bad form). The simplest option I can think of there is to use whisker, so that we'd have:

  {{output_dir}}/leaf_traits.csv:
    depends: leaf_traits
    rule: export_csv
    target_argument_name: filename

  {{output_dir}}/growth_mortality_traits.csv:
    depends: growth_mortality_traits
    rule: export_csv
    target_argument_name: filename

and then another section in the makerfile:

variables:
  output_dir: output

The only real sticking point here is that this will fail miserably if a whisker variable is missing, because the mustache spec says that by default missing variables render as the empty string. This issue suggests that throwing errors might be a possibility.

A more complicated form of templating (which is then more prone to odd corner cases) would be to define whole template rules. So we'd have:

  output/{{filename}}.csv:
    depends: {{object}}
    rule: export_csv
    target_argument_name: filename

and then somehow fill that in for the two cases above.

Of course, it should be fairly easy for users to manually template their own files prior to running maker, so the simplest solution might be to trial some forms of templating outside the package and incorporate what works.
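For concreteness, here is a minimal sketch of that external approach, assuming only the whisker package; the file names and the render_remakefile() helper are illustrative, not part of maker. It checks for missing variables explicitly, since mustache would otherwise render them silently as the empty string:

# Sketch only: pre-render a templated remakefile outside the package.
# "remake_template.yml" and render_remakefile() are illustrative names.
library(whisker)

render_remakefile <- function(template_file, out_file, data) {
  text <- paste(readLines(template_file), collapse = "\n")
  # Find {{variable}} tags and insist that each one has a value,
  # since the mustache default is to render missing ones as "".
  tags <- regmatches(text, gregexpr("\\{\\{[^}]+\\}\\}", text))[[1]]
  vars <- unique(gsub("[{} ]", "", tags))
  missing <- setdiff(vars, names(data))
  if (length(missing) > 0) {
    stop("missing template variables: ", paste(missing, collapse = ", "))
  }
  writeLines(whisker.render(text, data), out_file)
}

render_remakefile("remake_template.yml", "remake.yml",
                  data = list(output_dir = "output"))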

richfitz changed the title from "Templating within a makerfile" to "Templating within a remakefile" on Feb 6, 2015
@krlmlr (Collaborator) commented Nov 25, 2016

The make syntax would be:

  output/%.csv:
    depends: %
    rule: export_csv
    target_argument_name: filename

Would that be an option?
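As a rough sketch of what that could mean mechanically (not remake functionality), the `%` pattern could be expanded into concrete yaml entries before writing the file; expand_pattern() below is an illustrative helper:

# Sketch only: expand a make-style pattern rule into concrete targets.
library(yaml)

expand_pattern <- function(stems) {
  targets <- lapply(stems, function(stem) {
    list(depends = stem,
         rule = "export_csv",
         target_argument_name = "filename")
  })
  names(targets) <- sprintf("output/%s.csv", stems)
  targets
}

cat(as.yaml(expand_pattern(c("leaf_traits", "growth_mortality_traits"))))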

@richfitz (Owner, Author)

That's really nice, yeah. And I like the connection to the make rules.

The use case that triggered this was slightly more complicated, though (and done manually). The repo is at dfalster/baad:

Not very pretty, but it worked.

Getting the wildcard bit (#70) in could work for file-based patterns, but for looping over sets of objects it could all get a bit nastier.

@krlmlr (Collaborator) commented Nov 26, 2016

At this point we might want to consider supporting a DSL in addition to the .yml format. It could be as simple as:

packages:
- magrittr
- tibble
- dplyr
- ggplot2

sources:
- src/

targets:
  target:
    command: command_to_create_target(dep1, dep2)

which in the proposed DSL might look like:

remake() %>%
  add_library(c("magrittr", "tibble", "dplyr", "ggplot2")) %>%
  add_sources("src/") %>%
  add_target("target", command = ~command_to_create_target(dep1, dep2))

If we give the user a bit more flexibility when specifying the rules, we don't need to add all the logic ourselves. (Thanks @hadley for suggesting a DSL here.)

@hadley (Contributor) commented Nov 26, 2016

I think it would be better to make remake() implicit (i.e. make it the last argument and default to remake()), i.e.

remake_needs_package("magrittr", "tibble", "dplyr", "ggplot2")
remake_uses_source("src/")

target <- remake_target(~ command_to_create_target(dep1, dep2))

(I'm speculating wildly on the rest of the DSL. I'm happy to give more feedback)

@krlmlr (Collaborator) commented Nov 26, 2016

Thanks. It seems that using the assignment operator will make it difficult to create similar rules programmatically. How about:

remake_needs_package("magrittr", "tibble", "dplyr", "ggplot2")
remake_uses_source("src/")

remake_target(~ target, ~ command_to_create_target(dep1, dep2))
remake_target("file/target.txt", ~ command_to_create_file_target(dep3, dep4))

@wlandau commented Nov 26, 2016

How did I not see this issue before? I wrote a whole package to deal with it!
remakeGenerator does the templating inside R: it works with data frames of remake commands and writes remake.yml (plus an overarching Makefile via parallelRemake, as in #84).

# install_github("wlandau/remakeGenerator")
> library(remakeGenerator)

> df = commands(data = simulate(center = MU, scale = SIGMA))
> df
  target                              command
1   data simulate(center = MU, scale = SIGMA)

Add multiple reps:

> df = expand(df, values = c("rep1", "rep2"))
> df
     target                              command
1 data_rep1 simulate(center = MU, scale = SIGMA)
2 data_rep2 simulate(center = MU, scale = SIGMA)

Evaluate wildcard patterns:

> evaluate(df, rules = list(MU = 1:2, SIGMA = c(0.1, 1)), expand = FALSE)
     target                           command
1 data_rep1 simulate(center = 1, scale = 0.1)
2 data_rep2   simulate(center = 2, scale = 1)

Expand over wildcard patterns:

> evaluate(df, rules = list(MU = 1:2, SIGMA = c(0.1, 1)), expand = TRUE)
           target                           command
1 data_rep1_1_0.1 simulate(center = 1, scale = 0.1)
2   data_rep1_1_1   simulate(center = 1, scale = 1)
3 data_rep1_2_0.1 simulate(center = 2, scale = 0.1)
4   data_rep1_2_1   simulate(center = 2, scale = 1)
5 data_rep2_1_0.1 simulate(center = 1, scale = 0.1)
6   data_rep2_1_1   simulate(center = 1, scale = 1)
7 data_rep2_2_0.1 simulate(center = 2, scale = 0.1)
8   data_rep2_2_1   simulate(center = 2, scale = 1)

Write remake.yml and the Makefile:

targ = targets(stage1 = df, some_other_stage = similar_data_frame)
workflow(targets = targ, sources = my_sources, packages = my_packages, ...)

Functions analyses() and summaries() work this way internally. Also, you can add fields like plot and knitr to data frames of commands, and they will appear in remake.yml and the Makefile. The linked vignette and example_remakeGenerator() have more.

@krlmlr (Collaborator) commented Nov 27, 2016

Thanks @wlandau, will try! I still think we need to be able to specify rules without the roundtrip of a .yml file.

@richfitz (Owner, Author)

Thanks for the thoughts all; I have been thinking on them for a while. There are a number of things here that might be better broken up into a series of separate issues. This is going to be a bit of a wall of text, I'm afraid.

Templating vs DSL

I think these things are actually fairly orthogonal: while a DSL might remove some of the need for templating, it won't cover everything. I think it could work for the case I linked above, where files "appear" in a set of directories independently of remake. But if there is a target that generates a set of files, then some concept of wildcards is needed in the underlying machinery. Whatever the solution is, it definitely should not involve a yaml roundtrip; that should really not be needed.

A DSL

I think that the idea of adding a DSL is interesting and worth pursuing.

@hadley emailed me about remake almost 2 years ago making suggestions along similar lines 😀, and my position on this is largely unchanged:

  • There is nothing in remake that fundamentally needs yaml; it's just a convenient vehicle to get a set of nested data into R. I can rework the internals to decouple things further, and then whatever interface is useful to generate the structure can be used (be it the generative approaches that @wlandau has tried, a pipe-and-shiny approach that Hadley mooted a couple of years ago, or whatever).
  • The package is already too big (I've already pulled storr out of it, and I'm working on factoring out some other bits at the moment) - I think it would probably be nice to allow things like the DSL to be built on top of the underlying engine. My focus with the package (and really my interest in this area) is in getting the underlying machinery robust. Interface design is something that others are probably better at than I am, and giving people freer rein there would, I suspect, be an advantage to everyone.
  • As an old-timer who is not the biggest fan in the universe of the pipe operator, I would love it if use of the pipe were optional.
  • Whatever happens with the DSL, I think we need to be very careful not to create something that works with mtcars and iris but not with real research problems. remake was created because we hit the wall on reproducibility trying to do the right thing with knitr and caching (the blog post I wrote for rOpenSci was my aha moment). I honestly believe that, like mustache, there's a big advantage in a logic-free approach to this problem - I'm well aware that not everyone believes me, though. In my previous job we used remake to scale projects that ran to many days of CPU time, so I feel it is at least a sufficient (if not necessary) approach. If things can be made modular, though, it really won't matter - especially if it's possible to translate between one approach and the other. My concern is that if the DSL looks R-ish, pretty soon people will want loops, conditionals, etc., and the whole thing will balloon out of control and you'll end up with something awkward to use. OTOH, the success of dplyr and ggplot2 shows that a well-designed DSL can be very powerful, so who knows.

My previous approach to the DSL looked like

m <- maker({
  library(testthat)
  source("code.R")

  file("data.csv", cleanup_level="purge") <- download_data(target_name)
  processed <- process_data("data.csv")
  plot("plot.pdf", width=8, height=4) <- myplot(processed)
})

see here -- I think this followed directly from thinking about Hadley's email.
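For what it's worth, here is a minimal sketch (assuming nothing about maker's real internals) of how a constructor like that could capture the braced block unevaluated and walk its expressions; the file()/plot() targets arrive as calls to `<-` whose left-hand side is itself a call. maker_sketch() is purely illustrative:

# Sketch only: maker_sketch() illustrates the capture mechanism, not maker itself.
maker_sketch <- function(block) {
  exprs <- as.list(substitute(block))[-1]  # drop the enclosing `{`
  for (e in exprs) {
    if (is.call(e) && identical(e[[1]], as.name("<-"))) {
      # e[[2]] is the target (possibly a file()/plot() call), e[[3]] the command
      message("target:  ", paste(deparse(e[[2]]), collapse = " "))
      message("command: ", paste(deparse(e[[3]]), collapse = " "))
    } else {
      message("setup:   ", paste(deparse(e), collapse = " "))
    }
  }
  invisible(exprs)
}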

In the current sources (though I think it's deleted in the refactor branch and I need to work out where it stands at the moment), there's an alternative bit of experimentation that looks like:

  m <- remake()
  m$add <- "package:testthat"
  m$add <- "code.R"
  m$add <- target("data.csv", download_data(target_name),
                  cleanup_level="purge")
  m$add <- target("processed", process_data("data.csv"))
  m$add <- target("plot.pdf", myplot(processed),
                  plot=list(width=8, height=4))
  m$make("plot.pdf")

This is probably not that far from what @krlmlr is imagining above, though differing in implementation. Making the remake object part implicit would be easy and there's already a cache of remake objects.

The trick there, from memory, was building the object up and then doing the validation at the last minute, just before anything uses the remake object to build something.

Done right, that approach (sequentially adding things to a remake object, validating, running) could be used by the yaml interface, at which point things are properly decoupled.
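As a sketch of that decoupling, using the hypothetical remake()/target()/m$add interface from the experimental snippet above (not an existing API) plus the yaml package, the yaml front end could simply feed the same add mechanism:

# Sketch only: remake(), target(), and the m$add <- ... interface are the
# hypothetical experimental API quoted above, not an existing one.
library(yaml)

remake_from_yaml <- function(path = "remake.yml") {
  dat <- yaml.load_file(path)
  m <- remake()
  for (p in dat$packages) m$add <- paste0("package:", p)
  for (s in dat$sources)  m$add <- s
  for (nm in names(dat$targets)) {
    cmd <- parse(text = dat$targets[[nm]]$command)[[1]]
    m$add <- target(nm, cmd)
  }
  m  # validation would then happen lazily, just before the first build
}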

@hadley (Contributor) commented Nov 30, 2016

I think in the two years since, my preference for an internal DSL (i.e. something pipe-y) over an external DSL (e.g. YAML) has grown stronger. remake is fundamentally about running code, which, IMO, means that you should be in R scripts as much as possible (and then templates just become functions and for loops, etc.). I think yaml is better for "string-y" operations (like pkgdown) with straightforward hierarchies (not graphs like remake models).

The advantage of a pipe-based DSL is that the pipe is optional - you can use it with your preferred style of function invocation. It sounds like you're arguing more for a non-functional (i.e. mutable object) approach. I think that's generally sub-optimal because it's different from the majority of R code that most R users will see.
