This repository has been archived by the owner on Jan 3, 2018. It is now read-only.

Converting RMarkdown files to html using jekyll #92

Closed
jdblischak opened this issue Oct 22, 2013 · 28 comments

@jdblischak
Contributor

We need to decide how to incorporate RMarkdown files and their conversion to html into the bc repo workflow (this issue was inspired by @jduckles's recent PR). The best solution would be one where only the RMarkdown file is committed to the repo and is then automatically converted to html when building the site with jekyll. Here is a link to one solution for accomplishing this. It involves adding an extra R file to "knit" the files when rendering the website. My main concern is that this will slow down the build process if we have to run all of the R lessons in the repo each time. Thoughts?

@ahmadia
Contributor

ahmadia commented Oct 22, 2013

@jdblischak - Thanks for this. Unfortunately, I don't think this is a solution in the "our gh-pages branch will not have generated content" sense. You'll notice that the second solution requires that you check the generated content into version control.

The only two long-term solutions for this are to get GitHub to add more generation support (which seems unlikely to me), or to stop using GitHub to host our generated content. I am strongly -1 on getting into the habit of storing generated content into our repositories, and I'm willing to take a fairly principled stand on this.

As you can probably guess, I'm -1 on using gh-pages in general as our development/deployment branch, but I think I've been outvoted on this one.

@jdblischak
Contributor Author

Good catch. That's what I get for quickly skimming the blog post.

We seem set on using the gh-pages branch for both development and deployment, so it seems we will have to upload the markdown or html (or both?) versions of the RMarkdown file. In past boot camps where I taught R, I simply uploaded the html file for lack of a better solution.

@ahmadia
Contributor

ahmadia commented Oct 22, 2013

@jdblischak - keeping the RMarkdown in the repository with README instructions on how to generate HTML and upload it to a bootcamp fork sounds like a pretty good compromise, actually. Is that what you're proposing?

@rgaiacs

rgaiacs commented Oct 22, 2013

Like @ahmadia, I am strongly -1 on getting into the habit of storing generated content in our repositories.

@jdblischak
Contributor Author

@ahmadia - I was simply explaining what I had done in the past, but I do like your solution. The plan for lessons in RMarkdown could be:

  • Only include original RMarkdown in PR to bc repo
  • Render to html and push to specific boot camp repo to display on boot camp website

I don't think we want a redundant README in each directory with RMarkdown files explaining how to generate the html file and push to the boot camp repo, especially since this is usually a straightforward process. Perhaps we could add a note about submitting RMarkdown either to the main bc README.md or at the base of the directory where we want to store R lessons.

@ahmadia
Contributor

ahmadia commented Oct 22, 2013

> or at the base of the directory where we want to store R lessons.

+1

@wking
Contributor

wking commented Oct 22, 2013

On Tue, Oct 22, 2013 at 06:51:38AM -0700, Aron Ahmadia wrote:

> As you can probably guess, I'm -1 on using gh-pages in general as
> our development/deployment branch, but I think I've been outvoted on
> this one.

To work around limitations in GitHub's hosted Jekyll processing, maybe
we need an optional Jekyll pre-processor for compiling the stuff
Jekyll doesn't handle ;). Not the most elegant solution, but I think
it's better than leaving by-hand compilation notes in a README
somewhere…

@ahmadia
Contributor

ahmadia commented Oct 22, 2013

@wking - Agreed, but I think the two suggestions are compatible. I'm happy to prefer a Jekyll plugin if it exists or somebody's written one, but I think I'll take the dirty "this is how to make it work" notes over a published product if it enables instructors to get going without worrying about debugging a series of moving parts.

I'm not sure it's worth bikeshedding the approach too much here. I'm open to all PRs that address this issue, and I will be +1 for merge on the most elegant solution that is both working and in front of me :)

@ethanwhite
Contributor

> I'm not sure it's worth bikeshedding the approach too much here. I'm open to all PRs that address this issue, and I will be +1 for merge on the most elegant solution that is both working and in front of me :)

It looks like rOpenSci is now solving this problem using .Rmd -> .md and letting Jekyll handle the markdown:

> we can take our package vignettes with only text and code as a .Rmd file, convert to a .md file with text + code + the output of that code, insert some yaml metadata at the top, and have Jekyll automagically generate html pages.

http://ropensci.org/blog/2013/10/03/tutorials/

@karthikram (who is part of both rOpenSci & SWC) - Do you have this process automated at some level and if so would you be open to us borrowing it for our workflow?
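The "insert some yaml metadata at the top" step in the quoted workflow could look roughly like the sketch below. It is not rOpenSci's code; the helper name `add_front_matter` and the `lesson` layout are hypothetical, and values containing characters such as `:` would need YAML quoting that this sketch does not attempt.

```python
def add_front_matter(markdown_text, metadata):
    """Prepend a Jekyll-style YAML front matter block to knitted markdown.

    Hypothetical helper: keys and values are written as plain `key: value`
    lines, so values that need YAML quoting are not handled.
    """
    lines = ["---"]
    for key, value in metadata.items():
        lines.append(f"{key}: {value}")
    lines.append("---")
    return "\n".join(lines) + "\n\n" + markdown_text


if __name__ == "__main__":
    knitted = "# Intro to R\n\nSome text, code, and output.\n"
    # "lesson" is an assumed layout name, not one taken from the bc repo.
    print(add_front_matter(knitted, {"layout": "lesson", "title": "Intro to R"}))
```

Jekyll only processes files that begin with a front matter block, which is why the metadata has to be added before the knitted `.md` file is handed over.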

@karthik
Contributor

karthik commented Oct 31, 2013

@ethanwhite We don't do anything fancy. We just have .Rmd files that we convert ('knit') to .md. Those files are in the _posts folder which then get rendered to html by Jekyll.
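As a rough illustration of the knit step described above, the sketch below builds an `Rscript` command that calls `knitr::knit` on one file. The helper name `knit_command` and the example file name are hypothetical, and paths containing a single quote would break the generated R expression.

```python
from pathlib import Path


def knit_command(rmd_path):
    """Build an Rscript command that knits one .Rmd file to a sibling .md file."""
    rmd = Path(rmd_path)
    md = rmd.with_suffix(".md")
    # knitr::knit(input, output = ...) writes the knitted markdown to `output`.
    r_code = f"knitr::knit('{rmd.as_posix()}', output = '{md.as_posix()}')"
    return ["Rscript", "-e", r_code]


if __name__ == "__main__":
    # Hypothetical post file; running the command requires R and knitr:
    #   import subprocess; subprocess.run(cmd, check=True)
    cmd = knit_command("_posts/2013-10-31-intro.Rmd")
    print(cmd)
```

Building the command separately from running it keeps the sketch usable on machines without R installed.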

On further thought, I see what this thread is about. I did not submit this pull request with the intent of the content being turned into a web page. Not at all. I submitted this material so a future instructor hoping to teach an R bootcamp could take this material, customize it, perhaps change the examples to appeal to a specific domain, and drop them into a bootcamp repo.

So that's why I didn't bother submitting a .Rmd file. You'll note that the same folder has other content that is not readily exportable to html.

@jdblischak writes:

> Thanks for this. Unfortunately, I don't think this is a solution in the "our gh-pages branch will not have generated content" sense.

Sorry, this was never my intention. If that's what you'd like, please ignore this pull request and close out this issue.

@ahmadia
Contributor

ahmadia commented Oct 31, 2013

@karthikram - Please sit tight while we sort this out :) bc is hitting its first major growing pains this month, and we're realizing that we need to be clearer about how we want to bring new content in. @gvwilson has been on travel (I think you know this :), and hasn't had a chance to weigh in on a lot of the issues that have been brought up.

@ahmadia
Contributor

ahmadia commented Oct 31, 2013

@karthikram - Also, this is a meta-issue that came up before your current PR :)

@ethanwhite
Contributor

Apologies for confusing things. I was just suggesting that the current rOpenSci workflow for going between RMarkdown and the web could be the best approach for handling R content since it seems simple and straightforward and is working for them.

@karthik
Contributor

karthik commented Oct 31, 2013

@ahmadia @ethanwhite Got it. I'll hang on till there is more community consensus on the matter.

@jdblischak
Contributor Author

@karthikram We definitely appreciate your PR! I was debating the best way to organize the creation of an R boot camp, and your materials will serve as a great starting point (not to mention will save us a ton of time!).

But as you noticed with the pre-existing R material in bc, there has not been much centralized thought about the organization. This issue is about trying to figure out how best to include new R materials, given that the new boot camp workflow instituted this Fall emphasizes hosting the lessons on the boot camp website (which works really well for md files, not so much for the IPython notebooks). Right now we are leaning towards only including the RMarkdown files in the bc repo. Then when an instructor starts a boot camp repo, s/he can knit the file into md or html and push it to the repo so that the students can view it from the boot camp website.

And of course I am willing to help you convert your md files to Rmd, because as I mentioned before your lessons are already going to save those of us interested in hosting R boot camps a lot of time.

@jennybc

jennybc commented Nov 12, 2013

I have two comments on this:

[1] I am actually PRO keeping compiled content very visible and available. This probably reveals me to be more of a "user" and less a "developer" but hear me out. I am very keen on reproducibility but when the rubber meets the road at 2am or right after getting a new machine ... I don't always WANT to regenerate everything, even if, in theory I can. I am not necessarily saying we should bring Markdown and baked HTML into the main repo and keep it under version control. But maybe we should ?!? I think the totally elegant purist attitude of keeping only the raw ingredients on hand and giving people a makefile-type script to compile is ... not realistic. Depending on what's in the modules, there can be quite a lot of dependencies, path name, cache and figure issues and getting all of that sorted out can be a real gating factor. Sometimes you just want to glance at what someone else has done and cloning + re-generating everything has a real chilling effect. This is a genuinely tricky issue but I think we need to be really sympathetic to humans trying to make the best use of existing materials who teach SWC in their "spare time".

[2] I have an R package, private to me at the moment but which I aspire to make public soon, to automate the conversion of many R markdown files. The target audience/workflow is someone who doesn't want to go all the way to jekyll and tolerate all the Ruby dependencies. BUT I am very aware of jekyll and the need to support/play nice with that. If I got the blessing from SWC, I would be willing to explicitly provide a solution for us to programmatically bake R markdown into markdown by way of my package. For example, it already can parse _config.yml and figure out what needs to be converted from R markdown to markdown.
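The package itself is not shown in this thread, so the following is only a generic sketch of "figure out what needs to be converted": it lists `.Rmd` files whose sibling `.md` file is missing or older. The function name `stale_rmd_files` is hypothetical, and the comparison relies on file modification times, which git does not preserve across a fresh clone or checkout.

```python
from pathlib import Path


def stale_rmd_files(root):
    """Return .Rmd files under `root` whose sibling .md is missing or older.

    Assumption: each lesson.Rmd is knitted to lesson.md in the same directory.
    """
    stale = []
    for rmd in sorted(Path(root).rglob("*.Rmd")):
        md = rmd.with_suffix(".md")
        if not md.exists() or md.stat().st_mtime < rmd.stat().st_mtime:
            stale.append(rmd)
    return stale


if __name__ == "__main__":
    for path in stale_rmd_files("."):
        print(path)
```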

@ahmadia
Contributor

ahmadia commented Nov 12, 2013

Thanks @jennybc. I recognize the dangers of not making content easily available to users (and developers), so the best compromise I can provide is automated build bots in the background set up to compile our content into more digestible forms. We have most of the infrastructure for this in place. If you could easily point your browser at the HTML associated with a particular commit, would this be a reasonable compromise to not checking generated content into the GitHub repositories?

Regarding [2], I think that's very interesting. Would you be interested in opening up your project for review by a couple of the other R instructors?

@gvwilson
Contributor

[1] +1 from me on having compiled content in the repo, for all the
reasons Jenny says. (People in Australia are currently wrestling with
Jekyll versions to build things locally, which is roughly zero fun.)

[2] I don't know enough to have an opinion.

@wking
Contributor

wking commented Nov 12, 2013

On Tue, Nov 12, 2013 at 12:33:07PM -0800, Greg Wilson wrote:

> [1] +1 from me on having compiled content in the repo, for all the
> reasons Jenny says. (People in Australia are currently wrestling
> with Jekyll versions to build things locally, which is roughly zero
> fun.)

+1 from me for centrally-generated compiled content in a branch, but
-1 on that branch being bc/master. If you just want the compiled
version of the master source, checking out the 'compiled' branch (or
the master branch of a bc-dist repository, or whatever) doesn't seem
too complicated for 2am. The maintainers would just install a hook on
their local machine to trigger a re-build and re-push of the compiled
content when they merged something into the master branch. That way
the purists get what they want (a clean development branch) and the
pragmatists get what they want (a ready-to-use, pre-compiled branch)
at the same time. I'm -0 on yielding the 'master' namespace and
calling these branches 'develop' and 'master' respectively.
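A minimal sketch of the decision such a local hook might make, assuming it runs as a git `post-merge` hook (where `ORIG_HEAD` is the branch tip before the merge). The suffix list, the function names, and the branch name default are assumptions; the actual rebuild and re-push step is left as a placeholder because the thread does not specify it.

```python
import subprocess

# Assumed set of source formats that should trigger a rebuild.
SOURCE_SUFFIXES = (".Rmd", ".md", ".ipynb")


def needs_rebuild(branch, changed_paths, source_branch="master"):
    """True if the merge landed on the source branch and touched source files."""
    return branch == source_branch and any(
        path.endswith(SOURCE_SUFFIXES) for path in changed_paths
    )


def git_lines(*args):
    """Run a git command and return its stdout as a list of lines."""
    result = subprocess.run(
        ["git", *args], check=True, capture_output=True, text=True
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    branch = git_lines("rev-parse", "--abbrev-ref", "HEAD")[0]
    changed = git_lines("diff", "--name-only", "ORIG_HEAD", "HEAD")
    if needs_rebuild(branch, changed):
        # Placeholder: rebuild the site and push it to the compiled branch here.
        print("Source changed on", branch, "so the compiled branch needs a rebuild")
```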

@stevenkoenig

[1] +1 from me as well.
+1 for wking's suggestion of having two branches, one for raw material, one for pre-compiled stuff.

@jdblischak
Contributor Author

[1] -1

If an RMarkdown file fails to knit at 2am, that is very useful information. That means some code that is going to be used in the bootcamp no longer works for whatever reason. Better to find that out the night before instead of in front of a live audience.

@wking
Contributor

wking commented Nov 12, 2013

On Tue, Nov 12, 2013 at 02:29:09PM -0800, John Blischak wrote:

> If an RMarkdown file fails to knit at 2am, that is very useful
> information. That means some code that is going to be used in the
> bootcamp no longer works for whatever reason. Better to find that
> out the night before instead of in front of a live audience.

Even better to find that out when the maintainer merged the code into
master, than at 2am on the night before a boot camp ;). This “build
at merge time” approach in #95 turned up the issue fixed four days
after the merge in #141. Of course, instructors should be
encouraged to rebuild (to make sure their environment's up to snuff
and that they haven't made any incompatible changes), but I don't
think we need to require them to do so.

@jennybc

jennybc commented Nov 13, 2013

Let's contrast (and support!) two use cases: actual delivery of bootcamp vs. instructor thinking about what/how to teach. The hard line about "clean repo, auto build, any error is a showstopper" applies only to the former. But the latter is the priority if we want to build core curriculum materials that instructors actually reuse and refine.

I ran my entire course with R Markdown, knitr, and github this year. Yes it's imperfect but I'll share anyway.

https://github.com/jennybc/STAT545A

http://www.stat.ubc.ca/~jenny/STAT545A/current.html

35 students, 18 contact hours, 6 weeks. Only 1 instructor = me. The main reason for stress at 2am is not that the real R code or logic is somehow flawed. It's that an invisible-but-evaluated R chunk in file x sets a global option to centre figures, which then breaks side-by-side figure placement in file y. Or that the gtools package attached silently as a dependency of an explicitly attached package in file x masks reorder.default in base R and produces weird errors when knitting file y. All real examples. I hated troubleshooting this crap and I was just cleaning up my own loose ends! As soon as I encounter stuff like that with others' materials, I go back to rolling my own. I'm not proud of that but it's the truth.

I'd love to browse the baked website of someone else's bootcamp. If I see stuff I admire, then I would totally be motivated to grab the source, figure out any issues that arise, and reuse it in one I'm leading.

@ahmadia about the package for knitting Rmd files en masse. I am doing a "reboot" of it right now. A talented undergrad wrote the first pass this summer and it's what builds my current website including the course mentioned above. But I've got some clean up to do before I can share :) It's very close! I'd be happy to add people as collaborators on the repo soon.

@ahmadia
Contributor

ahmadia commented Dec 12, 2013

@jennybc @karthik @jdblischak - We need to have an unsplit policy on this going forward. Until we have a solution for directly rendering source files such as .Rmd, I will support committing generated files into the repository under the following conditions:

1] There is at least one person familiar with the source files and capable of regenerating the output files participating actively on the repository
2] All pull requests need to be posed against the source material. Direct updates to anything generated will not be accepted/merged.
3] We strip out all the merged generated output as soon as a suitable external replacement appears.

I need at least one person to put their hand up for [1], and if there are any disagreements or questions about [2], we should hash them out here and then close this issue.

@jdblischak
Contributor Author

I need more clarification of 2]. If someone wants to change the lesson, I think they should be required to change all versions of the files. We don't want changes to the generated material without a change to the source file (which I think your phrasing currently states), but I also don't think we want to change the source file without simultaneously updating the generated material (which I am not sure is allowed based on the current wording of 2]).

@ahmadia
Contributor

ahmadia commented Dec 12, 2013

Yes @jdblischak, thanks for the clarification. I meant that changes must modify the source, and keep the generated files "in sync" as part of the commit. In other words, all of the files must change simultaneously, and we "ignore" the generated materials in the repository except to make sure that they stay in sync.
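One way a reviewer could check this rule mechanically is sketched below, assuming each `lesson.Rmd` is rendered to a sibling `lesson.md` and/or `lesson.html`. The function name `out_of_sync` and the suffix list are hypothetical. The check only compares which paths changed together in a commit; it cannot tell whether the generated output is actually up to date.

```python
from pathlib import PurePosixPath

# Assumed output formats generated next to each .Rmd source.
GENERATED_SUFFIXES = (".md", ".html")


def out_of_sync(changed_paths, tracked_paths):
    """Report .Rmd sources that changed without their output, or vice versa.

    changed_paths: paths touched by a commit or pull request.
    tracked_paths: all paths in the repository, used to find .Rmd sources so
    that ordinary hand-written .md files are not mistaken for generated ones.
    """
    changed = {PurePosixPath(p) for p in changed_paths}
    sources = {PurePosixPath(p) for p in tracked_paths if p.endswith(".Rmd")}
    problems = []
    for src in sorted(sources):
        outputs = {src.with_suffix(suffix) for suffix in GENERATED_SUFFIXES}
        src_changed = src in changed
        out_changed = bool(outputs & changed)
        if src_changed and not out_changed:
            problems.append(f"{src}: source changed but generated output did not")
        elif out_changed and not src_changed:
            problems.append(f"{src}: generated output changed without the source")
    return problems


if __name__ == "__main__":
    tracked = ["README.md", "r/intro.Rmd", "r/intro.md"]
    print(out_of_sync(["r/intro.md"], tracked))
```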

I should also say for rationale that I am balancing the inconveniences mentioned by @jennybc and others against the relatively lightweight output files being generated and the pain of maintenance. Since the output weight to the repository is acceptable and you are agreeing to maintain, I'm going to put up my 👍 for allowing generated content in from .Rmd for now.

@jdblischak
Contributor Author

Sounds good.

@ahmadia
Contributor

ahmadia commented Dec 16, 2013

Okay, I'm going to close this issue. We can review it later when we have more build infrastructure up for the repository that could foreseeably generate and populate gh-pages branches for us.
