diff --git a/amdahl_foundation.md b/amdahl_foundation.md new file mode 100644 index 0000000..157375a --- /dev/null +++ b/amdahl_foundation.md @@ -0,0 +1,90 @@ +--- +title: "Running a Parallel Application on the Cluster" +teaching: 10 +exercises: 2 +--- + +:::::::::::::::::::::::::::::::::::::: questions + +- What output does the Amdahl code generate? +- Why does parallelizing the amdahl code make it faster? + +:::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::: objectives + +- Run the amdahl parallel code on the cluster +- Note what output is generated, and where it goes +- Predict the trend of execution time vs parallelism + +:::::::::::::::::::::::::::::::::::::::::::::::: + +## Introduction + +A high-performance computing cluster offers powerful +computational resources to its users, but taking advantage +of these resources is not always straightforward. The +cluster system does not work in the same way as systems +you may be more familiar with. + +The software we will use in this lesson is a model of +the kind of parallel task that is well-adapted to +high-performance computing resources. It's called "amdahl", +named for Eugene Amdahl, a famous computer scientist who +coined "Amdahl's Law", which is about the advantages and +limitations of parallelism in code execution. + + +:::::::::::::::::::::::::::::::::: callout + +[Amdahl's Law](https://en.wikipedia.org/wiki/Amdahl%27s_law) is +a statement about how much benefit you can expect to get by +parallelizing a computer program. + +The limitation arises from the fact that, in any application, +there is some fraction of the work to be done which is inherently +serial, and some fraction which is amenable to parallelization. +The law is a quantitative expression of the fact that, by +parallelizing the code, you can only ever make the parallel +part faster, you cannot reduce the execution time of the +serial part. 
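+
+In symbols, using notation that is an assumption here rather than part of
+the lesson text: if a fraction $p$ of the single-processor run time can be
+parallelized and the code runs on $N$ processors, the idealized speed-up is
+
+$$S(N) = \frac{1}{(1 - p) + \dfrac{p}{N}}$$
+
+As $N$ grows, $S(N)$ approaches $1/(1 - p)$. For an illustrative value of
+$p = 0.8$ (not a measured property of the amdahl code), $S(2) \approx 1.67$
+and $S(3) \approx 2.14$, so going from 2 to 3 processors gives a speed-up
+of only about $1.29$, rather than the $1.5$ that perfect scaling would give.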
+ +As a practical matter, this means that developer effort spent +on parallelization has diminishing returns on the overall +reduction in execution time. + +::::::::::::::::::::::::::::::::::::: + +## The Amdahl Code + +Download it and install it, via pip. + +## Running It on the Cluster + +Use the `sacct` command to see the run-time. The run-time +is also recorded in the output itself. + +::::::::::::::::::::::::::::::::::::: challenge + +Run the amdahl code with a few (small!) levels +of parallelism. Make a quantitative estimate of +how much faster the code will run with 3 processors +than 2. The naive estimate would be that it would run at +1.5x the speed, or equivalently, that it would +complete in 2/3 the time. + +::::::::::::::::::::::::::::::::::::: solution + +The amdahl code runs faster with 3 processors than with +2, but the speed-up is less than 1.5x. + +::::::::::::::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::: keypoints + +- The amdahl code is a model of a parallel application +- The execution speed depends on the degree of parallelism + +:::::::::::::::::::::::::::::::::::::::::::::::: diff --git a/amdahl_snakemake.md b/amdahl_snakemake.md new file mode 100644 index 0000000..c5206f8 --- /dev/null +++ b/amdahl_snakemake.md @@ -0,0 +1,61 @@ +--- +title: "Amdahl Parallel Runs" +teaching: 10 +exercises: 2 +--- + +:::::::::::::::::::::::::::::::::::::: questions + +- How can we collect data on Amdahl run times? + +:::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::: objectives + +- Collect systematic data on the runtime of the amdahl code + +::::::::::::::::::::::::::::::::::::: + +## Systematic Data Collection + +Using what we have learned so far, including Snakemake +profiles and rules, we will now compose a Snakefile +that runs the Amdahl example code over a range of +parallel widths. 
This workflow will generate the +data we will use in the next module to demonstrate +the diminishing returns of increasing parallelism. + +## Write a File + +Compose the Snakemake file that does what we want. + +We can put the widths in a list and iterate over +them. We will use the profile generated previously +to ensure that the jobs run on the cluster. + +## Run Snakemake + +Throw the switch! + +::::::::::::::::::::::::::::::::::::: challenge + +Our example has a single parameter, the parallelism, +that we vary. How would you generalize this to arbitrary +parameters? + +::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::: solution + +Arbitrary parameters are still finite, so you could +just generate a flat list of all the combinations, and iterate +over that. Or you could generate two lists and do a nested +loop. + +:::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::: keypoints + +- A relatively compact snakemake file collects interesting data. 
+ +:::::::::::::::::::::::::::::::::::::::::::::::: diff --git a/config.yaml b/config.yaml index f711a90..39ce08f 100644 --- a/config.yaml +++ b/config.yaml @@ -59,7 +59,12 @@ contact: 'team@carpentries.org' # FIXME # Order of episodes in your lesson episodes: -- introduction.Rmd +- amdahl_foundation.md +- snakemake_single.md +- snakemake_multiple.md +- snakemake_cluster.md +- snakemake_profiles.md +- amdahl_snakemake.md # Information for Learners learners: diff --git a/fig/introduction-rendered-pyramid-1.png b/fig/introduction-rendered-pyramid-1.png deleted file mode 100644 index 7361544..0000000 Binary files a/fig/introduction-rendered-pyramid-1.png and /dev/null differ diff --git a/introduction.md b/introduction.md deleted file mode 100644 index 9e66e6e..0000000 --- a/introduction.md +++ /dev/null @@ -1,119 +0,0 @@ ---- -title: "Using RMarkdown" -teaching: 10 -exercises: 2 ---- - -:::::::::::::::::::::::::::::::::::::: questions - -- How do you write a lesson using R Markdown and `{sandpaper}`? - -:::::::::::::::::::::::::::::::::::::::::::::::: - -::::::::::::::::::::::::::::::::::::: objectives - -- Explain how to use markdown with the new lesson template -- Demonstrate how to include pieces of code, figures, and nested challenge blocks - -:::::::::::::::::::::::::::::::::::::::::::::::: - -## Introduction - -This is a lesson created via The Carpentries Workbench. It is written in -[Pandoc-flavored Markdown](https://pandoc.org/MANUAL.txt) for static files and -[R Markdown][r-markdown] for dynamic files that can render code into output. -Please refer to the [Introduction to The Carpentries -Workbench](https://carpentries.github.io/sandpaper-docs/) for full documentation. - -What you need to know is that there are three sections required for a valid -Carpentries lesson template: - - 1. `questions` are displayed at the beginning of the episode to prime the - learner for the content. - 2. 
`objectives` are the learning objectives for an episode displayed with - the questions. - 3. `keypoints` are displayed at the end of the episode to reinforce the - objectives. - -:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: instructor - -Inline instructor notes can help inform instructors of timing challenges -associated with the lessons. They appear in the "Instructor View" - -:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: - -::::::::::::::::::::::::::::::::::::: challenge - -## Challenge 1: Can you do it? - -What is the output of this command? - -```r -paste("This", "new", "lesson", "looks", "good") -``` - -:::::::::::::::::::::::: solution - -## Output - -```output -[1] "This new lesson looks good" -``` - -::::::::::::::::::::::::::::::::: - - -## Challenge 2: how do you nest solutions within challenge blocks? - -:::::::::::::::::::::::: solution - -You can add a line with at least three colons and a `solution` tag. - -::::::::::::::::::::::::::::::::: -:::::::::::::::::::::::::::::::::::::::::::::::: - -## Figures - -You can also include figures generated from R Markdown: - - -```r -pie( - c(Sky = 78, "Sunny side of pyramid" = 17, "Shady side of pyramid" = 5), - init.angle = 315, - col = c("deepskyblue", "yellow", "yellow3"), - border = FALSE -) -``` - -
-[Figure: pie chart illusion of a pyramid. Caption: "Sun arise each and every morning"]
- -Or you can use standard markdown for static figures with the following syntax: - -`![optional caption that appears below the figure](figure url){alt='alt text for -accessibility purposes'}` - -![You belong in The Carpentries!](https://raw.githubusercontent.com/carpentries/logo/master/Badge_Carpentries.svg){alt='Blue Carpentries hex person logo with no text.'} - -## Math - -One of our episodes contains $\LaTeX$ equations when describing how to create -dynamic reports with {knitr}, so we now use mathjax to describe this: - -`$\alpha = \dfrac{1}{(1 - \beta)^2}$` becomes: $\alpha = \dfrac{1}{(1 - \beta)^2}$ - -Cool, right? - -::::::::::::::::::::::::::::::::::::: keypoints - -- Use `.md` files for episodes when you want static content -- Use `.Rmd` files for episodes when you need to generate output -- Run `sandpaper::check_lesson()` to identify any issues with your lesson -- Run `sandpaper::build_lesson()` to preview your lesson locally - -:::::::::::::::::::::::::::::::::::::::::::::::: - -[r-markdown]: https://rmarkdown.rstudio.com/ diff --git a/md5sum.txt b/md5sum.txt index 8f06b51..832b5a2 100644 --- a/md5sum.txt +++ b/md5sum.txt @@ -1,12 +1,17 @@ "file" "checksum" "built" "date" "CODE_OF_CONDUCT.md" "c93c83c630db2fe2462240bf72552548" "site/built/CODE_OF_CONDUCT.md" "2023-05-02" "LICENSE.md" "b24ebbb41b14ca25cf6b8216dda83e5f" "site/built/LICENSE.md" "2023-05-02" -"config.yaml" "3e7855c6ceaa6f7d37cadf15ef27e95d" "site/built/config.yaml" "2023-05-02" +"config.yaml" "a4b7ada62c5b5c170d7f9a8db3f91eb2" "site/built/config.yaml" "2023-06-01" "index.md" "a02c9c785ed98ddd84fe3d34ddb12fcd" "site/built/index.md" "2023-05-02" "links.md" "8184cf4149eafbf03ce8da8ff0778c14" "site/built/links.md" "2023-05-02" -"episodes/introduction.Rmd" "ff977557e9e880564e0636c2c3ff3fe4" "site/built/introduction.md" "2023-05-02" +"episodes/amdahl_foundation.md" "c77d9c450a51152939e07795efe01a76" "site/built/amdahl_foundation.md" "2023-06-01" +"episodes/snakemake_single.md" 
"8a0101812af2f8a1ee5396dcdbb07843" "site/built/snakemake_single.md" "2023-06-01" +"episodes/snakemake_multiple.md" "42909d76788532aa7c7581cf9fdfd4f1" "site/built/snakemake_multiple.md" "2023-06-01" +"episodes/snakemake_cluster.md" "3a99cd6440cd66d7f7e7f17045aa280b" "site/built/snakemake_cluster.md" "2023-06-01" +"episodes/snakemake_profiles.md" "a9a31ead95d1a408a01db09a2970ca2c" "site/built/snakemake_profiles.md" "2023-06-01" +"episodes/amdahl_snakemake.md" "5b47e3bc93d2f6472c25764902160f5a" "site/built/amdahl_snakemake.md" "2023-06-01" "instructors/instructor-notes.md" "cae72b6712578d74a49fea7513099f8c" "site/built/instructor-notes.md" "2023-05-02" "learners/reference.md" "1c7cc4e229304d9806a13f69ca1b8ba4" "site/built/reference.md" "2023-05-02" "learners/setup.md" "61568b36c8b96363218c9736f6aee03a" "site/built/setup.md" "2023-05-02" "profiles/learner-profiles.md" "60b93493cf1da06dfd63255d73854461" "site/built/learner-profiles.md" "2023-05-02" -"renv/profiles/lesson-requirements/renv.lock" "7e6ed5826061f0c954127b73f76fca46" "site/built/renv.lock" "2023-05-02" +"renv/profiles/lesson-requirements/renv.lock" "c3e9e558e1985837d230c3b923ab1c5a" "site/built/renv.lock" "2023-06-01" diff --git a/renv.lock b/renv.lock index 3b519c6..1baa91e 100644 --- a/renv.lock +++ b/renv.lock @@ -17,260 +17,6 @@ ] }, "Packages": { - "R6": { - "Package": "R6", - "Version": "2.5.1", - "Source": "Repository", - "Repository": "RSPM", - "Requirements": [ - "R" - ], - "Hash": "470851b6d5d0ac559e9d01bb352b4021" - }, - "base64enc": { - "Package": "base64enc", - "Version": "0.1-3", - "Source": "Repository", - "Repository": "RSPM", - "Requirements": [ - "R" - ], - "Hash": "543776ae6848fde2f48ff3816d0628bc" - }, - "bslib": { - "Package": "bslib", - "Version": "0.4.2", - "Source": "Repository", - "Repository": "RSPM", - "Requirements": [ - "R", - "base64enc", - "cachem", - "grDevices", - "htmltools", - "jquerylib", - "jsonlite", - "memoise", - "mime", - "rlang", - "sass" - ], - "Hash": 
"a7fbf03946ad741129dc81098722fca1" - }, - "cachem": { - "Package": "cachem", - "Version": "1.0.7", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "fastmap", - "rlang" - ], - "Hash": "cda74447c42f529de601fe4d4050daef" - }, - "cli": { - "Package": "cli", - "Version": "3.6.1", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "utils" - ], - "Hash": "89e6d8219950eac806ae0c489052048a" - }, - "digest": { - "Package": "digest", - "Version": "0.6.31", - "Source": "Repository", - "Repository": "RSPM", - "Requirements": [ - "R", - "utils" - ], - "Hash": "8b708f296afd9ae69f450f9640be8990" - }, - "ellipsis": { - "Package": "ellipsis", - "Version": "0.3.2", - "Source": "Repository", - "Repository": "RSPM", - "Requirements": [ - "R", - "rlang" - ], - "Hash": "bb0eec2fe32e88d9e2836c2f73ea2077" - }, - "evaluate": { - "Package": "evaluate", - "Version": "0.20", - "Source": "Repository", - "Repository": "RSPM", - "Requirements": [ - "R", - "methods" - ], - "Hash": "4b68aa51edd89a0e044a66e75ae3cc6c" - }, - "fastmap": { - "Package": "fastmap", - "Version": "1.1.1", - "Source": "Repository", - "Repository": "CRAN", - "Hash": "f7736a18de97dea803bde0a2daaafb27" - }, - "fontawesome": { - "Package": "fontawesome", - "Version": "0.5.0", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "htmltools", - "rlang" - ], - "Hash": "e80750aec5717dedc019ad7ee40e4a7c" - }, - "fs": { - "Package": "fs", - "Version": "1.6.1", - "Source": "Repository", - "Repository": "RSPM", - "Requirements": [ - "R", - "methods" - ], - "Hash": "f4dcd23b67e33d851d2079f703e8b985" - }, - "glue": { - "Package": "glue", - "Version": "1.6.2", - "Source": "Repository", - "Repository": "RSPM", - "Requirements": [ - "R", - "methods" - ], - "Hash": "4f2596dfb05dac67b9dc558e5c6fba2e" - }, - "highr": { - "Package": "highr", - "Version": "0.10", - "Source": "Repository", - "Repository": "RSPM", - "Requirements": [ - "R", - "xfun" - ], - "Hash": 
"06230136b2d2b9ba5805e1963fa6e890" - }, - "htmltools": { - "Package": "htmltools", - "Version": "0.5.5", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "base64enc", - "digest", - "ellipsis", - "fastmap", - "grDevices", - "rlang", - "utils" - ], - "Hash": "ba0240784ad50a62165058a27459304a" - }, - "jquerylib": { - "Package": "jquerylib", - "Version": "0.1.4", - "Source": "Repository", - "Repository": "RSPM", - "Requirements": [ - "htmltools" - ], - "Hash": "5aab57a3bd297eee1c1d862735972182" - }, - "jsonlite": { - "Package": "jsonlite", - "Version": "1.8.4", - "Source": "Repository", - "Repository": "RSPM", - "Requirements": [ - "methods" - ], - "Hash": "a4269a09a9b865579b2635c77e572374" - }, - "knitr": { - "Package": "knitr", - "Version": "1.42", - "Source": "Repository", - "Repository": "RSPM", - "Requirements": [ - "R", - "evaluate", - "highr", - "methods", - "tools", - "xfun", - "yaml" - ], - "Hash": "8329a9bcc82943c8069104d4be3ee22d" - }, - "lifecycle": { - "Package": "lifecycle", - "Version": "1.0.3", - "Source": "Repository", - "Repository": "RSPM", - "Requirements": [ - "R", - "cli", - "glue", - "rlang" - ], - "Hash": "001cecbeac1cff9301bdc3775ee46a86" - }, - "magrittr": { - "Package": "magrittr", - "Version": "2.0.3", - "Source": "Repository", - "Repository": "RSPM", - "Requirements": [ - "R" - ], - "Hash": "7ce2733a9826b3aeb1775d56fd305472" - }, - "memoise": { - "Package": "memoise", - "Version": "2.0.1", - "Source": "Repository", - "Repository": "RSPM", - "Requirements": [ - "cachem", - "rlang" - ], - "Hash": "e2817ccf4a065c5d9d7f2cfbe7c1d78c" - }, - "mime": { - "Package": "mime", - "Version": "0.12", - "Source": "Repository", - "Repository": "RSPM", - "Requirements": [ - "tools" - ], - "Hash": "18e9c28c1d3ca1560ce30658b22ce104" - }, - "rappdirs": { - "Package": "rappdirs", - "Version": "0.3.3", - "Source": "Repository", - "Repository": "RSPM", - "Requirements": [ - "R" - ], - "Hash": "5e3c5dc0b071b21fa128676560dbe94d" - }, 
"renv": { "Package": "renv", "Version": "0.17.3", @@ -280,127 +26,6 @@ "utils" ], "Hash": "4543b8cd233ae25c6aba8548be9e747e" - }, - "rlang": { - "Package": "rlang", - "Version": "1.1.0", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "utils" - ], - "Hash": "dc079ccd156cde8647360f473c1fa718" - }, - "rmarkdown": { - "Package": "rmarkdown", - "Version": "2.21", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "bslib", - "evaluate", - "fontawesome", - "htmltools", - "jquerylib", - "jsonlite", - "knitr", - "methods", - "stringr", - "tinytex", - "tools", - "utils", - "xfun", - "yaml" - ], - "Hash": "493df4ae51e2e984952ea4d5c75786a3" - }, - "sass": { - "Package": "sass", - "Version": "0.4.5", - "Source": "Repository", - "Repository": "RSPM", - "Requirements": [ - "R6", - "fs", - "htmltools", - "rappdirs", - "rlang" - ], - "Hash": "2bb4371a4c80115518261866eab6ab11" - }, - "stringi": { - "Package": "stringi", - "Version": "1.7.12", - "Source": "Repository", - "Repository": "RSPM", - "Requirements": [ - "R", - "stats", - "tools", - "utils" - ], - "Hash": "ca8bd84263c77310739d2cf64d84d7c9" - }, - "stringr": { - "Package": "stringr", - "Version": "1.5.0", - "Source": "Repository", - "Repository": "RSPM", - "Requirements": [ - "R", - "cli", - "glue", - "lifecycle", - "magrittr", - "rlang", - "stringi", - "vctrs" - ], - "Hash": "671a4d384ae9d32fc47a14e98bfa3dc8" - }, - "tinytex": { - "Package": "tinytex", - "Version": "0.44", - "Source": "Repository", - "Repository": "RSPM", - "Requirements": [ - "xfun" - ], - "Hash": "c0f007e2eeed7722ce13d42b84a22e07" - }, - "vctrs": { - "Package": "vctrs", - "Version": "0.6.1", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "R", - "cli", - "glue", - "lifecycle", - "rlang" - ], - "Hash": "06eceb3a5d716fd0654cc23ca3d71a99" - }, - "xfun": { - "Package": "xfun", - "Version": "0.38", - "Source": "Repository", - "Repository": "CRAN", - "Requirements": [ - "stats", 
- "tools" - ], - "Hash": "1ed71215d45e85562d3b1b29a068ccec" - }, - "yaml": { - "Package": "yaml", - "Version": "2.3.7", - "Source": "Repository", - "Repository": "CRAN", - "Hash": "0d0056cc5383fbc240ccd0cb584bf436" } } } diff --git a/snakemake_cluster.md b/snakemake_cluster.md new file mode 100644 index 0000000..4a49911 --- /dev/null +++ b/snakemake_cluster.md @@ -0,0 +1,64 @@ +--- +title: "Snakemake and the Cluster" +teaching: 10 +exercises: 2 +--- + +:::::::::::::::::::::::::::::::::::::: questions + +- How can we express a one-task cluster operation in Snakemake? + +:::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::: objectives + +- Write a Snakefile that executes a job on the cluster +- Use MPI options to ensure the job runs in parallel + + +::::::::::::::::::::::::::::::::::::: + +## Snakemake and the Cluster + +Snakemake has provisions for operating on an HPC cluster. + +Various command-line arguments can be provided to tell +Snakemake not to run things locally, but to run them +via the queuing system instead. + +In this lesson, we will repeat the first module, running +the amdahl code on the cluster, but will use snakemake +to make it happen. + +## Write a cluster Snakemake rule file + +Open your favorite editor, do the thing. +Specify resources. Provide command line arguments +to do the cluster operations by hand. + +## Run Snakemake + +Throw the switch! + +::::::::::::::::::::::::::::::::::::: challenge + +How can you control the degree of parallelism +of your cluster task? + +::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::: solution + +Use the "mpi" option in the resource block of +the Snakemake rule, and specify the number of tasks. +This will be mapped to the `-n` argument of the +equivalent `sbatch` command. + +:::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::: keypoints + +- Snakemake rule files can submit cluster jobs. +- There are a lot of options. 
+ +:::::::::::::::::::::::::::::::::::::::::::::::: diff --git a/snakemake_multiple.md b/snakemake_multiple.md new file mode 100644 index 0000000..b18e7b6 --- /dev/null +++ b/snakemake_multiple.md @@ -0,0 +1,78 @@ +--- +title: "More Complicated Snakefiles" +teaching: 10 +exercises: 2 +--- + +:::::::::::::::::::::::::::::::::::::: questions + +- What is a task graph? +- How does the Snakemake file express a task graph? + +:::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::: objectives + +- Write a multiple-rule Snakefile with dependent rules +- Translate between a task graph and rule set + + +::::::::::::::::::::::::::::::::::::: + +## Snakemake and Workflow + +A Snakefile can contain multiple rules. In the trivial +case, there will be no dependencies between the rules, and +they can all run concurrently. + +A more interesting case is when there are dependencies between +the rules, e.g. when one rule takes the output of another rule +as its input. In this case, the dependent rule (the one that needs +another rule's output) cannot run until the rule it depends on +has completed. + +It's possible to express this relationship by means of +a task graph, whose nodes are tasks, and whose arcs are +input-output relationships between the tasks. + +A Snakemake file is a textual description of a task +graph. + +## Write a multi-rule Snakemake rule file + +Open your favorite editor, do the thing. + +## Run Snakemake + +Throw the switch! + +::::::::::::::::::::::::::::::::::::: challenge + +Draw the task graph for your Snakefile. + +Given an example task graph, write a Snakefile that +implements it. + +::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::: solution + +The rules in the snakefile are nodes in the task +graph. Two rules are connected by an arc in the task +graph if the output of one rule is the input to the +other. 
The task graph is directed, so the arc points +from the rule that generates a file as output to the rule +that consumes the same file as input. + +A rule with an output that no other rule consumes is +a terminal rule. + +:::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::: keypoints + +- Snakemake rule files can be mapped to task graphs +- Tasks are executed as required in dependency order +- Where possible, tasks may run concurrently. + +:::::::::::::::::::::::::::::::::::::::::::::::: diff --git a/snakemake_profiles.md b/snakemake_profiles.md new file mode 100644 index 0000000..0cc352e --- /dev/null +++ b/snakemake_profiles.md @@ -0,0 +1,68 @@ +--- +title: "Snakemake Profiles" +teaching: 10 +exercises: 2 +--- + +:::::::::::::::::::::::::::::::::::::: questions + +- How can we encapsulate our desired snakemake configuration? +- How do we balance non-repetition and customizability? + +:::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::: objectives + +- Write a Snakemake profile for the cluster +- Run the amdahl code with varying degrees of parallelism +with the cluster profile. + + +::::::::::::::::::::::::::::::::::::: + +## Snakemake Profiles + +Snakemake has a provision for profiles, which allow users +to collect various common settings together in a special +file that snakemake examines when it runs. This lets users +avoid repetition and possible errors of omission for common +settings, and encapsulates some of the cluster complexity +we encountered in the previous module. + +Not all settings should be in the profile. Users can +choose which ones to make static and which ones to make +adjustable. In our case, we will want to have the freedom +to choose the degree of parallelism, but most of the +cluster arguments will not change, and so can be static +in the profile. + +## Write a Profile + +Do the thing. + +## Run Snakemake + +Throw the switch! 
+ +::::::::::::::::::::::::::::::::::::: challenge + +Write a profile that allows you to choose a +different partition, in addition to the level of +parallelism. + +::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::: solution + +The profile files can have variables taken from +the rule file, and in particular can refer to +resources from a rule. + +:::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::: keypoints + +- Snakemake profiles encapsulate cluster complexity. +- Retaining operational flexibility is also important. + +:::::::::::::::::::::::::::::::::::::::::::::::: diff --git a/snakemake_single.md b/snakemake_single.md new file mode 100644 index 0000000..0742d14 --- /dev/null +++ b/snakemake_single.md @@ -0,0 +1,69 @@ +--- +title: "Introduction to Snakemake" +teaching: 10 +exercises: 2 +--- + +:::::::::::::::::::::::::::::::::::::: questions + +- What are Snakemake rules? +- Why do Snakemake rules not always run? + +:::::::::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::: objectives + +- Write a single-rule Snakefile and execute it with Snakemake +- Predict whether the rule will run or not + +::::::::::::::::::::::::::::::::::::: + +## Snakemake + +Snakemake is a workflow tool. It takes as input +a description of the work that you would like the computer +to do, and when run, does the work that you have +asked for. + +The description of the work takes the form of a +series of rules, written in a special format into a +Snakefile. Rules have outputs, and the Snakefile +and generated output files make up the system state. + +## Write a Snakemake rule file + +Open your favorite editor, do the thing. + +## Run Snakemake + +Throw the switch! + +::::::::::::::::::::::::::::::::::::: challenge + +Remove the output file, and run Snakemake. Then +run it again. Edit the output file, and run it +a third time. For which of these invocations +does Snakemake do nontrivial work? 
+ +::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::: solution + +The rule does not get executed the second time. The +Snakemake infrastructure is stateful, and knows that +the required outputs are up to date. + +The rule also does not get executed the third time. +The output is not the output from the rule, but the +Snakemake infrastructure doesn't know that; it only +checks the file time-stamp. Editing Snakemake-manipulated +files can get you into an inconsistent state. + +:::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::: keypoints + +- Snakemake is an indirect way of running executables +- Snakemake has a notion of system state, and can be fooled. + +::::::::::::::::::::::::::::::::::::::::::::::::