markdown source builds
Auto-generated via {sandpaper}
Source  : 85bc4d4
Branch  : main
Author  : Trevor Keller <[email protected]>
Time    : 2023-06-01 21:14:18 +0000
Message : Merge pull request #4 from carpentries-incubator/episode-skeleton

Episode skeleton
actions-user committed Jun 1, 2023
1 parent 7d62abc commit e14df8e
Showing 11 changed files with 444 additions and 498 deletions.
90 changes: 90 additions & 0 deletions amdahl_foundation.md
@@ -0,0 +1,90 @@
---
title: "Running a Parallel Application on the Cluster"
teaching: 10
exercises: 2
---

:::::::::::::::::::::::::::::::::::::: questions

- What output does the Amdahl code generate?
- Why does parallelizing the amdahl code make it faster?

::::::::::::::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::: objectives

- Run the amdahl parallel code on the cluster
- Note what output is generated, and where it goes
- Predict the trend of execution time vs parallelism

::::::::::::::::::::::::::::::::::::::::::::::::

## Introduction

A high-performance computing cluster offers powerful
computational resources to its users, but taking advantage
of these resources is not always straightforward. The
cluster system does not work in the same way as systems
you may be more familiar with.

The software we will use in this lesson is a model of
the kind of parallel task that is well-adapted to
high-performance computing resources. It's called "amdahl",
named for Gene Amdahl, the computer architect who formulated
"Amdahl's Law", which describes the advantages and
limitations of parallelism in code execution.


:::::::::::::::::::::::::::::::::: callout

[Amdahl's Law](https://en.wikipedia.org/wiki/Amdahl%27s_law) is
a statement about how much benefit you can expect to get by
parallelizing a computer program.

The limitation arises from the fact that, in any application,
there is some fraction of the work to be done which is inherently
serial, and some fraction which is amenable to parallelization.
The law is a quantitative expression of the fact that
parallelizing the code can only ever make the parallel
part faster; it cannot reduce the execution time of the
serial part.

As a practical matter, this means that developer effort spent
on parallelization has diminishing returns on the overall
reduction in execution time.

:::::::::::::::::::::::::::::::::::::
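The statement in the callout can be made concrete with a short sketch. If a fraction `p` of the work can be parallelized and the rest is serial, the ideal speed-up on `n` processors is `1 / ((1 - p) + p / n)`. The function name and the value `p = 0.8` below are illustrative assumptions, not settings taken from the amdahl code.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Ideal speed-up predicted by Amdahl's Law.

    parallel_fraction: share of the work that can run in parallel (0..1).
    processors: number of processors sharing the parallel part.
    """
    serial_fraction = 1.0 - parallel_fraction
    # The serial part is unchanged; only the parallel part is divided up.
    return 1.0 / (serial_fraction + parallel_fraction / processors)


if __name__ == "__main__":
    # Assumed parallel fraction, chosen only for illustration.
    p = 0.8
    for n in (1, 2, 4, 8, 16):
        print(f"{n:2d} processors: {amdahl_speedup(p, n):.2f}x")
```

However many processors are added, the speed-up in this model stays below `1 / (1 - p)`, which is where the diminishing returns come from.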

## The Amdahl Code

Download and install the amdahl code with pip.

## Running It on the Cluster

Use the `sacct` command to see the run-time. The run-time
is also recorded in the output itself.
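When comparing runs, it helps to have the run-time as a number. Slurm's `sacct` reports its `Elapsed` field as a string of the form `[DD-[HH:]]MM:SS`, which you can request with, for example, `sacct --format=JobID,JobName,Elapsed`. The helper below is a minimal sketch that converts such a string to seconds; the function name is hypothetical and not part of any lesson code.

```python
def elapsed_to_seconds(elapsed: str) -> int:
    """Convert a Slurm-style elapsed time, [DD-[HH:]]MM:SS, to seconds."""
    days = 0
    if "-" in elapsed:
        # A leading "DD-" gives whole days.
        day_part, elapsed = elapsed.split("-", 1)
        days = int(day_part)
    parts = [int(field) for field in elapsed.split(":")]
    # Pad on the left so we always have [hours, minutes, seconds].
    while len(parts) < 3:
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


if __name__ == "__main__":
    print(elapsed_to_seconds("00:00:12"))
```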

::::::::::::::::::::::::::::::::::::: challenge

Run the amdahl code with a few (small!) levels
of parallelism. Make a quantitative estimate of
how much faster the code will run with 3 processors
than with 2. The naive estimate is that it would run
at 1.5x the speed or, equivalently, that it would
complete in 2/3 of the time.

::::::::::::::::::::::::::::::::::::: solution

The amdahl code runs faster with 3 processors than with
2, but the speed-up is less than 1.5x.

:::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::::
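To put a number on the challenge, the sketch below compares the modelled run-time on 2 and 3 processors. The parallel fraction of 0.8 is an assumed value for illustration; the fraction your runs actually use may differ, so treat the result as an estimate to check against your measurements.

```python
def run_time(parallel_fraction: float, processors: int, total_work: float = 1.0) -> float:
    """Modelled run-time: the serial part plus the parallel part split evenly."""
    serial = (1.0 - parallel_fraction) * total_work
    parallel = parallel_fraction * total_work / processors
    return serial + parallel


def relative_speedup(parallel_fraction: float, n_from: int, n_to: int) -> float:
    """How many times faster the run on n_to processors is than on n_from."""
    return run_time(parallel_fraction, n_from) / run_time(parallel_fraction, n_to)


if __name__ == "__main__":
    # Fully parallel work recovers the naive estimate of 1.5x.
    print(f"p = 1.0: {relative_speedup(1.0, 2, 3):.3f}x")
    # With an assumed serial part of 20%, the gain is smaller.
    print(f"p = 0.8: {relative_speedup(0.8, 2, 3):.3f}x")
```

Only when the serial fraction is zero does the model reproduce the naive 1.5x; any serial work pulls the ratio below that.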

::::::::::::::::::::::::::::::::::::: keypoints

- The amdahl code is a model of a parallel application
- The execution speed depends on the degree of parallelism

::::::::::::::::::::::::::::::::::::::::::::::::
61 changes: 61 additions & 0 deletions amdahl_snakemake.md
@@ -0,0 +1,61 @@
---
title: "Amdahl Parallel Runs"
teaching: 10
exercises: 2
---

:::::::::::::::::::::::::::::::::::::: questions

- How can we collect data on Amdahl run times?

::::::::::::::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::: objectives

- Collect systematic data on the runtime of the amdahl code

:::::::::::::::::::::::::::::::::::::

## Systematic Data Collection

Using what we have learned so far, including Snakemake
profiles and rules, we will now compose a Snakefile
that runs the Amdahl example code over a range of
parallel widths. This workflow will generate the
data we will use in the next module to demonstrate
the diminishing returns of increasing parallelism.

## Write a File

Compose a Snakefile that runs the amdahl code once for each
parallel width and collects the run times.

We can put the widths in a list and iterate over
them. We will use the profile generated previously
to ensure that the jobs run on the cluster.
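Since a Snakefile is built on Python, the list of widths can be an ordinary Python list. The sketch below shows only that part: it turns a list of widths into the output files the workflow would need to produce. The widths, the `runs/` directory, and the file naming are hypothetical choices, not ones prescribed by this lesson.

```python
# Hypothetical parallel widths to sweep over.
widths = [1, 2, 4, 8]


def target_for(width: int) -> str:
    """Name of the (hypothetical) output file for one parallel width."""
    return f"runs/amdahl_np{width}.out"


# One target per width. In a Snakefile, Snakemake's expand() helper
# builds the same kind of list from a filename pattern.
targets = [target_for(n) for n in widths]

if __name__ == "__main__":
    for path in targets:
        print(path)
```

Listing these targets as the input of a collecting rule is one way to make Snakemake schedule a job for each width.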

## Run Snakemake

Throw the switch!

::::::::::::::::::::::::::::::::::::: challenge

Our example has a single parameter, the parallelism,
that we vary. How would you generalize this to arbitrary
parameters?

::::::::::::::::::::::::::::::::::::: solution

Any finite set of parameter values can be enumerated, so you
could generate a flat list of all the combinations and iterate
over that. Alternatively, you could keep one list per parameter
and iterate over them with nested loops.

::::::::::::::::::::::::::::::::::::::

:::::::::::::::::::::::::::::::::::::
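The flat-list approach can be sketched with `itertools.product` from the Python standard library, which yields every combination of values from several lists. The parameter names and values here are made up for illustration.

```python
from itertools import product


def parameter_grid(parameters: dict) -> list:
    """Return one dict per combination of the given parameter values."""
    names = list(parameters)
    value_lists = [parameters[name] for name in names]
    # product() is equivalent to nested loops over each list in turn.
    return [dict(zip(names, values)) for values in product(*value_lists)]


if __name__ == "__main__":
    # Hypothetical parameters to sweep over.
    grid = parameter_grid({"width": [1, 2, 4], "parallel_fraction": [0.5, 0.8]})
    for combination in grid:
        print(combination)
```

Adding another parameter only means adding another entry to the dictionary, whereas nested loops need a new loop level each time.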

::::::::::::::::::::::::::::::::::::: keypoints

- A relatively compact Snakefile can collect run-time data across a range of parallel widths.

::::::::::::::::::::::::::::::::::::::::::::::::
7 changes: 6 additions & 1 deletion config.yaml
@@ -59,7 +59,12 @@ contact: '[email protected]' # FIXME

# Order of episodes in your lesson
episodes:
-- introduction.Rmd
+- amdahl_foundation.md
+- snakemake_single.md
+- snakemake_multiple.md
+- snakemake_cluster.md
+- snakemake_profiles.md
+- amdahl_snakemake.md

# Information for Learners
learners:
Binary file removed fig/introduction-rendered-pyramid-1.png
Binary file not shown.
119 changes: 0 additions & 119 deletions introduction.md

This file was deleted.

11 changes: 8 additions & 3 deletions md5sum.txt
@@ -1,12 +1,17 @@
 "file" "checksum" "built" "date"
 "CODE_OF_CONDUCT.md" "c93c83c630db2fe2462240bf72552548" "site/built/CODE_OF_CONDUCT.md" "2023-05-02"
 "LICENSE.md" "b24ebbb41b14ca25cf6b8216dda83e5f" "site/built/LICENSE.md" "2023-05-02"
-"config.yaml" "3e7855c6ceaa6f7d37cadf15ef27e95d" "site/built/config.yaml" "2023-05-02"
+"config.yaml" "a4b7ada62c5b5c170d7f9a8db3f91eb2" "site/built/config.yaml" "2023-06-01"
 "index.md" "a02c9c785ed98ddd84fe3d34ddb12fcd" "site/built/index.md" "2023-05-02"
 "links.md" "8184cf4149eafbf03ce8da8ff0778c14" "site/built/links.md" "2023-05-02"
-"episodes/introduction.Rmd" "ff977557e9e880564e0636c2c3ff3fe4" "site/built/introduction.md" "2023-05-02"
+"episodes/amdahl_foundation.md" "c77d9c450a51152939e07795efe01a76" "site/built/amdahl_foundation.md" "2023-06-01"
+"episodes/snakemake_single.md" "8a0101812af2f8a1ee5396dcdbb07843" "site/built/snakemake_single.md" "2023-06-01"
+"episodes/snakemake_multiple.md" "42909d76788532aa7c7581cf9fdfd4f1" "site/built/snakemake_multiple.md" "2023-06-01"
+"episodes/snakemake_cluster.md" "3a99cd6440cd66d7f7e7f17045aa280b" "site/built/snakemake_cluster.md" "2023-06-01"
+"episodes/snakemake_profiles.md" "a9a31ead95d1a408a01db09a2970ca2c" "site/built/snakemake_profiles.md" "2023-06-01"
+"episodes/amdahl_snakemake.md" "5b47e3bc93d2f6472c25764902160f5a" "site/built/amdahl_snakemake.md" "2023-06-01"
 "instructors/instructor-notes.md" "cae72b6712578d74a49fea7513099f8c" "site/built/instructor-notes.md" "2023-05-02"
 "learners/reference.md" "1c7cc4e229304d9806a13f69ca1b8ba4" "site/built/reference.md" "2023-05-02"
 "learners/setup.md" "61568b36c8b96363218c9736f6aee03a" "site/built/setup.md" "2023-05-02"
 "profiles/learner-profiles.md" "60b93493cf1da06dfd63255d73854461" "site/built/learner-profiles.md" "2023-05-02"
-"renv/profiles/lesson-requirements/renv.lock" "7e6ed5826061f0c954127b73f76fca46" "site/built/renv.lock" "2023-05-02"
+"renv/profiles/lesson-requirements/renv.lock" "c3e9e558e1985837d230c3b923ab1c5a" "site/built/renv.lock" "2023-06-01"