Commit

update for loops; fix #23
tavareshugo committed Jul 18, 2024
1 parent 81c615c commit ce22145
Showing 1 changed file with 88 additions and 19 deletions.
107 changes: 88 additions & 19 deletions materials/02-programming/03-loops.md
@@ -17,11 +17,11 @@ title: "Loops"
As such they are key to productivity improvements through automation.
Similar to wildcards and tab completion, using loops also reduces the amount of typing required (and hence reduces the number of typing mistakes).

Going back to our `molecules` directory, suppose we wanted to use our `count_atoms.sh` script to get the number of atoms in each of our molecules' PDB files.
We know how to run the script for a single file:
Going back to our `molecules` directory, suppose we wanted to count the number of atoms in each of our molecules' PDB files.
As a reminder, here is the command to do this for one of our files:

```bash
bash count_atoms.sh cubane.pdb
cat cubane.pdb | grep "ATOM" | wc -l
```

Of course, we could manually then repeat this for each of our molecule files: `cubane.pdb`, `ethane.pdb`, `methane.pdb`, `octane.pdb`, `pentane.pdb`, `propane.pdb`.
@@ -31,31 +31,29 @@ We’ll use a loop to solve this problem, but first let’s look at the general
```bash
for thing in list_of_things
do
  operation_using $thing # Indentation within the loop is not required, but aids legibility
  # Indentation within the loop is not required, but aids legibility
  operation_using ${thing}
done
```

Let's create a new script called `count_loop.sh` (using `nano` or `gedit`), where we apply this idea to our example:
Taking our command above to count atoms, let's create a new script called `count_loop.sh`, where we apply this idea:

```bash
#!/bin/bash

for filename in cubane.pdb ethane.pdb methane.pdb
do
  # count the number of lines containing the word "ATOM"
  natoms=$(cat ${filename} | grep "ATOM" | wc -l)

  # print a message to the user
  echo "The number of atoms in ${filename} is: ${natoms}"
  cat ${filename} | grep "ATOM" | wc -l
done
```

If we run this script (`bash count_loop.sh`), we get the expected output:
If we ran this script (`bash count_loop.sh`), we would get the following output:

```
The number of atoms in cubane.pdb is: 16
The number of atoms in ethane.pdb is: 8
The number of atoms in methane.pdb is: 5
16
8
5
```

When the shell sees the keyword `for`, it knows to repeat a command (or group of commands) once for each item in a list.
@@ -65,12 +63,42 @@ Inside the loop, we call for the variable's value `$filename`.

In our example, at each iteration of the _for loop_, the variable `$filename` stored a different value, cycling through `cubane.pdb`, `ethane.pdb` and finally `methane.pdb`.

At the moment, our script's output does not tell us which file each count belongs to.
We can use some of the programming techniques we've already learned to make the output more informative.
Here is an example of a modified script:

```bash
#!/bin/bash

for filename in cubane.pdb ethane.pdb methane.pdb
do
  # count the number of lines containing the word "ATOM"
  # store the result inside a variable 'natoms'
  natoms=$(cat ${filename} | grep "ATOM" | wc -l)

  # print a message to the user
  echo "The number of atoms in ${filename} is: ${natoms}"
done
```

If we run this script (`bash count_loop.sh`), we get a more informative output than before:

```
The number of atoms in cubane.pdb is: 16
The number of atoms in ethane.pdb is: 8
The number of atoms in methane.pdb is: 5
```

::: {.callout-note}
- Do not use spaces, quotes, or wildcard characters such as '*' or '?' in filenames, as these complicate variable expansion.
- Give files consistent names that are easy to match with wildcard patterns to make it easy to select them for looping.
:::
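
If a filename containing a space does turn up, wrapping the variable in double quotes keeps it together as a single argument. The following is a minimal sketch, assuming a made-up file called `my molecule.pdb` created in a temporary directory (neither is part of the course data). It uses `grep -c`, which counts matching lines, as a shorthand for `grep ... | wc -l`.

```bash
#!/bin/bash

# hypothetical setup: a scratch directory with one awkwardly named file
workdir=$(mktemp -d)
printf 'ATOM 1\nATOM 2\nTER\n' > "${workdir}/my molecule.pdb"

for filename in "${workdir}"/*.pdb
do
  # the double quotes stop the shell splitting the name at the space
  natoms=$(grep -c "ATOM" "${filename}")

  # basename strips the directory part, leaving only the file name
  message="The number of atoms in $(basename "${filename}") is: ${natoms}"
  echo "${message}"
done

# remove the scratch directory
rm -r "${workdir}"
```

Without the quotes around `${filename}`, the shell would split the name at the space and `grep` would look for two files that do not exist.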

:::{.callout-exercise}

See the [loop multiple files](#wildcard-loops-exr) and [searching for text](#grep-loop-exr) exercises to test your knowledge.
:::


## Scripts within scripts

@@ -117,11 +145,15 @@ bash count_atoms.sh methane.pdb
So, it wouldn't actually run the command within the loop, but rather tell us what would have been run.
This is a good practice when building scripts that include a _for loop_, because it lets us check that our code is all correct.
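
One thing to watch for when the command inside the loop contains pipes: the whole command needs to be quoted, otherwise only the first part is echoed and its output is sent down the pipe. Here is a minimal sketch using the atom-counting command from earlier; the variable name `cmd` is our own choice for illustration.

```bash
#!/bin/bash

for filename in cubane.pdb ethane.pdb methane.pdb
do
  # build the command as text; the escaped quotes (\") are kept literally
  cmd="cat ${filename} | grep \"ATOM\" | wc -l"

  # print the command instead of running it
  echo "${cmd}"
done
```

This prints the three commands that would have been run, without reading any of the files:

```
cat cubane.pdb | grep "ATOM" | wc -l
cat ethane.pdb | grep "ATOM" | wc -l
cat methane.pdb | grep "ATOM" | wc -l
```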

:::{.callout-exercise}
Try the [dry run exercise](#dry-run-exr) to test your knowledge.
:::


## Exercises

:::{.callout-exercise}
#### Multiple files
:::{.callout-exercise #wildcard-loops-exr}
#### Looping multiple files
{{< level 1 >}}

Can you think of a way to improve our `count_loop.sh` script, so that every file gets processed, but without having to type all the individual files' names?
@@ -142,7 +174,7 @@ done
:::
:::

:::{.callout-exercise}
:::{.callout-exercise #grep-loop-exr}
#### Searching for text
{{< level 2 >}}

@@ -158,7 +190,9 @@ Write a _for loop_ to search for several variants:

- Use `nano` to create a new script called `count_variants.sh`.
- Adapt the commands shown above to write a _for loop_ to search for the variants "Alpha", "Delta" and "Omicron".
- Bonus: print a message indicating which of the variants is being searched for.
- Print a message indicating which of the variants is being searched for.

**Bonus (optional):** modify the script to output the results to a CSV file called `variant_counts.csv` with the name of the variant as the first column and the count as the second column.

::: {.callout-answer collapse=true}
We can write the following script:
@@ -189,13 +223,48 @@ The number of Delta samples is: 75
The number of Omicron samples is: 93
```

The bonus task asked us to modify the code to output the results to a file.
We can use the redirection operators (`>` / `>>`) to achieve this:

```bash
#!/bin/bash

# outside of the loop we initiate a new file with column names
echo "variant,count" > variant_counts.csv

for variant in Alpha Delta Omicron
do
  # count the variant occurrence across all files - save the result in a variable called "n"
  n=$(cat *_variants.csv | grep "${variant}" | wc -l)

  # we append the variant name and its count to our file, each separated by a comma
  echo "${variant},${n}" >> variant_counts.csv
done
```

If we run this modified script (`bash count_variants.sh`), nothing is printed to the terminal.
However, a file is created in our directory, which contains the results of our analysis:

```bash
cat variant_counts.csv
```

```
variant,count
Alpha,38
Delta,75
Omicron,93
```

Because this is a CSV file, we could easily import it into a data analysis package (e.g. R or Python) to produce some visualisations.
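
As a possible variation on the script above, the header and the loop can be grouped with curly braces and redirected once, which avoids having to mix `>` and `>>`. This is only a sketch: the two input files and their contents are made up so that the example runs on its own, so the counts differ from the real data, and `grep -c` stands in for `grep ... | wc -l`.

```bash
#!/bin/bash

# hypothetical stand-in data, created in a scratch directory
workdir=$(mktemp -d)
printf 'sample,variant\ns1,Alpha\ns2,Delta\n' > "${workdir}/uk_variants.csv"
printf 'sample,variant\ns3,Delta\ns4,Omicron\n' > "${workdir}/za_variants.csv"

# the braces group the header and the loop, so a single ">" covers both
{
  echo "variant,count"
  for variant in Alpha Delta Omicron
  do
    # count the lines mentioning this variant across all input files
    n=$(cat "${workdir}"/*_variants.csv | grep -c "${variant}")
    echo "${variant},${n}"
  done
} > "${workdir}/variant_counts.csv"

# show the result, then remove the scratch directory
result=$(cat "${workdir}/variant_counts.csv")
echo "${result}"
rm -r "${workdir}"
```

With the made-up data this prints:

```
variant,count
Alpha,1
Delta,2
Omicron,1
```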

:::
:::


:::{.callout-exercise}
:::{.callout-exercise #dry-run-exr}
#### Dry run
{{< level 3 >}}
{{< level 2 >}}

Suppose we want to set up a directory structure to organize some experiments measuring reaction rate constants with different compounds and different temperatures.
Modify the following code to run as a _dry-run_ (i.e. not actually execute the command inside the loop) and try to understand what would happen: