Commit

Rewrite to create an HTML file
fiveop committed Apr 19, 2023
1 parent b487dc8 commit bc3ca83
Showing 1 changed file with 40 additions and 38 deletions.
78 changes: 40 additions & 38 deletions episodes/08-process-automation.md
@@ -20,25 +20,25 @@ exercises: 0
We have set up our project, collaborated with our colleagues;
we fill our research diary every day. Now it is time to make use of it.

We think about using part of our daily commute by train to review what happened in the past.
We always have our e-reader with us, so ideally we would like to have an version of our diary in EPUB format.
We want to share it with the world, because what we do is so important and everyone should know about it.
So we want to create a website based on the diary and update it whenever our Markdown files change.

Just as much based in reality as the rest of this lesson’s story, we know how to do that, but we do not want to do it by hand.
Instead, we are going to let GitLab automatically build us an EPUB file, whenever our repository changes.
To achieve it, we will us the feature that GitLab files under “CI/CD”, which stands for Continuous Integration/Continuous Deployment.
Instead, we are going to let GitLab automatically turn our Markdown files into HTML, whenever our repository changes.
To achieve this, we will use two GitLab features: CI/CD pipelines and Pages.

::: callout

### Continuous Integration/Continuous Deployment
### CI/CD or Continuous Integration/Continuous Deployment

The terms continuous integration (CI) and continuous deployment (CD) come from the fields of software engineering and software operation.

A software project uses CI if it has set up a process that automatically runs some (or all) of its tests whenever changes are committed or even just proposed (for example through a merge request) to the main branch of a repository.
In particular when already done for proposed changes this can help prevent bugs to reach the code base, while freeing the developer of thinking of and running all the tests themselves.
In particular, when tests already run for proposed changes, this can help prevent bugs from reaching the code base, while freeing the developer from having to remember to run all the tests before creating a merge request.

CD is a process that updates the software on the production machines (deploys) as soon as a new change reaches the code base, generally after having passed the CI process.

Both processes can be implemented by running scripts, called jobs, triggered either by changes to the code base or by the successful completion of other jobs, which lead to the CI/CD feature of GitLab.
Both processes can be implemented by running scripts, called jobs on GitLab, triggered either by changes to the code base or by the successful completion of other jobs. This is what the CI/CD feature of GitLab provides.

:::

@@ -48,12 +48,12 @@ To configure our automatic process, we navigate to our project page and click th

The page that this leads to invites us to “use a sample `.gitlab-ci.yml` template file to explore how CI/CD works.”
The GitLab CI is configured by providing a file called `.gitlab-ci.yml` in the root directory of a project’s repository.
Even though we want to learn how CI works in GitLab, we do not follow the invitation, because the example is quite elaborate and targets software developers specifically.
Even though we want to learn how to configure CI in GitLab, we do not follow the invitation, because the example is quite elaborate and targets software developers specifically.
Instead, we will use the provided template for Bash.

We click on the button labeled “Use template” of the entry for Bash in the list at the bottom of the page.
In the list at the bottom of the page, we click on the button labeled “Use template” in the entry for Bash.

This leads us to an editor for the `.gitlab-ci.yml` file that is prefilled from the selected template.
This leads us to an editor for the `.gitlab-ci.yml` file that is prepopulated with the selected template.
The file is expected to be written in the [YAML](https://yaml.org/) file format, hence the file extension `.yml`.
We will go through the example line by line. Afterwards we will adapt it to our needs.

@@ -65,7 +65,7 @@ image: busybox:latest
```
This line states that the scripts that are provided later on should be executed in Docker containers built from the stated Docker image.
Busybox is a very small Linux distribution, reduced to the bare minimum, and it provides a shell.
The Busybox image contains a very small Linux environment, reduced to the bare minimum, that provides a shell (`sh`, not bash).
::: callout
@@ -75,22 +75,22 @@ TODO
:::
Now follow to blocks starting with `before_script:` and `after_script:`.
It is followed by two blocks starting with `before_script:` and `after_script:`.
We will ignore those.

The remaining four blocks, `build1:`, `test1:`, `test2:`, and `deploy1:`, each define a so called job.
A job defines a script and represents one non-divisible unit of a CI/CD process.
Together the jobs defined in a project’s `.gitlab-ci.yml` file form a so called pipeline.
The remaining four blocks, `build1:`, `test1:`, `test2:`, and `deploy1:`, each define a so-called **job**.
A job represents one non-divisible unit of a CI/CD process.
Together the jobs defined in a project’s `.gitlab-ci.yml` file form a so-called **pipeline**.

By default a pipeline has three stages, called `build`, `test`, and `deploy`, and their order is important.
By default a pipeline has three **stages**, called `build`, `test`, and `deploy`, and their order is important.
The terminology comes from software development again.
A pipeline is executed by running all jobs of the first stage and if they succeed to continue onto the next stage, continuing until a job fails or all jobs of the last stage succeeded.
A pipeline is executed by running all jobs of the first stage and, if they succeed, continuing with the jobs of the next stage, repeating this until a job fails or all stages complete.

The four jobs in the example are named similarly to the stages they are assigned to.
The assignment is done through the keyword `stage:` that can always be found in the second line of a job’s definition.
The assignment is done through the keyword `stage:` that can in this example be found in the second line of each job’s definition.

The lines following the keyword `script:` in each section, define the script that will be executed as part of the respective job.
In this example, they are all `echo` statements.
In this example, they are all `echo` statements, which print the text that follows them.
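
To make the relationship between stages, jobs, and scripts concrete, here is a minimal sketch of a pipeline with this structure. It is not the content of GitLab's Bash template: the job names are the ones mentioned above, but the `echo` messages are placeholders invented for illustration, and the `stages:` block only spells out the three stages named above.

```yaml
# Illustrative sketch only; not the contents of GitLab's Bash template.
image: busybox:latest

# The three stages named above, listed explicitly for clarity.
stages:
  - build
  - test
  - deploy

build1:
  stage: build
  script:
    - echo "building (placeholder message)"

# test1 and test2 belong to the same stage and only start once build1 succeeded.
test1:
  stage: test
  script:
    - echo "running first test (placeholder message)"

test2:
  stage: test
  script:
    - echo "running second test (placeholder message)"

# deploy1 runs only if every job of the test stage succeeded.
deploy1:
  stage: deploy
  script:
    - echo "deploying (placeholder message)"
```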

## Our Own Configuration

@@ -103,10 +103,10 @@ Then we define our own stages, because we do not develop software and want to ha
```yaml
stages:
- check
- create
- publish
```

In the first stage, we will do some testing, in the second we will create an EPUB format version of our research diary.
In the first stage, we will do some testing; in the second, we will create the HTML version of our diary and publish it.

We call the first job `check-for-mds`. In it, we make sure that at least one Markdown file exists, because otherwise someone must have accidentally removed
them all.
@@ -124,43 +126,45 @@ We also state that the job belongs to the stage `check`.
Finally, we provide the script.
It tests for the existence of any files with names ending in `.md`.
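
The definition of this job lies outside the lines shown in this diff, so the following is only a guess at what such a check could look like, not the lesson's actual job. The choice of the `busybox` image and the use of `ls` are assumptions made for this sketch.

```yaml
# Hypothetical sketch; the real job definition is not part of this diff.
check-for-mds:
  image: busybox:latest  # assumption: any image that provides a shell would do
  stage: check
  script:
    # ls exits with a non-zero status if no file matches *.md,
    # which makes the job, and with it the pipeline, fail.
    - ls *.md
```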

Whenever the job `check-for-mds` successfully completes, we want to create the EPUB format version of our research diary.
We define a second job and call it `create-epub`.
Whenever the job `check-for-mds` successfully completes, we want to create the HTML version of our research diary.
We define a second job and call it `publish-on-web`.

```yaml
create-epub:
publish-on-web:
  image:
    name: pandoc/core:latest
    entrypoint: ["/bin/sh", "-c"]
  stage: create
  stage: publish
  script:
    - pandoc *.md -o diary.epub
    - pandoc *.md -o diary.html
  artifacts:
    paths:
      - diary.epub
      - diary.html
```

After giving the name of the job, we again provide a Docker image in which the job should run.
Since we want to use pandoc to convert our Markdown files into an EPUB file, we use the official pandoc Docker image.
We use pandoc to convert our Markdown files into an HTML file, so we use the official pandoc Docker image.
The next line, with the keyword `entrypoint`, is necessary because the pandoc Docker image is configured to run pandoc directly when started in a container (pandoc is configured as its entrypoint).
However, GitLab CI expects to run scripts in the Docker container, thus expects the Docker images to start a shell.
However, GitLab CI expects to run scripts in the Docker container in a shell.

::: callout

### Pandoc

Pandoc converts text documents from format to another, for example from Markdown to EPUB.
Pandoc converts text documents from one format to another, for example from Markdown to HTML.
It supports many formats for the source documents and even more for the target document.

The [project webpage](https://pandoc.org/) provides a complete list (in text and graphical form).

The conversion is customizable, for example through templates.

:::

Next, we specify the stage, followed by the script.
It runs pandoc on all files ending in `.md` in the current directory and instructs it to output a file `diary.epub`.
Pandoc deduces from the file extension that it should be in EPUB format.
It runs pandoc on all files ending in `.md` in the current directory and instructs it to output a file `diary.html`.
Pandoc deduces from the file extension that it should be in HTML format.
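
If we prefer not to rely on this deduction, the formats can also be stated explicitly. The following sketch is a variation on the job, not the lesson's configuration: it names the job `publish-on-web` as in the text and assigns it to the stage `publish` from our `stages:` list, and the title "Research Diary" is a placeholder chosen for illustration. By default pandoc writes an HTML fragment; `--standalone` asks for a complete document with a header instead.

```yaml
# Sketch of a variation; job name taken from the text, title is a placeholder.
publish-on-web:
  image:
    name: pandoc/core:latest
    entrypoint: ["/bin/sh", "-c"]
  stage: publish
  script:
    # --from/--to state the formats explicitly instead of relying on file extensions.
    # --standalone produces a complete HTML document rather than a fragment.
    - pandoc --from markdown --to html --standalone --metadata title="Research Diary" *.md -o diary.html
  artifacts:
    paths:
      - diary.html
```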

The final three lines, starting with the keyword `artifacts`, specify that GitLab CI should save the file `diary.epub` from the Docker container the job runs in and provide it for download on its web interface.
The final three lines, starting with the keyword `artifacts`, specify that GitLab CI should save the file `diary.html` from the Docker container the job runs in and provide it for download on its web interface.

This completes our GitLab CI configuration.
There are a lot more configuration options, for example to have a process run only for commits to certain branches, that are documented in [GitLab’s Handbook](https://docs.gitlab.com/ee/ci/yaml/gitlab_ci_yaml.html).
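
As one example of such an option, the sketch below restricts a job to commits on the project's default branch using the `rules` keyword. It is an illustration, not part of the lesson's configuration; the job name and stage follow the text above, and `CI_COMMIT_BRANCH` and `CI_DEFAULT_BRANCH` are variables that GitLab predefines for every pipeline.

```yaml
# Sketch: the same job, restricted to the default branch.
publish-on-web:
  image:
    name: pandoc/core:latest
    entrypoint: ["/bin/sh", "-c"]
  stage: publish
  rules:
    # The job is added to the pipeline only if the commit is on the default branch.
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
  script:
    - pandoc *.md -o diary.html
  artifacts:
    paths:
      - diary.html
```

With this rule in place, pipelines for other branches would skip the job entirely instead of running and publishing it.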
@@ -200,7 +202,7 @@ The “Triggerer” column tells us who caused the pipeline run.
In our case that’s the author of the commit.

The “Stages” column visualizes the stages and their state.
In addition to the colors discussed above, there are stages colored in grey.
In addition to the colors discussed above, there are stages colored in gray.
They are waiting for their predecessor stages (or need to be manually triggered, if configured that way).

We wait until our pipeline has run through.
@@ -216,10 +218,10 @@ This needs to be handled on a case by case basis.

We click on the button labeled with three dots on the right of the entry.
In the menu that opens, we see that it provides a link to download artifacts of the pipeline.
We click on the link labeled “create-epub:archive”, which causes our browser to download a file called `artifacts.zip`.
We click on the link labeled “publish-on-web:archive”, which causes our browser to download a file called `artifacts.zip`.

In that archive, we find the file `diary.epub` that was build by our second job, `create-epub`.
In that archive, we find the file `diary.html` that was built by our second job, `publish-on-web`.

If you have an ebook viewer, you can verify that it contains the contents of all Markdown our files.
By opening the file in your browser, you can verify that it contains the contents of all our Markdown files.
(We ignore the fact that the order might not make sense at all.
For that we would need to improve the script that builds the EPUB file.)
For that we would need to improve the script that builds the HTML file.)
