From fce42607748e45d61f174a3b255596d6417b9628 Mon Sep 17 00:00:00 2001
From: Foivos Gypas <fgypas@gmail.com>
Date: Thu, 19 Dec 2024 23:13:03 +0100
Subject: [PATCH] Simplify main README.md

---
 CONTRIBUTING.md | 170 +++++++++++++++++++++++++++++++++++++
 README.md       | 221 +++++++++++++++++++-----------------------------
 2 files changed, 255 insertions(+), 136 deletions(-)
 create mode 100644 CONTRIBUTING.md
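
Reviewer note (placed before the first `diff --git` line, so it is not part of the applied change): the commit-message structure that the new CONTRIBUTING.md describes, `<type>[optional scope]: <description>`, can be checked with a few lines of Python. This is a minimal sketch under stated assumptions, not the behaviour of the linked commitlint tool: `TYPES`, `HEADER_RE` and `check_header` are hypothetical names, only the header line is checked, and only the ten types from the table in CONTRIBUTING.md are accepted.

```python
import re

# Type vocabulary copied from the table in CONTRIBUTING.md.
TYPES = (
    "build", "ci", "docs", "feat", "fix",
    "perf", "refactor", "revert", "style", "test",
)

# Header pattern for: <type>[optional scope]: <description>
# Assumption: the scope is a single parenthesised token without spaces.
HEADER_RE = re.compile(
    r"^(?P<type>" + "|".join(TYPES) + r")"
    r"(?:\((?P<scope>[^()\s]+)\))?"
    r": (?P<description>\S.*)$"
)


def check_header(header: str) -> bool:
    """Return True if the first line of a commit message matches the pattern."""
    return HEADER_RE.match(header) is not None
```

Under this sketch, a subject such as `docs: simplify main README.md` passes, while an untyped subject such as `Simplify main README.md` does not.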
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
new file mode 100644
index 0000000..1321a82
--- /dev/null
+++ b/CONTRIBUTING.md
@@ -0,0 +1,170 @@
+# Guidelines for contributing
+
+## General workflow
+
+We are using [Git][git], [GitHub][github] and [Git Flow][git-flow].
+
+> **Note:** If you are a **beginner** and do not have a lot of experience with
+> this sort of workflow, please do not feel overwhelmed. We will guide you
+> through the process until you feel comfortable using it. And do not worry
+> about mistakes either - everybody makes them. Often! Our project layout makes
+> it very hard for anyone to cause irreversible harm, so relax, try things
+> out, take your time and enjoy the work! :)
+
+We would kindly ask you to abide by our [Code of Conduct][coc] in all
+interactions with the community when contributing to this project, regardless
+of the type of contribution. We will not accept any offensive or demeaning
+behavior towards others and will take any necessary steps to ensure that
+everyone is treated with respect and dignity.
+
+## Issue tracker
+
+Please use the project's GitHub [issue tracker][issue-tracker] to:
+
+- find issues to work on
+- report bugs
+- propose features
+- discuss future directions
+
+## Submitting issues
+
+Please choose a template when submitting an issue: choose the [**bug report**
+template][bug-report] only when reporting bugs; for all other issues,
+choose the [**feature request** template][bug-report]. Please follow the
+instructions in the templates.
+
+You do not need to worry about adding labels or milestones for an issue; the
+project maintainers will do that for you. However, it is important that all
+issues are written concisely, yet with enough detail and with proper
+references (links, screenshots, etc.) to allow other contributors to start
+working on them. For bug reports, it is essential that they include all
+information required to reproduce the bug.
+
+Please **do not** use the issue tracker for usage questions, installation
+problems, etc., unless they appear to be bugs. For such questions, please use
+the [communication channels](#communication) outlined below.
+
+## Communication
+
+Send us an [email][contact] if you want to
+reach out to us.
+
+## Code style and testing
+
+To make it easier for everyone to maintain, read and contribute to the code,
+as well as to ensure that the code base is robust and of high quality, we
+would kindly ask you to stick to the following guidelines for code style and
+testing.
+
+- Please use a recent version of [Python 3][py] (3.7.4+)
+- Please try to conform to the code, docstring and commenting style used within
+  a project to maintain consistency
+- Please use [type hints][py-typing] for all function/method signatures
+  (exception: tests)
+- Please use the following linters (see configuration files in repository root
+  directory, e.g., `setup.cfg`, for settings):
+  - [`flake8`][py-flake8]
+  - [`pylint`][py-pylint] (use available [configuration][py-pylint-conf])
+  - [`mypy`][py-mypy] OR [`pyright`][py-pyright] to help with type hints
+- Please use the following testing tools:
+  - [`pytest`][py-pytest]
+  - [`coverage`][py-coverage]
+
+## Commit messages
+
+In an effort to increase consistency, simplify maintenance and enable automated
+change logs, we would like to kindly ask you to write _semantic commit
+messages_, as described in the [Conventional Commits
+specification][conv-commits].
+
+The general structure of _Conventional Commits_ is as follows:
+
+```console
+<type>[optional scope]: <description>
+
+[optional body]
+
+[optional footer]
+```
+
+Depending on the changes, please use one of the following **type** prefixes:
+
+| Type | Description |
+| --- | --- |
+| build | The build type (formerly known as chore) is used to identify development changes related to the build system (involving scripts, configurations or tools) and package dependencies. |
+| ci | The ci type is used to identify development changes related to the continuous integration and deployment system - involving scripts, configurations or tools. |
+| docs | The docs type is used to identify documentation changes related to the project - whether intended externally for the end users (in case of a library) or internally for the developers. |
+| feat | The feat type is used to identify production changes related to new backward-compatible abilities or functionality. |
+| fix | The fix type is used to identify production changes related to backward-compatible bug fixes. |
+| perf | The perf type is used to identify production changes related to backward-compatible performance improvements. |
+| refactor | The refactor type is used to identify development changes related to modifying the codebase, which neither adds a feature nor fixes a bug - such as removing redundant code, simplifying the code, renaming variables, etc. |
+| revert | For commits that revert one or more previous commits. |
+| style | The style type is used to identify development changes related to styling the codebase, regardless of the meaning - such as indentations, semi-colons, quotes, trailing commas and so on. |
+| test | The test type is used to identify development changes related to tests - such as refactoring existing tests or adding new tests. |
+
+In order to ensure that the format of your commit messages adheres to the
+Conventional Commits specification and the defined type vocabulary, you can
+use the [dedicated linter][conv-commits-lint]. More information about
+_Conventional Commits_ can also be found in this [blog
+post][conv-commits-blog].
+
+## Merging your code
+
+Here is a checklist that you can follow to make sure that code merges
+happen smoothly:
+
+1. [Open an issue](#submitting-issues) _first_ to give other contributors a
+   chance to discuss the proposed changes (alternatively: assign yourself
+   to one of the existing issues)
+2. Clone the repository, create a feature branch off of the default branch
+   (never commit changes to protected branches directly) and implement your
+   code changes
+3. If applicable, update relevant sections of the [documentation][docs]
+4. Add or update tests; untested code will not be merged; refer to the
+   [guidelines](#code-style-and-testing) above for details
+5. Ensure that your coding style is in line with the
+   [guidelines](#code-style-and-testing) described above
+6. Ensure that all tests and linter checks configured in the [Travis
+   CI][travis-docs] [continuous integration][ci-cd] (CI) pipeline pass without
+   issues
+7. If necessary, clean up excessive commits with `git rebase`; cherry-pick and
+   merge commits as you see fit; use concise and descriptive commit messages
+8. Push your clean, tested and documented feature branch to the remote; make
+   sure the [Travis CI][travis-docs] [CI][ci-cd] pipeline passes
+9. Issue a pull request against the default branch; follow the instructions in
+   the [template][pull-request]; importantly, describe your changes in
+   detail, yet with concise language, and do not forget to indicate which
+   issue(s) the code changes resolve or refer to; assign a project maintainer
+   to review your changes
+
+## Becoming a co-maintainer
+
+If you are as interested in the project as we are and have contributed some
+code, suggested features or reported bugs, and have taken part in
+discussions on where to go with the project, we will very likely want to have you
+on board as a co-maintainer. If you are interested in that, please let us
+know. You can reach us by [email][contact].
+
+[bug-report]: .github/ISSUE_TEMPLATE/bug_report.md
+[ci-cd]: <https://en.wikipedia.org/wiki/Continuous_integration>
+[coc]: CODE_OF_CONDUCT.md
+[contact]: <mailto:zavolab-biozentrum@unibas.ch>
+[conv-commits]: <https://www.conventionalcommits.org/en/v1.0.0-beta.2/#specification>
+[conv-commits-blog]: <https://nitayneeman.com/posts/understanding-semantic-commit-messages-using-git-and-angular/>
+[conv-commits-lint]: <https://github.com/conventional-changelog/commitlint>
+[docs]: README.md
+[git]: <https://git-scm.com/>
+[git-flow]: <https://nvie.com/posts/a-successful-git-branching-model/>
+[github]: <https://github.com>
+[issue-tracker]: <https://github.com/zavolanlab/zarp/issues>
+[pull-request]: PULL_REQUEST_TEMPLATE.md
+[py]: <https://www.python.org/>
+[py-flake8]: <https://gitlab.com/pycqa/flake8>
+[py-mypy]: <http://mypy-lang.org/>
+[py-pylint]: <https://www.pylint.org/>
+[py-pylint-conf]: pylint.cfg
+[py-pyright]: <https://github.com/microsoft/pyright>
+[py-pytest]: <https://docs.pytest.org/en/latest/>
+[py-coverage]: <https://pypi.org/project/coverage/>
+[py-typing]: <https://docs.python.org/3/library/typing.html>
+[travis-docs]: <https://docs.travis-ci.com/>
diff --git a/README.md b/README.md
index 5bbe011..baaf9cf 100644
--- a/README.md
+++ b/README.md
@@ -1,54 +1,30 @@
 [![ci](https://github.com/zavolanlab/zarp/workflows/CI/badge.svg?branch=dev)](https://github.com/zavolanlab/zarp/actions?query=workflow%3Aci)
 [![GitHub license](https://img.shields.io/github/license/zavolanlab/zarp?color=orange)](https://github.com/zavolanlab/zarp/blob/dev/LICENSE)
-[![DOI:biorxiv](https://img.shields.io/badge/bioRxiv-10.1101%2F2021.11.18.469017-informational)](https://doi.org/10.1101/2021.11.18.469017)
-[![DOI:zenodo](https://img.shields.io/badge/Zenodo-10.5281%2Fzenodo.5703358-informational)](https://doi.org/10.5281/zenodo.5703358)
+[![Static Badge](https://img.shields.io/badge/f1000-10.12688/f1000research.149237.1-blue)](https://doi.org/10.12688/f1000research.149237.1)
+[![DOI:zenodo](https://img.shields.io/badge/Zenodo-10.5281%2Fzenodo.10797025-informational)](https://doi.org/10.5281/zenodo.10797025)
 [![DOI:workflowhub](https://img.shields.io/badge/WorkflowHub-10.48546%2Fworkflowhub.workflow.447.1-informational)](https://doi.org/10.48546/workflowhub.workflow.447.1)
 
 <div align="left">
     <img width="20%" align="left" src=images/zarp_logo.svg>
-</div> 
+</div>
 
-**ZARP** ([Zavolab][zavolan-lab] Automated RNA-seq Pipeline) is a generic
-RNA-Seq analysis workflow that allows users to process and analyze Illumina
-short-read sequencing libraries with minimum effort. Better yet: With our
-companion [**ZARP-cli**](https://github.com/zavolanlab/zarp-cli) command line
-interface, you can start ZARP runs with the simplest and most intuitive
-commands.
+**ZARP** ([Zavolab][zavolan-lab] Automated RNA-seq Pipeline) is a generic RNA-Seq analysis workflow that allows users to process and analyze Illumina short-read sequencing libraries with minimum effort. Better yet: With our companion [**ZARP-cli**](https://github.com/zavolanlab/zarp-cli) command line interface, you can start ZARP runs with the simplest and most intuitive commands.
 
 _RNA-seq analysis doesn't get simpler than that!_
 
-ZARP relies on publicly available bioinformatics tools and currently handles
-single or paired-end stranded bulk RNA-seq data. The workflow is developed in
-[Snakemake][snakemake], a widely used workflow management system in the
-bioinformatics community.
+ZARP relies on publicly available bioinformatics tools and currently handles single or paired-end stranded bulk RNA-seq data. The workflow is developed in [Snakemake][snakemake], a widely used workflow management system in the bioinformatics community.
 
-ZARP will pre-process, align and quantify your single- or paired-end stranded
-bulk RNA-seq sequencing libraries with publicly available state-of-the-art
-bioinformatics tools. ZARP's browser-based rich reports and visualitations will
-give you meaningful initial insights in the quality and composition of your
-sequencing experiments - fast and simple. Whether you are an experimentalist
-struggling with large scale data analysis or an experienced bioinformatician,
-when there's RNA-seq data to analyze, just _zarp 'em_!
+ZARP will pre-process, align and quantify your single- or paired-end stranded bulk RNA-seq sequencing libraries with publicly available state-of-the-art bioinformatics tools. ZARP's browser-based rich reports and visualizations will give you meaningful initial insights into the quality and composition of your sequencing experiments - fast and simple. Whether you are an experimentalist struggling with large-scale data analysis or an experienced bioinformatician, when there's RNA-seq data to analyze, just _ZARP 'em_!
 
 <div align="center">
     <img width="60%" src=images/zarp_schema.png>
 </div> 
 
-> **Note:** For a more detailed description of each step, please refer to the [workflow
-> documentation][pipeline-documentation].
-
-# Requirements
-
-The workflow has been tested on:
-- CentOS 7.5
-- Debian 10
-- Ubuntu 16.04, 18.04
-
-> **NOTE:**
-> Currently, we only support **Linux** execution. 
+# Documentation
 
+For the full documentation please visit the [ZARP website](https://zavolanlab.github.io/zarp).
 
-# Installation
+# Quick installation
 
 > **IMPORTANT: Rather than installing the ZARP workflow as described in this section, we
 > recommend installing [ZARP-cli](https://github.com/zavolanlab/zarp-cli) for most use
@@ -56,141 +32,112 @@ The workflow has been tested on:
 > instructions](https://zavolanlab.github.io/zarp-cli/guides/installation/), you can
 > skip the instructions below.
 
-## 1. Clone the repository
-
-Go to the desired directory/folder on your file system, then clone/get the 
-repository and move into the respective directory with:
+Quick installation requires the following:
+- Linux
+- Git
+- [Conda][conda] >=22.11.1
+- [Mamba][mamba] >=1.3.0 <2
+- [Singularity][singularity] >=3.5.2 (required only if you want to use Singularity for the dependencies)
 
 ```bash
 git clone https://github.com/zavolanlab/zarp.git
 cd zarp
+mamba env create -f install/environment.yml
+conda activate zarp
 ```
 
-## 2. Conda and Mamba installation
+# Basic usage
 
-Workflow dependencies can be conveniently installed with the [Conda][conda]
-package manager. We recommend that you install [Miniconda][miniconda-installation] 
-for your system (Linux). Be sure to select Python 3 option. 
-The workflow was built and tested with `miniconda 4.7.12`.
-Other versions are not guaranteed to work as expected.
+You can trigger ZARP without ZARP-cli. This is convenient for users who have some experience with Snakemake and don't want to use a CLI to trigger their runs. The [usage documentation](https://zavolanlab.github.io/zarp/guides/usage/) covers this in detail; the basic steps to trigger a run are listed below.
 
-Given that Miniconda has been installed and is available in the current shell the first
-dependency for ZARP is the [Mamba][mamba] package manager (version 1), which needs to be installed in
-the `base` conda environment with:
+1. Assuming that your current directory is the workflow repository's root directory,
+create a directory for your workflow run and move into it with:
 
-```bash
-conda install mamba=1 -n base -c conda-forge
-```
+    ```bash
+    mkdir config/my_run
+    cd config/my_run
+    ```
 
-## 3. Dependencies installation
+2. Create an empty sample table and a workflow configuration file:
 
-For improved reproducibility and reusability of the workflow,
-each individual step of the workflow runs either in its own [Singularity][singularity]
-container or in its own [Conda][conda] virtual environemnt. 
-As a consequence, running this workflow has very few individual dependencies. 
-The **container execution** requires Singularity to be installed on the system where the workflow is executed. 
-As the functional installation of Singularity requires root privileges, and Conda currently only provides Singularity
-for Linux architectures, the installation instructions are slightly different depending on your system/setup:
+    ```bash
+    touch samples.tsv
+    touch config.yaml
+    ```
 
-### For most users
+3. Use your editor of choice to populate these files with appropriate
+values. Have a look at the examples in the `tests/` directory to see what the
+files should look like, specifically:
 
-If you do *not* have root privileges on the machine you want
-to run the workflow on *or* if you do not have a Linux machine, please [install
-Singularity][singularity-install] separately and in privileged mode, depending
-on your system. You may have to ask an authorized person (e.g., a systems
-administrator) to do that. This will almost certainly be required if you want
-to run the workflow on a high-performance computing (HPC) cluster. 
+    - [samples.tsv](https://github.com/zavolanlab/zarp/blob/dev/tests/input_files/samples.tsv)
+    - [config.yaml](https://github.com/zavolanlab/zarp/blob/dev/tests/input_files/config.yaml)
 
-> **NOTE:**
-> The workflow has been tested with the following Singularity versions:  
->  * `v2.6.2`
->  * `v3.5.2`
 
-After installing Singularity, install the remaining dependencies with:
-```bash
-mamba env create -f install/environment.yml
-```
+4. Create a runner script. Pick one of the following choices for either local
+or cluster execution. Before running the respective command, update the
+argument of the `--singularity-args` option in the chosen profile
+(file: `profiles/{profile}/config.yaml`) so that
+it contains a comma-separated list of _all_ directories
+containing input data files (samples, annotation files, etc.) required for
+your run.
 
+    Runner script for _local execution_:
 
-### As root user on Linux
+    ```bash
+    cat << "EOF" > run.sh
+    #!/bin/bash
 
-If you have a Linux machine, as well as root privileges, (e.g., if you plan to
-run the workflow on your own computer), you can execute the following command
-to include Singularity in the Conda environment:
+    snakemake \
+        --profile="../../profiles/local-singularity" \
+        --configfile="config.yaml"
 
-```bash
-mamba env update -f install/environment.root.yml
-```
+    EOF
+    ```
 
-## 4. Activate environment
+    **OR**
 
-Activate the Conda environment with:
+    Runner script for _Slurm cluster execution_ (note that you may need
+    to modify the arguments to `--jobs` and `--cores` in the file
+    `profiles/slurm-singularity/config.yaml`, depending on your HPC
+    and workload manager configuration):
 
-```bash
-conda activate zarp
-```
-
-# Extra installation steps (optional)
+    ```bash
+    cat << "EOF" > run.sh
+    #!/bin/bash
+    mkdir -p logs/cluster_log
+    snakemake \
+        --profile="../../profiles/slurm-singularity" \
+        --configfile="config.yaml"
+    EOF
+    ```
 
-## 5. Non-essential dependencies installation
+    > Note: When running the pipeline with *Conda* you should use `local-conda` and
+    `slurm-conda` profiles instead.
 
-Most tests have additional dependencies. If you are planning to run tests, you
-will need to install these by executing the following command _in your active
-Conda environment_:
+    > Note: The Slurm profiles are adapted to a cluster that uses the quality-of-service (QOS) keyword. If QOS is not supported by your Slurm instance, you have to remove all lines containing "qos" from `profiles/slurm-config.json`.
 
-```bash
-mamba env update -f install/environment.dev.yml
-```
+5. Start your workflow run:
 
-## 6. Successful installation tests
-
-We have prepared several tests to check the integrity of the workflow and its
-components. These can be found in subdirectories of the `tests/` directory. 
-The most critical of these tests enable you to execute the entire workflow on a 
-set of small example input files. Note that for this and other tests to complete
-successfully, [additional dependencies](#installing-non-essential-dependencies) 
-need to be installed. 
-Execute one of the following commands to run the test workflow 
-on your local machine:
-* Test workflow on local machine with **Singularity**:
-```bash
-bash tests/test_integration_workflow/test.local.sh
-```
-* Test workflow on local machine with **Conda**:
-```bash
-bash tests/test_integration_workflow_with_conda/test.local.sh
-```
-Execute one of the following commands to run the test workflow 
-on a [Slurm][slurm]-managed high-performance computing (HPC) cluster:
-
-* Test workflow with **Singularity**:
-
-```bash
-bash tests/test_integration_workflow/test.slurm.sh
-```
-* Test workflow with **Conda**:
-
-```bash
-bash tests/test_integration_workflow_with_conda/test.slurm.sh
-```
+    ```bash
+    bash run.sh
+    ```
 
-> **NOTE:** Depending on the configuration of your Slurm installation you may
-> need to adapt file `slurm-config.json` (located directly under `profiles`
-> directory) and the arguments to options `--cores` and `--jobs`
-> in the file `config.yaml` of a respective profile.
-> Consult the manual of your workload manager as well as the section of the
-> Snakemake manual dealing with [profiles].
+# Contributing
 
-# Running the workflow on your own samples
+This project lives off your contributions, be it in the form of bug reports,
+feature requests, discussions, or fixes and other code changes. Please refer
+to the [contributing guidelines](CONTRIBUTING.md) if you are interested in
+contributing. Please mind the [code of conduct](CODE_OF_CONDUCT.md) for all
+interactions with the community.
 
-## Running ZARP with ZARP-cli (recommended)
+# Contact
 
-Head over to the [ZARP-cli](https://zavolanlab.github.io/zarp-cli/) to learn how to
-start ZARP runs with very simple commands, like:
+For questions or suggestions regarding the code, please use the
+[issue tracker][issue-tracker]. For any other inquiries, please contact us
+by [email][contact].
 
-## Running ZARP without ZARP-cli
+&copy; 2021 [Zavolab, Biozentrum, University of Basel][zavolab]
 
-You can also trigger ZARP without ZARP-cli. This is convenient for users who have some experience with snakemake and don't want to use a CLI to trigger their runs. Please head over to the [ZARP](https://zavolanlab.github.io/zarp/) documentation to learn how to start ZARP.
 
 [conda]: <https://docs.conda.io/projects/conda/en/latest/index.html>
 [hts-infer]: <https://github.com/zavolanlab/htsinfer>
@@ -208,3 +155,5 @@ You can also trigger ZARP without ZARP-cli. This is convenient for users who hav
 [zavolan-lab]: <https://www.biozentrum.unibas.ch/research/researchgroups/overview/unit/zavolan/research-group-mihaela-zavolan/>
 [pipeline-documentation]: pipeline_documentation.md
 [resources.tmpdir]: <https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html?#standard-resources>
+[zavolab]: <https://www.biozentrum.unibas.ch/research/researchgroups/overview/unit/zavolan/research-group-mihaela-zavolan/>
+[contact]: <mailto:zavolab-biozentrum@unibas.ch>
\ No newline at end of file