From fce42607748e45d61f174a3b255596d6417b9628 Mon Sep 17 00:00:00 2001 From: Foivos Gypas Date: Thu, 19 Dec 2024 23:13:03 +0100 Subject: [PATCH] Simplify main README.md --- CONTRIBUTING.md | 170 +++++++++++++++++++++++++++++++++++++ README.md | 221 +++++++++++++++++++----------------------------- 2 files changed, 255 insertions(+), 136 deletions(-) create mode 100644 CONTRIBUTING.md diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md new file mode 100644 index 0000000..1321a82 --- /dev/null +++ b/CONTRIBUTING.md @@ -0,0 +1,170 @@ +# Guidelines for contributing + +## General workflow + +We are using [Git][git], [GitHub][github] and [Git Flow][git-flow]. + +> **Note:** If you are a **beginner** and do not have a lot of experience with +> this sort of workflow, please do not feel overwhelmed. We will guide you +> through the process until you feel comfortable using it. And do not worry +> about mistakes either - everybody does them. Often! Our project layout makes +> it very very hard for anyone to cause irreversible harm, so relax, try things +> out, take your time and enjoy the work! :) + +We would kindly ask you to abide by our [Code of Conduct][coc] in all +interactions with the community when contributing to this project, regardless +of the type of contribution. We will not accept any offensive or demeaning +behavior towards others and will take any necessary steps to ensure that +everyone is treated with respect and dignity. + +## Issue tracker + +Please use each project's GitHub [issue tracker][issue-tracker] to: + +- find issues to work on +- report bugs +- propose features +- discuss future directions + +## Submitting issues + +Please choose a template when submitting an issue: choose the [**bug report** +template][bug-report] only when reporting bugs; for all other issues, +choose the [**feature request** template][bug-report]. Please follow the +instructions in the templates. 
+ +You do not need to worry about adding labels or milestones for an issue; the +project maintainers will do that for you. However, it is important that all +issues are written concisely, yet with enough detail and with proper +references (links, screenshots, etc.) to allow other contributors to start +working on them. For bug reports, it is essential that they include all +information required to reproduce the bug. + +Please **do not** use the issue tracker for usage questions, installation +problems etc., unless they appear to be bugs. For these issues, please use +the [communication channels](#communication) outlined below. + +## Communication + +Send us an [email][contact] if you want to reach out +to us. + +## Code style and testing + +To make it easier for everyone to maintain, read and contribute to the code, +as well as to ensure that the code base is robust and of high quality, we +would kindly ask you to stick to the following guidelines for code style and +testing. + +- Please use a recent version of [Python 3][py] (3.7.4+) +- Please try to conform to the code, docstring and commenting style used within + a project to maintain consistency +- Please use [type hints][py-typing] for all function/method signatures + (exception: tests) +- Please use the following linters (see configuration files in repository root + directory, e.g., `setup.cfg`, for settings): + - [`flake8`][py-flake8] + - [`pylint`][py-pylint] (use available [configuration][py-pylint-conf]) + - [`mypy`][py-mypy] OR [`pyright`][py-pyright] to help with type hints +- Please use the following test suites: + - [`pytest`][py-pytest] + - [`coverage`][py-coverage] + +## Commit messages + +In an effort to increase consistency, simplify maintenance and enable automated +change logs, we would like to kindly ask you to write _semantic commit +messages_, as described in the [Conventional Commits +specification][conv-commits]. 
+ +The general structure of _Conventional Commits_ is as follows: + +```console +<type>[optional scope]: <description> + +[optional body] + +[optional footer] +``` + +Depending on the changes, please use one of the following **type** prefixes: + +| Type | Description | +| --- | --- | +| build | The build type (formerly known as chore) is used to identify development changes related to the build system (involving scripts, configurations or tools) and package dependencies. | +| ci | The ci type is used to identify development changes related to the continuous integration and deployment system - involving scripts, configurations or tools. | +| docs | The docs type is used to identify documentation changes related to the project - whether intended externally for the end users (in case of a library) or internally for the developers. | +| feat | The feat type is used to identify production changes related to new backward-compatible abilities or functionality. | +| fix | The fix type is used to identify production changes related to backward-compatible bug fixes. | +| perf | The perf type is used to identify production changes related to backward-compatible performance improvements. | +| refactor | The refactor type is used to identify development changes related to modifying the codebase, which neither adds a feature nor fixes a bug - such as removing redundant code, simplifying the code, renaming variables, etc. | +| revert | For commits that revert one or more previous commits. | +| style | The style type is used to identify development changes related to styling the codebase, regardless of the meaning - such as indentations, semi-colons, quotes, trailing commas and so on. | +| test | The test type is used to identify development changes related to tests - such as refactoring existing tests or adding new tests. 
| + +In order to ensure that the format of your commit messages adheres to the +Conventional Commits specification and the defined type vocabulary, you can +use the [dedicated linter][conv-commits-lint]. More information about +_Conventional Commits_ can also be found in this [blog +post][conv-commits-blog]. + +## Merging your code + +Here is a check list that you can follow to make sure that code merges +happen smoothly: + +1. [Open an issue](#submitting-issues) _first_ to give other contributors a + chance to discuss the proposed changes (alternatively: assign yourself + to one of the existing issues) +2. Clone the repository, create a feature branch off of the default branch + (never commit changes to protected branches directly) and implement your + code changes +3. If applicable, update relevant sections of the [documentation][docs] +4. Add or update tests; untested code will not be merged; refer to the + [guidelines](#code-style-and-testing) above for details +5. Ensure that your coding style is in line with the + [guidelines](#code-style-and-testing) described above +6. Ensure that all tests and linter checks configured in the [Travis + CI][travis-docs] [continuous integration][ci-cd] (CI) pipeline pass without + issues +7. If necessary, clean up excessive commits with `git rebase`; cherry-pick and + merge commits as you see fit; use concise and descriptive commit messages +8. Push your clean, tested and documented feature branch to the remote; make + sure the [Travis CI][travis-docs] [CI][ci-cd] pipeline passes +9. 
Issue a pull request against the default branch; follow the instructions in + the [template][pull-request]; importantly, describe your changes in + detail, yet with concise language, and do not forget to indicate which + issue(s) the code changes resolve or refer to; assign a project maintainer + to review your changes + +## Becoming a co-maintainer + +If you are as interested in the project as we are and have contributed some +code, suggested some features or reported bugs and have taken part in +discussions on where to go with the project, we will very likely want to have you +on board as a co-maintainer. If you are interested in that, please let us +know. You can reach us by [email][contact]. + +[bug-report]: .github/ISSUE_TEMPLATE/bug_report.mdrequest.md +[ci-cd]: +[coc]: CODE_OF_CONDUCT.md +[contact]: +[conv-commits]: +[conv-commits-blog]: +[conv-commits-lint]: +[docs]: README.md +[git]: +[git-flow]: +[github]: +[issue-tracker]: +[pull-request]: PULL_REQUEST_TEMPLATE.md +[py]: +[py-flake8]: +[py-mypy]: +[py-pylint]: +[py-pylint-conf]: pylint.cfg +[py-pyright]: +[py-pytest]: +[py-coverage]: +[py-typing]: +[travis-docs]: diff --git a/README.md b/README.md index 5bbe011..baaf9cf 100644 --- a/README.md +++ b/README.md @@ -1,54 +1,30 @@ [![ci](https://github.com/zavolanlab/zarp/workflows/CI/badge.svg?branch=dev)](https://github.com/zavolanlab/zarp/actions?query=workflow%3Aci) [![GitHub license](https://img.shields.io/github/license/zavolanlab/zarp?color=orange)](https://github.com/zavolanlab/zarp/blob/dev/LICENSE) -[![DOI:biorxiv](https://img.shields.io/badge/bioRxiv-10.1101%2F2021.11.18.469017-informational)](https://doi.org/10.1101/2021.11.18.469017) -[![DOI:zenodo](https://img.shields.io/badge/Zenodo-10.5281%2Fzenodo.5703358-informational)](https://doi.org/10.5281/zenodo.5703358) +[![Static Badge](https://img.shields.io/badge/f1000-10.12688/f1000research.149237.1-blue)](https://doi.org/10.12688/f1000research.149237.1) 
+[![DOI:zenodo](https://img.shields.io/badge/Zenodo-10.5281%2Fzenodo.10797025-informational)](https://doi.org/10.5281/zenodo.10797025) [![DOI:workflowhub](https://img.shields.io/badge/WorkflowHub-10.48546%2Fworkflowhub.workflow.447.1-informational)](https://doi.org/10.48546/workflowhub.workflow.447.1)
-
+ -**ZARP** ([Zavolab][zavolan-lab] Automated RNA-seq Pipeline) is a generic -RNA-Seq analysis workflow that allows users to process and analyze Illumina -short-read sequencing libraries with minimum effort. Better yet: With our -companion [**ZARP-cli**](https://github.com/zavolanlab/zarp-cli) command line -interface, you can start ZARP runs with the simplest and most intuitive -commands. +**ZARP** ([Zavolab][zavolan-lab] Automated RNA-seq Pipeline) is a generic RNA-Seq analysis workflow that allows users to process and analyze Illumina short-read sequencing libraries with minimum effort. Better yet: With our companion [**ZARP-cli**](https://github.com/zavolanlab/zarp-cli) command line interface, you can start ZARP runs with the simplest and most intuitive commands. _RNA-seq analysis doesn't get simpler than that!_ -ZARP relies on publicly available bioinformatics tools and currently handles -single or paired-end stranded bulk RNA-seq data. The workflow is developed in -[Snakemake][snakemake], a widely used workflow management system in the -bioinformatics community. +ZARP relies on publicly available bioinformatics tools and currently handles single or paired-end stranded bulk RNA-seq data. The workflow is developed in [Snakemake][snakemake], a widely used workflow management system in the bioinformatics community. -ZARP will pre-process, align and quantify your single- or paired-end stranded -bulk RNA-seq sequencing libraries with publicly available state-of-the-art -bioinformatics tools. ZARP's browser-based rich reports and visualitations will -give you meaningful initial insights in the quality and composition of your -sequencing experiments - fast and simple. Whether you are an experimentalist -struggling with large scale data analysis or an experienced bioinformatician, -when there's RNA-seq data to analyze, just _zarp 'em_! 
+ZARP will pre-process, align and quantify your single- or paired-end stranded bulk RNA-seq sequencing libraries with publicly available state-of-the-art bioinformatics tools. ZARP's browser-based rich reports and visualizations will give you meaningful initial insights into the quality and composition of your sequencing experiments - fast and simple. Whether you are an experimentalist struggling with large-scale data analysis or an experienced bioinformatician, when there's RNA-seq data to analyze, just _ZARP 'em_!
-> **Note:** For a more detailed description of each step, please refer to the [workflow -> documentation][pipeline-documentation]. - -# Requirements - -The workflow has been tested on: -- CentOS 7.5 -- Debian 10 -- Ubuntu 16.04, 18.04 - -> **NOTE:** -> Currently, we only support **Linux** execution. +# Documentation +For the full documentation please visit the [ZARP website](https://zavolanlab.github.io/zarp). -# Installation +# Quick installation > **IMPORTANT: Rather than installing the ZARP workflow as described in this section, we > recommend installing [ZARP-cli](https://github.com/zavolanlab/zarp-cli) for most use @@ -56,141 +32,112 @@ The workflow has been tested on: > instructions](https://zavolanlab.github.io/zarp-cli/guides/installation/), you can > skip the instructions below. -## 1. Clone the repository - -Go to the desired directory/folder on your file system, then clone/get the -repository and move into the respective directory with: +Quick installation requires the following: +- Linux +- Git +- [Conda][conda] >=22.11.1 +- [Mamba][mamba] >=1.3.0 <2 +- [Singularity][singularity] >=3.5.2 (required only if you want to use Singularity for the dependencies) ```bash git clone https://github.com/zavolanlab/zarp.git cd zarp +mamba env create -f install/environment.yml +conda activate zarp ``` -## 2. Conda and Mamba installation +# Basic usage -Workflow dependencies can be conveniently installed with the [Conda][conda] -package manager. We recommend that you install [Miniconda][miniconda-installation] -for your system (Linux). Be sure to select Python 3 option. -The workflow was built and tested with `miniconda 4.7.12`. -Other versions are not guaranteed to work as expected. +You can trigger ZARP without ZARP-cli. This is convenient for users who have some experience with Snakemake and don't want to use a CLI to trigger their runs. 
Extensive documentation of the usage is available in the [usage documentation](https://zavolanlab.github.io/zarp/guides/usage/), while below you can find the basic steps to trigger a run. -Given that Miniconda has been installed and is available in the current shell the first -dependency for ZARP is the [Mamba][mamba] package manager (version 1), which needs to be installed in -the `base` conda environment with: +1. Assuming that your current directory is the workflow repository's root directory, +create a directory for your workflow run and move into it with: -```bash -conda install mamba=1 -n base -c conda-forge -``` + ```bash + mkdir config/my_run + cd config/my_run + ``` -## 3. Dependencies installation +2. Create an empty sample table and a workflow configuration file: -For improved reproducibility and reusability of the workflow, -each individual step of the workflow runs either in its own [Singularity][singularity] -container or in its own [Conda][conda] virtual environemnt. -As a consequence, running this workflow has very few individual dependencies. -The **container execution** requires Singularity to be installed on the system where the workflow is executed. -As the functional installation of Singularity requires root privileges, and Conda currently only provides Singularity -for Linux architectures, the installation instructions are slightly different depending on your system/setup: + ```bash + touch samples.tsv + touch config.yaml + ``` -### For most users +3. Use your editor of choice to populate these files with appropriate +values. Have a look at the examples in the `tests/` directory to see what the +files should look like, specifically: -If you do *not* have root privileges on the machine you want -to run the workflow on *or* if you do not have a Linux machine, please [install -Singularity][singularity-install] separately and in privileged mode, depending -on your system. 
You may have to ask an authorized person (e.g., a systems -administrator) to do that. This will almost certainly be required if you want -to run the workflow on a high-performance computing (HPC) cluster. + - [samples.tsv](https://github.com/zavolanlab/zarp/blob/dev/tests/input_files/samples.tsv) + - [config.yaml](https://github.com/zavolanlab/zarp/blob/dev/tests/input_files/config.yaml) -> **NOTE:** -> The workflow has been tested with the following Singularity versions: -> * `v2.6.2` -> * `v3.5.2` -After installing Singularity, install the remaining dependencies with: -```bash -mamba env create -f install/environment.yml -``` +4. Create a runner script. Pick one of the following choices for either local +or cluster execution. Before execution of the respective command, you need to +remember to update the argument of the `--singularity-args` option of a +respective profile (file: `profiles/{profile}/config.yaml`) so that +it contains a comma-separated list of _all_ directories +containing input data files (samples and any annotation files etc) required for +your run. + Runner script for _local execution_: -### As root user on Linux + ```bash + cat << "EOF" > run.sh + #!/bin/bash -If you have a Linux machine, as well as root privileges, (e.g., if you plan to -run the workflow on your own computer), you can execute the following command -to include Singularity in the Conda environment: + snakemake \ + --profile="../../profiles/local-singularity" \ + --configfile="config.yaml" -```bash -mamba env update -f install/environment.root.yml -``` + EOF + ``` -## 4. 
Activate environment + **OR** -Activate the Conda environment with: + Runner script for _Slurm cluster execution_ (note that you may need + to modify the arguments to `--jobs` and `--cores` in the file: + `profiles/slurm-singularity/config.yaml` depending on your HPC + and workload manager configuration): -```bash -conda activate zarp -``` - -# Extra installation steps (optional) + ```bash + cat << "EOF" > run.sh + #!/bin/bash + mkdir -p logs/cluster_log + snakemake \ + --profile="../../profiles/slurm-singularity" \ + --configfile="config.yaml" + EOF + ``` -## 5. Non-essential dependencies installation + > Note: When running the pipeline with *Conda* you should use `local-conda` and + `slurm-conda` profiles instead. -Most tests have additional dependencies. If you are planning to run tests, you -will need to install these by executing the following command _in your active -Conda environment_: + > Note: The Slurm profiles are adapted to a cluster that uses the quality-of-service (QOS) keyword. If QOS is not supported by your Slurm instance, you have to remove all the lines with "qos" in `profiles/slurm-config.json`. -```bash -mamba env update -f install/environment.dev.yml -``` +5. Start your workflow run: -## 6. Successful installation tests - -We have prepared several tests to check the integrity of the workflow and its -components. These can be found in subdirectories of the `tests/` directory. -The most critical of these tests enable you to execute the entire workflow on a -set of small example input files. Note that for this and other tests to complete -successfully, [additional dependencies](#installing-non-essential-dependencies) -need to be installed. 
-Execute one of the following commands to run the test workflow -on your local machine: -* Test workflow on local machine with **Singularity**: -```bash -bash tests/test_integration_workflow/test.local.sh -``` -* Test workflow on local machine with **Conda**: -```bash -bash tests/test_integration_workflow_with_conda/test.local.sh -``` -Execute one of the following commands to run the test workflow -on a [Slurm][slurm]-managed high-performance computing (HPC) cluster: - -* Test workflow with **Singularity**: - -```bash -bash tests/test_integration_workflow/test.slurm.sh -``` -* Test workflow with **Conda**: - -```bash -bash tests/test_integration_workflow_with_conda/test.slurm.sh -``` + ```bash + bash run.sh + ``` -> **NOTE:** Depending on the configuration of your Slurm installation you may -> need to adapt file `slurm-config.json` (located directly under `profiles` -> directory) and the arguments to options `--cores` and `--jobs` -> in the file `config.yaml` of a respective profile. -> Consult the manual of your workload manager as well as the section of the -> Snakemake manual dealing with [profiles]. +## Contributing -# Running the workflow on your own samples +This project lives off your contributions, be it in the form of bug reports, +feature requests, discussions, or fixes and other code changes. Please refer +to the [contributing guidelines](CONTRIBUTING.md) if you are interested in +contributing. Please mind the [code of conduct](CODE_OF_CONDUCT.md) for all +interactions with the community. -## Running ZARP with ZARP-cli (recommended) +## Contact -Head over to the [ZARP-cli](https://zavolanlab.github.io/zarp-cli/) to learn how to -start ZARP runs with very simple commands, like: +For questions or suggestions regarding the code, please use the +[issue tracker][issue-tracker]. For any other inquiries, please contact us +by [email][contact]. 
-## Running ZARP without ZARP-cli +© 2021 [Zavolab, Biozentrum, University of Basel][zavolab] -You can also trigger ZARP without ZARP-cli. This is convenient for users who have some experience with snakemake and don't want to use a CLI to trigger their runs. Please head over to the [ZARP](https://zavolanlab.github.io/zarp/) documentation to learn how to start ZARP. [conda]: [hts-infer]: @@ -208,3 +155,5 @@ You can also trigger ZARP without ZARP-cli. This is convenient for users who hav [zavolan-lab]: [pipeline-documentation]: pipeline_documentation.md [resources.tmpdir]: +[zavolab]: +[contact]: \ No newline at end of file
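
Note on the commit message convention introduced in CONTRIBUTING.md above: the header rule (a type prefix from the listed vocabulary, an optional scope in parentheses, then a colon, a space and a description) can be checked mechanically. The following is only a minimal sketch of such a check, not the dedicated linter the guidelines link to. The names `ALLOWED_TYPES`, `HEADER_RE` and `check_header` are hypothetical, and the regular expression is a simplification that does not cover every detail of the Conventional Commits specification (for example, breaking-change markers or footers).

```python
import re

# Type vocabulary copied from the table in CONTRIBUTING.md.
ALLOWED_TYPES = (
    "build", "ci", "docs", "feat", "fix",
    "perf", "refactor", "revert", "style", "test",
)

# Simplified header shape: type, optional "(scope)", then ": " and a
# non-empty description. This is an illustrative assumption, not the
# full grammar of the specification.
HEADER_RE = re.compile(
    r"^(?P<type>[a-z]+)"
    r"(?:\((?P<scope>[^()]+)\))?"
    r": (?P<description>\S.*)$"
)


def check_header(message: str) -> bool:
    """Return True if the first line of `message` follows the convention."""
    lines = message.splitlines()
    header = lines[0] if lines else ""
    match = HEADER_RE.match(header)
    if match is None:
        return False
    # The prefix must also be one of the types listed in the guidelines.
    return match.group("type") in ALLOWED_TYPES


if __name__ == "__main__":
    # Hypothetical example messages, for illustration only.
    for msg in ("feat(cli): add dry-run flag", "update stuff"):
        print(f"{msg!r}: {check_header(msg)}")
```

A check like this could run in a local `commit-msg` hook, but the linter referenced in the guidelines remains the authoritative tool.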