Rich codex to separate PR
grst committed Sep 21, 2024
1 parent cdf838a commit 810f48f
Showing 6 changed files with 31 additions and 44 deletions.
33 changes: 0 additions & 33 deletions .github/workflows/screenshots.yml

This file was deleted.

42 changes: 31 additions & 11 deletions README.md
@@ -21,14 +21,22 @@ step of your analysis (usually one script with defined inputs and outputs) organ
cannot be nested. A *folder* is used to organize stages in a hierarchical way within the project.

You can use `dso init` to create a new project

![`dso init "my_cool_project"`](docs/img/dso_init.png)
```
$> dso init
Please enter the name of the project, e.g. "single_cell_lung_atlas": my_cool_project
Please add a short description of the project: This analysis solves *all* the problems!
```

Within a project, you can use `dso create` to initialize folders and stages from a predefined template

![`dso create stage --template bash --description 'Quality control'`](docs/img/dso_stage.png)

If you do not supply the parameters directly, all of the above is also available through an interactive, text-based prompt.
```
$> dso create stage
? Choose a template: (Use arrow keys)
bash
» quarto
Please enter the name of the stage, e.g. "01_preprocessing": 02_quality_control
Please add a short description of the stage: Make a PCA to detect outliers
```

### How to write and use config files
The config files in a project, subfolder, or stage are the cornerstone of any reproducible analysis: they minimise configuration errors across related scripts. Config files also reduce the time needed to modify your scripts when changing settings such as p-value cutoffs, excluded samples, output directories, input data, and many more.
@@ -37,31 +45,42 @@ A config file of a project, subfolder, or stage contains all necessary parameter

DSO uses two parameter files, `params.in.yaml` and `params.yaml`. `params.yaml` is an autogenerated YAML file containing all the parameters specified in `params.in.yaml` and in the `params.yaml` files of its parent directories (see the figure below for an example of how this works in practice). `params.yaml` is compiled by running `dso compile-config`.

<img src="docs/img/config.png" width="500" alt="Hierarchical configuration schema" />
<img src="img/config.png" width="500" alt="Hierarchical configuration schema" />

![`dso init "my_cool_project" --description "test" && cd my_cool_project && dso compile-config`](docs/img/dso_compile_config.png)
```
$> dso compile-config
[08/22/24 20:53:43] INFO Detected /home/grst/my_cool_project as project root.
INFO Compiling a total of 2 config files.
INFO Configuration compiled successfully.
```
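
The hierarchical compilation described above can be pictured as a recursive dictionary merge. The following is only a minimal sketch under stated assumptions, not DSO's actual implementation: the function name `merge_params`, the example parameter names, and the rule that a stage-level value overrides the inherited one are all hypothetical and chosen for illustration.

```python
from copy import deepcopy


def merge_params(parent: dict, child: dict) -> dict:
    """Recursively merge `child` into `parent`.

    Assumption (not confirmed by the README): on conflict, the value
    from the more specific (child) level wins.
    """
    merged = deepcopy(parent)
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both levels define a nested section: merge it key by key.
            merged[key] = merge_params(merged[key], value)
        else:
            # Scalars, lists, or new keys: take the child's value.
            merged[key] = deepcopy(value)
    return merged


# Hypothetical project-level parameters (parent directory).
project_params = {
    "thresholds": {"p_value": 0.05, "fold_change": 2.0},
    "samples": {"exclude": []},
}

# Hypothetical stage-level parameters (child directory).
stage_params = {"thresholds": {"p_value": 0.01}}

compiled = merge_params(project_params, stage_params)
print(compiled)
```

In this sketch the stage inherits `fold_change` and `samples` from the project level and only overrides `p_value`, which is the kind of behaviour a hierarchical configuration is meant to provide.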

### Linting checks

DSO provides linting checks that detect common errors in analysis projects. Right now only a few checks are implemented,
but more will be available in the future.

To run the linting checks manually, execute `dso lint`:
To run the linting checks manually, execute

![`dso init "my_cool_project" --description "test" && cd my_cool_project && dso lint`](docs/img/dso_lint.png)
```
$> dso lint
[08/22/24 20:53:43] INFO Compiled a list of 22 to be linted
```

However, it is preferable to execute linting checks as pre-commit hooks and/or as continuous integration checks.
A `.pre-commit-config.yaml` comes with the DSO project template. Simply activate it using `pre-commit install`.

### Reproducing projects

To reproduce/execute all stages within a project, run `dso repro`
To reproduce/execute all stages within a project, run

![`dso init "my_cool_project" --description "test" && cd my_cool_project && dso repro`](docs/img/dso_repro.png)
```
$> dso repro
```

This is a thin wrapper around `dvc repro` that compiles all configuration files beforehand.
DVC will only reproduce stages defined in the dvc.yaml where changes have been made. When dependencies have been changed, previous stages will also be re-run.
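
As a rough illustration of what "a thin wrapper around `dvc repro`" means, the sketch below runs `dso compile-config` first and then hands over to `dvc repro`. It is not DSO's real code: the helper names `repro_commands` and `run_repro` are hypothetical, and the sketch assumes both `dso` and `dvc` are available on the `PATH`.

```python
import subprocess


def repro_commands(dvc_args=()):
    """Return the two commands in the order the wrapper would run them."""
    return [
        ["dso", "compile-config"],    # compile all params.yaml files first
        ["dvc", "repro", *dvc_args],  # then let DVC re-run outdated stages
    ]


def run_repro(dvc_args=(), runner=subprocess.run):
    """Run config compilation followed by `dvc repro`.

    `runner` is injectable so the sequence can be inspected without
    executing anything; by default it is `subprocess.run`.
    """
    for cmd in repro_commands(dvc_args):
        # check=True aborts the sequence if a command fails, so DVC
        # never runs against a configuration that failed to compile.
        runner(cmd, check=True)
```

Calling `run_repro()` inside a project directory would execute both steps in sequence; extra arguments are passed through to `dvc repro` unchanged.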


### Integration with quarto

DSO provides some additional tooling around quarto documents for generating reproducible reports. When you create a
@@ -93,6 +112,7 @@ To access stage parameters and resolve file paths relative to the stage director
companion package [`dso-r`](https://github.com/Boehringer-Ingelheim/dso-r) that provides the two functions
`read_params(stage_name)` and `stage_here(path)`.


## Installation

DSO requires Python 3.10 or later.
Binary file removed docs/img/config.png
Binary file not shown.
Binary file removed docs/img/dso_lint.png
Binary file not shown.
Binary file removed docs/img/dso_repro.png
Binary file not shown.
Binary file removed docs/img/dso_stage.png
Binary file not shown.
