Update precluster branch #99

Closed
wants to merge 27 commits into from
27 commits
633181b
add benchmarks
AroneyS Dec 20, 2023
83d6621
add faa output to evaluate prodigal
AroneyS Dec 20, 2023
947cfde
replace mem_mb values to function
AroneyS Dec 20, 2023
965c3c6
group bin prodigal and pipe
AroneyS Dec 20, 2023
02754ea
remove previous downloads when retrying
AroneyS Dec 20, 2023
f28c410
remove group-components: have to specify per group
AroneyS Dec 20, 2023
b7d610b
update polars to v0.20
AroneyS Dec 20, 2023
d8ef141
fix deprecated lengths()
AroneyS Dec 20, 2023
4910c85
fix comparison with None
AroneyS Dec 20, 2023
5924f81
fix deprecated take
AroneyS Dec 20, 2023
63ff31c
fix max coassembly size filtering
AroneyS Dec 20, 2023
24e138d
add test for empty df2 input to join_list_subsets
AroneyS Dec 20, 2023
45c4225
collect coassembly_edges before filtering
AroneyS Dec 20, 2023
43f4430
fix pl.lit behaviour change
AroneyS Dec 20, 2023
163d4b1
fix more deprecations
AroneyS Dec 20, 2023
98176f8
switch to outer_coalesce to get previous behaviour
AroneyS Dec 20, 2023
2d0d483
update SingleM to v0.16
AroneyS Dec 20, 2023
2263284
pin SingleM version
AroneyS Dec 20, 2023
eb686b4
switch to bioconda Aviary
AroneyS Dec 20, 2023
e6e6bdc
skip symlinking when new_genomes provided
AroneyS Dec 21, 2023
60ed469
fix test
AroneyS Dec 21, 2023
a988b7f
change name to binchicken
AroneyS Dec 21, 2023
e6568ab
bump version to v0.10.0
AroneyS Dec 21, 2023
8e7c732
add metagenome to summary
AroneyS Dec 21, 2023
e6d7e18
update readme
AroneyS Dec 21, 2023
c60bd3b
add metagenome to summary
AroneyS Dec 21, 2023
999a5d1
shorten descriptions in example workflow
AroneyS Dec 21, 2023
@@ -1,4 +1,4 @@
name: Test Ibis with Setup-Miniconda From Marketplace
name: Test Bin chicken with Setup-Miniconda From Marketplace
on: [push]

jobs:
@@ -17,7 +17,7 @@ jobs:
uses: conda-incubator/setup-miniconda@v2
with:
activate-environment: test
environment-file: ibis.yml
environment-file: binchicken.yml
python-version: ${{ matrix.python-version }}
miniforge-variant: Mambaforge
auto-activate-base: false
@@ -33,7 +33,7 @@ jobs:
sudo rm -rf /opt/ghc
sudo rm -rf "/usr/local/share/boost"
sudo rm -rf "$AGENT_TOOLSDIRECTORY"
- name: Install Ibis
- name: Install Bin chicken
run: |
pip install -e .
- name: Run unit tests
3 changes: 2 additions & 1 deletion MANIFEST.in
@@ -1,2 +1,3 @@
include LICENSE
recursive-include ibis *.yaml *.yml *.smk ibis_logo.png
include binchicken_logo.png
recursive-include binchicken *.yaml *.yml *.smk
110 changes: 65 additions & 45 deletions README.md
@@ -1,34 +1,60 @@
# Ibis
# Bin chicken

[<img src="ibis_logo.png" width="50%" />](ibis_logo.png)
[<img src="binchicken_logo.png" width="50%" />](binchicken_logo.png)

Ibis (bin chicken) - targeted recovery of low abundance genomes through intelligent coassembly.
Bin chicken - targeted recovery of low abundance metagenome assembled genomes through intelligent coassembly.

## Installation options

### Install from Bioconda

Install latest release via bioconda.

```bash
conda create -n binchicken -c bioconda binchicken
```

### Install from pip

Install latest release via pip.

```bash
pip install binchicken
```

### Install from source

Create conda env from `ibis.yml` and install from source.
Create conda env from `binchicken.yml` and install from source.

```bash
git clone https://github.com/AroneyS/ibis.git
cd ibis
conda env create -f ibis.yml
conda activate ibis
git clone https://github.com/AroneyS/binchicken.git
cd binchicken
conda env create -f binchicken.yml
conda activate binchicken
pip install -e .
```

## Environment setup

Create subprocess conda environments and set up environment variables.
Conda prefix is the directory you want to contain the subprocess conda environments.
SingleM metapackage is the metapackage downloaded by SingleM using `singlem data` (see <https://github.com/wwood/singlem>).

The latter databases are required only if you want to run Aviary directly using the `--run-aviary` argument.
GTDB-Tk database is the directory containing the GTDB-Tk release (see <https://github.com/Ecogenomics/GTDBTk>).
CheckM2 database is the directory containing the CheckM2 database (see <https://github.com/chklovski/CheckM2>).
These can also be downloaded automatically by Aviary using `aviary configure --download gtdb singlem checkm2` (see <https://github.com/rhysnewell/aviary>).

```bash
ibis build \
binchicken build \
--conda-prefix /path/to/conda/envs/dir \
--singlem-metapackage /metapackage/dir \
--gtdbtk-db /gtdb/release/dir \
--checkm2-db /checkm2/db/dir
```

Alternatively, set directory to contain subprocess conda environments and environment variables manually.
Subprocess conda environments will be created when required.

```bash
conda env config vars set SNAKEMAKE_CONDA_PREFIX="/path/to/conda/envs"
@@ -38,38 +64,32 @@ conda env config vars set GTDBTK_DATA_PATH="/gtdb/release/dir"
conda env config vars set CHECKM2DB="/checkm2/db/dir"
```
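
The manual setup above relies on several environment variables being present in the active conda environment. A minimal Python sketch of a pre-flight check follows. It is not part of this PR: the helper `missing_env_vars` and the constant `REQUIRED_VARS` are hypothetical, and only the three variable names visible in this diff are listed (lines collapsed in the diff may set others).

```python
import os
from typing import List, Mapping, Sequence

# Variable names copied from the `conda env config vars set` lines visible in the diff.
# Collapsed diff lines may define further variables that are not listed here.
REQUIRED_VARS = ("SNAKEMAKE_CONDA_PREFIX", "GTDBTK_DATA_PATH", "CHECKM2DB")


def missing_env_vars(
    env: Mapping[str, str] = os.environ,
    required: Sequence[str] = REQUIRED_VARS,
) -> List[str]:
    """Return the names in `required` that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Unset variables:", ", ".join(missing))
    else:
        print("All listed variables are set.")
```

Passing the environment as a mapping keeps the check easy to exercise with a plain dictionary instead of the real process environment.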

### Install from pip

Install latest release via pip.

```bash
pip install ibis-genome
```

## Example workflow

```bash
# Assemble and recover from each sample individually with 20 samples used for differential abundance binning
ibis coassemble \
# Assemble and recover from each sample individually
# 20 samples used for differential abundance binning
binchicken coassemble \
--forward-list samples_forward.txt --reverse-list samples_reverse.txt \
--run-aviary --single-assembly \
--cores 64 --output ibis_single_assembly
--cores 64 --output binchicken_single_assembly

# Assemble and recover from 2-sample coassemblies, prioritising samples with genomes not previously recovered
ibis iterate \
--coassemble-output ibis_single_assembly \
# Assemble and recover from 2-sample coassemblies
# Prioritising samples with genomes not previously recovered
binchicken iterate \
--coassemble-output binchicken_single_assembly \
--run-aviary --assemble-unmapped \
--cores 64 --output ibis_2_coassembly
--cores 64 --output binchicken_2_coassembly

# Perform another iteration of coassembly, with 3-samples this time
ibis iterate \
binchicken iterate \
--coassembly-samples 3 \
--coassemble-output ibis_2_coassembly \
--coassemble-output binchicken_2_coassembly \
--run-aviary --assemble-unmapped \
--cores 64 --output ibis_3_coassembly
--cores 64 --output binchicken_3_coassembly
```

## Ibis coassemble
## Bin chicken coassemble

Snakemake pipeline to discover coassembly sample clusters based on co-occurrence of single-copy marker genes, excluding those genes present in reference genomes (e.g. previously recovered genomes).
The taxa of the considered sequences can be filtered to target a specific taxon (e.g. the phylum Planctomycetota).
@@ -80,65 +100,65 @@ Paired end reads of form reads_1.1.fq, reads_1_1.fq and reads_1_R1.fq are automa

```bash
# Example: cluster reads into proposed coassemblies
ibis coassemble --forward reads_1.1.fq ... --reverse reads_1.2.fq ...
binchicken coassemble --forward reads_1.1.fq ... --reverse reads_1.2.fq ...

# Example: cluster reads into proposed coassemblies based on unbinned sequences
ibis coassemble --forward reads_1.1.fq ... --reverse reads_1.2.fq ... --genomes genome_1.fna ...
binchicken coassemble --forward reads_1.1.fq ... --reverse reads_1.2.fq ... --genomes genome_1.fna ...

# Example: cluster reads into proposed coassemblies based on unbinned sequences and coassemble only unbinned reads
ibis coassemble --forward reads_1.1.fq ... --reverse reads_1.2.fq ... --genomes genome_1.fna ... --assemble-unmapped
binchicken coassemble --forward reads_1.1.fq ... --reverse reads_1.2.fq ... --genomes genome_1.fna ... --assemble-unmapped

# Example: cluster reads into proposed coassemblies based on unbinned sequences from a specific taxa
ibis coassemble --forward reads_1.1.fq ... --reverse reads_1.2.fq ... --genomes genome_1.fna ... --taxa-of-interest "p__Planctomycetota"
binchicken coassemble --forward reads_1.1.fq ... --reverse reads_1.2.fq ... --genomes genome_1.fna ... --taxa-of-interest "p__Planctomycetota"

# Example: find relevant samples for differential coverage binning (no coassembly)
ibis coassemble --forward reads_1.1.fq ... --reverse reads_1.2.fq ... --single-assembly
binchicken coassemble --forward reads_1.1.fq ... --reverse reads_1.2.fq ... --single-assembly

# Example: run proposed coassemblies through aviary with cluster submission
# Create snakemake profile at ~/.config/snakemake/qsub with cluster, cluster-status, cluster-cancel, etc.
# See https://snakemake.readthedocs.io/en/stable/executing/cli.html#profiles
ibis coassemble --forward reads_1.1.fq ... --reverse reads_1.2.fq ... --run-aviary \
binchicken coassemble --forward reads_1.1.fq ... --reverse reads_1.2.fq ... --run-aviary \
--snakemake-profile qsub --cluster-retries 3 --local-cores 64 --cores 64
```
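
The hunk context above names three paired-end file name forms (`reads_1.1.fq`, `reads_1_1.fq` and `reads_1_R1.fq`); how the tool handles them is cut off in the diff. As an illustration only, the sketch below shows one way such names could be split into a sample prefix and a read number. The regular expression `READ_NAME` and the function `split_read_name` are assumptions and are not taken from the Bin chicken source.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: a sample prefix, then ".", "_" or "_R" before the
# read number (1 or 2), then a FASTQ extension with optional gzip suffix.
READ_NAME = re.compile(
    r"^(?P<sample>.+?)[._]R?(?P<read>[12])\.(?:fq|fastq)(?:\.gz)?$"
)


def split_read_name(filename: str) -> Optional[Tuple[str, int]]:
    """Return (sample prefix, read number), or None if the name does not match."""
    match = READ_NAME.match(filename)
    if match is None:
        return None
    return match.group("sample"), int(match.group("read"))


if __name__ == "__main__":
    for name in ("reads_1.1.fq", "reads_1_1.fq", "reads_1_R1.fq", "reads_1.2.fq"):
        print(name, split_read_name(name))
```

Under this pattern all three forms resolve to the same sample prefix, so forward and reverse files could be paired by grouping on that prefix.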

## Ibis evaluate
## Bin chicken evaluate

Evaluates the recovery of target genes by the coassemblies suggested by coassemble, finding the number of target genes present in the newly recovered genomes.
Compares the recovery by phyla and by single-copy marker gene.

```bash
# Example: evaluate a completed coassembly
ibis evaluate --coassemble-output coassemble_dir --aviary-outputs coassembly_0_dir ...
binchicken evaluate --coassemble-output coassemble_dir --aviary-outputs coassembly_0_dir ...

# Example: evaluate a completed coassembly by providing genomes directly
ibis evaluate --coassemble-output coassemble_dir --new-genomes genome_1.fna ... --coassembly-run coassembly_0
binchicken evaluate --coassemble-output coassemble_dir --new-genomes genome_1.fna ... --coassembly-run coassembly_0
```

## Ibis iterate
## Bin chicken iterate

Run a further iteration of coassemble, including newly recovered bins.
Defaults to using genomes with at least 70% completeness and at most 10% contamination, as estimated by CheckM2.
Automatically excludes previous coassemblies.

```bash
# Example: rerun coassemble, adding new bins to database
ibis iterate --coassemble-output coassemble_dir
binchicken iterate --coassemble-output coassemble_dir

# Example: rerun coassemble, adding new bins to database, providing genomes directly
ibis iterate --coassemble-output coassemble_dir --new-genomes new_genome_1.fna
binchicken iterate --coassemble-output coassemble_dir --new-genomes new_genome_1.fna
```
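
The default quality cut-offs stated for iterate (at least 70% complete, at most 10% contamination) can be illustrated with a short sketch. This is not the project's implementation: the `GenomeQuality` record and the helper functions are hypothetical, and the input format of real CheckM2 reports is not modelled here.

```python
from typing import Iterable, List, NamedTuple


class GenomeQuality(NamedTuple):
    """Hypothetical record of quality estimates for one genome."""
    name: str
    completeness: float   # percent
    contamination: float  # percent


def passes_quality(
    genome: GenomeQuality,
    min_completeness: float = 70.0,
    max_contamination: float = 10.0,
) -> bool:
    """Apply the README's stated defaults; both bounds are inclusive."""
    return (
        genome.completeness >= min_completeness
        and genome.contamination <= max_contamination
    )


def select_genomes(genomes: Iterable[GenomeQuality]) -> List[str]:
    """Return the names of genomes that pass the default thresholds."""
    return [g.name for g in genomes if passes_quality(g)]


if __name__ == "__main__":
    bins = [
        GenomeQuality("bin_1", 92.3, 1.5),
        GenomeQuality("bin_2", 65.0, 0.8),
        GenomeQuality("bin_3", 88.1, 12.4),
    ]
    print(select_genomes(bins))
```

The bounds are treated as inclusive here because the README wording is "at least" and "at most".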

## Ibis update
## Bin chicken update

Applies further processing to a previous Ibis coassemble run: downloading SRA reads, generating unmapped reads files, and/or running assembly/recovery commands.
Applies further processing to a previous Bin chicken coassemble run: downloading SRA reads, generating unmapped reads files, and/or running assembly/recovery commands.

```bash
# Example: update previous run to download SRA reads
ibis update --coassemble-output coassemble_dir --sra --forward SRA000001 ... --genomes genome_1.fna ...
binchicken update --coassemble-output coassemble_dir --sra --forward SRA000001 ... --genomes genome_1.fna ...

# Example: update previous run to perform unmapping
ibis update --coassemble-output coassemble_dir --assemble-unmapped --forward reads_1.1.fq ... --reverse reads_1.2.fq ... --genomes genome_1.fna ...
binchicken update --coassemble-output coassemble_dir --assemble-unmapped --forward reads_1.1.fq ... --reverse reads_1.2.fq ... --genomes genome_1.fna ...

# Example: update previous run to run specific coassemblies
ibis update --coassemble-output coassemble_dir --run-aviary --coassemblies coassembly_0 ... --forward reads_1.1.fq ... --reverse reads_1.2.fq ... --genomes genome_1.fna ...
binchicken update --coassemble-output coassemble_dir --run-aviary --coassemblies coassembly_0 ... --forward reads_1.1.fq ... --reverse reads_1.2.fq ... --genomes genome_1.fna ...
```
4 changes: 2 additions & 2 deletions ibis.yml → binchicken.yml
@@ -1,4 +1,4 @@
name: ibis
name: binchicken
channels:
- conda-forge
- bioconda
@@ -11,7 +11,7 @@ dependencies:
- bird_tool_utils_python=0.4.*
- extern=0.4.*
- ruamel.yaml=0.17.*
- polars=0.19.7
- polars=0.20.*
- pigz=2.3.*
- pyarrow=12.0.*
- parallel=20230522
1 change: 1 addition & 0 deletions binchicken/__init__.py
@@ -0,0 +1 @@
__version__ = "0.10.0"