Rebase from master
baileythegreen committed Dec 17, 2021
2 parents 08c61bc + c0fdf6d commit 62967a0
Showing 64 changed files with 840 additions and 209 deletions.
9 changes: 9 additions & 0 deletions .all-contributorsrc
@@ -115,6 +115,15 @@
"code",
"ideas"
]
},
{
"login": "dparks1134",
"name": "Donovan Parks",
"avatar_url": "https://avatars.githubusercontent.com/u/3688336?v=4",
"profile": "https://github.com/dparks1134",
"contributions": [
"bug"
]
}
],
"contributorsPerLine": 7,
4 changes: 2 additions & 2 deletions .flake8
@@ -1,5 +1,5 @@
[flake8]
ignore = E203, E231, E266, E501, W503, F403, F401
ignore = E203, E231, E266, E501, W503, F403, F401, E731
max-line-length = 88
max-complexity = 18
select = B,C,E,F,W,T4,B9
select = B,C,E,F,W,T4,B9
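
For reference, E731 is the pycodestyle check "do not assign a lambda expression, use a def". The commit does not say why it was added to the ignore list; the sketch below only illustrates the pattern that the check would otherwise flag. The function names are made up for illustration.

```python
# Flagged by E731 when the check is enabled: a lambda bound to a name.
square = lambda x: x * x


# The form E731 recommends instead: an ordinary function definition.
def square_def(x):
    return x * x


# Both behave the same when called; the visible difference is that the
# lambda has no useful __name__ for tracebacks and debugging.
print(square(3), square_def(3))
print(square.__name__, square_def.__name__)
```

With E731 in `ignore`, flake8 accepts both forms.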
16 changes: 15 additions & 1 deletion CONTRIBUTING.md
@@ -97,6 +97,20 @@ This will install all dependencies for running and developing `pyani`, as well a
make test
```

If you want to be able to edit source files and have those changes take immediate effect when calling `pyani` (useful for testing), clone the GitHub repository with:

```bash
git clone https://github.com/widdowquinn/pyani.git
```

then inside the new `pyani` directory run:

```bash
pip install -e .
```

This is the [`pip install --editable`](https://pip.pypa.io/en/stable/cli/pip_install/#install-editable) command, which links the installed package to the specified location (here `.`, i.e. the current directory) rather than the usual package location (`site-packages`). When using this option, edits to the source code are immediately available in the installed package. This allows you to test changes to the source code as you make them, without the need for an additional uninstall/install step.
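
One way to confirm that an editable install is in effect is to check where Python imports the package from. The helper below is a minimal sketch using only the standard library; `package_location` is a hypothetical name, not part of `pyani`.

```python
import importlib.util
from pathlib import Path


def package_location(name):
    """Return the directory a top-level module is imported from, or None.

    None is returned when the module cannot be found or has no
    file location (for example built-in modules).
    """
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.has_location:
        return None
    return Path(spec.origin).resolve().parent


if __name__ == "__main__":
    # After `pip install -e .` this would be expected to point into the
    # cloned repository rather than into site-packages.
    print(package_location("pyani"))
```

If the printed path is inside `site-packages`, the package was most likely installed in the usual (non-editable) way.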

#### Cleaning up development environment

You can remove the `conda` development environment with the following commands:
@@ -219,7 +233,7 @@ A good long description could be
> This fix improves efficiency of the veeblefetzer. The main change is replacing a
> nested loop with asyncio calls to a new function `fetzveebles()`. This commit
> affects `veebles.py`, and new tests are added in `test_veeblefetzer.py`.
>
>
> fixes #246
A bad long description might be
2 changes: 1 addition & 1 deletion Makefile
@@ -41,7 +41,7 @@ clean_walkthrough:
walkthrough: clean_walkthrough
pyani download --email [email protected] -t 203804 C_blochmannia
pyani createdb -f
pyani anim C_blochmannia C_blochmannia_ANIm \
pyani anim -i C_blochmannia -o C_blochmannia_ANIm \
--name "C. blochmannia run 1" \
--labels C_blochmannia/labels.txt --classes C_blochmannia/classes.txt
pyani report --runs C_blochmannia_ANIm/ --formats html,excel,stdout
45 changes: 23 additions & 22 deletions README.md
@@ -34,6 +34,7 @@ and we are grateful to all who have contributed to this software:
<td align="center"><a href="https://b-brankovics.github.io"><img src="https://avatars.githubusercontent.com/u/6728856?v=4?s=100" width="100px;" alt=""/><br /><sub><b>Balázs Brankovics</b></sub></a><br /><a href="https://github.com/widdowquinn/pyani/commits?author=b-brankovics" title="Code">💻</a> <a href="https://github.com/widdowquinn/pyani/issues?q=author%3Ab-brankovics" title="Bug reports">🐛</a></td>
<td align="center"><a href="https://github.com/sammywinchester19"><img src="https://avatars.githubusercontent.com/u/67588791?v=4?s=100" width="100px;" alt=""/><br /><sub><b>sammywinchester19</b></sub></a><br /><a href="https://github.com/widdowquinn/pyani/issues?q=author%3Asammywinchester19" title="Bug reports">🐛</a></td>
<td align="center"><a href="https://github.com/TSL-RamKrishna"><img src="https://avatars.githubusercontent.com/u/20773891?v=4?s=100" width="100px;" alt=""/><br /><sub><b>Ram Krishna Shrestha</b></sub></a><br /><a href="https://github.com/widdowquinn/pyani/commits?author=TSL-RamKrishna" title="Tests">⚠️</a> <a href="https://github.com/widdowquinn/pyani/commits?author=TSL-RamKrishna" title="Code">💻</a> <a href="#ideas-TSL-RamKrishna" title="Ideas, Planning, & Feedback">🤔</a></td>
<td align="center"><a href="https://github.com/dparks1134"><img src="https://avatars.githubusercontent.com/u/3688336?v=4?s=100" width="100px;" alt=""/><br /><sub><b>Donovan Parks</b></sub></a><br /><a href="https://github.com/widdowquinn/pyani/issues?q=author%3Adparks1134" title="Bug reports">🐛</a></td>
</tr>
</table>

@@ -63,7 +64,7 @@ DOI: [10.1039/C5AY02550H](https://doi.org/10.1039/C5AY02550H)
[![pyani sourcerank](https://img.shields.io/librariesio/sourcerank/pypi/pyani.svg?logo=koding&logoColor=white)](https://libraries.io/pypi/pyani)

<!-- ALL-CONTRIBUTORS-BADGE:START - Do not remove or modify this section -->
[![All Contributors](https://img.shields.io/badge/all_contributors-10-orange.svg?style=flat-square)](#contributors-)
[![All Contributors](https://img.shields.io/badge/all_contributors-11-orange.svg?style=flat-square)](#contributors-)
<!-- ALL-CONTRIBUTORS-BADGE:END -->

[![pyani PyPi version](https://img.shields.io/pypi/v/pyani "PyPI version")](https://pypi.python.org/pypi/pyani)
@@ -132,7 +133,7 @@ DOI: [10.1039/C5AY02550H](https://doi.org/10.1039/C5AY02550H)

Where available, `pyani` can take advantage of multicore systems, and integrates with [SGE/OGE](http://gridscheduler.sourceforge.net/)-type job schedulers for the sequence comparisons.

`pyani` installs the prgram `pyani`, which enables command-line based analysis of genomes.
`pyani` installs the program `pyani`, which enables command-line based analysis of genomes.

-----

@@ -235,10 +236,10 @@ The first step is to obtain genome data for analysis. `pyani` expects to find ea
We'll use the `pyani download` subcommand to download all available genomes for *Candidatus Blochmannia* from NCBI. The taxon ID for this grouping is [203804](https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=203804&lvl=3&lin=f&keep=1&srchmode=1&unlock).

```bash
pyani download C_blochmannia --email [email protected] -t 203804 -v -l C_blochmannia_dl.log
pyani download -o C_blochmannia --email [email protected] -t 203804 -v -l C_blochmannia_dl.log
```

The first argument is the output directory into which the downloaded genomes will be written (`C_blochmannia`). To download anything from NCBI we must provide an email address (`--email [email protected]`), and to specify which taxon subtree we want to download we provide the taxon ID (`-t 203804`).
The first argument is the output directory into which the downloaded genomes will be written (`-o C_blochmannia`). To download anything from NCBI we must provide an email address (`--email [email protected]`), and to specify which taxon subtree we want to download we provide the taxon ID (`-t 203804`).

Here we also request verbose output (`-v`), and write a log file for reproducible research/diagnosing bugs and errors (`-l C_blochmannia_dl.log`).

@@ -249,27 +250,27 @@ $ tree C_blochmannia
C_blochmannia
├── GCF_000011745.1_ASM1174v1_genomic.fna
├── GCF_000011745.1_ASM1174v1_genomic.fna.gz
├── GCF_000011745.1_ASM1174v1_genomic.md5
├── GCF_000011745.1_ASM1174v1_genomic.fna.md5
├── GCF_000011745.1_ASM1174v1_hashes.txt
├── GCF_000043285.1_ASM4328v1_genomic.fna
├── GCF_000043285.1_ASM4328v1_genomic.fna.gz
├── GCF_000043285.1_ASM4328v1_genomic.md5
├── GCF_000043285.1_ASM4328v1_genomic.fna.md5
├── GCF_000043285.1_ASM4328v1_hashes.txt
├── GCF_000185985.2_ASM18598v2_genomic.fna
├── GCF_000185985.2_ASM18598v2_genomic.fna.gz
├── GCF_000185985.2_ASM18598v2_genomic.md5
├── GCF_000185985.2_ASM18598v2_genomic.fna.md5
├── GCF_000185985.2_ASM18598v2_hashes.txt
├── GCF_000331065.1_ASM33106v1_genomic.fna
├── GCF_000331065.1_ASM33106v1_genomic.fna.gz
├── GCF_000331065.1_ASM33106v1_genomic.md5
├── GCF_000331065.1_ASM33106v1_genomic.fna.md5
├── GCF_000331065.1_ASM33106v1_hashes.txt
├── GCF_000973505.1_ASM97350v1_genomic.fna
├── GCF_000973505.1_ASM97350v1_genomic.fna.gz
├── GCF_000973505.1_ASM97350v1_genomic.md5
├── GCF_000973505.1_ASM97350v1_genomic.fna.md5
├── GCF_000973505.1_ASM97350v1_hashes.txt
├── GCF_000973545.1_ASM97354v1_genomic.fna
├── GCF_000973545.1_ASM97354v1_genomic.fna.gz
├── GCF_000973545.1_ASM97354v1_genomic.md5
├── GCF_000973545.1_ASM97354v1_genomic.fna.md5
├── GCF_000973545.1_ASM97354v1_hashes.txt
├── classes.txt
└── labels.txt
@@ -279,7 +280,7 @@ Seven genomes have been downloaded, and each is represented by four files:

- `_genomic.fna.gz`: the compressed genome sequence
- `_genomic.fna`: the uncompressed genome sequence
- `_genomic.md5`: an MD5 hash/checksum of the (uncompressed) genome sequence; this was generated during the download
- `_genomic.fna.md5`: an MD5 hash/checksum of the (uncompressed) genome sequence; this was generated during the download
- `_hashes.txt`: a list of MD5 hashes; this is provided by NCBI and is a reference to be sure that the download did not corrupt the genome sequence
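
The checksum comparison described in this list can be sketched with Python's standard `hashlib` module. This is not `pyani`'s own implementation: the function names are hypothetical, and the sketch assumes the hash list uses the common `md5sum` layout (hash first, then the filename, one entry per line), which the text above does not confirm.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_is_listed(path, hashes_path):
    """Return True if the file's MD5 is the first field of any line.

    Assumes an md5sum-style listing: "<hash>  <filename>" per line.
    """
    expected = md5_of_file(path)
    with open(hashes_path) as handle:
        return any(line.split()[:1] == [expected] for line in handle)
```

Reading in chunks keeps memory use flat for large genome files; a mismatch between the computed and listed hash suggests a corrupted or incomplete download.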

There are two additional plain text files: `classes.txt` and `labels.txt`, which provide alternative labels for use in the analysis. These files are generated during the download.
@@ -316,7 +317,7 @@ Subsequent `pyani` commands will assume this location for the database, but you
In this walkthrough, we'll run ANIm on the downloaded genomes, using the command:

```bash
pyani anim C_blochmannia C_blochmannia_ANIm -v -l C_blochmannia_ANIm.log \
pyani anim -i C_blochmannia -o C_blochmannia_ANIm -v -l C_blochmannia_ANIm.log \
--name "C. blochmannia run 1" \
--labels C_blochmannia/labels.txt --classes C_blochmannia/classes.txt
```
@@ -332,10 +333,10 @@ One reason for using a database backend for analysis results is so that, for ver
You can test this for yourself by running the analysis command again, as below. You will see a number of messages indicating that genomes have been seen before, and that analyses performed before were skipped:

```bash
$ pyani anim C_blochmannia C_blochmannia_ANIm -v -l C_blochmannia_ANIm.log \
$ pyani anim -i C_blochmannia -o C_blochmannia_ANIm -v -l C_blochmannia_ANIm.log \
--name "C. blochmannia run 2" \
--labels C_blochmannia/labels.txt --classes C_blochmannia/classes.txt
INFO: command-line: pyani anim C_blochmannia C_blochmannia_ANIm -v -l C_blochmannia_ANIm.log
INFO: command-line: pyani anim -i C_blochmannia -o C_blochmannia_ANIm -v -l C_blochmannia_ANIm.log
INFO: Running ANIm analysis
INFO: Adding analysis information to database .pyani/pyanidb
INFO: Current analysis has ID 2 in this database
@@ -367,9 +368,9 @@ Once an analysis is run, the results are placed in a local `SQLite` database, wh
The report tables are written to a named directory (compulsory argument), and are written by default to a `.tab` plain-text format, but HTML and Excel format can also be requested with the `--formats` argument:

```bash
$ pyani report -v --runs C_blochmannia_ANIm/ --formats html,excel,stdout
INFO: Processed arguments: Namespace(cmdline='./pyani report -v --runs C_blochmannia_ANIm/ --formats html,excel', dbpath='.pyani/pyanidb', formats='html,excel', func=<function subcmd_report at 0x10c674a60>, logfile=None, outdir='C_blochmannia_ANIm/', run_results=False, show_genomes=False, show_genomes_runs=False, show_runs=True, show_runs_genomes=False, verbose=True)
INFO: command-line: ./pyani report -v --runs C_blochmannia_ANIm/ --formats html,excel
$ pyani report -v --runs -o C_blochmannia_ANIm/ --formats html,excel,stdout
INFO: Processed arguments: Namespace(cmdline='./pyani report -v --runs -o C_blochmannia_ANIm/ --formats html,excel', dbpath='.pyani/pyanidb', formats='html,excel', func=<function subcmd_report at 0x10c674a60>, logfile=None, outdir='C_blochmannia_ANIm/', run_results=False, show_genomes=False, show_genomes_runs=False, show_runs=True, show_runs_genomes=False, verbose=True)
INFO: command-line: ./pyani report -v --runs -o C_blochmannia_ANIm/ --formats html,excel
INFO: Creating output in formats: ['excel', 'tab', 'html']
INFO: Using database: .pyani/pyanidb
INFO: Writing table of pyani runs from the database to C_blochmannia_ANIm/runs.*
@@ -385,9 +386,9 @@ C_blochmannia_ANIm/
To see all of the pairwise results for an individual run, the run ID must be provided. It is possible to get results for more than one run ID by providing a comma-separated list of run IDs (though each run's results will be provided in a separate file):

```bash
$ pyani report -v --runs C_blochmannia_ANIm/ --formats html,excel --run_results 1,2,3,4
INFO: Processed arguments: Namespace(cmdline='./pyani report -v --runs C_blochmannia_ANIm/ --formats html,excel --run_results 1,2,3,4', dbpath='.pyani/pyanidb', formats='html,excel', func=<function subcmd_report at 0x108616a60>, logfile=None, outdir='C_blochmannia_ANIm/', run_results='1,2,3,4', show_genomes=False, show_genomes_runs=False, show_runs=True, show_runs_genomes=False, verbose=True)
INFO: command-line: ./pyani report -v --runs C_blochmannia_ANIm/ --formats html,excel --run_results 1,2,3,4
$ pyani report -v --runs -o C_blochmannia_ANIm/ --formats html,excel --run_results 1,2,3,4
INFO: Processed arguments: Namespace(cmdline='./pyani report -v --runs -o C_blochmannia_ANIm/ --formats html,excel --run_results 1,2,3,4', dbpath='.pyani/pyanidb', formats='html,excel', func=<function subcmd_report at 0x108616a60>, logfile=None, outdir='C_blochmannia_ANIm/', run_results='1,2,3,4', show_genomes=False, show_genomes_runs=False, show_runs=True, show_runs_genomes=False, verbose=True)
INFO: command-line: ./pyani report -v --runs -o C_blochmannia_ANIm/ --formats html,excel --run_results 1,2,3,4
INFO: Creating output in formats: ['tab', 'excel', 'html']
INFO: Using database: .pyani/pyanidb
INFO: Writing table of pyani runs from the database to C_blochmannia_ANIm/runs.*
@@ -402,7 +403,7 @@ INFO: Completed. Time taken: 1.285
You can see a run's results in the terminal by specifying the `stdout` format. For example, to see the identity, coverage, and other output matrices, you would specify `--run_matrices <RUN>` and `--formats=stdout` as below:

```bash
$ pyani report C_blochmannia_ANIm --formats=stdout --run_matrices 1
$ pyani report -o C_blochmannia_ANIm --formats=stdout --run_matrices 1
TABLE: C_blochmannia_ANIm/matrix_identity_1
C. Blochmannia pennsylvanicus BPEN C. Blochmannia floridanus C. Blochmannia vafer BVAF C. Blochmannia chromaiodes 640 B. endosymbiont of Polyrhachis (Hedomyrma) turneri 675 B. endosymbiont of Camponotus (Colobopsis) obliquus 757
C. Blochmannia pennsylvanicus BPEN 1.000000 0.834866 0.836903 0.980244 0.843700 0.829509
@@ -454,7 +455,7 @@ B. endosymbiont of Camponotus (Colobopsis) obli... 0.
The output of a `pyani` run can also be represented graphically, using the `plot` subcommand. For example, the command:

```bash
pyani plot C_blochmannia_ANIm 1 -v --formats png,pdf
pyani plot -o C_blochmannia_ANIm --run_id 1 -v --formats png,pdf
```

will place `.pdf` and `.png` format output in the `C_blochmannia_ANIm` output directory for the run with ID 1, generated above. Five heatmaps are generated:
2 changes: 1 addition & 1 deletion README_v_0_2_x.md
@@ -213,7 +213,7 @@ Command-line options can be viewed using:

```bash
$ genbank_get_genomes_by_taxon.py -h
usage: genbacnk_get_genomes_by_taxon.py [-h] [-o OUTDIRNAME] [-t TAXON] [-v]
usage: genbank_get_genomes_by_taxon.py [-h] [-o OUTDIRNAME] [-t TAXON] [-v]
[-f] [--noclobber] [-l LOGFILE]
[--format FORMAT] [--email EMAIL]
[--retries RETRIES]
2 changes: 1 addition & 1 deletion docs/basic_use.rst
@@ -12,4 +12,4 @@ Basic Use
indexing
createdb
run_anim

interpreting_plots