Merge pull request #35 from MDU-PHL/pubMLST
Major update
stroehleina authored Oct 27, 2022
2 parents 2d68493 + 3d9b8d6 commit b44e980
Showing 34 changed files with 102,035 additions and 70,185 deletions.
3 changes: 1 addition & 2 deletions .bumpversion.cfg
@@ -1,7 +1,6 @@
[bumpversion]
current_version = 0.5.5
current_version = 1.0.0
commit = True
tag = True

[bumpversion:file:ngmaster/__init__.py]

4 changes: 2 additions & 2 deletions .github/workflows/python-package.yml
@@ -25,7 +25,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: [3.6, 3.7, 3.8, 3.9]
python-version: [3.7, 3.8, 3.9]

steps:
- uses: actions/checkout@v2
@@ -38,7 +38,7 @@ jobs:
- name: Install ngmaster and dependencies
shell: bash -l {0}
run: |
conda install flake8 ispcr
conda install flake8 mlst
pip install .
- name: Lint with flake8
shell: bash -l {0}
1 change: 1 addition & 0 deletions MANIFEST.in
@@ -1,2 +1,3 @@
graft ngmaster/db
graft ngmaster/test
graft ngmaster/scripts
96 changes: 53 additions & 43 deletions README.md
@@ -5,60 +5,63 @@

# ngmaster

*In silico* multi-antigen sequence typing for *Neisseria gonorrhoeae* (NG-MAST).
*In silico* **m**ulti-**a**ntigen **s**equence **t**yping for ***N**eisseria **g**onorrhoeae* (NG-MAST) and
_**N**eisseria **g**onorrhoeae_ **s**equence **t**yping for **a**ntimicrobial **r**esistance (NG-STAR).

## Synopsis
```
% ngmaster gono.fa
ID NG-MAST POR TBPB
gono.fa 10699 6277 4
ngmaster gono.fa
FILE SCHEME NG-MAST/NG-STAR porB_NG-MAST tbpB penA mtrR porB_NG-STAR ponA gyrA parC 23S
gono.fa ngmaSTar 4186/231 2569 241 23 42 100 100 10 2 100
```
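
The new output line can be read back with a few lines of Python. The following is a minimal sketch, not part of `ngmaster` itself: it assumes the output is tab-separated (as the README states) and that the combined `NG-MAST/NG-STAR` column always joins the two types with a `/`, as in the example above. The function name `parse_ngmaster_tsv` is hypothetical.

```python
import csv
import io


def parse_ngmaster_tsv(text):
    """Parse tab-separated ngmaster output into a list of dicts.

    Assumption: the combined "NG-MAST/NG-STAR" column holds both sequence
    types separated by "/", as in the README example.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for row in reader:
        # Split the combined column into two separate keys for convenience.
        ngmast, _, ngstar = row["NG-MAST/NG-STAR"].partition("/")
        row["NG-MAST"] = ngmast
        row["NG-STAR"] = ngstar
        rows.append(row)
    return rows


# Example input copied from the synopsis, with tabs written out explicitly.
example = (
    "FILE\tSCHEME\tNG-MAST/NG-STAR\tporB_NG-MAST\ttbpB\tpenA\tmtrR\t"
    "porB_NG-STAR\tponA\tgyrA\tparC\t23S\n"
    "gono.fa\tngmaSTar\t4186/231\t2569\t241\t23\t42\t100\t100\t10\t2\t100\n"
)
rows = parse_ngmaster_tsv(example)
print(rows[0]["FILE"], rows[0]["NG-MAST"], rows[0]["NG-STAR"])
```

All values are kept as strings because allele calls may carry symbols such as `~` or `?` rather than plain integers.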

## Dependencies

* [Python >= 3.6](https://www.python.org/)
* [Python >= 3.7](https://www.python.org/)
* [BioPython](http://biopython.org/)
* [isPcr >= v33x2](http://hgwdev.cse.ucsc.edu/~kent/src/) by Jim Kent
* [mlst](https://github.com/tseemann/mlst)

## Installation

#### PyPI
```
# TODO how to integrate mlst dependency
pip3 install ngmaster
```
#### Brew
```
# TODO how to integrate mlst dependency
brew install brewsci/bio/ngmaster
```
#### Conda
```
conda install -c conda-forge -c bioconda -c defaults ngmaster # COMING SOON
conda install -c bioconda ngmaster
```

## Test

Once installed, you can run the following to ensure `ngmaster` is successfully working:

$ ngmaster --test
ngmaster --test

If everything works, you will see the following:

```
Running ngmaster.py on test example (NG-MAST 10699) ...
$ ngmaster.py test/test.fa
ID NG-MAST POR TBPB
test.fa 10699 6277 4
Running ngmaster.py on test example (NG-MAST 4186 / NG-STAR 231) ...
FILE SCHEME NG-MAST/NG-STAR porB_NG-MAST tbpB penA mtrR porB_NG-STAR ponA gyrA parC 23S
test.fa ngmaSTar 4186/231 2569 241 23 42 100 100 10 2 100
... Test successful.
```

## Usage

$ ngmaster -h
ngmaster -h

usage:
ngmaster [OPTIONS] <fasta1> <fasta2> <fasta3> ... <fastaN>

In silico multi-antigen sequence typing for Neisseria gonorrhoeae (NG-MAST)
and Neisseria gonorrhoeae Sequence Typing for Antimicrobial Resistance (NG-STAR)

Please cite as:
Kwong JC, Goncalves da Silva A, Howden BP and Seemann T.
@@ -71,24 +74,37 @@ test.fa 10699 6277 4
optional arguments:
-h, --help show this help message and exit
--db DB specify custom directory containing allele databases
directory must contain database files "POR.tfa", "TBPB.tfa", and "ng_mast.txt"
--csv output comma-separated format (CSV) rather than tab-separated
--printseq FILE specify filename to save allele sequences to (default=off)
--updatedb update allele database from <www.ng-mast.net>
directory must contain database sequence files (.tfa) and allele profile files (ngmast.txt / ngstar.txt)
in mlst format (see <https://github.com/tseemann/mlst#adding-a-new-scheme>)
--csv output comma-separated format (CSV) rather than tab-separated
--printseq FILE specify filename to save allele sequences to
--minid MINID DNA percent identity of full allele to consider 'similar' [~]
--mincov MINCOV DNA percent coverage to report partial allele at [?]
--updatedb update NG-MAST and NG-STAR allele databases from <https://rest.pubmlst.org/db/pubmlst_neisseria_seqdef>
--assumeyes assume you are certain you wish to update db
--test run test example
--comments include NG-STAR comments for each allele in output
--version show program's version number and exit


## Quick start

**To perform *in silico* NG-MAST on FASTA files:**
**To perform *in silico* NG-MAST and NG-STAR typing on FASTA files:**

`$ ngmaster <fasta1> <fasta2> <fasta3> ... <fastaN>`

The NG-MAST result and allele numbers are printed in tab-separated format to `stdout`.
* If an allele is not found (ie. unable to be located with primers), the allele result is "``".
* If an allele is found (ie. located with primers), but the conserved region containing the starting key motif required for sequence trimming cannot be located, the allele result is "`no_key`".
* If an allele is found (ie. located with primers), but the trimmed sequence is novel, and not in the current database, the allele result is "`new`".
The NG-MAST and NG-STAR results and allele numbers are printed in tab-separated format to `stdout`.

* `ngmaster` reports alleles according to the same rules that are implemented in `mlst`.
* `mlst`'s arguments `--minid` and `--mincov` are available directly in `ngmaster`.
* For each allele n:

Symbol | Meaning | Length | Identity
--- | --- | --- | ---
`n` | exact intact allele | 100% | 100%
`~n` | novel full length allele similar to n | 100% | &ge; `--minid`
`n?` | partial match to known allele | &ge; `--mincov` | &ge; `--minid`
`-` | allele missing | &lt; `--mincov` | &lt; `--minid`
`n,m` | multiple alleles | &nbsp; | &nbsp;
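
As an illustration of the symbol table above, the following minimal sketch classifies a single allele call string. It is not taken from `ngmaster` or `mlst`; the function name `classify_allele` and the returned labels are hypothetical, and the sketch assumes allele numbers are plain integers. Anything that does not match the table is reported as `unknown` rather than guessed.

```python
import re


def classify_allele(call):
    """Map an allele call string to a label following the symbol table.

    Assumption: allele numbers are plain integers; other forms are
    returned as "unknown".
    """
    call = call.strip()
    if call == "-":
        return "missing"      # allele missing
    if "," in call:
        return "multiple"     # n,m : multiple alleles
    if re.fullmatch(r"~\d+", call):
        return "novel"        # ~n : novel full-length allele similar to n
    if re.fullmatch(r"\d+\?", call):
        return "partial"      # n? : partial match to a known allele
    if re.fullmatch(r"\d+", call):
        return "exact"        # n : exact intact allele
    return "unknown"


for call in ["23", "~23", "23?", "-", "2,10"]:
    print(call, classify_allele(call))
```

A check like this can be used to flag samples whose typing result rests on a novel or partial allele before the results are summarised.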

**To save results to a tab-separated text file, redirect `stdout`:**

@@ -98,13 +114,18 @@ The NG-MAST result and allele numbers are printed in tab-separated format to `st

`$ ngmaster --csv <fasta1> <fasta2> <fasta3> ... <fastaN>`

**To save sequences of the alleles to a file (eg. for uploading "new" sequences to [http://www.ng-mast.net](http://www.ng-mast.net/)):**
**To save sequences of the alleles to a file (eg. for uploading "new" sequences to [PubMLST](https://rest.pubmlst.org/db/pubmlst_neisseria_seqdef/)):**

`$ ngmaster --printseq [filename] <fasta1> <fasta2> <fasta3> ... <fastaN>`

This will create two files:

1. `NGMAST__filename`
2. `NGSTAR__filename`

## Updating the allele databases

**To update the allele databases from http://www.ng-mast.net :**
**To update the allele databases from [PubMLST](https://rest.pubmlst.org/db/pubmlst_neisseria_seqdef/):**
*Warning: This will overwrite the existing databases so ensure you back them up if you wish to keep them.*

$ ngmaster.py --updatedb
@@ -120,15 +141,9 @@ This can then be specified when running ngmaster using the ```--db path/to/fold

## Creating a custom allele database

1. Create custom database files: `POR.tfa`, `TBPB.tfa`, `ng_mast.txt`
See default `db` directory for examples.
`POR.tfa` and `TBPB.tfa` contain the respective allele sequences in FASTA format.
`ng_mast.txt` contains a list of NG-MAST types and the corresponding allele types.

2. Place the custom database files in a folder.

3. Specify the path to that custom database folder:
`$ ngmaster --db [/path/to/custom/folder/] <fasta1> <fasta2> <fasta3> ... <fastaN>`
To create a custom allele database please follow the instructions for creating a custom ```mlst``` database
described [here](https://github.com/tseemann/mlst#adding-a-new-scheme).
Usually this should not be necessary: run `ngmaster --updatedb` to update to the latest NG-MAST and NG-STAR schemes from PubMLST.
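
Before pointing `--db` at a custom directory, it can help to confirm that the expected files are present. The sketch below is only an illustration under stated assumptions: the file names `ngmast.txt` and `ngstar.txt` and the `.tfa` extension come from the help text, but the exact directory layout is not described here, so the check searches recursively. The function name `check_custom_db` and the demo file names are hypothetical.

```python
import tempfile
from pathlib import Path


def check_custom_db(db_dir):
    """Return a list of problems found in a candidate --db directory.

    Assumption: the layout below the directory is unknown, so files are
    searched for recursively by name.
    """
    root = Path(db_dir)
    if not root.is_dir():
        return [f"{root} is not a directory"]
    problems = []
    # Allele sequences are expected as .tfa files somewhere under the directory.
    if next(root.rglob("*.tfa"), None) is None:
        problems.append("no .tfa allele sequence files found")
    # Allele profile files named in the help text.
    for profile in ("ngmast.txt", "ngstar.txt"):
        if next(root.rglob(profile), None) is None:
            problems.append(f"missing profile file {profile}")
    return problems


# Demo with a throwaway directory that deliberately lacks ngstar.txt.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "porB.tfa").write_text(">porB_1\nACGT\n")
    (Path(tmp) / "ngmast.txt").write_text("ST\tporB\ttbpB\n")
    demo_problems = check_custom_db(tmp)
print(demo_problems)
```

This only checks that the files exist; it does not validate that their contents follow the `mlst` scheme format.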

## Citation

@@ -141,30 +156,25 @@ DOI:[10.1099/mgen.0.000076](https://doi.org/10.1099/mgen.0.000076)

## Bugs

### Software
Please submit via the [GitHub issues page](https://github.com/MDU-PHL/ngmaster/issues).

### Database
Note that the NG-MAST databases and website are curated and hosted at the
Department of Infectious Disease Epidemiology, Imperial College London. For
issues with the NG-MAST databases, please contact the [NG-MAST
curator](mailto:[email protected]).

## Software Licence

[GPLv2](https://github.com/MDU-PHL/ngmaster/blob/master/LICENSE)

## References

* Martin et al. J Infect Dis, 2004 Apr 15; 189(8): 1497-1505.
* See also [http://www.ng-mast.net](http://www.ng-mast.net/).
* [Martin et al. J Infect Dis, 2004 Apr 15; 189(8): 1497-1505](https://doi.org/10.1086/383047).
* [Demczuk et al. J Clin Microbiol, 2017 May; 55(5): 1454-1468](https://doi.org/10.1128/jcm.00100-17).
* See also [PubMLST](https://rest.pubmlst.org/db/pubmlst_neisseria_seqdef/).

## Authors

* Jason Kwong (@kwongjc)
* Anders Gonçalves da Silva (@drandersgs)
* Mark Schultz (@schultzm)
* Torsten Seemann (@torstenseemann)
* Andreas Stroehlein (@stroehleina)

## Development

2 changes: 1 addition & 1 deletion ngmaster/__init__.py
@@ -1,5 +1,5 @@
# Script by Jason Kwong & Torsten Seemann
# In silico multi-antigen sequence typing for Neisseria gonorrhoeae (NG-MAST)

__version__ = "0.5.8"
__version__ = "1.0.0"
