Releases: phac-nml/ecoli_serotyping

v2.0.0: Pathotyping and Shiga toxin typing

12 Dec 12:30
21f2cbd
  • Updated species identification module, now based on a GTDB plus custom Escherichia and Shigella sketch covering all known bacterial species
  • Implemented pathotyping covering 7 diarrheagenic E. coli (DEC) pathotypes (DAEC, EAEC, EHEC, EIEC, EPEC, ETEC and STEC), with support for multiple signatures present simultaneously (e.g. ETEC/STEC). Note that EHEC is reported as EHEC-STEC because it is a more severe subtype of STEC.
  • Implemented Shiga toxin 1 and 2 typing, with support for multiple toxin signatures present in a single sample.
    • A total of 4 stx1 subtypes are supported: stx1a, stx1c, stx1d and stx1e.
    • A total of 15 stx2 subtypes are supported: stx2a, stx2b, stx2c, stx2d, stx2e, stx2f, stx2g, stx2h, stx2i, stx2j, stx2k, stx2l, stx2m, stx2n and stx2o.
  • New database of pathotypes and toxins in a clear, transparent JSON format, composed of the key virulence factors drawn from both BioNumerics and literature sources
  • Support for gzip-compressed inputs (fastq.gz and fasta.gz), saving storage and increasing versatility
  • Other toxin typing covering enterohemolysin A (ehxA), hemolysin E (hlyE) and hemolysin A (hlyA)
  • Support for long raw reads, improving the mapping capabilities of bowtie2
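
As a rough illustration of what gzip-compressed input support involves, the sketch below opens a FASTA file transparently whether or not it is compressed. It is not taken from the ectyper code base: the helper names open_maybe_gzip and count_fasta_records, and the demo file names, are hypothetical.

```python
import gzip
import tempfile
from pathlib import Path


def open_maybe_gzip(path):
    """Open a text file, decompressing on the fly when the name ends in .gz.

    Hypothetical helper for illustration; not ectyper's own implementation.
    """
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")


def count_fasta_records(path):
    """Count FASTA records by counting header lines that start with '>'."""
    with open_maybe_gzip(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


# Demo: the same two records written as plain and as gzip-compressed FASTA.
with tempfile.TemporaryDirectory() as tmp:
    plain = Path(tmp) / "sample.fasta"
    packed = Path(tmp) / "sample.fasta.gz"
    text = ">contig1\nACGT\n>contig2\nGGCC\n"
    plain.write_text(text)
    with gzip.open(packed, "wt") as handle:
        handle.write(text)
    plain_count = count_fasta_records(plain)
    packed_count = count_fasta_records(packed)
```

Dispatching on the file extension keeps the calling code identical for both input types; a stricter variant could check the gzip magic bytes instead of trusting the name.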

v1.0.0: E. coli serotyping with QC module and adaptive thresholding

24 Apr 05:39

Major improvements:

  • Incorporation of a Quality Control module that makes results easier to interpret and indicates when corrective measures (re-sequencing, wet-lab serotyping) are needed. Unique allele-level thresholds determine whether a given allele and the query quality parameters (%identity and %coverage) are sufficient to resolve an antigen call unambiguously.
  • Cluster-friendly behaviour supporting multiple instances via a .lock file, which prevents race conditions and simultaneous database updates by several instances
  • An updated allele database with duplicated or truncated alleles removed (e.g. O157 antigen)
  • Improved species identification resolution for highly similar non-E. coli species such as Shigella and E. albertii. Species identification is now done only via the MASH NCBI RefSeq sketch (https://gembox.cbcb.umd.edu/mash/refseq.genomes.k21s1000.msh)
  • Users can add new alleles to an existing allele database and make serotype predictions with a custom allele database via the --dbpath parameter
  • Improved O and H antigen call rates and accuracy thanks to decoupling of the %identity and %coverage thresholds for each antigen. The global thresholds can now be specified separately. This is especially important if one of the antigen genes (e.g. wzx/wzy or fliC) is truncated or has low coverage
  • Improved adaptive O antigen calling when only a single O antigen candidate is available in the preliminary BLAST results, allowing an accurate O antigen call even in poorly sequenced samples with minimal coverage
  • Addition of mixed O antigen calls for highly similar O antigens (e.g. O17/O77)
  • Allele names/keys used to make antigen calls are also reported, making it easier to troubleshoot dubious alleles and clean the allele database
  • More detailed error messages and support for 16 high-similarity O antigens (%identity > 99%) based on the reference publication PMID: 25428893
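
To make the decoupled %identity and %coverage thresholds more concrete, here is a minimal sketch of per-antigen filtering. Everything in it is an assumption for illustration: the Thresholds class, the passes function and the numeric cut-offs are hypothetical and are not ectyper's actual defaults or code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    min_identity: float  # minimum %identity for a hit to count
    min_coverage: float  # minimum %coverage for a hit to count


# Hypothetical cut-offs, chosen only to show that O and H can differ.
THRESHOLDS = {
    "O": Thresholds(min_identity=90.0, min_coverage=90.0),
    "H": Thresholds(min_identity=95.0, min_coverage=50.0),
}


def passes(antigen, identity, coverage, thresholds=THRESHOLDS):
    """Return True when a hit meets both cut-offs for its antigen type."""
    limits = thresholds[antigen]
    return identity >= limits.min_identity and coverage >= limits.min_coverage


# The same hit quality is accepted for H but rejected for O, because
# coverage is judged against a separate threshold for each antigen.
h_ok = passes("H", identity=97.0, coverage=60.0)
o_ok = passes("O", identity=97.0, coverage=60.0)
```

Keeping the two cut-offs independent per antigen means a truncated gene with high identity but reduced coverage can still support a call for one antigen without loosening the criteria for the other.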

Minor bug fixes in species identification and increased robustness of the --verify switch

07 Dec 15:05
a7e67b6
Merge pull request #78 from kbessonov1984/master

Version 0.9.1 addressing minor issues with species identification and FASTA file handling

E. coli serotyping with the ability to differentiate between Shigella and other Escherichia cryptic species

05 Oct 23:38
bcc0e6a
  • improved O-antigen serotyping coverage of complex samples that lack some O-antigen signatures
  • better handling of complex cases and error recovery when reference allele coverage is poor
  • improved O-antigen identification precision by favoring the presence of both alleles (e.g. wzx and wzy) to support the final call. The sum of scores for both alleles of the same antigen is now used in ranking
  • automatic download and update of RefSeq genome sketches every 6 months
  • addition of Quality Control flags to the output (as an extra column in results.tsv) for easier interpretation of results
  • improved species identification for FASTQ files: all raw reads are now used for species identification
  • default query length coverage threshold lowered from 50% to 10% to account for truncated alleles. This greatly improved the sensitivity of the tool without significantly changing its specificity
  • additional unit tests written to cover all aspects of the program
  • file locking applied when updating the RefSeq sketch and assembly stats files
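
The ranking by summed allele scores mentioned above can be sketched as follows. This is an illustrative reading of the release note rather than the tool's code: the function rank_o_antigens, the (antigen, gene, score) tuple layout and the example scores are all assumptions.

```python
from collections import defaultdict


def rank_o_antigens(hits):
    """Rank O antigens by the summed best score of each of their genes.

    hits is an iterable of (antigen, gene, score) tuples; this layout is
    hypothetical and chosen only for the example.
    """
    best = defaultdict(dict)
    for antigen, gene, score in hits:
        # Keep only the best-scoring allele per gene for each antigen.
        best[antigen][gene] = max(score, best[antigen].get(gene, 0.0))
    totals = {antigen: sum(genes.values()) for antigen, genes in best.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# O26 has the single best hit, but O157 is supported by both wzx and wzy,
# so its summed score ranks it first.
hits = [
    ("O157", "wzx", 0.98),
    ("O157", "wzy", 0.97),
    ("O26", "wzx", 0.99),
]
ranking = rank_o_antigens(hits)
```

Summing over genes rewards antigens supported by both alleles, which is the behaviour the note describes, while an antigen with one strong hit alone falls lower in the ranking.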

ectyper

20 Sep 16:37
754ff4b

Updates output file from output.csv to output.tsv.
Adds column headers to the output.

ectyper

19 Jul 19:49
09d4a83

Adds multicore support and improved prediction accuracy.

ectyper

05 Jul 15:53
3988788

bioconda version

ectyper

04 Jul 21:08
e7cec06

Code cleanup and slightly modified output format.

ectyper

04 Jul 17:34

bioconda release candidate; formatting update

ectyper

04 Jul 15:53

bioconda release candidate