v1.0.0: E. coli serotyping with QC module and adaptive thresholding
Major improvements:
- Added a Quality Control (QC) module that makes results easier to interpret and flags when corrective measures (re-sequencing, wet-lab serotyping) are needed. Individual allele-level thresholds determine whether the query's quality parameters for a given allele (`%identity` and `%coverage`) are sufficient to resolve an antigen call unambiguously.
- Cluster-friendly behaviour: multiple instances can run concurrently, and a `.lock` file prevents race conditions and simultaneous database updates by several instances.
- Updated allele database with duplicated and truncated alleles removed (e.g. O157 antigen).
- Improved species identification resolution for highly similar non-E. coli species such as Shigella and E. albertii. Species identification is now performed exclusively with the MASH NCBI RefSeq sketch (https://gembox.cbcb.umd.edu/mash/refseq.genomes.k21s1000.msh).
- Users can add new alleles to an existing allele database and make serotype predictions with the resulting custom allele database via the `--dbpath` parameter.
- Improved O and H antigen call rates and accuracy by decoupling the `%identity` and `%coverage` thresholds for each antigen; global thresholds can now be specified separately for O and H antigens. This is especially important when one of the antigen genes (e.g. `wzx`/`wzy` or `fliC`) is truncated or has low coverage.
- Improved adaptive O antigen calling when only a single O antigen candidate is present in the preliminary BLAST results, enabling accurate O antigen calls even in poorly sequenced samples with minimal coverage.
- Added mixed O antigen calls for highly similar O antigens (e.g. O17/O77).
- The allele names/keys used to make antigen calls are also reported, making it easier to troubleshoot dubious alleles and clean the allele database.
- More detailed error messages, and support for 16 highly similar O antigens (`%identity` > 99%) based on the reference publication PMID: 25428893.
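The allele-level QC thresholding described above can be illustrated with a minimal sketch. It assumes each allele carries its own minimum `%identity` and `%coverage`, and that a hit resolves an antigen unambiguously only when both query values meet that allele's thresholds. The `AlleleThreshold` type, the `qc_allele_hit` function, the threshold values and the `PASS`/`WARNING`/`FAIL` labels are all hypothetical and are not ECTyper's actual API or report values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlleleThreshold:
    """Hypothetical per-allele minimum quality parameters."""
    min_identity: float  # percent identity, 0-100
    min_coverage: float  # percent coverage, 0-100


def qc_allele_hit(identity: float, coverage: float,
                  threshold: AlleleThreshold) -> str:
    """Return a hypothetical QC label for one query hit against one allele."""
    passes_identity = identity >= threshold.min_identity
    passes_coverage = coverage >= threshold.min_coverage
    if passes_identity and passes_coverage:
        return "PASS"     # quality sufficient for an unambiguous antigen call
    if passes_identity or passes_coverage:
        return "WARNING"  # borderline: consider re-sequencing or wet-lab serotyping
    return "FAIL"


if __name__ == "__main__":
    # Illustrative thresholds only, not values from the ECTyper database.
    wzx_threshold = AlleleThreshold(min_identity=99.0, min_coverage=95.0)
    print(qc_allele_hit(99.5, 98.0, wzx_threshold))  # PASS
    print(qc_allele_hit(99.5, 60.0, wzx_threshold))  # WARNING
    print(qc_allele_hit(90.0, 60.0, wzx_threshold))  # FAIL
```

Keeping the thresholds on the allele rather than globally lets alleles with close relatives demand stricter identity than alleles that are well separated from everything else.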
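Decoupling the `%identity` and `%coverage` thresholds per antigen can be sketched as two independent cut-off pairs, one for O and one for H antigens. This is a minimal illustration under assumptions: the `GlobalThresholds` and `Hit` types, the `filter_hits` function and the numeric cut-offs are hypothetical, not ECTyper's defaults or command-line options.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlobalThresholds:
    """Hypothetical global %identity / %coverage cut-offs for one antigen type."""
    pident: float
    pcov: float


@dataclass(frozen=True)
class Hit:
    antigen: str  # "O" or "H"
    gene: str     # e.g. "wzx", "wzy", "fliC"
    pident: float
    pcov: float


# Illustrative values only: O and H antigens get independent cut-offs,
# so a truncated O antigen gene can pass with lower coverage.
THRESHOLDS = {
    "O": GlobalThresholds(pident=90.0, pcov=50.0),
    "H": GlobalThresholds(pident=95.0, pcov=50.0),
}


def filter_hits(hits: list[Hit]) -> list[Hit]:
    """Keep only hits that meet the thresholds of their own antigen type."""
    kept = []
    for hit in hits:
        t = THRESHOLDS[hit.antigen]
        if hit.pident >= t.pident and hit.pcov >= t.pcov:
            kept.append(hit)
    return kept


if __name__ == "__main__":
    hits = [
        Hit("O", "wzx", 99.1, 55.0),    # truncated wzx, still passes O cut-offs
        Hit("H", "fliC", 92.0, 100.0),  # fails the stricter H identity cut-off
        Hit("H", "fliC", 99.8, 100.0),
    ]
    for hit in filter_hits(hits):
        print(hit.antigen, hit.gene, hit.pident, hit.pcov)
```

With a single shared threshold pair, tightening coverage for H would also reject the truncated `wzx` hit; separate pairs avoid that coupling.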
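The release notes do not describe how the `.lock` file is implemented. A common pattern, shown here as a minimal sketch, is to create the lock file atomically with `os.open` and `O_CREAT | O_EXCL`, which fails if another process already created it, so only one instance at a time can update the shared database. The `database_lock` name, the lock path and the timeout values are hypothetical.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def database_lock(lock_path: str, timeout: float = 60.0, poll: float = 0.5):
    """Hold an exclusive lock by atomically creating `lock_path`.

    O_CREAT | O_EXCL makes creation fail if the file already exists, so a
    second instance waits (up to `timeout` seconds) until the first one
    removes the lock file.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {lock_path} within {timeout}s")
            time.sleep(poll)
    try:
        os.write(fd, str(os.getpid()).encode())  # record the owner PID for debugging
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)  # release the lock for other instances


if __name__ == "__main__":
    with database_lock("/tmp/example_db.lock"):
        print("lock held; safe to update the shared database here")
```

This sketch does not clean up a stale lock left by a crashed process, and `O_EXCL` semantics can be unreliable on some network file systems, so a production implementation may need extra handling for both.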
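Species identification with a Mash RefSeq sketch amounts to finding the reference genome with the smallest Mash distance to the query. The sketch below parses the tab-separated output of `mash dist` (reference ID, query ID, distance, p-value, matching hashes such as `700/1000`) and picks the closest reference. The `parse_mash_dist` and `closest_reference` helpers, the reference-to-species mapping and the sample lines are hypothetical; how ECTyper itself maps RefSeq accessions to species is not described in the release notes.

```python
from typing import NamedTuple


class MashHit(NamedTuple):
    reference: str
    query: str
    distance: float
    p_value: float
    shared_hashes: int
    total_hashes: int


def parse_mash_dist(text: str) -> list[MashHit]:
    """Parse tab-separated `mash dist` output lines into MashHit records."""
    hits = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue  # skip blank lines
        ref, query, dist, pval, shared = line.split("\t")
        matched, total = shared.split("/")
        hits.append(MashHit(ref, query, float(dist), float(pval),
                            int(matched), int(total)))
    return hits


def closest_reference(text: str, species_of: dict[str, str]) -> tuple[str, float]:
    """Return (species, distance) for the reference closest to the query.

    `species_of` is a hypothetical reference-ID -> species lookup table.
    """
    best = min(parse_mash_dist(text), key=lambda h: h.distance)
    return species_of.get(best.reference, "unknown"), best.distance


if __name__ == "__main__":
    # Invented example output; real reference IDs come from the RefSeq sketch.
    output = ("refA.fna\tsample.fasta\t0.0123\t0\t700/1000\n"
              "refB.fna\tsample.fasta\t0.0450\t0\t300/1000\n")
    species = {"refA.fna": "Escherichia coli", "refB.fna": "Shigella flexneri"}
    print(closest_reference(output, species))
```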
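The adaptive O antigen calling for a single candidate can be sketched as switching between two threshold pairs: when the preliminary BLAST results contain only one distinct O antigen, there is no competing antigen to confuse it with, so lower coverage may be tolerated. This is an assumption about the general idea, not ECTyper's actual rule; the `OCandidate` type, the `call_o_antigen` function and both threshold pairs are hypothetical.

```python
from typing import NamedTuple, Optional


class OCandidate(NamedTuple):
    antigen: str   # e.g. "O157"
    pident: float  # percent identity
    pcov: float    # percent coverage


def call_o_antigen(candidates: list[OCandidate],
                   strict: tuple[float, float] = (95.0, 90.0),
                   relaxed: tuple[float, float] = (90.0, 50.0)) -> Optional[str]:
    """Call an O antigen, relaxing the (identity, coverage) cut-offs
    when only one distinct O antigen candidate is present."""
    distinct = {c.antigen for c in candidates}
    min_pident, min_pcov = relaxed if len(distinct) == 1 else strict
    passing = [c for c in candidates
               if c.pident >= min_pident and c.pcov >= min_pcov]
    if not passing:
        return None  # no call possible with the available evidence
    best = max(passing, key=lambda c: (c.pident, c.pcov))
    return best.antigen


if __name__ == "__main__":
    # Low-coverage sample with a single candidate: relaxed cut-offs apply.
    print(call_o_antigen([OCandidate("O157", 99.0, 60.0)]))  # O157
    # Two competing candidates: strict cut-offs apply, so O157 is rejected.
    print(call_o_antigen([OCandidate("O157", 99.0, 60.0),
                          OCandidate("O26", 97.0, 95.0)]))   # O26
```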
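Mixed O antigen calls such as O17/O77 can be sketched as follows: if the top-scoring O antigens are within a small score margin of each other and all belong to a predefined group of highly similar antigens, they are reported together. The `mixed_o_call` function, the score margin and the `SIMILAR_O_GROUPS` table are hypothetical; only the O17/O77 pair is taken from the release notes, and the full list of 16 high-similarity O antigens from PMID: 25428893 is deliberately not reproduced here.

```python
# Hypothetical similarity groups: only O17/O77 comes from the release notes.
SIMILAR_O_GROUPS = [frozenset({"O17", "O77"})]


def mixed_o_call(scores: dict[str, float], margin: float = 0.5) -> str:
    """Return a single O antigen, or a mixed call such as 'O17/O77'.

    `scores` maps O antigen names to a hit score (e.g. percent identity).
    Antigens within `margin` of the best score count as tied; a tie is
    reported as a mixed call only if all tied antigens share a group.
    """
    best = max(scores.values())
    tied = {name for name, score in scores.items() if best - score <= margin}
    if len(tied) > 1:
        for group in SIMILAR_O_GROUPS:
            if tied <= group:
                # Sort numerically so the call reads O17/O77, not O77/O17.
                return "/".join(sorted(tied, key=lambda name: int(name[1:])))
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print(mixed_o_call({"O17": 99.6, "O77": 99.4}))  # O17/O77
    print(mixed_o_call({"O17": 99.6, "O77": 97.0}))  # O17
```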