We conducted whole-genome classification and a multigene maximum-likelihood phylogenetic reconstruction on a set of 49 genomes spanning five genera in the Pectobacteriaceae (including Musicola) to establish the relative placement of the proposed genus Musicola. The genomes used are listed below by species, each with its strain designation and assembly name; (T) marks the type strain:
- Dickeya zeae
  - CFBP 2052 (T) DZE2538_1.0
  - MS2 ASM288755v1
  - Ech586 ASM2506v1
- Dickeya chrysanthemi
  - CFBP 2048 (T) DCH402_1.0
  - Ech1591 ASM2356v1
  - NCPPB 516 DCH516_v1.0
- Dickeya poaceiphila
  - NCPPB 569 (T) ASM785897v1
- Dickeya fangzhongdai
  - CFBP 8607 (T) ASM281248v1
  - ND14b ASM75834v1
  - LN1 ASM1485475v1
- Dickeya dianthicola
  - CFBP 1200 (T) DDI453_v1.0
  - ME23 ASM340313v1
  - 16MB01 ASM1836112v1
- Dickeya solani
  - CFBP 7345 (T) ASM164470v1
  - IFB0223 ASM371833v1
  - IFB0421 ASM1333418v1
- Dickeya dadantii
  - CFBP 1269 (T) DDA898_v1.0
  - NCPPB 3937 ASM14705v1
  - DSM 18020 ASM304978v1
- Dickeya undicola
  - CFBP 8650 (T) ASM78473
  - FVG10-MFV-A16 ASM372557v1
  - FVG1-MFV-O17 ASM372561v1
- Dickeya lacustris
  - CFBP 8647 (T) ASM393429
- Dickeya aquatica
  - CFBP 8348 (T) Daq1742
- Musicola paradisiaca
  - Ech 703 ASM2354v1
  - NCPPB 2511 (T) DPA2511_1.0
- Musicola keenii
  - CFBP 8732 (T) ASM1485550v1
- Brenneria alni
  - NCPPB 3934 (T) ASM366624v1
- Brenneria goodwinii
  - FRB141 (T) ASM229144v1
  - OBR1 B_goodwinii_PB
  - FRB171 ASM366614v1
- Brenneria roseae
  - LMG 27715 ASM311581v1
  - LMG 27714 (T) ASM311584v1
- Lonsdalea britannica
  - LMG 26267 (T) ASM211159v1
  - LMG 26268 ASM211163v1
- Lonsdalea iberica
  - LMG26264 (T) ASM211158v1
  - LMG26265 ASM211162v1
- Lonsdalea quercina
  - ATCC 29281 (T) IMG-taxon 2597490349 annotated assembly
  - CFCC 11059 ASM326983v1
  - CFCC 13731 ASM326981v1
- Pectobacterium atrosepticum
  - SCRI 1043 (T) ASM1160v1
  - 21A ASM74096v1
  - JG10-08 ASM69646v1
- Pectobacterium parvum
  - NCPPB 3395 ASM74991v1
  - s0421 (T) Pc0421_3
  - Y1 ASM80841v1
- Pectobacterium wasabiae
  - CFBP 3304 (T) ASM174218v1
  - NCPPB 3701 ASM74986v1
  - NCPPB 3702 ASM75468v1
Each genome was downloaded from NCBI with ncbi-genome-download v0.3.0 (https://github.com/kblin/ncbi-genome-download/) using the assembly identifier listed above for each strain. To ensure consistency of annotation between genomes, all sequences were reannotated using prodigal v2.6.3 (Hyatt et al. 2010) to obtain the predicted proteome.
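As an illustration of the reannotation step, the sketch below builds a prodigal command line for one genome. It is not the invocation used in this work: the file names, the output directory and the choice to write both protein (`-a`) and nucleotide (`-d`) gene sequences are assumptions made for the example, and the helper name `prodigal_command` is hypothetical.

```python
from pathlib import Path


def prodigal_command(genome_fasta, out_dir):
    """Build the argument list for one prodigal run (nothing is executed here)."""
    genome_fasta = Path(genome_fasta)
    out_dir = Path(out_dir)
    stem = genome_fasta.stem
    return [
        "prodigal",
        "-i", str(genome_fasta),                 # input genome sequence (FASTA)
        "-a", str(out_dir / f"{stem}.faa"),      # predicted protein translations
        "-d", str(out_dir / f"{stem}_cds.fna"),  # nucleotide sequences of predicted genes
        "-f", "gff",                             # format of the coordinate output
        "-o", str(out_dir / f"{stem}.gff"),      # predicted gene coordinates
    ]


# Placeholder paths, not files from this study.
cmd = prodigal_command("genomes/example_genome.fna", "annotations")
print(" ".join(cmd))
# To execute: import subprocess; subprocess.run(cmd, check=True)
```

Running the same command over every downloaded genome gives one protein file and one CDS file per genome, predicted with identical settings.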
Whole-genome classification of the 49 genomes was performed using pyani v0.3.0b (Pritchard et al. 2016) and the ANIm algorithm. Taking 94-96% identity as an approximate threshold corresponding to species division, and 40-50% coverage as an approximate threshold corresponding to genus division, the results support the following eight genus divisions:
- Dickeya (D. solani, D. dadantii, D. fangzhongdai, D. undicola, D. dianthicola, D. poaceiphila, D. zeae, D. chrysanthemi)
- Musicola (M. paradisiaca, M. keenii, formerly D. paradisiaca)
- Gen. nov. I (D. aquatica, D. lacustris)
- Lonsdalea (L. iberica, L. quercina, L. britannica)
- Pectobacterium (P. atrosepticum, P. wasabiae, P. parvum)
- Gen. nov. II (B. roseae)
- Gen. nov. III (B. alni)
- Gen. nov. IV (B. goodwinii)
and 22 species divisions:
- D. undicola
- D. dianthicola
- D. fangzhongdai
- D. solani
- D. dadantii
- D. zeae
- D. poaceiphila
- D. chrysanthemi
- P. parvum
- P. wasabiae
- P. atrosepticum
- M. paradisiaca
- M. keenii
- L. quercina ATCC 29281
- L. iberica
- L. sp. nov. (currently L. quercina CFCC 11059, L. quercina CFCC 13731)
- L. britannica
- B. goodwinii
- Gen. nov. I sp. nov. I (currently D. aquatica)
- Gen. nov. I sp. nov. II (currently D. lacustris)
- B. alni
- B. roseae
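The thresholding logic described above can be sketched as a simple grouping of genomes whose pairwise score meets a cutoff. This is an illustration only, not part of pyani and not necessarily how the divisions above were derived: the single-linkage grouping, the cutoffs of 0.95 identity and 0.45 coverage (midpoints of the stated ranges), the genome labels `g1` to `g4` and all pairwise values are invented for the example.

```python
from itertools import combinations


def threshold_groups(names, score, threshold):
    """Single-linkage groups: join two genomes whenever score >= threshold."""
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the group representative (with path halving).
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in combinations(names, 2):
        if score[frozenset((a, b))] >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Toy symmetric scores for four hypothetical genomes (not real ANIm output).
names = ["g1", "g2", "g3", "g4"]
identity = {
    frozenset(("g1", "g2")): 0.98, frozenset(("g1", "g3")): 0.90,
    frozenset(("g2", "g3")): 0.90, frozenset(("g1", "g4")): 0.85,
    frozenset(("g2", "g4")): 0.85, frozenset(("g3", "g4")): 0.84,
}
coverage = {
    frozenset(("g1", "g2")): 0.85, frozenset(("g1", "g3")): 0.60,
    frozenset(("g2", "g3")): 0.62, frozenset(("g1", "g4")): 0.20,
    frozenset(("g2", "g4")): 0.22, frozenset(("g3", "g4")): 0.18,
}

species_groups = threshold_groups(names, identity, 0.95)
genus_groups = threshold_groups(names, coverage, 0.45)
print(species_groups)  # [['g1', 'g2'], ['g3'], ['g4']]
print(genus_groups)    # [['g1', 'g2', 'g3'], ['g4']]
```

Real ANIm coverage is not symmetric between the two genomes of a pair, so applying this to actual output would first require choosing how to combine the two directions (for example, taking the minimum).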
A total of 1201 single-copy orthologues were identified as present in the predicted proteomes (amino acid sequences) of all 49 genomes, using orthofinder v2.5.2 (Emms & Kelly 2019). The protein sequences for each of the 1201 genes were aligned using MAFFT v7.480 (Nakamura et al. 2018) and the corresponding CDS sequences threaded onto these alignments using t-coffee v12.00.7fb08c2 (Notredame et al. 2000). The nucleotide alignments were concatenated into a single sequence per genome using the Python script concatenate_cds.py, which also generated a partition file (one partition per gene) for the subsequent maximum-likelihood phylogenetic reconstruction.
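The two data-handling steps in this paragraph, threading CDS sequences onto protein alignments and concatenating the per-gene alignments with a partition file, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the t-coffee procedure and not the concatenate_cds.py script itself: the function names, the toy sequences and the `GTR+G` model label are hypothetical, and the partition lines follow the `MODEL, name = start-end` layout that RAxML-style partition files use.

```python
def thread_cds(aligned_protein, cds):
    """Place codons from an unaligned CDS under the columns of an aligned protein."""
    if len(cds) % 3 != 0:
        raise ValueError("CDS length is not a multiple of three")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    n_residues = sum(1 for c in aligned_protein if c != "-")
    # Drop a trailing stop codon if the CDS has one more codon than residues.
    if len(codons) == n_residues + 1:
        codons = codons[:-1]
    if len(codons) != n_residues:
        raise ValueError("CDS length does not match protein length")
    remaining = iter(codons)
    return "".join("---" if c == "-" else next(remaining) for c in aligned_protein)


def concatenate(gene_alignments, model="GTR+G"):
    """Join per-gene alignments into one sequence per genome plus partition lines.

    gene_alignments: {gene_name: {genome_name: aligned_nucleotide_sequence}}
    """
    genomes = sorted(next(iter(gene_alignments.values())))
    pieces = {g: [] for g in genomes}
    partitions = []
    start = 1
    for gene, aln in gene_alignments.items():
        lengths = {len(seq) for seq in aln.values()}
        if set(aln) != set(genomes) or len(lengths) != 1:
            raise ValueError(f"{gene}: missing genome or ragged alignment")
        for g in genomes:
            pieces[g].append(aln[g])
        end = start + lengths.pop() - 1
        partitions.append(f"{model}, {gene} = {start}-{end}")
        start = end + 1
    return {g: "".join(p) for g, p in pieces.items()}, partitions


# Toy input: two hypothetical genes in two hypothetical genomes.
gene1 = {"A": thread_cds("M-K", "ATGAAATAA"), "B": thread_cds("MAK", "ATGGCCAAG")}
gene2 = {"A": thread_cds("MW", "ATGTGG"), "B": thread_cds("MW", "ATGTGG")}
supermatrix, partitions = concatenate({"gene1": gene1, "gene2": gene2})
print(supermatrix["A"])  # ATG---AAAATGTGG
print(partitions)        # ['GTR+G, gene1 = 1-9', 'GTR+G, gene2 = 10-15']
```

Threading codons onto the protein alignment keeps every gap a multiple of three nucleotides, so the reading frame is preserved in the nucleotide alignment, and recording one coordinate range per gene is what later allows each gene to receive its own model parameters.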
The concatenated nucleotide sequence alignment of 1201 single-copy orthologues and the corresponding partition file were used as input to raxml-ng v1.0.2 (Kozlov et al. 2019). Initial processing with raxml-ng recommended the GTR+FO+G4m+B model for each of the 1201 genes, and the partition file was used to allow individual parameterisation of this model for each gene. Tree searches from 20 starting trees all recovered a single topology, suggesting that this was the globally optimal topology. One hundred bootstrap replicate trees were generated to estimate support values for each bipartition of the tree; MRE-based bootstopping indicated that convergence was reached with only 50 replicates. The best estimate from the 20 starting trees was midpoint-rooted, manually annotated and coloured using figtree v1.4.4 (http://tree.bio.ed.ac.uk/software/figtree/).
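Two ideas in this paragraph, checking that several inferred trees share one topology and attaching bootstrap support to each bipartition, can be illustrated with a small sketch. It is not how raxml-ng implements either step: trees are represented here as nested Python tuples of leaf names, the five-leaf trees are invented, and the helper names are hypothetical.

```python
def leaves(tree):
    """Return the set of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def bipartitions(tree):
    """Non-trivial splits of the unrooted tree, each stored as the side
    that does not contain the alphabetically first leaf."""
    all_leaves = leaves(tree)
    ref = min(all_leaves)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return
        clade = leaves(node)
        side = all_leaves - clade if ref in clade else clade
        # Keep only splits with at least two leaves on each side.
        if 2 <= len(side) <= len(all_leaves) - 2:
            splits.add(side)
        for child in node:
            visit(child)

    visit(tree)
    return splits


def support(best_tree, replicates):
    """Fraction of replicate trees that contain each split of best_tree."""
    rep_splits = [bipartitions(t) for t in replicates]
    return {
        split: sum(split in s for s in rep_splits) / len(rep_splits)
        for split in bipartitions(best_tree)
    }


# Toy trees on five hypothetical leaves.
best = (("A", "B"), ("C", ("D", "E")))
rerooted = ("A", ("B", ("C", ("D", "E"))))  # same unrooted topology as best
rep2 = (("A", "B"), (("C", "D"), "E"))
rep3 = (("A", "C"), ("B", ("D", "E")))

same_topology = bipartitions(best) == bipartitions(rerooted)
values = support(best, [best, rep2, rep2, rep3])
print(same_topology)                     # True
print(values[frozenset({"C", "D", "E"})])  # 0.75
print(values[frozenset({"D", "E"})])       # 0.5
```

Because each split is stored relative to a fixed reference leaf, two trees written with different roots compare equal whenever their unrooted topologies match, which is the comparison needed to confirm that every starting tree converged on the same result.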
The resulting reconstruction supports the same genus and species divisions noted above for whole-genome classification using pyani.
Both ANIm and a comprehensive multigene phylogeny support the same genus and species divisions, including establishment of Musicola as a novel genus.
We note in passing that these approaches also support division of Brenneria into multiple genus-level groups, establishment of a further genus-level group circumscribing genomes currently described as D. aquatica and D. lacustris, and reassignment of members of L. quercina.
Emms, D.M. and Kelly, S. (2019) OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biology 20:238.
Hyatt, D., Chen, G.L., Locascio, P.F., Land, M.L., Larimer, F.W. and Hauser, L.J. (2010) Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11:119. doi:10.1186/1471-2105-11-119
Kozlov, A.M., Darriba, D., Flouri, T., Morel, B. and Stamatakis, A. (2019) RAxML-NG: a fast, scalable, and user-friendly tool for maximum likelihood phylogenetic inference. Bioinformatics, btz305. doi:10.1093/bioinformatics/btz305
Nakamura, T., Yamada, K.D., Tomii, K. and Katoh, K. (2018) Parallelization of MAFFT for large-scale multiple sequence alignments. Bioinformatics 34:2490–2492.
Notredame, C., Higgins, D.G. and Heringa, J. (2000) T-Coffee: a novel method for fast and accurate multiple sequence alignment. Journal of Molecular Biology 302:205–217.
Pritchard, L. et al. (2016) Genomics and taxonomy in diagnostics for food security: soft-rotting enterobacterial plant pathogens. Analytical Methods 8:12–24. doi:10.1039/C5AY02550H