Strains resolution #80

LisaCarraro · 2022-01-28T15:45:51Z

Hello,

I have some questions about using PhyloPhlAn for phylogenetic analysis of microbial isolates/genomes:

Is it appropriate to use PhyloPhlAn to detect persistent strains?
How does the assignment of SNPs work in PhyloPhlAn?
Does the mutation rate table report all the SNPs found in the alignment?
Are the core UniRef90 proteins chosen because they are conserved and suitable markers for this kind of analysis?
Can PhyloPhlAn be compared with CFSAN or Lyve-SET, two methods for WGS data analysis?
Or is the PhyloPhlAn approach more similar to wgMLST/cgMLST methods?

Thank you very much!!!

Best

Lisa

fasnicar · 2022-01-31T17:22:54Z

Hi Lisa,

Is it appropriate to use PhyloPhlAn to detect persistent strains?

I think you can, but the important aspect here is which database of phylogenetic markers you use: you want the markers to be species-specific rather than generic/universal.

How does the assignment of SNPs work in PhyloPhlAn?

PhyloPhlAn doesn't really assign SNPs; they come from the MSAs of the database's phylogenetic markers as mapped against your inputs. So, if you have specific insights into the database and/or your inputs, you can change the MSA tool and/or its parameters in the config file (under the [msa] section) instead of using the defaults.

Does the mutation rate table report all the SNPs found in the alignment?

Not exactly. The mutation rates table is computed from the set of "comparable" positions in the first MSAs generated (that is, before any trimming and/or subsampling). For instance, positions where one of the two sequences being compared has a gap are not considered in this estimate.
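To make the idea of "comparable" positions concrete, here is a minimal sketch, assuming a pairwise rate defined as mismatches divided by the number of positions where neither sequence has a gap. It is not PhyloPhlAn's actual code; the gap symbols and the function names (pairwise_mutation_rate, mutation_rate_table) are hypothetical.

```python
from __future__ import annotations

from itertools import combinations

# Assumed gap symbols; the set PhyloPhlAn actually treats as gaps may differ.
GAP_CHARS = frozenset("-.")


def pairwise_mutation_rate(seq_a: str, seq_b: str) -> float | None:
    """Mismatches divided by comparable positions (no gap in either sequence)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    comparable = mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in GAP_CHARS or b in GAP_CHARS:
            continue  # a gap in either sequence: the position is not comparable
        comparable += 1
        if a != b:
            mismatches += 1
    # No comparable positions means the rate is undefined, not zero.
    return mismatches / comparable if comparable else None


def mutation_rate_table(msa: dict[str, str]) -> dict[tuple[str, str], float | None]:
    """Pairwise rates for every pair of sequences in one untrimmed MSA."""
    return {
        (x, y): pairwise_mutation_rate(msa[x], msa[y])
        for x, y in combinations(sorted(msa), 2)
    }


msa = {"g1": "ACGT-A", "g2": "ACTTAA", "g3": "AC-TAG"}
for pair, rate in mutation_rate_table(msa).items():
    print(pair, rate)
```

Because gapped positions are skipped, different pairs can end up with different denominators (5 comparable positions for g1/g2, only 4 for g1/g3 in this toy alignment), which is one reason such a table is not simply a count of every SNP in the alignment.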

Are the core UniRef90 proteins chosen because they are conserved and suitable markers for this kind of analysis?

The core UniRef90s come from the core set estimated by the ChocoPhlAn pipeline when extracting the unique markers for the MetaPhlAn database. So these are UniRef90s that are core for the species according to the set of genomes in the ChocoPhlAn database. If you add more or new genomes to the analysis, the coreness of these UniRef90s might change; that's why PhyloPhlAn lets you re-filter the markers based on their actual coreness in the set of genomes you're analysing, using the --min_num_entries parameter.
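Re-filtering markers by coreness in your own genome set can be sketched as a simple presence threshold. This assumes the threshold counts the input genomes in which a marker is detected; the function name and the marker-to-genomes mapping are hypothetical, and this is not PhyloPhlAn's own implementation of --min_num_entries.

```python
from __future__ import annotations


def filter_markers_by_coreness(
    hits: dict[str, set[str]], min_num_entries: int
) -> dict[str, set[str]]:
    """Keep markers found in at least `min_num_entries` of the analysed genomes.

    `hits` maps a marker ID to the set of genomes in which that marker was
    detected (a hypothetical structure used only for this illustration).
    """
    return {
        marker: genomes
        for marker, genomes in hits.items()
        if len(genomes) >= min_num_entries
    }


hits = {
    "marker_A": {"g1", "g2", "g3", "g4"},
    "marker_B": {"g1"},
    "marker_C": {"g1", "g2", "g3"},
}
core = filter_markers_by_coreness(hits, min_num_entries=3)
print(sorted(core))  # ['marker_A', 'marker_C']
```

Raising the threshold as more genomes are added keeps only markers that remain (near-)core in the expanded set.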

Can PhyloPhlAn be compared with CFSAN or Lyve-SET, two methods for WGS data analysis?

If you are asking whether the methodologies are similar, I would say it depends. If you use a very comprehensive set of genes for a given species, you can tune PhyloPhlAn's parameters to discard very conserved positions and use all the remaining positions (without further trimming and/or subsampling) to reconstruct a whole-genome phylogeny. Of course, in this case you need to make sure that the tools and their parameters in the config file are appropriate for the analysis you're doing.
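Discarding very conserved positions while keeping the rest can be pictured as a column filter over an MSA. The sketch below is a hypothetical illustration with made-up names (drop_conserved_columns, max_conservation); it is not PhyloPhlAn's trimming code, and the corresponding PhyloPhlAn parameters are not reproduced here.

```python
from __future__ import annotations

from collections import Counter


def drop_conserved_columns(
    msa: dict[str, str], max_conservation: float = 1.0, gaps: str = "-."
) -> dict[str, str]:
    """Drop columns whose most common non-gap residue reaches `max_conservation`.

    With the default of 1.0 only fully invariant columns (ignoring gaps) are
    removed; a lower value such as 0.95 also removes nearly invariant ones.
    """
    names = list(msa)
    seqs = [msa[name] for name in names]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must have the same length")
    keep = []
    for i in range(length):
        residues = [s[i].upper() for s in seqs if s[i] not in gaps]
        if not residues:
            continue  # all-gap column: nothing to compare
        top_count = Counter(residues).most_common(1)[0][1]
        if top_count / len(residues) < max_conservation:
            keep.append(i)  # variable enough to be informative
    return {name: "".join(s[i] for i in keep) for name, s in zip(names, seqs)}


msa = {"g1": "ACGTA", "g2": "ACTTG", "g3": "ACGTA"}
print(drop_conserved_columns(msa))  # {'g1': 'GA', 'g2': 'TG', 'g3': 'GA'}
```

Only the variable columns survive, which is roughly what a SNP-based whole-genome comparison works from; the conservation cutoff is the knob that decides how aggressive this is.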

Or is the PhyloPhlAn approach more similar to wgMLST/cgMLST methods?

In general, PhyloPhlAn is more flexible, as it can base the phylogenetic analysis on a diverse set of gene/protein databases. Markers in the database can also be filtered based on how well they map against the given inputs.
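Filtering markers by how well they map can be illustrated with identity and coverage cutoffs on each marker-to-input alignment. The record layout (Hit), the function name (filter_hits), and the default thresholds are all hypothetical choices for this sketch, not PhyloPhlAn's actual filtering criteria or defaults.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One marker-to-input alignment (hypothetical record layout)."""
    marker: str
    genome: str
    identity: float   # percent identity of the alignment, 0-100
    aligned_len: int  # number of marker positions covered by the alignment
    marker_len: int   # full length of the marker


def filter_hits(
    hits: list[Hit], min_identity: float = 50.0, min_coverage: float = 0.5
) -> list[Hit]:
    """Keep hits that are both similar enough and cover enough of the marker."""
    return [
        h
        for h in hits
        if h.identity >= min_identity and h.aligned_len / h.marker_len >= min_coverage
    ]


hits = [
    Hit("m1", "g1", identity=98.0, aligned_len=290, marker_len=300),
    Hit("m2", "g1", identity=40.0, aligned_len=300, marker_len=300),  # too divergent
    Hit("m3", "g2", identity=90.0, aligned_len=100, marker_len=300),  # too partial
]
print([h.marker for h in filter_hits(hits)])  # ['m1']
```

Checking identity and coverage separately lets you reject both divergent hits and short, fragmentary ones.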

I hope this helps.

Many thanks,
Francesco

LisaCarraro · 2022-02-08T10:00:58Z

Thank you very much for your very kind and precise explanation.

Lisa

fasnicar self-assigned this Jan 31, 2022
