Strains resolution #80

Open
LisaCarraro opened this issue Jan 28, 2022 · 2 comments

@LisaCarraro

Hello,

I have some questions about using PhyloPhlAn for the phylogenetic analysis of microbial isolates/genomes:

  • Is it correct to use PhyloPhlAn for detecting persistent strains?

  • How does the assignment of SNPs work in PhyloPhlAn?

  • Does the mutation rate table give all the SNPs found in the alignment?

  • Are the core UniRef90 proteins chosen because they are conserved and suitable markers for this kind of analysis?

  • Can PhyloPhlAn be compared with CFSAN or Lyve-SET, two methods of WGS data analysis?

  • Or is the PhyloPhlAn approach more similar to the wgMLST / cgMLST methods?

Thank you very much!!!

Best

Lisa

@fasnicar fasnicar self-assigned this Jan 31, 2022
@fasnicar
Collaborator

Hi Lisa,

Is it correct to use PhyloPhlAn for detecting persistent strains?

I think you can. The important point is to choose the database of phylogenetic markers carefully: you want the markers to be species-specific rather than generic/universal.

How does the assignment of SNPs work in PhyloPhlAn?

PhyloPhlAn doesn't really assign SNPs; they come from the MSAs of the phylogenetic markers in the database after these markers have been mapped against the inputs. So, if you have specific insights about the database and/or your inputs, you can change the alignment tool and/or its parameters in the config file (under the [msa] section) instead of using the defaults.
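
As a rough illustration of overriding the [msa] settings, here is a minimal Python sketch that rewrites the `params` value of an INI-style config. The example config contents and the key names (`program_name`, `params`, `command_line`) are assumptions for illustration, and `override_msa_params` is a hypothetical helper, so please check them against the config file you actually generated.

```python
import configparser
import io

# Hypothetical config contents: key names and values are assumptions,
# compare them with your own PhyloPhlAn config file before relying on this.
EXAMPLE_CFG = """\
[msa]
program_name = mafft
params = --quiet --anysymbol --auto
command_line = #program_name# #params# #input# > #output#
"""


def override_msa_params(cfg_text: str, new_params: str) -> str:
    """Return cfg_text with the `params` value of the [msa] section replaced."""
    # interpolation=None so that special characters in values are kept literally
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # preserve key case as written in the file
    parser.read_string(cfg_text)
    if not parser.has_section("msa"):
        raise ValueError("config has no [msa] section")
    parser.set("msa", "params", new_params)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Example: switch MAFFT from --auto to an iterative local-pair strategy
updated = override_msa_params(
    EXAMPLE_CFG, "--quiet --anysymbol --localpair --maxiterate 1000"
)
print(updated)
```

Editing the file by hand works just as well; the point is only that the aligner and its options live in that one section.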

Does the mutation rate table give all the SNPs found in the alignment?

Not precisely. The mutation rates table is computed from the set of "comparable" positions in the first MSAs that are generated (so prior to any trimming and/or subsampling). For instance, positions where one of the two sequences being compared has a gap are not considered in this estimate.
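
To make the idea of "comparable" positions concrete, here is a minimal sketch of a pairwise mutation rate that skips any column where either sequence has a gap. The function name and the choice of `-` as the only gap character are assumptions for illustration; PhyloPhlAn's actual implementation may treat other symbols (for example unknown residues) differently.

```python
def pairwise_mutation_rate(seq_a, seq_b, gap_chars="-"):
    """Fraction of differing positions among the comparable (gap-free) columns.

    Returns None when the two aligned sequences share no comparable position.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    comparable = 0
    mismatches = 0
    for a, b in zip(seq_a, seq_b):
        # Skip columns where at least one of the two sequences has a gap
        if a in gap_chars or b in gap_chars:
            continue
        comparable += 1
        if a != b:
            mismatches += 1
    if comparable == 0:
        return None
    return mismatches / comparable


# Column 4 is skipped (gap in the first sequence); 1 mismatch over 4 columns
print(pairwise_mutation_rate("ACG-T", "ACTAT"))  # 0.25
```

This is why the table does not list every SNP in the alignment: columns with a gap in either sequence never enter the count.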

Are the core UniRef90 proteins chosen as they are conserved and suitable markers for this kind of analyses?

The core UniRef90 proteins come from the core set estimated by the ChocoPhlAn pipeline when extracting the unique markers for the MetaPhlAn database. So, these are UniRef90 proteins that are core for the species according to the set of genomes in the ChocoPhlAn database. If you add more/new genomes to the analysis, the coreness of these UniRef90 proteins might change. That's why PhyloPhlAn lets you filter the markers again, based on their actual coreness in the set of genomes you're analysing, using the --min_num_entries parameter.
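
Conceptually, this kind of coreness filter can be sketched as follows. This is only an illustration of the idea, not PhyloPhlAn's code: the function name, the marker and genome IDs, and the use of "at least `min_num_entries` genomes" as the cutoff are all assumptions.

```python
def filter_markers_by_entries(marker_to_genomes, min_num_entries):
    """Keep only markers found in at least `min_num_entries` input genomes."""
    return {
        marker: genomes
        for marker, genomes in marker_to_genomes.items()
        if len(genomes) >= min_num_entries
    }


# Hypothetical presence table: marker ID -> set of genomes where it was found
presence = {
    "UniRef90_A": {"g1", "g2", "g3"},
    "UniRef90_B": {"g1", "g3"},
    "UniRef90_C": {"g2"},
}

kept = filter_markers_by_entries(presence, min_num_entries=2)
print(sorted(kept))  # ['UniRef90_A', 'UniRef90_B']
```

Raising the threshold towards the total number of genomes keeps only markers that are core for the genomes you are actually analysing.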

Can PhyloPhlAn be compared with CFSAN or Lyve-SET, two methods of WGS data analysis?

If you mean that the methodologies are similar, I would say that it depends. If you use a very comprehensive set of genes for a given species, you can tune PhyloPhlAn's parameters to discard very conserved positions and use all the other positions (without further trimming and/or subsampling) to reconstruct a whole-genome phylogeny. Of course, in this case, you need to make sure that the tools and their parameters in the config file are appropriate for the analysis you're doing.
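
To show what "discarding very conserved positions" means in practice, here is a minimal sketch that drops alignment columns whose most frequent non-gap residue reaches a given frequency. The function name, the `conserved_threshold` parameter, and the handling of gaps are assumptions for illustration and do not reproduce PhyloPhlAn's own trimming options.

```python
from collections import Counter


def drop_conserved_columns(msa, conserved_threshold=1.0, gap="-"):
    """Remove columns whose most common non-gap residue has a frequency
    greater than or equal to `conserved_threshold`.

    `msa` maps sequence names to aligned sequences of equal length.
    Columns made only of gaps are removed as well.
    """
    names = list(msa)
    seqs = [msa[name] for name in names]
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = len(seqs[0]) if seqs else 0

    keep = []
    for col in range(n_cols):
        residues = [s[col] for s in seqs if s[col] != gap]
        if not residues:
            continue  # gap-only column carries no signal
        top_count = Counter(residues).most_common(1)[0][1]
        if top_count / len(residues) < conserved_threshold:
            keep.append(col)

    return {name: "".join(msa[name][c] for c in keep) for name in names}


# Only the third column varies, so it is the only one kept
example = {"g1": "ACGT", "g2": "ACTT", "g3": "AC-T"}
print(drop_conserved_columns(example))  # {'g1': 'G', 'g2': 'T', 'g3': '-'}
```

With the default threshold of 1.0 only fully invariant columns are removed, which leaves a SNP-like set of variable positions for tree building.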

Or is the PhyloPhlAn approach more similar to the wgMLST / cgMLST methods?

PhyloPhlAn is, in general, more flexible, as it can base the phylogenetic analysis on a diverse set of gene/protein databases. Markers in the database can also be filtered based on how well they mapped against the given inputs.

I hope this helps.

Many thanks,
Francesco

@LisaCarraro
Author

Thank you very much for your very kind and precise explanation.

Lisa
