cannot run PhyloPhlAn with supermatrix_nt.cfg #63

Open
zhangws119 opened this issue May 25, 2021 · 1 comment

@zhangws119

PhyloPhlAn version 3.0.60 (27 November 2020)

Command line: /home/geobig/User/Softwares/Anaconda3/envs/phylophlan/bin/phylophlan -i 0_selected_genomes/ -d /home/geobig/Users/Liuli/phylophlan_databases/ -f phylophlan/supermatrix_nt.cfg -o 2_phylophlan --nproc 20 --diversity low --accurate --verbose

Automatically setting "database=phylophlan_databases" and "databases_folder=/home/geobig/Users/Liuli"
Automatically setting "input=0_selected_genomes" and "input_folder=/home/geobig/Users/Wensi/nitrospirae_tree"
"low-accurate" preset
Arguments: {'input': '0_selected_genomes', 'clean': None, 'output': '2_phylophlan', 'database': 'phylophlan_databases', 'db_type': None, 'config_file': 'phylophlan/supermatrix_nt.cfg', 'diversity': 'low', 'accurate': True, 'fast': False, 'clean_all': False, 'database_list': False, 'submat': 'pfasum60', 'submat_list': False, 'submod_list': False, 'nproc': 20, 'min_num_proteins': 1, 'min_len_protein': 50, 'min_num_markers': 1, 'trim': 'not_variant', 'gap_perc_threshold': 0.67, 'not_variant_threshold': 0.99, 'subsample': None, 'unknown_fraction': 0.3, 'scoring_function': None, 'sort': False, 'remove_fragmentary_entries': False, 'fragmentary_threshold': 0.85, 'min_num_entries': 4, 'maas': None, 'remove_only_gaps_entries': False, 'mutation_rates': False, 'force_nucleotides': False, 'input_folder': '/home/geobig/Users/Wensi/nitrospirae_tree/0_selected_genomes', 'data_folder': '2_phylophlan/tmp', 'databases_folder': '/home/geobig/Users/Liuli', 'submat_folder': '/home/geobig/User/Softwares/Anaconda3/envs/phylophlan/lib/python3.7/site-packages/phylophlan/phylophlan_substitution_matrices/', 'submod_folder': '/home/geobig/User/Softwares/Anaconda3/envs/phylophlan/lib/python3.7/site-packages/phylophlan/phylophlan_substitution_models/', 'configs_folder': '/home/geobig/User/Softwares/Anaconda3/envs/phylophlan/lib/python3.7/site-packages/phylophlan/phylophlan_configs/', 'output_folder': '', 'genome_extension': '.fna', 'proteome_extension': '.faa', 'update': False, 'verbose': True}
Loading configuration file "phylophlan/supermatrix_nt.cfg"
Checking configuration file
Checking "/home/geobig/User/Softwares/Anaconda3/envs/phylophlan/bin/makeblastdb"
Checking "/home/geobig/User/Softwares/Anaconda3/envs/phylophlan/bin/blastn"
Checking "/home/geobig/User/Softwares/Anaconda3/envs/phylophlan/bin/mafft"
Checking "/home/geobig/User/Softwares/Anaconda3/envs/phylophlan/bin/trimal"
Checking "/home/geobig/User/Softwares/Anaconda3/envs/phylophlan/bin/FastTreeMP"
Checking "/home/geobig/User/Softwares/Anaconda3/envs/phylophlan/bin/raxmlHPC-PTHREADS-SSE3"
Traceback (most recent call last):
File "/home/geobig/User/Softwares/Anaconda3/envs/phylophlan/bin/phylophlan", line 10, in <module>
sys.exit(phylophlan_main())
File "/home/geobig/User/Softwares/Anaconda3/envs/phylophlan/lib/python3.7/site-packages/phylophlan/phylophlan.py", line 3227, in phylophlan_main
verbose=args.verbose)
File "/home/geobig/User/Softwares/Anaconda3/envs/phylophlan/lib/python3.7/site-packages/phylophlan/phylophlan.py", line 818, in init_database
for f in glob.iglob(os.path.join(folder, '*'))
File "/home/geobig/User/Softwares/Anaconda3/envs/phylophlan/lib/python3.7/site-packages/phylophlan/phylophlan.py", line 819, in <listcomp>
for _, seq in SimpleFastaParser(bz2.open(f, 'rt') if f.endswith('.bz2') else open(f))])
File "/home/geobig/User/Softwares/Anaconda3/envs/phylophlan/lib/python3.7/site-packages/Bio/SeqIO/FastaIO.py", line 47, in SimpleFastaParser
for line in handle:
File "/home/geobig/User/Softwares/Anaconda3/envs/phylophlan/lib/python3.7/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xba in position 1035: invalid start byte

Dear developers,

I get the error above when I run this command. Could you help me solve it?

Thanks very much.

@fasnicar
Collaborator

Hi, and thanks for reporting this. I believe the issue here is with the database parameter:

-d /home/geobig/Users/Liuli/phylophlan_databases/

The -d parameter takes the database name, not the path to the databases folder.
If you want to use the phylophlan database, you should specify:

-d phylophlan

and you don't need to provide the path, as it should be detected automatically (I'm assuming the database is in the default location). If it is not, you can specify either:

-d /home/geobig/Users/Liuli/phylophlan_databases/phylophlan/

or

-d phylophlan --databases_folder /home/geobig/Users/Liuli/phylophlan_databases/
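To see why the original -d value fails, here is a minimal Python sketch. It is not PhyloPhlAn's actual code: `split_db_argument` and `find_non_utf8_files` are hypothetical helper names, and the splitting logic is only an assumption that reproduces the "Automatically setting ..." line in the log. The second helper mimics what the traceback shows (every non-.bz2 file in the folder is opened as text) and lists files that are not valid UTF-8, which is what would raise the UnicodeDecodeError.

```python
import os
import tempfile


def split_db_argument(db_arg):
    """Split a path-like -d value into (databases_folder, database name).

    Assumption: mirrors the "Automatically setting ..." log line only,
    not PhyloPhlAn's real implementation.
    """
    normalized = os.path.normpath(db_arg)  # drops the trailing slash
    return os.path.dirname(normalized), os.path.basename(normalized)


def find_non_utf8_files(folder):
    """Return (file name, error message) for files that fail UTF-8 decoding."""
    bad = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        # Skip sub-folders; skip .bz2 files, which the traceback shows
        # are opened with bz2.open() rather than plain open().
        if not os.path.isfile(path) or name.endswith(".bz2"):
            continue
        try:
            with open(path, encoding="utf-8") as handle:
                for _ in handle:  # read line by line, as a FASTA parser would
                    pass
        except UnicodeDecodeError as err:
            bad.append((name, str(err)))
    return bad


if __name__ == "__main__":
    # The -d value from the report resolves to the parent folder as the
    # databases folder and "phylophlan_databases" as the database name.
    print(split_db_argument("/home/geobig/Users/Liuli/phylophlan_databases/"))

    # Hypothetical demo folder standing in for a real databases folder.
    with tempfile.TemporaryDirectory() as demo:
        with open(os.path.join(demo, "markers.faa"), "w", encoding="utf-8") as fh:
            fh.write(">p1\nMKV\n")
        with open(os.path.join(demo, "archive.bin"), "wb") as fh:
            fh.write(b"\xba\x00\x01")  # 0xba is not a valid UTF-8 start byte
        print(find_non_utf8_files(demo))
```

Running `find_non_utf8_files` on the folder that was treated as the database should point to the file that triggered the error, most likely something other than a plain-text FASTA file.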

With the database parameter fixed, I also noticed you specified:

-f phylophlan/supermatrix_nt.cfg

This configuration file is for a gene database (nucleotides). The phylophlan database is a collection of 400 universal proteins, so if you do want to use it, you should use supermatrix_aa.cfg instead.
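If it is unclear whether a custom database holds nucleotide or protein sequences, a rough check like the one below can help pick the matching config. This is a simple heuristic sketch, not the check PhyloPhlAn itself performs: `guess_sequence_type`, the 0.9 threshold, and the `CONFIG_FOR_TYPE` mapping are assumptions made for illustration, with the mapping following the nt/aa distinction described above.

```python
def guess_sequence_type(fasta_text, threshold=0.9):
    """Guess 'nucleotide' or 'protein' from FASTA-formatted text.

    Heuristic (an assumption, not PhyloPhlAn's own check): if at least
    `threshold` of the residues are A, C, G, T, U or N, call it nucleotide.
    """
    nucleotide_letters = set("ACGTUN")
    # Concatenate all sequence lines, skipping the ">" header lines.
    residues = "".join(
        line.strip().upper()
        for line in fasta_text.splitlines()
        if line and not line.startswith(">")
    )
    # Ignore gap and stop characters.
    residues = residues.replace("-", "").replace("*", "")
    if not residues:
        raise ValueError("no sequence data found")
    hits = sum(1 for c in residues if c in nucleotide_letters)
    return "nucleotide" if hits / len(residues) >= threshold else "protein"


# Config names as they appear in this thread.
CONFIG_FOR_TYPE = {
    "nucleotide": "supermatrix_nt.cfg",
    "protein": "supermatrix_aa.cfg",
}

if __name__ == "__main__":
    sample = ">marker1\nMKVLAAGIVG\n"  # made-up protein sequence
    kind = guess_sequence_type(sample)
    print(kind, CONFIG_FOR_TYPE[kind])
```

Short or low-complexity protein sequences can fool a composition-based check like this, so treat the result as a hint rather than a verdict.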

Please, let me know if this fixes your problem.

Many thanks,
Francesco

@fasnicar fasnicar self-assigned this May 25, 2021