Stage 5: Filter Fasta error #47

Open
karimi81 opened this issue Oct 27, 2021 · 1 comment


@karimi81

Hi there,
I got an error at stage 5 indicating that the headers of my fasta files are not correct:
processing file GCF_014441545.1_ROS_Cfam_1.0_protein.fasta
The ID on def line '>GCF_014441545.1_ROS_Cfam_1.0_protein|NP_001002930.1' is missing the prefix '0_protein|' 'GCF_014441545.1_ROS_Cfam_1.0_protein'

I downloaded the protein fasta files from the NCBI genome annotation for each species. Do I need to edit these files before running them through the pipeline? Could you please let me know which fasta file format orthomcl-pipeline requires?
Thanks

@apetkau
Owner

apetkau commented Oct 28, 2021

Are you using the OrthoMCL software available from https://orthomcl.org/orthomcl/app/downloads/software/ or the version from https://github.com/apetkau/orthomclsoftware-custom? I suspect you are using the version from orthomcl.org, which does not support underscores (_) and other special characters in the fasta file name. The version at https://github.com/apetkau/orthomclsoftware-custom contains some modifications I made to support underscores, so you may wish to try installing that version instead.
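
To see which input files might be affected, a quick check of the file names can help. The following is only a minimal sketch and is not part of orthomcl-pipeline or OrthoMCL: the helper names are hypothetical, and treating everything except ASCII letters and digits as a "special character" is an assumption, since the exact set of characters each OrthoMCL version accepts is not stated here.

```python
import re
from pathlib import Path

# Assumption: only ASCII letters and digits count as "plain" characters.
# The real rule used by a given OrthoMCL version may differ.
NON_PLAIN = re.compile(r"[^A-Za-z0-9]")


def special_characters(fasta_name: str) -> list[str]:
    """Return the sorted, distinct non-alphanumeric characters found in the
    file name after its final extension (e.g. '.fasta') is removed."""
    stem = Path(fasta_name).stem
    return sorted(set(NON_PLAIN.findall(stem)))


def report(fasta_names: list[str]) -> None:
    """Print one line per file, listing any special characters in its name."""
    for name in fasta_names:
        found = special_characters(name)
        if found:
            print(f"{name}: contains {' '.join(found)}")
        else:
            print(f"{name}: letters and digits only")


if __name__ == "__main__":
    # The first name is the file from the error message above;
    # the second is a made-up example of a plain name.
    report(["GCF_014441545.1_ROS_Cfam_1.0_protein.fasta", "Cfam.fasta"])
```

For the file in the error message this reports both `.` and `_` in the name. Whether renaming such files is enough, or whether the def lines inside them must be changed to match, is not confirmed in this thread, so installing the custom version mentioned above remains the suggested route.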
