ALG with odp_nway_rbh fails with blast #21
Comments
Hi Tauana, is your only goal to recreate the analyses of Simakov et al 2022? If I know what your goals are maybe I can better answer your question.
Thanks for the quick reply Darrin!
Hi again Darrin, Command: Is there another parameter for the config file that would generate ribbon plots? Thanks!
Yes, that's right - the way the software is implemented now there is a set of BCnS ALGs that each analysis with the
I think plotorder is broken at the moment due to a recent change in how chromosomes are sorted along the x-axis. It sounds like maybe the software is plotting a lot of small scaffolds, in addition to the 4 chromosome scale ones. Can you figure out your smallest scaffold that you want to be plotted, then add this bit to that species' entry?
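For choosing that cutoff, one option is to list the scaffold lengths in the genome FASTA first. A minimal sketch, assuming a plain (uncompressed) FASTA file; the function names are illustrative and not part of odp:

```python
import io


def scaffold_lengths(handle):
    """Return {scaffold_name: sequence_length} for a FASTA file handle."""
    lengths = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def scaffolds_at_least(lengths, min_size):
    """Names of scaffolds whose length is >= min_size, longest first."""
    kept = [n for n, size in lengths.items() if size >= min_size]
    return sorted(kept, key=lambda n: lengths[n], reverse=True)


# Toy in-memory example; use open("genome.fasta") for a real file.
demo = io.StringIO(">chr1 desc\nACGTACGT\nACGT\n>scaf9\nAC\n")
lengths = scaffold_lengths(demo)
print(lengths)
print(scaffolds_at_least(lengths, 10))
```

Sorting the lengths makes the gap between chromosome-scale sequences and small scaffolds easy to spot, and the length of the smallest sequence you want to keep can then serve as the threshold.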
It is a bit cumbersome at the moment to make these plots, but easier than other solutions out there. Working on making this easier though! Please look at this config file and it should be enough to run the script. Make sure you point to the directory with the BCnS_LGs .rbh files to see the provenance of your species' 4 chromosomes in relation to those ALGs.
Sure, please let me know if you have suggestions for the documentation or anything else to improve the user experience.
Thank you very much Darrin! I am running odp again on the set of species I want to use in the ribbon plot: Hydra, lancelet, Drosophila, C. elegans and my genome. Basically the same command and config file as before, but now I am getting a new error related to colors in the plot (below). Is there a way I could bypass this? Error in rule plot_synteny_of_ALGs_plus_species: RuleException:
The color
@tauanajc - I found the source of the error. I think it was caused by some software automatically incrementing the numbers at the end of the color string in one of the ALG database files. The easiest fix right now is to delete the files called
Hi Darrin, sorry for the late reply. I don't have a LG_db folder, so I removed all files with UnicellMetazoanLgs* in the entire analysis, but the run still failed (perhaps I deleted files that I shouldn't have). The dot plots should be sufficient for what I need at the moment, so I will try the ribbon plots again later on. Thanks again for all the help. How should I cite odp at the moment? Thanks!
Hi Tauana, the folder will be where you installed odp, so I will work on making the interface easier to work with, but for now please let me know if you would like to get the subway plots/ribbon diagrams working properly for your species. I am also working on improving the implementation of this tool to make it easier for people to use. The taxa used to generate the
Closed the issue, but please open again with new relevant info if need be.
Dear Darrin,
I contacted Oleg about doing synteny analyses similar to the 2022 metazoan paper and he told me about odp. It looks like a fantastic tool and I'm excited to be trying it.
I'm trying the second use case to get ALGs from 3 NCBI genomes: the lancelet, scallop and jellyfish. My first attempt was with diamond, but there was an error pointing to line 242 of the odp_functions.py script. I fixed it by replacing
illegal = set("prot_to_loc", "prot-to-loc")
with
illegal = set()
Steps 1 and 2 then ran fine! However, the list of orthogroups in the reciprocal_best_hits.rbh.groupby was relatively short compared to the metazoan paper (2648 groups, instead of >6000). I decided to try blast instead of diamond, and also to increase the number of permutations (there was no example in the GitHub repository, so I originally used num_permutations: 1000000, and have now increased it to 100000000). Running with blast quickly fails. The end of the log file is below, and I attach the config and log files:
Waiting at most 5 seconds for missing files.
MissingOutputException in rule diamond_blast_x_to_y in file /home/FM/tcunha/scripts/odp/scripts/odp_nway_rbh, line 165:
Job 9 completed successfully, but some output files are missing. Missing files after 5 seconds. This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait:
odp_nway_rbh/step0-blastp_results/Bfloridae_against_Myessoensis.blastp
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-04-26T163218.427529.snakemake.log
Here is an example of the first few analyses: [['Bfloridae', 'Hvulgaris', 'Myessoensis']]
There are 1 possible combinations.
step1.log
config.yaml.txt
I would love to have it running on blast, but also any ideas about how to reproduce the >6000 groups would be appreciated!
Thanks!
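A note on the workaround above: `set("prot_to_loc", "prot-to-loc")` fails because Python's `set()` accepts at most one iterable argument. Swapping in an empty `set()` lets the run proceed, but it presumably also disables whatever check relied on those two names. Below is a minimal sketch of an alternative, assuming the intent was a set containing both strings. The function names are illustrative and not taken from odp, and the surrounding odp code is not shown here.

```python
def build_illegal_broken():
    # set() takes at most one iterable argument, so this raises TypeError.
    return set("prot_to_loc", "prot-to-loc")


def build_illegal_fixed():
    # Assumption: the intent was a set holding both names.
    # A set literal (or set([...]) with a single list) does that.
    return {"prot_to_loc", "prot-to-loc"}


try:
    build_illegal_broken()
except TypeError as err:
    print("broken call:", err)

illegal = build_illegal_fixed()
print("prot_to_loc" in illegal, "prot-to-loc" in illegal)
```

Note that passing a single string, as in `set("prot_to_loc")`, would not help either: it builds a set of the individual characters, not a one-element set.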