Multi-Core support #39
Hi Frankie, just a quick note to say thanks for taking a look at this and submitting a pull request. I ran a few tests today (which I didn't quite finish) and will comment in more detail tomorrow. |
Right, here are some more details. Again, many thanks for giving this a whirl.

So far I have refrained from adding the multi-core option, mainly because I thought it might break the paired-end option completely and therefore require a complete rewrite of Trim Galore. From my tests it appears that giving Cutadapt several cores with your new option [...] From all that I saw the new option [...]

The attached graph shows that using 2 cores for parallel trimming decreased the processing time by ~30% for single-end and ~20% for paired-end trimming. Anything beyond 2 cores does not make any difference time-wise, but eats up additional resources. I suppose a bottleneck here could be either that Cutadapt is trying to keep the reads in the same order in which they were fed in (which takes time), or that the additional checks (e.g. 5' or 3' trimming) and the writing out within Trim Galore limit the speed at which reads are trimmed. The Cutadapt documentation also mentions that one needs to have [...]

Apart from the limitations described in the Cutadapt documentation, the multi-core option has the additional issue that the parallel trimming mode requires Python 3.3 or later. Since we were running Cutadapt with Python 2.7, the whole Trim Galore process died (not very gracefully) because Cutadapt died internally. If we wanted to implement this option, we would want a check in Trim Galore that first tests whether Python 2 or 3 is being used, and terminates multi-core runs if the Cutadapt or Python version in use doesn't support it. Given how much trouble one can run into with Python 2 and Python 3 installations and only one [...]

What would be useful in the long run would be to change the paired-end processing mode completely to use the Cutadapt option of handling paired-end reads together (which didn't exist when we wrote Trim Galore back in 2011 or so...). This would avoid the sequential SE trimming and validation step, and would therefore certainly decrease PE processing time substantially, probably far more than the multi-core option. This has been discussed previously (e.g. here), but so far I haven't found the time to look at it further.

Going forward with this pull request, I would say that we should at least add checks for the version of Python (2/3.3+) and Cutadapt so that Trim Galore can fail more gracefully if someone attempts to use it. In addition, the option should probably be limited to 2 cores (as more than 2 doesn't appear to make sense), and we should add some help and warning text about this option. What are your thoughts about it? Best, Felix |
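The checks proposed in the comment above (Python 2 versus 3.3+, plus a cap of 2 cores) could look roughly like the following. This is a minimal Python sketch for illustration only, not Trim Galore's actual code (Trim Galore itself is written in Perl). The names `parse_python_version` and `effective_cores`, and the idea of passing in the text printed by `python --version`, are assumptions made for this example.

```python
import re

# Parallel trimming needs Python 3.3 or later, per the comment above.
MIN_PYTHON_FOR_MULTICORE = (3, 3)
# More than 2 cores gave no further speed-up in the tests described above.
MAX_USEFUL_CORES = 2


def parse_python_version(version_output):
    """Extract (major, minor) from text such as 'Python 2.7.12'."""
    match = re.search(r"(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("could not parse a version from: %r" % version_output)
    return int(match.group(1)), int(match.group(2))


def effective_cores(requested, version_output):
    """Return the core count to pass on, or raise with a clear message."""
    if requested < 1:
        raise ValueError("number of cores must be at least 1")
    if requested == 1:
        # Single-core runs work with any Python version.
        return 1
    if parse_python_version(version_output) < MIN_PYTHON_FOR_MULTICORE:
        # Fail early and clearly instead of letting the trimmer die internally.
        raise RuntimeError("multi-core trimming requires Python 3.3 or later")
    return min(requested, MAX_USEFUL_CORES)
```

Raising before any trimming starts is one way to get the more graceful failure asked for here; silently falling back to a single core would be an alternative design.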
…cutadapt as well as pigz for multicore (de)compression.
…ting multicore support accordingly. If python 2 detected, use cutadapt with one core.
… name after version, breaking the previous way
…utput pigz version to screen, just check return code
Hello @FelixKrueger, I have updated the Trim-Galore code to check for the version of python found in PATH, as this should be the python that cutadapt is using as well. I ran this in two separate conda environments, one with python 2.7.12 and the other with python 3.6.3. In the case of python 2.x.x, the number of cores will be 1, whereas with python 3.x.x it will be whatever is supplied with [...] After running some basic tests, it is clear that when pigz is installed and cutadapt is run with python 3.x.x, the speed-up is significant and grows with the number of cores. It will eventually be bottlenecked by writing to disk (with an HDD anyway), but this depends on your drive. Previous Version: |
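The behaviour described here (one core under Python 2, the requested number under Python 3, and detecting a helper tool by its exit status rather than by printing its version) can be sketched as follows. The helper names `command_succeeds` and `cores_for` are hypothetical and this is not the code from the pull request. The sketch only assumes that a probed command exits with status 0 when it is installed and working.

```python
import subprocess
import sys


def command_succeeds(argv):
    """Return True if the command can be started and exits with status 0."""
    try:
        result = subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,  # keep the tool's version text off the screen
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # Raised, for example, when the executable is not on PATH.
        return False
    return result.returncode == 0


def cores_for(python_major, requested):
    """Python 2 gets a single core; Python 3 gets whatever was requested."""
    return 1 if python_major < 3 else requested


if __name__ == "__main__":
    # Probe for pigz by return code only, then pick a core count.
    print("pigz available:", command_succeeds(["pigz", "--version"]))
    print("cores:", cores_for(sys.version_info[0], 4))
```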
Hi @fjames003, thanks for your work on this yesterday. I have cloned your version and run a few tests with it. Using [...] As it stands, your version is not working on our side because both tests for the version of Python and [...] I will nevertheless merge your version and attempt to solve the remaining issues this afternoon. Thanks again! Felix |
Hi Frankie, I have tried to get the multi-core support working on our cluster as well, and also for single-end files. I still need to do some testing, but overall I am quite happy with its performance. Would you mind cloning the latest dev version and letting me know if it also works in your hands? Thanks, Felix |
Hi @FelixKrueger, I like your idea of grabbing the shebang from the top of the [...] Thanks, Frankie |
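For readers unfamiliar with the shebang idea mentioned above: the first line of an executable script can name the interpreter that will run it, so reading that line is one way to find out which Python a script uses. The comment does not say here which file is meant, so `path` below is a placeholder, and `read_shebang` is a hypothetical helper rather than anything from Trim Galore.

```python
def read_shebang(path):
    """Return the interpreter command from a script's first line, or None."""
    with open(path, "rb") as handle:
        first_line = handle.readline()
    if not first_line.startswith(b"#!"):
        # No shebang: the file is not a script, or relies on the caller's interpreter.
        return None
    # Drop the leading "#!" and surrounding whitespace.
    return first_line[2:].decode("utf-8", errors="replace").strip()
```

The returned string (for instance an interpreter path, possibly with arguments) could then be run with `--version` to find out which Python version is in use.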
@FelixKrueger @fjames003 what was the verdict? (scoots to edge of my seat) |
Hmm let me see. It works? It is fast(er)? If you want it to also deal well with Cutadapt versions from 2011, I would suggest you get the current dev version (v0.6.1_dev). |
@FelixKrueger, don't you love having to handle people who are using 8-year-old versions of [...] @radlinsky, the verdict was that the multicore support increased performance quite a bit. If you are running on a machine with multiple cores or on a cluster, I would definitely take advantage of this. Just make sure you have [...] |
absolutely :P |
I have added options to trim-galore that take advantage of cutadapt's multicore capabilities.