symlink specific gridss index files #54
The genome_gridss_index folder may contain additional files that are also staged by Nextflow if they are part of the inputs (e.g. someone might have the .dict file in this folder if they created it via gridss.PrepareReference). This causes an error when we try to symlink the entire contents of the folder. Could we change this line to symlink only the specific files of interest?
Thanks for opening the PR! I don't think this affects users whose GRIDSS index was prepared with the built-in functionality, but I understand the current setup may have issues for externally created GRIDSS indexes that contain file names clashing with other staged files, as you've pointed out. I'm not sure what the best approach is here: support only precisely the expected GRIDSS index fileset, or support all/common deviations from these expectations? I'm leaning towards the former but will accommodate if you feel other users may experience the same issue. If that's the case, I'd suggest one of two approaches:
While the former requires strong alignment with the expected files, I'd probably prefer it since I find it safer than forceful replacement. And if making this change, it would be good to apply it to all instances where the GRIDSS index is used, for consistency.
Hi Stephen, thanks for looking into this! I like the first option of explicitly including all index files. However, based on the current module, the genome_bwa_index folder isn't staged. Therefore only the files in the genome_gridss_index folder will be picked up (I think that's just the img and gridsscache files).
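For illustration, the "explicit files" option discussed above could look roughly like the sketch below. It is not the code from this PR: the helper name `link_index_files`, the demo paths, and the extension list are assumptions (the thread only confirms the img and gridsscache files), so adjust them to the actual index fileset.

```shell
set -eu

# Hypothetical helper: link only known index files from an index directory
# into a destination directory, ignoring everything else (e.g. a stray .dict).
link_index_files() {
  index_dir=$1
  dest_dir=$2
  # Assumed extension list: standard BWA index outputs plus the two GRIDSS files.
  for ext in amb ann bwt pac sa img gridsscache; do
    for path in "$index_dir"/*."$ext"; do
      # Skip the unexpanded pattern when no file has this extension.
      [ -e "$path" ] || continue
      ln -s "$path" "$dest_dir/"
    done
  done
}

# Demo in a throwaway directory (all paths are made up).
demo=$(mktemp -d)
mkdir "$demo/gridss_index" "$demo/work"
touch "$demo/gridss_index/genome.fa.img" \
      "$demo/gridss_index/genome.fa.gridsscache" \
      "$demo/gridss_index/genome.dict"
# A .dict that is already present in the work directory, as in the reported clash.
touch "$demo/work/genome.dict"

link_index_files "$demo/gridss_index" "$demo/work"
```

Because only the listed extensions are linked, the pre-existing genome.dict in the work directory is never touched, whereas linking the whole folder with a wildcard would stop with a "File exists" error.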
Ah, the reference genome indexes have been rearranged recently and the new GRIDSS index directory contains the following files:
Current GRIDSS index directory for GRCh38_hmf
Sorry for the confusion!
Okay, let's make this change then. Can you adjust each of the symlink/find commands for the GRIDSS index directory on your PR branch according to the first option above and also change the merge PR base branch to |
Thanks Stephen, sounds good!
No, the user will need to place all the BWA index files under the same GRIDSS index directory themselves. I think this is okay since (1) it is required to create the GRIDSS index files anyway, and (2) the BWA index files are not used anywhere else in the workflow. Once the changes have been made, let's review and test!
Accommodates users using externally created GRIDSS indexes that may contain clashing file names.
Thanks Stephen for explaining, agree with that logic!
I'm going to rebase your commits on top of I noticed I made a typo in the |
6a26bee to fb49fd0
The 'Run pipeline stubs' check is showing that the new back-slashes need to be escaped:
Escaped back-slashes.
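As a hedged illustration of where such backslashes typically come from (the actual command from this PR is not shown here), a `find` call that selects specific files uses backslash-escaped parentheses and a backslash-escaped terminating semicolon. The directory names and `-name` patterns below are assumptions.

```shell
set -eu

# Throwaway demo directory (all paths are made up).
tmp=$(mktemp -d)
mkdir "$tmp/gridss_index" "$tmp/work"
touch "$tmp/gridss_index/genome.fa.img" \
      "$tmp/gridss_index/genome.fa.gridsscache" \
      "$tmp/gridss_index/genome.dict"

# \( ... \) groups the -name tests and \; ends -exec; the backslashes stop the
# shell from interpreting these characters before find sees them.
find -L "$tmp/gridss_index" -maxdepth 1 -type f \
  \( -name '*.img' -o -name '*.gridsscache' \) \
  -exec ln -s {} "$tmp/work/" \;
```

In plain shell each escape is a single backslash. Inside a Nextflow script string the backslash is itself an escape character, so each one would presumably need to be doubled (`\\(`, `\\)`, `\\;`), which matches what the stub check flagged.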
Okay, changes now look good. I reproduced the issue with the |
genome_gridss_index folder may contain additional files that are staged by nextflow if they are part of the inputs (e.g. someone might have the .dict file in this folder if they created it via gridss.PrepareReference). This causes an error when we try to symlink the entire contents of the folder. Suggestion: symlink only specific files of interest.