Fix HISAT2 strandedness parameter #1470
Can we add a test and increase the version number?
Hi @bgruening, thanks for taking a look at this. I've added a test and increased the version number. Something I don't understand yet about wrapper versioning: why did I not have to bump the version number for the kallisto bugfix #1463, but I do need to for this one (or should I really have bumped the kallisto version)? I added a test but I don't know if it's a good one, as all it does is check that a BAM is output when the rna_strandedness parameter is set to F. I was thinking a better test could be to check that the header of the output BAM contains "--rna-strandness FR". Is it possible to check the text of a BAM file header in a test?
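The header check proposed above can be sketched outside the Galaxy test framework. This is a minimal Python sketch, assuming the header text has already been extracted (for example with `samtools view -H`) and that the aligner records its command line in the `CL:` field of a `@PG` line. The function name and the example header are hypothetical.

```python
def header_has_option(sam_header: str, option: str, value: str) -> bool:
    """Return True if a @PG line's CL: field contains `option` followed by `value`."""
    for line in sam_header.splitlines():
        if not line.startswith("@PG"):
            continue
        for field in line.split("\t")[1:]:
            if not field.startswith("CL:"):
                continue
            # Compare whole tokens so that "F" does not match "FR".
            tokens = field[3:].strip('"').split()
            for current, following in zip(tokens, tokens[1:]):
                if current == option and following == value:
                    return True
    return False


# Hypothetical header text, for illustration only.
example_header = (
    "@HD\tVN:1.0\tSO:unsorted\n"
    "@SQ\tSN:phiX174\tLN:5386\n"
    '@PG\tID:hisat2\tPN:hisat2\tVN:2.1.0\t'
    'CL:"hisat2 --rna-strandness FR -x genome -1 f.fq -2 r.fq"\n'
)

print(header_has_option(example_header, "--rna-strandness", "FR"))
```

Matching on whole tokens rather than substrings matters here, because a test that sets `F` on paired-end data should fail if the command line ends up with anything other than `FR`.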
tools/hisat2/hisat2_macros.xml
Outdated
@@ -78,7 +78,15 @@
<token name="@strandedness_parameters@">
#if str($spliced_options.spliced_options_selector) == "advanced":
#if str($spliced_options.rna_strandness).strip() != '':
--rna-strandness $spliced_options.rna_strandness
#if str($input_format.paired.paired_selector) == 'paired':
Indentation.
Do you use a tool for checking indentation? My eyes don't seem to be very reliable :(
@mblue9 Unfortunately I'm not aware of a tool to check indentation for the Cheetah code inside XML, so basically it's my eyes (on GitHub) or searching for 4 spaces in the text editor.
Ok, thanks for the tips!
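The "search for 4 spaces" approach can be roughly automated. The sketch below is a hypothetical helper, not an existing linter: it only flags tabs and leading whitespace that is not a multiple of four spaces, and it cannot tell whether the nesting depth itself is right. The Cheetah variables in the sample are placeholders.

```python
def find_indent_problems(text: str, width: int = 4) -> list:
    """Return (line_number, message) pairs for suspicious leading whitespace."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            problems.append((lineno, "tab in indentation"))
        elif len(indent) % width != 0:
            problems.append(
                (lineno, f"indent of {len(indent)} spaces is not a multiple of {width}")
            )
    return problems


# Placeholder Cheetah snippet; line 3 is indented by 6 spaces.
sample = (
    "#if str($a) == 'x':\n"
    "    --flag $a\n"
    "      --other $b\n"
    "#end if\n"
)

for lineno, message in find_indent_problems(sample):
    print(f"line {lineno}: {message}")
```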
tools/hisat2/hisat2.xml
Outdated
@@ -464,7 +464,7 @@
<output name="output_unaligned_reads_l" file="test_unaligned_reads.fasta" />
<output name="output_unaligned_reads_r" file="test_unaligned_reads.fasta" />
</test>
<test><!-- Ensure fastqsanger.gz works -->
I'd leave the comment here and below, any specific reason to remove them?
No reason to remove so I put those comments back. I only removed them because they were getting in the way of me commenting out tests while doing my own testing. I just forgot to put them back. I put them back after your review here but then I removed them again for more testing and then forgot to put them back again :( They're back again now.
tools/hisat2/hisat2.xml
Outdated
<param name="history_item" ftype="fasta" value="phiX.fa" />
<param name="forward" ftype="fastqsanger" value="hisat_input_1_forward.fastq" />
<param name="reverse" ftype="fastqsanger" value="hisat_input_1_reverse.fastq" />
<param name="rna_strandedness" value="F" />
s/rna_strandedness/rna_strandness/
(This may actually cause the output BAM to be different, I imagine.)
Well, this highlighted that my test really was terrible! That strandness parameter was never used because of the typo, so my test was checking against an incorrect BAM! The lines diff check doesn't catch that but checking the header would, so if there is ever a way in the future to check a BAM header, that could help make tests like this one more robust.
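A typo like this one could in principle be caught before any BAM is compared, by checking that every parameter name used in a test is actually declared in the tool's inputs. The following is a rough Python sketch of that idea; the function name and the tiny example tool are hypothetical, and it assumes a single self-contained XML file (it does not expand macros, which the real HISAT2 wrapper relies on).

```python
import xml.etree.ElementTree as ET


def unknown_test_params(tool_xml: str) -> list:
    """List test <param> names that match no named element under <inputs>."""
    root = ET.fromstring(tool_xml)
    inputs = root.find("inputs")
    tests = root.find("tests")
    if inputs is None or tests is None:
        return []
    declared = {el.get("name") for el in inputs.iter() if el.get("name")}
    # Test params may be written as "conditional|param"; keep the last part.
    used = {
        p.get("name").split("|")[-1]
        for p in tests.iter("param")
        if p.get("name")
    }
    return sorted(used - declared)


# Hypothetical, heavily simplified tool definition.
EXAMPLE_TOOL = """<tool id="demo">
  <inputs>
    <conditional name="spliced_options">
      <param name="spliced_options_selector" type="select"/>
      <when value="advanced">
        <param name="rna_strandness" type="select"/>
      </when>
    </conditional>
  </inputs>
  <tests>
    <test>
      <param name="spliced_options_selector" value="advanced"/>
      <param name="rna_strandedness" value="F"/>
    </test>
  </tests>
</tool>"""

print(unknown_test_params(EXAMPLE_TOOL))
```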
I'm not sure this was ever formalised, but you generally need to increase the version number for:
You don't need to increase the version number for:
Changing the command (like in this case) is borderline, because it doesn't hinder workflow compatibility, but changes the output, so it may be a problem for reproducibility, if you really want to reproduce wrong results! |
Hi there, great to see things getting picked up :) I am not sure about this fix. It looks like the problem is more the way the input is requested. Currently 'rna_strandness' is part of the spliced options. Would it not be better to put this close to the input format stuff? For example in a paired-end/single-end conditional?
Then in the command you can run something like this, without hacking your values back:
|
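For readers following the bug itself: HISAT2's `--rna-strandness` expects `F` or `R` for single-end reads and `FR` or `RF` for paired-end reads, so a value chosen without knowing the library layout has to be translated. The Python sketch below only illustrates that mapping. It is not the Cheetah used in the wrapper, and the function names are hypothetical.

```python
def rna_strandness_value(first_read_strand: str, paired: bool) -> str:
    """Translate a single-letter strand choice into the value HISAT2 expects."""
    if first_read_strand not in ("F", "R"):
        raise ValueError(f"expected 'F' or 'R', got {first_read_strand!r}")
    if not paired:
        return first_read_strand
    # For paired-end data the second letter is the opposite strand.
    return "FR" if first_read_strand == "F" else "RF"


def strandness_args(first_read_strand: str, paired: bool) -> list:
    """Build the command-line fragment; an empty choice means unstranded."""
    if not first_read_strand:
        return []
    return ["--rna-strandness", rna_strandness_value(first_read_strand, paired)]


print(strandness_args("F", paired=True))
print(strandness_args("R", paired=False))
print(strandness_args("", paired=True))
```

Asking for the strand inside the paired-end/single-end conditional, as suggested above, would make this translation unnecessary, because each branch could offer the correct values directly.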
- fixed the rna-strandness bug, now HISAT2 works correctly with paired-end stranded reads
- made the HISAT2 wrapper more consistent with the Bowtie2 wrapper
- removed parameters not listed in HISAT2 options (e.g. --ma, --dovetail)
- added some more parameters as options (e.g. no-templatelen-adjustment)
- updated HISAT2 version to 2.1.0 (and also updated the test data outputs)
- added options to output new summary format and summary file
- added test for the summary file output
- updated wrapper version
- added sections
To fix this bug I've tried to make the change to how the input is requested, as suggested by @yhoogstrate (I thought that was a really good suggestion), however this was a more complicated wrapper than what I've worked on so far and I got a bit tangled up in errors for a bit.
Now I don't know if in the process of trying to fix this bug I've made too many changes!! If you think so let me know and I can change things back. |
@mblue9 I see two strand selection drop downs for the paired-end and paired-end (collection) data: Is this intended?
This is great but it doesn't need to test every (e.g. boolean) parameter because there are quite a lot. Conditionals are more important to test since they may introduce bugs if not all 's are correctly implemented. What I also prefer are small comments explaining what's being tested (like |
Thanks for the review @yhoogstrate! The two drop downs are intended, in that both options are given by HISAT: one is for whether the sequencing assay was stranded (the more common option I would say), the other is generally --fr for paired-end sequencing, but can differ for mate-pair sequencing, see here: https://gatkforums.broadinstitute.org/gatk/discussion/6327/paired-end-mate-pair
Thanks for the info on the tests! I'll take another look at what conditionals I should test and add some comments to the tests (I agree, they're helpful for quickly seeing what the tests are doing). |
Hi @yhoogstrate I've now:
|
Hi @mblue9, great, all the changes make me really happy! Sorry to say but I don't have time to go over everything today, but I will try to do this asap! This absolutely deserves a careful review.
I get it now and I should have seen this before shouting out loud. |
😄 thanks @yhoogstrate glad they make you happy. I did make a lot of changes, including updating the test data, so it would be great to have a careful review from a few pairs of 👀 |
I added some changes myself:
I still see a small thing that could be changed in the hisat2 wrapper. The 'disallow softclipping' option asks the user to disallow something. Such settings are always subject to discussion, but over several courses we have learned it is often easier to rephrase this as 'allow softclipping' and set it to yes/true by default. This does not exactly mirror the command-line input but is more intuitive. You have recently suggested updating the IUC standards and added:
|
That's been possible for a while and is not related to the tool XML syntax/interpretation (it's just for testing), so it doesn't require the new profile. |
Thanks @nsoranzo and @yhoogstrate for the detailed feedback! And the edits! Especially like the XS info in the help. @yhoogstrate your other suggested changes sound good to me too so I've made them. In this commit I've:
|
Awesome stuff! Thanks a bunch @mblue9!!! And thanks for the review @yhoogstrate and @nsoranzo. |
Great!! Will this new version get picked up soon by Main? This bug will be affecting the results of people there who are using HISAT2 with paired-end stranded data.
P.S. The GTN tutorials using HISAT2 will need to be updated (screenshots and text) if they're using the new version of the tool, I just created an issue for that: galaxyproject/training-material#552 |
Submitting this PR to fix the bug reported here #1455
The suggested fix is from the reporter @NCEichner