Fix HISAT2 strandedness parameter #1470

mblue9 · 2017-09-08T22:06:14Z

Submitting this PR to fix the bug reported here #1455

The suggested fix is from the reporter @NCEichner

bgruening · 2017-09-11T11:34:46Z

Can we add a test and increase the version number?
Thanks @mblue9!

mblue9 · 2017-09-12T01:03:28Z

Hi @bgruening thanks for taking a look at this. I've added a test and increased the version number.

Something I don't understand yet with the wrapper versioning is how come I did not have to bump the version number for this kallisto bugfix #1463 but I do need to now for this bugfix (or should I really have bumped the kallisto version?)?

I added a test but I don't know if it's a good one, as all it does is check that a bam is output when the rna_strandedness parameter is set to F. I was thinking a better test could be to check that the header of the bam in the output contains "--rna-strandness FR". Is it possible to check the text of a BAM file header in a test?

nsoranzo · 2017-09-12T06:51:19Z

tools/hisat2/hisat2_macros.xml

@@ -78,7 +78,15 @@
    <token name="@strandedness_parameters@">
        #if str($spliced_options.spliced_options_selector) == "advanced":
            #if str($spliced_options.rna_strandness).strip() != '':
-                --rna-strandness $spliced_options.rna_strandness
+               #if str($input_format.paired.paired_selector) == 'paired':


Indentation.

Do you use a tool for checking indentation? my eyes don't seem to be very reliable :(

@mblue9 Unfortunately I'm not aware of tool to check indentation for the Cheetah code inside XML, so basically it's my eyes (on GitHub) or searching for 4 spaces in the text editor.

@mblue9 no worries, @nsoranzo just has special eagle-eye superpowers the rest of us don't possess ;) 🦅 👀

(but most editors have a setting to show whitespace characters and/or indentation guides, which helps for me)

Ok, thanks for the tips!

nsoranzo · 2017-09-12T09:54:20Z

tools/hisat2/hisat2.xml

@@ -464,7 +464,7 @@
            <output name="output_unaligned_reads_l" file="test_unaligned_reads.fasta" />
            <output name="output_unaligned_reads_r" file="test_unaligned_reads.fasta" />
        </test>
-        <test><!-- Ensure fastqsanger.gz works -->


I'd leave the comment here and below, any specific reason to remove them?

No reason to remove so I put those comments back. I only removed them because they were getting in the way of me commenting out tests while doing my own testing. I just forgot to put them back. I put them back after your review here but then I removed them again for more testing and then forgot to put them back again :( They're back again now.

nsoranzo · 2017-09-12T09:58:18Z

tools/hisat2/hisat2.xml

+            <param name="history_item" ftype="fasta" value="phiX.fa" />
+            <param name="forward" ftype="fastqsanger" value="hisat_input_1_forward.fastq" />
+            <param name="reverse" ftype="fastqsanger" value="hisat_input_1_reverse.fastq" />
+            <param name="rna_strandedness" value="F" />


s/rna_strandedness/rna_strandness/
This may actual cause the output bam to be different, I imagine)

Well this highlighted that my test really was terrible! That strandness parameter was never used because of the typo so my test was checking against a incorrect bam! The lines diff check doesn't catch that but checking the header would, so if there is ever a way in the future to check a bam header that could be cool to help make tests like this one more robust.

nsoranzo · 2017-09-12T11:35:09Z

Something I don't understand yet with the wrapper versioning is how come I did not have to bump the version number for this kallisto bugfix #1463 but I do need to now for this bugfix (or should I really have bumped the kallisto version?)?

I'm not sure this was ever formalised, but you generally need to increase the version number for:

modification of requirements
adding/removing params and outputs, or changing their type
everything else that may change how the tool works in a workflow.

You don't need to increase the version number for:

cosmetic changes, e.g. modifying tool name, description or help, param/output label or help

Changing the command (like in this case) is borderline, because it doesn't hinder workflow compatibility, but changes the output, so it may be a problem for reproducibility, if you really want to reproduce wrong results!
In my opinion, these changes don't need a version bump, but it's fine if you do it.

yhoogstrate · 2017-09-12T13:45:59Z

Hi there, great to see things getting picked up :)

I am not sure on this fix. It looks like the problem is more the way the input is requested. Currently the 'rna_strandness' is part of the spliced options. Would it not be better to put this close to the input format stuff? For example in a paired-end/sngle-end conditional?

<conditional name="mate_pairs">
	<input type=select> 
		<option value="single-end">
		<option value="paired-end">
	</option>

	<when pe>
		<conditional name="fastx" label="fasta/fastq">
			<input files>
			<input files>
		</conditional>
		
		<input name="strandness" help="strandness selection for pe">
			<option value="FR">
			<option value="RF">
		</input>
	</when>
	
	<when se>
		<conditional name="fastx" label="fasta/fastq">
			<input files>
		</conditional>
		
		<input name="strandness" help="strandness selection for se">
			<option value="F">
			<option value="R">
		</input>
	</when>
</conditional>

Then in the command you can run something like this, without hacking your values back:

 --rna-strandness {$mate_pairs.strandness}

- fixed the rna-strandness bug, now HISAT2 works correctly with paired-end stranded reads - made the HISAT2 wrapper more consistent with the Bowtie2 wrapper - removed parameters not listed in HISAT2 options (e.g. --ma, --dovetail) - added some more parameters as options (e.g. no-templatelen-adjustment) - updated HISAT2 version to 2.1.0 (and also updated the test data outputs) - added options to output new summary format and summary file - added test for the summary file output - updated wrapper version - added sections

mblue9 · 2017-09-17T09:27:01Z

To fix this bug I've tried to make the change to how the input is requested, as suggested by @yhoogstrate (I thought that was a really good suggestion), however this was a more complicated wrapper than what I've worked on so far and I got a bit tangled up in errors for a bit.
But I managed to get myself out with the help of the Bowtie2 wrapper and in the process ended up making the HISAT2 wrapper more like the Bowtie2 one, as I found that wrapper easier to follow. I'm not sure if what I've done here is ok as I've made a lot of changes, in these new commits I've:

fixed the strandedness bug, now HISAT2 works correctly with paired-end stranded reads
made the HISAT2 wrapper more consistent with the Bowtie2 wrapper (I've changed the outputs so not sure if it might also fix this issue reported here HISAT2 in workflows: renaming output is broken #1478)
removed parameters not listed in HISAT2 options (e.g. --ma, --dovetail)
added some more parameters as options (e.g. no-templatelen-adjustment)
updated HISAT2 version to 2.1.0 (and also updated the test data outputs)
added options to output new summary format and summary file (the new summary format is compatiable with MultiQC https://github.com/ewels/MultiQC/blob/master/docs/modules/hisat2.md)
added test for the summary file output
updated wrapper version
added sections (I like the sections for reducing clutter but don't know if it's ok to add them like I've done here?)
I started adding in validators to the advanced options but then thought I should check if I should continue with this?

Now I don't know if in the process of trying to fix this bug I've made too many changes!! If you think so let me know and I can change things back.

yhoogstrate · 2017-09-18T07:19:54Z

@mblue9 I see two strand selection drop downs for the paired-end and paired-end (collection) data:

Is this intended?

I started adding in validators to the advanced options but then thought I should check if I should continue with this?

This is great but it doesn't need to test every (e.g. boolean) parameter because there are quite a lot. Conditionals are more important to test since they may introduce bugs if not all 's are correctly implemented. What I also prefer are small comments explaining what's being tested (like ).

mblue9 · 2017-09-18T21:25:58Z

Thanks for the review @yhoogstrate!

The two drop downs are intended, in that both options are given by HISAT- one is for if the sequencing assay was stranded (the more common option I would say), the other is generally --fr for paired-end sequencing, but can differ for mate-pair sequencing, see here: https://gatkforums.broadinstitute.org/gatk/discussion/6327/paired-end-mate-pair

In general, paired-end reads tend to be in a FR orientation, have relatively small inserts (~300 - 500 bp), and are particularly useful for the sequencing of fragments that contain short repeat regions. Mate-pair fragments are generally in a RF conformation, contain larger inserts (~3 kb), and enable sequence coverage of genomic regions containing large structural rearrangements. Tandem reads can result from inversions and rearrangements during library preparation.

Thanks for the info on the tests! I'll take another look at what conditionals I should test and add some comments to the tests (I agree, they're helpful for quickly seeing what the tests are doing).

mblue9 · 2017-09-19T04:32:51Z

Hi @yhoogstrate I've now:

added in more validators
added comments to the tests to try to explain what they're doing
tried to make the stranded option a little clearer in the help section (having two very similar parameters is confusing!)

yhoogstrate · 2017-09-20T10:31:59Z

Hi @mblue9, great, all the changes make me really happy! Sorry to say but I don't have time to go over everything today, but I will try to do this asap! This absolutely deserves a careful review.

one is for if the sequencing assay was stranded

I get it now and I should have seen this before shouting out loud.

mblue9 · 2017-09-21T05:49:05Z

😄 thanks @yhoogstrate glad they make you happy. I did make a lot of changes, including updating the test data, so it would be great to have a careful review from a few pairs of 👀

yhoogstrate · 2017-09-22T08:47:31Z

fixed the strandedness bug, now HISAT2 works correctly with paired-end stranded reads
- Nice
made the HISAT2 wrapper more consistent with the Bowtie2 wrapper (I've changed the outputs so not sure if it might also fix this issue reported here HISAT2 in workflows: renaming output is broken #1478)
- I am not sure but I believe the action type="metadata" is updated automatically in later versions of Galaxy. If this is indeed the case I believe all the dbkey metadata actions can be removed?
removed parameters not listed in HISAT2 options (e.g. --ma, --dovetail)
- I cross-checked this in the hisat code. The argument is still applicable [https://github.com/infphilo/hisat2/blob/master/hisat2.cpp#L659] but functionality is disabled [https://github.com/infphilo/hisat2/blob/master/hisat2.cpp#L1378] so this makes sense to me
added some more parameters as options (e.g. no-templatelen-adjustment)
- Good
updated HISAT2 version to 2.1.0 (and also updated the test data outputs)
- This makes sense too, also the appropriate lines_diff.
added options to output new summary format and summary file (the new summary format is compatiable with MultiQC https://github.com/ewels/MultiQC/blob/master/docs/modules/hisat2.md) + added test for the summary file output
- good and I really like that it is part of a selection box that is by default set to false
updated wrapper version
- this is necessary indeed
added sections (I like the sections for reducing clutter but don't know if it's ok to add them like I've done here?)
- I like this kind of structuring, but for example Select reference genome and Select reads could be merged together. If they are kept in a section I think they need to be expanded=True. I even wouldn't mind to take these out of sections.
I started adding in validators to the advanced options but then thought I should check if I should continue with this?
- Some are not necessary (in_range while min and max are enabled) but I think they provide better interactivity as validators give a direct response (correct me if I'm wrong). No opinion on this so it's good unless somebody else objects

I added some changes myself:

s/[ ]$// whitespace removal at the line ends
paired end strandness parameter moved into the paired end macro
A bit more and updated text to the help section of strandness parameter
Moved some stuff for int_quals from label to help

I still see a small thing that could be changed in the hisat2 wrapper. The disallow softclipping+ is requesting to disallow something. Such settings are always subject to discussion, but over several courses we have learned it is often easier to reprase this to 'allow softclipping' and put it by default on yes/true. This does not exactly reflect the commandline input but is more intuitive.

You have recently suggested to update the iuc standards and added:

Provide a tool_data_table_conf.xml.test file, which is an uncommented version of tool_data_table_conf.xml.sample containing the path to the loc file for testing: <file path="${__HERE__}/test-data/data_table_name.loc" />
I think this is a great oppurtinity to showcase ;) (I suppose this requires profile="17.05"?)

nsoranzo · 2017-09-22T17:00:32Z

Some are not necessary (in_range while min and max are enabled) but I think they provide better interactivity as validators give a direct response (correct me if I'm wrong). No opinion on this so it's good unless somebody else objects

<validator type="in_range" min="0" /> and min="0" attribute added directly to the param element are completely equivalent, with the second one being more concise and probably preferred. So I'd remove the redundant in_range validators.

Provide a tool_data_table_conf.xml.test file, which is an uncommented version of tool_data_table_conf.xml.sample containing the path to the loc file for testing:
I think this is a great oppurtinity to showcase ;) (I suppose this requires profile="17.05"?)

That's been possible for a while and is not related to the tool XML syntax/interpretation (it's just for testing), so it doesn't require the new profile.

mblue9 · 2017-09-26T07:01:19Z

Thanks @nsoranzo and @yhoogstrate for the detailed feedback! And the edits! Especially like the XS info in the help. @yhoogstrate your other suggested changes sound good to me too so I've made them.

In this commit I've:

removed the no longer necessary actions=metadata dbkey from outputs, and also all the other actions from outputs (they are apparently also not necessary as the tests pass without them?)
removed the sections around reference genome and reads. I think that should also fix this workflow issue here: HISAT2 in workflows: renaming output is broken #1478 (although am still wondering if that's the best strategy for renaming samples or if there's a better way)
removed the redundant "in_range" with "min" validators
changed to Allow soft-clipping = true rather than Disallow softclipping = false (I agree that seems better)
added the tool_data_table_conf.xml.test file and created a test for using an in-built reference genome

bgruening · 2017-09-26T19:17:02Z

Awesome stuff! Thanks a bunch @mblue9!!! And thanks for the review @yhoogstrate and @nsoranzo.

mblue9 · 2017-09-27T04:15:37Z

Great!! Will this new version get picked up soon by Main? As this bug will be affecting the results of people there who are using HISAT2 with paired-end stranded data.

mblue9 · 2017-09-27T04:30:13Z

P.S. The GTN tutorials using HISAT2 will need to be updated (screenshots and text) if they're using the new version of the tool, I just created an issue for that: galaxyproject/training-material#552

fix strandedness parameter

9dbc8ef

mblue9 mentioned this pull request Sep 8, 2017

HISAT2: error in option --rna-strandness with paired-end data #1455

Closed

add strandedness test and bump version

823fe2b

nsoranzo reviewed Sep 12, 2017

View reviewed changes

fix indentation

1dee782

nsoranzo reviewed Sep 12, 2017

View reviewed changes

mblue9 added 3 commits September 17, 2017 19:54

remove word accidentally inserted

619b3b0

remove lines accidentally left in

d4e9996

fix summary options and add comments back

3073d44

fixes from review

8e4d6e3

yhoogstrate added 2 commits September 22, 2017 10:26

small changes to hisat

07e9976

hisat2 int_quals help fix

d8a64b9

more fixes from feedback

200afcb

quotes

d16f91c

yhoogstrate approved these changes Sep 26, 2017

View reviewed changes

bgruening merged commit 118b80f into galaxyproject:master Sep 26, 2017

nsoranzo mentioned this pull request Sep 27, 2017

HISAT2 in workflows: renaming output is broken #1478

Closed

mblue9 deleted the fix-hisat2 branch January 19, 2018 04:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix HISAT2 strandedness parameter #1470

Fix HISAT2 strandedness parameter #1470

mblue9 commented Sep 8, 2017

bgruening commented Sep 11, 2017

mblue9 commented Sep 12, 2017

nsoranzo Sep 12, 2017

mblue9 Sep 17, 2017

nsoranzo Sep 20, 2017

shiltemann Sep 20, 2017 •

edited

Loading

mblue9 Sep 21, 2017

nsoranzo Sep 12, 2017

mblue9 Sep 17, 2017

nsoranzo Sep 12, 2017 •

edited

Loading

mblue9 Sep 17, 2017

nsoranzo commented Sep 12, 2017

yhoogstrate commented Sep 12, 2017 •

edited

Loading

mblue9 commented Sep 17, 2017

yhoogstrate commented Sep 18, 2017 •

edited

Loading

mblue9 commented Sep 18, 2017

mblue9 commented Sep 19, 2017

yhoogstrate commented Sep 20, 2017

mblue9 commented Sep 21, 2017

yhoogstrate commented Sep 22, 2017 •

edited

Loading

nsoranzo commented Sep 22, 2017

mblue9 commented Sep 26, 2017

bgruening commented Sep 26, 2017

mblue9 commented Sep 27, 2017

mblue9 commented Sep 27, 2017

Fix HISAT2 strandedness parameter #1470

Fix HISAT2 strandedness parameter #1470

Conversation

mblue9 commented Sep 8, 2017

bgruening commented Sep 11, 2017

mblue9 commented Sep 12, 2017

nsoranzo Sep 12, 2017

Choose a reason for hiding this comment

mblue9 Sep 17, 2017

Choose a reason for hiding this comment

nsoranzo Sep 20, 2017

Choose a reason for hiding this comment

shiltemann Sep 20, 2017 • edited Loading

Choose a reason for hiding this comment

mblue9 Sep 21, 2017

Choose a reason for hiding this comment

nsoranzo Sep 12, 2017

Choose a reason for hiding this comment

mblue9 Sep 17, 2017

Choose a reason for hiding this comment

nsoranzo Sep 12, 2017 • edited Loading

Choose a reason for hiding this comment

mblue9 Sep 17, 2017

Choose a reason for hiding this comment

nsoranzo commented Sep 12, 2017

yhoogstrate commented Sep 12, 2017 • edited Loading

mblue9 commented Sep 17, 2017

yhoogstrate commented Sep 18, 2017 • edited Loading

mblue9 commented Sep 18, 2017

mblue9 commented Sep 19, 2017

yhoogstrate commented Sep 20, 2017

mblue9 commented Sep 21, 2017

yhoogstrate commented Sep 22, 2017 • edited Loading

nsoranzo commented Sep 22, 2017

mblue9 commented Sep 26, 2017

bgruening commented Sep 26, 2017

mblue9 commented Sep 27, 2017

mblue9 commented Sep 27, 2017

shiltemann Sep 20, 2017 •

edited

Loading

nsoranzo Sep 12, 2017 •

edited

Loading

yhoogstrate commented Sep 12, 2017 •

edited

Loading

yhoogstrate commented Sep 18, 2017 •

edited

Loading

yhoogstrate commented Sep 22, 2017 •

edited

Loading