-
Notifications
You must be signed in to change notification settings - Fork 441
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
interaction:fixed effects and corrections modified #5928
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @renu-pal.
One question inline. Please also bump the tool version.
tools/maaslin2/maaslin2.xml
Outdated
@@ -55,7 +55,7 @@ cd outputFolder && mkdir -p figures/ && cp *.pdf figures | |||
<inputs> | |||
<param name="input_data" type="data" format="tabular" label="Data (or features) file"/> | |||
<param name="input_metadata" type="data" format="tabular" label="Metadata file"/> | |||
<param argument="--fixed_effects" type="select" multiple="true" optional="true" label="Interactions: Fixed effects" help="The fixed effects for the model, comma-delimited for multiple effects"> | |||
<param argument="--fixed_effects" type="text" multiple="true" optional="true" label="Interactions: Fixed effects" help="The fixed effects for the model, comma-delimited for multiple effects"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why should this be a text paraeter? We should probably fix the help instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can do that now but if more association factors get added in the future, we might have to make changes again as we are doing now. So instead take it as a text parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, the fixed_effects
as well as the random_effects
should be of type: data_column
(see:
tools-iuc/tools/bedtools/overlapBed.xml
Line 17 in 4e3719a
<param name="cols" type="data_column" data_ref="input" multiple="true" |
Since for both multiple columns from the input can be chosen. The original optional values are nonsense, since they hard-coded the column names of the test data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am getting column numbers instead of names after using type="data_column" . Do we have any other type where we can get names of columns directly instead of numbers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you tried setting numerical=false? https://docs.galaxyproject.org/en/latest/dev/schema.html#id51
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
setting numerical="false" did not work instead setting use_header_names="true" worked and UI shows column names now but the test cases are failing. Do not understand why it is still accepting numerical values as valid options for tests. So working on that now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could be wrong, but the use_header_names="true" option is really just cosmetic and for the UI, you still need to give numbers in the tests.
Very good idea to use use_header_names="true"!!!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hi @renu-pal it was a surprise for me, that there is not easy way to get the column_names available for the command line from Galaxy, however after some fideling I found a way to get them from the input file:
## get column names of fixed and random effect from the input file, since galaxy
## can only return indices with type="data_column"
## get header
#set $input = open(str($input_metadata), 'r')
#set $header = $input.readlines()[0].split('\t')
## get fixed effects
#set $fixed_effects_val = []
#for $i in $fixed_effects:
#silent $fixed_effects_val.append($header[int($i)])
#end for
#set $fixed_effects = ','.join($fixed_effects_val)
## get random effects
#set $random_effects_val = []
#for $i in $random_effects:
#silent $random_effects_val.append($header[int($i)])
#end for
#set $random_effects = ','.join($random_effects_val)
Maybe let the user know in the help, that the column names must be in the first line of the table or modify the code to skip comments when parsing the header
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
forget about that, we need to get the column names in the command line not the cheetah code...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So update, this way, using awk the file is only parsed on script execution
## get column names of fixed and random effect from the input file, since galaxy
## can only return indices with type="data_column"
## using awk so that the file is only parsed on command line execution
#set idx = []
#for $i in $fixed_effects:
#silent idx.append(f'${i}')
#end for
#set idx_for_awk = ','.join(idx)
fixed_effects=`awk -v OFS=',' -F"\t" 'NR == 1 { print $idx_for_awk}' '$input_metadata'` &&
echo 'Assigned fixed effects as:' \$fixed_effects &&
#set idx = []
#for $i in $random_effects:
#silent idx.append(f'${i}')
#end for
#set idx_for_awk = ','.join(idx)
random_effects=`awk -v OFS=',' -F"\t" 'NR == 1 { print $idx_for_awk}' '$input_metadata'` &&
echo 'Assigned random effects as:' \$random_effects &&
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot @renu-pal ! Ones this is fixed, the tool is finally functional !
Ones they update the bioconda recipe: bioconda/bioconda-recipes#46915 we can also update the wrapper, but lets do this one first !
tools/maaslin2/maaslin2.xml
Outdated
@@ -55,7 +55,7 @@ cd outputFolder && mkdir -p figures/ && cp *.pdf figures | |||
<inputs> | |||
<param name="input_data" type="data" format="tabular" label="Data (or features) file"/> | |||
<param name="input_metadata" type="data" format="tabular" label="Metadata file"/> | |||
<param argument="--fixed_effects" type="select" multiple="true" optional="true" label="Interactions: Fixed effects" help="The fixed effects for the model, comma-delimited for multiple effects"> | |||
<param argument="--fixed_effects" type="text" multiple="true" optional="true" label="Interactions: Fixed effects" help="The fixed effects for the model, comma-delimited for multiple effects"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, the fixed_effects
as well as the random_effects
should be of type: data_column
(see:
tools-iuc/tools/bedtools/overlapBed.xml
Line 17 in 4e3719a
<param name="cols" type="data_column" data_ref="input" multiple="true" |
Since for both multiple columns from the input can be chosen. The original optional values are nonsense, since they hard-coded the column names of the test data.
tools/maaslin2/maaslin2.xml
Outdated
<option value="hochberg">hochberg</option> | ||
<option value="hommel">hommel</option> | ||
<option value="bonferroni">bonferroni</option> | ||
<option value="BH">BH</option> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add the full names for BH and BY as well, maybe also add a link to the explanation to the help and check if they all work, maybe add a test or two as well
@renu-pal that means you need to increase the The |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update to work with no columns selected
tools/maaslin2/macros.xml
Outdated
@@ -1,6 +1,6 @@ | |||
<?xml version="1.0"?> | |||
<macros> | |||
<token name="@TOOL_VERSION@">0.99.12</token> | |||
<token name="@TOOL_VERSION@">1.7.3</token> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@atm we are still at 0.99.12, no ?
## can only return indices with type="data_column" | ||
## using awk so that the file is only parsed on command line execution | ||
|
||
#set idx = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@renu-pal you said Yh it did a bit, but it's returning the complete list in case of 'random effect " when user does not choose any option
Could you try to fix that here, i.e. add an if statement, that does not choose any columns if no columns are specified by the user - and also add a test for it ...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, working on that
warning : the correction methods other than BH and BY will not work due to older version of maaslin. |
@renu-pal the newer version of https://anaconda.org/bioconda/maaslin2/files is online. |
Thanks for the update @bgruening ! I will bump the version to 1.16.0 but will need some time to push the updated version as test cases are failing because of the version update. So I need to look into that and make necessary changes. |
Since for the current version the wrapper is broken (not logically usable), would it make sense to merge this version first and then bump to the next version in a new PR ? @bgruening any preference ? |
tools/maaslin2/maaslin2.xml
Outdated
@@ -245,7 +278,7 @@ cd outputFolder && mkdir -p figures/ && cp *.pdf figures | |||
<test expect_num_outputs="5"> | |||
<param name="input_data" value="HMP2_taxonomy.tsv"/> | |||
<param name="input_metadata" value="HMP2_metadata.tsv"/> | |||
<param name="fixed_effects" value="diagnosis,dysbiosisnonIBD"/> | |||
<param name="fixed_effects" value="c4: diagnosis,c9: dysbiosisnonIBD"/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mvdbeek does this look like a regression in Galaxy to you? I thought that either the column name or the c4
should work, but in combination, it looks wired to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The combination is not necessary. I mentioned header names in test cases to make it easier for us to recognize which values we are taking. I will change it to just numbers.
I changed the values to just numbers. I was wondering if I should disable the other options other than BY and BH in correction methods so the user does not choose it until we do not update the version. |
@renu-pal it was a good idea. I was just confused and learned now that this is also possible. If you like you can add a comment to say what those columns are. Overall this looks like its ready to go. I hope I fixed the failing test. |
Ready to go from my end too!! |
@@ -347,6 +474,9 @@ Output | |||
- It only includes associations with q-values <= to the threshold. | |||
- Data frame with residuals for each feature (R data file) | |||
- This file contains a data frame with residuals for each feature. | |||
|
|||
Correction methods to compute the q-value : https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/p.adjust |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is called for the correction, why can it only take BH / BY ? Or is this fixed now ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you check this link.
https://forum.biobakery.org/t/invalid-p-value-correction-options/3321
It seems there was some issue from their end. So I believe if we update the version, it should work but I am not sure if they fixed the bug. Any way to check that ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK good to know, then the wrapper is fine as is, but you can remove this text from the help section (it will confuse the user since the option is not there); and put it back when we update the tool :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, will update that
I have updated the correction help. Please let me know if anything else needs to be updated. |
Looks good from my side, can we merge it @bgruening ? |
Thanks @renu-pal! Great work! |
Thanks for the guidance @paulzierep and @bgruening :) |
FOR CONTRIBUTOR:
[ *] I have read the CONTRIBUTING.md document and this tool is appropriate for the tools-iuc repo.
[ *] This PR updates an existing tool or tool collection