Update MultiQC to reject or report in error/help that HTML input is not valid #1538

Closed
jennaj opened this issue Oct 26, 2017 · 3 comments · Fixed by #1821

Comments

@jennaj
Member

jennaj commented Oct 26, 2017

Users are reporting usage problems because they are inputting the HTML report from FastQC, not the Raw Data report, and sometimes both (!!).

Tool version 1.2.0: https://toolshed.g2.bx.psu.edu/view/iuc/multiqc/3bad335ccea9

Tutorials cover appropriate inputs here (and other places) but it doesn't seem to be enough: https://galaxyproject.org/tutorials/ngs/#assessing-data-quality

Could we consider these updates? Item 1 is the best option: it avoids this error case completely, makes items 2 & 3 unnecessary, and avoids wasting resources on a bad job that will eventually fail.

  1. Trap HTML input given to the tool at runtime and abort the submission with a red box warning near the input select stating "Inputs included HTML formatted reports. Input raw (plain text) reports instead." - OR - filter HTML datatypes out of the input select list of available datasets?

  2. If item 1 above is not possible, improve the error message to state "Inputs included HTML formatted reports. Input raw (plain text) reports instead." ("Raw text" is the actual name given to the FastQC text version of the report, so "plain text" on its own is not always interpreted well by scientific users and doesn't map to commonly input dataset names.)

  3. If item 1 above is not possible, add text similar to item 2 above to the help section of the tool form, for example "Input raw (plain text) reports only. HTML reports will trigger a tool failure".
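
As a rough illustration of item 1, here is a minimal Python sketch of a pre-flight check that rejects HTML inputs before any job is started. It is not part of MultiQC or the Galaxy wrapper; the function names `looks_like_html` and `reject_html_inputs` and the marker list are assumptions made for this example. The error text is the wording proposed above.

```python
# Hypothetical pre-flight check; not MultiQC or Galaxy code.
HTML_MARKERS = (b"<!doctype html", b"<html")  # assumed markers for an HTML report
ERROR_TEXT = (
    "Inputs included HTML formatted reports. "
    "Input raw (plain text) reports instead."
)


def looks_like_html(path, sniff_bytes=1024):
    """Return True if the first bytes of the file look like an HTML document."""
    with open(path, "rb") as handle:
        head = handle.read(sniff_bytes).lstrip().lower()
    # bytes.startswith accepts a tuple of candidate prefixes
    return head.startswith(HTML_MARKERS)


def reject_html_inputs(paths):
    """Raise ValueError naming every input that looks like HTML."""
    offenders = [str(p) for p in paths if looks_like_html(p)]
    if offenders:
        raise ValueError(f"{ERROR_TEXT} Offending inputs: {', '.join(offenders)}")
```

A wrapper could call `reject_html_inputs(...)` on the selected datasets and surface the `ValueError` message to the user instead of the MultiQC traceback. Filtering the datasets out of the select list, the other option in item 1, would still be the cleaner fix.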

@bgruening
Member

We could:

  • change FASTQC output to a more defined data format than txt
  • it would be nice to exclude formats in the data-format attribute, something like format="txt,!html,bed"
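
To make the second suggestion concrete, here is a small Python sketch of how a spec such as `format="txt,!html,bed"` could be interpreted. The `!` exclusion syntax is only the proposal above, not an existing Galaxy feature, and the `PARENTS` map is a toy datatype hierarchy assumed purely for illustration (it treats `html` as a kind of `txt`, which is the situation an exclusion would be meant to handle).

```python
# Hypothetical interpretation of a "txt,!html,bed" style spec; not Galaxy code.

# Toy parent map, assumed for illustration only.
PARENTS = {"html": "txt", "bed": "tabular", "tabular": "txt", "txt": "data"}


def parse_format_spec(spec):
    """Split a spec like "txt,!html,bed" into (allowed, excluded) sets."""
    allowed, excluded = set(), set()
    for token in spec.split(","):
        token = token.strip().lower()
        if not token:
            continue
        if token.startswith("!"):
            excluded.add(token[1:])
        else:
            allowed.add(token)
    return allowed, excluded


def ancestry(ext):
    """Return the extension followed by its parents in the toy hierarchy."""
    chain = []
    while ext is not None:
        chain.append(ext)
        ext = PARENTS.get(ext)
    return chain


def accepts(spec, ext):
    """True if ext matches an allowed format and no excluded format."""
    allowed, excluded = parse_format_spec(spec)
    chain = ancestry(ext.lower())
    if any(name in excluded for name in chain):
        return False
    return any(name in allowed for name in chain)
```

Under these assumptions, `accepts("txt", "html")` is true because `html` inherits from `txt`, while `accepts("txt,!html,bed", "html")` is false. Exclusions are checked first so that a negated format always wins over an inherited match.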

@mblue9
Contributor

mblue9 commented Oct 28, 2017

As someone who was also confused by what FastQC output was needed for MultiQC ☺️ it would definitely be great to make that clearer!! What about also changing the names of the FastQC outputs, as that "RawData" name is confusing in itself, I think? Maybe rename them (from -> to) something like this:
Webpage -> Report (HTML)
RawData -> Report (Text)

@jennaj
Member Author

jennaj commented Mar 3, 2018

The current MTS version still accepts HTML as input and then eventually errors. The resulting log doesn't help the user understand what really went wrong or how to correct the inputs.

It is great that the tool form label states "Raw Data" and it does match the FastQC dataset names to select. But filtering to exclude HTML would be better as @mblue9 states.

Anyway, let's see if that is enough - I'll watch for user error reports due to this use-case. If nothing comes in, or at least fewer reported issues come in, I think we can close this out, unless there are future plans?

Example from test history datasets 136 137 138 https://usegalaxy.org/u/jen/h/test-tools-prokka-unicycler-quast-multiqc

Dataset Error
An error occured while running the tool toolshed.g2.bx.psu.edu/repos/iuc/multiqc/multiqc/1.3.1.

Tool execution generated the following messages:

Fatal error: Exit code 1 ()
[WARNING]         multiqc : MultiQC Version v1.4 now available!
[INFO   ]         multiqc : This is MultiQC v1.3
[INFO   ]         multiqc : Template    : default
[INFO   ]         multiqc : Searching 'multiqc_WDir'
[ERROR  ]         multiqc : Oops! The 'fastqc' MultiQC module broke... 
                    Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues 
                    (if possible, include a log file that triggers the error) 
============================================================
Module fastqc raised an exception: Traceback (most recent call last):
  File "/cvmfs/main.galaxyproject.org/deps/_conda/envs/[email protected]/bin/multiqc", line 412, in multiqc
    output = mod()
  File "/cvmfs/main.galaxyproject.org/deps/_conda/envs/[email protected]/lib/python3.5/site-packages/multiqc/modules/fastqc/fastqc.py", line 44, in __init__
    self.parse_fastqc_report(f['f'], s_name, f)
  File "/cvmfs/main.galaxyproject.org/deps/_conda/envs/[email protected]/lib/python3.5/site-packages/multiqc/modules/fastqc/fastqc.py", line 178, in parse_fastqc_report
    self.fastqc_data[s_name]['basic_statistics'] = {d['measure']: d['value'] for d in self.fastqc_data[s_name]['basic_statistics']}
KeyError: 'basic_statistics'
============================================================
[WARNING]         multiqc : No analysis results found. Cleaning up..
[INFO   ]         multiqc : MultiQC complete
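
The `KeyError: 'basic_statistics'` in the log above is consistent with a parser that found no "Basic Statistics" section in the input, which is what one would expect when the file is an HTML page rather than a raw data report. The Python sketch below is not MultiQC's actual code; `parse_fastqc_sections` and `basic_statistics` are hypothetical names, and the parsing assumes the usual FastQC raw data layout (`>>Section` headers, tab-separated rows, `>>END_MODULE` terminators). It shows how a guard could turn that failure into the message proposed earlier in this thread.

```python
# Hypothetical sketch; not the MultiQC fastqc module.

def parse_fastqc_sections(text):
    """Collect tab-separated rows for each '>>Section' block of a raw data report."""
    sections = {}
    current = None
    for line in text.splitlines():
        if line.startswith(">>END_MODULE"):
            current = None
        elif line.startswith(">>"):
            # ">>Basic Statistics\tpass" -> "basic_statistics"
            name = line[2:].split("\t")[0].strip().lower().replace(" ", "_")
            current = sections.setdefault(name, [])
        elif current is not None and line and not line.startswith("#"):
            current.append(line.split("\t"))
    return sections


def basic_statistics(text):
    """Return the Basic Statistics table as a dict, or raise a readable error."""
    sections = parse_fastqc_sections(text)
    if "basic_statistics" not in sections:
        raise ValueError(
            "No 'Basic Statistics' section found. If the inputs included HTML "
            "formatted reports, input raw (plain text) reports instead."
        )
    return {row[0]: row[1] for row in sections["basic_statistics"] if len(row) >= 2}
```

With a check like this, an HTML input would produce a message that points at the wrong input type rather than a bare traceback.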
