MultiQC for list of pairs (of FastQC output) #1658
Comments
Yes, that would be great if MultiQC could work with paired collections and FastQC! I ran into this issue as well.
Do you know why it does not accept lists of pairs as input? Is there something wrong in the tool XML form description?
FastQC (the software, not the wrapper) is not aware of paired-end datasets.
Thanks for the information. What I do not understand is why the tool form does not list the available list of pairs as input. Would this need an additional parameter in the For the application I think it does not matter that FastQC does not know about paired datasets. One could treat them separately with FastQC. This is also what actually happens: FastQC transforms a list of pairs of fastq data into a list of pairs of reports. So I would suggest modifying the MultiQC tool so that lists of pairs are also accepted, with the same result as if a joint list of single fastq files were used as input. The toolbox function of MultiQC allows coloring by sample name or forward/reverse information (or any other info contained in the filename). If users prefer to analyze interleaved data they can still do so, but I just learned that forward and reverse reads may differ in their qualities, i.e. it makes sense to treat them separately.
How would that work? Just iterating over the forward or reverse reads? Why not do this explicitly using the unzip collection tool? I'm happy to see an implementation, but based on MultiQC/MultiQC#542 I don't think MultiQC does well with R1/R2 or forward/reverse pairs. If the report were actually for paired-end data, this wouldn't be an issue at all.
I think http://multiqc.info/docs/#afterqc may be a good option to replace FastQC for this purpose.
Yes, I'd love to see a Galaxy tool for AfterQC anyway, and if it helps get around this issue with FastQC then that would be great!
Just a note: according to the README of https://github.com/OpenGene/AfterQC, AfterQC has been reimplemented: https://github.com/OpenGene/fastp
Do you think we should handle the paired collection for the FastQC module of MultiQC? And then only for FastQC?
Thanks for reviving this issue @bebatut. First, I would like to understand why MultiQC does not accept a list of pairs of FastQC reports as input. I do not see anything in the tool XML that limits this. Is this because Second, I think that the workaround of @mvdbeek (create two separate lists and merge them / maybe flatten the collection / interleave them) should be sufficient to do what I want. In the end I only want to analyze forward and reverse reads together. In conclusion, I would suggest closing the issue once we understand the technical reason why paired lists are not accepted.
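For readers following along, the "two separate lists, then merge" workaround can be sketched in plain Python. This is only an illustration of the data reshaping, not Galaxy code: the dictionary layout, the `unzip` and `merge` helper names, and the report names are all hypothetical stand-ins for a list:paired collection and the corresponding collection operations.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a Galaxy list:paired collection:
# sample identifier -> {"forward": report, "reverse": report}.
PairedList = Dict[str, Dict[str, str]]


def unzip(paired: PairedList) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split a list of pairs into a forward list and a reverse list."""
    forward = {sample: pair["forward"] for sample, pair in paired.items()}
    reverse = {sample: pair["reverse"] for sample, pair in paired.items()}
    return forward, reverse


def merge(forward: Dict[str, str], reverse: Dict[str, str]) -> List[str]:
    """Concatenate the two lists into one flat list of reports."""
    return list(forward.values()) + list(reverse.values())


# Made-up report names, for illustration only.
reports: PairedList = {
    "sampleA": {"forward": "sampleA_R1_fastqc", "reverse": "sampleA_R2_fastqc"},
    "sampleB": {"forward": "sampleB_R1_fastqc", "reverse": "sampleB_R2_fastqc"},
}

forward, reverse = unzip(reports)
flat_reports = merge(forward, reverse)
```

The resulting `flat_reports` is a plain list with no pair structure, which is the shape a tool accepting a simple list would expect.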
I tested: if you have
It is definitely the expected behavior. I think in this use case the semantics of what should occur are fairly clear, because you know a lot about the tool and data that Galaxy does not know and that isn't represented in the tool wrapper. The problem is say I allow a The complete reduction option makes sense here for this tool, but I would imagine that if the tool were like "cat1", or some tool that summarized fastq files, the other option of mapping over the list and reducing the pairs would make sense in some cases. And the mapping over the list and reducing the pairs behavior is more in line with what happens for instance if a In the list:list or the list:paired case, if you know you want to wipe out the structure of the nested collection, Galaxy provides the tools to do that: you can use the flatten collection operation tool. It should reorganize the information to make it clear that the nested structure is not important for a given application or part of your workflow. This may feel heavy, but it shouldn't duplicate any of the actual data on disk and it should run relatively quickly. Newer tool form options should be introduced that would let the researcher say -
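As a rough illustration of what "wiping out the structure of the nested collection" means, here is a minimal Python sketch of flattening a list:paired layout into a single flat list. It is not Galaxy's implementation; the `flatten` helper, the nested-dictionary layout, and the choice of joining the outer and inner identifiers with an underscore are assumptions made for the example. Only identifiers are rearranged, which matches the point above that no data needs to be duplicated.

```python
from typing import Dict


def flatten(nested: Dict[str, Dict[str, str]], sep: str = "_") -> Dict[str, str]:
    """Collapse a two-level collection into one level.

    Each flat identifier joins the outer identifier (sample) and the inner
    identifier (forward/reverse) with `sep`. The separator is an assumption
    of this sketch.
    """
    flat: Dict[str, str] = {}
    for sample, pair in nested.items():
        for direction, dataset in pair.items():
            flat[f"{sample}{sep}{direction}"] = dataset
    return flat


# Hypothetical list:paired collection of FastQC reports.
nested_reports = {
    "sampleA": {"forward": "report_1", "reverse": "report_2"},
    "sampleB": {"forward": "report_3", "reverse": "report_4"},
}

flat = flatten(nested_reports)
```

After flattening, the forward/reverse information survives only in the identifiers (`sampleA_forward`, `sampleA_reverse`, and so on), so a downstream tool that accepts a simple list can consume all reports at once.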
So for now, I think we cannot do anything. @bernt-matthias what do you think?
I agree. Thanks to @jmchilton for the explanations.
Yes, thanks @jmchilton!!!
This is still a problem, even with a collection "list" input. Another fix is in progress. Should this be re-opened too?
If I run FastQC on a list of paired data, I also get the FastQC results as a list of pairs. These are not selectable as input in a MultiQC analysis.
I'm unsure whether this should be fixed in FastQC (maybe it should just output a list of unpaired reports) or in MultiQC, which maybe should also support lists of paired data as input?