Odd compression on the R2.bz2 file for mrsa-illumina tutorial #4730
Comments
Ugh, this was always a nightmare; I had so many issues with the .bz2 files. Incredibly frustrating: the workflow was always very picky about the format, and I never understood why. A test of the file exits without error,
and agreed, it is not truncated.
In the nanopore workflow and tutorial they're decompressed automatically because of this exact issue; maybe we add that step to this tutorial. Or do you have another suggestion for fixing it? bz2 should be supported :(
I can create a new Zenodo entry with data in a different format.
Or it's a FastQC problem: they can't actually handle this bz2, and we stop accepting bz2 in the FastQC wrapper (s-andrews/FastQC#48). We aren't using a concatenated file, but it could still come down to FastQC's support for standard bzip2 files versus the Java implementation they're using, which they might switch away from.
I recompressed with compression levels 1-9 and FastQC reads all of them fine, up to and including 9, which is what the existing file uses.
but that also is exactly what's described in their issue:
so we could upload a replacement for one of these, easily I guess
The header of a Bzip2 file includes a
This file is definitely concatenated! (But it's still very much the Java bzip2 library's fault for not being able to read this when the other implementations can.) Edit: the source files at https://ddbj.nig.ac.jp/public/ddbj_database/dra/fastq/DRA008/DRA008776/DRX178031/ exhibit the exact issue; they are concatenated for some reason.
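For anyone who wants to check a file for this themselves, here is a minimal Python sketch that counts how many independent bzip2 streams are concatenated in a blob of bytes. It relies only on the standard library `bz2` module; the function name `count_bz2_streams` is made up for illustration, and it reads everything into memory, so it is meant for small test files rather than full read sets.

```python
import bz2


def count_bz2_streams(data: bytes) -> int:
    """Count how many independent bzip2 streams are concatenated in `data`.

    Hypothetical helper for illustration; holds the whole input in memory.
    """
    streams = 0
    while data:
        decomp = bz2.BZ2Decompressor()
        # A BZ2Decompressor handles exactly one stream; anything after the
        # end of that stream is left untouched in `unused_data`.
        decomp.decompress(data)
        if not decomp.eof:
            raise ValueError("bzip2 stream ended early (truncated input?)")
        streams += 1
        data = decomp.unused_data
    return streams
```

A result greater than 1 means the file is a concatenation of several streams, which is valid bzip2 but is the case that single-stream readers stumble over.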
I've found another use of this dataset in the wild and contacted the author, since he's a colleague in µbinfie. His tutorial uses Trimmomatic and that also fails, so here is a re-issued record, https://zenodo.org/records/10669812, with rebuilt bzip2 files that should not exhibit this issue (confirmed with FastQC + Trimmomatic).
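One way to rebuild a concatenated file as a single stream, sketched in Python. This is not necessarily how the re-issued Zenodo files were produced; it only assumes the standard library `bz2` module, whose reader accepts multi-stream input. The function name and paths are placeholders.

```python
import bz2
import shutil


def rewrite_as_single_stream(src_path, dst_path, compresslevel=9):
    """Decompress a (possibly multi-stream) .bz2 file and recompress it
    as one bzip2 stream. Hypothetical helper for illustration."""
    # bz2.open reads through every concatenated stream transparently;
    # writing through a single bz2.open handle produces exactly one stream.
    with bz2.open(src_path, "rb") as src, \
            bz2.open(dst_path, "wb", compresslevel=compresslevel) as dst:
        shutil.copyfileobj(src, dst)
```

The decompressed content is unchanged; only the container layout differs, so tools that already read the original file should see identical reads.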
if any of y'all wanna approve #4732
Thank you for digging into this more! So odd, but glad it's solved now :)
Tutorial: https://training.galaxyproject.org/training-material/topics/assembly/tutorials/mrsa-illumina/tutorial.html
Zenodo: https://zenodo.org/records/4534098
Ok --- https://zenodo.org/record/4534098/files/DRR187559_1.fastqsanger.bz2
Fails "as truncated data" with FastQC --- https://zenodo.org/record/4534098/files/DRR187559_2.fastqsanger.bz2
The second file R2 in bz2 format is tossing errors with FastQC at both usegalaxy.org and usegalaxy.eu.
If that same file is uncompressed but otherwise unchanged, it works fine with FastQC.
If that file is left bz2 compressed, it fails only in FastQC but works with the other steps of the tutorial's workflow.
Compression problem? Doesn't seem to be actually truncated.
ORG https://usegalaxy.org/u/jen-galaxyproject/h/genome-assembly-of-mrsa-using-illumina-miseq-data-1
EU: I didn't run it myself, but it was reported here: https://help.galaxyproject.org/t/fastqc-fails-in-mrsa-genome-assembly-tutorial/11703