MultiQC-Additional overview stat table #134
…e output to table and add table to multiqc
bin/collect_stats.py (outdated)
    print("Done!", flush=True)
    # peptide_id, condition_name, condition_peptide_count, highest_prediction_score, prediction_score_allele_0, prediction_score_allele_1,...prediction_score_allele_n
    conditions_peptides = conditions_peptides.set_index("peptide_id").join(best_scored_peptides).join(predictions)
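For readers unfamiliar with the chained join above, here is a minimal sketch on toy data. It assumes that best_scored_peptides and predictions are both indexed by peptide_id (DataFrame.join matches on the other frame's index) and that predictions is already in wide, one-column-per-allele form. All values are made up for illustration and are not taken from the pipeline.

```python
import pandas as pd

# Toy stand-ins for the three frames; the values are illustrative only.
conditions_peptides = pd.DataFrame({
    "peptide_id": [1, 2, 2],
    "condition_name": ["c1", "c1", "c2"],
    "condition_peptide_count": [5, 3, 1],
})
best_scored_peptides = pd.DataFrame(
    {"highest_prediction_score": [0.9, 0.7]},
    index=pd.Index([1, 2], name="peptide_id"),
)
predictions = pd.DataFrame(
    {
        "prediction_score_allele_0": [0.9, 0.6],
        "prediction_score_allele_1": [0.2, 0.7],
    },
    index=pd.Index([1, 2], name="peptide_id"),
)

# Same shape of operation as in the diff: two left joins on the peptide_id index.
wide = (
    conditions_peptides.set_index("peptide_id")
    .join(best_scored_peptides)
    .join(predictions)
)
print(wide)
```

Note that a peptide occurring in several conditions (peptide 2 here) gets its full row of per-allele scores repeated once per condition, which is one way such a table can grow quickly.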
I am not sure I understood all the details of the new code, but since it caused quite a lot of memory usage, I have a few questions/suggestions, as I am wondering whether you really need that many joins:
- conditions_peptides: if the resulting df becomes too large, you could drop the condition_name (it is the same within this context anyway) and count before joining to the predictions
- filter predictions directly after reading them in, keeping only entries that are above the threshold, and drop the prediction_score to reduce the size
- merge/join for each allele to get only the peptides that are "binders", and count them
That brings me to the next question: you just provide the count of unique binders, right? Maybe that should be stated more clearly somewhere.
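The three suggestions above could be combined roughly as follows. This is only a sketch under stated assumptions, not the code in bin/collect_stats.py: it assumes a long-format predictions table with one row per peptide and allele, that a score above the threshold marks a binder, and a made-up cutoff of 0.5. The helper names load_binders and count_unique_binders are hypothetical.

```python
import pandas as pd

BINDER_THRESHOLD = 0.5  # hypothetical cutoff, not the pipeline's actual value


def load_binders(predictions, threshold=BINDER_THRESHOLD):
    """Keep only rows above the threshold and drop the score column right away."""
    mask = predictions["prediction_score"] > threshold
    return predictions.loc[mask, ["peptide_id", "allele"]]


def count_unique_binders(condition_peptides, binders):
    """Count unique binding peptides per allele for a single condition."""
    # Only peptide_id is needed for counting; condition_name and count are dropped.
    peptide_ids = condition_peptides[["peptide_id"]].drop_duplicates()
    merged = peptide_ids.merge(binders, on="peptide_id", how="inner")
    return merged.groupby("allele")["peptide_id"].nunique()


# Toy data, for illustration only.
predictions = pd.DataFrame({
    "peptide_id": [1, 1, 2, 2, 3, 3],
    "allele": ["A", "B", "A", "B", "A", "B"],
    "prediction_score": [0.9, 0.2, 0.6, 0.7, 0.1, 0.3],
})
condition_c1 = pd.DataFrame({"condition_name": ["c1", "c1"], "peptide_id": [1, 2], "count": [5, 3]})
condition_c2 = pd.DataFrame({"condition_name": ["c2", "c2"], "peptide_id": [2, 3], "count": [1, 4]})

binders = load_binders(predictions)
counts_c1 = count_unique_binders(condition_c1, binders)
counts_c2 = count_unique_binders(condition_c2, binders)
print(counts_c1.to_dict(), counts_c2.to_dict())
```

Because the predictions are filtered before any merge, the joined frame only ever holds binder rows, and the result is explicitly a count of unique binders per allele rather than a sum of peptide counts.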
I managed to reduce the memory usage to around 50 GB, which is only 10 GB more than the previous version that did not include peptide predictions!
Minor comment regarding names, other than that it looks great!
This PR adds an overview summary table to the multiqc report, so that users inexperienced with the pipeline can get an idea of what the pipeline output contains.
A full-size test is currently running; the test results will be delivered afterwards.
A current multiqc report is attached.
multiqc.zip
PR checklist
- nf-core lint
- nextflow run . -profile test,docker --outdir <OUTDIR>
- nextflow run . -profile debug,test,docker --outdir <OUTDIR>
- docs/usage.md is updated.
- docs/output.md is updated.
- CHANGELOG.md is updated.
- README.md is updated (including new tool citations and authors/contributors).