Releases: UMCUGenetics/DIMS
v2.4.0
v2.3.0
2.2.2
Bugfix for missing m/z values
Changes:
- Added an extra 5 seconds of waiting after each dependency (wait for the job list, then 5 more seconds).
- (2.2.1) The end mail now reports which m/z values are missing, or that everything is OK. This stays in place for a while to test whether the bugfix holds, because the bug is difficult to reproduce.
- Changed afterany to afterok in the job dependencies.
- Added 2 wait calls in bash.
- Fixed a typo that filled a variable with job_ids that was never referenced in the code. As a result, the next sbatch took the wrong job_id dependency.
- Script 10 now logs the files it loads (in the default .o file of script 10), to make it easier to pinpoint the problem if it still occurs after this bugfix.
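The dependency change above can be illustrated with a minimal sketch. It is written in Python for readability and is not the pipeline's own bash code. It assumes only the standard sbatch `--dependency=<type>:<job_id>[:<job_id>...]` syntax and the `--parsable` flag; the function names and the script name `next_step.sh` are hypothetical.

```python
def dependency_flag(job_ids, kind="afterok"):
    """Build an sbatch dependency flag such as '--dependency=afterok:101:102'.

    afterok  : start only if all listed jobs finished successfully.
    afterany : start once the listed jobs ended, whatever their exit state.
    """
    if kind not in {"afterok", "afterany"}:
        raise ValueError(f"unsupported dependency type: {kind}")
    if not job_ids:
        raise ValueError("at least one job id is required")
    return f"--dependency={kind}:" + ":".join(str(job_id) for job_id in job_ids)


def sbatch_command(script, job_ids):
    """Return the argument list for submitting `script` after `job_ids` succeed."""
    # --parsable makes sbatch print only the job id, which is easy to capture
    # and pass on as the dependency of the following step.
    return ["sbatch", "--parsable", dependency_flag(job_ids), script]


# Hypothetical usage: the ids collected for the previous step feed the next one.
previous_step_ids = [101, 102]
print(" ".join(sbatch_command("next_step.sh", previous_step_ids)))
```

Passing the collected ids explicitly into the function that builds the next submission is one way to avoid the kind of bug described above, where the ids ended up in a variable that was never read.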
2.2.1
Hotfix:
Sometimes data seemed to be missing after running the pipeline. This was visible when an IS (internal standard) was missing in the plots.
This seems to be a bigger problem, because sometimes chunks of m/z values are missing from the data, directly related to the HMDB files that the pipeline splits.
To monitor this problem, the missing m/z values are printed in the end mail. An additional file, missing_mz_warning.txt, is created for this purpose in each run.
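As an illustration only, a check of this kind could look like the sketch below. How the pipeline actually determines which m/z values are missing is not described here, so the approach (comparing expected integer m/z bins against the bins seen in the output) and all function names are assumptions; only the file name missing_mz_warning.txt comes from the note above.

```python
from pathlib import Path


def find_missing_mz(expected_bins, observed_mz):
    """Return the expected integer m/z bins that have no observed value."""
    # Assumption: an observed m/z value belongs to the bin of its integer part.
    observed_bins = {int(mz) for mz in observed_mz}
    return sorted(b for b in set(expected_bins) if b not in observed_bins)


def warning_text(missing):
    """Format the line that would go into the end mail and the warning file."""
    if not missing:
        return "all expected m/z values are present"
    return "missing m/z values: " + ", ".join(str(b) for b in missing)


def write_warning(missing, folder):
    """Write the warning text to missing_mz_warning.txt inside `folder`."""
    path = Path(folder) / "missing_mz_warning.txt"
    path.write_text(warning_text(missing) + "\n")
    return path


# Toy data, not real pipeline output.
missing = find_missing_mz(range(100, 106), [100.2, 101.9, 104.5])
print(warning_text(missing))
```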
2.2.0
2.1.2
Bug fix and new possibilities for positive control sample names
DIMS v2.1.2 is an update of the DIMS pipeline that makes other positive control names possible. Until now, they were hard-coded as P1002.1, P1003.1 and P1005.1. The patient materials were recently updated, so plasma runs will use P1005.2 while DBS runs still use P1005.1. All positive controls with the format P1002.x, P1003.x and P1005.x (where x stands for a number from 1 to 9) are now accepted, to accommodate future changes in positive controls while staying backwards compatible.
Bug in DIMS 2.1.1: sample names with only one number (e.g. P7.2, C3.1) were blocked by the GUI as "incorrect sample names that do not fit the requirements". This was a small oversight in the regex, which expected 2 numbers. These sample names are not incorrect, and the GUI no longer blocks them.
Fixes issue #47
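The naming rules above can be sketched as regular expressions. This is a Python illustration, not the regex used in the GUI. The positive control pattern follows the format stated above; the sample name pattern (one capital letter, one or more digits, a dot, one or more digits) is an assumption inferred from the examples P7.2 and C3.1, and the actual requirements are those of the SOP.

```python
import re

# P1002.x, P1003.x or P1005.x, with x a single digit from 1 to 9.
POSITIVE_CONTROL = re.compile(r"P100[235]\.[1-9]")

# Assumed sample name shape: letter, digits, dot, digits (e.g. P7.2, C3.1).
# "\d+" accepts one or more digits, so single-digit names are not rejected.
SAMPLE_NAME = re.compile(r"[A-Z]\d+\.\d+")


def is_positive_control(name):
    """True if the whole name matches the positive control format."""
    return POSITIVE_CONTROL.fullmatch(name) is not None


def is_valid_sample_name(name):
    """True if the whole name matches the assumed sample name shape."""
    return SAMPLE_NAME.fullmatch(name) is not None


for name in ["P1005.2", "P1004.1", "P7.2", "C3.1"]:
    print(name, is_positive_control(name), is_valid_sample_name(name))
```

Using `fullmatch` rather than `search` keeps names with trailing characters, such as P1005.10, from being accepted as positive controls.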
2.1.1
2.1.0
a. Configurable job times and memory. The Slurm job time and Slurm job memory can now be adjusted without editing the code directly. This is useful, for example, when the HPC is very busy (to prevent time-out errors) or when the raw data is much larger and more processing power is needed.
b. Checks whether the sample name meets the requirements described in the SOP (aMEZ0220) when the Z-score is switched on.
c. Bugfix for a filtering error that prevented the script from finishing when all technical replicates of a sample were filtered out during the pipeline because of poor or insufficient data. The same problem also occurred further on in the script when this happened to the positive controls ('P1002.1', 'P1003.1', 'P1005.1').
d. The completion e-mail now lists which samples and/or positive controls dropped out because all their technical replicates were filtered out (see update c).
e. The files "Pos_Contr.Rdata" and "IS_results.RData" now include the project name in their file name (unique naming), similar to what is already done for the Excel file. Changed to "[Run_name]Pos_Contr.RData" and "[Run_name] IS_results.RData".
f. The leucine plots "Leucine_sum.png", "Leucine_pos.png" and "Leucine_neg.png" have been dropped.
g. There are extra plots that show a selection of the most important internal standards (IS plots) per mode ("IS_bar_select_sum.png", "IS_bar_select_pos.png", "IS_bar_select_neg.png"), and the existing plots that show all IS have been renamed to "IS_bar_all_sum.png", "IS_bar_all_pos.png", "IS_bar_all_neg.png".
h. The same as point g applies to the line plots. There are extra plots that show a selection of the most important internal standards (IS plots) per mode ("IS_line_select_sum.png", "IS_line_select_pos.png", "IS_line_select_neg.png"), and the existing plots that show all IS have been renamed to "IS_line_all_sum.png", "IS_line_all_pos.png", "IS_line_all_neg.png".
i. A new Excel file "[Run_name]_Pos_Contr.xlsx", which contains the Z-scores of the positive controls, the run name, the matrix used and the run date.
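The bookkeeping behind points c and d can be sketched as follows. This is a hedged Python illustration, not the pipeline's R code; the data structures (a mapping from sample name to its technical replicates, and a collection of replicates that survived filtering) and the function names are assumptions.

```python
def dropped_samples(replicates_by_sample, kept_replicates):
    """Return the samples for which no technical replicate survived filtering."""
    kept = set(kept_replicates)
    return sorted(
        sample
        for sample, replicates in replicates_by_sample.items()
        if not any(replicate in kept for replicate in replicates)
    )


def completion_note(dropped):
    """Format a line for the completion e-mail."""
    if not dropped:
        return "No samples or positive controls were dropped."
    return "Dropped (all technical replicates filtered out): " + ", ".join(dropped)


# Toy example: both replicates of P1002.1 were filtered out.
replicates = {
    "P7.2": ["rep_1", "rep_2"],
    "P1002.1": ["rep_3", "rep_4"],
    "C3.1": ["rep_5", "rep_6"],
}
print(completion_note(dropped_samples(replicates, ["rep_1", "rep_5", "rep_6"])))
```

Computing the dropped samples up front lets later steps skip them instead of failing on empty data.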
2.0.0
- Use of Slurm Workload Manager, instead of Oracle Grid Engine (SGE)
- Conversion from raw to mzML with the use of ThermoRawFileParser, instead of to mzXML with msconvert through ProteoWizard docker
- Pipeline now ends with cleanup script (14-cleanup.sh) that opens the permissions of both the raw data and the processed folders, as well as sends a completion mail to the given email which contains any potential error outputs from jobs
- The splitting of the HMDB files into smaller files is now done at the beginning of the pipeline and can run in parallel with the rest
- Addition of script 13-excelExport.R, which makes an Excel file (.xlsx) from the results, as well as plots based on the internal standards and
- Option to not calculate Z-score and to not include HMDB plots in the Excel file (z_score parameter in settings.config)
- Some steps in the pipeline are now done with R version 3.6.2 instead of 3.2.2, namely 2-DIMS.R and 13-excelExport.R
- Some scripts have been simplified by combining them with the function script that was originally in AddOnFunctions, as well as cleaning up some of the script structure
- Use of dos2unix to make settings.config readable for HPC if the file was made from a Windows PC via the GUI
- A makeInit.R script to manually make the init.RData whenever the GUI isn't available
- GUI updates:
- Checks that need to be correct before the button to start the data transfer and the pipeline appears:
o Is there a sample sheet selected
o Is there a datafolder selected
o Do all samples that have been selected exist in the datafolder
o Does every biological sample have the correct number of technical replicates
o Have all parameters been filled in and/or selected
o Can a connection with HPC submit node be made with the given HPC logins
o Can a connection with HPC transfer node be made with the given HPC logins
o Is there no existing raw data folder yet with the given folder name
o Is there no existing processed folder yet with the given folder name
- Use of shinydashboard package for layout
- Use of datatables package (DT) for visualising and selecting samples
- Use of ssh package for data transfer
- Automatic selection of the parameter 'total intensity threshold' based on the selected matrix
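Two of the GUI checks listed above (whether all selected samples exist in the data folder, and whether every biological sample has the correct number of technical replicates) can be sketched as below. The GUI itself is an R Shiny application; this Python version only illustrates the logic. The sample sheet layout (a mapping from raw file name to biological sample), the `expected` parameter and the function names are assumptions.

```python
from collections import Counter


def missing_files(selected_files, files_in_datafolder):
    """Return the selected files that are not present in the data folder."""
    present = set(files_in_datafolder)
    return sorted(name for name in selected_files if name not in present)


def wrong_replicate_counts(sample_of_file, expected):
    """Return {sample: count} for samples whose replicate count differs from `expected`."""
    counts = Counter(sample_of_file.values())
    return {sample: n for sample, n in sorted(counts.items()) if n != expected}


def checks_pass(sample_of_file, files_in_datafolder, expected):
    """True only if no file is missing and every sample has `expected` replicates."""
    return (
        not missing_files(sample_of_file, files_in_datafolder)
        and not wrong_replicate_counts(sample_of_file, expected)
    )


# Toy sample sheet: C3.1 has only one technical replicate.
sheet = {"run_001": "P7.2", "run_002": "P7.2", "run_003": "C3.1"}
print(wrong_replicate_counts(sheet, expected=2))
```

In the GUI, the start button would only appear once checks of this kind all pass.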
1.0.0
First release of the renewed DIMS pipeline. The changes consist of:
- Better folder structure on HPC.
- Input and output folders can be given as parameters instead of being hardcoded as "data" and "results".
- Config file for parameters instead of hardcoded parameters that needed to be constantly edited.
- Better logging, e.g. logs are no longer spammed to one's home folder, all scripts that were submitted as jobs and their output are logged in ordered folders, and the git commit number and the date on which the run was started are logged.
- RAW to mzXML conversion is now done by the pipeline, using a WINE wrapper inside a Docker image that is run with the Singularity software. This is needed because the conversion tool is originally Windows-only software. The conversion can sometimes fail, so checks were built in to redo the conversion at most 2 times.
- Mails are sent when the last jobs of the negative and positive branches are finished.
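The retry behaviour mentioned for the RAW to mzXML conversion can be sketched like this. It is a generic Python illustration under stated assumptions, not the pipeline's own check: `convert` stands for any callable that returns True on success, and the function and parameter names are hypothetical. "Redo at most 2 times" is read here as one initial attempt plus up to two retries.

```python
def convert_with_retries(convert, raw_file, max_retries=2):
    """Run `convert(raw_file)` and redo it up to `max_retries` times on failure.

    Returns the number of attempts that were needed.
    Raises RuntimeError if every attempt failed.
    """
    attempts = 0
    while attempts <= max_retries:
        attempts += 1
        if convert(raw_file):
            return attempts
    raise RuntimeError(f"conversion of {raw_file} failed after {attempts} attempts")


# Toy converter that fails once and then succeeds.
outcomes = iter([False, True])
print(convert_with_retries(lambda path: next(outcomes), "sample.raw"))
```

Bounding the number of retries keeps a permanently broken file from blocking the run indefinitely, while still absorbing occasional transient failures.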