Pipeline that processes raw Direct Infusion Mass Spectrometry data.
.
|───GUI/ (GUI scripts)
|───db/ (Human Metabolome Database files)
|───extra/ (flowcharts)
|───pipeline/ (pipeline scripts)
|───post/ (scripts that can be manually run after pipeline)
Used R version: 3.6.1
Libraries: DT, shiny, shinydashboard, shinyFiles, ssh
- Copy config_default.R to your own config.R, and configure as needed.
docker build -t umcugenbioinf/dims:[tag] -f Dockerfile .
docker push umcugenbioinf/dims:[tag]
on HPC:
srun -c 2 -t 0:30:00 -A dbg_mz --mem=100G --gres=tmpspace:100G --pty /usr/bin/bash
cd /hpc/dbg_mz/tools/singularity_cache/
singularity build /hpc/dbg_mz/tools/singularity_cache/dims-[tag].img docker://umcugenbioinf/dims:[tag]
Used R version: 4.1.0
Libraries: xcms, stringr, dplyr, Rcpp, openxlsx, reshape2, loder, ggplot2, gridExtra
- Create the following folders in the same root folder (e.g. /hpc/dbg_mz):
/development
/processed
/production
/raw_data
/tools
- In /development, clone the dev branch of the DIMS repo.
git clone -b dev --single-branch git@github.com:UMCUGenetics/DIMS.git
- In /production, clone the master branch of the DIMS repo.
git clone -b master --single-branch git@github.com:UMCUGenetics/DIMS.git
- In /tools, install mono with GUIX under /mono
- In /tools, place the latest tested release of ThermoRawFileParser (v1.1.11) under /ThermoRawFileParser_1.1.11
- In /tools, put the required Human Metabolome Database (HMDB) .RData files under /db.
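The folder layout described above can be created in one step. The following is a minimal sketch, assuming a POSIX shell; `create_dims_layout` is a hypothetical helper name and is not part of the DIMS repo. It only creates the five top-level folders; cloning the repo and installing the tools remain manual steps.

```shell
# Hypothetical helper (not part of DIMS): create the expected top-level
# folders under a chosen root folder.
create_dims_layout() {
    root="$1"
    for dir in development processed production raw_data tools; do
        # -p: create parents as needed and do not fail if the folder exists
        mkdir -p "${root}/${dir}"
    done
}
```

For example, `create_dims_layout /hpc/dbg_mz` would create /hpc/dbg_mz/development, /hpc/dbg_mz/processed, and so on.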
The pipeline is meant to be started from the GUI, an R Shiny app that transfers data to the HPC and starts the pipeline. To open the GUI, open GUI.Rproj in RStudio, which should open run.R and config.R. Then click "Run App" in the run.R file.
Manually starting the pipeline is also possible.
CMD:
sh run.sh -i <input path> -o <output path> [-r] [-v] [-h]
REQUIRED ARGS:
-i - full path to the input folder, e.g. /hpc/dbg_mz/raw_data/run1
-o - full path to the output folder, e.g. /hpc/dbg_mz/processed/run1
OPTIONAL ARGS:
-r - restart the pipeline, removing any existing output for the entered run (default off)
-v - verbose printing (default off)
-h - show help
EXAMPLE:
sh run.sh -i /hpc/dbg_mz/raw_data/run1 -o /hpc/dbg_mz/processed/run1
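To illustrate the interface documented above, here is a minimal sketch of how these flags could be parsed with the POSIX `getopts` builtin. This is an assumption for illustration only: `parse_dims_args` and the variable names are hypothetical, and the actual run.sh may be implemented differently.

```shell
# Hypothetical re-creation of the documented flags; the real run.sh may differ.
parse_dims_args() {
    input=""; output=""; restart=0; verbose=0
    OPTIND=1   # reset so the function can be called more than once
    while getopts "i:o:rvh" opt; do
        case "$opt" in
            i) input="$OPTARG" ;;
            o) output="$OPTARG" ;;
            r) restart=1 ;;
            v) verbose=1 ;;
            h) echo "usage: sh run.sh -i <input path> -o <output path> [-r] [-v] [-h]"
               return 0 ;;
            *) return 2 ;;   # unknown flag or missing flag argument
        esac
    done
    # -i and -o are both required.
    if [ -z "$input" ] || [ -z "$output" ]; then
        echo "error: -i and -o are required" >&2
        return 2
    fi
}
```

Calling `parse_dims_args -i /hpc/dbg_mz/raw_data/run1 -o /hpc/dbg_mz/processed/run1` would leave `restart` and `verbose` at 0, matching the documented defaults.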
Input folder requirements:
- all the .raw files
- init.RData (the sample sheet, which maps technical replicates to biological samples)
- a 'setting.config' file containing e.g.:
thresh_neg=2000
dims_thresh=100
trim=0.1
nrepl=3
normalization=disabled
thresh2remove=1000000000
resol=140000
[email protected]
matrix=DBS
db=.../tools/db/HMDB_add_iso_corrected_V2.RData
db2=.../tools/db/HMDB_with_info_relevance_corrected_V2.RData
z_score=1
standard_run=yes
hmdb_parts_dir=/hpc/dbg_mz/production/DIMS/hmdb_preparts
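Before starting a run manually, it can help to check that the input folder meets the requirements listed above. The following is a minimal sketch, assuming a POSIX shell and a plain key=value 'setting.config' as in the example; `get_setting` and `check_input_folder` are hypothetical helpers and are not part of the DIMS repo.

```shell
# Hypothetical helpers (not part of DIMS) for inspecting an input folder.

# Print the value of a key from a key=value file (first match wins).
# $1 = config file, $2 = key
get_setting() {
    sed -n "s/^$2=//p" "$1" | head -n 1
}

# Return 0 if the folder holds .raw files, init.RData and setting.config;
# otherwise report what is missing on stderr and return 1.
check_input_folder() {
    dir="$1"
    rc=0
    ls "$dir"/*.raw >/dev/null 2>&1 || { echo "missing: .raw files" >&2; rc=1; }
    [ -f "$dir/init.RData" ] || { echo "missing: init.RData" >&2; rc=1; }
    [ -f "$dir/setting.config" ] || { echo "missing: setting.config" >&2; rc=1; }
    return "$rc"
}
```

For example, `get_setting /hpc/dbg_mz/raw_data/run1/setting.config nrepl` would print the configured number of replicates. The pattern is anchored on `key=`, so asking for `db` does not also match `db2`.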