Advanced
Here we describe how to run the data processing steps from the homepage dynamically across any number of files in a folder.
First, make sure you are able to run a single file as described on the Wiki homepage. That covers the main work of processing a single file, but not the housekeeping needed to make multi-file processing reliable and trackable.
Our workflow for multi-file processing will look like this:

- Put all of the files we want to process in a folder somewhere in our home directory (the exact folders used by the example scripts are sketched just after this list)
- Create a "runner" script which loops through each file and calls `sbatch` to submit a "processor" job for that file
- Create a "processor" script which copies the file to a "scratch" working directory, runs the core dataprep process on it, and then moves the completed file to a "results" directory
Create a script called `dataprep-runner.sh` which looks like this:
#!/bin/bash
# Gets all CSV files in a folder and runs sbatch on each with the proper arguments
FILES=~/pixels/sorted/*.csv
for f in $FILES
do
  # Export the file path so sbatch passes it through to the processor's environment
  export PIXEL_FILE=$f
  FILE_NAME=$(basename -a -s .csv $f)
  echo "Batching dataprep run for $f"
  sbatch -t 120 -J dp-$FILE_NAME -o batch-$FILE_NAME-%j.out dataprep-processor.sh
done
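As a minor variation (not part of the original scripts), the file path can be passed explicitly on the `sbatch` command line instead of relying on the exported environment, using Slurm's `--export` option:
# Inside the loop, instead of "export PIXEL_FILE=$f":
sbatch -t 120 -J dp-$FILE_NAME -o batch-$FILE_NAME-%j.out \
  --export=ALL,PIXEL_FILE=$f dataprep-processor.sh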
Create a file called `dataprep-processor.sh` like so:
#!/bin/bash -l
FILE_NAME=$(basename -a -s .csv $PIXEL_FILE)
# Print the hostname for debugging purposes
hostname
# Set your variables
export OSRM_FILE="/home/$USER/osrm-data/california-latest.osrm"
export HGT_FILES="/home/$USER/hgt/"
export HGT_USER="postit"
export HGT_PASS="PASSWORD"
# copy the file into processing directory/scratch for this job
mkdir -p /scratch/$USER/$SLURM_JOBID
cp $PIXEL_FILE /scratch/$USER/$SLURM_JOBID/$FILE_NAME.csv
# Point PIXEL_FILE at the scratch copy and set where the processed output should go
PIXEL_FILE=/scratch/$USER/$SLURM_JOBID/$FILE_NAME.csv
export TREATED_OUT_FILE=/scratch/$USER/$SLURM_JOBID/$FILE_NAME-processed.csv
echo "Processing $PIXEL_FILE to $TREATED_OUT_FILE"
# Run the core dataprep process on the scratch copy
srun node ./cec-dataprep/index.js
# remove the copied pixel file and move the result to your home directory
echo "Finished processing, cleaning up files"
rm $PIXEL_FILE
mv $TREATED_OUT_FILE /home/$USER/pixels/results/
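One optional hardening step, not part of the original script: if the node process fails, the scratch copy is left behind. Adding a trap near the top of `dataprep-processor.sh` (after the `mkdir`) would remove the job's scratch directory on exit regardless of success; the normal-case cleanup at the end of the script stays as it is.
# Optional: remove this job's scratch directory on exit, even if processing fails
trap 'rm -rf /scratch/$USER/$SLURM_JOBID' EXIT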
Now you can just run the runner with `sh dataprep-runner.sh` and it will submit one job for each CSV file.
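To keep an eye on the submitted jobs you can use the standard Slurm tools, for example:
# List your queued and running jobs
squeue -u $USER
# Follow the job logs; the file names match the -o pattern used in the runner
tail -f batch-*.out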