Scott Kirkland edited this page Jun 26, 2020 · 9 revisions

Welcome to the cec-dataprep wiki!

Running on Farm

Set up your environment

While git is already installed, you need to install node in a conda environment.

First, load the conda module on Farm: module load conda3.

One time only, create the environment: conda create -yn cec nodejs.

From then on, every time you log in, load the conda3 module again and run source activate cec.
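After activating the environment, it can help to confirm that the tools used in the following steps are actually on your PATH. The sketch below is a minimal check; missing_tools is a hypothetical helper name, not part of this project or of Farm.

```shell
# Hypothetical helper: print the names of any commands that are not on PATH.
# Prints an empty line when every command is found.
missing_tools() {
  missing=""
  for tool in "$@"; do
    # command -v succeeds only if the shell can resolve the name
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  done
  # Strip the single leading space added by the loop
  echo "${missing# }"
}
```

For example, running missing_tools git node npm after source activate cec should print an empty line if the environment is set up as described above.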

Run the project

If you haven't already, run git clone https://github.com/ucdavis/cec-dataprep to get this project, then run npm install inside the cloned directory to install its dependencies.

Now create a batch file that sets your environment variables and runs the npm build and npm start commands.

dataprep.sh (TODO: this is an example script, real one to come soon)

#!/bin/bash -l

# Name of the job - You'll probably want to customize this.
#SBATCH -J csvimport

# Standard out and Standard Error output files with the job number in the name.
#SBATCH -o csvimport-%j.output
#SBATCH -e csvimport-%j.output

# Print the hostname for debugging purposes
hostname

# Set your variables
export DB_HOST="test.host.name"

# Run the actual work you want to do
srun echo 'hello world'
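The example script above only echoes a message. As a rough sketch of how the steps described on this page might fit together in one batch file, the function below writes a fuller dataprep.sh using a heredoc. Several things here are assumptions rather than facts about this project: the function name write_dataprep_script, the job name dataprep, the clone location $HOME/cec-dataprep, and that package.json defines build and start scripts (hence npm run build). The DB_HOST value is the same placeholder used above.

```shell
# Hypothetical helper: write a sketch of the batch script to the given path.
write_dataprep_script() {
  cat > "$1" <<'EOF'
#!/bin/bash -l

# Name of the job (assumed; customize as needed)
#SBATCH -J dataprep

# Standard out and standard error, with the job number in the name
#SBATCH -o dataprep-%j.output
#SBATCH -e dataprep-%j.output

# Print the hostname for debugging purposes
hostname

# Make node available from the conda environment created earlier
module load conda3
source activate cec

# Set your variables (placeholder value)
export DB_HOST="test.host.name"

# Assumes the repository was cloned into $HOME/cec-dataprep
cd "$HOME/cec-dataprep" || exit 1

# Assumes package.json defines "build" and "start" scripts
npm run build
srun npm start
EOF
}
```

Calling write_dataprep_script dataprep.sh creates the file in the current directory; review and adjust it before submitting.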

Once you have a script, the real work happens when you submit it as a batch job to Farm's Slurm scheduler.

Example runs:

sbatch -t 30 dataprep.sh -- submit the job to be run on one node for up to 30 minutes
sbatch -N 1 -n 2 -t 30 dataprep.sh -- use one node with 2 processes
sbatch --array=[1-5] -t 30 dataprep.sh -- run it 5 times, once for each array value
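When a job is submitted with --array, Slurm sets the SLURM_ARRAY_TASK_ID environment variable to the current array value inside each task. A script can use it to pick a different input per task. The sketch below illustrates the idea; the function name input_for_task and the data/part-N.csv naming scheme are hypothetical and not taken from this project.

```shell
# Hypothetical helper: choose an input file based on the array task ID.
# Falls back to task 1 when run outside a Slurm job array.
input_for_task() {
  task_id="${SLURM_ARRAY_TASK_ID:-1}"
  echo "data/part-${task_id}.csv"
}
```

Inside the batch script, something like INPUT_FILE="$(input_for_task)" would then give each of the five array tasks its own file name.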

You can monitor your submitted job with squeue -u $USER

Get the CSV file(s)

Make the file accessible via a link, possibly using Box. If you use Box, you must copy the "direct download link" and not the preview share link.

Download it into a subfolder of your home directory: wget -O file.csv https://url.com/
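The download step can be wrapped in a small function that creates the subfolder first and then checks that the result is not a web page, which is what a preview share link would typically return instead of the CSV. This is a sketch under assumptions: the function names fetch_csv and looks_like_html and the default folder $HOME/cec-data are hypothetical.

```shell
# Hypothetical helper: succeed if the start of the file looks like an HTML page.
looks_like_html() {
  head -c 512 "$1" | grep -qiE '<!doctype html|<html'
}

# Hypothetical helper: download a URL to file.csv in a subfolder of $HOME.
# Usage: fetch_csv URL [DEST_DIR]
fetch_csv() {
  url="$1"
  dest_dir="${2:-$HOME/cec-data}"   # assumed default folder name
  mkdir -p "$dest_dir"
  wget -O "$dest_dir/file.csv" "$url" || return 1
  if looks_like_html "$dest_dir/file.csv"; then
    echo "Downloaded file looks like an HTML page; check that you used the direct download link." >&2
    return 1
  fi
}
```

For example, fetch_csv https://url.com/ would save the file as $HOME/cec-data/file.csv.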
