Cannot tell if RP install succeeded...expected files not written #61

mchimenti opened this issue Apr 4, 2019 · 0 comments

Hello,

I'm trying to install Ricopili on a Linux HPC cluster using the directions found here:
https://docs.google.com/document/d/14aa-oeT5hF541I8hHsDAL_42oyvlHRC5FWR7gir4xco

I've reached the step of running "rp_test_navi" and waiting for its output. The pipeline finishes but does not write the expected files. There appear to be two errors: it tries to submit a job from a compute node, which the scheduler rejects, and it cannot parse the "batch_job_output_jid" entry in my custom config file (honestly, I don't understand what this entry should contain either). Details on both errors are below.

Any help is appreciated. I think I'm close to having RP working, but I need some guidance. Thanks!


This is what I get after running the test module (I removed the boilerplate "RICOPILI" text for clarity):

mchiment@argon-login-1: test_2$ rp_test_navi
Config_file: /Users/mchiment/ricopili.conf

rp_test_navi - module of ricopili pipeline
version: 2019_Feb_18.001

https://sites.google.com/a/broadinstitute.org/ricopili/home
Stephan Ripke: [email protected]

testing email program
!!Warning!! : No mutt command available, trying mail
mail found in /usr/bin

also test scratchdir, ldscore, starting R


touch.1.finished not found, need to start job
touch.2.finished not found, need to start job
touch.3.finished not found, need to start job
touch.4.finished not found, need to start job
touch.5.finished not found, need to start job
touch.6.finished not found, need to start job
touch.7.finished not found, need to start job
touch.8.finished not found, need to start job
touch.9.finished not found, need to start job
touch.10.finished not found, need to start job
touch.11.finished not found, need to start job
touch.12.finished not found, need to start job
touch.13.finished not found, need to start job
touch.14.finished not found, need to start job
touch.15.finished not found, need to start job
touch.16.finished not found, need to start job
touch.17.finished not found, need to start job
touch.18.finished not found, need to start job
touch.19.finished not found, need to start job
touch.20.finished not found, need to start job
Config_file: /Users/mchiment/ricopili.conf
starting job_array, j.heavycomp.test
batch_taskid: $SGE_TASK_ID
this is the job_array sent: my.start_job --parn 16 -n $SGE_TASK_ID --jobfile heavycomp.job_list
Config_file: /Users/mchiment/ricopili.conf
stdout from array submission: Your job-array 7447606.1-2:1 ("heavycomp.test") has been submitted
dependent job ID:7447606
starting motherscript, depending on 7447606

20 jobs successfully submitted
please see tail of /Users/mchiment/ricopili/test_info for regular updates
also check bjobs -w for running jobs
possibly different command on different computer cluster: e.g. qstat -u USER
you will be informed via email if errors or successes occur

This looks OK, I think, but the manual says I should see these files:

rp_test_forest_join-nup.pdf (a merge of all PDFs)
rp_text.xls (a mock excel file)
ldsc.PGC_meta.r4.gz.tar.gz (the results from the LDScore analyses)

My output (working) directory looks like this after running the test:

drwxr-x--- 2 mchiment its-rs-user 8 Apr 4 13:51 errandout
-rw-r--r-- 1 mchiment its-rs-user 271 Apr 4 13:50 heavycomp.job_list
-rwxr--r-- 1 mchiment its-rs-user 252 Apr 4 13:50 heavycomp.job_list.sub1.sh
-rwxr--r-- 1 mchiment its-rs-user 69 Apr 4 13:50 heavycomp.job_list.sub2.sh
-rw-r--r-- 1 mchiment its-rs-user 123 Apr 4 13:50 j.heavycomp.test
-rw-r--r-- 1 mchiment its-rs-user 67 Apr 4 13:50 j.heavycomp.test.id
-rw-r--r-- 1 mchiment its-rs-user 192 Apr 4 13:50 j.heavycomp.test.log
-rw-r--r-- 1 mchiment its-rs-user 120 Apr 4 13:51 j.qqplot.test
-rw-r--r-- 1 mchiment its-rs-user 0 Apr 4 13:51 j.qqplot.test.id
-rw-r--r-- 1 mchiment its-rs-user 186 Apr 4 13:51 j.qqplot.test.log
-rw-r--r-- 1 mchiment its-rs-user 69 Apr 4 13:50 j._te_test
-rw-r--r-- 1 mchiment its-rs-user 49 Apr 4 13:50 j._te_test.id
-rw-r--r-- 1 mchiment its-rs-user 185 Apr 4 13:50 j._te_test.log
-rw-r--r-- 1 mchiment its-rs-user 33K Apr 4 13:50 PGC_cohort1.ch.fl.r4.gz
-rw-r--r-- 1 mchiment its-rs-user 34K Apr 4 13:50 PGC_cohort2.ch.fl.r4.gz
-rw-r--r-- 1 mchiment its-rs-user 34K Apr 4 13:50 PGC_cohort3.ch.fl.r4.gz
-rw-r--r-- 1 mchiment its-rs-user 34K Apr 4 13:50 PGC_cohort4.ch.fl.r4.gz
-rw-r--r-- 1 mchiment its-rs-user 54K Apr 4 13:50 PGC_meta.r4.gz
-rw-r--r-- 1 mchiment its-rs-user 717 Apr 4 13:51 qqplot.job_list
-rw-r--r-- 1 mchiment its-rs-user 8 Apr 4 13:51 touch.10.finished
-rw-r--r-- 1 mchiment its-rs-user 8 Apr 4 13:51 touch.11.finished
-rw-r--r-- 1 mchiment its-rs-user 8 Apr 4 13:51 touch.12.finished
-rw-r--r-- 1 mchiment its-rs-user 8 Apr 4 13:51 touch.13.finished
-rw-r--r-- 1 mchiment its-rs-user 8 Apr 4 13:51 touch.14.finished
-rw-r--r-- 1 mchiment its-rs-user 8 Apr 4 13:51 touch.15.finished
-rw-r--r-- 1 mchiment its-rs-user 8 Apr 4 13:51 touch.16.finished
-rw-r--r-- 1 mchiment its-rs-user 8 Apr 4 13:51 touch.17.finished
-rw-r--r-- 1 mchiment its-rs-user 8 Apr 4 13:51 touch.18.finished
-rw-r--r-- 1 mchiment its-rs-user 8 Apr 4 13:51 touch.19.finished
-rw-r--r-- 1 mchiment its-rs-user 8 Apr 4 13:51 touch.1.finished
-rw-r--r-- 1 mchiment its-rs-user 8 Apr 4 13:51 touch.20.finished
-rw-r--r-- 1 mchiment its-rs-user 8 Apr 4 13:51 touch.2.finished
-rw-r--r-- 1 mchiment its-rs-user 8 Apr 4 13:51 touch.3.finished
-rw-r--r-- 1 mchiment its-rs-user 8 Apr 4 13:51 touch.4.finished
-rw-r--r-- 1 mchiment its-rs-user 8 Apr 4 13:51 touch.5.finished
-rw-r--r-- 1 mchiment its-rs-user 8 Apr 4 13:51 touch.6.finished
-rw-r--r-- 1 mchiment its-rs-user 8 Apr 4 13:51 touch.7.finished
-rw-r--r-- 1 mchiment its-rs-user 8 Apr 4 13:51 touch.8.finished
-rw-r--r-- 1 mchiment its-rs-user 8 Apr 4 13:51 touch.9.finished


Looking in "errandout", the stderr log shows:

mchiment@argon-login-1: errandout$ cat _te_test.e7447609
Unable to run job: denied: host "argon-lc-g11-16.hpc" is not a submit host
Exiting
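
For context, Grid Engine prints this message when qsub is run on a host that is not registered as a submit host; on a Grid Engine cluster, `qconf -ss` lists the registered submit hosts. To check whether other logs in "errandout" contain the same refusal, a minimal Python sketch could look like the following. The function name `find_refused_hosts` is hypothetical, and the pattern is copied from the wording of the message above, so it may need adjusting on a scheduler that words the refusal differently.

```python
import re

# Pattern taken from the wording of the error in the log above.
# Other Grid Engine variants may phrase this message differently (assumption).
NOT_SUBMIT_HOST = re.compile(r'denied: host "([^"]+)" is not a submit host')


def find_refused_hosts(log_text):
    """Return the host names that were refused as submit hosts, in order of appearance."""
    return NOT_SUBMIT_HOST.findall(log_text)


# The two lines from _te_test.e7447609 shown above.
example = (
    'Unable to run job: denied: host "argon-lc-g11-16.hpc" is not a submit host\n'
    "Exiting\n"
)
print(find_refused_hosts(example))
```

Any host name this returns is a node from which the pipeline tried to call qsub, which is useful to mention when asking the cluster administrators about submit-host settings.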

And the stdout log from the same job:

mchiment@argon-login-1: errandout$ cat _te_test.o7447609
Config_file: /Users/mchiment/ricopili.conf

.......testing email program....
!!Warning!! : No mutt command available, trying mail
mail found in /usr/bin

also test scratchdir, ldscore, starting R


succ: touch.1.finished
succ: touch.2.finished
succ: touch.3.finished
succ: touch.4.finished
succ: touch.5.finished
succ: touch.6.finished
succ: touch.7.finished
succ: touch.8.finished
succ: touch.9.finished
succ: touch.10.finished
succ: touch.11.finished
succ: touch.12.finished
succ: touch.13.finished
succ: touch.14.finished
succ: touch.15.finished
succ: touch.16.finished
succ: touch.17.finished
succ: touch.18.finished
succ: touch.19.finished
succ: touch.20.finished
PGC_cohort1.ch.fl.r4.gz.out-qq.pdf not found, need to start job
PGC_cohort2.ch.fl.r4.gz.out-qq.pdf not found, need to start job
PGC_cohort3.ch.fl.r4.gz.out-qq.pdf not found, need to start job
PGC_cohort4.ch.fl.r4.gz.out-qq.pdf not found, need to start job
Config_file: /Users/mchiment/ricopili.conf
starting job_array, j.qqplot.test
batch_taskid: $SGE_TASK_ID
this is the job_array sent: my.start_job --parn 16 -n $SGE_TASK_ID --jobfile qqplot.job_list
qsub -l h_vmem=2g -l h_rt=2:00:00 -tc 0 -t 1-1 -o /Users/mchiment/temp/test_ricopili/test_2/errandout -e /Users/mchiment/temp/test_ricopili/test_2/errandout -N qqplot.test j.qqplot.test > j.qqplot.test.id
->system call failed: 1
Config_file: /Users/mchiment/ricopili.conf
stdout from array submission:
dependent job ID:


Error: something seems wrong with entry batch_job_output_jid in your custom file
please revisit installation process
Something is very strange since the array submission output is empty as well

5 jobs successfully submitted
please see tail of /Users/mchiment/ricopili/test_info for regular updates
also check bjobs -w for running jobs
possibly different command on different computer cluster: e.g. qstat -u USER
you will be informed via email if errors or successes occur

Custom config file:

mchiment@argon-login-1: rp_bin$ cat rp_config.custom.txt

for details please refer to https://docs.google.com/document/d/14aa-oeT5hF541I8hHsDAL_42oyvlHRC5FWR7gir4xco/edit?usp=sharing

and https://docs.google.com/spreadsheets/d/1LhNYIXhFi7yXBC17UkjI1KMzHhKYz0j2hwnJECBGZk4/edit?usp=sharing

variable_name variable_value

rp_dependencies_dir /Users/mchiment/ricopili/dependencies
R_packages_dir /Users/mchiment/R/x86_64-pc-linux-gnu-library/3.5
starting_R module_SPACE_load_SPACE_R
path_to_Perlmodules /Users/mchiment/ricopili/dependencies
path_to_scratchdir /localscratch/Users/mchiment
starting_ldsc source_SPACE_activate_SPACE_ldsc;SPACE_python_SPACE/Users/mchiment/ricopili/dependencies/ldsc
ldsc_reference /Users/mchiment/ricopili/dependencies/ldsc
rp_user_initials msc
rp_user_email [email protected]
rp_logfiles ~/ricopili


---- jobarray and queueing parameters:


batch_jobcommand qsub
batch_memory_request -l_SPACE_h_vmem=XXXg
batch_walltime -l_SPACE_h_rt=HH:MM:SS
batch_array -t_SPACE_1-XXX
batch_max_parallel_jobs_per_one_array -tc_SPACE_YYY
batch_jobfile XXX
batch_name -N_SPACE_XXX
batch_stdout -o_SPACE_XXX
batch_stderr -e_SPACE_XXX
batch_job_dependency -hold_jid_SPACE_XXX
batch_array_task_id $SGE_TASK_ID
batch_other_job_flags NONE
batch_job_output_jid Your_SPACE_job-array_SPACE_XXX.1-YYY:1_SPACE("ZZZ")_SPACE_has_SPACE_been_SPACE_submitted
batch_ncores_per_node 56
batch_mem_per_node 32
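
To make the "batch_job_output_jid" entry easier to reason about, here is a minimal Python sketch of how such a template could be matched against the scheduler's stdout to pull out the job ID. This is not Ricopili's own parsing code. It rests on three assumptions: that each SPACE token in a config value stands for one blank, that XXX marks the job ID, and that YYY and ZZZ mark the last task index and the job name. The helper names are hypothetical.

```python
import re

# Template as it appears in rp_config.custom.txt above.
TEMPLATE = 'Your_SPACE_job-array_SPACE_XXX.1-YYY:1_SPACE("ZZZ")_SPACE_has_SPACE_been_SPACE_submitted'


def decode_spaces(value):
    """Assumption: a SPACE token, with or without surrounding underscores, means one blank."""
    return re.sub(r"_?SPACE_?", " ", value)


def template_to_regex(template):
    """Assumption: XXX is the job ID, YYY the last task index, ZZZ the job name."""
    escaped = re.escape(decode_spaces(template))
    pattern = (
        escaped.replace("XXX", r"(?P<jid>\d+)")
        .replace("YYY", r"\d+")
        .replace("ZZZ", r".+?")
    )
    return re.compile(pattern)


def extract_job_id(stdout, template=TEMPLATE):
    """Return the job ID as a string, or None if stdout does not fit the template."""
    match = template_to_regex(template).search(stdout)
    return match.group("jid") if match else None


# stdout of the first array submission, copied from the rp_test_navi output above.
good = 'Your job-array 7447606.1-2:1 ("heavycomp.test") has been submitted'
print(extract_job_id(good))
# stdout of the failed qqplot submission was empty.
print(extract_job_id(""))
```

Under these assumptions the template fits the stdout of the first submission, while the empty stdout of the second submission yields no job ID. That would be consistent with the "batch_job_output_jid" error being a side effect of the rejected qsub call rather than a fault in the config entry itself, though only the maintainers can confirm how the pipeline actually uses this value.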
