run_grnboost failed error 140 #24

Closed
BrunoGuillotin opened this issue Feb 4, 2025 · 2 comments

Comments

@BrunoGuillotin

Dear MiniEx team,

I have run MiniEx successfully in the past; thanks for this great tool.
I am now running MiniEx on larger single-cell matrices containing 12k to 17k cells, but I get the error below.

`executor > local (1), slurm (4)
[fe/da2b3b] process > check_user_input (1) [100%] 1 of 1 ✔
[14/5472e9] process > run_grnboost (1) [100%] 1 of 1, failed: 1 ✘
[e5/8ccdc3] process > get_expressed_genes (1) [100%] 1 of 1 ✔
[0d/d169be] process > unzip_motif_mappings [100%] 1 of 1 ✔
[- ] process > run_enricher_motifs -
[- ] process > filter_motifs -
[4e/bf8ead] process > get_top_degs (1) [100%] 1 of 1 ✔
[- ] process > run_enricher_cluster -
[- ] process > filter_expression -
[- ] process > make_info_file -
[- ] process > make_regulon_clustermap -
[- ] process > get_network_centrality -
[- ] process > make_go_enrichment_files -
[- ] process > run_enricher_go -
[- ] process > check_reference -
[- ] process > make_ref_ranking_dataframe -
[- ] process > make_borda -
[- ] process > score_edges -
[- ] process > make_top_regulons_heatmaps -
[- ] process > make_regmaps -
[- ] process > make_log_file -
Error executing process > 'run_grnboost (1)'

Caused by:
Process run_grnboost (1) terminated with an error exit status (140)

Command executed:

OMP_NUM_THREADS=1 python3 "/scratch/bg93/MiniEx/bin/MINIEX_grnboostMultiprocess.py" Ath_TF_list.tsv "CTR_matrix.tsv" "12" "CTR_grnboost2.tsv"

Command exit status:
140

Command output:
Loaded expression matrix of 12661 cells and 23236 genes in 56.45990753173828 seconds...
Loaded 1877 TFs...

Command error:
11%|#1 | 2610/23236 [52:21<10:57:07, 1.91s/it]
11%|#1 | 2611/23236 [52:23<10:57:08, 1.91s/it]
11%|#1 | 2612/23236 [52:29<18:21:15, 3.20s/it]
11%|#1 | 2613/23236 [52:48<45:50:19, 8.00s/it]
11%|#1 | 2631/23236 [53:00<33:12:08, 5.80s/it]
11%|#1 | 2642/23236 [53:09<24:31:30, 4.29s/it]
11%|#1 | 2646/23236 [53:21<22:34:01, 3.95s/it]
11%|#1 | 2656/23236 [53:22<15:56:47, 2.79s/it]
11%|#1 | 2659/23236 [53:37<19:35:20, 3.43s/it]
11%|#1 | 2667/23236 [53:39<14:09:23, 2.48s/it]
12%|#1 | 2676/23236 [54:08<15:31:18, 2.72s/it]
12%|#1 | 2677/23236 [54:09<11:43:38, 2.05s/it]
12%|#1 | 2703/23236 [54:16<8:39:27, 1.52s/it]
12%|#1 | 2706/23236 [54:25<11:05:48, 1.95s/it]
12%|#1 | 2710/23236 [54:27<8:44:37, 1.53s/it]
12%|#1 | 2713/23236 [54:38<12:26:41, 2.18s/it]
12%|#1 | 2719/23236 [54:47<11:18:30, 1.98s/it]
12%|#1 | 2727/23236 [55:15<13:50:27, 2.43s/it]
12%|#1 | 2751/23236 [55:28<10:36:12, 1.86s/it]
12%|#1 | 2757/23236 [55:34<9:02:29, 1.59s/it]
12%|#1 | 2771/23236 [55:51<8:24:02, 1.48s/it]
12%|#1 | 2781/23236 [55:52<6:08:46, 1.08s/it]
12%|#2 | 2790/23236 [55:54<4:31:32, 1.25it/s]
12%|#2 | 2793/23236 [56:11<12:58:49, 2.29s/it]
12%|#2 | 2801/23236 [56:25<12:11:03, 2.15s/it]
12%|#2 | 2807/23236 [56:27<8:59:57, 1.59s/it]
12%|#2 | 2820/23236 [56:39<7:48:27, 1.38s/it]
12%|#2 | 2827/23236 [57:01<10:59:08, 1.94s/it]
12%|#2 | 2828/23236 [57:18<36:20:59, 6.41s/it]
12%|#2 | 2843/23236 [57:20<25:39:08, 4.53s/it]
12%|#2 | 2854/23236 [57:22<18:11:47, 3.21s/it]
12%|#2 | 2860/23236 [57:33<15:49:32, 2.80s/it]
12%|#2 | 2869/23236 [57:35<11:24:50, 2.02s/it]
12%|#2 | 2872/23236 [57:35<8:07:03, 1.44s/it]
12%|#2 | 2880/23236 [57:37<6:02:51, 1.07s/it]
12%|#2 | 2882/23236 [57:42<8:34:21, 1.52s/it]
12%|#2 | 2883/23236 [57:51<22:33:43, 3.99s/it]
12%|#2 | 2888/23236 [57:55<16:58:38, 3.00s/it]
12%|#2 | 2891/23236 [57:56<12:14:50, 2.17s/it]
12%|#2 | 2893/23236 [58:02<14:08:11, 2.50s/it]
12%|#2 | 2898/23236 [58:06<11:06:51, 1.97s/it]
12%|#2 | 2901/23236 [58:07<8:14:44, 1.46s/it]
12%|#2 | 2903/23236 [58:18<15:44:27, 2.79s/it]
12%|#2 | 2904/23236 [58:24<21:28:46, 3.80s/it]
13%|#2 | 2920/23236 [58:25<15:04:11, 2.67s/it]
13%|#2 | 2922/23236 [58:29<14:20:10, 2.54s/it]
13%|#2 | 2925/23236 [58:39<15:41:16, 2.78s/it]
13%|#2 | 2927/23236 [58:46<16:08:53, 2.86s/it]
13%|#2 | 2928/23236 [58:49<17:03:34, 3.02s/it]
13%|#2 | 2932/23236 [58:51<12:41:16, 2.25s/it]

Work dir:
/scratch/bg93/work/14/5472e9590f85f41e015c3b9185076f

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named .command.sh
`
This seems to be a memory issue (exit status 140),
but I already requested 200 GB of memory and 12 CPUs in my sbatch command, and I made sure to set up the config file with:

withName: run_grnboost {
    memory = '200 GB'
    cpus = 12
}

Each time I increase the memory, the run gets slightly further: with 100 GB I reached 9%, and with 200 GB I reached 13%.
Is that normal, or should I change something, for example reduce the size of the matrix by removing lowly expressed genes?
That seems like far too much memory, to be honest ^^

Thank you in advance for your help,
Bruno

Command used and terminal output
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --cpus-per-task=12
#SBATCH --mem=500GB
#SBATCH --time=72:00:00
#SBATCH --job-name=bg93_10x
#SBATCH --mail-type=END
#SBATCH --mail-user=
#SBATCH --output=slurm_%j.out

module purge
module load singularity/3.7.4
module load nextflow/21.10.6

nextflow -C miniex_CtrCut_CTR.config run MiniEx/miniex.nf

@jstaut
Collaborator

jstaut commented Feb 4, 2025

Dear Bruno,

Thank you for providing a detailed report of the problem. 17k cells should normally not be a problem. We suspect that the issue lies with having to specify resource parameters per process.

Nextflow submits an independent job for each process to the queue of your cluster environment, which seems to be SLURM in your case. The job that executes the Nextflow command nextflow -C miniex_CtrCut_CTR.config run MiniEx/miniex.nf therefore does little computation itself: it merely organizes the work directory and submits the other jobs to the cluster, and those jobs do the actual computations of the MINI-EX workflow. The script executing the Nextflow command thus does not need many resources; 1 CPU and 5 GB of memory are normally enough. It is important, however, to give sufficient resources to the individual jobs it submits, which you can do in the config file.

We see that you gave the run_grnboost process 12 CPUs and 200 GB of memory in the config file. This sounds like more than enough to us, given the size of the dataset. However, you specify #SBATCH --time=72:00:00 only in your main script. That makes sense for the launcher, but this script only submits jobs, and the resources requested for it are not applied to the jobs it submits. Those have to be specified in the config file.

We suspect that the cluster on which you are working applies a default wall time of 1 h if none is specified. Since no wall time is specified for the run_grnboost process, it seems likely that the system kills this job after 1 h every time. Exit status 140 can be related to insufficient memory, but also to insufficient wall time.
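
As a side note on reading the number itself: by common shell convention, an exit status above 128 means the process was terminated by a signal, with status = 128 + signal number. The minimal Python sketch below applies that convention to 140. The helper name describe_exit_status is hypothetical, and which signal a scheduler sends when a limit is exceeded depends on the site configuration, so the result only narrows down the cause rather than proving it.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret an exit status using the common '128 + signal number' convention.

    Hypothetical helper for illustration; it is not part of MINI-EX or Nextflow.
    """
    if status == 0:
        return "success"
    if status > 128:
        signum = status - 128
        try:
            # Signal names are platform dependent, so look them up at runtime.
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"terminated by signal {signum} ({name})"
    return f"exited with error code {status}"


print(describe_exit_status(140))
```

For 140 this gives signal 12. A job killed by a signal from outside, rather than failing with its own error message, fits a limit enforced by the scheduler, be it memory or wall time.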

If you specify the wall time for the run_grnboost process as follows, the problem might be solved.

withName: run_grnboost {
    memory = '200 GB'
    cpus = 12
    time = '48 h'
}
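
As a rough sanity check on the 48 h value, the last progress line in the error log can be extrapolated. The sketch below is only an estimate under the assumption that the average rate so far holds for the rest of the run; the helper names parse_clock and projected_total_hours are hypothetical, and the regular expression assumes the tqdm-style "done/total [elapsed<remaining" layout seen in the log.

```python
import re


def parse_clock(text: str) -> int:
    """Convert 'MM:SS' or 'H:MM:SS' into a number of seconds."""
    seconds = 0
    for part in text.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def projected_total_hours(progress_line: str) -> float:
    """Extrapolate the total runtime in hours from one tqdm-style progress line."""
    match = re.search(r"(\d+)/(\d+) \[([\d:]+)<", progress_line)
    if match is None:
        raise ValueError("not a tqdm-style progress line")
    done = int(match.group(1))
    total = int(match.group(2))
    elapsed = parse_clock(match.group(3))
    # Assumption: the average rate observed so far continues until the end.
    return elapsed * total / done / 3600


# Tail of the last progress line reported in the error log above.
last_line = "2932/23236 [58:51<12:41:16, 2.25s/it]"
hours = projected_total_hours(last_line)
print(f"projected total runtime: {hours:.1f} h")
```

Under that assumption the run would need roughly 8 h in total, far more than a 1 h default but well within 48 h. The log also stops at about 59 minutes of elapsed time, which is consistent with a job killed at a 1 h wall time limit.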

Thanks for letting us know if this helped!

Kind regards,
The MINI-EX team

@BrunoGuillotin
Author

Dear MINI-EX team,

Yep that was the problem ^^. Everything is running smoothly and I got all expected output files.
Thank you very much for your fast reply and great help,

Regards,
Bruno
