Segmentation fault when using GPU in supercomputer #6727

Open
mushroomfire opened this issue Nov 24, 2022 · 2 comments
Labels: question (Question on using Taichi)

Comments

@mushroomfire
Contributor

Environment:
Paratera supercomputer

Submit script:

#!/bin/bash
#SBATCH -N 1
#SBATCH -n 5
#SBATCH -p gpu 
#SBATCH --gres=gpu:1
#SBATCH --no-requeue

nvidia-smi

python test.py

Here is the test.py:

import taichi as ti
ti.init(ti.cuda)

The output file is as follows:

Thu Nov 24 21:48:39 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.73.08    Driver Version: 510.73.08    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:8A:00.0 Off |                    0 |
| N/A   23C    P0    42W / 300W |      0MiB / 32768MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
/tmp/slurmd/job604193/slurm_script: line 10: 143192 Segmentation fault      python test.py

I don't know how to solve this segmentation fault error. If you need more detailed information, please let me know. Thanks a lot.
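As a minimal diagnostic sketch (an assumption on my part, targeting Taichi 1.2.x with CUDA; faulthandler is from the Python standard library), enabling faulthandler and Taichi's debug mode can help show whether the crash happens inside ti.init or later:

import faulthandler
faulthandler.enable()  # dump a Python traceback if the process hits SIGSEGV

import taichi as ti
ti.init(arch=ti.cuda, debug=True)  # debug=True enables extra runtime checks

x = ti.field(dtype=ti.f32, shape=8)

@ti.kernel
def fill():
    for i in x:  # struct-for over the field, launched on the GPU
        x[i] = i

fill()
print(x.to_numpy())  # reaching this line means CUDA init and a trivial kernel both work

If the output file still shows only the segfault line with no Taichi banner, the crash most likely happens during CUDA/driver initialization rather than in user code.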

@mushroomfire added the question (Question on using Taichi) label Nov 24, 2022
@taichi-gardener moved this to Untriaged in Taichi Lang Nov 24, 2022
@ailzhang
Contributor

Hey @mushroomfire, out of curiosity, does this repro if you run it directly on a V100 without Slurm? Thanks!

@mushroomfire
Contributor (Author)

mushroomfire commented Nov 25, 2022

Hey @ailzhang, here are the results if I run the script directly in the shell:
python test.py

[Taichi] version 1.2.2, llvm 10.0.0, commit 608e4b57, linux, python 3.8.0
[Taichi] Starting on arch=cuda
Segmentation fault
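One way to get more detail here (a sketch, assuming Taichi's logging helpers ti.set_logging_level and ti.TRACE are available in this version) is to raise the log level before init, so the trace shows the last CUDA setup step reached before the crash:

import taichi as ti

# Diagnostic sketch: verbose logging should print each backend setup step,
# so the last message before "Segmentation fault" points at the failing stage.
ti.set_logging_level(ti.TRACE)
ti.init(arch=ti.cuda)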

@ailzhang moved this from Untriaged to Todo in Taichi Lang Nov 25, 2022
@turbo0628 moved this from Todo to Backlog in Taichi Lang Dec 9, 2022