seg fault in pio_syncfile #14
Comments
Hi Kate, We will be happy to look into this. Could you provide us with a case description for when you get this error? What kind of machine are you running on?
Hi Kate, I'm trying to reproduce this on Yellowstone. I've been running on a Penguin Kate On Thu, Sep 24, 2015 at 10:07 AM, Kate Thayer-Calder <
Kate, This sounds stupid I'm sure, but I've got an interactive session on caldera Kate On Thu, Sep 24, 2015 at 10:33 AM, Katherine Hedstrom [email protected]
Hi Kate, It runs on Yellowstone, no problem. Kate On Thu, Sep 24, 2015 at 10:33 AM, Katherine Hedstrom [email protected]
To run on caldera in an interactive session you use On Thu, Sep 24, 2015 at 12:56 PM, Kate Hedstrom [email protected]
Jim Edwards CESM Software Engineer
Hi Jim and Kate, Since ifort was working for me on Yellowstone, I built everything with the
[pacman9:483725] *** An error occurred in MPI_Alltoallw
Kate
On Thu, Sep 24, 2015 at 1:43 PM, jedwards4b [email protected]
Hi Kate, We have a local Linux cluster where I'd like to try to reproduce this problem, but I need a little more information. How are you configuring PIO to start? When you say you are going from 4 tasks to 2, how are you making that change? Any information you can give me about how best to try to reproduce this error would be very helpful. If you have code that you can tar up and send to me, that would be REALLY great, but I understand that can be difficult. My email is [email protected], if you want to directly contact me. Thanks!
Update: It looks like some of the issues were caused by an incorrect configuration:
PIO_STRIDE = 4
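To see why a stride of 4 can be a problem on 4 cores, here is a minimal sketch of a sanity check. It assumes the common layout where I/O task `i` sits on rank `base + i * stride`; the function name `io_task_ranks` and its parameters are hypothetical and not part of the PIO API, and the thread does not say how PIO itself handles an out-of-range layout or which value turned out to be correct.

```python
def io_task_ranks(ntasks, num_iotasks, stride, base=0):
    """Return the ranks that would act as I/O tasks.

    Assumed layout (not taken from the PIO source): I/O task i is placed
    on rank base + i * stride. Raises ValueError if any of those ranks
    falls outside 0 .. ntasks - 1.
    """
    if ntasks < 1 or num_iotasks < 1 or stride < 1 or base < 0:
        raise ValueError("ntasks, num_iotasks, stride must be >= 1 and base >= 0")

    ranks = [base + i * stride for i in range(num_iotasks)]
    out_of_range = [r for r in ranks if r >= ntasks]
    if out_of_range:
        raise ValueError(
            f"I/O ranks {out_of_range} do not exist with only {ntasks} tasks "
            f"(num_iotasks={num_iotasks}, stride={stride}, base={base})"
        )
    return ranks


if __name__ == "__main__":
    # 4 tasks, 1 I/O task, stride 4: only rank 0 is used, so this fits.
    print(io_task_ranks(4, 1, 4))
    # 4 tasks, 2 I/O tasks, stride 2: ranks 0 and 2.
    print(io_task_ranks(4, 2, 2))
    # 4 tasks, 2 I/O tasks, stride 4: the second I/O task would need rank 4.
    try:
        io_task_ranks(4, 2, 4)
    except ValueError as err:
        print("rejected:", err)
```

Under this assumed layout, 2 I/O tasks with a stride of 4 would need a rank that does not exist on 4 cores, so a check like this before initializing I/O would flag the combination early instead of letting it fail later.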
I have a case which runs on 4 cores with 4 pio tasks and blows up with 1 or 2 pio tasks. It's dying in the call to pio_syncfile. I get pages of this sort of output (see below), then the seg fault. The first active pio process is the one that complains from inside memcpy. Here's the stack trace:
mod_pio`netcdf_sync, FP=7fff8067c170
GPTLstop: GPTLinitialize has not been called
GPTLstart name=PIO:write_darray_multi_nc: GPTLinitialize has not been called
GPTLstop: GPTLinitialize has not been called
GPTLstart name=PIO:flush_output_buffer: GPTLinitialize has not been called
/archive/u1/uaf/kate/src/parallelio/src/clib/pio_darray.c 1362 2
/archive/u1/uaf/kate/src/parallelio/src/clib/pio_darray.c 1366 2
GPTLstop: GPTLinitialize has not been called
GPTLstart name=PIO:write_darray_multi_nc: GPTLinitialize has not been called
GPTLstop: GPTLinitialize has not been called
GPTLstart name=PIO:flush_output_buffer: GPTLinitialize has not been called
GPTLstop: GPTLinitialize has not been called
GPTLstart name=PIO:rearrange_comp2io: GPTLinitialize has not been called
mpirun noticed that process rank 1 with PID 150416 on node pacman3 exited on signal 11 (Segmentation fault).
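Most of the output above is the GPTL timing library reporting that its timers were used before `GPTLinitialize` was called. Those lines bury the few that matter here (the `pio_darray.c` prints and the `mpirun` message). A small filter like the sketch below can make such logs easier to read; the helper name `strip_gptl_noise` is hypothetical, and treating these warnings as unrelated to the seg fault is an assumption the thread does not confirm.

```python
import re

# Matches only the "GPTLinitialize has not been called" warnings emitted
# by GPTLstart/GPTLstop, as they appear in the log quoted in this issue.
GPTL_NOISE = re.compile(
    r"^GPTL(start|stop)\b.*GPTLinitialize has not been called\s*$"
)


def strip_gptl_noise(lines):
    """Return the log lines with the GPTL-not-initialized warnings removed."""
    return [line for line in lines if not GPTL_NOISE.match(line)]


if __name__ == "__main__":
    sample = [
        "GPTLstop: GPTLinitialize has not been called",
        "GPTLstart name=PIO:flush_output_buffer: GPTLinitialize has not been called",
        "/archive/u1/uaf/kate/src/parallelio/src/clib/pio_darray.c 1362 2",
        "mpirun noticed that process rank 1 with PID 150416 on node pacman3 "
        "exited on signal 11 (Segmentation fault).",
    ]
    for line in strip_gptl_noise(sample):
        print(line)
```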
Currently Loaded Modulefiles:
pnetcdf is 1.6.1