Provider variable changes break gnitest #258

Closed
jswaro opened this issue Jun 29, 2015 · 3 comments

@jswaro
Member

jswaro commented Jun 29, 2015

The merge from OFI-WG that contains the provider variable changes introduces a regression that gnitest detects: all 161 tests pass, but the process segfaults during exit.

Output:
[RUN ] wait_verify::invalid_type
[PASS] wait_verify::invalid_type: (0.00s)
[====] Synthesis: Tested: 161 | Passing: 161 | Failing: 0 | Crashing: 0 
srun: error: nid00014: task 0: Segmentation fault

GDB shows:

Program terminated with signal 11, Segmentation fault.
#0  0x000000000044994b in dlist_remove (item=0x0) at ./include/fi_list.h:89
89      item->prev->next = item->next;
(gdb) where
#0  0x000000000044994b in dlist_remove (item=0x0) at ./include/fi_list.h:89
#1  0x0000000000449925 in fi_param_fini () at src/var.c:307
#2  0x0000000000447f25 in fi_fini () at src/fabric.c:414
#3  0x000000000040469a in __do_global_dtors_aux ()
#4  0x0000000000489f2d in _real_fini ()
#5  0x0000000000489f22 in _fini ()
#6  0x00007fe719434874 in _dl_fini () from /lib64/ld-linux-x86-64.so.2
#7  0x00007fe718251665 in __run_exit_handlers () from /lib64/libc.so.6
#8  0x00007fe7182516b5 in exit () from /lib64/libc.so.6
#9  0x00007fe71823ac3d in __libc_start_main () from /lib64/libc.so.6
#10 0x000000000040462d in _start () at ../sysdeps/x86_64/elf/start.S:113
Change List:
    Port over FI_* env vars to new var API. (commit: 59d16853f5b11151ad9fcc69fd7b03b2ccd4ac0f) (details)
    fabric/param: Use dlist for param list (commit: c9b56a6c2ad817c27e86f10ce5f475fb4703ba14) (details)
    fabric/param: Rename len to cnt in fi_getparams (commit: cd8af8c3515b9db02052f6319b314b11ac824519) (details)
    fabric/param: Define socket drop rate as int (commit: fc72f18a9799ed218bf95a7b25de02f50c34c50b) (details)
    fabric/info: Cleanup fi_info source (commit: 80709a45995361ccb2d3b370df1db9b68414c441) (details)
    fabric/param: Rename fi_param_register to fi_param_define (commit: 57f4aa6b6c69c83ce48bc6fd0db3e107be82104d) (details)
    fabric/param: Merge fi_param_get calls into a single function (commit: 851d51a51156f80c3b1b433e62b1611d73e441bc) (details)
    v1.1.0rc1 (commit: c10aa12c27deebda554dcb87974fd22bdf983498) (details)
@hppritcha
Member

Hi Jim,

I hope this isn't an artifact of criterion's fork approach. Probably not? I saw this kind of error in our provider before (not this specific one, but a similar one involving the dlist type) when I had done something that corrupted the dlist. Maybe I removed the same element from the list twice? dlist_remove could be made smarter by setting the next/prev pointers of the removed element to NULL. That would at least cause a segfault if it were invoked a second time on that element.

Howard
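
The suggestion above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in include/fi_list.h: the struct layout, the helper names, and the added assert are all assumptions made for the example. The idea is that a removed entry has its links cleared, so a second removal fails immediately at a predictable spot instead of silently corrupting the list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the list entry type; not copied from fi_list.h. */
struct dlist_entry {
    struct dlist_entry *next;
    struct dlist_entry *prev;
};

/* An empty circular list is a head whose links point back at itself. */
static inline void dlist_init(struct dlist_entry *head)
{
    head->next = head;
    head->prev = head;
}

static inline int dlist_empty(const struct dlist_entry *head)
{
    return head->next == head;
}

/* Link item in just before head, i.e. at the tail of the list. */
static inline void dlist_insert_tail(struct dlist_entry *item,
                                     struct dlist_entry *head)
{
    item->next = head;
    item->prev = head->prev;
    head->prev->next = item;
    head->prev = item;
}

/*
 * Unlink item, then clear its links. The assert is an addition for this
 * sketch: it aborts on a NULL item or on an entry whose links were
 * already cleared by an earlier removal.
 */
static inline void dlist_remove(struct dlist_entry *item)
{
    assert(item != NULL && item->next != NULL && item->prev != NULL);
    item->prev->next = item->next;
    item->next->prev = item->prev;
    item->next = NULL;
    item->prev = NULL;
}
```

With links cleared on removal, calling dlist_remove twice on the same entry trips the assert (or, without it, dereferences a NULL prev pointer right away), which is much easier to diagnose than a corrupted list that crashes later.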

@jswaro
Member Author

jswaro commented Jun 29, 2015

I don't think it is. Initializing param_list outside of the init functions
resolved part of the crash, so that is one piece of it, but I think there is
a genuine bug here.
-- Jim
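
For illustration, initializing a list head outside of any init function might look like the sketch below. It assumes (this is not confirmed in the thread) that the problem is a teardown routine running from an exit handler while the list head is still zero-filled. The DLIST_HEAD_INIT macro, example_param_fini, and the struct layout are hypothetical names for this example, not the identifiers used in src/var.c or include/fi_list.h.

```c
#include <stddef.h>

/* Hypothetical stand-in for the list entry type. */
struct dlist_entry {
    struct dlist_entry *next;
    struct dlist_entry *prev;
};

/* Hypothetical static initializer: the head points at itself (empty list). */
#define DLIST_HEAD_INIT(name) { &(name), &(name) }

/*
 * Initialized at load time, so it is a valid empty list even if no init
 * function ever runs. A zero-filled head would instead have next == NULL,
 * which an emptiness check of the form (head->next == head) reads as
 * "not empty".
 */
static struct dlist_entry param_list = DLIST_HEAD_INIT(param_list);

static inline int dlist_empty(const struct dlist_entry *head)
{
    return head->next == head;
}

static inline void dlist_insert_tail(struct dlist_entry *item,
                                     struct dlist_entry *head)
{
    item->next = head;
    item->prev = head->prev;
    head->prev->next = item;
    head->prev = item;
}

static inline void dlist_remove(struct dlist_entry *item)
{
    item->prev->next = item->next;
    item->next->prev = item->prev;
}

/*
 * Hypothetical teardown: unlink every entry and return how many were
 * removed. Safe to call even when nothing was ever registered.
 */
static inline int example_param_fini(void)
{
    int removed = 0;

    while (!dlist_empty(&param_list)) {
        dlist_remove(param_list.next);
        removed++;
    }
    return removed;
}
```

Because the head is valid from program start, the teardown loop simply does nothing when it runs before (or without) initialization, rather than walking a NULL pointer.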

@jswaro
Member Author

jswaro commented Jun 29, 2015

This bug is fixed by ofiwg#1120. The crash should be resolved, but the two debug build issues are separate.
