Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Buffer size assert failure #1665

Closed
yuyichao opened this issue Mar 5, 2016 · 18 comments
Closed

Buffer size assert failure #1665

yuyichao opened this issue Mar 5, 2016 · 18 comments

Comments

@yuyichao
Copy link
Contributor

yuyichao commented Mar 5, 2016

I've repeatedly hit this assertion failure in GdbConnection::write_data_raw when trying to debug a segfault.

The message suggests that this is because the static buffer size is too small?

Platform: ArchLinux x64, rr version 11a08c0

@rocallahan
Copy link
Collaborator

Can you uncomment DEBUGTAG in GdbConnection.cc and dump the resulting logs somewhere?

@yuyichao
Copy link
Contributor Author

yuyichao commented Mar 5, 2016

I assume you want the output during rr replay? The issue takes ~1h to reproduce so it'll take some time.

@yuyichao
Copy link
Contributor Author

yuyichao commented Mar 5, 2016

@rocallahan The log is here. The first file is the last ~200 lines of the second one, which is the 24M full log, including all of the very long write_flush calls before the assert triggers.

@rocallahan
Copy link
Collaborator

I have no idea what gdb is doing here. Can you find out what symbol address 0x4c3fe2e is for?

@rocallahan
Copy link
Collaborator

and 0x4c44fd8

@rocallahan
Copy link
Collaborator

I wonder if it's some weird auto-loading macro stuff. You might try disabling that to see if it makes a difference.

@yuyichao
Copy link
Contributor Author

yuyichao commented Mar 6, 2016

Is there a way to figure that out without replaying it for another 2 hours?

@rocallahan
Copy link
Collaborator

No, so I guess it's probably not worth it.

I'll try to figure out a way to make gdb generate such large requests. If I can, I'll try just implementing dynamic allocation, which should fix your problems.

@yuyichao
Copy link
Contributor Author

yuyichao commented Mar 6, 2016

Also Both 0x4c44fd8 and 0x4c3fe2e looks like heap addresses.

@yuyichao
Copy link
Contributor Author

yuyichao commented Mar 6, 2016

As a context, this is running a test for array operations in julia (with a JIT). I have some other processes running the same test (since the problem is not very reproducible) and the /proc/<pid>/maps of those processes starts with

yuyichao% cat /proc/13905/maps 
00400000-00405000 r-xp 00000000 08:03 7479371                            /home/yuyichao/projects/julia/tests/master/usr/bin/julia
00604000-00605000 rw-p 00004000 08:03 7479371                            /home/yuyichao/projects/julia/tests/master/usr/bin/julia
017b3000-063d4000 rw-p 00000000 00:00 0                                  [heap]
68000000-68200000 rwxp 00000000 00:00 0 
70000000-70001000 r-xp 00000000 08:02 3066689                            /usr/bin/rr_page_64
7f0e30dd5000-7f0e31556000 rw-p 00000000 00:00 0 
7f0e31556000-7f0e32457000 rw-p 00000000 00:00 0 
7f0e32457000-7f0e324fd000 r-xp 00000000 00:00 0 
7f0e324fd000-7f0e32cff000 rw-p 00000000 00:00 0 
7f0e32e80000-7f0e3313f000 r-xp 00000000 00:00 0 
7f0e3313f000-7f0e33140000 rw-p 00000000 00:00 0 
7f0e33140000-7f0e33480000 r-xp 00000000 00:00 0 
7f0e33480000-7f0e33481000 r-xp 00000000 08:03 7609562                    /home/yuyichao/projects/julia/tests/master/usr/lib/libsuitesparse_wrapper.so
7f0e33481000-7f0e33680000 ---p 00001000 08:03 7609562                    /home/yuyichao/projects/julia/tests/master/usr/lib/libsuitesparse_wrapper.so
7f0e33680000-7f0e33681000 rw-p 00000000 08:03 7609562                    /home/yuyichao/projects/julia/tests/master/usr/lib/libsuitesparse_wrapper.so
7f0e33681000-7f0e33683000 r-xp 00000000 08:02 3063424                    /usr/lib/libsuitesparseconfig.so.4.4.4

and

yuyichao% cat /proc/13808/maps
00400000-00405000 r-xp 00000000 08:03 7479371                            /home/yuyichao/projects/julia/tests/master/usr/bin/julia
00604000-00605000 rw-p 00004000 08:03 7479371                            /home/yuyichao/projects/julia/tests/master/usr/bin/julia
01e95000-06a05000 rw-p 00000000 00:00 0                                  [heap]
68000000-68200000 rwxp 00000000 00:00 0 
70000000-70001000 r-xp 00000000 08:02 3066689                            /usr/bin/rr_page_64
7ff8ccbdd000-7ff8ccc02000 r-xp 00000000 08:03 7610692                    /home/yuyichao/projects/julia/tests/master/usr/lib/libopenlibm.so.1.0
7ff8ccc02000-7ff8cce01000 ---p 00025000 08:03 7610692                    /home/yuyichao/projects/julia/tests/master/usr/lib/libopenlibm.so.1.0
7ff8cce01000-7ff8cce02000 rw-p 00024000 08:03 7610692                    /home/yuyichao/projects/julia/tests/master/usr/lib/libopenlibm.so.1.0

ASLR is on so they are not the same but it seems that both address always fall in the heap address range.

@rocallahan
Copy link
Collaborator

OK thanks. FWIW if you're having a difficulty reproducing a bug, try rr chaos mode.

@yuyichao
Copy link
Contributor Author

yuyichao commented Mar 6, 2016

Ha, I didn't know that (seems new?) THX, will definitely give it a try next time.

@yuyichao
Copy link
Contributor Author

yuyichao commented Mar 6, 2016

(Just realized that the assertion I hit is the one for write instead of the one I linked above .....)

@rocallahan
Copy link
Collaborator

I can't get my gdb (7.10.1-30.fc23) to make such large requests. It always breaks memory reads/writes up into 8192 bytes or less.

@yuyichao
Copy link
Contributor Author

yuyichao commented Mar 6, 2016

I've changed the assertion to a sleep and got a backtrace of gdb.

0x00007fe412114903 in select () from /usr/lib/libc.so.6
(gdb) bt
#0  0x00007fe412114903 in select () from /usr/lib/libc.so.6
#1  0x0000000000532375 in gdb_select (n=<optimized out>, readfds=<optimized out>, writefds=<optimized out>, exceptfds=<optimized out>, timeout=<optimized out>)
    at posix-hdep.c:31
#2  0x000000000049ea30 in ser_base_wait_for (timeout=timeout@entry=1, scb=0x1eb92f0) at ser-base.c:224
#3  0x000000000049eb6e in do_ser_base_readchar (scb=0x1eb92f0, timeout=<optimized out>) at ser-base.c:335
#4  0x000000000049ec78 in generic_readchar (scb=0x1eb92f0, timeout=<optimized out>, do_readchar=<optimized out>) at ser-base.c:410
#5  0x0000000000695651 in serial_readchar (scb=0x1eb92f0, timeout=timeout@entry=-1) at serial.c:382
#6  0x00000000004b9d8a in readchar (timeout=-1) at remote.c:8173
#7  0x00000000004ba6a7 in getpkt_or_notif_sane_1 (buf=buf@entry=0x1672950, sizeof_buf=sizeof_buf@entry=0x1672958, forever=forever@entry=0, 
    expecting_notif=expecting_notif@entry=0, is_notif=is_notif@entry=0x0) at remote.c:8720
#8  0x00000000004c127c in getpkt_sane (forever=0, sizeof_buf=0x1672958, buf=0x1672950) at remote.c:8820
#9  getpkt (forever=0, sizeof_buf=0x1672958, buf=0x1672950) at remote.c:8660
#10 remote_read_bytes_1 (memaddr=<optimized out>, memaddr@entry=79974360, myaddr=myaddr@entry=0x119a9290 "", len_units=len_units@entry=19824, 
    unit_size=unit_size@entry=1, xfered_len_units=xfered_len_units@entry=0x7ffc4b010770) at remote.c:7889
#11 0x00000000004c1b6e in remote_read_bytes (xfered_len=0x7ffc4b010770, unit_size=1, len=19824, myaddr=0x119a9290 "", memaddr=79974360, 
    ops=0xc5ec40 <extended_remote_ops>) at remote.c:8024
#12 remote_xfer_partial (ops=0xc5ec40 <extended_remote_ops>, object=TARGET_OBJECT_MEMORY, annex=0x0, readbuf=0x119a9290 "", writebuf=0x0, offset=79974360, len=19824, 
    xfered_len=0x7ffc4b010770) at remote.c:9988
#13 0x00000000006051a2 in raw_memory_xfer_partial (ops=ops@entry=0xc5ec40 <extended_remote_ops>, readbuf=readbuf@entry=0x119a9290 "", writebuf=writebuf@entry=0x0, 
    memaddr=memaddr@entry=79974360, len=19824, xfered_len=xfered_len@entry=0x7ffc4b010770) at target.c:1094
#14 0x00000000006052f9 in memory_xfer_partial_1 (ops=ops@entry=0xc5ec40 <extended_remote_ops>, object=object@entry=TARGET_OBJECT_MEMORY, 
    readbuf=readbuf@entry=0x119a9290 "", writebuf=writebuf@entry=0x0, memaddr=memaddr@entry=79974360, len=<optimized out>, xfered_len=0x7ffc4b010770) at target.c:1225
#15 0x00000000006055d9 in memory_xfer_partial (xfered_len=0x7ffc4b010770, len=<optimized out>, memaddr=79974360, writebuf=0x0, readbuf=0x119a9290 "", 
    object=TARGET_OBJECT_MEMORY, ops=0xc5ec40 <extended_remote_ops>) at target.c:1252
#16 target_xfer_partial (ops=ops@entry=0xc5ec40 <extended_remote_ops>, object=object@entry=TARGET_OBJECT_MEMORY, annex=annex@entry=0x0, 
    readbuf=readbuf@entry=0x119a9290 "", writebuf=writebuf@entry=0x0, offset=offset@entry=79974360, len=19824, xfered_len=0x7ffc4b010770) at target.c:1328
#17 0x0000000000605c43 in target_read_partial (xfered_len=0x7ffc4b010770, len=19824, offset=79974360, buf=0x119a9290 "", annex=0x0, object=TARGET_OBJECT_MEMORY, 
    ops=0xc5ec40 <extended_remote_ops>) at target.c:1593
#18 target_read (ops=0xc5ec40 <extended_remote_ops>, object=object@entry=TARGET_OBJECT_MEMORY, annex=annex@entry=0x0, buf=0x119a9290 "", offset=79974360, 
    len=len@entry=19824) at target.c:1633
#19 0x0000000000605cb0 in target_read_memory (memaddr=<optimized out>, myaddr=<optimized out>, len=len@entry=19824) at target.c:1420
#20 0x00000000006c5979 in mem_bfd_iovec_pread (abfd=<optimized out>, stream=<optimized out>, buf=<optimized out>, nbytes=19824, offset=<optimized out>) at jit.c:115
#21 0x00000000006ede95 in opncls_bread (abfd=<optimized out>, buf=<optimized out>, nbytes=<optimized out>) at opncls.c:503
#22 0x00000000006ea612 in bfd_bread (ptr=ptr@entry=0x119a9290, size=<optimized out>, abfd=abfd@entry=0x1195cd30) at bfdio.c:196
#23 0x00000000006ff6d0 in elf_slurp_reloc_table_from_section (abfd=abfd@entry=0x1195cd30, rel_hdr=rel_hdr@entry=0x1197ea90, reloc_count=reloc_count@entry=826, 
    relents=relents@entry=0x119a2b40, symbols=0x11944d60, dynamic=0, asect=0x11979b30, asect=0x11979b30) at elfcode.h:1402
#24 0x00000000006ffa2b in bfd_elf64_slurp_reloc_table (abfd=0x1195cd30, asect=0x11979b30, symbols=0x11944d60, dynamic=0) at elfcode.h:1541
#25 0x000000000070cab6 in _bfd_elf_canonicalize_reloc (abfd=<optimized out>, section=0x11979b30, relptr=0x119a1150, symbols=<optimized out>) at elf.c:7622
#26 0x000000000076d39b in bfd_generic_get_relocated_section_contents (abfd=0x1195cd30, link_info=0x7ffc4b010a80, link_order=<optimized out>, data=0x11991790 "z", 
    relocatable=0, symbols=0x11944d60) at reloc.c:7932
#27 0x00000000006f0120 in bfd_simple_get_relocated_section_contents (abfd=0x1195cd30, sec=<optimized out>, outbuf=0x11991790 "z", symbol_table=0x11944d60)
    at simple.c:270
#28 0x0000000000627af9 in dwarf2_read_section (objfile=objfile@entry=0x11945120, info=0x1198b2b0) at dwarf2read.c:2335
#29 0x000000000064481f in dwarf2_read_section (info=<optimized out>, objfile=0x11945120) at dwarf2read.c:2285
#30 dwarf2_build_psymtabs_hard (objfile=0x11945120) at dwarf2read.c:6463
---Type <return> to continue, or q <return> to quit---
#31 dwarf2_build_psymtabs (objfile=0x11945120) at dwarf2read.c:4268
#32 0x00000000005a2ef9 in require_partial_symbols (objfile=objfile@entry=0x11945120, verbose=verbose@entry=0) at psymtab.c:91
#33 0x00000000005a7e84 in read_symbols (objfile=objfile@entry=0x11945120, add_flags=add_flags@entry=0) at symfile.c:899
#34 0x00000000005a7997 in syms_from_objfile_1 (add_flags=0, addrs=0x1191bef0, objfile=0x11945120) at symfile.c:1078
#35 syms_from_objfile (add_flags=0, addrs=0x1191bef0, objfile=0x11945120) at symfile.c:1094
#36 symbol_file_add_with_addrs (abfd=abfd@entry=0x1195cd30, name=<optimized out>, add_flags=add_flags@entry=0, addrs=addrs@entry=0x1191bef0, flags=flags@entry=66, 
    parent=parent@entry=0x0) at symfile.c:1191
#37 0x00000000005a7ed5 in symbol_file_add_from_bfd (abfd=abfd@entry=0x1195cd30, name=<optimized out>, add_flags=add_flags@entry=0, addrs=addrs@entry=0x1191bef0, 
    flags=flags@entry=66, parent=parent@entry=0x0) at symfile.c:1280
#38 0x00000000006c716a in jit_bfd_try_read_symtab (code_entry=<optimized out>, code_entry=<optimized out>, gdbarch=<optimized out>, entry_addr=107889088) at jit.c:944
#39 jit_register_code (gdbarch=<optimized out>, entry_addr=107889088, code_entry=<optimized out>) at jit.c:972
#40 0x00000000006c7440 in jit_event_handler (gdbarch=0x1e433a0) at jit.c:1429
#41 0x00000000005629c7 in handle_jit_event () at breakpoint.c:5798
#42 bpstat_run_callbacks (bs_head=<optimized out>) at breakpoint.c:5999
#43 0x00000000005c2f3d in process_event_stop_test (ecs=ecs@entry=0x7ffc4b011440) at infrun.c:6185
#44 0x00000000005c5176 in handle_inferior_event_1 (ecs=0x7ffc4b011440) at infrun.c:4938
#45 handle_inferior_event (ecs=ecs@entry=0x7ffc4b011440) at infrun.c:5361
#46 0x00000000005c6f55 in fetch_inferior_event (client_data=<optimized out>) at infrun.c:3921
#47 0x00000000005ddb18 in inferior_event_handler (event_type=<optimized out>, client_data=<optimized out>) at inf-loop.c:44
#48 0x00000000005dc0fc in check_async_event_handlers () at event-loop.c:1038
#49 gdb_do_one_event () at event-loop.c:302
#50 0x00000000005dc18e in start_event_loop () at event-loop.c:347
#51 0x00000000005d6003 in captured_command_loop (data=data@entry=0x0) at main.c:318
#52 0x00000000005d30fd in catch_errors (func=func@entry=0x5d5ff0 <captured_command_loop>, func_args=func_args@entry=0x0, errstring=errstring@entry=0x7ad94a "", 
    mask=mask@entry=RETURN_MASK_ALL) at exceptions.c:240
#53 0x00000000005d6bae in captured_main (data=data@entry=0x7ffc4b011790) at main.c:1157
#54 0x00000000005d30fd in catch_errors (func=func@entry=0x5d6540 <captured_main>, func_args=func_args@entry=0x7ffc4b011790, errstring=errstring@entry=0x7ad94a "", 
    mask=mask@entry=RETURN_MASK_ALL) at exceptions.c:240
#55 0x00000000005d731b in gdb_main (args=args@entry=0x7ffc4b011790) at main.c:1165
#56 0x00000000004627d5 in main (argc=<optimized out>, argv=<optimized out>) at gdb.c:32

And I believe this happens in the JIT debugging iterface http://docs.adacore.com/gdb-docs/html/gdb_30.html

@yuyichao
Copy link
Contributor Author

yuyichao commented Mar 6, 2016

I couldn't figure out how to check the process being debugged directly (rr is ptraceing it) so I dumped the heap and find the object file corresponding to the requested addresses (by finding the ELF magic number before the address) and it looks like the last two requests are indeed reading the entire debug info sections.

The in memory object file starts at 0x4c35ae0. The second last read request was a read of the .debug_info section at file offset 0xa34e of size 0x1b76 (7030). The last read request was a read of the .rela.debug_info section at file offset 0xf4f8 of size 0x4d70 (19824).

I'm not sure how can this be generated in a test (without a JIT compiler). This particular interface is apparently not too well tested and a little buggy so I'll not be surprised that it doesn't use the same optimizations used in other code path.

I'll keep the sleeping processes open in case more info is needed. (I can't do much other than dumping memory though....).

I'll also appreciate if there's some temporary workaround before this is properly fixed so that I can finish the replay.... (edit: I'll just try to increase the size of inbuf and outbuf by a lot for now...)

@yuyichao
Copy link
Contributor Author

yuyichao commented Mar 6, 2016

Dump of all the headers readelf -S of the in memory object file located at 0x4c35ae0

There are 23 section headers, starting at offset 0x14e30:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .strtab           STRTAB           0000000000000000  000148c8
       0000000000000562  0000000000000000           0     0     1
  [ 2] .text             PROGBITS         00007f49bcdd3000  00000040
       0000000000006336  0000000000000000  AX       0     0     16
  [ 3] .rela.text        RELA             0000000000000000  0000dde8
       0000000000001710  0000000000000018          22     2     8
  [ 4] .data             PROGBITS         0000000000000000  00006378
       0000000000000000  0000000000000000  WA       0     0     4
  [ 5] .bss              NOBITS           0000000000000000  00006378
       0000000000000000  0000000000000000  WA       0     0     4
  [ 6] .rodata.cst8      PROGBITS         00007f49bcdd2000  00006378
       0000000000000018  0000000000000008  AM       0     0     8
  [ 7] .debug_str        PROGBITS         0000000000000000  00006390
       00000000000009f4  0000000000000001  MS       0     0     1
  [ 8] .debug_loc        PROGBITS         0000000000000000  00006d84
       0000000000003478  0000000000000000           0     0     1
  [ 9] .debug_abbrev     PROGBITS         0000000000000000  0000a1fc
       0000000000000152  0000000000000000           0     0     1
  [10] .debug_info       PROGBITS         0000000000000000  0000a34e
       0000000000001b76  0000000000000000           0     0     1
  [11] .rela.debug_info  RELA             0000000000000000  0000f4f8
       0000000000004d70  0000000000000018          22    10     8
  [12] .debug_ranges     PROGBITS         0000000000000000  0000bec4
       00000000000002d0  0000000000000000           0     0     1
  [13] .debug_pubnames   PROGBITS         0000000000000000  0000c194
       0000000000000218  0000000000000000           0     0     1
  [14] .rela.debug_pubna RELA             0000000000000000  00014268
       0000000000000180  0000000000000018          22    13     8
  [15] .debug_pubtypes   PROGBITS         0000000000000000  0000c3ac
       000000000000007f  0000000000000000           0     0     1
  [16] .rela.debug_pubty RELA             0000000000000000  000143e8
       0000000000000060  0000000000000018          22    15     8
  [17] .note.GNU-stack   PROGBITS         0000000000000000  0000c42b
       0000000000000000  0000000000000000           0     0     1
  [18] .eh_frame         PROGBITS         00007f49bcdd2018  0000c430
       0000000000000570  0000000000000000   A       0     0     8
  [19] .rela.eh_frame    RELA             0000000000000000  00014448
       0000000000000300  0000000000000018          22    18     8
  [20] .debug_line       PROGBITS         0000000000000000  0000c9a0
       0000000000000e5a  0000000000000000           0     0     1
  [21] .rela.debug_line  RELA             0000000000000000  00014748
       0000000000000180  0000000000000018          22    20     8
  [22] .symtab           SYMTAB           0000000000000000  0000d800
       00000000000005e8  0000000000000018           1    10     8
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), l (large)
  I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)

I'm guessing registering (a copy of) a real object file through the JIT debugging interface can probably show this behavior.

@rocallahan
Copy link
Collaborator

Thanks for explaining that. I think I'll just make these buffers dynamic; I can test that locally even if I can't reproduce the original problem.

yuyichao referenced this issue in JuliaLang/julia Mar 7, 2016
This was caused by the call to jl_has_intrinsics in the LambdaInfo
constructor, which looked up symbols inside the function too soon.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants