dwarf: Fix incorrect cfi execution #54

yuyichao · 2017-10-28T15:51:49Z

Ref my mailing list post yesterday http://lists.nongnu.org/archive/html/libunwind-devel/2017-10/msg00005.html. I have since figured out the reason for the signal resume test failure (#51) and I think I also understand what the comment means by "For execution resume". It seems to basically mean signal frame and since that's how this is referred to in the code elsewhere, I changed the comment a little to hopefully make it more clear. Therefore, I'm not fairly confident that this is the right thing to do and shouldn't cause any new test failures afaict.

More detail description of the change from the commit message follows

During unwinding/resuming execution of a normal call frame,
it is not only necessary to use the previous instruction to lookup the unwind info
but also when executing the cfi program. Although the call usually don't modify
any unwinding state, it can happen for noreturn call or when the callee cleanup the stack.
In these cases, the next instruction after the call may have a cfi adjusting the state
(e.g. stack pointer) and such instruction should be executed.

3d9a694 worked around this issue by treating cfi_restore_state
specially. It works when the compiler use that instruction to restore the state, i.e.

    .cfi_remember_state
    je .L0
    push ...
    .cfi_def_cfi_offset <new_value>
    call noreturn
.L0
    .cfi_restore_state

which is what GCC ususally does. However, it is not necessarily the case and clang/LLVM doesn't
do that. Instead LLVM emits the following unwind info which is also perfectly valid but is not
handled by the special case.

    je .L0
    push ...
    .cfi_def_cfi_offset <new_value>
    call noreturn
.L0
    .cfi_def_cfi_offset <old_value>

e9e8ed7 also worked around this issue for another special case.

This patch fix this issue for all cfi types by adjusting the end_ip based on the type of the
current frame instead, similar to what's done in fetch_proc_info.
Since this requires using the same use_prev_instr value after fetch_proc_info returns,
the patch also remove the need_unwind_info parameter to the function and move the code updating
use_prev_instr after all use of the old value are done.

During unwinding/resuming execution of a normal call frame, it is not only necessary to use the previous instruction to lookup the unwind info but also when executing the cfi program. Although the call usually don't modify any unwinding state, it can happen for noreturn call or when the callee cleanup the stack. In these cases, the next instruction after the call may have a cfi adjusting the state (e.g. stack pointer) and such instruction should be executed. 3d9a694 worked around this issue by treating `cfi_restore_state` specially. It works when the compiler use that instruction to restore the state, i.e. ``` .cfi_remember_state je .L0 push ... .cfi_def_cfi_offset <new_value> call noreturn .L0 .cfi_restore_state ``` which is what GCC ususally does. However, it is not necessarily the case and clang/LLVM doesn't do that. Instead LLVM emits the following unwind info which is also perfectly valid but is not handled by the special case. ``` je .L0 push ... .cfi_def_cfi_offset <new_value> call noreturn .L0 .cfi_def_cfi_offset <old_value> ``` e9e8ed7 also worked around this issue for another special case. This patch fix this issue for all cfi types by adjusting the `end_ip` based on the type of the current frame instead, similar to what's done in `fetch_proc_info`. Since this requires using the same `use_prev_instr` value after `fetch_proc_info` returns, the patch also remove the `need_unwind_info` parameter to the function and move the code updating `use_prev_instr` after all use of the old value are done.

The issue is the most significant with FPO enabled and on x86 but can in principle happen in other cases too. Upstream PR libunwind/libunwind#54

djwatson · 2017-10-31T16:07:22Z

@unkadoug Want to take a look? It looks like this will fix reg_states_iterate also.

unkadoug

This change solves the problem I was having after change 3d9a694.

I don't understand the need for the variable next_use_prev_instr in find_reg_state and in dwarf _reg_states_iterate. It seems that replacing it with c->use_prev_instr would break nothing. Can you help me understand the need for this variable?

yuyichao · 2017-10-31T21:44:40Z

It seems that replacing it with c->use_prev_instr would break nothing.

Doing that will cause the unwinding to use a different version of the value than the one used for looking up unwind info.

unkadoug · 2017-10-31T21:59:51Z

In find_reg_state, you use next_use_prev_instr to save a vaiue before calling create_state_record_for, which calls parse_fde, which depends on c->use_prev_instr. So you're right in that case. However, I don't believe that create_state_record_for modifies dci->signal frame, so you could just rewrite two lines in find_reg_state as:

ret = create_state_record_for (c, sr, ci->ip);
c->use_prev_instr = ! dci->signal_frame

and eliminate the need for next_use_prev_instr there.

In dwarf_reg_states_iterate, the case for next_use_prev_instr is weaker, since dwarf_reg_states_table_iterate does not depend on c->use_prev_instr. But, suppose you decide that reg_states_table_iterate should depend on c->use_prev_instr (should it?). Then the problem can be addressed the same way, by examining ! dci->signal_frame after invoking the iterate function instead of before, and again making next_use_prev_instr unnecessary.

yuyichao · 2017-10-31T22:11:07Z

I did not check each user of the value and I'm just doing the same thing for all users to make it clear that the update should be done after all users are done.
If there's any functions in there that will not use the previous value now and in the future, that can be documented in the code and have the code changed for that. I certainly don't have that knowledge so I'm just going with the change that's the easiest to reason about...

unkadoug · 2017-10-31T22:24:44Z

I'll just restate that you don't need that variable, and I think I've explained why. Dave Watson can decide whether or not that matters to him, but I'd want to get rid of it. Other than that, the change looks okay to me, and thank you for offering it.

djwatson · 2017-11-01T16:14:22Z

Yea I noticed the extraneous variable also, but we can clean that up later. This is a great fix. Merging, thanks

yuyichao added a commit to JuliaLang/julia that referenced this pull request Oct 28, 2017

Add libunwind patch to fix dwarf unwinding

1fa0947

The issue is the most significant with FPO enabled and on x86 but can in principle happen in other cases too. Upstream PR libunwind/libunwind#54

yuyichao mentioned this pull request Oct 28, 2017

Add libunwind patch to fix dwarf unwinding JuliaLang/julia#24379

Merged

unkadoug reviewed Oct 31, 2017

View reviewed changes

djwatson merged commit f248ac0 into libunwind:master Nov 1, 2017

yuyichao deleted the cfi_bound branch November 1, 2017 16:24

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

dwarf: Fix incorrect cfi execution #54

dwarf: Fix incorrect cfi execution #54

yuyichao commented Oct 28, 2017

djwatson commented Oct 31, 2017

unkadoug left a comment

yuyichao commented Oct 31, 2017

unkadoug commented Oct 31, 2017

yuyichao commented Oct 31, 2017

unkadoug commented Oct 31, 2017

djwatson commented Nov 1, 2017

dwarf: Fix incorrect cfi execution #54

dwarf: Fix incorrect cfi execution #54

Conversation

yuyichao commented Oct 28, 2017

djwatson commented Oct 31, 2017

unkadoug left a comment

Choose a reason for hiding this comment

yuyichao commented Oct 31, 2017

unkadoug commented Oct 31, 2017

yuyichao commented Oct 31, 2017

unkadoug commented Oct 31, 2017

djwatson commented Nov 1, 2017