Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add PE support for x86_64 #13

Merged
merged 5 commits into from
Dec 14, 2023
Merged

Add PE support for x86_64 #13

merged 5 commits into from
Dec 14, 2023

Conversation

afranchuk
Copy link
Contributor

It also expands UnwindRegs to allow all gp registers and adds caching for common PE unwind cases.

Related: #3.

pub fn registers(&self) -> Vec<Reg> {
use Reg::*;

let mut regs = vec![RBX, RBP, RDI, RSI, R12, R13, R14, R15];
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't fully understood this yet; is this similar to the register permutation encoding that OPCODE_KIND_X86_FRAMELESS_IMMEDIATE uses? Could we reuse code from https://github.com/mstange/macho-unwind-info/blob/main/src/opcodes/permutation.rs ?

If this only supports one fixed order of registers, can you write a comment to that effect somewhere?

Copy link
Contributor Author

@afranchuk afranchuk Dec 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is the same algorithm to compress the registers. It supports one arbitrary order used for encode and decode, which I've chosen there (this is the ordering of callee-saved registers that the microsoft docs list).

Clarify some types and documentation around PE unwinding.
@afranchuk afranchuk requested a review from mstange December 14, 2023 18:21
Copy link
Owner

@mstange mstange left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot!

@mstange mstange merged commit 5b859b2 into mstange:main Dec 14, 2023
Copy link
Contributor

@ishitatsuyuki ishitatsuyuki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for contributing PE support. I'm currently trying to integrate this into samply for use with Wine profiling, but I've saw a few design decisions here that I feel is not the best. There also seems to be a bug introduced by switching our parser from goblin to pe-unwind-info (mozilla/pe-unwind-info#4).

I'm planning to upstream design changes as I see fit, but I would like to also know what your use case require so that I'll not be breaking it.

///
/// The registers are stored in a separate compressed ordering to facilitate restoring register
/// values if desired. If not for this we could simply store the total offset.
OffsetSpAndPopRegisters(UnwindRuleOffsetSpAndPopRegisters),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason to record the full set of register into an unwind rule? AFAIU, the goal of framehop is to only get a stack trace, not recover all registers in the stack frame(s). I think this should just use OffsetSp, without extra bookkeeping for all the other registers. From my testing with a variety of binaries built with MSVC, MinGW or Clang, using the same strategy as Unix of tracking RIP, RBP and RSP only is sufficient to unwind all their frames.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use framehop to also get additional frame context (the registers) when unwinding. If this is a performance concern, it shouldn't be difficult to make this a configurable behavior (in fact I had originally intended to do that but forgot).

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit of a deviation from framehop's initial mission statement. Alex wants the other registers for crash unwinding, and I relented and said "if I don't notice any perf regression from it, it's probably ok."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's tricky since if you wanted framehop to stay true to that mission, a library which unwound and restored registers in a very efficient manner could look almost identical, so it makes more sense to add to framehop. Admittedly you could also fork+rename+re-mission :P

@@ -19,6 +20,120 @@ pub enum UnwindRuleX86_64 {
},
/// (sp, bp) = (bp + 16, *bp)
UseFramePointer,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both Clang and MSVC will use "base pointers", a generalization of frame pointer, when both alloca and stack realignment are involved. This consists a small but often measurable amount of sampled stacks. My fork originally handled that case, and I still think we should have a rule for that.

https://github.com/ishitatsuyuki/framehop/blob/4b6f62a8b8a046a06c56db534dd79f10b1f32472/src/x86_64/unwind_rule.rs#L22-L26

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw that, but at the time had trouble finding any cases of it in a few example large binaries, so omitted it. It'd be good to add that.

None,
OffsetBy8(u16),
Pop(Reg),
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from a few things desired to be added, including frame / base pointer support, there should be a way to express Noop in case of "save register" codes (these do not alter stack or frame pointer, except when restoring frame pointer). Right now any frames that have a "save" code is disqualified from the fast path.

Copy link
Contributor Author

@afranchuk afranchuk Feb 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be a very easy change to make, good thinking! Notably, this is only applicable if we don't care about additional register values (so we should add that configuration as mentioned in the other comment).

UnwindInfo::parse(sections.unwind_info_memory_at_rva(unwind_info_address)?)
.ok_or(PeUnwinderError::UnwindInfoParseError)?;

// Check whether the address is in the function epilog. If so, we need to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instruction analysis actually only needs to happen for the first frame. Anything beyond should be function calls which will never be inside an epilog. It might make sense to add an is_first_frame parameter to the unwinder.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. Though FWIW the FunctionEpilogParser is very likely bail out after looking at 1 or 2 instructions, but it would be faster to skip it entirely.

@afranchuk
Copy link
Contributor Author

@ishitatsuyuki do you mind making a new bug to cover the improvements you mention?

@mstange
Copy link
Owner

mstange commented Feb 29, 2024

@ishitatsuyuki Thank you so much for chiming in! I was going to do the samply changes and ask you to test them, but if you're already working on them, that's even better! Actually, is there an easy way for me to test the PE-on-Linux scenario? E.g. are there any cheap Proton-ified Steam games that ship with enough symbols?

@ishitatsuyuki
Copy link
Contributor

ishitatsuyuki commented Feb 29, 2024

Most Unity game is OK as Unity provides a symbol server. Those with an enterprise contract that allows custom build from source will not work, though. You will also need mstange/samply@3f01aaf as both PDB debug name resolution and local PDB lookup when absolute path is coded into PDBFilename is broken right now.

Aimlab seems to be both free and use a public Unity build.

For profiling you can just build the forks and attach to the game process. Following https://gist.github.com/ishitatsuyuki/09eef31e16e5247718063cb2390cdc37#use-nsenter-to-change-samplys-mount-namespace will additionally make Unix libraries unwind correctly.

The old branches are at https://github.com/ishitatsuyuki/framehop/tree/seh-poc and https://github.com/ishitatsuyuki/samply/tree/poc-old. The new branches are https://github.com/ishitatsuyuki/framehop/tree/poc-new and https://github.com/ishitatsuyuki/samply/tree/poc-new. Compared to the old branch, the new branch still seems to fail to unwind around 1% of the samples fully, even after the chained unwind info fix. I'm still working on identifying why.

@ishitatsuyuki
Copy link
Contributor

I open #21, #22, #23 to continue the discussions.

@ishitatsuyuki
Copy link
Contributor

The remaining unwinding instability seems to come from epilogue analysis. For example, both of the following in-function jumps are identified as end of function when they are not:

I thought I implemented epilog analysis in my fork at some point too, but looks like the branch I currently use have no epilog analysis and always use SEH unwind rules. The stacks are reasonably correct there, as expected.

There are a few thing I plan to work on to improve the accuracy:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants