LSE-334: Fallback for getting locks from stack frames #99
Conversation
Force-pushed from 08dcde6 to 3bf92ae
The OL8 UEK6 x86_64 builds used a broken CTF toolchain for a long time, resulting in broken CTF. However, the toolchain used to build aarch64 was newer and not affected by the bug. Mark it as compatible so we can use drgn with CTF on aarch64 for this release.

Signed-off-by: Stephen Brennan <[email protected]>
For module debuginfo, we use a heuristic to detect whether debuginfo is loaded. This can fail, and it's also extra overhead. When we load CTF, we know that we've got debuginfo for all modules. Cache this fact and use it to special-case the module_has_debuginfo() check.

This fixes test failures which happen due to a failed check. The test failures result in rather cryptic errors due to missing type fields, e.g.:

    AttributeError: 'struct files_struct' has no member 'fdt'

These are caused because, when CTF is loaded, the failed check results in the debuginfo code loading a DWARF module debuginfo file, which can cause conflicts in the type lookups.

Signed-off-by: Stephen Brennan <[email protected]>
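A minimal sketch of the special case described above, using drgn's per-program `prog.cache` dictionary; the `"using_ctf"` key and the heuristic helper are illustrative assumptions, not the actual drgn-tools code:

```python
from drgn import Program


def _module_debuginfo_heuristic(prog: Program, module_name: str) -> bool:
    # Stand-in for the existing (fallible, slower) per-module heuristic.
    return False


def module_has_debuginfo(prog: Program, module_name: str) -> bool:
    # When CTF is loaded, types are available for every module, so the
    # heuristic can be skipped entirely. "using_ctf" is an assumed key.
    if prog.cache.get("using_ctf", False):
        return True
    return _module_debuginfo_heuristic(prog, module_name)
```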
We currently use "frame.name", which may not be present when using CTF. Instead, we should fall back to the symbol name. The frame_name() function is not appropriate, since it may include additional information, such as the module containing the symbol. So introduce a new helper, func_name(), for this case.

Signed-off-by: Stephen Brennan <[email protected]>
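A minimal sketch of what such a helper could look like, assuming this signature (the actual drgn-tools func_name() may differ):

```python
from typing import Optional

from drgn import Program, StackFrame


def func_name(prog: Program, frame: StackFrame) -> Optional[str]:
    # DWARF fills in frame.name; with CTF it may be None, so fall back
    # to the name of the symbol containing the frame's PC.
    if frame.name is not None:
        return frame.name
    try:
        return prog.symbol(frame.pc).name
    except LookupError:
        return None
```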
aarch64 doesn't guarantee to provide the value of SP when doing a frame-pointer-based unwind. Don't fail on this: instead fall back to FP. Fixes internal issue tracked as LSE-374.

Signed-off-by: Stephen Brennan <[email protected]>
There's no reason to be confident that a variable's value will be valid when we're formatting it in the stack trace. It is possible that it will contain stale values that may result in FaultErrors. Be sure to catch this and simply print a message saying the FaultError happened. This ensures bt does not crash in these exceptional cases.

Signed-off-by: Stephen Brennan <[email protected]>
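A sketch of that defensive formatting, using drgn's `FaultError` and `Object.format_()`; the surrounding function is illustrative:

```python
from drgn import FaultError, Object


def format_frame_var(name: str, value: Object) -> str:
    # A stale or clobbered stack slot can fault when we read it during
    # formatting; report that instead of letting bt crash.
    try:
        return f"{name} = {value.format_()}"
    except FaultError:
        return f"{name} = (FaultError reading value)"
```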
There's no reason to exempt UEK4 from this test: it's not that drgn can't do stack traces on UEK4, it's really that crashed_thread() doesn't work on UEK4 vmcores. Let's work around that so we can still smoke test the actual bt operation.

Signed-off-by: Stephen Brennan <[email protected]>
For mutexes, locks, and semaphores, we are able to reliably test whether any given pointer actually corresponds to the correct lock. We simply pretend that it is valid, and check to see whether the current task is present on the list of waiters. Imran initially implemented the is_task_blocked_on_lock() function with this great idea.

We don't know where the lock pointer will actually be on the stack: that's what DWARF would tell us. We could hard-code the stack offsets we've observed in the past, but this is not very maintainable. It takes a lot of code to implement, and a lot of time & resources to check every kernel. Plus, we would need to check each new kernel as it is released.

The alternative, as suggested by Junxiao, is much simpler: just check every stack offset from the top of the stack to the mutex/sem lock function. We can eliminate many addresses by the fact that they may not be kernel memory addresses. For the remaining addresses, so long as we are careful (validating that we are really following a linked list, and not going into a loop), there should be no problem testing them to see if they are really locks. This commit implements the approach.

Signed-off-by: Stephen Brennan <[email protected]>
Co-authored-by: Imran Khan <[email protected]>
Suggested-by: Junxiao Bi <[email protected]>
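A minimal sketch of this approach for the mutex case, using drgn's `validate_list_for_each_entry()` helper (which raises `ValidationError` on broken or cyclic lists); the function names, the 8-byte word size, and the x86_64 address cutoff are illustrative assumptions, not the PR's actual code:

```python
from drgn import FaultError, Object, Program
from drgn.helpers import ValidationError
from drgn.helpers.linux.list import validate_list_for_each_entry


def is_task_blocked_on_mutex(mutex: Object, task: Object) -> bool:
    # Pretend the candidate is a real mutex and look for the task on
    # its waiter list; validation catches corrupt links and cycles.
    for waiter in validate_list_for_each_entry(
        "struct mutex_waiter", mutex.wait_list.address_of_(), "list"
    ):
        if waiter.task == task:
            return True
    return False


def find_mutex_on_stack(prog: Program, task: Object, start: int, end: int) -> Object:
    # Test every word-aligned stack slot between start and end.
    for addr in range(start, end, 8):
        try:
            candidate = prog.read_word(addr)
        except FaultError:
            continue
        # Skip values that can't be kernel pointers (x86_64 assumption).
        if candidate < 0xFFFF800000000000:
            continue
        mutex = Object(prog, "struct mutex *", value=candidate)
        try:
            if is_task_blocked_on_mutex(mutex, task):
                return mutex
        except (FaultError, ValidationError):
            continue  # junk pointer, broken links, or a cycle
    raise LookupError("no mutex found on stack")
```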
We've generated vmcores for every combination of OL, UEK, and architecture. Additionally, for combinations where there are multiple offsets, we've generated a vmcore on the oldest and newest version. Each vmcore is generated after first loading the "lockmod" kernel module, so that we can test loading the lock value out of the stack frame. The test is written to exercise the fallback both on DWARF (in case the stack frame lookup fails) and on CTF.

Signed-off-by: Stephen Brennan <[email protected]>
Force-pushed from 3bf92ae to 6d07880
```python
# First, we need to get the start and end address to search. This is
# arch-specific.
if prog.platform.arch == Architecture.X86_64:
    candidates = range(task.thread.sp, frame.sp, 8)
```
`frame.sp` is the bottom of the stack?
Yeah, it's the bottom of this stack frame. This reliably works well, though! I think it's because x86 always spills the register beneath the stack frame anyway. With aarch64, it seems that the register gets stored within the stack frame.

I could try to get the next stack frame up, but this function doesn't actually accept a stack trace... just the single stack frame. I could try to use the workaround with the frame pointer that aarch64 does. In practice, we always find the lock before we get to the upper end of the range anyway, so it wouldn't be much of an issue.
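A hypothetical sketch of that frame-pointer workaround, assuming `frame.sp` raises `LookupError` when the unwinder didn't record SP (x29 is the aarch64 frame pointer); this is illustrative, not the PR's code:

```python
from drgn import StackFrame


def scan_upper_bound(frame: StackFrame) -> int:
    # Frame-pointer unwinds on aarch64 may not record SP; fall back to
    # the frame pointer register, x29, which still bounds the frame.
    try:
        return frame.sp
    except LookupError:
        return frame.register("x29")
```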
So `task.thread.sp` is the bottom of the stack, and the scan happens over the parent stack frames of the current one. Could there be a case where the lock address is stored in a child stack frame of the current one?
I guess I'm not sure what you mean. The `task.thread.sp` is the "bottom" of the stack, that is, it is where the stack pointer was stored just prior to context switching away. Then, `frame.sp` is the stack pointer for the current function (e.g. `mutex_lock()`). So, the stack looks something like this:
```
HIGHER ADDRESSES / OLDER FRAMES HERE
[parent caller frame]
  ...
[mutex_lock() frame]
  0xDEADBEEF
  0xF000B000
  ... vars ...               <--- frame.sp
[schedule_preempt_disabled() frame]
  ...
[schedule() frame]
  ... vars ...
... bottom of the stack ...  <--- task.thread.sp
LOWER ADDRESSES / NEWER FRAMES HERE
```
So we start at `task.thread.sp` and increment the address until we reach `frame.sp`. Since `frame.sp` is technically pointing at the bottom of the stack frame for `mutex_lock()`, it's possible that the lock could be somewhere in that frame and we'd miss it. However, in practice that just doesn't happen: the lock has always been beneath the `frame.sp` value.
My fault, I thought `task.thread.sp` was the other end of the stack.

I'm OK with the current approach; we can always fix it if the lock is stored in the current stack frame in future kernels.
LGTM. Super cool stuff @brenns10 : )
This pull request augments the existing lock module. Previously, it could only work with DWARF debuginfo, using the CFI to identify the mutex, sem, or rwsem that a task was blocked waiting on. Now, it can also function as a fallback in cases where the DWARF debuginfo fails. What's more, it also works with CTF debuginfo.
The approach here is based on @imran-kn's idea that you can guess that a pointer corresponds to a lock, and then verify it by checking whether the current task is on the list of waiters for that lock. The catch is that you need to be really careful about it: it's not enough to just do a "try/except" for FaultErrors. You also need to ensure that the linked list nodes are valid (so it's less likely that you're dereferencing random pointers). And even if the linked list nodes are valid, you could end up with an infinite loop, so we also need to track the set of already-visited nodes in the list. If we do all this, we can be confident that we will either (a) dereference some junk and generate a FaultError, (b) encounter a pointer to something that's not a valid linked list node, and get a ValidationError, or (c) make a cycle in the list and raise a ValidationError, because that's not valid data either.
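A minimal sketch of that careful walk, tracking a visited set for cycle detection; `ValidationError` is drgn's, but this walker is illustrative rather than the PR's implementation:

```python
from drgn import Object
from drgn.helpers import ValidationError


def walk_list_carefully(head: Object):
    # head is a "struct list_head"; yield each node pointer, raising
    # ValidationError for broken linkage or a cycle.
    seen = set()
    node = head.next.read_()
    while node != head.address_of_():
        addr = node.value_()
        if addr in seen:
            raise ValidationError("cycle in list")
        seen.add(addr)
        # In a sane list, our neighbors point back at us.
        if node.next.prev != node or node.prev.next != node:
            raise ValidationError("broken list linkage")
        yield node
        node = node.next.read_()
```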
With the careful checking described above, there's no need for complex tables of offsets! We just need to find the start and stop locations on the stack, and search every word on the stack for a valid lock. We can speed things along by checking what kind of address we're looking at, and only testing the pointer if it could be a valid lock pointer (skipping userspace addresses, text addresses, etc.). This was @biger410's idea; if he hadn't mentioned it, I would still be generating tables of offsets...
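An illustrative version of that filtering, assuming the x86_64 kernel address split and the standard `_stext`/`_etext` symbols; the cutoff value and the helper name are assumptions:

```python
from drgn import Program


def could_be_lock_pointer(prog: Program, value: int) -> bool:
    # Userspace (and NULL-ish) values can never be kernel locks.
    if value < 0xFFFF800000000000:
        return False
    # Neither can addresses inside kernel text: locks live in data.
    stext = prog.symbol("_stext").address
    etext = prog.symbol("_etext").address
    return not (stext <= value < etext)
```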
To test this, I've created a large crop of vmcores with kernel modules loaded. These modules spawn several kthreads that wait for a lock, and then the test case can find these threads and assert that it is able to find the lock from the stack frame, both with and without the fallback.
Tests pass on all vmcores internally (including the newly uploaded ones). Tests pass on GitHub CI as well, though there's not much testing of this specific functionality there.
I think this will probably replace #81.