LSE-334: Fallback for getting locks from stack frames #99

Merged 14 commits into oracle-samples:main from fallback_var_stack on Aug 19, 2024

Conversation

brenns10 (Member) commented Aug 12, 2024

This pull request augments the existing lock module. Previously, it could only work with DWARF debuginfo, using the CFI to identify the mutex, sem, or rwsem that a task was blocked waiting on. Now it can also function as a fallback in cases where the DWARF debuginfo fails. What's more, it also works with CTF debuginfo.

The approach here is based on @imran-kn's idea that you can guess that a pointer corresponds to a lock, and then verify the guess by checking whether the current task is on the list of waiters for that lock. The catch is that you need to be really careful about it: it's not enough to just do a "try/except" for FaultErrors. You also need to ensure that the linked list nodes are valid (so it's less likely that you're dereferencing random pointers). And even if the linked list nodes are valid, you could end up with an infinite loop, so we also need to track the set of already visited nodes in the list. If we do all this, we can be confident that we will either (a) dereference some junk and generate a FaultError, (b) encounter a pointer to something that's not a valid linked list node and get a ValidationError, or (c) detect a cycle in the list and raise a ValidationError, because that's not valid data either.
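
To make that concrete, here is a minimal sketch of the verification idea in drgn-flavored Python. It is illustrative only: the function name and details below are not necessarily the real helper (which Imran implemented as is_task_blocked_on_lock()), and it only covers the mutex case.

    from drgn import FaultError, Object, container_of

    def task_blocked_on_mutex(task: Object, addr: int) -> bool:
        """Guess that addr is a struct mutex, then verify the guess by
        checking whether task appears on the mutex's waiter list."""
        prog = task.prog_
        mutex = Object(prog, "struct mutex *", value=addr)
        try:
            head = mutex.wait_list.address_of_()
            seen = set()
            node = head.next
            while node != head:
                # (c) a repeated node means the list cycles: invalid data
                if node.value_() in seen:
                    return False
                seen.add(node.value_())
                # (b) a valid node's neighbors must point back at it
                if node.next.prev != node or node.prev.next != node:
                    return False
                waiter = container_of(node, "struct mutex_waiter", "list")
                if waiter.task == task:
                    return True
                node = node.next
        except FaultError:
            # (a) we dereferenced junk memory
            return False
        return False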

With the careful checking described above, there's no need to have complex tables of offsets! We just need to find the start and stop locations on the stack, and search every word on the stack for a valid lock. We can speed things along by checking what kind of address we're looking at, and only testing the pointer if it could be a valid lock pointer (skipping userspace addresses, text addresses, etc.). This was @biger410's idea; if he hadn't mentioned it, I would still be generating tables of offsets...
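
A rough sketch of that scan loop (the function name is hypothetical, the cutoff constant below is the x86_64 kernel address space boundary, and a real implementation would also skip text addresses and handle sems and rwsems):

    def find_lock_on_stack(prog, task, frame):
        """Scan each stack word between the saved SP and this frame's SP,
        testing plausible kernel pointers as candidate locks."""
        for addr in range(task.thread.sp.value_(), frame.sp, 8):
            word = prog.read_u64(addr)
            # Cheap filter: skip words that can't be kernel pointers
            # (userspace addresses, NULL, small integers, ...).
            if word < 0xFFFF800000000000:
                continue
            if task_blocked_on_mutex(task, word):
                return word
        return None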

To test this, I've created a large crop of vmcores with kernel modules loaded. These modules spawn several kthreads that wait for a lock, and then the test case can find these threads and assert that it is able to find the lock from the stack frame, both with and without the fallback.


Tests pass on all vmcores internally (including the newly uploaded ones). Tests pass on GitHub CI as well, though there's not much testing of this specific functionality there.

I think this will probably replace #81.

oracle-contributor-agreement bot added the "OCA Verified" label (all contributors have signed the Oracle Contributor Agreement) on Aug 12, 2024
@brenns10 force-pushed the fallback_var_stack branch from 08dcde6 to 3bf92ae on August 12, 2024 19:35
@brenns10 requested a review from imran-kn on August 12, 2024 21:01
brenns10 and others added 14 commits August 14, 2024 09:10
The OL8 UEK6 x86_64 builds used a broken CTF toolchain for a long time,
resulting in broken CTF. However, the toolchain used to build aarch64
was newer and not affected by the bug. Mark it as compatible so we can
use drgn with CTF on aarch64 for this release.

Signed-off-by: Stephen Brennan <[email protected]>
For module debuginfo, we use a heuristic to detect whether debuginfo is
loaded. This can fail and it's also extra overhead. When we load CTF, we
know that we've got debuginfo for all modules. Cache this fact and use
it to special case the module_has_debuginfo() check.

This fixes test failures which happen due to a failed check. The test
failures result in rather cryptic errors due to missing type fields, e.g.:

    AttributeError: 'struct files_struct' has no member 'fdt'

These get caused because when CTF is loaded, the failed check will
result in the debuginfo code loading a DWARF module debuginfo file,
which can cause conflicts in the type lookups.

Signed-off-by: Stephen Brennan <[email protected]>
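
A minimal sketch of the caching idea, using drgn's prog.cache dict (the cache key and the heuristic fallback below are illustrative, not the PR's actual identifiers):

    def mark_ctf_loaded(prog):
        # Set once, at the point where CTF is successfully loaded.
        prog.cache["using_ctf"] = True

    def module_has_debuginfo(prog, module):
        # With CTF loaded, debuginfo covers every module, so the
        # fallible (and slower) heuristic check can be skipped.
        if prog.cache.get("using_ctf"):
            return True
        return _heuristic_module_check(prog, module)  # hypothetical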
Signed-off-by: Stephen Brennan <[email protected]>
We currently use "frame.name", which may not be present when using
CTF. Instead, we should fall back to the symbol name.

The frame_name() function is not appropriate, since it may include
additional information, such as the module containing the symbol. So
introduce a new helper, func_name(), for this case.

Signed-off-by: Stephen Brennan <[email protected]>
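
A sketch of what such a helper could look like (assuming drgn's StackFrame.name and Program.symbol() behavior; not necessarily the PR's exact implementation):

    def func_name(prog, frame):
        # frame.name comes from DWARF and may be None (e.g. with CTF).
        if frame.name is not None:
            return frame.name
        try:
            # Fall back to the symbol containing the frame's PC.
            return prog.symbol(frame.pc).name
        except LookupError:
            return None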
aarch64 isn't guaranteed to provide the value of SP when doing a frame
pointer based unwind. Don't fail on this; instead, fall back to FP.
Fixes internal issue tracked as LSE-374.

Signed-off-by: Stephen Brennan <[email protected]>
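
A sketch of the fallback (assuming drgn raises LookupError when a frame's SP is unknown; x29 is the aarch64 frame pointer register):

    def frame_stack_pointer(frame):
        try:
            return frame.sp
        except LookupError:
            # aarch64 frame-pointer unwinds may not record SP; the
            # frame pointer (x29) is a usable stand-in here.
            return frame.register("x29")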
There's no reason to be confident that a variable value will be valid
when we're formatting it in the stack trace. It is possible that it will
contain stale values that may result in FaultErrors. Be sure to catch
this and simply print a message saying the FaultError happened. This
ensures bt does not crash in these exceptional cases.

Signed-off-by: Stephen Brennan <[email protected]>
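
A sketch of the defensive formatting (names here are illustrative):

    from drgn import FaultError

    def format_variable(name, obj):
        try:
            return f"{name}={obj.format_(dereference=False)}"
        except FaultError:
            # Stale stack slots can point at unmapped memory; report
            # the fault instead of letting bt crash.
            return f"{name}=(FaultError reading value)"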
Signed-off-by: Stephen Brennan <[email protected]>
There's no reason to exempt UEK4 from this test: it's not that drgn
can't do stack traces on UEK4; it's really that "crashed_thread()"
doesn't work on UEK4 vmcores. Let's work around that so we can still
smoke test the actual bt operation.

Signed-off-by: Stephen Brennan <[email protected]>
For mutexes, semaphores, and rwsems, we are able to reliably test whether
any given pointer actually corresponds to the correct lock. We simply
pretend that it is valid, and check to see whether the current task is
present on the list of waiters. Imran initially implemented the
is_task_blocked_on_lock() function with this great idea.

We don't know where the lock pointer will actually be on the stack:
that's what DWARF would tell us. We could hard-code the stack offsets
we've observed in the past, but this is not very maintainable. It takes
a lot of code to implement, and a lot of time & resources to check
every kernel. Plus, we would need to check each new kernel as it is
released.

The alternative, as suggested by Junxiao, is much simpler: just check
every stack offset from the top of the stack to the mutex/sem lock
function. We can eliminate many addresses immediately because they
cannot be kernel memory addresses. For the remaining addresses, so long as we
are careful (validating that we are really following a linked list, and
not going into a loop), there should be no problem testing them to see
if they are really locks.

This commit implements the approach.

Signed-off-by: Stephen Brennan <[email protected]>
Co-authored-by: Imran Khan <[email protected]>
Suggested-by: Junxiao Bi <[email protected]>
We've generated vmcores for every combination of OL, UEK, and
architecture. Additionally, for combinations where there are multiple
offsets, we've generated a vmcore on the oldest and newest version. Each
vmcore is generated after first loading the "lockmod" kernel module, so
that we can test loading the lock value out of the stack frame. The test
is written to test the fallback on DWARF (in case the stack frame lookup
fails) and CTF.

Signed-off-by: Stephen Brennan <[email protected]>
@brenns10 force-pushed the fallback_var_stack branch from 3bf92ae to 6d07880 on August 14, 2024 16:27
    # First, we need to get the start and end address to search. This is
    # arch-specific.
    if prog.platform.arch == Architecture.X86_64:
        candidates = range(task.thread.sp, frame.sp, 8)
Member:

frame.sp is the bottom of the stack?

Member Author (brenns10):

Yeah, it's the bottom of this stack frame. This reliably works well though! I think it's because x86 always spills the register beneath the stack frame anyway. With aarch64, it seems that the register gets stored within the stack frame.

I could try to get the next stack frame up, but this function doesn't actually accept a stack trace... just the single stack frame. I could try to use the workaround with the frame pointer that aarch64 does. In practice, we always find the lock before we get to the upper end of the range anyway, so it wouldn't be much of an issue.

Member:

So task.thread.sp is the bottom of the stack, and the scan happens over the parent stack frames of the current one. Could there be a case where the lock address is stored in a child stack frame of the current one?

Member Author (brenns10):

I guess I'm not sure what you mean. The task.thread.sp is the "bottom" of the stack, that is, it is where the stack pointer was stored just prior to context switching away. Then, frame.sp is the stack pointer for the current function (e.g. mutex_lock()). So, the stack looks something like this:

HIGHER ADDRESSES / OLDER FRAMES HERE
[parent caller frame]
...
[mutex_lock() frame]
0xDEADBEEF
0xF000B000
... vars ...                                       <---- frame.sp
[schedule_preempt_disabled() frame]
...
[schedule() frame]
... vars ...
... bottom of the stack ...                     <--- task.thread.sp
LOWER ADDRESSES / NEWER FRAMES HERE

So we start at task.thread.sp and increment the address until we reach frame.sp. Since frame.sp is technically pointing at the bottom of the stack frame for mutex_lock(), it's possible that the lock could be somewhere in that frame and we'd miss it. However, in practice that just doesn't happen; the lock has always been beneath the frame.sp value.

Member:

My fault, I thought task.thread.sp was the other end of the stack.
I'm OK with the current approach; we can always fix it if the lock is stored in the current stack frame in future kernels.

brenns10 (Member Author):

Thanks @biger410! I'll leave this open until Monday morning PST so @imran-kn can have a chance to raise any issues (since some of the code is his), but I'll merge if I don't hear anything. It would be great to get the 1.0 release branch started!

imran-kn (Contributor) left a comment:

LGTM. Super cool stuff @brenns10 : )

@brenns10 merged commit 6a15f9e into oracle-samples:main on Aug 19, 2024
5 checks passed
@brenns10 deleted the fallback_var_stack branch on August 19, 2024 15:53