Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using TBI and Address Math for detecting use-after-free in heap #9497

Open
Verdagon opened this issue Jul 31, 2021 · 1 comment
Open

Using TBI and Address Math for detecting use-after-free in heap #9497

Verdagon opened this issue Jul 31, 2021 · 1 comment
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Milestone

Comments

@Verdagon
Copy link

Verdagon commented Jul 31, 2021

I think there could be another way that Zig could detect use-after-free, with a different set of tradeoffs from the current method. I hope this issue will spark some ideas; this issue's "proposal" is really just a starting point, I suspect that with more Zig knowledge one could refine this to something much better than what I have written here!

This method is partially based on Top-Byte-Ignore (https://arxiv.org/ftp/arxiv/papers/1802/1802.09517.pdf), available for certain processors. TL;DR of their approach: Every 16 bytes chunk in memory has a corresponding 4 bits which contains a (pseudo-)random "tag", changed whenever we deallocate that 16 bytes. If we ever have a pointer pointing at that chunk, then the top byte of that pointer will contain the same tag value. Whenever we dereference the pointer, we assert that the pointer's tag value and the allocation's tag value still match.

The method described there requires 4 bits per 16-byte chunk, and the check is done on every load and store. They did it at the LLVM/instrumentation level, but I think a language could offer more opportunities here:

  1. A language will be able to more intelligently decide when it really needs these checks to happen.
  2. A language could use a tag larger than 4 bits.
  3. A language could decide a better place might be to put the tag for an allocation. Preferably somewhere with better locality!

So here's a possible idea for #2 and #3, inspired by some things from Vale's generational references.

First, we'd have a custom allocator instead of malloc in this mode (Zig already does this, IIRC). Let's assume that every 4kb page has allocations of the same size (only 16-byte allocations in one 4kb page, only 32-byte allocations in another, etc).

The allocator can decide what virtual address to map a new physical page to, and therefore determines the addresses of the allocations. We'll use that to give some guarantees, using the (arbitrarily chosen) "middle byte", byte #5.

When the allocator decides what virtual address to map a new physical page to, it might use these rules:

  1. For the pages in which it will put 16-byte allocations, the middle byte should be 5.
  2. For the pages in which it will put 32-byte allocations, the middle byte should be 6.
  3. For the pages in which it will put 64-byte allocations, the middle byte should be 7.
  4. For the pages in which it will put 128-byte allocations, the middle byte should be 8.
    and so on.

At the top of these allocations, it will put one pseudo-random non-zero byte to serve as our tag. When the allocator returns the address of a shiny new allocation to the user, it will include this tag in the top byte of that returned pointer.

Whenever Zig dereferences such a pointer, if the top byte is non-zero, it will use the middle byte to find the allocation's current tag and compare that to the pointer's top byte.

To find the top of the allocation (and therefore the tag):

  1. If middle byte is 5, zero the last 4 bits and subtract one ((addr & ~0xF) - 1).
  2. If middle byte is 6, zero the last 5 bits and subtract one ((addr & ~0x1F) - 1).
  3. If middle byte is 7, zero the last 6 bits and subtract one ((addr & ~0x3F) - 1).
  4. If middle byte is 8, zero the last 7 bits and subtract one ((addr & ~0x7F) - 1).
    and so on.
    We can generalize this to:
    (addr & ((1 << middlebyte) - 1)) - 1)

As mentioned, we would only do this check if the top byte is non-zero. It might look like this:

if (ptr_top_byte) {
  if (*(addr & ((1 << middlebyte) - 1)) - 1) != ptr_top_byte) {
    exit(1);
  }
}

Some benefits from this scheme so far:

  • The tag is closer to the object, better locality and hopefully saving us some cache misses.
  • We could only have one tag per allocation, rather than per 16 bytes. (or sometimes not, see below notes on padding)

Some drawbacks so far:

  • A branch to check for that zero, with no obvious way to hint the branch predictor for it (the second branch is easily hinted).
  • 6-10 instructions in that little equation, which could be expensive to do on every dereference.
  • Depending on allocator implementation details, that 1 byte tag might cost us a lot of padding, or might not.
  • This only works on CPUs with TBI. On other CPUs, we might need to mask that byte off ourselves.
  • All our bitwise trickery might make LLVM too nervous to do certain optimizations.

Some potential improvements/mitigations:

  • We could simplify that first branch away using some clever multiplication and addition. But, that would probably increase the number of instructions.
    • Even with more instructions, perhaps the CPU would see that these calculations are only to trigger an unexpected branch, and not used in any further calculations; it could continue executing subsequent code in parallel, and might not be as much of a slowdown as we think.
  • If we could find some guarantees about stack allocation and externally malloc'd data, then we mightn't need that first branch.
  • If the allocation's tag byte can be at the end of an allocation instead of before, it can often fit inside the allocation's trailing padding.

Can I confidently say that this locality would be worth the extra instructions and possible branching? Nope! But it's an interesting possible approach. Vale has shown us that combining weird ideas with compiler flexibility and profiling will often show us crazy efficient solutions, previously hidden. In this case, we were able to use single ownership and static analysis to eliminate most these checks as unnecessary, and then adjust struct layouts to eliminate that first branch and reduce the above 6-10 instructions to only four. So, unexpected flexibilities might be a factor in this idea's potential for Zig, who knows!

@kubkon kubkon added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label Aug 2, 2021
@kubkon kubkon added this to the 1.1.0 milestone Aug 2, 2021
@lin72h
Copy link

lin72h commented Jul 9, 2023

It think this a good direction to explore. Intel has something like TBI in the coming CPU.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Projects
None yet
Development

No branches or pull requests

3 participants