-
Notifications
You must be signed in to change notification settings - Fork 112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CHERI, take 2021.10 #402
CHERI, take 2021.10 #402
Conversation
src/aal/aal_cheri.h
Outdated
|
||
#include "../ds/defines.h" | ||
|
||
#include <cheri.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use cherintrin.h instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two files seem to define the same macros. Why do both exist?
We should probably avoid either and use the builtins directly here until the headers are stable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We added cheriintrin.h as a header that simply exposes the intrinsics+permission macros under shorter names. The cheri.h header also defines macros for the non-purecap case where they are simply no-ops. I can't find a link to the discussion but I if I recall correctly we wanted a *intrin.h
header that only provides macros if the feature is supported without a fallback to different behaviour. Additionally cheri.h provides inline functions with fixed void* argument types whereas the builtins are overloaded to preserve the argument type + constness (and also accept __intcap).
I would consider cheriintrin.h stable now, but using the builtins directly sounds might be the simplest solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've moved to use the builtins everywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is surprisingly simple. I guess all the ground work has made the final push for CHERI support pretty simple.
src/backend/address_space_core.h
Outdated
@@ -20,6 +20,14 @@ namespace snmalloc | |||
* | |||
* It cannot unreserve memory, so this does not require the | |||
* usual complexity of a buddy allocator. | |||
* | |||
* TODO: This manages pieces of memory smaller than (1U << MIN_CHUNK_BITS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we also allocate allocators (CoreAlloc
) from here? We might also do slightly more once I rebuild the PalNotification
mechanism. This was allocating meta-data objects.
Also, there is issues that in CHECK_CLIENT I have placed meta-data in a larger range with guard pages to make OOB accessing the meta-data harder, which would also need engineering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment updated to be a little more CHERI-specific.
src/aal/aal_cheri.h
Outdated
|
||
#include "../ds/defines.h" | ||
|
||
#include <cheri.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two files seem to define the same macros. Why do both exist?
We should probably avoid either and use the builtins directly here until the headers are stable.
* capability. It does not carry provenance: it is not a capability, but | ||
* merely an address. | ||
*/ | ||
typedef ptraddr_t address_t; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks as if it comes from stddef.h
, which isn't included here. Given that the base AAL tells you the address width, it might be a better idea to use that rather than depend on a non-standard declaration from a standard header - the glibc version will undoubtedly hide this behind some feature-check macro and so our headers will be broken by random build-system flags.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The correct place for ptraddr_t is <stddef.h>, CheriBSD also provides it in sys/sys/_stdint.h
and in sys/sys/stddef.h
. I take the view that any CHERI-aware <stddef.h> must provide ptraddr_t when building in __CHERI_PURE_CAPABILITY__
mode, and I would hope that the GLibc port does that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm include
-ing <stddef.h>
now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using a fixed-with integer type would also be slightly wrong, as it's not necessarily the same type as ptraddr_t. On Darwin, uint64_t is unsigned long long but size_t is unsigned long, so ptraddr_t would likely also be unsigned long.
src/aal/aal_riscv.h
Outdated
// The "Zihintpause" extension steals some dead space of the "fence" | ||
// instruction and so should be available everywhere even if it doesn't do | ||
// anything on a particular microarchitecture. Our assemblers don't | ||
// understand it, yet, tho', so thus this hilarity. | ||
__asm__ volatile(".byte 0xF; .byte 0x0; .byte 0x0; .byte 0x1"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, Zihintpause was the first extension ratified under the RISC-V 'fast track' process, which meant that no one who understood the point of the thing that they were trying to implement was able to give feedback. As such, it doesn't actually do what snmalloc needs. It is defined to pause the HART for an impdef number of cycles. We want a hint to tell the processor not to keep filling the pipeline with instructions on the assumption that a read of an address remains the same multiple times. The Arm, x86, and Loongarch variants of this instruction have the right semantics.
It may be that RISC-V implementers do the right thing instead of what the spec says but until we have some examples of this I'd rather we didn't put anything here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somehow it seems even worse. The Zihintpause spec is... vague, not even promising to pause the HART, and focused on instruction retirement rather than on instruction dispatch, which seems odd to me:
indicates the current hart’s rate of instruction retirement should be temporarily reduced or paused.
The duration of its effect must be bounded and may be zero. No architectural state is changed.
We should, at the very least, ensure that CHERI-RISC-V cores DTRT. However, I have replaced this with a larger block comment.
inline static address_t | ||
signed_prev(address_t curr, address_t next, const FreeListKey& key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this take a pointer type, rather than an address type? I would expect this to be the place to add in sealing of freelist pointers in CHERI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only ever traverse the next
links; the prev
ones are just there for probabilistic integrity checking. By that token, I thought we'd eventually seal the forward pointers but leave the backward links just addresses. Haven't thought very hard about this yet, so could be convinced otherwise.
src/pal/pal_freebsd.h
Outdated
@@ -3,6 +3,10 @@ | |||
#if defined(__FreeBSD__) && !defined(_KERNEL) | |||
# include "pal_bsd_aligned.h" | |||
|
|||
# if defined(__CHERI_PURE_CAPABILITY__) | |||
# include <cheri/cheric.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this is unavoidable, but I don't really like all of the things that this header leaks into the global namespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cheri/cherireg.h
brings slightly less, so I've jumped to that instead.
f271143
to
d4e35e1
Compare
.github/workflows/main.yml
Outdated
triple: riscv64-linux-gnu | ||
rtld: ld-linux-riscv64-lp64d.so.1 | ||
extra-packages: binutils-riscv64-linux-gnu | ||
ld: /usr/bin/riscv64-linux-gnu-ld |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is all to work around the fact that riscv on Linux requires linker relaxations (even crt1.o requires it, so it's not a thing we can work around in our own compilation commands), and llvm doesn't have those implemented, so we have to use the GNU tools.
The |
ci/Toolchain.cmake
Outdated
|
||
set(CROSS_LINKER_FLAGS "-Wl,--dynamic-linker=/usr/${triple}/lib/$ENV{RTLD_NAME},-rpath,/usr/${triple}/lib") | ||
if((DEFINED SNMALLOC_LINKER) AND NOT ("${SNMALLOC_LINKER}" MATCHES "^$")) | ||
string(PREPEND CROSS_LINKER_FLAGS "-fuse-ld=${SNMALLOC_LINKER} ") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will trigger a warning with newer llvm where they want you to use --ld-path=/path/to/foo
(either with or without -fuse-ld=bfd to select the linker type)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How's what's there now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
Hm. While this appears to pass the test suite, please do not merge quite yet. I appear to be handling |
We'll want user_address_control_type in some particular PALs, so it can't live in pal.h. While here, make the spelling be capptr::is_spatial_refinement.
Even on debug builds, these little things should be inlined
There is no such thing as "struct Slab" any more. We use alignof(RemoteAllocator) below, so we already require the complete type definition at this point.
capptr_rebound was only ever going to be used for external_pointer, which now operates entirely using pointer_offset. So instead, just make external_pointer use capptr::AllocWild<void>, capptr_from_client, and a new capptr_reveal_wild.
Avoid computing bits::next_pow2_bits(1 << n). Even if the compiler can see through enough of the algebra, it's surely more direct to just use n. While here, slightly expand documentation about what's going on with the "sizeclass" encoded into MetaEntry-s.
- Grab a larger second allocation on the first allocator to dodge the sizeclass of the prior alloc on that allocator *and* any implicit, bootstrapping slabs that get opened (e.g., for remote queue message stubs). - De-FAST_PATH the domestication function. No need to always inline it, here. - Document things a little better
The use of void* had let an overzealous unsafe_ptr() leak a pointer with address space control to the client (in LocalAllocator::alloc_not_small, specifically). Correct this to call capptr_chunk_is_alloc() (to capture our intent) and capptr_to_user_address_control() (to do the bounding) and defer the conversion to void* until the very periphery of the allocator, using capptr_reveal() (again, to capture intent).
This preserves the chunk pointer through the use of a chunk as a slab. It does grow the structure by one pointer, but on non-CHERI it is still padded to 64 bytes, even with CHECK_CLIENT guards in place: 0: MetaCommon chunk pointer 8: next pointer 16: builder head[0] 24: builder head[1] 32: builder tail[0] 40: builder tail[1] 48: builder length[0] (uint16_t) 50: builder length[1] (uint16_t) 52: padding (4 bytes) 56: needed (uint16_t) 58: sleeping (bool) (Sadly, on CHERI, even without CHECK_CLIENT guards and with no padding, there are now four pointers in the structure -- chunk, next, head, tail -- plus five extra bytes. We will likely wish to explore encoding the head and tail offsets relative to the chunk pointer.) This lets us remove the "subversive amplification" in dealloc() in favor of just preserving the chunk pointer. Speaking of, be sure to assign that in all the right places, and ASSERT that we've got it right.
This dates back to the much earlier design that required the use of an authority map. Since the current CHERI design does not, go ahead and ask the platform whenever underlying allocation requests are sufficiently aligned.
So make signed_prev return one. This distinction only matters on CHERI, of course; everywhere else address_t *is* a uintptr_t.
1c64e95
to
4126e95
Compare
This adds a CHERI AAL and expands the FreeBSD PAL to cover CHERI. It updates a comment in ds/address.h now that there is an example architecture that differentiates uintptr_t and address_t.
The correct thing to do, of course, is to fix these upstream, but that requires understanding exactly what's wrong, and that's harder than just not tickling the bugs.
Cumulative differences since last review: https://github.com/microsoft/snmalloc/compare/d4e35e1..8dfbb44 ; the most semantically meaningful one is 83dc540 and the rest is just some noise of tidying (esp 1e2f7ff). I think this take on CI addresses Alex's comment, above. |
|
||
set(CROSS_LINKER_FLAGS "-fuse-ld=${SNMALLOC_LINKER_FLAVOUR} -Wl,--dynamic-linker=/usr/${triple}/lib/$ENV{RTLD_NAME},-rpath,/usr/${triple}/lib") | ||
if((DEFINED SNMALLOC_LINKER) AND NOT ("${SNMALLOC_LINKER}" MATCHES "^$")) | ||
string(APPEND CROSS_LINKER_FLAGS " --ld-path=${SNMALLOC_LINKER}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This requires Clang 12, but I assume the variable will only be set when that requirement is met?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea; despite the generic name, AFAICT this file is used only in cross-build jobs, where we're already requiring clang-13
.
Getting smaller every time; down to <500 changed lines now! Passes most of the test harness on my desk, even (I am unaware of it failing any but I'm impatient and some of the tests seem to trigger some performance pathologies in CHERI
qemu
). Let's see what CI thinks.For the humans looking at this PR, this is something of a rampage and I'm happy to split the stack into multiple PRs if desired (already #401 is included here). In rough overview, from bottom to top of the patch stack:
Metaslab
/ChunkRecord
to include aMetaCommon
that holds a high authority pointer to the chunk for CHERIPerhaps best reviewed by commit rather than by cumulative delta.