Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Collaboration with alloc-wg and @CAD97's foundation grant #6

Open
CAD97 opened this issue Jun 22, 2022 · 3 comments
Open

Collaboration with alloc-wg and @CAD97's foundation grant #6

CAD97 opened this issue Jun 22, 2022 · 3 comments

Comments

@CAD97
Copy link

CAD97 commented Jun 22, 2022

Oh hi! I hadn't seen this until I saw it pop up in rust-lang/rust#97052.

I'm part of alloc-wg and about two months ago I picked up the Storage API again in this repo: https://github.com/CAD97/storages-api (docs: https://cad97.github.io/storages-api/storage_api/index.html)

As part of this refinement and simplification push, I ended up converting the Storage trait to be untyped: that is, Storage::Handle is no longer a GAT, and notably this means that handle resolution can return regular references to [MaybeUninit<u8>] rather than pointers with difficult to explain invalidation rules.

There's some miscellaneous renaming involved, but also the simplification brought us down to a set of just four traits: Storage, SharedMutabilityStorage, PinningStorage, and MultipleStorage. I've been advised (and agree) that controlling complexity is important here for a path to std inclusion; even with this simplified API surface without GAT, it's still a contender for the most complex API exposed by std. And that complexity is safety critical.

I'm fairly convinced that the current shape in my repo is minimal for what we want to support, but I wasn't thinking the previous POC was terribly over-engineered1 either, so it's possible that a further simplification is hiding.

matthieu-m agrees that the untyping of storage is probably a good direction; the previous storage API was not typed due to some conscious decision due to necessity, but because it evolved from C++'s std::allocator design, which is typed, and matthiue-m just hadn't considered using an untyped interface. This does mean that usage of the storage traits becomes more unsafe/difficult, but this is easily addressed by adding a typed RawBox around Storage, RawVec around Storage of slices, and maybe RawArena around MultipleStorage? as a bridge between raw storage and high-level collections.

(It's also quite possible that matthieu-m overlooked the possibility of making handles untyped because of the relative newness of the pointee metadata APIs at the time, in order to draw the conclusion that S::Handle<T> can be replaced by (S::Handle, <T as Pointee>::Metadata).)

Anyway, getting to the point:

I received a Rust Foundation Community Grant. My project is sort of two projects; the first is unrelated work in the tracing ecosystem, but the second half (scheduled Oct..=Dec) is to develop and push the Storage API to an MCP. rust-lang/rust#97052 is actually an early start on prerequisite work for upstreaming storages, as otherwise a storage-generic Box can't be coerced.

pub struct RawBox<T: ?Sized, S: Storage> {
    handle: S::Handle,
    metadata: <T as Pointee>::Metadata, // -> JustMetadata<T>
    storage: S,
}

My plan for MCPing the Storage API is basically broken down into the following three parallel steps:

  • Recruit a T-lang member to liason/proxy/sponsor/whatever-the-word-is the proposal.
  • Provide a proof of concept implementation for minimally all currently allocator-generic std collections on top of the Storage API.
    • I'm still up-in-the-air about whether to do this in-tree on the std impls directly in a fork, or to reimplement them in a separate repo (like you've done here).
    • Part of what makes this interesting is that this makes std collections #[no_std] compatible, so while refactoring in-place has the benefits of full API coverage and ease of "build with my fork of std" testing, wanting to move things into libcore means the diff would be little better than an external crate. Plus having to deal with std iteration times.
  • Write the MCP.
    • Document the supported use cases / design constraints / etc.
    • Document why the API has the shape it does.
      • My intent here is to guide the reader through inventing the storage traits themselves, like the "reinventing tower::Service" article.
    • Make the argument for putting this into std in addition to the current unstable Allocator trait.

Ultimately, I think that this MCP is where the Storage API either sinks or swims. Either it gets accepted for in-tree implementation/experimentation, or it gets rejected, likely on the matter of being insufficiently motivated for std inclusion given its complexity. Especially given the async-wg group w.r.t. async in traits (e.g. dyn*), now really does feel like the time to say if we want to make Box the "owning handle to some storage" or if it's just an owning pointer to allocated memory. My claim is that Box<T, ?> can be both &move T and dyn* T, as well as InlineMaxSizeDst<T> and so much more, so its a matter of proving it worth it.

I'd like to work together, or at the very least avoid duplicating work you've already done unnecessarily. Adding your experience working with (your modified copy of) the storage API will help to strengthen the motivation provided in the MCP.

...I don't really know how to end this info dump. I guess...

TL;DR: Hi 👋 I'm working on the Storage API and would like to incorporate your experience working with it.

Footnotes

  1. Even back then I disagreed on some details with matthieu-m, which ended up being integrated into the most recent iteration. Notably, matthieu-m had separate API trees for "Element" and "Range" storage, but I've always advocated for "Range" storage to just be "Element" storage of an array/slice. Along with this difference is that matthieu-m's element storage takes T to place it, whereas mine previously took <T as Pointee>::Metadata (when it was still typed) and now just deals in Layout.

@CraftSpider
Copy link
Owner

CraftSpider commented Jun 22, 2022

Hello! I'd love to help with pushing storages forwards towards std acceptance, as my biggest current pain is that I find myself wanting to just copy-paste things like BTreeMap or OsString into this repository, its ergonomics would be much better in the standard library. You can contact me through gmail at runetynan (at) gmail (dot) com, zulip at CraftSpider, or through discord at CraftSpider # 0001, if you want to talk further.

Some quick thoughts here:

I think untyping the handle loses some ergonomics, but also understandably simplifies the API, one of my current big pains is that a typed handle can't have a useful supertrait due to lack of things like casting support.

You seem to have ended on a similar API, with some slightly different specific choices, but that may be a good point towards it as a good choice for the final API design - We separately came up with basically the same API from the original POC.

I decided instead of SharedMutability as an extension I'd require all storages to be internally mutable, and instead of a single Pinnable, I separated it into multiple traits. I think the first is fine, though for the second, I think it may actually be a useful distinction - I've found multiple extension storages that implement only a subset of my distinguished traits, and it simplifies the safety requirements and implementations for unleaking quite a bit.

@CAD97
Copy link
Author

CAD97 commented Jun 22, 2022

Let me summarize my understanding of the trait net:

cad97/storage-api CraftSpider/department summary
Storage * base de/alloc and resolving of handles
MultipleStorage MultiItemStorage refinement: can have multiple live handles
SharedMutabilityStorage Storage refinement: can mutate handles resolved with &S
PinningStorage LeaksafeStorage refinement: stored memory is pinned until deallocated
* FromLeakedStorage refinement: can turn resolved handle back into a handle

I think the guarantees/requirements of my PinningStorage and your LeaksafeStorage are mostly but not quite equivalent. The one potential difference is how we explain the behavior of Storage::allocate without MultipleStorage. I say "existing handles are invalidated" + "memory will not be moved/reused;" you say "may overwrite the item, allocate a new item, or fail" + "handles/data must outlive the storage."

From this I think my SmallStorage is PinningStorage but not LeaksafeStorage. The difference here is that SmallStorage has a handle type of () which always resolves to the single handled memory. (With typed storage, this would be a handle type of JustMetadata<T>.) This semantically invalidates the previous handle (because the storage can't tell them apart) and thus violates LeaksafeStorage's requirement that handles must outlive the storage, because it's no longer possible to resolve the old handle.

The exact behavior here is a complex combination of multiple refinements' interactions, but I think I do need to clarify that PinningStorage means that resolved references can have their lifetime unsafely but soundly extended to if/when the handle is deallocated. Potential: rename SharedMutabilityStorage::resolve_raw to resolve_shr/_ref/etc and add PinningStorage::resolve_raw[_mut] which returns ptr::NonNull<Memory> (unlike the other methods, this has simple invalidation rules) or the theoretical &'unsafe [mut] Memory. New resolve_raw would only grant read permissions unless SharedMutabilityStorage also holds, in which case it also gives write permission.

You would think that we could just have PinningStorage: SharedMutabilityStorage (and this is almost always the case and very convenient to have, removing the resolve_raw[_mut] split), but Box is arguably not SharedMutabilityStorage. If UCG#258 is resolved that Box is a uniqueness barrier (like ptr::NonNull is; AIUI the discussion is about the internal ptr:Unique which Box and RawVec are built on), then PinningStorage: SharedMutabilityStorage potentially makes sense, since roughly no collections will use PinningStorage without SharedMutabilityStorage.

The one storage type (other than the Box debate) which is pinning but not SharedMutability is &'static Memory. And there's a fun extra challenge there that comes from subobject slicing: in order to not slice to a fixed header, you need a fat pointer (Memory = [MaybeUninit<u8>]). The one out in the face of references doing subobject slicing is using extern type. The requirements then to get a PinningStorage + !SharedMutabilityStorage is pretty much just the SmallStorage: you need to handle your memory as &mut 'static ExternType inside the storage itself, so non-mut resolution goes through &&mut 'static ExternType. Additionally, using the &mut trick to get transparent uniqueness can only apply if the pointer is stored internally to the storage, since the Handle type is Copy (and &mut isn't). There are a lot of constraints to to satisfying PinningStorage without also satisfying SharedMutabilityStorage that imho the only reason to support it is to support keeping ptr::Unique's stricter semantics.

All you have to do to understand the footgun that is PinningStorage + !SharedMutabilityStorage is to look at the discussed potential fallout of enforcing UCG#258 (and notably, UCG#262 for other ptr::Unique-based collections (read: Vec)).

I have talked myself in a circle and am not sure what my position is anymore w.r.t. this, let me walk away and mull on it for a bit.

As for FromLeakedStorage: there's two "kinds" of handle reverse lookup: ptr -> Handle and (ptr, &Storage) -> Handle. The former is AIUI all that's required for std collection's into_raw/from_raw to use resolved pointers, with a Handle: From<*mut ()> bound or similar. In any case, this case should be a trait refinement on the handle, not on the storage imho. In fact, I think that unleaking in a different storage instance (that doesn't just deref to some shared instance) requires this.

But the case of requiring a reference to the storage is more complicated. Obviously it's useful for your Debug storage wrapper, since it can track validity through unleak. The actual functionality case, though, is something like your StaticHeap arena, which gives out usize handles.

It's worth noting that &'static StaticHeap can give out ptr::NonNull handles, and doesn't need to give out usize handles. Of course, that makes it more unsafe to use, since it can't verify handle validity... but Storage is still fundamentally an unsafe abstraction layer, and the safe layer is on top of it. So if the main goal is to be able to support Box<T, &'static StaticHeap>::[into|from]_raw, that's doable without FromLeakedStorage. Without 'static, though, reverse handle lookup does require consulting the storage, since the handle can't just be the pointer.

Still, personally I'd prefer spelling it S::Handle: UnleakHandleIn<S> rather than S: UnleakCapableStorage.

What usecases do you have for unleaking / reverse handle lookup with non-pointer types? [into|from]_raw[_parts] is gated on S = AllocStorage<Global>, and I think it's potentially reasonable to tell cases that want to support other handles to use [into|from]_raw[_parts]_with_storage which would return S::Handle rather than *mut T. If they need to roundtrip the handle through a raw pointer, they can require S::Handle: UnleakHandle or even S::Handle = PtrHandle. (Not just *mut or ptr::NonNull because providing Send + Sync for S::Handle is both semantically correct (it's a useless token without the storage) and very useful ((S::Handle, T::Metadata, S) automatically is the right Send/Sync).)


And because I don't have a better place to note it right now: consider the rename: SharedMutabilityStorage -> SharedStorage, to better fit with the other names. While SharedStorage is I think accurate -- multiple people can share access to the backing storage and still get mutable memory access by coordinating -- it might too strongly imply &impl SharedStorage : Storage. This can't be provided with the existing bounds, because I require &mut access to allocate (and deallocate), namely so that a storage implemented e.g. like RawVec can reallocate its backing buffer without requiring its state to be UnsafeCell. SharedMutabilityStorage is about the uniqueness barrier, but the inline is still on this side.

Also, we've just found that &impl Allocator: Allocator is problematic for implementors (w.r.t. pinning) and consumers (w.r.t. not introducing redundant indirection), so maybe not providing such an impl by default for storages is desirable?


Apologies for the big brain dump. Hopefully I'll be able to set up an initiative/tracking repo/issue to keep track of these things in the near-ish future, but right now, this is it 🙃 (since I can't really start in on a libcollections fork until rust-lang/rust#97052 is available in bootstrap without a huge amount of distracting #[cfg(bootstrap)] noise.)

@CraftSpider
Copy link
Owner

Bottom to Top:

No problem with the brain dump, I appreciate the thoughts on potential API ramifications and etc, once there's a single location these thoughts can hopefully be centralized, and probably use issues or such to track these kinds of API decisions.

The rename would probably be fine I think, but also that's probably the kind of thing that will get some bikeshedding later on :P

From everything you've said, I agree that bounding the handle might be nicer. My use case was more about being able to relax S = AllocStorage<Global>, and support leaking from stack heaps which could be useful for getting &'storage mut T items out of boxes. Your analysis of the traits seems right, you skipped over ClonesafeStorage, but it's basically just a finer-grained refinement on LeaksafeStorage, and probably not necessary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants