Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce stack size of DataPayload #4438

Closed
sffc opened this issue Dec 9, 2023 · 11 comments · Fixed by #4463
Closed

Reduce stack size of DataPayload #4438

sffc opened this issue Dec 9, 2023 · 11 comments · Fixed by #4463
Labels
C-data-infra Component: provider, datagen, fallback, adapters

Comments

@sffc
Copy link
Member

sffc commented Dec 9, 2023

Currently DataPayload adds 2 usizes of stack space on top of the data struct.

A DataPayload looks like this:

pub enum DataPayloadEquiv<Y: 'static> {
    Yoke {
        yokeable: MaybeUninit<Y>,
        cart: Option<Arc<Box<[u8]>>>,
    },
    StaticRef(&'static Y),
}

The Option<Arc<Box<[u8]>>> fits in a single word, and the enum discriminant takes the whole other word.

We should aim to get the size impact down to a single word.

Some potential options:

  1. Change the cart from Option<Arc<Box<[u8]>>> to Arc<Option<Box<[u8]>>>. This requires allocating a pointer when creating a data payload via DataPayload::new_owned or when using DataPayload::with_mut, neither of which are used in the most common code paths (baked data and immutable postcard data). Playground
  2. Implement some custom unsafe bitpacking like we managed to do in ZeroVec.
  3. Wait for Rust to add alignment niches.
  4. Add a feature flag to disable the Yoke variant, saving even more stack size, but at the cost of flexibility. Doesn't work with global feature resolution.

Thoughts? @Manishearth @robertbastian

@sffc sffc added C-data-infra Component: provider, datagen, fallback, adapters needs-approval One or more stakeholders need to approve proposal labels Dec 9, 2023
@sffc
Copy link
Member Author

sffc commented Dec 9, 2023

I should note that Option<DataPayload> and ShortVec<DataPayload> both end up adding the extra word back anyway, so there is no difference for those two types on whether we make the proposed change to DataPayload. It only helps if DataPayload is its own field (which is often the case).

@Manishearth
Copy link
Member

Add a feature flag to disable the Yoke variant, saving even more stack size, but at the cost of flexibility. Doesn't work with global feature resolution.

This would have to be the other way around, using default features. Which also has problems here.

@sffc
Copy link
Member Author

sffc commented Dec 11, 2023

Yeah that's what I meant, sorry. icu_provider gets a feature to enable the Yoke variant. BlobProvider enables the feature. Also any crates using map_project enable the feature, which is DateTime but we're getting rid of that code soon. But again this doesn't work with global feature resolution so I'm inclined to not do it right now.

@sffc
Copy link
Member Author

sffc commented Dec 11, 2023

Ideally I want a ShortVec of DataPayload to fit in one word plus the struct size. Here's how:

  1. 0x0 0x0 for empty
  2. 0x0 0x1 ptr len for heap vec
  3. 0x0 0x2 ptr for single payload StaticRef
  4. 0x1 data for single payload yoke empty cart
  5. ptr data for single payload yoke Rc cart

This works so long as 0x1 is not a valid address for an Rc which it shouldn't be due to alignment.

@sffc
Copy link
Member Author

sffc commented Dec 11, 2023

Here's a fully safe type ShortVecDataPayload where I got it to fit into 1 word plus struct size, except for the alignment niche (using an Rc instead of Option<Rc> cart):

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=f6c4fdb5b113407d0c1f0cc488fd39c4

And here's the union described in the previous post, which allows for a yoke with empty cart on the 0x1 discriminator (alignment niche):

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=b29c4df4935d6cd639d140ebbdde3373

@Manishearth
Copy link
Member

Change the cart from Option<Arc<Box<[u8]>>> to Arc<Option<Box<[u8]>>>. This requires allocating a pointer when creating a data payload via DataPayload::new_owned or when using DataPayload::with_mut, neither of which are used in the most common code paths (baked data and immutable postcard data)

This is incorrect: The Arc<Option> would always allocate for postcard. And due to how allocators work it's likely that this will end up allocating more memory overall depending on the allocator.


I don't think 3 will happen anytime soon. I think 2 will have a relevant perf impact and it's going to be tricky to do in a generic context (it's basically impossible to write your own niche logic like "Yoke uses this bitpacking but only when the cart type looks like this", and it breaks parametricity)

I'm not super happy about 1 either since it pessimizes postcard, personally I'd rather not do this than do any of the changes here.

@sffc
Copy link
Member Author

sffc commented Dec 11, 2023

Change the cart from Option<Arc<Box<[u8]>>> to Arc<Option<Box<[u8]>>>. This requires allocating a pointer when creating a data payload via DataPayload::new_owned or when using DataPayload::with_mut, neither of which are used in the most common code paths (baked data and immutable postcard data)

This is incorrect: The Arc<Option> would always allocate for postcard. And due to how allocators work it's likely that this will end up allocating more memory overall depending on the allocator.

Yes. My sentence was correct but incomplete. Postcard already uses an Arc, so there's no difference for Postcard, only a difference when using new_owned or with_mut.

I don't think 3 will happen anytime soon.

Ok. How hard is it? Is it blocked on bandwidth or on more fundamental design concerns?

I think 2 will have a relevant perf impact and it's going to be tricky to do in a generic context (it's basically impossible to write your own niche logic like "Yoke uses this bitpacking but only when the cart type looks like this", and it breaks parametricity)

Did you review the specific proposal for 2 that I posted in the sandbox in my previous reply?

I'm not super happy about 1 either since it pessimizes postcard, personally I'd rather not do this than do any of the changes here.

I don't think it pessimizes postcard. It pessimizes custom data adapters and components that use map_project or similar, which will be none after the datetime refactor. However I'm not super happy about it either.

@sffc
Copy link
Member Author

sffc commented Dec 11, 2023

I think 2 will have a relevant perf impact

I'm not too concerned adding an extra branch or few cycles to this especially since we've designed our APIs with the assumption that DataPayload::get is not completely free (this is why everything has as_borrowed). Also based on my example I think I can write .get() to be the same complexity as it currently is (since we already branch on the enum).

@Manishearth
Copy link
Member

Ok. How hard is it? Is it blocked on bandwidth or on more fundamental design concerns?

Unclear, it's not even in the stage of discussion where it's something the lang/compiler teams are tracking. It's an idea someone had that garnered some discussion, with some positive-to-neutral comments from team members but nothing concrete. Someone needs to put in the effort to talk to the teams and maybe write a proper RFC. It'll still be blocked on bandwidth then.

There may be perf tradeoff concerns but probably not for this particular optimization.

Yes. My sentence was correct but incomplete. Postcard already uses an Arc, so there's no difference for Postcard, only a difference when using new_owned or with_mut.

Ah I see. Yeah I guess.

Did you review the specific proposal for 2 that I posted in the sandbox in my previous reply?

It doesn't do it in a generic context. You're using specific types there, there's no way to fit that into Yoke<Y, C>.

I'm not a huge fan of having extra Yoke types that are not Yoke: the Yoke logic is quite self-contained and I'd like to keep it that way.

I don't think it pessimizes postcard. It pessimizes custom data adapters and components that use map_project or similar, which will be none after the datetime refactor. However I'm not super happy about it either.

Yeah in that case I think 1 is probably the best path forward but I'm still not happy about it and not sure if we should do this at all.

@sffc
Copy link
Member Author

sffc commented Dec 12, 2023

It doesn't do it in a generic context. You're using specific types there, there's no way to fit that into Yoke<Y, C>.

I'm not a huge fan of having extra Yoke types that are not Yoke: the Yoke logic is quite self-contained and I'd like to keep it that way.

With the union approach I would always convert back to a regular Yoke before doing anything with it. It's just a memory layout optimization.

@robertbastian
Copy link
Member

Change the cart from Option<Arc<Box<[u8]>>> to Arc<Option<Box<[u8]>>>. This requires allocating a pointer when creating a data payload via DataPayload::new_owned or when using DataPayload::with_mut, neither of which are used in the most common code paths (baked data and immutable postcard data). Playground

I'm fine with this.

Implement some custom unsafe bitpacking like we managed to do in ZeroVec.

I assume this is what the open PR does? I don't like how much unsafe code that adds.

Wait for Rust to add alignment niches.

image

Add a feature flag to disable the Yoke variant, saving even more stack size, but at the cost of flexibility. Doesn't work with global feature resolution.

Very opposed. Global feature resolution means it will be tricky and brittle to actually get this size reduction.

@sffc sffc removed the needs-approval One or more stakeholders need to approve proposal label May 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-data-infra Component: provider, datagen, fallback, adapters
Projects
None yet
3 participants