Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CloneIn unnecessarily copies strings when cloning within same allocator #96

Open
overlookmotel opened this issue Aug 15, 2024 · 2 comments

Comments

@overlookmotel
Copy link

overlookmotel commented Aug 15, 2024

https://github.com/oxc-project/oxc/blob/2476dceee0a656a00d5138e0aeb34b5c062883d8/crates/oxc_span/src/atom.rs#L62-L68

https://github.com/oxc-project/oxc/blob/2476dceee0a656a00d5138e0aeb34b5c062883d8/crates/oxc_span/src/atom.rs#L76-L80

So Atom::clone_in copies the underlying string data and re-allocates it in new allocator.

This is the correct behavior when cloning from one allocator into another allocator - string data does need to be copied into the new allocator.

But when cloning within same allocator, this string copying is unnecessary. Could just do Atom::clone, which creates another reference to original underlying string data, with no data copy.

Not sure how to avoid this copying. Do we need a 2nd cloning trait for within-same-allocator cloning? And how would we enforce that the allocator provided to clone_in is the same allocator as the original? (or maybe we don't need to exactly - 'old_alloc: 'new_alloc would suffice to ensure that the string data in old allocator lives longer than the new Atom).

@overlookmotel
Copy link
Author

I think perhaps what we need is 2 different traits: CloneIn and CloneInto.

CloneInto

This is current CloneIn, just renamed to CloneInto. It can clone into a different allocator.

It uses 2 x lifetimes - 'old_alloc and 'new_alloc.

pub trait CloneInto<'new_alloc>: Sized {
    type Cloned;
    fn clone_into(&self, allocator: &'new_alloc Allocator) -> Self::Cloned;
}

impl<'old_alloc, 'new_alloc, T, C> CloneInto<'new_alloc> for Box<'old_alloc, T>
where
    T: CloneInto<'new_alloc, Cloned = C>,
{
    type Cloned = Box<'new_alloc, C>;
    fn clone_into(&self, allocator: &'new_alloc Allocator) -> Self::Cloned {
        Box::new_in(self.as_ref().clone_into(allocator), allocator)
    }
}

impl<'old_alloc, 'new_alloc> CloneInto<'new_alloc> for &'old_alloc str {
    type Cloned = &'new_alloc str;
    fn clone_into(&self, allocator: &'new_alloc Allocator) -> Self::Cloned {
        // String data is copied into new arena
        allocator.alloc_str(self)
    }
}

New CloneIn

This clones only within same allocator.

It uses only 1 lifetime 'alloc - the output has same lifetime as the input.

pub trait CloneIn: Sized {
    fn clone_in(&self, allocator: &'alloc Allocator) -> Self;
}

impl<'alloc, T> CloneIn<'alloc> for Box<'alloc, T>
where
    T: CloneIn<'alloc>,
{
    fn clone_in(&self, allocator: &'alloc Allocator) -> Self {
        Box::new_in(self.as_ref().clone_in(allocator), allocator)
    }
}

impl<'alloc> CloneIn<'alloc> for &'alloc str {
    fn clone_in(&self, _: &'alloc Allocator) -> Self {
        // String data is not copied
        self
    }
}

Probably CloneIn can have a blanket impl impl<'a> CloneIn<'a> for T where T: Clone (where CloneInto cannot).

FromInto / IntoInto?

If we go this route, would we also want to provide separate FromIn/IntoIn and FromInto/IntoInto traits which convert within same allocator / different allocator? (probably need better naming though! IntoInto??)

Is this worth it?

Having 2 traits is an extra complication. However, personally I think it's worth it because:

  • Within Oxc, CloneIn (clone within same allocator) is the one we'll use most. So it's good to have an implementation which is optimized for our common use case.
  • String copying is relatively expensive because strings are not sized, so it tends to involve a call to memcpy. Whereas copying sized types (most of cloning) can often be expressed with static size inline load/store operations.
  • Some strings (e.g. the Atoms inside StringLiterals) can be quite large. So again, relatively expensive to copy.
  • Thanks to ast_tools, creating a whole new trait and implementing it on all types is not as horrendous amount of work as it would be writing it by hand.

@rzvxa You are the one who knows these traits best. What do you think?

@overlookmotel
Copy link
Author

One further thought:

Alternative for cloning within same allocator

The root cause of why we need to call clone_in with an &Allocator is that oxc_allocator::Box does not hold a reference to its allocator internally. So when cloning a Box, you have to provide the Allocator.

If we encoded a reference to allocator within Box (#18 (comment)), then we could make the whole AST Clone, and then we'd only need CloneInto, and not CloneIn. However:

  1. As discussed on that issue, that approach has some drawbacks, and even if we do go for it in the end, it is unlikely to happen any time soon.
  2. CloneIn is not such a commonly-used API, so perhaps not worth bending over backwards to make the AST Clone. We can maybe get more juice out of spare bits in pointers by using them for other more common needs.

Therefore, regardless of what may become possible in future, I feel we'd be better off introducing the clone-within-same-allocator version of CloneIn now, rather than waiting for Clone to (maybe) become possible.

@Boshen Boshen transferred this issue from oxc-project/oxc Aug 23, 2024
Dunqing pushed a commit to oxc-project/oxc that referenced this issue Nov 6, 2024
…ne_in` on `LabelIdentifier` (#7172)

Follow-on after stack up to #7148.

Small optimization. `LabelIdentifier` is `Clone` so we can use `Clone::clone` to clone it, instead of `CloneIn::clone_in`.

The difference is that `clone_in` makes a 2nd copy of the `Atom`'s string in the arena, whereas `clone` creates a new `Atom` referencing the same string. This is an unfortunate shortcoming of `CloneIn` (oxc-project/backlog#96), but the more we can avoid `CloneIn`, the better.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant