`CloneIn` unnecessarily copies strings when cloning within same allocator #96

overlookmotel · 2024-08-15T14:11:01Z

https://github.com/oxc-project/oxc/blob/2476dceee0a656a00d5138e0aeb34b5c062883d8/crates/oxc_span/src/atom.rs#L62-L68

https://github.com/oxc-project/oxc/blob/2476dceee0a656a00d5138e0aeb34b5c062883d8/crates/oxc_span/src/atom.rs#L76-L80

So Atom::clone_in copies the underlying string data and re-allocates it in new allocator.

This is the correct behavior when cloning from one allocator into another allocator - string data does need to be copied into the new allocator.

But when cloning within same allocator, this string copying is unnecessary. Could just do Atom::clone, which creates another reference to original underlying string data, with no data copy.

Not sure how to avoid this copying. Do we need a 2nd cloning trait for within-same-allocator cloning? And how would we enforce that the allocator provided to clone_in is the same allocator as the original? (or maybe we don't need to exactly - 'old_alloc: 'new_alloc would suffice to ensure that the string data in old allocator lives longer than the new Atom).

The text was updated successfully, but these errors were encountered:

overlookmotel · 2024-08-19T13:16:44Z

I think perhaps what we need is 2 different traits: CloneIn and CloneInto.

`CloneInto`

This is current CloneIn, just renamed to CloneInto. It can clone into a different allocator.

It uses 2 x lifetimes - 'old_alloc and 'new_alloc.

pub trait CloneInto<'new_alloc>: Sized {
    type Cloned;
    fn clone_into(&self, allocator: &'new_alloc Allocator) -> Self::Cloned;
}

impl<'old_alloc, 'new_alloc, T, C> CloneInto<'new_alloc> for Box<'old_alloc, T>
where
    T: CloneInto<'new_alloc, Cloned = C>,
{
    type Cloned = Box<'new_alloc, C>;
    fn clone_into(&self, allocator: &'new_alloc Allocator) -> Self::Cloned {
        Box::new_in(self.as_ref().clone_into(allocator), allocator)
    }
}

impl<'old_alloc, 'new_alloc> CloneInto<'new_alloc> for &'old_alloc str {
    type Cloned = &'new_alloc str;
    fn clone_into(&self, allocator: &'new_alloc Allocator) -> Self::Cloned {
        // String data is copied into new arena
        allocator.alloc_str(self)
    }
}

New `CloneIn`

This clones only within same allocator.

It uses only 1 lifetime 'alloc - the output has same lifetime as the input.

pub trait CloneIn: Sized {
    fn clone_in(&self, allocator: &'alloc Allocator) -> Self;
}

impl<'alloc, T> CloneIn<'alloc> for Box<'alloc, T>
where
    T: CloneIn<'alloc>,
{
    fn clone_in(&self, allocator: &'alloc Allocator) -> Self {
        Box::new_in(self.as_ref().clone_in(allocator), allocator)
    }
}

impl<'alloc> CloneIn<'alloc> for &'alloc str {
    fn clone_in(&self, _: &'alloc Allocator) -> Self {
        // String data is not copied
        self
    }
}

Probably CloneIn can have a blanket impl impl<'a> CloneIn<'a> for T where T: Clone (where CloneInto cannot).

`FromInto` / `IntoInto`?

If we go this route, would we also want to provide separate FromIn/IntoIn and FromInto/IntoInto traits which convert within same allocator / different allocator? (probably need better naming though! IntoInto??)

Is this worth it?

Having 2 traits is an extra complication. However, personally I think it's worth it because:

Within Oxc, CloneIn (clone within same allocator) is the one we'll use most. So it's good to have an implementation which is optimized for our common use case.
String copying is relatively expensive because strings are not sized, so it tends to involve a call to memcpy. Whereas copying sized types (most of cloning) can often be expressed with static size inline load/store operations.
Some strings (e.g. the Atoms inside StringLiterals) can be quite large. So again, relatively expensive to copy.
Thanks to ast_tools, creating a whole new trait and implementing it on all types is not as horrendous amount of work as it would be writing it by hand.

@rzvxa You are the one who knows these traits best. What do you think?

overlookmotel · 2024-08-19T13:24:09Z

One further thought:

Alternative for cloning within same allocator

The root cause of why we need to call clone_in with an &Allocator is that oxc_allocator::Box does not hold a reference to its allocator internally. So when cloning a Box, you have to provide the Allocator.

If we encoded a reference to allocator within Box (#18 (comment)), then we could make the whole AST Clone, and then we'd only need CloneInto, and not CloneIn. However:

As discussed on that issue, that approach has some drawbacks, and even if we do go for it in the end, it is unlikely to happen any time soon.
CloneIn is not such a commonly-used API, so perhaps not worth bending over backwards to make the AST Clone. We can maybe get more juice out of spare bits in pointers by using them for other more common needs.

Therefore, regardless of what may become possible in future, I feel we'd be better off introducing the clone-within-same-allocator version of CloneIn now, rather than waiting for Clone to (maybe) become possible.

…ne_in` on `LabelIdentifier` (#7172) Follow-on after stack up to #7148. Small optimization. `LabelIdentifier` is `Clone` so we can use `Clone::clone` to clone it, instead of `CloneIn::clone_in`. The difference is that `clone_in` makes a 2nd copy of the `Atom`'s string in the arena, whereas `clone` creates a new `Atom` referencing the same string. This is an unfortunate shortcoming of `CloneIn` (oxc-project/backlog#96), but the more we can avoid `CloneIn`, the better.

Boshen transferred this issue from oxc-project/oxc Aug 23, 2024

overlookmotel mentioned this issue Nov 6, 2024

refactor(transformer/async-generator-functions): use clone not clone_in on LabelIdentifier oxc-project/oxc#7172

Merged

overlookmotel mentioned this issue Jan 20, 2025

feat(oxc_allocator): Add new method clone_in_with_semantic_ids for CloneIn trait oxc-project/oxc#8608

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`CloneIn` unnecessarily copies strings when cloning within same allocator #96

`CloneIn` unnecessarily copies strings when cloning within same allocator #96

overlookmotel commented Aug 15, 2024 •

edited

Loading

overlookmotel commented Aug 19, 2024

overlookmotel commented Aug 19, 2024

CloneIn unnecessarily copies strings when cloning within same allocator #96

CloneIn unnecessarily copies strings when cloning within same allocator #96

Comments

overlookmotel commented Aug 15, 2024 • edited Loading

overlookmotel commented Aug 19, 2024

CloneInto

New CloneIn

FromInto / IntoInto?

Is this worth it?

overlookmotel commented Aug 19, 2024

Alternative for cloning within same allocator

`CloneIn` unnecessarily copies strings when cloning within same allocator #96

`CloneIn` unnecessarily copies strings when cloning within same allocator #96

overlookmotel commented Aug 15, 2024 •

edited

Loading

`CloneInto`

New `CloneIn`

`FromInto` / `IntoInto`?