Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is it possible to reduce allocations when interning a byte slice? #18

Closed
marselester opened this issue Jul 6, 2023 · 5 comments
Closed

Comments

@marselester
Copy link

Hi, thank you for creating an awesome interning package!

Could you please advice whether it's possible to reduce allocations when interning a byte slice (a buffer that changes on every iteration and might contain duplicate strings)?

// InternString doesn't allocate.
func InternString(s string) string {
	return intern.GetByString(s).Get().(string)
}

// InternBytes allocates.
func InternBytes(b []byte) string {
	return intern.GetByString(string(b)).Get().(string)
}

// UnsafeInternBytes doesn't allocate but b mustn't be modified.
func UnsafeInternBytes(b []byte) string {
	return intern.GetByString(unsafeString(b)).Get().(string)
}

func unsafeString(b []byte) string {
	return *(*string)(unsafe.Pointer(&b))
}

For example, when interning is implemented using map[string]string, it would allocate only if a string wasn't found.

type Strings map[string]string

func (ss Strings) InternBytes(b []byte) string {
	if original, isInterned := ss[string(b)]; isInterned {
		return original
	}

	s := string(b)
	ss[s] = s
	return s
}
@bradfitz
Copy link
Contributor

bradfitz commented Jul 6, 2023

Not with this package, no. If you don't care about growing your map forever you can use a map, as you show. This package cares about letting the GC still do its job and reclaim things.

@bradfitz bradfitz closed this as completed Jul 6, 2023
@oschwald
Copy link

oschwald commented Jul 7, 2023

We would like to avoid growing the map indefinitely, but we would also like to avoid the allocation caused by converting the []byte to a string in the case where the string is already interned.

I had hoped that we could just add GetByStringBytes(b []byte) *Value or similar to do this, but looking at the implementation of get, I don't think that would work as I don't think Go would optimize out the string allocation of valMap[key{s: string(b), isString: true}] the way it does with m[string(b)]. Perhaps a fork that only handles string values would make sense.

@josharian
Copy link
Collaborator

https://github.com/josharian/intern might (might!) be a better fit for you

@josharian
Copy link
Collaborator

We would like to avoid growing the map indefinitely, but we would also like to avoid the allocation

These are hard to automatically satisfy at the same time. See also the discussion at https://commaok.xyz/post/intern-strings/

@oschwald
Copy link

oschwald commented Jul 8, 2023

These are hard to automatically satisfy at the same time. See also the discussion at https://commaok.xyz/post/intern-strings/

Thanks for the link to the library and the article!

Looking at the implementation of github.com/go4org/intern, I think it is possible to get the desired behavior with a fork that replaces the current key with a string key. I believe this would lead to somewhat more predictable interning behavior than the implementation in github.com/josharian/intern, but it comes with the disadvantage of having to pass around *intern.Value instead of the underlying string.

Perhaps something like golang/go#43615 or other changes to the runtime will make all of this more convenient in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants