legacyLoadExtensionFieldInfo causes unbounded memory growth with dynamically created resolvers #1521
This is an unfortunate side-effect of heavy use of dynamic types, and in many ways it is no different from heavy use of Go reflection, where you dynamically create new Go types (golang/go#28783) and use them with various Go stdlib packages; many of those packages also use the equivalent of a cache. Fundamentally, we need something like golang/go#43615, where a cache knows it can reclaim an object because it holds the very last reference.
You're right, a weak reference would also solve the problem wonderfully. For this specific case, however, my question is: does this value need to be cached at all? I could be misunderstanding the code, but to me it looks like this caching hasn't worked as intended for 3+ years. Skipping the caching altogether would presumably retain all of the performance benefits that were realized by replacing the old cache, with none of the side effects related to unbounded growth.
Possibly? It's been a while since I've been regularly involved with protobuf. I wouldn't be surprised if there is code depending on the identity of the returned value. Essentially:

xi := legacyLoadExtensionFieldInfo(xt)

guarantees that xi == legacyLoadExtensionFieldInfo(xt) on every future call.

Half of the caching algorithm has never worked (and we should fix that), but to my understanding it still ensures the property above. It's just that it unfortunately always does the heavy makeExtensionFieldInfo work.
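The identity property described above can be sketched with a minimal, self-contained example. All names here (fieldInfo, load, cache) are hypothetical stand-ins for the real internal types, not the actual protobuf internals; the point is only that a working sync.Map-based cache makes repeated lookups return the identical pointer.

```go
package main

import (
	"fmt"
	"sync"
)

// fieldInfo stands in for the internal extensionFieldInfo type (hypothetical).
type fieldInfo struct{ name string }

// cache maps a key to a *fieldInfo; LoadOrStore makes one winner canonical.
var cache sync.Map

// load returns the same *fieldInfo pointer for the same key on every call,
// which is the identity guarantee a working cache would provide.
func load(key string) *fieldInfo {
	if v, ok := cache.Load(key); ok {
		return v.(*fieldInfo)
	}
	v, _ := cache.LoadOrStore(key, &fieldInfo{name: key})
	return v.(*fieldInfo)
}

func main() {
	a := load("ext.field")
	b := load("ext.field")
	fmt.Println(a == b) // true: repeated calls yield an identical pointer
}
```

Note that LoadOrStore, not a plain Store, is what makes this safe under concurrency: if two goroutines race to populate the same key, both end up with the single value that won.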
That's definitely a valid use case, though I don't see any places where that is happening based on the references to legacyLoadExtensionFieldInfo.
You are of course correct that, as currently implemented, this function will return the same pointer for a given input, which is certainly an important component of caching, though if I were to guess from the commit messages I'd assume that performance was a primary criterion. In this case, all existing tests continue to pass, and the current performance would be largely unaffected by removing the cache. I've just been unable to find a case where a pointer difference would affect correctness, though I may be missing something (and there's no telling whether such a requirement might materialize in the future). Admittedly, I haven't done any performance testing with the caching fixed to say whether or not there are significant gains from skipping the cache. I have some bias here due to the way the caching implementation affects our memory profile, but I can understand the reasoning for fixing the cache rather than removing it.
Instead of a user-facing option, it would be nice if the cache were simply fixed or removed. Workloads that make use of dynamic protos should not need to be restarted periodically due to unbounded memory usage, so this seems like a serious bug.
I've done the same exercise of examining all usages of legacyLoadExtensionFieldInfo. Another way to implement the cache of computed extension field info would be to store it on the extension type value itself.

One case is already special-cased and handled this way. So instead of something like a package-level sync.Map, the loader could type-assert for an optional interface and let each extension type memoize its own info. The downside to such an approach (aside from one new exported type in the total exported surface area) is that existing implementations of protoreflect.ExtensionType would not implement the interface and would always take the slow path. But the performance impact could be easily fixed by having them embed the new type into their implementation. Here's some pseudo-code to riff on this idea:

// In the internal/impl package
//------------------------------
type ExtensionFieldInfo struct {
	init sync.Once
	info *extensionFieldInfo
}

func (e *ExtensionFieldInfo) getExtensionInfo(xt protoreflect.ExtensionType) *extensionFieldInfo {
	e.init.Do(func() {
		e.info = makeExtensionFieldInfo(xt.TypeDescriptor())
	})
	return e.info
}

type extensionFieldInfoHolder interface {
	getExtensionInfo(xt protoreflect.ExtensionType) *extensionFieldInfo
}

func legacyLoadExtensionFieldInfo(xt protoreflect.ExtensionType) *extensionFieldInfo {
	// !! No more legacyExtensionFieldInfoCache !!
	if holder, ok := xt.(extensionFieldInfoHolder); ok {
		return holder.getExtensionInfo(xt)
	}
	// Slow path for things that don't opt into embedding info
	return makeExtensionFieldInfo(xt.TypeDescriptor())
}

// In the protoimpl package
//------------------------------

// An exported alias for the type above.
type ExtensionFieldInfo = impl.ExtensionFieldInfo
// In the dynamicpb package
//------------------------------
// This demonstrates how easy it is to use:
type extensionType struct {
	desc extensionTypeDescriptor

	// Just embed this type to cache extension metadata
	// for performance -- and that's it!
	protoimpl.ExtensionFieldInfo
}
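To make the embed-and-type-assert pattern in the pseudo-code above concrete, here is a self-contained sketch. Every name in it (info, InfoHolder, loadInfo, extType) is hypothetical and stands in for the real protobuf internals; it shows that a type opting in by embedding the memoizer gets one-time computation and stable identity, with no package-level map pinning anything to the heap.

```go
package main

import (
	"fmt"
	"sync"
)

// info stands in for the computed extension metadata (hypothetical).
type info struct{ payload string }

var computations int

func computeInfo(name string) *info {
	computations++ // expensive work, counted for demonstration
	return &info{payload: name}
}

// InfoHolder is the embeddable memoizer, analogous to the proposed
// ExtensionFieldInfo type above.
type InfoHolder struct {
	once sync.Once
	info *info
}

func (h *InfoHolder) getInfo(name string) *info {
	h.once.Do(func() { h.info = computeInfo(name) })
	return h.info
}

// holder is the optional interface the loader type-asserts against.
type holder interface{ getInfo(name string) *info }

// loadInfo prefers the embedded per-value cache; there is no global map,
// so nothing keeps the extension type alive after its last use.
func loadInfo(x any, name string) *info {
	if h, ok := x.(holder); ok {
		return h.getInfo(name)
	}
	return computeInfo(name) // slow path for types that don't opt in
}

// extType opts in simply by embedding InfoHolder.
type extType struct {
	InfoHolder
}

func main() {
	x := &extType{}
	a := loadInfo(x, "f")
	b := loadInfo(x, "f")
	fmt.Println(a == b, computations) // true 1: computed once, identity kept
}
```

Because the methods of the embedded *InfoHolder are promoted, *extType satisfies the holder interface automatically, which is what makes opting in a one-line change.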
I didn't originally grok everything that @marc-chiesa was saying about the cache being broken. But now I understand.
This is not possible. The bug described, which suggests caching has not worked for 3 years (now 4, since it's a year later), means that the function recomputes its result every time it is called.
This is incorrect. The caching bug (which has existed since this code was first introduced) means that it always returns a newly computed value.

So, because of the caching bug, it is clear that none of this is necessary, since it doesn't actually do anything useful today. It merely functions to pin unused objects to the heap. As far as stable identity is concerned, the claim that

xi := legacyLoadExtensionFieldInfo(xt)

guarantees that xi == legacyLoadExtensionFieldInfo(xt) on every future call simply does not hold while the cache is broken.

@dsnet, is it possible that is what you were thinking of? That is also what prevents my suggestion above from working -- we can't embed anything mutable into those types. So I take back the whole proposal above. Clearly the simplest solution here is to just remove the cache.
I've proposed what I think is the safest and most sane solution here: https://go-review.googlesource.com/c/protobuf/+/560095
Resolves golang/protobuf#1521

In the internal/impl package, a helper computes metadata used for serialization and deserialization of extension values. It features a package var of type sync.Map that is used as a cache. This ostensibly was for performance; however, it has never worked, because the code that updates the cache inserts entries into the wrong map. (These erroneous entries do not cause any issues because they are keys that never conflict with those used in valid queries.)

Instead of a one-line fix to have this code update the correct cache, this change removes the cache altogether. The existence of the cache means that once a protoreflect.ExtensionType is used to serialize or deserialize a message, it can *never* be garbage collected. Workloads that are long-lived servers using dynamic messages and extensions based on user requests will exhibit unbounded growth in memory usage as the cache only gets larger and larger.

Since the cache has never worked, any advantages it ostensibly conferred have not been missed. So this fixes the unbounded memory growth instead.

Change-Id: I15957fd8521852f9f7f9f89db7ebfd7170d85202
tl;dr: I agree. The offending function is legacyLoadExtensionFieldInfo.

This function contains a bug: it's clearly supposed to add a new entry to legacyExtensionFieldInfoCache, but it actually stores the entry in legacyMessageTypeCache instead.

In addition, while I might be missing something, I think this bug may not be user-visible. I think the only observable effect is that the intended caching never actually happens.
The unfortunate nature of the cache…
What version of protobuf and what language are you using?
Version: google.golang.org/protobuf v1.28.1 (Go)
What did you do?
I am periodically creating a protoregistry.Types resolver with a refreshed set of FileDescriptorProto messages containing the latest proto definitions used by various services.
We use UnmarshalOptions to specify a resolver for extensions. Every unmarshal via UnmarshalOptions with a new resolver causes in-use memory to grow. The memory continues to grow each time a new resolver is used, even though previous resolvers and their contained types are no longer referenced by our code.
What did you expect to see?
Memory should eventually reach a reasonable steady-state range for a given load.
What did you see instead?
Memory continues to grow every time a new resolver is used.
This seems to be caused by this function in codec_extension.go. The sync.Map caching does not appear to be implemented correctly (nothing ever stores to legacyExtensionFieldInfoCache; instead it stores in legacyMessageTypeCache), though the presence of even the attempted caching appears to be the root of the problem, and fixing the cache would not resolve our issue.

Instead, in order to support dynamically recreated resolvers without the possibility of unbounded memory growth, it would seem the following change to getExtensionFieldInfo would be adequate: it actually does less work than the current implementation but accomplishes the same thing, since legacyExtensionFieldInfoCache is empty and makeExtensionFieldInfo will run every time.

Another possible solution would be to add an option to skip the caching mechanism, though this would also require public options in other areas (such as mergeOptions) because they use the same code path.