-
Notifications
You must be signed in to change notification settings - Fork 257
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix high memory usage when extracting subgraphs for some fed1 supergraphs #2089
Conversation
👷 Deploy request for apollo-federation-docs pending review.Visit the deploys page to approve it
|
This pull request is automatically built and testable in CodeSandbox. To see build info of the built libraries, click here or the icon next to each commit SHA. |
docs/source/errors.md
Outdated
@@ -91,6 +91,7 @@ The following errors might be raised during composition: | |||
| `UNKNOWN_FEDERATION_LINK_VERSION` | The version of federation in a @link directive on the schema is unknown. | 2.0.0 | | | |||
| `UNKNOWN_LINK_VERSION` | The version of @link set on the schema is unknown. | 2.1.0 | | | |||
| `UNSUPPORTED_FEATURE` | Indicates an error due to feature currently unsupported by federation. | 2.1.0 | | | |||
| `UNSUPPORTED_LINKED_FEATURE` | Indicates that a feature used in a @link is either unsupported and is used with unsupported options. | 2.0.0 | | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| `UNSUPPORTED_LINKED_FEATURE` | Indicates that a feature used in a @link is either unsupported and is used with unsupported options. | 2.0.0 | | | |
| `UNSUPPORTED_LINKED_FEATURE` | Indicates that a feature used in a `@link` is either unsupported or is used with unsupported options. | 2.0.0 | | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little concerned about moving from Sets to arrays across the board, but if you're comfortable with it, it's ok with me. I left a few minor comments, but feel free to merge once you've cleaned those up.
@@ -2271,15 +2290,12 @@ export class EnumType extends BaseNamedType<OutputTypeReferencer, EnumType> { | |||
} | |||
|
|||
private removeValueInternal(value: EnumValue) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this function have any reason to exist? It just calls another function and is only referenced once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While that method is private, it's actually called from another class, side-stepping the privateness through EnumType.prototype['removeValueInternal'].call()
. Which is admittedly a bit of a hack but the goal is to not expose it publicly yet still allow calling it from EnumValue
. Bit of a poor-man friend
emulation.
In any case, in that situation, this method is just meant to abstract direct accesses to the EnumType._values
field so only EnumType
has to care what the type of that field. Granted, not extremely important in this situation, and possibly more of a personal preference in that case, but I do prefer it over the alternative (of inline that method, which would require to have something like (this._parent as any)._values
in EnumValue
in particular).
internals-js/src/definitions.ts
Outdated
} | ||
|
||
arguments(): readonly ArgumentDefinition<FieldDefinition<TParent>>[] { | ||
return this._args.values(); | ||
return this._args ? this._args.values() : []; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: You've been a little inconsistent about whether to use the ternary operator. For the most part you favored
return this._args?.values() ?? [];
So I'd stick with it.
internals-js/src/definitions.ts
Outdated
} | ||
|
||
argument(name: string): ArgumentDefinition<FieldDefinition<TParent>> | undefined { | ||
return this._args.get(name); | ||
return this._args ? this._args.get(name) : undefined; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above.
return this._args?.get(name);
internals-js/src/definitions.ts
Outdated
@@ -2589,7 +2608,7 @@ export class FieldDefinition<TParent extends CompositeType> extends NamedSchemaE | |||
} | |||
|
|||
toString(): string { | |||
const args = this._args.size == 0 | |||
const args = !this.hasArguments() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rearrange so that having arguments case is first? i.e.
const args = this.hasArguments() ? ... : ...;
typeInfoInSubgraph, | ||
); | ||
reachableTypesBySubgraphs.set(subgraphName, reachableTypes); | ||
console.log(`For ${subgraphName}, reachableTypes = [${[...reachableTypes]}]`); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leftover console.log
This also change a few sets into array in case where this doesn't make a meaningful difference performance wise to save on memory usage.
Fed1 supergraph lacks information on value types regarding which subgraphs defines them. We use to brute-force add all value types to all extract subgraphs, on the idea that if some extracted subgraphs have a few unused types, it's useless but has no functional impact. Unfortunately, in some special cases (lots of subgraphs and value types), those useless types can lead to a significant increase in memory consumptions. This patch instead look at type reachability within subgraphs to avoid including those useless value types, and thus lower the memory consumptions. Note that fed2 supergraphs are not affected by this problem has they have all the information needed to only extract types in the proper subgraphs. Fixes apollographql#2085
This was retaining a potentially non-trivial amount of memory (on large schema, that AST is not tiny) and afaict, none of our usage was relying on the caching much. The patch nonetheless introduce an option that allow to easily re-enable said caching so it's easy to enabled for specific use case later if we want to.
46f1832
to
dc96ad7
Compare
It's a fair concern, and I should have expanded sooner on my reasoning. First, I'm not suggesting we stop using In the case of Anyway, the changes from set to array in the PR are for:
|
As mentioned on #2085, some fed1 supergraphs can lead to very high memory usage, much larger than if the same subgraphs are recomposed with fed2. And the underlying reason is that fed1 supergraph have not
join__type
information for value types, so the code extracting subgraphs from supergraphs used to put value types into all extracted subgraphs pessimistically on account that while 1) it was simple and 2) it was harmless from a correctness POV since those were just types that ended up being unreachable.However, in the case where there is a lot of subgraph, and most of those subgraphs define a decent number of value types (typically each subgraph defining it's own set of (non-shared) value types), then it means that every extracted subgraph had all the value types from all subgraphs. And the sheer amount of data each subgraph ended up having led to that high memory usage.
So the main fix of this PR consists in, for fed1 supergraph, doing an additional pass when extracting the subgraphs to compute which types are actually reachable in each subgraph and make sure we then don't add the unreachable types. Doing so ensure the extracted subgraphs are roughly the same as in fed2 and thus the memory consumptions are comparable.
Additionally, this PR contains a few (non-fed1-supergraph-specific) small optimisations for memory that comes from investigating this issue. Those optimisations leads to measurable memory differences on at least the example considered on this issue. The short version of those optimisations are:
DocumentNode
) to be able to re-use graphQL-js validations and we used to cache the resulting ASTs. However, in hindsight, we don't really make use of this caching (in theory some code could make use of it, but I believe that's not the case currently). So not caching those avoid retaining that memory.core-schema-js
and turns out that it's because that library has aSchema
class too and was allocating some in globals. To be fair, those weren't using much memory, but we don't really usecore-schema-js
at all outside of a super simple error class that is easily replaced, so I took the opportunity to remove the dependency while at it.