Fix high memory usage when extracting subgraphs for some fed1 supergraphs #2089

pcmanus · 2022-08-24T16:57:07Z

As mentioned on #2085, some fed1 supergraphs can lead to very high memory usage, much higher than if the same subgraphs are recomposed with fed2. The underlying reason is that fed1 supergraphs have no join__type information for value types, so the code extracting subgraphs from supergraphs used to pessimistically put value types into all extracted subgraphs, on the grounds that 1) it was simple and 2) it was harmless from a correctness POV, since those were just types that ended up being unreachable.

However, when there are a lot of subgraphs, and most of those subgraphs define a decent number of value types (typically each subgraph defining its own set of non-shared value types), every extracted subgraph ends up with all the value types from all subgraphs. The sheer amount of data each subgraph ended up holding led to that high memory usage.

So the main fix of this PR consists of, for fed1 supergraphs, doing an additional pass when extracting the subgraphs to compute which types are actually reachable in each subgraph, and making sure we then don't add the unreachable types. Doing so ensures the extracted subgraphs are roughly the same as in fed2, and thus the memory consumption is comparable.
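To make the reachability pass concrete, here is a minimal sketch, assuming a heavily simplified model where each type just lists the names of the types it references. `TypeGraph`, `reachableTypes` and `valueTypesToKeep` are hypothetical names for illustration, not the code in `extractSubgraphsFromSupergraph.ts`.

```typescript
// Hypothetical, simplified model: each type name maps to the names of the
// types its fields and arguments reference.
type TypeGraph = Map<string, string[]>;

// Returns the names of all types reachable from the given starting types.
function reachableTypes(graph: TypeGraph, roots: string[]): Set<string> {
  const reachable = new Set<string>();
  const stack = [...roots];
  while (stack.length > 0) {
    const name = stack.pop()!;
    // Skip types already visited and names unknown to the graph (e.g. built-in scalars).
    if (reachable.has(name) || !graph.has(name)) continue;
    reachable.add(name);
    stack.push(...(graph.get(name) ?? []));
  }
  return reachable;
}

// A value type is only kept for a subgraph if something in that subgraph reaches it.
function valueTypesToKeep(graph: TypeGraph, roots: string[], valueTypes: string[]): string[] {
  const reachable = reachableTypes(graph, roots);
  return valueTypes.filter((t) => reachable.has(t));
}

const supergraphTypes: TypeGraph = new Map<string, string[]>([
  ['Query', ['Product']],
  ['Product', ['Money']],
  ['Money', []],
  ['Address', []], // value type that nothing in this subgraph references
]);
const kept = valueTypesToKeep(supergraphTypes, ['Query'], ['Money', 'Address']);
```

In this sketch `Address` is dropped because nothing reachable from `Query` references it, which is the effect the extra pass is after: unreachable value types are simply never added to the extracted subgraph.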

Additionally, this PR contains a few small (non-fed1-supergraph-specific) memory optimisations that came out of investigating this issue. Those optimisations lead to measurable memory differences on at least the example considered in this issue. The short version of those optimisations is:

- As we validate schemas, we convert them into graphQL-js ASTs (`DocumentNode`) to be able to reuse graphQL-js validations, and we used to cache the resulting ASTs. However, in hindsight, we don't really make use of this caching (in theory some code could make use of it, but I believe that's not the case currently). So not caching those avoids retaining that memory.
- There were a number of places in our schema abstractions where we were allocating empty arrays, sets or maps that, more often than not, were never populated. For instance, most graphQL elements can have directives applied to them, but most elements of most schemas don't get any. So for large schemas, simply avoiding those empty object allocations adds up somewhat.
- More of a cleanup than anything else, but while looking at heap profiles, I got confused by some schema objects that were coming from core-schema-js, and it turns out that's because that library has a `Schema` class too and was allocating some in globals. To be fair, those weren't using much memory, but we don't really use core-schema-js at all outside of a super simple error class that is easily replaced, so I took the opportunity to remove the dependency while at it.
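The second optimisation above (not allocating collections that usually stay empty) can be sketched roughly as follows. `SchemaElement` and its members are hypothetical stand-ins for illustration, not the actual classes in `definitions.ts`.

```typescript
// Hypothetical stand-in for a schema element that may carry directives.
class SchemaElement {
  // Left undefined until the first directive is applied, so the many elements
  // that never get a directive do not each hold on to an empty array.
  private _appliedDirectives?: string[];

  applyDirective(name: string): void {
    // Allocate the array lazily, on first use.
    (this._appliedDirectives ??= []).push(name);
  }

  get appliedDirectives(): readonly string[] {
    // Callers still see an array; the empty one here is short-lived.
    return this._appliedDirectives ?? [];
  }

  hasAppliedDirectives(): boolean {
    return this._appliedDirectives !== undefined && this._appliedDirectives.length > 0;
  }
}

const element = new SchemaElement();
const before = element.appliedDirectives.length;
element.applyDirective('deprecated');
const after = element.appliedDirectives.length;
```

The public API stays the same for callers; only the internal field becomes optional, which is why accessors need a `?.` or `??` fallback.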

StephenBarlow · 2022-08-24T18:06:11Z

docs/source/errors.md

@@ -91,6 +91,7 @@ The following errors might be raised during composition:
 | `UNKNOWN_FEDERATION_LINK_VERSION` | The version of federation in a @link directive on the schema is unknown. | 2.0.0 |  |
 | `UNKNOWN_LINK_VERSION` | The version of @link set on the schema is unknown. | 2.1.0 |  |
 | `UNSUPPORTED_FEATURE` | Indicates an error due to feature currently unsupported by federation. | 2.1.0 |  |
+| `UNSUPPORTED_LINKED_FEATURE` | Indicates that a feature used in a @link is either unsupported and is used with unsupported options. | 2.0.0 |  |


clenfest

I'm a little concerned about moving from Sets to arrays across the board, but if you're comfortable with it, it's ok with me. I left a few minor comments, but feel free to merge once you've cleaned those up.

clenfest · 2022-08-26T01:53:33Z

internals-js/src/definitions.ts

@@ -2271,15 +2290,12 @@ export class EnumType extends BaseNamedType<OutputTypeReferencer, EnumType> {
  }

  private removeValueInternal(value: EnumValue) {


Does this function have any reason to exist? It just calls another function and is only referenced once.

pcmanus · 2022-08-26

While that method is private, it's actually called from another class, side-stepping the privateness through `EnumType.prototype['removeValueInternal'].call()`. That's admittedly a bit of a hack, but the goal is to not expose it publicly yet still allow calling it from `EnumValue`. A bit of a poor man's `friend` emulation.

In any case, this method is just meant to abstract direct accesses to the `EnumType._values` field, so that only `EnumType` has to care what the type of that field is. Granted, it's not extremely important here, and possibly more of a personal preference, but I do prefer it over the alternative of inlining that method, which would require something like `(this._parent as any)._values` in `EnumValue`.
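For readers unfamiliar with the trick, here is a minimal sketch of that "friend" emulation. The class bodies are simplified assumptions for illustration; only the `EnumType.prototype['removeValueInternal'].call(...)` shape comes from the discussion above.

```typescript
class EnumType {
  private _values: EnumValue[] = [];

  addValue(name: string): EnumValue {
    const value = new EnumValue(name, this);
    this._values.push(value);
    return value;
  }

  get values(): readonly EnumValue[] {
    return this._values;
  }

  // Private: only EnumType needs to know how _values is stored.
  private removeValueInternal(value: EnumValue): void {
    const index = this._values.indexOf(value);
    if (index >= 0) this._values.splice(index, 1);
  }
}

class EnumValue {
  readonly name: string;
  private _parent?: EnumType;

  constructor(name: string, parent: EnumType) {
    this.name = name;
    this._parent = parent;
  }

  remove(): void {
    if (!this._parent) return;
    // Bracket access lets TypeScript reach a private member while staying type-checked.
    EnumType.prototype['removeValueInternal'].call(this._parent, this);
    this._parent = undefined;
  }
}

const color = new EnumType();
const red = color.addValue('RED');
color.addValue('BLUE');
red.remove();
```

Compared with `(this._parent as any)._values`, the bracket-notation call keeps the argument types checked and leaves the representation of `_values` entirely inside `EnumType`.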

clenfest · 2022-08-26T01:59:01Z

internals-js/src/definitions.ts

  }

  arguments(): readonly ArgumentDefinition<FieldDefinition<TParent>>[] {
-    return this._args.values();
+    return this._args ? this._args.values() : [];


Nit: You've been a little inconsistent about whether to use the ternary operator. For the most part you favored

return this._args?.values() ?? [];

So I'd stick with it.

clenfest · 2022-08-26T01:59:45Z

internals-js/src/definitions.ts

  }

  argument(name: string): ArgumentDefinition<FieldDefinition<TParent>> | undefined {
-    return this._args.get(name);
+    return this._args ? this._args.get(name) : undefined;


Same comment as above.

return this._args?.get(name);

clenfest · 2022-08-26T02:02:14Z

internals-js/src/definitions.ts

@@ -2589,7 +2608,7 @@ export class FieldDefinition<TParent extends CompositeType> extends NamedSchemaE
  }

  toString(): string {
-    const args = this._args.size == 0
+    const args = !this.hasArguments()


Rearrange so that the has-arguments case comes first? i.e.

const args = this.hasArguments() ? ... : ...;

clenfest · 2022-08-26T02:17:10Z

internals-js/src/extractSubgraphsFromSupergraph.ts

+      typeInfoInSubgraph,
+    );
+    reachableTypesBySubgraphs.set(subgraphName, reachableTypes);
+    console.log(`For ${subgraphName}, reachableTypes = [${[...reachableTypes]}]`);


leftover console.log

Fed1 supergraphs lack information about which subgraphs define each value type. We used to brute-force add all value types to all extracted subgraphs, on the idea that if some extracted subgraphs have a few unused types, it's useless but has no functional impact. Unfortunately, in some special cases (lots of subgraphs and value types), those useless types can lead to a significant increase in memory consumption. This patch instead looks at type reachability within subgraphs to avoid including those useless value types, and thus lowers memory consumption. Note that fed2 supergraphs are not affected by this problem, as they have all the information needed to only extract types in the proper subgraphs. Fixes apollographql#2085

This was retaining a potentially non-trivial amount of memory (on large schemas, that AST is not tiny) and afaict, none of our usages were relying on the caching much. The patch nonetheless introduces an option that allows re-enabling said caching, so it's easy to enable for specific use cases later if we want to.
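As an illustration of opt-in caching, here is a minimal sketch. The `Ast` type, the `toAst` method and the `cacheAST` option name are assumptions made for this example and should not be read as the library's actual API.

```typescript
// Hypothetical stand-in for a graphQL-js DocumentNode.
type Ast = { kind: 'Document'; definitions: string[] };

class Schema {
  private readonly typeNames: string[];
  private readonly cacheAST: boolean;
  private cachedAst?: Ast;
  conversions = 0; // counts how many times the AST was rebuilt, for illustration

  constructor(typeNames: string[], options: { cacheAST?: boolean } = {}) {
    this.typeNames = typeNames;
    // Caching is off by default so the AST is not retained after validation.
    this.cacheAST = options.cacheAST ?? false;
  }

  toAst(): Ast {
    if (this.cachedAst) return this.cachedAst;
    this.conversions++;
    const ast: Ast = { kind: 'Document', definitions: [...this.typeNames] };
    if (this.cacheAST) this.cachedAst = ast;
    return ast;
  }
}

const uncached = new Schema(['Query', 'Product']);
uncached.toAst();
uncached.toAst(); // rebuilt each time, nothing retained

const cached = new Schema(['Query', 'Product'], { cacheAST: true });
const first = cached.toAst();
const second = cached.toAst(); // same object, retained for the schema's lifetime
```

The trade-off is the usual one: without caching, each conversion pays the build cost again but the AST can be garbage-collected as soon as validation is done.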

pcmanus · 2022-08-26T08:23:31Z

> I'm a little concerned about moving from Sets to arrays across the board

It's a fair concern, and I should have expanded sooner on my reasoning.

First, I'm not suggesting we stop using `Set` altogether. There are still uses of them in the codebase, in fact. It's just that for relatively small collections, they have a not completely negligible memory overhead over arrays, without necessarily a performance edge.

In the case of `Schema`, given how it's used, I think we need to watch memory consumption a bit, maybe even trade a bit of pure performance for it, because the gateway/router reads/builds schemas only at startup, but then keeps them in memory for all execution (and I'm obviously not saying startup time doesn't matter at all, just that it's not the hottest path performance-wise either).

Anyway, the changes from set to array in the PR are for:

- `_extensions`: I can't imagine an element ever having more than a handful of extensions in practice, so those are almost guaranteed to be tiny. Using arrays for those is frankly probably even better performance-wise (not that it matters).
- `_referencers`: those are also probably pretty small on average, though some elements can admittedly have a large-ish number of referencers, but never more than there are schema elements, so we're not talking tens of millions either. We also weren't exposing those externally as a set, so the only thing sets were saving was the inclusion check on insertion. Overall, it might make the building of some extremely large schemas a bit slower, which again only impacts "startup", but in exchange it probably mostly just saves some memory with no other impact in all other cases. I absolutely could be wrong, but I suspect it's a good tradeoff overall (hence the change).
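To show what "the inclusion check on insertion" means once a set becomes an array, here is a minimal sketch. `NamedType` and its members are hypothetical, with referencers reduced to plain strings for brevity.

```typescript
// Hypothetical element that tracks which other elements reference it.
class NamedType {
  readonly name: string;
  // Plain array, allocated lazily; a Set would deduplicate for free but
  // costs more memory per instance for small collections.
  private _referencers?: string[];

  constructor(name: string) {
    this.name = name;
  }

  addReferencer(referencer: string): void {
    if (!this._referencers) {
      this._referencers = [referencer];
    } else if (!this._referencers.includes(referencer)) {
      // Linear scan: this is the work a Set would have avoided on insertion.
      this._referencers.push(referencer);
    }
  }

  referencers(): readonly string[] {
    return this._referencers ?? [];
  }
}

const money = new NamedType('Money');
money.addReferencer('Product.price');
money.addReferencer('Product.price'); // duplicate, ignored
money.addReferencer('Order.total');
```

Insertion becomes linear in the number of existing referencers, which only matters while building the schema and only for elements with many referencers.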

pcmanus mentioned this pull request Aug 24, 2022

Significantly higher memory usage in gateway 2.0.5 and 2.1.0-alpha #2085

Closed

pcmanus requested a review from StephenBarlow as a code owner August 24, 2022 17:19

pcmanus self-assigned this Aug 24, 2022

StephenBarlow reviewed Aug 24, 2022

Geal mentioned this pull request Aug 25, 2022

update to federation v2.1.0 apollographql/router#1546

Closed

clenfest approved these changes Aug 26, 2022

Sylvain Lebresne added 7 commits August 26, 2022 09:17

Remove depdendency on core-schema-js

f179d91

Avoids allocating empty arrays/sets/maps that are often not populated

b097535

This also changes a few sets into arrays, in cases where this doesn't make a meaningful performance difference, to save on memory usage.

Fixup tests

67b986e

Fixup error code doc

57e0ebd

Review updates + changelog

dc96ad7

pcmanus force-pushed the f2085-high-memory-fed1 branch from 46f1832 to dc96ad7 Compare August 26, 2022 07:49

pcmanus merged commit a593ac8 into apollographql:main Aug 26, 2022

Suggested change in docs/source/errors.md:

-| `UNSUPPORTED_LINKED_FEATURE` | Indicates that a feature used in a @link is either unsupported and is used with unsupported options. | 2.0.0 |  |
+| `UNSUPPORTED_LINKED_FEATURE` | Indicates that a feature used in a `@link` is either unsupported or is used with unsupported options. | 2.0.0 |  |
