Increase default in-memory cache size or disable AST parsing #2227

smyrick · 2022-11-01T19:12:11Z

With the addition of #1878, by default in the gateway, we are now storing the full AST object in memory AND the query plan document for every operation. We turned on this feature without updating the default store size:

federation/gateway-js/src/index.ts

Lines 196 to 201 in 6bfec24

    
           // Create ~about~ a 30MiB InMemoryLRUCache.  This is less than precise 
        
           // since the technique to calculate the size of a DocumentNode is 
        
           // only using JSON.stringify on the DocumentNode (and thus doesn't account 
        
           // for unicode characters, etc.), but it should do a reasonable job at 
        
           // providing a caching document store for most operations. 
        
           maxSize: Math.pow(2, 20) * (approximateQueryPlanStoreMiB || 30),

Some enterprise customers are starting to see increased usage and scaling needs to now handle this change of usage which means a big jump in resources or time process cache evictions and the latter is causing some timeouts

If we are not going to increase the default cache size we might want to consider storing the AST in a separate store or turning it on with a flag so users of the parsed AST can opt in and can also be aware of the increased memory requirements

pcmanus · 2022-11-02T07:32:42Z

Yeah, this is probably my bad here for not thinking this through entirely, but I think the mistake was to make exposeDocumentNodeInFetchNode true by default: having those documents available is only useful to specific users that make explicit use of them, and it's easier to ask those few users to opt-in than increase the requirements to anyone else, forcing them to opt-out if necessary.

So my suggestion would be to use the upcoming 2.2 release to change that default (with a clear mention in the changelog): I reckon there is a risk of surprising someone relying on #1878, but 1) it's "for the greater good" and 2) it's unlikely to be widely used yet and we can explicitly ping the author of that PR.

That said, we can additionally improve the LRU cache size computation to check if the option is set and to increase its estimation in this case.

Since apollographql#1878 (added in 2.1.0), the full AST of subgraph fetch operations has been store in query plans. However, this is not used by the gateway by default, while it noticeably increases the size of query plans, leading to either a less efficient query plan cache (if its size is limited) or increased memory consumption/pressue (if the cache is given plenty of size). This patch disable this option by default. Users that explicitely need to access the subgraph fetch ASTs can still make the trade-off of additional memory pressure by manually using the `GatewayConfig.queryPlannerConfig.exposeDocumentNodeInFetchNode` config option. Fixes apollographql#2227.

Since #1878 (added in 2.1.0), the full AST of subgraph fetch operations has been store in query plans. However, this is not used by the gateway by default, while it noticeably increases the size of query plans, leading to either a less efficient query plan cache (if its size is limited) or increased memory consumption/pressue (if the cache is given plenty of size). This patch disable this option by default. Users that explicitely need to access the subgraph fetch ASTs can still make the trade-off of additional memory pressure by manually using the `GatewayConfig.queryPlannerConfig.exposeDocumentNodeInFetchNode` config option. Fixes #2227

samuelAndalon · 2022-11-02T18:31:16Z

@pcmanus i wonder what was the reason to expose the AST of fetch nodes on first place, i would image some processing before sending requests to subgraph, but that comes with a big cost, which is the memory usage of the query plan store, also, if the plan for long term is to allow clients to define their own store implementation like redis, having to serialize and deserialize the query plan might cause a CPU bottleneck.

pcmanus · 2022-11-02T20:16:58Z

would image some processing before sending requests to subgraph

That's my understanding, some sort of "processing". I could imagine a number of things that could be helped by having the request in "usable" form: conditional logging based on the request, fine-grained gateway-side authorisation, some cost-based rate-limiting, etc... Truth be told, this was a user request (see #1352 for the original ask) and I don't have a personal rational for it, but the ask felt reasonable so I saw no reason to oppose it. It definitively was a mistake to enable it by default, and that is entirely my bad for missing/underestimating that problem.

but that comes with a big cost, which is the memory usage of the query plan store

While it's true, if you do need access to the request on sent, re-parsing it every time (which you'd have to do if it's not kept around) can be rather costly CPU-wise and potentially add latency to your queries if whatever processing you do is not async, and more memory usage could well be a worthy trade-off in such cases.

Again, enabling this by default was a mistake. But letting everyone pick whichever trade-off make the most sense for his specific use case, I'm ok with.

pcmanus · 2022-11-18T10:32:09Z

Closing as fixed by #2230.

pcmanus self-assigned this Nov 2, 2022

pcmanus mentioned this issue Nov 2, 2022

Disable exposeDocumentNodeInFetchNode by default #2230

Merged

pcmanus mentioned this issue Nov 2, 2022

feat: Expose document representation of sub-query request within GraphQLDataSourceProcessOptions so that it is available to RemoteGraphQLDataSource.process and RemoteGraphQLDataSource.willSendRequest #1878

Merged

pcmanus added this to the 2.2.0 milestone Nov 3, 2022

This was linked to pull requests Nov 3, 2022

Fix validation of variable on input field not taking default into account #2176

Merged

Preserve default values for input object fields when extracting subgraph #2218

Merged

Disable exposeDocumentNodeInFetchNode by default #2230

Merged

This was unlinked from pull requests Nov 3, 2022

Fix validation of variable on input field not taking default into account #2176

Merged

Disable exposeDocumentNodeInFetchNode by default #2230

Merged

pcmanus linked a pull request Nov 3, 2022 that will close this issue

Disable exposeDocumentNodeInFetchNode by default #2230

Merged

pcmanus closed this as completed Nov 18, 2022

pcmanus mentioned this issue Nov 24, 2022

Router should support resolving interface implementations across subgraph #2193

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Increase default in-memory cache size or disable AST parsing #2227

Increase default in-memory cache size or disable AST parsing #2227

smyrick commented Nov 1, 2022 •

edited

Loading

pcmanus commented Nov 2, 2022

samuelAndalon commented Nov 2, 2022

pcmanus commented Nov 2, 2022

pcmanus commented Nov 18, 2022

Increase default in-memory cache size or disable AST parsing #2227

Increase default in-memory cache size or disable AST parsing #2227

Comments

smyrick commented Nov 1, 2022 • edited Loading

pcmanus commented Nov 2, 2022

samuelAndalon commented Nov 2, 2022

pcmanus commented Nov 2, 2022

pcmanus commented Nov 18, 2022

smyrick commented Nov 1, 2022 •

edited

Loading