-
Notifications
You must be signed in to change notification settings - Fork 257
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Query planning performance improvements #2610
Conversation
👷 Deploy request for apollo-federation-docs pending review.Visit the deploys page to approve it
|
🦋 Changeset detectedLatest commit: faf6e59 The changes in this PR will be included in the next version bump. This PR includes changesets to release 7 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
This pull request is automatically built and testable in CodeSandbox. To see build info of the built libraries, click here or the icon next to each commit SHA. |
@@ -810,7 +813,21 @@ class FetchGroup { | |||
// key for that. Having it here saves us from re-computing it more than once. | |||
readonly subgraphAndMergeAtKey?: string, | |||
private cachedCost?: number, | |||
// Cache used to save unecessary recomputation of the `isUseless` method. | |||
private isKnownUseful: boolean = false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be better to cache the status either way? i.e. the type should be boolean | undefined
and you can just return the value if it's not undefined and run isUseful() otherwise? Obviously you should change the variable name if you do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it'd be better. First, as I mentioned in one of the comment of the patch, when a group is shown to be useless, it is removed right away, so it doesn't really make matter whether that information is cached or not But adding to that, to properly cache the status either way, we'd have to add invalidation for that other way: that, changes to the group selection
would have to invalid this cache, when currently only change to the inputs
(which happens to be less frequent in practice) does so. So implementing the other would make things possibly less efficient overall.
// Given a list of just computed indirect paths and a field that we're trying to advance after those paths, this | ||
// method fields any path that should note be considered. | ||
// | ||
// Currently, this handle the case where the key used at the end of the indirect path contains (at top level) the field | ||
// being queried. Or to make this more concrete, if we're trying to collect field `id`, and the path last edge was using | ||
// key `id`, then we can ignore that path because this imply that there is a way to `id` "some other way" (also see | ||
// the `does not evaluate plans relying on a key field to fetch that same field` test in `buildPlan` for more details). | ||
function filterNonCollectingPathsForField<V extends Vertex>( | ||
paths: OpIndirectPaths<V>, | ||
field: Field, | ||
): OpIndirectPaths<V> { | ||
// We only handle leafs. Things are more complex non-leaf. | ||
if (!field.isLeafField()) { | ||
return paths; | ||
} | ||
|
||
const filtered = paths.paths.filter((p) => { | ||
const lastEdge = p.lastEdge(); | ||
if (!lastEdge || lastEdge.transition.kind !== 'KeyResolution') { | ||
return true; | ||
} | ||
|
||
const conditions = lastEdge.conditions; | ||
return !(conditions && conditions.containsTopLevelField(field)); | ||
}); | ||
return filtered.length === paths.paths.length | ||
? paths | ||
: { | ||
...paths, | ||
paths: filtered | ||
}; | ||
|
||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add tests for this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some tests. Not directly for this function because I preferred keeping it non-exported/private, but for the function that directly call this (advanceSimultaneousPathsWithOperation
) and for the behaviours that this method tackle.
Profiling of some slow query planning shows `FetchGroup.isUseless` as one of the hot path. This commit caches the result of this method for a group, and only invalid that cache when we know the result may needs to be recomputed. On the planning of some queries, this is shown to provide a 15% improvement to query planning time.
When a type has multiple keys, the query planning was sometimes considering an option where some key `x` was used to get field `y` but then key `y` was used to get that same `y` field from another subgraph. This is obviously not very useful, and we know we can ignore those paths as the 1st part of those path already does what we want. But considering those (useless) options, while harmless for correction, was in some case drastically increasing the number of plans that were evaluated, leading to long query planning times. In other words, this commit improve query planning times in some cases (and at times significantly).
This PR contains 2 relatively small (in patch size) performance improvement for query planning (performance in the sense of the time it takes to compute query plans):
FetchGroup.isUseless
, which was showing near the top of profiling some slow planning. This optimisation is arguably a minor one in that this probably only help specific cases, but it consistently reduced query planning time by ~15% on at least one example, so probably worth the small additional complexity.