Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query planning performance improvements #2610

Merged
merged 3 commits into from
Jun 21, 2023
Merged

Conversation

pcmanus
Copy link
Contributor

@pcmanus pcmanus commented May 31, 2023

This PR contains 2 relatively small (in patch size) performance improvement for query planning (performance in the sense of the time it takes to compute query plans):

  1. the first commit memoize when possible the result of the FetchGroup.isUseless, which was showing near the top of profiling some slow planning. This optimisation is arguably a minor one in that this probably only help specific cases, but it consistently reduced query planning time by ~15% on at least one example, so probably worth the small additional complexity.
  2. the 2nd commit is arguably more impactful: in some situation with multiple keys, the planner was not discarding some options that are trivially inefficient (see commit message and included test for details). Which while not incorrect per-se, was sometime multiplying the number of plans evaluated by a lot, leading to very inefficient query planning time. This change sometimes decrease query planning time by an order of magnitude (some plan that took close to 2s now takes less than 100ms).

@pcmanus pcmanus requested a review from a team as a code owner May 31, 2023 14:00
@netlify
Copy link

netlify bot commented May 31, 2023

👷 Deploy request for apollo-federation-docs pending review.

Visit the deploys page to approve it

Name Link
🔨 Latest commit faf6e59

@changeset-bot
Copy link

changeset-bot bot commented May 31, 2023

🦋 Changeset detected

Latest commit: faf6e59

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 7 packages
Name Type
@apollo/query-planner Patch
@apollo/query-graphs Patch
@apollo/federation-internals Patch
@apollo/gateway Patch
@apollo/composition Patch
@apollo/subgraph Patch
apollo-federation-integration-testsuite Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@pcmanus pcmanus self-assigned this May 31, 2023
@codesandbox-ci
Copy link

codesandbox-ci bot commented May 31, 2023

This pull request is automatically built and testable in CodeSandbox.

To see build info of the built libraries, click here or the icon next to each commit SHA.

query-planner-js/src/buildPlan.ts Outdated Show resolved Hide resolved
query-planner-js/src/buildPlan.ts Outdated Show resolved Hide resolved
query-planner-js/src/buildPlan.ts Outdated Show resolved Hide resolved
@@ -810,7 +813,21 @@ class FetchGroup {
// key for that. Having it here saves us from re-computing it more than once.
readonly subgraphAndMergeAtKey?: string,
private cachedCost?: number,
// Cache used to save unecessary recomputation of the `isUseless` method.
private isKnownUseful: boolean = false,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be better to cache the status either way? i.e. the type should be boolean | undefined and you can just return the value if it's not undefined and run isUseful() otherwise? Obviously you should change the variable name if you do that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it'd be better. First, as I mentioned in one of the comment of the patch, when a group is shown to be useless, it is removed right away, so it doesn't really make matter whether that information is cached or not But adding to that, to properly cache the status either way, we'd have to add invalidation for that other way: that, changes to the group selection would have to invalid this cache, when currently only change to the inputs (which happens to be less frequent in practice) does so. So implementing the other would make things possibly less efficient overall.

Comment on lines +1736 to +1881
// Given a list of just computed indirect paths and a field that we're trying to advance after those paths, this
// method fields any path that should note be considered.
//
// Currently, this handle the case where the key used at the end of the indirect path contains (at top level) the field
// being queried. Or to make this more concrete, if we're trying to collect field `id`, and the path last edge was using
// key `id`, then we can ignore that path because this imply that there is a way to `id` "some other way" (also see
// the `does not evaluate plans relying on a key field to fetch that same field` test in `buildPlan` for more details).
function filterNonCollectingPathsForField<V extends Vertex>(
paths: OpIndirectPaths<V>,
field: Field,
): OpIndirectPaths<V> {
// We only handle leafs. Things are more complex non-leaf.
if (!field.isLeafField()) {
return paths;
}

const filtered = paths.paths.filter((p) => {
const lastEdge = p.lastEdge();
if (!lastEdge || lastEdge.transition.kind !== 'KeyResolution') {
return true;
}

const conditions = lastEdge.conditions;
return !(conditions && conditions.containsTopLevelField(field));
});
return filtered.length === paths.paths.length
? paths
: {
...paths,
paths: filtered
};

}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add tests for this function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some tests. Not directly for this function because I preferred keeping it non-exported/private, but for the function that directly call this (advanceSimultaneousPathsWithOperation) and for the behaviours that this method tackle.

pcmanus added 2 commits June 20, 2023 10:12
Profiling of some slow query planning shows `FetchGroup.isUseless`
as one of the hot path. This commit caches the result of this method
for a group, and only invalid that cache when we know the result may
needs to be recomputed. On the planning of some queries, this is
shown to provide a 15% improvement to query planning time.
When a type has multiple keys, the query planning was sometimes
considering an option where some key `x` was used to get field `y`
but then key `y` was used to get that same `y` field from another
subgraph. This is obviously not very useful, and we know we can
ignore those paths as the 1st part of those path already does what
we want. But considering those (useless) options, while harmless
for correction, was in some case drastically increasing the number
of plans that were evaluated, leading to long query planning
times.

In other words, this commit improve query planning times in some
cases (and at times significantly).
@pcmanus pcmanus merged commit 7ac8345 into apollographql:main Jun 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants