Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validation: improving overlapping fields quality #386

Merged
merged 1 commit into from
May 10, 2016
Merged

Conversation

leebyron
Copy link
Contributor

@leebyron leebyron commented May 10, 2016

This improves the overlapping fields validation performance and improves error reporting quality by separating the concepts of checking fields "within" a single collection of fields from checking fields "between" two different collections of fields. This ensures for deeply overlapping fields that nested fields are not checked against each other repeatedly. Extending this concept further, fragment spreads are no longer expanded inline before looking for conflicts, instead the fields within a fragment are compared to the fields with the selection set which contained the referencing fragment spread.

e.g.

{
  same: a
  same: b
  ...X
}

fragment X on T {
  same: c
  same: d
}

In the above example, the initial query body is checked "within" so a is compared to b. Also, the fragment X is checked "within" so c is compared to d. Because of the fragment spread, the query body and fragment X are checked "between" so that a and b are each compared to c and d. In this trivial example, no fewer checks are performed, but in the case where fragments are referenced multiple times, this reduces the overall number of checks (regardless of memoization).

BREAKING: This can change the order of fields reported when a conflict arises when fragment spreads are involved. If you are checking the precise output of errors (e.g. for unit tests), you may find existing errors change from "a" and "c" are different fields to "c" and "a" are different fields.

From a perf point of view, this is fairly minor as the memoization "PairSet" was already keeping these repeated checks from consuming time, however this will reduce the number of memoized hits because of the algorithm improvement.

From an error reporting point of view, this reports nearest-common-ancestor issues when found in a fragment that comes later in the validation process. I've added a test which fails with the existing impl and now passes, as well as changed a comment.

This also fixes an error where validation issues could be missed because of an over-eager memoization. I've also modified the PairSet to be aware of both forms of memoization. Also includes a test which fails before this patch.

@ghost ghost added the CLA Signed label May 10, 2016
@leebyron
Copy link
Contributor Author

I've been staring at this for way too long. Someone tell me if I'm off my rocker or if this looks sane.

Also, this validation check is by far the most complicated one in the rule set, so please speak up if any part of the change is unclear or under-commented.

Folks who may want to review: @robzhu, @dschafer, @OlegIlyenko, @finneganh

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.02%) to 98.228% when pulling d650bc5 on within-between into 688a1ee on master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.02%) to 98.228% when pulling 36f1223 on within-between into 688a1ee on master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.02%) to 98.228% when pulling 5fcf80f on within-between into 688a1ee on master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.04%) to 98.287% when pulling b903a2c on within-between into 688a1ee on master.

if (selectionSet1 && selectionSet2) {
const visitedFragmentNames = {};
let subfieldMap = collectFieldASTsAndDefs(
cache,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth calling this fieldAndFragmentNameCache or somesuch throughout?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea

@fionawhim
Copy link

I looked this over for a while. As far as I can tell it seems reasonable? I think I was able to follow the recursion enough that it was doing the right thing to the right values and would terminate.

The only quadratic memory growth now appears to be in fragment comparison, which is probably harder to get excessively large due to the raw bytes of syntax required to make each one.

The cache and comparedSet args could probably be given more specific names for clarity.

@leebyron
Copy link
Contributor Author

Thanks for the review, this is great feedback. Yeah, the comparison memoization being between fragments instead of fields should make it much harder to get problematic quadratic memory growth, but still potentially an issue. I'll look into ways to restrict that.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.04%) to 98.287% when pulling 0291fef on within-between into 688a1ee on master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.04%) to 98.287% when pulling dde663b on within-between into 688a1ee on master.

This improves the overlapping fields validation performance and improves error reporting quality by separating the concepts of checking fields "within" a single collection of fields from checking fields "between" two different collections of fields. This ensures for deeply overlapping fields that nested fields are not checked against each other repeatedly. Extending this concept further, fragment spreads are no longer expanded inline before looking for conflicts, instead the fields within a fragment are compared to the fields with the selection set which contained the referencing fragment spread.

e.g.

```graphql
{
  same: a
  same: b
  ...X
}

fragment X on T {
  same: c
  same: d
}
```

In the above example, the initial query body is checked "within" so `a` is compared to `b`. Also, the fragment `X` is checked "within" so `c` is compared to `d`. Because of the fragment spread, the query body and fragment `X` are checked "between" so that `a` and `b` are each compared to `c` and `d`. In this trivial example, no fewer checks are performed, but in the case where fragments are referenced multiple times, this reduces the overall number of checks (regardless of memoization).

**BREAKING**: This can change the order of fields reported when a conflict arises when fragment spreads are involved. If you are checking the precise output of errors (e.g. for unit tests), you may find existing errors change from `"a" and "c" are different fields` to `"c" and "a" are different fields`.

From a perf point of view, this is fairly minor as the memoization "PairSet" was already keeping these repeated checks from consuming time, however this will reduce the number of memoized hits because of the algorithm improvement.

From an error reporting point of view, this reports nearest-common-ancestor issues when found in a fragment that comes later in the validation process. I've added a test which fails with the existing impl and now passes, as well as changed a comment.

This also fixes an error where validation issues could be missed because of an over-eager memoization. I've also modified the `PairSet` to be aware of both forms of memoization, also represented by a previously failing test.
@leebyron leebyron merged commit 4afb263 into master May 10, 2016
@leebyron leebyron deleted the within-between branch May 10, 2016 21:50
sogko added a commit to sogko/graphql-js that referenced this pull request Jun 1, 2016
* master: (26 commits)
  0.6.0
  Validation: improving overlapping fields quality (graphql#386)
  Validation: context.getFragmentSpreads now accepts selectionSet rather than fragment AST node
  Factor out more closure functions
  Factor out closure functions to normal functions
  Deprecated directive (graphql#384)
  RFC: Directive location: schema definition (graphql#382)
  RFC: Schema Language Directives (graphql#376)
  Export introspection in public API
  Export directive definitions. (graphql#381)
  BUG: Ensure building AST schema does not exclude @Skip and @include (graphql#380)
  documentation of schema constructor
  Revert "Remove all 'instanceof GraphQLSchema' checks" (graphql#377)
  remove all 'instanceof GraphQLSchema' checks (graphql#371)
  Error logging for new interface type semantics (graphql#350)
  Nit: Missing case in grammar for TypeSystemDefinition in comment
  Bug: printer can print non-parsable value
  Factor out suggestion quoting utility
  Minor refactoring
  Minor refactoring of error messages for unknown fields
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants