
Convert to synchronous GraphQL execution and data loading #1531

Merged: 20 commits into main from synchronous-execution, Nov 30, 2020

Conversation

@martijnwalraven (Contributor)

This PR removes our promise implementation and modifies GraphQLExecutor to execute GraphQL operations synchronously. It also changes the NormalizedCache protocol to expose synchronous methods instead of callbacks.

Perhaps surprisingly, this should actually improve our ability to execute operations concurrently. ApolloStore still supports concurrent reads and blocking writes through its existing transaction model. Operations within a transaction will execute synchronously, however, which avoids the threading issues we were seeing before as a result of over-relying on fine-grained asynchronous execution.

In the existing model, the overhead of constant context switches and blocked threads has been shown to negatively impact performance. It also risks thread explosion and starvation, which we've mostly seen in tests but which could also become an issue under real-world conditions.

This is a pretty substantial change, so it would be great if people could try this out and report back. I've seen some performance improvements in tests, but real world usage may show different results.


Because we rely on lazy evaluation to perform efficient batched data loading, this PR adds a PossiblyDeferred wrapper that allows us to keep the same data loading semantics as before, while avoiding asynchronous execution. Instead, deferred values invoke a thunk when they are accessed, which is used to load the next batch of records from the cache.
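As a rough illustration, a PossiblyDeferred-style wrapper can be sketched like this. The case names and method signatures below are assumptions based on the description above, not the PR's actual source; see Sources/Apollo/PossiblyDeferred.swift for the real implementation.

```swift
// Illustrative sketch only, not the PR's actual type.
enum PossiblyDeferred<Value> {
  case immediate(Result<Value, Error>)
  case deferred(() throws -> Value)

  // Resolving a deferred value invokes its thunk. In the executor, this is
  // the point where the next batch of records is loaded from the cache.
  func get() throws -> Value {
    switch self {
    case .immediate(let result):
      return try result.get()
    case .deferred(let thunk):
      return try thunk()
    }
  }

  // Chaining stays lazy: the transform only runs once the value is resolved.
  func map<T>(_ transform: @escaping (Value) throws -> T) -> PossiblyDeferred<T> {
    return .deferred { try transform(try self.get()) }
  }
}
```

With this shape, `PossiblyDeferred<String>.deferred { "foo" }.map { $0 + "!" }` does no work until `get()` is called, which is what allows record loads to be batched before anything executes.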

@martijnwalraven (Contributor, Author)

@bharath2020 Since you mentioned slow data loading performance in #1519, it would be interesting to see what the numbers look like with this PR. There is much more we can do to improve performance, but this may already make a difference.

@martijnwalraven (Contributor, Author)

@RolandasRazma @AnthonyMDev This PR reverts the lock that was added to InMemoryNormalizedCache in #552. The transaction model of ApolloStore should be enough to guarantee there are no concurrent writes to the cache. The problem with locking at the normalized cache level is that multiple batches of records will be loaded in the course of executing the same GraphQL operation, so locking individual calls to the normalized cache won't provide proper transaction isolation. This also means sharing a NormalizedCache between clients is not supported. Sharing an ApolloStore between clients should work however.

@RolandasRazma (Contributor)

Thanks for the info, @martijnwalraven.
It's not ideal, as you'd think of the Cache/Store as the lowest possible level to share. It not being thread-safe feels like a minefield.

@bharath2020 (Contributor)

@martijnwalraven Will take a look at it and share the results soon.

@martijnwalraven (Contributor, Author)

martijnwalraven commented Nov 20, 2020

Thanks for the info, @martijnwalraven.
It's not ideal, as you'd think of the Cache/Store as the lowest possible level to share. It not being thread-safe feels like a minefield.

The store is the lowest possible level to share, and it is thread-safe, but the normalized cache isn't designed to be used independently. Individual operations on the cache cannot offer any transaction guarantees, so that's why you always have to go through the store.

I hope this makes sense. The main issue is that multiple batches of records will be loaded in the course of loading a single GraphQL query from the store. So you want to make sure there are no intervening writes. That's why ApolloStore has withinReadTransaction and withinReadWriteTransaction.
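The transaction idea can be modeled with a toy store. This is not Apollo's actual API, and every name below is invented for illustration: a concurrent queue gives concurrent reads, a barrier gives blocking writes, and everything inside one transaction closure sees a consistent snapshot with no intervening writes.

```swift
import Foundation

// Toy model of the store's transaction guarantees (not Apollo's real API).
final class ToyStore {
  private var records: [String: String] = ["hero": "R2-D2"]
  private let queue = DispatchQueue(label: "toy.store.rw", attributes: .concurrent)

  // All batched record loads for one operation happen inside `body`,
  // so no write can interleave mid-read.
  func withinReadTransaction<T>(_ body: @escaping ([String: String]) -> T,
                                completion: @escaping (T) -> Void) {
    queue.async {
      completion(body(self.records))
    }
  }

  func write(key: String, value: String) {
    // The barrier blocks the write until in-flight read transactions finish.
    queue.async(flags: .barrier) {
      self.records[key] = value
    }
  }
}
```

The real ApolloStore exposes withinReadTransaction and withinReadWriteTransaction with richer transaction objects, but the isolation property being described is the same.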

@designatednerd (Contributor)

The store is the lowest possible level to share, and it is thread-safe, but the normalized cache isn't designed to be used independently.

We should probably take a swing at updating the docs to a) differentiate what a store is vs. what a cache is, and b) make crystal clear that you can share a store between multiple clients, but not a cache between multiple stores (with the attendant multiple clients).

@designatednerd (Contributor)

(screenshot: Screen Shot 2020-11-20 at 5 20 48 PM)

These are my favorite kinds of PRs 😃

@AnthonyMillerSF (Contributor)

The store is the lowest possible level to share, and it is thread-safe, but the normalized cache isn't designed to be used independently.

We should probably take a swing at updating the docs to a) differentiate what a store is vs. what a cache is, and b) make crystal clear that you can share a store between multiple clients, but not a cache between multiple stores (with the attendant multiple clients).

Been a while since I've looked at this code. Do you think there is a simple way to add some asserts that fail if you attempt to use a cache with multiple stores? Make this fail in DEBUG builds and shoot off a warning in prod builds?
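One possible shape for that, purely hypothetical and not part of this PR or Apollo's API: the cache keeps a weak reference to the first store that attaches, warns if a different one shows up, and traps in DEBUG builds via assertionFailure, which compiles away in release. All names here are invented for illustration.

```swift
// Hypothetical sketch: flag a NormalizedCache being used by multiple stores.
final class SingleStoreGuard {
  private weak var owningStore: AnyObject?

  func attach(to store: AnyObject) {
    if let existing = owningStore, existing !== store {
      // Warn in all builds; assertionFailure only traps in DEBUG builds,
      // matching the "fail in DEBUG, warn in prod" idea above.
      print("warning: NormalizedCache is shared between multiple ApolloStores; this is unsupported")
      assertionFailure("NormalizedCache shared between multiple ApolloStores")
    }
    owningStore = store
  }
}
```

A cache implementation would call attach(to:) whenever a store takes ownership of it; re-attaching the same store stays silent.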

@designatednerd (Contributor) left a comment

Cannot tell you how delighted I am to have you back to help rip this out; I'd have made such a hash of this 😄

I think it's also important to note for others reading this that while cache access itself is synchronous, reads happen on a background queue via transactions in the store, so they don't block the thread they were initiated from. The wording "synchronous GraphQL execution" may give some people a heart attack if they don't realize that.

<key>com.apple.XCTPerformanceMetric_WallClockTime</key>
<dict>
<key>baselineAverage</key>
<real>0.075663</real>
Contributor

Are you setting up the perf tests to run in this PR? If so, we should probably get a baseline from CI rather than from one of our (almost certainly faster) local machines.

If it's not on this PR, keep that in mind for when we turn it on.

Contributor Author

No, I haven't looked into establishing a baseline on CI. That would have to somehow remember the results and make a commit; I'm not sure how other projects handle that. I also don't think we should run performance tests on CI by default, because they're pretty costly (they need a whole separate optimized build). So it might be OK to only run them locally for now.

Contributor

Maybe we can set up a separate job to do the perf tests after merges - I'd definitely like to have them automated and running frequently, but it does make sense not to have them be a blocker to merging PRs.

Contributor

(this should be handled in a separate PR, though)

Sources/Apollo/ApolloStore.swift (review thread resolved; outdated)
self.queue.async {
  do {
    let returnValue = try body(ReadTransaction(store: self))
Contributor

looks like an extra newline snuck in here and below

Contributor Author

Not sure what happened there, I can't seem to find it?

Contributor

Weird, I'm still seeing it in the diff - lines 107 and 131

Contributor Author

Ah, those were intentional. Do you think it's too much?

Sources/Apollo/DataLoader.swift (review thread resolved; outdated)
Sources/Apollo/DataLoader.swift (review thread resolved)
Sources/ApolloSQLite/SQLiteNormalizedCache.swift (review thread resolved; outdated)
Tests/ApolloTests/BatchedLoadTests.swift (review thread resolved)
Tests/ApolloTests/DataLoaderTests.swift (review thread resolved; outdated)
Tests/ApolloTests/DataLoaderTests.swift (review thread resolved; outdated)
Tests/ApolloTests/ResultOrPromiseTests.swift (review thread resolved)
@martijnwalraven (Contributor, Author)

@martijnwalraven Will take a look at it and share the results soon.

Great, thanks! I should also mention it's important to run performance tests on a release build, because compile optimizations like inlining seem to make a huge difference for this code.

@bharath2020 (Contributor)

@martijnwalraven I have the results for reading from SQLiteDatabase. For the same set of records (200) and device (iPhone XS) that I used to measure the read timings in #1519, the read time with the changes in this PR is brought down to 26ms from the 80ms I had in that PR.

Sources/Apollo/ApolloStore.swift (review thread resolved; outdated)
Sources/Apollo/ApolloStore.swift (review thread resolved)
Sources/Apollo/DataLoader.swift (review thread resolved; outdated)
Sources/Apollo/DataLoader.swift (review thread resolved)
Sources/Apollo/GraphQLExecutor.swift (review thread resolved)
Sources/Apollo/GraphQLExecutor.swift (review thread resolved)
Sources/Apollo/GraphQLExecutor.swift (review thread resolved; outdated)
Sources/Apollo/GraphQLQueryWatcher.swift (review thread resolved; outdated)
Sources/Apollo/PossiblyDeferred.swift (review thread resolved)
func testLazilyEvaluateAllIsDeferred() throws {
  let possiblyDeferreds: [PossiblyDeferred<String>] = [.deferred { "foo" }, .deferred { "bar" }]

  var numberOfInvocations = 0
Contributor

All these numberOfInvocations tests make me wonder: are we doing anything to limit things to one invocation, or is this something where the number of invocations could be arbitrary?

Contributor Author

The number of invocations could be arbitrary, so any caching you want to do would happen downstream (in DataLoader, for example). I thought about making PossiblyDeferred cache the result by changing itself from deferred to immediate the first time the deferred result gets resolved. But that won't really work because it is a value type, and changing it to a class makes these more heavyweight than I wanted.

Contributor

True - I think I'm more thinking about what happens if you accidentally call get twice - you'd get foobarbar, right? I guess at that point the format of the incorrect data should be a giveaway, but if this is in a big chain it could be difficult to tell where you accidentally called something twice.

Contributor Author

Ah, you're talking specifically about what happens with the test. We also have assertions for numberOfInvocations, so that should tell you what happened?

Contributor

No - I was using the test setup as an example. I'm wondering what happens if you call it twice in general. It's clear that in the test it's only getting called once.

Contributor Author

I'm not sure I know what you mean. A PossiblyDeferred that's .deferred can be invoked multiple times, that's by design and should not be an issue. It's similar to how you could call result.get() multiple times.
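Concretely, with a minimal stand-in for the deferred case (illustrative only, not the PR's actual type), resolving twice runs the thunk twice:

```swift
// Minimal stand-in for PossiblyDeferred's .deferred case (illustrative).
enum PossiblyDeferred<Value> {
  case deferred(() throws -> Value)

  func get() throws -> Value {
    switch self {
    case .deferred(let thunk): return try thunk()
    }
  }
}

var numberOfInvocations = 0
let deferred = PossiblyDeferred<String>.deferred {
  numberOfInvocations += 1
  return "foo"
}
_ = try? deferred.get()
_ = try? deferred.get()
// numberOfInvocations is now 2: resolution is not memoized, by design,
// just as calling Result.get() repeatedly re-returns the stored outcome.
```

Each call yields "foo" independently; concatenation into "foobarbar" would only happen if downstream code appended the results of repeated resolutions.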

@martijnwalraven (Contributor, Author)

@martijnwalraven I have the results for reading from SQLiteDatabase. For the same set of records (200) and device (iPhone XS) that I used to measure the read timings in #1519, the read time with the changes in this PR is brought down to 26ms from the 80ms I had in that PR.

Thanks for checking, that's great news! Do you mean 260ms and 800ms by any chance, since #1519 mentions 0.8 seconds?

@bharath2020 (Contributor)

bharath2020 commented Nov 24, 2020

Thanks for checking, that's great news! Do you mean 260ms and 800ms by any chance, since #1519 mentions 0.8 seconds?

Ah, typo. Yes, it is 260ms vs. 800ms.

@@ -1,4 +1,5 @@
import Foundation
import ApolloCore
Contributor

For CocoaPods purposes this needs to be wrapped in an #if !COCOAPODS gate because of how we have the lib set up. I'll make a ticket to deal with actually getting the libs to reflect the proper names for CocoaPods, but for now just slapping the wrapper on should do.

Contributor Author

Ok, done. We should talk about this separately, but I'm wondering why ApolloCore is a separate module, and what should or shouldn't go in there.

Contributor

The short answer is that some stuff is used in both the codegen lib and the main lib; the long answer is that eventually we'd like to separate the types needed for codegen from the runtime, to reduce what needs to be imported.

@designatednerd (Contributor) left a comment

fix that merge conflict and you're good to go! :shipit:

@martijnwalraven martijnwalraven merged commit 982195d into main Nov 30, 2020
@designatednerd designatednerd added this to the Next Release milestone Nov 30, 2020
@calvincestari calvincestari deleted the synchronous-execution branch June 30, 2021 17:35