-
-
Notifications
You must be signed in to change notification settings - Fork 454
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Subscription cache updates triggered during a mutation can cause cache tearing #1525
Comments
Without a reproduction I'll just have to rely on your description, but can you point out what turns |
Ah sorry I forgot to add a reference to the original discussion that contained a bit more information: #1492 Originally I thought this was just a case of partial data, although it's not mapped to ReasonUrql's PartialData type but to the Data type. Additionally it doesn't just happen when the The shortened version of the query (as posted in the discussion) is query ConversationMessages($id: ID!, $messagesLast: Int!, $messagesBefore: Cursor) {
chatConversation(id: $id) {
...Conversation_View @arguments(messagesLast: $messagesLast, messagesBefore: $messagesBefore)
}
viewer { id }
} Here I've added the full fragments in the following Gist as well as the state of the cache (using Urql devtools' explorer) that I still had open: https://gist.github.com/Kingdutch/b875483eb5673475f883b3b0827d171c |
Ok, lest me reading raw cache inputs though, those aren't that important. The question is whether your updater is seeing the correct data, writing the correct links, reading / handling null values correctly, and whether your mutation and listing fragments overlap sufficiently |
This is the minimum mutation that was able to reproduce the error (only the function referenced above is left). The commented out parts are things I was able to eliminate. viewerSendUserChatMessage: (result, args, cache, info) => {
// Our optimistic update doesn't include a clientMutationId but our result does,
// so we use it to remove the optimistically added message.
if (typeof result.clientMutationId !== "undefined") {
// cache.invalidate({__typename: 'ChatMessage', id: 'optimistic-' + result.clientMutationId});
}
const message = result.viewerSendUserChatMessage.message;
if (message === null) {
ErrorHandling.report(new Error("Successful chat message send mutation did not contain a message."));
return null;
}
const cachedMessage = cache.resolve({ __typename: message.__typename, id: message.id});
console.log("mutation", message.__typename, message.id, cachedMessage);
// Ensure we have a conversation to update.
if (!message.conversation || !message.conversation.__typename || !message.conversation.id) {
ErrorHandling.report(new Error("viewerSendUserChatMessage mutations must request conversation __typename and id to perform updates."));
return null;
}
updateConversationMessageLists(cache, message);
// updateViewerRecentConversationsList(cache, message.conversation.id);
// markChatConversationRead(cache, message.conversation.__typename, message.conversation.id, message.sent.timestamp);
// We have no reliable way of knowing how many messages were marked as read or unread so we can only invalidate the global unread count.
// invalidateViewerChatReadStatus(cache);
}, This is the minimum subscription with the same thoughts. I've not commented out the chatMessageReceived: (result, args, cache, info) => {
const message = result.chatMessageReceived;
// We need a message __typename and id to perform cache checks.
if (!message.id || !message.__typename) {
ErrorHandling.report(new Error("chatMessageReceived subscriptions must request __typename and id to perform updates."));
return null;
}
const cachedMessage = cache.resolve({ __typename: message.__typename, id: message.id});
console.log("subscription", message.__typename, message.id, cachedMessage);
// Ensure we have a conversation to update.
if (!message.conversation || !message.conversation.__typename || !message.conversation.id) {
ErrorHandling.report(new Error("chatMessageReceived subscriptions must request conversation __typename and id to perform updates."));
return null;
}
// If this is the first message in a conversation then we must create the conversation.
if (cache.resolve({ __typename: 'Query' }, 'chatConversation', {id: message.conversation.id }) === null) {
cache.updateQuery(
{
query: `
query ($id: ID!) {
chatConversation(id: $id) {
__typename
id
name
messages(last: 1) {
edges {
node {
id
...Chat_Message
}
}
}
}
}
${Chat_Message.query}
`,
variables: {
id: message.conversation.id
}
},
_ => ({
chatConversation: {
...message.conversation,
messages: {
__typename: "ChatConversationChatMessageConnection",
edges: [{
__typename: "ChatConversationChatMessageEdge",
node: message
}]
},
}
})
)
}
updateConversationMessageLists(cache, message);
// updateViewerRecentConversationsList(cache, message.conversation.id);
// // If this is a new message from another user then we treat it as unread.
// const viewer = getViewer(cache);
// if (viewer !== null && viewer.id !== message.sender.id) {
// incrementChatConversationUnreadCount(cache, message.conversation.__typename, message.conversation.id);
// incrementViewerChatUnreadCount(cache);
// }
}
I'm assuming these things are correct because for single messages everything is working as expected. I was also not able to reproduce it in a scenario where the connection to the subscription was failing (so it was only the mutation which was frequently updating). These things combined lead me to believe the issue is caused in the interplay of mutation and subscription when multiple mutations (and thus subscription messages) are triggered. This is the event view for the two faulty mutations (and the subscription). The top Q is a query from a portal that fetches a different query for The S without event is for another subscription type for which no data is being sent in this scenario. The two mutation are shown and the query that receives partially null'ed data is shown right above that. |
I'm not sure if it's related but it seems to be an unexpected error in either case. I was trying to reproduce this with the debugger enabled and asked it to pause on exceptions which shows that there are circumstances in which graphcache can throw a type error. I was able to track this down to a destructuring in Since this uses global state this could explain why the race condition occurs? One such place where I see an async clear of global state is at https://github.com/FormidableLabs/urql/blob/618f791a011bfb55de2a8d8bede3dc4132b4af75/exchanges/graphcache/src/store/data.ts#L146-L152 The firing moment of subscriptions are not controlled by Urql (is what I'm assuming since it originates from a websocket event). So it can occur that the planned cleanup of a mutation occurs during the processing of a subscription (or vice-versa). This would cause the data to be thrown out unexpectedly for the current operation being processed, corrupting the cache. I've not been able to actually expose the error and get it output on the console by adjusting code or monkeypatching |
I've been trying to build a reproduction case with the codesandbox links provided (I missed those when reporting, sorry about that! Great addition since my last bug report, making a representative server is now a lot easier! ❤️ ) Client: https://codesandbox.io/s/subscription-on-mutation-client-d7w33?file=/src/cache.js I'm not able to get the exact behaviour as I'm experiencing in my own application but I do see some (for me) unexpected things happening in the cache, that may cause the behaviour that I saw. Below is a short screen recording. Screen.Recording.2021-04-09.at.12.17.25.movIn the cache updates I'm mutating the incoming entity to append to the message where it arrived (either mutation or subscription). The idea was that since I only wrote new messages I could see where the message first arrived while trying to break it. However, I see that for some reason, outside of my own cache updates, the cache is being changed with the original message from the server. Sometimes this overrides the optimistic update (which has a different ID) but mostly it overrides the cache writes from my caching logic (this happened even if I gave those a different ID). Edit: If I disable the subscription in |
Ok, that's expected and I'm actually happy that you're seeing this, believe it or not 😆 The reason why you'll never see tearing ("mixed optimistic and non-optimistic results") in a Graphcache app without lists and subscriptions is that we have a mechanism to prevent tearing there. This is explained in our "Optimistic Mutation" part of the docs now:
What this means is that optimistic mutations will be queued up until all of their mutations have settled and have received a server result. This means that we can safely "flush out" all of the optimistic changes and instead apply the real ones. This cannot apply to subscriptions however. We realised that subscriptions are more time sensitive. Meaning, we can either queue all subscriptions that collide with the optimistic mutations up too and flush them at once (or flush them again and apply them optimistically). So that's a huge complexity add. We'd apply subscription results on optimistic layers (and we don't even have the means to apply them because we'd have to generate new IDs for their results), then when all optimistic mutations settle we'd flush them out and also apply them again, in order with mutations. Instead, we realised that once subscriptions are in play, it's better to not do anything and instead don't apply this special behaviour. This is advantageous because once you have subscriptions, you already have timings you'd like to follow. There are two problems here you have to address:
Both of these problems can be addressed by treating the entire list as unreliable. Instead, you'll need a way to identify the messages yourself. You could attach a small This then means that the optimistic mutation can generate and write to the same object that non-optimistic mutations can, since the ID is known. On the server you'd probably choose to still generate / auto-generate your normal IDs, since those are "safe" and not user-controlled. The optimistic updater then just, as usual, creates a message with the so, tl;dr: Subscriptions don't do any "magical" tearing protection, because that wouldn't quite give the "visual appeal" that you'd expect unless we'd put a lot of layers of logic in; even then, they'd probably stop feeling real time, if we did anything special. |
Okay right! However, I think that's then a limitation of my reproduction, not of the production code that had the original issue. Specifically in my production mutation I would do the following: if (typeof result.clientMutationId !== "undefined") {
cache.invalidate({__typename: 'ChatMessage', id: 'optimistic-' + result.clientMutationId});
} (In my post above this is commented out to eliminate it as a cause for my problems). So that should address the two points that you raised (in my production code). One thing that's unclear to me though is where in my recording the lines without any
How would the subscription get the I've adjusted the example client to invalidate the optimistic update when the mutation completes. However, this only works if there is no competing subscription. Screen.Recording.2021-04-09.at.14.09.12.movAs you can see for the first few messages the optimistic update is properly removed. However, when I rapidly trigger multiple mutations then this stops working. Perhaps I understand what's happening now, but not how I'm supposed to handle it? |
Okay, I think I get it.
So this means that in my Adding this (which is true for subscriptions) seems to work. Sort of. if (filterOptim) {
data.messages = data.messages.filter(
(m) => !m.id.startsWith("optimistic-")
);
} Screen.Recording.2021-04-09.at.14.18.06.movWhat I'm confused about is that I don't understand where the messages without the |
That doesn't really work. If you filter out all optimistic updates in your subscription, then on any subscription result you'll lose all optimistic messages. In other words, if a different user sends a message and that comes in as a subscription result, then you'd lose all pending message from the active user. Also, invalidating the optimistic mutation item may not do what you expect it to do. It'd be a lot more effective if you'd simplify the problem by letting the normalised cache know the ID from optimistic mutation to mutation result ahead of time. The messages without mutation and subscription? That's easily explained, I'd guess you have a subscription that updates a message of a completed mutation. If the mutation completes first then you'll have |
I'm not sure how I'd do that? The server assigns IDs so the client can't predict what the ID will be for a mutation result. However, since the server assigns IDs, and I don't have a
In that case I'd agree with you. Except that's not true. I'm updating the cache through two routes:
So as far as I understand I shouldn't see any new results without things appended. Unless the cache is secretly fetching query results in the background and somehow merging that in, but I've not been able to see any network activity that would point to this, nor would I understand why it does that (since my update functions should be sufficient). EDIT: In the video you can also very shortly see |
Like I started with before, I'd generate an ID on the client-side and attach it. So generally I'd maybe send an additional So instead of linking things up with
It is true. The subscription updater runs after the main subscription. So by the time you're trying to append an updated message the message has already been appended (most of the time) by the mutation and your subscription updater doesn't append anything new. This can often even work the other way around, where the mutation updates the subscription message. Remember, at some point your message is already in the list at which point your updaters aren't fully running and updating the message, because they abort when the message is already in the list. |
Okay that's something I'd have to think about. Storing that on our server for this specific case should be doable but there are other data types with similar constraints/real-time setups where storing this would be a non-trivial change to our architecture. I also think storage is a requirement to prevent malicious actors from sending duplicate
My mutation updater does {
...message,
message: message.message + " (mutation)"
} My subscription updater does {
...message,
message: message.message + " (subscription)"
} Yet the output shows a naked message. I understand that my updaters abort if the other updater has already processed the message. However, what's the third case here that seems to be bypassing both of my updaters? I only expect naked messages to happen when messages are fetched in a query from the server since that's neither a mutation nor a subscription, so those messages aren't modified by my updaters. When adding messages after the initial page load, the only network traffic I see are mutations and subscriptions. EDIT: I'm referring to 78/79 in this image. (The coloring was added because some mutations were being added out of order, which was resolved by no longer invalidating optimistic updates but replacing those with the final mutation/subscription results as per your suggested cuid pattern) |
When the message is already present in the list and the normalised cache writes the new data automatically to the message entity directly. That's when the message updates without any of your updaters. Edit: One idea I'd have would maybe work here too if you can't add a stable secondary ID for the clients. You can filter out optimistic messages (as you're already doing) but only if your current updater is non-optimistic, i.e. triggered by a mutation result. You can detect this by checking Edit 2: Then you'd only have duplicate messages if the subscription event comes in before the mutation result. You can make this less likely on the API by delaying the |
Aaaaah I see, okay :) Yeah unfortunately my subscription usually arrives (but not always) before my mutation result at the moment. Given that my subscriptions are served by a separate process from a message over a RabbitMQ queue and the message is put on the RabbitMQ queue before the mutation returns (since that's a PHP web request in Drupal that needs to finish its response). My constraints are, shall we say, interesting 🙈 . Thanks for the help! At least this means I have some improvements to make in the caching code of the actual application 🎉 . I'll get back to that on Monday (today was my day off but didn't want to let you wait too long for my reply) and then I'll see if those improvements also solve my original problem (or whether I need to improve my reproduction sandbox). |
I ended up being impatient and working on it today anyway 😇 It took me a lot of trial and error but I think I managed to get it right. It was indeed the problem that we described here :D I ended up adding a One thing that did trip me up along the way is that I was persisting the I was able to work around this because I also used the It looks like when properly handling those values as you described it works very well and I wasn't able to break it with about 100 near simultaneous messages :D The only thing to make it more robust would be to figure out why |
Just removed all the debugging code, did a redeploy and tried to break it again for good measure. Still held up so I'm calling it a win. 🎉 |
I may have found a race condition in Graphcache updates for cache updates that are triggered by both a mutation and a subscription. Both are needed for a chat where messages are sent that need to be synced across multiple devices.
In particular I believe that it's caused by multiple rapid mutations causing the subscription updater to be called before the updates for a successful mutation which in turns breaks reconciliation.
I've been able to track it down to a specific cache manipulation function (added at the end of this post for completeness). The function appends the new message to a message list in a conversation, if the message does not already exist in the list.
This function is called with a message from the result of a mutation and from the subscription that receives new messages (including ones sent by the client since the server can't link a subscription to the origin of a mutation).
This works fine when the mutation is sent slowly as can be seen by the log ouput that I added in the cache.
When two mutations are sent in rapid succession however the execution is no longer linear and as a result it seems the ChatConversation that's updated in
updateConversationMessageLists
is turned into anull
result in a component that depends on it causing the component to have no data where it expects data.This is the log output in that scenario
urql version & exchanges:
├─ @urql/[email protected]
├─ @urql/[email protected]
└─ @urql/[email protected]
Steps to reproduce
Expected behavior
All cache updates happen correctly and the component that has queried the updated data renders as usual.
Actual behavior
The component that has queried the data receives
null
causing it to fail its data dependencies, unable to show the user a meaningful screen.The text was updated successfully, but these errors were encountered: