Store Promise<Response> instead of Response for HTTP API transactions #1624
Conversation
This fixes a race whereby:

- User hits an endpoint.
- No cached transaction, so it executes the main code.
- User hits the same endpoint.
- No cached transaction, so it executes the main code.
- Main code finishes executing, caches the response, and returns.
- Main code finishes executing, caches the response, and returns.

This race is common in the wild when Synapse is struggling under load.

This commit fixes the race by:

- User hits an endpoint.
- Caches the promise to execute the main code, and executes the main code.
- User hits the same endpoint.
- Yields on the same promise as the first request.
- Main code finishes executing and returns, unblocking both requests.
I'm wondering if a nicer API would be something like:

```python
self.transactions.fetch_or_execute(
    self.handler.do_foo, txn_id,
    arg1, arg2, arg3=arg3
)
```

where …
Also, it would be totally awesome if …

Also, I don't mind moving it to …

I'm guessing you're proposing I make it more generic (so …
Nah, I just made it up.
At the very least it should be moved up, but generally I quite like helpers like this to live a bit separately, rather than being dumped alongside the REST servlets themselves.

Well, it is currently implemented in a generic fashion. I'm happy for the arg name to be …

Do you want the implementation to be generic, or are you happy with it in its current form (accepting …

That's fine, I suppose.

Oh, I misread. Yeah, OK, I guess the generation of the key is non-trivial. Though I'd still be tempted to move the …

SGTM

Hmmm. The old implementation was using transaction IDs as a way to prune the cache, but it meant that you couldn't have multiple in-flight requests at the same time and get idempotency, which feels bad. I've removed that code in my fix, but now the cache will grow unbounded. How do you propose I clear the cache? Periodic interval? 10 minutes? The generic form now just takes a key, so I can't be more intelligent, like basing it off the given user (the access_token, which is now concatenated into the key).
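A sketch of what scoping the cache key to the caller might look like: concatenating the access token (and request path) into the key means two different clients reusing the same transaction ID don't collide. The key format and function name here are assumptions for illustration, not the actual Synapse code.

```python
def make_txn_key(access_token, path, txn_id):
    # Hypothetical key format: scoping by access token keeps one
    # client's txn_id from colliding with another client's.
    return "%s/%s/%s" % (access_token, path, txn_id)

key_a = make_txn_key("tok_alice", "/rooms/!r:hs/send/m.room.message", "txn1")
key_b = make_txn_key("tok_bob", "/rooms/!r:hs/send/m.room.message", "txn1")
print(key_a != key_b)  # True: same txn_id, different callers, different keys
```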
```python
    of (response_code, response_dict).
    """
    try:
        return self.transactions[txn_key]
```
I think you need a `.observe()` on the end.
Done.
```python
deferred = fn(*args, **kwargs)
observable = ObservableDeferred(deferred)
self.transactions[txn_key] = observable
return observable
```
Ditto a `.observe()` here too.
Done.
```python
observable = self.txns.fetch_or_execute_request(
    request, self.on_POST, request
)
res = yield observable.observe()
```
Ah, I'd move this `.observe()` up into the actual cache to make things neater:

```python
def on_PUT(self, request, txn_id):
    return self.txns.fetch_or_execute_request(
        request, self.on_POST, request
    )
```
Done.
For now, I'd probably expire after 30 mins (10 is probably a bit on the low side). Ideally, I'd guess we'd batch-persist these txn_ids to the DB so they survive restarts, and then purge that table after a few hours/days.

(Also, a Python test case for the …
For cleaning entries, I'm just periodically checking every 30 minutes and timestamping when functions were invoked (which means the actual time in the cache is between 30 and 60 minutes). This feels simpler and less wasteful than registering timeouts for each entry in the cache, which has comparatively more function-call overhead.
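The cleaning scheme described above can be sketched as follows: each entry records its insertion time, and a sweep run once per interval drops anything older than one interval, so an entry actually survives between one and two intervals (30–60 minutes for a 30-minute sweep). This is a simplified model with made-up names, not the Synapse implementation.

```python
CLEANUP_INTERVAL_MS = 30 * 60 * 1000  # sweep every 30 minutes

class ExpiringTransactionCache:
    def __init__(self, clock):
        self._clock = clock   # callable returning milliseconds since epoch
        self._txns = {}       # txn_key -> (timestamp_ms, value)

    def set(self, txn_key, value):
        self._txns[txn_key] = (self._clock(), value)

    def get(self, txn_key):
        entry = self._txns.get(txn_key)
        return entry[1] if entry else None

    def cleanup(self):
        # Called once per CLEANUP_INTERVAL_MS: anything older than one
        # interval is dropped, so entries live 30-60 minutes in total.
        now = self._clock()
        self._txns = {
            k: (ts, v) for k, (ts, v) in self._txns.items()
            if now - ts <= CLEANUP_INTERVAL_MS
        }

# Simulated clock to demonstrate the 30-60 minute window.
t = [0]
cache = ExpiringTransactionCache(lambda: t[0])
cache.set("txn1", (200, {}))
t[0] = 29 * 60 * 1000
cache.cleanup()
print(cache.get("txn1"))  # (200, {}): still cached at 29 minutes
t[0] = 31 * 60 * 1000
cache.cleanup()
print(cache.get("txn1"))  # None: dropped by the next sweep
```

One timestamp comparison per entry per sweep replaces one timer object per entry, which is the "less wasteful" trade-off mentioned above.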
@erikjohnston PTAL. Also, are the Dendron tests just flakey, or should I be worried? Looking at the previous builds on http://matrix.org/jenkins/job/SynapseSytestDendronCommit/ makes me think flakey, but I don't know.

Yes :(

LGTM
NOTE: According to <https://matrix.org/docs/spec/client_server/r0.3.0.html#id183>, the transaction ID should be scoped to the access token, so we should preserve it with the token. However, if the client crashes and fails to save the TID, and then reuses it in the future...what happens? The server seems to accept messages with already-used TIDs. Maybe it has some kind of heuristic... I found these: <matrix-org/synapse#1481> and <matrix-org/synapse#1624>.
Now with bonus sytests!