Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Store Promise<Response> instead of Response for HTTP API transactions #1624

Merged
merged 9 commits into from
Nov 14, 2016

Conversation

kegsay
Copy link
Member

@kegsay kegsay commented Nov 11, 2016

This fixes a race whereby:

  • User hits an endpoint.
  • No cached transaction so executes main code.
  • User hits same endpoint.
  • No cache transaction so executes main code.
  • Main code finishes executing and caches response and returns.
  • Main code finishes executing and caches response and returns.

This race is common in the wild when Synapse is struggling under load.
This commit fixes the race by:

  • User hits an endpoint.
  • Caches the promise to execute the main code and executes main code.
  • User hits same endpoint.
  • Yields on the same promise as the first request.
  • Main code finishes executing and returns, unblocking both requests.

Now with bonus sytests!

This fixes a race whereby:
 - User hits an endpoint.
 - No cached transaction so executes main code.
 - User hits same endpoint.
 - No cache transaction so executes main code.
 - Main code finishes executing and caches response and returns.
 - Main code finishes executing and caches response and returns.

 This race is common in the wild when Synapse is struggling under load.
 This commit fixes the race by:
  - User hits an endpoint.
  - Caches the promise to execute the main code and executes main code.
  - User hits same endpoint.
  - Yields on the same promise as the first request.
  - Main code finishes executing and returns, unblocking both requests.
@kegsay kegsay added the z-bug (Deprecated Label) label Nov 11, 2016
@erikjohnston
Copy link
Member

I wondering if a nicer API would be something like:

self.transactions.fetch_or_execute(
    self.handler.do_foo, txn_id,
    arg1, arg2, arg3=arg3
)

where fetch_or_execute would use the first arg as a txn_id, and if that wasn't in the cache call the given function with the arguments. This has the advantage you don't explicitly need to remember to both check and store.

@erikjohnston
Copy link
Member

Also, would be totally awesome if HttpTransactionCache had some python tests. Maybe also move it to synapse.utils?

@kegsay
Copy link
Member Author

kegsay commented Nov 11, 2016

fetch_or_execute doesn't appear to pass the request object which is used to select the key in the cache. Was this intentional?

@kegsay
Copy link
Member Author

kegsay commented Nov 11, 2016

Also, I don't mind moving it to util, but that feels like a downgrade in specificity, since only the REST Servlets make use of this class, and the aforementioned reliance on a request object.

@kegsay
Copy link
Member Author

kegsay commented Nov 11, 2016

I'm guessing you're proposing I make it more generic (so txn_id is just the key in the cache)? Do we plan on using the generic form elsewhere?

@erikjohnston
Copy link
Member

fetch_or_execute doesn't appear to pass the request object which is used to select the key in the cache. Was this intentional?

Nah, I just made it up.

Also, I don't mind moving it to util, but that feels like a downgrade in specificity, since only the REST Servlets make use of this class, and the aforementioned reliance on a request object.

At the very least it should be moved up, but generally I quite helpers like this to live a bit separately, rather than being dumped alongside the rest servlets themselves

I'm guessing you're proposing I make it more generic (so txn_id is just the key in the cache)? Do we plan on using the generic form elsewhere?

Well, it is currently implemented in a generic fashion. I'm happy for the arg name to be txn_id or key or whatever

@kegsay
Copy link
Member Author

kegsay commented Nov 11, 2016

At the very least it should be moved up, but generally I quite helpers like this to live a bit separately, rather than being dumped alongside the rest servlets themselves

synapse/rest/client/transactions.py perhaps?

Well, it is currently implemented in a generic fashion. I'm happy for the arg name to be txn_id or key or whatever

Do you want the implementation to be generic or are you happy with it in its current form (accepting request objects which are then processed for their access_token for use as a key)?

@erikjohnston
Copy link
Member

synapse/rest/client/transactions.py perhaps?

That's fine i suppose

Do you want the implementation to be generic or are you happy with it in its current form (accepting request objects which are then processed for their access_token for use as a key)?

Oh, I misread. Yeah, ok, I guess the generation of the key is non-trivial. Though I'd still be tempted to move the _get_key onto the v1 rest base class, as that would make the transaction class a nice self-contained and easily testable class, rather than having it know about request objects.

@kegsay
Copy link
Member Author

kegsay commented Nov 11, 2016

SGTM

@kegsay
Copy link
Member Author

kegsay commented Nov 11, 2016

Hmmm. The old implementation was using transaction IDs as a way to prune the cache, but it means that you couldn't have multiple in-flight requests at the same time and get idempotency, which feels bad. I've removed that code in my fix, but now the cache will grow unbounded.

How do you propose I clear the cache? Periodic interval? 10 minutes? The generic form now just takes a key, so I can't be more intelligent like base it off the given user (access_token, which is now concatenated in the key).

of (response_code, response_dict).
"""
try:
return self.transactions[txn_key]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need a .observe() on the end

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

deferred = fn(*args, **kwargs)
observable = ObservableDeferred(deferred)
self.transactions[txn_key] = observable
return observable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto a .observe() here too

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

observable = self.txns.fetch_or_execute_request(
request, self.on_POST, request
)
res = yield observable.observe()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I'd move this .observe() up into the actual cache to make things neater:

def on_PUT(self, request, txn_id):
    return self.txns.fetch_or_execute_request(
        request, self.on_POST, request
    )

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@erikjohnston
Copy link
Member

How do you propose I clear the cache? Periodic interval? 10 minutes? The generic form now just takes a key, so I can't be more intelligent like base it off the given user (access_token, which is now concatenated in the key).

For now, I'd probably expire after 30mins (10 is probably a bit on the low side). Ideally I'd guess we'd probably batch persist these txn_ids to the db so they survive restarts, and then purge that table after a few hours/days.

@erikjohnston
Copy link
Member

(Also, a python test case for the HttpTransactionCache class would be awesome)

@kegsay
Copy link
Member Author

kegsay commented Nov 14, 2016

(Also, a python test case for the HttpTransactionCache class would be awesome)

Done.

For cleaning entries, I'm just periodically checking every 30 minutes, and timestamping when functions were invoked (which means the actual time in the cache is between 30~60 minutes). This feels simpler and less wasteful compared to registering timeouts for each entry in the cache, which has comparatively more function call overhead.

@kegsay
Copy link
Member Author

kegsay commented Nov 14, 2016

@erikjohnston PTAL

Also, are the Dendron tests just flakey or should I be worried? Looking at the previous builds on http://matrix.org/jenkins/job/SynapseSytestDendronCommit/ makes me think flakey, but I don't know.

@erikjohnston
Copy link
Member

Also, are the Dendron tests just flakey or should I be worried? Looking at the previous builds on http://matrix.org/jenkins/job/SynapseSytestDendronCommit/ makes me think flakey, but I don't know.

Yes :(

@erikjohnston
Copy link
Member

LGTM

@richvdh richvdh deleted the kegan/idempotent-requests branch December 1, 2016 14:09
alphapapa added a commit to alphapapa/matrix-client.el that referenced this pull request Jun 11, 2018
NOTE: According to
<https://matrix.org/docs/spec/client_server/r0.3.0.html#id183>, the
transaction ID should be scoped to the access token, so we should
preserve it with the token.  However, if the client crashes and fails
to save the TID, and then reuses it in the future...what happens?  The
server seems to accept messages with already-used TIDs.  Maybe it has
some kind of heuristic...  I found these:
<matrix-org/synapse#1481> and
<matrix-org/synapse#1624>.
alphapapa added a commit to alphapapa/matrix-client.el that referenced this pull request Jun 11, 2018
NOTE: According to
<https://matrix.org/docs/spec/client_server/r0.3.0.html#id183>, the
transaction ID should be scoped to the access token, so we should
preserve it with the token.  However, if the client crashes and fails
to save the TID, and then reuses it in the future...what happens?  The
server seems to accept messages with already-used TIDs.  Maybe it has
some kind of heuristic...  I found these:
<matrix-org/synapse#1481> and
<matrix-org/synapse#1624>.
alphapapa added a commit to alphapapa/matrix-client.el that referenced this pull request Jun 11, 2018
NOTE: According to
<https://matrix.org/docs/spec/client_server/r0.3.0.html#id183>, the
transaction ID should be scoped to the access token, so we should
preserve it with the token.  However, if the client crashes and fails
to save the TID, and then reuses it in the future...what happens?  The
server seems to accept messages with already-used TIDs.  Maybe it has
some kind of heuristic...  I found these:
<matrix-org/synapse#1481> and
<matrix-org/synapse#1624>.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
z-bug (Deprecated Label)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants