Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[event-hubs] fix uncaught TypeError #8884

Merged
merged 13 commits into from
Jun 4, 2020
Merged

Conversation

chradek
Copy link
Contributor

@chradek chradek commented May 13, 2020

Fixes #8584

Summary

This change updates the EventHubReceiver, EventHubSender, and Management client to check if the rhea connection is closing before starting the process of negotiating a claim and opening a connection.

Background

More details can be found in #8584, but the short version is that the SDK sometimes ran into an edge case where it attempted to reopen a rhea connection that was in the process of being destroyed. This caused in-flight requests to throw an uncaught TypeError if the connection's state didn't update before the request timed out.

Description of changes

This change checks that connection.isOpen() === false and connection.isRemoteOpen() === true. This combination indicates that the local end of the rhea connection has closed, and rhea is giving the remote end a chance to acknowledge the close before destroying the connection.

We can see the above happen when an idle timeout is reached.

We then wait for the disconnected event to be emitted before moving on with an initialization.
Finally, we check if the connection is in the process of disconnecting and wait for it to finish.
Once all this is done, initialization can occur.

Why do we need to wait for a connection to finish disconnecting?

We don't attempt to bring back up any receivers or senders after a disconnected event, instead relying on the receivers/senders to bring themselves back up if a request failed for a retryable reason.

Even though _init waits for a disconnected event if the connection is closing, we still have to wait for the actual disconnect to complete before opening a new connection.
This is because the disconnected event listener on the ConnectionContext is non-blocking, and control will be given back to _init before the event listener completes. Since event listeners are non-blocking, putting the refresh logic in the original disconnected event listener causes a race condition because negotiateClaim can be triggered before the connection is refreshed.

@chradek chradek changed the title [event-hubs] fix uncaught TypeError [not-ready-for-review][event-hubs] fix uncaught TypeError May 13, 2020
@chradek chradek changed the title [not-ready-for-review][event-hubs] fix uncaught TypeError [event-hubs] fix uncaught TypeError May 14, 2020
@chradek
Copy link
Contributor Author

chradek commented May 14, 2020

/azp run js - eventhubs-client - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@chradek
Copy link
Contributor Author

chradek commented May 14, 2020

/azp run js - eventhubs-client - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@@ -169,39 +198,52 @@ export namespace ConnectionContext {
connectionContext.connection.removeAllSessions();

// Close the cbs session to ensure all the event handlers are released.
await connectionContext.cbsSession.close();
await connectionContext.cbsSession.close().catch((err) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Precaution or did you actually see errors getting thrown?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only made this change because in some of my extreme test runs I did see this throw, though it required nearly blocking the event loop for a long time.


// Close all senders and receivers to ensure clean up of timers & other resources.
if (state.numSenders || state.numReceivers) {
for (const senderName of Object.keys(connectionContext.senders)) {
const sender = connectionContext.senders[senderName];
if (!sender.isConnecting) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this if check removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible for disconnected to be emitted while connecting. For example, if the connection is viewed as being idle while the connection is still in the process of opening. This check was preventing us from closing the existing senders in this scenario.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this check is a remnant of the time when we recovered sender links when connection was disconnected. So the code below where we have sender.close() was previously sender.onDetached(). Since that need to be done if the sender is already connecting, we had this check. It should have been removed when we made the decision to not re-connect broken sender links in the background i.e. in #6737

// Trigger a disconnect on the underlying connection.
clientConnectionContext.connection["_connection"].idle();

await client.getPartitionIds({});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might still be worth validating the results that come back, but really it's kind of a "fails completely or works" kind of call.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the extra verification.

// used to cause the rhea connection remote state to be cleared.
// This caused the in-flight sendBatch to throw an uncaught error
// if it timed out.
client.sendBatch([{ body: "test3" }]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this should "work"? Ie, one of our retries will pick up the new connection?

If so, can we verify that it didn't throw an exception (maybe wrap in a try/catch and assert?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a check to ensure it doesn't throw. I didn't initially because I only cared that for this test, the process didn't crash, but it turned out to be trivial to add this extra assertion.

Copy link
Member

@richardpark-msft richardpark-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some questions (looks like @HarshaNalluru is already asking about the isConnecting removal).

Copy link
Contributor

@ramya-rao-a ramya-rao-a left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made a few notes after the first pass, I'll need some more time to make a second pass after the current concerns are addressed

sdk/eventhub/event-hubs/src/connectionContext.ts Outdated Show resolved Hide resolved
sdk/eventhub/event-hubs/src/connectionContext.ts Outdated Show resolved Hide resolved
connectionContext.isConnectionClosing = function() {
// When the connection is not open, but the remote end is open,
// then the rhea connection is in the process of terminating.
return Boolean(!this.connection.isOpen() && this.connection.isRemoteOpen());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any case when isOpen() would return true and isRemoteOpen() would return false?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd expect this would be possible while the connection is opening and waiting for the service to respond, but I'll confirm.
What is your concern though?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far I've confirmed 1 case where isOpen() == true && isRemoteOpen() == false.

  1. When opening a connection, before the service has responded that the connection is open.

I'm currently waiting to see what happens if I leave the connection open for a long time (expecting the service to initialize the close).

I also did verify that when the client calls close, isOpen() == false && isRemoteOpen() == true until the service reports that it has closed on its end.

sdk/eventhub/event-hubs/src/connectionContext.ts Outdated Show resolved Hide resolved
sdk/eventhub/event-hubs/CHANGELOG.md Outdated Show resolved Hide resolved
sdk/eventhub/event-hubs/src/connectionContext.ts Outdated Show resolved Hide resolved
sdk/eventhub/event-hubs/src/connectionContext.ts Outdated Show resolved Hide resolved
@chradek
Copy link
Contributor Author

chradek commented Jun 4, 2020

/azp run js - eventhubs-client - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

/**
* Helper type to get the names of all the functions on an object.
*/
type FunctionPropertyNames<T> = { [K in keyof T]: T[K] extends Function ? K : never }[keyof T];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So...beautiful.....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish I could take credit but that's right out of the TypeScript handbook: https://www.typescriptlang.org/docs/handbook/advanced-types.html#distributive-conditional-types

return;
}
isDisconnecting = true;
waitForDisconnectPromise = new Promise((resolve) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the isDisconnecting variable is basically the same as waitForDisconnectPromise != null. Worth it to remove and just have the one variable to track if we're disconnecting?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah...I have a check in readyToOpenLink() that checks isDisconnecting, which just felt easier to understand than if (waitForDisconnectPromise) or if (Boolean(waitForDisconnectPromise)), but it might be better to have one source of truth.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated!

Copy link
Member

@richardpark-msft richardpark-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Honestly, the timing on that is so tricky, good find and good fix.

@chradek
Copy link
Contributor Author

chradek commented Jun 4, 2020

/check-enforcer reset

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[event-hubs] investigate TypeError thrown from rhea-promise awaitable sender
4 participants