singleton realtime instance obtains multiple connections and confuses them #1882

Closed
clickonetwo opened this issue Feb 29, 2024 · 11 comments · Fixed by #1883
Labels
bug Something isn't working. It's clear that this does need to be fixed.

Comments

@clickonetwo

clickonetwo commented Feb 29, 2024

Which version of the Ably SDK are you using?

ably-cocoa 1.2.25

On which platform does the issue happen?

iOS 16.x, 17.x (on iPhone and iPad) and macOS 14.2.1 (using MacCatalyst)

Are you using Carthage?

No

Are you using Cocoapods?

No

Which version of Xcode are you using?

Xcode 15.2
Build version 15C500b

What did you do?

I obtain a single realtime client object and use it to open two channels, sending one packet on the first and two packets on the second. This works some of the time. However, on app startup, there is an intermittent failure in which the realtime client opens two connections to the Ably server rather than one, sends the packets on the first connection, and gets a mismatched connection id error response from the server that references the second connection.

NOTE: As a workaround, I have written my code so that it always sends a throwaway packet first thing after attaching a channel, and it looks for the mismatched connection ID error to come back from that first packet. If it gets that error, it totally tears down and closes the client, and then goes through the complete client creation/attach sequence again.
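
The workaround described above can be sketched roughly as follows. This is a minimal illustration only, not the code from the linked repository and not the ably-cocoa API: `ProbeableClient`, `ProbeError`, `openWithProbe`, and `FakeClient` are all hypothetical names, and the real app would have to map the SDK's mismatched connection ID error onto something like `ProbeError.connectionIdMismatch` itself.

```swift
import Foundation

// Hypothetical error type standing in for whatever the app derives
// from the server's "mismatched connection ID" response.
enum ProbeError: Error, Equatable {
    case connectionIdMismatch
    case other(String)
}

// Hypothetical abstraction over "a realtime client with an attached channel".
protocol ProbeableClient {
    /// Sends a throwaway packet; throws if the server rejects it.
    func sendProbe() throws
    /// Tears the client down and closes its connection.
    func close()
}

/// Builds a client, sends a throwaway packet, and rebuilds the client from
/// scratch if that packet comes back with a connection ID mismatch.
func openWithProbe<C: ProbeableClient>(maxAttempts: Int, makeClient: () -> C) throws -> C {
    precondition(maxAttempts > 0, "need at least one attempt")
    for _ in 0..<maxAttempts {
        let client = makeClient()
        do {
            try client.sendProbe()
            return client               // clean startup: keep this client
        } catch ProbeError.connectionIdMismatch {
            client.close()              // full teardown, then try again
        } catch {
            client.close()
            throw error                 // unrelated failures are not retried
        }
    }
    throw ProbeError.connectionIdMismatch
}

// Stand-in client used only to exercise the retry loop.
final class FakeClient: ProbeableClient {
    let failsProbe: Bool
    private(set) var closed = false
    init(failsProbe: Bool) { self.failsProbe = failsProbe }
    func sendProbe() throws {
        if failsProbe { throw ProbeError.connectionIdMismatch }
    }
    func close() { closed = true }
}

// Simulate the log described in this report: two bad startups, then a good one.
var built: [FakeClient] = []
let client = try openWithProbe(maxAttempts: 3) { () -> FakeClient in
    let next = FakeClient(failsProbe: built.count < 2)
    built.append(next)
    return next
}
print("attempts: \(built.count), final client closed: \(client.closed)")
```

Capping the number of attempts keeps a persistent mismatch from turning into an endless rebuild loop; the cap of 3 here is an arbitrary choice for the example.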

I’ve attached a debug log of a typical failure case. In this case the failure happened not once but twice, causing the client to tear down the connection a second time and then build it back up. On the third time, there is no second connection opened to the Ably server and so there is no connection mismatch error thrown and things proceed as expected. I can show you logs where there is never a second connection opened at startup, logs where a second connection is only made once on the first startup and not thereafter, and logs like this one where the second connection is made on two successive startups but not thereafter. Once I get a clean single-connection startup, I can tear down and rebuild connections as often as I want and the error never recurs. It only happens on the first (or first two) connection attempts that come after application startup.

My code is open source on GitHub. Most of the relevant code can be found in this class. The three most relevant code snippets are:

  1. The function that does the channel open/attach, and which notices the error on first packet, is tryOpenChannel.
  2. For context, the function that calls tryOpenChannel is openChannels and it calls it twice: once to open the "content" channel, which is named by a UUID, and once to open the "control" channel (named with "control"). Once the control channel is opened a non-throwaway packet is sent to announce this client's presence to others.
  3. Authentication is done in the TcpAuthenticator class, which obtains a signed AuthTokenRequest from my server after authenticating with a JWT signed by a private secret issued by my server via APNS.

What did you expect to happen?

The expected outcome is that, once authentication is complete, only one connection to the server would be opened, and all packets on both channels would be sent over it.

What happened instead?

In the failure case, the realtime client makes two different connections to the server. Messages are sent on the first connection, then the second connection is opened, then the initial set of messages is resent, and at that point the server rejects them because it expects them to carry the second connection's ID. In the attached log, for example, the ID of the first connection is _29fvQ8ykQ (line 192 of the log) and the ID of the second connection is CcI3ilmBdc (line 261 of the log).

Additional

I was asked to open this bug by Mike Clark of the support team. Developer evangelist Cameron Michie is familiar with my application and has access to a running beta.


@jamienewcomb jamienewcomb added the bug Something isn't working. It's clear that this does need to be fixed. label Feb 29, 2024
@maratal
Collaborator

maratal commented Feb 29, 2024

Thanks @clickonetwo. We're investigating this. If it's urgent, please use v1.2.24 by updating your Podfile:

pod 'Ably', '1.2.24'

@clickonetwo
Author

clickonetwo commented Feb 29, 2024

Hi @maratal thanks for the suggestion. In fact I first observed this problem on 1.2.24 and only recently updated in hopes that 1.2.25 would have a fix. So I know this problem exists in 1.2.24 as well.

@maratal
Collaborator

maratal commented Feb 29, 2024

@clickonetwo thanks! This changes a lot (still investigating though).

@maratal
Collaborator

maratal commented Mar 1, 2024

@clickonetwo could you try the branch for the fix? Thanks!

@clickonetwo
Author

@maratal Thanks for the quick work - I've switched my dependency to the fix/1882-fix-reachability-activation branch and I will start testing right away!
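
For anyone wanting to test the same way, a branch dependency might look like the manifest below. This is a sketch that assumes Swift Package Manager is in use (the report only says that Carthage and CocoaPods are not); the package name `MyApp` is hypothetical, the product name `Ably` is an assumption, and the branch may no longer exist once the fix is merged.

```swift
// swift-tools-version:5.7
import PackageDescription

let package = Package(
    name: "MyApp", // hypothetical app package name
    platforms: [.iOS(.v16)],
    dependencies: [
        // Track the fix branch named in this thread instead of a tagged release.
        .package(
            url: "https://github.com/ably/ably-cocoa",
            branch: "fix/1882-fix-reachability-activation"
        ),
    ],
    targets: [
        .target(
            name: "MyApp",
            // Product name "Ably" is assumed here.
            dependencies: [.product(name: "Ably", package: "ably-cocoa")]
        ),
    ]
)
```

A branch dependency tracks the branch head, so it is best swapped back to a version requirement once a release containing the fix is available.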

@clickonetwo
Author

Quick update after initial testing: seems to have fixed the problem! I've gone through about 15 app launches and initial connections and not seen the problem once. That would never have happened before - it was about 1 out of every 5 launches.

I am about to release a build with this library to my beta testers. That will get us close to 1000 launches in the next week or two. I'll report back on what I hear.

@maratal
Collaborator

maratal commented Mar 1, 2024

Nice work! @clickonetwo

@clickonetwo
Author

Hi @maratal, just wanted to report that I've now had hundreds of sessions started against your branch of the library and there has been no trace of this problem. Thanks so much for your quick work on this fix! Do you have any estimate for when your PR will be accepted into the main line and released?

@maratal
Collaborator

maratal commented Mar 8, 2024

Thanks @clickonetwo, I will make a release today.

@maratal
Collaborator

maratal commented Mar 8, 2024

This is now released @clickonetwo

@clickonetwo
Author

clickonetwo commented Mar 8, 2024 via email
