Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Subscribe does not remove subscriptions from map on failure #94

Closed
shekharhimanshu opened this issue Apr 23, 2018 · 7 comments
Closed
Labels
closed-for-staleness help wanted We are asking the community to submit a PR to resolve this issue.

Comments

@shekharhimanshu
Copy link
Contributor

Hello,

It looks like Subscribe api inserts the subscription into the subscription map before sending the actual request.

It then waits for the SUBACK packet.

Subscribe gives up if say, SUBACK is not received within specified time and ResponseCode::MQTT_REQUEST_TIMEOUT_ERROR is returned to the user.

However, the subscription map is left untouched. Ideally since subscribe to the topic was not successful, it should be removed from the map to avoid any side effects.

One such side effect is inside the Keep-Alive thread:
The Keep-Alive thread will try to resubscribe to topics present in subscription map and hence will also retry those not successful subscriptions each time. In worst case scenario, reconnect might never succeed due to bad subscriptions present inside the subscription map.

Thanks.

@vareddy-zz
Copy link
Contributor

Hi @shekharhimanshu,
The Keepalive thread does not call the PerformAction API to resubscribe after a reconnect. It directly calls WriteToNetworkBuffer after constructing the Subscription packet. It will only deal with errors in the write call being made and a bad subscription will not make it fail indefinitely.

PerformAction is agnostic to the type of MQTT action being performed. Hence we can't do any automatic cleanup after PerformAction fails as we cannot distinguish between a Subscribe failure or a Publish failure.

The cleanup can be performed manually by calling RemoveAllSubscriptionsForPacketId but this API isn't currently exposed by MqttClient.

I've added a task in our queue for exposing this API. Please let us know if you have any more questions.
Thanks!
Varun

@shekharhimanshu
Copy link
Contributor Author

Hi @vareddy

It will only deal with errors in the write call being made and a bad subscription will not make it fail indefinitely.

Yes, you are correct about that. But consider the following sequence:

  1. User tries to subscribe to a topic, say TopicA not allowed by AWS IoT policies.
  2. In that case, I found that AWS IoT MQTT broker disconnects the client automatically. (Could not find a documentation for this but I was able to reproduce this with PubSub sample)
  3. Subscribe was not successful but it was retained inside the subscription map by the SDK. (User has no idea about this)
  4. Keep-alive thread sends PING and fails.
  5. Keep-alive thread tries to CONNECT -> OK
  6. Keep-alive thread tries to subscribe TopicA again and fails (since not allowed inside policies). Also client is automatically disconnected by AWS IoT MQTT broker.
  7. After some time PING is send again and it fails.
  8. Go to 4.

When above happens, Keep-alive thread will enter into an infinite loop of connect and disconnect.
However, such cases might be rare.

My point was there may be side effects of not cleaning unsuccessful subscriptions from the subscription map and hence IMO, should be handled with.

PerformAction is agnostic to the type of MQTT action being performed. Hence we can't do any automatic cleanup after PerformAction fails as we cannot distinguish between a Subscribe failure or a Publish failure.

Yes, you are right about that too. I tried to correct this myself but due to above reason could not come up with a nice solution without drastically changing the design of current SDK.

The cleanup can be performed manually by calling RemoveAllSubscriptionsForPacketId but this API isn't currently exposed by MqttClient.
I've added a task in our queue for exposing this API. Please let us know if you have any more questions.

Thank you for that. But my intention was not that though. In fact, users using this SDK have no idea about subscription maps and how reconnect thing works. So IMO, having RemoveAllSubscriptionsForPacketId api wont be of much help to them.

Thanks

@vareddy-zz
Copy link
Contributor

vareddy-zz commented Apr 26, 2018

Hi @shekharhimanshu,
You are right on all counts. I also realized that the customer won't be able to call RemoveAllSubscriptions as they will not be able to relate the connection failure to the subscribe action failing, even if they were aware of how subscriptions were being stored in the library.

A correction to the above is that PerformAction is aware of the MQTT operation being performed but cannot access the ClientState class members as ClientState inherits ClientCoreState.

One solution would be to keep track of at least one successful subscribe in another variable to ensure that the reconnect does not try to resubscribe to failed subscriptions (checking just the active flag will not help as the flag will be reset on a disconnect). Again this does not help in cleanup of failed subscriptions. That will require major redesign. Does this make sense to you?

This can be accompanied by another API that cleans up the subscriptions that failed the first time. This won't refer to the map or packet ID, so the end user can call this on a failed subscription.

I implemented the secondary flag fix and it seems to work for the basic testing that I did.
Thanks!
Varun

@shekharhimanshu
Copy link
Contributor Author

One solution would be to keep track of at least one successful subscribe in another variable to ensure that the reconnect does not try to resubscribe to failed subscriptions (checking just the active flag will not help as the flag will be reset on a disconnect). Again this does not help in cleanup of failed subscriptions. That will require major redesign. Does this make sense to you?

Yes. I had this in my mind as well but seemed a bit hacky to me. So didn't propose it as a solution.
Also as you rightly mentioned, it wouldn't solve the real problem but rather solve one of its side effects.

This can be accompanied by another API that cleans up the subscriptions that failed the first time. This won't refer to the map or packet ID, so the end user can call this on a failed subscription.

What kind of api do you have in mind? Is it something like RemoveSubscription(topic name)?

IMO, we can create such an api and mention the subscription map behavior as a known issue with above as its workaround till some major re-design happens.
What do you think?

I implemented the secondary flag fix and it seems to work for the basic testing that I did.

That's great! Thank you for trying.
If it passes your other tests, we should include it as well.

@vareddy-zz
Copy link
Contributor

Hi @shekharhimanshu,
Yes, I agree it seems like a hack but apart from a complete restructure of the subscription and SUBACK mechanism, we will not be able to counter the issue.

With reference to the API to remove subscriptions, it could either remove all subscriptions that failed the first time (the addition of the extra flag makes it easy), or remove specific subscriptions as you suggested.
Thanks!
Varun

@shekharhimanshu
Copy link
Contributor Author

shekharhimanshu commented May 1, 2018

With reference to the API to remove subscriptions, it could either remove all subscriptions that failed the first time (the addition of the extra flag makes it easy), or remove specific subscriptions as you suggested.

Both of your proposals LGTM. Thanks.

@justnance justnance added help wanted We are asking the community to submit a PR to resolve this issue. and removed help wanted labels Apr 19, 2019
@github-actions
Copy link

Greetings! Sorry to say but this is a very old issue that is probably not getting as much attention as it deservers. We encourage you to check if this is still an issue in the latest release and if you find that this is still a problem, please feel free to open a new one.

@github-actions github-actions bot added closing-soon This issue will automatically close in 4 days unless further comments are made. closed-for-staleness and removed closing-soon This issue will automatically close in 4 days unless further comments are made. labels May 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
closed-for-staleness help wanted We are asking the community to submit a PR to resolve this issue.
Projects
None yet
Development

No branches or pull requests

4 participants