Fix MQTTAsync_terminate() to re-check the number of handles after re-acquisition of mqttasync_mutex #1124
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description
Analyzing a few dumps revealed that the crash happens in the PAHO library in the background connection read thread, in function Socket_continueWrites(), which receives its socket argument pointing to de-allocated memory. Further investigation revealed that the cause for that is that the global variables, including variable mod_s in Socket.c, are cleaned up while the background threads are still alive.
Investigating further, the following race condition was found, leading to the possibility of having background threads running with global variables de-initialized:
MQTTAsync_destroy() , after freeing resources allocated to the given MQTTAsync handle, it checks if this was the last one, in which case it also calls MQTTAsync_terminate() to terminate the background threads and uninitialize the global variables
MQTTAsync_terminate() calls MQTTAsync_stop() which asks the background threads to terminate, by setting MQTTAsync_tostop to 1 (=true), and then waits for the background threads to actually terminate.
During that wait, mqttasync_mutex is released (and re-aquired afterwards), which opens the possibility that an MQTTAsync_create() and even MQTTAsync_connect() are called in the meantime
MQTTAsync_connect() resets MQTTAsync_tostop to 0 (false) and restarts the background threads if already terminated.
Even in that case events at 3-4 happen, MQTTAsync_terminate() continues by destroying the global variables, effectively corrupting the global data used by the background threads (and also by future API calls).
The main part of the fix is, in MQTTAsync_terminate(), to re-check the number of handles after re-aquisition of mqttasync_mutex, and refrain from deleting the global variables if the number of handles is no longer zero.
Call stack
#0 0x00007fffdfb20d3b in Socket_continueWrites (pwset=0x7fffbfffebc0, sock=0x7fffbfffeba4)
at /builds/di/stork/raven/Raven/src/tools/externals/paho.mqtt.c/src/Socket.c:1001
#1 0x00007fffdfb1e4c8 in Socket_getReadySocket (more_work=0, tp=0x7fffbfffece0,
mutex=0x7fffdfd4d3a0 <socket_mutex_store>, rc=0x7fffbfffeca8)
at /builds/di/stork/raven/Raven/src/tools/externals/paho.mqtt.c/src/Socket.c:287
#2 0x00007fffdfb0ca72 in MQTTAsync_cycle (sock=0x7fffbfffedbc, timeout=1000, rc=0x7fffbfffedb8)
at /builds/di/stork/raven/Raven/src/tools/externals/paho.mqtt.c/src/MQTTAsyncUtils.c:2854
#3 0x00007fffdfb08589 in MQTTAsync_receiveThread (n=0x7fffe406eea8)
at /builds/di/stork/raven/Raven/src/tools/externals/paho.mqtt.c/src/MQTTAsyncUtils.c:1955
#4 0x00007ffff3ce56db in start_thread (arg=0x7fffbffff700) at pthread_create.c:463
#5 0x00007ffff2ecb71f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
To Reproduce
The crash is sporadically reproduced by running automated tests that are stressing MQTT test connection functionality in our application.
Environment
Docker Image: Linux 5.8.0-53-generic #60~20.04.1-Ubuntu SMP Thu May 6 09:52:46 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Signed-off-by: Victor Rosca [email protected]
Thank you for your interest in this project managed by the Eclipse Foundation.
The guidelines for contributions can be found in the CONTRIBUTING.md file.
At a minimum, you must sign the Eclipse ECA, and sign off each commit.
To complete and submit a ECA, log into the Eclipse projects forge
You will need to create an account with the Eclipse Foundation if you have not already done so.
Be sure to use the same email address when you register for the account that you intend to use when you commit to Git.
Go to https://accounts.eclipse.org/user/eca to sign the Eclipse ECA.