MQTTProperties_free function cause crash in 1.3.7 #1009

Closed
kenxausten opened this issue Dec 4, 2020 · 12 comments

Comments

@kenxausten

Describe the bug

Program terminated with signal SIGSEGV, Segmentation fault.
#0 0xb6ca623a in MQTTProperties_free () from /root/FEP/edge-dpf/lib/support/libpaho-mqtt3cs.so.1
[Current thread is 1 (Thread 0xac9f4440 (LWP 32110))]
(gdb) bt
#0 0xb6ca623a in MQTTProperties_free () from /root/FEP/edge-dpf/lib/support/libpaho-mqtt3cs.so.1
#1 0xb6c930e0 in MQTTClient_freeMessage () from /root/FEP/edge-dpf/lib/support/libpaho-mqtt3cs.so.1
#2 0xb6effd54 in enos_message_arrived (context=0x111f3b0, topic_name=0x11deba0 "3-11eb-89be-3ee4e2b298f38", topic_len=0, message=0x1208460)
at /home/envuser/work/cross-compile/enos-api-sdk-c_moxa/src/enos_c_api/mqtt.c:1209
#3 0xb6c936a8 in MQTTClient_run () from /root/FEP/edge-dpf/lib/support/libpaho-mqtt3cs.so.1
#4 0xb6a145d8 in start_thread (arg=0x0) at pthread_create.c:458
#5 0xb615c6fa in ?? () at ../sysdeps/unix/sysv/linux/arm/clone.S:76 from /lib/arm-linux-gnueabihf/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

To Reproduce
Take a library trace as outlined in the README, and/or have a program or describe the steps to reproduce the behavior:
1. Just use the library to send MQTT messages.

Expected behavior
It should not crash.

Screenshots
If applicable, add screenshots to help explain your problem.
image

Environment (please complete the following information):

  • OS: linux
  • Version 4.4.0-cip-rt-moxa-imx7d

Additional context

@icraggs
Contributor

icraggs commented Dec 4, 2020

I don't think "Just using the library to send a message" is enough to reproduce, otherwise the tests and samples would not work. See the sample paho_cs_sub for an example of the use of MQTTClient_freeMessage.

Note that MQTTClient_freeMessage takes a MQTTClient_message** as a parameter, not MQTTClient_message*.

If it isn't that, taking a client library trace (at the minimum level) may explain further what's going on.
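
To illustrate the pointer-to-pointer convention mentioned above, here is a minimal, self-contained sketch. The type `demo_message` and the functions `demo_message_new`, `demo_free_message`, and `demo_round_trip` are hypothetical stand-ins, not Paho API. The only point is that the free function receives the address of the caller's pointer, so it can release the memory and reset that pointer. In a real message-arrived callback this would correspond to passing `&message` to MQTTClient_freeMessage.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a heap-allocated message; NOT the Paho type. */
typedef struct {
    void *payload;
    int payloadlen;
} demo_message;

/* Allocate a message that owns a copy of the given text. */
demo_message *demo_message_new(const char *text) {
    demo_message *m = malloc(sizeof *m);
    if (m == NULL)
        return NULL;
    m->payloadlen = (int)strlen(text);
    m->payload = malloc((size_t)m->payloadlen + 1);
    if (m->payload == NULL) {
        free(m);
        return NULL;
    }
    memcpy(m->payload, text, (size_t)m->payloadlen + 1);
    return m;
}

/* Takes the ADDRESS of the caller's pointer, frees the message,
 * and sets the caller's pointer to NULL so it cannot be reused. */
void demo_free_message(demo_message **msg) {
    if (msg == NULL || *msg == NULL)
        return;
    free((*msg)->payload);
    free(*msg);
    *msg = NULL;
}

/* Usage: returns 1 if the caller's pointer was reset by the free call. */
int demo_round_trip(void) {
    demo_message *m = demo_message_new("hello");
    assert(m != NULL);
    demo_free_message(&m); /* note the & : pointer to pointer */
    return m == NULL;
}
```

Because the caller's pointer is reset, a second call with the same variable becomes a harmless no-op in this sketch instead of a double free.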

@kenxausten
Author

I do use MQTTClient_message**. This issue only occurs in some situations and is hard to reproduce.
The library is used as follows:

I create a global MQTTClient variable, and many threads send and receive messages using this global variable.
When the connection is lost, I reuse the global variable: I call MQTTClient_disconnect(&client, 0) to release the last connection and then MQTTClient_connect to reconnect. When I tested this with 1.3.0, it was easy to hit a crash, so I switched to 1.3.7, where I only encounter the issue above.
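
The usage pattern described above (one shared handle, several publishing threads, and a reconnect path) can be sketched with an application-level lock. This is only an illustration under stated assumptions: `shared_client`, `shared_client_init`, `shared_client_publish`, `shared_client_reconnect`, and `publisher_thread` are hypothetical names, the connect, disconnect, and publish steps are replaced by stand-in state changes, and nothing in this thread establishes whether such locking is required for, or sufficient to fix, the crash in the Paho client.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical wrapper around a shared client handle; NOT part of Paho. */
typedef struct {
    pthread_mutex_t lock; /* serializes publish and reconnect */
    bool connected;       /* stand-in for the connection state */
    int published;        /* number of successful stand-in publishes */
} shared_client;

void shared_client_init(shared_client *c) {
    pthread_mutex_init(&c->lock, NULL);
    c->connected = true;
    c->published = 0;
}

/* Publish only while holding the lock and only if connected.
 * Returns 0 on success, -1 if the client is not connected. */
int shared_client_publish(shared_client *c) {
    int rc = -1;
    pthread_mutex_lock(&c->lock);
    if (c->connected) {
        /* A real program would call its publish function here. */
        c->published++;
        rc = 0;
    }
    pthread_mutex_unlock(&c->lock);
    return rc;
}

/* Disconnect and reconnect as one critical section, so no publisher
 * can observe the handle between the two steps. */
void shared_client_reconnect(shared_client *c) {
    pthread_mutex_lock(&c->lock);
    c->connected = false; /* stand-in for the disconnect call */
    c->connected = true;  /* stand-in for the connect call */
    pthread_mutex_unlock(&c->lock);
}

/* Thread entry point: publish 1000 times on the shared client. */
void *publisher_thread(void *arg) {
    shared_client *c = arg;
    for (int i = 0; i < 1000; i++)
        shared_client_publish(c);
    return NULL;
}
```

The design choice here is to hold the lock across both reconnect steps rather than around each one separately, so publishers never see a half-reconnected handle.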

@kenxausten
Author

There seems to be a thread safety issue. The following picture was captured with Valgrind's Helgrind tool. When the MQTT connection is lost and the client reconnects, Valgrind gives the following thread safety warning.

image

@icraggs
Contributor

icraggs commented Dec 8, 2020

I'm a bit confused now. The issue title says crash in 1.3.7. What issue(s) are you getting in 1.3.7?

@kenxausten
Author

I am using this library (1.3.7) from multiple threads. When the connection is lost, I call a reconnect function that I wrote myself.
The reconnect function calls MQTTClient_disconnect and MQTTClient_connect, as the following picture shows.

Other threads call the publish/subscribe functions to send and receive messages. While the program runs, it crashes at random in the function MQTTProperties_free.

I guess it is caused by a thread safety issue. Which functions are not thread safe?

image

image

@icraggs
Contributor

icraggs commented Dec 11, 2020

Would you like to try this change to see if it helps? Comment out lines 861 and 863 (in the master branch), which unlock and lock the mqttclient_mutex around the call to the message-arrived callback:

			//Thread_unlock_mutex(mqttclient_mutex);
			rc = (*(m->ma))(m->context, qe->topicName, topicLen, qe->msg);
			//Thread_lock_mutex(mqttclient_mutex);

@kenxausten
Author

Is this a bug that will be fixed in a new version?

@icraggs
Contributor

icraggs commented Jan 15, 2021

I asked you to try out a proposed fix over a month ago and you hadn't responded.

@icraggs
Contributor

icraggs commented Jan 18, 2021

So if you can try the proposed fix above and let me know if it works or not for you, then this would allow me to consider adding it to the next release (assuming it works).

@icraggs
Contributor

icraggs commented Apr 25, 2021

No response to request to try proposed fix.

@icraggs closed this as completed Apr 25, 2021

@Danielsuri

I have the same issue as #995, and the fix doesn't make things better.

@Danielsuri

Would you like to try this change to see if it helps? Comment out lines 861 and 863 (in the master branch), which unlock and lock the mqttclient_mutex around the call to the message-arrived callback:

			//Thread_unlock_mutex(mqttclient_mutex);
			rc = (*(m->ma))(m->context, qe->topicName, topicLen, qe->msg);
			//Thread_lock_mutex(mqttclient_mutex);

The problem with this is that MQTTClient_publish5 then tries to lock the mutex with Thread_lock_mutex(mqttclient_mutex) while it is already locked.
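
The relocking concern can be demonstrated in isolation with plain POSIX threads. The sketch below does not use Paho at all: `relock_same_thread` is a hypothetical helper, and the mutex is deliberately created as an error-checking mutex so that a second lock attempt by the same thread is reported as an error instead of hanging. Whether mqttclient_mutex behaves this way depends on how the library creates it, which this thread does not show.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700 /* expose PTHREAD_MUTEX_ERRORCHECK on strict compilers */
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Locks an error-checking mutex twice from the same thread and returns
 * the result of the second attempt. */
int relock_same_thread(void) {
    pthread_mutexattr_t attr;
    pthread_mutex_t mutex;
    int rc;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&mutex, &attr);
    pthread_mutexattr_destroy(&attr);

    rc = pthread_mutex_lock(&mutex); /* first lock: succeeds */
    assert(rc == 0);

    /* Second lock by the owner: an error-checking mutex reports EDEADLK.
     * A default mutex could block forever here instead. */
    rc = pthread_mutex_lock(&mutex);

    pthread_mutex_unlock(&mutex);
    pthread_mutex_destroy(&mutex);
    return rc;
}
```

This mirrors the situation described above: if the callback runs with the mutex still held and then calls a function that takes the same non-recursive mutex, the second lock cannot succeed.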
