Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encapsulated package breaks the gateway #17

Closed
trivialkettle opened this issue Aug 27, 2024 · 16 comments
Closed

Encapsulated package breaks the gateway #17

trivialkettle opened this issue Aug 27, 2024 · 16 comments

Comments

@trivialkettle
Copy link

Hi,
when I send the following encapsulated package to the gateway

09 fe 00 d4 3e 27 6e 03  | ....>'n.
c3 1e 04 04 01 00 0a 64  | .......d
6f 6e 67 6c 65 2d 44 34  | ongle-D4
3a 33 45 3a 32 37 3a 36  | :3E:27:6
45 3a 30 33 3a 43 33     | E:03:C3

the gateway crashes with the message

$ cc_mqttsn_gateway_app -c ./gateway.conf
[INFO]: New UDP client connection from: 127.0.0.1:45576
cc_mqttsn_gateway_app: /xxx/cc.mqttsn.libs/gateway/app/gateway/GatewaySession.cpp:58: bool cc_mqttsn_gateway_app::GatewaySession::start(): Assertion `m_termReqCb' failed.

This is the first package my forwarder sends, so it did not connect or anything before.

Thanks for you help.

@arobenko
Copy link
Member

arobenko commented Aug 27, 2024

Please retry with "develop" branch.
At this stage the gateway application creates a connection to a broker for every Session object. When the packet arrives and not parsed yet, it is considered to be a legit client and a Session object is created for it, which initiates connection to a broker right away. When the message is parsed new "encapsulated" Session object is created and also initiates a connection to the broker. Due to the fact the the first Session object doesn't really send any data to a broker, the latter closes the connection, then the gateway application re-opens it again a bit later. As the result there is a constant loop of creating and closing one redundant connection to a broker.

I will work on making the connection to a broker being a "lazy" one, i.e. established only when there is a data to be sent to a broker.

@trivialkettle
Copy link
Author

Hi,
thanks for the fast solution. Now the forwarded client can connect to the broker.

Though another problem appeared.

After connecting I let the client idle and the gateway crashed with SIGSEGV

stdout:

[INFO]: New UDP client connection from: 127.0.0.1:56212
[INFO]: Connected client: dongle-D4:3E:27:6E:03:C3
[ERROR]: TCP/IP read error: End of file
[ERROR]: TCP/IP read error: End of file
[INFO]: Terminating session for client: 
[INFO]: Terminating session for client: dongle-D4:3E:27:6E:03:C3
[INFO]: Terminated UDP client connection from: 127.0.0.1:56212
Signal: SIGSEGV (Segmentation fault)

stack trace:

operator()<…> GatewaySession.cpp:343
operator() GatewaySession.cpp:339
main main.cpp:49

Mosquitto Broker Log:

broker-1  | 1724833057: New connection from 192.168.32.1:42778 on port 1883.
broker-1  | 1724833057: New connection from 192.168.32.1:42776 on port 1883.
broker-1  | 1724833057: New client connected from 192.168.32.1:42776 as dongle-D4:3E:27:6E:03:C3 (p2, c1, k10).
broker-1  | 1724833058: Client dongle-D4:3E:27:6E:03:C3 disconnected.
broker-1  | 1724833058: New connection from 192.168.32.1:42792 on port 1883.
broker-1  | 1724833058: New client connected from 192.168.32.1:42792 as dongle-D4:3E:27:6E:03:C3 (p2, c0, k10).
broker-1  | 1724833151: Client <unknown> has exceeded timeout, disconnecting.
broker-1  | 1724833151: Client dongle-D4:3E:27:6E:03:C3 closed its connection.

It looks like, the dongle (forwarded node) connects twice but I guess only one of them gets the PINGREQ/PINGRESP. In the log of my dongle I see the PINGREQ/PINGRESP packages.

Thanks a lot for this feature, I'm more than happy to help you resolving the remaining issues.

@arobenko
Copy link
Member

Thanks, I appreciate your help. My ability to properly test the forwarding encapsulation support is limited. I kind of reluctant and don't have much time to implement proper forwarding application to be able to reproduce problems in my environment.

I've added some extra assert statements to the GatewaySession.cpp (on develop branch) which might help detect the problem. I won't have much time tomorrow, but on Friday I think I'll introduce the lazy broker connection I mentioned. After that it should be easier to get to the bottom of it.

@trivialkettle
Copy link
Author

One thing that differs between "normal" node and forwarded node is the line in the gateway log

[ERROR]: TCP/IP read error: End of file

With the new changes I get the following logs:

cc_mqttsn_gateway_app -c ./gateway.conf
[INFO]: New UDP client connection from: 127.0.0.1:60551
[INFO]: Connected client: dongle-D4:3E:27:6E:03:C3
[ERROR]: TCP/IP read error: End of file
[ERROR]: TCP/IP read error: End of file
[INFO]: Terminating session for client: 
[INFO]: Terminating session for client: dongle-D4:3E:27:6E:03:C3
[INFO]: Terminated UDP client connection from: 127.0.0.1:60551
cc_mqttsn_gateway_app: /xxx/cc.mqttsn.libs/gateway/app/gateway/GatewaySession.cpp:346: cc_mqttsn_gateway_app::GatewaySession::startSession()::<lambda(cc_mqttsn_gateway::Session*)>::<lambda()>::<lambda(auto:5&)> [with auto:5 = std::unique_ptr<cc_mqttsn_gateway_app::GatewaySession>]: Assertion `sPtr' failed.

Process finished with exit code 134 (interrupted by signal 6:SIGABRT)

And mosquitto log

broker-1  | 1724840930: New connection from 192.168.32.1:58270 on port 1883.
broker-1  | 1724840930: New connection from 192.168.32.1:58256 on port 1883.
broker-1  | 1724840930: New client connected from 192.168.32.1:58256 as dongle-D4:3E:27:6E:03:C3 (p2, c1, k10).
broker-1  | 1724840930: Client dongle-D4:3E:27:6E:03:C3 disconnected.
broker-1  | 1724840930: New connection from 192.168.32.1:41516 on port 1883.
broker-1  | 1724840930: New client connected from 192.168.32.1:41516 as dongle-D4:3E:27:6E:03:C3 (p2, c0, k10).
broker-1  | 1724841023: Client <unknown> has exceeded timeout, disconnecting.
broker-1  | 1724841023: Client dongle-D4:3E:27:6E:03:C3 closed its connection.

So there is a client from port 58270, than the forwarded node connects from port 58256 and immediately disconnects and than connects again from port 41516. As we can see the dongle does not exceeds its timeout but the unknown client. I guess the one from port 58270.

@arobenko
Copy link
Member

I think I know what the problem is, please try updated "develop" branch again.

@trivialkettle
Copy link
Author

Did not solve it for me, my device does not get any PING/RESP/PINGREQ after the session is terminated for the dongle

Gateway log:

[INFO]: New UDP client connection from: 127.0.0.1:43269
[INFO]: Connected client: dongle-D4:3E:27:6E:03:C3
[ERROR]: TCP/IP read error: End of file
[ERROR]: TCP/IP read error: End of file
[INFO]: Terminating session for client: 
[INFO]: Terminating session for client: dongle-D4:3E:27:6E:03:C3
[INFO]: Terminated UDP client connection from: 127.0.0.1:43269
[INFO]: New UDP client connection from: 127.0.0.1:43269
[ERROR]: TCP/IP read error: End of file
[INFO]: Terminating session for client: 
[INFO]: Terminating session for client: 
[INFO]: Terminated UDP client connection from: 127.0.0.1:43269

mosquitto log:

broker-1  | 1724843537: New connection from 192.168.32.1:48172 on port 1883.
broker-1  | 1724843537: New connection from 192.168.32.1:48156 on port 1883.
broker-1  | 1724843537: New client connected from 192.168.32.1:48156 as dongle-D4:3E:27:6E:03:C3 (p2, c1, k10).
broker-1  | 1724843538: Client dongle-D4:3E:27:6E:03:C3 disconnected.
broker-1  | 1724843538: New connection from 192.168.32.1:48180 on port 1883.
broker-1  | 1724843538: New client connected from 192.168.32.1:48180 as dongle-D4:3E:27:6E:03:C3 (p2, c0, k10).
broker-1  | 1724843633: Client <unknown> has exceeded timeout, disconnecting.
broker-1  | 1724843633: Client dongle-D4:3E:27:6E:03:C3 closed its connection.
broker-1  | 1724843638: New connection from 192.168.32.1:52264 on port 1883.
broker-1  | 1724843638: New connection from 192.168.32.1:52280 on port 1883.
broker-1  | 1724843729: Client <unknown> has exceeded timeout, disconnecting.
broker-1  | 1724843729: Client <unknown> has exceeded timeout, disconnecting.

@arobenko
Copy link
Member

arobenko commented Aug 28, 2024 via email

@arobenko
Copy link
Member

Introduced the following changes on "develop" branch:

  • The connection to the broker is lazy, i.e. TCP/IP connection is established on the first attempt to send data to the broker. It results in only encapsulated session object having TCP/IP connection to the broker while the primary one, which handles the client / forwarder connection, does not.
  • When the last encapsulated session is deleted and the primary session itself didn't send any data to the broker, the primary session object is deleted as well to free memory and avoid accumulation of allocated resources.
  • A bit extra logging messages from the gateway application. The session logging includes pointer to the handling object to distinguish what message was printed by what object.

@trivialkettle
Copy link
Author

Hi,
it seems to work so far. Thank you very much :)

But one issue still exists:
The forwarded devices get disconnected from the broker after the first connection with a TCP/IP Error in gateway log and reconnect. However, I don't see this disconnect in the device log, my device connects and stays connected.

Gateway log:

[INFO]: New UDP client connection from: 127.0.0.1:34193
[INFO]: (0x5cfc4b3c8af0) New primary session
[INFO]: (0x5cfc4b3cad70) New enapsulated session
[INFO]: (0x5cfc4b3cad70) Connected client: dongle-D4:3E:27:6E:03:C3
[ERROR]: TCP/IP read error: End of file
[INFO]: (0x5cfc4b3cbd30) New enapsulated session
[INFO]: (0x5cfc4b3cbd30) Connected client: client-82:C5:FA
[ERROR]: TCP/IP read error: End of file

and mosquitto log:

broker-1  | 1725001667: New connection from 192.168.32.1:36628 on port 1883.
broker-1  | 1725001667: New client connected from 192.168.32.1:36628 as dongle-D4:3E:27:6E:03:C3 (p2, c1, k10).
broker-1  | 1725001667: Client dongle-D4:3E:27:6E:03:C3 disconnected.
broker-1  | 1725001667: New connection from 192.168.32.1:36630 on port 1883.
broker-1  | 1725001667: New client connected from 192.168.32.1:36630 as dongle-D4:3E:27:6E:03:C3 (p2, c0, k10).
broker-1  | 1725002024: New connection from 192.168.32.1:56598 on port 1883.
broker-1  | 1725002024: New client connected from 192.168.32.1:56598 as client-82:C5:FA (p2, c1, k10).
broker-1  | 1725002051: Client client-82:C5:FA disconnected.
broker-1  | 1725002051: New connection from 192.168.32.1:47428 on port 1883.
broker-1  | 1725002051: New client connected from 192.168.32.1:47428 as client-82:C5:FA (p2, c0, k10).

I let it run for some hours to see if anything breaks.

@arobenko
Copy link
Member

The "End of file" TCP/IP error usually means that the remote end (broker) closes its TCP/IP socket (connection). When such closure happens the gateway detects it and automatically re-connects using the recorded credentials picked up by the CONNECT / CONNACK messages exchange without notifying the client. That's why your client is not aware of it. I thought about implementing gateway logic to send DISCONNECT message to the client and then let the client to preform re-connection logic. However, it can create corner case races where client sends DISCONNECT message to enter sleep mode and if right after receiving such DISCONNECT message from the gateway, the client will perceive it as a sleep mode confirmation, while for gateway the client will be disconnected, not asleep. That's why I decided against it.

There is a need to perform a thorough analysis of why the broker disconnects the client. The first step should be analyzing the MQTT v3.1.1 messages exchanged between the gateway and the broker. I guess you can use wireshark for that purpose, or you can use CommsChampion Tools for that purpose. To do so you can build the all the projects of CommsChampion Ecosystem in one go using cc.cmake project.

cd /path/to/cc.cmake
mkdir build && cd build
cmake .. -DCMAKE_INSTALL_PREFIX=install -DCMAKE_PROJECT_INCLUDE=config/All.cmake
cmake --build . --parallel 16

After the build is complete use ./install/bin/cc_view.sh to run the CommsChampion Tools (see wiki for instructions of how to use). Select "TCP/IP Proxy Socket" as your socket plugin, configure it to connect to remote port 1883 while selecting other port for incoming connections. Select "MQTT v3.1.1 Protocol" as your protocol plugin. After the connection is started, configure your gateway application to connect to the proxy port you chose when configuring socket plugin of the CommsChampion Tools. Now you should be able to see the MQTT messages exchanged between the gateway and the broker with nice visual breakdown of the message fields.

Please let me know if you see any discrepancies between what you expect in those messages and when is actually been sent. Is "keep alive" configuration as expected?

@arobenko
Copy link
Member

arobenko commented Sep 1, 2024

I've added an ability to emulate encapsulated session to the pub / sub client applications.

  • Emulating sub client
./cc_mqttsn_client_sub -c client1 -t "#" --fwd-enc-node-id "\x00\x01"

The output from the gateway is:

[INFO]: New UDP client connection from: 127.0.0.1:38853
[INFO]: (0x614000000040) New primary session
[INFO]: (0x614000000240) New encapsulated session
[INFO]: (0x614000000240) Connected client: client1

Add two "pub" clients sending from the same port but claiming different node IDs:

./cc_mqttsn_client_pub -c client2 -t "bla" -m "hello" -P 20000 --pub-no-disconnect --fwd-enc-node-id "\x00\x02"
./cc_mqttsn_client_pub -c client3 -t "bla" -m "hello" -P 20000 --pub-no-disconnect --fwd-enc-node-id "\x00\x03"

The output from the gateway is:

[INFO]: New UDP client connection from: 127.0.0.1:20000
[INFO]: (0x614000000440) New primary session
[INFO]: (0x614000000640) New encapsulated session
[INFO]: (0x614000000640) Connected client: client2
[INFO]: (0x614000000840) New encapsulated session
[INFO]: (0x614000000840) Connected client: client3

After a while when keep-alive value expires the "publish" clients are disconnected by the gateway (before the broker forces disconnection):

[INFO]: (0x614000000640) Terminating session for client: client2
[INFO]: (0x614000000840) Terminating session for client: client3
[INFO]: (0x614000000440) Terminating session for client: 
[INFO]: Terminated UDP client connection from: 127.0.0.1:20000

Everything seems to be working as expected, no unexpected termination from the broker.
Also note, that previously I hard-coded the "radius" value in the forwarding encapsulation header to be "3". Maybe your forwarder didn't like it and did some unexpected stuff with it. Now I removed it and it's value became "0". Please retry again. If the problem persists you need to run a thorough traffic analysis on your setup and see if there are any discrepancies.

@arobenko
Copy link
Member

arobenko commented Sep 3, 2024

I'm closing this ticket. If after a thorough traffic analysis you come to a conclusion that the gateway doesn't behave as expected, please open a new ticket.

@arobenko arobenko closed this as completed Sep 3, 2024
@trivialkettle
Copy link
Author

Hi,
sorry for the delay, I got another important task. I guess I can work on this next week again.

@trivialkettle
Copy link
Author

Hi,
I think I found it in the wireshark log.

After connect I use mqttsn_bare_metal_client_will_update to update the will message. This results in a Disconnect Req from the gateway to the broker. Then happens some TCP magic and after that, the gateway talks over a different port to the broker. This whole process is transparent to the client, as you may know, because you wrote the mqttsn client :)

@arobenko
Copy link
Member

arobenko commented Sep 5, 2024

That makes sense. The MQTT v3.1.1 doesn't have an ability to change a will on the fly, requiring disconnection and re-connection from the gateway. However, the client should set its will through the connection procedure, rather than updating it after the connection is established. It will result in a single connection to the broker without reconnection later.

@trivialkettle
Copy link
Author

Due to some system design decisions, I need to subscribe some topics first, to know what my Will_Topic will be, so I need to update the will later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants