Data forwarding repeatedly stopping #5

marcus-aa · 2020-09-03T16:44:34Z

I have a problem on 2 different installations where BS just stops forwarding the data after 24-36 hours after a re-start.

This only happens where BS connects over LAN/TCP to the source, never when it uses a /dev/usb port.

INPUT_OPTIONS="--net 192.168.10.158:10003"

The TCP connection stays up, BS answers new local connections to the output ports, but no data is coming through.
if I stop and start the service, everything works again for a day or two.

TCP keepalive issue? In my experience they cause this kind of problems, better just check if data is coming in or re-connect.
The source are Radarcapes.
EDIT: Also with Dump1090 on the other end it shows the same problem.

● beast-splitter.service - Mode S Beast data forwarder
Loaded: loaded (/lib/systemd/system/beast-splitter.service; enabled; vendor preset: enabled)
Active: active (running) since Thu 2020-08-27 09:17:02 CEST; 1 weeks 0 days ago
Docs: https://github.com/flightaware/beast-splitter
Main PID: 391 (beast-splitter)
CGroup: /system.slice/beast-splitter.service
└─391 /usr/bin/beast-splitter --net 192.168.10.158:10003 --listen localhost:35005:RCdfGi --listen localhost:45005:Bcdfgij

mutability · 2020-09-03T17:24:43Z

When it gets stuck, what is the state of the TCP connection? Is new data actually arriving from the remote host? Is there data waiting in the receive buffer that beast-splitter isn't reading? What does the Radarcape think the state of the connection is?

marcus-aa · 2020-09-03T19:42:22Z

I have another one here that is stuck right now if you need further details. Here the other side is actually dump1090-fa on another Rpi.

INPUT_OPTIONS="--net 192.168.1.8:30005"
OUTPUT_OPTIONS="--listen localhost:30005:BCdfGi --connect localhost:30004:BCdfGi"

All connections show "Established".

netstat -tpn | grep beast
tcp        0      0 127.0.0.1:54598         127.0.0.1:30004         ESTABLISHED 16357/beast-splitte 
tcp        0      0 127.0.0.1:30005         127.0.0.1:54054         ESTABLISHED 16357/beast-splitte 
tcp        0      0 192.168.1.77:44636      192.168.1.8:30005       ESTABLISHED 16357/beast-splitte 
tcp        0      0 127.0.0.1:30005         127.0.0.1:52514         ESTABLISHED 16357/beast-splitte

There is plenty of data if I make a parallell connection to the same port, but none on the BS output ports.

root@P-ESSE01:~# nc localhost 30005
^C
root@P-ESSE01:~#

root@P-ESSE01:~# nc 192.168.1.8 30005 
2??R?K?]Fv?r2??R?x?=P?3??R?켠0?????s?3??R??-w???`0???Z?2??SK?]Fv?r3??S?w????????B??3??Sа????*A4?
                                                                                                }k?2??S?? ?Œ?2??SLλ]Fv?r3??S-?;?FX??qH??2??S?f]D	?^T?3??S?'?0?????s?2??LO)]BB??vQ3????'???g: ?է!2??S	^?<]F??e$3??S	ѻ'?0?9?0?>?)2??S
ҫ& 0K?)2??S
           ??? 0K?+3??S
                       ??Р??i??7??@?2??S
??+?F!/?!??3??S??;???A ???GE?2??S?(     ^

The service is active and there are no attempts to re-connect to 1.8

root@P-ESSE01:~# systemctl status beast-splitter.service
● beast-splitter.service - Mode S Beast data forwarder
   Loaded: loaded (/lib/systemd/system/beast-splitter.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2020-08-26 17:18:05 CEST; 1 weeks 1 days ago
     Docs: https://github.com/flightaware/beast-splitter
 Main PID: 16357 (beast-splitter)
   CGroup: /system.slice/beast-splitter.service
           └─16357 /usr/bin/beast-splitter --net 192.168.1.8:30005 --listen localhost:30005:BCdfGi --connect localhost:30004:BCdfGi --status-file /run/beast-splitter/status.json

Sep 03 21:31:54 P-ESSE01 beast-splitter[16357]: 127.0.0.1:52758: settings changed to BCdfgij
Sep 03 21:34:24 P-ESSE01 beast-splitter[16357]: 127.0.0.1:52758: connection closed
Sep 03 21:34:24 P-ESSE01 beast-splitter[16357]: 127.0.0.1:30005: accepted a connection from 127.0.0.1:52778 with settings BCdfgi
Sep 03 21:34:24 P-ESSE01 beast-splitter[16357]: 127.0.0.1:52778: settings changed to BCdfgij
Sep 03 21:36:54 P-ESSE01 beast-splitter[16357]: 127.0.0.1:52778: connection closed
Sep 03 21:36:54 P-ESSE01 beast-splitter[16357]: 127.0.0.1:30005: accepted a connection from 127.0.0.1:52798 with settings BCdfgi
Sep 03 21:36:54 P-ESSE01 beast-splitter[16357]: 127.0.0.1:52798: settings changed to BCdfgij
Sep 03 21:39:25 P-ESSE01 beast-splitter[16357]: 127.0.0.1:52798: connection closed
Sep 03 21:39:25 P-ESSE01 beast-splitter[16357]: 127.0.0.1:30005: accepted a connection from 127.0.0.1:52812 with settings BCdfgi
Sep 03 21:39:25 P-ESSE01 beast-splitter[16357]: 127.0.0.1:52812: settings changed to BCdfgij

mutability · 2020-09-04T18:29:11Z

What does netstat look like on the dump1090 side?

Usually the cause of this sort of behavior is that the sending side no longer believes the connection is alive, but the receiving side never heard the connection teardown messages. This can be e.g. because of a sufficiently long network outage, or intermediate firewalls losing connection state.

dump1090-fa and, I think, Radarcapes will ensure they're always sending regular data, so treating the connection as failed after a timeout with no data is one way to solve it. But this behaviour is not universal so it'd need to be optional. Sending periodic beast commands upstream is another way to detect a failed connection. Enabling TCP keepalives is third way but the typical keepalive timeouts tend to be very large (hours).

marcus-aa · 2021-01-30T17:18:55Z

Revisiting.

In my experience with TCP data in NMEA & AIS networks, where data is similar and often unidirectional, there is a need to have a data timeout on the receiving side somewhere between 60 and 200 seconds. Otherwise this condition will appear sooner or later. In this particular case once per day on average.

I now have access to both sides of this connection, here is the Beast-splitter side (IP 192.68.1.10 connecting to .1.8:30005)

And here is the dump1090-fa side,

Just like you say there is no connection there any more.

After re-starting beast-splitter:

it is working normally again, for a while.

It would be very good to add an optional data timeout, otherwise the network input is not so useable.

mlat-client handles this already as default I think? It does not have this problem if I connect it directly to a network port without using beast-splitter in between. But then I don't get the ADSB-data to the rest of the system.

marcus-aa · 2022-09-18T14:58:37Z

Re-visiting. Same problem on another host. It has sat there for a week now thinking the connection is up. A data timeout is needed.

9404 /usr/bin/beast-splitter --net 192.168.0.102:10002 --listen localhost:30005:RCdfGi --listen localhost:40005:BCDfgij --connect localhost:30004:BCdfGi --status-file /run/beast-splitter/status.json

root@P-ESPA01:~# netstat -tpn | grep beast
tcp 0 0 127.0.0.1:30005 127.0.0.1:39208 ESTABLISHED 9404/beast-splitter
tcp 0 0 127.0.0.1:30005 127.0.0.1:51620 ESTABLISHED 9404/beast-splitter
tcp 0 0 192.168.0.101:46606 192.168.0.102:10002 ESTABLISHED 9404/beast-splitter

root@P-ESPA01:~# cat /run/beast-splitter/status.json
{
"radio" : {
"status" : "red",
"message" : "Not connected to receiver"

root@P-ESPA01:~# nc 192.168.0.102 10002
24Xħ??=�??34Xħ?)V?=?80????34X?QM? ??J%-??2ԧ34X?

No problem to re-connect.

mutability · 2022-09-18T15:27:16Z

I doubt I'll get to this any time soon - can you look at doing a PR for this?

mutability · 2022-09-18T15:29:48Z

That "not connected" is a little odd though - I wonder if something else is going on there. What's the last log output?

("not connected" implies that the connection is dead -- but the OS disagrees -- or has never seen a valid message, or it got garbled data and couldn't re-establish sync)

marcus-aa · 2022-09-18T17:08:07Z

My understanding is that Beast-splitter "knows" on a data level that it is not receiving status messages (it is a Radarcape), therefor the "Not connected..." messages , but the TCP connection has not been re-established correctly. So BS doesn't "do" anything with the information that the status messages are lost. Different OSI layers...

I only have log entries back to the 16th, lost connection on the 12th. It just repeats like this and are the incoming connections.

Sep 18 18:47:23 P-ESPA01 beast-splitter[9404]: 127.0.0.1:51810: settings changed to RCdfGij
Sep 18 18:49:53 P-ESPA01 beast-splitter[9404]: 127.0.0.1:51810: connection closed
Sep 18 18:49:53 P-ESPA01 beast-splitter[9404]: 127.0.0.1:30005: accepted a connection from 127.0.0.1:51816 with settings RCdfGi
Sep 18 18:49:53 P-ESPA01 beast-splitter[9404]: 127.0.0.1:51816: settings changed to RCdfGij

mlat-client seems to have a 150s timeout and re-establishes connections repeatedly.

Sep 18 18:59:54 P-ESPA01 vrss-mlat-client[255]: Sun Sep 18 18:59:54 2022 Disconnecting from localhost:30005: No data (not even keepalives) received for 150 seconds
Sep 18 18:59:54 P-ESPA01 vrss-mlat-client[255]: Sun Sep 18 18:59:54 2022 Input connected to localhost:30005
Sep 18 18:59:54 P-ESPA01 vrss-mlat-client[255]: Sun Sep 18 18:59:54 2022 Input format changed to RADARCAPE, 1000MHz clock

marcus-aa · 2022-09-18T17:10:07Z

I doubt I'll get to this any time soon - can you look at doing a PR for this?

I will look, but my programming skills are unfortunately more in the line of making a script that reads status.json and forces a re-start of the beast-splitter service if there are too many consecutive "Not connected"....

mutability · 2022-09-18T17:30:17Z

When beast-splitter has seen at least one status message, it expects at least one status message every 15 seconds. If 15 seconds elapses with no status messages, it will disconnect / reconnect, which is basically exactly what you're asking for here:

beast-splitter/beast_input.cc

Line 277 in 0b03490

liveness_timer.expires_from_now(radarcape_liveness_interval);

The "not connected to receiver", however, means that is_connected() returns false:

beast-splitter/status_writer.cc

Line 170 in 0b03490

    
           std::string radio_message = (input->is_connected() ? "Connected to receiver" : "Not connected to receiver");

beast-splitter/beast_input.h

Line 86 in 0b03490

    
           bool is_connected(void) const { return (good_sync && receiver_type != ReceiverType::UNKNOWN); }

If is_connected() is false, that means that either

a) we have no connection at all; or
b) we have a connection, but autodetection of Beast vs Radarcape mode is still in progress; or
c) we have a connection, but we don't have "sync" (where "sync" means "discovered the start of a correctly formatted message" -- this is more relevant for serial connections, where you might start receiving data mid-message).

For (a) the OS, at least, thinks we have a connection;
for (b) that condition should only last for a few seconds, not hours;
which leaves (c)

So my current guess (in the absence of a bug somewhere in the above code) is that beast-splitter established the most recent network connection, and then absolutely nothing was ever received - no status messages, no data, nothing. Since no status message was received, the must-see-regular-status-messages timeout never gets started and the connection just sits there forever.

* If the input connection is forced to Radarcape mode, then Radarcape status messages are expected immediately; beast-splitter does not wait to see a status message before starting the timer that expects at least one status message per 15 seconds. * New option, --beast-input-timeout, sets a timeout for non-Radarcape connections (in seconds). If no valid message is received within that timeout, beast-splitter disconnects/reconnects. + some internal cleanups

mutability · 2022-10-06T07:38:24Z

Can you try out PR #6 ? (Compiles, minimal testing)

You'll want to pass a new command-line option --beast-input-timeout (e.g. --beast-input-timeout 60.0 for a 60-second timeout). This applies only to non-Radarcape connections, Radarcape-mode connections expect a status message every 15 seconds even if --beast-input-timeout is not set

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Data forwarding repeatedly stopping #5

Data forwarding repeatedly stopping #5

marcus-aa commented Sep 3, 2020 •

edited

Loading

mutability commented Sep 3, 2020 •

edited

Loading

marcus-aa commented Sep 3, 2020 •

edited

Loading

mutability commented Sep 4, 2020

marcus-aa commented Jan 30, 2021

marcus-aa commented Sep 18, 2022

mutability commented Sep 18, 2022

mutability commented Sep 18, 2022 •

edited

Loading

marcus-aa commented Sep 18, 2022

marcus-aa commented Sep 18, 2022

mutability commented Sep 18, 2022

mutability commented Oct 6, 2022 •

edited

Loading

Data forwarding repeatedly stopping #5

Data forwarding repeatedly stopping #5

Comments

marcus-aa commented Sep 3, 2020 • edited Loading

mutability commented Sep 3, 2020 • edited Loading

marcus-aa commented Sep 3, 2020 • edited Loading

mutability commented Sep 4, 2020

marcus-aa commented Jan 30, 2021

marcus-aa commented Sep 18, 2022

mutability commented Sep 18, 2022

mutability commented Sep 18, 2022 • edited Loading

marcus-aa commented Sep 18, 2022

marcus-aa commented Sep 18, 2022

mutability commented Sep 18, 2022

mutability commented Oct 6, 2022 • edited Loading

marcus-aa commented Sep 3, 2020 •

edited

Loading

mutability commented Sep 3, 2020 •

edited

Loading

marcus-aa commented Sep 3, 2020 •

edited

Loading

mutability commented Sep 18, 2022 •

edited

Loading

mutability commented Oct 6, 2022 •

edited

Loading