Data forwarding repeatedly stopping #5
Comments
When it gets stuck, what is the state of the TCP connection? Is new data actually arriving from the remote host? Is there data waiting in the receive buffer that beast-splitter isn't reading? What does the Radarcape think the state of the connection is?
I have another one here that is stuck right now if you need further details. Here the other side is actually dump1090-fa on another Rpi.
All connections show "Established".
There is plenty of data if I make a parallel connection to the same port, but none on the BS output ports.
The service is active and there are no attempts to re-connect to 1.8
What does netstat look like on the dump1090 side?
Usually the cause of this sort of behavior is that the sending side no longer believes the connection is alive, but the receiving side never heard the connection teardown messages. This can happen, for example, after a sufficiently long network outage, or when intermediate firewalls lose connection state.
dump1090-fa and, I think, Radarcapes will ensure they're always sending regular data, so treating the connection as failed after a timeout with no data is one way to solve it. But this behaviour is not universal, so it would need to be optional. Sending periodic beast commands upstream is another way to detect a failed connection. Enabling TCP keepalives is a third way, but the typical keepalive timeouts tend to be very large (hours).
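For reference, the keepalive option mentioned above can be tuned per socket rather than relying on the system defaults. The following is a minimal Python sketch, not beast-splitter code: the function name `enable_keepalive` and the 60/10/3 values are illustrative assumptions, and the `TCP_KEEP*` options are platform-dependent, so each is applied only if the platform exposes it.

```python
import socket

def enable_keepalive(sock, idle=60, interval=10, count=3):
    """Turn on TCP keepalives with short timers (illustrative values).

    idle:     seconds of silence before the first probe is sent
    interval: seconds between unanswered probes
    count:    unanswered probes before the OS reports the connection dead
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket timer options are not available on every platform,
    # so only set the ones this Python build knows about.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", count)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)

if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    print("keepalive enabled:", s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

With these example values a silently dead peer would be noticed after roughly `idle + interval * count` seconds on Linux, instead of hours. Keepalives only prove the TCP peer is reachable, not that it is still sending data.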
Revisiting. In my experience with TCP data in NMEA and AIS networks, where the data is similar and often unidirectional, the receiving side needs a data timeout of somewhere between 60 and 200 seconds. Otherwise this condition will appear sooner or later; in this particular case, about once per day on average.
I now have access to both sides of this connection. Here is the beast-splitter side (IP 192.68.1.10 connecting to .1.8:30005).
And here is the dump1090-fa side. Just as you say, there is no connection there any more.
After restarting beast-splitter, it works normally again, for a while.
It would be very good to add an optional data timeout; otherwise the network input is not very usable. mlat-client handles this already by default, I think? It does not have this problem if I connect it directly to a network port without beast-splitter in between. But then I don't get the ADS-B data to the rest of the system.
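A receive-side data timeout of the kind described here can be sketched in a few lines. This is a generic Python illustration, not how beast-splitter or mlat-client implement it; the function name `read_until_idle` and the callback are hypothetical, and the caller is assumed to close the socket and reconnect when the result is "idle" or "closed".

```python
import socket

def read_until_idle(sock, idle_timeout, on_data):
    """Read from sock until the peer closes or the stream goes silent.

    Returns "closed" if the peer shut the connection down cleanly, or
    "idle" if no bytes arrived for idle_timeout seconds. Either way the
    caller should tear the connection down and reconnect.
    """
    sock.settimeout(idle_timeout)
    while True:
        try:
            chunk = sock.recv(4096)
        except socket.timeout:
            return "idle"      # silence for too long: treat the link as dead
        if not chunk:
            return "closed"    # orderly shutdown by the peer
        on_data(chunk)         # any received data restarts the timeout

if __name__ == "__main__":
    a, b = socket.socketpair()
    b.sendall(b"hello")
    print(read_until_idle(a, 0.5, lambda c: print("got", c)))
```

For a feed like this one, `idle_timeout` would be somewhere in the 60 to 200 second range suggested above.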
Revisiting. Same problem on another host. It has sat there for a week now thinking the connection is up. A data timeout is needed.
9404 /usr/bin/beast-splitter --net 192.168.0.102:10002 --listen localhost:30005:RCdfGi --listen localhost:40005:BCDfgij --connect localhost:30004:BCdfGi --status-file /run/beast-splitter/status.json
root@P-ESPA01:~# netstat -tpn | grep beast
root@P-ESPA01:~# cat /run/beast-splitter/status.json
root@P-ESPA01:~# nc 192.168.0.102 10002
No problem to re-connect.
I doubt I'll get to this any time soon - can you look at doing a PR for this?
That "not connected" is a little odd though - I wonder if something else is going on there. What's the last log output? ("not connected" implies that the connection is dead -- but the OS disagrees -- or has never seen a valid message, or it got garbled data and couldn't re-establish sync)
My understanding is that beast-splitter "knows" at the data level that it is not receiving status messages (the source is a Radarcape), hence the "Not connected..." messages, but the TCP connection has not been re-established correctly. So BS doesn't "do" anything with the information that the status messages are lost. Different OSI layers...
I only have log entries back to the 16th; the connection was lost on the 12th. The log just repeats entries like this one, which are the incoming connections:
Sep 18 18:47:23 P-ESPA01 beast-splitter[9404]: 127.0.0.1:51810: settings changed to RCdfGij
mlat-client seems to have a 150 s timeout and re-establishes connections repeatedly:
Sep 18 18:59:54 P-ESPA01 vrss-mlat-client[255]: Sun Sep 18 18:59:54 2022 Disconnecting from localhost:30005: No data (not even keepalives) received for 150 seconds
I will look, but my programming skills are unfortunately more along the lines of making a script that reads status.json and forces a restart of the beast-splitter service if there are too many consecutive "Not connected"....
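The workaround described here could look roughly like the Python sketch below. It is only an outline under assumptions: the layout of status.json is not shown in this thread, so turning the file into a connected/not-connected sample is left to the caller, and the function names and the threshold of 3 are hypothetical. The service name is taken from the `beast-splitter.service` unit.

```python
import subprocess

def count_trailing_failures(samples):
    """Count how many of the most recent samples are 'not connected'.

    samples is a list of booleans, oldest first; True means the status
    file reported a working receiver connection at that poll.
    """
    failures = 0
    for connected in reversed(samples):
        if connected:
            break
        failures += 1
    return failures

def should_restart(samples, threshold=3):
    """Restart only after `threshold` consecutive failed polls."""
    return count_trailing_failures(samples) >= threshold

def restart_service(unit="beast-splitter"):
    """Force a restart through systemd (needs sufficient privileges)."""
    subprocess.run(["systemctl", "restart", unit], check=True)

if __name__ == "__main__":
    history = [True, True, False, False, False]
    print("restart needed:", should_restart(history))
```

Requiring several consecutive failures avoids restarting on a single bad poll, for example one taken while the service is still reconnecting.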
When beast-splitter has seen at least one status message, it expects at least one status message every 15 seconds. If 15 seconds elapses with no status messages, it will disconnect / reconnect, which is basically exactly what you're asking for here: Line 277 in 0b03490
The "not connected to receiver", however, means that beast-splitter/status_writer.cc Line 170 in 0b03490
Line 86 in 0b03490
If a) we have no connection at all; or
For (a) the OS, at least, thinks we have a connection;
So my current guess (in the absence of a bug somewhere in the above code) is that beast-splitter established the most recent network connection, and then absolutely nothing was ever received - no status messages, no data, nothing. Since no status message was received, the must-see-regular-status-messages timeout never gets started and the connection just sits there forever.
* If the input connection is forced to Radarcape mode, then Radarcape status messages are expected immediately; beast-splitter does not wait to see a status message before starting the timer that expects at least one status message per 15 seconds.
* New option, --beast-input-timeout, sets a timeout for non-Radarcape connections (in seconds). If no valid message is received within that timeout, beast-splitter disconnects/reconnects.
+ some internal cleanups
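The difference between the old and new timer behaviour can be shown with a small model. This is a simplified Python sketch of the logic described above, not the actual beast-splitter C++ code; the class name `StatusWatchdog` and the `arm_on_connect` flag are made up for illustration, and time is passed in explicitly so the behaviour is easy to follow.

```python
class StatusWatchdog:
    """Toy model of the 'expect a status message every N seconds' timer."""

    def __init__(self, timeout=15.0, arm_on_connect=False):
        self.timeout = timeout
        self.arm_on_connect = arm_on_connect
        self.deadline = None  # None means the timer is not running

    def on_connect(self, now):
        # Old behaviour: the timer stays off until the first status message.
        # New behaviour (forced Radarcape mode): the timer starts right away.
        self.deadline = now + self.timeout if self.arm_on_connect else None

    def on_status_message(self, now):
        # Every status message (re)starts the timer.
        self.deadline = now + self.timeout

    def expired(self, now):
        # True means: disconnect and reconnect.
        return self.deadline is not None and now >= self.deadline

if __name__ == "__main__":
    old = StatusWatchdog(arm_on_connect=False)
    old.on_connect(now=0.0)
    print("old, silent for a week:", old.expired(now=7 * 24 * 3600.0))

    new = StatusWatchdog(arm_on_connect=True)
    new.on_connect(now=0.0)
    print("new, silent for 15 s:", new.expired(now=15.0))
```

In the old model a connection that never delivers a single message never expires, which matches the week-long stuck connection reported earlier in the thread.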
Can you try out PR #6? (Compiles, minimal testing.) You'll want to pass a new command-line option
I have a problem on 2 different installations where BS just stops forwarding data 24-36 hours after a restart.
This only happens where BS connects over LAN/TCP to the source, never when it uses a /dev/usb port.
INPUT_OPTIONS="--net 192.168.10.158:10003"
The TCP connection stays up, BS answers new local connections to the output ports, but no data is coming through.
If I stop and start the service, everything works again for a day or two.
TCP keepalive issue? In my experience keepalives cause this kind of problem; it is better to just check whether data is coming in and reconnect if not.
The sources are Radarcapes.
EDIT: Also with Dump1090 on the other end it shows the same problem.
● beast-splitter.service - Mode S Beast data forwarder
Loaded: loaded (/lib/systemd/system/beast-splitter.service; enabled; vendor preset: enabled)
Active: active (running) since Thu 2020-08-27 09:17:02 CEST; 1 weeks 0 days ago
Docs: https://github.com/flightaware/beast-splitter
Main PID: 391 (beast-splitter)
CGroup: /system.slice/beast-splitter.service
└─391 /usr/bin/beast-splitter --net 192.168.10.158:10003 --listen localhost:35005:RCdfGi --listen localhost:45005:Bcdfgij