-
-
Notifications
You must be signed in to change notification settings - Fork 404
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multiple copies of messages received when on a 3G connection #1106
Comments
Is there a way to rule out this hypothesis? I've noticed this behaviour after a connection was established and maintained for hours on end, so I have reasons to doubt that is is related to reconnection logic. |
Is the connection truly maintained for hours on end? I can't imagine 3G sustaining a connection for that long, it's probable that Sopel experienced a couple ping timeouts (and gracefully reconnected) over the course of that time. |
You're right, it is most likely not the same connection. Is there some kind of logging mechanism I could use to see if this is indeed the case? What would be the most reasonable troubleshooting strategy for this issue? |
Yeah, check your stdio.log - it should print something along the lines of "Warning: Disconnected. Reconnecting in 20 seconds..." whenever it loses connection. If you need timestamped logs, check raw.log but you'll have to look for message code 001 or similar. |
Having examined the logs I found a recurrent pattern, like this one, which is copied as many times as there are "re-reactions" to the incoming message.
It thinks the file descriptor is bad (maybe assuming the connection is gone). However, all the responses are eventually sent a bit later - which means that the file descriptor shouldn't have been bad. |
The plot thickens - I found a new clue for this puzzle. The problem occurs when I run Sopel on a 3G connection via So, the summary is that the This is very puzzling, so if you have any suggestions as to what might be causing this - please let me know. |
Is there any further info on this? At some point it would be nice to close this as resolved (or out of Sopel's control), and the only way that can happen without just declaring the issue abandoned is with details. :) |
The problem persists in the latest versions of Sopel as well, I was unable to troubleshoot it myself, unfortunately. However, I am willing to commit to running tests, trying out betas, or investigating it on my own if someone can provide new educated guesses about the root cause. |
If it occurs only when running under systemd, that sounds like a possible bug in the systemd environment. But I'm getting ahead of myself—never assume a bug is someone else's fault unless you've proved that it's definitely not yours. 😀 I don't see that you've checked the raw log, so it could help pinpoint this if you set |
I did check the raw log back then, but did not post it here. At this point I am preparing an environment where I can replicate this. Let me share some initial key-facts and observations I've made so far:
Given that this is a live system, I am reluctant to make an update to the latest version of Sopel. One system was selected for a "deeper" experiment, but there are some complications - the new Sopel depends on Another complication is that this is a headless system that runs on a physically moving target, in an environment where I have no direct access to it, so I cannot not run it under systemd without losing the guarantee that Sopel will start and be shown on my "radar". Having said that:
Another observation that is worth mentioning - I never noticed this behaviour with Sopel's default commands, e.g.
It works as expected when behaviour is normal. Otherwise it still works, but the function is invoked multiple times. |
I'll review what's changed since then, for my own information, since I've only been maintaining since the run-up to 6.5.2 and didn't keep track of changes back then. But technically only the newest version of Sopel is supported, and the status of open issues should be confirmed in the last release (or on
But I understand where you're coming from on updates. 😸
I don't know where this came from. Version 6.5.1 (long the newest release until less than a week ago) didn't require IPython at all—incorrectly, because the
That's an interesting data point. You shared a pair of function defs from one of your custom modules, but are these modules that manifest the duplicating behavior coded as standalone files (like the core modules) or based on Should your newly cleared logs turn up any further information, I'm sure I'll hear about it soon, here. As an aside, I think it's pretty neat that Sopel has found a use case in a public transit system. I'm glad you elected to share that detail. :-) |
Some updates so far, the two systems where the test is conducted are still not exhibiting the symptoms.
Also:
After the update, Sopel was restarted, and I'm keeping an eye on it, by sending commands every now and then. As for your last question, I did not use the cookie-cutter template, but neither did I do that as a standalone file like the standard modules. I read the documentation, checked some examples, and implemented the plugin. Here's what the tree looks like:
Let's say the module is called
Having examined the cookie-cutter template, I am inclined to say that my code replicates that model, as the tree structures are identical. |
I see. So, you have Sopel load the On the one hand, it's frustrating to see package-module reloading cause issues. But I'd also rather it be the same as an already-known issue than something new. :D |
Alright, I have some more clues, here's the raw log and some explanations about what is going on.
So based on this, I can say it is not about systemd, not about 3G and not about disconnect events, at least not at the surface :-) I was able to replicate the issue by forcing a I then did it in the systemd instance of Sopel and yes - the problem came back. (this part is already not in the log, because I did it after retrieving it).
Final point: note that I have to separate processes writing to the same log files, so some events are duplicated, e.g. |
So yeah, in retrospect, it is obviously #1056, the whole 3g/systemd saga was a false trail. |
Based on that raw log, yes I think we definitely have enough information to call this a duplicate of #1056! Believe it or not, I consider this a victory. It means there's still only one (known) bug related to modules, commands, and reconnection/reloading, instead of two separate ones to find and fix. I don't have a timeline on fixing #1056, but I'm not going to release Sopel 7 without squishing that nasty bug, for reasons brought up in other issues (like #1291, which result in an awful user experience without fixing this reloading problem). Thank you for taking the time to dump and sanitize your logs for me, @ralienpp! At least I can take great pleasure in closing an old issue. :-) |
This behaviour consistently replicates when the bot is on a system that is connected to the Internet over a 3G modem connection. The same system behaves correctly when the connection is over Wi-Fi.
At first glance it seems that the bot sends the same message multiple times, this is what it looks like:
However, I am certain this happens because sopel "thinks" it received multiple messages, and each of its redundant responses is actually mapped to a redundantly received message. I know that because one of my commands spawns a new process, and the PID of the new process is in the bot's answer.
If those were dupes, the PIDs would be the same. But since the PIDs are different, it means that Sopel actually handled the command N times.
The text was updated successfully, but these errors were encountered: