Please can we regularly and automatically reap idle-presence connections on all networks #388

Closed
ara4n opened this issue Mar 11, 2017 · 36 comments
Labels
matrix.org-support Matrix.org specific problem possibly unrelated to the bridge p1

Comments

@ara4n
Member

ara4n commented Mar 11, 2017

e.g. Snoonet IRC users are enraged at the amount of join/part spam when the bridge farts

@kegsay kegsay added the matrix.org-support Matrix.org specific problem possibly unrelated to the bridge label Mar 14, 2017
@kegsay
Member

kegsay commented Mar 23, 2017

This can't be done until presence on matrix.org Synapse is turned back on.

@PureTryOut

Please give this some more priority; IRC users are getting more and more enraged every time the bridge has a connection-related issue.

[screenshot: screenshot_2017-04-07_11-50-08]

@justjanne

Especially because the joinspam delays all messages a matrix user sends until after the joinspam is over, so after the joinspam you get another few hundred lines of text spam about stuff from half an hour ago.

This means the matrix bridge having a hiccup makes the entire channel unusable, far worse than any kind of netsplit.

@kegsay
Member

kegsay commented Apr 7, 2017

I'm not familiar with what you're describing @justjanne - if you can give me a network/channel/timestamp I can look into what happened.

Joinspam doesn't delay messages a matrix user sends (unless they were never joined in which case messages will be stacked up until they join), but we set restrictions on how long ago we will send messages through to IRC, so you shouldn't be getting a "few hundred lines of text spam", which is why I'm interested in looking into this.

@justjanne

@kegsay: "unless they were never joined", but that's exactly the case when the bridge has a hiccup.

Every matrix user leaves the channel.

All of them rejoin, one after another, and as soon as they rejoin, all they had written in the past minutes will appear at once.

@kegsay
Member

kegsay commented Apr 7, 2017

Hmm, I think I understand what this is then. The check which drops old messages is only done when the message is received from Matrix; it doesn't account for the time taken to reconnect the user to IRC, which would explain why it sends old messages. I can definitely fix that.

As for the hiccup, it looks like Snoonet stopped responding to our pings so we ended up knifing the connections. It took an exorbitant amount of time to reconnect what amounts to a few hundred users because it looks like we still have a global rate-limiting queue in place for Snoonet. I can also disable that, so in the event of a connection problem between the bridge and Snoonet it will reconnect much more rapidly.

@kegsay
Member

kegsay commented Apr 7, 2017

Right, I've configured Snoonet to reconnect much more rapidly in the event of disconnections. I also need to update it to v0.7.2 to pull in all the performance work. I'll work on a patch for sending old messages and when that has landed on develop I'll restart Snoonet with that fix applied, so:

  • If users disconnect, they will reconnect more quickly.
  • Upon reconnection, they won't send messages that happened more than 5 minutes ago.
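
The second point could be sketched roughly as below. This is not the bridge's actual code; the names (MAX_MESSAGE_AGE_SECS, should_relay, flush_queue) are hypothetical, and the only detail taken from this thread is the 5-minute cutoff. The idea is that the age check runs when the message is about to be sent to IRC, after any reconnect, rather than only when it arrives from Matrix.

```python
import time
from typing import List, Optional, Tuple

# 5-minute cutoff as described in the thread; the constant name is hypothetical.
MAX_MESSAGE_AGE_SECS = 5 * 60


def should_relay(origin_ts: float, now: Optional[float] = None) -> bool:
    """Return True if a message sent at origin_ts is still fresh enough for IRC.

    Intended to be called at delivery time (after the IRC client has
    reconnected), so that time spent waiting in the queue counts too.
    """
    if now is None:
        now = time.time()
    return (now - origin_ts) <= MAX_MESSAGE_AGE_SECS


def flush_queue(queued: List[Tuple[float, str]],
                now: Optional[float] = None) -> List[str]:
    """Given (origin_ts, text) pairs queued during a disconnect,
    return only the texts that are still worth sending."""
    return [text for ts, text in queued if should_relay(ts, now)]
```

With this shape, a message queued while the user was disconnected is silently dropped if the reconnect took longer than the cutoff.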

This should address your concerns @justjanne

@justjanne

@kegsay: I've started work on an inverted matrix irc bridge, virtually joining all IRC users on matrix instead.

This means IRC will be continually usable, and it will be matrix users getting the daily continuing spam, getting the broken contexts, etc.

Hopefully it will be able to replace this project.

@PureTryOut

See, she is one of those raging IRC users... Doesn't even give the project a chance, and blames the whole Matrix protocol for an issue in the bridge.

Anyways, that sounds a lot better @kegsay. Together with +D, or whatever the mode is called which doesn't show joins till a user has talked, on the IRC channel it should clear a lot of the spam. Now just removing the idle users, and I think most of the complaints should be over.

@justjanne

@PureTryOut Don’t read too much into the jokingly critical comment ;)

I’m mostly exploring ideas atm, trying to truly solve it. One solution could be building something that can speak the inter-server protocols of a few IRC services, and speak Matrix on the other side.

The bridge leaving would look like a netsplit, integration would be native, and could even be done with NickServ, etc.

@kegsay
Member

kegsay commented Apr 9, 2017

@justjanne you may be interested in https://github.com/matrix-org/matrix-ircd which is basically what you describe.

@kegsay
Member

kegsay commented Apr 11, 2017

RE snoonet: element-hq/element-web#412 is merged, the config file is updated, I'll restart the Snoonet bridge tomorrow (2017/04/12) morning UTC which looks to be the quietest time according to the graphs.

@kegsay
Member

kegsay commented Apr 12, 2017

^ This happened just now.

@ara4n
Member Author

ara4n commented May 28, 2017

I just got further requests from Freenode for this; manually running the inactive-user-kick script on portal rooms is insufficient given it ignores plumbed rooms (which turn out to be the more problematic ones).

Obviously this can backfire if HS presence is turned off, but i suggest we default to idle (i.e. haven't seen any traffic) rather than inactive (i.e. haven't seen any online presence) semantics if presence has been disabled.
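
A minimal sketch of that fallback, assuming the bridge can see a "last seen online" timestamp (from presence) and a "last traffic" timestamp per user. The function and parameter names are hypothetical, as is the choice to leave users alone when no data is available at all.

```python
from typing import Optional


def should_reap(now: float,
                last_traffic_ts: Optional[float],
                last_online_ts: Optional[float],
                presence_enabled: bool,
                threshold_secs: float) -> bool:
    """Decide whether a Matrix user's IRC connection should be reaped.

    With presence enabled, use "inactive" semantics (no online presence
    seen for threshold_secs). With presence disabled, fall back to "idle"
    semantics (no traffic seen for threshold_secs).
    """
    if presence_enabled and last_online_ts is not None:
        reference = last_online_ts
    else:
        reference = last_traffic_ts
    if reference is None:
        # Assumption: with no data at all, err on the side of not kicking.
        return False
    return (now - reference) > threshold_secs
```

The point of the fallback is that a homeserver with presence turned off never reports anyone as online, so "inactive" semantics alone would reap every user.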

@cmcaine

cmcaine commented Jan 6, 2018

I think it's confusing to matrix users if they're disconnected due to inactivity from matrix rooms that happen to have an IRC bridge, but not from other matrix rooms.

Is it possible to have the IRC bridge kick users only on the IRC side, then re-join them transparently once they become active on matrix?

As it is, some valuable contributors to my channel have been kicked because they haven't been around for a while, which seems a bit rude. It also means they're not around to be mentioned to pull them back in and, if when they next use matrix my channel has disappeared from their channel list, they may not remember to re-join.

@Mikaela
Contributor

Mikaela commented Jan 6, 2018

I fear that keeping users on the Matrix side wouldn't work either, as the IRC side generally prefers to see everyone in the nicklist and there are often complaints when someone has seen discussions without being visibly in the channel. There was also an issue where, in a race condition, past history was visible in portal rooms, but I cannot find it at the moment.

Related:
#448
element-hq/element-web#449 element-hq/element-web#304

@cmcaine

cmcaine commented Jan 6, 2018 via email

@Croydon

Croydon commented Jan 6, 2018

I don't get that point. Somebody willingly joins an IRC channel in which you are OP, then complains about potential spam (what exactly?) from YOUR setup because YOU have installed a bridge. So the bridge starts kicking idle users who willingly joined the channel, to avoid potential spam FOR THEM. If somebody is not interested in a channel anymore, they can leave on their own?

Also definitely rude and unexpected.

@justjanne

The issue, @Croydon, is that 99% of matrix users in these larger channels never contribute, never read, stay idle — but every time the bridge reconnects (which in the past sometimes happened multiple times on a single day), you'll get 800 quit/join/quit/join/mode messages, and discussion will be impossible for half an hour.

@justjanne

And it's not just quit/join. You have users leaving, users joining, the bridge slowly sending these messages (a few joins per second, so it continues for half an hour), then the bridge is replaying older messages from the past minutes that the users sent, then you have bots reacting to the joins, bots reacting to the replayed messages, channelmodes being set for the users, etc. It's insanity.

@Croydon

Croydon commented Jan 6, 2018

@justjanne So the issue is on the Matrix side and not on IRC at all? I had the impression that IRC users did complain.

Then I guess there are better ways to figure something out.

Can't the bridge just post a "100+ users joined/left" message instead of posting for each one individually?

then the bridge is replaying older messages from the past minutes that the users sent

You mean a Matrix user is sending a message, which gets posted into IRC, which in turn gets copied back to Matrix? Doesn't the bridge just ignore all messages from Matrix connected clients?

@justjanne

First: The issue is on IRC side — the bridge is joining every user back into the channel, so all of this is happening (and spamming) on the IRC side.

Second: A matrix user sends a message, the message is sent to IRC. The bridge restarts, and sends the message again.

@cmcaine

cmcaine commented Jan 6, 2018 via email

@ara4n
Member Author

ara4n commented Jan 6, 2018

hang on. there's a whole bunch of confusion and misunderstanding going on here - please can I try to clarify:

  • At Freenode's request we ran the script today which disconnects any Matrix users who have not used their Matrix account whatsoever for more than 30 days from Freenode, as they are starting to have a tangible impact on the network (both by joins/quits if there's a netsplit between Freenode & Matrix, and in terms of overloading the ircds for no good reason).
  • In practice, this means that if you have a Matrix room which someone has plumbed into Freenode, it will kick the user from the Matrix room too. This is inevitable, given a stipulation of bridging to IRC (especially Freenode) is that the users on the Freenode side of the bridge need to be able to see which remote users are present and reading their conversation.

@cmcaine asked:

Is it possible to have the IRC bridge kick users only on the IRC side, then re-join them transparently once they become active on matrix?

No, unfortunately, as if they have been removed from the IRC side of the bridge we need to remove them from the Matrix side too, otherwise they would be seen to be 'spying invisibly' on the conversation from the perspective of IRC users.

As it is, some valuable contributors to my channel have been kicked because they haven't been around for a while, which seems a bit rude.

Yup, I'm sorry about this. The workarounds for this I can see are:

  • Unbridge your room from IRC.
  • Suggest a better 'absentee' threshold than 30 days beyond which we remove absent users from IRC-bridged rooms
  • Encourage someone to write or finish a TS6<->Matrix bridge, so the chance of netsplits and overloading on the Freenode side are smaller, and may make Freenode more willing to allow absent users to lurk forever.

It also means they're not around to be mentioned to pull them back in...

So, we could absolutely implement rocketchat-style "@-mention autoinvites a user into a room" semantics - i've filed a bug for it at https://github.com/vector-im/riot-web/issues/5933. It's unlikely to make it onto the core team radar but it's an easy thing for someone to contribute.

...and, if when they next use matrix my channel has disappeared from their channel list, they may not remember to re-join.

This is (i hope) not true. We very deliberately held off on ever auto-kicking users from channels until we had the right semantics in Matrix which meant that rooms which you are kicked from still appear in your channel list when you next use the client, until you explicitly close them. (Unless this has bitrotted; I haven't tested it recently).

Doesn't the Matrix side archive all messages forever by default? If so there should be no expectation of privacy unless the matrix channel is set up differently.

Matrix does archive all the messages forever, but we also apply ACLs on them to ensure that they are only visible to people who are actually participating on the IRC side of the bridge, in IRC bridged rooms, in order to support the expectation of IRC's privacy semantics. There are some edge cases where this fails briefly if the bridge crashes, but generally it's good enough. A good way to think of Matrix is simply as a bouncer. Just because I happen to be connecting to IRC via a bouncer which logs all my channels, it doesn't mean that other users on the same bouncer should be able to magically see my logs(!!!)

If users are away for a month or more it's no different to them having an up to date log and reconnecting eventually.

So, if you have configured the history permissions on the Matrix room (in agreement with the IRC ops on the IRC side of the bridge) to publish the history to all, then the user can simply reconnect when they next use the client - which should be just a matter of pressing 'join' on the copy of the room they were kicked from, which will still be there in their room list, and then catch up on the history they've missed out on. Which makes it feel fine to have kicked them.

@justjanne said:

The issue, @Croydon, is that 99% of matrix users in these larger channels never contribute, never read, stay idle

This is completely untrue and rather disingenuous :( A very common use for Matrix is for people to use it as a bouncer to lurk in rooms for FOSS projects they're interested in - so they can read scrollback, search scrollback, and monitor it for keyword notifications (e.g. their nick). For instance, on my personal account I'm lurking in around 30 IRC channels that I very rarely speak in unless I'm pinged, or unless I'm catching up on scrollback as if i were reading an RSS feed.

As per this bug, we kick Matrix users from IRC-bridged rooms who have been entirely absent from Matrix for more than 30 days. But in practice this is a relatively small proportion - today, there were 5000 such idle users, out of 22,000 total. The rest of them are actively using Matrix, using it to either lurk or contribute in IRC channels - no different to IRCCloud or a znc or whatever your bouncer of choice is.

— but every time the bridge reconnects (which in the past sometimes happened multiple times on a single day), you'll get 800 quit/join/quit/join/mode messages, and discussion will be impossible for half an hour.

So the fact that most Matrix users use the matrix.org-hosted Freenode bridge means that if something goes wrong on the bridge, the join/part spam is horrific. Towards the end of last year, this became a big problem and pissed off a lot of people (presumably including @justjanne) because Freenode was experiencing frequent netsplits between its ircds, which was overloading the Matrix bridge with IRC join/parts and triggering a really nasty bug on the Matrix bridge which would then livelock, requiring a restart and a flood of join/part spam - which could then break again if subsequent Freenode netsplits happened whilst the bridge was restarting.

We (@dbkr actually) tracked down the root cause of the bug which was livelocking the Matrix bridge at the beginning of December and since the fix was deployed on Dec 8 we have not had a single crash or netsplit between Matrix & Freenode, afaik - despite loads of freenode netsplits and IRC-originated spam since then. #530 was the fix fwiw.

In other words, @justjanne is complaining about a legitimate problem, but one we fixed a month ago. In the longer term we want to make this better by decoupling the IRC-connection side of the bridge from the rest of the bridge logic - or replacing it with a TS6<->Matrix bridge, but we're not there yet. PRs welcome.

TL;DR: I think the current behaviour is the best we can provide given the expectations of both IRC & Matrix users. Suggestions to improve it would still be appreciated however.

@ara4n
Member Author

ara4n commented Jan 6, 2018

(fwiw, element-hq/element-web#308 is the bug, closed almost 2 years ago, which ensures that if you're kicked from a room it still appears in your roomlist the next time you use the client - at least on Riot/Web)

@justjanne

@ara4n You’ll probably notice that I’m already @-mentioned in the second message of this thread – this issue has been annoying me for a long time, and also on Snoonet – where the "many users aren’t contributing, but lurking" is more pronounced.

or replacing it with a TS6<->Matrix bridge

That’s something I suggested ages ago, but back then, the discussion with Snoonet network operators at least was that they wouldn’t be comfortable with running that, if I recall correctly.

In the long term, anything short of directly linking the bridge to the server-to-server protocol between the network’s servers will be suboptimal, and these issues will continue in one way or another (although I have to acknowledge the amazing work you did on solving the unsolvable, by actually getting a month without disconnection on IRC)

@ara4n
Member Author

ara4n commented Jan 6, 2018

Whilst I haven't spoken to Snoonet about a TS6 bridge, the Freenode staff seem quite interested in such a thing. We'd love to build one in the core team, but simply don't have resource to write new bridges atm. Instead, I'm aware of about 3 other attempts to write ones (https://github.com/jevolk/charybdis/tree/master/include/ircd/m is aiming for this; i think @ilmari might have been looking at one (sorry if I'm totally misremembering); and someone who hated Matrix but hated the netsplit spam even more was also looking at it who I don't recall), but I've yet to see any code working. Either way, let's take that discussion to #329.

Likewise, if folks are actually seeing stability bugs which are causing netsplit-style join/part spam, please let us know (in another bug). I've just put out a call for more feedback over at https://twitter.com/matrixdotorg/status/949692219463819265 aka https://mastodon.matrix.org/@matrix/236579 in case anyone is seeing problems but has not reported them.

@Croydon

Croydon commented Jan 6, 2018

@ara4n Thanks for the explanation. One more question, wouldn't it be enough to kick Matrix users who idle for 30 days? The IRC Matrix bridge kicked IRC users from IRC and I don't understand how kicking IRC users can reduce spam FOR IRC users on the IRC channel.

The IRC channel gets spammed with Matrix users joining/leaving etc. when the Matrix bridge re-connects with the IRC server, right? So, more IRC users don't increase the spam?

@ara4n
Member Author

ara4n commented Jan 6, 2018

The IRC Matrix bridge only kicked the IRC users who it was joining to the channels itself, who are the manifestations of users in Matrix who have been absent for >30 days. The bridge does not and cannot kick other random IRC users. By removing absent Matrix users from the IRC channel, it means that if the Matrix<->IRC bridge glitches, there are fewer users who are seen to part and then join again - hence "less spam".

@itsrachelfish

Since some people in this thread don't seem to understand what the problem is, this is what the problem is:

[screenshot: screenshot from 2018-03-10 08-16-12]

It actually has nothing to do with automatically removing idle connections and everything to do with the fact that this "bridge" is a node script which connects bots to an IRC server instead of running as an ircd.

Maybe the title should be updated?

@ara4n
Member Author

ara4n commented Mar 10, 2018

@itsrachelfish on the matrix.org team at least we are painfully aware what the problem is and what it looks like, and i can only apologise for the join/part spam. The bug for "the bridge should talk link protocol" is #329.

The best we can do otherwise for the likes of Freenode is:

  • Reap idle users to minimise the amount of join/part spam when things go wrong (which we do semi-regularly)
  • Make the bridge's uptime more reliable (which in general it has been, i hope, since early Jan when we fixed the main remaining stability bug - the last restart was a deliberate one on Feb 23 for instance).
  • Make the bridge's network connectivity more reliable (we can do some more work on this, but unfortunately the internet is never going to be fully reliable and we're always going to see netsplits).

In the instance of your screenshot, i'm guessing you're running your own bridge instance for your own ircd - and it looks like there may have been a netsplit between the bridge & the ircd? I have a bad feeling that even if we were speaking TS6 to your ircd, it'd still look like an ugly netsplit...

@ara4n
Member Author

ara4n commented Mar 10, 2018

N.B. this always happens when the idling script is run:

[screenshot: screen shot 2018-03-10 at 21 34 47]

One mitigation could be to finally get explicit metadata set on IRC channels to say whether they allow membership desyncs - and if so, just disconnect the user on the IRC side rather than kicking them on the Matrix side.
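
As a rough illustration of that mitigation (not existing bridge behaviour), the reaping step could branch on a per-channel flag. The flag name allows_membership_desync and the ReapAction enum are hypothetical; no such metadata exists yet according to this thread.

```python
from enum import Enum


class ReapAction(Enum):
    # Drop only the IRC connection; the user stays joined on the Matrix side.
    DISCONNECT_IRC_ONLY = "disconnect_irc_only"
    # Current behaviour described in this thread: kick from the Matrix room too.
    KICK_FROM_MATRIX_AND_DISCONNECT = "kick_from_matrix_and_disconnect"


def reap_action(allows_membership_desync: bool) -> ReapAction:
    """Pick what to do with an idle user, given hypothetical channel metadata
    saying whether the IRC channel tolerates Matrix users who are not
    visible in its nicklist."""
    if allows_membership_desync:
        return ReapAction.DISCONNECT_IRC_ONLY
    return ReapAction.KICK_FROM_MATRIX_AND_DISCONNECT
```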

@Mikaela
Contributor

Mikaela commented Jul 8, 2019

@ara4n @Half-Shot

At Freenode's request we ran the script today which disconnects any Matrix users who have not used their Matrix account whatsoever for more than 30 days from Freenode, as they are starting to have a tangible impact on the network (both by joins/quits if there's a netsplit between Freenode & Matrix, and in terms of overloading the ircds for no good reason).

How is the script run? I am requesting that it be run, as a PirateIRC oper, and I am told (by diasp.in who runs it) that the script requires a specific room as a target, but that cannot be right. Or is the script closed source?

Our situation is having several users who have used Matrix once and then disappeared to never use it again and their ghosts just keep being connected and getting reconnected and at times distracting actual users from IRC and Matrix into believing that by pinging them, the actual users would see and read their messages.

@Half-Shot
Contributor

@Mikaela The script is closed source mostly because it's quite specific to our setup, but if you'd like I can see what can be done to make it more generic for others to follow.

Our situation is having several users who have used Matrix once and then disappeared to never use it again and their ghosts just keep being connected and getting reconnected and at times distracting actual users from IRC and Matrix into believing that by pinging them, the actual users would see and read their messages.

Would it be an acceptable solution to just automatically kill the IRC connection and kick the matrix user after N days, which could be something built into the bridge.

@Mikaela
Contributor

Mikaela commented Jul 8, 2019

The script is closed source mostly because it's quite specific to our setup

Oh, I see, I always thought it was https://github.com/matrix-org/matrix-appservice-irc/blob/master/scripts/remove-idle-users.py

, but if you'd like I can see what can be done to make it more generic for others to follow.

Yes, please.

Would it be an acceptable solution to just automatically kill the IRC connection and kick the matrix user after N days, which could be something built into the bridge.

Yes, I think it would resolve the issue for us.

@Half-Shot
Contributor

@Mikaela The PR has now been merged. However, the way of activating it has changed: you now need to trigger the cleanup manually via the debug API (POST http://localhost:debugport/reapUsers?access_token=ABC&since=max_idle_time_in_hours&reason=optional_kick_reason_string). The automatic timer code was removed because it added complexity and meant that you couldn't alter parameters after starting the bridge.

I suggest anyone who wants to run this at regular intervals use a crontab, or a systemd timer.
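
For anyone scripting this, a minimal sketch of such a call in Python follows. The endpoint path and the access_token, since and reason parameters are taken from the comment above; the function names, the timeout, and the idea of returning the HTTP status are assumptions, and the response format is not described in this thread.

```python
import urllib.parse
import urllib.request


def build_reap_url(debug_port: int, access_token: str, since_hours: int,
                   reason: str = "", host: str = "localhost") -> str:
    """Build the reapUsers debug API URL.

    since_hours is the maximum idle time in hours; reason is the optional
    kick reason string and is omitted from the URL when empty.
    """
    params = {"access_token": access_token, "since": str(since_hours)}
    if reason:
        params["reason"] = reason
    return f"http://{host}:{debug_port}/reapUsers?" + urllib.parse.urlencode(params)


def reap_users(debug_port: int, access_token: str, since_hours: int,
               reason: str = "", timeout_secs: float = 60.0) -> int:
    """POST to the debug API and return the HTTP status code."""
    url = build_reap_url(debug_port, access_token, since_hours, reason)
    request = urllib.request.Request(url, data=b"", method="POST")
    with urllib.request.urlopen(request, timeout=timeout_secs) as response:
        return response.status
```

A small script calling reap_users with, say, since_hours=720 (30 days) could then be scheduled from a crontab or a systemd timer as suggested above; the port and token must match your own bridge configuration.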


9 participants