bgpalerter seems to have lost visibility ~ April 8 - RIS issues ? #535
Hi @mfld-pub, I'm checking your ticket. In the meantime, what does the uptimeApi say? If any problem occurs in BGPalerter, a warning appears in the API. Was the process monitored at the time of the incident?
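For anyone who wants to script that check, here is a minimal sketch of interpreting an uptimeApi response. It assumes the monitor is enabled in config.yml and returns a JSON body with a `warning` flag and a `connectors` list; those field names and the `summarize_status` helper are assumptions for illustration, so verify them against your own instance.

```python
import json
from typing import List, Tuple


def summarize_status(body: str) -> Tuple[bool, List[str]]:
    """Return (healthy, problems) for a status payload.

    Assumed payload shape (not confirmed in this thread):
    {"warning": bool, "connectors": [{"name": str, "connected": bool}, ...]}
    """
    status = json.loads(body)
    problems = []
    # A top-level warning is treated as unhealthy.
    if status.get("warning", False):
        problems.append("warning flag set")
    # Any connector that does not report connected=True is a problem.
    for connector in status.get("connectors", []):
        if not connector.get("connected", False):
            problems.append(f"connector {connector.get('name', '?')} disconnected")
    return (not problems, problems)
```

The body could be fetched with `urllib.request.urlopen` from wherever the uptimeApi is configured to listen. Keep in mind that a "connected" flag alone may not catch a case like this one, where the log reported the RIS connector as connected while no events arrived.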
I'm doing the same with your AS, I'll let you know
This should not matter; connection errors can occur. Everything is OK as long as the process is able to re-establish the connection within a few seconds.
I'm not sure if you did it already, but I suggest you to use |
It must have been a RIS issue of sorts?! As of 0914 UTC today it seems to work again on both my RHEL 8 prod instance and the Ubuntu Docker container that I set up for this ticket.
Yes. I was able to reproduce your issue, and I contacted the main dev behind RIS, who did some digging. We are planning some improvements, including monitoring for missing/delayed messages in both BGPalerter and RIS. You will see a PR linked to this issue soon. In the meantime, a new rule limiting the number of connections per user has been set in RIS (since one connection can have unlimited subscriptions to prefixes, there is no reason at all to open multiple connections...just a lack of reading-the-doc skills). Thanks for reporting this!!
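To illustrate the "one connection, many subscriptions" point, here is a rough sketch that builds one RIS Live subscribe message per prefix, all intended for the same WebSocket. The `build_subscriptions` helper and the example prefixes are hypothetical, and the message fields follow my reading of the RIS Live docs, so treat them as assumptions. This is not how BGPalerter itself is implemented.

```python
import json


def build_subscriptions(prefixes, more_specific=True):
    """Build one ris_subscribe message per prefix, all for a single socket."""
    messages = []
    for prefix in prefixes:
        messages.append(json.dumps({
            "type": "ris_subscribe",
            "data": {
                "prefix": prefix,
                # Also match announcements more specific than the prefix.
                "moreSpecific": more_specific,
                "type": "UPDATE",
            },
        }))
    return messages


# Documentation prefixes used purely as placeholders.
subscriptions = build_subscriptions(["192.0.2.0/24", "2001:db8::/32"])
```

Each string in `subscriptions` would then be sent over the one open connection with whatever WebSocket client you prefer, instead of opening a new connection per prefix.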
Thanks for cruising github issues on your Sunday <3 I think they also want us to send a user-agent with our connections in the ?client= parameter of the URI. Something like
Would it make sense to make this a configurable option in BGPalerter?
You are welcome :)
We agreed with the RIS staff on what user agent to send for all BGPalerter clients; this is already done by BGPalerter and should not be configured per instance. Each user also has an additional random connection ID.
As promised, in addition to the fix on the RIS side reported above, in the next release of BGPalerter there will be a check for silent socket sessions.
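For readers wondering what a silent-session check can look like, here is a minimal sketch of the general idea: record when the last message arrived and flag the session once it has been quiet for too long. The `SilenceWatchdog` class and its threshold are hypothetical and are not the actual BGPalerter implementation.

```python
import time


class SilenceWatchdog:
    """Flags a session that is connected but has stopped delivering messages."""

    def __init__(self, max_silence_seconds, clock=time.monotonic):
        self._max_silence = max_silence_seconds
        self._clock = clock  # injectable so the logic can be tested
        self._last_message = clock()

    def message_received(self):
        # Call this from the socket's message handler.
        self._last_message = self._clock()

    def is_silent(self):
        # True once no message has arrived for longer than the threshold.
        return self._clock() - self._last_message > self._max_silence
```

A caller would poll `is_silent()` periodically and, when it returns True, log a warning and reconnect. Picking the threshold is a trade-off: too short and quiet periods cause needless reconnects, too long and an outage like the one reported here goes unnoticed for a while.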
It took me a few days to report this as I wanted to make sure it is not some local issue:
Since April 8, ~1200 UTC I am not seeing any monitored events being triggered. RIPE Service status indicates all is well, RIS Live should be functioning.
Running on RHEL 8 as a systemd service. About a year in prod. Worked fine after updating to v1.27.1.
I checked that notifications work with the -t flag. They do. It spams Email and Telegram when I use the -t flag. I checked that the process has sufficient resources and permissions - all good. I checked bgpalerter's reports.log and it is indeed empty but I know I created plenty of mayhem "events" :P
I then tried creating a new prefixes.yml list and config.yml by stopping the service, renaming the existing ones, executing the binary manually once with bgpalerter-linux-x64 generate -a ASN -o prefixes.yml -i -m
This completed without errors and created sensible files. I restarted the service, withdrew a monitored prefix and ran tail -f reports.log. Nothing.
I hijacked my prefix from a lab ASN. Nothing.
Final sanity check before reaching out: I spun up a new Ubuntu server, installed docker and created the bgpalerter docker container. Created config, started it, withdrew a prefix. This instance, too, does not "see" an event.
I do see in error.log of the original prod instance around the time things stopped working:
But these 500 responses have occurred from time to time in the past. Of note is that during my testing of prefix withdrawals the last entry in error.log indicated that we were connected at that time:
info: ris connector connected
Did RIPE change anything in RIS Live that could be breaking me ?
Unrelated: I did have some 530 responses from Cloudflare when using cloudflare as my vrpProvider for RPKI, which broke RPKI detection, but I have since switched back to ntt and it worked fine.