Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gluon-check-connection: extracted from gluon-scheduled-domain-switch #1684

Closed
wants to merge 4 commits into from

Conversation

rubo77
Copy link
Contributor

@rubo77 rubo77 commented Mar 24, 2019

Extract the connection-check for example, to reuse it in gluon-offline-ssid package #1649 .

The check so far only pings some definable IP6 addresses. In the old ssid-changer, we only used a batctl check if a gateway is reachable instead. It seems like the ping solution is the better, but are there any other suggestions?

We could also add a TQ-check to the gateway as in the offline-ssid package, so it is treated as offline, if the TQ sinks below a threshold.

@CodeFetch
Copy link
Contributor

Personally I prefer the ping check as when there is a node which just has a bad connection, I don't want it to be shown as offline. A bad connection is better than no connection.

@rubo77
Copy link
Contributor Author

rubo77 commented Mar 26, 2019

What does the function check_domain_switch do in this line? i.e. where is the function check_domain_switch defined?

Edit: seems to be an error: see #1741

if need_table(in_domain({'domain_switch'}), check_domain_switch, false) then

@rubo77
Copy link
Contributor Author

rubo77 commented Mar 26, 2019

I am building a firmware with this package right now, at least the build seems to work OK already

@rotanid rotanid added 0. type: enhancement The changeset is an enhancement 2. status: rfc request for comments labels Mar 28, 2019
@rotanid
Copy link
Member

rotanid commented Mar 28, 2019

i added the RFC label, please request to remove it as soon as you are finished from your side.
BTW: github has a new function to mark a pull request as a DRAFT (i dont know if you can switch to DRAFT after creating a PR, if not you can use it in the future)

@CodeFetch
Copy link
Contributor

CodeFetch commented Mar 31, 2019

@skorpy2009 If you have a problem with that, articulate it instead of giving thumbs down. If you don't want bad connections to work, define a threshold using the supported rates. At the moment the lowest recommended value by Gluon is 6 MBit/s. Thus if the nodes can't handle out that rate the ping check won't work either. Using the offline-ssid for defining the threshold is a wrong approach.

Edit: Another example: You have handled out a 150 MBit/s rate and have a TQ of 30, because the air is so congested. This means you can still get 45 MBit/s throughput. And you want to use the offline-ssid in that case because of the packet retransmissions?

@rotanid
Copy link
Member

rotanid commented Apr 8, 2019

@CodeFetch 45Mbit/s over a TQ of 30? in theory, maybe your values are correct, but in practise definitely far from reality, especially for non-TCP protocols.

@CodeFetch
Copy link
Contributor

CodeFetch commented Apr 9, 2019

@rotanid I've build long distance links with Ubiquiti hardware some years and I have achieved such TQ/throughput ratios in reality. The only reason why you don't achieve that with Gluon is because you have your retransmissions being handled on the remote server which introduces a high jitter.

@CodeFetch
Copy link
Contributor

CodeFetch commented Apr 9, 2019

@skorpy2009 Can you please tell me what your problem is or are you just trolling?

Edit: You're right that I was wrong that you "feel" the packet loss as a client. I assumed that retransmissions were turned of on gluon mesh networks, but that is not correct. Thus you should also get this high throughput with gluon.

If you don't believe me, log in to a node with a low TQ link, execute iw dev INTERFACE station dump look at the expected throughput field, execute batctl tp NEIGHBOURMAC and you'll see that this is near reality. In my case I have a very crowded link in a city mesh network with 5 nodes with a TQ of ~40, the tx bitrate is 135.0 MBit/s MCS 18 and rx is 120.0 MBit/s MCS 11. The expected throughput is 44.677Mbps and with batctl tp I get 38.80 Mbps. The uplink delivers ~12 MBit/s. Thus I don't feel any packet loss as a client.

@rotanid
Copy link
Member

rotanid commented Apr 9, 2019

ok, maybe, but we really went off-topic.

regarding the PR's changes, any opinions there?
to me it looks good, has it been build and run tested @rubo77 ?

@rubo77
Copy link
Contributor Author

rubo77 commented Apr 10, 2019

I built it in our ci successful from our site
https://gitlab.toppoint.de/ffki/ffki-site/tree/multidomain

here:

https://freifunk.in-kiel.de/firmware/multidomain/2018.2.2.0-643~multi/

Maybe you can test if everything works?

@rotanid rotanid added 2. status: waiting-on-author Waiting on some action from the author 5. needs: testing Testing of the changes is necessary and removed 2. status: rfc request for comments labels Apr 11, 2019
@rubo77 rubo77 force-pushed the connection-check branch from a536b23 to 9d14db3 Compare April 11, 2019 17:19
@rubo77 rubo77 changed the title gluon-connection-check: extracted from gluon-scheduled-domain-switch gluon-check-connection: extracted from gluon-scheduled-domain-switch Apr 11, 2019
docs/package/gluon-scheduled-domain-switch.rst Outdated Show resolved Hide resolved
package/gluon-check-connection/Makefile Outdated Show resolved Hide resolved
package/gluon-check-connection/check_site.lua Outdated Show resolved Hide resolved
package/gluon-check-connection/check_site.lua Outdated Show resolved Hide resolved
@@ -7,14 +7,8 @@ local site = require 'gluon.site'
local offline_flag_file = "/tmp/gluon_offline"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's clean up names and paths a bit:

  • Let's rename the script to gluon-connection-check (if we're going with my suggestion to rename the package)
  • A better place for the flag file would be /var/gluon/connection-check/offline
  • The cron job and micrond dependency can be moved from the domain switch package as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we keep the name check-connectin now, I will move the flag to /var/gluon/check-connection/offline

I hope, that there will be no write-action done to the flash-rom there either. (Which was the reason, we chose /tmp to store the flag)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure, how to move the cron from the scheduled domain switch to the new package, as the cron job should only run, in case domain switch is scheduled

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/tmp and /var are the same directory on OpenWrt, just symlinked.

Having two cronjobs, one in domain-switch and one in ssid-changer, isn't nice either. There would need to be some central facility in the check-connection package that other package can interface with to enable the cron job. Possibly a directory under /var/gluon where packages can drop a file when they need the offline status?

@CodeFetch
Copy link
Contributor

@NeoRaider gluon-check-connection would be better if we add other check-packages later.

@rubo77 rubo77 force-pushed the connection-check branch 2 times, most recently from 99b592e to 9cb905a Compare April 23, 2019 16:55
@neocturne
Copy link
Member

@CodeFetch Hmm, good point. Let's keep the current name.

@rubo77
Copy link
Contributor Author

rubo77 commented Apr 24, 2019

I see a problem: If we use the package gluon-check-connection for the domain switch and also for the offline-ssid, there could be needed different IP targets to ping: one to check, if time-servers are reacheable for the domain switch and other IPs to ping to check if the internet connection to the outside is working

@rubo77 rubo77 force-pushed the connection-check branch 2 times, most recently from fedadc7 to 05475a8 Compare April 24, 2019 09:08
@CodeFetch
Copy link
Contributor

@rubo77 Is this really a problem?

@rubo77
Copy link
Contributor Author

rubo77 commented Apr 24, 2019

We would need to change the sections in the site.conf again, then it will work.

The ips must be declared in the config block of each calling package

@rotanid rotanid removed the 2. status: waiting-on-author Waiting on some action from the author label Apr 24, 2019
*gluon-scheduled-domain-switch* package ::

check_connection = {
targets = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe call this icmp_echo_targets to clarify their usage.

@rotanid
Copy link
Member

rotanid commented Feb 3, 2020

the recent mini-reviews are not that useful, or am i mistaken that some bigger changes that were discussed have not yet been implemented?

unfortunately myself also didn't find the time yet to do this :(

EDIT:
see #1684 (comment)

@CodeFetch
Copy link
Contributor

CodeFetch commented Feb 4, 2020

Indeed some ideas are missing here.
Edit: Sorry I messed things up...

The current state of the PR is completely different and would introduce superfluous cronjobs.
And it does not distinguish between "local" and "remote" servers, but the scheduled-domain-switch only needs to check the "local" connection while the ssid-changer needs to check the "global" connection.

My idea was not to have additional cronjobs for the packages that rely on check-connections, but only one which is being executed once in a minute for the check-connection script.
For this check-connection should have a UCI section where one can add entries with the following fields:

  • "interval" is used to only execute the trigger script once in a time (e.g. every 5 minutes).
  • "script" is the path to the executable to trigger
  • "event" can either be "offline", "online" or unset ("offline|online").
  • "scope" can be "local" or "global" or unset ("global&local")
  • "onchange" if set true the script is only being executed on a state change otherwise it's always executed

Edit2: Just want to clarify why setting scope "unset" is different from "global": When the node is a "local exit" it could have a global connection while the VPN connection could be broken and thus it would not have a "local" connection. With "global&local" I mean e.g. 8.8.8.8 and e.g. 10.20.0.1 (supernode) can be reached from a client perspective...

@rubo77
Copy link
Contributor Author

rubo77 commented Feb 4, 2020

Sounds good, can you implement this?

@CodeFetch

This comment has been minimized.

@rotanid
Copy link
Member

rotanid commented Feb 4, 2020

i think this is over-engineering and too much configuration.
i'd merge it with the smaller changes i suggested ( or maybe i find time for this some day myself )

@CodeFetch
Copy link
Contributor

@rotanid Okay, I'll look into it this weekend, too. When I look at the offline-ssid I see a lot of redundancy otherwise and I don't think that two, three or four (with fallback-autoupdater) cronjobs executed every minute will make the device more stable in any way. Thus I think merging it in the current state should not be considered.

@CodeFetch
Copy link
Contributor

@rubo77 Please have a look at #1930

@rubo77
Copy link
Contributor Author

rubo77 commented Feb 24, 2020

@CodeFetch It seems like you took some of the changes in my branch and created a new history, is is difficult to merge... how do we proceed now?

should we close this PR and continue development in yours?

@CodeFetch
Copy link
Contributor

CodeFetch commented Feb 24, 2020

@rubo77 I wouldn't close this, yet as I don't know if the "complexity" of my proposal will be accepted. I didn't wrote it on top of your branch, but I think I've copied some text at first... What do you mean specifically?

EDIT: Oh you're right... I was working on the wrong branch. Will fix it.

@mweinelt mweinelt modified the milestones: 2020.2, 2020.3 Apr 9, 2020
@rotanid rotanid removed this from the 2020.3 milestone Nov 8, 2020
@AiyionPrime
Copy link
Member

Hey everybody. We might need an interface like this for an ongoing corporation in hanover soon, so I'd like to put some effort in this. I hope "help appreciated" is still valid on this.

I'd like to work on this PR as a stepping stone.
Get it merged into master, work with it a little while and come back later with a second PR that adds, what we're missing most. (Likely some sort of hysteresis, or named sets of target hosts).
What I do not intend ist to get these features into this PR.

I contacted @rubo77 earlier this day and asked about a rebase on master.
There was one commit, that was not proper lua, I fixed it.
Technically there's no problem; I'm just afraid, things around this PR might have changed a lot in the last two years.

However, I compiled the rebase with a merge of our nightly siteconf and rubo77s documented site-parameters.
The result is available on http://testbench-02.n.ffh.zone/ .
I put the node in a domain, that's not going to find mesh-neighbours.

My speedport promises to block it's internet connection from time to time;
while that's not the case all past reviewers of this PR have ssh-access to the node.

Other than that, calling gluon-check-connection does now create the directory /var/gluon/check-connection and puts an offline flag within, if there's no route to the specified set of ips.

Imo; there should be some sort of crontab, that exectues the test on a regular basis, as testing the remotes costs a few seconds.

Feedback is appreciated.

@mweinelt
Copy link
Contributor

mweinelt commented Jun 7, 2021

We feel strongly that a package that pings out into the network to determine some state it can derive from local information has no place in the gluon core repository.

We are discussing a proposal on IRC right now with @blocktrron @AiyionPrime and @NeoRaider and will see what comes off it.

@mweinelt mweinelt closed this Jun 7, 2021
@CodeFetch
Copy link
Contributor

We are discussing a proposal on IRC right now

That's the toxic nature of Gluon - Discussing things on IRC... This discussion was done already and the result was that we would like to add a possibility to perform global ping checks for the SSID changer...

@CodeFetch
Copy link
Contributor

The ax is already at the root of the trees, and every tree that does not produce good fruit will be cut down and thrown into the fire.

@rotanid
Copy link
Member

rotanid commented Jun 7, 2021

@mweinelt what kind of behaviour is this? i have to agree somehow with @CodeFetch over at #1930 (comment)
some contribution open for a long while, someone tries to push it forward and the immediate reaction is to close it? in my eyes, the right way would have been to post the result of your IRC discussion here so others can join the discussion. As i don't want to start some kind of "edit-war" i'll not reopen it for the moment...
Also, your argument may be wrong as we already have something that pings out in gluons "core" repo and this interface is simply meant to make this functionality better and modular and therefore usable for other packages, too. I have to admit though, i don't remember at the moment where we already use something like this - and my point may be invalid if it was only a PR that wasn't merged.
CC @blocktrron @AiyionPrime @NeoRaider

@mweinelt
Copy link
Contributor

mweinelt commented Jun 8, 2021

We are discussing a proposal on IRC right now

That's the toxic nature of Gluon - Discussing things on IRC... This discussion was done already and the result was that we would like to add a possibility to perform global ping checks for the SSID changer...

It was clear from the long time this had been lying around, that none of the commiters felt this was happening. In fact while we added some review comments, some of us felt that it wasn't the right move going forward. Ultimately as maintainers of Gluon it is our perogative to decide what goes in, and what does not. It's why we usually ask to discuss approaches with us, before taking the chance to waste your time on something that we are not going to merge. I think it is fair to close pull requests, when that happens.

Our proposal stands. I tried to document it best I could in #2228. Please go read it, if you're interested.

@CodeFetch
Copy link
Contributor

@mweinelt I've tested it yesterday. It seems to work fine. There was only a tonumber missing. I'm going to make a PR for it in the community repository after it has been fully tested. I see your point of not doing local tests using ping, but for the SSID changer which I also want to take care of I see no better possibility than doing global pings.

Btw... I didn't feel motivated to do this as I haven't received any feedback whether it was something you would like that way. Furthermore I thought the code is so simple that it might be interesting for a newbie to take over.

@rotanid
Copy link
Member

rotanid commented Jun 9, 2021

i'm fine with #2228 - if it would have been created before closing the PRs, it would have been ideal ;-)

@mweinelt
Copy link
Contributor

We are discussing a proposal on IRC right now with @blocktrron @AiyionPrime and @NeoRaider and will see what comes off it.

I tried to communicate that we were working on something and will come forward with it as it was ready. The order might not have been ideal, but I'd kindly ask for the benefit of the doubt that what I'm doing is not malicious.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
0. type: enhancement The changeset is an enhancement 5. needs: testing Testing of the changes is necessary help-appreciated
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants