Improve Connection Experience #109

Open
lemoer opened this issue Jan 25, 2021 · 20 comments

Comments

@lemoer
Contributor

lemoer commented Jan 25, 2021

We have several issues regarding DHCP performance, which degrade the user experience a lot.

Server Issues:

  • OFFER packets always need to have option 54 (server identifier) set. Otherwise clients go crazy. It seems we are affected by a bug in isc-dhcp-server: https://gitlab.isc.org/isc-projects/dhcp/-/issues/86
  • We do not NACK leases from other ranges. This might be disadvantageous for some clients which "probe" old leases. (Link)
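
To check captured OFFERs for the missing option 54, something like the following could be used. It is only a minimal sketch: it assumes you already have the raw UDP payload of a DHCP message (for example exported from a pcap), and the function names are made up for illustration. The layout it relies on (236 bytes of fixed BOOTP fields, the magic cookie, then code/length/value options) is the standard DHCP message format.

```python
BOOTP_HEADER_LEN = 236                   # fixed BOOTP fields before the cookie
DHCP_MAGIC_COOKIE = b"\x63\x82\x53\x63"
OPT_PAD, OPT_END = 0, 255
OPT_MESSAGE_TYPE, OPT_SERVER_ID = 53, 54
DHCPOFFER = 2


def parse_dhcp_options(payload: bytes) -> dict:
    """Return {option code: raw value} for the options of a DHCP message."""
    start = BOOTP_HEADER_LEN + len(DHCP_MAGIC_COOKIE)
    if payload[BOOTP_HEADER_LEN:start] != DHCP_MAGIC_COOKIE:
        raise ValueError("not a DHCP message (magic cookie missing)")
    options = {}
    i = start
    while i < len(payload):
        code = payload[i]
        if code == OPT_END:
            break
        if code == OPT_PAD:              # pad has no length byte
            i += 1
            continue
        if i + 1 >= len(payload):
            raise ValueError("truncated option")
        length = payload[i + 1]
        options[code] = payload[i + 2:i + 2 + length]
        i += 2 + length
    return options


def offer_lacks_server_id(payload: bytes) -> bool:
    """True if the message is a DHCPOFFER without option 54."""
    opts = parse_dhcp_options(payload)
    is_offer = opts.get(OPT_MESSAGE_TYPE) == bytes([DHCPOFFER])
    return is_offer and OPT_SERVER_ID not in opts
```

Running `offer_lacks_server_id()` over every OFFER in a capture would show how often the server omits the identifier.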

Observed Apple iOS behaviour:

  • The following behaviour is found by analysing pcaps.
  • It looks like Apple iOS tries to request all (or maybe a few?) old leases that it remembers.
    • The requests are sent consecutively with a delay of 1-2 seconds.
      • I do not understand why they are sent consecutively.
      • Even if one of the requests is (successfully) ACKed, the client still seems to send the other requests.
    • Unanswered requests are repeated 3 times with exponential back off.
      • 1st retry after 2 seconds.
      • 2nd retry after additional 4 seconds.
      • Give up after additional 8 seconds.
    • In my current sample, this procedure takes ~16 seconds in total.
    • Even if the client received an ACK for one of its previously sent requests, it starts with DISCOVER.
  • DISCOVER:
    • Seems to work as expected.
    • DISCOVER -> OFFER -> REQUEST -> ACK.
  • If the DISCOVER stage fails, it picks one of the remembered old IPs and just uses it, hoping it works.
  • Note: Apple has a "private wifi mode" or so. While I guess this means that the device will change its MAC, this does not mean that the MAC changes in every request. Maybe it's time based or so...
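
The retry timing for unanswered requests described above can be written down as a small sketch. The function name and parameters are made up for illustration. The defaults (2 seconds first wait, doubling, three waits) are just the values observed in the pcap, not anything taken from Apple documentation.

```python
def backoff_offsets(first_delay=2.0, waits=3, factor=2.0):
    """Seconds after the initial REQUEST at which each wait ends.

    With the observed values, the first two entries are the retries and
    the last entry is the point where the client gives up.
    """
    offsets = []
    elapsed = 0.0
    delay = first_delay
    for _ in range(waits):
        elapsed += delay
        offsets.append(elapsed)
        delay *= factor
    return offsets


print(backoff_offsets())  # [2.0, 6.0, 14.0]
```

So a single unanswered request occupies about 14 seconds. Together with the 1-2 second stagger between the consecutive requests, this is at least in the same range as the ~16 seconds seen in the sample, though that reading is my assumption.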

On Android 10, I also see another problem:

  • Note: You need to activate "Use randomised MAC (default)" in order to see this behaviour. However, this does not mean that the MAC changes in every request. In fact, I still see the same MAC every time.
    • This does not really make sense, does it?
  • During the connection process it shows:
    i. "Connecting..."
    ii. "Checking quality of your internet connection..."
    iii. "Connected without internet"
    iv. "Connected"
  • I am not sure yet what the criteria for the states/stages are.
  • Sometimes stages ii. and iii. take up to 30 seconds.
  • Some measured timing samples:
    • Format: (x/y/z/..) where x = seconds in stage i., y = seconds in stage ii., z = seconds in stage iii.
    • 0/7/0/..
    • 0/0/0/..
    • 0/7/18/.. (randomized mac, turn wifi off, turn wifi on)
    • 0/7/15/.. (randomized mac, change to other wifi, change back to freifunk wifi)
  • In tcpdump on the clientX interface of the router, I can see:
    • It's quite messy.
    • I was not yet able to match the stages to events in the pcap.
    • One of the DNS queries is for connectivitycheck.gstatic.com/AAAA, which seems to be a check whether this network has a captive portal.
  • I am using a Samsung A6 smartphone with stock firmware.
  • Live remote capture via ssh: ssh [email protected] tcpdump -n -w - -i client0 ether host ea:b2:67:43:20:f0 | wireshark -i - -k
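
For comparing more timing samples later, the x/y/z/.. format above could be parsed with a few lines like these. This is only a sketch: the helper names are hypothetical, and the stage labels are the UI strings listed above.

```python
STAGES = (
    "Connecting...",
    "Checking quality of your internet connection...",
    "Connected without internet",
)


def parse_sample(sample: str) -> dict:
    """Map each of the first three stages to its duration in seconds."""
    fields = sample.split("/")
    seconds = [int(field) for field in fields[:len(STAGES)]]
    return dict(zip(STAGES, seconds))


def total_seconds(sample: str) -> int:
    """Time spent before the UI reaches "Connected"."""
    return sum(parse_sample(sample).values())


for sample in ("0/7/0/..", "0/0/0/..", "0/7/18/..", "0/7/15/.."):
    print(sample, total_seconds(sample))  # totals: 7, 0, 25, 22
```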
@lemoer
Contributor Author

lemoer commented Jan 25, 2021

@CodeFetch
Contributor

From what I've read, Apple has integrated several techniques to aggressively enforce fast recovery of the network, including RFC 4436 and several "optimizations" of DHCP. Among them is a technique that first tries to renew old leases when reconnecting with a still-valid lease. That's also why Apple tries to increase the lease lifetime by first accepting whatever lease it is given and then renewing it after a short time with a much higher lease time, even though its existing lease is still valid. Still, everything that Apple does conforms to the DHCP standard and should work with the reference implementation of DHCP.

Therefore ISC is flawed and I think we should just switch to Kea DHCP as it is the successor. I've prepared a role for using Kea DHCP with our existing infrastructure yesterday and will test it on SN06.

@Manawyrm
Contributor

I was talking to @lemoer recently about https://github.com/sargon/ddhcpd
Gluon packages for ddhcpd: https://github.com/sargon/gluon-sargon

This might be a worthwhile candidate for a new DHCP server solution as well.
It can operate without any centralized DHCP server at all (and it can handle very small lease times because it's running locally).

Freifunk Kiel uses it pretty successfully on all their nodes.
In any case: Every replacement of ISC DHCP is welcome 👍

CodeFetch added a commit that referenced this issue Jan 25, 2021
Kea is the successor of ISC DHCP and will hopefully not cause
problems as like issue #109.
CodeFetch added a commit that referenced this issue Jan 26, 2021
Kea is the successor of ISC DHCP and will hopefully not cause
problems as like issue #109.
@AiyionPrime
Member

I think we should get rid of isc dhcpd sooner or later; and kea appears to be a solid option.
ddhcpd is nice but would require a larger test imo.

@CodeFetch
Contributor

@AiyionPrime Unfortunately I've just found out that Debian ships version 1.1.0, because there is no good maintainer... So I'm just reading about how to get dnsmasq running instead.

@CodeFetch
Contributor

@AiyionPrime I'm in the Mumble.

CodeFetch added a commit that referenced this issue Jan 26, 2021
Kea is the successor of ISC DHCP and will hopefully not cause
problems as like issue #109.
@CodeFetch
Contributor

@AiyionPrime You are offline.

CodeFetch added a commit that referenced this issue Jan 26, 2021
dnsmasq is familiar to us, better maintained than Kea and does not
cause the bug mentioned in issue #109.
CodeFetch added a commit that referenced this issue Jan 30, 2021
Kea is the successor of ISC DHCP and will hopefully not cause
problems as like issue #109.

Co-authored-by: Vincent Wiemann <[email protected]>
@CodeFetch
Contributor

Since the switch to KEA we have not observed any iOS problems.

@lemoer
Contributor Author

lemoer commented Feb 11, 2021

The issue on Android 10 is still visible. It seems like it's not even a DHCP issue. I am not sure yet what happens in stages ii. and iii. ("Checking quality of your internet connection..." and "Connected without internet").

@lemoer
Contributor Author

lemoer commented Feb 11, 2021

I could not yet find out which Android service checks the connection here.

@lemoer
Contributor Author

lemoer commented Feb 11, 2021

A few more observations:

  • One person has phones with Android 4 (CyanogenMod), 6 and 7. He also reports similar issues.
  • Android 9 lacks the ability to randomize the MAC.
    • 3 people didn't see the issue on Android 9.
    • 1 person reported that he is also affected on Android 9.
  • I personally have this issue on Android 10.
  • One person reported that Android 11 seems to be super quick to connect.

@lemoer
Contributor Author

lemoer commented Feb 12, 2021

The checks are implemented and explained here:
https://android.googlesource.com/platform/frameworks/base/+/refs/heads/master/services/core/java/com/android/server/connectivity/NetworkDiagnostics.java#77

lemoer added a commit that referenced this issue Feb 12, 2021
This has to be done, as fdca:ffee:8::DOM::1/64 is still announced
as RDNSS. If the prefix is not announced, the client will send its
dns requests (dst ip: fdca:ffee:8::DOM::1/64) to the mac of the
supernode.

Found in #109 .

Partly reverts: eb7a98b
@lemoer
Contributor Author

lemoer commented Feb 12, 2021

I found one bug and fixed it in b01261b .

@lemoer
Contributor Author

lemoer commented Feb 12, 2021

However, now that DNS resolution is successful, there is still some work left. The connection still takes quite a while. The UI still shows "Checking quality of your internet connection..." and "Connected without internet" for some seconds (not notably faster than before).

I guess this must now be due to the other checks described in NetworkDiagnostics.java. Maybe one of them is still failing.

From the description in NetworkDiagnostics.java, I would expect to see ICMPv6 echo requests coming from the Android device. Contrary to that expectation, I do not see them.

I think the reason might be that the address resolution already fails. At least, I see unanswered ICMPv6 neighbor solicitations originating from the Android device and querying for the supernode IPv6 addresses. Oddly, their source IP is ::. I am not sure whether this is valid ICMPv6.

@AiyionPrime
Member

AiyionPrime commented Feb 12, 2021

Inspecting a router with a private wifi might help to see how a 'normal' router handles those requests.

@lemoer
Contributor Author

lemoer commented Feb 12, 2021

If I stop using "randomized mac" in Android, it starts to work. The connection check is successful in less than 4 seconds.

Some observations on the neighbor solicitations here:

  • The source IP of the neighbor solicitations from android is still ::.
  • However now there is an answer to them.
  • The answer is sent to ff02::1 (mac: 33:33:00:00:00:01).

However, I still do not see any ICMPv6 echo requests, which I would expect from the description in NetworkDiagnostics.java (see link above).

@lemoer
Contributor Author

lemoer commented Feb 12, 2021

TODO:

  • Find the issue with "randomized mac" here.
  • Verify that things are fine without "randomized mac" now. I currently have the feeling that things are still a bit slow (~5 seconds sometimes). Things should be faster than this.

Raising a new issue:

  • I have the feeling that sometimes the SSID hannover.freifunk.net does not show up in the wifi scan.
    • I currently see this on my Android device.
    • I have already seen this behavior on my Linux laptop once.
    • I have also seen this on Windows once.
    • It takes about 10-30 seconds until the wifi shows up.
      • Probably this is when the next scan is performed.
    • I have seen this with other WiFis as well.

@AiyionPrime
Member

The source IP of the neighbor solicitations from android is still ::.

Wikipedia says:

The only possible option is the sender's link-layer address. To avoid problems with protocol extensions, all unknown options must be ignored.

I think it does use :: during duplicate address detection.
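
As far as I read RFC 4861 (section 7.2.4), a neighbor solicitation with source :: is a duplicate address detection probe, and the advertisement answering it has to be multicast to the all-nodes address. That would fit the observed answer to ff02::1 (mac: 33:33:00:00:00:01). A minimal sketch of these rules, with function names made up for illustration:

```python
import ipaddress

ALL_NODES = ipaddress.IPv6Address("ff02::1")


def is_dad_probe(ns_source: str) -> bool:
    """A solicitation sent from the unspecified address (::) is a DAD probe."""
    return ipaddress.IPv6Address(ns_source).is_unspecified


def advertisement_destination(ns_source: str) -> ipaddress.IPv6Address:
    """Where the neighbor advertisement for a solicitation should be sent.

    DAD probes are answered via all-nodes multicast, everything else is
    answered by unicast to the solicitation's source address.
    """
    source = ipaddress.IPv6Address(ns_source)
    return ALL_NODES if source.is_unspecified else source


def multicast_mac(group: str) -> str:
    """Ethernet address for an IPv6 multicast group: 33:33 + low 32 bits."""
    address = ipaddress.IPv6Address(group)
    if not address.is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    return ":".join(f"{byte:02x}" for byte in b"\x33\x33" + address.packed[-4:])


print(advertisement_destination("::"))  # ff02::1
print(multicast_mac("ff02::1"))         # 33:33:00:00:00:01
```

So the source :: and the multicast answer both look like valid ICMPv6 to me. Why the solicitations go unanswered with a randomized MAC is a separate question.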

@AiyionPrime
Member

The destination address does not necessarily have to be an address the smartphone gave itself;
IPv6 loves to detect duplicates and checks for them before every(?) assignment of a given address to a new interface.

@AiyionPrime
Member

There was an unrelated issue with my router being offline;
this produced an interesting behaviour on an iPad, though.
After the device realizes there's no internet behind a certain wifi, it ignores that wifi for a short period of time and no longer lists it.
