
Support for IPv6 prefixes in namespaces #208

Merged · 26 commits merged into juanfont:main on Jan 30, 2022

Conversation

@enoperm (Contributor) commented Oct 31, 2021

I'm sending an MR to initiate a discussion about this initial implementation.

I have found that specifying an IPv6 prefix for ip_prefix caused the Headscale server to crash, because getAvailableIP assumed an IPv4 address by calling As4().

While I was at it, I also tidied up address generation a bit: the comment within was inaccurate (a network/broadcast address is one where the host part of the address is all zero/one bits, not one that ends with eight consecutive zero/one bits), and if I interpret the netaddr API reference correctly, IsZero() and IsLoopback() should never return true for the same address, so I assume the use of && had probably been a typo here.
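
For illustration, a minimal sketch of the family-agnostic approach, assuming the inet.af/netaddr API of the time (the usedIPs lookup is a stand-in for what Headscale actually reads from the database):

```go
package main

import (
	"fmt"

	"inet.af/netaddr"
)

// getAvailableIP walks the prefix with IP.Next() instead of doing
// arithmetic on the As4() representation, so it works for IPv4 and
// IPv6 prefixes alike.
func getAvailableIP(prefix netaddr.IPPrefix, usedIPs map[netaddr.IP]bool) (*netaddr.IP, error) {
	ipRange := prefix.Range()
	network, broadcast := ipRange.From(), ipRange.To()

	// Start after the network address and stop before the broadcast
	// address, i.e. skip the host-bits-all-zero/all-one endpoints.
	for ip := network.Next(); prefix.Contains(ip) && ip != broadcast; ip = ip.Next() {
		if !usedIPs[ip] {
			return &ip, nil
		}
	}

	return nil, fmt.Errorf("no available IP addresses in %s", prefix)
}

func main() {
	prefix := netaddr.MustParseIPPrefix("100.64.0.0/10")
	used := map[netaddr.IP]bool{netaddr.MustParseIP("100.64.0.1"): true}
	ip, err := getAvailableIP(prefix, used)
	fmt.Println(ip, err) // 100.64.0.2 <nil>
}
```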

I also found that machine.go assumed an IPv4 representation and sent /32 routes to nodes, which tailscaled refused to use, even though tailscale ping managed to resolve the correct destination node.

These changes were enough to get ICMPv6 ping working both against namespace addresses and against advertised IPv6 routes. As far as I can see, the changes did not break any of the established tests that use IPv4, but I have not yet added any IPv6-specific test coverage. If I read the code correctly, there is a single unit under test preconfigured with an IPv4 prefix, and I'm not sure about the optimal way to handle the situation.

I have also separately tested with the default IPv4 prefix, and things seem to still work that way.
I'm not sure why yet, but I was only able to access IPv4 advertised routes when I also used an IPv4 prefix for the namespace, and only able to access IPv6 advertised routes with an IPv6 prefix configured for the namespace. Accessing IPv4 advertised routes from an IPv6 prefix, or the other way around, does not seem to work, and I have yet to see any error messages anywhere; so far I can only observe the lack of packets.

@enoperm (Contributor, Author) commented Nov 20, 2021

I have had a little bit of time to investigate this branch again, and I have found a way to get IPv6 routes working with IPv4-first tailnets, or the other way around. The issue was that none of the hosts had any IPv6 source addresses included in their AllowedIPs. By advertising a /128 source address from the source node and manually configuring it on the tailscale0 interface, I was able to get ICMP working between a tailscale node and an advertised IPv6 address with ip_prefix still set to the default IPv4 subnet. As long as nodes are assigned both an IPv4 and an IPv6 address, fully automatic dual-stack tailnets should be possible.
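
As a sketch of what that automation could look like, each assigned address would be advertised to peers as a host route. IPPrefixFrom and BitLen are from inet.af/netaddr (BitLen is 32 for IPv4, 128 for IPv6); hostRoutes itself is a hypothetical helper, not code from this branch:

```go
import "inet.af/netaddr"

// hostRoutes turns every address assigned to a machine into a /32 or /128
// entry for the peers' AllowedIPs, so both address families are routable.
func hostRoutes(addrs []netaddr.IP) []netaddr.IPPrefix {
	routes := make([]netaddr.IPPrefix, 0, len(addrs))
	for _, ip := range addrs {
		routes = append(routes, netaddr.IPPrefixFrom(ip, ip.BitLen()))
	}
	return routes
}
```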

While assigning addresses from multiple pools would be a simple solution, it does raise the question of what to do if any of the pools is out of addresses: should a node join fail completely, or should the node be assigned addresses from a subset of the available pools? Is it possible to push custom messages to tailscale clients in response to a join request?

@enoperm (Contributor, Author) commented Nov 21, 2021

As of 09c7551, by updating my config to include a list of prefixes under the config key ip_prefixes instead of a single ip_prefix, I have been able to assign both an IPv4 and an IPv6 address to multiple nodes fully automatically, and the nodes could communicate with each other over both addresses. For some reason, only the tailscale IPv4 ranges show up for peers in the output of tailscale status --json, but I think this is probably related to #148.
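
For reference, the shape of the new key (the IPv6 prefix below is just an example ULA, not a value from this branch):

```yaml
# ip_prefixes supersedes the singular ip_prefix key.
# Each node is assigned one address from each listed prefix.
ip_prefixes:
  - 100.64.0.0/10         # Tailscale's CGNAT IPv4 range
  - fd7a:115c:a1e0::/48   # example IPv6 ULA prefix
```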

@kradalby (Collaborator) commented

Apologies for the delay; I'm very much in favour of this PR, but I had (and have) some refactors to get out of the way before I can take a proper look.

In the meantime, would you be able to add some integration test cases showing off IPv6?

@enoperm (Contributor, Author) commented Nov 21, 2021

I was thinking of adding unit tests but I don't mind playing around with integration tests either (though admittedly, I haven't run those yet).

@enoperm (Contributor, Author) commented Nov 21, 2021

Though I guess I have a few hundred commits' worth of rebasing to do first. Resolving the merge conflicts after adding even more modifications sounds harder than doing them before. :)

@kradalby (Collaborator) commented

I saw some unit tests and that's good, but we need the integration ones to ensure we are and remain compatible with tailscale.

So we need both :)

And having automatic verification that multiple IPs and IPv6 work as intended is important.

Since this will also change the database schema, we need to test it a bit extra, and potentially keep the old field for compatibility.

@enoperm (Contributor, Author) commented Nov 21, 2021

Yeah, for now I was only focused on getting it working in the simplest possible way, so schema compatibility was not even accounted for. From what I could see, "schema upgrades" happened by GORM automatically creating a new column and being unable to drop the old one due to sqlite limitations. There's also a bunch of code I haven't tested at all yet, such as the IPv6 magic DNS stuff. It is also not nice of me to squeeze multiple values into a single column the way I currently do.
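
Roughly, the single-column squeeze works by giving the slice type Scan/Value methods so GORM can (de)serialize it; a sketch with a comma-separated encoding (the exact encoding in the branch may differ):

```go
package main

import (
	"database/sql/driver"
	"fmt"
	"strings"

	"inet.af/netaddr"
)

// MachineAddresses packs several IPs into one text column. GORM falls back
// to sql.Scanner/driver.Valuer for types it cannot map natively.
type MachineAddresses []netaddr.IP

func (a MachineAddresses) Value() (driver.Value, error) {
	strs := make([]string, len(a))
	for i, ip := range a {
		strs[i] = ip.String()
	}
	return strings.Join(strs, ","), nil
}

func (a *MachineAddresses) Scan(src interface{}) error {
	text, ok := src.(string)
	if !ok {
		return fmt.Errorf("cannot scan %T into MachineAddresses", src)
	}
	if text == "" {
		*a = nil
		return nil
	}
	*a = (*a)[:0]
	for _, s := range strings.Split(text, ",") {
		ip, err := netaddr.ParseIP(s)
		if err != nil {
			return err
		}
		*a = append(*a, ip)
	}
	return nil
}
```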

@enoperm (Contributor, Author) commented Nov 21, 2021

Oh, I have noticed something else now that I have joined on Android as well: for the newly joined phone, the displayed address is the IPv6 one; for the others, it is the IPv4 one (the latter is as expected from the output of tailscale status --json on my laptop). However, tapping the name of another node in the Android app copies the IPv6 address instead of the displayed IPv4 one. This is not a particularly useful observation, but I thought I would mention it while this tab is still open.

@enoperm (Contributor, Author) commented Nov 21, 2021

It seems to me that it is possible to retrieve the IPv6 address of peer nodes by running tailscale ping. Whether it returns the IPv6 address because I set that first in my config, or because the client has some preference for IPv6, I cannot say. The same trick might be a workaround for IPv4 prefixes outside of 100.64.0.0/10 in #148.

@enoperm (Contributor, Author) commented Jan 8, 2022

Okay, from what I can see, taildrop via curl is not going to work in the integration tests as-is in the IPv6 case. Regardless of whether I try to connect to the IPv4 or the IPv6 address on the peer API port, the receiving node seems to forward the connection to 127.0.0.1. This is fine in the IPv4 case, but apparently (I haven't investigated it thoroughly so far), when IPv6 addresses are involved in the tunnel, tailscaled binds that port on IPv6 only, which basically breaks taildrop entirely. I am not sure whether it is a side effect of running tailscaled in docker containers without any IPv6 addresses (including loopback). I can override the relevant knob (net.ipv6.conf.all.disable_ipv6=0) on the docker command line to get at least a loopback address without changing the docker daemon config, but from what I gather dockertest does not currently expose the sysctls parameter.

EDIT: Oh okay, it does support it, it is just in a different struct and package: HostConfig
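
For the record, the hook in question: RunWithOptions accepts functions that mutate the container's HostConfig before it starts, and that struct carries the sysctls. A sketch against ory/dockertest v3 (the function name is mine, not from the test suite):

```go
package main

import (
	"github.com/ory/dockertest/v3"
	"github.com/ory/dockertest/v3/docker"
)

// runWithIPv6Loopback starts a container with IPv6 re-enabled, so that at
// least ::1 exists inside it.
func runWithIPv6Loopback(pool *dockertest.Pool, opts *dockertest.RunOptions) (*dockertest.Resource, error) {
	return pool.RunWithOptions(opts, func(hc *docker.HostConfig) {
		hc.Sysctls = map[string]string{
			"net.ipv6.conf.all.disable_ipv6": "0",
		}
	})
}
```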

@enoperm (Contributor, Author) commented Jan 8, 2022

Okay, having an IPv6 loopback address is not enough in itself to make tailscaled use it as a destination. I still do not see why the node winds up binding to an IPv6-only port. Still, with a tun device present, this whole problem shouldn't exist.

@enoperm (Contributor, Author) commented Jan 8, 2022

I'm not familiar with GitHub Actions; is it possible to provide a /dev/net/tun to the containers and set the equivalent of --cap-add=NET_ADMIN on the command line to let tailscaled create a new interface?
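
In dockertest terms, the equivalent of those flags would look something like this (same HostConfig hook as the sysctl override above; a sketch, not something tested against the Actions runners):

```go
import "github.com/ory/dockertest/v3/docker"

// withTunDevice grants NET_ADMIN and passes /dev/net/tun through, letting
// tailscaled create its own interface instead of using userspace networking.
func withTunDevice(hc *docker.HostConfig) {
	hc.CapAdd = append(hc.CapAdd, "NET_ADMIN")
	hc.Devices = append(hc.Devices, docker.Device{
		PathOnHost:        "/dev/net/tun",
		PathInContainer:   "/dev/net/tun",
		CgroupPermissions: "rwm",
	})
}
```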

@enoperm (Contributor, Author) commented Jan 9, 2022

Okay, as of now I have a branch (https://github.com/enoperm/headscale/tree/ipv6-f) where the integration tests all run fine even on IPv6, but it does rely on the containers being able to instantiate tun devices.

Pros:

  • I imagine this to be a more common setup, so it might make the test environment more "realistic".
  • Avoids the problem with the peerapi being inaccessible in userspace-networking mode on dual-stack (IPv6-only too? I haven't tested that) configurations. (Possibly an upstream tailscaled bug?)
  • Allows us to use tailscale file cp, among other things.

Cons:

  • I am not sure how the integration test is set up with the CI/CD environment (if at all?), and I do not know whether this could cause any problems.
  • Some tests seemed to log source/destination IP addresses, which was only made possible by having a single overlay address for each node. An example would be the magicdns test that uses tailscale ping: it does print an address, but it does not allow us to choose which address to use when we let it resolve the target for itself. A possible workaround might be to query A or AAAA records specifically from 100.100.100.100 and eliminate the tailscale commands from the test case entirely (see the sketch at the end of this comment).

I'll see if I can tidy up the branch into clean little patches later on and update this PR. The changes so far can be categorized as follows:

  • IP handling cleanup (can now process IPv6 prefixes, also more efficient for IPv4 prefixes as a side effect).
  • MagicDNS IPv6 support for arbitrary prefixes (not explicitly tested yet).
  • Support for multiple IP prefixes per namespace (this one probably deserves a cleaner data model than what I have used so far).
  • Some stability fixes in PollNetMapStream to avoid crashing the control server on machine joins.
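
The DNS-based workaround mentioned in the cons could look roughly like this: point a resolver straight at MagicDNS's well-known address and pick the record family explicitly (the hostname is illustrative):

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

func main() {
	// Dial MagicDNS directly instead of whatever /etc/resolv.conf says.
	resolver := &net.Resolver{
		PreferGo: true,
		Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
			d := net.Dialer{Timeout: 5 * time.Second}
			return d.DialContext(ctx, network, "100.100.100.100:53")
		},
	}

	// "ip4" asks for A records; "ip6" would ask for AAAA instead, so a
	// test can assert on exactly the address family it cares about.
	addrs, err := resolver.LookupIP(context.Background(), "ip4", "somenode.example.headscale.net")
	if err != nil {
		panic(err)
	}
	fmt.Println(addrs)
}
```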

@enoperm force-pushed the ipv6 branch 2 times, most recently from a2ac3cb to 56867cf on January 15, 2022 16:29
@enoperm (Contributor, Author) commented Jan 15, 2022

I have cleaned up the branch used in this PR. I have added tests that query A records through the DNS interface of tailscaled rather than testing MagicDNS with tailscale ping. It should be trivial to query AAAA records as well, but that did not work on my tailnet when I queried manually. This may be a side effect of using 100.64.0.0/10 as my IPv4 prefix while using a custom ULA as my IPv6 one. I'll have to check later whether using Tailscale's own IPv6 prefix makes it work. If it were to work, then I think it would be safe to assume there is some sort of filtering in tailscaled and the problem does not lie with this version of Headscale.

@enoperm (Contributor, Author) commented Jan 16, 2022

Okay, apparently AAAA records will only be served by MagicDNS if the server has nothing but IPv6 addresses. There seems to be some interest in adding them, but for now (and probably for the older versions currently part of the integration test suite), it is not going to happen. I'll go look into which version the tailscale ip command and its -6 flag appeared in, and if it seems viable, I'll refactor the tests to use that instead of DNS queries.

@enoperm (Contributor, Author) commented Jan 16, 2022

Refactored the MagicDNS integration test and split the contents of the gen/ dir into a separate commit. Assuming I did not forget about anything, the PR should be ready for review.

@kradalby (Collaborator) commented

Hi @enoperm, this looks like good, thorough work and I will try to review it soon, but I am currently in need of a little break. I should hopefully be able to get around to it the week after tomorrow.

@kradalby kradalby self-requested a review January 16, 2022 18:56
@juanfont (Owner) left a comment

Added some comments.

utils.go Outdated
@@ -141,36 +141,32 @@ func (h *Headscale) getAvailableIP() (*netaddr.IP, error) {
return nil, err
}

ipPrefixNetworkAddress, ipPrefixBroadcastAddress := func() (netaddr.IP, netaddr.IP) {
@juanfont (Owner):

Perhaps this would be more readable with a named function.

@enoperm (Contributor, Author) commented Jan 20, 2022:

I personally like using anonymous scopes or named functions to hide away temporary variables, so as not to accidentally refer to them later. I could see retrieving the network and broadcast addresses of an IP range being useful elsewhere, though, so I guess I'll extract it.

@enoperm (Contributor, Author):

Extracted as GetIPPrefixEndpoints. I am not sure this is the best possible name for it, but it is the best I could come up with so far.
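
For reference, with netaddr the extracted helper can stay tiny, since IPRange already exposes both endpoints (a sketch of the shape, not necessarily the exact code in the branch):

```go
// GetIPPrefixEndpoints returns the first and last address covered by a
// prefix, i.e. the network and broadcast addresses in the IPv4 case.
func GetIPPrefixEndpoints(na netaddr.IPPrefix) (network, broadcast netaddr.IP) {
	ipRange := na.Range()
	return ipRange.From(), ipRange.To()
}
```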

@@ -31,7 +33,7 @@ type Machine struct {
MachineKey string `gorm:"type:varchar(64);unique_index"`
NodeKey string
DiscoKey string
IPAddress string
IPAddresses MachineAddresses
@juanfont (Owner):

This is a breaking change, needs to be documented in the changelog.

@enoperm (Contributor, Author):

Note of breaking change added to changelog.

@@ -221,10 +221,20 @@ func getHeadscaleConfig() headscale.Config {
dnsConfig, baseDomain := GetDNSConfig()
derpConfig := GetDERPConfig()

configuredPrefixes := viper.GetStringSlice("ip_prefixes")
@juanfont (Owner):

This needs to be documented, and added in the config-example.yml

@enoperm (Contributor, Author):

Example config updated.

}

// TODO: Is this concurrency safe?
// What would happen if multiple hosts were to register at the same time?
@juanfont (Owner):

Good point. We might need a lock here.

@enoperm (Contributor, Author):

I think it would actually be better to put a lock around the callers; it is entirely possible that we get a free IP but end up not using it, because of an error somewhere else before persisting the machine in question - for example, by being able to allocate an IPv6 address but running out of IPv4 ones, or receiving a database error (such as "disk full" or "connection lost").

@enoperm (Contributor, Author):

Ah, I forgot to write about the actual point of putting the lock around the whole flow. If the lock were here, it would only cover the part of the control flow where the next free address is fetched; if the potentially allocated address is not saved into the database before another client enters this function, that client may find the same address to be free. So the mutex must last at least long enough to cover both the fetch-next-address and save-machine-to-db steps.
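
A sketch of that scope, with hypothetical names (ipAllocationMutex, getAvailableIPs, and registerMachine are stand-ins for whatever the real flow looks like):

```go
// The lock must span both fetching candidate addresses and persisting the
// machine that claims them; otherwise two concurrent registrations can see
// the same address as free. (Hypothetical names throughout.)
func (h *Headscale) registerMachine(machine *Machine) error {
	h.ipAllocationMutex.Lock()
	defer h.ipAllocationMutex.Unlock()

	ips, err := h.getAvailableIPs()
	if err != nil {
		return err
	}
	machine.IPAddresses = ips

	// Until this save succeeds, nothing records the addresses as taken,
	// which is why the lock cannot be released any earlier.
	return h.db.Save(machine).Error
}
```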

@kradalby (Collaborator) left a comment

Thanks a lot @enoperm, this is stellar work. I have posted some comments with some questions/clarifications, could you have a look at those?

Other than that I am happy to get this in when we get that resolved.

@@ -221,10 +221,20 @@ func getHeadscaleConfig() headscale.Config {
dnsConfig, baseDomain := GetDNSConfig()
derpConfig := GetDERPConfig()

configuredPrefixes := viper.GetStringSlice("ip_prefixes")
prefixes := make([]netaddr.IPPrefix, 0, len(configuredPrefixes))
@kradalby (Collaborator):

I am contemplating whether it would be sensible to keep ip_prefix in the config file and mark it as DEPRECATED.

And then, in the meantime, make sure it is added to this list and deduplicated.

We can then remove it in the future.

Thoughts?

@enoperm (Contributor, Author):

So, the effective value of ip_prefixes would be set(ip_prefix ~ ip_prefixes)? As long as ip_prefixes defaults to an empty list, that sounds like a sane, backwards-compatible way of handling it, but how does it play along with ip_prefixes having a non-zero default value? Wouldn't it mean that people who forget to update their configs unexpectedly see newly joined machines receiving addresses from both their configured pool and the current default IPv4 pool?

Maybe, from a user perspective, it would make more sense to emit a deprecation warning on using ip_prefix, and enforce that one or the other must be set - but not both at once.

@enoperm (Contributor, Author) commented Jan 29, 2022:

Alternatively: do not use viper to set the default value, perform the merge+dedup work, and add default values only if the resulting set is empty - I think that would do the trick.

@kradalby (Collaborator):

I think that makes sense. For a clear message, both emitting a deprecation warning in the logs and noting it in config-example.yaml make sense; there is a surprising number of users who don't read the logs unless they have to.

I wonder if Viper has a way of marking and handling deprecation?

@enoperm (Contributor, Author):

I have implemented this config merging, though I have not tested it manually yet.

@enoperm (Contributor, Author):

Okay, I have tested it. Apparently, converting the prefix back into a string does not normalize the address by itself, but converting it to a range drops the host bits, so converting that back into a prefix, and then into a string, allows one to build a canonical representation. This way 100.64.0.0/10 and 100.64.0.1/10 map to the same key, and deduplication happens.
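
Putting the pieces of this subthread together, the merge-and-dedup flow could be sketched as below. Masked() is netaddr's shortcut for dropping the host bits, doing in one call what the prefix-to-range-to-prefix round-trip described above achieves; the function name and the Println-as-logger are stand-ins:

```go
package main

import (
	"fmt"

	"github.com/spf13/viper"
	"inet.af/netaddr"
)

// prefixesFromConfig folds the deprecated singular ip_prefix into
// ip_prefixes, deduplicates on a canonical form (so 100.64.0.0/10 and
// 100.64.0.1/10 collapse into one entry), and only falls back to a
// default when neither key was set.
func prefixesFromConfig() ([]netaddr.IPPrefix, error) {
	raw := viper.GetStringSlice("ip_prefixes")
	if legacy := viper.GetString("ip_prefix"); legacy != "" {
		fmt.Println("WARNING: ip_prefix is deprecated, please use ip_prefixes")
		raw = append(raw, legacy)
	}

	seen := make(map[string]bool)
	prefixes := make([]netaddr.IPPrefix, 0, len(raw))
	for _, s := range raw {
		prefix, err := netaddr.ParseIPPrefix(s)
		if err != nil {
			return nil, err
		}
		key := prefix.Masked().String() // canonical form: host bits zeroed
		if !seen[key] {
			seen[key] = true
			prefixes = append(prefixes, prefix)
		}
	}

	if len(prefixes) == 0 {
		prefixes = append(prefixes, netaddr.MustParseIPPrefix("100.64.0.0/10"))
	}

	return prefixes, nil
}
```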

@kradalby (Collaborator):

Ok, that's good. I think we can live with this until we actually remove the option and can remove that code again.

@enoperm (Contributor, Author):

Uff, I forgot a debug println in the last commit.

@enoperm (Contributor, Author):

Pushed a fixup.

}

ips[index] = ip
// FIXME: This really deserves a better data model,
@kradalby (Collaborator):

Can you create an issue to track this?

@enoperm (Contributor, Author):

I suppose, though I haven't really thought much about it. I guess the idiomatic thing to do in a relational database would be to introduce a separate table for IP addresses. With the right schema and queries, I think it should also help reduce the amount of code that needs to be covered by a mutex around address allocation: if we can have allocated, but not yet associated, addresses in the table, the getUsedIPs step can treat those as used, so we can replace mutex(get-next-address -> save-machine) with mutex(allocate-address) -> save-machine || deallocate-unused-address-on-failure.
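
A hypothetical GORM shape for that table (none of these names exist in the codebase; this just illustrates the allocated-but-unassociated idea):

```go
// One row per address. A row can exist with MachineID still NULL, i.e.
// allocated but not yet associated, and getUsedIPs simply reads all rows,
// shrinking the critical section to the allocation insert alone.
type MachineAddress struct {
	ID        uint   `gorm:"primaryKey"`
	Address   string `gorm:"uniqueIndex"`
	MachineID *uint  // NULL while allocated but not yet associated
}

type Machine struct {
	ID        uint `gorm:"primaryKey"`
	Addresses []MachineAddress
}
```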

@enoperm (Contributor, Author) commented Jan 29, 2022:

Say, does it even make sense to open an issue about something that does not yet exist on main? I mean, issues are not limited to branches/tags, and as such I think they should be about the mainline. Someone reading an issue about this comment on storing multiple IP addresses per machine may, if they do not already know better, get the impression that such a thing already exists on the main branch and is part of a feature, which is not the case.

I feel like I might take on more work by saying this, but let's either open the issue once it actually concerns behaviour that is already part of the application, or do the rework in this branch, in which case there is no issue to open in the first place, just a conversation in this PR.

@kradalby (Collaborator):

I think the GitHub naming is a problem here: I look at "Issues" as "Tickets", and I just want a work item we can track this against. As I am considering merging this before this FIXME is resolved, I would like to have it tracked so it is less likely to be forgotten.

I want to get this PR in and run it as a 0.13.0-beta release so we can leverage some of these exciting changes and stability improvements.

@enoperm (Contributor, Author):

Opened #295.

@kradalby (Collaborator) commented Jan 29, 2022

I am happy with this PR and propose the following:

We merge a couple of outstanding fix PRs, plus @enoperm's other PR (#278), and release 0.12.4 as a fix release.

Then we merge this PR and start releasing 0.13.0-beta.

@juanfont What do you think? Have a look over the PR and see if you agree with this approach.

@enoperm (Contributor, Author) commented Jan 29, 2022

My other PR was #278, though. :)

@juanfont (Owner) commented

> I am happy with this PR and propose the following:
>
> We merge a couple of outstanding fix PRs, plus @enoperm's other PR (#278), and release 0.12.4 as a fix release.
>
> Then we merge this PR and start releasing 0.13.0-beta.
>
> @juanfont What do you think? Have a look over the PR and see if you agree with this approach.

I fully agree. Let's merge the other PRs, release 0.12.4 and a prerelease of 0.13.0 including this amazing piece of engineering by @enoperm.

@kradalby kradalby merged commit 5b5ecd5 into juanfont:main Jan 30, 2022