-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add pasta networking mode #16141
Add pasta networking mode #16141
Conversation
thanks for working on this, from a quick look, it seems good to me. Is there any plan to add the packages to Fedora? |
Thanks for having a look!
They're already available, see: https://src.fedoraproject.org/rpms/passt and the Availability section above for details about other distributions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is way more than I can review today, sorry. Just a few comments to get started. And one more: this adds almost 200 lines to the already-bloated helpers.bash
. Please split that into a new helpers.network.bash
(or some other descriptive name) (not "pasta", because some of these helpers seem like they could be useful in other tests).
Oh, also, I really think some of the tests could be refactored. At a very minimum, in your new 505 file, please add a setup
with skip_if_no_pasta
instead of repeating that over and over in each test. (Please follow the guidelines for setup
in the README).
Thank you!
Do you have benchmark data? (container->external via TAP, and external->container via the port forwarder) |
based on observed bound ports from both host and container sides | ||
- **pasta:-T,5201**: enable forwarding of TCP port 5201 from container to | ||
host, using the loopback interface instead of the tap interface for improved | ||
performance |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
5201 is for iperf3, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...right, it was probably the most commonly forwarded port during development and testing. 😉
Do you think it might lead to confusion? For ports, there's no such thing as reserved ones for documentation. 49151 (IANA reserved, without a given purpose) might be our best bet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left it like that, for the moment.
libpod/networking_pasta.go
Outdated
// Copyright (c) 2022 Red Hat GmbH | ||
// Author: Stefano Brivio <[email protected]> | ||
// | ||
// +build linux |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You do not need this tag if you append _linux.go
to the filename
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't know that, thanks for the tip! Renamed file and dropped tag.
libpod/networking_pasta.go
Outdated
"strings" | ||
|
||
"github.com/containernetworking/plugins/pkg/ns" | ||
"github.com/pkg/errors" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deprecated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should just use errors
, instead of errors.Wrap
use fmt.Errorf("msg: %w",err)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, sorry, I wasn't aware. I dropped the import
and changed to fmt.Errorf()
-- I kept the err.(*exec.ExitError).Stderr
part, though, because pasta(1) prints informative output to stderr
as to the reason why it couldn't start.
/approve |
@Luap99 I don't think we can avoid a breaking change here (networks named pasta will no longer function with |
Yes, here. There are a few fundamental links in the description. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would have been really nice if we could keep the new implementation at c/common/libnetwork
as of now slirp4netns is also duplicated in podman and buildah. Since pasta is entierly new implementation , Could we consider it implementing in c/common
and not only in podman ?
But this is not a blocker just my opinion and we can do this later as well.
I don't know enough about that to have an informed opinion, and the description of common isn't helping me much, but I have some questions:
|
@sbrivio-rh Yes buildah is also supposed to consume this for its networking. But I think this would need a better refactoring and even introduce a top level interface, maybe some thing like But this sounds like a bigger refactor therefore I think following PR is fine and maybe can be refactored later into |
In general I agree with that assessment but I do not think it is fair to let @sbrivio-rh make to hard work of rewriting/restructuring this. The code here is simple enough, if we like to abstract this into c/common someone else can easily copy this without much issues. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to run make vendor and make validate, I suspect many code style issues.
Oh, sorry, I missed that line from
Installing the
Because
Anyway, worked around that with:
and then running directly:
Hmm, yes:
but I don't seem to have added any:
Is there something else I'm missing? |
Do you use go 1.19? see #16011 in that case. If you remove "github.com/pkg/errors" you may not need to run make vendor, only when some deps are changed it is required. |
Yes, 1.19.2, see the output above.
I already installed golangci-lint 1.50.0 with the steps I outlined above, and I only see pre-existing issues, except that:
...I haven't addressed this comment of yours yet, I'll do that soon. |
Done, I also restructured things a bit, mostly in terms of style.
Done. Actually, I parameterised all the forwarding and transfer tests based on keywords in their names, to me it looks much easier to maintain, and it's about 600 lines going away. |
Could we make this the default in Podman and have a way in pasta to override it (e.g. |
Ah, yes, I actually wanted to ask what the default should be. I would rather do this to avoid duplicating options: I'll change this in a bit. |
libpod/networking_pasta_linux.go
Outdated
} else { | ||
addr = "" | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not needed a string in go is by default initialized as empty
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dropped.
pkg/specgen/generate/namespaces.go
Outdated
if s.NetNS.Value != "" { | ||
val = fmt.Sprintf("pasta:%s", s.NetNS.Value) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not required, I do not understand why this is done for slirp because s.NetNS.Value must be empty anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dropped, tested, not really needed. I left the case for slirp4netns, as I'm not sure enough of the logic.
pkg/specgen/generate/pod_create.go
Outdated
@@ -195,6 +195,12 @@ func MapSpec(p *specgen.PodSpecGenerator) (*specgen.SpecGenerator, error) { | |||
p.InfraContainerSpec.NetworkOptions = p.NetworkOptions | |||
p.InfraContainerSpec.NetNS.NSMode = specgen.Slirp | |||
} | |||
case specgen.Pasta: | |||
logrus.Debugf("Pod will use pasta") | |||
if p.InfraContainerSpec.NetNS.NSMode != "host" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please use the specgen.Host constant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed.
pkg/specgen/generate/pod_create.go
Outdated
logrus.Debugf("Pod will use pasta") | ||
if p.InfraContainerSpec.NetNS.NSMode != "host" { | ||
p.InfraContainerSpec.NetworkOptions = p.NetworkOptions | ||
p.InfraContainerSpec.NetNS.NSMode = specgen.NamespaceMode("pasta") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please use a constant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
test/upgrade/helpers.bash
Outdated
@@ -1,6 +1,7 @@ | |||
# -*- bash -*- | |||
|
|||
load "../system/helpers" | |||
load "../system/helpers" # For random_free_port() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same line twice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, right, that's why the upgrade test in the CI would still fail. Fixed.
I think you are right this is a breaking change. We should think about keeping support for networks with the name pasta. I think the only good option is to check if the |
I spent a couple of hours of this but I don't see a way to pass down to |
Oh yeah this is going to look ugly, maybe add a bool pastaNetworkExist to |
@mheon PTAL |
Done, assuming it makes sense. |
Conceptually equivalent to networking by means of slirp4netns(1), with a few practical differences: - pasta(1) forks to background once networking is configured in the namespace and quits on its own once the namespace is deleted: file descriptor synchronisation and PID tracking are not needed - port forwarding is configured via command line options at start-up, instead of an API socket: this is taken care of right away as we're about to start pasta - there's no need for further selection of port forwarding modes: pasta behaves similarly to containers-rootlessport for local binds (splice() instead of read()/write() pairs, without L2-L4 translation), and keeps the original source address for non-local connections like slirp4netns does - IPv6 is not an experimental feature, and enabled by default. IPv6 port forwarding is supported - by default, addresses and routes are copied from the host, that is, container users will see the same IP address and routes as if they were in the init namespace context. The interface name is also sourced from the host upstream interface with the first default route in the routing table. This is also configurable as documented - sandboxing and seccomp(2) policies cannot be disabled - only rootless mode is supported. See https://passt.top for more details about pasta. Also add a link to the maintained build of pasta(1) manual as valid in the man page cross-reference checks: that's where the man page for the latest build actually is -- it's not on Github and it doesn't match any existing pattern, so add it explicitly. Signed-off-by: Stefano Brivio <[email protected]>
Currently, wait_for_port() duplicates the check logic implemented by port_is_free(). Add an optional argument to port_is_free(), representing the bound address to check, and call it, dropping the direct check in wait_for_port(). Signed-off-by: Stefano Brivio <[email protected]>
…d protocol Using bash /dev/tcp/ pseudo-device files to probe for bound ports has indeed the advantage of simplicity, but comes with a few drawbacks: - it will actually send data to unsuspecting services that might be running in the same network namespace as the tests, possibly causing unwanted interactions - it doesn't allow for UDP probing - it makes it impossible to clearly distinguish between different address bindings Replace that approach with a new helper, port_is_bound(), that uses procfs entries at /proc/net to detect bound ports, without the need for active probing. We can now implement optional parameters in callers, to check if a port if free for binding to a given address, including any IPv4 (0.0.0.0) or any IPv6 (::0) address, and for a given protocol, TCP or UDP. Extend random_free_port() and random_free_port_range() to support that. The implementation of one function in the file test/system/helpers.bash, namely ipv6_to_procfs(), and the implementation of the corresponding own test, delimited by the markers "# BEGIN ipv6_to_procfs" and "# END ipv6_to_procfs" in the file test/system/helpers.c was provided, on the public forum at: #16141 by Ed Santiago <[email protected]>, who expressly invited me to include them in this code submission. Signed-off-by: Stefano Brivio <[email protected]>
The main helpers.bash file is rather bloated and it's difficult to find stuff there. Move networking functions to their own helper file. While at it, apply a consistent style, and rearrange logically related functions into sections. Suggested-by: Ed Santiago <[email protected]> Signed-off-by: Stefano Brivio <[email protected]>
These tests should cover all the basic networking functionality with pasta(1). Namely, they check: - IPv4 and IPv6 addressing and routing settings - TCP and UDP port forwarding over IPv4 and IPv6 - data transfers and ICMP/ICMPv6 echo requests - the (exceedingly simple) lifecycle handling These tests need some new helpers, to obtain IPv4 and IPv4 addresses and routes, as well as MTU and interface names. Those use jq(1) for parsing. Some availability checks are implemented as well, to skip tests if pasta(1) is not available, or if IPv4 and IPv6 are not usable. To get consistent outcomes across distributions, and to enable uncomplicated termination for UDP tests based on zero-sized packets, use socat(1), which, unlike netcat, doesn't suffer from option inconsistencies depending on flavours (traditional, BSD, NMAP) and versions. Signed-off-by: Stefano Brivio <[email protected]>
…than binds _test_skopeo_credential_sharing() used port_is_free() to check if a port has no active listeners. With the new implementation, this is not equivalent anymore: a port might be in TIME_WAIT, so it's not free, but the listener might be long gone. Add tcp_port_probe() to check if there's an active listener on a given port, and use it in _test_skopeo_credential_sharing(). Signed-off-by: Stefano Brivio <[email protected]>
Unrelated CI failure, waiting:
|
the **vfs** storage driver, which can be disk space expensive and less | ||
performant than other drivers. | ||
|
||
To enable VPN on the container, slirp4netns or pasta needs to be specified; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does anyone know what this even means? I understand it's pre-existing to this PR, but I'm trying to understand what this manpage is saying and not succeeding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is listed under the rootless section, I guess the point here is that without slirp or pasta you cannot have a private netns with internet connection, so you need to use host namespace in that case.
Code LGTM. I have one comment on the manpages, but we shouldn't block on it. |
The out-of-tree Podman patch needs to be rebased every second week or so, and I'm currently trying to get that upstream: containers/podman#16141 Disable demo generation for the moment, so that I avoid wasting time with those rebases. We'll re-enable it later. Signed-off-by: Stefano Brivio <[email protected]>
In pasta mode, ICMP and ICMPv6 echo sockets relay back to us any reply we send: we're on the same host as the target, after all. We discard them by comparing the last sequence we sent with the sequence we receive. However, on the first reply for a given identifier, the sequence might be zero, depending on the implementation of ping(8): we need another value to indicate we haven't sent any sequence number, yet. Use -1 as initialiser in the echo identifier map. This is visible with Busybox's ping, and was reported by Paul on the integration at containers/podman#16141, with: $ podman run --net=pasta alpine ping -c 2 192.168.188.1 ...where only the second reply would be routed back. Reported-by: Paul Holzinger <[email protected]> Fixes: 33482d5 ("passt: Add PASTA mode, major rework") Signed-off-by: Stefano Brivio <[email protected]> Reviewed-by: David Gibson <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM we can split for buildah later.
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: edsantiago, flouthoc, Luap99, mheon, sbrivio-rh The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/lgtm |
See containers/podman#16141, shipped in Podman 4.4. Signed-off-by: Stefano Brivio <[email protected]> Reviewed-by: David Gibson <[email protected]>
Description
This introduces a new user-mode networking option using
pasta(1)
.pasta implements a translation layer between a Layer-2 network interface (a tap device inside a network namespace) and native Layer-4 sockets (TCP, UDP, ICMP/ICMPv6 echo) on a host. It doesn't require any capabilities or privileges, and it can be used as a simple, purpose-written, high performance replacement for
slirp4netns(1)
.Similarly to slirp4netns, there's no need to create further interfaces on the host. This approach doesn't require any capabilities or privileges.
pasta also implements a tap bypass path for local connections only: packets with a local destination address can be moved directly between Layer-4 sockets, avoiding Layer-2 translations, using the
splice(2)
andrecvmmsg(2)
/sendmmsg(2)
system calls for TCP and UDP, respectively.A few notable differences compared to
slirp4netns(1)
:pasta forks to background once networking is configured in the namespace and quits on its own once the namespace is deleted: file descriptor synchronisation and PID tracking are not needed
port forwarding is configured via command line options at start-up, instead of an API socket: this is taken care of right away as we're about to start pasta
there's no need for further selection of port forwarding modes: pasta behaves similarly to containers-rootlessport for local binds (
splice()
instead ofread()
/write()
pairs, without L2-L4 translation), and keeps the original source address for non-local connections like slirp4netns doesIPv6 is not an experimental feature. IPv6 port forwarding is supported
by default, addresses and routes are copied from the host, that is, container users will see the same IP address and routes as if they were in the init namespace context. The interface name is also sourced from the host upstream interface with the first default route in the routing table. This is also configurable as documented
by default, the host is reachable using the gateway address from the container, unless the
--no-map-gw
option is passedsandboxing and
seccomp(2)
policies cannot be disabledSee also this demo showing pasta operation with Podman. Throughput and latency figures for TCP and UDP, over IPv4 and IPv6, are regularly obtained by CI runs.
Availability
I'm submitting this now after originally proposing this a few months ago given that packages for some distributions (Fedora, openSUSE) are now available.
I maintain unofficial packages for other RPM-based distributions (CentOS Stream, EPEL, Mageia) on my Fedora Copr repository.
Unofficial x86_64 packages for Debian and Debian-based distributions are currently derived from static builds, and there's an open RFP for Debian.
Static build packages are also available for other RPM-based distributions.
Tests
This pull request introduces a number of tests, some of which will fail with the current upstream version of pasta, because I just found a few issues while writing those. Patches are on their way, and they should be available in packages within a few days, but I thought I would submit this early for review anyway.
Additional notes
This also introduces support for #14425, not via native Podman options at the moment, as commented on that issue itself: binding addresses to specific interfaces can only be done by passing opaque options to pasta, as documented.
It should also address #13229 and related issues.
@Luap99 @giuseppe @dgibson @vivier @dfaggioli @eriksjolund @cpach @baude @PhrozenByte
Does this PR introduce a user-facing change?