-
Notifications
You must be signed in to change notification settings - Fork 85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial draft of nftables support #883
Conversation
70de160
to
521b5b3
Compare
Ephemeral COPR build failed. @containers/packit-build please check. |
102cdb5
to
4f3bd79
Compare
Stripping WIP. Passes testing I've thrown at it. Now needs integration tests. |
601c63a
to
a9ff1ce
Compare
Integration tests added. Caveat: I don't have local access to IPv6 here, so I cannot test IPv6 forwarding and port forwarding. |
b6d9c7d
to
c19a317
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some initial look, not a full review.
FYI this seems to bloat the binary about 1.5 MB to about 21.5 MB total, not that this a blocker or anything I was just curious. I think we could use some time to debloat this at some point.
stmt::Statement::Match(stmt::Match { | ||
left: expr::Expression::BinaryOperation(expr::BinaryOperation::AND( | ||
Box::new(expr::Expression::Named(expr::NamedExpression::Meta( | ||
expr::Meta { | ||
key: expr::MetaKey::Mark, | ||
}, | ||
))), | ||
Box::new(expr::Expression::Number(MASK)), | ||
)), | ||
right: expr::Expression::Number(MASK), | ||
op: stmt::Operator::EQ, | ||
}), | ||
stmt::Statement::Masquerade(None), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WTF is this syntax, this looks so hard to use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is atrocious. But somehow better to use than the string API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, I'm the nftables-rs maintainer.
I agree with your concerns regarding the usage of the nftables-rs crate. I'm convinced that using the nftables JSON API is the way to go considering portability and maintainability.
Consider this usage example from nftnl-rs (native binding to libnftables): https://github.com/mullvad/nftnl-rs/blob/main/nftnl/examples/add-rules.rs#L147-L153
Likewise we could add this usability layer to nftables-rs through macros. WDYT?
I'm very interested in your opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think using macros that add better syntax sounds great, having something closer to the real nft syntax would be ideal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a massive fan of this idea. The existing syntax can be very verbose, and I ended up with a lot of convenience functions to remove that boilerplate; adding macros on the library side would make using the library much more convenient.
And also strongly agree with the JSON API bit. My original attempt at this was using the Mullvad library, but it did not support everything we needed, and the description of that library as being based off reverse-engineered usage of libnftnl inside the nft binary was... concerning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your feedback.
@mheon: I went through the very same process with nftnl-rs, which didn't have the features I needed for another project. This is the reason I created nftables-rs.
I'll look into implementing nft-like syntax through macros.
src/firewall/nft.rs
Outdated
} | ||
|
||
/// Convert a subnet into a chain name. | ||
fn get_subnet_chain_name(subnet: IpNet, dnat: bool) -> String { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not a fan of using the subnet as chain name. What is wrong with the existing hash the used in iptables?
We could also just use the network ID I think?
Is there a reason to have one chain per subnet vs chain per network? One per subnet looks incorrect to me, because nft supports both ipv4 and ipv6 at the same time I would expect one chain that matches both the v4 and v6 subnet in there. More chains should also work but doesn't this just make things more complicated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like using subnets because it reads easier in nftables output (very obvious what each chain does), is completely unambiguous (no chance of a hash collision).
I could definitely combine v4 and v6 subnets into a single chain, but I think separating them is a bit more logical. It doesn't really need more rules, either (we need the same number of rules to actually forward traffic, and separate rules to forward traffic to the chain for v4 and v6 subnets).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I need to dump me a nft list ruleset
output to see what it actually looks like to better judge what I prefer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I really do not like the name like that, my main issue is that with the default podman network create
we juts iterate trough the subnets to assign them, so it is normal to have 10.89.0.0/24, 10.89.1.0/24, 10.89.2.0/24 and so on. This makes it hard for me to follow the chains as only a single digit changes in the name. I have no problem with having the subnet as part of the name but I would prefer if lets say 12 chars of the network ID are added as suffix to make them more distinct to my eyes.
I know we don't really have to look at this often but when we get bug reports we have to and I think a more distinct name will make it easier to find problems (e.g. missing chains/rules)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Only 8 characters though (I don't want things to get ridiculously long with IPv6 addresses)
11e4f4c
to
e9ae31c
Compare
@Luap99 One last failing test, and it's a firewalld reload one. Any idea what might be going on? |
Looks like you do not call into firewalld again to set the subnet trusted. (Do we even have to do that when using nftables?) But because |
The firewalld folks said it was safer to do so. I'll move the firewalld make-trusted call to the top of the loop. |
Alright, that should get tests passing. This is ready for review. |
5753bdf
to
77f8df8
Compare
I noticed these errors in the journal when testing this on my host:
I don't think they are related to this PR but I have to retest with the main/fedora version. Do you see this as well? |
Nope. Don't see anything like that in journal (F38, fully updated, Podman from main, Netavark from my branch) |
f39 ships with firewalld 2.0 and f38 with 1.X I think? So likely a firewalld change then. Anyway we can debug this if you get around to the firewalld stuff, doesn't seem related to this at all as I can reproduce using the fedora version with iptables. |
One question though:
What is the point of the dport match in NETAVARK-HOSTPORT-DNAT? The ..._dnat chain has to match the dport again anyway so just a direct jump should be easier? |
Comments addressed (added network ID to subnet chain names). Should be ready for merge. I'm on the firewalld stuff now, so clearly I need to do a dist-upgrade to f39 and see how badly we're broken (and what fixing it will mean - I really don't want to have to conventionalize on firewalld version) |
I think that jump is mostly to prevent the NETAVARK-HOSTPORT-DNAT chain small. We could eliminate the |
be3f504
to
3a067df
Compare
Ok I assumed a port range would actually only result is one rule not in one per port but seems like such feature (https://bugzilla.netfilter.org/show_bug.cgi?id=1501) is not yet implement or very new, not sure what the status is (finding out if patches on ML are merged is always such a chore) |
If it is merged, it's not in the JSON schema yet, so we can't make use of it. It's unfortunate because it's the only place ranges don't work. RE: Firewalld - The release notes say 2.0 has no breaking changes. I'm looking deeper. |
src/network/internal_types.rs
Outdated
@@ -35,6 +37,8 @@ pub struct TearDownNetwork { | |||
pub struct PortForwardConfigGeneric<Ports, IpAddresses> { | |||
/// id of container | |||
pub container_id: String, | |||
/// id of the network | |||
pub id: String, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I just remembered the json parser is very strict and fields are required by default so the netavark firewalld reload service would fail to parse the config files created with the old version, not a huge deal but I think just adding #[serde(default)]
to both fields might solve it. The string would be empty but given that iptables code does not use it that would be fine.
Also for consistency I think network_id
is a better name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
c8c548a
to
2a722d4
Compare
Adds an nftables firewall backend and tests for said backend. Implements basic forwarding, port forwarding, and teardown for all relevant rules. Heavily based on our existing iptables driver but with a number of improvements (we live in a dedicated table, so this should play much more nicely with other tools using the firewall; IPv4 and IPv6 share a table and almost all code; and rule structure is a bit simpler because we do have our own table and don't have to worry about cluttering up the FORWARD chain, we'll the the only ones using it. This implementation presently does not support isolation; that will be added in a followon. Fixes containers#816 Signed-off-by: Matthew Heon <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM assuming tests pass, nice work. I checked the simple bridge example with #884 and see a 4-5x speedup with nftables compared to iptables.
/lgtm
/hold
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Luap99, mheon The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/hold cancel |
This implementation lacks isolation support at present (that'll be a followon PR) and has only very minimal testing at the moment (including absolutely 0 testing for IPv6), but does handle all core tasks including forwarding and all types of port forwarding.
Fixes #816