use mac flows to filter xde traffic #61 #62
base: master
Conversation
Not yet had the luxury to test on real NICs, but we have progress (below taken from a running omicron + SoftNPU instance):
This currently relies on utterly abusing the internals of …
So, the interesting news is that this still works, and works on Intel NICs. Sadly, performance (in the latency sense) is practically identical.

EDIT: Marginal changes are expected in these numbers; my test setup is capped at 2x1GbE, and these latency measurements only cover C2S results from #62 after `master`.
C2S results repeated
So between a few runs we're basically in the same ballpark, possibly a little worse off now that we've added in …

```
# master
kyle@farme:~/gits/opte$ pfexec opteadm set-xde-underlay igb0 igb1
kyle@farme:~/gits/opte$ ifconfig | grep igb
igb0: flags=1000942<BROADCAST,RUNNING,PROMISC,MULTICAST,IPv4> mtu 9000 index 3
igb1: flags=1000942<BROADCAST,RUNNING,PROMISC,MULTICAST,IPv4> mtu 9000 index 4
igb0: flags=20002104941<UP,RUNNING,PROMISC,MULTICAST,DHCP,ROUTER,IPv6> mtu 9000 index 3
igb1: flags=20002104941<UP,RUNNING,PROMISC,MULTICAST,DHCP,ROUTER,IPv6> mtu 9000 index 4

# git switch use-the-flow-luke-61, driver recompile, ...
kyle@farme:~/gits/opte$ pfexec opteadm set-xde-underlay igb0 igb1
kyle@farme:~/gits/opte$ ifconfig | grep igb
igb0: flags=1000842<BROADCAST,RUNNING,MULTICAST,IPv4> mtu 9000 index 3
igb1: flags=1000842<BROADCAST,RUNNING,MULTICAST,IPv4> mtu 9000 index 4
igb0: flags=20002104841<UP,RUNNING,MULTICAST,DHCP,ROUTER,IPv6> mtu 9000 index 3
igb1: flags=20002104841<UP,RUNNING,MULTICAST,DHCP,ROUTER,IPv6> mtu 9000 index 4
```

I don't yet know why zone-to-zone over simnets is broken on CI -- from what I recall it worked on my local helios box before I acquired a second test node.
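As an aside, the before/after `ifconfig` flag words above differ in exactly one bit: `0x942` vs `0x842`. On illumos that bit (`0x100`) is `IFF_PROMISC` in `<net/if.h>`, which is the visible confirmation that the branch no longer leaves the underlay ports in promiscuous mode. A quick sanity check (constants taken from the output above; not OPTE code):

```python
# ifconfig flag words from the two runs above (IPv4 lines).
FLAGS_MASTER = 0x1000942   # BROADCAST,RUNNING,PROMISC,MULTICAST,IPv4
FLAGS_BRANCH = 0x1000842   # BROADCAST,RUNNING,MULTICAST,IPv4

# illumos <net/if.h>: IFF_PROMISC == 0x0100
IFF_PROMISC = 0x0100

# The only bit that changed between the two runs is the promiscuous-mode flag.
changed = FLAGS_MASTER ^ FLAGS_BRANCH
assert changed == IFF_PROMISC
print(f"only PROMISC differs: {changed:#x}")
```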
What do you get when running a similar traffic flow between the raw IPv6 addresses?
While running an iperf session over each underlay link for 100s:

```
kyle@farme:~/gits/opte$ cargo kbench in-situ have-a-go
    Finished bench [optimized + debuginfo] target(s) in 0.15s
     Running benches/xde.rs (target/release/deps/xde-ae24f11c9169898b)
###----------------------###
:::   DTrace running...  :::
:::Type 'exit' to finish.:::
###----------------------###
dtrace: description 'profile-201us ' matched 2 probes
exit
###---------------------###
:::Awaiting out files...:::
###---------------------###
###-----###
:::done!:::
###-----###
ERROR: No stack counts found
Failed to create flamegraph for xde_rx.
ERROR: No stack counts found
Failed to create flamegraph for xde_mc_tx.
```

No hits for non-Geneve traffic. The flows themselves are:

```
kyle@farme:~/gits/opte$ flowadm show-flow
FLOW        LINK    IPADDR    PROTO  LPORT  RPORT  DSFLD
igb0_xde    igb0    --        udp    6081   --     --
igb1_xde    igb1    --        udp    6081   --     --
```

So far as I can tell, we can't jointly specify IP addr + family + port, cf.
During setup, we are not doing any work to ensure that Helios has a valid NDP cache entry ready to use over the simnet link we install for testing. As a result, XDE selects the right output port but installs source and destination MAC addresses of zero. This worked before: the devices were in promiscuous mode, so the packets still made it into `xde_rx`. In other cases, the underlay traffic in e.g. a SoftNPU deployment was priming all the necessary NCEs, so we always knew the target MAC address. Obviously this is an easy fix here, and in practice we'll always have the NCE for the nexthop (i.e., the sidecar).
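For context on why an unprimed cache yields no MAC: NDP resolves an IPv6 next-hop by sending a Neighbor Solicitation to the target's solicited-node multicast group, and until a Neighbor Advertisement comes back there is no NCE to consult. The address arithmetic involved is fixed by RFC 4291 §2.7.1 and RFC 2464; a small standalone sketch (standard-library only, not OPTE code):

```python
import ipaddress

def solicited_node(addr: str) -> str:
    """Solicited-node multicast group for an IPv6 address (RFC 4291 s2.7.1):
    ff02::1:ff00:0/104 plus the low 24 bits of the target address."""
    low24 = ipaddress.IPv6Address(addr).packed[-3:]
    group = b"\xff\x02" + b"\x00" * 9 + b"\x01\xff" + low24
    return str(ipaddress.IPv6Address(group))

def ipv6_mcast_mac(group: str) -> str:
    """Ethernet destination MAC for an IPv6 multicast group (RFC 2464):
    33:33 followed by the low 32 bits of the group address."""
    low32 = ipaddress.IPv6Address(group).packed[-4:]
    return ":".join(f"{b:02x}" for b in b"\x33\x33" + low32)

# A link-local next-hop and the group/MAC its solicitation targets.
nexthop = "fe80::aabb:ccff:fedd:eeff"
group = solicited_node(nexthop)
print(group, ipv6_mcast_mac(group))
```

Priming is just a matter of generating any resolvable traffic to the next-hop (or installing a static entry) before XDE needs the mapping.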
Force-pushed from 7af1fe6 to 11d6bc6.
Today, we get our TX and RX pathways on underlay devices for XDE by creating a secondary MAC client on each device. As part of this process we must attach a unicast MAC address (or specify `MAC_OPEN_FLAGS_NO_UNICAST_ADDR`) during creation to spin up a valid datapath; otherwise we can receive packets on our promiscuous-mode handler, but any sent packets are immediately dropped by MAC. However, datapath setup then fails to supply a dedicated ring/group for the new client, and the device is reduced to pure software classification. This hard-disables any ring polling threads, so all packet processing occurs in interrupt context. This limits throughput and increases OPTE's blast radius on control plane/crucible traffic between sleds.

This PR places a hold on the underlay NICs via `dls`, and makes use of `dls_open`/`dls_close` to acquire a valid transmit pathway onto the original (primary) MAC client, to which we can also attach a promiscuous callback. As desired, we are back in hardware classification.

This work is orthogonal to #62 (and related efforts), which will get us out of promiscuous mode -- both are necessary parts of making optimal use of the illumos networking stack.

Closes #489.
This work is on the back burner at the moment, as there is more pressing work to be done; sticking with promisc isn't a problem for the near-term future.