Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Datapath overhaul: zero-copy metadata with ingot, 'Compiled' UFTs (#585)
This PR rewrites the core of OPTE's packet model to use zero-copy packet parsing/modification via the [`ingot`](oxidecomputer/ingot#1) library. This enables a few changes which get us just shy of the 3Gbps mark. * **[2.36 -> 2.7]** The use of ingot for modifying packets in both the slowpath (UFT miss) and existing fastpath (UFT hit). * Parsing is faster -- we no longer copy out all packet header bytes onto the stack, and we do not allocate a vector to decompose an `mblk_t` into individual links. * Packet field modifications are applied directly to the `mblk_t` as they happen, and field reads are made from the same source. * Non-encap layers are not copied back out. * **[2.7 -> 2.75]** Packet L4 hashes are cached as part of the UFT, speeding up multipath route selection over the underlay. * **[2.75 -> 2.8]** Incremental Internet checksum recalculation is only performed when applicable fields change on inner flow headers (e.g., NAT'd packets). * VM-to-VM / intra-VPC traffic is the main use case here. * **[2.8 -> 3.05]** `NetworkParser`s now have the concept of inbound & outbound `LightweightMeta` formats. These support the key operations needed to execute all our UFT flows today (`FlowId` lookup, inner headers modification, encap push/pop, cksum update). * This also allows us to pre-serialize any bytes to be pushed in front of a packet, speeding up `EmitSpec`. * This is crucial for outbound traffic in particular, which has far smaller (in `struct`-size) metadata. * UFT misses or incompatible flows fallback to using the full metadata. * **[3.05 -> 2.95]** TCP state tracking uses a separate per-flow lock and does not require any lookup from a UFT. * I do not have numbers on how large the performance loss would be if we held the `Port` lock for the whole time. * (Not measured) Packet/UFT L4 Hashes are used as the Geneve source port, spreading inbound packets over NIC Rx queues based on the inner flow. * This is now possible because of #504 -- software classification would have limited us to the default inbound queue/group. * I feel bad for removing one more FF7 reference, but that is the way of these things. RIP port `7777`. * Previously, Rx queue affinity was derived solely from `(Src Sled, Dst Sled)`. There are several other changes here made to how OPTE functions which are needed to support the zero-copy model. * Each collected set of header transforms are `Arc<>`'d, such that we can apply them outside of the `Port` lock. * `FlowTable<S>`s now store `Arc<FlowEntry<S>>`, rather than `FlowEntry<S>`. * This enables the UFT entry for any flow to store its matched TCP flow, update its hit count and timestamp, and then update the TCP state without reacquring the `Port` lock. * This also drastically simplifies TCP state handling in fast path cases to not rely on post-transformation packets for lookup. * `Opte::process` returns an `EmitSpec` which is needed to finalise a packet before it can be used. * I'm not too happy about the ergonomics, but we have this problem because otherwise we'd need `Packet` to have some self-referential fields when supporting other key parts of XDE (e.g., parse -> use fields -> select port -> process). Closes #571, closes #481, closes #460. Slightly alleviates #435.
- Loading branch information