-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle IPv4 options #571
Comments
It should be said that there is a bug in swallowing these options properly -- we need to adjust the IHL and total_length back down to baseline. As it stands, these are unchanged and so get fed back out when we generate a fresh IP header: opte/lib/opte/src/engine/ip4.rs Lines 546 to 562 in 1201f21
The better fix is just to correctly re-emit them, as I promised we should eventually do... |
This PR rewrites the core of OPTE's packet model to use zero-copy packet parsing/modification via the [`ingot`](oxidecomputer/ingot#1) library. This enables a few changes which get us just shy of the 3Gbps mark. * **[2.36 -> 2.7]** The use of ingot for modifying packets in both the slowpath (UFT miss) and existing fastpath (UFT hit). * Parsing is faster -- we no longer copy out all packet header bytes onto the stack, and we do not allocate a vector to decompose an `mblk_t` into individual links. * Packet field modifications are applied directly to the `mblk_t` as they happen, and field reads are made from the same source. * Non-encap layers are not copied back out. * **[2.7 -> 2.75]** Packet L4 hashes are cached as part of the UFT, speeding up multipath route selection over the underlay. * **[2.75 -> 2.8]** Incremental Internet checksum recalculation is only performed when applicable fields change on inner flow headers (e.g., NAT'd packets). * VM-to-VM / intra-VPC traffic is the main use case here. * **[2.8 -> 3.05]** `NetworkParser`s now have the concept of inbound & outbound `LightweightMeta` formats. These support the key operations needed to execute all our UFT flows today (`FlowId` lookup, inner headers modification, encap push/pop, cksum update). * This also allows us to pre-serialize any bytes to be pushed in front of a packet, speeding up `EmitSpec`. * This is crucial for outbound traffic in particular, which has far smaller (in `struct`-size) metadata. * UFT misses or incompatible flows fallback to using the full metadata. * **[3.05 -> 2.95]** TCP state tracking uses a separate per-flow lock and does not require any lookup from a UFT. * I do not have numbers on how large the performance loss would be if we held the `Port` lock for the whole time. * (Not measured) Packet/UFT L4 Hashes are used as the Geneve source port, spreading inbound packets over NIC Rx queues based on the inner flow. * This is now possible because of #504 -- software classification would have limited us to the default inbound queue/group. * I feel bad for removing one more FF7 reference, but that is the way of these things. RIP port `7777`. * Previously, Rx queue affinity was derived solely from `(Src Sled, Dst Sled)`. There are several other changes here made to how OPTE functions which are needed to support the zero-copy model. * Each collected set of header transforms are `Arc<>`'d, such that we can apply them outside of the `Port` lock. * `FlowTable<S>`s now store `Arc<FlowEntry<S>>`, rather than `FlowEntry<S>`. * This enables the UFT entry for any flow to store its matched TCP flow, update its hit count and timestamp, and then update the TCP state without reacquring the `Port` lock. * This also drastically simplifies TCP state handling in fast path cases to not rely on post-transformation packets for lookup. * `Opte::process` returns an `EmitSpec` which is needed to finalise a packet before it can be used. * I'm not too happy about the ergonomics, but we have this problem because otherwise we'd need `Packet` to have some self-referential fields when supporting other key parts of XDE (e.g., parse -> use fields -> select port -> process). Closes #571, closes #481, closes #460. Slightly alleviates #435.
Today we just swallow any IPv4 options we encounter and never emit any ourselves:
opte/lib/opte/src/engine/ip4.rs
Lines 379 to 384 in 1201f21
Need to decide if we'll just let them through as-is or on a per-option basis. Will we need to rewrite the contents on ingress/egress?
The text was updated successfully, but these errors were encountered: