Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial integration with Oxide Packet Transformation Engine #955

Merged
merged 1 commit into from
May 5, 2022

Conversation

bnaecker
Copy link
Collaborator

  • Brings in OPTE via the opte-ioctl and opte crates.
  • Modifies the instance-ensure request from Nexus to the sled agent, to
    carry the actual information required for setting up the guest OPTE
    port. This includes the actual IP subnet and MAC, rather than things
    like the VPC Subnet UUID.
  • Adds a database query and method to extract the above information from
    both the network interface and VPC subnet tables.
  • Adds OPTE port for the guests (and currently still a VNIC on top),
    with the right OPTE settings for traffic to flow between two guests in
    the same VPC subnet. That's the virtual-to-physical mapping and a
    router entry for the subnet.
  • Adds the VNICs over each OPTE port to the running zone. Note that this
    removes the specification of guest NICs for the zone itself as VNICs.
    They are passed as OPTE ports, and the VNIC is pulled out internally,
    so hopefully little will need to change when the VNIC is removed
    entirely.
  • Store the main underlay address for the sled agent, currently its
    dropshot server IP address, in the instance manager, and forward to
    each instance. It's then used as the underlay address when setting up
    the OPTE ports for the guest.
  • Fixes bad naming of VNICs used to emulate Chelsio NICs in the current
    setup. Removes the unnecessary loopback address, since the sled agent
    provides its main listening address as the underlay address when
    create OPTE ports.
  • Adds better installation of OPTE, xde, and the kernel bits they rely
    on. Don't install opteadm or xde directly, do everything through
    their package repos.

@rcgoodfellow
Copy link
Contributor

This is awesome @bnaecker! Really excited to see this work getting integrated into the control plane 🎉 . Just a few comments on comments above.

@bnaecker bnaecker marked this pull request as draft April 21, 2022 16:34
@bnaecker
Copy link
Collaborator Author

Though I'm moving this to a draft, this PR is definitely ready for review. There are a few reasons for the draft status. First, any CI is blocked on being able to pull OPTE as a dependency. @rzezeski has started turning the crank on making OPTE public, but it won't be instantaneous. Second, given that fact, I'm going to split out the updates to the scripts in ./tools. Those are really independent.

Copy link
Contributor

@rzezeski rzezeski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for putting this all together @bnaecker. I realize that the work and effort required here is not fully appreciated if one were to look at the lines of code alone. Especially given all the testing woes you've hit, plenty of which were my fault. So thank you for your patience and diligence in getting this across the line.

For the most part this seems as I would expect. Some of my comments don't require any action, just stuff I wanted to call out. Though I did not think to explicitly differentiate what needs action and what doesn't. Feel free to reach out for clarification in that regard or if any of my comments are confusing.

nexus/src/db/datastore.rs Outdated Show resolved Hide resolved
nexus/src/sagas.rs Show resolved Hide resolved
openapi/sled-agent.json Show resolved Hide resolved
openapi/sled-agent.json Outdated Show resolved Hide resolved
openapi/sled-agent.json Show resolved Hide resolved
sled-agent/src/opte.rs Outdated Show resolved Hide resolved
sled-agent/src/opte.rs Outdated Show resolved Hide resolved
sled-agent/src/opte.rs Outdated Show resolved Hide resolved
sled-agent/src/opte.rs Outdated Show resolved Hide resolved
tools/install_opte.sh Outdated Show resolved Hide resolved
nexus/src/db/datastore.rs Outdated Show resolved Hide resolved
nexus/src/nexus.rs Outdated Show resolved Hide resolved
sled-agent/src/opte.rs Outdated Show resolved Hide resolved
nexus/src/db/datastore.rs Outdated Show resolved Hide resolved
@bnaecker
Copy link
Collaborator Author

bnaecker commented May 3, 2022

I've verified that all the tests pass locally, and that I can successfully ping between two Alpine guests. This is no longer blocked on OPTE going public, but the linking failure we've seen before is back. I'm going to mark this open, with the assumption it'll wait until that gets fixed to merge.

@bnaecker bnaecker marked this pull request as ready for review May 3, 2022 19:56
@bnaecker bnaecker force-pushed the opteadm-integration branch from a5d0fb5 to 0c126a1 Compare May 4, 2022 22:06
@bnaecker
Copy link
Collaborator Author

bnaecker commented May 5, 2022

@rcgoodfellow @rmustacc @rzezeski I'm planning to merge this once CI is green, unless there are objections. Below is some output showing successfully netcat-ing between two guests using this branch.

Guest 1

This is the nc server, with IP 172.30.0.5.

localhost:~# nc -l -s 172.30.0.5 -p 5556
hello
from
the
other
side
^Z[2]+  Stopped                    nc -l -s 172.30.0.5 -p 5556
localhost:~#

Guest 2

This is the nc client, with IP 172.30.0.6

localhost:~# nc 172.30.0.5 5556
hello
from
the
other
side
^Z[2]+  Stopped                    nc 172.30.0.5 5556
localhost:~#

OPTE trace output

Here's the output of opte/dtrace/opte-trace opte-port-process while the above was happening.

AME         DIR FLOW                                        MBLK               RESULT
opte1        out XXX,0.0.0.0:0,0.0.0.0:0                     0xfffffe59539aaa20 Drop { reason: Layer { name: "arp" } }
opte1        out XXX,0.0.0.0:0,0.0.0.0:0                     0xfffffe59539aaa20 Drop { reason: Layer { name: "arp" } }
opte1        out XXX,0.0.0.0:0,0.0.0.0:0                     0xfffffe595399f600 Hairpin(Packet { avail: 80, source: Allocated, segs: [PacketSeg { mp: 0xfffffe595ba91c60, dblk: 0xfffffe595ba8f900, len: 42, avail: 80 }], state: Initialized { len: 42 } })
opte1        out TCP,172.30.0.6:46459,172.30.0.5:5556        0xfffffe595399f600 Modified
opte0        in  TCP,172.30.0.6:46459,172.30.0.5:5556        0xfffffe595b877500 Modified
opte0        out TCP,172.30.0.5:5556,172.30.0.6:46459        0xfffffe59535fa4c0 Modified
opte1        in  TCP,172.30.0.5:5556,172.30.0.6:46459        0xfffffe594b3ba7a0 Modified
opte1        out TCP,172.30.0.6:46459,172.30.0.5:5556        0xfffffe595399f600 Modified
opte0        in  TCP,172.30.0.6:46459,172.30.0.5:5556        0xfffffe594a66aa40 Modified
opte1        out TCP,172.30.0.6:46459,172.30.0.5:5556        0xfffffe595399f600 Modified
NAME         DIR FLOW                                        MBLK               RESULT
opte0        in  TCP,172.30.0.6:46459,172.30.0.5:5556        0xfffffe5963740d40 Modified
opte0        out TCP,172.30.0.5:5556,172.30.0.6:46459        0xfffffe59535fa4c0 Modified
opte1        in  TCP,172.30.0.5:5556,172.30.0.6:46459        0xfffffe596363e140 Modified
opte1        out TCP,172.30.0.6:46459,172.30.0.5:5556        0xfffffe59537285e0 Modified
opte0        in  TCP,172.30.0.6:46459,172.30.0.5:5556        0xfffffe5963740a80 Modified
opte0        out TCP,172.30.0.5:5556,172.30.0.6:46459        0xfffffe59535fa4c0 Modified
opte1        in  TCP,172.30.0.5:5556,172.30.0.6:46459        0xfffffe594b3baae0 Modified
opte1        out TCP,172.30.0.6:46459,172.30.0.5:5556        0xfffffe59537285e0 Modified
opte0        in  TCP,172.30.0.6:46459,172.30.0.5:5556        0xfffffe594ac9c920 Modified
opte0        out TCP,172.30.0.5:5556,172.30.0.6:46459        0xfffffe595387f9c0 Modified
NAME         DIR FLOW                                        MBLK               RESULT
opte1        in  TCP,172.30.0.5:5556,172.30.0.6:46459        0xfffffe5927dff380 Modified
opte1        out TCP,172.30.0.6:46459,172.30.0.5:5556        0xfffffe59fb91cd80 Modified
opte0        in  TCP,172.30.0.6:46459,172.30.0.5:5556        0xfffffe59508b5a80 Modified
opte0        out TCP,172.30.0.5:5556,172.30.0.6:46459        0xfffffe595387f9c0 Modified
opte1        in  TCP,172.30.0.5:5556,172.30.0.6:46459        0xfffffe5a013b20a0 Modified
opte1        out TCP,172.30.0.6:46459,172.30.0.5:5556        0xfffffe595399f600 Modified
opte0        in  TCP,172.30.0.6:46459,172.30.0.5:5556        0xfffffe594ffbdf40 Modified
opte0        out TCP,172.30.0.5:5556,172.30.0.6:46459        0xfffffe59fe43a860 Modified
opte1        in  TCP,172.30.0.5:5556,172.30.0.6:46459        0xfffffe5953434280 Modified

Caveat

There's a caveat here. The firewall rules are not propagated correctly in this branch. In particular, the guests still can't communicate with one another without adding a rule like:

bnaecker@feldspar : ~/omicron $ pfexec /opt/oxide/opte/bin/opteadm add-fw-rule -p opte0 --action allow --dir in --hosts subnet=172.30.0.0/22 --ports any --priority 10 --protocol tcp

where opte0 is the server guest. I'm planning to do the work to include the current set of firewall rules for the guest in another PR. This branch has lived way too long already.

@bnaecker bnaecker force-pushed the opteadm-integration branch 2 times, most recently from b244926 to 3f171b5 Compare May 5, 2022 18:17
@rzezeski
Copy link
Contributor

rzezeski commented May 5, 2022

@bnaecker I think that makes perfect sense. There is plenty of major improvement here already that seems ready for integration. I think doing the firewall stuff in a follow up makes it cleaner and easier to focus review anyways. So +1 to your plan.

- Brings in OPTE via the `opte-ioctl` and `opte` crates.
- Modifies the instance-ensure request from Nexus to the sled agent, to
  carry the actual information required for setting up the guest OPTE
  port. This includes the actual IP subnet and MAC, rather than things
  like the VPC Subnet UUID.
- Adds a database query and method to extract the above information from
  both the network interface and VPC subnet tables.
- Adds OPTE port for the guests (and currently still a VNIC on top),
  with the right OPTE settings for traffic to flow between two guests in
  the same VPC subnet. That's the virtual-to-physical mapping and a
  router entry for the subnet.
- Adds the VNICs over each OPTE port to the running zone. Note that this
  removes the specification of guest NICs for the zone itself as VNICs.
  They are passed as OPTE ports, and the VNIC is pulled out internally,
  so hopefully little will need to change when the VNIC is removed
  entirely.
- Store the main underlay address for the sled agent, currently its
  dropshot server IP address, in the instance manager, and forward to
  each instance. It's then used as the underlay address when setting up
  the OPTE ports for the guest.

Addressing review comments

Add a unique VNI to each VPC

Updating OPTE dependency and package repos

Add dummy/mock module for OPTE on non-illumos systems

Update error handling to be more self-contained in the callee
@bnaecker bnaecker force-pushed the opteadm-integration branch from 3f171b5 to 5ae31bd Compare May 5, 2022 18:54
@bnaecker bnaecker enabled auto-merge (squash) May 5, 2022 19:00
@bnaecker bnaecker merged commit f20b555 into main May 5, 2022
@bnaecker bnaecker deleted the opteadm-integration branch May 5, 2022 19:45
leftwo pushed a commit that referenced this pull request Oct 4, 2023
Crucible updates
    all Crucible connections should set TCP_NODELAY (#983)
    Use a fixed size for tag and nonce (#957)
    Log crucible opts on start, order crutest options (#974)
    Lock the Downstairs less (#966)
    Cache dirty flag locally, reducing SQLite operations (#970)
    Make stats mutex synchronous (#961)
    Optimize requeue during flow control conditions (#962)
    Update Rust crate base64 to 0.21.4 (#950)
    Do less in control (#949)
    Fix --flush-per-blocks (#959)
    Fast dependency checking (#916)
    Update actions/checkout action to v4 (#960)
    Use `cargo hakari` for better workspace deps (#956)
    Update actions/checkout digest to 8ade135 (#939)
    Cache block size in Guest (#947)
    Update Rust crate ringbuffer to 0.15.0 (#954)
    Update Rust crate toml to 0.8 (#955)
    Update Rust crate reedline to 0.24.0 (#953)
    Update Rust crate libc to 0.2.148 (#952)
    Update Rust crate indicatif to 0.17.7 (#951)
    Remove unused async (#943)
    Use a synchronous mutex for bw/iop_tokens (#946)
    Make flush ID non-locking (#945)
    Use `oneshot` channels instead of `mpsc` for notification (#918)
    Use a strong type for upstairs negotiation (#941)
    Add a "dynamometer" option to crucible-downstairs (#931)
    Get new work and active count in one lock (#938)
    A bunch of misc test cleanup stuff (#937)
    Wait for a snapshot to finish on all downstairs (#920)
    dsc and clippy cleanup. (#935)
    No need to sort ackable_work (#934)
    Use a strong type for repair ID (#928)
    Keep new jobs sorted (#929)
    Remove state_count function on Downstairs (#927)
    Small cleanup to IOStateCount (#932)
    let cmon and IOStateCount use ClientId (#930)
    Fast return for zero length IOs (#926)
    Use a strong type for client ID (#925)
    A few Crucible Agent fixes (#922)
    Use a newtype for `JobId` (#919)
    Don't pass MutexGuard into functions (#917)
    Crutest updates, rename tests, new options (#911)

Propolis updates
    Update tungstenite crates to 0.20
    Use `strum` crate for enum-related utilities
    Wire up bits for CPUID customization
    PHD: improve artifact store (#529)
    Revert abort-on-panic in 'dev' cargo profile
leftwo added a commit that referenced this pull request Oct 5, 2023
Crucible updates
    all Crucible connections should set TCP_NODELAY (#983)
    Use a fixed size for tag and nonce (#957)
    Log crucible opts on start, order crutest options (#974)
    Lock the Downstairs less (#966)
    Cache dirty flag locally, reducing SQLite operations (#970)
    Make stats mutex synchronous (#961)
    Optimize requeue during flow control conditions (#962)
    Update Rust crate base64 to 0.21.4 (#950)
    Do less in control (#949)
    Fix --flush-per-blocks (#959)
    Fast dependency checking (#916)
    Update actions/checkout action to v4 (#960)
    Use `cargo hakari` for better workspace deps (#956)
    Update actions/checkout digest to 8ade135 (#939)
    Cache block size in Guest (#947)
    Update Rust crate ringbuffer to 0.15.0 (#954)
    Update Rust crate toml to 0.8 (#955)
    Update Rust crate reedline to 0.24.0 (#953)
    Update Rust crate libc to 0.2.148 (#952)
    Update Rust crate indicatif to 0.17.7 (#951)
    Remove unused async (#943)
    Use a synchronous mutex for bw/iop_tokens (#946)
    Make flush ID non-locking (#945)
    Use `oneshot` channels instead of `mpsc` for notification (#918)
    Use a strong type for upstairs negotiation (#941)
    Add a "dynamometer" option to crucible-downstairs (#931)
    Get new work and active count in one lock (#938)
    A bunch of misc test cleanup stuff (#937)
    Wait for a snapshot to finish on all downstairs (#920)
    dsc and clippy cleanup. (#935)
    No need to sort ackable_work (#934)
    Use a strong type for repair ID (#928)
    Keep new jobs sorted (#929)
    Remove state_count function on Downstairs (#927)
    Small cleanup to IOStateCount (#932)
    let cmon and IOStateCount use ClientId (#930)
    Fast return for zero length IOs (#926)
    Use a strong type for client ID (#925)
    A few Crucible Agent fixes (#922)
    Use a newtype for `JobId` (#919)
    Don't pass MutexGuard into functions (#917)
    Crutest updates, rename tests, new options (#911)

Propolis updates
    Update tungstenite crates to 0.20
    Use `strum` crate for enum-related utilities
    Wire up bits for CPUID customization
    PHD: improve artifact store (#529)
    Revert abort-on-panic in 'dev' cargo profile

---------

Co-authored-by: Alan Hanson <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants