Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Container backed hosts in simulator #3177

Merged

Conversation

hickeng
Copy link
Member

@hickeng hickeng commented Jul 13, 2023

Description

Adds support for container backed hosts in the simulator, in a similar fashion to the current container backed VMs.
Uses a similar pattern of RUN.container but in host.ConfigManager.AdvancedOption as ExtraConfig doesn't exist for hosts.
See simulator/container_host_system_test.go for specific tests and usage.

In adding this support, this PR also:

  • untangles container operations from the sim VM implementation
  • adds per-host OptionManager

This results in the following hierarchy of files relating to containers:

  1. container.go - basic container operations
  2. container_host_system.go and container_virtual_machine.go - host and vm specifics when mocking using containers.
  3. host_system.go and virtual_machine.go - sim objects but mostly without container specifics.

This is an initial PR with expected follow up:

  • create bridge networks for DVSes (PR only creates for pNICs)
  • ESX mock container image providing mocks for esxcli, configstorecli, etc
  • container backed VMs get connected to the appropriate bridge for the DVS/vSwitch structure they're associated with
  • mock for VMCI path to allow in-guest<->host logic to be tested
  • "hostd" in the sim host responds on host mgmt IP, backed by a single vcsim for all "hosts"

There's almost certainly substantial refinement needed around fully populating host NetConfig structures needed, and likely some revision to how the network connectivity is expressed, but I consider this PR a sufficient first-pass and expect those refinements to become clear through attempted use.

The goal of this work is to be able to easily test logic that runs directly on hosts, or depends hosts actually having a network presence. Examples include ESX daemons, testing of network disruption, etc.

Known issues:

Type of change

Please mark options that are relevant:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to
    not work as expected)
  • This change requires a documentation update
  • Build related change

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide
instructions so we can reproduce. If applicable, please also list any relevant
details for your test configuration.

  • Regression tests in simulator/feature_test.go
  • New tests in simulator/container_host_system_test.go
  • New tests in simulator/virtual_machine_test.go

Checklist:

  • My code follows the CONTRIBUTION
    guidelines of
    this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged

@hickeng
Copy link
Member Author

hickeng commented Jul 14, 2023

Current failure in govc tests is because the test assumes (correctly I think) that you should be able to change a VM to have RUN.container specified, then power it on and get a container backing.

However the current behaviour of this PR does not work for that path:

[root@ghicken-z01 test]# ./vendor/github.com/bats-core/bats-core/bin/bats vcsim.bats -f "run container"
vcsim.bats
 ✗ vcsim run container
   (from function `flunk' in file test_helper.bash, line 207,
    from function `assert_success' in file test_helper.bash, line 212,
    in test file vcsim.bats, line 258)
     `assert_success' failed
   Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg.
   Client:       Podman Engine
   Version:      4.4.1
   API Version:  4.4.1
   Go Version:   go1.18.10
   Built:        Fri Feb 17 02:31:22 2023
   OS/Arch:      linux/amd64
   Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg.
   []
   Error: inspecting object: no such object: "vcsim-DC0_H0_VM0-265104de-1472-547c-b873-6dc7883fb6cb"
   command failed with exit status 1: Powering on VirtualMachine:vm-55... govc: *types.MissingPowerOnConfiguration
   2023/07/14 07:22:22 vcsim-DC0_H0_VM0-265104de-1472-547c-b873-6dc7883fb6cb volumes: vcsim-DC0_H0_VM0-265104de-1472-547c-b873-6dc7883fb6cb--dmi
   2023/07/14 07:22:22 vcsim-DC0_H0_VM0-265104de-1472-547c-b873-6dc7883fb6cb [docker volume rm -f vcsim-DC0_H0_VM0-265104de-1472-547c-b873-6dc7883fb6cb--dmi ]: exit status 125, vcsim-DC0_H0_VM0-265104de-1472-547c-b873-6dc7883fb6cb--dmi

1 test, 1 failure

[root@ghicken-z01 test]#

I think the simplest path forwards is to create a go test unit test that follows this pattern as it's not straight forwards to use delve with the bats tests.

@dougm
Copy link
Member

dougm commented Jul 24, 2023

I think the simplest path forwards is to create a go test unit test that follows this pattern as it's not straight forwards to use delve with the bats tests.

There's a go example/test, creates the vm, but might be useful for this:

func Example_runContainer() {

We could also only support the container backing for new VMs, not sure adding a container backing to an existing vm is a common use-case?

@hickeng
Copy link
Member Author

hickeng commented Jul 27, 2023

We could also only support the container backing for new VMs, not sure adding a container backing to an existing vm is a common use-case?

I've no idea if it's common, but I'm loath to change behaviour unless it's actively problematic to support. I don't see why it would be if I've structured things well... just not working that way atm. It does mean we should consider the inverse - removing the container backing from a VM without deleting it which I suspect doesn't exist atm.

I'll likely create a container_virtual_machine_test.go and move the container backed VM tests into that for consistency.
At some point I'm going to want to mock the VMCI channel which will require enhancing that surface and don't want to drop those tests into feature-test.go

@hickeng hickeng force-pushed the topic/hickeng/container-backed-sim-hosts branch 3 times, most recently from 5b57fe7 to 6cdeccc Compare August 2, 2023 21:41
@hickeng
Copy link
Member Author

hickeng commented Aug 2, 2023

Tidied up commits, updated messages to meet contributor guidelines, etc.

dougm
dougm previously approved these changes Aug 3, 2023
Copy link
Member

@dougm dougm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @hickeng , this looks great. Just a couple of cosmetic suggestions for now, I might have more but can always follow up after merge

"github.com/vmware/govmomi/vim25/types"
)

const KiB = 1024
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are also in simulator/esx, could they be moved to the govmomi/units package or simulator/internal ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dougm I'm a little torn here - the units in units/size.go are not following the IEC naming pattern but are using KB instead of KiB.
My inclination is to move all references in this PR to reference the units package, then do a rename across the entire codebase as a dedicated commit - that seems to have wider implications. Not sure how constant names play into semver.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the govmomi/units package is used much outside of the repo, breaking compile time change would be ok in this case.

simulator/host_system.go Outdated Show resolved Hide resolved
simulator/container_virtual_machine_test.go Outdated Show resolved Hide resolved
hickeng added 8 commits August 7, 2023 16:20
Refactors the container logic out of the simulator VM file so it can be
used for both VM and host container backings.

The following file structure is now in place:
* container.go - wraps docker operation execs
* container_virtual_machine.go - orchestration of containers for VMs
* container_host_system.go - orchestration of containers for Hosts
* container_xxx_test.go - test for container backed VMs/Hosts

Add CGO_ENABLED=1 to test command with -race
This adds support for backing a host with a container in a similar
manner to how we back VMs with containers.

Hosts do not have the VM ExtraConfig mechanism, and "creation" of a
host is more "register the existance of" vs VMs that are actively
constructed from the provided spec. As such, this uses the advanced
options mechanism provided by the per-host Option Manager instead
of ExtraConfig, but following the same "RUN.container" key/value
approach for defining a container backing.

The created container for a host has the following volumes defined:
* bootbank (read-only)
* altbootbank (read-only)
* OS-DATA (read-write)
* datastore1 (read-write)

The volumes have suitably formed UUIDs, are mounted under
/vmfs/volumes, and have symlinked pretty names.

The volumes are associated with the host via labels, allowing the
use of filtered queries to retrieve volumes associated with a given
host.

All docker invocation is kept in container.go and out of the
container_xxx.go files. Not clear this is a fundamental benefit, but
should make it easier if we ever want to support remote docker hosts.
Creates an OptionManager instance per-host, with valued seeded from the
ESX template but not directly referencing it, ie. template changes will
not reflect into existing OptionManager instances.
OptionManager Query and Update methods work as expected.

Changes made via OptionManager are reflected into host.Config.Options
array, but it's a unidirectional reflection. This is done to match
infered behaviour of ESX.

There are two OptionManager instances for ESX (and I assume for VC),
For ESX they are found at:
* ServiceContent.setting
* ConfigManager.advancedOptions

The settings for ESX are empty, and the template had named the adv
opts as settings. This adds an empty Setting array in the templates to
clearly differentiate which set of BaseOptionValues is used to populate
which OptionManager instance.

Follow up required for:
* VC to determine what the contents of adv opts should be.
* whether HostConfigInfo.Options is adv opts, or combined set
Connects sim-hosts to bridges as specified in their config.
The bridges to use for a given pNIC are expressed in the advanced
options using the following pattern (example for pNIC 0):
	RUN.underlay.vmnic0=vcsim-mgmt-underlay

This uses an existing bridge or creates a new one as
needed.

If a host has a container backing, all pNICs defined in the host
template are discarded and new pNICs are created, 1 per underlay
name provided to simulator.HostSystem.configureContainerBacking.
This was the only sane way I found to indicate which bridges a
host should be connected to.

The IP assigned to the container is reflected into the various
host.Config structures associated with the vmknic, eg.
VirtualNicManagerInfo

The simulator.HostSystem.getNetConfigInterface method is prvoided to
allow a caller to retrieve all the various network entities associated
with a NetConfig binding, eg. "management", "vmotion".

Remove use of errors.Join to maintain support for older Go versions.

Known issues:
* podman volume ls filters act as OR instead of AND which results in
  all volumes being deleted any time a single host is removed. Issue
  opened and fixed in podman main.
If "RUN.container" is added or removed on an existing VM, that change
is applied immediately if the VM is currently powered on.

Modifications to the value of the key do not have an effect unless the
continue needs to be recreated for some reason.

Switches to Go templates for formating docker command output

Includes additional error logging detail
Makes use of the docker events stream to trigger inspect operations
against containers where waiting for things such as IPs.

Corrects prior failure to stop the async container watch when the
container was removed.

Updates to locking to avoid race warnings.

Updates vcsim.bats to look for a volume with `--dmi` suffix instead
of a volume with the plain container name.
Adds a stage to the github actions pipeline that provides an ssh server
that allows interactive login to the environment. This only triggers on
failure.

The reason for adding this is due to repeated failures to find
functional arguements for the specific docker version present. Quirks
around the format parameter values specifically.

This is done using the tmate action:
https://github.com/mxschmitt/action-tmate

Corrects boilerplate
I expect to squash this into an earlier commit once it passes tests
vcsim: support container backing for hosts
@hickeng hickeng force-pushed the topic/hickeng/container-backed-sim-hosts branch from 6cdeccc to f636e96 Compare August 7, 2023 16:21
@hickeng hickeng requested a review from dougm August 7, 2023 18:07
Copy link
Member

@dougm dougm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ship it!

@hickeng hickeng merged commit e7aac6a into vmware:main Aug 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants