Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for Windows nodes #732

Closed
johnSchnake opened this issue May 31, 2019 · 25 comments · Fixed by #1255
Closed

Support for Windows nodes #732

johnSchnake opened this issue May 31, 2019 · 25 comments · Fixed by #1255
Assignees
Labels
kind/enhancement New or improved functionality lifecycle/active Actively being worked on p2-moderate
Milestone

Comments

@johnSchnake
Copy link
Contributor

Describe the solution you'd like
We need to make sure that Sonobuoy is built for and works on Windows clusters.

Anything else you would like to add:
There may be some build changes to make but other than that I think this will involve mainly running/rerunning on Windows machines and finding each situation where we do something like use path instead of filepath because we assumed the node would be on linux.

@jvoeller
Copy link

Are there any updates on windows node/container support?

@johnSchnake
Copy link
Contributor Author

No updates at this time. Going to start investigation this milestone though.

@johnSchnake johnSchnake added this to the v0.16.3 milestone Oct 15, 2019
@johnSchnake
Copy link
Contributor Author

Finally got some Azure credentials and will start experimenting with setting up a cluster with Windows nodes. It seems that you cant do that through the UI but only via the CLI after opting-in to experimental features.

@johnSchnake
Copy link
Contributor Author

In the next milestone hope to set it up and try at least a single run to see the first set of errors.

@johnSchnake johnSchnake modified the milestones: v0.16.3, v0.16.4 Nov 4, 2019
@johnSchnake johnSchnake self-assigned this Nov 4, 2019
@johnSchnake johnSchnake modified the milestones: v0.16.4, v0.16.5 Nov 15, 2019
@johnSchnake
Copy link
Contributor Author

Followed instructions here to set up a cluster with some windows nodes: https://aws.amazon.com/blogs/aws/amazon-eks-windows-container-support-now-generally-available/

@johnSchnake
Copy link
Contributor Author

When trying to just naively run sonobuoy the first problem I hit is that if the pod gets scheduled on the windows node then it just gets stuck in a ContainerCreating state.

From what I've been told, typically users will taint their Windows nodes to avoid accidental scheduling of non-Windows workloads there. I will do that next and continue to report on issues.

@murmal
Copy link

murmal commented Nov 15, 2019

an idea would be to add a nodeSelector field to your addon template. Then you can address your linux and windows nodes specifically.

@johnSchnake
Copy link
Contributor Author

Yea it feels like something like that may be required? I can taint things and force Sonobuoy to start on the Linux node, but if that is just the control plane that isn't really ideal. I think the goal of this ticket is to provide a sonobuoy image which would work appropriately on a Windows worker node.

As is, even if run on a linux node in a cluster with some Windows nodes, the first main issue is that the systemd-logs plugin doesn't support Windows nodes but does automatically get run on every node. This leads to them getting stuck.

In addition, when I tried to run e2e it failed due to errImgPull since it looked up the image with the metadata gcr.io/google-containers/conformance:v1.14.8-eks-b7174d. This is really due to a recent change where we let metadata in the k8s version be used by default so things like -beta.0 would be used. Seems like either way someone is going to get a short-end of the stick and have to specify the version manually. If that is the case I'd rather revert part of that change and go back to just using v1.14.8 and if you want to use alpha/beta versions then you have to manually specify them with flags.

@johnSchnake
Copy link
Contributor Author

johnSchnake commented Nov 18, 2019

This will invariably become an umbrella ticket tracking the misc Windows requirements. For now I can see the following:

  • Need Dockerfile updated to be Windows compatible. Can start from either an OS agnostic image (would be nice and consistent with upstream) or just a separate image for Windows.
  • Need Makefile to create Windows (build and push the image)
  • Need ./scripts/run_single_node_worker.sh to be Windows compatible (e.g. have a powershell equiv probably). I dont think we need the run_master.sh at all; it is the default script for the image currently but we override it by default.

@johnSchnake
Copy link
Contributor Author

Goal for the next sprint is just to make a build flow that will create the image necessary; it may not work fully (or at all) but from there we can more quickly iterate on each smaller point.

@johnSchnake johnSchnake changed the title Support for Windows node support Support for Windows nodes Nov 25, 2019
@johnSchnake
Copy link
Contributor Author

Relaying information from a thread on sig-windows when I reached out:

I do know how the current e2e tests work against Windows. There's no need to build a different e2e.test binary, the only difference is a couple of the flags you pass to it...specifically KUBE_TEST_REPO_LIST and --node-os-distro=windows

I think it should be possible at least as a first-pass effort to run Sonobuoy from a Linux node but have it use the Windows repo list/node-os flag. The troublesome caveat is that for the e2e tests to work, you'll need to taint the Linux nodes (to prevent Windows test pods from ending up on them) and remove any Windows taints you might have on your Windows nodes. This is the opposite of the normal taint arrangement that is recommended, but AFAIK is the standard for conformance testing Windows nodes right now.

So you'd need to have the Sonobuoy pod have a toleration/nodeSelector combination to ensure it only ends up on the Linux node

@johnSchnake
Copy link
Contributor Author

I'll have to confirm that "the current e2e tests work against Windows" because it is my understanding that the e2e image isn't built for Windows and you have to specify a different image from the default.

I'll have to investigate the --node-os-distro flag but the KUBE_TEST_REPO_LIST is what Sonobuoy uses in order to support airgapped testing. Windows test runs point to different repos I suppose for the images. I think this is partially for performance since the images can be large and are often preloaded in/close to the cluster.

Will still have to build Sonobuo to run on Windows nodes since the sidecar will be on the windows nodes. If we have to do that then it doesn't seem too meaningful/necessary to ensure the sonobuoy aggregator runs on the Linux node; it may as well be ready to run on a Windows node too.

@johnSchnake johnSchnake modified the milestones: v0.16.5, 0.17.1 Dec 10, 2019
@johnSchnake johnSchnake added the kind/enhancement New or improved functionality label Dec 11, 2019
johnSchnake added a commit that referenced this issue Jan 17, 2020
There may be more of these situations lurking around, but as we
try to support Windows nodes we need to make sure that we are using
filepath unless we know we need forward slashes (like in URLs).

This changes a few calls to path.XYZ to filepath.XYZ and updates some
tests in the same way.

xref #732

Signed-off-by: John Schnake <[email protected]>
johnSchnake added a commit that referenced this issue Jan 21, 2020
If we are to support Windows nodes, we want to avoid using bash scripts
so that we don't have to duplicate the logic and translate it into a
powershell script.

Changes include:
 - remove the run_master.sh script and instead rely on the Dockerfile
to have the proper aggregator invocation. This way, regardless of the
underlying OS, the image knows the right command to use.
 - remove the script for running the worker in single-node mode. This
only served to run Sonobuoy and then sleep for some amount of time to
avoid restarting the container. Instead, a flag was added to the
single-node command and golang handles the sleep functionality now. By
default it sleeps 0 seconds, consistent with the existing logic.
 - Slight modifications to command structure so that the subcommands
can use the cobra RunE method and logging can be done at the top level
only. This helps avoid using `os.Exit` which hinders testability.

xref #732

Signed-off-by: John Schnake <[email protected]>
@johnSchnake
Copy link
Contributor Author

This is going quite well; I've actually gotten Sonobuoy built and run on Windows nodes.

There are some pathing issues w.r.t. untar-ing the tarball created on the server (if not on a windows machine then you end up with files named plugins\foo\results\xyz... where all the path seperators were just considered part of the name. Not sure the best way to keep things sane, I think that maybe the onus just needs to be on the user to untar it appropriately for themselves (there are lots of options for tar to rename files on the fly and such). I dont know if there is some default that would be good for all combination of user machine (windows/darwin/linux) and nodes (linux/windows).

A few more PRs will be coming to address:

  • building/publishing the windows image (will have to be manual; requires a windows box to build on)
  • POC windows plugin
  • fixing up the retrieve command

@zubron zubron modified the milestones: 0.17.1, 0.18.0 Mar 26, 2020
@jayunit100
Copy link
Contributor

jayunit100 commented Oct 28, 2020

Sounds like from conv w @vladimirvivien were going to get back into this now.
as of now,

  • 1.18.5
  • 1.19.0

both give the same error

sonobuoy/sonobuoy:v0.18.5. .... no matching manifest for windows/amd64 10.0.17763 in the manifest list entries

@jayunit100
Copy link
Contributor

also looks like we'll need node-os-distro support (plumb it into the plugin somehow ? )

@perithompson
Copy link

I've tried to use the makefile to build windows and it doesn't seem to work, the base build image needs to be changed to an image that supports multi-arch and cut doesn't work on windows. I will keep trying to get this to work in my environment but thought it was worth opening the discussion @jayunit100. Should I raise a separate issue to track this?

@vladimirvivien
Copy link
Contributor

@perithompson I think so. Please open a new issue outline specifically that we need multi-arch image for windows along with any juicy detail that you find out in your research. Thanks!

@perithompson
Copy link

@vladimirvivien of course! I will get that raised for you! I will try and include everything that I have found!

@perithompson
Copy link

@vladimirvivien I have been looking around, it seems that you can build windows images using docker buildx build and create one image for all architectures. There is a good example here in the pause image

There are some considerations though, essentially what needs to be done is build the binary using Linux and GOOS and then copy onto the windows images using buildx and qemu

@vladimirvivien
Copy link
Contributor

@perithompson thanks for bringing this up again. We will reboot efforts for the Sonobuoy binary to built for Windows. We are also (colleagues) pushing for the e2e pods to be schedulable on win machines.

@perithompson
Copy link

Thanks @vladimirvivien! let me know if I can help in any way!

@perithompson
Copy link

it seems that we need to add --node-os-disto=windows when running this on windows but we don't have a way to set that argument currently even when we deploy windows

@johnSchnake
Copy link
Contributor Author

I think some flag juggling had been done in the past because folks weren't aware of this flag

--aggregator-node-selector nodeSelectors                   Node selectors to add to the aggregator. Values can be given multiple times and are in the form key:value (default map[])

Then its a matter of just ensuring the e2e tests behave properly on a mixed node cluster. I had only vaguely been looking into how the upstream windows folks were doing this when I stepped away from the project. I'm sure most of that has changed now.

Based on what you showed me about buildx/qemu I'm excited to get full windows support. It doesn't seem like many challenges remain.

@vladimirvivien
Copy link
Contributor

@johnSchnake thanks for looking into this.
I vaguely remember that the need was for the conformance images to be scheduled on a Windows node, not the aggregator. But I could be wrong. Anyway, thanks @jayunit100 and @perithompson for unblocking themselves on this before we had an official solution!

@johnSchnake
Copy link
Contributor Author

A build in CI for windows should be landing early next week; will expand testing after that.

@johnSchnake johnSchnake added the lifecycle/active Actively being worked on label May 9, 2021
johnSchnake added a commit that referenced this issue May 12, 2021
Adds windows builds for a number of OS versions to our CI pipeline.

Utilizes buildx and skopeo to build the windows images on the linux
runner and move them to dockerhub, then uses docker manifest to
create the manifest as usual.

In a quick reversal of a previous commit, we actually HAVE to put
all the images as "sonobuoy" and have the different arch/os use
different tags instead of different image names. This has to do with
how the manifest command looks up images and prevents issues wsuch as
"manifest not found" even though it does (as a differently named image
from the others).

Fixes #732

Signed-off-by: John Schnake <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/enhancement New or improved functionality lifecycle/active Actively being worked on p2-moderate
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants