Skip to content
This repository has been archived by the owner on Apr 25, 2023. It is now read-only.

Enable CI testing with multiple clusters using kind #428

Merged
merged 3 commits into from
Nov 20, 2018

Conversation

font
Copy link
Contributor

@font font commented Nov 17, 2018

This adds support for running CI with multiple clusters using the new tool kind(kubernetes in docker). You can also run the pre-commit.sh script locally as well. The number of clusters is 2 by default, but you can override this if you'd like to test with a different number.

There are a few kind workarounds that are implemented in this PR until these issues are resolved:
kubernetes-sigs/kind#110
kubernetes-sigs/kind#111
kubernetes-sigs/kind#112
kubernetes-sigs/kind#113

Also, adding a versioned release for kind is still WIP (PR is at kubernetes-sigs/kind#81). For now we have to rely on a git sha.

Lastly, this adds ~3 minutes to the CI job run time. There is perhaps some savings by not having to launch a minikube cluster anymore, albeit with some kind workarounds for the time being, but this adds a crucial extra cluster to each test.

Next will be to add docs for a dev workflow.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 17, 2018
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: font

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 17, 2018
@k8s-ci-robot k8s-ci-robot requested a review from a user November 17, 2018 04:03
.travis.yml Outdated
# Install must be set to prevent default `go get` to run.
# The dependencies have already been vendored by `dep` so
# we don't need to fetch them.
install:
-

script:
- bash -x ./scripts/pre-commit.sh
- bash -c "DEBUG=y CONFIGURE_INSECURE_REGISTRY=y ./scripts/pre-commit.sh"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(No action required) If the only point of DEBUG is to enable -x, can it be discontinued in favor of calling the script with -x?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, agreed. I had opted for this as I was having difficulty passing a variable and invoking with -x because when using -x the next argument must be an executable bash file. I've addressed that now.

@marun
Copy link
Contributor

marun commented Nov 19, 2018

Awesome sauce, multiple clusters at last!!

I figure it's worth restarting the job a few times to confirm that it's stable. The code LGTM, and in any case it looks likely that the added code will evolve as features and bugs are resolved in the kind repo.

@shashidharatd
Copy link
Contributor

This LGTM. Excited to merge this soon. Also look forward to stable release of kind. I did give a try by locally running the CI script pre-commit.sh and had an issue with image push to local registry

Successfully built ce06ba4d9eee
Successfully tagged 172.17.0.1:5000/federation-v2:e2e
The push refers to repository [172.17.0.1:5000/federation-v2]
Get http://172.17.0.1:5000/v2/: dial tcp 172.17.0.1:5000: getsockopt: connection refused

None the less, CI seems to be successful all the time. So will figure out the issue in my dev env

Thanks @font!

@font
Copy link
Contributor Author

font commented Nov 20, 2018

Successfully built ce06ba4d9eee
Successfully tagged 172.17.0.1:5000/federation-v2:e2e
The push refers to repository [172.17.0.1:5000/federation-v2]
Get http://172.17.0.1:5000/v2/: dial tcp 172.17.0.1:5000: getsockopt: connection refused

None the less, CI seems to be successful all the time. So will figure out the issue in my dev env

@shashidharatd I believe you're hitting that because the docker daemon on your host needs to be configured with the insecure registry. To do that, have you tried setting CONFIGURE_INSECURE_REGISTRY=y before launching the script locally? That's how the script is run in travis. Basically, assuming that's the IP address of your docker bridge interface docker0, the docker daemon will be configured with that insecure registry in /etc/docker/daemon.json. Then the script reloads dockerd by sending it the SIGHUP signal. This should only ever need to be done once and afterwards you no longer need to CONFIGURE_INSECURE_REGISTRY. I am adding some docs around this workflow that should also help.

@shashidharatd
Copy link
Contributor

@font, there could be some corner cases while running in dev environment, which we could eventually fix. Like couple of them i faced.

  1. docker daemon was down after reboot. the reason is we are specifying insecure registry through flags as well as file:
Nov 20 15:40:08 shashi-laptop dockerd[932]: unable to configure the Docker daemon with file /etc/docker/daemon.json: the following directives are specified both as a flag and in the configuration file: insecure-registries: (from flag: [172.17.0.1:5000], from file: [172.17.0.1:5000])
Nov 20 15:40:08 shashi-laptop systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Nov 20 15:40:08 shashi-laptop systemd[1]: docker.service: Failed with result 'exit-code'.
Nov 20 15:40:59 shashi-laptop dockerd[2719]: unable to configure the Docker daemon with file /etc/docker/daemon.json: the following directives are specified both as a flag and in the configuration file: insecure-registries: (from flag: [172.17.0.1:5000], from file: [172.17.0.1:5000])
  1. Failed to delete the 2nd kind cluster although deleting the first cluster was fine. Maybe there is a bug in kind.

@shashidharatd
Copy link
Contributor

CI environment is good and so lets merge this and deal with dev environment issues and documentation in subsequent pr's.
/lgtm
Thanks @font !

@k8s-ci-robot k8s-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Nov 20, 2018
@k8s-ci-robot k8s-ci-robot merged commit fbeb310 into kubernetes-retired:master Nov 20, 2018
@BenTheElder
Copy link

Very cool!

Apologies for the issues in kind, I've been reading back through @font's issues filed there and looking into options for resolving them.

I'm also aiming to eventually get to a place where we have proper releases and strictly require compatibility with future changes, but we're definitely not there yet. There will probably be a few more sweeping changes to enable multi-node support and improve logging in particular.

@font
Copy link
Contributor Author

font commented Nov 21, 2018

@BenTheElder Not a problem. Thanks for all the great work to create kind! Looking forward to the future updates.

We are currently versioning using a specific git sha, so hopefully that should allow us to keep using kind even as future compatibility changes are made.

@BenTheElder
Copy link

Very happy to hear that, especially pinning a specific sha, discussing getting a release out the door as well with @munnerz 👍

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants