Machine api mvp #119
Conversation
/retest
Still not completely familiar with the tf/ign parts of the code, but I left some comments and questions.
steps/assets/base/tectonic.tf
Outdated
@@ -50,6 +50,7 @@ module "bootkube" {
  service_account_private_key_pem = "${local.service_account_private_key_pem}"

  etcd_endpoints    = "${data.template_file.etcd_hostname_list.*.rendered}"
  worker_ign_config = "${file("worker.ign")}"
Who/what creates the worker.ign file? What is inside the file?
I see, most likely generateIgnConfigStep is responsible for that, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
correct :)
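
For readers unfamiliar with the file: a worker Ignition config is a JSON document following the Ignition v2 spec. A minimal sketch of what a generated worker.ign might contain is below, shown as YAML for readability; the specific units and files are assumptions for illustration, not the installer's actual output.

# Hypothetical shape of a generated worker.ign (Ignition v2 schema);
# the real file is JSON and its exact contents come from generateIgnConfigStep.
ignition:
  version: "2.2.0"
systemd:
  units:
    - name: kubelet.service            # assumed unit
      enabled: true
storage:
  files:
    - filesystem: root
      path: /etc/kubernetes/kubeconfig # assumed path
      mode: 0600
      contents:
        source: "data:,..."            # contents elided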
- name: machine-api-operator
  image: quay.io/alberto_lamela/machine-api-operator:a661677 # TODO: move this to openshift org
  command:
  - "/machine-api-operator"
Is the rule to put all the binaries under /? I noticed this pattern is used in multiple places (even in the cluster-api upstream repo). IMHO, whenever possible I prefer /usr/bin/ or /bin. It's more intuitive.
The convention is that all Tectonic operator binaries are built under the image's root folder.
modules/bootkube/manifests.tf
Outdated
@@ -17,6 +17,9 @@ variable "manifest_names" {
  "tectonic-node-controller-operator.yaml",
  "tnc-tls-secret.yaml",
  "ign-config.yaml",
  "app-version-mao.yaml",
mao is not so far from Mao. What about renaming mao -> machine-api-operator? It's not so long even with the additional chars.
/retest

2 similar comments

/retest

/retest
Getting:
@yifan-gu @smarterclayton would you be able to help get this green? Is there any way we can look at the kube logs to troubleshoot?
You'd probably have to SSH in to the pod or host while it's running. We
need to capture cluster logs in failure cases but we're trying to use the
cluster APIs to get all those.
However, in the case you're looking at it looks like the control plane came
up but then some of the control plane components failed. Let me try to get
that scenario handled (we have kubeconfig but the cluster isn't healthy).
…On Tue, Aug 14, 2018 at 9:34 AM OpenShift CI Robot wrote:
> @enxebre: The following test *failed*, say /retest to rerun them all:
>
> Test: ci/prow/e2e-aws, Commit: d4738f2, Rerun command: /test e2e-aws
> Log: https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/openshift_installer/119/pull-ci-origin-installer-e2e-aws/447/
> Full PR test history: https://openshift-gce-devel.appspot.com/pr/openshift_installer/119
I added openshift/release#1179 which should capture these logs. Will kick your job as soon as I've merged and updated.
/retest
@smarterclayton thanks for the help! We need to rerun the tests to include @trawler's fix openshift/machine-api-operator@d84ba81. This PR runs only the workers as a machineSet. The machine-api-operator and the cluster-api stack run on masters, so when workers do not come up, it can be debugged by getting the logs for the relevant pods.
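
For context, the worker machineSet this comment refers to would look roughly like the sketch below, using the 2018-era cluster-api v1alpha1 types; the name, replica count, and provider fields are illustrative assumptions, not the exact manifest from this PR.

apiVersion: cluster.k8s.io/v1alpha1
kind: MachineSet
metadata:
  name: worker                # hypothetical name
  namespace: default
spec:
  replicas: 3                 # assumed count
  selector:
    matchLabels:
      machineset: worker
  template:
    metadata:
      labels:
        machineset: worker
    spec:
      providerConfig:
        value:
          apiVersion: aws.cluster.k8s.io/v1alpha1   # assumed provider group
          kind: AWSMachineProviderConfig
          instanceType: m4.large                    # assumed
          # the worker ignition config ends up as instance user data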
The teardown scripts use the Kubernetes API to find and retrieve logs - we grab all pods, all node logs, all events, and a number of other things without looking at AWS info.
/test e2e-aws

Failed due to something about reuse.
@@ -8,6 +8,7 @@ data:
  clusterName: ${cluster_name}
  clusterDomain: ${cluster_domain}
  region: ${region}
  image: ${image}
@enxebre Is there a path to upgrade the image when the cluster updates?
Looking better now. The Jenkins test actually passed; however, it fails to destroy as workers are not under tf control now.
/test build-tarball
All tests are passing now. The Jenkins ones show red because Terraform is failing to destroy, as it does not know how to handle the machines created by the machineSet. We could add kubectl here https://github.com/openshift/installer/blob/master/images/tectonic-smoke-test-env/Dockerfile and …
I wonder why …
Jenkins fails with account limit error:
retest this please

1 similar comment

retest this please
@enxebre I found that this PR (as it is now) breaks the behavior of the …
/test unit

/test e2e-aws
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: crawford, enxebre. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
/hold cancel
@@ -160,3 +160,9 @@ variable "pull_secret" {
  type        = "string"
  description = "Your pull secret. Obtain this from your Tectonic Account: https://account.coreos.com."
}

variable "worker_ign_config" {
  description = "Worker ignition config"
Is this temporary? I'd have expected the machine API operator to manage worker ignition configs on its own.
Not temporary to my knowledge. I'm not sure why the machine-api-operator would manage the contents of the ignition configs. The configs are just an input to be passed along to the MachineSets. If anything, I would expect the machine-config-operator (this is getting confusing...) to manage the ignition configs and provide them in some way to the machine-api-operator.
> If anything, I would expect the machine-config-operator (this is getting confusing...) to manage the ignition configs and provide them in some way to the machine-api-operator.

That makes sense to me. @abhinavdahiya?
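
To make the proposed handoff concrete, one possible shape, with entirely hypothetical names, is for whichever component owns the worker ignition config to publish it as a ConfigMap that the machine-api-operator reads and injects into new machines as user data:

# Hypothetical handoff object; neither the name, the namespace,
# nor the mechanism is prescribed by this PR.
apiVersion: v1
kind: ConfigMap
metadata:
  name: worker-ignition
  namespace: openshift-cluster-api
data:
  worker.ign: |
    {"ignition": {"version": "2.2.0"}}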
…cluster-api

Generated with:

  $ glide update --strip-vendor
  $ glide-vc --use-lock-file --no-tests --only-code
  $ bazel run //:gazelle

using:

  $ glide --version
  (cd $GOPATH/src/github.com/Masterminds/glide && git describe)
  v0.13.1-7-g3e13fd1
  $ (cd $GOPATH/src/github.com/sgotti/glide-vc && git describe)
  v0.1.0-2-g6ddf6ee
  $ bazel version
  Build label: 0.16.1- (@Non-Git)
  Build target: bazel-out/k8-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
  Build time: Mon Aug 13 16:42:29 2018 (1534178549)
  Build timestamp: 1534178549
  Build timestamp as int: 1534178549

The tectonic-node-controller removal catches us up with 596591b (.*: replace tectonic node controller with machine config operator, 2018-09-10, openshift#232). The cluster-api trim adjusts the content from b00e40e (vendor: Add client from sigs.k8s.io/cluster-api, 2018-09-04, openshift#119). Because cluster-api wasn't in glide.lock, I suspect neither glide nor glide-vc were run before that commit.
The last consumers for these were removed by 124ac35 (*: Use machine-api-operator to deploy worker nodes, 2018-09-04, openshift#119).
And the related, libvirt-specific, tectonic_libvirt_worker_ips. This simplifies the Terraform logic for AWS and OpenStack, and consistently pushes most worker setup into the cluster-API providers who are creating the workers since 124ac35 (*: Use machine-api-operator to deploy worker nodes, 2018-09-04, openshift#119).
This PR provides a minimum functional integration for the installer with the machine API and machineSets. It removes the terraform step for creating worker machines on AWS, so after bootstrapping a cluster with openshift install, the worker machines are managed by a machineSet object and we can have an early e2e test-driven workflow.

Follow ups: