Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kubelet: enable crio runtime #235

Closed

Conversation

rphillips
Copy link
Contributor

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 11, 2018
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: rphillips
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: wking

If they are not already assigned, you can assign the PR to them by writing /assign @wking in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Sep 11, 2018
@@ -11,6 +11,9 @@ ExecStart=/usr/bin/hyperkube \
--bootstrap-kubeconfig=/etc/kubernetes/kubeconfig \
--kubeconfig=/var/lib/kubelet/kubeconfig \
--rotate-certificates \
--container-runtime=remote \
--container-runtime-endpoint=unix:///var/run/crio/crio.sock \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would also need to align with a change in the pod-checkpointer daemonset: https://github.com/kubernetes-incubator/bootkube/blob/master/cmd/checkpoint/main.go#L21

And if tests pass without this change - I'd be worried (and we should make sure checkpoint tests are actually running.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I added the command line arg here.

@rphillips rphillips force-pushed the fixes/change_to_crio_runtime2 branch from cd0dc2d to 91cc901 Compare September 12, 2018 15:15
@wking
Copy link
Member

wking commented Sep 12, 2018

Also related to openshift/machine-config-operator#52.

@abhinavdahiya
Copy link
Contributor

abhinavdahiya commented Sep 13, 2018

Trying to test CRI-O:

We use following flags on kubelet right now:

 --cni-conf-dir=/etc/kubernetes/cni/net.d \
 --cni-bin-dir=/var/lib/cni/bin \

And our networking setup using network operator

  • puts the 10-flannel.conflist to /etc/kubernetes/cni/net.d
  • puts cni binaries to /var/lib/cni/bin

But when i switch on cri-o with kubelet

# The "crio.network" table contains settings pertaining to the
# management of CNI plugins.
[crio.network]

# network_dir is is where CNI network configuration
# files are stored.
network_dir = "/etc/cni/net.d/"

# plugin_dir is is where CNI plugin binaries are stored.
plugin_dir = "/usr/libexec/cni"

There seems to be a mismatch and pods come up with wrong networking...

4: cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 42:fa:29:92:56:47 brd ff:ff:ff:ff:ff:ff
    inet 10.88.0.1/16 scope global cni0
       valid_lft forever preferred_lft forever
    inet6 fe80::40fa:29ff:fe92:5647/64 scope link
       valid_lft forever preferred_lft forever
18: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
    link/ether f2:d6:e5:f2:11:06 brd ff:ff:ff:ff:ff:ff
    inet 10.2.1.0/32 scope global flannel.1
       valid_lft forever preferred_lft forever
    inet6 fe80::f0d6:e5ff:fef2:1106/64 scope link
       valid_lft forever preferred_lft forever

@eparis @ashcrow @aaronlevy

@aaronlevy
Copy link
Contributor

aaronlevy commented Sep 13, 2018

There are some side discussions ongoing - but my general position regarding CNI plugins/configuration:

  • If the OS needs to ship with a default set of read-only CNI plugin binaries / configuration, that's fine - but this should be assumed to be for non-openshift uses (e.g. running podman by hand).

  • All OpenShift CNI binaries / CNI configuration should be distributed / controlled by the network-operator / be coming from inside the cluster.

So hypothetically the OS ships with runtime configured to use:
plugin_dir = /usr/libexec/bin
network_dir = /etc/cni/net.d

However, for OpenShift clusters, we provide kubelet configuration that sets those locations to different locations (just as example):
plugin_dir = /etc/kubernetes/cni/bin
network_dir = /etc/kubernetes/cni/net.d

And both of those dirs are empty until a network daemonset has been deployed (which sets up plugins/configuration).

@abhinavdahiya
Copy link
Contributor

In an out-of-band discussion with @sjenning @mrunalp , the move forward:

  1. CRI-O on RHCOS defaults to using /etc/cni/net.d/ and /opt/cni/bin
  2. Network Operator requires that /etc/cni/net.d/ and /opt/cni/bin be empty;
  3. Network Operator drop its configuration to /etc/cni/net.d/ and /opt/cni/bin
    RHCOS makes sure number 1 and 2 are met

@sjenning
Copy link
Contributor

sjenning commented Sep 13, 2018

in parallel @mrunalp team is getting RHCOS compose to use a faster moving repo for cri-o and podman packages that will have crio.conf values that match the kubelet defaults (/etc/cni/net.d/ and /opt/cni/bin)

@@ -11,6 +11,9 @@ ExecStart=/usr/bin/hyperkube \
--bootstrap-kubeconfig=/etc/kubernetes/kubeconfig \
--kubeconfig=/var/lib/kubelet/kubeconfig \
--rotate-certificates \
--container-runtime=remote \
--container-runtime-endpoint=unix:///var/run/crio/crio.sock \
--runtime-request-timeout=10m \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Motivation for this setting is here.

@ashcrow
Copy link
Member

ashcrow commented Sep 14, 2018

@sjenning sjenning changed the title WIP: kubelet: enable crio runtime kubelet: enable crio runtime Sep 14, 2018
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 14, 2018
@ashcrow
Copy link
Member

ashcrow commented Sep 14, 2018

Changes look good.

@crawford
Copy link
Contributor

/retest

@abhinavdahiya
Copy link
Contributor

/hold

Networking needs work before merging

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 14, 2018
@openshift-ci-robot
Copy link
Contributor

@rphillips: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
ci/prow/e2e-aws-smoke 91cc901 link /test e2e-aws-smoke

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@sjenning
Copy link
Contributor

change happening in #251 now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants