Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rebase to upstream 0.44.0 #189

Merged
merged 56 commits into from
Apr 12, 2021

Conversation

cybertron
Copy link
Member

@cybertron cybertron commented Apr 9, 2021

CoreDNS v1.8.1 rebase process

Inspired by the commit description for openshift/coredns#52

Identify carry commits:

git log --oneline --no-merges v0.44.0..openshift/master

Note that the UPSTREAM: <carry>: openshift: prefix was added to any new commits to carry since the previous rebase. This will make new carry commits easy to identify alongside prior carry commits.

After identifying the carry commits, the next step is to create the new commit-tree that will be used for the rebase and then cherry pick the carry commits into the new branch.

The following commands cover these steps:

Process

$ git remote update # make sure we update our refs
$ git checkout v0.44.0 # upstream kubernetes-nmstate 0.44.0 commit tag
$ git checkout -b merge-tmp # create a branch to do our merge work from
$ git checkout openshift/master # we want to be at the tip of the openshift master branch when we run the next command
$ echo 'merge nmstate/kubernetes-nmstate v0.44.0' | git commit-tree merge-tmp^{tree} -p HEAD -p merge-tmp -F -
deadbeef12345678 # id of new merge commit, output by the previous command
$ git branch merge-0.44.0 deadbeef12345678 # create a new branch for the cherry-pick work
$ git checkout merge-0.44.0 # make sure we are on the proper branch
$ git cherry-pick <carry commits>

My 0.44.0 merge commit from the above process is (9e19439).

With the merge-0.44.0 branch in place, I cherry picked the carry commits. Merge conflicts were resolved on an individual commit basis when necessary. Some carry commits were squashed to reduce the number of cherry-picks in future rebases. If there are any questions there please let me know.

Carried Commits (applied bottom to top / oldest first):

29502d99 UPSTREAM: <carry>: Add manifests for 4.8
8dfe9853 UPSTREAM: <carry>: Update Dockerfile images to match ART
c016e6fe UPSTREAM: <carry>: Add manifests for 4.7
09d2d1d2 UPSTREAM: <carry>: Add midstream approvers and reviewers

Pre-Squash Commits (all of these were squashed into one of the above commits):

c00bb43c Add manifests for 4.8
44b5b2ad Fix the example NMState instance in the manifest
bacad25b UPSTREAM: <carry>: Updating openshift-kubernetes-nmstate-handler builder & base images to be consistent with ART Reconciling with https://github.com/openshift/ocp-build-data/tree/b0ab44b419faae6b18e639e780a1fa50a1df8521/images/openshift-kubernetes-nmstate-handler.yml
cdf30ce4 UPSTREAM: <carry>: Updating openshift-kubernetes-nmstate-operator builder & base images to be consistent with ART Reconciling with https://github.com/openshift/ocp-build-data/tree/b0ab44b419faae6b18e639e780a1fa50a1df8521/images/openshift-kubernetes-nmstate-operator.yml
63becd0d Change example name to nmstate, and add nodeSelector
12614732 UPSTREAM: <carry>: Updating openshift-kubernetes-nmstate-handler builder & base images to be consistent with ART Reconciling with https://github.com/openshift/ocp-build-data/tree/d29c8eea8eaa4e66c0af4a855f5a081d1312aeff/images/openshift-kubernetes-nmstate-handler.yml
332a696f UPSTREAM: <carry>: Updating openshift-kubernetes-nmstate-operator builder & base images to be consistent with ART Reconciling with https://github.com/openshift/ocp-build-data/tree/d29c8eea8eaa4e66c0af4a855f5a081d1312aeff/images/openshift-kubernetes-nmstate-operator.yml
5fe061c6 [CARRY] Move the nmstate-handler sa to the correct section
94fa6344 [CARRY] Add cluster rules for nmstate-handler sa
849bd7d8 [CARRY] Remove RELATED_IMAGE_ from the handler image env var
4abf2dae [CARRY] Add securitycontextconstraints to CSV
0ba57572 [CARRY] Correct the env var used for the handler image
182e4cad Remove duplicated 'certified' key in CSV
5f198e96 [CARRY] Make operator bundle for OLM
ae739e3c [CARRY] Add midstream approvers and reviewers

qinqon and others added 30 commits November 25, 2020 14:42
The prow publish job [1] is on commiting since there is no git user email
configured, this change do similar to kubevirtci [2] and adds the
kubevirtbot name and email.

[1] https://storage.googleapis.com/kubevirt-prow/logs/publish-kubernetes-nmstate/1331559224613801984/build-log.txt
[2] https://github.com/kubevirt/kubevirtci/blob/master/publish.sh#L29-L30

Signed-off-by: Quique Llorente <[email protected]>
Operator-sdk and opm are necessary for dealing with bundle files, and
testing those bundle files. Let's vendor them so we can pin the
versions used.

Signed-off-by: Brad P. Crochet <[email protected]>
* Roll out policy configuration sequentially

Policies are by default applied sequentially, one node at a time.
Introduced new API field for NNCP: Parallel, which can be set to true
to roll out configuration in parallel.

Signed-off-by: Radim Hrazdil <[email protected]>

* Address review commits round1

Signed-off-by: Radim Hrazdil <[email protected]>

* Set parallel od test-policy nncp based on env variable

Signed-off-by: Radim Hrazdil <[email protected]>

* Test enactments are progressing concurrently when nncp parallel is true

Signed-off-by: Radim Hrazdil <[email protected]>

* Add skip flag for parallel test when parallel is not set to true

Signed-off-by: Radim Hrazdil <[email protected]>
Currectly kubernetes-nmstate prints the network state every time it does
not find the default gw when the probe is running, clearly this is too
verbose. To fix that this change in case of "-v=production" just print
the error and the gjson path used to find the default gw, in case of
"-v=debug" it also prints the network state.

Signed-off-by: Quique Llorente <[email protected]>
* ci, Change ci Makefile commands use KUBECONFIG and OPERATOR_NAMESPACE

Currently test-e2e-handler and test-e2e-operator use explicit kubeconfig of the
knmstate repo. In order to run tier1 tests, we want to be able to use KUBECONFIG
set by external cluster.
We also add OPERATOR_NAMESPACE to ecplicitly mention that it is used in the tests.
Added OPERATOR_NAMESPACE var to makefile targets commands
Added KUBECONFIG var instead of explicit kubeconfig in ci makefile targets

Signed-off-by: Ram Lavi <[email protected]>

* ci, Remove usage of namespace in cluster scoped objects

Currently we are using the namespace to find NodeNetworkState object
This is wrong, since this object is cluster scoped.
Removed the usage of the namespac when searching of NodeNetworkState objects

Signed-off-by: Ram Lavi <[email protected]>
The kubernetes e2e test reporter was activated for operator tests but
not for handler.

Signed-off-by: Quique Llorente <[email protected]>
Current kubernetes-nmstate version does not Reconcile already populated NNCPs
if Nodes are added to the cluster or node labels are modified. This PR force
Reconcile of all policies if a node is create or updated.

Since Reconcile functions already take care to checking node labels we choose
to simplify the implementation here just checking for Node Update not checking
if the update is for labels or if the labels are for this node, nmstate is declarative
so if if same version is re-apply it should not be a noop.

Signed-off-by: Quique Llorente <[email protected]>
* tests: Show NNCE if NNCP is not matching

If a NNCP is not matching in the e2e tests the Reason and the Message is
missing from the output and also the NNCEs are not presented. This
change uses gomega gstruct matcher so we can have fuzzy match on Reason
and Message and still it will show it them and also add the NNCEs in
case of not matching.

Signed-off-by: Quique Llorente <[email protected]>

* tests: Initialize correctly testenv

The OperatorNamespace variable was not initialized before kubernetes
reporter is created so it couldn't find pods to get the logs from.

Signed-off-by: Quique Llorente <[email protected]>
When rolling out nncp sequentially, if a node fails to apply configuration
other nodes do not attempt to apply it and set their enactment condition to
Aborted.

Signed-off-by: Radim Hrazdil <[email protected]>
* Bump nmstate to 0.3

Looks like nmstate 0.2 is going to approach EOL so we bump to next minor
it also depends on NM 1.26. We are not going to bump kubecirtci node NM
since we have to check that this is retrocompatible.

Signed-off-by: Quique Llorente <[email protected]>

* Don't dump k8s pod logs at test output

After fix the k8s reporter a part from dumping pod logs to a file it
also dump it to GinkgoWriter, the GinkgoWriter show the content written
with it when the test fail. Dumping this to logs makes the output too
long and we already have it at the file, this change remove that and
make the reporter just dump to a file.

Signed-off-by: Quique Llorente <[email protected]>

* Print generated bridge1 and bond1 names

To reduce the collision between test in case tear down is broken we
generate a different name for bridge1 and bond1 variable per test, for
debugging purposes is good to know what this names are per test so you
can see at what test they were generated.

Signed-off-by: Quique Llorente <[email protected]>

* Install tools only at go.mod changes

After some changes at master the go install command to install the tools
is run everytime a make target that need the tools is executed, this
change leave as it was, creating a dependency between the binary and
go.mod.

Signed-off-by: Quique Llorente <[email protected]>

* Adapt thest to nmstate-0.3 checks

After bumping to nmstate-0.3 a lot o test were failing with good reason,
they have bugs at tear-down code and some bridge/bonds where not
propertly deleted, this make nmstate-0.3 fail when another bridge/bond
is trying to use as slave a nic that already belongs to another
bridge/bond. To fix all this now the tests are very strict too and the
network configuration state check is done at all nodes not only at
workers since some of our tests are run at master nodes too (examples,
node label).

Also there is a bug a trace that prints a big and ugly error log related of using
WithValue instead of with name WithName at allPolicies function.

Signed-off-by: Quique Llorente <[email protected]>

* Add sleep after changing node labels at tests

The NNCPs are reconciled if node labels are changed, but there is no
webhook that blocks the operation until the NNCP status is reset, that's
make some times fall into a wrong assumptions around status. This change
add some sleep time after changing labels with a TODO about removing it
when webhook is implemented.

Signed-off-by: Quique Llorente <[email protected]>
Running 'make format' changes logo files, adding grep
filter to ignore these.

Signed-off-by: Radim Hrazdil <[email protected]>
k8s release-notes cmd has change the argument name from --format to
--go-template, this change aligns the infra with that.

Signed-off-by: Quique Llorente <[email protected]>
Future versions of nmstate/nm need to be tested to ensure that they
don't introduce something really incompatible with kubernetes-nmstate to
implement that a new "knob" is added to Makefile in the form of a
NMSTATE_PIN variable that will use future versions if content is
"future".

Signed-off-by: Quique Llorente <[email protected]>
…e#656)

The condition contains ! (negation) which should not be there
and as a consequence incorrect error is returned

Signed-off-by: Radim Hrazdil <[email protected]>
Signed-off-by: Tareq Alayan <[email protected]>

Co-authored-by: Tareq Alayan <[email protected]>
Due to a BZ https://bugzilla.redhat.com/show_bug.cgi?id=1906307
temporarily removing second slave iface from the bond.
When resloved, will add it back.

Signed-off-by: Radim Hrazdil <[email protected]>
The node controller refresh NNSs every 5 seconds at every Reconcile it
do three things, GET Node, call nmstatectl show and UPDATE NNS, this is
too much of apiserver traffic and end up with big cpu usage. This change
reduce the calls to GET and UPDATE by comparing current network state
with last retrieved network state and if they differ then it do them.

Signed-off-by: Quique Llorente <[email protected]>
…#668)

* Print NNS test currentState as string

Comparin directly the NNS Status struct print currentState as slice of
bytes that does not help to find differences, this change tranform the
obtained and expected to strings to make easier to find differences.

Signed-off-by: Quique Llorente <[email protected]>

* Remove gc-timer from e2e tests check

When NNS current is compared the dynamic attributes has to be removed
for it to match like is already dont at controller.

Signed-off-by: Quique Llorente <[email protected]>
Test that try to create the test fail will fail if it's not deleted at
all specs as teardown part.

Signed-off-by: Quique Llorente <[email protected]>
Listen 3.2.1 doesn't support ruby version >= 3.0

Signed-off-by: Radim Hrazdil <[email protected]>
Bumps [nokogiri](https://github.com/sparklemotion/nokogiri) from 1.10.10 to 1.11.1.
- [Release notes](https://github.com/sparklemotion/nokogiri/releases)
- [Changelog](https://github.com/sparklemotion/nokogiri/blob/master/CHANGELOG.md)
- [Commits](sparklemotion/nokogiri@v1.10.10...v1.11.1)

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
The NodeNeworkState is showing up calico interfaces but they are calico
implementation tails and kubernets-nmstate should not show or modify
them, also looks like it affects e2e tests on calico systems [1].

This change just filter them same as we do with veth.

[1] https://prow.apps.ovirt.org/view/gcs/kubevirt-prow/pr-logs/pull/kubevirt_cluster-network-addons-operator/727/pull-e2e-cluster-network-addons-operator-nmstate-functests/1347188092745814016

Signed-off-by: Quique Llorente <[email protected]>
* Replace "slave" with "port" at e2e tests

Future version of nmstate is replacing some degrading terms like slave
with more neutral ones. This change replaace slave with ports to make
tests compatible with new version of nmstate.

Signed-off-by: Quique Llorente <[email protected]>

* Expec miimon as integer value instead of string

Signed-off-by: Quique Llorente <[email protected]>

* Parameterize 'port' field name

Signed-off-by: Quique Llorente <[email protected]>

* Parameterize 'miimon' value

Signed-off-by: Quique Llorente <[email protected]>
Added a validation webhook that denies updating a policy when this
policy is still in progress.

Signed-off-by: Radim Hrazdil <[email protected]>
Trying to refresh NNS every 5 seconds makes nmstatectl to call python
everytime consuming CPU resources, decreasing the frequency will reduce
CPU consumption.

Also increasing the refresh period to 1 minute make CI fail on timeout,
to overcome that we force NNS refresh if a NNCP is applied.

And filter gc-timer and hello-timer hey are already not reflecting
reality since we remove them before
compare them so the value that is present at NNS is not accurate, let's
just remove them from NNS so we can use FilterOut to compare states too,
so we don't compare filtered stuff like veth or calico.

Proper solution is to use varlink so python is already started up at
different container, this is done at nmstate#663.

Signed-off-by: Quique Llorente <[email protected]>
Refreshing NNS at the same time on all the nodes can hit apiserver hard,
let's jitter it a little so refreshing is distributed over time.

Signed-off-by: Quique Llorente <[email protected]>
We have found that at some envs the "bridge" section from linux-bridge
is not present in case the bridge is down, this change ignore the
filtering in this is the case to preveng a segmentation fault.

Signed-off-by: Quique Llorente <[email protected]>
qinqon and others added 22 commits February 24, 2021 16:59
If the KUBEVIRT_PROVIDER is not set at the Makefile the secondary nics
names are not known and test start to fail. This change add the default
KUBEVIRT_PROVIDER at the Makefile and remove it from the automation so
the Makefile is the one ruling about it.

Signed-off-by: Quique Llorente <[email protected]>
nmstate 0.3 introduces a bug where all veths attached to nmstatectl
configured bridge turn into "managed" by NetworkManager. This was not
the case in 0.2.

Due to this regression, NetworkManager deliberatelly detaches veth
ifaces from the bridge and by doing that it disconnectecs Pods/VMs from
the network.

With this change, we explicitly set veth interfaces as unmanaged.

Important: With this change, it is no longer possible to change
configuration of bridges that have veths attached.

Signed-off-by: Petr Horáček <[email protected]>
* Bump kaw to v0.13.0

The certificate overlap mechanism is implemented like renewBefore from
cert-manager.

Signed-off-by: Quique Llorente <[email protected]>

* Correct certificate rotation overlap values

The kube-admission-webhook v0.13.0 use CertOverlapInterval as the moment
when certificates has to start to rotate before the current certificate
expiration time, this change reduce CertOverlapInterval and CARotateInterval
to take that into account.

Signed-off-by: Quique Llorente <[email protected]>
* state: Interpret basic state through an explicit structure

In order to ease the interpretation of the state yaml data, use a type
structure that represents the basic root items of state:
- interfaces
- routes:
  - config
  - running

In order to keep the generic nature of these items content,
they use as value a generic Go interface{}.

The filter helper functions, with this change, have been also made
immutable. They now return the filtered values.

Signed-off-by: Edward Haas <[email protected]>

* state, filter: Introduce unit tests for numeric iface names

Interfaces names with numeric values are accepted by the kernel.
Introduce tests that checks the behavior when using numeric names.

Signed-off-by: Edward Haas <[email protected]>

* state, filter: Do not panic on interpretation of ifaces names

In scenarios where an interface name is composed of numbers in
scientific notation without a dot, the application panics with a failure
to extract a string type from a Go interface (which is actually a float64).

In order to fix the problem, the parsing of the interface name is performed
using a dedicated interface-state structure.

Note: this change does not solve the problem of incorrectly representing
valid scientific numeric values like `10e+02` as strings (they are now
represented back as full integers: "1000").
This is due to the YAML-to-JSON conversion which does not obey to
the defined member type.

ref: yaml/pyyaml#173

Signed-off-by: Edward Haas <[email protected]>

* state, filter: Fix interpretation of scientific numeric iface names

This change solves the problem of incorrectly representing
valid scientific numeric valuesi without a dot in them (e.g. `10e20`)
as strings (e.g. represented back as `1000`).

The problem originates from the YAML-to-JSON conversion which does not obey
to the defined member type.

Fixed by using the go-yaml package (which does not perform such a
conversion from YAML to JSON).
The go-yaml is used just to unmarshal the name from the original raw
byte stream and update the state structure with the proper name string.

Signed-off-by: Edward Haas <[email protected]>
* Bump kaw v0.14.0 and operator-sdk v1.4.2

The webhookconfiguration v1alpha1 is going to be deprecated at
kubernetes 1.22. This change bumps operator-sdk and
kube-admission-webhook to support webhookconfiguration v1

Signed-off-by: Quique Llorente <[email protected]>

* Adapt code to new operator-sdk and kaw versions

After bumping operator-sdk to 1.4.2 the Reconcile function expects a
context.Context and the Meta at events is included in the client.Object
passed to it.

Signed-off-by: Quique Llorente <[email protected]>

* Don't use .gitignore at automation rsync

After bumping deps some of the .gitignore in the project is making rsync ignore
"vendor/github.com/moby/term/.gitignore" and this breaks "make
check-gen"

Signed-off-by: Quique Llorente <[email protected]>
The network configuration at nodes is chaning sometimes without human
intervention (at kubevirtci some ipv6 routes disappear sometimes or
appear sometimes). This change do a best effort and checks nns update
timestamp according to currentState changed or not.

Signed-off-by: Quique Llorente <[email protected]>
Currently, the operator will only reconcile an nmstate resource that
has a name 'nmstate'. This patch makes it possible to add the resource
as any name, but subsequent resources added will be ignored.

Signed-off-by: Brad P. Crochet <[email protected]>
* sync vendor

Signed-off-by: Radim Hrazdil <[email protected]>

* Change spec.Parallel field with spec.MaxUnavailable

Change parallel field with maxUnavailable, which allows
more granular configuration for concurrent policy
configuration.

MaxUnavailable can be either set with an int value or
percentage in string:

spec:
  maxUnavailable: "50%"

or

spec:
  maxUnavailable: 3

Updated used guide and added an example policy
that specifies maxUnavailable.

Signed-off-by: Radim Hrazdil <[email protected]>

* Update tests that check conditions

Update test that check nnce conditions to tolerate both
failing and aborted conditions when invalid nncp is created

Signed-off-by: Radim Hrazdil <[email protected]>
Instead of calculating number of maxUnavailable nodes from
all nmstate enabled nodes, use matching nodes only.

Signed-off-by: Radim Hrazdil <[email protected]>
Keep owners we latest review comments.

Signed-off-by: Quique Llorente <[email protected]>
Currectly the cert-manager from kube-admission-webhook is running
whithing the webhook server itself, this means that we need to create a
placeholder secret and also a very specific readiness probe has to be
used to ensure that secret is properly rotated at start up. This change
introduces a new pod runing the cert-manager code so we don't need to
create the placeholder secret (the webhook pod will not start until the
secret is not created) and we don't need a readiness probess since
secret will be correct form the beginning.

Also start up is faster since webhook pods start at the very moment the
secret is created by cert-manager pod.

Signed-off-by: Quique Llorente <[email protected]>
* Consolidate parallel and sequential lanes

Run the e2e handler test lane only once, configured
to use default 50% of worker nodes to process
NNCPs concurrently.

For that reason, setting number of nodes to 5
so that 2 nodes run concurrently (4/2 == 2)

Signed-off-by: Radim Hrazdil <[email protected]>

* Fix test conditions

Now when a failing NNCP is applied, we accept NNCEs in two
states: aborted and failed.

Signed-off-by: Radim Hrazdil <[email protected]>
Radim has being contributing a lot lately so is fair to make him
maintainer.

Signed-off-by: Quique Llorente <[email protected]>
Although the mounted secret is good from the beginning, webhook will be
only ready when webhook server is started up.

Signed-off-by: Quique Llorente <[email protected]>
test_logs directory is created without x permissions
so it's not possible to cd into it.

Signed-off-by: Radim Hrazdil <[email protected]>
Bumps [kramdown](https://github.com/gettalong/kramdown) from 2.3.0 to 2.3.1.
- [Release notes](https://github.com/gettalong/kramdown/releases)
- [Changelog](https://github.com/gettalong/kramdown/blob/master/doc/news.page)
- [Commits](https://github.com/gettalong/kramdown/commits)

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
The internal Networking team needs to be able to approve and review.

Signed-off-by: Brad P. Crochet <[email protected]>
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 9, 2021
@bcrochet
Copy link
Member

/lgtm
/approve

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Apr 12, 2021
@openshift-merge-robot openshift-merge-robot merged commit 1678a36 into openshift:master Apr 12, 2021
@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bcrochet, cybertron

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.