Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CFP-25694: Move cilium/cilium-cli code into cilium/cilium repository #9

Merged
merged 5 commits into from
Aug 16, 2024
Merged
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
123 changes: 123 additions & 0 deletions cilium/CFP-25694-move-cilium-cli-to-cilium-repo.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,123 @@
# CFP-25694: Move cilium/cilium-cli code into cilium/cilium repository

**Code Name: Mega Ketchup**

**SIG: [sig-cilium-cli]**

**Begin Design Discussion:** 2023-05-25

**Cilium Release:** 1.15

**Authors:** Michi Mutsuzaki `<[email protected]>`
michi-covalent marked this conversation as resolved.
Show resolved Hide resolved

## Summary

Move [cilium/cilium-cli] code into [cilium/cilium] repository.

## Motivation

The main motivation for moving [cilium/cilium-cli] code into [cilium/cilium]
repository is to minimize the feature development overhead for adding new tests
joestringer marked this conversation as resolved.
Show resolved Hide resolved
to `cilium connectivity test`.

Currently, adding a new test to `cilium connectivity test` requires the following
steps:

- Open a pull request against [cilium/cilium-cli] repo to add a new test, and
wait for the CI build to finish.
- Modify GitHub workflows files to pick up the CI version of `cilium-cli` in
your pull request against [cilium/cilium] repo.
- Once the new tests passes, undo the change in the workflow files.
- Get these pull requests reveiewed and merged.
michi-covalent marked this conversation as resolved.
Show resolved Hide resolved

Even after all these steps, the new test does not become a part of the regular
CI until the [sig-cilium-cli] team releases a new version of `cilium-cli` that
includes the test, and Renovate updates GitHub wokflow files to pick up the new
`cilium-cli` version. This has been a major source of frustration for some Cilium
contributors.
Comment on lines +35 to +39
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cilium-CLI for PRs uses a helm chart and docker images to deploy Cilium CLI into the cluster, then runs it from there right? Looking at this.

Could we model the cilium/cilium CI like this instead, and autopublish CLI images for every PR or every main commit... then provide a way for cilium/cilium PRs to point to those CLI dev images?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking a step back, I guess there's two cases, right? There's:

  1. Developing new e2e tests
  2. Updating Cilium to use the latest CLI

For (1), it's helpful to be able to point arbitrary cilium/cilium PRs towards arbitrary cilium/cilium-cli versions, so that you can co-develop the tests and functionality. Some background here, I would hope that 80%+ of the coverage is already provided by unit tests and module tests in cilium/cilium, so the test changes in cilium-CLI are for the remaining e2e & integration testing. Those tests should be able to autodetect whatever functionality they need in the cluster so that they can be skipped when running against older Cilium versions that don't have the requisite support. While we do need to integrate such tests into cilium/cilium CLI, I am not sure I necessarily see the need for this to be tightly coupled with the implementation that goes into cilium/cilium.

For (2), the existing process to release the new version and rely on renovate seems reasonable enough to me. If we think it needs to happen faster, then someone can post the PR directly instead of renovate, or otherwise we'll get to it when we get to it (and take advantage of the automation that removes some of the manual work there).

Copy link
Member

@joestringer joestringer Jun 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My recollection of the origins of Cilium CLI in general was that it's intended to focus on the user experience in order to simplify installing, observing, maintaining live Cilium clusters. Tools like cilium connectivity test were conceived as a sanity check for user environments so we can quickly see whether a prod environment is healthy or not.

Over time, given that this framework provided mechanisms to develop new tests without the baggage of ginkgo and jenkins, I think it became an attractive place for Cilium developers to expand testing. Now we're at several minutes to assess the cluster sanity, and it's doing a lot of advanced feature testing. We're talking about reconfiguring the target environment to perform some of this testing, and starting to hide such "extra test" options behind special flags to avoid impacting prod environments.

I have a bit of concern that over time we're losing sight of the original UX-focused goals of Cilium-CLI and turning it into the Cilium developer CI tool instead. On the way on that path, we'll simplify some Cilium developer workflows (ideal) but also potentially raise the barrier to entry for cilium-cli contribution (not ideal).

Should the tests for advanced Cilium functionality exist in cilium/cilium or cilium/cilium-cli? Well, probably cilium/cilium. Does that mean that all of cilium-cli should then go into cilium/cilium? I'm not so sure. I'd hope that most testing can be done in unit or module / cell level tests directly in the cilium/cilium tree, rather than being e2e. Even if we do need e2e tests for some of the new functionality, is the user-facing tool the right place to implement a lot of that advanced testing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we model the cilium/cilium CI like this instead, and autopublish CLI images for every PR or every main commit... then provide a way for cilium/cilium PRs to point to those CLI dev images?

we already do that, sort of. you can specify a commit SHA of cilium-cli here: https://github.com/cilium/cilium/blob/6d4b2f7092d634f67c6192c85143051c49846245/.github/workflows/conformance-aks.yaml#L74. you just need to edit these workflow files. i guess some people find this step more annoying than others.

I'd hope that most testing can be done in unit or module / cell level tests directly in the cilium/cilium tree, rather than being e2e. Even if we do need e2e tests for some of the new functionality, is the user-facing tool the right place to implement a lot of that advanced testing?

i think this is the key question. cilium connectivity test unintentionally became "the" place to add e2e tests. maybe that's where things started going wrong 🔥 cc @brb.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gandro had another idea: use cilium-cli's Hook interface https://github.com/cilium/cilium-cli/blob/main/internal/cli/cmd/hooks.go. this might make more sense if we want to keep cilium-cli relatively lean, and at the same time let cilium developers use the same cilium connectivity test framework to add advanced e2e tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then provide a way for cilium/cilium PRs to point to those CLI dev images?

I added this - cilium/cilium#25228. However, bunch of workflows # pull_request: {} need to be uncommented though to make changes effective.

Copy link
Member

@brb brb Jun 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cilium connectivity test unintentionally became "the" place to add e2e tests.

We started adding more tests to the CLI connectivity suite, because otherwise we would have reinvented the CLI connectivity suite for e2e tests (as part of the Hyper Jump project). There was an attempt to introduce the K8s E2E testing fw, but we quickly realized that it's an tremendous effort, and the CLI already provides all the required constructs. Also, the CLI connectivity suite was executed by different GH workflows running in different environments. So adding a new test meant that it will be executed e.g., on EKS.

To address the long runtime concern - we will add a flag to prevent from inclusion more advanced tests which take time. However, I find it useful that users have means to do an extensive testing in their clusters.

I'd hope that most testing can be done in unit or module / cell level tests directly in the cilium/cilium tree, rather than being e2e.

Unfortunately, it's not possible for many features which involve interactions between many subsystems. I doubt that having Egress GW tests in a form of unit / control-plane would give us enough confidence.

In the end I am fine with any approach re cilium-cli (i.e., merge or having a Hook), as longs as test files live in cilium/cilium, and thus get executed for each PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at least once a week cilium/cilium would be synced with the most recent CLI revision

Unfortunately, everything which gets tested after a merge usually involves a few folks spending their cycles instead of a single PR owner.


## Goals

* Move [cilium/cilium-cli] code into [cilium/cilium] main branch under `cli/`
directory. This enables Cilium contributors to implement new features with
tests in a single pull request against [cilium/cilium] repository.
* Establish a backporting process for changes inside `cli/` directory.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of the backporting process?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are more details in "Impact: Backporting to stable branches" section. let me know if it makes sense 👀

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I missed the since in the middle there. I think the goal in that case is "Continue to test older Cilium versions against new versions of the CLI". The extra backporting overhead is a logical consequence of the proposal.

* Avoid disrupting the cilium-cli development and release processes as much as
possible.

## Non-Goals

* Rename the name of the `cilium-cli` binary from `cilium` to something else.
* Release `cilium-cli` as a part of Cilium releases.

## Proposal

This proposal consists of three steps to move [cilium/cilium-cli] code into
[cilium/cilium]:

### 1. Move [cilium/cilium-cli] code to [cilium/cilium] under `cli/` directory

I'm assuming that you want to preserve the Git history of [cilium/cilium-cli]. See
[michi-covalent/cilium] on how the code migration might look like. Squashing
these commits into a single commit is another option.

### 2. Make [cilium/cilium-cli] code a part of `github.com/cilium/cilium` Go module

Merge [cilium/cilium-cli] code into `github.com/cilium/cilium` Go module by
removing `cli/go.mod`. This allows [cilium/cilium-cli] code to depend on
[cilium/cilium] code in the same tree.

This is **not** to suggest `cilium-cli` to be tied to a specific `cilium`
version. The goal of this proposal is to make it easy for contributors to add
tests against the latest Cilium code. Continue to maintain the backward
compatibility of `cilium-cli` with respect to `cilium` versions.
joestringer marked this conversation as resolved.
Show resolved Hide resolved

### 3. Release `cilium-cli` from [cilium/cilium-cli] repository

Set up an automated process to periodically sync `cilium-cli` code under `cli/`
directory back to [cilium/cilium-cli] repo, similar to how Kubernetes syncs some
modules from [kubernetes/kubernetes staging directory] to module repos
([apimachinery] for example). Continue releasing `cilium-cli` from
[cilium/cilium-cli] at its own cadence using its own versioning.

See [Tobias' comment](https://github.com/cilium/design-cfps/pull/9#discussion_r1211996796)
for more details.

## Impacts / Key Questions

### Impact: Backporting to stable branches

Starting in `v1.15`, you need to backport bug fixes related to
`cilium connectivity test` command from `main` branch to affected stable
branches since these stable branches use their own copy of `cli/` to run
`cilium connectivity test`. Apply the same [backporting process] as the rest of
the Cilium codebase.

### Key Question: How do you run CI for CLI? More specifically, how do you ensure that changes to Cilium CLI remain compatible with older released versions of Cilium?

Here is the list of GitHub workflows that currently run for each pull requests in [cilium/cilium-cli]
using the latest version of Cilium:

- https://github.com/cilium/cilium-cli/blob/main/.github/workflows/aks-byocni.yaml
- https://github.com/cilium/cilium-cli/blob/main/.github/workflows/eks-tunnel.yaml
- https://github.com/cilium/cilium-cli/blob/main/.github/workflows/eks.yaml
- https://github.com/cilium/cilium-cli/blob/main/.github/workflows/externalworkloads.yaml
- https://github.com/cilium/cilium-cli/blob/main/.github/workflows/gke.yaml
- https://github.com/cilium/cilium-cli/blob/main/.github/workflows/kind.yaml
- https://github.com/cilium/cilium-cli/blob/main/.github/workflows/multicluster.yaml
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which version of Cilium CLI will these tests use?

Note there may also be security considerations particularly for cloud credential cases. I believe we have mitigations in place for this threat model already in cilium-cli which we may need to import.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these tests build cilium-cli from the tree.


michi-covalent marked this conversation as resolved.
Show resolved Hide resolved
michi-covalent marked this conversation as resolved.
Show resolved Hide resolved
[cilium/cilium] has a similar coverage using CI images instead of released images,
so it's probably an overkill to port all of these workflows to [cilium/cilium]. My
proposal is to move https://github.com/cilium/cilium-cli/blob/main/.github/workflows/kind.yaml
to [cilium/cilium], and run this workflow for pull requests that modify files
under `cli/` directory. This ensures that any changes under `cli/` directory get
tested against a released version of Cilium.

[sig-cilium-cli]: https://github.com/orgs/cilium/teams/sig-cilium-cli
[cilium/cilium]: https://github.com/cilium/cilium
[cilium/cilium-cli]: https://github.com/cilium/cilium-cli
[michi-covalent/cilium]: https://github.com/michi-covalent/cilium/pull/169
[michi-covalent/cilium-cli]: https://github.com/michi-covalent/cilium-cli/tree/cli-test
[backporting process]: https://docs.cilium.io/en/stable/contributing/release/backports/
[kubernetes/kubernetes staging directory]: https://github.com/kubernetes/kubernetes/tree/master/staging/
[apimachinery]: https://github.com/kubernetes/apimachinery