Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add flag to administratively enable APIs on bootstrap #25009

Merged
merged 10 commits into from
Apr 27, 2023

Conversation

joestringer
Copy link
Member

@joestringer joestringer commented Apr 20, 2023

Depends on #24967 .

Add a new (hidden) flag, enable-cilium-api-server-access, which accepts a
list of APIs that the daemon should allow REST API clients to mutate.

This flag can be useful to restrict the sorts of changes that can be
made locally directly from the node or from users with access to the
Cilium container. Many of these have been added over time to assist
development and debugging purposes, and may not be necessary in
production environments.

Set the default to allow "all" API modifications (*) so that there is no
change in behaviour for existing deployments.

In order to successfully use this flag, it is likely that at least the
following options must be configured in order to allow cilium-cni to provision
pods, and to ensure that the health of the agent can be assessed by Kubernetes:

  • GetConfig
  • GetHealthz
  • PutEndpointID
  • DeleteEndpointID
  • PostIPAM
  • DeleteIPAMIP

Users may additionally see benefit in providing Get* as a wildcard in order
to allow any read-only APIs to be accessed, which can be very useful for
debugging.

Occasionally, additional "flush" (Delete) APIs can also be very useful in order
to mitigate issues at runtime, so this option should not be used unless you
have a pretty strong understanding of the APIs you are likely to rely upon.

For completeness, similar flags are also added for the cilium-health and
operator APIs, though the usefulness there is more limited.

Related docs: #24968

@joestringer joestringer requested review from a team as code owners April 20, 2023 16:33
@joestringer joestringer requested a review from pippolo84 April 20, 2023 16:33
@joestringer joestringer added the release-note/minor This PR changes functionality that users may find relevant to operating Cilium. label Apr 20, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. and removed dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Apr 20, 2023
@joestringer
Copy link
Member Author

/test

@aanm aanm self-requested a review April 20, 2023 18:44
@joestringer
Copy link
Member Author

k8s-1.26-kernel-4.19 hit known flake #24453.

k8s-1.27-kernel-net-next hit a provisioning error while waiting for the VM to boot.

I'll re-run these tests.

@joestringer
Copy link
Member Author

/test-1.26-4.19

@joestringer
Copy link
Member Author

/test-1.27-net-next

pkg/api/apidisable.go Outdated Show resolved Hide resolved
api/v1/server.gotmpl Outdated Show resolved Hide resolved
api/v1/server.gotmpl Outdated Show resolved Hide resolved
Copy link
Member

@nathanjsweet nathanjsweet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for API

This preparatory commit introduces a new Cell for each Swagger API
Specification in order to allow reuse earlier on during the agent
initialization / lifecycle. No functional changes intended.

Signed-off-by: Joe Stringer <[email protected]>
Use the autogenerated API specifications in order to populate "allowed
APIs" flags in the api specification objects for each API. This code
will be hooked in for usage in subsequent commits.

Signed-off-by: Joe Stringer <[email protected]>
Now that there's dedicated cells to provide the API specs for each API,
each server can now directly depend on those rather than doing their own
swagger spec instantiation.

Signed-off-by: Joe Stringer <[email protected]>
Add new flag(s) to the daemon which restrict the ability for API clients
to call certain API endpoints. The new option is an allowlist of
Pascalized API endpoints that may be allowed, or optionally a Prefix
followed by the '*' character in order to allow a wildcard of API
endpoints, for example "*" for all API endpoints, or "Get*" for all GET
endpoints.

Set the default to allow all ("*") API modifications so that there is no
change in behaviour for existing deployments.

Signed-off-by: Joe Stringer <[email protected]>
The newly added API access flags are very flexible in the APIs that they
allow to be administratively disabled, but there are some options that
are really required for any standard Cilium operations. Add some basic
sanity checking before applying the user configuration in these cases,
as it could lead to unexpected results such as inability to keep the
Cilium agent running or inability to deploy new endpoints.

Signed-off-by: Joe Stringer <[email protected]>
pkg/api will be used in an upcoming commit, so rename the variable in
this function to avoid that conflict.

Signed-off-by: Joe Stringer <[email protected]>
Similar to the recent commit to administratively enable/disable support for
agent API endpoints, extend this support to the cilium-health API
exposed by nodes as well.

Signed-off-by: Joe Stringer <[email protected]>
For now this is not particularly important as the operator only has a
/healthz API to get the health of the operator. However, if we ever
extend this API in future then it could become useful to consistently
apply API restrictions via the new flag.

Signed-off-by: Joe Stringer <[email protected]>
@joestringer joestringer force-pushed the pr/joe/api-restrictions branch from b97e6b9 to bdf566f Compare April 26, 2023 03:21
@joestringer
Copy link
Member Author

/test

@joestringer
Copy link
Member Author

Hit #25146 on optional ci-aks run. I'll re-kick it.

@joestringer
Copy link
Member Author

/ci-aks

@joestringer
Copy link
Member Author

OK apparently I can't re-run the job since it's now disabled. It's optional anyway, so we can ignore it.

Copy link
Member

@aanm aanm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small non-blocking feedback.

Comment on lines +1994 to +1999
"GetConfig",
"GetHealthz",
"PutEndpointID",
"DeleteEndpointID",
"PostIPAM",
"DeleteIPAMIP",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to either leave a comment in the code explaining why each one of these APIs are fundamental and / or leave the reasoning for it in the Warning log message so that users / admins know why it is important.

Also, let's say that the admin removed the liveness probe from the daemonset and therefore doesn't need the GetHealthz and so it gets administratively disabled. Should we still print a Warning every time Cilium starts?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 comments make sense, I can send a follow-up.

Warnings are not perfect, but this is exactly the sort of reason it's a warning and not an error or fatal message. This is a way to let the administrator know that they're running an unconventional setup, and if they know what they're doing and are happy with the consequences then they can ignore it. If you're that far off the beaten path, then I'm OK with having a warning until someone complains that it's a real issue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joestringer joestringer merged commit 4b0f110 into cilium:main Apr 27, 2023
@joestringer joestringer deleted the pr/joe/api-restrictions branch April 27, 2023 16:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-note/minor This PR changes functionality that users may find relevant to operating Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants