Rewriting the kubernetes_logs source to leverage the "official" Rust client for Kubernetes.
- kubernetes_logs error handling RFC
- kubernetes_logs source fixes

- Extracting Kubernetes runtime out of Vector code base
- Replacing our in-house Kubernetes code with kube wherever possible
- Changing the end-to-end functionality of the kubernetes_logs source
- Changing the testing strategies we have around the kubernetes_logs source today
- Difficulty troubleshooting and supporting this feature as maintainers
- Unresolved bug reports from users related to our in-house Kubernetes code
This change should not affect our UX or existing deployments using the kubernetes_logs source today. Any changes will be entirely internal to the source's implementation.
Our existing implementation relies on many primitives from kube, and we have the opportunity to leverage the higher-level tools provided by the library as well.
kube is the leading Rust client for Kubernetes, and we should look to utilize the community support and experience behind it as much as possible. We can replace most of our "plumbing" level code with the equivalent, or higher-level, code from kube while keeping the same functionality.
As far as I'm aware, the only implementation we need to keep in-house is the Store, as we want to retain the contents in the Store after receiving a DELETE event for the corresponding contents. This allows us to enrich events we receive from a Pod after it's been deleted but while we still have an open file handle for its logs.
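The retention behavior described above can be sketched as follows. This is a minimal, std-only illustration and not Vector's actual Store: the names `DelayedDeleteStore` and `PodMeta`, keying by Pod UID, and the fixed grace period are all assumptions made for the example.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical per-Pod metadata; a real store would hold full Pod objects.
#[derive(Debug, Clone, PartialEq)]
pub struct PodMeta {
    pub name: String,
    pub namespace: String,
}

/// Hypothetical cache that keeps entries readable for `delay` after a DELETE event.
pub struct DelayedDeleteStore {
    delay: Duration,
    entries: HashMap<String, PodMeta>,
    /// Pod UID -> instant after which the entry may actually be dropped.
    pending: HashMap<String, Instant>,
}

impl DelayedDeleteStore {
    pub fn new(delay: Duration) -> Self {
        Self {
            delay,
            entries: HashMap::new(),
            pending: HashMap::new(),
        }
    }

    /// Added or modified: insert or overwrite, and cancel any scheduled removal.
    pub fn apply(&mut self, uid: &str, meta: PodMeta) {
        self.pending.remove(uid);
        self.entries.insert(uid.to_string(), meta);
    }

    /// DELETE: keep the entry and only schedule its removal.
    pub fn delete(&mut self, uid: &str, now: Instant) {
        if self.entries.contains_key(uid) {
            self.pending.insert(uid.to_string(), now + self.delay);
        }
    }

    /// Drop every entry whose grace period has elapsed as of `now`.
    pub fn purge(&mut self, now: Instant) {
        let entries = &mut self.entries;
        self.pending.retain(|uid, deadline| {
            if *deadline <= now {
                entries.remove(uid);
                false
            } else {
                true
            }
        });
    }

    /// Lookup used for enrichment; still succeeds during the grace period.
    pub fn get(&self, uid: &str) -> Option<&PodMeta> {
        self.entries.get(uid)
    }
}
```

Passing `now` explicitly keeps the sketch deterministic. A caller would invoke `purge` periodically, so that log lines read from a still-open file handle can be enriched until the grace period expires.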
Some of our existing code in src/kubernetes is already taken directly from kube (an older version) without modification, which lends itself to being replaced completely.
Rewriting the kubernetes_logs source to leverage the existing "official" Kubernetes client for Rust will give us a more stable and maintainable foundation. We are currently unable to maintain and troubleshoot the kubernetes_logs source properly, and leveraging kube directly will provide an existing community of users, tests, and experiences to improve our ability to support this feature in Vector.
Relying on an external library will reduce the amount of code and complexity that currently exists within Vector and allow us to focus less on making the tools correct and more on ensuring we're using them properly.
Leveraging kube was considered at the start of the Kubernetes integration project, but eventually we wrote our own implementation on top of the k8s-openapi crate. Over the past year and a half, kube has matured greatly and today it is being donated to the CNCF.
This appears to be the state of kube roughly at the time of writing our own implementation. While our existing implementation is quite generic and modular, the needs of the component today are quite limited.
Realistically we just need the following:
- Client: authentication and configuration to call the Kubernetes API
  - kube-client
- Reflector: error handling and a persistent cache for an event stream
  - kube-runtime::reflector
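To make the Reflector's role concrete, here is a minimal, std-only model of how a reflector folds a watch event stream into a local cache. It is only a sketch under assumptions: `WatchEvent` and `reflect` are hypothetical names for this illustration, loosely modeled on the apply/delete/restart shape of a Kubernetes watch, and they are not kube's actual types or signatures.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for the events a Kubernetes watch stream yields.
#[derive(Debug)]
pub enum WatchEvent<T> {
    /// An object was added or modified.
    Applied(String, T),
    /// An object was deleted.
    Deleted(String),
    /// The watch was re-established and carries a complete, fresh listing.
    Restarted(Vec<(String, T)>),
}

/// Fold a single event into the cache so that it mirrors the remote state.
pub fn reflect<T>(cache: &mut HashMap<String, T>, event: WatchEvent<T>) {
    match event {
        WatchEvent::Applied(key, obj) => {
            cache.insert(key, obj);
        }
        WatchEvent::Deleted(key) => {
            cache.remove(&key);
        }
        WatchEvent::Restarted(objs) => {
            // A restart invalidates everything previously cached.
            cache.clear();
            cache.extend(objs);
        }
    }
}
```

In this model, a store that retains deleted entries would differ only in the `Deleted` arm, which would defer the removal instead of applying it immediately.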
Converting the underlying libraries to kube (where possible) doesn't guarantee a resolution for reported bugs, but it does shift a non-trivial amount of functionality out of our project and onto a specialized library.
Troubleshooting and maintaining the existing source has been challenging, and there is a clear lack of understanding of the current implementation. The source is an important part of our OSS offering and frequently used with the Agent role. Not doing this impairs our ability to support Vector running as an Agent in Kubernetes and reflects poorly on our overall reputation.
This avenue was recommended internally, but is a departure from our existing implementation, and thus contains more unknowns and increases the likelihood of breaking changes to the existing source.
This option is capable of being used regardless of what role Vector has been deployed in, but it is a larger change to implement and a larger update for users too. Longer term the source can be decoupled and introduced as a standalone enrichment component.
- Do we want to re-evaluate the usage of evmap in this component? Not critical, we can review performance later.
- Do we want to make this change with a version change of the source, opting in to the change with a version option in the configuration? No, the risk seems low enough to not warrant a version split between the old and new code.
- Replace contents of src/kubernetes with equivalents from kube
  - src/kubernetes/client replaced with kube-client
  - src/kubernetes/reflector and dependencies replaced with kube-runtime::reflector
  - src/kubernetes/state updates to minimize in-house code
- Ensure unit tests and integration tests show matching behavior before and after rewrite
- Move Kubernetes enrichment to a standalone transform and rework the source to be a simpler combination of the file source plus the new enrichment transform
- Rewrite kubernetes_logs source to integrate with Kubelet API to reduce calls to the Kubernetes Control Plane.