Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(new source): Initial kubernetes_logs implementation #2653

Merged
merged 118 commits into from
Jul 22, 2020
Merged

Conversation

MOZGIII
Copy link
Contributor

@MOZGIII MOZGIII commented May 20, 2020

This PR implements Kubernetes source according to #2222.
There are a few things are different from the RFC:

  • the name for the source is kubernetes_logs - it's wasn't properly addressed at the RFC, but the logs and other source kinds for the kubernetes would very likely have very different internal design, so we should probably keep them as separate units; to disambiguate from the very beginning I modified the name to be kubernetes_logs;
  • had to bump MSKV to 1.15 - watch bookmarks are introduced in that version, and this is what we want to have to avoid extra desync issues.

Done:

To do (before the merge):

To do (after the merge):

External refs:


The overall approach was to build highly composable components, so that we can reuse them for further work - like for adding a sidecar-oriented transform for pod annotation, or exclusions based on namespace labels.

@github-actions
Copy link

github-actions bot commented May 20, 2020

Great PR! Please pay attention to the following items before merging:

Files matching Cargo.lock:

  • Has at least one other team member approved the dependency changes?

This is an automatically generated QA checklist based on modified files

@MOZGIII MOZGIII force-pushed the k8s-impl branch 15 times, most recently from ac2abb3 to 6fe6fdf Compare May 22, 2020 05:52
@MOZGIII
Copy link
Contributor Author

MOZGIII commented May 25, 2020

The PR is ready for review, please go ahead with it!
There are still a few things left, I'm going to work on those this week. This shouldn't interfere with the review.

@MOZGIII MOZGIII marked this pull request as ready for review May 25, 2020 09:00
@MOZGIII MOZGIII force-pushed the k8s-impl branch 4 times, most recently from 1c78df3 to 4997b3e Compare May 25, 2020 12:05
@Hoverbear
Copy link
Contributor

I am so pumped!

MOZGIII added 24 commits July 22, 2020 00:56
It was controversial, and so this test is added.

Signed-off-by: MOZGIII <[email protected]>
This should've been fixed by now, but it's still causing issues with older
k8s versions that we want to support.

Signed-off-by: MOZGIII <[email protected]>
No actual changes there compared to upstream, just for preservation

Signed-off-by: MOZGIII <[email protected]>
Signed-off-by: MOZGIII <[email protected]>
@MOZGIII MOZGIII merged commit 5308b86 into master Jul 22, 2020
@binarylogic binarylogic deleted the k8s-impl branch September 5, 2020 21:58
mengesb pushed a commit to jacobbraaten/vector that referenced this pull request Dec 9, 2020
…dev#2653)

* Add kubernetes-integration-tests feature

Signed-off-by: MOZGIII <[email protected]>

* Enable kubernetes tests

Signed-off-by: MOZGIII <[email protected]>

* Add skaffold for quick development iterations

Signed-off-by: MOZGIII <[email protected]>

* Add kubernetes mod and cargo feature

Signed-off-by: MOZGIII <[email protected]>

* Add an HTTP client tailored for k8s API in an in-cluster environment

Signed-off-by: MOZGIII <[email protected]>

* Add a decoder for chained k8s responses

Signed-off-by: MOZGIII <[email protected]>

* Add block_on_std to test_util

Signed-off-by: MOZGIII <[email protected]>

* Add tools for processing HTTP bodies as streams of k8s responses

Signed-off-by: MOZGIII <[email protected]>

* Add Watcher trait

Signed-off-by: MOZGIII <[email protected]>

* Add ApiWatcher implementation

Signed-off-by: MOZGIII <[email protected]>

* Add MockWatcher implementation

Signed-off-by: MOZGIII <[email protected]>

* Add a reflector implementation

Signed-off-by: MOZGIII <[email protected]>

* Add a placeholder for kubernetes logs source

Signed-off-by: MOZGIII <[email protected]>

* Add paths provider implementation based on pods state driven by reflector

Signed-off-by: MOZGIII <[email protected]>

* Add parser for k8s log formats

Signed-off-by: MOZGIII <[email protected]>

* Add partial events merger

Signed-off-by: MOZGIII <[email protected]>

* Add pod metadata annotator

Signed-off-by: MOZGIII <[email protected]>

* Add kubernetes logs source implementation

Signed-off-by: MOZGIII <[email protected]>

* Change reflector errors to be less restrictive

Signed-off-by: MOZGIII <[email protected]>

* Better error handling

Signed-off-by: MOZGIII <[email protected]>

* Improve error handling

Signed-off-by: MOZGIII <[email protected]>

* Better error handling for event pipeline

Signed-off-by: MOZGIII <[email protected]>

* Abstract API watcher around watch request builder

Signed-off-by: MOZGIII <[email protected]>

* Add assertions to the reflector internal state in-between the invocations

Signed-off-by: MOZGIII <[email protected]>

* Fix the comment at optional tranform mod

Signed-off-by: MOZGIII <[email protected]>

* Add a to do comment to make transform utils part of core

Signed-off-by: MOZGIII <[email protected]>

* Fix typo at kustomization.yaml

Signed-off-by: MOZGIII <[email protected]>

* More flexible interface at resource version

Signed-off-by: MOZGIII <[email protected]>

* Fix the mod comment at resource version

Signed-off-by: MOZGIII <[email protected]>

* Remove manual minikube init from scripts/skaffold.sh

Skaffold is capable of detecting minikube itself. It will also work with
Docker Desktop on Windows and macOS out of the box.

Signed-off-by: MOZGIII <[email protected]>

* Specify shell at scripts/minikube-docker-env.sh

Signed-off-by: MOZGIII <[email protected]>

* Switch scripts/copy-docker-image-to-minikube.sh to use scripts/minikube-docker-env.sh

Signed-off-by: MOZGIII <[email protected]>

* Switch to using device inodes for file fingerprinting

Signed-off-by: MOZGIII <[email protected]>

* Fix grammar at in-cluster config

Signed-off-by: MOZGIII <[email protected]>

* Add the variable that's missing to the error at in-cluster config

Signed-off-by: MOZGIII <[email protected]>

* Move in_cluster mod declaration to the top of the file

Signed-off-by: MOZGIII <[email protected]>

* Cut some ununsed code from src/kubernetes/resource_version.rs

Signed-off-by: MOZGIII <[email protected]>

* Fix typo at src/kubernetes/reflector.rs

Signed-off-by: MOZGIII <[email protected]>

* Move the file key to the top of the file

Signed-off-by: MOZGIII <[email protected]>

* Fix comment at parsers test

Signed-off-by: MOZGIII <[email protected]>

* Add a comment for Config::self_node_name

Signed-off-by: MOZGIII <[email protected]>

* Allow disabling automatic partial merge

Signed-off-by: MOZGIII <[email protected]>

* Allow customizing the fields names used by pod metadata annotator

Signed-off-by: MOZGIII <[email protected]>

* Abstract reflector around state

Adds an indirection layer while maintaining the same logic.

This unlocks:
- adding debounce to the evmap for advanced state management;
- adding removal delay to mitigate the race condition with pod being
  removed while having a huge log backlog;
Those were doable before, but at the cost of code clarity, reliability and
maintainability.

Signed-off-by: MOZGIII <[email protected]>

* Add support for delayed delete at reflector

Signed-off-by: MOZGIII <[email protected]>

* Reimplement the tests to add delayed deletion testing capability

New channel-based mocks implementation allows to properly eliminate race
conditions and achieve the complete determinism and reliability of the
test scenario.

Signed-off-by: MOZGIII <[email protected]>

* Improve the request preparation code at kubernetes::client::Client

Signed-off-by: MOZGIII <[email protected]>

* Add reexports at src/kubernetes/mod.rs

Signed-off-by: MOZGIII <[email protected]>

* Adjust and use watcher error constructors

Signed-off-by: MOZGIII <[email protected]>

* Eliminate unused transform

Signed-off-by: MOZGIII <[email protected]>

* Add a test for stream error behavior

Signed-off-by: MOZGIII <[email protected]>

* Add tests and derives at transform::util::pick

Signed-off-by: MOZGIII <[email protected]>

* Require Watcher::Stream to be Send

Signed-off-by: MOZGIII <[email protected]>

* Add instrumentation

Signed-off-by: MOZGIII <[email protected]>

* Add state maintenance and move delayed delete logic into to a state wrapper

This unlocks further complicating the logic around state without making
reflector overcomplicated. This better aligns with the goal of building
composable and testable loosely coupled components.

Signed-off-by: MOZGIII <[email protected]>

* Ignore instrumenting state tests

There no way to assert individual emits, and asserting metrics directly
causes issues:
- these tests break the internal tests at the metrics implementation
  itself, since we end up initializing the metrics controller twice;
- testing metrics introduces unintended coupling between subsystems,
  ideally we only need to assert that we emit, but avoid assumptions on
  what the results of that emit are.

Signed-off-by: MOZGIII <[email protected]>

* Clean up reflector tests

Signed-off-by: MOZGIII <[email protected]>

* Add flush debouncing to the evmap state

Signed-off-by: MOZGIII <[email protected]>

* Proper delay control at main test flow

Signed-off-by: MOZGIII <[email protected]>

* Add evmap tests with and without debounce

Signed-off-by: MOZGIII <[email protected]>

* Fix a typo

Signed-off-by: MOZGIII <[email protected]>

* Document the controversial join_host_port

Signed-off-by: MOZGIII <[email protected]>

* Improve instrumenting watcher events

Signed-off-by: MOZGIII <[email protected]>

* Improve api watcher events

Signed-off-by: MOZGIII <[email protected]>

* Rename init to new

Signed-off-by: MOZGIII <[email protected]>

* Use Strings at parser tests

Signed-off-by: MOZGIII <[email protected]>

* Corrected partial message detection hueristics at docker parser

Signed-off-by: MOZGIII <[email protected]>

* Hint for where's what at parser test case

Signed-off-by: MOZGIII <[email protected]>

* Bump base image at skaffold/docker/Dockerfile to debian:bullseye-slim

Without it local builds don't work host OS' with glibc 2.29

Signed-off-by: MOZGIII <[email protected]>

* Better script layout at skaffold/docker/Dockerfile

Signed-off-by: MOZGIII <[email protected]>

* Convert an annotation failure warn log to internal event

Signed-off-by: MOZGIII <[email protected]>

* Correct the shutdown logic

Signed-off-by: MOZGIII <[email protected]>

* Specify STOPSIGNAL at skaffold/docker/Dockerfile

Signed-off-by: MOZGIII <[email protected]>

* Set terminationGracePeriodSeconds at distribution/kubernetes/vector-namespaced.yaml

Signed-off-by: MOZGIII <[email protected]>

* Fix paths generation on Windows

Signed-off-by: MOZGIII <[email protected]>

* Add a to do to unignore instrumenting state tests

Signed-off-by: MOZGIII <[email protected]>

* Add Kubernetes section to the CONTRIBUTING.md

Signed-off-by: MOZGIII <[email protected]>

* Solve the issue with the config generation

Signed-off-by: MOZGIII <[email protected]>

* Better document the intent for the kubernetes_logs source

Signed-off-by: MOZGIII <[email protected]>

* Print vector version upon docker build at skaffold/docker/Dockerfile

This is so that we catch the error with inability to exec vector at
container build time, rather than at runtime.

Signed-off-by: MOZGIII <[email protected]>

* Add more build-time output at skaffold/docker/Dockerfile

Signed-off-by: MOZGIII <[email protected]>

* Add patchelf at skaffold/docker/Dockerfile

Signed-off-by: MOZGIII <[email protected]>

* Optimize commands order at skaffold/docker/Dockerfile for layer caching

Signed-off-by: MOZGIII <[email protected]>

* Add kubectl at CONTRIBUTING.md

Signed-off-by: MOZGIII <[email protected]>

* Add additional details at CONTRIBUTING.md

Signed-off-by: MOZGIII <[email protected]>

* Fix the data dir description

Signed-off-by: MOZGIII <[email protected]>

* Move transform utils to the source mod to avoid introducing global items

Signed-off-by: MOZGIII <[email protected]>

* Fix the typo at CONTRIBUTING.md

Signed-off-by: MOZGIII <[email protected]>

* Eliminate the ApiWatcher::invoke_boxed_stream

Signed-off-by: MOZGIII <[email protected]>

* Add code docs for join_host_port

Signed-off-by: MOZGIII <[email protected]>

* Add a lifecycle system to manage futures sanely and reliably

Signed-off-by: MOZGIII <[email protected]>

* Reorganized the mod and use clauses at src/sources/kubernetes_logs/mod.rs

Signed-off-by: MOZGIII <[email protected]>

* Switch fingerprinter to checksum

Signed-off-by: MOZGIII <[email protected]>

* Revert "Switch fingerprinter to checksum"

Apparently it breaks everything. E2E tests start failing.
We need a better way.

This reverts commit a4c2a7d.

Signed-off-by: MOZGIII <[email protected]>

* Invert the condition at Debounce::signal

Signed-off-by: MOZGIII <[email protected]>

* Fix typo at src/internal_events/kubernetes/instrumenting_state.rs

Signed-off-by: MOZGIII <[email protected]>

* Remove the rate limit from KubernetesLogsEventReceived

Signed-off-by: MOZGIII <[email protected]>

* Adjust the log style at KubernetesLogsEventReceived

Signed-off-by: MOZGIII <[email protected]>

* Move pod metadata annotation to access file path directly

Signed-off-by: MOZGIII <[email protected]>

* Add test to ensure MultiResponseDecoder doesn't leak memory

It was controversial, and so this test is added.

Signed-off-by: MOZGIII <[email protected]>

* Simplified the picker logic and added tests

Signed-off-by: MOZGIII <[email protected]>

* Add a special case for `\n` at the MultiResponseDecoder::finish

Signed-off-by: MOZGIII <[email protected]>

* Workaround for watch API failures

This should've been fixed by now, but it's still causing issues with older
k8s versions that we want to support.

Signed-off-by: MOZGIII <[email protected]>

* Add a long line test case for the CRI format parser

Signed-off-by: MOZGIII <[email protected]>

* Log the buffer state at response decoding error

Signed-off-by: MOZGIII <[email protected]>

* Handle data parsing errors as stream ends

Signed-off-by: MOZGIII <[email protected]>

* Ensure we only read logs under container name subdirectory

Signed-off-by: MOZGIII <[email protected]>

* Promote trace to an error at src/kubernetes/multi_response_decoder.rs

Signed-off-by: MOZGIII <[email protected]>

* Add a test case for bookmark parsing error

Signed-off-by: MOZGIII <[email protected]>

* Remove meaningless leading \n from the boorkmark test

Signed-off-by: MOZGIII <[email protected]>

* Update k8s-openapi to a branch with a fix for the bookmark parsing issue

Signed-off-by: MOZGIII <[email protected]>

* Switch kubernetes_logs source to Fingerprinter::FirstLineChecksum

Signed-off-by: MOZGIII <[email protected]>

* Revert "Handle data parsing errors as stream ends"

This reverts commit c0e1695.

Signed-off-by: MOZGIII <[email protected]>

* Switch k8s-openapi git repo to our fork

No actual changes there compared to upstream, just for preservation

Signed-off-by: MOZGIII <[email protected]>

* Use cargo patch instead of per-crate git spec for k8s-openapi

Signed-off-by: MOZGIII <[email protected]>

* Switch to the upstream of k8s-openapi

Signed-off-by: MOZGIII <[email protected]>

* Fix clippy offences

Signed-off-by: MOZGIII <[email protected]>

* Bump k8s-openapi and switch to crates.io version

Signed-off-by: MOZGIII <[email protected]>
Signed-off-by: Brian Menges <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants