Skip to content

Commit

Permalink
feat(new source): Initial kubernetes_logs implementation (vectordot…
Browse files Browse the repository at this point in the history
…dev#2653)

* Add kubernetes-integration-tests feature

Signed-off-by: MOZGIII <[email protected]>

* Enable kubernetes tests

Signed-off-by: MOZGIII <[email protected]>

* Add skaffold for quick development iterations

Signed-off-by: MOZGIII <[email protected]>

* Add kubernetes mod and cargo feature

Signed-off-by: MOZGIII <[email protected]>

* Add an HTTP client tailored for k8s API in an in-cluster environment

Signed-off-by: MOZGIII <[email protected]>

* Add a decoder for chained k8s responses

Signed-off-by: MOZGIII <[email protected]>

* Add block_on_std to test_util

Signed-off-by: MOZGIII <[email protected]>

* Add tools for processing HTTP bodies as streams of k8s responses

Signed-off-by: MOZGIII <[email protected]>

* Add Watcher trait

Signed-off-by: MOZGIII <[email protected]>

* Add ApiWatcher implementation

Signed-off-by: MOZGIII <[email protected]>

* Add MockWatcher implementation

Signed-off-by: MOZGIII <[email protected]>

* Add a reflector implementation

Signed-off-by: MOZGIII <[email protected]>

* Add a placeholder for kubernetes logs source

Signed-off-by: MOZGIII <[email protected]>

* Add paths provider implementation based on pods state driven by reflector

Signed-off-by: MOZGIII <[email protected]>

* Add parser for k8s log formats

Signed-off-by: MOZGIII <[email protected]>

* Add partial events merger

Signed-off-by: MOZGIII <[email protected]>

* Add pod metadata annotator

Signed-off-by: MOZGIII <[email protected]>

* Add kubernetes logs source implementation

Signed-off-by: MOZGIII <[email protected]>

* Change reflector errors to be less restrictive

Signed-off-by: MOZGIII <[email protected]>

* Better error handling

Signed-off-by: MOZGIII <[email protected]>

* Improve error handling

Signed-off-by: MOZGIII <[email protected]>

* Better error handling for event pipeline

Signed-off-by: MOZGIII <[email protected]>

* Abstract API watcher around watch request builder

Signed-off-by: MOZGIII <[email protected]>

* Add assertions to the reflector internal state in-between the invocations

Signed-off-by: MOZGIII <[email protected]>

* Fix the comment at optional tranform mod

Signed-off-by: MOZGIII <[email protected]>

* Add a to do comment to make transform utils part of core

Signed-off-by: MOZGIII <[email protected]>

* Fix typo at kustomization.yaml

Signed-off-by: MOZGIII <[email protected]>

* More flexible interface at resource version

Signed-off-by: MOZGIII <[email protected]>

* Fix the mod comment at resource version

Signed-off-by: MOZGIII <[email protected]>

* Remove manual minikube init from scripts/skaffold.sh

Skaffold is capable of detecting minikube itself. It will also work with
Docker Desktop on Windows and macOS out of the box.

Signed-off-by: MOZGIII <[email protected]>

* Specify shell at scripts/minikube-docker-env.sh

Signed-off-by: MOZGIII <[email protected]>

* Switch scripts/copy-docker-image-to-minikube.sh to use scripts/minikube-docker-env.sh

Signed-off-by: MOZGIII <[email protected]>

* Switch to using device inodes for file fingerprinting

Signed-off-by: MOZGIII <[email protected]>

* Fix grammar at in-cluster config

Signed-off-by: MOZGIII <[email protected]>

* Add the variable that's missing to the error at in-cluster config

Signed-off-by: MOZGIII <[email protected]>

* Move in_cluster mod declaration to the top of the file

Signed-off-by: MOZGIII <[email protected]>

* Cut some ununsed code from src/kubernetes/resource_version.rs

Signed-off-by: MOZGIII <[email protected]>

* Fix typo at src/kubernetes/reflector.rs

Signed-off-by: MOZGIII <[email protected]>

* Move the file key to the top of the file

Signed-off-by: MOZGIII <[email protected]>

* Fix comment at parsers test

Signed-off-by: MOZGIII <[email protected]>

* Add a comment for Config::self_node_name

Signed-off-by: MOZGIII <[email protected]>

* Allow disabling automatic partial merge

Signed-off-by: MOZGIII <[email protected]>

* Allow customizing the fields names used by pod metadata annotator

Signed-off-by: MOZGIII <[email protected]>

* Abstract reflector around state

Adds an indirection layer while maintaining the same logic.

This unlocks:
- adding debounce to the evmap for advanced state management;
- adding removal delay to mitigate the race condition with pod being
  removed while having a huge log backlog;
Those were doable before, but at the cost of code clarity, reliability and
maintainability.

Signed-off-by: MOZGIII <[email protected]>

* Add support for delayed delete at reflector

Signed-off-by: MOZGIII <[email protected]>

* Reimplement the tests to add delayed deletion testing capability

New channel-based mocks implementation allows to properly eliminate race
conditions and achieve the complete determinism and reliability of the
test scenario.

Signed-off-by: MOZGIII <[email protected]>

* Improve the request preparation code at kubernetes::client::Client

Signed-off-by: MOZGIII <[email protected]>

* Add reexports at src/kubernetes/mod.rs

Signed-off-by: MOZGIII <[email protected]>

* Adjust and use watcher error constructors

Signed-off-by: MOZGIII <[email protected]>

* Eliminate unused transform

Signed-off-by: MOZGIII <[email protected]>

* Add a test for stream error behavior

Signed-off-by: MOZGIII <[email protected]>

* Add tests and derives at transform::util::pick

Signed-off-by: MOZGIII <[email protected]>

* Require Watcher::Stream to be Send

Signed-off-by: MOZGIII <[email protected]>

* Add instrumentation

Signed-off-by: MOZGIII <[email protected]>

* Add state maintenance and move delayed delete logic into to a state wrapper

This unlocks further complicating the logic around state without making
reflector overcomplicated. This better aligns with the goal of building
composable and testable loosely coupled components.

Signed-off-by: MOZGIII <[email protected]>

* Ignore instrumenting state tests

There no way to assert individual emits, and asserting metrics directly
causes issues:
- these tests break the internal tests at the metrics implementation
  itself, since we end up initializing the metrics controller twice;
- testing metrics introduces unintended coupling between subsystems,
  ideally we only need to assert that we emit, but avoid assumptions on
  what the results of that emit are.

Signed-off-by: MOZGIII <[email protected]>

* Clean up reflector tests

Signed-off-by: MOZGIII <[email protected]>

* Add flush debouncing to the evmap state

Signed-off-by: MOZGIII <[email protected]>

* Proper delay control at main test flow

Signed-off-by: MOZGIII <[email protected]>

* Add evmap tests with and without debounce

Signed-off-by: MOZGIII <[email protected]>

* Fix a typo

Signed-off-by: MOZGIII <[email protected]>

* Document the controversial join_host_port

Signed-off-by: MOZGIII <[email protected]>

* Improve instrumenting watcher events

Signed-off-by: MOZGIII <[email protected]>

* Improve api watcher events

Signed-off-by: MOZGIII <[email protected]>

* Rename init to new

Signed-off-by: MOZGIII <[email protected]>

* Use Strings at parser tests

Signed-off-by: MOZGIII <[email protected]>

* Corrected partial message detection hueristics at docker parser

Signed-off-by: MOZGIII <[email protected]>

* Hint for where's what at parser test case

Signed-off-by: MOZGIII <[email protected]>

* Bump base image at skaffold/docker/Dockerfile to debian:bullseye-slim

Without it local builds don't work host OS' with glibc 2.29

Signed-off-by: MOZGIII <[email protected]>

* Better script layout at skaffold/docker/Dockerfile

Signed-off-by: MOZGIII <[email protected]>

* Convert an annotation failure warn log to internal event

Signed-off-by: MOZGIII <[email protected]>

* Correct the shutdown logic

Signed-off-by: MOZGIII <[email protected]>

* Specify STOPSIGNAL at skaffold/docker/Dockerfile

Signed-off-by: MOZGIII <[email protected]>

* Set terminationGracePeriodSeconds at distribution/kubernetes/vector-namespaced.yaml

Signed-off-by: MOZGIII <[email protected]>

* Fix paths generation on Windows

Signed-off-by: MOZGIII <[email protected]>

* Add a to do to unignore instrumenting state tests

Signed-off-by: MOZGIII <[email protected]>

* Add Kubernetes section to the CONTRIBUTING.md

Signed-off-by: MOZGIII <[email protected]>

* Solve the issue with the config generation

Signed-off-by: MOZGIII <[email protected]>

* Better document the intent for the kubernetes_logs source

Signed-off-by: MOZGIII <[email protected]>

* Print vector version upon docker build at skaffold/docker/Dockerfile

This is so that we catch the error with inability to exec vector at
container build time, rather than at runtime.

Signed-off-by: MOZGIII <[email protected]>

* Add more build-time output at skaffold/docker/Dockerfile

Signed-off-by: MOZGIII <[email protected]>

* Add patchelf at skaffold/docker/Dockerfile

Signed-off-by: MOZGIII <[email protected]>

* Optimize commands order at skaffold/docker/Dockerfile for layer caching

Signed-off-by: MOZGIII <[email protected]>

* Add kubectl at CONTRIBUTING.md

Signed-off-by: MOZGIII <[email protected]>

* Add additional details at CONTRIBUTING.md

Signed-off-by: MOZGIII <[email protected]>

* Fix the data dir description

Signed-off-by: MOZGIII <[email protected]>

* Move transform utils to the source mod to avoid introducing global items

Signed-off-by: MOZGIII <[email protected]>

* Fix the typo at CONTRIBUTING.md

Signed-off-by: MOZGIII <[email protected]>

* Eliminate the ApiWatcher::invoke_boxed_stream

Signed-off-by: MOZGIII <[email protected]>

* Add code docs for join_host_port

Signed-off-by: MOZGIII <[email protected]>

* Add a lifecycle system to manage futures sanely and reliably

Signed-off-by: MOZGIII <[email protected]>

* Reorganized the mod and use clauses at src/sources/kubernetes_logs/mod.rs

Signed-off-by: MOZGIII <[email protected]>

* Switch fingerprinter to checksum

Signed-off-by: MOZGIII <[email protected]>

* Revert "Switch fingerprinter to checksum"

Apparently it breaks everything. E2E tests start failing.
We need a better way.

This reverts commit a4c2a7d.

Signed-off-by: MOZGIII <[email protected]>

* Invert the condition at Debounce::signal

Signed-off-by: MOZGIII <[email protected]>

* Fix typo at src/internal_events/kubernetes/instrumenting_state.rs

Signed-off-by: MOZGIII <[email protected]>

* Remove the rate limit from KubernetesLogsEventReceived

Signed-off-by: MOZGIII <[email protected]>

* Adjust the log style at KubernetesLogsEventReceived

Signed-off-by: MOZGIII <[email protected]>

* Move pod metadata annotation to access file path directly

Signed-off-by: MOZGIII <[email protected]>

* Add test to ensure MultiResponseDecoder doesn't leak memory

It was controversial, and so this test is added.

Signed-off-by: MOZGIII <[email protected]>

* Simplified the picker logic and added tests

Signed-off-by: MOZGIII <[email protected]>

* Add a special case for `\n` at the MultiResponseDecoder::finish

Signed-off-by: MOZGIII <[email protected]>

* Workaround for watch API failures

This should've been fixed by now, but it's still causing issues with older
k8s versions that we want to support.

Signed-off-by: MOZGIII <[email protected]>

* Add a long line test case for the CRI format parser

Signed-off-by: MOZGIII <[email protected]>

* Log the buffer state at response decoding error

Signed-off-by: MOZGIII <[email protected]>

* Handle data parsing errors as stream ends

Signed-off-by: MOZGIII <[email protected]>

* Ensure we only read logs under container name subdirectory

Signed-off-by: MOZGIII <[email protected]>

* Promote trace to an error at src/kubernetes/multi_response_decoder.rs

Signed-off-by: MOZGIII <[email protected]>

* Add a test case for bookmark parsing error

Signed-off-by: MOZGIII <[email protected]>

* Remove meaningless leading \n from the boorkmark test

Signed-off-by: MOZGIII <[email protected]>

* Update k8s-openapi to a branch with a fix for the bookmark parsing issue

Signed-off-by: MOZGIII <[email protected]>

* Switch kubernetes_logs source to Fingerprinter::FirstLineChecksum

Signed-off-by: MOZGIII <[email protected]>

* Revert "Handle data parsing errors as stream ends"

This reverts commit c0e1695.

Signed-off-by: MOZGIII <[email protected]>

* Switch k8s-openapi git repo to our fork

No actual changes there compared to upstream, just for preservation

Signed-off-by: MOZGIII <[email protected]>

* Use cargo patch instead of per-crate git spec for k8s-openapi

Signed-off-by: MOZGIII <[email protected]>

* Switch to the upstream of k8s-openapi

Signed-off-by: MOZGIII <[email protected]>

* Fix clippy offences

Signed-off-by: MOZGIII <[email protected]>

* Bump k8s-openapi and switch to crates.io version

Signed-off-by: MOZGIII <[email protected]>
Signed-off-by: Brian Menges <[email protected]>
  • Loading branch information
MOZGIII authored and Brian Menges committed Dec 9, 2020
1 parent f32ad73 commit efb1f24
Show file tree
Hide file tree
Showing 60 changed files with 6,078 additions and 25 deletions.
80 changes: 80 additions & 0 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,7 @@ expanding into more specifics.
1. [Tips and Tricks](#tips-and-tricks)
1. [Benchmarking](#benchmarking)
1. [Profiling](#profiling)
1. [Kubernetes](#kubernetes)
1. [Humans](#humans)
1. [Documentation](#documentation)
1. [Changelog](#changelog)
Expand Down Expand Up @@ -547,6 +548,85 @@ cat stacks.folded | inferno-flamegraph > flamegraph.svg
And that's it! You now have a flamegraph SVG file that can be opened and
navigated in your favorite web browser.

### Kubernetes

There is a special flow for when you develop portions of Vector that are
designed to work with Kubernetes, like `kubernetes_logs` source or the
`deployment/kubernetes/*.yaml` configs.

This flow facilitates building Vector and deploying it into a cluster.

#### Requirements

There are some extra requirements besides what you'd normally need to work on
Vector:

* `linux` system (create an issue if you want to work with another OS and we'll
help);
* [`skaffold`](https://skaffold.dev/)
* [`docker`](https://www.docker.com/)
* [`kubectl`](https://kubernetes.io/docs/tasks/tools/install-kubectl/)
* [`kustomize`](https://kustomize.io/)
* [`minikube`](https://minikube.sigs.k8s.io/)-powered or other k8s cluster
* [`cargo watch`](https://github.com/passcod/cargo-watch)

#### The dev flow

Once you have the requirements, use the `scripts/skaffold.sh dev` command.

That's it, just one command should take care of everything!

It will

1. build the `vector` binary in development mode,
2. build a docker image from this binary via `skaffold/docker/Dockerfile`,
3. deploy `vector` into the Kubernetes cluster at your current kubectl context
using the built docker image and a mix of our production deployment
configuration from the `distribution/kubernetes/*.yaml` and the special
dev-flow configuration at `skaffold/manifests/*.yaml`; see
`kustomization.yaml` for the exact specification.

As the result of invoking the `scripts/skaffold.sh dev`, you should see
a `skaffold` process running on your local machine, printing the logs from the
deployed `vector` instance.

To stop the process, press `Ctrl+C`, and wait for `skaffold` to clean up
the cluster state and exit.

`scripts/skaffold.sh` wraps `skaffold`, you can use other `skaffold` subcommands
if it fits you better.

#### Troubleshooting

You might need to tweak `skaffold`, here are some hints:

* `skaffold` will try to detect whether a local cluster is used; if a local
cluster is used, `skaffold` won't push the docker images it builds to a
registry.
See [this page](https://skaffold.dev/docs/environment/local-cluster/)
for how you can troubleshoot and tweak this behavior.

* `skaffold` can rewrite the image name so that you don't try to push a docker
image to a repo that you don't have access to.
See [this page](https://skaffold.dev/docs/environment/image-registries/)
for more info.

* For the rest of the `skaffold` tweaks you might want to apply check out
[this page](https://skaffold.dev/docs/environment/).

#### Going through the dev flow manually

Is some cases `skaffold` may not work. It's possible to go through the dev flow
manually, without `skaffold`.

One of the important thing `skaffold` does is it patches the configuration to
tie things together. If you want to go without it, you'll have to take care of
that yourself, thus some additional knowledge of Kubernetes inner workings is
required.

Essentially, the steps you have to take to deploy manually are the same that
`skaffold` will perform, and they're outlined at the previous section.

## Humans

After making your change, you'll want to prepare it for Vector's users
Expand Down
86 changes: 71 additions & 15 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

9 changes: 9 additions & 0 deletions Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -142,6 +142,7 @@ strip-ansi-escapes = { version = "0.1.0"}
colored = "1.9"
warp = { package = "warp", version = "0.2", default-features = false, optional = true }
evmap = { version = "7", features = ["bytes"], optional = true }
evmap10 = { package = "evmap", version = "10", features = ["bytes"], optional = true }
logfmt = { version = "0.0.2", optional = true }
notify = "4.0.14"
once_cell = "1.3"
Expand All @@ -152,6 +153,7 @@ pulsar = { version = "1.0.0", default-features = false, features = ["tokio-runti
task-compat = "0.1"
cidr-utils = "0.4.2"
pin-project = "0.4.22"
k8s-openapi = { version = "0.9", features = ["v1_15"], optional = true }

# For WASM
vector-wasm = { path = "lib/vector-wasm", optional = true }
Expand Down Expand Up @@ -228,6 +230,10 @@ leveldb-cmake = ["leveldb", "leveldb/leveldb-sys-3"]
wasm = ["lucetc", "lucet-runtime", "lucet-wasi", "vector-wasm", "anyhow"]
wasm-timings = ["wasm"]

# Enables kubernetes dependencies and shared code. Kubernetes-related sources,
# transforms and sinks should depend on this feature.
kubernetes = ["k8s-openapi", "evmap10"]

# Sources
sources = [
"sources-docker",
Expand All @@ -246,6 +252,7 @@ sources = [
"sources-syslog",
"sources-tls",
"sources-vector",
"sources-kubernetes-logs",
]
sources-docker = ["bollard"]
sources-file = ["bytesize"]
Expand All @@ -263,6 +270,7 @@ sources-stdin = ["bytesize"]
sources-syslog = ["sources-socket", "syslog_loose"]
sources-tls = ["sources-http", "sources-logplex", "sources-socket", "sources-splunk_hec"]
sources-vector = ["sources-socket"]
sources-kubernetes-logs = ["kubernetes", "transforms-merge", "transforms-json_parser", "transforms-regex_parser"]

# Transforms
transforms = [
Expand Down Expand Up @@ -427,6 +435,7 @@ kafka-integration-tests = ["sources-kafka", "sinks-kafka"]
loki-integration-tests = ["sinks-loki"]
pulsar-integration-tests = ["sinks-pulsar"]
splunk-integration-tests = ["sinks-splunk_hec", "warp"]
kubernetes-integration-tests = ["sources-kubernetes-logs"]

shutdown-tests = ["sources","sinks-console","sinks-prometheus","sinks-blackhole","unix","rdkafka","transforms-log_to_metric","transforms-lua"]
disable-resolv-conf = []
Expand Down
9 changes: 5 additions & 4 deletions distribution/kubernetes/vector-namespaced.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -7,12 +7,12 @@ data:
# Configuration for vector.
# Docs: https://vector.dev/docs/
# Configure the controlled by the deployment.
# Data dir is location controlled at the `DaemonSet`.
data_dir = "/vector-data-dir"
# Ingest logs from Kubernetes.
[sources.kubernetes]
type = "kubernetes"
[sources.kubernetes_logs]
type = "kubernetes_logs"
---
apiVersion: apps/v1
kind: DaemonSet
Expand All @@ -28,11 +28,11 @@ spec:
metadata:
labels:
name: vector
vector.dev/exclude: "true"
spec:
containers:
- name: vector
image: timberio/vector:latest-alpine
imagePullPolicy: Always
args:
- --config
- /etc/vector/*.toml
Expand Down Expand Up @@ -61,6 +61,7 @@ spec:
- name: config-dir
mountPath: /etc/vector/
readOnly: true
terminationGracePeriodSeconds: 60
tolerations:
# This toleration is to have the daemonset runnable on master nodes.
# Remove it if your masters can't run pods.
Expand Down
10 changes: 10 additions & 0 deletions kustomization.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
# This is a part of our skaffold setup for development.
# Do not use in production.

namespace: vector

resources:
- distribution/kubernetes/vector-global.yaml
- skaffold/manifests/namespace.yaml
- skaffold/manifests/config.yaml
- distribution/kubernetes/vector-namespaced.yaml
3 changes: 2 additions & 1 deletion scripts/copy-docker-image-to-minikube.sh
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,8 @@ docker save "${IMAGES[@]}" | gzip >"$IMAGES_ARCHIVE"
# Start a subshell to preserve the env state.
(
# Switch to minikube docker.
eval "$(minikube --shell bash docker-env)"
# shellcheck source=minikube-docker-env.sh disable=SC1091
. scripts/minikube-docker-env.sh

# Load images.
docker load -i "$IMAGES_ARCHIVE"
Expand Down
9 changes: 9 additions & 0 deletions scripts/minikube-docker-env.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
#!/usr/bin/env bash
set -euo pipefail

if ! COMMANDS="$(minikube --shell bash docker-env)"; then
echo "Unable to obtain docker env from minikube; is minikube started?" >&2
exit 7
fi

eval "$COMMANDS"
Loading

0 comments on commit efb1f24

Please sign in to comment.