-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add memory bandwidth exporter for AI workload. #379
Changes from all commits
87ff280
801eeb1
721b361
90b77b9
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,62 @@ | ||
# Copyright (C) 2024 Intel Corporation | ||
# SPDX-License-Identifier: Apache-2.0 | ||
|
||
run: | ||
timeout: 5m | ||
allow-parallel-runners: true | ||
|
||
issues: | ||
exclude-use-default: false | ||
exclude-rules: | ||
- path: _test.go | ||
linters: | ||
- errcheck | ||
|
||
linters: | ||
disable-all: true | ||
enable: | ||
- depguard | ||
- misspell | ||
- revive | ||
- dupl | ||
- errcheck | ||
- exportloopref | ||
- goconst | ||
- gocyclo | ||
- gofmt | ||
- goimports | ||
- gosimple | ||
- govet | ||
- ineffassign | ||
- lll | ||
- misspell | ||
- nakedret | ||
- prealloc | ||
- staticcheck | ||
- typecheck | ||
- unconvert | ||
- unparam | ||
- unused | ||
|
||
|
||
linters-settings: | ||
depguard: | ||
rules: | ||
no_exec_policy: | ||
files: | ||
- "!$test" | ||
deny: | ||
- pkg: "os/exec" | ||
desc: "Using os/exec to run sub processes it not allowed by policy" | ||
errcheck: | ||
exclude-functions: | ||
# Used in HTTP handlers, any error is handled by the server itself. | ||
- (net/http.ResponseWriter).Write | ||
# Never check for logger errors. | ||
- (github.com/go-kit/log.Logger).Log | ||
revive: | ||
rules: | ||
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md#unused-parameter | ||
- name: unused-parameter | ||
severity: warning | ||
disabled: true |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
FROM golang:1.22 AS builder | ||
ARG TARGETOS | ||
ARG TARGETARCH | ||
|
||
WORKDIR /workspace | ||
COPY . /workspace/ | ||
RUN go mod download | ||
|
||
RUN CGO_ENABLED=0 GOOS=${TARGETOS:-linux} GOARCH=${TARGETARCH} go build -a -o memory-bandwidth-exporter cmd/main.go | ||
|
||
FROM ubuntu:22.04 | ||
USER root | ||
WORKDIR / | ||
COPY --from=builder /workspace/memory-bandwidth-exporter . | ||
|
||
ENTRYPOINT ["bash", "-c"] | ||
CMD ["/memory-bandwidth-exporter --collector.node.name=${NODE_NAME} --collector.container.namespaceWhiteList=${NAMESPACE_WHITELIST}"] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,75 @@ | ||
GO_CMD := go | ||
DEBUG ?= 0 | ||
DOCKER_REGISTRY ?= docker.io/opea | ||
CONTAINER_TOOL ?= docker | ||
VERSION ?= latest | ||
|
||
ifeq ($(DEBUG),0) | ||
GOFLAGS=-ldflags="all=-s -w" | ||
endif | ||
|
||
MBE_IMG_NAME = memory-bandwidth-exporter:$(VERSION) | ||
MBE_IMG = ${DOCKER_REGISTRY}/${MBE_IMG_NAME} | ||
|
||
build: | ||
@mkdir -p bin | ||
@echo "Building memory-bandwidth-exporter binary..." | ||
$(GO_CMD) build -o bin/memory-bandwidth-exporter $(GOFLAGS) cmd/main.go | ||
|
||
docker.build: | ||
@echo "Building memory-bandwidth-exporter Docker image..." | ||
$(CONTAINER_TOOL) build -t ${MBE_IMG} -f Dockerfile . | ||
|
||
docker.push: | ||
@echo "Push memory-bandwidth-exporter Docker image..." | ||
$(CONTAINER_TOOL) push ${MBE_IMG} | ||
|
||
clean: | ||
@echo "Cleaning up..." | ||
rm -rf bin | ||
|
||
change_img: | ||
sed -i "s\MBE_IMG\${MBE_IMG}\g" config/manifests/memory-bandwidth-exporter.yaml | ||
|
||
test: | ||
@echo "Running tests..." | ||
$(GO_CMD) test ./... | ||
|
||
lint: golangci-lint ## Run golangci-lint linter & yamllint | ||
@echo "Running linters...${GOLANGCI_LINT}" | ||
$(GOLANGCI_LINT) run ./... | ||
|
||
lint-fix: golangci-lint ## Run golangci-lint linter and perform fixes | ||
$(GOLANGCI_LINT) run --fix ./... | ||
|
||
##@ Dependencies | ||
|
||
## Location to install dependencies to | ||
LOCALBIN ?= $(shell pwd)/bin | ||
$(LOCALBIN): | ||
mkdir -p $(LOCALBIN) | ||
|
||
## Tool Binaries | ||
GOLANGCI_LINT = $(LOCALBIN)/golangci-lint-$(GOLANGCI_LINT_VERSION) | ||
|
||
## Tool Versions | ||
GOLANGCI_LINT_VERSION ?= v1.59.1 | ||
|
||
.PHONY: golangci-lint | ||
golangci-lint: $(GOLANGCI_LINT) ## Download golangci-lint locally if necessary. | ||
$(GOLANGCI_LINT): $(LOCALBIN) | ||
$(call go-install-tool,$(GOLANGCI_LINT),github.com/golangci/golangci-lint/cmd/golangci-lint,${GOLANGCI_LINT_VERSION}) | ||
|
||
# go-install-tool will 'go install' any package with custom target and name of binary, if it doesn't exist | ||
# $1 - target path with name of binary (ideally with version) | ||
# $2 - package url which can be installed | ||
# $3 - specific version of package | ||
define go-install-tool | ||
@[ -f $(1) ] || { \ | ||
set -e; \ | ||
package=$(2)@$(3) ;\ | ||
echo "Downloading $${package}" ;\ | ||
GOBIN=$(LOCALBIN) go install $${package} ;\ | ||
mv "$$(echo "$(1)" | sed "s/-$(3)$$//")" $(1) ;\ | ||
} | ||
endef |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,151 @@ | ||
# memory bandwidth exporter | ||
|
||
Pod/container grained memory bandwidth exporter provides users memory bandwidth metrics of their running containers. The metrics include llc_occupancy, mbm_local_bytes, mbm_total_bytes, cpu utilization and memory usage, and the metrics have been processed. In addition to container-level metrics, it also provides class-level and socket-level metrics. Users can configure the list of metrics to be collected. It serves as an exporter which can be connected to Promethus-like obserbility tools. And it also can be used as a telementry provider. | ||
|
||
Memory bandwidth exporter makes use of state-of-the-art technologies like NRI to build a resource-efficient and well-maintained solution. This solution provides observability to memory bandwidth to OPEA micro-services. It lays the groundwork of better scaling and auto scaling of OPEA. It can also be deployed separately on end user environments, supporting any cases that memory bandwidth metrics are required. | ||
|
||
The memory bandwidth exporter currently only supports Intel platforms with RDT, and will fail on other platforms. We will add node feature discovery in the future. | ||
|
||
## Setup | ||
|
||
### Enable NRI in Containerd | ||
|
||
```sh | ||
# download containerd binary, containerd version v1.7.0 or higher is required | ||
$ wget https://github.com/containerd/containerd/releases/download/v1.7.0/containerd-1.7.0-linux-amd64.tar.gz | ||
|
||
# stop running containerd | ||
$ sudo systemctl stop containerd | ||
|
||
# replace old containerd | ||
$ sudo tar Cxzvf /usr/local containerd-1.7.0-linux-amd64.tar.gz | ||
|
||
# enable NRI in containerd | ||
# add an item in /etc/containerd/config.toml | ||
[plugins."io.containerd.nri.v1.nri"] | ||
disable = false | ||
disable_connections = false | ||
plugin_config_path = "/etc/containerd/certs.d" | ||
plugin_path = "/opt/nri/plugins" | ||
socket_path = "/var/run/nri/nri.sock" | ||
config_file = "/etc/nri/nri.conf" | ||
|
||
# restart containerd | ||
$ sudo systemctl start containerd | ||
$ sudo systemctl status containerd | ||
|
||
# test nri | ||
$ git clone https://github.com/containerd/nri | ||
$ cd nri | ||
$ make | ||
$ ./build/bin/logger -idx 00 | ||
``` | ||
|
||
### Enable RDT | ||
|
||
Mount resctrl to the directory `/sys/fs/resctrl`: | ||
|
||
```sh | ||
$ sudo mount -t resctrl resctrl /sys/fs/resctrl | ||
``` | ||
|
||
### Setup memory bandwidth exporter | ||
|
||
Before setup, you need to configure the runc hook: | ||
|
||
```sh | ||
$ ./config/config.sh | ||
``` | ||
|
||
#### How to build the binary and setup? | ||
|
||
```sh | ||
$ make build | ||
$ sudo ./bin/memory-bandwidth-exporter | ||
# e.g., sudo ./bin/memory-bandwidth-exporter --collector.node.name=<node_name> --collector.container.namespaceWhiteList="calico-apiserver,calico-system,kube-system,tigera-operator" | ||
|
||
# get memory bandwidth metrics | ||
$ curl http://localhost:9100/metrics | ||
``` | ||
|
||
#### How to build the docker image and setup? | ||
|
||
```sh | ||
$ make docker.build | ||
$ sudo docker run \ | ||
-e NODE_NAME=<node_name> \ | ||
-e NAMESPACE_WHITELIST="calico-apiserver,calico-system,kube-system,tigera-operator" \ | ||
--mount type=bind,source=/etc/containers/oci/hooks.d/,target=/etc/containers/oci/hooks.d/ \ | ||
--privileged \ | ||
--cgroupns=host \ | ||
--pid=host \ | ||
--mount type=bind,source=/usr/,target=/usr/ \ | ||
--mount type=bind,source=/sys/fs/resctrl/,target=/sys/fs/resctrl/ \ | ||
--mount type=bind,source=/var/run/nri/,target=/var/run/nri/ \ | ||
-d -p 9100:9100 \ | ||
--name=memory-bandwidth-exporter \ | ||
opea/memory-bandwidth-exporter:latest | ||
|
||
# get memory bandwidth metrics | ||
$ curl http://localhost:9100/metrics | ||
``` | ||
|
||
#### How to deploy on the K8s cluster? | ||
|
||
Build and push your image to the location specified by `MBE_IMG`, and apply manifest: | ||
|
||
```sh | ||
$ make docker.build docker.push MBE_IMG=<some-registry>/opea/memory-bandwidth-exporter:<tag> | ||
$ make change_img MBE_IMG=<some-registry>/opea/memory-bandwidth-exporter:<tag> | ||
# If namespace system does not exist, create it. | ||
$ kubectl create ns system | ||
$ kubectl apply -f config/manifests/memory-bandwidth-exporter.yaml | ||
``` | ||
|
||
Check the installation result: | ||
|
||
```sh | ||
kubectl get pods -n system | ||
NAME READY STATUS RESTARTS AGE | ||
memory-bandwidth-exporter-zxhdl 1/1 Running 0 3m | ||
``` | ||
|
||
get memory bandwidth metrics | ||
|
||
```sh | ||
$ curl http://<memory_bandwidth_exporter_container_ip>:9100/metrics | ||
``` | ||
|
||
#### How to delete binary? | ||
|
||
```sh | ||
$ make clean | ||
``` | ||
|
||
## More flags about memory bandwidth exporter | ||
|
||
There are some flags to help users better use memory bandwidth exporter: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we need to stop and restart if we want to change any of the flag values? Or is there an API on this service to set/update? Nice, I see a prometheus exporter - Push/pull There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, we need to stop and restart if we want to change any of the flag values. Thank you for your suggestion, we will implement it as a feature in the future. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We will add the feature of "dynamically updating flag parameters" in the issue#407. |
||
|
||
```sh | ||
-h, --[no-]help Show context-sensitive help (also try --help-long and --help-man). | ||
--collector.node.name="" Give node name. | ||
--collector.container.namespaceWhiteList="" Filter out containers whose namespaces belong to the namespace whitelist, namespaces separated by commas, like "xx,xx,xx". | ||
--collector.container.monTimes=10 Scan the pids of containers created before the exporter starts to prevent the loss of pids. | ||
--collector.container.metrics="all" Enable container collector metrics. | ||
--collector.class.metrics="none" Enable class collector metrics. | ||
--collector.node.metrics="none" Enable node collector metrics. | ||
--web.telemetry-path="/metrics" Path under which to expose metrics. | ||
--[no-]web.disable-exporter-metrics Exclude metrics about the exporter itself (promhttp_*, process_*, go_*). | ||
--web.max-requests=40 Maximum number of parallel scrape requests. Use 0 to disable. | ||
--runtime.gomaxprocs=1 The target number of CPUs Go will run on (GOMAXPROCS) ($GOMAXPROCS) | ||
--[no-]web.systemd-socket Use systemd socket activation listeners instead of port listeners (Linux only). | ||
--web.listen-address=:9100 ... Addresses on which to expose metrics and web interface. Repeatable for multiple addresses. | ||
--web.config.file="" Path to configuration file that can enable TLS or authentication. See: https://github.com/prometheus/exporter-toolkit/blob/master/docs/web-configuration.md | ||
--collector.interval=3s memory bandwidth exporter collect metrics interval | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. might we want to at some later stage monitor some containers more often than others, versus all at the same interval? Perhaps some heustic based approach .. if a workload is pretty steady monitor it less often. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for your suggestion, I will add this feature in issue#407. |
||
--NRIplugin.name="mb-nri-plugin" Plugin name to register to NRI | ||
--NRIplugin.idx="11" Plugin index to register to NRI | ||
--[no-]disableWatch Disable watching hook directories for new hooks | ||
--log.level=info Only log messages with the given severity or above. One of: [debug, info, warn, error] | ||
--log.format=logfmt Output format of log messages. One of: [logfmt, json] | ||
--[no-]version Show application version. | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it be a daemon set that runs on every node? Feels that way given you specify node name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it should be a daemon set that runs on every node, and the node name is a label of the metrics.