
Add Kepler installation guide #1

Merged: 7 commits merged on Apr 28, 2023.

Al-HusseinHameedJasim (Collaborator):

No description provided.

nikimanoledaki (Owner) left a comment:

This is great, thank you for the thorough write-up 👏
Left some Qs, as I'm having some issues (similar to the ones I had before) when running this on macOS 🤔

docs/02-kepler-installation-guide.md (resolved)
docs/02-kepler-installation-guide.md (outdated, resolved)
Comment on lines +68 to +71:

```shell
# check Kepler's logs (replace <xxxxx> with the exporter pod's suffix)
kubectl logs kepler-exporter-<xxxxx> -n kepler -f
# get Kepler-related metrics
kubectl exec -ti -n kepler daemonset/kepler-exporter -- bash -c "curl localhost:9102/metrics"
```
nikimanoledaki (Owner), Apr 13, 2023:

The steps don't work with my minikube configuration on macOS with the Docker driver.

Exporter logs:

```text
k logs kepler-exporter-5l6vg -n kepler -f
I0413 09:23:19.309510       1 gpu.go:43] Failed to init nvml, err: could not init nvml: error opening libnvidia-ml.so.1: libnvidia-ml.so.1: cannot open shared object file: No such file or directory
I0413 09:23:19.311530       1 exporter.go:149] Kepler running on version: eaa17c7
I0413 09:23:19.311579       1 config.go:169] using gCgroup ID in the BPF program: true
I0413 09:23:19.311617       1 config.go:170] kernel version: 5.15
I0413 09:23:19.311776       1 rapl_msr_util.go:129] input/output error
I0413 09:23:19.311863       1 power.go:64] Not able to obtain power, use estimate method
modprobe: FATAL: Module kheaders not found in directory /lib/modules/5.15.49-linuxkit
chdir(/lib/modules/5.15.49-linuxkit/build): No such file or directory
I0413 09:23:20.779625       1 bcc_attacher.go:64] failed to attach the bpf program: <nil>
I0413 09:23:20.779689       1 bcc_attacher.go:124] failed to attach perf module with options [-DNUM_CPUS=6 -DSET_GROUP_ID]: failed to attach the bpf program: <nil>, not able to load eBPF modules
I0413 09:23:20.779714       1 exporter.go:184] failed to start : failed to attach bpf assets: failed to attach the bpf program: <nil>
I0413 09:23:20.779960       1 exporter.go:209] Started Kepler in 1.468330908s
```

It looks like it fails to find the kernel headers when linuxkit is used as the node image. In your setup, did minikube set up the nodes using this image or a different one?
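To spot these fallbacks quickly, the exporter log can be scanned for the messages seen above. This is a minimal sketch: `kepler_log_diagnose` is a hypothetical Bash function (not part of Kepler) that only matches the exact message strings in this log, which may differ between Kepler versions.

```shell
# Hypothetical helper (not part of Kepler): reads exporter logs on stdin and
# prints one line per known fallback/failure message it finds.
# Requires Bash (uses here-strings).
# Usage: kubectl logs kepler-exporter-<xxxxx> -n kepler | kepler_log_diagnose
kepler_log_diagnose() {
  local log found=0
  log="$(cat)"
  if grep -q 'Module kheaders not found' <<<"$log"; then
    echo "kernel headers: kheaders module not found"
    found=1
  fi
  if grep -q 'failed to attach the bpf program' <<<"$log"; then
    echo "eBPF: failed to attach the BPF program"
    found=1
  fi
  if grep -q 'use estimate method' <<<"$log"; then
    echo "power: not obtainable, estimate method in use"
    found=1
  fi
  if grep -q 'Failed to init nvml' <<<"$log"; then
    echo "GPU: NVML not available"
    found=1
  fi
  if [ "$found" -eq 0 ]; then
    echo "no known issues found"
  fi
}
```

The checks run in a fixed order, so the output lists issues in that order rather than in log order.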

The Kepler metrics endpoint could not be reached:

```text
curl http://localhost:9102/metrics

curl: (52) Empty reply from server
```

Server logs:

```text
Forwarding from 127.0.0.1:9102 -> 9102
Forwarding from [::1]:9102 -> 9102
Handling connection for 9102
E0413 11:27:39.880892    4870 portforward.go:409] an error occurred forwarding 9102 -> 9102: error forwarding port 9102 to pod 61a3c66c5ebc3e82d48151386d7d26cc37e7dbd9d971593e5f792bce305b0920, uid : exit status 1: 2023/04/13 09:27:39 socat[182461] E connect(5, AF=2 127.0.0.1:9102, 16): Connection refused
error: lost connection to pod
```

I will also try with a VM-based driver in case anything changes and get back to you!

Al-HusseinHameedJasim (Collaborator, Author):

> This is great, thank you for the thorough write-up 👏 Left some Qs as I'm having some issues (similar to the ones I had before) when running this on MacOs 🤔

Thank you! :) It's my pleasure!
Let's try to make it work 🛠️

nikimanoledaki (Owner), Apr 23, 2023:

I managed to get Kepler working with `driver=hyperkit`! 🎉
I will add documentation for macOS users in a separate PR.

VirtualBox was not an option because it breaks with the latest macOS upgrade: kubernetes/minikube#15274

What do you think about the fact that we will be using different drivers? 🤔 It's good to have a dev environment option available for macOS. I will be home on Wednesday and will be able to run things on my Linux machine if needed. Maybe we can cross-reference the tests in both environments. Either way, I think it's good to start getting some numbers, and we can be upfront about the different environments we used. This is more of a POC than a scientific paper. And ultimately, the real test would be to run these tests in a consistent cloud environment rather than locally, right? :)

nikimanoledaki (Owner):

Argh. No. The node energy data is 0, so it's not being read properly. 🤔

```text
# HELP kepler_node_energy_stat Several labeled node metrics
# TYPE kepler_node_energy_stat counter
kepler_node_energy_stat{cpu_architecture="Sky Lake",node_block_devices_used="0",node_curr_bytes_read="0",node_curr_bytes_writes="0",node_curr_cache_miss="0",node_curr_container_cpu_usage_seconds_total="0",node_curr_container_memory_working_set_bytes="0",node_curr_cpu_cycles="0",node_curr_cpu_instr="0",node_curr_cpu_time="0",node_curr_energy_in_core_joule="0",node_curr_energy_in_dram_joule="0",node_curr_energy_in_gpu_joule="0",node_curr_energy_in_other_joule="0",node_curr_energy_in_pkg_joule="0",node_curr_energy_in_uncore_joule="0",node_name="minikube"} 0
```
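A quick way to check for this condition: the hypothetical Bash function below (not part of Kepler) reads Prometheus text-format metrics on stdin, prints the `node_curr_energy_in_*_joule` labels of `kepler_node_energy_stat`, and returns 0 when they are all `"0"`. It assumes the label layout shown above, which may change between Kepler versions.

```shell
# Hypothetical helper (not part of Kepler).
# Exit status: 0 = all node energy labels are "0",
#              1 = at least one is non-zero,
#              2 = no such labels found in the input.
node_energy_all_zero() {
  local labels nonzero
  # Keep only the kepler_node_energy_stat sample and extract its energy labels.
  labels="$(grep '^kepler_node_energy_stat{' | grep -o 'node_curr_energy_in_[a-z]*_joule="[^"]*"')"
  if [ -z "$labels" ]; then
    echo "no kepler_node_energy_stat energy labels found" >&2
    return 2
  fi
  echo "$labels"
  # Any label whose value is not exactly "0" counts as non-zero.
  nonzero="$(grep -v '="0"$' <<<"$labels")"
  if [ -z "$nonzero" ]; then
    echo "all node energy labels are 0"
    return 0
  fi
  return 1
}
```

Usage, assuming the exporter DaemonSet from the guide: `kubectl exec -n kepler daemonset/kepler-exporter -- bash -c "curl -s localhost:9102/metrics" | node_energy_all_zero`. The `-ti` flags are dropped here so the output can be piped.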

Al-HusseinHameedJasim (Collaborator, Author):

That's right!
I think for now, the more variants of drivers, OSs, etc., the better for cross-checking and validating results.
I agree, the best option would be to spin up a cluster in the cloud for the tests.

If it doesn't work on macOS, I think we can still use Linux for the tests.

I have opened another PR for the first experiment scenario, which involves deploying two applications (one at a time) using both Argo CD and Flux CD, in addition to performing a rolling update and a rollback.

I am thinking of using the following query:

```promql
sum(rate(kepler_container_joules_total{container_namespace=~"argocd|flux-system"}[1m])) by (container_namespace)
```
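Since `kepler_container_joules_total` is a counter in joules, `rate(...[1m])` yields joules per second, i.e. watts, and `sum ... by (container_namespace)` adds those per-container rates for each namespace. The sketch below illustrates only the per-series arithmetic: `avg_watts` is a hypothetical helper that divides the counter increase by the elapsed time between two samples, whereas Prometheus's `rate()` additionally handles counter resets and extrapolates to the window boundaries.

```shell
# Simplified illustration, not how Prometheus implements rate():
# average power in watts between two samples of a joules counter.
# Arguments: joules_1 time_1 joules_2 time_2 (times in seconds).
avg_watts() {
  awk -v j1="$1" -v t1="$2" -v j2="$3" -v t2="$4" 'BEGIN {
    dt = t2 - t1
    dj = j2 - j1
    if (dt <= 0) { print "second sample must be later than the first" > "/dev/stderr"; exit 1 }
    if (dj < 0)  { print "counter reset: not handled in this sketch" > "/dev/stderr"; exit 1 }
    printf "%.2f\n", dj / dt
  }'
}

# 1200 J at t=0 s and 1290 J at t=60 s: 90 J / 60 s, prints 1.50 (watts)
avg_watts 1200 0 1290 60
```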

```shell
git clone https://github.com/sustainable-computing-io/kepler.git -b v0.4
cd kepler/
# A ServiceMonitor object for the Kepler exporter is required so that Prometheus scrapes the Kepler exporter endpoints
make build-manifest OPTS="PROMETHEUS_DEPLOY"
```

How did you get `make` working on macOS? I tried to install it via brew, but it doesn't work for me.

nikimanoledaki (Owner):

Strange, `brew install make` should do it. 🤔 Did you get any errors?
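One possible explanation, assuming a Homebrew setup: Homebrew's `make` formula installs GNU Make as `gmake` rather than `make` (unless its `gnubin` directory is added to `PATH`), so `make` may still resolve to a different binary. A hypothetical helper `pick_make` (not part of the Kepler repo) that prefers `gmake` when it is installed could look like this:

```shell
# Hypothetical helper: print "gmake" if Homebrew's GNU Make is on PATH,
# otherwise fall back to plain "make".
pick_make() {
  if command -v gmake >/dev/null 2>&1; then
    echo gmake
  else
    echo make
  fi
}

# Usage, from inside the cloned kepler/ directory:
#   "$(pick_make)" build-manifest OPTS="PROMETHEUS_DEPLOY"
```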

nikimanoledaki (Owner) left a comment:

This worked for me on Arch Linux 🙌
Running it on macOS is still an issue, unfortunately 😢 but let's merge this for now and deal with macOS separately.

nikimanoledaki merged commit 7f4484b into nikimanoledaki:main on Apr 28, 2023.