Add Kepler installation guide #1
This is great, thank you for the thorough write-up 👏
Left some questions as I'm having some issues (similar to the ones I had before) when running this on macOS 🤔
# check Kepler's logs
kubectl logs kepler-exporter-<xxxxx> -n kepler -f
# get Kepler-related metrics
kubectl exec -ti -n kepler daemonset/kepler-exporter -- bash -c "curl localhost:9102/metrics"
The steps don't work with my minikube configuration on macOS with the Docker driver.
Exporter logs:
k logs kepler-exporter-5l6vg -n kepler -f
I0413 09:23:19.309510 1 gpu.go:43] Failed to init nvml, err: could not init nvml: error opening libnvidia-ml.so.1: libnvidia-ml.so.1: cannot open shared object file: No such file or directory
I0413 09:23:19.311530 1 exporter.go:149] Kepler running on version: eaa17c7
I0413 09:23:19.311579 1 config.go:169] using gCgroup ID in the BPF program: true
I0413 09:23:19.311617 1 config.go:170] kernel version: 5.15
I0413 09:23:19.311776 1 rapl_msr_util.go:129] input/output error
I0413 09:23:19.311863 1 power.go:64] Not able to obtain power, use estimate method
modprobe: FATAL: Module kheaders not found in directory /lib/modules/5.15.49-linuxkit
chdir(/lib/modules/5.15.49-linuxkit/build): No such file or directory
I0413 09:23:20.779625 1 bcc_attacher.go:64] failed to attach the bpf program: <nil>
I0413 09:23:20.779689 1 bcc_attacher.go:124] failed to attach perf module with options [-DNUM_CPUS=6 -DSET_GROUP_ID]: failed to attach the bpf program: <nil>, not able to load eBPF modules
I0413 09:23:20.779714 1 exporter.go:184] failed to start : failed to attach bpf assets: failed to attach the bpf program: <nil>
I0413 09:23:20.779960 1 exporter.go:209] Started Kepler in 1.468330908s
It looks like it fails to find the kernel headers with linuxkit used as the node image. In your setup, did minikube set up the nodes using this image or a different one?
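To make that reading of the log easier to repeat on another driver, here is a minimal sketch that scans exporter output for the failure messages seen above. The substrings are copied from the log in this comment; the `SIGNATURES` table, the `diagnose` helper, and the hint wording are hypothetical and only reflect the interpretation in this thread, not official Kepler documentation.

```python
from typing import List

# Substrings copied verbatim from the exporter log in this thread.
# The hints are this thread's interpretation (an assumption), not Kepler docs.
SIGNATURES = {
    "Module kheaders not found": "kernel headers module missing in the node image",
    "failed to attach the bpf program": "eBPF program could not be attached",
    "Not able to obtain power, use estimate method": "no direct power reading, estimates in use",
}


def diagnose(log_text: str) -> List[str]:
    """Return the hints whose signature appears in the log, in table order."""
    return [hint for needle, hint in SIGNATURES.items() if needle in log_text]
```

One way to use it is to save the output of `kubectl logs kepler-exporter-<xxxxx> -n kepler` to a file and pass its contents to `diagnose`; an empty list only means none of these three messages appeared, not that the exporter is healthy.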
The Kepler metrics endpoint could not be reached:
curl http://localhost:9102/metrics
curl: (52) Empty reply from server
server logs:
Forwarding from 127.0.0.1:9102 -> 9102
Forwarding from [::1]:9102 -> 9102
Handling connection for 9102
E0413 11:27:39.880892 4870 portforward.go:409] an error occurred forwarding 9102 -> 9102: error forwarding port 9102 to pod 61a3c66c5ebc3e82d48151386d7d26cc37e7dbd9d971593e5f792bce305b0920, uid : exit status 1: 2023/04/13 09:27:39 socat[182461] E connect(5, AF=2 127.0.0.1:9102, 16): Connection refused
error: lost connection to pod
I will also try with a VM-based driver in case anything changes and get back to you!
This is great, thank you for the thorough write-up 👏 Left some Qs as I'm having some issues (similar to the ones I had before) when running this on MacOs 🤔
Thank you! :) It's my pleasure!
Let's try to make it work 🛠️
I managed to get Kepler working with driver=hyperkit! 🎉
I will add documentation for macOS users in a separate PR.
VirtualBox was not an option because it breaks with the latest macOS upgrade: kubernetes/minikube#15274
What do you think about the fact that we will be using different drivers? 🤔 It's good to have a dev environment option available for macOS. I will be home on Wednesday and will be able to run things on my Linux machine if needed. Maybe we can try cross-referencing the tests in both environments. Either way, I think it's good to start getting some numbers, and we can be upfront about the different environments that we used. This is more of a POC than a scientific paper. And the ultimate test would be to run these tests in a consistent environment in the cloud rather than locally, right? :)
Argh. No. The energy data for the node is 0 so it's not reading it properly. 🤔
# HELP kepler_node_energy_stat Several labeled node metrics
# TYPE kepler_node_energy_stat counter
kepler_node_energy_stat{cpu_architecture="Sky Lake",node_block_devices_used="0",node_curr_bytes_read="0",node_curr_bytes_writes="0",node_curr_cache_miss="0",node_curr_container_cpu_usage_seconds_total="0",node_curr_container_memory_working_set_bytes="0",node_curr_cpu_cycles="0",node_curr_cpu_instr="0",node_curr_cpu_time="0",node_curr_energy_in_core_joule="0",node_curr_energy_in_dram_joule="0",node_curr_energy_in_gpu_joule="0",node_curr_energy_in_other_joule="0",node_curr_energy_in_pkg_joule="0",node_curr_energy_in_uncore_joule="0",node_name="minikube"} 0
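A quick way to confirm this symptom without reading the whole line by eye is to pull out the `node_curr_energy_in_*_joule` labels and check whether they are all zero. The sketch below assumes the metric keeps the label layout shown above; `energy_labels` and `all_energy_zero` are hypothetical helper names, and the regex is a simplification that does not handle escaped quotes inside label values.

```python
import re

# Matches name="value" label pairs; a simplification that ignores escaped quotes.
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def energy_labels(metric_line: str) -> dict:
    """Extract the node_curr_energy_in_*_joule labels from one metric line as floats."""
    labels = dict(LABEL_RE.findall(metric_line))
    return {
        name: float(value)
        for name, value in labels.items()
        if name.startswith("node_curr_energy_in_") and name.endswith("_joule")
    }


def all_energy_zero(metric_line: str) -> bool:
    """True when energy labels are present and every one of them is zero."""
    values = energy_labels(metric_line)
    return bool(values) and all(v == 0.0 for v in values.values())
```

Feeding it the `kepler_node_energy_stat{...}` line from the `/metrics` output should return `True` for the case reported here, and `False` once any of the energy labels reports a non-zero value.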
That's right!
I think for now the more variants of drivers, OSs, etc., the better for cross-checking and results validation.
I agree, the best would be spinning up a cluster in the cloud for the tests.
If it doesn't work on macOS, I think we can still use Linux for the tests.
I have opened another PR for the first experiment scenario, which involves deploying two applications (one at a time) using both Argo CD and Flux CD, in addition to performing a rolling update and a rollback.
I am thinking of using the following query:
sum(rate(kepler_container_joules_total{container_namespace=~"argocd|flux-system"}[1m])) by (container_namespace)
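For running this query from a script rather than the Prometheus UI, here is a minimal sketch that builds the request URL for Prometheus's instant-query HTTP endpoint (`/api/v1/query`). The base URL `http://localhost:9090` in the usage note is an assumption (for example a local port-forward), and `instant_query_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# The query proposed in this comment, unchanged.
QUERY = (
    'sum(rate(kepler_container_joules_total'
    '{container_namespace=~"argocd|flux-system"}[1m])) by (container_namespace)'
)


def instant_query_url(base_url: str, promql: str) -> str:
    """Build a GET URL for the Prometheus instant-query API with the query URL-encoded."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"
```

Calling `instant_query_url("http://localhost:9090", QUERY)` gives a URL that can be fetched with `curl` or `urllib.request.urlopen`. Because the counter is in joules, the per-second `rate` should read as joules per second, i.e. watts, per namespace.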
git clone https://github.com/sustainable-computing-io/kepler.git -b v0.4
cd kepler/
# to configure Prometheus to scrape Kepler-exporter endpoints, Kepler exporter servicemonitor object is required
make build-manifest OPTS="PROMETHEUS_DEPLOY"
How did you guys run make on Mac? I tried to install it via brew, but it is not working for me.
Strange, brew install make should do it. 🤔 Did you get any errors?
This worked for me on Arch Linux 🙌
Running it on macOS is still an issue unfortunately 😢 but let's merge this for now and deal with macOS separately.