Add stats to rkt driver #2400
Comments
Seems reasonable. Main concern is that they do not guarantee any stability of the API.
Agreed, that's a risk. The fall-back behavior would have to be to behave as it does today (returning nothing). I notice that the executor already provides some stats via cgroup inspection; would it be possible to just defer to that, as the exec driver does?
@blalor It would be. The main thing to get would be the cgroup parent (path at which the cgroup files are written) so that stats could be collected.
@blalor Are you using rkt in production?
Getting there. It and Nomad (along with Consul, Vault, and Terraform) are core components of new infrastructure we're rolling out.
I'm showing my ignorance of cgroups, but aren't the rkt-created cgroups children of the Nomad executor's?
@blalor Yeah, you are right. It's been a while since I looked at the rkt code. It may be simpler to get this behavior. It may just be a matter of not using rkt's isolation in favor of Nomad's, and then stats would come for free.
We rely on a couple of metrics in order to auto-scale deployments on Nomad, including resource usage. Without this feature supported natively by Nomad, and without "pods" in Nomad (containers sharing namespaces but having different resource limits), we have to resort to dirty hacks such as injecting a primitive process manager into a container and running a custom script that analyzes the container's cgroup stats on tmpfs and exports that data over our monitoring network. This not only increases complexity tremendously by adding a bunch of moving parts, but is also less reliable, since resource exhaustion within a container could lead to missing data if the script cannot execute.
Nomad v0.5.4
The rkt driver doesn't support stat collection/reporting. I initially thought it'd be possible to just report on the cpu, memory, etc. used by the executor, but since rkt handles container isolation and sets up cgroups for the pod, I don't think that will work.
rkt itself doesn't seem to expose any stats, but the rkt API does expose the cgroup. So: run the pod with --uuid-file-save to record its UUID, then call InspectPod with that UUID, which will return the cgroup, like "/machine.slice/machine-rkt\\x2d5922fb6f\\x2db4a9\\x2d4408\\x2daf94\\x2d419a4a6efbfe.scope"
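Once the cgroup path is known, collecting stats could be as simple as reading a few files under it. Below is a minimal sketch, not how Nomad does it. It assumes a cgroup v1 host with one directory per controller under /sys/fs/cgroup. The names `pod_stats`, `read_int`, and `CGROUP_ROOT` are made up for illustration. Anything unreadable comes back as None, in line with the "return nothing" fall-back.

```python
import os

# Assumption: cgroup v1, one mount per controller under this root.
CGROUP_ROOT = "/sys/fs/cgroup"


def read_int(path):
    """Return the integer in a single-value cgroup file, or None if unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def pod_stats(cgroup, root=CGROUP_ROOT):
    """Read basic memory and CPU counters for a pod's cgroup.

    `cgroup` is the path as reported for the pod, e.g.
    "/machine.slice/machine-rkt...scope" (hypothetical helper, not a rkt
    or Nomad API). Missing files yield None rather than raising.
    """
    rel = cgroup.lstrip("/")  # make the path relative to each controller mount
    return {
        # current memory usage of the cgroup, in bytes
        "memory_usage_bytes": read_int(
            os.path.join(root, "memory", rel, "memory.usage_in_bytes")
        ),
        # cumulative CPU time consumed by the cgroup, in nanoseconds
        "cpu_usage_ns": read_int(
            os.path.join(root, "cpuacct", rel, "cpuacct.usage")
        ),
    }
```

On some hosts the cpuacct controller is mounted as `cpu,cpuacct`, with `cpuacct` only a symlink, so the controller directory names may need adjusting. The escaped `\x2d` sequences in the scope name are literal characters in the directory name and have to be passed through unchanged.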
I know very little about cgroups. On one of my instances:
Does this seem like a reasonable approach?