-
Notifications
You must be signed in to change notification settings - Fork 248
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Exposing Hardware Topology through CRDs in Node feature discovery #333
Comments
I'd suggest to use a more generic name for the package, "cr-updater" or something. Regarding the data/CRD, I was pondering should we prepare for more detailed information, e.g. bake in or make it possible to present numa distances(?) |
@marquiz Sure, we can make the package name more generic. As per the design of topology manager, cpu manager and device manager implement the hint providers interface and provide numa node id as a hint to topology manager. |
A way we can expose the numa distance could probably be like
However I second what @swatisehgal wrote, the thing is we were using the podresources API - and the extensions we were proposing here - as source of truth. However, if there is interest specifically about the numa distances we can maybe learn them reliably from sysfs, kinda like |
Thanks @fromanirh for this code snippet. Getting numa node distance from sysfs is totally doable and populating this as part of NUMANodeResource makes sense! I was considering numa node distance from the point of view of resources and their distance from each other (in my previous comment). |
I don't have the answers myself, either, just throwing some ideas 😊 I was just trying to think a bit ahead and broader and not "cement" this too tightly just for this one use case and scheduler extension. When we add something we should at least try to think about possible other users and usages, too, e.g. some other scheduler extensions and possibly alternative rte daemons digging more detailed data. Just trying to avoid the situation where we quickly patch together the api for one narrow use case and after a while we're stuck with that and have problems serving new users. This case is simple, but still. Maybe we have something to learn from how the Linux kernel handles apis 😄 |
@marquiz Thanks for the input. I see your point and have updated the design document to include NumaNode distances as part of the NumaNodeResource structure |
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
/remove-lifecycle stale |
Development work is in progress to support this feature. PR will be linked here soon! |
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-contributor-experience at kubernetes/community. |
/remove-lifecycle stale |
|
Expect the missing two PRs by monday |
|
/label kind/feature |
@ArangoGutierrez: The label(s) In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
/remove-lifecycle stale |
Hey all, any updates on this feature set? |
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
There is progress lately in e2e-tests and /configz endpoint /remove-lifecycle stale |
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten |
This is still alive and tracking |
@ffromani @Tal-or @swatisehgal are we expecting new PRs here or should we close this issue? We could also track the missing pieces as separate issues |
@ffromani @PiotrProkop @Tal-or @swatisehgal should we close this issue as implemented and track any future enhancements in separate issues? |
From my perspective yes, we're now in a pretty good shape and we can track future enhancements with separate issues. |
+1. Nice to see that we have come a long way :) Let's close this issue. Thanks! |
Closing as suggested. Thanks for everybody involved for working on this 🎉 /close |
@marquiz: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Currently Node feature Discovery consists of nfd-master and nfd-worker. The former is responsible for labeling Kubernetes node objects and the latter is responsible for detecting features and communicating them to nfd-master. nfd-worker runs as a daemon-set on all the nodes in the cluster.
Resource Topology Exporter (KEP, Code) can nicely fit into NFD architecture by:
Design document explaining the approach and proposed changes are provided in detail here.
Possible implementation approaches of introducing Resource Topology Exporter in NFD operand are summarized below:
Option 1:Introducing Resource Topology Exporter as a source or a new command line option in the nfd-worker. New source (or new command line option) is where nfd-worker would gather hardware topology information. Nfd-master would update NumaTopology CRD
Option 2 (preferred): A new helper daemon eg nfd-node-topology (similar to nfd-worker).
The new helper gathers hardware topology information and sends to nfd-master over gRPC through another endpoint.
The communication between nfd-node-topology between nfd-master is over the separate endpoint which contains hardware topology information but can be easily enhanced to other CRDs
Nfd-master would update NumaTopology CRD using NodeTopologyRequest that it receives from nfd-worker
NOTE: NFD operator would have to be enhanced to cater to nfd-node-topology (like it manages nfd-master and nfd-worker)
The advantage of this approach is that nfd-worker and nfd-master continue to work the same way as they currently do
Changes to the gRPC interface: Introducing another gRPC endpoint for communication between nfd-worker and nfd-master. (completely separate proto as shown below)
Additionally, we propose that NFD becomes home for NodeResourceTopology CRD API definition (and informer, clientset, handlers etc)
The text was updated successfully, but these errors were encountered: