2019 09 Meeting notes
Attendees: Jeff Hanson (HPE), Natalie Bates (EEHPCWG), Steve Martin (Cray), Torsten Wilde (HPE), Matt Kappel (Cray), Sid Jana (Intel), Ryan Grant (SNL), Barry Rountree (LLNL), Jeff Autor (HPE), Todd Rosedahl (IBM), Stephanie Brink (LLNL)
Agenda:
- PowerAPI BoF at SC
- PowerAPI F2F
- PowerAPI Version 1.0
- Any other business: Notifications proposal for PowerAPI to provide to Power Stack. Natalie would like to talk about Green 500 methodology.
The PowerAPI BoF was accepted. The schedule has changed a couple of times; it is currently Tuesday at noon. It does not overlap with the Power Stack BoF. It does overlap with many other sessions, but that is the usual case at SC. Plan for the BoF (Ryan/Torsten):
- Positioning of the PowerAPI. How does it relate to the rest of the ecosystem? What the
- Practical implementation and use of PowerAPI, especially PowerAPI on Arm (for example, Astra), and what the engineering challenges were in getting information out of a system like Astra.
- Panel and interaction with audience (theme TBD)
- Sid would like to be part of the panel planning.
- Natalie notes that Jeff Autor is here and that in the past we have had a heavy Redfish part. Jeff thinks we don't need much more on Redfish. DMTF will have a booth.
- Jim Laros will have a conflict with the procurement BoF.
Green 500 methodology. A measurement was taken entirely with Redfish, but this is not clearly stated. Steve said that, given the questions, it would not be worthy of a Level 3. Jeff said that Alan had questions about implementations and about performance limits on how quickly the sensors could be sampled. Jeff said no change was made to the spec/schema because the questions were about implementation. It is not an HPE system, so Jeff does not have direct knowledge. That said, work is under way to review whether the schemas are sufficient when lower-level sensors are being read. The sensor model is closer to what Power API wanted before (meta sensor data).

Todd asked what is being refined. OpenBMC wants to use Redfish, but Todd doesn't see that volume and velocity being possible. Cray is moving data from 500+ nodes at 1 Hz as pure JSON. Todd asked what percentage of a 1G link's bandwidth that uses, because OCP says only so much can be used for telemetry. Cray has not measured this. Todd asked whether a subscriber can get just a section of the data. Steve said this depends on the implementation, and they have several subscriber options. Jeff tells people to attend the eSIM weekly calls to discuss. Jeff says people need to remember that DMTF has a much more open, faster cycle from request to spec.

At ORNL, IBM has OpenBMC using websockets to push data to a service node in order to run analytics, alerts, ... The original request was for pump optimization. OpenBMC plans to use Redfish, not websockets. Todd (as IBM is a major contributor to OpenBMC) wants to do it once and have it supported.
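Todd's bandwidth question can be framed as simple arithmetic once a per-sample payload size is known. The sketch below is illustrative only: the 500 nodes and 1 Hz rate come from the discussion, but the payload sizes are hypothetical (Cray has not measured this), and protocol overhead such as HTTP headers and TCP/IP framing is ignored.

```python
def telemetry_link_fraction(nodes, sample_hz, payload_bytes, link_bits_per_s=1e9):
    """Return the fraction of link capacity consumed by telemetry payloads.

    Ignores protocol overhead, so real usage would be somewhat higher.
    """
    bits_per_s = nodes * sample_hz * payload_bytes * 8
    return bits_per_s / link_bits_per_s


# 500 nodes at 1 Hz are from the notes; the payload sizes are assumptions.
for payload_bytes in (1_000, 10_000, 100_000):
    frac = telemetry_link_fraction(nodes=500, sample_hz=1.0, payload_bytes=payload_bytes)
    print(f"{payload_bytes:>7} B/sample -> {frac:.1%} of a 1 Gb/s link")
```

Under these assumptions, a 10 kB JSON document per node per second would use about 4% of the link. The real figure depends on the measured payload size and on any telemetry budget the platform imposes.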
Natalie asked Ryan whether PowerAPI should be part of OpenBMC. Ryan replied that a reference implementation would need a different class of work than the specification. Power API is portable and the abstraction layer works. A reference implementation would need another plugin (or work on an existing one). An example is Astra, where the apps with Power API instrumentation required no changes. Todd asked whether he could share what he and Ryan have talked about. Todd will do so.
Ryan believes there are system software use cases for PowerAPI that differ from the application ones. These should be part of the BoF. Todd can cover this, which Ryan would like.
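Ryan's point about portability can be illustrated with a generic plugin pattern. This is a minimal sketch and not the PowerAPI interface: the class names, the method name, and the wattage values are all hypothetical placeholders. It only shows why application code written against an abstraction layer needs no changes when a new platform plugin is added.

```python
from abc import ABC, abstractmethod


class PowerPlugin(ABC):
    """Hypothetical plugin interface; one subclass per platform."""

    @abstractmethod
    def read_node_power_watts(self) -> float:
        ...


class PlatformAPlugin(PowerPlugin):
    def read_node_power_watts(self) -> float:
        return 350.0  # placeholder; a real plugin would query platform sensors


class PlatformBPlugin(PowerPlugin):
    def read_node_power_watts(self) -> float:
        return 280.0  # placeholder; a different platform, same interface


def report_power(plugin: PowerPlugin) -> str:
    """Application-side code, written once against the interface."""
    return f"node power: {plugin.read_node_power_watts():.1f} W"


# The same application function runs unchanged on either platform.
print(report_power(PlatformAPlugin()))
print(report_power(PlatformBPlugin()))
```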
Sid's topic: the Notification proposal, on how to communicate between layers in the Power Stack. It is currently a branch on GitHub. Ryan explained that the PS does not have a protocol for how the pieces communicate, and Power API doesn't care how. An example is Slurm telling the job manager about a job power cap, after which the job manager tells its nodes to set their power caps. We have these specific functions in PA, but we don't have a way to pass general notifications around the system when layers are not available. So the idea is to be able to pass notifications at a higher abstraction. A message is sent to an entity as defined in PA. Sid added that the Notification proposal assumes all entities run an instance of PA (in some way).

Sid asked whether it would be fair to compare the NP to RPC or Active Messages, that is, the sender doesn't have to care whether the receiver knows what to do with the message. Ryan said it wasn't that clear. The proposal is at a very high level because not all use cases were understood. Some entities may not need to react to the messages, so a more lock-step approach would not make sense.

Sid said the geopm team is taking baby steps with LLNL, ANL, <>, and LRZ on what they wish to send from the job scheduler to the PS. The tie to the Notification proposal is what kind of notifications to send. The plan is a geopm plugin for slurmctld and one for slurmd, as a quick prototype. There is no concrete protocol yet; it would use the existing Slurm plugin and notification parameters. The Notification proposal has some level-skipping ideas because of corner cases in failure scenarios. PA has not yet looked at this.

Barry said LLNL has the idea that a system admin can set a cap without anyone else having the option of changing it, with the notification used as a way to tell others that it is set. Sid said that Cray and HPE mentioned there are already tools to control power (like Barry's example), so how do the other entities react? Sid asked whether any workload managers are enabled to power nodes off under their own intelligence. Steve said that this is typically a human admin action. This will be added to the agenda for the F2F.
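The power-cap example from this discussion (a job-level cap passed to the job manager, which then tells its nodes) can be sketched as messages sent to entities, where an entity with no handler simply does not react. This is a toy model under stated assumptions, not the Notification proposal itself: the `Notification` and `Entity` classes, the `power_cap` message kind, and the even split across nodes are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Notification:
    kind: str      # hypothetical message type tag, e.g. "power_cap"
    payload: dict  # free-form parameters for that message type


@dataclass
class Entity:
    name: str
    children: List["Entity"] = field(default_factory=list)
    handlers: Dict[str, Callable] = field(default_factory=dict)
    power_cap_watts: Optional[float] = None

    def notify(self, note: Notification) -> bool:
        """Deliver a notification; return True only if this entity reacted.

        The sender does not depend on the result, so there is no lock step.
        """
        handler = self.handlers.get(note.kind)
        if handler is None:
            return False  # this entity does not react to this kind of message
        handler(self, note)
        return True


def job_manager_on_cap(entity: Entity, note: Notification) -> None:
    """Record the job cap and split it evenly across nodes (illustrative policy)."""
    entity.power_cap_watts = note.payload["watts"]
    per_node = note.payload["watts"] / len(entity.children)
    for child in entity.children:
        child.notify(Notification("power_cap", {"watts": per_node}))


def node_on_cap(entity: Entity, note: Notification) -> None:
    entity.power_cap_watts = note.payload["watts"]


nodes = [Entity(f"node{i}", handlers={"power_cap": node_on_cap}) for i in range(4)]
job_manager = Entity("job_manager", children=nodes,
                     handlers={"power_cap": job_manager_on_cap})
monitor = Entity("monitor")  # registers no handlers, so it ignores everything

# The scheduler's role is played by these two sends.
job_manager.notify(Notification("power_cap", {"watts": 1200.0}))
monitor_reacted = monitor.notify(Notification("power_cap", {"watts": 1200.0}))
```

The open questions raised in the meeting (level skipping on failure, caps that cannot be overridden, how other entities react to out-of-band power control) are deliberately not modeled here.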
Ryan will send out more details as we ran out of time.