-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update the code to use scontrol in place of slurm APIs to drain the #74
base: main
Are you sure you want to change the base?
Conversation
3b048a6
to
478801c
Compare
@@ -219,15 +220,34 @@ func (s *Server) processRequest(AppConfig Config, conn net.Conn, req *http.Reque | |||
log.Printf("Origin Of Condition: %s", originOfCondition) | |||
for _, triggerEvent := range AppConfig.TriggerEvents { | |||
if severity == triggerEvent.Severity { | |||
log.Printf("Matched Trigger Event: %s with action %s", triggerEvent.Severity, triggerEvent.Action) | |||
if triggerEvent.Message != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, the action will only be triggered if the severity message matches as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the user doesn't want to match on the 'message' field, it can be left empty. I will add it to the comments in .env
TRIGGER_EVENTS="[\ | ||
{\"Severity\":\"Fatal\",\"Action\":\"DrainNode\"},\ | ||
{\"Severity\":\"Critical\",\"Action\":\"DrainNode\"} | ||
{\"Severity\":\"Critical\",\"Message\":\"message 1|This is a critical test event\",\"Action\":\"DrainNode\", \"DrainReasonPrefix\":\"RebootNeeded\"},\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is an action necessary if it's just going to be a drain anyway?
Could you also add a comment for these fields as well to make it clear?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left it as is in case we need to use it for some other action in the future. Will update the comments
redfish-exporter/slurm/slurm.go
Outdated
@@ -160,6 +201,183 @@ func (c *Client) GetNodes() ([]string, error) { | |||
return nodes, nil | |||
} | |||
|
|||
func (c *Client) GetNodeReason(nodeName string) (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this code needed anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left in the functions which use API for future use. I will update the name of this function
"log" | ||
"strings" | ||
|
||
"github.com/nod-ai/ADA/redfish-exporter/metrics" | ||
) | ||
|
||
const ( | ||
Drain = "DrainNode" | ||
Drain = "DrainNode" | ||
ExlcudeReasonSet = "DRAIN_EXCLUDE_REASON_SET" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the user provide this? Is this different than the exclude reason from .env?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for internal logging
node to make it slurm version independent
478801c
to
034e2ad
Compare
@@ -160,6 +201,183 @@ func (c *Client) GetNodes() ([]string, error) { | |||
return nodes, nil | |||
} | |||
|
|||
func (c *Client) GetNodeReasonWithAPI(nodeName string) (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this needed now that we are moving everything to scontrol
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had tested this API path also, left it in if we move to API later
node to make it slurm version independent