-
Notifications
You must be signed in to change notification settings - Fork 489
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Notification NOS running SAI and SDK, about HW/FW failures #1777
Notification NOS running SAI and SDK, about HW/FW failures #1777
Conversation
inc/saiswitch.h
Outdated
_In_ sai_timespec_t timestamp, | ||
_In_ sai_switch_asic_sdk_health_category_t category, | ||
_In_ sai_switch_health_data_t data, | ||
_In_ sai_u8_list_t description); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this probably should be const since contains pointer to data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now data is passed as value not as pointer
In sai_switch_health_data_t data,
Do you suggest changing to
In const sai_switch_health_data_t *data
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i was talking about sai_u8_list_t, since u8 list has pointer inside, which without const can be modified
c72d540
to
50b21ce
Compare
|
||
/** @passparam data_type */ | ||
sai_health_data_t data; | ||
} sai_health_t; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't see this defined in header files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not defined in the headers since SER is a separate PR
The intention is to show the health event can be used as infrastructure for additional future event
For each event, we will need separate PR. The struct with specific details will be defined in the PR, and will added to the union.
I am not trying to address every possible health event at the moment, just create an extendable infrastructure which can easily be extended
_In_ sai_switch_asic_sdk_health_severity_t severity, | ||
_In_ sai_timespec_t timestamp, | ||
_In_ sai_switch_asic_sdk_health_category_t category, | ||
_In_ sai_switch_health_data_t data, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from definitions so far, this "data" is actually a data type, and only one type of data SAI_HEALTH_DATA_TYPE_GENERAL can be passed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes currently we don't need to use data for the general type
but we created extendable infrastructure, per Jai's request, so additional events can be added
|
||
2) For all types that have additional data, a struct should be created. The first time a new type that has actual struct data will be created, union should be created as well. Each struct should be added to that union. | ||
|
||
For example if we want to add a SER item (https://github.com/opencomputeproject/SAI/pull/1307), the flow will be : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Example looks not complete. It should show how the attribute and parameters should be assigned and passed to function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be addressed in SER specific PR
It is out of scope of this PR which is the general infrastructure
Could we please look into supporting the data defined in json? sai_json_t is available. |
@rlhui I don't see how it's related to SAI layer The parameters for the health event are well typed, other than the description field which is free text and vendor specific |
Please see #1324 as an example. For the categories added, they can also be defined as part of json format. |
Can we please update PR to add description |
Can we pleae also update PR title ? Currently it's truncated. |
Added description to the PR |
50b21ce
to
07cae66
Compare
build is failing |
|
07cae66
to
f6f25c3
Compare
Signed-off-by: Andrii Mandiuk <[email protected]>
c63f71f
to
3999b41
Compare
SAI_SWITCH_ASIC_SDK_HEALTH_SEVERITY_WARNING, | ||
|
||
/** Switch event severity notice */ | ||
SAI_SWITCH_ASIC_SDK_HEALTH_SEVERITY_NOTICE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how do we know if a specific HEALTH issue is cleared. Also if its correctable/non-correctable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re if it's correctable or not, the severity field can be used as indication
If it's fatal, then it's not correctable
If it's warning or notice, then it is correctable
Re clearing of health issue
The concept is for one time event
It's not an alarm where a certain threshold is crossed and then can later be cleared when going under the threshold
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
typedef struct _sai_switch_health_data_t | ||
{ | ||
/** Type of switch health data */ | ||
sai_health_data_type_t data_type; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since this struct contains only one entry, is it not better to use data_type directly in notification ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's an extendable placeholder
check out possible extension for SER in the .MD above
The ask in the community was to be able to extend and add info for specific features in the future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since we want to make structs constant then this would not expand any more
…teproject#1777) Health event is a way to inform NOS about some HW issue in the switch. When some error on switch occur, SAI should in some way inform a NOS about the status. In order to provide flexibility depending on different types issues SAI generates health event with several parameters that describes that event. Hence, to inform NOS - sai_switch_asic_sdk_health_event_notification_fn callback is invoked. Signed-off-by: Andrii Mandiuk <[email protected]>
Health event is a way to inform NOS about some HW issue in the switch. When some error on switch occur, SAI should in some way inform a NOS about the status. In order to provide flexibility depending on different types issues SAI generates health event with several parameters that describes that event. Hence, to inform NOS - sai_switch_asic_sdk_health_event_notification_fn callback is invoked.