[chassis][syncd][sai] Adjusting response timeout during syncd init #2159
Merged
In a VOQ-based chassis where syncd uses VOQ SAI, if there is a large number of front panel ports, SAI takes more than 1 minute to complete the switch create initialization. As a result, the switch create request sent by orchagent does not get a response within the default response wait time of 1 minute, so orchagent declares a switch create failure and crashes.

The number of ports that SAI must initialize depends on the number of ports per ASIC and the total number of system ports configured in the system. The total number of system ports in turn depends on the number of line cards supported, the number of ASICs per line card, and the number of ports supported by each ASIC. Therefore, in a fully populated system, which is a commonly expected scenario, this crash will occur.

To fix this, orchagent sets the syncd response timeout to 5 minutes for a line (voq) card and 10 minutes for a supervisor (fabric) card before sending the switch create request, and sets it back to the default wait time after the switch create completes.

Signed-off-by: vedganes <[email protected]>
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
/azp run
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
judyjoseph
approved these changes
Mar 1, 2022
prsunny
approved these changes
Mar 1, 2022
judyjoseph
pushed a commit
that referenced
this pull request
Mar 1, 2022
…2159)
preetham-singh
pushed a commit
to preetham-singh/sonic-swss
that referenced
this pull request
Aug 6, 2022
…onic-net#2159)
What I did
Fix the syncd response timeout for the switch create request sent by orchagent.
Why I did it
In a VOQ-based chassis where syncd uses VOQ SAI, if there is a large number of front panel ports, SAI takes more than 1 minute to complete the switch create initialization. As a result, the switch create request sent by orchagent does not get a response within the default response wait time of 1 minute, so orchagent declares a switch create failure and crashes.
The number of ports that SAI must initialize depends on the number of ports per ASIC and the total number of system ports configured in the system. The total number of system ports in turn depends on the number of line cards supported, the number of ASICs per line card, and the number of ports supported by each ASIC. Therefore, in a fully populated system, which is a commonly expected scenario, this crash will occur.
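To make the scaling concrete, here is a rough sketch of how the system port count grows with chassis size. The struct, its field names, and the example numbers are hypothetical and not taken from any real platform; an actual configuration may also define additional system ports (for example internal ports), which this simple product ignores.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical chassis description; names and values are illustrative only.
struct ChassisLayout {
    std::uint32_t lineCards;         // line cards supported by the chassis
    std::uint32_t asicsPerLineCard;  // ASICs on each line card
    std::uint32_t portsPerAsic;      // ports supported by each ASIC
};

// Simplified estimate: every ASIC on every line card contributes its ports.
constexpr std::uint64_t totalSystemPorts(const ChassisLayout& c) {
    return static_cast<std::uint64_t>(c.lineCards) * c.asicsPerLineCard *
           c.portsPerAsic;
}

// Made-up example: 8 line cards x 2 ASICs x 18 ports = 288 system ports.
static_assert(totalSystemPorts(ChassisLayout{8, 2, 18}) == 288,
              "fully populated example chassis");
```

Under this estimate, each ASIC's SAI instance in a fully populated chassis has hundreds of system ports to set up in addition to its own local ports, which is consistent with switch create exceeding the 1 minute default.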
To fix this, orchagent sets the syncd response timeout to 5 minutes for a line (voq) card and 10 minutes for a supervisor (fabric) card before sending the switch create request, and sets it back to the default wait time after the switch create completes.
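The set-then-restore pattern described above could look roughly like the following. This is a standalone sketch, not the code from this PR: the `"voq"` and `"fabric"` switch type strings, the `setTimeout` callback, and the `createSwitch` callable are hypothetical stand-ins for whatever orchagent actually uses to talk to syncd, and the RAII guard is one possible way to guarantee the restore, chosen here for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <string>
#include <utility>

using std::chrono::minutes;

// Default response wait time stated in the description: 1 minute.
constexpr minutes kDefaultResponseTimeout{1};

// Timeout to use while switch create is in flight.
// The switch type strings are assumptions for this sketch.
inline minutes switchCreateTimeout(const std::string& switchType) {
    if (switchType == "voq") return minutes{5};      // line card
    if (switchType == "fabric") return minutes{10};  // supervisor card
    return kDefaultResponseTimeout;
}

// Applies a temporary timeout on construction and restores the given
// value on destruction, so the reset also happens on early return.
class ScopedResponseTimeout {
public:
    using Setter = std::function<void(minutes)>;

    ScopedResponseTimeout(Setter setter, minutes temporary, minutes restoreTo)
        : setter_(std::move(setter)), restoreTo_(restoreTo) {
        assert(setter_ && "a timeout setter is required");
        setter_(temporary);
    }
    ~ScopedResponseTimeout() { setter_(restoreTo_); }

    ScopedResponseTimeout(const ScopedResponseTimeout&) = delete;
    ScopedResponseTimeout& operator=(const ScopedResponseTimeout&) = delete;

private:
    Setter setter_;
    minutes restoreTo_;
};

// Hypothetical wrapper: raise the timeout, create the switch, then
// fall back to the default once the guard goes out of scope.
template <typename CreateFn>
bool createSwitchWithExtendedTimeout(const std::string& switchType,
                                     const ScopedResponseTimeout::Setter& setTimeout,
                                     CreateFn createSwitch) {
    ScopedResponseTimeout guard(setTimeout, switchCreateTimeout(switchType),
                                kDefaultResponseTimeout);
    return createSwitch();
}
```

A caller would pass a callback that forwards the value to the real timeout setting and a callable that issues the switch create request. Other switch types keep the default timeout in this sketch, so their behavior is unchanged.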
How I verified it
Details if related