Fix rasdaemon crash during bootup on AMD CPU #100
Closed
Why I did it
Booting SONiC on an AMD EPYC 16-core CPU causes rasdaemon to crash. This is not a major blocker, because rasdaemon eventually restarts and stays stable from then on.
Coredump stack trace:
Known issue for rasdaemon:
https://github.com/mchehab/rasdaemon/issues/77
Fixed here:
https://github.com/mchehab/rasdaemon/commit/f1ea76375281001cdf4a048c1a4a24d86c6fbe48
Unfortunately, this fix is not present in the rasdaemon version that ships with bookworm by default, so I backported the fix and compiled rasdaemon from source.
Here is the patch: https://sources.debian.org/patches/rasdaemon/0.8.0-2/0001-Check-CPUs-online-not-configured.patch/
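Judging from the patch name (Check-CPUs-online-not-configured), the fix concerns the difference between the number of configured CPUs and the number of online CPUs. The following is a minimal sketch of that distinction, not the rasdaemon code itself. The helper names `configured_cpus`, `online_cpus`, and `print_cpu_counts` are hypothetical; the `sysconf` names are the glibc ones available on Linux.

```c
#include <stdio.h>
#include <unistd.h>

/* CPUs the kernel has configured, which may include CPUs that are
 * offline or not populated on this system. */
long configured_cpus(void)
{
    return sysconf(_SC_NPROCESSORS_CONF);
}

/* CPUs that are currently online. */
long online_cpus(void)
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}

/* Hypothetical helper: print both counts so a mismatch is easy to spot. */
void print_cpu_counts(void)
{
    printf("configured=%ld online=%ld\n", configured_cpus(), online_cpus());
}
```

If the two counts differ on the affected machine, code that opens per-CPU resources for every configured CPU may touch CPUs that are not actually online, which is the kind of situation the patch name suggests it guards against.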
Work item tracking
How I did it
How to verify it
Booted an image built with these changes; no issue was observed.
Before this change:
Which release branch to backport (provide reason below if selected)
Tested branch (Please provide the tested image version)
Description for the changelog
Link to config_db schema for YANG module changes
A picture of a cute animal (not mandatory but encouraged)