Soft lock up issues in eks nodes with logs indicating ena issue #129
Comments
Hi @cshivashankar, to make this efficient, could you please contact me at my email address, [email protected], so that we can continue investigating the issue offline? I need you to provide me with more information. Thanks,
Thanks @sameehj for the quick response. Regards,
I'm closing this issue for now since we have resolved it offline. Please feel free to reopen it if needed. Thanks,
@sameehj we're experiencing very similar log messages and behavior on EKS 1.16 running AmazonLinux. Are you able to provide details on the resolution you reached? TIA -Erik
Hi Erik,
Would you mind sharing the instance ID so that we can troubleshoot on our side?
Feel free to send it to [email protected]
-Nafea
On Wednesday, May 27, 2020 at 5:01 PM, Erik Schwartz wrote:
> @sameehj we're experiencing very similar log messages and behavior on EKS 1.16 running AmazonLinux. Are you able to provide details on the resolution you reached? TIA -Erik
Hi @eeeschwartz, I have raised an issue for the AMI at awslabs/amazon-eks-ami#454.
I'm cautiously optimistic that upgrading our CNI plugin to 1.6.1 has resolved the issue. We were seeing 1-2 nodes churn per hour. Since upgrading, we have gone 10 hours without churn, so it looks promising. Thanks for the help.
Hi,
We are running EKS clusters on version 1.14. We often experience node issues in which a node becomes unresponsive due to a soft lockup. When the logs were analyzed, the following information was found:
The logs show that a transaction was not completed in time by ENA on eth3, which could have triggered the issue. Because of the soft lockup, the node does not recover. Even though this is an EKS node issue, the logs indicate there could be a problem with ENA.
Can you please confirm whether there is a known ENA issue related to this? How can it be solved?
Kindly let me know if any other information is required from my end.