-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
panic: failed to apply request nomad.(*nomadFSM).Apply #10943
Comments
Interestingly it looks like the message was an empty
package main
import (
"fmt"
"github.com/hashicorp/nomad/nomad/structs"
)
func main() {
buf := []byte{0x2e, 0x84, 0xa6, 0x52, 0x65, 0x67, 0x69, 0x6f, 0x6e, 0xa3, 0x73, 0x74, 0x67, 0xa9, 0x4e, 0x61, 0x6d, 0x65, 0x73, 0x70, 0x61, 0x63, 0x65, 0xa0, 0xa9, 0x41, 0x75, 0x74, 0x68, 0x54, 0x6f, 0x6b, 0x65, 0x6e, 0xda, 0x0, 0x24, 0x30, 0x38, 0x32, 0x62, 0x63, 0x33, 0x35, 0x63, 0x2d, 0x63, 0x64, 0x63, 0x33, 0x2d, 0x65, 0x35, 0x30, 0x39, 0x2d, 0x31, 0x62, 0x65, 0x61, 0x2d, 0x37, 0x34, 0x62, 0x31, 0x63, 0x64, 0x62, 0x38, 0x30, 0x65, 0x65, 0x34, 0xa9, 0x46, 0x6f, 0x72, 0x77, 0x61, 0x72, 0x64, 0x65, 0x64, 0xc2}
msgType := structs.MessageType(buf[0])
var req structs.OneTimeToken
if err := structs.Decode(buf[1:], &req); err != nil {
panic(fmt.Errorf("failed to decode request: %v", err))
}
fmt.Println("Message ID: ", msgType)
fmt.Printf("OneTimeSecretID: '%s'\n", req.OneTimeSecretID)
fmt.Printf("AccessorID: '%s'\n", req.AccessorID)
fmt.Printf("ExpiresAt: '%s'\n", req.ExpiresAt)
fmt.Printf("CreateIndex: '%d'\n", req.CreateIndex)
fmt.Printf("ModifyIndex: '%d'\n", req.ModifyIndex)
} 🤔 I'm not sure how a |
Yikes @Mongey ! That is very terrible indeed. We'll investigate and cut a patch bug fix shortly. It looks like the We will follow up with more update by tomorrow. I'm very sorry that you ran into this issue and had the outage, and thank you for reaching out. We will follow up with more info shortly. |
Thanks @Mongey again. We have a fix! I was able to reproduce the behavior by having a 1.1.X leader and 1.0.X followers and issuing a Though, I couldn't reproduce the failure while 1.0.X servers remained the leaders, and I'm unable to discern the code that can trigger such failure. If that what occured in your environment, I'd love having the ~15 minutes logs of all server if possible. You can redact it and email it to [email protected] if that works better for you than posting it here. |
Fix a panic in handling one-time auth tokens, used to support `nomad ui --authenticate`. If the nomad leader is a 1.1.x with some servers running as 1.0.x, the pre-1.1.0 servers risk crashing and the cluster may lose quorum. That can happen when `nomad authenticate -ui` command is issued, or when the leader scans for expired tokens every 10 minutes. Fixed #10943 .
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues. |
Nomad version
v1.0.4
/v1.1.2
Operating system and Environment details
Linux ip-<redacted> 4.9.0-16-amd64 #1 SMP Debian 4.9.272-1 (2021-06-21) x86_64 GNU/Linux
Issue
While upgrading from v1.0.4. to v1.1.2 I encountered some issues, and lost the cluster leadership, requiring me to perform an Outage Recovery
We've got 3 server nodes and were replacing nodes via an aws instance-refresh, going from 3 -> 4, 30 min wait, back to 3,. Something like this
When a v1.1.2 node was introduced the v1.0.4 leader crashed with the following logs
Reproduction steps
I haven't attempted to reproduce this yet.
Expected Result
No 💥 💣
Actual Result
panic.
Job file (if appropriate)
n/a
Nomad Server logs (if appropriate)
See above? (I can get more logs if required)
Nomad Client logs (if appropriate)
n/a
The text was updated successfully, but these errors were encountered: