runtime error: invalid memory address or nil pointer dereference #19644

Closed
shantanugadgil opened this issue Jan 6, 2024 · 6 comments · Fixed by #19652
Labels: stage/accepted, theme/jobspec, type/bug

Comments

@shantanugadgil
Contributor

shantanugadgil commented Jan 6, 2024

Nomad version

Nomad v1.7.2
BuildDate 2023-12-13T19:59:42Z
Revision 64e3dca

Operating system and Environment details

Amazon Linux 2 / Amazon Linux 2023

Issue

2024-01-06T19:04:30.003Z [ERROR] http: http: panic serving NN.NN.NN.NNN:39804: runtime error: invalid memory address or nil pointer dereference                              
goroutine 3299 [running]:                                                                                                                                                    
net/http.(*conn).serve.func1()                                                                                                                                               
        net/http/server.go:1868 +0xb9                                                                                                                                        
panic({0x2a60280?, 0x4f15a40?})                                                                                                                                              
        runtime/panic.go:920 +0x270                                                                                                                                          
github.com/hashicorp/nomad/nomad/structs.(*Job).Validate(0xc005aefb00)                                                                                                       
        github.com/hashicorp/nomad/nomad/structs/structs.go:4677 +0x1c60                                                                                                     
github.com/hashicorp/nomad/nomad.(*jobValidate).Validate(0xc00084ef78, 0xc005aefb00)                                                                                         
        github.com/hashicorp/nomad/nomad/job_endpoint_hooks.go:356 +0x4e                                                                                                     
github.com/hashicorp/nomad/nomad.(*Job).admissionValidators(0xc000e36460, 0x40b0b8?)                                                                                         
        github.com/hashicorp/nomad/nomad/job_endpoint_hooks.go:138 +0x13c                                                                                                    
github.com/hashicorp/nomad/nomad.(*Job).admissionControllers(0xc0006da000?, 0x3038d73?)                                                                                      
        github.com/hashicorp/nomad/nomad/job_endpoint_hooks.go:105 +0x51                                                                                                     
github.com/hashicorp/nomad/nomad.(*Job).Plan(0xc000e36460, 0xc00617c300, 0xc00417e310)                                                                                       
        github.com/hashicorp/nomad/nomad/job_endpoint.go:1804 +0x196                                                                                                         
reflect.Value.call({0xc00009f680?, 0xc00084f160?, 0x7f61c50dad98?}, {0x303a316, 0x4}, {0xc005735538, 0x3, 0x0?})                                           
        reflect/value.go:596 +0xce7                                                   
reflect.Value.Call({0xc00009f680?, 0xc00084f160?, 0x4a7435?}, {0xc005735538?, 0xc005735588?, 0xa19bc7?})                                                                                                           
        reflect/value.go:380 +0xb9                                                                       
net/rpc.(*service).call(0xc000e266c0, 0x2db8640?, 0x9?, 0x0, 0xc000643880, 0x40?, {0x2f07880?, 0xc00617c300?, 0x0?}, {0x2bd9ce0, ...}, ...)                                                                        
        net/rpc/server.go:382 +0x211                                                                     
net/rpc.(*Server).ServeRequest(0x415325?, {0x37e2038, 0xc004ae6100})                                                                                                                                               
        net/rpc/server.go:503 +0x165                                                                     
github.com/hashicorp/nomad/nomad.(*Server).RPC(0xc0006da000, {0x304de73, 0x8}, {0x2f07880?, 0xc00617c2a0}, {0x2bd9ce0?, 0xc00417e2a0})                                                                             
        github.com/hashicorp/nomad/nomad/server.go:1984 +0xeb                                                          
github.com/hashicorp/nomad/command/agent.(*Agent).RPC(0xc005fd5420?, {0x304de73?, 0xc0049e9700?}, {0x2f07880?, 0xc00617c2a0?}, {0x2bd9ce0?, 0xc00417e2a0?})                  
        github.com/hashicorp/nomad/command/agent/agent.go:1274 +0x11b                                                                                                        
github.com/hashicorp/nomad/command/agent.(*HTTPServer).jobPlan(0xc005fd5420, {0x37de6f0, 0xc00617c120}, 0xc0049e9700, {0xc006adac0c, 0x15})                                                                        
        github.com/hashicorp/nomad/command/agent/job_endpoint.go:189 +0x324                              
github.com/hashicorp/nomad/command/agent.(*HTTPServer).JobSpecificRequest(0x0?, {0x37de6f0, 0xc00617c120}, 0xc0049e9700)                                                                                                                      
        github.com/hashicorp/nomad/command/agent/job_endpoint.go:84 +0x405                                             
github.com/hashicorp/nomad/command/agent.(*HTTPServer).registerHandlers.(*HTTPServer).wrap.func4({0x37de6f0, 0xc00617c120}, 0xc0049e9700)                                                                                                     
        github.com/hashicorp/nomad/command/agent/http.go:715 +0x168                                                    
net/http.HandlerFunc.ServeHTTP(0xc000105800?, {0x37de6f0?, 0xc00617c120?}, 0xc004ad99e8?)                                                                                                                                                     
        net/http/server.go:2136 +0x29               
net/http.(*ServeMux).ServeHTTP(0x0?, {0x37de6f0, 0xc00617c120}, 0xc0049e9700)                                          
        net/http/server.go:2514 +0x142                     
github.com/hashicorp/nomad/command/agent.NewHTTPServers.CompressHandler.CompressHandlerLevel.func3({0x37d6e70?, 0xc0007c0540}, 0xc0049e9700)                                                                                                  
        github.com/gorilla/[email protected]/compress.go:141 +0x547                                                      
net/http.HandlerFunc.ServeHTTP(0x415325?, {0x37d6e70?, 0xc0007c0540?}, 0xc0007c0501?)                                  
        net/http/server.go:2136 +0x29                      
net/http.serverHandler.ServeHTTP({0x37ce378?}, {0x37d6e70?, 0xc0007c0540?}, 0x6?)                                      
        net/http/server.go:2938 +0x8e                      
net/http.(*conn).serve(0xc004429170, {0x37e1438, 0xc0067bfad0})                                                        
        net/http/server.go:2009 +0x5f4                     
created by net/http.(*Server).Serve in goroutine 347                                                                   
        net/http/server.go:3086 +0x5cb                     
2024-01-06T19:04:37.658Z [INFO]  nomad: setting up raft bolt store: no_freelist_sync=false       

Reproduction steps

Submitting a particular job via Terraform (1.6.6) and the Terraform Nomad provider (2.1.0) causes the server leader to crash.

Expected Result

This should not happen. Downgrading the servers to 1.6.5 makes the error go away.

Actual Result

The server crashes and the leader changes.

Job file (if appropriate)

Too big and customized to share for now.

Nomad Server logs (if appropriate)

The panic traceback has been posted above.

Nomad Client logs (if appropriate)

N/A

NOTE: This has started occurring recently.

I have already upgraded all the clients to 1.7.2

How long before I would have to downgrade all clients to 1.6.5 as well? Or can I leave the cluster like this until 1.7.3 (hopefully with fixes)?

@shantanugadgil
Contributor Author

After applying the job without errors, I tried yet another upgrade of the servers to 1.7.2 and the issue recurred.

I am downgrading the servers to 1.6.5 again and leaving them there for now. I will try to reproduce the issue, maybe tomorrow.

@jrasell
Member

jrasell commented Jan 8, 2024

Hi @shantanugadgil, and thanks for raising this issue. I have been able to reproduce it locally running a dev agent. It appears to be triggered by a specific combination of parameters, which you can see in the example jobspec below. I will work on a fix and should have a PR ready shortly.

How long before I would have to downgrade all clients to 1.6.5 as well? Or can I leave the cluster like this until 1.7.3 (hopefully with fixes)?

This bug will only impact the Nomad servers and does not impact the Nomad clients.

main.tf file:

terraform {
  required_providers {
    nomad = {
      source  = "hashicorp/nomad"
      version = "= 2.1.0"
    }
  }
}

provider "nomad" {
  address = "http://127.0.0.1:4646"
}

resource "nomad_job" "gh19644" {
  jobspec = file("${path.module}/gh19644.nomad.hcl")
}

gh19644.nomad.hcl file:

job "gh19644" {
  type = "system"
  group "cache" {
    max_client_disconnect      = "1h"
    prevent_reschedule_on_lost = true
    task "redis" {
      driver = "docker"
      config {
        image = "redis:7"
      }
    }
  }
}

The server panic output:

2024-01-08T09:54:34.710Z [DEBUG] http: request complete: method=PUT path=/v1/job/gh19644/plan?namespace=default duration="133.75µs"
2024-01-08T09:54:34.710Z [ERROR] http: http: panic serving 127.0.0.1:51394: runtime error: invalid memory address or nil pointer dereference
goroutine 435 [running]:
net/http.(*conn).serve.func1()
	net/http/server.go:1868 +0xb0
panic({0x104e65540?, 0x1068415d0?})
	runtime/panic.go:920 +0x26c
github.com/hashicorp/nomad/nomad/structs.(*Job).Validate(0x1400102ab40)
	github.com/hashicorp/nomad/nomad/structs/structs.go:4677 +0x1aa0
github.com/hashicorp/nomad/nomad.(*jobValidate).Validate(0x14000088f60, 0x1400102ab40)
	github.com/hashicorp/nomad/nomad/job_endpoint_hooks.go:356 +0x40
github.com/hashicorp/nomad/nomad.(*Job).admissionValidators(0x14000b1c370, 0x1029e4980?)
	github.com/hashicorp/nomad/nomad/job_endpoint_hooks.go:138 +0x10c
github.com/hashicorp/nomad/nomad.(*Job).admissionControllers(0x14000b54000?, 0x1043c6bb7?)
	github.com/hashicorp/nomad/nomad/job_endpoint_hooks.go:105 +0x44
github.com/hashicorp/nomad/nomad.(*Job).Plan(0x14000b1c370, 0x140002871a0, 0x140000d0620)
	github.com/hashicorp/nomad/nomad/job_endpoint.go:1804 +0x130
reflect.Value.call({0x14000b81620?, 0x14000089148?, 0x1052d6de0?}, {0x1043c8068, 0x4}, {0x14000dcb4e8, 0x3, 0x14000692480?})
	reflect/value.go:596 +0x994
reflect.Value.Call({0x14000b81620?, 0x14000089148?, 0x14000dcb458?}, {0x14000dcb4e8?, 0x1052681a0?, 0x104c24540?})
	reflect/value.go:380 +0x94
net/rpc.(*service).call(0x14000ae4400, 0x105434a58?, 0x140000fff00?, 0x0, 0x14000b56200, 0x10519b4e0?, {0x1052681a0?, 0x140002871a0?, 0x107787b18?}, {0x104c24540?, ...}, ...)
	net/rpc/server.go:382 +0x204
net/rpc.(*Server).ServeRequest(0x50?, {0x105434a58, 0x140000fff00})
	net/rpc/server.go:503 +0x110
github.com/hashicorp/nomad/nomad.(*Server).RPC(0x14000b54000, {0x1043da92d, 0x8}, {0x1052681a0?, 0x14000287140}, {0x104c24540?, 0x140000d05b0})
	github.com/hashicorp/nomad/nomad/server.go:1984 +0xec
github.com/hashicorp/nomad/command/agent.(*Agent).RPC(0x1400083ed20?, {0x1043da92d?, 0x140007c0000?}, {0x1052681a0?, 0x14000287140?}, {0x104c24540?, 0x140000d05b0?})
	github.com/hashicorp/nomad/command/agent/agent.go:1274 +0xcc
github.com/hashicorp/nomad/command/agent.(*HTTPServer).jobPlan(0x1400083ed20, {0x105431270, 0x140002870e0}, 0x140007c0000, {0x140009fea4c, 0x7})
	github.com/hashicorp/nomad/command/agent/job_endpoint.go:189 +0x2c8
github.com/hashicorp/nomad/command/agent.(*HTTPServer).JobSpecificRequest(0x2a53924000697828?, {0x105431270, 0x140002870e0}, 0x140007c0000)
	github.com/hashicorp/nomad/command/agent/job_endpoint.go:84 +0x440
github.com/hashicorp/nomad/command/agent.(*HTTPServer).registerHandlers.(*HTTPServer).wrap.func4({0x105431270, 0x140002870e0}, 0x140007c0000)
	github.com/hashicorp/nomad/command/agent/http.go:715 +0x100
net/http.HandlerFunc.ServeHTTP(0x140006979a8?, {0x105431270?, 0x140002870e0?}, 0x1073ec108?)
	net/http/server.go:2136 +0x38
net/http.(*ServeMux).ServeHTTP(0x10542a070?, {0x105431270, 0x140002870e0}, 0x140007c0000)
	net/http/server.go:2514 +0x144
github.com/hashicorp/nomad/command/agent.NewHTTPServers.CompressHandler.CompressHandlerLevel.func3({0x10542a070?, 0x1400021f340}, 0x140007c0000)
	github.com/gorilla/[email protected]/compress.go:141 +0x4ac
net/http.HandlerFunc.ServeHTTP(0x10?, {0x10542a070?, 0x1400021f340?}, 0x1400021f340?)
	net/http/server.go:2136 +0x38
net/http.serverHandler.ServeHTTP({0x1054220d8?}, {0x10542a070?, 0x1400021f340?}, 0x6?)
	net/http/server.go:2938 +0xbc
net/http.(*conn).serve(0x1400078e630, {0x105433f78, 0x140006b8240})
	net/http/server.go:2009 +0x518
created by net/http.(*Server).Serve in goroutine 292
	net/http/server.go:3086 +0x4cc

jrasell self-assigned this Jan 8, 2024
jrasell added the theme/jobspec and stage/accepted labels Jan 8, 2024
@shantanugadgil
Contributor Author

have been able to reproduce locally running a dev agent

Awesome, thanks. Same here: the above jobspec causes the server traceback for me too.

Looking forward to a new release !? 🙂

=======

How long before I would have to downgrade all clients to 1.6.5 as well? Or can I leave the cluster like this until 1.7.3 (hopefully with fixes)?

This bug will only impact the Nomad servers and does not impact the Nomad clients.

The above question was more from a validity perspective, where the upgrade docs say that server_version should be >= client_version.

Currently, jobs on the affected cluster seem to be running fine.
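
The ordering rule mentioned above (server_version >= client_version, as paraphrased from the upgrade docs) can be checked mechanically. Below is a rough Go sketch under stated assumptions: versions are plain major.minor.patch strings, and parseVersion and serverAtLeastClient are hypothetical helpers written for this example, not part of Nomad or any HashiCorp library.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion splits "1.7.2" (optionally prefixed with "v") into
// its three numeric components. Hypothetical helper for illustration.
func parseVersion(s string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(s, "v"), ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("expected major.minor.patch, got %q", s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("invalid component %q in %q: %w", p, s, err)
		}
		out[i] = n
	}
	return out, nil
}

// serverAtLeastClient reports whether the server version is greater than
// or equal to the client version, comparing major, then minor, then patch.
func serverAtLeastClient(server, client string) (bool, error) {
	s, err := parseVersion(server)
	if err != nil {
		return false, err
	}
	c, err := parseVersion(client)
	if err != nil {
		return false, err
	}
	for i := range s {
		if s[i] != c[i] {
			return s[i] > c[i], nil
		}
	}
	return true, nil // versions are equal
}

func main() {
	// The situation described in this thread: servers on 1.6.5, clients on 1.7.2.
	ok, err := serverAtLeastClient("1.6.5", "1.7.2")
	fmt.Println(ok, err)
}
```

For the versions in this thread the check reports false, which is the mismatch the question is about. Whether running that way is supported is a question for the upgrade docs, not something this sketch decides.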

@shantanugadgil
Contributor Author

shantanugadgil commented Jan 8, 2024

From a docs perspective, there could be a documentation bug here:

#19653

@shantanugadgil
Contributor Author

Would Nomad 1.7.3 be happening any time soon?


github-actions bot commented Jan 2, 2025

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 2, 2025