runtime error: invalid memory address or nil pointer dereference #19644

shantanugadgil · 2024-01-06T19:20:10Z

Nomad version

Nomad v1.7.2
BuildDate 2023-12-13T19:59:42Z
Revision 64e3dca

Operating system and Environment details

Amazon Linux 2 / Amazon Linux 2023

Issue

2024-01-06T19:04:30.003Z [ERROR] http: http: panic serving NN.NN.NN.NNN:39804: runtime error: invalid memory address or nil pointer dereference                              
goroutine 3299 [running]:                                                                                                                                                    
net/http.(*conn).serve.func1()                                                                                                                                               
        net/http/server.go:1868 +0xb9                                                                                                                                        
panic({0x2a60280?, 0x4f15a40?})                                                                                                                                              
        runtime/panic.go:920 +0x270                                                                                                                                          
github.com/hashicorp/nomad/nomad/structs.(*Job).Validate(0xc005aefb00)                                                                                                       
        github.com/hashicorp/nomad/nomad/structs/structs.go:4677 +0x1c60                                                                                                     
github.com/hashicorp/nomad/nomad.(*jobValidate).Validate(0xc00084ef78, 0xc005aefb00)                                                                                         
        github.com/hashicorp/nomad/nomad/job_endpoint_hooks.go:356 +0x4e                                                                                                     
github.com/hashicorp/nomad/nomad.(*Job).admissionValidators(0xc000e36460, 0x40b0b8?)                                                                                         
        github.com/hashicorp/nomad/nomad/job_endpoint_hooks.go:138 +0x13c                                                                                                    
github.com/hashicorp/nomad/nomad.(*Job).admissionControllers(0xc0006da000?, 0x3038d73?)                                                                                      
        github.com/hashicorp/nomad/nomad/job_endpoint_hooks.go:105 +0x51                                                                                                     
github.com/hashicorp/nomad/nomad.(*Job).Plan(0xc000e36460, 0xc00617c300, 0xc00417e310)                                                                                       
        github.com/hashicorp/nomad/nomad/job_endpoint.go:1804 +0x196                                                                                                         
reflect.Value.call({0xc00009f680?, 0xc00084f160?, 0x7f61c50dad98?}, {0x303a316, 0x4}, {0xc005735538, 0x3, 0x0?})                                           
        reflect/value.go:596 +0xce7                                                   
reflect.Value.Call({0xc00009f680?, 0xc00084f160?, 0x4a7435?}, {0xc005735538?, 0xc005735588?, 0xa19bc7?})                                                                                                           
        reflect/value.go:380 +0xb9                                                                       
net/rpc.(*service).call(0xc000e266c0, 0x2db8640?, 0x9?, 0x0, 0xc000643880, 0x40?, {0x2f07880?, 0xc00617c300?, 0x0?}, {0x2bd9ce0, ...}, ...)                                                                        
        net/rpc/server.go:382 +0x211                                                                     
net/rpc.(*Server).ServeRequest(0x415325?, {0x37e2038, 0xc004ae6100})                                                                                                                                               
        net/rpc/server.go:503 +0x165                                                                     
github.com/hashicorp/nomad/nomad.(*Server).RPC(0xc0006da000, {0x304de73, 0x8}, {0x2f07880?, 0xc00617c2a0}, {0x2bd9ce0?, 0xc00417e2a0})                                                                             
        github.com/hashicorp/nomad/nomad/server.go:1984 +0xeb                                                          
github.com/hashicorp/nomad/command/agent.(*Agent).RPC(0xc005fd5420?, {0x304de73?, 0xc0049e9700?}, {0x2f07880?, 0xc00617c2a0?}, {0x2bd9ce0?, 0xc00417e2a0?})                  
        github.com/hashicorp/nomad/command/agent/agent.go:1274 +0x11b                                                                                                        
github.com/hashicorp/nomad/command/agent.(*HTTPServer).jobPlan(0xc005fd5420, {0x37de6f0, 0xc00617c120}, 0xc0049e9700, {0xc006adac0c, 0x15})                                                                        
        github.com/hashicorp/nomad/command/agent/job_endpoint.go:189 +0x324                              
github.com/hashicorp/nomad/command/agent.(*HTTPServer).JobSpecificRequest(0x0?, {0x37de6f0, 0xc00617c120}, 0xc0049e9700)                                                                                                                      
        github.com/hashicorp/nomad/command/agent/job_endpoint.go:84 +0x405                                             
github.com/hashicorp/nomad/command/agent.(*HTTPServer).registerHandlers.(*HTTPServer).wrap.func4({0x37de6f0, 0xc00617c120}, 0xc0049e9700)                                                                                                     
        github.com/hashicorp/nomad/command/agent/http.go:715 +0x168                                                    
net/http.HandlerFunc.ServeHTTP(0xc000105800?, {0x37de6f0?, 0xc00617c120?}, 0xc004ad99e8?)                                                                                                                                                     
        net/http/server.go:2136 +0x29               
net/http.(*ServeMux).ServeHTTP(0x0?, {0x37de6f0, 0xc00617c120}, 0xc0049e9700)                                          
        net/http/server.go:2514 +0x142                     
github.com/hashicorp/nomad/command/agent.NewHTTPServers.CompressHandler.CompressHandlerLevel.func3({0x37d6e70?, 0xc0007c0540}, 0xc0049e9700)                                                                                                  
        github.com/gorilla/[email protected]/compress.go:141 +0x547                                                      
net/http.HandlerFunc.ServeHTTP(0x415325?, {0x37d6e70?, 0xc0007c0540?}, 0xc0007c0501?)                                  
        net/http/server.go:2136 +0x29                      
net/http.serverHandler.ServeHTTP({0x37ce378?}, {0x37d6e70?, 0xc0007c0540?}, 0x6?)                                      
        net/http/server.go:2938 +0x8e                      
net/http.(*conn).serve(0xc004429170, {0x37e1438, 0xc0067bfad0})                                                        
        net/http/server.go:2009 +0x5f4                     
created by net/http.(*Server).Serve in goroutine 347                                                                   
        net/http/server.go:3086 +0x5cb                     
2024-01-06T19:04:37.658Z [INFO]  nomad: setting up raft bolt store: no_freelist_sync=false

Reproduction steps

Submitting a particular job via Terraform (1.6.6) and the Terraform Nomad provider (2.1.0) causes the server leader to crash.

Expected Result

This should not happen. Downgrading the servers to 1.6.5 makes the error go away.

Actual Result

The server crashes and the leader changes.

Job file (if appropriate)

too big and customized to share for now.

Nomad Server logs (if appropriate)

The panic traceback has been posted above.

Nomad Client logs (if appropriate)

N/A

NOTE: This has started occurring recently.

I have already upgraded all the clients to 1.7.2.

How long before I would have to downgrade all clients to 1.6.5 as well? Or can I leave the cluster like this until 1.7.3 (hopefully with fixes)?


shantanugadgil · 2024-01-06T20:31:24Z

After applying the job without errors, I tried yet another upgrade of the servers to 1.7.2 and the issue recurred.

I am downgrading the servers again to 1.6.5 and leaving them there for now. I shall try to reproduce the issue, maybe tomorrow.

jrasell · 2024-01-08T10:00:51Z

Hi @shantanugadgil, and thanks for raising this issue, which I have been able to reproduce locally running a dev agent. It looks like this occurs due to a specific combination of parameters, which you can see within the example jobspec below. I will work on a fix for this and should have a PR ready shortly.

> How long before I would have to downgrade all clients to 1.6.5 as well? Or can I leave the cluster like this until 1.7.3 (hopefully with fixes)?

This bug will only impact the Nomad servers and does not impact the Nomad clients.

main.tf file:

terraform {
  required_providers {
    nomad = {
      source  = "hashicorp/nomad"
      version = "= 2.1.0"
    }
  }
}

provider "nomad" {
  address = "http://127.0.0.1:4646"
}

resource "nomad_job" "gh19644" {
  jobspec = file("${path.module}/gh19644.nomad.hcl")
}

gh19644.nomad.hcl file:

job "gh19644" {
  type = "system"
  group "cache" {
    max_client_disconnect      = "1h"
    prevent_reschedule_on_lost = true
    task "redis" {
      driver = "docker"
      config {
        image = "redis:7"
      }
    }
  }
}

The server panic output:

2024-01-08T09:54:34.710Z [DEBUG] http: request complete: method=PUT path=/v1/job/gh19644/plan?namespace=default duration="133.75µs"
2024-01-08T09:54:34.710Z [ERROR] http: http: panic serving 127.0.0.1:51394: runtime error: invalid memory address or nil pointer dereference
goroutine 435 [running]:
net/http.(*conn).serve.func1()
	net/http/server.go:1868 +0xb0
panic({0x104e65540?, 0x1068415d0?})
	runtime/panic.go:920 +0x26c
github.com/hashicorp/nomad/nomad/structs.(*Job).Validate(0x1400102ab40)
	github.com/hashicorp/nomad/nomad/structs/structs.go:4677 +0x1aa0
github.com/hashicorp/nomad/nomad.(*jobValidate).Validate(0x14000088f60, 0x1400102ab40)
	github.com/hashicorp/nomad/nomad/job_endpoint_hooks.go:356 +0x40
github.com/hashicorp/nomad/nomad.(*Job).admissionValidators(0x14000b1c370, 0x1029e4980?)
	github.com/hashicorp/nomad/nomad/job_endpoint_hooks.go:138 +0x10c
github.com/hashicorp/nomad/nomad.(*Job).admissionControllers(0x14000b54000?, 0x1043c6bb7?)
	github.com/hashicorp/nomad/nomad/job_endpoint_hooks.go:105 +0x44
github.com/hashicorp/nomad/nomad.(*Job).Plan(0x14000b1c370, 0x140002871a0, 0x140000d0620)
	github.com/hashicorp/nomad/nomad/job_endpoint.go:1804 +0x130
reflect.Value.call({0x14000b81620?, 0x14000089148?, 0x1052d6de0?}, {0x1043c8068, 0x4}, {0x14000dcb4e8, 0x3, 0x14000692480?})
	reflect/value.go:596 +0x994
reflect.Value.Call({0x14000b81620?, 0x14000089148?, 0x14000dcb458?}, {0x14000dcb4e8?, 0x1052681a0?, 0x104c24540?})
	reflect/value.go:380 +0x94
net/rpc.(*service).call(0x14000ae4400, 0x105434a58?, 0x140000fff00?, 0x0, 0x14000b56200, 0x10519b4e0?, {0x1052681a0?, 0x140002871a0?, 0x107787b18?}, {0x104c24540?, ...}, ...)
	net/rpc/server.go:382 +0x204
net/rpc.(*Server).ServeRequest(0x50?, {0x105434a58, 0x140000fff00})
	net/rpc/server.go:503 +0x110
github.com/hashicorp/nomad/nomad.(*Server).RPC(0x14000b54000, {0x1043da92d, 0x8}, {0x1052681a0?, 0x14000287140}, {0x104c24540?, 0x140000d05b0})
	github.com/hashicorp/nomad/nomad/server.go:1984 +0xec
github.com/hashicorp/nomad/command/agent.(*Agent).RPC(0x1400083ed20?, {0x1043da92d?, 0x140007c0000?}, {0x1052681a0?, 0x14000287140?}, {0x104c24540?, 0x140000d05b0?})
	github.com/hashicorp/nomad/command/agent/agent.go:1274 +0xcc
github.com/hashicorp/nomad/command/agent.(*HTTPServer).jobPlan(0x1400083ed20, {0x105431270, 0x140002870e0}, 0x140007c0000, {0x140009fea4c, 0x7})
	github.com/hashicorp/nomad/command/agent/job_endpoint.go:189 +0x2c8
github.com/hashicorp/nomad/command/agent.(*HTTPServer).JobSpecificRequest(0x2a53924000697828?, {0x105431270, 0x140002870e0}, 0x140007c0000)
	github.com/hashicorp/nomad/command/agent/job_endpoint.go:84 +0x440
github.com/hashicorp/nomad/command/agent.(*HTTPServer).registerHandlers.(*HTTPServer).wrap.func4({0x105431270, 0x140002870e0}, 0x140007c0000)
	github.com/hashicorp/nomad/command/agent/http.go:715 +0x100
net/http.HandlerFunc.ServeHTTP(0x140006979a8?, {0x105431270?, 0x140002870e0?}, 0x1073ec108?)
	net/http/server.go:2136 +0x38
net/http.(*ServeMux).ServeHTTP(0x10542a070?, {0x105431270, 0x140002870e0}, 0x140007c0000)
	net/http/server.go:2514 +0x144
github.com/hashicorp/nomad/command/agent.NewHTTPServers.CompressHandler.CompressHandlerLevel.func3({0x10542a070?, 0x1400021f340}, 0x140007c0000)
	github.com/gorilla/[email protected]/compress.go:141 +0x4ac
net/http.HandlerFunc.ServeHTTP(0x10?, {0x10542a070?, 0x1400021f340?}, 0x1400021f340?)
	net/http/server.go:2136 +0x38
net/http.serverHandler.ServeHTTP({0x1054220d8?}, {0x10542a070?, 0x1400021f340?}, 0x6?)
	net/http/server.go:2938 +0xbc
net/http.(*conn).serve(0x1400078e630, {0x105433f78, 0x140006b8240})
	net/http/server.go:2009 +0x518
created by net/http.(*Server).Serve in goroutine 292
	net/http/server.go:3086 +0x4cc
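
Both traces end in `(*Job).Validate`, and the reproduction jobspec sets `prevent_reschedule_on_lost` on a `system` job that has no `reschedule` block. The sketch below is one plausible shape of that failure, not Nomad's actual code: the `TaskGroup` and `ReschedulePolicy` types, both validate functions, and the assumption that the policy pointer is nil for this job type are all hypothetical and exist only to illustrate how an unguarded field read produces this panic and how a nil check avoids it.

```go
package main

import (
	"errors"
	"fmt"
)

// ReschedulePolicy and TaskGroup are hypothetical, trimmed-down stand-ins
// for Nomad's job structs; they are not the real definitions.
type ReschedulePolicy struct {
	Attempts  int
	Unlimited bool
}

type TaskGroup struct {
	Name                    string
	PreventRescheduleOnLost bool
	// Assumption: nil when the job type carries no reschedule block.
	ReschedulePolicy *ReschedulePolicy
}

var errConflict = errors.New("prevent_reschedule_on_lost conflicts with an active reschedule policy")

// validateUnguarded reads the policy without a nil check, so a task group
// with ReschedulePolicy == nil triggers a nil pointer dereference.
func validateUnguarded(tg *TaskGroup) error {
	if tg.PreventRescheduleOnLost &&
		(tg.ReschedulePolicy.Attempts > 0 || tg.ReschedulePolicy.Unlimited) {
		return errConflict
	}
	return nil
}

// validateGuarded performs the same check but skips it when no policy is set.
func validateGuarded(tg *TaskGroup) error {
	if tg.PreventRescheduleOnLost && tg.ReschedulePolicy != nil &&
		(tg.ReschedulePolicy.Attempts > 0 || tg.ReschedulePolicy.Unlimited) {
		return errConflict
	}
	return nil
}

// panics reports whether calling f panics.
func panics(f func()) (didPanic bool) {
	defer func() {
		if r := recover(); r != nil {
			didPanic = true
		}
	}()
	f()
	return false
}

func main() {
	// Mirrors the reproduction jobspec: the flag is set, no reschedule block exists.
	tg := &TaskGroup{Name: "cache", PreventRescheduleOnLost: true}

	fmt.Println("unguarded panics:", panics(func() { _ = validateUnguarded(tg) }))
	fmt.Println("guarded error:", validateGuarded(tg))
}
```

Run as-is, the unguarded variant panics with the same runtime error seen in the logs, while the guarded variant returns no error for the nil-policy group and still rejects a group that combines the flag with an active policy.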

shantanugadgil · 2024-01-08T10:20:34Z

> have been able to reproduce locally running a dev agent

Awesome, thanks. The above jobspec causes the same server traceback for me too.

Looking forward to a new release !? 🙂

=======

> How long before I would have to downgrade all clients to 1.6.5 as well? Or can I leave the cluster like this until 1.7.3 (hopefully with fixes)?

> This bug will only impact the Nomad servers and does not impact the Nomad clients.

The above question was more from a validity perspective, since the upgrade docs say that server_version should be >= client_version.

Currently, jobs on the affected cluster seem to be running fine.

shantanugadgil · 2024-01-08T10:33:05Z

From a docs perspective, there could be a doc bug here:

#19653

shantanugadgil · 2024-01-08T18:27:48Z

Would Nomad 1.7.3 be happening any time soon?

github-actions · 2025-01-02T02:15:37Z

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

shantanugadgil added the type/bug label Jan 6, 2024

jrasell self-assigned this Jan 8, 2024

jrasell added the theme/jobspec and stage/accepted labels Jan 8, 2024

jrasell mentioned this issue Jan 8, 2024

server: Fix panic when validating non-service reschedule block. #19652

Merged

jrasell closed this as completed in #19652 Jan 8, 2024

hc-github-team-nomad-core mentioned this issue Jan 8, 2024

Backport of server: Fix panic when validating non-service reschedule block. into release/1.7.x #19656

Merged

tgross added this to Nomad - Community Issues Triage Jun 24, 2024

tgross moved this to Done in Nomad - Community Issues Triage Jun 24, 2024

github-actions bot locked as resolved and limited conversation to collaborators Jan 2, 2025
