[CSI] Error deregistering volume when jobs are recycled #7625

Closed
chenjpu opened this issue Apr 4, 2020 · 7 comments

Comments

@chenjpu

chenjpu commented Apr 4, 2020

Nomad version

nomad 0.11.0-beta2

Operating system and Environment details

CentOS
hostpath CSI driver

Issue

After stopping the job and running the nomad system gc command, accessing the mysql0 volume through the UI shows an error.
1. Console output:

# nomad volume status mysql0
ID                   = mysql0
Name                 = mysql0
External ID          = 45b58ca3-7668-11ea-88fd-0242ac110004
Plugin ID            = csi-hostpath
Provider             = hostpath.csi.k8s.io
Version              = v1.4.0-rc2-10-g4129e73
Schedulable          = true
Controllers Healthy  = 3
Controllers Expected = 3
Nodes Healthy        = 3
Nodes Expected       = 3
Access Mode          = single-node-writer
Attachment Mode      = file-system
Mount Options        = <none>
Namespace            = default

Allocations
No allocations placed
# nomad volume deregister mysql0
Error deregistering volume: Unexpected response code: 500 (volume in use: mysql0)
@chenjpu
Author

chenjpu commented Apr 4, 2020

curl http://127.0.0.1:4466/v1/volume/csi/mysql0

{
	"AccessMode": "single-node-writer",
	"AttachmentMode": "file-system",
	"ControllerRequired": false,
	"ControllersExpected": 3,
	"ControllersHealthy": 3,
	"CreateIndex": 19349,
	"ExternalID": "45b58ca3-7668-11ea-88fd-0242ac110004",
	"ID": "mysql0",
	"ModifyIndex": 19465,
	"MountOptions": null,
	"Name": "mysql0",
	"Namespace": "default",
	"NodesExpected": 3,
	"NodesHealthy": 3,
	"PluginID": "csi-hostpath",
	"Provider": "hostpath.csi.k8s.io",
	"ProviderVersion": "v1.4.0-rc2-10-g4129e73",
	"ReadAllocs": {},
	"ResourceExhausted": null,
	"Schedulable": true,
	"Topologies": [],
	"WriteAllocs": {
		"ae433700-3be8-7d07-40b1-fc62bf432cb0": null
	}
}

Alloc ae433700-3be8-7d07-40b1-fc62bf432cb0, listed under WriteAllocs, does not exist.
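To spot this kind of dangling claim programmatically, one could decode the volume JSON returned by the endpoint above and compare the IDs in ReadAllocs and WriteAllocs against the set of allocations known to exist. This is a hypothetical sketch: the volumeClaims struct decodes only the claim fields, and liveAllocs stands in for whatever list of current allocation IDs you obtain separately; neither is part of Nomad's API.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// volumeClaims is a hypothetical, trimmed-down view of the CSI volume JSON.
// Only the claim maps are decoded; their values are ignored because the API
// returned null for them in this report.
type volumeClaims struct {
	ID          string                 `json:"ID"`
	ReadAllocs  map[string]interface{} `json:"ReadAllocs"`
	WriteAllocs map[string]interface{} `json:"WriteAllocs"`
}

// staleClaims returns, sorted, the alloc IDs claimed by the volume that are
// not present in liveAllocs.
func staleClaims(raw []byte, liveAllocs map[string]bool) ([]string, error) {
	var v volumeClaims
	if err := json.Unmarshal(raw, &v); err != nil {
		return nil, err
	}
	var stale []string
	for _, claims := range []map[string]interface{}{v.ReadAllocs, v.WriteAllocs} {
		for id := range claims {
			if !liveAllocs[id] {
				stale = append(stale, id)
			}
		}
	}
	sort.Strings(stale)
	return stale, nil
}

func main() {
	raw := []byte(`{"ID":"mysql0","ReadAllocs":{},"WriteAllocs":{"ae433700-3be8-7d07-40b1-fc62bf432cb0":null}}`)
	// Empty set: no allocation with that ID exists, matching the report above.
	stale, err := staleClaims(raw, map[string]bool{})
	if err != nil {
		panic(err)
	}
	fmt.Println(stale) // [ae433700-3be8-7d07-40b1-fc62bf432cb0]
}
```

A null claim value still leaves its key in the decoded map, which is why the dangling ID is reported even though nothing is attached to it.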

@tgross
Member

tgross commented Apr 4, 2020

Looks like you're probably running into a case like #7605. That'll be fixed in the 0.11.0-rc1 release going out early next week.

@chenjpu
Author

chenjpu commented Apr 6, 2020

@tgross An invalid memory address error (nil pointer dereference) occurs at this line of code when the corresponding alloc object does not exist: feasible.go#L307
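Assuming the crash is a nil pointer dereference on a missing alloc (as the later mention of an NPE suggests), the general Go pattern is a lookup that returns a nil pointer for a garbage-collected record, followed by a field access without a nil check. This is not Nomad's feasible.go code; Alloc, lookupAlloc, and claimsOnNode are hypothetical stand-ins that show the guard: skip and report claims whose allocation can no longer be found instead of dereferencing nil.

```go
package main

import "fmt"

// Alloc is a hypothetical, minimal allocation record.
type Alloc struct {
	ID     string
	NodeID string
}

// lookupAlloc mimics a state lookup that returns (nil, nil) when the
// allocation has been garbage collected, rather than returning an error.
func lookupAlloc(allocsByID map[string]*Alloc, id string) (*Alloc, error) {
	return allocsByID[id], nil
}

// claimsOnNode counts the write claims held by allocations on nodeID.
// Claims whose allocation no longer exists are collected in missing
// instead of triggering a nil pointer dereference on a.NodeID.
func claimsOnNode(writeClaims []string, allocsByID map[string]*Alloc, nodeID string) (int, []string) {
	count := 0
	var missing []string
	for _, id := range writeClaims {
		a, err := lookupAlloc(allocsByID, id)
		if err != nil || a == nil {
			missing = append(missing, id)
			continue
		}
		if a.NodeID == nodeID {
			count++
		}
	}
	return count, missing
}

func main() {
	allocs := map[string]*Alloc{
		"a1": {ID: "a1", NodeID: "node-1"},
	}
	n, missing := claimsOnNode([]string{"a1", "gone"}, allocs, "node-1")
	fmt.Println(n, missing) // 1 [gone]
}
```

Returning the missing IDs, rather than silently dropping them, leaves the caller free to decide whether to release those claims.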

@tgross
Member

tgross commented Apr 6, 2020

Thanks @chenjpu! Fixed in #7633

@tgross tgross closed this as completed Apr 6, 2020
@chenjpu
Author

chenjpu commented Apr 7, 2020

@tgross
If this method loads non-existent alloc objects, the volume ends up in an abnormal state and can neither be used nor deregistered.

This error seems to be related to the NPE above.
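One way to picture why the volume gets stuck: if an "in use" check only counts entries in the claim maps, a claim left behind by a garbage-collected allocation keeps the volume "in use", so deregistration fails even though nomad volume status lists no allocations. This is a hypothetical sketch, not Nomad's implementation; volume, inUse, pruneStaleClaims, and deregister are illustrative names. It shows how pruning claims whose allocation no longer exists would unblock deregistration.

```go
package main

import "fmt"

// volume is a hypothetical, minimal CSI volume with claim maps keyed by alloc ID.
type volume struct {
	ID          string
	ReadAllocs  map[string]bool
	WriteAllocs map[string]bool
}

// inUse reports whether any claim entries remain, regardless of whether
// the claiming allocation still exists.
func (v *volume) inUse() bool {
	return len(v.ReadAllocs) > 0 || len(v.WriteAllocs) > 0
}

// pruneStaleClaims drops claims whose allocation is not in liveAllocs.
// Deleting map entries while ranging over the map is safe in Go.
func (v *volume) pruneStaleClaims(liveAllocs map[string]bool) {
	for _, claims := range []map[string]bool{v.ReadAllocs, v.WriteAllocs} {
		for id := range claims {
			if !liveAllocs[id] {
				delete(claims, id)
			}
		}
	}
}

// deregister refuses to remove a volume that still has claims.
func deregister(v *volume) error {
	if v.inUse() {
		return fmt.Errorf("volume in use: %s", v.ID)
	}
	return nil
}

func main() {
	v := &volume{
		ID:          "mysql0",
		ReadAllocs:  map[string]bool{},
		WriteAllocs: map[string]bool{"ae433700-3be8-7d07-40b1-fc62bf432cb0": true},
	}
	fmt.Println(deregister(v)) // volume in use: mysql0

	v.pruneStaleClaims(map[string]bool{}) // no live allocations remain
	fmt.Println(deregister(v))            // <nil>
}
```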

@chenjpu
Author

chenjpu commented Apr 7, 2020

Not sure why a non-existent alloc ID appears in the CSIVolume. :(

@github-actions

github-actions bot commented Nov 9, 2022

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 9, 2022