Can't delete disk pools and failed replicas do not rebuild. #1744
For node 4, it seems the pool is deadlocked. Do you have any older logs so we could identify the cause?
There's no pool on /dev/nvme2n1. I suggest you use stable device links, e.g. /dev/disk/by-id/: https://openebs.io/docs/user-guides/replicated-storage-user-guide/replicated-pv-mayastor/rs-configuration Then on /dev/nvme3n1 we seem to have got stuck, but I think I now get what's going on, see:
And then pool7 is using the same device:
Although I'm not sure why we get EBUSY here; it should have returned an error saying another pool exists on the same device. This is likely a bug.
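For reference, a DiskPool spec pointing at a stable by-id link could look roughly like this. This is a minimal sketch only: the pool name, namespace, apiVersion and device serial are placeholders and need to match your cluster and Mayastor version.

```shell
kubectl apply -f - <<'EOF'
apiVersion: "openebs.io/v1beta2"      # check which DiskPool CRD version your install ships
kind: DiskPool
metadata:
  name: k8s-node-7-nvme3n1            # placeholder pool name
  namespace: mayastor                 # placeholder namespace
spec:
  node: k8s-node-7                    # placeholder node name
  disks: ["/dev/disk/by-id/nvme-<serial>"]   # stable link instead of /dev/nvme3n1
EOF
```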
Unfortunately I do not have older logs.

The story about node 7 /dev/nvme3n1: that disk errored out with "Illegal byte sequence". As it was not usable, I deleted the CRD and then created a new one, but due to a copy+paste mistake it got the wrong resource name. As you see, the device is online, but the previous entry (k8s-node-7-nvme3n1) was not removed after I deleted the CRD; it is stuck there.

Node 7 /dev/nvme2n1: the disk was repurposed and reformatted by accident. Once we figured that out, we wanted to remove it from the pool. We expected that as soon as it was removed, Mayastor would rebalance and the data that was there would be re-distributed across the active pools.

k8s-node-4-nvme3n1-1 and k8s-node-4-nvme3n1: k8s-node-4-nvme3n1 was failing and we did an nvme wipe. I was not able to bring it back with the same name (the CRD with that id was no longer there after I deleted it, but when I tried to re-use the name it errored out), so we chose a new name.

Now I have those 3 entries stuck and I can't get rid of them, and the disk replicas associated with those entries are still there. Is there an API call, or a direct database edit, I can do to make them go away so that the replicas assigned to those disks get re-distributed across the live pools?

Maybe it is worth mentioning that all of this happened on Mayastor v2.2.0. As we couldn't resolve the issue, I was hoping the upgrade to 2.6.1 would help.
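Not an official procedure, but a generic way to see why a DiskPool deletion hangs is to dump the custom resource and check its deletionTimestamp, finalizers and status. A sketch, assuming the default mayastor namespace and the pool names reported in this issue:

```shell
# List the DiskPool resources the cluster still knows about
kubectl -n mayastor get diskpools

# Dump one stuck entry; .metadata.deletionTimestamp and .metadata.finalizers show
# whether the delete was accepted and what is still holding the resource,
# and .status shows what the control plane last reported for the pool
kubectl -n mayastor get diskpool k8s-node-7-nvme3n1 -o yaml
```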
I have pools stuck like:
k8s-node-6-nvme4n1 aio:///dev/nvme4n1?uuid=5f08abc6-6924-446e-997b-24685c967cc2 true k8s-node-6 Online 1.7TiB 1.6TiB 171.7GiB 1.6TiB
k8s-node-4-nvme3n1 /dev/nvme3n1 true k8s-node-4 Unknown 0 B 0 B 0 B
k8s-node-7-nvme2n1 /dev/nvme2n1 true k8s-node-7 Unknown 0 B 0 B 0 B
k8s-node-7-nvme3n1 /dev/nvme3n1 true k8s-node-7 Unknown 0 B 0 B 0 B
The last 3 items are stuck. I deleted the CRDs, but that does not seem to help.
Then I noticed that there are bad replicas which reference those pools (maybe that is why the pools are not being removed):
ec945ba5-b62f-43c9-8bd8-bbbadc81c7a7 861d034a-2ede-4a68-a925-837659541710 k8s-node-7 k8s-node-7-nvme3n1 Unknown
└─ 1a25f0a1-40f8-44c5-9f20-3f2949924b7c k8s-node-8 k8s-node-8-nvme3n1 Online 10GiB 10GiB 0 B
mayastor-2024-09-25--17-51-42-UTC.tar.gz
So, here are two questions: how do I get rid of the bad pools, and how do I force the replicas to be re-allocated/rebuilt?
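For the rebuild side, what the docs generally point to is scaling the volume's replica count with the kubectl-mayastor plugin so the control plane places a new replica on a healthy pool. A sketch, assuming your plugin version has the volume-replica-topology and scale subcommands, and that the first column in the replica listing above is the volume ID; the target count of 3 is just an example:

```shell
# Show where the volume's replicas currently live
kubectl mayastor get volume-replica-topology ec945ba5-b62f-43c9-8bd8-bbbadc81c7a7

# Bump the replica count so a fresh replica is created on a healthy pool
# (it can be scaled back down afterwards)
kubectl mayastor scale volume ec945ba5-b62f-43c9-8bd8-bbbadc81c7a7 3
```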