
ProvisioningFailed - IntegrityError: UNIQUE constraint failed: #8

Closed
jp83 opened this issue Mar 5, 2020 · 13 comments

Comments

@jp83

jp83 commented Mar 5, 2020

I've been looking forward to trying this out for a while in my homelab, to unify all my storage on FreeNAS and take care of the various apps relying on SQLite that don't do well over NFS. Anyway, I haven't dug into this a whole lot; I'm just hoping you can point me in the right direction, or tell me if you've moved on past FreeNAS 11.1-U7, since I've seen other comments about things changing in 11.2 and 11.3.

On the FreeNAS side I see that it created the zvol, iSCSI target, extent, and associated target. From the Rancher GUI I see the storage class but no associated PV or PVC. I only see this error in the events when I apply the test pod. I'm guessing FreeNAS isn't returning the correct data:

Warning ProvisioningFailed Failed to provision volume with StorageClass "freenas-iscsi": Error creating targetgroup for {ID:0 Target:2 Authgroup:0 Authtype:None Initialdigest:Auto Initiatorgroup:1 Portalgroup:1} - message: {"error_message":"UNIQUE constraint failed: services_iscsitargetgroups.iscsi_target_id, services_iscsitargetgroups.iscsi_target_portalgroup_id","traceback":"Traceback (most recent call last):\n\n File "/usr/local/lib/python3.6/site-packages/django/db/backends/utils.py", line 64, in execute\n return self.cursor.execute(sql, params)\n\n File "./freenasUI/freeadmin/sqlite3_ha/base.py", line 412, in execute\n execute = self.locked_retry(Database.Cursor.execute, query, params)\n\n File "./freenasUI/freeadmin/sqlite3_ha/base.py", line 389, in locked_retry\n rv = method(self, *args, **kwargs)\n\nsqlite3.IntegrityError: UNIQUE constraint failed: services_iscsitargetgroups.iscsi_target_id, services_iscsitargetgroups.iscsi_target_portalgroup_id\n\n\nThe above exception was the direct cause of the following exception:\n\n\nTraceback (most recent call last):\n\n File "/usr/local/lib/python3.6/site-packages/tastypie/resources.py", line 219, in wrapper\n response = callback(request, *args, **kwargs)\n\n File "/usr/local/lib/python3.6/site-packages/tastypie/resources.py", line 450, in dispatch_list\n return self.dispatch('list', request, **kwargs)\n\n File "./freenasUI/api/utils.py", line 247, in dispatch\n request_type, request, *args, **kwargs\n\n File "/usr/local/lib/python3.6/site-packages/tastypie/resources.py", line 482, in dispatch\n response = method(request, **kwargs)\n\n File "/usr/local/lib/python3.6/site-packages/tastypie/resources.py", line 1384, in post_list\n updated_bundle = self.obj_create(bundle, **self.remove_api_resource_names(kwargs))\n\n File "/usr/local/lib/python3.6/site-packages/tastypie/resources.py", line 2175, in obj_create\n return self.save(bundle)\n\n File "./freenasUI/api/utils.py", line 410, in save\n form.save()\n\n File "/usr/local/lib/python3.6/site-packages/django/forms/models.py", line 453, in save\n self.instance.save()\n\n File "/usr/local/lib/python3.6/site-packages/django/db/models/base.py", line 796, in save\n force_update=force_update, update_fields=update_fields)\n\n File "/usr/local/lib/python3.6/site-packages/django/db/models/base.py", line 824, in save_base\n updated = self._save_table(raw, cls, force_insert, force_update, using, update_fields)\n\n File "/usr/local/lib/python3.6/site-packages/django/db/models/base.py", line 908, in _save_table\n result = self._do_insert(cls._base_manager, using, fields, update_pk, raw)\n\n File "/usr/local/lib/python3.6/site-packages/django/db/models/base.py", line 947, in _do_insert\n using=using, raw=raw)\n\n File "/usr/local/lib/python3.6/site-packages/django/db/models/manager.py", line 85, in manager_method\n return getattr(self.get_queryset(), name)(*args, **kwargs)\n\n File "/usr/local/lib/python3.6/site-packages/django/db/models/query.py", line 1045, in _insert\n return query.get_compiler(using=using).execute_sql(return_id)\n\n File "/usr/local/lib/python3.6/site-packages/django/db/models/sql/compiler.py", line 1054, in execute_sql\n cursor.execute(sql, params)\n\n File "/usr/local/lib/python3.6/site-packages/django/db/backends/utils.py", line 64, in execute\n return self.cursor.execute(sql, params)\n\n File "/usr/local/lib/python3.6/site-packages/django/db/utils.py", line 94, in exit\n six.reraise(dj_exc_type, dj_exc_value, traceback)\n\n File 
"/usr/local/lib/python3.6/site-packages/django/utils/six.py", line 685, in reraise\n raise value.with_traceback(tb)\n\n File "/usr/local/lib/python3.6/site-packages/django/db/backends/utils.py", line 64, in execute\n return self.cursor.execute(sql, params)\n\n File "./freenasUI/freeadmin/sqlite3_ha/base.py", line 412, in execute\n execute = self.locked_retry(Database.Cursor.execute, query, params)\n\n File "./freenasUI/freeadmin/sqlite3_ha/base.py", line 389, in locked_retry\n rv = method(self, *args, **kwargs)\n\ndjango.db.utils.IntegrityError: UNIQUE constraint failed: services_iscsitargetgroups.iscsi_target_id, services_iscsitargetgroups.iscsi_target_portalgroup_id\n"}, status: 500

I also just looked at the provisioner in kube-system and it's in CrashLoopBackOff with similar messages. Log attached.
container.log

@travisghansen
Owner

Are you just starting out with this? Or do you already have stuff provisioned?

@jp83
Author

jp83 commented Mar 5, 2020

Just starting, why? Better options?

@travisghansen
Owner

Yes, my next gen version of this is available over here:

I guess I should know what version of k8s you're on and what distro of k8s you use as well.

To address the actual issue: I'm guessing you'll hit the same problem there. I've seen some weird database issues with FreeNAS before (nothing to do with either provisioner) where invalid entries weren't cleaned up properly, perhaps even from using the UI. They could have stemmed from 11.1 like you're using, but I'm not entirely sure. In any case, if you can run the other project let's get you moved over to that, and I'll send some info in a bit on how I manually cleaned up the DB in the past.

@jp83
Author

jp83 commented Mar 5, 2020

Cool, appreciate the response, and I'm honestly open to suggestions. A bit of background: I'm currently running "all-in-one" with FreeNAS virtualized (HBA passthrough) on a single ESXi (free license) host. My lab mostly runs home prod stuff (Plex and the like, home automation, etc.). The VMs are running RancherOS and I used Rancher to set up the k8s cluster; I was on 1.13, just upgraded everything to 1.15, and will keep working up. If I can figure out the best re-architecture of my storage I'll likely rebuild and just copy data over.

Currently I've been playing with and have OpenEBS running (messed with cStor and then went back to the jiva default). Even though I'm just trying to model a real setup, the problem is that having 3 replicas on the same hardware gives redundancy < 1, since it needs at least 2 replicas up for R/W access. I really liked the idea of bare-metal k8s, but I don't trust myself with the storage, and otherwise I don't feel like my prior 12-bay R510 or new R720xd is fully utilized if it's only used for storage. I toyed with extra servers (don't really need the power or expense), also vSAN through VMUG, rook-ceph, etc., but I keep coming back to the simplicity of FreeNAS, with the benefits of snapshots and easy remote backup through zfs send. With decent hardware and backups it seems reliable enough for my needs, despite my fascination with HA.

I use HDDs for bulk storage (media) and NVMe for local VM datastores (mostly to bootstrap the FreeNAS VM and the k8s nodes). I could dedicate at least a couple of NVMe drives passed through for fast storage (likely looped back for a couple of other VMs), or keep using virtual disks to play around and just make sure they're spread across different drives.

@travisghansen
Owner

Wow! Not messing around :)

Well, you're at a new enough k8s version to use csi, but you may have issues with iSCSI given containerized k8s. See this: rancher/rke#1846

I'm not sure if it's possible to achieve the fix using rancher directly with rancher os so let me know what you think.

The new project supports all the fancy new stuff like snapshots etc, but snapshot support recently went beta and changed things around a bit from the alpha support I programmed for, and I haven't updated to support the new stuff. It requires 1.17 anyhow, I believe. It will do snapshots to another pool on the same host, but I'm considering adding support for remote snapshots, which it sounds like might be of interest to you.

I'll give some notes on the DB issue with FreeNAS when I'm not just on my phone.

@travisghansen
Owner

travisghansen commented Mar 6, 2020

Ok, so DON'T do this exactly (or possibly even at all). However, I was constantly getting some crazy error responses from the API, and it turned out to be that exact table that was messed up. I was getting some errors (don't recall exactly what) but the DB had entries that were not visible in the UI... I found it strange, so I started digging into the DB directly, and that's when I found the bogus rows. I was in a position where I could just wipe stuff and start over completely clean:

# stop services to work on the db directly
/usr/local/etc/rc.d/django stop
/usr/local/etc/rc.d/nginx stop

# wipe tables
sqlite3 /data/freenas-v1.db
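-- (the DELETE/UPDATE statements below are entered at the sqlite> prompt; .quit exits back to the shell)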
DELETE from services_iscsitargetgroups;
DELETE from services_iscsitargettoextent;

# reset SEQ
UPDATE SQLITE_SEQUENCE SET SEQ=0 WHERE NAME='services_iscsitargettoextent';

# restart services
/usr/local/etc/rc.d/django restart
/usr/local/etc/rc.d/nginx restart

I'm not entirely sure when/how the DB got the invalid entries in there, so a different course of action may be required (I wouldn't update to 11.3 FYI, not because of the API issue but just generally I've seen several comments saying it's not ready).

@travisghansen
Owner

Any more luck with this one?

@jp83
Author

jp83 commented Mar 21, 2020

I was gone for spring break and busy. Just documenting here for reference. I had another test target from a Windows machine, so maybe that messed up the table. I looked into the sqlite DB but only see one entry after deleting the others from the web GUI:

sqlite> SELECT * from services_iscsitargetgroups;
1|Auto|1||1|None|1
sqlite> SELECT * from services_iscsitargettoextent;
1|1||1

I guess the error is complaining about uniqueness, so it's probably more related to the index.

Here's the schema:

sqlite> .schema services_iscsitargetgroups
CREATE TABLE IF NOT EXISTS "services_iscsitargetgroups" ("iscsi_target_portalgroup_id" integer NOT NULL, "iscsi_target_initialdigest" varchar(120) NOT NULL, "iscsi_target_id" integer NOT NULL, "iscsi_target_authgroup" integer, "iscsi_target_initiatorgroup_id" integer, "iscsi_target_authtype" varchar(120) NOT NULL, "id" integer PRIMARY KEY);
CREATE INDEX "services_iscsitargetgroups_39e2d7df" ON "services_iscsitargetgroups"("iscsi_target_initiatorgroup_id");
CREATE INDEX "services_iscsitargetgroups_dcc120ea" ON "services_iscsitargetgroups"("iscsi_target_portalgroup_id");
CREATE INDEX "services_iscsitargetgroups_c939c4d7" ON "services_iscsitargetgroups"("iscsi_target_id");
CREATE UNIQUE INDEX "services_iscsitargetgroups_iscsi_target_id__iscsi_target_portalgroup_id" ON "services_iscsitargetgroups"("iscsi_target_id", "iscsi_target_portalgroup_id");

And the index(es) aren't empty:

sqlite> PRAGMA index_list('services_iscsitargetgroups');
0|services_iscsitargetgroups_iscsi_target_id__iscsi_target_portalgroup_id|1|c|0
1|services_iscsitargetgroups_c939c4d7|0|c|0
2|services_iscsitargetgroups_dcc120ea|0|c|0
3|services_iscsitargetgroups_39e2d7df|0|c|0
sqlite> PRAGMA index_list('services_iscsitargettoextent');
0|services_iscsitargettoextent_iscsi_target_id__iscsi_extent_id|1|c|0
1|services_iscsitargettoextent_c939c4d7|0|c|0

Anyway, I started up another fresh test instance of FreeNAS 11.1-U7 that I can easily blow away if needed. I deleted the storage class and updated the secret to point to the other server. I thought I had missed something simple, but then realized the iSCSI service doesn't really start if there isn't already a valid target (which I had inadvertently created before with my Windows test).

Mar 21 09:54:09 freenas ctld[16753]: portal-group "pg1" not assigned to any target

So I created another test target, extent, and associated target; then I can see a new PVC created (again, like before), but I still have issues with the dynamic provisioner and it ends up in CrashLoopBackOff.

I0321 15:23:24.993243 1 controller.go:615] Starting provisioner controller e84693f2-6b87-11ea-a445-76654d6ff16c!
I0321 15:23:24.993334 1 controller.go:652] Started provisioner controller e84693f2-6b87-11ea-a445-76654d6ff16c!
I0321 15:23:25.011585 1 leaderelection.go:156] attempting to acquire leader lease...
I0321 15:23:41.528627 1 leaderelection.go:178] successfully acquired lease to provision for pvc default/freenas-test-iscsi-pvc
I0321 15:23:41.535038 1 event.go:221] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"freenas-test-iscsi-pvc", UID:"4176cb57-677b-4e5a-ab7f-90e246f1a79a", APIVersion:"v1", ResourceVersion:"77825912", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "default/freenas-test-iscsi-pvc"
I0321 15:23:41.595827 1 provisioner.go:401] Creating target: "pvc-4176cb57-677b-4e5a-ab7f-90e246f1a79a", zvol: "vol1/k8s/pvc-4176cb57-677b-4e5a-ab7f-90e246f1a79a", extent: "pvc-4176cb57-677b-4e5a-ab7f-90e246f1a79a"
I0321 15:23:41.707123 1 provisioner.go:424] Zvol vol1/k8s/pvc-4176cb57-677b-4e5a-ab7f-90e246f1a79a already exists
I0321 15:23:41.753682 1 target.go:74] found Target name: pvc-4176cb57-677b-4e5a-ab7f-90e246f1a79a - {ID:2 Name:pvc-4176cb57-677b-4e5a-ab7f-90e246f1a79a Alias: Mode:iscsi}
I0321 15:23:41.789357 1 provisioner.go:473] failed attempt to create TargetGroup 500
E0321 15:23:43.212317 1 controller.go:1074] Failed to provision volume for claim "default/freenas-test-iscsi-pvc" with StorageClass "freenas-iscsi": Error creating targetgroup for {ID:0 Target:2 Authgroup:0 Authtype:None Initialdigest:Auto Initiatorgroup:1 Portalgroup:1} - message: {"error_message":"UNIQUE constraint failed: services_iscsitargetgroups.iscsi_target_id, services_iscsitargetgroups.iscsi_target_portalgroup_id","traceback":"Traceback (most recent call last):\n\n File \"/usr/local/lib/python3.6/site-packages/django/db/backends/utils.py\", line 64, in execute\n return self.cursor.execute(sql, params)\n\n File \"./freenasUI/freeadmin/sqlite3_ha/base.py\", line 412, in execute\n execute = self.locked_retry(Database.Cursor.execute, query, params)\n\n File \"./freenasUI/freeadmin/sqlite3_ha/base.py\", line 389, in locked_retry\n rv = method(self, *args, **kwargs)\n\nsqlite3.IntegrityError: UNIQUE constraint failed: services_iscsitargetgroups.iscsi_target_id, services_iscsitargetgroups.iscsi_target_portalgroup_id\n\n\nThe above exception was the direct cause of the following exception:\n\n\nTraceback (most recent call last):\n\n File \"/usr/local/lib/python3.6/site-packages/tastypie/resources.py\", line 219, in wrapper\n response = callback(request, *args, **kwargs)\n\n File \"/usr/local/lib/python3.6/site-packages/tastypie/resources.py\", line 450, in dispatch_list\n return self.dispatch('list', request, **kwargs)\n\n File \"./freenasUI/api/utils.py\", line 247, in dispatch\n request_type, request, *args, **kwargs\n\n File \"/usr/local/lib/python3.6/site-packages/tastypie/resources.py\", line 482, in dispatch\n response = method(request, **kwargs)\n\n File \"/usr/local/lib/python3.6/site-packages/tastypie/resources.py\", line 1384, in post_list\n updated_bundle = self.obj_create(bundle, **self.remove_api_resource_names(kwargs))\n\n File \"/usr/local/lib/python3.6/site-packages/tastypie/resources.py\", line 2175, in obj_create\n return
self.save(bundle)\n\n File \"./freenasUI/api/utils.py\", line 410, in save\n form.save()\n\n File \"/usr/local/lib/python3.6/site-packages/django/forms/models.py\", line 453, in save\n self.instance.save()\n\n File \"/usr/local/lib/python3.6/site-packages/django/db/models/base.py\", line 796, in save\n force_update=force_update, update_fields=update_fields)\n\n File \"/usr/local/lib/python3.6/site-packages/django/db/models/base.py\", line 824, in save_base\n updated = self._save_table(raw, cls, force_insert, force_update, using, update_fields)\n\n File \"/usr/local/lib/python3.6/site-packages/django/db/models/base.py\", line 908, in _save_table\n result = self._do_insert(cls._base_manager, using, fields, update_pk, raw)\n\n File \"/usr/local/lib/python3.6/site-packages/django/db/models/base.py\", line 947, in _do_insert\n using=using, raw=raw)\n\n File \"/usr/local/lib/python3.6/site-packages/django/db/models/manager.py\", line 85, in manager_method\n return getattr(self.get_queryset(), name)(*args, **kwargs)\n\n File \"/usr/local/lib/python3.6/site-packages/django/db/models/query.py\", line 1045, in _insert\n return query.get_compiler(using=using).execute_sql(return_id)\n\n File \"/usr/local/lib/python3.6/site-packages/django/db/models/sql/compiler.py\", line 1054, in execute_sql\n cursor.execute(sql, params)\n\n File \"/usr/local/lib/python3.6/site-packages/django/db/backends/utils.py\", line 64, in execute\n return self.cursor.execute(sql, params)\n\n File \"/usr/local/lib/python3.6/site-packages/django/db/utils.py\", line 94, in __exit__\n six.reraise(dj_exc_type, dj_exc_value, traceback)\n\n File \"/usr/local/lib/python3.6/site-packages/django/utils/six.py\", line 685, in reraise\n raise value.with_traceback(tb)\n\n File \"/usr/local/lib/python3.6/site-packages/django/db/backends/utils.py\", line 64, in execute\n return self.cursor.execute(sql, params)\n\n File \"./freenasUI/freeadmin/sqlite3_ha/base.py\", line 412, in execute\n execute = self.locked_re
try(Database.Cursor.execute, query, params)\n\n File \"./freenasUI/freeadmin/sqlite3_ha/base.py\", line 389, in locked_retry\n rv = method(self, *args, **kwargs)\n\ndjango.db.utils.IntegrityError: UNIQUE constraint failed: services_iscsitargetgroups.iscsi_target_id, services_iscsitargetgroups.iscsi_target_portalgroup_id\n"}, status: 500
I0321 15:23:43.212416 1 event.go:221] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"freenas-test-iscsi-pvc", UID:"4176cb57-677b-4e5a-ab7f-90e246f1a79a", APIVersion:"v1", ResourceVersion:"77825912", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' Failed to provision volume with StorageClass "freenas-iscsi": Error creating targetgroup for {ID:0 Target:2 Authgroup:0 Authtype:None Initialdigest:Auto Initiatorgroup:1 Portalgroup:1} - message: {"error_message":"UNIQUE constraint failed: services_iscsitargetgroups.iscsi_target_id, services_iscsitargetgroups.iscsi_target_portalgroup_id","traceback":"Traceback (most recent call last):\n\n File \"/usr/local/lib/python3.6/site-packages/django/db/backends/utils.py\", line 64, in execute\n return self.cursor.execute(sql, params)\n\n File \"./freenasUI/freeadmin/sqlite3_ha/base.py\", line 412, in execute\n execute = self.locked_retry(Database.Cursor.execute, query, params)\n\n File \"./freenasUI/freeadmin/sqlite3_ha/base.py\", line 389, in locked_retry\n rv = method(self, *args, **kwargs)\n\nsqlite3.IntegrityError: UNIQUE constraint failed: services_iscsitargetgroups.iscsi_target_id, services_iscsitargetgroups.iscsi_target_portalgroup_id\n\n\nThe above exception was the direct cause of the following exception:\n\n\nTraceback (most recent call last):\n\n File \"/usr/local/lib/python3.6/site-packages/tastypie/resources.py\", line 219, in wrapper\n response = callback(request, *args, **kwargs)\n\n File \"/usr/local/lib/python3.6/site-packages/tastypie/resources.py\", line 450, in dispatch_list\n return self.dispatch('list', request, **kwargs)\n\n File \"./freenasUI/api/utils.py\", line 247, in dispatch\n request_type, request, *args, **kwargs\n\n File \"/usr/local/lib/python3.6/site-packages/tastypie/resources.py\", line 482, in dispatch\n response = method(request, **kwargs)\n\n File \"/usr/local/lib/python3.6/site-packages/tastypie/resources.py\", line 1384, in pos
t_list\n updated_bundle = self.obj_create(bundle, **self.remove_api_resource_names(kwargs))\n\n File \"/usr/local/lib/python3.6/site-packages/tastypie/resources.py\", line 2175, in obj_create\n return self.save(bundle)\n\n File \"./freenasUI/api/utils.py\", line 410, in save\n form.save()\n\n File \"/usr/local/lib/python3.6/site-packages/django/forms/models.py\", line 453, in save\n self.instance.save()\n\n File \"/usr/local/lib/python3.6/site-packages/django/db/models/base.py\", line 796, in save\n force_update=force_update, update_fields=update_fields)\n\n File \"/usr/local/lib/python3.6/site-packages/django/db/models/base.py\", line 824, in save_base\n updated = self._save_table(raw, cls, force_insert, force_update, using, update_fields)\n\n File \"/usr/local/lib/python3.6/site-packages/django/db/models/base.py\", line 908, in _save_table\n result = self._do_insert(cls._base_manager, using, fields, update_pk, raw)\n\n File \"/usr/local/lib/python3.6/site-packages/django/db/models/base.py\", line 947, in _do_insert\n using=using, raw=raw)\n\n File \"/usr/local/lib/python3.6/site-packages/django/db/models/manager.py\", line 85, in manager_method\n return getattr(self.get_queryset(), name)(*args, **kwargs)\n\n File \"/usr/local/lib/python3.6/site-packages/django/db/models/query.py\", line 1045, in _insert\n return query.get_compiler(using=using).execute_sql(return_id)\n\n File \"/usr/local/lib/python3.6/site-packages/django/db/models/sql/compiler.py\", line 1054, in execute_sql\n cursor.execute(sql, params)\n\n File \"/usr/local/lib/python3.6/site-packages/django/db/backends/utils.py\", line 64, in execute\n return self.cursor.execute(sql, params)\n\n File \"/usr/local/lib/python3.6/site-packages/django/db/utils.py\", line 94, in __exit__\n six.reraise(dj_exc_type, dj_exc_value, traceback)\n\n File \"/usr/local/lib/python3.6/site-packages/django/utils/six.py\", line 685, in reraise\n raise value.with_traceback(tb)\n\n File \"/usr/local/lib/python3.6/site-pac
kages/django/db/backends/utils.py\", line 64, in execute\n return self.cursor.execute(sql, params)\n\n File \"./freenasUI/freeadmin/sqlite3_ha/base.py\", line 412, in execute\n execute = self.locked_retry(Database.Cursor.execute, query, params)\n\n File \"./freenasUI/freeadmin/sqlite3_ha/base.py\", line 389, in locked_retry\n rv = method(self, *args, **kwargs)\n\ndjango.db.utils.IntegrityError: UNIQUE constraint failed: services_iscsitargetgroups.iscsi_target_id, services_iscsitargetgroups.iscsi_target_portalgroup_id\n"}, status: 500
I0321 15:23:43.539062 1 leaderelection.go:198] stopped trying to renew lease to provision for pvc default/freenas-test-iscsi-pvc, task failed
W0321 15:23:43.539206 1 controller.go:686] retrying syncing claim "default/freenas-test-iscsi-pvc" because failures 0 < threshold 15
E0321 15:23:43.539244 1 controller.go:701] error syncing claim "default/freenas-test-iscsi-pvc": Error creating targetgroup for {ID:0 Target:2 Authgroup:0 Authtype:None Initialdigest:Auto Initiatorgroup:1 Portalgroup:1} - message: {"error_message":"UNIQUE constraint failed: services_iscsitargetgroups.iscsi_target_id, services_iscsitargetgroups.iscsi_target_portalgroup_id","traceback":"Traceback (most recent call last):\n\n File \"/usr/local/lib/python3.6/site-packages/django/db/backends/utils.py\", line 64, in execute\n return self.cursor.execute(sql, params)\n\n File \"./freenasUI/freeadmin/sqlite3_ha/base.py\", line 412, in execute\n execute = self.locked_retry(Database.Cursor.execute, query, params)\n\n File \"./freenasUI/freeadmin/sqlite3_ha/base.py\", line 389, in locked_retry\n rv = method(self, *args, **kwargs)\n\nsqlite3.IntegrityError: UNIQUE constraint failed: services_iscsitargetgroups.iscsi_target_id, services_iscsitargetgroups.iscsi_target_portalgroup_id\n\n\nThe above exception was the direct cause of the following exception:\n\n\nTraceback (most recent call last):\n\n File \"/usr/local/lib/python3.6/site-packages/tastypie/resources.py\", line 219, in wrapper\n response = callback(request, *args, **kwargs)\n\n File \"/usr/local/lib/python3.6/site-packages/tastypie/resources.py\", line 450, in dispatch_list\n return self.dispatch('list', request, **kwargs)\n\n File \"./freenasUI/api/utils.py\", line 247, in dispatch\n request_type, request, *args, **kwargs\n\n File \"/usr/local/lib/python3.6/site-packages/tastypie/resources.py\", line 482, in dispatch\n response = method(request, **kwargs)\n\n File \"/usr/local/lib/python3.6/site-packages/tastypie/resources.py\", line 1384, in post_list\n updated_bundle = self.obj_create(bundle, **self.remove_api_resource_names(kwargs))\n\n File \"/usr/local/lib/python3.6/site-packages/tastypie/resources.py\", line 2175, in obj_create\n return self.save(bundle)\n\n File \"./freenasUI/api/utils.
py\", line 410, in save\n form.save()\n\n File \"/usr/local/lib/python3.6/site-packages/django/forms/models.py\", line 453, in save\n self.instance.save()\n\n File \"/usr/local/lib/python3.6/site-packages/django/db/models/base.py\", line 796, in save\n force_update=force_update, update_fields=update_fields)\n\n File \"/usr/local/lib/python3.6/site-packages/django/db/models/base.py\", line 824, in save_base\n updated = self._save_table(raw, cls, force_insert, force_update, using, update_fields)\n\n File \"/usr/local/lib/python3.6/site-packages/django/db/models/base.py\", line 908, in _save_table\n result = self._do_insert(cls._base_manager, using, fields, update_pk, raw)\n\n File \"/usr/local/lib/python3.6/site-packages/django/db/models/base.py\", line 947, in _do_insert\n using=using, raw=raw)\n\n File \"/usr/local/lib/python3.6/site-packages/django/db/models/manager.py\", line 85, in manager_method\n return getattr(self.get_queryset(), name)(*args, **kwargs)\n\n File \"/usr/local/lib/python3.6/site-packages/django/db/models/query.py\", line 1045, in _insert\n return query.get_compiler(using=using).execute_sql(return_id)\n\n File \"/usr/local/lib/python3.6/site-packages/django/db/models/sql/compiler.py\", line 1054, in execute_sql\n cursor.execute(sql, params)\n\n File \"/usr/local/lib/python3.6/site-packages/django/db/backends/utils.py\", line 64, in execute\n return self.cursor.execute(sql, params)\n\n File \"/usr/local/lib/python3.6/site-packages/django/db/utils.py\", line 94, in __exit__\n six.reraise(dj_exc_type, dj_exc_value, traceback)\n\n File \"/usr/local/lib/python3.6/site-packages/django/utils/six.py\", line 685, in reraise\n raise value.with_traceback(tb)\n\n File \"/usr/local/lib/python3.6/site-packages/django/db/backends/utils.py\", line 64, in execute\n return self.cursor.execute(sql, params)\n\n File \"./freenasUI/freeadmin/sqlite3_ha/base.py\", line 412, in execute\n execute = self.locked_retry(Database.Cursor.execute, query, params)\n\n Fil
e \"./freenasUI/freeadmin/sqlite3_ha/base.py\", line 389, in locked_retry\n rv = method(self, *args, **kwargs)\n\ndjango.db.utils.IntegrityError: UNIQUE constraint failed: services_iscsitargetgroups.iscsi_target_id, services_iscsitargetgroups.iscsi_target_portalgroup_id\n"}, status: 500
I0321 15:23:43.560247 1 leaderelection.go:156] attempting to acquire leader lease...
I0321 15:23:43.568511 1 leaderelection.go:178] successfully acquired lease to provision for pvc default/freenas-test-iscsi-pvc
I0321 15:23:43.571167 1 event.go:221] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"freenas-test-iscsi-pvc", UID:"4176cb57-677b-4e5a-ab7f-90e246f1a79a", APIVersion:"v1", ResourceVersion:"77826047", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "default/freenas-test-iscsi-pvc"
I0321 15:23:43.625147 1 provisioner.go:401] Creating target: "pvc-4176cb57-677b-4e5a-ab7f-90e246f1a79a", zvol: "vol1/k8s/pvc-4176cb57-677b-4e5a-ab7f-90e246f1a79a", extent: "pvc-4176cb57-677b-4e5a-ab7f-90e246f1a79a"
I0321 15:23:44.574455 1 extent.go:99] found Extent name: pvc-4176cb57-677b-4e5a-ab7f-90e246f1a79a - {ID:2 AvailThreshold:0 Blocksize:512 Comment:default/freenas-test-iscsi-pvc Filesize:0 InsecureTpc:true Legacy:false Naa:0x6589cfc00000046ac0d3aa132cb03b07 Name:pvc-4176cb57-677b-4e5a-ab7f-90e246f1a79a Path:/dev/zvol/vol1/k8s/pvc-4176cb57-677b-4e5a-ab7f-90e246f1a79a Disk:zvol/vol1/k8s/pvc-4176cb57-677b-4e5a-ab7f-90e246f1a79a Pblocksize:true Ro:false Rpm:SSD Serial:000c29d93f5901 Type:ZVOL Xen:false}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xfb844d]
goroutine 87 [running]:
github.com/travisghansen/freenas-iscsi-provisioner/provisioner.(*freenasProvisioner).Provision(0xc00000c200, 0xc0003acf48, 0x6, 0xc0004aa0c0, 0x28, 0x0, 0x0, 0x0, 0xc0003a2340, 0xc0004a9e90, ...)
/home/travis/gopath/src/github.com/travisghansen/freenas-iscsi-provisioner/provisioner/provisioner.go:606 +0x1b1d
github.com/travisghansen/freenas-iscsi-provisioner/vendor/github.com/kubernetes-incubator/external-storage/lib/controller.(*ProvisionController).provisionClaimOperation(0xc000470000, 0xc0003a2340, 0x1f46060, 0xc0000dcfc0)
/home/travis/gopath/src/github.com/travisghansen/freenas-iscsi-provisioner/vendor/github.com/kubernetes-incubator/external-storage/lib/controller/controller.go:1066 +0xdf2
github.com/travisghansen/freenas-iscsi-provisioner/vendor/github.com/kubernetes-incubator/external-storage/lib/controller.(*ProvisionController).lockProvisionClaimOperation.func1(0xc00009c120)
/home/travis/gopath/src/github.com/travisghansen/freenas-iscsi-provisioner/vendor/github.com/kubernetes-incubator/external-storage/lib/controller/controller.go:935 +0x76
created by github.com/travisghansen/freenas-iscsi-provisioner/vendor/github.com/kubernetes-incubator/external-storage/lib/leaderelection.(*LeaderElector).Run
/home/travis/gopath/src/github.com/travisghansen/freenas-iscsi-provisioner/vendor/github.com/kubernetes-incubator/external-storage/lib/leaderelection/leaderelection.go:129 +0xad

I'm updating to 11.2 in my test FreeNAS to see if that changes anything.

I was just backtracking to see if I could better understand and quickly fix this, but I need to move forward with your CSI implementation. It's just not as clear to me yet, and I think I'm trying to move away from Helm charts, other than using them as initial templates for the various deployment and other YAMLs.

@travisghansen
Owner

Ok, yeah let me know how it goes and if you need help understanding what's going on with csi.

I'll dig through the backtrace and see if I can figure out what's going on.

@jp83
Author

jp83 commented Mar 21, 2020

Well, what do you know, after updating to 11.2-U8 basically everything started working swimmingly. If I were to use this for a bit, I don't suppose there's any upgrade path to CSI, or a way to remap PVCs? That's kind of a general concern going forward: if I update FreeNAS or something and break the dynamic provisioning, is there a (good) way to match up the dynamic PVCs again without having to restore data from a backup?

I've been holding off on updating my main FreeNAS because I've got a local backup, plus a remote one running on an older HP MicroServer Gen7 with a USB boot drive, that seem to be chugging along with zfs send just fine. I'm afraid updates, in particular new ZFS feature flags, may make this incompatible and force updates there too, which would probably require SSD boot disks.

So is the basic procedure for your CSI just to add the Helm repo and deploy with updated values like in the FreeNAS example file?

@travisghansen
Owner

Wow, I used 11.1 quite a bit, so U7 might be a dud.

A couple of things:

  • I've used the legacy provisioners (this project and the NFS variant) for probably close to 2 years now. Generally they've been solid, with very few issues.

  • I put the csi driver in use on a cluster with lots of churn a few months back... it's been flawless to this point.

  • The csi driver does almost all of its actions over SSH, invoking zfs commands directly instead of using the FreeNAS API, so the surface area for potentially incompatible calls to FreeNAS is greatly diminished.

You are correct, there is currently no upgrade path between these and csi, and there likely won't be, since a couple of features had to be removed. I personally have just been slowly moving old workloads to csi using zfs send etc. I also haven't addressed a manual import process, but I've thought about it a bit and may implement something in the future. It's technically possible for sure, but it requires setting quite a few things manually, including custom zfs properties on datasets/zvols etc.

Yes, to deploy the csi driver you should only need to update the values as appropriate and run helm install. Everything required to work correctly will be installed. I wouldn't do anything with snapshots currently as k8s recently updated that from alpha to beta with breaking changes (I'll update the chart soon to support the beta stuff). Also if you intend to use resize features etc you may need to set some feature flags on your k8s components depending on what version you're running.
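
For reference, roughly what that looks like (helm 3 syntax; the repo URL and chart/release names below are placeholders, grab the real ones from the other project's README):

# add the chart repo for the csi project (placeholder URL/name)
helm repo add csi-charts https://example.com/charts
helm repo update

# install with a values file based on the freenas example, edited for your environment
helm install freenas-iscsi csi-charts/freenas-csi \
  --namespace kube-system \
  --values my-freenas-values.yaml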

@jp83
Author

jp83 commented Mar 21, 2020

Ok, I'll try to get the CSI driver going and ask any other relevant questions over there.

I'm really curious about the exact steps of your migration process with zfs send. My main concern (still learning) is accidentally blowing away data. I'd like to set the default reclaim policy to Retain rather than Delete, and so I'm wondering, if I redeploy something, what the best way is to copy data from the existing (old) PVC to the new dynamically created one, or to remap to the old one. Once dynamically provisioned I can see the volume claim YAML, but it's not editable, so the other concern is backing up this info in the cluster, or rather being able to start a new cluster, redeploy, and remap/bring the data over.

For snapshots, I was hoping that from the FreeNAS side I could just nightly (when things should mostly be quiescent) create a snapshot and zfs send it remotely. In practice, is this not robust enough, and can it lead to data corruption? I'm assuming the "real" k8s way would be to temporarily pause writes (scale down the deployment?), tell FreeNAS to snapshot, and then spin back up?

Resizing would be a great feature, as it's something I've been concerned about, particularly when I'm unsure how things will grow, so I'm interested in hearing more about that later. For now, a lot of the storage sizes appear fixed at creation, so I figured I'd eventually get backup/restore going to accomplish that when needed.

Thanks again.

@travisghansen
Owner

Yeah, you can set the policy on the class and any new PVs provisioned will automatically use the new policy. You can update any existing PVs manually.

Generally the steps would be (rough kubectl/zfs sketch after the list):

  1. Set the reclaim policy on the existing PV to Retain
  2. Delete the app/PVC
  3. Recreate the app/PVC with the new storage class
  4. Set the relevant workload replicas to 0 (i.e. stop all pods accessing the data)
  5. Go directly to the storage system and copy the data however is appropriate (zfs send, rsync, or whatever)
  6. Bump the replicas back up and ensure the workloads have the appropriate data
  7. Set the reclaim policy on the old PV to Delete
  8. Delete the old PV
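
A rough sketch of those steps in kubectl/zfs terms (the PV name is the one from your logs above; the deployment and dataset names are placeholders, and the exact zfs receive flags depend on your layout):

# steps 1/7: flip the reclaim policy on a PV
kubectl patch pv pvc-4176cb57-677b-4e5a-ab7f-90e246f1a79a \
  -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'

# steps 4/6: stop and later restart the pods using the data
kubectl scale deployment/my-app --replicas=0
kubectl scale deployment/my-app --replicas=1

# step 5: copy on the FreeNAS side, e.g. with zfs send/receive
zfs snapshot vol1/k8s/old-pvc-zvol@migrate
# -F overwrites the freshly provisioned (empty) destination zvol
zfs send vol1/k8s/old-pvc-zvol@migrate | zfs receive -F vol1/k8s/new-pvc-zvol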

Regarding snapshots, you're more than welcome to do them as you mentioned; it's a great idea. What I mean by snapshots in this context is using the k8s resources (https://kubernetes.io/docs/concepts/storage/volume-snapshots/) to create 'snapshots' via the k8s API (i.e. structured resources in k8s). After you create a structured snapshot you can create new PVCs based off of it. In the csi driver, 'snapshots' can take two forms:

  • actual zfs snapshots (saves space)
  • creating a k8s snapshot by using zfs send to create a new dataset/zvol (increases space but decouples the 'snapshot' from the PV/PVC)

When creating snapshots out-of-band or using the new snapshot features of k8s, no scaling down is required.
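
As a sketch of what a structured snapshot looks like (assuming k8s 1.17+ with the beta snapshot CRDs/controller installed; the snapshot class name is a placeholder, and the PVC name is borrowed from your test above):

kubectl apply -f - <<'EOF'
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  name: freenas-test-iscsi-snap
spec:
  volumeSnapshotClassName: freenas-iscsi-snapshots   # placeholder class name
  source:
    persistentVolumeClaimName: freenas-test-iscsi-pvc
EOF

A new PVC can then reference that snapshot through spec.dataSource (kind: VolumeSnapshot) to get a volume pre-populated with the snapshot contents.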

Resizing was the primary motivator behind implementing the csi driver. The legacy tooling unfortunately doesn't provide the mechanisms to do this. Suffice it to say, I've implemented every nook/cranny/detail of the spec to make this work. NFS shares simply resize the dataset server-side; iSCSI volumes resize the zvol on the server and then the disk/filesystem on the node. If you're using the raw block device feature then only the disk size is updated on the node, by rescanning the connection. All of it can be done 'online' without stopping the pod(s) running the service.
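
For example, once the csi driver is in place (and assuming the StorageClass has allowVolumeExpansion: true and your k8s version has the expansion features enabled), growing a volume is just a matter of bumping the claim's request and letting the driver do the rest:

kubectl patch pvc freenas-test-iscsi-pvc \
  -p '{"spec":{"resources":{"requests":{"storage":"20Gi"}}}}'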
