
Lingering .send-#####-1 holds #173

Closed
rottegift opened this issue May 2, 2014 · 9 comments

@rottegift
Contributor

I'm seeing stale holds.

$ sudo zfs destroy -v ssdpool/foo@zfs-auto-snap_hourly-2014-05-01-0810
will destroy ssdpool/foo@zfs-auto-snap_hourly-2014-05-01-0810
will reclaim 1.65G
cannot destroy snapshot ssdpool/foo@zfs-auto-snap_hourly-2014-05-01-0810: dataset is busy
$ sudo zfs holds ssdpool/foo@zfs-auto-snap_hourly-2014-05-01-0810
NAME                                              TAG            TIMESTAMP
ssdpool/foo@zfs-auto-snap_hourly-2014-05-01-0810  .send-95118-1  Thu May  1 19:27 2014
ssdpool/foo@zfs-auto-snap_hourly-2014-05-01-0810  .send-78468-1  Thu May  1 13:17 2014

(This is in a zfs send ... | ssh target zfs recv ... pipeline where the snapshot in question is received successfully:

receiving incremental stream of ssdpool/foo@zfs-auto-snap_hourly-2014-05-01-0810 into OldStuff/CLA-TM/from_ssdpool/ssdpool/foo@zfs-auto-snap_hourly-2014-05-01-0810
received 1.92GB stream in 1606 seconds (1.23MB/sec)

and that whole incremental send succeeded without error -- I am 80% sure, anyway; the recv host has crashed a couple of times in the past couple of days, and I cannot exclude the possibility that this snapshot was in the range of one or two incremental sends where the recv went away.)

Removing the holds by hand correctly activates a defer_destroy

$ sudo zfs destroy -d -v ssdpool/foo@zfs-auto-snap_hourly-2014-05-01-0810
will destroy ssdpool/foo@zfs-auto-snap_hourly-2014-05-01-0810
will reclaim 1.65G
$ zfs release  .send-95118-1 ssdpool/foo@zfs-auto-snap_hourly-2014-05-01-0810  
$ zfs release .send-78468-1 ssdpool/foo@zfs-auto-snap_hourly-2014-05-01-0810

I'm pretty sure this has hit other zfs implementations in the past too, but perhaps O3X's zfs send cleanup has a local bug rather than inheriting this from upstream.

@rottegift
Contributor Author

It doesn't seem to be illumos #3645 (for which we have ZoL's fix around dump_bytes).

@rottegift
Contributor Author

Oddly, the same hold on several other snapshots did not linger:

2014-05-01.13:17:26 [txg:94689933] hold ssdpool/foo@zfs-auto-snap_hourly-2014-04-29-2231 (20027) tag=.send-78468-1 temp=1 refs=1
2014-05-01.13:17:26 [txg:94689933] hold ssdpool/foo@2014-04-29-225559 (20183) tag=.send-78468-1 temp=1 refs=1
2014-05-01.13:17:26 [txg:94689933] hold ssdpool/foo@zfs-auto-snap_hourly-2014-04-29-2331 (20439) tag=.send-78468-1 temp=1 refs=1
2014-05-01.13:17:26 [txg:94689933] hold ssdpool/foo@zfs-auto-snap_hourly-2014-04-30-0912 (260) tag=.send-78468-1 temp=1 refs=1
2014-05-01.13:17:26 [txg:94689933] hold ssdpool/foo@2014-04-30-095114 (557) tag=.send-78468-1 temp=1 refs=1
2014-05-01.13:17:26 [txg:94689933] hold ssdpool/foo@2014-04-30-135114 (647) tag=.send-78468-1 temp=1 refs=1
2014-05-01.13:17:26 [txg:94689933] hold ssdpool/foo@zfs-auto-snap_hourly-2014-04-30-1835 (4481) tag=.send-78468-1 temp=1 refs=1
2014-05-01.13:17:26 [txg:94689933] hold ssdpool/foo@zfs-auto-snap_hourly-2014-04-30-1935 (5007) tag=.send-78468-1 temp=1 refs=1
2014-05-01.13:17:26 [txg:94689933] hold ssdpool/foo@zfs-auto-snap_hourly-2014-04-30-2035 (5417) tag=.send-78468-1 temp=1 refs=1
2014-05-01.13:17:26 [txg:94689933] hold ssdpool/foo@zfs-auto-snap_hourly-2014-04-30-2135 (5880) tag=.send-78468-1 temp=1 refs=1
2014-05-01.13:17:26 [txg:94689933] hold ssdpool/foo@zfs-auto-snap_hourly-2014-04-30-2235 (6342) tag=.send-78468-1 temp=1 refs=1
2014-05-01.13:17:26 [txg:94689933] hold ssdpool/foo@zfs-auto-snap_hourly-2014-04-30-2335 (6781) tag=.send-78468-1 temp=1 refs=1
2014-05-01.13:17:26 [txg:94689933] hold ssdpool/foo@zfs-auto-snap_hourly-2014-05-01-0035 (7182) tag=.send-78468-1 temp=1 refs=1
2014-05-01.13:17:26 [txg:94689933] hold ssdpool/foo@zfs-auto-snap_hourly-2014-05-01-0135 (7575) tag=.send-78468-1 temp=1 refs=1
2014-05-01.13:17:26 [txg:94689933] hold ssdpool/foo@zfs-auto-snap_hourly-2014-05-01-0235 (7993) tag=.send-78468-1 temp=1 refs=1
2014-05-01.13:17:26 [txg:94689933] hold ssdpool/foo@zfs-auto-snap_hourly-2014-05-01-0335 (8371) tag=.send-78468-1 temp=1 refs=1
2014-05-01.13:17:26 [txg:94689933] hold ssdpool/foo@zfs-auto-snap_hourly-2014-05-01-0435 (8789) tag=.send-78468-1 temp=1 refs=1
2014-05-01.13:17:26 [txg:94689933] hold ssdpool/foo@zfs-auto-snap_hourly-2014-05-01-0810 (1287) tag=.send-78468-1 temp=1 refs=1
2014-05-01.13:17:26 [txg:94689933] hold ssdpool/foo@zfs-auto-snap_hourly-2014-05-01-0910 (1729) tag=.send-78468-1 temp=1 refs=1
2014-05-01.13:17:26 [txg:94689933] hold ssdpool/foo@zfs-auto-snap_hourly-2014-05-01-1010 (2165) tag=.send-78468-1 temp=1 refs=1
2014-05-01.13:17:26 [txg:94689933] hold ssdpool/foo@zfs-auto-snap_hourly-2014-05-01-1110 (2647) tag=.send-78468-1 temp=1 refs=1
2014-05-01.13:17:26 [txg:94689933] hold ssdpool/foo@2014-05-01-114403 (2951) tag=.send-78468-1 temp=1 refs=1
2014-05-01.19:27:39 [txg:94693796] hold ssdpool/foo@zfs-auto-snap_hourly-2014-04-30-2335 (6781) tag=.send-95118-1 temp=1 refs=2
2014-05-01.19:27:39 [txg:94693796] hold ssdpool/foo@zfs-auto-snap_hourly-2014-05-01-0035 (7182) tag=.send-95118-1 temp=1 refs=2
2014-05-01.19:27:39 [txg:94693796] hold ssdpool/foo@zfs-auto-snap_hourly-2014-05-01-0135 (7575) tag=.send-95118-1 temp=1 refs=2
2014-05-01.19:27:39 [txg:94693796] hold ssdpool/foo@zfs-auto-snap_hourly-2014-05-01-0235 (7993) tag=.send-95118-1 temp=1 refs=2
2014-05-01.19:27:39 [txg:94693796] hold ssdpool/foo@zfs-auto-snap_hourly-2014-05-01-0335 (8371) tag=.send-95118-1 temp=1 refs=2
2014-05-01.19:27:39 [txg:94693796] hold ssdpool/foo@zfs-auto-snap_hourly-2014-05-01-0435 (8789) tag=.send-95118-1 temp=1 refs=2
2014-05-01.19:27:39 [txg:94693796] hold ssdpool/foo@zfs-auto-snap_hourly-2014-05-01-0810 (1287) tag=.send-95118-1 temp=1 refs=2
2014-05-01.19:27:39 [txg:94693796] hold ssdpool/foo@zfs-auto-snap_hourly-2014-05-01-0910 (1729) tag=.send-95118-1 temp=1 refs=2
2014-05-01.19:27:39 [txg:94693796] hold ssdpool/foo@zfs-auto-snap_hourly-2014-05-01-1010 (2165) tag=.send-95118-1 temp=1 refs=2
2014-05-01.19:27:39 [txg:94693796] hold ssdpool/foo@zfs-auto-snap_hourly-2014-05-01-1110 (2647) tag=.send-95118-1 temp=1 refs=2
2014-05-01.19:27:39 [txg:94693796] hold ssdpool/foo@2014-05-01-114403 (2951) tag=.send-95118-1 temp=1 refs=2
2014-05-01.19:27:39 [txg:94693796] hold ssdpool/foo@zfs-auto-snap_hourly-2014-05-01-1210 (3164) tag=.send-95118-1 temp=1 refs=1
2014-05-01.19:27:39 [txg:94693796] hold ssdpool/foo@zfs-auto-snap_hourly-2014-05-01-1310 (3562) tag=.send-95118-1 temp=1 refs=1
2014-05-01.19:27:39 [txg:94693796] hold ssdpool/foo@zfs-auto-snap_hourly-2014-05-01-1410 (3995) tag=.send-95118-1 temp=1 refs=1
2014-05-01.19:27:39 [txg:94693796] hold ssdpool/foo@zfs-auto-snap_hourly-2014-05-01-1510 (8657) tag=.send-95118-1 temp=1 refs=1
2014-05-01.19:27:39 [txg:94693796] hold ssdpool/foo@2014-05-01-191853 (8746) tag=.send-95118-1 temp=1 refs=1

and my subsequent by-hand zfs release

2014-05-02.14:01:16 [txg:94725259] release ssdpool/foo@zfs-auto-snap_hourly-2014-05-01-0810 (1287) tag=.send-95118-1 refs=1
2014-05-02.14:01:25 [txg:94725261] release ssdpool/foo@zfs-auto-snap_hourly-2014-05-01-0810 (1287) tag=.send-78468-1 refs=0

and finally

$ uptime
14:38  up 1 day,  8:28, 24 users, load averages: 4.52 4.35 4.42

@rottegift
Contributor Author

Hm, after a series of reboots, full exports and full imports, all while the zfs recv receiver is stable, I'm still seeing this. I'll try to provide a reduced test case.

@rottegift rottegift reopened this May 3, 2014
@lundman
Contributor

lundman commented May 13, 2014

Not all send operations trigger the automatic holds: "doall" and "replicate" have to be on for holds to be used, i.e. -I and -R. The send then builds a list of snapshots to hold and calls lzc_hold().
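
For reference, here is a rough userland sketch of that call path, assuming the upstream libzfs_core API of the era (lzc_hold() taking a cleanup_fd and an errlist). The snapshot name, hold tag and device path below are only placeholders, and error handling is minimal:

    /*
     * Illustrative only -- not code from this repo.  The key point is that
     * closing cleanup_fd is what is supposed to release the temporary holds.
     */
    #include <fcntl.h>
    #include <unistd.h>
    #include <libnvpair.h>
    #include <libzfs_core.h>

    int
    take_temporary_hold(void)
    {
        nvlist_t *holds, *errlist = NULL;
        int cleanup_fd, err;

        if (libzfs_core_init() != 0)
            return (-1);

        /* one pair per snapshot: snapshot name -> hold tag */
        holds = fnvlist_alloc();
        fnvlist_add_string(holds,
            "ssdpool/foo@zfs-auto-snap_hourly-2014-05-01-0810",
            ".send-95118-1");

        /* the fd whose close should trigger the kernel-side cleanup */
        cleanup_fd = open("/dev/zfs", O_RDWR);
        if (cleanup_fd < 0) {
            fnvlist_free(holds);
            libzfs_core_fini();
            return (-1);
        }

        err = lzc_hold(holds, cleanup_fd, &errlist);

        /* ... perform the send ... */

        (void) close(cleanup_fd);    /* temporary holds released here */

        fnvlist_free(holds);
        if (errlist != NULL)
            fnvlist_free(errlist);
        libzfs_core_fini();
        return (err);
    }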

The hold machinery uses the (somewhat) new zfs_onexit_ API to set up callbacks that run when the cleanup_fd is closed. On OSX this differs in that we have to match on current_proc() instead of the fd.

The setup is done in zfs_ioc_hold():

    if (nvlist_lookup_int32(args, "cleanup_fd", &cleanup_fd) == 0) {
        /* resolve cleanup_fd to its onexit minor */
        error = zfs_onexit_fd_hold(cleanup_fd, &minor);
        if (error != 0)
            return (error);
    }
    /* take the holds; a nonzero minor makes them temporary */
    error = dsl_dataset_user_hold(holds, minor, errlist);

Now, in zfs_onexit_fd_hold we already do the OSX translation to use current_proc, and in zfsdev_release we are (only) given the current_proc with which to match the correct onexit callback and call it. In this case that callback should be dsl_dataset_user_release_onexit().
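
To make the intended mechanism concrete, here is a tiny user-space model of that onexit pattern. It is not O3X or upstream code and every name in it is made up; it only shows the contract: a hold registers a release callback against the cleanup fd's minor, and closing that fd must invoke it.

    #include <stdio.h>

    typedef void (*onexit_cb_t)(void *);

    /* one callback slot per minor -- enough for a demo */
    static struct {
        onexit_cb_t cb;
        void *arg;
    } onexit_cbs[8];

    /* model of registering a callback at hold time */
    static void
    onexit_add_cb(int minor, onexit_cb_t cb, void *arg)
    {
        onexit_cbs[minor].cb = cb;
        onexit_cbs[minor].arg = arg;
    }

    /* model of the device release path: run the callback for this minor */
    static void
    dev_release(int minor)
    {
        if (onexit_cbs[minor].cb != NULL)
            onexit_cbs[minor].cb(onexit_cbs[minor].arg);
        onexit_cbs[minor].cb = NULL;
        onexit_cbs[minor].arg = NULL;
    }

    /* stands in for the real release callback */
    static void
    release_user_holds(void *arg)
    {
        printf("releasing temporary hold %s\n", (const char *)arg);
    }

    int
    main(void)
    {
        int minor = 1;    /* what the fd-to-minor lookup would hand back */

        onexit_add_cb(minor, release_user_holds, ".send-95118-1");
        dev_release(minor);    /* close(cleanup_fd) should end up here */
        return (0);
    }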

It is possible there is some case we missed in the OSX version, but I don't consider it a release blocker for the next Installer at the moment.

@rottegift
Contributor Author

This persists in master after the 0.6.3 sync.

@ilovezfs
Contributor

Yeah, we think we may know why the holds are not released. Cleanup is currently per-process, not per-open-fd, so if /dev/zfs happens to be opened, opened, closed, closed, only one of the two cleanups happens, whereas if it were opened, closed, opened, closed, both would. This is not straightforward to fix, but we'll probably need to do something like what Apple does with audit_sdevs:
http://fxr.watson.org/fxr/source/bsd/security/audit/audit_session.c?v=xnu-2050.18.24;im=excerpts#L2009
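
As a purely illustrative model of that failure mode (none of these names are the real driver code): if the softstate is keyed by the opening process rather than by the open itself, the nested opened-opened-closed-closed ordering loses one cleanup, while opened-closed-opened-closed works:

    #include <stdio.h>
    #include <stdlib.h>

    /* model softstate, keyed by process id instead of by the open fd */
    typedef struct zs {
        long pid;
    } zs_t;

    #define NSLOTS 16
    static zs_t *zs_table[NSLOTS];

    static void
    dev_open(long pid)
    {
        for (int i = 0; i < NSLOTS; i++) {
            if (zs_table[i] != NULL && zs_table[i]->pid == pid) {
                printf("zs already exists\n");    /* cf. the log below */
                return;
            }
        }
        for (int i = 0; i < NSLOTS; i++) {
            if (zs_table[i] == NULL) {
                zs_table[i] = calloc(1, sizeof (zs_t));
                zs_table[i]->pid = pid;
                printf("created zs for pid %ld\n", pid);
                return;
            }
        }
    }

    static void
    dev_close(long pid)
    {
        for (int i = 0; i < NSLOTS; i++) {
            if (zs_table[i] != NULL && zs_table[i]->pid == pid) {
                printf("pid %ld: onexit cleanup runs, holds released\n", pid);
                free(zs_table[i]);
                zs_table[i] = NULL;
                return;
            }
        }
        printf("pid %ld: no zs found, this close releases nothing\n", pid);
    }

    int
    main(void)
    {
        /* opened, opened, closed, closed: only one cleanup runs */
        dev_open(95118); dev_open(95118);
        dev_close(95118); dev_close(95118);

        /* opened, closed, opened, closed: both cleanups run */
        dev_open(78468); dev_close(78468);
        dev_open(78468); dev_close(78468);
        return (0);
    }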

@ilovezfs
Contributor

Evidence for this is in your paste:
#194 (comment)

Notice:

16/06/2014 14:56:50.000 kernel[0]: zfsdev_open, flag 03 devtype 8192, proc is 0xffffff80364db5d8: thread 0xffffff8031382590
16/06/2014 14:56:50.000 kernel[0]: created zs 0xffffff81f7f551e8
16/06/2014 14:56:50.000 kernel[0]: zfsdev_open, flag 03 devtype 8192, proc is 0xffffff80364db5d8: thread 0xffffff8031382590
16/06/2014 14:56:50.000 kernel[0]: zs already exists

@rottegift
Contributor Author

Ok, neat. I have a local workaround that cleans up the stale holds, but will happily test proposed fixes.

@ilovezfs ilovezfs added the bug label Jun 16, 2014
lundman added a commit that referenced this issue Jun 17, 2014
and patch that into zfsdev_minor_alloc(), causing a unique minor to be
allocated for each open. We then need to create softstate for ctldev so
we can differentiate between ctldev and zvol ioctls.
Issue #173
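
The direction of that change, as an illustrative counterpart to the earlier model (again, made-up names, not the actual patch): once every open gets its own minor, the softstate and its onexit callbacks are keyed per open, so each close releases exactly its own holds.

    #include <stdio.h>
    #include <stdlib.h>

    /* model softstate, now keyed by a unique per-open minor */
    typedef struct zs {
        int minor;
    } zs_t;

    static int next_minor = 1;

    static zs_t *
    dev_open(void)
    {
        zs_t *zs = calloc(1, sizeof (zs_t));

        zs->minor = next_minor++;    /* cf. a unique minor per open */
        printf("open: allocated minor %d\n", zs->minor);
        return (zs);
    }

    static void
    dev_close(zs_t *zs)
    {
        printf("close: minor %d releases its own holds\n", zs->minor);
        free(zs);
    }

    int
    main(void)
    {
        /* nested opens no longer collide */
        zs_t *a = dev_open();
        zs_t *b = dev_open();

        dev_close(a);
        dev_close(b);
        return (0);
    }
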
@lundman
Contributor

lundman commented Jun 17, 2014

A bit more work went into that; you also need to pull SPL.
