zfs-auto-snap_daily - dataset already exists #118

Open
tonyblue2 opened this issue Jul 31, 2020 · 6 comments

@tonyblue2

Hello,

I installed zfs-auto-snapshot to get snapshots of my datasets.

So far, it works quite well. But once a week I get the error message:

cannot create snapshot 'rpool/home@zfs-auto-snap_daily-2020-07-30-2228': dataset already exists
cannot create snapshot 'rpool/ROOT@zfs-auto-snap_daily-2020-07-30-2228': dataset already exists
cannot create snapshot 'rpool/ROOT/pve-1@zfs-auto-snap_daily-2020-07-30-2228': dataset already exists
no snapshots were created
cannot create snapshot 'rpool/home/data1@zfs-auto-snap_daily-2020-07-30-2228': dataset already exists
no snapshots were created
cannot create snapshot 'rpool/home/data2@zfs-auto-snap_daily-2020-07-30-2228': dataset already exists
no snapshots were created
...

/etc/cron.daily/zfs-auto-snapshot:

#!/bin/sh

# Only call zfs-auto-snapshot if it's available
which zfs-auto-snapshot > /dev/null || exit 0

exec zfs-auto-snapshot --quiet --syslog --label=daily --keep=31 //

It looks as if zfs-auto-snapshot is being called a second time, doesn't it?

Does anyone have a tip on how I could find out?

Thank you so much!

Tony

@ljmanz

ljmanz commented Dec 29, 2020

Hello,

I see the same error as @tonyblue2 with zfs-auto-snapshot, but only for the frequent snapshots, where it appears about once per day. Hourly, daily, etc. run without problems:

cannot create snapshot 'rpool@zfs-auto-snap_frequent-2020-12-29-0234': dataset already exists
cannot create snapshot 'rpool/rootfs@zfs-auto-snap_frequent-2020-12-29-0234': dataset already exists
cannot create snapshot 'rpool/rootfs/opt@zfs-auto-snap_frequent-2020-12-29-0234': dataset already exists
no snapshots were created
cannot create snapshot 'rpool/rootfs/var/lib@zfs-auto-snap_frequent-2020-12-29-0234': dataset already exists
no snapshots were created
...

The cron call is based on the project examples, running every 15 minutes and keeping 64 snapshots:

PATH="/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin"

*/15 * * * * root which zfs-auto-snapshot > /dev/null || exit 0 ; zfs-auto-snapshot --verbose --quiet --syslog --label=frequent --keep=64 //

The cron job runs only once every 15 minutes, and there is only one cron service running.

Still, the zfs-auto-snap log is inconsistent, and some snapshots are attempted multiple times:

Dec 29 02:41:49 host zfs-auto-snap[26512]: @zfs-auto-snap_frequent-2020-12-29-0141, 5 created, 5 destroyed, 0 warnings.
Dec 29 02:41:55 host zfs-auto-snap[28829]: @zfs-auto-snap_hourly-2020-12-29-0141, 5 created, 5 destroyed, 0 warnings.
Dec 29 02:47:49 host zfs-auto-snap[11796]: zfs snapshot -o com.sun:auto-snapshot-desc='-'  'rpool@zfs-auto-snap_frequent-2020-12-29-0147' returned 1
Dec 29 02:47:49 host zfs-auto-snap[12201]: zfs snapshot -o com.sun:auto-snapshot-desc='-'  'rpool/rootfs@zfs-auto-snap_frequent-2020-12-29-0147' returned 1
Dec 29 02:47:58 host zfs-auto-snap[13853]: zfs snapshot -o com.sun:auto-snapshot-desc='-' -r 'datastore@zfs-auto-snap_frequent-2020-12-29-0147' returned 1
Dec 29 02:47:59 host zfs-auto-snap[14280]: @zfs-auto-snap_frequent-2020-12-29-0147, 3 created, 3 destroyed, 2 warnings.
Dec 29 02:48:03 host zfs-auto-snap[17621]: zfs snapshot -o com.sun:auto-snapshot-desc='-' -r 'rpool/rootfs/opt@zfs-auto-snap_frequent-2020-12-29-0147' returned 1
Dec 29 02:48:03 host zfs-auto-snap[17705]: zfs snapshot -o com.sun:auto-snapshot-desc='-' -r 'rpool/rootfs/var/lib@zfs-auto-snap_frequent-2020-12-29-0147' returned 1
Dec 29 02:48:03 host zfs-auto-snap[17706]: @zfs-auto-snap_frequent-2020-12-29-0147, 2 created, 2 destroyed, 3 warnings.
Dec 29 03:34:35 host zfs-auto-snap[11090]: zfs snapshot -o com.sun:auto-snapshot-desc='-'  'rpool@zfs-auto-snap_frequent-2020-12-29-0234' returned 1
Dec 29 03:34:35 host zfs-auto-snap[11264]: zfs snapshot -o com.sun:auto-snapshot-desc='-'  'rpool@zfs-auto-snap_frequent-2020-12-29-0234' returned 1
Dec 29 03:34:35 host zfs-auto-snap[11278]: zfs snapshot -o com.sun:auto-snapshot-desc='-'  'rpool/rootfs@zfs-auto-snap_frequent-2020-12-29-0234' returned 1
Dec 29 03:34:35 host zfs-auto-snap[11678]: zfs snapshot -o com.sun:auto-snapshot-desc='-'  'rpool/rootfs@zfs-auto-snap_frequent-2020-12-29-0234' returned 1
Dec 29 03:34:50 host zfs-auto-snap[15302]: zfs snapshot -o com.sun:auto-snapshot-desc='-' -r 'datastore@zfs-auto-snap_frequent-2020-12-29-0234' returned 1
Dec 29 03:34:50 host zfs-auto-snap[15299]: zfs snapshot -o com.sun:auto-snapshot-desc='-' -r 'datastore@zfs-auto-snap_frequent-2020-12-29-0234' returned 1
Dec 29 03:34:50 host zfs-auto-snap[15343]: zfs snapshot -o com.sun:auto-snapshot-desc='-' -r 'rpool/rootfs/opt@zfs-auto-snap_frequent-2020-12-29-0234' returned 1
Dec 29 03:34:50 host zfs-auto-snap[15747]: zfs snapshot -o com.sun:auto-snapshot-desc='-' -r 'rpool/rootfs/var/lib@zfs-auto-snap_frequent-2020-12-29-0234' returned 1
Dec 29 03:34:50 host zfs-auto-snap[15748]: @zfs-auto-snap_frequent-2020-12-29-0234, 2 created, 2 destroyed, 3 warnings.
Dec 29 03:34:50 host zfs-auto-snap[15750]: @zfs-auto-snap_frequent-2020-12-29-0234, 2 created, 2 destroyed, 3 warnings.
Dec 29 03:35:00 host zfs-auto-snap[17467]: zfs snapshot -o com.sun:auto-snapshot-desc='-' -r 'rpool/rootfs/opt@zfs-auto-snap_frequent-2020-12-29-0234' returned 1
Dec 29 03:35:00 host zfs-auto-snap[17551]: zfs snapshot -o com.sun:auto-snapshot-desc='-' -r 'rpool/rootfs/var/lib@zfs-auto-snap_frequent-2020-12-29-0234' returned 1
Dec 29 03:35:00 host zfs-auto-snap[17552]: @zfs-auto-snap_frequent-2020-12-29-0234, 1 created, 1 destroyed, 4 warnings.
Dec 29 03:35:13 host zfs-auto-snap[20477]: @zfs-auto-snap_hourly-2020-12-29-0234, 5 created, 5 destroyed, 0 warnings.
Dec 29 03:46:18 host zfs-auto-snap[25819]: @zfs-auto-snap_frequent-2020-12-29-0246, 5 created, 5 destroyed, 0 warnings.

I also took a look at the script but could not see anything that would produce this behaviour.

Manual creation of the snapshots did not reproduce the issue. I will try to debug this further and will post more verbose output if I can reproduce the issue manually.

Thanks in advance.

Lukas
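
One way to check whether several runs used the same snapshot name is to count the summary lines per name in the syslog output quoted in this comment. The following is only a sketch and not part of zfs-auto-snapshot: the function name `duplicate_runs` is made up for illustration, and it assumes the summary lines keep the format seen in this thread ("@name, N created, ...").

```shell
# Hypothetical helper: print "COUNT NAME" for every snapshot name that has
# more than one summary line ("..., N created, ...") in the given log.
# Reads the files given as arguments, or stdin when there are none.
duplicate_runs() {
  awk '/ created, / {
         for (i = 1; i <= NF; i++)
           if ($i ~ /^@/) { name = $i; sub(/,$/, "", name); count[name]++ }
       }
       END { for (n in count) if (count[n] > 1) print count[n], n }' "$@" | sort -k2
}

# Demo with three summary lines taken from the log in this thread.
duplicate_runs <<'EOF'
Dec 29 03:34:50 host zfs-auto-snap[15748]: @zfs-auto-snap_frequent-2020-12-29-0234, 2 created, 2 destroyed, 3 warnings.
Dec 29 03:34:50 host zfs-auto-snap[15750]: @zfs-auto-snap_frequent-2020-12-29-0234, 2 created, 2 destroyed, 3 warnings.
Dec 29 03:46:18 host zfs-auto-snap[25819]: @zfs-auto-snap_frequent-2020-12-29-0246, 5 created, 5 destroyed, 0 warnings.
EOF
# prints: 2 @zfs-auto-snap_frequent-2020-12-29-0234
```

On a real system the function would be pointed at wherever syslog output ends up, for example `duplicate_runs /var/log/syslog` (the path depends on the distribution). A name that is reported more than once means more than one run used it.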

@ljmanz

ljmanz commented Jan 21, 2021

I found the problem for my case:

With a large number of datasets on fairly slow HDDs, cron started the next frequent (15-minute interval) snapshot run before the previous one had finished.

As a result, cron forked several frequent snapshot runs that accumulated over time. They then used the same timestamp from the environment, which produced the error "dataset already exists".

This also happens with the hourly snapshots under high load (e.g. during backups).
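
The collision can be illustrated with a small sketch. The snapshot names in this thread carry a timestamp with minute resolution, so two runs that compute their name within the same minute end up with the same name. The helper `snap_name` below is hypothetical and only mimics the naming pattern seen in the error messages; it is not taken from the zfs-auto-snapshot script, and it assumes a `date` that accepts `-d @EPOCH` (GNU date does).

```shell
# Hypothetical helper: build a snapshot name with minute resolution,
# following the pattern zfs-auto-snap_LABEL-YYYY-MM-DD-HHMM seen above.
snap_name() {
  label=$1
  epoch=$2
  printf 'zfs-auto-snap_%s-%s\n' "$label" "$(date -u -d "@$epoch" +%F-%H%M)"
}

first=$(snap_name frequent 1609209240)    # 2020-12-29 02:34:00 UTC
second=$(snap_name frequent 1609209275)   # 35 seconds later, same minute

# Both runs would ask ZFS to create a snapshot with the same name.
[ "$first" = "$second" ] && echo "collision: $first"
# prints: collision: zfs-auto-snap_frequent-2020-12-29-0234
```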

@deviantintegral

That would make sense. I encountered this while a scrub was running, and after the scrub completed the errors stopped.

I suppose the cron process could set up a lock, but perhaps this is best as documentation or a better error message.

@bbccdd

bbccdd commented Jul 29, 2022

Having similar issues, but with the hourlies during low-load situations. Indeed, locking would be a good solution I think.
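
A minimal sketch of what such locking could look like from the cron side, assuming the `flock` utility from util-linux is installed. The wrapper name `run_locked` and the lock file path in the comment are assumptions for illustration; nothing like this ships with zfs-auto-snapshot.

```shell
# Hypothetical wrapper: run a command only if nobody else holds the lock.
# flock -n fails immediately instead of waiting, so an overlapping run is
# skipped rather than queued behind the one still in progress.
run_locked() {
  lockfile=$1
  shift
  flock -n "$lockfile" "$@"
}

# Demo with a temporary lock file.
demo_lock=$(mktemp)
run_locked "$demo_lock" echo "got the lock, the snapshot run would go here"
rm -f "$demo_lock"

# In a cron job the call could look like this (lock path is an assumption):
# run_locked /run/lock/zfs-auto-snap.lock zfs-auto-snapshot --quiet --syslog --label=frequent --keep=64 //
```

The kernel releases the lock when the holding process exits, so a crashed run does not leave a stale lock behind.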

@tomchiverton

tomchiverton commented Aug 12, 2023

Happens here too. Low load.

Maybe all the scripts need to include something like

# mkdir is atomic, so only one instance can create the lock directory
if ! mkdir /var/lock/zfs-auto-snap 2>/dev/null; then
  echo "Lock failed - exit" >&2
  exit 1
fi

and then rmdir /var/lock/zfs-auto-snap at the end?

(In my case, the issue was this cron bug: https://stackoverflow.com/questions/31886555/centos-6-7-cron-bug-run-parts-starts-twice, but it would be easy to reduce the chance of this foot gun here.)
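
The mkdir idea can be sketched as a small wrapper that always removes the lock directory once the wrapped command has returned, whether it succeeded or not. The function name `with_lock` and the demo path are hypothetical, and this is only one possible shape, not a tested patch for the cron scripts.

```shell
# Hypothetical wrapper around the mkdir-based lock suggested above.
with_lock() {
  lockdir=$1
  shift
  # mkdir is atomic: exactly one caller can create the directory.
  if ! mkdir "$lockdir" 2>/dev/null; then
    echo "lock $lockdir is held, skipping this run" >&2
    return 1
  fi
  "$@"
  status=$?
  # Release the lock and pass on the exit status of the command.
  rmdir "$lockdir"
  return "$status"
}

# Demo with a throwaway path; a real script would use a fixed location
# such as the /var/lock/zfs-auto-snap directory mentioned above.
demo_lock="${TMPDIR:-/tmp}/zfs-auto-snap-demo.$$"
with_lock "$demo_lock" echo "snapshot run would go here"
```

One limitation of this approach: if the script is killed before rmdir runs, the directory stays behind and blocks later runs until it is removed by hand.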

@bedaro

bedaro commented Sep 5, 2023

One way to fix this would be to use systemd timers rather than cron scripts. You can configure each timer with OnUnitInactiveSec so that the interval is measured from the time the previous run finished rather than from the time it started (OnUnitActiveSec counts from when the unit was last activated).

Of course systemd can be controversial. I'm willing to make & contribute the fix if it's an acceptable change.
