
Lost pools after many successful send/receives #3010

Closed
cousins opened this issue Jan 12, 2015 · 21 comments

cousins commented Jan 12, 2015

I recently (December) set up a new server to backup our primary ZFS server. It has CentOS 6.6 and ZFS/SPL version 0.6.3. I set up a big pool (similar to the primary pool) and then used send/receive to send snapshots over to the backup server. After the initial snapshots were done I sent incrementals over. All seemed fine.

On Friday (always on a Friday or holiday...) I noticed that the daily incremental sends weren't working anymore. Checking on the new server now, I don't see some of the pools, and I find no reference to any of the snapshots on the backup server. zpool status shows no problems, and dmesg also shows nothing. I thought maybe there was just some inconsistency, so I rebooted, but things look exactly the same. I am hesitant to do anything until I get some advice.

In looking at the history I see that I didn't actually create a sub-pool on the backup server. So for instance I did:

zfs send pool2/os-grizzly@2014-12-18 | ssh nfs2-ib zfs receive -Fduv pool3

And then subsequently:

zfs send -R -i pool2/os-grizzly@2014-12-18 pool2/os-grizzly@2014-12-19 | ssh nfs2-ib zfs receive -Fduv pool3

And all seemed to work. I remember seeing /pool3/os-grizzly on the backup server and I could see snapshots listed on the backup server for pool3/os-grizzly. I have backup logs showing that it sent/received the streams successfully.

I have other pools that are still showing up, but they are pools that show a "create" entry in the zpool history log. Since it seemed to work without doing a create, I figured I was all set. Now I'm wondering what happened. The data went somewhere, and I'd like to know where it went. Are there any steps I can take to see what happened? Since this is a backup, I'm not worried about the data. I'm more worried about pools vanishing when I really need them, so I'd like to know what happened.

I can supply more detailed information but for now I'm just looking for higher level information about what might have happened.

Thanks a lot for any advice.

Steve

dweeezil (Contributor) commented

@cousins Is there a chance the sending system has the 0574855 commit? As mentioned in #2907, it can cause a zfs recv of a -R (replication) stream to remove filesystems and/or snapshots unexpectedly.

cousins commented Jan 13, 2015

@dweeezil The sending system has zfs-0.6.2-224_g4d8c78c.el6.x86_64. Could that be the issue? That server is a production server that we can't update easily, and it has been running flawlessly, until now possibly.

I guess my main question right now is: when sending a snapshot of a pool, does a pool of that same name need to exist already on the receiving side? For instance, with sub-pool pool2/os-grizzly, if I send that to pool3 on another server that doesn't already have pool3/os-grizzly, is it expected that pool3/os-grizzly will get pruned at some point?

Thanks a lot,

Steve

dweeezil (Contributor) commented

@cousins Your sending system at 4d8c78c does not have 0574855.

If you are sending a full stream, the dataset should not exist on the receiving side. If you are sending an incremental stream, it should exist. If you're not sure of the way the dataset name is being constructed on the receiving side, you should run the receive with "-n" and it will show what it would do without doing anything.
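
For illustration, a dry-run form of the incremental from earlier in the thread (the dataset names are Steve's; adding -n alongside -v makes the receive report what it would do without changing anything):

zfs send -R -i pool2/os-grizzly@2014-12-18 pool2/os-grizzly@2014-12-19 | ssh nfs2-ib zfs receive -Fduvn pool3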

cousins commented Jan 13, 2015

@dweeezil So, it seems that I did things correctly. I just found:

zpool history -i

and now I see a lot of destroy events from last Thursday afternoon. I'm showing all of the transactions for the 8th. The first ones are from an automatic backup in the middle of the night. The destroy entries start at 16:25:

2015-01-08.01:20:57 [txg:2659759] receive pool3/cinder-volumes/%recv (1699)
2015-01-08.01:21:15 [txg:2659763] finish receiving pool3/cinder-volumes/%recv (1699) snap=2015-01-08
2015-01-08.01:21:15 [txg:2659763] clone swap pool3/cinder-volumes/%recv (1699) parent=cinder-volumes
2015-01-08.01:21:15 [txg:2659763] snapshot pool3/cinder-volumes@2015-01-08 (1715)
2015-01-08.01:21:15 [txg:2659763] destroy pool3/cinder-volumes/%recv (1699)
2015-01-08.01:21:22 zfs receive -Fduv pool3
2015-01-08.01:21:22 [txg:2659765] receive pool3/os-grizzly/%recv (1721)
2015-01-08.01:32:54 [txg:2659862] finish receiving pool3/os-grizzly/%recv (1721) snap=2015-01-08
2015-01-08.01:32:54 [txg:2659862] clone swap pool3/os-grizzly/%recv (1721) parent=os-grizzly
2015-01-08.01:32:54 [txg:2659862] snapshot pool3/os-grizzly@2015-01-08 (1744)
2015-01-08.01:32:54 [txg:2659862] destroy pool3/os-grizzly/%recv (1721)
2015-01-08.01:33:01 zfs receive -Fduv pool3
2015-01-08.01:33:01 [txg:2659864] receive pool3/netflow/%recv (1750)
2015-01-08.01:39:23 [txg:2659917] finish receiving pool3/netflow/%recv (1750) snap=2015-01-08
2015-01-08.01:39:23 [txg:2659917] clone swap pool3/netflow/%recv (1750) parent=netflow
2015-01-08.01:39:23 [txg:2659917] snapshot pool3/netflow@2015-01-08 (1755)
2015-01-08.01:39:23 [txg:2659917] destroy pool3/netflow/%recv (1750)
2015-01-08.01:39:27 zfs receive -Fduv pool3
2015-01-08.01:39:27 [txg:2659919] receive pool3/epool1/home/%recv (1761)
2015-01-08.01:40:58 [txg:2659932] finish receiving pool3/epool1/home/%recv (1761) snap=2015-01-08
2015-01-08.01:40:58 [txg:2659932] clone swap pool3/epool1/home/%recv (1761) parent=home
2015-01-08.01:40:58 [txg:2659932] snapshot pool3/epool1/home@2015-01-08 (1775)
2015-01-08.01:40:58 [txg:2659932] destroy pool3/epool1/home/%recv (1761)
2015-01-08.16:25:18 [txg:2668384] destroy pool3/cinder-volumes@2014-12-18 (239)
2015-01-08.16:25:21 [txg:2668386] destroy pool3/cinder-volumes@2015-01-02 (1086)
2015-01-08.16:25:25 [txg:2668388] destroy pool3/cinder-volumes@2015-01-08 (1715)
2015-01-08.16:25:28 [txg:2668390] destroy pool3/cinder-volumes@2015-01-05 (1404)
2015-01-08.16:25:31 [txg:2668392] destroy pool3/cinder-volumes@2014-12-27 (535)
2015-01-08.16:25:34 [txg:2668394] destroy pool3/cinder-volumes@2015-01-01 (950)
2015-01-08.16:25:37 [txg:2668396] destroy pool3/cinder-volumes@2014-12-29 (688)
2015-01-08.16:25:40 [txg:2668398] destroy pool3/cinder-volumes@2014-12-23 (383)
2015-01-08.16:25:43 [txg:2668400] destroy pool3/cinder-volumes@2014-12-30 (791)
2015-01-08.16:25:46 [txg:2668402] destroy pool3/cinder-volumes@2014-12-24 (421)
2015-01-08.16:25:49 [txg:2668404] destroy pool3/cinder-volumes@2015-01-06 (1512)
2015-01-08.16:25:52 [txg:2668406] destroy pool3/cinder-volumes@2015-01-04 (1301)
2015-01-08.16:25:54 [txg:2668408] destroy pool3/cinder-volumes@2014-12-26 (485)
2015-01-08.16:25:57 [txg:2668410] destroy pool3/cinder-volumes@2015-01-03 (1190)
2015-01-08.16:25:59 [txg:2668412] destroy pool3/cinder-volumes@2014-12-25 (455)
2015-01-08.16:26:02 [txg:2668414] destroy pool3/cinder-volumes@2015-01-07 (1631)
2015-01-08.16:26:05 [txg:2668416] destroy pool3/cinder-volumes@2014-12-31 (859)
2015-01-08.16:26:07 [txg:2668418] destroy pool3/cinder-volumes@2014-12-22 (359)
2015-01-08.16:26:10 [txg:2668420] destroy pool3/cinder-volumes@2014-12-28 (611)
2015-01-08.16:26:13 [txg:2668422] destroy pool3/cinder-volumes (220)
2015-01-08.16:26:15 [txg:2668424] destroy pool3/os-grizzly@2015-01-06 (1539)
2015-01-08.16:26:18 [txg:2668426] destroy pool3/os-grizzly@2014-12-24 (435)
2015-01-08.16:26:20 [txg:2668428] destroy pool3/os-grizzly@2014-12-29 (707)
2015-01-08.16:26:23 [txg:2668430] destroy pool3/os-grizzly@2014-12-23 (396)
2015-01-08.16:26:25 [txg:2668432] destroy pool3/os-grizzly@2015-01-01 (972)
2015-01-08.16:26:28 [txg:2668434] destroy pool3/os-grizzly@2014-12-30 (811)
2015-01-08.16:26:31 [txg:2668436] destroy pool3/os-grizzly@2014-12-27 (552)
2015-01-08.16:26:34 [txg:2668438] destroy pool3/os-grizzly@2015-01-05 (1430)
2015-01-08.16:26:36 [txg:2668440] destroy pool3/os-grizzly@2015-01-02 (1109)
2015-01-08.16:26:39 [txg:2668442] destroy pool3/os-grizzly@2015-01-08 (1744)
2015-01-08.16:26:41 [txg:2668444] destroy pool3/os-grizzly@2014-12-18 (214)
2015-01-08.16:26:44 [txg:2668446] destroy pool3/os-grizzly@2014-12-31 (880)
2015-01-08.16:26:47 [txg:2668448] destroy pool3/os-grizzly@2014-12-22 (347)
2015-01-08.16:26:49 [txg:2668450] destroy pool3/os-grizzly@2014-12-28 (629)
2015-01-08.16:26:52 [txg:2668452] destroy pool3/os-grizzly@2014-12-17 (150)
2015-01-08.16:26:55 [txg:2668454] destroy pool3/os-grizzly@2015-01-07 (1659)
2015-01-08.16:26:58 [txg:2668456] destroy pool3/os-grizzly@2014-12-25 (470)
2015-01-08.16:27:00 [txg:2668458] destroy pool3/os-grizzly@2015-01-03 (1214)
2015-01-08.16:27:03 [txg:2668460] destroy pool3/os-grizzly@2014-12-26 (501)
2015-01-08.16:27:05 [txg:2668462] destroy pool3/os-grizzly@2015-01-04 (1326)
2015-01-08.16:27:08 [txg:2668464] destroy pool3/os-grizzly (123)
2015-01-08.16:27:11 [txg:2668466] rename pool3/epool1 (165) -> pool3/recv-345-1
2015-01-08.16:29:49 [txg:2668492] set pool3/omg-pool@20141231 (1693) $hasrecvd=
2015-01-08.16:29:53 [txg:2668494] destroy pool3/recv-345-1/home@2014-12-31 (1019)
2015-01-08.16:29:55 [txg:2668496] destroy pool3/recv-345-1/home@2015-01-07 (1690)
2015-01-08.16:29:58 [txg:2668498] destroy pool3/recv-345-1/home@2015-01-03 (1241)
2015-01-08.16:30:00 [txg:2668500] destroy pool3/recv-345-1/home@2015-01-04 (1355)
2015-01-08.16:30:03 [txg:2668502] destroy pool3/recv-345-1/home@2015-01-06 (1573)
2015-01-08.16:30:06 [txg:2668504] destroy pool3/recv-345-1/home@2014-12-29 (774)
2015-01-08.16:30:08 [txg:2668506] destroy pool3/recv-345-1/home@2015-01-01 (1032)
2015-01-08.16:30:11 [txg:2668508] destroy pool3/recv-345-1/home@2015-01-05 (1458)
2015-01-08.16:30:14 [txg:2668510] destroy pool3/recv-345-1/home@2014-12-18 (258)
2015-01-08.16:30:17 [txg:2668512] destroy pool3/recv-345-1/home@2015-01-08 (1775)
2015-01-08.16:30:19 [txg:2668514] destroy pool3/recv-345-1/home@2015-01-02 (1152)
2015-01-08.16:30:22 [txg:2668516] destroy pool3/recv-345-1/home (171)
2015-01-08.16:30:25 [txg:2668518] destroy pool3/netflow@2015-01-04 (1339)
2015-01-08.16:30:27 [txg:2668520] destroy pool3/netflow@2015-01-03 (1226)
2015-01-08.16:30:30 [txg:2668522] destroy pool3/netflow@2015-01-07 (1671)
2015-01-08.16:30:32 [txg:2668524] destroy pool3/netflow@2014-12-17 (152)
2015-01-08.16:30:35 [txg:2668526] destroy pool3/netflow@2014-12-31 (990)
2015-01-08.16:30:37 [txg:2668528] destroy pool3/netflow@2015-01-08 (1755)
2015-01-08.16:30:40 [txg:2668530] destroy pool3/netflow@2015-01-02 (1120)
2015-01-08.16:30:43 [txg:2668532] destroy pool3/netflow@2015-01-05 (1441)
2015-01-08.16:30:45 [txg:2668534] destroy pool3/netflow@2014-12-30 (921)
2015-01-08.16:30:48 [txg:2668536] destroy pool3/netflow@2015-01-01 (1044)
2015-01-08.16:30:50 [txg:2668538] destroy pool3/netflow@2015-01-06 (1551)
2015-01-08.16:30:52 [txg:2668540] destroy pool3/netflow (113)
2015-01-08.16:30:55 [txg:2668542] destroy pool3/recv-345-1 (165)

And yet:

zpool history

for the 8th just shows:

2015-01-08.01:21:22 zfs receive -Fduv pool3
2015-01-08.01:33:01 zfs receive -Fduv pool3
2015-01-08.01:39:27 zfs receive -Fduv pool3

which are three incrementals that were done early in the morning. No events showing me trying to destroy anything.

Any idea what happened?

Thanks a lot.

Steve

cousins commented Jan 13, 2015

Incidentally, I just noticed that /pool3 looks like:

drwxr-xr-x. 2 root root 2 Dec 29 11:25 cinder-volumes
drwxr-xr-x. 2 root root 2 Dec 17 23:49 netflow
drwxr-xr-x 3 root root 3 Apr 7 2014 omg-pool
drwxr-xr-x. 2 root root 2 Dec 17 23:49 os-grizzly
drwxr-xr-x. 3 root root 3 Jan 8 16:27 recv-345-1
drwxr-xr-x 15 root root 17 Jan 3 19:30 secondary

The only directories that have data in them are omg-pool and secondary. I don't know what recv-345-1 is. A temporary directory/pool for receiving the snapshot? I see its timestamp matches the time that the destroys took place.

Thanks for your help.

Steve

cousins commented Jan 13, 2015

The story continues...

Since omg-pool seemed OK, I tried sending an incremental. I got some alarming messages:

[root@nfs1 etc]# zfs send -R -i omg-pool@20141231 omg-pool@20150112 | ssh nfs2-ib zfs receive -Fduv pool3
attempting destroy pool3
failed - trying rename pool3 to pool3recv-32032-1
failed (2) - will try again on next pass
receiving incremental stream of omg-pool@20150112 into pool3@20150112
cannot receive incremental stream: most recent snapshot of pool3 does not
match incremental source
attempting destroy pool3
failed - trying rename pool3 to pool3recv-32032-2
failed (2) - will try again on next pass
warning: cannot send 'omg-pool@20150112': Broken pipe

The net effect was that it destroyed pool3/secondary. Is there a way to roll back the transaction? Here is what zpool history -il shows:

2015-01-13.17:59:22 [txg:2744174] destroy pool3/secondary (232) [on nfs2.localdomain]

The previous listing I posted shows that the last transaction from the good backup was txg 2659919. Can I tell ZFS to roll back to that and get the pools back?

Thanks,

Steve

dweeezil (Contributor) commented

@cousins I think you ran into problems because the filesystems were mounted on the receiving side and the rename operation failed. Unfortunately, zfs receive -F can be very dangerous if one of the intermediate steps fails. I'll try running a few tests to see whether I can reproduce this.

cousins commented Jan 14, 2015

@dweeezil I see. I'm now questioning why I ran with -F in the first place. Am I likely to have any luck rolling back to a previous txg to get the data back? There has been very little going on on the system since last Thursday.

eborisch commented

I would like to see stronger wording around recv -F cautioning against automated use. I think it is getting used (I've seen it pop up on the freebsd-fs list as well) as a convenient way to expire old snapshots (when auto-snapshotting is providing that feature on the source side), but the risk outweighs the benefits for an automated System_A --> System_B periodic update.

Use of readonly=on at the backup destination (preventing any accidental modifications from just wandering around directories, especially if atime=on is set) can also help prevent needing to roll back the destination for the receive to work (the other 'benefit' of -F)...
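
For reference, a minimal sketch of those property settings, using the pool name from this thread (both properties are inherited by the datasets received under pool3):

zfs set readonly=on pool3
zfs set atime=off pool3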

eborisch commented

From man zfs: (in send -R description):

If the -i or -I flags are used in conjunction with the -R flag, an incremental replication stream is generated. The current values of properties, and current snapshot and file system names are set when the stream is received. If the -F flag is specified when this stream is received, snapshots and file systems that do not exist on the sending side are destroyed.

(A similar phrase is repeated in the recv section.)

He indicated above he is doing send -Ri to a recv -F, so yes, snapshots and entire filesystems can easily be destroyed. If users/joe (a filesystem) is removed (and therefore all of its snapshots are removed, too) from the source, users/joe (and all its snapshot history) will be removed on the next send -Ri <...> | recv -F <...> operation.
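
A sketch of that sequence, using eborisch's hypothetical users/joe example and an invented backup host and pool:

zfs destroy -r users/joe
zfs send -R -i users@time_m users@time_n | ssh backuphost zfs receive -Fdu backup

With -F on the receive, users/joe and its entire snapshot history on the backup side are destroyed to match the sender.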

That likely isn't what is desired for an update to a backup with a history of snapshots.

This isn't a bad thing or a wrong thing; it's just something that needs a few more "here be dragons" warnings around recv -F in the man pages, for my taste. It serves a purpose, but likely in recovery/setup operations, and not in a cron job.

cousins commented Jan 15, 2015

@eborisch Thanks for the information. The thing is that the snapshots and pools still exist on the sending side, so my understanding is that the receiving side shouldn't have deleted them if -F is working correctly. That said, I'm going to look into not using -F anymore.

The current priority is to try to roll back to a txg that is before the pools were destroyed. Any recommendations on how best to do that?

eborisch commented

@kpande To make sure the backup filesystems remain locally unchanged, use atime=off at a minimum (to prevent reads from causing modifications), and readonly=on if possible. (Because it's a backup, right?) Note that readonly=on does not mean you can't zfs recv into the filesystem to update it. If you don't need it constantly accessible, consider keeping it unmounted on the backup system.

And to (hopefully) clarify, you don't need to prevent using -F; you just need to be aware of what it does: it forces the receiving filesystem tree to match the state of the sending one. Think of it like rsync's --delete option.

Without -F, recv (of a stream created with zfs send -RI @time_m a@time_n) goes through and checks that each starting filesystem@snapshot a/b/c@time_m has a matching local A/b/c@time_m (names possibly truncated somewhat based on the recv options) to "start from" -- and that zfs get written@time_m a/b == 0 bytes (locally unchanged). If it has been locally changed, you'll need to roll back (on the destination) to the @time_m snapshot and then try again.
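
As a concrete form of that check (names taken from the example above), the written@<snapshot> property can be queried on the destination; a value of 0 means nothing has changed locally since that snapshot:

zfs get -H -o value written@time_m A/b/c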

As an example without -F, if there is A/b/old filesystem on the destination, but no corresponding a/b/old on the sender, it is untouched on the destination. The same goes for A/b/c@time_old.

With -F, it will helpfully roll back the destination to the @time_m state and then process the recv, while also removing filesystems and snapshots below the recv destination that do not exist (at time_n) on the send side.

As an example with -F, if there is an A/b/old filesystem on the destination, but no corresponding a/b/old on the sender, it is removed from the destination. The same goes for A/b/c@time_old.

cousins commented Jan 15, 2015

@eborisch This was my understanding too. So, it seems that something else happened with my system or recv -F has a bug in it.

@dweeezil Have you found anything?

Also, anyone have any ideas about rolling back the TXG to get the pools back?

Thanks,

Steve

dweeezil (Contributor) commented

@cousins I did find one interesting bug which could cause havoc in a scripted environment. If the source snapshot specified with zfs send -i or zfs send -I does not exist, no error is generated; instead, it simply creates a full stream. When used in conjunction with zfs receive -F, a full stream will cause everything to be destroyed first. I've got no idea whether this was the cause of your problem and am continuing to investigate.
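
A hypothetical sketch of that failure mode, with an invented snapshot name and the receive command from this thread; the behavior is as dweeezil describes above:

zfs send -R -i pool2/os-grizzly@typo-date pool2/os-grizzly@2015-01-12 | ssh nfs2-ib zfs receive -Fduv pool3

If @typo-date does not exist on the sender, a full stream is generated without an error, and the -F on the receiving side then destroys the existing datasets and snapshots before receiving it.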

eborisch commented

@dweeezil I don't think that's a bug. It means that if I:

  1. synced X:a/b@m to Y:A/b@m
  2. create X:a/b/c
  3. snap -r X:a/b@n
  4. send -RI @m X:a/b@n | recv -d Y:A

Then X:a/b/c is sent and received as a full stream and springs into existence at Y:A/b/c, which I would argue is desirable.

It even prints out a message saying what it is doing, something like "source snapshot does not exist; sending full stream."

dweeezil (Contributor) commented

That's why I said "in a scripted environment" (when zfs receive -F is being used). I wouldn't mind seeing the two somewhat unrelated features of zfs receive -F (rollback versus destruction) split into two separate options.

For my part, I don't use "-F" in my automated environment at all (nor do I use zfs send -R). Instead I wrote what must be the 1000th different replication script because I wasn't happy with anything else I could find. Among other things, it handles all the intermediate steps on its own.

One other note I'd like to add: if you can manage it, send/receive is potentially more reliable when the receiving filesystems are not mounted while the receive is in progress.
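
A hedged sketch of that arrangement, with illustrative dataset and snapshot names based on this thread (zfs unmount plus the receive's -u flag keep the target unmounted during and after the receive; canmount=noauto keeps that dataset from being mounted automatically later):

zfs set canmount=noauto pool3/os-grizzly
zfs unmount pool3/os-grizzly
zfs send -R -i pool2/os-grizzly@2015-01-12 pool2/os-grizzly@2015-01-13 | ssh nfs2-ib zfs receive -duv pool3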

cousins commented Jan 20, 2015

@dweeezil Thanks for the advice. I have revised my scripts to not use -F. I'll plan on keeping the pools on the remote system unmounted, although it is very desirable to be able to have them mounted.

As I'm getting no takers on helping to roll back to a specific TXG, I'll open up another thread.

Thanks very much, everyone, with special thanks to Tim.

dweeezil (Contributor) commented

@cousins Regarding rewind imports, you can get an idea of the uberblocks available with something like zdb -lu /dev/<pool_member> | grep txg | sort -u. I have a feeling the uberblock you want is long gone, but if you see it, the zpool import -T command is what you're looking for. Good luck!
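
A sketch of that sequence (the device path is a placeholder; 2659919 is the txg Steve identified above; importing read-only and without mounting limits further damage while inspecting the result):

zdb -lu /dev/<pool_member> | grep txg | sort -u
zpool export pool3
zpool import -o readonly=on -N -T 2659919 pool3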

eborisch commented

Setting the local readonly property to on is likely sufficient (in lieu of keeping them unmounted). The goal is to avoid changes on the backup copy so that the recv works (without -F).
