Lost pools after many successful send/receives #3010
Comments
@dweeezil The sending system has zfs-0.6.2-224_g4d8c78c.el6.x86_64. Could that be the issue? That server is a production server that we can't update easily, and it has been running flawlessly, until now possibly. My main question right now is: when sending a snapshot of a pool, does a pool of that same name need to exist already on the receiving side? For instance, with the sub-pool pool2/os-grizzly, if I send that to pool3 on another server that doesn't already have pool3/os-grizzly, is it expected that pool3/os-grizzly will get pruned at some point? Thanks a lot, Steve
@cousins Your sending system at 4d8c78c does not have 0574855. If you are sending a full stream, the dataset should not exist on the receiving side. If you are sending an incremental stream, it should exist. If you're not sure how the dataset name is being constructed on the receiving side, run the receive with -n and it will show what it would do without doing anything.
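For example, something along these lines (the snapshot names here are only placeholders for whatever your script actually sends) will report which datasets would be created or updated under pool3 without changing anything:

```sh
# Dry-run receive: -n does not modify the pool, -v prints the dataset/snapshot
# names the receive would use.  Snapshot names are placeholders.
zfs send -R -i pool2/os-grizzly@prev pool2/os-grizzly@latest | \
    ssh nfs2-ib zfs receive -nduv pool3
```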
@dweeezil So it seems that I did things correctly. I just found zpool history -i, and now I see a lot of destroy events from last Thursday afternoon. I'm showing all of the transactions for the 8th; the first ones are from an automatic backup in the middle of the night, and the destroy ones start at 16:25. The first entry is:

2015-01-08.01:20:57 [txg:2659759] receive pool3/cinder-volumes/%recv (1699)

And yet zpool history (without -i) for the 8th just shows:

2015-01-08.01:21:22 zfs receive -Fduv pool3

which are the three incrementals that were done early in the morning. There are no events showing me trying to destroy anything. Any idea what happened? Thanks a lot. Steve
Incidentally, I just noticed that /pool3 looks like:

drwxr-xr-x. 2 root root 2 Dec 29 11:25 cinder-volumes

The only directories that have data in them are omg-pool and secondary. I don't know what recv-345-1 is. A temporary directory/pool for receiving the snapshot? I see it has a date matching the time the destroys took place. Thanks for your help. Steve
The story continues... Since the omg-pool seemed OK, I tried sending an incremental and got some alarming messages:

[root@nfs1 etc]# zfs send -R -i omg-pool@20141231 omg-pool@20150112 | ssh nfs2-ib zfs receive -Fduv pool3

The end effect was that it destroyed pool3/secondary. Is there a way to roll back the transaction? Here is what zpool history -il shows:

2015-01-13.17:59:22 [txg:2744174] destroy pool3/secondary (232) [on nfs2.localdomain]

In the previous listing I showed that the last transaction from the good backup was txg 2659919. Can I tell ZFS to roll back to that and get back the pools? Thanks, Steve
@cousins I think you ran into problems because the filesystems were mounted on the receiving side and the rename operation failed.
@dweeezil I see. I'm questioning why I ran -F in the first place now. Am I likely to have any luck rolling back to a previous txg to get the data back? There has been very little going on on the system since last Thursday. |
I would like to see stronger wording around recv -F cautioning against automated use. I think it is getting used (I've seen it pop up on the freebsd-fs list as well) as a convenient way to expire old snapshots (when auto-snapshotting is providing that feature on the source side), but the risk outweighs the benefits for an automated System_A --> System_B periodic update.
From man zfs (in the send -R description): "If the -F flag is specified when this stream is received, snapshots and file systems that do not exist on the sending side are destroyed."
(A similar phrase is repeated in the recv section.) He indicated above that he is doing send -Ri into a recv -F, so yes, snapshots and entire filesystems can easily be destroyed. If users/joe (a filesystem) is removed from the source (and therefore all of its snapshots are removed too), users/joe (and all its snapshot history) will be removed on the next send -Ri <...> | recv -F <...> operation, which likely isn't what is desired for an update to a backup that keeps a history of snapshots. This isn't a bad thing or a wrong thing; it's just something that needs a few more "beware of dragons" warnings around recv -F.
@eborisch Thanks for the information. The thing is that the snapshots and pools still exist on the sending side, so my understanding is that the receiving side shouldn't have deleted them if -F is working correctly. That said, I'm going to see about not using -F anymore. The current priority is to try to roll back to a txg from before the pools were destroyed. Any recommendations on how best to do that?
@kpande To make sure the backup filesystems remain locally unchanged, use readonly=on on the receiving side.

And to (hopefully) clarify, you don't need to prevent using -F; you just need to be aware of what it does: it forces the receiving filesystem tree to match the state of the sending one. Think of it like rsync's --delete option. Without -F, recv (of a stream created with send -R -i @time_m ... @time_n) only creates or updates what is in the stream and leaves everything else alone. As an example without -F, if there is an A/b/old filesystem on the destination, but no corresponding a/b/old on the sender, it is untouched on the destination. The same goes for A/b/c@time_old.

With -F, it will helpfully roll the destination back to match @time_m and then process the recv, while also removing filesystems and snapshots below the recv destination that do not exist (at time_n) on the send side. As an example with -F, if there is an A/b/old filesystem on the destination, but no corresponding a/b/old on the sender, it is removed from the destination. The same goes for A/b/c@time_old.
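A concrete sketch of the difference (host, pool, and snapshot names here are made up for illustration, not taken from this thread):

```sh
# Incremental replication stream of everything under tank/users.
# Without -F: datasets and snapshots that exist only on the backup side are
# left alone; the receive fails instead if the destination has diverged.
zfs send -R -i tank/users@time_m tank/users@time_n | \
    ssh backuphost zfs receive -duv backup

# With -F: the destination is rolled back to @time_m first, and any filesystem
# or snapshot under backup/users that no longer exists on the sender is destroyed.
zfs send -R -i tank/users@time_m tank/users@time_n | \
    ssh backuphost zfs receive -Fduv backup
```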
@cousins I did find one interesting bug which could cause havoc in a scripted environment: if the source snapshot specified with -i doesn't exist, a full stream is sent rather than an incremental one.
@dweeezil I don't think that's a bug. It means that if a filesystem on X doesn't have the -i source snapshot (for example, because it was created after that snapshot was taken), then X:a/b/c is sent and received as a full stream and springs into existence at Y:A/b/c, which I would argue is desirable. It even prints out a message saying what it is doing; something like "source snapshot does not exist; sending full stream."
That's why I said "in a scripted environment", where that message would likely go unnoticed. For my part, I don't use -F in my automated environment at all.

One other note I'd like to add: send/receive is potentially more reliable when the received-to filesystems are not mounted while the receive is in progress, so avoid mounting them if you can.
@dweeezil Thanks for the advice. I have revised my scripts to not use -F, and I'll plan on keeping the pools on the remote system unmounted, although it is very desirable to be able to have them mounted. As I'm getting no takers on helping to roll back to a specific TXG, I'll open up another thread. Thanks very much everyone, with special thanks to Tim.
@cousins Regarding rewind imports, you can get an idea of the uberblocks available with something like:
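(The commands below are only a sketch, not the exact invocation; the device path is a placeholder, the txg is the one from the good backup mentioned earlier, and -T is an undocumented zpool import option that may not be available in every build.)

```sh
# Dump the vdev labels, including the uberblock array, to see which txgs
# are still present and could be rewind targets.  /dev/sdX1 is a placeholder.
zdb -lu /dev/sdX1

# Attempt a read-only rewind import to a specific txg: -N skips mounting and
# -o readonly=on keeps the rewind from becoming permanent.
zpool import -N -o readonly=on -T 2659919 pool3
```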
Setting the local readonly property to on is likely sufficient (in lieu of keeping the filesystems unmounted).
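For instance, assuming the backup pool is pool3 as above:

```sh
# Make the backup hierarchy read-only locally; incoming 'zfs receive' streams
# still apply, but local modifications (and accidental writes) are blocked.
zfs set readonly=on pool3
```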
I recently (in December) set up a new server to back up our primary ZFS server. It has CentOS 6.6 and ZFS/SPL version 0.6.3. I set up a big pool (similar to the primary pool) and then used send/receive to send snapshots over to the backup server. After the initial snapshots were done, I sent incrementals over. All seemed fine.
On Friday (always on a Friday or holiday...) I noticed that the daily incremental sends weren't working anymore. Checking on the new server now, I don't see some of the pools, and I find no reference to any of the snapshots on the backup server. zpool status shows no problems and dmesg also shows nothing. I thought maybe there was just some inconsistency, so I rebooted, but things look exactly the same. I am hesitant to do anything until I get some advice.
Looking at the history, I see that I didn't actually create a sub-pool on the backup server. So, for instance, I did:
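The exact snapshot names below are approximate, but the initial full send was of this general form:

```sh
# Initial full replication of pool2/os-grizzly into pool3 on the backup server
# (snapshot name is a placeholder).
zfs send -R pool2/os-grizzly@initial | ssh nfs2-ib zfs receive -duv pool3
```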
And then subsequently:
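Again with approximate snapshot names, the incrementals looked like:

```sh
# Daily incremental replication; -F makes the receive roll back and destroy
# datasets on the destination so that it matches the sender.
zfs send -R -i pool2/os-grizzly@prev pool2/os-grizzly@today | \
    ssh nfs2-ib zfs receive -Fduv pool3
```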
And all seemed to work: I remember seeing /pool3/os-grizzly on the backup server, I could see snapshots listed there for pool3/os-grizzly, and I have backup logs showing that the streams were sent and received successfully.
I have other pools that are still showing up, but those are the ones that show a "create" entry in the zpool history log. Since the send/receive seemed to work without doing a create, I figured I was all set. Now I'm wondering what happened; the data went somewhere. Are there any steps I can take to find out? Since this is a backup I'm not worried about the data itself; I'm more worried about pools vanishing when I really need them, so I'd like to know what happened.
I can supply more detailed information but for now I'm just looking for higher level information about what might have happened.
Thanks a lot for any advice.
Steve