Snapshot Directory (.zfs) #173
The .zfs snapshot directory is not yet supported. It's on the list of development items which need to be worked on. Snapshots can still be mounted directly read-only using 'mount -t zfs pool/dataset@snap /mntpoint'.
Summary of Required Work

While snapshots do work, the .zfs snapshot directory has not yet been implemented. Snapshots can be manually mounted as needed with the mount command, 'mount -t zfs dataset@snap /mnt/snap'. To implement the .zfs snapshot directory a special .zfs inode must be created. This inode will have custom hooks which allow it to list the available snapshots as part of readdir(), and when a listed snapshot is traversed it must be mounted on demand. This should all be doable using the existing Linux automounter framework, which has the advantage of simplifying the zfs code.
Following your advice to use the automounter I came up with the following:

#!/bin/bash
# /etc/auto.zfs
# This file must be executable to work! chmod 755!
key="$1"
opts="-fstype=zfs"
for P in /bin /sbin /usr/bin /usr/sbin
do
if [ -x $P/zfs ]
then
ZFS=$P/zfs
break
fi
done
[ -x "$ZFS" ] || exit 1 # bail out if no zfs binary was found
ZFS="$ZFS list -rHt snapshot -o name $key"
$ZFS | LC_ALL=C sort -k 1 | \
awk -v key="$key" -v opts="$opts" -- '
BEGIN { ORS=""; first=1 }
{ if (first) { print opts; first=0 }; s=$1; sub(key, "", s); sub(/@/, "/", s); print " \\\n\t" s " :" $1 }
END { if (!first) print "\n"; else exit 1 } '

and to /etc/auto.master add:

/.zfs /etc/auto.zfs

Snapshots can then be easily accessed through /.zfs/poolname/fsname/snapshotname.
Neat. Thanks for posting this. As you say, it provides a handy way to get to the snapshots until we get the .zfs directory in place.
Hi Brian, I would like to work on this.
Sounds good to me! I'd very much like to see this get done; it's a big deal for people who want to use ZFS for an NFS server. I've tried to describe my initial thinking at a high level in the previous comment: https://github.com/behlendorf/zfs/issues/173#issuecomment-1095388 Why don't you dig into the nitty-gritty details of this, and we can discuss any concerns, problems, or issues you run into.
Thank you Brian, I will start with this and discuss if I have any problems. FYI, I have created a branch named snapshot of your fork of the 'zfs' repo and will be working on this in that branch.
Hi Brian, I am done with the snapshot automounting framework. When a snapshot, for example 'snap1', is created on a pool named 'tank' which is mounted by default on '/tank', one can access the contents of the snapshot by doing a cd to '/tank/.zfs/snapshot/snap1'. The implementation is done using the Linux automount framework as you suggested. Also, when someone tries to destroy this dataset, the snapshot is unmounted, and when someone tries to destroy the pool with the snapshot mounted, the pool can still be destroyed. Multiple snapshot mounts/unmounts work. The other places where snapshot unmount is called are rename and promote; these also work now. But one issue is that the functions which I am calling are GPL-exported Linux kernel functions, which conflicts with the ZFS CDDL license. Currently, to check the implementation, I changed the CDDL license to GPL. One way of solving this issue is to write wrapper functions in the SPL module, which is GPL licensed, export them from SPL, and make use of them in ZFS instead of directly calling the GPL-exported symbols of the Linux kernel. But I want to know your opinion on this. These symbols are: vfs_kern_mount(), do_add_mount(), mark_mounts_for_expiry(). BTW, I am currently working on access to auto-mounted snapshots through NFS. The link to the branch is: https://github.com/rohan-puri/zfs/tree/snapshot. Please have a look at the branch if you get time, and let me know whether the implementation approach seems correct or not.
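As a purely illustrative aside, here is a minimal sketch of the wrapper idea described above, assuming hypothetical names (spl_kern_mount, spl_mark_mounts_for_expiry) rather than the actual SPL API: the GPL-licensed SPL module would re-export thin wrappers so the CDDL ZFS module never references the GPL-only symbols directly. Whether such re-exporting is acceptable from a licensing standpoint is exactly the question being raised.

/*
 * Hypothetical sketch only -- not actual SPL code. The GPL-licensed SPL
 * module wraps two of the GPL-only kernel exports mentioned above and
 * re-exports them for use by the CDDL ZFS module.
 */
#include <linux/module.h>
#include <linux/fs.h>
#include <linux/mount.h>

struct vfsmount *
spl_kern_mount(struct file_system_type *type, int flags,
    const char *name, void *data)
{
        /* vfs_kern_mount() is EXPORT_SYMBOL_GPL in the kernel. */
        return vfs_kern_mount(type, flags, name, data);
}
EXPORT_SYMBOL(spl_kern_mount);

void
spl_mark_mounts_for_expiry(struct list_head *mounts)
{
        /* mark_mounts_for_expiry() is likewise GPL-only. */
        mark_mounts_for_expiry(mounts);
}
EXPORT_SYMBOL(spl_mark_mounts_for_expiry);

MODULE_LICENSE("GPL");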
Hi Rohan, Thanks again for working on this. Here are my initial review comments and questions:
Would this automounter idea make it impossible to see the .zfs directories over NFS?
Good question, I hope not. We should see what the current Linux behavior is regarding automounts on NFS servers. There's no reason we can't use ext4 for this today and see if traversing into an automount-mapped directory via NFS triggers the automount. If it doesn't, we'll need to figure out what we can do about it; we absolutely want the .zfs directory to work for NFS mounts.
I've come across this issue while implementing libshare. Assuming auto-mounted directories aren't visible (I haven't tested it yet), a possible work-around would be to use the "crossmnt" NFS option, although that would have the side effect of making other sub-volumes accessible via NFS, which is different from what Solaris does.
Hello Brian, I tried using bziller's script posted above and was able to mount the zfs snapshots using the Linux automounter. We can make changes to the script and write minimal kernel code to show the .zfs and snapshot directory listings. I agree with the approach you have provided. The only thing we need to take care of is that unmounting happens not only when the mounted snapshot filesystem is idle, but also in the following cases:
When we use the Linux automounter, to force expiry of a mountpoint we need to send the USR1 signal to automount, which unmounts the unused snapshots; I have checked this. Now we need to trigger this for each of the above cases. Need your feedback on this.
Wonderful, I'm glad to have you working on this. I also completely agree we need to handle the 5 cases you're describing. I think the cleanest way to do this will be to update the [...]. I don't think you need to inform the automounter of this in any way, but I haven't checked, so I could be wrong about that.
Ideally the snapshot directory feature should work across cgroups/OpenVZ containers, so (end-)users can access snapshots when using ZFS datasets to store the root filesystem for containers.
Hi Brian, I have implemented the minimal per-filesystem directory hierarchy (in kernel) in my fork of your zfs repository (new snapshot branch created), which supports <dentry,inode> creation for the .zfs dir, the snapshot dir, and the snapshot entry dirs. I was playing with the Linux automounter and am facing some issues: bziller above used an indirect map, in which we need to specify a mount-point in the auto.master file under which the zfs snapshot datasets would be mounted (the list is generated by passing the key, in this case the filesystem dataset, to the auto.zfs map file, which is specific to the zfs filesystem). bziller solved the problem by using /.zfs as the autofs mountpoint. But each snapshot needs to be mounted under the .zfs/snapshot dir of its own mount-point, which is different for each filesystem. So this autofs mount-point has to be different for each individual zfs filesystem, and under it we will mount the snapshots related to that filesystem later on (using some kind of auto.zfs script as you said). So the problem here is which mountpoint we need to specify in the auto.master file?
Need your view on this.
This will be much easier to do once we integrate with systemd. Systemd will take care of doing the right thing with respect to configuring the Linux kernel automounter -- all we will have to do is simply export the filesystems over NFS and voila.
In my view the cleanest solution will be your number 1 above. When a new zfs filesystem is created it will be automatically added to /etc/auto.master and the automount daemon signaled to pick up the change. Conversely, when a dataset is destroyed it must be removed and the automount daemon signaled. For example, if we create the filesystem tank/fish, /etc/auto.master would be updated like this:

/tank/fish/.zfs/snapshot /etc/auto.zfs -zpool=tank

The /etc/auto.zfs script can then be used as a generic indirect map as described above. However, since it would be good to validate the key against the known set of snapshots, we also need to make the pool name available to the script. I believe this can be done by passing it as an option to the map. The man page says arguments with leading dashes are considered options for the maps, but I haven't tested this. Sound good?
To play the devil's advocate: I can think of quite a few sysadmins who wouldn't take kindly at all to "some random filesystem" changing system config files. Isn't this a situation similar to the way mounting of zfs filesystems is handled?
The trouble is we want to leverage the automount daemon to automatically do the mount for us so we don't need to have all the snapshots mounted all the time. For that to work we need to keep the automount daemon aware of the available snapshots via the config file.
I assumed (naively I'm sure) that there would be some kind of API that would allow one to dynamically register / remove automount maps without having to modify the config file.
If only that were true. :)
How about running multiple instances of the automount daemon? Then ZFS could have its own separate auto.master file just for handling its own snapshot mounts.
I believe the automount daemon is being phased out in favor of systemd units. Those should be used first. We must write a small helper to inject units into systemd without touching configuration or unit files.
Hello Rudd-O, I agree that we should leverage systemd instead of making changes to the current infrastructure (the automount daemon). But not all systems may come with systemd, in which case we must provide an alternate way. Hello Brian, I do agree with ulope's point; also, when I was working on this earlier and trying to implement the first solution (#173 (comment)) that you mentioned, even after restarting the automount daemon I was not seeing the changes, and they were getting reflected only after a reboot. NFS, CIFS, etc. all make use of in-kernel mounts, in which case we don't have to rely on automount for mounting. All of the above 5 cases in which unmount is to be triggered can also be covered by this approach. Need your input :)
I totally agree there has got to be an alternate non-systemd way. It'll probably mean some code duplication for a couple of years. It's okay.
We can avoid this by following the approach described in #173 (comment), though it needs Brian's input. What's your opinion on that?
So I think there are a few things to consider here. I agree that using the automounter, while it seemed desirable on the surface, seems to be causing more trouble than it's worth. Implementing the .zfs snapshot directory by mounting the snapshot via a kernel upcall during .zfs path traversal seems like a reasonable approach. However, it's now clear to me why Solaris does something entirely different. If we mount the snapshots like normal filesystems under .zfs they will not be available from NFSv3 clients because they will have a different fsid. Since this is probably a pretty common use case it may be worth reimplementing the Solaris solution. That said, I wouldn't object to including your proposed solution as a stop gap for the short to medium term.
Brilliant observation.
Fixes "dataset not found" error on zfs destory <snapshot> see openzfs#173. Fixes race in dsl_dataset_user_release_tmp() when the temp snapshot from zfs diff dataset@snap command is used see openzfs#481.
I think I have found what was causing the "dataset does not exist" issue and have submitted a pull request.
Refreshed version of the patch, which includes:
Am doing some testing on accessing snapshots over NFS, but currently am unable to access the .zfs directory from the client. After 'zfs set snapdir=visible system' I can now see the .zfs directory on the NFS client, but attempting to access it gives:

ls: cannot open directory /system/.zfs: No such file or directory

So far it looks to be some issue looking up the attrs on the .zfs inode? Am seeing this from the client NFS:

nfs_revalidate_inode: (0:17/-1) getattr failed, error=-2

Below are some debug messages from NFS on both sides and some output of a systemtap script I'm using to explore the issue; am a bit stuck on where to go from here... Anyone have this working, or have any suggestions of a direction I can take to further debug the issue?
I think I may have tracked down the issue with not being able to do ls -la /system/.zfs. The path is something like: getattr -> nfs client -> nfs server -> nfsd3_proc_getattr -> zpl_fh_to_dentry -> zfs_vget. zfs_vget() attempts to retrieve a znode, but as it's a control directory it doesn't have a backing znode, so it should not do a normal lookup; this condition is identified in zfs_vget() here:
But from my traces I found this condition was not triggered. I did a trace of the locals in zfs_vget() and got the following:
Did a printk to confirm:
So it looks like zlfid->zf_setid is not long enough, or perhaps it's been truncated by NFSv3, but either way the object ends up with not enough f's: ffffffffffff, so it doesn't match ZFSCTL_INO_ROOT. As a test I adjusted the values for the control inode defines:
Then tested, and I am now able to traverse the .zfs directory and see the shares and snapshot dirs inside. I noticed that other ZFS implementations use low values for the control dir inodes; have we gone too large on these, or is this a limitation in NFSv3 itself? Wondering how best to deal with it.
Now that I can get into the .zfs directory, trying to cd to the snapshot dir causes cd to hang, and I get the following in dmesg (slowly getting there!):
@b333z Nice job. Yes, you're exactly right; I'd forgotten about this NFS limit when selecting those object IDs. There's actually a very nice comment in the code detailing exactly where this limit comes from. So we're limited to 48 bits for NFSv2 compatibility reasons, and actually the DMU imposes a (not widely advertised) 48-bit object number limit too.

include/sys/zfs_vfsops.h:105

/*
 * Normal filesystems (those not under .zfs/snapshot) have a total
 * file ID size limited to 12 bytes (including the length field) due to
 * NFSv2 protocol's limitation of 32 bytes for a filehandle.  For historical
 * reasons, this same limit is being imposed by the Solaris NFSv3
 * implementation (although the NFSv3 protocol actually permits a maximum
 * of 64 bytes).  It is not possible to expand beyond 12 bytes without
 * abandoning support of NFSv2.
 *
 * For normal filesystems, we partition up the available space as follows:
 *      2 bytes         fid length (required)
 *      6 bytes         object number (48 bits)
 *      4 bytes         generation number (32 bits)
 *
 * We reserve only 48 bits for the object number, as this is the limit
 * currently defined and imposed by the DMU.
 */
typedef struct zfid_short {
        uint16_t zf_len;
        uint8_t  zf_object[6];          /* obj[i] = obj >> (8 * i) */
        uint8_t  zf_gen[4];             /* gen[i] = gen >> (8 * i) */
} zfid_short_t;

include/sys/zfs_znode.h:160

/*
 * The directory entry has the type (currently unused on Solaris) in the
 * top 4 bits, and the object number in the low 48 bits.  The "middle"
 * 12 bits are unused.
 */
#define ZFS_DIRENT_TYPE(de)     BF64_GET(de, 60, 4)
#define ZFS_DIRENT_OBJ(de)      BF64_GET(de, 0, 48)

So the right fix is going to have to be to use smaller values for the ZFSCTL_INO_* constants... along with a very good comment explaining why those values are what they are. As you noted, they are quite a bit larger than their upstream counterparts. The reason is that the upstream code creates a separate namespace for the small .zfs/ directory and then mounts the snapshots on top of it. Under Linux it was far easier to just create these directories in the same namespace as the original zfs filesystem. However, since they are in the same namespace (unlike upstream) we needed to make sure the object IDs never conflicted, so they used the uppermost object IDs. Since zfs allocates all of its object IDs from 1 in a monotonically increasing fashion there wouldn't be a conflict.

The second issue looks like it's caused by trying to allocate a new inode for the .zfs/snapshot directory when one already exists in the namespace. Normally this wouldn't occur in the usual vfs callpaths, but the NFS paths differ. We're going to need to perform a lookup and only create the inode when the lookup fails. See zfsctl_snapdir_lookup() as an example of this.

I'd really like to get this code done and merged into master but I don't have the time to run down all these issues right now. If you can work on this and resolve the remaining NFS bugs that would be great; I'm happy to iterate with you on this in the bug and merge it once it's done.
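To make the truncation concrete, here is a small user-space sketch (purely illustrative, not ZFS code) that packs a 64-bit object number into the 6-byte zf_object[] field exactly as the comment above describes. Only the low 48 bits survive the round trip, which is why an all-ones control inode number came back as ffffffffffff and failed to match ZFSCTL_INO_ROOT.

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int
main(void)
{
        uint64_t ino = 0xFFFFFFFFFFFFFFFFULL;   /* e.g. an all-ones control inode number */
        uint8_t zf_object[6];
        uint64_t recovered = 0;
        int i;

        /* Pack: obj[i] = obj >> (8 * i), as in zfid_short_t above. */
        for (i = 0; i < 6; i++)
                zf_object[i] = (uint8_t)(ino >> (8 * i));

        /* Unpack on the NFS server side: only 48 bits come back. */
        for (i = 0; i < 6; i++)
                recovered |= (uint64_t)zf_object[i] << (8 * i);

        /* Prints: packed 0xffffffffffffffff -> recovered 0x0000ffffffffffff */
        printf("packed 0x%016" PRIx64 " -> recovered 0x%016" PRIx64 "\n",
            ino, recovered);
        return (0);
}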
Sounds good Brian, I'll expand my test environment to include NFSv2 and NFSv4 and work towards resolving any remaining issues.
Making some slow progress on this; I can traverse the control directory structure down to the snapshots now (still on NFSv3). As you said, it looks like it was trying to create new inodes for the control directories when they were already there, so adding a lookup as you suggested seems to have done the trick. These are the changes that I have so far:
I have tried a few combinations in zfs_vget of dealing with a snapshot directory (currently an ilookup), but so far all I get is the "." and ".." directories inside with a 1970 timestamp. I had tried traversing the directory first via the local zfs mount to ensure the snapshot is mounted, then traversing via NFS, but still get an empty directory. My current thinking is that nfsd refuses to export anything below that point as it's a new mount point. I did some experimentation in forcing/ensuring that getattr returned the same stat->dev as the parent filesystem; that didn't seem to help. I will start doing some tracing on the NFS code so I can see what it's doing. I then did some experimentation with the crossmnt NFS option; that seems to have some promise, it gives a stale file handle error but it looked to be at least attempting to traverse the mount. Anyhow, slowly getting my head around it all, just planning to keep improving my tracing and hopefully get to the bottom of it soon; let us know if you have any ideas or tips!
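As a rough illustration of the lookup-before-create pattern being discussed here (a sketch under stated assumptions, not the actual patch; zfsctl_inode_alloc() is a hypothetical helper name):

#include <linux/fs.h>

/*
 * Sketch only: reuse an already-instantiated .zfs control inode from the
 * inode cache instead of allocating a duplicate, and fall back to creating
 * it only when the lookup fails. zfsctl_inode_alloc() is hypothetical.
 */
static struct inode *
zfsctl_inode_lookup_or_create(struct super_block *sb, unsigned long ino)
{
        struct inode *ip;

        /* Reuse the control inode if it is already cached (the NFS path). */
        ip = ilookup(sb, ino);
        if (ip != NULL)
                return (ip);

        /* Otherwise create it, as the normal VFS lookup path does. */
        return (zfsctl_inode_alloc(sb, ino));
}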
Sounds good. Since I want to avoid this branch getting any staler than it already is, I'm seriously considering merging this change in without the NFS support now that -rc7 has been tagged. We can further work on the NFS issues in another bug. Any objections? As for the specific NFS issues you're seeing, the idea here is to basically fake out NFS for snapshots. The snapshot filesystems should be created with the same fsid as their parent so NFS can't tell the difference. Then it should allow traversal even without the crossmnt option. The NFS handles themselves are constructed in such a way as to avoid collisions, so lookups will be performed in the proper dataset. That said, clearly that's all not working quite right under Linux. We'll still need to dig into why.
I have no objections; it would be great to get this code merged. There's some great functionality there even without NFS support, so if I can assist in any way, let us know. I'll continue to dig deeper on the NFS stuff and see what I can find.
The deed is done. Thanks for being patient with me to make sure this was done right. The core .zfs/snapshot code has been merged into master with the following limitations.
Please open new issues for any problems you observe.
Add support for the .zfs control directory. This was accomplished by leveraging as much of the existing ZFS infrastructure as possible and updating it for Linux as required. The bulk of the core functionality is now all there with the following limitations.

*) The .zfs/snapshot directory automount support requires a 2.6.37 or newer kernel. The exception is RHEL6.2 which has backported the d_automount patches.

*) Creating/destroying/renaming snapshots with mkdir/rmdir/mv in the .zfs/snapshot directory works as expected. However, this functionality is only available to root until zfs delegations are finished.
   * mkdir - create a snapshot
   * rmdir - destroy a snapshot
   * mv - rename a snapshot

The following issues are known deficiencies, but we expect them to be addressed by future commits.

*) Add automount support for kernels older than 2.6.37. This should be possible using follow_link() which is what Linux did before.

*) Accessing the .zfs/snapshot directory via NFS is not yet possible. The majority of the ground work for this is complete. However, finishing this work will require resolving some lingering integration issues with the Linux NFS kernel server.

*) The .zfs/shares directory exists but no further smb functionality has yet been implemented.

Contributions-by: Rohan Puri <[email protected]>
Contributions-by: Andrew Barnes <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #173
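For context on the d_automount mechanism the commit message refers to (available in 2.6.37+ kernels), here is a minimal sketch of how a dentry under .zfs/snapshot can be wired up so the VFS mounts the snapshot on first traversal. zfsctl_snapshot_mount() is a hypothetical helper and the structure names are illustrative, not necessarily those used in the merged code.

#include <linux/dcache.h>
#include <linux/mount.h>
#include <linux/namei.h>

/* Hypothetical helper that mounts the snapshot backing this dentry. */
extern struct vfsmount *zfsctl_snapshot_mount(struct path *path);

static struct vfsmount *
zpl_snapdir_automount(struct path *path)
{
        /* Called by the VFS when a process first walks into the dentry. */
        return zfsctl_snapshot_mount(path);
}

/* Attached to dentries under .zfs/snapshot/ via d_set_d_op(). */
static const struct dentry_operations zpl_snapdir_dops = {
        .d_automount = zpl_snapdir_automount,
};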
Can anyone help me?
@baquar I'm not exactly sure what you're asking. ZFS is a local filesystem; if you take a snapshot it will just be visible in .zfs/snapshot on that system. Gluster, which layers on top of ZFS, will not replicate it to your other systems. You can however manually ship it to the other system with send/recv.
@behlendorf Hi, I'm sorry, you didn't get me; the issue was that I am unable to find the snapshot location in zfs.
@behlendorf One more question: I am mounting a snapshot using the command 'mount -t zfs datapool@dara /export/queue-data/' to share the directory, but it is in read-only mode. Could you please tell me how to set permissions on a snapshot in zfs?
@baquar You can't write to a snapshot; make a clone if you need to write. And use the mailing list for questions.
@aikudinov Thank you, I really appreciate your warm response, but my question is: can we set full permissions on a snapshot? I am sending it to another system and restoring it.
@baquar IMHO the best way to get familiar with the spl/zfs code is to pick an open issue you care about and see if you can fix it. We're always happy to have the help and are willing to provide advice and hints. However, let me second @aikudinov and point you to the [email protected] mailing list. There are lots of helpful people reading the list who can probably very quickly answer your questions. As for your question about the snapshots, they are by definition immutable. If you need a read-write copy you must clone it and mount the clone.
I was unable to find the .zfs directory. Normally this directory is present after snapshots are created; however, I could not find it with the rc-2 release after creating snapshots.