Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When a ZFS File system is set to CASE INSENSITIVE we should be calling d_add_ci on the dentry #4136

Closed
RichardSharpe opened this issue Dec 23, 2015 · 16 comments
Milestone

Comments

@RichardSharpe
Copy link
Contributor

Hi folks,

We just ran into an issue that caused a rename deadlock in Linux.

Kernel 3.10.0, but the same will happen, I believe with the latest kernel.

We use the casesensitivity=insensitive attribute.

Because ZoL does not call d_add_ci with the canonical case (at least, I cannot find any such calls) Linux ends up with two dentries pointing to the same inode if you present two paths with different case.

This happened to us and causes a self-deadlock in lock_rename because it is passed two different dentries that point to the same inode.

I think the fix has to be in zpl_inode.c:zpl_lookup where we check if the file system is case insensitive and if so call d_add_ci(dentry, ip, realpnp) and add a realpnp parameter to the zfs_lookup call we make, or something like that.

For now, we can work around the issue in Samba, I believe, however, it would be good to have a real solution to this problem.

@behlendorf
Copy link
Contributor

@RichardSharpe nice analysis. It looks like the fix your proposing can be made entirely in the Linux specific zpl_lookup() function. You can check how the filesystem is mounted by consulting zsb->z_case and because it's set once at mount time you know it can't change unless the filesystem is remounted. Then as you suggest you just need to use d_add_ci() to add a case-exact name.

It would be great if you could propose a patch for this and open a pull request.

@RichardSharpe
Copy link
Contributor Author

@behlendorf

I will work something up over the next few days.

@RichardSharpe
Copy link
Contributor Author

Simple test case:

  1. Create an fs with casesensitivity=insensitive
  2. mkdir /mnt-point/test
  3. touch /mnt-point/test/test.txt
  4. mv /mnt-point/test/test.txt /mnt-point/Test/test1.txt

Hung and deadlocked at this point.

@RichardSharpe
Copy link
Contributor Author

Here is a first attempt at this. It is untested (although it compiles), and I have at least one reservation:

diff --git a/module/zfs/zpl_inode.c b/module/zfs/zpl_inode.c
index e0e8ee3..ccbe68a 100644
--- a/module/zfs/zpl_inode.c
+++ b/module/zfs/zpl_inode.c
@@ -44,13 +44,24 @@ zpl_lookup(struct inode *dir, struct dentry *dentry, unsigned int flags)
        struct inode *ip;
        int error;
        fstrans_cookie_t cookie;
+       pathname_t *ppn = NULL;
+       pathname_t pn;
+       zfs_sb_t *zsb = dentry->d_sb->s_fs_info;
+
+       /* What about mixed case? */
+       /* If we are a case insensitive fs, we need the real name */
+       if (zsb->z_case == ZFS_CASE_INSENSITIVE) {
+               pn.pn_bufsize = ZFS_MAXNAMELEN;
+               pn.pn_buf = kmem_zalloc(ZFS_MAXNAMELEN, KM_SLEEP);
+               ppn = &pn;
+       }

        if (dlen(dentry) > ZFS_MAXNAMELEN)
                return (ERR_PTR(-ENAMETOOLONG));

        crhold(cr);
        cookie = spl_fstrans_mark();
-       error = -zfs_lookup(dir, dname(dentry), &ip, 0, cr, NULL, NULL);
+       error = -zfs_lookup(dir, dname(dentry), &ip, 0, cr, NULL, ppn);
        spl_fstrans_unmark(cookie);
        ASSERT3S(error, <=, 0);
        crfree(cr);
@@ -69,7 +80,23 @@ zpl_lookup(struct inode *dir, struct dentry *dentry, unsigned int flags)
                        return (ERR_PTR(error));
        }

-       return (d_splice_alias(ip, dentry));
+       /*
+        * If we are case insensitive, call the correct function
+        * to install the name.
+        */
+       if (ppn) {
+               struct dentry *new_dentry;
+               struct qstr real_name;
+
+               real_name.name = pn.pn_buf;
+               real_name.len = strlen(pn.pn_buf);
+               real_name.hash = full_name_hash(real_name.name, real_name.len);
+               new_dentry = d_add_ci(dentry, ip, &real_name);
+               kmem_free(pn.pn_buf, ZFS_MAXNAMELEN);
+               return (new_dentry);
+       }
+       else
+               return (d_splice_alias(ip, dentry));
 }

 void

My reservation is that we are adding a kmem_alloc and kmem_free in the lookup path.

It would have been nicer if zfs_lookup could return a pointer to the actual name, since we are not going to touch the name.

An additional reservation is that I have not thought about or handled caseensitivity=mixed.

I will try to test this later today.

@RichardSharpe
Copy link
Contributor Author

With a small change (MAXNAMELEN rather than ZFS_MAXNAMELEN) seems to work.

By work I mean:

  1. No OOPSes.
  2. Lookup still works (ls works etc)
  3. The rename test case no longer deadlocks.

More testing required.

@RichardSharpe
Copy link
Contributor Author

@ptx0: It might turn up with Roaming Profiles but I don't think it is specifically related to roaming profiles. We found it running FSCT.

@RichardSharpe
Copy link
Contributor Author

I have run fsstress many times against the changes and dozens of renames where the case differs in the from and to paths (not the last component) and no deadlocks seen.

I created a pull request.

@behlendorf
Copy link
Contributor

@RichardSharpe thanks for tackling this and opening a PR. The heart of the fix looks good, I've added some review comments so we can get it polished for inclusion. Please go ahead and refresh it when you can.

@RichardSharpe
Copy link
Contributor Author

@behlendorf:

Re:

You mentioned ZFS_MAXNAMELEN caused a panic. What was the cause of it? These values differ only by 1 and ZFS_MAXNAMELEN (255) is smaller compared to MAXNAMELEN (256) so I'm surprised. This should be ZFS_MAXNAMELEN as you originally had it.

Hmmm, I can't remember a panic.

However, I was printing out some names for debugging and was concerned that there might not be enough space for the trailing NULL.

@behlendorf
Copy link
Contributor

@RichardSharpe I must have misinterpreted your comment above.

@RichardSharpe
Copy link
Contributor Author

  1. I have created a new pull request. I might have mistakenly created two. Can I close one or more?
  2. Using the github interface is there any way to pull the ZoL repository into my fork so I am always at the top?

@behlendorf
Copy link
Contributor

@RichardSharpe when you open a pull request it's recommended that you use a branch per change. This way you can easily pull master from the ZoL repository and rebase your branch on top of those changes. Github provides some nice documentation, I'd suggest closing out #4147 and opening a new PR using a branch if you need to refresh this again.

@RichardSharpe
Copy link
Contributor Author

OK, I will try to close out 4147.

We hit that memory leak in QA a couple of times. It is quite impressive. 733MB leaked in the malloc-256 slab.

@RichardSharpe
Copy link
Contributor Author

I created a new pull request with what I think are the fixes but I noticed that two of the checks failed.

However, that seems to be because wget failed ... what can I do about it?

@behlendorf
Copy link
Contributor

@RichardSharpe looks like an unrelated hiccup in the test framework. I'll resubmit them for the sake of completeness but the patch looks good, and tests well, for me. Thanks for pushing this over the finish line, I'll get it merged it master.

goulvenriou pushed a commit to Alyseo/zfs that referenced this issue Jan 17, 2016
When casesensitivity=insensitive is set for the
file system, we can deadlock in a rename if the user uses different case
for each path. For example rename("A/some-file.txt", "a/some-file.txt").

The simple test for this is:

1. mkdir some-dir in a ZFS file system
2. touch some-dir/some-file.txt
3. mv Some-dir/some-file.txt some-dir/some-other-file.txt

This last request deadlocks trying to relock the i_mutex on the inode for
the parent directory.

The solution is to use d_add_ci in zpl_lookup if we are on a file system
that has the casesensitivity=insensitive attribute set.

This patch checks if we are working on a case insensitive file system and if
so, allocates storage for the case insensitive name and passes it to
zfs_lookup and then calls d_add_ci instead of d_splice_alias.

The performance impact seems to be minimal even though we have introduced a
kmalloc and kfree in the lookup path.

The problem was found when running Microsoft's FSCT against Samba on top of
ZFS On Linux.

Signed-off-by: Richard Sharpe <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#4136
nedbass pushed a commit that referenced this issue Mar 23, 2016
When casesensitivity=insensitive is set for the
file system, we can deadlock in a rename if the user uses different case
for each path. For example rename("A/some-file.txt", "a/some-file.txt").

The simple test for this is:

1. mkdir some-dir in a ZFS file system
2. touch some-dir/some-file.txt
3. mv Some-dir/some-file.txt some-dir/some-other-file.txt

This last request deadlocks trying to relock the i_mutex on the inode for
the parent directory.

The solution is to use d_add_ci in zpl_lookup if we are on a file system
that has the casesensitivity=insensitive attribute set.

This patch checks if we are working on a case insensitive file system and if
so, allocates storage for the case insensitive name and passes it to
zfs_lookup and then calls d_add_ci instead of d_splice_alias.

The performance impact seems to be minimal even though we have introduced a
kmalloc and kfree in the lookup path.

The problem was found when running Microsoft's FSCT against Samba on top of
ZFS On Linux.

Signed-off-by: Richard Sharpe <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes #4136
@behlendorf behlendorf added this to the 0.6.5.6 milestone Mar 23, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants
@behlendorf @RichardSharpe and others