Hang in setxattr on zfs mount point #5124

Closed
cgaspar opened this issue Sep 17, 2016 · 9 comments

Comments

@cgaspar

cgaspar commented Sep 17, 2016

netatalk becomes unkillable (kernel 4.7.3-200.fc24.x86_64, zfs 0.6.5.8). strace shows:
...
lstat("/export/cimedia", {st_mode=S_IFDIR|S_ISGID|0775, st_size=15, ...}) = 0
getxattr("/export/cimedia", "system.posix_acl_access", 0x7ffd1ac3dc40, 132) = -1 EOPNOTSUPP (Operation not supported)
lgetxattr("/export/cimedia", "user.org.netatalk.has-Extended-A"..., 0x7ffd1ac3e460, 4) = -1 ENODATA (No data available)
getuid() = 0
geteuid() = 0
setresuid(-1, 0, -1) = 0
setxattr("/export/cimedia", "user.org.netatalk.has-Extended-A"..., "yes", 4, 0
and the process is stuck in the kernel forever...

@cgaspar
Author

cgaspar commented Sep 17, 2016

After a reboot I couldn't reproduce the problem :-(
If it happens again I'll try and grab a core.

@kernelOfTruth
Contributor

kernelOfTruth commented Sep 17, 2016

@cgaspar for reference:

what is the value of

zfs get xattr foo

(where foo is the name of your pool)

when was the pool created (and with which ZFS version)?

zfs get creation foo

are any special features enabled ?

In what way are you accessing the pool via netatalk ?

What's the status of selinux ?
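
The property questions above can be gathered in one pass. The following is a minimal sketch, not part of the original request: it assumes the `zfs` command is on the PATH, uses `zfs get -H -o name,property,value,source` for tab-separated output, and the helper names (`build_zfs_get_cmd`, `parse_zfs_get`, `collect`) are made up for illustration.

```python
import subprocess

# Properties asked about in this thread, plus the name-handling ones.
PROPS = ["xattr", "creation", "casesensitivity", "normalization", "utf8only"]


def build_zfs_get_cmd(dataset, props=PROPS):
    """Build a `zfs get` command line with script-friendly (-H) output."""
    return ["zfs", "get", "-H", "-o", "name,property,value,source",
            ",".join(props), dataset]


def parse_zfs_get(output):
    """Parse tab-separated `zfs get -H` output into {property: {...}}."""
    result = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        # Four tab-separated fields: name, property, value, source.
        name, prop, value, source = line.split("\t", 3)
        result[prop] = {"dataset": name, "value": value, "source": source}
    return result


def collect(dataset):
    """Run `zfs get` for one dataset (hypothetical helper; needs zfs installed)."""
    out = subprocess.run(build_zfs_get_cmd(dataset), check=True,
                         capture_output=True, text=True).stdout
    return parse_zfs_get(out)
```

Calling `collect("foo")`, with `foo` replaced by the pool or dataset name, would return a dictionary keyed by property name. The SELinux status is not covered here and still has to be checked separately.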

@cgaspar
Author

cgaspar commented Sep 17, 2016

vault xattr sa local
Created with 0.6.5.7, I believe:
vault creation Wed Jun 8 22:24 2016 -
possibly interesting properties:

vault/cimedia  compression            on                     received
vault/cimedia  atime                  off                    received
vault/cimedia  xattr                  sa                     inherited from vault
vault/cimedia  version                5                      -
vault/cimedia  utf8only               on                     -
vault/cimedia  normalization          formD                  -
vault/cimedia  casesensitivity        insensitive            -
vault/cimedia  com.sun:auto-snapshot  true                   received

So it's possible a snapshot race could be involved

This was netatalk trying to launch itself, it didn't get far enough to actually respond to network clients

selinux:

SELinux status:                 enabled
SELinuxfs mount:                /sys/fs/selinux
SELinux root directory:         /etc/selinux
Loaded policy name:             targeted
Current mode:                   enforcing
Mode from config file:          enforcing
Policy MLS status:              enabled
Policy deny_unknown status:     allowed
Max kernel policy version:      30

@cgaspar
Author

cgaspar commented Sep 20, 2016

It looks like the filesystem is now completely dead. Any filesystem access to it hangs. Sadly kdump appears to go into some loop and fail, so I can't grab a crash dump :-(

@behlendorf
Contributor

@cgaspar can you check the console log by running dmesg? There's a good chance it contains additional debugging output, including a stack trace.

@cgaspar
Author

cgaspar commented Sep 20, 2016

I used zfs send/recv to copy the filesystem to a different pool on the same host. I copied the original solaris->linux snapshot and a more recent snapshot. Both get stuck in the kernel on any access. I did a sysrq-l and sysrq-t, output is attached. I guess next step is to try rolling back to an older kernel with an older zfs version and see what happens.

zfshang.trace.txt

@cgaspar
Author

cgaspar commented Sep 20, 2016

Rolling back to 4.6.5-300.fc24.x86_64 + zfs 0.6.5.8 restored service. So some combination of 4.7.3-200.fc24.x86_64 and zfs 0.6.5.8 appears to be toxic with this filesystem. I'm happy to build/install/run anything that might help track down the problem.

@tuxoko
Contributor

tuxoko commented Sep 20, 2016

@cgaspar
This is likely because you are using casesensitivity=insensitive. Linux 4.7 introduced parallel lookup, which, combined with case insensitivity, causes a deadlock in the current ZFS implementation.
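
As a rough triage aid, the condition described above can be written as a small check. This is only a sketch built from what is reported in this thread (hang on 4.7.3, no hang on 4.6.5, dataset with `casesensitivity=insensitive`, unpatched ZFS 0.6.5.8). The function names are hypothetical, and other `casesensitivity` values such as `mixed` are not considered because the thread does not cover them.

```python
import re


def kernel_version(release):
    """Extract (major, minor) from a release string like '4.7.3-200.fc24.x86_64'."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(m.group(1)), int(m.group(2))


def may_hit_ci_lookup_deadlock(release, casesensitivity):
    """Heuristic only: kernel has parallel lookup (>= 4.7) and the dataset
    is case insensitive. Assumes a ZFS build without the fix."""
    return kernel_version(release) >= (4, 7) and casesensitivity == "insensitive"
```

On a live system the release string could come from `platform.release()`, and the property value from `zfs get casesensitivity`.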

@tuxoko
Contributor

tuxoko commented Sep 20, 2016

@cgaspar Please test #5141

DeHackEd pushed a commit to DeHackEd/zfs that referenced this issue Oct 19, 2016
We must not use d_add_ci if the dentry already has the real name. Otherwise,
d_add_ci()->d_alloc_parallel() will find itself on the lookup hash and wait
on itself causing deadlock.

Tested-by: satmandu
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Closes openzfs#5124
Closes openzfs#5141
Closes openzfs#5147
Closes openzfs#5148
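
The self-wait described in the commit message can be illustrated with a toy user-space model. This is not kernel or ZFS code: `ToyDcache`, `add_ci`, `lookup_unpatched` and `lookup_patched` are hypothetical names, the model is single-threaded, and it raises an exception at the point where the real `d_alloc_parallel()` would block forever.

```python
class SelfWaitDeadlock(Exception):
    """Raised where the real kernel would wait on its own lookup."""


class ToyDcache:
    """Rough model of an in-lookup hash plus installed entries."""

    def __init__(self):
        self.in_lookup = set()   # names with a lookup in flight
        self.installed = {}      # name -> inode number

    def begin_lookup(self, name):
        # Models the dentry sitting on the lookup hash while ->lookup() runs.
        self.in_lookup.add(name)

    def alloc_parallel(self, name):
        # Models d_alloc_parallel(): an in-flight lookup of the same name
        # must be waited for. In this single-task model that is ourselves.
        if name in self.in_lookup:
            raise SelfWaitDeadlock(name)
        self.in_lookup.add(name)

    def finish(self, name, inode):
        self.in_lookup.discard(name)
        self.installed[name] = inode


def add_ci(dcache, requested, real_name, inode):
    # Models d_add_ci(): allocate a second entry under the real name.
    dcache.alloc_parallel(real_name)
    dcache.finish(real_name, inode)
    dcache.in_lookup.discard(requested)


def lookup_unpatched(dcache, requested, real_name, inode):
    dcache.begin_lookup(requested)
    add_ci(dcache, requested, real_name, inode)  # always uses add_ci


def lookup_patched(dcache, requested, real_name, inode):
    dcache.begin_lookup(requested)
    if requested == real_name:
        # Entry already has the real name: install it directly.
        dcache.finish(requested, inode)
    else:
        add_ci(dcache, requested, real_name, inode)
```

In this model, `lookup_unpatched(ToyDcache(), "cimedia", "cimedia", 1)` raises `SelfWaitDeadlock`, while `lookup_patched` installs the entry. A lookup whose requested name differs in case from the real name still goes through `add_ci` in both variants.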
stiell pushed a commit to stiell/zfs that referenced this issue Oct 21, 2016
DeHackEd pushed a commit to DeHackEd/zfs that referenced this issue Oct 29, 2016
behlendorf pushed a commit to behlendorf/zfs that referenced this issue Jan 20, 2017
behlendorf pushed a commit to behlendorf/zfs that referenced this issue Feb 2, 2017
behlendorf pushed a commit to behlendorf/zfs that referenced this issue Feb 2, 2017
behlendorf pushed a commit to behlendorf/zfs that referenced this issue Feb 2, 2017
behlendorf pushed a commit to behlendorf/zfs that referenced this issue Feb 2, 2017
behlendorf pushed a commit to behlendorf/zfs that referenced this issue Feb 2, 2017
behlendorf pushed a commit to behlendorf/zfs that referenced this issue Feb 2, 2017
Requires-builders: style
behlendorf pushed a commit that referenced this issue Feb 3, 2017
We must not use d_add_ci if the dentry already has the real name. Otherwise,
d_add_ci()->d_alloc_parallel() will find itself on the lookup hash and wait
on itself causing deadlock.

Tested-by: satmandu
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Closes #5124
Closes #5141
Closes #5147
Closes #5148