SPL PANIC when creating a pool on top of a Ceph RBD #241

tdb · 2013-05-24T15:08:12Z

I'm trying to create a pool on top of a Ceph RBD. My setup is:

Running on a VMware VM
Ubuntu precise using linux-generic-lts-raring kernel (3.8.0.22.21)
ZFS/SPL 0.6.1 from ppa:zfs-native/stable
Ceph 0.61.2 from ceph.com repository

I can create a pool on top of a local disk without any problems. But when I put it on top of a Ceph RBD (block device) I get the following error:

# rbd ls -l
NAME     SIZE PARENT FMT PROT LOCK
cephzfs 1024G          1
# rbd map cephzfs --pool rbd --name client.admin
# ls -la /dev/rbd/rbd/cephzfs /dev/rbd1
brw-rw---- 1 root disk 251, 0 May 24 16:04 /dev/rbd1
lrwxrwxrwx 1 root root     10 May 24 16:04 /dev/rbd/rbd/cephzfs -> ../../rbd1
# zpool create pool1 /dev/rbd/rbd/cephzfs
cannot open 'pool1': dataset does not exist

And this panic:

[10582.132665] VERIFY(shpp->sh_eof == shpp->sh_pool_create_len) failed
[10582.132816] SPLError: 1746:0:(spa_history.c:276:spa_history_log_sync()) SPL PANIC
[10582.132958] SPL: Showing stack for process 1746
[10582.132962] Pid: 1746, comm: txg_sync Tainted: PF          O 3.8.0-22-generic #33~precise1-Ubuntu
[10582.132963] Call Trace:
[10582.132999]  [] spl_debug_dumpstack+0x27/0x40 [spl]
[10582.133006]  [] spl_debug_bug+0x82/0xe0 [spl]
[10582.133045]  [] spa_history_log_sync+0x428/0x650 [zfs]
[10582.133077]  [] dsl_sync_task_group_sync+0x123/0x210 [zfs]
[10582.133107]  [] dsl_pool_sync+0x41b/0x530 [zfs]
[10582.133140]  [] spa_sync+0x3a8/0xa50 [zfs]
[10582.133160]  [] ? ktime_get_ts+0x4c/0xe0
[10582.133195]  [] txg_sync_thread+0x2df/0x540 [zfs]
[10582.133229]  [] ? txg_init+0x250/0x250 [zfs]
[10582.133238]  [] thread_generic_wrapper+0x78/0x90 [spl]
[10582.133246]  [] ? __thread_create+0x310/0x310 [spl]
[10582.133255]  [] kthread+0xc0/0xd0
[10582.133259]  [] ? flush_kthread_worker+0xb0/0xb0
[10582.133272]  [] ret_from_fork+0x7c/0xb0
[10582.133275]  [] ? flush_kthread_worker+0xb0/0xb0

And then the following repeats after that until I reboot:

[10779.414291] INFO: task txg_sync:1746 blocked for more than 120 seconds.
[10779.414442] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[10779.414588] txg_sync        D ffff88003737b460     0  1746      2 0x00000000
[10779.414596]  ffff88003c517ad8 0000000000000046 0000000000000013 ffff88003fc13f40
[10779.414601]  ffff88003c517fd8 ffff88003c517fd8 ffff88003c517fd8 0000000000013f40
[10779.414604]  ffff88003b2d9740 ffff88003b08c5c0 ffffffff81c15347 0000000000000000
[10779.414607] Call Trace:
[10779.414624]  [] schedule+0x29/0x70
[10779.414652]  [] spl_debug_bug+0xb5/0xe0 [spl]
[10779.414716]  [] spa_history_log_sync+0x428/0x650 [zfs]
[10779.414751]  [] dsl_sync_task_group_sync+0x123/0x210 [zfs]
[10779.414785]  [] dsl_pool_sync+0x41b/0x530 [zfs]
[10779.414818]  [] spa_sync+0x3a8/0xa50 [zfs]
[10779.414825]  [] ? ktime_get_ts+0x4c/0xe0
[10779.414863]  [] txg_sync_thread+0x2df/0x540 [zfs]
[10779.414897]  [] ? txg_init+0x250/0x250 [zfs]
[10779.414906]  [] thread_generic_wrapper+0x78/0x90 [spl]
[10779.414914]  [] ? __thread_create+0x310/0x310 [spl]
[10779.414919]  [] kthread+0xc0/0xd0
[10779.414922]  [] ? flush_kthread_worker+0xb0/0xb0
[10779.414926]  [] ret_from_fork+0x7c/0xb0
[10779.414929]  [] ? flush_kthread_worker+0xb0/0xb0
[10899.176620] INFO: task txg_sync:1746 blocked for more than 120 seconds.
[10899.176758] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[10899.176902] txg_sync        D ffff88003737b460     0  1746      2 0x00000000
[10899.176906]  ffff88003c517ad8 0000000000000046 0000000000000013 ffff88003fc13f40
[10899.176910]  ffff88003c517fd8 ffff88003c517fd8 ffff88003c517fd8 0000000000013f40
[10899.176913]  ffff88003b2d9740 ffff88003b08c5c0 ffffffff81c15347 0000000000000000
[10899.176917] Call Trace:
[10899.176926]  [] schedule+0x29/0x70
[10899.176958]  [] spl_debug_bug+0xb5/0xe0 [spl]
[10899.176998]  [] spa_history_log_sync+0x428/0x650 [zfs]
[10899.177030]  [] dsl_sync_task_group_sync+0x123/0x210 [zfs]
[10899.177059]  [] dsl_pool_sync+0x41b/0x530 [zfs]
[10899.177092]  [] spa_sync+0x3a8/0xa50 [zfs]
[10899.177097]  [] ? ktime_get_ts+0x4c/0xe0
[10899.177132]  [] txg_sync_thread+0x2df/0x540 [zfs]
[10899.177166]  [] ? txg_init+0x250/0x250 [zfs]
[10899.177178]  [] thread_generic_wrapper+0x78/0x90 [spl]
[10899.177186]  [] ? __thread_create+0x310/0x310 [spl]
[10899.177191]  [] kthread+0xc0/0xd0
[10899.177194]  [] ? flush_kthread_worker+0xb0/0xb0
[10899.177198]  [] ret_from_fork+0x7c/0xb0
[10899.177202]  [] ? flush_kthread_worker+0xb0/0xb0

I'm happy to provide any further information required or do testing as needed.

Thank you.
Tim.

hvenzke · 2013-05-28T22:01:46Z

Use the real device name, /dev/rbd1.
Don't use symlinks with ZFS!

tdb · 2013-05-28T22:15:49Z

It makes no difference I'm afraid. The panic is identical.

hvenzke · 2013-05-28T23:09:53Z

Well, then the bug is basically in Ceph RBD's logic, as that provides the storage.

ZFS on Linux is known to work fine with native DRBD.

Ceph RBD's snapshot features are overkill, as ZFS does that itself.

Can you try creating a GFS cluster or a Lustre file system on it?

tdb · 2013-05-29T10:12:46Z

Ceph RBD works fine with other file systems for me, and ZFS works fine with other underlying storage. So it's hard to be precise about where the problem lies. In any case, ZFS shouldn't panic, surely? That's a bug.

Ceph provides a distributed file system which is why I want to use it. ZFS also has some great features for managing multiple file systems within a single pool including snapshots.

behlendorf · 2013-06-07T17:02:24Z

@tdb You're hitting a VERIFY in the code while attempting to sync out the history buffer to disk. For some reason the buffer lengths aren't being correctly updated. Since this only happens on top of a Ceph RBD, I suspect their block device is behaving slightly differently than the rest of the Linux block drivers. For the purposes of a test you could try commenting out the VERIFY like this, although my suspicion is you'll likely hit another issue quickly. However, that failure may shed some more light on exactly what's going wrong.

diff --git a/module/zfs/spa_history.c b/module/zfs/spa_history.c
index 9fb75f3..2d45266 100644
--- a/module/zfs/spa_history.c
+++ b/module/zfs/spa_history.c
@@ -272,8 +272,8 @@ spa_history_log_sync(void *arg1, void *arg2, dmu_tx_t *tx)
            NV_ENCODE_XDR, KM_PUSHPAGE) == 0);

        mutex_enter(&spa->spa_history_lock);
-       if (hap->ha_log_type == LOG_CMD_POOL_CREATE)
-               VERIFY(shpp->sh_eof == shpp->sh_pool_create_len);
+//     if (hap->ha_log_type == LOG_CMD_POOL_CREATE)
+//             VERIFY(shpp->sh_eof == shpp->sh_pool_create_len);

        /* write out the packed length as little endian */
        le_len = LE_64((uint64_t)reclen);

Related to this, most people usually think about putting Ceph on top of ZFS, not vice versa. That configuration was recently fixed in master, so you might try that. It won't get you features like distributed snapshots, but it will bring many of ZFS's other benefits.

tdb · 2013-06-07T20:44:39Z

@behlendorf Thanks for the reply. I made the change suggested (against 0.6.1) and saw the following:

# zpool create pool1 /dev/rbd1
cannot open 'pool1': dataset does not exist

So that's the same as before. Checking zpool status afterwards showed a good pool, but zfs status didn't show any filesystems. No panic though.

Then I tried to repeat it. This time I got a panic after creating the pool, and zpool status hung. The panic was:

[  183.924160] divide error: 0000 [#1] SMP
[  183.924349] Modules linked in: coretemp(F) microcode(F) psmouse(F) ppdev(F) vmw_balloon(F) serio_raw(F) i2c_piix4(F) vmwgfx(F) mac_hid(F) ttm(F) shpchp(F) drm(F) parport_pc(F) rbd(F) libceph(F) lp(F) parport(F) zfs(POF) zcommon(POF) znvpair(POF) zavl(POF) zunicode(POF) spl(OF) floppy(F) e1000(F) mptspi(F) mptscsih(F) mptbase(F) btrfs(F) zlib_deflate(F) libcrc32c(F)
[  183.926033] CPU 0
[  183.926100] Pid: 2019, comm: txg_sync Tainted: PF          O 3.8.0-23-generic #34~precise1-Ubuntu VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
[  183.926385] RIP: 0010:[]  [] spa_history_write+0x82/0x1d0 [zfs]
[  183.926631] RSP: 0018:ffff88003c549ab8  EFLAGS: 00010246
[  183.926742] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[  183.926878] RDX: 0000000000000000 RSI: 0000000000000020 RDI: 0000000000000000
[  183.927015] RBP: ffff88003c549b28 R08: ffff88003cfb4b40 R09: 0000000000000003
[  183.927151] R10: ffff880037062303 R11: 316462722f766564 R12: ffff88003c496600
[  183.927287] R13: ffff88003be36000 R14: ffff88003cf9a000 R15: 0000000000000008
[  183.927424] FS:  0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[  183.927574] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  183.927690] CR2: 00007f3b12ef0000 CR3: 000000003b141000 CR4: 00000000000007f0
[  183.927924] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  183.928132] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  183.928312] Process txg_sync (pid: 2019, threadinfo ffff88003c548000, task ffff88003bc8ae80)
[  183.928535] Stack:
[  183.928633]  0000000000000002 ffffffffa01e3360 ffff88003cfb4b40 ffff88003c549ba0
[  183.929007]  ffff88003cf9a000 0000000000000008 ffff88003be36000 0000000068163d54
[  183.929382]  ffff88003b8a2cc0 ffff88003b8a2cc0 ffff88003be36000 ffff88003cfb4b40
[  183.929757] Call Trace:
[  183.929903]  [] spa_history_log_sync+0x221/0x610 [zfs]
[  183.930106]  [] dsl_sync_task_group_sync+0x123/0x210 [zfs]
[  183.930312]  [] dsl_pool_sync+0x41b/0x530 [zfs]
[  183.930507]  [] spa_sync+0x3a8/0xa50 [zfs]
[  183.930667]  [] ? ktime_get_ts+0x4c/0xe0
[  183.930852]  [] txg_sync_thread+0x2df/0x540 [zfs]
[  183.931049]  [] ? txg_init+0x250/0x250 [zfs]
[  183.931219]  [] thread_generic_wrapper+0x78/0x90 [spl]
[  183.931397]  [] ? __thread_create+0x310/0x310 [spl]
[  183.931568]  [] kthread+0xc0/0xd0
[  183.936038]  [] ? flush_kthread_worker+0xb0/0xb0
[  183.936149]  [] ret_from_fork+0x7c/0xb0
[  183.936251]  [] ? flush_kthread_worker+0xb0/0xb0
[  183.936360] Code: 55 b0 48 89 fa 48 29 f2 48 01 c2 48 39 55 b8 0f 82 bc 00 00 00 4c 8b 75 b0 41 bf 08 00 00 00 48 29 c8 31 d2 49 8b b5 70 08 00 00 <48> f7 f7 4c 8d 45 c0 4c 89 f7 48 01 ca 48 29 d3 48 83 fb 08 49
[  183.938433] RIP  [] spa_history_write+0x82/0x1d0 [zfs]
[  183.938599]  RSP 
[  183.938710] ---[ end trace f7a46262c37aea79 ]---

If I had a more concrete idea of what was happening I'd be happy to file a bug with Ceph.

behlendorf · 2013-06-07T23:53:06Z

Divide by zero, now that's interesting. Can you dump the exact code for your build as follows? It should look something like this, but the exact line might differ. I want to know where that divide by zero occurred.

[behlendo@rhel-6-2-amd64 zfs]$ gdb module/zfs/zfs.ko
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/behlendo/src/git/zfs/module/zfs/zfs.ko...done.
(gdb) 
(gdb)  list *(spa_history_write+0x82)
0x58a62 is in spa_history_write (/home/behlendo/src/git/zfs/module/zfs/../../module/zfs/spa_history.c:129).
124             int err;
125
126             phys_bof = spa_history_log_to_phys(shpp->sh_bof, shpp);
127             firstread = MIN(sizeof (reclen), shpp->sh_phys_max_off - phys_bof);
128
129             if ((err = dmu_read(mos, spa->spa_history, phys_bof, firstread,
130                 buf, DMU_READ_PREFETCH)) != 0)
131                     return (err);
132             if (firstread != sizeof (reclen)) {
133                     if ((err = dmu_read(mos, spa->spa_history,
(gdb) quit

tdb · 2013-06-08T01:16:25Z

I've been building the module using dkms, but it appears to be stripping the module or not building it with symbols in the first place. Is there a way to modify that behaviour? Or am I going to need to ditch that and build it myself?

I've tried setting the relevant things in /etc/default/zfs.

chrisrd · 2013-06-08T01:33:10Z

Given previously failing VERIFY:

+//     if (hap->ha_log_type == LOG_CMD_POOL_CREATE)
+//             VERIFY(shpp->sh_eof == shpp->sh_pool_create_len);

...static analysis suggests:

static int
spa_history_write(spa_t *spa, void *buf, uint64_t len, spa_history_phys_t *shpp,
    dmu_tx_t *tx)
{
    ...
        phys_eof = spa_history_log_to_phys(shpp->sh_eof, shpp);
    ...
}

static uint64_t
spa_history_log_to_phys(uint64_t log_off, spa_history_phys_t *shpp)
{
        uint64_t phys_len;

        phys_len = shpp->sh_phys_max_off - shpp->sh_pool_create_len;
        return ((log_off - shpp->sh_pool_create_len) % phys_len      <<<< BOOM!
            + shpp->sh_pool_create_len);
}
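
To make the failure mode concrete: if sh_phys_max_off and sh_pool_create_len hold the same value (for example, both zero in a header that was never written or was read back as zeros), phys_len is 0 and the modulo traps. The following is only a userland sketch of a guarded version; history_phys_t and log_to_phys_checked are names invented here for illustration, not the actual ZFS code or a proposed fix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reduced stand-in for the spa_history_phys_t fields used below (illustrative). */
typedef struct history_phys {
	uint64_t sh_pool_create_len;	/* bytes reserved for the pool-create record */
	uint64_t sh_phys_max_off;	/* physical end of the history object */
	uint64_t sh_eof;		/* logical end-of-log offset */
} history_phys_t;

/*
 * Hypothetical guarded variant of spa_history_log_to_phys().
 * Returns 0 and stores the physical offset in *phys, or -1 when the
 * ring length would be zero (or negative) and the modulo cannot be taken.
 */
static int
log_to_phys_checked(uint64_t log_off, const history_phys_t *shpp,
    uint64_t *phys)
{
	uint64_t phys_len;

	assert(shpp != NULL && phys != NULL);

	if (shpp->sh_phys_max_off <= shpp->sh_pool_create_len)
		return (-1);	/* this is the case that divides by zero */

	phys_len = shpp->sh_phys_max_off - shpp->sh_pool_create_len;
	*phys = (log_off - shpp->sh_pool_create_len) % phys_len +
	    shpp->sh_pool_create_len;
	return (0);
}
```

With an all-zero header the function returns -1 instead of trapping; with sh_pool_create_len = 100 and sh_phys_max_off = 1100, a logical offset of 1150 wraps to physical offset 150.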

behlendorf · 2013-06-18T16:46:47Z

@tdb It depends on your kernel and what the default build options are. For example, the Ubuntu kernels will always strip the symbols. It may also not be needed since @chrisrd has likely spotted the offending line here.

It seems likely that we're somehow reading bogus data from the Ceph RBD. It would be useful to see what those values are. If you're still interested in chasing this, can you try the following patch? It will log the offending values to the console before the crash. It would be useful to run it several times to see if the values remain constant or change.

diff --git a/module/zfs/spa_history.c b/module/zfs/spa_history.c
index 9fb75f3..700f364 100644
--- a/module/zfs/spa_history.c
+++ b/module/zfs/spa_history.c
@@ -223,6 +223,13 @@ spa_history_log_sync(void *arg1, void *arg2, dmu_tx_t *tx)
         */
        VERIFY(0 == dmu_bonus_hold(mos, spa->spa_history, FTAG, &dbp));
        shpp = dbp->db_data;
+#ifdef _KERNEL
+       printk("sh_pool_create_len = %llu\n", shpp->sh_pool_create_len);
+       printk("sh_phys_max_off = %llu\n", shpp->sh_phys_max_off);
+       printk("sh_bof = %llu\n", shpp->sh_bof);
+       printk("sh_eof = %llu\n", shpp->sh_eof);
+       printk("sh_records_lost = %llu\n", shpp->sh_records_lost);
+#endif

        dmu_buf_will_dirty(dbp, tx);

tdb · 2013-06-18T21:47:34Z

@behlendorf It looks like, either through fiddling or other updates, I've managed to move the error:

[  422.936633]  rbd1: unknown partition table
[  422.936705] rbd: rbd1: added with size 0x10000000000
[  441.362250] SPL: using hostid 0x007f0101
[  441.470098] SPLError: 1682:0:(zap_micro.c:301:mze_find()) VERIFY3(mze->mze_cd == (&(zn->zn_zap)->zap_u.zap_micro.zap_phys->mz_chunk[(mze)->mze_chunkid])->mze_cd) failed (0 == 1635019877)
[  441.470418] SPLError: 1682:0:(zap_micro.c:301:mze_find()) SPL PANIC
[  441.470544] SPL: Showing stack for process 1682
[  441.470552] Pid: 1682, comm: txg_sync Tainted: PF          O 3.8.0-25-generic #37~precise1-Ubuntu
[  441.470554] Call Trace:
[  441.470579]  [] spl_debug_dumpstack+0x27/0x40 [spl]
[  441.470589]  [] spl_debug_bug+0x82/0xe0 [spl]
[  441.470636]  [] mze_find+0x13a/0x270 [zfs]
[  441.470677]  [] zap_lookup_norm+0x9e/0x1c0 [zfs]
[  441.470685]  [] ? kmem_free_debug+0x4b/0x150 [spl]
[  441.470725]  [] zap_lookup+0x33/0x40 [zfs]
[  441.470765]  [] spa_feature_is_active+0x8a/0xf0 [zfs]
[  441.470799]  [] dsl_scan_active+0x76/0xc0 [zfs]
[  441.470833]  [] dsl_scan_sync+0x4f/0xe30 [zfs]
[  441.470873]  [] ? zio_wait+0x23d/0x480 [zfs]
[  441.470910]  [] ? bpobj_enqueue_cb+0x20/0x20 [zfs]
[  441.470947]  [] spa_sync+0x417/0xcd0 [zfs]
[  441.470968]  [] ? ktime_get_ts+0x4c/0xe0
[  441.471007]  [] txg_sync_thread+0x30a/0x640 [zfs]
[  441.471016]  [] ? kmem_free_debug+0x4b/0x150 [spl]
[  441.471054]  [] ? txg_quiesce_thread+0x540/0x540 [zfs]
[  441.471062]  [] thread_generic_wrapper+0x78/0x90 [spl]
[  441.471070]  [] ? __thread_create+0x310/0x310 [spl]
[  441.471080]  [] kthread+0xc0/0xd0
[  441.471084]  [] ? flush_kthread_worker+0xb0/0xb0
[  441.471096]  [] ret_from_fork+0x7c/0xb0
[  441.471100]  [] ? flush_kthread_worker+0xb0/0xb0

If that's of no use to you, let me know and I'll try to get the machine back how it was. I notice the kernel version has changed, and I'm fairly sure a ceph update got pulled in too.
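
One way to look at the unexpected value in the VERIFY3 above is to split it into the bytes it occupies in memory. 1635019877 is 0x61746c65, and on a little-endian machine those four bytes are all printable ASCII. The helper below is a small userland sketch for that check; u32_to_le_text is a name made up for this example, and a printable result is only a hint about what kind of data ended up in that field, not a diagnosis.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Split a 32-bit value into the four bytes it occupies in memory on a
 * little-endian machine (lowest byte first) and store them in out as a
 * NUL-terminated string. Returns 1 when all four bytes are printable
 * ASCII, 0 otherwise.
 */
static int
u32_to_le_text(uint32_t value, char out[5])
{
	int printable = 1;

	assert(out != NULL);

	for (int i = 0; i < 4; i++) {
		unsigned char b = (unsigned char)(value >> (8 * i));

		out[i] = (char)b;
		if (b < 0x20 || b > 0x7e)
			printable = 0;
	}
	out[4] = '\0';
	return (printable);
}
```

For 1635019877 this returns 1 and the bytes spell "elta"; for the expected value 0 it returns 0.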

behlendorf · 2013-06-18T23:32:34Z

@tdb This just looks like garbage data from disk as well. One thing which did catch my eye however from the above log was the size of the rbd device. 0x10000000000 is a surprisingly round number for the partition, is this expected? Also are you creating a partition table for zfs manually, or allowing it to partition the device?

[  422.936705] rbd: rbd1: added with size 0x10000000000

tdb · 2013-06-19T00:44:18Z

@behlendorf I noticed that size too. It's a 1TB device (1024G), so it's actually correct.

# rbd ls -l
NAME     SIZE PARENT FMT PROT LOCK
cephzfs 1024G          1
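
As a quick check of the arithmetic: 0x10000000000 bytes is 2^40 bytes, which is 1024 GiB, matching the 1024G that rbd ls reports. A minimal sketch of the conversion follows; gib_to_bytes and bytes_to_gib are helper names invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Convert whole GiB to bytes (1 GiB = 2^30 bytes). */
static uint64_t
gib_to_bytes(uint64_t gib)
{
	/* Larger inputs would not fit in 64 bits after the shift. */
	assert(gib <= (UINT64_MAX >> 30));
	return (gib << 30);
}

/* Convert a byte count to whole GiB, discarding any remainder. */
static uint64_t
bytes_to_gib(uint64_t bytes)
{
	return (bytes >> 30);
}
```

Here gib_to_bytes(1024) gives 0x10000000000, and bytes_to_gib(0x10000000000) gives 1024.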

I was giving the raw device to ZFS, rather than creating a partition.

If I use fdisk to put a partition table on the disk, but without adding any partitions, I get the following when creating a pool:

# zpool create pool1 /dev/rbd1
internal error: Invalid argument
Aborted (core dumped)

If I create a partition on it I get the same errors as I mentioned previously (mze_find) when creating a pool on /dev/rbd1p1.

Just for comparison, here's the output creating an ext4 filesystem on the same partition:

root@ubuntu:~# mkfs.ext4 /dev/rbd1p1
mke2fs 1.42 (29-Nov-2011)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=1024 blocks, Stripe width=1024 blocks
67108864 inodes, 268434432 blocks
13421721 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
8192 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
        102400000, 214990848
Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done
root@ubuntu:~# mount /dev/rbd1p1 /mnt
root@ubuntu:~# df -h /mnt
Filesystem      Size  Used Avail Use% Mounted on
/dev/rbd1p1    1008G   72M  957G   1% /mnt
root@ubuntu:~#

behlendorf · 2013-06-19T16:28:59Z

Strange. Well the only way these failures make sense is if something odd is happening at the block device layer. My next suggestion would be to use blktrace to grab a trace log for the rbd device. That would allow us to look for something unusual in the way the rbd or zfs is behaving.

http://www.cse.unsw.edu.au/~aaronc/iosched/doc/blktrace.html

tdb · 2013-06-20T14:40:26Z

@behlendorf Does this output help?

https://gist.github.com/tdb/2ae734e546be0c5e1d39

behlendorf · 2013-06-20T21:30:55Z

@tdb That's exactly the log I wanted to see, but unfortunately it doesn't really show anything strange. All the I/O looks reasonable and is doing what I'd expect a zpool create to do. It's the right size and it's all within the size of the device. However, what is interesting is that it doesn't show any reads before the crash.

That's got me wondering if the rbd driver might be modifying parts of the pages in the bvecs during the write. That could explain this issue, but we'd need to put a debug patch together to see.
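
The general idea behind such a debug check can be sketched in userland: hash the buffer before handing it to the device, hash it again once the write completes, and compare. This is purely illustrative and is not the debug patch mentioned above; buf_hash and buf_changed are hypothetical names, FNV-1a is an arbitrary choice of hash, and a real check would have to live in the kernel I/O path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a hash over a byte buffer. */
static uint64_t
buf_hash(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	uint64_t h = 14695981039346656037ULL;	/* FNV-1a offset basis */

	assert(buf != NULL || len == 0);

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 1099511628211ULL;		/* FNV-1a prime */
	}
	return (h);
}

/*
 * Returns 1 if the buffer no longer matches the hash recorded before
 * the write was issued, 0 if it is unchanged.
 */
static int
buf_changed(const void *buf, size_t len, uint64_t hash_before)
{
	return (buf_hash(buf, len) != hash_before);
}
```

The caller would record buf_hash() for each page before submitting the write and call buf_changed() from the completion path; any page reporting 1 was modified while the driver held it.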

chrisrd · 2013-06-21T06:27:02Z

@tdb Based on little more than the mention of modifying bvecs, this commit, which touches drivers/block/rbd.c, might be relevant:

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d74c6d514fe314b8bdab58b487b25992291577ec

block: Add bio_for_each_segment_all()

__bio_for_each_segment() iterates bvecs from the specified index
instead of bio->bv_idx.  Currently, the only usage is to walk all the
bvecs after the bio has been advanced by specifying 0 index.

For immutable bvecs, we need to split these apart;
bio_for_each_segment() is going to have a different implementation.
This will also help document the intent of code that's using it -
bio_for_each_segment_all() is only legal to use for code that owns the
bio.

If your kernel doesn't have that patch already, it could be worthwhile trying a kernel that includes it. It looks to have been introduced some time between v3.9 and v3.10-rc1. It may even be worth trying v3.10-rc6, which has pulled in a bunch of rbd.c changes.

tdb · 2013-06-21T11:28:03Z

@chrisrd Using the Ubuntu mainline kernels I tried v3.9.7, but it behaved the same. I checked and it doesn't contain the commit you mentioned above. So I tried v3.10-rc6 and I get the following build error in spl:

Making all in module
make[2]: Entering directory `/var/lib/dkms/spl/0.6.1/build/module'
make -C /lib/modules/3.10.0-031000rc6-generic/build SUBDIRS=`pwd`  CONFIG_SPL=m modules
make[3]: Entering directory `/usr/src/linux-headers-3.10.0-031000rc6-generic'
  CC [M]  /var/lib/dkms/spl/0.6.1/build/module/spl/../../module/spl/spl-debug.o
  CC [M]  /var/lib/dkms/spl/0.6.1/build/module/spl/../../module/spl/spl-proc.o
In file included from /var/lib/dkms/spl/0.6.1/build/include/sys/kmem.h:38:0,
                 from /var/lib/dkms/spl/0.6.1/build/include/sys/kstat.h:32,
                 from /var/lib/dkms/spl/0.6.1/build/module/spl/../../module/spl/spl-proc.c:28:
/var/lib/dkms/spl/0.6.1/build/include/sys/vmsystm.h:77:8: error: redefinition of ‘struct vmalloc_info’
include/linux/vmalloc.h:173:8: note: originally defined here
/var/lib/dkms/spl/0.6.1/build/module/spl/../../module/spl/spl-proc.c: In function ‘proc_dir_entry_match’:
/var/lib/dkms/spl/0.6.1/build/module/spl/../../module/spl/spl-proc.c:1126:15: error: dereferencing pointer to incomplete type
/var/lib/dkms/spl/0.6.1/build/module/spl/../../module/spl/spl-proc.c:1129:32: error: dereferencing pointer to incomplete type
/var/lib/dkms/spl/0.6.1/build/module/spl/../../module/spl/spl-proc.c: In function ‘proc_dir_entry_find’:
/var/lib/dkms/spl/0.6.1/build/module/spl/../../module/spl/spl-proc.c:1137:16: error: dereferencing pointer to incomplete type
/var/lib/dkms/spl/0.6.1/build/module/spl/../../module/spl/spl-proc.c:1137:37: error: dereferencing pointer to incomplete type
/var/lib/dkms/spl/0.6.1/build/module/spl/../../module/spl/spl-proc.c: In function ‘proc_dir_entries’:
/var/lib/dkms/spl/0.6.1/build/module/spl/../../module/spl/spl-proc.c:1150:16: error: dereferencing pointer to incomplete type
/var/lib/dkms/spl/0.6.1/build/module/spl/../../module/spl/spl-proc.c:1150:37: error: dereferencing pointer to incomplete type
/var/lib/dkms/spl/0.6.1/build/module/spl/../../module/spl/spl-proc.c: In function ‘spl_proc_init’:
/var/lib/dkms/spl/0.6.1/build/module/spl/../../module/spl/spl-proc.c:1177:2: error: implicit declaration of function ‘create_proc_entry’ [-Werror=implicit-function-declaration]
/var/lib/dkms/spl/0.6.1/build/module/spl/../../module/spl/spl-proc.c:1177:21: warning: assignment makes pointer from integer without a cast [enabled by default]
/var/lib/dkms/spl/0.6.1/build/module/spl/../../module/spl/spl-proc.c:1181:27: error: dereferencing pointer to incomplete type
/var/lib/dkms/spl/0.6.1/build/module/spl/../../module/spl/spl-proc.c: In function ‘proc_dir_entry_match’:
/var/lib/dkms/spl/0.6.1/build/module/spl/../../module/spl/spl-proc.c:1130:1: warning: control reaches end of non-void function [-Wreturn-type]
cc1: some warnings being treated as errors
make[5]: *** [/var/lib/dkms/spl/0.6.1/build/module/spl/../../module/spl/spl-proc.o] Error 1
make[4]: *** [/var/lib/dkms/spl/0.6.1/build/module/spl] Error 2
make[3]: *** [_module_/var/lib/dkms/spl/0.6.1/build/module] Error 2
make[3]: Leaving directory `/usr/src/linux-headers-3.10.0-031000rc6-generic'
make[2]: *** [modules] Error 2
make[2]: Leaving directory `/var/lib/dkms/spl/0.6.1/build/module'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/var/lib/dkms/spl/0.6.1/build'
make: *** [all] Error 2

Have spl/zfs been tested with v3.10 yet?

behlendorf · 2013-06-21T16:56:41Z

@tdb There are pull requests open for 3.10 support, but they are still undergoing review before getting merged. They should be safe to use; the only real questions around them are whether they accidentally break builds on older kernels and whether they are as clean as they can be.

@chrisrd I don't think the referenced commit will help, but it wouldn't hurt to try. We'll probably need to instrument the zfs vdev_disk.c code to see exactly what's happening to the bios.

tdb · 2013-08-25T00:02:06Z

Just a quick update on this. I've tried again with 0.6.2 and the following two kernels:

http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.10.9-saucy/
http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.11-rc6-saucy/

Same problem:

Aug 25 00:51:29 ubuntu-12042 kernel: [  142.393672] SPLError: 2851:0:(zap_micro.c:301:mze_find()) VERIFY3(mze->mze_cd == (&(zn->zn_zap)->zap_u.zap_micro.zap_phys->mz_chunk[(mze)->mze_chunkid])->mze_cd) failed (0 == 825307184)
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394034] SPLError: 2851:0:(zap_micro.c:301:mze_find()) SPL PANIC
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394160] SPL: Showing stack for process 2851
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394164] CPU: 0 PID: 2851 Comm: txg_sync Tainted: PF          O 3.11.0-031100rc6-generic #201308181835
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394166] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 02/22/2012
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394169]  ffff88003c59da00 ffff88003c4ab9c8 ffffffff81720b9b 0000000000000007
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394173]  0000000000000000 ffff88003c4ab9d8 ffffffffa018f4d7 ffff88003c4aba18
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394176]  ffffffffa01907a2 ffffffffa01a4b4d ffff880036998880 ffff88003c59da00
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394179] Call Trace:
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394203]  [] dump_stack+0x46/0x58
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394221]  [] spl_debug_dumpstack+0x27/0x40 [spl]
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394246]  [] spl_debug_bug+0x82/0xe0 [spl]
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394314]  [] mze_find+0x13a/0x270 [zfs]
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394359]  [] zap_lookup_norm+0x9e/0x1c0 [zfs]
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394368]  [] ? kmem_free_debug+0x4b/0x150 [spl]
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394410]  [] zap_lookup+0x33/0x40 [zfs]
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394451]  [] spa_feature_is_active+0x8a/0xf0 [zfs]
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394485]  [] dsl_scan_active+0x76/0xc0 [zfs]
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394520]  [] dsl_scan_sync+0x4f/0xe30 [zfs]
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394559]  [] ? zio_wait+0x23d/0x4a0 [zfs]
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394596]  [] ? bpobj_enqueue_cb+0x20/0x20 [zfs]
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394633]  [] spa_sync+0x48a/0xd60 [zfs]
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394649]  [] ? ktime_get_ts+0x4c/0xe0
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394687]  [] txg_sync_thread+0x30a/0x640 [zfs]
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394696]  [] ? kmem_free_debug+0x4b/0x150 [spl]
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394733]  [] ? txg_quiesce_thread+0x540/0x540 [zfs]
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394742]  [] thread_generic_wrapper+0x78/0x90 [spl]
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394750]  [] ? __thread_create+0x310/0x310 [spl]
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394759]  [] kthread+0xc0/0xd0
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394763]  [] ? flush_kthread_worker+0xb0/0xb0
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394771]  [] ret_from_fork+0x7c/0xb0
Aug 25 00:51:29 ubuntu-12042 kernel: [  142.394776]  [] ? flush_kthread_worker+0xb0/0xb0

tdb · 2013-11-21T23:28:27Z

Using 0.6.2 and the linux-image-generic-lts-saucy 3.11.0.13.12 kernel on Ubuntu precise I now get the following:

# zpool create pool2 /dev/rbd1
internal error: Invalid argument
Aborted (core dumped)

The core file contains:

#0  0x00007ffa1abad425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffa1abb0b8b in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007ffa1b383782 in ?? () from /lib/libzfs.so.2
#3  0x00007ffa1b383b70 in zfs_standard_error_fmt () from /lib/libzfs.so.2
#4  0x00007ffa1b364a1e in zfs_open () from /lib/libzfs.so.2
#5  0x000000000040bc98 in zpool_do_create (argc=, argv=) at ../../cmd/zpool/zpool_main.c:1057
#6  0x0000000000404d26 in main (argc=4, argv=0x7fffecdc5178) at ../../cmd/zpool/zpool_main.c:5709

And this in the log:

Nov 21 23:08:22 ubuntu-12042 kernel: [  116.240529] SPLError: 1688:0:(spa.c:6190:spa_sync()) VERIFY3(bpobj_iterate(defer_bpo, spa_free_sync_cb, zio, tx) == 0) failed (22 == 0)
Nov 21 23:08:22 ubuntu-12042 kernel: [  116.240786] SPLError: 1688:0:(spa.c:6190:spa_sync()) SPL PANIC
Nov 21 23:08:22 ubuntu-12042 kernel: [  116.240899] SPL: Showing stack for process 1688
Nov 21 23:08:22 ubuntu-12042 kernel: [  116.240910] CPU: 0 PID: 1688 Comm: txg_sync Tainted: PF          O 3.11.0-13-generic #20~precise2-Ubuntu
Nov 21 23:08:22 ubuntu-12042 kernel: [  116.240912] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/30/2013
Nov 21 23:08:22 ubuntu-12042 kernel: [  116.240915]  0000000000000005 ffff88003c6f9c48 ffffffff8173a05d 0000000000000007
Nov 21 23:08:22 ubuntu-12042 kernel: [  116.240919]  0000000000000000 ffff88003c6f9c58 ffffffffa01794d7 ffff88003c6f9c98
Nov 21 23:08:22 ubuntu-12042 kernel: [  116.240922]  ffffffffa017a7a2 ffffffffa018ebed ffff88003b804000 0000000000000005
Nov 21 23:08:22 ubuntu-12042 kernel: [  116.240925] Call Trace:
Nov 21 23:08:22 ubuntu-12042 kernel: [  116.240943]  [] dump_stack+0x46/0x58
Nov 21 23:08:22 ubuntu-12042 kernel: [  116.240971]  [] spl_debug_dumpstack+0x27/0x40 [spl]
Nov 21 23:08:22 ubuntu-12042 kernel: [  116.240979]  [] spl_debug_bug+0x82/0xe0 [spl]
Nov 21 23:08:22 ubuntu-12042 kernel: [  116.241024]  [] spa_sync+0x9f7/0xdb0 [zfs]
Nov 21 23:08:22 ubuntu-12042 kernel: [  116.241080]  [] txg_sync_thread+0x364/0x6a0 [zfs]
Nov 21 23:08:22 ubuntu-12042 kernel: [  116.241122]  [] ? txg_quiesce_thread+0x520/0x520 [zfs]
Nov 21 23:08:22 ubuntu-12042 kernel: [  116.241131]  [] thread_generic_wrapper+0x78/0x90 [spl]
Nov 21 23:08:22 ubuntu-12042 kernel: [  116.241139]  [] ? __thread_create+0x310/0x310 [spl]
Nov 21 23:08:22 ubuntu-12042 kernel: [  116.241145]  [] kthread+0xc0/0xd0
Nov 21 23:08:22 ubuntu-12042 kernel: [  116.241149]  [] ? flush_kthread_worker+0xb0/0xb0
Nov 21 23:08:22 ubuntu-12042 kernel: [  116.241158]  [] ret_from_fork+0x7c/0xb0
Nov 21 23:08:22 ubuntu-12042 kernel: [  116.241162]  [] ? flush_kthread_worker+0xb0/0xb0
Nov 21 23:10:27 ubuntu-12042 kernel: [  240.848936] INFO: task txg_sync:1688 blocked for more than 120 seconds.
Nov 21 23:10:27 ubuntu-12042 kernel: [  240.849079] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 21 23:10:27 ubuntu-12042 kernel: [  240.849220] txg_sync        D ffff880036a5ece0     0  1688      2 0x00000000
Nov 21 23:10:27 ubuntu-12042 kernel: [  240.849226]  ffff88003c6f9c48 0000000000000046 ffffffff81ae70b3 ffff88003fc14580
Nov 21 23:10:27 ubuntu-12042 kernel: [  240.849230]  ffff88003c6f9fd8 ffff88003c6f9fd8 ffff88003c6f9fd8 0000000000014580
Nov 21 23:10:27 ubuntu-12042 kernel: [  240.849233]  ffff88003cd69770 ffff88003cd6aee0 0000000000000000 0000000000000000
Nov 21 23:10:27 ubuntu-12042 kernel: [  240.849236] Call Trace:
Nov 21 23:10:27 ubuntu-12042 kernel: [  240.849252]  [] schedule+0x29/0x70
Nov 21 23:10:27 ubuntu-12042 kernel: [  240.849299]  [] spl_debug_bug+0xb5/0xe0 [spl]
Nov 21 23:10:27 ubuntu-12042 kernel: [  240.849346]  [] spa_sync+0x9f7/0xdb0 [zfs]
Nov 21 23:10:27 ubuntu-12042 kernel: [  240.849387]  [] txg_sync_thread+0x364/0x6a0 [zfs]
Nov 21 23:10:27 ubuntu-12042 kernel: [  240.849427]  [] ? txg_quiesce_thread+0x520/0x520 [zfs]
Nov 21 23:10:27 ubuntu-12042 kernel: [  240.849445]  [] thread_generic_wrapper+0x78/0x90 [spl]
Nov 21 23:10:27 ubuntu-12042 kernel: [  240.849454]  [] ? __thread_create+0x310/0x310 [spl]
Nov 21 23:10:27 ubuntu-12042 kernel: [  240.849460]  [] kthread+0xc0/0xd0
Nov 21 23:10:27 ubuntu-12042 kernel: [  240.849464]  [] ? flush_kthread_worker+0xb0/0xb0
Nov 21 23:10:27 ubuntu-12042 kernel: [  240.849468]  [] ret_from_fork+0x7c/0xb0
Nov 21 23:10:27 ubuntu-12042 kernel: [  240.849471]  [] ? flush_kthread_worker+0xb0/0xb0

Further zpool commands generate the following:

Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182141] SPLError: 2064:0:(zap_micro.c:1292:zap_cursor_retrieve()) VERIFY3(mze->mze_cd == mzep->mze_cd) failed (0 == 1635019877)
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182264] SPLError: 2064:0:(zap_micro.c:1292:zap_cursor_retrieve()) SPL PANIC
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182329] SPL: Showing stack for process 2064
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182335] CPU: 0 PID: 2064 Comm: zpool Tainted: PF          O 3.11.0-13-generic #20~precise2-Ubuntu
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182337] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/30/2013
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182338]  ffff88003c25b640 ffff88003bdebac8 ffffffff8173a05d 0000000000000007
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182341]  0000000000000000 ffff88003bdebad8 ffffffffa01794d7 ffff88003bdebb18
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182343]  ffffffffa017a7a2 ffffffffa018ebed ffff88003bdebbf8 ffff88003c25b640
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182345] Call Trace:
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182352]  [] dump_stack+0x46/0x58
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182363]  [] spl_debug_dumpstack+0x27/0x40 [spl]
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182367]  [] spl_debug_bug+0x82/0xe0 [spl]
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182400]  [] zap_cursor_retrieve+0x24a/0x480 [zfs]
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182414]  [] ? default_spin_lock_flags+0x9/0x10
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182441]  [] ? zap_unlockdir+0x108/0x1a0 [zfs]
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182466]  [] spa_add_feature_stats+0x213/0x440 [zfs]
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182471]  [] ? kmem_alloc_debug+0x138/0x3b0 [spl]
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182476]  [] ? kmem_alloc_debug+0x138/0x3b0 [spl]
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182482]  [] ? nvlist_remove_all+0x8f/0xd0 [znvpair]
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182506]  [] ? spa_config_held+0xb9/0xd0 [zfs]
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182531]  [] ? spa_add_l2cache+0x29/0x3f0 [zfs]
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182555]  [] ? spa_add_spares+0x25/0x360 [zfs]
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182579]  [] spa_get_stats+0x10f/0x330 [zfs]
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182584]  [] ? kmem_alloc_debug+0x138/0x3b0 [spl]
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182610]  [] zfs_ioc_pool_stats+0x31/0x70 [zfs]
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182636]  [] zfsdev_ioctl+0x53b/0x5b0 [zfs]
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182646]  [] ? ftrace_raw_event_do_sys_open+0x100/0x110
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182651]  [] do_vfs_ioctl+0x7c/0x2f0
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182653]  [] SyS_ioctl+0x91/0xb0
Nov 21 23:17:44 ubuntu-12042 kernel: [  678.182657]  [] system_call_fastpath+0x1a/0x1f

behlendorf · 2013-11-21T23:31:22Z

@tdb This was due to an ABI change between the user utilities and kmods. It is unrelated to your original issue. Make sure you rebuild everything such that the utilities exactly match the kmods.
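
A first-pass check for this kind of mismatch can be sketched as below. It assumes the loaded module reports its version in `/sys/module/zfs/version` and that the userland version string is supplied by the caller (for example, copied from the package manager). The helper names are hypothetical, and equal version strings do not prove the ABI matches; a difference is only a strong hint that the utilities and kmods were built from different sources.

```python
import re
from pathlib import Path


def read_kmod_version(module: str = "zfs") -> str:
    """Return the version string the loaded kernel module reports via sysfs."""
    return Path(f"/sys/module/{module}/version").read_text().strip()


def upstream_part(version: str) -> str:
    """Keep only the leading 'X.Y.Z-N' part, dropping any distro suffix."""
    match = re.match(r"\d+(?:\.\d+)*(?:-\d+)?", version.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return match.group(0)


def versions_match(kmod_version: str, userland_version: str) -> bool:
    """True when both strings share the same upstream version and release."""
    return upstream_part(kmod_version) == upstream_part(userland_version)
```

On a live system, `versions_match(read_kmod_version(), userland)` would be called with `userland` set to whatever the packaging tool reports for the utilities.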

tdb · 2013-11-22T00:04:33Z

@behlendorf Ah, ok, sorry for the noise. I actually just installed the binary packages from launchpad, which built the kmods with dkms. So I would have expected that to stay in sync. Anyway - as you say, not related to this issue.

rbraddy · 2013-12-18T03:50:57Z

Is this still an issue or has a resolution been found?

tdb · 2013-12-18T11:58:14Z

@rbraddy I'm not aware of a resolution yet, but @behlendorf can confirm. Are you seeing the same problem? It'd be good to know it's not just me!

rbraddy · 2013-12-18T15:00:47Z

Yes, we are seeing the same issue when creating a ZFS storage pool atop a Ceph RBD block device: zpool create fails and the kernel panics.

Creating an ext4 filesystem on RBD works perfectly. RBD is used extensively today by various cloud stacks (e.g., OpenStack, CloudStack and others), so there seems to be no issue with how it presents itself as a block device for those file systems.

Having RBD work well with ZFS is very important, as it addresses one of the major drawbacks of ZFS, a single point of failure on direct-attached storage, and adds the ability to scale out. RADOS is very impressive technology, and combined with ZFS promises to be the most powerful filesystem around. Ceph's filesystem is not ready for prime time, so it just makes sense for these two technologies to work well together and be supported (the way ZFS is supported underneath Ceph OSDs today).

I agree that it's odd that: a) ZFS is the only major file system that is not working atop RBD today, and b) ZFS panics instead of failing gracefully in the face of whatever incompatibility exists.

Having said that, from what I have seen, ZFS does work a bit differently than many other file systems. In our testing, we also encountered strange behavior from parted when trying to delete an existing ext4 partition that we had initially configured atop RBD, in an attempt to create an empty GPT partition in preparation for use with ZFS. ZFS creates its own partitioning scheme from what I have seen, so this may be a clue. We are still investigating, but at this point lack the deep kernel expertise required to reconcile the issue between these two complex systems.

In reading through this thread, I see that about six months ago Brian proposed a next step to gather more information, which does not appear to have occurred yet. I'm wondering if it makes sense to pursue that line of analysis next:

From @behlendorf : It seems likely that we're somehow reading bogus data from the ceph rbd. It would be useful to see what those values are. If you're still interested in chasing this can you try the following patch. It will log the offending value to the console before the crash. It would be useful to run it several times to see if the values remain constant or change.

diff --git a/module/zfs/spa_history.c b/module/zfs/spa_history.c
index 9fb75f3..700f364 100644
--- a/module/zfs/spa_history.c
+++ b/module/zfs/spa_history.c
@@ -223,6 +223,13 @@ spa_history_log_sync(void *arg1, void *arg2, dmu_tx_t *tx)
         */
        VERIFY(0 == dmu_bonus_hold(mos, spa->spa_history, FTAG, &dbp));
        shpp = dbp->db_data;
+#ifdef _KERNEL
+       printk("sh_pool_create_len = %llu\n", shpp->sh_pool_create_len);
+       printk("sh_phys_max_off = %llu\n", shpp->sh_phys_max_off);
+       printk("sh_bof = %llu\n", shpp->sh_bof);
+       printk("sh_eof = %llu\n", shpp->sh_eof);
+       printk("sh_records_lost = %llu\n", shpp->sh_records_lost);
+#endif

        dmu_buf_will_dirty(dbp, tx);
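
To compare the logged values across several runs, the console output could be post-processed along the following lines. This is only a sketch: it assumes the patch's `name = value` lines appear unmodified in the saved kernel log, that each dump starts with `sh_pool_create_len` (the first `printk` in the patch), and the function names are made up for illustration.

```python
import re

# Matches lines such as "sh_eof = 1234"; the VERIFY message uses "==" and is skipped.
FIELD_RE = re.compile(r"\b(sh_\w+) = (\d+)")


def parse_history_dumps(log_text: str) -> list:
    """Group the patch's printk lines into one dict per dump."""
    dumps = []
    for name, value in FIELD_RE.findall(log_text):
        # The patch prints sh_pool_create_len first, so it starts a new dump.
        if name == "sh_pool_create_len" or not dumps:
            dumps.append({})
        dumps[-1][name] = int(value)
    return dumps


def varying_fields(dumps: list) -> set:
    """Return the names of fields whose value differs between dumps."""
    names = {name for dump in dumps for name in dump}
    return {n for n in names if len({d.get(n) for d in dumps}) > 1}
```

Feeding the saved log from several `zpool create` attempts into `parse_history_dumps` and then `varying_fields` would show at a glance whether the offending values stay constant or change from run to run.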

dweeezil · 2013-12-30T03:54:16Z

I just wanted to post a note here to say that I've started actively looking into this problem. I'm occasionally able to reproduce problems similar to the original report, but my general observation is that many other forms of chaos seem to result from running ZFS atop RBD. Unfortunately, I got sidetracked while looking into this and burned a ton of time tracking down the problem described in openzfs/zfs#2010. With that out of the way, hopefully I'm back on track now.

Also, I should mention that this should likely be a ZFS issue rather than an SPL issue.

hvenzke · 2013-12-30T20:55:34Z

@dweeezil Tim, some of the logs I have read about ZFS + Ceph RBD say that the partition table ZFS uses is not supported by parted?

1. Exactly which partition type was set before you tried to create a zpool on the Ceph RBD?
2. Did you try a sliced (diskP2) setup instead of whole disk (disk)?
3. Did you try fdisk on the Ceph RBD disk, using type "bf"?

As far as my ZFS knowledge goes, bf is the default; someone may correct me. :-)

dweeezil · 2013-12-30T22:13:45Z

I'm still trying to get a grip on the actual problem. So far, I'm fairly certain the problem is not simply that the rbd block device behaves differently than other block devices do.

@remsnet For my current testbed, I'm generally creating my ZFS pool on a single pre-created partition on the rbd device (actually, my preferred testbed is to dd a known good pool on to my rbd and test from there). I'm hoping to narrow down the problem a bit more within the next day or so once I get more time to look at it.

The failures I'm seeing when performing normal filesystem operations are many and varied. I'm concerned that zfs+rbd is exceeding Linux's kernel stack limit but I've not been able to prove it. I do plan on building a 16K stack kernel as part of my further testing to try to rule it out. Using debugfs' stack_trace feature has been very iffy with wild pointer (NULL or close-to-null) dereferences typically occurring in the ftrace_call() function, itself. I also plan on doing some instrumenting of rbd by itself to get a handle on its "base" stack utilization. The failures I'm seeing are typical of those you'd see when memory (the stack in particular) is overwritten.

I'll post more information as I get it.
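
For reference, the arithmetic behind the "16K stack kernel" test can be sketched as follows. It assumes x86_64 with 4 KiB pages, where the kernel stack size is `PAGE_SIZE << THREAD_SIZE_ORDER`, and it assumes the ftrace stack tracer is built in (`CONFIG_STACK_TRACER`) and exposes its high-water mark in `stack_max_size`. The function names are hypothetical.

```python
from pathlib import Path

PAGE_SIZE = 4096  # assumption: x86_64 with 4 KiB pages


def thread_stack_bytes(thread_size_order: int, page_size: int = PAGE_SIZE) -> int:
    """Kernel stack size for a given THREAD_SIZE_ORDER (page_size << order)."""
    return page_size << thread_size_order


def stack_headroom(max_used: int, thread_size_order: int) -> int:
    """Bytes left between the deepest observed stack use and the limit."""
    return thread_stack_bytes(thread_size_order) - max_used


def read_stack_max_size(tracing_dir: str = "/sys/kernel/debug/tracing") -> int:
    """Read the stack tracer's high-water mark in bytes."""
    return int(Path(tracing_dir, "stack_max_size").read_text())
```

With an order of 1 the limit is 8 KiB and with an order of 2 it is 16 KiB, so a small or negative `stack_headroom(read_stack_max_size(), 1)` under a zfs+rbd workload would support the stack overflow theory.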

behlendorf · 2014-01-07T00:48:47Z

@dweeezil It's great to see you looking into this. Stack overrun is certainly one possible explanation for this; I could easily believe that the Ceph rbd is more stack heavy than other block devices in the kernel. As you said, rebuilding your kernel with 16k stacks would be the easiest way to check for this.

commit 178eda29ca721842f2146378e73d43e0044c4166 upstream.

It has been reported that using ZFSonLinux on rbd will result in memory corruption. The bug report can be found here:

openzfs/spl#241
http://tracker.ceph.com/issues/7790

The reason is that ZFS will send pages with page_count 0 into rbd, which in turns send them to tcp_sendpage. However, tcp_sendpage cannot deal with page_count 0, as it will do get_page and put_page, and erroneously free the page.

This type of issue has been noted before, and handled in iscsi, drbd, etc. So, rbd should also handle this. This fix address this issue by fall back to slower sendmsg when page_count 0 detected.

Cc: Sage Weil
Cc: Yehuda Sadeh
Signed-off-by: Chunwei Chen
Reviewed-by: Ilya Dryomov
Signed-off-by: Greg Kroah-Hartman
Signed-off-by: Sasha Levin
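
The failure mode described in the commit message can be illustrated with a toy user-space model. This is not the kernel code: `Page`, `sendpage_like`, `sendmsg_like` and `send` are hypothetical stand-ins that only mimic the reference counting. A zero-copy path that takes and drops a reference frees a page whose count started at 0, while a copying path leaves the count alone.

```python
class Page:
    """Toy stand-in for a kernel page: a payload plus a reference count."""

    def __init__(self, data: bytes, count: int) -> None:
        self.data = data
        self.count = count
        self.freed = False

    def get(self) -> None:
        self.count += 1

    def put(self) -> None:
        self.count -= 1
        if self.count == 0:
            self.freed = True  # last reference dropped: the page is released


def sendpage_like(page: Page, wire: bytearray) -> None:
    """Zero-copy style path: pins the page while it is in flight."""
    page.get()
    wire.extend(page.data)
    page.put()


def sendmsg_like(page: Page, wire: bytearray) -> None:
    """Copying path: reads the bytes and never touches the refcount."""
    wire.extend(bytes(page.data))


def send(page: Page, wire: bytearray) -> None:
    """Use the zero-copy path only when the page is reference counted."""
    if page.count >= 1:
        sendpage_like(page, wire)
    else:
        sendmsg_like(page, wire)
```

In this model, calling `sendpage_like` on `Page(b"abc", 0)` marks the page as freed even though its owner still uses it, whereas `send` routes the same page through the copying path and leaves it intact. That mirrors the fallback the commit message describes.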

commit 7acab58bc1ac49ba542d92bfdfd5c3047f1d2a59 Merge: 4340f8095da0 c2f7eb8029e2 Author: Shilin Victor <[email protected]> Date: Sat Feb 10 21:44:03 2018 +0300 Merge tag 'v3.10.42' into linux-3.10.y This is the 3.10.42 stable release commit c2f7eb8029e23c4f5445340d8fc0d05367538e6d Author: Greg Kroah-Hartman <[email protected]> Date: Sat Jun 7 13:48:31 2014 -0700 Linux 3.10.42 commit efccdcdb63a7f7cc7cc1816f0d5e2524eb084c72 Author: Thomas Gleixner <[email protected]> Date: Tue Jun 3 12:27:08 2014 +0000 futex: Make lookup_pi_state more robust commit 54a217887a7b658e2650c3feff22756ab80c7339 upstream. The current implementation of lookup_pi_state has ambigous handling of the TID value 0 in the user space futex. We can get into the kernel even if the TID value is 0, because either there is a stale waiters bit or the owner died bit is set or we are called from the requeue_pi path or from user space just for fun. The current code avoids an explicit sanity check for pid = 0 in case that kernel internal state (waiters) are found for the user space address. This can lead to state leakage and worse under some circumstances. Handle the cases explicit: Waiter | pi_state | pi->owner | uTID | uODIED | ? [1] NULL | --- | --- | 0 | 0/1 | Valid [2] NULL | --- | --- | >0 | 0/1 | Valid [3] Found | NULL | -- | Any | 0/1 | Invalid [4] Found | Found | NULL | 0 | 1 | Valid [5] Found | Found | NULL | >0 | 1 | Invalid [6] Found | Found | task | 0 | 1 | Valid [7] Found | Found | NULL | Any | 0 | Invalid [8] Found | Found | task | ==taskTID | 0/1 | Valid [9] Found | Found | task | 0 | 0 | Invalid [10] Found | Found | task | !=taskTID | 0/1 | Invalid [1] Indicates that the kernel can acquire the futex atomically. We came came here due to a stale FUTEX_WAITERS/FUTEX_OWNER_DIED bit. [2] Valid, if TID does not belong to a kernel thread. If no matching thread is found then it indicates that the owner TID has died. [3] Invalid. 
The waiter is queued on a non PI futex [4] Valid state after exit_robust_list(), which sets the user space value to FUTEX_WAITERS | FUTEX_OWNER_DIED. [5] The user space value got manipulated between exit_robust_list() and exit_pi_state_list() [6] Valid state after exit_pi_state_list() which sets the new owner in the pi_state but cannot access the user space value. [7] pi_state->owner can only be NULL when the OWNER_DIED bit is set. [8] Owner and user space value match [9] There is no transient state which sets the user space TID to 0 except exit_robust_list(), but this is indicated by the FUTEX_OWNER_DIED bit. See [4] [10] There is no transient state which leaves owner and user space TID out of sync. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Kees Cook <[email protected]> Cc: Will Drewry <[email protected]> Cc: Darren Hart <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> commit 9ad5dabd87e8dd5506529e12e4e8c7b25fb88d7a Author: Thomas Gleixner <[email protected]> Date: Tue Jun 3 12:27:07 2014 +0000 futex: Always cleanup owner tid in unlock_pi commit 13fbca4c6ecd96ec1a1cfa2e4f2ce191fe928a5e upstream. If the owner died bit is set at futex_unlock_pi, we currently do not cleanup the user space futex. So the owner TID of the current owner (the unlocker) persists. That's observable inconsistant state, especially when the ownership of the pi state got transferred. Clean it up unconditionally. 
Signed-off-by: Thomas Gleixner <[email protected]> Cc: Kees Cook <[email protected]> Cc: Will Drewry <[email protected]> Cc: Darren Hart <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> commit 63d6ad59dd43f44249150aa8c72eeb01bbe0a599 Author: Thomas Gleixner <[email protected]> Date: Tue Jun 3 12:27:06 2014 +0000 futex: Validate atomic acquisition in futex_lock_pi_atomic() commit b3eaa9fc5cd0a4d74b18f6b8dc617aeaf1873270 upstream. We need to protect the atomic acquisition in the kernel against rogue user space which sets the user space futex to 0, so the kernel side acquisition succeeds while there is existing state in the kernel associated to the real owner. Verify whether the futex has waiters associated with kernel state. If it has, return -EINVAL. The state is corrupted already, so no point in cleaning it up. Subsequent calls will fail as well. Not our problem. [ tglx: Use futex_top_waiter() and explain why we do not need to try restoring the already corrupted user space state. ] Signed-off-by: Darren Hart <[email protected]> Cc: Kees Cook <[email protected]> Cc: Will Drewry <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> commit b58623fb64ff0454ec20bce7a02275a20c23086d Author: Thomas Gleixner <[email protected]> Date: Tue Jun 3 12:27:06 2014 +0000 futex-prevent-requeue-pi-on-same-futex.patch futex: Forbid uaddr == uaddr2 in futex_requeue(..., requeue_pi=1) commit e9c243a5a6de0be8e584c604d353412584b592f8 upstream. If uaddr == uaddr2, then we have broken the rule of only requeueing from a non-pi futex to a pi futex with this call. If we attempt this, then dangling pointers may be left for rt_waiter resulting in an exploitable condition. 
This change brings futex_requeue() in line with futex_wait_requeue_pi() which performs the same check as per commit 6f7b0a2a5c0f ("futex: Forbid uaddr == uaddr2 in futex_wait_requeue_pi()") [ tglx: Compare the resulting keys as well, as uaddrs might be different depending on the mapping ] Fixes CVE-2014-3153. Reported-by: Pinkie Pie Signed-off-by: Will Drewry <[email protected]> Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Darren Hart <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> commit 4237cc8ef3fc3916c337423cbaab818890e628c8 Author: Stanislaw Gruszka <[email protected]> Date: Wed Feb 19 13:15:17 2014 +0100 ath9k: protect tid->sched check [ Upstream commit 21f8aaee0c62708654988ce092838aa7df4d25d8 ] We check tid->sched without a lock taken on ath_tx_aggr_sleep(). That is race condition which can result of doing list_del(&tid->list) twice (second time with poisoned list node) and cause crash like shown below: [424271.637220] BUG: unable to handle kernel paging request at 00100104 [424271.637328] IP: [<f90fc072>] ath_tx_aggr_sleep+0x62/0xe0 [ath9k] ... [424271.639953] Call Trace: [424271.639998] [<f90f6900>] ? ath9k_get_survey+0x110/0x110 [ath9k] [424271.640083] [<f90f6942>] ath9k_sta_notify+0x42/0x50 [ath9k] [424271.640177] [<f809cfef>] sta_ps_start+0x8f/0x1c0 [mac80211] [424271.640258] [<c10f730e>] ? free_compound_page+0x2e/0x40 [424271.640346] [<f809e915>] ieee80211_rx_handlers+0x9d5/0x2340 [mac80211] [424271.640437] [<c112f048>] ? kmem_cache_free+0x1d8/0x1f0 [424271.640510] [<c1345a84>] ? kfree_skbmem+0x34/0x90 [424271.640578] [<c10fc23c>] ? put_page+0x2c/0x40 [424271.640640] [<c1345a84>] ? kfree_skbmem+0x34/0x90 [424271.640706] [<c1345a84>] ? kfree_skbmem+0x34/0x90 [424271.640787] [<f809dde3>] ? 
ieee80211_rx_handlers_result+0x73/0x1d0 [mac80211] [424271.640897] [<f80a07a0>] ieee80211_prepare_and_rx_handle+0x520/0xad0 [mac80211] [424271.641009] [<f809e22d>] ? ieee80211_rx_handlers+0x2ed/0x2340 [mac80211] [424271.641104] [<c13846ce>] ? ip_output+0x7e/0xd0 [424271.641182] [<f80a1057>] ieee80211_rx+0x307/0x7c0 [mac80211] [424271.641266] [<f90fa6ee>] ath_rx_tasklet+0x88e/0xf70 [ath9k] [424271.641358] [<f80a0f2c>] ? ieee80211_rx+0x1dc/0x7c0 [mac80211] [424271.641445] [<f90f82db>] ath9k_tasklet+0xcb/0x130 [ath9k] Bug report: https://bugzilla.kernel.org/show_bug.cgi?id=70551 Reported-and-tested-by: Max Sydorenko <[email protected]> Signed-off-by: Stanislaw Gruszka <[email protected]> Signed-off-by: John W. Linville <[email protected]> [ xl: backported to 3.10: adjusted context ] Signed-off-by: Xiangyu Lu <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> commit 3c3fa08f4c7770ad35bb10755fb9b1c80e34dee4 Author: Guennadi Liakhovetski <[email protected]> Date: Sat Apr 26 12:51:31 2014 -0300 media: V4L2: fix VIDIOC_CREATE_BUFS in 64- / 32-bit compatibility mode commit 97d9d23dda6f37d90aefeec4ed619d52df525382 upstream. If a struct contains 64-bit fields, it is aligned on 64-bit boundaries within containing structs in 64-bit compilations. This is the case with struct v4l2_window, which contains pointers and is embedded into struct v4l2_format, and that one is embedded into struct v4l2_create_buffers. Unlike some other structs, used as a part of the kernel ABI as ioctl() arguments, that are packed, these structs aren't packed. This isn't a problem per se, but the ioctl-compat code for VIDIOC_CREATE_BUFS contains a bug, that triggers in such 64-bit builds. That code wrongly assumes, that in struct v4l2_create_buffers, struct v4l2_format immediately follows the __u32 memory field, which in fact isn't the case. 
This bug wasn't visible until now, because until recently hardly any applications used this ioctl() and mostly embedded 32-bit only drivers implemented it. This is changing now with addition of this ioctl() to some USB drivers, e.g. UVC. This patch fixes the bug by copying parts of struct v4l2_create_buffers separately. Signed-off-by: Guennadi Liakhovetski <[email protected]> Acked-by: Laurent Pinchart <[email protected]> Signed-off-by: Mauro Carvalho Chehab <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> commit 2e008074b2f19ba550393e3a33334fd1dd5da082 Author: Guennadi Liakhovetski <[email protected]> Date: Mon Apr 14 10:49:34 2014 -0300 media: V4L2: ov7670: fix a wrong index, potentially Oopsing the kernel from user-space commit cfece5857ca51d1dcdb157017aba226f594e9dcf upstream. Commit 75e2bdad8901a0b599e01a96229be922eef1e488 "ov7670: allow configuration of image size, clock speed, and I/O method" uses a wrong index to iterate an array. Apart from being wrong, it also uses an unchecked value from user-space, which can cause access to unmapped memory in the kernel, triggered by a normal desktop user with rights to use V4L2 devices. Signed-off-by: Guennadi Liakhovetski <[email protected]> Acked-by: Jonathan Corbet <[email protected]> Signed-off-by: Mauro Carvalho Chehab <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> commit 4f792a2972e6f320484abfc940f978177131facc Author: Antti Palosaari <[email protected]> Date: Thu Apr 10 21:18:16 2014 -0300 media: fc2580: fix tuning failure on 32-bit arch commit 8845cc6415ec28ef8d57b3fb81c75ef9bce69c5f upstream. There was some frequency calculation overflows which caused tuning failure on 32-bit architecture. Use 64-bit numbers where needed in order to avoid calculation overflows. Thanks for the Finnish person, who asked remain anonymous, reporting, testing and suggesting the fix. 
Signed-off-by: Antti Palosaari <[email protected]> Signed-off-by: Mauro Carvalho Chehab <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> commit 0a4e3565df0c91bf0f7a68dee09e45c9d9b2d360 Author: Alex Williamson <[email protected]> Date: Tue Apr 22 10:08:40 2014 -0600 iommu/amd: Fix interrupt remapping for aliased devices commit e028a9e6b8a637af09ac4114083280df4a7045f1 upstream. An apparent cut and paste error prevents the correct flags from being set on the alias device resulting in MSI on conventional PCI devices failing to work. This also produces error events from the IOMMU like: AMD-Vi: Event logged [INVALID_DEVICE_REQUEST device=00:14.4 address=0x000000fdf8000000 flags=0x0a00] Where 14.4 is a PCIe-to-PCI bridge with a device behind it trying to use MSI interrupts. Signed-off-by: Alex Williamson <[email protected]> Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> commit a757a4e215574f2c92fc990275fa5e02159771e1 Author: Chunwei Chen <[email protected]> Date: Wed Apr 23 12:35:09 2014 +0800 libceph: fix corruption when using page_count 0 page in rbd commit 178eda29ca721842f2146378e73d43e0044c4166 upstream. It has been reported that using ZFSonLinux on rbd will result in memory corruption. The bug report can be found here: https://github.com/zfsonlinux/spl/issues/241 http://tracker.ceph.com/issues/7790 The reason is that ZFS will send pages with page_count 0 into rbd, which in turns send them to tcp_sendpage. However, tcp_sendpage cannot deal with page_count 0, as it will do get_page and put_page, and erroneously free the page. This type of issue has been noted before, and handled in iscsi, drbd, etc. So, rbd should also handle this. This fix address this issue by fall back to slower sendmsg when page_count 0 detected. 
Cc: Sage Weil <[email protected]> Cc: Yehuda Sadeh <[email protected]> Signed-off-by: Chunwei Chen <[email protected]> Reviewed-by: Ilya Dryomov <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> commit 534cc5572c710370d2bfe4e6b382950fd52c2c00 Author: Guenter Roeck <[email protected]> Date: Thu May 15 09:33:42 2014 -0700 powerpc: Fix 64 bit builds with binutils 2.24 commit 7998eb3dc700aaf499f93f50b3d77da834ef9e1d upstream. With binutils 2.24, various 64 bit builds fail with relocation errors such as arch/powerpc/kernel/built-in.o: In function `exc_debug_crit_book3e': (.text+0x165ee): relocation truncated to fit: R_PPC64_ADDR16_HI against symbol `interrupt_base_book3e' defined in .text section in arch/powerpc/kernel/built-in.o arch/powerpc/kernel/built-in.o: In function `exc_debug_crit_book3e': (.text+0x16602): relocation truncated to fit: R_PPC64_ADDR16_HI against symbol `interrupt_end_book3e' defined in .text section in arch/powerpc/kernel/built-in.o The assembler maintainer says: I changed the ABI, something that had to be done but unfortunately happens to break the booke kernel code. When building up a 64-bit value with lis, ori, shl, oris, ori or similar sequences, you now should use @high and @higha in place of @h and @ha. @h and @ha (and their associated relocs R_PPC64_ADDR16_HI and R_PPC64_ADDR16_HA) now report overflow if the value is out of 32-bit signed range. ie. @h and @ha assume you're building a 32-bit value. This is needed to report out-of-range -mcmodel=medium toc pointer offsets in @toc@h and @toc@ha expressions, and for consistency I did the same for all other @h and @ha relocs. Replacing @h with @high in one strategic location fixes the relocation errors. This has to be done conditionally since the assembler either supports @h or @high but not both. 
Signed-off-by: Guenter Roeck <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> commit c99612d30ffcdf6ff41281a84e8df2a56c8b7d20 Author: Harald Freudenberger <[email protected]> Date: Wed May 7 16:51:29 2014 +0200 crypto: s390 - fix aes,des ctr mode concurrency finding. commit 3901c1124ec5099254a9396085f7798153a7293f upstream. An additional testcase found an issue with the last series of patches applied: the fallback solution may not save the iv value after operation. This very small fix just makes sure the iv is copied back to the walk/desc struct. Signed-off-by: Harald Freudenberger <[email protected]> Signed-off-by: Herbert Xu <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> commit d1ae1920b53e00849397c3f6f63dace219de46a0 Author: Horia Geanta <[email protected]> Date: Fri Apr 18 13:01:42 2014 +0300 crypto: caam - add allocation failure handling in SPRINTFCAT macro commit 27c5fb7a84242b66bf1e0b2fe6bf40d19bcc5c04 upstream. GFP_ATOMIC memory allocation could fail. In this case, avoid NULL pointer dereference and notify user. Cc: Kim Phillips <[email protected]> Signed-off-by: Horia Geanta <[email protected]> Signed-off-by: Herbert Xu <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> commit a0d3102153fc5d9cf8bb49b62ac9655f9f63b493 Author: Olof Johansson <[email protected]> Date: Fri Apr 11 15:19:41 2014 -0700 i2c: s3c2410: resume race fix commit ce78cc071f5f541480e381cc0241d37590041a9d upstream. Don't unmark the device as suspended until after it's been re-setup. The main race would be w.r.t. an i2c driver that gets resumed at the same time (asyncronously), that is allowed to do a transfer since suspended is set to 0 before reinit, but really should have seen the -EIO return instead. 
Signed-off-by: Olof Johansson <[email protected]>
Signed-off-by: Doug Anderson <[email protected]>
Acked-by: Kukjin Kim <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit a6b6cde1481125b886f726756a3364be0fb9f93e
Author: Du, Wenkai <[email protected]>
Date: Thu Apr 10 23:03:19 2014 +0000

i2c: designware: Mask all interrupts during i2c controller enable

commit 47bb27e78867997040a228328f2a631c3c7f2c82 upstream.

There have been "i2c_designware 80860F41:00: controller timed out" errors on a number of Baytrail platforms. The issue is caused by incorrect value in Interrupt Mask Register (DW_IC_INTR_MASK) when i2c core is being enabled. This causes call to __i2c_dw_enable() to immediately start the transfer which leads to timeout. There are 3 failure modes observed:

1. Failure in S0 to S3 resume path

The default value after reset for DW_IC_INTR_MASK is 0x8ff. When we start the first transaction after resuming from system sleep, TX_EMPTY interrupt is already unmasked because of the hardware default.

2. Failure in normal operational path

This failure happens rarely and is hard to reproduce. Debug trace showed that DW_IC_INTR_MASK had value of 0x254 when failure occurred, which meant TX_EMPTY was unmasked.

3. Failure in S3 to S0 suspend path

This failure also happens rarely and is hard to reproduce. Adding debug trace that read DW_IC_INTR_MASK made this failure not reproducible. But from ISR call trace we could conclude TX_EMPTY was unmasked when problem occurred.

The patch masks all interrupts before the controller is enabled to resolve the faulty DW_IC_INTR_MASK conditions.
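The effect of masking before enabling can be shown with a toy register model. This is a sketch only, not the i2c-designware driver: `struct fake_dw_i2c` and the `fake_*` functions are hypothetical, and the TX_EMPTY bit position (0x010) is an assumption that happens to be consistent with the 0x8ff and 0x254 values quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

#define FAKE_INTR_TX_EMPTY  0x010u  /* assumed TX_EMPTY bit position */
#define FAKE_INTR_RESET     0x8ffu  /* reset default quoted in the commit */

/* Hypothetical register file for the sketch. */
struct fake_dw_i2c {
	uint32_t intr_mask;  /* models DW_IC_INTR_MASK; set bit = unmasked */
	bool enabled;
	bool spurious_xfer;  /* set if a transfer starts as soon as we enable */
};

/* Enabling with TX_EMPTY unmasked starts a transfer immediately. */
static void fake_enable(struct fake_dw_i2c *dev)
{
	dev->enabled = true;
	if (dev->intr_mask & FAKE_INTR_TX_EMPTY)
		dev->spurious_xfer = true;
}

/* The approach of the patch: mask every interrupt, then enable. */
static void fake_enable_masked(struct fake_dw_i2c *dev)
{
	dev->intr_mask = 0;
	fake_enable(dev);
}
```

In this model, `fake_enable()` on a device left at the reset default triggers the spurious transfer, while `fake_enable_masked()` does not, whatever value the mask register held beforehand.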
Signed-off-by: Wenkai Du <[email protected]> Acked-by: Mika Westerberg <[email protected]> [wsa: improved the comment and removed typo in commit msg] Signed-off-by: Wolfram Sang <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> commit 670a6ed522e0fa814d74547cb95fbc4854660474 Author: Wolfram Sang <[email protected]> Date: Mon May 5 18:36:21 2014 +0200 i2c: rcar: bail out on zero length transfers commit d7653964c590ba846aa11a8f6edf409773cbc492 upstream. This hardware does not support zero length transfers. Instead, the driver does one (random) byte transfers currently with undefined results for the slaves. We now bail out. Signed-off-by: Wolfram Sang <[email protected]> Signed-off-by: Wolfram Sang <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> commit feed5a88f45f26e4fda34838b4929536d4a8e775 Author: Hans de Goede <[email protected]> Date: Mon May 5 11:38:09 2014 +0200 ACPI / blacklist: Add dmi_enable_osi_linux quirk for Asus EEE PC 1015PX commit f6e6e1b9fee88c90586787b71dc49bb3ce62bb89 upstream. Without this this EEE PC exports a non working WMI interface, with this it exports a working "good old" eeepc_laptop interface, fixing brightness control not working as well as rfkill being stuck in a permanent wireless blocked state. This is not an ideal way to fix this, but various attempts to fix this otherwise have failed, see: References: https://bugzilla.redhat.com/show_bug.cgi?id=1067181 Reported-and-tested-by: [email protected] Signed-off-by: Hans de Goede <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> commit a40aac07285bdf77ed67af1423676ff5548ef51b Author: Levente Kurusa <[email protected]> Date: Tue May 6 15:57:48 2014 +0200 libata: clean up ZPODD when a port is detached commit a6f9bf4d2f965b862b95213303d154e02957eed8 upstream. When a ZPODD device is unbound via sysfs, the ACPI notify handler is not removed. 
This causes panics as observed in Bug #74601. The panic only happens when the wake happens from outside the kernel (i.e. inserting a media or pressing a button). Add a loop to ata_port_detach which loops through the port's devices and checks if zpodd is enabled, if so call zpodd_exit. Reviewed-by: Aaron Lu <[email protected]> Signed-off-by: Levente Kurusa <[email protected]> Signed-off-by: Tejun Heo <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> commit 019c8ec9e3c6a3616f10be9eda359f1927d1f8b1 Author: Mikulas Patocka <[email protected]> Date: Thu Feb 20 18:01:01 2014 -0500 dm crypt: fix cpu hotplug crash by removing per-cpu structure commit 610f2de3559c383caf8fbbf91e9968102dff7ca0 upstream. The DM crypt target used per-cpu structures to hold pointers to a ablkcipher_request structure. The code assumed that the work item keeps executing on a single CPU, so it didn't use synchronization when accessing this structure. If a CPU is disabled by writing 0 to /sys/devices/system/cpu/cpu*/online, the work item could be moved to another CPU. This causes dm-crypt crashes, like the following, because the code starts using an incorrect ablkcipher_request: smpboot: CPU 7 is now offline BUG: unable to handle kernel NULL pointer dereference at 0000000000000130 IP: [<ffffffffa1862b3d>] crypt_convert+0x12d/0x3c0 [dm_crypt] ... Call Trace: [<ffffffffa1864415>] ? kcryptd_crypt+0x305/0x470 [dm_crypt] [<ffffffff81062060>] ? finish_task_switch+0x40/0xc0 [<ffffffff81052a28>] ? process_one_work+0x168/0x470 [<ffffffff8105366b>] ? worker_thread+0x10b/0x390 [<ffffffff81053560>] ? manage_workers.isra.26+0x290/0x290 [<ffffffff81058d9f>] ? kthread+0xaf/0xc0 [<ffffffff81058cf0>] ? kthread_create_on_node+0x120/0x120 [<ffffffff813464ac>] ? ret_from_fork+0x7c/0xb0 [<ffffffff81058cf0>] ? kthread_create_on_node+0x120/0x120 Fix this bug by removing the per-cpu definition. The structure ablkcipher_request is accessed via a pointer from convert_context. 
Consequently, if the work item is rescheduled to a different CPU, the thread still uses the same ablkcipher_request. This change may undermine performance improvements intended by commit c0297721 ("dm crypt: scale to multiple cpus") on select hardware. In practice no performance difference was observed on recent hardware. But regardless, correctness is more important than performance. Signed-off-by: Mikulas Patocka <[email protected]> Signed-off-by: Mike Snitzer <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> commit aece4fa7368debd14ac07ebaf569587ff02cc596 Author: Michael Neuling <[email protected]> Date: Mon Mar 3 14:21:40 2014 +1100 powerpc/tm: Fix crash when forking inside a transaction commit 621b5060e823301d0cba4cb52a7ee3491922d291 upstream. When we fork/clone we currently don't copy any of the TM state to the new thread. This results in a TM bad thing (program check) when the new process is switched in as the kernel does a tmrechkpt with TEXASR FS not set. Also, since R1 is from userspace, we trigger the bad kernel stack pointer detection. So we end up with something like this: Bad kernel stack pointer 0 at c0000000000404fc cpu 0x2: Vector: 700 (Program Check) at [c00000003ffefd40] pc: c0000000000404fc: restore_gprs+0xc0/0x148 lr: 0000000000000000 sp: 0 msr: 9000000100201030 current = 0xc000001dd1417c30 paca = 0xc00000000fe00800 softe: 0 irq_happened: 0x01 pid = 0, comm = swapper/2 WARNING: exception is not recoverable, can't continue The below fixes this by flushing the TM state before we copy the task_struct to the clone. To do this we go through the tmreclaim patch, which removes the checkpointed registers from the CPU and transitions the CPU out of TM suspend mode. Hence we need to call tmrechkpt after to restore the checkpointed state and the TM mode for the current task. To make this fail from userspace is simply: tbegin li r0, 2 sc <boom> Kudos to Adhemerval Zanella Neto for finding this. 
Signed-off-by: Michael Neuling <[email protected]> cc: Adhemerval Zanella Neto <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]> [Backported to 3.10: context adjust] Signed-off-by: Xue Liu <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> commit c9d6d5c009e96eb257d36b5c43d9c2d94f02cbf8 Author: Andy Grover <[email protected]> Date: Wed May 14 15:48:06 2014 -0700 target: Don't allow setting WC emulation if device doesn't support commit 07b8dae38b09bcfede7e726f172e39b5ce8390d9 upstream. Just like for pSCSI, if the transport sets get_write_cache, then it is not valid to enable write cache emulation for it. Return an error. see https://bugzilla.redhat.com/show_bug.cgi?id=1082675 Reviewed-by: Chris Leech <[email protected]> Signed-off-by: Andy Grover <[email protected]> Signed-off-by: Nicholas Bellinger <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> commit 8a2629ad0ba7902262df7ae980265d8f93e2dfc2 Author: Sagi Grimberg <[email protected]> Date: Tue Apr 29 13:13:45 2014 +0300 Target/iser: Fix iscsit_accept_np and rdma_cm racy flow commit 531b7bf4bd795d9a09eac92504322a472c010bc8 upstream. RDMA CM and iSCSI target flows are asynchronous and completely uncorrelated. Relying on the fact that iscsi_accept_np will be called after CM connection request event and will wait for it is a mistake. When attempting to login to a few targets this flow is racy and unpredictable, but for parallel login to dozens of targets will race and hang every time. The correct synchronizing mechanism in this case is pending on a semaphore rather than a wait_for_event. We keep the pending interruptible for iscsi_np cleanup stage. 
(Squash patch to remove dead code into parent - nab) Reported-by: Slava Shwartsman <[email protected]> Signed-off-by: Sagi Grimberg <[email protected]> Signed-off-by: Nicholas Bellinger <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> commit 5de94f8f4acfce3ff79592cb1d6c58ae6c0b420e Author: Sagi Grimberg <[email protected]> Date: Tue Apr 29 13:13:44 2014 +0300 Target/iser: Fix wrong connection requests list addition commit 9fe63c88b1d59f1ce054d6948ccd3096496ecedb upstream. Should be adding list_add_tail($new, $head) and not the other way around. Signed-off-by: Sagi Grimberg <[email protected]> Signed-off-by: Nicholas Bellinger <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> commit d0e845f6565ceed9ff36b95ac68cf705b97a5844 Author: Marcel Apfelbaum <[email protected]> Date: Thu May 15 12:42:49 2014 -0600 PCI: shpchp: Check bridge's secondary (not primary) bus speed commit 93fa9d32670f5592c8e56abc9928fc194e1e72fc upstream. When a new device is added below a hotplug bridge, the bridge's secondary bus speed and the device's bus speed must match. The shpchp driver previously checked the bridge's *primary* bus speed, not the secondary bus speed. This caused hot-add errors like: shpchp 0000:00:03.0: Speed of bus ff and adapter 0 mismatch Check the secondary bus speed instead. [bhelgaas: changelog] Link: https://bugzilla.kernel.org/show_bug.cgi?id=75251 Fixes: 3749c51ac6c1 ("PCI: Make current and maximum bus speeds part of the PCI core") Signed-off-by: Marcel Apfelbaum <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Acked-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> commit 55d9b08514ede9334e443337e9bf6181a3f8b114 Author: Arnd Bergmann <[email protected]> Date: Wed Apr 23 14:49:17 2014 +0200 genirq: Provide irq_force_affinity fallback for non-SMP commit 4c88d7f9b0d5fb0588c3386be62115cc2eaa8f9f upstream. 
Patch 01f8fa4f01d "genirq: Allow forcing cpu affinity of interrupts" added an irq_force_affinity() function, and 30ccf03b4a6 "clocksource: Exynos_mct: Use irq_force_affinity() in cpu bringup" subsequently uses it. However, the driver can be used with CONFIG_SMP disabled, but the function declaration is only available for CONFIG_SMP, leading to this build error: drivers/clocksource/exynos_mct.c:431:3: error: implicit declaration of function 'irq_force_affinity' [-Werror=implicit-function-declaration] irq_force_affinity(mct_irqs[MCT_L0_IRQ + cpu], cpumask_of(cpu)); This patch introduces a dummy helper function for the non-SMP case that always returns success, to get rid of the build error. Since the patches causing the problem are marked for stable backports, this one should be as well. Signed-off-by: Arnd Bergmann <[email protected]> Cc: Krzysztof Kozlowski <[email protected]> Acked-by: Kukjin Kim <[email protected]> Link: http://lkml.kernel.org/r/5619084.0zmrrIUZLV@wuerfel Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> commit 62dcb5801ac032188a20dc45e6d7b682a028adcf Author: Linus Torvalds <[email protected]> Date: Wed May 14 16:33:54 2014 -0700 x86-64, modify_ldt: Make support for 16-bit segments a runtime option commit fa81511bb0bbb2b1aace3695ce869da9762624ff upstream. Checkin: b3b42ac2cbae x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels disabled 16-bit segments on 64-bit kernels due to an information leak. However, it does seem that people are genuinely using Wine to run old 16-bit Windows programs on Linux. A proper fix for this ("espfix64") is coming in the upcoming merge window, but as a temporary fix, create a sysctl to allow the administrator to re-enable support for 16-bit segments. It adds a "/proc/sys/abi/ldt16" sysctl that defaults to zero (off). 
If you hit this issue and care about your old Windows program more than you care about a kernel stack address information leak, you can do

   echo 1 > /proc/sys/abi/ldt16

as root (add it to your startup scripts), and you should be ok.

The sysctl table is only added if you have COMPAT support enabled on x86-64, but I assume anybody who runs old windows binaries very much does that ;)

Signed-off-by: H. Peter Anvin <[email protected]>
Link: http://lkml.kernel.org/r/CA%2B55aFw9BPoD10U1LfHbOMpHWZkvJTkMcfCs9s3urPr1YyWBxw@mail.gmail.com
Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit 56ecdc3d9e5b91f411e6f3ba63229d332b54af8e
Author: James Hogan <[email protected]>
Date: Tue May 13 23:58:24 2014 +0100

metag: Reduce maximum stack size to 256MB

commit d71f290b4e98a39f49f2595a13be3b4d5ce8e1f1 upstream.

Specify the maximum stack size for arches where the stack grows upward (parisc and metag) in asm/processor.h rather than hard coding in fs/exec.c so that metag can specify a smaller value of 256MB rather than 1GB.

This fixes a BUG on metag if the RLIMIT_STACK hard limit is increased beyond a safe value by root. E.g. when starting a process after running "ulimit -H -s unlimited" it will then attempt to use a stack size of the maximum 1GB which is far too big for metag's limited user virtual address space (stack_top is usually 0x3ffff000):

BUG: failure at fs/exec.c:589/shift_arg_pages()!

Signed-off-by: James Hogan <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: John David Anglin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit 44563045712ee4f385b5f3814d69fc73b5f22288
Author: Mikulas Patocka <[email protected]>
Date: Thu May 8 15:51:37 2014 -0400

metag: fix memory barriers

commit 2425ce84026c385b73ae72039f90d042d49e0394 upstream.

Volatile access doesn't really imply the compiler barrier.
Volatile access is only ordered with respect to other volatile accesses, it isn't ordered with respect to general memory accesses. Gcc may reorder memory accesses around volatile access, as we can see in this simple example (if we compile it with optimization, both increments of *b will be collapsed to just one):

void fn(volatile int *a, long *b)
{
	(*b)++;
	*a = 10;
	(*b)++;
}

Consequently, we need the compiler barrier after a write to the volatile variable, to make sure that the compiler doesn't reorder the volatile write with something else.

Signed-off-by: Mikulas Patocka <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Signed-off-by: James Hogan <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit aece7dc95409f8934281954a7e82ddf55b765913
Author: Charles Keepax <[email protected]>
Date: Tue May 13 13:45:15 2014 +0100

ASoC: wm8962: Update register CLASS_D_CONTROL_1 to be non-volatile

commit 44330ab516c15dda8a1e660eeaf0003f84e43e3f upstream.

The register CLASS_D_CONTROL_1 is marked as volatile because it contains a bit, DAC_MUTE, which is also mirrored in the ADC_DAC_CONTROL_1 register. This causes problems for the "Speaker Switch" control, which will report an error if the CODEC is suspended because it relies on a volatile register.

To resolve this issue mark CLASS_D_CONTROL_1 as non-volatile and manually keep the register cache in sync by updating both bits when changing the mute status.

Reported-by: Shawn Guo <[email protected]>
Signed-off-by: Charles Keepax <[email protected]>
Tested-by: Shawn Guo <[email protected]>
Signed-off-by: Mark Brown <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit d642daf637d02dacf216d7fd9da7532a4681cfd3
Author: Roger Pau Monne <[email protected]>
Date: Tue Oct 29 18:31:14 2013 +0100

xen-blkfront: restore the non-persistent data path

commit bfe11d6de1c416cea4f3f0f35f864162063ce3fa upstream.
When persistent grants were added they were always used, even if the backend doesn't have this feature (there's no harm in always using the same set of pages). This restores the old data path when the backend doesn't have persistent grants, removing the burden of doing a memcpy when it is not actually needed. Signed-off-by: Roger Pau Monné <[email protected]> Reported-by: Felipe Franciosi <[email protected]> Cc: Felipe Franciosi <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: David Vrabel <[email protected]> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]> [v2: Fix up whitespace issues] Tested-by: Felipe Franciosi <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> commit ba4abe2e7f32f6d7fe3d92eeb4b1748c10b5f601 Author: Roger Pau Monne <[email protected]> Date: Mon Aug 12 12:53:44 2013 +0200 xen-blkfront: revoke foreign access for grants not mapped by the backend commit fbe363c476afe8ec992d3baf682670a4bd1b6ce6 upstream. There's no need to keep the foreign access in a grant if it is not persistently mapped by the backend. This allows us to free grants that are not mapped by the backend, thus preventing blkfront from hoarding all grants. The main effect of this is that blkfront will only persistently map the same grants as the backend, and it will always try to use grants that are already mapped by the backend. Also the number of persistent grants in blkfront is the same as in blkback (and is controlled by the value in blkback). 
Signed-off-by: Roger Pau Monné <[email protected]> Reviewed-by: David Vrabel <[email protected]> Acked-by: Matt Wilson <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: David Vrabel <[email protected]> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]> Signed-off-by: Jens Axboe <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> commit 46c0326164c98e556c35c3eb240273595d43425d Author: Jianyu Zhan <[email protected]> Date: Mon Apr 14 13:47:40 2014 +0800 percpu: make pcpu_alloc_chunk() use pcpu_mem_free() instead of kfree() commit 5a838c3b60e3a36ade764cf7751b8f17d7c9c2da upstream. pcpu_chunk_struct_size = sizeof(struct pcpu_chunk) + BITS_TO_LONGS(pcpu_unit_pages) * sizeof(unsigned long) It hardly could be ever bigger than PAGE_SIZE even for large-scale machine, but for consistency with its couterpart pcpu_mem_zalloc(), use pcpu_mem_free() instead. Commit b4916cb17c26 ("percpu: make pcpu_free_chunk() use pcpu_mem_free() instead of kfree()") addressed this problem, but missed this one. tj: commit message updated Signed-off-by: Jianyu Zhan <[email protected]> Signed-off-by: Tejun Heo <[email protected]> Fixes: 099a19d91ca4 ("percpu: allow limited allocation before slab is online) Signed-off-by: Greg Kroah-Hartman <[email protected]> commit 19d65166742f901cc14290494560f6b224cd2d2b Author: Thomas Petazzoni <[email protected]> Date: Fri Apr 18 14:19:52 2014 +0200 bus: mvebu-mbus: allow several windows with the same target/attribute commit b566e782be32145664d96ada3e389f17d32742e5 upstream. Having multiple windows with the same target and attribute is actually legal, and can be useful for PCIe windows, when PCIe BARs have a size that isn't a power of two, and we therefore need to create several MBus windows to cover the PCIe BAR for a given PCIe interface. 
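Why a non-power-of-two BAR needs several windows can be seen from a small decomposition routine. This is a hedged sketch, not code from the mvebu-mbus or PCIe drivers: `struct fake_window` and `split_into_windows()` are hypothetical, and it assumes each window must have a power-of-two size and that the starting base is aligned to the largest window.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one window. */
struct fake_window {
	uint64_t base;
	uint64_t size;  /* always a power of two */
};

/*
 * Hypothetical helper: cover [base, base + size) with power-of-two
 * windows, largest first. Returns the number of windows written,
 * or 0 if more than max windows would be needed (or size is 0).
 */
static size_t split_into_windows(uint64_t base, uint64_t size,
				 struct fake_window *out, size_t max)
{
	size_t n = 0;

	while (size) {
		uint64_t sz = 1;

		/* Largest power of two that is <= the remaining size. */
		while (sz <= size / 2)
			sz *= 2;
		if (n == max)
			return 0;
		out[n].base = base;
		out[n].size = sz;
		n++;
		base += sz;
		size -= sz;
	}
	return n;
}
```

For example, a 192 KiB region (0x30000) splits into a 128 KiB and a 64 KiB window. Both would carry the same target and attribute, which is the case the commit makes legal.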
Fixes: fddddb52a6c4 ('bus: introduce an Marvell EBU MBus driver') Signed-off-by: Thomas Petazzoni <[email protected]> Link: https://lkml.kernel.org/r/1397823593-1932-7-git-send-email-thomas.petazzoni@free-electrons.com Tested-by: Neil Greatorex <[email protected]> Signed-off-by: Jason Cooper <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> commit f56fb0d42b47b87b12c4936a77429d9dd1c7c4c6 Author: Lai Jiangshan <[email protected]> Date: Fri Apr 18 11:04:16 2014 -0400 workqueue: make rescuer_thread() empty wq->maydays list before exiting commit 4d595b866d2c653dc90a492b9973a834eabfa354 upstream. After a @pwq is scheduled for emergency execution, other workers may consume the affectd work items before the rescuer gets to them. This means that a workqueue many have pwqs queued on @wq->maydays list while not having any work item pending or in-flight. If destroy_workqueue() executes in such condition, the rescuer may exit without emptying @wq->maydays. This currently doesn't cause any actual harm. destroy_workqueue() can safely destroy all the involved data structures whether @wq->maydays is populated or not as nobody access the list once the rescuer exits. However, this is nasty and makes future development difficult. Let's update rescuer_thread() so that it empties @wq->maydays after seeing should_stop to guarantee that the list is empty on rescuer exit. tj: Updated comment and patch description. Signed-off-by: Lai Jiangshan <[email protected]> Signed-off-by: Tejun Heo <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> commit aac8b37ffaa2bacc0430aa7b45c7d3aad22209fc Author: Lai Jiangshan <[email protected]> Date: Fri Apr 18 11:04:16 2014 -0400 workqueue: fix a possible race condition between rescuer and pwq-release commit 77668c8b559e4fe2acf2a0749c7c83cde49a5025 upstream. There is a race condition between rescuer_thread() and pwq_unbound_release_workfn(). 
Even after a pwq is scheduled for rescue, the associated work items may be consumed by any worker. If all of them are consumed before the rescuer gets to them and the pwq's base ref was put due to attribute change, the pwq may be released while still being linked on @wq->maydays list making the rescuer dereference already freed pwq later. Make send_mayday() pin the target pwq until the rescuer is done with it. tj: Updated comment and patch description. Signed-off-by: Lai Jiangshan <[email protected]> Signed-off-by: Tejun Heo <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> commit 55a3dfcc84ab3dc82708d93cd0bca4a0aad7715c Author: Daeseok Youn <[email protected]> Date: Wed Apr 16 14:32:29 2014 +0900 workqueue: fix bugs in wq_update_unbound_numa() failure path commit 77f300b198f93328c26191b52655ce1b62e202cf upstream. wq_update_unbound_numa() failure path has the following two bugs. - alloc_unbound_pwq() is called without holding wq->mutex; however, if the allocation fails, it jumps to out_unlock which tries to unlock wq->mutex. - The function should switch to dfl_pwq on failure but didn't do so after alloc_unbound_pwq() failure. Fix it by regrabbing wq->mutex and jumping to use_dfl_pwq on alloc_unbound_pwq() failure. Signed-off-by: Daeseok Youn <[email protected]> Acked-by: Lai Jiangshan <[email protected]> Signed-off-by: Tejun Heo <[email protected]> Fixes: 4c16bd327c74 ("workqueue: implement NUMA affinity for unbound workqueues") Signed-off-by: Greg Kroah-Hartman <[email protected]> commit 04931ac044a638b79ea3c4b48c448b66cae0c2b5 Author: J. Bruce Fields <[email protected]> Date: Tue May 20 15:55:21 2014 -0400 nfsd4: remove lockowner when removing lock stateid commit a1b8ff4c97b4375d21b6d6c45d75877303f61b3b upstream. The nfsv4 state code has always assumed a one-to-one correspondance between lock stateid's and lockowners even if it appears not to in some places. 
We may actually change that, but for now when FREE_STATEID releases a lock stateid it also needs to release the parent lockowner. Symptoms were a subsequent LOCK crashing in find_lockowner_str when it calls same_lockowner_ino on a lockowner that unexpectedly has an empty so_stateids list. Signed-off-by: J. Bruce Fields <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> commit 02016987ba67614366a3d7cbd58b401ca956f816 Author: J. Bruce Fields <[email protected]> Date: Thu May 8 11:19:41 2014 -0400 nfsd4: warn on finding lockowner without stateid's commit 27b11428b7de097c42f205beabb1764f4365443b upstream. The current code assumes a one-to-one lockowner<->lock stateid correspondance. Signed-off-by: J. Bruce Fields <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> commit 53a3b8bea5827a9f647f411d9230e563e745c58c Author: Kinglong Mee <[email protected]> Date: Fri Apr 18 20:49:04 2014 +0800 NFSD: Call ->set_acl with a NULL ACL structure if no entries commit aa07c713ecfc0522916f3cd57ac628ea6127c0ec upstream. After setting ACL for directory, I got two problems that caused by the cached zero-length default posix acl. This patch make sure nfsd4_set_nfs4_acl calls ->set_acl with a NULL ACL structure if there are no entries. Thanks for Christoph Hellwig's advice. First problem: ............ hang ........... Second problem: [ 1610.167668] ------------[ cut here ]------------ [ 1610.168320] kernel BUG at /root/nfs/linux/fs/nfsd/nfs4acl.c:239! 
[ 1610.168320] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC [ 1610.168320] Modules linked in: nfsv4(OE) nfs(OE) nfsd(OE) rpcsec_gss_krb5 fscache ip6t_rpfilter ip6t_REJECT cfg80211 xt_conntrack rfkill ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw auth_rpcgss nfs_acl snd_intel8x0 ppdev lockd snd_ac97_codec ac97_bus snd_pcm snd_timer e1000 pcspkr parport_pc snd parport serio_raw joydev i2c_piix4 sunrpc(OE) microcode soundcore i2c_core ata_generic pata_acpi [last unloaded: nfsd] [ 1610.168320] CPU: 0 PID: 27397 Comm: nfsd Tainted: G OE 3.15.0-rc1+ #15 [ 1610.168320] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 1610.168320] task: ffff88005ab653d0 ti: ffff88005a944000 task.ti: ffff88005a944000 [ 1610.168320] RIP: 0010:[<ffffffffa034d5ed>] [<ffffffffa034d5ed>] _posix_to_nfsv4_one+0x3cd/0x3d0 [nfsd] [ 1610.168320] RSP: 0018:ffff88005a945b00 EFLAGS: 00010293 [ 1610.168320] RAX: 0000000000000001 RBX: ffff88006700bac0 RCX: 0000000000000000 [ 1610.168320] RDX: 0000000000000000 RSI: ffff880067c83f00 RDI: ffff880068233300 [ 1610.168320] RBP: ffff88005a945b48 R08: ffffffff81c64830 R09: 0000000000000000 [ 1610.168320] R10: ffff88004ea85be0 R11: 000000000000f475 R12: ffff880068233300 [ 1610.168320] R13: 0000000000000003 R14: 0000000000000002 R15: ffff880068233300 [ 1610.168320] FS: 0000000000000000(0000) GS:ffff880077800000(0000) knlGS:0000000000000000 [ 1610.168320] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 1610.168320] CR2: 00007f5bcbd3b0b9 CR3: 0000000001c0f000 CR4: 00000000000006f0 [ 1610.168320] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1610.168320] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 1610.168320] Stack: [ 
1610.168320] ffffffff00000000 0000000b67c83500 000000076700bac0 0000000000000000 [ 1610.168320] ffff88006700bac0 ffff880068233300 ffff88005a945c08 0000000000000002 [ 1610.168320] 0000000000000000 ffff88005a945b88 ffffffffa034e2d5 000000065a945b68 [ 1610.168320] Call Trace: [ 1610.168320] [<ffffffffa034e2d5>] nfsd4_get_nfs4_acl+0x95/0x150 [nfsd] [ 1610.168320] [<ffffffffa03400d6>] nfsd4_encode_fattr+0x646/0x1e70 [nfsd] [ 1610.168320] [<ffffffff816a6e6e>] ? kmemleak_alloc+0x4e/0xb0 [ 1610.168320] [<ffffffffa0327962>] ? nfsd_setuser_and_check_port+0x52/0x80 [nfsd] [ 1610.168320] [<ffffffff812cd4bb>] ? selinux_cred_prepare+0x1b/0x30 [ 1610.168320] [<ffffffffa0341caa>] nfsd4_encode_getattr+0x5a/0x60 [nfsd] [ 1610.168320] [<ffffffffa0341e07>] nfsd4_encode_operation+0x67/0x110 [nfsd] [ 1610.168320] [<ffffffffa033844d>] nfsd4_proc_compound+0x21d/0x810 [nfsd] [ 1610.168320] [<ffffffffa0324d9b>] nfsd_dispatch+0xbb/0x200 [nfsd] [ 1610.168320] [<ffffffffa00850cd>] svc_process_common+0x46d/0x6d0 [sunrpc] [ 1610.168320] [<ffffffffa0085433>] svc_process+0x103/0x170 [sunrpc] [ 1610.168320] [<ffffffffa032472f>] nfsd+0xbf/0x130 [nfsd] [ 1610.168320] [<ffffffffa0324670>] ? nfsd_destroy+0x80/0x80 [nfsd] [ 1610.168320] [<ffffffff810a5202>] kthread+0xd2/0xf0 [ 1610.168320] [<ffffffff810a5130>] ? insert_kthread_work+0x40/0x40 [ 1610.168320] [<ffffffff816c1ebc>] ret_from_fork+0x7c/0xb0 [ 1610.168320] [<ffffffff810a5130>] ? insert_kthread_work+0x40/0x40 [ 1610.168320] Code: 78 02 e9 e7 fc ff ff 31 c0 31 d2 31 c9 66 89 45 ce 41 8b 04 24 66 89 55 d0 66 89 4d d2 48 8d 04 80 49 8d 5c 84 04 e9 37 fd ff ff <0f> 0b 90 0f 1f 44 00 00 55 8b 56 08 c7 07 00 00 00 00 8b 46 0c [ 1610.168320] RIP [<ffffffffa034d5ed>] _posix_to_nfsv4_one+0x3cd/0x3d0 [nfsd] [ 1610.168320] RSP <ffff88005a945b00> [ 1610.257313] ---[ end trace 838254e3e352285b ]--- Signed-off-by: Kinglong Mee <[email protected]> Signed-off-by: J. 
Bruce Fields <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> commit d6a18aea9577844da0cdc0a595cbedde46b512d8 Author: Trond Myklebust <[email protected]> Date: Fri Apr 18 14:43:57 2014 -0400 NFSd: call rpc_destroy_wait_queue() from free_client() commit 4cb57e3032d4e4bf5e97780e9907da7282b02b0c upstream. Mainly to ensure that we don't leave any hanging timers. Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> commit ed6ad7a5caac4bc865280a2946b54f348a3bb2f4 Author: Trond Myklebust <[email protected]> Date: Fri Apr 18 14:43:56 2014 -0400 NFSd: Move default initialisers from create_client() to alloc_client() commit 5694c93e6c4954fa9424c215f75eeb919bddad64 upstream. Aside from making it clearer what is non-trivial in create_client(), it also fixes a bug whereby we can call free_client() before idr_init() has been called. Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: J. Bruce Fields <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> commit 21ec04003007ce13a632b1d53816e27f63e4dc3f Author: Takashi Iwai <[email protected]> Date: Fri May 23 09:02:44 2014 +0200 ALSA: hda - Fix onboard audio on Intel H97/Z97 chipsets commit 77f07800cb456bed6e5c345e6e4e83e8eda62437 upstream. The recent Intel H97/Z97 chipsets need the similar setups like other Intel chipsets for snooping, etc. Especially without snooping, the audio playback stutters or gets corrupted. This fix patch just adds the corresponding PCI ID entry with the proper flags. 
Reported-and-tested-by: Arthur Borsboom <[email protected]> Signed-off-by: Takashi Iwai <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> commit e60b0a2765dca37d50133463e639492b7e46a06a Author: Hans de Goede <[email protected]> Date: Mon May 19 22:52:30 2014 -0700 Input: synaptics - T540p - unify with other LEN0034 models commit 6d396ede224dc596d92d7cab433713536e68916c upstream. The T540p has a touchpad with pnp-id LEN0034, all the models with this pnp-id have the same min/max values, except the T540p where the values are slightly off. Fix them to be identical. This is a preparation patch for simplifying the quirk table. Signed-off-by: Hans de Goede <[email protected]> Signed-off-by: Dmitry Torokhov <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> commit 2d597d4480e99eaf426eded8e9e9fcb6feb40673 Author: Hans de Goede <[email protected]> Date: Wed May 14 11:10:40 2014 -0700 Input: synaptics - add min/max quirk for the ThinkPad W540 commit 0b5fe736fe923f1f5e05413878d5990e92ffbdf5 upstream. https://bugzilla.redhat.com/show_bug.cgi?id=1096436 Tested-and-reported-by: [email protected] Signed-off-by: Hans de Goede <[email protected]> Signed-off-by: Dmitry Torokhov <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> commit 5e02a153d1a0d5131a9943762f01179b42f074e2 Author: Hans de Goede <[email protected]> Date: Mon May 5 09:36:43 2014 -0700 Input: elantech - fix touchpad initialization on Gigabyte U2442 commit 36189cc3cd57ab0f1cd75241f93fe01de928ac06 upstream. 
The hw_version 3 Elantech touchpad on the Gigabyte U2442 does not accept 0x0b as initialization value for r10, this stand-alone version of the driver: http://planet76.com/drivers/elantech/psmouse-elantech-v6.tar.bz2 Uses 0x03 which does work, so this means not setting bit 3 of r10 which sets: "Enable Real H/W Resolution In Absolute mode" Which will result in half the x and y resolution we get with that bit set, so simply not setting it everywhere is not a solution. We've been unable to find a way to identify touchpads where setting the bit will fail, so this patch uses a dmi based blacklist for this. https://bugzilla.kernel.org/show_bug.cgi?id=61151 Reported-by: Philipp Wolfer <[email protected]> Tested-by: Philipp Wolfer <[email protected]> Signed-off-by: Hans de Goede <[email protected]> Signed-off-by: Dmitry Torokhov <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> commit 7f63e60bd6e1843fd3510d4cbc036e3d388944d1 Author: Sheng-Liang Song <[email protected]> Date: Thu Apr 24 16:28:29 2014 -0700 Input: atkbd - fix keyboard not working on some LG laptops commit 3d725caa9dcc78c3dc9e7ea0c04f626468edd9c9 upstream. After issuing ATKBD_CMD_RESET_DIS, keyboard on some LG laptops stops working. The workaround is to stop issuing ATKBD_CMD_RESET_DIS commands. In order to keep changes in atkbd driver to the minimum we check DMI signature and only skip ATKBD_CMD_RESET_DIS if we are running on LG LW25-B7HV or P1-J273B. Signed-off-by: Sheng-Liang Song <[email protected]> Signed-off-by: Dmitry Torokhov <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> commit f6de6225ca40427023398d1b22a6af810792741a Author: Romain Izard <[email protected]> Date: Tue Mar 4 10:09:39 2014 +0100 trace: module: Maintain a valid user count commit 098507ae3ec2331476fb52e85d4040c1cc6d0ef4 upstream. 
The replacement of the 'count' variable by two variables 'incs' and 'decs' to resolve some race conditions during module unloading was done in parallel with some cleanup in the trace subsystem, and was integrated as a merge. Unfortunately, the formula for this replacement was wrong in the tracing code, and the refcount in the traces was not usable as a result. Use 'count = incs - decs' to compute the user count. Link: http://lkml.kernel.org/p/[email protected] Acked-by: Ingo Molnar <[email protected]> Cc: Rusty Russell <[email protected]> Cc: Frederic Weisbecker <[email protected]> Fixes: c1ab9cab7509 "merge conflict resolution" Signed-off-by: Romain Izard <[email protected]> Signed-off-by: Steven Rostedt <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> commit 863a921283fca22d63865314182e7c9e5fba0ad3 Author: K. Y. Srinivasan <[email protected]> Date: Thu Apr 3 18:02:45 2014 -0700 Drivers: hv: vmbus: Negotiate version 3.0 when running on ws2012r2 hosts commit 03367ef5ea811475187a0732aada068919e14d61 upstream. Only ws2012r2 hosts support the ability to reconnect to the host on VMBUS. This functionality is needed by kexec in Linux. To use this functionality we need to negotiate version 3.0 of the VMBUS protocol. Signed-off-by: K. Y. Srinivasan <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> commit 8567f561ed67f4a444b554c8463d129c2ad0e8ad Author: Chew, Kean ho <[email protected]> Date: Sat Mar 1 00:03:56 2014 +0800 i2c: i801: enable Intel BayTrail SMBUS commit 1b31e9b76ef8c62291e698dfdb973499986a7f68 upstream. Add Device ID of Intel BayTrail SMBus Controller. 
Signed-off-by: Chew, Kean ho <[email protected]> Signed-off-by: Chew, Chiau Ee <[email protected]> Reviewed-by: Jean Delvare <[email protected]> Signed-off-by: Wolfram Sang <[email protected]> Cc: "Chang, Rebecca Swee Fun" <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> commit 1b796c0acb32f18938041159d2d9538b26b75893 Author: James Ralston <[email protected]> Date: Mon Nov 4 09:29:48 2013 -0800 i2c: i801: Add Device IDs for Intel Wildcat Point-LP PCH commit afc659241258b40b683998ec801d25d276529f43 upstream. This patch adds the SMBus Device IDs for the Intel Wildcat Point-LP PCH. Signed-off-by: James Ralston <[email protected]> Signed-off-by: Wolfram Sang <[email protected]> Cc: "Chang, Rebecca Swee Fun" <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> commit 4e32a7c66fae40bde0fbff8cbc893eabe8575135 Author: Salva Peiró <[email protected]> Date: Wed Apr 30 19:48:02 2014 +0200 media: media-device: fix infoleak in ioctl media_enum_entities() commit e6a623460e5fc960ac3ee9f946d3106233fd28d8 upstream. This fixes CVE-2014-1739. Signed-off-by: Salva Peiró <[email protected]> Acked-by: Laurent Pinchart <[email protected]> Signed-off-by: Mauro Carvalho Chehab <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> commit c4f3c998c17e31c73c1ab223469435a12358d25e Author: Dan Carpenter <[email protected]> Date: Thu Nov 7 08:08:44 2013 +0000 clk: vexpress: NULL dereference on error path commit 6b4ed8b00e93bd31f24a25f59ed8d1b808d0cc00 upstream. If the allocation fails then we dereference the NULL in the error path. Just return directly. 
Fixes: ed27ff1db869 ('clk: Versatile Express clock generators ("osc") driver') Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: Pawel Moll <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> commit d2157c29092990941b82488a3653d95fc9e2cb7a Author: Tim Chen <[email protected]> Date: Mon Mar 17 16:52:26 2014 -0700 crypto: crypto_wq - Fix late crypto work queue initialization commit 130fa5bc81b44b6cc1fbdea3abf6db0da22964e0 upstream. The crypto algorithm modules utilizing the crypto daemon could be used early when the system start up. Using module_init does not guarantee that the daemon's work queue is initialized when the cypto alorithm depending on crypto_wq starts. It is necessary to initialize the crypto work queue earlier at the subsystem init time to make sure that it is initialized when used. Signed-off-by: Tim Chen <[email protected]> Signed-off-by: Herbert Xu <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> commit b944e0e0b84a1f8871b09723a1d035d70368e298 Author: Geert Uytterhoeven <[email protected]> Date: Mon Apr 14 18:52:14 2014 +0200 Documentation: Update stable address in Chinese and Japanese translations commit 98b0f811aade1b7c6e7806c86aa0befd5919d65f upstream. The English and Korean translations were updated, the Chinese and Japanese weren't. Signed-off-by: Geert Uytterhoeven <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> commit b020ee793f714e7d293078a1e0ea8a545c33b16f Author: Emil Goode <[email protected]> Date: Sun Mar 9 21:06:51 2014 +0100 brcmsmac: fix deadlock on missing firmware commit 8fc1e8c240aab968db658b2d8d079b4391207a36 upstream. When brcm80211 firmware is not installed networking hangs. A deadlock happens because we call ieee80211_unregister_hw() from the .start callback of struct ieee80211_ops. When .start is called we are under rtnl lock and ieee80211_unregister_hw() tries to take it again. 
Function call stack: dev_change_flags() __dev_change_flags() __dev_open() ASSERT_RTNL() <-- Assert rtnl lock ops->ndo_open() .ndo_open = ieee80211_open, ieee80211_open() ieee80211_do_open() drv_start() local->ops->start() .start = brcms_ops_start, brcms_ops_start() brcms_remove() ieee80211_unregister_hw() rtnl_lock() <-- Here we deadlock Introduced by: commit 25b5632fb35ca61b8ae3eee235edcdc2883f7a5e ("brcmsmac: request firmware in .start() callback") This patch fixes the bug by removing the call to brcms_remove() and moves the brcms_request_fw() call to the top of the .start callback to not initiate anything unless firmware is installed. Signed-off-by: Emil Goode <[email protected]> Signed-off-by: John W. Linville <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> commit 5d33fff5ca9aab5ae7d28fc06179a91eb529758e Author: Russell King <[email protected]> Date: Sun Apr 6 15:20:03 2014 -0700 leds: leds-pwm: properly clean up after probe failure commit 392369019eb96e914234ea21eda806cb51a1073e upstream. When probing with DT, we add each LED one at a time. If we find a LED without a PWM device (because it is not available yet) we fail the initialisation, unregister previous LEDs, and then by way of managed resources, we free the structure. The problem with this is we may have a scheduled and active work_struct in this structure, and this results in a nasty kernel oops. We need to cancel this work_struct properly upon cleanup - and the cleanup we require is the same cleanup as we do when the LED platform device is removed. Rather than writing this same code three times, move it into a separate function and use it in all three places. 
Fixes: c971ff185f64 ("leds: leds-pwm: Defer led_pwm_set() if PWM can sleep") Signed-off-by: Russell King <[email protected]> Signed-off-by: Bryan Wu <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> commit d3f09691b3583edfabbd0ea04ef3e015df23a708 Author: Martin Peres <[email protected]> Date: Fri Mar 14 00:26:52 2014 +0100 drm/nouveau/pm/fan: drop the fan lock in fan_update() before rescheduling commit 61679fe153b2b9ea5b5e2ab93305419e85e99a9d upstream. This should fix a deadlock that has been reported to us where fan_update() would hold the fan lock and try to grab the alarm_program_lock to reschedule an update. On an other CPU, the alarm_program_lock would have been taken before calling fan_update(), leading to a deadlock. We should Cc: <[email protected]> # 3.9+ Reported-by: Marcin Slusarz <[email protected]> Tested-by: Timothée Ravier <[email protected]> Tested-by: Boris Fersing (IRC nick fersingb, no public email address) Signed-off-by: Martin Peres <[email protected]> Signed-off-by: Ben Skeggs <[email protected]> Signed-off-by: Greg Kroah-…

commit 178eda29ca721842f2146378e73d43e0044c4166 upstream.

It has been reported that using ZFSonLinux on rbd will result in memory corruption. The bug report can be found here:

openzfs/spl#241
http://tracker.ceph.com/issues/7790

The reason is that ZFS will send pages with page_count 0 into rbd, which in turns send them to tcp_sendpage. However, tcp_sendpage cannot deal with page_count 0, as it will do get_page and put_page, and erroneously free the page.

This type of issue has been noted before, and handled in iscsi, drbd, etc. So, rbd should also handle this. This fix address this issue by fall back to slower sendmsg when page_count 0 detected.

Cc: Sage Weil <[email protected]>
Cc: Yehuda Sadeh <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Reviewed-by: Ilya Dryomov <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
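The failure mode and the fallback described in this commit message can be modelled in a few lines of userspace C. This is only a rough sketch of the idea, not the kernel patch itself: `struct toy_page`, `toy_get_page`, `toy_put_page`, `toy_sendpage`, `toy_sendmsg` and `toy_send` are all hypothetical names made up for illustration, and the "page" is just a counter plus a flag. It assumes, as the message says, that the zero-copy path takes and then drops a reference, while the slower path copies the data and leaves the count alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a kernel page: a reference count and a flag that records
 * whether the page was handed back to the allocator. Hypothetical type. */
struct toy_page {
    int  refcount;   /* models page_count(page) */
    bool freed;      /* set when a put drops the count to zero */
};

enum send_path { PATH_SENDPAGE, PATH_SENDMSG };

void toy_get_page(struct toy_page *p)
{
    p->refcount++;
}

void toy_put_page(struct toy_page *p)
{
    if (--p->refcount == 0)
        p->freed = true;   /* last reference dropped: page is released */
}

/* Zero-copy path: holds a temporary reference while the page is in flight.
 * With a starting count of 0 this goes 0 -> 1 -> 0 and frees the page. */
void toy_sendpage(struct toy_page *p)
{
    toy_get_page(p);
    /* ... the page would be queued on the socket here ... */
    toy_put_page(p);
}

/* Copying path: the payload is copied out, the count is never touched. */
void toy_sendmsg(struct toy_page *p)
{
    (void)p;
}

/* The fallback from the commit message: use the zero-copy path only when
 * the page already holds at least one reference. */
enum send_path toy_send(struct toy_page *p)
{
    assert(p != NULL);
    if (p->refcount >= 1) {
        toy_sendpage(p);
        return PATH_SENDPAGE;
    }
    toy_sendmsg(p);
    return PATH_SENDMSG;
}
```

In this model, calling `toy_sendpage` directly on a page whose count is 0 marks it freed while its owner still uses it, which corresponds to the memory corruption reported here. Routing the same page through `toy_send` selects the copying path instead and leaves the page untouched, at the cost of an extra copy.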


tdb commented May 24, 2013

hvenzke commented May 28, 2013

tdb commented May 28, 2013

hvenzke commented May 28, 2013

tdb commented May 29, 2013

behlendorf commented Jun 7, 2013

tdb commented Jun 7, 2013

behlendorf commented Jun 7, 2013

tdb commented Jun 8, 2013

chrisrd commented Jun 8, 2013

behlendorf commented Jun 18, 2013

tdb commented Jun 18, 2013

behlendorf commented Jun 18, 2013

tdb commented Jun 19, 2013

behlendorf commented Jun 19, 2013

tdb commented Jun 20, 2013

behlendorf commented Jun 20, 2013

chrisrd commented Jun 21, 2013

tdb commented Jun 21, 2013

behlendorf commented Jun 21, 2013

tdb commented Aug 25, 2013

tdb commented Nov 21, 2013

behlendorf commented Nov 21, 2013

tdb commented Nov 22, 2013

rbraddy commented Dec 18, 2013

tdb commented Dec 18, 2013

rbraddy commented Dec 18, 2013

dweeezil commented Dec 30, 2013

hvenzke commented Dec 30, 2013

dweeezil commented Dec 30, 2013

behlendorf commented Jan 7, 2014
