Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

l2_asize corrupted in e0b0ca983 #1990

Closed
DeHackEd opened this issue Dec 19, 2013 · 5 comments
Closed

l2_asize corrupted in e0b0ca983 #1990

DeHackEd opened this issue Dec 19, 2013 · 5 comments
Milestone

Comments

@DeHackEd
Copy link
Contributor

If you're running the latest git head and have an L2ARC, check out your l2_asize arcstat. Look a little broken? Especially immediately after booting up and importing? I bisected it and happened in e0b0ca9


This is what I did for a little analysis, but I doubt much below here will be useful for fixing it.

--- a/module/zfs/arc.c
+++ b/module/zfs/arc.c
@@ -139,6 +139,9 @@
 #include <vm/anon.h>
 #include <sys/fs/swapnode.h>
 #include <sys/zpl.h>
+#include <linux/printk.h>
+#else
+#define printk(x,y)  /* Nothing */
 #endif
 #include <sys/callb.h>
 #include <sys/kstat.h>
@@ -1611,6 +1614,7 @@ arc_hdr_destroy(arc_buf_hdr_t *hdr)
                if (l2hdr != NULL) {
                        list_remove(l2hdr->b_dev->l2ad_buflist, hdr);
                        ARCSTAT_INCR(arcstat_l2_size, -hdr->b_size);
+                       printk(KERN_INFO "l2_aize -= %d\n", (int)l2hdr->b_asize);
                        ARCSTAT_INCR(arcstat_l2_asize, -l2hdr->b_asize);
                        kmem_free(l2hdr, sizeof (l2arc_buf_hdr_t));
                        arc_space_return(L2HDR_SIZE, ARC_SPACE_L2HDRS);
@@ -3571,6 +3575,7 @@ arc_release(arc_buf_t *buf, void *tag)
        buf->b_private = NULL;
 
        if (l2hdr) {
+               printk(KERN_INFO "l2_asize -= %d\n", (int)l2hdr->b_asize);
                ARCSTAT_INCR(arcstat_l2_asize, -l2hdr->b_asize);
                list_remove(l2hdr->b_dev->l2ad_buflist, hdr);
                kmem_free(l2hdr, sizeof (l2arc_buf_hdr_t));
@@ -4447,6 +4452,7 @@ l2arc_write_done(zio_t *zio)
                         * Error - drop L2ARC entry.
                         */
                        list_remove(buflist, ab);
+                       printk(KERN_INFO "l2_asize -= %d\n", (int)abl2->b_asize);
                        ARCSTAT_INCR(arcstat_l2_asize, -abl2->b_asize);
                        ab->b_l2hdr = NULL;
                        kmem_free(abl2, sizeof (l2arc_buf_hdr_t));
@@ -4702,6 +4708,7 @@ top:
                         */
                        if (ab->b_l2hdr != NULL) {
                                abl2 = ab->b_l2hdr;
+                               printk(KERN_INFO "l2_asize -= %d\n", (int)abl2->b_asize);
                                ARCSTAT_INCR(arcstat_l2_asize, -abl2->b_asize);
                                ab->b_l2hdr = NULL;
                                kmem_free(abl2, sizeof (l2arc_buf_hdr_t));
@@ -4967,6 +4974,7 @@ l2arc_write_buffers(spa_t *spa, l2arc_dev_t *dev, uint64_t target_sz,
        ARCSTAT_BUMP(arcstat_l2_writes_sent);
        ARCSTAT_INCR(arcstat_l2_write_bytes, write_asize);
        ARCSTAT_INCR(arcstat_l2_size, write_sz);
+       printk(KERN_INFO "l2_asize += %d\n", (int)write_asize);
        ARCSTAT_INCR(arcstat_l2_asize, write_asize);
        vdev_space_update(dev->l2ad_vdev, write_psize, 0, 0);
 
Dec 18 20:09:54 localhost kernel: l2_asize += 11100672
Dec 18 20:09:54 localhost kernel: l2_asize -= 1024
Dec 18 20:09:54 localhost kernel: l2_asize -= 10240
Dec 18 20:09:54 localhost kernel: l2_asize -= 512
Dec 18 20:09:54 localhost kernel: l2_asize -= 2560
Dec 18 20:09:54 localhost kernel: l2_asize -= 2048
Dec 18 20:09:54 localhost kernel: l2_asize += 55296
Dec 18 20:09:55 localhost kernel: l2_asize += 33280
Dec 18 20:09:56 localhost kernel: l2_asize += 68096
Dec 18 20:09:57 localhost kernel: l2_asize += 67584
# fgrep l2_ /proc/spl/kstat/zfs/arcstats 
evict_l2_cached                 4    0
evict_l2_eligible               4    0
evict_l2_ineligible             4    4096
l2_hits                         4    0
l2_misses                       4    4640
l2_feeds                        4    212
l2_rw_clash                     4    0
l2_read_bytes                   4    0
l2_write_bytes                  4    11324928
l2_writes_sent                  4    5
l2_writes_done                  4    5
l2_writes_error                 4    0
l2_writes_hdr_miss              4    0
l2_evict_lock_retry             4    0
l2_evict_reading                4    0
l2_free_on_write                4    0
l2_abort_lowmem                 4    0
l2_cksum_bad                    4    0
l2_io_error                     4    0
l2_size                         4    16501760
l2_asize                        4    21486145024    <=
l2_hdr_size                     4    141840
l2_compress_successes           4    3428
l2_compress_zeros               4    0
l2_compress_failures            4    0

21486145024-11100672+1024+10240+512+512+2048-55296-33280-68096-67584
= 21474834432
= 2^31 * 10 - 2048 (??)

behlendorf added a commit to behlendorf/zfs that referenced this issue Feb 1, 2014
Commit e0b0ca9 accidentally corrupted the l2_asize displayed in
arcstats.  This was caused by changing the l2arc_buf_hdr.b_asize
member from an int to uint32_t type.  There are places in the
code where this field is cast to a uint64_t resulting in the
b_hits member being treated as part of b_asize.

To resolve the issue the type has been changed to a uint64_t,
and the b_hits member is placed after the enum to prevent the
size of the structure from increasing.

This is a good example of exactly why it's a bad idea to use
ambiguous types (int) in these structures.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#1990
@behlendorf
Copy link
Contributor

@DeHackEd Thanks for doing the hard work on this one. After you were able to bisect to the offending commit it was easy to find the issue. #2093 resolves the issue, but if you could verify that I'd appreciate it.

@DeHackEd
Copy link
Contributor Author

DeHackEd commented Feb 1, 2014

l2_size                         4    3145230336
l2_asize                        4    547742208
l2_hdr_size                     4    58055696

Been running for 20 minutes so far. Looking good!

@behlendorf
Copy link
Contributor

@DeHackEd Still good 2 days later I trust?

@DeHackEd
Copy link
Contributor Author

DeHackEd commented Feb 3, 2014

l2_size                         4    71782336000
l2_asize                        4    8791361536
...
arc_meta_max                    4    18446744073709551520

Since I'm mixing and matching patches I can't tell who's at fault for that metadata one. Since I have the ARC improvements patch loaded I would assume that's the case. l2_asize specifically continues to be correct.

@behlendorf
Copy link
Contributor

@DeHackEd Yes, I'm sure the meta_max is related to the other changes. Thanks.

FransUrbo pushed a commit to FransUrbo/zfs that referenced this issue Feb 6, 2014
Commit e0b0ca9 accidentally corrupted the l2_asize displayed in
arcstats.  This was caused by changing the l2arc_buf_hdr.b_asize
member from an int to uint32_t type.  There are places in the
code where this field is cast to a uint64_t resulting in the
b_hits member being treated as part of b_asize.

To resolve the issue the type has been changed to a uint64_t,
and the b_hits member is placed after the enum to prevent the
size of the structure from increasing.

This is a good example of exactly why it's a bad idea to use
ambiguous types (int) in these structures.

Signed-off-by: DHE <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#1990
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants