Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent arc_c collapse #4256

Closed
wants to merge 2 commits into from
Closed

Prevent arc_c collapse #4256

wants to merge 2 commits into from

Conversation

dweeezil
Copy link
Contributor

Adjusting arc_c directly is racy because it can happen in the context
of multiple threads. It should always be >= 2 * maxblocksize. Set it
to a known valid value rather than adjusting it directly.

Reverts: 935434e
Fixes: #3904
Fixes: #4161

Adjusting arc_c directly is racy because it can happen in the context
of multiple threads.  It should always be >= 2 * maxblocksize.  Set it
to a known valid value rather than adjusting it directly.

Reverts: 935434e
Fixes: openzfs#3904
Fixes: openzfs#4161
@dweeezil
Copy link
Contributor Author

This is intended to be a lock-free alternative to #4197.

@behlendorf
Copy link
Contributor

@dweeezil nice, this LGTM and tests well. If you don't have any remaining concerns I'd like to merge it to master to resolve those occasional ztest failures.

@dweeezil
Copy link
Contributor Author

@behlendorf Go for it. The only thing I'm wondering is whether we should remove the ASSERT3U(arc_c, >=, 2ULL << SPA_MAXBLOCKSHIFT); in arc_adapt().

@kernelOfTruth
Copy link
Contributor

@dweeezil Looks like the buildbot just hit the assertion

ztest: ../../module/zfs/arc.c:3815: Assertion `(arc_stats.arcstat_c.value.ui64) >= 2ULL << 24 (0x183d6c8 >= 0x2000000)' failed.
/sbin/ztest[0x408133]
/lib64/libpthread.so.0(+0xf790)[0x7f5b11474790]
/lib64/libc.so.6(gsignal+0x35)[0x7f5b11103625]
/lib64/libc.so.6(abort+0x175)[0x7f5b11104e05]
/lib64/libc.so.6(+0x2b74e)[0x7f5b110fc74e]
/lib64/libc.so.6(__assert_perror_fail+0x0)[0x7f5b110fc810]
/lib64/libc.so.6(+0x2b87b)[0x7f5b110fc87b]
/lib64/libzpool.so.2(+0x37628)[0x7f5b125b3628]
/lib64/libzpool.so.2(+0x395d3)[0x7f5b125b55d3]
/lib64/libzpool.so.2(arc_buf_alloc+0x1fa)[0x7f5b125bc84a]
/lib64/libzpool.so.2(+0x4c04c)[0x7f5b125c804c]
/lib64/libzpool.so.2(dbuf_read+0xe6)[0x7f5b125c8396]
/lib64/libzpool.so.2(+0x512e7)[0x7f5b125cd2e7]
/lib64/libzpool.so.2(dbuf_sync_list+0x92)[0x7f5b125cd512]
/lib64/libzpool.so.2(dnode_sync+0x450)[0x7f5b125f5e70]
/lib64/libzpool.so.2(+0x5e89e)[0x7f5b125da89e]
/lib64/libzpool.so.2(dmu_objset_sync+0x1e5)[0x7f5b125dbef5]
/lib64/libzpool.so.2(dsl_dataset_sync+0xb2)[0x7f5b125f8bd2]
/lib64/libzpool.so.2(dsl_pool_sync+0xbe)[0x7f5b1260f04e]
/lib64/libzpool.so.2(spa_sync+0x2e0)[0x7f5b1263a180]
/lib64/libzpool.so.2(+0xd0308)[0x7f5b1264c308]
/lib64/libzpool.so.2(zk_thread_helper+0x22c)[0x7f5b125a766c]
/lib64/libpthread.so.0(+0x7a51)[0x7f5b1146ca51]
/lib64/libc.so.6(clone+0x6d)[0x7f5b111b993d]
child died with signal 6
+ RESULT=3
+ test 3 '!=' 0
+ echo 'Exited ztest with error 3'
Exited ztest with error 3

http://buildbot.zfsonlinux.org/builders/CentOS%206.7%20x86_64%20%28TEST%29/builds/469
#4258 (ABD2 rebase + this pull-request)

which is supposedly fixed by #4197

@behlendorf
Copy link
Contributor

OK, I'm going to hold off then. This needs some more investigation, perhaps we missed a place where a negative value gets added in with atomic_add_64(). I'd lean towards leaving the ASSERT there.

In addition to a simpler structure, protect against underflow in the
calculation of the new arc_c value.
@dweeezil
Copy link
Contributor Author

The only thing I can think of is that arc_shrink() could get too large of a to_free value and cause underflow. Given the logic in arc_reclaim_thread(), this certainly seems possible. I've added another commit to slightly refactor arc_shrink() and protect against too large a to_free. For sanity, I suppose it ought to also protect against a negative to_free but right now, the only caller already does that.

@dweeezil
Copy link
Contributor Author

I've applied the 48f4c26 patch to @kernelOfTruth's kernelOfTruth/zfs@eee4f05 (abd_next-rebase_22.01.2016 branch) and successfully ran an hour of ztest (modified to cap physmem at 1GiB) with 40 threads.

It seems to be doing ok on the 'bots, too, but only a few have finished so far.

@kernelOfTruth
Copy link
Contributor

@dweeezil that's good to hear !

Thanks for letting me know :)

@behlendorf
Copy link
Contributor

@dweeezil, @kernelOfTruth this LGTM and tested well. I've squashed these two commit together and merged them to master as:

1b8951b Prevent arc_c collapse

@behlendorf behlendorf closed this Jan 25, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants