OpenZFS 6950 - ARC should cache compressed data #4768
Conversation
Please note that for the purpose of merging we split out the regression test suite from the compressed arc patch. They are two different features and we will have a pull request coming shortly for the test suite.
The failure for the built-in build is because we don't have support for a DTRACE_PROBE2 of the form (dmu_buf_impl_t, multilist_sublist_t). This needs to be added but I'm not sure how it's done. Would appreciate help with this.
@behlendorf - Looks like ztest failed on the Ubuntu bot. Is there an easy way to get the core file(s)?
@behlendorf - any advice/tips on adding a new DTRACE probe definition? Like in …
@don-brady You need to add a new template to …
Start by adding something like this to the bottom of … (completely untested, sorry), which should get things moving in the right direction.
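Since the original snippet wasn't preserved in this thread, here is a minimal sketch of what such a template might look like, assuming the Linux tracepoint conventions (`DECLARE_EVENT_CLASS`/`DEFINE_EVENT`) used by the existing zfs trace headers. Every name below (event class, fields, probe name) is illustrative, not the actual patch:

```c
/*
 * Hypothetical template for a two-argument probe taking a
 * dmu_buf_impl_t and a multilist_sublist_t. Assumes the usual
 * trace-header boilerplate (TRACE_SYSTEM, include guards, and
 * #include <linux/tracepoint.h>) that the zfs trace headers share.
 */
DECLARE_EVENT_CLASS(zfs_dbuf_sublist_class,
	TP_PROTO(dmu_buf_impl_t *db, multilist_sublist_t *mls),
	TP_ARGS(db, mls),
	TP_STRUCT__entry(
	    __field(uint64_t, db_object)
	    __field(uint64_t, db_blkid)
	),
	TP_fast_assign(
	    __entry->db_object = db->db.db_object;
	    __entry->db_blkid  = db->db_blkid;
	),
	TP_printk("dbuf { object %llu blkid %llu }",
	    __entry->db_object, __entry->db_blkid)
);

#define	DEFINE_DBUF_SUBLIST_EVENT(name) \
DEFINE_EVENT(zfs_dbuf_sublist_class, name, \
	TP_PROTO(dmu_buf_impl_t *db, multilist_sublist_t *mls), \
	TP_ARGS(db, mls))
DEFINE_DBUF_SUBLIST_EVENT(zfs_dbuf__evict__one);
```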
@don-brady right now there's no way to get the crash dumps. But feel free to extend the script to better support this kind of thing. That said, it looks like something much worse happened during the testing. Normally when ztest fails, the worst that can happen is that the test case fails and the user space process cores. However, in this case the entire test node went unresponsive and all subsequent tests failed. Normally that's a clue to check the console log from either this test case or an earlier one to see if you can find an earlier problem. In this case, since none of the automated tests passed, I checked the console logs for CentOS 7 and found the following. That would explain all of the test wreckage.
Regarding the dtrace probes, @dweeezil summed it up nicely.
How do I build the tree with dtrace macros enabled? I'm trying to add the new DTRACE_PROBE2 macro but I'm not sure how to test it.
@dpquigl since the dtrace macros depend on ftrace, which is GPL, the only way to enable them is in a local build. Change the … It also looks like the updated patch is hitting an ASSERT.
Yeah, I hadn't tracked down the assert yet. I was trying to get the last of the builds to pass successfully before I started working on that. I'll redo the build with the changes you suggested and try to get the dtrace/ftrace error ironed out first.
@dpquigl @behlendorf I looked at the assert last week. The new dbuf cache is built with the multilist primitives (also used by the L2 ARC). The state of …
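For readers following along, `multilist_link_active()` (discussed below) simply tests whether a node is currently linked into a sublist. A sketch paraphrased from the multilist primitives, not the patch itself:

```c
/*
 * A multilist node is "active" when its list pointers are non-NULL,
 * i.e. it is currently linked into one of the sublists. The ASSERT
 * under discussion fires when this invariant is violated.
 */
static inline boolean_t
multilist_link_active(multilist_node_t *link)
{
	return (list_link_active(link));
}
```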
@behlendorf Is there something in that pull request that you think will fix our issue with multilist_link_active? If so, is it a single commit I can extract, or do we need to wait on the entire patch?
@dpquigl I just referenced this PR from the ZFS encryption PR so the developers working on encryption are aware of this work. I haven't looked into the ASSERT at all, but it looks like you've sorted out all the build issues, which is great.
@behlendorf Yeah, I added the dbuf ftrace support and fixed the arc ftrace support because of struct changes. Don is working on tracking down the assert. I've built an Ubuntu VM, have ZFS built on it, and I'll be tracking down the ztest failure on Ubuntu.
@behlendorf Is there any way to get some assistance in tracking down some of these testing failures? I built an Ubuntu VM and ran the commands that supposedly failed, and they passed on my local VM.
There are still a couple of existing ztest failures which occur once in a while. I wouldn't worry about the ztest failure until the assert is addressed, particularly if you can't reproduce the failure locally.
@behlendorf @dpquigl I found the issue with the ASSERT. I'll do some more testing, but the root cause seems to be a missing …
@behlendorf The patch has been rebased to the latest master. I don't know why the tests are failing on the test machines, but they clear our tests on CentOS, and when I run the failed commands on Ubuntu they pass for me. I'm not sure how you want to handle this now.
@dpquigl It seems that most of the tests are failing because (1) the packaging is failing to include or properly place ztest and (2) there is a panic occurring in the zconfig test. Also, the Debian 8 slave disconnected at some point (that happens occasionally).
@behlendorf So the last two runs that went through came up pretty well, with several of the machines coming up all green. I'm hoping this run with the tunables does the same, and we can try to get the last of the problems figured out and get this merged. If we can do that, then I think it would be better to rebase the zfs resume send patches on top of compressed arc instead of the other way around.
@dpquigl it's great to see this PR coming together. But we're going to need to run down any observed failures before this can be merged, in order to keep the master branch in good shape for other developers. Ignoring some spurious buildbot infrastructure problems and a long-standing ztest failure, all the tests have been passing on all the builders. I've resubmitted the failures above in order to get clean test runs to determine if any new issues were introduced. I'd also like to merge this PR after the resumable send changes since that's the same order they were merged in OpenZFS. This afternoon I resolved the last remaining issue with #4742, so as long as it passes the automated testing I expect to be able to merge it tomorrow. That will help bring a few key areas back in sync with the OpenZFS code. After it's merged it would be great if you could rebase this against master. Then I'll be happy to review it and, time permitting, help run down any remaining outstanding issues.
Rebased on top of latest master.
So now that the zfs send patches have been merged and this has been rebased, can we work on looking into these test failures?
@behlendorf Is there any way we can get a concerted effort to get this looked at and merged? I have a bunch of other work that depends on this, and I'd like not to have to keep carrying these patches forward.
@dpquigl the finalized version of this feature was just merged to the OpenZFS repository, so we should be in a good position to get this merged. If you can refresh this against master, making sure to include any last-minute fixes made to the OpenZFS version, we can get this merged.
@behlendorf The version currently up there contains all of George's fixes. According to the PR here, there are no conflicts with master that would prevent you from merging the commit. We had a face-to-face meeting this week, and the completed compressed ARC code passed our internal test machines with 100% test success. I'm not sure why the testers here are failing, but we're seeing good results from the code internally. If you need the patch rebased I can try to do it this weekend, but it will most likely have to wait until Monday.
@dpquigl that's great news regarding your internal testing. Many of the test failures appear to have been due to problems in the master branch which have been addressed since this PR was last refreshed. For testing purposes I've opened #4863 with this patch rebased against the latest master code. If things go well there this should be ready to merge.
Minor issue when building without debugging enabled
Also, just a heads up: there will be some simple conflicts to resolve against master due to 25458cb, which was merged. I'd like to get this merged, so I'll be spending some time looking into the remaining testing failures.
Original comments from @angstymeat which were posted to #4863. Thanks for testing this out and providing us some feedback.
This is almost certainly caused by the use of …
@behlendorf Made the change you requested and pushed it as a separate commit for now. You can feel free to merge the commit into the first one during merge.
@behlendorf The built-in builds seem to fail because of a bug in SPL.
@dpquigl thanks. The built-in failures should be resolved fairly soon; they're due to changes in the latest kernel.org kernel.
Closing, replaced by #5009.
Authored by: George Wilson [email protected]
Reviewed by: Prakash Surya [email protected]
Reviewed by: Dan Kimmel [email protected]
Reviewed by: Matt Ahrens [email protected]
Reviewed by: Paul Dagnelie [email protected]
Ported by: David Quigley [email protected]
This review covers the reading and writing of compressed arc headers, sharing
data between the arc_hdr_t and the arc_buf_t, and the implementation of a new
dbuf cache to keep frequently accessed data uncompressed.
I've added a new member to l1 arc hdr called b_pdata. The b_pdata always hangs
off the arc_buf_hdr_t (if an L1 hdr is in use) and points to the physical block
for that DVA. The physical block may or may not be compressed. If compressed
arc is enabled and the block on-disk is compressed, then the b_pdata will match
the block on-disk and remain compressed in memory. If the block on disk is not
compressed, then neither will be the b_pdata. Lastly, if compressed arc is
disabled, then b_pdata will always be an uncompressed version of the on-disk
block.
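As a rough illustration of the layout described above (field names abbreviated; a sketch, not the exact patch):

```c
/* Simplified view of how b_pdata hangs off the header. */
typedef struct l1arc_buf_hdr {
	/* ... other L1-only state ... */
	arc_buf_t	*b_buf;		/* uncompressed arc_buf_t's, if any */
	void		*b_pdata;	/* physical block for this DVA; matches
					 * the on-disk block (possibly
					 * compressed) when compressed arc
					 * is enabled */
} l1arc_buf_hdr_t;

typedef struct arc_buf_hdr {
	/* ... DVA, birth txg, flags, etc. ... */
	l1arc_buf_hdr_t	b_l1hdr;	/* valid only when an L1 hdr is in use */
} arc_buf_hdr_t;
```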
Typically the arc will cache only the arc_buf_hdr_t and will aggressively evict
any arc_buf_t's that are no longer referenced. This means that the arc will
primarily have compressed blocks as the arc_buf_t's are considered overhead and
are always uncompressed. When a consumer reads a block we first look to see if
the arc_buf_hdr_t is cached. If the hdr is cached then we allocate a new
arc_buf_t and decompress the b_pdata contents into the arc_buf_t's b_data. If
the hdr already has an arc_buf_t, then we will allocate an additional arc_buf_t
and bcopy the uncompressed contents from the first arc_buf_t to the new one.
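In pseudocode, the read path described above looks roughly like this (the helpers marked hypothetical are stand-ins for the real ARC internals):

```c
static arc_buf_t *
arc_read_cached(arc_buf_hdr_t *hdr)
{
	arc_buf_t *first = hdr->b_l1hdr.b_buf;
	arc_buf_t *buf = arc_buf_alloc_impl(hdr);	/* hypothetical */

	if (first != NULL) {
		/* An uncompressed copy already exists; duplicate it. */
		bcopy(first->b_data, buf->b_data, arc_buf_size(buf));
	} else {
		/* Decompress b_pdata into the new buffer's b_data. */
		arc_decompress(hdr, buf->b_data);	/* hypothetical */
	}
	return (buf);
}
```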
Writing to the compressed arc requires that we first discard the b_pdata since
the physical block is about to be rewritten. The new data contents will be
passed in via an arc_buf_t (uncompressed) and during the I/O pipeline stages we
will copy the physical block contents to a newly allocated b_pdata.
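The write side, sketched under the same caveat (helper names are hypothetical):

```c
static void
arc_release_pdata(arc_buf_hdr_t *hdr)
{
	/*
	 * The on-disk block is about to be rewritten, so the cached
	 * physical copy is stale; discard it now. A fresh b_pdata is
	 * allocated from the (possibly compressed) block contents
	 * later in the I/O pipeline.
	 */
	if (hdr->b_l1hdr.b_pdata != NULL) {
		arc_free_data_buf(hdr->b_l1hdr.b_pdata); /* hypothetical */
		hdr->b_l1hdr.b_pdata = NULL;
	}
}
```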
When an l2arc is in use, it will also take advantage of the b_pdata. Now the
l2arc will always write the contents of b_pdata to the l2arc. This means that
when compressed arc is enabled, the l2arc blocks are identical to those
stored in the main data pool. This provides a significant advantage since we
can leverage the bp's checksum when reading from the l2arc to determine if the
contents are valid. If the compressed arc is disabled, then we must first
transform the read block to look like the physical block in the main data pool
before comparing the checksum and determining whether it is valid.
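The verification difference can be sketched as follows (all helper names here are hypothetical stand-ins, not the actual patch):

```c
static int
l2arc_read_verify(zio_t *zio, arc_buf_hdr_t *hdr)
{
	if (compressed_arc_enabled) {			/* hypothetical flag */
		/*
		 * The l2arc block is byte-identical to the block in the
		 * main pool, so the bp's checksum applies directly.
		 */
		return (l2arc_cksum_valid(zio));	/* hypothetical */
	}

	/*
	 * Compressed arc is disabled: transform the cached copy back
	 * into its physical (on-disk) form before checksumming.
	 */
	l2arc_transform_to_physical(zio, hdr);		/* hypothetical */
	return (l2arc_cksum_valid(zio));		/* hypothetical */
}
```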
OpenZFS Issue: https://www.illumos.org/issues/6950