[dedicated box testing] ABD2 + Illumos #4950 + Illumos #2605 (master January 16th, 2015) #4235
Conversation
ZFS on Linux currently uses vmalloc-backed slabs for ARC buffers. There are some major problems with this approach. One is that 32-bit systems have only a small amount of vmalloc space. Another is that slab fragmentation can easily deplete memory on a busy system. With ABD, we use a scatterlist to allocate data buffers. In this approach we can allocate from HIGHMEM, which alleviates vmalloc space pressure on 32-bit systems. Also, we no longer rely on the slab, so there is no fragmentation issue. For metadata buffers, however, we still use linear buffers from the slab. The reason is that many *_phys pointers point directly into metadata buffers, so it would be much more complicated to use scatter buffers for metadata. Currently, ABD is not enabled and its API treats buffers as normal linear buffers. We will enable it once all relevant code is modified to use the API. Signed-off-by: Chunwei Chen <[email protected]>
Modify/add the incremental fletcher function prototypes to match the abd_iterate_rfunc callback type. Also reduce duplicated code a bit in zfs_fletcher.c. Signed-off-by: Chunwei Chen <[email protected]>
1. Use abd_t in arc_buf_t->b_data, dmu_buf_t->db_data, zio_t->io_data and zio_transform_t->zt_orig_data 2. zio_* function take abd_t for data Signed-off-by: Chunwei Chen <[email protected]>
1. Add checksum function for abd_t 2. Use abd_t version checksum function in zio_checksum_table 3. Make zio_checksum_compute and zio_checksum_error handle abd_t Signed-off-by: Chunwei Chen <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
…d zil.c Signed-off-by: Chunwei Chen <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Use ABD API on related pointers and functions.(b_data, db_data, zio_*(), etc.) Suggested-by: DHE <[email protected]> Signed-off-by: Chunwei Chen <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Currently, abd_uiomove repeatedly calls abd_copy_*_off. The problem is that each call must abd_miter_advance over the parts that were already skipped. We split the miter part of abd_copy_*_off into abd_miter_copy_*. These new functions take a miter directly and automatically advance it when they finish. We initialize a miter in uiomove and use the iterator copy functions, so sequential copies resume where the previous one stopped. Signed-off-by: Chunwei Chen <[email protected]>
The check is needed to make sure the user buffer is indeed in user space. Also, change copy_{to,from}_user to __copy_{to,from}_user so that we don't call access_ok repeatedly. Signed-off-by: Chunwei Chen <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
When we aren't allocating from HIGHMEM, we can try to allocate contiguous pages and use sg_alloc_table_from_pages to merge adjacent pages for us. This allows more efficient cache prefetch and also reduces sg iterator overhead, and it has been tested to greatly improve performance. Signed-off-by: Jinshan Xiong <[email protected]> Signed-off-by: Chunwei Chen <[email protected]>
This patch adds non-highmem scatter ABD and a new set of functions, abd_buf_segment and friends, for accessing such ABDs. This is preparation for metadata scatter ABD. Signed-off-by: Chunwei Chen <[email protected]>
Some metadata types are not required to use, or cannot easily use, scatter ABD. To allow the ARC to accommodate both types of metadata, we add a new flag, ARC_FLAG_META_SCATTER, to indicate which type a buffer belongs to. We also introduce a new type, arc_buf_alloc_t, which is basically arc_buf_contents_t with a flag indicating scatter metadata. Users can pass it to arc_buf_alloc to decide which buffer type to allocate. Signed-off-by: Chunwei Chen <[email protected]>
We add a new member, ot_scatter, to dmu_ot to determine the ABD type for each DMU type. We temporarily set them all to FALSE until the individual types are ready to handle scatter ABD. BP_GET_BUFA_TYPE and DBUF_GET_BUFA_TYPE will return with the scatter flag set accordingly, so the result can be passed to arc_buf_alloc and the ARC subsystem will return the correct ABD type. Signed-off-by: Chunwei Chen <[email protected]>
Use non-highmem scatter ABD for indirect block, i.e. level > 0. abd_array should be used to access the blkptr. Signed-off-by: Chunwei Chen <[email protected]>
Use non-highmem scatter ABD for dnode array blocks, abd_array should be used to access the dnodes. Signed-off-by: Chunwei Chen <[email protected]>
abd_alloc_scatter always allocates nr*PAGE_SIZE memory. So, to save memory, we allow it to fall back to a linear allocation when the size is less than PAGE_SIZE. Note that originally we required zio->io_data to remain the same type until it reaches the lower levels, vdev_{queue,cache,disk}. After this patch, however, a transformation such as compression may change a scatter io_data to linear. This is fine, because every scatter-ABD-aware API can also take a linear ABD, but not vice versa. So we relax the type check in push transform to check for linear only. Signed-off-by: Chunwei Chen <[email protected]>
In dsl_scan_scrub_cb and spa_load_verify_cb, we originally always allocated linear ABD. Now we try to allocate scatter ABD according to the BUFA type of the blkptr to reduce unnecessary SPL slab allocation. Also, in zio_ddt_read_start, we match the parent zio->io_data ABD type for the same reason. Signed-off-by: Chunwei Chen <[email protected]>
SPA history is accessed purely through dmu_read/dmu_write. It doesn't require any modification except setting ot_scatter. Signed-off-by: Chunwei Chen <[email protected]>
Add ABD versions of the byteswap functions, named "abd_<old bswap func name>". Note that abd_byteswap_uint*_array and abd_dnode_buf_byteswap can handle scatter buffers directly, so we no longer need an extra borrow/copy. Signed-off-by: Chunwei Chen <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
The arc_c value can be updated concurrently by multiple threads including the arc reclaim thread, kswapd, kthreadd and user processes. This patch updates it under a mutex to close the race between the checking of its value and subsequent updates. Reverts: 935434e Fixes: openzfs#3904 Fixes: openzfs#4161
…esystem Reviewed by: Adam Leventhal <[email protected]> Reviewed by: George Wilson <[email protected]> Reviewed by: Sebastien Roy <[email protected]> Reviewed by: Boris Protopopov <[email protected]> Approved by: Dan McDonald <[email protected]> Ported-by: Richard Yao <[email protected]>
Porting notes:
1. ZoL currently does not log discards to zvols, so the portion of this patch that modifies the discard logging to mark it as freeing space has been discarded.
2. may_delete_now had been removed from zfs_remove() in ZoL. It has been reintroduced.
3. We do not try to emulate vnodes, so the following lines are not valid on Linux:
mutex_enter(&vp->v_lock);
may_delete_now = vp->v_count == 1 && !vn_has_cached_data(vp);
mutex_exit(&vp->v_lock);
This has been replaced with:
mutex_enter(&zp->z_lock);
may_delete_now = atomic_read(&ip->i_count) == 1 && !(zp->z_is_mapped);
mutex_exit(&zp->z_lock);
Reviewed by: George Wilson <[email protected]> Reviewed by: Paul Dagnelie <[email protected]> Reviewed by: Richard Elling <[email protected]> Reviewed by: Xin Li <[email protected]> Reviewed by: Arne Jansen <[email protected]> Approved by: Dan McDonald <[email protected]>
References: https://www.illumos.org/issues/2605 illumos/illumos-gate@9c3fd12
Porting notes:
The tests from Illumos in usr/src/test/zfs-tests were left out, as were the following files which don't exist in ZoL:
usr/src/cmd/truss/expound.c
usr/src/lib/libzfs/common/mapfile-vers
usr/src/lib/libzfs_core/common/mapfile-vers
usr/src/pkg/manifests/system-test-zfstest.mf
From a readability point of view, the changes in [module/zfs/dmu_traverse.c] from openzfs@ecfb0b5 (Fix misuse of input argument in traverse_visitbp) appear to be the optimum. There is no functional change; when going with upstream the compiler would likely complain about ISO C90 errors, so only "cdnp = buf->b_data;" was switched to "child_dnp = buf->b_data;".
Code base diverged from Illumos:
[cmd/zstreamdump/zstreamdump.c]
openzfs@37f8a88 Illumos 5746 - more checksumming in zfs send
[include/sys/zfs_ioctl.h]
openzfs@0cee240 Speed up 'zfs list -t snapshot -o name -s name'
[module/zfs/zfs_ioctl.c]
openzfs@13fe019 Illumos 3464 zfs synctask code needs restructuring
[module/zfs/dmu_send.c]
openzfs@b58986e Use large stacks when available
openzfs@044baf0 Use taskq for dump_bytes()
ISO C90 - mixed declarations and code:
struct send_block_record *to_data;
struct receive_ign_obj_node *n;
Account for 'missing braces around initializer' in 'struct send_thread_arg to_arg ='.
ISO C90 - mixed declarations and code:
void *payload = NULL;
nvlist_t *nvl = fnvlist_alloc();
uint64_t one = 1;
uint64_t zero = 1;
char recvname[ZFS_MAXNAMELEN];
uint64_t val;
uint32_t payloadlen = drc->drc_drr_begin->drr_payloadlen;
size_t payload_len = 0;
[module/zfs/dsl_dataset.c]
Illumos 4185 (add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R) hasn't landed yet, thus #include <sys/zio_checksum.h> is still missing; #include <sys/dmu_send.h>, however, is needed. Further, replaced fletcher_4_native(compressed, compressed_size, NULL, &cksum); with fletcher_4_native(compressed, compressed_size, &cksum); — the ctx_template argument was removed (its purpose: the template will be reused to make initialization more efficient). C99 or C11 mode: declare 'int i' at the beginning of the function.
ISO C90 - mixed declarations and code:
char buf[256];
zio_cksum_t cksum;
char *propval = kmem_asprintf("%u-%llx-%llx-%s",
char recvname[ZFS_MAXNAMELEN];
dsl_dataset_t *recv_ds;
[libzfs_sendrecv.c]
#include <sha2.h> isn't listed in the header list since sha2/sha256 is implemented differently in ZoL. Replaced fletcher_4_native(compressed, len, NULL, &cksum); with fletcher_4_native(compressed, len, &cksum); declare 'int nread, i;' at the beginning of the function.
openzfs@b8864a2 Fix gcc cast warnings
[module/zcommon/zfs_prop.c]
openzfs@11b9ec2 Add full SELinux support
[module/zfs/dmu_traverse.c]
openzfs@ecfb0b5 Fix misuse of input argument in traverse_visitbp
openzfs@79c76d5 Change KM_PUSHPAGE -> KM_SLEEP
openzfs@a168788 Reduce stack for traverse_visitbp() recursion
openzfs@b8d06fc Switch KM_SLEEP to KM_PUSHPAGE
openzfs@47050a8 Fix stack traverse_impl()
ISO C90: place 'uint64_t obj, error;' at the beginning; further, replace 'err' with 'error'.
8fbbb17 Enable scatter ABD for DMU_OT_DNODE
[man/man8/zfs.8]
Part of the sections for 'zfs receive' and 'zfs send' was rewritten and reordered to approximate upstream.
Ported-by: kernelOfTruth [email protected]
Buildbots - kindly also build the 4950 and 2605 commits? Anyway, will see tomorrow if there is still a stall in these: http://buildbot.zfsonlinux.org/console The weird thing is that it happens on the 29th commit (Update arc_c under a mutex) instead of the 31st.
Closing as stale.
references:
#3441 ABD: linear/scatter dual typed buffer for ARC (ver 2)
#4207 Illumos #4950 files sometimes can't be removed from a full filesystem
#4213 Illumos 2605 want to resume interrupted zfs send
The 4.5 Linux kernel compatibility [upstream] changes haven't been included yet since they locally lead to a compilation failure (base kernel 4.4.0).