Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dedicated box testing] ABD2 + Illumos #4950 + Illumos #2605 (master January 16th, 2015) #4235

Conversation

kernelOfTruth
Copy link
Contributor

references:
#3441 ABD: linear/scatter dual typed buffer for ARC (ver 2)
#4207 Illumos #4950 files sometimes can't be removed from a full filesystem
#4213 Illumos 2605 want to resume interrupted zfs send

4.5 linux kernel compatibility [upstream] changes haven't been included yet since they locally lead to compilation failure (base kernel 4.4.0)

tuxoko and others added 30 commits January 16, 2016 21:59
zfsolinux currently uses vmalloc backed slab for ARC buffers. There are some
major problems with this approach. One is that 32-bit system have only a
handful of vmalloc space. Another is that the fragmentation in slab will easily
deplete memory in busy system.

With ABD, we use scatterlist to allocate data buffers. In this approach we can
allocate in HIGHMEM, which alleviates vmalloc space pressure on 32-bit. Also,
we don't have to rely on slab, so there's no fragmentation issue.

But for metadata buffers, we still uses linear buffer from slab. The reason for
this is that there are a lot of *_phys pointers directly point to metadata
buffers. Thus, it would be much more complicated to use scatter buffer for
metadata.

Currently, ABD is not enabled and its API will treat them as normal buffers.
We will enable it once all relevant code is modified to use the API.

Signed-off-by: Chunwei Chen <[email protected]>
Modify/Add incremental fletcher function prototype to match abd_iterate_rfunc
callback type. Also, reduce duplicated code a bit in zfs_fletcher.c.

Signed-off-by: Chunwei Chen <[email protected]>
1. Use abd_t in arc_buf_t->b_data, dmu_buf_t->db_data, zio_t->io_data and
zio_transform_t->zt_orig_data
2. zio_* function take abd_t for data

Signed-off-by: Chunwei Chen <[email protected]>
1. Add checksum function for abd_t
2. Use abd_t version checksum function in zio_checksum_table
3. Make zio_checksum_compute and zio_checksum_error handle abd_t

Signed-off-by: Chunwei Chen <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Use ABD API on related pointers and functions.(b_data, db_data, zio_*(), etc.)

Suggested-by: DHE <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
Currently, abd_uiomove repeatedly calls abd_copy_*_off. The problem is that it
will need to do abd_miter_advance repeatedly over the parts that were skipped
before.

We split out the miter part of the abd_copy_*_off into abd_miter_copy_*. These
new function will take miter directly and they will automatically advance it
after finish. We initialize an miter in uiomove and use the iterator copy
functions to solve the stated problem.

Signed-off-by: Chunwei Chen <[email protected]>
The check is needed to make sure the user buffer is indeed in user space. Also
change copy_{to,from}_user to __copy_{to,from}_user so that we don't
repeatedly call access_ok.

Signed-off-by: Chunwei Chen <[email protected]>
When we aren't allocating in HIGHMEM, we can try to allocate contiguous pages,
we can also use sg_alloc_table_from_pages to merge adjacent pages for us. This
will allow more efficient cache prefetch and also reduce sg iterator overhead.
And this has been tested to greatly improve performance.

Signed-off-by: Jinshan Xiong <[email protected]>
Signed-off-by: Chunwei Chen <[email protected]>
This patch adds non-highmem scatter ABD and also adds a new set of functions,
namely abd_buf_segment and friends, for accessing those ABD.

This is a preparation for metadata scatter ABD.

Signed-off-by: Chunwei Chen <[email protected]>
Some metadata types are not required or not easily to use scatter ABD. So to
allow ARC to accommodate both types of metadata, we add a new flag
ARC_FLAG_META_SCATTER to indicate which type the buffer belongs.

We also introduce a new type arc_buf_alloc_t, which is basically
arc_buf_contents_t with a flag indicating scatter metadata. Users can pass it
to arc_buf_alloc to decide which buffer type to allocate.

Signed-off-by: Chunwei Chen <[email protected]>
We add a new member ot_scatter to dmu_ot to determine the ABD type for each
DMU type. We temporary set them all to FALSE, until some types are ready to
handle scatter abd.

BP_GET_BUFA_TYPE and DBUF_GET_BUFA_TYPE will returns with the scatter flag set
up accordingly, so they can be passed to arc_buf_alloc, and the ARC subsystem
will returns the correct ABD type.

Signed-off-by: Chunwei Chen <[email protected]>
Use non-highmem scatter ABD for indirect block, i.e. level > 0. abd_array
should be used to access the blkptr.

Signed-off-by: Chunwei Chen <[email protected]>
Use non-highmem scatter ABD for dnode array blocks, abd_array should be used
to access the dnodes.

Signed-off-by: Chunwei Chen <[email protected]>
abd_alloc_scatter always allocate nr*PAGE_SIZE memory. So in order to save
memory, we allow it to fallback to do linear allocation when size is less than
PAGE_SIZE.

Note that orignally, we requires that zio->io_data remains the same type
before it reaches lower level, vdev_{queue,cache,disk}. After this patch,
however, transformation like compression may change a scatter io_data to
linear. This is fine, because every scatter ABD aware API can also take linear
ABD, but not vice versa. So we relax the type check in push transform to check
linear only.

Signed-off-by: Chunwei Chen <[email protected]>
In dsl_scan_scrub_cb and spa_load_verify_cb, we originally always allocated
linear ABD. Now we try to allocate scatter ABD according to the BUFA type of
the blkptr to reduce unnecessary spl slab allocation.

Also in zio_ddt_read_start, we match the parent zio->io_data ABD type for the
same reason.

Signed-off-by: Chunwei Chen <[email protected]>
SPA history is access purely through dmu_read/dmu_write. It doesn't require
any modification except for setting ot_scatter.

Signed-off-by: Chunwei Chen <[email protected]>
Add abd version byteswap function with the name "abd_<old bswap func name>".

Note that abd_byteswap_uint*_array and abd_dnode_buf_byteswap can handle
scatter buffer, so now we don't need extra borrow/copy.

Signed-off-by: Chunwei Chen <[email protected]>
The arc_c value can be updated concurrently by multiple threads including
the arc reclaim thread, kswapd, kthreadd and user processes.  This patch
updates it under a mutex to close the race between the checking of its
value and subsequent updates.

Reverts: 935434e
Fixes: openzfs#3904
Fixes: openzfs#4161
…esystem

Reviewed by: Adam Leventhal <[email protected]>
Reviewed by: George Wilson <[email protected]>
Reviewed by: Sebastien Roy <[email protected]>
Reviewed by: Boris Protopopov <[email protected]>
Approved by: Dan McDonald <[email protected]>
Ported-by: Richard Yao <[email protected]>

Porting notes:
1. ZoL currently does not log discards to zvols, so the portion of this patch that
modifies the discard logging to mark it as freeing space has been discarded.

2. may_delete_now had been removed from zfs_remove() in ZoL. It has been
reintroduced.

3. We do not try to emulate vnodes, so the following lines are not valid
on Linux:

	mutex_enter(&vp->v_lock);
	may_delete_now = vp->v_count == 1 && !vn_has_cached_data(vp);
	mutex_exit(&vp->v_lock);

This has been replaced with:

	mutex_enter(&zp->z_lock);
	may_delete_now = atomic_read(&ip->i_count) == 1 && !(zp->z_is_mapped);
	mutex_exit(&zp->z_lock);
Reviewed by: George Wilson <[email protected]>
Reviewed by: Paul Dagnelie <[email protected]>
Reviewed by: Richard Elling <[email protected]>
Reviewed by: Xin Li <[email protected]>
Reviewed by: Arne Jansen <[email protected]>
Approved by: Dan McDonald <[email protected]>

References:
https://www.illumos.org/issues/2605
illumos/illumos-gate@9c3fd12

Porting notes:

The tests from Illumos in usr/src/test/zfs-tests were left out
Further the following files which don't exist in ZoL:
usr/src/cmd/truss/expound.c
usr/src/lib/libzfs/common/mapfile-vers
usr/src/lib/libzfs_core/common/mapfile-vers
usr/src/pkg/manifests/system-test-zfstest.mf

From a readability point of view the changes in [module/zfs/dmu_traverse.c]
from openzfs@ecfb0b5 Fix misuse of input argument in traverse_visitbp
appear to be the optimum - also there's no functional change,
when going with upstream the compiler would likely complain about ISO C90 errors - so only switching
cdnp = buf->b_data;
to
child_dnp = buf->b_data;

diverged code base from Illumos:

[cmd/zstreamdump/zstreamdump.c]
openzfs@37f8a88 Illumos 5746 - more checksumming in zfs send

[include/sys/zfs_ioctl.h]
openzfs@0cee240 Speed up 'zfs list -t snapshot -o name -s name'

[module/zfs/zfs_ioctl.c]
openzfs@13fe019 Illumos 3464 zfs synctask code needs restructuring

[module/zfs/dmu_send.c]
openzfs@b58986e Use large stacks when available
openzfs@044baf0 Use taskq for dump_bytes()
ISO C90 - mixed declarations and code: 	struct send_block_record *to_data;
					struct receive_ign_obj_node *n;
account for 'missing braces around initializer' in 'struct send_thread_arg to_arg ='
ISO C90 - mixed declarations and code: 	void *payload = NULL;
					nvlist_t *nvl = fnvlist_alloc();
					uint64_t one = 1; uint64_t zero = 1;
					char recvname[ZFS_MAXNAMELEN];
					uint64_t val;
					uint32_t payloadlen = drc->drc_drr_begin->drr_payloadlen;
					size_t payload_len = 0;

[module/zfs/dsl_dataset.c]
Illumos 4185 add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R
hasn't landed yet, thus
' #include <sys/zio_checksum.h>
is still missing,

' #include <sys/dmu_send.h>
however is needed

further
replaced
fletcher_4_native(compressed, compressed_size, NULL, &cksum);
with
fletcher_4_native(compressed, compressed_size, &cksum);

ctx_template argument was removed (purpose: the template will be reused to make initialization more efficient)

C99 or C11 mode: declare 'int i' at the beginning of the function.

ISO C90 - mixed declarations and code: 	char buf[256];
					zio_cksum_t cksum;
					char *propval = kmem_asprintf("%u-%llx-%llx-%s",
					char recvname[ZFS_MAXNAMELEN];
					dsl_dataset_t *recv_ds;

[libzfs_sendrecv.c]
' #include <sha2.h>
isn't listed in the header list since sha2, sha256 is implemented differently in ZoL.

replace
fletcher_4_native(compressed, len, NULL, &cksum);
with
fletcher_4_native(compressed, len, &cksum);

declare 'int nread, i;' at the beginning of the function

openzfs@b8864a2 Fix gcc cast warnings

[module/zcommon/zfs_prop.c]
openzfs@11b9ec2 Add full SELinux support

[module/zfs/dmu_traverse.c]
openzfs@ecfb0b5 Fix misuse of input argument in traverse_visitbp
openzfs@79c76d5 Change KM_PUSHPAGE -> KM_SLEEP
openzfs@a168788 Reduce stack for traverse_visitbp() recursion
openzfs@b8d06fc Switch KM_SLEEP to KM_PUSHPAGE
openzfs@47050a8 Fix stack traverse_impl()
ISO C90: place 'uint64_t obj, error;' at the beginning, further replace 'err' with 'error'

8fbbb17 Enable scatter ABD for DMU_OT_DNODE

[man/man8/zfs.8]
Part of the sections for 'zfs receive' and 'zfs send' was rewritten and reordered to approximate upstream.

Ported-by: kernelOfTruth [email protected]
@kernelOfTruth
Copy link
Contributor Author

buildbots - kindly also build 4950 and 2605 commits ?

anyway, will see tomorrow if there still is stall in these

http://buildbot.zfsonlinux.org/console

the weird thing is that on the 29th commit (Update arc_c under a mutex) instead of the 31st the
TEST buildbots are being let loose (and already have finished, but no progress on commit 29/31 or thereafter) - which should actually only be run on the last commit in a pull-request 🎱

@kernelOfTruth
Copy link
Contributor Author

same as, but with split out commits:
#4236
[dedicated box testing] ABD2 + Illumos #4950 + Illumos #2605 (master January 16th, 2015)

@kernelOfTruth kernelOfTruth changed the title [eval][buildbot] ABD2 + Illumos #4950 + Illumos #2605 (master January 16th, 2015) [dedicated box testing] ABD2 + Illumos #4950 + Illumos #2605 (master January 16th, 2015) Jan 17, 2016
@behlendorf
Copy link
Contributor

Closing as stale.

@behlendorf behlendorf closed this Mar 24, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants