OpenZFS 7104 - increase indirect block size #5679

Closed

gmelikov wants to merge 1 commit from the autoport-oz7104 branch.

Conversation

gmelikov (Member)

Authored by: Matthew Ahrens [email protected]
Reviewed by: George Wilson [email protected]
Reviewed by: Paul Dagnelie [email protected]
Reviewed by: Dan McDonald [email protected]
Approved by: Robert Mustacchi [email protected]
Ported-by: George Melikov [email protected]

OpenZFS-issue: https://www.illumos.org/issues/7104
OpenZFS-commit: openzfs/openzfs@4b5c8e9

@@ -679,6 +679,8 @@ typedef struct zpool_rewind_policy {

/*
 * This is needed in userland to report the minimum necessary device size.
 *
 * Note that the zfs test suite uses 64MB vdevs.
Contributor

Your changes to the test suite below render this comment obsolete :-)

gmelikov (Member Author)

Removed this change, thanks!

@gmelikov (Member Author)

Looks like this PR requires more disk space for the buildbot tests. @behlendorf Should I correct the PR somehow?

@behlendorf (Contributor)

@gmelikov if you make the following two changes we should be able to get a full test run.

First, creating a sparse volume for zconfig test 6 should significantly reduce the space requirements and allow the test to pass.

diff --git a/scripts/zconfig.sh b/scripts/zconfig.sh
index 45b6644..e78c80d 100755
--- a/scripts/zconfig.sh
+++ b/scripts/zconfig.sh
@@ -360,7 +360,7 @@ test_6() {
        # Create a pool and volume.
        ${ZFS_SH} zfs="spa_config_path=${TMP_CACHE}" || fail 1
        ${ZPOOL_CREATE_SH} -p ${POOL_NAME} -c lo-raid0 || fail 2
-       ${ZFS} create -V 800M ${FULL_ZVOL_NAME} || fail 3
+       ${ZFS} create -s -V 800M ${FULL_ZVOL_NAME} || fail 3
        ${ZFS} set snapdev=visible ${FULL_ZVOL_NAME} || fail 3
        label /dev/zvol/${FULL_ZVOL_NAME} msdos || fail 4
        partition /dev/zvol/${FULL_ZVOL_NAME} primary 1 -1 || fail 4
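
For context, a minimal sketch (with hypothetical pool/volume names) of what the -s flag changes: a sparse volume carries no refreservation, so it consumes pool space only as blocks are actually written instead of reserving the full 800M up front.

# Thick zvol: reserves the full volume size via refreservation.
zfs create -V 800M tank/vol-thick
# Sparse zvol: refreservation=none; space is allocated only as data is written.
zfs create -s -V 800M tank/vol-thin
zfs get -o name,property,value refreservation,volsize tank/vol-thick tank/vol-thin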

Second, ziltest.sh is hard-coded to run under /var/tmp/ and there doesn't appear to be enough space there. Let's just disable it in this PR so we can run the entire ZFS Test Suite. If you add the following line to your commit message, ziltest will be skipped.

TEST_ZILTEST_SKIP="yes"

What needs to happen longer term is for us to move the zconfig.sh and ziltest.sh test cases into the ZFS Test Suite framework. There's not a really compelling reason to keep them separate anymore.

gmelikov force-pushed the autoport-oz7104 branch 2 times, most recently from 6b0ff71 to 7f49f36 on January 28, 2017 at 09:17
@gmelikov (Member Author)

@behlendorf did so, thank you!

@gmelikov (Member Author)

@behlendorf some tests require more space, for example /usr/share/zfs/zfs-tests/tests/functional/cli_root/zpool_import/setup.

@behlendorf (Contributor)

@gmelikov each of the builders has 32G of scratch space mounted at /mnt/ which is used for testing. In order to make it easy and (mostly) safe for anyone to run the test suite, the zfs-tests.sh script creates loopback devices backed by 2G sparse files in /mnt/. It appears that the increased minimum vdev size in the test suite exceeds the default 2G loopback device size.
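
Roughly, the loopback setup in zfs-tests.sh amounts to something like the following (paths and names here are illustrative, not the script's actual ones):

# Create a 2G sparse backing file; it occupies almost no space until written.
truncate -s 2G /mnt/zfs-test-vdev0
# Attach it to the first free loop device and print the resulting device name (requires root).
losetup --find --show /mnt/zfs-test-vdev0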

Increasing the default size of the files backing the loopback devices to 8G gives us enough space to run these tests, but that seems excessive. These tests ran perfectly well with 200M partitions before this patch, and now we're requesting 768M ones. Some tests may also now take longer to run with larger devices.

My suggestion would be that we split the difference. Create 4G loopback devices and 512M vdevs for these test cases.

diff --git a/scripts/zfs-tests.sh b/scripts/zfs-tests.sh
index 01fb2b2..087c83b 100755
--- a/scripts/zfs-tests.sh
+++ b/scripts/zfs-tests.sh
@@ -39,7 +39,7 @@ QUIET=
 CLEANUP=1
 CLEANUPALL=0
 LOOPBACK=1
-FILESIZE="2G"
+FILESIZE="4G"
 RUNFILE=${RUNFILE:-"linux.run"}
 FILEDIR=${FILEDIR:-/var/tmp}
 DISKS=${DISKS:-""}
@@ -165,7 +165,7 @@ OPTIONS:
        -k          Disable cleanup after test failure
        -f          Use files only, disables block device tests
        -d DIR      Use DIR for files and loopback devices
-       -s SIZE     Use vdevs of SIZE (default: 2G)
+       -s SIZE     Use vdevs of SIZE (default: 4G)
        -r RUNFILE  Run tests in RUNFILE (default: linux.run)
 
 EXAMPLES:
diff --git a/tests/zfs-tests/tests/functional/cli_root/zpool_create/setup.ksh b/tests/zfs-tests/tests/functional/cli_root/zpool_create/setup.ksh
index e72f806..28adaea 100755
--- a/tests/zfs-tests/tests/functional/cli_root/zpool_create/setup.ksh
+++ b/tests/zfs-tests/tests/functional/cli_root/zpool_create/setup.ksh
@@ -52,12 +52,12 @@ if [[ -n $DISK ]]; then
         #
        cleanup_devices $DISK
 
-        partition_disk $((($MINVDEVSIZE / (1024 * 1024)) * 3))m $DISK 7
+        partition_disk $((($MINVDEVSIZE / (1024 * 1024)) * 2))m $DISK 7
 else
        for disk in `$ECHO $DISKSARRAY`; do
                cleanup_devices $disk
 
-               partition_disk $((($MINVDEVSIZE / (1024 * 1024)) * 3))m $disk 7
+               partition_disk $((($MINVDEVSIZE / (1024 * 1024)) * 2))m $disk 7
        done
 fi
diff --git a/tests/zfs-tests/tests/functional/cli_root/zpool_import/zpool_import.cfg b/tests/zfs-tests/tests/functional/cli_root/zpool_import/zpool_import.cfg
index c3a6950..d0ddc91 100644
--- a/tests/zfs-tests/tests/functional/cli_root/zpool_import/zpool_import.cfg
+++ b/tests/zfs-tests/tests/functional/cli_root/zpool_import/zpool_import.cfg
@@ -112,7 +112,7 @@ export DISK_COUNT ZFS_DISK1 ZFSSIDE_DISK1 ZFS_DISK2 ZFSSIDE_DISK2
 
 export FS_SIZE=1g
 export FILE_SIZE=$MINVDEVSIZE
-export SLICE_SIZE="$((MINVDEVSIZE / (1024 * 1024)))m"
+export SLICE_SIZE="$((($MINVDEVSIZE / (1024 * 1024)) * 2))m"
 export MAX_NUM=5
 export GROUP_NUM=3
 export DEVICE_DIR=$TEST_BASE_DIR/dev_import-test
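
To make the numbers concrete, assuming MINVDEVSIZE works out to 256M (consistent with the 768M partitions mentioned above; the exact value isn't quoted in this thread), the expressions above evaluate as follows:

MINVDEVSIZE=$((256 * 1024 * 1024))                 # assumed value, 256M
echo "$(((MINVDEVSIZE / (1024 * 1024)) * 3))m"     # 768m - old partition_disk argument
echo "$(((MINVDEVSIZE / (1024 * 1024)) * 2))m"     # 512m - new partition_disk argument and new SLICE_SIZE
echo "$((MINVDEVSIZE / (1024 * 1024)))m"           # 256m - old SLICE_SIZE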

Some additional minor tuning might be required based on the test results, but the VMs themselves should have enough space as long as we get the test parameters right.

This change also appears to have introduced a memory leak we'll need to run down.

http://build.zfsonlinux.org/builders/Ubuntu%2016.04%20x86_64%20Kmemleak%20%28TEST%29/builds/393/steps/shell_11/logs/kmemleak

@gmelikov (Member Author) commented Jan 31, 2017

@behlendorf I updated the PR. It looks like the FILESIZE variable exists in the zfs-buildbot scripts too; I changed it in openzfs/zfs-buildbot#53

behlendorf pushed a commit to openzfs/zfs-buildbot that referenced this pull request Jan 31, 2017
@behlendorf (Contributor)

@gmelikov you'll want to add this hunk to the patch when you next refresh it. It's from openzfs/openzfs@ec740b0 and resolves the reservation_013_pos and reservation_017_pos test failures. The proposed changes in #5714 should address the reservation_015_pos and reservation_016_pos failures due to the increased indirect block size.

diff --git a/usr/src/test/zfs-tests/tests/functional/reservation/reservation.shlib b/usr/src/test/zfs-tests/tests/functional/reservation/reservation.shlib
index e58e198..d0a9b38 100644
--- a/usr/src/test/zfs-tests/tests/functional/reservation/reservation.shlib
+++ b/usr/src/test/zfs-tests/tests/functional/reservation/reservation.shlib
@@ -25,7 +25,7 @@
 #
 
 #
-# Copyright (c) 2013 by Delphix. All rights reserved.
+# Copyright (c) 2013, 2016 by Delphix. All rights reserved.
 #
 
 . $STF_SUITE/tests/functional/reservation/reservation.cfg
@@ -144,7 +144,7 @@ function volsize_to_reservation
        typeset vol=$1
        typeset -i volsize=$2
 
-       typeset -i DN_MAX_INDBLKSHIFT=14
+       typeset -i DN_MAX_INDBLKSHIFT=17
        typeset -i SPA_BLKPTRSHIFT=7
        typeset -i SPA_DVAS_PER_BP=3
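
For context (not part of the patch), the shift change is what drives the new reservation math: each indirect block grows from 2^14 = 16K to 2^17 = 128K, so it holds more 128-byte block pointers and the worst-case metadata overhead per volume changes accordingly. A quick sketch of the arithmetic:

DN_MAX_INDBLKSHIFT=17     # indirect block size: 2^17 = 128K (previously 2^14 = 16K)
SPA_BLKPTRSHIFT=7         # block pointer size:  2^7  = 128 bytes
echo $((1 << DN_MAX_INDBLKSHIFT))                      # 131072 bytes per indirect block
echo $((1 << (DN_MAX_INDBLKSHIFT - SPA_BLKPTRSHIFT)))  # 1024 block pointers per indirect block (was 128)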
 

@gmelikov (Member Author) commented Feb 1, 2017

@behlendorf thanks, made these changes.

@behlendorf (Contributor)

@gmelikov please add TEST_ZFSTESTS_DISKSIZE=4G to the commit message. Then we'll get the larger loopback devices for this test which should address the setup failures.
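
For anyone following along, these buildbot directives are just plain lines appended to the commit message; a hypothetical example of where they sit (body abridged):

OpenZFS 7104 - increase indirect block size

<ported commit body and attribution lines>

TEST_ZILTEST_SKIP="yes"
TEST_ZFSTESTS_DISKSIZE=4G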

@gmelikov (Member Author) commented Feb 1, 2017

@behlendorf done.

@gmelikov (Member Author) commented Feb 1, 2017

@behlendorf it looks like TEST_ZFSTESTS_DISKSIZE=4G didn't work; maybe I did something wrong?

@behlendorf (Contributor)

@gmelikov it appears to be due to a bug in the test script; can you please review openzfs/zfs-buildbot#56? Then I can merge it and we can resubmit this.

@gmelikov (Member Author) commented Feb 1, 2017

@behlendorf thanks for the fix! I've already approved it.

gmelikov force-pushed the autoport-oz7104 branch 2 times, most recently from 4c7f896 to d4dc1d9 on February 2, 2017 at 07:57
@gmelikov (Member Author) commented Feb 2, 2017

Now /usr/share/zfs/zfs-tests/tests/functional/redundancy/redundancy_002_pos is failing with a strange error:

09:16:42.33 ERROR: /sbin/zpool replace -f testpool.28906 /var/tmp/basedir.14913/vdev0 /var/tmp/basedir.14913/vdev0 exited 1
09:16:42.33 invalid vdev specification
the following errors must be manually repaired:
/var/tmp/basedir.14913/vdev0 is part of active pool 'testpool.28906'

As I understand it, /sbin/zpool replace -f testpool.28906 /var/tmp/basedir.14913/vdev0 /var/tmp/basedir.14913/vdev0 should finish successfully. This test wasn't changed.

@behlendorf (Contributor)

@gmelikov the following small patch to the test suite should resolve the remaining failures.

diff --git a/tests/zfs-tests/tests/functional/cli_root/zpool_import/zpool_import.cfg b/tests/zfs-tests/tests/functional/cli_root/zpool_import/zpool_import.cfg
index d0ddc91..b49281f 100644
--- a/tests/zfs-tests/tests/functional/cli_root/zpool_import/zpool_import.cfg
+++ b/tests/zfs-tests/tests/functional/cli_root/zpool_import/zpool_import.cfg
@@ -110,8 +110,8 @@ esac
 
 export DISK_COUNT ZFS_DISK1 ZFSSIDE_DISK1 ZFS_DISK2 ZFSSIDE_DISK2
 
-export FS_SIZE=1g
-export FILE_SIZE=$MINVDEVSIZE
+export FS_SIZE="$((($MINVDEVSIZE / (1024 * 1024)) * 16))m"
+export FILE_SIZE="$(($MINVDEVSIZE / 2))"
 export SLICE_SIZE="$((($MINVDEVSIZE / (1024 * 1024)) * 2))m"
 export MAX_NUM=5
 export GROUP_NUM=3
diff --git a/tests/zfs-tests/tests/functional/redundancy/redundancy.kshlib b/tests/zfs-tests/tests/functional/redundancy/redundancy.kshlib
index 491aa3e..4f2f801 100644
--- a/tests/zfs-tests/tests/functional/redundancy/redundancy.kshlib
+++ b/tests/zfs-tests/tests/functional/redundancy/redundancy.kshlib
@@ -226,7 +226,9 @@ function replace_missing_devs
 
        typeset vdev
        for vdev in $@; do
-               log_must $MKFILE $MINVDEVSIZE $vdev
+               log_must $GNUDD if=/dev/zero of=$vdev \
+                    bs=1024k count=$(($MINVDEVSIZE / (1024 * 1024))) \
+                    oflag=fdatasync
                log_must $ZPOOL replace -f $pool $vdev $vdev
                while true; do
                        if ! is_pool_resilvered $pool ; then
  • The zpool_import_* failures were caused by FS_SIZE being set too small. This additional variable was added specifically for Linux, which is why the needed change wasn't in the OpenZFS patch. The updated FS_SIZE is now scaled by MINVDEVSIZE, as was done by the rest of the patch.

  • FILE_SIZE was reduced to half of $MINVDEVSIZE to speed up the run time, so we don't hit the 10m timeout when running on loopback devices. This is still 2x what it was previously.

  • The redundancy_* failures were not introduced by this patch, but it seems to have made them more likely; I've seen them occur infrequently in the past. The issue here is that zpool create uses O_DIRECT to read the labels from disk. This bypasses the cache, making it possible for old labels to still be discovered on disk when the test expects them to have been cleared (a quick way to observe this is sketched below).
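
One way to observe the stale-label situation described in the last point (illustrative; the vdev path is taken from the log above) is to dump the on-disk labels of the backing file directly:

# zdb -l prints the ZFS labels found on a device or file-backed vdev; if a
# previous pool's labels are still present, they show up here even though the
# test expected the file to be clean.
zdb -l /var/tmp/basedir.14913/vdev0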



@gmelikov (Member Author) commented Feb 3, 2017

@behlendorf thank you, applied your patch!

@gmelikov (Member Author) commented Feb 3, 2017

Another one:

Test: /usr/share/zfs/zfs-tests/tests/functional/snapshot/snapshot_008_pos (run as root) [00:00] [FAIL]
10:54:03.66 ASSERTION: Verify that destroying snapshots returns space to the pool.
10:54:03.67 NOTE: Populate the /var/tmp/testdir23092 directory
10:54:03.67 SUCCESS: /usr/share/zfs/zfs-tests/bin/file_write -o create -f /var/tmp/testdir23092/file1 -b 8192 -c 20 -d 1
10:54:03.68 SUCCESS: /sbin/zfs snapshot testpool.23092/[email protected]
10:54:03.69 SUCCESS: /usr/share/zfs/zfs-tests/bin/file_write -o create -f /var/tmp/testdir23092/file2 -b 8192 -c 20 -d 2
10:54:03.70 SUCCESS: /sbin/zfs snapshot testpool.23092/[email protected]
10:54:03.71 SUCCESS: /usr/share/zfs/zfs-tests/bin/file_write -o create -f /var/tmp/testdir23092/file3 -b 8192 -c 20 -d 3
10:54:03.72 SUCCESS: /sbin/zfs snapshot testpool.23092/[email protected]
10:54:03.72 SUCCESS: /usr/share/zfs/zfs-tests/bin/file_write -o create -f /var/tmp/testdir23092/file4 -b 8192 -c 20 -d 4
10:54:03.73 SUCCESS: /sbin/zfs snapshot testpool.23092/[email protected]
10:54:03.73 SUCCESS: /usr/share/zfs/zfs-tests/bin/file_write -o create -f /var/tmp/testdir23092/file5 -b 8192 -c 20 -d 5
10:54:03.75 SUCCESS: /sbin/zfs snapshot testpool.23092/[email protected]
10:54:03.75 SUCCESS: /usr/share/zfs/zfs-tests/bin/file_write -o create -f /var/tmp/testdir23092/file6 -b 8192 -c 20 -d 6
10:54:03.76 SUCCESS: /sbin/zfs snapshot testpool.23092/[email protected]
10:54:03.76 SUCCESS: /usr/share/zfs/zfs-tests/bin/file_write -o create -f /var/tmp/testdir23092/file7 -b 8192 -c 20 -d 7
10:54:03.77 SUCCESS: /sbin/zfs snapshot testpool.23092/[email protected]
10:54:03.78 SUCCESS: /usr/share/zfs/zfs-tests/bin/file_write -o create -f /var/tmp/testdir23092/file8 -b 8192 -c 20 -d 8
10:54:03.79 SUCCESS: /sbin/zfs snapshot testpool.23092/[email protected]
10:54:03.79 SUCCESS: /usr/share/zfs/zfs-tests/bin/file_write -o create -f /var/tmp/testdir23092/file9 -b 8192 -c 20 -d 9
10:54:03.80 SUCCESS: /sbin/zfs snapshot testpool.23092/[email protected]
10:54:03.81 SUCCESS: /sbin/zfs destroy testpool.23092/[email protected]
10:54:03.83 SUCCESS: /sbin/zfs destroy testpool.23092/[email protected]
10:54:03.84 SUCCESS: /sbin/zfs destroy testpool.23092/[email protected]
10:54:03.85 SUCCESS: /sbin/zfs destroy testpool.23092/[email protected]
10:54:03.87 SUCCESS: /sbin/zfs destroy testpool.23092/[email protected]
10:54:03.89 SUCCESS: /sbin/zfs destroy testpool.23092/[email protected]
10:54:03.90 SUCCESS: /sbin/zfs destroy testpool.23092/[email protected]
10:54:03.91 SUCCESS: /sbin/zfs destroy testpool.23092/[email protected]
10:54:03.92 SUCCESS: /sbin/zfs destroy testpool.23092/[email protected]
10:54:03.93 NOTE: Performing local cleanup via log_onexit (cleanup)
10:54:03.94 Space not freed. (3960835072 != 3962538496)

IIRC destroying a snapshot doesn't give back all of the space at once, and it may take some time to free it? Can we check that somehow?

@behlendorf (Contributor)

@gmelikov the snapshot_008_pos failure looks unrelated to this change. Regardless, I've resubmitted the build to be sure. Assuming it doesn't fail again, this should be ready to merge.

I've seen this particular failure occasionally in the past, and I suspect you're right that it's due to the space being freed asynchronously in the background. It has caused rare failures in this test and others. Let's address this failure in its own PR.
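
If we want to verify the asynchronous-freeing theory, one rough check (a sketch, assuming the pool is named testpool) is to poll the pool's 'freeing' property, which reports space still pending release after a destroy:

# Wait until the pool reports no more space pending release, then show the totals.
while [ "$(zpool get -Hp -o value freeing testpool)" -ne 0 ]; do
        sleep 1
done
zpool get -p freeing,free testpool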

@behlendorf (Contributor) commented Feb 4, 2017

@gmelikov can you please rebase this on master and remove the TEST_ZILTEST_SKIP="yes" and TEST_ZFSTESTS_DISKSIZE=4G lines from the commit message? Then we'll be able to verify everything's in good shape. I've updated the buildbot again to make 4G the default size.

@gmelikov (Member Author) commented Feb 4, 2017

@behlendorf of course, done.

gmelikov added a commit to gmelikov/zfs-buildbot that referenced this pull request Feb 5, 2017
gmelikov added a commit to gmelikov/zfs-buildbot that referenced this pull request Feb 5, 2017
behlendorf mentioned this pull request on Feb 8, 2017
@behlendorf (Contributor)

@gmelikov PR #5758 should resolve the ziltest.sh failures uncovered by this PR.

@behlendorf (Contributor)

@gmelikov I've rebased this branch on top of the ziltest fix which was merged.

gmelikov force-pushed the autoport-oz7104 branch 2 times, most recently from 5473c29 to c87e3e9 on February 9, 2017 at 08:51
@gmelikov (Member Author) commented Feb 9, 2017

@behlendorf thanks, I rebased my branch too.

behlendorf closed this in d7958b4 on Feb 9, 2017
wli5 pushed a commit to wli5/zfs that referenced this pull request Feb 28, 2017

Authored by: Matthew Ahrens <[email protected]>
Reviewed by: George Wilson <[email protected]>
Reviewed by: Paul Dagnelie <[email protected]>
Reviewed by: Dan McDonald <[email protected]>
Approved by: Robert Mustacchi <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Ported-by: George Melikov <[email protected]>

OpenZFS-issue: https://www.illumos.org/issues/7104
OpenZFS-commit: openzfs/openzfs@4b5c8e9
Closes openzfs#5679

wli5 pushed a commit to wli5/zfs that referenced this pull request Feb 28, 2017