
Improve ashift handling #2024

Closed
FransUrbo opened this issue Jan 4, 2014 · 22 comments · Fixed by #5763

Comments

@FransUrbo
Contributor

I just got three additional 3TB disks to replace the 1.5TB disks in a vdev that aren't under warranty any more.

The pool was originally created like this (all 1.5TB disks, five vdevs with three disks in each as a RAIDz):

History for 'share':
2011-11-03.22:41:47 zpool create -f -o ashift=12 share raidz1 scsi-SATA_ST31500341AS_9VS3S9YD scsi-SATA_ST31500341AS_9VS08THF scsi-SATA_ST31500341AS_9VS16S63
2011-11-03.22:41:50 zpool add -f share raidz1 scsi-SATA_ST31500341AS_9VS4XK4T scsi-SATA_ST31500341AS_9VS4Q3F4 scsi-SATA_ST1500DL003-9VT_5YD1F2KF
2011-11-03.22:41:54 zpool add -f share raidz1 scsi-SATA_ST31500341AS_9VS3SAWS scsi-SATA_ST31500341AS_9VS0DR98 scsi-SATA_ST31500341AS_9VS13W11
2011-11-03.22:41:58 zpool add -f share raidz1 scsi-SATA_ST31500341AS_9VS4VT5R scsi-SATA_ST31500341AS_9VS4Q38C scsi-SATA_ST31500341AS_9VS4WM30
2011-11-03.22:42:00 zpool add -f share raidz1 scsi-SATA_ST31500341AS_9VS4VT5X scsi-SATA_ST31500341AS_9VS4WWPA scsi-SATA_ST31500341AS_9VS0H3A9

The disks in the first vdev were replaced with 3TB disks some months ago:

2013-04-20.14:08:08 zpool replace -f share scsi-SATA_ST31500341AS_9VS08THF scsi-SATA_ST3000DM001-9YN_Z1F0X41F
2013-04-23.09:40:23 zpool replace share scsi-SATA_ST31500341AS_9VS3S9YD scsi-SATA_ST3000DM001-9YN_Z1F0WJSN
2013-04-24.21:55:34 zpool replace share scsi-SATA_ST31500341AS_9VS16S63 scsi-SATA_ST3000DM001-9YN_Z1F0X3P4

This naturally worked just fine, so I didn't have any suspicion that the current replace would fail. But I got 'devices have different sector alignment'. Googling this, I found #1381.

So double checking with zdb I found:

celia(ZFSRoot):~# zdb | grep ashift
            ashift: 12
            ashift: 9
            ashift: 9
            ashift: 9
            ashift: 9

So the zpool add didn't create the vdevs with ashift=12, but the default ashift=9!

This was not expected. I had expected the ashift to be inherited (all documentation talks about 'use ashift=12 when creating the pool' - which I did).

One could argue that this is a documentation issue, but I think that would be too easy a fix. Currently I see that '-o ashift=xx' is allowed for add, but I'm quite sure it wasn't when I created the pool a year ago. I don't remember the exact version I used then, but it was one of the 0.6.0 rc versions I think. 0.6.0-rc11 was stable at the time, so I guess that's the one...

PS. I've now done the first replace using '-o ashift=9' (it's been resilvering for about 14h, with 16h to go at ~200M/s). So this explains why my pool has been so darn slow! I thought it was the CPU/memory/controller that wasn't fast enough, so I never bothered to investigate further - it was fast enough for my use case...
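
For reference, the explicit override works for both operations; something like this, with hypothetical device names (in my case the replace had to use ashift=9 to match the existing vdev):

# zpool add -o ashift=12 share raidz1 disk-A disk-B disk-C
# zpool replace -o ashift=12 share old-disk new-disk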

@FransUrbo
Contributor Author

I just verified this with the latest HEAD:

root@debianzfsroot:/usr/src# dmesg | grep ZFS:
[   10.634825] ZFS: Loaded module v0.6.2-671_gf4179a8, ZFS pool version 5000, ZFS filesystem version 5
root@debianzfsroot:/usr/src# zdb | egrep '^[a-z]|ashift'
rpool:
            ashift: 12
root@debianzfsroot:/usr/src# zpool create -f -o ashift=12 test scsi-SATA_VBOX_HARDDISK_VB6-1a2b3c4d
root@debianzfsroot:/usr/src# zdb | egrep '^[a-z]|ashift'
rpool:
            ashift: 12
test:
            ashift: 12
root@debianzfsroot:/usr/src# zpool add -f test scsi-SATA_VBOX_HARDDISK_VB7-1a2b3c4d
root@debianzfsroot:/usr/src# zdb | egrep '^[a-z]|ashift'
rpool:
            ashift: 12
test:
            ashift: 12
            ashift: 9
root@debianzfsroot:/usr/src# zpool status
  pool: rpool
 state: ONLINE
  scan: none requested
config:

        NAME                                             STATE     READ WRITE CKSUM
        rpool                                            ONLINE       0     0     0
          raidz3-0                                       ONLINE       0     0     0
            scsi-SATA_VBOX_HARDDISK_VB05bb47dd-7d16cc18  ONLINE       0     0     0
            scsi-SATA_VBOX_HARDDISK_VB2be664e9-56c45d10  ONLINE       0     0     0
            scsi-SATA_VBOX_HARDDISK_VB472d7731-6f7e6042  ONLINE       0     0     0
            scsi-SATA_VBOX_HARDDISK_VBd99ecd61-e6bd2e05  ONLINE       0     0     0
            scsi-SATA_VBOX_HARDDISK_VBe5a09da2-3b9c7b36  ONLINE       0     0     0

errors: No known data errors

  pool: test
 state: ONLINE
  scan: none requested
config: 

        NAME                                    STATE     READ WRITE CKSUM
        test                                    ONLINE       0     0     0
          scsi-SATA_VBOX_HARDDISK_VB6-1a2b3c4d  ONLINE       0     0     0
          scsi-SATA_VBOX_HARDDISK_VB7-1a2b3c4d  ONLINE       0     0     0

errors: No known data errors

This disk has been exported with shareiscsi=blocksize=512 (it probably used the default volblocksize, which ended up as 512).

Unfortunately, it seems that VirtualBox has the block size hard-coded to 512, because I can't find anywhere to change it (I created a number of volumes with volblocksize=4096 and shared them with blocksize=4096, but I still end up with a 512-byte block device in the host).
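
(To double-check what the guest actually ends up seeing, the block layer's reported sector sizes can be read from sysfs inside the VM - hypothetical device name:)

root@debianzfsroot:/usr/src# cat /sys/block/sda/queue/logical_block_size
512
root@debianzfsroot:/usr/src# cat /sys/block/sda/queue/physical_block_size
512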

@behlendorf
Contributor

@FransUrbo I agree the current behavior isn't exactly what one would expect. It feels like when you're creating the pool you're setting a pool-wide property which will be inherited. Unfortunately, the code doesn't work that way; the optimum ashift value is vdev-specific. By default it will select whatever ashift size the drive claims is optimal, as long as it's compatible with the ashift sizes used by other vdevs.

At a minimum we should document this behavior clearly. Even better would be to make the code slightly smarter so zpool add prefers to use the same ashift size as other drives in the same vdev. Then only fall back to the drive reported size if it's impossible to use this preferred size.

There's some related work from the FreeBSD guys in #1671, but it still needs careful review and testing.
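
As an aside, what a drive claims can be checked on a Linux host before adding it, e.g. with lsblk (hypothetical device name; a typical 512e drive reports a 512-byte logical and a 4096-byte physical sector size, which corresponds to ashift=12):

# lsblk -o NAME,LOG-SEC,PHY-SEC /dev/sda
NAME LOG-SEC PHY-SEC
sda      512    4096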

FransUrbo added a commit to FransUrbo/zfs that referenced this issue Jun 6, 2014
behlendorf pushed a commit that referenced this issue Jun 6, 2014
Users need to be aware that when adding devices to an existing pool
they may need to override the automatically detected ashift value.
This will all depend on the exact hardware they are using.

Signed-off-by: Turbo Fredriksson <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes: #2024
@behlendorf behlendorf modified the milestones: 0.6.3, 0.6.4 Jun 6, 2014
@FransUrbo
Copy link
Contributor Author

@behlendorf I see that you put this in for 0.6.3, but the real solution is to write code that takes care of this. Maybe I shouldn't have written a 'Closes' in the commit, but instead 'Closes (partly)' :).

Could you tag this issue for 0.6.4 (and not close it if/when you accept the pull req), but keep the #2363 for 0.6.3?

@behlendorf
Copy link
Contributor

Sure, I've reopened the issue and bumped it to 0.6.4. But I am glad we're documenting the current behavior more clearly.

@behlendorf behlendorf modified the milestones: 0.6.4, 0.6.3 Jun 9, 2014
behlendorf pushed a commit that referenced this issue Jun 27, 2014
Users need to be aware that when replacing devices in an existing
pool they may need to override the automatically detected ashift value.
This will all depend on the exact hardware they are using.

Signed-off-by: Turbo Fredriksson <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #2024
@GregorKopka
Contributor

I agree the current behavior isn't exactly what one would expect.

This is a mild understatement.

It feels like when you're creating the pool you're setting a pool wide property which will be inherited.

Running

$ zpool get ashift *pool*
pool  ashift  12 local

certainly leads to that expectation.

Unfortunately, the code doesn't work that way; the optimum ashift value is vdev-specific.

It is very confusing.

If ZFS completely ignores the ashift property of the pool when it comes to managing vdevs (add, attach, replace) - then what is its use, and/or why does it still exist as a pool property?

@FransUrbo
Contributor Author

It is very confusing.

If ZFS completely ignores the ashift property of the pool when it comes to managing vdevs (add, attach, replace) - then what is its use, and/or why does it still exist as a pool property?

It is confusing, I grant you that. But a VDEV is part of a pool, which makes it a pool property (and not a dataset property).

@GregorKopka
Contributor

So zpool get ashift will report something like 9,12 when vdevs with different ashift exist?

@FransUrbo
Contributor Author

So zpool get ashift will report something like 9,12 when vdevs with different ashift exist?

No, but maybe it should?

@behlendorf behlendorf removed this from the 0.6.4 milestone Oct 30, 2014
@behlendorf behlendorf added the Type: Feature and Difficulty - Hard labels Oct 30, 2014
@behlendorf behlendorf changed the title ashift isn't inherited by added vdevs Improve ashift handling Oct 30, 2014
ryao pushed a commit to ryao/zfs that referenced this issue Nov 29, 2014
Users need to be aware that when replacing devices in an existing
pool they may need to override the automatically detected ashift value.
This will all depend on the exact hardware they are using.

Signed-off-by: Turbo Fredriksson <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#2024
@JKDingwall
Contributor

I've just been bitten by the assumption that the value used at create time is inherited for subsequent attach operations. Specifically the FAQ http://zfsonlinux.org/faq.html says:

"This tuning can only be set when the pool is first created and it will result in a decrease of capacity." 

I think this should be clarified asap.

@FransUrbo
Contributor Author

zpool(8):

ashift
  […]
  Keep  in  mind is that the ashift is vdev specific and is not a pool global.
  This means that when adding new vdevs to an existing pool you may
  need to specify the ashift.

It's clearly there… If you/people don't read the documentation, then it doesn't matter how or where we put it...

@JKDingwall
Contributor

I did read the documentation and it told me that the property can only be set when the pool is first created. It seems reasonable to assume that the documentation is consistent and correct, whether that is the man page or the FAQ. As it stands I consider that the FAQ text contradicts the man page, and I suggest it should be amended for clarity or that it should simply refer to the man page, if that is the authoritative source, rather than making a contradictory statement.

The line in the man page was added in 022f7bf which is pre-dated by https://web.archive.org/web/20140307194846/http://zfsonlinux.org/faq.html.

@FransUrbo
Contributor Author

There is no contradiction! Both apply:

The following property can be set at creation time:

ashift
  […]
  Additionally, the minimum (disk) write size will be set to the specified size, so this represents a space vs. performance trade-off.
  […]
  Since the property cannot be changed after pool creation […]
  […]
  Keep  in  mind is that the ashift is vdev specific and is not a pool global.

@JKDingwall
Contributor

I perceive a contradiction because my interpretation of the FAQ text is that the ashift value cannot be changed once the zpool has been created. This is correct in the sense that it cannot be changed for the original vdev, but it also implies that all vdevs which are subsequently attached will use the same value, which is not true and was clarified by the man page commit. It would be helpful for the FAQ to make the same clarification as the man page: that the parameter only applies to the initial vdev and that future operations should use the ashift option again when necessary.

@FransUrbo
Contributor Author

my interpretation of the FAQ text is that the ashift value cannot be changed once the zpool has been created.

Correct.

it also implies that all vdevs which are subsequently attached will use the same value

And that's where the man page comes in:

[…] ashift is vdev specific […] when adding new vdevs to an existing pool you may need to specify the ashift.

It would be helpful for the FAQ to make the same clarification as the man page that the parameter only applies to the initial vdev and that future operations should use the ashift option again when necessary.

I could agree on that. BUT,

  1. Code trumps man page
  2. Man page trumps FAQ
  3. FAQ trumps HOWTO

If you want to create a FAQ update, feel free to create a pull request against https://github.com/zfsonlinux/zfsonlinux.github.com.

@behlendorf
Contributor

@JKDingwall I'm all for updating the FAQ if you could propose some clear language to avoid confusion.

@behlendorf behlendorf added this to the 0.7.0 milestone Sep 22, 2015
@Sachiru

Sachiru commented Sep 23, 2015

Perhaps instead of the sentence

This tuning can only be set when the pool is first created and it will result in a decrease of capacity.

We can word it as

This tuning can only be set when devices are first added to a pool, such as when the pool is first created or when a new vdev is added. Do note that this setting will result in a decrease of capacity.

@FransUrbo
Contributor Author

Sounds good to me! @Sachiru do you also just happen, perchance, to have a wording for the FAQ? :)

@Sachiru

Sachiru commented Sep 24, 2015

I've created a pull request with the updated FAQ here:

zfsonlinux/zfsonlinux.github.com#30

@behlendorf
Contributor

@Sachiru thank you, I've merged your updated FAQ entry.

@rincebrain
Contributor

This really seems like a problem - I got burned by this as well: I set ashift=12 at creation time, not realizing that I would also need to set ashift=12 every time I added a device to the pool if I didn't want to mix ashift values.

If I cut a pull request that changed the zpool add behavior to default to the pool-wide ashift property if it's set to non-zero, and to require -f if it's set to 9 and one or more of the devices report ashift=12, would it be likely to be taken?

@FransUrbo
Contributor Author

@rincebrain Sounds likely to be accepted. IF you do the same for replace as well :). I've been planning on doing something similar, but I haven't had the energy or enough interest in doing it :).

@behlendorf
Contributor

@rincebrain we'd have to see the proposed change of course, but yes that sounds reasonable. As long as you handle both the add and replace cases.

@behlendorf behlendorf modified the milestones: 0.8.0, 0.7.0 Mar 25, 2016
@behlendorf behlendorf removed this from the 0.8.0 milestone Oct 11, 2016
@behlendorf behlendorf removed the Type: Feature label Oct 11, 2016
behlendorf pushed a commit that referenced this issue May 3, 2017
This commit allows higher ashift values (up to 16) in 'zpool create'.

The ashift value was previously limited to 13 (8K block) in b41c990
because the limited number of uberblocks we could fit in the
statically sized (128K) vdev label ring buffer could prevent the
ability to safely roll back a pool to recover it.

Since b02fe35 the largest uberblock size we support is 8K: this
allows us to store a minimum of 16 uberblocks in the vdev
label, even with higher ashift values.

Additionally change 'ashift' pool property behaviour: if set it will
be used as the default hint value in subsequent vdev operations
('zpool add', 'attach' and 'replace'). A custom ashift value can still
be specified from the command line, if desired.

Finally, fix a bug in add-o_ashift.ksh caused by a missing variable.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: loli10K <[email protected]>
Closes #2024 
Closes #4205 
Closes #4740 
Closes #5763
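
With this change in place, the reproducer from earlier in this issue behaves as expected - roughly, with hypothetical pool/device names (the pool's ashift property is only a hint and can still be overridden per operation with -o ashift):

# zpool create -f -o ashift=12 test disk-1
# zpool add -f test disk-2
# zdb | egrep '^[a-z]|ashift'
test:
            ashift: 12
            ashift: 12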
@gmelikov gmelikov removed the Type: Documentation label Nov 2, 2017