Filling special vdevs #15226
Comments
You can use https://github.com/markusressel/zfs-inplace-rebalancing as a stop-gap solution. Due to how it works, however, you should ensure no application is able to modify the files while the operation is in progress. I'm currently working on a more robust version, but atomicity in userspace isn't 100% possible.
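For readers unfamiliar with that approach, the idea is simply to rewrite each file so its records are freshly allocated (and therefore land on the special vdev if they qualify). A minimal sketch of the copy-verify-replace idea, with a hypothetical path, not the actual script, and with the same caveat about concurrent writers:

```sh
# Minimal sketch of the copy-verify-replace idea behind in-place rebalancing
# (not the actual zfs-inplace-rebalancing script). Rewriting a file forces
# ZFS to allocate new records, which land on the special vdev if they qualify.
# Unsafe if anything else modifies the file while this runs.
f="/tank/data/somefile"            # hypothetical path
cp -a "$f" "$f.rebalance.tmp"      # copy allocates fresh records
cmp "$f" "$f.rebalance.tmp" || { echo "verify failed: $f" >&2; exit 1; }
mv "$f.rebalance.tmp" "$f"         # replace original (not atomic vs. concurrent writers)
```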
This already exists; it's available via
You're missing the point completely. Try that with a large dataset - it's unrealistic to have a spare 30TB to do that. The point of in-place rebalancing is to do it without needing as much free space as your data takes up.
Try your approach with any dataset that has a somewhat recent snapshot which (for reasons) can't be dropped instantly: it'll double in allocated size and stay that way. No way around that without block pointer rewrite.
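For reference, the full-copy approach being debated here looks roughly like the following (dataset names hypothetical). It rewrites every record, so qualifying blocks land on the special vdev, but the pool must temporarily hold a second copy of the data, plus whatever the snapshots keep referenced:

```sh
# Hypothetical dataset names. After adding the special vdev, a full
# send/receive rewrites every record, so small blocks land on the
# special vdev -- at the cost of temporarily storing the data twice.
zfs snapshot tank/data@rebalance
zfs send -R tank/data@rebalance | zfs receive -u tank/data_new
# verify the copy and repoint applications/mountpoints, then:
zfs destroy -r tank/data
zfs rename tank/data_new tank/data
```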
That's only true if you do indeed copy every single file, but if the goal is preloading a special device then there are some tweaks that can be made. But again you seem to be missing the point; the entire purpose of this issue is to avoid the need for either of these options (in-place copying or entire dataset sending). ZFS already knows which records could be moved/copied onto the special device(s), we just need a way to do it that can be triggered separately from ordinary writing.
What you're asking for requires block pointer rewrite. Please read up on that topic before jumping to unfounded conclusions about anyone missing points.
It's only required if the data is actually moved from the vdev to the special device; if it's only copied between them then this should be less challenging. Most users aren't going to care whether the data is moved rather than copied, as the goal is to improve performance; this issue was also posted following #15118, where copying is the goal anyway. The fact that it's possible to migrate the data via "hacks" (zfs send/receive or "in place" copying) means ZFS can already do this without block pointer rewriting, it just needs to do it automatically and only for the required data, i.e. identify
As the whole point of the exercise is to move existing data to a different vdev - which, as a consequence of the design decisions taken at the beginning of ZFS to never overwrite data on-disk and to keep a strong checksum chain up to the uberblocks, is technically impossible without block pointer rewrite - you're asking for something that (at this point in time and for the foreseeable future) is simply not possible. With your argument of "Most users aren't going to care whether the data is moved rather than copied" you also invalidate any need for this feature, which basically boils down to a glorified wrapper around a bash script that fits onto an 80s-era text display. As for your argument that this would be just a follow-up to #15118: please see #15118 (comment)
That's not the point at all, as moving the data isn't required in the short term; the key goal is to "preload" a new special device so it's filled with data as quickly as possible. The primary purpose of a special device is to accelerate reads/writes to the array; the added capacity is a secondary benefit (and with #15118, potentially unwanted). While it would be nice if the data were moved, so that capacity could be freed up (as with recreating the pool/datasets), it's not a requirement for this feature to function. If we ever get proper block pointer rewriting it can be used to change this into a proper move, but the feature in no way requires that to be implemented first.
Please read the comment I linked (and the link inside it). That would give you what you desire, with a technically feasible approach that does not require changing the fundamentals of ZFS or ignoring a significant part of the ZFS feature set.
Who said anything about either? ZFS already has and manages copies of data (it is one of the most fundamental parts of ZFS). At most, the only new capability that might be required is the ability to indicate that a "new" record is a redundant copy of another, so it can be skipped by
Indicating that a "new" record is a copy of another requires rewriting the block pointer(s) toward the "old" record; otherwise pool free-space accounting will fail and lead to destruction of the pool. Would you please notice the word "rewrite" there and, from that, figure out what consequences it has?
No it doesn't; while block pointer rewriting would be beneficial, as it would avoid the need for "new" records to do this initially, it's also a capability that can be added later when (if) block pointer rewrite lands. I'm getting really tired of having to point out to you that features you keep shooting down as "impossible" without block pointer rewrite are simply intended to automate what can already be done via copying. If it can be done by wastefully copying everything, then it can be done by more precisely copying only what is needed. If it can later be done by only rewriting existing blocks then great, that'll be a fantastic benefit of that feature should it ever see the light of day. But a potential future feature that may never get implemented can't keep being used as an excuse not to implement other useful features.
@GregorKopka that seems unnecessarily hostile. As it is, vdev removal is a thing now, and that was also once considered not technically possible without block pointer rewrite. More conducive dialogue would be nice, such as discussing the limitations of the vdev removal approach and how it might be applicable here.
Hostilities started right here, after I pointed out technical problems with the suggestion.
Please see #9129 and related issues about the limitations of vdev removal and what could possibly be done about them. All current (and, AFAIK, so far suggested) approaches to vdev removal work on a completely different layer of the ZFS stack and, while they could in theory be augmented to distribute blocks from the to-be-removed vdev onto different vdevs by some additional criteria, add at least one level of indirection. That indirection not only increases required storage space (from mapping the old DVA to the new one), but also has to be traversed on every access, which impacts performance by increasing latency and reducing the available I/O bandwidth of the bare-metal drives. From this I do not see how taking that route could help with the stated goal of this issue: moving data onto other vdevs when there is not enough space in the pool to hold a temporary copy of the dataset.
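For context, the indirection being described comes from the existing top-level device removal feature; a brief illustration with hypothetical pool and vdev names:

```sh
# Hypothetical pool and vdev names. Removing a top-level vdev copies its
# allocated blocks onto the remaining vdevs and leaves behind an indirect
# mapping that every later access to those blocks has to go through.
zpool remove tank mirror-1
zpool status tank   # shows removal progress and the resulting indirect vdev
```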
That comment is in no way hostile whatsoever; even if it were (which it isn't) that wouldn't excuse your behaviour. You've been trolling multiple issues, arguing in bad faith, creating straw-man arguments, insulting people, and flat out just lying about what others have asked for, case in point:
Once again, the priority with filling a special device is the performance benefit of having the data preloaded, not reclaiming space. While the "vdev remap" layer could be usable, it's again not required, and may be overkill for the purpose. This feature request is asking for something we can already do ourselves (copy the files), except that having ZFS do it would allow it to be done more precisely and automatically (eliminating the need for users to somehow identify the files to copy themselves). While it would be great if it could be done by moving, if that's going to be too costly/difficult then it's not a requirement and never has been. I have now tried multiple times to remind you of what the actual goal of this issue is; I have to assume you are aware by now, yet you are still trying to tell everyone (including the issue's author) that they want something else. Why are you here if you don't care what the issue even is? Feel free to ignore it and related issues instead.
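To make concrete what "copy the files ourselves" means today: one has to guess which files would qualify (based on the dataset's special_small_blocks value) and rewrite them manually. A rough userspace sketch, with hypothetical paths and an assumed setting, which also illustrates why having ZFS identify the qualifying records itself would be more precise and safer:

```sh
# Rough userspace approximation (hypothetical paths; assumes the dataset has
# special_small_blocks=32K): files at or below that size are stored as a
# single small block, so rewriting them re-allocates those blocks onto the
# special vdev. It misses metadata (which ZFS places there itself) and is
# unsafe if anything modifies the files while it runs.
find /tank/data -type f -size -32k | while IFS= read -r f; do
    cp -a "$f" "$f.tmp" && mv "$f.tmp" "$f"
done
```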
Instead of assuming (ass-u-me: making an ass out of you and me) that I'm a troll who just wants to fuck with you out of pure boredom... you could maybe think for a moment about potential other reasons why someone who has been here for 10+ years repeatedly tries to point out the technical and practical shortcomings they perceive in the feature(s) you suggest.
And yet it holds true, as here you are with more insults. As I've already pointed out multiple times you haven't been raising "technical shortcomings" with this feature, you've been making false assumptions about what the feature's purpose is so you can shoot down a version of it that nobody actually asked for; if you don't want to give feedback on the feature that was actually requested then all you're doing is making straw-man arguments in bad faith that we all could very much do without. |
I'm hesitant to respond here, as the atmosphere definitely went from productive to personal sandbox*

On the technical side of things I have to agree with @Haravikk, especially on "This feature request is asking for something we can already do ourselves (copy the files), except that having ZFS do it would allow it to be done more precisely and automatically": this hits the nail on the head and more. It's impossible in userspace to ensure the safety of this operation. With Linux removing mandatory locks, unless one can ensure nothing is using the dataset, it's not safe to perform the operation. It's orders of magnitude more dangerous to do it in userspace, not to mention the efficiency.

On top of that, there are many scenarios in my line of work where moving the data isn't really the main objective; the goal is ensuring the special vdev is filled in to speed up the application. Currently my production system uses ~1.4% of data contents for the special vdev. If I had to settle for wasting 1.5% of space on the main pool so that it's copied to the special vdev before the data rotates, that would be perfectly acceptable. It would be great if the data were truly moved, but that's more a cherry on top in my use case.

Yes, doing this 100% perfectly is complex (or maybe even not feasible without large changes, as you pointed out). However, perfect is the enemy of good. There are tradeoffs everywhere, and as long as they are sane and documented, most people are OK with that. ZFS has many of them: it will probably use more space, it will use more CPU, and cause higher I/O usage... but in return provide a lot of benefits over FAT32 (yes, I'm deliberately hyperbolic here). It's a tradeoff.

*Offtopic:
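As an aside, for anyone wanting to check how much data currently sits on a special vdev in their own pool, per-vdev allocation is reported by zpool (pool name hypothetical):

```sh
# Shows per-vdev SIZE/ALLOC/FREE, including the special allocation class
# (hypothetical pool name).
zpool list -v tank
```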
Describe the feature you would like to see added to OpenZFS
I would like to see a new command, or possibly an option added to `zfs scrub`, that allows records to be transferred/copied onto `special` devices if capacity allows. Essentially this would scan the dataset(s) for records within the special record size, and copy/move them to the special device so they can immediately benefit from the faster device.
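Purely to illustrate the shape of what's being proposed (none of these commands or options exist in current OpenZFS, and today's scrub is the pool-level `zpool scrub`), it might look something like:

```sh
# Purely illustrative -- none of these exist in current OpenZFS.
# Either an option on the existing pool-level scrub:
zpool scrub --fill-special tank
# ...or a dedicated per-dataset command that scans for records within
# special_small_blocks and copies them onto the special vdev:
zfs fill-special tank/data
```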
How will this feature improve OpenZFS?
Currently when a `special` device is added to an existing pool there is no immediate benefit, as any existing records that could be on the special device will remain where they are. This means `special` devices really need to be added when a pool is created, or dataset(s) need to be re-created, in order to gain the full benefits. Otherwise only new records will be stored on the special device and thereby benefit from any increased speed.
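For context, the "only new records" behaviour is governed by the existing special_small_blocks dataset property; for example (dataset name and size chosen for illustration):

```sh
# Blocks up to this size written to tank/data will go to the special vdev,
# but only for records written after the property is set; existing records
# stay on the regular vdevs, which is what this request aims to address.
zfs set special_small_blocks=32K tank/data
zfs get special_small_blocks tank/data
```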
Additional context
I'm creating this request as an extension of issue #15118 ("write-through" special devices), which is why I mention transfer/copy: this feature, in the context of a "safe" special device (no redundancy required), would involve copying rather than moving qualifying records.
However, they should be considered independent: while this is more important for #15118 (as reloading would restore lost performance), it isn't critical that "reloading" be implemented first.