batch mode deduplication support (feature request/discussion) #1071
So what @ryao suggests is pretty much what happens today with inline dedup: the dedup tables are updated asynchronously during the txg writeback. The fundamental motivation for doing it inline is that I/O is expensive, so it's better to dedup the data once while it's already in memory. Of course, if your system doesn't have enough memory for the dedup tables, you end up performing I/O anyway and trashing performance. However, it seems to me that there is another logical place where dedup could be performed with minimal extra overhead, and that's during a scrub/resilver. Performing a lazy/batch dedup in the context of a scrub/resilver has a couple of... advantages:
disadvantages:
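To make the scrub-time idea above concrete, here is a minimal Python sketch (purely hypothetical: the dict-based block store, the SHA-256 checksum, and the remap step are all stand-ins for the real scrub and DDT machinery). The point is that a scrub already reads and checksums every block, so a dedup table could be populated as a side effect of that same pass:

```python
import hashlib

def scrub_with_dedup(blocks):
    """Toy model: a scrub already reads and checksums every block,
    so a dedup table can be built as a side effect of that pass.

    `blocks` maps a block address to its payload (bytes).
    Returns a remap of duplicate addresses to their canonical copy.
    """
    ddt = {}     # checksum -> canonical block address
    remap = {}   # duplicate address -> canonical address

    for addr, data in blocks.items():
        csum = hashlib.sha256(data).digest()  # the scrub verifies checksums anyway
        if csum in ddt:
            # Duplicate payload: point this address at the existing copy
            # instead of keeping a second physical block.
            remap[addr] = ddt[csum]
        else:
            ddt[csum] = addr
    return remap

if __name__ == "__main__":
    blocks = {0: b"aaa", 1: b"bbb", 2: b"aaa"}
    print(scrub_with_dedup(blocks))  # {2: 0}
```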
IMHO, the biggest problem facing dedup today is the extended lock-up time when doing a zfs destroy, and worse, the long mount time when rebooting in the middle of a destroy. Even with gobs of RAM, mounting a pool that is in the middle of destroying a big deduped filesystem can take hours, days, or even weeks if the disks are slow at random I/O. (I personally have experienced a reboot that took over 2 weeks on a large-capacity system used for backups.) I believe the solution to this general area is twofold. First, make zfs destroy a lazy, background process which does not hold any locks and does not affect pool mount time. Second, add a new vdev type: metadata. Get (say) two mirrored SSDs and add them to the pool as the "metadata" type, similar to how you would add a log or cache device currently. All metadata would live on this vdev, giving a nice boost to normal (non-deduped) pool performance as well as getting rid of the problem of losing the cache on reboot.
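A toy illustration of the lazy-destroy half of this proposal (hypothetical names throughout; real ZFS would persist the queue and free blocks via space maps): the destroy call just enqueues the work, and a background worker frees blocks in small batches, so neither the destroy itself nor a later pool import has to block on the frees.

```python
from collections import deque
import time

class LazyDestroyer:
    """Toy model of a background destroy: blocks belonging to a destroyed
    dataset are queued and freed in small batches, so neither the destroy
    call nor a subsequent pool import waits for all the frees."""

    def __init__(self, batch_size=1024):
        self.free_queue = deque()
        self.batch_size = batch_size

    def destroy(self, dataset_blocks):
        # Cheap from the caller's point of view: just enqueue the work.
        self.free_queue.extend(dataset_blocks)

    def background_step(self):
        # Called periodically (e.g. from a worker thread); frees at most
        # one batch per call, then yields back to the caller.
        for _ in range(min(self.batch_size, len(self.free_queue))):
            self.free_queue.popleft()
            # a real implementation would update space maps here
        return len(self.free_queue)  # remaining work

if __name__ == "__main__":
    d = LazyDestroyer(batch_size=2)
    d.destroy(range(5))
    while d.background_step():
        time.sleep(0.01)  # real code would pace itself against pool load
```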
@andys, this improvement is already implemented upstream in Illumos. (See https://www.illumos.org/issues/2619) ZoL will get it when the Feature Flags code is merged.
Cool. What do you think of my "dedicated metadata vdev" idea? I believe it has been implemented before by tegile.com - they added a vdev type called "meta" for their proprietary ZFS-based SAN.
+1 for lazy dedup
Hello! I tried to use online deduplication for a big 70 TB storage pool and got excessive performance degradation. With my load pattern, however, I have the whole night without any I/O load, during which I could run deduplication and compression manually. But ZFS provides only online deduplication. It would be great if you added the ability to run deduplication in the background. Thank you for your attention!
@kpande How do the penalties compare? How much of a penalty would you suffer with lazy dedup?
Just wanted to mention that this feature is still very much wanted. I love dedup because it allows lazy backups: just dump a system backup into the designated backup dataset which has dedup enabled, and if anything already exists in a prior backup, dedup will catch it. However, in its current implementation, I have to copy backups to a non-deduped dataset first and then slowly copy the backup over to the deduped dataset, periodically pausing once I/O gets clogged (or else any other application trying to access data on the same pool will block; great design, btw). This works but requires a lot of manual intervention: pausing the copy, unpausing it again when system load goes back to normal, and so on. It's sort of workable, as I can keep an eye on it on my second monitor while doing other things, but it seems like a ridiculous amount of manual supervision to ensure that a simple file copy doesn't become a pool-wide DoS attack.
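The manual pause/resume routine described above is straightforward to automate. Here is a rough sketch, assuming made-up paths and a made-up load threshold, that copies files one at a time and backs off whenever the 1-minute load average spikes:

```python
import os
import shutil
import time

SRC = "/tank/staging"   # hypothetical non-deduped landing dataset
DST = "/tank/backups"   # hypothetical dedup=on dataset
LOAD_LIMIT = 4.0        # made-up threshold; tune for your box

def wait_for_quiet():
    # Crude stand-in for "I/O is clogged": back off while the
    # 1-minute load average is above the limit.
    while os.getloadavg()[0] > LOAD_LIMIT:
        time.sleep(30)

for root, _dirs, files in os.walk(SRC):
    for name in files:
        wait_for_quiet()
        src = os.path.join(root, name)
        rel = os.path.relpath(src, SRC)
        dst = os.path.join(DST, rel)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy2(src, dst)
```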
Mr. Yao seems to think that the current ZFS architecture might be close to having what would be required to expand out and do batch deduplication as well. The usefulness of the current inline deduplication method is highly limited due to the drastic I/O performance hits it causes, so I thought this might be worth exploring.
Just so we're all on the same page, batch deduplication involves writing the data some place temporarily, then coming back and deduplicating it later. In other words, the deduplication happens in the background so it has as close to zero impact on write performance as possible for userland.
Ideally, blocks would be written out to storage and another process would come back around and dedup 'dirty' blocks at intervals, or as load (measured by some system metric) drops below a threshold (a similar concept to ksm/ksmtuned for KVM). However, I suspect this would bring about changes that are more likely to break compatibility, and it would also likely be somewhat less than trivial to implement.
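A minimal sketch of that dirty-block scheme (entirely hypothetical; real block pointers, checksums, and load accounting would look nothing like this): writes take the fast path untouched, and a background pass dedups the accumulated dirty blocks only when the system is quiet, much like ksmtuned throttles KSM:

```python
import hashlib
import os

class BatchDedup:
    """Toy model of the proposal: writes land on disk unmodified and are
    marked dirty; a background pass dedups dirty blocks later, so the
    write path never pays the dedup-table lookup."""

    def __init__(self):
        self.blocks = {}    # address -> payload (stands in for the pool)
        self.ddt = {}       # checksum -> canonical address
        self.dirty = set()  # addresses written since the last dedup pass

    def write(self, addr, data):
        # Fast path: no hashing, no table lookup, just the write.
        self.blocks[addr] = data
        self.dirty.add(addr)

    def dedup_pass(self, load_limit=2.0):
        # Run from a background thread/cron job; bail out if the system
        # is busy (analogous to how ksmtuned throttles KSM on load).
        if os.getloadavg()[0] > load_limit:
            return
        while self.dirty:
            addr = self.dirty.pop()
            csum = hashlib.sha256(self.blocks[addr]).digest()
            canonical = self.ddt.setdefault(csum, addr)
            if canonical != addr:
                # Duplicate: drop our copy; a real filesystem would
                # rewrite the block pointer to reference the canonical block.
                del self.blocks[addr]
```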
Mr. Yao had some initial thoughts on the matter that might have some merit though. He seemed to think that it might be possible to use the ZIL for the temporary storage and do the actual deduplication prior to sending the data to its final resting place in the filesystem.
At least, I believe that was the gist of it. I'm sure he'll correct me if it's not quite right :)
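For what it's worth, here is a toy model of the ZIL-as-staging-area idea as paraphrased above (again hypothetical; the real ZIL is an intent log with very different semantics): writes are acknowledged after a cheap log append, and a later flush dedups each staged record before it reaches its final location.

```python
import hashlib

class StagedDedupWriter:
    """Toy pipeline for the ZIL-as-staging idea: writes are appended to a
    log and acknowledged immediately; a later flush dedups each staged
    record against the table before it reaches final storage."""

    def __init__(self):
        self.log = []    # staged (addr, data) records, like a ZIL
        self.ddt = {}    # checksum -> final address
        self.final = {}  # final storage: address -> payload
        self.ptrs = {}   # logical address -> physical address

    def write(self, addr, data):
        self.log.append((addr, data))  # cheap: ack after the log append

    def flush(self):
        # Background/txg-time: dedup staged records before final placement.
        for addr, data in self.log:
            csum = hashlib.sha256(data).digest()
            if csum in self.ddt:
                self.ptrs[addr] = self.ddt[csum]  # point at the existing copy
            else:
                self.final[addr] = data
                self.ddt[csum] = addr
                self.ptrs[addr] = addr
        self.log.clear()
```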