Ability to run/trigger compression/deduplication of pool/volume manually #3013
Right, at the moment doing this transparently isn't supported. You're either going to need to do what you're doing (send/recv to a temporary volume which then gets renamed), or you could write a script to do this on a per-file basis for a dataset. If compression is enabled for the dataset, new files will be compressed, so you would just need to do something like the sketch below. Doing this transparently in the background is technically possible, but the same caveats regarding snapshots apply: they are immutable, period. Obviously someone would still need to write the code for this.
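For instance, a minimal per-file rewrite loop in shell might look like the following (the /data path and the temporary-file suffix are placeholders, not part of the original comment):

    # Rewrite every regular file so its blocks are written anew and
    # therefore picked up by the dataset's current compression setting.
    # The rename is atomic because source and target are in the same dataset.
    find /data -type f -print0 | while IFS= read -r -d '' f; do
        cp -p -- "$f" "$f.tmp~" && mv -- "$f.tmp~" "$f"
    done

Note that this only affects newly written blocks; any blocks still referenced by existing snapshots keep their old, uncompressed copies on disk.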
Thank you very much! I wrote a simple Perl script for this task, https://gist.github.com/pavel-odintsov/aa497b6d9b351e7b3e2b, and it works well.
Unfortunately, file-by-file iteration over my data is extremely slow. I started file_rewrite.pl about 36+ hours ago, and so far only about 6% of the data has been processed. Processing files is also not a reliable approach, because files with broken names (due to encoding issues, not related to ZFS) were not processed correctly. Can I do the same at the block level, in place? I want to take all the used blocks of my volume and compress those blocks, instead of relying on files.
No. You could send/recv the pool with incremental snapshots. That would allow you to keep the downtime to a minimum.
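Such an incremental send/recv migration could look roughly like this (dataset names and snapshot labels are placeholders; it assumes enough free space for a second copy):

    # Initial copy while the source stays online; the received dataset
    # writes its blocks with the pool's current compression setting.
    zfs snapshot data/old@migrate1
    zfs send data/old@migrate1 | zfs receive -u data/new

    # Catch up on changes made during the first, long transfer.
    zfs snapshot data/old@migrate2
    zfs send -i data/old@migrate1 data/old@migrate2 | zfs receive -F data/new

    # Brief outage: stop writers, send the final increment, swap names.
    zfs snapshot data/old@migrate3
    zfs send -i data/old@migrate2 data/old@migrate3 | zfs receive -F data/new
    zfs rename data/old data/old-uncompressed
    zfs rename data/new data/old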
This issue is even more important in the case of a ZVOL, where we can't touch every file in the filesystem (NTFS, ReFS, and other non-Linux filesystems).
@behlendorf is it required to recreate the file, or is it enough just to re-write the blocks? Can this rewriting be done at the VFS level? As far as I can see from the source code, it should be enough. In that case one could implement a 'toucher' using e.g.
@paboldin simply re-dirtying the block is enough, given two caveats.
See also #2554. |
The very old problem of BP rewrite. AFAIR, everyone who has tried has given up, saying it is too difficult. :-(
I wrote a small shell script to replicate, verify and overwrite all files in the current working directory and all its descendant directories in order to trigger ZFS compression. Use with significant caution and make sure to have a backup beforehand. |
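A sketch of what such a replicate/verify/overwrite pass could look like (bash; the temporary-file naming is an assumption, not the original script):

    #!/bin/bash
    # Copy, verify, then overwrite every regular file under the current
    # directory so its blocks are rewritten with compression enabled.
    # DANGEROUS if interrupted mid-replace: keep a backup first.
    find . -type f -print0 | while IFS= read -r -d '' f; do
        tmp="$f.rewrite.$$"
        cp -p -- "$f" "$tmp" || { rm -f -- "$tmp"; continue; }
        if cmp -s -- "$f" "$tmp"; then
            mv -- "$tmp" "$f"      # verified identical: replace original
        else
            rm -f -- "$tmp"        # verification failed: keep original
        fi
    done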
So, if for example we enabled deduplication and compression at the same time, or enabled compression and changed the checksum algorithm, and then dirtied all the blocks, would that result in them all being rewritten? (I presume a combination of deduplication and a changed checksum would also work?)

What would be the best way to re-dirty a block, given a hypothetical outer loop that cycles over every block of every file? Can it be done without changing the block's contents? (Is this what the above conditions ensure?) Is this something that really should be done from within ZFS itself? From the accompanying library?

Baseless speculation: part of me wonders if it's possible to introduce a sequence number* in the block pointers just to make data appear "different" to zio_nop_write() without altering the settings. Then it's a matter of going through the directory tree and progressively dirtying every block of every file, so long as there's space** (and maybe I/O capacity) available to accommodate it.

*A "please rewrite" flag would have to be set on everything, though perhaps that traversal wouldn't be so bad. Also maybe not, if you consider a flag to be a 2-value sequence number. Hmm.

**It might be enough to say to ZFS "please leave at least 200 GB", though one would expect the space to be reclaimed if there are no snapshots pinning it.
This is starting to remind me a little of the issue thread for raidz expansion (#12225). There were similar requests for a way to trigger the reformatting of old data to the new stripe width, though it may or may not be trickier there.
Hello!
I have a large amount of uncompressed data in multiple pools and volumes. I want to enable compression, because my data compresses very well in synthetic tests.
I enabled compression for the pool:
zfs set compression=lz4 data
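One can confirm the setting and, later, watch its effect via the standard properties (note that compressratio only reflects data written after compression was enabled):

    zfs get compression,compressratio data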
But I can't find any way to compress the existing data on the pool without copying it again.
For now I do the following (sketched below): send/recv to a temporary volume, then rename it into place.
It works well and the compression goes perfectly.
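A sketch of that workaround, assuming a pool named data and a dataset data/vol (names are placeholders):

    # Copy the dataset through send/recv so all blocks are rewritten
    # compressed, then swap names; requires space for a second copy.
    zfs snapshot data/vol@precompress
    zfs send data/vol@precompress | zfs receive data/vol.compressed
    zfs rename data/vol data/vol.orig
    zfs rename data/vol.compressed data/vol
    # After verifying the data, reclaim the space:
    zfs destroy -r data/vol.orig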
But how can I do the compression in place, without service interruption and without creating temporary volumes?
I reviewed the zio.c code and found that the code used for compression is not hard to understand. What are the problems with in-place data compression or decompression?
This ticket may be related to #1071, but deduplication logic is very different compared with compression.