Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

btrfs-progs: mkfs: add --compress #925

Open
wants to merge 41 commits into
base: devel
Choose a base branch
from

Conversation

maharmstone
Copy link
Contributor

@maharmstone maharmstone commented Nov 19, 2024

Adds a --compress option to mkfs.btrfs when using --rootdir. (Resurrection of #882, closed by mistake.)

@adam900710, this includes your two patches, and all the suggestions that you made. If you're happy with all this, I suggested accepting this PR and abandoning #919.

kdave and others added 30 commits September 19, 2024 01:51
[BUG]
Currently with python3.12, the python bindding will always result the
following warning:

    [PY]     libbtrfsutil
/usr/lib/python3.12/site-packages/setuptools/_distutils/extension.py:134: UserWarning: Unknown Extension options: 'headers'
  warnings.warn(msg)

[CAUSE]
In the setup.py which specifies the files to be included into the package,
we use setuptools::Extension to specify the file lists and include paths.

But there is no handling of Extension::headers member, thus resulting the
above warning.

[FIX]
According to the docs of setuptools, MANIFEST.in is the file controlling
what files should be included.
So instead of the non-supported headers, use MANIFEST.in to include the
needed headers.

Signed-off-by: Qu Wenruo <[email protected]>
…escription

Instead of copying the file during custom build commands, just use a
soft link to re-use the existing README.d from libbtrfsutil.

Issue: kdave#310
Signed-off-by: Qu Wenruo <[email protected]>
With --chroot, the receive subcommand unconditionally sent a non-error
status message to stderr, e.g.:

$ btrfs --quiet receive --chroot /some/path
Chroot to /some/path

Signed-off-by: Sebastian Hamann <[email protected]>
Change mkfs.btrfs --subvol so that instead of being of the form --subvol
DIR:FLAGS, it's instead --subvol MODIFIER:DIR, with MODIFIER being ro,
rw, default, or ro-default.

Signed-off-by: Mark Harmstone <[email protected]>
Reworks mkfs.btrfs --subvol so that dir and full_path in struct
rootdir_subvol are stored as arrays rather than pointers.

Signed-off-by: Mark Harmstone <[email protected]>
…otdir-subvol

Adds tests to mkfs-tests/036-rootdir-subvol for the modifiers to
mkfs.btrfs --subvol: ro, rw, default, and default-ro.

Signed-off-by: Mark Harmstone <[email protected]>
On systems with glibc 2.34 and 2.39, the following warning appears when
building the binary:

    [CC]     common/help.o
common/help.c: In function ‘usage’:
common/help.c:315:58: warning: ‘%s’ directive argument is null [-Wformat-overflow=]
  315 |                 fprintf(outf, "No short description for '%s'\n", token);
      |                                                          ^~
common/help.c:312:46: warning: ‘%s’ directive argument is null [-Wformat-overflow=]
  312 |                 fprintf(outf, "No usage for '%s'\n", token);
      |                                              ^~

This happens for usage() which passes NULL pointer as token. Normally
this is fine, as fprintf() will output "(null)" for the NULL pointer,
but it's still not ideal.

Fix the warning by changing the token to "" if it's NULL.

Reviewed-by: Qu Wenruo <[email protected]>
Signed-off-by: Anand Jain <[email protected]>
The slides for the talk "Kernel maintainership: an oral tradition",
linked to in the documentation, seem to have gone from the Linux
Foundation website. Change to the version on bootlin.com.

Signed-off-by: Mark Harmstone <[email protected]>
Fix a false positive in btrfs check, where we were returning an error
because an explicit hole in the log tree had no associated csum entry.

Signed-off-by: Mark Harmstone <[email protected]>
Use the correct address and size when looking for the csums for a
compressed extent in the tree log.

Signed-off-by: Mark Harmstone <[email protected]>
Fixes false positive in btrfs check that was causing btrfs/192 to fail.

It looks like there's circumstances in which btrfs_log_changed_extents
can also log unmodified extents, but not their csums. If that happens,
check the main csum root as well before showing an error.

Signed-off-by: Mark Harmstone <[email protected]>
This "stil" -> "still" typo is causing the latest CI spellchecks to fail.

Fix that so we can get a good CI run.

Signed-off-by: Qu Wenruo <[email protected]>
Added DEV_ITEM object id to the reserved object id list. It's historical reason
to let both of DEV_ITEM and ROOT_TREE have same object id. Developers should
be aware of it.

Signed-off-by: HAN Yuwei <[email protected]>
[ Replace immediate number with key names ]
Signed-off-by: Qu Wenruo <[email protected]>
Adds a test that btrfs check doesn't flag a file extent in the log tree as
an error because of no corresponding csum entry, if the file extent is
actually an explicit hole. See commit f6dc0e8.

The log tree in hole.img.xz:

log tree key (TREE_LOG ROOT_ITEM 5)
leaf 5373952 items 3 free space 15981 generation 7 owner TREE_LOG
leaf 5373952 flags 0x1(WRITTEN) backref revision 1
checksum stored 091c2f32
checksum calced 091c2f32
fs uuid 70d417f0-0173-49ca-bdf1-b789a798974d
chunk uuid b7ec3806-ea09-4f18-a5ef-126a2d79105e
        item 0 key (257 INODE_ITEM 0) itemoff 16123 itemsize 160
                generation 7 transid 7 size 4096 nbytes 0
                block group 0 mode 100644 links 1 uid 0 gid 0 rdev 0
                sequence 1 flags 0x0(none)
                atime 1721050040.375996658 (2024-07-15 14:27:20)
                ctime 1721050040.375996658 (2024-07-15 14:27:20)
                mtime 1721050040.375996658 (2024-07-15 14:27:20)
                otime 0.2820488960 (1970-01-01 01:00:00)
        item 1 key (257 INODE_REF 256) itemoff 16109 itemsize 14
                index 2 namelen 4 name: file
        item 2 key (257 EXTENT_DATA 0) itemoff 16056 itemsize 53
                generation 7 type 1 (regular)
                extent data disk byte 0 nr 0
                extent data offset 0 nr 4096 ram 4096
                extent compression 0 (none)

Signed-off-by: Mark Harmstone <[email protected]>
[ Move the image to 065-valid-log tree ]
Signed-off-by: Qu Wenruo <[email protected]>
…lid-log-tree

Adds a test that when checking for missing csums in the log tree, btrfs
check considers the compressed rather than uncompressed size of the
extent. See commit 175cbfc.

The log tree in compressed.img.xz:

log tree key (TREE_LOG ROOT_ITEM 5)
leaf 5373952 items 4 free space 15952 generation 7 owner TREE_LOG
leaf 5373952 flags 0x1(WRITTEN) backref revision 1
checksum stored 61faf6f2
checksum calced 61faf6f2
fs uuid 70d417f0-0173-49ca-bdf1-b789a798974d
chunk uuid b7ec3806-ea09-4f18-a5ef-126a2d79105e
        item 0 key (257 INODE_ITEM 0) itemoff 16123 itemsize 160
                generation 7 transid 7 size 4096 nbytes 0
                block group 0 mode 100644 links 1 uid 0 gid 0 rdev 0
                sequence 1 flags 0x0(none)
                atime 1721050040.375996658 (2024-07-15 14:27:20)
                ctime 1721050040.375996658 (2024-07-15 14:27:20)
                mtime 1721050040.375996658 (2024-07-15 14:27:20)
                otime 0.2820488960 (1970-01-01 01:00:00)
        item 1 key (257 INODE_REF 256) itemoff 16109 itemsize 14
                index 2 namelen 4 name: file
        item 2 key (257 EXTENT_DATA 0) itemoff 16056 itemsize 53
                generation 7 type 1 (regular)
                extent data disk byte 13631488 nr 4096
                extent data offset 0 nr 8192 ram 8192
                extent compression 1 (zlib)
        item 3 key (EXTENT_CSUM EXTENT_CSUM 13631488) itemoff 16052 itemsize 4
                range start 13631488 end 13635584 length 4096

Signed-off-by: Mark Harmstone <[email protected]>
[ Move the image to 065-valid-log tree ]
Signed-off-by: Qu Wenruo <[email protected]>
… already present

Adds a test that btrfs check doesn't report an error because of a
missing csum entry in the log tree, if that csum is already in the
normal csum tree. See commit 6c7d2a3.

The checksum tree in already-present.img.xz:

checksum tree key (CSUM_TREE ROOT_ITEM 0)
leaf 5390336 items 1 free space 16254 generation 5 owner CSUM_TREE
leaf 5390336 flags 0x1(WRITTEN) backref revision 1
checksum stored ff20c282
checksum calced ff20c282
fs uuid 70d417f0-0173-49ca-bdf1-b789a798974d
chunk uuid b7ec3806-ea09-4f18-a5ef-126a2d79105e
        item 0 key (EXTENT_CSUM EXTENT_CSUM 13631488) itemoff 16279 itemsize 4
                range start 13631488 end 13635584 length 4096

The log tree in already-present.img.xz:

log tree key (TREE_LOG ROOT_ITEM 5)
leaf 5373952 items 3 free space 15981 generation 7 owner TREE_LOG
leaf 5373952 flags 0x1(WRITTEN) backref revision 1
checksum stored 63afa0e5
checksum calced 63afa0e5
fs uuid 70d417f0-0173-49ca-bdf1-b789a798974d
chunk uuid b7ec3806-ea09-4f18-a5ef-126a2d79105e
        item 0 key (257 INODE_ITEM 0) itemoff 16123 itemsize 160
                generation 7 transid 7 size 4096 nbytes 0
                block group 0 mode 100644 links 1 uid 0 gid 0 rdev 0
                sequence 1 flags 0x0(none)
                atime 1721050040.375996658 (2024-07-15 14:27:20)
                ctime 1721050040.375996658 (2024-07-15 14:27:20)
                mtime 1721050040.375996658 (2024-07-15 14:27:20)
                otime 0.2820488960 (1970-01-01 01:00:00)
        item 1 key (257 INODE_REF 256) itemoff 16109 itemsize 14
                index 2 namelen 4 name: file
        item 2 key (257 EXTENT_DATA 0) itemoff 16056 itemsize 53
                generation 7 type 1 (regular)
                extent data disk byte 13631488 nr 4096
                extent data offset 0 nr 4096 ram 4096
                extent compression 0 (none)

Signed-off-by: Mark Harmstone <[email protected]>
[ Move the image to 065-valid-log tree ]
Signed-off-by: Qu Wenruo <[email protected]>
For compat_ro_flags_num and incompat_flags_num, just use ARRAY_SIZE().

Signed-off-by: Qu Wenruo <[email protected]>
This includes:

- Remove the "__" prefix
  Now the "__" is no longer recommended, and there is no function taking
  the "print_readable_flag" in the first place.

- Move the supported flags calculation into print_readable_flag()
  Since all callers are doing the same work before calling the function.

Signed-off-by: Qu Wenruo <[email protected]>
The current print-tree can not handle unsupported inode flags, e.g.
created by Synology's out-of-tree btrfs implementation.

The existing one just checks all the supported flags, and if no flag
hits, it will output "none" no matter if there is any unsupported one.

Fix this by implementing sprint_readable_flag(), and use the same
handling of print_readable_flag().
Although for inode flag, adds one extra handling to output "none" if no
flag hit at all.

Signed-off-by: Qu Wenruo <[email protected]>
- The IRC channel
- The mailling list
- The btrfs-progs repo for user space bugs

Signed-off-by: Colin Snover <[email protected]>
[ Add an SoB line ]
Signed-off-by: Qu Wenruo <[email protected]>
- Fix the format of the transid mismatch reason and type

- Fix a typo in the reason

- Explain more on the recoverable case
  That both a regular metadata read and read-write scrub can do the
  same trick.

- Add an extra data salvage method using "rescue=all,ro" mount option

Signed-off-by: Colin Snover <[email protected]>
[ Add an SoB line ]
Signed-off-by: Qu Wenruo <[email protected]>
- Btrfs-check does device assembly automatically
  Thus no difference when specifying different devices of the same
  filesystem

- Btrfs-check automatically choose good metadata
  Thus as long as there is any good mirror for metadata, it will not
  report error for that repariable metadata.

Signed-off-by: Colin Snover <[email protected]>
[ Add an SoB line, remove the scrub recommendation as btrfs-check is
  supposed to choose the good mirror ]
Signed-off-by: Qu Wenruo <[email protected]>
Since balance is copying the old good data/metadata into a new chunk
(which can be on the same failed device), it's not a safe way to handle
failed devices.

Signed-off-by: Colin Snover <[email protected]>
[ Add an SoB and simple commit message, remove the unnecessary
  explanation, and guide the user to use `btrfs dev replace` ]
Signed-off-by: Qu Wenruo <[email protected]>
- Explain that scrub is device based

- Add extra warning on NOCOW files
  Which implies NODATASUM, and can cause unexpected stale data to be
  returned.

- Explain the limitation of scrub
  As it can only do very basic checksum verification and very basic
  mirror based repair.

Signed-off-by: Colin Snover <[email protected]>
[ Add an SoB line and commit message, remove the mention of btrfs-check errors,
  as there is no evidence/example where btrfs-check failed to choose a good mirror. ]
Signed-off-by: Qu Wenruo <[email protected]>
* fix BTRFS capitalization
* fix repetition
* wording and punctuation in 'Nested subvolumes'
* wording and punctuation in 'system root layouts'
* wording and punctuation in 'Mount options'
* wording in 'Inode numbers'
* wording and punctuation in 'Performance'
(@silopolis) and others added 11 commits November 15, 2024 16:58
Update graphs.

[ci skip]

Signed-off-by: David Sterba <[email protected]>
musl 1.2.5 no longer defines basename in strings.h and requires including
libgen.h as specified by POSIX, and builds now fail with this without it:
common/device-utils.c: In function 'device_get_partition_size_sysfs':
common/device-utils.c:345:16: warning: implicit declaration of function 'basename' [-Wimplicit-function-declaration]
  345 |         name = basename(path);
      |                ^~~~~~~~
common/device-utils.c:345:14: warning: assignment to 'char *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
  345 |         name = basename(path);
      |              ^

Link: https://gitlab.alpinelinux.org/alpine/aports/-/issues/16106
Signed-off-by: Dominique Martinet <[email protected]>
Reviewed-by: Qu Wenruo <[email protected]>
when stdio is line buffered printf will not flush anything (on musl?),
leaving the program hanging without displaying any prompt and weird
dialogs such as the following:
```
alpine:~# btrfstune -S 0 /dev/mmcblk1p1
WARNING: this is dangerous, clearing the seeding flag may cause the derived device not to be mountable!
y
WARNING: seeding flag is not set on /dev/mmcblk1p1
We are going to clear the seeding flag, are you sure? [y/N]: alpine:~#
```

forcing flush makes the prompt display properly

Signed-off-by: Dominique Martinet <[email protected]>
Reviewed-by: Qu Wenruo <[email protected]>
The function btrfs_record_file_extent() has extra handling that's
specific to convert, like allowing the range to be split by block group
boundary and image file extent boundary.

All of these split can only lead to corruption for non-converted fs.
As the only caller out of btrfs-convert is rootdir, which expects the
file extent item insert to respect the reserved data extent, and never
to be split.

Thankfully this is not going to cause huge problem, as
btrfs_record_file_extent() has extra checks if the data extent overlaps
with any existing one, and if it doesn't the handling will be the same
as the kernel.

But to avoid abuse, change btrfs_record_file_extent() by:

- Rename it to btrfs_convert_file_extent()
  And add extra comments on that it is specific to btrfs-convert.

- Move it to convert/common.[ch]

- Introduce a helper insert_reserved_file_extent() for rootdir.c

Signed-off-by: Qu Wenruo <[email protected]>
…extent item

Just like insert_reserved_file_extent() from the kernel, we can make
btrfs_insert_file_extent() accept an on-stack file extent item
directly.

This makes btrfs_insert_file_extent() more flex, and it can now handle
the converted file extent where it has an non-zero offset.

And this makes it much easier to expand for future compressed file
extent generation.

Signed-off-by: Qu Wenruo <[email protected]>
There were two major problems with add_file_items(): it was
writing all files sector-by-sector, making compression impossible, and
it was assuming that pread would never do a short read.

Fix these problems, and create a new helper read_and_write_extent().

Signed-off-by: Mark Harmstone <[email protected]>
Add an option --compress to mkfs.btrfs, to allow creating files
using zlib when using --rootdir.

Signed-off-by: Mark Harmstone <[email protected]>
Allow --compress to work with zstd, when compiled in.

Signed-off-by: Mark Harmstone <[email protected]>
Allow --compress to work with lzo.

Signed-off-by: Mark Harmstone <[email protected]>
@maharmstone maharmstone changed the title btrfs-progs: mkfs: add --compression btrfs-progs: mkfs: add --compress Nov 19, 2024
@adam900710
Copy link
Collaborator

adam900710 commented Nov 19, 2024

Thanks a lot. I'll close #919 and use this branch instead.

Although to avoid previous accidents l'll take a more comprehensive review before merge.
So far the patches looks good to me (through the github WebUI) and I believe we can push this for the next cycle.

Copy link
Collaborator

@adam900710 adam900710 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most small style problems, but the compression parameter and the inode flag handling is not subtle, and affecting all the other compression algorithms.

@@ -455,6 +459,65 @@ static int insert_reserved_file_extent(struct btrfs_trans_handle *trans,
return ret;
}

static ssize_t zlib_compress_extent(struct btrfs_inode_item *btrfs_inode,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a simple comment explaining on the return value.

As it is not the regular return 0 if everything is fine. Instead it returns the size of the compressed data.

* Try to compress the first sector - if it would be larger,
* return -E2BIG.
*/
if (!(btrfs_stack_inode_flags(btrfs_inode) & BTRFS_INODE_COMPRESS)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part looks a little weird to me.
If an inode doesn't have COMPRESS flag, why we even should try compressing the first sector?

And I really prefer not passing an on-stack btrfs_inode to the compression code, especially we're only checking the compress flag.

I understand you want to emulate the kernel behavior, but to me, the try-first-sector behavior is not really that reliably. E.g. the "first" sector is only really the first sector of the buffer, it's not even the real first sector of a file.

Furthermore, if we got one 1MiB extent that its first sector is not compressible but overall it can still be compressed, then I'd say we lose the chance to save some space.

Thus I'd prefer to completely remove the btrfs_inode parameter. Just do the compression of the input buffer, check if it's making things larger, and return the number of the compressed size.

Let the caller to determine if we should set COMPRESS for the inode.

if (!(btrfs_stack_inode_flags(btrfs_inode) & BTRFS_INODE_COMPRESS))
return -E2BIG;

return 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return value 0 also looks weird to me.

If we got a compression error, we return -EINVAL.
If we got something compressed, we return the compressed size. (for Z_STREAM_END)

So this means, Z_OK will lead to return value 0?

Do we have the correct handling for Z_OK (should it be the same as Z_STREAM_END)?

goto out;
}

if (zlib_ret == Z_STREAM_END && strm.avail_out > 0) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, should we also handle Z_OK?

ret = zlib_compress_inline_extent(buffer, st->st_size,
&comp_buf, &comp_size);
if (ret < 0)
goto end;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we can not just fallback to regular inline extents?

@@ -384,7 +384,8 @@ static int reiserfs_convert_tail(struct btrfs_trans_handle *trans,
length, offset, convert_flags);

ret = btrfs_insert_inline_extent(trans, root, objectid,
offset, body, length);
offset, body, length,
0, length);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use BTRFS_COMPRESS_NONE to replace immediate number 0.

@kdave kdave force-pushed the devel branch 3 times, most recently from 72c9865 to 164145b Compare November 28, 2024 13:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants