
Support for big files #17

Open
kenorb opened this issue Apr 26, 2018 · 25 comments

@kenorb
Contributor

kenorb commented Apr 26, 2018

$ afsctool -c somebigfile
Skipping file somebigfile with unsupportable size 6678124800
Unable to compress file.
@RJVB
Owner

RJVB commented Apr 26, 2018 via email

@wmertens

I have a 13GB sqlite db that compresses down to 800MB. It would be cool to be able to keep it compressed, but I wonder how that will perform, being stored in the resource fork.

On my server I use btrfs, which compresses per extent, so effectively per "block". That works great.

@RJVB
Owner

RJVB commented Jan 10, 2019 via email

@wmertens

argh of course, read-only. Hmm, ZFS would indeed be nice.

I wonder about ZFS performance; I'd use it for my projects tree, but I should probably move the compile cache off it. And then of course I'll need to pick an appropriate volume size, and upgrading to new macOS versions would be gated by ZFS support. Hmmm. Why can't Apple be like the Apple of 2005, when they were building the best Unix laptop there was?

SQLite performs admirably on large DBs :) In the end it comes down to algorithms, and SQLite is very well implemented and isn't burdened by a network layer. If your filesystem can hold it, SQLite can manage it.

@RJVB
Owner

RJVB commented Jan 10, 2019 via email

@wmertens

I'll give ZFS a whirl as a weekend project. Been quite a while since I used it.

My 2015 MBP just can't seem to handle modern web development any more, but maybe that's a hardware issue. I should have stuck to 10.9 too; the last few releases didn't bring anything useful. 😞

Any extra info about those DB crashes? https://www.sqlite.org/howtocorrupt.html is good reading. SQLite is probably the most installed database in the world, so data corruption bugs are rare now.

@lucianmarin

It would be great if there was some kind of support for large files.
I have to work with the HaveIBeenPwned password files and they eat a lot of space.

@gingerbeardman
Contributor

gingerbeardman commented Jun 7, 2019

@lucianmarin have you tried it? See #17 (comment), which says it's been added since January but needs testing.

@RJVB
Owner

RJVB commented Jun 8, 2019 via email

@lucianmarin

lucianmarin commented Jun 8, 2019

I notice that the pwned files are 7zipped; those files will not compress with HFS compression.

I tried it on the uncompressed (*.txt) version of those files and got a system crash at the end. afsctool -v says the file is compressed (46.2% savings), but I get system crashes while reading it.

Previously I used the Homebrew version of afsctool (1.6.4), which said: Unable to compress file.

@RJVB
Owner

RJVB commented Jun 8, 2019 via email

@lucianmarin

lucianmarin commented Jun 8, 2019

Yes, the standard kernel panic message and a reboot. I didn't use the -L option; instead I used -1 -c, which, as I read it, triggers the zlib compression.

I have a MBP 2017 with 16 GB RAM. afsctool's memory usage was 9 GB, of which 8.9 GB was compressed memory. Swap grew to 15.9 GB by the end. The file I was trying to compress is 19.9 GB.

I think you should calculate the memory usage in advance based on the file size; if the system can allocate that much memory, then it should be allowed to compress the file. I'm not sure whether this is down to the file system implementation, but ordinary ZIP tools don't use that kind of memory.
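
A minimal sketch of the kind of pre-check suggested here (not part of afsctool): compare the file's size against physical memory before attempting compression. The 1:1 size-to-memory estimate and the function name are assumptions based on the usage reported in this thread.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysctl.h>

static bool likely_enough_memory(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return false;

    uint64_t physmem = 0;
    size_t len = sizeof(physmem);
    /* macOS-specific query for installed RAM */
    if (sysctlbyname("hw.memsize", &physmem, &len, NULL, 0) != 0)
        return true; /* can't tell; don't block the attempt */

    /* Assumption: the tool needs roughly the whole file resident at once. */
    if ((uint64_t)st.st_size > physmem) {
        fprintf(stderr, "%s (%lld bytes) is larger than physical memory; "
                        "compressing it would swap heavily\n",
                path, (long long)st.st_size);
        return false;
    }
    return true;
}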

Anyway, I used the tool on 3GB of CSV files with 96.2% savings. That's huge! Thank you for your work.

@wmertens

wmertens commented Jun 8, 2019 via email

@RJVB
Owner

RJVB commented Jun 8, 2019 via email

@wmertens

wmertens commented Jun 8, 2019

It is my pet peeve that they went for a read-only variant without official easy compression tools because they also sell diskspace. That became even more apparent after they failed to include proper compression in their new filesystem.

This. So much. The devtools eat like 13GB of your disk, and if you run them through afsctool, you recover 7GB. Basically they're stealing 2.5% of developers' SSD drives.

@RJVB
Owner

RJVB commented Jun 8, 2019 via email

@gingerbeardman
Contributor

gingerbeardman commented Jun 8, 2019

This. So much. The devtools eat like 13GB of your disk, and if you run them through afsctool, you recover 7GB. Basically they're stealing 2.5% of developers' SSD drives.

Only true if you download and install manually.

If you download the dev tools (Xcode) — or indeed any other app — from the Mac App Store, they are HFS+ compressed (LZVN)

I also use CleanMyMacX to keep on top of unused/old SDKs, Simulators, builds, and other cruft.

@wmertens

wmertens commented Jun 8, 2019 via email

@Dr-Emann
Contributor

Dr-Emann commented Oct 22, 2022

I believe the real limit is probably 2 or 4 GiB of compressed size, approximately (depending on whether the values are actually unsigned 32-bit), since it seems all of the formats use a 32-bit offset from near the start of the file to store the location of each compressed block.

If the file @lucianmarin was trying to compress was 19.9 GB and reported 46.2% savings, it would still be well over 4 GiB when compressed, which would overflow the 32-bit offsets. I wouldn't be surprised if there were kernel-side code that crashes in the presence of wrap-around, especially if the kernel actually interprets the offsets as 32-bit SIGNED values, since some blocks would then appear to start at a negative offset.
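
To make the overflow concrete, here is a rough C sketch; the struct and the numbers are illustrative assumptions, not the actual decmpfs format definition. It only shows what happens when a chunk location past 4 GiB is squeezed into a 32-bit field.

#include <stdint.h>
#include <stdio.h>

/* Illustrative only: the resource fork keeps a table of per-chunk
 * locations as 32-bit values; the exact header layout is not
 * reproduced here. */
typedef struct {
    uint32_t offset; /* chunk start -- only 32 bits wide */
    uint32_t size;   /* compressed chunk size */
} chunk_entry;

int main(void)
{
    /* 19.9 GB input at ~46% savings -> roughly 10.7 GB of compressed data */
    uint64_t true_offset = 10700000000ULL;   /* where a late chunk really starts */
    uint32_t stored = (uint32_t)true_offset; /* what a 32-bit offset field keeps */
    int32_t as_signed = (int32_t)stored;     /* what signed kernel code might see */

    printf("true offset   : %llu\n", (unsigned long long)true_offset);
    printf("stored 32-bit : %u\n", stored);    /* wrapped past 4 GiB */
    printf("as signed     : %d\n", as_signed); /* for some offsets this goes negative */
    return 0;
}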

@RJVB
Owner

RJVB commented Oct 22, 2022

Good point!

But still something that could have been avoided by using relative offsets and a wide enough accumulator "register".

@Dr-Emann
Contributor

Dr-Emann commented Oct 31, 2022

Nope! It looks like the limit really is on the uncompressed size: with the size check removed and writing files of all zeros, a length of 4000000000 works while 4500000000 fails (or rather, the write succeeds but the kernel panics on read). Since the content is all zeros it compresses extremely well, so the compressed size is nowhere near any limit.
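
For reference, a minimal C sketch of the experiment described above. It assumes afsctool is on the PATH and that the CMP_MAX_SUPPORTED_SIZE guard has been removed from the build; reading back the larger file after compression reportedly panics the kernel, so only try this on a disposable volume.

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Create an all-zero file of the requested length. */
static void make_zero_file(const char *path, off_t length)
{
    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0 || ftruncate(fd, length) != 0) {
        perror(path);
        exit(1);
    }
    close(fd);
}

int main(void)
{
    make_zero_file("zeros_4_0e9.bin", 4000000000LL); /* reported to work */
    make_zero_file("zeros_4_5e9.bin", 4500000000LL); /* reportedly panics on later read */

    /* Assumes afsctool is on PATH. */
    system("afsctool -c zeros_4_0e9.bin");
    system("afsctool -c zeros_4_5e9.bin");
    return 0;
}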

@DanielSmedegaardBuus

Quick question (gotta leave for the ferry): is there any way to make afsctool compress a large file (in this case 2GB vmdk slices that are on average 90% zeroes) even if it instantly decides it's incompressible, due to, I'm guessing, the first megabyte or so being incompressible?

@Dr-Emann
Contributor

Dr-Emann commented Apr 6, 2023

By default, afsctool will give up if even a single block does not compress (grows even slightly when compressed). If you pass -L ("Allow larger-than-raw compressed chunks"), it will keep going, even if the whole file doesn't compress.
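
A rough sketch (not afsctool's actual code) of the chunk-level decision described here, using zlib; the chunk size constant and the control flow are assumptions for illustration.

#include <stdbool.h>
#include <stddef.h>
#include <zlib.h>

#define CHUNK_SIZE (64 * 1024) /* HFS+ compression works on 64 KiB chunks */

/* Returns true if the compressed chunk should be kept; false means the whole
 * file stays uncompressed (only happens when allow_larger, i.e. -L, is off).
 * The caller must provide an `out` buffer of at least compressBound(raw_len)
 * bytes. */
static bool try_compress_chunk(const unsigned char *chunk, size_t raw_len,
                               unsigned char *out, size_t *out_len,
                               bool allow_larger /* the -L behaviour */)
{
    uLongf dest_len = compressBound(raw_len);
    if (compress2(out, &dest_len, chunk, raw_len, Z_BEST_COMPRESSION) != Z_OK)
        return false;

    if (dest_len >= raw_len && !allow_larger)
        return false; /* default: one chunk that grows aborts the whole file */

    *out_len = dest_len; /* with -L: keep going even though this chunk grew */
    return true;
}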

@RJVB
Owner

RJVB commented Apr 6, 2023 via email

@Dr-Emann
Contributor

Dr-Emann commented Apr 6, 2023

Right, yeah, the size check is currently for 2 GiB:

#define CMP_MAX_SUPPORTED_SIZE ((off_t)(1UL << 31) - 1)
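
For illustration, a hypothetical sketch of how a guard built on this constant would produce the "unsupportable size" message from the original report; the function and message wiring are assumptions, not afsctool's actual code.

#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

#define CMP_MAX_SUPPORTED_SIZE ((off_t)(1UL << 31) - 1) /* 2 GiB - 1 */

static bool size_supported(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return false;
    if (st.st_size > CMP_MAX_SUPPORTED_SIZE) {
        fprintf(stderr, "Skipping file %s with unsupportable size %lld\n",
                path, (long long)st.st_size);
        return false;
    }
    return true;
}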
