💡 Have lzma backend respect -p# option #133

Closed
pete4abw opened this issue Aug 14, 2023 · 9 comments

@pete4abw
Owner

lrzip-next Version

lrzip-next version 0.12.0

Feature Suggestion

When limiting threads in lrzip-next with the -p option, more threads are used than requested. Have lzma backend respect threads requested.

Steps to reproduce

lrzip-next -p4 file
Viewing system usage shows that all CPU cores and threads are in use.

Relevant log output

N/A

Please provide system details

OS Distro: Debian
Kernel Version (uname -a): 6.1.0-10-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.38-2
System ram (free -h):
$ free -h
               total        used        free      shared  buff/cache   available
Mem:            15Gi       7.3Gi       4.5Gi       4.8Gi       8.9Gi       8.1Gi
Swap:           15Gi       1.2Gi        14Gi

Additional Context

As originally reported by @Darkyere as Issue 247.

For lrzip-next and lzma SDK 23.01, the solution is to disable the Multi Threading Match Finder using the compiler define -DZ7_ST. This will restrict lzma to use one thread to compress/decompress. Other backends may or may not perform as expected since we use external libraries. ZPAQ does not have this issue.

AM_CFLAGS = \
  -D_REENTRANT \
  -D_7ZIP_LARGE_PAGES \
  -DZ7_ST \
  -I@top_srcdir@/src \
  -I../include
@pete4abw pete4abw self-assigned this Aug 14, 2023
@pete4abw
Owner Author

A way out seems to be simple. By default, if -p# > 1, then the lzma backend will use as many threads as are available. If -p# == 1, then it won't. So a solution may be to add yet another command line option to prohibit lzma from using more threads than are specified in -p#.

Currently, in stream.c:

        lzma_ret = LzmaCompress(c_buf, &dlen, cthread->s_buf,
                        (size_t)cthread->s_len, lzma_properties, &prop_size,
                        control->compression_level,
                        control->dictSize, /* dict size. 0 = set default, otherwise control->dictSize */
                        LZMA_LC, LZMA_LP, LZMA_PB, /* lc, lp, pb */
                        (control->compression_level < 7 ? 32 : 64), /* fb */
                        (control->threads > 1 ? 2 : 1));
                        /* LZMA spec has threads = 1 or 2 only. */

If control->threads > 1 (i.e. -p# > 1), multi-threading in lzma is enabled by passing 2 to the backend. If we create YET ANOTHER COMMAND LINE OPTION, say --nobemt for no backend multi-threading, it can be used to pass a 1 to the backend which would bypass multi-threading completely (i.e. LzFindMt functions would not be used). This may only work with lzma. ZPAQ does not seem to have this issue, and the other compression methods are using external libraries which may not be able to be controlled.
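
The selection logic described above could be sketched roughly as follows. This is only an illustration under assumptions: `lzma_thread_arg` and the `nobemt` parameter are hypothetical names that do not come from the lrzip-next source, and the real change may be structured differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: picks the value passed as the last argument of
 * LzmaCompress(). `nobemt` stands in for whatever flag a --nobemt option
 * would set; the name is an assumption made for this sketch. */
int lzma_thread_arg(int requested_threads, bool nobemt)
{
    assert(requested_threads >= 1);

    /* With backend multi-threading disabled, always pass 1 so the
     * multi-threaded match finder is never selected. */
    if (nobemt)
        return 1;

    /* Otherwise keep the existing behaviour: 2 when -p# > 1, else 1. */
    return requested_threads > 1 ? 2 : 1;
}
```

With this shape, `lzma_thread_arg(4, false)` keeps the current behaviour and yields 2, while `lzma_thread_arg(4, true)` yields 1 regardless of the -p# value.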

@Darkyere , how does this sound?

@pete4abw
Owner Author

@Darkyere See whats-next branch to compile updates for --nobemt option. Remember, the rzip preprocessor holds one thread so there will always be one more thread used than requested.

@pete4abw
Owner Author

@Darkyere , it should be noted that disabling multi threading in lzma will result in slower compression. Also, lower compression. Here are some stats:

Using -L9 -p3 --nobemt

linux-6.x.tar - Compression Ratio: 9.987. bpb: 0.801. Average Compression Speed: 5.233MB/s.
Total time: 00:04:52.61

Not using --nobemt

linux-6.x.tar - Compression Ratio: 9.985. bpb: 0.801. Average Compression Speed: 7.527MB/s.
Total time: 00:03:22.91
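
To put the two runs side by side, the total times can be converted to seconds and divided. The sketch below uses only the figures quoted above; the helper names `to_seconds` and `slowdown` are made up for illustration. On these numbers the --nobemt run takes roughly 1.44 times as long (292.61 s against 202.91 s).

```c
#include <assert.h>

/* Convert an h:mm:ss.ss wall-clock time to seconds. */
double to_seconds(int hours, int minutes, double seconds)
{
    assert(hours >= 0 && minutes >= 0 && seconds >= 0.0);
    return hours * 3600.0 + minutes * 60.0 + seconds;
}

/* How many times longer the run without backend multi-threading took
 * compared with the run that used it. */
double slowdown(double single_threaded_s, double multi_threaded_s)
{
    assert(multi_threaded_s > 0.0);
    return single_threaded_s / multi_threaded_s;
}
```

For example, `slowdown(to_seconds(0, 4, 52.61), to_seconds(0, 3, 22.91))` compares the two totals reported in this comment.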

@Darkyere

I am very happy you actually want to try to implement a way to control the threads,
even if it will cost compression time and size in the end.

I am a bit in doubt whether it is properly implemented yet.

I tried a few experiments and couldn't understand why both -p4 and -p5 created 4 threads every time.

I then tried the following command: lrzip-next -L9 -p2 --nobemt.
But it seemed to spawn 4 threads, even though it should have created 3, one of which should have been for rzip.


I thought I would check the case that worked before in the old lrzip, where I would just compress a tar file.
Or, more precisely, a normal vzdump with no compression,
since for some reason tar refuses to accept the stdout from vzdump.

It also created 4 threads with the command lrzip-next -L9 -p2 --nobemt


I of course have no clue what is going on. I just thought I should post my findings.

@pete4abw
Owner Author

pete4abw commented Aug 15, 2023 via email

@pete4abw
Owner Author

@Darkyere , I ran the program and cannot duplicate your issue.

Run with -vv -p2, then -vv -p1 and then stop the run after all verbose details show. You will see output like:

$ lrzip-next -vvL9 -p2 --nobemt file
The following options are in effect for this COMPRESSION.
Threading is ENABLED. Number of CPUs detected: 2
Detected 16,538,464,256 bytes ram
Nice Value: 19
Show Progress
Max Verbose
Temporary Directory set as: /tmp/
Compression mode is: LZMA. LZ4 Compressibility testing enabled
Compression level 7
RZIP Compression level 7
Initial LZMA Dictionary Size: 33,554,432
No Backend Multi Threading
MD5 Hashing Used
Heuristically Computed Compression Window: 105 = 10,500MB
Storage time in seconds 1,393,426,958
File size: 1,602,304,000
Succeeded in testing 1,602,304,000 sized mmap for rzip pre-processing
Will take 1 pass
Chunk size: 1,602,304,000
Byte width: 4
Per Thread Memory Overhead is 392,183,808
Succeeded in testing 2,778,855,424 sized malloc for back end compression
Using up to 3 threads to compress up to 534,102,016 bytes each.
Beginning rzip pre-processing phase

Show this output and maybe I can help. NO PIPE.

@pete4abw
Owner Author

I cannot duplicate your problem. All seems right, even with pipe.
tar -cf - /share/software/Kernel/linux-6.x | ./wn/src/lrzip-next -vv -L9 -p3 --nobemt -o test.tar.lrz


@Darkyere

I started over with a new attempt:
a new git clone, a checkout of whats-next, compiling, etc.

I then ran the command with -vv, and this is the output of its settings.

vzdump 100 --compress 0 --stdout | lrzip-next -vv -L9 -p3 --nobemt -o /Proxmox-BackUp/Manual-vzdump/vzdump-lrzip-next-vv-L9-p2---nobemt.lrzip
The following options are in effect for this COMPRESSION.
Threading is ENABLED. Number of CPUs detected: 3
Detected 16,696,479,744 bytes ram
Nice Value: 19
Show Progress
Max Verbose
Output Filename Specified: /Proxmox-BackUp/Manual-vzdump/vzdump-lrzip-next-vv-L9-p2---nobemt.lrzip
Temporary Directory set as: /tmp/
Compression mode is: LZMA. LZ4 Compressibility testing enabled
Compression level 9
RZIP Compression level 9
Initial LZMA Dictionary Size: 134,217,728
No Backend Multi Threading
MD5 Hashing Used
Heuristically Computed Compression Window: 53 = 5,300MB
Storage time in seconds 1,393,468,201

Here it seems to be respecting the threads and using "No Backend Multi Threading", which I suppose is because of --nobemt.


Now I am a bit curious about something because of the extra output.
Is the fourth thread perhaps for creating a hash?
When this output appeared
Malloced 5,565,493,248 for checksum ckbuf
a fourth thread was spawned.
In that case it would still be using 2 threads plus 1 for rzip, and then 1 would appear for hashing, which I would still consider as doing what was intended.

If this is not the case:
I also noticed that it spawned the fourth thread when it reached around 8 or 9 GB in the compression.
Could this be triggered by the amount being compressed?

Just throwing out some ideas to better understand what's going on.

Best regards,
Darkyere

@pete4abw
Owner Author

@Darkyere When you examine the tree entries for lrzip-next, observe the CPU%. Under heavy use, it should equal 100 x # threads requested or threads + 1. There may be an extra thread line, but it may show 0% CPU use. Without a custom pthread library, or debugging it, there is nothing more I can do here. If you're satisfied with the output of --nobemt, then let me know and I'll push it to the main branch. Otherwise, I don't think I can take time to debug this special case anymore. Thank you for bringing this issue up!
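
For anyone who wants to check the thread count without a process viewer, one option on Linux is to count the entries under /proc/<pid>/task, which holds one directory per thread. The sketch below is not part of lrzip-next; `count_threads` and `max_expected_cpu_percent` are hypothetical helper names, and the CPU% bound simply restates the rule of thumb above (100 x (threads requested + 1)).

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>   /* getpid(), used in the usage note below */

/* Count the threads of a process by listing /proc/<pid>/task (Linux only).
 * Returns -1 if the directory cannot be opened. */
int count_threads(pid_t pid)
{
    char path[64];

    assert(pid > 0);
    snprintf(path, sizeof path, "/proc/%ld/task", (long)pid);

    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        /* Skip "." and ".."; every other entry is named after a thread ID. */
        if (entry->d_name[0] != '.')
            count++;
    }
    closedir(dir);
    return count;
}

/* Upper bound on CPU% from the rule of thumb in this thread:
 * the requested threads plus one for the rzip preprocessor. */
int max_expected_cpu_percent(int requested_threads)
{
    assert(requested_threads >= 1);
    return 100 * (requested_threads + 1);
}
```

Calling `count_threads(getpid())` reports the calling process's own threads; passing the PID of a running lrzip-next process reports its threads instead. A thread that exists but sits idle still counts here, which matches the note above that an extra thread line may show 0% CPU use.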
