From bedb13500a82b2d14de9618addffca5d2ddfc787 Mon Sep 17 00:00:00 2001
From: Dan Riley <Daniel.Riley@cornell.edu>
Date: Wed, 13 Jul 2016 10:12:15 -0400
Subject: [PATCH] Adjust dictionary size downward for small baskets

By default, LZMA creates very large hash tables for its dictionaries; e.g., at compression level 4 the hash table has 4Mi 4-byte entries, 16 MiB in total. The hash table has to be zeroed before use, so it is allocated via calloc(), which means every page has to be allocated, mapped and written. ROOT baskets are often much smaller than the default LZMA dictionaries. For small baskets the large dictionary gives very little compression benefit, while zeroing the hash table can cost more than the actual compression.

Since R__zipLZMA() is actually used to compress a buffer of known size, not a stream, we can use the size of the buffer to estimate an appropriate dictionary size. This PR uses a slightly more advanced part of the liblzma API (lzma_lzma_preset() plus lzma_stream_encoder() with an explicit LZMA2 filter chain, instead of lzma_easy_encoder()) to set the dictionary size to 1/4 of the input buffer size, but never below LZMA_DICT_SIZE_MIN, whenever that is smaller than the default size for the selected preset compression level. In tests with CMS data, this increases the output size by less than 1% and (in one test job) reduces the total job run time by 25%, with LZMA compression time reduced by 80% (the time saved had all been spent in memset() zeroing the hash table).

I also tested this with the "Event" test program with Brian's changes from #59. With the same test parameters as Brian ("./Event 4000 6 99 1 1000 2"), I get

ZLIB level-6: 14.4 MB/s
Original LZMA level-6: 2.3 MB/s
Modified LZMA level-6: 3.0 MB/s

With 100 tracks per event (and hence smaller baskets) the improvement is from 2.2 MB/s to 3.9 MB/s.

This change should be fully transparent and backwards compatible.
---
 core/lzma/src/ZipLZMA.c | 27 ++++++++++++++++++++++++---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/core/lzma/src/ZipLZMA.c b/core/lzma/src/ZipLZMA.c
index 677fdb087fb26..b7c16d8d04628 100644
--- a/core/lzma/src/ZipLZMA.c
+++ b/core/lzma/src/ZipLZMA.c
@@ -19,7 +19,13 @@ void R__zipLZMA(int cxlevel, int *srcsize, char *src, int *tgtsize, char *tgt, i
 {
    uint64_t out_size;             /* compressed size */
    unsigned in_size   = (unsigned) (*srcsize);
+   uint32_t dict_size_est = in_size/4;
    lzma_stream stream = LZMA_STREAM_INIT;
+   lzma_options_lzma opt_lzma2;
+   lzma_filter filters[] = {
+      { .id = LZMA_FILTER_LZMA2, .options = &opt_lzma2 },
+      { .id = LZMA_VLI_UNKNOWN,  .options = NULL },
+   };
    lzma_ret returnStatus;
 
    *irep = 0;
@@ -33,9 +39,24 @@ void R__zipLZMA(int cxlevel, int *srcsize, char *src, int *tgtsize, char *tgt, i
    }
 
    if (cxlevel > 9) cxlevel = 9;
-   returnStatus = lzma_easy_encoder(&stream,
-                                    (uint32_t)(cxlevel),
-                                    LZMA_CHECK_CRC32);
+
+   if (lzma_lzma_preset(&opt_lzma2, cxlevel)) {
+      return;
+   }
+
+   if (LZMA_DICT_SIZE_MIN > dict_size_est) {
+      dict_size_est = LZMA_DICT_SIZE_MIN;
+   }
+   if (opt_lzma2.dict_size > dict_size_est) {
+      /* reduce the dictionary size if larger than 1/4 the input size; preset
+         dictionary sizes can be expensively large
+       */
+      opt_lzma2.dict_size = dict_size_est;
+   }
+
+   returnStatus = lzma_stream_encoder(&stream,
+                                      filters,
+                                      LZMA_CHECK_CRC32);
    if (returnStatus != LZMA_OK) {
       return;
    }