Improve compressed NRRD read performance #92
Merged
This PR makes two changes to improve the performance of the `nrrd.read` function for compressed NRRD files, i.e. gzip or bzip2 compressed data.

The first change is to switch the decompressed data buffer from a `bytes` object to a `bytearray`. A `bytes` object is immutable, so appending to it requires creating a new object in memory that contains the combined data. A `bytearray` is a mutable object similar to `bytes`, except that appending to a `bytearray` adds the data to the existing array and allocates memory only when necessary. In addition, importing the data into a NumPy array is switched from `np.fromstring` to `np.frombuffer`. This is done because we no longer have a string and because `np.fromstring` raises a deprecation warning. Performance tests (see the issue for more details) show a large speedup from this improvement.
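To illustrate the difference, here is a toy example rather than the actual pynrrd code:

```python
import numpy as np

# Appending to an immutable bytes object builds a brand-new object and
# copies all previously decompressed data each time (O(n^2) over a loop).
data = b''
data += b'chunk of decompressed bytes'

# A bytearray is mutable, so += extends it in place and only allocates
# more memory when necessary (amortized O(1) per append).
buffer = bytearray()
buffer += b'chunk of decompressed bytes'

# np.frombuffer accepts any object supporting the buffer protocol and
# replaces the deprecated np.fromstring.
array = np.frombuffer(buffer, dtype=np.uint8)
```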
The second change fine-tunes the chunk size parameter and how it is used in `nrrd.read`. Previously, the chunk size was set to 1 MB: a 1 MB chunk was read from the file and then decompressed, which is inefficient for larger files. This PR changes the chunk size to 1 GB and also changes the logic so that the entire compressed data is read at once and only the decompression is chunked. The reasoning is detailed below.

Based on an initial analysis, increasing the chunk size for `nrrd.write` actually increased the time required to write the file. The exact difference may vary between machines, but the general trend should be consistent. With that, the write chunk size was kept at 1 MB to conserve RAM while writing. The experiment writes random data, which likely affects the compression ratio, but additional tests with non-random data gave similar results.

Experiment for writing large amounts of data with various chunk sizes:
https://gist.github.com/addisonElliott/097de1ca1311026e2e116541c9eed0c5#file-write_experiment-py
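For context, chunked writing of compressed data looks roughly like the sketch below. This is a simplified illustration with an assumed helper name, not pynrrd's exact writer; the experiment above varies `chunk_size`.

```python
import zlib

def write_compressed(fh, raw_data, chunk_size=2 ** 20):
    """Compress raw bytes in chunks and write them to an open file handle.

    Simplified sketch; 16 + MAX_WBITS makes zlib emit a gzip-compatible stream.
    """
    compressor = zlib.compressobj(9, zlib.DEFLATED, zlib.MAX_WBITS | 16)

    # Feed the raw data to the compressor chunk_size bytes at a time so that
    # only one chunk of compressed output needs to be held before writing.
    for start in range(0, len(raw_data), chunk_size):
        fh.write(compressor.compress(raw_data[start:start + chunk_size]))

    # Flush whatever the compressor is still buffering internally.
    fh.write(compressor.flush())
```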
Changing the `nrrd.read` chunk size alone does not improve performance. Upon analysis, there was a large delay when calling `fh.read(CHUNK_SIZE)` with a large chunk size, for both small and large files. For example, a 3 kB file took 0.7 s to return with a 1 GB chunk size, versus hundreds of microseconds with a 1 MB chunk size. The experiment linked below shows a speedup for large files but an almost 50% slowdown for smaller files.

Experiment for reading small/large files with various chunk sizes:
https://gist.github.com/addisonElliott/097de1ca1311026e2e116541c9eed0c5#file-read_normal_experiment-py
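The delay is easy to reproduce with a quick timing check along these lines (the file name is a placeholder for any small compressed file); the slowdown presumably comes from the buffer that is allocated up front for the requested size:

```python
import time

def time_read(filename, chunk_size):
    """Time a single fh.read(chunk_size) call on the given file."""
    with open(filename, 'rb') as fh:
        start = time.perf_counter()
        fh.read(chunk_size)
        return time.perf_counter() - start

# 'small_file.nrrd' is a placeholder for a ~3 kB compressed NRRD file.
print('1 MB chunk size:', time_read('small_file.nrrd', 2 ** 20))
print('1 GB chunk size:', time_read('small_file.nrrd', 2 ** 30))
```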
As mentioned above, the resolution to the slowdown with a larger chunk size on smaller files is to read the entire file into memory at once using `fh.read()` (no argument). This has the disadvantage of using additional memory, but reading a raw-encoded NRRD file already loads the entire file into memory, and the data needs to be in memory anyway when it is converted to a NumPy array. Furthermore, the data being read will be decompressed, so a user who expects to hold the uncompressed data in RAM should also have room for the smaller compressed copy.

With the entire compressed file now read in at once, the chunk size is set to 1 GB to improve performance when reading larger files while keeping the same performance for smaller files. The chunk size now only controls how much data is decompressed at a time.
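Putting the read-side changes together, the approach amounts to something like the following sketch (gzip-only for brevity; bzip2 would use `bz2.BZ2Decompressor` instead, and the names here are illustrative rather than pynrrd's actual ones):

```python
import zlib

import numpy as np

# 1 GB: now only bounds how much compressed data is decompressed per call.
_READ_CHUNKSIZE = 2 ** 30

def read_compressed_data(fh, dtype):
    """Read an entire compressed payload at once and decompress it in chunks."""
    # Read the whole compressed stream in a single call; a raw-encoded NRRD
    # would be loaded entirely into memory anyway, and the compressed copy is
    # smaller than the decompressed result the user must hold regardless.
    compressed_data = fh.read()

    # 32 + MAX_WBITS lets zlib auto-detect zlib/gzip headers.
    decompressor = zlib.decompressobj(zlib.MAX_WBITS | 32)
    decompressed_data = bytearray()

    # Chunking applies only to the decompression step now.
    for start in range(0, len(compressed_data), _READ_CHUNKSIZE):
        chunk = compressed_data[start:start + _READ_CHUNKSIZE]
        decompressed_data += decompressor.decompress(chunk)

    return np.frombuffer(decompressed_data, dtype=dtype)
```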
One potential concern with a larger chunk size is that older versions of Python have an issue where zlib is unable to decompress data larger than 4 GB. See issue #21 for more details. However, as far as I can tell, this is fixed in the latest version of Python 2.7 and in all versions of Python 3; see the sources here and here for more information. Note that these fixes came out after the `pynrrd` issue was reported.

The fixed benchmark can be seen here:
https://gist.github.com/addisonElliott/097de1ca1311026e2e116541c9eed0c5#file-read_fixed_experiment-py
Note that performance is similar for small files, and for the 1 GB file with a 1 GB chunk size, performance is increased by ~15%.
Fixes issue #88