
Datarate dependent compressor #358

Merged (8 commits into master, Feb 8, 2021)

Conversation

@JoranAngevaare (Contributor) commented on Feb 5, 2021:

What is the problem / what does the code in this PR do
Based on Ivy's nice study, it's clear we can do a better job of compressing the data at the DAQ. Especially when the rate is low, we can use more computationally expensive algorithms.

What does this code do
Use three different compressors, chosen based on:

  1. Chunk size (some compressors have a maximum buffer size)
  2. Data rate (if the data rate is high, we don't have time for expensive compression)

I hope @napoliion might at some point open a PR with a better-motivated distinction between these compressors. For the moment, these numbers are rough guesses and don't yet consider potential compressors like LZMA.
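For concreteness, a minimal sketch of the selection logic described above (the function name, thresholds, and the fallback compressor for oversized chunks are illustrative guesses, not necessarily what this PR implements):

def choose_compressor(datarate_mbs, chunk_size_mb):
    """Pick a compressor from the DAQ data rate (MB/s) and chunk size (MB)."""
    if datarate_mbs < 50:
        # Low data rate: CPU time to spare, so use a slow, high-ratio compressor.
        return 'bz2'
    if chunk_size_mb > 1800:
        # Oversized chunk: avoid compressors with a ~2 GB (31-bit) buffer
        # limit (that lz4 is the safe fallback here is an assumption).
        return 'lz4'
    # Default: zstd compresses well at high throughput.
    return 'zstd'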

@ershockley (Contributor) left a comment:

Perfect, thanks @jorana! Between this and #343 I am quite happy :)

bin/bootstrax (Outdated) — diff excerpt:
data_rate = int(sum([d['rate'] for d in docs]))
data_rate = None
started_looking = time.time()
while data_rate is None:
@JoranAngevaare (Contributor, Author) commented:

@darrylmasson I think this is why we might have seen more failures than usual at the eb lately. Before this nice aggregation, we had a check that data_rate actually returned something. Now we should be in good shape again.
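For illustration, the restored guard might look like this (a sketch; get_docs, the 1 s poll interval, and the 60 s timeout are assumptions, not the actual bootstrax code):

import time

def wait_for_data_rate(get_docs, timeout=60):
    # Poll the aggregated per-detector rate until the dispatcher fills it
    # in; return None after `timeout` seconds instead of hanging forever.
    started_looking = time.time()
    data_rate = None
    while data_rate is None:
        docs = get_docs()  # hypothetical: fetch per-detector status documents
        if docs and all('rate' in d for d in docs):
            data_rate = int(sum(d['rate'] for d in docs))
        elif time.time() - started_looking > timeout:
            return None
        else:
            time.sleep(1)
    return data_rate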

Contributor commented:

Alternatively, if we make sure the run hasn't started within the last 10 or 15 seconds, we can be sure that the dispatcher has been through a few update cycles.
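A sketch of that alternative (function and argument names are hypothetical):

import time

def dispatcher_has_caught_up(run_start_timestamp, min_age_s=15):
    # Only trust the aggregated rate once the run is old enough for the
    # dispatcher to have completed a few update cycles.
    return time.time() - run_start_timestamp > min_age_s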

@JoranAngevaare (Contributor, Author) commented:

Sure, this is what I had at first, but I decided against it because it would lead to unnecessary waiting time: 96364e9

That was also because I had set the wait to 1 minute rather than 10 s :)

@JoranAngevaare (Contributor, Author) commented:

Solved in b9c6220.

@JoranAngevaare added the 'enhancement' (New feature or request) label on Feb 6, 2021.
bin/bootstrax (Outdated) — diff excerpt:
if datarate < 50:
# Low datarate, we can do very large compression
return 'bz2'
if chunk_size_mb > 1000:
Contributor commented:

This is a very conservative value. We should be able to go to 1.8 GB and still have ~15% of headroom between us and the 31-bit issue. Given that zstd is squeezier (and has higher throughput), I think we should try to use it as much as possible.
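For reference, the headroom arithmetic behind that claim (a quick check, assuming the limit in question is the signed-32-bit buffer boundary):

limit = 2**31  # 2147483648 bytes, the 31-bit boundary
cap = 1.8e9    # proposed maximum chunk size in bytes
print(f'{(limit - cap) / limit:.0%} headroom')  # prints: 16% headroom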

@JoranAngevaare (Contributor, Author) commented:

Fair, you are right:

import numpy as np
import strax

a = np.zeros(int(2e8), dtype=np.int64)  # 2e8 int64 values = 1.6 GB
print(f'Buffer of {a.nbytes / 1e9} GB')
strax.save_file('test.test', a, compressor='zstd')

However, we need to keep in mind that we don't want to run into issues where just one chunk is much larger than the others, preventing us from saving the file.

Nevertheless, I agree; let's set it to 1.8 GB.

@JoranAngevaare (Contributor, Author) commented:

Solved in ee5277a.

@JoranAngevaare merged commit 6db3444 into master on Feb 8, 2021.
@JoranAngevaare deleted the datarate_dependent_compressor branch on February 8, 2021 at 09:27.