Datarate dependent compressor #358
Conversation
bin/bootstrax (outdated)
```diff
-data_rate = int(sum([d['rate'] for d in docs]))
+data_rate = None
+started_looking = time.time()
+while data_rate is None:
```
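For context, a minimal sketch of what this polling loop might look like in full. The `get_rate_docs` callable and the 60 s timeout are illustrative assumptions (the timeout matches the "1 minute" mentioned below), not the actual bootstrax code:

```python
import time

TIMEOUT = 60  # seconds; hypothetical, matching the 1-minute value discussed below

def wait_for_data_rate(get_rate_docs):
    """Poll the per-reader rate documents until an aggregate rate appears.

    get_rate_docs is a hypothetical callable standing in for whatever
    query bootstrax uses to fetch the DAQ rate documents.
    """
    data_rate = None
    started_looking = time.time()
    while data_rate is None:
        docs = get_rate_docs()
        if docs:
            # Aggregate the per-reader rates into one number (MB/s)
            data_rate = int(sum(d['rate'] for d in docs))
        elif time.time() - started_looking > TIMEOUT:
            raise RuntimeError(f'No data rate reported within {TIMEOUT} s')
        else:
            time.sleep(1)  # don't hammer the database while waiting
    return data_rate
```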
@darrylmasson I think this is why we might have seen more failures than usual at the eb lately. Before this nice aggregation, we had a check that data_rate actually returned something. Now we should be in good shape again.
Alternatively, if we make sure the run hasn't started within the last 10 or 15 seconds, we can be sure that the dispatcher has been through a few update cycles.
Sure, this is what I had at first, but I then decided against it because it would lead to unnecessary waiting time: 96364e9
That was also because I had set the time to 1 minute rather than 10 s :)
Solved in b9c6220
bin/bootstrax (outdated)
```python
if datarate < 50:
    # Low datarate, we can do very large compression
    return 'bz2'
if chunk_size_mb > 1000:
```
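To make the selection logic concrete, here is a hedged sketch of the full three-way choice. The 50 MB/s and 1800 MB thresholds come from this PR (the latter after ee5277a); using 'lz4' as the fast fallback for very large chunks or missing rate information is my assumption, not necessarily what bootstrax ends up doing:

```python
def choose_compressor(datarate, chunk_size_mb):
    """Pick a strax compressor from the DAQ data rate (MB/s) and chunk size (MB).

    A sketch of the logic discussed in this PR; the thresholds are the
    rough guesses from the diff, and 'lz4' as fallback is an assumption.
    """
    if datarate is None:
        # No rate information yet: fall back to a fast, safe compressor
        return 'lz4'
    if datarate < 50:
        # Low datarate, we can afford very expensive compression
        return 'bz2'
    if chunk_size_mb > 1800:
        # Very large chunks: stay well clear of the 31-bit buffer limit
        return 'lz4'
    # zstd compresses well and has high throughput
    return 'zstd'
```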
This is a very conservative value. We should be able to go to 1.8G and still have 15% overhead between us and the 31-bit issue. Given that zstd is squeezier (and has higher throughput), I think we should try to use that as much as possible.
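As a quick sanity check on that headroom figure (my arithmetic, not from the PR): the 31-bit limit is 2^31 bytes ≈ 2.15 GB, so 1.8 GB chunks indeed leave roughly 15%:

```python
limit_gb = 2**31 / 1e9                        # ~2.147 GB, the 31-bit buffer limit
print(f'{1 - 1.8 / limit_gb:.0%} headroom')   # -> 16%
```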
Fair, you are right:
```python
import numpy as np
import strax

a = np.zeros(int(2e8), dtype=np.int64)  # 2e8 * 8 bytes = 1.6 GB
print(f'Buffer of {a.nbytes / 1e9} GB')
strax.save_file('test.test', a, compressor='zstd')
```
However, we need to keep in mind that we don't want to run into issues where just one chunk is more chunky than the others, thereby preventing us from saving the file.
Nevertheless, I agree, let's set it to 1.8 GB.
Solved in ee5277a
What is the problem / what does the code in this PR do
Based on Ivy's nice study, it's clear we can do a better job of compressing the data at the DAQ. Especially when the rate is low, we can afford more computationally expensive algorithms.
What does this code do
Use three different compressors, chosen from the data rate and the chunk size (as in the sketch above).
I hope @napoliion might at some point open a PR with a better-motivated distinction between these compressors. For the moment, these numbers are rough guesses and don't consider potential compressors like LZMA.