
Datarate dependent compressor #358

Merged (8 commits into master, Feb 8, 2021)

Conversation

@JoranAngevaare (Contributor) commented on Feb 5, 2021:

What is the problem / what does the code in this PR do
Based on Ivy's nice study, it's clear we can do a better job of compressing the data at the DAQ. Especially when the rate is low, we can use more computationally expensive algorithms.

What does this code do
Use three different compressors, chosen based on:

  1. Chunk size (some compressors have a maximum buffer size)
  2. Data rate (if the data rate is high, we don't have time for expensive compression)

I hope @napoliion might at some point open a PR with a better-motivated distinction between these compressors. For the moment, these numbers are rough guesses and don't yet consider potential compressors like LZMA.
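For concreteness, a minimal sketch of the selection logic described above (the function name, thresholds, and the fallback compressor for oversized chunks are illustrative guesses, not necessarily what this PR implements):

def choose_compressor(datarate_mbs, chunk_size_mb):
    """Pick a compressor from the DAQ data rate (MB/s) and chunk size (MB)."""
    if datarate_mbs < 50:
        # Low data rate: CPU time to spare, so use a slow, high-ratio compressor.
        return 'bz2'
    if chunk_size_mb > 1800:
        # Oversized chunk: avoid compressors with a ~2 GB (31-bit) buffer
        # limit (that lz4 is the safe fallback here is an assumption).
        return 'lz4'
    # Default: zstd compresses well at high throughput.
    return 'zstd'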

@ershockley (Contributor) left a comment:

Perfect, thanks @jorana! Between this and #343 I am quite happy :)

bin/bootstrax (Outdated) — diff excerpt:
data_rate = int(sum([d['rate'] for d in docs]))
data_rate = None
started_looking = time.time()
while data_rate is None:
@JoranAngevaare (Contributor, Author) commented:

@darrylmasson I think this is why we might have seen more failures than usual at the eb lately. Before this nice aggregation, we had a check that data_rate actually returned something. Now we should be in good shape again.
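For illustration, the restored guard might look like this (a sketch; get_docs, the 1 s poll interval, and the 60 s timeout are assumptions, not the actual bootstrax code):

import time

def wait_for_data_rate(get_docs, timeout=60):
    # Poll the aggregated per-detector rate until the dispatcher fills it
    # in; return None after `timeout` seconds instead of hanging forever.
    started_looking = time.time()
    data_rate = None
    while data_rate is None:
        docs = get_docs()  # hypothetical: fetch per-detector status documents
        if docs and all('rate' in d for d in docs):
            data_rate = int(sum(d['rate'] for d in docs))
        elif time.time() - started_looking > timeout:
            return None
        else:
            time.sleep(1)
    return data_rate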

Contributor commented:

Alternatively, if we make sure the run hasn't started within the last 10 or 15 seconds, we can be sure that the dispatcher has been through a few update cycles.
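A sketch of that alternative (function and argument names are hypothetical):

import time

def dispatcher_has_caught_up(run_start_timestamp, min_age_s=15):
    # Only trust the aggregated rate once the run is old enough for the
    # dispatcher to have completed a few update cycles.
    return time.time() - run_start_timestamp > min_age_s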

@JoranAngevaare (Contributor, Author) commented:

Sure, this is what I had at first, but I decided against it because it would lead to unnecessary waiting time: 96364e9

That was also because I had set the wait to 1 minute rather than 10 s :)

@JoranAngevaare (Contributor, Author) commented:

Solved in b9c6220.

@JoranAngevaare added the 'enhancement' (New feature or request) label on Feb 6, 2021.
bin/bootstrax (Outdated) — diff excerpt:
if datarate < 50:
# Low datarate, we can do very large compression
return 'bz2'
if chunk_size_mb > 1000:
Contributor commented:

This is a very conservative value. We should be able to go to 1.8 GB and still have ~15% of headroom between us and the 31-bit issue. Given that zstd is squeezier (and has higher throughput), I think we should try to use it as much as possible.
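For reference, the headroom arithmetic behind that claim (a quick check, assuming the limit in question is the signed-32-bit buffer boundary):

limit = 2**31  # 2147483648 bytes, the 31-bit boundary
cap = 1.8e9    # proposed maximum chunk size in bytes
print(f'{(limit - cap) / limit:.0%} headroom')  # prints: 16% headroom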

@JoranAngevaare (Contributor, Author) commented:

Fair, you are right:

import numpy as np
import strax

a = np.zeros(int(2e8), dtype=np.int64)  # 2e8 int64 values = 1.6 GB
print(f'Buffer of {a.nbytes / 1e9} GB')
strax.save_file('test.test', a, compressor='zstd')

However, we need to keep in mind that we don't want to run into issues where just one chunk is much larger than the others, preventing us from saving the file.

Nevertheless, I agree; let's set it to 1.8 GB.

@JoranAngevaare (Contributor, Author) commented:

Solved in ee5277a.

@JoranAngevaare merged commit 6db3444 into master on Feb 8, 2021.
@JoranAngevaare deleted the datarate_dependent_compressor branch on February 8, 2021 at 09:27.