How to use ZstdCompressionDict? Some real-world examples? #155

ewerybody · 2021-06-17T18:06:48Z


Hello!
I'm having a hard time digging through the documentation on Read the Docs...

First of all: this is not about Python dictionaries, right? That confused me a lot at the beginning. Maybe the docs could state this explicitly? :)

So here data is just the data I want to compress, as bytes:

>>> dict_data = zstandard.ZstdCompressionDict(data)

If we want to stick to the Python dict idea: this could very well be data = json.dumps(py_dict).encode(), right?

Now here it gets tricky:

>>> dict_data = zstandard.train_dictionary(size, samples)
>>> dict_size = len(dict_data)  # will not be larger than ``size``

So size is some integer, but one line later we get dict_size, which is apparently different? Or is it? Why is it just the byte size of the whole dictionary? In the tests there is a function get_optimal_dict_size_heuristically(), but why is this optimal? A comment there would help :)

And samples: apparently this is a list! But how should I ideally select the samples? The main docs page says:

Training Zstandard is achieved by providing it with a few samples (one file per sample).

So, given the JSON dump from above, we don't have any files yet. :)

I'm probably being really dumb and all this is rather obvious. Maybe I just need a little hint to pull this off.
Would someone be so kind as to explain?
Thanks in advance!
ëRiC
