Hello!
I'm having a hard time digging through readthedocs ...
First of all: this is not about Python dictionaries, right? That had me super confused in the beginning. Maybe this could be stated somewhere in the docs? :)
So here data is just my data to compress as bytes:
If we want to stick to the Python dict idea: this could very well be data = json.dumps(py_dict).encode("utf-8"), right?
Now here it gets tricky:
>>> dict_data = zstandard.train_dictionary(size, samples)
>>> dict_size = len(dict_data) # will not be larger than ``size``
So size is some integer, but one line later we get dict_size, which is obviously different? Or is it? Why is it just the whole data byte size? In the tests there is a function get_optimal_dict_size_heuristically(). But why is this optimal? A comment would be good there :)
And samples: apparently this is a list! But how should I ideally select the samples? On the main docs page it says:
Training Zstandard is achieved by providing it with a few samples (one file per sample).
So, given the JSON dump from above, we don't have any files yet. :)
Probably I'm being really, really dumb and all this is rather obvious. Maybe I just need a little hint to pull this off.
Would someone be so kind as to help me by explaining this?
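To make the question concrete, here is a minimal sketch of how I currently understand it. It assumes a "sample" is simply one bytes object (here, one JSON-serialized record) rather than a literal file on disk. The `records` data and the helper names `make_samples` and `train_and_roundtrip` are made up for illustration, and the default `size` is an arbitrary guess, not a recommended value. Only `train_and_roundtrip` needs the third-party `zstandard` package.

```python
import json


def make_samples(records):
    """Serialize each record (a Python dict) to UTF-8 JSON bytes, one sample per record."""
    return [json.dumps(record, sort_keys=True).encode("utf-8") for record in records]


# Hypothetical data: many small, similarly structured records.
records = [{"id": i, "name": f"user{i}", "active": i % 2 == 0} for i in range(1000)]
samples = make_samples(records)


def train_and_roundtrip(samples, size=16384):
    """Train a dictionary on the samples, then compress and decompress one sample with it.

    `size` is the requested maximum dictionary size in bytes (an arbitrary
    choice here); the trained dictionary may come out smaller.
    """
    import zstandard  # third-party package, imported lazily

    dict_data = zstandard.train_dictionary(size, samples)
    compressor = zstandard.ZstdCompressor(dict_data=dict_data)
    decompressor = zstandard.ZstdDecompressor(dict_data=dict_data)

    compressed = compressor.compress(samples[0])
    restored = decompressor.decompress(compressed)
    return len(dict_data), restored == samples[0]
```

If that reading is right, calling `train_and_roundtrip(samples)` would return the actual dictionary size together with a flag saying whether the round trip restored the original bytes. Training may still fail if the samples are too few or too small, so the sample count here is a guess as well.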
Thanks already!
ëRiC