ENH: Make zstd compressed index files available #648
Comments
Thanks for opening this issue, that seems reasonable to me, but I'm wondering if @barabo has opinions on this? |
How would you feel about supporting on-the-fly Content-Encoding: zstd? I wish we could stop generating repodata.json and then one or more compressed copies of the same. |
On the fly |
Not sure, we're talking about anaconda.org dynamically generated channels and not defaults / conda-forge static hosted channels? |
I'm not quite sure what the difference is. The purpose is to speed up download of things like I don't know whether |
Closed by conda/conda-index#65. Thank you @dholth 🎉 When will the |
It is available for repo.anaconda.com defaults. It will be available on conda-forge after the channel clone system update is deployed. |
@dholth thanks a lot for this, great work! Do you have an indication of when the channel clone system update will be deployed? It would be great if you could comment here once it is, so that it can be tested. |
Quick note that Anaconda is on a company holiday today. |
@corneliusroemer repodata.json.zst should be available on conda-forge, and on repo.anaconda.com/main (defaults)! Please experiment. Is it byte-identical? Are the caches invalidated at the same time? How's the speed? |
@dholth that's fantastic news 🎉 I can confirm that
There seem to be some differences, most notably the Here is a diff from zstd to uncompressed: Calculated as follows:
So the zstd-compressed index appears to be a superset of the uncompressed one. cc @jonashaag for your PR for mamba support |
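For readers who want to repeat this kind of comparison, here is a minimal sketch of one way to diff two indexes. It is not the command used in the comment above, which is not preserved here. It assumes both files have already been downloaded, that the .zst copy has been decompressed (for example with the zstd command-line tool), and that both have been parsed with json.load. The function name diff_repodata and the shape of its report are hypothetical, chosen only for illustration.

```python
from typing import Any, Dict, List, Sequence


def diff_repodata(
    a: Dict[str, Any],
    b: Dict[str, Any],
    sections: Sequence[str] = ("packages", "packages.conda"),
) -> Dict[str, Dict[str, List[str]]]:
    """Compare two parsed repodata.json documents section by section.

    For each section, report which package filenames appear only in `a`,
    only in `b`, or in both but with differing records.
    """
    report: Dict[str, Dict[str, List[str]]] = {}
    for section in sections:
        # A section may be missing entirely; treat that as empty.
        records_a = a.get(section, {})
        records_b = b.get(section, {})
        shared = records_a.keys() & records_b.keys()
        report[section] = {
            "only_in_a": sorted(records_a.keys() - records_b.keys()),
            "only_in_b": sorted(records_b.keys() - records_a.keys()),
            "changed": sorted(k for k in shared if records_a[k] != records_b[k]),
        }
    return report
```

If `a` is the decompressed .zst index and `b` the plain one, a superset relationship would show up as non-empty "only_in_a" lists together with empty "only_in_b" and "changed" lists.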
@corneliusroemer try a cache-busting technique like curl ?random, or curl -I (what is last-modified). I don't think it is possible for it to be the same as repodata-from-packages.json. https://github.com/conda/conda-index/blob/main/conda_index/index/__init__.py#L817 |
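The two suggestions above (a random query parameter, and checking Last-Modified as curl -I would) could be sketched in Python roughly as follows. This is only an illustration using the standard library. The helper names and the parameter name "nocache" are hypothetical, and whether a given CDN treats an unknown query parameter as a new cache key is an assumption to verify.

```python
import secrets
import urllib.request
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def cache_busted(url: str, param: str = "nocache") -> str:
    """Return `url` with an extra random query parameter appended.

    `param` is an arbitrary, hypothetical name; the server is expected
    to ignore it, while caches keyed on the full URL see a new entry.
    """
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, secrets.token_hex(8)))
    return urlunsplit(parts._replace(query=urlencode(query)))


def last_modified(url: str, timeout: float = 30.0) -> Optional[str]:
    """Send a HEAD request (like `curl -I`) and return the Last-Modified header."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get("Last-Modified")


def compare_last_modified(base_url: str) -> None:
    """Print Last-Modified for an index and its .zst sibling, bypassing caches."""
    for url in (base_url, base_url + ".zst"):
        print(url, last_modified(cache_busted(url)))
```

Calling compare_last_modified with the URL of a channel's repodata.json (the exact URL is left to the reader) prints one header per file; timestamps far apart would point to a stale cached copy.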
Ah yes, that worked @dholth - appending a random URL param got me identical files, thanks!
|
Was the cached repodata.json older than your repodata.json.zst then? Normally they should all have an HTTP Last-Modified within a few seconds of each other. It's also possible to download both of them at exactly the wrong moment, when one has been updated and the other hasn't. |
The differences were reproducible - it wasn't because I downloaded one just before the other. So it looks like it was a cache thing. I didn't check the Last-Modified headers. I'll play with it more and see whether it is indeed just a stale cache. |
We will check the cache invalidation logic. |
Better: |
Works for me. What's the difference between
|
Current is the newest version of everything plus its dependencies. Conda tries that first. |
Checklist
What is the idea?
Right now, conda channel servers provide .bz2 compressed indexes. That's alright, but bz2 isn't really state of the art anymore. Instead, zstd has quickly taken over for fast compression/decompression with good compression ratios.
It would be great if you could offer zstd compressed indexes in addition to bz2 and on-the-fly gzip compressed ones.
Why is this needed?
Mamba channel index downloads are currently rate limited by gzip server compression (see #637).
While this issue could eventually get fixed, mamba developers have stated that they are not keen to add bz2 support to mamba and instead would prefer zstd compressed indexes.
It would be great if either #637 or this issue could be implemented fairly soon, as it does cause mamba to be quite a bit slower than it should be.
What should happen?
No response
Additional Context
See here for @wolfv's proposal to add zstd compressed indexes: mamba-org/mamba#2021 (comment)