Read the registry straight from the tarball #2431
A lot of tools (RegistryCI, CompatHelper, RetroCap, etc.) access registry files directly. Maybe we could have an environment variable that disables this functionality? By default we would read the registry straight from the tarball, but if you set something like ENV["JULIA_UNPACK_REGISTRY_TARBALLS"] = "true", Pkg would untar the registry tarballs so that all of the registry files are available on disk.
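A minimal sketch of how such an opt-in flag could be read. The helper name `unpack_registry_tarballs` and the set of accepted "truthy" values are assumptions for illustration only, not Pkg's actual API; unset or unrecognized values keep the proposed default of reading straight from the tarball.

```julia
# Hypothetical helper: should registry tarballs be unpacked to disk?
# Reads the JULIA_UNPACK_REGISTRY_TARBALLS variable proposed above.
# Accepted truthy values ("1", "true", "yes") are an assumption.
function unpack_registry_tarballs(env::AbstractDict = ENV)
    val = lowercase(strip(get(env, "JULIA_UNPACK_REGISTRY_TARBALLS", "")))
    return val in ("1", "true", "yes")
end

# Example: temporarily opt in for a single call.
withenv("JULIA_UNPACK_REGISTRY_TARBALLS" => "true") do
    unpack_registry_tarballs()
end
```

Taking the environment as an argument (defaulting to `ENV`) keeps the check easy to test without mutating the process environment.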
Is the underlying problem here that tar files have no global index, so you're forced to do a lot of reading and seeking just to get to one small part of it? In that case it might be an improvement to create, once at download time, a much smaller sidecar index file for the tar and store it alongside. Essentially a cache of the offsets found in
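A rough sketch of the sidecar-index idea: walk the 512-byte tar headers once, record each regular file's data offset and size, and later seek straight to an entry. This assumes an *uncompressed* tar (a gzip stream cannot be seeked, so the `.tar.gz` would first have to be decompressed to disk or memory) and only handles plain ustar entries; PAX and GNU long-name records are skipped, not interpreted. Function names are hypothetical and this is not how the PR reads the registry. Persisting the resulting `Dict` as the sidecar file is left out.

```julia
const TAR_BLOCK = 512

# Decode a NUL-terminated tar header field.
function tar_cstring(bytes::AbstractVector{UInt8})
    n = something(findfirst(==(0x00), bytes), length(bytes) + 1) - 1
    return String(bytes[1:n])
end

# One pass over the headers: Dict(path => (data_offset, size)) for regular files.
# Assumes `io` is positioned at the start of the archive.
function build_tar_index(io::IO)
    index = Dict{String,Tuple{Int,Int}}()
    pos = 0
    while true
        hdr = read(io, TAR_BLOCK)
        # A short read or an all-zero block marks the end of the archive.
        (length(hdr) < TAR_BLOCK || all(iszero, hdr)) && break
        name     = tar_cstring(@view hdr[1:100])
        sz       = parse(Int, strip(tar_cstring(@view hdr[125:136])); base = 8)
        typeflag = Char(hdr[157])
        prefix   = tar_cstring(@view hdr[346:500])   # ustar path prefix
        path     = isempty(prefix) ? name : string(prefix, '/', name)
        data_offset = pos + TAR_BLOCK
        if typeflag == '0' || typeflag == '\0'        # regular file
            index[path] = (data_offset, sz)
        end
        # Entry data is padded to a whole number of 512-byte blocks.
        padded = cld(sz, TAR_BLOCK) * TAR_BLOCK
        skip(io, padded)
        pos = data_offset + padded
    end
    return index
end

# Read one file via the index, without rescanning any headers.
function read_indexed(io::IO, index, path::AbstractString)
    off, sz = index[path]
    seek(io, off)
    return read(io, sz)
end
```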
Partially. However, reading all the headers is quite fast:
The big difference in time is likely due to the caching that exists for the other registry types, which I haven't hooked this up to yet. Once that's done, I think the timing should be fine.
Right, that seems quite conclusive: 50 ms to scan the tar file is only roughly 1/10 of the slowdown.
Aren't they using the Git version of the registry?
They usually use the Git version. But CompatHelper, for example, can be toggled to use the Pkg server version instead, e.g. if someone wants to use a private registry with CompatHelper and that private registry is only available from a private Pkg server.
Those bots could start using the API in https://github.com/JuliaLang/Pkg.jl/blob/master/src/Registry/registry_instance.jl. I guess we can keep the existing one, but the number of configurations (Git, uncompressed Pkg server, compressed Pkg server) is getting hard to manage.
Should be ready for review.
I haven't reviewed everything, but I approve of the method of extracting the tarball into a
One thing you might want to comment on is https://github.com/JuliaLang/Pkg.jl/pull/2431/files#diff-207055bf1cfbe124497d7079ce1ffb06c549f3534fc7b0988b2051fd1a74f1c2R191. Since we cache things based on the tree hash, we don't want to have to read the compressed file just to learn the registry's tree hash. Therefore, it creates another file at registry installation (
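A small sketch of why such a sidecar helps: if the tree hash is written to a tiny TOML file when the registry is installed, the cache key can be read without touching the compressed tarball, and the expensive load only runs on a cache miss. The file contents, key names (`path`, `uuid`, `git-tree-sha1`) and function names here are illustrative assumptions, not necessarily what the PR writes.

```julia
import TOML

# Hypothetical: write registry metadata next to the compressed registry
# at installation time.
function write_registry_info(info_path, tarball_name, uuid, tree_hash)
    open(info_path, "w") do io
        TOML.print(io, Dict(
            "path" => tarball_name,
            "uuid" => string(uuid),
            "git-tree-sha1" => tree_hash,
        ))
    end
end

# Reading the cache key only parses the small TOML file.
registry_tree_hash(info_path) = TOML.parsefile(info_path)["git-tree-sha1"]

# Cache loaded registries by tree hash; `load` (e.g. decompressing and
# parsing the tarball) only runs when the hash is not cached yet.
function cached_registry(load::Function, cache::AbstractDict, info_path)
    return get!(load, cache, registry_tree_hash(info_path))
end
```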
Yeah, that seems fine to me. Other options I considered when thinking about this feature were:
But just having a file with this info seems simple too. What about a single TOML file with metadata about all of the registries? We'd obviously have to take a lock when modifying that file, but we do that often in Pkg, so it's no worse than anything else we do.
I also thought about using the file name, but you probably want to be able to retrieve both the UUID and the tree SHA, and at that point the file name becomes rather long, which is why I went with a separate file. I don't see any big advantage in keeping the info for all registries in one file, and it makes the implementation slightly harder, so I'd rather keep it the way it is now.
If I'm understanding correctly, is the layout with this PR like this?
If so, maybe avoid the extra directory level and call the second file
Almost, it is:
I thought about putting the two files under
Hmm. It feels a bit messy to me. I think as long as we have clear criteria for detecting which one exists, we should be OK. The current criteria are:
If someone is switching back to older Julia versions, how is the last case treated? It might be better if the
1.6 detects that the folder has no
Yes, but when you switch back to Julia 1.7 it will find two copies of the General registry (remember, the old 1.6 registry is still valid). Figuring out which one to use (i.e., which one is more up to date) feels awkward.
If there are two, I would say pick the one that more Julia versions will understand and get rid of the other one.
More specifically:
Julia 1.1 to 1.4, and 1.5 with the package server disabled, first makes a
I've been trying this out today and it seems to work well. I also made it backwards compatible so that old Julias won't freak out when they find compressed registries.
What file layout did you end up going with? On the last Pkg call we had settled on this:
The first two would be ignored by older Julias, allowing them to keep working.
Exactly that.
Example:
TODO