conda · wolfv · Apr 12, 2024 · Apr 12, 2024 · Apr 12, 2024 · Apr 25, 2024
diff --git a/cep-oci.md b/cep-oci.md
@@ -14,30 +14,33 @@ An OCI manifest is referenced by a name and a tag.
 
 For further details, please refer to the official [OCI Distribution spec](https://github.com/opencontainers/distribution-spec/blob/v1.0/spec.md#definitions).
 
-### Conda package artifacts on an OCI registry
-
-The manifest for a conda package on an OCI registry should look like follows.
+### Layers
 
-It should have a name and a tag. The name is `<channel>/<subdir>/<package-name>`.
-The tag is the version and build string of the packages, using a `-` as a separator.
+Each layer must be a [descriptor](https://github.com/opencontainers/image-spec/blob/main/descriptor.md#properties) containing at least the 3 required fields:
 
-For example, a package like `xtensor-0.10.4-h431234.conda` would map to a OCI registry `conda-forge/linux-64/xtensor:0.10.4-h431234`.
+- The `mediaType` of the referenced content.
+- The `digest` of the targeted content.
+- The `size` of the raw content (in bytes).
 
-### Layers
+### MediaTypes
 
-A conda package, in an OCI registry, should ship up to 3 layers:
+Global and already defined mediaTypes are described [here](https://github.com/opencontainers/image-spec/blob/main/media-types.md#oci-image-media-types).
 
-- The package data itself, as a tarball. (mandatory)
-  - This can be either a `.tar.bz2` (v1) or a `.conda` (v2) file, or both as separate layers.
-- The package `info` folder as a gzipped "tar.gz" file.
-- The package `info/index.json` file as a plain JSON file.
+Custom mediaTypes defined for the conda channels use case are as follows:
 
-The mediaType for the different layers is as follows:
+| Blob type        | Content type              | mediaType                                   |
+|------------------|---------------------------|---------------------------------------------|
+| conda package    | .tar.bz2 package          | application/vnd.conda.package.v1            |
+| conda package    | .conda package            | application/vnd.conda.package.v2            |
+| package info     | `info` folder as gzip     | application/vnd.conda.info.v1.tar+gzip      |
+| package info     | `index.json` file         | application/vnd.conda.info.index.v1+json    |
+| repodata         | `repodata.json` file      | application/vnd.conda.repodata.v1+json      |
+| repodata         | `repodata.json.zst` file  | application/vnd.conda.repodata.v1+json+zst  |
+| repodata         | `repodata.json.gz` file   | application/vnd.conda.repodata.v1+json+gzip |
+| repodata         | `repodata.json.bz2` file  | application/vnd.conda.repodata.v1+json+bz2  |
+| repodata         | `repodata.json.jlap` file | application/vnd.conda.jlap.v1               |
 
-- for a .tar.bz2 package, the mediaType is `application/vnd.conda.package.v1`
-- for a .conda package, the mediaType is `application/vnd.conda.package.v2`
-- for the `info` folder as gzip the mediaType is `application/vnd.conda.info.v1.tar+gzip`
-- for the `index.json` file the mediaType is `application/vnd.conda.info.index.v1+json`
+If needed, more mediaTypes could be specified (i.e `application/vnd.conda.info.v1.tar+zst`).
 
 Using the `mediaType` field in the manifest, we can find the layer + SHA256 hash to pull the corresponding blob.
 Each `mediaType` should only be present in one layer.
@@ -51,7 +54,7 @@ On an OCI registry it should be stored under `<channel>/<subdir>/repodata.json`.
 The repodata file should have one entry that has the `latest` tag. This entry should point to the latest version of the repodata.
 All versions of the repodata should also be tagged with a UTC timestamp of the following format: `YYYY.MM.DD.HH.MM.SS`, e.g. `2024.04.12.07.06.32`.
 
-The mediaType for the raw `repodata.json` file is `application/vnd.conda.repodata.v1+json`. However, for large repositories it's advised to store the `zstd` encoded repodata file with the mediaType `application/vnd.conda.repodata.v1+json+zstd` as an additional layer in `<channel>/<subdir>/repodata.json`. ([ref](https://github.com/opencontainers/image-spec/blob/main/layer.md#gzip-media-types))
+The mediaType for the raw `repodata.json` file is `application/vnd.conda.repodata.v1+json`. However, for large repositories it's advised to store the `zstd` encoded repodata file with the mediaType `application/vnd.conda.repodata.v1+json+zst` as an additional layer in `<channel>/<subdir>/repodata.json`. ([ref](https://github.com/opencontainers/image-spec/blob/main/layer.md#gzip-media-types))
 
 Other encodings are also accepted:
 
@@ -64,6 +67,22 @@ For `jlap`, the following mediaType is used:
 
 The `jlap` file should also be stored under the `<channel>/<subdir>/repodata.json` path as an additional layer.
 
+### Conda package artifacts on an OCI registry
+
+The manifest for a conda package on an OCI registry should look like follows.
+
+It should have a name and a tag. The name is `<channel>/<subdir>/<package-name>`.
+The tag is the version and build string of the packages, using a `-` as a separator.
+
+For example, a package like `xtensor-0.10.4-h431234.conda` would map to a OCI registry `conda-forge/linux-64/xtensor:0.10.4-h431234`.
+
+A conda package, in an OCI registry, should ship up to 3 layers:
+
+- The package data itself, as a tarball. (mandatory)
+  - This can be either a `.tar.bz2` (v1) or a `.conda` (v2) file, or both as separate layers.
+- The package `info` folder as a gzipped "tar.gz" file.
+- The package `info/index.json` file as a plain JSON file.
+
 ### Mapping a conda-package to the OCI registry
 
 A given conda-package is identified by a URL like `<subdir>/<package-name>-<version>-<build>.<ext>` where `<subdir>` is the platform and architecture, `<package-name>` is the name of the package, `<version>` is the version of the package, `<build>` is the build string of the package, and `<ext>` is the extension of the package file.
@@ -96,3 +115,39 @@ Some characters that are used in the conda-forge repository as part of the build
 - `+` is replaced by `__p__`
 - `!` is replaced by `__e__`
 - `=` is replaced by `__eq__`
+
+#### Authentication
+
+Pulling a public image from a Container registry can be done anonymously ([ref](https://docs.github.com/en/packages/learn-github-packages/about-permissions-for-github-packages#visibility-and-access-permissions-for-packages)).
+
+A token can be requested with `pull` scope, using the following URL:
+`https://ghcr.io/token?scope=repository:<org>/<channel-name>/<subdir>/<package-name-or-repodata.json>:pull`
+
+Note that in the case of pulling repodata, the name `repodata.json` is always used in the URL regardless of the encoding.
+
+#### Implementation (conda / mamba / rattler)
+
+##### mamba
+
+In order to fetch packages from an OCI registry, we need to set a mirror (can be more than one) for the channel to be used (e.g `conda-forge`).
+This can be done in the rc file as follows:
+
+```
+mirrored_channels:
+  conda-forge: ["oci://ghcr.io/channel-mirrors/conda-forge"]
+```
+
+When a user requests installing a package (with the configuration set above, and using `conda-forge` channel), a set of requests to fetch `repodata.json` are first performed as follows:
+
+- A token is requested to anonymously pull `repodata.json` using the following URL:\
+`https://ghcr.io/token?scope=repository:channel-mirrors/conda-forge/<subdir>/repodata.json:pull`
+- The manifest is then pulled using `https://ghcr.io/v2/channel-mirrors/conda-forge/<subdir>/repodata.json/manifests/<reference>`.\
+`<reference>` is always set to `latest` in `mamba`.\
+This is also where the repodata file encoding is handled (checking `mediaType` field in the layers).\
+In `mamba`, `zstd` encoding has priority if present, otherwise, raw `repodata.json` is picked, and the corresponding SHA256 hash is set for the next step.
+- Repodata blob is then downloaded using:\
+`https://ghcr.io/v2/channel-mirrors/conda-forge/<subdir>/repodata.json/blobs/sha256:<HASH>`
+
+Then, to fetch the package itself, and using the same token, the corresponding blob is downloaded using:
+`https://ghcr.io/v2/channel-mirrors/conda-forge/<subdir>/<package-name>/blobs/sha256:<HASH>`
+where <HASH> is the SHA256 hash of the requested package, retrieved from `repodata.json`.