Skip to content

Latest commit

 

History

History
164 lines (121 loc) · 10.8 KB

cep-0017.md

File metadata and controls

164 lines (121 loc) · 10.8 KB

CEP 17 - Optional Python site-packages path in repodata

Title Optional Python site-packages path in repodata
Status Accepted
Author(s) Jonathan Helmus <[email protected]>
Created Sep 13, 2024
Updated Sep 13, 2024
Discussion #90
Implementation conda/conda#14256

To avoid confusion this document will use a stylized python to indicate a conda package with that name. Other uses of "Python" will be followed by a qualifier, like "programming language" or "interpreter". Specific implementations of the Python programming language will be referred to by name, such as CPython or PyPy.

Abstract

We propose adding a new optional field in repodata.json that python packages can use to specify the destination path when installing noarch: python packages.

Background

In the conda ecosystem the package named python provides, either directly or via a dependency, an interpreter for the Python programming language. This interpreter is often CPython but can also be alternative implementations such as PyPy or GraalPy.

Tools like conda, mamba and pixi support installing noarch: python packages into conda environments. Whereas the content of other conda packages are linked into the target environment based upon their path within the package, files within the "site-packages" directory of noarch: python packages are linked into the site-packages directory of the target environment.

In conda 24.7.1 and many other versions the location of the site-packages directory within the environment is Lib/site-packages on Windows and lib/pythonX.Y/site-packages on other platforms where X and Y are the major and minor versions of the python package installed in the environment. These paths are the defaults for CPython but are not the correct paths for alternative implementation of the Python programming language or for some configurations of CPython. For example PyPy uses lib/pypyX.Y/site-packages on POSIX systems.

In these cases kludgy workarounds, often a symlink, are used in the python package to support the installation of noarch: python packages. In the next section a new field will be proposed that allows python package to explicitly declare the location of the site-packages directory to avoid the need for these work-arounds.

Specification

An optional field python_site_packages_path may be included in a conda package's info/index.json file as a string and in the corresponding repodata.json entry for the package. When defined, this field specifies the path of the Python interpreter's site-packages directory relative to the root of the environment. If a python package that includes this field is installed into an environment, all noarch: python packages that are installed into the environment will have the files in their "site-packages" directory linked into the path specified by this field.

A value of "null" can be used to indicate that this field is not specified. It is acceptable to not include the field in repodata.json in this case.

If this field is missing or specified as null a default site-packages path will be used. This path is Lib/site-packages if the environment targets a Windows platform and lib/pythonX.Y/site-packages on other platforms where X and Y are the major and minor versions of the python package installed in the environment. This matches the current behavior of conda, mamba and pixi.

This field should only be included in the metadata for python packages but tools must not fail if packages with other names include this field, rather the entry should be ignored.

To avoid unnecessarily increasing the size of repodata.json, it is highly recommended that this field only be included in python packages where it is required, that is where the site-packages directory is not the default value. Packages with other names or python packages that use the default site-packages path should not include this field.

This path must not point to a location outside of the root of the environment (e.g. it cannot navigate up a directory or be an absolute path). If a package specifies a path which violates this requirement the package must not be installed and an appropriate error should be shown to the user.

A possible way to check the validity of the python_site_packages_path field is with this function:

def is_valid(target_prefix: str, python_site_packages_path: str) -> bool:
    target_prefix = os.path.realpath(target_prefix)
    full_path = os.path.realpath(os.path.join(target_prefix, python_site_packages_path))
    test_prefix = os.path.commonpath((target_prefix, full_path))
    return test_prefix == target_prefix

When a package which includes this optional field is indexed the python_site_packages_path field will be included in the repodata entry for the package. For example the repodata entry for a python-3.13.0rc1 package using the free-threading build configuration might look as follows:

    "python-3.13.0rc1-haa6bb3f_0_cpython_cp313t.tar.bz2": {
      "build": "haa6bb3f_0_cpython_cp313t",
      "build_number": 0,
      "depends": [
        "bzip2 >=1.0.8,<2.0a0",
        "expat >=2.6.2,<3.0a0",
        "libffi >=3.4,<4.0a0",
        "ncurses >=6.4,<7.0a0",
        "openssl >=3.0.14,<4.0a0",
        "readline >=8.1.2,<9.0a0",
        "sqlite >=3.45.3,<4.0a0",
        "tk >=8.6.14,<8.7.0a0",
        "tzdata",
        "xz >=5.4.6,<6.0a0",
        "zlib >=1.2.13,<1.3.0a0"
      ],
      "license": "PSF-2.0",
      "license_family": "PSF",
      "timestamp": 1722610680768,
      "track_features": "free-threading",
      "md5": "c09289eb86239e1221533457d861f1a3",
      "name": "python",
      "size": 16332720,
      "subdir": osx-arm64,
      "version": "3.13.0rc1",
      "sha256": "fa0ae22c13450fe6c30c754ee5efbd7fe7e7533b878d7be96e74de56211d19df",
      "python_site_packages_path": "lib/python3.13t/site-packages"
    },

This field will also be included in repodata derivatives, like current_repodata.json or sharded repodata when appropriate.

Because this field is present in repodata.json it can be hot-fixed to correct mistakes or omissions. This allows existing python packages to retroactively specify the locations of their site-packages directory.

Motivation

This proposal was motivated by the free-threading configuration of CPython 3.13 which uses a different site-packages location on POSIX systems. Specifically, the free-threading build uses lib/python3.13t/site-packages.

A symlink between these paths or logic in sitecustomize.py allows packages installed into the incorrect site-packages directory to operate but this is less than ideal.

For additional details and discussion see conda issue 14053.

Although this is motivated by the free-threading configuration of CPython, the same underlying issue occurs in many non-CPython implementations. This includes both PyPy and GraalPy which use different paths for site-packages than CPython. The conda packages in the conda-forge channel for both of these use symlinks to work around tools which link files into the incorrect site-packages directory.

Security

Because this proposal can change the location where files are installed it is worth considering what, if any, security implementation this change entails.

A primary security concern is that by allowing the python package to specify where files will be installed, a malicious package could write or overwrite files with malicious content.

If files can only be installed into the environment where the package is installed, the damage from an attack is limited since a malicious package can already install files anywhere in the environment.

For example a malicious python package could specify an invalid site-packages path, in which case noarch: python packages installed in the environment would not work. This is inconvenient but not a security concern and should be addressed by removing or disabling the package in question.

The malicious package could also specify a site-packages path that would cause files from noarch: python packages to populate or replace key files in the environment. But this type of attack is already possible, in fact the python package itself could include these files.

Of more concern is the possibility of a package writing files outside of the environment. This could replace critical system files with malicious content. Because of this risk, paths which are outside of the environment are invalid as python_site_packages_path entries. Tools must check that the field is valid and error out when it is invalid.

Backwards Compatibility

This change maintains backwards compatibility. All existing python packages do not include this field. The behavior when this field is not specified matches what is currently done by conda, mamba and pixi. The only behavior change occurs when this field is included.

Because existing releases of conda, mamba and pixi do not support this field, python packages should continue to provide work around to allow files to be installed into an incorrect site-packages directory until such time as this proposal has been implemented and available for a sufficient amount of time.

Other sections

A number of alternatives were discussed and considered to address this problem. These include:

  • Using symlinks or a sitecustomize.py file to allow linking into the incorrect site-packages directory. This is a work-around for the problem, not a fix.
  • Querying the interpreter to determine the site-packages directory. This was rejected as it requires running the interpreter as part of the install step and requires that the python package be installed and operational before noarch: python packages can be linked.
  • Using the build string of the python or python_abi package to determine the site-packages location. This is an indirect method of obtaining information that can be more effectively specified explicitly.
  • Storing this information in another metadata file, such as about.json or paths.json. The downside of this approach is that the data cannot be hotfixed and the path for the site-packages directory would not be known until the python package is downloaded and unpacked, potentially limiting the order in which packages can be linked.

For more alternatives and discussion see conda issue 14053.

Copyright

All CEPs are explicitly CC0 1.0 Universal.