Title | Optional Python site-packages path in repodata |
Status | Accepted |
Author(s) | Jonathan Helmus <[email protected]> |
Created | Sep 13, 2024 |
Updated | Sep 13, 2024 |
Discussion | #90 |
Implementation | conda/conda#14256 |
To avoid confusion this document will use a stylized python
to indicate a conda package with that name.
Other uses of "Python" will be followed by a qualifier, like "programming language" or "interpreter".
Specific implementations of the Python programming language will be referred to by name, such as CPython or PyPy.
We propose adding a new optional field in repodata.json
that python
packages can use to specify the destination path when installing noarch: python
packages.
In the conda ecosystem the package named python
provides, either directly or via a dependency, an interpreter for the Python programming language.
This interpreter is often CPython but can also be alternative implementations such as PyPy or GraalPy.
Tools like conda, mamba and pixi support installing noarch: python
packages into conda environments.
Whereas the content of other conda packages are linked into the target environment based upon their path within the package, files within the "site-packages" directory of noarch: python
packages are linked into the site-packages directory of the target environment.
In conda 24.7.1 and many other versions the location of the site-packages directory within the environment is Lib/site-packages
on Windows and lib/pythonX.Y/site-packages
on other platforms where X and Y are the major and minor versions of the python
package installed in the environment.
These paths are the defaults for CPython but are not the correct paths for alternative implementation of the Python programming language or for some configurations of CPython.
For example PyPy uses lib/pypyX.Y/site-packages
on POSIX systems.
In these cases kludgy workarounds, often a symlink, are used in the python
package to support the installation of noarch: python
packages.
In the next section a new field will be proposed that allows python
package to explicitly declare the location of the site-packages directory to avoid the need for these work-arounds.
An optional field python_site_packages_path
may be included in a conda package's info/index.json
file as a string and in the corresponding repodata.json
entry for the package.
When defined, this field specifies the path of the Python interpreter's site-packages directory relative to the root of the environment.
If a python
package that includes this field is installed into an environment, all noarch: python
packages that are installed into the environment will have the files in their "site-packages" directory linked into the path specified by this field.
A value of "null" can be used to indicate that this field is not specified.
It is acceptable to not include the field in repodata.json
in this case.
If this field is missing or specified as null
a default site-packages path will be used.
This path is Lib/site-packages
if the environment targets a Windows platform and lib/pythonX.Y/site-packages
on other platforms where X and Y are the major and minor versions of the python
package installed in the environment.
This matches the current behavior of conda, mamba and pixi.
This field should only be included in the metadata for python
packages but tools must not fail if packages with other names include this field, rather the entry should be ignored.
To avoid unnecessarily increasing the size of repodata.json
, it is highly recommended that this field only be included in python
packages where it is required, that is where the site-packages directory is not the default value.
Packages with other names or python
packages that use the default site-packages path should not include this field.
This path must not point to a location outside of the root of the environment (e.g. it cannot navigate up a directory or be an absolute path). If a package specifies a path which violates this requirement the package must not be installed and an appropriate error should be shown to the user.
A possible way to check the validity of the python_site_packages_path
field is with this function:
def is_valid(target_prefix: str, python_site_packages_path: str) -> bool:
target_prefix = os.path.realpath(target_prefix)
full_path = os.path.realpath(os.path.join(target_prefix, python_site_packages_path))
test_prefix = os.path.commonpath((target_prefix, full_path))
return test_prefix == target_prefix
When a package which includes this optional field is indexed the python_site_packages_path
field will be included in the repodata entry for the package.
For example the repodata entry for a python-3.13.0rc1
package using the free-threading build configuration might look as follows:
"python-3.13.0rc1-haa6bb3f_0_cpython_cp313t.tar.bz2": {
"build": "haa6bb3f_0_cpython_cp313t",
"build_number": 0,
"depends": [
"bzip2 >=1.0.8,<2.0a0",
"expat >=2.6.2,<3.0a0",
"libffi >=3.4,<4.0a0",
"ncurses >=6.4,<7.0a0",
"openssl >=3.0.14,<4.0a0",
"readline >=8.1.2,<9.0a0",
"sqlite >=3.45.3,<4.0a0",
"tk >=8.6.14,<8.7.0a0",
"tzdata",
"xz >=5.4.6,<6.0a0",
"zlib >=1.2.13,<1.3.0a0"
],
"license": "PSF-2.0",
"license_family": "PSF",
"timestamp": 1722610680768,
"track_features": "free-threading",
"md5": "c09289eb86239e1221533457d861f1a3",
"name": "python",
"size": 16332720,
"subdir": osx-arm64,
"version": "3.13.0rc1",
"sha256": "fa0ae22c13450fe6c30c754ee5efbd7fe7e7533b878d7be96e74de56211d19df",
"python_site_packages_path": "lib/python3.13t/site-packages"
},
This field will also be included in repodata derivatives, like current_repodata.json
or sharded repodata when appropriate.
Because this field is present in repodata.json
it can be hot-fixed to correct mistakes or omissions.
This allows existing python
packages to retroactively specify the locations of their site-packages directory.
This proposal was motivated by the free-threading configuration of CPython 3.13 which uses a different site-packages location on POSIX systems.
Specifically, the free-threading build uses lib/python3.13t/site-packages
.
A symlink between these paths or logic in sitecustomize.py allows packages installed into the incorrect site-packages directory to operate but this is less than ideal.
For additional details and discussion see conda issue 14053.
Although this is motivated by the free-threading configuration of CPython, the same underlying issue occurs in many non-CPython implementations. This includes both PyPy and GraalPy which use different paths for site-packages than CPython. The conda packages in the conda-forge channel for both of these use symlinks to work around tools which link files into the incorrect site-packages directory.
Because this proposal can change the location where files are installed it is worth considering what, if any, security implementation this change entails.
A primary security concern is that by allowing the python
package to specify where files will be installed, a malicious package could write or overwrite files with malicious content.
If files can only be installed into the environment where the package is installed, the damage from an attack is limited since a malicious package can already install files anywhere in the environment.
For example a malicious python
package could specify an invalid site-packages path, in which case noarch: python
packages installed in the environment would not work.
This is inconvenient but not a security concern and should be addressed by removing or disabling the package in question.
The malicious package could also specify a site-packages path that would cause files from noarch: python
packages to populate or replace key files in the environment.
But this type of attack is already possible, in fact the python
package itself could include these files.
Of more concern is the possibility of a package writing files outside of the environment.
This could replace critical system files with malicious content.
Because of this risk, paths which are outside of the environment are invalid as python_site_packages_path
entries.
Tools must check that the field is valid and error out when it is invalid.
This change maintains backwards compatibility.
All existing python
packages do not include this field.
The behavior when this field is not specified matches what is currently done by conda, mamba and pixi.
The only behavior change occurs when this field is included.
Because existing releases of conda, mamba and pixi do not support this field, python
packages should continue to provide work around to allow files to be installed into an incorrect site-packages directory until such time as this proposal has been implemented and available for a sufficient amount of time.
A number of alternatives were discussed and considered to address this problem. These include:
- Using symlinks or a
sitecustomize.py
file to allow linking into the incorrect site-packages directory. This is a work-around for the problem, not a fix. - Querying the interpreter to determine the site-packages directory.
This was rejected as it requires running the interpreter as part of the install step and requires that the
python
package be installed and operational beforenoarch: python
packages can be linked. - Using the build string of the
python
orpython_abi
package to determine the site-packages location. This is an indirect method of obtaining information that can be more effectively specified explicitly. - Storing this information in another metadata file, such as
about.json
orpaths.json
. The downside of this approach is that the data cannot be hotfixed and the path for the site-packages directory would not be known until thepython
package is downloaded and unpacked, potentially limiting the order in which packages can be linked.
For more alternatives and discussion see conda issue 14053.
All CEPs are explicitly CC0 1.0 Universal.