Developer Guide: Document practices for data files #37399

Merged
merged 13 commits into from
Mar 31, 2024
71 changes: 60 additions & 11 deletions src/doc/en/developer/coding_basics.rst
@@ -89,9 +89,9 @@ In particular,
Files and directory structure
=============================

Roughly, the Sage directory tree is layout like this. Note that we use
``SAGE_ROOT`` in the following as a shortcut for the (arbitrary) name
of the directory containing the Sage sources:
Roughly, the Sage directory tree is laid out like this. Note that we
use ``SAGE_ROOT`` in the following as a shortcut for the name of the
directory containing the Sage sources:

.. CODE-BLOCK:: text

@@ -104,7 +104,7 @@ of the directory containing the Sage sources:
setup.py
...
sage/ # Sage library
ext_data/ # extra Sage resources (formerly src/ext)
ext_data/ # extra Sage resources (legacy)
bin/ # the scripts in local/bin that are tracked
upstream/ # tarballs of upstream sources
local/ # installed binaries
@@ -149,15 +149,36 @@ Adding new top-level packages below :mod:`sage` should be done
sparingly. It is often better to create subpackages of existing
packages.

Non-Python Sage source code and supporting files can be included in one
of the following places:
Non-Python Sage source code and small supporting files can be
included in one of the following places:

- In the directory of the Python code that uses that file. When the
Sage library is installed, the file will be installed in the same
location as the Python code. For example,
``SAGE_ROOT/src/sage/interfaces/maxima.py`` needs to use the file
``SAGE_ROOT/src/sage/interfaces/maxima.lisp`` at runtime, so it refers
to it as ::
location as the Python code. This is referred to as "package data".

The preferred way to access the data from Python is using the
`importlib.resources API
<https://importlib-resources.readthedocs.io/en/latest/using.html>`_,
in particular the function :func:`importlib.resources.files`.
Using it, you can:

- open a resource for text reading: ``fd = files(package).joinpath(resource).open('rt')``
- open a resource for binary reading: ``fd = files(package).joinpath(resource).open('rb')``
- read a resource as text: ``text = files(package).joinpath(resource).read_text()``
- read a resource as bytes: ``data = files(package).joinpath(resource).read_bytes()``
- open an xz-compressed resource for text reading: ``fd = lzma.open(files(package).joinpath(resource).open('rb'), 'rt')``
- open an xz-compressed resource for binary reading: ``fd = lzma.open(files(package).joinpath(resource).open('rb'), 'rb')``

If the file needs to be used outside of Python, then the
preferred way is using the context manager
:func:`importlib.resources.as_file`. Like
:func:`importlib.resources.files`, it is provided by the standard
library module ``importlib.resources``.
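
The access patterns listed above can be tried out with a minimal, self-contained sketch. It builds a throwaway package in a temporary directory so that it runs anywhere; the names ``demo_pkg``, ``greeting.txt`` and ``table.txt.xz`` are hypothetical and not part of Sage.

```python
import importlib
import lzma
import sys
import tempfile
from importlib.resources import as_file, files
from pathlib import Path

# Build a throwaway package with two data files (all names are hypothetical).
root = Path(tempfile.mkdtemp())
pkg_dir = root / "demo_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_bytes(b"")
(pkg_dir / "greeting.txt").write_bytes(b"hello\n")
with lzma.open(pkg_dir / "table.txt.xz", "wt", encoding="utf-8") as f:
    f.write("1 2 3\n")

# Make the package importable for this demonstration only.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Read a resource as text and as bytes.
text = files("demo_pkg").joinpath("greeting.txt").read_text(encoding="utf-8")
raw = files("demo_pkg").joinpath("greeting.txt").read_bytes()

# Open an xz-compressed resource for text reading.
with files("demo_pkg").joinpath("table.txt.xz").open("rb") as binary_fd:
    with lzma.open(binary_fd, "rt", encoding="utf-8") as fd:
        table = fd.read()

# Obtain a real filesystem path, e.g. to hand to an external program.
# The path is only guaranteed to exist inside the "with" block.
with as_file(files("demo_pkg").joinpath("greeting.txt")) as path:
    path_is_file = path.is_file()
```

In library code the first argument of ``files`` would be the name of the installed package that ships the data, and the setup steps at the top would not be needed.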

- Older code in the Sage library accesses
the package data in more direct ways. For example,
``SAGE_ROOT/src/sage/interfaces/maxima.py`` uses the file
``SAGE_ROOT/src/sage/interfaces/sage-maxima.lisp`` at runtime, so it
refers to it as::

os.path.join(os.path.dirname(__file__), 'sage-maxima.lisp')

@@ -169,11 +190,39 @@ of the following places:
from sage.env import SAGE_EXTCODE
file = os.path.join(SAGE_EXTCODE, 'directory', 'file')

In both cases, the files must be listed (explicitly or via wildcards) in
This practice is deprecated, see :issue:`33037`.

In all cases, the files must be listed (explicitly or via wildcards) in
the section ``options.package_data`` of the file
``SAGE_ROOT/pkgs/sagemath-standard/setup.cfg.m4`` (or the corresponding
file of another distribution).
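
To illustrate what listing files "explicitly or via wildcards" can look like, here is a small sketch that parses a made-up ``[options.package_data]`` fragment and checks file names against it. The package names and patterns are hypothetical, not the actual contents of ``setup.cfg.m4``, and ``fnmatch`` is used only as an approximation of the glob matching done by the build tooling.

```python
import configparser
from fnmatch import fnmatch

# Hypothetical fragment in the style of an [options.package_data] section.
SETUP_CFG_FRAGMENT = """
[options.package_data]
mypkg.interfaces =
    helper.lisp
mypkg.tables =
    *.txt.xz
    README
"""

parser = configparser.ConfigParser()
parser.read_string(SETUP_CFG_FRAGMENT)

# Map each package name to its list of file patterns.
patterns = {
    package: value.split()
    for package, value in parser["options.package_data"].items()
}


def is_listed(package, filename):
    """Return True if filename matches a pattern listed for package."""
    return any(fnmatch(filename, pattern) for pattern in patterns.get(package, []))
```

With this fragment, ``helper.lisp`` is listed explicitly for ``mypkg.interfaces``, while any file ending in ``.txt.xz`` is covered by the wildcard for ``mypkg.tables``.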

Large data files should not be added to the Sage source tree. Instead, it
is proposed to do the following:

- create a separate git repository and upload them there [2]_,

- add metadata to the repository that make it a pip-installable
package (distribution package), as explained for example in the
`Python Packaging User Guide
<https://packaging.python.org/en/latest/tutorials/packaging-projects/>`_,

- `upload it to PyPI
<https://packaging.python.org/en/latest/tutorials/packaging-projects/#uploading-the-distribution-archives>`_,

- create metadata in ``SAGE_ROOT/build/pkgs`` that make your new
pip-installable package known to Sage; see :ref:`chapter-packaging`.

For guiding examples of external repositories that host large data
files, see https://github.com/sagemath/conway-polynomials and
https://github.com/gmou3/matroid-database.

.. [2]

It is also suggested that the files be compressed, e.g., with
the command ``xz -e``. They can then be read in Python with a call such as
``lzma.open(file, 'rt')``.
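
As a rough sketch of the compression round trip, the following writes an xz-compressed text file from Python and reads it back with ``lzma.open(file, 'rt')``. The file name and contents are made up, and the preset ``6 | lzma.PRESET_EXTREME`` is an assumption meant to approximate ``xz -e`` at its default level.

```python
import lzma
import tempfile
from pathlib import Path

# Hypothetical data file in a temporary directory.
path = Path(tempfile.mkdtemp()) / "table.txt.xz"

# Write compressed text; the preset is assumed to mirror "xz -e".
with lzma.open(path, "wt", preset=6 | lzma.PRESET_EXTREME, encoding="utf-8") as f:
    f.write("2 3 5 7\n11 13 17 19\n")

# Read it back as text, decompressing on the fly.
with lzma.open(path, "rt", encoding="utf-8") as f:
    lines = f.read().splitlines()
```

Files compressed on the command line with ``xz`` can be read the same way, since both use the ``.xz`` container format.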


Learn by copy/paste
===================
1 change: 1 addition & 0 deletions src/doc/en/developer/coding_in_python.rst
@@ -7,6 +7,7 @@ Coding in Python for Sage
This chapter discusses some issues with, and advice for, coding in
Sage.

.. _section-python-language-standard:

Python language standard
========================