Merge pull request #14 from a-reich/docs_wip
Docs built with Sphinx
a-reich authored Jun 20, 2022
2 parents fcfe0f5 + 325fb26 commit dff9186
Showing 8 changed files with 189 additions and 23 deletions.
20 changes: 18 additions & 2 deletions README.md
@@ -1,6 +1,8 @@
# versioned-pickle
A small utility Python package for adding environment metadata to pickle files and warning on mismatch when loaded.

Rendered documentation including full reference is available at the [Github Pages site](https://a-reich.github.io/versioned-pickle/). The repository is [here](https://github.com/a-reich/versioned-pickle).

# What does this do for me?
`versioned-pickle` records metadata about the Python environment when used to pickle an object,
checks the new environment when unpickling, compares the two and warns if they are not considered to match.
@@ -19,9 +21,14 @@ outputted info to update your environment in whatever way you choose. This is be
ecosystem and how to specify then recreate an environment has many nuances and several different tools
are popular (pip, conda, pipenv, poetry, etc.).
# Installation
To install from source the latest commit from Github: `pip install git+https://github.com/a-reich/versioned-pickle.git`
To install from source the latest commit from Github:
```
pip install git+https://github.com/a-reich/versioned-pickle.git
```
To install a specific built wheel from GH:
` pip install versioned-pickle@https://github.com/a-reich/versioned-pickle/releases/download/v0.3.2/versioned_pickle-0.3.2-py3-none-any.whl`
```
pip install versioned-pickle@https://github.com/a-reich/versioned-pickle/releases/download/v0.3.2/versioned_pickle-0.3.2-py3-none-any.whl
```
Python versions >=3.8 are supported.
# Usage
`versioned-pickle` provides a drop-in replacement for the standard library `pickle` module,
@@ -48,6 +55,15 @@ to include, in increasing order of strictness:
* "installed" - all installed distributions.

(The Python version is also recorded but not used in validation by default).
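
As a rough illustration of the strictest scope, the snippet below gathers every installed distribution and its version with `importlib.metadata`, together with the interpreter version. It is a minimal sketch rather than the package's actual code; `installed_packages` and `snapshot` are hypothetical names used only here.

```python
import sys
from importlib import metadata


def installed_packages():
    """Map each installed distribution name to its version string."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip broken installs that lack a name
            packages[name] = dist.version
    return packages


# Hypothetical layout: the real header format may differ.
snapshot = {"packages": installed_packages(), "py_ver": sys.version_info[:3]}
```

A narrower scope would keep only a subset of these entries, which is why the scopes are ordered by strictness.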

A unique feature of the tool which is less obvious and which users might not know how to implement for themselves
is the "object" scope, which tells versioned-pickle to **introspect** the object and intelligently **determine**
which packages need their versions recorded, based on which modules define the types encountered during pickling.
This is handy if, for instance, you have a dictionary of pandas DataFrames that you were working with and
also did some plotting with matplotlib. You can pickle the dictionary for later use, and versioned-pickle
will record the pandas version for validation
but ignore matplotlib (which you don't really need just to load the object).
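
One plausible way to do this kind of introspection with the standard library alone is to subclass `pickle.Pickler` and record the defining module of each type passed to `reducer_override` (available since Python 3.8). The sketch below assumes that approach; `ModuleRecordingPickler` and `modules_needed` are hypothetical names, and the package's real implementation may differ.

```python
import io
import pickle
from fractions import Fraction


class ModuleRecordingPickler(pickle.Pickler):
    """Hypothetical pickler that notes the top-level module of each type it sees."""

    def __init__(self, file, *args, **kwargs):
        super().__init__(file, *args, **kwargs)
        self.seen_modules = set()

    def reducer_override(self, obj):
        # Classes are pickled by reference, so use their own module;
        # for instances, use the module that defines their type.
        # Note: this hook may not be called for exact built-in atoms (int, str, ...).
        cls = obj if isinstance(obj, type) else type(obj)
        self.seen_modules.add(cls.__module__.split(".")[0])
        # Returning NotImplemented falls back to the default pickling behaviour.
        return NotImplemented


def modules_needed(obj):
    """Pickle obj into a throwaway buffer and return the modules encountered."""
    pickler = ModuleRecordingPickler(io.BytesIO())
    pickler.dump(obj)
    return pickler.seen_modules - {"builtins"}


# Fraction stands in for a third-party type such as a pandas DataFrame.
needed = modules_needed({"half": Fraction(1, 2), "n": 3})
```

The recorded module names would then be mapped to installed distributions, and only those versions stored alongside the pickle.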

Environment metadata is obtained using `importlib.metadata`. Modules that are loaded directly
from sys.path without being installed as part of a distribution, or functions/classes
only defined in __main__, are ignored (it's assumed that if you're using this package you already
20 changes: 20 additions & 0 deletions docs/Makefile
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = source
BUILDDIR = build

# Put it first so that "make" without argument is like "make help".
help:
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
20 changes: 20 additions & 0 deletions docs/README.md
@@ -0,0 +1,20 @@
This is the code for building the documentation with Sphinx.

Developer notes:

There are a few nuances to getting this setup working right. I wanted to create nicer docs and use some
more advanced capabilities like automatically generating docs content from the source code & docstrings,
but I also didn't want to spend a lot of effort learning new RST/Sphinx syntax, as most of the content is simple.

Therefore, the docs are written mostly as Markdown files which the [MyST-Parser](https://myst-parser.readthedocs.io/en/latest/) extension
converts at build time. I dislike the raw Sphinx style for docstrings and use Numpy style instead. There's also an extension for adding the .nojekyll file to make Github Pages render the docs without extra processing.

The current steps for publishing docs are:
1. Check out the main branch, then create the publishing branch with `git checkout -b gh-pages`.
2. Install sphinx and myst-parser.
3. `cd docs; .\make html`
4. Check the sphinx build worked and open locally in a browser to view.
5. `mv build/html/* .` (GH pages renders the docs folder only.)
6. Add and commit the docs folder to git, push to remote. Do not push the built output to other branches.

In the future this could be automated in CI.
35 changes: 35 additions & 0 deletions docs/make.bat
@@ -0,0 +1,35 @@
@ECHO OFF

pushd %~dp0

REM Command file for Sphinx documentation

if "%SPHINXBUILD%" == "" (
set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=source
set BUILDDIR=build

%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
echo.
echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
echo.installed, then set the SPHINXBUILD environment variable to point
echo.to the full path of the 'sphinx-build' executable. Alternatively you
echo.may add the Sphinx directory to PATH.
echo.
echo.If you don't have Sphinx installed, grab it from
echo.https://www.sphinx-doc.org/
exit /b 1
)

if "%1" == "" goto help

%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
goto end

:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%

:end
popd
6 changes: 6 additions & 0 deletions docs/source/API_reference.md
@@ -0,0 +1,6 @@
# versioned_pickle API reference

```{eval-rst}
.. automodule:: versioned_pickle
:members:
```
54 changes: 54 additions & 0 deletions docs/source/conf.py
@@ -0,0 +1,54 @@
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html

# -- Path setup --------------------------------------------------------------

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
# import os
# import sys
# sys.path.insert(0, os.path.abspath('.'))


# -- Project information -----------------------------------------------------

project = "versioned-pickle"
copyright = "2022, a-reich"
author = "a-reich"


# -- General configuration ---------------------------------------------------

# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = ["sphinx.ext.autodoc", "sphinx.ext.napoleon", "myst_parser", "sphinx.ext.githubpages"]

# Add any paths that contain templates here, relative to this directory.
templates_path = ["_templates"]

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"]


# -- Options for HTML output -------------------------------------------------

# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = "alabaster"

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ["_static"]

# -- Manually added configuration
myst_heading_anchors = 3
5 changes: 5 additions & 0 deletions docs/source/index.md
@@ -0,0 +1,5 @@
```{include} ../../README.md
```

# API reference:
[See a detailed description of the API here](API_reference.md).
52 changes: 31 additions & 21 deletions versioned_pickle/__init__.py
@@ -1,15 +1,15 @@
"""Main module of versioned-pickle.
The main API consists of these functions:
dump
dumps
load
loads
These can be used as a nearly drop-in replacement for the corresponding functions from the stdlib pickle module.
:meth:`dump`
:meth:`load`
These can be used as a nearly drop-in replacement for the corresponding functions from the stdlib ``pickle`` module.
Only these are needed for normal use. Additional public objects
(including EnvironmentMetadata and PackageMismatchWarning) are exposed only for potentially customizing the treatment
of environment metadata or handling of mismatches.
(including EnvironmentMetadata and PackageMismatchWarning) are exposed only for introspecting
or potentially customizing the treatment of environment metadata and handling of mismatches.
"""

from __future__ import annotations
@@ -47,28 +47,32 @@ class EnvironmentMetadata:
Attributes
---------
packages: dict of distribution names to version strings
py_ver: 3-tuple of ints, the python interpreter version
package_scope: {"object", "loaded", "installed"},
packages:
dict of distribution names to version strings
py_ver:
the python interpreter version
package_scope: {"object", "loaded", "installed"}
the type of scope that was used for which packages to include.
"""

packages: dict[str, str]
py_ver: tuple[int, int, int]
package_scope: str
package_scope: Literal["object", "loaded", "installed"]

# TODO: add checks for valid field values? in an optional custom method or an auto-called __post_init__?
@classmethod
def from_scope(
cls, package_scope: str = "object", object_modules: Iterable[str] | None = None
cls,
package_scope: Literal["object", "loaded", "installed"] = "object",
object_modules: Iterable[str] | None = None,
) -> EnvironmentMetadata:
"""Construct an EnvironmentMetadata based on the type of scope for which packages to include.
This is the typical way to construct instances, not calling the class name directly.
Params
Parameters
-------
package_scope: str,
package_scope: {"object", "loaded", "installed"}
can be "object" meaning the specific modules needed for an object, in which case module names
must be specified in object_modules, or "loaded", or "installed".
object_modules: optional Iterable[str],
@@ -96,7 +100,7 @@ def from_scope(

return cls(packages=packages, py_ver=sys.version_info[:3], package_scope=package_scope)

def to_header_dict(self) -> dict[str, Any]:
def to_header_dict(self) -> dict[str, dict[str, Any]]:
"""Get a representation of the metadata as a Python-native dict.
Used when one doesn't want to have to import versioned_pickle itself, such as in the header created
@@ -199,10 +203,14 @@ def reducer_override(self, obj: object) -> object:
return NotImplemented


def dump(obj: object, file: typ.IO[bytes], package_scope: str = "object") -> None:
def dump(
obj: object,
file: typ.IO[bytes],
package_scope: Literal["object", "loaded", "installed"] = "object",
) -> None:
"""Pickle an object's data to a file with environment metadata.
Params
Parameters
------
obj: any object to pickle
file: file-like obj (writable, binary mode)
@@ -233,10 +241,12 @@ def load(file: typ.IO[bytes], return_meta: bool = False) -> object: # type: ign
The saved EnvironmentMetadata from the environment that dumped the file is checked against the
current EnvironmentMetadata. Extra packages in the load env are ignored as is python version.
If they do not match, a PackageMismatchWarning is issued with details of the mismatches.
Params
Parameters
------
file: file-like obj (readable, binary mode)
return_meta: optional bool, if True return a tuple of the object and its metadata
return_meta: optional bool
if True return a tuple of the object and its metadata
"""
header_dict = pickle.load(file)
pickled_meta = EnvironmentMetadata.from_header_dict(header_dict)
@@ -258,7 +268,7 @@ def load(file: typ.IO[bytes], return_meta: bool = False) -> object: # type: ign
raise validation from exc


def dumps(obj: object, package_scope: str = "object") -> bytes:
def dumps(obj: object, package_scope: Literal["object", "loaded", "installed"] = "object") -> bytes:
"""Like dump, but returns an in-memory bytes object instead of using a file."""
f = io.BytesIO()
dump(obj, f, package_scope=package_scope)
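
The `load` docstring in this diff describes reading a metadata header first, comparing it with the current environment while ignoring extra packages, and warning on mismatch. A minimal stdlib-only sketch of that flow follows. It assumes the header and the object are stored as two consecutive pickles in one stream; the header layout and the names `find_mismatches`, `dump_with_header`, and `load_with_check` are hypothetical, and a plain `UserWarning` stands in for `PackageMismatchWarning`.

```python
import io
import pickle
import warnings


def find_mismatches(pickled, current):
    """Return {name: (pickled_version, current_version)} for packages that differ.

    Packages present only in the loading environment are ignored.
    """
    return {
        name: (version, current.get(name))
        for name, version in pickled.items()
        if current.get(name) != version
    }


def dump_with_header(obj, file, packages):
    # Assumed layout: metadata header first, then the object itself.
    pickle.dump({"packages": packages}, file)
    pickle.dump(obj, file)


def load_with_check(file, current_packages):
    header = pickle.load(file)
    mismatches = find_mismatches(header["packages"], current_packages)
    if mismatches:
        warnings.warn(f"package versions differ: {mismatches}", UserWarning)
    return pickle.load(file)


buffer = io.BytesIO()
dump_with_header([1, 2, 3], buffer, {"pandas": "1.4.2"})
buffer.seek(0)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # matplotlib is extra in the loading environment and is ignored.
    restored = load_with_check(buffer, {"pandas": "1.4.3", "matplotlib": "3.5.2"})
```

Because the header is an ordinary pickle of built-in types, it can be inspected without importing any extra package.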
