Include the generated files into Python library #384

Merged
merged 30 commits into from
Jul 4, 2024
Changes from 19 commits
872946c
Setup Python package blueprint.
ielis Apr 4, 2024
c987108
reworking for backwards compat
iimpulse Apr 4, 2024
4f953eb
Merge pull request #1 from iimpulse/python/include-generated-files
ielis Apr 5, 2024
ccea657
Add explanation to the __init__ file.
ielis Apr 5, 2024
d933390
Finish testing.
ielis Apr 5, 2024
5ed2441
Revert changes to `pom.xml`, do everything in the deployment script.
ielis Apr 5, 2024
598c3bb
Improve clean-up.
ielis Apr 5, 2024
6201116
Merge branch 'master' into python/include-generated-files
ielis Apr 5, 2024
4ed9b76
Finish merging `master` to `python/include-generated-files`.
ielis Apr 5, 2024
8396f13
Reorder protobuf plugin goals to the original order.
ielis Apr 5, 2024
b131bb8
Update the python README
ielis Apr 5, 2024
7c03f2d
Move the vrs/vrsatile proto files into `phenopackets`.
ielis Apr 19, 2024
2e3540a
Allow to import all `v2` building blocks.
ielis Apr 30, 2024
e30ef0e
Add documentation for Python.
ielis Apr 30, 2024
71adade
Update Maven due to `protobuf-maven-plugin`.
ielis May 6, 2024
ef66fcd
Use `io.github.ascopes:protobuf-maven-plugin` instead of xolstice to …
ielis May 6, 2024
7ef57c1
Ignore `*.pyi` files.
ielis May 6, 2024
f3a9e24
Use Maven to manage the lifecycle of the Python protobuf bindings.
ielis May 6, 2024
b469b69
Merge pull request #2 from ielis/python/generate-pyi-descriptors
ielis May 23, 2024
527bc72
Ignore VS Code files.
ielis Jul 1, 2024
60010d1
Upgrade protobuf plugin, embed protobuf source files in JAR.
ielis Jul 1, 2024
6ce7f95
Add CI to run Python tests.
ielis Jul 1, 2024
976bfe3
Checkout `d045bb0` in vrs-protobuf.
ielis Jul 1, 2024
b0cc680
Only collect Python tests for now.
ielis Jul 1, 2024
b1aa56d
Change to `python` before running Python tests.
ielis Jul 1, 2024
9cf2f7e
Now run the Python tests.
ielis Jul 1, 2024
732764b
The exception message differs across Python versions. However, we do …
ielis Jul 1, 2024
4c6fdd7
Update `maven-dependency-submission-action`.
ielis Jul 1, 2024
95a47c1
Revert the `maven-dependency-submission-action` version.
ielis Jul 1, 2024
7372760
Mention the limitations of VRS objects in Phenopacket Schema v2.
ielis Jul 1, 2024
4 changes: 4 additions & 0 deletions .gitignore
@@ -127,6 +127,10 @@ nb-configuration.xml
### Python template
!requirements.txt

# We do not track the generated Protobuf files for now.
python/**/*_pb2.py
python/**/*_pb2.pyi

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
2 changes: 1 addition & 1 deletion .mvn/wrapper/maven-wrapper.properties
@@ -1 +1 @@
distributionUrl=https://repo1.maven.org/maven2/org/apache/maven/apache-maven/3.8.1/apache-maven-3.8.1-bin.zip
distributionUrl=https://repo1.maven.org/maven2/org/apache/maven/apache-maven/3.9.6/apache-maven-3.9.6-bin.zip
77 changes: 26 additions & 51 deletions deploy-python.sh
@@ -1,30 +1,15 @@
# Create Temporary Destination
# Phenopackets folder
TEMP_DIRECTORY=$(mktemp -d)
echo "Building phenopacket distribution files in temporary directory at $TEMP_DIRECTORY"
TEMP_DIRECTORY_PYTHON_MODULE="$TEMP_DIRECTORY/phenopackets"
TEMP_DIRECTORY_TESTS_MODULE="$TEMP_DIRECTORY/tests"
TEMP_DIRECTORY_VIRTUAL_ENV="$TEMP_DIRECTORY/phenopackets-venv"
declare -a pyfiles=("base" "phenopackets" "biosample" "disease" "genome" "individual" "interpretation" "medical_action" "measurement" "meta_data" "pedigree" "phenotypic_feature" "vrsatile")
# Functions
createInitFile(){
echo "import pkg_resources" >> "$TEMP_DIRECTORY/phenopackets/__init__.py"
echo "__version__ = pkg_resources.get_distribution('phenopackets').version" >> "$TEMP_DIRECTORY/phenopackets/__init__.py"
for i in "${pyfiles[@]}"
do
echo "from .${i}_pb2 import *" >> "$TEMP_DIRECTORY/phenopackets/__init__.py"
done
}
#!/usr/bin/env bash

replaceImports(){
for i in "${pyfiles[@]}"
do
sed -i '' 's/from phenopackets.schema.v2.core/from . /g' "$TEMP_DIRECTORY_PYTHON_MODULE/${i}_pb2.py"
sed -i '' 's/from ga4gh.vrsatile.v1/from . /g' "$TEMP_DIRECTORY_PYTHON_MODULE/${i}_pb2.py"
sed -i '' 's/from ga4gh.vrs.v1/from . /g' "$TEMP_DIRECTORY_PYTHON_MODULE/${i}_pb2.py"
done
}
# Build and Deploy Python Package
# We assume the script is run from the top-level repository folder as ./deploy-python.sh

DIRECTORY=./python
echo "Building phenopacket distribution files in directory at $DIRECTORY"

# Ensure we generated the protobuf Python files.
./mvnw clean compile

cd $DIRECTORY || { echo "Deployment FAILED. Couldn't find directory" ; exit 1; }
createVirtualEnvironment(){
echo "Creating Python virtual environment at ${1}"
python3 -m venv "${1}" &> /dev/null
@@ -39,42 +24,32 @@ createVirtualEnvironment(){
echo "Virtual environment created successfully";
}

# Create python module
mkdir $TEMP_DIRECTORY_PYTHON_MODULE
createInitFile
cp ./target/generated-sources/protobuf/python/phenopackets/schema/v2/phenopackets_pb2.py $TEMP_DIRECTORY_PYTHON_MODULE
cp ./target/generated-sources/protobuf/python/phenopackets/schema/v2/core/* $TEMP_DIRECTORY_PYTHON_MODULE
cp ./target/generated-sources/protobuf/python/ga4gh/vrsatile/v1/vrsatile_pb2.py $TEMP_DIRECTORY_PYTHON_MODULE
cp ./target/generated-sources/protobuf/python/ga4gh/vrs/v1/vrs_pb2.py $TEMP_DIRECTORY_PYTHON_MODULE
replaceImports
# Create tests module
mkdir $TEMP_DIRECTORY_TESTS_MODULE
cp ./src/test/python/* $TEMP_DIRECTORY_TESTS_MODULE
# Copy Packaging files
cp requirements.txt setup.py pom.xml LICENSE README.rst $TEMP_DIRECTORY

# Create Python venv in virtual directory
TEMP_DIRECTORY_VIRTUAL_ENV="phenopackets-venv"
createVirtualEnvironment $TEMP_DIRECTORY_VIRTUAL_ENV
cd $TEMP_DIRECTORY || { echo "Deployment FAILED. Couldn't cd to temp directory" ; exit 1; }
# shellcheck disable=SC1090
source "$TEMP_DIRECTORY_VIRTUAL_ENV/bin/activate"
pip install -r "$TEMP_DIRECTORY/requirements.txt"
# Dependencies for building/deploying
python3 -m pip install setuptools wheel twine xmltodict || { echo "Deployment FAILED. Failed to install python dependencies" ; exit 1; }

# Test
pip install -e .
python3 setup.py test || { echo "Deployment FAILED. Unittest Failure" ; exit 1; }
# Build
python3 setup.py sdist bdist_wheel || { echo "Deployment FAILED. Building python package" ; exit 1; }
python3 -m pip install ".[test]"
pytest || { echo "Deployment FAILED. Unittest Failure" ; exit 1; }

# Install dependencies for building/deploying
python3 -m pip install build twine || { echo "Deployment FAILED. Failed to install python dependencies" ; exit 1; }
# Build
python3 -m build || { echo "Deployment FAILED. Building python package" ; exit 1; }
# Deploy - Remove --repository testpypi flag for production.
if [ $1 = "release-prod" ]; then
if [ "$1" = "release-prod" ]; then
python3 -m twine upload dist/*
elif [ $1 = "release-test" ]; then
elif [ "$1" = "release-test" ]; then
python3 -m twine upload --repository testpypi dist/*
else
echo "Python Release was prepared successfully. No release argument provided, use one of [release-prod, release-test] to make the production/test release."
fi



# Clean up
echo "Cleaning up the build environment and the build files"
deactivate
rm -rf build dist ${TEMP_DIRECTORY_VIRTUAL_ENV}
cd ..
./mvnw clean
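
The release-target dispatch at the end of the script can be sketched in Python as follows. This is only an illustration: `upload_command` is a hypothetical helper that does not exist in the repository, and the twine arguments are copied from the script above. It also shows why the quoted `"$1"` matters: with no argument, the unquoted test `[ $1 = "release-prod" ]` collapses to `[ = "release-prod" ]` and fails, whereas a missing argument should simply mean "prepare, but do not upload".

```python
from typing import List, Optional


def upload_command(release_arg: Optional[str]) -> Optional[List[str]]:
    """Return the twine command for a release argument, or None for a dry run.

    Hypothetical helper mirroring the if/elif/else at the end of deploy-python.sh.
    """
    # `dist/*` is kept as a literal; in the shell script the glob is expanded by bash.
    base = ["python3", "-m", "twine", "upload"]
    if release_arg == "release-prod":
        return base + ["dist/*"]
    if release_arg == "release-test":
        return base + ["--repository", "testpypi", "dist/*"]
    # Any other value, including a missing argument, only prepares the release.
    return None
```

Calling `upload_command(None)` returns `None`, which corresponds to the branch that prints the "No release argument provided" message.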
140 changes: 140 additions & 0 deletions docs/python.rst
@@ -0,0 +1,140 @@
.. _rstpython:

###################################
Working with Phenopackets in Python
###################################

As with :ref:`Java <rstjava>`, the :ref:`Phenopacket Schema <rstschema>` can be considered the source of truth
for the specification, and the JSON produced by any implementation can be used to interoperate
with other services. Nevertheless, we **strongly** recommend using the `phenopackets` library available
from the Python Package Index (PyPI), or the Python bindings generated by the Protobuf compiler from the Protobuf files.

Here we provide a brief overview of the `phenopackets` library.


Install `phenopackets` into your Python environment
***************************************************

The `phenopackets` package can be installed from PyPI by running:

.. code-block:: shell

  python3 -m pip install phenopackets

We use `pip` to install `phenopackets` and the required libraries/dependencies.


Create building blocks programmatically
***************************************

Let's start by importing all building blocks of Phenopacket Schema v2:

>>> import phenopackets.schema.v2 as pps2

Now we can access all building blocks of Phenopacket Schema v2 via the `pps2` alias.

For instance, we can create an :ref:`Ontology class <rstontologyclass>` that corresponds to a Human Phenotype Ontology
term for *Spherocytosis* (`HP:0004444`):

>>> spherocytosis = pps2.OntologyClass(id='HP:0004444', label='Spherocytosis')
>>> spherocytosis # doctest: +NORMALIZE_WHITESPACE
id: "HP:0004444"
label: "Spherocytosis"

All schema building blocks, including `OntologyClass`, are available under the `pps2` alias and can be created with constructors that accept keyword arguments.
The constructors reject attributes that are not defined in the schema:

>>> pps2.OntologyClass(foo='bar')
Traceback (most recent call last):
...
ValueError: Protocol message OntologyClass has no "foo" field.

We do not have to provide all attributes at creation time. We can also set the fields one by one
using Python property syntax and achieve the same outcome:

>>> spherocytosis2 = pps2.OntologyClass()
>>> spherocytosis2.id = 'HP:0004444'
>>> spherocytosis2.label = 'Spherocytosis'
>>> spherocytosis == spherocytosis2
True

However, setting the field values with property syntax only works for
`singular <https://protobuf.dev/reference/python/python-generated/#singular-fields-proto3>`_ (non-message) fields,
such as `bool`, `int`, `str`, or `float`, and the assignment will *NOT* work for message fields:

>>> pf = pps2.PhenotypicFeature()
>>> pf.type = spherocytosis
Traceback (most recent call last):
...
AttributeError: Assignment not allowed to field "type" in protocol message object.

To set a message field from an existing message, we use the `CopyFrom` method:

>>> pf.type.CopyFrom(spherocytosis)
>>> pf # doctest: +NORMALIZE_WHITESPACE
type {
id: "HP:0004444"
label: "Spherocytosis"
}

Lastly, a repeated field can be populated using list-like methods such as `extend`:

>>> modifiers = (
... pps2.OntologyClass(id='HP:0003623', label='Neonatal onset'),
... pps2.OntologyClass(id='HP:0011010', label='Chronic'),
... )
>>> pf.modifiers.extend(modifiers)
>>> pf # doctest: +NORMALIZE_WHITESPACE
type {
id: "HP:0004444"
label: "Spherocytosis"
}
modifiers {
id: "HP:0003623"
label: "Neonatal onset"
}
modifiers {
id: "HP:0011010"
label: "Chronic"
}

See `Protobuf documentation <https://protobuf.dev/reference/python/python-generated/#repeated-fields>`_
for more info.


Building blocks I/O
*******************

Having an instance with data, we can write the content into Protobuf's wire format:

>>> binary_str = pf.SerializeToString()
>>> binary_str
b'\x12\x1b\n\nHP:0004444\x12\rSpherocytosis*\x1c\n\nHP:0003623\x12\x0eNeonatal onset*\x15\n\nHP:0011010\x12\x07Chronic'

and get the same content back:

>>> pf2 = pps2.PhenotypicFeature()
>>> _ = pf2.ParseFromString(binary_str)
>>> pf == pf2
True
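
To make the wire format shown above less opaque, here is a minimal, standard-library-only sketch that walks the bytes by hand. `read_varint` and `iter_length_delimited` are hypothetical helpers written for this illustration and are not part of the Protobuf API. The sketch handles only length-delimited fields (wire type 2), which is all this particular message contains. The field numbers 2 and 5 are read directly from the bytes; that they correspond to `type` and `modifiers` is inferred from the printed message above.

```python
def read_varint(buf, pos):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:  # the high bit is clear on the last byte
            return value, pos
        shift += 7


def iter_length_delimited(buf):
    """Yield (field_number, payload) for each length-delimited field in buf."""
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type != 2:
            raise ValueError(f"unsupported wire type {wire_type}")
        length, pos = read_varint(buf, pos)
        yield field_number, buf[pos:pos + length]
        pos += length


# The same bytes as `binary_str` above, split across lines for readability.
binary_str = (
    b'\x12\x1b\n\nHP:0004444\x12\rSpherocytosis'
    b'*\x1c\n\nHP:0003623\x12\x0eNeonatal onset'
    b'*\x15\n\nHP:0011010\x12\x07Chronic'
)

# Each top-level field holds a nested message whose fields are UTF-8 strings.
decoded = [
    (number, [(n, payload.decode()) for n, payload in iter_length_delimited(body)])
    for number, body in iter_length_delimited(binary_str)
]
```

Each entry of `decoded` pairs a top-level field number with the `(field_number, text)` pairs of the nested message. In practice, `ParseFromString` should be preferred; the sketch only illustrates what the serialized bytes contain.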

We can also dump the content of the building block to a *JSON* string or to a `dict` with Python objects using
`MessageToJson <https://googleapis.dev/python/protobuf/latest/google/protobuf/json_format.html#google.protobuf.json_format.MessageToJson>`_
or `MessageToDict <https://googleapis.dev/python/protobuf/latest/google/protobuf/json_format.html#google.protobuf.json_format.MessageToDict>`_
functions:

>>> from google.protobuf.json_format import MessageToDict
>>> json_dict = MessageToDict(pf)
>>> json_dict
{'type': {'id': 'HP:0004444', 'label': 'Spherocytosis'}, 'modifiers': [{'id': 'HP:0003623', 'label': 'Neonatal onset'}, {'id': 'HP:0011010', 'label': 'Chronic'}]}

We complete the JSON round-trip using
`Parse <https://googleapis.dev/python/protobuf/latest/google/protobuf/json_format.html#google.protobuf.json_format.Parse>`_
or `ParseDict <https://googleapis.dev/python/protobuf/latest/google/protobuf/json_format.html#google.protobuf.json_format.ParseDict>`_
functions:

>>> from google.protobuf.json_format import ParseDict
>>> pf2 = ParseDict(json_dict, pps2.PhenotypicFeature())
>>> pf == pf2
True
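
Because `MessageToDict` returns plain Python containers, the result can also be handed to the standard-library `json` module, for example to control indentation or to write to a file. The sketch below uses only the standard library; the dictionary literal is copied from the `MessageToDict` output shown above rather than produced by Protobuf.

```python
import json

# Copied from the `MessageToDict` output shown above.
json_dict = {
    'type': {'id': 'HP:0004444', 'label': 'Spherocytosis'},
    'modifiers': [
        {'id': 'HP:0003623', 'label': 'Neonatal onset'},
        {'id': 'HP:0011010', 'label': 'Chronic'},
    ],
}

# Serialize to a JSON string and read it back into Python objects.
json_str = json.dumps(json_dict, indent=2)
restored = json.loads(json_str)
```

The `restored` dictionary compares equal to `json_dict` and could be passed to `ParseDict` in the same way as above.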

1 change: 1 addition & 0 deletions docs/working.rst
@@ -20,6 +20,7 @@ produced as part of the build (:ref:`rstjavabuild`).
:maxdepth: 1

Working with Phenopackets in Java <java>
Working with Phenopackets in Python <python>
Working with Phenopackets in C++ <cpp>

Security disclaimer
64 changes: 46 additions & 18 deletions pom.xml
@@ -53,6 +53,7 @@
<protobuf.version>3.20.3</protobuf.version>
<jackson.version>2.15.2</jackson.version>
<hapi-fhir.version>5.7.2</hapi-fhir.version>
<python.src>${project.basedir}/python/src</python.src>
</properties>

<distributionManagement>
@@ -169,33 +170,44 @@
</dependencies>

<build>
<extensions>
<extension>
<groupId>kr.motd.maven</groupId>
<artifactId>os-maven-plugin</artifactId>
<version>1.6.0</version>
</extension>
</extensions>
<plugins>
<plugin>
<groupId>org.xolstice.maven.plugins</groupId>
<groupId>io.github.ascopes</groupId>
<artifactId>protobuf-maven-plugin</artifactId>
<version>0.6.1</version>
<extensions>true</extensions>
<version>1.2.1</version>

<configuration>
<protocArtifact>com.google.protobuf:protoc:${protobuf.version}:exe:${os.detected.classifier}</protocArtifact>
<protocVersion>${protobuf.version}</protocVersion>
<sourceDirectories>
<sourceDirectory>src/main/proto</sourceDirectory>
</sourceDirectories>
<!-- We partition the generated sources by language, a folder per language.
Each language is generated in a dedicated execution.
Since Java is enabled by default, we disable Java here
to prevent generating Java classes into the other language's folder.
-->
<javaEnabled>false</javaEnabled>
</configuration>
<executions>
<execution>
<id>generate-java</id>
<goals>
<goal>generate</goal>
</goals>
<configuration>
<javaEnabled>true</javaEnabled>
</configuration>
</execution>
<execution>
<id>generate-python</id>
<goals>
<goal>compile</goal>
<goal>test-compile</goal>
<goal>compile-python</goal>
<goal>test-compile-python</goal>
<goal>compile-cpp</goal>
<goal>test-compile-cpp</goal>
<goal>compile-js</goal>
<goal>generate</goal>
</goals>
<configuration>
<pythonEnabled>true</pythonEnabled>
<pythonStubsEnabled>true</pythonStubsEnabled>
<outputDirectory>${python.src}</outputDirectory>
</configuration>
</execution>
</executions>
</plugin>
@@ -252,6 +264,22 @@
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-clean-plugin</artifactId>
<version>3.3.2</version>
<configuration>
<filesets>
<fileset>
<!-- Delete the generated *.py and *.pyi files except for the manually added re-exports. -->
<directory>${python.src}</directory>
<excludes>
<exclude>**/__init__.py</exclude>
</excludes>
</fileset>
</filesets>
</configuration>
</plugin>
</plugins>
</build>
</project>
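
The effect of the `maven-clean-plugin` fileset above can be approximated in Python as follows. This is a rough sketch, not a reimplementation of Maven's fileset matching: `files_to_clean` is a hypothetical helper, and the `phenopackets/schema/v2` layout is assumed from the import path used in the Python documentation. The point it illustrates is that every file under `${python.src}` other than `__init__.py` is removed on `clean`, so any other hand-written module placed there would be deleted too.

```python
import tempfile
from pathlib import Path


def files_to_clean(root: Path) -> list:
    """List files under root that the fileset would delete: all except __init__.py."""
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.name != "__init__.py"
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Assumed package layout; the real tree is generated by the Maven build.
    pkg = root / "phenopackets" / "schema" / "v2"
    pkg.mkdir(parents=True)
    for name in ("__init__.py", "phenopackets_pb2.py", "phenopackets_pb2.pyi"):
        (pkg / name).write_text("")
    doomed = files_to_clean(root)
```

Here `doomed` lists the generated `*_pb2.py` and `*_pb2.pyi` files, while the manually maintained `__init__.py` survives.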
1 change: 1 addition & 0 deletions python/LICENSE