Most of Presidio's services are written in Go. The presidio-analyzer
module, in charge of detecting entities in text, is written in Python. This document details the required parts for developing for Presidio.
- Install Go 1.11 and Python 3.7
- Install the golang packages via dep:
dep ensure
- Install tesseract OCR framework. (Optional, only for Image anonymization)
- Build and install re2 (Optional. Presidio will use regex instead of pyre2 if re2 is not installed):
re2_version="2018-12-01"
wget -O re2.tar.gz https://github.com/google/re2/archive/${re2_version}.tar.gz
mkdir re2
tar --extract --file "re2.tar.gz" --directory "re2" --strip-components 1
cd re2 && make install
- Install pipenv
Pipenv is a Python workflow manager that handles dependencies and virtual environments for Python packages. Presidio's Analyzer project uses it as its dependency manager.
Using pip3:
pip3 install --user pipenv
Or using Homebrew:
brew install pipenv
Additional installation instructions: https://pipenv.readthedocs.io/en/latest/install/#installing-pipenv
- Create a virtualenv for the project and install all requirements in the Pipfile, including dev requirements. In the presidio-analyzer folder, run:
pipenv install --dev --sequential
- Run all tests:
pipenv run pytest
- To run arbitrary scripts within the virtual env, start the command with pipenv run. For example:
pipenv run flake8 analyzer --exclude "*pb2*.py"
pipenv run pylint analyzer
pipenv run pip freeze
- Start a shell:
pipenv shell
- Run commands in the shell:
pytest
pylint analyzer
pip freeze
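Before running the steps above, it can help to confirm which of the named tools are already on the PATH. The following is a minimal sketch, not part of Presidio. It assumes the pyre2 binding is importable under the module name re2, and the variable names (missing, regex_engine) are illustrative only.

```shell
# Report which of the tools named in the setup steps are available.
missing=""
for tool in go python3 dep pipenv; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found:   $tool"
  else
    echo "missing: $tool"
    missing="$missing $tool"
  fi
done

# pyre2 is optional. Assumption: the binding is imported as "re2".
if command -v python3 >/dev/null 2>&1 && python3 -c "import re2" >/dev/null 2>&1; then
  regex_engine="pyre2"
else
  regex_engine="regex"
fi
echo "regex engine Presidio is expected to use: $regex_engine"
```

The script only reports what it finds. It does not check versions, so Go 1.11 and Python 3.7 still need to be verified separately.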
- To use presidio-analyzer as a python library, see Installing presidio-analyzer as a standalone Python package
- To add new recognizers in order to support new entities, see Adding new custom recognizers
- Installing and building the entire Presidio solution is currently not supported on Windows. However, installing and building the different docker images, or the Python package for detecting entities (presidio-analyzer) is possible on Windows. See here
- Build the bins with
make build
- Build the base containers with
make docker-build-deps DOCKER_REGISTRY=${DOCKER_REGISTRY} PRESIDIO_DEPS_LABEL=${PRESIDIO_DEPS_LABEL}
(If you do not specify a valid, logged-in registry, a warning will be echoed to standard output.)
- Build the Docker image with
make docker-build DOCKER_REGISTRY=${DOCKER_REGISTRY} PRESIDIO_DEPS_LABEL=${PRESIDIO_DEPS_LABEL} PRESIDIO_LABEL=${PRESIDIO_LABEL}
- Push the Docker images with
make docker-push DOCKER_REGISTRY=${DOCKER_REGISTRY} PRESIDIO_LABEL=${PRESIDIO_LABEL}
- Run the tests with
make test
- After adding a Go file, run
make go-format
before running and building the service.
- Run functional tests with
make test-functional
- Updating python dependencies instructions
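The make targets listed above share three variables, so a small wrapper can keep them consistent. This is a sketch under stated assumptions: the registry name and label values are placeholders, and the run helper and DRY_RUN switch are hypothetical conveniences, not part of the Presidio Makefile. By default the script only prints the commands. Set DRY_RUN=0 to execute them.

```shell
# Placeholder values (assumptions): replace with your own registry and labels.
DOCKER_REGISTRY="${DOCKER_REGISTRY:-registry.example.com}"
PRESIDIO_DEPS_LABEL="${PRESIDIO_DEPS_LABEL:-latest}"
PRESIDIO_LABEL="${PRESIDIO_LABEL:-dev}"

# Print each command and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Same targets and arguments as in the list above, in the order listed.
run make build &&
run make docker-build-deps "DOCKER_REGISTRY=$DOCKER_REGISTRY" "PRESIDIO_DEPS_LABEL=$PRESIDIO_DEPS_LABEL" &&
run make docker-build "DOCKER_REGISTRY=$DOCKER_REGISTRY" "PRESIDIO_DEPS_LABEL=$PRESIDIO_DEPS_LABEL" "PRESIDIO_LABEL=$PRESIDIO_LABEL" &&
run make docker-push "DOCKER_REGISTRY=$DOCKER_REGISTRY" "PRESIDIO_LABEL=$PRESIDIO_LABEL" &&
run make test
```

Chaining with && stops at the first failing target when the commands are actually executed.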
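Environment variables and their default values:
- GRPC_PORT: 3001, GRPC listen port (presumably the analyzer, given the default ANALYZER_SVC_ADDRESS)
- GRPC_PORT: 3002, GRPC listen port (presumably the anonymizer, given the default ANONYMIZER_SVC_ADDRESS)
- WEB_PORT: 8080, HTTP listen port
- REDIS_URL: localhost:6379, Optional: Redis address
- ANALYZER_SVC_ADDRESS: localhost:3001, Analyzer address
- ANONYMIZER_SVC_ADDRESS: localhost:3002, Anonymizer address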
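One way to apply these settings in a local shell is to export each variable with its listed default, while letting any value already present in the environment take precedence. This is a minimal sketch. How each Presidio service is actually launched is not shown here and is left as an assumption.

```shell
# Defaults copied from the list above. Values already set in the environment win.
export WEB_PORT="${WEB_PORT:-8080}"
export REDIS_URL="${REDIS_URL:-localhost:6379}"   # optional
export ANALYZER_SVC_ADDRESS="${ANALYZER_SVC_ADDRESS:-localhost:3001}"
export ANONYMIZER_SVC_ADDRESS="${ANONYMIZER_SVC_ADDRESS:-localhost:3002}"

# GRPC_PORT has a different value per service (3001 and 3002 above), so it is
# better set per process than exported globally, for example:
#   GRPC_PORT=3001 <command that starts the service>

printf '%s\n' \
  "WEB_PORT=$WEB_PORT" \
  "REDIS_URL=$REDIS_URL" \
  "ANALYZER_SVC_ADDRESS=$ANALYZER_SVC_ADDRESS" \
  "ANONYMIZER_SVC_ADDRESS=$ANONYMIZER_SVC_ADDRESS"
```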
Developing Presidio as a whole on Windows is currently not supported. However, it is possible to run and test the presidio-analyzer module, in charge of detecting entities in text, on Windows using Docker:
- Run locally the core services Presidio needs to operate:
docker run --rm --name test-redis --network testnetwork -d -p 6379:6379 redis
docker run --rm --name test-presidio-anonymizer --network testnetwork -d -p 3001:3001 -e GRPC_PORT=3001 mcr.microsoft.com/presidio-anonymizer:latest
docker run --rm --name test-presidio-recognizers-store --network testnetwork -d -p 3004:3004 -e GRPC_PORT=3004 -e REDIS_URL=test-redis:6379 mcr.microsoft.com/presidio-recognizers-store:latest
- Navigate to <Presidio folder>/presidio-analyzer
- Install the Python packages if you have not done so yet:
pipenv install --dev --sequential
- If you want to experiment with analyze requests, navigate into the analyzer folder and start serving the analyzer service:
pipenv run python __main__.py serve --grpc-port 3000
- In a new pipenv shell window you can run analyze requests, for example:
pipenv run python __main__.py analyze --text "John Smith drivers license is AC432223" --fields "PERSON" "US_DRIVER_LICENSE" --grpc-port 3000
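The docker run commands in the first step attach each container to a user-defined network named testnetwork, and Docker refuses to start a container on a network that does not exist. If the network has not been created yet, a sketch like the following can be run beforehand. It uses only the standard docker network inspect and docker network create subcommands. The NETWORK variable is illustrative.

```shell
NETWORK=testnetwork

if ! command -v docker >/dev/null 2>&1; then
  echo "docker not found on PATH"
elif docker network inspect "$NETWORK" >/dev/null 2>&1; then
  echo "network $NETWORK already exists"
elif docker network create "$NETWORK" >/dev/null 2>&1; then
  echo "network $NETWORK created"
else
  echo "could not create network $NETWORK (is the Docker daemon running?)"
fi
```

Checking with inspect first makes the script safe to run repeatedly.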
- Edit post.lua. Change the template name
- Run wrk:
wrk -t2 -c2 -d30s -s post.lua http://<api-service-address>/api/v1/projects/<my-project>/analyze
- If deploying from a private registry, verify that Kubernetes has access to the Docker Registry.
- If using a Kubernetes secret to manage the registry authentication, make sure it is registered under the 'presidio' namespace.
Edit charts/presidio/values.yaml to:
- Set up the secret name (for private registries)
- Change presidio services version
- Change default scale