Digital Publishing Search API
A Go microservice that provides search query functionality for the ONS website.
Set up dependencies locally as follows:
- In dp-compose, run `docker-compose up -d` to start Elasticsearch 7.10
  - dp-compose runs Elasticsearch 7.10 on port 11200 so that it does not conflict with ES 2.2 running on port 9200
- If you are using the POST /search endpoint, authorisation requires Vault and Zebedee to be running:
  - Zebedee depends on Vault, so first run `vault server -dev` in any directory
  - Then, in the zebedee directory, run `./run.sh` to start Zebedee
Then run `make debug`.
The service has the following dependencies:
- Requires Elasticsearch running on port 11200
- Requires Zebedee running on port 8082
- No further dependencies other than those defined in `go.mod`
The service is configured with the following environment variables:
Environment variable | Default | Description |
---|---|---|
AWS_FILENAME | "" | The AWS file location for finding credentials to sign AWS HTTP requests |
AWS_PROFILE | "" | The AWS profile to use from the credentials file to sign AWS HTTP requests |
AWS_REGION | eu-west-2 | The AWS region to use when signing requests with AWS SDK |
AWS_SERVICE | "es" | The AWS service that the AWS SDK signing mechanism needs to sign a request |
AWS_SIGNER | false | Determines whether requests to Elasticsearch use a round tripper that signs them with AWS credentials |
AWS_TLS_INSECURE_SKIP_VERIFY | false | Disables SSL/TLS certificate verification. Only set to true for local development, never in production |
BIND_ADDR | :23900 | The host and port to bind to |
BERLIN_URL | "http://localhost:28900" | HTTP URL of the NLP Berlin API |
CATEGORY_URL | "http://localhost:28800" | HTTP URL of the NLP Category API |
DEFAULT_LIMIT | 10 | The default limit of search results in a page |
DEFAULT_MAXIMUM_LIMIT | 100 | The default maximum limit of search results in a page |
DEFAULT_OFFSET | 0 | The default offset of search results |
ELASTIC_SEARCH_URL | "http://localhost:11200" | HTTP URL of the Elasticsearch server |
GRACEFUL_SHUTDOWN_TIMEOUT | 5s | The graceful shutdown timeout (time.Duration format) |
HEALTHCHECK_CRITICAL_TIMEOUT | 90s | Time to wait until an unhealthy dependent propagates its state to make this app unhealthy (time.Duration format) |
HEALTHCHECK_INTERVAL | 30s | Time between self-healthchecks (time.Duration format) |
NLP_SETTINGS | See below | JSON settings for the NLP Hub weighting (see below) |
ENABLE_NLP_WEIGHTING | false | Feature flag for enabling NLP Weighting functionality via Scrubber, Category and Berlin |
OTEL_BATCH_TIMEOUT | 5s | Interval between pushes to the OpenTelemetry Collector |
OTEL_EXPORTER_OTLP_ENDPOINT | "http://localhost:4317" | URL for OpenTelemetry endpoint |
OTEL_SERVICE_NAME | "dp-search-api" | Service name to report to telemetry tools |
OTEL_ENABLED | false | Feature flag to enable OpenTelemetry |
SCRUBBER_URL | "http://localhost:28700" | HTTP URL of the NLP Scrubber API |
ZEBEDEE_URL | "http://localhost:8082" | The URL to Zebedee (for authorisation) |
NLP Hub settings are set as a JSON string in NLP_SETTINGS. The default is:
`{"category_weighting": 100000000.0, "category_limit": 100, "default_state": "gb"}`
Key | Type | Description |
---|---|---|
category_weighting | float | Weighting applied to categories when they are used in Elasticsearch queries |
category_limit | int | Limits how many categories are returned |
default_state | string | |
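A minimal sketch of decoding NLP_SETTINGS into a typed struct with `encoding/json`. The `NLPSettings` type and `parseNLPSettings` function are hypothetical names. As a design choice in this sketch, the documented default is decoded first and any supplied value is decoded on top of it, so keys omitted from NLP_SETTINGS keep their defaults; whether the service itself merges settings this way is not stated here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// NLPSettings mirrors the keys documented in the table above (hypothetical type).
type NLPSettings struct {
	CategoryWeighting float64 `json:"category_weighting"`
	CategoryLimit     int     `json:"category_limit"`
	DefaultState      string  `json:"default_state"`
}

// defaultNLPSettings is the documented default value of NLP_SETTINGS.
const defaultNLPSettings = `{"category_weighting": 100000000.0, "category_limit": 100, "default_state": "gb"}`

// parseNLPSettings decodes raw on top of the defaults; an empty raw yields the defaults.
func parseNLPSettings(raw string) (NLPSettings, error) {
	var s NLPSettings
	if err := json.Unmarshal([]byte(defaultNLPSettings), &s); err != nil {
		return NLPSettings{}, err
	}
	if raw == "" {
		return s, nil
	}
	// json.Unmarshal only overwrites fields present in raw.
	if err := json.Unmarshal([]byte(raw), &s); err != nil {
		return NLPSettings{}, fmt.Errorf("invalid NLP_SETTINGS: %w", err)
	}
	return s, nil
}

func main() {
	s, err := parseNLPSettings(os.Getenv("NLP_SETTINGS"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", s)
}
```

For example, running it with `NLP_SETTINGS='{"category_limit": 50}'` would keep the default weighting and state and only change the category limit.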
The API is documented using Swagger 2.0. The specification can be found here.
Applications that interact with the API can use the Go SDK package, which contains client methods that are maintained to align with the API. For further reading on how to use the client, follow this link.
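For callers not using the SDK, a request URL can be built by hand. This sketch assumes a `GET /search` endpoint on the default BIND_ADDR (`:23900`) that accepts `q`, `limit` and `offset` query parameters; those parameter names are assumptions to check against the swagger specification, and `buildSearchURL` is a hypothetical helper, not part of the Go SDK. It rejects a limit above 100 on the client side to match the documented DEFAULT_MAXIMUM_LIMIT default.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// maxLimit mirrors the documented DEFAULT_MAXIMUM_LIMIT default; a deployment
// may be configured with a different value.
const maxLimit = 100

// buildSearchURL returns a search URL for the given query, page size and offset.
// The path and parameter names are assumptions, not confirmed API details.
func buildSearchURL(baseURL, query string, limit, offset int) (string, error) {
	if limit < 1 || limit > maxLimit {
		return "", fmt.Errorf("limit must be between 1 and %d, got %d", maxLimit, limit)
	}
	if offset < 0 {
		return "", fmt.Errorf("offset must not be negative, got %d", offset)
	}
	u, err := url.Parse(baseURL)
	if err != nil {
		return "", err
	}
	u.Path = "/search"
	params := url.Values{}
	params.Set("q", query)
	params.Set("limit", strconv.Itoa(limit))
	params.Set("offset", strconv.Itoa(offset))
	u.RawQuery = params.Encode()
	return u.String(), nil
}

func main() {
	// http://localhost:23900 assumes the default BIND_ADDR of :23900.
	s, err := buildSearchURL("http://localhost:23900", "cpi inflation", 10, 0)
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

With the values in `main`, this prints `http://localhost:23900/search?limit=10&offset=0&q=cpi+inflation`, because `url.Values.Encode` sorts parameters by key and encodes the space as `+`.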
- You will need a user account for the AWS account you are trying to connect to
- You will need a policy that grants read and write access to the AWS Elasticsearch cluster
To connect to a managed Elasticsearch cluster in AWS, port forward 11200 to the cluster's domain endpoint. Using the dp tool, you can do this as follows:
`dp ssh develop <ip of aws box> -p 11200:<elasticsearch cluster domain endpoint>:443`
where the domain endpoint has the form `<unique identifier>.eu-west-1.es.amazonaws.com`.
Once connected, run `make local`.
Use the dp-search-reindex-batch script to build a local search index from your local Zebedee and Dataset API data.
(Note: an earlier version of this service included a reindexing script, which has since been replaced by the tool above.)
See ARCHITECTURE for details.
See CONTRIBUTING for details.
Copyright © 2016-2023, Office for National Statistics (https://www.ons.gov.uk)
Released under the MIT license; see LICENSE for details.