Text Embeddings Inference (TEI) is a comprehensive toolkit for the efficient deployment and serving of open-source text embedding models. It enables us to host our own reranker endpoint seamlessly. This README provides setup instructions and details for the reranking microservice built on TEI.
- Start the TEI service:

  - For Gaudi HPU:

    ```bash
    export HF_TOKEN=${your_hf_api_token}
    export RERANK_MODEL_ID="BAAI/bge-reranker-base"
    export volume=$PWD/data

    docker run -d -p 6060:80 -v $volume:/data -e http_proxy=$http_proxy -e https_proxy=$https_proxy --pull always ghcr.io/huggingface/tei-gaudi:1.5.2 --model-id $RERANK_MODEL_ID --hf-api-token $HF_TOKEN
    ```

  - For Xeon CPU:

    ```bash
    export HF_TOKEN=${your_hf_api_token}
    export RERANK_MODEL_ID="BAAI/bge-reranker-base"
    export volume=$PWD/data

    docker run -d -p 6060:80 -v $volume:/data -e http_proxy=$http_proxy -e https_proxy=$https_proxy --pull always ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 --model-id $RERANK_MODEL_ID --hf-api-token $HF_TOKEN
    ```
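  On first start the model weights are downloaded, so the endpoint may take a while to respond. A quick way to check progress (generic Docker commands, not specific to TEI):

  ```bash
  # Find the TEI container, then tail its logs;
  # replace <container_id> with the ID shown by `docker ps`
  docker ps
  docker logs -f <container_id>
  ```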
- Verify the TEI service: run the following commands to check whether the service is up and running.

  ```bash
  export ip_address=$(hostname -I | awk '{print $1}')

  curl ${ip_address}:6060/rerank \
    -X POST \
    -d '{"query":"What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' \
    -H 'Content-Type: application/json'
  ```
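  A healthy service returns one relevance score per input text, along the lines of the following (the scores are illustrative; exact values and fields depend on the model and TEI version):

  ```
  [{"index":1,"score":0.998},{"index":0,"score":0.026}]
  ```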
- Build the Docker image for the reranking microservice:

  ```bash
  docker build --no-cache \
    -t opea/reranking:comps \
    --build-arg https_proxy=$https_proxy \
    --build-arg http_proxy=$http_proxy \
    --build-arg SERVICE=tei \
    -f comps/rerankings/src/Dockerfile .
  ```
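  You can confirm the image was built with a standard Docker check:

  ```bash
  # The image tag matches the -t argument used above
  docker images | grep "opea/reranking"
  ```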
- Run the reranking microservice and connect it to the TEI service (make sure `TEI_RERANKING_ENDPOINT` is set first; see the note below):

  ```bash
  docker run -d --name="reranking-tei-server" \
    -e LOGFLAG=True \
    -p 8000:8000 \
    --ipc=host \
    -e http_proxy=$http_proxy \
    -e https_proxy=$https_proxy \
    -e TEI_RERANKING_ENDPOINT=$TEI_RERANKING_ENDPOINT \
    -e HF_TOKEN=$HF_TOKEN \
    -e RERANK_COMPONENT_NAME="OPEA_TEI_RERANKING" \
    opea/reranking:comps
  ```
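  Note: `TEI_RERANKING_ENDPOINT` must point at the TEI service started above. Assuming the port 6060 mapping used earlier, it can be set like this:

  ```bash
  # Assumes the TEI service from the previous step is listening on host port 6060
  export host_ip=$(hostname -I | awk '{print $1}')
  export TEI_RERANKING_ENDPOINT="http://${host_ip}:6060"
  ```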
Deploy both the TEI Reranking Service and the Reranking Microservice using Docker Compose.
🔹 Steps:
- Set environment variables (`host_ip` must be set before it is used in `TEI_RERANKING_ENDPOINT`):

  ```bash
  export host_ip=$(hostname -I | awk '{print $1}')
  export RERANK_MODEL_ID="BAAI/bge-reranker-base"
  export TEI_RERANKING_PORT=12003
  export RERANK_PORT=8000
  export TEI_RERANKING_ENDPOINT="http://${host_ip}:${TEI_RERANKING_PORT}"
  export TAG=comps
  ```
- Navigate to the Docker Compose directory:

  ```bash
  cd comps/rerankings/deployment/docker_compose/
  ```
- Start the services:

  - For Gaudi HPU:

    ```bash
    docker compose up reranking-tei-gaudi -d
    ```

  - For Xeon CPU:

    ```bash
    docker compose up reranking-tei -d
    ```
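  To confirm the containers came up, you can inspect the Compose services with standard commands:

  ```bash
  docker compose ps
  docker compose logs reranking-tei  # use reranking-tei-gaudi on Gaudi
  ```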
- Verify the reranking service is running:

  ```bash
  curl http://localhost:8000/v1/health_check \
    -X GET \
    -H 'Content-Type: application/json'
  ```
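  On a cold start the microservice can take a few seconds to become ready. A minimal polling sketch, assuming the port 8000 mapping used above:

  ```bash
  # Poll the health endpoint until it responds, giving up after ~60 seconds
  for i in $(seq 1 12); do
    curl -sf http://localhost:8000/v1/health_check && break
    sleep 5
  done
  ```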
- Execute the reranking process by providing a query and documents:

  ```bash
  curl http://localhost:8000/v1/reranking \
    -X POST \
    -d '{"initial_query":"What is Deep Learning?", "retrieved_docs": [{"text":"Deep Learning is not..."}, {"text":"Deep learning is..."}]}' \
    -H 'Content-Type: application/json'
  ```
- You can add the parameter `top_n` to specify how many documents the reranker returns; the default value is 1. A helper for repeated calls follows the example below.

  ```bash
  curl http://localhost:8000/v1/reranking \
    -X POST \
    -d '{"initial_query":"What is Deep Learning?", "retrieved_docs": [{"text":"Deep Learning is not..."}, {"text":"Deep learning is..."}], "top_n":2}' \
    -H 'Content-Type: application/json'
  ```
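  For repeated experimentation, a small shell helper saves retyping the JSON. This is a hypothetical convenience wrapper, not part of the service, and it assumes plain-text documents without embedded double quotes:

  ```bash
  # Hypothetical helper: rerank QUERY DOC1 DOC2 ... prints the service's JSON response.
  # Documents containing double quotes would break the naive JSON construction below.
  rerank() {
    local query=$1; shift
    # Build the retrieved_docs JSON array from the remaining arguments
    local docs
    docs=$(printf '{"text":"%s"},' "$@")
    docs="[${docs%,}]"
    curl -s http://localhost:8000/v1/reranking \
      -X POST \
      -H 'Content-Type: application/json' \
      -d "{\"initial_query\":\"${query}\", \"retrieved_docs\": ${docs}, \"top_n\": 2}"
  }

  rerank "What is Deep Learning?" "Deep Learning is not..." "Deep learning is..."
  ```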
- Port Mapping: ensure the ports are correctly mapped to avoid conflicts with other services.
- Model Selection: choose a model appropriate for your use case, such as "BAAI/bge-reranker-base".
- Environment Variables: use `http_proxy` and `https_proxy` for proxy setup if necessary.
- Data Volume: the `-v ./data:/data` flag ensures the data directory is correctly mounted.