From f93fd86e00a7f36cc8e6298c2f8f70fe8586e3ed Mon Sep 17 00:00:00 2001
From: Xie Zhihao
Date: Fri, 21 Jun 2024 14:43:22 +0800
Subject: [PATCH] Add guardrails gateway proposal

Signed-off-by: Xie Zhihao
---
 .../24-06-21-OPEA-001-Guardrails-Gateway.md | 176 ++++++++++++++++++
 1 file changed, 176 insertions(+)
 create mode 100644 community/rfcs/24-06-21-OPEA-001-Guardrails-Gateway.md

diff --git a/community/rfcs/24-06-21-OPEA-001-Guardrails-Gateway.md b/community/rfcs/24-06-21-OPEA-001-Guardrails-Gateway.md
new file mode 100644
index 00000000..f413b6b9
--- /dev/null
+++ b/community/rfcs/24-06-21-OPEA-001-Guardrails-Gateway.md
@@ -0,0 +1,176 @@
+## RFC Title
+
+Guardrails Gateway
+
+## RFC Content
+
+### Author
+
+[zhxie](https://github.com/zhxie), [Forrest-zhao](https://github.com/Forrest-zhao), [ruijin-intel](https://github.com/ruijin-intel)
+
+### Status
+
+Under Review
+
+### Objective
+
+Deploy opt-in guardrails in the gateway of the deployment environment.
+
+### Motivation
+
+- Reduce latency in network transmission and protocol encoding/decoding.
+- Support stateful guardrails.
+- Enhance observability.
+- Leverage OpenVINO to use AI acceleration instructions including AVX, AVX-512 and AMX.
+
+### Design Proposal
+
+#### Inference In Place
+
+The LangChain-like workflow is presented below.
+
+```mermaid
+graph LR
+    Entry(Entry)-->Gateway
+    Gateway-->Embedding
+    Embedding-->Gateway
+    Gateway-->Retrieve
+    Retrieve-->Gateway
+    Gateway-->Rerank
+    Rerank-->Gateway
+    Gateway-->LLM
+    LLM-->Guardrails
+    Guardrails-->LLM
+    LLM-->Gateway
+```
+
+All services communicate through RESTful API calls, which adds overhead in network transmission and protocol encoding/decoding. Early studies have shown that each hop adds 3 ms of latency, which can be even longer when mTLS is turned on for security reasons in inter-node deployments.
+
+The opt-in guardrails in the gateway work in the architecture given below.
+
+```mermaid
+graph LR
+    Entry(Entry)-->Gateway["Gateway\nGuardrails"]
+    Gateway-->Embedding
+    Embedding-->Gateway
+    Gateway-->Retrieve
+    Retrieve-->Gateway
+    Gateway-->Rerank
+    Rerank-->Gateway
+    Gateway-->LLM
+    LLM-->Gateway
+```
+
+The gateway can host multiple guardrails without extra network transmission or protocol encoding/decoding. In a real-world deployment, there may be many guardrails covering different perspectives, and the gateway is the best place to provide them for the system.
+
+The gateway consists of two basic components: the inference runtime and the guardrails.
+
+```mermaid
+graph TD
+    Gateway---Runtime[Inference Runtime API]
+    Runtime---OpenVINO
+    Runtime---PyTorch
+    Runtime---Others[...]
+    Gateway---Guardrails
+    Guardrails---Load[Load Model]
+    Guardrails---Inference
+    Guardrails---Access[Access Control]
+```
+
+A unified inference runtime API provides a general interface for inference runtimes. Any inference runtime, including OpenVINO, can be integrated into the system. The guardrails leverage the inference runtime to decide whether a request or response is valid.
+
+#### Stateful Guardrails
+
+The traditional workflow from ingress to egress is presented below.
+
+```mermaid
+flowchart LR
+    Entry(Entry)-->GuardrailsA
+    GuardrailsA["Guardrails\nAnti-Jailbreaking"]-->Embedding
+    Embedding-->Retrieve
+    Retrieve-->Rerank
+    Rerank-->LLM
+    LLM-->GuardrailsB["Guardrails\nAnti-Profanity"]
+```
+
+The guardrails service provides certain protections for the LLM, such as anti-jailbreaking and anti-poisoning on the input side, anti-toxicity and factuality checks on the output side, and PII detection on both the input and output sides.
+
+Guardrails can also be split into two types, stateless and stateful. Guardrails including anti-jailbreaking, anti-toxicity and PII detection are considered stateless guards, since they do not need both the prompt input and the response output, while anti-hallucination is regarded as a stateful guard, since it needs both the input and the output to check the relevance between them.
+
+[Guardrails Microservice](https://github.com/xuechendi/GenAIComps/tree/pii_detection/comps/guardrails) provides certain guardrails as a microservice, but due to the limitations of the microservice model, it cannot correlate responses with their requests, which makes it difficult to provide stateful guardrails.
+
+The opt-in guardrails in the gateway work in the architecture given below.
+
+```mermaid
+flowchart LR
+    Entry(Entry)-->GuardrailsA
+    subgraph Gateway
+        GuardrailsA["Guardrails\nAnti-Jailbreaking"]-->GuardrailsC
+        GuardrailsB-->GuardrailsC
+    end
+    GuardrailsC["Guardrails\nAnti-Hallucination"]-->Embedding
+    Embedding-->Retrieve
+    Retrieve-->Rerank
+    Rerank-->LLM
+    LLM-->GuardrailsB["Guardrails\nAnti-Profanity"]
+```
+
+As an alternative, the gateway will also provide guardrails, whether stateful or stateless.
+
+#### Observability
+
+Envoy is a widely used proxy in cloud-native environments. It provides out-of-the-box access logs, stats and metrics, and integrates naturally with observability platforms including OpenTelemetry and Prometheus.
+
+Guardrails in the gateway will leverage these observability capabilities to meet potential regulatory and compliance needs.
+
+#### Multi-Services Deployment
+
+Let's say the embedding and LLM services are AI-powered and require guardrails protection.
+
+The opt-in gateway can be deployed as a gateway or as sidecar services.
+
+```mermaid
+graph LR
+    Entry(Entry)-->Embedding
+    subgraph SidecarA[Sidecar]
+        Embedding
+    end
+    Embedding-->Retrieve
+    Retrieve-->Rerank
+    Rerank-->LLM
+    subgraph SidecarB[Sidecar]
+        LLM
+    end
+```
+
+The gateway can also work with guardrails microservices.
+
+```mermaid
+graph LR
+    Entry(Entry)-->GuardrailsC["Guardrails\nAnti-Hallucination"]
+    GuardrailsC["Guardrails\nAnti-Hallucination"]-->GuardrailsA["Guardrails\nAnti-Jailbreaking"]
+    GuardrailsA-->Embedding
+    Embedding-->Retrieve
+    Retrieve-->Rerank
+    Rerank-->GuardrailsB["Guardrails\nAnti-Jailbreaking"]
+    GuardrailsB-->LLM
+    LLM-->GuardrailsD["Guardrails\nAnti-Profanity"]
+    subgraph Gateway
+        GuardrailsD-->GuardrailsC
+    end
+```
+
+### Alternatives Considered
+
+[Guardrails Microservice](https://github.com/xuechendi/GenAIComps/tree/pii_detection/comps/guardrails): provides certain guardrails, but it only supports stateless guardrails.
+
+### Compatibility
+
+N/A
+
+### Miscs
+
+- TODO
+
+  - [ ] API definitions for meta service deployment and Kubernetes deployment
+  - [ ] Envoy inference framework and guardrails HTTP filter
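
Note on the proposal above: the RFC leaves the API definitions as a TODO, so the following is only a minimal sketch of how a unified inference runtime interface and the stateless/stateful split might fit together. It assumes Python for brevity, and every name in it (`InferenceRuntime`, `KeywordRuntime`, `StatelessGuardrail`, `StatefulGuardrail`, `word_overlap`) is hypothetical; none of them comes from the RFC, OpenVINO, or Envoy. The keyword matcher and the word-overlap score are toy stand-ins for real models.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Protocol, Set


class InferenceRuntime(Protocol):
    """Hypothetical unified runtime interface (assumption, not the RFC's API)."""

    def load_model(self, name: str) -> None: ...

    def infer(self, text: str) -> float: ...


class KeywordRuntime:
    """Toy stand-in for a real runtime: returns 1.0 if a blocked word appears."""

    def __init__(self, blocked: Set[str]) -> None:
        self.blocked = blocked
        self.loaded = False

    def load_model(self, name: str) -> None:
        self.loaded = True  # a real runtime would load model weights here

    def infer(self, text: str) -> float:
        words = text.lower().split()
        return 1.0 if any(w in self.blocked for w in words) else 0.0


@dataclass
class StatelessGuardrail:
    """Judges a single message in isolation (e.g. anti-jailbreaking)."""

    runtime: InferenceRuntime
    threshold: float = 0.5

    def allow(self, text: str) -> bool:
        return self.runtime.infer(text) < self.threshold


@dataclass
class StatefulGuardrail:
    """Remembers the prompt per request ID so the response can be checked against it."""

    relevance: Callable[[str, str], float]
    threshold: float = 0.2
    _prompts: Dict[str, str] = field(default_factory=dict)

    def on_request(self, request_id: str, prompt: str) -> None:
        self._prompts[request_id] = prompt

    def on_response(self, request_id: str, response: str) -> bool:
        prompt = self._prompts.pop(request_id, None)
        if prompt is None:
            return False  # no matching request was seen: reject
        return self.relevance(prompt, response) >= self.threshold


def word_overlap(prompt: str, response: str) -> float:
    """Toy relevance score: Jaccard overlap of lowercase words."""
    a, b = set(prompt.lower().split()), set(response.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0
```

The point of the sketch is the shape of the state: a stateless guard needs only the message in front of it, while a stateful guard needs a place to keep the request until the response arrives, which the gateway has because both directions pass through it.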