Skip to content

Commit

Permalink
Add guardrails gateway proposal
Browse files Browse the repository at this point in the history
Signed-off-by: Xie Zhihao <[email protected]>
  • Loading branch information
zhxie committed Jun 21, 2024
1 parent f1fec03 commit f93fd86
Showing 1 changed file with 176 additions and 0 deletions.
176 changes: 176 additions & 0 deletions community/rfcs/24-06-21-OPEA-001-Guardrails-Gateway.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,176 @@
## RFC Title

Guardrails Gateway

## RFC Content

### Author

[zhxie](https://github.com/zhxie), [Forrest-zhao](https://github.com/Forrest-zhao), [ruijin-intel](https://github.com/ruijin-intel)

### Status

Under Review

### Objective

Deploy opt-in guardrails in gateway on deployment environment.

### Motivation

- Reduce latency in network transmission and protocol encoding/decoding.
- Support stateful guardrails.
- Enhance Observability.
- Leverage OpenVINO for AI acceleration instructions including AVX, AVX512 and AMX.

### Design Proposal

#### Inference In Place

The LangChain-like workflow is presented below.

```mermaid
graph LR
Entry(Entry)-->Gateway
Gateway-->Embedding
Embedding-->Gateway
Gateway-->Retrieve
Retrieve-->Gateway
Gateway-->Rerank
Rerank-->Gateway
Gateway-->LLM
LLM-->Guardrails
Guardrails-->LLM
LLM-->Gateway
```

All services use RESTful API calling to communicate. There is overhead in network transmission and protocol encoding/decoding. Early studies have shown that each hop adds a 3ms of latency, which can be even longer when mTLS is turned on for security reason in inter-nodes deployment.

The opt-in guardrails in gateway works in the architecture given below.

```mermaid
graph LR
Entry(Entry)-->Gateway["Gateway\nGuardrails"]
Gateway-->Embedding
Embedding-->Gateway
Gateway-->Retrieve
Retrieve-->Gateway
Gateway-->Rerank
Rerank-->Gateway
Gateway-->LLM
LLM-->Gateway
```

The gateway can host multiple guardrails without extra network transmission or protocol encoding/decoding. In the real world deployment, there may be many guardrails in all perspectives, and the gateway is the best place to provide guardrails for the system.

The gateway consists of 2 basic components, inference runtime and guardrails.

```mermaid
graph TD
Gateway---Runtime[Inference Runtime API]
Runtime---OpenVINO
Runtime---PyTorch
Runtime---Others[...]
Gateway---Guardrails
Guardrails---Load[Load Model]
Guardrails---Inference
Guardrails---Access[Access Control]
```

A unified inference runtime API provides a general interface for inference runtimes. Any inference runtime can be integrated into the system including OpenVINO. The guardrails leverages the inferece runtime and decides if the request/reponse is valid.

#### Stateful Guardrails

The traditional workflow from ingress to egress is presented below.

```mermaid
flowchart LR
Entry(Entry)-->GuardrailsA
GuardrailsA["Guardrails\nAnti-Jailbreaking"]-->Embedding
Embedding-->Retrieve
Retrieve-->Rerank
Rerank-->LLM
LLM-->GuardrailsB["Guardrails\nAnti-Profanity"]
```

Guardrails service provides certain protection for LLM, such as anti-jailbreaking, anti-poisoning for the input side, anti-toxicity, factuality check for the output side, and PII detection for both input and output side.

Guardrails can also be spliited into 2 types, stateless and stateful. Guardrails including anti-jailbreaking, anti-toxicity and PII detection are considered as stateless guards, since they do not rely on both prompt input and response output, while anti-hallucination is regarded as a stateful guard, it needs both input and ouput for the relativity between.

[Guardrails Microservice](https://github.com/xuechendi/GenAIComps/tree/pii_detection/comps/guardrails) provides certain guardrails as microservice, but due to the limitation microservice, it is not able to track requests for responses, leading to difficulty in providing stateless guard ability.

The opt-in guardrails in gateway works in the architecture given below.

```mermaid
flowchart LR
Entry(Entry)-->GuardrailsA
subgraph Gateway
GuardrailsA["Guardrails\nAnti-Jailbreaking"]-->GuardrailsC
GuardrailsB-->GuardrailsC
end
GuardrailsC["Guardrails\nAnti-Hallucination"]-->Embedding
Embedding-->Retrieve
Retrieve-->Rerank
Rerank-->LLM
LLM-->GuardrailsB["Guardrails\nAnti-Profanity"]
```

As a alternative choice, the gateway will also provide guardrails ability, no matter stateful or stateless.

#### Observability

Envoy is the most popular proxy in cloud native, which contains out-of-box access log, stats and metrics, and can be integrated into observability platform including OpenTelemetry and Prometheus naturally.

Guardrails in gateway will leverages these abilities about observability to meet potential regulartory and compliance needs.

#### Multi-Services Deployment

Let's say the embedding and LLM services are AI-powered and require guardrails protection.

The opt-in gateway can be deployed as a gateway or sidecar services.

```mermaid
graph LR
Entry(Entry)-->Embedding
subgraph SidecarA[Sidecar]
Embedding
end
Embedding-->Retrieve
Retrieve-->Rerank
Rerank-->LLM
subgraph SidecarB[Sidecar]
LLM
end
```

The gateway can also work with guardrails microservices.

```mermaid
graph LR
Entry(Entry)-->GuardrailsC["Guardrails\nAnti-Hallucination"]
GuardrailsC["Guardrails\nAnti-Hallucination"]-->GuardrailsA["Guardrails\nAnti-Jailbreaking"]
GuardrailsA-->Embedding
Embedding-->Retrieve
Retrieve-->Rerank
Rerank-->GuardrailsB["Guardrails\nAnti-Jailbreaking"]
GuardrailsB-->LLM
LLM-->GuardrailsD["Guardrails\nAnti-Profanity"]
subgraph Gateway
GuardrailsD-->GuardrailsC
end
```

### Alternatives Considered

[Guardrails Microservice](https://github.com/xuechendi/GenAIComps/tree/pii_detection/comps/guardrails): has provided certain guardrails, however it only supports stateless guardrails.

### Compatibility

N/A

### Miscs

- TODO

- [ ] API definitions for meta service deployment and Kubernetes deployment
- [ ] Envoy inference framework and guardrails HTTP filter

0 comments on commit f93fd86

Please sign in to comment.