
Commit

Merge branch 'lm-sys:main' into feat/docker
raldone01 authored Jul 14, 2024
2 parents 37297ce + 3a85b9e commit ef867d5
Showing 1 changed file (README.md) with 9 additions and 3 deletions.
# RouteLLM

RouteLLM is a framework for serving and evaluating LLM routers.

[ [Blog](http://lmsys.org/blog/2024-07-01-routellm/) ] [ [Paper](https://arxiv.org/abs/2406.18665) ]

<p align="center">
Our core features include:

- Drop-in replacement for OpenAI's client (or launch an OpenAI-compatible server) to route simpler queries to cheaper models.
- Trained routers are provided out of the box, which we have shown to **reduce costs by up to 85%** while maintaining **95% GPT-4 performance** on widely-used benchmarks like MT Bench.
- Benchmarks also demonstrate that these routers achieve the same performance as commercial offerings while being **>40% cheaper**.
- Easily extend the framework with new routers and compare router performance across multiple benchmarks.

## Installation
Different LLMs vary widely in their costs and capabilities, which leads to a dilemma when deploying them: routing all queries to the most capable model yields high-quality responses but can be expensive, while routing queries to smaller models saves costs but may result in lower-quality responses.

*LLM routing* offers a solution to this. We introduce a router that looks at queries and routes simpler queries to smaller, cheaper models, saving costs while maintaining quality. We focus on routing between two models: a stronger, more expensive model and a cheaper but weaker model. Each request is also associated with a _cost threshold_ that controls the cost-quality tradeoff for that request: a higher cost threshold leads to lower costs but may reduce response quality.
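To make the threshold mechanics concrete, here is a minimal, self-contained sketch of threshold-based routing. The model names and the length-based difficulty score are illustrative stand-ins, not RouteLLM's trained routers:

```python
STRONG_MODEL = "strong-model"  # stand-in for an expensive, capable model
WEAK_MODEL = "weak-model"      # stand-in for a cheaper, weaker model

def toy_score(query: str) -> float:
    """Stand-in for a trained router's score (how much the query needs
    the strong model). Here, naively, longer queries score higher."""
    return min(len(query) / 200.0, 1.0)

def route(query: str, cost_threshold: float) -> str:
    """Send a query to the strong model only when its score clears the
    cost threshold; raising the threshold shifts traffic toward the
    cheap model, trading quality for cost."""
    return STRONG_MODEL if toy_score(query) >= cost_threshold else WEAK_MODEL
```

With a low threshold most queries reach the strong model; raising it routes more traffic to the weak model, which is how a single knob trades cost against quality.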

The research in this repository was conducted in [collaboration with Anyscale](https://www.anyscale.com/blog/building-an-llm-router-for-high-quality-and-cost-effective-responses), and we are grateful for their help and support.

## Server

RouteLLM offers a lightweight OpenAI-compatible server for routing requests based on different routing strategies.
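Since the server speaks the OpenAI chat-completions protocol, a client only has to change the endpoint and model name. The payload below is a sketch: the `router-<name>-<threshold>` model string and the endpoint path are assumptions for illustration, not a documented API:

```python
import json

def build_chat_request(router: str, threshold: float, prompt: str) -> dict:
    """Build an OpenAI-style chat payload whose model field encodes the
    router name and cost threshold (naming scheme assumed for illustration)."""
    return {
        "model": f"router-{router}-{threshold}",
        "messages": [{"role": "user", "content": prompt}],
    }

# This JSON body would be POSTed to the server's /v1/chat/completions endpoint.
body = json.dumps(build_chat_request("mf", 0.5, "What is the capital of France?"))
```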
The full list of routers:

While these routers have been trained on the `gpt-4-1106-preview` and `mixtral-8x7b-instruct-v0.1` model pair, we have found that they generalize well to other strong and weak model pairs. You can therefore replace the model pair used for routing without retraining the routers!
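Because a router only outputs a strong-or-weak decision, swapping in a new model pair amounts to remapping that decision to different concrete models. A minimal sketch (the custom model names are hypothetical examples, not a tested pairing):

```python
# The router's output is abstract ("strong" or "weak"); the concrete
# model pair is a separate mapping that can be swapped freely.
def resolve_model(decision: str, pair: dict) -> str:
    return pair[decision]

default_pair = {"strong": "gpt-4-1106-preview", "weak": "mixtral-8x7b-instruct-v0.1"}
custom_pair = {"strong": "claude-3-opus", "weak": "llama-3-8b"}  # hypothetical swap

# The same routing decision resolves to different models per pair,
# with no retraining of the router itself.
model_a = resolve_model("weak", default_pair)
model_b = resolve_model("weak", custom_pair)
```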

We also provide detailed instructions on how to train the LLM-based classifier in the following [notebook](https://github.com/anyscale/llm-router/blob/main/README.ipynb).

For the full details, refer to our [paper](https://arxiv.org/abs/2406.18665).

## Configuration

