Skip to content

Latest commit

 

History

History
77 lines (51 loc) · 2.39 KB

get_started.md

File metadata and controls

77 lines (51 loc) · 2.39 KB

Get Started

LMDeploy offers functionalities such as model quantization, offline batch inference, online serving, etc. Each function can be completed with just a few simple lines of code or commands.

Installation

Install lmdeploy with pip (python 3.8+) or from source

pip install lmdeploy

The default prebuilt package is compiled on CUDA 11.8. However, if CUDA 12+ is required, you can install lmdeploy by:

export LMDEPLOY_VERSION=0.2.0
export PYTHON_VERSION=38
pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl

Offline batch inference

import lmdeploy
pipe = lmdeploy.pipeline("internlm/internlm-chat-7b")
response = pipe(["Hi, pls intro yourself", "Shanghai is"])
print(response)

For more information on inference pipeline parameters, please refer to here.

Serving

LMDeploy's api_server enables models to be easily packed into services with a single command. The provided RESTful APIs are compatible with OpenAI's interfaces. Below are an example of service startup:

lmdeploy serve api_server internlm/internlm-chat-7b

The default port of api_server is 23333. After the server is launched, you can communicate with server on terminal through api_client:

lmdeploy serve api_client http://0.0.0.0:23333

You can overview and try out api_server APIs online by swagger UI at http://0.0.0.0:23333, or you can read the API specification from here.

Quantization

LMDeploy provides the following quantization methods. Please visit the following links for the detailed guide

Useful Tools

LMDeploy CLI offers the following utilities, helping users experience LLM features conveniently

Inference with Command line Interface

lmdeploy chat turbomind internlm/internlm-chat-7b

Serving with Web UI

LMDeploy adopts gradio to develop the online demo.

# install dependencies
pip install lmdeploy[serve]
# launch gradio server
lmdeploy serve gradio internlm/internlm-chat-7b