compatible with openai/tgi/vllm request format #275

Closed · wants to merge 7 commits

Conversation

lkk12014402 (Collaborator) commented Jul 4, 2024

Description

  1. Make the completion request format/parameters compatible with OpenAI/TGI/vLLM.
  2. Prioritize default values as OpenAI > TGI > vLLM. With the same defaults, the OPEA service latency is almost the same as a native TGI/vLLM server.
  3. Replace `from langchain_community.llms import VLLMOpenAI` with `from openai import OpenAI`, which reduces initialization time (0.15s -> 0.01s) and initializes the client only once (see the sketch after this list).
  4. Simplify post-processing of OpenAI-format responses (in particular, remove the extra `encode` step).
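
A minimal sketch of items 3 and 4, assuming the TGI/vLLM backend is reachable via a hypothetical `LLM_ENDPOINT` environment variable; the endpoint and model names are placeholders, not the exact values used by the OPEA service:

```python
import os

from openai import OpenAI  # replaces langchain_community.llms.VLLMOpenAI

# Hypothetical env-var name for illustration; the real one lives in the
# OPEA service code. Creating the client once at module import time avoids
# paying the ~0.15s initialization cost on every request.
llm_endpoint = os.getenv("LLM_ENDPOINT", "http://localhost:8080")
client = OpenAI(api_key="EMPTY", base_url=f"{llm_endpoint}/v1")


def stream_completion(prompt: str, **params):
    """Yield text chunks from an OpenAI-format streaming response.

    The chunks arrive as plain strings, so no extra .encode() step is
    needed before forwarding them (item 4 above).
    """
    response = client.completions.create(
        model=params.pop("model", "tgi"),  # placeholder model name
        prompt=prompt,
        stream=True,
        **params,
    )
    for chunk in response:
        text = chunk.choices[0].text
        if text:
            yield text
```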

codecov bot commented Jul 4, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

| Files with missing lines | Coverage Δ |
| --- | --- |
| comps/cores/proto/docarray.py | 100.00% <100.00%> (ø) |

lkk12014402 (Collaborator, Author) commented Jul 4, 2024

OPEA/OpenAI/vLLM/TGI completion request format comparison

| OpenAI (API Reference) | vLLM (vllm/entrypoints/openai/protocol.py) | TGI (GenerateParameters) | OPEA (now) | OPEA (this PR) |
| --- | --- | --- | --- | --- |
| model: Union[str, …] | model: str | | | model: Optional[str] = None  # for openai, not used by tgi |
| prompt: Union[str, …] | prompt: Union[List[int], List[List[int]], str, List[str]] | | | query: str  # alias 'prompt' |
| best_of: Optional[int] (defaults to 1) | best_of: Optional[int] = None | default = "null" | | best_of: Optional[int] = 1 |
| echo: Optional[bool] (defaults to false) | echo: Optional[bool] = False | | | echo: Optional[bool] = False |
| frequency_penalty: Optional[float] (defaults to 0) | frequency_penalty: Optional[float] = 0.0 | default = "null" | frequency_penalty: Optional[float] = 0.0 | frequency_penalty: Optional[float] = 0.0 |
| logit_bias: Optional[Dict[str, int]] (defaults to null) | logit_bias: Optional[Dict[str, float]] = None | | | logit_bias: Optional[Dict[str, float]] = None |
| logprobs: Optional[int] (defaults to null) | logprobs: Optional[int] = None | | | logprobs: Optional[int] = None |
| max_tokens: Optional[int] (defaults to 16) | max_tokens: Optional[int] = 16 | default = "100" | max_tokens: Optional[int] = 1024 | max_new_tokens: Optional[int] = 16  # alias 'max_tokens' |
| n: Optional[int] (defaults to 1) | n: int = 1 | | | n: Optional[int] = 1 |
| presence_penalty: Optional[float] (defaults to 0) | presence_penalty: Optional[float] = 0.0 | | | presence_penalty: Optional[float] = 0.0 |
| seed: Optional[int] (defaults to null) | seed: Optional[int] = Field(None, …) | default = "null" | | seed: Optional[int] = None |
| stop: Union[Optional[str], …] (defaults to null) | stop: Optional[Union[str, List[str]]] = Field(default_factory=list) | stop: Vec, default = [] | | stop: Union[Optional[str], List[str], None] = None |
| stream: Optional[Literal[False]] (defaults to false) | stream: Optional[bool] = False | | stream: Optional[bool] = False | streaming: Optional[bool] = False  # alias 'stream' |
| stream_options: Optional[…] (defaults to null) | stream_options: Optional[StreamOptions] = None | | | stream_options: Optional[StreamOptions] = None |
| suffix: Optional[str] (defaults to null) | suffix: Optional[str] = None | | | suffix: Optional[str] = None |
| temperature: Optional[float] (defaults to 1) | temperature: Optional[float] = 1.0 | default = "null" | temperature: Optional[float] = 0.01 | temperature: Optional[float] = 1.0 |
| top_p: Optional[float] (defaults to 1) | top_p: Optional[float] = 1.0 | default = "null" | top_p: Optional[float] = 0.95 | top_p: Optional[float] = 1.0 |
| user: str (defaults to null) | user: Optional[str] = None | | | user: Optional[str] = None |
| | more parameters for vLLM… | | | |
| | | top_k, default = "null" | top_k: Optional[int] = 10 | top_k: Optional[int] = None |
| | | typical_p, default = "null" | typical_p: float = 0.95 | typical_p: Optional[float] = None |
| | | repetition_penalty, default = "null" | repetition_penalty: Optional[float] = 1.03 | repetition_penalty: Optional[float] = None |
| | | do_sample: bool, default = "false" | | |
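
For illustration only, here is one way the "OPEA (this PR)" column could be expressed with pydantic field aliases, so a request may use either the OPEA name (`query`, `max_new_tokens`, `streaming`) or the OpenAI name (`prompt`, `max_tokens`, `stream`); the class below is a hand-written sketch in pydantic v2 syntax, not the actual definition in comps/cores/proto/docarray.py:

```python
from typing import List, Optional, Union

from pydantic import BaseModel, Field


class CompletionRequest(BaseModel):
    # Illustrative subset of the table above; the real model lives in
    # comps/cores/proto/docarray.py. Aliases let clients send either the
    # OPEA field name or its OpenAI equivalent.
    model: Optional[str] = None  # used by OpenAI/vLLM, ignored by TGI
    query: str = Field(alias="prompt")
    max_new_tokens: Optional[int] = Field(default=16, alias="max_tokens")
    streaming: Optional[bool] = Field(default=False, alias="stream")
    temperature: Optional[float] = 1.0
    top_p: Optional[float] = 1.0
    stop: Optional[Union[str, List[str]]] = None

    # pydantic v2: accept both the field name and its alias on input.
    model_config = {"populate_by_name": True}


# Either spelling produces the same request object:
req = CompletionRequest(prompt="Hello", max_tokens=32, stream=True)
assert req.query == "Hello" and req.max_new_tokens == 32 and req.streaming
```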

lkk12014402 (Collaborator, Author) commented

Implement it in another PR.

lkk12014402 closed this on Sep 5, 2024
lkk12014402 pushed a commit that referenced this pull request Sep 19, 2024
* add comment

Signed-off-by: Sun, Xuehao <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

Signed-off-by: Sun, Xuehao <[email protected]>

* remove test

Signed-off-by: Sun, Xuehao <[email protected]>

* Update message

Signed-off-by: Sun, Xuehao <[email protected]>

* update message

Signed-off-by: Sun, Xuehao <[email protected]>

* Add dependency review

Signed-off-by: Sun, Xuehao <[email protected]>

---------

Signed-off-by: Sun, Xuehao <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>