Llm gateway rework #3

Merged
merged 18 commits into from
Jan 15, 2025
Changes from 17 commits
16 changes: 16 additions & 0 deletions .envdefault
@@ -24,3 +24,19 @@ LINTO_FRONT_THEME=LinTO-green
ORGANIZATION_DEFAULT_PERMISSIONS=upload,summary,session
[email protected]
SUPER_ADMIN_PWD=superadmin

# OpenAI
OPENAI_API_TOKEN=sk***
OPENAI_API_BASE=***

ORGANIZATION_DEFAULT_PERMISSIONS=upload,summary,session
[email protected]
SUPER_ADMIN_PWD=superadmin

# OpenAI
OPENAI_API_TOKEN=sk***
OPENAI_API_BASE=***

ORGANIZATION_DEFAULT_PERMISSIONS=upload,summary,session
[email protected]
SUPER_ADMIN_PWD=superadminpassword
2 changes: 2 additions & 0 deletions .gitignore
@@ -1 +1,3 @@
**/running/*.yaml
websocket.pcap
.env
45 changes: 45 additions & 0 deletions conf-templates/llm/.hydra-conf/config.yaml
@@ -0,0 +1,45 @@
defaults :
- _self_
- services :
- en
- fr

prompt_path: ./prompts/
backend_defaults :
name: null
modelName: null
totalContextLength: null
maxGenerationLength: null
tokenizerClass: null
createNewTurnAfter: null
summaryTurns: null
maxNewTurns: null
temperature: null
top_p: null
reduceSummary: null
consolidateSummary: null
service_name: ${oc.env:SERVICE_NAME,LLM_Gateway}

api_params:
api_base: ${oc.env:OPENAI_API_BASE,http://localhost:9000/v1}
api_key: ${oc.env:OPENAI_API_TOKEN,EMPTY}
max_retries: ${oc.decode:${oc.env:MAX_RETRIES,6}}
max_retry_delay: ${oc.decode:${oc.env:MAX_RETRY_DELAY,10}}
service_port: ${oc.decode:${oc.env:HTTP_PORT,8000}}
workers: ${oc.decode:${oc.env:CONCURRENCY,1}}
timeout: ${oc.decode:${oc.env:TIMEOUT,60}}
ws_polling_interval: ${oc.decode:${oc.env:WS_POLLING_INTERVAL,3}}

semaphore:
max_concurrent_inferences: ${oc.decode:${oc.env:MAX_CONCURRENT_INFERENCES,3}}

swagger:
url: ${oc.env:SWAGGER_URL,/docs}
title: ${oc.env:SWAGGER_TITLE,STT API Documentation}
description: ${oc.env:SWAGGER_DESCRIPTION,API to make summary of text using LLMs.}

services_broker:
url: ${oc.env:SERVICES_BROKER,redis://localhost:6379}
password: ${oc.env:BROKER_PASS,EMPTY}

debug: false
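The `${oc.env:NAME,default}` entries in this config read an environment variable and fall back to the listed default. As a rough shell analogue, the effective values can be previewed before starting the gateway. This is only a sketch: `effective` is a hypothetical helper that is not part of the PR, and it assumes the fallback applies only when the variable is unset (not when it is set to an empty string).

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the PR: print NAME=value using the
# variable's current value, or the config.yaml default when it is unset.
effective() {
    local name="$1" default="$2" value
    # Builds and evaluates value=${NAME-$default}; the "-" form falls back
    # only when NAME is unset, so an empty but set variable is kept as-is.
    eval "value=\${$name-\$default}"
    printf '%s=%s\n' "$name" "$value"
}

# Defaults copied from config.yaml above.
effective OPENAI_API_BASE "http://localhost:9000/v1"
effective OPENAI_API_TOKEN "EMPTY"
effective HTTP_PORT 8000
effective MAX_CONCURRENT_INFERENCES 3
```

Running it with and without the variables exported shows which settings would come from `.env` and which from the template defaults.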
21 changes: 21 additions & 0 deletions conf-templates/llm/.hydra-conf/services/en.yaml
@@ -0,0 +1,21 @@
en:
type: summary
fields: 2
name: summarize-en
route: summarize/en
description:
fr: English summary
backend: vLLM
flavor:
- name: llama
modelName: meta-llama-31-8b-it
totalContextLength: 128000
maxGenerationLength: 2048
tokenizerClass: LlamaTokenizer
createNewTurnAfter: 250
summaryTurns: 3
maxNewTurns: 9
temperature: 0.2
top_p: 0.7
reduceSummary: false
consolidateSummary: false
21 changes: 21 additions & 0 deletions conf-templates/llm/.hydra-conf/services/fr.yaml
@@ -0,0 +1,21 @@
fr:
type: summary
fields: 2
name: summarize-fr
route: summarize/fr
description:
fr: Résumé français
backend: vLLM
flavor:
- name: llama
modelName: meta-llama-31-8b-it
totalContextLength: 128000
maxGenerationLength: 2048
tokenizerClass: LlamaTokenizer
createNewTurnAfter: 250
summaryTurns: 3
maxNewTurns: 9
temperature: 0.2
top_p: 0.7
reduceSummary: false
consolidateSummary: false
16 changes: 16 additions & 0 deletions conf-templates/llm/prompts/summarize-en.txt
@@ -0,0 +1,16 @@
You must summarize a transcript following these guidelines:
Always use standard spelling conventions.
Rely strictly on the text to be processed without including external information.
Remove the mention of the speaker followed by ":" in the summary.
Explain the content without using the first-person narrative.
Never write anything other than the summary of the processed speech turns, do not provide information about the reduction and processing carried out, never present the summarized text out of context (no "Here is the summary of the speech turns:").
Never include in the summary any statements from the speech turns summarized so far.
The speech turns can be in any language and must be translated into English.

### Speech turns summarized so far (do not repeat or summarize again)
{}

### Speech turns to process
{}

### Speech turns summarized (in English)
@@ -1,15 +1,15 @@
Vous devez résumer une transcription en suivant les directives suivantes :
Toujours utiliser les conventions orthographiques standard du français.
S'appuyer strictement sur le texte à traiter sans inclure d'informations externes.
Enlever la mention du locuteur suivie de ":" dans le résumé.
Expliquer le propos sans reprendre le tour de parole à la première personne.
Ne jamais rien écrire d'autre que le résumé des tours de parole traités, ne pas donner d'informations sur la réduction et les traitements réalisés, ne jamais présenter le texte résumé en sortant du contexte (pas de "Voici le résumé des tours de parole : ").
Ne jamais inclure dans le résumé des propos issus des tours de paroles résumé jusque là.

### Tours de parole résumés jusque là (ne surtout pas répéter ou résumer à nouveau)
{}

### Tours de parole à traiter
{}

Vous devez résumer une transcription en suivant les directives suivantes :
Toujours utiliser les conventions orthographiques standard du français.
S'appuyer strictement sur le texte à traiter sans inclure d'informations externes.
Enlever la mention du locuteur suivie de ":" dans le résumé.
Expliquer le propos sans reprendre le tour de parole à la première personne.
Ne jamais rien écrire d'autre que le résumé des tours de parole traités, ne pas donner d'informations sur la réduction et les traitements réalisés, ne jamais présenter le texte résumé en sortant du contexte (pas de "Voici le résumé des tours de parole : ").
Ne jamais inclure dans le résumé des propos issus des tours de paroles résumé jusque là.
### Tours de parole résumés jusque là (ne surtout pas répéter ou résumer à nouveau)
{}
### Tours de parole à traiter
{}
### Tours de parole résumés (en français)
23 changes: 0 additions & 23 deletions conf-templates/llm/summary.json

This file was deleted.

41 changes: 38 additions & 3 deletions scripts/build-config.sh
@@ -52,9 +52,9 @@ build_stt() {
build_llm() {
echo "Building LLM..."

mkdir -p "${LINTO_SHARED_MOUNT}/llm_services/" \
${LINTO_SHARED_MOUNT}/models/
cp -r "${CONFIG_TEMPLATES}/llm/"* "${LINTO_SHARED_MOUNT}/llm_services/"
mkdir -p ${LINTO_SHARED_MOUNT}/models/

cp -r "${CONFIG_TEMPLATES}/llm" "${LINTO_SHARED_MOUNT}"

create_networks "net_llm_services"
}
@@ -115,6 +115,41 @@ build_session() {
mkdir -p ${LINTO_LOCAL_MOUNT}/database/postgres/db-session-database/
}

build_khaldi-french-streaming() {
echo "Building Live streaming..."
TARGET_FOLDER="${LINTO_SHARED_MOUNT}/models/AMs/french"

if [ ! -d "$TARGET_FOLDER" ]; then
ZIP_URL="https://dl.linto.ai/downloads/model-distribution/acoustic-models/fr-FR/linSTT_AM_fr-FR_v2.2.0.zip"
ZIP_FILE="${TARGET_FOLDER}/linSTT_AM_fr-FR_v2.2.0.zip"

echo "Creating target folder: $TARGET_FOLDER"
mkdir -p "$TARGET_FOLDER"
curl -L -o "$ZIP_FILE" "$ZIP_URL"
unzip -o "$ZIP_FILE" -d "$TARGET_FOLDER"
rm "$ZIP_FILE"
fi

TARGET_FOLDER="${LINTO_SHARED_MOUNT}/models/LMs/french"

if [ ! -d "$TARGET_FOLDER" ]; then
ZIP_URL="https://dl.linto.ai/downloads/model-distribution/decoding-graphs/LVCSR/fr-FR/decoding_graph_fr-FR_Big_v2.2.0.zip"
ZIP_FILE="${TARGET_FOLDER}/linSTT_AM_fr-FR_v2.2.0.zip"
echo "Creating target folder: $TARGET_FOLDER"
mkdir -p "$TARGET_FOLDER"
curl -L -o "$ZIP_FILE" "$ZIP_URL"
unzip -o "$ZIP_FILE" -d "$TARGET_FOLDER"
rm "$ZIP_FILE"
fi
}
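
The two download blocks above follow the same pattern: skip if the target folder exists, otherwise create it, download the archive, unzip it, and remove the archive. One possible way to factor that out is sketched below. `fetch_model` is a hypothetical helper that does not appear in the PR; it names the temporary archive after the last path component of the URL, which is an assumption of this sketch rather than the behaviour of the script.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the PR: download and unpack a model
# archive only when the target folder does not exist yet.
fetch_model() {
    local url="$1" target="$2"
    # Name the temporary archive after the URL's last path component.
    local zip_file="${target}/${url##*/}"

    if [ -d "$target" ]; then
        echo "Skipping, already present: $target"
        return 0
    fi

    echo "Creating target folder: $target"
    mkdir -p "$target" || return 1
    # -f makes curl fail on HTTP errors instead of saving an error page.
    curl -fL -o "$zip_file" "$url" || return 1
    unzip -o "$zip_file" -d "$target" || return 1
    rm "$zip_file"
}
```

With such a helper, each model would need a single call, for example `fetch_model "$ZIP_URL" "$TARGET_FOLDER"`.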

build_whisper-streaming() {
echo "Building whisper..."

mkdir -p ${LINTO_SHARED_MOUNT}/audios/api_uploads \
${LINTO_SHARED_MOUNT}/models/
}

build_kaldi-french-streaming() {
echo "Building Live streaming..."
TARGET_FOLDER="${LINTO_SHARED_MOUNT}/models/AMs/french"
10 changes: 8 additions & 2 deletions scripts/build-services.sh
@@ -56,6 +56,8 @@ generate_yaml_files() {
-V DIARIZATION_DEFAULT=$diarization_service \
-V GPU_MODE=$gpu_mode \
-V ENABLE_SESSION_STUDIO=$enable_session_studio \
-V OPENAI_API_BASE=$OPENAI_API_BASE \
-V OPENAI_API_TOKEN=$OPENAI_API_TOKEN \
"${service_dir}/template.jsonnet" | yq eval -P - >"$RUNNING_DIR/$FILE_NAME.yaml"
fi
}
@@ -69,7 +71,10 @@ build_main_service() {
build_llm() {
echo "Building LLM..."
generate_yaml_files "services/llm/llm-gateway" $1 $2
generate_yaml_files "services/llm/vllm"
generate_yaml_files "services/stt/task-broker-redis"
if [ "$3" = "true" ]; then
generate_yaml_files "services/llm/vllm"
fi
}

build_studio() {
@@ -160,6 +165,7 @@ main() {
gpu_enable="${6:-false}"
diarization_enable="${7:-false}"
speaker_identification="${8:-false}"
vllm_enable="${9:-false}"

case "$1" in
stt-fr)
@@ -172,7 +178,7 @@ main() {
build_diarization $gpu_enable $speaker_identification
;;
llm)
build_llm $traefik_exposed $gateway_exposed
build_llm $traefik_exposed $gateway_exposed $vllm_enable
;;
studio)
# Special rule for studio: param 4 carries the live-streaming information
33 changes: 33 additions & 0 deletions scripts/dialog.sh
@@ -111,6 +111,33 @@ dialog_gpu_mode() {
fi
}

streaming_service() {
selected_streaming_services=$(dialog --title "Streaming Services" --checklist \
"Streaming service selection?" "$DIALOG_HEIGHT" "$DIALOG_WIDTH" 2 \
1 "Linto french kaldi streaming service" off \
2 "Linto whisper streaming service" off \
3>&1 1>&2 2>&3)

echo "$selected_streaming_services"
}
dialog_vllm() {
vllm=$(dialog --title "vLLM Backend deployment" --radiolist \
"Do you want to deploy the vLLM service?" "$DIALOG_HEIGHT" "$DIALOG_WIDTH" 2 \
1 "Yes" off \
2 "No" off \
3>&1 1>&2 2>&3)

case "$vllm" in
1)
vllm_enable="true"
;;
2)
vllm_enable="false"
;;
esac
echo "$vllm_enable"
}
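
Neither radiolist entry in `dialog_vllm` is preselected, so `$vllm` may come back empty (for instance if the dialog is cancelled), in which case no `case` branch matches and the function echoes an empty string. A possible variant of the mapping that always yields `true` or `false` is sketched below. `vllm_choice_to_bool` is a hypothetical name, not part of the PR.

```shell
#!/usr/bin/env bash
# Hypothetical variant of the mapping in dialog_vllm: only the answer "1"
# enables vLLM; anything else, including an empty answer, disables it.
vllm_choice_to_bool() {
    case "$1" in
        1) echo "true" ;;
        *) echo "false" ;;
    esac
}

vllm_choice_to_bool 1    # selected "Yes"
vllm_choice_to_bool ""   # dialog returned nothing
```

Defaulting to `false` matches the `vllm_enable=false` initial value used in `setup-services.sh`.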

streaming_service() {
selected_streaming_services=$(dialog --title "Streaming Services" --checklist \
"Streaming service selection?" "$DIALOG_HEIGHT" "$DIALOG_WIDTH" 2 \
@@ -144,6 +171,12 @@ main() {
streaming_service)
streaming_service
;;
vllm)
dialog_vllm
;;
streaming_service)
streaming_service
;;
*)
echo "Usage: $0 {expose|transcription|deployment|gpu|domain|speaker_identification|streaming_service}"
exit 1
10 changes: 9 additions & 1 deletion scripts/setup-services.sh
@@ -82,18 +82,23 @@ trigger_build_service() {

#TODO: we expose to the gateway when studio is selected
gpu_enable=false
vllm_enable=false
diarization_enable=""
live_streaming_enable=false
speaker_identification="false"
if [[ "$services" =~ (^|[[:space:]])3($|[[:space:]]) && "$services" =~ (^|[[:space:]])(1|2)($|[[:space:]]) ]]; then
speaker_identification=$(./scripts/dialog.sh "speaker_identification")


if [[ "$speaker_identification" == "true" ]]; then
diarization_enable="stt-diarization-pyannote-qdrant"
else
diarization_enable="stt-diarization-pyannote"
fi
fi
if [[ "$services" =~ (^|[[:space:]])3($|[[:space:]]) ]]; then
diarization_enable="stt-diarization-pyannote"
fi
if [[ "$services" =~ (^|[[:space:]])6($|[[:space:]]) ]]; then
echo "Studio is selected, forcing API Gateway"
expose_api_gateway=true
@@ -102,6 +107,9 @@ trigger_build_service() {
echo "Studio is selected, forcing API Gateway"
live_streaming_enable=true
fi
if [[ "$services" =~ (^|[[:space:]])4($|[[:space:]]) ]]; then
vllm_enable=$(./scripts/dialog.sh "vllm")
fi

./scripts/build-services.sh "main" "$LINTO_DOMAIN" "$DEPLOYMENT_MODE"

@@ -136,7 +144,7 @@ trigger_build_service() {

4)
./scripts/build-config.sh "llm"
./scripts/build-services.sh "llm" "$LINTO_DOMAIN" "$DEPLOYMENT_MODE" "$expose_traefik" "$expose_api_gateway"
./scripts/build-services.sh "llm" "$LINTO_DOMAIN" "$DEPLOYMENT_MODE" "$expose_traefik" "$expose_api_gateway" "" "" "" "$vllm_enable"
;;
5)

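The updated call for service `4` passes three empty strings so that `$vllm_enable` lands in the ninth position, which `build-services.sh` reads with `"${9:-false}"`. The sketch below illustrates why the empty placeholders are harmless: the `:-` form substitutes the default for arguments that are unset or empty. `parse_args_sketch` is a hypothetical function, and the argument values in the example call are placeholders, not values from the PR.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of the positional defaults used in
# build-services.sh; variable names mirror the diff.
parse_args_sketch() {
    local gpu_enable="${6:-false}"
    local diarization_enable="${7:-false}"
    local speaker_identification="${8:-false}"
    local vllm_enable="${9:-false}"
    echo "gpu=$gpu_enable diarization=$diarization_enable speaker_id=$speaker_identification vllm=$vllm_enable"
}

# Positions 6 to 8 are empty placeholders, position 9 carries vllm_enable.
parse_args_sketch llm example.org swarm true true "" "" "" true
# prints: gpu=false diarization=false speaker_id=false vllm=true
```

If positional arguments keep growing, named options would be easier to maintain, but that would be a larger change than this PR makes.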