support multiple env-var for ENDPOINT in GMC #166

Closed · irisdingbj opened this issue Jul 11, 2024 · 4 comments
irisdingbj (Collaborator) commented Jul 11, 2024:

GMC should support multiple env vars for ENDPOINT, e.g.:

TEI_EMBEDDING_ENDPOINT
TEI_RERANKING_ENDPOINT
TGI_LLM_ENDPOINT

A sample GMC yaml with switch support where this is needed:
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

apiVersion: gmc.opea.io/v1alpha3
kind: GMConnector
metadata:
  labels:
    app.kubernetes.io/name: gmconnector
    app.kubernetes.io/managed-by: kustomize
    gmc/platform: xeon
  name: switch
  namespace: switch
spec:
  routerConfig:
    name: router
    serviceName: router-service
  nodes:
    root:
      routerType: Sequence
      steps:
      - name: Embedding
        nodeName: node1
      - name: Retriever
        data: $response
        internalService:
          serviceName: retriever-svc
          config:
            endpoint: /v1/retrieval
      - name: VectorDB
        internalService:
          serviceName: redis-vector-db
          isDownstreamService: true
      - name: Reranking
        data: $response
        internalService:
          serviceName: reranking-svc
          config:
            endpoint: /v1/reranking
      - name: TeiReranking
        internalService:
          serviceName: tei-reranking-svc
          config:
            endpoint: /rerank
          isDownstreamService: true
      - name: Llm
        nodeName: node2
    node1:
      routerType: Switch
      steps:
        - name: Embedding
          condition: embedding-model-id==large
          internalService:
            serviceName: embedding-svc-large
            config:
              endpoint: /v1/embeddings
        - name: TeiEmbedding
          condition: embedding-model-id==large
          internalService:
            serviceName: tei-embedding-svc-bge15
            config:
              EMBEDDING_MODEL_ID: BAAI/bge-base-en-v1.5
            isDownstreamService: true
        - name: Embedding
          condition: embedding-model-id==small
          internalService:
            serviceName: embedding-svc-small
            config:
              endpoint: /v1/embeddings
        - name: TeiEmbedding
          condition: embedding-model-id==small
          internalService:
            serviceName: tei-embedding-svc-bge-small
            config:
              EMBEDDING_MODEL_ID: BAAI/bge-small-en-v1.5
            isDownstreamService: true
    node2:
      routerType: Switch
      steps:
        - name: Llm
          condition: model_id==intel
          data: $response
          internalService:
            serviceName: llm-svc-intel
            config:
              endpoint: /v1/chat/completions
        - name: Tgi
          condition: model_id==intel
          internalService:
            serviceName: tgi-service-intel
            config:
              endpoint: /generate
              LLM_MODEL_ID: Intel/neural-chat-7b-v3-3
            isDownstreamService: true
        - name: Llm
          condition: model_id==llama
          data: $response
          internalService:
            serviceName: llm-svc-llama
            config:
              endpoint: /v1/chat/completions
        - name: Tgi
          condition: model_id==llama
          internalService:
            serviceName: tgi-service-llama
            config:
              endpoint: /generate
              LLM_MODEL_ID: openlm-research/open_llama_3b
            isDownstreamService: true
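
Under this spec, node1's two Embedding services each need to reach a different TEI backend, and node2's two Llm services a different TGI backend, which a single shared endpoint variable cannot express. Roughly, the per-deployment values would have to be (service names and ports taken from the kubectl output in the next comment):

# for embedding-svc-large (bge-base backend)
TEI_EMBEDDING_ENDPOINT: http://tei-embedding-svc-bge15.switch.svc.cluster.local:6006
# for embedding-svc-small (bge-small backend)
TEI_EMBEDDING_ENDPOINT: http://tei-embedding-svc-bge-small.switch.svc.cluster.local:6006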
irisdingbj (Collaborator, Author) commented:
kubectl get svc -n switch
NAME                          TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
embedding-svc                 ClusterIP   10.96.212.67    <none>        6000/TCP            11m
embedding-svc-large           ClusterIP   10.96.214.151   <none>        6000/TCP            11m
embedding-svc-small           ClusterIP   10.96.39.68     <none>        6000/TCP            11m
llm-svc                       ClusterIP   10.96.236.0     <none>        9000/TCP            11m
llm-svc-intel                 ClusterIP   10.96.10.72     <none>        9000/TCP            11m
llm-svc-llama                 ClusterIP   10.96.176.105   <none>        9000/TCP            11m
redis-vector-db               ClusterIP   10.96.190.159   <none>        6379/TCP,8001/TCP   11m
reranking-svc                 ClusterIP   10.96.224.112   <none>        8000/TCP            11m
retriever-svc                 ClusterIP   10.96.173.137   <none>        7000/TCP            11m
router-service                ClusterIP   10.96.185.113   <none>        8080/TCP            11m
tei-embedding-svc-bge-small   ClusterIP   10.96.236.40    <none>        6006/TCP            11m
tei-embedding-svc-bge15       ClusterIP   10.96.57.9      <none>        6006/TCP            11m
tei-reranking-svc             ClusterIP   10.96.34.209    <none>        8808/TCP            11m
tgi-service-intel             ClusterIP   10.96.187.199   <none>        9009/TCP            11m
tgi-service-llama             ClusterIP   10.96.41.154    <none>        9009/TCP            11m

Current env vars:

  EMBEDDING_MODEL_ID: BAAI/bge-base-en-v1.5
  EMBEDDING_SERVICE_HOST_IP: embedding-svc
  HUGGINGFACEHUB_API_TOKEN: <redacted>
  INDEX_NAME: rag-redis
  LLM_MODEL_ID: Intel/neural-chat-7b-v3-3
  LLM_SERVICE_HOST_IP: llm-svc
  REDIS_URL: redis://redis-vector-db.switch.svc.cluster.local:6379
  RERANK_MODEL_ID: BAAI/bge-reranker-large
  RERANK_SERVICE_HOST_IP: reranking-svc
  RETRIEVER_SERVICE_HOST_IP: retriever-svc
  TEI_EMBEDDING_ENDPOINT: http://tei-embedding-svc-bge-small.switch.svc.cluster.local:6006
  TEI_RERANKING_ENDPOINT: http://tei-reranking-svc.switch.svc.cluster.local:8808
  TGI_LLM_ENDPOINT: http://tgi-service-llama.switch.svc.cluster.local:9009
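
Note the clash on the LLM side: the shared TGI_LLM_ENDPOINT above is pinned to tgi-service-llama, so the model_id==intel path would reach the wrong backend. What llm-svc-intel needs instead is roughly:

TGI_LLM_ENDPOINT: http://tgi-service-intel.switch.svc.cluster.local:9009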

KfreeZ (Collaborator) commented Jul 12, 2024:

The solution is to overwrite and inject the env vars into every deployment according to its specific config.
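
For illustration, a minimal sketch of what such an injection could look like as a Deployment env override (the container name and the exact patch mechanism are assumptions; the value comes from the service list above):

# hypothetical env override merged into the llm-svc-intel Deployment
spec:
  template:
    spec:
      containers:
        - name: llm-svc-intel   # container name assumed
          env:
            - name: TGI_LLM_ENDPOINT
              value: http://tgi-service-intel.switch.svc.cluster.local:9009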

zhlsunshine (Collaborator) commented:

Hi @irisdingbj, it's unnecessary to set conditions for some components, such as TeiEmbedding and Tgi, because they are determined by the Embedding and Llm steps (a downstream service is invoked by its parent step through the injected endpoint env var rather than routed directly, so the parent's condition already selects the right pair). Does that make sense? If so, the YAML should be as below:

# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

apiVersion: gmc.opea.io/v1alpha3
kind: GMConnector
metadata:
  labels:
    app.kubernetes.io/name: gmconnector
    app.kubernetes.io/managed-by: kustomize
    gmc/platform: xeon
  name: switch
  namespace: switch
spec:
  routerConfig:
    name: router
    serviceName: router-service
  nodes:
    root:
      routerType: Sequence
      steps:
      - name: Embedding
        nodeName: node1
      - name: Retriever
        data: $response
        internalService:
          serviceName: retriever-svc
          config:
            endpoint: /v1/retrieval
      - name: VectorDB
        internalService:
          serviceName: redis-vector-db
          isDownstreamService: true
      - name: Reranking
        data: $response
        internalService:
          serviceName: reranking-svc
          config:
            endpoint: /v1/reranking
      - name: TeiReranking
        internalService:
          serviceName: tei-reranking-svc
          config:
            endpoint: /rerank
          isDownstreamService: true
      - name: Llm
        nodeName: node2
    node1:
      routerType: Switch
      steps:
        - name: Embedding
          condition: embedding-model-id==large
          internalService:
            serviceName: embedding-svc-large
            config:
              endpoint: /v1/embeddings
        - name: TeiEmbedding
          internalService:
            serviceName: tei-embedding-svc-bge15
            config:
              EMBEDDING_MODEL_ID: BAAI/bge-base-en-v1.5
            isDownstreamService: true
        - name: Embedding
          condition: embedding-model-id==small
          internalService:
            serviceName: embedding-svc-small
            config:
              endpoint: /v1/embeddings
        - name: TeiEmbedding
          internalService:
            serviceName: tei-embedding-svc-bge-small
            config:
              EMBEDDING_MODEL_ID: BAAI/bge-small-en-v1.5
            isDownstreamService: true
    node2:
      routerType: Switch
      steps:
        - name: Llm
          condition: model_id==intel
          data: $response
          internalService:
            serviceName: llm-svc-intel
            config:
              endpoint: /v1/chat/completions
        - name: Tgi
          internalService:
            serviceName: tgi-service-intel
            config:
              endpoint: /generate
              LLM_MODEL_ID: Intel/neural-chat-7b-v3-3
            isDownstreamService: true
        - name: Llm
          condition: model_id==llama
          data: $response
          internalService:
            serviceName: llm-svc-llama
            config:
              endpoint: /v1/chat/completions
        - name: Tgi
          internalService:
            serviceName: tgi-service-llama
            config:
              endpoint: /generate
              LLM_MODEL_ID: openlm-research/open_llama_3b
            isDownstreamService: true

zhlsunshine (Collaborator) commented:

Hi all, please refer to #206 for the final YAML files for multiple env-var endpoint support and switch support.
