planning: /chat/completions Documentation and Roadmap #1582

Closed · 2 tasks done
dan-menlo opened this issue Oct 30, 2024 · 3 comments

dan-menlo commented Oct 30, 2024

Goal

  • Our /chat/completions should have parameters similar to OpenAI
    • Check if our implementation is aligned
    • Create planning: roadmap issues for anything not yet supported
    • Goal: communicate that we are committed to OpenAI compatibility

Tasklist

  • Itemize Roadmap issues (once Dan approves, we will create Roadmap issues)
  • Update Swaggerfile

dan-menlo commented:

@nguyenhoangthuan99 - Please transfer https://github.com/janhq/internal/issues/160 to this issue (can be public)

nguyenhoangthuan99 commented Oct 30, 2024

API reference: https://platform.openai.com/docs/api-reference/chat/create

Fields from the /v1/chat/completions API that are not yet supported:

  • store (boolean or null, optional, defaults to false): Whether or not to store the output of this chat completion request for use in model distillation or evals products. To support this, we need an architecture for saving the output of users' chat completion requests (e.g. MinIO for object storage and Postgres for the database).
  • metadata (object or null, optional): Developer-defined tags and values used for filtering completions in the dashboard.
    This also requires logic to save results to the DB so the user can query them later.
  • logit_bias (map, optional, defaults to null): Modify the likelihood of specified tokens appearing in the completion. Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token. → Need to confirm whether llama.cpp supports this; it would be a nice-to-have feature (see the logit_bias sketch after this list).
    Issue: feat: [support logit_bias for OpenAI API compatible] cortex.llamacpp#263
  • logprobs (boolean or null, optional, defaults to false): Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the content of message. This feature is partially supported; we need to update cortex.llamacpp to return logprobs in both stream and non-stream modes (a request sketch follows this list).
    Issue: feat: [support log prob like OpenAI API] cortex.llamacpp#262

The logprobs result should look like this:

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1702685778,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I assist you today?"
      },
      "logprobs": {
        "content": [
          {
            "token": "Hello",
            "logprob": -0.31725305,
            "bytes": [72, 101, 108, 108, 111],
            "top_logprobs": [
              {
                "token": "Hello",
                "logprob": -0.31725305,
                "bytes": [72, 101, 108, 108, 111]
              },
              {
                "token": "Hi",
                "logprob": -1.3190403,
                "bytes": [72, 105]
              }
            ]
          },
          {
            "token": "!",
            "logprob": -0.02380986,
            "bytes": [
              33
            ],
            "top_logprobs": [
              {
                "token": "!",
                "logprob": -0.02380986,
                "bytes": [33]
              },
              {
                "token": " there",
                "logprob": -3.787621,
                "bytes": [32, 116, 104, 101, 114, 101]
              }
            ]
          },
          {
            "token": " How",
            "logprob": -0.000054669687,
            "bytes": [32, 72, 111, 119],
            "top_logprobs": [
              {
                "token": " How",
                "logprob": -0.000054669687,
                "bytes": [32, 72, 111, 119]
              },
              {
                "token": "<|end|>",
                "logprob": -10.953937,
                "bytes": null
              }
            ]
          },
          {
            "token": " can",
            "logprob": -0.015801601,
            "bytes": [32, 99, 97, 110],
            "top_logprobs": [
              {
                "token": " can",
                "logprob": -0.015801601,
                "bytes": [32, 99, 97, 110]
              },
              {
                "token": " may",
                "logprob": -4.161023,
                "bytes": [32, 109, 97, 121]
              }
            ]
          },
          {
            "token": " I",
            "logprob": -3.7697225e-6,
            "bytes": [
              32,
              73
            ],
            "top_logprobs": [
              {
                "token": " I",
                "logprob": -3.7697225e-6,
                "bytes": [32, 73]
              },
              {
                "token": " assist",
                "logprob": -13.596657,
                "bytes": [32, 97, 115, 115, 105, 115, 116]
              }
            ]
          },
          {
            "token": " assist",
            "logprob": -0.04571125,
            "bytes": [32, 97, 115, 115, 105, 115, 116],
            "top_logprobs": [
              {
                "token": " assist",
                "logprob": -0.04571125,
                "bytes": [32, 97, 115, 115, 105, 115, 116]
              },
              {
                "token": " help",
                "logprob": -3.1089056,
                "bytes": [32, 104, 101, 108, 112]
              }
            ]
          },
          {
            "token": " you",
            "logprob": -5.4385737e-6,
            "bytes": [32, 121, 111, 117],
            "top_logprobs": [
              {
                "token": " you",
                "logprob": -5.4385737e-6,
                "bytes": [32, 121, 111, 117]
              },
              {
                "token": " today",
                "logprob": -12.807695,
                "bytes": [32, 116, 111, 100, 97, 121]
              }
            ]
          },
          {
            "token": " today",
            "logprob": -0.0040071653,
            "bytes": [32, 116, 111, 100, 97, 121],
            "top_logprobs": [
              {
                "token": " today",
                "logprob": -0.0040071653,
                "bytes": [32, 116, 111, 100, 97, 121]
              },
              {
                "token": "?",
                "logprob": -5.5247097,
                "bytes": [63]
              }
            ]
          },
          {
            "token": "?",
            "logprob": -0.0008108172,
            "bytes": [63],
            "top_logprobs": [
              {
                "token": "?",
                "logprob": -0.0008108172,
                "bytes": [63]
              },
              {
                "token": "?\n",
                "logprob": -7.184561,
                "bytes": [63, 10]
              }
            ]
          }
        ]
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 9,
    "total_tokens": 18,
    "completion_tokens_details": {
      "reasoning_tokens": 0
    }
  },
  "system_fingerprint": null
}
  • n (integer or null, optional, defaults to 1): How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs. → Need to check whether llama.cpp supports this option (the request sketch after this list also sets n).
    Issue: feat: [support return multiple choices] cortex.llamacpp#264

  • service_tier (string or null, optional, defaults to auto):
    Specifies the latency tier to use for processing the request. This parameter is relevant for customers subscribed to the scale tier service:
    If set to 'auto', and the Project is Scale tier enabled, the system will utilize scale tier credits until they are exhausted.
    If set to 'auto', and the Project is not Scale tier enabled, the request will be processed using the default service tier with a lower uptime SLA and no latency guarantee.
    If set to 'default', the request will be processed using the default service tier with a lower uptime SLA and no latency guarantee.
    When not set, the default behavior is 'auto'.
    When this parameter is set, the response body will include the service_tier utilized.

  • stream_options (object or null, optional, defaults to null): Options for the streaming response. Only set this when you set stream: true. → Need to update cortex.llamacpp to support this (see the streaming sketch after this list).
    Issue: feat: [support stream_option for OpenAI API compatible] cortex.llamacpp#265

  • modalities and audio: reference: https://platform.openai.com/docs/api-reference/chat/create#chat-create-modalities. We need a roadmap item for supporting audio as a modality.

  • user: reference: https://platform.openai.com/docs/api-reference/chat/create#chat-create-user.
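
For reference, here is a minimal sketch of the mechanism logit_bias describes (bias added to the raw logits before sampling). The vocabulary, token IDs, and logit values are made up for illustration; this is not cortex.llamacpp's actual implementation:

import math

def apply_logit_bias(logits, logit_bias):
    # Add each bias to the corresponding token's raw logit before sampling.
    # A bias near -100 effectively bans a token; near +100 forces it.
    biased = dict(logits)
    for token_id, bias in logit_bias.items():
        if token_id in biased:
            biased[token_id] += bias
    return biased

def softmax(logits):
    # Convert logits to a probability distribution (numerically stable).
    m = max(logits.values())
    exps = {t: math.exp(v - m) for t, v in logits.items()}
    total = sum(exps.values())
    return {t: e / total for t, e in exps.items()}

# Made-up token IDs and logits for illustration only.
raw = {15339: 2.1, 13347: 1.8, 9906: 0.4}
probs = softmax(apply_logit_bias(raw, {15339: -1, 13347: -100}))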
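
A sketch of the client-side request the logprobs and n fields enable, using the openai Python client against a local OpenAI-compatible server. The base URL, API key, and model name are placeholders, not confirmed cortex defaults:

from openai import OpenAI

# Placeholder endpoint and model; adjust to your local server.
client = OpenAI(base_url="http://localhost:39281/v1", api_key="unused")

resp = client.chat.completions.create(
    model="llama3.1:8b",
    messages=[{"role": "user", "content": "Hello!"}],
    logprobs=True,    # return per-token log probabilities
    top_logprobs=2,   # include the 2 most likely alternatives per position
    n=1,              # number of choices to generate
)
for item in resp.choices[0].logprobs.content:
    print(item.token, item.logprob)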
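
And a streaming sketch showing where stream_options fits, reusing the client above. With include_usage, OpenAI's documented behavior is that the final chunk carries the token usage and an empty choices list:

stream = client.chat.completions.create(
    model="llama3.1:8b",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
    stream_options={"include_usage": True},
)
for chunk in stream:
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="")
    if chunk.usage:  # populated only on the final chunk
        print()
        print(chunk.usage)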

nguyenhoangthuan99 commented:

The following fields cannot be supported directly in cortex.cpp and need a roadmap item in the enterprise version:

  • store (boolean or null, optional, defaults to false): Whether or not to store the output of this chat completion request for use in model distillation or evals products. To support this, we need an architecture for saving the output of users' chat completion requests, e.g. MinIO for object storage and Postgres for the database (see the persistence sketch after this list).

  • metadata (object or null, optional): Developer-defined tags and values used for filtering completions in the dashboard.
    This also requires logic to save results to the DB so the user can query them later.

  • service_tier (string or null, optional, defaults to auto):
    Specifies the latency tier to use for processing the request. This parameter is relevant for customers subscribed to the scale tier service:
    If set to 'auto', and the Project is Scale tier enabled, the system will utilize scale tier credits until they are exhausted.
    If set to 'auto', and the Project is not Scale tier enabled, the request will be processed using the default service tier with a lower uptime SLA and no latency guarantee.
    If set to 'default', the request will be processed using the default service tier with a lower uptime SLA and no latency guarantee.
    When not set, the default behavior is 'auto'.
    When this parameter is set, the response body will include the service_tier utilized.

  • modalities and audio: reference: https://platform.openai.com/docs/api-reference/chat/create#chat-create-modalities. We need a roadmap item for supporting audio as a modality.

  • user: reference: https://platform.openai.com/docs/api-reference/chat/create#chat-create-user.

  • Linked documentation to this issue in PR Feat/api docs #1589
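
As a rough illustration of the persistence that store and metadata call for, here is a minimal sketch using SQLite in place of the suggested MinIO + Postgres stack. The schema and function name are hypothetical:

import json
import sqlite3

# Hypothetical schema; a production design would split large payloads
# into object storage (MinIO) and keep queryable fields in Postgres.
db = sqlite3.connect("completions.db")
db.execute("""
    CREATE TABLE IF NOT EXISTS completions (
        id       TEXT PRIMARY KEY,
        created  INTEGER,
        request  TEXT,
        response TEXT,
        metadata TEXT
    )
""")

def store_completion(request, response, metadata=None):
    # Persist a completion and its developer-defined metadata tags
    # when the request sets store=true, so it can be queried later.
    db.execute(
        "INSERT INTO completions VALUES (?, ?, ?, ?, ?)",
        (
            response["id"],
            response["created"],
            json.dumps(request),
            json.dumps(response),
            json.dumps(metadata or {}),
        ),
    )
    db.commit()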
