You can start a local server that serves models on your local computer with the nexa server command. Here's the usage syntax:
usage: nexa server [-h] [--host HOST] [--port PORT] [--reload] model_path
Options:
-lp, --local_path: Indicate that the model path provided is the local path
-mt, --model_type: Indicate the model running type; must be used with -lp, -hf, or -ms; choose from [NLP, COMPUTER_VISION, MULTIMODAL, AUDIO]
-hf, --huggingface: Load model from Hugging Face Hub
-ms, --modelscope: Load model from ModelScope Hub
--host: Host to bind the server to
--port: Port to bind the server to
--reload: Enable automatic reloading on code changes
--nctx: Maximum context length of the model you're using
Example commands:
nexa server gemma
nexa server llama2-function-calling
nexa server sd1-5
nexa server faster-whisper-large
nexa server ../models/llava-v1.6-vicuna-7b/ -lp -mt MULTIMODAL
By default, nexa server will run gguf models. To run onnx models, simply add onnx after nexa server.
Generates text based on a single prompt.
Request body:
{
"prompt": "Tell me a story",
"temperature": 1,
"max_new_tokens": 128,
"top_k": 50,
"top_p": 1,
"stop_words": ["string"],
"stream": false
}
Example response:
{
"result": "Once upon a time, in a small village nestled among rolling hills..."
}
Handles chat completions with support for conversation history.
Update: Now supports multimodal inputs when using Multimodal models.
Multimodal models (VLM):
{
"model": "anything",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "What’s in this image?"
},
{
"type": "image_url",
"image_url": {
"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
}
}
]
}
],
"max_tokens": 300,
"temperature": 0.7,
"top_p": 0.95,
"top_k": 40,
"stream": false
}
Traditional NLP models:
{
"messages": [
{
"role": "user",
"content": "Tell me a story"
}
],
"max_tokens": 128,
"temperature": 0.1,
"stream": false,
"stop_words": []
}
Example response:
{
"id": "f83502df-7f5a-4825-a922-f5cece4081de",
"object": "chat.completion",
"created": 1723441724.914671,
"choices": [
{
"message": {
"role": "assistant",
"content": "In the heart of a mystical forest..."
}
}
]
}
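As a sketch, here is the traditional NLP request made from Python; the /v1/chat/completions path and the localhost:8000 default are assumptions to adjust for your setup.

import requests

BASE_URL = "http://localhost:8000"  # assumed default host/port

payload = {
    "messages": [{"role": "user", "content": "Tell me a story"}],
    "max_tokens": 128,
    "temperature": 0.1,
    "stream": False,
    "stop_words": [],
}

# POST the conversation; the reply follows the chat.completion shape above.
resp = requests.post(f"{BASE_URL}/v1/chat/completions", json=payload)  # assumed path
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])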
Calls the most appropriate function based on the user's prompt.
Request body:
{
"messages": [
{
"role": "user",
"content": "Extract Jason is 25 years old"
}
],
"tools": [
{
"type": "function",
"function": {
"name": "UserDetail",
"parameters": {
"properties": {
"name": {
"description": "The user's name",
"type": "string"
},
"age": {
"description": "The user's age",
"type": "integer"
}
},
"required": ["name", "age"],
"type": "object"
}
}
}
],
"tool_choice": "auto"
}
Function definition format:
{
"type": "function",
"function": {
"name": "function_name",
"description": "function_description",
"parameters": {
"type": "object",
"properties": {
"property_name": {
"type": "string | number | boolean | object | array",
"description": "string"
}
},
"required": ["array_of_required_property_names"]
}
}
}
Example response:
{
"id": "chatcmpl-7a9b0dfb-878f-4f75-8dc7-24177081c1d0",
"object": "chat.completion",
"created": 1724186442,
"model": "/home/ubuntu/.cache/nexa/hub/official/Llama2-7b-function-calling/q3_K_M.gguf",
"choices": [
{
"finish_reason": "tool_calls",
"index": 0,
"logprobs": null,
"message": {
"role": "assistant",
"content": null,
"tool_calls": [
{
"id": "call__0_UserDetail_cmpl-8d5cf645-7f35-4af2-a554-2ccea1a67bdd",
"type": "function",
"function": {
"name": "UserDetail",
"arguments": "{ \"name\": \"Jason\", \"age\": 25 }"
}
}
],
"function_call": {
"name": "",
"arguments": "{ \"name\": \"Jason\", \"age\": 25 }"
}
}
}
],
"usage": {
"completion_tokens": 15,
"prompt_tokens": 316,
"total_tokens": 331
}
}
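To use the returned tool call from code, parse the arguments string back into a dict. A sketch, assuming the endpoint lives at /v1/function-calling on localhost:8000:

import json

import requests

BASE_URL = "http://localhost:8000"  # assumed default host/port

payload = {
    "messages": [{"role": "user", "content": "Extract Jason is 25 years old"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "UserDetail",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string", "description": "The user's name"},
                        "age": {"type": "integer", "description": "The user's age"},
                    },
                    "required": ["name", "age"],
                },
            },
        }
    ],
    "tool_choice": "auto",
}

resp = requests.post(f"{BASE_URL}/v1/function-calling", json=payload)  # assumed path
resp.raise_for_status()
# "arguments" arrives as a JSON string, so decode it before use.
call = resp.json()["choices"][0]["message"]["tool_calls"][0]["function"]
print(call["name"], json.loads(call["arguments"]))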
Generates images based on a single prompt.
Request body:
{
"prompt": "A girl, standing in a field of flowers, vivid",
"image_path": "",
"cfg_scale": 7,
"width": 256,
"height": 256,
"sample_steps": 20,
"seed": 0,
"negative_prompt": ""
}
Example response:
{
"created": 1724186615.5426757,
"data": [
{
"base64": "base64_of_generated_image",
"url": "path/to/generated_image"
}
]
}
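The response carries the image as base64, so a client typically decodes and saves it. A sketch, assuming a /v1/txt2img path on localhost:8000:

import base64

import requests

BASE_URL = "http://localhost:8000"  # assumed default host/port

payload = {
    "prompt": "A girl, standing in a field of flowers, vivid",
    "image_path": "",
    "cfg_scale": 7,
    "width": 256,
    "height": 256,
    "sample_steps": 20,
    "seed": 0,
    "negative_prompt": "",
}

resp = requests.post(f"{BASE_URL}/v1/txt2img", json=payload)  # assumed path
resp.raise_for_status()
# Decode the base64 field and write the image to disk.
with open("generated.png", "wb") as f:
    f.write(base64.b64decode(resp.json()["data"][0]["base64"]))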
Modifies existing images based on a single prompt.
Request body:
{
"prompt": "A girl, standing in a field of flowers, vivid",
"image_path": "path/to/image",
"cfg_scale": 7,
"width": 256,
"height": 256,
"sample_steps": 20,
"seed": 0,
"negative_prompt": ""
}
Example response:
{
"created": 1724186615.5426757,
"data": [
{
"base64": "base64_of_generated_image",
"url": "path/to/generated_image"
}
]
}
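The client code is the same shape as for generation, with image_path pointing at the source image. A sketch, assuming a /v1/img2img path on localhost:8000:

import base64

import requests

BASE_URL = "http://localhost:8000"  # assumed default host/port

payload = {
    "prompt": "A girl, standing in a field of flowers, vivid",
    "image_path": "path/to/image",  # the existing image to modify
    "cfg_scale": 7,
    "width": 256,
    "height": 256,
    "sample_steps": 20,
    "seed": 0,
    "negative_prompt": "",
}

resp = requests.post(f"{BASE_URL}/v1/img2img", json=payload)  # assumed path
resp.raise_for_status()
with open("modified.png", "wb") as f:
    f.write(base64.b64decode(resp.json()["data"][0]["base64"]))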
Transcribes audio files to text.
Parameters:
beam_size (integer): Beam size for transcription (default: 5)
language (string): Language code (e.g., 'en', 'fr')
temperature (number): Temperature for sampling (default: 0)
Request body (multipart/form-data):
{
"file" (form-data): The audio file to transcribe (required)
}
Example response:
{
"text": " And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country."
}
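Since the file travels as multipart form-data, a Python client uploads it with files=. A sketch, assuming a /v1/audio/transcriptions path on localhost:8000 and that the tuning parameters are accepted as query parameters:

import requests

BASE_URL = "http://localhost:8000"  # assumed default host/port

# Upload the audio under the required "file" form-data field.
with open("speech.wav", "rb") as f:
    resp = requests.post(
        f"{BASE_URL}/v1/audio/transcriptions",  # assumed path
        files={"file": f},
        params={"beam_size": 5, "language": "en", "temperature": 0},  # assumed to be query params
    )
resp.raise_for_status()
print(resp.json()["text"])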
Translates audio files to text in English.
Parameters:
beam_size (integer): Beam size for transcription (default: 5)
temperature (number): Temperature for sampling (default: 0)
Request body (multipart/form-data):
{
"file" (form-data): The audio file to translate (required)
}
Example response:
{
"text": " Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday"
}
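The call mirrors transcription, just against the translations path. A sketch under the same assumptions:

import requests

BASE_URL = "http://localhost:8000"  # assumed default host/port

with open("speech_non_english.wav", "rb") as f:
    resp = requests.post(
        f"{BASE_URL}/v1/audio/translations",  # assumed path
        files={"file": f},
        params={"beam_size": 5, "temperature": 0},
    )
resp.raise_for_status()
print(resp.json()["text"])  # English text, regardless of the source language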
Generates embeddings for a given text.
Request body:
{
"input": "I love Nexa AI.",
"normalize": false,
"truncate": true
}
Example response:
{
"object": "list",
"data": [
{
"object": "embedding",
"index": 0,
"embedding": [
-0.006929283495992422,
-0.005336422007530928,
... (omitted for spacing)
-4.547132266452536e-05,
-0.024047505110502243
]
}
],
"model": "/home/ubuntu/models/embedding_models/mxbai-embed-large-q4_0.gguf",
"usage": {
"prompt_tokens": 5,
"total_tokens": 5
}
}
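A sketch of a client call, assuming a /v1/embeddings path on localhost:8000; the embedding itself comes back as a plain list of floats:

import requests

BASE_URL = "http://localhost:8000"  # assumed default host/port

payload = {
    "input": "I love Nexa AI.",
    "normalize": False,
    "truncate": True,
}

resp = requests.post(f"{BASE_URL}/v1/embeddings", json=payload)  # assumed path
resp.raise_for_status()
vector = resp.json()["data"][0]["embedding"]
print(len(vector), vector[:3])  # dimensionality and a few leading values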