Command-R MLX API Server

Overview

This repository contains a Flask application that serves a Command-R model through MLX over an HTTP API. It provides endpoints for text generation, chat, tool use, and grounded responses built from supplied documents.

Features

Generate text with a given prompt and additional parameters.
Simulate chat conversations and generate appropriate responses.
Execute tools with specific arguments in a conversational context.
Retrieve grounded responses using provided documents and conversation history.

Installation

To run this application, you need Python 3 and pip installed on your system. Follow these steps to set up the server:

Clone the repository:

```
git clone <repository-url>
cd <repository-folder>
```

Install dependencies:
```
pip install -r requirements.txt
```

Run the server:

```
python server.py --port 5000 --model 'mlx-community/c4ai-command-r-v01-4bit' --debug
```

Replace <repository-url> and <repository-folder> with the actual URL and folder name of your cloned repository.

API Endpoints

Generate Text (`/generate`)

Generates text based on the provided prompt.

POST Request Body

```
{
    "prompt": "Enter your prompt here",
    "temperature": 0.2,
    "max_tokens": 131072
}
```

Response

```
{
    "generated_text": "Generated text will be here."
}
```
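As a rough illustration of the request and response shapes above, the following sketch builds a `/generate` request with Python's standard library. The base URL assumes the server runs locally on port 5000 as in the Installation step; the helper names and the `max_tokens` default of 256 are hypothetical and not part of the repository.

```python
import json
import urllib.request

# Assumption: the server was started locally with --port 5000.
BASE_URL = "http://localhost:5000"


def build_generate_request(prompt, temperature=0.2, max_tokens=256, base_url=BASE_URL):
    """Build (but do not send) a POST request for the /generate endpoint."""
    payload = {"prompt": prompt, "temperature": temperature, "max_tokens": max_tokens}
    return urllib.request.Request(
        f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, **kwargs):
    """Send the request and return the "generated_text" field of the reply."""
    with urllib.request.urlopen(build_generate_request(prompt, **kwargs)) as resp:
        return json.loads(resp.read().decode("utf-8"))["generated_text"]
```

With the server running, `generate("Write a haiku about rivers", max_tokens=64)` should return the generated string.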

Chat (`/chat`)

Simulates a chat conversation and generates a response.

POST Request Body

```
{
    "conversation": [
        {"role": "user", "content": "User's message"},
        {"role": "assistant", "content": "Assistant's response"}
    ],
    "temperature": 0.2,
    "max_tokens": 131072
}
```

Response

```
{
    "generated_text": "Generated chat response will be here."
}
```
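The conversation is an ordered list of role/content messages, so a client usually grows it one turn at a time. The sketch below shows one way to do that; `add_turn` and `build_chat_payload` are hypothetical helpers, the `max_tokens` default is arbitrary, and restricting roles to `user` and `assistant` is an assumption based only on the request body shown above.

```python
# Assumption: only the two roles that appear in the request body are accepted.
VALID_ROLES = {"user", "assistant"}


def add_turn(conversation, role, content):
    """Return a new conversation list with one more message appended."""
    if role not in VALID_ROLES:
        raise ValueError(f"unsupported role: {role!r}")
    return conversation + [{"role": role, "content": content}]


def build_chat_payload(conversation, temperature=0.2, max_tokens=256):
    """Assemble the JSON-serialisable body for POST /chat."""
    return {
        "conversation": conversation,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }


history = add_turn([], "user", "What is MLX?")
history = add_turn(history, "assistant", "MLX is an array framework for Apple silicon.")
history = add_turn(history, "user", "Can it run Command-R?")
payload = build_chat_payload(history)
```

After each reply, the returned `generated_text` can be appended as an `assistant` turn before the next user message is sent.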

Use Tool (`/tool`)

Runs a specified tool within the conversation context.

POST Request Body

```
{
    "conversation": [
        {"role": "user", "content": "User's message"},
        {"role": "assistant", "content": "Assistant's response"}
    ],
    "tools": [
        {
            "name": "internet_search",
            "description": "Searches the internet for the given query.",
            "parameter_definitions": {
                "query": {
                    "description": "Query to search the internet with",
                    "type": "str",
                    "required": true
                }
            }
        }
    ]
}
```

Response

```
{
    "tool_response": "Generated tool response will be here."
}
```
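Tool definitions are nested and easy to get wrong by hand. As a sketch, a small builder can produce entries in the `parameter_definitions` shape shown above. `make_tool` is a hypothetical helper, and the sample user message is illustrative only.

```python
import json


def make_tool(name, description, parameters):
    """Build one entry for the "tools" list.

    `parameters` maps a parameter name to a (description, type, required) tuple.
    """
    return {
        "name": name,
        "description": description,
        "parameter_definitions": {
            param: {"description": desc, "type": type_name, "required": required}
            for param, (desc, type_name, required) in parameters.items()
        },
    }


internet_search = make_tool(
    "internet_search",
    "Searches the internet for the given query.",
    {"query": ("Query to search the internet with", "str", True)},
)

tool_payload = {
    "conversation": [{"role": "user", "content": "Who won the most recent Tour de France?"}],
    "tools": [internet_search],
}
body = json.dumps(tool_payload)  # Python True is serialised as JSON true
```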

Grounded Response Generation (`/rag`)

Generates grounded responses based on the conversation and supplemental documents.

POST Request Body

```
{
    "conversation": [
        {"role": "user", "content": "User's message"},
        {"role": "assistant", "content": "Assistant's response"}
    ],
    "documents": [
        { "title": "Document Title", "text": "Some relevant information." }
    ],
    "citation_mode": "accurate"
}
```

Response

```
{
    "rag_response": "Generated grounded response will be here."
}
```
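A grounded request combines the conversation with a list of title/text documents. The sketch below assembles such a body; `make_document` and `build_rag_payload` are hypothetical helpers, and `"accurate"` is used as the default only because it is the one `citation_mode` value this README shows.

```python
import json


def make_document(title, text):
    """Build one entry for the "documents" list."""
    return {"title": title, "text": text}


def build_rag_payload(question, documents, citation_mode="accurate", history=None):
    """Assemble the body for POST /rag, ending the conversation with `question`."""
    conversation = list(history or []) + [{"role": "user", "content": question}]
    return {
        "conversation": conversation,
        "documents": list(documents),
        "citation_mode": citation_mode,
    }


docs = [
    make_document("Release notes", "Version 2 adds a streaming mode."),
    make_document("FAQ", "Streaming mode is disabled by default."),
]
rag_payload = build_rag_payload("Is streaming enabled out of the box?", docs)
rag_body = json.dumps(rag_payload)
```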

Usage

Start the Flask server as described under Installation.
Use a tool such as curl or Postman to send an HTTP POST request with the required JSON payload to the desired endpoint.
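For scripted use, the same requests can be sent from Python. The following is a minimal client sketch using only the standard library. The endpoint-to-field mapping comes from the API Endpoints section; the base URL, the timeout value, and the function names are assumptions.

```python
import json
import urllib.error
import urllib.request

# Result field returned by each endpoint, as listed under API Endpoints.
RESPONSE_KEYS = {
    "/generate": "generated_text",
    "/chat": "generated_text",
    "/tool": "tool_response",
    "/rag": "rag_response",
}


def extract_result(endpoint, raw_body):
    """Decode a JSON response body and return the endpoint's result field."""
    return json.loads(raw_body)[RESPONSE_KEYS[endpoint]]


def call(endpoint, payload, base_url="http://localhost:5000", timeout=600):
    """POST `payload` to `endpoint` and return the extracted result string."""
    request = urllib.request.Request(
        base_url + endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return extract_result(endpoint, response.read())
    except urllib.error.HTTPError as err:
        # The server answered with a 4xx/5xx status; surface its body for debugging.
        raise RuntimeError(f"{endpoint} failed with HTTP {err.code}: {err.read()!r}") from err
```

With the server running, `call("/generate", {"prompt": "Hello", "temperature": 0.2, "max_tokens": 64})` would return the generated text.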

Notes

Make sure to validate input parameters and catch exceptions as per the example code.
Always check for the latest updates and documentation.
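This README does not show the server's own validation, so the following is only a hypothetical client-side sketch of checking `temperature` and `max_tokens` before sending a request. The accepted ranges are assumptions, not limits documented by the server.

```python
def validate_sampling_params(payload):
    """Return a list of problems found in `temperature` and `max_tokens`.

    Missing keys are treated as acceptable; an empty list means no problems.
    """
    errors = []
    temperature = payload.get("temperature")
    max_tokens = payload.get("max_tokens")

    if temperature is not None:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(temperature, bool) or not isinstance(temperature, (int, float)):
            errors.append("temperature must be a number")
        elif temperature < 0:
            errors.append("temperature must be non-negative")

    if max_tokens is not None:
        if isinstance(max_tokens, bool) or not isinstance(max_tokens, int):
            errors.append("max_tokens must be an integer")
        elif max_tokens <= 0:
            errors.append("max_tokens must be positive")

    return errors
```

A caller can refuse to send the request when the returned list is non-empty and report the messages to the user instead.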

Replace placeholder text with values suited to your environment and use case.
