[Droid] Add DBRX-Instruct Truss Example #263

Open · wants to merge 1 commit into main
93 changes: 93 additions & 0 deletions dbrx-instruct/README.md
@@ -0,0 +1,93 @@
# DBRX-Instruct Truss

DBRX-Instruct is a state-of-the-art open large language model developed by Databricks. It uses a fine-grained mixture-of-experts (MoE) architecture with 132B total parameters (36B active per token) and is fine-tuned for instruction following. The model is adept at understanding and generating human-like text, making it a strong fit for a wide range of applications including content creation, summarization, and question answering.

## Deploying DBRX-Instruct Truss

To deploy the DBRX-Instruct Truss on Baseten:

1. Clone this repo: `git clone https://github.com/basetenlabs/truss-examples`

2. Make sure you have a [Baseten account](https://app.baseten.co/signup).

3. Install Truss (a Python package): `pip install --upgrade truss`

4. Log in to your Baseten account using an [API key](https://docs.baseten.co/api_keys/).

5. Deploy: `truss push dbrx-instruct` (you may be prompted to confirm a redeploy if you've deployed this model previously).

## Hardware

DBRX-Instruct is a 132B-parameter mixture-of-experts model, so its weights alone occupy roughly 260GB in 16-bit precision and cannot fit on a single GPU. To deploy this Truss you'll need a multi-GPU instance in the Baseten cloud, for example:

- 16 vCPUs
- 256GB RAM
- 4 A100 80GB GPU Accelerators

If your account does not have access to A100 GPUs, you can modify `resources.accelerator` in `config.yaml` to use a different accelerator with comparable total GPU memory.

## API

The DBRX-Instruct Truss has a single predict route that accepts a JSON payload with the following parameters:

| Parameter | Type | Description |
|---------------|-------------------|-------------------------------------------------------------------------------------------------------------------------|
| `prompt` | string (required) | The prompt to use for generating text. |
| `max_tokens` | integer | The maximum number of tokens to generate in the output. Defaults to 512. |
| `temperature` | float | Controls the "creativity" of the generated text. Higher values (e.g. 1.0) produce more diverse outputs. Defaults to 0.5. |

Example payload:
```json
{
"prompt": "Write a haiku about constitutional AI.",
"max_tokens": 128,
"temperature": 0.8
}
```

## Example Usage

You can invoke the model via a REST API:

```bash
curl -X POST https://your-app-url.baseten.co/predict \
  -H 'Authorization: Api-Key YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "prompt": "Write a haiku about mixture-of-experts models.",
    "max_tokens": 128,
    "temperature": 0.8
  }'
```

Or using the Baseten Python client:

```python
import baseten

# Authenticate with your Baseten API key
baseten.login("YOUR_API_KEY")

# Get a handle to the deployed model
model = baseten.deployed_model_id('your-deployed-model-id')

# Make a prediction; the payload mirrors the predict route's JSON schema
response = model.predict({
    "prompt": "Write a haiku about mixture-of-experts models.",
    "max_tokens": 128,
    "temperature": 0.8
})
print(response)
```

## Generation Parameters and Limitations

DBRX-Instruct lets you customize generation through `max_tokens` and `temperature`, which control the length and creativity of the output. Note that large `max_tokens` values increase response time and the compute required, and higher `temperature` values yield more varied, creative output but also raise the risk of off-topic or nonsensical text.
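
As a quick illustration, sweeping `temperature` while holding the prompt fixed makes the trade-off concrete. A minimal sketch, reusing the hypothetical `model` handle from the Python client example above (prompt and values are placeholders):

```python
# Hypothetical temperature sweep; `model` is the deployed-model handle
# from the Baseten client example above, and the prompt is a placeholder
for temp in (0.2, 0.5, 1.0):
    response = model.predict({
        "prompt": "Describe the mixture-of-experts architecture in one sentence.",
        "max_tokens": 64,
        "temperature": temp,
    })
    # Low temperatures give focused, repeatable text; high temperatures
    # give more diverse output at the cost of occasional drift
    print(f"temperature={temp}: {response['generated_text']}")
```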

## Optimal Use Cases

DBRX-Instruct excels in scenarios requiring nuanced understanding and generation of text, such as:
- Generating high-quality, contextually relevant content for articles or blogs.
- Summarizing long documents or articles into concise paragraphs.
- Answering questions based on provided context or knowledge.

For best results, provide clear, context-rich prompts and experiment with the generation parameters to find the optimal settings for your use case, as in the sketch below.
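
For instance, a summarization request works best when the instruction and source text are laid out explicitly. A minimal sketch, again reusing the hypothetical `model` handle (the document text is a placeholder):

```python
# Context-rich summarization prompt; `document` is placeholder text and
# `model` is the deployed-model handle from the client example above
document = "Mixture-of-experts models route each token to a small subset of experts..."
prompt = (
    "Summarize the following article in three concise sentences.\n\n"
    f"Article:\n{document}\n\n"
    "Summary:"
)
response = model.predict({"prompt": prompt, "max_tokens": 256, "temperature": 0.3})
print(response["generated_text"])
```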
32 changes: 32 additions & 0 deletions dbrx-instruct/config.yaml
@@ -0,0 +1,32 @@
# Metadata
model_name: DBRX-Instruct
model_description: An open mixture-of-experts language model from Databricks, fine-tuned for safe and effective instruction following.
model_avatar:
model_cover_image:
example_model_input: {"prompt": "Explain the mixture-of-experts architecture behind DBRX-Instruct in two sentences."}
tags:
- text-generation
- instruction-following

# Runtime Config
python_version: py311
system_packages: []
python_packages:
- torch
- transformers
- accelerate
- sentencepiece
- tiktoken  # DBRX ships a tiktoken-based tokenizer

# Resources
resources:
  cpu: "16"
  memory: 256Gi
  use_gpu: true
  accelerator: A100:4

# Secrets (set the actual token value in your Baseten workspace)
secrets:
  hf_access_token: null

# Environment Variables
env_variables: {}

external_package_dirs: []
1 change: 1 addition & 0 deletions dbrx-instruct/model/__init__.py
@@ -0,0 +1 @@
# Empty file
50 changes: 50 additions & 0 deletions dbrx-instruct/model/model.py
@@ -0,0 +1,50 @@
from typing import Dict

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


class Model:
    def __init__(self, **kwargs):
        self.model = None
        self.tokenizer = None
        # Truss injects secrets declared in config.yaml via kwargs
        self._secrets = kwargs.get("secrets", {})

    def load(self):
        # Model and tokenizer both live at the DBRX-Instruct repo on Hugging Face
        self.model_path = "databricks/dbrx-instruct"
        hf_token = self._secrets.get("hf_access_token")
        # DBRX is a gated repo that ships custom modeling code, so the access
        # token and trust_remote_code=True are needed for both artifacts
        self.tokenizer = AutoTokenizer.from_pretrained(
            self.model_path, trust_remote_code=True, token=hf_token
        )
        # The 132B-parameter weights do not fit on a single GPU: load them in
        # bfloat16 and let accelerate shard them across all available devices
        self.model = AutoModelForCausalLM.from_pretrained(
            self.model_path,
            torch_dtype=torch.bfloat16,
            device_map="auto",
            trust_remote_code=True,
            token=hf_token,
        )

    def preprocess(self, model_input: Dict) -> Dict:
        # Truss invokes this hook on the request payload before predict;
        # pass it through unchanged since predict handles tokenization itself
        return model_input

    def postprocess(self, output: Dict) -> Dict:
        # Truss invokes this hook on predict's return value before responding
        return output

    def moderate(self, text: str) -> str:
        # TODO: Implement content moderation logic here
        return text

    def predict(self, model_input: Dict) -> Dict:
        # Extract the prompt and the optional generation parameters
        # documented in the README, falling back to the documented defaults
        prompt = model_input["prompt"]
        max_tokens = model_input.get("max_tokens", 512)
        temperature = model_input.get("temperature", 0.5)
        # Tokenize the prompt and move it to the device holding the first shard
        inputs = self.tokenizer(prompt, return_tensors="pt").to(self.model.device)
        # Generate text; sampling must be enabled for temperature to apply
        generated_tokens = self.model.generate(
            **inputs,
            max_new_tokens=max_tokens,
            temperature=temperature,
            do_sample=True,
            num_return_sequences=1,
        )
        # Decode only the newly generated tokens, skipping the echoed prompt
        generated_text = self.tokenizer.decode(
            generated_tokens[0][inputs["input_ids"].shape[-1] :],
            skip_special_tokens=True,
        )
        # Return the generated text in a dictionary
        return {"generated_text": generated_text}