
feat: remote engine #1666

Merged 54 commits on Dec 5, 2024
315bc2f
Merge branch 'dev' of github.com:janhq/cortex.cpp into feat/remote-en…
nguyenhoangthuan99 Nov 11, 2024
190d40b
Init remote engine
nguyenhoangthuan99 Nov 11, 2024
6cd2ec6
Merge branch 'dev' into feat/remote-engine
nguyenhoangthuan99 Nov 11, 2024
c6124ba
Fix: CI build windows
nguyenhoangthuan99 Nov 11, 2024
7441506
Merge branch 'feat/remote-engine' of github.com:janhq/cortex.cpp into…
nguyenhoangthuan99 Nov 11, 2024
135c41e
Fix: CI build windows
nguyenhoangthuan99 Nov 11, 2024
a916ec8
Fix: CI build windows
nguyenhoangthuan99 Nov 11, 2024
a9c0d8b
Fix: CI build windows
nguyenhoangthuan99 Nov 12, 2024
9d1a9d8
feat: new db schema for model and template for engine
luke-nguyen990 Nov 12, 2024
127d429
Merge branch 'feat/remote-engine' of github.com:janhq/cortex.cpp into…
nguyenhoangthuan99 Nov 12, 2024
c435918
Add remote model
nguyenhoangthuan99 Nov 12, 2024
28d3106
Add Get, List, Update support for remote models
nguyenhoangthuan99 Nov 13, 2024
6508c98
change model_id to model in remote engine
nguyenhoangthuan99 Nov 13, 2024
502a0b9
Merge dev
nguyenhoangthuan99 Nov 13, 2024
7b295f8
fix: mac compatibility
luke-nguyen990 Nov 13, 2024
d921869
chore: some refactors before making big changes
luke-nguyen990 Nov 13, 2024
18f3900
feat: db ops for engines
luke-nguyen990 Nov 13, 2024
c5148e5
chore: small refactor before more changes
luke-nguyen990 Nov 13, 2024
b2567ad
Update engine
nguyenhoangthuan99 Nov 13, 2024
6639511
Merge branch 'feat/remote-engine' of github.com:janhq/cortex.cpp into…
nguyenhoangthuan99 Nov 13, 2024
ca3972e
refine db schema, composite key for engines
luke-nguyen990 Nov 14, 2024
bedb803
add entry definition for engine at db layer
luke-nguyen990 Nov 14, 2024
a10294e
complete add, get engine operations
luke-nguyen990 Nov 14, 2024
5f9e706
engine managements
luke-nguyen990 Nov 14, 2024
a1f95b2
Merge branch 'dev' into feat/remote-engine
nguyenhoangthuan99 Nov 14, 2024
e50c0e2
Merge branch 'dev' into feat/remote-engine
nguyenhoangthuan99 Nov 14, 2024
d3187db
Merge branch 'dev' into feat/remote-engine
nguyenhoangthuan99 Nov 14, 2024
cab8b44
Integrate with remote engine to run remote model
nguyenhoangthuan99 Nov 14, 2024
263720e
Merge branch 'dev' into feat/remote-engine
nguyenhoangthuan99 Nov 14, 2024
9d50f5f
error handling and response transform
nguyenhoangthuan99 Nov 14, 2024
689e38c
Merge branch 'feat/remote-engine' of github.com:janhq/cortex.cpp into…
nguyenhoangthuan99 Nov 14, 2024
932f7ed
Merge branch 'dev' into feat/remote-engine
nguyenhoangthuan99 Nov 15, 2024
fa434a4
Support for stream request
nguyenhoangthuan99 Nov 15, 2024
06553db
Merge branch 'feat/remote-engine' of github.com:janhq/cortex.cpp into…
nguyenhoangthuan99 Nov 15, 2024
3f9f451
chore: fix conflicts
sangjanai Nov 26, 2024
5473a75
feat: anthropic
vansangpfiev Nov 26, 2024
071d84c
feat: support anthropic
vansangpfiev Nov 27, 2024
9444dcb
feat: support anthropic
vansangpfiev Nov 27, 2024
2cbdf58
Merge branch 'dev' of https://github.com/janhq/cortex.cpp into feat/r…
sangjanai Nov 28, 2024
0e3ad7d
Merge branch 'dev' of https://github.com/janhq/cortex.cpp into feat/r…
sangjanai Nov 29, 2024
086a195
Merge branch 'feat/remote-engine' of https://github.com/janhq/cortex.…
sangjanai Nov 29, 2024
1b30777
chore: rename
sangjanai Nov 29, 2024
e3371a0
chore: cleanup and fix unit tests
sangjanai Nov 29, 2024
e14ee6e
fix: issue with db
vansangpfiev Nov 29, 2024
cee2838
chore: refactor remote engine
sangjanai Nov 29, 2024
fd81fc9
Merge branch 'dev' of https://github.com/janhq/cortex.cpp into feat/r…
sangjanai Dec 1, 2024
ac8aeff
Merge branch 'dev' of https://github.com/janhq/cortex.cpp into feat/r…
sangjanai Dec 2, 2024
1f2a5dc
fix: e2e tests
sangjanai Dec 2, 2024
3220ad8
fix: e2e tests
sangjanai Dec 2, 2024
5b97c93
Merge branch 'dev' of https://github.com/janhq/cortex.cpp into feat/r…
sangjanai Dec 3, 2024
bcc4c80
Merge branch 'dev' of https://github.com/janhq/cortex.cpp into feat/r…
sangjanai Dec 4, 2024
90694c4
chore: API docs
sangjanai Dec 4, 2024
a7e4659
fix: use different interface for remote engine
vansangpfiev Dec 4, 2024
8e44992
Merge branch 'dev' into feat/remote-engine
vansangpfiev Dec 4, 2024
97 changes: 97 additions & 0 deletions docs/docs/capabilities/embeddings.md
Original file line number Diff line number Diff line change
@@ -6,3 +6,100 @@ title: Embeddings
:::

cortex.cpp now supports an embeddings endpoint that is fully OpenAI-compatible.


For embeddings API usage, please refer to the [API references](/api-reference#tag/chat/POST/v1/embeddings). This tutorial shows you how to use embeddings in cortex with the OpenAI Python SDK.

## Embeddings with the OpenAI-compatible API

### 1. Start the server and run a model

```sh
cortex run llama3.1:8b-gguf-q4-km
```

### 2. Create a script `embeddings.py` with this content

```python
from openai import OpenAI

ENDPOINT = "http://localhost:39281/v1"
MODEL = "llama3.1:8b-gguf-q4-km"

client = OpenAI(
    base_url=ENDPOINT,
    api_key="not-needed"
)
```

### 3. Create embeddings

```python
response = client.embeddings.create(input="embedding", model=MODEL, encoding_format="base64")
print(response)
```

The response will look like this:

```
CreateEmbeddingResponse(
data=[
Embedding(
embedding='hjuAPOD8TryuPU8...',
index=0,
object='embedding'
)
],
model='meta-llama3.1-8b-instruct',
object='list',
usage=Usage(
prompt_tokens=2,
total_tokens=2
)
)
```


The output embedding is encoded as a base64 string. By default, the model outputs the embedding as a list of floats.

```python
response = client.embeddings.create(input="embedding", model=MODEL)
print(response)
```

The result will be:

```
CreateEmbeddingResponse(
data=[
Embedding(
embedding=[0.1, 0.3, 0.4 ....],
index=0,
object='embedding'
)
],
model='meta-llama3.1-8b-instruct',
object='list',
usage=Usage(
prompt_tokens=2,
total_tokens=2
)
)
```

Cortex also supports all the input types that [OpenAI](https://platform.openai.com/docs/api-reference/embeddings/create#embeddings-create-input) does.

```python
# input as a string
response = client.embeddings.create(input="embedding", model=MODEL)

# input as an array of strings
response = client.embeddings.create(input=["embedding"], model=MODEL)

# input as an array of tokens
response = client.embeddings.create(input=[12, 44, 123], model=MODEL)

# input as an array of arrays containing tokens
response = client.embeddings.create(input=[[912, 312, 54], [12, 433, 1241]], model=MODEL)
```
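A common use of the embeddings endpoint is semantic similarity. The sketch below computes the cosine similarity between two vectors; the commented lines assume the `client` and `MODEL` from step 2 and a running server, and the helper name is our own:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # cos(theta) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# With a running server, the vectors would come from the endpoint, e.g.:
# e1 = client.embeddings.create(input="cat", model=MODEL).data[0].embedding
# e2 = client.embeddings.create(input="kitten", model=MODEL).data[0].embedding
# print(cosine_similarity(e1, e2))

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
```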

16 changes: 13 additions & 3 deletions docs/static/openapi/cortex.json
@@ -190,7 +190,7 @@
]
}
},
"v1/embeddings": {
"/v1/embeddings": {
"post": {
"summary": "Create embeddings",
"description": "Creates an embedding vector representing the input text.",
@@ -204,22 +204,29 @@
"input": {
"oneOf": [
{
"type": "string"
"type": "string",
"description":"The string that will be turned into an embedding."
},
{
"type": "array",
"description" : "The array of strings that will be turned into an embedding.",
"items": {
"type": "string"
}
},
{
"type": "array",
"description": "The array of integers that will be turned into an embedding.",
"items": {
"type": "integer"

}
},
{
"type": "array",

"description" : "The array of arrays containing integers that will be turned into an embedding.",

"items": {
"type": "array",
"items": {
@@ -290,7 +297,10 @@
}
}
}
}
},
"tags": [
"Embeddings"
]
}
},
"/v1/chat/completions": {
14 changes: 6 additions & 8 deletions engine/config/chat_template_renderer.h
@@ -48,10 +48,11 @@
#include <vector>
namespace config {

#if (defined(_MSC_VER) && _MSC_VER >= 1900 && defined(__cpp_char8_t)) || __cplusplus >= 202002L
#define LU8(x) reinterpret_cast<const char*>(u8##x)
#if (defined(_MSC_VER) && _MSC_VER >= 1900 && defined(__cpp_char8_t)) || \
__cplusplus >= 202002L
#define LU8(x) reinterpret_cast<const char*>(u8##x)
#else
#define LU8(x) u8##x
#define LU8(x) u8##x
#endif

typedef struct llama_chat_message {
@@ -167,13 +168,10 @@ static int32_t llama_chat_apply_template_internal(
std::string system_prompt = "";
for (auto message : chat) {
std::string role(message->role);
if (role == "system") {
// there is no system message for gemma, but we will merge it with user prompt, so nothing is broken
system_prompt = trim(message->content);
continue;
}
// in gemma, "assistant" is "model"
role = role == "assistant" ? "model" : message->role;
// in gemma2, "system" is "user"
role = role == "system" ? "user" : role;
ss << "<start_of_turn>" << role << "\n";
if (!system_prompt.empty() && role != "model") {
ss << system_prompt << "\n\n";
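The role-mapping logic in this hunk can be sketched in Python. This is a hypothetical simplification for illustration only; the real implementation is the C++ in `chat_template_renderer.h` above:

```python
def render_gemma(chat: list[dict]) -> str:
    # Gemma's template has no "system" or "assistant" roles:
    # "assistant" is rendered as "model", and "system" as "user".
    parts = []
    for msg in chat:
        role = msg["role"]
        role = "model" if role == "assistant" else role
        role = "user" if role == "system" else role
        parts.append(f"<start_of_turn>{role}\n{msg['content'].strip()}<end_of_turn>\n")
    return "".join(parts)

print(render_gemma([
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "Hi"},
]))
```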