
feat: Nitro-Tensorrt-LLM Extension #2280

Merged — 40 commits merged into dev from tensorrt-llm-extension on Mar 14, 2024

Conversation

@louis-menlo (Contributor) commented Mar 8, 2024

Describe Your Changes

The extension mainly focuses on distributing its models and running the engine.
NOTE: the shared logic has not yet been moved into the core package; I will align it soon.

```mermaid
graph LR

TensorRTLLMExtension[TensorRTLLMExtension] -- implements --> DefineModels[Define Models]
TensorRTLLMExtension -- extends --> LocalOAIEngine[LocalOAIEngine]
LocalOAIEngine -- implements --> StartModel[Start Model]
LocalOAIEngine -- implements --> StopModel[Stop Model]
LocalOAIEngine -- extends --> OAIEngine[OAIEngine]
OAIEngine -- implements --> Inference[Inference]
OAIEngine -- implements --> Stop[Stop Inference]
OAIEngine -- extends --> AIEngine[AIEngine]
AIEngine -- implements --> Populate[Pre-populate Models]
```
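The inheritance chain in the diagram can be sketched as TypeScript abstract classes. The class names come from the diagram; the method names and signatures below are illustrative assumptions, not the actual `@janhq/core` API:

```typescript
// Hypothetical sketch of the engine hierarchy in the diagram above.
// Class names come from the diagram; method names/signatures are
// illustrative assumptions, not the actual @janhq/core API.
interface Model { id: string }

abstract class AIEngine {
  // "Pre-populate Models": each engine declares the models it ships with
  abstract models(): Model[]
}

abstract class OAIEngine extends AIEngine {
  // "Inference" / "Stop Inference" against an OpenAI-compatible endpoint
  abstract inference(prompt: string): Promise<string>
  abstract stopInference(): void
}

abstract class LocalOAIEngine extends OAIEngine {
  // "Start Model" / "Stop Model": manage the local engine subprocess
  abstract startModel(modelId: string): Promise<void>
  abstract stopModel(modelId: string): Promise<void>
}

// Minimal concrete engine showing where TensorRTLLMExtension plugs in
class DemoEngine extends LocalOAIEngine {
  models(): Model[] { return [{ id: 'demo-model' }] }
  async inference(prompt: string): Promise<string> { return `echo: ${prompt}` }
  stopInference(): void {}
  async startModel(_id: string): Promise<void> {}
  async stopModel(_id: string): Promise<void> {}
}
```

Each layer adds one responsibility, so the concrete extension only has to override the provider-specific pieces.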

Main controller (`index.ts`)

```typescript
// Import paths are assumed here; LocalOAIEngine and Model are referenced
// from @janhq/core, while `models`, INFERENCE_URL, and NODE come from
// this extension's local modules.
import { LocalOAIEngine, Model } from '@janhq/core'

/**
 * A class that implements the InferenceExtension interface from the @janhq/core package.
 * The class provides methods for initializing and stopping a model, and for making inference requests.
 * It also subscribes to events emitted by the @janhq/core package and handles new message requests.
 */
export default class TensorRTLLMExtension extends LocalOAIEngine {
  override provider = 'nitro-tensorrt-llm'

  // Node module and inference URL, configured for replacement by the Rollup bundler.
  override inference_url = INFERENCE_URL
  override nodeModule = NODE

  /**
   * Models implemented by the extension:
   * defines the pre-populated models.
   */
  models(): Model[] {
    return models as unknown as Model[]
  }
}
```
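To illustrate what "pre-populated models" buys the rest of the app, here is a hedged sketch of how a registry might consume `models()` and tag each entry with its provider. `ModelRegistry` and the model id below are hypothetical, not the actual `@janhq/core` API:

```typescript
// Hypothetical sketch: a registry consuming an extension's pre-populated
// models. `ModelRegistry` and the model id are illustrative only.
interface Model { id: string; engine: string }

class ModelRegistry {
  private byId = new Map<string, Model>()

  // Tag each model with its provider so inference can be routed to the right engine
  register(provider: string, models: Model[]): void {
    for (const m of models) this.byId.set(m.id, { ...m, engine: provider })
  }

  get(id: string): Model | undefined {
    return this.byId.get(id)
  }
}
```

With this shape, looking up a model id is enough to know which engine (here, `nitro-tensorrt-llm`) must serve it.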

Node module (run engine)

```mermaid
graph LR

LoadModel[Load Model] -- kill port process --> TerminateCheck
TerminateCheck -->|Success| Nitro[Run Nitro]
TerminateCheck -->|"Failed\nPort owned by another process"| Terminated[Terminated]

Nitro --> LoadRequest[Send Load Model Request]
LoadRequest --> Done[Done]
```
```typescript
/**
 * Initializes an engine subprocess to load a machine learning model.
 * @param params - The model load settings.
 */
async function loadModel(params: any): Promise<{ error: Error | undefined }> {
  const settings: ModelLoadParams = {
    engine_path: params.modelFolder,
    ctx_len: params.model.settings.ctx_len ?? 2048,
  }
  return runEngine()
    .then(() => loadModelRequest(settings))
    .catch((error) => ({ error })) // surface spawn failures instead of rejecting
}
```


```typescript
/**
 * Loads an LLM model into the engine subprocess by sending an HTTP POST request.
 */
function loadModelRequest(settings: ModelLoadParams): Promise<{ error: Error | undefined }> {
  return fetchRetry(LOAD_MODEL_URL, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(settings),
    retries: 3,
    retryDelay: 500,
  }).then(() => ({ error: undefined }))
}
```

```typescript
import path from 'path'
import { ChildProcess, spawn } from 'child_process'
import tcpPortUsed from 'tcp-port-used'

// Module-scoped handle so the subprocess can be stopped later.
let subprocess: ChildProcess | undefined

/**
 * Spawns the engine subprocess and waits until it is ready.
 */
async function runEngine(): Promise<void> {
  // Binaries are bundled next to the extension by default
  const binaryFolder = path.join(__dirname, '..', 'bin')

  // Platform-specific binary path
  const binary = path.join(binaryFolder, process.platform === 'win32' ? 'nitro.exe' : 'nitro')

  const args: string[] = ['1', ENGINE_HOST, ENGINE_PORT]
  // Execute the binary
  subprocess = spawn(binary, args, { cwd: binaryFolder })

  // Wait for the engine to listen: poll every 300 ms, time out after 30 s
  await tcpPortUsed.waitUntilUsed(parseInt(ENGINE_PORT), 300, 30000)
}
```
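`tcpPortUsed.waitUntilUsed` simply polls until something is listening on the port. Its behavior can be sketched as follows (an illustrative re-implementation, not the library's source):

```typescript
import * as net from 'net'

// Illustrative sketch of "wait until a TCP port is in use": attempt a
// connection every `intervalMs`, giving up after `timeoutMs`.
function waitUntilUsed(port: number, intervalMs: number, timeoutMs: number): Promise<void> {
  const deadline = Date.now() + timeoutMs
  return new Promise((resolve, reject) => {
    const tryConnect = () => {
      const socket = net.connect({ port, host: '127.0.0.1' })
      // A successful connection means the engine is listening
      socket.once('connect', () => {
        socket.destroy()
        resolve()
      })
      // Connection refused: retry until the deadline passes
      socket.once('error', () => {
        socket.destroy()
        if (Date.now() >= deadline) {
          reject(new Error(`port ${port} not in use after ${timeoutMs}ms`))
        } else {
          setTimeout(tryConnect, intervalMs)
        }
      })
    }
    tryConnect()
  })
}
```

This is why `loadModelRequest` can safely fire immediately after `runEngine()` resolves: the promise only settles once the port accepts connections (or the timeout elapses).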

Fixes Issues

  • Closes #

Self Checklist

  • Base Inference Provider
  • Distribute models via the extension
  • Download the corresponding assets (GPU-specific)

@louis-menlo marked this pull request as draft March 8, 2024 14:45
@louis-menlo force-pushed the tensorrt-llm-extension branch 3 times, most recently from b982fe1 to cf1962b on March 8, 2024 15:42
@namchuai force-pushed the tensorrt-llm-extension branch from cf1962b to 4ad40b8 on March 11, 2024 14:29
@louis-menlo force-pushed the tensorrt-llm-extension branch from 203cf6d to 9f79ea0 on March 12, 2024 11:37
@louis-menlo force-pushed the tensorrt-llm-extension branch from 9f79ea0 to a63b45d on March 12, 2024 18:11
James and others added 20 commits March 13, 2024 01:18
Signed-off-by: James <[email protected]>
@louis-menlo marked this pull request as ready for review March 13, 2024 14:29
@louis-menlo changed the title from "[WIP] feat: Nitro-Tensorrt-LLM Extension" to "feat: Nitro-Tensorrt-LLM Extension" on Mar 13, 2024
core/src/node/api/restful/helper/builder.ts — code-scanning alerts dismissed (×10)
James and others added 7 commits March 14, 2024 00:17

* fix: add compatible check
* fix: copy
* fix: font
* fix: copy
* fix: broken monitoring extension
* chore: bump engine
* fix: copy
* fix: model copy
* fix: copy
* fix: model json

Signed-off-by: James <[email protected]>
Co-authored-by: James <[email protected]>
Co-authored-by: Louis <[email protected]>
@louis-menlo requested review from urmauur and namchuai March 14, 2024 06:44
@louis-menlo merged commit d85d026 into dev on Mar 14, 2024 (6 of 7 checks passed)
@louis-menlo deleted the tensorrt-llm-extension branch March 14, 2024 07:07
@louis-menlo mentioned this pull request Mar 21, 2024 (28 tasks)