
feat: Nitro-Tensorrt-LLM Extension #2280

Merged — 40 commits merged into dev from tensorrt-llm-extension on Mar 14, 2024

Conversation

@louis-menlo (Contributor) commented Mar 8, 2024

Describe Your Changes

The extension mainly focuses on distributing its models and running the engine.
NOTE: the shared logic has not yet been moved into the core package; I will align it soon.

```mermaid
graph LR

TensorRTLLMExtension[TensorRTLLMExtension] -- implements --> DefineModels[Define Models]
TensorRTLLMExtension -- extends --> LocalOAIEngine[LocalOAIEngine]
LocalOAIEngine -- implements --> StartModel[Start Model]
LocalOAIEngine -- implements --> StopModel[Stop Model]
LocalOAIEngine -- extends --> OAIEngine[OAIEngine]
OAIEngine -- implements --> Inference[Inference]
OAIEngine -- implements --> Stop[Stop Inference]
OAIEngine -- extends --> AIEngine[AIEngine]
AIEngine -- implements --> Populate[Pre-populate Models]
```
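The inheritance chain in the diagram can be sketched as TypeScript abstract classes. The class names come from the diagram; the method names and signatures below are illustrative assumptions, not the actual `@janhq/core` API:

```typescript
// Hypothetical sketch of the engine hierarchy in the diagram above.
// Class names come from the diagram; method names/signatures are
// illustrative assumptions, not the actual @janhq/core API.
interface Model { id: string }

abstract class AIEngine {
  // "Pre-populate Models": each engine declares the models it ships with
  abstract models(): Model[]
}

abstract class OAIEngine extends AIEngine {
  // "Inference" / "Stop Inference" against an OpenAI-compatible endpoint
  abstract inference(prompt: string): Promise<string>
  abstract stopInference(): void
}

abstract class LocalOAIEngine extends OAIEngine {
  // "Start Model" / "Stop Model": manage the local engine subprocess
  abstract startModel(modelId: string): Promise<void>
  abstract stopModel(modelId: string): Promise<void>
}

// Minimal concrete engine showing where TensorRTLLMExtension plugs in
class DemoEngine extends LocalOAIEngine {
  models(): Model[] { return [{ id: 'demo-model' }] }
  async inference(prompt: string): Promise<string> { return `echo: ${prompt}` }
  stopInference(): void {}
  async startModel(_id: string): Promise<void> {}
  async stopModel(_id: string): Promise<void> {}
}
```

Each layer adds one responsibility, so the concrete extension only has to override the provider-specific pieces.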

Main controller (`index.ts`)

```typescript
// Import paths are assumed here; LocalOAIEngine and Model are referenced
// from @janhq/core, while `models`, INFERENCE_URL, and NODE come from
// this extension's local modules.
import { LocalOAIEngine, Model } from '@janhq/core'

/**
 * A class that implements the InferenceExtension interface from the @janhq/core package.
 * The class provides methods for initializing and stopping a model, and for making inference requests.
 * It also subscribes to events emitted by the @janhq/core package and handles new message requests.
 */
export default class TensorRTLLMExtension extends LocalOAIEngine {
  override provider = 'nitro-tensorrt-llm'

  // Node module and inference URL, configured for replacement by the Rollup bundler.
  override inference_url = INFERENCE_URL
  override nodeModule = NODE

  /**
   * Models implemented by the extension:
   * defines the pre-populated models.
   */
  models(): Model[] {
    return models as unknown as Model[]
  }
}
```
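To illustrate what "pre-populated models" buys the rest of the app, here is a hedged sketch of how a registry might consume `models()` and tag each entry with its provider. `ModelRegistry` and the model id below are hypothetical, not the actual `@janhq/core` API:

```typescript
// Hypothetical sketch: a registry consuming an extension's pre-populated
// models. `ModelRegistry` and the model id are illustrative only.
interface Model { id: string; engine: string }

class ModelRegistry {
  private byId = new Map<string, Model>()

  // Tag each model with its provider so inference can be routed to the right engine
  register(provider: string, models: Model[]): void {
    for (const m of models) this.byId.set(m.id, { ...m, engine: provider })
  }

  get(id: string): Model | undefined {
    return this.byId.get(id)
  }
}
```

With this shape, looking up a model id is enough to know which engine (here, `nitro-tensorrt-llm`) must serve it.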

Node module (run engine)

```mermaid
graph LR

LoadModel[Load Model] -- kill port process --> TerminateCheck
TerminateCheck -->|Success| Nitro[Run Nitro]
TerminateCheck -->|"Failed\nPort owned by another process"| Terminated[Terminated]

Nitro --> LoadRequest[Send Load Model Request]
LoadRequest --> Done[Done]
```
```typescript
/**
 * Initializes an engine subprocess to load a machine learning model.
 * @param params - The model load settings.
 */
async function loadModel(params: any): Promise<{ error: Error | undefined }> {
  const settings: ModelLoadParams = {
    engine_path: params.modelFolder,
    ctx_len: params.model.settings.ctx_len ?? 2048,
  }
  return runEngine()
    .then(() => loadModelRequest(settings))
    .catch((error) => ({ error })) // surface spawn failures instead of rejecting
}
```


```typescript
/**
 * Loads an LLM model into the engine subprocess by sending an HTTP POST request.
 */
function loadModelRequest(settings: ModelLoadParams): Promise<{ error: Error | undefined }> {
  return fetchRetry(LOAD_MODEL_URL, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(settings),
    retries: 3,
    retryDelay: 500,
  }).then(() => ({ error: undefined }))
}
```

```typescript
import path from 'path'
import { ChildProcess, spawn } from 'child_process'
import tcpPortUsed from 'tcp-port-used'

// Module-scoped handle so the subprocess can be stopped later.
let subprocess: ChildProcess | undefined

/**
 * Spawns the engine subprocess and waits until it is ready.
 */
async function runEngine(): Promise<void> {
  // Binaries are bundled next to the extension by default
  const binaryFolder = path.join(__dirname, '..', 'bin')

  // Platform-specific binary path
  const binary = path.join(binaryFolder, process.platform === 'win32' ? 'nitro.exe' : 'nitro')

  const args: string[] = ['1', ENGINE_HOST, ENGINE_PORT]
  // Execute the binary
  subprocess = spawn(binary, args, { cwd: binaryFolder })

  // Wait for the engine to listen: poll every 300 ms, time out after 30 s
  await tcpPortUsed.waitUntilUsed(parseInt(ENGINE_PORT), 300, 30000)
}
```
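`tcpPortUsed.waitUntilUsed` simply polls until something is listening on the port. Its behavior can be sketched as follows (an illustrative re-implementation, not the library's source):

```typescript
import * as net from 'net'

// Illustrative sketch of "wait until a TCP port is in use": attempt a
// connection every `intervalMs`, giving up after `timeoutMs`.
function waitUntilUsed(port: number, intervalMs: number, timeoutMs: number): Promise<void> {
  const deadline = Date.now() + timeoutMs
  return new Promise((resolve, reject) => {
    const tryConnect = () => {
      const socket = net.connect({ port, host: '127.0.0.1' })
      // A successful connection means the engine is listening
      socket.once('connect', () => {
        socket.destroy()
        resolve()
      })
      // Connection refused: retry until the deadline passes
      socket.once('error', () => {
        socket.destroy()
        if (Date.now() >= deadline) {
          reject(new Error(`port ${port} not in use after ${timeoutMs}ms`))
        } else {
          setTimeout(tryConnect, intervalMs)
        }
      })
    }
    tryConnect()
  })
}
```

This is why `loadModelRequest` can safely fire immediately after `runEngine()` resolves: the promise only settles once the port accepts connections (or the timeout elapses).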

Fixes Issues

  • Closes #

Self Checklist

  • Base Inference Provider
  • Distribute models via the extension
  • Download the corresponding assets (GPU-specific)

@louis-menlo marked this pull request as draft March 8, 2024 14:45
@louis-menlo force-pushed the tensorrt-llm-extension branch 3 times, most recently from b982fe1 to cf1962b on March 8, 2024 15:42
@namchuai force-pushed the tensorrt-llm-extension branch from cf1962b to 4ad40b8 on March 11, 2024 14:29
@louis-menlo force-pushed the tensorrt-llm-extension branch from 203cf6d to 9f79ea0 on March 12, 2024 11:37
@louis-menlo force-pushed the tensorrt-llm-extension branch from 9f79ea0 to a63b45d on March 12, 2024 18:11
James and others added 20 commits March 13, 2024 01:18
Signed-off-by: James <[email protected]>
@louis-menlo marked this pull request as ready for review March 13, 2024 14:29
@louis-menlo changed the title from "[WIP] feat: Nitro-Tensorrt-LLM Extension" to "feat: Nitro-Tensorrt-LLM Extension" on Mar 13, 2024
core/src/node/api/restful/helper/builder.ts — code-scanning alerts dismissed (×10)
James and others added 7 commits March 14, 2024 00:17

* fix: add compatible check
* fix: copy
* fix: font
* fix: copy
* fix: broken monitoring extension
* chore: bump engine
* fix: copy
* fix: model copy
* fix: copy
* fix: model json

Signed-off-by: James <[email protected]>
Co-authored-by: James <[email protected]>
Co-authored-by: Louis <[email protected]>
@louis-menlo requested review from urmauur and namchuai March 14, 2024 06:44
@louis-menlo merged commit d85d026 into dev on Mar 14, 2024 (6 of 7 checks passed)
@louis-menlo deleted the tensorrt-llm-extension branch March 14, 2024 07:07
@louis-menlo mentioned this pull request Mar 21, 2024 (28 tasks)