epic: Implement Cortex Hardware API for Nvidia #1568

vansangpfiev · 2024-10-29T06:40:01Z

vansangpfiev · 2024-10-31T03:50:35Z

Hardware API Documentation

Get hardware information

GET /v1/hardware

Response:

{
  "cpu": {
    "arch": "string",
    "cores": number,
    "model": "string",
    "instructions": ["string"]
  },
  "os": {
    "version": "string",
    "name": "string"
  },
  "ram": {
    "total": number,
    "available": number,
    "type": "string"
  },
  "storage": {
    "total": number,
    "available": number,
    "type": "string"
  },
  "gpus": [
    {
      "model": "string",
      "vram": "string",
      "driver_version": "string"
    }
  ],
  "power": {
    "battery_life": number,
    "charging_status": "string",
    "is_power_saving": boolean
  },
  "monitors": [
    {
      "resolution": "string",
      "refresh_rate": number
    }
  ]
}
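As a rough illustration of consuming this response, here is a minimal Python sketch that turns a payload of the documented shape into a few summary lines. The sample values, the units, and the helper name `summarize_hardware` are assumptions made for the example; only the field names come from the schema above.

```python
import json
from typing import Any

# Made-up sample shaped like the documented response; values and units are
# placeholders, not real measurements.
SAMPLE_RESPONSE = """
{
  "cpu": {"arch": "amd64", "cores": 8, "model": "Example CPU", "instructions": ["avx2"]},
  "os": {"version": "11", "name": "Windows"},
  "ram": {"total": 32768, "available": 16384, "type": "DDR4"},
  "storage": {"total": 1000, "available": 400, "type": "SSD"},
  "gpus": [{"model": "Example GPU", "vram": "8192 MiB", "driver_version": "551.76"}]
}
"""


def summarize_hardware(info: dict[str, Any]) -> list[str]:
    """Build short human-readable lines from a /v1/hardware-style payload."""
    cpu = info.get("cpu", {})
    lines = [
        f"CPU: {cpu.get('model', 'unknown')}, "
        f"{cpu.get('cores', '?')} cores, {cpu.get('arch', '?')}"
    ]
    ram = info.get("ram", {})
    if ram:
        lines.append(f"RAM: {ram.get('available')} of {ram.get('total')} available")
    gpus = info.get("gpus", [])
    if not gpus:
        lines.append("GPU: none detected")
    for position, gpu in enumerate(gpus):
        lines.append(
            f"GPU {position}: {gpu.get('model')}, "
            f"VRAM {gpu.get('vram')}, driver {gpu.get('driver_version')}"
        )
    return lines


if __name__ == "__main__":
    for line in summarize_hardware(json.loads(SAMPLE_RESPONSE)):
        print(line)
```

In practice the dictionary would come from a GET request to /v1/hardware on whatever host and port the server listens on; missing sections such as "power" or "monitors" are simply skipped here.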

Hardware Activation

POST /v1/hardware/activate

{
  "gpus": [0, 1]
}

dan-menlo · 2024-10-31T04:49:33Z

Thanks @vansangpfiev. Will we be implementing deactivate this sprint?

vansangpfiev · 2024-10-31T04:57:08Z

> Thanks @vansangpfiev. Will we be implementing deactivate this sprint?

Since we already have the /activate endpoint, I think adding /deactivate would be redundant.
By default, all GPUs are activated. A call to /activate deactivates every GPU that is not listed in the request.
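The semantics described here (everything in the request is activated, everything else is deactivated) can be sketched in a few lines of Python. The function name `apply_activation` and the choice to reject unknown indices with a `ValueError` are assumptions for illustration, not the actual Cortex implementation.

```python
def apply_activation(all_gpus: list[int], requested: list[int]) -> dict[int, bool]:
    """Return the activation state of every known GPU after an /activate call.

    GPUs listed in `requested` end up active; all other known GPUs end up
    inactive, which is why a separate /deactivate endpoint is not needed.
    """
    unknown = sorted(set(requested) - set(all_gpus))
    if unknown:
        # Assumed behaviour: refuse indices that do not match a detected GPU.
        raise ValueError(f"Invalid GPU index provided: {unknown}")
    wanted = set(requested)
    return {gpu: gpu in wanted for gpu in all_gpus}


if __name__ == "__main__":
    print(apply_activation([0, 1, 2], [0, 1]))
    print(apply_activation([0, 1], []))
```

With this model, sending an empty list is the way to deactivate every GPU.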

dan-menlo · 2024-11-06T04:30:38Z

A few notes from our quick call:

Hardware Support

We will need to work with multiple hardware providers, but these can be dealt with in separate sprints:

For Intel, can we detect iGPUs, NPUs and CPUs (e.g. Lunar Lake)?
For AMD, can we detect iGPUs?
For Qualcomm/ARM, can we detect Adreno, etc.?

`ngl` settings

We detect hardware so that we can recommend an ngl setting to users.
The /models/start API will read hardware info from the database and then recommend ngl.
This is not part of the Hardware API, but /models/start uses hwinfo.
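The thread does not say how ngl would be derived from the hardware info. As one possible heuristic only, the sketch below assumes layers are roughly equal in size and offloads as many as fit in free VRAM after a safety reserve. The function `recommend_ngl`, its parameters, and the 512 MiB reserve are all hypothetical.

```python
def recommend_ngl(
    free_vram_mib: int,
    total_layers: int,
    model_size_mib: int,
    reserve_mib: int = 512,
) -> int:
    """Estimate how many model layers fit on the GPU (hypothetical heuristic).

    Assumes every layer takes an equal share of the model size and keeps
    `reserve_mib` of VRAM free for the context and scratch buffers.
    """
    if total_layers <= 0 or model_size_mib <= 0:
        raise ValueError("total_layers and model_size_mib must be positive")
    usable = max(free_vram_mib - reserve_mib, 0)
    per_layer = model_size_mib / total_layers
    return min(total_layers, int(usable // per_layer))


if __name__ == "__main__":
    # A 4096 MiB model with 32 layers on a GPU with 2560 MiB free.
    print(recommend_ngl(free_vram_mib=2560, total_layers=32, model_size_mib=4096))
```

A result of 0 would mean running entirely on the CPU, and a result equal to `total_layers` would mean full offload.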

vansangpfiev · 2024-11-11T05:09:43Z

CLI Documentation:

Get hardware information

cortex hardware list --cpu --os --ram --storage --gpu --power --monitors

If no flag is specified, all hardware information is displayed.

Activate hardware

cortex hardware activate --gpus [gpu_list]

gpu_list is required; [] deactivates all GPUs.

Start model

cortex start [model_id] --gpus [gpu_list]

--gpus is optional; if not specified, all activated GPUs are used.

Run

cortex run [model_id] --gpus [gpu_list]

--gpus is optional; if not specified, all activated GPUs are used.
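The optional --gpus behaviour for start and run can be modelled as below. This is a Python sketch using argparse, not the real Cortex CLI: the parser, the helper names `parse_gpu_list` and `resolve_gpus`, and the bracketed list syntax (taken from the `--gpus []` usage in this thread) are assumptions.

```python
import argparse
import json
from typing import Optional


def parse_gpu_list(text: str) -> list[int]:
    """Parse a bracketed list such as "[0,1]" or "[]" into a list of ints."""
    try:
        value = json.loads(text)
    except json.JSONDecodeError as exc:
        raise argparse.ArgumentTypeError(f"not a list: {text!r}") from exc
    if not isinstance(value, list) or not all(
        type(item) is int and item >= 0 for item in value
    ):
        raise argparse.ArgumentTypeError(
            f"expected a list of non-negative integers: {text!r}"
        )
    return value


def build_parser() -> argparse.ArgumentParser:
    """Stand-in parser for a start/run style command (hypothetical)."""
    parser = argparse.ArgumentParser(prog="start-sketch")
    parser.add_argument("model_id")
    parser.add_argument("--gpus", type=parse_gpu_list, default=None)
    return parser


def resolve_gpus(requested: Optional[list[int]], activated: list[int]) -> list[int]:
    """Fall back to all activated GPUs when --gpus was not given."""
    return list(activated) if requested is None else list(requested)


if __name__ == "__main__":
    args = build_parser().parse_args(["my-model", "--gpus", "[1]"])
    print(resolve_gpus(args.gpus, activated=[0, 1]))
```

Keeping "flag omitted" (`None`) distinct from "empty list" (`[]`) matters here, since the two mean "use all activated GPUs" and "use no GPU" respectively.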

gabrielle-ong · 2024-11-14T06:04:10Z

Nicely done @vansangpfiev! Testing it out now - 2 quick questions:

1. I can't seem to deactivate the GPU to test without a GPU:

cortex-nightly hardware activate --gpus []
Invalid GPU index provided.

2. GPU information shows Index=1, ID=0 for the same GPU, which is confusing. Can we standardize on Index, like the other fields?

vansangpfiev · 2024-11-14T06:14:24Z

> Nicely done @vansangpfiev! Testing it out now - 2 quick questions:
>
> 1. I can't seem to deactivate the GPU to test without a GPU:
> cortex-nightly hardware activate --gpus []
> Invalid GPU index provided.
> 2. GPU information shows Index=1, ID=0 for the same GPU, which is confusing. Can we standardize on Index, like the other fields?

Thanks @gabrielle-ong

1. Let me take a look. Would you mind sharing the cortex.log and cortex-cli.log?
2. Sure, let me fix it. Actually, the ID is the GPU ID that nvidia-smi reports; it can be different from the index.
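To make the two points concrete, here is a small Python sketch that keeps the list index and the nvidia-smi ID as separate fields and validates an activation request against the index, while letting an empty list through as "deactivate all". The `GpuEntry` type, its field names, and the assumption that activation is keyed by index rather than by nvidia-smi ID are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuEntry:
    index: int   # position in the hardware list (assumed to be what --gpus takes)
    smi_id: str  # ID reported by nvidia-smi; may differ from index


def validate_gpu_indices(requested: list[int], gpus: list[GpuEntry]) -> list[int]:
    """Check a requested GPU list against the detected GPUs.

    An empty request bypasses the check entirely, since it means
    "deactivate all GPUs" rather than "no valid index given".
    """
    if not requested:
        return []
    known = {gpu.index for gpu in gpus}
    invalid = [index for index in requested if index not in known]
    if invalid:
        raise ValueError("Invalid GPU index provided.")
    return sorted(set(requested))


if __name__ == "__main__":
    detected = [GpuEntry(index=1, smi_id="0")]
    print(validate_gpu_indices([], detected))
    print(validate_gpu_indices([1], detected))
```

Without the early return for an empty list, a validator of this shape is one way the reported "Invalid GPU index provided." message could arise for `--gpus []`.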

gabrielle-ong · 2024-11-14T06:35:30Z

Thanks Sang!
2. I see, understood. Then it would help to make it clear through the help command that it's the nvidia-smi ID.

1. It just takes in the empty array; there are no error logs.
cortex-cli.log:

20241114 06:32:23.404000 UTC 13784 INFO  CUDA Version: 12.4 - utils/system_info_utils.h:141
20241114 06:32:23.404000 UTC 18228 INFO  Will check for new update, time from last check: 2531 seconds - cortex_upd_cmd.cc:127
20241114 06:32:23.404000 UTC 18228 INFO  Engine release path: https://delta.jan.ai/cortex/latest/version.json - cortex_upd_cmd.cc:138
20241114 06:32:23.545000 UTC 18228 INFO  Got the latest release, update to the config file: v1.0.2-235 - cortex_upd_cmd.cc:175

cortex.log:

20241114 05:38:18.970000 UTC 3728 INFO  Origin:  - main.cc:160
20241114 05:38:19.139000 UTC 12684 INFO  Gpu Driver Version: 551.76 - utils/system_info_utils.h:116
20241114 05:38:19.279000 UTC 12684 INFO  CUDA Version: 12.4 - utils/system_info_utils.h:141
20241114 05:38:19.484000 UTC 12684 INFO  CUDA Version: 12.4 - utils/system_info_utils.h:141
20241114 05:38:19.531000 UTC 12684 INFO  Origin:  - main.cc:160
20241114 05:49:51.989000 UTC 7484 INFO  Origin:  - main.cc:160
20241114 05:49:51.989000 UTC 16792 INFO  activate: {
	"gpus" : 
	[
		0
	]
}
 - hardware.cc:38
20241114 05:49:51.989000 UTC 16792 INFO  No hardware activation changes -> No need to update - hardware_service.cc:211
20241114 05:49:51.989000 UTC 16792 INFO  Origin:  - main.cc:160
20241114 05:50:00.401000 UTC 1384 INFO  Origin:  - main.cc:160
20241114 05:50:00.542000 UTC 4276 INFO  Gpu Driver Version: 551.76 - utils/system_info_utils.h:116
20241114 05:50:00.682000 UTC 4276 INFO  CUDA Version: 12.4 - utils/system_info_utils.h:141
20241114 05:50:00.870000 UTC 4276 INFO  CUDA Version: 12.4 - utils/system_info_utils.h:141
20241114 05:50:00.964000 UTC 4276 INFO  Origin:  - main.cc:160
20241114 05:50:12.567000 UTC 17776 INFO  Origin:  - main.cc:160
20241114 05:50:12.567000 UTC 16092 INFO  activate: {
	"gpus" : 
	[
		1
	]
}
 - hardware.cc:38
20241114 05:50:12.567000 UTC 16092 INFO  Origin:  - main.cc:160
20241114 05:50:37.058000 UTC 15972 INFO  Origin:  - main.cc:160
20241114 05:50:37.058000 UTC 11104 INFO  activate: {
	"gpus" : []
}
 - hardware.cc:38
20241114 05:50:37.058000 UTC 11104 INFO  Origin:  - main.cc:160
20241114 06:32:23.404000 UTC 11156 INFO  Origin:  - main.cc:160
20241114 06:32:23.404000 UTC 6656 INFO  activate: {
	"gpus" : []
}
 - hardware.cc:38
20241114 06:32:23.404000 UTC 6656 INFO  Origin:  - main.cc:160

vansangpfiev · 2024-11-14T08:00:50Z

@gabrielle-ong Can you please try again with nightly 236?

gabrielle-ong · 2024-11-20T07:38:25Z

Thanks Sang! Successfully activated and deactivated GPUs with the CLI and API, marking as complete.

Using GPU

Using CPU

vansangpfiev added the `type: epic` label Oct 29, 2024

github-project-automation bot added this to Menlo Oct 29, 2024

github-project-automation bot moved this to Investigating in Menlo Oct 29, 2024

vansangpfiev self-assigned this Oct 29, 2024

vansangpfiev moved this from Investigating to In Progress in Menlo Oct 29, 2024

dan-menlo mentioned this issue Oct 31, 2024

planning: Cortex Hardware API #1165

Closed

11 tasks

dan-menlo changed the title ~~epic: Implement Cortex Hardware API~~ epic: Implement Cortex Hardware API for Nvidia Nov 8, 2024

vansangpfiev moved this from In Progress to Review + QA in Menlo Nov 14, 2024

vansangpfiev mentioned this issue Nov 14, 2024

fix: bypass check if activate GPU list is empty #1682

Merged

3 tasks

gabrielle-ong modified the milestones: v1.0.4, v1.0.3 Nov 18, 2024

gabrielle-ong closed this as completed Nov 20, 2024

gabrielle-ong moved this from Review + QA to Completed in Menlo Nov 20, 2024

This was referenced Nov 20, 2024

Implement hardware detection function #1591

Closed

Create API for hardware activation (Nvidia) #1603

Closed

This was referenced Nov 28, 2024

Sprint 26 Planning #1735

Closed

roadmap: Jan has Hardware Controls and System Monitor and Prioritization janhq/jan#3908

Open

vansangpfiev · 2024-10-29T06:40:01Z (edited by gabrielle-ong)

Tasklist

Hardware API

/engines

/model/start

Jan

Bugs to Address

Related bugs:

Out-of-scope
