bug: Arch Detector Causing Special Characters in GPU Name in settings.json #1140

Open
Tracked by #1108
Van-QA opened this issue Aug 26, 2024 · 5 comments
Labels
category: hardware management Related to hardware & compute P1: important Important feature / fix type: bug Something isn't working

Comments

@Van-QA
Contributor

Van-QA commented Aug 26, 2024

Description:
The arch detector is not functioning properly, which introduces special characters into the GPU name in the settings.json file. Specifically, the GPU name is written as "NVIDIA RTX A6000\r", with a trailing carriage return (\r) that should not be there. The issue appears to originate in the GPU architecture detection logic.

Steps to Reproduce:

  1. Run the system and allow it to generate the settings.json file.
  2. Observe the settings.json file, particularly the GPU section:
    "nvidia_driver": {
        "exist": true,
        "version": "560.76",
        "name": "NVIDIA RTX A6000\r"
    }
  3. Note the presence of the special character (\r) in the GPU name.

Expected Result:
The settings.json file should contain a clean, correctly formatted GPU name without any special characters.

Actual Result:
The settings.json file contains a GPU name with an unintended special character (\r), likely due to a flaw in the arch detector logic.
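
One plausible way a trailing \r ends up in a parsed name is splitting CRLF-terminated command output on "\n" alone. The sketch below is only an illustration under that assumption: the raw output string, the field layout, and both function names are hypothetical and are not taken from the extension's code.

```javascript
// Hypothetical raw output, assuming a CSV-style query whose lines end in CRLF.
const rawOutput = "0, 49140, NVIDIA RTX A6000\r\n";

// Naive parsing: splitting on "\n" alone leaves the "\r" attached to the last field.
function parseGpuLineNaive(output) {
  const [firstLine] = output.split("\n");
  const fields = firstLine.split(", ");
  return { id: fields[0], vram: fields[1], name: fields[2] };
}

// Safer parsing: accept either line ending and trim every field.
function parseGpuLine(output) {
  const [firstLine] = output.split(/\r?\n/);
  const [id, vram, name] = firstLine.split(",").map((field) => field.trim());
  return { id, vram, name };
}

// JSON.stringify makes the stray carriage return visible as an escaped "\r".
console.log(JSON.stringify(parseGpuLineNaive(rawOutput).name));
console.log(JSON.stringify(parseGpuLine(rawOutput).name));
```

If this is the mechanism, trimming each field before it is written to settings.json would remove the character regardless of platform.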

Reported by: cbai970
https://discord.com/channels/1107178041848909847/1277272158367776828/1277291486198894657

@Van-QA Van-QA added the type: bug Something isn't working label Aug 26, 2024
@cbai970

cbai970 commented Aug 26, 2024

I'm subscribing to this. When you have a solution, I'll retest on the same platform.

@cbai970

cbai970 commented Aug 26, 2024

I'm happy to help move this forward, but I don't know the full .js tree or how it gets created up to the point where we hit this bug. I've been reading through as much Tensorflow and Tensorflow-LLM documentation as I can, but I'm not entirely sure.

If someone could tell me how we end up getting to "index.js" in the spawning of the tree, I'll do my best to post what I find (if it helps). I am under the assumption that this function ends up calling a "compile from source against Arch(itecture)" thread, but maybe I'm wrong.

Give me a little, friends, and I'll give you a lot :) P.S. I am a former defect research engineer...

@louis-jan
Contributor

Root cause:
The extension only detects GPU models whose model number starts with 30 or 40. This is incorrect; it should cover other cases (e.g. Axx).
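
To illustrate the failure mode described here, the following is a minimal sketch of a detector that keys off the first two digits of the model number. It is a hypothetical reconstruction for discussion, not the extension's actual code; the function name and the returned architecture labels are assumptions.

```javascript
// Hypothetical detector: recognizes only model numbers that start with 30 or 40.
function detectArchByModelPrefix(gpuName) {
  // Look for a standalone four-digit model number and read its first two digits.
  const match = gpuName.match(/\b(\d{2})\d{2}\b/);
  if (!match) return "unknown";
  if (match[1] === "30") return "ampere";
  if (match[1] === "40") return "ada";
  return "unknown";
}

// Consumer cards are recognized; a workstation card such as the A6000 is not,
// because "A6000" carries a letter prefix instead of a standalone number.
console.log(detectArchByModelPrefix("NVIDIA GeForce RTX 3090"));
console.log(detectArchByModelPrefix("NVIDIA GeForce RTX 4090"));
console.log(detectArchByModelPrefix("NVIDIA RTX A6000"));
```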

@louis-jan
Contributor

Is a possible fix coming from cortex-cpp? cc @imtuyethan @Van-QA

@cbai970
Copy link

cbai970 commented Aug 28, 2024

> Root cause: The extension only detects GPU models whose model number starts with 30 or 40. This is incorrect; it should cover other cases (e.g. Axx).

I would go with CUDA compute capability levels unless there are additional architectural reasons not to; I could not find any.
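
A rough sketch of what detection by compute capability could look like, assuming the capability is already available as a string such as "8.6" (how it is obtained is left out here). The table covers only a handful of well-known values, and the map name, function name, and architecture labels are hypothetical.

```javascript
// Partial lookup table from CUDA compute capability to architecture name.
const ARCH_BY_COMPUTE_CAPABILITY = new Map([
  ["7.0", "volta"],
  ["7.5", "turing"],
  ["8.0", "ampere"],
  ["8.6", "ampere"], // RTX 30 series and RTX A6000
  ["8.7", "ampere"],
  ["8.9", "ada"], // RTX 40 series
  ["9.0", "hopper"],
]);

function archFromComputeCapability(computeCap) {
  // Trim first so stray whitespace or a trailing "\r" cannot break the lookup.
  const key = String(computeCap).trim();
  return ARCH_BY_COMPUTE_CAPABILITY.get(key) ?? "unknown";
}

console.log(archFromComputeCapability("8.6\r"));
console.log(archFromComputeCapability("8.9"));
```

With this approach the RTX A6000 and the RTX 30 series resolve to the same architecture without any name matching, since both report compute capability 8.6.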

I was actually trying to figure out how to code this myself, but (again) I'm not a programmer and am kind of afraid of breaking stuff. Also, this all gets autogenerated when Jan first executes (and regenerates when you reset; I definitely tested that), so I'm not sure where it all originates.

@freelerobot freelerobot added the P1: important Important feature / fix label Sep 5, 2024
@freelerobot freelerobot transferred this issue from janhq/jan Sep 6, 2024
@freelerobot freelerobot added the category: hardware management Related to hardware & compute label Sep 6, 2024
@dan-menlo dan-menlo moved this to Scheduled in Menlo Sep 8, 2024
@dan-menlo dan-menlo moved this from Scheduled to Triage in Menlo Sep 29, 2024
@freelerobot freelerobot moved this from Investigating to Planning in Menlo Oct 15, 2024
@dan-menlo dan-menlo moved this from Planning to Investigating in Menlo Oct 31, 2024
Projects
Status: Investigating
Development

No branches or pull requests

8 participants
@namchuai @vansangpfiev @nguyenhoangthuan99 @Van-QA @freelerobot @louis-jan @cbai970 and others