
A numeric error is thrown by pipeline during text-generation model loading #1024

Open
DawitJung opened this issue Nov 13, 2024 · 3 comments
Labels
bug Something isn't working

Comments


DawitJung commented Nov 13, 2024

System Info

  1. Apple M2 Pro
  2. macOS Sequoia v15.0.1
  3. Google Chrome v130.0.6723.116
  4. @huggingface/transformers v3.0.2

Environment/Platform

  • [x] Website/web-app
  • [ ] Browser extension
  • [ ] Server-side (e.g., Node.js, Deno, Bun)
  • [ ] Desktop app (e.g., Electron)
  • [ ] Other (e.g., VSCode extension)

Description

[Screenshot: 2024-11-13, 3:37 PM]

I'm encountering an issue while trying to load a text-generation model using the pipeline function. My code searches for models on Hugging Face by keyword, filters for the "text-generation", "transformers.js", and "onnx" tags, and attempts to load the resulting models. When I attempt to load certain models, an error with a bare numeric code (e.g., 3330359752) is thrown. The number changes on each run, suggesting it does not carry a specific meaning.
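For reference, a rough sketch of that search step (the query parameters here are assumptions about the public Hub API, not code from the report):

// Query the Hugging Face Hub API for text-generation models
// tagged for transformers.js + ONNX, by keyword.
const params = new URLSearchParams({
  search: "llama",                 // example keyword
  filter: "transformers.js,onnx",  // tag filter (assumed syntax)
  pipeline_tag: "text-generation",
});
const res = await fetch(`https://huggingface.co/api/models?${params}`);
const models = await res.json();
console.log(models.map((m) => m.id));  // candidate model IDs to load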

Notably, smaller models seem to load successfully while larger models consistently throw this error, though I'm not entirely certain.

Models that loaded successfully:

  • onnx-community/Llama-3.2-1B-Instruct-q4f16
  • onnx-community/AMD-OLMo-1B-SFT-DPO

Models that failed to load:

  • onnx-community/Llama-3.2-1B-Instruct
  • onnx-community/Llama-3.2-3B-Instruct
  • Xenova/Phi-3-mini-4k-instruct

Reproduction

import { pipeline } from "@huggingface/transformers";

const generator = await pipeline(
  "text-generation",
  "onnx-community/Llama-3.2-3B-Instruct",
  { device: "webgpu" }
);
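A minimal sketch for surfacing the raw rejection value (assuming, unverified, that onnxruntime-web rejects with a bare numeric status/pointer rather than an Error object, which would explain why the number changes between runs):

import { pipeline } from "@huggingface/transformers";

try {
  const generator = await pipeline(
    "text-generation",
    "onnx-community/Llama-3.2-3B-Instruct",
    { device: "webgpu" }
  );
} catch (e) {
  // e may be a plain number (e.g., 3330359752) rather than an Error
  console.error("Model loading failed:", e);
}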
DawitJung added the bug label on Nov 13, 2024
xenova (Collaborator) commented Nov 15, 2024

While this might be an out-of-memory issue, the models have been tested and work in Node.js, so it may be a runtime error specific to WebGPU. cc @guschmue
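A quick isolation test along those lines, as a sketch (assuming transformers.js v3 accepts "wasm" as a device; not a confirmed fix): if the same model loads on the WASM backend but fails on WebGPU, the failure is likely WebGPU-specific.

import { pipeline } from "@huggingface/transformers";

const generator = await pipeline(
  "text-generation",
  "onnx-community/Llama-3.2-3B-Instruct",
  { device: "wasm" }  // swap "webgpu" for the CPU/WASM backend
);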

dreamingtulpa commented

Experiencing the same issue.

import { AutoModel, AutoProcessor } from "@huggingface/transformers";

this.model = await AutoModel.from_pretrained('briaai/RMBG-2.0');
this.processor = await AutoProcessor.from_pretrained('briaai/RMBG-2.0');

Results in Model loading failed: 763518952.

Do you think this is an OOM issue as well?
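One sketch of a workaround to try (hypothetical: the option names assume transformers.js v3's from_pretrained, and whether briaai/RMBG-2.0 ships these quantized ONNX weights is an assumption):

// Hypothetical: request an explicit quantized dtype when loading
this.model = await AutoModel.from_pretrained('briaai/RMBG-2.0', {
  device: 'webgpu',
  dtype: 'fp16',  // or 'q8' / 'q4', if the repo provides those weights
});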


arneyjfs commented Nov 25, 2024

Facing the same issue. I'm admittedly out of my depth in terms of figuring out the cause; however, I've noticed that the dtypes matter, e.g.:

  • As per the OP's message, Llama-3.2-1B-Instruct will not load, but the q4f16 version will.
  • onnx-community/AMD-OLMo-1B-SFT-DPO will not load automatically, but with an explicit { device: "webgpu", dtype: "q4" } it does (see the sketch after this list).
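A minimal sketch of that explicit-dtype load (assuming the transformers.js v3 options used above; "q4" mirrors the workaround just described, and other values such as "q4f16" or "fp16" vary per model):

import { pipeline } from "@huggingface/transformers";

const generator = await pipeline(
  "text-generation",
  "onnx-community/AMD-OLMo-1B-SFT-DPO",
  { device: "webgpu", dtype: "q4" }
);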

Additional Context:
Also on an Apple Silicon M1 chip.
I have been able to load even the Llama-3.1-8B-Instruct-q4f32_1-MLC model using WebLLM, which makes me think memory may not be the issue.
