
A numeric error is thrown by pipeline during text-generation model loading #1024

Open
DawitJung opened this issue Nov 13, 2024 · 3 comments
Labels
bug Something isn't working

Comments


DawitJung commented Nov 13, 2024

System Info

  1. Apple M2 Pro
  2. macOS Sequoia v15.0.1
  3. Google Chrome v130.0.6723.116
  4. @huggingface/transformers v3.0.2

Environment/Platform

  • [x] Website/web-app
  • [ ] Browser extension
  • [ ] Server-side (e.g., Node.js, Deno, Bun)
  • [ ] Desktop app (e.g., Electron)
  • [ ] Other (e.g., VSCode extension)

Description

[Screenshot: 2024-11-13, 3:37 PM]

I'm encountering an issue while trying to load a text-generation model using the pipeline function. My code searches for models on Hugging Face by keyword, filters for the "text-generation", "transformers.js", and "onnx" tags, and attempts to load the resulting models. When I attempt to load certain models, an error with a bare numeric code (e.g., 3330359752) is thrown. The number changes on each run, suggesting it does not carry a specific meaning.
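For reference, a rough sketch of that search step (the query parameters here are assumptions about the public Hub API, not code from the report):

// Query the Hugging Face Hub API for text-generation models
// tagged for transformers.js + ONNX, by keyword.
const params = new URLSearchParams({
  search: "llama",                 // example keyword
  filter: "transformers.js,onnx",  // tag filter (assumed syntax)
  pipeline_tag: "text-generation",
});
const res = await fetch(`https://huggingface.co/api/models?${params}`);
const models = await res.json();
console.log(models.map((m) => m.id));  // candidate model IDs to load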

Notably, smaller models seem to load successfully while larger models consistently throw this error, though I'm not entirely certain.

Models that loaded successfully:

  • onnx-community/Llama-3.2-1B-Instruct-q4f16
  • onnx-community/AMD-OLMo-1B-SFT-DPO

Models that failed to load:

  • onnx-community/Llama-3.2-1B-Instruct
  • onnx-community/Llama-3.2-3B-Instruct
  • Xenova/Phi-3-mini-4k-instruct

Reproduction

import { pipeline } from "@huggingface/transformers";

const generator = await pipeline(
  "text-generation",
  "onnx-community/Llama-3.2-3B-Instruct",
  { device: "webgpu" }
);
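A minimal sketch for surfacing the raw rejection value (assuming, unverified, that onnxruntime-web rejects with a bare numeric status/pointer rather than an Error object, which would explain why the number changes between runs):

import { pipeline } from "@huggingface/transformers";

try {
  const generator = await pipeline(
    "text-generation",
    "onnx-community/Llama-3.2-3B-Instruct",
    { device: "webgpu" }
  );
} catch (e) {
  // e may be a plain number (e.g., 3330359752) rather than an Error
  console.error("Model loading failed:", e);
}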
DawitJung added the bug label on Nov 13, 2024
xenova (Collaborator) commented Nov 15, 2024

While this might be an out-of-memory issue, the models have been tested and work in Node.js, so it may be a runtime error specific to WebGPU. cc @guschmue
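A quick isolation test along those lines, as a sketch (assuming transformers.js v3 accepts "wasm" as a device; not a confirmed fix): if the same model loads on the WASM backend but fails on WebGPU, the failure is likely WebGPU-specific.

import { pipeline } from "@huggingface/transformers";

const generator = await pipeline(
  "text-generation",
  "onnx-community/Llama-3.2-3B-Instruct",
  { device: "wasm" }  // swap "webgpu" for the CPU/WASM backend
);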

dreamingtulpa commented

Experiencing the same issue.

import { AutoModel, AutoProcessor } from "@huggingface/transformers";

this.model = await AutoModel.from_pretrained('briaai/RMBG-2.0');
this.processor = await AutoProcessor.from_pretrained('briaai/RMBG-2.0');

Results in Model loading failed: 763518952.

Do you think this is an OOM issue as well?
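One sketch of a workaround to try (hypothetical: the option names assume transformers.js v3's from_pretrained, and whether briaai/RMBG-2.0 ships these quantized ONNX weights is an assumption):

// Hypothetical: request an explicit quantized dtype when loading
this.model = await AutoModel.from_pretrained('briaai/RMBG-2.0', {
  device: 'webgpu',
  dtype: 'fp16',  // or 'q8' / 'q4', if the repo provides those weights
});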


arneyjfs commented Nov 25, 2024

Facing the same issue. I'm admittedly out of my depth in terms of figuring out the cause; however, I've noticed that the dtypes matter, e.g.:

  • As per the OP's message, Llama-3.2-1B-Instruct will not load, but the q4f16 version will.
  • onnx-community/AMD-OLMo-1B-SFT-DPO will not load automatically, but with an explicit { device: "webgpu", dtype: "q4" } it does (see the sketch after this list).
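A minimal sketch of that explicit-dtype load (assuming the transformers.js v3 options used above; "q4" mirrors the workaround just described, and other values such as "q4f16" or "fp16" vary per model):

import { pipeline } from "@huggingface/transformers";

const generator = await pipeline(
  "text-generation",
  "onnx-community/AMD-OLMo-1B-SFT-DPO",
  { device: "webgpu", dtype: "q4" }
);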

Additional Context:
Also on an Apple Silicon M1 chip.
I have been able to load even the Llama-3.1-8B-Instruct-q4f32_1-MLC model using WebLLM, which makes me think memory may not be the issue.
