
Llama3.2-vision Run Error #7300

Closed
mruckman1 opened this issue Oct 21, 2024 · 21 comments
Labels
bug Something isn't working

Comments

@mruckman1

What is the issue?

  1. Updated Ollama this morning.
  2. Ran ollama run x/llama3.2-vision on a MacBook.
  3. Got the output below:

pulling manifest
pulling 652e85aa1e14... 100% ▕████████████████▏ 6.0 GB
pulling 622429e8d318... 100% ▕████████████████▏ 1.9 GB
pulling 962e0f69a367... 100% ▕████████████████▏ 163 B
pulling dc49c86b8ebb... 100% ▕████████████████▏ 30 B
pulling 6a50468ba2a8... 100% ▕████████████████▏ 498 B
verifying sha256 digest
writing manifest
success
> Error: llama runner process has terminated: error:Missing required key: clip.has_text_encoder

Expected: the model downloads and runs without error.

OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.3.14

@mruckman1 mruckman1 added the bug Something isn't working label Oct 21, 2024
@rick-github
Collaborator

Vision support was merged recently (#6963), 0.3.14 doesn't include it.
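(For anyone comparing versions by eye: a minimal sketch of why 0.3.14 predates 0.4.0, using plain numeric tuple comparison. The helper name is hypothetical; real version schemes with suffixes like "-rc8" need a proper parser.)

```python
# Hypothetical helper: compare simple dotted version strings numerically.
def version_tuple(v: str) -> tuple:
    return tuple(int(part) for part in v.split("."))

# 0.3.14 sorts before 0.4.0, so it predates the release that
# includes the vision support merged in #6963.
assert version_tuple("0.3.14") < version_tuple("0.4.0")
```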

@silasalves

What does "vision support" mean? Does it enable submitting multiple images for inference, or video inference? Or is it just support for this particular model?

AFAIK, video and multiple images are still an open issue (#3184).

@rick-github
Collaborator

Vision support for llama3.2. llama3.2 doesn't do video, and doesn't work reliably with multiple images.

@pavan-otthi123

pavan-otthi123 commented Oct 22, 2024

Does this mean that llama3.2-vision can't be used in the current version of Ollama?

I'm also getting the same error when attempting to run the model

@rick-github
Collaborator

Version 0.4.0 will support llama3.2-vision.

@Animaxx

Animaxx commented Oct 22, 2024

Thank you for the hard work. Could we also bring this change to the llama.cpp repo?
And how can we convert the model from HF to GGUF with the llama vision structure?

@silasalves

@rick-github thanks for the clarification! Also, any plans for making it run on the GPU? Llama3.2 runs on my GPU (GTX1660Ti), but llama3.2-vision runs on CPU only.

@jessegross
Contributor

@rick-github thanks for the clarification! Also, any plans for making it run on the GPU? Llama3.2 runs on my GPU (GTX1660Ti), but llama3.2-vision runs on CPU only.

It can run on the GPU, but it needs more RAM than the text-only versions, so it has likely exceeded the limit of your GPU.

@rick-github
Collaborator

It should run on GPU if it fits:

$ ollama ps
NAME                            ID              SIZE    PROCESSOR       UNTIL   
x/llama3.2-vision:latest        25e973636a29    11 GB   100% GPU        Forever

If you can provide server logs perhaps we can see why it's not working for you.

@silasalves

@jessegross Thanks for pointing that out. That sounds correct; my GPU is quite old and has only 4 GB of RAM.

@rick-github Thanks for the support, this is my server.log https://gist.github.com/silasalves/f2bdfc195618f19ecd557b945cab32b9

I think this is the important part?

time=2024-10-22T14:22:10.644-04:00 level=INFO source=llama-server.go:72 msg="system memory" total="31.9 GiB" free="13.6 GiB" free_swap="19.0 GiB"
time=2024-10-22T14:22:10.649-04:00 level=INFO source=memory.go:346 msg="offload to cuda" projector.weights="1.8 GiB" projector.graph="2.8 GiB" layers.requested=-1 layers.model=41 layers.offload=0 layers.split="" memory.available="[4.1 GiB]" memory.gpu_overhead="0 B" memory.required.full="5.9 GiB" memory.required.partial="0 B" memory.required.kv="320.0 MiB" memory.required.allocations="[0 B]" memory.weights.total="5.2 GiB" memory.weights.repeating="4.8 GiB" memory.weights.nonrepeating="411.0 MiB" memory.graph.full="213.3 MiB" memory.graph.partial="213.3 MiB"

@rick-github
Collaborator

Yep, too big for your card.
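(The arithmetic is right there in the "offload to cuda" log line: memory.required.full is 5.9 GiB against 4.1 GiB available, so zero layers are offloaded. A rough sketch of that fit check, illustrative only, not Ollama's actual scheduler code:)

```python
# Rough sketch of the fit decision reflected in the log line above;
# values are taken from the "offload to cuda" entry.
GIB = 1024 ** 3

memory_available = 4.1 * GIB       # memory.available="[4.1 GiB]" (4 GB-class card)
memory_required_full = 5.9 * GIB   # memory.required.full="5.9 GiB"
layers_model = 41                  # layers.model=41

# If the full load (weights + graph + KV cache) doesn't fit in VRAM,
# nothing is offloaded and the model runs on CPU, matching
# layers.offload=0 in the log.
layers_offload = layers_model if memory_required_full <= memory_available else 0
```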

@pdevine
Contributor

pdevine commented Oct 23, 2024

@Animaxx unfortunately backporting it to work with llama.cpp would be tricky because the image preparsing step is written in Go, not C++.

I'm going to go ahead and close the issue since things are working as expected. You just need to use the pre-release to make it work.

@ludos1978

I've read that Ollama 0.4 should support vision tasks, but I also understood that 0.3.14 should be able to load the x/llama3.2-vision model. Is that correct?

If so, I'm getting the same error as mentioned above, on a 90 GB M2 MacBook using 0.3.14:
Error: llama runner process has terminated: error:Missing required key: clip.has_text_encoder

@rick-github
Collaborator

rick-github commented Oct 25, 2024

0.3.14 cannot load x/llama3.2-vision.

@eulercat

eulercat commented Oct 26, 2024

@pdevine
Is it possible to use the REST API like this on the latest version?

curl -X POST http://127.0.0.1:11434/api/chat \
-H "Content-Type: application/json" \
-d '{ "model": "x/llama3.2-vision", 
 "message": [
     {"role": "user", 
      "content": [
        {"type": "text", "text": "What’s in this image?"},
        {
          "type": "image_url",
          "image_url": {
            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
          },
        },
      ],
     }
] }'

@pdevine
Contributor

pdevine commented Oct 28, 2024

@eulercat we don't support pulling images w/ image_url. You'll have to base64 encode your image, so it looks like:

curl http://localhost:11434/api/chat -d '{
  "model": "x/llama3.2-vision",
  "messages": [
    {
      "role": "user",
      "content": "what is in this image?",
      "images": ["iVBORw0KGgoAAAANSUhEUgAAAG0AAABmCAYAAADBPx+VAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAA3VSURBVHgB7Z27r0zdG8fX743i1bi1ikMoFMQloXRpKFFIqI7LH4BEQ+NWIkjQuSWCRIEoULk0gsK1kCBI0IhrQVT7tz/7zZo888yz1r7MnDl7z5xvsjkzs2fP3uu71nNfa7lkAsm7d++Sffv2JbNmzUqcc8m0adOSzZs3Z+/XES4ZckAWJEGWPiCxjsQNLWmQsWjRIpMseaxcuTKpG/7HP27I8P79e7dq1ars/yL4/v27S0ejqwv+cUOGEGGpKHR37tzJCEpHV9tnT58+dXXCJDdECBE2Ojrqjh071hpNECjx4cMHVycM1Uhbv359B2F79+51586daxN/+pyRkRFXKyRDAqxEp4yMlDDzXG1NPnnyJKkThoK0VFd1ELZu3TrzXKxKfW7dMBQ6bcuWLW2v0VlHjx41z717927ba22U9APcw7Nnz1oGEPeL3m3p2mTAYYnFmMOMXybPPXv2bNIPpFZr1NHn4HMw0KRBjg9NuRw95s8PEcz/6DZELQd/09C9QGq5RsmSRybqkwHGjh07OsJSsYYm3ijPpyHzoiacg35MLdDSIS/O1yM778jOTwYUkKNHWUzUWaOsylE00MyI0fcnOwIdjvtNdW/HZwNLGg+sR1kMepSNJXmIwxBZiG8tDTpEZzKg0GItNsosY8USkxDhD0Rinuiko2gfL/RbiD2LZAjU9zKQJj8RDR0vJBR1/Phx9+PHj9Z7REF4nTZkxzX4LCXHrV271qXkBAPGfP/atWvu/PnzHe4C97F48eIsRLZ9+3a3f/9+87dwP1JxaF7/3r17ba+5l4EcaVo0lj3SBq5kGTJSQmLWMjgYNei2GPT1MuMqGTDEFHzeQSP2wi/jGnkmPJ/nhccs44jvDAxpVcxnq0F6eT8h4ni/iIWpR5lPyA6ETkNXoSukvpJAD3AsXLiwpZs49+fPn5ke4j10TqYvegSfn0OnafC+Tv9ooA/JPkgQysqQNBzagXY55nO/oa1F7qvIPWkRL12WRpMWUvpVDYmxAPehxWSe8ZEXL20sadYIozfmNch4QJPAfeJgW3rNsnzphBKNJM2KKODo1rVOMRYik5ETy3ix4qWNI81qAAirizgMIc+yhTytx0JWZuNI03qsrgWlGtwjoS9XwgUhWGyhUaRZZQNNIEwCiXD16tXcAHUs79co0vSD8rrJCIW98pzvxpAWyyo3HYwqS0+H0BjStClcZJT5coMm6D2LOF8TolGJtK9fvyZpyiC5ePFi9nc/oJU4eiEP0jVoAnHa9wyJycITMP78+eMeP37sXrx44d6+fdt6f82aNdkx1pg9e3Zb5W+RSRE+n+VjksQWifvVaTKFhn5O8my63K8Qabdv33b379/PiAP//vuvW7BggZszZ072/+TJk91YgkafPn166zXB1rQHFvouAWHq9z3SEevSUerqCn2/dDCeta2jxYbr69evk4MHDyY7d+7MjhMnTiTPnz9Pfv/+nfQT2ggpO2dMF8cghuoM7Ygj5iWCqRlGFml0QC/ftGmTmzt3rmsaKDsgBSPh0/8yPeLLBihLkOKJc0jp8H8vUzcxIA1k6QJ/c78tWEyj5P3o4u9+jywNPdJi5rAH9x0KHcl4Hg570eQp3+vHXGyrmEeigzQsQsjavXt38ujRo44LQuDDhw+TW7duRS1HGgMxhNXHgflaNTOsHyKvHK5Ijo2jbFjJBQK9YwFd6RVMzfgRBmEfP37suBBm/p49e1qjEP2mwTViNRo0VJWH1deMXcNK08uUjVUu7s/zRaL+oLNxz1bpANco4npUgX4G2eFbpDFyQoQxojBCpEGSytmOH8qrH5Q9vuzD6ofQylkCUmh8DBAr+q8JCyVNtWQIidKQE9wNtLSQnS
4jDSsxNHogzFuQBw4cyM61UKVsjfr3ooBkPSqqQHesUPWVtzi9/vQi1T+rJj7WiTz4Pt/l3LxUkr5P2VYZaZ4URpsE+st/dujQoaBBYokbrz/8TJNQYLSonrPS9kUaSkPeZyj1AWSj+d+VBoy1pIWVNed8P0Ll/ee5HdGRhrHhR5GGN0r4LGZBaj8oFDJitBTJzIZgFcmU0Y8ytWMZMzJOaXUSrUs5RxKnrxmbb5YXO9VGUhtpXldhEUogFr3IzIsvlpmdosVcGVGXFWp2oU9kLFL3dEkSz6NHEY1sjSRdIuDFWEhd8KxFqsRi1uM/nz9/zpxnwlESONdg6dKlbsaMGS4EHFHtjFIDHwKOo46l4TxSuxgDzi+rE2jg+BaFruOX4HXa0Nnf1lwAPufZeF8/r6zD97WK2qFnGjBxTw5qNGPxT+5T/r7/7RawFC3j4vTp09koCxkeHjqbHJqArmH5UrFKKksnxrK7FuRIs8STfBZv+luugXZ2pR/pP9Ois4z+TiMzUUkUjD0iEi1fzX8GmXyuxUBRcaUfykV0YZnlJGKQpOiGB76x5GeWkWWJc3mOrK6S7xdND+W5N6XyaRgtWJFe13GkaZnKOsYqGdOVVVbGupsyA/l7emTLHi7vwTdirNEt0qxnzAvBFcnQF16xh/TMpUuXHDowhlA9vQVraQhkudRdzOnK+04ZSP3DUhVSP61YsaLtd/ks7ZgtPcXqPqEafHkdqa84X6aCeL7YWlv6edGFHb+ZFICPlljHhg0bKuk0CSvVznWsotRu433alNdFrqG45ejoaPCaUkWERpLXjzFL2Rpllp7PJU2a/v7Ab8N05/9t27Z16KUqoFGsxnI9EosS2niSYg9SpU6B4JgTrvVW1flt1sT+0ADIJU2maXzcUTraGCRaL1Wp9rUMk16PMom8QhruxzvZIegJjFU7LLCePfS8uaQdPny4jTTL0dbee5mYokQsXTIWNY46kuMbnt8Kmec+LGWtOVIl9cT1rCB0V8WqkjAsRwta93TbwNYoGKsUSChN44lgBNCoHLHzquYKrU6qZ8lolCIN0Rh6cP0Q3U6I6IXILYOQI513hJaSKAorFpuHXJNfVlpRtmYBk1Su1obZr5dnKAO+L10Hrj3WZW+E3qh6IszE37F6EB+68mGpvKm4eb9bFrlzrok7fvr0Kfv727dvWRmdVTJHw0qiiCUSZ6wCK+7XL/AcsgNyL74DQQ730sv78Su7+t/A36MdY0sW5o40ahslXr58aZ5HtZB8GH64m9EmMZ7FpYw4T6QnrZfgenrhFxaSiSGXtPnz57e9TkNZLvTjeqhr734CNtrK41L40sUQckmj1lGKQ0rC37x544r8eNXRpnVE3ZZY7zXo8NomiO0ZUCj2uHz58rbXoZ6gc0uA+F6ZeKS/jhRDUq8MKrTho9fEkihMmhxtBI1DxKFY9XLpVcSkfoi8JGnToZO5sU5aiDQIW716ddt7ZLYtMQlhECdBGXZZMWldY5BHm5xgAroWj4C0hbYkSc/jBmggIrXJWlZM6pSETsEPGqZOndr2uuuR5rF169a2HoHPdurUKZM4CO1WTPqaDaAd+GFGKdIQkxAn9RuEWcTRyN2KSUgiSgF5aWzPTeA/lN5rZubMmR2bE4SIC4nJoltgAV/dVefZm72AtctUCJU2CMJ327hxY9t7EHbkyJFseq+EJSY16RPo3Dkq1kkr7+q0bNmyDuLQcZBEPYmHVdOBiJyIlrRDq41YPWfXOxUysi5fvtyaj+2BpcnsUV/oSoEMOk2CQGlr4ckhBwaetBhjCwH0ZHtJROPJkyc7UjcYLDjmrH7ADTEBXFfOYmB0k9oYBOjJ8b4aOYSe7QkKcYhFlq3QYLQhSidNmtS2RATwy8YOM3EQJsUjKiaWZ+vZToUQgzhkHXudb/PW5YMHD9yZM2faPsMwoc7RciYJXbGuBqJ1UIGKKLv915jsvgtJxCZDubdXr165mzdvtr1Hz5LONA8jrU
wKPqsmVesKa49S3Q4WxmRPUEYdTjgiUcfUwLx589ySJUva3oMkP6IYddq6HMS4o55xBJBUeRjzfa4Zdeg56QZ43LhxoyPo7Lf1kNt7oO8wWAbNwaYjIv5lhyS7kRf96dvm5Jah8vfvX3flyhX35cuX6HfzFHOToS1H4BenCaHvO8pr8iDuwoUL7tevX+b5ZdbBair0xkFIlFDlW4ZknEClsp/TzXyAKVOmmHWFVSbDNw1l1+4f90U6IY/q4V27dpnE9bJ+v87QEydjqx/UamVVPRG+mwkNTYN+9tjkwzEx+atCm/X9WvWtDtAb68Wy9LXa1UmvCDDIpPkyOQ5ZwSzJ4jMrvFcr0rSjOUh+GcT4LSg5ugkW1Io0/SCDQBojh0hPlaJdah+tkVYrnTZowP8iq1F1TgMBBauufyB33x1v+NWFYmT5KmppgHC+NkAgbmRkpD3yn9QIseXymoTQFGQmIOKTxiZIWpvAatenVqRVXf2nTrAWMsPnKrMZHz6bJq5jvce6QK8J1cQNgKxlJapMPdZSR64/UivS9NztpkVEdKcrs5alhhWP9NeqlfWopzhZScI6QxseegZRGeg5a8C3Re1Mfl1ScP36ddcUaMuv24iOJtz7sbUjTS4qBvKmstYJoUauiuD3k5qhyr7QdUHMeCgLa1Ear9NquemdXgmum4fvJ6w1lqsuDhNrg1qSpleJK7K3TF0Q2jSd94uSZ60kK1e3qyVpQK6PVWXp2/FC3mp6jBhKKOiY2h3gtUV64TWM6wDETRPLDfSakXmH3w8g9Jlug8ZtTt4kVF0kLUYYmCCtD/DrQ5YhMGbA9L3ucdjh0y8kOHW5gU/VEEmJTcL4Pz/f7mgoAbYkAAAAAElFTkSuQmCC"]
    }
  ]
}'

You can find more information in the Ollama API documentation.
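(The same request body can be built programmatically. A minimal Python sketch, assuming a local image file and the /api/chat endpoint shown above; the helper name is hypothetical. Note the "images" entries must be the raw base64 encoding of the bytes, with no "data:" URL prefix:)

```python
import base64
import json

def build_chat_payload(model: str, prompt: str, image_bytes: bytes) -> str:
    """Build the JSON body for POST /api/chat with one inline image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({
        "model": model,
        "messages": [
            {"role": "user", "content": prompt, "images": [encoded]},
        ],
    })

# POST the returned string to http://localhost:11434/api/chat.
payload = build_chat_payload("x/llama3.2-vision", "what is in this image?",
                             b"<image bytes here>")
```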

@pdevine
Contributor

pdevine commented Oct 28, 2024

@ludos1978 you'll need 0.4.0 for it to work. Unfortunately we're still working through some issues w/ the release candidates.

@rick-github
Collaborator

If the image is large, it will exceed the maximum argument length of the shell.

(echo '{
         "model":"x/llama3.2-vision",
         "messages":[
           { "role":"user",
             "content":"describe this image",
             "images":["' ;
               curl -s https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg | base64 -w0 ; echo '"
             ]
           }
         ],
         "stream":false
       }') | curl -s localhost:11434/api/chat -d @- | jq
{
  "model": "x/llama3.2-vision",
  "created_at": "2024-10-28T23:14:35.376161501Z",
  "message": {
    "role": "assistant",
    "content": "The image depicts a serene and peaceful scene, with a wooden boardwalk winding its way through a lush grassy field. The boardwalk is made of light-colored wood and features a simple design, with no visible railings or obstacles to obstruct the view.\n\nAs the boardwalk stretches out into the distance, it disappears from sight, inviting the viewer to imagine where it might lead. The surrounding grass is tall and green, swaying gently in the breeze, while trees dot the horizon, adding depth and texture to the landscape.\n\nAbove, a brilliant blue sky with white clouds provides a stunning backdrop, casting dappled shadows across the boardwalk and creating a sense of warmth and tranquility. Overall, the image exudes a sense of calmness and serenity, inviting the viewer to step into its peaceful world."
  },
  "done_reason": "stop",
  "done": true,
  "total_duration": 3744887728,
  "load_duration": 34980268,
  "prompt_eval_count": 13,
  "prompt_eval_duration": 45000000,
  "eval_count": 164,
  "eval_duration": 3302000000
}
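(The duration fields in the response above are in nanoseconds, so generation speed falls out directly. A quick sketch using the numbers from this reply:)

```python
# Duration fields in Ollama responses are nanoseconds.
eval_count = 164                # tokens generated ("eval_count")
eval_duration = 3_302_000_000   # ns ("eval_duration")

# Throughput: roughly 49.7 tokens per second for this reply.
tokens_per_second = eval_count / (eval_duration / 1e9)
```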

@jhowilbur

@Animaxx unfortunately backporting it to work with llama.cpp would be tricky because the image preparsing step is written in Go, not C++.

I'm going to go ahead and close the issue since things are working as expected. You just need to use the pre-release to make it work.

But with some effort, I believe it would be possible to use a Go binding to the C++ code; they did it with whisper.cpp:
https://github.com/ggerganov/whisper.cpp/tree/master/bindings/go

To our surprise, it calls the same libraries as llama.cpp: the core tensor-computation library, GGML, written in C++.

@delenius

delenius commented Nov 5, 2024

I am getting the same error on an M3 MacBook with 64 GB, with Ollama 0.4.0-rc8.

@rick-github
Collaborator

Server logs will help in debugging.

$ curl localhost:11434/api/version
{"version":"0.4.0-rc8"}
$ (echo '{
         "model":"x/llama3.2-vision",
         "messages":[
           { "role":"user",
             "content":"describe this image",
             "images":["' ;
               curl -s https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg | base64 -w0 ; echo '"
             ]
           }
         ],
         "stream":false
       }') | curl -s localhost:11434/api/chat -d @- | jq
{
  "model": "x/llama3.2-vision",
  "created_at": "2024-11-05T16:15:16.856668179Z",
  "message": {
    "role": "assistant",
    "content": "The image depicts a serene and peaceful scene, with a wooden boardwalk winding its way through a lush grassy field. The purpose of the image is to showcase the beauty of nature and the tranquility that can be found in such settings.\n\n* A wooden boardwalk:\n\t+ Winding its way through a grassy field\n\t+ Made of light-colored wood planks\n\t+ Surrounded by tall blades of grass on either side\n* Tall grass:\n\t+ Swaying gently in the breeze\n\t+ Varying shades of green, from light to dark\n\t+ Creating a sense of depth and texture in the image\n* Trees in the background:\n\t+ Scattered throughout the field\n\t+ Providing shade and shelter for wildlife\n\t+ Adding to the overall sense of serenity and calmness\n\nThe image effectively captures the beauty and tranquility of nature, inviting the viewer to step into the peaceful atmosphere. The use of natural colors and textures adds to the sense of realism, making the scene feel more immersive and engaging."
  },
  "done_reason": "stop",
  "done": true,
  "total_duration": 79628322199,
  "load_duration": 70623694007,
  "prompt_eval_count": 14,
  "prompt_eval_duration": 2349000000,
  "eval_count": 212,
  "eval_duration": 6235000000
}
