add ollama build container to packages #465
Conversation
@dusty-nv I could use some help with insight into the testing/build process if you have some time. I've created a PR to add an ollama package to jetson-containers. The container seems to build correctly, but it keeps hanging on the test phase regardless of what I change in the test script or in the config.py file. I removed the … Here's the end of the log after the build:
…
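A common way to keep a test like this from hanging is to launch the server in the background and poll it with a timeout. A minimal sketch of such a test script, assuming the ollama binary is on PATH inside the container and serving on its default port 11434 (the structure is illustrative, not the actual test.sh from this PR):

```bash
#!/usr/bin/env bash
# Hypothetical non-blocking test: start the server in the background,
# poll until it responds (or give up), then run one smoke test.
set -e

ollama serve &        # API server listens on 127.0.0.1:11434 by default
SERVER_PID=$!
trap 'kill $SERVER_PID 2>/dev/null' EXIT

# Poll for up to 30 seconds instead of blocking the test phase forever
for i in $(seq 1 30); do
    if curl -sf http://127.0.0.1:11434/ > /dev/null; then
        break
    fi
    sleep 1
done

# Simple smoke test against the version endpoint
curl -sf http://127.0.0.1:11434/api/version
echo "ollama server is up"
```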
Thanks @remy415, we appreciate it! 🙏😄 I will try this and figure out what is going on with the tests! The tests aside, are you actually able to run a model through ollama in the container?
In the bottom screenshot, the green GPU area was at nearly 100% usage throughout the response. It answered my question very quickly; I expected it to take longer since I'm used to running Mistral 7B, but it worked fantastically. I did need to ensure I set …
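GPU load like this can also be checked from a second terminal while the model is generating, for example with the stock tegrastats utility (jtop is a common alternative, and may be what the screenshots show):

```bash
# Print utilization once per second while ollama generates a response;
# GR3D_FREQ is the GPU load field and should sit near 100% if
# inference is actually running on the GPU.
sudo tegrastats --interval 1000
```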
Note that with the …
It took 14 seconds to execute the docker run, type in "What is CUDA?", get a response, and type in "/bye". Tinyllama is fast on the Orin Nano, though I think it's a bit confused as to the meaning of CUDA. Can't complain though; the model is <1 GB.
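For reference, that invocation would look roughly like the following; the image name and tag are placeholders (the actual tags depend on the JetPack version):

```bash
# Run tinyllama interactively inside the container on a Jetson;
# --runtime nvidia exposes the GPU. Image name/tag is a placeholder.
docker run -it --rm --runtime nvidia --network host \
    dustynv/ollama:r36.2.0 \
    ollama run tinyllama
# ask "What is CUDA?", then exit the REPL with /bye
```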
Ok great, seems like it is working well and using the GPU, awesome work! When I merge/test this, I will add …
@dusty-nv great, thank you! I forgot to mention that I couldn't find much documentation on the implementation of the various benchmarks in the packages folder. The benchmark I have set up runs the server in the current terminal and then attempts to curl the API. It may be better to assume the backend is already running and just pass the server IP in for the curl.
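Concretely, that suggestion amounts to something like this; the endpoint and payload follow ollama's public REST API, while the SERVER_IP variable is illustrative:

```bash
# Hit an already-running ollama server rather than starting one in the
# test; /api/generate is the standard completion endpoint on port 11434.
SERVER_IP=${SERVER_IP:-127.0.0.1}

curl -s "http://${SERVER_IP}:11434/api/generate" -d '{
  "model": "tinyllama",
  "prompt": "What is CUDA?",
  "stream": false
}'
```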
@remy415 ok! I made some minor tweaks to the ollama container in 413d5af, got it working, pushed images for JP5/JP6 to DockerHub, and merged it into master. Thank you for everything and all the upstream work to enable this on Jetson! Given your contributions, please feel free to make a new topic on the Jetson Projects forum announcing this; I know the community has been asking for it. Otherwise I will post it next week. Some notes:
…
Awesome, I'm really excited for this, thank you! I've already discussed some of these build options with the Ollama devs, and I think the solution is to provide custom build flags for the containers. I'll share some of their feedback here:
The ollama developers wanted to prioritize compatibility over smaller performance gains for their general binary distribution. I disabled LLAMA_CUDA_F16 because it wasn't compiling when I had CMAKE_CUDA_ARCHITECTURES < 60. They were hesitant to include a "Jetson only" build in their general build script, which I totally get, since it is a smaller market and Jetson builds otherwise mostly share the same code with standard Linux + CUDA builds.
You are absolutely correct, and the reason for this is again compatibility with older systems. They said they didn't see a substantial performance gain in tests with this option enabled, which may be true for beefier cards, but I have a theory that it may make a difference on Jetson devices. I will play around with build flags and see if I can override those options with the current ollama build(s); if not, I will make a PR with them to include support for changing options on the fly. I think the key might be the custom CPU flags; if that works, I will post an update.
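As a sketch of the kind of override being discussed, assuming a source build of ollama and that its gen_linux.sh honors the OLLAMA_CUSTOM_CPU_DEFS hook (both the hook and the exact flag values should be treated as assumptions):

```bash
# Hypothetical flag override for building ollama from source on Jetson.
# CMAKE_CUDA_ARCHITECTURES=87 targets Orin (72 = Xavier, 53 = Nano);
# LLAMA_CUDA_F16 is the half-precision option mentioned above.
export CMAKE_CUDA_ARCHITECTURES="87"
export OLLAMA_CUSTOM_CPU_DEFS="-DLLAMA_CUDA_F16=on"

go generate ./...   # regenerate the llama.cpp backends with these flags
go build .
```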
Adds an ollama docker build to the jetson-containers packages. The test scripts hang on my system, but the containers build successfully.