Replies: 2 comments
-
So I've been noticing that it's been a nightmare to develop this due to all the moving parts and stupid dependencies. A thought: perhaps it might be better to use Apple's backend for transformers rather than fiddling with stuff from PyTorch and llama.cpp? https://huggingface.co/blog/swift-coreml-llm I think you'd get similar performance to llama.cpp, but without the mess caused by the instability. Of course, it'd be a lot of work, but it can't be worse than what you've been through already, and it'd probably be far more stable than beta PyTorch and OpenBLAS and Accelerate and the linking of all that.
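For anyone curious what that route looks like in practice, here is a minimal sketch of converting a small Hugging Face model to Core ML with coremltools; the model name, shapes, and output path are placeholder choices for illustration, not a tested recipe for this project:

```python
import numpy as np
import torch
import coremltools as ct
from transformers import AutoModelForCausalLM, AutoTokenizer

# "distilgpt2" is just a stand-in; any small causal LM works for a first test
name = "distilgpt2"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torchscript=True).eval()

# Trace with a fixed-shape example input, then hand the traced graph to Core ML
example = tokenizer("hello world", return_tensors="pt")["input_ids"]
traced = torch.jit.trace(model, example)

mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="input_ids", shape=example.shape, dtype=np.int32)],
    convert_to="mlprogram",  # ML Program target can run on CPU, GPU, or Neural Engine
)
mlmodel.save("distilgpt2.mlpackage")
```

Whether the converted model actually matches llama.cpp on speed would need to be measured; this only shows how small the conversion step itself is.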
-
I had a chance to read over the paper, but haven't had a chance to look at the project in detail, and this is going to run longer than I intended. My brief take on things is that Apple is pushing CoreML because they want a set of model data transfer standards to live within the Apple ecosystem. Granted, they have tools and libraries for applying LoRAs, training, and building and using models, but the problem I see is that CoreML stays entirely inside the Apple enclave. This is the same problem with Metal, which has libBLAS.dylib and libLAPACK.dylib as well, and they do work. The problem is that no one is familiar with the Apple tools or environment other than Apple developers, and they will likely not be available to people on other platforms, so it's rather niche.

While llama-cpp-python lists NumPy as a requirement when you install it, as far as I can tell it doesn't actually need it for anything other than running their included web server; it's not required for using the library itself. llama.cpp isn't really dependent on libBLAS or libLAPACK either, but I wanted to see if I could get better performance by building them myself, along with libllama.dylib and libgguf.dylib.

What llama.cpp is actually trying to achieve is model portability across all platforms and a model data standard, GGUF. Ideally, Apple would support GGUF as a file format for models and be able to use it within CoreML, or import and export to and from CoreML. The appeal of llama.cpp is its library: it is very portable and will run on just about anything, since it can run models with limited memory, no GPU, and other constrained situations. The llama.cpp people have been working really hard to improve performance and support other models as well. GG makes no secret that what he wants to do is create THE standard for models, and I see it as likely becoming the standard for running them.
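As a minimal sketch of what using the library with a GGUF file looks like from Python (the model path and parameters below are placeholders, not anything from this project), and note that nothing here needs NumPy:

```python
from llama_cpp import Llama

# Placeholder path; any GGUF-format model file works here
llm = Llama(
    model_path="./models/model.Q4_K_M.gguf",
    n_ctx=2048,       # context window size
    n_gpu_layers=-1,  # offload all layers to Metal on Apple Silicon
)

out = llm("Q: What is GGUF? A:", max_tokens=64, stop=["Q:"])
print(out["choices"][0]["text"])
```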
I wanted to run oobabooga primarily for my own reasons, to access many different models and features, though I only use a portion of what is in the application. I've been through the code and it has some issues; it tries to be all things to everyone, which is great, but when you do that you can't do everything very well. Some of the issues I see are that it's too PC-focused, which is alright, and Nvidia has cemented their place in computing by making the CUDA tools freely available, something Apple cannot compete with. Apple does have Unified Memory, but it's only a matter of time before others build the same; AMD is already doing that to some degree.

The problem I have is with all the dependencies, the update cycles, the lack of coordination, and everyone wanting to be on the latest version because it has some feature they just had to have. I've been through the update cycles way too much this past month, which has kinda kept me away from here. PyTorch regressed, so the daily builds had to be used once again, and they still install their own "numpy-base" package, which is a trimmed-down version of BLAS and LAPACK. NumPy decided to go back and bring in support for Apple's Accelerate Framework, and changed their build system in the process. oobabooga has performance issues when running the app: it crashes, it tries to thread, but when a thread crashes there is no recovery or restart.

As far as working on the Apple Silicon GPU goes, neither NumPy nor PyTorch fully supports it, though PyTorch is better. NumPy is sort of where the problem stems from, with everyone wanting theirs to go in first. Not only that, the NumPy recompile with pip will seek out any BLAS/LAPACK libraries you might have and use them, like the ones I had installed in /usr/local/lib; even though I told NumPy to specifically use "accelerate" and only Accelerate, it found them and used them. The two in the lib directory under the Python venv are especially ugly in practice. If NumPy had moved to the Accelerate Framework about 4 years ago, there wouldn't be this issue; instead there were problems with it and matrix operations, and lots of finger pointing. That's part of why things are in the mess they are now.

I do think that ctransformers is a big help, but I haven't had a chance to play with it yet, and even if people use CoreML, there will most likely still be a need for PyTorch and NumPy for a long time, even on macOS. I think that while CoreML will work, it will probably remain mostly an Apple macOS/iOS/tvOS/watchOS thing. It might get adoption in the greater world, but we can see how far OpenCL got, and no one is jumping at that anymore, not even Apple.

It's been such a moving target that I never got any benchmarking together, but anecdotally I will say that things linked against the Accelerate Framework seem to run much faster and don't use as much CPU or GPU, at least as far as I can see in Activity Monitor. My guess is they are using the Neural Engine. If you go looking in /Library/Frameworks, you will find broken links to all the files there, and I haven't found them anywhere else. My best guess is that Apple is doing some compiler magic to keep the headers and libraries hidden for some reason, like people reverse engineering them. You need special compiler flags to link with the Accelerate Framework, so some compiler magic is likely what they've done here.

I really didn't mean to run this out so long. Anyway, I'd like to work on a project separate from oobabooga for macOS. I think I'd still go with Python since it reaches the biggest audience, but I'd look at just how much the package needs to support and break it into smaller pieces, instead of the monolithic thing it is. Maybe even make it into smaller, lighter-weight individual processes which could be configured to run on other machines, isolating them from the one you are working on. I think discrete modules/processes in a message-passing architecture (there's a rough sketch of the idea below) would be very flexible, extensible, and scalable, even to the point of sending whole models and agents out to the data and back again, so you could build a LoRA or train other models from them. I have a number of ideas, but no one to work on them with me.

So, in short, Apple with CoreML is nice, and that's one dependency not to have to worry about, but it's creating a standard. There's an XKCD for that, along the lines of: "We've got 8 standards! We need to do something about that..." "Congrats, now we have 9!"

Be well...
M
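To make the message-passing idea above a bit more concrete, here is a bare-bones sketch using Python's standard multiprocessing module; the worker function and message format are invented purely for illustration and have nothing to do with any existing project:

```python
import multiprocessing as mp

def worker(inbox, outbox):
    """A discrete module: pull messages, do one job, pass the result on."""
    while True:
        msg = inbox.get()
        if msg is None:  # sentinel message: shut this module down
            break
        # Stand-in for real work (tokenize, run a model, build a LoRA, ...)
        outbox.put({"id": msg["id"], "result": msg["text"].upper()})

if __name__ == "__main__":
    inbox, outbox = mp.Queue(), mp.Queue()
    proc = mp.Process(target=worker, args=(inbox, outbox))
    proc.start()

    inbox.put({"id": 1, "text": "hello from another process"})
    print(outbox.get())  # {'id': 1, 'result': 'HELLO FROM ANOTHER PROCESS'}

    inbox.put(None)  # tell the module to exit
    proc.join()
```

The same shape extends to sockets or a proper message broker if the modules need to live on different machines.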
-
Come share your thoughts and ideas. Help the community grow!
Let me know what we can do to make the content or precast better here. We can all help each other together!
Be well...
M