Mixtral conversion to npz not working on M3 Max 128GB #92
Comments
No, it shouldn't need more than 128 GB to do the conversion; at peak it uses about 100 GB, I just measured it. I recommend asitop to view RAM usage as the conversion is running: https://github.com/tlkh/asitop
100 GB is still a lot, so I can work on reducing the RAM needed for conversion as I address #81.
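For a rough check without asitop, here is a minimal sketch (assuming `psutil` is installed) that polls system and swap usage from a second terminal while `convert.py` runs:

```python
import time

import psutil  # pip install psutil

# Poll overall memory and swap once a second; stop with Ctrl-C.
while True:
    vm = psutil.virtual_memory()
    swap = psutil.swap_memory()
    print(
        f"RAM used: {vm.used / 2**30:5.1f} / {vm.total / 2**30:.1f} GiB, "
        f"swap used: {swap.used / 2**30:5.1f} GiB",
        end="\r",
        flush=True,
    )
    time.sleep(1)
```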
For anyone else struggling with this who is also on an M3 Max 128GB: you need to get your resting memory usage down below 11 GB before running the weight conversion script.
I am on a Mac Studio M2 Ultra with 192 GB of RAM. Any suggestions would be appreciated.

(mlx) lafintiger@VincentacStudio mixtral % python convert.py --model_path mixtral-8x7b-32kseqlen
Fixed. The download and concatenation had issues. I redid it, and on the third try it worked.
In case it helps, one neat way to limit the peak memory required during conversion is to add an `mmap=True` argument to the `torch.load` call. This will slow things down a bit during conversion, but it won't require the first giant blob of memory to be reserved when first loading the torch source weights, which made the script work on my 64 GB system. To further minimise disk/swap thrashing when memory runs low on the mmap'ed region, you may want to keep the source and destination model files on separate drives, but that's just a speed hack. Just a thought from someone with not much Python knowledge; there are probably better solutions out there, but sharing in case it helps.
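A minimal sketch of that idea (the checkpoint path is assumed from the download step in this thread, not copied from convert.py):

```python
import torch

# mmap=True (PyTorch >= 2.1) maps the checkpoint file instead of reading
# it fully into RAM, so tensor data is paged in from disk on demand.
state = torch.load(
    "mixtral-8x7b-32kseqlen/consolidated.00.pth",
    map_location="cpu",
    mmap=True,
)
```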
Would it be an idea to provide the converted checkpoint files, similar to https://huggingface.co/mlx-llama? See line 24 in e0a53ed.
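If pre-converted weights were published like that, downloading them would replace the conversion step entirely; a sketch with `huggingface_hub` (the repo id below is hypothetical):

```python
from huggingface_hub import snapshot_download

# "mlx-llama/mixtral-8x7b" is a made-up repo id for illustration.
path = snapshot_download(repo_id="mlx-llama/mixtral-8x7b")
print(f"Converted weights downloaded to {path}")
```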
You need the latest torch 2.1.0 for #92 (comment), but it still failed for me:

(tf) tgg@gvalmu00008 mixtral % python convert.py
How can I let my M1 Max 64GB exceed the 28 GB swap limit? I'm stuck at this limit, and it won't let me run the models and other things.
#107 should help with this; it uses far less memory for the conversion. However, it still needs a lot of memory to run... we need quantization for that.
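For reference, a minimal sketch of weight quantization with MLX (assuming the `mx.quantize`/`mx.dequantize` API; the group size and bit width are illustrative):

```python
import mlx.core as mx

# Quantize a weight matrix to 4 bits in groups of 64 values, then
# dequantize to check the round-trip error.
w = mx.random.normal((4096, 4096))
w_q, scales, biases = mx.quantize(w, group_size=64, bits=4)
w_hat = mx.dequantize(w_q, scales, biases, group_size=64, bits=4)
print(mx.abs(w - w_hat).max())
```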
Hi,
I have an M3 Max 128GB and I wanted to try Mixtral, but after downloading the weights and combining them, I tried to run the converter; it starts, but then the process gets killed.
Do we need more than 128 GB for this to work?
Thanks in advance.