
Mixtral conversion to npz not working on M3 Max 128GB #92

Closed
Gimel12 opened this issue Dec 13, 2023 · 10 comments

Comments

@Gimel12

Gimel12 commented Dec 13, 2023

Hi,

I have an M3 Max 128 GB and I wanted to try Mixtral, but after downloading the weights and combining them, I tried to run the converter; it starts but then the process gets killed.

Do we need more than 128 GB for this to work?

Thanks in advance.

@awni
Member

awni commented Dec 13, 2023

No, it shouldn't need more than 128 GB to do the conversion... the conversion peaks at about 100 GB; I just measured it.

I recommend asitop to view RAM usage as the conversion is running: https://github.com/tlkh/asitop

@awni
Member

awni commented Dec 13, 2023

100 GB is still a lot, so I can work on reducing the RAM for conversion as I address #81.

@foysavas

For anyone else struggling with this and also on an M3 Max 128 GB: you need to get your resting memory usage down below 11 GB before running the weight conversion script.
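
A minimal pre-flight sketch of that check (psutil and the exact threshold are my own additions, not part of the repo's scripts; the 11 GB figure just echoes the advice above):

import psutil

# Memory already in use before starting the conversion; the conversion itself
# peaks around 100 GB, so a 128 GB machine has little headroom to spare.
used_gb = psutil.virtual_memory().used / 1e9
print(f"memory in use before conversion: {used_gb:.1f} GB")
if used_gb > 11:
    print("Consider closing other apps before running convert.py, or it may get killed.")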

@lafintiger

I am on a Mac Studio M2 Ultra with 192 GB of RAM.
Here is what I am getting when I run convert.py:

Any suggestions would be appreciated.

(mlx) lafintiger@VincentacStudio mixtral % python convert.py --model_path mixtral-8x7b-32kseqlen
Traceback (most recent call last):
File "/Users/lafintiger/aidev/mlx-examples/mixtral/convert.py", line 19, in
state = torch.load(str(model_path / "consolidated.00.pth"))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/lafintiger/anaconda3/envs/mlx/lib/python3.11/site-packages/torch/serialization.py", line 1028, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/lafintiger/anaconda3/envs/mlx/lib/python3.11/site-packages/torch/serialization.py", line 1246, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_pickle.UnpicklingError: invalid load key, 'v'.
(mlx) lafintiger@VincentacStudio mixtral %

@lafintiger

Fixed. The download and concatenation had issues. Redid it, and on the third try it worked.

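For anyone else who hits the same error: UnpicklingError: invalid load key, 'v' typically means torch isn't seeing a real checkpoint at all, a classic culprit being a git-lfs pointer file or a truncated/botched concatenation. A rough sanity check I'd try (the path is the example's default; the header heuristic is my own, not something from convert.py):

from pathlib import Path

ckpt = Path("mixtral-8x7b-32kseqlen/consolidated.00.pth")
with ckpt.open("rb") as f:
    head = f.read(8)
print(f"size: {ckpt.stat().st_size / 1e9:.1f} GB, first bytes: {head!r}")

# A zipfile-format torch checkpoint (the usual modern torch.save output) starts
# with b"PK"; a git-lfs pointer file starts with b"version ", which is where
# the stray 'v' in the error comes from.
if head.startswith(b"version "):
    print("Looks like a git-lfs pointer, not the actual weights; re-download.")
elif not head.startswith(b"PK"):
    print("Unexpected header; the download or concatenation is probably corrupted.")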

@ptsochantaris

In case it helps, one neat way to limit the peak memory required during conversion is to add an mmap=True argument to the torch.load call in the conversion scripts (e.g. torch.load(str(model_path / "consolidated.00.pth"), mmap=True)).

This will slow things down a bit during conversion, but it won't require the first giant blob of memory to be reserved when first loading the torch checkpoint, which made the script work on my 64 GB system. To further minimise disk/swap thrashing when memory runs low on the mmap'ed region, you may want to keep the source and destination model files on separate drives, but that's just a speed hack.
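
A minimal sketch of what that looks like, loosely following the load-and-save steps in the example's convert.py (the float16 cast and the weights.npz name are how I remember that script, so treat the details as approximate; mmap=True needs torch >= 2.1 and a zipfile-format checkpoint):

from pathlib import Path

import numpy as np
import torch

model_path = Path("mixtral-8x7b-32kseqlen")

# mmap=True memory-maps the checkpoint instead of reading the whole thing
# into RAM up front, so tensors are paged in lazily as they are touched below.
state = torch.load(str(model_path / "consolidated.00.pth"), mmap=True)

# The tensors still get materialised (as float16) while building the .npz,
# but the initial full-size torch load no longer needs to fit in memory.
np.savez(
    str(model_path / "weights.npz"),
    **{k: v.to(torch.float16).numpy() for k, v in state.items()},
)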

Just a thought from someone with not much Python knowledge; there are probably better solutions out there, but sharing in case it helps.

@dastrobu
Contributor

Would it be an idea to provide the converted checkpoint files, similar to the [mlx-llama](https://huggingface.co/mlx-llama) community organisation on Hugging Face?

@thegodone

thegodone commented Dec 14, 2023

You need at least torch 2.1.0 for #92 (comment), but it still failed:

(tf) tgg@gvalmu00008 mixtral % python convert.py
zsh: killed     python convert.py

(M1 with 64 GB memory)

@LeaveNhA

How can I let my M1 Max 64 GB exceed the 28 GB swap limit? I'm stuck at this limit and it doesn't let me run the models and other things.

@awni
Member

awni commented Dec 14, 2023

#107 should help with this; it uses far less memory for the conversion. However, it still needs a lot of memory to run... we need quantization for that.
