
Mixtral conversion to npz not working on M3 Max 128GB #92

Closed
Gimel12 opened this issue Dec 13, 2023 · 10 comments

Comments

@Gimel12

Gimel12 commented Dec 13, 2023

Hi,

I have an M3 Max 128 GB and I wanted to try Mixtral, but after downloading the weights and combining them, I tried to run the converter; it starts but then the process gets killed.

Do we need more than 128 GB for this to work?

Thanks in advance.

@awni
Member

awni commented Dec 13, 2023

No, it shouldn't need more than 128 GB to do the conversion... the conversion peaks at about 100 GB; I just measured it.

I recommend asitop to view RAM usage as the conversion is running: https://github.com/tlkh/asitop

@awni
Member

awni commented Dec 13, 2023

100 GB is still a lot, so I can work on reducing the RAM for conversion as I address #81.

@foysavas

For anyone else struggling with this and also on an M3 Max 128 GB: you need to get your resting memory usage down below 11 GB before running the weight conversion script.
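
A minimal pre-flight sketch of that check (psutil and the exact threshold are my own additions, not part of the repo's scripts; the 11 GB figure just echoes the advice above):

import psutil

# Memory already in use before starting the conversion; the conversion itself
# peaks around 100 GB, so a 128 GB machine has little headroom to spare.
used_gb = psutil.virtual_memory().used / 1e9
print(f"memory in use before conversion: {used_gb:.1f} GB")
if used_gb > 11:
    print("Consider closing other apps before running convert.py, or it may get killed.")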

@lafintiger

I am on a Mac Studio M2 Ultra with 192 GB of RAM.
Here is what I am getting when I run convert.py:

Any suggestions would be appreciated.

(mlx) lafintiger@VincentacStudio mixtral % python convert.py --model_path mixtral-8x7b-32kseqlen
Traceback (most recent call last):
File "/Users/lafintiger/aidev/mlx-examples/mixtral/convert.py", line 19, in
state = torch.load(str(model_path / "consolidated.00.pth"))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/lafintiger/anaconda3/envs/mlx/lib/python3.11/site-packages/torch/serialization.py", line 1028, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/lafintiger/anaconda3/envs/mlx/lib/python3.11/site-packages/torch/serialization.py", line 1246, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_pickle.UnpicklingError: invalid load key, 'v'.
(mlx) lafintiger@VincentacStudio mixtral %

@lafintiger

Fixed. The download and concatenation had issues. Redid it, and on the third try it worked.

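For anyone else who hits the same error: UnpicklingError: invalid load key, 'v' typically means torch isn't seeing a real checkpoint at all, a classic culprit being a git-lfs pointer file or a truncated/botched concatenation. A rough sanity check I'd try (the path is the example's default; the header heuristic is my own, not something from convert.py):

from pathlib import Path

ckpt = Path("mixtral-8x7b-32kseqlen/consolidated.00.pth")
with ckpt.open("rb") as f:
    head = f.read(8)
print(f"size: {ckpt.stat().st_size / 1e9:.1f} GB, first bytes: {head!r}")

# A zipfile-format torch checkpoint (the usual modern torch.save output) starts
# with b"PK"; a git-lfs pointer file starts with b"version ", which is where
# the stray 'v' in the error comes from.
if head.startswith(b"version "):
    print("Looks like a git-lfs pointer, not the actual weights; re-download.")
elif not head.startswith(b"PK"):
    print("Unexpected header; the download or concatenation is probably corrupted.")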

@ptsochantaris

In case it helps, one neat way to limit the peak memory required during conversion is to add an mmap=True argument to the torch.load call in the conversion scripts (e.g. torch.load(str(model_path / "consolidated.00.pth"), mmap=True)).

This will slow things down a bit during conversion, but it won't require the first giant blob of memory to be reserved when first loading the torch checkpoint, which made the script work on my 64 GB system. To further minimise disk/swap thrashing when memory runs low on the mmap'ed region, you may want to keep the source and destination model files on separate drives, but that's just a speed hack.
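
A minimal sketch of what that looks like, loosely following the load-and-save steps in the example's convert.py (the float16 cast and the weights.npz name are how I remember that script, so treat the details as approximate; mmap=True needs torch >= 2.1 and a zipfile-format checkpoint):

from pathlib import Path

import numpy as np
import torch

model_path = Path("mixtral-8x7b-32kseqlen")

# mmap=True memory-maps the checkpoint instead of reading the whole thing
# into RAM up front, so tensors are paged in lazily as they are touched below.
state = torch.load(str(model_path / "consolidated.00.pth"), mmap=True)

# The tensors still get materialised (as float16) while building the .npz,
# but the initial full-size torch load no longer needs to fit in memory.
np.savez(
    str(model_path / "weights.npz"),
    **{k: v.to(torch.float16).numpy() for k, v in state.items()},
)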

Just a thought from someone with not much Python knowledge; there are probably better solutions out there, but sharing in case it helps.

@dastrobu
Contributor

Would it be an idea to provide the converted checkpoint files, similar to the [mlx-llama](https://huggingface.co/mlx-llama) community organisation on Hugging Face?

@thegodone

thegodone commented Dec 14, 2023

You need at least torch 2.1.0 for #92 (comment), but it still failed:

(tf) tgg@gvalmu00008 mixtral % python convert.py
zsh: killed     python convert.py

(M1 with 64 GB memory)

@LeaveNhA

How can I let my M1 Max 64 GB exceed the 28 GB swap limit? I'm stuck at this limit and it doesn't let me run the models and other things.

@awni
Member

awni commented Dec 14, 2023

#107 should help with this; it uses far less memory for the conversion. However, it still needs a lot of memory to run... we need quantization for that.
