Support for Mistral Nemo #1985

Closed
hongjunchoi92 opened this issue Jul 18, 2024 · 9 comments
Labels: feature request, new model

hongjunchoi92 commented Jul 18, 2024

https://mistral.ai/news/mistral-nemo/

Will Mistral Nemo models be supported in TensorRT-LLM in the near future?

@byshiue byshiue added the feature request New feature or request label Jul 22, 2024
@byshiue byshiue removed the feature request New feature or request label Jul 22, 2024

fan-niu commented Jul 22, 2024

@byshiue Looking forward to any progress

hongjunchoi92 (Author) commented Jul 22, 2024

Hello @byshiue

It seems like the Mistral 7B model is already supported:

BASE_MISTRAL_MODEL=komt-mistral-7b-v1/

If the model architecture is the same, would that mean we can also use the existing scripts/code for Mistral-Nemo? Or would differences in the model architecture require new code changes?

We would be happy to try it out with the existing scripts. Please let us know.

cc: @AdamzNV @ncomly-nvidia as well.


fan-niu commented Jul 23, 2024

@byshiue @AdamzNV @ncomly-nvidia Can you help solve this problem? Yesterday I tried to directly use the Mistral conversion path to convert and compile a Mistral Nemo 12B engine, but an error occurred during the conversion phase. I used the SmoothQuant conversion method. The conversion script and error log are below. CC: @hongjunchoi92

Convert script:
TensorRT-LLM commit: ab49b93 (use this commit for llama3 + rope scaling)
TensorRT-LLM backend commit: 97feb8f

python3 ./tensorrtllm_backend/tensorrt_llm/examples/llama/convert_checkpoint.py \
    --model_dir ${model_path} \
    --output_dir ${convert_model_path} \
    --dtype float16 \
    --smoothquant 0.5 \
    --per_token \
    --per_channel \
    --tp_size 1

Error log:
[TensorRT-LLM] TensorRT-LLM version: 0.11.0
Loading checkpoint shards:   0%| | 0/5 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/code/./tensorrtllm_backend/tensorrt_llm/examples/llama/convert_checkpoint.py", line 461, in <module>
    main()
  File "/code/./tensorrtllm_backend/tensorrt_llm/examples/llama/convert_checkpoint.py", line 453, in main
    convert_and_save_hf(args)
  File "/code/./tensorrtllm_backend/tensorrt_llm/examples/llama/convert_checkpoint.py", line 339, in convert_and_save_hf
    LLaMAForCausalLM.quantize(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/model.py", line 411, in quantize
    convert.quantize(hf_model_dir,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 1226, in quantize
    hf_model = AutoModelForCausalLM.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
    return model_class.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 3838, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 4298, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 895, in _load_state_dict_into_meta_model
    set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/modeling.py", line 362, in set_module_tensor_to_device
    raise ValueError(
ValueError: Trying to set a tensor of shape torch.Size([1024, 5120]) in "weight" (which has shape torch.Size([1280, 5120])), this look incorrect.
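The shapes in this error are consistent with Mistral Nemo decoupling head_dim from hidden_size // num_attention_heads (the transformers-side change referenced later in huggingface/transformers#32050). A minimal sketch of the arithmetic, assuming Nemo's published config values (not TensorRT-LLM code):

# Assumed Mistral-Nemo-12B config values (from its Hugging Face config.json):
hidden_size = 5120
num_attention_heads = 32
num_key_value_heads = 8
head_dim = 128  # set explicitly in the config; NOT hidden_size // num_attention_heads

# k_proj / v_proj weight shape as stored in the checkpoint:
print(num_key_value_heads * head_dim, hidden_size)      # 1024 5120 -- tensor being loaded

# Shape expected by code that derives head_dim from hidden_size:
derived = hidden_size // num_attention_heads            # 160, wrong for Nemo
print(num_key_value_heads * derived, hidden_size)       # 1280 5120 -- shape in the error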


eleapttn commented Aug 1, 2024

Hello everyone!

Same issue here. Any news about the integration of this model?
Is it related to the transformers version and this PR? huggingface/transformers#32050

The logs are the following (pp_size and tp_size set to 1):

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/modeling_utils.py", line 465, in load
    param.value = weights[name]
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/parameter.py", line 133, in value
    assert v.shape == self.shape, \
AssertionError: The value updated is not the same shape as the original. Updated: (6144, 5120), original: (7680, 5120)
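(The same head_dim arithmetic as above fits this shape pair, assuming a fused QKV weight with (num_heads + 2 * num_kv_heads) * head_dim rows: (32 + 16) * 128 = 6144 as stored in the checkpoint, versus (32 + 16) * 160 = 7680 when head_dim is derived from hidden_size.)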

QiJune (Collaborator) commented Aug 4, 2024

@nv-guomingz Could you please take a look? Thanks

@QiJune QiJune added the feature request New feature or request label Aug 4, 2024
nv-guomingz (Collaborator) commented:

Hi @eleapttn, we've fixed this issue internally and the corresponding fix will be pushed to the main branch in the coming weekly update.


eleapttn commented Aug 5, 2024

Hi @QiJune, @nv-guomingz,
Thanks a lot for your quick reply. I can't wait to test it!

MatthewPeyrard commented:

This is working in 0.12. Good job!
Does anyone have any advice or documentation that can help to optimize engine builds for Mistral Nemo?
I am currently experimenting with fp8 quants on an H100 and finding them to be about 1/3 the speed of a similar quant of Llama 3.1 8B. I expected Nemo to be a bit slower, but not that much slower.
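For anyone comparing builds: the usual FP8 route goes through the quantization example rather than convert_checkpoint.py. A sketch, assuming the 0.12 examples layout (the ${model_path}/${ckpt_path}/${engine_path} placeholders are illustrative, and flag names may differ between versions):

python3 examples/quantization/quantize.py \
    --model_dir ${model_path} \
    --dtype bfloat16 \
    --qformat fp8 \
    --kv_cache_dtype fp8 \
    --output_dir ${ckpt_path}

trtllm-build \
    --checkpoint_dir ${ckpt_path} \
    --output_dir ${engine_path} \
    --gemm_plugin auto

Some gap versus Llama 3.1 8B is expected in any case: Nemo is a 12B-parameter model with a much larger (~131k-entry) vocabulary, so per-token compute is higher at the same quantization, though that alone would not explain a 3x slowdown.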

AdamzNV (Collaborator) commented Oct 31, 2024

As more and more new models enter the market, we have prepared comprehensive instructions for TRT-LLM developers on adapting to new models of interest. We encourage our community developers to expand the range of supported models, fostering an open ecosystem with rapid iterations.

Please try following these instructions and let us know if you encounter any issues during the adaptation process. We greatly appreciate your dedication.
