
[bug] Medusa example fails with vicuna 33B #2478

Open
SoundProvider opened this issue Nov 21, 2024 · 4 comments

SoundProvider commented Nov 21, 2024

Thank you for developing TRT-LLM; it's helping me a lot.
I'm trying to use Medusa with TRT-LLM, referencing this page.

It works fine with Vicuna 7B and its Medusa heads, with no errors at all.

However, with Vicuna 33B and its trained heads, the following error occurs when executing trtllm-build.
Converting the checkpoint with Medusa completed with the following result:
[image: checkpoint conversion output]

## Running script
CUDA_VISIBLE_DEVICES=${DEVICES} \
trtllm-build --checkpoint_dir /app/medusa_test/tensorrt/${TP_SIZE}-gpu \
             --gpt_attention_plugin float16 \
             --gemm_plugin float16 \
             --context_fmha enable \
             --output_dir /app/medusa_test/tensorrt_llm/${TP_SIZE}-gpu \
             --speculative_decoding_mode medusa \
             --max_batch_size ${BATCH_SIZE} \
             --max_input_len ${SEQ_LEN} \
             --max_seq_len ${SEQ_LEN} \
             --max_num_tokens ${SEQ_LEN} \
             --workers ${TP_SIZE} 
concurrent.futures.process._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/queues.py", line 244, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/usr/lib/python3.10/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
AttributeError: Can't pickle local object 'MedusaConfig.__init__.<locals>.GenericMedusaConfig'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/commands/build.py", line 437, in parallel_build
    future.result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/lib/python3.10/multiprocessing/queues.py", line 244, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/usr/lib/python3.10/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
AttributeError: Can't pickle local object 'MedusaConfig.__init__.<locals>.GenericMedusaConfig'
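For context, this AttributeError is a standard Python limitation rather than anything TRT-LLM-specific: pickle serializes classes by their qualified name, so a class defined inside a function body (a `<locals>` name, like `GenericMedusaConfig` inside `MedusaConfig.__init__` here) cannot be pickled, and multiprocessing workers exchange objects via pickle. A minimal sketch reproducing the symptom (the `LocalConfig`/`make_local_config` names below are illustrative, not from TRT-LLM):

```python
import pickle


class ModuleLevelConfig:
    """Defined at module scope, so pickle can locate it by name."""

    def __init__(self, num_heads):
        self.num_heads = num_heads


def make_local_config(num_heads):
    # Defined inside a function: pickle stores only a qualified-name
    # reference, and "<locals>" names cannot be resolved on unpickling.
    class LocalConfig:
        def __init__(self, n):
            self.num_heads = n

    return LocalConfig(num_heads)


# The module-level class round-trips fine.
ok = pickle.loads(pickle.dumps(ModuleLevelConfig(4)))
assert ok.num_heads == 4

# Pickling an instance of the local class fails, just like the
# traceback above when the build worker tries to send the config.
try:
    pickle.dumps(make_local_config(4))
except (AttributeError, pickle.PicklingError) as e:
    print(type(e).__name__)
```

The usual fix on the library side is to hoist such a class to module level; on the user side, regenerating the checkpoint with a TRT-LLM version where the config class is module-level (as the maintainer suggests below) avoids the error.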
@hello-11
Collaborator

@SoundProvider, could you also show the command to convert the checkpoint?

@SoundProvider
Author

DEVICES=0,1,2,3
TP_SIZE=4
BATCH_SIZE=4


CUDA_VISIBLE_DEVICES=${DEVICES} \
python /app/tensorrt_llm/examples/medusa/convert_checkpoint.py \
                            --model_dir /app/models/vicuna-33b-v1.3 \
                            --medusa_model_dir /app/models/medusa-vicuna-33b-v1.3 \
                            --output_dir /app/models/medusa_test/tensorrt/${TP_SIZE}-gpu \
                            --dtype float16 \
                            --num_medusa_heads 4 \
                            --tp_size ${TP_SIZE} 


CUDA_VISIBLE_DEVICES=${DEVICES} \
trtllm-build --checkpoint_dir /app/models/medusa_test/tensorrt/${TP_SIZE}-gpu \
             --gpt_attention_plugin float16 \
             --gemm_plugin float16 \
             --context_fmha enable \
             --output_dir /app/models/medusa_test/tensorrt_llm/${TP_SIZE}-gpu \
             --speculative_decoding_mode medusa \
             --max_batch_size ${BATCH_SIZE} \
             --workers ${TP_SIZE} 

@hello-11 I'm using the Medusa example here.

@rakib-hasan

Hi @SoundProvider, I just tried to build a Medusa engine with the Vicuna-33B model with TP=1 and TP=4 using the TRT-LLM 0.15 release.
The engines were built without any issues for both TP=1 and TP=4.

Since the error is related to pickle, it seems your converted checkpoint config is outdated. Could you please try converting the checkpoint again and then building?

If you still run into the same issue, can you share which version of TRT-LLM you are using?

@SoundProvider
Copy link
Author

Hello @rakib-hasan,
Thank you for sharing the good news.
I'm currently working on another issue; I'll tag you here once I've finished testing what you requested.
Thank you.
