
[Question] Running custom Encoder Decoder model #2491

Closed
AvivSham opened this issue Nov 24, 2024 · 5 comments
Labels: question (Further information is requested), triaged (Issue has been triaged by maintainers)

Comments

@AvivSham

Hi All,
Thank you for your amazing work.
We have an encoder-decoder model that we want to run with TensorRT-LLM. We made an architectural modification: the encoder's output dimension is pooled by stacked MLP layers.
What is the recommended way to modify the code to support the new architecture? We assume we need to change both the code that converts the model (to a static computation graph) and the code that runs it.
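For illustration, the kind of head we added looks roughly like this (a minimal PyTorch sketch; the class name, layer count, and dimensions are placeholders, not our actual model):

import torch.nn as nn

# Hypothetical pooling head: stacked MLP layers that reduce the encoder's
# output dimension (sizes are illustrative only).
class EncoderPoolingHead(nn.Module):
    def __init__(self, d_in: int = 1280, d_out: int = 320):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(d_in, d_in // 2),
            nn.GELU(),
            nn.Linear(d_in // 2, d_out),
        )

    def forward(self, encoder_output):
        # encoder_output: (batch, time, d_in) -> (batch, time, d_out)
        return self.mlp(encoder_output)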

Please advise,

@hello-11 added the question and triaged labels on Nov 25, 2024
@hello-11
Collaborator

@AvivSham you can follow this guide.

@AvivSham
Author

Thank you for your response, @hello-11.
We followed the guide and created a custom encoder model by adding a single linear layer:

# Imports assumed for the TensorRT-LLM version used by the Whisper example;
# exact module paths may differ between releases.
from tensorrt_llm import default_net
from tensorrt_llm.functional import Tensor, cast, gelu, transpose, unsqueeze
from tensorrt_llm.layers import Linear
from tensorrt_llm.models.enc_dec.model import WhisperEncoder
from tensorrt_llm.models.modeling_utils import PretrainedConfig


class CustomEncoder(WhisperEncoder):
    def __init__(self, config: PretrainedConfig):
        super().__init__(config)
        # extra linear layer on top of the stock WhisperEncoder
        self.lin = Linear(in_features=1280, out_features=1280)

    def forward(self,
                input_features: Tensor,
                input_lengths=None,
                position_ids=None):
        if default_net().plugin_config.remove_input_padding:
            # BXT,D -> 1,BxT,D -> 1,D,BxT
            input_features = unsqueeze(input_features, 0)
            input_features = transpose(input_features, 1, 2)
        # Encoder conv needs to run in fp32 on Volta/Turing
        x_type = input_features.dtype
        input_features = cast(input_features, self._conv_dtype)
        x = self.conv1(input_features)
        x = gelu(x)
        x = self.conv2(x)
        x = cast(x, x_type)
        x = gelu(x)
        x = transpose(x, 2, 1)
        x = x + cast(self.position_embedding(position_ids), x.dtype)

        if default_net().plugin_config.remove_input_padding:
            #B,T,D -> BxT,D
            x = x.view([-1, self.config.hidden_size])
        hidden_states = x
        input_lengths = input_lengths // self.downsample_factor
        for encoder_layer in self.encoder_layers:
            hidden_states = encoder_layer(hidden_states,
                                          input_lengths=input_lengths)

        x = hidden_states
        x = self.lin(x)
        x = self.ln_post(x)
        x.mark_output('encoder_output', self._dtype)
        return x

We also wrote a new convert_checkpoint.py. Just as a sanity check, we added the following lines (at lines 246-247) to the convert_checkpoint.py file in the Whisper example, since the added linear layer is not included in the whisper-v3.pt file that the example uses:

weights['lin.weight'] = torch.rand(1280, 1280).contiguous()
weights['lin.bias'] = torch.rand(1280).contiguous()
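Since these tensors only exist to exercise the build path, we also considered matching the dtype of the rest of the converted weights (float16 in the example); this is a guess on our side, not something the error below is about:

import torch

# Random values only for the build sanity check; real weights would come from
# the trained model. `weights` is the same dict built earlier in
# convert_checkpoint.py. Matching the checkpoint dtype (float16 in the example)
# is an assumption on our part; torch.rand defaults to float32.
dtype = torch.float16
weights['lin.weight'] = torch.rand(1280, 1280, dtype=dtype).contiguous()
weights['lin.bias'] = torch.rand(1280, dtype=dtype).contiguous()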

when running:

trtllm-build  --checkpoint_dir ${checkpoint_dir}/encoder \
              --output_dir ${output_dir}/encoder \
              --moe_plugin disable \
              --enable_xqa disable \
              --max_batch_size ${MAX_BATCH_SIZE} \
              --gemm_plugin disable \
              --bert_attention_plugin ${INFERENCE_PRECISION} \
              --max_input_len 3000 --max_seq_len=3000

we receive the following error:

  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/modeling_utils.py", line 662, in from_checkpoint
    model.load(weights, from_pruned=is_checkpoint_pruned)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/modeling_utils.py", line 675, in load
    raise RuntimeError(
RuntimeError: Required but not provided tensors:{'lin.per_channel_scale'}

After a deep dive, it seems that lin.per_channel_scale, which is related to quantization, is added to the model's named parameters when the model's config is loaded.

I assume it relates to this:

def __post_init__(self):
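
One way to check whether quantization is the trigger (our guess, not verified against the exact code path) is to look at the quantization section of the converted checkpoint's config.json:

import json

# Placeholder path: the same ${checkpoint_dir} used in the trtllm-build command above.
checkpoint_dir = "whisper_checkpoint"
with open(f"{checkpoint_dir}/encoder/config.json") as f:
    cfg = json.load(f)

# If convert_checkpoint.py was run with --use_weight_only, we would expect a
# weight-only quant_algo here, which would explain why every Linear layer,
# including the new lin layer, is expected to provide a per_channel_scale tensor.
print(cfg.get("quantization", {}).get("quant_algo"))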

Can you please advise how to solve this issue?

@hello-11
Collaborator

@AvivSham, did you convert the checkpoint first?

@yuekaizhang

@AvivSham Please use

python3 convert_checkpoint.py --output_dir $checkpoint_dir

rather than

python3 convert_checkpoint.py --use_weight_only --weight_only_precision $WEIGHT_ONLY_PRECISION --output_dir $checkpoint_dir

@AvivSham
Author

@yuekaizhang Thanks!
We were able to work around this issue by following the steps mentioned in #2535.
