Update README.md
Signed-off-by: He Huang (Steve) <[email protected]>
stevehuang52 authored Feb 21, 2024
1 parent 94bd346 commit 8afd277
Showing 1 changed file with 4 additions and 4 deletions.
8 changes: 4 additions & 4 deletions examples/multimodal/modular_speechllm/README.md
@@ -42,9 +42,9 @@ There are several configs for training a SpeechLLM:
- `conf/modular_audio_gpt_multi_enc_config_peft.yaml`: a config for training a SpeechLLM model with multiple audio encoders and PEFT, where you can add speaker embeddings to the audio embeddings. Currently only TitaNet is supported as the speaker encoder.

With any config, you can set the following flags to control which components to train or freeze:
- - `model.freeze_llm` # Generally set to `True` unless you want to fine-tune the whole LLM.
- - `model.freeze_audio_encoder` # Generally set to `False` unless you want to freeze the audio encoder.
- - `model.freeze_modality_adapter` # Generally set to `False` since we want to train the modality adapter.
+ - `model.freeze_llm`: Generally set to `True` unless you want to fine-tune the whole LLM.
+ - `model.freeze_audio_encoder`: Generally set to `False` unless you want to freeze the audio encoder.
+ - `model.freeze_modality_adapter`: Generally set to `False` since we want to train the modality adapter.
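As a sketch, these three flags live under the `model` section of the YAML configs; the values shown below are the generally recommended ones, and the surrounding keys (omitted here) follow whichever config file you start from, e.g. `conf/modular_audio_gpt_multi_enc_config_peft.yaml`:

```yaml
model:
  freeze_llm: true               # keep LLM weights fixed; set to false to fine-tune the whole LLM
  freeze_audio_encoder: false    # train the audio encoder
  freeze_modality_adapter: false # train the modality adapter (almost always desired)
```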

In addition to the config file, you will also need to prepare the audio encoder and the LLM as `*.nemo` files.

@@ -128,4 +128,4 @@ CUDA_VISIBLE_DEVICES=0 python modular_audio_gpt_eval.py \


## Reference
[1] Chen, Z.\*, Huang, H.\*, Andrusenko, A., Hrinchuk, O., Puvvada, K.C., Li, J., Ghosh, S., Balam, J. and Ginsburg, B., 2023. SALM: Speech-augmented Language Model with In-context Learning for Speech Recognition and Translation. ICASSP'24.
