Fork main #1523

Closed
wants to merge 148 commits into from

Conversation

AmazeQiu

Feature #182
Because I need to use Baichuan2-13B with more than one LoRA adapter at the same time, I tried to implement these features myself. It works well for my use case now, and this feature was mentioned in #182. Comments are welcome, and I'll do my best to address them.

Add Features

  • Support Baichuan2-13B
  • Support multiple LoRA adapters within a single batch at inference time

I use peft to implement the multi-LoRA adapters. Because we want to use more than one LoRA adapter, we cannot merge the LoRA weights into the base model, so there is some extra computation that increases latency. If you only need a single LoRA adapter, simply do not use this feature. I'm still working on a more efficient implementation of multiple LoRA adapters in a single batch.
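
To illustrate the unmerged-weights point, here is a minimal sketch in plain PyTorch (not the PR's actual code; `MultiLoRALinear` and `lora_ids` are names I made up) of a frozen base linear layer plus a per-token LoRA delta, which is where the extra computation comes from:

```python
import torch
import torch.nn as nn


class MultiLoRALinear(nn.Module):
    """Frozen base linear plus an unmerged, per-token-selectable LoRA delta."""

    def __init__(self, base: nn.Linear, num_adapters: int, r: int, alpha: int = 16):
        super().__init__()
        self.base = base                                  # shared frozen weight W
        self.scaling = alpha / r
        # One (A, B) pair per adapter, stacked so we can index by adapter id.
        self.lora_A = nn.Parameter(torch.randn(num_adapters, base.in_features, r) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(num_adapters, r, base.out_features))

    def forward(self, x: torch.Tensor, lora_ids: torch.Tensor) -> torch.Tensor:
        # x: [num_tokens, in_features]; lora_ids: [num_tokens] adapter index per token.
        y = self.base(x)                                  # the usual W @ x path
        A = self.lora_A[lora_ids]                         # [num_tokens, in_features, r]
        B = self.lora_B[lora_ids]                         # [num_tokens, r, out_features]
        # The extra cost of keeping the weights unmerged: two small batched matmuls.
        delta = torch.bmm(torch.bmm(x.unsqueeze(1), A), B).squeeze(1)
        return y + self.scaling * delta
```

With a single adapter, B @ A could be folded into W once and this overhead would disappear, which is why the feature is opt-in.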

Changed files

  • requirements.txt
    Add peft for the LoRA adapters

  • tests/kernels/test_blora.py
    Test script for the multi-LoRA computation

  • tests/kernels/test_normhead.py
    Test script for the NormHead layer used in Baichuan2-13B

  • vllm/engine/arg_utils.py
    Add two args used to initialize the LoRA adapters when loading the model

  • vllm/engine/async_llm_engine.py
    Check that the LoRA config args are valid.

  • vllm/engine/llm_engine.py
    Check that the LoRA config args are valid, and pass the LoRA config to the workers.

  • vllm/entrypoints/llm.py
    Add LoRA config parameters and pass them to llm_engine

  • vllm/model_executor/lora_utils.py
    Create LoRA adapters and replace the target modules in the base model (a sketch of this step follows after this list)

  • vllm/model_executor/model_loader.py
    Support Baichuan2 and add the LoRA adapters when the model is initialized.

  • vllm/model_executor/models/__init__.py
    Support Baichuan2

  • vllm/model_executor/models/baichuan.py
    Support Baichuan2 and schedule the LoRA information after each iteration according to the metadata.
    Implement the method to load LoRA weights in parallel.

  • vllm/model_executor/parallel_utils/layers.py
    Implement the LoRA module on top of ColumnParallelLinear and RowParallelLinear

  • vllm/sampling_params.py
    Add a lora_id parameter to specify which LoRA adapter to use for a given prompt (see the usage sketch after this list).

  • vllm/worker/worker.py
    Pass the LoRA config when initializing the model
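
For reference, the "replace the target modules" step described above for lora_utils.py could look roughly like the following. This is a sketch only: the function and argument names are mine, and the real code operates on ColumnParallelLinear/RowParallelLinear rather than plain nn.Linear.

```python
import torch.nn as nn


def replace_target_modules(model: nn.Module, target_names, wrap_fn) -> None:
    """Swap every submodule whose name matches for a LoRA-wrapped version of itself."""
    for name, module in list(model.named_modules()):
        leaf = name.rsplit(".", 1)[-1]
        if leaf in target_names:
            # Find the parent module and replace its attribute in place.
            parent = model.get_submodule(name.rsplit(".", 1)[0]) if "." in name else model
            setattr(parent, leaf, wrap_fn(module))


# e.g. wrap Baichuan's fused QKV projection and output projection with the
# MultiLoRALinear sketch above (the module names here are model-specific guesses):
# replace_target_modules(model, {"W_pack", "o_proj"},
#                        lambda m: MultiLoRALinear(m, num_adapters=2, r=8))
```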
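
And a hypothetical end-to-end use of the lora_id field added to SamplingParams. The adapter-loading arguments passed to LLM are placeholders, since the two new args added in arg_utils.py are not named in this description.

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="baichuan-inc/Baichuan2-13B-Chat",
    trust_remote_code=True,
    lora_paths=["/path/to/adapter_a", "/path/to/adapter_b"],   # hypothetical arg name
    lora_target_modules=["W_pack", "o_proj"],                  # hypothetical arg name
)

# Each request selects its adapter by index via lora_id; in the async engine,
# requests carrying different lora_id values can be scheduled into the same batch.
out_a = llm.generate(["Summarize this ticket."], SamplingParams(max_tokens=64, lora_id=0))
out_b = llm.generate(["Translate to English: bonjour"], SamplingParams(max_tokens=64, lora_id=1))
```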

Thanks and looking forward to your comments!

@AmazeQiu AmazeQiu closed this by deleting the head repository Oct 31, 2023