To support LLM models efficiently, the upcoming features below will make the library as easy to use as PEFT from Hugging Face:
- LoRaConfig
- Quantizing specific layers
- Adjusting the hyper-parameters already instantiated by the original model
Usage
```python
lora_config = AdapterLoRa.LoRaConfig(
    method="LoRa",
    Rank=4,
    Instance_Layer="auto",
    layertyep=["nn.Linear", "nn.Embedding"],
    LORA=True,
    BITSAND=False,
    bit8_int=True,
)

adapter_model = AdapterLoRa(model, Config=lora_config, device="cuda")
```
- LoRALib approach: the computations $xW_0^T$ and $x(BA)^T$ are calculated separately and then summed. This approach is particularly suitable for linear layers and offers accurate computation of LoRA-enhanced layers.
- LoRATorch approach: the pre-trained weight $W_0$ is merged with its LoRA weight $BA$, resulting in the combined weight matrix $(W_0 + \frac{\alpha}{r} BA)$. This approach allows LoRA to be extended straightforwardly to more complex and non-linear layers within the PyTorch ecosystem.
- LoRALib approach. The computation is defined as:

  $h = xW_0^T + \frac{\alpha}{r} x(BA)^T$

  where:
  - $x$ is the input matrix of dimensions $k \times n$,
  - $W_0$ is a pre-trained weight matrix of dimensions $m \times n$,
  - $r$ is a predefined LoRA rank,
  - $B$ and $A$ are LoRA matrices of dimensions $m \times r$ and $r \times n$ respectively,
  - $\alpha$ is a hyper-parameter.
- LoRATorch approach. The computation is defined as:

  $h = x(W_0 + \frac{\alpha}{r} BA)^T$

  where:
  - $x$ is the input matrix of dimensions $k \times n$,
  - $W_0$ is a pre-trained weight matrix of dimensions $m \times n$,
  - $r$ is a predefined LoRA rank,
  - $B$ and $A$ are LoRA matrices of dimensions $m \times r$ and $r \times n$ respectively,
  - $\alpha$ is a hyper-parameter.
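The two formulations are algebraically equivalent. The sketch below is a plain-PyTorch numerical check of that equivalence (it is independent of the AdapterLoRa API, and the dimensions are arbitrary example values):

```python
import torch

k, n, m, r, alpha = 5, 16, 8, 4, 8.0

x = torch.randn(k, n)   # input, k x n
W0 = torch.randn(m, n)  # pre-trained weight, m x n
B = torch.randn(m, r)   # LoRA matrix B, m x r
A = torch.randn(r, n)   # LoRA matrix A, r x n

# loralib-style: h = x W0^T + (alpha / r) * x (BA)^T
h_loralib = x @ W0.T + (alpha / r) * (x @ (B @ A).T)

# loratorch-style: h = x (W0 + (alpha / r) * BA)^T
h_loratorch = x @ (W0 + (alpha / r) * (B @ A)).T

assert torch.allclose(h_loralib, h_loratorch, atol=1e-5)
```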
- AdapterLoRa class: the `AdapterLoRa` class provides a versatile interface for applying LoRA adaptation to neural networks. It supports both the `loralib` and `loratorch` approaches, offering the ability to reconstruct and implement LoRA-adapted models.
- Adapting layers: the `add_layer_and_Instance_Layer` method lets you specify the layers you want to adapt via the `layertyep` and `layer` parameters, tailoring the LoRA application to specific layers of your model.
- Freezing weights: the `freeze_weights` method enables the option to freeze model weights, enhancing stability and allowing for safer adaptations.
- Reconstructing and implementing LoRA: the `reconstruct_model` method applies LoRA adaptation to the model, while the `implement_lora` method further implements LoRA and manages trainable parameters. A hedged usage sketch of these calls follows this list.
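The exact signatures of these helpers are not documented here, so the following is only a minimal sketch: the argument names passed to `add_layer_and_Instance_Layer` and the argument-free `freeze_weights()` call are assumptions, not confirmed API.

```python
import torch.nn as nn
from core.Quantized import AdapterLoRa

model = nn.TransformerEncoderLayer(d_model=512, nhead=8)
adapter_model = AdapterLoRa(model, method="LoRa", Rank=4)

# Assumed signature: layer type as a string plus the target layer name (check the library source).
adapter_model.add_layer_and_Instance_Layer(layertyep="nn.Linear", layer="linear1")

# Assumed to freeze the base model weights so only LoRA parameters remain trainable.
adapter_model.freeze_weights()

adapter_model.reconstruct_model()                   # swap the selected layers for LoRA-wrapped ones
model = adapter_model.implement_lora(verbose=True)  # keep only "lora_" parameters trainable
```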
| | loralib | loratorch | Example |
|---|---|---|---|
| nn.Linear | ✓ | ✓ | linear.ipynb |
| nn.Embedding | ✓ | ✓ | embedding.ipynb |
| nn.Conv1d | ✓ | ✓ | |
| nn.Conv2d | ✓ | ✓ | |
| nn.Conv3d | ✓ | ✓ | |
| nn.MultiheadAttention | ✘ | ✓ | |
| MergedLinear | ✓ (Error) | ✓ | mergedlinear.ipynb |
| | hard to extend | easy to extend | |
The usage of AdapterLoRa
- Install `AdapterLoRa` (and the LoRA-Torch backend).

```bash
pip install git+https://github.com/Baijiong-Lin/LoRA-Torch
pip install AdapterLoRa
```
```python
import torch
import torch.nn as nn

from core.Quantized import AdapterLoRa

model = nn.TransformerEncoderLayer(d_model=512, nhead=8)

adapter_model = AdapterLoRa(model, method="LoRa", Rank=4)

"""
Adding the linear layers built inside self-attention:
replace the layers where you would like to use AdapterLoRa by using the add_layer function.
"""
adapter_model.add_layer("self_attn")
adapter_model.add_layer("linear1")
adapter_model.add_layer("linear2")

# Reconstruct the model with the selected layers prepared for quantization
adapter_model.reconstruct_model()

# Implement the LoRA method
model = adapter_model.implement_lora(verbose=True)
# Total trainable parameters before LoRA: 3176960
# Total trainable parameters after LoRA: 24576
# This sets requires_grad to False for all parameters without the string "lora_" in their names

# Training loop (`dataloader` is assumed to be defined elsewhere)
model.train()
for batch in dataloader:
    ...
```
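For completeness, here is a minimal sketch of a full training step. The optimizer, loss, and the shape of each batch are placeholders chosen for illustration, not part of the AdapterLoRa API:

```python
import torch

# Only the LoRA parameters still require gradients after implement_lora, so the
# optimizer is built over the trainable subset only.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad),
    lr=1e-4,
)
loss_fn = torch.nn.MSELoss()  # placeholder task loss

model.train()
for inputs, targets in dataloader:  # assumed to yield (inputs, targets) pairs
    optimizer.zero_grad()
    outputs = model(inputs)           # forward pass through the LoRA-adapted layer
    loss = loss_fn(outputs, targets)
    loss.backward()                   # gradients flow only into the LoRA matrices
    optimizer.step()
```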
- Save the LoRA model (only the LoRA matrices will be saved).

```python
import loralib as lora

# ===== Before =====
# torch.save(model.state_dict(), checkpoint_path)
# ===== After =====
torch.save(lora.lora_state_dict(model), checkpoint_path)
```
- Load the LoRA model (the pre-trained model needs to be loaded first).

```python
import loralib as lora

# Load the pre-trained checkpoint first
model.load_state_dict(torch.load('ckpt_pretrained.pt'), strict=False)
# Then load the LoRA checkpoint
model.load_state_dict(torch.load('ckpt_lora.pt'), strict=False)
```
For each of the above four pillars, we are sharing our codebase and insights to:
- Assist you in leveraging Transformer-based models for your machine learning needs and challenges
- Boost reproducibility efforts, which are becoming increasingly difficult with Transformers
I am providing ready-to-use tools for quantizing the model:
- Fine-tuning Transformer-based models on your proprietary dataset via PEFT methodologies such as LoRA and QLoRA
- Performing hyperparameter optimization to get the maximum performance out of these models (see the rank-sweep sketch after this list)
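As an illustration of the hyperparameter-optimization point, here is a minimal rank-sweep sketch built only from the AdapterLoRa calls already shown in this README; the `evaluate` function is a hypothetical placeholder for your own train-and-validate routine:

```python
import torch.nn as nn
from core.Quantized import AdapterLoRa

def build_lora_model(rank: int):
    """Rebuild the LoRA-adapted encoder layer for a given LoRA rank."""
    base = nn.TransformerEncoderLayer(d_model=512, nhead=8)
    adapter = AdapterLoRa(base, method="LoRa", Rank=rank)
    for name in ("self_attn", "linear1", "linear2"):
        adapter.add_layer(name)
    adapter.reconstruct_model()
    return adapter.implement_lora(verbose=False)

best_rank, best_score = None, float("-inf")
for rank in (2, 4, 8, 16):
    model = build_lora_model(rank)
    score = evaluate(model)  # hypothetical: train briefly and return a validation metric
    if score > best_score:
        best_rank, best_score = rank, score

print(f"Best LoRA rank: {best_rank} (validation score {best_score:.4f})")
```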
Go over to the Transformer-specific directory that you are interested in and open its README.md. We have included details about the LLMs, followed by performance results on open-source datasets!
The supported methods for quantizing Transformer-based models (see the selection sketch after this list):
- LoRa
- LoRaTorch
- QLoRA
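Selecting between them happens through the `method` argument of the `AdapterLoRa` constructor. Only `method="LoRa"` appears earlier in this README; the `"LoRaTorch"` string below is an assumption based on the list above, not a confirmed value:

```python
import torch.nn as nn
from core.Quantized import AdapterLoRa

# "LoRa" is the value shown earlier in this README; "LoRaTorch" is assumed to select
# the merged-weight (loratorch-style) computation path.
method = "LoRa"  # or "LoRaTorch"
adapter_model = AdapterLoRa(
    nn.TransformerEncoderLayer(d_model=512, nhead=8),
    method=method,
    Rank=4,
)
```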
Our plan is to perform these experiments on all the Transformer-based models below. To that end, this is a tentative roadmap of the LLMs that we aim to cover:
- TransformerEncoder
- TransformerDecoder
- Vision Transformer
- minGPT
- OpenAI GPT-2
- Inflection Pi (in progress)
AdapterLoRa is developed and maintained by Youness ELbrag (Email | LinkedIn).